xenopict.alignment module

Functions for aligning 2D molecular depictions using RDKit.

This module provides methods for aligning molecules in 2D space to create consistent molecular depictions. The primary entrypoint is the auto_align_molecules() function, which can align multiple molecules simultaneously while ensuring optimal and consistent alignments across the entire set.

Primary Usage:
>>> from rdkit import Chem
>>> from xenopict.alignment import Alignment, auto_align_molecules
>>> # Create some molecules to align
>>> ethanol = Chem.MolFromSmiles("CCO")
>>> methanol = Chem.MolFromSmiles("CO")
>>> propanol = Chem.MolFromSmiles("CCCO")
>>> # Align all molecules (automatically aligns OH groups)
>>> aligned = auto_align_molecules([ethanol, methanol, propanol])

For more control, you can provide hints to force specific alignments:
>>> # Create a hint to align specific atoms
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, [(0, 0)])  # align terminal carbons
>>> aligned = auto_align_molecules([ethanol, methanol, propanol], hints=[hint])

The module also provides several lower-level alignment methods through the Alignment class:

  1. Maximum Common Substructure (MCS):
     - Automatically finds the largest common substructure between molecules
     - Best for general alignment when molecules share structural features
     - Example: aligning similar molecules like ethanol and methanol

  2. Explicit Atom Pairs:
     - Manual specification of which atoms should be aligned
     - Best when you need precise control over the alignment
     - Example: forcing specific functional groups to align

  3. Atom Map IDs:
     - Uses atom map numbers to determine alignment
     - Best for reaction-based alignments where atoms are already mapped
     - Example: aligning reactants and products in a reaction

  4. Atom Indices:
     - Uses direct atom index mappings
     - Best when working with known atom indices
     - Example: aligning based on atom ordering

These lower-level methods are wrapped in convenience functions (align_from_mcs, align_from_atom_pairs, etc.) but most users should prefer auto_align_molecules() as it handles the complexity of finding optimal alignments across multiple molecules.

Examples

Here are some common use cases:

1. Basic usage - align multiple molecules:
>>> # Align a set of alcohols by their OH groups
>>> molecules = [ethanol, methanol, propanol]
>>> aligned = auto_align_molecules(molecules)

2. Using hints for custom alignments:
>>> # Force ethanol's CH3 to align with methanol's OH
>>> unusual_pairs = [(0, 1)]  # ethanol CH3 to methanol OH
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, unusual_pairs)
>>> aligned = auto_align_molecules(molecules, hints=[hint])

3. Direct use of lower-level methods (if needed):
>>> # Align two molecules using MCS
>>> aligned = align_from_mcs(ethanol, methanol)

See individual class and function documentation for more detailed examples and usage.

class xenopict.alignment.Alignment(atom_pairs: list[tuple[int, int]], score: float, source_mol: Mol, template_mol: Mol)[source]

Bases: NamedTuple

A class representing a 2D alignment between two molecules.

This class stores information about how two molecules should be aligned in 2D space. It includes the atom pairs that should be matched between molecules, a score indicating alignment quality, and references to both molecules.

The class provides several factory methods for creating alignments:
- from_aligned_atoms: Create from explicit atom pairs
- from_mcs: Create using maximum common substructure
- from_mapids: Create using atom map IDs
- from_indices: Create using atom index mappings

atom_pairs

List of (source_idx, template_idx) tuples specifying matched atoms

score

Numerical score indicating alignment quality (higher is better)

source_mol

The molecule being aligned

template_mol

The template molecule being aligned to

apply()[source]

Apply the alignment to update the source molecule’s coordinates.

This method modifies the source molecule in place:
1. It ensures the template molecule has 2D coordinates.
2. It uses RDKit’s depiction generation to align the source to the template.
3. If no atom pairs exist, it only ensures the source has 2D coordinates.

Return type:

Alignment

Returns:

Self, allowing for method chaining

Examples

Create and apply an alignment between ethanol and methanol:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment.from_aligned_atoms(source, template, [(1, 0), (2, 1)])
>>> applied = alignment.apply()
>>> applied == alignment  # Returns self for chaining
True

Verify coordinates were updated:

>>> source_O = GetCoords(source, 2)
>>> template_O = GetCoords(template, 1)
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True

Empty alignments just ensure 2D coordinates exist:

>>> empty = Alignment([], 0.0, source, template)
>>> empty.apply()  # No error raised
Alignment(atom_pairs=[], score=0.0, source_mol=...)
>>> source.GetNumConformers() > 0  # Has coordinates
True

atom_pairs: list[tuple[int, int]]

Alias for field number 0

static from_aligned_atoms(source_mol, template_mol, atom_pairs)[source]

Create an alignment from explicit atom pairs.

This is the most basic factory method that creates an alignment directly from a list of atom pairs. The alignment score is set to 0 since the pairs are manually specified rather than discovered algorithmically.

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

  • atom_pairs (list[tuple[int, int]]) – List of (source_idx, template_idx) pairs specifying matched atoms

Return type:

Alignment

Returns:

A new Alignment object with the specified atom pairs

Raises:

AssertionError – If any atom indices are invalid

static from_indices(source_mol, template_mol, index_mapping)[source]

Create an alignment from atom index mappings.

This method builds an alignment by mapping atom indices between the source and template molecules. The mapping can be given as a list or a dictionary. In a list, each position is a source atom index and the value at that position is the template atom index. In a dictionary, the keys are source atom indices and the values are template atom indices.
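
The two mapping formats describe the same information. The sketch below shows one way to normalize either form into (source_idx, template_idx) pairs. The helper name mapping_to_pairs is hypothetical and is not part of xenopict; the -1 sentinel is applied to the list form only, as the parameter description states.

```python
from typing import Union


def mapping_to_pairs(
    index_mapping: Union[list[int], dict[int, int]],
) -> list[tuple[int, int]]:
    """Hypothetical helper: normalize a mapping into (source_idx, template_idx) pairs."""
    if isinstance(index_mapping, dict):
        # Dict form: keys are source indices, values are template indices.
        return [(src, tmpl) for src, tmpl in index_mapping.items()]
    # List form: the position is the source index, the value is the template
    # index, and -1 marks a source atom with no mapping.
    return [(src, tmpl) for src, tmpl in enumerate(index_mapping) if tmpl != -1]


# Ethanol (0=C, 1=C, 2=O) onto methanol (0=C, 1=O), matching the C-O fragment.
list_pairs = mapping_to_pairs([-1, 0, 1])
dict_pairs = mapping_to_pairs({1: 0, 2: 1})
print(list_pairs, dict_pairs)
```

Both calls describe the same two matched atoms, so either format can be passed depending on which is easier to build.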

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

  • index_mapping (Union[list[int], dict[int, int]]) – Either a list or dictionary mapping source atom indices to template indices. For list format, -1 indicates no mapping for that source atom.

Return type:

Alignment

Returns:

An Alignment object

static from_mapids(source_mol, template_mol, source_map, template_map)[source]

Create alignment using atom map IDs.

Parameters:
  • source_mol (Mol) – The source molecule to align

  • template_mol (Mol) – The template molecule to align to

  • source_map (Mol) – The source molecule with atom map IDs

  • template_map (Mol) – The template molecule with atom map IDs

Return type:

Alignment

Returns:

An Alignment object

Raises:

ValueError – If no atom maps are found in either molecule or no matching map IDs exist
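
As a rough illustration of map-ID matching, the sketch below pairs atoms that carry the same map ID. It uses plain dictionaries (atom index to map ID) instead of RDKit molecules so that it stays self-contained; with RDKit, these values would typically come from atom.GetAtomMapNum(), where 0 means the atom is unmapped. The function name pairs_from_map_ids and every error message other than the source-molecule one are assumptions, not the library's own.

```python
def pairs_from_map_ids(
    source_ids: dict[int, int],
    template_ids: dict[int, int],
) -> list[tuple[int, int]]:
    """Hypothetical helper: pair atoms that share an atom map ID.

    Both arguments map atom index -> atom map ID and contain mapped atoms only.
    """
    if not source_ids:
        raise ValueError("No atom maps found in source molecule")
    if not template_ids:
        raise ValueError("No atom maps found in template molecule")  # assumed wording

    # Invert the template lookup: map ID -> template atom index.
    template_by_id = {map_id: idx for idx, map_id in template_ids.items()}

    pairs = [
        (src_idx, template_by_id[map_id])
        for src_idx, map_id in sorted(source_ids.items())
        if map_id in template_by_id
    ]
    if not pairs:
        raise ValueError("No matching map IDs found")  # assumed wording
    return pairs


# [CH3:1][CH2:2][OH:3] and [CH3:2][OH:3], written as atom index -> map ID.
source_ids = {0: 1, 1: 2, 2: 3}
template_ids = {0: 2, 1: 3}
pairs = pairs_from_map_ids(source_ids, template_ids)
print(pairs)
```

Map ID 1 has no counterpart in the template, so only the atoms with IDs 2 and 3 are paired.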

static from_mcs(source_mol, template_mol)[source]

Create alignment using maximum common substructure (MCS) matching.

Uses both exact and inexact MCS matching to find the best alignment between molecules. The inexact matching allows for different bond types, which can help find alignments when bond orders differ between molecules.

The alignment score is computed as the average of:
- Number of atoms in exact MCS match
- Number of bonds in exact MCS match
- Number of atoms in inexact MCS match
- Number of bonds in inexact MCS match
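
A small worked example of this average, assuming the four counts are already known. The function mcs_score is a hypothetical stand-in that only reproduces the arithmetic; it does not run an MCS search. For ethanol and methanol the shared fragment is C-O (2 atoms, 1 bond), and relaxing bond types does not enlarge it.

```python
def mcs_score(
    exact_atoms: int,
    exact_bonds: int,
    inexact_atoms: int,
    inexact_bonds: int,
) -> float:
    """Hypothetical helper: average of the four MCS counts listed above."""
    return (exact_atoms + exact_bonds + inexact_atoms + inexact_bonds) / 4


# Ethanol vs. methanol: the common C-O fragment has 2 atoms and 1 bond
# under both exact and inexact matching.
score = mcs_score(exact_atoms=2, exact_bonds=1, inexact_atoms=2, inexact_bonds=1)
print(score)  # 1.5
```

Larger shared fragments give larger scores, which is what lets the score act as an edge weight when several molecules are aligned together.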

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

Return type:

Alignment

Returns:

A new Alignment object based on MCS matching

reverse()[source]

Create a new alignment with source and template molecules swapped.

This method creates a new alignment where:
- The source molecule becomes the template
- The template molecule becomes the source
- The atom pairs are reversed: (b, a) instead of (a, b)
- The score remains the same

Return type:

Alignment

Returns:

A new Alignment object with source and template swapped

Examples

Create and reverse an alignment between ethanol and methanol:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment.from_aligned_atoms(source, template, [(1, 0), (2, 1)])
>>> reversed_align = alignment.reverse()
>>> reversed_align.source_mol == template
True
>>> reversed_align.template_mol == source
True
>>> reversed_align.atom_pairs == [(0, 1), (1, 2)]
True
>>> reversed_align.score == alignment.score
True

score: float

Alias for field number 1

source_mol: Mol

Alias for field number 2

template_mol: Mol

Alias for field number 3

validate()[source]

Validate that all atom indices in the alignment are valid.

This method checks that:
- All atom indices are non-negative
- All atom indices are within bounds for their respective molecules
- No duplicate atom indices exist
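
The checks above can be sketched on plain atom counts. This is an illustration only: check_pairs is a hypothetical helper, "no duplicates" is read here as "no index repeated on either side", and only the two source-index messages follow the wording shown in this method's examples; the other messages are assumed.

```python
def check_pairs(
    atom_pairs: list[tuple[int, int]],
    n_source: int,
    n_template: int,
) -> list[tuple[int, int]]:
    """Hypothetical helper: apply the three validation checks to atom pairs."""
    seen_source: set[int] = set()
    seen_template: set[int] = set()
    for src, tmpl in atom_pairs:
        # 1. Indices must be non-negative.
        assert src >= 0, f"Negative source molecule atom index {src}"
        assert tmpl >= 0, f"Negative template molecule atom index {tmpl}"
        # 2. Indices must be within bounds for their molecule.
        assert src < n_source, (
            f"Source molecule atom index {src} out of range (max {n_source - 1})"
        )
        assert tmpl < n_template, (
            f"Template molecule atom index {tmpl} out of range (max {n_template - 1})"
        )
        # 3. No index may appear twice on the same side (assumed interpretation).
        assert src not in seen_source, f"Duplicate source molecule atom index {src}"
        assert tmpl not in seen_template, f"Duplicate template molecule atom index {tmpl}"
        seen_source.add(src)
        seen_template.add(tmpl)
    return atom_pairs


# Ethanol has 3 heavy atoms, methanol has 2.
result = check_pairs([(1, 0), (2, 1)], n_source=3, n_template=2)
print(result)
```

Returning the input unchanged mirrors the method-chaining style that validate() uses.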

Return type:

Alignment

Returns:

Self, allowing for method chaining

Raises:

AssertionError – If any validation checks fail

Examples

Create and validate a valid alignment:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment([(1, 0), (2, 1)], 1.0, source, template)
>>> validated = alignment.validate()  # No error raised
>>> validated == alignment
True

Try to validate an alignment with invalid indices:

>>> bad_alignment = Alignment([(-1, 0)], 1.0, source, template)
>>> bad_alignment.validate()  
Traceback (most recent call last):
    ...
AssertionError: Negative source molecule atom index -1
>>> bad_alignment = Alignment([(3, 0)], 1.0, source, template)
>>> bad_alignment.validate()  
Traceback (most recent call last):
    ...
AssertionError: Source molecule atom index 3 out of range (max 2)

xenopict.alignment.GetCoords(mol, i)[source]

Return type:

tuple[float, float]

xenopict.alignment.align_from_atom_pairs(source_mol, template_mol, atom_pairs)[source]

Align a molecule to a template molecule using explicit atom pairs.

This function aligns a molecule by matching specific pairs of atoms between the source and template molecules. This is useful when you want precise control over which atoms should be aligned, rather than letting the algorithm find matches automatically.

The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

  • atom_pairs (List[Tuple[int, int]]) – List of (source_idx, template_idx) pairs specifying which atoms to match

Return type:

Mol

Returns:

The source molecule, modified in place with new 2D coordinates

Examples

First, let’s align ethanol to methanol by explicitly matching the OH group:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")

Align using explicit atom pairs - match oxygen and its attached carbon:

>>> atom_pairs = [(2, 1), (1, 0)]  # (source O, template O), (source C, template C)
>>> aligned = align_from_atom_pairs(source, template, atom_pairs)

Get coordinates after alignment:

>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms now have the same coordinates:

>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

xenopict.alignment.align_from_indices(source_mol, template_mol, index_mapping)[source]

Align a molecule to a template molecule using atom index mappings.

This function aligns a molecule by mapping atom indices between the source and template molecules. The mapping can be provided either as a list or dictionary. When using a list, the index in the list corresponds to the source atom index, and the value is the template atom index. When using a dictionary, the keys are source atom indices and values are template atom indices.

The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

  • index_mapping (Union[list[int], dict[int, int]]) – Either a list or dictionary mapping source atom indices to template indices. For list format, -1 indicates no mapping for that source atom.

Return type:

Mol

Returns:

The source molecule, modified in place with new 2D coordinates

Examples

First, let’s align ethanol to methanol using a list mapping:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O

Align using a list mapping - match OH group:

>>> mapping = [-1, 0, 1]  # Source C1->Template C0, Source O2->Template O1
>>> aligned = align_from_indices(source, template, mapping)

Get coordinates after alignment:

>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms have the same coordinates:

>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

We can also use a dictionary mapping:

>>> mapping = {1: 0, 2: 1}  # Same mapping but as a dict
>>> aligned = align_from_indices(source, template, mapping)

The alignment results in the same coordinate matches:

>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True

xenopict.alignment.align_from_mapids(source_mol, template_mol, source_map, template_map)[source]

Align a molecule to a template using atom map IDs.

This function aligns a molecule by matching atoms with corresponding map IDs between the source and template molecules. Map IDs are integers assigned to atoms that help establish correspondence between different molecules. This is particularly useful when you want to align molecules based on atom-to-atom mapping from a reaction or transformation.

The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

  • source_map (Mol) – The source molecule with atom map IDs

  • template_map (Mol) – The template molecule with atom map IDs

Return type:

Mol

Returns:

The source molecule, modified in place with new 2D coordinates

Raises:

ValueError – If no atom maps are found in either molecule or no matching map IDs exist

Examples

Let’s align ethanol to methanol using atom map IDs:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")
>>> source_map = Chem.MolFromSmiles("[CH3:1][CH2:2][OH:3]")
>>> template_map = Chem.MolFromSmiles("[CH3:2][OH:3]")

Perform the alignment:

>>> aligned = align_from_mapids(source, template, source_map, template_map)

Get coordinates after alignment:

>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms have the same coordinates:

>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

The function raises an error when no atom maps are found:

>>> bad_map = Chem.MolFromSmiles("CCO")  # No atom maps
>>> align_from_mapids(source, template, bad_map, template_map)  
Traceback (most recent call last):
    ...
ValueError: No atom maps found in source molecule

xenopict.alignment.align_from_mcs(source_mol, template_mol)[source]

Align a molecule to a template molecule using maximum common substructure (MCS).

This function automatically aligns a molecule to a template by finding their maximum common substructure and using it to determine the alignment. It uses both exact and inexact matching to find the best possible alignment, where inexact matching allows for different bond types.

The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.

Parameters:
  • source_mol (Mol) – The molecule to align

  • template_mol (Mol) – The template molecule to align to

Return type:

Mol

Returns:

The source molecule, modified in place with new 2D coordinates

Raises:

ValueError – If no alignment could be found between the molecules

Examples

First, let’s align ethanol to methanol, matching the OH group:

>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")

Perform the alignment:

>>> aligned = align_from_mcs(source, template)

Get the template and source coordinates:

>>> template_O = GetCoords(template, 1)  # Oxygen is at index 1
>>> template_C = GetCoords(template, 0)  # Carbon is at index 0
>>> source_O = GetCoords(source, 2)  # Oxygen is at index 2
>>> source_C = GetCoords(source, 1)  # Matching carbon is at index 1

Verify that the OH group coordinates match between molecules:

>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

The function raises an error when no alignment is possible:

>>> source = Chem.MolFromSmiles("c1ccccc1")
>>> template = Chem.MolFromSmiles("O")
>>> align_from_mcs(source, template)  
Traceback (most recent call last):
    ...
ValueError: No alignment found

xenopict.alignment.auto_align_molecules(molecules, hints=None)[source]

Automatically align multiple molecules by finding an optimal alignment tree.

This function aligns a list of molecules by constructing a maximum spanning tree of alignments. Each edge in the tree represents an alignment between two molecules, with the edge weight being the alignment score. The tree structure ensures that all molecules are connected through a series of high-quality alignments.

The function can also use “hints” - predefined alignments that should be preferred over automatically discovered ones. This is useful when you want to enforce certain alignments while letting the algorithm figure out the rest.

The algorithm works by:
1. Building a complete graph where nodes are molecules and edges are alignments
2. Finding a maximum spanning tree to get the best set of alignments
3. Using the largest molecule as the root of the tree
4. Applying alignments in breadth-first order from the root
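
The four steps can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the library's implementation: the edge scores are illustrative numbers, "largest" is taken to mean the highest atom count, Prim's algorithm stands in for whatever spanning-tree routine xenopict uses, and the names maximum_spanning_tree and alignment_order are hypothetical. One hint is included, weighted as its score plus 1000.

```python
from collections import deque


def maximum_spanning_tree(n_nodes, weights):
    """Prim's algorithm; weights maps frozenset({i, j}) -> edge weight."""
    in_tree = {0}
    tree_edges = []
    while len(in_tree) < n_nodes:
        # Pick the heaviest edge that connects the tree to a new node.
        _, i, j = max(
            (weights[frozenset((i, j))], i, j)
            for i in in_tree
            for j in range(n_nodes)
            if j not in in_tree
        )
        tree_edges.append((i, j))
        in_tree.add(j)
    return tree_edges


def alignment_order(n_nodes, tree_edges, root):
    """Breadth-first walk from the root, yielding (source, template) steps."""
    neighbours = {i: [] for i in range(n_nodes)}
    for i, j in tree_edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    order, seen, queue = [], {root}, deque([root])
    while queue:
        parent = queue.popleft()
        for child in neighbours[parent]:
            if child not in seen:
                seen.add(child)
                order.append((child, parent))  # align child onto parent
                queue.append(child)
    return order


names = ["ethanol", "methanol", "propanol"]
atom_counts = [3, 2, 4]  # heavy atoms; assumed measure of "largest"

# Step 1: complete graph with illustrative pairwise alignment scores.
weights = {
    frozenset((0, 1)): 1.5,
    frozenset((0, 2)): 2.5,
    frozenset((1, 2)): 1.5,
}
# A hint between ethanol and methanol with score 0.0 gets weight score + 1000.
hint_pair, hint_score = (0, 1), 0.0
weights[frozenset(hint_pair)] = hint_score + 1000

tree = maximum_spanning_tree(len(names), weights)                # step 2
root = max(range(len(names)), key=lambda i: atom_counts[i])      # step 3
order = alignment_order(len(names), tree, root)                  # step 4

for source, template in order:
    print(f"align {names[source]} onto {names[template]}")
```

With these numbers the hinted ethanol-methanol edge is always kept in the tree, propanol becomes the root, ethanol is aligned onto propanol first, and methanol is then aligned onto ethanol.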

Parameters:
  • molecules (list[Mol]) – List of molecules to align

  • hints (Optional[list[Alignment]]) – Optional list of preferred Alignment objects. These alignments get a very high weight (score + 1000) to ensure they are used when possible.

Return type:

list[Mol]

Returns:

The input molecules, modified in place with new 2D coordinates

Examples

Let’s align three molecules - ethanol, methanol, and propanol:

>>> from rdkit import Chem
>>> ethanol = Chem.MolFromSmiles("CCO")
>>> methanol = Chem.MolFromSmiles("CO")
>>> propanol = Chem.MolFromSmiles("CCCO")

Align all molecules:

>>> aligned = auto_align_molecules([ethanol, methanol, propanol])
>>> len(aligned) == 3
True

We can verify the OH groups are aligned by checking coordinates:

>>> # Get OH coordinates for each molecule
>>> eth_O = GetCoords(ethanol, 2)  # O is at index 2
>>> meth_O = GetCoords(methanol, 1)  # O is at index 1
>>> prop_O = GetCoords(propanol, 3)  # O is at index 3
>>> # Check that OH groups are aligned
>>> abs(eth_O[0] - meth_O[0]) < 0.1
True
>>> abs(eth_O[1] - meth_O[1]) < 0.1
True
>>> abs(prop_O[0] - meth_O[0]) < 0.1
True
>>> abs(prop_O[1] - meth_O[1]) < 0.1
True

Now let’s provide a hint that aligns ethanol differently - matching the terminal carbon of ethanol to the oxygen of methanol (an unusual alignment):

>>> # Create a hint with unusual atom pairs
>>> unusual_pairs = [(0, 1)]  # Match ethanol's CH3 to methanol's OH
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, unusual_pairs)
>>> aligned = auto_align_molecules([ethanol, methanol, propanol], hints=[hint])
>>> len(aligned) == 3
True

Verify that our unusual hint was used - ethanol’s CH3 should be where methanol’s OH was:

>>> # Get coordinates after hint-based alignment
>>> eth_CH3 = GetCoords(ethanol, 0)  # Terminal carbon in ethanol
>>> meth_OH = GetCoords(methanol, 1)  # Oxygen in methanol
>>> # The hint should force these atoms to align
>>> abs(eth_CH3[0] - meth_OH[0]) < 0.1
True
>>> abs(eth_CH3[1] - meth_OH[1]) < 0.1
True