xenopict.alignment module
Functions for aligning 2D molecular depictions using RDKit.
This module provides methods for aligning molecules in 2D space to create consistent molecular depictions. The primary entrypoint is the auto_align_molecules() function, which can align multiple molecules simultaneously while ensuring optimal and consistent alignments across the entire set.
- Primary Usage:
>>> from rdkit import Chem
>>> # Create some molecules to align
>>> ethanol = Chem.MolFromSmiles("CCO")
>>> methanol = Chem.MolFromSmiles("CO")
>>> propanol = Chem.MolFromSmiles("CCCO")
>>> # Align all molecules (automatically aligns OH groups)
>>> aligned = auto_align_molecules([ethanol, methanol, propanol])
For more control, you can provide hints to force specific alignments:
>>> # Create a hint to align specific atoms
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, [(0, 0)])  # align terminal carbons
>>> aligned = auto_align_molecules([ethanol, methanol, propanol], hints=[hint])
The module also provides several lower-level alignment methods through the Alignment class:
Maximum Common Substructure (MCS):
- Automatically finds the largest common substructure between molecules
- Best for general alignment when molecules share structural features
- Example: aligning similar molecules like ethanol and methanol

Explicit Atom Pairs:
- Manual specification of which atoms should be aligned
- Best when you need precise control over the alignment
- Example: forcing specific functional groups to align

Atom Map IDs:
- Uses atom map numbers to determine alignment
- Best for reaction-based alignments where atoms are already mapped
- Example: aligning reactants and products in a reaction

Atom Indices:
- Uses direct atom index mappings
- Best when working with known atom indices
- Example: aligning based on atom ordering
These lower-level methods are wrapped in convenience functions (align_from_mcs, align_from_atom_pairs, etc.) but most users should prefer auto_align_molecules() as it handles the complexity of finding optimal alignments across multiple molecules.
Examples
Here are some common use cases:
1. Basic usage - align multiple molecules:
>>> # Align a set of alcohols by their OH groups
>>> molecules = [ethanol, methanol, propanol]
>>> aligned = auto_align_molecules(molecules)

2. Using hints for custom alignments:
>>> # Force ethanol's CH3 to align with methanol's OH
>>> unusual_pairs = [(0, 1)]  # ethanol CH3 to methanol OH
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, unusual_pairs)
>>> aligned = auto_align_molecules(molecules, hints=[hint])

3. Direct use of lower-level methods (if needed):
>>> # Align two molecules using MCS
>>> aligned = align_from_mcs(ethanol, methanol)
See individual class and function documentation for more detailed examples and usage.
- class xenopict.alignment.Alignment(atom_pairs: list[tuple[int, int]], score: float, source_mol: Mol, template_mol: Mol)
Bases: NamedTuple
A class representing a 2D alignment between two molecules.
This class stores information about how two molecules should be aligned in 2D space. It includes the atom pairs that should be matched between molecules, a score indicating alignment quality, and references to both molecules.
The class provides several factory methods for creating alignments:
- from_aligned_atoms: Create from explicit atom pairs
- from_mcs: Create using maximum common substructure
- from_mapids: Create using atom map IDs
- from_indices: Create using atom index mappings
- atom_pairs
List of (source_idx, template_idx) tuples specifying matched atoms
- score
Numerical score indicating alignment quality (higher is better)
- source_mol
The molecule being aligned
- template_mol
The template molecule being aligned to
- apply()
Apply the alignment to update the source molecule’s coordinates.
This method modifies the source molecule in place by:
1. Ensuring the template molecule has 2D coordinates
2. Using RDKit’s depiction generation to align the source to the template
3. If no atom pairs exist, just ensures source has 2D coordinates
- Return type:
Alignment
- Returns:
Self, allowing for method chaining
Examples
Create and apply an alignment between ethanol and methanol:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment.from_aligned_atoms(source, template, [(1, 0), (2, 1)])
>>> applied = alignment.apply()
>>> applied == alignment  # Returns self for chaining
True

Verify coordinates were updated:
>>> source_O = GetCoords(source, 2)
>>> template_O = GetCoords(template, 1)
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True

Empty alignments just ensure 2D coordinates exist:
>>> empty = Alignment([], 0.0, source, template)
>>> empty.apply()  # No error raised
Alignment(atom_pairs=[], score=0.0, source_mol=...)
>>> source.GetNumConformers() > 0  # Has coordinates
True
- atom_pairs: list[tuple[int, int]]
Alias for field number 0
- static from_aligned_atoms(source_mol, template_mol, atom_pairs)
Create an alignment from explicit atom pairs.
This is the most basic factory method that creates an alignment directly from a list of atom pairs. The alignment score is set to 0 since the pairs are manually specified rather than discovered algorithmically.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
atom_pairs (list[tuple[int, int]]) – List of (source_idx, template_idx) pairs specifying matched atoms
- Return type:
Alignment
- Returns:
A new Alignment object with the specified atom pairs
- Raises:
AssertionError – If any atom indices are invalid
- static from_indices(source_mol, template_mol, index_mapping)
Align a molecule to a template molecule using atom index mappings.
This function aligns a molecule by mapping atom indices between the source and template molecules. The mapping can be provided either as a list or dictionary. When using a list, the index in the list corresponds to the source atom index, and the value is the template atom index. When using a dictionary, the keys are source atom indices and values are template atom indices.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
index_mapping (Union[list[int], dict[int, int]]) – Either a list or dictionary mapping source atom indices to template indices. For list format, -1 indicates no mapping for that source atom.
- Return type:
Alignment
- Returns:
An Alignment object
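The list and dictionary forms of index_mapping describe the same atom pairs. As a rough illustration, both can be normalized to (source_idx, template_idx) pairs as sketched below. The helper name mapping_to_pairs is hypothetical and not part of xenopict, and the library's own conversion may differ.

```python
from __future__ import annotations

from typing import Union


def mapping_to_pairs(index_mapping: Union[list[int], dict[int, int]]) -> list[tuple[int, int]]:
    """Hypothetical helper: turn an index mapping into (source_idx, template_idx) pairs."""
    if isinstance(index_mapping, dict):
        # Dict form: keys are source indices, values are template indices.
        return list(index_mapping.items())
    # List form: the position is the source index and the value is the template
    # index; -1 marks a source atom with no mapping, so it is skipped.
    return [(src, tmpl) for src, tmpl in enumerate(index_mapping) if tmpl != -1]


list_pairs = mapping_to_pairs([-1, 0, 1])
dict_pairs = mapping_to_pairs({1: 0, 2: 1})
print(list_pairs)  # [(1, 0), (2, 1)]
print(dict_pairs)  # [(1, 0), (2, 1)]
```

Both calls describe the same ethanol-to-methanol mapping: source atom 1 onto template atom 0, and source atom 2 onto template atom 1.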
- static from_mapids(source_mol, template_mol, source_map, template_map)
Create alignment using atom map IDs.
- Parameters:
source_mol (Mol) – The source molecule to align
template_mol (Mol) – The template molecule to align to
source_map (Mol) – The source molecule with atom map IDs
template_map (Mol) – The template molecule with atom map IDs
- Return type:
Alignment
- Returns:
An Alignment object
- Raises:
ValueError – If no atom maps are found in either molecule or no matching map IDs exist
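The matching of map IDs can be sketched without RDKit by representing each molecule as a list of per-atom map IDs, with 0 meaning "unmapped" (the value RDKit reports for atoms without a map number). The helper pairs_from_map_ids is hypothetical. Only the first error message appears in this documentation; the other two are assumed wording.

```python
from __future__ import annotations


def pairs_from_map_ids(source_ids: list[int], template_ids: list[int]) -> list[tuple[int, int]]:
    """Hypothetical helper: pair atoms whose map IDs agree.

    Each list holds one map ID per atom, indexed by atom index; 0 means unmapped.
    """
    source_by_id = {mid: idx for idx, mid in enumerate(source_ids) if mid > 0}
    template_by_id = {mid: idx for idx, mid in enumerate(template_ids) if mid > 0}
    if not source_by_id:
        raise ValueError("No atom maps found in source molecule")
    if not template_by_id:
        raise ValueError("No atom maps found in template molecule")  # assumed wording
    shared = sorted(source_by_id.keys() & template_by_id.keys())
    if not shared:
        raise ValueError("No matching map IDs between molecules")  # assumed wording
    return [(source_by_id[mid], template_by_id[mid]) for mid in shared]


# Per-atom map IDs as they would be read from "[CH3:1][CH2:2][OH:3]" and "[CH3:2][OH:3]".
pairs = pairs_from_map_ids([1, 2, 3], [2, 3])
print(pairs)  # [(1, 0), (2, 1)]
```

Map IDs 2 and 3 occur in both molecules, so the source CH2 and OH are paired with the template carbon and oxygen. Map ID 1 has no partner and is ignored.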
- static from_mcs(source_mol, template_mol)
Create alignment using maximum common substructure (MCS) matching.
Uses both exact and inexact MCS matching to find the best alignment between molecules. The inexact matching allows for different bond types, which can help find alignments when bond orders differ between molecules.
The alignment score is computed as the average of:
- Number of atoms in exact MCS match
- Number of bonds in exact MCS match
- Number of atoms in inexact MCS match
- Number of bonds in inexact MCS match
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
- Return type:
Alignment
- Returns:
A new Alignment object based on MCS matching
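As a worked example of the score, assume that both the exact and the inexact search on ethanol and methanol find the shared C-O fragment (2 atoms, 1 bond). The function mcs_score below is hypothetical. It only reproduces the average described above and does not call RDKit, so the value the library reports may differ.

```python
def mcs_score(exact_atoms: int, exact_bonds: int, inexact_atoms: int, inexact_bonds: int) -> float:
    """Average of the four MCS counts (hypothetical helper, not xenopict API)."""
    return (exact_atoms + exact_bonds + inexact_atoms + inexact_bonds) / 4


# Assumed counts for ethanol vs. methanol: the C-O fragment in both searches.
score = mcs_score(exact_atoms=2, exact_bonds=1, inexact_atoms=2, inexact_bonds=1)
print(score)  # 1.5
```

Because the score grows with the size of the shared substructure, pairs of molecules with more in common receive higher scores.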
- reverse()
Create a new alignment with source and template molecules swapped.
This method creates a new alignment where:
- The source molecule becomes the template
- The template molecule becomes the source
- The atom pairs are reversed (b,a) instead of (a,b)
- The score remains the same
- Return type:
Alignment
- Returns:
A new Alignment object with source and template swapped
Examples
Create and reverse an alignment between ethanol and methanol:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment.from_aligned_atoms(source, template, [(1, 0), (2, 1)])
>>> reversed_align = alignment.reverse()
>>> reversed_align.source_mol == template
True
>>> reversed_align.template_mol == source
True
>>> reversed_align.atom_pairs == [(0, 1), (1, 2)]
True
>>> reversed_align.score == alignment.score
True
- score: float
Alias for field number 1
- source_mol: Mol
Alias for field number 2
- template_mol: Mol
Alias for field number 3
- validate()
Validate that all atom indices in the alignment are valid.
This method checks that:
- All atom indices are non-negative
- All atom indices are within bounds for their respective molecules
- No duplicate atom indices exist
- Return type:
Alignment
- Returns:
Self, allowing for method chaining
- Raises:
AssertionError – If any validation checks fail
Examples
Create and validate a valid alignment:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O
>>> alignment = Alignment([(1, 0), (2, 1)], 1.0, source, template)
>>> validated = alignment.validate()  # No error raised
>>> validated == alignment
True

Try to validate an alignment with invalid indices:
>>> bad_alignment = Alignment([(-1, 0)], 1.0, source, template)
>>> bad_alignment.validate()
Traceback (most recent call last):
    ...
AssertionError: Negative source molecule atom index -1

>>> bad_alignment = Alignment([(3, 0)], 1.0, source, template)
>>> bad_alignment.validate()
Traceback (most recent call last):
    ...
AssertionError: Source molecule atom index 3 out of range (max 2)
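The three checks listed for validate() can be sketched in plain Python. This is an illustration only: validate_pairs is a hypothetical helper that takes atom counts instead of molecules. The first two messages follow the tracebacks shown above. The duplicate check is assumed to apply per molecule, and its message is assumed wording.

```python
def validate_pairs(atom_pairs, n_source, n_template):
    """Hypothetical stand-in for the checks described for Alignment.validate()."""
    columns = (
        ("source", [s for s, _ in atom_pairs], n_source),
        ("template", [t for _, t in atom_pairs], n_template),
    )
    for name, indices, n_atoms in columns:
        for idx in indices:
            if idx < 0:
                raise AssertionError(f"Negative {name} molecule atom index {idx}")
            if idx >= n_atoms:
                raise AssertionError(
                    f"{name.capitalize()} molecule atom index {idx} out of range (max {n_atoms - 1})"
                )
        if len(set(indices)) != len(indices):
            # Assumed wording; the documentation does not show this message.
            raise AssertionError(f"Duplicate {name} molecule atom index")


# Ethanol has 3 heavy atoms and methanol has 2, so these pairs are valid.
validate_pairs([(1, 0), (2, 1)], n_source=3, n_template=2)

try:
    validate_pairs([(3, 0)], n_source=3, n_template=2)
except AssertionError as err:
    message = str(err)
    print(message)  # Source molecule atom index 3 out of range (max 2)
```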
- xenopict.alignment.GetCoords(mol, i)
- Return type:
tuple[float, float]
- xenopict.alignment.align_from_atom_pairs(source_mol, template_mol, atom_pairs)
Align a molecule to a template molecule using explicit atom pairs.
This function aligns a molecule by matching specific pairs of atoms between the source and template molecules. This is useful when you want precise control over which atoms should be aligned, rather than letting the algorithm find matches automatically.
The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
atom_pairs (List[Tuple[int, int]]) – List of (source_idx, template_idx) pairs specifying which atoms to match
- Return type:
Mol
- Returns:
The source molecule, modified in place with new 2D coordinates
Examples
First, let’s align ethanol to methanol by explicitly matching the OH group:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")

Align using explicit atom pairs - match oxygen and its attached carbon:
>>> atom_pairs = [(2, 1), (1, 0)]  # (source O, template O), (source C, template C)
>>> aligned = align_from_atom_pairs(source, template, atom_pairs)

Get coordinates after alignment:
>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms now have the same coordinates:
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True
- xenopict.alignment.align_from_indices(source_mol, template_mol, index_mapping)
Align a molecule to a template molecule using atom index mappings.
This function aligns a molecule by mapping atom indices between the source and template molecules. The mapping can be provided either as a list or dictionary. When using a list, the index in the list corresponds to the source atom index, and the value is the template atom index. When using a dictionary, the keys are source atom indices and values are template atom indices.
The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
index_mapping (Union[list[int], dict[int, int]]) – Either a list or dictionary mapping source atom indices to template indices. For list format, -1 indicates no mapping for that source atom.
- Return type:
Mol
- Returns:
The source molecule, modified in place with new 2D coordinates
Examples
First, let’s align ethanol to methanol using a list mapping:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")  # indices: 0=C, 1=C, 2=O
>>> template = Chem.MolFromSmiles("CO")  # indices: 0=C, 1=O

Align using a list mapping - match OH group:
>>> mapping = [-1, 0, 1]  # Source C1->Template C0, Source O2->Template O1
>>> aligned = align_from_indices(source, template, mapping)

Get coordinates after alignment:
>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms have the same coordinates:
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

We can also use a dictionary mapping:
>>> mapping = {1: 0, 2: 1}  # Same mapping but as a dict
>>> aligned = align_from_indices(source, template, mapping)

The alignment results in the same coordinate matches:
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
- xenopict.alignment.align_from_mapids(source_mol, template_mol, source_map, template_map)
Align a molecule to a template using atom map IDs.
This function aligns a molecule by matching atoms with corresponding map IDs between the source and template molecules. Map IDs are integers assigned to atoms that help establish correspondence between different molecules. This is particularly useful when you want to align molecules based on atom-to-atom mapping from a reaction or transformation.
The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
source_map (Mol) – The source molecule with atom map IDs
template_map (Mol) – The template molecule with atom map IDs
- Return type:
Mol
- Returns:
The source molecule, modified in place with new 2D coordinates
- Raises:
ValueError – If no atom maps are found in either molecule or no matching map IDs exist
Examples
Let’s align ethanol to methanol using atom map IDs:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")
>>> source_map = Chem.MolFromSmiles("[CH3:1][CH2:2][OH:3]")
>>> template_map = Chem.MolFromSmiles("[CH3:2][OH:3]")

Perform the alignment:
>>> aligned = align_from_mapids(source, template, source_map, template_map)

Get coordinates after alignment:
>>> template_O = GetCoords(template, 1)
>>> template_C = GetCoords(template, 0)
>>> source_O = GetCoords(source, 2)
>>> source_C = GetCoords(source, 1)

Verify that the matched atoms have the same coordinates:
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

The function raises an error when no atom maps are found:
>>> bad_map = Chem.MolFromSmiles("CCO")  # No atom maps
>>> align_from_mapids(source, template, bad_map, template_map)
Traceback (most recent call last):
    ...
ValueError: No atom maps found in source molecule
- xenopict.alignment.align_from_mcs(source_mol, template_mol)
Align a molecule to a template molecule using maximum common substructure (MCS).
This function automatically aligns a molecule to a template by finding their maximum common substructure and using it to determine the alignment. It uses both exact and inexact matching to find the best possible alignment, where inexact matching allows for different bond types.
The function modifies the source molecule in place by updating its 2D coordinates to match the template molecule’s orientation.
- Parameters:
source_mol (Mol) – The molecule to align
template_mol (Mol) – The template molecule to align to
- Return type:
Mol
- Returns:
The source molecule, modified in place with new 2D coordinates
- Raises:
ValueError – If no alignment could be found between the molecules
Examples
First, let’s align ethanol to methanol, matching the OH group:
>>> from rdkit import Chem
>>> source = Chem.MolFromSmiles("CCO")
>>> template = Chem.MolFromSmiles("CO")

Perform the alignment:
>>> aligned = align_from_mcs(source, template)

Get the template and source coordinates:
>>> template_O = GetCoords(template, 1)  # Oxygen is at index 1
>>> template_C = GetCoords(template, 0)  # Carbon is at index 0

>>> source_O = GetCoords(source, 2)  # Oxygen is at index 2
>>> source_C = GetCoords(source, 1)  # Matching carbon is at index 1

Verify that the OH group coordinates match between molecules:
>>> abs(source_O[0] - template_O[0]) < 0.1
True
>>> abs(source_O[1] - template_O[1]) < 0.1
True
>>> abs(source_C[0] - template_C[0]) < 0.1
True
>>> abs(source_C[1] - template_C[1]) < 0.1
True

The function raises an error when no alignment is possible:
>>> source = Chem.MolFromSmiles("c1ccccc1")
>>> template = Chem.MolFromSmiles("O")
>>> align_from_mcs(source, template)
Traceback (most recent call last):
    ...
ValueError: No alignment found
- xenopict.alignment.auto_align_molecules(molecules, hints=None)
Automatically align multiple molecules by finding an optimal alignment tree.
This function aligns a list of molecules by constructing a maximum spanning tree of alignments. Each edge in the tree represents an alignment between two molecules, with the edge weight being the alignment score. The tree structure ensures that all molecules are connected through a series of high-quality alignments.
The function can also use “hints” - predefined alignments that should be preferred over automatically discovered ones. This is useful when you want to enforce certain alignments while letting the algorithm figure out the rest.
The algorithm works by:
1. Building a complete graph where nodes are molecules and edges are alignments
2. Finding a maximum spanning tree to get the best set of alignments
3. Using the largest molecule as the root of the tree
4. Applying alignments in breadth-first order from the root
- Parameters:
molecules (list[Mol]) – List of molecules to align
hints (Optional[list[Alignment]]) – Optional list of preferred Alignment objects. These alignments get a very high weight (score + 1000) to ensure they are used when possible.
- Return type:
list[Mol]
- Returns:
The input molecules, modified in place with new 2D coordinates
Examples
Let’s align three molecules - ethanol, methanol, and propanol:
>>> from rdkit import Chem
>>> ethanol = Chem.MolFromSmiles("CCO")
>>> methanol = Chem.MolFromSmiles("CO")
>>> propanol = Chem.MolFromSmiles("CCCO")

Align all molecules:
>>> aligned = auto_align_molecules([ethanol, methanol, propanol])
>>> len(aligned) == 3
True

We can verify the OH groups are aligned by checking coordinates:
>>> # Get OH coordinates for each molecule
>>> eth_O = GetCoords(ethanol, 2)  # O is at index 2
>>> meth_O = GetCoords(methanol, 1)  # O is at index 1
>>> prop_O = GetCoords(propanol, 3)  # O is at index 3

>>> # Check that OH groups are aligned
>>> abs(eth_O[0] - meth_O[0]) < 0.1
True
>>> abs(eth_O[1] - meth_O[1]) < 0.1
True
>>> abs(prop_O[0] - meth_O[0]) < 0.1
True
>>> abs(prop_O[1] - meth_O[1]) < 0.1
True

Now let’s provide a hint that aligns ethanol differently - matching the terminal carbon of ethanol to the oxygen of methanol (an unusual alignment):
>>> # Create a hint with unusual atom pairs
>>> unusual_pairs = [(0, 1)]  # Match ethanol's CH3 to methanol's OH
>>> hint = Alignment.from_aligned_atoms(ethanol, methanol, unusual_pairs)
>>> aligned = auto_align_molecules([ethanol, methanol, propanol], hints=[hint])
>>> len(aligned) == 3
True

Verify that our unusual hint was used - ethanol’s CH3 should be where methanol’s OH was:
>>> # Get coordinates after hint-based alignment
>>> eth_CH3 = GetCoords(ethanol, 0)  # Terminal carbon in ethanol
>>> meth_OH = GetCoords(methanol, 1)  # Oxygen in methanol

>>> # The hint should force these atoms to align
>>> abs(eth_CH3[0] - meth_OH[0]) < 0.1
True
>>> abs(eth_CH3[1] - meth_OH[1]) < 0.1
True
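The spanning-tree procedure described for auto_align_molecules() can be sketched in plain Python. This is not the library's implementation: the function alignment_order is hypothetical, Kruskal's algorithm is one possible way to obtain a maximum spanning tree, and the pairwise scores are invented for illustration. Only the hint bonus of 1000 and the choice of the largest molecule as root come from the description above.

```python
from collections import defaultdict, deque

HINT_BONUS = 1000  # per the description above, hints are weighted as score + 1000


def alignment_order(sizes, scores, hinted=()):
    """Return (child, parent) alignment steps in breadth-first order.

    sizes:  atom count per molecule, indexed by molecule position
    scores: {(i, j): alignment score} for molecule pairs with i < j
    hinted: pairs (i, j) whose score receives the hint bonus
    """
    hinted = {tuple(sorted(pair)) for pair in hinted}
    weighted = sorted(
        ((score + (HINT_BONUS if pair in hinted else 0), pair) for pair, score in scores.items()),
        reverse=True,
    )

    # Kruskal's algorithm over descending weights yields a maximum spanning tree.
    parent = list(range(len(sizes)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = defaultdict(list)
    for _, (i, j) in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree[i].append(j)
            tree[j].append(i)

    # Root the tree at the largest molecule and walk outward breadth-first.
    root = max(range(len(sizes)), key=lambda k: sizes[k])
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in tree[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append((neighbour, node))
                queue.append(neighbour)
    return order


# Molecules: 0 = ethanol (3 atoms), 1 = methanol (2 atoms), 2 = propanol (4 atoms).
sizes = [3, 2, 4]
scores = {(0, 1): 1.5, (0, 2): 2.5, (1, 2): 1.4}  # invented scores

plain = alignment_order(sizes, scores)
hinted = alignment_order(sizes, scores, hinted=[(1, 2)])
print(plain)   # [(0, 2), (1, 0)]
print(hinted)  # [(1, 2), (0, 2)]
```

Without hints, ethanol is aligned to propanol (the root) and methanol is then aligned to ethanol. Hinting the methanol-propanol pair lifts that edge above all others, so methanol is aligned directly to propanol instead.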