Universal Structural Alignment Task

US-align (Universal Structural alignment) is a unified protocol to compare 3D structures of different macromolecules (proteins, RNAs and DNAs) in different forms (monomers, oligomers and heterocomplexes) for both pairwise and multiple structure alignments. The core algorithm of US-align is extended from TM-align and generates optimal structural alignments by maximizing the TM-score of the compared structures through heuristic dynamic programming iterations. Large-scale benchmark tests showed that US-align can generate more accurate structural alignments with significantly reduced CPU time, compared to state-of-the-art methods developed for specific structural alignment tasks. TM-score takes values in (0,1], with 1 indicating an identical structure match; a TM-score ≥0.5 for proteins (or ≥0.45 for RNAs) means the structures share the same global topology.
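
The thresholds above can be turned into a small check. The following is a minimal sketch, not part of this package: the function name `same_global_topology` and the `molecule` labels are hypothetical, and only the cutoffs 0.5 (proteins) and 0.45 (RNAs) come from the text.

```python
# Hypothetical helper: the name and the "protein"/"rna" labels are assumptions.
# Only the cutoffs (0.5 for proteins, 0.45 for RNAs) come from the description above.
TOPOLOGY_THRESHOLDS = {"protein": 0.5, "rna": 0.45}


def same_global_topology(tm_score: float, molecule: str = "protein") -> bool:
    """Return True if the TM-score suggests a shared global topology."""
    if not 0.0 < tm_score <= 1.0:
        raise ValueError(f"TM-score must lie in (0, 1], got {tm_score}")
    return tm_score >= TOPOLOGY_THRESHOLDS[molecule]
```

For example, a TM-score of 0.47 would pass the RNA cutoff but not the protein one.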

More information about US-align can be found on the official US-align website.

protein_metamorphisms_is.operations.structural_alignment_tasks.universal.align_task(alignment_entry, conf)

Executes a protein structure alignment task using the US-align algorithm.

This function aligns a target protein structure against a representative structure using US-align, an advanced algorithm for measuring structural similarity between protein structures. It runs an external US-align binary, captures its output, and extracts key metrics such as RMSD (Root Mean Square Deviation), sequence identity, and alignment scores.

File paths for the representative and target structures are constructed based on their PDB IDs, chain identifiers, and model numbers. The US-align binary is executed with these structures as input, and its output is parsed using regular expressions to extract alignment metrics.
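
The steps described above (build the file paths, run the binary, parse the text output) might look roughly like the sketch below. It is not the package's actual implementation. The file naming scheme in `chain_file`, the helper names, the regular expressions, and the sample output are all assumptions; the patterns are modeled on the text output US-align typically prints, and the sample values are made up for illustration.

```python
import re
import subprocess
from pathlib import Path

# Assumed patterns, modeled on the usual US-align text output.
RMSD_RE = re.compile(r"RMSD=\s*(\d+\.\d+)")
SEQ_ID_RE = re.compile(r"Seq_ID=n_identical/n_aligned=\s*(\d+\.\d+)")
TM_SCORE_RE = re.compile(r"TM-score=\s*(\d+\.\d+)")


def chain_file(base_dir, pdb_id, chains, model):
    """Hypothetical naming scheme: <pdb_id>_<chains>_<model>.pdb."""
    return Path(base_dir) / f"{pdb_id}_{chains}_{model}.pdb"


def parse_usalign_output(text):
    """Extract RMSD, sequence identity and the first reported TM-score."""
    def first_float(pattern):
        match = pattern.search(text)
        return float(match.group(1)) if match else None

    return {
        "us_rms": first_float(RMSD_RE),
        "us_seq_id": first_float(SEQ_ID_RE),
        # Assumption: use the first TM-score line (normalized by Structure_1).
        "us_score": first_float(TM_SCORE_RE),
    }


def run_usalign(binary, rep_file, target_file):
    """Run the external binary on two structure files and parse its stdout."""
    completed = subprocess.run(
        [str(binary), str(rep_file), str(target_file)],
        capture_output=True,
        text=True,
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip() or "US-align failed")
    return parse_usalign_output(completed.stdout)


# Illustrative output only; the numbers are invented.
SAMPLE_OUTPUT = """\
Aligned length=  120, RMSD=   1.85, Seq_ID=n_identical/n_aligned= 0.325
TM-score= 0.81234 (normalized by length of Structure_1: L=130, d0=4.25)
TM-score= 0.79010 (normalized by length of Structure_2: L=135, d0=4.32)
"""

metrics = parse_usalign_output(SAMPLE_OUTPUT)
```

Keeping the parser separate from the subprocess call makes it possible to test the regular expressions against captured output without having the binary installed.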

If execution is successful, a dictionary containing these metrics is returned. If US-align encounters an error or if an exception occurs, the error is logged and an error object with the error message is returned.
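
One way to get this "return a result or an error, never raise" behavior is to wrap the alignment call in a `try`/`except`. The sketch below is an illustration under assumptions: `safe_align` and its `run` callable are hypothetical, and the error object is modeled as a plain dictionary, which the description above does not confirm.

```python
import logging

logger = logging.getLogger(__name__)


def safe_align(queue_entry_id, cluster_entry_id, run):
    """Call run() and return (queue_entry_id, result-or-error dict).

    `run` is a hypothetical zero-argument callable that returns a dict of
    metrics or raises on failure.
    """
    try:
        metrics = run()
        return queue_entry_id, {"cluster_entry_id": cluster_entry_id, **metrics}
    except Exception as exc:
        # Log the failure and report it to the caller instead of raising.
        logger.error("US-align failed for cluster entry %s: %s", cluster_entry_id, exc)
        return queue_entry_id, {
            "cluster_entry_id": cluster_entry_id,
            "error_message": str(exc),
        }
```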

Parameters:
  • alignment_entry (object) – Contains data for the alignment task, including PDB IDs, chain identifiers, model numbers for both representative and target structures, and the cluster entry ID.

  • conf (dict) – Configuration settings, including paths to the directory where PDB chain files are stored and the directory containing the US-align binary.

Returns:

A tuple containing the queue entry ID of the alignment task and either a result dictionary with the keys ‘cluster_entry_id’, ‘us_rms’, ‘us_seq_id’ and ‘us_score’, or an error object with ‘cluster_entry_id’ and ‘error_message’.

Return type:

tuple

Raises:

Exception – Any exception raised during the process is caught and logged, and an error object with the error message is returned instead of the exception being propagated.
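
Because failures are reported through the return value rather than raised, callers need to inspect the second element of each tuple. A minimal sketch of that check follows. It assumes the error object supports the `in` operator like a dictionary, and `split_results` is a hypothetical helper, not part of this module.

```python
def split_results(results):
    """Separate (queue_entry_id, payload) tuples into successes and failures.

    Assumes each payload is dict-like and that failures carry 'error_message'.
    """
    successes, failures = [], []
    for queue_entry_id, payload in results:
        if "error_message" in payload:
            failures.append((queue_entry_id, payload))
        else:
            successes.append((queue_entry_id, payload))
    return successes, failures
```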

Example

>>> alignment_entry = {
...     'rep_pdb_id': '1A2B',
...     'rep_chains': 'A',
...     'rep_model': 0,
...     'pdb_id': '2B3C',
...     'chains': 'B',
...     'model': 0,
...     'cluster_id': 123,
...     'queue_entry_id': 456
... }
>>> conf = {
...     'pdb_chains_path': '/path/to/pdb_chains',
...     'binaries_path': '/path/to/binaries'
... }
>>> align_task(alignment_entry, conf)
(456, {'cluster_entry_id': 123, 'us_rms': 0.5, 'us_seq_id': 0.8, 'us_score': 0.95})