Combinatorial Extension Alignment Task

The Combinatorial Extension (CE) alignment task applies CE, an efficient algorithm for protein structure alignment. CE rapidly identifies alignments between protein structures by extending an initial seed alignment in a combinatorial fashion. The method is particularly effective for comparing protein domains, identifying fold similarities, and analyzing protein structure-function relationships.

The CE algorithm assumes that a good structural alignment can be assembled from short aligned segments, known as aligned fragment pairs (AFPs), identified along the entire length of both protein structures. Extending the alignment one AFP at a time lets CE find alignments that expose both the similarities and the differences between 3D structures, even when the structures vary substantially.
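
The AFP idea can be illustrated with a minimal sketch. It is not the implementation used by this module: it only scores fragment pairs by comparing their internal distances, and it leaves out the combinatorial path extension and final optimization that CE performs. The function names, the fragment size of 8, and the threshold of 1.0 Å are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def intra_distances(coords, start, size):
    """Pairwise distances between the atoms of one fragment."""
    fragment = coords[start:start + size]
    return [math.dist(p, q) for p, q in combinations(fragment, 2)]


def afp_score(coords_a, coords_b, i, j, size=8):
    """Mean absolute difference between the internal distances of two fragments.

    Lower values mean the fragments have more similar local geometry.
    Internal distances do not change under rotation or translation.
    """
    dist_a = intra_distances(coords_a, i, size)
    dist_b = intra_distances(coords_b, j, size)
    return sum(abs(x - y) for x, y in zip(dist_a, dist_b)) / len(dist_a)


def candidate_afps(coords_a, coords_b, size=8, threshold=1.0):
    """All (i, j) start positions whose fragments score below the threshold."""
    return [
        (i, j)
        for i in range(len(coords_a) - size + 1)
        for j in range(len(coords_b) - size + 1)
        if afp_score(coords_a, coords_b, i, j, size) < threshold
    ]
```

Each coordinate list is expected to hold one (x, y, z) tuple per residue, for example the C-alpha positions of a chain. A full CE run would then chain compatible candidates into a path rather than stopping at this list.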

More information about CE-Align can be found at the official website: CE-Align.

protein_metamorphisms_is.operations.structural_alignment_tasks.combinatorial_extension.align_task(alignment_entry, conf)

Performs the alignment task for a given pair of protein structures using the Combinatorial Extension (CE) algorithm. This function is intended for use within a parallel processing system, allowing for efficient structural alignments across multiple protein structures.

The function builds the file paths for the representative and target protein structures from their PDB IDs, chain identifiers, and model numbers, using the directory given in the configuration settings. It parses both structures with Biopython’s MMCIFParser, sets the representative structure as the reference, and aligns the target structure against it. The resulting root-mean-square deviation (RMSD), a measure of alignment quality, is returned together with the cluster entry ID.
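
The steps just described might look roughly like the sketch below. It assumes Biopython's MMCIFParser and CEAligner (from Bio.PDB.cealign) and dictionary-style access to the entry, as in the example at the end of this page. The file naming scheme in chain_path and the helper names are hypothetical, since the actual layout of the chain directory is not documented here.

```python
import os


def chain_path(conf, pdb_id, chains, model):
    """Build the path to one chain file (hypothetical naming scheme)."""
    filename = f"{pdb_id}_{chains}_{model}.cif"
    return os.path.join(conf["pdb_chains_path"], filename)


def ce_rmsd(entry, conf, parser=None, aligner=None):
    """Align the target structure onto the representative and return the RMSD."""
    # Biopython is imported lazily so that a parser or aligner can be injected.
    if parser is None:
        from Bio.PDB import MMCIFParser
        parser = MMCIFParser(QUIET=True)
    if aligner is None:
        from Bio.PDB.cealign import CEAligner
        aligner = CEAligner()

    rep = parser.get_structure(
        entry["rep_pdb_id"],
        chain_path(conf, entry["rep_pdb_id"], entry["rep_chains"], entry["rep_model"]),
    )
    target = parser.get_structure(
        entry["pdb_id"],
        chain_path(conf, entry["pdb_id"], entry["chains"], entry["model"]),
    )

    aligner.set_reference(rep)  # the representative structure is the reference
    aligner.align(target)       # superimposes the target and stores the RMSD
    return aligner.rms
```

Passing the parser and aligner as optional arguments keeps the sketch easy to exercise without structure files on disk; the real task may construct them differently.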

In case of exceptions during the alignment process, an error message is logged, and an error object is returned. This object includes the cluster entry ID and the error message, ensuring traceability and ease of debugging.
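
The error-handling pattern described above can be sketched as follows. The wrapper name run_safely, the compute_rmsd callable, and the error_message key are assumptions made for illustration; the cluster_entry_id and ce_rms keys follow the example at the end of this page.

```python
import logging

logger = logging.getLogger(__name__)


def run_safely(entry, compute_rmsd):
    """Return (queue_entry_id, result) on success or (queue_entry_id, error) on failure."""
    queue_id = entry["queue_entry_id"]
    cluster_id = entry["cluster_id"]
    try:
        rms = compute_rmsd(entry)
        return queue_id, {"cluster_entry_id": cluster_id, "ce_rms": rms}
    except Exception as exc:
        # Log the failure and hand back an error object instead of raising,
        # so one bad structure does not stop the rest of a batch.
        logger.error("CE alignment failed for cluster entry %s: %s", cluster_id, exc)
        return queue_id, {"cluster_entry_id": cluster_id, "error_message": str(exc)}
```

Because both outcomes share the same tuple shape, a caller can tell them apart by checking which key is present in the second element.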

Parameters:
  • alignment_entry (object) – Contains data for the alignment task, including PDB IDs, chain identifiers, model numbers for the representative and target structures, and the cluster entry ID.

  • conf (dict) – Configuration settings, specifically the path to the directory where PDB chain files are stored.

Returns:

A tuple containing the queue entry ID of the alignment task and either the alignment result (a dictionary with the cluster entry ID and the CE RMSD value) or an error object (a dictionary with the cluster entry ID and an error message).

Return type:

tuple

Raises:

Exception – Any exception raised during the alignment process is caught and logged, and an error object with detailed information is returned.

Example

>>> alignment_entry = {
...     'rep_pdb_id': '1A2B',
...     'rep_chains': 'A',
...     'rep_model': 0,
...     'pdb_id': '2B3C',
...     'chains': 'B',
...     'model': 0,
...     'cluster_id': 123,
...     'queue_entry_id': 456
... }
>>> conf = {'pdb_chains_path': '/path/to/pdb_chains'}
>>> align_task(alignment_entry, conf)
(456, {'cluster_entry_id': 123, 'ce_rms': 0.5})
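
Since the function is meant to run inside a parallel processing system, a batch of entries could be dispatched along the lines of the sketch below. The run_batch helper and the stand-in fake_task are hypothetical and are not part of this package. A thread pool is used to keep the sketch simple; for CPU-bound alignments a ProcessPoolExecutor from the same module would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def run_batch(entries, conf, task, executor):
    """Apply task(entry, conf) to every entry and index the results by queue entry ID."""
    results = {}
    for queue_id, payload in executor.map(partial(task, conf=conf), entries):
        results[queue_id] = payload
    return results


def fake_task(alignment_entry, conf):
    """Stand-in with the same call shape as align_task (hypothetical)."""
    result = {"cluster_entry_id": alignment_entry["cluster_id"], "ce_rms": 0.5}
    return alignment_entry["queue_entry_id"], result


entries = [{"queue_entry_id": q, "cluster_id": q * 10} for q in (1, 2, 3)]
conf = {"pdb_chains_path": "/path/to/pdb_chains"}

with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_batch(entries, conf, fake_task, pool)
```

Replacing fake_task with align_task would run real alignments. Each value in results is then either an alignment result or an error object, as described under Returns.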