ORM Model
This section documents the Object-Relational Mapping (ORM) models used in the Protein Data Handler. Each model represents one table in the database.
ER Diagram
Protein
- class protein_metamorphisms_is.sql.model.Protein(**kwargs)
Bases:
Base
Represents a protein, encapsulating its properties and relationships within a database.
This class models a protein entity, encompassing various attributes that describe its characteristics and relationships to other entities. It serves as a comprehensive record for proteins, covering aspects from basic sequence data to more complex annotations and references.
- entry_name
Unique entry name for the protein, serving as the primary key.
- Type:
str
- data_class
Categorization of the protein’s data (e.g., experimental, predicted).
- Type:
str
- molecule_type
Type of the protein molecule (e.g., enzyme, antibody).
- Type:
str
- sequence_length
The length of the amino acid sequence of the protein.
- Type:
int
- sequence
Full amino acid sequence of the protein.
- Type:
str
- accessions
A link to the ‘Accession’ class, listing the accession codes associated with this protein.
- Type:
relationship
- created_date
The date when the protein record was first created.
- Type:
Date
- sequence_update_date
The date when the protein’s sequence was last updated.
- Type:
Date
- annotation_update_date
The date when the protein’s annotation was last updated.
- Type:
Date
- description
A general description or overview of the protein.
- Type:
str
- gene_name
The name of the gene that encodes this protein.
- Type:
str
- organism
The organism from which the protein is derived.
- Type:
str
- organelle
The specific organelle where the protein is localized, if applicable.
- Type:
str
- organism_classification
Taxonomic classification of the organism (e.g., species, genus).
- Type:
str
- taxonomy_id
A unique identifier for the organism in taxonomic databases.
- Type:
str
- host_organism
The host organism for the protein, relevant in cases of viral or symbiotic proteins.
- Type:
str
- host_taxonomy_id
Taxonomy identifier for the host organism, if applicable.
- Type:
str
- comments
Additional remarks or notes about the protein.
- Type:
str
- pdb_references
A link to the ‘PDBReference’ class, providing references to structural data in the PDB.
- Type:
relationship
- go_terms
A connection to the ‘GOTerm’ class, indicating Gene Ontology terms associated with the protein.
- Type:
relationship
- keywords
Descriptive keywords related to the protein, aiding in categorization and search.
- Type:
str
- protein_existence
A numerical code indicating the evidence level for the protein’s existence.
- Type:
int
- seqinfo
Supplementary information about the protein’s sequence.
- Type:
str
- disappeared
Flag indicating whether the protein is obsolete or no longer relevant.
- Type:
Boolean
- created_at
Timestamp of when the record was initially created.
- Type:
DateTime
- updated_at
Timestamp of the most recent update to the record.
- Type:
DateTime
This class is integral to managing and querying detailed protein data, supporting a wide range of bioinformatics and data analysis tasks.
Accession
- class protein_metamorphisms_is.sql.model.Accession(**kwargs)
Bases:
Base
Represents an accession code that identifies a protein in a database, as commonly used in bioinformatics repositories.
This class models an accession record, which is essential for tracking and referencing protein data. Each accession record provides a unique identifier for a protein and is linked to detailed protein information.
The Accession class plays a crucial role in the organization and retrieval of protein data, acting as a key reference point for protein identification and database querying.
- id
A unique identifier for the accession record within the database.
- Type:
int
- accession_code
The accession code associated with a specific protein. This code is typically used as a reference in bioinformatics databases and literature.
- Type:
str
- primary
A flag indicating whether this accession code is the primary identifier for the associated protein. Primary accession codes are generally the most stable and widely used references.
- Type:
Boolean
- protein_entry_name
The entry name of the protein associated with this accession code. This serves as a link to the protein’s detailed record.
- Type:
str
- protein
A SQLAlchemy relationship with the ‘Protein’ class. This relationship provides a direct connection to the protein entity that this accession code represents, allowing for the retrieval of comprehensive protein information.
- Type:
relationship
- disappeared
A flag indicating whether the accession code is obsolete or no longer in use. This is important for maintaining the integrity and relevance of the database.
- Type:
Boolean
- created_at
The date and time when this accession record was first created in the database.
- Type:
DateTime
- updated_at
The date and time when this accession record was last updated, reflecting any changes or updates to the accession information.
- Type:
DateTime
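The one-to-many link between Protein and Accession described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the project's SQLAlchemy code: the table definitions, the reduced column set, the helper primary_accession, and the sample rows (EXAMPLE_HUMAN, X00001, X00002) are all assumptions chosen to mirror the documented attributes.

```python
import sqlite3

# In-memory database. The DDL is an assumption that mirrors a few of the
# documented attributes; it is not the project's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE protein (
        entry_name      TEXT PRIMARY KEY,
        sequence        TEXT,
        sequence_length INTEGER
    );
    CREATE TABLE accession (
        id                 INTEGER PRIMARY KEY,
        accession_code     TEXT NOT NULL,
        "primary"          INTEGER NOT NULL DEFAULT 0,
        protein_entry_name TEXT NOT NULL REFERENCES protein(entry_name)
    );
    """
)

# Hypothetical sample data: one protein with two accession codes.
conn.execute(
    "INSERT INTO protein VALUES (?, ?, ?)", ("EXAMPLE_HUMAN", "MKTAYIAK", 8)
)
conn.executemany(
    'INSERT INTO accession (accession_code, "primary", protein_entry_name) '
    "VALUES (?, ?, ?)",
    [("X00001", 1, "EXAMPLE_HUMAN"), ("X00002", 0, "EXAMPLE_HUMAN")],
)


def primary_accession(conn, entry_name):
    """Return the primary accession code for a protein, or None if absent."""
    row = conn.execute(
        "SELECT accession_code FROM accession "
        'WHERE protein_entry_name = ? AND "primary" = 1',
        (entry_name,),
    ).fetchone()
    return row[0] if row else None
```

Calling primary_accession(conn, "EXAMPLE_HUMAN") looks up the code flagged as primary. The column name primary is quoted because it is a reserved word in SQL.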
PDBReference
- class protein_metamorphisms_is.sql.model.PDBReference(**kwargs)
Bases:
Base
Represents a reference to a structure in the Protein Data Bank (PDB).
This class is pivotal for storing and managing details about protein structures as cataloged in the PDB. It forms a bridge between PDB structures and UniProt entries, enabling comprehensive tracking and analysis of protein structures and their corresponding sequences.
The PDBReference class serves as a critical component for integrating structural data with protein sequence and functional information, thereby enriching the understanding of protein structures.
- id
A unique identifier for the PDB reference within the database. This serves as the primary key.
- Type:
int
- pdb_id
The unique identifier of the protein structure in PDB, typically a 4-character alphanumeric code.
- Type:
str
- protein_entry_name
The entry name of the associated protein in UniProt. This helps link the structure to its corresponding protein sequence and other relevant data in UniProt.
- Type:
str
- protein
A SQLAlchemy relationship to the ‘Protein’ class, establishing a connection to the UniProt entry corresponding to this PDB structure.
- Type:
relationship
- method
The method used for determining the protein structure, such as X-ray crystallography or NMR spectroscopy.
- Type:
str
- resolution
The resolution of the protein structure, measured in Ångströms (Å). A lower number indicates higher resolution.
- Type:
Float
- uniprot_chains
A relationship to the ‘UniprotChains’ class, detailing the individual protein chains in the structure as defined in UniProt.
- Type:
relationship
- pdb_chains
A relationship to the ‘PDBChains’ class, describing the chains in the protein structure as recorded in PDB.
- Type:
relationship
- created_at
The timestamp indicating when the PDB reference record was initially created in the database.
- Type:
DateTime
- updated_at
The timestamp of the most recent update to the PDB reference record. This field is automatically updated on each record modification.
- Type:
DateTime
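Because a lower resolution value means a better-resolved structure, selecting structures by quality amounts to filtering and sorting in ascending order. The sketch below illustrates this with a plain dataclass rather than the ORM class. The names PDBRef and best_resolved, the placeholder identifiers, and the choice to represent a missing resolution as None are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PDBRef:
    """Stand-in for a few PDBReference attributes (hypothetical, not the ORM class)."""
    pdb_id: str
    method: str
    resolution: Optional[float]  # in Angstroms; None when no value is recorded


def best_resolved(refs: List[PDBRef], cutoff: float) -> List[PDBRef]:
    """Keep references with resolution <= cutoff, best (smallest) first."""
    kept = [r for r in refs if r.resolution is not None and r.resolution <= cutoff]
    return sorted(kept, key=lambda r: r.resolution)


# Placeholder records for illustration only.
refs = [
    PDBRef("1ABC", "X-ray", 2.1),
    PDBRef("2XYZ", "X-ray", 1.4),
    PDBRef("3DEF", "NMR", None),
    PDBRef("4GHI", "X-ray", 3.5),
]
selected = [r.pdb_id for r in best_resolved(refs, cutoff=2.5)]
```

With these placeholder values, selected holds the two identifiers at or below the 2.5 Å cutoff, ordered from the smallest resolution value upward.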
PDBChains
- class protein_metamorphisms_is.sql.model.PDBChains(**kwargs)
Bases:
Base
Represents an individual polypeptide chain within a protein structure as cataloged in the Protein Data Bank (PDB).
The PDBChains class is instrumental in representing each distinct polypeptide chain encountered in protein structures from the PDB. This class enables detailed tracking and management of these chains, facilitating analyses and queries at the chain level. By associating each chain with its parent protein structure, the class enhances the database’s ability to model complex protein structures.
- id
A unique identifier for each polypeptide chain within the database, serving as the primary key.
- Type:
int
- chains
The specific identifier of the chain as referenced in the protein structure within PDB. This attribute, combined with ‘pdb_reference_id’, constitutes part of the composite primary key.
- Type:
String
- sequence
The complete amino acid sequence of the chain. Storing this mandatory attribute allows for in-depth analyses of the chain’s molecular structure.
- Type:
String
- pdb_reference_id
A foreign key linking to the unique identifier of the parent protein structure in the PDB. This attribute forms the other part of the composite primary key and establishes a direct relationship with the PDBReference entity.
- Type:
Integer
- model
An identifier for the model of the chain, particularly important for structures like NMR that may encompass multiple models.
- Type:
Integer
- pdb_reference
A SQLAlchemy relationship that connects to the PDBReference entity. This relationship provides access to comprehensive details about the entire protein structure of which this chain is a part.
- Type:
relationship
The composite primary key, comprising chains and pdb_reference_id, ensures that each instance of PDBChains is uniquely tied to a specific structure in the PDB. This key structure is critical for precise data retrieval and efficient management of the database’s structural data.
Cluster
- class protein_metamorphisms_is.sql.model.Cluster(**kwargs)
Bases:
Base
Represents a cluster of protein chains, where each cluster is formed by chains with significant similarity, determined using the cd-hit tool.
This class is instrumental in grouping protein chains that are highly similar to each other, aiding in the identification of common structures and functions.
- id
Unique identifier for each cluster.
- Type:
int
- pdb_chain_id
Foreign key referencing the ‘PDBChains’ entity. It is used to identify the specific protein chain in the PDB database associated with this cluster.
- Type:
int
- cluster_id
Identifier of the cluster, a number that designates this specific group of protein chains.
- Type:
int
- is_representative
Indicates whether this chain is the representative of its cluster of similar chains. ‘True’ for yes, ‘False’ for no.
- Type:
Boolean
- sequence_length
Average length of the sequences of the chains in the cluster.
- Type:
int
- identity
Value representing the average sequence identity within the cluster, usually a percentage indicating how similar the chains are within the group.
- Type:
Float
The relationship with ‘PDBChains’ allows each cluster to be connected to its specific chain in the PDB database, providing a direct link to detailed structural information.
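As an illustration of where values such as cluster_id, is_representative, sequence_length, and identity can come from, the sketch below parses text in the layout commonly written by cd-hit to its .clstr file. The function parse_clstr, the sample text, and the mapping of parsed values onto the Cluster attributes are assumptions and do not reproduce the project's loader. The member marked with an asterisk is treated as the cluster representative, and its identity is left as None because the file reports no percentage for it.

```python
import re

# One member line looks like: "1<TAB>118aa, >2XYZ_B... at 97.46%"
# or, for the representative: "0<TAB>120aa, >1ABC_A... *"
MEMBER = re.compile(
    r"^\d+\s+(\d+)aa,\s+>(.+?)\.\.\.\s+(?:(\*)|at\s+([\d.]+)%)\s*$"
)


def parse_clstr(text):
    """Turn .clstr-style text into a list of dicts, one per cluster member."""
    rows = []
    cluster_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            cluster_id = int(line.split()[1])
            continue
        match = MEMBER.match(line)
        if match is None or cluster_id is None:
            raise ValueError(f"unrecognized line: {line!r}")
        length, name, star, identity = match.groups()
        rows.append(
            {
                "chain_name": name,
                "cluster_id": cluster_id,
                "is_representative": star is not None,
                "sequence_length": int(length),
                "identity": None if star else float(identity),
            }
        )
    return rows


# Hypothetical sample in the assumed layout.
sample = (
    ">Cluster 0\n"
    "0\t120aa, >1ABC_A... *\n"
    "1\t118aa, >2XYZ_B... at 97.46%\n"
    ">Cluster 1\n"
    "0\t95aa, >3DEF_A... *\n"
)
rows = parse_clstr(sample)
```

Each resulting dictionary corresponds to one chain, so a cluster with several members yields several rows that share the same cluster_id.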
StructuralComplexityLevel
- class protein_metamorphisms_is.sql.model.StructuralComplexityLevel(**kwargs)
Bases:
Base
Captures the hierarchy of structural forms within proteins, ranging from whole proteins down to segments of chains defined by their secondary structure.
This class provides a foundational abstraction for handling proteins at various levels of structural complexity within the development environment. It allows for the execution of operations across different complexity levels, enabling a more flexible and nuanced approach to protein data manipulation and analysis. By defining distinct levels of structural complexity, it supports targeted queries and operations, enhancing the efficiency and precision of bioinformatics workflows.
- id
Unique identifier for each complexity level.
- Type:
Integer
- name
Descriptive name of the complexity level.
- Type:
String
- description
More detailed information about the complexity level.
- Type:
String, optional
StructuralAlignmentType
- class protein_metamorphisms_is.sql.model.StructuralAlignmentType(**kwargs)
Bases:
Base
Provides a framework for aligning protein structures, crucial for understanding the functional and evolutionary relationships between proteins. This class enables the use of various alignment strategies, supporting a comprehensive approach to protein comparison.
Structural alignment methods integrated within this framework include:
CE-align: Identifies optimal alignments based on the Combinatorial Extension method, focusing on similar backbone arrangements.
US-align: Aligns structures by optimizing the TM-score, reporting structural similarity together with RMSD and sequence identity.
FATCAT: Capable of accommodating protein flexibility during alignment, allowing for the detection of functionally important variations.
By incorporating these methodologies, the class facilitates diverse approaches to protein comparison. This enables researchers to gain deeper insights into protein functionality and evolution, highlighting the significance of structural alignment in the field of bioinformatics.
- id
Unique identifier for each alignment type.
- Type:
Integer
- name
Name of the alignment type.
- Type:
String
- description
Detailed description of the alignment method.
- Type:
String
- task_name
Name of the specific task or process associated with this alignment type.
- Type:
String
StructuralAlignmentQueue
- class protein_metamorphisms_is.sql.model.StructuralAlignmentQueue(**kwargs)
Bases:
Base
Manages a queue of pending structural alignment tasks, overseeing their execution and monitoring.
- id
Unique identifier for each queue entry.
- Type:
Integer
- cluster_entry_id
Reference to the protein chain cluster being aligned.
- Type:
Integer, ForeignKey
- alignment_type_id
ID of the structural alignment type to be applied.
- Type:
Integer, ForeignKey
- state
Current state of the task (e.g., pending, processing, completed, error).
- Type:
Integer
- retry_count
Number of retries attempted for the task.
- Type:
Integer
- error_message
Error message if the task fails.
- Type:
String, optional
- created_at
Timestamp when the queue entry was created.
- Type:
DateTime
- updated_at
Timestamp when the queue entry was last updated.
- Type:
DateTime
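The state, retry_count, and error_message attributes suggest a simple retry cycle, sketched below in plain Python. The integer codes in QueueState, the MAX_RETRIES limit, and the record_failure helper are hypothetical: the documentation states only that state is an integer and gives example state names.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class QueueState(IntEnum):
    """Assumed integer codes for the example state names."""
    PENDING = 0
    PROCESSING = 1
    COMPLETED = 2
    ERROR = 3


MAX_RETRIES = 3  # hypothetical limit, not taken from the project


@dataclass
class QueueEntry:
    """Stand-in for a few StructuralAlignmentQueue attributes."""
    cluster_entry_id: int
    alignment_type_id: int
    state: int = QueueState.PENDING
    retry_count: int = 0
    error_message: Optional[str] = None


def record_failure(entry: QueueEntry, message: str) -> QueueEntry:
    """Count a failed attempt; requeue it until the retry limit is reached."""
    entry.retry_count += 1
    entry.error_message = message
    if entry.retry_count < MAX_RETRIES:
        entry.state = QueueState.PENDING
    else:
        entry.state = QueueState.ERROR
    return entry
```

Storing the state as an IntEnum keeps the column an integer, as documented, while letting code refer to states by name.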
StructuralAlignmentResults
- class protein_metamorphisms_is.sql.model.StructuralAlignmentResults(**kwargs)
Bases:
Base
Stores results from structural alignment tasks, providing detailed metrics and scores.
- id
Unique identifier for each set of results.
- Type:
Integer
- cluster_entry_id
Reference to the cluster of protein chains analyzed.
- Type:
Integer, ForeignKey
- ce_rms
Root mean square deviation calculated by CE method.
- Type:
Float
- tm_rms
Root mean square deviation calculated by US-align.
- Type:
Float
- tm_seq_id
Sequence identity calculated by US-align.
- Type:
Float
- tm_score_chain_1
TM-score for the first chain in the US-align alignment.
- Type:
Float
- tm_score_chain_2
TM-score for the second chain in the US-align alignment.
- Type:
Float
- fc_rms
Root mean square deviation calculated by FATCAT.
- Type:
Float
- fc_identity
Sequence identity calculated by FATCAT.
- Type:
Float
- fc_similarity
Similarity score calculated by FATCAT.
- Type:
Float
- fc_score
Overall score calculated by FATCAT.
- Type:
Float
- fc_align_len
Length of the alignment calculated by FATCAT.
- Type:
Float
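Two TM-score fields exist because the score is normalized by chain length, so each chain of a pair yields its own value. The sketch below shows one possible way to interpret them. The AlignmentScores container, the likely_same_fold helper, and the 0.5 cutoff (a commonly cited rule of thumb for a shared fold) are assumptions for illustration and are not part of the model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlignmentScores:
    """Stand-in for the two TM-score attributes (hypothetical container)."""
    tm_score_chain_1: Optional[float] = None
    tm_score_chain_2: Optional[float] = None


SAME_FOLD_CUTOFF = 0.5  # rule-of-thumb value, an assumption for this sketch


def likely_same_fold(scores: AlignmentScores,
                     cutoff: float = SAME_FOLD_CUTOFF) -> Optional[bool]:
    """Compare the larger available TM-score to the cutoff.

    Returns None when neither score has been recorded.
    """
    values = [
        s for s in (scores.tm_score_chain_1, scores.tm_score_chain_2)
        if s is not None
    ]
    if not values:
        return None
    return max(values) >= cutoff
```

Taking the larger of the two scores is a lenient choice; a stricter analysis could require both scores, or the smaller one, to pass the cutoff.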
GOTerm
- class protein_metamorphisms_is.sql.model.GOTerm(**kwargs)
Bases:
Base
Represents a Gene Ontology (GO) term associated with a protein.
This class is used to store and manage information about the functional annotation of proteins as defined by the Gene Ontology Consortium. Each GO term provides a standardized description of a protein’s molecular function, biological process, or cellular component.
- id
Unique identifier for the GO term within the database.
- Type:
int
- go_id
Unique identifier of the GO term in the Gene Ontology system.
- Type:
str
- protein_entry_name
Entry name of the associated protein in UniProt.
- Type:
str
- protein
Relationship with the ‘Protein’ class, linking the GO term to its corresponding protein.
- Type:
relationship
- category
Category of the GO term, indicating whether it describes a molecular function, biological process, or cellular component.
- Type:
str
- description
Detailed description of the GO term, explaining the function, process, or component it represents.
- Type:
str
The relationship with the ‘Protein’ class allows for the association of functional, process, or component annotations with specific proteins.
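A typical use of the category attribute is to split a protein's annotations into the three Gene Ontology aspects. The sketch below does this over plain dictionaries. The function group_by_category, the dictionary layout, and the category strings are assumptions, since the documentation does not specify how categories are encoded.

```python
from collections import defaultdict


def group_by_category(go_terms):
    """Group GO identifiers by their category, preserving input order."""
    grouped = defaultdict(list)
    for term in go_terms:
        grouped[term["category"]].append(term["go_id"])
    return dict(grouped)


# Illustrative annotations for one hypothetical protein.
terms = [
    {"go_id": "GO:0005524", "category": "molecular_function"},
    {"go_id": "GO:0006468", "category": "biological_process"},
    {"go_id": "GO:0005737", "category": "cellular_component"},
    {"go_id": "GO:0004672", "category": "molecular_function"},
]
by_category = group_by_category(terms)
```

The result maps each category string to the list of go_id values recorded under it.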