analysis Module
The analysis module provides high-level tools for evaluating structural quality, comparing protein models, and analyzing conformational ensembles.
Overview
After generating synthetic structures or ensembles, it is often necessary to quantify their precision and accuracy. This module provides a suite of geometric analyzers that handle alignment, RMSD calculation, and residue-level strain analysis.
Key Features
- Structure Comparison: Optimal superposition of PDB models using the Kabsch algorithm.
- Ensemble Analysis: Identification of the "medoid" structure (most representative) and calculation of ensemble-averaged RMSD.
- Geometric Strain: Identification of localized structural distortions, such as non-trans peptide bonds.
API Reference
analysis
Classes
GeometryAnalyzer
High-level analysis suite for protein geometry and ensembles.
Source code in synth_pdb/analysis.py
Functions
compare_pdbs(pdb_path1, pdb_path2, ca_only=True)
staticmethod
Compare two PDB files and calculate RMSD and optimal transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pdb_path1 | str | Path to the first PDB file (mobile). | required |
| pdb_path2 | str | Path to the second PDB file (reference). | required |
| ca_only | bool | If True, only uses C-alpha atoms for alignment and RMSD. | True |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary containing 'rmsd', 'rotation', and 'translation'. |
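To illustrate what restricting a comparison to C-alpha atoms involves, here is a minimal sketch that pulls CA coordinates out of PDB-format lines using the format's fixed column layout. The helper name `read_ca_coords` and the sample records are hypothetical and are not part of synth_pdb; the library's own parsing may differ.

```python
import numpy as np


def read_ca_coords(pdb_lines):
    """Collect C-alpha coordinates from an iterable of PDB-format lines."""
    coords = []
    for line in pdb_lines:
        # Fixed-column PDB layout: atom name in columns 13-16, x/y/z in columns 31-54.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append(
                [float(line[30:38]), float(line[38:46]), float(line[46:54])]
            )
    # reshape keeps a consistent (N, 3) shape even when no CA atom is found.
    return np.array(coords, dtype=float).reshape(-1, 3)


# Build two sample records with fixed-width formatting so the columns line up.
template = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
sample = [
    template % (1, " N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    template % (2, " CA", "ALA", "A", 1, 11.639, 6.071, -5.147),
]
ca_coords = read_ca_coords(sample)  # only the CA record is kept
```

Using one atom per residue keeps the two coordinate sets the same length when both models share a sequence, which is what a pairwise superposition needs.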
analyze_ensemble_pdbs(pdb_paths)
staticmethod
Analyze a list of PDB files as an NMR-style ensemble.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pdb_paths | list[str] | List of file paths to PDB files. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] | Dictionary with 'avg_rmsd', 'medoid_path', and 'medoid_index'. |
calculate_residue_strain(pdb_path)
staticmethod
Calculates 'geometric strain' per residue.
Currently defined as the deviation of the peptide bond omega angle from trans (180 deg).
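As a rough illustration of this definition, the sketch below computes the omega dihedral from the four backbone atoms that define it (CA and C of residue i, N and CA of residue i+1) and reports how far it lies from 180°. The function names are hypothetical, and returning an unsigned deviation is an assumption; the actual implementation in synth_pdb may differ.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def omega_deviation(ca_i, c_i, n_next, ca_next):
    """Unsigned deviation of omega from the trans value (180 degrees)."""
    omega = dihedral_deg(ca_i, c_i, n_next, ca_next)
    # |omega| lies in [0, 180], so the result is 0 for trans and 180 for cis.
    return 180.0 - abs(omega)
```

With this convention a perfectly trans peptide bond gives 0 and a cis peptide bond gives 180.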
Scientific Principles
Kabsch Superposition
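To compare two structures with \(N\) matched atoms, we must find the rotation matrix \(R\) and translation vector \(\mathbf{t}\) that minimize the Root Mean Square Deviation (RMSD):
\[
\mathrm{RMSD} = \min_{R,\,\mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert R\,\mathbf{x}_i + \mathbf{t} - \mathbf{y}_i \right\rVert^{2}}
\]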
The GeometryAnalyzer uses the Kabsch algorithm (via SVD) to find the optimal \(R\) that aligns the mobile structure \(\mathbf{x}\) to the reference structure \(\mathbf{y}\).
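The SVD-based superposition described above can be sketched in a few lines of NumPy. This is an illustrative version under stated assumptions, not the library's source: the function name `kabsch_superpose` is hypothetical, both inputs are assumed to be (N, 3) arrays in matching atom order, and the returned rotation is meant to be applied as `mobile @ rotation.T + translation`. The 'rotation' and 'translation' values returned by compare_pdbs may follow a different convention.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Return (rmsd, rotation, translation) aligning mobile onto reference."""
    mobile = np.asarray(mobile, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Remove the translational component by centering both point sets.
    mobile_centroid = mobile.mean(axis=0)
    reference_centroid = reference.mean(axis=0)
    x = mobile - mobile_centroid
    y = reference - reference_centroid

    # SVD of the 3x3 covariance matrix yields the optimal orthogonal transform.
    covariance = x.T @ y
    u, _, vt = np.linalg.svd(covariance)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T

    translation = reference_centroid - rotation @ mobile_centroid
    aligned = mobile @ rotation.T + translation
    rmsd = float(np.sqrt(np.mean(np.sum((aligned - reference) ** 2, axis=1))))
    return rmsd, rotation, translation
```

The determinant check matters for protein coordinates: without it, the best-fitting orthogonal matrix can be a reflection, which would superimpose a structure onto its mirror image.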
Ensemble Medoid
For an ensemble of \(M\) NMR structures or MD frames, the medoid is the structure \(k\) that has the minimum average RMSD to all other structures in the set:
\[
k^{*} = \arg\min_{k} \frac{1}{M-1} \sum_{j \neq k} \mathrm{RMSD}(k, j)
\]
The medoid is a more robust representative of the ensemble than a simple average structure, which may have physically impossible bond lengths or angles.
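A minimal sketch of medoid selection, assuming the pairwise RMSD values have already been collected into a symmetric matrix with a zero diagonal. The function name `find_medoid` and the toy values are hypothetical; how analyze_ensemble_pdbs computes 'avg_rmsd' internally is not shown here.

```python
import numpy as np


def find_medoid(rmsd_matrix):
    """Return (medoid_index, mean_rmsd_to_others) for a symmetric (M, M) matrix."""
    d = np.asarray(rmsd_matrix, dtype=float)
    m = d.shape[0]
    if m < 2:
        raise ValueError("need at least two structures")
    # The diagonal is zero, so each row sum covers only the other M - 1 structures.
    mean_to_others = d.sum(axis=1) / (m - 1)
    k = int(np.argmin(mean_to_others))
    return k, float(mean_to_others[k])


# Toy pairwise RMSD values (in Å) for three hypothetical models.
pairwise = np.array([
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.5],
    [3.0, 1.5, 0.0],
])
medoid_index, medoid_mean = find_medoid(pairwise)  # index 1, mean 1.25 Å
```

Because the medoid is always one of the input structures, it keeps the bond geometry of a real model.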
Usage Example
from synth_pdb.analysis import GeometryAnalyzer
# 1. Compare two structures (e.g., predicted vs. ground truth)
results = GeometryAnalyzer.compare_pdbs(
    "model.pdb",
    "reference.pdb",
    ca_only=True
)
print(f"RMSD: {results['rmsd']:.2f} Å")
# 2. Analyze an NMR ensemble
ensemble_files = ["frame1.pdb", "frame2.pdb", "frame3.pdb"]
stats = GeometryAnalyzer.analyze_ensemble_pdbs(ensemble_files)
print(f"Ensemble Precision: {stats['avg_rmsd']:.2f} Å")
print(f"Most representative model: {stats['medoid_path']}")
# 3. Check for geometric strain
strain = GeometryAnalyzer.calculate_residue_strain("model.pdb")
for res_id, dev in strain.items():
    if dev > 20:  # Large deviation from trans
        print(f"Warning: High omega strain at residue {res_id}: {dev:.1f}°")