ensemble Module
The synth_pdb.ensemble subpackage provides tools for quantitative analysis of NMR structure bundles — the multi-model ensembles produced by NMR structure calculation software (CYANA, XPLOR-NIH, CNS) that represent the conformational space consistent with experimental restraints.
[!NOTE] These tools mirror the analysis performed by PDBStat (Tejero et al. 2013) and are validated against the published NMR benchmarks in the
tests/test_scientific_validation.pysuite.
Background
An NMR structure ensemble is not a single model but a bundle of conformers — typically 10–40 structures — each consistent with the experimental data (NOE distances, dihedral restraints, RDCs). The spread of the bundle is the key indicator of precision:
- Tight bundle → well-determined, rigid region
- Loose bundle → flexible, disordered, or poorly-restrained region
Two complementary metrics characterise ensemble quality:
| Metric | What it measures |
|---|---|
| RMSD | Pairwise coordinate deviation (Å) — precision of atomic positions |
| DAOP | Dihedral Angle Order Parameter — backbone angular consistency (0–1) |
DAOPCalculator
DAOPCalculator
Calculator for Dihedral Angle Order Parameters (DAOP).
The DAOP quantifies the consistency of a dihedral angle across an NMR ensemble. Values range from 0 (completely disordered) to 1 (perfectly ordered). Typical NMR convention: S >= 0.9 means the residue is well-defined (corresponds to <~ +/-24deg circular standard deviation).
All methods are static - no instantiation required.
Examples:
>>> # Perfectly ordered angles (all -60deg)
>>> angles = np.full(20, -np.pi / 3)
>>> DAOPCalculator.calculate_order_parameter(angles)
1.0
>>> # Find which residues are well-defined across an ensemble
>>> phi = np.random.normal(-np.pi / 3, np.pi / 36, (50, 20))
>>> psi = np.random.normal( np.pi / 3, np.pi / 36, (50, 20))
>>> mask = DAOPCalculator.find_well_defined_residues(phi, psi)
>>> mask.all()
True
Functions
find_well_defined_residues(phi_angles, psi_angles, threshold=1.8)
staticmethod
Identify well-defined residues using the PDBStat S(phi) + S(psi) criterion.
A residue is considered well-defined when:
S(phi_i) + S(psi_i) >= threshold
The default threshold of 1.8 follows the PDBStat convention and corresponds to both angles having S >= 0.9, i.e. a circular standard deviation of <~ +/-24deg for each.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
phi_angles
|
NDArray[float64]
|
Array of shape (n_residues, n_models) with phi angles in radians. |
required |
psi_angles
|
NDArray[float64]
|
Array of shape (n_residues, n_models) with psi angles in radians. |
required |
threshold
|
float
|
Sum-of-order-parameters cutoff. Default 1.8 (PDBStat convention). |
1.8
|
Returns:
| Type | Description |
|---|---|
NDArray[bool_]
|
Boolean array of shape (n_residues,). |
NDArray[bool_]
|
well-defined residues. |
Examples:
What is DAOP?
The Dihedral Angle Order Parameter (Hyberts et al. 1992) measures the circular variance of backbone dihedral angles (φ, ψ) across all models in an ensemble:
- S = 1.0 → all models have identical dihedral angles (perfectly rigid)
- S = 0.0 → angles are uniformly distributed (completely disordered)
Well-Defined Residues (PDBStat Convention)
A residue is considered well-defined if it satisfies:
This is the threshold used by PDBStat and the BMRB deposition system. Well-defined residues form the core of the folded structure and are used for global RMSD calculations.
Usage
from synth_pdb.ensemble import DAOPCalculator
# coords_list: List[np.ndarray] — one (N_atoms, 3) array per model
# residue_ids: List[int] — residue numbers matching the atom arrays
calc = DAOPCalculator(coords_list, residue_ids)
daop_per_residue = calc.calculate()
# Returns: Dict[int, Dict[str, float]]
# {residue_id: {"phi": S_phi, "psi": S_psi, "combined": S_phi + S_psi}}
well_defined = calc.find_well_defined_residues(threshold=1.8)
# Returns: List[int] — residue IDs where S(φ)+S(ψ) ≥ threshold
print(f"Well-defined residues: {well_defined}")
EnsembleStatistics
EnsembleStatistics
dataclass
Comprehensive, typed container for NMR ensemble quality statistics.
Replaces the plain Dict[str, float] pattern with a type-safe
dataclass that adds computed properties, to_dict/from_dict
round-trips, and a human-readable __str__.
Attributes:
| Name | Type | Description |
|---|---|---|
n_models |
int
|
Number of models in the ensemble. |
n_residues |
int
|
Number of residues per model. |
mean_pairwise_rmsd |
float
|
Mean of all pairwise RMSDs (A). |
median_pairwise_rmsd |
float
|
Median of all pairwise RMSDs (A). |
std_pairwise_rmsd |
float
|
Standard deviation of pairwise RMSDs (A). |
min_pairwise_rmsd |
float
|
Minimum pairwise RMSD (A). |
max_pairwise_rmsd |
float
|
Maximum pairwise RMSD (A). |
medoid_index |
int
|
0-based index of the medoid model. |
medoid_mean_rmsd |
float
|
Mean RMSD of the medoid to all other models (A). |
rmsd_to_mean |
float
|
Mean RMSD-to-ensemble-mean (A). Primary precision indicator; thresholds from Tejero et al. 2013. |
mean_rmsf |
float
|
Mean per-residue RMSF (A). |
max_rmsf |
float
|
Maximum per-residue RMSF (A). |
well_defined_residues |
int
|
Count of residues with RMSF < 1.0 A. |
pct_well_defined |
float
|
Percentage of well-defined residues (0-100). |
Examples:
>>> from synth_pdb.ensemble.statistics import EnsembleStatistics
>>> stats = EnsembleStatistics(
... n_models=20, n_residues=91,
... mean_pairwise_rmsd=0.85, rmsd_to_mean=0.72,
... pct_well_defined=88.0,
... )
>>> stats.precision
'HIGH'
>>> stats.overall_quality
'Excellent quality NMR ensemble'
>>> stats.is_ensemble
True
Attributes
is_single_model
property
Return True if the ensemble contains exactly one model.
is_ensemble
property
Return True if the ensemble contains more than one model.
precision
property
Precision tier based on RMSD-to-mean (Tejero et al. 2013).
Returns:
| Type | Description |
|---|---|
str
|
|
str
|
|
str
|
|
overall_quality
property
Human-readable overall quality assessment.
Combines :attr:precision with the percentage of well-defined
residues to produce a single summary string.
Returns:
| Type | Description |
|---|---|
str
|
One of four quality strings ranging from |
str
|
|
str
|
|
Functions
to_dict()
Serialise all fields to a plain dictionary.
Useful for JSON export, backward compatibility with code that
previously consumed Dict[str, float] from ensemble analysers,
and for persistence.
Returns:
| Type | Description |
|---|---|
dict[str, Any]
|
Dictionary containing every field of this dataclass. |
Examples:
from_dict(data)
classmethod
Construct an instance from a plain dictionary.
Extra keys in data are silently ignored, making this robust
to version skew when deserialising persisted results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
dict[str, Any]
|
Dictionary produced by :meth: |
required |
Returns:
| Name | Type | Description |
|---|---|---|
New |
EnsembleStatistics
|
class: |
Raises:
| Type | Description |
|---|---|
KeyError
|
If |
Examples:
Usage
from synth_pdb.ensemble import EnsembleStatistics
stats = EnsembleStatistics(coords_list, residue_ids=residue_ids)
# Pairwise RMSD matrix (Å)
rmsd_matrix = stats.pairwise_rmsd # np.ndarray (N_models × N_models)
print(f"Mean pairwise RMSD: {rmsd_matrix.mean():.2f} Å")
# RMSF per residue (Å) — after Kabsch alignment
rmsf = stats.rmsf # np.ndarray (N_residues,)
# Representative structure (medoid = lowest mean RMSD to all others)
medoid_idx = stats.medoid_index
print(f"Representative model: #{medoid_idx + 1}")
# Well-defined residues
print(f"Well-defined: {stats.well_defined_residues}")
# Overall quality assessment
qa = stats.quality_assessment # QualityAssessment dataclass
print(f"Quality: {qa.overall_quality}")
QualityAssessment
QualityAssessment
dataclass
Interpretive quality assessment for an NMR ensemble.
A lightweight companion to :class:EnsembleStatistics that carries the
human-readable precision tier and overall quality string.
Attributes:
| Name | Type | Description |
|---|---|---|
precision |
str
|
Precision tier - |
overall |
str
|
Free-text overall quality description. |
Examples:
>>> from synth_pdb.ensemble.statistics import QualityAssessment
>>> qa = QualityAssessment(precision="HIGH",
... overall="Excellent quality NMR ensemble")
>>> qa.to_dict()
{'precision': 'HIGH', 'overall': 'Excellent quality NMR ensemble'}
Functions
to_dict()
Serialise to a plain dictionary.
Returns:
| Type | Description |
|---|---|
dict[str, str]
|
|
from_dict(data)
classmethod
Construct from a plain dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
dict[str, str]
|
Must contain |
required |
Returns:
| Name | Type | Description |
|---|---|---|
New |
QualityAssessment
|
class: |
Raises:
| Type | Description |
|---|---|
KeyError
|
If required keys are absent. |
Quality Thresholds (Tejero et al. 2013)
| Field | Excellent | Acceptable | Poor |
|---|---|---|---|
mean_pairwise_rmsd_backbone |
< 0.5 Å | 0.5–1.5 Å | > 1.5 Å |
mean_pairwise_rmsd_heavy |
< 1.0 Å | 1.0–2.5 Å | > 2.5 Å |
fraction_well_defined |
> 0.85 | 0.60–0.85 | < 0.60 |
overall_quality |
"excellent" |
"acceptable" |
"poor" |
Complete Example: Evaluating a Synthetic NMR Ensemble
import numpy as np
from synth_pdb import generate_structure
from synth_pdb.ensemble import DAOPCalculator, EnsembleStatistics
from synth_pdb.geometry.superposition import find_medoid
# --- 1. Generate a 20-model synthetic ensemble ---
sequence = "LKELEKELEKELEKELEKELEKEL"
models = []
for _ in range(20):
pdb_text = generate_structure(sequence=sequence, conformation="alpha")
# Parse backbone Cα coordinates (simplified — use biotite in practice)
# coords shape: (N_residues, 3)
coords = parse_ca_coords(pdb_text) # your parser here
models.append(coords)
residue_ids = list(range(1, len(models[0]) + 1))
# --- 2. Dihedral Angle Order Parameters ---
daop = DAOPCalculator(models, residue_ids)
daop_scores = daop.calculate()
well_def = daop.find_well_defined_residues(threshold=1.8)
print(f"Well-defined residues: {well_def}")
# --- 3. Full ensemble statistics ---
stats = EnsembleStatistics(models, residue_ids=residue_ids)
print(f"Backbone RMSD: {stats.quality_assessment.mean_pairwise_rmsd_backbone:.2f} Å")
print(f"Heavy-atom RMSD: {stats.quality_assessment.mean_pairwise_rmsd_heavy:.2f} Å")
print(f"Overall quality: {stats.quality_assessment.overall_quality}")
print(f"Representative model: #{stats.medoid_index + 1}")
References
- Hyberts, S. G. et al. (1992). The solution structure of eglin c based on measurements of many NOEs and coupling constants. Protein Science, 1(6), 736–751. (Introduces DAOP)
- Tejero, R. et al. (2013). PDBStat: a universal restraint converter and restraint quality analyzer for protein NMR structures. J. Biomol. NMR, 56(4), 337–351. (Defines quality thresholds)
See Also
- Tutorial: IDP Conformational Ensembles
- API: geometry module — Kabsch superposition used internally
- Glossary: DAOP, RMSF, S²