📏 Measuring Structural Truth: The RPF Score
Validating Protein Structures against NMR Restraints
🎯 Objectives
In this tutorial, you will learn how to:
- Define "Structural Truth" in the context of solution NMR.
- Calculate RPF Scores (Recall, Precision, and F-measure) for a structural model.
- Identify Over-fitting by observing drops in the Precision score.
- Benchmark Refinement using real-world validation metrics from the Montelione group.
Key Reference: Huang, Y. J. et al. (2005). Protein NMR recall, precision, and F-measure scores (RPF scores). J. Am. Chem. Soc. 127, 1665–1674.
import os
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    # @title Environment Setup
    !pip install -q synth-pdb biotite matplotlib numpy scipy py3Dmol openmm
else:
    sys.path.append(os.path.abspath('../../'))
print("✅ Environment configured!")
🧬 Background: What are RPF Scores?
NMR structure determination relies on distance restraints derived from the Nuclear Overhauser Effect (NOE). However, a structural model can look "perfect" geometrically (no clashes, good angles) but completely fail to represent the experimental data.
To bridge this gap, the Montelione group developed RPF scores (Huang et al. 2005, JACS):
| Metric | Question it answers | Formula |
|---|---|---|
| Recall (R) | Of the restraints observed, how many does the model satisfy? | R = satisfied / total restraints |
| Precision (P) | Of the short distances in the model, how many are supported by data? | P = supported / total short distances |
| F-measure (F) | The harmonic mean of Recall and Precision. | F = 2PR / (P+R) |
A high-quality NMR structure generally targets F > 0.70–0.80. This range is best read as a community rule of thumb rather than a formal deposition requirement.
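The three formulas in the table can be made concrete with a small sketch. The function name `rpf_from_counts` and its four count arguments are hypothetical, chosen only for illustration; this is not the synth-pdb implementation.

```python
def rpf_from_counts(satisfied, total_restraints, supported, total_short):
    """Compute Recall, Precision and F-measure from raw counts.

    All four arguments are illustrative inputs, not a library API.
    """
    recall = satisfied / total_restraints if total_restraints else 0.0
    precision = supported / total_short if total_short else 0.0
    denom = precision + recall
    # Harmonic mean: dominated by whichever of the two scores is lower.
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


# A model that satisfies every restraint but contains twice as many
# short distances as the data supports.
example = rpf_from_counts(satisfied=100, total_restraints=100,
                          supported=100, total_short=200)
print(example)
```

Because F is a harmonic mean, perfect Recall cannot compensate for poor Precision: here R = 1.0 and P = 0.5 give F of about 0.67, just under the 0.70 rule of thumb.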
# @title 🛠️ Setup & Installation { display-mode: "form" }
import os
import sys
# ── Environment detection ────────────────────────────────────────────────────
IN_COLAB = "google.colab" in sys.modules
if IN_COLAB:
    print("🌐 Running in Google Colab")
    try:
        import synth_pdb  # already installed (e.g. via pip in a previous run)
        print(" ✅ synth-pdb already installed")
    except ImportError:
        print(" 📦 Installing synth-pdb and dependencies...")
        # synth-nmr provides calculate_synthetic_noes; biotite is a core dep.
        import subprocess
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "-q",
             "synth-pdb", "synth-nmr", "biotite", "matplotlib"],
            check=True,
        )
        print(" ✅ Installation complete")
else:
    # Local / CI: add the repository root so the development copy is used.
    # This notebook lives at docs/tutorials/, two levels below the repo root.
    print("💻 Running in local Jupyter environment")
    _repo_root = os.path.abspath(os.path.join(os.getcwd(), "../.."))
    if _repo_root not in sys.path:
        sys.path.insert(0, _repo_root)
        print(f" 📌 Added repo root to path: {_repo_root}")
print("✅ Environment ready!")
import io
import biotite.structure.io.pdb as pdb
import matplotlib.pyplot as plt
import numpy as np
from synth_pdb.generator import generate_pdb_content
from synth_pdb.nmr import calculate_rpf_score, calculate_synthetic_noes
print("✅ NMR Validation module loaded!")
🟢 Step 1: Generate a "Ground Truth" Structure
We start by generating an energy-minimized structure of a short 20-residue Ala/Lys peptide and "observing" its NOEs. These restraints become our experimental ground truth: the reference set every model will be scored against.
# Generate a high-quality helix and observe its NOEs as "ground truth"
seq = "AKAAKAKAAK" * 2
pdb_content = generate_pdb_content(
    sequence_str=seq,
    minimize_energy=True,  # Physics-based refinement for realistic coordinates
)
# Parse the PDB string into a Biotite AtomArray
structure = pdb.PDBFile.read(io.StringIO(pdb_content)).get_structure(model=1)
# Derive "experimental" NOE restraints from the ground-truth geometry
restraints = calculate_synthetic_noes(structure)
print(f"Generated {len(restraints)} synthetic NOE restraints from ground truth.")
print(f"Structure: {len(structure)} atoms, sequence length {len(seq)} residues")
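For intuition about what "observing" NOEs from a structure can mean, here is a minimal pure-Python sketch that turns pairwise distances into upper-bound restraints. It is not how `calculate_synthetic_noes` is implemented; the function `synthetic_restraints`, the bin values in `NOE_BINS`, and the toy coordinates are all assumptions made for illustration.

```python
import math

# Illustrative NOE intensity classes with upper bounds in Å (assumed values).
NOE_BINS = [("strong", 2.8), ("medium", 3.5), ("weak", 5.0)]


def synthetic_restraints(coords, bins=NOE_BINS):
    """Return (i, j, upper_bound) for each pair that falls inside a bin."""
    restraints = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            distance = math.dist(coords[i], coords[j])
            # Assign the tightest bin whose upper bound covers the distance.
            for _label, upper in bins:
                if distance <= upper:
                    restraints.append((i, j, upper))
                    break
    return restraints


# Four hypothetical proton positions (Å).
toy_protons = [(0.0, 0.0, 0.0), (2.5, 0.0, 0.0), (0.0, 4.0, 0.0), (9.0, 0.0, 0.0)]
toy_noes = synthetic_restraints(toy_protons)
for i, j, upper in toy_noes:
    print(f"proton {i} - proton {j}: upper bound {upper} Å")
```

Pairs farther apart than the widest bin produce no restraint at all, which is why a real NOE list only reports on short distances.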
🏅 Step 2: Validating the Perfect Fit
If we test the Ground Truth structure against its own NOEs, all scores should be 1.0. This is our sanity check: the model that generated the restraints must perfectly satisfy them.
scores = calculate_rpf_score(structure, restraints)
print("--- Ground Truth Scores ---")
for k, v in scores.items():
    print(f" {k.capitalize():12s}: {v:.4f}")
⚠️ Step 3: Identifying the "Over-folded" Decoy
What happens if we artificially compact the protein?
- Recall will stay high — the original short-range contacts are still present.
- Precision will plummet — the collapsed structure creates many new close contacts that are not supported by any NOE restraint.
This mimics what can happen when a structure determination program over-refines a model and produces an unrealistically compact fold.
# Create a 'collapsed' version by scaling coordinates towards the center
coords = structure.coord
center = np.mean(coords, axis=0)
collapsed_coords = (coords - center) * 0.7 + center # 30% collapse
collapsed_structure = structure.copy()
collapsed_structure.coord = collapsed_coords
collapsed_scores = calculate_rpf_score(collapsed_structure, restraints)
print("--- Collapsed Model Scores ---")
for k, v in collapsed_scores.items():
    print(f" {k.capitalize():12s}: {v:.4f}")
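The same effect can be reproduced without synth-pdb on a toy model. The sketch below builds an idealised helical Cα trace, treats every pair within a cutoff as a "restraint", and rescores the trace after the same 0.7 scaling about the centroid. The helix parameters, the 6 Å cutoff, and all function names are assumptions for illustration; `calculate_rpf_score` may define its scores differently.

```python
import math

CUTOFF = 6.0  # Å; assumed contact distance and restraint upper bound


def helix_trace(n_residues, radius=2.3, rise=1.5, turn_deg=100.0):
    """Approximate C-alpha positions of an ideal alpha helix (Å)."""
    coords = []
    for i in range(n_residues):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords


def short_pairs(coords, cutoff=CUTOFF):
    """Set of index pairs (i, j), i < j, no farther apart than the cutoff."""
    return {
        (i, j)
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
        if math.dist(coords[i], coords[j]) <= cutoff
    }


def scale_about_centroid(coords, factor):
    """Shrink (factor < 1) or expand the coordinates about their centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    return [tuple((p[k] - center[k]) * factor + center[k] for k in range(3))
            for p in coords]


def toy_rpf(model_coords, restraints, cutoff=CUTOFF):
    """Recall, Precision and F-measure for a set of pair restraints."""
    contacts = short_pairs(model_coords, cutoff)
    agreed = len(restraints & contacts)
    recall = agreed / len(restraints) if restraints else 0.0
    precision = agreed / len(contacts) if contacts else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


toy_reference = helix_trace(20)
toy_restraints = short_pairs(toy_reference)
toy_truth = toy_rpf(toy_reference, toy_restraints)
toy_collapsed = toy_rpf(scale_about_centroid(toy_reference, 0.7), toy_restraints)
print("truth    :", toy_truth)
print("collapsed:", toy_collapsed)
```

Uniform shrinking shortens every distance, so each original restraint is still satisfied and Recall stays at 1.0. Pairs that were just outside the cutoff now fall inside it, which enlarges the denominator of Precision and pulls it below 1.0.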
📊 Step 4: Visualising the Score Collapse
Numbers are convincing — but a chart makes the story unmissable. The bar chart below shows Recall, Precision, and F-measure for both the Ground Truth and the over-compacted Collapsed Model.
Watch the Precision bar. It drops because the collapsed structure has many atom pairs that are newly close together but unsupported by any NOE.
plt.style.use("dark_background")
fig, ax = plt.subplots(figsize=(9, 5.5))
metrics = ["Recall", "Precision", "F-measure"]
keys = ["recall", "precision", "f_measure"]
ground_truth_vals = [scores.get(k, 0.0) for k in keys]
collapsed_vals = [collapsed_scores.get(k, 0.0) for k in keys]
x = range(len(metrics))
width = 0.35
bars_gt = ax.bar([i - width/2 for i in x], ground_truth_vals,
                 width, label="Ground Truth", color="#2ecc71", alpha=0.9)
bars_co = ax.bar([i + width/2 for i in x], collapsed_vals,
                 width, label="Collapsed Model", color="#e74c3c", alpha=0.9)
# Value labels on each bar
for bar in list(bars_gt) + list(bars_co):
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.015,
            f"{bar.get_height():.3f}", ha="center", va="bottom",
            fontsize=10, fontweight="bold", color="white")
# Rule-of-thumb quality threshold
ax.axhline(0.7, color="gold", linestyle="--", linewidth=1.2, alpha=0.8,
           label="Rule-of-thumb threshold (F ≥ 0.70)")
ax.axhline(1.0, color="gray", linestyle=":", linewidth=0.8, alpha=0.5)
ax.set_xticks(list(x))
ax.set_xticklabels(metrics, fontsize=13)
ax.set_ylim(0, 1.22)
ax.set_ylabel("Score", fontsize=12)
ax.set_title("RPF Scores: Ground Truth vs Over-Compacted Model",
             fontsize=14, fontweight="bold", pad=14)
ax.legend(fontsize=10, loc="upper right")
ax.grid(axis="y", alpha=0.2)
plt.tight_layout()
plt.show()
precision_drop = scores.get('precision', 0) - collapsed_scores.get('precision', 0)
print(f"\nPrecision drop: {scores.get('precision',0):.3f} → "
      f"{collapsed_scores.get('precision',0):.3f} "
      f"({precision_drop*100:.1f} percentage points)")
🚀 Summary
| Metric | Ground Truth | Collapsed Decoy | Interpretation |
|---|---|---|---|
| Recall | ~1.0 | ~1.0 | Both models satisfy the distance bounds — collapse doesn't violate existing contacts |
| Precision | ~1.0 | ≪ 1.0 | Collapse creates many spurious close contacts unsupported by data |
| F-measure | ~1.0 | Low | Overall quality collapses with Precision |
The RPF score is a powerful tool for detecting over-fitting:
- High Recall, Low Precision → The model is likely too compact or incorrectly folded (extra contacts).
- Low Recall, High Precision → The structure hasn't yet satisfied all experimental constraints.
- Both High → The model agrees well with the data — a high-quality determination.
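The three cases above can be encoded as a small triage helper. `interpret_rpf` is a hypothetical function, the 0.70 default threshold is the rule of thumb used in this tutorial, and the fourth branch (both scores low) is an addition not covered by the list above.

```python
def interpret_rpf(recall, precision, threshold=0.70):
    """Map a (Recall, Precision) pair to a coarse diagnosis.

    The threshold is an assumed rule of thumb, not a formal standard.
    """
    high_recall = recall >= threshold
    high_precision = precision >= threshold
    if high_recall and high_precision:
        return "good agreement with the data"
    if high_recall:
        return "possibly too compact or misfolded: unsupported short distances"
    if high_precision:
        return "not all experimental restraints are satisfied yet"
    # Not discussed in the summary: neither score reaches the threshold.
    return "poor agreement on both counts"


for r, p in [(1.0, 1.0), (1.0, 0.4), (0.5, 0.9), (0.3, 0.3)]:
    print(f"R={r:.2f}, P={p:.2f} -> {interpret_rpf(r, p)}")
```

In the notebook, calling `interpret_rpf(collapsed_scores["recall"], collapsed_scores["precision"])` would apply the same triage to the collapsed decoy, assuming the score dictionary uses the keys read by the plotting cell.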