NMR Ensemble Analysis: Proteins in Motion¶
A single crystal or energy-minimised structure is a frozen snapshot. In solution, a protein is constantly flexing — backbone loops swing, side chains rotate, and even the core breathes gently. NMR spectroscopists routinely solve ensembles of structures (families of conformers all consistent with the experimental data) precisely to capture this motion.
In this tutorial we use the 10-conformer NMR ensemble of Ubiquitin (PDB: 1D3Z)
to demonstrate synth-nmr's ensemble averaging module. We will:
- Load the ensemble into a
TrajectoryEnsembleobject. - Compute S² (Lipari-Szabo order parameters) directly from NH vector geometry.
- Average chemical shifts correctly across all 10 conformers.
- Demonstrate NOE r⁻⁶ averaging — and why a simple mean gives the wrong answer.
- Average RDCs and visualise per-conformer scatter around the mean.
No MD simulation required. The 1D3Z NMR ensemble already provides a physically meaningful set of conformations sampled by the real protein in solution.
1. Setup & Load Libraries¶
# Install synth-nmr and its dependencies
!pip install -q synth-nmr biotite matplotlib numpy
# Download the Ubiquitin NMR solution ensemble — 10 conformers from NOE-restrained MD
!wget -q https://files.rcsb.org/download/1D3Z.pdb -O ubiquitin_ensemble.pdb
import biotite.structure.io as strucio
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams.update(
{
"figure.dpi": 120,
"font.size": 11,
"axes.titlesize": 13,
"axes.spines.top": False,
"axes.spines.right": False,
}
)
print("Environment ready.")
2. Loading the NMR Structural Ensemble¶
PDB entry 1D3Z contains 10 conformers of Ubiquitin determined by solution NMR. Each conformer was obtained by energy minimisation restrained by the experimental NOE distances and J-coupling dihedral angles — so each one is genuinely consistent with the spectroscopic data. Together they represent the conformational space sampled by Ubiquitin tumbling in solution.
biotite.structure.io.load_structure automatically detects multiple MODEL records
and returns an AtomArrayStack — a 3-D array indexed [model, atom, xyz].
from synth_nmr.trajectory import (
compute_s2_from_trajectory,
ensemble_average_noes,
ensemble_average_rdcs,
ensemble_average_shifts,
load_trajectory,
)
# Load all NMR conformers from the PDB file
stack = strucio.load_structure("ubiquitin_ensemble.pdb")
n_models = len(stack)
# Each stack[i] is a single AtomArray (one conformer)
frames = [stack[i] for i in range(n_models)]
# Wrap into a TrajectoryEnsemble container
ensemble = load_trajectory(frames)
print(f"Loaded PDB 1D3Z ({n_models} NMR conformers, {frames[0].array_length()} atoms each)")
print(f"TrajectoryEnsemble: {len(ensemble)} frames")
print(repr(ensemble))
3. Order Parameters: Quantifying Backbone Flexibility¶
The Lipari-Szabo generalized order parameter S² measures how much each backbone N–H bond vector moves across the ensemble:
$$S^2 = |\langle \hat{\mu} \rangle|^2$$
where $\hat{\mu}$ is the unit vector along the N→H bond in each conformer. If all conformers point the same way, the mean vector is long and $S^2 \approx 1$ (rigid). If they scatter in all directions, the mean vector cancels and $S^2 \approx 0$ (floppy).
| S² range | Interpretation |
|---|---|
| 0.85–1.0 | Rigid backbone (core helices and sheets) |
| 0.60–0.85 | Mild flexibility (surface loops, turns) |
| 0.0–0.60 | Significant disorder (termini, disordered regions) |
# Compute S² = |<μ̂>|² for every backbone amide in the ensemble
s2_map = compute_s2_from_trajectory(ensemble)
print(f"S² computed for {len(s2_map)} backbone amide sites\n")
# ---------- Secondary structure of Ubiquitin (1D3Z) ----------------------------
alpha_res = set(range(23, 35)) # α-Helix
beta_res = (
set(range(1, 8))
| set(range(10, 18)) # β1, β2
| set(range(40, 46))
| set(range(48, 51)) # β3, β4
| set(range(64, 73))
) # β5
def ss_color(r):
if r in alpha_res:
return "#e74c3c"
if r in beta_res:
return "#2980b9"
return "#7f8c8d"
residues = sorted(s2_map.keys())
s2_vals = [s2_map[r] for r in residues]
colors = [ss_color(r) for r in residues]
# ---------- Plot ---------------------------------------------------------------
fig, ax = plt.subplots(figsize=(13, 4.5))
ax.bar(residues, s2_vals, color=colors, alpha=0.88, width=0.9, zorder=3)
ax.axhline(
0.85, color="#2c3e50", ls="--", lw=1.2, alpha=0.5, label="Typical rigid threshold S² = 0.85"
)
ax.set_ylim(0, 1.15)
ax.set_xlabel("Residue Number", fontsize=12)
ax.set_ylabel("S² (order parameter)", fontsize=12)
ax.set_title(
"Backbone S² from the 10-Conformer NMR Ensemble of Ubiquitin (1D3Z)\n"
"S² = |\u27e8\u03bc\u0302\u27e9|\u00b2 (Lipari & Szabo 1982; Clore et al. 1990)",
fontsize=13,
)
ax.grid(axis="y", linestyle="--", alpha=0.35, zorder=0)
patches = [
mpatches.Patch(color="#e74c3c", alpha=0.88, label="\u03b1-Helix (res\u202023\u201334)"),
mpatches.Patch(color="#2980b9", alpha=0.88, label="\u03b2-Strands"),
mpatches.Patch(color="#7f8c8d", alpha=0.88, label="Loops / Turns"),
]
ax.legend(handles=patches + [ax.lines[0]], loc="lower left", fontsize=10)
ax.annotate(
"N-terminus\n(disordered)",
xy=(1, s2_map.get(1, 0.5)),
xytext=(4, 0.25),
fontsize=9,
color="#555",
arrowprops=dict(arrowstyle="->", color="#555", lw=0.8),
)
ax.annotate(
"C-terminus\n(disordered)",
xy=(76, s2_map.get(76, 0.5)),
xytext=(68, 0.25),
fontsize=9,
color="#555",
arrowprops=dict(arrowstyle="->", color="#555", lw=0.8),
)
plt.tight_layout()
plt.show()
flexible = [(r, s2_map[r]) for r in residues if s2_map[r] < 0.60]
if flexible:
print("Highly flexible residues (S² < 0.60):")
for r, s2 in flexible:
print(f" Res {r:3d}: S² = {s2:.3f}")
4. Ensemble-Averaged Chemical Shifts¶
In the fast-exchange limit (which holds for most backbone motions in solution), the chemical shift you observe in the spectrum is the time-average of the shift over all conformations:
$$\delta_\text{obs} = \langle \delta \rangle_t = \frac{1}{N} \sum_{i=1}^{N} \delta_i$$
A simple arithmetic mean is the correct choice here. Where the ensemble-averaged shift differs significantly from the single-model prediction, conformational averaging is important and should not be ignored.
⏱ This cell predicts shifts for each of the 10 conformers — allow ~30–60 s.
from synth_nmr.chemical_shifts import predict_chemical_shifts
def flatten_shifts(raw):
"""Merge {method: {res_id: {atom: shift}}} → {res_id: {atom: shift}}."""
merged = {}
for mdict in raw.values():
for res_id, atoms in mdict.items():
merged.setdefault(res_id, {}).update(atoms)
return merged
print("Predicting chemical shifts for all 10 conformers...")
per_frame_shifts = [flatten_shifts(predict_chemical_shifts(f)) for f in ensemble]
avg_shifts = ensemble_average_shifts(per_frame_shifts)
model1_shifts = per_frame_shifts[0]
print(f"Done. Ensemble-averaged shifts for {len(avg_shifts)} residues.")
# ---------- Cα shifts: model 1 vs ensemble average ----------------------------
res_ca = [
r
for r in sorted(avg_shifts)
if "CA" in model1_shifts.get(r, {}) and "CA" in avg_shifts.get(r, {})
]
ca_m1 = [model1_shifts[r]["CA"] for r in res_ca]
ca_avg = [avg_shifts[r]["CA"] for r in res_ca]
diff_ppm = [abs(m - a) for m, a in zip(ca_m1, ca_avg)]
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
# Left: scatter model 1 vs ensemble
ax = axes[0]
sc = ax.scatter(
ca_m1, ca_avg, c=diff_ppm, cmap="plasma", s=65, alpha=0.85, edgecolors="none", zorder=3
)
cb = plt.colorbar(sc, ax=ax, label="|Model\u202001 \u2212 Ensemble avg| (ppm)")
lo = min(*ca_m1, *ca_avg) - 1
hi = max(*ca_m1, *ca_avg) + 1
ax.plot([lo, hi], [lo, hi], "k--", lw=1.2, alpha=0.45, label="Perfect agreement")
ax.set_xlabel("C\u03b1 shift \u2014 Model\u20021 (ppm)", fontsize=12)
ax.set_ylabel("C\u03b1 shift \u2014 Ensemble average (ppm)", fontsize=12)
ax.set_title("C\u03b1 Chemical Shifts:\nModel\u20021 vs. 10-Frame Ensemble Average", fontsize=12)
ax.legend(fontsize=10)
# Right: per-residue shift difference bar
ax2 = axes[1]
ax2.bar(res_ca, diff_ppm, color="#8e44ad", alpha=0.78, width=0.9)
ax2.set_xlabel("Residue Number", fontsize=12)
ax2.set_ylabel("|\u0394 C\u03b1| (ppm)", fontsize=12)
ax2.set_title("Per-Residue C\u03b1 Shift Change\ndue to Ensemble Averaging", fontsize=12)
ax2.grid(axis="y", ls="--", alpha=0.35)
plt.tight_layout()
plt.show()
print(f"Mean |\u0394C\u03b1| (model 1 vs ensemble): {np.mean(diff_ppm):.3f} ppm")
print(f"Max |\u0394C\u03b1| (model 1 vs ensemble): {max(diff_ppm):.3f} ppm")
5. NOE Averaging: Why r⁻⁶ Matters¶
The NOE signal intensity between two protons is proportional to $r^{-6}$, so short distances dominate overwhelmingly. The effective distance extracted from a NOESY spectrum is:
$$r_\text{eff} = \langle r^{-6} \rangle^{-1/6}$$
This is always shorter than the arithmetic mean distance whenever conformations differ. The scatter plot below makes this vivid: every point lies on or below the diagonal, because transient close contacts are amplified by the sixth power.
Using an arithmetic mean for NOE distances is a well-known source of systematic error in structure determination from dynamic proteins.
from synth_nmr.nmr import calculate_synthetic_noes
print("Calculating NOEs for each conformer (cutoff 5.5 \u00c5) ...")
# calculate_synthetic_noes returns a LIST of dicts (one per atom pair).
# Each dict has keys: 'index_1', 'atom_name_1', 'index_2', 'atom_name_2', 'distance', ...
# We build a flat dict keyed by (res_i, atom_i, res_j, atom_j) for each frame.
per_frame_noes = []
for frame in ensemble:
raw = calculate_synthetic_noes(frame, cutoff=5.5, exclude_intra_residue=True)
flat = {}
for entry in raw:
key = (entry["index_1"], entry["atom_name_1"], entry["index_2"], entry["atom_name_2"])
flat[key] = entry["distance"]
per_frame_noes.append(flat)
eff_noes = ensemble_average_noes(per_frame_noes)
# Arithmetic mean (for comparison — physically WRONG for NOEs)
r_arith = {}
for pair in eff_noes:
vals = [d[pair] for d in per_frame_noes if pair in d]
if len(vals) == len(ensemble):
r_arith[pair] = float(np.mean(vals))
common = [p for p in r_arith if p in eff_noes]
x = [r_arith[p] for p in common]
y = [eff_noes[p] for p in common]
gap = [a - e for a, e in zip(x, y)]
print(f"{len(common)} NOE pairs present in all {n_models} conformers.")
# ---------- Plot ---------------------------------------------------------------
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
# Left: scatter r_arith vs r_eff
ax = axes[0]
sc = ax.scatter(x, y, c=gap, cmap="YlOrRd", s=20, alpha=0.65, edgecolors="none", vmin=0, zorder=3)
cb = plt.colorbar(sc, ax=ax, label="r_arith \u2212 r_eff (\u00c5)")
hi = max(*x, *y) + 0.15
ax.plot([1.5, hi], [1.5, hi], "k--", lw=1.2, alpha=0.5, label="r_eff = r_arith")
ax.set_xlabel("Arithmetic Mean Distance (\u00c5)", fontsize=12)
ax.set_ylabel("r\u207b\u2076-Averaged Distance r_eff (\u00c5)", fontsize=12)
ax.set_title("NOE Averaging:\nAll points at or BELOW the diagonal", fontsize=12)
ax.legend(fontsize=10)
# Right: histogram of gap (r_arith - r_eff)
ax2 = axes[1]
ax2.hist(gap, bins=40, color="#c0392b", alpha=0.78, edgecolor="white", lw=0.4)
ax2.axvline(
np.mean(gap), color="#2c3e50", ls="--", lw=1.5, label=f"Mean gap = {np.mean(gap):.3f} \u00c5"
)
ax2.set_xlabel("r_arith \u2212 r_eff (\u00c5)", fontsize=12)
ax2.set_ylabel("Count", fontsize=12)
ax2.set_title("Distribution of Distance Overestimation\nfrom Simple Averaging", fontsize=12)
ax2.legend(fontsize=10)
plt.tight_layout()
plt.show()
n_biased = sum(1 for g in gap if g > 0.05)
print(f"{n_biased}/{len(common)} pairs show r_arith overestimates r_eff by > 0.05 \u00c5")
if gap:
idx = gap.index(max(gap))
ri, ai, rj, aj = common[idx]
print(f"Largest overestimation: {max(gap):.3f} \u00c5 (Res {ri} {ai} \u2014 Res {rj} {aj})")
6. RDC Ensemble Averaging¶
Residual Dipolar Couplings (RDCs) depend on the orientation of a bond vector relative to a static alignment tensor. Like chemical shifts, RDCs are in the fast-exchange limit, so the observed coupling is the arithmetic mean over all conformers:
$$D_\text{obs} = \langle D_i \rangle_t$$
The plot below shows the 10 individual conformer RDC traces (grey) and the ensemble average (blue). The spread of grey lines is a direct visual measure of conformational variation — wider spread = more dynamic region.
from synth_nmr.rdc import calculate_rdcs
Da, R = 10.0, 0.2 # typical weak alignment tensor for Ubiquitin
per_frame_rdcs = [calculate_rdcs(f, Da=Da, R=R) for f in ensemble]
avg_rdcs = ensemble_average_rdcs(per_frame_rdcs)
res_all = sorted(avg_rdcs.keys())
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Left: line plot, individual models + average
ax = axes[0]
for i, fr in enumerate(per_frame_rdcs):
vals = [fr.get(r, np.nan) for r in res_all]
ax.plot(
res_all,
vals,
color="#bdc3c7",
lw=0.9,
alpha=0.75,
label="Individual conformer" if i == 0 else "",
)
avg_vals = [avg_rdcs.get(r, np.nan) for r in res_all]
ax.plot(res_all, avg_vals, color="#1a6ea3", lw=2.8, zorder=4, label="Ensemble average")
ax.axhline(0, color="#7f8c8d", lw=0.8, ls="--", alpha=0.5)
ax.set_xlabel("Residue Number", fontsize=12)
ax.set_ylabel("D_NH (Hz)", fontsize=12)
ax.set_title(f"N-H RDC: Conformer Spread vs. Ensemble Average\nDa = {Da} Hz, R = {R}", fontsize=12)
ax.legend(fontsize=10)
# Right: per-residue standard deviation across conformers
ax2 = axes[1]
rdc_std = []
for r in res_all:
vals = [fr[r] for fr in per_frame_rdcs if r in fr]
rdc_std.append(float(np.std(vals)) if vals else 0.0)
bar_colors = [ss_color(r) for r in res_all]
ax2.bar(res_all, rdc_std, color=bar_colors, alpha=0.82, width=0.9)
ax2.set_xlabel("Residue Number", fontsize=12)
ax2.set_ylabel("\u03c3(D_NH) (Hz)", fontsize=12)
ax2.set_title(
"Per-Residue RDC Variability Across Conformers\n"
"Higher = more conformational diversity at this site",
fontsize=12,
)
ax2.grid(axis="y", ls="--", alpha=0.35)
patches2 = [
mpatches.Patch(color="#e74c3c", alpha=0.82, label="\u03b1-Helix"),
mpatches.Patch(color="#2980b9", alpha=0.82, label="\u03b2-Strands"),
mpatches.Patch(color="#7f8c8d", alpha=0.82, label="Loops"),
]
ax2.legend(handles=patches2, fontsize=10)
plt.tight_layout()
plt.show()
most_variable = res_all[np.argmax(rdc_std)]
print(f"Most variable RDC residue: {most_variable} (\u03c3 = {max(rdc_std):.2f} Hz)")
Summary¶
Using the 10-conformer NMR ensemble of Ubiquitin (PDB: 1D3Z), we have demonstrated four key ensemble NMR analysis workflows:
| Observable | Averaging method | Key insight | |---|---|---| | S² order parameter | $|\langle\hat{\mu}\rangle|^2$ | Direct measure of NH bond mobility — no model fitting needed | | Chemical shifts | Arithmetic mean | Ensemble-averaged shifts differ most in flexible loops | | NOE distances | $\langle r^{-6}\rangle^{-1/6}$ | Always shorter than arithmetic mean — short contacts dominate | | RDCs | Arithmetic mean | Conformer spread reveals dynamic regions at a glance |
The same workflow applies directly to MD simulation trajectories: load your
trajectory frames (.xtc, .dcd, etc.) via load_trajectory("traj.xtc", topology="protein.pdb")
after pip install synth-nmr[trajectory], and the same four analyses run unchanged.
Further reading¶
- Lipari & Szabo (1982) J. Am. Chem. Soc. 104, 4546 — Model-free formalism
- Clore et al. (1990) J. Am. Chem. Soc. 112, 4989 — S² from trajectory
- Solomon (1955) Phys. Rev. 99, 559 — NOE and r⁻⁶ dependence
- Cornilescu et al. (1998) J. Biomol. NMR 13, 289 — Ubiquitin NMR ensemble