🔬 Cryo-EM & SAXS Lab: Visualizing Conformational Heterogeneity
Simulating Multi-Modal Observables for Synthetic Ensembles
🎯 Learning Objectives
In this lab, we explore how protein flexibility and resolution limits manifest in two critical structural biology techniques:
- SAXS (Small-Angle X-ray Scattering): How global shape and solution dynamics dictate 1D scattering profiles.
- Cryo-EM (Cryogenic Electron Microscopy): How conformational heterogeneity and resolution blurring affect 3D density maps.
By the end of this tutorial, you will be able to generate synthetic "ground truth" ensembles and simulate the corresponding experimental data that a structural biologist would see in the lab.
# 🔧 Environment Setup
import os
import sys

import matplotlib.pyplot as plt
import numpy as np

IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    # Install third-party dependencies before importing them
    !pip install -q synth-pdb mrcfile matplotlib numpy scipy biotite
else:
    # Local development path
    sys.path.append(os.path.abspath('../../'))

# Imported after the install step so a fresh Colab runtime does not fail here
import biotite.structure as struc
from synth_pdb.batch_generator import BatchedGenerator
from synth_pdb.cryo_em import generate_density_map, save_mrc_file
from synth_pdb.saxs import calculate_saxs_profile, calculate_radius_of_gyration
from synth_pdb.visualization_saxs import plot_saxs_results

print("✅ Environment configured!")
1. Generating Ensembles: Folded vs. Disordered
We will use the BatchedGenerator, our high-performance vectorized walker, to generate two different ensembles for the same sequence:
- Folded Ensemble: A single alpha-helix with minimal noise, used here as a rigid, ordered reference state.
- Disordered Ensemble: A highly diverse collection of states mimicking an Intrinsically Disordered Protein (IDP).
seq = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"  # Ubiquitin
n_models = 50

# Low drift keeps every model close to the ideal helix
print("🚀 Generating Folded Ensemble (Alpha Helix)...")
bg_folded = BatchedGenerator(seq, n_batch=n_models, full_atom=True)
ensemble_folded = bg_folded.generate_batch(conformation="alpha", drift=1.0, seed=42).to_stack()

# High drift on a random conformation gives a broad, IDP-like ensemble
print("🚀 Generating Disordered Ensemble (IDP-like)...")
bg_disordered = BatchedGenerator(seq, n_batch=n_models, full_atom=True)
ensemble_disordered = bg_disordered.generate_batch(conformation="random", drift=10.0, seed=42).to_stack()

print(f"✅ Generated two ensembles with {n_models} models each.")
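To make the two regimes concrete without relying on synth-pdb internals, the sketch below builds idealized Cα traces with plain NumPy: an ideal alpha-helix (textbook geometry of 1.5 Å rise, 100° per residue, 2.3 Å radius) and a freely jointed random walk with 3.8 Å Cα-Cα steps. The function names are hypothetical and are not part of synth-pdb, and the random walk ignores excluded volume, so it is only a crude stand-in for a real IDP.

```python
import numpy as np

def ideal_helix_ca(n_res, rise=1.5, radius=2.3, turn_deg=100.0):
    """C-alpha trace of an ideal alpha-helix along z (textbook geometry)."""
    i = np.arange(n_res)
    phi = np.deg2rad(turn_deg) * i
    return np.column_stack([radius * np.cos(phi), radius * np.sin(phi), rise * i])

def random_walk_ca(n_res, bond=3.8, seed=0):
    """Freely jointed chain with fixed C-alpha spacing (no excluded volume)."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_res - 1, 3))
    steps *= bond / np.linalg.norm(steps, axis=1, keepdims=True)
    return np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])

def positional_spread(models):
    """Mean per-residue positional spread across models, after centering each model."""
    centered = np.array([m - m.mean(axis=0) for m in models])
    return np.sqrt(centered.var(axis=0).sum(axis=1)).mean()

helix_models = [ideal_helix_ca(76) for _ in range(20)]
coil_models = [random_walk_ca(76, seed=s) for s in range(20)]

print(f"Helix spread: {positional_spread(helix_models):.2f} Å")
print(f"Coil spread:  {positional_spread(coil_models):.2f} Å")
```

The helix models are identical by construction, so their spread is zero, while every random-walk model differs from the others. That per-residue spread is the conformational heterogeneity the rest of the lab tries to detect.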
2. SAXS: Fingerprinting Conformational States
SAXS is uniquely sensitive to the overall shape of a protein. For an ensemble, the observed curve $I(q)$ is the average intensity over all members.
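As a minimal sketch of what "average intensity over all members" means, the example below evaluates the Debye formula, $I(q) = \sum_{i,j} \sin(q r_{ij}) / (q r_{ij})$, with unit form factors for each model and then averages the resulting curves. The helper names are hypothetical, the unit form factors are a simplifying assumption, and nothing here is meant to describe how calculate_saxs_profile is implemented.

```python
import numpy as np

def debye_intensity(coords, q):
    """Debye formula with unit form factors: I(q) = sum_ij sin(q r_ij) / (q r_ij)."""
    diff = coords[:, None, :] - coords[None, :, :]
    r = np.linalg.norm(diff, axis=-1)          # (N, N) pair distances
    qr = q[:, None, None] * r[None, :, :]      # (Q, N, N)
    # np.sinc(x) = sin(pi x) / (pi x), so rescale the argument
    return np.sinc(qr / np.pi).sum(axis=(1, 2))

def ensemble_intensity(models, q, weights=None):
    """Population-weighted average of the per-model intensity curves."""
    profiles = np.array([debye_intensity(m, q) for m in models])
    return np.average(profiles, axis=0, weights=weights)

# Two toy "conformers": point clouds of different size (illustrative only)
rng = np.random.default_rng(0)
models = [rng.normal(scale=s, size=(30, 3)) for s in (5.0, 10.0)]
q = np.linspace(0.0, 0.5, 51)

I_avg = ensemble_intensity(models, q)
print(f"I(0) = {I_avg[0]:.0f} for N = 30 scatterers")
```

Note that the average is taken over intensities, not over coordinates: scattering from an averaged structure would generally differ from the averaged scattering of the ensemble.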
Kratky Plots: The Folding Sensor
The Kratky Plot ($q^2 \cdot I(q)$ vs. $q$) is the standard way to visualize folding:
- Folded Globular Proteins: Show a clear bell-shaped curve that returns to the baseline.
- Disordered Proteins: Show a curve that plateaus or continues to rise at high $q$ instead of returning to the baseline.
def get_avg_saxs(stack):
    """Average the per-model SAXS intensities of an ensemble."""
    intensities = []
    for i in range(len(stack)):
        q, intensity = calculate_saxs_profile(stack[i], q_max=0.5, n_points=50)
        intensities.append(intensity)
    return q, np.mean(intensities, axis=0)

q, i_folded = get_avg_saxs(ensemble_folded)
_, i_disordered = get_avg_saxs(ensemble_disordered)

# Visualize both ensembles in a Kratky plot
fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(q, (q**2) * i_folded, 'g-', linewidth=2, label='Folded (Alpha Helix)')
ax.plot(q, (q**2) * i_disordered, 'r--', linewidth=2, label='Disordered (Random Coil)')
ax.set_xlabel('q (Å⁻¹)', fontsize=12)
# Raw string so that "\c" is not treated as an escape sequence
ax.set_ylabel(r'$q^2 \cdot I(q)$', fontsize=12)
ax.set_title('Kratky Plot: Folded vs. Disordered Ensembles', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
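The two limiting Kratky shapes can be reproduced from analytic models, independent of any simulated ensemble. The sketch below compares a homogeneous sphere (a stand-in for a compact globular particle) with the Debye function of an ideal Gaussian chain (a stand-in for a fully flexible polymer) at the same $R_g$. The value of 20 Å and the function names are illustrative assumptions.

```python
import numpy as np

def sphere_intensity(q, radius):
    """Normalized form factor of a homogeneous sphere (compact particle)."""
    x = q * radius
    return (3.0 * (np.sin(x) - x * np.cos(x)) / x**3) ** 2

def gaussian_chain_intensity(q, rg):
    """Debye function of an ideal Gaussian chain (fully flexible polymer)."""
    x = (q * rg) ** 2
    return 2.0 * (np.exp(-x) + x - 1.0) / x**2

q = np.linspace(0.005, 0.5, 200)     # Å⁻¹, start above 0 to avoid 0/0
rg = 20.0                            # Å, illustrative value
radius = rg * np.sqrt(5.0 / 3.0)     # sphere with the same Rg (Rg² = 3R²/5)

kratky_sphere = q**2 * sphere_intensity(q, radius)
kratky_chain = q**2 * gaussian_chain_intensity(q, rg)

print(f"Sphere: peak {kratky_sphere.max():.2e}, value at q_max {kratky_sphere[-1]:.2e}")
print(f"Chain:  value at q_max {kratky_chain[-1]:.2e}, plateau 2/Rg² = {2 / rg**2:.2e}")
```

The sphere curve peaks and falls back toward zero, while the Gaussian-chain curve rises monotonically toward the plateau $2/R_g^2$. Real proteins usually fall between these two idealized limits.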
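Radius of Gyration ($R_g$)
A Guinier plot ($\ln I(q)$ vs. $q^2$ at low $q$) is the standard way to estimate the radius of gyration, a measure of overall "size", from a measured SAXS curve. Because we have the atomic coordinates of every model, we can instead compute $R_g$ directly with the built-in calculate_radius_of_gyration and compare the physical size of our two ensembles.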
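# Iterating over the stack yields one model at a time
rg_folded = np.mean([calculate_radius_of_gyration(m) for m in ensemble_folded])
rg_disordered = np.mean([calculate_radius_of_gyration(m) for m in ensemble_disordered])

print(f"Average Rg (Folded): {rg_folded:.2f} Å")
print(f"Average Rg (Disordered): {rg_disordered:.2f} Å")

if rg_disordered > rg_folded:
    print("\n💡 As expected, the disordered ensemble occupies a much larger volume than the compact helix!")
else:
    # A long, straight helix is rod-like and can have a larger Rg than a coil
    print("\n💡 Here the helical ensemble has the larger Rg: a single extended helix is rod-like, not globular.")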
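For comparison with the coordinate-based values, here is a sketch of how $R_g$ would be extracted from a scattering curve using the Guinier approximation, $I(q) \approx I(0)\,\exp(-q^2 R_g^2 / 3)$, which holds only at low $q$ (a common rule of thumb is $q R_g \lesssim 1.3$). The function name, the iterative window shrinking, and the synthetic Gaussian-chain input are all assumptions made for illustration.

```python
import numpy as np

def guinier_rg(q, intensity, qrg_max=1.3, n_iter=10):
    """Estimate Rg and I(0) from a linear fit of ln I(q) against q^2.

    The fit window is shrunk iteratively until q_max * Rg <= qrg_max.
    """
    mask = np.ones_like(q, dtype=bool)
    for _ in range(n_iter):
        slope, intercept = np.polyfit(q[mask] ** 2, np.log(intensity[mask]), 1)
        if slope >= 0:
            raise ValueError("Non-negative Guinier slope; cannot estimate Rg.")
        rg = np.sqrt(-3.0 * slope)             # slope = -Rg^2 / 3
        new_mask = q * rg <= qrg_max
        if new_mask.sum() < 3 or np.array_equal(new_mask, mask):
            break
        mask = new_mask
    return rg, np.exp(intercept)

# Synthetic test curve: Debye function of a Gaussian chain with known Rg
q = np.linspace(0.005, 0.3, 120)     # Å⁻¹
true_rg = 20.0                       # Å
x = (q * true_rg) ** 2
intensity = 2.0 * (np.exp(-x) + x - 1.0) / x**2

rg_fit, i0_fit = guinier_rg(q, intensity)
print(f"True Rg: {true_rg:.1f} Å, Guinier estimate: {rg_fit:.1f} Å, I(0): {i0_fit:.3f}")
```

The estimate comes out slightly below the true value because the curve already deviates from a pure exponential inside the fit window. This is one reason Guinier analysis of flexible systems is usually restricted to an even narrower $q$ range.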
3. Cryo-EM: Local Resolution and Heterogeneity
Cryo-EM maps are 3D grids of Coulomb (electrostatic) potential, conventionally referred to as "density". Regions that are rigid across an ensemble show up clearly, while mobile regions appear "blurred" because their density is spread over a larger volume and its peak value drops.
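The blurring effect can be illustrated in one dimension with NumPy alone. In this sketch each atom is a Gaussian whose width follows one common convention, $\sigma = \text{resolution} / (\pi\sqrt{2})$. That convention, the function name, and the 3 Å positional fluctuation are assumptions for illustration and do not describe what generate_density_map does internally.

```python
import numpy as np

def gaussian_density_1d(positions, grid, resolution):
    """Sum of unit-height Gaussians on a 1D grid.

    Assumed convention: sigma = resolution / (pi * sqrt(2)).
    """
    sigma = resolution / (np.pi * np.sqrt(2.0))
    d = grid[:, None] - np.asarray(positions)[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2).sum(axis=1)

rng = np.random.default_rng(0)
grid = np.arange(-25.0, 25.0, 0.5)                       # Å
n_models = 200
rigid_x = -10.0                                          # identical in every model
mobile_x = 10.0 + rng.normal(scale=3.0, size=n_models)   # fluctuates by ~3 Å

# Ensemble map = average of the per-model maps
maps = [gaussian_density_1d([rigid_x, x], grid, resolution=4.0) for x in mobile_x]
ensemble_map = np.mean(maps, axis=0)

rigid_peak = ensemble_map[grid < 0].max()
mobile_peak = ensemble_map[grid >= 0].max()
print(f"Rigid peak: {rigid_peak:.2f}, mobile peak: {mobile_peak:.2f}")
```

Both atoms contribute the same total density, but the mobile one is smeared over a wider region, so its peak is much lower. At a fixed contour level the mobile atom may disappear from the map entirely.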
Experiment: Comparing Ensembles at 4 Å Resolution
We will generate density maps for both ensembles and look at a 2D projection (Maximum Intensity Projection).
res = 4.0
density_folded, _ = generate_density_map(ensemble_folded, resolution=res)
density_disordered, origin = generate_density_map(ensemble_disordered, resolution=res)
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
axes[0].imshow(np.max(density_folded, axis=0), cmap='magma')
axes[0].set_title(f'Folded Ensemble ({res}Å)\nClear Helical Envelope')
axes[0].axis('off')
axes[1].imshow(np.max(density_disordered, axis=0), cmap='magma')
axes[1].set_title(f'Disordered Ensemble ({res}Å)\nSmeared/Diffuse Density')
axes[1].axis('off')
plt.tight_layout()
plt.show()
save_mrc_file("lab_ensemble_map.mrc", density_disordered, origin)
print("✅ Saved the disordered ensemble map to lab_ensemble_map.mrc")
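A note on the projection used above: a maximum intensity projection keeps only the brightest voxel along each ray, so it highlights compact, high-density features and underweights diffuse ones. The toy map below (entirely synthetic, with arbitrary values) contrasts it with a sum projection, which integrates density along the ray.

```python
import numpy as np

# Toy 3D map: a bright compact blob plus a faint, extended slab
density = np.zeros((32, 32, 32))
density[14:18, 6:10, 6:10] = 1.0     # compact, high density
density[:, 20:26, 20:26] = 0.2       # diffuse, low density along axis 0

mip = density.max(axis=0)            # maximum intensity projection
sum_proj = density.sum(axis=0)       # line-integral (sum) projection

print("MIP: blob vs slab:", mip[8, 8], mip[22, 22])
print("Sum: blob vs slab:", sum_proj[8, 8], sum_proj[22, 22])
```

In the maximum intensity projection the blob is five times brighter than the slab, whereas in the sum projection the slab is brighter because it extends through the whole volume. When judging how diffuse a disordered ensemble looks, it can help to inspect both views.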
4. Summary: Integrative Modeling
In this lab, we've demonstrated how to move from atomic coordinates to multi-modal experimental signatures:
- SAXS captures the global shape and degree of compaction of the ensemble.
- Cryo-EM captures the 3D occupancy and reveals where resolution is lost due to local flexibility.
Synthetic data generated with synth-pdb provides a known ground truth for benchmarking multimodal AI models that aim to predict protein structures directly from these diverse experimental data types.