[INFO] The NeRF Geometry Lab
Interactive Exploration of the "Hidden Math" Behind Protein AI
[TARGET] What You'll Learn
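Many modern protein AI models do not output raw 3D coordinates directly. Instead, they predict geometric quantities, such as inter-residue distances and orientations (trRosetta) or per-residue frames plus torsion angles (AlphaFold2). A geometric reconstruction step then turns those predictions into a 3D model. NeRF (Natural Extension Reference Frame) is the classic reconstruction algorithm for internal coordinates (bond lengths, angles, and torsions): it builds the model atom-by-atom.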
In this tutorial:
- [ANGLE] Explore the Z-Matrix (internal coordinate representation)
- [WIDGET] Use interactive sliders to manipulate backbone torsions (phi and psi)
- [PLOT] Watch a real-time Ramachandran plot update as you change angles
- [MAP] Visualize distance matrices showing how local changes affect global structure
[INFO] Why This Matters: Understanding internal coordinates is crucial for protein structure prediction, molecular dynamics, and protein design. This is the mathematical foundation of modern structural biology AI.
# [CONFIG] Environment Detection & Setup
import os
import sys
from pathlib import Path
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    print('[INFO] Running in Google Colab - Installing dependencies...')
    # Increase timeout for large packages like OpenMM
    %pip install --upgrade --default-timeout=1000 -q synth-pdb py3Dmol biotite ipywidgets
    import plotly.io as pio
    pio.renderers.default = 'colab'
else:
    print('[INFO] Running in local environment')
    # Robust path detection: walk up from the working directory until a 'synth_pdb' directory is found
    repo_root = Path(os.getcwd())
    while repo_root.parent != repo_root:
        if (repo_root / "synth_pdb").exists():
            sys.path.append(str(repo_root))
            break
        repo_root = repo_root.parent
print('[OK] Environment configured!')
import io
import ipywidgets as widgets
import numpy as np
import plotly.graph_objects as go
import py3Dmol
from biotite.structure.io.pdb import PDBFile
from IPython.display import HTML, clear_output, display
from plotly.subplots import make_subplots
from synth_pdb import PeptideGenerator
gen = PeptideGenerator('ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA')
print('[OK] NeRF Geometry Lab Ready!')
print(' Loaded: 10-residue polyalanine alpha-helix')
[INFO] Internal Coordinates: The Language of Protein Structure
Z-Matrix Representation
Instead of Cartesian coordinates (x, y, z), proteins can be described using internal coordinates:
| Coordinate | Symbol | Description | Typical Range |
|---|---|---|---|
| Bond Length | r | Distance between two bonded atoms | ~1.0-1.5 Å |
| Bond Angle | theta | Angle formed by three consecutive bonded atoms | ~100-125 degrees |
| Dihedral Angle | phi, psi, omega | Rotation about the central bond of four consecutive atoms | -180 to +180 degrees |
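The first two rows of the table can be measured directly from Cartesian coordinates. The sketch below uses plain NumPy rather than synth-pdb to compute r and theta for three atoms. The N, CA, and C positions are hypothetical, built from approximate standard backbone values (N-CA 1.458 Å, CA-C 1.525 Å, N-CA-C 111 degrees).

```python
import numpy as np

def bond_length(a, b):
    """Distance r between two atoms (same units as the coordinates, here Angstrom)."""
    return float(np.linalg.norm(np.asarray(b, dtype=float) - np.asarray(a, dtype=float)))

def bond_angle(a, b, c):
    """Angle theta (degrees) at atom b formed by atoms a-b-c."""
    ba = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bc = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos_theta = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip so round-off cannot push the cosine slightly outside [-1, 1]
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# Hypothetical fragment: N at the origin, CA on the x axis, C in the xy plane
n = np.array([0.0, 0.0, 0.0])
ca = np.array([1.458, 0.0, 0.0])
theta = np.radians(111.0)
# C sits 1.525 A from CA, making a 111 degree N-CA-C angle
c = ca + 1.525 * np.array([-np.cos(theta), np.sin(theta), 0.0])

print(f"r(N-CA)       = {bond_length(n, ca):.3f} A")
print(f"r(CA-C)       = {bond_length(ca, c):.3f} A")
print(f"theta(N-CA-C) = {bond_angle(n, ca, c):.1f} degrees")
```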
The Backbone Dihedrals
For each residue i, we have three key angles:
$$\phi_i = \text{dihedral}(C_{i-1}, N_i, C^{\alpha}_i, C_i)$$
$$\psi_i = \text{dihedral}(N_i, C^{\alpha}_i, C_i, N_{i+1})$$
$$\omega_i = \text{dihedral}(C^{\alpha}_i, C_i, N_{i+1}, C^{\alpha}_{i+1})$$
- phi: Rotation about the N-Cα bond
- psi: Rotation about the Cα-C bond
- omega: Rotation about the peptide (C-N) bond (usually ~180 degrees for trans, ~0 degrees for the rarer cis form)
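Each of these dihedrals can be computed from four atom positions. Below is a minimal NumPy sketch (not part of synth-pdb's API) using a common projection-plus-arctan2 formulation. It assumes the IUPAC sign convention: looking down the central bond, a clockwise rotation of the first atom onto the last is positive.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) of four points about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # Cosine and sine components of the angle between the projections
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Toy geometry: central bond along z, first atom on the x axis
trans_like = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_like = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(f"trans-like: {trans_like:.1f} degrees, cis-like: {cis_like:.1f} degrees")
```

Applied to (Cα_i, C_i, N_{i+1}, Cα_{i+1}), the same function returns omega: near ±180 degrees for a trans peptide and near 0 degrees for cis.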
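[INFO] NeRF Algorithm: Given bond lengths, bond angles, and dihedrals, NeRF reconstructs 3D coordinates by:
- Placing the first 3 atoms in a convenient reference frame (their absolute position and orientation are arbitrary, but their spacing must match the first two bond lengths and the first bond angle)
- Computing each new atom's position from the three atoms before it, using one bond length, one bond angle, and one dihedral
- Building the entire structure atom-by-atom in a single forward pass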
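These three steps fit in a few lines of NumPy. This is an independent illustration of the NeRF placement rule described by Parsons et al. (2005), not synth-pdb's internal code. The ideal bond lengths and angles are assumed approximate standard values, and only the N, CA, and C backbone atoms are built.

```python
import numpy as np

# Assumed ideal backbone geometry (approximate standard values), Angstrom / degrees
B_N_CA, B_CA_C, B_C_N = 1.458, 1.525, 1.329
A_N_CA_C, A_CA_C_N, A_C_N_CA = 111.2, 116.2, 121.7

def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """NeRF step: place atom d so that |c-d| = bond,
    angle(b, c, d) = angle_deg and dihedral(a, b, c, d) = torsion_deg."""
    theta, chi = np.radians(angle_deg), np.radians(torsion_deg)
    # Position of d in a local frame attached to c
    d2 = bond * np.array([-np.cos(theta),
                          np.sin(theta) * np.cos(chi),
                          np.sin(theta) * np.sin(chi)])
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    # Columns of m are the local frame axes expressed in global coordinates
    m = np.column_stack([bc, np.cross(n, bc), n])
    return m @ d2 + c

def build_backbone(phis, psis, omegas=None):
    """Return N, CA, C coordinates, shape (3 * n_res, 3), from torsions in degrees.
    phis[0] and psis[-1] are unused because they need atoms outside the chain."""
    n_res = len(phis)
    omegas = [180.0] * n_res if omegas is None else omegas
    # First three atoms: arbitrary frame, but correct N-CA, CA-C and N-CA-C geometry
    n0 = np.zeros(3)
    ca0 = np.array([B_N_CA, 0.0, 0.0])
    t = np.radians(A_N_CA_C)
    c0 = ca0 + B_CA_C * np.array([-np.cos(t), np.sin(t), 0.0])
    coords = [n0, ca0, c0]
    for i in range(n_res - 1):
        n_i, ca_i, c_i = coords[-3], coords[-2], coords[-1]
        n_next = place_atom(n_i, ca_i, c_i, B_C_N, A_CA_C_N, psis[i])               # psi_i
        ca_next = place_atom(ca_i, c_i, n_next, B_N_CA, A_C_N_CA, omegas[i])        # omega_i
        c_next = place_atom(c_i, n_next, ca_next, B_CA_C, A_N_CA_C, phis[i + 1])    # phi_(i+1)
        coords.extend([n_next, ca_next, c_next])
    return np.array(coords)

# Ideal 10-residue helix vs. the same chain with residue 5 (index 4) perturbed
helix = build_backbone([-57.0] * 10, [-47.0] * 10)
phis = [-57.0] * 10
phis[4] += 60.0
bent = build_backbone(phis, [-47.0] * 10)

moved = np.linalg.norm(bent - helix, axis=1) > 1e-6
print("first moved atom index:", int(np.argmax(moved)))
ca_helix, ca_bent = helix[1::3], bent[1::3]
print(f"CA1-CA10 distance: helix {np.linalg.norm(ca_helix[-1] - ca_helix[0]):.1f} A, "
      f"perturbed {np.linalg.norm(ca_bent[-1] - ca_bent[0]):.1f} A")
```

Every atom is placed relative to the three before it. Changing residue 5's phi therefore leaves the first 14 atoms (through residue 5's CA) untouched and rigidly rotates everything after them. This is the local-to-global effect the widget below lets you explore.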
[WIDGET] Interactive Geometry Lab
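Use the sliders below to offset the phi and psi angles of residue 5 from their ideal alpha-helix values (phi = -57 degrees, psi = -47 degrees). Each time you release a slider, watch how:
- The 3D structure is rebuilt (residue 5 is highlighted in red)
- The Ramachandran plot marks the current (phi, psi) position of residue 5
- The distance matrix reveals how a local change affects the global structure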
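Try these experiments (the sliders are offsets from the starting helix; read the resulting absolute angles from the info panel):
- Move to the beta-sheet region: phi approx -120 degrees, psi approx +120 degrees (e.g., delta phi = -60, delta psi = +170 gives phi = -117, psi = +123)
- Explore forbidden regions and look for steric clashes in the 3D view
- Set phi approx -60 degrees, psi approx -30 degrees (e.g., delta phi = 0, delta psi = +20), the typical i+1 position of a type I beta-turn; this still lies inside the helical region, so a single residue there changes the helix only slightly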
[WARN] Important: If you see duplicate visualizations, restart your kernel (Kernel -> Restart Kernel) and run all cells from the top.
import os
IN_CI = bool(os.getenv("CI"))
if not IN_CI:
    # Output area
    out = widgets.Output()
    # Sliders set OFFSETS (deltas) added to residue 5's ideal helical phi/psi
    phi_slider = widgets.FloatSlider(
        min=-180, max=180, step=10, value=0,
        description='delta phi:',
        continuous_update=False,
        style={'description_width': '50px'},
        layout=widgets.Layout(width='500px')
    )
    psi_slider = widgets.FloatSlider(
        min=-180, max=180, step=10, value=0,
        description='delta psi:',
        continuous_update=False,
        style={'description_width': '50px'},
        layout=widgets.Layout(width='500px')
    )
    # Track initialization
    _initializing = True

    def wrap_angle(angle):
        # Map an angle in degrees onto [-180, 180) so it stays on the Ramachandran axes
        return (angle + 180.0) % 360.0 - 180.0

    def update(change=None):
        global _initializing
        if _initializing and change is not None:
            return
        phi, psi = phi_slider.value, psi_slider.value
        # Ideal alpha-helix torsions for all 10 residues; perturb residue 5 (index 4)
        phis, psis = [-57.0]*10, [-47.0]*10
        phis[4] = wrap_angle(phis[4] + phi)
        psis[4] = wrap_angle(psis[4] + psi)
        res = gen.generate(phi_list=phis, psi_list=psis)
        final_phi, final_psi = phis[4], psis[4]
        # Determine region (same boxes as the shaded areas on the plot)
        if -90 < final_phi < -30 and -70 < final_psi < -20:
            region = 'alpha-helix [OK]'
            region_color = '#00FF00'
        elif -150 < final_phi < -90 and 90 < final_psi < 150:
            region = 'beta-sheet [OK]'
            region_color = '#87CEEB'
        elif -90 < final_phi < -60 and 120 < final_psi < 170:
            region = 'PPII [OK]'
            region_color = '#90EE90'
        else:
            region = 'Non-canonical [WARN]'
            region_color = '#FF6B6B'
        with out:
            clear_output(wait=True)
            # Info panel
            display(HTML(f"""
            <div style='background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                        color: white; padding: 15px; border-radius: 10px;
                        font-family: monospace; box-shadow: 0 4px 6px rgba(0,0,0,0.3);
                        margin-bottom: 15px;'>
                <b>[TARGET] Current Angles:</b><br>
                phi = {final_phi:.1f} degrees | psi = {final_psi:.1f} degrees<br>
                <b>Region:</b> <span style='color: {region_color};'>{region}</span>
            </div>
            """))
            # 3D structure: write the structure to an in-memory PDB string for py3Dmol
            pf = PDBFile()
            pf.set_structure(res.structure)
            s = io.StringIO()
            pf.write(s)
            v = py3Dmol.view(width=700, height=400)
            v.addModel(s.getvalue(), 'pdb')
            v.setStyle({'stick': {'colorscheme': 'chainHetatm', 'radius': 0.15}})
            v.setStyle({'resi': 5}, {'stick': {'color': 'red', 'radius': 0.25}})
            v.setBackgroundColor('#1a1a1a')
            v.zoomTo()
            v.show()  # show() renders the viewer itself; no display() wrapper needed
            # Plots
            fig = make_subplots(
                rows=1, cols=2,
                subplot_titles=('Ramachandran Plot', 'CA Distance Matrix')
            )
            # Ramachandran with shaded alpha, beta and PPII regions
            regions = [
                {'type': 'rect', 'x0': -90, 'x1': -30, 'y0': -70, 'y1': -20,
                 'fillcolor': 'rgba(0,100,200,0.2)', 'line': {'width': 0}},
                {'type': 'rect', 'x0': -150, 'x1': -90, 'y0': 90, 'y1': 150,
                 'fillcolor': 'rgba(200,100,0,0.2)', 'line': {'width': 0}},
                {'type': 'rect', 'x0': -90, 'x1': -60, 'y0': 120, 'y1': 170,
                 'fillcolor': 'rgba(100,200,0,0.2)', 'line': {'width': 0}}
            ]
            for r in regions:
                fig.add_shape(r, row=1, col=1)
            fig.add_trace(go.Scatter(
                x=[final_phi], y=[final_psi], mode='markers',
                marker={'size': 15, 'color': 'red', 'symbol': 'star',
                        'line': {'color': 'white', 'width': 2}},
                hovertemplate='phi: %{x:.1f} degrees<br>psi: %{y:.1f} degrees<extra></extra>'
            ), row=1, col=1)
            fig.update_xaxes(title_text='phi (degrees)', range=[-180, 180], dtick=60, row=1, col=1)
            fig.update_yaxes(title_text='psi (degrees)', range=[-180, 180], dtick=60, row=1, col=1)
            # Distance matrix between all CA atoms
            ca = res.structure[res.structure.atom_name == 'CA']
            n = len(ca)
            dm = np.zeros((n, n))
            for i in range(n):
                for j in range(n):
                    dm[i, j] = np.linalg.norm(ca.coord[i] - ca.coord[j])
            fig.add_trace(go.Heatmap(
                z=dm, colorscale='Viridis',
                colorbar={'title': 'Distance (A)'},
                hovertemplate='Residue %{x} <-> %{y}<br>Distance: %{z:.1f} A<extra></extra>'
            ), row=1, col=2)
            fig.update_xaxes(title_text='Residue', dtick=1, row=1, col=2)
            fig.update_yaxes(title_text='Residue', dtick=1, row=1, col=2)
            fig.update_layout(
                height=400, width=900,
                template='plotly_dark',
                showlegend=False
            )
            display(fig)

    # Connect sliders
    phi_slider.observe(update, 'value')
    psi_slider.observe(update, 'value')
    # Display UI
    display(widgets.VBox([phi_slider, psi_slider, out]))
    # Initialize: show a placeholder until the first slider move
    _initializing = False
    with out:
        display(HTML(
            '<div style="text-align:center;padding:40px;color:#aaa;'
            'border:1px dashed #555;border-radius:8px;'
            'font-style:italic;background:#1a1a1a;">'
            '[INFO] Move a slider above to load the 3D structure'
            '</div>'
        ))
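The widget fills the CA distance matrix with a double loop. NumPy broadcasting computes the same matrix in one expression. To show the helix pattern without running synth-pdb, the sketch below uses an idealized CA trace rather than a generated structure. The helix parameters are assumed textbook values: radius 2.3 Å, rise 1.5 Å, and 100 degrees of rotation per residue.

```python
import numpy as np

def distance_matrix(coords):
    """All-pairs Euclidean distances for an (N, 3) array, via broadcasting."""
    diff = coords[:, None, :] - coords[None, :, :]   # shape (N, N, 3)
    return np.linalg.norm(diff, axis=-1)

# Idealized alpha-helix CA trace (assumed textbook parameters)
radius, rise, turn = 2.3, 1.5, np.radians(100.0)
k = np.arange(10)
ca = np.column_stack([radius * np.cos(k * turn),
                      radius * np.sin(k * turn),
                      rise * k])

dm = distance_matrix(ca)
for sep in (1, 2, 3, 4):
    print(f"CA(i)-CA(i+{sep}): {dm[0, sep]:.2f} A")
```

With these parameters, CA(i)-CA(i+3) and CA(i)-CA(i+4) come out at roughly 5 Å and 6.2 Å. These short-range bands next to the diagonal mark a helix in the heatmap.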
[BALANCE] The Ground Truth Challenge: Geometry vs. Physics
In synthetic biology and Protein AI, we face a fundamental question: What is the "Ground Truth"?
- Geometric Truth (The Ideal): A structure built exactly at the Ramachandran centers (-60, -45 for helices). This is the Label Intent.
- Physical Truth (The Reality): A structure that has been energy-minimized to resolve steric clashes. This is closer to what you would see in a test tube.
Let's compare them.
from synth_pdb.generator import generate_pdb_content, PeptideResult
from synth_pdb.geometry import calculate_rmsd, kabsch_superposition
seq = "MVLSPADKTN"
print("[INFO] Generating structures and measuring strain...")
# 1. Geometric Ground Truth (Non-minimized)
res_geom = generate_pdb_content(sequence_str=seq, conformation="alpha", minimize_energy=False)
struct_geom = PeptideResult(res_geom).structure
# 2. Physical Reality (Minimized)
print("[INFO] Generating minimized structure (200 iterations)... ")
res_phys = generate_pdb_content(
sequence_str=seq,
conformation="alpha",
minimize_energy=True,
minimization_max_iter=200
)
struct_phys = PeptideResult(res_phys).structure
# 3. Align and measure Strain
ca_geom = struct_geom[struct_geom.atom_name == "CA"].coord
ca_phys = struct_phys[struct_phys.atom_name == "CA"].coord
rot, trans = kabsch_superposition(ca_phys, ca_geom)
fitted_coords = (rot @ ca_phys.T).T + trans
strain_rmsd = calculate_rmsd(fitted_coords, ca_geom)
print(f"[MEASURE] Conformational Strain (RMSD): {strain_rmsd:.3f} A")
print(" Low RMSD (< 0.5 A) means the ideal geometry was already physically happy.")
print(" High RMSD means the physics engine had to 'warp' the intent to avoid clashes.")
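The cell above relies on `kabsch_superposition` and `calculate_rmsd` from `synth_pdb.geometry`. To see what such a measurement involves, here is an independent NumPy sketch of Kabsch superposition followed by RMSD. It is not the library's implementation, and the function name `kabsch_rmsd`, its argument order, and its return value are this sketch's own choices.

```python
import numpy as np

def kabsch_rmsd(mobile, target):
    """RMSD between two (N, 3) point sets after optimal rigid superposition."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    # Remove translation by centering both sets on their centroids
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix
    h = mob_c.T @ tgt_c
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would mean a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    fitted = mob_c @ rot.T                   # rotate mobile onto target
    return float(np.sqrt(np.mean(np.sum((fitted - tgt_c) ** 2, axis=1))))

# A rotated and translated copy of a random point cloud should fit perfectly
rng = np.random.default_rng(0)
target = rng.normal(size=(10, 3))
angle = np.radians(30.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle),  np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
mobile = target @ rz.T + np.array([5.0, -2.0, 1.0])
print(f"RMSD after superposition: {kabsch_rmsd(mobile, target):.6f} A")
```

A rigidly moved copy gives an RMSD of essentially zero, so a nonzero strain value reflects a real change in CA geometry rather than a difference in placement. The `d` term stops the fit from "cheating" with a mirror image.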
Why this matters for AI
If you train an AI on minimized data, you are inadvertently teaching it the biases of the forcefield (e.g., Amber14). By using the Geometric Truth as our label, we aim for the model to learn the underlying relationship between sequence and the Platonic Ideal of the fold.
[GRAD] Key Insights
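- Local Changes -> Global Effects: Changing one residue's angles rigidly rotates the entire downstream (C-terminal) part of the chain
- Ramachandran Constraints: Only certain phi/psi combinations are sterically allowed
- Distance Patterns: alpha-helices show short i, i+3 and i, i+4 CA-CA distances (about 5-6 Å) near the diagonal; beta-sheets show long-range contacts between strands that are far apart in sequence
- NeRF Reconstruction: Torsion-space reconstruction like NeRF is a core building block of structure-generation pipelines; AlphaFold2, for example, places side-chain atoms from predicted torsion angles on top of predicted per-residue backbone frames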
[READ] Further Reading
- Jumper et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature 596:583-589. DOI: 10.1038/s41586-021-03819-2
- Parsons et al. (2005). "Practical conversion from torsion space to Cartesian space for in silico protein synthesis." J Comput Chem 26:1063-1068. DOI: 10.1002/jcc.20237
- Ramachandran et al. (1963). "Stereochemistry of polypeptide chain configurations." J Mol Biol 7:95-99. DOI: 10.1016/S0022-2836(63)80023-6
[OK] Lab Session Complete!
You've mastered internal coordinates and NeRF geometry!