[INFO] The NeRF Geometry Lab¶

Interactive Exploration of the "Hidden Math" Behind Protein AI¶

[TARGET] What You'll Learn¶

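Many protein AI models do not predict every atom's 3D position directly. Instead, they predict geometric quantities such as internal coordinates (bond lengths, bond angles, and torsions) and convert them into a 3D model. A classic algorithm for that conversion is NeRF (Natural Extension Reference Frame), which builds the chain atom by atom. AlphaFold2, for example, predicts a rigid frame per residue plus torsion angles and places the remaining atoms from those torsions, while trRosetta predicts inter-residue distances and orientations that are then used as restraints to build a 3D model with Rosetta.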

In this tutorial:

[ANGLE] Explore the Z-Matrix (internal coordinate representation)
[WIDGET] Use interactive sliders to manipulate backbone torsions (phi and psi)
[PLOT] Watch a real-time Ramachandran plot update as you change angles
[MAP] Visualize distance matrices showing how local changes affect global structure

[INFO] Why This Matters: Understanding internal coordinates is crucial for protein structure prediction, molecular dynamics, and protein design. It is part of the mathematical foundation of modern structural biology AI.

# [CONFIG] Environment Detection & Setup
import os
import sys
from pathlib import Path

IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print('[INFO] Running in Google Colab - Installing dependencies...')
    # Increase timeout for large packages like OpenMM
    %pip install --upgrade --default-timeout=1000 -q synth-pdb py3Dmol biotite ipywidgets
    import plotly.io as pio
    pio.renderers.default = 'colab'
else:
    print('[INFO] Running in local environment')
    # Robust path detection: find repo root by looking for 'synth_pdb' directory
    repo_root = Path(os.getcwd())
    while repo_root.parent != repo_root:
        if (repo_root / "synth_pdb").exists():
            sys.path.append(str(repo_root))
            break
        repo_root = repo_root.parent

print('[OK] Environment configured!')

import io

import ipywidgets as widgets
import numpy as np
import plotly.graph_objects as go
import py3Dmol
from biotite.structure.io.pdb import PDBFile
from IPython.display import HTML, clear_output, display
from plotly.subplots import make_subplots

from synth_pdb import PeptideGenerator

gen = PeptideGenerator('ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA-ALA')
print('[OK] NeRF Geometry Lab Ready!')
print('   Loaded: 10-residue polyalanine alpha-helix')

[INFO] Internal Coordinates: The Language of Protein Structure¶

Z-Matrix Representation¶

Instead of Cartesian coordinates (x, y, z), proteins can be described using internal coordinates:

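| Coordinate | Symbol | Description | Typical range |
|---|---|---|---|
| Bond length | r | Distance between two bonded atoms | ~1.0-1.5 Å (backbone heavy atoms: ~1.33-1.53 Å) |
| Bond angle | theta | Angle between 3 consecutive bonded atoms | ~105-125 degrees (backbone: ~111-122 degrees) |
| Dihedral angle | phi, psi, omega | Rotation about the central bond of 4 consecutive atoms | -180 to +180 degrees |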
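As a small illustration (plain NumPy, not part of synth-pdb), the first two internal coordinates can be read off any set of Cartesian positions. The helper names are illustrative, and the coordinates are a hand-built, hypothetical N-CA-C fragment using approximate ideal backbone values rather than atoms from a real structure:

```python
import numpy as np

def bond_length(a, b):
    """Distance |b - a| between two atoms (Angstrom if the input is in Angstrom)."""
    return float(np.linalg.norm(np.asarray(b, float) - np.asarray(a, float)))

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    ba = np.asarray(a, float) - np.asarray(b, float)
    bc = np.asarray(c, float) - np.asarray(b, float)
    cos_t = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    # Clip guards against round-off pushing |cos| slightly above 1
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

# Hypothetical fragment: N-CA = 1.458 A, CA-C = 1.525 A, N-CA-C = 111.2 degrees.
# C is placed at 180 - 111.2 = 68.8 degrees from the +x axis, in the xy plane.
n = np.array([0.0, 0.0, 0.0])
ca = np.array([1.458, 0.0, 0.0])
c = ca + 1.525 * np.array([np.cos(np.radians(68.8)), np.sin(np.radians(68.8)), 0.0])

print(f"N-CA   = {bond_length(n, ca):.3f} A")
print(f"CA-C   = {bond_length(ca, c):.3f} A")
print(f"N-CA-C = {bond_angle(n, ca, c):.1f} degrees")
```

Measuring is the easy direction; the rest of this lab is about the reverse, rebuilding positions from these numbers.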

The Backbone Dihedrals¶

For each residue i, we have three key angles:

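$$\phi_i = \text{dihedral}(\mathrm{C}_{i-1}, \mathrm{N}_i, \mathrm{C}^{\alpha}_i, \mathrm{C}_i)$$

$$\psi_i = \text{dihedral}(\mathrm{N}_i, \mathrm{C}^{\alpha}_i, \mathrm{C}_i, \mathrm{N}_{i+1})$$

$$\omega_i = \text{dihedral}(\mathrm{C}^{\alpha}_i, \mathrm{C}_i, \mathrm{N}_{i+1}, \mathrm{C}^{\alpha}_{i+1})$$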

phi: Rotation around N-C alpha bond
psi: Rotation around C alpha-C bond
omega: Peptide bond rotation (usually ~180 degrees for trans, ~0 degrees for cis)
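The dihedral in these definitions can be computed from four positions with the standard atan2 formula. The sketch below is plain NumPy; the helper names are illustrative, not synth-pdb functions. Positive values follow the IUPAC convention (looking down the b->c bond, d is rotated clockwise from a), and the residue-level helper assumes the backbone is stored as three (L, 3) arrays for N, CA and C:

```python
import numpy as np

def dihedral(a, b, c, d):
    """Signed dihedral a-b-c-d in degrees, in [-180, 180]."""
    a, b, c, d = (np.asarray(x, float) for x in (a, b, c, d))
    b1, b2, b3 = b - a, c - b, d - c
    n1 = np.cross(b1, b2)          # normal of plane (a, b, c)
    n2 = np.cross(b2, b3)          # normal of plane (b, c, d)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

def backbone_phi_psi(n, ca, c):
    """phi/psi for each residue from (L, 3) arrays; undefined terminal angles are NaN."""
    length = len(ca)
    phi = np.full(length, np.nan)
    psi = np.full(length, np.nan)
    for i in range(length):
        if i > 0:                  # phi_i needs C of residue i-1
            phi[i] = dihedral(c[i - 1], n[i], ca[i], c[i])
        if i < length - 1:         # psi_i needs N of residue i+1
            psi[i] = dihedral(n[i], ca[i], c[i], n[i + 1])
    return phi, psi

# Four points looking down the +x axis: a sits on +y, d is moved around the axis
a, b, c = [0, 1, 0], [0, 0, 0], [1, 0, 0]
print("d on +y (cis):  ", dihedral(a, b, c, [1, 1, 0]))
print("d on +z:        ", dihedral(a, b, c, [1, 0, 1]))
print("d on -y (trans):", dihedral(a, b, c, [1, -1, 0]))
```

Note that the first residue has no phi and the last has no psi, because each needs an atom from a neighbouring residue.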

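[INFO] NeRF Algorithm: Given these internal coordinates, NeRF reconstructs 3D coordinates by:

Placing the first three atoms in a fixed reference frame that respects their two bond lengths and one bond angle (their absolute position and orientation are arbitrary)
For each new atom: using its bond length, bond angle, and dihedral relative to the three preceding atoms to calculate its position
Building the entire structure atom by atom in a single forward pass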
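The placement step can be written in a few lines of NumPy. This is a minimal sketch of the standard NeRF construction described by Parsons et al. (2005), written for this tutorial rather than taken from synth-pdb; the function name `place_atom` and its argument order are illustrative. It expresses the new atom in a local frame attached to the previous three atoms, then rotates that frame into global coordinates:

```python
import numpy as np

def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """NeRF step: position of atom d given the three preceding atoms a, b, c.

    bond_length    : |d - c| (Angstrom)
    bond_angle_deg : angle b-c-d (degrees)
    torsion_deg    : dihedral a-b-c-d (degrees, IUPAC sign convention)
    """
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    theta, tau = np.radians(bond_angle_deg), np.radians(torsion_deg)

    # 1. Position of d in a local frame whose x axis points along b -> c
    d_local = bond_length * np.array([
        -np.cos(theta),
        np.sin(theta) * np.cos(tau),
        np.sin(theta) * np.sin(tau),
    ])

    # 2. Orthonormal local frame built from the three reference atoms
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)                     # normal to the plane (a, b, c)
    frame = np.column_stack([bc, np.cross(n, bc), n])

    # 3. Rotate into the global frame and translate onto c
    return frame @ d_local + c

# Usage: extend a hypothetical N-CA-C fragment by the next residue's N,
# using psi = -47 degrees, C-N = 1.329 A and CA-C-N = 116.2 degrees
n0 = np.array([0.0, 0.0, 0.0])
ca0 = np.array([1.458, 0.0, 0.0])
c0 = ca0 + 1.525 * np.array([np.cos(np.radians(68.8)), np.sin(np.radians(68.8)), 0.0])
n1 = place_atom(n0, ca0, c0, 1.329, 116.2, -47.0)
print("next N:", n1.round(3), " C-N distance:", round(float(np.linalg.norm(n1 - c0)), 3))
```

Chaining this call over N, CA and C with each residue's phi, psi and omega builds a complete backbone, which is why a change to one torsion propagates to every atom placed after it.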

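[WIDGET] Interactive Geometry Lab¶

Use the sliders below to modify the phi and psi angles of the central residue (residue 5). Each slider sets an offset (delta phi, delta psi) from the ideal alpha-helical values (phi = -57 degrees, psi = -47 degrees) used for every residue. Watch how: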

The 3D structure changes in real-time
The Ramachandran plot shows your current position
The distance matrix reveals how local changes affect global structure
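The distance-matrix panel is easy to reproduce outside the widget. A useful property, and one reason distance maps are common in structure prediction, is that they do not change when the whole molecule is rotated or translated, so any change in the map reflects a genuine change in shape. A minimal NumPy sketch, with random coordinates standing in for CA positions:

```python
import numpy as np

def distance_matrix(coords):
    """All-pairs Euclidean distances for an (N, 3) coordinate array."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]   # (N, N, 3) via broadcasting
    return np.linalg.norm(diff, axis=-1)

def rotation_z(angle_deg):
    """Rotation matrix about the z axis."""
    t = np.radians(angle_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])

rng = np.random.default_rng(0)
ca = rng.normal(scale=5.0, size=(10, 3))           # stand-in for 10 CA positions
moved = ca @ rotation_z(40.0).T + np.array([3.0, -2.0, 7.0])   # rigid-body motion

dm_before = distance_matrix(ca)
dm_after = distance_matrix(moved)
print("Coordinates changed:", not np.allclose(ca, moved))                # -> True
print("Distance matrix changed:", not np.allclose(dm_before, dm_after))  # -> False
```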

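Try these experiments:

Move to the beta-sheet region: phi approx -120 degrees, psi approx +120 degrees
Explore sterically forbidden regions (for example, phi approx +60 degrees, psi approx -120 degrees) and look for clashing atoms in the 3D view
Set phi approx -60 degrees, psi approx -30 degrees, the typical angles of the i+1 residue in a type I beta-turn, and notice that this point lies inside the alpha-helical region, so a single residue at these angles does not by itself break the helix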
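Because the sliders set offsets from the helical starting point, reaching a target region takes a little arithmetic: subtract the starting angle, wrap into [-180, 180), and snap to the 10-degree slider step. The helper below is a hypothetical convenience, not part of the notebook's code; it assumes the base angles (-57, -47) and the step size used by the widget cell:

```python
BASE_PHI, BASE_PSI = -57.0, -47.0   # helical start values used by the widget
STEP = 10.0                         # slider step in degrees

def wrap(angle):
    """Map any angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def slider_offsets(target_phi, target_psi):
    """Slider settings (delta phi, delta psi) that land closest to a target."""
    d_phi = STEP * round(wrap(target_phi - BASE_PHI) / STEP)
    d_psi = STEP * round(wrap(target_psi - BASE_PSI) / STEP)
    return d_phi, d_psi

targets = {"beta-sheet": (-120, 120), "type I turn, i+1": (-60, -30)}
for name, (phi, psi) in targets.items():
    d_phi, d_psi = slider_offsets(phi, psi)
    print(f"{name:18s} -> delta phi = {d_phi:+.0f}, delta psi = {d_psi:+.0f}, "
          f"reaches ({wrap(BASE_PHI + d_phi):.0f}, {wrap(BASE_PSI + d_psi):.0f})")
```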

[WARN] Important: If you see duplicate visualizations, restart your kernel (Kernel -> Restart Kernel) and run all cells from the top.

import os

IN_CI = bool(os.getenv("CI"))

if not IN_CI:
    # Output area
    out = widgets.Output()

    # Sliders with enhanced styling
    phi_slider = widgets.FloatSlider(
        min=-180, max=180, step=10, value=0,
        description='delta phi:',
        continuous_update=False,
        style={'description_width': '50px'},
        layout=widgets.Layout(width='500px')
    )

    psi_slider = widgets.FloatSlider(
        min=-180, max=180, step=10, value=0,
        description='delta psi:',
        continuous_update=False,
        style={'description_width': '50px'},
        layout=widgets.Layout(width='500px')
    )

    # Track initialization
    _initializing = True

    def update(change=None):
        global _initializing
        if _initializing and change is not None:
            return

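        # Slider values are offsets from the ideal helix (phi = -57, psi = -47).
        # Wrap the result into [-180, 180) so the Ramachandran marker stays on the plot.
        phi, psi = phi_slider.value, psi_slider.value
        final_phi = (-57.0 + phi + 180.0) % 360.0 - 180.0
        final_psi = (-47.0 + psi + 180.0) % 360.0 - 180.0

        # Perturb only residue 5 (list index 4); every other residue stays helical
        phis, psis = [-57.0]*10, [-47.0]*10
        phis[4], psis[4] = final_phi, final_psi
        res = gen.generate(phi_list=phis, psi_list=psis)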

        # Determine region
        region = 'Unknown'
        region_color = '#FFD700'
        if -90 < final_phi < -30 and -70 < final_psi < -20:
            region = 'alpha-helix [OK]'
            region_color = '#00FF00'
        elif -150 < final_phi < -90 and 90 < final_psi < 150:
            region = 'beta-sheet [OK]'
            region_color = '#87CEEB'
        elif -90 < final_phi < -60 and 120 < final_psi < 170:
            region = 'PPII [OK]'
            region_color = '#90EE90'
        else:
            region = 'Non-canonical [WARN]'
            region_color = '#FF6B6B'

        with out:
            clear_output(wait=True)

            # Info panel
            display(HTML(f"""
            <div style='background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
                        color: white; padding: 15px; border-radius: 10px;
                        font-family: monospace; box-shadow: 0 4px 6px rgba(0,0,0,0.3);
                        margin-bottom: 15px;'>
                <b>[TARGET] Current Angles:</b><br>
                phi = {final_phi:.1f} degrees | psi = {final_psi:.1f} degrees<br>
                <b>Region:</b> <span style='color: {region_color};'>{region}</span>
            </div>
            """))

            # 3D structure
            pf = PDBFile()
            pf.set_structure(res.structure)
            s = io.StringIO()
            pf.write(s)
            v = py3Dmol.view(width=700, height=400)
            v.addModel(s.getvalue(), 'pdb')
            v.setStyle({'stick': {'colorscheme': 'chainHetatm', 'radius': 0.15}})
            v.setStyle({'resi': 5}, {'stick': {'color': 'red', 'radius': 0.25}})
            v.setBackgroundColor('#1a1a1a')
            v.zoomTo()
            display(v.show())

            # Plots
            fig = make_subplots(
                rows=1, cols=2,
                subplot_titles=('Ramachandran Plot', 'Ca Distance Matrix')
            )

            # Ramachandran with regions
            regions = [
                {'type': 'rect', 'x0': -90, 'x1': -30, 'y0': -70, 'y1': -20,
                     'fillcolor': 'rgba(0,100,200,0.2)', 'line': {'width': 0}},
                {'type': 'rect', 'x0': -150, 'x1': -90, 'y0': 90, 'y1': 150,
                     'fillcolor': 'rgba(200,100,0,0.2)', 'line': {'width': 0}},
                {'type': 'rect', 'x0': -90, 'x1': -60, 'y0': 120, 'y1': 170,
                     'fillcolor': 'rgba(100,200,0,0.2)', 'line': {'width': 0}}
            ]
            for r in regions:
                fig.add_shape(r, row=1, col=1)

            fig.add_trace(go.Scatter(
                x=[final_phi], y=[final_psi], mode='markers',
                marker={'size': 15, 'color': 'red', 'symbol': 'star',
                           'line': {'color': 'white', 'width': 2}},
                hovertemplate='phi: %{x:.1f} degrees<br>psi: %{y:.1f} degrees<extra></extra>'
            ), row=1, col=1)

            fig.update_xaxes(title_text='Phi (degrees)', range=[-180,180], dtick=60, row=1, col=1)
            fig.update_yaxes(title_text='Psi (degrees)', range=[-180,180], dtick=60, row=1, col=1)

            # Distance matrix: all CA-CA distances at once via NumPy broadcasting
            ca = res.structure[res.structure.atom_name == 'CA']
            diff = ca.coord[:, None, :] - ca.coord[None, :, :]
            dm = np.linalg.norm(diff, axis=-1)

            # Label the axes with residue numbers so residue 5 lines up with the 3D view
            fig.add_trace(go.Heatmap(
                z=dm, x=ca.res_id, y=ca.res_id, colorscale='Viridis',
                colorbar={'title': 'Distance (A)'},
                hovertemplate='Residue %{x} <-> %{y}<br>Distance: %{z:.1f} A<extra></extra>'
            ), row=1, col=2)

            fig.update_xaxes(title_text='Residue', dtick=1, row=1, col=2)
            fig.update_yaxes(title_text='Residue', dtick=1, row=1, col=2)

            fig.update_layout(
                height=400, width=900,
                template='plotly_dark',
                showlegend=False
            )
            display(fig)

    # Connect sliders
    phi_slider.observe(update, 'value')
    psi_slider.observe(update, 'value')

    # Display UI
    display(widgets.VBox([phi_slider, psi_slider, out]))

    # Enable slider updates and show a placeholder until the first slider move
    _initializing = False
    with out:
        display(HTML(
            '<div style="text-align:center;padding:40px;color:#aaa;'
            'border:1px dashed #555;border-radius:8px;'
            'font-style:italic;background:#1a1a1a;">'
            '[INFO] Move a slider above to load the 3D structure'
            '</div>'
        ))

[BALANCE] The Ground Truth Challenge: Geometry vs. Physics¶

In synthetic structure generation for protein AI, we face a fundamental question: what is the "Ground Truth"?

Geometric Truth (The Ideal): A structure built exactly at idealized Ramachandran values (about phi = -60, psi = -45 degrees for helices). This is the Label Intent.
Physical Truth (The Reality): A structure that has been energy-minimized with a force field to resolve steric clashes. This is closer to physical reality, although a force-field minimum is still only an approximation of the structure in solution.

Let's compare them.

from synth_pdb.generator import generate_pdb_content, PeptideResult
from synth_pdb.geometry import calculate_rmsd, kabsch_superposition
import os

seq = "MVLSPADKTN"

print("[INFO] Generating structures and measuring strain...")

# 1. Geometric Ground Truth (Non-minimized)
res_geom = generate_pdb_content(sequence_str=seq, conformation="alpha", minimize_energy=False)
struct_geom = PeptideResult(res_geom).structure

# 2. Physical Reality (Minimized)
print("[INFO] Generating minimized structure (200 iterations)... ")

res_phys = generate_pdb_content(
    sequence_str=seq, 
    conformation="alpha", 
    minimize_energy=True, 
    minimization_max_iter=200
)
struct_phys = PeptideResult(res_phys).structure

# 3. Align and measure Strain
ca_geom = struct_geom[struct_geom.atom_name == "CA"].coord
ca_phys = struct_phys[struct_phys.atom_name == "CA"].coord

rot, trans = kabsch_superposition(ca_phys, ca_geom)
fitted_coords = (rot @ ca_phys.T).T + trans
strain_rmsd = calculate_rmsd(fitted_coords, ca_geom)

print(f"[MEASURE] Conformational Strain (RMSD): {strain_rmsd:.3f} A")
print("   Low RMSD (< 0.5 A) means the ideal geometry was already physically happy.")
print("   High RMSD means the physics engine had to 'warp' the intent to avoid clashes.")
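For readers who want to see what the alignment step does, here is the textbook Kabsch algorithm in plain NumPy. It is an illustrative sketch: the exact conventions of synth-pdb's `kabsch_superposition` and `calculate_rmsd` are not shown in this notebook and may differ in detail. The fitted coordinates are computed as `(rot @ mobile.T).T + trans`, the same form used in the cell above:

```python
import numpy as np

def kabsch(mobile, target):
    """Rotation and translation that best superimpose mobile onto target (both N x 3)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_m, mu_t = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)       # 3 x 3 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    trans = mu_t - rot @ mu_m
    return rot, trans

def rmsd(a, b):
    """Root-mean-square deviation between two matched (N, 3) arrays."""
    diff = np.asarray(a, float) - np.asarray(b, float)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Demo: a rigidly moved copy of random "CA" coordinates superimposes back exactly
rng = np.random.default_rng(42)
ca_ref = rng.normal(scale=5.0, size=(10, 3))
t = np.radians(30.0)
true_rot = np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0,        0.0,       1.0]])
ca_moved = (true_rot @ ca_ref.T).T + np.array([1.0, 2.0, 3.0])

rot, trans = kabsch(ca_moved, ca_ref)
fitted = (rot @ ca_moved.T).T + trans
print(f"RMSD before fit: {rmsd(ca_moved, ca_ref):.3f} A, after fit: {rmsd(fitted, ca_ref):.2e} A")
```

Because superposition removes all rigid-body differences first, the remaining RMSD measures only genuine shape change, which is what makes it usable as a strain measure.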

Why this matters for AI¶

If you train an AI model on minimized data, you are inadvertently teaching it the biases of the force field (e.g., Amber14). By using the Geometric Truth as our label, we aim to have the model learn the underlying relationship between sequence and the Platonic Ideal of the fold, rather than force-field artifacts.

[GRAD] Key Insights¶

Local Changes -> Global Effects: Changing one residue's angles affects the entire downstream structure
Ramachandran Constraints: Only certain phi/psi combinations are sterically allowed
Distance Patterns: alpha-helices show characteristic i, i+4 contacts; beta-sheets show long-range contacts
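NeRF Reconstruction: Converting internal coordinates to Cartesian coordinates in this way is a core step in many structure-building pipelines; AlphaFold2, for example, uses predicted torsion angles to place atoms relative to per-residue frames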
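The first insight can be checked numerically. The sketch below is a hypothetical, self-contained NeRF backbone builder (N, CA and C only, approximate ideal bond lengths and angles, trans peptide bonds) and is not synth-pdb's implementation. It builds a 10-residue helix with the same (-57, -47) angles as the widget, rotates phi of residue 5 by 60 degrees, and reports how far each CA moves while the first residue is held fixed:

```python
import numpy as np

# Approximate ideal backbone geometry (bond lengths in Angstrom, angles in degrees)
N_CA, CA_C, C_N = 1.458, 1.525, 1.329
ANG_N_CA_C, ANG_CA_C_N, ANG_C_N_CA = 111.2, 116.2, 121.7

def place_atom(a, b, c, bond, angle, torsion):
    """NeRF step: place d with |cd| = bond, angle b-c-d = angle, dihedral a-b-c-d = torsion."""
    theta, tau = np.radians(angle), np.radians(torsion)
    d_local = bond * np.array([-np.cos(theta),
                               np.sin(theta) * np.cos(tau),
                               np.sin(theta) * np.sin(tau)])
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    return np.column_stack([bc, np.cross(n, bc), n]) @ d_local + c

def build_backbone(phis, psis, omega=180.0):
    """Return an (L, 3, 3) array of N, CA, C positions built residue by residue.

    phis[0] and psis[-1] are unused: those torsions need atoms outside the chain.
    """
    n = np.zeros(3)
    ca = np.array([N_CA, 0.0, 0.0])
    t = np.radians(180.0 - ANG_N_CA_C)
    c = ca + CA_C * np.array([np.cos(t), np.sin(t), 0.0])
    atoms = [(n, ca, c)]
    for i in range(1, len(phis)):
        pn, pca, pc = atoms[-1]
        n = place_atom(pn, pca, pc, C_N, ANG_CA_C_N, psis[i - 1])   # psi of residue i-1
        ca = place_atom(pca, pc, n, N_CA, ANG_C_N_CA, omega)        # peptide bond omega
        c = place_atom(pc, n, ca, CA_C, ANG_N_CA_C, phis[i])        # phi of residue i
        atoms.append((n, ca, c))
    return np.array(atoms)

# Lever-arm demo: perturb phi of residue 5 (index 4) in a 10-residue helix
phis, psis = [-57.0] * 10, [-47.0] * 10
helix = build_backbone(phis, psis)
bent_phis = list(phis)
bent_phis[4] += 60.0
bent = build_backbone(bent_phis, psis)

shift = np.linalg.norm(helix[:, 1] - bent[:, 1], axis=1)   # CA displacement per residue
for i, s in enumerate(shift, start=1):
    print(f"residue {i:2d}: CA moved {s:5.2f} A")
```

Residues 1 to 5 do not move at all, because phi of residue 5 only controls atoms placed after its CA; every downstream CA moves as the rest of the chain swings about the N-CA bond of residue 5.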

[READ] Further Reading¶

Jumper et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature 596:583-589. DOI: 10.1038/s41586-021-03819-2
Parsons et al. (2005). "Practical conversion from torsion space to Cartesian space for in silico protein synthesis." J Comput Chem 26:1063-1068. DOI: 10.1002/jcc.20237
Ramachandran et al. (1963). "Stereochemistry of polypeptide chain configurations." J Mol Biol 7:95-99. DOI: 10.1016/S0022-2836(63)80023-6

[OK] Lab Session Complete!

You've mastered internal coordinates and NeRF geometry!
