📐 Geometry Tools Lab: Structural Engineering from Scratch¶
Mastering Protein Structure Manipulation with synth-pdb.geometry¶
🎯 Overview¶
The synth_pdb.geometry module is the high-performance heart of our structural engine. It provides domain-agnostic tools for building, aligning, and analyzing molecular structures.
In this lab, you will act as a Structural Engineer to:
- Build an atomic model from scratch using Internal Coordinates (NeRF).
- Align structural fragments using the Kabsch Algorithm.
- Audit structural quality using Dihedral Analysis and RMSD.
Performance Note: Most functions here are JIT-compiled with Numba for near-C speeds, making them suitable for large-scale MD analysis or AI training loops.
# 🛠️ SETUP: Environment and Dependencies
import os
import sys
from io import StringIO
# 1. Determine environment
IN_COLAB = 'google.colab' in sys.modules
# 2. Setup path and dependencies
if IN_COLAB:
print("🌐 Running in Google Colab")
%pip install -q biotite py3Dmol ipywidgets synth-pdb
else:
print("💻 Running locally")
sys.path.append(os.path.abspath('../../'))
# 3. Imports
import biotite.structure as struc
import biotite.structure.io.pdb as pdb
import numpy as np
import py3Dmol
from synth_pdb.geometry import (
calculate_dihedral,
calculate_rmsd,
position_atom_3d_from_internal_coords,
rotate_points,
superimpose_structures,
)
print("✅ Setup complete and imports successful!")
🏗️ Part 1: The Atomic Architect (NeRF Reconstruction)¶
Protein structures are often defined by Internal Coordinates: bond lengths ($d$), bond angles ($\\theta$), and dihedral angles ($\\phi$). The Natural Extension Reference Frame (NeRF) algorithm converts these into 3D Cartesian coordinates ($x, y, z$).
Let's build a C-alpha atom ($C\alpha_{i}$) based on the positions of the preceding backbone atoms ($N_{i-1}, C\alpha_{i-1}, C_{i-1}$).
# 1. Define previous 3 atoms (The 'Reference Frame')
# Units: Angstroms
pos_n = np.array([0.00, 0.00, 0.00])
pos_ca = np.array([1.46, 0.00, 0.00])
pos_c = np.array([2.01, 1.34, 0.00])
# 2. Define geometry for the NEXT atom (another Nitrogen)
bond_len = 1.33 # C-N peptide bond
angle = np.radians(116.0) # C-alpha-C-N angle
dihedral = np.radians(180.0) # Trans-peptide bond
# 3. Reconstruct position
next_n_pos = position_atom_3d_from_internal_coords(
pos_n, pos_ca, pos_c,
bond_len, angle, dihedral
)
print(f"Reconstructed N Position: {next_n_pos}")
print(f"Bond Check (Expected 1.33): {np.linalg.norm(next_n_pos - pos_c):.3f} Å")
🔄 Part 2: The Structural Surgeon (Superposition)¶
To compare structural models, we must first align them. The Kabsch Algorithm finds the optimal rotation and translation to minimize the Root Mean Square Deviation (RMSD).
We will load a reference structure of Ubiquitin (1UBQ), create a scrambled 'mobile' copy, and use superimpose_structures to heal the alignment.
# 1. Fetch reference (1UBQ)
try:
# Attempt local load from examples
ubq_file = pdb.PDBFile.read('../../examples/1UBQ.pdb')
ubq_ref = ubq_file.get_structure(model=1)
ca_ref = ubq_ref[ubq_ref.atom_name == 'CA'].coord
# 2. Scramble the coordinates (Rotate 45° and Translate)
theta = np.radians(45)
R_scramble = np.array([
[np.cos(theta), -np.sin(theta), 0],
[np.sin(theta), np.cos(theta), 0],
[0, 0, 1]
])
ca_mobile = (R_scramble @ ca_ref.T).T + np.array([50, -20, 10])
print(f"Initial RMSD (Scrambled): {calculate_rmsd(ca_mobile, ca_ref):.2f} Å")
# 3. Apply Kabsch Superposition
ca_aligned = superimpose_structures(mobile=ca_mobile, reference=ca_ref)
final_rmsd = calculate_rmsd(ca_aligned, ca_ref)
print(f"Final RMSD (Aligned): {final_rmsd:.6f} Å")
# Visual confirmation
view = py3Dmol.view(width=800, height=400)
# We'll use spheres to visualize the point cloud alignment
for i, pos in enumerate(ca_ref[:20]): # Show first 20 atoms
view.addSphere({'center': {'x': float(pos[0]), 'y': float(pos[1]), 'z': float(pos[2])},
'radius': 0.5, 'color': 'blue', 'opacity': 0.5})
for i, pos in enumerate(ca_aligned[:20]):
view.addSphere({'center': {'x': float(pos[0]), 'y': float(pos[1]), 'z': float(pos[2])},
'radius': 0.3, 'color': 'red'})
view.zoomTo()
view.show()
print("Blue = Reference | Red = Aligned Mobile")
except Exception as e:
print(f"⚠️ Structural files missing, skipping live visualization: {e}")
🧪 Part 3: The Geometric Analyst (Dihedrals)¶
Finally, let's analyze the Backbone Torsion Angles ($\\phi, \\psi$). These angles determine the secondary structure of the protein.
Our calculate_dihedral function uses a robust Praxeolitic formula that avoids numerical instability (NaNs) when atoms are nearly collinear—a common failure point in simpler geometry libraries.
# Example: Measuring the Psi (ψ) angle of the first residue in Ubiquitin
if 'ubq_ref' in locals():
res1 = ubq_ref[ubq_ref.res_id == 1]
res2 = ubq_ref[ubq_ref.res_id == 2]
# Psi is defined by N(i) - CA(i) - C(i) - N(i+1)
n1 = res1[res1.atom_name == 'N'].coord[0]
ca1 = res1[res1.atom_name == 'CA'].coord[0]
c1 = res1[res1.atom_name == 'C'].coord[0]
n2 = res2[res2.atom_name == 'N'].coord[0]
psi = calculate_dihedral(n1, ca1, c1, n2)
print(f"Measured ψ for Res 1: {psi:.2f}°")
if psi > 100:
print("Result: Likely part of a Beta-sheet or Poly-proline II helix.")
else:
print("Result: Likely helical or coiled.")
🎡 Part 4: Arbitrary Point Rotation¶
The rotate_points function is a low-level utility that rotates a set of points around an axis defined by two other points. This is the foundation for modeling side-chain rotamers or performing domain-swapping experiments.
# Rotate a point (Gamma atom) 120° around the CA-CB bond axis
p_ca = np.array([0.0, 0.0, 0.0])
p_cb = np.array([0.0, 0.0, 1.54]) # Axis of rotation
p_cg = np.array([[1.0, 0.0, 1.54]]) # Points to rotate (N, 3)
rotated_cg = rotate_points(p_cg, p_ca, p_cb, 120.0)
print(f"Original CG: {p_cg[0]}")
print(f"Rotated CG: {rotated_cg[0]}")
Congratulations! You've mastered the building blocks of structural bioinformatics. Whether you're building de-novo loops or analyzing massive MD trajectories, these tools provide the precision and performance you need.