📐 Geometry Tools Reference¶
Mastering Protein Structure Manipulation with synth-pdb.geometry¶
🎯 Overview¶
The synth_pdb.geometry module provides a high-performance, domain-agnostic suite of tools for structural biology calculations. Optimized for both speed (via Numba) and mathematical robustness (Praxeolitic dihedrals), these utilities are the core of our structural engines.
# 🛠️ SETUP: Environment and Dependencies
import os
import sys
# 1. Determine environment
IN_COLAB = 'google.colab' in sys.modules
# 2. Setup path and dependencies
if IN_COLAB:
print("🌐 Running in Google Colab")
# Install public dependencies
%pip install -q biotite py3Dmol ipywidgets
# Clone repository if module is missing
if not os.path.exists('synth-pdb'):
print("💾 Cloning synth-pdb repository for module access...")
!git clone -q https://github.com/elkins/synth-pdb.git
sys.path.append(os.path.abspath('synth-pdb'))
else:
sys.path.append(os.path.abspath('./'))
else:
print("💻 Running locally")
sys.path.append(os.path.abspath('../../'))
# 3. Imports
import numpy as np
import matplotlib.pyplot as plt
import biotite.structure as struc
import biotite.structure.io.pdb as pdb
from synth_pdb.geometry import (
superimpose_structures,
calculate_rmsd,
calculate_pairwise_rmsd,
calculate_dihedral,
position_atom_3d_from_internal_coords,
rotate_points,
superimpose_batch
)
print("✅ Setup complete and imports successful!")
1. Optimal Superposition (Kabsch Algorithm)¶
To compare structures, we must first align them. The Kabsch algorithm finds the optimal rotation and translation to minimize the Root Mean Square Deviation (RMSD).
# Load Ubiquitin (1UBQ) as a reference
try:
# Use the example PDB file provided with the repository
ubq_file = pdb.PDBFile.read('../../examples/1UBQ.pdb')
ubq_ref = ubq_file.get_structure(model=1)
ca_ref = ubq_ref[ubq_ref.atom_name == 'CA'].coord
# Create a 'mobile' structure with 45-degree rotation and 10,20,30 translation
theta = np.radians(45)
R_rand = np.array([
[np.cos(theta), -np.sin(theta), 0],
[np.sin(theta), np.cos(theta), 0],
[0, 0, 1]
])
ca_mobile = (R_rand @ ca_ref.T).T + np.array([10, 20, 30])
print(f'Initial distance between centroids: {np.linalg.norm(np.mean(ca_mobile, axis=0) - np.mean(ca_ref, axis=0)):.2f} Å')
# Superimpose mobile onto reference
ca_aligned = superimpose_structures(mobile=ca_mobile, reference=ca_ref)
print(f'Post-alignment RMSD: {calculate_rmsd(ca_aligned, ca_ref):.4f} Å')
except Exception as e:
print(f'⚠️ Skipping live PDB load: {e}')
2. Dihedral Analysis¶
calculate_dihedral uses a robust Praxeolitic formula that avoids numerical instability at 0/180 degrees. This is crucial for protein backbones where trans-peptides are common.
p1 = np.array([12.4, 15.3, 20.1])
p2 = np.array([13.8, 14.2, 19.8])
p3 = np.array([14.5, 13.0, 20.5])
p4 = np.array([15.2, 12.1, 19.9])
torsion = calculate_dihedral(p1, p2, p3, p4)
print(f'Calculated Torsion: {torsion:.2f}°')
3. Low-Level Point Rotation¶
Use rotate_points to rotate arbitrary coordinates around an axis defined by two points in space. This is the foundation of sidechain reconstruction.
# Rotate a point around a bond axis (simulating Chi1 rotation)
bond_start = np.array([0.0, 0.0, 0.0])
bond_end = np.array([0.0, 0.0, 1.5])
gamma_atom = np.array([[1.0, 0.0, 1.5]]) # Array shape (N, 3)
rotated_gamma = rotate_points(gamma_atom, bond_start, bond_end, 90.0)
print(f'Original Gamma: {gamma_atom[0]}')
print(f'Rotated Gamma (90°): {rotated_gamma[0]}')
4. NeRF Reconstruction¶
NeRF converts internal coordinates (bond length, angle, dihedral) into 3D Cartesian coordinates.
# Reconstruct a position from internal coordinates
node_n = np.array([0, 0, 0])
node_ca = np.array([1.45, 0, 0])
node_c = np.array([2.0, 1.2, 0])
dist = 1.52
angle = np.radians(111.0)
dihedral = np.radians(180.0)
pos = position_atom_3d_from_internal_coords(node_n, node_ca, node_c, dist, angle, dihedral)
print(f'Reconstructed Position: {pos}')