synth-pdb

License: MIT

Generate realistic PDB files with mixed secondary structures for bioinformatics testing, education, and tool development.

⚠️ Important: The generated structures use idealized geometries and may contain violations of standard structural constraints. These files are intended for testing computational tools and educational demonstrations, not for simulation or experimental validation.


Why synth-pdb?

In structural biology and bioinformatics, researchers frequently require datasets of protein structures to test algorithms, train machine learning models, or validate analytical pipelines. While the Protein Data Bank (PDB) contains over 200,000 experimental structures, relying solely on experimental data has limitations:

  1. Bias: PDB data is biased toward crystallizable or stable proteins.
  2. Complexity: Experimental files often contain artifacts, missing atoms, or non-standard residues.
  3. Lack of Ground Truth: For NMR assignment or structure calculation, "perfect" synthetic data is essential for unit testing.

synth-pdb fills this gap by providing a lightweight, deterministic generator that produces chemically valid, full-atom PDB files with user-defined secondary structures (helices, sheets) in seconds.

Educational Philosophy: Code as Textbook 🎓

synth-pdb is built on the principle that scientific software should be readable and educational.

  • Code as Textbook: We reject "black box" algorithms. Our source code (e.g., generator.py, physics.py) is heavily annotated with the biophysical reasoning behind each step, explaining concepts like Boltzmann weighting, order parameters (\(S^2\)), and NOE distance dependence (\(r^{-6}\)).
  • Visual Learning: With the --visualize flag, students can instantly see how abstract concepts manifest in 3D, bridging the gap between equations and biology.
  • Integrity: Specialized tests ensure educational notes remain in the codebase, preventing refactoring from stripping away the scientific context.
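
As an illustration of two of the concepts named above, here is a minimal, self-contained sketch of the \(r^{-6}\) NOE distance dependence and of Boltzmann weighting. It is not taken from generator.py or physics.py; the function names, the 2.5 Å reference distance, and the default temperature are assumptions chosen for the example.

```python
import math

# Gas constant in kcal/(mol*K); energies below are assumed to be in kcal/mol.
R_KCAL = 0.0019872041


def relative_noe(r_angstrom, r_ref=2.5):
    """NOE intensity relative to a reference distance, assuming I is proportional to r^-6.

    r_ref is a hypothetical calibration distance chosen for illustration.
    """
    return (r_ref / r_angstrom) ** 6


def boltzmann_weights(energies_kcal, temperature_k=298.15):
    """Normalized Boltzmann populations for a list of conformer energies."""
    e_min = min(energies_kcal)  # shift by the minimum for numerical stability
    factors = [
        math.exp(-(e - e_min) / (R_KCAL * temperature_k)) for e in energies_kcal
    ]
    total = sum(factors)
    return [f / total for f in factors]


# Doubling the distance from 2.5 A to 5.0 A cuts the intensity by a factor of 64.
print(relative_noe(5.0))
# Lower-energy conformers receive larger populations.
print(boltzmann_weights([0.0, 0.5, 2.0]))
```

The steep sixth-power falloff is why NOE contacts are usually only observable for proton pairs within a few ångströms of each other.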

Key Features

✨ Structure Generation

  • Full atomic representation: backbone and side-chain heavy atoms plus hydrogens.
  • Customizable sequence (1-letter or 3-letter amino acid codes).
  • Conformational diversity: Generate alpha helices, beta sheets, extended chains, or random conformations.
  • Rotamer-based side-chain placement for all 20 standard amino acids using the Dunbrack library.
  • Advanced Chemistry: Metal coordination (Zn2+), Disulfide bonds (SSBOND), and PTM support (SEP, TPO, PTR).

🔬 Validation Suite

  • Geometric Checks: Bond length, bond angle (Engh & Huber Z-scores), and peptide plane planarity.
  • Ramachandran Checking: Backbone dihedrals are checked against the Top2018 high-resolution datasets.
  • Physical Validation: Steric clash detection and SASA-based burial ratios.
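
To make the geometric checks concrete, the sketch below computes a bond-length Z-score and a simple distance-based clash test. It is an illustration only, not the synth-pdb validator: the target values are commonly quoted Engh & Huber backbone parameters reproduced from memory (verify against the original tables before relying on them), and the 2.0 Å clash cutoff and all function names are assumptions.

```python
import math

# (mean, sigma) in angstroms for backbone bonds.
# Assumed values: commonly cited Engh & Huber targets, for illustration only.
IDEAL_BONDS = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def bond_z_score(atom_a, atom_b, coord_a, coord_b):
    """Number of standard deviations the observed bond length lies from the target."""
    mean, sigma = IDEAL_BONDS[(atom_a, atom_b)]
    length = math.dist(coord_a, coord_b)
    return (length - mean) / sigma


def has_clash(coord_a, coord_b, min_distance=2.0):
    """Flag two non-bonded atoms closer than a hypothetical cutoff (angstroms)."""
    return math.dist(coord_a, coord_b) < min_distance


# An N-CA bond stretched to 1.496 A sits about two sigma above the target.
print(bond_z_score("N", "CA", (0.0, 0.0, 0.0), (1.496, 0.0, 0.0)))
print(has_clash((0.0, 0.0, 0.0), (1.5, 0.0, 0.0)))
```

A real clash detector would use per-element van der Waals radii and exclude bonded neighbours; the single cutoff here only shows the shape of the check.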

βš™οΈ Quality Control & Physics

  • --best-of-N: Generate multiple structures and select the one with the fewest violations.
  • Energy Minimization: Relax structures using OpenMM (implicit solvent, AMBER force field).
  • Quality Filtering: Integrated Random Forest and GNN classifiers for structural plausibility.
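
The best-of-N idea can be sketched in a few lines: generate N candidates, score each, and keep the one with the fewest violations. The generator below is a hypothetical stand-in that returns random violation counts; it does not call synth-pdb, and the names `generate_candidate` and `best_of_n` are invented for this example.

```python
import random


def generate_candidate(rng):
    """Hypothetical stand-in for a structure generator plus validator.

    Returns a record with a fake violation count instead of a real structure.
    """
    return {"id": rng.randrange(10**6), "violations": rng.randint(0, 20)}


def best_of_n(n, seed=0):
    """Generate n candidates and return (best, all_candidates).

    A fixed seed keeps the selection reproducible between runs.
    """
    rng = random.Random(seed)
    candidates = [generate_candidate(rng) for _ in range(n)]
    best = min(candidates, key=lambda c: c["violations"])
    return best, candidates


best, candidates = best_of_n(10, seed=42)
print(f"kept candidate {best['id']} with {best['violations']} violations")
```

Seeding the random number generator is what lets a selection like this stay deterministic: the same seed and N always yield the same winner.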

📚 Interactive Tutorial Catalog

Explore synth-pdb through our curated interactive tutorials. Each notebook can be opened directly in Google Colab.

🔬 Core Biophysics & NMR

| Tutorial | Difficulty | Time | Action |
|---|---|---|---|
| The Virtual NMR Spectrometer | ⭐⭐ | 25 min | Open In Colab |
| Cryo-EM & SAXS Lab | ⭐ | 20 min | Open In Colab |
| BMRB Validation Pipeline | ⭐⭐ | 25 min | Open In Colab |
| Ubiquitin Validation Suite | ⭐⭐⭐ | 45 min | Open In Colab |
| RDC Alignment Tensor Explorer | ⭐⭐ | 30 min | Open In Colab |
| RPF Score Validation | ⭐⭐ | 25 min | Open In Colab |
| NeRF Geometry Lab | ⭐⭐ | 25 min | Open In Colab |
| Modern Formats: mmCIF & BCIF | ⭐⭐ | 15 min | Open In Colab |
| The GFP Molecular Forge | ⭐⭐ | 30 min | Open In Colab |
| IDP Conformational Ensembles | ⭐⭐⭐ | 30 min | Open In Colab |
| AlphaFold pLDDT vs NMR S² | ⭐⭐⭐ | 35 min | Open In Colab |
| GNN pLDDT Explorer | ⭐⭐ | 30 min | Open In Colab |

🤖 ML & AI Integration

| Tutorial | Difficulty | Time | Action |
|---|---|---|---|
| Bulk Dataset Factory | ⭐ | 15 min | Open In Colab |
| Hard Decoy Challenge | ⭐⭐⭐ | 35 min | Open In Colab |
| PLM Embeddings (ESM-2) | ⭐⭐ | 30 min | Open In Colab |
| Co-evolution Factory | ⭐⭐⭐ | 35 min | Open In Colab |
| 6D Orientogram Lab | ⭐⭐⭐ | 30 min | Open In Colab |
| Drug Discovery Pipeline | ⭐⭐⭐ | 35 min | Open In Colab |

🎓 Learning Paths

Choose a path based on your background and goals:

🤖 For ML Engineers

Build AI models with synthetic protein data

  1. AI Protein Data Factory (15 min) - Learn zero-copy data handover to PyTorch/JAX.
  2. Bulk Dataset Factory (15 min) - Generate thousands of training samples.
  3. Hard Decoy Challenge (35 min) - Create negative samples for robust training.
  4. PLM Embeddings (ESM-2) (30 min) - Add evolutionary context as per-residue node features.

🔬 For Biophysicists

Understand structure, dynamics, and spectroscopy

  1. NeRF Geometry Lab (25 min) - Learn internal coordinate systems.
  2. Virtual NMR Spectrometer (25 min) - Predict relaxation rates and chemical shifts.
  3. Protein Quality Assessment (25 min) - Validate structure quality and geometry.
  4. GNN pLDDT Explorer (30 min) - Score with a GNN; interpret per-residue pLDDT; compute TM-score and lDDT.
  5. GFP Molecular Forge (30 min) - Explore chromophore chemistry.
  6. AlphaFold pLDDT vs NMR S² (35 min) - Contrast AI rigidity with physical dynamics.

💊 For Drug Designers

Design and optimize therapeutic peptides

  1. Drug Discovery Pipeline (35 min) - End-to-end peptide library to lead selection.
  2. Macrocycle Design Lab (20 min) - Create head-to-tail cyclic peptides.
  3. Bio-Active Hormone Lab (20 min) - Model bioactive peptide hormones.
  4. Hard Decoy Challenge (35 min) - Generate decoys for docking validation.

Quick Visual Demo

Run this command to generate a Leucine Zipper, minimize its energy using OpenMM, and visualize it in your browser:

synth-pdb --sequence "LKELEKELEKELEKELEKELEKEL" --conformation alpha --minimize --visualize
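
To script the same invocation from Python, a thin wrapper around the command line is one option. The sketch below assumes synth-pdb is installed and available on PATH, and it uses only the flags shown in the command above; the helper names `build_command` and `run_synth_pdb` are hypothetical and are not part of the synth-pdb package.

```python
import shlex
import subprocess


def build_command(sequence, conformation="alpha", minimize=True, visualize=False):
    """Assemble the argument list for the synth-pdb CLI.

    Only flags that appear in the demo command are used here.
    """
    cmd = ["synth-pdb", "--sequence", sequence, "--conformation", conformation]
    if minimize:
        cmd.append("--minimize")
    if visualize:
        cmd.append("--visualize")
    return cmd


def run_synth_pdb(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


cmd = build_command("LKELEKELEKELEKELEKELEKEL")
# Print the shell-quoted command for inspection before running it.
print(shlex.join(cmd))
# run_synth_pdb(cmd)  # uncomment once synth-pdb is installed
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when sequences are generated programmatically.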

Citation

If you use synth-pdb in your research, please cite it:

@software{elkins_synth_pdb_2026,
  author = {Elkins, George},
  title = {synth-pdb: High-Performance Protein Structure Generator},
  url = {https://github.com/elkins/synth-pdb},
  version = {1.37.0},
  year = {2026}
}