Command-Line Reference
This guide provides a comprehensive reference for the synth-pdb command-line interface.
Basic Usage
The simplest way to use synth-pdb is to specify a sequence length:
python -m synth_pdb.main --length 20 --output my_protein.pdb
Or provide a specific amino acid sequence:
python -m synth_pdb.main --sequence "ALA-GLY-SER-THR-VAL" --output test.pdb
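When synth-pdb is driven from a script rather than typed by hand, it can help to assemble the argument list programmatically. The sketch below is not part of synth-pdb: `build_command` is a hypothetical helper that turns keyword arguments into the flags documented on this page and only builds the command, so it runs without synth-pdb installed.

```python
import shlex


def build_command(**options):
    """Return an argv list for a synth-pdb call (hypothetical helper)."""
    argv = ["python", "-m", "synth_pdb.main"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switch such as --validate or --visualize.
            argv.append(flag)
        elif value is not None and value is not False:
            argv.extend([flag, str(value)])
    return argv


argv = build_command(length=20, output="my_protein.pdb")
print(shlex.join(argv))
# To execute it (requires synth-pdb): subprocess.run(argv, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with values such as "ALA-GLY-SER-THR-VAL".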
Core Options
| Option | Description | Default |
| --- | --- | --- |
| --length | Length of the amino acid sequence (number of residues). | 10 |
| --sequence | Specify an amino acid sequence (e.g., 'AGV' or 'ALA-GLY-VAL'). | (Random) |
| --output | Output filename. | (Generated) |
| --format | Output file format: pdb, cif, bcif. | pdb |
| --conformation | Secondary structure conformation: alpha, beta, ppii, extended, random. | alpha |
| --structure | Per-region conformation specification (e.g., '1-10:alpha,11-14:typeII,15-20:beta'). | - |
| --seed | Random seed for reproducible generation. | - |
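The --structure value is a comma-separated list of `start-end:conformation` regions. To make the format concrete, here is a minimal parsing sketch. It is not synth-pdb's own parser; `parse_structure_spec` is a hypothetical name, and treating the ranges as 1-based and inclusive is an assumption based on the example above.

```python
def parse_structure_spec(spec):
    """Split '1-10:alpha,11-14:typeII' into (start, end, label) tuples."""
    regions = []
    for part in spec.split(","):
        span, conformation = part.strip().split(":")
        start, end = (int(x) for x in span.split("-"))
        # Assumption: residue numbers are 1-based and the range is inclusive.
        if start < 1 or end < start:
            raise ValueError(f"invalid residue range: {span!r}")
        regions.append((start, end, conformation))
    return regions


for region in parse_structure_spec("1-10:alpha,11-14:typeII,15-20:beta"):
    print(region)
```

The conformation label is kept as an opaque string here; which labels are accepted is decided by synth-pdb itself.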
Validation & Refinement
| Option | Description |
| --- | --- |
| --validate | Run validation checks (bond lengths, angles, Ramachandran). |
| --guarantee-valid | Repeatedly generate until a valid structure is produced. |
| --max-attempts | Maximum number of attempts for --guarantee-valid. |
| --best-of-N | Generate N structures and select the one with the fewest violations. |
| --refine-clashes | Number of iterations to minimally adjust clashing atoms. |
| --optimize | Run Monte Carlo side-chain optimization. |
| --minimize | Run physics-based energy minimization using OpenMM. |
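The selection rule behind --best-of-N can be illustrated with a small sketch. This is not synth-pdb's code: `best_of_n`, `toy_generate`, and `toy_violations` are hypothetical stand-ins, and the toy "structure" is just a list of numbers so that the example runs on its own.

```python
import random


def best_of_n(generate, count_violations, n, seed=0):
    """Generate n candidates and keep the one with the fewest violations."""
    best, best_score = None, None
    for i in range(n):
        candidate = generate(seed + i)
        score = count_violations(candidate)
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    return best, best_score


def toy_generate(seed):
    # Stand-in for structure generation: ten pseudo-random values.
    rng = random.Random(seed)
    return [rng.random() for _ in range(10)]


def toy_violations(candidate):
    # Stand-in for validation: count values above an arbitrary cutoff.
    return sum(1 for value in candidate if value > 0.8)


best, score = best_of_n(toy_generate, toy_violations, n=5)
print(score)
```

By contrast, --guarantee-valid is described as regenerating until a valid structure appears, bounded by --max-attempts, so it stops at the first acceptable candidate instead of comparing all of them.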
Scientific Features
NMR Observables
| Option | Description |
| --- | --- |
| --gen-shifts | Generate synthetic chemical shift data (H, N, CA, CB, C). |
| --shift-predictor | Predictor to use: shiftx2 (default) or empirical. |
| --gen-relax | Generate synthetic NMR relaxation data (R1, R2, NOE). |
| --output-rdcs | Generate backbone N-H Residual Dipolar Coupling (RDC) data. |
| --gen-couplings | Generate synthetic 3J(HN-HA) scalar couplings. |
| --gen-nef | Generate synthetic NMR data (NOE restraints) in NEF format. |
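For context on --gen-couplings: 3J(HN-HA) couplings are commonly related to the backbone dihedral angle phi through a Karplus equation. The sketch below uses the widely cited Vuister and Bax (1993) coefficients. Whether synth-pdb uses this exact parameterization is an assumption, and `j_hn_ha` is a hypothetical name.

```python
import math

# Karplus coefficients in Hz (Vuister & Bax, 1993).
# Assumption: synth-pdb may use a different parameter set.
A, B, C = 6.51, -1.76, 1.60


def j_hn_ha(phi_deg):
    """3J(HN-HA) in Hz for a backbone phi angle given in degrees."""
    theta = math.radians(phi_deg - 60.0)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


for label, phi in [("helix-like", -57.0), ("strand-like", -120.0)]:
    print(label, round(j_hn_ha(phi), 2))
```

Helical phi angles give small couplings and extended phi angles give large ones, which is why these values are useful as secondary-structure indicators.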
MSA & Evolution
| Option | Description |
| --- | --- |
| --gen-msa | Generate synthetic Multiple Sequence Alignment (MSA) via simulated evolution. |
| --msa-depth | Number of sequences to generate for MSA (default: 100). |
| --evolution-temp | Thermal noise of MSA MCMC evolution (default: 1.5). |
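To give a feel for what a temperature does in an MCMC scheme, here is a sketch of the standard Metropolis acceptance rule. It is a generic illustration, not synth-pdb's implementation: the energy change `delta_energy` belongs to a hypothetical scoring function, and applying this rule to --evolution-temp is an assumption.

```python
import math
import random


def acceptance_probability(delta_energy, temperature):
    """Metropolis rule: always accept improvements, sometimes accept worse moves."""
    if delta_energy <= 0:
        return 1.0
    return math.exp(-delta_energy / temperature)


def metropolis_accept(delta_energy, temperature, rng=random):
    """Draw a random number and decide whether to accept the proposed mutation."""
    return rng.random() < acceptance_probability(delta_energy, temperature)


# The same unfavorable mutation at a low temperature and at the default of 1.5.
for temperature in (0.5, 1.5):
    print(temperature, round(acceptance_probability(1.0, temperature), 3))
```

Under this rule, a higher temperature accepts more unfavorable mutations and would therefore produce a more diverse alignment.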
Physics & Chemistry
| Option | Description |
| --- | --- |
| --forcefield | Forcefield for minimization (default: amber14-all.xml). |
| --solvent | Solvent model: obc2, obc1, gbn, gbn2, hct, explicit. |
| --cap-termini | Add N-terminal Acetyl (ACE) and C-terminal N-methylamide (NME) caps. |
| --ph | pH for determining protonation states (default: 7.4). |
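How a pH value translates into protonation states can be sketched with the Henderson-Hasselbalch relation. This is a textbook illustration only: the histidine side-chain pKa of about 6.0 is an approximate reference value, `protonated_fraction` is a hypothetical name, and how synth-pdb actually assigns protonation states is not specified here.

```python
def protonated_fraction(ph, pka):
    """Fraction of a titratable group that is protonated at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumption: approximate side-chain pKa for histidine.
HIS_PKA = 6.0

for ph in (5.0, 6.0, 7.4):
    print(ph, round(protonated_fraction(ph, HIS_PKA), 3))
```

At the default pH of 7.4 only a few percent of histidine side chains are protonated under this model, while at pH 5.0 most are.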
Advanced Modes (--mode)
synth-pdb supports several specialized operation modes:
generate: (Default) Generate a single structure.
decoys: Generate an ensemble of structures (decoys).
cryo-em: Generate 3D density maps (MRC format) from structures or ensembles.
saxs: Simulate Small-Angle X-ray Scattering (SAXS) profiles.
docking: Prepare structures for docking (PQR format, charge assignment).
pymol: Generate PyMOL scripts for visualization.
dataset: Bulk generation for machine learning datasets.
ai: Structure interpolation and clustering.
Cryo-EM Mode Options
| Option | Description | Default |
| --- | --- | --- |
| --resolution | Target resolution in Angstroms (Å). | 3.0 |
| --grid-spacing | Voxel size in Angstroms (Å). | 1.0 |
| --mrc-output | Filename for the output density map. | synthetic_map.mrc |
SAXS Mode Options
| Option | Description | Default |
| --- | --- | --- |
| --q-max | Maximum scattering vector \(q\) (Å⁻¹). | 0.5 |
| --saxs-points | Number of points in the \(I(q)\) curve. | 51 |
| --saxs-output | Filename for the output .dat file. | synthetic_saxs.dat |
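The --q-max and --saxs-points defaults together determine the sampling of the scattering curve. The sketch below shows one plausible reading: an evenly spaced grid that starts at q = 0 and includes q-max. That layout is an assumption, not documented behavior, and `q_grid` is a hypothetical helper.

```python
def q_grid(q_max=0.5, n_points=51):
    """Evenly spaced q values from 0 to q_max inclusive (assumed layout)."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    step = q_max / (n_points - 1)
    return [i * step for i in range(n_points)]


grid = q_grid()
print(len(grid), grid[0], round(grid[1], 4), round(grid[-1], 4))
```

With the defaults, this reading gives a spacing of 0.01 Å⁻¹ between consecutive points.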
AI Mode Options
| Option | Description | Default |
| --- | --- | --- |
| --ai-op | AI operation: interpolate or cluster. | - |
| --start-pdb | Start PDB file for interpolate. | - |
| --end-pdb | End PDB file for interpolate. | - |
| --steps | Number of steps for interpolate. | 10 |
| --input-pattern | Glob pattern for input PDB files for cluster (e.g., 'decoys/*.pdb'). | - |
| --n-clusters | Number of clusters to form for cluster. | 5 |
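As an illustration of what interpolating between two structures can mean, the sketch below blends matching atom coordinates linearly. This is the simplest possible scheme and is an assumption: synth-pdb may interpolate differently (for example in torsion space). `interpolate` is a hypothetical name, and counting both endpoints among the steps is also assumed.

```python
def interpolate(start, end, steps):
    """Return `steps` coordinate frames from start to end, endpoints included."""
    if len(start) != len(end):
        raise ValueError("both structures need the same number of atoms")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    frames = []
    for k in range(steps):
        t = k / (steps - 1)  # 0.0 at the start frame, 1.0 at the end frame
        frame = [
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Two toy "structures" with two atoms each (x, y, z in Angstroms).
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end = [(0.0, 2.0, 0.0), (1.0, 2.0, 4.0)]
for frame in interpolate(start, end, steps=5):
    print(frame)
```

Plain Cartesian blending can distort bond lengths in intermediate frames, which is one reason a real tool might prefer a different scheme.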
AI Training & Benchmarking
synth-pdb includes a suite of scripts for training and validating GNN-based quality filters.
GNN Training (scripts/train_gnn_quality_filter.py)
| Option | Description |
| --- | --- |
| --output | Path to save the trained .pt model. |
| --n-samples | Number of synthetic structures to generate for training. |
| --epochs | Number of training iterations. |
| --diverse-good | Recommended. Samples Alpha, Beta, and PPII for the "Good" category. |
| --residue-loss-weight | Weight (λ) for the auxiliary pLDDT regression task. |
GNN Benchmarking
scripts/compare_gnn_diversity.py: Compares model accuracy across different secondary structure motifs (Alpha vs Beta vs PPII).
scripts/stress_test_gnn.py: Adversarial testing to find the "Breaking Point" (Critical Drift) of a model's physical understanding.
Visualization
Use the --visualize flag to open the generated structure in a browser-based 3D viewer (powered by 3Dmol.js).
python -m synth_pdb.main --sequence "MEELQK" --visualize