generator Module

The generator module is the core of synth-pdb, responsible for creating protein structures from amino acid sequences.

Overview

The generator uses the NeRF (Natural Extension Reference Frame) algorithm to build 3D protein structures from internal coordinates (bond lengths, angles, and dihedrals).

Main Classes

`PeptideGenerator`

Object-oriented wrapper for protein structure generation. Provides a cleaner API for interactive notebooks and complex workflows.

Source code in synth_pdb/generator.py

class PeptideGenerator:
    """Object-oriented wrapper for protein structure generation.
    Provides a cleaner API for interactive notebooks and complex workflows.
    """

    def __init__(self, sequence: str = "ALA-GLY-SER", **kwargs: Any) -> None:
        self.sequence = sequence
        self.config = kwargs
        self._last_result: Optional[PeptideResult] = None

    def generate(self, **overrides: Any) -> "PeptideResult":
        """Generates the protein structure and returns a Result object."""
        # Merge init config with call-time overrides
        call_config = {**self.config, **overrides}

        # Call the functional generator
        pdb_content = generate_pdb_content(sequence_str=self.sequence, **call_config)

        # Package into a Result object for easy access
        self._last_result = PeptideResult(pdb_content)
        return self._last_result
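Because `generate()` builds `call_config` as `{**self.config, **overrides}`, keyword arguments passed at call time take precedence over those given to the constructor, and `self.config` itself is left unchanged. The standalone sketch below reproduces just that merge step. `merge_config` is a hypothetical helper for illustration, not part of synth-pdb.

```python
from typing import Any, Dict


def merge_config(init_config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper mirroring the merge in PeptideGenerator.generate().

    Keys from `overrides` replace keys from `init_config`; neither input is mutated.
    """
    return {**init_config, **overrides}


# Options stored at construction time, e.g. PeptideGenerator(seq, conformation="beta", seed=42)
init_config = {"conformation": "beta", "seed": 42}

# A one-off override, e.g. gen.generate(conformation="alpha")
call_config = merge_config(init_config, {"conformation": "alpha"})
print(call_config)  # {'conformation': 'alpha', 'seed': 42}
```

A generator created with a fixed `seed` can therefore be reused to produce variants, with each call changing only the options it names.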

Functions

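`__init__(sequence='ALA-GLY-SER', **kwargs)`

Stores `sequence` and keeps any keyword arguments in `self.config` as the default options for later `generate()` calls.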


`generate(**overrides)`

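Generates the structure from the constructor configuration merged with any call-time `overrides` (on conflicting keys, `overrides` win), stores the resulting `PeptideResult` as the most recent result, and returns it.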


Main Functions

generate_pdb_content(length: Optional[int] = None, sequence_str: Optional[str] = None, use_plausible_frequencies: bool = False, conformation: str = 'alpha', structure: Optional[str] = None, optimize_sidechains: bool = False, minimize_energy: bool = False, forcefield: str = 'amber14-all.xml', solvent_model: str = 'obc2', solvent_padding: float = 1.0, keep_solvent: bool = False, seed: Optional[int] = None, ph: float = 7.4, cap_termini: bool = False, equilibrate: bool = False, equilibrate_steps: int = 1000, metal_ions: str = 'auto', minimization_k: float = 10.0, minimization_max_iter: int = 0, cis_proline_frequency: float = 0.05, phosphorylation_rate: float = 0.0, cyclic: bool = False, drift: float = 0.0, phi_list: Optional[List[float]] = None, psi_list: Optional[List[float]] = None, omega_list: Optional[List[float]] = None) -> str

Generates PDB content for a linear or cyclic peptide chain.

EDUCATIONAL NOTE - Cyclic Peptides: Cyclic peptides have their N-terminus bonded to their C-terminus. This modification increases metabolic stability and is common in therapeutic peptides (e.g., cyclosporin A).

Parameters:

Name	Type	Description	Default
`length`	`Optional[int]`	Number of residues (ignored if sequence_str provided)	`None`
`sequence_str`	`Optional[str]`	Explicit amino acid sequence (1-letter or 3-letter codes)	`None`
`use_plausible_frequencies`	`bool`	Use biologically realistic amino acid frequencies	`False`
`conformation`	`str`	Default secondary-structure conformation: 'alpha' (alpha helix), 'beta', 'ppii', 'extended', or 'random'. Applies to every residue when `structure` is not given, and otherwise to residues not covered by `structure`.	`'alpha'`
`structure`	`Optional[str]`	Per-region conformations in the form "start-end:conformation,start-end:conformation,...", e.g. "1-10:alpha,11-15:random,16-30:beta". Listed regions override `conformation`; unlisted residues fall back to it.	`None`
`drift`	`float`	Maximum random perturbation applied to phi/psi angles (degrees). Used for "hard decoy" generation to create near-native conformations.	`0.0`
`optimize_sidechains`	`bool`	Run Monte Carlo side-chain optimization	`False`
`minimize_energy`	`bool`	Run OpenMM energy minimization (required to close the ring when `cyclic=True`)	`False`
`forcefield`	`str`	Forcefield to use for minimization	`'amber14-all.xml'`
`seed`	`Optional[int]`	Random seed for reproducible generation	`None`
`ph`	`float`	pH used to assign protonation states of titratable residues	`7.4`
`cap_termini`	`bool`	Add ACE/NME terminal caps (ignored when `cyclic=True`)	`False`
`equilibrate`	`bool`	Run MD equilibration	`False`
`equilibrate_steps`	`int`	Number of MD equilibration steps	`1000`
`metal_ions`	`str`	Metal-ion handling mode	`'auto'`
`minimization_k`	`float`	Energy-minimization convergence tolerance	`10.0`
`minimization_max_iter`	`int`	Maximum number of minimization iterations	`0`
`cis_proline_frequency`	`float`	Fraction of X-Pro peptide bonds built in the cis conformation	`0.05`
`phosphorylation_rate`	`float`	Fraction of eligible residues that are phosphorylated	`0.0`
`cyclic`	`bool`	Whether to generate a cyclic peptide (Head-to-Tail)	`False`

Returns:

Name	Type	Description
`str`	`str`	Complete PDB file content

Raises:

Type	Description
`ValueError`	If invalid conformation name or structure syntax provided
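The `structure` string can be read as a comma-separated list of inclusive residue ranges. The parser below is a minimal sketch of that format. It assumes 1-based residue numbering, assumes that later regions overwrite earlier ones where they overlap, and raises `ValueError` on malformed input, as listed under Raises. `parse_structure` and `VALID_CONFORMATIONS` are hypothetical names, and synth-pdb's own parser may differ.

```python
import re
from typing import Dict

VALID_CONFORMATIONS = {"alpha", "beta", "ppii", "extended", "random"}
# One region: "start-end:conformation", whitespace tolerated around it
_REGION = re.compile(r"^\s*(\d+)-(\d+):(\w+)\s*$")


def parse_structure(structure: str, length: int, default: str = "alpha") -> Dict[int, str]:
    """Map each 1-based residue index to a conformation name (hypothetical helper)."""
    if default not in VALID_CONFORMATIONS:
        raise ValueError(f"Invalid conformation: {default!r}")
    # Every residue starts with the default; listed regions override it.
    per_residue = {i: default for i in range(1, length + 1)}
    for chunk in structure.split(","):
        match = _REGION.match(chunk)
        if match is None:
            raise ValueError(f"Bad region {chunk!r}; expected 'start-end:conformation'")
        start, end, conf = int(match.group(1)), int(match.group(2)), match.group(3)
        if conf not in VALID_CONFORMATIONS:
            raise ValueError(f"Invalid conformation: {conf!r}")
        if not 1 <= start <= end <= length:
            raise ValueError(f"Region {start}-{end} lies outside residues 1-{length}")
        for i in range(start, end + 1):
            per_residue[i] = conf  # later regions overwrite earlier ones
    return per_residue


regions = parse_structure("1-5:alpha,6-10:random,11-15:alpha", length=20, default="beta")
print(regions[3], regions[8], regions[18])  # alpha random beta
```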

EDUCATIONAL NOTE - Why Per-Region Conformations Matter: Real proteins mix secondary structures. For example:

- Classic (C2H2) zinc fingers: a beta hairpin packed against an alpha helix
- Immunoglobulins: multiple beta sheets connected by loops
- Helix-turn-helix motifs: two alpha helices connected by a turn

The `structure` parameter lets users build these mixed architectures.

EDUCATIONAL NOTE - Macrocyclization (Cyclic Peptides):

Cyclic peptides (macrocycles) are chains whose N-terminus and C-terminus are covalently linked. This has important biological consequences:

1. Metabolic stability: with no free termini, they resist exopeptidases, which cleave residues from chain ends.
2. Binding affinity: locking the molecule into a specific shape reduces the entropic penalty of binding to a target.
3. Bioavailability: some macrocyclic peptides, notably cyclosporin A, are orally bioavailable despite their size.
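To check whether a structure generated with `cyclic=True` (and `minimize_energy=True`) actually closed, you can measure the distance between the carbonyl carbon (C) of the last residue and the amide nitrogen (N) of the first. This distance should be close to a typical peptide bond length of about 1.33 Å. The sketch below assumes those two coordinates have already been read from the PDB output. The function name and the 0.1 Å tolerance are illustrative choices.

```python
import math
from typing import Sequence

PEPTIDE_BOND_LENGTH = 1.33  # typical C-N peptide bond length, in Å


def is_head_to_tail_closed(
    c_last: Sequence[float],
    n_first: Sequence[float],
    tolerance: float = 0.1,
) -> bool:
    """Return True if C of the last residue sits at peptide-bond distance from N of the first.

    `tolerance` (Å) is an illustrative choice, not a synth-pdb setting.
    """
    distance = math.dist(c_last, n_first)  # Euclidean distance (Python 3.8+)
    return abs(distance - PEPTIDE_BOND_LENGTH) <= tolerance
```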

EDUCATIONAL NOTE - Hard Decoy Support (AI Training):

This generator includes specialized parameters for "hard decoy" generation:

1. Torsion drift (`drift`): adds controlled Gaussian noise to ideal \(\phi/\psi\) angles. This simulates near-native local structural errors that challenge the resolution of AI scoring functions.
2. Threading (`phi_list`, `psi_list`, `omega_list`): builds one sequence on the backbone torsion angles of another. This maps a "wrong" sequence onto a "right" fold, a key test for discriminative models.
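The parameter table describes `drift` as a maximum perturbation, while this note describes it as Gaussian noise. A bounded Gaussian reconciles the two. The sketch below is one such interpretation, not synth-pdb's implementation. It draws noise with standard deviation `drift / 3`, clips it to ±`drift`, and wraps each result into [-180, 180). Both the `drift / 3` scale and the clipping are assumptions. The output lists have the same shape as `phi_list`/`psi_list`: one angle per residue, assumed here to be in degrees.

```python
import random
from typing import List, Optional, Tuple


def wrap_angle(angle: float) -> float:
    """Wrap an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def apply_drift(
    phi: List[float],
    psi: List[float],
    drift: float,
    seed: Optional[int] = None,
) -> Tuple[List[float], List[float]]:
    """Perturb ideal phi/psi angles (degrees) with bounded Gaussian noise.

    Assumption: noise ~ N(0, drift / 3), clipped to [-drift, +drift] so that
    `drift` acts as the maximum perturbation.
    """
    rng = random.Random(seed)  # seeded generator, so decoys are reproducible

    def perturb(angle: float) -> float:
        noise = rng.gauss(0.0, drift / 3.0)
        noise = max(-drift, min(drift, noise))  # enforce the maximum
        return wrap_angle(angle + noise)

    return [perturb(a) for a in phi], [perturb(a) for a in psi]


# Ideal alpha-helix angles for a 4-residue stretch, then a near-native decoy
ideal_phi, ideal_psi = [-57.0] * 4, [-47.0] * 4
decoy_phi, decoy_psi = apply_drift(ideal_phi, ideal_psi, drift=10.0, seed=0)
```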

Usage Examples

Basic Generation

from synth_pdb.generator import PeptideGenerator

# Create generator
gen = PeptideGenerator("ALA-GLY-SER-LEU-VAL")

# Generate structure
peptide = gen.generate(conformation="alpha")

# Get PDB content
pdb_content = peptide.to_pdb()

# Save to file
with open("output.pdb", "w") as f:
    f.write(pdb_content)

Mixed Secondary Structures

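from synth_pdb.generator import PeptideGenerator

# Helix-turn-helix motif: two helices joined by a flexible turn
gen = PeptideGenerator("ACDEFGHIKLMNPQRSTVWY")
peptide = gen.generate(
    structure="1-5:alpha,6-10:random,11-15:alpha"  # residues 16-20 use the default conformation
)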

Random Sequence Generation

from synth_pdb.generator import generate_pdb_content

# Generate random 20-residue peptide
pdb_content = generate_pdb_content(
    length=20,
    conformation="random",
    use_plausible_frequencies=True  # Use biologically realistic frequencies
)

With Energy Minimization

pdb_content = generate_pdb_content(
    sequence_str="LKELEKELEKELEKEL",  # Helical peptide built from Leu/Lys/Glu repeats
    conformation="alpha",
    minimize_energy=True,
    cap_termini=True
)

Helper Functions

`_resolve_sequence(length, user_sequence_str=None, use_plausible_frequencies=False)`

Resolve the amino acid sequence from user input or random generation.

Parameters:

Name	Type	Description	Default
`length`	`Optional[int]`	Target length for random generation.	required
`user_sequence_str`	`Optional[str]`	Optional user-provided sequence (1-letter or 3-letter).	`None`
`use_plausible_frequencies`	`bool`	If True, uses biological frequencies for random.	`False`

Returns:

Type	Description
`List[str]`	List of 3-letter amino acid codes.
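The following is a minimal sketch of what `_resolve_sequence` is documented to do. It assumes that hyphen-separated input uses 3-letter codes and that anything else uses 1-letter codes. Under this rule, a lone code such as "ALA" written without hyphens would be read as three 1-letter codes. The frequency table is left to the caller because the biological frequencies synth-pdb uses are not given here. `resolve_sequence` and its `frequencies`/`seed` parameters are hypothetical.

```python
import random
from typing import Dict, List, Optional

ONE_TO_THREE: Dict[str, str] = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}
STANDARD_CODES = sorted(ONE_TO_THREE.values())


def resolve_sequence(
    length: Optional[int],
    sequence: Optional[str] = None,
    frequencies: Optional[Dict[str, float]] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Return a list of 3-letter codes from a user sequence or random sampling."""
    if sequence:
        if "-" in sequence:  # 3-letter form, e.g. "ALA-GLY-SER"
            codes = [token.strip().upper() for token in sequence.split("-")]
        else:  # 1-letter form, e.g. "AGS"
            codes = [ONE_TO_THREE.get(letter, letter) for letter in sequence.strip().upper()]
        unknown = sorted(set(codes) - set(STANDARD_CODES))
        if unknown:
            raise ValueError(f"Unrecognized residue codes: {unknown}")
        return codes  # user input wins; length is ignored
    if length is None or length < 1:
        raise ValueError("Provide either a sequence or a positive length")
    rng = random.Random(seed)
    # Uniform sampling unless the caller supplies per-residue weights
    weights = [frequencies.get(code, 0.0) for code in STANDARD_CODES] if frequencies else None
    return rng.choices(STANDARD_CODES, weights=weights, k=length)
```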

`_sample_ramachandran_angles(res_name, next_res_name=None)`

Sample phi/psi angles from Ramachandran probability distribution.

Uses residue-specific distributions for GLY and PRO, general distribution for all other amino acids. Samples from favored regions using weighted Gaussian distributions.

Pre-proline bias: if `next_res_name` is 'PRO' and the current residue is neither GLY nor PRO, a dedicated 'PRE_PRO' distribution that favors beta/extended conformations is used instead.

Parameters:

Name	Type	Description	Default
`res_name`	`str`	Three-letter amino acid code	required
`next_res_name`	`Optional[str]`	Three-letter code of the following residue (enables the pre-proline bias)	`None`

Returns:

Type	Description
`Tuple[float, float]`	Tuple of (phi, psi) angles in degrees

Reference

Lovell et al. (2003) Proteins: Structure, Function, and Bioinformatics
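The sampling scheme described above can be sketched as follows: choose a favored region by weight, then draw phi and psi from a Gaussian centred on it. The region centres, widths, and weights are illustrative placeholders chosen near the familiar alpha-helix, beta-sheet, and polyproline II basins. They are not the distributions synth-pdb actually ships. Only the selection logic for GLY, PRO, and the pre-proline case follows the description.

```python
import random
from typing import Dict, List, Optional, Tuple

# (weight, mean_phi, mean_psi, std_dev) for each favored region, in degrees.
# Illustrative placeholders only; NOT the distributions synth-pdb uses.
Region = Tuple[float, float, float, float]
DISTRIBUTIONS: Dict[str, List[Region]] = {
    "GENERAL": [(0.6, -63.0, -43.0, 10.0), (0.3, -120.0, 130.0, 15.0), (0.1, -75.0, 145.0, 10.0)],
    "GLY": [(0.4, -63.0, -43.0, 15.0), (0.3, 60.0, 45.0, 15.0), (0.3, -80.0, 170.0, 20.0)],
    "PRO": [(0.6, -65.0, 145.0, 10.0), (0.4, -65.0, -35.0, 10.0)],
    "PRE_PRO": [(0.7, -120.0, 130.0, 15.0), (0.3, -75.0, 145.0, 10.0)],
}


def pick_distribution(res_name: str, next_res_name: Optional[str] = None) -> str:
    """Choose which distribution applies, following the rules described above."""
    if res_name in ("GLY", "PRO"):
        return res_name  # residue-specific distributions take priority
    if next_res_name == "PRO":
        return "PRE_PRO"  # pre-proline bias toward beta/extended
    return "GENERAL"


def sample_phi_psi(
    res_name: str,
    next_res_name: Optional[str] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Draw (phi, psi) in degrees from a weighted mixture of Gaussians."""
    rng = rng if rng is not None else random.Random()
    regions = DISTRIBUTIONS[pick_distribution(res_name, next_res_name)]
    # 1) pick a favored region in proportion to its weight
    _, mean_phi, mean_psi, sd = rng.choices(regions, weights=[r[0] for r in regions], k=1)[0]
    # 2) sample around its centre, then wrap into [-180, 180)
    phi = (rng.gauss(mean_phi, sd) + 180.0) % 360.0 - 180.0
    psi = (rng.gauss(mean_psi, sd) + 180.0) % 360.0 - 180.0
    return phi, psi
```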

`_detect_disulfide_bonds(peptide)`

Detect potential disulfide bonds between cysteine residues.

EDUCATIONAL NOTE - Disulfide Bond Detection:

Disulfide bonds form between two cysteine (CYS) residues when their sulfur atoms (SG) are close enough to form a covalent S-S bond.

Detection Criteria:
- Both residues must be CYS
- SG-SG distance: 2.0-2.2 Å (slightly relaxed from the ideal 2.0-2.1 Å)
- Each pair is reported only once (no duplicates)

Why Distance Matters:
- < 2.0 Å: too close (steric clash, not realistic)
- 2.0-2.1 Å: ideal disulfide bond distance
- 2.1-2.2 Å: acceptable (allows for flexibility)
- > 2.2 Å: too far for a covalent S-S bond

Biological Context:
- Disulfides stabilize protein structure
- Common in extracellular proteins
- Rare in the cytoplasm (a reducing environment)
- Important for protein folding and stability

Parameters:

Name	Type	Description	Default
`peptide`	`AtomArray`	Biotite AtomArray structure	required

Returns:

Type	Description
`list`	List of tuples (res_id1, res_id2) representing disulfide bonds

Example

>>> disulfides = _detect_disulfide_bonds(structure)
>>> print(disulfides)
[(3, 8), (12, 20)]  # CYS 3-8 and CYS 12-20 are bonded
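Here is a pure-Python sketch of the detection criteria above. It works on plain `(res_id, res_name, atom_name, xyz)` tuples rather than a Biotite `AtomArray`, so it runs without extra dependencies. With Biotite, the same SG atoms could be selected with a boolean mask on the array's `res_name` and `atom_name` annotations. `detect_disulfides` is a hypothetical name, and the 2.0-2.2 Å window comes from the criteria above.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# (res_id, res_name, atom_name, (x, y, z)): a stand-in for one AtomArray row
Atom = Tuple[int, str, str, Tuple[float, float, float]]


def detect_disulfides(
    atoms: Sequence[Atom], min_dist: float = 2.0, max_dist: float = 2.2
) -> List[Tuple[int, int]]:
    """Return sorted (res_id1, res_id2) pairs of CYS residues whose SG atoms are bonded."""
    # Criterion 1: keep only the SG atom of each cysteine
    sulfurs = [(res_id, xyz) for res_id, res_name, atom_name, xyz in atoms
               if res_name == "CYS" and atom_name == "SG"]
    bonds = []
    # Criterion 3: combinations() yields each unordered pair exactly once
    for (id1, xyz1), (id2, xyz2) in combinations(sulfurs, 2):
        # Criterion 2: SG-SG distance inside the accepted window
        if min_dist <= math.dist(xyz1, xyz2) <= max_dist:
            bonds.append((min(id1, id2), max(id1, id2)))
    return sorted(bonds)


atoms = [
    (3, "CYS", "CA", (-1.5, 0.0, 0.0)),
    (3, "CYS", "SG", (0.0, 0.0, 0.0)),
    (8, "CYS", "SG", (2.05, 0.0, 0.0)),
    (12, "CYS", "SG", (10.0, 0.0, 0.0)),
    (20, "CYS", "SG", (12.05, 0.0, 0.0)),
]
print(detect_disulfides(atoms))  # [(3, 8), (12, 20)]
```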

Educational Notes

NeRF Algorithm

The NeRF (Natural Extension Reference Frame) algorithm builds 3D structures from internal coordinates:

Bond Length: Distance between consecutive atoms (e.g., N-CA = 1.46 Å)
Bond Angle: Angle formed by three consecutive atoms (e.g., N-CA-C = 111°)
Dihedral Angle: Torsion angle formed by four consecutive atoms (e.g., phi, psi)
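A dihedral calculation is the inverse of what NeRF consumes: given four atom positions, it recovers the torsion angle. For backbone atoms, phi is the dihedral C(i-1)-N(i)-CA(i)-C(i) and psi is N(i)-CA(i)-C(i)-N(i+1). A function like the sketch below could therefore extract angles from a reference structure for threading. It uses the standard projection-and-`arctan2` formula in NumPy and follows the usual IUPAC sign convention. `dihedral` is a hypothetical helper, not a synth-pdb function.

```python
import numpy as np


def dihedral(a, b, c, d) -> float:
    """Signed torsion angle A-B-C-D in degrees, in [-180, 180]."""
    a, b, c, d = (np.asarray(p, dtype=float) for p in (a, b, c, d))
    b0 = a - b                 # vector from B back to A
    b1 = c - b                 # central bond B->C
    b2 = d - c                 # vector from C to D
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to B->C
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    # arctan2 of the two in-plane components gives a signed angle
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


# A cis (0°) and a +90° arrangement around a bond along z
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)), 1))  # 0.0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 1))  # 90.0
```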

Mathematical Foundation:

Given three placed atoms (A, B, C) and the internal coordinates of a new atom D (the C-D bond length, the B-C-D bond angle, and the A-B-C-D dihedral), the position of D is calculated by:

Creating a local coordinate system at C
Rotating by the dihedral angle
Placing D at the specified bond length and angle

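Applied atom by atom along the chain, this turns a sequence and its internal coordinates (bond geometry plus sampled or supplied torsions) into 3D Cartesian coordinates.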
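The three steps above can be sketched in NumPy using the widely used NeRF formulation of Parsons et al. (2005). D is first written in a local frame built from A, B, and C, then rotated into the global frame and translated to C. This is an illustrative implementation, not necessarily synth-pdb's code, and `place_atom` is a hypothetical name.

```python
import numpy as np


def place_atom(a, b, c, bond_length, bond_angle_deg, torsion_deg):
    """Return the position of D from A, B, C and its internal coordinates.

    bond_length    : |C-D| in Å
    bond_angle_deg : angle B-C-D in degrees
    torsion_deg    : dihedral A-B-C-D in degrees
    A, B and C must not be collinear, or the local frame is undefined.
    """
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    theta = np.radians(bond_angle_deg)
    chi = np.radians(torsion_deg)

    # Step 1: local coordinate system at C
    bc = (c - b) / np.linalg.norm(c - b)      # unit vector along B->C
    n = np.cross(b - a, bc)                   # normal to the A-B-C plane
    n /= np.linalg.norm(n)
    frame = np.column_stack((bc, np.cross(n, bc), n))

    # Steps 2-3: D in local coordinates, with bond angle and dihedral applied
    d_local = bond_length * np.array([
        -np.cos(theta),
        np.sin(theta) * np.cos(chi),
        np.sin(theta) * np.sin(chi),
    ])

    # Rotate into the global frame and translate to C
    return c + frame @ d_local


# Place D with |CD| = 1.46 Å, angle B-C-D = 111°, dihedral A-B-C-D = -60°
d = place_atom((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.46, 111.0, -60.0)
```

To extend a chain one atom at a time, call `place_atom` repeatedly. Each call uses the three most recently placed backbone atoms together with the next bond length, bond angle, and torsion (phi, psi, or omega).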

B-factor Calculation

B-factors (temperature factors) represent atomic mobility:

\[B = 8\pi^2 \langle u^2 \rangle\]

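Where \(\langle u^2 \rangle\) is the mean-square displacement of the atom along a single direction (for isotropic motion, one third of its total mean-square displacement).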
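As a worked example of the relation above, take an RMS displacement of 0.5 Å along one direction. This gives \(\langle u^2 \rangle = 0.25\) Å² and \(B = 8\pi^2 \times 0.25 \approx 19.7\) Å², which lies inside the backbone range quoted below. The helper names are hypothetical and used only for illustration.

```python
import math


def b_factor_from_msd(u2: float) -> float:
    """B (Å^2) from the mean-square displacement along one direction, <u^2> (Å^2)."""
    return 8.0 * math.pi ** 2 * u2


def rms_displacement_from_b(b: float) -> float:
    """Inverse relation: RMS displacement along one direction (Å) for a given B (Å^2)."""
    return math.sqrt(b / (8.0 * math.pi ** 2))


print(round(b_factor_from_msd(0.5 ** 2), 1))    # 19.7
print(round(rms_displacement_from_b(20.0), 2))  # 0.5
```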

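synth-pdb derives B-factors from order parameters (\(S^2\)) using the Lipari-Szabo formalism. \(S^2\) ranges from 0 (unrestricted internal motion) to 1 (fully rigid), and B grows as the order parameter falls: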

\[B \propto (1 - S^2)\]

Realistic Ranges:
- Backbone atoms: 15-25 Å²
- Side-chain atoms: 20-35 Å²
- Terminal residues: 30-50 Å²
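The text gives only a proportionality, \(B \propto (1 - S^2)\). One simple way to turn it into numbers is an affine map pinned to the ranges above, so that a rigid atom (\(S^2 = 1\)) gets the lowest value and a fully flexible one (\(S^2 = 0\)) gets the highest. The 15 and 50 Å² endpoints and the linear form are assumptions for illustration, not synth-pdb's calibration.

```python
def b_factor_from_s2(s2: float, b_rigid: float = 15.0, b_flexible: float = 50.0) -> float:
    """Map a Lipari-Szabo order parameter S^2 to a B-factor in Å^2.

    Assumption: B grows linearly with (1 - S^2) between b_rigid (S^2 = 1)
    and b_flexible (S^2 = 0); the endpoints are illustrative.
    """
    if not 0.0 <= s2 <= 1.0:
        raise ValueError("S^2 must lie between 0 and 1")
    return b_rigid + (b_flexible - b_rigid) * (1.0 - s2)


print(round(b_factor_from_s2(0.85), 2), round(b_factor_from_s2(0.3), 2))  # 20.25 39.5
```

With these endpoints, an order parameter of 0.85 (a value often reported for well-ordered backbone amides) maps to about 20 Å². A value of 0.3 maps to 39.5 Å². These fall in the backbone and terminal-residue ranges, respectively.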

