Skip to content

biophysics Module

The biophysics module enhances synthetic structures with realistic biochemical properties, such as pH-dependent protonation and terminal capping.

Overview

While the core generator builds the 3D atomic coordinates, the biophysics module handles the "chemical identity" of the residues, ensuring that protonation states and terminal groups are biologically accurate for a given environment.

Main Functions

apply_ph_titration(structure, ph=7.4, rng=None)

Apply global pH settings to titratable residues (mainly Histidine).

Parameters:

Name Type Description Default
structure AtomArray

The atom array.

required
ph float

The pH value (default 7.4).

7.4

Returns:

Type Description
AtomArray

Modified atom array with updated residue names (HIS -> HIE/HID/HIP).

Source code in synth_pdb/biophysics.py
def apply_ph_titration(
    structure: struc.AtomArray, ph: float = 7.4, rng: random.Random | None = None
) -> struc.AtomArray:
    """Apply global pH settings to titratable residues (mainly Histidine).

    Args:
        structure: The atom array.
        ph: The pH value (default 7.4).

    Returns:
        Modified atom array with updated residue names (HIS -> HIE/HID/HIP).

    """
    logger.info(f"Applying pH Titration (pH={ph})...")

    # Iterate residues
    # We need to scan usually.
    # But for HIS, we can mask finding "HIS" and potentially replace.
    # However, replacing RES_NAME in Biotite is easy array operation.

    # 1. Low pH (Acidic) -> HIP (Positive)
    if ph < 6.0:
        # Simplistic Henderson-Hasselbalch Logic:
        # If pH < pKa (6.0), predominant species is protonated.
        # Rename ALL HIS to HIP.
        mask = structure.res_name == "HIS"
        if mask.any():
            count = len(set(structure.res_id[mask]))
            structure.res_name[mask] = "HIP"
            logger.info(f"Protonated {count} Histidines to HIP (pH {ph} < 6.0)")

    # 2. High/Physiological pH -> Neutral Tautomers (HIE/HID)
    else:
        # Determine tautomer ratios.
        # In solution, N-epsilon (HIE) is favored ~80:20 over N-delta (HID).
        # We will assign probabilistically per residue.

        # Get all HIS residue IDs
        his_mask = structure.res_name == "HIS"
        his_res_ids = sorted(set(structure.res_id[his_mask]))

        for res_id in his_res_ids:
            # 80% chance HIE, 20% chances HID
            # Note: Standard PDB often uses just "HIS" implying neural.
            # But explicit modelling requires choosing one.
            # If we want to be explicit:
            _rng: Any = rng if rng is not None else random
            new_name = "HIE" if _rng.random() < 0.8 else "HID"

            # Update this residue
            res_mask = (structure.res_id == res_id) & (structure.res_name == "HIS")
            structure.res_name[res_mask] = new_name

        if his_res_ids:
            logger.info(
                f" assigned tautomers (HIE/HID) to {len(his_res_ids)} Histidines (pH {ph} > 6.0)"
            )

    return structure

cap_termini(structure)

Add terminal capping groups (ACE/NME) to the peptide.

EDUCATIONAL NOTE - Terminal Capping:

Biological proteins are usually long chains. However, simulation and experiments often use shorter peptide fragments.

Uncapped termini (NH3+ and COO-) introduce strong charges that are often unrealistic for an internal fragment of a protein. - N-terminus cap: Acetyl (ACE) -> replaces H with CH3-CO- Eliminates positive charge at N-term. Structure: CH3-C(=O)-NH-... - C-terminus cap: N-Methylamide (NME) -> replaces O with NH-CH3 Eliminates negative charge at C-term. Structure: ...-CO-NH-CH3

This function geometrically constructs these caps attached to the start and end residues.

Parameters:

Name Type Description Default
structure AtomArray

Input peptide structure

required

Returns:

Type Description
AtomArray

Structure with ACE and NME residues added.

Source code in synth_pdb/biophysics.py
def cap_termini(structure: struc.AtomArray) -> struc.AtomArray:
    """Add terminal capping groups (ACE/NME) to the peptide.

    EDUCATIONAL NOTE - Terminal Capping:
    ====================================
    Biological proteins are usually long chains. However, simulation and experiments
    often use shorter peptide fragments.

    Uncapped termini (NH3+ and COO-) introduce strong charges that are often unrealistic
    for an internal fragment of a protein.
    - N-terminus cap: Acetyl (ACE) -> replaces H with CH3-CO-
      Eliminates positive charge at N-term.
      Structure: CH3-C(=O)-NH-...
    - C-terminus cap: N-Methylamide (NME) -> replaces O with NH-CH3
      Eliminates negative charge at C-term.
      Structure: ...-CO-NH-CH3

    This function geometrically constructs these caps attached to the start and end residues.

    Args:
        structure: Input peptide structure

    Returns:
        Structure with ACE and NME residues added.

    """
    logger.info("Adding terminal caps (ACE/NME)...")

    # 1. Identify Termini
    # Filter for non-hetero atoms to ensure we cap the protein chain,
    # not an injected metal ion or other cofactor.
    protein_indices = np.where(~structure.hetero)[0]
    if len(protein_indices) == 0:
        return structure

    protein_res_ids = sorted(set(structure.res_id[protein_indices]))
    n_term_id = protein_res_ids[0]
    c_term_id = protein_res_ids[-1]

    # --- ACE (Acetyl) at N-terminus ---
    # Attaches to N of first residue.
    # Geometry:
    # We need to place C (carbonyl), O (carbonyl oxygen), and CH3 (methyl).
    # Since we are prepending, we work "backwards" from N.

    try:
        n_1 = structure[(structure.res_id == n_term_id) & (structure.atom_name == "N")][0]
        ca_1 = structure[(structure.res_id == n_term_id) & (structure.atom_name == "CA")][0]
        c_1 = structure[(structure.res_id == n_term_id) & (structure.atom_name == "C")][0]

        # 1. Place ACE C (Carbonyl)
        # We position it relative to N-CA-C frame of first residue.
        #
        # EDUCATIONAL NOTE - Conformational Assumptions:
        # ----------------------------------------------
        # Since the ACE cap is being added to a pre-existing chain, we must
        # assume a reasonable torsion angle (Phi) for the new peptide bond.
        # A value of -135.0 (Beta-sheet range) is used here as a physically
        # plausible default that minimizes initial steric clashes with the
        # rest of the protein chain.
        #
        # Bond: C(ace)-N = 1.33 A (Standard Amide Bond)
        # Angle: C(ace)-N-CA = 121.7 deg
        # Dihedral: C(ace)-N-CA-C. This corresponds to the Phi angle definition.
        phi_assume = -135.0  # Beta sheet-like, safe usually

        c_ace_coord = position_atom_3d_from_internal_coords(
            c_1.coord, ca_1.coord, n_1.coord, BOND_LENGTH_C_N, ANGLE_C_N_CA, phi_assume
        )

        # 2. Place ACE O (Carbonyl Oxygen)
        # Bond: O-C = 1.23 A
        # Angle: O-C-N = 123.0 deg (approx)
        # Dihedral: O-C-N-CA.
        #
        # SCIENTIFIC NOTE - Planarity:
        # The peptide bond (omega) is nearly always trans (180 deg) and planar
        # due to resonance between the Carbonyl Oxygen and the Amide Nitrogen.
        # Setting O and CA to be 'trans' (180) ensures this planarity.
        o_ace_coord = position_atom_3d_from_internal_coords(
            ca_1.coord, n_1.coord, c_ace_coord, BOND_LENGTH_C_O, 123.0, 180.0
        )

        # 3. Place ACE CH3 (Methyl)
        # Bond: CH3-C = 1.50 A
        # Angle: CH3-C-N = 116.0 deg
        # Dihedral: CH3-C-N-CA (Omega). Standard Trans = 180.
        ch3_ace_coord = position_atom_3d_from_internal_coords(
            ca_1.coord, n_1.coord, c_ace_coord, 1.50, 116.0, 180.0
        )

        # Create ACE atoms (Residue ID: n_term_id - 1)
        # If n_term_id is 1, ACE is 0.
        ace_res_id = n_term_id - 1
        ace_atoms = [
            struc.Atom(
                ch3_ace_coord,
                atom_name="CH3",
                res_id=ace_res_id,
                res_name="ACE",
                element="C",
                hetero=False,
            ),
            struc.Atom(
                c_ace_coord,
                atom_name="C",
                res_id=ace_res_id,
                res_name="ACE",
                element="C",
                hetero=False,
            ),
            struc.Atom(
                o_ace_coord,
                atom_name="O",
                res_id=ace_res_id,
                res_name="ACE",
                element="O",
                hetero=False,
            ),
        ]
        ace_structure = struc.array(ace_atoms)
        ace_structure.chain_id[:] = "A"

    except IndexError:
        logger.warning("Could not build ACE cap: Missing N/CA/C on N-terminus.")
        ace_structure = None

    # --- NME (N-Methylamide) at C-terminus ---
    # Attaches to C of last residue.
    # Geometry:
    # We place N, and CH3 (methyl).

    try:
        c_last = structure[(structure.res_id == c_term_id) & (structure.atom_name == "C")][0]
        ca_last = structure[(structure.res_id == c_term_id) & (structure.atom_name == "CA")][0]
        n_last = structure[(structure.res_id == c_term_id) & (structure.atom_name == "N")][0]

        # 1. Place NME N
        # We assume Trans peptide bond (Omega=180) relative to previous CA-C.
        # Bond: N-C = 1.33 A
        # Angle: N-C-CA = 116.2 deg
        # Dihedral: N-C-CA-N(prev). This is Psi.
        # We assume Psi = -47 (Alpha) or 135 (Beta). Let's use 135 (Extended) to stick out.
        psi_assume = 135.0

        n_nme_coord = position_atom_3d_from_internal_coords(
            n_last.coord, ca_last.coord, c_last.coord, BOND_LENGTH_C_N, ANGLE_CA_C_N, psi_assume
        )

        # 2. Place NME CH3
        # Bond: CH3-N = 1.45 A
        # Angle: CH3-N-C = 122.0 deg (approx for amide nitrogen)
        # Dihedral: CH3-N-C-CA.
        # Standard Trans peptide bond (Omega = 180).
        ch3_nme_coord = position_atom_3d_from_internal_coords(
            ca_last.coord, c_last.coord, n_nme_coord, 1.45, 122.0, 180.0
        )

        # Create NME atoms
        nme_res_id = c_term_id + 1
        nme_atoms = [
            struc.Atom(
                n_nme_coord,
                atom_name="N",
                res_id=nme_res_id,
                res_name="NME",
                element="N",
                hetero=False,
            ),
            struc.Atom(
                ch3_nme_coord,
                atom_name="CH3",
                res_id=nme_res_id,
                res_name="NME",
                element="C",
                hetero=False,
            ),
        ]
        nme_structure = struc.array(nme_atoms)
        nme_structure.chain_id[:] = "A"

    except IndexError:
        logger.warning("Could not build NME cap: Missing N/CA/C on C-terminus.")
        nme_structure = None

    # Combine
    final_structure = structure

    if ace_structure:
        # -- Residue ID Management ---------------------------------------------
        # To maintain a valid PDB structure, every residue must have a unique ID.
        # Adding an N-terminal cap requires shifting the entire protein chain
        # forward to make room for the ACE residue at index 1.

        # Re-index existing residues: Shift all by +1
        final_structure.res_id += 1
        # ACE is now Res 1
        ace_structure.res_id[:] = 1

        # -- Protonation Correction --------------------------------------------
        # When a terminal NH3+ group becomes an internal amide bond (via ACE),
        # it must lose its N-terminal specific hydrogens (H2, H3).
        n_term_mask = (final_structure.res_id == (n_term_id + 1)) & np.isin(
            final_structure.atom_name, ["H2", "H3"]
        )
        if n_term_mask.any():
            final_structure = final_structure[~n_term_mask]
            logger.debug("Removed N-terminal H2/H3 to accommodate ACE cap.")

        # Prepend to the global array
        final_structure = ace_structure + final_structure
        # Update c_term_id for OXT check since it shifted
        c_term_id += 1

    if nme_structure:
        # -- C-Terminal Indexing -----------------------------------------------
        # Unlike ACE (which prepends), NME is appended to the end. We simply
        # find the current highest ID and increment.
        current_res_ids = sorted(set(final_structure.res_id))
        last_id = current_res_ids[-1]
        nme_structure.res_id[:] = last_id + 1

        # -- Carboxyl Correction -----------------------------------------------
        # A free C-terminus typically ends with a Carboxyl group (COO-) which
        # includes an OXT atom. When we cap it with N-Methylamide (NME), the
        # residue becomes an internal amide and must lose the OXT/HXT atoms
        # to satisfy forcefield valence requirements.
        c_term_mask = (final_structure.res_id == c_term_id) & np.isin(
            final_structure.atom_name, ["OXT", "HXT"]
        )
        if c_term_mask.any():
            final_structure = final_structure[~c_term_mask]
            logger.debug(f"Removed terminal atoms from residue {c_term_id} to accommodate NME cap.")

        # Append to the global array
        final_structure = final_structure + nme_structure

    return final_structure

find_salt_bridges(structure, cutoff=5.0)

Automatically detects potential salt bridges in a protein structure.

A salt bridge is defined here as a pair of acidic and basic residues where any of their side-chain charged atoms are within the specified cutoff.

Parameters:

Name Type Description Default
structure AtomArray

Biotite AtomArray (should include side chains).

required
cutoff float

Distance threshold in Angstroms (default 4.0).

5.0

Returns:

Type Description
list[dict[str, Any]]

list of dict: Each dict contains: - res_ia: Residue ID of the first residue - res_ib: Residue ID of the second residue - atom_a: Name of the coordinating atom in res_ia - atom_b: Name of the coordinating atom in res_ib - distance: The measured distance

Source code in synth_pdb/biophysics.py
def find_salt_bridges(structure: struc.AtomArray, cutoff: float = 5.0) -> list[dict[str, Any]]:
    """Automatically detects potential salt bridges in a protein structure.

    A salt bridge is defined here as a pair of acidic and basic residues
    where any of their side-chain charged atoms are within the specified cutoff.

    Args:
        structure: Biotite AtomArray (should include side chains).
        cutoff: Distance threshold in Angstroms (default 4.0).

    Returns:
        list of dict: Each dict contains:
            - res_ia: Residue ID of the first residue
            - res_ib: Residue ID of the second residue
            - atom_a: Name of the coordinating atom in res_ia
            - atom_b: Name of the coordinating atom in res_ib
            - distance: The measured distance

    """
    # Filter for Acidic and Basic atoms only to speed up search.
    #
    # EDUCATIONAL NOTE - Vectorized Proximity Search:
    # -----------------------------------------------
    # Instead of nested loops over all atoms (O(N^2)), we use NumPy masking
    # to extract only the candidates that can participate in salt bridges.
    acid_mask = np.isin(structure.res_name, ACIDIC_RESIDUES) & np.isin(
        structure.atom_name, ACIDIC_ATOMS
    )
    base_mask = np.isin(structure.res_name, BASIC_RESIDUES) & np.isin(
        structure.atom_name, BASIC_ATOMS
    )

    acids = structure[acid_mask]
    bases = structure[base_mask]

    # Early exit if no candidates found
    if len(acids) == 0 or len(bases) == 0:
        return []

    # Compute Distance Matrix between all Acid atoms and Base atoms
    #
    # PERFORMANCE NOTE:
    # Using broadcasting to create an (N, M, 3) tensor and then summing
    # squares allows us to leverage SIMD instructions for the search.
    diffs = acids.coord[:, np.newaxis, :] - bases.coord[np.newaxis, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))

    # Find pairs within cutoff (typically 4.0 - 5.0 Angstroms)
    indices = np.where(dists < cutoff)

    found_pairs: dict[tuple[int, int], dict[str, Any]] = {}  # (res_ia, res_ib) -> bridge_dict

    for acid_idx, base_idx in zip(*indices, strict=False):
        a_atom = acids[acid_idx]
        b_atom = bases[base_idx]

        # Ensure we don't bridge within the same residue
        if a_atom.res_id == b_atom.res_id:
            continue

        # Ensure pair_key is a Tuple[int, int]
        # We sort to ensure consistent mapping regardless of order (ia, ib vs ib, ia)
        res_i, res_j = sorted([int(a_atom.res_id), int(b_atom.res_id)])
        pair_key = (res_i, res_j)
        dist = dists[acid_idx, base_idx]

        # SCIENTIFIC NOTE: Selecting the Optimal Pair
        # A single residue might have multiple charged atoms (e.g. ASP OD1/OD2).
        # We pick the closest atom pair to represent the center of the salt bridge.
        if pair_key not in found_pairs or dist < found_pairs[pair_key]["distance"]:
            found_pairs[pair_key] = {
                "res_ia": int(a_atom.res_id),
                "res_ib": int(b_atom.res_id),
                "atom_a": a_atom.atom_name,
                "atom_b": b_atom.atom_name,
                "distance": float(dist),
            }

    return list(found_pairs.values())

Usage Examples

pH Titration

Apply pH-dependent protonation to residues like Histidine.

import biotite.structure as struc
from synth_pdb.biophysics import apply_ph_titration

# Load structure
# structure: struc.AtomArray

# Apply acidic pH (renames HIS to HIP)
structure = apply_ph_titration(structure, ph=5.0)

# Apply physiological pH (probabilistically renames HIS to HIE or HID)
structure = apply_ph_titration(structure, ph=7.4)

Terminal Capping

Add Acetyl (ACE) and N-Methylamide (NME) caps to the termini of a peptide fragment.

from synth_pdb.biophysics import cap_termini

# Add caps to structure
structure = cap_termini(structure)

Salt Bridge Detection

Identify potential ionic interactions between acidic and basic residues.

from synth_pdb.biophysics import find_salt_bridges

bridges = find_salt_bridges(structure, cutoff=5.0)
for b in bridges:
    print(f"Salt bridge between {b['res_ia']} and {b['res_ib']}")

Educational Notes

pH and Protonation

Biological function depends heavily on pH. The most sensitive residue near physiological pH (7.4) is Histidine (\(pK_a \approx 6.0\)). - pH < 6.0: The imidazole ring is protonated and carries a \(+1\) charge. Represented as HIP. - pH > 6.0: The ring is neutral (\(0\) charge). It exists in two tautomeric forms: HIE (\(\epsilon\) nitrogen protonated) or HID (\(\delta\) nitrogen protonated).

Terminal Capping

Uncapped termini (\(NH_3^+\) and \(COO^-\)) introduce strong charges that are often unrealistic for a short peptide fragment intended to represent a region within a larger protein. - N-terminus cap (ACE): Acetyl group (\(CH_3\text{-}CO\text{-}\)) replaces the terminal hydrogen. - C-terminus cap (NME): N-Methylamide group (\(\text{-}NH\text{-}CH_3\)) replaces the terminal oxygen.

Capping eliminates these artificial terminal charges, providing a more realistic model of internal protein segments.

Salt Bridges

A salt bridge is a combination of two non-covalent interactions: Hydrogen Bonding and Electrostatic (Ionic) Attraction. It occurs between a positively charged basic residue (Lys, Arg, His) and a negatively charged acidic residue (Asp, Glu). These bridges are critical for stabilizing tertiary structure and driving specific molecular recognition.

References

  • Proline Conformation: MacArthur, M. W., & Thornton, J. M. (1991). "Influence of proline residues on protein conformation." Journal of Molecular Biology. DOI: 10.1016/0022-2836(91)90627-W
  • Salt Bridges: Bosshard, H. R., et al. (2004). "The salt bridge in proteins." Journal of Molecular Recognition. DOI: 10.1002/jmr.657
  • pH and Proteins: Tanford, C. (1962). "The interpretation of hydrogen ion titration curves of proteins." Advances in Protein Chemistry.

See Also