special_chemistry Module

The special_chemistry module models complex post-translational modifications (PTMs) and unique chemical events that go beyond standard amino acid chains.

Overview

Some proteins undergo autocatalytic chemical changes that are essential for their function. A prime example is the Green Fluorescent Protein (GFP), where a specific tripeptide sequence (SYG, TYG, or GYG) undergoes cyclization and oxidation to form a fluorophore.

Key Features

  • GFP Chromophore Maturation: Models the cyclization of the Ser65-Tyr66-Gly67 motif.
  • Covalent Bond Manipulation: Tools for programmatically adding or removing bonds between atoms in a Biotite structure.

API Reference

special_chemistry

Special Chemistry & Post-Translational Modifications Module.

This module handles unique chemical events beyond standard amino acid chains, such as the formation of chromophores or other covalent modifications that are critical for the function of certain proteins.

Functions

find_gfp_chromophore_motif(structure)

Scans the structure for the Ser-Tyr-Gly motif that forms the GFP chromophore.

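The chromophore is formed by the cyclization of three consecutive residues, Ser-Tyr-Gly. This function returns the residue IDs (not array indices) of the first such triplet it finds. Only SER-TYR-GLY is matched; the TYG and GYG variants are not detected, and a structure with more than one chain yields None.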

Parameters:

Name       Type       Description                                       Default
structure  AtomArray  Biotite AtomArray, must contain a single chain.  required

Returns:

Type         Description
dict | None  A dictionary containing the residue IDs of S, Y, and G if the motif is found, otherwise None.

Source code in synth_pdb/special_chemistry.py
def find_gfp_chromophore_motif(structure: struc.AtomArray) -> dict | None:
    """Scans the structure for the Ser-Tyr-Gly motif that forms the GFP chromophore.

    The chromophore is formed by the cyclization of residues Ser-Tyr-Gly.
    This function identifies the indices of these three consecutive residues.

    Args:
        structure: Biotite AtomArray, must contain a single chain.

    Returns:
        A dictionary containing the residue IDs of S, Y, and G if the motif is found,
        otherwise None.

    """
    # Ensure we are working with a single protein chain
    if len(np.unique(structure.chain_id)) > 1:
        logger.warning("GFP chromophore search only supported for single chains.")
        return None

    res_ids, res_names = struc.get_residues(structure)

    for i in range(len(res_names) - 2):
        # Check for the S-Y-G sequence
        if res_names[i] == "SER" and res_names[i + 1] == "TYR" and res_names[i + 2] == "GLY":
            ser_res_id = res_ids[i]
            tyr_res_id = res_ids[i + 1]
            gly_res_id = res_ids[i + 2]

            logger.info(
                f"Found potential GFP chromophore motif: SER({ser_res_id})-TYR({tyr_res_id})-GLY({gly_res_id})"
            )
            return {
                "ser_res_id": ser_res_id,
                "tyr_res_id": tyr_res_id,
                "gly_res_id": gly_res_id,
            }

    return None
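
The overview mentions the TYG and GYG variants, which the function above does not match. As a minimal sketch of how the same sliding-window scan could cover them, the hypothetical helper below (find_chromophore_motifs is not part of synth_pdb) works on plain lists of residue IDs and three-letter names instead of an AtomArray, and collects every hit instead of returning only the first. The key name ser_res_id is kept even when the first residue is THR or GLY, on the assumption that the result should stay compatible with form_gfp_chromophore.

```python
def find_chromophore_motifs(res_ids, res_names, first_residues=("SER", "THR", "GLY")):
    """Return every X-TYR-GLY window whose first residue is in first_residues.

    Hypothetical helper for illustration; not part of synth_pdb.
    """
    hits = []
    for i in range(len(res_names) - 2):
        if (
            res_names[i] in first_residues
            and res_names[i + 1] == "TYR"
            and res_names[i + 2] == "GLY"
        ):
            hits.append(
                {
                    "first_res_name": res_names[i],
                    # Key name kept for compatibility with the motif dict above.
                    "ser_res_id": res_ids[i],
                    "tyr_res_id": res_ids[i + 1],
                    "gly_res_id": res_ids[i + 2],
                }
            )
    return hits


# Toy input: one TYG window (residues 2-4) and one SYG window (residues 6-8).
names = ["PHE", "THR", "TYR", "GLY", "VAL", "SER", "TYR", "GLY"]
ids = list(range(1, len(names) + 1))
motifs = find_chromophore_motifs(ids, names)
```

Returning a list leaves it to the caller to decide which window to mature when a sequence contains more than one candidate.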

form_gfp_chromophore(structure, motif)

Forms the GFP chromophore by cyclizing the Ser-Tyr-Gly motif.

This function simulates the maturation of the GFP chromophore by:

  1. Renaming the involved residues (S-Y-G) to 'CRO' (a PDB residue code used for the matured chromophore).
  2. Adjusting the covalent connectivity conceptually, to reflect the formation of the imidazolinone ring. In the source shown here only the residue names change; no bonds are added or removed.

Parameters:

Name       Type       Description                                                                                     Default
structure  AtomArray  The input AtomArray containing the SYG motif.                                                   required
motif      dict       A dictionary identifying the residue IDs (ser_res_id, tyr_res_id, gly_res_id) to be modified.  required

Returns:

Type       Description
AtomArray  The modified AtomArray with the residues renamed to CRO.

Source code in synth_pdb/special_chemistry.py
def form_gfp_chromophore(structure: struc.AtomArray, motif: dict) -> struc.AtomArray:
    """Forms the GFP chromophore by cyclizing the Ser-Tyr-Gly motif.

    This function simulates the maturation of the GFP chromophore by:
    1. Renaming the involved residues (S-Y-G) to 'CRO' (the standard PDB
       code for the matured chromophore).
    2. Adjusting the covalent connectivity (conceptually) to reflect
       the formation of the imidazolinone ring.

    Args:
        structure: The input AtomArray containing the SYG motif.
        motif: A dictionary identifying the residue IDs (ser_res_id,
               tyr_res_id, gly_res_id) to be modified.

    Returns:
        The modified AtomArray with the residues renamed to CRO.

    """
    new_structure = structure.copy()

    ser_id = motif["ser_res_id"]
    tyr_id = motif["tyr_res_id"]
    gly_id = motif["gly_res_id"]

    # In a real PDB, the three residues become a single 'CRO' residue.
    # Here we rename all three to signal the maturation event.
    mask = np.isin(new_structure.res_id, [ser_id, tyr_id, gly_id])
    new_structure.res_name[mask] = "CRO"

    logger.info(f"Successfully matured SYG motif (residues {ser_id}-{gly_id}) into CRO.")

    return new_structure
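
The comment in the source notes that a real PDB entry stores the chromophore as a single 'CRO' residue, whereas the function keeps three residues and only renames them. The sketch below illustrates what collapsing them could look like. It uses plain dictionaries as stand-ins for atoms, and both the helper name merge_into_single_cro and the default of reusing the tyrosine residue ID are assumptions made for this example, not synth_pdb behaviour.

```python
def merge_into_single_cro(atoms, motif, cro_res_id=None):
    """Relabel all atoms of the three motif residues as one CRO residue.

    Hypothetical helper for illustration; `atoms` is a list of dicts with
    the keys "res_id", "res_name" and "atom_name". The input is not modified.
    """
    motif_ids = {motif["ser_res_id"], motif["tyr_res_id"], motif["gly_res_id"]}
    if cro_res_id is None:
        # Arbitrary choice for this sketch: reuse the tyrosine residue ID.
        cro_res_id = motif["tyr_res_id"]

    merged = []
    for atom in atoms:
        new_atom = dict(atom)  # shallow copy so the caller's data is untouched
        if atom["res_id"] in motif_ids:
            new_atom["res_name"] = "CRO"
            new_atom["res_id"] = cro_res_id
        merged.append(new_atom)
    # Atom names are left as-is, so names such as "CA" now repeat within
    # the merged residue; a full conversion would also need unique names.
    return merged


atoms = [
    {"res_id": 5, "res_name": "PHE", "atom_name": "CA"},
    {"res_id": 6, "res_name": "SER", "atom_name": "CA"},
    {"res_id": 7, "res_name": "TYR", "atom_name": "CA"},
    {"res_id": 8, "res_name": "GLY", "atom_name": "CA"},
    {"res_id": 9, "res_name": "VAL", "atom_name": "CA"},
]
motif = {"ser_res_id": 6, "tyr_res_id": 7, "gly_res_id": 8}
merged = merge_into_single_cro(atoms, motif)
```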

Scientific Principles

GFP Chromophore Formation

The maturation of the GFP chromophore involves three steps:

  1. Cyclization: The amide nitrogen of Gly67 attacks the carbonyl carbon of Ser65.
  2. Dehydration: Loss of a water molecule forms a five-membered heterocyclic ring (imidazolin-5-one).
  3. Oxidation: Dehydrogenation of the Tyr66 \(C_\alpha\)-\(C_\beta\) bond creates a conjugated system.
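
One way to make the three steps concrete is to tally what each one removes from the polypeptide. The short calculation below is an illustrative sketch, not something the module performs: it uses monoisotopic atomic masses and assumes that cyclization rearranges bonds without changing mass, dehydration removes one H2O, and oxidation removes two hydrogen atoms.

```python
# Monoisotopic atomic masses in daltons.
H = 1.00782503
O = 15.99491462

# Mass change contributed by each maturation step.
steps = {
    "cyclization": 0.0,           # ring closure only, no atoms lost
    "dehydration": -(2 * H + O),  # loss of one water molecule
    "oxidation": -(2 * H),        # loss of two hydrogens across the Tyr66 bond
}

total_shift = sum(steps.values())
print(f"Net mass change on maturation: {total_shift:.3f} Da")
```

Under these assumptions the mature chromophore is about 20 Da lighter than the unmodified tripeptide.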

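The special_chemistry module represents the final state of this matured chromophore by relabeling the three motif residues as CRO, so that fluorescent proteins can be flagged as mature in downstream modeling. The form_gfp_chromophore source shown above does not move atoms or edit bonds.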

Usage Example

from synth_pdb.generator import PeptideGenerator
from synth_pdb.special_chemistry import (
    find_gfp_chromophore_motif, 
    form_gfp_chromophore
)

# 1. Generate a sequence containing the GFP motif
# (residues 60-72 of wild-type GFP, which surround the chromophore)
seq = "LVTTFSYGVQCFS"
gen = PeptideGenerator(seq)
structure = gen.generate(conformation="alpha")

# 2. Identify the chromophore motif (SYG)
motif = find_gfp_chromophore_motif(structure)

# 3. Apply the chemical modification
if motif:
    matured_structure = form_gfp_chromophore(structure, motif)
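
The residue IDs in the returned motif follow the numbering of the generated peptide, not the Ser65-Tyr66-Gly67 numbering of full-length GFP. The sketch below predicts the expected IDs from the one-letter sequence alone. The helper expected_motif is hypothetical, and it assumes the generator numbers residues consecutively from first_res_id (1 by default), which the documentation above does not state.

```python
def expected_motif(seq, first_res_id=1):
    """Predict the motif dict for the first SYG triplet in a one-letter sequence.

    Hypothetical helper for illustration; assumes consecutive residue
    numbering that starts at first_res_id.
    """
    idx = seq.find("SYG")
    if idx == -1:
        return None
    start = first_res_id + idx
    return {
        "ser_res_id": start,
        "tyr_res_id": start + 1,
        "gly_res_id": start + 2,
    }


fragment = "LVTTFSYGVQCFS"  # residues 60-72 of wild-type GFP
local_numbering = expected_motif(fragment)                  # peptide numbered from 1
gfp_numbering = expected_motif(fragment, first_res_id=60)   # full-length numbering
```

With the default numbering the motif falls on residues 6 to 8; starting the count at 60 recovers the familiar 65 to 67.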