Skip to content

docking Module

The docking module provides utilities for preparing synthetic protein structures for molecular docking simulations and electrostatic calculations.

Overview

Molecular docking requires structures to be in specific formats that include physical parameters beyond simple coordinates. The docking module focuses on converting standard PDB files into PQR format, which includes partial atomic charges and Van der Waals radii.

Key Features

  • PQR Export: Converts PDB files to PQR format using OpenMM for charge assignment.
  • Charge Assignment: Automatically adds missing hydrogens and assigns partial charges based on the AMBER forcefield.
  • Radii Calculation: Derives atomic radii from Lennard-Jones parameters (\(\sigma / 2\)).
  • PTM Handling: Standardizes non-standard or modified residues for compatibility with standard forcefields.

API Reference

docking

Classes

DockingPrep

Utilities for preparing PDB structures for molecular docking.

Source code in synth_pdb/docking.py
class DockingPrep:
    """Utilities for preparing PDB structures for molecular docking."""

    def __init__(self, forcefield_name: str = "amber14-all.xml") -> None:
        self.forcefield_name = forcefield_name
        try:
            self.forcefield = app.ForceField(self.forcefield_name, "amber14/tip3pfb.xml")
        except Exception as e:
            logger.error(f"Failed to load forcefield '{forcefield_name}': {e}")
            raise

    def write_pqr(self, input_pdb: str, output_pqr: str) -> bool:
        """Converts a PDB file to PQR format (adding partial charges and radii).

        Uses OpenMM to assign charges based on the selected forcefield.
        Radii are derived from Lennard-Jones sigma (sigma / 2).

        Args:
            input_pdb: Path to input PDB.
            output_pqr: Path to output PQR.

        Returns:
            bool: True if successful.

        """
        try:
            import os
            import tempfile

            # 0. PDB PRE-PROCESSING (Standardize residues for OpenMM)
            ptm_map = {
                "SEP": "SER",
                "TPO": "THR",
                "PTR": "TYR",
                "HIE": "HIS",
                "HID": "HIS",
                "HIP": "HIS",
                "DAL": "ALA",
                "DAR": "ARG",
                "DAN": "ASN",
                "DAS": "ASP",
                "DCY": "CYS",
                "DGL": "GLU",
                "DGN": "GLN",
                "DHI": "HIS",
                "DIL": "ILE",
                "DLE": "LEU",
                "DLY": "LYS",
                "DME": "MET",
                "DPH": "PHE",
                "DPR": "PRO",
                "DSE": "SER",
                "DTH": "THR",
                "DTR": "TRP",
                "DTY": "TYR",
                "DVA": "VAL",
            }
            ptm_atom_names = ["P", "O1P", "O2P", "O3P"]

            with open(input_pdb) as f:
                pdb_lines = f.readlines()

            modified_lines = []
            for line in pdb_lines:
                if line.startswith(("ATOM", "HETATM")):
                    res_name = line[17:20].strip()
                    if res_name in ptm_map:
                        new_name = ptm_map[res_name]
                        line = line[:17] + f"{new_name: >3}" + line[20:]
                        if res_name in ["SEP", "TPO", "PTR"]:
                            atom_name = line[12:16].strip()
                            if atom_name in ptm_atom_names:
                                continue
                modified_lines.append(line)

            with tempfile.NamedTemporaryFile(mode="w", suffix=".pdb", delete=False) as tf:
                tf.writelines(modified_lines)
                temp_input_path = tf.name

            # 1. Load PDB
            pdb = app.PDBFile(temp_input_path)
            topology, positions = pdb.topology, pdb.positions

            # EDUCATIONAL NOTE: Ensure connectivity before hydrogen addition
            topology.createStandardBonds()
            topology.createDisulfideBonds(positions)

            # Cleanup
            try:
                os.unlink(temp_input_path)
            except Exception:
                pass

            # 2. Add Hydrogens (Crucial for correct charge assignment)
            modeller = app.Modeller(topology, positions)
            # STRIP existing H to avoid template mismatches/conflicts
            modeller.delete(
                [
                    a
                    for a in modeller.topology.atoms()
                    if a.element is not None and a.element.symbol == "H"
                ]
            )
            modeller.addHydrogens(self.forcefield, pH=7.4)  # Physiological pH

            # 3. Create System to get forces (charges/sigmas)
            system = self.forcefield.createSystem(
                modeller.topology, nonbondedMethod=app.NoCutoff, constraints=None, rigidWater=False
            )

            # 4. Extract NonbondedForce
            nonbonded = None
            for force in system.getForces():
                if isinstance(force, mm.NonbondedForce):
                    nonbonded = force
                    break

            if not nonbonded:
                raise ValueError("Forcefield does not contain NonbondedForce (no charges found).")

            # 5. Write PQR
            # Standard PDB columns, but Occupancy -> Charge, B-factor -> Radius
            # We can use Biotite or manual writing. Manual gives full control over PQR format quirks.
            # But converting OpenMM Topology -> Biotite AtomArray is verbose.
            # Let's write manually by iterating OpenMM topology.

            with open(output_pqr, "w") as f:
                atom_idx = 0
                for chain in modeller.topology.chains():
                    for residue in chain.residues():
                        for atom in residue.atoms():
                            # Get parameters (Charge, Sigma, Epsilon)
                            charge, sigma, epsilon = nonbonded.getParticleParameters(atom.index)

                            # Convert units
                            # charge is in elementary charge (float)
                            # sigma is in nanometers (Quantity)
                            # radius = sigma / 2

                            q = charge.value_in_unit(unit.elementary_charge)
                            r_nm = sigma.value_in_unit(unit.nanometer) * 0.5
                            r_angstrom = r_nm * 10.0

                            pos = modeller.positions[atom.index]
                            x = pos[0].value_in_unit(unit.angstrom)
                            y = pos[1].value_in_unit(unit.angstrom)
                            z = pos[2].value_in_unit(unit.angstrom)

                            atom_idx += 1

                            # PQR Format (Whitespace separated with standard columns)
                            # Field  Serial Atom Residue Chain ResID  X  Y  Z  Charge Radius
                            # Standard PQR columns:
                            # 1-6   "ATOM  " or "HETATM"
                            # 7-11   Atom serial number
                            # 13-16  Atom name
                            # 18-20  Residue name
                            # 22     Chain identifier
                            # 23-26  Residue sequence number
                            # 31-38  X coordinate (Angstroms)
                            # 39-46  Y coordinate (Angstroms)
                            # 47-54  Z coordinate (Angstroms)
                            # 55-62  Charge (e)
                            # 63-70  Radius (Angstroms)

                            # Template for robust alignment (8.3f for coords, 8.4f for Q and R)
                            # PQR is often whitespace-separated, but fixed-width is more compatible.
                            line = (
                                f"{'ATOM':<6}{atom_idx:>5} {atom.name:<4} {residue.name:<3} "
                                f"{chain.id:<1}{residue.id:>4}    "
                                f"{x:8.3f}{y:8.3f}{z:8.3f}{q:8.4f}{r_angstrom:8.4f}\n"
                            )
                            f.write(line)

            logger.info(f"Successfully wrote PQR to {output_pqr}")
            return True

        except Exception as e:
            logger.error(f"PQR generation failed: {e}")
            return False
Functions
write_pqr(input_pdb, output_pqr)

Converts a PDB file to PQR format (adding partial charges and radii).

Uses OpenMM to assign charges based on the selected forcefield. Radii are derived from Lennard-Jones sigma (sigma / 2).

Parameters:

Name Type Description Default
input_pdb str

Path to input PDB.

required
output_pqr str

Path to output PQR.

required

Returns:

Name Type Description
bool bool

True if successful.

Source code in synth_pdb/docking.py
def write_pqr(self, input_pdb: str, output_pqr: str) -> bool:
    """Converts a PDB file to PQR format (adding partial charges and radii).

    Uses OpenMM to assign charges based on the selected forcefield.
    Radii are derived from Lennard-Jones sigma (sigma / 2).

    Args:
        input_pdb: Path to input PDB.
        output_pqr: Path to output PQR.

    Returns:
        bool: True if successful.

    """
    try:
        import os
        import tempfile

        # 0. PDB PRE-PROCESSING (Standardize residues for OpenMM)
        ptm_map = {
            "SEP": "SER",
            "TPO": "THR",
            "PTR": "TYR",
            "HIE": "HIS",
            "HID": "HIS",
            "HIP": "HIS",
            "DAL": "ALA",
            "DAR": "ARG",
            "DAN": "ASN",
            "DAS": "ASP",
            "DCY": "CYS",
            "DGL": "GLU",
            "DGN": "GLN",
            "DHI": "HIS",
            "DIL": "ILE",
            "DLE": "LEU",
            "DLY": "LYS",
            "DME": "MET",
            "DPH": "PHE",
            "DPR": "PRO",
            "DSE": "SER",
            "DTH": "THR",
            "DTR": "TRP",
            "DTY": "TYR",
            "DVA": "VAL",
        }
        ptm_atom_names = ["P", "O1P", "O2P", "O3P"]

        with open(input_pdb) as f:
            pdb_lines = f.readlines()

        modified_lines = []
        for line in pdb_lines:
            if line.startswith(("ATOM", "HETATM")):
                res_name = line[17:20].strip()
                if res_name in ptm_map:
                    new_name = ptm_map[res_name]
                    line = line[:17] + f"{new_name: >3}" + line[20:]
                    if res_name in ["SEP", "TPO", "PTR"]:
                        atom_name = line[12:16].strip()
                        if atom_name in ptm_atom_names:
                            continue
            modified_lines.append(line)

        with tempfile.NamedTemporaryFile(mode="w", suffix=".pdb", delete=False) as tf:
            tf.writelines(modified_lines)
            temp_input_path = tf.name

        # 1. Load PDB
        pdb = app.PDBFile(temp_input_path)
        topology, positions = pdb.topology, pdb.positions

        # EDUCATIONAL NOTE: Ensure connectivity before hydrogen addition
        topology.createStandardBonds()
        topology.createDisulfideBonds(positions)

        # Cleanup
        try:
            os.unlink(temp_input_path)
        except Exception:
            pass

        # 2. Add Hydrogens (Crucial for correct charge assignment)
        modeller = app.Modeller(topology, positions)
        # STRIP existing H to avoid template mismatches/conflicts
        modeller.delete(
            [
                a
                for a in modeller.topology.atoms()
                if a.element is not None and a.element.symbol == "H"
            ]
        )
        modeller.addHydrogens(self.forcefield, pH=7.4)  # Physiological pH

        # 3. Create System to get forces (charges/sigmas)
        system = self.forcefield.createSystem(
            modeller.topology, nonbondedMethod=app.NoCutoff, constraints=None, rigidWater=False
        )

        # 4. Extract NonbondedForce
        nonbonded = None
        for force in system.getForces():
            if isinstance(force, mm.NonbondedForce):
                nonbonded = force
                break

        if not nonbonded:
            raise ValueError("Forcefield does not contain NonbondedForce (no charges found).")

        # 5. Write PQR
        # Standard PDB columns, but Occupancy -> Charge, B-factor -> Radius
        # We can use Biotite or manual writing. Manual gives full control over PQR format quirks.
        # But converting OpenMM Topology -> Biotite AtomArray is verbose.
        # Let's write manually by iterating OpenMM topology.

        with open(output_pqr, "w") as f:
            atom_idx = 0
            for chain in modeller.topology.chains():
                for residue in chain.residues():
                    for atom in residue.atoms():
                        # Get parameters (Charge, Sigma, Epsilon)
                        charge, sigma, epsilon = nonbonded.getParticleParameters(atom.index)

                        # Convert units
                        # charge is in elementary charge (float)
                        # sigma is in nanometers (Quantity)
                        # radius = sigma / 2

                        q = charge.value_in_unit(unit.elementary_charge)
                        r_nm = sigma.value_in_unit(unit.nanometer) * 0.5
                        r_angstrom = r_nm * 10.0

                        pos = modeller.positions[atom.index]
                        x = pos[0].value_in_unit(unit.angstrom)
                        y = pos[1].value_in_unit(unit.angstrom)
                        z = pos[2].value_in_unit(unit.angstrom)

                        atom_idx += 1

                        # PQR Format (Whitespace separated with standard columns)
                        # Field  Serial Atom Residue Chain ResID  X  Y  Z  Charge Radius
                        # Standard PQR columns:
                        # 1-6   "ATOM  " or "HETATM"
                        # 7-11   Atom serial number
                        # 13-16  Atom name
                        # 18-20  Residue name
                        # 22     Chain identifier
                        # 23-26  Residue sequence number
                        # 31-38  X coordinate (Angstroms)
                        # 39-46  Y coordinate (Angstroms)
                        # 47-54  Z coordinate (Angstroms)
                        # 55-62  Charge (e)
                        # 63-70  Radius (Angstroms)

                        # Template for robust alignment (8.3f for coords, 8.4f for Q and R)
                        # PQR is often whitespace-separated, but fixed-width is more compatible.
                        line = (
                            f"{'ATOM':<6}{atom_idx:>5} {atom.name:<4} {residue.name:<3} "
                            f"{chain.id:<1}{residue.id:>4}    "
                            f"{x:8.3f}{y:8.3f}{z:8.3f}{q:8.4f}{r_angstrom:8.4f}\n"
                        )
                        f.write(line)

        logger.info(f"Successfully wrote PQR to {output_pqr}")
        return True

    except Exception as e:
        logger.error(f"PQR generation failed: {e}")
        return False

Scientific Principles

The PQR Format

The PQR format is a variation of PDB where: 1. The Occupancy column is replaced by the Partial Charge (\(q\)) of the atom. 2. The B-factor column is replaced by the Atomic Radius (\(r\)) in Angstroms.

This format is the standard input for electrostatic solvers like APBS (Adaptive Poisson-Boltzmann Solver) and many docking engines.

Charge Assignment Logic

The module uses OpenMM to: 1. Standardize: Map modified residues (like phosphorylated Serine) to their parent residues if necessary. 2. Protonate: Add hydrogens at a specific pH (default 7.4). 3. Parameterize: Apply a forcefield (default amber14-all.xml) to look up the partial charge for each atom type in its specific chemical environment.

Usage Example

from synth_pdb.docking import DockingPrep

# 1. Initialize the preparation tool
prep = DockingPrep(forcefield_name="amber14-all.xml")

# 2. Convert a synthetic PDB to PQR
# This will add hydrogens and assign charges
success = prep.write_pqr(
    input_pdb="synthetic_protein.pdb", 
    output_pqr="ready_for_docking.pqr"
)

if success:
    print("PQR file created successfully.")