Skip to content

cryo_em Module

The cryo_em module provides tools for generating synthetic 3D density maps from protein structures and ensembles. This is essential for simulating how conformational heterogeneity and resolution limits affect experimental Cryo-EM data.

Overview

Cryo-Electron Microscopy (Cryo-EM) measures the Coulomb potential of a biological sample. In synth-pdb, we simulate this by representing each atom as a Gaussian blob whose width corresponds to the target resolution.

Key Features

  • Gaussian Voxelization: Converts atomic coordinates into a 3D grid of density values.
  • Ensemble Averaging: Supports AtomArrayStack to simulate how flexible regions appear blurred in a consensus map.
  • MRC Export: Saves density maps in the industry-standard MRC/CCP4 format for visualization in ChimeraX or PyMOL.

API Reference

cryo_em

Advanced Cryo-EM Density Map Generation for Synthetic Ensembles.

Classes

CryoEMSimulator

Stateful wrapper for Cryo-EM simulation workflows.

Allows for reproducible generation of density maps from PDB ensembles.

Source code in synth_pdb/cryo_em.py
class CryoEMSimulator:
    """Stateful wrapper for Cryo-EM simulation workflows.

    Allows for reproducible generation of density maps from PDB ensembles.
    """

    def __init__(self, resolution: float = 3.0, spacing: float = 1.0):
        """Initialize the simulator.

        Args:
            resolution: Target map resolution (A).
            spacing: Voxel spacing (A).
        """
        self.resolution = resolution
        self.spacing = spacing

    def simulate(self, structure: struc.AtomArray | struc.AtomArrayStack) -> np.ndarray:
        """Generates a density map for the provided structure."""
        density, _ = generate_density_map(
            structure, resolution=self.resolution, grid_spacing=self.spacing
        )
        return density
Functions
simulate(structure)

Generates a density map for the provided structure.

Source code in synth_pdb/cryo_em.py
def simulate(self, structure: struc.AtomArray | struc.AtomArrayStack) -> np.ndarray:
    """Generates a density map for the provided structure."""
    density, _ = generate_density_map(
        structure, resolution=self.resolution, grid_spacing=self.spacing
    )
    return density

Functions

generate_density_map(structure, resolution=3.0, grid_spacing=1.0, buffer=5.0)

Generates a 3D density map from an AtomArray or AtomArrayStack.

Parameters:

Name Type Description Default
structure AtomArray | AtomArrayStack

Biotite structure(s) to convert to density.

required
resolution float

Target resolution in Angstroms (Gaussian sigma ~ res/3).

3.0
grid_spacing float

Voxel size in Angstroms (default 1.0A).

1.0
buffer float

Extra padding around the protein in Angstroms.

5.0

Returns:

Type Description
tuple[ndarray, ndarray]

Tuple of (3D density array, (3,) array of grid origin coordinates).

SCIENTIFIC NOTE - Resolution vs Sigma:

The relationship between the resolution (R) and the Gaussian sigma (sigma) is often approximated as sigma = R / (2 * sqrt(2 * ln 2)) ~ R / 2.355. However, for simple visual benchmarks, sigma = resolution / 3 is also common. We use the 1/3 rule here for a conservative "sharpness" at the target res.

Source code in synth_pdb/cryo_em.py
def generate_density_map(
    structure: struc.AtomArray | struc.AtomArrayStack,
    resolution: float = 3.0,
    grid_spacing: float = 1.0,
    buffer: float = 5.0,
) -> tuple[np.ndarray, np.ndarray]:
    """Generates a 3D density map from an AtomArray or AtomArrayStack.

    Args:
        structure: Biotite structure(s) to convert to density.
        resolution: Target resolution in Angstroms (Gaussian sigma ~ res/3).
        grid_spacing: Voxel size in Angstroms (default 1.0A).
        buffer: Extra padding around the protein in Angstroms.

    Returns:
        Tuple of (3D density array, (3,) array of grid origin coordinates).

    SCIENTIFIC NOTE - Resolution vs Sigma:
    --------------------------------------
    The relationship between the resolution (R) and the Gaussian sigma (sigma)
    is often approximated as sigma = R / (2 * sqrt(2 * ln 2)) ~ R / 2.355.
    However, for simple visual benchmarks, sigma = resolution / 3 is also common.
    We use the 1/3 rule here for a conservative "sharpness" at the target res.
    """
    if isinstance(structure, struc.AtomArray):
        # Convert single structure to a stack of 1 for unified processing
        stack = struc.stack([structure])
    else:
        stack = structure

    # 1. Define Grid Boundaries
    # We find the min/max coordinates across ALL models in the ensemble.
    coords = stack.coord
    c_min = np.min(coords, axis=(0, 1)) - buffer
    c_max = np.max(coords, axis=(0, 1)) + buffer

    # Calculate grid dimensions based on spacing
    grid_dims = np.ceil((c_max - c_min) / grid_spacing).astype(int)
    density = np.zeros(grid_dims)

    logger.info(f"Generating Cryo-EM map with dimensions {grid_dims} and spacing {grid_spacing}A")

    # 2. Voxelization
    # For each atom in each model, we increment the corresponding voxel.
    for model_idx in range(stack.stack_depth()):
        model_coords = stack.coord[model_idx]

        # Calculate voxel indices for all atoms at once
        voxel_indices = ((model_coords - c_min) / grid_spacing).astype(int)

        # Clip indices to ensure they fall within the grid (defensive)
        voxel_indices = np.clip(voxel_indices, 0, grid_dims - 1)

        # Accumulate "delta functions" at atomic positions
        # EDUCATIONAL NOTE: In a more advanced implementation, we would
        # weigh atoms by their atomic number (Z) to reflect scattering power.
        for idx in voxel_indices:
            density[idx[0], idx[1], idx[2]] += 1.0

    # 3. Ensemble Averaging
    # We divide by the number of models to get the mean occupancy per voxel.
    density /= stack.stack_depth()

    # 4. Gaussian Blurring (Resolution Simulation)
    # This turns the "point cloud" into a continuous density volume.
    sigma_voxels = (resolution / 3.0) / grid_spacing
    logger.debug(f"Applying Gaussian blur with sigma={sigma_voxels:.2f} voxels")

    blurred_density = gaussian_filter(density, sigma=sigma_voxels)

    return blurred_density, c_min

save_mrc_file(path, density, origin, spacing=1.0)

Saves a 3D density array to an MRC/CCP4 formatted file.

Parameters:

Name Type Description Default
path str

Output filename.

required
density ndarray

3D numpy array of density values.

required
origin ndarray

(3,) array of coordinates for the (0,0,0) voxel.

required
spacing float

Grid spacing in Angstroms.

1.0
EDUCATIONAL NOTE - The MRC Format:

The MRC format is the standard for 3D electron microscopy data. It contains a 1024-byte header followed by the binary density data. Key header fields include: - NX, NY, NZ: Number of voxels in each dimension. - XLEN, YLEN, ZLEN: Physical size of the box in Angstroms. - MAPC, MAPR, MAPS: Mapping of array axes to physical axes (usually 1,2,3).

Source code in synth_pdb/cryo_em.py
def save_mrc_file(
    path: str,
    density: np.ndarray,
    origin: np.ndarray,
    spacing: float = 1.0,
) -> None:
    """Saves a 3D density array to an MRC/CCP4 formatted file.

    Args:
        path: Output filename.
        density: 3D numpy array of density values.
        origin: (3,) array of coordinates for the (0,0,0) voxel.
        spacing: Grid spacing in Angstroms.

    EDUCATIONAL NOTE - The MRC Format:
    ---------------------------------
    The MRC format is the standard for 3D electron microscopy data.
    It contains a 1024-byte header followed by the binary density data.
    Key header fields include:
    - NX, NY, NZ: Number of voxels in each dimension.
    - XLEN, YLEN, ZLEN: Physical size of the box in Angstroms.
    - MAPC, MAPR, MAPS: Mapping of array axes to physical axes (usually 1,2,3).
    """
    if not HAS_MRCFILE:
        raise ImportError(
            "mrcfile is not installed. Please install it with `pip install mrcfile` "
            "to save MRC files."
        )

    with mrcfile.new(path, overwrite=True) as mrc:
        # MRC files expect float32
        mrc.set_data(density.astype(np.float32))

        # Set voxel size (Angstroms)
        mrc.voxel_size = spacing

        # Set origin (Standard PDB/MRC translation)
        # Note: mrcfile origin is often stored in the 'origin' header field
        mrc.header.origin.x = origin[0]
        mrc.header.origin.y = origin[1]
        mrc.header.origin.z = origin[2]

        # Ensure the statistics are calculated correctly
        mrc.update_header_stats()

    logger.info(f"Successfully saved Cryo-EM map to {path}")

Scientific Principles

Resolution vs. Sigma

The relationship between the reported resolution (\(R\)) and the Gaussian standard deviation (\(\sigma\)) used for blurring is:

\[\sigma = \frac{R}{3}\]

This conservative "1/3 rule" ensures that the density reflects the structural details appropriate for the target resolution. At 3Å resolution, atoms are clearly separated; at 8Å, only secondary structure elements (like alpha helices) remain visible as "tubes" of density.

Conformational Heterogeneity

When simulating an ensemble (IDPs or flexible loops), the density at each voxel is the average occupancy across all models:

\[\rho(\mathbf{r}) = \frac{1}{N} \sum_{i=1}^N \rho_i(\mathbf{r})\]

This naturally results in lower, more diffuse density for highly mobile regions, mimicking the "B-factor" or local resolution effects seen in real experiments.

Usage Example

from synth_pdb.generator import PeptideGenerator
from synth_pdb.cryo_em import generate_density_map, save_mrc_file

# 1. Generate an ensemble of decoys
gen = PeptideGenerator("MEELQK")
ensemble = gen.generate_ensemble(n_models=50)

# 2. Generate a 4Å density map
density, origin = generate_density_map(
    ensemble, 
    resolution=4.0, 
    grid_spacing=1.0
)

# 3. Save to MRC for visualization
save_mrc_file("synthetic_map.mrc", density, origin, spacing=1.0)