Skip to content

saxs Module

The saxs module enables the simulation of Small-Angle X-ray Scattering (SAXS) profiles from atomic structures and ensembles.

Overview

SAXS is a technique used to probe the global shape, size, and flexibility of proteins in solution. This module uses the Debye Formula to calculate the scattering intensity \(I(q)\) as a function of the scattering vector magnitude \(q\).

Key Features

  • Debye Formula Implementation: Accurate \(O(N^2)\) calculation of interference patterns.
  • Atomic Form Factors: Uses \(q\)-dependent Gaussian approximations for C, N, O, S, P, and H.
  • Solvent Contrast: Accounts for the scattering of the displaced solvent volume (hydration shell approximation).
  • Ensemble Averaging: Computes the mean scattering profile for flexible structures or structural ensembles.

API Reference

saxs

EDUCATIONAL OVERVIEW - SAXS Curve Simulation:

---------------------------------------------

Small-Angle X-ray Scattering (SAXS) is a fundamental technique for studying

protein structure and dynamics in solution. This module computes synthetic

scattering curves (I(q) vs q) from atomic coordinates.

SCIENTIFIC PRINCIPLES:

----------------------

1. The Debye Formula: The scattering intensity I(q) is computed by summing the

interference between all pairs of atoms in the molecule.

I(q) = sum_i sum_j f_i(q) f_j(q) * sin(q * r_ij) / (q * r_ij)

where q is the scattering vector magnitude and r_ij is the distance between

atoms i and j.

2. Atomic Form Factors: Atoms of different elements scatter X-rays with

different efficiencies. We use q-dependent form factors approximated by

a sum of Gaussians (Waasmaier & Kirfel, 1995).

3. Solvent Contrast (Solvation Shell): In SAXS, we measure the "excess"

scattering of the protein relative to the solvent. We subtract the

scattering contribution of the displaced solvent volume (V) for each atom.

CRITICAL PHYSICAL STABILITY NOTE:

The effective scattering factor is f_eff(q) = f_vac(q) - rho_sol * V * exp(-q^2 * R^2 / 10).

If the volume V is underestimated (e.g., V=0 for H), the f_eff(q) contrast

profile becomes unstable. Specifically, the upward "pressure" from the

decaying solvent term can exceed the downward "pressure" from the protein's

interferometry, causing non-physical increases in I(q) at low q.

Maintaining standard volumes (Pavlov & Svergun, 1997) is essential.

REFERENCES:

-----------

- Waasmaier, D. & Kirfel, A. (1995). New analytical scattering-factor

functions for free atoms and ions. Acta Cryst. A51, 416-431.

- Pavlov, M.Y. & Svergun, D.I. (1997). A dataset for testing the

algorithms of small-angle scattering data analysis. J. Appl. Cryst. 30, 712-717.

- Svergun, D., Barberato, C. & Koch, M. H. (1995). CRYSOL - a program to

evaluate X-ray solution scattering of biological macromolecules from

atomic coordinates. J. Appl. Cryst. 28, 768-773.

Classes

SaxsSimulator

Stateful SAXS simulator for ensembles.

Source code in synth_pdb/saxs.py
class SaxsSimulator:
    """Stateful SAXS simulator for ensembles."""

    def __init__(
        self,
        q_min: float = 0.0,
        q_max: float = 0.5,
        n_points: int = 51,
        include_solvent: bool = True,
    ):
        self.q_min = q_min
        self.q_max = q_max
        self.n_points = n_points
        self.include_solvent = include_solvent

    def simulate(self, structure: struc.AtomArray | struc.AtomArrayStack) -> np.ndarray:
        """Computes the averaged SAXS profile for a structure or ensemble."""
        if hasattr(structure, "stack_depth") and structure.stack_depth() > 0:
            # For ensembles, average the intensities
            all_intensities = []
            for i in range(structure.stack_depth()):
                _, intensity = calculate_saxs_profile(
                    structure[i],
                    q_min=self.q_min,
                    q_max=self.q_max,
                    n_points=self.n_points,
                    include_solvent=self.include_solvent,
                )
                all_intensities.append(intensity)

            if all_intensities:
                return cast(np.ndarray, np.mean(all_intensities, axis=0))
            return np.zeros(self.n_points)

        if isinstance(structure, struc.AtomArrayStack) and structure.stack_depth() == 0:
            logger.warning("Attempted to simulate SAXS on an empty ensemble.")
            return np.zeros(self.n_points)

        # Single structure
        _, intensity = calculate_saxs_profile(
            structure,  # type: ignore[arg-type]
            q_min=self.q_min,
            q_max=self.q_max,
            n_points=self.n_points,
            include_solvent=self.include_solvent,
        )
        return intensity
Functions
simulate(structure)

Computes the averaged SAXS profile for a structure or ensemble.

Source code in synth_pdb/saxs.py
def simulate(self, structure: struc.AtomArray | struc.AtomArrayStack) -> np.ndarray:
    """Computes the averaged SAXS profile for a structure or ensemble."""
    if hasattr(structure, "stack_depth") and structure.stack_depth() > 0:
        # For ensembles, average the intensities
        all_intensities = []
        for i in range(structure.stack_depth()):
            _, intensity = calculate_saxs_profile(
                structure[i],
                q_min=self.q_min,
                q_max=self.q_max,
                n_points=self.n_points,
                include_solvent=self.include_solvent,
            )
            all_intensities.append(intensity)

        if all_intensities:
            return cast(np.ndarray, np.mean(all_intensities, axis=0))
        return np.zeros(self.n_points)

    if isinstance(structure, struc.AtomArrayStack) and structure.stack_depth() == 0:
        logger.warning("Attempted to simulate SAXS on an empty ensemble.")
        return np.zeros(self.n_points)

    # Single structure
    _, intensity = calculate_saxs_profile(
        structure,  # type: ignore[arg-type]
        q_min=self.q_min,
        q_max=self.q_max,
        n_points=self.n_points,
        include_solvent=self.include_solvent,
    )
    return intensity

Functions

calculate_saxs_profile(structure, q_min=0.0, q_max=0.5, n_points=51, include_solvent=True, solvent_density=0.334)

Calculate the SAXS profile I(q) for a protein structure.

This implements the Debye formula with O(N^2) complexity.

Parameters:

Name Type Description Default
structure AtomArray

Biotite AtomArray (full atom recommended).

required
q_min float

Minimum q value (default 0.0).

0.0
q_max float

Maximum q value (default 0.5).

0.5
n_points int

Number of q points.

51
include_solvent bool

If True, subtracts displaced solvent volume.

True
solvent_density float

Electron density of the solvent.

0.334

Returns:

Type Description
tuple[ndarray, ndarray]

Tuple of (q_values, intensity_values).

Source code in synth_pdb/saxs.py
def calculate_saxs_profile(
    structure: struc.AtomArray,
    q_min: float = 0.0,
    q_max: float = 0.5,
    n_points: int = 51,
    include_solvent: bool = True,
    solvent_density: float = 0.334,  # e/A^3 (Water)
) -> tuple[np.ndarray, np.ndarray]:
    """Calculate the SAXS profile I(q) for a protein structure.

    This implements the Debye formula with O(N^2) complexity.

    Args:
        structure: Biotite AtomArray (full atom recommended).
        q_min: Minimum q value (default 0.0).
        q_max: Maximum q value (default 0.5).
        n_points: Number of q points.
        include_solvent: If True, subtracts displaced solvent volume.
        solvent_density: Electron density of the solvent.

    Returns:
        Tuple of (q_values, intensity_values).
    """
    n_atoms = structure.array_length()
    logger.info(f"Calculating SAXS profile for {n_atoms} atoms...")

    q = np.linspace(q_min, q_max, n_points)

    # 1. Precompute inter-atomic distances (N x N matrix)
    coords = structure.coord
    if coords.ndim == 3:
        # If passed an AtomArrayStack with 1 model, flatten to 2D
        coords = coords[0]

    # Use scipy for efficient distance calculation
    dist = cdist(coords, coords)

    # 2. Vectorized form factor calculation
    elements = structure.element
    unique_elements = np.unique(elements)
    f_atoms_array = np.zeros((n_atoms, n_points))

    for elem in unique_elements:
        mask = elements == elem
        f_atom = get_form_factor(elem, q)

        if include_solvent:
            # Solvent displacement: f_eff = f_vac - rho_sol * V * exp(-q^2 * R^2 / 10)
            # R is the effective atomic radius: R = (3V / 4pi)^(1/3)
            #
            # SCIENTIFIC NOTE - Monotonicity and Decay:
            # ----------------------------------------
            # The exponent -q^2 * R^2 / K represents the decay of the solvent
            # displacement volume. Using K=6 (Radius of Gyration of a sphere)
            # is physically standard but can lead to non-monotonicity if
            # atomic volumes are small. We use K=10.0 for improved numerical
            # stability across all structure sizes, ensuring the protein's
            # interference always dominates the solvent decay at low q.
            v = FORM_FACTOR_COEFFS.get(elem.upper(), FORM_FACTOR_COEFFS["C"])["volume"]
            decay_rate = ((3 * v) / (4 * np.pi)) ** (2 / 3) / 10.0
            f_sol = solvent_density * v * np.exp(-(q**2) * decay_rate)
            f_atom = f_atom - f_sol

        f_atoms_array[mask] = f_atom

    # 3. Apply Debye formula: I(q) = sum_i sum_j f_i(q) * f_j(q) * sinc(q * r_ij)
    intensity = np.zeros(n_points)

    for i in range(n_points):
        qi = q[i]
        fi = f_atoms_array[:, i]

        if qi < 1e-7:
            # At q=0, sinc(qr) = 1, so I(0) = (sum f_i) ** 2
            intensity[i] = np.sum(fi) ** 2
        else:
            # Use np.sinc for numerical stability
            # Note: np.sinc(x) is sin(pi*x) / (pi*x), so we pass qr/pi
            qr_over_pi = (qi * dist) / np.pi
            sinc_qr = np.sinc(qr_over_pi)

            # Use dot product for faster summation: fi^T * sinc_qr * fi
            intensity[i] = fi @ (sinc_qr @ fi)

    return q, intensity

export_saxs_profile(q, intensity, output_file)

Export SAXS data to a standard .dat file (q, I, error).

Source code in synth_pdb/saxs.py
def export_saxs_profile(q: np.ndarray, intensity: np.ndarray, output_file: str) -> None:
    """Export SAXS data to a standard .dat file (q, I, error)."""
    # For synthetic data, we can provide a small dummy error (1% of intensity)
    error = intensity * 0.01
    data = np.column_stack([q, intensity, error])
    header = "Generated by synth-pdb\nq (A^-1)   I(q)       error"
    np.savetxt(output_file, data, header=header, fmt="%.6e")
    logger.info(f"SAXS profile exported to {output_file}")

Scientific Principles

The Debye Formula

The total scattering intensity \(I(q)\) is computed by summing the interference between all pairs of atoms \(i\) and \(j\):

\[I(q) = \sum_i \sum_j f_i(q) f_j(q) \frac{\sin(q r_{ij})}{q r_{ij}}\]

where: - \(q\) is the scattering vector (\(q = \frac{4\pi \sin \theta}{\lambda}\)). - \(r_{ij}\) is the distance between atoms \(i\) and \(j\). - \(f_i(q)\) is the effective atomic form factor.

Solvent Subtraction

In solution, we measure the "excess" scattering of the protein. The module approximates the effective form factor as:

\[f_{eff}(q) = f_{vac}(q) - \rho_{sol} V_i \exp\left(-\frac{q^2 V_i^{2/3}}{4\pi}\right)\]

where \(\rho_{sol}\) is the electron density of the solvent and \(V_i\) is the atomic volume.

Usage Example

from synth_pdb.generator import PeptideGenerator
from synth_pdb.saxs import calculate_saxs_profile, export_saxs_profile

# 1. Generate a structure
gen = PeptideGenerator("NSDSECPLSHDGYCLHDGVCMYIEALDKYACNCVVGYIGERCQ")
structure = gen.generate(conformation="alpha")

# 2. Simulate SAXS profile
q, intensity = calculate_saxs_profile(
    structure, 
    q_min=0.0, 
    q_max=0.5, 
    n_points=100
)

# 3. Export to .dat file
export_saxs_profile(q, intensity, "protein_saxs.dat")