chemical_shifts Module
The chemical_shifts module provides utilities for predicting NMR chemical shifts from protein structures.
Overview
Chemical shifts are highly sensitive indicators of local secondary structure and environment. This module provides shims to synth-nmr for predicting these values.
Main Functions
predict_chemical_shifts(structure, use_shiftx2=True)
Predict chemical shifts for a protein structure from Cartesian coordinates.
SCIENTIFIC BACKGROUND:
Chemical shift prediction is the "bridge" between the static atomic coordinates of a structural model and the experimental NMR observables. Empirical predictors like SHIFTX2 and SPARTA+ use machine-learning models trained on a large database of experimentally solved structures. These models primarily evaluate:
- Backbone dihedral angles (phi, psi, omega)
- Sidechain dihedrals (chi)
- Hydrogen-bonding geometry (O-H distances)
- Ring-current effects (proximity to aromatic rings)
Because these engines are parameterized on structures from the Protein Data Bank, which overwhelmingly contain naturally occurring L-amino acids, they are "blind" to the inverted chirality of D-amino acids. A left-handed alpha helix built from D-amino acids is physically just as stable as its right-handed L counterpart, but an L-biased engine sees its mirrored dihedral angles as a highly disfavored, coil-like state.
To solve this, this function implements a Dual-Pass Coordinate Inversion pipeline:
1. Base Pass: Standard prediction for L-amino acids.
2. Inversion Pass: The Cartesian coordinates are inverted through the origin (coord = -coord). This maps the D-enantiomers onto their exact L-enantiomer geometric equivalents, preserving all inter-atomic distances and angle magnitudes.
3. Merge: The base predictor evaluates the inverted structure, and the resulting shifts are merged back for the D-residues.
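The geometric claim behind the inversion pass can be checked numerically. The sketch below is an illustration only, not the module's code: the helper names (`invert`, `chiral_volume`) and the four coordinates are made up for the example. It shows that point inversion leaves every pairwise distance unchanged while flipping the sign of the chiral volume around a tetrahedral centre.

```python
import math
from itertools import combinations


def invert(coords):
    """Point inversion through the origin: (x, y, z) -> (-x, -y, -z)."""
    return [tuple(-c for c in p) for p in coords]


def chiral_volume(center, a, b, c):
    """Signed volume (a-center) . ((b-center) x (c-center)).

    The sign encodes the handedness of the three substituents around the center.
    """
    u = [a[i] - center[i] for i in range(3)]
    v = [b[i] - center[i] for i in range(3)]
    w = [c[i] - center[i] for i in range(3)]
    cross = (
        v[1] * w[2] - v[2] * w[1],
        v[2] * w[0] - v[0] * w[2],
        v[0] * w[1] - v[1] * w[0],
    )
    return sum(u[i] * cross[i] for i in range(3))


# Illustrative coordinates only (center first, then three substituents);
# these are NOT taken from a real residue.
atoms = [(0.0, 0.0, 0.0), (1.4, 0.2, 0.1), (-0.6, 1.3, 0.3), (-0.5, -0.8, 1.1)]
mirrored = invert(atoms)

original_volume = chiral_volume(*atoms)
mirrored_volume = chiral_volume(*mirrored)

# Pairwise distances before and after inversion, in the same pair order.
original_distances = [math.dist(p, q) for p, q in combinations(atoms, 2)]
mirrored_distances = [math.dist(p, q) for p, q in combinations(mirrored, 2)]
```

Because only the sign of the chiral volume changes, a predictor trained on L-geometry sees the inverted D-residue as an ordinary L-residue with identical local distances.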
Technical Implementation Details:
- CLONING: The D-to-L rename is performed in memory on a copy of the structure, so the original structural ensemble is never modified. This keeps the function safe to call from parallel generation workflows.
- MASKING: Residue-wise renaming uses vectorized NumPy boolean masks for performance on large structures (10^4+ atoms).
- GEOMETRY: This strategy captures the primary phi/psi/chi dependence of the chemical shift, providing a first-order estimate even for heavily modified synthetic peptides.
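The control flow of the dual-pass strategy can be sketched in plain Python. Everything here is an assumption made for illustration: the residue record layout, the `dual_pass_predict` and `toy_predictor` names, and the three-entry `D_TO_L` map (DAL, DLE, and DVA are the PDB component IDs for D-alanine, D-leucine, and D-valine) stand in for the module's real data structures and lookup map. The toy predictor is not a chemical shift model; it exists only so the effect of the inversion is visible.

```python
from copy import deepcopy

# Illustrative subset of a D-to-L residue name map (assumption, not the module's map).
D_TO_L = {"DAL": "ALA", "DLE": "LEU", "DVA": "VAL"}


def toy_predictor(residues):
    """Stand-in for a real engine: the 'shift' depends on the CA x-coordinate."""
    out = {}
    for res in residues:
        x = res["coords"]["CA"][0]
        out.setdefault(res["chain"], {})[res["res_id"]] = {"CA": 55.0 + x}
    return out


def dual_pass_predict(residues, predictor):
    """Base pass on the input, inversion pass on a copy, merge for D-residues."""
    d_keys = {(r["chain"], r["res_id"]) for r in residues if r["res_name"] in D_TO_L}

    shifts = predictor(residues)  # base pass
    if not d_keys:
        return shifts

    mirrored = deepcopy(residues)  # work on a copy; the input is never modified
    for res in mirrored:
        res["res_name"] = D_TO_L.get(res["res_name"], res["res_name"])
        res["coords"] = {
            atom: tuple(-c for c in xyz) for atom, xyz in res["coords"].items()
        }

    mirrored_shifts = predictor(mirrored)  # inversion pass
    for chain, res_id in d_keys:  # merge: overwrite D-residues only
        shifts.setdefault(chain, {})[res_id] = mirrored_shifts[chain][res_id]
    return shifts


residues = [
    {"chain": "A", "res_id": 1, "res_name": "ALA", "coords": {"CA": (1.0, 0.0, 0.0)}},
    {"chain": "A", "res_id": 2, "res_name": "DAL", "coords": {"CA": (2.0, 0.0, 0.0)}},
]
merged = dual_pass_predict(residues, toy_predictor)
```

In this sketch the L-residue keeps its base-pass value, while the D-residue takes the value computed on the inverted, renamed copy.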
ALGORITHM SELECTION & PRECISION:
The module supports two primary prediction engines:
1. SHIFTX2 (use_shiftx2=True): A hybrid method that combines an ensemble machine-learning predictor with sequence-based shift matching. It is widely regarded as one of the most accurate predictors (its authors report RMS errors of roughly 0.12 ppm for 1H-alpha and 0.17 ppm for amide 1H).
2. SPARTA+ (use_shiftx2=False): A robust empirical method based on a neural network trained on local geometry. It is highly reproducible and has no external binary dependencies, making it ideal for CI/CD.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| structure | Any | The Biotite AtomArray to predict shifts for. Must contain at least backbone N, CA, and C atoms. | required |
| use_shiftx2 | bool | If True, attempts to use the SHIFTX2 engine. If False, forces the use of the empirical SPARTA+ engine. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | dict[str, dict[int, dict[str, float]]] | A nested dictionary structure {chain: {res_id: {atom: value}}}. Values are in parts-per-million (ppm). |
Source code in synth_pdb/chemical_shifts.py
calculate_csi(shifts, structure)
Calculate the Chemical Shift Index (CSI) for a protein structure.
SCIENTIFIC BACKGROUND:
The Chemical Shift Index (CSI) is a robust empirical method for identifying secondary structure elements (helices, sheets) directly from NMR chemical shifts. The method relies on the observation that local backbone geometry exerts a predictable deshielding or shielding effect on specific nuclei relative to their "random coil" (unstructured) baseline values.
Continuous CSI values (often denoted as Delta-delta) are calculated by subtracting the sequence-specific random coil shift from the experimentally measured or predicted shift.
For the C-alpha (CA) nucleus:
- Positive deviations (> +0.7 ppm) strongly indicate an alpha-helical conformation. This is because the compact helical geometry typically deshields the CA nucleus.
- Negative deviations (< -0.7 ppm) strongly indicate a beta-sheet conformation. The extended geometry of a beta-strand shields the CA nucleus, resulting in an upfield shift.

For the C-beta (CB) nucleus, the pattern is reversed:
- Negative deviations indicate an alpha-helix.
- Positive deviations indicate a beta-sheet.
This implementation automatically calculates the deviations for all relevant
nuclei present in the shifts dictionary, allowing researchers to build
a consensus secondary structure map without requiring NOE distance restraints
or full 3D coordinate generation.
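The CA rule described above can be written as a small per-residue classifier. This is a minimal sketch under stated assumptions, not the module's implementation: the function names are hypothetical, and the random-coil table holds a few illustrative placeholder values rather than the sequence-corrected reference set a real calculation would use.

```python
# Illustrative random-coil CA shifts in ppm (placeholder values for this sketch only).
RANDOM_COIL_CA = {"ALA": 52.5, "GLY": 45.1, "LEU": 55.1}

# Threshold quoted in the text for CA secondary shifts.
CA_THRESHOLD_PPM = 0.7


def ca_deviation(res_name, observed_ppm):
    """Secondary shift: observed (or predicted) shift minus the random-coil value."""
    return observed_ppm - RANDOM_COIL_CA[res_name]


def classify_ca(deviation):
    """Map a CA deviation to a 3-state label using the +/-0.7 ppm thresholds."""
    if deviation > CA_THRESHOLD_PPM:
        return "H"  # downfield of random coil: helix-like
    if deviation < -CA_THRESHOLD_PPM:
        return "E"  # upfield of random coil: strand-like
    return "C"      # within the threshold band: coil


labels = [
    classify_ca(ca_deviation("ALA", ppm)) for ppm in (54.9, 50.9, 52.6)
]
```

A single-residue label like this is noisy; in practice, helix and strand assignments are usually only trusted when several consecutive residues agree.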
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| shifts | dict[str, dict[int, dict[str, float]]] | Predicted or experimental shifts in the nested dictionary format. | required |
| structure | Any | The Biotite AtomArray used for sequence and residue mapping. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Dict | dict[str, dict[int, float]] | Nested dictionary {chain: {res_id: deviation}}. |
Source code in synth_pdb/chemical_shifts.py
get_secondary_structure(shifts, structure)
Infers categorical secondary structure (H, E, C) from chemical shifts.
SCIENTIFIC BACKGROUND:
Categorical secondary structure annotation is typically performed using geometric criteria (e.g., DSSP or P-SEA) evaluated directly on the 3D coordinates of a protein structure. However, it can also be inferred from chemical shift data using tools like TALOS or CSI, which map sequence-specific shifts to standard secondary structure alphabets.
In this implementation, we utilize Biotite's annotate_sse function, which
implements a variant of the P-SEA algorithm. P-SEA works on the CA trace alone: it assigns elements to
continuous stretches of residues whose CA-CA distances, virtual angles, and virtual dihedrals match
reference values. For L-peptides, these correspond to the following backbone geometries:
- 'a' (Alpha Helix): Contiguous residues with phi ~ -60, psi ~ -45
- 'b' (Beta Strand): Contiguous residues with extended phi/psi angles
- 'c' (Random Coil): Everything else
Because empirical secondary structure parsing tools are formulated for naturally occurring L-amino acid geometries, they evaluate structural signatures against the dihedral ranges of L-residues (e.g., phi < 0 for right-handed helices).
When a valid D-peptide, whose geometry is the mirror image of its L counterpart, is evaluated by standard P-SEA, its mirrored angles (phi > 0) fall far outside the classical alpha-helical or beta-strand boundaries. The parser then misclassifies stable D-structures as unstructured "random coil" ("C") regions.
To correctly infer categorical labels for non-natural geometries, we must:
1. Identify any D-amino acids in the input structure using the lookup map.
2. Rename D-residues back to their L-parents so the atomic parsing matches.
3. If D-amino acids exist, invert the coordinates (coord = -coord) prior to evaluation.
This transformation reflects the D-enantiomeric structure through the origin. All internal distances (hydrogen bond lengths, backbone inter-atomic spacing) are preserved exactly, while the chirality is flipped to that of the L-enantiomer. The P-SEA algorithm can then recognize the structural motifs and classify the secondary structure.
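One way to see why a handedness-sensitive parser fails on mirrored structures is to look at the dihedral defined by four consecutive CA atoms. The sketch below builds an idealised right-handed alpha-helix CA trace (2.3 Å radius, 100° per residue, 1.5 Å rise; textbook values used here as assumptions) and compares that virtual dihedral before and after inversion. The helper names are hypothetical and the code is independent of Biotite.

```python
import math


def helix_ca_trace(n, radius=2.3, turn_deg=100.0, rise=1.5):
    """CA positions of an idealised right-handed alpha helix running along +z."""
    points = []
    for i in range(n):
        t = math.radians(turn_deg * i)
        points.append((radius * math.cos(t), radius * math.sin(t), rise * i))
    return points


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n2 = _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(_cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


helix = helix_ca_trace(4)
mirror = [tuple(-c for c in p) for p in helix]  # coord = -coord

l_angle = dihedral_deg(*helix)   # positive for the right-handed helix
d_angle = dihedral_deg(*mirror)  # same magnitude, opposite sign
```

The inverted trace has the same magnitude of virtual dihedral with the opposite sign, which is why inverting a D-peptide before annotation lets an L-parameterised parser recognise its helices.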
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
shifts
|
dict[str, dict[int, dict[str, float]]]
|
Predicted or experimental shifts. |
required |
structure
|
Any
|
The Biotite AtomArray used for mapping and sequence length. |
required |
Returns:
| Type | Description |
|---|---|
| list[str] | A list of 3-state (H, E, C) or DSSP labels per residue. |
Source code in synth_pdb/chemical_shifts.py
Usage Examples
Predicting Chemical Shifts
from synth_pdb.chemical_shifts import predict_chemical_shifts
# structure: biotite.structure.AtomArray
shifts = predict_chemical_shifts(structure)
# shifts: dict[str, dict[int, dict[str, float]]]
# {chain_id: {residue_id: {atom_name: shift_value_ppm}}}
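For downstream analysis it is often convenient to flatten the nested result into rows. The following sketch assumes only the documented {chain: {res_id: {atom: value}}} layout; the `flatten_shifts` helper and the hand-written example values are hypothetical and not part of the module.

```python
# Hand-written example values in the documented nested layout (ppm).
example_shifts = {
    "A": {
        1: {"CA": 52.4, "HA": 4.31},
        2: {"CA": 56.0, "CB": 30.2},
    }
}


def flatten_shifts(shifts):
    """Turn {chain: {res_id: {atom: ppm}}} into sorted (chain, res_id, atom, ppm) rows."""
    rows = []
    for chain_id in sorted(shifts):
        for res_id in sorted(shifts[chain_id]):
            for atom, ppm in sorted(shifts[chain_id][res_id].items()):
                rows.append((chain_id, res_id, atom, ppm))
    return rows


rows = flatten_shifts(example_shifts)

# Select one nucleus type, e.g. all CA shifts, from the flat rows.
ca_only = [row for row in rows if row[2] == "CA"]
```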
Chemical Shift Index (CSI)
Analyze secondary structure based on H-alpha, C-alpha, and C-beta shifts.
from synth_pdb.chemical_shifts import calculate_csi
csi = calculate_csi(shifts, structure)
# csi: dict[str, dict[int, float]]
# {chain_id: {residue_id: deviation}}
References
- Random Coil Shifts: Wishart, D. S., et al. (1995). "1H, 13C and 15N random coil NMR chemical shifts of the common amino acids. I. Investigations of nearest-neighbor effects." Journal of Biomolecular NMR. DOI: 10.1007/BF00211783
- Ring Current Effects: Haigh, C. W., & Mallion, R. B. (1979). "Ring current theories in nuclear magnetic resonance." Progress in Nuclear Magnetic Resonance Spectroscopy. DOI: 10.1016/0079-6565(79)80010-2
See Also
- nmr Module - General NMR utilities
- relaxation Module - NMR dynamics and relaxation
- Scientific Background: NMR Theory