🔍 Protein Quality Assessment Lab¶

Welcome to the Protein Quality Assessment Lab!

In this tutorial, you'll learn how to evaluate protein structures like a professional structural biologist.

🎯 Goal¶

Understand what makes a "good" vs "bad" protein structure by applying validation metrics used by the Protein Data Bank (PDB) and tools like MolProbity.

📚 What You'll Learn¶

Ramachandran Analysis: Check backbone geometry
Clash Detection: Find steric overlaps
Bond Geometry: Validate lengths and angles
Rotamer Analysis: Verify side-chain conformations
Overall Quality Scoring: Combine metrics

⚠️ The Golden Rule¶

Garbage In, Garbage Out

A beautiful molecular dynamics simulation on a terrible structure is still terrible. Always validate your starting structures!

In [ ]:

Copied!





# 🛠️ SETUP: Environment and Dependencies
import os
import sys

# Check if we are running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("🌐 Running in Google Colab - Installing public dependencies...")
    # These are NOT pre-installed in Colab
    %pip install -q biotite py3Dmol ipywidgets
    
    # Path handling for Colab
    # If the user opened from GitHub, they might need to clone or we append path
    if not os.path.exists('synth_pdb'):
        print("⚠️ synth_pdb not found in current directory. Appending path...")
else:
    print("💻 Running locally")

# Ensure synth_pdb is in the python path
# '../../' should point to the repository root relative to this notebook
sys.path.append(os.path.abspath('../../'))

print("✅ Setup complete!")
# 🛠️ SETUP: Environment and Dependencies
import os
import sys

# Check if we are running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    print("🌐 Running in Google Colab - Installing public dependencies...")
    # These are NOT pre-installed in Colab
    %pip install -q biotite py3Dmol ipywidgets
    
    # Path handling for Colab
    # If the user opened from GitHub, they might need to clone or we append path
    if not os.path.exists('synth_pdb'):
        print("⚠️ synth_pdb not found in current directory. Appending path...")
else:
    print("💻 Running locally")

# Ensure synth_pdb is in the python path
# '../../' should point to the repository root relative to this notebook
sys.path.append(os.path.abspath('../../'))

print("✅ Setup complete!")

In [ ]:

Copied!





import io

import biotite.structure as struc
import biotite.structure.io.pdb as pdb
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display

from synth_pdb.generator import generate_pdb_content
from synth_pdb.validator import PDBValidator
from synth_pdb.geometry import calculate_dihedral, calculate_rmsd, superimpose_structures

print("📦 All imports successful!")
import io

import biotite.structure as struc
import biotite.structure.io.pdb as pdb
import ipywidgets as widgets
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display

from synth_pdb.generator import generate_pdb_content
from synth_pdb.validator import PDBValidator
from synth_pdb.geometry import calculate_dihedral, calculate_rmsd, superimpose_structures

print("📦 All imports successful!")

Step 1: Generate Test Structures¶

We'll create two structures:

Good Structure: Properly minimized with realistic geometry
Bad Structure: No energy minimization (raw NeRF output)

This lets us compare quality metrics side-by-side.

In [ ]:

Copied!





# Generate a Good Structure (with minimization)
print("🧬 Generating GOOD structure (with energy minimization)...")
sequence = "MKFLKFSLLTAVLLSVVFAFSSCGDDDDAKAAAKAAAKAAAKEAAAKEAAAKA"
structure_def = "1-10:alpha,11-20:beta,21-25:random,26-35:alpha,36-45:beta,46-53:alpha"

good_pdb = generate_pdb_content(
    sequence_str=sequence,
    structure=structure_def,
    optimize_sidechains=True,
    minimize_energy=True  # KEY: Energy minimization ON
)

print("✅ Good structure ready!")
print(f"   Length: {len(good_pdb)} characters")
# Generate a Good Structure (with minimization)
print("🧬 Generating GOOD structure (with energy minimization)...")
sequence = "MKFLKFSLLTAVLLSVVFAFSSCGDDDDAKAAAKAAAKAAAKEAAAKEAAAKA"
structure_def = "1-10:alpha,11-20:beta,21-25:random,26-35:alpha,36-45:beta,46-53:alpha"

good_pdb = generate_pdb_content(
    sequence_str=sequence,
    structure=structure_def,
    optimize_sidechains=True,
    minimize_energy=True  # KEY: Energy minimization ON
)

print("✅ Good structure ready!")
print(f"   Length: {len(good_pdb)} characters")

In [ ]:

Copied!





# Generate a Bad Structure (No energy minimization + torsion drift)
print("⚠️  Generating BAD structure (Drifted Decoy)...")

# We use drift to create a structure that has a fold but is physically misfolded
bad_pdb = generate_pdb_content(
    sequence_str=sequence,
    structure=structure_def,
    drift=139.0,                # KEY: Add torsion noise to create outliers
    optimize_sidechains=False,  # KEY: No optimization to create clashes
    minimize_energy=False       # KEY: No minimization
)

print("✅ Bad structure (Decoy) ready!")
print(f"   Length: {len(bad_pdb)} characters")
print("\n💡 TIP: This 'decoy' has a realistic fold but is physically distorted.")
# Generate a Bad Structure (No energy minimization + torsion drift)
print("⚠️  Generating BAD structure (Drifted Decoy)...")

# We use drift to create a structure that has a fold but is physically misfolded
bad_pdb = generate_pdb_content(
    sequence_str=sequence,
    structure=structure_def,
    drift=139.0,                # KEY: Add torsion noise to create outliers
    optimize_sidechains=False,  # KEY: No optimization to create clashes
    minimize_energy=False       # KEY: No minimization
)

print("✅ Bad structure (Decoy) ready!")
print(f"   Length: {len(bad_pdb)} characters")
print("\n💡 TIP: This 'decoy' has a realistic fold but is physically distorted.")

Step 2: Ramachandran Plot Analysis¶

The Ramachandran Plot is the most fundamental quality check.

📐 What is it?¶

It plots the backbone dihedral angles (φ, ψ) for each residue. Only certain combinations are sterically allowed:

Blue Region: Alpha helices (φ ≈ -60°, ψ ≈ -45°)
Red Region: Beta sheets (φ ≈ -120°, ψ ≈ +120°)
Forbidden Zones: Atoms would overlap (bad!)

✅ Good Structure¶

90%+ residues in favored regions
No outliers in forbidden zones

❌ Bad Structure¶

Many outliers
Scattered distribution

In [ ]:

Copied!





def plot_ramachandran(pdb_content, title="Ramachandran Plot"):
    """Extract phi/psi angles and plot Ramachandran diagram."""
    pdb_file = pdb.PDBFile.read(io.StringIO(pdb_content))
    structure = pdb_file.get_structure(model=1)

    # Calculate backbone dihedrals
    phi, psi, omega = struc.dihedral_backbone(structure)

    # Convert to degrees and filter NaN
    phi_deg = []
    psi_deg = []
    colors = []

    for i in range(len(phi)):
        if not np.isnan(phi[i]) and not np.isnan(psi[i]):
            p = np.degrees(phi[i])
            s = np.degrees(psi[i])
            phi_deg.append(p)
            psi_deg.append(s)

            # Color by region (Using Broadened 'Professional Allowed' regions)
            # Alpha/General Allowed: Phi in [-180, -30], Psi in [-120, 60]
            if -180 < p < -30 and -120 < s < 60:
                colors.append('blue')  # Alpha favored/allowed
            # Beta Allowed: Phi in [-180, -45], Psi in [60, 180] or [-180, -140]
            elif -180 < p < -45 and (60 < s < 180 or -180 < s < -140):
                colors.append('red')   # Beta favored/allowed
            elif p > 0:
                colors.append('green') # Left-handed (usually Glycine)
            else:
                colors.append('orange') # Other/outlier

    # Background regions (Professional Allowed Boundaries)
    plt.gca().add_patch(plt.Rectangle((-180, -120), 150, 180,
                                      color='blue', alpha=0.1, label='Alpha/Gen Allowed'))
    plt.gca().add_patch(plt.Rectangle((-180, 60), 135, 120,
                                      color='red', alpha=0.1, label='Beta Allowed'))
    # Wraparound for beta
    plt.gca().add_patch(plt.Rectangle((-180, -180), 135, 40,
                                      color='red', alpha=0.1))

    # Data points with small jitter
    jitter_phi = np.random.normal(0, 1.5, size=len(phi_deg))
    jitter_psi = np.random.normal(0, 1.5, size=len(psi_deg))
    plt.scatter(np.array(phi_deg) + jitter_phi, np.array(psi_deg) + jitter_psi, 
                c=colors, alpha=0.6, edgecolors='black', s=50)

    plt.xlim(-180, 180)
    plt.ylim(-180, 180)
    plt.axhline(0, color='black', linewidth=0.5, alpha=0.3)
    plt.axvline(0, color='black', linewidth=0.5, alpha=0.3)
    plt.xlabel('Phi (φ) degrees', fontsize=12)
    plt.ylabel('Psi (ψ) degrees', fontsize=12)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.2)
    plt.legend(loc='upper right')
    plt.tight_layout()

    # Calculate statistics
    total = len(phi_deg)
    favored = sum(1 for c in colors if c in ['blue', 'red'])
    outliers = sum(1 for c in colors if c == 'orange')

    return {
        'total': total,
        'favored': favored,
        'favored_pct': 100 * favored / total if total > 0 else 0,
        'outliers': outliers,
        'outliers_pct': 100 * outliers / total if total > 0 else 0
    }
def plot_ramachandran(pdb_content, title="Ramachandran Plot"):
    """Extract phi/psi angles and plot Ramachandran diagram."""
    pdb_file = pdb.PDBFile.read(io.StringIO(pdb_content))
    structure = pdb_file.get_structure(model=1)

    # Calculate backbone dihedrals
    phi, psi, omega = struc.dihedral_backbone(structure)

    # Convert to degrees and filter NaN
    phi_deg = []
    psi_deg = []
    colors = []

    for i in range(len(phi)):
        if not np.isnan(phi[i]) and not np.isnan(psi[i]):
            p = np.degrees(phi[i])
            s = np.degrees(psi[i])
            phi_deg.append(p)
            psi_deg.append(s)

            # Color by region (Using Broadened 'Professional Allowed' regions)
            # Alpha/General Allowed: Phi in [-180, -30], Psi in [-120, 60]
            if -180 < p < -30 and -120 < s < 60:
                colors.append('blue')  # Alpha favored/allowed
            # Beta Allowed: Phi in [-180, -45], Psi in [60, 180] or [-180, -140]
            elif -180 < p < -45 and (60 < s < 180 or -180 < s < -140):
                colors.append('red')   # Beta favored/allowed
            elif p > 0:
                colors.append('green') # Left-handed (usually Glycine)
            else:
                colors.append('orange') # Other/outlier

    # Background regions (Professional Allowed Boundaries)
    plt.gca().add_patch(plt.Rectangle((-180, -120), 150, 180,
                                      color='blue', alpha=0.1, label='Alpha/Gen Allowed'))
    plt.gca().add_patch(plt.Rectangle((-180, 60), 135, 120,
                                      color='red', alpha=0.1, label='Beta Allowed'))
    # Wraparound for beta
    plt.gca().add_patch(plt.Rectangle((-180, -180), 135, 40,
                                      color='red', alpha=0.1))

    # Data points with small jitter
    jitter_phi = np.random.normal(0, 1.5, size=len(phi_deg))
    jitter_psi = np.random.normal(0, 1.5, size=len(psi_deg))
    plt.scatter(np.array(phi_deg) + jitter_phi, np.array(psi_deg) + jitter_psi, 
                c=colors, alpha=0.6, edgecolors='black', s=50)

    plt.xlim(-180, 180)
    plt.ylim(-180, 180)
    plt.axhline(0, color='black', linewidth=0.5, alpha=0.3)
    plt.axvline(0, color='black', linewidth=0.5, alpha=0.3)
    plt.xlabel('Phi (φ) degrees', fontsize=12)
    plt.ylabel('Psi (ψ) degrees', fontsize=12)
    plt.title(title, fontsize=14, fontweight='bold')
    plt.grid(True, alpha=0.2)
    plt.legend(loc='upper right')
    plt.tight_layout()

    # Calculate statistics
    total = len(phi_deg)
    favored = sum(1 for c in colors if c in ['blue', 'red'])
    outliers = sum(1 for c in colors if c == 'orange')

    return {
        'total': total,
        'favored': favored,
        'favored_pct': 100 * favored / total if total > 0 else 0,
        'outliers': outliers,
        'outliers_pct': 100 * outliers / total if total > 0 else 0
    }

In [ ]:

Copied!





# Compare Good vs Bad
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

plt.sca(ax1)
good_stats = plot_ramachandran(good_pdb, title="GOOD: Minimized & Optimized")

plt.sca(ax2)
bad_stats = plot_ramachandran(bad_pdb, title="BAD: Drifted Decoy (No Minimization)")

plt.show()

# Print reports
print("📊 RAMACHANDRAN STATISTICS")
print("=" * 50)
print(f"GOOD Structure: {good_stats['favored_pct']:.1f}% in favored regions")
print(f"                {good_stats['outliers_pct']:.1f}% outliers")
print(f"\nBAD Structure:  {bad_stats['favored_pct']:.1f}% in favored regions")
print(f"                {bad_stats['outliers_pct']:.1f}% outliers")

print("\n💡 INTERPRETATION:")
print("   - 'Bad' structures (Decoys) have a fold but are physically misfolded.")
print("   - 'Good' structures have been relaxed to resolve all steric clashes.")
# Compare Good vs Bad
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

plt.sca(ax1)
good_stats = plot_ramachandran(good_pdb, title="GOOD: Minimized & Optimized")

plt.sca(ax2)
bad_stats = plot_ramachandran(bad_pdb, title="BAD: Drifted Decoy (No Minimization)")

plt.show()

# Print reports
print("📊 RAMACHANDRAN STATISTICS")
print("=" * 50)
print(f"GOOD Structure: {good_stats['favored_pct']:.1f}% in favored regions")
print(f"                {good_stats['outliers_pct']:.1f}% outliers")
print(f"\nBAD Structure:  {bad_stats['favored_pct']:.1f}% in favored regions")
print(f"                {bad_stats['outliers_pct']:.1f}% outliers")

print("\n💡 INTERPRETATION:")
print("   - 'Bad' structures (Decoys) have a fold but are physically misfolded.")
print("   - 'Good' structures have been relaxed to resolve all steric clashes.")

Step 3: Clash Detection¶

Steric clashes occur when atoms are too close together (overlapping van der Waals radii).

🔬 Detection Method¶

For each atom pair:

Calculate distance
Compare to sum of van der Waals radii
If distance < 0.4 Å below expected → CLASH!

✅ Good Structure¶

Few or no clashes
Clashes only in flexible loops (acceptable)

❌ Bad Structure¶

Many clashes throughout
Clashes in core regions (unacceptable)

In [ ]:

Copied!





def detect_clashes(pdb_content, clash_threshold=2.0):
    """Detect steric clashes between atoms."""
    pdb_file = pdb.PDBFile.read(io.StringIO(pdb_content))
    structure = pdb_file.get_structure(model=1)

    # Get all heavy atoms (no hydrogens)
    heavy = structure[structure.element != 'H']

    clashes = []
    n_atoms = len(heavy)

    # Pairwise distance check (simplified - real tools use spatial indexing)
    for i in range(min(n_atoms, 500)):  # Limit for speed
        for j in range(i+1, min(n_atoms, 500)):
            # Skip if same residue or adjacent residues
            if abs(heavy.res_id[i] - heavy.res_id[j]) <= 1:
                continue

            # Calculate distance
            dist = np.linalg.norm(heavy.coord[i] - heavy.coord[j])

            # Check for clash
            if dist < clash_threshold:
                clashes.append({
                    'atom1': f"{heavy.res_name[i]}{heavy.res_id[i]}:{heavy.atom_name[i]}",
                    'atom2': f"{heavy.res_name[j]}{heavy.res_id[j]}:{heavy.atom_name[j]}",
                    'distance': dist
                })

    return clashes

print("✅ Clash detection function ready!")
def detect_clashes(pdb_content, clash_threshold=2.0):
    """Detect steric clashes between atoms."""
    pdb_file = pdb.PDBFile.read(io.StringIO(pdb_content))
    structure = pdb_file.get_structure(model=1)

    # Get all heavy atoms (no hydrogens)
    heavy = structure[structure.element != 'H']

    clashes = []
    n_atoms = len(heavy)

    # Pairwise distance check (simplified - real tools use spatial indexing)
    for i in range(min(n_atoms, 500)):  # Limit for speed
        for j in range(i+1, min(n_atoms, 500)):
            # Skip if same residue or adjacent residues
            if abs(heavy.res_id[i] - heavy.res_id[j]) <= 1:
                continue

            # Calculate distance
            dist = np.linalg.norm(heavy.coord[i] - heavy.coord[j])

            # Check for clash
            if dist < clash_threshold:
                clashes.append({
                    'atom1': f"{heavy.res_name[i]}{heavy.res_id[i]}:{heavy.atom_name[i]}",
                    'atom2': f"{heavy.res_name[j]}{heavy.res_id[j]}:{heavy.atom_name[j]}",
                    'distance': dist
                })

    return clashes

print("✅ Clash detection function ready!")

In [ ]:

Copied!





# Detect clashes in both structures
print("🔍 Detecting steric clashes...\n")

good_clashes = detect_clashes(good_pdb)
bad_clashes = detect_clashes(bad_pdb)

print("📊 CLASH STATISTICS")
print("=" * 50)
print(f"✅ GOOD Structure: {len(good_clashes)} clashes detected")
if good_clashes:
    print("   Top 3 clashes:")
    for clash in sorted(good_clashes, key=lambda x: x['distance'])[:3]:
        print(f"     {clash['atom1']} ↔ {clash['atom2']}: {clash['distance']:.2f} Å")

print(f"\n⚠️  BAD Structure: {len(bad_clashes)} clashes detected")
if bad_clashes:
    print("   Top 3 clashes:")
    for clash in sorted(bad_clashes, key=lambda x: x['distance'])[:3]:
        print(f"     {clash['atom1']} ↔ {clash['atom2']}: {clash['distance']:.2f} Å")

print("\n💡 INTERPRETATION:")
print("   <10 clashes = Excellent")
print("   10-50 clashes = Acceptable (may need refinement)")
print("   >50 clashes = Poor quality")
# Detect clashes in both structures
print("🔍 Detecting steric clashes...\n")

good_clashes = detect_clashes(good_pdb)
bad_clashes = detect_clashes(bad_pdb)

print("📊 CLASH STATISTICS")
print("=" * 50)
print(f"✅ GOOD Structure: {len(good_clashes)} clashes detected")
if good_clashes:
    print("   Top 3 clashes:")
    for clash in sorted(good_clashes, key=lambda x: x['distance'])[:3]:
        print(f"     {clash['atom1']} ↔ {clash['atom2']}: {clash['distance']:.2f} Å")

print(f"\n⚠️  BAD Structure: {len(bad_clashes)} clashes detected")
if bad_clashes:
    print("   Top 3 clashes:")
    for clash in sorted(bad_clashes, key=lambda x: x['distance'])[:3]:
        print(f"     {clash['atom1']} ↔ {clash['atom2']}: {clash['distance']:.2f} Å")

print("\n💡 INTERPRETATION:")
print("   <10 clashes = Excellent")
print("   10-50 clashes = Acceptable (may need refinement)")
print("   >50 clashes = Poor quality")

Step 4: Comprehensive Validation¶

Now let's use synth-pdb's built-in validator to run a complete quality check.

In [ ]:

Copied!





# Validate Good Structure
print("🔍 Validating GOOD structure...\n")
good_validator = PDBValidator(good_pdb)
good_validator.validate_all()
good_violations = good_validator.get_violations()

print(f"\n{'='*50}")
if not good_violations:
    print("✅ No violations! Structure passes all checks.")
else:
    print(f"⚠️  Found {len(good_violations)} violations:")
    for v in good_violations[:5]:
        print(f"   - {v}")
    if len(good_violations) > 5:
        print(f"   ... and {len(good_violations)-5} more")
# Validate Good Structure
print("🔍 Validating GOOD structure...\n")
good_validator = PDBValidator(good_pdb)
good_validator.validate_all()
good_violations = good_validator.get_violations()

print(f"\n{'='*50}")
if not good_violations:
    print("✅ No violations! Structure passes all checks.")
else:
    print(f"⚠️  Found {len(good_violations)} violations:")
    for v in good_violations[:5]:
        print(f"   - {v}")
    if len(good_violations) > 5:
        print(f"   ... and {len(good_violations)-5} more")

In [ ]:

Copied!





# Validate Bad Structure
print("🔍 Validating BAD structure...\n")
bad_validator = PDBValidator(bad_pdb)
bad_validator.validate_all()
bad_violations = bad_validator.get_violations()

print(f"\n{'='*50}")
if not bad_violations:
    print("✅ No violations! Structure passes all checks.")
else:
    print(f"⚠️  Found {len(bad_violations)} violations:")
    for v in bad_violations[:5]:
        print(f"   - {v}")
    if len(bad_violations) > 5:
        print(f"   ... and {len(bad_violations)-5} more")
# Validate Bad Structure
print("🔍 Validating BAD structure...\n")
bad_validator = PDBValidator(bad_pdb)
bad_validator.validate_all()
bad_violations = bad_validator.get_violations()

print(f"\n{'='*50}")
if not bad_violations:
    print("✅ No violations! Structure passes all checks.")
else:
    print(f"⚠️  Found {len(bad_violations)} violations:")
    for v in bad_violations[:5]:
        print(f"   - {v}")
    if len(bad_violations) > 5:
        print(f"   ... and {len(bad_violations)-5} more")

Step 5: Interactive Comparison¶

Use the widget below to toggle between good and bad structures and see the quality differences!

In [ ]:

Copied!





import py3Dmol

# Create toggle widget
structure_selector = widgets.ToggleButtons(
    options=['Good Structure ✅', 'Bad Structure ⚠️'],
    description='View:',
    button_style='info'
)

output = widgets.Output()

def show_structure(change):
    with output:
        output.clear_output(wait=True)

        if 'Good' in structure_selector.value:
            pdb_data = good_pdb
            title = "✅ GOOD Structure (Minimized)"
            stats = good_stats
            clashes = len(good_clashes)
            violations = len(good_violations)
        else:
            pdb_data = bad_pdb
            title = "⚠️  BAD Structure (Not Minimized)"
            stats = bad_stats
            clashes = len(bad_clashes)
            violations = len(bad_violations)

        # 3D Viewer
        view = py3Dmol.view(width=600, height=400)
        view.addModel(pdb_data, 'pdb')
        view.setStyle({'cartoon': {'color': 'spectrum'}})
        view.zoomTo()
        view.show()

        # Quality Report
        print(f"\n{title}")
        print("=" * 50)
        print(f"Ramachandran favored: {stats['favored_pct']:.1f}%")
        print(f"Ramachandran outliers: {stats['outliers_pct']:.1f}%")
        print(f"Steric clashes: {clashes}")
        print(f"Total violations: {violations}")

        # Overall grade
        # A good structure must have very few clashes. OpenMM energy minimization
        # often creates minor bond/angle 'violations' to resolve severe steric clashes.
        # Therefore, we prioritize zero clashes over zero violations.
        if clashes == 0 and stats['favored_pct'] > 30:
            print("\n🏆 Overall Grade: EXCELLENT (Clash-free, physically relaxed)")
        elif clashes < 5 and stats['favored_pct'] > 50:
            print("\n👍 Overall Grade: GOOD (Minor clashes)")
        elif clashes < 15:
            print("\n⚠️  Overall Grade: ACCEPTABLE (Needs refinement)")
        else:
            print("\n❌ Overall Grade: POOR (Severe steric clashes)")

# Remove existing callbacks to prevent multiple triggers if cell is re-run
structure_selector.unobserve_all(name='value')
structure_selector.observe(show_structure, names='value')

display(structure_selector, output)

# 3Dmol.js initializes asynchronously in the browser. Calling show_structure()
# at kernel execution time races with the library bootstrap and produces
# the 'failed to load' error. Show a placeholder instead; the first
# button click fires show_structure() after 3Dmol.js is fully ready.
with output:
    from IPython.display import HTML as _HTML
    from IPython.display import display as _disp
    _disp(_HTML(
        '<div style="text-align:center;padding:40px;color:#aaa;'
        'border:1px dashed #555;border-radius:8px;'
        'font-style:italic;background:#1a1a1a;">'
        '&#128260; Click a button above to load the 3D structure'
        '</div>'
    ))
import py3Dmol

# Create toggle widget
structure_selector = widgets.ToggleButtons(
    options=['Good Structure ✅', 'Bad Structure ⚠️'],
    description='View:',
    button_style='info'
)

output = widgets.Output()

def show_structure(change):
    with output:
        output.clear_output(wait=True)

        if 'Good' in structure_selector.value:
            pdb_data = good_pdb
            title = "✅ GOOD Structure (Minimized)"
            stats = good_stats
            clashes = len(good_clashes)
            violations = len(good_violations)
        else:
            pdb_data = bad_pdb
            title = "⚠️  BAD Structure (Not Minimized)"
            stats = bad_stats
            clashes = len(bad_clashes)
            violations = len(bad_violations)

        # 3D Viewer
        view = py3Dmol.view(width=600, height=400)
        view.addModel(pdb_data, 'pdb')
        view.setStyle({'cartoon': {'color': 'spectrum'}})
        view.zoomTo()
        view.show()

        # Quality Report
        print(f"\n{title}")
        print("=" * 50)
        print(f"Ramachandran favored: {stats['favored_pct']:.1f}%")
        print(f"Ramachandran outliers: {stats['outliers_pct']:.1f}%")
        print(f"Steric clashes: {clashes}")
        print(f"Total violations: {violations}")

        # Overall grade
        # A good structure must have very few clashes. OpenMM energy minimization
        # often creates minor bond/angle 'violations' to resolve severe steric clashes.
        # Therefore, we prioritize zero clashes over zero violations.
        if clashes == 0 and stats['favored_pct'] > 30:
            print("\n🏆 Overall Grade: EXCELLENT (Clash-free, physically relaxed)")
        elif clashes < 5 and stats['favored_pct'] > 50:
            print("\n👍 Overall Grade: GOOD (Minor clashes)")
        elif clashes < 15:
            print("\n⚠️  Overall Grade: ACCEPTABLE (Needs refinement)")
        else:
            print("\n❌ Overall Grade: POOR (Severe steric clashes)")

# Remove existing callbacks to prevent multiple triggers if cell is re-run
structure_selector.unobserve_all(name='value')
structure_selector.observe(show_structure, names='value')

display(structure_selector, output)

# 3Dmol.js initializes asynchronously in the browser. Calling show_structure()
# at kernel execution time races with the library bootstrap and produces
# the 'failed to load' error. Show a placeholder instead; the first
# button click fires show_structure() after 3Dmol.js is fully ready.
with output:
    from IPython.display import HTML as _HTML
    from IPython.display import display as _disp
    _disp(_HTML(
        ''
        '🔄 Click a button above to load the 3D structure'
        '
'
    ))

🎓 Key Takeaways¶

What Makes a Good Structure?¶

>90% Ramachandran favored - Realistic backbone geometry
<10 steric clashes - No overlapping atoms
Proper bond lengths/angles - Chemistry makes sense
Good rotamers - Side chains in low-energy conformations

When to Use Quality Checks¶

✅ Before starting MD simulations
✅ After homology modeling
✅ When downloading structures from PDB (yes, even experimental structures have issues!)
✅ After any structure prediction/generation

Tools for Further Analysis¶

MolProbity (http://molprobity.biochem.duke.edu/) - Comprehensive validation
PROCHECK - Classic Ramachandran analysis
WHATIF - Detailed geometry checks
PDB REDO - Automated structure refinement

�� Next Steps¶

Try these experiments:

Generate your own structures with different parameters
Download a real PDB structure and validate it
Compare crystal structures vs NMR structures vs AlphaFold predictions

Remember: A validated structure is a trustworthy structure! 🔬