🛢️ The Oil Drop Model: Hydrophobic Burial Analysis¶
Duration: ~25 minutes | Level: ⭐⭐ Intermediate
Why do proteins fold?¶
In 1959, Walter Kauzmann proposed a deceptively simple answer: hydrophobic residues hate water. When a protein folds, it squeezes its greasy, water-fearing (hydrophobic) residues into a dry interior core — just like oil droplets coalescing in water. The polar, water-loving residues remain on the surface.
This Oil Drop Model makes a testable structural prediction:
A correctly folded protein should have its hydrophobic residues BURIED (low SASA) and its polar residues EXPOSED (high SASA).
In this tutorial we'll:
- Learn what Solvent-Accessible Surface Area (SASA) measures
- Generate proteins with different hydrophobic content and plot per-residue SASA
- Compute the burial_ratio, a single number that tests the Oil Drop prediction
- See how get_quality_report() uses this to assess biophysical plausibility
import os
import sys

IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
    # @title Environment Setup
    !pip install -q synth-pdb biotite matplotlib numpy scipy py3Dmol openmm
else:
    sys.path.append(os.path.abspath('../../'))
print("✅ Environment configured!")
🔧 Setup¶
import os
import sys

try:
    import google.colab  # noqa: F401
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    print('🌐 Running in Google Colab: installing dependencies...')
    import subprocess
    subprocess.run(['pip', 'install', '-q', 'synth-pdb', 'biotite'], check=True)
else:
    # Local development: add the repo root to sys.path.
    # Notebooks have no __file__, so the root is resolved relative to the working directory.
    repo_root = os.path.abspath(os.path.join('..', '..'))
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    print(f'💻 Running locally: using repo at {repo_root}')
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
from synth_pdb.data import HYDROPHOBIC_AMINO_ACIDS
from synth_pdb.generator import generate_pdb_content
from synth_pdb.validator import PDBValidator
print('✅ All imports successful!')
print(f' Hydrophobic residues tracked: {sorted(HYDROPHOBIC_AMINO_ACIDS)}')
📐 Step 1: What is SASA?¶
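Solvent-Accessible Surface Area (SASA) is the part of a molecule's surface that solvent can reach. It is defined by rolling a probe sphere of radius 1.4 Å (roughly the size of a water molecule) over the protein's van der Waals surface and recording the area traced out by the probe's centre.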
| SASA | Meaning |
|---|---|
| High SASA (> 100 Ų) | Residue is surface-exposed, touching water |
| Low SASA (< 25 Ų) | Residue is buried in the protein core |
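The rolling-probe picture can be made concrete with the Shrake-Rupley scheme, one common numerical way to estimate SASA. The sketch below is pure Python and is not claimed to be what PDBValidator does internally. The toy coordinates, the 1.7 Å carbon-like radius, and the helper names sphere_points and shrake_rupley are assumptions made for illustration.

```python
import math

PROBE_RADIUS = 1.4  # Angstrom, the water-probe radius quoted above


def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def shrake_rupley(atoms, n_points=200, probe=PROBE_RADIUS):
    """Per-atom SASA in A^2. `atoms` is a list of (x, y, z, vdw_radius)."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        expanded = ri + probe  # sphere on which the probe centre travels
        accessible = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + expanded * ux, yi + expanded * uy, zi + expanded * uz
            # A test point is blocked if it lies inside any neighbour's expanded sphere.
            blocked = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < (rj + probe) ** 2
                for j, (xj, yj, zj, rj) in enumerate(atoms)
                if j != i
            )
            if not blocked:
                accessible += 1
        areas.append(4.0 * math.pi * expanded ** 2 * accessible / n_points)
    return areas


# Toy input: one isolated atom, then the same atom with a neighbour 1.5 A away.
isolated = shrake_rupley([(0.0, 0.0, 0.0, 1.7)])
pair = shrake_rupley([(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)])
print(f"isolated atom:        {isolated[0]:.1f} A^2")
print(f"atom with neighbour:  {pair[0]:.1f} A^2")
```

An isolated atom keeps the full area of its expanded sphere, while a neighbour shields part of it. Per-residue SASA is then the sum over the atoms of each residue.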
The Oil Drop prediction in numbers¶
- Hydrophobic residues (ALA, VAL, ILE, LEU, MET, PHE, TRP, TYR): low SASA ← buried
- Polar residues (SER, THR, ASN, GLN, LYS, ARG, ASP, GLU, HIS): high SASA ← exposed
We measure this with the burial_ratio:
$$\text{burial\_ratio} = \frac{\overline{\text{SASA}_{\text{polar}}}}{\overline{\text{SASA}_{\text{hydrophobic}}} + 10^{-6}}$$
A burial_ratio ≥ 0.8 means the polar mean SASA is at least 80% of the hydrophobic mean SASA —
consistent with hydrophobic burial. This is the threshold used in get_quality_report().
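As a worked example of the formula, the sketch below computes the two group means and the ratio for a handful of residues. The SASA values are made up, the helper names are hypothetical, and treating an empty group as a mean of 0.0 is an assumption rather than documented library behaviour.

```python
# Residue groups as listed in this tutorial (three-letter codes).
HYDROPHOBIC = {"ALA", "VAL", "ILE", "LEU", "MET", "PHE", "TRP", "TYR"}
POLAR = {"SER", "THR", "ASN", "GLN", "LYS", "ARG", "ASP", "GLU", "HIS"}


def mean(values):
    """Arithmetic mean; an empty group is treated as 0.0 (assumption)."""
    return sum(values) / len(values) if values else 0.0


def burial_ratio(residues):
    """residues: list of (three_letter_name, sasa). Returns (mean_h, mean_p, ratio)."""
    hydro = [sasa for name, sasa in residues if name in HYDROPHOBIC]
    polar = [sasa for name, sasa in residues if name in POLAR]
    mean_h, mean_p = mean(hydro), mean(polar)
    return mean_h, mean_p, mean_p / (mean_h + 1e-6)


# Made-up SASA values in A^2, for illustration only.
toy = [("LEU", 20.0), ("VAL", 30.0), ("LYS", 120.0), ("GLU", 100.0), ("ALA", 40.0)]
mean_h, mean_p, ratio = burial_ratio(toy)
print(f"mean hydrophobic SASA = {mean_h:.1f}")
print(f"mean polar SASA       = {mean_p:.1f}")
print(f"burial_ratio          = {ratio:.3f}  ->  {'PASS' if ratio >= 0.8 else 'FAIL'}")
```

Here the hydrophobic mean is 30 Ų and the polar mean is 110 Ų, so the ratio is well above the 0.8 threshold.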
🧬 Step 2: Generate Three Test Proteins¶
We'll compare three structures designed to test different burial regimes:
| Structure | Sequence design | Expected burial_ratio |
|---|---|---|
| Amphipathic helix | Repeating AAKLLL pattern, intended to put Lys on a polar face and Leu/Ala on a hydrophobic one | ≥ 0.8 (plausible) |
| Hydrophobic-only peptide | Val/Ile only, so no polar surface is possible | < 0.1 (implausible) |
| Random-coil decoy | Scrambled amphipathic, no minimization | Variable |
# ── Amphipathic helix: repeating AAKLLL pattern ─────────────────────────
# Lys is the only polar residue; Ala and Leu both count as hydrophobic
SEQ_HELIX = 'AAKLLLAAKLLLAAK'
SEQ_HYDRO = 'VIVVIVVIVVI' # All hydrophobic — should FAIL burial check
SEQ_RANDOM = 'LKLALAKLLAKA' # Scrambled — may or may not pass
print('🧬 Generating structures (seed=42 for reproducibility)...')
pdb_helix = generate_pdb_content(
    sequence_str=SEQ_HELIX, structure=f'1-{len(SEQ_HELIX)}:alpha',
    minimize_energy=True, seed=42
)
pdb_hydro = generate_pdb_content(
    sequence_str=SEQ_HYDRO, structure=f'1-{len(SEQ_HYDRO)}:beta',
    minimize_energy=False, seed=42
)
pdb_random = generate_pdb_content(
    sequence_str=SEQ_RANDOM, structure=f'1-{len(SEQ_RANDOM)}:random',
    minimize_energy=False, seed=42
)
print('✅ All three structures generated!')
print(f' Amphipathic helix: {len(SEQ_HELIX)} residues')
print(f' Hydrophobic-only: {len(SEQ_HYDRO)} residues')
print(f' Random-coil decoy: {len(SEQ_RANDOM)} residues')
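Before measuring any surface area, it helps to check what each sequence is made of. The sketch below counts hydrophobic, polar, and other residues using the one-letter equivalents of the groups listed in Step 1. The composition helper is hypothetical and does not use synth_pdb.

```python
# One-letter codes for the tutorial's hydrophobic and polar residue lists.
HYDROPHOBIC_1 = set("AVILMFWY")
POLAR_1 = set("STNQKRDEH")


def composition(sequence):
    """Count hydrophobic, polar, and unclassified (e.g. Gly, Pro, Cys) residues."""
    n = len(sequence)
    hydro = sum(aa in HYDROPHOBIC_1 for aa in sequence)
    polar = sum(aa in POLAR_1 for aa in sequence)
    return {"length": n, "hydrophobic": hydro, "polar": polar, "other": n - hydro - polar}


sequences = {
    "Amphipathic helix": "AAKLLLAAKLLLAAK",
    "Hydrophobic-only": "VIVVIVVIVVI",
    "Random-coil decoy": "LKLALAKLLAKA",
}
for label, seq in sequences.items():
    c = composition(seq)
    print(f"{label:18s} hydrophobic={c['hydrophobic']:2d}  polar={c['polar']:2d}  other={c['other']:2d}")
```

The hydrophobic-only strand has no polar residues at all, so its polar group is empty and any ratio built on a polar mean cannot be read the same way as for the other two sequences.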
📊 Step 3: Compute Per-Residue SASA¶
PDBValidator.calculate_residue_sasa() returns:
- SASA: a dict mapping {res_id → SASA_in_Ų} for every residue
- mean_hydrophobic_sasa / mean_polar_sasa: group averages
- burial_ratio: the Oil Drop metric
def compute_sasa(pdb_content, sequence):
    """Compute and return SASA data for a PDB string."""
    validator = PDBValidator(pdb_content)
    sasa_data = validator.calculate_residue_sasa()

    aa_3to1 = {
        'ALA':'A','ARG':'R','ASN':'N','ASP':'D','CYS':'C','GLN':'Q','GLU':'E',
        'GLY':'G','HIS':'H','ILE':'I','LEU':'L','LYS':'K','MET':'M','PHE':'F',
        'PRO':'P','SER':'S','THR':'T','TRP':'W','TYR':'Y','VAL':'V',
    }
    # Invert the table: one-letter code -> three-letter code
    aa_1to3 = {one: three for three, one in aa_3to1.items()}

    # Build one record per residue: res_id, one-letter code, SASA, hydrophobic flag.
    # Residues are matched to the sequence by position after sorting on res_id,
    # which assumes the res_ids sort in chain order.
    residues = []
    for i, (res_id, sasa_val) in enumerate(sorted(sasa_data['SASA'].items())):
        aa1 = sequence[i] if i < len(sequence) else '?'
        aa3 = aa_1to3.get(aa1, 'UNK')
        is_hydro = aa3 in HYDROPHOBIC_AMINO_ACIDS
        residues.append({'res_id': res_id, 'aa': aa1, 'sasa': sasa_val, 'hydrophobic': is_hydro})
    return residues, sasa_data
residues_helix, sasa_helix = compute_sasa(pdb_helix, SEQ_HELIX)
residues_hydro, sasa_hydro = compute_sasa(pdb_hydro, SEQ_HYDRO)
residues_random, sasa_random = compute_sasa(pdb_random, SEQ_RANDOM)
for label, sasa_data in [('Amphipathic helix', sasa_helix),
                         ('Hydrophobic-only', sasa_hydro),
                         ('Random-coil decoy', sasa_random)]:
    print(f'{label}:')
    print(f' mean_hydrophobic_sasa = {sasa_data["mean_hydrophobic_sasa"]:7.1f} Ų')
    print(f' mean_polar_sasa = {sasa_data["mean_polar_sasa"]:7.1f} Ų')
    print(f' burial_ratio = {sasa_data["burial_ratio"]:7.3f}')
    print()
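Group averages can hide individual offenders. The sketch below scans per-residue records, shaped like the ones compute_sasa builds, for hydrophobic residues above the 100 Ų exposed cutoff and non-hydrophobic residues below the 25 Ų buried cutoff. The flag_outliers helper and the mock values are hypothetical.

```python
def flag_outliers(residues, exposed_cutoff=100.0, buried_cutoff=25.0):
    """Return (exposed hydrophobic residues, buried non-hydrophobic residues).

    Each record is a dict with keys 'res_id', 'aa', 'sasa', 'hydrophobic'.
    Note that 'not hydrophobic' also covers Gly, Pro and Cys, not only polar residues.
    """
    exposed_hydrophobic = [r for r in residues if r["hydrophobic"] and r["sasa"] > exposed_cutoff]
    buried_other = [r for r in residues if not r["hydrophobic"] and r["sasa"] < buried_cutoff]
    return exposed_hydrophobic, buried_other


# Mock records with invented SASA values (A^2), for illustration only.
mock = [
    {"res_id": 1, "aa": "A", "sasa": 85.0, "hydrophobic": True},
    {"res_id": 2, "aa": "K", "sasa": 150.0, "hydrophobic": False},
    {"res_id": 3, "aa": "L", "sasa": 130.0, "hydrophobic": True},
    {"res_id": 4, "aa": "L", "sasa": 12.0, "hydrophobic": True},
    {"res_id": 5, "aa": "K", "sasa": 18.0, "hydrophobic": False},
]
exposed, buried = flag_outliers(mock)
print("Exposed hydrophobic:", [(r["res_id"], r["aa"]) for r in exposed])
print("Buried non-hydrophobic:", [(r["res_id"], r["aa"]) for r in buried])
```

With real data, the same call would be `flag_outliers(residues_helix)` and so on for the other two structures.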
📈 Step 4: Per-Residue SASA Bar Chart¶
Each bar is coloured by residue type:
- 🟠 Orange = hydrophobic (should be SHORT → buried)
- 🟢 Green = polar (should be TALL → exposed)
In a well-folded protein, the orange bars should on the whole be shorter than the green ones.
def plot_sasa_bars(ax, residues, title, burial_ratio, plausible):
    """Draw per-residue SASA coloured by residue type."""
    ids = [r['res_id'] for r in residues]
    vals = [r['sasa'] for r in residues]
    colors = ['#e67e22' if r['hydrophobic'] else '#27ae60' for r in residues]
    labels = [r['aa'] for r in residues]
    ax.bar(range(len(ids)), vals, color=colors, edgecolor='white', linewidth=0.5)
    ax.set_xticks(range(len(ids)))
    ax.set_xticklabels(labels, fontsize=9)
    ax.set_ylabel('SASA (Ų)', fontsize=10)
    ax.set_ylim(0, max(vals) * 1.25 if vals else 200)
    verdict = 'PASS: Plausible' if plausible else 'FAIL: Not plausible'
    ax.set_title(
        f'{title}\nburial_ratio = {burial_ratio:.3f} | {verdict}',
        fontsize=11, fontweight='bold'
    )
    # Threshold line at 25 Ų (canonical buried cutoff)
    ax.axhline(25, color='#95a5a6', linestyle='--', linewidth=0.8, alpha=0.7)
    ax.text(len(ids) - 0.5, 27, 'buried < 25 Ų', ha='right', fontsize=7.5,
            color='#7f8c8d', style='italic')
fig, axes = plt.subplots(3, 1, figsize=(11, 10))
fig.suptitle('Per-Residue SASA: The Oil Drop Test', fontsize=14, fontweight='bold', y=1.01)
plot_sasa_bars(axes[0], residues_helix,
               'Amphipathic Helix (AAKLLLAAKLLLAAK)',
               sasa_helix['burial_ratio'],
               sasa_helix['burial_ratio'] >= 0.8)
plot_sasa_bars(axes[1], residues_hydro,
               'Hydrophobic-Only Strand (VIVVIVVIVVI)',
               sasa_hydro['burial_ratio'],
               sasa_hydro['burial_ratio'] >= 0.8)
plot_sasa_bars(axes[2], residues_random,
               'Random-Coil Decoy (LKLALAKLLAKA)',
               sasa_random['burial_ratio'],
               sasa_random['burial_ratio'] >= 0.8)
# Shared legend
legend_patches = [
    mpatches.Patch(color='#e67e22', label='Hydrophobic residue'),
    mpatches.Patch(color='#27ae60', label='Polar residue'),
]
fig.legend(handles=legend_patches, loc='upper right', fontsize=9, framealpha=0.9)
plt.tight_layout()
plt.show()
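The visual check (hydrophobic bars sitting below the others) can also be reduced to a number that complements burial_ratio. The sketch below computes the fraction of hydrophobic/non-hydrophobic residue pairs in which the hydrophobic residue has the smaller SASA, counting ties as half. This is a rank-based statistic added here for illustration, not something synth_pdb reports. The ordering_score name and the mock records are hypothetical.

```python
def ordering_score(residues):
    """Fraction of (hydrophobic, other) pairs where the hydrophobic residue is less exposed.

    1.0 means every hydrophobic residue is more buried than every other residue,
    0.5 means no separation. Returns None if either group is empty.
    """
    hydro = [r["sasa"] for r in residues if r["hydrophobic"]]
    other = [r["sasa"] for r in residues if not r["hydrophobic"]]
    if not hydro or not other:
        return None
    wins = 0.0
    for h in hydro:
        for o in other:
            if h < o:
                wins += 1.0
            elif h == o:
                wins += 0.5
    return wins / (len(hydro) * len(other))


# Mock records with invented SASA values (A^2), for illustration only.
mock = [
    {"aa": "L", "sasa": 20.0, "hydrophobic": True},
    {"aa": "V", "sasa": 35.0, "hydrophobic": True},
    {"aa": "A", "sasa": 90.0, "hydrophobic": True},
    {"aa": "K", "sasa": 110.0, "hydrophobic": False},
    {"aa": "S", "sasa": 60.0, "hydrophobic": False},
]
score = ordering_score(mock)
print(f"ordering score = {score:.3f}")
```

Unlike a ratio of means, this score is insensitive to a single very large SASA value, and it returns None rather than a misleading number when a sequence has only one residue class.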
📊 Step 5: Comparing burial_ratio Across Structures¶
The Oil Drop threshold is burial_ratio ≥ 0.8. Let's see how our three structures score:
fig, ax = plt.subplots(figsize=(8, 4))
labels = ['Amphipathic\nHelix', 'Hydrophobic\nOnly', 'Random\nCoil Decoy']
ratios = [sasa_helix['burial_ratio'],
          sasa_hydro['burial_ratio'],
          sasa_random['burial_ratio']]
colors = ['#2ecc71' if r >= 0.8 else '#e74c3c' for r in ratios]
bars = ax.bar(labels, ratios, color=colors, alpha=0.85, edgecolor='white', linewidth=0.5)
# Value labels
for bar, ratio in zip(bars, ratios, strict=False):
    ax.text(bar.get_x() + bar.get_width() / 2,
            min(bar.get_height(), 2.0) + 0.05,
            f'{ratio:.3f}', ha='center', va='bottom', fontsize=10, fontweight='bold')
# Threshold line
ax.axhline(0.8, color='#2c3e50', linestyle='--', linewidth=1.2, alpha=0.7, label='Threshold (0.8)')
ax.set_ylim(0, max(max(ratios) * 1.2, 1.1))
ax.set_ylabel('burial_ratio (polar SASA / hydrophobic SASA)', fontsize=10)
ax.set_title('Oil Drop Test: Which structure passes?', fontsize=12, fontweight='bold')
ax.legend(fontsize=9)
# PASS/FAIL labels derived from the same 0.8 threshold used for the bar colours
for bar, ratio in zip(bars, ratios, strict=False):
    verdict = 'PASS' if ratio >= 0.8 else 'FAIL'
    col = '#27ae60' if verdict == 'PASS' else '#e74c3c'
    ax.text(bar.get_x() + bar.get_width() / 2, 0.04, verdict,
            ha='center', va='bottom', fontsize=12, color=col, fontweight='bold')
plt.tight_layout()
plt.show()
🏅 Step 6: The Oil Drop Check Inside get_quality_report()¶
burial_ratio is one of the five criteria in the Scientific Defensibility Scorecard.
Let's see it in context:
def show_burial_in_report(label, pdb_content):
    v = PDBValidator(pdb_content)
    report = v.get_quality_report()
    br = report['hydrophobic_burial_ratio']
    bio = report['is_biophysically_plausible']
    defns = report['is_overall_scientifically_defensible']

    def tick(b):
        return 'PASS' if b else 'FAIL'

    print(f'{label}')
    print(f' burial_ratio = {br:.3f} ({tick(bio)} Oil Drop test)')
    print(f' is_biophysically_plausible = {bio}')
    print(f' is_overall_defensible = {defns}')
    print()
show_burial_in_report('Amphipathic Helix', pdb_helix)
show_burial_in_report('Hydrophobic-Only Strand', pdb_hydro)
show_burial_in_report('Random-Coil Decoy', pdb_random)
🎓 Key Takeaways¶
What the Oil Drop Model predicts¶
| Residue type | Surface area | Why |
|---|---|---|
| Hydrophobic (ALA, VAL, LEU…) | Low SASA → buried | Water contact is thermodynamically unfavourable |
| Polar (LYS, ASP, SER…) | High SASA → exposed | Hydrogen bonding to water is stabilising |
What burial_ratio measures¶
- burial_ratio ≥ 0.8 → polar residues are at least 80% as exposed as hydrophobic ones, on average ✅
- burial_ratio < 0.1 → nearly all surface area belongs to hydrophobic residues ❌
- Extreme sequences (all-hydrophobic, all-polar) can saturate the metric, so always interpret it alongside the per-residue bar chart.
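The saturation at the extremes follows directly from the formula. The sketch below shows both limits and a guarded variant that declines to report a ratio when either group is empty. Treating an empty group as a mean of 0.0 is an assumption, not documented library behaviour, and both helper names are hypothetical.

```python
EPS = 1e-6  # same small constant as in the burial_ratio formula


def raw_ratio(hydro_sasa, polar_sasa):
    """Ratio of mean polar SASA to mean hydrophobic SASA, empty groups counted as 0.0."""
    mean_h = sum(hydro_sasa) / len(hydro_sasa) if hydro_sasa else 0.0
    mean_p = sum(polar_sasa) / len(polar_sasa) if polar_sasa else 0.0
    return mean_p / (mean_h + EPS)


def checked_ratio(hydro_sasa, polar_sasa):
    """Like raw_ratio, but returns None when the comparison is undefined."""
    if not hydro_sasa or not polar_sasa:
        return None
    return raw_ratio(hydro_sasa, polar_sasa)


# Invented SASA values (A^2), for illustration only.
all_hydrophobic = raw_ratio([120.0, 110.0], [])  # no polar residues: ratio collapses to 0
all_polar = raw_ratio([], [130.0, 140.0])        # no hydrophobic residues: ratio explodes
mixed = checked_ratio([50.0], [100.0])

print(f"all-hydrophobic: {all_hydrophobic:.3f}")
print(f"all-polar:       {all_polar:.3e}")
print(f"mixed:           {mixed:.3f}")
print(f"guarded, all-hydrophobic: {checked_ratio([120.0, 110.0], [])}")
```

Under this assumption, an all-hydrophobic peptide fails with a ratio of 0 and an all-polar one passes with an enormous ratio, even though neither says anything about burial.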
Why this matters¶
- Protein design: sequences that fail the Oil Drop check are unlikely to fold stably
- Structure validation: misfolded decoys often expose hydrophobic patches
- AI structure assessment: AlphaFold predictions with low burial_ratio in low-pLDDT regions are likely disordered, not misfolded
What to try next¶
- Run generate_pdb_content with minimize_energy=True on the hydrophobic-only strand. Does minimization improve the burial_ratio?
- Extend the analysis to the Structure Defensibility Dashboard for all five criteria.
- Try sequences from real IDPs (e.g. the alpha-synuclein N-terminus: MDVFMKGLSKAKEGVVAA).