💊 Drug Discovery Pipeline¶

Welcome to the Peptide Drug Discovery Pipeline!

In this tutorial, you'll build an end-to-end computational workflow for discovering therapeutic peptides.

🎯 Goal¶

Design, screen, and optimize peptide drug candidates using computational methods.

📚 What You'll Learn¶

Library Generation: Create diverse peptide libraries
Property Prediction: Calculate ADME (Absorption, Distribution, Metabolism, Excretion)
Binding Scoring: Estimate target affinity
Cyclization: Improve stability with macrocycles
Lead Selection: Rank and filter candidates

💡 Real-World Context¶

This pipeline mirrors workflows used in biotech/pharma for:

Peptide therapeutics (e.g., GLP-1 agonists like Ozempic)
Cyclic peptide antibiotics
Protein-protein interaction inhibitors

In [ ]:

Copied!





# 🛠️ SETUP: Install dependencies

try:
    import google.colab
    IN_COLAB = True
    print("🌐 Running in Google Colab")
    !pip install -q synth-pdb matplotlib numpy biotite py3Dmol pandas seaborn
except ImportError:
    IN_COLAB = False
    print("💻 Running locally")

print("✅ Setup complete!")
# 🛠️ SETUP: Install dependencies

try:
    import google.colab
    IN_COLAB = True
    print("🌐 Running in Google Colab")
    !pip install -q synth-pdb matplotlib numpy biotite py3Dmol pandas seaborn
except ImportError:
    IN_COLAB = False
    print("💻 Running locally")

print("✅ Setup complete!")

In [ ]:

Copied!





import random

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from synth_pdb.generator import generate_pdb_content

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

print("📦 All imports successful!")
import random

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from synth_pdb.generator import generate_pdb_content

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.dpi'] = 100

print("📦 All imports successful!")

Step 1: Peptide Library Generation¶

We'll create a focused library of peptides targeting a hypothetical receptor.

🧬 Design Strategy¶

Length: 8-12 residues (typical for bioactive peptides)
Composition: Biased toward drug-like amino acids
Diversity: Random sequences with structural constraints

🎲 Amino Acid Preferences¶

We'll favor residues common in successful peptide drugs:

Hydrophobic: L, V, I, F, W (binding pockets)
Polar: S, T, N, Q (H-bonding)
Charged: K, R, D, E (electrostatic interactions)
Special: P (turns), G (flexibility), C (disulfides)

In [ ]:

Copied!





# Define amino acid library (drug-like bias)
AA_LIBRARY = {
    'hydrophobic': ['L', 'V', 'I', 'F', 'W', 'A'],
    'polar': ['S', 'T', 'N', 'Q'],
    'positive': ['K', 'R'],
    'negative': ['D', 'E'],
    'special': ['P', 'G', 'C']
}

def generate_peptide_sequence(length=10, seed=None):
    """Generate a random peptide sequence with drug-like composition."""
    if seed is not None:
        random.seed(seed)

    # Composition bias (percentages)
    composition = (
        AA_LIBRARY['hydrophobic'] * 4 +  # 40%
        AA_LIBRARY['polar'] * 3 +         # 30%
        AA_LIBRARY['positive'] * 1 +      # 10%
        AA_LIBRARY['negative'] * 1 +      # 10%
        AA_LIBRARY['special'] * 1         # 10%
    )

    sequence = ''.join(random.choices(composition, k=length))
    return sequence

# Generate library
print("🧬 Generating peptide library...\n")
LIBRARY_SIZE = 50
library = []

for i in range(LIBRARY_SIZE):
    length = random.randint(8, 12)
    seq = generate_peptide_sequence(length, seed=i)
    library.append({
        'id': f"PEP{i+1:03d}",
        'sequence': seq,
        'length': length
    })

# Convert to DataFrame
df = pd.DataFrame(library)

print(f"✅ Generated {len(df)} peptides")
print(f"   Length range: {df['length'].min()}-{df['length'].max()} residues")
print("\nFirst 5 sequences:")
for idx, row in df.head().iterrows():
    print(f"   {row['id']}: {row['sequence']}")
# Define amino acid library (drug-like bias)
AA_LIBRARY = {
    'hydrophobic': ['L', 'V', 'I', 'F', 'W', 'A'],
    'polar': ['S', 'T', 'N', 'Q'],
    'positive': ['K', 'R'],
    'negative': ['D', 'E'],
    'special': ['P', 'G', 'C']
}

def generate_peptide_sequence(length=10, seed=None):
    """Generate a random peptide sequence with drug-like composition."""
    if seed is not None:
        random.seed(seed)

    # Composition bias (percentages)
    composition = (
        AA_LIBRARY['hydrophobic'] * 4 +  # 40%
        AA_LIBRARY['polar'] * 3 +         # 30%
        AA_LIBRARY['positive'] * 1 +      # 10%
        AA_LIBRARY['negative'] * 1 +      # 10%
        AA_LIBRARY['special'] * 1         # 10%
    )

    sequence = ''.join(random.choices(composition, k=length))
    return sequence

# Generate library
print("🧬 Generating peptide library...\n")
LIBRARY_SIZE = 50
library = []

for i in range(LIBRARY_SIZE):
    length = random.randint(8, 12)
    seq = generate_peptide_sequence(length, seed=i)
    library.append({
        'id': f"PEP{i+1:03d}",
        'sequence': seq,
        'length': length
    })

# Convert to DataFrame
df = pd.DataFrame(library)

print(f"✅ Generated {len(df)} peptides")
print(f"   Length range: {df['length'].min()}-{df['length'].max()} residues")
print("\nFirst 5 sequences:")
for idx, row in df.head().iterrows():
    print(f"   {row['id']}: {row['sequence']}")

Step 2: Property Prediction (ADME)¶

We'll calculate key physicochemical properties that affect drug-likeness.

📊 Properties to Calculate¶

Molecular Weight - Affects bioavailability
Net Charge - Affects solubility and membrane permeability
Hydrophobicity (GRAVY) - Affects membrane binding
Instability Index - Predicts degradation
Aromatic Content - Affects binding affinity

✅ Drug-Like Criteria¶

MW: 800-1500 Da (peptide range)
Net charge: -2 to +3 (soluble but membrane-permeable)
GRAVY: -0.5 to +0.5 (balanced)
Instability: <40 (stable)

In [ ]:

Copied!





# Amino acid properties
AA_MW = {'A': 89, 'C': 121, 'D': 133, 'E': 147, 'F': 165, 'G': 75, 'H': 155,
         'I': 131, 'K': 146, 'L': 131, 'M': 149, 'N': 132, 'P': 115, 'Q': 146,
         'R': 174, 'S': 105, 'T': 119, 'V': 117, 'W': 204, 'Y': 181}

AA_HYDRO = {'A': 1.8, 'C': 2.5, 'D': -3.5, 'E': -3.5, 'F': 2.8, 'G': -0.4,
            'H': -3.2, 'I': 4.5, 'K': -3.9, 'L': 3.8, 'M': 1.9, 'N': -3.5,
            'P': -1.6, 'Q': -3.5, 'R': -4.5, 'S': -0.8, 'T': -0.7, 'V': 4.2,
            'W': -0.9, 'Y': -1.3}

AA_CHARGE = {'D': -1, 'E': -1, 'K': 1, 'R': 1}  # at pH 7

def calculate_properties(sequence):
    """Calculate physicochemical properties."""
    # Molecular weight
    mw = sum(AA_MW.get(aa, 110) for aa in sequence) - 18 * (len(sequence) - 1)

    # Net charge at pH 7
    charge = sum(AA_CHARGE.get(aa, 0) for aa in sequence)

    # GRAVY (Grand Average of Hydropathy)
    gravy = sum(AA_HYDRO.get(aa, 0) for aa in sequence) / len(sequence)

    # Aromatic content
    aromatic = sum(1 for aa in sequence if aa in 'FWY') / len(sequence)

    # Instability index (simplified)
    # Real calculation uses dipeptide instability weights
    unstable_pairs = sum(1 for i in range(len(sequence)-1)
                        if sequence[i:i+2] in ['DP', 'PD', 'DD', 'EE'])
    instability = (unstable_pairs / len(sequence)) * 100

    return {
        'mw': mw,
        'charge': charge,
        'gravy': gravy,
        'aromatic_pct': aromatic * 100,
        'instability': instability
    }

# Calculate for all peptides
print("📊 Calculating ADME properties...\n")

for idx, row in df.iterrows():
    props = calculate_properties(row['sequence'])
    for key, value in props.items():
        df.at[idx, key] = value

print("✅ Properties calculated!")
print("\nProperty ranges:")
print(f"   MW: {df['mw'].min():.0f} - {df['mw'].max():.0f} Da")
print(f"   Charge: {df['charge'].min():.0f} to {df['charge'].max():.0f}")
print(f"   GRAVY: {df['gravy'].min():.2f} to {df['gravy'].max():.2f}")
print(f"   Instability: {df['instability'].min():.1f} - {df['instability'].max():.1f}")
# Amino acid properties
AA_MW = {'A': 89, 'C': 121, 'D': 133, 'E': 147, 'F': 165, 'G': 75, 'H': 155,
         'I': 131, 'K': 146, 'L': 131, 'M': 149, 'N': 132, 'P': 115, 'Q': 146,
         'R': 174, 'S': 105, 'T': 119, 'V': 117, 'W': 204, 'Y': 181}

AA_HYDRO = {'A': 1.8, 'C': 2.5, 'D': -3.5, 'E': -3.5, 'F': 2.8, 'G': -0.4,
            'H': -3.2, 'I': 4.5, 'K': -3.9, 'L': 3.8, 'M': 1.9, 'N': -3.5,
            'P': -1.6, 'Q': -3.5, 'R': -4.5, 'S': -0.8, 'T': -0.7, 'V': 4.2,
            'W': -0.9, 'Y': -1.3}

AA_CHARGE = {'D': -1, 'E': -1, 'K': 1, 'R': 1}  # at pH 7

def calculate_properties(sequence):
    """Calculate physicochemical properties."""
    # Molecular weight
    mw = sum(AA_MW.get(aa, 110) for aa in sequence) - 18 * (len(sequence) - 1)

    # Net charge at pH 7
    charge = sum(AA_CHARGE.get(aa, 0) for aa in sequence)

    # GRAVY (Grand Average of Hydropathy)
    gravy = sum(AA_HYDRO.get(aa, 0) for aa in sequence) / len(sequence)

    # Aromatic content
    aromatic = sum(1 for aa in sequence if aa in 'FWY') / len(sequence)

    # Instability index (simplified)
    # Real calculation uses dipeptide instability weights
    unstable_pairs = sum(1 for i in range(len(sequence)-1)
                        if sequence[i:i+2] in ['DP', 'PD', 'DD', 'EE'])
    instability = (unstable_pairs / len(sequence)) * 100

    return {
        'mw': mw,
        'charge': charge,
        'gravy': gravy,
        'aromatic_pct': aromatic * 100,
        'instability': instability
    }

# Calculate for all peptides
print("📊 Calculating ADME properties...\n")

for idx, row in df.iterrows():
    props = calculate_properties(row['sequence'])
    for key, value in props.items():
        df.at[idx, key] = value

print("✅ Properties calculated!")
print("\nProperty ranges:")
print(f"   MW: {df['mw'].min():.0f} - {df['mw'].max():.0f} Da")
print(f"   Charge: {df['charge'].min():.0f} to {df['charge'].max():.0f}")
print(f"   GRAVY: {df['gravy'].min():.2f} to {df['gravy'].max():.2f}")
print(f"   Instability: {df['instability'].min():.1f} - {df['instability'].max():.1f}")

Step 3: Binding Score Estimation¶

We'll use a simplified scoring function to estimate binding affinity.

🎯 Scoring Components¶

Hydrophobic contacts - Favorable binding
Electrostatic interactions - Charge complementarity
Aromatic stacking - π-π interactions
Size penalty - Too large = entropic cost

📝 Note¶

This is a toy model! Real docking uses:

AutoDock Vina
Rosetta
Schrödinger Glide
AlphaFold-Multimer

In [ ]:

Copied!





def estimate_binding_score(sequence, target_profile=None):
    """Simplified binding score estimation.
    Lower score = better binding (like docking scores).
    """
    if target_profile is None:
        # Default: hydrophobic pocket with some charged residues
        target_profile = {
            'prefers_hydrophobic': True,
            'prefers_aromatic': True,
            'charge_preference': +1  # Prefers positive ligands
        }

    score = 0.0

    # Hydrophobic contribution
    hydrophobic_count = sum(1 for aa in sequence if aa in 'LVIFWA')
    if target_profile['prefers_hydrophobic']:
        score -= hydrophobic_count * 0.5  # Favorable

    # Aromatic contribution
    aromatic_count = sum(1 for aa in sequence if aa in 'FWY')
    if target_profile['prefers_aromatic']:
        score -= aromatic_count * 0.8  # Very favorable

    # Charge complementarity
    net_charge = sum(AA_CHARGE.get(aa, 0) for aa in sequence)
    charge_match = abs(net_charge - target_profile['charge_preference'])
    score += charge_match * 0.3  # Penalty for mismatch

    # Size penalty (entropic cost)
    if len(sequence) > 10:
        score += (len(sequence) - 10) * 0.2

    # Add some noise (real binding is complex!)
    score += random.gauss(0, 0.5)

    return score

# Calculate binding scores
print("🎯 Estimating binding affinities...\n")

for idx, row in df.iterrows():
    score = estimate_binding_score(row['sequence'])
    df.at[idx, 'binding_score'] = score

print("✅ Binding scores calculated!")
print(f"   Best score: {df['binding_score'].min():.2f}")
print(f"   Worst score: {df['binding_score'].max():.2f}")
print("\nTop 5 binders:")
top5 = df.nsmallest(5, 'binding_score')[['id', 'sequence', 'binding_score']]
for idx, row in top5.iterrows():
    print(f"   {row['id']}: {row['sequence']} (score: {row['binding_score']:.2f})")
def estimate_binding_score(sequence, target_profile=None):
    """Simplified binding score estimation.
    Lower score = better binding (like docking scores).
    """
    if target_profile is None:
        # Default: hydrophobic pocket with some charged residues
        target_profile = {
            'prefers_hydrophobic': True,
            'prefers_aromatic': True,
            'charge_preference': +1  # Prefers positive ligands
        }

    score = 0.0

    # Hydrophobic contribution
    hydrophobic_count = sum(1 for aa in sequence if aa in 'LVIFWA')
    if target_profile['prefers_hydrophobic']:
        score -= hydrophobic_count * 0.5  # Favorable

    # Aromatic contribution
    aromatic_count = sum(1 for aa in sequence if aa in 'FWY')
    if target_profile['prefers_aromatic']:
        score -= aromatic_count * 0.8  # Very favorable

    # Charge complementarity
    net_charge = sum(AA_CHARGE.get(aa, 0) for aa in sequence)
    charge_match = abs(net_charge - target_profile['charge_preference'])
    score += charge_match * 0.3  # Penalty for mismatch

    # Size penalty (entropic cost)
    if len(sequence) > 10:
        score += (len(sequence) - 10) * 0.2

    # Add some noise (real binding is complex!)
    score += random.gauss(0, 0.5)

    return score

# Calculate binding scores
print("🎯 Estimating binding affinities...\n")

for idx, row in df.iterrows():
    score = estimate_binding_score(row['sequence'])
    df.at[idx, 'binding_score'] = score

print("✅ Binding scores calculated!")
print(f"   Best score: {df['binding_score'].min():.2f}")
print(f"   Worst score: {df['binding_score'].max():.2f}")
print("\nTop 5 binders:")
top5 = df.nsmallest(5, 'binding_score')[['id', 'sequence', 'binding_score']]
for idx, row in top5.iterrows():
    print(f"   {row['id']}: {row['sequence']} (score: {row['binding_score']:.2f})")

Step 4: Drug-Likeness Filtering¶

Apply filters to identify candidates with good ADME properties.

In [ ]:

Copied!





# Define drug-like criteria
def is_druglike(row):
    """Check if peptide meets drug-like criteria."""
    checks = {
        'mw_ok': 800 <= row['mw'] <= 1500,
        'charge_ok': -2 <= row['charge'] <= 3,
        'gravy_ok': -0.5 <= row['gravy'] <= 0.5,
        'stable': row['instability'] < 40
    }
    return all(checks.values()), checks

# Apply filters
druglike_flags = []
for idx, row in df.iterrows():
    is_dl, checks = is_druglike(row)
    druglike_flags.append(is_dl)

df['druglike'] = druglike_flags

# Summary
n_druglike = df['druglike'].sum()
print("📊 Drug-Likeness Filtering Results")
print("=" * 50)
print(f"Total library: {len(df)} peptides")
print(f"Drug-like: {n_druglike} ({100*n_druglike/len(df):.1f}%)")
print(f"Filtered out: {len(df) - n_druglike}")

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# MW distribution
axes[0,0].hist(df['mw'], bins=20, alpha=0.7, edgecolor='black')
axes[0,0].axvline(800, color='red', linestyle='--', label='Min threshold')
axes[0,0].axvline(1500, color='red', linestyle='--', label='Max threshold')
axes[0,0].set_xlabel('Molecular Weight (Da)')
axes[0,0].set_ylabel('Count')
axes[0,0].set_title('Molecular Weight Distribution')
axes[0,0].legend()

# Charge distribution
axes[0,1].hist(df['charge'], bins=range(-3, 6), alpha=0.7, edgecolor='black')
axes[0,1].axvline(-2, color='red', linestyle='--')
axes[0,1].axvline(3, color='red', linestyle='--')
axes[0,1].set_xlabel('Net Charge')
axes[0,1].set_ylabel('Count')
axes[0,1].set_title('Charge Distribution')

# GRAVY distribution
axes[1,0].hist(df['gravy'], bins=20, alpha=0.7, edgecolor='black')
axes[1,0].axvline(-0.5, color='red', linestyle='--')
axes[1,0].axvline(0.5, color='red', linestyle='--')
axes[1,0].set_xlabel('GRAVY Score')
axes[1,0].set_ylabel('Count')
axes[1,0].set_title('Hydrophobicity Distribution')

# Binding score vs MW
colors = ['green' if dl else 'red' for dl in df['druglike']]
axes[1,1].scatter(df['mw'], df['binding_score'], c=colors, alpha=0.6, edgecolors='black')
axes[1,1].set_xlabel('Molecular Weight (Da)')
axes[1,1].set_ylabel('Binding Score')
axes[1,1].set_title('Binding vs Size (Green=Drug-like)')

plt.tight_layout()
plt.show()
# Define drug-like criteria
def is_druglike(row):
    """Check if peptide meets drug-like criteria."""
    checks = {
        'mw_ok': 800 <= row['mw'] <= 1500,
        'charge_ok': -2 <= row['charge'] <= 3,
        'gravy_ok': -0.5 <= row['gravy'] <= 0.5,
        'stable': row['instability'] < 40
    }
    return all(checks.values()), checks

# Apply filters
druglike_flags = []
for idx, row in df.iterrows():
    is_dl, checks = is_druglike(row)
    druglike_flags.append(is_dl)

df['druglike'] = druglike_flags

# Summary
n_druglike = df['druglike'].sum()
print("📊 Drug-Likeness Filtering Results")
print("=" * 50)
print(f"Total library: {len(df)} peptides")
print(f"Drug-like: {n_druglike} ({100*n_druglike/len(df):.1f}%)")
print(f"Filtered out: {len(df) - n_druglike}")

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# MW distribution
axes[0,0].hist(df['mw'], bins=20, alpha=0.7, edgecolor='black')
axes[0,0].axvline(800, color='red', linestyle='--', label='Min threshold')
axes[0,0].axvline(1500, color='red', linestyle='--', label='Max threshold')
axes[0,0].set_xlabel('Molecular Weight (Da)')
axes[0,0].set_ylabel('Count')
axes[0,0].set_title('Molecular Weight Distribution')
axes[0,0].legend()

# Charge distribution
axes[0,1].hist(df['charge'], bins=range(-3, 6), alpha=0.7, edgecolor='black')
axes[0,1].axvline(-2, color='red', linestyle='--')
axes[0,1].axvline(3, color='red', linestyle='--')
axes[0,1].set_xlabel('Net Charge')
axes[0,1].set_ylabel('Count')
axes[0,1].set_title('Charge Distribution')

# GRAVY distribution
axes[1,0].hist(df['gravy'], bins=20, alpha=0.7, edgecolor='black')
axes[1,0].axvline(-0.5, color='red', linestyle='--')
axes[1,0].axvline(0.5, color='red', linestyle='--')
axes[1,0].set_xlabel('GRAVY Score')
axes[1,0].set_ylabel('Count')
axes[1,0].set_title('Hydrophobicity Distribution')

# Binding score vs MW
colors = ['green' if dl else 'red' for dl in df['druglike']]
axes[1,1].scatter(df['mw'], df['binding_score'], c=colors, alpha=0.6, edgecolors='black')
axes[1,1].set_xlabel('Molecular Weight (Da)')
axes[1,1].set_ylabel('Binding Score')
axes[1,1].set_title('Binding vs Size (Green=Drug-like)')

plt.tight_layout()
plt.show()

Step 5: Lead Selection & Ranking¶

Combine all criteria to select top candidates.

In [ ]:

Copied!





# Filter to drug-like candidates
candidates = df[df['druglike']].copy()

# Rank by binding score (lower = better)
candidates = candidates.sort_values('binding_score')

print("🏆 TOP 10 LEAD CANDIDATES")
print("=" * 80)
print(f"{'Rank':<6} {'ID':<8} {'Sequence':<15} {'MW':<8} {'Charge':<8} {'Score':<8}")
print("=" * 80)

for rank, (idx, row) in enumerate(candidates.head(10).iterrows(), 1):
    print(f"{rank:<6} {row['id']:<8} {row['sequence']:<15} "
          f"{row['mw']:<8.0f} {row['charge']:<8.0f} {row['binding_score']:<8.2f}")

# Save top candidates
top_candidates = candidates.head(10)
print(f"\n✅ Selected {len(top_candidates)} leads for further optimization")
# Filter to drug-like candidates
candidates = df[df['druglike']].copy()

# Rank by binding score (lower = better)
candidates = candidates.sort_values('binding_score')

print("🏆 TOP 10 LEAD CANDIDATES")
print("=" * 80)
print(f"{'Rank':<6} {'ID':<8} {'Sequence':<15} {'MW':<8} {'Charge':<8} {'Score':<8}")
print("=" * 80)

for rank, (idx, row) in enumerate(candidates.head(10).iterrows(), 1):
    print(f"{rank:<6} {row['id']:<8} {row['sequence']:<15} "
          f"{row['mw']:<8.0f} {row['charge']:<8.0f} {row['binding_score']:<8.2f}")

# Save top candidates
top_candidates = candidates.head(10)
print(f"\n✅ Selected {len(top_candidates)} leads for further optimization")

Step 6: Cyclization Strategy¶

Convert top linear peptides to cyclic forms for improved stability.

🔗 Why Cyclize?¶

Protease resistance - No free termini to cleave
Conformational rigidity - Better binding specificity
Improved bioavailability - Longer half-life

🧪 Cyclization Methods¶

Head-to-tail - Backbone cyclization (most common)
Disulfide - Cys-Cys bridge
Side-chain - Lys-Asp/Glu lactam bridge

In [ ]:

Copied!





# Generate 3D structure for top candidate
top_peptide = top_candidates.iloc[0]

print(f"🧬 Generating 3D structure for top candidate: {top_peptide['id']}")
print(f"   Sequence: {top_peptide['sequence']}")
print(f"   Binding score: {top_peptide['binding_score']:.2f}")
print("\n⏳ This may take a minute...\n")

# Generate linear structure
linear_pdb = generate_pdb_content(
    sequence_str=top_peptide['sequence'],
    conformation="random",  # Random coil
    optimize_sidechains=True,
    minimize_energy=True
)

# Generate cyclic structure
cyclic_pdb = generate_pdb_content(
    sequence_str=top_peptide['sequence'],
    conformation="random",
    cyclic=True,  # Head-to-tail cyclization
    optimize_sidechains=True,
    minimize_energy=True
)

print("✅ Structures generated!")
# Generate 3D structure for top candidate
top_peptide = top_candidates.iloc[0]

print(f"🧬 Generating 3D structure for top candidate: {top_peptide['id']}")
print(f"   Sequence: {top_peptide['sequence']}")
print(f"   Binding score: {top_peptide['binding_score']:.2f}")
print("\n⏳ This may take a minute...\n")

# Generate linear structure
linear_pdb = generate_pdb_content(
    sequence_str=top_peptide['sequence'],
    conformation="random",  # Random coil
    optimize_sidechains=True,
    minimize_energy=True
)

# Generate cyclic structure
cyclic_pdb = generate_pdb_content(
    sequence_str=top_peptide['sequence'],
    conformation="random",
    cyclic=True,  # Head-to-tail cyclization
    optimize_sidechains=True,
    minimize_energy=True
)

print("✅ Structures generated!")

In [ ]:

Copied!





# Visualize linear vs cyclic
import py3Dmol

print("🔬 3D Structure Comparison\n")

# Linear structure
print("Linear Peptide:")
view1 = py3Dmol.view(width=400, height=300)
view1.addModel(linear_pdb, 'pdb')
view1.setStyle({'cartoon': {'color': 'spectrum'}})
view1.setStyle({'resn': top_peptide['sequence'][0]},
               {'sphere': {'color': 'red', 'radius': 0.5}})  # N-terminus
view1.setStyle({'resn': top_peptide['sequence'][-1]},
               {'sphere': {'color': 'blue', 'radius': 0.5}})  # C-terminus
view1.zoomTo()
view1.show()

print("\nCyclic Peptide (head-to-tail):")
view2 = py3Dmol.view(width=400, height=300)
view2.addModel(cyclic_pdb, 'pdb')
view2.setStyle({'cartoon': {'color': 'spectrum'}})
view2.zoomTo()
view2.show()

print("\n💡 Notice: The cyclic form has no free termini (red/blue spheres in linear).")
print("   This makes it resistant to exopeptidases!")
# Visualize linear vs cyclic
import py3Dmol

print("🔬 3D Structure Comparison\n")

# Linear structure
print("Linear Peptide:")
view1 = py3Dmol.view(width=400, height=300)
view1.addModel(linear_pdb, 'pdb')
view1.setStyle({'cartoon': {'color': 'spectrum'}})
view1.setStyle({'resn': top_peptide['sequence'][0]},
               {'sphere': {'color': 'red', 'radius': 0.5}})  # N-terminus
view1.setStyle({'resn': top_peptide['sequence'][-1]},
               {'sphere': {'color': 'blue', 'radius': 0.5}})  # C-terminus
view1.zoomTo()
view1.show()

print("\nCyclic Peptide (head-to-tail):")
view2 = py3Dmol.view(width=400, height=300)
view2.addModel(cyclic_pdb, 'pdb')
view2.setStyle({'cartoon': {'color': 'spectrum'}})
view2.zoomTo()
view2.show()

print("\n💡 Notice: The cyclic form has no free termini (red/blue spheres in linear).")
print("   This makes it resistant to exopeptidases!")

Step 7: Export Results¶

Save top candidates for experimental validation.

In [ ]:

Copied!





# Export summary
export_df = top_candidates[['id', 'sequence', 'length', 'mw', 'charge',
                            'gravy', 'binding_score']].copy()

print("📄 Exportable Results\n")
print(export_df.to_string(index=False))

# Save to CSV (if running locally)
if not IN_COLAB:
    export_df.to_csv('lead_peptides.csv', index=False)
    print("\n✅ Saved to lead_peptides.csv")
else:
    print("\n💡 In Colab: Copy the table above for your records")

# Save top structure
print("\n💾 Top candidate PDB (cyclic):")
print(f"   ID: {top_peptide['id']}")
print(f"   Sequence: {top_peptide['sequence']}")
print("   Ready for molecular dynamics or docking studies!")
# Export summary
export_df = top_candidates[['id', 'sequence', 'length', 'mw', 'charge',
                            'gravy', 'binding_score']].copy()

print("📄 Exportable Results\n")
print(export_df.to_string(index=False))

# Save to CSV (if running locally)
if not IN_COLAB:
    export_df.to_csv('lead_peptides.csv', index=False)
    print("\n✅ Saved to lead_peptides.csv")
else:
    print("\n💡 In Colab: Copy the table above for your records")

# Save top structure
print("\n💾 Top candidate PDB (cyclic):")
print(f"   ID: {top_peptide['id']}")
print(f"   Sequence: {top_peptide['sequence']}")
print("   Ready for molecular dynamics or docking studies!")

🎓 Key Takeaways¶

Drug Discovery Pipeline Steps¶

Library Design - Diversity + drug-like bias
Property Prediction - Filter early (fail fast!)
Binding Estimation - Prioritize candidates
Cyclization - Improve stability
Experimental Validation - Synthesize and test top hits

Real-World Considerations¶

Synthesis cost - Longer peptides = more expensive
Solubility - Must dissolve for assays
Cell permeability - Cyclic peptides can cross membranes
Off-target effects - Test selectivity

Next Steps in Real Projects¶

Molecular Dynamics - Simulate binding
Docking - Predict binding pose
Synthesis - Order top 5-10 candidates
Biochemical assays - IC50, EC50 measurements
Cell-based assays - Efficacy and toxicity
Lead optimization - Iterate based on SAR

🚀 Challenge Exercises¶

Try these extensions:

Modify target profile - Change binding preferences and re-rank
Add disulfide bridges - Include Cys pairs for additional cyclization
Larger library - Generate 500+ peptides and analyze trends
Multi-objective optimization - Balance binding, stability, and cost

Remember: Computational predictions guide experiments, but nothing beats real data! 🧪