2010 AMBER Tutorial with Biotin and Streptavidin

From Rizzo_Lab
Revision as of 22:28, 8 March 2010 by Haoquan (talk | contribs) (Data extraction and calculation)
Jump to: navigation, search

In this tutorial, we will learn how to run a molecular dynamics simulation of a protein-ligand complex. We will then post-process that simulation by calculating structural fluctuations (with RMSD) and free energies of binding (MM-GBSA).

Biotin Notes

Biotin is also called vitamin H. And it takes part in multiple processes inside the cell. It's a B-complex vitamin (coenzyme) that's involved in gluconeogenesis, citric acid cycle, and various carboxylation reactions.

Streptavidin Notes

Download PDB Here and view it's details Here. Streptavidin has an incredibly strong affinity for biotin; the dissociation constant for the streptavidin-biotin complex is on the order of femtomolar.


.... ....


What is AMBER?

Amber - Assisted Model Building with Energy Refinement - is a suite of about 50 programs that can be used to simulate, study and analyze macromolecular systems such as proteins dissolved in water at physiological conditions. Amber10, the current version (Amber11 soon to be released) of Amber, is extremely advanced, powerful and fast. PMEMD, particle mesh Ewald MD (boundary condition treatment / parallelized code) can churn out 314 ps/day of data for the system dihydrofolate reductase (159 residue protein) in TIP3P water (23,558 total atoms). However, because PMEMD lacks the ability to restrain the atoms we need properly, we will be using SANDER to perform most of our simulations.

AMBER Notes

The Amber 10 Manual is the primary resource when trying to learn what variables and keywords mean and what they do. Using Adobe Acrobat to view the file, you can simply search the document for keywords, which saves much time.

Keywords for preparatory programs:

LEaP: creates or modifies systems in Amber. It consists of the functions of prep, link, edit, and parm.

ANTECHAMBER: the main Antechamber suite program that helps prepare input files for nucleic acids and proteins for LEaP.


Keywords for simulating programs:

SANDER: according to the Amber 10 manual, it is 'a basic energy minimizer and molecular dynamics program. This program relaxes the structure by iteratively moving the atoms down the energy gradient until a sufficiently low average gradient is obtained. The molecular dynamics portion generates configurations of the system by integrating Newtonian equations of motion. MD will sample more configurational space than minimization, and will allow the structure to cross over small potential energy barriers. Configurations may be saved at regular intervals during the simulation for later analysis, and basic free energy calculations using thermodynamic integration may be performed. More elaborate conformational searching and modeling MD studies can also be carried out using the SANDER module. This allows a variety of constraints to be added to the basic force field, and has been designed especially for the types of calculations involved in NMR structure refinement'.

PMEMD: verison of SANDER that allows parallel scaling and optimized speed.


There is a mailing list you could sign-up for, as an additional resource.

Unix tips

This is specific to our cluster (seawulf) and desktop (mathlab) environments. See Activating your Seawulf Account for details of sw short cut.

Download Files from SeaWulf to Herbie:

ssh compute.mathlab.sunysb.edu

Login in to Herbie

mkdir sw_dir

make a directory "sw_dir" for which to download files and be organized

cd to sw_dir so when you scp files or directories back to Herbie, it copies them to a specific directory - "sw_dir"

cd sw_dir
scp -r sw:/location_of_files_or_directory/ .

Safely Copy, Recursively, /location_of_files_or_directory/

run.sander.MPI.csh

include these lines before mpirun command to know which nodes mpi is running on

echo "Queue is giving this nodes:"
cat \$PBS_NODEFILE
echo "MPI is running on:"
mpirun -n 8 hostname

Structure Preparation

To begin with, create the directories in seawulf you will work in, using the commands here:

mkdir AMBER_Tutorial
cd AMBER_Tutorial
mkdir 001.CHIMERA.MOL.PREP  
mkdir 002.TLEAP  
mkdir 003.SANDER 
mkdir 004.ptraj

Copy the commands above to your terminal and hit enter one at a time.

Open Chimera, choose File - Fetch by ID, then type in "1df8". Now you will see your protein and ligand in Chimera.

1. It is a dimer, but you need only a monomer. Click Select - chain - B, you would see chain B is highlighted. Then click Action - Atoms/Bonds - delete. Now only a receptor, a ligand and several water molecules are left.

2. Now you need to separate the ligand and receptor. First, Select - residue - HOH, then delete it. File - Save PDB, save this pdb as "1df8.rec_lig.pdb", then Select - residue - BTN, delete it. Save PDB as "1df8.rec.noh.pdb" Now you have a receptor pdb file. Place it in your "001.CHIMERA.MOL.PREP" directory.

Second, open the 1df8.rec_lig.pdb, select the receptor and delete it (Tips: you can invert your selection.) Then Tools - structure editing - Add H, press OK. File - Save mol2, save it as "1df8.lig.mol2". Note that this mol2 file would cause errors later in leap, because of its numbering. You need to modify it manually. Simply you can copy this file from "/nfs/user03/pholden/AMBER_Tutorial/001.CHIMERA.MOL.PREP/1df8.lig.mol2" on seawulf, and compare it with yours to see what should be modified. Also, place it in your "001.CHIMERA.MOL.PREP" directory.

TLEAP

Class please add in a Tleap section here.

describe our protocol and give example input files.

TLEAP Generating Data

open 002.tleap and the file should look like:

" ============================================================================
" Netrw Directory Listing                                        (netrw v125)
"   /nfs/user03/username/1DF8_setup/AMBER_tutorial/002.TLEAP
"   Sorted by      name
"   Sort sequence: [\/]$,\.h$,\.c$,\.cpp$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
"   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
" ============================================================================
../
rizzo_amber7.ionparms/
1df8.com.gas.leap.crd
1df8.com.gas.leap.parm
1df8.com.gas.leap.pdb
1df8.com.wat.leap.crd
1df8.com.wat.leap.parm
1df8.lig.ante.frcmod
1df8.lig.ante.pdb
1df8.lig.ante.prep
1df8.lig.gas.leap.crd
1df8.lig.gas.leap.parm
1df8.lig.wat.leap.crd
1df8.lig.wat.leap.parm
1df8.rec.gas.leap.crd
1df8.rec.gas.leap.parm
1df8.rec.wat.leap.crd
1df8.rec.wat.leap.parm
ANTECHAMBER_AC.AC
ANTECHAMBER_AC.AC0
ANTECHAMBER_BOND_TYPE.AC
ANTECHAMBER_BOND_TYPE.AC0
ANTECHAMBER_PREP.AC
ANTECHAMBER_PREP.AC0
ATOMTYPE.INF
NEWPDB.PDB
PREP.INF
leap.log
mk.files.csh*
tleap.com.in
tleap.com.out
tleap.lig.in
tleap.lig.out
tleap.rec.in
tleap.rec.out

now do csh to create the coordinate and parm files

Minimization and equilibration

In order to adjust our structures to the force field and remove any model building artifacts, we first perform a several-step equilibration protocol. Several iterations of minimization and molecular dynamics will be preformed with decreasing restraints.

The first step: Relaxing the experimental or silico structure

01mi.in: equilibration
&cntrl
  imin = 1, maxcyc = 1000, ntmin = 2,
  ntx = 1, ntc = 1, ntf = 1,
  ntb = 1, ntp = 0,
  ntwx = 1000, ntwe = 0, ntpr = 1000,
  scee = 1.2, cut = 8.0,
  ntr = 1,
  restraintmask = ':1-119 & !@H=',            
  restraint_wt=5.0,
/

The MD run should be set up:

cat << EOF > 10md.in
10md.in: production (500000 = 1ns)
 &cntrl
   imin = 0, ntx = 5, irest = 1, nstlim = 500000,
   temp0 = 298.15, tempi = 298.15, ig = 71287,
   ntc = 2, ntf = 1, ntt = 1, dt = 0.002,
   ntb = 2, ntp = 1, tautp = 1.0, taup = 1.0,
   ntwx = 500, ntwe = 0, ntwr = 500, ntpr = 500,
   scee = 1.2, cut = 8.0, iwrap = 1,
   ntr = 1, nscm = 100,
   restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1,
 /


  • &cntrl -> Tells SANDER that what follows are control variables.
  • imin=1 -> Perform Minimization
  • maxcyc=1000 -> Perform 1000 Minimization Steps
  • ntmin=2 -> Steepest Descent Method of Minimization
  • ntx=1 -> Initial Coordinates Lack Velocity - it's a restart file (See VMD)
  • ntc=1 -> "SHAKE" Posititional Restraints OFF (Default)
  • ntf=1 -> Calculate All types of Forces (bonds, angles, dihedrals, non-bonded)
  • ntb=1 -> Constant Volume Boundary Periodicity
  • ntp=0 -> No Pressure Regulation
  • ntwx=1000 -> Print Coordinates Frequency
  • ntwe=0 -> Print Energy to "mden" Frequency
  • ntpr=1000 -> Print Readable Energy Information to "mdout" and "mdinfo"
  • scee' -> 1-4 Coulombic Forces are Divided (Default=1.2)
  • cut=8.0 -> Coulombic Force Cutoff distance in Angstroms
  • restraintmask = ':1-119 & !@H=', -> restraint the residues matching the mask':1-119 & !@H='. Here, we're restraining residues 1 through 119 and everything that isn't hydrogen. Essentially, onlt Hydrogen atoms move free of restraint.
  • restraint_wt=5.0 is the Force Constant assigned to the restrained atoms. Each atom "sits" in a potential-energy well characterized by a "5.0" kcal/mol wall.
  • / is used to the machine to stop the job when it's done.

Class add in exactly what your equilibration was.

Production Simulation

Class please describe your production run simulations.

ptraj - Analyzing Your Data

ptraj is an analysis program included in the AMBER suite (AMBERtools) designed in part by Dr. Thomas Cheatham. See this website.

This page contains a brief list of ptraj functions and their syntax. Commands can be combined with most combinations of other functions to suit the need.

A useful and recommended program - merely a text file with functional syntax - to write is:

RUNTRAJ

#!/bin/csh
ptraj <filename.parm> <ptraj.1.in> > <ptraj.1.out>
exit

(When writing the above, one depressed 'Enter' on the keyboard, which is 'recorded' by vim. So, when the file is executed, it would be like hitting 'Enter' if you were entering the commands by hand in the shell.

  • Change filename.parm to 1df8.com.gas.leap.parm

#!/bin/csh -> will be in nearly all of the programs you will write - unless you dabble in Cpp or G77. It tells the shell to treat the contents of this here file as if the contents were being typed in the shell by hand.

ptraj -> has been aliased in your .cshrc file and will initialize ptraj once read by the machine.

filename.parm -> is the .parm file you would like to specify.

ptraj_input_filename.1.in -> is the set of instruction you want ptraj to read and perform, in an input file (This would be "ptraj.concatenate.strip.trj" in the coming examples).

exit -> will exit from ptraj when the ptraj_input_filename.1.in has completed its instruction(s).


Executing it:

Herbie:~> csh RUNTRAJ

1. Combine Production Trajectories while Stripping the Water Molecules

ptraj.1.in

 trajin ../003.SANDER/10md.trj 1 1000 1
  • trajin -> tells ptraj to "read-in" the file which comes after it
  • ../003.SANDER/10md.trj -> is the file to be "read-in"
  • 1 1000 1 -> tells ptraj to use the first to the 1000th snapshot of the trajectory. The third number, "1", is telling ptraj to read-in every frame. If this last number were "2", then ptraj would read-in every-other snapshot, "10" would be every 10th snapshot and so on.
trajin ../003.SANDER/11md.trj 1 1000 1
  • This will do the exact same as the first trajin cmd (command), except now we're analyzing a different trajectory - 11md.trj.
trajout 1df8.trj.strip nobox
  • trajout -> tells ptraj to write a new trajectory file, combining the two trajectories - 10md.trj and 11md.trj - from trajin.
  • 1df8.trj.strip -> is the name of the new trajectory to be made by trajout.
  • nobox -> is essentially a house-keeping cmd, where the periodic box information will just be neglected. Unless using CHARMM files, this ought to not be an issue.
strip :WAT
  • strip -> instructs ptraj to disappear those objects named "WAT" ':WAT
  • So you're left with a file "ptraj.concatenate.strip.trj" with the following in it:
trajin ../003.SANDER/10md.trj 1 1000 1
trajin ../003.SANDER/11md.trj 1 1000 1
trajout 1df8.trj.strip nobox
strip :WAT

2. RMSD

RMSD - root mean-square distance - can be used to measure the distance an object moves relative to a reference object. For example, one could use an RMSD analysis to measure the movement of the alpha-carbon atoms in the active site of a protein, using the experimental structure as the reference structure (ptraj will measure the RMSD between each object specified in the ptraj script - see below) where ptraj will by default fit the two structures, aligning them as much as possible. nofit is used to turn this function off.

ptraj.2.in

 trajin 1df8.trj.strip 1 2000 1
 trajout 1df8.com.trj.stripfit
 reference 1df8.com.gas.leap.crd
  • reference -> tells ptraj that you want to specify a reference file - snapshot - for which to compare your trajectory (file with many snapshots) to.
  • 1df8.com.gas.leap.crd -> is the reference file. This file is very important and you ought to be thoughtful about your selection of this file. Usually, when possible, one wants to use the experimental structure as the reference. Referencing the experimental structure 'usually' provides the most informative results. But, if done thoughtfully, a non-experimental reference could be informative, too...
 rms reference out 1df8.rmsd.CA.txt :1-118@CA
  • rms -> tells ptraj you want to perform an rms analysis
  • reference -> tells traj to use the reference file, specified in the previous line
  • out -> tells ptraj to create a temporary file out for which to store calculations during the analysis
  • 1df8.rmsd.CA.txt -> is the name of the file with the RMSD analysis results. This is the file you will use with your plotting program..
  • :1-118@CA -> tells ptraj to analyze the RMSD of the alpha-carbon atoms CA residues 1-118.


So when you're done, you're left with:

trajin 1df8.trj.strip 1 2000 1
trajout 1df8.com.trj.stripfit
reference 1df8.com.gas.leap.crd
rms reference out 1df8.rmsd.CA.txt :1-118@CA

4. Keep Only Streptavidin from 1df8.com.trj.stripfit

ptraj.4.in

trajin 1df8.com.trj.stripfit 1 2000 1
trajout 1df8.rec.trj.stripfit
strip :119
  • We've just stripped residue 119 (Biotin) from the 1df8.com.trj.stripfit file, which we've previously stripped of water

5. Keep Only Biotin from 1df8.com.trj.stripfit

ptraj.5.in

trajin 1df8.com.trj.stripfit 1 2000 1
trajout df8.lig.trj.stripfit
strip :1-118
  • Strip everything, keeping only the protein, Streptavidin

Also, you can write a csh file to go through the procedure above. Make a file analy.1.csh in your AMBER_tutorial directory as follows:

#! /bin/tcsh

mkdir 004.PTRAJ
cd ./004.PTRAJ

cat << EOF > ptraj.1.in
trajin ../003.SANDER/10md.trj 1 1000 1
trajin ../003.SANDER/11md.trj 1 1000 1
trajout 1df8.trj.strip nobox
strip :WAT
EOF

ptraj ../002.TLEAP/1df8.com.wat.leap.parm ptraj.1.in >ptraj.1.log

cat << EOF > ptraj.2.in
trajin 1df8.trj.strip
trajout 1df8.com.trj.stripfit
reference ../002.TLEAP/1df8.com.gas.leap.crd
rms reference out 1df8.rmsd.CA.txt :1-118@CA
EOF

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.2.in >ptraj.2.log

cat << EOF > ptraj.3.in
trajin 1df8.com.trj.stripfit
reference ../002.TLEAP/1df8.com.gas.leap.crd
rms reference out 1df8.lig.rmsd.txt :119@C*,N*,O*,S* nofit
EOF 

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.3.in >ptraj.3.log 

cat << EOF > ptraj.4.in
trajin 1df8.com.trj.stripfit
trajout 1df8.rec.trj.stripfit
strip :119
EOF 

cat <<EOF > ptraj.5.in 
trajin 1df8.com.trj.stripfit
trajout 1df8.lig.trj.stripfit
strip :1-118
EOF

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.4.in >ptraj.4.log
ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.5.in >ptraj.5.log  

cd .. 

Then use "csh" command to execute the file analy.1.in in your AMBER_tutorial directory.


MMGBSA

Create a new directory 005.MMGBSA To do an analysis we need three runs for the complex, ligand and receptor. Now write an input File: gb.rescore.in Single Point GB energy Calculation

&cntrl
 ntf=1, ntb=0, ntc=2,
 idecomp=0,
 igb=5, saltcon=0.0,
 gbsa=2, surften=1.0
 offset=0.09, extdiel=1.0,
 cut=99999.0, nsnb=99999,
 scnb=2.0, scee=1.2,
 imin=5, maxcyc=1, ncyc=0,
/
  • idecomp=0 -> Important, but is turned-off here. It is used for analysis.
  • igb=5 -> OBC Flavor of GB
  • gbsa=2 -> Generalized Born / Surface Area
    • 1 -> LCPA Surface Area Method
    • 2 -> Recursive Atom-Centered Method
  • surften=1 -> Calculate Solvation Free-Energy (non-polar contribution)
  • offset=0.09 -> Dielectric Scaling for GB
  • extdiel=1.0 -> Dielectric Constant for Solvent Exterior (Default 78.5)
  • nsnb=99999 -> Non-Bonded List Update Frequency (See igb=0, nbflag=0) (Default=25)
  • scnb=2.0 -> 1-4 van der Waals Division (default=2.0)


Write the Following Job Script: run.sander.rescore.csh

#!/bin/tcsh
#PBS -l nodes=1:ppn=2
#PBS -l walltime=24:00:00
#PBS -o zzz.mmgbsa.1.out
#PBS -e zzz.mmgbsa.1.err
#PBS -V

set workdir = "${HOME}/AMBER_Tutorial/005.MMGBSA"
cd ${workdir} 

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.com \
-p ../002.TLEAP/1df8.com.gas.leap.parm \
-c ../002.TLEAP/1df8.com.gas.leap.crd \
-y ../004.PTRAJ/1df8.com.trj.stripfit \
-r restrt.com \
-ref ../002.TLEAP/1df8.com.gas.leap.crd \
-x mdcrd.com \
-inf mdinfo.com \

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.lig \
-p ../002.TLEAP/1df8.lig.gas.leap.parm \
-c ../002.TLEAP/1df8.lig.gas.leap.crd \
-y ../004.PTRAJ/1df8.lig.trj.stripfit \
-r restrt.lig \
-ref ../002.TLEAP/1df8.lig.gas.leap.crd \
-x mdcrd.lig \
-inf mdinfo.lig

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.rec \
-p ../002.TLEAP/1df8.rec.gas.leap.parm \
-c ../002.TLEAP/1df8.rec.gas.leap.crd \
-y ../004.PTRAJ/1df8.rec.trj.stripfit \
-r restrt.rec \
-ref ../002.TLEAP/1df8.rec.gas.leap.crd \
-x mdcrd.rec \
-inf mdinfo.rec

exit


Change "run.sander.rescore.csh" into an executable

:~> chmod +x run.sander.rescore.csh

Submit the job -> run.sander.rescore.csh

~> qsub run.sander.rescore.csh

Monitor your job

~> qstat -u YourUserName

Data extraction and calculation

When MD finised, you will find "gb.rescore.out.com", "gb.rescore.out.lig", "gb.rescore.out.rec" these three outputs. Use the following script for extracting data.

#! /bin/bash
# by Haoquan
echo com lig rec > namelist

LIST=`cat namelist`

for i in $LIST ; do

grep VDWAALS gb.rescore.out.$i | awk '{print $3}' > $i.vdw
grep VDWAALS gb.rescore.out.$i | awk '{print $9}' > $i.polar
grep ESURF gb.rescore.out.$i | awk '{print $3 * 0.00542 + 0.92}' > $i.surf
grep RESTRAINT gb.rescore.out.$i | awk '{print $8}' > $i.coul

paste -d " " $i.vdw $i.polar $i.surf $i.coul | awk '{print $1 + $2 + $3 + $4}' > data.$i

rm $i.*

done

for ((j=1; j<2001; j+=1)) do
echo $j , >> time
done

paste -d " " data.com data.lig data.rec | awk '{print $1 - $2 - $3}' > data.all
paste -d " " time data.all > MMGBSA_vs_time

rm namelist time data.*


Now you have the final data sheet: MMGBSA_vs_time. Copy them to excel and separate two columns by comma Result may look like this.