Difference between revisions of "2010 AMBER Tutorial with Biotin and Streptavidin"

From Rizzo_Lab
Jump to: navigation, search
(Hydrogen bond distance)
 
(61 intermediate revisions by 6 users not shown)
Line 1: Line 1:
In this tutorial we will learn how to run a molecular dynamics simulation of a protein-ligand complex.  We will then post-process that simulation by calculating structural fluctulations (with RMSD) and free energies of binding (MM-GBSA).
+
For additional Rizzo Lab tutorials see [[AMBER Tutorials]].
 +
 
 +
In this tutorial, we will learn how to run a molecular dynamics simulation of a protein-ligand complex.  We will then post-process that simulation by calculating structural fluctuations (with RMSD) and free energies of binding (MM-GBSA).
  
 
=Biotin Notes=
 
=Biotin Notes=
Line 15: Line 17:
 
[http://ambermd.org/ Amber] - '''A'''ssisted '''M'''odel '''B'''uilding with '''E'''nergy '''R'''efinement - is a suite of about 50 programs that can be used to simulate, study and analyze macromolecular systems such as proteins dissolved in water at physiological conditions. Amber10, the current version (Amber11 soon to be released) of Amber, is extremely advanced, powerful and fast. PMEMD, particle mesh Ewald MD (boundary condition treatment / parallelized code) can churn out 314 ps/day of data for the system dihydrofolate reductase (159 residue protein) in TIP3P water (23,558 total atoms). However, because PMEMD lacks the ability to restrain the atoms we need properly, we will be using SANDER to perform ''most'' of our simulations.
 
[http://ambermd.org/ Amber] - '''A'''ssisted '''M'''odel '''B'''uilding with '''E'''nergy '''R'''efinement - is a suite of about 50 programs that can be used to simulate, study and analyze macromolecular systems such as proteins dissolved in water at physiological conditions. Amber10, the current version (Amber11 soon to be released) of Amber, is extremely advanced, powerful and fast. PMEMD, particle mesh Ewald MD (boundary condition treatment / parallelized code) can churn out 314 ps/day of data for the system dihydrofolate reductase (159 residue protein) in TIP3P water (23,558 total atoms). However, because PMEMD lacks the ability to restrain the atoms we need properly, we will be using SANDER to perform ''most'' of our simulations.
  
=Quick Tips=
+
==AMBER Notes==
  
===Quick Notes===
 
 
The [http://ringo.ams.sunysb.edu/img_auth.php/c/cb/Amber10i-1-_-_Manual.pdf Amber 10 Manual] is the primary resource when trying to learn what variables and keywords mean and what they do. Using Adobe Acrobat to view the file, you can simply search the document for keywords, which saves much time.
 
The [http://ringo.ams.sunysb.edu/img_auth.php/c/cb/Amber10i-1-_-_Manual.pdf Amber 10 Manual] is the primary resource when trying to learn what variables and keywords mean and what they do. Using Adobe Acrobat to view the file, you can simply search the document for keywords, which saves much time.
 
  
 
Keywords for preparatory programs:
 
Keywords for preparatory programs:
Line 37: Line 37:
 
There is a [http://lists.ambermd.org/mailman/listinfo/amber mailing list] you could sign-up for, as an additional resource.
 
There is a [http://lists.ambermd.org/mailman/listinfo/amber mailing list] you could sign-up for, as an additional resource.
  
===Quick Scripts===
+
=Unix tips=
'''Download Files from SeaWulf to Herbie'''
+
 
 +
This is specific to our cluster (seawulf) and desktop (mathlab) environments.  See  [[Activating your Seawulf Account]] for details of sw short cut.
 +
 
 +
Download Files from SeaWulf to Herbie:
 
  ssh compute.mathlab.sunysb.edu
 
  ssh compute.mathlab.sunysb.edu
 
Login in to Herbie
 
Login in to Herbie
Line 59: Line 62:
 
  echo "MPI is running on:"
 
  echo "MPI is running on:"
 
  mpirun -n 8 hostname
 
  mpirun -n 8 hostname
 
 
  
 
=Structure Preparation=
 
=Structure Preparation=
Line 74: Line 75:
 
Open Chimera, choose File - Fetch by ID, then type in "1df8". Now you will see your protein and ligand in Chimera.  
 
Open Chimera, choose File - Fetch by ID, then type in "1df8". Now you will see your protein and ligand in Chimera.  
  
1. It is a dimer, but you need only a monomer. Click Select - chain - B, you would see chain B is highlighted. Then click Action - Atoms/Bonds - delete. Now only a receptor, a ligand and several water molecules are left.  
+
1. It is a dimer, but you need only a monomer.  
 +
* Click Select - chain - B, you would see chain B is highlighted.
 +
* Then click Action - Atoms/Bonds - delete.
 +
 
 +
Now only a receptor, a ligand and several water molecules are left.  
  
 
2. Now you need to separate the ligand and receptor.  
 
2. Now you need to separate the ligand and receptor.  
First, Select - residue - HOH, then delete it. File - Save PDB, save this pdb as "1df8.rec_lig.pdb", then Select - residue - BTN, delete it. Save PDB as "1df8.rec.noh.pdb" Now you have a receptor pdb file. Place it in your "001.CHIMERA.MOL.PREP" directory.
+
* First, Select - residue - HOH, then delete it.
 +
* File - Save PDB, save this pdb as "1df8.rec_lig.pdb", then Select - residue - BTN, delete it. Save PDB as "1df8.rec.noh.pdb."
 +
* Second, open the 1df8.rec_lig.pdb, select the receptor and delete it (Tips: you can invert your selection.)
 +
 
 +
* Then Tools - structure editing - Add H, press OK.
 +
 
 +
* Then Tools - structure editing - Add Charge, press OK. select AM1-BCC charge model.
 +
 
 +
* File - Save mol2, save it as "1df8.lig.mol2".
 +
 
 +
* Open up ligand mol2 file with your favorite text editor (i.e vim).  Manually replace the 2nd columns with unique names.  Leap only uses the first 3 characters as the name.
 +
 
 +
old.mol2:
 +
 
 +
      1 C11        32.5640  18.1390  14.0710 C.2      1 BTN1        0.9079
 +
      2 O11        33.4260  17.4630  13.4730 O.co2    1 BTN1      -0.8596
 +
      3 O12        32.5320  19.3920  13.9660 O.co2    1 BTN1      -0.8472
 +
      4 C10        31.5990  17.4500  14.9620 C.3      1 BTN1      -0.1966
 +
      5 C9        31.2260  16.0460  14.5600 C.3      1 BTN1      -0.0459
 +
      6 C8        30.5160  16.0360  13.2000 C.3      1 BTN1      -0.0920
 +
      7 C7        30.0160  14.6260  12.8220 C.3      1 BTN1      -0.0683
 +
      8 C2        29.2080  14.5510  11.5450 C.3      1 BTN1      -0.0028
 +
      9 S1        27.5110  15.2280  11.7030 S.3      1 BTN1      -0.2811
 +
    10 C6        27.1670  14.6500  10.0230 C.3      1 BTN1      -0.0520
 +
    11 C5        27.7360  13.2480    9.9740 C.3      1 BTN1        0.0662
 +
    12 N1        26.8850  12.1810  10.4970 N.pl3    1 BTN1      -0.4731
 +
 
 +
new.mol2
 +
      1 C1        32.5640  18.1390  14.0710 C.2      1 BTN1        0.9079
 +
      2 O2        33.4260  17.4630  13.4730 O.co2    1 BTN1      -0.8596
 +
      3 O3        32.5320  19.3920  13.9660 O.co2    1 BTN1      -0.8472
 +
      4 C4        31.5990  17.4500  14.9620 C.3      1 BTN1      -0.1966
 +
      5 C5        31.2260  16.0460  14.5600 C.3      1 BTN1      -0.0459
 +
      6 C6        30.5160  16.0360  13.2000 C.3      1 BTN1      -0.0920
 +
      7 C7        30.0160  14.6260  12.8220 C.3      1 BTN1      -0.0683
 +
      8 C8        29.2080  14.5510  11.5450 C.3      1 BTN1      -0.0028
 +
      9 S9        27.5110  15.2280  11.7030 S.3      1 BTN1      -0.2811
 +
    10 C10        27.1670  14.6500  10.0230 C.3      1 BTN1      -0.0520
 +
    11 C11        27.7360  13.2480    9.9740 C.3      1 BTN1        0.0662
 +
    12 N12        26.8850  12.1810  10.4970 N.pl3    1 BTN1      -0.4731
 +
 
 +
Note that this mol2 file would cause errors later in leap, because of its numbering. You need to modify it manually. Simply you can copy this file from "/nfs/user03/pholden/AMBER_Tutorial/001.CHIMERA.MOL.PREP/1df8.lig.mol2" on seawulf, and compare it with yours to see what should be modified. Also, place it in your "001.CHIMERA.MOL.PREP" directory.
 +
 
 +
3. Now you have a receptor pdb file and a ligand mol2, place them in your "001.CHIMERA.MOL.PREP" directory by performing the following:
 +
*put files on a machine that is a portal to seawulf (i.e. silver, herbie or ringo).  If user is on a MATHLAB machine then files are accessible from silver or herbie.
 +
*scp files sw:path
 +
* Note that is only applies to AMS536 (or Rizzo lab rotation) students and is specific for our computer setup.
 +
 
 +
Note that there are four residues (22, 27, 40, and 57) that have two possible conformers, when tleap builds your parm and crd files only the first occurrence of each atom is kept.  See section below and leap output.
 +
 
 +
=TLEAP=
 +
 
 +
Class please add in a Tleap section here.
 +
 
 +
Class I've gotten a section started please improve this section.  Thanks.
 +
 
 +
describe our protocol and give example input files.
 +
 
 +
First, create directory called ~/AMBER_Tutorial/002.TLEAP and go into that dir:
 +
 
 +
mkdir ~/AMBER_Tutorial/002.TLEAP
 +
cd ~/AMBER_Tutorial/002.TLEAP
 +
 
 +
Next we will write the following files that you will need to input files for LEAP in order to create your parameter/topology files and your coordinate files
 +
 
 +
cat << EOF > tleap.lig.in
 +
set default PBradii mbondi2
 +
source leaprc.ff99SB
 +
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
 +
loadoff rizzo_amber7.ionparms/ions.lib
 +
loadamberparams rizzo_amber7.ionparms/ions.frcmod
 +
source leaprc.gaff
 +
loadamberparams rizzo_amber7.ionparms/gaff.dat.rizzo
 +
loadamberparams 1df8.lig.ante.frcmod
 +
loadamberprep 1df8.lig.ante.prep
 +
lig = loadpdb 1df8.lig.ante.pdb
 +
saveamberparm lig 1df8.lig.gas.leap.parm 1df8.lig.gas.leap.crd
 +
solvateBox lig TIP3PBOX 10.0
 +
saveamberparm lig 1df8.lig.wat.leap.parm 1df8.lig.wat.leap.crd
 +
charge lig
 +
quit
 +
EOF
 +
 
 +
cat << EOF > tleap.rec.in
 +
set default PBradii mbondi2
 +
source leaprc.ff99SB
 +
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
 +
loadoff rizzo_amber7.ionparms/ions.lib
 +
loadamberparams rizzo_amber7.ionparms/ions.frcmod
 +
REC = loadpdb ../001.CHIMERA.MOL.PREP/1df8.rec.noh.pdb
 +
 +
saveamberparm REC 1df8.rec.gas.leap.parm 1df8.rec.gas.leap.crd
 +
solvateBox REC TIP3PBOX 10.0
 +
saveamberparm REC 1df8.rec.wat.leap.parm 1df8.rec.wat.leap.crd
 +
charge REC
 +
quit
 +
EOF
 +
 
 +
cat << EOF > tleap.com.in
 +
set default PBradii mbondi2
 +
source leaprc.ff99SB
 +
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
 +
loadoff rizzo_amber7.ionparms/ions.lib
 +
loadamberparams rizzo_amber7.ionparms/ions.frcmod
 +
source leaprc.gaff
 +
loadamberparams rizzo_amber7.ionparms/gaff.dat.rizzo
 +
REC = loadpdb ../001.CHIMERA.MOL.PREP/1df8.rec.noh.pdb
 +
 +
loadamberparams 1df8.lig.ante.frcmod
 +
loadamberprep 1df8.lig.ante.prep
 +
LIG = loadpdb 1df8.lig.ante.pdb
 +
COM = combine {REC LIG}
 +
saveamberparm COM 1df8.com.gas.leap.parm 1df8.com.gas.leap.crd
 +
solvateBox COM TIP3PBOX 10.0
 +
saveamberparm COM 1df8.com.wat.leap.parm 1df8.com.wat.leap.crd
 +
charge COM
 +
quit
 +
EOF
 +
 
 +
Note the rizzo_amber7.ionparms/*.dat are modified parameter files.
 +
 
 +
 
 +
The following is an example csh script used to generate the necessary files for running molecular Dynamics.
 +
cat << EOF > run.TLEAP.csh
 +
#! /bin/csh
 +
 +
set workdir = "~/AMBER_Tutorial/002.TLEAP"
 +
cd $workdir
 +
 +
rm 1df8.* ANTECHAMBER* *.out ATOMTYPE.INF leap.log NEWPDB.PDB PREP.INF
 +
 +
antechamber -i ../001.CHIMERA.MOL.PREP/1df8.lig.mol2 -fi mol2  -o 1df8.lig.ante.pdb  -fo pdb
 +
antechamber -i ../001.CHIMERA.MOL.PREP/1df8.lig.mol2 -fi mol2  -o 1df8.lig.ante.prep -fo prepi
 +
parmchk -i 1df8.lig.ante.prep -f  prepi -o 1df8.lig.ante.frcmod
 +
tleap -s -f tleap.lig.in > tleap.lig.out
 +
tleap -s -f tleap.rec.in > tleap.rec.out
 +
tleap -s -f tleap.com.in > tleap.com.out
 +
 +
ambpdb -p  1df8.com.gas.leap.parm -tit "pdb" <1df8.com.gas.leap.crd > 1df8.com.gas.leap.pdb
 +
 +
exit
 +
EOF
 +
 
 +
To run this script issue the following command:
 +
csh run.TLEAP.csh
 +
 
 +
To parametrize your ligands the following is done.
 +
 
 +
antechamber is run first to create a prepi file, which contains the internal coordinates of the ligand.
 +
Then parmchk is run to create a frcmod file, which contains additional parameters not in GAFF and need by tleap.
 +
This files are need be for running leap.  You can see if you look in the tleap.com.in and tleap.lig.in the frcmod is loaded to insure that the ligand is parametrized fully
 +
 
 +
It is a good idea to open all these files with your favorite text editor (eg. vim).
  
Second, open the 1df8.rec_lig.pdb, select the receptor and delete it (Tips: you can invert your selection.) Then Tools - structure editing - Add H, press OK. File - Save mol2, save it as "1df8.lig.mol2". Note that this mol2 file would cause errors later in leap, because of its numbering. You need to modify it manually. Simply you can copy this file from "/nfs/user03/pholden/AMBER_Tutorial/001.CHIMERA.MOL.PREP/1df8.lig.mol2" on seawulf, and compare it with yours to see what should be modified. Also, place it in your "001.CHIMERA.MOL.PREP" directory.
+
  
 +
To insure that TLEAP ran correctly and generated all nesesary files issue the following:
 +
 +
ls
 +
 +
You should have all of the following files:
 +
 +
1df8.com.gas.leap.crd  1df8.lig.wat.leap.parm    ATOMTYPE.INF
 +
1df8.com.gas.leap.parm  1df8.rec.gas.leap.crd      leap.log
 +
1df8.com.gas.leap.pdb  1df8.rec.gas.leap.parm    NEWPDB.PDB
 +
1df8.com.wat.leap.crd  1df8.rec.wat.leap.crd      PREP.INF
 +
1df8.com.wat.leap.parm  1df8.rec.wat.leap.parm    run.tleap.csh
 +
1df8.lig.ante.frcmod    ANTECHAMBER_AC.AC          tleap.com.in
 +
1df8.lig.ante.pdb      ANTECHAMBER_AC.AC0        tleap.com.out
 +
1df8.lig.ante.prep      ANTECHAMBER_BOND_TYPE.AC  tleap.lig.in
 +
1df8.lig.gas.leap.crd  ANTECHAMBER_BOND_TYPE.AC0  tleap.lig.out
 +
1df8.lig.gas.leap.parm  ANTECHAMBER_PREP.AC        tleap.rec.in
 +
1df8.lig.wat.leap.crd  ANTECHAMBER_PREP.AC0      tleap.rec.out
  
=Generating Data=
 
open 002.tleap and the file should look like:
 
" ============================================================================
 
" Netrw Directory Listing                                        (netrw v125)
 
"  /nfs/user03/username/1DF8_setup/AMBER_tutorial/002.TLEAP
 
"  Sorted by      name
 
"  Sort sequence: [\/]$,\.h$,\.c$,\.cpp$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
 
"  Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
 
" ============================================================================
 
../
 
rizzo_amber7.ionparms/
 
1df8.com.gas.leap.crd
 
1df8.com.gas.leap.parm
 
1df8.com.gas.leap.pdb
 
1df8.com.wat.leap.crd
 
1df8.com.wat.leap.parm
 
1df8.lig.ante.frcmod
 
1df8.lig.ante.pdb
 
1df8.lig.ante.prep
 
1df8.lig.gas.leap.crd
 
1df8.lig.gas.leap.parm
 
1df8.lig.wat.leap.crd
 
1df8.lig.wat.leap.parm
 
1df8.rec.gas.leap.crd
 
1df8.rec.gas.leap.parm
 
1df8.rec.wat.leap.crd
 
1df8.rec.wat.leap.parm
 
ANTECHAMBER_AC.AC
 
ANTECHAMBER_AC.AC0
 
ANTECHAMBER_BOND_TYPE.AC
 
ANTECHAMBER_BOND_TYPE.AC0
 
ANTECHAMBER_PREP.AC
 
ANTECHAMBER_PREP.AC0
 
ATOMTYPE.INF
 
NEWPDB.PDB
 
PREP.INF
 
leap.log
 
mk.files.csh*
 
tleap.com.in
 
tleap.com.out
 
tleap.lig.in
 
tleap.lig.out
 
tleap.rec.in
 
tleap.rec.out
 
 
now do csh to create the coordinate and parm files
 
now do csh to create the coordinate and parm files
==Minimization and equilibration(Do This First!)==
+
 
several iterations of minimization and equilibration should be done with decreasing restraint.
+
=Minimization and equilibration=
 +
In order to adjust our structures to the force field and remove any model building artifacts, we first perform a several-step equilibration protocol.  Several iterations of minimization and molecular dynamics will be preformed with decreasing restraints.
 +
 
  
 
'''The first step: Relaxing the experimental or silico structure'''
 
'''The first step: Relaxing the experimental or silico structure'''
Line 176: Line 308:
 
*'''/''' is used to the machine to stop the job when it's done.
 
*'''/''' is used to the machine to stop the job when it's done.
  
 +
The following are our exacted equilibration and production protocols.
 +
 +
equilibration:
 +
01mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 5.0,
 +
02md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 5.0, dt = 0.001,
 +
03mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 2.0,
 +
04mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.1,
 +
05mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.05,
 +
06md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 1.0, dt = 0.001,
 +
07md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.5, dt = 0.001,
 +
08md.in: nstlim = 50000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.001,
 +
09md.in: nstlim = 50000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.001,
 +
production:
 +
10md.in: nstlim = 500000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.002,
 +
11md.in: nstlim = 500000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.002,
 +
 +
Note that the other parameters not specified are the same as those in the above two files for minimization and molecular dynamics, respectively.
 +
 +
 +
When your simulations have finished, you ought to check the stability and realism of results. Use the script [http://ringo.ams.sunysb.edu/img_auth.php/3/3c/E_asis E_asis] to analyze the the mdout files. This ought to also be used to check the validity and stability of your production runs.
 +
 +
Download E_asis onto your local machine (the one you're using right now). Once saved onto local machine, transfer it to you working directory on Herbie, Seawulf, etc. Follow usual protocols to do this. This script will extract the energy, temperature, pressure and volume (and averages thereof) from the mdout file. To execute, do the following (it may be a good idea to make a separate directory just for this analysis, as many files are created):
 +
 +
chmod +x E_asis
 +
 +
 +
./E_asis Filename.out
 +
 +
Filenames, as per this tutorial ought to be 01min, 02md, 03md, etc. To analyze the whole equilibration experiment (i.e. 01min to 09md), the following may work. Please check the results to be sure it worked properly. There are various ways to coordinate the analysis of these output files..
 +
 +
./E_asis *.out
 +
 +
= Production Simulation =
 +
 +
(Class please describe your production run simulations...)
  
 +
A Production simulation is the ultimate simulation to be performed. Once the structure is built (See tLeap section), minimized and equilibrated (See Minimization and equilibration section) and only essential restraints retained, production dynamics produce the data used to answer the scientific question at hand. The previous steps were just preparatory.
 +
 +
The script used to instruct the supercomputer how to perform the dynamics:
 +
 +
'''runsander.csh'''
 +
#! bin/csh
 +
set workdir = "/nfs/user03/'''Your User Name'''/AMBER_Tutorial/003.SANDER"
 +
cd ${workdir}
 +
set coor = "../002.TLEAP/1df8.com.wat.leap.crd"
 +
set topo = "../002.TLEAP/1df8.com.wat.leap.parm"
 +
cat << EOFQSUB > temp.qsub.csh
 +
#! /bin/tcsh
 +
#PBS -l nodes=4:ppn=2
 +
#PBS -l walltime=720:00:00
 +
#PBS -o zzz.qsub.out
 +
#PBS -e zzz.qsub.err
 +
#PBS -V
 +
cd ${workdir}
 +
cat \$PBS_NODEFILE | sort | uniq
 +
cat \$PBS_NODEFILE | sort | uniq > mpd.nodes
 +
#I have Deleted The cat'ing of input files that appeared in "run.sander.MPI.csh"
 +
#I have Deleted the "mpiexec" aka "mpirun" for jobs 1 through 9
 +
mpirun -n 8 sander.MPI -O -i 10md.in -o 10md.out -p ${topo} -c 09md.rst -ref 05mi.rst -x 10md.trj -inf 10md.info -r 10md.rst
 +
mpirun -n 8 sander.MPI -O -i 11md.in -o 11md.out -p ${topo} -c 10md.rst -ref 05mi.rst -x 11md.trj -inf 11md.info -r 11md.rst
 +
EOFQSUB
 +
chmod +x temp.qsub.csh
 +
qsub temp.qsub.csh
 +
 +
The important instructions are the 18th and 19th lines - mpirun.
 +
 +
'''mpirun''' - Instructs SeaWulf (SW) to use an mpi regime.
 +
 +
'''-n''' - Denotes number of processors (Here 8)
 +
 +
'''sander.MPI''' - Is the actual MPI code to be run. This is the heart of the whole course. The text after this and on this line will
 +
be instructions for sander.
 +
 +
'''-O'''  Overwrite previous output files with the new ones (to be produced by this job). This is useful in that if you have to run this script more than once, then something must have gone wrong. Thus, this will simply overwrite the files that are erroneous anyways. Don't use this if you really know what your doing and have a reasonable reason for doing so.
 +
 +
'''-i'''  Points sander to the input file (10md.in then 11md.in).
 +
 +
'''-o'''  Instructs sander to write the output file (10md.out then 11md.out), which contains the energies written by sander during dynamics. The various variables within the input file will determine the nature of the output: How frequently are the energies printed? How detailed is the information? What types of energies are being printed?
 +
 +
'''-p'''  Points sander to the topology (aka '''p'''arameter file). See Amber10 manual if you don't know what this is.
 +
 +
'''-c'''  Points sander to the coordinates (corresponding to the topology file from above) for which you want the production dynamics to begin from. In this case, we're starting with the final (snapshot - single frame) frame from the previous (equilibration) run, 9md.rst. The '''.rst''' at the end of a file means "restart". "restart" files are single frames. They can contain subtle information in addition to simply the number of atoms in the file (first line) and the x, y and z coordinates for each atom (which is why you need the topology file to give those x,y,z's meaning - what atom has what coordinates) so check the Amber10 manual if you're curious.
 +
 +
'''-ref''' Specifies for sander which file you would like the restraints (if not using restraints, then '''-ref''' is not used) to be centered on. Asking Professor Rizzo to explain this flag is a good idea, after reading the Amber10 manual (don't waste his time), of course, if it is critical to your work. If not: the reference is a snapshot-set of coordinates which you have carefully chosen to represent the "ideal" structure you would like the dynamics (sander) to use as a "reference"; using the restrain_wt option in the input file, you're telling sander how much like the reference structure you want your dynamical system to be. The restraint_wt option acts like a [http://en.wikipedia.org/wiki/Harmonic_oscillator spring, where harmonically] the reference coordinates and coordinates of the dynamical system are "psuedo-connected." Thus, it is imperative that the reference structure be realistic (say the coordinates of the crystal structure) and be the same system (i.e. same number of atoms).
 +
 +
'''-x''' Instructs sander to print the trajectory (position of each atom along with its velocity) every '''ntwx''' steps. This is the "Big" file. When your simulation is complete, zip the trajectories:
 +
 +
gzip 10md.trj
 +
 +
gzip 11md.trj
 +
 +
The trajectories are, in my opinion, the most important component of a MD experiment. So, read the Amber10 Manual. You could notice that ptraj uses the trajectory files to perform the bulk of the data analysis like RMSD, H-bonding evolution, radius of gyration, pi-stacking, etc., etc.
 +
 +
'''-info''' Gives the results of the interactions between the supercomputer and sander: How did the calculation proceed? Did everything work properly? Did the simulation run to completion? This flag can be useful in debugging failed jobs....hk...
 +
 +
'''-r''' Instructs sander to write a restart file. The frequency this is done is specified is '''ntwr''' in the 10md.in / 11md.in files. Usually '''ntwr=500'''. A restart file will be written at the end of the simulation - the final snapshot of the simulation will be the restart file if and when the simulation has run successfully to completion. During the simulation, however, this file is continuously re-written as a fail-safe - say the supercomputer crashes during your 10ns simulation, which has been running for 3 weeks. Well, the restart file is printed every ntwr steps so you could, as the name of the flag implies, simply restart the simulation (with minor modification to the input file and above runsander.csh script) from the last snapshot before the crash.
  
 
=ptraj - Analyzing Your Data=
 
=ptraj - Analyzing Your Data=
Line 193: Line 420:
 
*Change '''filename.parm''' to '''1df8.com.gas.leap.parm'''
 
*Change '''filename.parm''' to '''1df8.com.gas.leap.parm'''
  
'''#!/bin/csh''' -> will be in nearly all of the programs '''''you''''' will write - unless you dabble in Cpp or G77. It tells the shell to treat the contents of this here file as if the contents were being typed in the shell by hand.
+
'''#!/bin/csh''' -> will be in nearly all of the programs '''''you''''' will write - bash and tcsh are other [http://c2.com/cgi/wiki?ScriptingLanguage scripting languages]. It tells the shell to treat the contents of this here file as if the contents were being typed in the shell by hand.
  
 
'''ptraj''' -> has been ''aliased'' in your .cshrc file and will initialize ptraj once read by the machine.
 
'''ptraj''' -> has been ''aliased'' in your .cshrc file and will initialize ptraj once read by the machine.
Line 208: Line 435:
 
  Herbie:~> csh RUNTRAJ
 
  Herbie:~> csh RUNTRAJ
  
===1.)Combine Production Trajectories while Stripping the Water Molecules===
+
===1. Combine Production Trajectories while Stripping the Water Molecules===
  
 
'''ptraj.1.in'''
 
'''ptraj.1.in'''
Line 238: Line 465:
 
  strip :WAT
 
  strip :WAT
  
===2.)RMSD===
+
===2. RMSD===
 
RMSD - root mean-square distance - can be used to measure the distance an object moves relative to a '''reference''' object.
 
RMSD - root mean-square distance - can be used to measure the distance an object moves relative to a '''reference''' object.
 
For example, one could use an RMSD analysis to measure the movement of the alpha-carbon atoms in the active site of a protein, using the experimental structure as the reference structure (ptraj will measure the RMSD between each object specified in the ptraj script - see below) where ptraj will by default fit the two structures, aligning them as much as possible. '''nofit''' is used to turn this function off.
 
For example, one could use an RMSD analysis to measure the movement of the alpha-carbon atoms in the active site of a protein, using the experimental structure as the reference structure (ptraj will measure the RMSD between each object specified in the ptraj script - see below) where ptraj will by default fit the two structures, aligning them as much as possible. '''nofit''' is used to turn this function off.
Line 263: Line 490:
 
  rms '''reference''' out 1df8.rmsd.CA.txt :1-118@CA
 
  rms '''reference''' out 1df8.rmsd.CA.txt :1-118@CA
  
===4.)Keep Only Streptavidin from 1df8.com.trj.stripfit===
+
===4. Keep Only Streptavidin from 1df8.com.trj.stripfit===
 
'''ptraj.4.in'''
 
'''ptraj.4.in'''
 
  trajin 1df8.com.trj.stripfit 1 2000 1
 
  trajin 1df8.com.trj.stripfit 1 2000 1
Line 270: Line 497:
 
*We've just stripped residue 119 (Biotin) from the 1df8.com.trj.stripfit file, which we've previously stripped of water
 
*We've just stripped residue 119 (Biotin) from the 1df8.com.trj.stripfit file, which we've previously stripped of water
  
===5.)Keep Only Biotin from 1df8.com.trj.stripfit===
+
===5. Keep Only Biotin from 1df8.com.trj.stripfit===
 
'''ptraj.5.in'''
 
'''ptraj.5.in'''
 
  trajin 1df8.com.trj.stripfit 1 2000 1
 
  trajin 1df8.com.trj.stripfit 1 2000 1
Line 330: Line 557:
 
Then use "csh" command to execute the file analy.1.in in your AMBER_tutorial directory.
 
Then use "csh" command to execute the file analy.1.in in your AMBER_tutorial directory.
  
===Data extraction and calculation===
+
=== Hydrogen bond distance ===
When MD finised, you will find "gb.rescore.out.com", "gb.rescore.out.lig", "gb.rescore.out.rec" these three outputs. Use the following script for extracting data.
+
Before using ptraj to measure the H-bond distance, it's better to know which atoms we want to put in the ptraj script.
 +
VMD can automatically draws these H-bonds. First, select the ligand and residues around the ligand. Second, change the '''Drawing Method''' to '''HBonds'''.
  
  #! /bin/bash
+
After we know the atom name, we use '''distance''' command to measure the distance between two atoms.
echo com lig rec > namelist
+
Here is the sample script
+
  #!/bin/csh
LIST=`cat namelist`
+
  ptraj ../002.TLEAP/1df8.com.gas.leap.parm << EOF
+
  trajin ../004.PTRAJ/1df8.com.trj.stripfit
for i in $LIST ; do
+
  distance 34N_119O2 :34@N :119@O2 out 34N_119O2.out
+
  distance 73OG_119O3 :73@OG :119@O3 out 73OG_119O3.out
  grep VDWAALS gb.rescore.out.$i | awk '{print $3}' > $i.vdw
+
  distance 12OG_119O14 :12@OG :119@O14 out 12OG_119O14.out
grep VDWAALS gb.rescore.out.$i | awk '{print $9}' > $i.polar
+
  distance 28OH_119O14 :28@OH :119@O14 out 28OH_119O14.out
  grep ESURF gb.rescore.out.$i | awk '{print $3 * 0.00542 + 0.92}' > $i.surf
+
  EOF
  grep RESTRAINT gb.rescore.out.$i | awk '{print $8}' > $i.coul
 
   
 
paste -d " " $i.vdw $i.polar $i.surf $i.coul | awk '{print $1 + $2 + $3 + $4}' > data.$i
 
   
 
rm $i.*
 
   
 
done
 
 
paste -d " " data.com data.lig data.rec | awk '{print $1 - $2 -$3}' > data.all
 
 
rm namelist
 
 
   
 
  
Now you have three data sheet: data.com, data.lig and data.rec. Copy them to excel and do the calculation.
+
Plot the result. Here is the sample. It will be a good way to compare the H-bond distance with the binding free energy from MM-GBSA.
Result may look like [http://ringo.ams.sunysb.edu/img_auth.php/f/fb/MMGBSA_%CE%94%CE%94G.txt this].
 
  
 +
[[Image:111.png |center|1000px]]
  
==MMGBSA==
+
=MMGBSA=
 
Create a new directory 005.MMGBSA To do an analysis we need three runs for the complex, ligand and receptor.
 
Create a new directory 005.MMGBSA To do an analysis we need three runs for the complex, ligand and receptor.
 
Now write an input File:
 
Now write an input File:
Line 371: Line 586:
 
   igb=5, saltcon=0.0,
 
   igb=5, saltcon=0.0,
 
   gbsa=2, surften=1.0
 
   gbsa=2, surften=1.0
   offset=0.09, extdiel=1.0,
+
   offset=0.09, extdiel=78.5,
 
   cut=99999.0, nsnb=99999,
 
   cut=99999.0, nsnb=99999,
 
   scnb=2.0, scee=1.2,
 
   scnb=2.0, scee=1.2,
Line 397: Line 612:
 
  #PBS -e zzz.mmgbsa.1.err
 
  #PBS -e zzz.mmgbsa.1.err
 
  #PBS -V
 
  #PBS -V
 
+
 
  set workdir = "${HOME}/AMBER_Tutorial/005.MMGBSA"
 
  set workdir = "${HOME}/AMBER_Tutorial/005.MMGBSA"
 
  cd ${workdir}  
 
  cd ${workdir}  
 
+
 
  sander -O \
 
  sander -O \
 
  -i gb.rescore.in \
 
  -i gb.rescore.in \
Line 411: Line 626:
 
  -x mdcrd.com \
 
  -x mdcrd.com \
 
  -inf mdinfo.com \
 
  -inf mdinfo.com \
 
+
 
  sander -O \
 
  sander -O \
 
  -i gb.rescore.in \
 
  -i gb.rescore.in \
Line 422: Line 637:
 
  -x mdcrd.lig \
 
  -x mdcrd.lig \
 
  -inf mdinfo.lig
 
  -inf mdinfo.lig
 
+
 
  sander -O \
 
  sander -O \
 
  -i gb.rescore.in \
 
  -i gb.rescore.in \
Line 433: Line 648:
 
  -x mdcrd.rec \
 
  -x mdcrd.rec \
 
  -inf mdinfo.rec
 
  -inf mdinfo.rec
 
+
 
  exit
 
  exit
  
Line 446: Line 661:
 
'''Monitor your job'''
 
'''Monitor your job'''
 
:~> qstat -u YourUserName
 
:~> qstat -u YourUserName
 +
 +
===Data extraction and calculation===
 +
When MD finised, you will find "gb.rescore.out.com", "gb.rescore.out.lig", "gb.rescore.out.rec" these three outputs. Use the following script to extract data of MMGBSA.
 +
 +
#! /bin/bash
 +
# by Haoquan
 +
echo com lig rec > namelist
 +
 +
LIST=`cat namelist`
 +
 +
for i in $LIST ; do
 +
 +
grep VDWAALS gb.rescore.out.$i | awk '{print $3}' > $i.vdw
 +
grep VDWAALS gb.rescore.out.$i | awk '{print $9}' > $i.polar
 +
grep VDWAALS gb.rescore.out.$i | awk '{print $6}' > $i.coul
 +
grep ESURF  gb.rescore.out.$i | awk '{print $3 * 0.00542 + 0.92}' > $i.surf
 +
 +
 +
paste -d " " $i.vdw $i.polar $i.surf $i.coul | awk '{print $1 + $2 + $3 + $4}' > data.$i
 +
 +
rm $i.*
 +
 +
done
 +
 +
paste -d " " data.com data.lig data.rec | awk '{print $1 - $2 - $3}' > data.all
 +
 +
for ((j=1; j<=`wc -l data.all | awk '{print $1}'`; j+=1)) do
 +
echo $j , >> time
 +
done
 +
 +
paste -d " " time data.all > MMGBSA_vs_time
 +
 +
rm namelist time data.*
 +
 +
 +
 +
Run this script (it takes about 5 seconds) Now you have the final data sheet: MMGBSA_vs_time. Copy them to excel or Origin and separate two columns by comma.
 +
Result may look like [http://ringo.ams.sunysb.edu/img_auth.php/d/d7/Data.MMGBSA_vs_time this sheet]and [http://ringo.ams.sunysb.edu/index.php/Image:Graph1.jpg this graph].

Latest revision as of 09:22, 28 February 2012

For additional Rizzo Lab tutorials see AMBER Tutorials.

In this tutorial, we will learn how to run a molecular dynamics simulation of a protein-ligand complex. We will then post-process that simulation by calculating structural fluctuations (with RMSD) and free energies of binding (MM-GBSA).

Biotin Notes

Biotin is also called vitamin H. And it takes part in multiple processes inside the cell. It's a B-complex vitamin (coenzyme) that's involved in gluconeogenesis, citric acid cycle, and various carboxylation reactions.

Streptavidin Notes

Download PDB Here and view it's details Here. Streptavidin has an incredibly strong affinity for biotin; the dissociation constant for the streptavidin-biotin complex is on the order of femtomolar.


.... ....


What is AMBER?

Amber - Assisted Model Building with Energy Refinement - is a suite of about 50 programs that can be used to simulate, study and analyze macromolecular systems such as proteins dissolved in water at physiological conditions. Amber10, the current version (Amber11 soon to be released) of Amber, is extremely advanced, powerful and fast. PMEMD, particle mesh Ewald MD (boundary condition treatment / parallelized code) can churn out 314 ps/day of data for the system dihydrofolate reductase (159 residue protein) in TIP3P water (23,558 total atoms). However, because PMEMD lacks the ability to restrain the atoms we need properly, we will be using SANDER to perform most of our simulations.

AMBER Notes

The Amber 10 Manual is the primary resource when trying to learn what variables and keywords mean and what they do. Using Adobe Acrobat to view the file, you can simply search the document for keywords, which saves much time.

Keywords for preparatory programs:

LEaP: creates or modifies systems in Amber. It consists of the functions of prep, link, edit, and parm.

ANTECHAMBER: the main Antechamber suite program that helps prepare input files for nucleic acids and proteins for LEaP.


Keywords for simulating programs:

SANDER: according to the Amber 10 manual, it is 'a basic energy minimizer and molecular dynamics program. This program relaxes the structure by iteratively moving the atoms down the energy gradient until a sufficiently low average gradient is obtained. The molecular dynamics portion generates configurations of the system by integrating Newtonian equations of motion. MD will sample more configurational space than minimization, and will allow the structure to cross over small potential energy barriers. Configurations may be saved at regular intervals during the simulation for later analysis, and basic free energy calculations using thermodynamic integration may be performed. More elaborate conformational searching and modeling MD studies can also be carried out using the SANDER module. This allows a variety of constraints to be added to the basic force field, and has been designed especially for the types of calculations involved in NMR structure refinement'.

PMEMD: verison of SANDER that allows parallel scaling and optimized speed.


There is a mailing list you could sign-up for, as an additional resource.

Unix tips

This is specific to our cluster (seawulf) and desktop (mathlab) environments. See Activating your Seawulf Account for details of sw short cut.

Download Files from SeaWulf to Herbie:

ssh compute.mathlab.sunysb.edu

Login in to Herbie

mkdir sw_dir

make a directory "sw_dir" for which to download files and be organized

cd to sw_dir so when you scp files or directories back to Herbie, it copies them to a specific directory - "sw_dir"

cd sw_dir
scp -r sw:/location_of_files_or_directory/ .

Safely Copy, Recursively, /location_of_files_or_directory/

run.sander.MPI.csh

include these lines before mpirun command to know which nodes mpi is running on

echo "Queue is giving this nodes:"
cat \$PBS_NODEFILE
echo "MPI is running on:"
mpirun -n 8 hostname

Structure Preparation

To begin with, create the directories in seawulf you will work in, using the commands here:

mkdir AMBER_Tutorial
cd AMBER_Tutorial
mkdir 001.CHIMERA.MOL.PREP  
mkdir 002.TLEAP  
mkdir 003.SANDER 
mkdir 004.ptraj

Copy the commands above to your terminal and hit enter one at a time.

Open Chimera, choose File - Fetch by ID, then type in "1df8". Now you will see your protein and ligand in Chimera.

1. It is a dimer, but you need only a monomer.

  • Click Select - chain - B, you would see chain B is highlighted.
  • Then click Action - Atoms/Bonds - delete.

Now only a receptor, a ligand and several water molecules are left.

2. Now you need to separate the ligand and receptor.

  • First, Select - residue - HOH, then delete it.
  • File - Save PDB, save this pdb as "1df8.rec_lig.pdb", then Select - residue - BTN, delete it. Save PDB as "1df8.rec.noh.pdb."
  • Second, open the 1df8.rec_lig.pdb, select the receptor and delete it (Tips: you can invert your selection.)
  • Then Tools - structure editing - Add H, press OK.
  • Then Tools - structure editing - Add Charge, press OK. select AM1-BCC charge model.
  • File - Save mol2, save it as "1df8.lig.mol2".
  • Open up ligand mol2 file with your favorite text editor (i.e vim). Manually replace the 2nd columns with unique names. Leap only uses the first 3 characters as the name.

old.mol2:

     1 C11        32.5640   18.1390   14.0710 C.2       1 BTN1        0.9079
     2 O11        33.4260   17.4630   13.4730 O.co2     1 BTN1       -0.8596
     3 O12        32.5320   19.3920   13.9660 O.co2     1 BTN1       -0.8472
     4 C10        31.5990   17.4500   14.9620 C.3       1 BTN1       -0.1966
     5 C9         31.2260   16.0460   14.5600 C.3       1 BTN1       -0.0459
     6 C8         30.5160   16.0360   13.2000 C.3       1 BTN1       -0.0920
     7 C7         30.0160   14.6260   12.8220 C.3       1 BTN1       -0.0683
     8 C2         29.2080   14.5510   11.5450 C.3       1 BTN1       -0.0028
     9 S1         27.5110   15.2280   11.7030 S.3       1 BTN1       -0.2811
    10 C6         27.1670   14.6500   10.0230 C.3       1 BTN1       -0.0520
    11 C5         27.7360   13.2480    9.9740 C.3       1 BTN1        0.0662
    12 N1         26.8850   12.1810   10.4970 N.pl3     1 BTN1       -0.4731

new.mol2

     1 C1         32.5640   18.1390   14.0710 C.2       1 BTN1        0.9079
     2 O2         33.4260   17.4630   13.4730 O.co2     1 BTN1       -0.8596
     3 O3         32.5320   19.3920   13.9660 O.co2     1 BTN1       -0.8472
     4 C4         31.5990   17.4500   14.9620 C.3       1 BTN1       -0.1966
     5 C5         31.2260   16.0460   14.5600 C.3       1 BTN1       -0.0459
     6 C6         30.5160   16.0360   13.2000 C.3       1 BTN1       -0.0920
     7 C7         30.0160   14.6260   12.8220 C.3       1 BTN1       -0.0683
     8 C8         29.2080   14.5510   11.5450 C.3       1 BTN1       -0.0028
     9 S9         27.5110   15.2280   11.7030 S.3       1 BTN1       -0.2811
    10 C10        27.1670   14.6500   10.0230 C.3       1 BTN1       -0.0520
    11 C11        27.7360   13.2480    9.9740 C.3       1 BTN1        0.0662
    12 N12        26.8850   12.1810   10.4970 N.pl3     1 BTN1       -0.4731

Note that this mol2 file would cause errors later in leap, because of its numbering. You need to modify it manually. Simply you can copy this file from "/nfs/user03/pholden/AMBER_Tutorial/001.CHIMERA.MOL.PREP/1df8.lig.mol2" on seawulf, and compare it with yours to see what should be modified. Also, place it in your "001.CHIMERA.MOL.PREP" directory.

3. Now you have a receptor pdb file and a ligand mol2, place them in your "001.CHIMERA.MOL.PREP" directory by performing the following:

  • put files on a machine that is a portal to seawulf (i.e. silver, herbie or ringo). If user is on a MATHLAB machine then files are accessible from silver or herbie.
  • scp files sw:path
  • Note that is only applies to AMS536 (or Rizzo lab rotation) students and is specific for our computer setup.

Note that there are four residues (22, 27, 40, and 57) that have two possible conformers, when tleap builds your parm and crd files only the first occurrence of each atom is kept. See section below and leap output.

TLEAP

Class please add in a Tleap section here.

Class I've gotten a section started please improve this section. Thanks.

describe our protocol and give example input files.

First, create directory called ~/AMBER_Tutorial/002.TLEAP and go into that dir:

mkdir ~/AMBER_Tutorial/002.TLEAP
cd ~/AMBER_Tutorial/002.TLEAP

Next we will write the following files that you will need to input files for LEAP in order to create your parameter/topology files and your coordinate files

cat << EOF > tleap.lig.in
set default PBradii mbondi2
source leaprc.ff99SB
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
loadoff rizzo_amber7.ionparms/ions.lib
loadamberparams rizzo_amber7.ionparms/ions.frcmod
source leaprc.gaff
loadamberparams rizzo_amber7.ionparms/gaff.dat.rizzo
loadamberparams 1df8.lig.ante.frcmod
loadamberprep 1df8.lig.ante.prep
lig = loadpdb 1df8.lig.ante.pdb
saveamberparm lig 1df8.lig.gas.leap.parm 1df8.lig.gas.leap.crd
solvateBox lig TIP3PBOX 10.0
saveamberparm lig 1df8.lig.wat.leap.parm 1df8.lig.wat.leap.crd
charge lig
quit
EOF
cat << EOF > tleap.rec.in
set default PBradii mbondi2
source leaprc.ff99SB
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
loadoff rizzo_amber7.ionparms/ions.lib
loadamberparams rizzo_amber7.ionparms/ions.frcmod
REC = loadpdb ../001.CHIMERA.MOL.PREP/1df8.rec.noh.pdb

saveamberparm REC 1df8.rec.gas.leap.parm 1df8.rec.gas.leap.crd
solvateBox REC TIP3PBOX 10.0
saveamberparm REC 1df8.rec.wat.leap.parm 1df8.rec.wat.leap.crd
charge REC
quit
EOF
cat << EOF > tleap.com.in
set default PBradii mbondi2
source leaprc.ff99SB
loadamberparams rizzo_amber7.ionparms/parm.e16.dat
loadoff rizzo_amber7.ionparms/ions.lib
loadamberparams rizzo_amber7.ionparms/ions.frcmod
source leaprc.gaff
loadamberparams rizzo_amber7.ionparms/gaff.dat.rizzo
REC = loadpdb ../001.CHIMERA.MOL.PREP/1df8.rec.noh.pdb

loadamberparams 1df8.lig.ante.frcmod
loadamberprep 1df8.lig.ante.prep
LIG = loadpdb 1df8.lig.ante.pdb
COM = combine {REC LIG}
saveamberparm COM 1df8.com.gas.leap.parm 1df8.com.gas.leap.crd
solvateBox COM TIP3PBOX 10.0
saveamberparm COM 1df8.com.wat.leap.parm 1df8.com.wat.leap.crd
charge COM
quit
EOF

Note the rizzo_amber7.ionparms/*.dat are modified parameter files.


The following is an example csh script used to generate the necessary files for running molecular Dynamics.

cat << EOF > run.TLEAP.csh
#! /bin/csh

set workdir = "~/AMBER_Tutorial/002.TLEAP"
cd $workdir

rm 1df8.* ANTECHAMBER* *.out ATOMTYPE.INF leap.log NEWPDB.PDB PREP.INF

antechamber -i ../001.CHIMERA.MOL.PREP/1df8.lig.mol2 -fi mol2  -o 1df8.lig.ante.pdb  -fo pdb
antechamber -i ../001.CHIMERA.MOL.PREP/1df8.lig.mol2 -fi mol2  -o 1df8.lig.ante.prep -fo prepi
parmchk -i 1df8.lig.ante.prep -f  prepi -o 1df8.lig.ante.frcmod
tleap -s -f tleap.lig.in > tleap.lig.out
tleap -s -f tleap.rec.in > tleap.rec.out
tleap -s -f tleap.com.in > tleap.com.out

ambpdb -p   1df8.com.gas.leap.parm -tit "pdb" <1df8.com.gas.leap.crd > 1df8.com.gas.leap.pdb

exit
EOF

To run this script issue the following command:

csh run.TLEAP.csh

To parametrize your ligands the following is done.

antechamber is run first to create a prepi file, which contains the internal coordinates of the ligand. Then parmchk is run to create a frcmod file, which contains additional parameters not in GAFF and need by tleap. This files are need be for running leap. You can see if you look in the tleap.com.in and tleap.lig.in the frcmod is loaded to insure that the ligand is parametrized fully

It is a good idea to open all these files with your favorite text editor (eg. vim).


To insure that TLEAP ran correctly and generated all nesesary files issue the following:

ls

You should have all of the following files:

1df8.com.gas.leap.crd   1df8.lig.wat.leap.parm     ATOMTYPE.INF
1df8.com.gas.leap.parm  1df8.rec.gas.leap.crd      leap.log
1df8.com.gas.leap.pdb   1df8.rec.gas.leap.parm     NEWPDB.PDB
1df8.com.wat.leap.crd   1df8.rec.wat.leap.crd      PREP.INF
1df8.com.wat.leap.parm  1df8.rec.wat.leap.parm     run.tleap.csh
1df8.lig.ante.frcmod    ANTECHAMBER_AC.AC          tleap.com.in
1df8.lig.ante.pdb       ANTECHAMBER_AC.AC0         tleap.com.out
1df8.lig.ante.prep      ANTECHAMBER_BOND_TYPE.AC   tleap.lig.in
1df8.lig.gas.leap.crd   ANTECHAMBER_BOND_TYPE.AC0  tleap.lig.out
1df8.lig.gas.leap.parm  ANTECHAMBER_PREP.AC        tleap.rec.in
1df8.lig.wat.leap.crd   ANTECHAMBER_PREP.AC0       tleap.rec.out

now do csh to create the coordinate and parm files

Minimization and equilibration

In order to adjust our structures to the force field and remove any model building artifacts, we first perform a several-step equilibration protocol. Several iterations of minimization and molecular dynamics will be preformed with decreasing restraints.


The first step: Relaxing the experimental or silico structure

01mi.in: equilibration
&cntrl
  imin = 1, maxcyc = 1000, ntmin = 2,
  ntx = 1, ntc = 1, ntf = 1,
  ntb = 1, ntp = 0,
  ntwx = 1000, ntwe = 0, ntpr = 1000,
  scee = 1.2, cut = 8.0,
  ntr = 1,
  restraintmask = ':1-119 & !@H=',            
  restraint_wt=5.0,
/

The MD run should be set up:

cat << EOF > 10md.in
10md.in: production (500000 = 1ns)
 &cntrl
   imin = 0, ntx = 5, irest = 1, nstlim = 500000,
   temp0 = 298.15, tempi = 298.15, ig = 71287,
   ntc = 2, ntf = 1, ntt = 1, dt = 0.002,
   ntb = 2, ntp = 1, tautp = 1.0, taup = 1.0,
   ntwx = 500, ntwe = 0, ntwr = 500, ntpr = 500,
   scee = 1.2, cut = 8.0, iwrap = 1,
   ntr = 1, nscm = 100,
   restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1,
 /


  • &cntrl -> Tells SANDER that what follows are control variables.
  • imin=1 -> Perform Minimization
  • maxcyc=1000 -> Perform 1000 Minimization Steps
  • ntmin=2 -> Steepest Descent Method of Minimization
  • ntx=1 -> Initial Coordinates Lack Velocity - it's a restart file (See VMD)
  • ntc=1 -> "SHAKE" Posititional Restraints OFF (Default)
  • ntf=1 -> Calculate All types of Forces (bonds, angles, dihedrals, non-bonded)
  • ntb=1 -> Constant Volume Boundary Periodicity
  • ntp=0 -> No Pressure Regulation
  • ntwx=1000 -> Print Coordinates Frequency
  • ntwe=0 -> Print Energy to "mden" Frequency
  • ntpr=1000 -> Print Readable Energy Information to "mdout" and "mdinfo"
  • scee' -> 1-4 Coulombic Forces are Divided (Default=1.2)
  • cut=8.0 -> Coulombic Force Cutoff distance in Angstroms
  • restraintmask = ':1-119 & !@H=', -> restraint the residues matching the mask':1-119 & !@H='. Here, we're restraining residues 1 through 119 and everything that isn't hydrogen. Essentially, onlt Hydrogen atoms move free of restraint.
  • restraint_wt=5.0 is the Force Constant assigned to the restrained atoms. Each atom "sits" in a potential-energy well characterized by a "5.0" kcal/mol wall.
  • / is used to the machine to stop the job when it's done.

The following are our exacted equilibration and production protocols.

equilibration:
01mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 5.0,
02md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 5.0, dt = 0.001,
03mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 2.0,
04mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.1,
05mi.in: maxcyc = 1000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.05,
06md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 1.0, dt = 0.001,
07md.in: nstlim = 50000, restraintmask = ':1-119 & !@H=', restraint_wt = 0.5, dt = 0.001,
08md.in: nstlim = 50000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.001,
09md.in: nstlim = 50000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.001,
production:
10md.in: nstlim = 500000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.002,
11md.in: nstlim = 500000, restraintmask = ':1-118@CA,C,N', restraint_wt = 0.1, dt = 0.002,

Note that the other parameters not specified are the same as those in the above two files for minimization and molecular dynamics, respectively.


When your simulations have finished, you ought to check the stability and realism of results. Use the script E_asis to analyze the the mdout files. This ought to also be used to check the validity and stability of your production runs.

Download E_asis onto your local machine (the one you're using right now). Once saved onto local machine, transfer it to you working directory on Herbie, Seawulf, etc. Follow usual protocols to do this. This script will extract the energy, temperature, pressure and volume (and averages thereof) from the mdout file. To execute, do the following (it may be a good idea to make a separate directory just for this analysis, as many files are created):

chmod +x E_asis


./E_asis Filename.out

Filenames, as per this tutorial ought to be 01min, 02md, 03md, etc. To analyze the whole equilibration experiment (i.e. 01min to 09md), the following may work. Please check the results to be sure it worked properly. There are various ways to coordinate the analysis of these output files..

./E_asis *.out

Production Simulation

(Class please describe your production run simulations...)

A Production simulation is the ultimate simulation to be performed. Once the structure is built (See tLeap section), minimized and equilibrated (See Minimization and equilibration section) and only essential restraints retained, production dynamics produce the data used to answer the scientific question at hand. The previous steps were just preparatory.

The script used to instruct the supercomputer how to perform the dynamics:

runsander.csh

#! bin/csh
set workdir = "/nfs/user03/Your User Name/AMBER_Tutorial/003.SANDER"
cd ${workdir}
set coor = "../002.TLEAP/1df8.com.wat.leap.crd"
set topo = "../002.TLEAP/1df8.com.wat.leap.parm"
cat << EOFQSUB > temp.qsub.csh
#! /bin/tcsh
#PBS -l nodes=4:ppn=2
#PBS -l walltime=720:00:00
#PBS -o zzz.qsub.out
#PBS -e zzz.qsub.err
#PBS -V
cd ${workdir}
cat \$PBS_NODEFILE | sort | uniq
cat \$PBS_NODEFILE | sort | uniq > mpd.nodes
#I have Deleted The cat'ing of input files that appeared in "run.sander.MPI.csh"
#I have Deleted the "mpiexec" aka "mpirun" for jobs 1 through 9
mpirun -n 8 sander.MPI -O -i 10md.in -o 10md.out -p ${topo} -c 09md.rst -ref 05mi.rst -x 10md.trj -inf 10md.info -r 10md.rst
mpirun -n 8 sander.MPI -O -i 11md.in -o 11md.out -p ${topo} -c 10md.rst -ref 05mi.rst -x 11md.trj -inf 11md.info -r 11md.rst
EOFQSUB
chmod +x temp.qsub.csh
qsub temp.qsub.csh

The important instructions are the 18th and 19th lines - mpirun.

mpirun - Instructs SeaWulf (SW) to use an mpi regime.

-n - Denotes number of processors (Here 8)

sander.MPI - Is the actual MPI code to be run. This is the heart of the whole course. The text after this and on this line will be instructions for sander.

-O Overwrite previous output files with the new ones (to be produced by this job). This is useful in that if you have to run this script more than once, then something must have gone wrong. Thus, this will simply overwrite the files that are erroneous anyways. Don't use this if you really know what your doing and have a reasonable reason for doing so.

-i Points sander to the input file (10md.in then 11md.in).

-o Instructs sander to write the output file (10md.out then 11md.out), which contains the energies written by sander during dynamics. The various variables within the input file will determine the nature of the output: How frequently are the energies printed? How detailed is the information? What types of energies are being printed?

-p Points sander to the topology (aka parameter file). See Amber10 manual if you don't know what this is.

-c Points sander to the coordinates (corresponding to the topology file from above) for which you want the production dynamics to begin from. In this case, we're starting with the final (snapshot - single frame) frame from the previous (equilibration) run, 9md.rst. The .rst at the end of a file means "restart". "restart" files are single frames. They can contain subtle information in addition to simply the number of atoms in the file (first line) and the x, y and z coordinates for each atom (which is why you need the topology file to give those x,y,z's meaning - what atom has what coordinates) so check the Amber10 manual if you're curious.

-ref Specifies for sander which file you would like the restraints (if not using restraints, then -ref is not used) to be centered on. Asking Professor Rizzo to explain this flag is a good idea, after reading the Amber10 manual (don't waste his time), of course, if it is critical to your work. If not: the reference is a snapshot-set of coordinates which you have carefully chosen to represent the "ideal" structure you would like the dynamics (sander) to use as a "reference"; using the restrain_wt option in the input file, you're telling sander how much like the reference structure you want your dynamical system to be. The restraint_wt option acts like a spring, where harmonically the reference coordinates and coordinates of the dynamical system are "psuedo-connected." Thus, it is imperative that the reference structure be realistic (say the coordinates of the crystal structure) and be the same system (i.e. same number of atoms).

-x Instructs sander to print the trajectory (position of each atom along with its velocity) every ntwx steps. This is the "Big" file. When your simulation is complete, zip the trajectories:

gzip 10md.trj

gzip 11md.trj

The trajectories are, in my opinion, the most important component of a MD experiment. So, read the Amber10 Manual. You could notice that ptraj uses the trajectory files to perform the bulk of the data analysis like RMSD, H-bonding evolution, radius of gyration, pi-stacking, etc., etc.

-info Gives the results of the interactions between the supercomputer and sander: How did the calculation proceed? Did everything work properly? Did the simulation run to completion? This flag can be useful in debugging failed jobs....hk...

-r Instructs sander to write a restart file. The frequency this is done is specified is ntwr in the 10md.in / 11md.in files. Usually ntwr=500. A restart file will be written at the end of the simulation - the final snapshot of the simulation will be the restart file if and when the simulation has run successfully to completion. During the simulation, however, this file is continuously re-written as a fail-safe - say the supercomputer crashes during your 10ns simulation, which has been running for 3 weeks. Well, the restart file is printed every ntwr steps so you could, as the name of the flag implies, simply restart the simulation (with minor modification to the input file and above runsander.csh script) from the last snapshot before the crash.

ptraj - Analyzing Your Data

ptraj is an analysis program included in the AMBER suite (AMBERtools) designed in part by Dr. Thomas Cheatham. See this website.

This page contains a brief list of ptraj functions and their syntax. Commands can be combined with most combinations of other functions to suit the need.

A useful and recommended program - merely a text file with functional syntax - to write is:

RUNTRAJ

#!/bin/csh
ptraj <filename.parm> <ptraj.1.in> > <ptraj.1.out>
exit

(When writing the above, one depressed 'Enter' on the keyboard, which is 'recorded' by vim. So, when the file is executed, it would be like hitting 'Enter' if you were entering the commands by hand in the shell.

  • Change filename.parm to 1df8.com.gas.leap.parm

#!/bin/csh -> will be in nearly all of the programs you will write - bash and tcsh are other scripting languages. It tells the shell to treat the contents of this here file as if the contents were being typed in the shell by hand.

ptraj -> has been aliased in your .cshrc file and will initialize ptraj once read by the machine.

filename.parm -> is the .parm file you would like to specify.

ptraj_input_filename.1.in -> is the set of instruction you want ptraj to read and perform, in an input file (This would be "ptraj.concatenate.strip.trj" in the coming examples).

exit -> will exit from ptraj when the ptraj_input_filename.1.in has completed its instruction(s).


Executing it:

Herbie:~> csh RUNTRAJ

1. Combine Production Trajectories while Stripping the Water Molecules

ptraj.1.in

 trajin ../003.SANDER/10md.trj 1 1000 1
  • trajin -> tells ptraj to "read-in" the file which comes after it
  • ../003.SANDER/10md.trj -> is the file to be "read-in"
  • 1 1000 1 -> tells ptraj to use the first to the 1000th snapshot of the trajectory. The third number, "1", is telling ptraj to read-in every frame. If this last number were "2", then ptraj would read-in every-other snapshot, "10" would be every 10th snapshot and so on.
trajin ../003.SANDER/11md.trj 1 1000 1
  • This will do the exact same as the first trajin cmd (command), except now we're analyzing a different trajectory - 11md.trj.
trajout 1df8.trj.strip nobox
  • trajout -> tells ptraj to write a new trajectory file, combining the two trajectories - 10md.trj and 11md.trj - from trajin.
  • 1df8.trj.strip -> is the name of the new trajectory to be made by trajout.
  • nobox -> is essentially a house-keeping cmd, where the periodic box information will just be neglected. Unless using CHARMM files, this ought to not be an issue.
strip :WAT
  • strip -> instructs ptraj to disappear those objects named "WAT" ':WAT
  • So you're left with a file "ptraj.concatenate.strip.trj" with the following in it:
trajin ../003.SANDER/10md.trj 1 1000 1
trajin ../003.SANDER/11md.trj 1 1000 1
trajout 1df8.trj.strip nobox
strip :WAT

2. RMSD

RMSD - root mean-square distance - can be used to measure the distance an object moves relative to a reference object. For example, one could use an RMSD analysis to measure the movement of the alpha-carbon atoms in the active site of a protein, using the experimental structure as the reference structure (ptraj will measure the RMSD between each object specified in the ptraj script - see below) where ptraj will by default fit the two structures, aligning them as much as possible. nofit is used to turn this function off.

ptraj.2.in

 trajin 1df8.trj.strip 1 2000 1
 trajout 1df8.com.trj.stripfit
 reference 1df8.com.gas.leap.crd
  • reference -> tells ptraj that you want to specify a reference file - snapshot - for which to compare your trajectory (file with many snapshots) to.
  • 1df8.com.gas.leap.crd -> is the reference file. This file is very important and you ought to be thoughtful about your selection of this file. Usually, when possible, one wants to use the experimental structure as the reference. Referencing the experimental structure 'usually' provides the most informative results. But, if done thoughtfully, a non-experimental reference could be informative, too...
 rms reference out 1df8.rmsd.CA.txt :1-118@CA
  • rms -> tells ptraj you want to perform an rms analysis
  • reference -> tells traj to use the reference file, specified in the previous line
  • out -> tells ptraj to create a temporary file out for which to store calculations during the analysis
  • 1df8.rmsd.CA.txt -> is the name of the file with the RMSD analysis results. This is the file you will use with your plotting program..
  • :1-118@CA -> tells ptraj to analyze the RMSD of the alpha-carbon atoms CA residues 1-118.


So when you're done, you're left with:

trajin 1df8.trj.strip 1 2000 1
trajout 1df8.com.trj.stripfit
reference 1df8.com.gas.leap.crd
rms reference out 1df8.rmsd.CA.txt :1-118@CA

4. Keep Only Streptavidin from 1df8.com.trj.stripfit

ptraj.4.in

trajin 1df8.com.trj.stripfit 1 2000 1
trajout 1df8.rec.trj.stripfit
strip :119
  • We've just stripped residue 119 (Biotin) from the 1df8.com.trj.stripfit file, which we've previously stripped of water

5. Keep Only Biotin from 1df8.com.trj.stripfit

ptraj.5.in

trajin 1df8.com.trj.stripfit 1 2000 1
trajout df8.lig.trj.stripfit
strip :1-118
  • Strip everything, keeping only the protein, Streptavidin

Also, you can write a csh file to go through the procedure above. Make a file analy.1.csh in your AMBER_tutorial directory as follows:

#! /bin/tcsh

mkdir 004.PTRAJ
cd ./004.PTRAJ

cat << EOF > ptraj.1.in
trajin ../003.SANDER/10md.trj 1 1000 1
trajin ../003.SANDER/11md.trj 1 1000 1
trajout 1df8.trj.strip nobox
strip :WAT
EOF

ptraj ../002.TLEAP/1df8.com.wat.leap.parm ptraj.1.in >ptraj.1.log

cat << EOF > ptraj.2.in
trajin 1df8.trj.strip
trajout 1df8.com.trj.stripfit
reference ../002.TLEAP/1df8.com.gas.leap.crd
rms reference out 1df8.rmsd.CA.txt :1-118@CA
EOF

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.2.in >ptraj.2.log

cat << EOF > ptraj.3.in
trajin 1df8.com.trj.stripfit
reference ../002.TLEAP/1df8.com.gas.leap.crd
rms reference out 1df8.lig.rmsd.txt :119@C*,N*,O*,S* nofit
EOF 

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.3.in >ptraj.3.log 

cat << EOF > ptraj.4.in
trajin 1df8.com.trj.stripfit
trajout 1df8.rec.trj.stripfit
strip :119
EOF 

cat <<EOF > ptraj.5.in 
trajin 1df8.com.trj.stripfit
trajout 1df8.lig.trj.stripfit
strip :1-118
EOF

ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.4.in >ptraj.4.log
ptraj ../002.TLEAP/1df8.com.gas.leap.parm ptraj.5.in >ptraj.5.log  

cd .. 

Then use "csh" command to execute the file analy.1.in in your AMBER_tutorial directory.

Hydrogen bond distance

Before using ptraj to measure the H-bond distance, it's better to know which atoms we want to put in the ptraj script. VMD can automatically draws these H-bonds. First, select the ligand and residues around the ligand. Second, change the Drawing Method to HBonds.

After we know the atom name, we use distance command to measure the distance between two atoms. Here is the sample script

#!/bin/csh
ptraj ../002.TLEAP/1df8.com.gas.leap.parm << EOF
trajin ../004.PTRAJ/1df8.com.trj.stripfit
distance 34N_119O2 :34@N :119@O2 out 34N_119O2.out
distance 73OG_119O3 :73@OG :119@O3 out 73OG_119O3.out
distance 12OG_119O14 :12@OG :119@O14 out 12OG_119O14.out
distance 28OH_119O14 :28@OH :119@O14 out 28OH_119O14.out
EOF

Plot the result. Here is the sample. It will be a good way to compare the H-bond distance with the binding free energy from MM-GBSA.

111.png

MMGBSA

Create a new directory 005.MMGBSA To do an analysis we need three runs for the complex, ligand and receptor. Now write an input File: gb.rescore.in Single Point GB energy Calculation

&cntrl
 ntf=1, ntb=0, ntc=2,
 idecomp=0,
 igb=5, saltcon=0.0,
 gbsa=2, surften=1.0
 offset=0.09, extdiel=78.5,
 cut=99999.0, nsnb=99999,
 scnb=2.0, scee=1.2,
 imin=5, maxcyc=1, ncyc=0,
/
  • idecomp=0 -> Important, but is turned-off here. It is used for analysis.
  • igb=5 -> OBC Flavor of GB
  • gbsa=2 -> Generalized Born / Surface Area
    • 1 -> LCPA Surface Area Method
    • 2 -> Recursive Atom-Centered Method
  • surften=1 -> Calculate Solvation Free-Energy (non-polar contribution)
  • offset=0.09 -> Dielectric Scaling for GB
  • extdiel=1.0 -> Dielectric Constant for Solvent Exterior (Default 78.5)
  • nsnb=99999 -> Non-Bonded List Update Frequency (See igb=0, nbflag=0) (Default=25)
  • scnb=2.0 -> 1-4 van der Waals Division (default=2.0)


Write the Following Job Script: run.sander.rescore.csh

#!/bin/tcsh
#PBS -l nodes=1:ppn=2
#PBS -l walltime=24:00:00
#PBS -o zzz.mmgbsa.1.out
#PBS -e zzz.mmgbsa.1.err
#PBS -V

set workdir = "${HOME}/AMBER_Tutorial/005.MMGBSA"
cd ${workdir} 

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.com \
-p ../002.TLEAP/1df8.com.gas.leap.parm \
-c ../002.TLEAP/1df8.com.gas.leap.crd \
-y ../004.PTRAJ/1df8.com.trj.stripfit \
-r restrt.com \
-ref ../002.TLEAP/1df8.com.gas.leap.crd \
-x mdcrd.com \
-inf mdinfo.com \

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.lig \
-p ../002.TLEAP/1df8.lig.gas.leap.parm \
-c ../002.TLEAP/1df8.lig.gas.leap.crd \
-y ../004.PTRAJ/1df8.lig.trj.stripfit \
-r restrt.lig \
-ref ../002.TLEAP/1df8.lig.gas.leap.crd \
-x mdcrd.lig \
-inf mdinfo.lig

sander -O \
-i gb.rescore.in \
-o gb.rescore.out.rec \
-p ../002.TLEAP/1df8.rec.gas.leap.parm \
-c ../002.TLEAP/1df8.rec.gas.leap.crd \
-y ../004.PTRAJ/1df8.rec.trj.stripfit \
-r restrt.rec \
-ref ../002.TLEAP/1df8.rec.gas.leap.crd \
-x mdcrd.rec \
-inf mdinfo.rec

exit


Change "run.sander.rescore.csh" into an executable

:~> chmod +x run.sander.rescore.csh

Submit the job -> run.sander.rescore.csh

~> qsub run.sander.rescore.csh

Monitor your job

~> qstat -u YourUserName

Data extraction and calculation

When MD finised, you will find "gb.rescore.out.com", "gb.rescore.out.lig", "gb.rescore.out.rec" these three outputs. Use the following script to extract data of MMGBSA.

#! /bin/bash
# by Haoquan
echo com lig rec > namelist

LIST=`cat namelist`

for i in $LIST ; do

grep VDWAALS gb.rescore.out.$i | awk '{print $3}' > $i.vdw
grep VDWAALS gb.rescore.out.$i | awk '{print $9}' > $i.polar
grep VDWAALS gb.rescore.out.$i | awk '{print $6}' > $i.coul
grep ESURF   gb.rescore.out.$i | awk '{print $3 * 0.00542 + 0.92}' > $i.surf


paste -d " " $i.vdw $i.polar $i.surf $i.coul | awk '{print $1 + $2 + $3 + $4}' > data.$i

rm $i.*

done

paste -d " " data.com data.lig data.rec | awk '{print $1 - $2 - $3}' > data.all 

for ((j=1; j<=`wc -l data.all | awk '{print $1}'`; j+=1)) do
echo $j , >> time
done

paste -d " " time data.all > MMGBSA_vs_time

rm namelist time data.*


Run this script (it takes about 5 seconds) Now you have the final data sheet: MMGBSA_vs_time. Copy them to excel or Origin and separate two columns by comma. Result may look like this sheetand this graph.