2015 AMBER tutorial with PARP

From Rizzo_Lab
Revision as of 22:51, 14 May 2015 by Stonybrook (talk | contribs) (MM-GBSA Energy Calculation)

For additional Rizzo Lab tutorials see AMBER Tutorials.

In this tutorial, we will learn how to run a molecular dynamics simulation of a protein-ligand complex. We will then post-process that simulation by calculating structural fluctuations (with RMSD) and free energies of binding (MM-GBSA).

I. Introduction

AMBER

Amber (Assisted Model Building with Energy Refinement) is a multi-program suite for macromolecular simulations. At the time of writing, Amber14 is the most recent version of the software, and it includes new force fields such as ff14SB. In this release, more features from sander have been added to pmemd for both CPU and GPU platforms, including performance improvements and support for extra points, multi-dimensional replica exchange, a Monte Carlo barostat, ScaledMD, Jarzynski sampling, explicit solvent constant pH, GBSA, and hydrogen mass repartitioning. The release also adds support for the latest Kepler, Titan and GTX7xx GPUs, expanded options for Poisson-Boltzmann solvation calculations, accelerated molecular dynamics, additional features in the sander and pmemd code, and expanded methods for free energy calculations. Our lab is set up with Amber14 and the latest update of AmberTools14, which contains programs such as antechamber and tleap for setting up your simulation.

The Amber 14 Manual is a good place to get started with Amber14. If you view the file in Adobe Acrobat, you can search the document for keywords such as "tleap". The AmberTools Reference Manual is a further reference for the programs available in AmberTools.

Some of the programs available in Amber and AmberTools are listed below:

  1. LEaP: LEaP is an X-windows-based program that provides for basic model building and Amber coordinate and parameter/topology input file creation. It includes a molecular editor which allows for building residues and manipulating molecules.
  2. ANTECHAMBER: This program suite automates the process of developing force field descriptors for most organic molecules. It starts with structures (usually in PDB format), and generates files that can be read into LEaP for use in molecular modeling. The force field description that is generated is designed to be compatible with the usual Amber force fields for proteins and nucleic acids.
  3. SANDER: Sander is short for Simulated annealing with NMR-derived energy restraints. This allows for NMR refinement based on NOE-derived distance restraints, torsion angle restraints, and penalty functions based on chemical shifts and NOESY volumes. Sander is also the "main" program used for molecular dynamics simulations, and is also used for replica-exchange, thermodynamic integration, and potential of mean force (PMF) calculations. Sander also includes QM/MM capability.
  4. PMEMD: This is an extensively-modified version (originally by Bob Duke) of the sander program, optimized for periodic, PME simulations, and for GB simulations. It is faster than sander and scales better on parallel machines.
  5. PTRAJ and CPPTRAJ: These are used to analyze MD trajectories, computing a variety of things, like RMS deviation from a reference structure, hydrogen bonding analysis, time-correlation functions, diffusional behavior, and so on.
  6. MM_PBSA and MM_PBSA.py: These are scripts that automate post-processing of MD trajectories, to analyze energetics using continuum solvent ideas. They can be used to break energies into "pieces" arising from different residues, and to estimate free energy differences between conformational basins.

A mailing list is also available as an additional resource. You can write up your questions and send them to the list address, and Amber specialists will reply to your email and help you.

PARP

Poly (ADP-ribose) polymerase (PARP) is a family of proteins involved in a number of cellular processes, mainly DNA repair and programmed cell death. PARP is composed of four domains of interest: a DNA-binding domain, a caspase-cleaved domain, an auto-modification domain, and a catalytic domain. The DNA-binding domain is composed of two zinc finger motifs. The PARP family comprises 17 members (10 putative), which all have very different structures and functions in the cell. We will use PARP5b in this tutorial.

Organizing Directories

Organizing your files in a clean and logical way makes things easier. The following directory structure and naming scheme is a convenient way to do so. Create these directories before going any further.

~username/AMS536/Amber-Tutorial/001.system.prep/
                                002.tleap/ 
                                003.pmemd/       
                                004.ptraj/
                                005.mmgbsa/
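
The layout above can be created in one step. The snippet below is a minimal sketch: the function name make_tutorial_dirs is hypothetical (it is not part of AMBER), and the root directory is whatever path you pass in.

```shell
#!/bin/bash
# Hypothetical helper (not part of AMBER): create the tutorial directory
# layout under the root directory given as the first argument.
make_tutorial_dirs() {
    local root="$1"
    local d
    for d in 001.system.prep 002.tleap 003.pmemd 004.ptraj 005.mmgbsa; do
        mkdir -p "$root/$d"    # -p also creates missing parent directories
    done
}
```

For example, make_tutorial_dirs ~/AMS536/Amber-Tutorial would create all five subdirectories under your home directory.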

II. Structural Preparation

Antechamber, Parmchk, tLeap

Before beginning the molecular dynamics protocol using AMBER, you must first set up your files. Add the following files to your 001.system.prep folder:

 4TKG.lig.mol2
 4TKG.rec.noH.pdb
 4TKG.rec.noH.ZIN.pdb

To prepare the first two files, please see the 2015 DOCK tutorial [1].

Note: delete any headers before the atoms/helix information.

To prepare 4TKG.rec.noH.ZIN.pdb, first open 4TKG.rec.noH.pdb in a text editor such as vim and press "SHIFT + g" to jump to the bottom of the file. The last lines of your file should look like this:

 ATOM   1648  OE1 GLU B 205      23.839 -23.190  57.747  1.00  0.00           O
 TER    1649      GLU B 205
 HETATM 1650 ZN   ZN  B 206      28.130  -3.467  55.482  1.00  0.00          Zn
 END

To make 4TKG.rec.noH.ZIN.pdb, change the "ZN" atom and residue names to "ZIN" and the record type from HETATM to ATOM, so that AMBER can read the atom type:

 ATOM   1648  OE1 GLU B 205      23.839 -23.190  57.747  1.00  0.00           O
 TER    1649      GLU B 205
 ATOM   1650  ZIN ZIN B 206      28.130  -3.467  55.482  1.00  0.00          Zn
 END
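
The manual edit above can also be scripted. The following is a sketch only: the function name zn_to_zin is hypothetical, and it assumes the PDB file uses standard fixed-column formatting (record name in columns 1-6, atom name in columns 13-16, residue name in columns 18-20).

```shell
#!/bin/bash
# Hypothetical helper: rewrite zinc HETATM records as ATOM records whose
# atom name and residue name are both ZIN. Assumes fixed-column PDB lines.
zn_to_zin() {
    awk '
    {
        res = substr($0, 18, 3)
        gsub(/ /, "", res)              # residue name without padding
        if (substr($0, 1, 6) == "HETATM" && res == "ZN") {
            # keep the serial number (cols 7-12), column 17, and
            # everything from column 21 onward unchanged
            print "ATOM  " substr($0, 7, 6) " ZIN" substr($0, 17, 1) "ZIN" substr($0, 21)
        } else {
            print                       # all other lines pass through
        }
    }' "$1"
}
```

A possible call is zn_to_zin 4TKG.rec.noH.pdb > 4TKG.rec.noH.ZIN.pdb. Check the last lines of the result against the example above before using it.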


To run tleap, first create the 002.tleap directory and copy in the ligand input file:

 mkdir 002.tleap
 cp ~/../../tleap.lig.in

(content)

 tleap -s -f tleap.lig.in > tleap.lig.out

(error 1)

The atom types have to be redefined according to the GAFF force field. Antechamber can be used to resolve this error:

 antechamber -i 4TKG.lig.mol2 -fi mol2 -o 4TKG.lig.gaff.mol2 -fo mol2

Change the content of tleap.lig.in to (gaff file)

Re-run tleap by:

 tleap -s -f tleap.lig.in > tleap.lig.out

(error 2)

Now run parmchk to generate the missing parameters:

 parmchk -i 4TKG.lig.gaff.mol2 -f mol2 -o 4TKG.lig.ante.frcmod

Change the content of tleap.lig.in to (load amber params), then re-run tleap with tleap.lig.in. All the errors should now be fixed.
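
For orientation, a ligand input file that loads the GAFF mol2 file and the parmchk parameters might look like the sketch below. This is illustrative only and is not the input file used in this tutorial: the choice of leaprc.gaff, the unit name lig, and the output file names (taken from the paths used later in the MM-GBSA section) are assumptions.

```shell
#!/bin/bash
# Illustrative only: one possible tleap.lig.in for the ligand. The force
# field line and the unit name "lig" are assumptions, not the tutorial's
# actual input file.
cat > tleap.lig.in <<'EOF'
source leaprc.gaff
loadamberparams 4TKG.lig.ante.frcmod
lig = loadmol2 4TKG.lig.gaff.mol2
saveamberparm lig 4TKG.lig.gas.leap.prm7 4TKG.lig.gas.leap.rst7
quit
EOF
```

The loadamberparams line is what picks up the frcmod file written by parmchk. Without it, tleap would be expected to report the missing parameters again.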

 cp ~/../../tleap.rec.in

(content)

 tleap -s -f tleap.rec.in > tleap.rec.out

(error)

To supply the missing parameters, copy the two ion parameter files from ../000.amberfiles/ into the current directory:

 cp ../000.amberfiles/ions.lib .
 cp ../000.amberfiles/ions.frcmod .

Change the content of tleap.rec.in to (load param)

Re-run tleap with tleap.rec.in. All the errors should now be fixed. (explain)

 cp ~/../../tleap.com.in
 tleap -s -f tleap.com.in > tleap.com.out


This whole process can be run on a cluster by using the following script:

III. Simulation using pmemd

IV. Simulation Analysis

Ptraj

MM-GBSA Energy Calculation

Molecular Mechanics-Generalized Born Surface Area (MM-GBSA) is a widely used method for estimating the relative binding affinity of one or more ligands to a receptor. The binding energies calculated with this method are also known as free energies of binding, where more negative values indicate stronger binding. For this section, the topology files for the ligand, receptor and complex are needed.

Create a new directory:

 mkdir 005.mmgbsa

Create an input file named gb.rescore.in:

 vim gb.rescore.in

Enter the following into the input file:

  Single point GB energy calc
&cntrl
 ntf    = 1,        ntb    = 0,        ntc    = 2,
 idecomp= 0,
 igb    = 5,        saltcon= 0.00,
 gbsa   = 2,        surften= 1.0,
 offset = 0.09,     extdiel= 78.5,
 cut    = 99999.0,  nsnb   = 99999,
 imin   = 5,        maxcyc = 1,        ncyc   = 0,
 /

Create a tcsh/bash/csh script (run.sander.rescore.csh) with the following information:

 #! /bin/tcsh
 #PBS -l nodes=1:ppn=1
 #PBS -l walltime=48:00:00
 #PBS -o zzz.qsub.out
 #PBS -e zzz.qsub.err
 #PBS -V
 #PBS -N mmgbsa
 
 set workdir = /nfs/user03/kbelfon/amber_tutorial/005.mmgbsa
 
 cd $workdir
 
 sander -O -i gb.rescore.in \
 -o gb.rescore.out.com \
 -p ../002.tleap/4TKG.com.gas.leap.prm7 \
 -c ../002.tleap/4TKG.com.gas.leap.rst7 \
 -y ../004.ptraj/4TKG.com.trj.stripfit \
 -r restrt.com \
 -ref ../002.tleap/4TKG.com.gas.leap.rst7 \
 -x mdcrd.com \
 -inf mdinfo.com

 sander -O -i gb.rescore.in \
 -o gb.rescore.out.lig \
 -p ../002.tleap/4TKG.lig.gas.leap.prm7 \
 -c ../002.tleap/4TKG.lig.gas.leap.rst7 \
 -y ../004.ptraj/4TKG.lig.trj.stripfit \
 -r restrt.lig \
 -ref ../002.tleap/4TKG.lig.gas.leap.rst7 \
 -x mdcrd.lig \
 -inf mdinfo.lig 
 
 sander -O -i gb.rescore.in \
 -o gb.rescore.out.rec \
 -p ../002.tleap/4TKG.rec.gas.leap.prm7 \
 -c ../002.tleap/4TKG.rec.gas.leap.rst7 \
 -y ../004.ptraj/4TKG.rec.trj.stripfit \
 -r restrt.rec \
 -ref ../002.tleap/4TKG.rec.gas.leap.rst7 \
 -x mdcrd.rec \
 -inf mdinfo.rec 
 
 exit

Execute this script on the seawulf cluster or on the machine(s) of your preference:

 qsub run.sander.rescore.csh
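
The three sander calls in the script differ only in the system tag (com, lig, rec), so they can also be written as one function and a loop. The bash sketch below is an alternative to the tcsh script, not a replacement for it: the function name run_rescore and the SANDER override variable are hypothetical, and the file names follow the paths used in the script.

```shell
#!/bin/bash
# Sketch: one sander rescoring call per system instead of three copies.
# SANDER is a hypothetical override variable; it defaults to "sander".
run_rescore() {
    local sys="$1"    # one of: com, lig, rec
    "${SANDER:-sander}" -O -i gb.rescore.in \
        -o "gb.rescore.out.$sys" \
        -p "../002.tleap/4TKG.$sys.gas.leap.prm7" \
        -c "../002.tleap/4TKG.$sys.gas.leap.rst7" \
        -y "../004.ptraj/4TKG.$sys.trj.stripfit" \
        -r "restrt.$sys" \
        -ref "../002.tleap/4TKG.$sys.gas.leap.rst7" \
        -x "mdcrd.$sys" \
        -inf "mdinfo.$sys"
}

# Example (run from inside 005.mmgbsa):
#   for sys in com lig rec; do run_rescore "$sys"; done
```

Writing the call once keeps the output names consistent (gb.rescore.out.com, gb.rescore.out.lig, gb.rescore.out.rec), which matters because the extraction script later looks for exactly those names.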

Three output files will be generated once the job is completed: gb.rescore.out.com, gb.rescore.out.lig, and gb.rescore.out.rec. These files contain the single point energy calculation results for the complex (.com), the ligand (.lig) and the receptor (.rec). Sander writes the energy for each frame of the input trajectory. The final results for one frame in one of the three files should look like the following:

                                   FINAL RESULTS
  
  
   
   NSTEP       ENERGY          RMS            GMAX         NAME    NUMBER
      1       5.9132E+03     2.0005E+01     1.2640E+02     C         159
  
 BOND    =      661.8980  ANGLE   =     1751.7992  DIHED      =     2581.7692
 VDWAALS =    -1696.6585  EEL     =   -13958.9335  EGB        =    -3125.9524
 1-4 VDW =      747.0185  1-4 EEL =     7750.8118  RESTRAINT  =        0.0000
 ESURF   =    11201.4791
minimization completed, ENE= 0.59132314E+04 RMS= 0.200047E+02

Extracting Data from the MM-GBSA Calculation and Calculating the Free Energy of Binding

From the output files above:

VDWAALS = ΔGvdw

EEL = ΔGcoul

EGB = ΔGpolar

SASA = ESURF

With this information ΔGnonpolar can be solved using equation(1):

ΔGnonpolar = SASA*0.00542 + 0.92 (1)

Once ΔGnonpolar is solved then ΔGmmgbsa can be determined by equation(2):

ΔGmmgbsa = ΔGvdw + ΔGcoul + ΔGpolar + ΔGnonpolar (2)

Solve equations 1 and 2 using the extracted information from all three output files, so that you have ΔGmmgbsa for the complex, the ligand and the receptor.
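
As a worked example, equations (1) and (2) can be applied to the sample frame shown earlier (VDWAALS = -1696.6585, EEL = -13958.9335, EGB = -3125.9524, ESURF = 11201.4791). The function name mmgbsa_single is hypothetical; the sketch simply evaluates the two equations with awk.

```shell
#!/bin/bash
# Hypothetical helper: evaluate equations (1) and (2) for a single frame.
# Arguments: VDWAALS  EEL  EGB  ESURF (as printed by sander).
mmgbsa_single() {
    awk -v vdw="$1" -v coul="$2" -v polar="$3" -v sasa="$4" 'BEGIN {
        nonpolar = sasa * 0.00542 + 0.92             # equation (1)
        mmgbsa   = vdw + coul + polar + nonpolar     # equation (2)
        printf "%.2f %.2f\n", nonpolar, mmgbsa
    }'
}

# Sample frame from the output above; prints: 61.63 -18719.91
mmgbsa_single -1696.6585 -13958.9335 -3125.9524 11201.4791
```

The first number is ΔGnonpolar and the second is ΔGmmgbsa for that one frame of that one system. The same calculation has to be repeated for the complex, ligand and receptor before they can be combined.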

Finally ΔΔGbind can be calculated using equation (3):

ΔΔGbind = ΔGmmgbsa,complex – (ΔGmmgbsa,lig + ΔGmmgbsa,rec) (3)

Plot your ΔΔGbind and examine the plot for changes in the ligand position and the ΔΔGbind. Also, you should calculate the mean and standard deviation for your ΔΔGbind.

The following script (get.mmgbsa.sh) can be used to extract the energy from the three output files obtained above and calculate ΔΔGbind:

 #! /bin/bash
 # by Haoquan
 # Extract the per-frame energy terms for the complex, ligand and receptor
 echo com lig rec > namelist
 LIST=`cat namelist`
 for i in $LIST ; do
 # VDWAALS, EEL and EGB are printed on the same line of the sander output
 grep VDWAALS gb.rescore.out.$i | awk '{print $3}' > $i.vdw
 grep EGB     gb.rescore.out.$i | awk '{print $9}' > $i.polar
 grep VDWAALS gb.rescore.out.$i | awk '{print $6}' > $i.coul
 # Nonpolar term from the surface area, equation (1)
 grep ESURF   gb.rescore.out.$i | awk '{print $3 * 0.00542 + 0.92}' > $i.surf
 # Sum the four terms, equation (2)
 paste -d " " $i.vdw $i.polar $i.surf $i.coul | awk '{print $1 + $2 + $3 + $4}' > data.$i
 rm $i.*
 done
 # Binding energy per frame, equation (3)
 paste -d " " data.com data.lig data.rec | awk '{print $1 - $2 - $3}' > data.all
 # Number the frames
 for ((j=1; j<=`wc -l data.all | awk '{print $1}'`; j+=1)) do
 echo $j , >> time
 done
 paste -d " " time data.all > MMGBSA_vs_time.dat
 rm namelist time data.*

Execute this script:

 bash get.mmgbsa.sh

A text file called MMGBSA_vs_time.dat with x and y values separated by a space and comma should be created. Use XMGRACE to plot this dat file using the following command in Linux:

 xmgrace MMGBSA_vs_time.dat
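
The mean and standard deviation of ΔΔGbind can be computed from the same file. The sketch below assumes the "frame , value" layout of MMGBSA_vs_time.dat described above, so the energy is the last field on each line. The function name mmgbsa_stats is hypothetical, and the sample standard deviation (dividing by n - 1) is an assumption about which estimator you want.

```shell
#!/bin/bash
# Hypothetical helper: mean and sample standard deviation of the last
# column of a file such as MMGBSA_vs_time.dat.
mmgbsa_stats() {
    awk '
    NF == 0 { next }                          # skip blank lines
    { x = $NF; n++; sum += x; sumsq += x * x }
    END {
        if (n < 2) { print "need at least two frames" > "/dev/stderr"; exit 1 }
        mean = sum / n
        var  = (sumsq - n * mean * mean) / (n - 1)
        if (var < 0) var = 0                  # guard against rounding error
        printf "mean=%.4f sd=%.4f n=%d\n", mean, sqrt(var), n
    }' "$1"
}
```

A possible call is mmgbsa_stats MMGBSA_vs_time.dat, which prints one line with the mean, the standard deviation and the number of frames.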

V. Frequently Encountered Problems