Nonstandard residues prep

From Rizzo_Lab
Jump to: navigation, search

This tutorial is created by and for AMS536 class at Stony Brook University and made available for the use of the whole modeling community.

The following are examples of a work flow of how to assign parameters for non-standard residues compatible with the amber program. This procedure will give you reasonable parameters; however, the charge calculation should use higher levels of theory and the other parameters should be optimized Further perhaps fit to QM or optimizing to reproduce experimental observables. Parameters assigned from gaff.

Fluorophenylalanine

To prepare the mutant structure for Molecular Dynamics, a copy of the phenylalanine residue from the raw pdb file for 1QP6 was made and subsequently modified using a text editor to include the name FPH, rather than PHE. Furthermore, the five hydrogen atoms on the phenylalanine residue named ‘H’ for hydrogen, were replaced with five F’s, for fluorine. Using the program, Chimera, hydrogens were added to this modified residue and the C-F bond length was defined as 1.35 Angstroms. Also using Chimera, which has a built-in antechamber module, charges were assigned using the charge model, AM1-BCC. The .mol2 file generated was used as an input to create a new pdb file and a prep file. Using Parmchk, the prep file, which establishes connectivity between atoms, was used as an input to create the frcmod file, which stores additional parameters. The prep file was manually modified to contain AMBER parameters from the PHE.prep file in the AMBER library for the backbone atoms of the mutant residue. The resulting structure is a hybrid of the two force fields(FF99SB for the backbone, whose parameters were taken from a PHE backbone, and GAFF for the side-chain), and for this reason, additional parameters, including the bond lengths, angles, and dihedral angles had to be specified and stored in the frcmod file. A Table of Additional Parameters (bond angles)stored in the frcmod file is shown below.

Angle θ Source
c3-CT-C 63.8 110.53 GAFF: c-c3-c3
H1-CT-c3 46.4 110.05 GAFF: c3-c3-ha
CT-c3-hc 46.4 110.05 GAFF: c3-c3-ha
CT-c3-ca 62.5 114.61 GAFF: c3-c3-ca
N-CT-c3 80.0 109.70 PARM99: CT-CT-N

Below is the prep file created:

FLUOROPHENYLALANINE
molecule.res
 FPH  INT     0
 CORR OMIT DU   BEG
   0.00000
   1  DUMM  DU    M    0  -1  -2     0.000     0.000     0.000   0.00000
   2  DUMM  DU    M    1   0  -1     1.449     0.000     0.000   0.00000
   3  DUMM  DU    M    2   1   0     1.522   111.100     0.000   0.00000
   4  N     N     M    3   2   1     1.335   116.600   180.000  -0.41570
   5  H     H     E    4   3   2     1.010   -10.200     0.000   0.27190
   6  CA    CT    M    4   3   2     1.449   121.900   180.000  -0.00240
   7  HA    H1    E    6   4   3     1.090   -109.50   300.000   0.09780
   8  CB    c3    3    6   4   3     1.533   110.759  -133.834  -0.01010
   9  HB2   hc    E    8   6   4     1.090   112.115  -149.730   0.08300
  10  HB3   hc    E    8   6   4     1.090   111.978   -26.211   0.08300
  11  CG    ca    S    8   6   4     1.505   109.715    91.984  -0.10820
  12  CD1   ca    B   11   8   6     1.410   119.856    69.106   0.12160
  13  FD1   f     E   12  11   8     1.350   119.921    -0.168  -0.10880
  14  CE1   ca    B   12  11   8     1.410   120.090   179.880   0.07260
  15  FE1   f     E   14  12  11     1.350   120.094  -179.943  -0.10190
  16  CZ    ca    B   14  12  11     1.410   119.913     0.059   0.09970
  17  FZ    f     E   16  14  12     1.350   119.912   179.922  -0.09840
  18  CE2   ca    B   16  14  12     1.414   120.110    -0.071   0.07280
  19  FE2   f     E   18  16  14     1.350   120.153  -179.940  -0.10140
  20  CD2   ca    S   18  16  14     1.412   119.865     0.062   0.12490
  21  FD2   f     E   20  18  16     1.350   119.987   179.997  -0.10980
  22  C     C     M    6   4   3     1.522   111.100   180.000   0.59730
  23  O     O     E   22   6   4     1.229   120.500     0.000  -0.56790


LOOP
 CG   CD2

IMPROPER
 -M   CA   N    H
 CA   +M   C    O
 CG   CE2  CD2  FD2
 CD2  CZ   CE2  FE2
 CE1  CE2  CZ   FZ
 CD1  CZ   CE1  FE1
 CG   CE1  CD1  FD1
 CD1  CD2  CG   CB

DONE
STOP

non-starandard Nucleic acids with modified backbone

The following procedure is a method of preparation for running MD simulations on a non-standard nucleic acid in AMBER. Although this was initially meant for application on a nucleic acid analog, in principle it can work with all types of molecules which are too large for the usual Antechamber parameterization.

1.With a PDB structure of the desired molecule at hand, you will have to load it into Chimera, and assign it gasteiger charges. This charge model is used because for a system larger than 60 heavy atoms, an AM1-BCC charge model would not be compatible.

2.With newest version of Chimera, save the file as a mol2 molecule, with AMBER/Gaff atom types. Having this option on is very important, because by default Chimera will give a mol2 file sybyl atom types, and TLEAP which will be used later will not be able to recognize it. Older versions of Chimera will not have this option of AMBER/Gaff atom types.

3.Now that we have all the connection/topology and charge information contained in the mol2 file, we only need one more element before we load the molecule into TLEAP, that is a frcmod file. For unknown dihedral angles types, we need to assign it force constants, and with the command

parmchk -i <nucleicacid>.mol2 -f mol2 -o CFNA.frcmod

a frcmod file with all the force constants for the unknown dihedral angle types will be generated. Note that a message may show up saying maximum number of atoms reached, however this is ok, parmchk will relocate memory, and will be successful in generating the complete frcmod file.

4.With the mol2 file and frcmod file at hand, using the key command loadmol2, a set of amber parameters will be saved. A specific example of how to write the commands as a script is as follows

 set default PBradii mbondi2
 source leaprc.ff99SB
 source leaprc.gaff
 loadamberparams gaff.dat.rizzo
 loadamberparams <nucleicacid>.frcmod
 file = loadmol2 <nucleicacid>.mol2
 saveamberparm file <nucleicacid>.gas.prm7 <nucleicacid>.gas.rst7
 charge file
 quit

5.So the generated prm7 would be the amber parameter file, which one can use for the MD simulations.


An alternative method would be a residue-wise parameterization approach, given that a large molecule consists of repeating units, such as a DNA analog in our case. However this is tricky, as since this method employs prep files, which uses a z-matrix for defining atom connections. A general description is as follows.

1.Build each standalone residue structure, with hydrogens capped at the connection points.

2.Now assign each residue with AM1-BCC charges in Chimera, save it as a mol2 file.

3.Use Antechamber to generate prep files for every residue. The actual command will be

antechamber -i <residue>.mol2 -fi mol2 -o <residue>.prep -fo prepi

4.Now we have at hand the prep files of both hydrogen capped and heavy atom capped residues, open up the hydrogen capped preps. Delete the whole line of the capping hydrogens. Try to redistribute the resulting deleted charges over to other atoms, so that we have a neutral molecule(or whatever integral charge it had initially).

5.Now here's the tricky part. Notice the beginning of the prep file, there are some dummy atoms, and then comes your first actual atom. In an example of a prep file for standard DNA, the phosphorous wants to connect to atom number 3, with bond length 1.60 angstroms, bond angle of 119.04 with 4-3-2, dihedral angle of 200.00 with 4-3-2-1. See table.

 D-ADENOSINE - with 5' - phosphate group and 3' - O(minus) group
 CORRECT OMIT DU   BEG
  0.0
  1   DUMM  DU    M    0  -1  -2    0.00      0.00      0.00       0.0000
  2   DUMM  DU    M    1   0  -1    1.00      0.00      0.00       0.0000
  3   DUMM  DU    M    2   1   0    1.00     90.00      0.00       0.0000
  4   P     P     M    3   2   1    1.60    119.04    200.00       1.1662
  5   O1P   O2    E    4   3   2    1.48    109.61    150.00      -0.7760
  6   O2P   O2    E    4   3   2    1.48    109.58     20.00      -0.7760
  7   O5'   OS    M    4   3   2    1.60    101.43    -98.89      -0.4989
  8   C5'   CT    M    7   4   3    1.44    119.00    -39.22       0.0558
  9   H5'1  H1    E    8   7   4    1.09    109.50     60.00       0.0679
 10   H5'2  H1    E    8   7   4    1.09    109.50    -60.00       0.0679
 11   C4'   CT    M    8   7   4    1.52    110.00    180.00       0.1065
 12   H4'   H1    E   11   8   7    1.09    109.50   -200.00       0.1174
 13   O4'   OS    S   11   8   7    1.46    108.86    -86.31      -0.3548
 14   C1'   CT    B   13  11   8    1.42    110.04    105.60       0.0394
 15   H1'   H2    E   14  13  11    1.09    109.50   -240.00       0.2007
 16   N9    N*    S   14  13  11    1.52    109.59   -127.70      -0.0251
 17   C8    CK    B   16  14  13    1.37    131.20     81.59       0.2006
 18   H8    H5    E   17  16  14    1.08    120.00      0.00       0.1553
 19   N7    NB    S   17  16  14    1.30    113.93    177.00      -0.6073
 20   C5    CB    S   19  17  16    1.39    104.00      0.00       0.0515
 21   C6    CA    B   20  19  17    1.40    132.42    180.00       0.7009
 22   N6    N2    B   21  20  19    1.34    123.50      0.00      -0.9019
 23   H61   H     E   22  21  20    1.01    120.00    180.00       0.4115
 24   H62   H     E   22  21  20    1.01    120.00      0.00       0.4115
 25   N1    NC    S   21  20  19    1.34    117.43    180.00      -0.7615
 26   C2    CQ    B   25  21  20    1.33    118.80      0.00       0.5875
 27   H2    H5    E   26  25  21    1.08    120.00    180.00       0.0473
 28   N3    NC    S   26  25  21    1.32    129.17      0.00      -0.6997
 29   C4    CB    E   28  26  25    1.35    110.80      0.00       0.3053
 30   C3'   CT    M   11   8   7    1.53    115.78   -329.11       0.2022
 31   H3'   H1    E   30  11   8    1.09    109.50     30.00       0.0615
 32   C2'   CT    B   30  11   8    1.53    102.80    -86.30       0.0670
 33   H2'1  H1    E   32  30  11    1.09    109.50    120.00       0.0972
 34   O2'   OH    S   32  30  11    1.43    109.50    240.00      -0.6139
 35   HO'2  HO    E   34  32  30    0.96    107.00    180.00       0.4186
 36   O3'   OS    M   30  11   8    1.42    116.52   -203.47      -0.5246

So what this means is that, when these prep files get loaded into TLEAP, TLEAP will according to your given residue sequence(of the pdb file of your complete molecule you have loaded in also, say ATCGGTCA) try to build a structure of the same sequence, according to the preps it has, and verify the built structure with your pdb structure. And if it matches, we are happy, and the MD is ready to run. It is suppose to match, so the connection points are critical. Notice the dummy atoms have an "M" mark after "DU", therefore dummy atom 3, 2, 1, actually corresponds to the first, second and third atom also with the "M" mark, in the neighboring residue(to make things simple let's think of it as also being the same adenosine), counting from bottom up. So with an adenosine-adenosine connection, see how O3', C3' and C4' actually fits the criteria of both being the first 3 atoms bearing "M" in the bottom up count order, and being the actual connection atoms, which define the bond length, bond angle and dihedral angle in DNA. Compare table and figure and you will see.

DNA.png

In our prep file for our own residues, we will need to get the connection information correct, just like DNA in the AMBER library. Antechamber generates a prep file with atoms in a arbitrary order, we will have to tweak the four connection atoms, to make atom number 4 our terminal atom(just like phosphorous in DNA), the other 3 connection defining atoms in the correct relative order(just like how O3', C3' and C4' is relatively sequenced and marked with "M"), and all marking them with "M". Change any interfering "M"s to E. Bond length, angle and dihedral angle numbers will have to be obtained manually from visualization software like MOE, and input into atom row 4. Fiddling with the z-matrix is not easy, and one is very likely to make errors, so a facile approach would be the first method mentioned. However this second method may be slightly more accurate in terms of charge model, and once we have parameterized all the residues for a new species of nucleic acid analog or some residue-based structure, we can build a molecule of its species in any arbitrary residue sequence.

6.Finally, run command

parmchk -i <residue>.prep -f prepi -o <residue>.frcmod

and we have both the prep and frcmod file at hand.

7.An example of a script of commands to be used in TLEAP to generate the AMBER parameters as follows:

 set default PBradii mbondi2
 source leaprc.ff99SB
 source leaprc.gaff
 loadamberparams <residue>.frcmod
 (...repeat for every residue)
 loadamberprep <residue>.prep
 (...repeat for every residue)
 file = loadpdb <nucleicacid>.pdb
 saveamberparm file <nucleicacid>.gas.prm7 <nucleicacid>.gas.rst7
 charge lig
 quit