Difference between revisions of "2013 DOCK tutorial with Orotodine Monophosphate Decarboxylase"

Latest revision as of 17:26, 14 January 2015

For additional Rizzo Lab tutorials see DOCK Tutorials. Use this link Wiki Formatting as a reference for editing the wiki. This tutorial was developed collaboratively by the AMS 536 class of 2013, using DOCK v6.6. It is assumed that users of this tutorial have a basic understanding of the command line and are able to navigate a Unix operating system.

I. Introduction

DOCK

DOCK is a molecular docking program used in drug discovery. It was developed by Irwin D. Kuntz, Jr. and colleagues at UCSF (see UCSF DOCK). This program, given a protein binding site and a small molecule, tries to predict the correct binding mode of the small molecule in the binding site, and the associated binding energy. Small molecules with highly favorable binding energies could be new drug leads. This makes DOCK a valuable drug discovery tool. DOCK is typically used to screen massive libraries of millions of compounds against a protein to isolate potential drug leads. These leads are then further studied, and could eventually result in a new, marketable drug. DOCK works well as a screening procedure for generating leads, but is not currently as useful for optimization of those leads.

DOCK 6 uses an incremental construction algorithm called anchor and grow. It is described by a three-step process:

Rigid portion of ligand (anchor) is docked by geometric methods.
Non-rigid segments added in layers; energy minimized.
The resulting configurations are 'pruned' and energy re-minimized, yielding the docked configurations.

Orotodine Monophosphate Decarboxylase

The protein receptor which is the subject of this tutorial is orotodine monophosphate decarboxylase (OMP), a homodimeric protein from the organism Methanobacterium thermoautotrophicum. OMP is involved in the biosynthesis of several pyrimidines including uridine monophosphate (UMP), the ligand used in this tutorial. UMP is a pyrimidine-based nucleotide monomer of RNA. The structure used for this tutorial can be found at the Protein Data Bank under accenssion number 1LOQ.

Organizing Directories

While performing docking, it is convenient to create a project directory (here we call it ~AMS536/dock-tutorial/), and adopt a standard directory structure / naming scheme, so that files are easy to find and identify. For this tutorial, we will use something similar to the following:

~username/AMS536/dock-tutorial/00.files/
                              /01.dockprep/
                              /02.surface-sphgen/
                              /03.box-grid/
                              /04.dock/
                              /05.mini-virtual-screen/
                              /06.database-filter/
                              /07.virtual-screen/

Before beginning the tutorial, it may be easiest to make all of the sub-directories at the same time. From here on in, these subdirectories will be referred to as ~00.files/, ~01.dockprep/ etc. In addition, most of the important files that are derived from the original crystal structure will be given a prefix that is the same as the PDB code, '1LOQ'.

II. Preparing the Receptor and Ligand

Downloading the PDB file (1LOQ)

Go to PDB homepage (http://www.rcsb.org/pdb/home/home.do ) enter the protein ID (1LOQ) in the search bar, click Download Files in the top-right of the webpage, then select PDB File (text). In the new window, save the file in your ~00.files/ folder.

Preparing the ligand and receptor in Chimera

NOTE: If you are accessing files remotely (i.e. through a terminal) and want to use a visualization program such as Chimera, first transfer the files to your local computer, run Chimera, and transfer the files back.

In this section, we will create four new files and save them in the ~01.dockprep/ folder:

 1LOQ.dockprep.mol2      (complete system with hydrogens and charges)
 1LOQ.receptor.mol2      (the receptor alone)
 1LOQ.receptor.noH.pdb   (the receptor alone, without hydrogens)
 1LOQ.ligand.mol2        (the ligand alone)

To prepare these files, first make a copy of the original PDB file in the ~00.files/ folder (named, for example, 1LOQ.copy.pdb) and open it with VIM ($ vim 1LOQ.copy.pdb). Because the residue name of the ligand (U) will give us some problems when assigning charges, change the residue name "U" to "LIG" starting at line 2082. Here is an example command that will change all instances of " U" into "LIG", while preserving the correct spacing. Note there are two blank spaces before U in the following expression:

 %s/  U/LIG/gc

For this command, g is short for global and c is short for check with the user before making the change.

Next, open up the PDB file (1LOQ.copy.pdb) in Chimera. To delete water molecules and other ligands, click Tools -> Structure Editing -> Dock Prep. Check all boxes and click Okay to the end. Alternatively, the waters can be deleted manually by choosing Select -> Residue -> HOH, then go to Actions -> Atoms/Bonds -> Delete. Hydrogen atoms can be added manually by choosing Tools -> Structure Editing -> Add H.

Next, to add charges to the ligand and receptor, go to Select -> Residue -> LIG, then go to Tools -> Structure Editing -> Add Charge. Choose AMBER ff99SB as the charge model, click Okay, and when prompted chose AM1-BCC charges for the ligand, and make sure the Net Charge is set to -1. (You must consider the chemistry of the ligand when assigning a charge state).

Finally, save this file as 1LOQ.dockprep.mol2 into your ~01.dockprep/ folder.

Generating the final files

To create the receptor file: Open 1LOQ.dockprep.mol2, click Select -> Residue -> LIG. Then click Actions -> Atoms/Bonds -> Delete. Save the file as 1LOQ.receptor.mol2.

To create the receptor file with no hydrogen atoms: Open 1LOQ.receptor.mol2, click Select -> Chemistry -> Element -> H, then chose Actions -> Atoms/Bonds -> Delete. Save the file as 1LOQ.receptor.noH.pdb.

To create the ligand file: Open 1LOQ.dockprep.mol2, click Select -> Residue -> LIG, then click Select -> Invert, then chose Actions -> Atoms/Bonds -> Delete. Save the file as 1LOQ.ligand.mol2.

III. Generating Receptor Surface and Spheres

Receptor Surface

The next section of this tutorial will be performed in your ~02.surface-sphgen/ folder.

In order to generate the receptor surface one can perform the following steps (working in Chimera):

Go File -> Open and choose the PDB file of the protein containing no hydrogens (1LOQ.receptor.noH.pdb) from 01.dockprep
Further, Actions -> Surface -> Show
Go Tools -> Structure Editing -> Write DMS in order to obtain a dms file, which we will need while placing spheres
In the new window save the surface as 1LOQ.receptor.dms

Receptor surface

Placing Spheres

For more information SPHGEN, please consult the DOCK online owners manual at:

http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm

In order to place the spheres on the receptor surface, the following steps are required:

1. Create a file called INSPH (an input file we will need later) and fill it out as follows (parameter descriptions included to the right), then save it.

1LOQ.receptor.dms
R
X
0.0
4.0
1.4
1LOQ.receptor.sph

Here is a line-by-line break down on what each line does:

1LOQ.receptor.dms - molecular surface file that we got from previous step
R - sphere outside of surface (R) or inside surface (L)
X - specifies subset of surface points to be used (X=all points)
0.0 - prevents generation of large spheres with close surface contacts(defalut=0.0)
4.0 - maximum sphere radius in angstroms (default=4.0)
1.4 - minimum sphere radius in angstroms (default=radius of probe)
1LOQ.receptor.sph - clustered spheres file that we want to generate

2. Run the sphgen program from the command line:

sphgen -i INSPH -o OUTSPH

OUTSPH is the output file containing the information on the sphgen run, while 1LOQ.receptor.sph is the file which contains the spheres.

3. (optional) If one wants to have a look at the spheres received as an outcome of sphgen, one should run showsphere, by typing in the command line:

showsphere

The following questions will be prompted on the command line:

Enter name of sphere cluster file:
Enter cluster number to process (<0 = all):
Generate surfaces as well as pdb files (<N>/Y)?
Enter name for output file prefix:
Process cluster 0 (contains ALL spheres) (<N>/Y)?

As the questions have been answered the program will generate the spheres, writing the data in a PDB file, afterwards the spheres may be observed by opening this file in Chimera.

4. Run the sphere_selector program from the command line as follows:

sphere_selector 1LOQ.receptor.sph ../01.dockprep/1LOQ.ligand.mol2 8.0

This program selects the spheres within a user-defined radius (8.0 here) of a target molecule from the previously obtained file 1LOQ.receptor.sph. As a result a new file selected_spheres.sph will be generated. Again, the spheres can and should be visualized by using showsphere and Chimera.

5. Run showsphere in order to visualize the spheres:

showsphere

When prompted on the command line, answering the questions as follows:

selected_spheres.sph
-1
N
output_spheres_selected
N

Launch Chimera, choose File -> Open, choose 1LOQ.receptor.noH.pdb
Go File -> Open, choosing output_spheres_selected.pdb
Go Select -> Residue -> SPH
Finally, Actions -> Atoms/Bonds -> sphere

The picture like one placed below will be obtained afterwards.

Selected spheres

IV. Generating Box and Grid

This section of the tutorial will be performed in your ~03.box-grid/ folder. For more information GRID, please consult the DOCK online owners manual at:

http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm

Box

We will create a box around the binding site to define the boundaries of ligand sampling and aid in energy calculations. As a general rule, we will create a box around the spheres (from the previous step) with margins of at least 8.0 Angstroms on all sides. You first need to create a showbox.in file to generate the box:

vim showbox.in

Edit this file as following:

Y                                           # Yes, generate a box
8.0                                         # box margin
../02.surface-sphgen/selected_spheres.sph   # input sphere files
1                                           # which sphere cluster to use
1LOQ.box.pdb                                # output file name

Note: Do not include the comments in the showbox.in file. To run the program showbox, give the name of the program on the command line and provide the showbox.in file:

showbox < showbox.in

Box Image

Grid

Grids are used as a method to speed up evaluation of the energy scoring function by pre-computing electrostatic and VDW potential energies on the receptor at defined grid points within the box. VDW parameters are stored in a file that was installed with DOCK, and we will find it convenient to copy it and two other files to our ~00.files/ folder. In this class, DOCK was installed in the following location:

/home/wjallen/AMS536/local/dock6/

You can determine the path to your DOCK installation by typing:

which dock6

From that location, copy the following three files to your local ~00.files/ folder:

/home/wjallen/AMS536/local/dock6/parameters/vdw_AMBER_parm99.def
/home/wjallen/AMS536/local/dock6/parameters/flex.defn
/home/wjallen/AMS536/local/dock6/parameters/flex_drive.tbl

Use grid program to generate the grid and answer the questions as following:

touch grid.in
grid -i grid.in

compute_grids                  yes
grid_spacing                   0.4
output_molecule                no
contact_score                  no
energy_score                   yes
energy_cutoff_distance         9999
atom_model                     a
attractive_exponent            6
repulsive_exponent             9
distance_dielectric            yes
dielectric_factor              4
bump_filter                    yes
bump_overlap                   0.75
receptor_file                  ../01.dockprep/1LOQ.receptor.mol2
box_file                       ../03.box-grid/1LOQ.box.pdb
vdw_definition_file            ../00.files/vdw_AMBER_parm99.defn
score_grid_prefix              1LOQ.grid

Grid Image

V. Docking a Single Molecule for Pose Reproduction

This part of the tutorial will be performed in your ~04.dock/ folder. For more information DOCK, please consult the DOCK online owners manual at:

http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm

Rigid Docking

When docking a molecule rigidly in DOCK, no conformations or rotamers will be sampled other than the input configuration which, in our case, was derived from the crystal structure. This will be much faster that flexible docking, but it is not appropriate for virtual screening because the docked conformation of compounds from a library is generally not known. First create an empty input file for docking:

touch dock.in

Then execute DOCK6:

dock6 -i dock.in

You will be prompted for answers to many questions. For the rigid docking experiments, suggested parameters are as follows:

ligand_atom_file                                             ../01.dockprep/1LOQ.ligand.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               no 
use_database_filter                                          no
orient_ligand                                                yes 
automated_matching                                           yes
receptor_site_file                                           ../02.surface-sphgen/selected_spheres.sph
use_internal_energy                                          no
flexible_ligand                                              no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../03.box-grid/1LOQ.grid
multigrid_score_secondary                                    no 
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_descriptor_score_secondary                              no
amber_score_secondary                                        no
minimize_ligand                                              no
atom_model                                                   all
vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
flex_defn_file                                               ../00.files/flex.defn
flex_drive_file                                              ../00.files/flex_drive.tbl
ligand_outfile_prefix                                        output
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

With these input parameters, the rigid docking experiment will result in a single scored ligand pose. Open Chimera to visualize the docking result. After opening the receptor file in surface view, open the ligand file (named "output_scored.mol2") by going under Tools -> Surface/Binding Analysis -> ViewDock.

Rigid docking result

Flexible Docking

Flexible docking will take longer, but is necessary for virtual screening. For flexible docking, several input parameters need to be changed, and these changes will generate additional questions prompted to the user. Some changes to parameters are shown below:

calculate_rmsd                                               yes
flexible_ligand                                              yes
use_internal_energy                                          yes
minimize_ligand                                              yes
num_scored_conformers                                        10

Once these changes have been made to the input file, execute DOCK6 again:

dock6 -i dock.in

This flexible ligand protocol can be used with multiple ligand input files. The docked ligand poses can be compared using RMSD (Root Mean Squared Deviation). RMSD is a measurement of the distance between atoms on the original crystal structure ligand with the newly generated ligand conformation. An RMSD under 2 Angstroms would indicate a good conformation. Below are two examples of a final docked ligand in the receptor binding site pocket. The worst and best energy scored ligands are shown below. The original ligand is in cyan and the docked ligand is colored in heteroatom.

Best scored ligand result (-78.3 kcal/mol)

Worst scored ligand result (-67.4 kcal/mol)

VI. Virtual Screening

Virtual Screening Introduction

A virtual screen of various ligand allows for the comparison of both qualitative (e.g. position in binding site) and quantitative (e.g. energy scores) data pertaining to the each screened ligand with an originally docked molecule. Virtual screening is often used as a method to cut the cost of experimentation by narrowing down the ligands within a database and predicting which will exhibit the most favorable binding to a specific protein (with a pre-determined .grid file).

Running the Database Filter

When first beginning the virtual screen, a filter should be specified in order save computational power by only screening potentially effective ligands within the directory ~06.database-filter/. This involves limiting several variables of ligands specified in a .mol2 file. To limit the ligands from the original database, a file "dock_filter.in" is created and executed using DOCK. Note that in this example the file "cdiv_p0.0.mol2" is a multi mol2 file containing thousands of ligands from the CHEMDIV vendor which was previously downloaded from the UCSF ZINC database (https://zinc.docking.org/about/ucsf). ZINC is a great resource of purchasable compounds for virtual screening. Here are the contents of the file:

ligand_atom_file                                             /home/wjallen/AMS536/multi-mol2-files/cdiv_p0.0.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               no
use_database_filter                                          yes
dbfilter_max_heavy_atoms                                     50
dbfilter_min_heavy_atoms                                     0
dbfilter_max_rot_bonds                                       10
dbfilter_min_rot_bonds                                       0
dbfilter_max_molwt                                           9999.0
dbfilter_min_molwt                                           0.0
dbfilter_max_formal_charge                                   -1.0
dbfilter_min_formal_charge                                   -2.0
orient_ligand                                                no
use_internal_energy                                          no
flexible_ligand                                              no
bump_filter                                                  no
score_molecules                                              no
atom_model                                                   all
vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
flex_defn_file                                               ../00.files/flex.defn
flex_drive_file                                              ../00.files/flex_drive.tbl
ligand_outfile_prefix                                        filtered
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

In this file some important characteristics of the ligands that are limited are dbfilter_max_heavy_atoms, dbfilter_max_rot_bonds, dbfilter_max_molwt, and both dbfilter_max_formal_charge and dbfilter_min_formal_charge. As show in the text of the file, a file called filtered_scored.mol2 will be created with all of the ligands from the original database that fit into the specified limits of number of heavy atoms, number of rotatable bonds, molecular weight, and formal charge. Note: The max and min formal charge specified in this file (-1 to -2) are an extremely limited range, and are only used in this tutorial as an illustration.

Running the Virtual Screen

After creating a new directory, ~07.virtual-screen/, a dock.in file must be created. So as not to confuse the virtual screen file with the original dock.in of the docked ligand, the new file created is called dock_vs.in. Running this file in dock6 allows us to fill in the important details of the file that terminate in a virtual screening of the input ligand.

ligand_atom_file                                             ../06.database-filter/filtered_scored.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               no
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           yes
receptor_site_file                                           ../02.surface-sphgen/selected_spheres.sph
max_orientations                                             1000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
flexible_ligand                                              yes
user_specified_anchor                                        no
limit_max_anchors                                            no
min_anchor_size                                              5
pruning_use_clustering                                       yes
pruning_max_orients                                          1000
pruning_clustering_cutoff                                    100
pruning_conformer_score_cutoff                               100
use_clash_overlap                                            no
write_growth_tree                                            no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           yes
grid_score_secondary                                         no
grid_score_rep_rad_scale                                     1
grid_score_vdw_scale                                         1
grid_score_es_scale                                          1
grid_score_grid_prefix                                       ../03.box-grid/1LOQ.grid
multigrid_score_secondary                                    no
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_descriptor_score_secondary                              no
amber_score_secondary                                        no
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
flex_defn_file                                               ../00.files/flex.defn
flex_drive_file                                              ../00.files/flex_drive.tbl
ligand_outfile_prefix                                        output
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

A few of the questions addressed in the terminal window while running a virtual screen are as follows:

ligand_atom_file - here we have specified the new file created after filtering the initial ligand database file

calculate_rmsd - we select the default <no> because since we are screening a library of compounds, there will be no reference position to compare them to

orient_ligand - here we would like to the have the ligand oriented in the binding pocket

receptor_site_file - this is our selected_spheres.sph file from earlier

internal_energy_rep_exp - here we select 12 rather than 9 because using a repulsive exponent of 12 gives us a slightly steeper rising potential as radius between the atoms falls below the optimal distance (sigma)

flexible_ligand - when performing a virtual screen of a random database it is always advisable to set the ligands to flexible

min_anchor_size - ensures that major features on the molecules are used as anchors and not just simple small groups

score_molecules - we always want to score the molecules of a virtual screen in order to determine which are the best fit for the binding pocket

grid_score_primary - we want the grid score to be the primary method of scoring the screened ligands

grid_score_grid_prefix - the input for the .grid file created earlier

atom_model - we select the all atom model

num_scored_conformers - we only want to produce one scored conformer from each screened ligand because we only want the best conformations

Note that if you are performing a small sample virtual screening it is possible to run it on a local computer in a reasonable amount of time; however, if you are running the virtual screening to a very large scale (i.e. hundreds or thousands of ligands) you will need to use several nodes in Seawulf to produce results in a timely fashion.

Analyzing the Results

When your virtual screening has run completely and you are ready to analyze the results, open Chimera. In Chimera you can prepare your protein by opening 1LOQ.receptor.noH.pdb and choosing Actions -> Surface -> show. You may also wish to open up the original ligand file 1LOQ.ligand.mol2 in order to show the position of the original ligand/substrate in the receptor's active site.

To view the results of your virtual screening go to Tools -> Surface/Binding Analysis -> ViewDock and from their you can select your virtual screen output file. In the "ViewDock" pop-up window you can then select Column --> Show --> Grid Score which will add a column to the ViewDock window with the grid score of each ligand. You can double-click on the column heading Grid Score to sort the ligands based on their grid scores.

[Chimera image1] [Chimera image2]

VII. Running DOCK in Serial and in Parallel on Seawulf

The Seawulf Cluster is a 470-processor Linux Cluster capable of highly parallel processing. This parallel processing allows dock virtual screens to be completed in a fraction of the time as a single processor.

If you are docking multiple ligands, you can use more than one processor in parallel mode, but you should never use more processors than you have ligands. Before we can run DOCK on Seawulf, we need to copy the proper files from Herbie to Seawulf. If we CD into the AMS536 folder we can use the following command from the mathlab computer to copy all of the dock-tutorial files

scp -r /dock-tutorial/  username@herbie.mathlab.sunysb.edu:~/AMS536/

Now we have all of our DOCK preparation files and folders on the seawulf cluster.

Running DOCK in Serial on a Single Processor

Running on a single processor is very similar to running dock on the mathlab computer.

If you make a file called qsub.csh with the text:

#!/bin/tcsh                 
#PBS -l nodes=1:ppn=1        
#PBS -l walltime=01:00:00   
#PBS -N dock6           
#PBS -M user@ic.sunysb.edu 
#PBS -j oe                   
#PBS -o pbs.out            

cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen
/nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out

An explanation of the commands:

#!/bin/tcsh                                                 #Execute script with tcsh
#PBS -l nodes=1:ppn=1                                       #Use one node, and one processor per node, so one single processor 
#PBS -l walltime=01:00:00                                   #Allow 1 hour for your job run 
#PBS -N dock6                                               #Name of your job
#PBS -M user@ic.sunysb.edu                                  #Get an email notifying you when your job is completed
#PBS -j oe                                                  #Combine the output and error streams into a single output file
#PBS -o pbs.out                                             #Name of your output file

cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen       #Change to your home directory and folder with dock files           
/nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out #Specifies path to dock executable and provide input and output filenames

To submit the experiment use the command:

qsub qsub.csh

You will have submitted a DOCK experiment to the seawulf queue.

Running DOCK in Parallel using MPI

In order to run DOCK in parallel you have to use a slightly different build of DOCK6 called dock6.mpi. Message passing interface (MPI) is basically a program that allows programs like DOCK to run in parallel.

So, make another file called qsub.vs.csh with the contents:

#!/bin/tcsh
#PBS -l nodes=4:ppn=2 
#PBS -l walltime=24:00:00
#PBS -N screen
#PBS -o qsub.log
#PBS -j oe
#PBS -V

cd /nfs/user03/username/AMS536/dock-tutorial/07.virtscreen
mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out

As you can see there are two major changes:

#PBS -l nodes=4:ppn=2 #Use 4 nodes, and 2 processors per node, so 8 processors

Note: since one processor is used to distribute the processes, this will run DOCK as 7 (n-1) parallel processes.

mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out #this line uses mpi to run dock.mpi on multiple processors

And then we can run:

 qsub qsub.vs.csh

Difference between revisions of "2013 DOCK tutorial with Orotodine Monophosphate Decarboxylase"

Latest revision as of 17:26, 14 January 2015

Contents

I. Introduction

DOCK

Orotodine Monophosphate Decarboxylase

Organizing Directories

II. Preparing the Receptor and Ligand

Downloading the PDB file (1LOQ)

Preparing the ligand and receptor in Chimera

Generating the final files

III. Generating Receptor Surface and Spheres

Receptor Surface

Placing Spheres

IV. Generating Box and Grid

Box

Grid

V. Docking a Single Molecule for Pose Reproduction

Rigid Docking

Flexible Docking

VI. Virtual Screening

Virtual Screening Introduction

Running the Database Filter

Running the Virtual Screen

Analyzing the Results

VII. Running DOCK in Serial and in Parallel on Seawulf

Running DOCK in Serial on a Single Processor

Running DOCK in Parallel using MPI

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Rizzo Lab

Courses

Toolbox

@@ Line 1: / Line 1: @@
-For additional Rizzo Lab tutorials see [[DOCK Tutorials]]. Use this link [http://www.unilang.org/wiki/index.php/Wiki_markup Wiki Markup] as a reference for editing the wiki. This tutorial was developed collaboratively by the AMS 536 class of 2012.
+For additional Rizzo Lab tutorials see [[DOCK Tutorials]]. Use this link [http://www.mediawiki.org/wiki/Help:Formatting Wiki Formatting] as a reference for editing the wiki. This tutorial was developed collaboratively by the AMS 536 class of 2013, using DOCK v6.6. It is assumed that users of this tutorial have a basic understanding of the command line and are able to navigate a Unix operating system.
 ==I. Introduction==
@@ Line 13: / Line 13: @@
 # The resulting configurations are 'pruned' and energy re-minimized, yielding the docked configurations.
+===Orotodine Monophosphate Decarboxylase===
-===Streptavidin & Biotin===
+The protein receptor which is the subject of this tutorial is orotodine monophosphate decarboxylase (OMP), a homodimeric protein from the organism ''Methanobacterium thermoautotrophicum''. OMP is involved in the biosynthesis of several pyrimidines including uridine monophosphate (UMP), the ligand used in this tutorial. UMP is a pyrimidine-based nucleotide monomer of RNA. The structure used for this tutorial can be found at the Protein Data Bank under accenssion number [http://www.pdb.org/pdb/explore/explore.do?structureId=1LOQ 1LOQ].
-'''Streptavidin'''
-is a tetrameric prokaryote protein that binds the co-enzyme biotin with an extremely high affinity. The streptavidin monomer is composed of eight antiparallel beta-strands which folds to give a beta barrel tertiary structure. A biotin binding-site is located at one end of each β-barrel, which has a high affinity as well as a high avidity for biotin. Four identical streptavidin monomers  associate to give streptavidin’s tetrameric quaternary structure. The biotin binding-site in each barrel consists of residues from the interior of the barrel, together with a conserved Trp120 from neighboring subunit. In this way, each subunit contributes to the binding site on the neighboring subunit, and so the tetramer can also be considered a dimer of functional dimers.
-'''Biotin'''
-is a water soluble B-vitamin complex which is composed of an ureido (tetrahydroimidizalone) ring fused with a tetrahydrothiophene ring. It is a co-enzyme that is required in the metabolism of fatty acids and leucine. It is also involved in gluconeogenesis.
 ===Organizing Directories===
-While performing docking, it is convenient to adopt a standard directory structure / naming scheme, so that files are easy to find / identify. For this tutorial, we will use something similar to the following:
+While performing docking, it is convenient to create a project directory (here we call it ~AMS536/dock-tutorial/), and adopt a standard directory structure / naming scheme, so that files are easy to find and identify. For this tutorial, we will use something similar to the following:
-  ~username/AMS536/DOCK-Tutorial/00-original-files/
+  ~username/AMS536/dock-tutorial/00.files/
-                                /01-dockprep/
+                                /01.dockprep/
-                                /02-surface-spheres/
+                                /02.surface-sphgen/
-                                /03-box-grid/
+                                /03.box-grid/
-                                /04-dock/
+                                /04.dock/
-                                /05-virtual-screen/
+                                /05.mini-virtual-screen/
+                               /06.database-filter/
-In addition, most of the important files that are derived from the original crystal structure will be given a prefix that is the same as the PDB code, '1DF8'. The following sections in this tutorial will adhere to this directory structure / naming scheme.
+                               /07.virtual-screen/
+Before beginning the tutorial, it may be easiest to make all of the sub-directories at the same time. From here on in, these subdirectories will be referred to as ~00.files/, ~01.dockprep/ etc. In addition, most of the important files that are derived from the original crystal structure will be given a prefix that is the same as the PDB code, '1LOQ'.
 ==II. Preparing the Receptor and Ligand==
-===Downloading the PDB Structure (1DF8)===
+===Downloading the PDB file (1LOQ)===
-The Protein Data Bank contains atomic coordinates for more than 79,000 molecules and is accessible via the internet from [http://www.pdb.org PDB].  The target (protein or nucleic acid) is accessible by PDB ID, molecular name, or author.  After entering the PDB ID, molecule name, or author, click '''Download Files''', then select '''PDB File (text)'''.  A window will open; click on '''Save File &rarr; Downloads'''.
+Go to PDB homepage (http://www.rcsb.org/pdb/home/home.do ) enter the protein ID (1LOQ) in the search bar, click Download Files in the top-right of the webpage, then select PDB File (text). In the new window, save the file in your ~00.files/ folder.
-===Preparing the Ligand and Enzyme in Chimera===
+===Preparing the ligand and receptor in Chimera===
-If you are on "herbie" you can access Chimera directly by typing in '''Chimera''' at the prompt.  Otherwise, download Chimera to your desktop and obtain your protein/nucleic acid complex of interest using '''Fetch''' (by ID).
+NOTE: If you are accessing files remotely (i.e. through a terminal) and want to use a visualization program such as Chimera, first transfer the files to your local computer, run Chimera, and transfer the files back.
-<u> Open Your Protein of Interest and Delete Unwanted Molecules: </u>
+In this section, we will create four new files and save them in the ~01.dockprep/ folder:
-Click on '''Open''' under the '''File''' menu and find where you saved your pdb file.
+LOQ.dockprep.mol2      (complete system with hydrogens and charges)
+LOQ.receptor.mol2      (the receptor alone)
+LOQ.receptor.noH.pdb   (the receptor alone, without hydrogens)
+LOQ.ligand.mol2        (the ligand alone)
-You only need one of the monomers to perform docking.  Remove chain B by carrying out the following:
+To prepare these files, first make a copy of the original PDB file in the ~00.files/ folder (named, for example, 1LOQ.copy.pdb) and open it with VIM ($ vim 1LOQ.copy.pdb). Because the residue name of the ligand (U) will give us some problems when assigning charges, change the residue name "U" to "LIG" starting at line 2082. Here is an example command that will change all instances of "  U" into "LIG", while preserving the correct spacing.  Note there are two blank spaces before U in the following expression:
-# '''Select''' &rarr; '''Chain''' &rarr; '''B''' &rarr; '''All'''.
+  %s/  U/LIG/gc
-# '''Actions''' &rarr;'''Atoms''' &rarr; '''Delete'''.
-To delete water molecules and/or other ligands, go to '''Tools''' &rarr; '''Structure Edit''' &rarr; '''Dock Prep'''.  Check all boxes and click okay.  This will walk you through the steps needed to prepare the complex for docking, and will also assign partial charges to the protein and the ligand.  Choose am 1-bcc for charges.
+For this command, '''g'''  is short for global and '''c''' is short for check with the user before making the change.
-Save the file as 1DF8.dockprep.mol2.
+Next, open up the PDB file (1LOQ.copy.pdb) in Chimera. To delete water molecules and other ligands, click ''Tools'' -> ''Structure Editing'' -> ''Dock Prep''. Check all boxes and click ''Okay'' to the end. Alternatively, the waters can be deleted manually by choosing ''Select'' -> ''Residue'' -> ''HOH'', then go to ''Actions'' -> ''Atoms/Bonds'' -> ''Delete''. Hydrogen atoms can be added manually by choosing ''Tools'' -> ''Structure Editing'' -> ''Add H''.
-[[Image :1DF8.dockprep.barbarav.png|thumb|375px|dockprep image]]
+Next, to add charges to the ligand and receptor,  go to ''Select'' -> ''Residue'' -> ''LIG'', then go to ''Tools'' -> ''Structure Editing'' -> ''Add Charge''. Choose ''AMBER ff99SB'' as the charge model, click ''Okay'', and when prompted chose AM1-BCC charges for the ligand, and make sure the ''Net Charge'' is set to ''-1''. (You must consider the chemistry of the ligand when assigning a charge state).
-This file contains a conformation of the complex with hydrogen atoms.  The grid calculation is based on the receptor with its hydrogen atoms.  The grid score is an energy calculation that is based on the following equation:
+Finally, save this file as '''1LOQ.dockprep.mol2''' into your ~01.dockprep/ folder.
-E <sub>GRID</sub> = E <sub>VDW</sub> + E <sub>ES</sub>.  The score is an approximation of the molecular mechanics' energy function and it considers only through space interactions.
+===Generating the final files===
-<u> Create a Receptor File: </u>
+To create the receptor file: Open 1LOQ.dockprep.mol2, click ''Select'' -> ''Residue'' -> ''LIG''. Then click ''Actions'' -> ''Atoms/Bonds'' -> ''Delete''. Save the file as '''1LOQ.receptor.mol2'''.
-Go to '''Select''' &rarr; '''Residue''' &rarr; '''BTN'''.  Then go to '''Actions''' &rarr; '''Atoms''' &rarr; '''Delete'''.
+To create the receptor file with no hydrogen atoms: Open 1LOQ.receptor.mol2, click ''Select'' -> ''Chemistry'' -> ''Element'' -> ''H'', then chose ''Actions'' -> ''Atoms/Bonds'' -> ''Delete''. Save the file as '''1LOQ.receptor.noH.pdb'''.
-Save the file as 1DF8.receptor.mol2.
+To create the ligand file: Open 1LOQ.dockprep.mol2, click ''Select'' -> ''Residue'' -> ''LIG'', then click ''Select'' -> ''Invert'', then chose ''Actions'' -> ''Atoms/Bonds'' -> ''Delete''. Save the file as '''1LOQ.ligand.mol2'''.
-<u> Create a Receptor File with No Hydrogen Atoms: </u>
+==III. Generating Receptor Surface and Spheres==
-Go to '''Select''' &rarr; '''Chemistry''' &rarr; '''Element''' &rarr; '''H''' &rarr; '''Delete'''.
+=== Receptor Surface ===
-Save a PDB file as 1DF8.receptor.noH.pdb.
+The next section of this tutorial will be performed in your ~02.surface-sphgen/ folder.
-<u> Create a Ligand File: </u>
+In order to generate the receptor surface one can perform the following steps (working in Chimera):
+# Go ''File -> Open'' and choose the PDB file of the protein containing no hydrogens (1LOQ.receptor.noH.pdb) from '''01.dockprep'''
+# Further, ''Actions'' -> ''Surface'' -> ''Show''
+# Go ''Tools'' -> ''Structure Editing'' -> ''Write DMS'' in order to obtain a ''dms'' file, which we will need while placing spheres
+# In the new window save the surface as '''1LOQ.receptor.dms'''
-Open only the 1DF8.dockprep.mol2.
-Go to '''Select''' &rarr; '''Chain''' &rarr; '''Delete'''.  This will allow you to have only the ligand.
+[[Image:Receptor surface.png|thumb|center|375px|Receptor surface]]
-Save the file as 1DF8.ligand.mol2.
+=== Placing Spheres ===
-These are the files that you will need to continue with rigid or flexible docking.
+For more information SPHGEN, please consult the DOCK online owners manual at:
-==III. Generating Receptor Surface and Spheres==
+http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm
+In order to place the spheres on the receptor surface, the following steps are required:
-===Receptor Surface===
+'''1.''' Create a file called '''INSPH''' (an input file we will need later) and fill it out as follows (parameter descriptions included to the right), then save it.
-To generate an enzyme surface, first open the receptor pdb file with the hydrogen atoms removed <b>(1DF8.receptor.noH.pdb)</b>. Next, go to <b>Actions -> Surface -> Show</b>. Note that for DOCK calculation hydrogen atoms are considered, but for generating enzyme surface and spheres, it is necessary to use the protein without hydrogens.
+LOQ.receptor.dms
+ R
+ X
+.0
+.0
+.4
+LOQ.receptor.sph
-[[Image:2012_DOCK_Tutorial_1DF8_surface.png|thumb|center|375px|2012_DOCK_Tutorial_1DF8_surface]]
+Here is a line-by-line break down on what each line does:
-Recent versions of Chimera include a <b>Write DMS</b> tool that facilitates calculation of the molecular surface. Go to <b>Tools -> Structure Editing -> Write DMS</b>. Save the surface as <b>1DF8.receptor.dms</b>.
+*1LOQ.receptor.dms - molecular surface file that we got from previous step
+*R - sphere outside of surface (R) or inside surface (L)
+*X - specifies subset of surface points to be used (X=all points)
+*0.0 - prevents generation of large spheres with close surface contacts(defalut=0.0)
+*4.0 - maximum sphere radius in angstroms (default=4.0)
+*1.4 - minimum sphere radius in angstroms (default=radius of probe)
+*1LOQ.receptor.sph - clustered spheres file that we want to generate
-[[Image:2012_DOCK_Tutorial_1DF8_dms.png|thumb|center|375px|2012_DOCK_Tutorial_1DF8_dms]]
+'''2.''' Run the sphgen program from the command line:
-The <b>Write DMS</b> tool will "roll" a small probe (default radius = 1.4 Angstroms = Size of a water molecule) over the surface of the enzyme and calculate the surface normal at each point. DMS (distributed molecular surface) file is subsequently used as input file for <b>sphgen</b>.
+ sphgen -i INSPH -o OUTSPH
-===Spheres===
+'''OUTSPH''' is the output file containing the information on the sphgen run, while '''1LOQ.receptor.sph''' is the file which contains the spheres.
-To generate docking spheres, we need to use a command line program called sphgen. To run the <b>sphgen</b>, we need a input file named <b>INSPH</b>.
+'''3.''' ''(optional)'' If one wants to have a look at the spheres received as an outcome of sphgen, one should run '''showsphere''', by typing in the command line:
-DF8.receptor.dms            #molecular surface file that we got from previous step
+  showsphere
- R                            #sphere outside of surface (R) or inside surface (L)
- X                            #specifies subset of surface points to be used (X=all points)
-.0                          #prevents generation of large spheres with close surface contacts(defalut=0.0)
-.0                          #maximum sphere radius in angstroms (default=4.0)
-.4                          #minimum sphere radius in angstroms (default=radius of probe)
-DF8.receptor.sph            #clustered spheres file that we want to generate
-Save the INSPH file. Then use this command to generate the spheres file:
+The following questions will be prompted on the command line:
-  sphgen -i INSPH -o OUTSPH
+  Enter name of sphere cluster file:
+ Enter cluster number to process (<0 = all):
+ Generate surfaces as well as pdb files (<N>/Y)?
+ Enter name for output file prefix:
+ Process cluster 0 (contains ALL spheres) (<N>/Y)?
--i means input; -o means output.
+As the questions have been answered the program will generate the spheres, writing the data in a '''PDB''' file, afterwards the spheres may be observed by opening this file in '''Chimera'''.
-You should get two output files: <b>OUTSPH</b> and <b>1DF8.receptor.sph</b>.
+'''4.''' Run the '''sphere_selector''' program from the command line as follows:
-The OUTSPH file should similar to this:
-  density type = X
+  sphere_selector 1LOQ.receptor.sph ../01.dockprep/1LOQ.ligand.mol2 8.0
- reading  1DF8.receptor.dms                                                     type   R
- # of atoms =    881   # of surf pts =  10771
- finding spheres for   1DF8.receptor.dms
- dotlim =     0.000
- radmax =    4.000
- Minimum radius of acceptable spheres?
-.4000000
- output to  1DF8.receptor.sph
- clustering is complete     28  clusters
-You can also open the spheres file that we generated in this step <b>(1DF8_receptor.sph)</b>. This file contains detailed information about the spheres, which are divided into 28 clusters. Cluster 0 in the end of the spheres file is a combination of all the clusters.
+This program selects the spheres within a user-defined radius (8.0 here) of a target molecule from the previously obtained file '''1LOQ.receptor.sph'''. As a result a new file '''selected_spheres.sph''' will be generated. Again, the spheres can and should be visualized by using '''showsphere''' and '''Chimera'''.
-In order to visualize the generated spheres, you can use a program called <b>showsphere</b>. Showsphere is an interactive program. In the command line, simply type
+'''5.''' Run '''showsphere''' in order to visualize the spheres:
   showsphere
-You will be asked the following questions:
+When prompted on the command line, answering the questions as follows:
- Enter name of sphere cluster file:
-DF8.receptor.sph
- Enter cluster number to process (<0 = all): -1
- Generate surfaces as well as pdb files (<N>/Y)? N
- Enter name for output file prefix:
-DF8.output.
- Process cluster 0 (contains ALL spheres) (<N>/Y)? N
-In this case, 28 pdb files should be generated.
-An alternative way to do this is to create an input file, <b>showsphere.in</b>, as follows:
-DF8.receptor.sph
+  selected_spheres.sph
   -1
   N
-DF8.output
+  output_spheres_selected
   N
-Then type the command <b>showsphere</b>
+* Launch '''Chimera''', choose ''File'' -> ''Open'', choose '''1LOQ.receptor.noH.pdb'''
+* Go ''File'' -> ''Open'', choosing '''output_spheres_selected.pdb'''
+* Go ''Select'' -> ''Residue'' -> ''SPH''
+* Finally, ''Actions'' -> ''Atoms/Bonds'' -> ''sphere''
- showshpere < showsphere.in
+The picture like one placed below will be obtained afterwards.
-After generating the pdb files by showsphere, you can visualize each cluster by UCSF Chimera. For example, open <b>1DF8.output_1.pdb</b> by Chimera, then choose <b>Actions->Atoms/Bonds->sphere</b>, you will be able to see the spheres in cluster 1. You can also open the receptor file <b>(1DF8.receptor.mol2)</b> at the same time, then choose <b>Presets->Interactive 3</b>(hydrophobicity  surface), then again choose <b>Actions->Atoms/Bonds->sphere</b>, you will be able to see what the spheres in cluster 1 look like on the enzyme surface.
+[[Image:Selected spheres.png|thumb|375px|center|Selected spheres]]
-[[Image:2012_DOCK_Tutorial_1DF8_spheres.png|thumb|center|375px|2012_DOCK_Tutorial_1DF8_spheres]]
+==IV. Generating Box and Grid==
-There are over 500 spheres in the spheres file<b>(1DF8.receptor.sph)</b>. However, we're only interested in docking the ligand into the active site. Therefore we need to select only those spheres which are inside the active site, using <b>sphere_selector</b> program.
+This section of the tutorial will be performed in your ~03.box-grid/ folder. For more information GRID, please consult the DOCK online owners manual at:
- sphere_selector 1DF8.receptor.sph ../01-dockprep/1DF8.ligand.mol2 10.0
+http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm
-Sphere_selector filters the output from <b>sphgen</b>, selecting all spheres within a user-defined radius of a target molecule. In this case, we selected the spheres within 10  angstroms of our ligand. A file called <b>selected_spheres.sph</b> should be created, showing the spheres that are selected. You can again visualize it by <b>showsphere</b>. You can also change the radius (to 8 angstroms or 6  angstroms) or manually edit the file  selected_spheres.sph so that you can select the spheres you want. In this tutorial, spheres within 6 angstroms of the ligand are used for the next step (with one sphere which does not belong to the active site being manually deleted).
+===Box===
-[[Image:2012_DOCK_Tutorial_1DF8_surface_spheres_10 angstroms.png|thumb|375px|left|2012_DOCK_Tutorial_1DF8_surface_spheres_10 angstroms]]
+We will create a box around the binding site to define the boundaries of ligand sampling and aid in energy calculations. As a general rule, we will create a box around the spheres (from the previous step) with margins of at least 8.0 Angstroms on all sides. You first need to create a '''showbox.in''' file to generate the box:
-[[Image:2012_DOCK_Tutorial_1DF8_surface_spheres_6 angstroms.png|thumb|375px|center|2012_DOCK_Tutorial_1DF8_surface_spheres_6 angstroms]]<BR>
+ vim showbox.in
-==IV. Generating Box and Grid==
+Edit this file as following:
-===Box===
+ Y                                           # Yes, generate a box
-In order to speed up docking calculations, DOCK generates a fine grid. At each point in the grid, electrostatic and the VDW probes' energies are precomputed. The energies are computed using a molecular force field.  To determine the dimensions of the grid, however, we first generate a box that contains the outer boundaries for the grid calculation.  The dimensions and location of the box can be determined using a program called '''showbox'''.
+.0                                         # box margin
+ ../02.surface-sphgen/selected_spheres.sph   # input sphere files
+                                           # which sphere cluster to use
+LOQ.box.pdb                                # output file name
+''Note: Do not include the comments in the showbox.in file''. To run the program '''showbox''', give the name of the program on the command line and provide the '''showbox.in''' file:
-First create a directory where you will place the grid files.
+ showbox < showbox.in
- $mkdir 03-box-grid
+[[Image:boxexample.png|thumb|500px|center|Box Image]]
- $cd 03-box-grid
-'''showbox''' can be used interactively or a file with predetermined answers can be fed into the program.
-The program asks the questions depicted in the diagram the right:
+===Grid===
-[[File:Flow chart.JPG|thumb|upright=2|right|Flow Chart of Questions for Showbox
-(Red path is followed in this tutorial)]]
-To run the program in the interactive mode, run
- $showbox
+Grids are used as a method to speed up evaluation of the energy scoring function by pre-computing electrostatic and VDW potential energies on the receptor at defined grid points within the box. VDW parameters are stored in a file that was installed with DOCK, and we will find it convenient to copy it and two other files to our ~00.files/ folder. In this class, DOCK was installed in the following location:
-To feed the answers to the questions, run
+ /home/wjallen/AMS536/local/dock6/
- $showbox < showbox.in
+You can determine the path to your DOCK installation by typing:
-For example, showbox.in can contain:
+  which dock6
-  Y
- ../02-surface-spheres/selected_spheres.sph
-DF8.box.pdb
-'''Y''' means we use automatic box construction, '''5''' is the extra margin to be enclosed around our ligand (in Angstroms), '''selected_spheres.sph''' is the sphere file we generated, '''1''' corresponds to the cluster number in the '''selected_spheres.sph''' file, and '''1DF8.box.pdb''' is the output file.  We can open the output box file in chimera to make sure the box is in the right place.
+From that location, copy the following three files to your local ~00.files/ folder:
-[[File:1DF8_box.png|thumb|upright=2|right|1DF8 receptor along with our ligand and the box we generated using showbox]]
-===Grid===
+ /home/wjallen/AMS536/local/dock6/parameters/vdw_AMBER_parm99.def
-Now let's generate a grid within our box.  We will use the energy scoring method to generate a grid, resulting in three additional files with extensions '''*.nrg''', '''*.bmp''', and '''*.out'''.  The '''*.nrg''' file contains the energy scoring, '''*.bmp''' contains the size, position and grid spacing and determines whether there are any overlaps with receptor atoms.
+  /home/wjallen/AMS536/local/dock6/parameters/flex.defn
+  /home/wjallen/AMS536/local/dock6/parameters/flex_drive.tbl
-To generate the grid we will use the grid program.  This program can either be used interactively, or an input file can be fed in, just like the '''showbox''' program.
+Use grid program to generate the grid and answer the questions as following:
-  Usage: grid [-i [input_file]] [-o [output_file]] ...
+  touch grid.in
-  [-standard_i/o] [-terse] [-verbose]
+ grid -i grid.in
-  -i: read from grid.in or input_file, standard_in otherwise
-  -o: write to grid.out or output_file (-i required),
-      standard_out otherwise
-  -s: read from and write to standard streams (-i and/or -o illegal)
-  -t: terse program output
-  -v: verbose program output
-For our grid.in file, we will use the following answers:
   compute_grids                  yes
-  grid_spacing                   0.3
+  grid_spacing                   0.4
   output_molecule                no
   contact_score                  no
@@ Line 240: / Line 219: @@
   bump_filter                    yes
   bump_overlap                   0.75
-  receptor_file                  ../01-dockprep/1DF8.receptor.mol2
+  receptor_file                  ../01.dockprep/1LOQ.receptor.mol2
-  box_file                       1DF8.box.pdb
+  box_file                       ../03.box-grid/1LOQ.box.pdb
-  vdw_definition_file            /opt/software/AMS536software/dock6/parameters/vdw_AMBER_parm99.defn
+  vdw_definition_file            ../00.files/vdw_AMBER_parm99.defn
-  score_grid_prefix              grid
+  score_grid_prefix              1LOQ.grid
-Line by line:
-#compute scoring grids (yes)
+[[Image:gridexample.png|thumb|500px|center|Grid Image]]
-#what is the distance between grid points along each axis (in Angstroms).
-#write up coordinates of the receptor into a new file
-#compute contact grid? default is no
-#compute energy score? yes - we are using this method to compute force fields on probes
-#the max distance between atoms for the energy contribution to be computed
-#atom_model '''u''' means united atom model where atoms are attached to hydrogens, and '''a''' stands for all-atom model, where hydrogens on carbons are treated separately
-#attractive component stands for exponent of the attractive LJ term in VDW potential
-#repulsive component stands for exponent in the repulsive LJ term in VDW potential
-#distance dielectric stands for the dielectric constant to be linearly dependent on distance
-#distance dielectric factor is the coefficient of the dielectric
-#bump filter flag determines if we want to screen orientation for clashes before scoring and minimization
-#bump_overlap stands for the fraction of allowed overlap where 1 corresponds to no allowed overlap and 0 corresponds to full overlap being permitted.
-#our receptor file
-#the box file we generated in the Box section
-#VDW parameters file
-#Prefix for the grid file name.  All the extensions will be generated automatically.
 ==V. Docking a Single Molecule for Pose Reproduction==
-===Docking and Results===
+This part of the tutorial will be performed in your ~04.dock/ folder. For more information DOCK, please consult the DOCK online owners manual at:
-Change directory into “04-dock” and create an empty input file called dock.in
- $touch dock.in     #create a file called “dock.in”
+http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm
-Run dock6.4
+===Rigid Docking===
- $dock6 -i dock.in
+When docking a molecule ''rigidly'' in DOCK, no conformations or rotamers will be sampled other than the input configuration which, in our case, was derived from the crystal structure. This will be much faster that ''flexible'' docking, but it is not appropriate for virtual screening because the docked conformation of compounds from a library is generally not known. First create an empty input file for docking:
+  touch dock.in
-Alternatively:
+Then execute DOCK6:
+  dock6 -i dock.in
-  $dock6 -i dock -v # “-v” option allows you to print out the information regarding each growth step on terminal
+You will be prompted for answers to many questions. For the rigid docking experiments, suggested parameters are as follows:
+  ligand_atom_file                                             ../01.dockprep/1LOQ.ligand.mol2
-Or:
+  limit_max_ligands                                            no
+ skip_molecule                                                no
-  $dock6 -i dock -o dock.bf.out -v # “-o” allows you to write the aforementioned information into an output file named “dock.bf.out”
+  read_mol_solvation                                           no
+  calculate_rmsd                                               no
-'''1st run''':
+  use_database_filter                                          no
+ orient_ligand                                                yes
-Notice that running dock6 with an empty input file will require you to answer a series of questions.  For the first run we will deactivate most of the features by selecting “no”.  Parameters requiring specification are listed below:
+ automated_matching                                           yes
+ receptor_site_file                                           ../02.surface-sphgen/selected_spheres.sph
-  ligand_atom_file [database.mol2] ():   	../01-dockprep/1DF8.ligand.mol2
+ use_internal_energy                                          no
-  ligand_outfile_prefix [output] ():   		1DF8.output
+ flexible_ligand                                              no
+  bump_filter                                                  no
-What the program is doing in the 1st run is to take the ligand.mol2 file and directly generate an output file without any additional manipulations.  You would expect it to be exactly the same as the original pose.  You can open it in Chimera by using the “ViewDock” function under “Tools->Surface/Binding Analysis”.  We will not show the result here.
+ score_molecules                                              yes
+ contact_score_primary                                        no
-'''2nd run:'''
+ contact_score_secondary                                      no
+  grid_score_primary                                           yes
-The real experiment begins here.  Notice that we selected “no” for most of the functions.  This time we will try change some parameters in the dock.in file.
+ grid_score_secondary                                         no
+  grid_score_rep_rad_scale                                     1
-  $vim dock.in
+  grid_score_vdw_scale                                         1
+  grid_score_es_scale                                          1
-Parameters being changed are listed below:
+  grid_score_grid_prefix                                       ../03.box-grid/1LOQ.grid
+ multigrid_score_secondary                                    no
-  orient_ligand                                               yes
+ dock3.5_score_secondary                                      no
+ continuous_score_secondary                                   no
-The "orient_ligand" option tells the program whether to try orientations different from the pose in the original .mol2 file.  Note that because of the change, additional questions are asked by the program.  For simplicity we keep most of the answers as default.  File paths that need to be specified are listed below:
+ descriptor_score_secondary                                   no
+ gbsa_zou_score_secondary                                     no
-  receptor_site_file                          [receptor.sph] ():../02-surface-spheres/selected_spheres_06A.sph
+ gbsa_hawkins_score_secondary                                 no
-  vdw_defn_file                               [vdw.defn] ():../03-grid/vdw_AMBER_parm99.defn
+ SASA_descriptor_score_secondary                              no
-  flex_defn_file                              [flex.defn] ():../03-grid/flex.defn
+ amber_score_secondary                                        no
-  flex_drive_file                             [flex_drive.tbl] ():../03-grid/flex_drive.tbl
+ minimize_ligand                                              no
+ atom_model                                                   all
-The receptor_site_file should be the selected spheres file (.sph) generated in a previous step (02 surface and spheres), according to which the program orients the ligand.
+  vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
+  flex_defn_file                                               ../00.files/flex.defn
+  flex_drive_file                                              ../00.files/flex_drive.tbl
+ ligand_outfile_prefix                                        output
+ write_orientations                                           no
+ num_scored_conformers                                        1
+ rank_ligands                                                 no
-The vdw_defn_file instructs the dock6 program to use the Van der Waals parameter sets from the AMBER force field.
+With these input parameters, the rigid docking experiment will result in a single scored ligand pose. Open Chimera to visualize the docking result. After opening the receptor file in surface view, open the ligand file (named "output_scored.mol2") by going under ''Tools'' -> ''Surface/Binding Analysis'' -> ''ViewDock''.
-The flex_defn_file and the flex_drive_file contain the information required by the program to sample conformations.
+[[Image:dock_original ligand.png|thumb|375px|center|Rigid docking result]]
-The result is shown below.  As you may notice the result doesn't look very good.
+===Flexible Docking===
-[[Image:DOCK_2nd_run_result.png|thumb|center|375px|alt=Example alt text|DOCK second Run result]]
+Flexible docking will take longer, but is necessary for virtual screening. For flexible docking, several input parameters need to be changed, and these changes will generate additional questions prompted to the user. Some changes to parameters are shown below:
-'''3rd Run:'''
-Here we will further specify parameters to improve the result.
   calculate_rmsd                                               yes
-  score_molecules                                              yes
+  flexible_ligand                                              yes
+ use_internal_energy                                          yes
+ minimize_ligand                                              yes
   num_scored_conformers                                        10
-Again, further questions will be asked when you run the program.  Keep most answers as the default values.  Paths requiring specification are listed below:
+Once these changes have been made to the input file, execute DOCK6 again:
+  dock6 -i dock.in
- grid_score_grid_prefix                                        [grid] ():../03-grid/1DF8.grid
+This flexible ligand protocol can be used with multiple ligand input files. The docked ligand poses can be compared using RMSD (Root Mean Squared Deviation). RMSD is a measurement of the distance between atoms on the original crystal structure ligand with the newly generated ligand conformation. An RMSD under 2 Angstroms would indicate a good conformation. Below are two examples of a final docked ligand in the receptor binding site pocket. The worst and best energy scored ligands are shown below. The original ligand is in cyan and the docked ligand is colored in heteroatom.
-Note that the above specification will tell the program to load the 1DF8.grid.nrg file you generated in the previous step (grid) for scoring.
+[[Image:dock_bestligand_and original.png|thumb|375px|center|Best scored ligand result (-78.3 kcal/mol)]]
- simplex_max_iterations                                        [1000] ():20
+[[Image:dock_worstligand_and original.png|thumb|375px|center|Worst scored ligand result (-67.4 kcal/mol)]]
- write_conformations                                           [yes] (yes no):no
- cluster_rmsd_threshold                                        [2.0] ():0.2
-The simplex_max_iteration parameter specifies number of minimization cycles.
+==VI. Virtual Screening==
-Note the program will cluster poses that are very close together (rmsd smaller than the threshold specified in the cluster_rmsd_threshold parameter) into a cluster.
+===Virtual Screening Introduction===
-This time the program returned the best 10 poses.  The one with the best grid score (-52.8, shown blue) superimposed quite well with the original pose in the crystal structure (RMSD = 0.71).  You can view the grid scores in ViewDock by selecting “Column->Show->Grid Score”
+A virtual screen of various ligand allows for the comparison of both qualitative (e.g. position in binding site) and quantitative (e.g. energy scores) data pertaining to the each screened ligand with an originally docked molecule.  Virtual screening is often used as a method to cut the cost of experimentation by narrowing down the ligands within a database and predicting which will exhibit the most favorable binding to a specific protein (with a pre-determined .grid file).
-[[Image:DOCK_3rd_run_result.png|thumb|center|375px|alt=Example alt text|DOCK third Run result]]
+===Running the Database Filter===
-'''4th Run:'''
+When first beginning the virtual screen, a filter should be specified in order save computational power by only screening potentially effective ligands within the directory ~06.database-filter/.  This involves limiting several variables of ligands specified in a .mol2 file.  To limit the ligands from the original database, a file "dock_filter.in" is created and executed using DOCK.  Note that in this example the file "cdiv_p0.0.mol2" is a multi mol2 file containing thousands of ligands from the CHEMDIV vendor which was previously downloaded from the UCSF ZINC database (https://zinc.docking.org/about/ucsf).  ZINC is a great resource of purchasable compounds for virtual screening. Here are the contents of the file:
-This time we will set the ligand as flexible:
+ ligand_atom_file                                             /home/wjallen/AMS536/multi-mol2-files/cdiv_p0.0.mol2
+ limit_max_ligands                                            no
+ skip_molecule                                                no
+ read_mol_solvation                                           no
+ calculate_rmsd                                               no
+ use_database_filter                                          yes
+ dbfilter_max_heavy_atoms                                     50
+ dbfilter_min_heavy_atoms                                     0
+ dbfilter_max_rot_bonds                                       10
+ dbfilter_min_rot_bonds                                       0
+ dbfilter_max_molwt                                           9999.0
+ dbfilter_min_molwt                                           0.0
+ dbfilter_max_formal_charge                                   -1.0
+ dbfilter_min_formal_charge                                   -2.0
+ orient_ligand                                                no
+ use_internal_energy                                          no
+ flexible_ligand                                              no
+ bump_filter                                                  no
+ score_molecules                                              no
+ atom_model                                                   all
+ vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
+ flex_defn_file                                               ../00.files/flex.defn
+ flex_drive_file                                              ../00.files/flex_drive.tbl
+ ligand_outfile_prefix                                        filtered
+ write_orientations                                           no
+ num_scored_conformers                                        1
+ rank_ligands                                                 no
-  flexible_ligand                                              yes
+In this file some important characteristics of the ligands that are limited are ''dbfilter_max_heavy_atoms'', ''dbfilter_max_rot_bonds'', ''dbfilter_max_molwt'', and both ''dbfilter_max_formal_charge'' and ''dbfilter_min_formal_charge''.  As show in the text of the file, a file called filtered_scored.mol2 will be created with all of the ligands from the original database that fit into the specified limits of number of heavy atoms, number of rotatable bonds, molecular weight, and formal charge. '''''Note:''''' ''The max and min formal charge specified in this file (-1 to -2) are an extremely limited range, and are only used in this tutorial as an illustration.''
-Further questions will be asked during the run.  Keep most of the answers as the default values except for:
+===Running the Virtual Screen===
- simplex_grow_max_iterations                                   [20] ():500
-The simplex_grow_max_iteration specifies the maximum number of iterations per cycle per growth step.
+After creating a new directory, ~07.virtual-screen/, a dock.in file must be created.  So as not to confuse the virtual screen file with the original dock.in of the docked ligand, the new file created is called dock_vs.in.  Running this file in dock6 allows us to fill in the important details of the file that terminate in a virtual screening of the input ligand.
-Notice that the run is significantly slower this time.  Again the best pose generated (-64.8) is shown in blue.  The grid score improved significantly but the RMSD did not change much (0.79)
+  ligand_atom_file                                             ../06.database-filter/filtered_scored.mol2
-[[Image:DOCK_4th_run_result.png|thumb|center|375px|alt=Example alt text|DOCK fourth Run result]]
-'''5th Run:'''
-This time we will turn the bump filter on:
- bump_filter                                                  yes
-The bump_filter option filters out conformations that cause clashes between atoms.
-Further questions will be asked during the run.  Keep answers as the default values.  Specify the following path:
- bump_grid_prefix                                              [grid] ():../03-grid/1DF8.grid
-Note that the path tells the program to access the .bmp file generated in the previous grid step.
-The best pose (grid score -65.3) is again shown in blue.  Notice that turning on the bump filter does not alter either the grid score or the RMSD (0.78).
-[[Image:DOCK_5th_run_result.png|thumb|center|375px|alt=Example alt text|DOCK fifth Run result]]
-Remember you can tweak any parameter in the dock.in file instead of keeping them as default.
-Following is the final dock.in file used during the fifth run:
- ligand_atom_file                                             ../01-dockprep/1DF8.ligand.mol2
   limit_max_ligands                                            no
   skip_molecule                                                no
   read_mol_solvation                                           no
-  calculate_rmsd                                               yes
+  calculate_rmsd                                               no
- use_rmsd_reference_mol                                       no
   use_database_filter                                          no
   orient_ligand                                                yes
   automated_matching                                           yes
-  receptor_site_file                                           ../02-surface-spheres/selected_spheres_06A.sph
+  receptor_site_file                                           ../02.surface-sphgen/selected_spheres.sph
-  max_orientations                                             500
+  max_orientations                                             1000
   critical_points                                              no
   chemical_matching                                            no
@@ Line 398: / Line 361: @@
   internal_energy_rep_exp                                      12
   flexible_ligand                                              yes
-  min_anchor_size                                              40
+ user_specified_anchor                                        no
+ limit_max_anchors                                            no
+  min_anchor_size                                              5
   pruning_use_clustering                                       yes
-  pruning_max_orients                                          100
+  pruning_max_orients                                          1000
   pruning_clustering_cutoff                                    100
-  pruning_conformer_score_cutoff                               25
+  pruning_conformer_score_cutoff                               100
   use_clash_overlap                                            no
   write_growth_tree                                            no
@@ Line 414: / Line 379: @@
   grid_score_vdw_scale                                         1
   grid_score_es_scale                                          1
-  grid_score_grid_prefix                                       ../03-grid/1DF8.grid
+  grid_score_grid_prefix                                       ../03.box-grid/1LOQ.grid
+ multigrid_score_secondary                                    no
   dock3.5_score_secondary                                      no
   continuous_score_secondary                                   no
+ descriptor_score_secondary                                   no
   gbsa_zou_score_secondary                                     no
   gbsa_hawkins_score_secondary                                 no
+ SASA_descriptor_score_secondary                              no
   amber_score_secondary                                        no
   minimize_ligand                                              yes
@@ Line 436: / Line 404: @@
   simplex_restraint_min                                        no
   atom_model                                                   all
-  vdw_defn_file                                                ../03-grid/vdw_AMBER_parm99.defn
+  vdw_defn_file                                                ../00.files/vdw_AMBER_parm99.defn
-  flex_defn_file                                               ../03-grid/flex.defn
+  flex_defn_file                                               ../00.files/flex.defn
-  flex_drive_file                                              ../03-grid/flex_drive.tbl
+  flex_drive_file                                              ../00.files/flex_drive.tbl
-  ligand_outfile_prefix                                        1DF8.output
+  ligand_outfile_prefix                                        output
   write_orientations                                           no
-  num_scored_conformers                                        10
+  num_scored_conformers                                        1
- write_conformations                                          no
- cluster_conformations                                        yes
- cluster_rmsd_threshold                                       0.2
   rank_ligands                                                 no
-Now we will write up a script for submitting your dock job to Seawulf.  Create a script called “sub.dock.csh”
- $vim sub.dock.csh
+A few of the questions addressed in the terminal window while running a virtual screen are as follows:
-wherein you write:
+'''ligand_atom_file''' - here we have specified the new file created after filtering the initial ligand database file
- #! /bin/tcsh
+'''calculate_rmsd''' - we select the default <no> because since we are screening a library of compounds, there will be no reference position to compare them to
- #PBS -l nodes=1:ppn=1
- #PBS -l walltime=01:00:00
- #PBS -o zzz.qsub.out
- #PBS -j oe
- cd ~/AMS536/DOCK_tutorial/05-dock_qsub
- /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out
-This will request 1 processor from the cluster for your job.  When you are submitting the job:
+'''orient_ligand''' - here we would like to the have the ligand oriented in the binding pocket
- $qsub sub.dock.csh
+'''receptor_site_file''' - this is our selected_spheres.sph file from earlier
-Note that you might have to make the script executable before running it:
+'''internal_energy_rep_exp''' - here we select 12 rather than 9 because using a repulsive exponent of 12 gives us a slightly steeper rising potential as radius between the atoms falls below the optimal distance (sigma)
- $chmod +x sub.dock.csh
+'''flexible_ligand''' - when performing a virtual screen of a random database it is always advisable to set the ligands to flexible
-===Results===
+'''min_anchor_size''' - ensures that major features on the molecules are used as anchors and not just simple small groups
-==VI. Virtual Screening==
+'''score_molecules''' - we always want to score the molecules of a virtual screen in order to determine which are the best fit for the binding pocket
-===Virtual Screening Preparation===
+'''grid_score_primary''' - we want the grid score to be the primary method of scoring the screened ligands
-<b>Virtual screening</b> is a widely used method in computational drug design. It searches large libraries of chemical compounds to identify favorable structures that bind to a target molecule. To perform virtual screening, we use <b>ligands.mol2</b>, a mol2 file which contains 101 small molecules to be the virtual library. The computational cost is reasonable for a quick search. Usually we use larger molecule set from chemical database, such as <b>ZINC</b>(http://zinc.docking.org/).
-Since the computational cost of virtual screening is much higher than single-ligand docking, it is better for us to run it on Seawulf. We need to compress the whole <b>DOCK-Tutorial</b> directory, copy it to Seawulf and compress it.
+'''grid_score_grid_prefix''' - the input for the .grid file created earlier
- tar -cvf DOCK-Tutorial.tar DOCK-Tutorial/
- scp DOCK-Tutorial sw:/nfs/user03/usrname/AMS536
- tar -xvf DOCK-Tutorial.tar DOCK-Tutorial
-===Virtual Screening Protocol===
+'''atom_model''' - we select the all atom model
-The purpose of virtual screening is different from single molecule docking, so we need to modify our previous docking script <b>dock.in</b> to <b>vs.in</b>. We can see the difference between the two files.
-c1
- < ligand_atom_file                                             ligands.mol2
- ---
- > ligand_atom_file                                             ../01-dockprep/1DF8.ligand.mol2
-,59c56,59
- < vdw_defn_file                                                /nfs/user03/sudipto/dock6/parameters/vdw_AMBER_parm99.defn
- < flex_defn_file                                               /nfs/user03/sudipto/dock6/parameters/flex.defn
- < flex_drive_file                                              /nfs/user03/sudipto/dock6/parameters/flex_drive.tbl
- < ligand_outfile_prefix                                        1DF8.vs.output
- ---
- > vdw_defn_file                                                ../03-box-grid/vdw_AMBER_parm99.defn
- > flex_defn_file                                               ../03-box-grid/flex.defn
- > flex_drive_file                                              ../03-box-grid/flex_drive.tbl
- > ligand_outfile_prefix                                        1DF8.output
-c61
- < num_scored_conformers                                        1
- ---
- > num_scored_conformers                                        10
-c63,64
- < cluster_conformations                                        no
- ---
- > cluster_conformations                                        yes
- > cluster_rmsd_threshold                                       0.2
-<b>num_scored_conformers: 1 -> 10</b> In virtual screening, we only need the most favorable pose of each candidate molecule and compare them.<br>
+'''num_scored_conformers''' - we only want to produce one scored conformer from each screened ligand because we only want the best conformations
-<b>cluster_conformations: yes -> no</b> Slightly different conformations are not clustered together. They are treated as different conformations in the docking process.
-In order to generate different search spaces, we can modify some other parameters.
+Note that if you are performing a small sample virtual screening it is possible to run it on a local computer in a reasonable amount of time; however, if you are running the virtual screening to a very large scale (i.e. hundreds or thousands of ligands) you will need to use several nodes in Seawulf to produce results in a timely fashion.
-<b>max_orientations:</b> The maximal number of anchor orientations that will be generated.<br>
+===Analyzing the Results===
-<b>min_anchor_size:</b> The minimum number of atoms for an anchor.<br>
-<b>pruning_use_clustering:</b> Pruning conformations during the clustering process.<br>
-<b>use_internal_energy:</b> Using repulsive VDM to avoid internal clashes.<br>
-Parallel computing can reduce the running time of virtual screening. Here is our job submission script <b>sub.virtual_screen.csh</b>.
+When your virtual screening has run completely and you are ready to analyze the results, open Chimera.  In Chimera you can prepare your protein by opening 1LOQ.receptor.noH.pdb and choosing ''Actions'' -> ''Surface'' -> ''show''.  You may also wish to open up the original ligand file 1LOQ.ligand.mol2 in order to show the position of the original ligand/substrate in the receptor's active site.
- #! /bin/tcsh
- #PBS -l nodes=4:ppn=2
- #PBS -l walltime=01:00:00
- #PBS -o zzz.qsub.out
- #PBS -j oe
- #PBS -V
- set nprocs = `wc -l $PBS_NODEFILE | awk '{print $1}'`
-  echo "Running on ${nprocs} processors"
- echo ""
- echo "processor list are:"
- cat $PBS_NODEFILE
- cd ~/AMS536/DOCK_Tutorial/06-virtualscreen
- mpirun -np $nprocs dock6.mpi -i vs.in -o vs.out
-Finally, we can submit the job and perform virtual screening.
- qsub sub.virtual_screen.csh
-===Virtual Screening Results===
+To view the results of your virtual screening go to ''Tools'' -> ''Surface/Binding Analysis'' -> ''ViewDock'' and from their you can select your virtual screen output file.  In the "ViewDock" pop-up window you can then select ''Column'' --> ''Show'' --> ''Grid Score'' which will add a column to the ViewDock window with the grid score of each ligand.  You can double-click on the column heading ''Grid Score'' to sort the ligands based on their grid scores.
-After performing virtual screening on Seawulf or any other computer, if you are using a parallel computer, you should get a multi-mol2 file <b>1DF8.vs.output_scored.mol2</b> which contains the mol2 files of all succesfully docked ligands, a <b>vs.out</b> file which contains the dock results of successfully docked ligands, and <b>vs.out.1</b> through <b>vs.out.7</b> which contain dock results from different nodes you use (the number will vary according to the number of nodes you use, here we use 8 nodes as mentioned before).
-The <b>vs.out</b> file is returned by the leading node and contains the information of each succesfully docked ligands, looks like this:
+[Chimera image1] [Chimera image2]
- Molecule: ZINC33171556
- Anchors:               1
- Orientations:          500
- Conformations:         116
-    Grid Score:          -52.373383
-      Grid_vdw:          -51.643253
-       Grid_es:           -0.730129
-    Int_energy:            6.125062
-The <b>vs.out.1</b> through <b>vs.out.2</b> files are returned by other nodes processing those ligands, separately. For those succesfully docked ligands, the file will return the elapsed time for docking, and for those not succesfully docked, it will return an error massege like this:
+==VII. Running DOCK in Serial and in Parallel on Seawulf==
- Molecule: ZINC20605433
+The [http://www.stonybrook.edu/seawulfcluster/ Seawulf] Cluster is a 470-processor Linux Cluster capable of highly parallel processing. This parallel processing allows dock virtual screens to be completed in a fraction of the time as a single processor.
- Elapsed time:  0 seconds
- ERROR:  Could not complete growth.
-         Confirm that the grid box is large enough to contain the ligand,
-         and try increasing max_orientations.
-If you download the multi-mol2 file from seawulf using:
+If you are docking multiple ligands, you can use more than one processor in parallel mode, but you should never use more processors than you have ligands. Before we can run DOCK on Seawulf, we need to copy the proper files from Herbie to Seawulf. If we CD into the AMS536 folder we can use the following command from the mathlab computer to copy all of the dock-tutorial files
- scp sw:~/AMS536/DOCK_tutorial/06-virtual-screen/1DF8.vs.output_scored.mol2 ./
-Now you have this multi-mol2 file <b>1DF8.vs.output_scored.mol2</b> on your local machine, you can actually open this file in Chimera to visually check you docking results and do some visual analysis. But it's not a good idea to directly open your multi-mol2 file because it contains information of all 47 succesfully docked ligands, if you just open this file, it will be pretty messy. So what you will do is, first you open the receptor file <b>1DF8.receptor.mol2</b> and the ligand <b>1DF8.ligands.mol2</b> as a reference. And then use the ViewDock function of Chimera to look at your 47 ligands one or however many you want at a time. You can find ViewDock via <b>Tools</b> -> <b>Surface/Binding Analysis</b> -> <b>ViewDock</b>, and then find your <b>1DF8.vs.output_scored.mol2</b> file in your own directory and click on open. Now a new window of ViewDock will pop out and there will be an extra ligands in Chimera main window which is the highlighted ligand in ViewDock window. You can look at the ligands one at a time, you can also hold <b>ctrl</b> and click on different ligands to view them at the same time, this will give you a direct idea of how good these ligands dock. The other thing you can do is, the multi-mol2 file <b>1DF8.vs.output_scored.mol2</b> contains the energy score of every ligand, so in ViewDock window, you can go to <b>Column</b> -> <b>show</b> -> <b>Grid Score</b> to show the grid score, and then you can click on the head of the column to rank order all the ligands by their grid scores.The picture showing here is the best and worst scored ligands of out calculation, the best scroed one in cyan and the worst scored on in magenta, and the reference ligand is colored according to elements. As you can see, the best scored ligand fits in the binding pocket very well, but the worst scored one almost sticks out of the binding pocket.
+ scp -r /dock-tutorial/  username@herbie.mathlab.sunysb.edu:~/AMS536/
-[[Image:DOCK_tutorial_2012_VirtualScreeningSample.png|thumb|center|375px|Example of Virtual Screening Result]]
+Now we have all of our DOCK preparation files and folders on the seawulf cluster.
-==VII. Running DOCK in Serial and in Parallel on Seawulf==
+===Running DOCK in Serial on a Single Processor===
-The Seawulf Cluster has 235 dual processor nodes (2 processors per node), for a total of 470 individual processors. These are 3.4Ghz Intel Pentium IV Xeon processors from Dell. Each node has 2GB attached RAM and a 40GB hard disk.
-Typically you will use one processor on a single node to dock one ligand.  If you are docking multiple ligands, you can use more than processor in parallel mode, but you should never use more processors than you have ligands.
-===Running DOCK in Serial on a Single Processor===
+Running on a single processor is very similar to running dock on the mathlab computer.
-The following sample code can be used to run DOCK on one processor on a single node:
+If you make a file called qsub.csh with the text:
   #!/bin/tcsh
@@ Line 586: / Line 472: @@
   #PBS -o pbs.out
-  cd /nfs/user03/sudipto/DOCK_Tutorial
+  cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen
-  /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out
+  /nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out
-Here is an explanation of the code and format:
+An explanation of the commands:
   #!/bin/tcsh                                                 #Execute script with tcsh
   #PBS -l nodes=1:ppn=1                                       #Use one node, and one processor per node, so one single processor
   #PBS -l walltime=01:00:00                                   #Allow 1 hour for your job run
   #PBS -N dock6                                               #Name of your job
   #PBS -M user@ic.sunysb.edu                                  #Get an email notifying you when your job is completed
@@ Line 599: / Line 485: @@
   #PBS -o pbs.out                                             #Name of your output file
-  cd /path-to-your-home-directory-on-seawulf/DOCK_Tutorial    #Change to your home directory and folder with dock files
+  cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen       #Change to your home directory and folder with dock files
-  /nfs/user03/sudipto/dock6/bin/dock6 -i dock.in -o dock.out  #Specifies path to dock executable and provide input and output filenames
+  /nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out #Specifies path to dock executable and provide input and output filenames
-===Running DOCK in Parallel using MPI===
+To submit the experiment use the command:
-The following sample code can be used to run DOCK on 4 nodes, using both processors on each node, for a total of 8 processors.
+  qsub qsub.csh
-  #! /bin/tcsh
- #PBS -l nodes=4:ppn=2
- #PBS -l walltime=24:00:00
- #PBS -o zzz.qsub.out
- #PBS -j oe
- #PBS -V
- set nprocs = `wc -l $PBS_NODEFILE | awk '{print $1}'`
- echo "Running on ${nprocs} processors"
- cd /nfs/user03/sudipto/DOCK_Tutorial
- cp /nfs/user03/sudipto/dock6/parameters/vdw_AMBER_parm99.defn .
- cp /nfs/user03/sudipto/dock6/parameters/flex* .
- mpirun -np $nprocs /nfs/user03/sudipto/dock6/bin/dock6.mpi -i dock.in -o dock.out
-For more information, see [[PBS Queue]]
+You will have submitted a DOCK experiment to the seawulf queue.
-===Serial Calculation for Pose Reproduction===
+See also [http://ringo.ams.sunysb.edu/index.php/PBS_Queue PBS] commands.
-===Parallel Virtual Screen===
+===Running DOCK in Parallel using MPI===
-==VIII. Molecular Footprints==
-===Preparation (minimization) of Reference Molecule===
-The footprint of the reference molecule guides the selection of molecules/poses in the footprint rescoring method, therefore careful preparation of this reference molecule is necessary. Typically the reference molecule will be an endogenous substrate or known inhibitor from an x-ray crystallographic structure.
-Preferably, the molecule should be prepared with hydrogens and charges using AMBER and subsequently converted to mol2 format using the 'top2mol2' command.  This will provide you with a 'monotonic' mol2 file -- meaning that all the hydrogens will be listed with the corresponding protein residue, rather than separately at the end of the file (like Chimera's 'DockPrep' tool does).
-An alternative way to generate the "monotonic" receptor file is to open the receptor.mol2 file with Chimera, save as pdb file, reopen the pdb file and go through DockPrep, and again save as a mol2 file. In this case, the new mol2 file will be monotonic.
-Once the molecule is prepared, it should be minimized in the receptor using a tethered minimization.  This is especially important if the crystal structure was refined without hydrogens (typically hydrogens are not used in refinement unless the resolution is ~1 Angstrom or better).  Minimization is needed to help alleviate any steric clashes involving the newly added hydrogens (which would otherwise disrupt your footprint results).
-An example input file for use on seawulf (using dock6.5):
- ligand_atom_file                                             ligand.mol2
- limit_max_ligands                                            no
- skip_molecule                                                no
- read_mol_solvation                                           no
- calculate_rmsd                                               yes
- use_rmsd_reference_mol                                       no
- use_database_filter                                          no
- orient_ligand                                                no
- use_internal_energy                                          yes
- internal_energy_rep_exp                                      12
- flexible_ligand                                              no
- bump_filter                                                  no
- score_molecules                                              yes
- contact_score_primary                                        no
- contact_score_secondary                                      no
- grid_score_primary                                           no
- grid_score_secondary                                         no
- dock3.5_score_primary                                        no
- dock3.5_score_secondary                                      no
- continuous_score_primary                                     yes
- continuous_score_secondary                                   no
- cont_score_rec_filename                                      receptor.mol2
- cont_score_att_exp                                           6
- cont_score_rep_exp                                           12
- cont_score_rep_rad_scale                                     1
- cont_score_use_dist_dep_dielectric                           yes
- cont_score_dielectric                                        4.0
- cont_score_vdw_scale                                         1
- cont_score_es_scale                                          1
- descriptor_score_secondary                                   no
- gbsa_zou_score_secondary                                     no
- gbsa_hawkins_score_secondary                                 no
- amber_score_secondary                                        no
- minimize_ligand                                              yes
- simplex_max_iterations                                       1000
- simplex_tors_premin_iterations                               0
- simplex_max_cycles                                           1
- simplex_score_converge                                       0.1
- simplex_cycle_converge                                       1.0
- simplex_trans_step                                           1.0
- simplex_rot_step                                             0.1
- simplex_tors_step                                            10.0
- simplex_random_seed                                          0
- simplex_restraint_min                                        yes
- simplex_coefficient_restraint                                10.0
- atom_model                                                   all
- vdw_defn_file                                                /nfs/user03/tbalius/zzz.programs/dock6.5/parameters/vdw_AMBER_parm99.defn
- flex_defn_file                                               /nfs/user03/tbalius/zzz.programs/dock6.5/parameters/flex.defn
- flex_drive_file                                              /nfs/user03/tbalius/zzz.programs/dock6.5/parameters/flex_drive.tbl
- ligand_outfile_prefix                                        xtalmin
- write_orientations                                           no
- num_scored_conformers                                        1
- rank_ligands                                                 no
-===Molecular footprints comparisons===
-After rescoring of the molecules by footprint-based score, typically three output files will be generated: a mol2 file showing the ligands information, a footprint.txt file showing the footprints infomation, and a hbond.txt file showing the hydrogen bond information. The footprint.txt file contains per residue interaction of the receptor with the ligand, for both reference molecule and the docked molecule. Shown below is an example of VDW footprints comparison.
-[[File:2012_Dock_Tutorial_Footprints_1.png|thumb|375px|center|2012_DOCK_Tutorial_Footprints_1]]
-[[File:2012_Dock_Tutorial_Footprints_2.png|thumb|375px|center|2012_DOCK_Tutorial_Footprints_2]]
-Actually the upper picture and the bottom picture are comparing the same molecule to a same reference molecule. The difference is that the reference molecule for the upper case is the original ligand.mol2 without energy minimization! Obviously, the reference molecule is not correct in this case. As you can see, several very disfavored VDW interactions were observed for the reference molecule. In contrast, the bottom picture shows the correct comparison.
-==IX. Frequently Encountered Problems==
-===Barbara===
-'''How to Open Files'''
-If you want to view a file in chimera, first transfer the file from "herbie" or "seawulf" to your laptop using WinSCP.  Then open Chimera on your laptop to open the files.
-===Woo Suk===
-====Distinguishing overlapped spheres====
-Sometimes your 1DF8.output_1.pdb can have some overlapped spheres. In this case, you cannot find them before you delete one sphere and you have to repeat this one more time to delete another one.<BR>
-In order to avoid the situation, you can adjust a parameter in <b>INSPH</b> file like following:
-DF8.receptor.dms
- R
- X
- <b>0.001</b>
-.0
-.4
-DF8.receptor.sph
-You can change the fourth parameter from 0.0 to other values such as 0.1, 0.01, or 0.001. Because what you want is just to distinguish the very close spheres, you don't need large numbers.
-===Longfei===
-<b>Installation of DOCK</b>
-If you want to install DOCK on your own desktop or laptop, and you are a beginner of Unix/Linux, you might encounter some problems during installation. A good reference for the installation is the DOCK User Manual(http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm). The problem I encountered was: after I used gnu as configuration file to configure the Makefile, I successfully made the <b>config.h</b> file, but when I was trying to build the DOCK executables, an error message " g77 command not found " appeared. And it is quite inconvenient to install the g77 command. A way to solve this problem is to open the <b>config.h</b> file, and manually change the <b>g77</b> in that file to <b>gfortran</b>.
-<b>Use the Correct Path of Your File</b>
-One frequently encountered problem is using the wrong path of your file. For example, when you use <b>sphere_selector</b> to generate spheres around the ligand, it requires two files--one is the spheres file .sph and another is your ligand file, but your ligand file is probably not in your working directory! So you need either copy your ligand file to the current directory or specify the correct path of your ligand file like below:
-sphere_selector 1DF8.receptor.sph ../01-dockprep/1DF8.ligand.mol2 10.0
-This is also important when you run DOCK. Remember to use the correct path and name of the your file.
-<b>Notice the Shell That You are Using</b>
-This tutorial is based on the shell <b>tcsh</b>, and maybe you are using a different shell. This will not cause any trouble in most of the time, but you may want to notice this in some circumstances, for example when you want to execute some <b>.csh</b> file. One way to solve this problem is that simply type "tcsh" in the command line, and you will be running tcsh in another shell.
-===Roman===
-One of the problems one may encounter with grid creation is the sampling location for ligand binding.  It is important to create a grid and dock the ligand to the correct location in the protein.  If the grid is created in the incorrect location, the ligand binding sites will be sampled within the grid, and the best binding site will not be sampled.  In our case, we are docking it to an area where a ligand is known to bind to streptavidin.  However, this may not always be the case.
-One solution is to dock to the center of mass of the spheres clusters created with spheregen and assess the center of mass as viable or not based on your chemical intuition.
-Another way is to assess the protein for buried sites where there may also be a number of spheres.  Both are options in the showbox program.
-Finally, one can attempt to find proteins with similar sequence and determine where their ligand binding sites are, and then pick the binding site in the protein you are docking to using that information.
-===Quan===
-===Hui===
-'''Keep the previous dock6 output .mol2 files'''
-When we run dock6 program, the output file would be "1DF8.output_scored.mol2", with the prefix "1DF8.output" that you specified in dock.in file. In the case that you would have several runs of dock6 program with the output file of the same path (in the same directory), and you want to keep the result for each run, you need to rename "1DF8.output_scored.mol2" file to another one, otherwise the old .mol2 file will be overwritten by the new .mol2 file generated in the next run.
+In order to run DOCK in parallel you have to use a slightly different build of DOCK6 called dock6.mpi.
+Message passing interface ([http://www.mpi-forum.org/docs/docs.html MPI]) is basically a program that allows programs like DOCK to run in parallel.
-For example, you can rename the old .mol2 file generated by the first run and then perform the second run.
+So, make another file called qsub.vs.csh with the contents:
- $mv 1DF8.output_scored.mol2 first_1DF8.output_scored.mol2
-===Tuoling===
-'''Some recommended parameters we can change when running dock'''
- max_orientations                                             500
- min_anchor_size                                              40
- pruning_conformer_score_cutoff                               25
- pruning_clustering_cutoff                                    100
-Explanations of these parameters can be found on the website provided by Long Fei.
-Increase the number of these parameters can lead to increased running time. But it enables more thorough cycling and may give you better results.
-===Yuchen===
-====Always Remember Your Path====
-One situation often appears is that the program will return an error saying "Could not open 1DF8.ligands.mol2 for reading." or something like that indicating that the program couldn't find the input file you specified in you <b>dock.in</b> or <b>vs.in</b> or some other files. In this case, the most often reason is that the path you specified in your input file is wrong. So the first thing you might want to do is to check those paths, pay special attention to those paths where you use "<b>.</b>" to indicate the same folder and "<b>..</b>" to indicate the folder up one level because this is the place where you are most likely to make mistakes. If you are not sure of the usage of "<b>.</b>" and "<b>..</b>", it will be safer for you to use the absolute paths of the files. And you can use the command <b>pwd</b> to ask the system where you are if you don't know that.
-===Yunting===
-====Delete Spheres Manually====
-We need to play some tricks if we want to manually select the spheres in <b>selected_spheres.sph</b>.
-. Transfer <b>sph</b> file to <b>pbd</b> file by using <b>showsphere</b>.<br>
-. Open the <b>pdb</b> file in Chimera and choose the sphere we want to delete.<br>
-. Identify the number of that sphere by <b>Actions->Label->residue->specifier</b>.<br>
-. Open <b>sph</b> file, delete that sphere and modify the number of spheres in that cluster.<br>
- DOCK spheres within 6.0 ang of ligands
- cluster     1   number of spheres in cluster    22
-  28.86000  11.09173   4.97943   2.337  586 0  0
-  28.67840   9.74620  12.13940   1.400   60 0  0
-  27.11509  13.42709  13.48051   1.401  385 0  0
-  27.79940  12.82468  12.04078   2.475  480 0  0
-  34.13903  14.01393   7.31591   3.590  691 0  0
-  36.78641  17.02434   9.54436   3.481  691 0  0
-  30.01753  14.79655  14.46633   1.402  174 0  0
-  25.74631  14.83304   9.98248   1.400  589 0  0
-===Kip===
-====Molecule preparation for molecular footprinting====
-A sample script:
   #!/bin/tcsh
+  #PBS -l nodes=4:ppn=2
-  #check programs
+  #PBS -l walltime=24:00:00
-  foreach prog (antechamber parmchk tleap)
+ #PBS -N screen
-         set prog_version = `which $prog`
+ #PBS -o qsub.log
-         if ( $prog_version == /nfs/user03/tbalius/amber10_ompi/bin/$prog ) then
+ #PBS -j oe
-                 echo $prog OK, current path is: `which $prog`
+  #PBS -V
-         else
-                 echo please double-check $prog path, current path is: `which $prog`
-         endif
-  end
-  #set your home directory on seawulf
+  cd /nfs/user03/username/AMS536/dock-tutorial/07.virtscreen
- set mydir = "$HOME"
+  mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out
- # set up your ligands
- foreach lig (ligand_name1 ligand_name2 ligand_name3)
-         #set working dir and go there
-         echo setting up directory for: $lig
-         mkdir  "$mydir/DOCK_project2/10-ligandprep-top2mol2/$lig"
-         set workdir = "$mydir/DOCK_project2/10-ligandprep-top2mol2/$lig"
-         cd $workdir
-         cp ../$lig.mol2 .
-         echo 'Done!'
-         #convert mol2 to pdb and prep files
-         echo converting mol2 to pdb and prep files
-         antechamber -i $lig.mol2 -fi mol2  -o $lig.ante.pdb  -fo pdb
-         antechamber -i $lig.mol2 -fi mol2  -o $lig.ante.prep -fo prepi
-         echo 'Done!'
-         #check for missing parameters
-         echo checking parameters for ligand: $lig
-         parmchk -i $lig.ante.prep -f  prepi -o $lig.ante.frcmod
-         echo 'Done!'
-         #kludgey way to make input files
-         echo making input file for ligand: $lig
-         echo "set default PBradii mbondi2" > tleap.$lig.in
-         echo "source leaprc.ff99SB" >> tleap.$lig.in
-         echo "loadoff ../rizzo_amber7.ionparms/ions.lib" >> tleap.$lig.in
-         echo "loadamberparams ../rizzo_amber7.ionparms/ions.frcmod" >> tleap.$lig.in
-         echo "source leaprc.gaff" >> tleap.$lig.in
-         echo "loadamberparams ../rizzo_amber7.ionparms/gaff.dat.rizzo" >> tleap.$lig.in
-         echo "loadamberparams $lig.ante.frcmod" >> tleap.$lig.in
-         echo "loadamberprep $lig.ante.prep" >> tleap.$lig.in
-         echo "lig = loadpdb $lig.ante.pdb" >> tleap.$lig.in
-         echo "saveamberparm lig $lig.gas.leap.prm7 $lig.gas.leap.rst7" >> tleap.$lig.in
-         echo "solvateBox lig TIP3PBOX 10.0" >> tleap.$lig.in
-         echo "saveamberparm lig $lig.wat.leap.prm7 $lig.wat.leap.rst7" >> tleap.$lig.in
-         echo "charge lig" >> tleap.$lig.in
-         echo "quit" >> tleap.$lig.in
-         echo 'Done!'
-         #run tleap
-         echo running tleap for ligand: $lig
-         tleap -s -f tleap.$lig.in > tleap.$lig.out
-         echo 'Done!'
-         #make a new mol2 file
-         echo creating final mol2 file for ligand: $lig
-         top2mol2 -p $lig.gas.leap.prm7 -c $lig.gas.leap.rst7 -o $lig.gas.leap.mol2
-         echo 'Done!'
-         #make a new pdb file
-         echo creating final pdb file for ligand: $lig
-         ambpdb -p $lig.gas.leap.prm7 -tit "pdb" < $lig.gas.leap.rst7 > $lig.gas.leap.pdb
-         echo 'Done!'
- end
-====Keeping your data in sync====
-One problem you may encounter is that you want to run your jobs on Seawulf, but you also want the same files on a mathlab computer (or your home computer) for analysis. One option is to use RSYNC to keep your files on Seawulf and Herbie (or any mathlab computer) in sync. For example, if you want to move your DOCK Tutorial files to Seawulf, do the following from Herbie or any mathlab computer:
- rsync -arv /home/username/DOCK_Tutorial/ sw:/nfs/user03/username/DOCK_Tutorial
-*Note: The trailing slash on /home/username/DOCK_Tutorial'''/''' means that rsync will only copy the ''contents'' of your DOCK_Tutorial folders.  If you want to copy the DOCK_Tutorial folder itself, as well as its contents, then remove the trailing /.
-To copy newer files from Seawulf back to Herbie, do the following from Seawulf:
-  rsync -arv /nfs/user03/username/DOCK_Tutorial/ username@compute.mathlab.sunysb.edu:/home/username/DOCK_Tutorial
-You can apply this same strategy to then sync files with your home computer as well. Use the following format:
- rsync -arv source target
-*Note: this is easiest if you are using a Linux or Mac computer at home.
+As you can see there are two major changes:
-====Deleting jobs====
+ #PBS -l nodes=4:ppn=2 #Use 4 nodes, and 2 processors per node, so 8 processors
-If you make a mistake and need to delete a job from the queue, first list all your queued or running jobs:
- qstat -u username
+Note: since one processor is used to distribute the processes, this will run DOCK as 7 (n-1) parallel processes.
-*IMPORTANT: If you get no output from qstat, it means that whatever jobs you have submitted are done!
+ mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out #this line uses mpi to run dock.mpi on multiple processors
-If you do have jobs running or waiting in the queue, qstat will output a list that includes their job id(s).  Find the job id for the one you want to delete, and do:
+And then we can run:
- qdel jobid
+  qsub qsub.vs.csh