2012 DOCK tutorial with Streptavidin

From Rizzo_Lab
Revision as of 10:38, 27 February 2012 by Stonybrook (talk | contribs) (IV. Generating Box and Grid)

For additional Rizzo Lab tutorials see DOCK Tutorials.


I. Introduction


DOCK is a molecular docking program used in drug discovery. It was developed by Irwin D. Kuntz, Jr. and colleagues at UCSF (see UCSF DOCK). This program, given a protein binding site and a small molecule, tries to predict the correct binding mode of the small molecule in the binding site, and the associated binding energy. Small molecules with highly favorable binding energies could be new drug leads. This makes DOCK a valuable drug discovery tool. DOCK is typically used to screen massive libraries of millions of compounds against a protein to isolate potential drug leads. These leads are then further studied, and could eventually result in a new, marketable drug. DOCK works well as a screening procedure for generating leads, but is not currently as useful for optimization of those leads.

DOCK 6 uses an incremental construction algorithm called anchor and grow. It is described by a three-step process:

  1. Rigid portion of ligand (anchor) is docked by geometric methods.
  2. Non-rigid segments added in layers; energy minimized.
  3. The resulting configurations are 'pruned' and energy re-minimized, yielding the docked configurations.

Streptavidin & Biotin

Streptavidin is a tetrameric prokaryotic protein that binds the co-enzyme biotin with extremely high affinity. The streptavidin monomer is composed of eight antiparallel β-strands, which fold into a β-barrel tertiary structure. A biotin binding site is located at one end of each β-barrel and has both a high affinity and a high avidity for biotin. Four identical streptavidin monomers associate to give streptavidin's tetrameric quaternary structure. The biotin binding site in each barrel consists of residues from the interior of the barrel, together with a conserved Trp120 from a neighboring subunit. In this way, each subunit contributes to the binding site on the neighboring subunit, and so the tetramer can also be considered a dimer of functional dimers.

Biotin is a water-soluble B-complex vitamin composed of a ureido (tetrahydroimidizalone) ring fused with a tetrahydrothiophene ring. It is a co-enzyme required in the metabolism of fatty acids and leucine, and it is also involved in gluconeogenesis.

Organizing Directories

While performing docking, it is convenient to adopt a standard directory structure / naming scheme, so that files are easy to find / identify. For this tutorial, we will use something similar to the following:


The following sections in this tutorial will refer back to files within these directories.

II. Preparing the Receptor and Ligand

Downloading the PDB Structure

Preparing for DOCK with Chimera

III. Generating Receptor Surface and Spheres

Receptor Surface

To generate the receptor surface, first open the receptor PDB file with the hydrogen atoms removed (1DF8.receptor.noH.pdb) in Chimera. Next, go to Actions -> Surface -> Show. Note that DOCK calculations take hydrogen atoms into account, but the receptor surface and spheres must be generated from the protein without hydrogens.


IV. Generating Box and Grid


To speed up docking calculations, DOCK generates a fine grid and precomputes, at each grid point, the electrostatic and van der Waals (VDW) energies of a probe. The energies are computed using a molecular force field. To determine the dimensions of the grid, however, we first generate a box that defines the outer boundaries of the grid calculation. The dimensions and location of the box can be determined using a program called showbox.

First create a directory where you will place the grid files.

$mkdir 03-box-grid
$cd 03-box-grid

Showbox can be used interactively or a file with predetermined answers can be fed into the program.

The program asks the questions depicted in the following flow chart:

Flow Chart of Questions for Showbox (Red path is followed in this tutorial)

To run the program in interactive mode, run

$showbox


To feed the answers to the questions, run

$showbox < showbox.in

For example, showbox.in can contain:

Y
5
selected_spheres.sph
1
1DF8.box.pdb


Y means we use automatic box construction, 5 is the extra margin (in Angstroms) to be enclosed around the selected spheres, selected_spheres.sph is the sphere file we generated, 1 is the cluster number within selected_spheres.sph, and 1DF8.box.pdb is the output file. We can open the output box file in Chimera to make sure the box is in the right place.
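
To illustrate what automatic box construction with a margin amounts to, the sketch below computes an axis-aligned box around a set of points and pads it on every side. It is only an illustration, not showbox's actual code: it assumes the box is built from sphere centers alone, and both the helper name box_extents and the coordinates are hypothetical.

```shell
# Hypothetical helper: print the corners of an axis-aligned box that encloses
# the x y z points read from standard input, padded by a margin (in Angstroms).
box_extents() {
    awk -v m="$1" '
        NR == 1 { for (i = 1; i <= 3; i++) { lo[i] = $i + 0; hi[i] = $i + 0 } }
        {
            for (i = 1; i <= 3; i++) {
                v = $i + 0
                if (v < lo[i]) lo[i] = v
                if (v > hi[i]) hi[i] = v
            }
        }
        END {
            printf "min %.1f %.1f %.1f\n", lo[1] - m, lo[2] - m, lo[3] - m
            printf "max %.1f %.1f %.1f\n", hi[1] + m, hi[2] + m, hi[3] + m
        }'
}

# Made-up sphere centers, with the 5 Angstrom margin used in showbox.in.
box_extents 5 <<'EOF'
10.0 20.0 30.0
12.5 18.0 33.0
9.0 22.5 31.5
EOF
```

For these made-up points the function prints min 4.0 13.0 25.0 and max 17.5 27.5 38.0. A larger margin gives the ligand more room to move during docking, but it also enlarges the grid that has to be computed.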

1DF8 receptor along with our ligand and the box we generated using showbox


Now let's generate a grid within our box. We will use the energy scoring method, which produces three additional files with the extensions *.nrg, *.bmp, and *.out. The *.nrg file contains the energy scoring grid. The *.bmp file contains the size, position, and spacing of the grid and records whether each grid point overlaps with receptor atoms. The *.out file is the output log of the grid program.
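
After grid has been run, a quick way to confirm that all three files were written is sketched below. The function name check_grid_outputs is hypothetical. The prefix is passed as an argument, and "grid" is used here because that is the score_grid_prefix chosen in this tutorial. It also assumes the log was written to a file with the same prefix.

```shell
# Report which of the expected grid output files are present and non-empty.
# Returns 0 when all three exist, 1 otherwise.
check_grid_outputs() {
    prefix=$1
    missing=0
    for ext in nrg bmp out; do
        if [ -s "$prefix.$ext" ]; then
            echo "found   $prefix.$ext"
        else
            echo "missing $prefix.$ext"
            missing=1
        fi
    done
    return "$missing"
}

check_grid_outputs grid || echo "grid outputs incomplete; check grid.in and rerun grid"
```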

To generate the grid we will use the grid program. This program can either be used interactively, or an input file can be fed in, just like the showbox program.

Usage: grid [-i [input_file]] [-o [output_file]] ...
 [-standard_i/o] [-terse] [-verbose]
 -i: read from grid.in or input_file, standard_in otherwise
 -o: write to grid.out or output_file (-i required), 
     standard_out otherwise
 -s: read from and write to standard streams (-i and/or -o illegal)
 -t: terse program output
 -v: verbose program output
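
Based on the usage text above, a non-interactive run could be wrapped as follows once grid.in is in place. This is a minimal sketch: the wrapper name run_grid is hypothetical, and it assumes the grid executable is on your PATH and that grid.in sits in the current directory.

```shell
# Run grid non-interactively, reading answers from grid.in and logging to grid.out.
run_grid() {
    if ! command -v grid >/dev/null 2>&1; then
        echo "grid executable not found on PATH" >&2
        return 127
    fi
    if [ ! -r grid.in ]; then
        echo "grid.in not found in $(pwd)" >&2
        return 1
    fi
    grid -i grid.in -o grid.out
}

# Only attempt the run when an input file is present in the current directory.
if [ -r grid.in ]; then
    run_grid
fi
```

Checking for the executable and the input file first gives a clearer message than a bare "command not found" when the environment is not set up.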

For our grid.in file, we will use the following answers:

compute_grids                  yes
grid_spacing                   0.3
output_molecule                no
contact_score                  no
energy_score                   yes
energy_cutoff_distance         9999
atom_model                     a
attractive_exponent            6
repulsive_exponent             9
distance_dielectric            yes
dielectric_factor              4
bump_filter                    yes
bump_overlap                   0.75
receptor_file                  ../01-dockprep/1DF8.receptor.mol2
box_file                       1DF8.box.pdb
vdw_definition_file            /opt/software/AMS536software/dock6/parameters/vdw_AMBER_parm99.defn
score_grid_prefix              grid

Line by line:

  1. compute_grids: compute the scoring grids (yes).
  2. grid_spacing: the distance between grid points along each axis (in Angstroms).
  3. output_molecule: whether to write the receptor coordinates to a new file (no).
  4. contact_score: whether to compute the contact grid; the default is no.
  5. energy_score: whether to compute the energy grid (yes); this is the scoring method we use, with force-field energies precomputed for probes at each grid point.
  6. energy_cutoff_distance: the maximum distance between atoms for which an energy contribution is computed.
  7. atom_model: u selects the united-atom model, in which hydrogens bonded to carbons are merged into the carbon atoms; a selects the all-atom model, in which hydrogens on carbons are treated separately.
  8. attractive_exponent: the exponent of the attractive Lennard-Jones (LJ) term in the VDW potential.
  9. repulsive_exponent: the exponent of the repulsive LJ term in the VDW potential.
  10. distance_dielectric: whether the dielectric constant depends linearly on distance (yes).
  11. dielectric_factor: the coefficient of the dielectric.
  12. bump_filter: whether to screen orientations for clashes before scoring and minimization (yes).
  13. bump_overlap: the fraction of allowed overlap, where 1 means no overlap is allowed and 0 means full overlap is permitted.
  14. receptor_file: our receptor file.
  15. box_file: the box file we generated with showbox.
  16. vdw_definition_file: the VDW parameters file.
  17. score_grid_prefix: the prefix for the grid file names. All extensions are generated automatically.
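
Taken together, attractive_exponent, repulsive_exponent, distance_dielectric, and dielectric_factor determine the pairwise energy that is accumulated on the grid. As a sketch, assuming the usual form of a grid-based force-field score, the energy for the settings above can be written as follows. The symbols are introduced here and do not appear in the tutorial: A_ij and B_ij are the VDW repulsion and attraction parameters, q_i and q_j are partial charges, r_ij is the interatomic distance, and 332 converts the electrostatic term to kcal/mol.

```latex
\[
E = \sum_{i \in \text{ligand}} \sum_{j \in \text{receptor}}
\left( \frac{A_{ij}}{r_{ij}^{9}} - \frac{B_{ij}}{r_{ij}^{6}}
+ 332\,\frac{q_i q_j}{D(r_{ij})\, r_{ij}} \right),
\qquad D(r_{ij}) = 4\, r_{ij}
\]
```

With the distance-dependent dielectric, the electrostatic term falls off as 1/r^2 rather than 1/r, and the 6-9 exponents give a softer repulsive wall than the more common 6-12 form.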

V. Docking a Single Molecule for Pose Reproduction



VI. Virtual Screening

Virtual Screening Protocol

Virtual Screening Results

VII. Running DOCK in Serial and in Parallel on Seawulf

Use PBS Queue as a reference.

Serial Calculation for Pose Reproduction

Parallel Virtual Screen

VIII. Frequently Encountered Problems