2021 DOCK tutorial 4 with PDBID 1EFY
- 1 Introduction
- 2 Preparing the Ligand and Receptor
- 3 Surface Generation and Spehere Selection
- 4 Generating Box and Grid
- 5 Single Molecule Docking and Pose Reproduction
- 6 Virtual Screen
- 7 Cartesian Minimization
- 8 Rescore
DOCK is a member of a large suite of molecular docking packages. Notably, it was the first molecular docking package of its kind, and it was created by Irwin Kuntz’s group in the 1980s. The original authors of DOCK include Brian K. Shoichet, David A. Case, and Robert C.Rizzo. Molecular docking plays a key role in computer aided drug design (CADD). Generally, molecular docking allows us to explore the interactions between small molecules, or ligands, and proteins by evaluating the shape and energetics of the ligand. We will be using the 6.9 release of DOCK, the latest as of November 2018, for this tutorial. New features of DOCK6.9 include DOCK_DN, a new ligand searching method for ligand de novo design and fragment library generation, which allows users to generate their own fragment libraries from mol2 files. For a comprehensive list of what’s new in DOCK6.9, click here.
In this tutorial, we will be using DOCK to predict the interactions between a small molecule inhibitor and a protein target. Here is the general workflow that we will be following:
- Download a PDB file from the Protein Data Bank
- Add protons and partial charges to the small molecule inhibitor and the target (independently).
- Calculate the molecular surface of the target protein.
- Generate spheres surrounding the binding site.
- Calculate the bounding box and energy grid for the target protein.
- Match the ligand atoms to the sphere centers.
- Score the putative ligand atom centers using the energy grid.
- Rank the orientations by score.
UCSF Chimera is a standalone program that allows us to visualize and analyze macromolecules. General features of Chimera include automatic identification of atom, hydrogen addition and partial charge assignment, measurements of distances, angles, surface area, volume, and many more. For a complete description of everything Chimera can do, see https://www.rbvi.ucsf.edu/chimera/.
In this tutorial, we will be working with the crystal structure of a member of the Poly (ADP-Ribose) polymerase (PARP) family. Specifically, we will be working with the catalytic fragment of PARP along with a benzimidazole inhibitor. This crystal structure has a resolution of 2.20 Å and an R-Value free of 0.274. PARP family proteins are involved in DNA repair, and it has been shown that PARP may allow cancer cells to repair their DNA after exposure to chemotherapy, which allows for resistance to develop . As such, it is of particular interest to develop novel inhibitors for PARP. The goal for this tutorial is to demonstrate an end to end example of docking the benzimidazole inhibitor against the PARP target, as well as perform a virtual screen to find other small molecule inhibitors of PARP.
In order to successfully follow this tutorial, please make sure to have the following programs installed:
- MobaXterm (for Windows users)
- Chimera 1.15 (Please choose the right specifications for your OS)
- DOCK 6.9
We will be using a shell environment (either the built-in terminal for MacOS/Linux users or MobaXTerm for Windows users) to run several of the programs in this tutorial. As such, it may be useful to familiarize yourself with running basic UNIX commands and VIM.
In addition, this tutorial assumes you have access to a High Performance Computing cluster, i.e. Seawulf.
Before we begin our molecular docking and virtual screen, it will be useful to establish our directory structure.
First ssh into your account on Seawulf:
ssh <your netid>@login.seawulf.stonybrook.edu
Navigate into your student directory using the cd command. Once you are in your student directory, create a new directory for this tutorial and name it 1EFY. The shell command to make a new directory is mkdir.
Now move into the 1EFY directory. Next, we will be creating the following directories:
mkdir 001.structure 002.surface_spheres 003.gridbox 004.dock 005.virtual_screen 006.virtual_screen_mpi 007.cartesianmin 008.rescore
In order to move back up the tree out of the current directory, use:
Preparing the Ligand and Receptor
For this tutorial, we will be utilizing the Protein Data Bank for information about our target protein. The Protein Databank is a database containing the structural data for biological macromolecules. Visit the Protein Data Bank and search for 1EFY. You should be redirected to a page containing the X-ray crystal structure of our protein, as well as the primary literature about the protein.
Click on Download files > PDB Format. This file contains the crystal structure of the catalytic fragment of Poly (ADP-ribose) Polymerase complexed with a benzimidazole inhibitor. It contains information about the atomic coordinates, secondary structure assignments, and atomic connectivity of the protein and ligand.
We will be visualizing the structure in Chimera. Open Chimera and click File > Open. Once we select the PDB file that we have just downloaded, we should see the following.
We can see the receptor, the benzimidazole inhibitor, and two water molecules. In order to see the complex from different angles or positions, we may perform three dimensional manipulations. It is possible to rotate, translate, or scale the the view of the structures using the mouse. Please refer to this documentation to see which mouse controls are appropriate for your setup.
We will next describe how to separate the receptor and ligand into two separate MOL2 formatted files.
Preparing the Receptor
In order to prepare our system for DOCK, we must perform some preprocessing steps, since DOCK can only process Tripos MOL2 formatted files as input. Luckily, we can use Chimera to create two new MOL2 files out of the original PDB file: one for the receptor and one for the ligand. In addition, prior to docking, we must also use Chimera to specify the charges for every atom in the receptor and ligand.
Let's first create the MOL2 file for the receptor. Using the original PDB structure, we will strip everything but the receptor. In Chimera, choose
Select > Residue > all nonstandard
The benzimidazole inhibitor and the two water molecules should now be highlighted like this:
Again in Chimera, select
Actions > Atoms/Bonds > delete
This leaves us with a “cleaned” version of the receptor:
We can now save the receptor as a MOL2 file. Click
File > Save Mol2…
Save the file as 1EFY_rec_woH.mol2 where woH indicates the receptor without hydrogens. Next we must add hydrogens and charges, which is required for docking. Select
Tools > Structure Editing > AddH
A window should pop up like this:
Select OK. Now each residue in the receptor should be properly protonated. However, it is important that we validate the hydrogens that Chimera has added, as they may not be correct.
Similarly, we must add partial charges to each atom in the receptor. Select
Tools > Structure Editing > Add Charge.
Make sure that AM1-BCC is selected in the window that pops up and select OK. Now each atom of the receptor should have the appropriate partial charge. As with the protonation states, it is important to double check Chimera’s work.
This receptor, which has now been prepared for DOCK, can be saved as a MOL2 file named 1EFY_rec_dockprep.mol2.
We will now prepare the ligand in a similar fashion. Open the original PDB structure in Chimera. In this case, we will be deleting all of the constituents in the structure except for the benzimidazole inhibitor. Click
Select > Residue > BZC.
This should highlight the inhibitor. Click
Select > Invert (all models).
This should highlight everything except the inhibitor. Click
Actions > Atoms/Bonds > delete
Only the ligand should now be present:
We will save this structure as MOL2 file.
File > save as mol2 > 1EFY_ligand_noH.mol2
where noH indicates no hydrogens. Just as before, we must now add the hydrogens and charge.
We will save this charged and protonated structure as a MOL2 file:
File > save as mol2 > 1EFY_ligand_H.mol2
Now that we have prepared our files for DOCK, we can scp or rsync all four of the files into the 001.structure directory.
Surface Generation and Spehere Selection
Identification of the binding site on a receptor is driven by identifying its concave surfaces. Once the molecular surface has been calculated, we can generate overlapping spheres to describe the negative image of the molecular surface. The centers of the spheres constituting the largest cluster represents possible positions where the ligand can bind. In Chimera, we can calculate the molecular surface according to the following steps: Open “1EFY_rec_woH.mol2” Select Actions > Surface > Show
This shows the solvent excluded molecular surface of the receptor. Now we can use DMS, an open source program, to calculate the molecular surface of the receptor. Tools > Structure Editing > Write DMS and save the file name as “1EFY_rec_surphace.dms” This file can be moved to 002 Now that the molecular surface has been calculated and written to the DMS file, we can construct spheres to fill in any concave regions of the surface of the receptor by using the sphgen program in DOCK. The centers of the spheres represent positions of ligand atoms. The template for sphere generation using sphgen looks like the following:
INPUT FILE*: rec.ms #molecular surface file R #sphere outside of surface (R) or inside surface (L) X #specifies subset of surface points to be used (X=all points) 0.0 #prevents generation of large spheres with close surface contacts (default=0.0) 4.0 #maximum sphere radius in angstroms (default=4.0) 1.4 #minimum sphere radius in angstroms (default=radius of probe) rec.sph #clustered spheres file
Our file looked like this:
1efy_rec_surface.dms R X 0.0 4.0 1.4 1EFY_receptor_noH.sph
You can run the sphgen program like this:
sphgen -i INSPH -o OUTSPH
Chimera can read the output of the sphgen program
Generating Box and Grid
Many individual floating point calculations are required for the computation of Lennard-Jones and Coulombic Interactions. Calculating the nonbonded interactions for a macromolecule with 100,000 atoms would be prohibitively expensive. We can calculate the overlapping volume of two sets of atoms using a grid based method. The intuition behind this method is the following: Generate a bounding box that contains all of the atoms. Generate grid points with a predefined separation. Tradeoff between grid separation and computational resources We will be using the accessory programs showbox and grid in order to generate the grids required for grid based scoring (which is based on the non-bonded terms of the molecular mechanic force field). The first step is to generate a box enclosing the active site of the receptor. Create an input file for showbox: Touch showbox.in
compute_grids yes grid_spacing 0.4 output_molecule no contact_score no energy_score yes energy_cutoff_distance 9999 atom_model a attractive_exponent 6 repulsive_exponent 9 distance_dielectric yes dielectric_factor 4 bump_filter yes bump_overlap 0.75 receptor_file ../00.files/1efy_receptor.mol2 box_file 1EFY.box.pdb vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn score_grid_prefix grid
We can pipe in the input file into the showbox with the command “showbox < box.in” The output of the program will be a PDB file containing the structure of the bounding box
- The grid calculation can take up to 45 minutes*
The Grid program generates the grid that allows for rapid score evaluation in DOCK
Single Molecule Docking and Pose Reproduction
In order to meaningfully dock new ligands into the receptor, the current model must be verified for its accuracy. To accomplish this, the co-crystallized ligand will be energy minimized and docked into the 1efy receptor. The resulting pose should closely resemble the crystal structure. If significant deviations are noted, the ligand and receptor should be scrutinized for correct charge, protonation, and resonance.
First the input file should be constructed
dock6 -i min.in
Input the following
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001.dockprep/3jqz_lig_withH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001.dockprep/3jqz_lig_withH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003.gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min yes simplex_coefficient_restraint 10.0 atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6/parameters/flex_drive.tbl ligand_outfile_prefix 3jqz.lig.min write_orientations no num_scored_conformers 1 rank_ligands no
Rigid docking is the simplest and lowest cost of the three main docking methods, as it essentially attempts to dock the desired ligand in its native pose, only allowing 3d transformation and rotations, as well as very minor torsional corrections.
Create the input file
dock6 -i rigid.in
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file 1EFY_ligand_min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename 1EFY_ligand_min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../02.surface_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../03.grid_box/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix rigid.out write_orientations no num_scored_conformers 1 rank_ligands no
At this point, it is important to visualize the resulting pose in comparison to the crystallized ligand. Open both in Chimera along with the receptor file to ensure no major changes have occurred