2022 DOCK tutorial 2 with PDBID 4ZUD

From Rizzo_Lab
Revision as of 14:35, 28 February 2022 by Stonybrook (talk | contribs) (Ligand Minimization)
Jump to: navigation, search



DOCK is a molecular modeling program capable of sampling lower-energy ligand conformations with respect to a binding surface on a given protein. DOCK utilizes and manipulates the geometry of the ligand to find the conformation that yields that most favorable interaction with the respective binding site. With this tool, millions of molecules can be rapidly screened against a target protein for the purposes of identifying new drug molecules that are physiologically relevant.

For more information on DOCK and it's uses, please refer to their online manual: DOCK6 Manual

4ZUD System

Directory Preparation

Before beginning, create the following directories in your space so that all necessary files are organized and can be access quickly:

mkdir 001.structure 002.surface_spheres 003.gridbox 004.dock 005.virtual_screen  006.cartesian_min 007.rescore

You don't have to name your directories the same as they are named here, but be cautious since the files that will be used for this tutorial utilize this naming scheme. They will need to be changed in each file that refers to them if you don't use this naming scheme!

Be sure to have Chimera installed on your system as it will be our primary visualization and system-editing program.

Protein and Ligand Preparation

Download the 4ZUD PDB file from the RCSB PDB website and open the file in Chimera.

Select -> Open -> (pathway to pdb file on your local machine) -> Open
4zud pdb.png

You will notice a few side chain residues are explicitly displayed; those are the ones that directly engage with the ligand. The structure also has some missing regions denoted by the dashed-lines. These regions do not have to be modeled to use the system for docking since the majority of the protein remains restrained during the process (except for the residues of the active site, to a certain extent). You can play around with Chimera and visualize the protein from different angles to get a complete look at the protein to ensure there are no glaring errors in the structure that could have somehow arose from the downloading and opening process (Doesn't usually happen, but it's always good to be sure before moving on!)

Protein Preparation

Many structures deposited in the PDB lack hydrogens due to the difficulty in resolving their electron densities from cryo-EM or X-ray crystallography. The structures also lack formal charges since that information is not captured with out current experimental structure-determining techniques. Both charges and hydrogens are crucial for accurately studying any chemical system, and so they both must be added manually to 4ZUD in order to prime the system for docking.

First we want to delete the ligand from the file since we want to save the protein separately

Select -> Residue -> OLM
Actions -> Atoms/Bonds -> Delete

4ZUD Structure Caveat

In the 4ZUD paper it mentions that ILE53 was mutated to an Alanine, but when you load the PDB into Chimera, it is recognized as an isoleucine but with an alanine side-chain.

4zud pdb mutated I53A (1).png

There is no immediately obvious explanation for this, but since it is being recognized as an isoleucine, we're just going to edit the side chain to be the correct one. ILE53 can be selected by using the following command on the Chimera Command Line interface which could be accessed under Tools:

Tools -> General Controls -> Command Line

Selection Command:

select #1:53

Make sure that you're selecting model 1 or whatever your current model is ranked at under Tools -> General Control -> Model Panel

Once residue 53 is selected, you can change the side-chain atoms by selecting a new rotamer type:

Tools -> Structure Editing -> Rotamers

In the Choose Rotamer Parameters window, select the ILE rotamer type and press apply.

In the ILE 53.A Side-Chain Rotamers window, choose the highest probability rotamers and press apply.

4zud rotamers ILE53 chimera rotamers window.png

You should now see your alanine side chain change into an isoleucine one. It will remove the hydrogens so you have to add those back in, just for this residue.

Adding Hydrogens

To select the entire protein:

Select -> Chain -> A

To add the hydrogens to the protein:

Tools -> Structure Editing -> AddH

All residues in the protein should now have all of the hydrogens that were missing. Make sure to look at the output log of this command just in-case any errors arise, although there should be none if the instructions were followed thus far.

Adding Charges

There is a similar Chimera command to add charges to your protein selection:

Tools -> Structure Editing -> Add Charge

After using this command you should receive an error stating that Correct charges are unknown for 3 non-standard atom names in otherwise standard residues. If you look at those atoms in the reply log, they're hydrogens belonging to ILE53. If you take a look at the paper that accompanies the 4ZUD structure, those hydrogens were replaced by tritium for crystallization purposes and Chimera does not recognize them as standard atoms and doesn't have predefined partial charges for them. Since Chimera doesn't recognize them, it will not apply charges to them. Additionally, since they are just hydrogens (meaning that their charge contributions to the system is often times very minimal) and the residue is not near the ligand active site, we do not have to do anything further in terms of adding charges.

The Add Charge command predicts the protein net charge to be -2.913. Make sure that this value is in agreement with what the PDB website or the corresponding paper says about the charge, if that information is provided.

File Saving

Once you have completed all of the aforementioned steps, you have to save the protein as a mol2 file.

File -> Save Mol2

For the purposes of this tutorial the file will be called 4ZUD_protein_hydrogens.mol2

You will want to save a version of this file without hydrogens. To do that, you can select and delete all of the hydrogens like so:

Select -> Chemistry -> element -> H
Actions -> Atoms/Bonds -> Delete

You can then resave the file. 4ZUD_protein_without_hydrogens.mol2

Ligand Preparation

Now we want to focus on preparing the ligand. We can reopen the unmodified 4ZUD PDB in Chimera and delete the protein from the file. You can do this by doing an inverse selection for the protein. First select the ligand:

Select -> Residue -> OLM

On your keyboard, press Shift + Right-Arrow keys simultaneously to invert the selection to the parts that belong to the protein. You can then delete your selection, and you should be left with just the ligand:

Actions -> Atoms/Bonds -> Delete

Adding Hydrogens

To add hydrogens to the ligand, you can follow the same procedure that you used to add them to the protein § Adding Hydrogens.

Adding Charges

You can follow the same procedures to add the charges to the ligand as well § Adding Charges.

The Add Charge command predicts a net -1 charge for the ligand. This makes sense since there is a carboxylate group in the molecules. Again, make sure to corroborate this with what the paper states and/or the PDB website states about the charge.

File Saving

You can then save the file the same way you did for the protein mol2. For the purposes of this tutorial, the file will be named 4ZUD_ligand_hydrogens.mol2

Before going any further, make sure to copy the mol2 files that were generated thus far from your local computer to Seawulf. Place these files in the 001.structure directory, or whichever one is the equivalent directory in your records.

Surface Spheres Generation

Grid Box

Grid Generation


Next, we'll need to generate the grid we'll use in subsequent steps. Let's start by creating our grid.in file:

   vi grid.in

Into this file copy-paste the following:

 compute_grids                             yes
 grid_spacing                              0.4
 output_molecule                           no
 contact_score                             no
 energy_score                              yes
 energy_cutoff_distance                    9999
 atom_model                                a
 attractive_exponent                       6
 repulsive_exponent                        9
 distance_dielectric                       yes
 dielectric_factor                         4
 bump_filter                               yes
 bump_overlap                              0.75
 receptor_file                             ../001_structure/4zud_receptor_wcharges_wH.mol2
 box_file                                  4zud.box.pdb
 vdw_definition_file                      /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
 score_grid_prefix                        grid

Now that there is a grid.in file, the grid can be produced by running the following:

    grid -i grid.in -o gridinfo.out

Upon successful completion of the grid generation, there should be three new files in your working directory: grid.bmp, grid.nrg, and gridinfo.out.

Box Generation


Ligand Minimization

While the ligand in the 4ZUD crystal structure seems to be in a reasonable pose (no steric clashes, etc.), and is of course bound to the receptor, it may not be representative of the lowest energy conformation. Assuming this is the case, the use of the non-minimized ligand conformation will reduce the accuracy of any calculations we perform with it.

One can avoid this problem by performing an energy minimization of the ligand. The first step to do this with 4ZUD will be to cd into the appropriate directory:

   cd 004.dock

We'll be doing some other work in this directory later, so we should create a directory within this one specifically for performing our minimization:

  mkdir min

cd into /min/ and produce an input file:

   vi min.in

Into which the following should be copy-pasted:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             ../001_structure/4zud_ligand_wcharges_wH.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      ../001_structure/4zud_ligand_wcharges_wH.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes   
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1  
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0 
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        yes
  simplex_coefficient_restraint                                10.0
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        4zud.lig.min

Now that we have our .in file, let's put it to use:

 dock6 -i min.in -o min.out

Once this finishes running, two files will be generated: 4zud.lig.min_scored.mol2 and min.out. Copy 4zud.lig.min_scored.mol2 to your local machine and open it alongside the original ligand in chimera. It should something like this:

Min ligand.png

Where the blue structure is the ligand before minimization, and the beige ligand is after.


The main goal of this section is to conduct a pose reproduction of the crystal structure ligand to ensure that we are able to successfully re-produce the low-energy pose that the ligand was crystalized in. If docking is not successful at this, then this system may not be suitable for further testing using DOCK since it is not able to produce a pose that is already known to be sampled by the ligand, based on observations from the crystal. For these reasons, pose reproduction is a worthwhile preliminary step in any docking experiments so that we can be more confident in our results if the system will be used for more complex docking protocols such as virtual screening or de novo design.

In the current implementation of DOCK6, a sampling success is defined as a pose RMSD that is < 2 Å relative to the crystal pose.

Rigid Docking

Rigid docking is the least computationally intensive algorithm in the DOCK program since it does not allow for dihedral rotations, or bond length perturbations. Since we are just focusing on pose reproduction (and since we know that the crystal structure ligand pose is of biological relevance)

Fixed Anchor Docking

Flex Docking

Virtual Screening