Difference between revisions of "2021 DOCK tutorial 1 with PDBID 1HW9"

Latest revision as of 12:41, 7 March 2022

Introduction

DOCK

DOCK 6 is one of the many tools available to computational biologists that predicts ligand binding geometries and interactions. The functions of DOCK 6 are diverse and have several general applications. A primary use of the program involves a virtual screening of thousands of molecules for an intended purpose. These purposes can include database screenings for molecules that inhibit enzyme activity, bind a particular protein, or even bind to larger complexes. As more versions of the program are released, new features are added such as the inclusion of solvation and receptor flexibility considerations in its calculations.

IHW9

IHW9 is the PDB code for the catalytic complex between human HMG-CoA reductase (HMGR) and Simvastatin. HMGR is considered a rate-controlling enzyme in the metabolic pathway responsible for the biosynthesis of cholesterol. Inhibitors of HMGR, known as statins, are often prescribed as treatment therapies for high cholesterol patients. While statins inhibit the catalytic effect of HMGR, they also provide other positive biochemical effects such as the stimulation of bone growth and anti-inflammatory responses. Studying statin binding using this complex can potentially aid in the discovery of drugs capable of producing these off-target effects.

Directory Preparation

Before even beginning the DOCK prep, it's much easier to create the directories we'll be using throughout the calculation now rather than later. To begin, make a directory using the PDB code provided on the website. This makes organizing the files cleaner and easier to follow.

  mkdir 1HW9

Once the directory is made, enter it as we will be creating several more directories within this directory.

  cd 1HW9

Inside the 1HW9 directory, you should make several additional directories with the following titles. You do not have to follow the same nomenclature provided here as these are simply examples. However, the names you ultimately choose should be indicative of the information you will be keeping in them. This makes finding files much easier later on in the DOCK process.

  001_structure  002_surface_spheres  003_gridbox  004_dock  005_virtual_screen  006_virtual_screen_mpi  007_cartesianmin  008_rescore

You can confirm the presence of these directories within 1HW9 by using the "ls" command.

Receptor Preparation

Downloading and Opening 1HW9

The first file we need to get started is the 1HW9 PDB file from the website. If you head over to the Protein Data Base website, you'll notice a search bar located at the top of the page. Simply type in "1HW9" into that search bar and it will take you to the corresponding page. You can also search using the name of the complex, but since we already have the corresponding code, this makes things much easier.

Once on the webpage, you'll notice a breadth of information regarding the protein complex. Take note of the information you see and where you can access this information such as relevant literature links and images. At the right of your screen, you'll see a "Download Files" tab which you should click on.

  Download Files -> PDB format -> (Your_Specified_Folder)

Since you are now the proud owner of the 1HW9 PDB file, you can begin opening it up in Chimera. Once Chimera is open, go to the top left and click:

  File -> Open -> (Path_to_Specified Folder) -> 1HW9.pdb

Once open, the program should show you something that looks like the above image. This is known as a ribbon diagram and visualizes several properties of the HMGR-Sim complex. Using the ribbon diagram and Chimera, we can ascertain several defining characteristics such as specific residues, rotatable bonds, and protein chains. The image at the moment looks like a lot, but you would never run a calculation with all this information at once. We need to isolate several components first to make the image easier to follow and calculations easier to perform.

DOCK Prep of the Protein Receptor

While having the entire complex at our disposal is nice, it doesn't do much good to have all chains present during the calculations. This spends unnecessary computational resources and time. Instead, we can isolate one of the chains that incorporates the interaction between HMGR and Simvastatin. Our focus will be on Chain A of the complex. In order to visualize the chain:

  Select -> Chain -> A

Notice, however, that this does not remove Chains B, C, and D from our view. We need to perform this action manually.

  Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete

After deletion, the only thing remaining should be Chain A of the original file. It is within your best interest to save this session as you'll need to come back to this file when preparing the ligand. Doing so will prevent you from having to do the above deletion step again.

  File -> Save Session As ->

Now that the session has been saved, we can go ahead and isolate the receptor. While we've isolated Chain A, you'll notice that Simvastatin is still located on the molecule. In order to remove everything besides HMGR, we need to do the following:

  Select -> Residue -> ADP -> Actions -> Atoms/Bonds -> Delete
  Select -> Residue -> HOH -> Actions -> Atoms/Bonds -> Delete
  Select -> Residue -> SIM -> Actions -> Atoms/Bonds -> Delete

Notice that not only did we remove Simvastatin, but we also removed any ADP and water molecules. These are not vital to the calculations and can be removed as such. The only item remaining should be the isolated receptor.

The isolated receptor can now be saved as a mol2 file which we will again modify shortly after saving.

  File -> Save Mol2 -> 1HW9_rec_noH.mol2

We now need to actually prepare the molecule for future DOCK calculations. This can be done using Chimera to add both charge and hydrogens to the receptor. Fortunately, there is a very useful command in Chimera that can do both in one step.

  Tools -> Structure Editing -> DOCK prep

Follow the onscreen prompts and command windows to fully prep the receptor. This would be a great time to review the literature present on the PDB website. Consulting the literature, you'll notice the neutral charge on the receptor molecule. It's important to note that this charge needs to be the same charge that you assign the receptor molecule in Chimera during the dock prep. After you're finished, the receptor should look like this.

  Atoms/Bonds -> Show

Now that the receptor is prepped, we can save it and move on to isolating and prepping the ligand. This file should be saved as a separate mol2 file.

  File -> Save Mol2 -> (Destination) -> 1HW9_rec_wH.mol2

Ligand Preparation

Isolating the Ligand

Open the session in Chimera we previously saved that consisted of Chain A with both the receptor and Simvastatin ligand. This time around, we'll be using the session file to isolate the ligand instead of the receptor.

  Select -> Residue -> SIM -> Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete

Similar to the receptor, we'll need to save this ligand as its own mol2 file.

  File -> Save Mol2 -> (Destination) -> 1HW9_lig_noH.mol2

DOCK Prep of the Ligand

We now need to prepare the ligand file for DOCK as well. This will be done in the exact same manner as the DOCK prep for the receptor.

  Tools -> Structure Editing -> DOCK Prep

Again, follow the onscreen prompts to complete this step. However, you'll notice that Simvastatin ligand will have a net charge of -1. This needs to be confirmed using the literature on the PDB website. If we look at the residues at the binding pocket, we'll see the presence of arginine and lysine residues. This would indicate that that ligand here is most likely in a deprotonated state.

Save this file under a different name using the previous nomenclature.

  File -> Save Mol2 -> (Destination) -> 1HW9_lig_wH.mol2

At this point in time, make sure that all previously generated files have been moved to the 001_structure directory we created at the beginning of this tutorial.

Surface Generation and Selection of Spheres

Visualizing the Surface

Chimera is great because it can create an even better visualization of the surface of the receptor protein. The program covers the expected surface of the protein with a series of spheres. In doing so, we create an image that easily visualizes the binding pocket of the protein. In order to create this surface:

  File -> Open -> 1HW9_rec_noH.mol2 -> Actions -> Surface -> Show

The blue surface in the image represents the ligand. It's great to visualize both as you can see where these two molecules are interacting.

Of course, we need to save this surface without the ligand for future calculations. This surface will aid in finding exactly where the ligand-receptor interaction is occuring.

  Tools -> Structure Editing -> Write DMS -> 1HW9_rec_surface.dms

Once saved, please move the .dms file to the 002_surface_spheres directory created earlier.

Sphere Selection

We now need to visualize the empty space that exists on the surface of the protein. These empty spaces represent possible locations for binding to occur. Several properties can be modified when generating the surface spheres such as the minimum difference between the spheres and surface, and maximum/minimum sphere radii. Ideally, we want to avoid steric clashing so we'll set this property to 0. Additionally, we don't want the generated spheres to be abnormally large or abnormally small. As such, we'll set the boundaries for these properties to 4 (max) and 1.4 (min).

Before running the command to generate the spheres, we need to create an input file using the properties previously described. For programming purposes, please name the input file and output file as you see it here.

  vi INSPH

Let's set up the file with the desired properties. Do not copy the comments into your file

  1HW9_rec_surface.dms      -> your input file we made a step before
  R                         -> R flag specifies external sphere generation
  X                         -> Specifies all surface points
  0.0                       -> Steric clashing
  4.0                       -> Max sphere radii
  1.4                       -> Min sphere radii
  1HW9_receptor_noH.sph     -> Output file

We can now use our input file to run the Linux sphere generation command "sphgen"

  sphgen -i INSPH -o OUTSPH

This will generate the .sph file we specified in the INSPH input file. If you followed exactly, 1HW9_receptor_noH.sph should appear in your directory. Let's open it in Chimera to visually confirm the success.

The above image represents the generated spheres on the surface of our protein receptor. If you only open the sphere file, you'll only see the colored spheres in Chimera. Opening up the mol2 file of the receptor will show you where these spheres are in reference to the surface of the receptor.

Notice, however, that some of these surface spheres are relatively useless for the docking calculations. We only want to focus on the spheres that are present within the binding pocket of the receptor. Luckily, there's a command we can use to narrow down the spheres that are within 10 angstroms of Simvastatin.

  sphere_selector 1HW9_receptor_noH.sph 1HW9_lig_wH.mol2 10.0

This will output a file in the directory titled "selected_spheres.sph"

Box and Grid Generation

DOCK calculations can be computationally expensive, so it's important to perform as many actions as possible to reduce the amount of computation power required. One such step is the generation of an energy grid. DOCK uses the grid to calculate an energy score between one ligand pose and the receptor. Any poses or interactions that occur outside this grid are simply ignored.

Box Generation

Before beginning, make sure to switch the directory you're working in. We will now be working in our third directory.

  cd 003_gridbox

We'll be using Showbox, to generate the box around our molecule. First, we need to generate an input file that the command can use to perform such actions.

  vi showbox.in

Similar to before, we'll be placing several parameters into this input file to provide a bit of direction during the box generation.

  Y                                             -> Says yes, we want the box to be generated                                
  8.0                                           -> How far in angstroms should the box appear from our selected spheres                            
  ../002_surface_spheres/selected_spheres.sph   -> Location of your our selected spheres file
  1 
  1HW9.box.pdb                                  -> Output file

In order to execute the command:

  showbox < showbox.in

Upon success, 1HW9.box.pdb should appear in your current directory.

Grid Generation

We'll next need to generate the grid used for the DOCK calculations. To do so, we'll need to generate a new input fie:

  vi grid.in

Similar to our other input files, we'll specify a number of parameters that will guide the grid generation process.

  compute_grids                             yes
  grid_spacing                              0.4
  output_molecule                           no
  contact_score                             no
  energy_score                              yes
  energy_cutoff_distance                    9999
  atom_model                                a
  attractive_exponent                       6
  repulsive_exponent                        9
  distance_dielectric                       yes
  dielectric_factor                         4
  bump_filter                               yes
  bump_overlap                              0.75
  receptor_file                             ../001_structure/1HW9_rec_wH.mol2
  box_file                                  1HW9.box.pdb
  vdw_definition_file                      /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  score_grid_prefix                        grid

In order to execute the script:

  grid -i grid.in -o gridinfo.out

If the script runs successfully, three new files should be in your directory. These files include: gridinfo.out, grid.bmp, and grid.nrg

Energy Minimization

Even though the ligand in the crystal structure binds to the receptor, it may not represent the lowest possible energy conformation. In order to increase the accuracy of the DOCK calculation, we need to reduce the potential energy of the ligand which will aid in reproducing other conformers. Before we begin, let's change to the fourth directory that we created.

  cd 004_dock

In this directory, we'll need to create an input file for the energy minimization calculation

  vi min.in

This input file should have the following parameters and paths:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             ../001_structure/1HW9_lig_wH.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      ../001_structure/1HW9_lig_wH.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0 

  simplex_restraint_min                                        yes
  simplex_coefficient_restraint                                10.0
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        1HW9.lig.min
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

In order to execute:

  dock6 -i min.in -o min.out

This will generate two new files in the directory. You should see min.out and 1HW9.lig.min_scored.mol2

We can visualize the difference between the energy minimized ligand and its original pose. The beige ligand in the above image represents the energy minimized form. The blue is the original ligand. As you can see, there is a subtle difference between the two. In order to quantitatively see the difference:

  vi 1HW9.lig.min_scored.mol2

You'll be able to see the RMSD values of the energy minimized ligand.

Footprint Analysis

We can also visually analyze the electrostatic and Van der Waals interactions that occur between the ligand and the receptor. We'll be able to see which residues are contributing the most during the binding event. Let's create an input file to generate the molecular footprint.

  vi footprint.in

This can be done in the 004_dock directory as well. In this input file, we'll be placing the following parameters.

  conformer_search_type                                        rigid
  use_internal_energy                                          no
  ligand_atom_file                                             1HW9.lig.min_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     no
  continuous_score_secondary                                   no
  footprint_similarity_score_primary                           yes
  footprint_similarity_score_secondary                         no
  fps_score_use_footprint_reference_mol2                       yes
  fps_score_footprint_reference_mol2_filename                  ../001_structure/1HW9_lig_wH.mol2
  fps_score_foot_compare_type                                  Euclidean
  fps_score_normalize_foot                                     no
  fps_score_foot_comp_all_residue                              yes
  fps_score_receptor_filename                                  ../001_structure/1HW9_rec_wH.mol2
  fps_score_vdw_att_exp                                        6
  fps_score_vdw_rep_exp                                        9
  fps_score_vdw_rep_rad_scale                                  1
  fps_score_use_distance_dependent_dielectric                  yes
  fps_score_dielectric                                         4.0
  fps_score_vdw_fp_scale                                       1
  fps_score_es_fp_scale                                        1
  fps_score_hb_fp_scale                                        0
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              no
  atom_model                                                   all
  vdw_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        footprint.out
  write_footprints                                             yes
  write_hbonds                                                 yes
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

In order to execute the script:

  dock6 -i footprint.in -o footprint.out

Executing the script may produce an error, but that is okay. The script should still produce three desired files. These files include: footprint.out_footprint_scored.txt footprint.out_hbond_scored.txt and footprint.out_scored.mol2

However, we are not done as we still need to visualize the molecular footprint. We can do this by using a prewritten python script.

In order to execute the script which utilizes the files we just generated:

  python plot_footprint_single_magnitude.py footprint.out_footprint_scored.txt  50

The 50 at the end of the line indicates that we will only be looking at the 50 most significant residues. The footprint.out.pdf that is generated will have to be copied to your local computer in order for you to visualize it. It should look like the following image:

Docking

Rigid Docking

In rigid docking, the ligand is kept in a static position and has no rotational freedom. This means that the calculation is the least computationally expensive compared to the other forms of docking we will be performing. Before beginning, make sure you are in the appropriate directory as we will be generating several files in this directory.

  cd 004_dock

We'll need to create an input file for the rigid docking calculation. NOTE: Typically, the max_orientations parameter has a value of 1000. However, in our case. we need to change that parameter to 2000 in order to increase sampling. When using 1000, the RMSD values are not within the desired <2 angstroms. Doubling the max orientations remedies this issue.

  vi rigid.in

The parameters in this input file are as follows:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100
  ligand_atom_file                                             1HW9.lig.min_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      1HW9.lig.min_scored.mol2
  use_database_filter                                          no
  orient_ligand                                                yes
  automated_matching                                           yes
  receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
  max_orientations                                             2000
  critical_points                                              no
  chemical_matching                                            no
  use_ligand_spheres                                           no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        rigid.out
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

In order to execute:

  dock6 -i rigid.in

If the command runs successfully, the following file will be generated: rigid.out_scored.mol2 You should vi into this mol2 file to confirm that the RMSD values are within the acceptable range. We can also visualize the mol2 file in Chimera to see the conformer generated compared to the original ligand.

The blue molecule represents the conformer generated during the rigid docking process. The beige molecule is the original ligand. As you can see, they appear very similar in shape and overall structure.

Fixed Anchor Docking

Fixed anchor docking is a bit more computationally expensive compared to the previous DOCK setup. In FAD, a large part of the ligand will remain anchored but will have some rotational freedom compared to other rigid groups in the ligand. Only the largest structure of the ligand will be fixed. Let's create the input file for the fixed anchor docking.

  vi fixed.in

The input file will have the following properties:

  conformer_search_type                                        flex
  write_fragment_libraries                                     no
  user_specified_anchor                                        no
  limit_max_anchors                                            no
  min_anchor_size                                              5
  pruning_use_clustering                                       yes
  pruning_max_orients                                          10000
  pruning_clustering_cutoff                                    100
  pruning_conformer_score_cutoff                               100.0
  pruning_conformer_score_scaling_factor                       1.0
  use_clash_overlap                                            no
  write_growth_tree                                            no
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             ../001_structure/1HW9_lig_wH.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      ../001_structure/1HW9_lig_wH.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  minimize_anchor                                              yes
  minimize_flexible_growth                                     yes
  use_advanced_simplex_parameters                              no
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1
  simplex_trans_step                                           1
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10
  simplex_anchor_max_iterations                                500
  simplex_grow_max_iterations                                  500
  simplex_grow_tors_premin_iterations                          0
  simplex_random_seed                                          0
  simplex_restraint_min                                        no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        1S19_fad
  write_orientations                                           no
  num_scored_conformers                                        100
  write_conformations                                          no
  cluster_conformations                                        yes
  cluster_rmsd_threshold                                       2.0
  rank_ligands                                                 no

In order to execute the command:

  dock6 -i fixed.in -o fixed.out

If this is successful, you should see two new files appear in the current directory. These files include 1HW9_fixed.out and 1HW9_fad_scored.mol2 You should absolutely open up the 1HW9_fad_scored.mol2 file if you haven't already as an important lesson lies within the data of the file. You'll notice that the top-scoring molecule in that file has RMSD values much greater than 2 angstroms. As a matter of fact, the second molecule also follows the same trend. It isn't until the third molecule that the RMSD values are less than 2 angstroms. This is what's known as a scoring failure. During the docking process, there are three types of outcomes. A docking success occurs when the top-scoring molecule has RMSD values lower than 2 angstroms. A scoring failure occurs when the top-scoring molecule is not within 2 angstroms but you're list of molecules samples compounds within 2 angstroms. Finally, a sampling failure exists where your list of compounds does not contain a single molecule within 2 angstroms of the crystallized ligand.

Flex Docking

Flex docking is considered to be the most computationally expensive out of the three dock calculations we'll perform. During this calculation, the ligand has full rotational freedom and all internal degrees of freedom are being sampled. It is considered to be the most rigorous. Let's create the input file so that we can run this calculation.

  conformer_search_type                                        flex
  write_fragment_libraries                                     no
  user_specified_anchor                                        no
  limit_max_anchors                                            no
  min_anchor_size                                              5
  pruning_use_clustering                                       yes
  pruning_max_orients                                          5000
  pruning_clustering_cutoff                                    2500
  pruning_conformer_score_cutoff                               100.0
  pruning_conformer_score_scaling_factor                       1.0
  use_clash_overlap                                            no
  write_growth_tree                                            no
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             1HW9.lig.min_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      1HW9.lig.min_scored.mol2
  use_database_filter                                          no
  orient_ligand                                                yes
  automated_matching                                           yes
  receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
  max_orientations                                             1000
  critical_points                                              no
  chemical_matching                                            no
  use_ligand_spheres                                           no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  minimize_anchor                                              yes
  minimize_flexible_growth                                     yes
  use_advanced_simplex_parameters                              no
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_anchor_max_iterations                                500
  simplex_grow_max_iterations                                  500
  simplex_grow_tors_premin_iterations                          0
  simplex_random_seed                                          0
  simplex_restraint_min                                        no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        flex.out
  write_orientations                                           yes
  num_scored_conformers                                        25
  write_conformations                                          yes
  cluster_conformations                                        yes
  cluster_rmsd_threshold                                       2
  rank_ligands                                                 no

You'll notice that we changed the parameters slightly to increase sampling. Because of these changes,this calculation will take a while to run. As a courtesy to others, you should not run jobs like these on the head node of the Seawulf cluster. Instead, we need to submit this job to the queue via slurm. Let's create the slurm file capable of doing so.

  vi slurm.sh

Inside, you should copy and paste the following text:

  #!/bin/bash
  #SBATCH --time=10:00:00
  #SBATCH --nodes=1
  #SBATCH --ntasks=28
  #SBATCH --job-name=job_name
  #SBATCH --output=output
  #SBATCH -p long-28core

  cd $SLURM_SUBMIT_DIR

  dock6 -i flex.in -o 1HW9_flex.out

In order to execute the slurm script and submit the job to the queue:

  sbatch slurm.sh

You can confirm that you've submitted the job and even see if it's already running by using the following command:

  squeue -u netid

This job will take hours to complete. However, once it is complete, you should see the following files in your directory: flex.out_conformers.mol2, flex.out_orients.mol2, flex.out_scored.mol2. You should open the flex.out_scored.mol2 file to check your results of the flex dock. You'll notice the results are similar to the previous dock calculation where a scoring failure occurred. This time, however, any molecule that comes close to the two-angstrom cutoff is very far down the list. We've come to the conclusion that the binding pocket of the receptor is very open and causes a few concerns during the flex docking process. If you go through the dock molecules in Chimera, you can see that they sample the large binding pocket very well. It's just unfortunate that many of these sampled orients are several angrstroms off of the original ligand.

Virtual Screening

Test Screen

Virtual screening is great because it allows you to dock a predetermined library of molecules and score them based on a set of parameters. In this tutorial, we will be using the tutorial file "VS_library_5K.mol2" which should be copied to the 005_virtual_screen and 006_virtual_screen_mpi directories. This directory will be provided for you by the instructor. First, let's change to the appropriate directory.

  cd 006_virtual_screen

We now need to create the input file for the virtual screen process. It should have the following parameters:

  vi virtual in

  conformer_search_type                                        flex
  write_fragment_libraries                                     no
  user_specified_anchor                                        no
  limit_max_anchors                                            no
  min_anchor_size                                              5
  pruning_use_clustering                                       yes
  pruning_max_orients                                          1000
  pruning_clustering_cutoff                                    100
  pruning_conformer_score_cutoff                               100.0
  pruning_conformer_score_scaling_factor                       1.0
  use_clash_overlap                                            no
  write_growth_tree                                            no
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             VS_library_5K.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                yes
  automated_matching                                           yes
  receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
  max_orientations                                             1000
  critical_points                                              no
  chemical_matching                                            no
  use_ligand_spheres                                           no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  minimize_anchor                                              yes
  minimize_flexible_growth                                     yes
  use_advanced_simplex_parameters                              no
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_anchor_max_iterations                                500
  simplex_grow_max_iterations                                  500
  simplex_grow_tors_premin_iterations                          0
  simplex_random_seed                                          0
  simplex_restraint_min                                        no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        virtual.out
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

Let's make sure that the input file is set up appropriately by performing a trial run of the execute command.

  dock6 -i virtual.in

If no error occurs, then the file has been setup appropraitely. However, you can't run a job like this on the head node and it should be running on multiple cores. Control C to cancel the job as we're going to set this up using MPI.

Virtual Screen MPI

Make sure that you change to the appropriate directory as well as copying the VS library to this directory as well.

  cd 006_virtual_screen_mpi

You can also move the virtual.in input file we created in the previous directory to this directory as well. However, we now also need to create an input for the slurm script with an MPI command.

  vi virtual.sh

  #!/bin/bash
  #SBATCH --time=48:00:00
  #SBATCH --nodes=4
  #SBATCH --ntasks=28
  #SBATCH --job-name=1HW9_VS_min
  #SBATCH --output=1HW9_VS_min
  #SBATCH -p long-28core

  cd $SLURM_SUBMIT_DIR

  mpirun -np 112 dock6.mpi -i virtual.in -o 1HW9.virtual.mpi.out

In order to formally submit this to the queue:

  sbatch virtual.sh

Since we're running this on 112 cores, this job will produce 112 out files in the working directory. These results will all be summarized in the virtual.out file for easier visualization. You should also transfer this out file to your local computer to see the results in Chimera. You can see how well they align with the original crystallized ligand and receptor.

Cartesian Minimization

Before we can really make an accurate assessment of the virtual screen results, we should perform an energy minimization of all the ligands that were screened. This will help filter some of the better molecules equipped for our system. First, lets move to the appropriate directory.

  cd 007_cartesianmin

We need to create the input file for the minimization calculation of course. It should have the following parameters:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100
  ligand_atom_file                                             ../006_virtual_screen_mpi/virtual.out_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  vcontact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     yes
  continuous_score_secondary                                   no
  cont_score_rec_filename                                      ../001_structure/1HW9_rec_wH.mol2
  cont_score_att_exp                                           6
  cont_score_rep_exp                                           12
  cont_score_rep_rad_scale                                     1
  cont_score_use_dist_dep_dielectric                           yes
  cont_score_dielectric                                        4.0
  cont_score_vdw_scale                                         1.0
  cont_score_es_scale                                          1.0
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0
  simplex_max_cycles                                           1.0
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        3vjk.virtual_screen.min
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

This job will take a while so we'll need to submit this to the queue using slurm.

  vi min.sh

  #!/bin/bash
  #SBATCH --time=48:00:00
  #SBATCH --nodes=2
  #SBATCH --ntasks=28
  #SBATCH --job-name=mpi_1HW9_VS_AR
  #SBATCH --output=1HW9_VS_mpi.out
  #SBATCH -p long-28core

  cd $SLURM_SUBMIT_DIR
  mpirun -np 56 /gpfs/projects/AMS536/zzz.programs/dock6.9_release/bin/dock6.mpi -i min_scored.in -o min_scored.out

In order to run this:

  sbatch min.sh

We'll now use these results for the final step of our virtual screen process.

Rescore

We can actually rescore our virtual screened molecules based on the cartesisan minimization that we just performed. We'll be able to look at parameters like Tanimoto score and molecular footprint to see which screened molecules best represent the original crystallized ligand. Rescoring molecules is a great way to provide rationale for eliminating a majority of your screened molecules as you can't possibly hope to purchase/synthesize ~5000 molecules. First, let's make sure we're operating in the correct directory.

  cd 008_rescore

Next, we'll create the necessary input file and parameters to perform the rescore calculation.

  vi rescore.in

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100
  ligand_atom_file                                             ../006_virtual_screen_mpi/virtual.out_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no 
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     no
  continuous_score_secondary                                   no
  footprint_similarity_score_primary                           no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_primary                                  no
  pharmacophore_score_secondary                                no
  descriptor_score_primary                                     yes
  descriptor_score_secondary                                   no
  descriptor_use_grid_score                                    no
  descriptor_use_multigrid_score                               no
  descriptor_use_continuous_score                              no
  descriptor_use_footprint_similarity                          yes
  descriptor_use_pharmacophore_score                           yes
  descriptor_use_hungarian                                     yes
  descriptor_use_volume_overlap                                yes
  descriptor_fps_score_use_footprint_reference_mol2            yes
  descriptor_fps_score_footprint_reference_mol2_filename       ../004_dock/1HW9.lig.min_scored.mol2
  descriptor_fps_score_foot_compare_type                       Euclidean
  descriptor_fps_score_normalize_foot                          no
  descriptor_fps_score_foot_comp_all_residue                   yes
  descriptor_fps_score_receptor_filename                       ../001_structure/1HW9_rec_wH.mol2
  descriptor_fps_score_vdw_att_exp                             6
  descriptor_fps_score_vdw_rep_exp                             12
  descriptor_fps_score_vdw_rep_rad_scale                       1
  descriptor_fps_score_use_distance_dependent_dielectric       yes
  descriptor_fps_score_dielectric                              4.0
  descriptor_fps_score_vdw_fp_scale                            1
  descriptor_fps_score_es_fp_scale                             1
  descriptor_fps_score_hb_fp_scale                             0
  descriptor_fms_score_use_ref_mol2                            yes
  descriptor_fms_score_ref_mol2_filename                       ../004_dock/1HW9.lig.min_scored.mol2
  descriptor_fms_score_write_reference_pharmacophore_mol2      no
  descriptor_fms_score_write_reference_pharmacophore_txt       no
  descriptor_fms_score_write_candidate_pharmacophore           no
  descriptor_fms_score_write_matched_pharmacophore             no
  descriptor_fms_score_compare_type                            overlap
  descriptor_fms_score_full_match                              yes
  descriptor_fms_score_match_rate_weight                       5.0
  descriptor_fms_score_match_dist_cutoff                       1.0
  descriptor_fms_score_match_proj_cutoff                       .7071
  descriptor_fms_score_max_score                               20
  descriptor_fingerprint_ref_filename                          ../004_dock/1HW9.lig.min_scored.mol2
  descriptor_hms_score_ref_filename                            ../004_dock/1HW9.lig.min_scored.mol2
  descriptor_hms_score_matching_coeff                          -5
  descriptor_hms_score_rmsd_coeff                              1
  descriptor_volume_score_reference_mol2_filename              ../004_dock/1HW9.lig.min_scored.mol2
  descriptor_volume_score_overlap_compute_method               analytical
  descriptor_weight_fps_score                                  1
  descriptor_weight_pharmacophore_score                        1
  descriptor_weight_fingerprint_tanimoto                       -1
  descriptor_weight_hms_score                                  1
  descriptor_weight_volume_overlap_score                       -1
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
  chem_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/chem.defn
  pharmacophore_defn_file                                      /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/ph4.defn
  ligand_outfile_prefix                                        descriptor.output
  write_footprints                                             yes
  write_hbonds                                                 yes
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

This is yet another job we'll have to submit to the queue. Let's create the .sh file necessary for submission.

  vi job.sh

  #!/bin/sh
  #SBATCH --partition=long-40core
  #SBATCH --time=48:00:00
  #SBATCH --nodes=4
  #SBATCH --ntasks=32
  #SBATCH --job-name=tutorial_run
  #SBATCH --output=%x-%j.o

  echo "starting DOCK6.9 Simulation"
  module load intel/mpi/64/2018/18.0.3

  mpirun -np 32 dock6.mpi -i rescore.in -o rescore.out
  echo "done"

In order to run:

  sbatch job.sh

Since we're running this on 32 cores, you should see 32 .out files upon successful completion. However, all of the data in the .out files will be appropriately summarized in the descriptor.out_scored.mol2 file which we can view in Chimera.

Complete Visualization of Virtual Screen Results

Since we've now generated all the necessary files for the virtual screen process, we can compile them to view in Chimera. Make sure you have all the necessary files on hand as we'll be importing them into Chimera in the following order:

1. 1HW9.box.pdb 2. selected_spheres.sph 3. 1HW9_rec_wH.mol2 4. 1HW9_lig_wH.mol2 5. virtual.out_scored.mol2

Before importing file 5 into Chimera, please load files 1-4 into a single session. We need to use the specific View DOCK tool in order to import file 5. If you've opened files 1-4 successfully in Chimera, your screen should look something like this.

Now that files are 1-4 are successfully imported, we can open the virtual.out_scored.mol2. In order to properly do this:

  Tools->Structure/Binding Analysis->ViewDock
  Select Dock 4,5,6

It will most likely take your computer a bit to load the entire file in. The number of molecules in the file borders the limit of what Chimera can handle. If Chimera freezes, just let it run in the background and the file should eventually open. You'll notice that when the file does successfully open, a new window will pop up. This window has all the molecules from your virtual screen. You can tab through these molecules one by one using the arrow keys on your keyboard. Most importantly, do not close out that window and instead, minimize it when you want to view your entire session without obstruction.

In the ViewDock window, you should click:

  Column->Show

You'll see SEVERAL descriptors and ways to filter your results. You can select any one of the descriptors to filter your results. Feel free to play around with the descriptors to further anaylze the virtual screen molecules. Perhaps look at the footprint similarity scores or descriptor scores? Perhaps look at the number of H-bonds or H-bond patterns? The choice is up to you in how you want to analyze your results!