2021 DOCK tutorial 1 with PDBID 1HW9
Contents
Introduction
DOCK
DOCK 6 is one of the many tools available to computational biologists that predicts ligand binding geometries and interactions. The functions of DOCK 6 are diverse and have several general applications. A primary use of the program involves a virtual screening of thousands of molecules for an intended purpose. These purposes can include database screenings for molecules that inhibit enzyme activity, bind a particular protein, or even bind to larger complexes. As more versions of the program are released, new features are added such as the inclusion of solvation and receptor flexibility considerations in its calculations.
IHW9
IHW9 is the PDB code for the catalytic complex between human HMG-CoA reductase (HMGR) and Simvastatin. HMGR is considered a rate-controlling enzyme in the metabolic pathway responsible for the biosynthesis of cholesterol. Inhibitors of HMGR, known as statins, are often prescribed as treatment therapies for high cholesterol patients. While statins inhibit the catalytic effect of HMGR, they also provide other positive biochemical effects such as the stimulation of bone growth and anti-inflammatory responses. Studying statin binding using this complex can potentially aid in the discovery of drugs capable of producing these off-target effects.
Directory Preparation
Before even beginning the DOCK prep, it's much easier to create the directories we'll be using throughout the calculation now rather than later. To begin, make a directory using the PDB code provided on the website. This makes organizing the files cleaner and easier to follow.
mkdir 1HW9
Once the directory is made, enter it as we will be creating several more directories within this directory.
cd 1HW9
Inside the 1HW9 directory, you should make several additional directories with the following titles. You do not have to follow the same nomenclature provided here as these are simply examples. However, the names you ultimately choose should be indicative of the information you will be keeping in them. This makes finding files much easier later on in the DOCK process.
001_structure 002_surface_spheres 003_gridbox 004_dock 005_virtual_screen 006_virtual_screen_mpi 007_cartesianmin 008_rescore
You can confirm the presence of these directories within 1HW9 by using the "ls" command.
Receptor Preparation
Downloading and Opening 1HW9
The first file we need to get started is the 1HW9 PDB file from the website. If you head over to the Protein Data Base website, you'll notice a search bar located at the top of the page. Simply type in "1HW9" into that search bar and it will take you to the corresponding page. You can also search using the name of the complex, but since we already have the corresponding code, this makes things much easier.
Once on the webpage, you'll notice a breadth of information regarding the protein complex. Take note of the information you see and where you can access this information such as relevant literature links and images. At the right of your screen, you'll see a "Download Files" tab which you should click on.
Download Files -> PDB format -> (Your_Specified_Folder)
Since you are now the proud owner of the 1HW9 PDB file, you can begin opening it up in Chimera. Once Chimera is open, go to the top left and click:
File -> Open -> (Path_to_Specified Folder) -> 1HW9.pdb
Once open, the program should show you something that looks like the above image. This is known as a ribbon diagram and visualizes several properties of the HMGR-Sim complex. Using the ribbon diagram and Chimera, we can ascertain several defining characteristics such as specific residues, rotatable bonds, and protein chains. The image at the moment looks like a lot, but you would never run a calculation with all this information at once. We need to isolate several components first to make the image easier to follow and calculations easier to perform.
DOCK Prep of the Protein Receptor
While having the entire complex at our disposal is nice, it doesn't do much good to have all chains present during the calculations. This spends unnecessary computational resources and time. Instead, we can isolate one of the chains that incorporates the interaction between HMGR and Simvastatin. Our focus will be on Chain A of the complex. In order to visualize the chain:
Select -> Chain -> A
Notice, however, that this does not remove Chains B, C, and D from our view. We need to perform this action manually.
Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete
After deletion, the only thing remaining should be Chain A of the original file. It is within your best interest to save this session as you'll need to come back to this file when preparing the ligand. Doing so will prevent you from having to do the above deletion step again.
File -> Save Session As ->
Now that the session has been saved, we can go ahead and isolate the receptor. While we've isolated Chain A, you'll notice that Simvastatin is still located on the molecule. In order to remove everything besides HMGR, we need to do the following:
Select -> Residue -> ADP -> Actions -> Atoms/Bonds -> Delete Select -> Residue -> HOH -> Actions -> Atoms/Bonds -> Delete Select -> Residue -> SIM -> Actions -> Atoms/Bonds -> Delete
Notice that not only did we remove Simvastatin, but we also removed any ADP and water molecules. These are not vital to the calculations and can be removed as such. The only item remaining should be the isolated receptor.
The isolated receptor can now be saved as a mol2 file which we will again modify shortly after saving.
File -> Save Mol2 -> 1HW9_rec_noH.mol2
We now need to actually prepare the molecule for future DOCK calculations. This can be done using Chimera to add both charge and hydrogens to the receptor. Fortunately, there is a very useful command in Chimera that can do both in one step.
Tools -> Structure Editing -> DOCK prep
Follow the onscreen prompts and command windows to fully prep the receptor. This would be a great time to review the literature present on the PDB website. Consulting the literature, you'll notice the neutral charge on the receptor molecule. It's important to note that this charge needs to be the same charge that you assign the receptor molecule in Chimera during the dock prep. After you're finished, the receptor should look like this.
Atoms/Bonds -> Show
Now that the receptor is prepped, we can save it and move on to isolating and prepping the ligand. This file should be saved as a separate mol2 file.
File -> Save Mol2 -> (Destination) -> 1HW9_rec_wH.mol2
Ligand Preparation
Isolating the Ligand
Open the session in Chimera we previously saved that consisted of Chain A with both the receptor and Simvastatin ligand. This time around, we'll be using the session file to isolate the ligand instead of the receptor.
Select -> Residue -> SIM -> Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete
Similar to the receptor, we'll need to save this ligand as its own mol2 file.
File -> Save Mol2 -> (Destination) -> 1HW9_lig_noH.mol2
DOCK Prep of the Ligand
We now need to prepare the ligand file for DOCK as well. This will be done in the exact same manner as the DOCK prep for the receptor.
Tools -> Structure Editing -> DOCK Prep
Again, follow the onscreen prompts to complete this step. However, you'll notice that Simvastatin ligand will have a net charge of -1. This needs to be confirmed using the literature on the PDB website. If we look at the residues at the binding pocket, we'll see the presence of arginine and lysine residues. This would indicate that that ligand here is most likely in a deprotonated state.
Save this file under a different name using the previous nomenclature.
File -> Save Mol2 -> (Destination) -> 1HW9_lig_wH.mol2
At this point in time, make sure that all previously generated files have been moved to the 001_structure directory we created at the beginning of this tutorial.
Surface Generation and Selection of Spheres
Visualizing the Surface
Chimera is great because it can create an even better visualization of the surface of the receptor protein. The program covers the expected surface of the protein with a series of spheres. In doing so, we create an image that easily visualizes the binding pocket of the protein. In order to create this surface:
File -> Open -> 1HW9_rec_noH.mol2 -> Actions -> Surface -> Show
The blue surface in the image represents the ligand. It's great to visualize both as you can see where these two molecules are interacting.
Of course, we need to save this surface without the ligand for future calculations. This surface will aid in finding exactly where the ligand-receptor interaction is occuring.
Tools -> Structure Editing -> Write DMS -> 1HW9_rec_surface.dms
Once saved, please move the .dms file to the 002_surface_spheres directory created earlier.
Sphere Selection
We now need to visualize the empty space that exists on the surface of the protein. These empty spaces represent possible locations for binding to occur. Several properties can be modified when generating the surface spheres such as the minimum difference between the spheres and surface, and maximum/minimum sphere radii. Ideally, we want to avoid steric clashing so we'll set this property to 0. Additionally, we don't want the generated spheres to be abnormally large or abnormally small. As such, we'll set the boundaries for these properties to 4 (max) and 1.4 (min).
Before running the command to generate the spheres, we need to create an input file using the properties previously described. For programming purposes, please name the input file and output file as you see it here.
vi INSPH
Let's set up the file with the desired properties. Do not copy the comments into your file
1HW9_rec_surface.dms -> your input file we made a step before R -> R flag specifies external sphere generation X -> Specifies all surface points 0.0 -> Steric clashing 4.0 -> Max sphere radii 1.4 -> Min sphere radii 1HW9_receptor_noH.sph -> Output file
We can now use our input file to run the Linux sphere generation command "sphgen"
sphgen -i INSPH -o OUTSPH
This will generate the .sph file we specified in the INSPH input file. If you followed exactly, 1HW9_receptor_noH.sph should appear in your directory. Let's open it in Chimera to visually confirm the success.
The above image represents the generated spheres on the surface of our protein receptor. If you only open the sphere file, you'll only see the colored spheres in Chimera. Opening up the mol2 file of the receptor will show you where these spheres are in reference to the surface of the receptor.
Notice, however, that some of these surface spheres are relatively useless for the docking calculations. We only want to focus on the spheres that are present within the binding pocket of the receptor. Luckily, there's a command we can use to narrow down the spheres that are within 10 angstroms of Simvastatin.
sphere_selector 1HW9_receptor_noH.sph 1HW9_lig_wH.mol2 10.0
This will output a file in the directory titled "selected_spheres.sph"
Box and Grid Generation
DOCK calculations can be computationally expensive, so it's important to perform as many actions as possible to reduce the amount of computation power required. One such step is the generation of an energy grid. DOCK uses the grid to calculate an energy score between one ligand pose and the receptor. Any poses or interactions that occur outside this grid are simply ignored.
Box Generation
Before beginning, make sure to switch the directory you're working in. We will now be working in our third directory.
cd 003_gridbox
We'll be using Showbox, to generate the box around our molecule. First, we need to generate an input file that the command can use to perform such actions.
vi showbox.in
Similar to before, we'll be placing several parameters into this input file to provide a bit of direction during the box generation.
Y -> Says yes, we want the box to be generated 8.0 -> How far in angstroms should the box appear from our selected spheres ../002_surface_spheres/selected_spheres.sph -> Location of your our selected spheres file 1 1HW9.box.pdb -> Output file
In order to execute the command:
showbox < showbox.in
Upon success, 1HW9.box.pdb should appear in your current directory.
Grid Generation
We'll next need to generate the grid used for the DOCK calculations. To do so, we'll need to generate a new input fie:
vi grid.in
Similar to our other input files, we'll specify a number of parameters that will guide the grid generation process.
compute_grids yes grid_spacing 0.4 output_molecule no contact_score no energy_score yes energy_cutoff_distance 9999 atom_model a attractive_exponent 6 repulsive_exponent 9 distance_dielectric yes dielectric_factor 4 bump_filter yes bump_overlap 0.75 receptor_file ../001_structure/1HW9_rec_wH.mol2 box_file 1HW9.box.pdb vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn score_grid_prefix grid
In order to execute the script:
grid -i grid.in -o gridinfo.out
If the script runs successfully, three new files should be in your directory. These files include: gridinfo.out, grid.bmp, and grid.nrg
Energy Minimization
Even though the ligand in the crystal structure binds to the receptor, it may not represent the lowest possible energy conformation. In order to increase the accuracy of the DOCK calculation, we need to reduce the potential energy of the ligand which will aid in reproducing other conformers. Before we begin, let's change to the fourth directory that we created.
cd 004_dock
In this directory, we'll need to create an input file for the energy minimization calculation
vi min.in
This input file should have the following parameters and paths:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/1HW9_lig_wH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/1HW9_lig_wH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min yes simplex_coefficient_restraint 10.0 atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix 1HW9.lig.min write_orientations no num_scored_conformers 1 rank_ligands no
In order to execute:
dock6 -i min.in -o min.out
This will generate two new files in the directory. You should see min.out and 1HW9.lig.min_scored.mol2
We can visualize the difference between the energy minimized ligand and its original pose. The beige ligand in the above image represents the energy minimized form. The blue is the original ligand. As you can see, there is a subtle difference between the two. In order to quantitatively see the difference:
vi 1HW9.lig.min_scored.mol2
You'll be able to see the RMSD values of the energy minimized ligand.
Footprint Analysis
We can also visually analyze the electrostatic and Van der Waals interactions that occur between the ligand and the receptor. We'll be able to see which residues are contributing the most during the binding event. Let's create an input file to generate the molecular footprint.
vi footprint.in
This can be done in the 004_dock directory as well. In this input file, we'll be placing the following parameters.
conformer_search_type rigid use_internal_energy no ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary no grid_score_secondary no multigrid_score_primary no multigrid_score_secondary no dock3.5_score_primary no dock3.5_score_secondary no continuous_score_primary no continuous_score_secondary no footprint_similarity_score_primary yes footprint_similarity_score_secondary no fps_score_use_footprint_reference_mol2 yes fps_score_footprint_reference_mol2_filename ../001_structure/1HW9_lig_wH.mol2 fps_score_foot_compare_type Euclidean fps_score_normalize_foot no fps_score_foot_comp_all_residue yes fps_score_receptor_filename ../001_structure/1HW9_rec_wH.mol2 fps_score_vdw_att_exp 6 fps_score_vdw_rep_exp 9 fps_score_vdw_rep_rad_scale 1 fps_score_use_distance_dependent_dielectric yes fps_score_dielectric 4.0 fps_score_vdw_fp_scale 1 fps_score_es_fp_scale 1 fps_score_hb_fp_scale 0 pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix footprint.out write_footprints yes write_hbonds yes write_orientations no num_scored_conformers 1 rank_ligands no
In order to execute the script:
dock6 -i footprint.in -o footprint.out
Executing the script may produce an error, but that is okay. The script should still produce three desired files. These files include: footprint.out_footprint_scored.txt footprint.out_hbond_scored.txt and footprint.out_scored.mol2
However, we are not done as we still need to visualize the molecular footprint. We can do this by using a prewritten python script.
In order to execute the script which utilizes the files we just generated:
python plot_footprint_single_magnitude.py footprint.out_footprint_scored.txt 50
The 50 at the end of the line indicates that we will only be looking at the 50 most significant residues. The footprint.out.pdf that is generated will have to be copied to your local computer in order for you to visualize it. It should look like the following image:
Docking
Rigid Docking
In rigid docking, the ligand is kept in a static position and has no rotational freedom. This means that the calculation is the least computationally expensive compared to the other forms of docking we will be performing. Before beginning, make sure you are in the appropriate directory as we will be generating several files in this directory.
cd 004_dock
We'll need to create an input file for the rigid docking calculation. NOTE: Typically, the max_orientations parameter has a value of 1000. However, in our case. we need to change that parameter to 2000 in order to increase sampling. When using 1000, the RMSD values are not within the desired <2 angstroms. Doubling the max orientations remedies this issue.
vi rigid.in
The parameters in this input file are as follows:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100 ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename 1HW9.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 2000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix rigid.out write_orientations no num_scored_conformers 1 rank_ligands no
In order to execute:
dock6 -i rigid.in
If the command runs successfully, the following file will be generated: rigid.out_scored.mol2 You should vi into this mol2 file to confirm that the RMSD values are within the acceptable range. We can also visualize the mol2 file in Chimera to see the conformer generated compared to the original ligand.
The blue molecule represents the conformer generated during the rigid docking process. The beige molecule is the original ligand. As you can see, they appear very similar in shape and overall structure.
Fixed Anchor Docking
Fixed anchor docking is a bit more computationally expensive compared to the previous DOCK setup. In FAD, a large part of the ligand will remain anchored but will have some rotational freedom compared to other rigid groups in the ligand. Only the largest structure of the ligand will be fixed. Let's create the input file for the fixed anchor docking.
vi fixed.in
The input file will have the following properties:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 10000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/1HW9_lig_wH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/1HW9_lig_wH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1 simplex_trans_step 1 simplex_rot_step 0.1 simplex_tors_step 10 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix 1S19_fad write_orientations no num_scored_conformers 100 write_conformations no cluster_conformations yes cluster_rmsd_threshold 2.0 rank_ligands no
In order to execute the command:
dock6 -i fixed.in -o fixed.out
If this is successful, you should see two new files appear in the current directory. These files include 1HW9_fixed.out and 1HW9_fad_scored.mol2 You should absolutely open up the 1HW9_fad_scored.mol2 file if you haven't already as an important lesson lies within the data of the file. You'll notice that the top-scoring molecule in that file has RMSD values much greater than 2 angstroms. As a matter of fact, the second molecule also follows the same trend. It isn't until the third molecule that the RMSD values are less than 2 angstroms. This is what's known as a scoring failure. During the docking process, there are three types of outcomes. A docking success occurs when the top-scoring molecule has RMSD values lower than 2 angstroms. A scoring failure occurs when the top-scoring molecule is not within 2 angstroms but you're list of molecules samples compounds within 2 angstroms. Finally, a sampling failure exists where your list of compounds does not contain a single molecule within 2 angstroms of the crystallized ligand.
Flex Docking
Flex docking is considered to be the most computationally expensive out of the three dock calculations we'll perform. During this calculation, the ligand has full rotational freedom and all internal degrees of freedom are being sampled. It is considered to be the most rigorous. Let's create the input file so that we can run this calculation.
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 5000 pruning_clustering_cutoff 2500 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename 1HW9.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix flex.out write_orientations yes num_scored_conformers 25 write_conformations yes cluster_conformations yes cluster_rmsd_threshold 2 rank_ligands no
You'll notice that we changed the parameters slightly to increase sampling. Because of these changes,this calculation will take a while to run. As a courtesy to others, you should not run jobs like these on the head node of the Seawulf cluster. Instead, we need to submit this job to the queue via slurm. Let's create the slurm file capable of doing so.
vi slurm.sh
Inside, you should copy and paste the following text:
#!/bin/bash #SBATCH --time=10:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --job-name=job_name #SBATCH --output=output #SBATCH -p long-28core
cd $SLURM_SUBMIT_DIR
dock6 -i flex.in -o 1HW9_flex.out
In order to execute the slurm script and submit the job to the queue:
sbatch slurm.sh
You can confirm that you've submitted the job and even see if it's already running by using the following command:
squeue -u netid
This job will take hours to complete. However, once it is complete, you should see the following files in your directory: flex.out_conformers.mol2, flex.out_orients.mol2, flex.out_scored.mol2. You should open the flex.out_scored.mol2 file to check your results of the flex dock. You'll notice the results are similar to the previous dock calculation where a scoring failure occurred. This time, however, any molecule that comes close to the two-angstrom cutoff is very far down the list. We've come to the conclusion that the binding pocket of the receptor is very open and causes a few concerns during the flex docking process. If you go through the dock molecules in Chimera, you can see that they sample the large binding pocket very well. It's just unfortunate that many of these sampled orients are several angrstroms off of the original ligand.
Virtual Screening
Test Screen
Virtual screening is great because it allows you to dock a predetermined library of molecules and score them based on a set of parameters. In this tutorial, we will be using the tutorial file "VS_library_5K.mol2" which should be copied to the 005_virtual_screen and 006_virtual_screen_mpi directories. This directory will be provided for you by the instructor. First, let's change to the appropriate directory.
cd 006_virtual_screen
We now need to create the input file for the virtual screen process. It should have the following parameters:
vi virtual in
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 1000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file VS_library_5K.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix virtual.out write_orientations no num_scored_conformers 1 rank_ligands no
Let's make sure that the input file is set up appropriately by performing a trial run of the execute command.
dock6 -i virtual.in
If no error occurs, then the file has been setup appropraitely. However, you can't run a job like this on the head node and it should be running on multiple cores. Control C to cancel the job as we're going to set this up using MPI.
Virtual Screen MPI
Make sure that you change to the appropriate directory as well as copying the VS library to this directory as well.
cd 006_virtual_screen_mpi
You can also move the virtual.in input file we created in the previous directory to this directory as well. However, we now also need to create an input for the slurm script with an MPI command.
vi virtual.sh
#!/bin/bash #SBATCH --time=48:00:00 #SBATCH --nodes=4 #SBATCH --ntasks=28 #SBATCH --job-name=1HW9_VS_min #SBATCH --output=1HW9_VS_min #SBATCH -p long-28core
cd $SLURM_SUBMIT_DIR
mpirun -np 112 dock6.mpi -i virtual.in -o 1HW9.virtual.mpi.out
In order to formally submit this to the queue:
sbatch virtual.sh
Since we're running this on 112 cores, this job will produce 112 out files in the working directory. These results will all be summarized in the virtual.out file for easier visualization. You should also transfer this out file to your local computer to see the results in Chimera. You can see how well they align with the original crystallized ligand and receptor.
Cartesian Minimization
Before we can really make an accurate assessment of the virtual screen results, we should perform an energy minimization of all the ligands that were screened. This will help filter some of the better molecules equipped for our system. First, lets move to the appropriate directory.
cd 007_cartesianmin
We need to create the input file for the minimization calculation of course. It should have the following parameters:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100 ligand_atom_file ../006_virtual_screen_mpi/virtual.out_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules yes vcontact_score_primary no contact_score_secondary no grid_score_primary no grid_score_secondary no multigrid_score_primary no multigrid_score_secondary no dock3.5_score_primary no dock3.5_score_secondary no continuous_score_primary yes continuous_score_secondary no cont_score_rec_filename ../001_structure/1HW9_rec_wH.mol2 cont_score_att_exp 6 cont_score_rep_exp 12 cont_score_rep_rad_scale 1 cont_score_use_dist_dep_dielectric yes cont_score_dielectric 4.0 cont_score_vdw_scale 1.0 cont_score_es_scale 1.0 footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1.0 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix 3vjk.virtual_screen.min write_orientations no num_scored_conformers 1 rank_ligands no
This job will take a while so we'll need to submit this to the queue using slurm.
vi min.sh
#!/bin/bash #SBATCH --time=48:00:00 #SBATCH --nodes=2 #SBATCH --ntasks=28 #SBATCH --job-name=mpi_1HW9_VS_AR #SBATCH --output=1HW9_VS_mpi.out #SBATCH -p long-28core
cd $SLURM_SUBMIT_DIR mpirun -np 56 /gpfs/projects/AMS536/zzz.programs/dock6.9_release/bin/dock6.mpi -i min_scored.in -o min_scored.out
In order to run this:
sbatch min.sh
We'll now use these results for the final step of our virtual screen process.
Rescore
We can actually rescore our virtual screened molecules based on the cartesisan minimization that we just performed. We'll be able to look at parameters like Tanimoto score and molecular footprint to see which screened molecules best represent the original crystallized ligand. Rescoring molecules is a great way to provide rationale for eliminating a majority of your screened molecules as you can't possibly hope to purchase/synthesize ~5000 molecules. First, let's make sure we're operating in the correct directory.
cd 008_rescore
Next, we'll create the necessary input file and parameters to perform the rescore calculation.
vi rescore.in
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100 ligand_atom_file ../006_virtual_screen_mpi/virtual.out_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary no grid_score_secondary no multigrid_score_primary no multigrid_score_secondary no dock3.5_score_primary no dock3.5_score_secondary no continuous_score_primary no continuous_score_secondary no footprint_similarity_score_primary no footprint_similarity_score_secondary no pharmacophore_score_primary no pharmacophore_score_secondary no descriptor_score_primary yes descriptor_score_secondary no descriptor_use_grid_score no descriptor_use_multigrid_score no descriptor_use_continuous_score no descriptor_use_footprint_similarity yes descriptor_use_pharmacophore_score yes descriptor_use_hungarian yes descriptor_use_volume_overlap yes descriptor_fps_score_use_footprint_reference_mol2 yes descriptor_fps_score_footprint_reference_mol2_filename ../004_dock/1HW9.lig.min_scored.mol2 descriptor_fps_score_foot_compare_type Euclidean descriptor_fps_score_normalize_foot no descriptor_fps_score_foot_comp_all_residue yes descriptor_fps_score_receptor_filename ../001_structure/1HW9_rec_wH.mol2 descriptor_fps_score_vdw_att_exp 6 descriptor_fps_score_vdw_rep_exp 12 descriptor_fps_score_vdw_rep_rad_scale 1 descriptor_fps_score_use_distance_dependent_dielectric yes descriptor_fps_score_dielectric 4.0 descriptor_fps_score_vdw_fp_scale 1 descriptor_fps_score_es_fp_scale 1 descriptor_fps_score_hb_fp_scale 0 descriptor_fms_score_use_ref_mol2 yes descriptor_fms_score_ref_mol2_filename ../004_dock/1HW9.lig.min_scored.mol2 descriptor_fms_score_write_reference_pharmacophore_mol2 no descriptor_fms_score_write_reference_pharmacophore_txt no descriptor_fms_score_write_candidate_pharmacophore no descriptor_fms_score_write_matched_pharmacophore no descriptor_fms_score_compare_type overlap descriptor_fms_score_full_match yes descriptor_fms_score_match_rate_weight 5.0 descriptor_fms_score_match_dist_cutoff 1.0 descriptor_fms_score_match_proj_cutoff .7071 descriptor_fms_score_max_score 20 descriptor_fingerprint_ref_filename ../004_dock/1HW9.lig.min_scored.mol2 descriptor_hms_score_ref_filename ../004_dock/1HW9.lig.min_scored.mol2 descriptor_hms_score_matching_coeff -5 descriptor_hms_score_rmsd_coeff 1 descriptor_volume_score_reference_mol2_filename ../004_dock/1HW9.lig.min_scored.mol2 descriptor_volume_score_overlap_compute_method analytical descriptor_weight_fps_score 1 descriptor_weight_pharmacophore_score 1 descriptor_weight_fingerprint_tanimoto -1 descriptor_weight_hms_score 1 descriptor_weight_volume_overlap_score -1 gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl chem_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/chem.defn pharmacophore_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/ph4.defn ligand_outfile_prefix descriptor.output write_footprints yes write_hbonds yes write_orientations no num_scored_conformers 1 rank_ligands no
This is yet another job we'll have to submit to the queue. Let's create the .sh file necessary for submission.
vi job.sh
#!/bin/sh #SBATCH --partition=long-40core #SBATCH --time=48:00:00 #SBATCH --nodes=4 #SBATCH --ntasks=32 #SBATCH --job-name=tutorial_run #SBATCH --output=%x-%j.o
echo "starting DOCK6.9 Simulation" module load intel/mpi/64/2018/18.0.3
mpirun -np 32 dock6.mpi -i rescore.in -o rescore.out echo "done"
In order to run:
sbatch job.sh
Since we're running this on 32 cores, you should see 32 .out files upon successful completion. However, all of the data in the .out files will be appropriately summarized in the descriptor.out_scored.mol2 file which we can view in Chimera.
Complete Visualization of Virtual Screen Results
Since we've now generated all the necessary files for the virtual screen process, we can compile them to view in Chimera. Make sure you have all the necessary files on hand as we'll be importing them into Chimera in the following order:
1. 1HW9.box.pdb 2. selected_spheres.sph 3. 1HW9_rec_wH.mol2 4. 1HW9_lig_wH.mol2 5. virtual.out_scored.mol2
Before importing file 5 into Chimera, please load files 1-4 into a single session. We need to use the specific View DOCK tool in order to import file 5. If you've opened files 1-4 successfully in Chimera, your screen should look something like this.
Now that files are 1-4 are successfully imported, we can open the virtual.out_scored.mol2. In order to properly do this:
Tools->Structure/Binding Analysis->ViewDock Select Dock 4,5,6
It will most likely take your computer a bit to load the entire file in. The number of molecules in the file borders the limit of what Chimera can handle. If Chimera freezes, just let it run in the background and the file should eventually open. You'll notice that when the file does successfully open, a new window will pop up. This window has all the molecules from your virtual screen. You can tab through these molecules one by one using the arrow keys on your keyboard. Most importantly, do not close out that window and instead, minimize it when you want to view your entire session without obstruction.
In the ViewDock window, you should click:
Column->Show
You'll see SEVERAL descriptors and ways to filter your results. You can select any one of the descriptors to filter your results. Feel free to play around with the descriptors to further anaylze the virtual screen molecules. Perhaps look at the footprint similarity scores or descriptor scores? Perhaps look at the number of H-bonds or H-bond patterns? The choice is up to you in how you want to analyze your results!