2021 DOCK tutorial 1 with PDBID 1HW9
- 1 Introduction
- 2 Receptor Preparation
- 3 Ligand Preparation
- 4 Surface Generation and Selection of Spheres
- 5 Box and Grid Generation
- 6 Energy Minimization
- 7 Footprint Analysis
- 8 Docking
- 9 Virtual Screening
DOCK 6 is one of the many tools available to computational biologists that predicts ligand binding geometries and interactions. The functions of DOCK 6 are diverse and have several general applications. A primary use of the program involves a virtual screening of thousands of molecules for an intended purpose. These purposes can include database screenings for molecules that inhibit enzyme activity, bind a particular protein, or even bind to larger complexes. As more versions of the program are released, new features are added such as the inclusion of solvation and receptor flexibility considerations in its calculations.
IHW9 is the PDB code for the catalytic complex between human HMG-CoA reductase (HMGR) and Simvastatin. HMGR is considered a rate-controlling enzyme in the metabolic pathway responsible for the biosynthesis of cholesterol. Inhibitors of HMGR, known as statins, are often prescribed as treatment therapies for high cholesterol patients. While statins inhibit the catalytic effect of HMGR, they also provide other positive biochemical effects such as the stimulation of bone growth and anti-inflammatory responses. Studying statin binding using this complex can potentially aid in the discovery of drugs capable of producing these off-target effects.
Before even beginning the DOCK prep, it's much easier to create the directories we'll be using throughout the calculation now rather than later. To begin, make a directory using the PDB code provided on the website. This makes organizing the files cleaner and easier to follow.
Once the directory is made, enter it as we will be creating several more directories within this directory.
Inside the 1HW9 directory, you should make several additional directories with the following titles. You do not have to follow the same nomenclature provided here as these are simply examples. However, the names you ultimately choose should be indicative of the information you will be keeping in them. This makes finding files much easier later on in the DOCK process.
001_structure 002_surface_spheres 003_gridbox 004_dock 005_virtual_screen 006_virtual_screen_mpi 007_cartesianmin 008_rescore
You can confirm the presence of these directories within 1HW9 by using the "ls" command.
Downloading and Opening 1HW9
The first file we need to get started is the 1HW9 PDB file from the website. If you head over to the Protein Data Base website, you'll notice a search bar located at the top of the page. Simply type in "1HW9" into that search bar and it will take you to the corresponding page. You can also search using the name of the complex, but since we already have the corresponding code, this makes things much easier.
Once on the webpage, you'll notice a breadth of information regarding the protein complex. Take note of the information you see and where you can access this information such as relevant literature links and images. At the right of your screen, you'll see a "Download Files" tab which you should click on.
Download Files -> PDB format -> (Your_Specified_Folder)
Since you are now the proud owner of the 1HW9 PDB file, you can begin opening it up in Chimera. Once Chimera is open, go to the top left and click:
File -> Open -> (Path_to_Specified Folder) -> 1HW9.pdb
Once open, the program should show you something that looks like the above image. This is known as a ribbon diagram and visualizes several properties of the HMGR-Sim complex. Using the ribbon diagram and Chimera, we can ascertain several defining characteristics such as specific residues, rotatable bonds, and protein chains. The image at the moment looks like a lot, but you would never run a calculation with all this information at once. We need to isolate several components first to make the image easier to follow and calculations easier to perform.
DOCK Prep of the Protein Receptor
While having the entire complex at our disposal is nice, it doesn't do much good to have all chains present during the calculations. This spends unnecessary computational resources and time. Instead, we can isolate one of the chains that incorporates the interaction between HMGR and Simvastatin. Our focus will be on Chain A of the complex. In order to visualize the chain:
Select -> Chain -> A
Notice, however, that this does not remove Chains B, C, and D from our view. We need to perform this action manually.
Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete
After deletion, the only thing remaining should be Chain A of the original file. It is within your best interest to save this session as you'll need to come back to this file when preparing the ligand. Doing so will prevent you from having to do the above deletion step again.
File -> Save Session As ->
Now that the session has been saved, we can go ahead and isolate the receptor. While we've isolated Chain A, you'll notice that Simvastatin is still located on the molecule. In order to remove everything besides HMGR, we need to do the following:
Select -> Residue -> ADP -> Actions -> Atoms/Bonds -> Delete Select -> Residue -> HOH -> Actions -> Atoms/Bonds -> Delete Select -> Residue -> SIM -> Actions -> Atoms/Bonds -> Delete
Notice that not only did we remove Simvastatin, but we also removed any ADP and water molecules. These are not vital to the calculations and can be removed as such. The only item remaining should be the isolated receptor.
The isolated receptor can now be saved as a mol2 file which we will again modify shortly after saving.
File -> Save Mol2 -> 1HW9_rec_noH.mol2
We now need to actually prepare the molecule for future DOCK calculations. This can be done using Chimera to add both charge and hydrogens to the receptor. Fortunately, there is a very useful command in Chimera that can do both in one step.
Tools -> Structure Editing -> DOCK prep
Follow the onscreen prompts and command windows to fully prep the receptor. This would be a great time to review the literature present on the PDB website. Consulting the literature, you'll notice the neutral charge on the receptor molecule. It's important to note that this charge needs to be the same charge that you assign the receptor molecule in Chimera during the dock prep. After you're finished, the receptor should look like this.
Atoms/Bonds -> Show
Now that the receptor is prepped, we can save it and move on to isolating and prepping the ligand. This file should be saved as a separate mol2 file.
File -> Save Mol2 -> (Destination) -> 1HW9_rec_wH.mol2
Isolating the Ligand
Open the session in Chimera we previously saved that consisted of Chain A with both the receptor and Simvastatin ligand. This time around, we'll be using the session file to isolate the ligand instead of the receptor.
Select -> Residue -> SIM -> Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete
Similar to the receptor, we'll need to save this ligand as its own mol2 file.
File -> Save Mol2 -> (Destination) -> 1HW9_lig_noH.mol2
DOCK Prep of the Ligand
We now need to prepare the ligand file for DOCK as well. This will be done in the exact same manner as the DOCK prep for the receptor.
Tools -> Structure Editing -> DOCK Prep
Again, follow the onscreen prompts to complete this step. However, you'll notice that Simvastatin ligand will have a net charge of -1. This needs to be confirmed using the literature on the PDB website. If we look at the residues at the binding pocket, we'll see the presence of arginine and lysine residues. This would indicate that that ligand here is most likely in a deprotonated state.
Save this file under a different name using the previous nomenclature.
File -> Save Mol2 -> (Destination) -> 1HW9_lig_wH.mol2
At this point in time, make sure that all previously generated files have been moved to the 001_structure directory we created at the beginning of this tutorial.
Surface Generation and Selection of Spheres
Visualizing the Surface
Chimera is great because it can create an even better visualization of the surface of the receptor protein. The program covers the expected surface of the protein with a series of spheres. In doing so, we create an image that easily visualizes the binding pocket of the protein. In order to create this surface:
File -> Open -> 1HW9_rec_wH.mol2 -> Actions -> Surface -> Show
The blue surface in the image represents the ligand. It's great to visualize both as you can see where these two molecules are interacting.
Of course, we need to save this surface without the ligand for future calculations. This surface will aid in finding exactly where the ligand-receptor interaction is occuring.
Tools -> Structure Editing -> Write DMS -> 1HW9_rec_surface.dms
Once saved, please move the .dms file to the 002_surface_spheres directory created earlier.
We now need to visualize the empty space that exists on the surface of the protein. These empty spaces represent possible locations for binding to occur. Several properties can be modified when generating the surface spheres such as the minimum difference between the spheres and surface, and maximum/minimum sphere radii. Ideally, we want to avoid steric clashing so we'll set this property to 0. Additionally, we don't want the generated spheres to be abnormally large or abnormally small. As such, we'll set the boundaries for these properties to 4 (max) and 1.4 (min).
Before running the command to generate the spheres, we need to create an input file using the properties previously described. For programming purposes, please name the input file and output file as you see it here.
Let's set up the file with the desired properties. Do not copy the comments into your file
1HW9_rec_surface.dms -> your input file we made a step before R -> R flag specifies external sphere generation X -> Specifies all surface points 0.0 -> Steric clashing 4.0 -> Max sphere radii 1.4 -> Min sphere radii 1HW9_receptor_noH.sph -> Output file
We can now use our input file to run the Linux sphere generation command "sphgen"
sphgen -i INSPH -o OUTSPH
This will generate the .sph file we specified in the INSPH input file. If you followed exactly, 1HW9_receptor_noH.sph should appear in your directory. Let's open it in Chimera to visually confirm the success.
The above image represents the generated spheres on the surface of our protein receptor. If you only open the sphere file, you'll only see the colored spheres in Chimera. Opening up the mol2 file of the receptor will show you where these spheres are in reference to the surface of the receptor.
Notice, however, that some of these surface spheres are relatively useless for the docking calculations. We only want to focus on the spheres that are present within the binding pocket of the receptor. Luckily, there's a command we can use to narrow down the spheres that are within 10 angstroms of Simvastatin.
sphere_selector 1HW9_receptor_noH.sph 1HW9_lig_wH.mol2 10.0
This will output a file in the directory titled "selected_spheres.sph"
Box and Grid Generation
DOCK calculations can be computationally expensive, so it's important to perform as many actions as possible to reduce the amount of computation power required. One such step is the generation of an energy grid. DOCK uses the grid to calculate an energy score between one ligand pose and the receptor. Any poses or interactions that occur outside this grid are simply ignored.
Before beginning, make sure to switch the directory you're working in. We will now be working in our third directory.
We'll be using Showbox, to generate the box around our molecule. First, we need to generate an input file that the command can use to perform such actions.
Similar to before, we'll be placing several parameters into this input file to provide a bit of direction during the box generation.
Y -> Says yes, we want the box to be generated 8.0 -> How far in angstroms should the box appear from our selected spheres ../002_surface_spheres/selected_spheres.sph -> Location of your our selected spheres file 1 1HW9.box.pdb -> Output file
In order to execute the command:
showbox < showbox.in
Upon success, 1HW9.box.pdb should appear in your current directory.
We'll next need to generate the grid used for the DOCK calculations. To do so, we'll need to generate a new input fie:
Similar to our other input files, we'll specify a number of parameters that will guide the grid generation process.
compute_grids yes grid_spacing 0.4 output_molecule no contact_score no energy_score yes energy_cutoff_distance 9999 atom_model a attractive_exponent 6 repulsive_exponent 9 distance_dielectric yes dielectric_factor 4 bump_filter yes bump_overlap 0.75 receptor_file ../001_structure/1HW9_rec_wH.mol2 box_file 1HW9.box.pdb vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn score_grid_prefix grid
In order to execute the script:
grid -i grid.in -o gridinfo.out
If the script runs successfully, three new files should be in your directory. These files include: gridinfo.out, grid.bmp, and grid.nrg
Even though the ligand in the crystal structure binds to the receptor, it may not represent the lowest possible energy conformation. In order to increase the accuracy of the DOCK calculation, we need to reduce the potential energy of the ligand which will aid in reproducing other conformers. Before we begin, let's change to the fourth directory that we created.
In this directory, we'll need to create an input file for the energy minimization calculation
This input file should have the following parameters and paths:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/1HW9_lig_wH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/1HW9_lig_wH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1
In order to execute:
dock6 -i min.in -o min.out
This will generate two new files in the directory. You should see min.out and 1HW9.lig.min_scored.mol2
We can visualize the difference between the energy minimized ligand and its original pose. The beige ligand in the above image represents the energy minimized form. The blue is the original ligand. As you can see, there is a subtle difference between the two. In order to quantitatively see the difference:
You'll be able to see the RMSD values of the energy minimized ligand.
We can also visually analyze the electrostatic and Van der Waals interactions that occur between the ligand and the receptor. We'll be able to see which residues are contributing the most during the binding event. Let's create an input file to generate the molecular footprint.
This can be done in the 004_dock directory as well. In this input file, we'll be placing the following parameters.
conformer_search_type rigid use_internal_energy no ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary no grid_score_secondary no multigrid_score_primary no multigrid_score_secondary no dock3.5_score_primary no dock3.5_score_secondary no continuous_score_primary no continuous_score_secondary no footprint_similarity_score_primary yes footprint_similarity_score_secondary no fps_score_use_footprint_reference_mol2 yes fps_score_footprint_reference_mol2_filename ../001_structure/1HW9_lig_wH.mol2 fps_score_foot_compare_type Euclidean fps_score_normalize_foot no fps_score_foot_comp_all_residue yes fps_score_receptor_filename ../001_structure/1HW9_rec_wH.mol2 fps_score_vdw_att_exp 6 fps_score_vdw_rep_exp 9 fps_score_vdw_rep_rad_scale 1 fps_score_use_distance_dependent_dielectric yes fps_score_dielectric 4.0 fps_score_vdw_fp_scale 1 fps_score_es_fp_scale 1 fps_score_hb_fp_scale 0 pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no
In order to execute the script:
dock6 -i footprint.in -o footprint.out
Executing the script may produce an error, but that is okay. The script should still produce three desired files. These files include: footprint.out_footprint_scored.txt footprint.out_hbond_scored.txt and footprint.out_scored.mol2
However, we are not done as we still need to visualize the molecular footprint. We can do this by using a prewritten python script. You first need to copy the scipt into your current working directory.
In order to execute the script which utilizes the files we just generated:
python plot_footprint_single_magnitude.py footprint.out_footprint_scored.txt 50
The 50 at the end of the line indicates that we will only be looking at the 50 most significant residues. The footprint.out.pdf that is generated will have to be copied to your local computer in order for you to visualize it. It should look like the following image:
In rigid docking, the ligand is kept in a static position and has no rotational freedom. This means that the calculation is the least computationally expensive compared to the other forms of docking we will be performing. Before beginning, make sure you are in the appropriate directory as we will be generating several files in this directory.
We'll need to create an input file for the rigid docking calculation. NOTE: Typically, the max_orientations parameter has a value of 1000. However, in our case. we need to change that parameter to 2000 in order to increase sampling. When using 1000, the RMSD values are not within the desired <2 angstroms. Doubling the max orientations remedies this issue.
The parameters in this input file are as follows:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100 ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename 1HW9.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 2000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000
In order to execute:
dock6 -i rigid.in
If the command runs successfully, the following file will be generated: rigid.out_scored.mol2 You should vi into this mol2 file to confirm that the RMSD values are within the acceptable range. We can also visualize the mol2 file in Chimera to see the conformer generated compared to the original ligand.
The blue molecule represents the conformer generated during the rigid docking process. The beige molecule is the original ligand. As you can see, they appear very similar in shape and overall structure.
Fixed Anchor Docking
Fixed anchor docking is a bit more computationally expensive compared to the previous DOCK setup. In FAD, a large part of the ligand will remain anchored but will have some rotational freedom compared to other rigid groups in the ligand. Only the largest structure of the ligand will be fixed. Let's create the input file for the fixed anchor docking.
The input file will have the following properties:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 10000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/1HW9_lig_wH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/1HW9_lig_wH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no
In order to execute the command:
dock6 -i fixed.in -o fixed.out
If this is successful, you should see two new files appear in the current directory. These files include 1HW9_fixed.out and 1HW9_fad_scored.mol2 You should absolutely open up the 1HW9_fad_scored.mol2 file if you haven't already as an important lesson lies within the data of the file. You'll notice that the top-scoring molecule in that file has RMSD values much greater than 2 angstroms. As a matter of fact, the second molecule also follows the same trend. It isn't until the third molecule that the RMSD values are less than 2 angstroms. This is what's known as a scoring failure. During the docking process, there are three types of outcomes. A docking success occurs when the top-scoring molecule has RMSD values lower than 2 angstroms. A scoring failure occurs when the top-scoring molecule is not within 2 angstroms but you're list of molecules samples compounds within 2 angstroms. Finally, a sampling failure exists where your list of compounds does not contain a single molecule within 2 angstroms of the crystallized ligand.
Flex docking is considered to be the most computationally expensive out of the three dock calculations we'll perform. During this calculation, the ligand has full rotational freedom and all internal degrees of freedom are being sampled. It is considered to be the most rigorous. Let's create the input file so that we can run this calculation.
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 5000 pruning_clustering_cutoff 2500 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file 1HW9.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename 1HW9.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix flex.out write_orientations yes num_scored_conformers 25 write_conformations yes cluster_conformations yes cluster_rmsd_threshold 2 rank_ligands no
You'll notice that we changed the parameters slightly to increase sampling. Because of these changes,this calculation will take a while to run. As a courtesy to others, you should not run jobs like these on the head node of the Seawulf cluster. Instead, we need to submit this job to the queue via slurm. Let's create the slurm file capable of doing so.
Inside, you should copy and paste the following text:
#!/bin/bash #SBATCH --time=10:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --job-name=job_name #SBATCH --output=output #SBATCH -p long-28core
dock6 -i flex.in -o 1HW9_flex.out
In order to execute the slurm script and submit the job to the queue:
You can confirm that you've submitted the job and even see if it's already running by using the following command:
squeue -u netid
This job will take hours to complete. However, once it is complete, you should see the following files in your directory: flex.out_conformers.mol2, flex.out_orients.mol2, flex.out_scored.mol2. You should open the flex.out_scored.mol2 file to check your results of the flex dock. You'll notice the results are similar to the previous dock calculation where a scoring failure occurred. This time, however, any molecule that comes close to the two-angstrom cutoff is very far down the list. We've come to the conclusion that the binding pocket of the receptor is very open and causes a few concerns during the flex docking process. If you go through the dock molecules in Chimera, you can see that they sample the large binding pocket very well. It's just unfortunate that many of these sampled orients are several angrstroms off of the original ligand.
Virtual screening is great because it allows you to dock a predetermined library of molecules and score them based on a set of parameters. In this tutorial, we use