Difference between revisions of "2021 DOCK tutorial 1 with PDBID 1HW9"

Revision as of 14:47, 13 March 2021

Introduction

DOCK

DOCK 6 is one of the many tools available to computational biologists that predicts ligand binding geometries and interactions. The functions of DOCK 6 are diverse and have several general applications. A primary use of the program involves a virtual screening of thousands of molecules for an intended purpose. These purposes can include database screenings for molecules that inhibit enzyme activity, bind a particular protein, or even bind to larger complexes. As more versions of the program are released, new features are added such as the inclusion of solvation and receptor flexibility considerations in its calculations.

IHW9

IHW9 is the PDB code for the catalytic complex between human HMG-CoA reductase (HMGR) and Simvastatin. HMGR is considered a rate-controlling enzyme in the metabolic pathway responsible for the biosynthesis of cholesterol. Inhibitors of HMGR, known as statins, are often prescribed as treatment therapies for high cholesterol patients. While statins inhibit the catalytic effect of HMGR, they also provide other positive biochemical effects such as the stimulation of bone growth and anti-inflammatory responses. Studying statin binding using this complex can potentially aid in the discovery of drugs capable of producing these off-target effects.

Directory Preparation

Before even beginning the DOCK prep, it's much easier to create the directories we'll be using throughout the calculation now rather than later. To begin, make a directory using the PDB code provided on the website. This makes organizing the files cleaner and easier to follow.

  mkdir 1HW9

Once the directory is made, enter it as we will be creating several more directories within this directory.

  cd 1HW9

Inside the 1HW9 directory, you should make several additional directories with the following titles. You do not have to follow the same nomenclature provided here as these are simply examples. However, the names you ultimately choose should be indicative of the information you will be keeping in them. This makes finding files much easier later on in the DOCK process.

  001_structure  002_surface_spheres  003_gridbox  004_dock  005_virtual_screen  006_virtual_screen_mpi  007_cartesianmin  008_rescore

You can confirm the presence of these directories within 1HW9 by using the "ls" command.

Receptor Preparation

Downloading and Opening 1HW9

The first file we need to get started is the 1HW9 PDB file from the website. If you head over to the Protein Data Base website, you'll notice a search bar located at the top of the page. Simply type in "1HW9" into that search bar and it will take you to the corresponding page. You can also search using the name of the complex, but since we already have the corresponding code, this makes things much easier.

Once on the webpage, you'll notice a breadth of information regarding the protein complex. Take note of the information you see and where you can access this information such as relevant literature links and images. At the right of your screen, you'll see a "Download Files" tab which you should click on.

  Download Files -> PDB format -> (Your_Specified_Folder)

Since you are now the proud owner of the 1HW9 PDB file, you can begin opening it up in Chimera. Once Chimera is open, go to the top left and click:

  File -> Open -> (Path_to_Specified Folder) -> 1HW9.pdb

Once open, the program should show you something that looks like the above image. This is known as a ribbon diagram and visualizes several properties of the HMGR-Sim complex. Using the ribbon diagram and Chimera, we can ascertain several defining characteristics such as specific residues, rotatable bonds, and protein chains. The image at the moment looks like a lot, but you would never run a calculation with all this information at once. We need to isolate several components first to make the image easier to follow and calculations easier to perform.

DOCK Prep of the Protein Receptor

While having the entire complex at our disposal is nice, it doesn't do much good to have all chains present during the calculations. This spends unnecessary computational resources and time. Instead, we can isolate one of the chains that incorporates the interaction between HMGR and Simvastatin. Our focus will be on Chain A of the complex. In order to visualize the chain:

  Select -> Chain -> A

Notice, however, that this does not remove Chains B, C, and D from our view. We need to perform this action manually.

  Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete

After deletion, the only thing remaining should be Chain A of the original file. It is within your best interest to save this session as you'll need to come back to this file when preparing the ligand. Doing so will prevent you from having to do the above deletion step again.

  File -> Save Session As ->

Now that the session has been saved, we can go ahead and isolate the receptor. While we've isolated Chain A, you'll notice that Simvastatin is still located on the molecule. In order to remove everything besides HMGR, we need to do the following:

  Select -> Residue -> ADP -> Actions -> Atoms/Bonds -> Delete
  Select -> Residue -> HOH -> Actions -> Atoms/Bonds -> Delete
  Select -> Residue -> SIM -> Actions -> Atoms/Bonds -> Delete

Notice that not only did we remove Simvastatin, but we also removed any ADP and water molecules. These are not vital to the calculations and can be removed as such. The only item remaining should be the isolated receptor.

The isolated receptor can now be saved as a mol2 file which we will again modify shortly after saving.

  File -> Save Mol2 -> 1HW9_rec_noH.mol2

We now need to actually prepare the molecule for future DOCK calculations. This can be done using Chimera to add both charge and hydrogens to the receptor. Fortunately, there is a very useful command in Chimera that can do both in one step.

  Tools -> Structure Editing -> DOCK prep

Follow the onscreen prompts and command windows to fully prep the receptor. This would be a great time to review the literature present on the PDB website. Consulting the literature, you'll notice the neutral charge on the receptor molecule. It's important to note that this charge needs to be the same charge that you assign the receptor molecule in Chimera during the dock prep. After you're finished, the receptor should look like this.

  Atoms/Bonds -> Show

Now that the receptor is prepped, we can save it and move on to isolating and prepping the ligand. This file should be saved as a separate mol2 file.

  File -> Save Mol2 -> (Destination) -> 1HW9_rec_wH.mol2

Ligand Preparation

Isolating the Ligand

Open the session in Chimera we previously saved that consisted of Chain A with both the receptor and Simvastatin ligand. This time around, we'll be using the session file to isolate the ligand instead of the receptor.

  Select -> Residue -> SIM -> Select -> Invert (Selected Molecules) -> Actions -> Atoms/Bonds -> Delete

Similar to the receptor, we'll need to save this ligand as its own mol2 file.

  File -> Save Mol2 -> (Destination) -> 1HW9_lig_noH.mol2

DOCK Prep of the Ligand

We now need to prepare the ligand file for DOCK as well. This will be done in the exact same manner as the DOCK prep for the receptor.

  Tools -> Structure Editing -> DOCK Prep

Again, follow the onscreen prompts to complete this step. However, you'll notice that Simvastatin ligand will have a net charge of -1. This needs to be confirmed using the literature on the PDB website. If we look at the residues at the binding pocket, we'll see the presence of arginine and lysine residues. This would indicate that that ligand here is most likely in a deprotonated state.

Save this file under a different name using the previous nomenclature.

  File -> Save Mol2 -> (Destination) -> 1HW9_lig_wH.mol2

At this point in time, make sure that all previously generated files have been moved to the 001_structure directory we created at the beginning of this tutorial.

Surface Generation and Selection of Spheres

Visualizing the Surface

Chimera is great because it can create an even better visualization of the surface of the receptor protein. The program covers the expected surface of the protein with a series of spheres. In doing so, we create an image that easily visualizes the binding pocket of the protein. In order to create this surface:

  File -> Open -> 1HW9_rec_wH.mol2 -> Actions -> Surface -> Show

The blue surface in the image represents the ligand. It's great to visualize both as you can see where these two molecules are interacting.

Of course, we need to save this surface without the ligand for future calculations. This surface will aid in finding exactly where the ligand-receptor interaction is occuring.

  Tools -> Structure Editing -> Write DMS -> 1HW9_rec_surface.dms

Once saved, please move the .dms file to the 002_surface_spheres directory created earlier.

Sphere Selection

We now need to visualize the empty space that exists on the surface of the protein. These empty spaces represent possible locations for binding to occur. Several properties can be modified when generating the surface spheres such as the minimum difference between the spheres and surface, and maximum/minimum sphere radii. Ideally, we want to avoid steric clashing so we'll set this property to 0. Additionally, we don't want the generated spheres to be abnormally large or abnormally small. As such, we'll set the boundaries for these properties to 4 (max) and 1.4 (min).

Before running the command to generate the spheres, we need to create an input file using the properties previously described. For programming purposes, please name the input file and output file as you see it here.

  vi INSPH

Let's set up the file with the desired properties. Do not copy the comments into your file

  1HW9_rec_surface.dms      -> your input file we made a step before
  R                         -> R flag specifies external sphere generation
  X                         -> Specifies all surface points
  0.0                       -> Steric clashing
  4.0                       -> Max sphere radii
  1.4                       -> Min sphere radii
  1HW9_receptor_noH.sph     -> Output file

We can now use our input file to run the Linux sphere generation command "sphgen"

  sphgen -i INSPH -o OUTSPH

This will generate the .sph file we specified in the INSPH input file. If you followed exactly, 1HW9_receptor_noH.sph should appear in your directory. Let's open it in Chimera to visually confirm the success.

The above image represents the generated spheres on the surface of our protein receptor. If you only open the sphere file, you'll only see the colored spheres in Chimera. Opening up the mol2 file of the receptor will show you where these spheres are in reference to the surface of the receptor.

Notice, however, that some of these surface spheres are relatively useless for the docking calculations. We only want to focus on the spheres that are present within the binding pocket of the receptor. Luckily, there's a command we can use to narrow down the spheres that are within 10 angstroms of Simvastatin.

  sphere_selector 1HW9_receptor_noH.sph 1HW9_lig_wH.mol2 10.0

This will output a file in the directory titled "selected_spheres.sph"

Box and Grid Generation

DOCK calculations can be computationally expensive, so it's important to perform as many actions as possible to reduce the amount of computation power required. One such step is the generation of an energy grid. DOCK uses the grid to calculate an energy score between one ligand pose and the receptor. Any poses or interactions that occur outside this grid are simply ignored.

Box Generation

Before beginning, make sure to switch the directory you're working in. We will now be working in our third directory.

  cd 003_gridbox

We'll be using Showbox, to generate the box around our molecule. First, we need to generate an input file that the command can use to perform such actions.

  vi showbox.in

Similar to before, we'll be placing several parameters into this input file to provide a bit of direction during the box generation.

  Y                                             -> Says yes, we want the box to be generated                                
  8.0                                           -> How far in angstroms should the box appear from our selected spheres                            
  ../002_surface_spheres/selected_spheres.sph   -> Location of your our selected spheres file
  1 
  1HW9.box.pdb                                  -> Output file

In order to execute the command:

  showbox < showbox.in

Upon success, 1HW9.box.pdb should appear in your current directory.

Grid Generation

We'll next need to generate the grid used for the DOCK calculations. To do so, we'll need to generate a new input fie:

  vi grid.in

Similar to our other input files, we'll specify a number of parameters that will guide the grid generation process.

  compute_grids                             yes
  grid_spacing                              0.4
  output_molecule                           no
  contact_score                             no
  energy_score                              yes
  energy_cutoff_distance                    9999
  atom_model                                a
  attractive_exponent                       6
  repulsive_exponent                        9
  distance_dielectric                       yes
  dielectric_factor                         4
  bump_filter                               yes
  bump_overlap                              0.75
  receptor_file                             ../001_structure/1HW9_rec_wH.mol2
  box_file                                  1HW9.box.pdb
  vdw_definition_file                      /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
  score_grid_prefix                        grid

In order to execute the script:

  grid -i grid.in -o gridinfo.out

If the script runs successfully, three new files should be in your directory. These files include: gridinfo.out, grid.bmp, and grid.nrg

Energy Minimization

Even though the ligand in the crystal structure binds to the receptor, it may not represent the lowest possible energy conformation. In order to increase the accuracy of the DOCK calculation, we need to reduce the potential energy of the ligand which will aid in reproducing other conformers. Before we begin, let's change to the fourth directory that we created.

  cd 004_dock

In this directory, we'll need to create an input file for the energy minimization calculation

  vi min.in

This input file should have the following parameters and paths:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             ../001_structure/1HW9_lig_wH.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      ../001_structure/1HW9_lig_wH.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1

In order to execute:

  dock6 -i min.in -o min.out

This will generate two new files in the directory. You should see min.out and 1HW9.lig.min_scored.mol2

We can visualize the difference between the energy minimized ligand and its original pose. The beige ligand in the above image represents the energy minimized form. The blue is the original ligand. As you can see, there is a subtle difference between the two. In order to quantitatively see the difference:

  vi 1HW9.lig.min_scored.mol2

You'll be able to see the RMSD values of the energy minimized ligand.

Footprint Analysis

We can also visually analyze the electrostatic and Van der Waals interactions that occur between the ligand and the receptor. We'll be able to see which residues are contributing the most during the binding event. Let's create an input file to generate the molecular footprint.

  vi footprint.in

This can be done in the 004_dock directory as well. In this input file, we'll be placing the following parameters.

  conformer_search_type                                        rigid
  use_internal_energy                                          no
  ligand_atom_file                                             1HW9.lig.min_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     no
  continuous_score_secondary                                   no
  footprint_similarity_score_primary                           yes
  footprint_similarity_score_secondary                         no
  fps_score_use_footprint_reference_mol2                       yes
  fps_score_footprint_reference_mol2_filename                  ../001_structure/1HW9_lig_wH.mol2
  fps_score_foot_compare_type                                  Euclidean
  fps_score_normalize_foot                                     no
  fps_score_foot_comp_all_residue                              yes
  fps_score_receptor_filename                                  ../001_structure/1HW9_rec_wH.mol2
  fps_score_vdw_att_exp                                        6
  fps_score_vdw_rep_exp                                        9
  fps_score_vdw_rep_rad_scale                                  1
  fps_score_use_distance_dependent_dielectric                  yes
  fps_score_dielectric                                         4.0
  fps_score_vdw_fp_scale                                       1
  fps_score_es_fp_scale                                        1
  fps_score_hb_fp_scale                                        0
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no

In order to execute the script:

  dock6 -i footprint.in -o footprint.out

Executing the script may produce an error, but that is okay. The script should still produce three desired files. These files include: footprint.out_footprint_scored.txt footprint.out_hbond_scored.txt and footprint.out_scored.mol2

However, we are not done as we still need to visualize the molecular footprint. We can do this by using a prewritten python script. You first need to copy the scipt into your current working directory.

  cp /gpfs/projects/AMS536/2020/536_class/steve_ta/footprint_test_1.21.2020/plot_footprint_single_magnitude.py\

In order to execute the script which utilizes the files we just generated:

  python plot_footprint_single_magnitude.py footprint.out_footprint_scored.txt  50

The 50 at the end of the line indicates that we will only be looking at the 50 most significant residues. The footprint.out.pdf that is generated will have to be copied to your local computer in order for you to visualize it. It should look like the following image:

Docking

Rigid Docking

In rigid docking, the ligand is kept in a static position and has no rotational freedom. This means that the calculation is the least computationally expensive compared to the other forms of docking we will be performing. Before beginning, make sure you are in the appropriate directory as we will be generating several files in this directory.

  cd 004_dock

We'll need to create an input file for the rigid docking calculation. NOTE: Typically, the max_orientations parameter has a value of 1000. However, in our case. we need to change that parameter to 2000 in order to increase sampling. When using 1000, the RMSD values are not within the desired <2 angstroms. Doubling the max orientations remedies this issue.

  vi rigid.in

The parameters in this input file are as follows:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100
  ligand_atom_file                                             1HW9.lig.min_scored.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      1HW9.lig.min_scored.mol2
  use_database_filter                                          no
  orient_ligand                                                yes
  automated_matching                                           yes
  receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph
  max_orientations                                             2000
  critical_points                                              no
  chemical_matching                                            no
  use_ligand_spheres                                           no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000

In order to execute:

  dock6 -i rigid.in

If the command runs successfully, the following file will be generated: rigid.out_scored.mol2 You should vi into this mol2 file to confirm that the RMSD values are within the acceptable range. We can also visualize the mol2 file in Chimera to see the conformer generated compared to the original ligand.

The blue molecule represents the conformer generated during the rigid docking process. The beige molecule is the original ligand. As you can see, they appear very similar in shape and overall structure.

Fixed Anchor Docking

Fixed anchor docking is a bit more computationally expensive compared to the previous DOCK setup. In FAD, a large part of the ligand will remain anchored but will have some rotational freedom compared to other rigid groups in the ligand. Only the largest structure of the ligand will be fixed. Let's create the input file for the fixed anchor docking.

  vi fixed.in

The input file will have the following properties:

  conformer_search_type                                        flex
  write_fragment_libraries                                     no
  user_specified_anchor                                        no
  limit_max_anchors                                            no
  min_anchor_size                                              5
  pruning_use_clustering                                       yes
  pruning_max_orients                                          10000
  pruning_clustering_cutoff                                    100
  pruning_conformer_score_cutoff                               100.0
  pruning_conformer_score_scaling_factor                       1.0
  use_clash_overlap                                            no
  write_growth_tree                                            no
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             ../001_structure/1HW9_lig_wH.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      ../001_structure/1HW9_lig_wH.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           yes
  grid_score_secondary                                         no
  grid_score_rep_rad_scale                                     1
  grid_score_vdw_scale                                         1
  grid_score_es_scale                                          1
  grid_score_grid_prefix                                       ../003_gridbox/grid
  multigrid_score_secondary                                    no
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no

In order to execute the command:

  dock6 -i fixed.in -o fixed.out

If this is successful, you should see two new files appear in the current directory. These files include 1HW9_fixed.out and 1HW9_fad_scored.mol2 You should absolutely open up the 1HW9_fad_scored.mol2 file if you haven't already as an important lesson lies within the data of the file. You'll notice that the top-scoring molecule in that file has RMSD values much greater than 2 angstroms. As a matter of fact, the second molecule also follows the same trend. It isn't until the third molecule that the RMSD values are less than 2 angstroms. This is what's known as a scoring failure. During the docking process, there are three types of outcomes. A docking success occurs when the top-scoring molecule has RMSD values lower than 2 angstroms. A scoring failure occurs when the top-scoring molecule is not within 2 angstroms but you're list of molecules samples compounds within 2 angstroms. Finally, a sampling failure exists where your list of compounds does not contain a single molecule within 2 angstroms of the crystallized ligand.

Flex Docking

Flex docking is considered to be the most computationally expensive out of the three dock calculations we'll perform. During this calculation, the ligand has full rotational freedom and all internal degrees of freedom are being sampled. It is considered to be the most rigorous. Let's create the input file so that we can run this calculation.

Revision as of 14:42, 13 March 2021 (view source) Stonybrook (talk \| contribs) (→‎Fixed Anchor Docking) ← Previous diff		Revision as of 14:47, 13 March 2021 (view source) Stonybrook (talk \| contribs) (→‎Flex Docking) Next diff →
Line 491:		Line 491:

	=== '''Flex Docking''' ===		=== '''Flex Docking''' ===
		+	Flex docking is considered to be the most computationally expensive out of the three dock calculations we'll perform. During this calculation, the ligand has full rotational freedom and all internal degrees of freedom are being sampled. It is considered to be the most rigorous. Let's create the input file so that we can run this calculation.

Difference between revisions of "2021 DOCK tutorial 1 with PDBID 1HW9"

Revision as of 14:47, 13 March 2021

Contents

Introduction

DOCK

IHW9

Directory Preparation

Receptor Preparation

Downloading and Opening 1HW9

DOCK Prep of the Protein Receptor

Ligand Preparation

Isolating the Ligand

DOCK Prep of the Ligand

Surface Generation and Selection of Spheres

Visualizing the Surface

Sphere Selection

Box and Grid Generation

Box Generation

Grid Generation

Energy Minimization

Footprint Analysis

Docking

Rigid Docking

Fixed Anchor Docking

Flex Docking

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Rizzo Lab

Courses

Toolbox