Difference between revisions of "2022 DOCK tutorial 2 with PDBID 4ZUD"
Stonybrook (talk | contribs) (→4ZUD System) |
Stonybrook (talk | contribs) (→4ZUD System) |
||
Line 16: | Line 16: | ||
inverse agonist. | inverse agonist. | ||
− | [ | + | [https://www.sciencedirect.com/science/article/pii/S0021925820395302?via%3Dihub Crystal Structure Paper] |
=== '''Directory Preparation''' === | === '''Directory Preparation''' === |
Revision as of 14:57, 7 March 2022
Contents
Introduction
DOCK
DOCK is a molecular modeling program capable of sampling lower-energy ligand conformations with respect to a binding surface on a given protein. DOCK utilizes and manipulates the geometry of the ligand to find the conformation that yields that most favorable interaction with the respective binding site. With this tool, millions of molecules can be rapidly screened against a target protein for the purposes of identifying new drug molecules that are physiologically relevant.
For more information on DOCK and it's uses, please refer to their online manual: DOCK6 Manual
4ZUD System
4ZUD is an angiotensin II type 1 receptor. It is primarily expressed in liver, heart and kidney cells with its primary function being its ability to regulate blood pressure. Perturbations in chemical cascades that involve this kind of receptor often lead to cardiovascular disease and hypertension; both common ailments.
Since angiotensin II type 1 overstimulation is so common, there has been a large effort to find ways to modulate the protein with drug molecules.
The 4ZUD structure is an angiotesin II type 1 receptor binding domain crystal in complex with Olmesartan, an inverse agonist.
Directory Preparation
Before beginning, create the following directories in your space so that all necessary files are organized and can be access quickly:
mkdir 001.structure 002.surface_spheres 003.gridbox 004.dock 005.virtual_screen 006.cartesian_min 007.rescore
You don't have to name your directories the same as they are named here, but be cautious since the files that will be used for this tutorial utilize this naming scheme. They will need to be changed in each file that refers to them if you don't use this naming scheme!
Be sure to have Chimera installed on your system as it will be our primary visualization and system-editing program.
Protein and Ligand Preparation
Download the 4ZUD PDB file from the RCSB PDB website and open the file in Chimera.
Select -> Open -> (pathway to pdb file on your local machine) -> Open
You will notice a few side chain residues are explicitly displayed; those are the ones that directly engage with the ligand. The structure also has some missing regions denoted by the dashed-lines. These regions do not have to be modeled to use the system for docking since the majority of the protein remains restrained during the process (except for the residues of the active site, to a certain extent). You can play around with Chimera and visualize the protein from different angles to get a complete look at the protein to ensure there are no glaring errors in the structure that could have somehow arose from the downloading and opening process (Doesn't usually happen, but it's always good to be sure before moving on!)
Protein Preparation
Many structures deposited in the PDB lack hydrogens due to the difficulty in resolving their electron densities from cryo-EM or X-ray crystallography. The structures also lack formal charges since that information is not captured with out current experimental structure-determining techniques. Both charges and hydrogens are crucial for accurately studying any chemical system, and so they both must be added manually to 4ZUD in order to prime the system for docking.
First we want to delete the ligand from the file since we want to save the protein separately
Select -> Residue -> OLM Actions -> Atoms/Bonds -> Delete
Adding Hydrogens and Charges
To select the entire protein:
Select -> Chain -> A
To add the hydrogens and charges to the protein we will use Dock Prep:
Tools -> Surface/Binding Analysis -> DockPrep
The first panel should look something like this:
Leave the default setting on and press okay. You will then be prompted to add the hydrogens:
Again, leave the default settings.
Lastly, DockPrep will assign charges:
Make sure that this value of the calculated charge (which will be outputted in the Chimera reply log) is in agreement with what the PDB website or the corresponding paper says about the charge, if that information is provided.
File Saving
Once you have completed all of the aforementioned steps, you have to save the protein as a mol2 file.
File -> Save Mol2
For the purposes of this tutorial the file will be called 4ZUD_protein_hydrogens.mol2
You will want to save a version of this file without hydrogens. To do that, you can select and delete all of the hydrogens like so:
Select -> Chemistry -> element -> H Actions -> Atoms/Bonds -> Delete
You can then resave the file. 4ZUD_protein_without_hydrogens.mol2
Ligand Preparation
Now we want to focus on preparing the ligand. We can reopen the unmodified 4ZUD PDB in Chimera and delete the protein from the file. You can do this by doing an inverse selection for the protein. First select the ligand:
Select -> Residue -> OLM
On your keyboard, press Shift + Right-Arrow keys simultaneously to invert the selection to the parts that belong to the protein. You can then delete your selection, and you should be left with just the ligand:
Actions -> Atoms/Bonds -> Delete
Adding Hydrogens and Charges
To add hydrogens to the ligand, you can follow the same procedure that you used to add them to the protein § Adding Hydrogens and Charges.
DockPrep predicts a net -1 charge for the ligand. This makes sense since there is a carboxylate group in the molecules. Again, make sure to corroborate this with what the paper states and/or the PDB website states about the charge.
File Saving
You can then save the file the same way you did for the protein mol2. For the purposes of this tutorial, the file will be named 4ZUD_ligand_hydrogens.mol2
Before going any further, make sure to copy the mol2 files that were generated thus far from your local computer to Seawulf. Place these files in the 001.structure directory, or whichever one is the equivalent directory in your records.
Surface Spheres Generation
Grid Box
Grid Generation
UNDER CONSTRUCTION -- FILE NAMES, DIRECTORIES ARE FROM AS. WILL CHANGE THEM TO JA CONVENTIONS.
Next, we'll need to generate the grid we'll use in subsequent steps. Let's start by creating our grid.in file:
vi grid.in
Into this file copy-paste the following:
compute_grids yes grid_spacing 0.4 output_molecule no contact_score no energy_score yes energy_cutoff_distance 9999 atom_model a attractive_exponent 6 repulsive_exponent 9 distance_dielectric yes dielectric_factor 4 bump_filter yes bump_overlap 0.75 receptor_file ../001_structure/4zud_receptor_wcharges_wH.mol2 box_file 4zud.box.pdb vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn score_grid_prefix grid
Now that there is a grid.in file, the grid can be produced by running the following:
grid -i grid.in -o gridinfo.out
Upon successful completion of the grid generation, there should be three new files in your working directory: grid.bmp, grid.nrg, and gridinfo.out.
Box Generation
Before we do anything, we must be in the correct directory:
cd 003.gridbox
To generate a box around our subject, we need an .in file for Showbox:
vi showbox.in
Into which we will copy-paste the following:
Y 8.0 ../002_surface_spheres/4zud_receptor_trunc_spheres_sel.sph 1 4zud.box.pdb
Now that we have a .in file, we will produce a .box.pdb file by running the following:
showbox < showbox.in
After running this, you should have a 4ZUD.box.pdb file in the directory which you ran these commands.
Ligand Minimization
While the ligand in the 4ZUD crystal structure seems to be in a reasonable pose (no steric clashes, etc.), and is of course bound to the receptor, it may not be representative of the lowest energy conformation. Assuming this is the case, the use of the non-minimized ligand conformation will reduce the accuracy of any calculations we perform with it.
One can avoid this problem by performing an energy minimization of the ligand. The first step to do this with 4ZUD will be to cd into the appropriate directory:
cd 004.dock
We'll be doing some other work in this directory later, so we should create a directory within this one specifically for performing our minimization:
mkdir min
cd into /min/ and produce an input file:
vi min.in
Into which the following should be copy-pasted:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/4zud_ligand_wcharges_wH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/4zud_ligand_wcharges_wH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min yes simplex_coefficient_restraint 10.0 atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix 4zud.lig.min
Now that we have our .in file, let's put it to use:
dock6 -i min.in -o min.out
Once this finishes running, two files will be generated: 4zud.lig.min_scored.mol2 and min.out. Copy 4zud.lig.min_scored.mol2 to your local machine and open it alongside the original ligand in chimera. It should something like this:
Where the blue structure is the ligand before minimization, and the beige ligand is after.
Footprint Analysis
DOCKING
The main goal of this section is to conduct a pose reproduction of the crystal structure ligand to ensure that we are able to successfully re-produce the low-energy pose that the ligand was crystalized in. If docking is not successful at this, then this system may not be suitable for further testing using DOCK since it is not able to produce a pose that is already known to be sampled by the ligand, based on observations from the crystal. For these reasons, pose reproduction is a worthwhile preliminary step in any docking experiments so that we can be more confident in our results if the system will be used for more complex docking protocols such as virtual screening or de novo design.
In the current implementation of DOCK6, a sampling success is defined as a pose RMSD that is < 2 Å relative to the crystal pose.
Rigid Docking
Rigid docking is the least computationally intensive algorithm in the DOCK program since it does not allow for dihedral rotations, or bond length perturbations. Since we are just focusing on pose reproduction (and since we know that the crystal structure ligand pose is of biological relevance), this docking method would be a good first step.
In your 004.dock directory prepare this file:
vim rigid.in
Insert the following into that file:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100 ligand_atom_file ../003.gridbox/4zud.lig.min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../003.gridbox/4zud.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002.surface_spheres/selected_spheres.sph max_orientations 2000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003.gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix rigid.out write_orientations no num_scored_conformers 1 rank_ligands no
Run the input file with the following command:
dock6 -i rigid.in
Along with an out file, the rigid.out_scored.mol2 should have been created. This is your docked ligand(s). The ligand maybe be superimposed onto the crystal structure ligand using View Dock in Chimera:
Tools -> Surface/Binding Analysis -> ViewDock -> (Path to docked mol2) -> Dock 4, 5, or 6
The crystal reference ligand is colored in brown.
Notice how similar the reproduced pose is to the crystal pose. This is largely due to not allowing much position correction/movement during the docking; hence rigid docking. This is the simplest form of docking so you should expect to see very small RMSDs. The RMSD for this DOCK run is below our threshold so it is considered to be a sampling success.
Fixed Anchor Docking
With fixed anchor docking, we can introduce more flexibility into the conformational sampling when trying to re-dock the ligand into the active site. This is done by choosing certain fragments from the ligand as "anchors" which are manually placed in specific locations on the active site, which serve as points from where the other fragments that make up the known crystal ligand can attach to and grow from.
This docking algorithm will allow for the sampling of rotamers of the linker moieties attached to the fixed anchor(s), allowing for a more diverse free energy landscape to explore; possibly yielding more theoretically favorable ligand conformations!
In a new sub directory:
mkdir Fixed_anchor_docking
In a new input file:
vi fixed.in
Insert the following:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 10000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../../001.structure/4ZUD_ligand_hydrogens.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../../001.structure/4ZUD_ligand_hydrogens.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../../003.gridbox/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1 simplex_trans_step 1 simplex_rot_step 0.1 simplex_tors_step 10 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix 4ZUD_fad write_orientations no num_scored_conformers 1 rank_ligands no
To run the analysis:
dock6 -i fixed.in -O fixed.out
The resulting docked pose:
The RMSD in this case is still small, and within the cutoff, but slightly larger than the rigid docking which is also evident in the image. It makes sense that the pose generated from fixed anchor docking is not as close to the crystal pose since we allowed the algorithm to explore different rotational conformations, increasing the likelihood of generating a new conformation.
Flex Docking
In Flex Docking the ligand is fully flexible, as its name would suggest. All degrees of freedom are sampled and all atoms within the ligand have full rotational freedom about their bonds. Due to the rigor of this calculation, however, flex docking is quite resource and time-consuming. This of course means that we will need to submit our run to the queue. Before we do that, we need to make an .in file for our run:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 5000 pruning_clustering_cutoff 2500 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../../003.gridbox/4zud.lig.min_scored.mol2 #CHANGE ME limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../../003.gridbox/4zud.lig.min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../../002.surface_spheres/selected_spheres.sph #CHANGE ME max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../../003.gridbox/grid #CHANGE ME multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix flex.out write_orientations yes num_scored_conformers 1 rank_ligands no
Now that we have our .in file, we need to make a slurm script to run it. To make a script, first run the following:
vi flex.sh
Inside of this file, copy-paste the following:
#!/bin/bash #SBATCH --time=10:00:00 #SBATCH --nodes=1 #SBATCH --ntasks=28 #SBATCH --job-name=4zud_flex_dock #SBATCH --output=4zud_flex_dock #SBATCH -p long-28core #SBATCH --mail-type=BEGIN,END #SBATCH --mail-user=<your_netid>@stonybrook.edu
cd $SLURM_SUBMIT_DIR dock6 -i flex.in -o 4zud_flex.out
You should get an email when your run begins and when it ends. For most systems, this should take a few hours to run. If the run is much shorter than that it can be an indicator something has gone wrong. Once complete, three files will appear in the directory where you ran your script: flex.out_scored.mol2, flex.out_conformers.mol2 and flex.out_orients.mol2. Be sure to take a look at your flex.out_scored.mol2. You should see that all RMSD values are less than 2 angstroms -- a desirable result.
Virtual Screening
Virtual Screening is a powerful technique in which we dock a library of molecules (often numbering in the tens of thousands) into our receptor and rank them on various parameters. As one might imagine, these calculations are time and resource-intensive, so we won't be running this on the head node.
The molecule library we will use in this tutorial is "VS_library_25K.mol2", and can be found in /gpfs/software/AMS536/zzz.programs. Copy this file to <virtual screening directory> and <cartesian minimization directory>, since it will be needed later.
To start off, move into the appropriate directory:
<directory>
Once there, make an .in file for the screen:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 1000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file VS_library_25K.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_sphere/selected_spheres.sph #CHANGE ME max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../003_gridbox/grid #CHANGE ME multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix virtual.out write_orientations no num_scored_conformers 1 rank_ligands yes max_ranked_ligands 500
To ensure that the .in file has been configured correctly, we will do a test run of the calculation:
dock6 -i virtual.in
If the calculation seems to begin without issue, you can press ctrl+c to end the test. We are ready to submit the calculation as a job. Since we can leverage multiple cores for our virtual screen, we will run the command with MPI. The slurm script can be made by running the following:
vi virtual_screen.sh
Then you need to copy-paste the following into the file:
#!/bin/bash #SBATCH --time=48:00:00 #SBATCH --nodes=4 #SBATCH --ntasks=28 #SBATCH --job-name=4zud_vs #SBATCH --output=4zud_vs #SBATCH -p long-28core #SBATCH --mail-type=BEGIN,END #SBATCH --mail-user=andrew.sillato@stonybrook.edu
cd $SLURM_SUBMIT_DIR mpirun -np 112 dock6.mpi -i virtual.in -o virtual.out
To submit this job to the queue, run:
sbatch virtual_screen.sh
Due to the fact that this job is running on 112 cores, 112 .out files will be produced over the course of the run. Their contents are summarized in virtual.out, which should be transferred to your local machine for viewing in Chimera.
add note about truncating the virtual.out file here