2023 DOCK tutorial 2 with PDBID 3WZE

From Rizzo_Lab
Revision as of 15:45, 9 March 2023 by Stonybrook (talk | contribs) (Footprinting)
Jump to: navigation, search

In this tutorial, you will learn how use the program DOCK6.10 to perform a virtual screen, in which you assess how well the molecules in a library of drug-like molecules bind to a protein of known structure.

Introduction

A protein whose function is found to be involved in one or more diseases may become a target for pharmaceutical design. Oftentimes, these pharmaceuticals are designed to compete with the enzyme's native substrate for the enzyme's active site, making many pharmaceutical molecules competitive inhibitors of their protein targets. If the target protein's structure is known, and the active site can be identified, then performing a virtual screen can be a monetarily and temporally efficient method of identifying molecules which are likely to bind well to the target's active site.

A virtual screen is set up by first preparing the enzyme's structure and the structure of its native substrate for docking, then the residues important for the native ligand to bind are identified by generating a footprint. A large library of drug-like molecules is then downloaded from a database such as ZINC [REFERENCE], and, using the footprint and enzyme structure, docked into the enzyme using a program such as DOCK6.10 [REFERENCE]. Results are then assessed to see which drug-like compounds match the native substrate's footprint profile and which are energetically comfortable within the simulated active site. Such molecules could then be tested biochemically for their ability to inhibit the target protein, sparing biochemists the hassle of having to test hundreds of thousands of compounds in physical screening experiments.

Software and Files

PDB 3WZE

PDB stands for "Protein Data Bank", which is a repository for the experimentally solved structures of proteins. Each protein structure is assigned a 4 digit code, and 3WZE is the code assigned to the solved structure of vascular endothelial growth factor receptor 2 (VEGFR2) bound to the inhibitor sorafenib. VEGFR2 is a receptor tyrosine kinase, meaning that it is an integral membrane protein that has an exofacial receptor domain, transmembrane domain, and cytofacial kinase domain. Because of the difficulties of protein purification, crystallization, and structure solution, many protein structures in the protein data bank are incomplete: lacking regions that are intrinsically disordered or otherwise not conducive to crystallization. PDB 3WZE is one such structure, because it only contains structural data for the cytofacial kinase domain of VEGFR2.

The active site of kinases is well characterized, and sorafenib is shown bound within it in PDB 3WZE, which will be useful for conducting later steps in the virtual screen.

Download the .pdb file of 3WZE, and use a program such as Chimera or ChimeraX to open and view it.

DOCK6.10

Chimera

ChimeraX (optional)

Chimera is now no longer actively developed, and has been succeeded by ChimeraX, which is developed by the same group [REFERENCE]. Although ChimeraX has lost some of the functionality of its predecessor, it has new capabilities to compensate, and it is easier to operate using typed commands, whereas Chimera requires clicking through menus. That being said, Chimera is still required for this tutorial because ChimeraX cannot open .sph files and it cannot save a surface as a .dms file.

Separate instructions for completing the tutorial with ChimeraX will be provided alongside the instructions for using Chimera in each section.

Alphafold (optional)

Alphafold is a protein structure prediction program from Google's DeepmindREFERENCE, and it was the first program to predict the structures of proteins in the annual CASP competition to within 90% accuracy in 2020 REFERENCE. Using this program, one can generate a reasonably accurate prediction of a protein's structure using only its amino acid sequence. In the context of virtual screening, this means that a protein's structure no longer needs to be solved experimentally before one can embark on a virtual screen of the target. As long as the active site can be identified (which is often done by comparing the predicted structure to homologous proteins with solved structures), one can perform a virtual screen of a protein of unsolved structure.

Even without a university server, Alphafold can be used from within ChimeraX by going to Tools -> Structure Prediction -> Alphafold. This will bring up a menu in which you can paste an amino acid sequence for prediction, searching, or retrieval from the Alphafold database. All human proteins have already been predicted by Alphafold, and their structures can be easily retrieved using the protein's UniProt identifier and the Fetch button. Non-human proteins will have to be predicted from scratch by inputting their amino acid sequence and using the Predict button.

Because this tutorial will use PDB file 3WZE, which contains the solved structure of the Vascular Endothelial Growth Factor Receptor and a bound inhibitor called sorafenib, Alphafold will be unnecessary for this tutorial, but the broadened scope of what virtual screens are possible as a result of this program is worth noting nonetheless.

Using the Terminal

Seawulf uses the Linux language for various commands. Here are some of the basic commands that will make running the tutorial run smoothly.

The first command is cd, which is used to change into different directories. To go to a desired directory do:

cd /desired/directory

To move to a previous directory you were just in, type:

cd ../

The cd command can also be used to get into a folder, type:

cd /desired/folder

The next command is the ls command. This will list out all the files or folders in the directory you are currently in, type:

ls

To make a new directory, type:

 mdkir name_of_directory

To remove a file, type:

rm file

To delete a directory and all its contents, type:

rm -r directory

To create a file in the terminal, type:

vi filename

This will open up a blank screen in which you can start typing. To edit the file, first type “i”. This will make the environment go to “INSERT” mode. You can now freely type. In order to save whatever you wrote in the file, click the escape button. This will remove you from “INSERT” mode. Next type:

:wq

This will save any changes you have made to your file

To move a file to a new location, type:

mv file /new/destination 

To make a copy of a file, type:

cp file file_new_name

This command can also be used to copy directories as well, type:

cp -r directory_1 directory_2

Directory Organization

Having defined folders established before starting is a great way to maintain organization and clarity as you go.

 000_files
 001_structure
 002_surface_spheres
 003_gridbox
 004_dock
 005_virtual_screen
 006_virtual_screen_mpi
 007_cartesian_min
 008_rescore

Additionally, file names should be clear and logical. We recommend starting with the PDB code followed by the receptor or ligand and ending with what changes were made. For example, 3WZE_lig_AddH_addCharge.mol2 indicates that this file is the ligand for 3WZE and hydrogens and charge have been added.

Preparing the Receptor

Preparing the Ligand

Surface & Spheres

Surface Generation

The generation of the surface of the receptor protein will have better visualization of the binding site of the receptor. In order to do so, load the mol2 file of 3WZE receptor in Chimera:

 Action < Surface < show

Then we will save the dms(molecular surface) file as 3WZE_rec.dms, which will be used in the next step of spheres generation:

 Tools < Structure Editing < Write DMS

Spheres Generation

First create a new file “INSPH” by typing the following into terminal:

 vi INSPH

Then type the following information into this file (anything after # are comments that are references for reading only, and should not be putting into the file) :

 3WZE_rec.dms  #Molecular Surface file
 R             #This line specifies whether to make the spheres in the model’s exterior (R) or interior (L)
 X             #Specifies that the surface points from the dms file should be used for making spheres(?)
 0             #Minimum radius between spheres
 4.0           #Max radius of each sphere
 1.4           #Minimum radius of each sphere
 3WZE.sph      #Name of the output file that will be made

Then save the file and run the following command on terminal: note that if you want to re-run this command, make sure you remove file “OUTSPH” before your re-run because this file cannot be overwritten.

 sphgen -i INSPH -o OUTSPH

The output file “3WZE.sph” can be visualized by using Chimera, load the output file along with the receptor protein mol2 file, the two structures should be overlapping:

Spheres selection

Since we are only interested in what’s happening in the binding site, we will remove spheres that are far away from the ligand. To do so, use the following command and an output file “selected_spheres.sph” will be generated:

 sphere_selector 3WZE.sph 3WZE_lig.mol2 10.0
 #sphere_selector [sphere output file] [ligand mol2 file] [radius for distance from ligand]

Box and Grid

Box generation

First we will generate a box that will be used to generate the grid. To generate a box, we will need to create a new file “showbox.in” by using the following command:

 vi showbox.in

Then type the following information into the file (anything after # are comments that are references for reading only, and should not be putting into the file) :

 Y                         #Indicates that we want to make a box
 8.0                       #Extra length beyond the spheres to be included by the box in all directions
 selected_spheres.sph      #Indicates that the box should be constructed to encompass the spheres in this file.
 1                         #This specifies which cluster of spheres should be used to generate the box. 
 3WZE.box.pdb              #Name of output file

Then, run the following code to get the boxed structure:

 showbox < showbox.in

Grid generation

Create a new file “grid.in” by using the following command:

 vi grid.in

Then type the following information into the file:

 compute_grids                             yes
 grid_spacing                              0.4
 output_molecule                           no
 contact_score                             no
 energy_score                              yes
 energy_cutoff_distance                    9999
 atom_model                                a
 attractive_exponent                       6
 repulsive_exponent                        9
 distance_dielectric                       yes
 dielectric_factor                         4
 bump_filter                               yes
 bump_overlap                              0.75
 receptor_file                             3WZE_rec.mol2
 box_file                                  3WZE.box.pdb
 vdw_definition_file                       /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
 score_grid_prefix                         grid

Then generate the grid by typing the following command:

 grid -i grid.in -o gridinfo.out

This will generate three files: “gridinfo.out”, “grid.bmp”, and “grid.nrg”. Open “gridinfo.out” and check if the information about the receptor matches the that from the paper and your previous preparations. Also check to make sure there are no error messages at the bottom.

Reproducing the PDB's binding with DOCK

Energy minimization

To improve the accuracy of DOCK calculation of ligand protein binding, we will have to minimize the potential energy of the ligand. To do so, create a new file “min.in” by using the following command:

 vi min.in

Then type the following information into the file:

 conformer_search_type                                        rigid
 use_internal_energy                                          yes
 internal_energy_rep_exp                                      12
 internal_energy_cutoff                                       100.0
 ligand_atom_file                                             3WZE_rec.mol2
 limit_max_ligands                                            no
 skip_molecule                                                no
 read_mol_solvation                                           no
 calculate_rmsd                                               yes
 use_rmsd_reference_mol                                       yes      
 rmsd_reference_filename                                      3WZE_rec.mol2
 use_database_filter                                          no
 orient_ligand                                                no
 bump_filter                                                  no
 score_molecules                                              yes
 contact_score_primary                                        no
 contact_score_secondary                                      no
 grid_score_primary                                           yes
 grid_score_secondary                                         no
 grid_score_rep_rad_scale                                     1
 grid_score_vdw_scale                                         1
 grid_score_es_scale                                          1
 grid_score_grid_prefix                                       grid 
 multigrid_score_secondary                                    no
 dock3.5_score_secondary                                      no
 continuous_score_secondary                                   no
 footprint_similarity_score_secondary                         no
 pharmacophore_score_secondary                                no
 descriptor_score_secondary                                   no
 gbsa_zou_score_secondary                                     no
 gbsa_hawkins_score_secondary                                 no
 SASA_score_secondary                                         no
 amber_score_secondary                                        no
 minimize_ligand                                              yes
 simplex_max_iterations                                       1000
 simplex_tors_premin_iterations                               0
 simplex_max_cycles                                           1
 simplex_score_converge                                       0.1
 simplex_cycle_converge                                       1.0
 simplex_trans_step                                           1.0
 simplex_rot_step                                             0.1
 simplex_tors_step                                            10.0
 simplex_random_seed                                          0
 simplex_restraint_min                                        yes
 simplex_coefficient_restraint                                10.0
 atom_model                                                   all
 vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
 flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex.defn
 flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex_drive.tbl
 ligand_outfile_prefix                                        3WZE.lig.min
 write_orientations                                           no
 num_scored_conformers                                        1
 rank_ligands                                                 no

Then run the following command through the use of DOCK:

 dock6 -i min.in

The output file “3WZE.lig.min” is the potential energy minimized version of the ligand.

Footprinting

Dock6 has a feature in which we can visualize both the electrostatic and Van der interactions between the ligand and protein. This will show which amino acids are contributing to the binding of the ligand to the protein.

To determine the molecular footprint, create the following input file in your 004_dock directory:

vi footprint.in

Copy the following into your file:

conformer_search_type					rigid
use_internal_energy 						no
ligand_atom_file 						ligand file name
limit_max_ligands 						no
skip_molecule 						no
read_mol_solvation 						no
calculate_rmsd 						no 
use_database_filter 						no
orient_ligand 							no
bump_filter 							no
score_molecules 						yes
contact_score_primary 					no
contact_score_secondary 					no
grid_score_primary 						no
grid_score_secondary 					no 
multigrid_score_primary 					no
multigrid_score_secondary            				no
dock3.5_score_primary 					no
dock3.5_score_secondary 					no
continuous_score_primary 					no
continuous_score_secondary    				no
footprint_similarity_score_primary 				no
footprint_similarity_score_secondary 			no
fps_score_use_footprint_reference_mol2 			no
fps_score_use_footprint_reference_mol2       		no
fps_score_foot_compare_type 				Euclidean
fps_score_normalize_foot 					no
fps_score_foot_comp_all_residue 				no
fps_score_receptor_filename 					receptor name 
fps_score_vdw_att_exp 					6
fps_score_vdw_rep_exp 					9
fps_score_vdw_rep_rad_scale  				1
fps_score_use_distance_dependent_dielectric 		yes
fps_score_dielectric 						4.0
fps_score_vdw_fp_scale 					1
fps_score_es_fp_scale 					1
fps_score_hb_fp_scale 					0
pharmacophore_score_secondary 				no
descriptor_score_secondary 					no
gbsa_zou_score_secondary 					no
gbsa_hawkins_score_secondary 				no
SASA_score_secondary					no
amber_score_secondary 					no
minimize_ligand 						no
atom_model 							all
vdw_defn_file 						
flex_defn_file 							
flex_drive_file 						
ligand_outfile_prefix						footprint.out
write_footprints 						yes
write_hbonds 							yes
write_orientations 						no
num_scored_conformers 					1
rank_ligands 							no

In order to run the footprint calculation, type:

dock6 -i footprint.in -o footprint.out

An error might occur when running footprint with dock, but that is fine. Three files will be generated after a successful run. The files that are generated are footprint.out_footprint_scored.txt footprint.out_hbond_scored.txt footprint.out_scored.mol2. In order to visualize the molecular footprint we need to use a python script provided here a prewritten python script. To run the script just do:

python plot_footprint_single_magnitude.py footprint.out_footprint_scored.txt  50

This python script will generate a graph that shows the Van der Waals and Electrostatic contributions of the 50 most contributing amino acids. Use a pdf viewer to visualize the graph you just created. It should look similar to this:

Virtual Screen

By having a predetermined compound library, virtual screening in DOCK can score these in the compound library based on the input parameters. In this case, we used a smaller library “VS_library_25K.mol2” that was provided by the instructor . First create a new file “virtual.in” for inputting virtual screening parameters:

 vi virtual.in

Then type the following information into the file:

 conformer_search_type                                        flex
 user_specified_anchor                                        no
 limit_max_anchors                                            no
 min_anchor_size                                              5
 pruning_use_clustering                                       yes
 pruning_max_orients                                          1000
 pruning_clustering_cutoff                                    100
 pruning_conformer_score_cutoff                               100.0
 pruning_conformer_score_scaling_factor                       1.0
 use_clash_overlap                                            no
 write_growth_tree                                            no
 write_fragment_libraries                                     no
 use_internal_energy                                          yes
 internal_energy_rep_exp                                      12
 internal_energy_cutoff                                       100.0
 ligand_atom_file                                             VS_library_25K.mol2
 limit_max_ligands                                            no
 skip_molecule                                                no
 read_mol_solvation                                           no 
 calculate_rmsd                                               no
 use_database_filter                                          no
 orient_ligand                                                yes
 automated_matching                                           yes
 receptor_site_file                                           selected_spheres.sph
 max_orientations                                             1000
 critical_points                                              no
 chemical_matching                                            no
 use_ligand_spheres                                           no 
 bump_filter                                                  no 
 score_molecules                                              yes
 contact_score_primary                                        no
 contact_score_secondary                                      no
 grid_score_primary                                           yes
 grid_score_secondary                                         no
 grid_score_rep_rad_scale                                     1
 grid_score_vdw_scale                                         1
 grid_score_es_scale                                          1
 grid_score_grid_prefix                                       grid
 multigrid_score_secondary                                    no
 dock3.5_score_secondary                                      no
 continuous_score_secondary                                   no
 footprint_similarity_score_secondary                         no
 pharmacophore_score_secondary                                no
 descriptor_score_secondary                                   no
 gbsa_zou_score_secondary                                     no
 gbsa_hawkins_score_secondary                                 no
 SASA_score_secondary                                         no 
 amber_score_secondary                                        no
 minimize_ligand                                              yes
 minimize_anchor                                              yes
 minimize_flexible_growth                                     yes
 use_advanced_simplex_parameters                              no
 simplex_max_cycles                                           1
 simplex_score_converge                                       0.1
 simplex_cycle_converge                                       1.0
 simplex_trans_step                                           1.0
 simplex_rot_step                                             0.1
 simplex_tors_step                                            10.0 
 simplex_anchor_max_iterations                                500
 simplex_grow_max_iterations                                  500
 simplex_grow_tors_premin_iterations                          0
 simplex_random_seed                                          0
 simplex_restraint_min                                        no
 atom_model                                                   all
 vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
 flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex.defn
 flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex_drive.tbl
 ligand_outfile_prefix                                        virtual.out
 write_orientations                                           no 
 num_scored_conformers                                        1
 rank_ligands                                                 no

To prevent overload of the current working load and speed up the process, we can write the code for virtual screening along with requesting more nodes in s shell script and submit the job. To do so, first create a new file “virtual.sh”:

 vi virtual.sh

Then type the following information into this file:

 #!/bin/bash
 #SBATCH --time=5:00:00
 #SBATCH --nodes=4
 #SBATCH --ntasks=112
 #SBATCH --job-name=3WZE_flex_dock
 #SBATCH --output=3WZE_flex_dock
 #SBATCH -p long-28core
 mpirun -np 112 dock6.mpi -i virtual.in -o 3WZE.virtual.mpi.out

Then submit the job by typing the following command:

 sbatch virtual.sh

Since we specified the number of task to be 112, we are expecting to see 112 output files, name in the format "virtual.out.xxx". In addition, the output contains a mol2 file with all 112 virtual screening results inside.

Cartesian Energy Minimization

We have to perform energy minimization on the virtual screening results before analyzing. First, create a new file “cart_min.in”:

 vi cart_min.in

Then type the following information into the file:

 conformer_search_type                                        rigid
 use_internal_energy                                          yes
 internal_energy_rep_exp                                      12
 internal_energy_cutoff                                       100.0
 ligand_atom_file                                             3WZE_virtual.out_scored.mol2
 limit_max_ligands                                            no
 skip_molecule                                                no
 read_mol_solvation                                           no
 calculate_rmsd                                               no
 use_database_filter                                          no
 orient_ligand                                                no
 bump_filter                                                  no
 score_molecules                                              yes
 contact_score_primary                                        no
 contact_score_secondary                                      no
 grid_score_primary                                           no
 grid_score_secondary                                         no
 multigrid_score_primary                                      no
 multigrid_score_secondary                                    no
 dock3.5_score_primary                                        no
 dock3.5_score_secondary                                      no
 continuous_score_primary                                     yes
 continuous_score_secondary                                   no
 cont_score_rec_filename                                      3WZE_rec.mol2
 cont_score_att_exp                                           6
 cont_score_rep_exp                                           12
 cont_score_rep_rad_scale                                     1
 cont_score_use_dist_dep_dielectric                           yes
 cont_score_dielectric                                        4.0
 cont_score_vdw_scale                                         1
 cont_score_es_scale                                          1
 footprint_similarity_score_secondary                         no
 pharmacophore_score_secondary                                no
 descriptor_score_secondary                                   no
 gbsa_zou_score_secondary                                     no
 gbsa_hawkins_score_secondary                                 no
 SASA_score_secondary                                         no
 amber_score_secondary                                        no
 minimize_ligand                                              yes
 simplex_max_iterations                                       1000
 simplex_tors_premin_iterations                               0
 simplex_max_cycles                                           1
 simplex_score_converge                                       0.1
 simplex_cycle_converge                                       1.0
 simplex_trans_step                                           1.0
 simplex_rot_step                                             0.1
 simplex_tors_step                                            10.0
 simplex_random_seed                                          0
 simplex_restraint_min                                        no
 atom_model                                                   all
 vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
 flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex.defn
 flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex_drive.tbl
 ligand_outfile_prefix                                        3WZE.virtualscreen_cart.min
 write_orientations                                           no
 num_scored_conformers                                        1
 rank_ligands                                                 no

Then we write a new shell script “cart_min.sh” to request node and run the code to prevent overflow:

 vi cart_min.sh

The following lines should be included in the shell file:

 #!/bin/bash
 #SBATCH --time=10:00:00
 #SBATCH --nodes=4
 #SBATCH --ntasks=112
 #SBATCH --job-name=3WZE_flex_dock
 #SBATCH --output=3WZE_flex_dock
 #SBATCH -p long-28core
 mpirun -np 112 dock6.mpi -i cart_min.in -o 3WZE_cart_min.virtual.mpi.out

Then submit the job by running the following command:

 sbatch cart_min.sh

We then will be able to see the results.

Rescoring the Virtual Screen

References