Difference between revisions of "2023 DOCK tutorial 2 with PDBID 3WZE"
Stonybrook (talk | contribs) (→Grid generation) |
Stonybrook (talk | contribs) |
Revision as of 22:37, 7 March 2023
In this tutorial, you will learn how to use the program DOCK6.10 to perform a virtual screen, in which you assess how well the molecules in a library of drug-like molecules bind to a protein of known structure.
Introduction
A protein whose function is found to be involved in one or more diseases may become a target for pharmaceutical design. Oftentimes, these pharmaceuticals are designed to compete with an enzyme's native substrate for the enzyme's active site, making many pharmaceutical molecules competitive inhibitors of their protein targets. If the target protein's structure is known, and the active site can be identified, then performing a virtual screen can be a cost- and time-efficient method of identifying molecules which are likely to bind well to the target's active site.
A virtual screen is set up by first preparing the enzyme's structure and the structure of its native substrate for docking. The residues important for binding the native ligand are then identified by generating a footprint. A large library of drug-like molecules is then downloaded from a database such as ZINC [REFERENCE], and, using the footprint and enzyme structure, docked into the enzyme using a program such as DOCK6.10 [REFERENCE]. Results are then assessed to see which drug-like compounds match the native substrate's footprint profile and which are energetically favorable within the simulated active site. Such molecules could then be tested biochemically for their ability to inhibit the target protein, sparing biochemists the hassle of having to test hundreds of thousands of compounds in physical screening experiments.
Software and Files
PDB 3WZE
PDB stands for "Protein Data Bank", which is a repository for the experimentally solved structures of proteins. Each structure is assigned a four-character alphanumeric code, and 3WZE is the code assigned to the solved structure of vascular endothelial growth factor receptor 2 (VEGFR2) bound to the inhibitor sorafenib. VEGFR2 is a receptor tyrosine kinase, meaning that it is an integral membrane protein that has an exofacial receptor domain, transmembrane domain, and cytofacial kinase domain. Because of the difficulties of protein purification, crystallization, and structure solution, many protein structures in the Protein Data Bank are incomplete, lacking regions that are intrinsically disordered or otherwise not conducive to crystallization. PDB 3WZE is one such structure, because it only contains structural data for the cytofacial kinase domain of VEGFR2.
The active site of kinases is well characterized, and sorafenib is shown bound within it in PDB 3WZE, which will be useful for conducting later steps in the virtual screen.
Download the .pdb file of 3WZE, and use a program such as Chimera or ChimeraX to open and view it.
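As an alternative to downloading through a web browser, the file can be fetched from the terminal. The following is a minimal sketch, assuming that curl is installed and that the RCSB file server is reachable; the helper names pdb_url and fetch_pdb are hypothetical and are not part of DOCK or Chimera, and the 000_files destination is borrowed from the directory layout suggested later in this tutorial.

```shell
#!/usr/bin/env bash
# Sketch: fetch a PDB entry from the RCSB file server.
# pdb_url and fetch_pdb are hypothetical helper names, not DOCK or Chimera tools.

# Build the download URL for a PDB ID (upper-cased for consistency).
pdb_url() {
    local id
    id=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf 'https://files.rcsb.org/download/%s.pdb\n' "$id"
}

# Download the entry into a target directory (needs curl and network access).
fetch_pdb() {
    local id="$1" dest="${2:-000_files}"
    mkdir -p "$dest"
    curl -fsSL -o "$dest/$id.pdb" "$(pdb_url "$id")"
}

# Example (uncomment to download):
# fetch_pdb 3WZE 000_files
```

The download call is left commented out so that the script can be sourced without touching the network.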
DOCK6.10
Chimera
ChimeraX (optional)
Chimera is no longer actively developed, and has been succeeded by ChimeraX, which is developed by the same group [REFERENCE]. Although ChimeraX has lost some of the functionality of its predecessor, it has new capabilities to compensate, and it is easier to operate using typed commands, whereas Chimera requires clicking through menus. That being said, Chimera is still required for this tutorial because ChimeraX cannot open .sph files and cannot save a surface as a .dms file.
Separate instructions for completing the tutorial with ChimeraX will be provided alongside the instructions for using Chimera in each section.
AlphaFold (optional)
AlphaFold is a protein structure prediction program from Google's DeepMind [REFERENCE], and in 2020 it was the first program to predict protein structures in the biennial CASP competition with roughly 90% accuracy [REFERENCE]. Using this program, one can generate a reasonably accurate prediction of a protein's structure using only its amino acid sequence. In the context of virtual screening, this means that a protein's structure no longer needs to be solved experimentally before one can embark on a virtual screen of the target. As long as the active site can be identified (which is often done by comparing the predicted structure to homologous proteins with solved structures), one can perform a virtual screen of a protein of unsolved structure.
Even without a university server, AlphaFold can be used from within ChimeraX by going to Tools -> Structure Prediction -> AlphaFold. This will bring up a menu in which you can paste an amino acid sequence for prediction, searching, or retrieval from the AlphaFold database. Many proteins, including those of the human proteome, have already been predicted by AlphaFold, and their structures can be easily retrieved using the protein's UniProt identifier and the Fetch button. Proteins that are not in the database will have to be predicted from scratch by inputting their amino acid sequence and using the Predict button.
Because this tutorial will use PDB file 3WZE, which contains the solved structure of the VEGFR2 kinase domain and a bound inhibitor called sorafenib, AlphaFold will be unnecessary for this tutorial, but the broadened scope of what virtual screens are possible as a result of this program is worth noting nonetheless.
Using the Terminal
Directory Organization
Having defined folders established before starting is a great way to maintain organization and clarity as you go.
000_files
001_structure
002_surface_spheres
003_gridbox
004_dock
005_virtual_screen
006_virtual_screen_mpi
007_cartesian_min
008_rescore
Additionally, file names should be clear and logical. We recommend starting with the PDB code followed by the receptor or ligand and ending with what changes were made. For example, 3WZE_lig_AddH_addCharge.mol2 indicates that this file is the ligand for 3WZE and hydrogens and charge have been added.
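The directory layout listed above can be created in one step from the terminal. This is a minimal sketch that assumes you run it from the top-level folder you have chosen for the tutorial:

```shell
#!/usr/bin/env bash
# Create the numbered working directories used in this tutorial.
for dir in 000_files 001_structure 002_surface_spheres 003_gridbox \
           004_dock 005_virtual_screen 006_virtual_screen_mpi \
           007_cartesian_min 008_rescore; do
    mkdir -p "$dir"   # -p: no error if the directory already exists
done

# List the directories to confirm they were created.
ls -d 00[0-8]_*
```

Because of the -p flag, the script can be re-run safely without disturbing files already placed in these folders.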
Preparing the Receptor
Preparing the Ligand
Surface & Spheres
Surface Generation
Generating the molecular surface of the receptor makes the binding site easier to visualize and provides the input for sphere generation. To do so, load the mol2 file of the 3WZE receptor in Chimera and display its surface:
Actions -> Surface -> show
Then save the surface as a DMS (molecular surface) file named 3WZE_rec.dms, which will be used in the next step, sphere generation:
Tools -> Structure Editing -> Write DMS
Spheres Generation
First create a new file “INSPH” by typing the following into terminal:
vi INSPH
Then type the following information into this file (anything after # is a comment for reference only, and should not be put into the file):
3WZE_rec.dms    #Molecular surface file
R               #Specifies whether to make the spheres on the model's exterior (R) or interior (L)
X               #Specifies the subset of surface points from the dms file to use (X = all points)
0               #Prevents generation of large spheres with close surface contacts
4.0             #Maximum radius of each sphere, in angstroms
1.4             #Minimum radius of each sphere, in angstroms
3WZE.sph        #Name of the output file that will be made
Then save the file and run the following command in the terminal. Note that if you want to re-run this command, you must first remove the file “OUTSPH”, because this file cannot be overwritten.
sphgen -i INSPH -o OUTSPH
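The same step can be scripted so that it is easy to repeat. The sketch below writes INSPH without the explanatory comments, removes any OUTSPH left over from an earlier run, and calls sphgen only if the program and the surface file are present. The guard around the sphgen call is an addition for illustration, not part of the DOCK workflow itself.

```shell
#!/usr/bin/env bash
# Write the sphgen input file (comments omitted, as sphgen expects bare values).
cat > INSPH <<'EOF'
3WZE_rec.dms
R
X
0
4.0
1.4
3WZE.sph
EOF

# OUTSPH cannot be overwritten, so remove the copy from any earlier run.
rm -f OUTSPH

# Run sphgen only when the program and the surface file are available.
if command -v sphgen >/dev/null 2>&1 && [ -f 3WZE_rec.dms ]; then
    sphgen -i INSPH -o OUTSPH
else
    echo "sphgen or 3WZE_rec.dms not found; INSPH was written but sphgen was not run" >&2
fi
```

Using a quoted here-document ('EOF') keeps the file contents literal, so nothing in INSPH is expanded by the shell.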
The output file “3WZE.sph” can be visualized in Chimera. Load it along with the receptor mol2 file; the spheres and the receptor structure should overlap.
Spheres selection
Since we are only interested in what’s happening in the binding site, we will remove spheres that are far away from the ligand. To do so, use the following command and an output file “selected_spheres.sph” will be generated:
sphere_selector 3WZE.sph 3WZE_lig.mol2 10.0 #sphere_selector [sphere output file] [ligand mol2 file] [radius for distance from ligand]
Box and Grid
Box generation
First we will generate a box that will be used to generate the grid. To generate a box, we will need to create a new file “showbox.in” by using the following command:
vi showbox.in
Then type the following information into the file (anything after # is a comment for reference only, and should not be put into the file):
Y                       #Indicates that we want to make a box
8.0                     #Extra length beyond the spheres to be included by the box in all directions
selected_spheres.sph    #Indicates that the box should be constructed to encompass the spheres in this file
1                       #Specifies which cluster of spheres should be used to generate the box
3WZE.box.pdb            #Name of output file
Then, run the following code to get the boxed structure:
showbox < showbox.in
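As with sphere generation, the box step can be captured in a short script. This sketch writes showbox.in without the comments and runs showbox only if the program and the selected spheres file are present; the guard is an illustrative addition rather than something showbox requires.

```shell
#!/usr/bin/env bash
# Write the showbox answers file (one answer per line, comments omitted).
cat > showbox.in <<'EOF'
Y
8.0
selected_spheres.sph
1
3WZE.box.pdb
EOF

# Run showbox only when the program and the sphere file are available.
if command -v showbox >/dev/null 2>&1 && [ -f selected_spheres.sph ]; then
    showbox < showbox.in
else
    echo "showbox or selected_spheres.sph not found; showbox.in was written but showbox was not run" >&2
fi
```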
Grid generation
Create a new file “grid.in” by using the following command:
vi grid.in
Then type the following information into the file:
compute_grids                  yes
grid_spacing                   0.4
output_molecule                no
contact_score                  no
energy_score                   yes
energy_cutoff_distance         9999
atom_model                     a
attractive_exponent            6
repulsive_exponent             9
distance_dielectric            yes
dielectric_factor              4
bump_filter                    yes
bump_overlap                   0.75
receptor_file                  3WZE_rec.mol2
box_file                       3WZE.box.pdb
vdw_definition_file            /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
score_grid_prefix              grid
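Before running the grid program, it can help to confirm that every file named in grid.in actually exists, since a mistyped receptor, box, or parameter path is an easy mistake to make. The following is a minimal sketch; check_grid_inputs is a hypothetical helper, not a DOCK utility, and it assumes that relative paths in grid.in are relative to the directory you run it from.

```shell
#!/usr/bin/env bash
# check_grid_inputs FILE: hypothetical helper that reports files named in a
# grid input file (receptor, box, vdw parameters) that do not exist.
check_grid_inputs() {
    local infile="${1:-grid.in}" missing=0 key path
    # Each line is "keyword value"; the extra test handles a final line
    # that lacks a trailing newline.
    while read -r key path _ || [ -n "$key" ]; do
        case "$key" in
            receptor_file|box_file|vdw_definition_file)
                if [ ! -f "$path" ]; then
                    echo "$key points to a missing file: $path" >&2
                    missing=1
                fi
                ;;
        esac
    done < "$infile"
    return "$missing"
}

# Example:
# check_grid_inputs grid.in && echo "all input files found"
```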
Then generate the grid by typing the following command:
grid -i grid.in -o gridinfo.out
This will generate three files: “gridinfo.out”, “grid.bmp”, and “grid.nrg”. Open “gridinfo.out” and check that the information about the receptor matches that from the paper and your previous preparations. Also check to make sure there are no error messages at the bottom.
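The checks described above can be partly automated. The sketch below confirms that the three output files exist and are not empty, and prints any line of gridinfo.out that mentions an error. check_grid_outputs is a hypothetical helper, and the assumption that problems are reported with the word "error" is a guess rather than documented DOCK behavior, so reading gridinfo.out by eye is still recommended.

```shell
#!/usr/bin/env bash
# check_grid_outputs DIR: hypothetical helper that verifies the grid step.
# Returns 0 if all three files are present, non-empty, and free of "error" lines.
check_grid_outputs() {
    local dir="${1:-.}" status=0 f
    for f in gridinfo.out grid.bmp grid.nrg; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    # Assumption: problems are flagged with the word "error" (any case).
    # Matching lines are printed with their line numbers.
    if [ -f "$dir/gridinfo.out" ] && grep -in 'error' "$dir/gridinfo.out"; then
        status=1
    fi
    return "$status"
}

# Example:
# check_grid_outputs 003_gridbox && echo "grid files look complete"
```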