|
|
Line 32: |
Line 32: |
| ==II. Preparing the Receptor and Ligand== | | ==II. Preparing the Receptor and Ligand== |
| | | |
− | | + | (1 person) |
| | | |
| ==III. Generating Receptor Surface and Spheres== | | ==III. Generating Receptor Surface and Spheres== |
| | | |
− | | + | (2 people) |
| | | |
| ==IV. Generating Box and Grid== | | ==IV. Generating Box and Grid== |
| | | |
− | | + | (2 people) |
| | | |
| ==V. Docking a Single Molecule for Pose Reproduction== | | ==V. Docking a Single Molecule for Pose Reproduction== |
| | | |
− | | + | (3 people) |
| | | |
| ==VI. Virtual Screening== | | ==VI. Virtual Screening== |
| | | |
− | ===Virtual Screening Introduction===
| + | (2 people) |
− | | |
− A virtual screen of many ligands allows both qualitative data (e.g. position in the binding site) and quantitative data (e.g. energy scores) for each screened ligand to be compared with those of an originally docked molecule. Virtual screening is often used to cut the cost of experimentation by narrowing down the ligands in a database and predicting which will bind most favorably to a specific protein (with a pre-determined .grid file).
| |
− | | |
− | ===Running the Database Filter===
| |
− | | |
− When first beginning the virtual screen, a filter should be specified within the directory 06.database-filter/ in order to save computational power by screening only potentially effective ligands. This involves limiting several properties of the ligands specified in a .mol2 file. To limit the ligands from the original database, a file "dock_filter.in" is created and executed using DOCK. Note that in this example the file "cdiv_p0.0.mol2" is a multi-mol2 file containing thousands of ligands from the CHEMDIV vendor, previously downloaded from the UCSF ZINC database (https://zinc.docking.org/about/ucsf). ZINC is a great resource of purchasable compounds for virtual screening. Here are the contents of the file:
| |
− | | |
− | ligand_atom_file /home/wjallen/AMS536/multi-mol2-files/cdiv_p0.0.mol2
| |
− | limit_max_ligands no
| |
− | skip_molecule no
| |
− | read_mol_solvation no
| |
− | calculate_rmsd no
| |
− | use_database_filter yes
| |
− | dbfilter_max_heavy_atoms 50
| |
− | dbfilter_min_heavy_atoms 0
| |
− | dbfilter_max_rot_bonds 10
| |
− | dbfilter_min_rot_bonds 0
| |
− | dbfilter_max_molwt 9999.0
| |
− | dbfilter_min_molwt 0.0
| |
− | dbfilter_max_formal_charge -1.0
| |
− | dbfilter_min_formal_charge -2.0
| |
− | orient_ligand no
| |
− | use_internal_energy no
| |
− | flexible_ligand no
| |
− | bump_filter no
| |
− | score_molecules no
| |
− | atom_model all
| |
− | vdw_defn_file ../00.files/vdw_AMBER_parm99.defn
| |
− | flex_defn_file ../00.files/flex.defn
| |
− | flex_drive_file ../00.files/flex_drive.tbl
| |
− | ligand_outfile_prefix filtered
| |
− | write_orientations no
| |
− | num_scored_conformers 1
| |
− | rank_ligands no
| |
− | | |
− In this file, the important ligand characteristics being limited are ''dbfilter_max_heavy_atoms'', ''dbfilter_max_rot_bonds'', ''dbfilter_max_molwt'', and both ''dbfilter_max_formal_charge'' and ''dbfilter_min_formal_charge''. As shown in the text of the file, a file called filtered_scored.mol2 will be created containing all of the ligands from the original database that fall within the specified limits on number of heavy atoms, number of rotatable bonds, molecular weight, and formal charge. '''''Note:''''' ''The max and min formal charge specified in this file (-1 to -2) are an extremely limited range, and are only used in this tutorial as an illustration.''
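The effect of these limits can be sketched in a few lines of Python. This is only an illustration of the min/max logic, not DOCK's own implementation: the <code>Ligand</code> class, the <code>passes_filter</code> function, and the assumption that the bounds are inclusive are all hypothetical. The numeric bounds are copied from dock_filter.in above.

```python
from dataclasses import dataclass


@dataclass
class Ligand:
    """Hypothetical container for the four properties the filter limits."""
    name: str
    heavy_atoms: int
    rot_bonds: int
    mol_wt: float
    formal_charge: float


# (min, max) bounds copied from the dock_filter.in listing.
FILTER_BOUNDS = {
    "heavy_atoms": (0, 50),
    "rot_bonds": (0, 10),
    "mol_wt": (0.0, 9999.0),
    "formal_charge": (-2.0, -1.0),
}


def passes_filter(ligand, bounds=FILTER_BOUNDS):
    """Return True if every property lies within its bounds.

    Inclusive bounds are an assumption made for this sketch.
    """
    return all(
        low <= getattr(ligand, prop) <= high
        for prop, (low, high) in bounds.items()
    )


if __name__ == "__main__":
    # Made-up example ligands, for illustration only.
    examples = [
        Ligand("charged_small", 30, 5, 400.0, -1.0),
        Ligand("neutral_small", 30, 5, 400.0, 0.0),
    ]
    for lig in examples:
        print(lig.name, passes_filter(lig))
```

With these bounds, a neutral ligand is rejected purely because of the narrow formal charge window, which is why the range should be widened for a real screen.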
| |
− | | |
− | ===Running the Virtual Screen===
| |
− | | |
− After creating a new directory, 07.virtual-screen/, a DOCK input file must be created. So as not to confuse the virtual screen file with the original dock.in of the docked ligand, the new file is called dock_vs.in. Running this file in dock6 prompts us to fill in the important details of the file, ending in a virtual screen of the input ligands.
| |
− | | |
− | ligand_atom_file ../06.database-filter/filtered_scored.mol2
| |
− | limit_max_ligands no
| |
− | skip_molecule no
| |
− | read_mol_solvation no
| |
− | calculate_rmsd no
| |
− | use_database_filter no
| |
− | orient_ligand yes
| |
− | automated_matching yes
| |
− | receptor_site_file ../02.surface-sphgen/selected_spheres.sph
| |
− | max_orientations 1000
| |
− | critical_points no
| |
− | chemical_matching no
| |
− | use_ligand_spheres no
| |
− | use_internal_energy yes
| |
− | internal_energy_rep_exp 12
| |
− | flexible_ligand yes
| |
− | user_specified_anchor no
| |
− | limit_max_anchors no
| |
− | min_anchor_size 5
| |
− | pruning_use_clustering yes
| |
− | pruning_max_orients 1000
| |
− | pruning_clustering_cutoff 100
| |
− | pruning_conformer_score_cutoff 100
| |
− | use_clash_overlap no
| |
− | write_growth_tree no
| |
− | bump_filter no
| |
− | score_molecules yes
| |
− | contact_score_primary no
| |
− | contact_score_secondary no
| |
− | grid_score_primary yes
| |
− | grid_score_secondary no
| |
− | grid_score_rep_rad_scale 1
| |
− | grid_score_vdw_scale 1
| |
− | grid_score_es_scale 1
| |
− | grid_score_grid_prefix ../03.box-grid/1LOQ.grid
| |
− | multigrid_score_secondary no
| |
− | dock3.5_score_secondary no
| |
− | continuous_score_secondary no
| |
− | descriptor_score_secondary no
| |
− | gbsa_zou_score_secondary no
| |
− | gbsa_hawkins_score_secondary no
| |
− | SASA_descriptor_score_secondary no
| |
− | amber_score_secondary no
| |
− | minimize_ligand yes
| |
− | minimize_anchor yes
| |
− | minimize_flexible_growth yes
| |
− | use_advanced_simplex_parameters no
| |
− | simplex_max_cycles 1
| |
− | simplex_score_converge 0.1
| |
− | simplex_cycle_converge 1.0
| |
− | simplex_trans_step 1.0
| |
− | simplex_rot_step 0.1
| |
− | simplex_tors_step 10.0
| |
− | simplex_anchor_max_iterations 500
| |
− | simplex_grow_max_iterations 500
| |
− | simplex_grow_tors_premin_iterations 0
| |
− | simplex_random_seed 0
| |
− | simplex_restraint_min no
| |
− | atom_model all
| |
− | vdw_defn_file ../00.files/vdw_AMBER_parm99.defn
| |
− | flex_defn_file ../00.files/flex.defn
| |
− | flex_drive_file ../00.files/flex_drive.tbl
| |
− | ligand_outfile_prefix output
| |
− | write_orientations no
| |
− | num_scored_conformers 1
| |
− | rank_ligands no
| |
− | | |
− | | |
− | A few of the questions addressed in the terminal window while running a virtual screen are as follows:
| |
− | | |
− | '''ligand_atom_file''' - here we have specified the new file created after filtering the initial ligand database file
| |
− | | |
− '''calculate_rmsd''' - we select the default <no> because we are screening a library of compounds, so there is no reference position to compare them to
| |
− | | |
− '''orient_ligand''' - here we would like to have the ligand oriented in the binding pocket
| |
− | | |
− | '''receptor_site_file''' - this is our selected_spheres.sph file from earlier
| |
− | | |
− '''internal_energy_rep_exp''' - here we select 12 rather than 9 because a repulsive exponent of 12 gives a slightly steeper rise in the potential as the distance between the atoms falls below the optimal distance (sigma)
| |
− | | |
− | '''flexible_ligand''' - when performing a virtual screen of a random database it is always advisable to set the ligands to flexible
| |
− | | |
− | '''min_anchor_size''' - ensures that major features on the molecules are used as anchors and not just simple small groups
| |
− | | |
− | '''score_molecules''' - we always want to score the molecules of a virtual screen in order to determine which are the best fit for the binding pocket
| |
− | | |
− | '''grid_score_primary''' - we want the grid score to be the primary method of scoring the screened ligands
| |
− | | |
− | '''grid_score_grid_prefix''' - the input for the .grid file created earlier
| |
− | | |
− | '''atom_model''' - we select the all atom model
| |
− | | |
− | '''num_scored_conformers''' - we only want to produce one scored conformer from each screened ligand because we only want the best conformations
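The remark about '''internal_energy_rep_exp''' above can be checked numerically. The sketch below uses a simplified, unit-less repulsive term of the form (sigma/r)^n; this form is an assumption made for illustration and is not DOCK's actual internal energy function. The function name <code>repulsive_term</code> is hypothetical.

```python
def repulsive_term(r, sigma=1.0, exponent=12):
    """Simplified repulsive term (sigma / r) ** exponent.

    Assumed form for illustration only; not DOCK's energy function.
    """
    return (sigma / r) ** exponent


if __name__ == "__main__":
    # Compare exponents 9 and 12 as r drops below sigma.
    for r in (1.0, 0.9, 0.8):
        e9 = repulsive_term(r, exponent=9)
        e12 = repulsive_term(r, exponent=12)
        print(f"r = {r:.1f}  exp 9: {e9:.3f}  exp 12: {e12:.3f}")
```

Both terms are equal at r = sigma, and the exponent-12 term grows faster as r shrinks, which is the steeper rise described above.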
| |
− | | |
− Note that if you are performing a small sample virtual screen, it is possible to run it on a local computer in a reasonable amount of time; however, if you are running the virtual screen on a very large scale (i.e. hundreds or thousands of ligands), you will need to use several nodes on Seawulf to produce results in a timely fashion.
| |
− | | |
− | ===Analyzing the Results===
| |
− | | |
− | When your virtual screening has run completely and you are ready to analyze the results, open Chimera. In Chimera you can prepare your protein by opening 1LOQ.receptor.noH.pdb and choosing ''Actions'' -> ''Surface'' -> ''show''. You may also wish to open up the original ligand file 1LOQ.ligand.mol2 in order to show the position of the original ligand/substrate in the receptor's active site.
| |
− | | |
− To view the results of your virtual screening, go to ''Tools'' -> ''Surface/Binding Analysis'' -> ''ViewDock'', and from there you can select your virtual screen output file. In the "ViewDock" pop-up window you can then select ''Column'' -> ''Show'' -> ''Grid Score'', which will add a column to the ViewDock window with the grid score of each ligand. You can double-click on the column heading ''Grid Score'' to sort the ligands based on their grid scores.
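The same ranking can be sketched outside Chimera. The Python below assumes that each molecule in the output mol2 file is preceded by comment lines of the form <code>##########  Name: ...</code> and <code>##########  Grid Score: ...</code>; check your own output file, since the exact header layout is an assumption here. The function name <code>ranked_grid_scores</code> and the sample names and scores are made up for illustration.

```python
import re

# Assumed header layout: leading '#' characters, a label, a colon, a value.
HEADER = re.compile(r"^#+\s*(Name|Grid Score):\s*(\S+)")


def ranked_grid_scores(mol2_text):
    """Return (name, grid_score) pairs sorted from most to least favorable."""
    results = []
    name = None
    for line in mol2_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Name":
            name = value
        elif name is not None:
            # A Grid Score line closes the record opened by the Name line.
            results.append((name, float(value)))
            name = None
    # More negative grid scores are more favorable, so sort ascending.
    return sorted(results, key=lambda pair: pair[1])


# Made-up sample text standing in for the screen's output file.
SAMPLE = """\
##########    Name:    lig_A
##########    Grid Score:    -41.5
@<TRIPOS>MOLECULE
##########    Name:    lig_B
##########    Grid Score:    -52.0
@<TRIPOS>MOLECULE
"""

if __name__ == "__main__":
    for name, score in ranked_grid_scores(SAMPLE):
        print(name, score)
```

To use it on real results, read the output file into a string and pass it to <code>ranked_grid_scores</code>.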
| |
− | | |
− | [Chimera image1] [Chimera image2]
| |
| | | |
| ==VII. Running DOCK in Serial and in Parallel on Seawulf== | | ==VII. Running DOCK in Serial and in Parallel on Seawulf== |
− The [http://www.stonybrook.edu/seawulfcluster/ Seawulf] Cluster is a 470-processor Linux cluster capable of highly parallel processing. This parallel processing allows DOCK virtual screens to be completed in a fraction of the time required on a single processor.
| |
− |
| |
− If you are docking multiple ligands, you can use more than one processor in parallel mode, but you should never use more processors than you have ligands. Before we can run DOCK on Seawulf, we need to copy the proper files from Herbie to Seawulf. If we cd into the AMS536 folder, we can use the following command from the mathlab computer to copy all of the dock-tutorial files:
| |
− |
| |
− | scp -r /dock-tutorial/ username@herbie.mathlab.sunysb.edu:~/AMS536/
| |
− |
| |
− Now we have all of our DOCK preparation files and folders on the Seawulf cluster.
| |
− |
| |
− | ===Running DOCK in Serial on a Single Processor===
| |
− |
| |
− | Running on a single processor is very similar to running dock on the mathlab computer.
| |
− |
| |
− Make a file called qsub.csh with the following text:
| |
− |
| |
− | #!/bin/tcsh
| |
− | #PBS -l nodes=1:ppn=1
| |
− | #PBS -l walltime=01:00:00
| |
− | #PBS -N dock6
| |
− | #PBS -M user@ic.sunysb.edu
| |
− | #PBS -j oe
| |
− | #PBS -o pbs.out
| |
− |
| |
− | cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen
| |
− | /nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out
| |
− |
| |
− | An explanation of the commands:
| |
− |
| |
− | #!/bin/tcsh #Execute script with tcsh
| |
− | #PBS -l nodes=1:ppn=1 #Use one node, and one processor per node, so one single processor
| |
− #PBS -l walltime=01:00:00 #Allow 1 hour for your job to run
| |
− | #PBS -N dock6 #Name of your job
| |
− #PBS -M user@ic.sunysb.edu #Email address to which job notifications are sent
| |
− | #PBS -j oe #Combine the output and error streams into a single output file
| |
− | #PBS -o pbs.out #Name of your output file
| |
− |
| |
− | cd /nfs/user03/zfoda/AMS536/dock-tutorial/07.virtscreen #Change to your home directory and folder with dock files
| |
− /nfs/user03/wjallen/local/dock6/bin/dock6 -i dock.in -o dock.out #Specifies the path to the dock executable and provides the input and output filenames
| |
− |
| |
− | To submit the experiment use the command:
| |
− |
| |
− | qsub qsub.csh
| |
− |
| |
− This submits the DOCK job to the Seawulf queue.
| |
− |
| |
− | See also [http://ringo.ams.sunysb.edu/index.php/PBS_Queue PBS] commands.
| |
− |
| |
− | ===Running DOCK in Parallel using MPI===
| |
− |
| |
− | In order to run DOCK in parallel you have to use a slightly different build of DOCK6 called dock6.mpi.
| |
− Message Passing Interface ([http://www.mpi-forum.org/docs/docs.html MPI]) is a message-passing standard whose implementations allow programs like DOCK to run in parallel across multiple processors.
| |
− |
| |
− | So, make another file called qsub.vs.csh with the contents:
| |
− |
| |
− | #!/bin/tcsh
| |
− | #PBS -l nodes=4:ppn=2
| |
− | #PBS -l walltime=24:00:00
| |
− | #PBS -N screen
| |
− | #PBS -o qsub.log
| |
− | #PBS -j oe
| |
− | #PBS -V
| |
− |
| |
− | cd /nfs/user03/username/AMS536/dock-tutorial/07.virtscreen
| |
− | mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out
| |
− |
| |
− | As you can see there are two major changes:
| |
− |
| |
− | #PBS -l nodes=4:ppn=2 #Use 4 nodes, and 2 processors per node, so 8 processors
| |
− |
| |
− | Note: since one processor is used to distribute the processes, this will run DOCK as 7 (n-1) parallel processes.
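The processor arithmetic can be sketched as follows. The function name <code>mpi_layout</code> is hypothetical; it simply combines the <code>nodes</code> and <code>ppn</code> values from the PBS header with the n-1 rule noted above and the earlier advice never to use more processors than ligands.

```python
def mpi_layout(nodes, ppn, num_ligands):
    """Return (total_processes, docking_workers, within_ligand_limit).

    total_processes is the value passed to 'mpirun -np'.
    docking_workers is total_processes - 1, since one process only
    distributes work. within_ligand_limit is False when more
    processors are requested than there are ligands to dock.
    """
    total_processes = nodes * ppn
    docking_workers = total_processes - 1
    within_ligand_limit = total_processes <= num_ligands
    return total_processes, docking_workers, within_ligand_limit


if __name__ == "__main__":
    # nodes=4:ppn=2 as in qsub.vs.csh; the ligand count is a made-up example.
    print(mpi_layout(nodes=4, ppn=2, num_ligands=1000))
```

For the request above this gives 8 processes, 7 of which dock ligands, so the <code>-np</code> value should always match nodes times ppn.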
| |
− |
| |
− mpirun -np 8 /nfs/user03/wjallen/local/dock6/bin/dock6.mpi -i dockvs.in -o dockvs.out #This line uses mpirun to launch dock6.mpi on 8 processors
| |
− |
| |
− | And then we can run:
| |
| | | |
− | qsub qsub.vs.csh
| + | (1 person) |
DOCK is a molecular docking program used in drug discovery. It was developed by Irwin D. Kuntz, Jr. and colleagues at UCSF (see UCSF DOCK). This program, given a protein binding site and a small molecule, tries to predict the correct binding mode of the small molecule in the binding site, and the associated binding energy. Small molecules with highly favorable binding energies could be new drug leads. This makes DOCK a valuable drug discovery tool. DOCK is typically used to screen massive libraries of millions of compounds against a protein to isolate potential drug leads. These leads are then further studied, and could eventually result in a new, marketable drug. DOCK works well as a screening procedure for generating leads, but is not currently as useful for optimization of those leads.
While performing docking, it is convenient to adopt a standard directory structure / naming scheme, so that files are easy to find / identify. For this tutorial, we will use something similar to the following:
In addition, most of the important files that are derived from the original crystal structure will be given a prefix that is the same as the PDB code, '1HVR'. The following sections in this tutorial will adhere to this directory structure / naming scheme.