AutoDock4 Pose Reproduction Tutorial
The purpose of this tutorial is to give members of the Rizzo lab a way to benchmark the AutoDock4 software and compare its Pose Reproduction success rates against those of the DOCK6 software. (Note: The program versions used for this experiment were AutoDock4.2.6 and DOCK6.9.)
I. Introduction
AutoDock4 is a commonly used docking program that assesses the affinity of a ligand (a drug candidate) for a target site (protein, enzyme, RNA). To evaluate a program's ability to accurately reproduce experimental results, an experiment called Pose Reproduction was developed.
Pose Reproduction takes an experimentally determined ligand-protein complex from the PDB and attempts to dock the ligand back into its original location. If the lowest-energy pose (the most energetically favorable one) is within 2.0 Å RMSDh (Hungarian RMSD) of the original ligand position, this is referred to as a docking success. If any pose other than the lowest-energy pose is within 2.0 Å RMSDh of the original position, this is referred to as a scoring failure. If none of the poses are within 2.0 Å RMSDh of the original position, this is referred to as a sampling failure.
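The three outcomes above can be written as a small decision rule. The following is a minimal sketch, not one of the lab scripts: the function name, the (energy, RMSDh) tuple layout, and the choice to count a pose at exactly 2.0 as "within" the cutoff are all assumptions made for illustration.

```python
from typing import List, Tuple

RMSD_CUTOFF = 2.0  # cutoff from the text; RMSDh values are assumed to be in angstroms


def classify_outcome(poses: List[Tuple[float, float]], cutoff: float = RMSD_CUTOFF) -> str:
    """Classify one system from (energy, rmsdh) pairs, one pair per docked pose."""
    if not poses:
        raise ValueError("at least one docked pose is required")
    # The best-scored pose is the one with the lowest (most favorable) energy.
    _, best_rmsd = min(poses, key=lambda pose: pose[0])
    if best_rmsd <= cutoff:
        return "docking success"
    # A correct pose was sampled but was not ranked first.
    if any(rmsd <= cutoff for _, rmsd in poses):
        return "scoring failure"
    # No pose came close to the original ligand position.
    return "sampling failure"
```

For example, a system whose best-scored pose sits at 4.2 RMSDh while a worse-scored pose sits at 1.0 RMSDh would be counted as a scoring failure.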
II. Prepping Directories
The first step is to prepare a file containing the list of systems. For this docking experiment the file was called clean.systems.all
121P 181L 182L 183L 184L etc
The second step is to run the run.000.AutoDock.source.sh script, which prepares a directory for each system in the file. The 1st argument is the list-of-systems file made in the previous step. The 2nd argument is the new directory that will be created, where all the AutoDock4 experiments will be performed
bash ./run.000.AutoDock.source.sh ../clean.systems.all AutoDock4_Tutorial
The directory where all the system directories will be created is
AutoDock4_Tutorial/
Each system will have its own directory inside this directory
AutoDock4_Tutorial/121P/ AutoDock4_Tutorial/181L/ AutoDock4_Tutorial/182L/ etc
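The directory layout above can be reproduced with a few lines of Python. This is a hedged sketch of the idea only; it is not the contents of run.000.AutoDock.source.sh, and the function names are hypothetical. It assumes the list file holds system IDs separated by spaces or newlines.

```python
import os
from typing import List


def read_systems(list_file: str) -> List[str]:
    """Return the system IDs in the list file, split on any whitespace."""
    with open(list_file) as handle:
        return handle.read().split()


def make_system_dirs(list_file: str, root: str) -> List[str]:
    """Create root/<system>/ for every system ID and return the created paths."""
    paths = []
    for system in read_systems(list_file):
        path = os.path.join(root, system)
        os.makedirs(path, exist_ok=True)  # safe to rerun on an existing tree
        paths.append(path)
    return paths
```

Calling make_system_dirs("../clean.systems.all", "AutoDock4_Tutorial") would create AutoDock4_Tutorial/121P/, AutoDock4_Tutorial/181L/, and so on.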
III. Preparing Receptors and Ligands
For this part of the experiment, the receptors and ligands are converted into pdbqt format, starting from the original mol2 files in the testset. The ligands are assigned Gasteiger charges and the receptors are assigned am1bcc charges, a combination that produced the highest success rates in previous experiments and was used in previous papers. Scripts were developed to process these systems from mol2 to pdbqt.
Command to convert these files
Step 1) Make sure you are in the correct directory
cd AutoDock4_Tutorial/
Step 2) Run the bash script that converts these molecules
bash ./../run001.AutoDock4.system.prep.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial
This script will create a ligand pdbqt file and a receptor pdbqt file in each system directory
cd 121P/
ls
121P/121P.lig.am1bcc.pdbqt 121P/121P.rec.clean.pdbqt
Further processing may be needed to prepare some of these systems; this is explained later, in the troubleshooting part of Section V.
The scripts used to accomplish this were prepare_ligand4.py and prepare_receptor4.py, found in mgltools/1.5.6
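To make the conversion concrete, the sketch below assembles the argument lists for the two MGLTools scripts for one system. It is an illustration under assumptions, not the contents of the lab's prep script: the function names are hypothetical, the receptor input name (system.rec.clean.mol2) is guessed from the pdbqt name, and the charge handling follows the description above (ligand left at the prepare_ligand4.py default, which adds Gasteiger charges; receptor run with -C, which preserves the input charges). In practice these scripts are usually launched through the MGLTools pythonsh interpreter.

```python
from typing import List


def ligand_prep_command(system: str) -> List[str]:
    """Argument list for converting one ligand from mol2 to pdbqt."""
    # Without -C, prepare_ligand4.py adds Gasteiger charges (its default).
    return [
        "prepare_ligand4.py",
        "-l", f"{system}.lig.am1bcc.mol2",
        "-o", f"{system}.lig.am1bcc.pdbqt",
    ]


def receptor_prep_command(system: str) -> List[str]:
    """Argument list for converting one receptor from mol2 to pdbqt."""
    # -C preserves the charges already present in the input file.
    # The input file name is an assumption derived from the output name.
    return [
        "prepare_receptor4.py",
        "-r", f"{system}.rec.clean.mol2",
        "-o", f"{system}.rec.clean.pdbqt",
        "-C",
    ]
```

The lists can be printed for inspection or passed to subprocess.run once the MGLTools scripts are on the PATH.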
IV. Preparing Grids
For this part of the experiment, grids are generated for each ligand and receptor pair
Step 1) Enter the correct directory
cd AutoDock4_Tutorial
Step 2) Run the bash script that creates the grids; submitting it to the queue with qsub is recommended
Important Note: This script uses the ligand location as the center of the grid position, since all ligands in these systems are already in the binding pocket
bash ./../run002.AutoDock4.grid.generation.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial > AutoDock4_grid_output
This generates all of the grids from the previously generated ligand and receptor files. The grid parameters can be changed by editing the prepare_gpf4.py call inside this script, for example the grid box size, which is set with -p npts='60,60,60', or the box center, where the -y option centers the grid box on the ligand
Usage: prepare_gpf4.py -l pdbqt_file -r pdbqt_file
-l ligand_filename
-r receptor_filename
Optional parameters:
[-i reference_gpf_filename]
[-o output_gpf_filename]
[-x flexres_filename]
[-p parameter=newvalue. For example: -p ligand_types='HD,Br,A,C,OA' or p npts='60,60,66' or gridcenter='2.5,6.5,-7.5']
[-d directory of ligands to use to set types]
[-y boolean to center grids on center of ligand]
[-n boolean to NOT size_box_to_include_ligand]
[-I increment npts in all 3 dimensions by this integer]
[-v]
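As an illustration of how these options combine, the sketch below builds the two commands for one system: prepare_gpf4.py to write the grid parameter file and autogrid4 to compute the maps. The function names and the default npts value are assumptions for this example; the actual options used inside the lab's grid script may differ. The file names follow the ones listed in this section.

```python
from typing import List


def gpf_command(system: str, npts: str = "60,60,60") -> List[str]:
    """Argument list for writing the grid parameter file (.gpf)."""
    return [
        "prepare_gpf4.py",
        "-l", f"{system}.lig.am1bcc.pdbqt",
        "-r", f"{system}.rec.clean.pdbqt",
        "-o", f"{system}.rec.clean.gpf",
        "-y",                   # center the grid box on the ligand
        "-p", f"npts={npts}",   # number of grid points in x, y, z
    ]


def autogrid_command(system: str) -> List[str]:
    """Argument list for running autogrid4 on the .gpf, logging to a .glg file."""
    return [
        "autogrid4",
        "-p", f"{system}.rec.clean.gpf",
        "-l", f"{system}.autogrid.glg",
    ]
```

Keeping the commands as lists makes it straightforward to print them for a dry run before submitting the real job.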
Following this cd into the 121P system directory
cd AutoDock4_Tutorial/121P/
ls
This step generates the grid files: a .fld file, an .xyz file, one .map file for each atom type present in the system, and a .glg log file of the results
121P.autogrid.glg 121P.lig.am1bcc.pdbqt 121P.rec.clean.A.map 121P.rec.clean.e.map 121P.rec.clean.maps.fld 121P.rec.clean.N.map 121P.rec.clean.P.map 121P.rec.clean.C.map 121P.rec.clean.gpf 121P.rec.clean.maps.xyz 121P.rec.clean.OA.map 121P.rec.clean.d.map 121P.rec.clean.HD.map 121P.rec.clean.NA.map 121P.rec.clean.pdbqt
The commands used by this script were prepare_gpf4.py, found in mgltools/1.5.6, and autogrid4, found in autodock/4.2.6
V. Docking Ligands
The following step performs the actual AutoDock4 docking for this experiment. The previously generated ligands, receptors, and grids can be reused for multiple docking experiments. This removes conversion differences as a variable that could bias later comparisons, and it saves time because none of the earlier steps have to be rerun.
To conduct this experiment
cd AutoDock4_Tutorial
Following this, run the script run003.AutoDock4.docking.sh
Argument 1 is the list of systems
Argument 2 is the directory where all the system directories are located
Argument 3 is the docking directory created for each docking experiment
bash ../run003.AutoDock4.docking.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial Tutorial_Docking
Once this has completed, all the system directories should look the same. Run the ls command inside a system directory (for example 121P/) to see all files
ls
The following files should be listed
121P.autogrid.glg 121P.lig.am1bcc.pdbqt 121P.rec.clean.A.map 121P.rec.clean.e.map 121P.rec.clean.maps.fld 121P.rec.clean.N.map 121P.rec.clean.P.map 121P.rec.clean.C.map 121P.rec.clean.gpf 121P.rec.clean.maps.xyz 121P.rec.clean.OA.map 121P.rec.clean.d.map 121P.rec.clean.HD.map 121P.rec.clean.NA.map 121P.rec.clean.pdbqt Tutorial_Docking/
Following this, cd into Tutorial_Docking/ and run ls to view the results
cd Tutorial_Docking/
ls
The following should be within the directory
121P.docking.dlg 121P.docking.dpf 121P.dock.parameter.dpf summary_of_results_1.0
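For orientation, a single docking run boils down to calling autodock4 with a parameter file (-p) and a log file (-l). The sketch below is an assumption-laden illustration rather than the contents of run003.AutoDock4.docking.sh: the function names are hypothetical, and it assumes that system.docking.dpf is the parameter file actually passed to autodock4 and that the paths inside it resolve from the docking directory.

```python
import subprocess
from typing import List


def docking_command(system: str) -> List[str]:
    """Argument list for one AutoDock4 run (file names taken from the listing above)."""
    return ["autodock4", "-p", f"{system}.docking.dpf", "-l", f"{system}.docking.dlg"]


def run_docking(system: str, docking_dir: str, dry_run: bool = True) -> List[str]:
    """Return the command; execute it inside docking_dir only when dry_run is False."""
    command = docking_command(system)
    if not dry_run:
        # check=True raises CalledProcessError if autodock4 exits with an error.
        subprocess.run(command, cwd=docking_dir, check=True)
    return command
```

With the default dry_run=True nothing is executed, which is a convenient way to check the command before submitting a long job.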
Troubleshooting
Some of these systems may present issues such as atoms with zero charge or a non-integral total charge (e.g., a ligand with a charge of 2.48). This should only occur with ligands, because the receptors keep their am1bcc charges.
The non-integral charge issue looks like this and occurs during the docking step
autodock4: *** Caution! Non-integral total charge (-2.498 e) on ligand may indicate a problem... ***
To troubleshoot, you'll need to run the docking once first to see which systems are problematic
This troubleshooting step increases the Pose Reproduction success rate by 3%
Following all this, the grids were regenerated for all these systems. Once the ligands, receptors, and grids have been generated, you'll be able to reuse them without repeating the previous steps.
The following script was used to identify and isolate these problematic ligands. It places the systems with a non-integral charge and the systems with zero-charge atoms into one directory.
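One way to see which systems are affected after a first docking pass is to search each docking log for the warning text shown above. This is a minimal sketch and not the lab's transfer script: the function names are hypothetical, and it assumes the layout root/system/docking_dir/system.docking.dlg used elsewhere in this tutorial. It only looks for the non-integral charge warning, not for zero-charge atoms.

```python
import os
from typing import Iterable, List

WARNING_TEXT = "Non-integral total charge"  # taken from the autodock4 message above


def has_charge_warning(dlg_path: str) -> bool:
    """Return True if the docking log contains the non-integral charge warning."""
    with open(dlg_path, errors="replace") as handle:
        return any(WARNING_TEXT in line for line in handle)


def find_problem_systems(systems: Iterable[str], root: str, docking_dir: str) -> List[str]:
    """List the systems whose .dlg file exists and carries the warning."""
    flagged = []
    for system in systems:
        dlg = os.path.join(root, system, docking_dir, f"{system}.docking.dlg")
        if os.path.isfile(dlg) and has_charge_warning(dlg):
            flagged.append(system)
    return flagged
```

Systems without a .dlg file are skipped silently here; a stricter version might report them as failed runs instead.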
Argument 1 is the list of systems
Argument 2 is the docking directory
python ../Problematic_files_transfer.py /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial_Docking
This creates a Problematic_ligands/ directory, with a Broken/ directory inside it that stores the problematic ligands
cd AutoDock4_Tutorial/Problematic_ligands/Broken
ls
This shows the list of all the problematically charged ligands
121P.lig.am1bcc.mol2 181L.lig.am1bcc.mol2 184L.lig.am1bcc.mol2 etc
Once these ligands have been collected in their own directory, open them all in Chimera
File->open->All_Ligands
Then add Gasteiger charges to all the ligands
Tools->Structure Editing->Add Charges
Then press ok under these settings
Following this, save all of the fixed ligands into one directory. For this experiment they were saved into the Fixed directory
cd Fixed/
The directory will contain all of the fixed ligands
ls
This should show all the fixed ligands. The directory can be named anything, but the ligand files need to follow this file name format for the script to work
121P.lig.am1bcc.mol2 181L.lig.am1bcc.mol2 182L.lig.am1bcc.mol2 183L.lig.am1bcc.mol2 etc
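Before converting the fixed ligands, it can be useful to confirm that each mol2 file now sums to an integral total charge. The sketch below is a hedged example: the function names and the 0.01 tolerance are assumptions, and it relies on the standard Tripos mol2 layout in which the partial charge is the ninth whitespace-separated field of each line in the @<TRIPOS>ATOM section.

```python
def mol2_total_charge(path: str) -> float:
    """Sum the partial charges in the @<TRIPOS>ATOM section of a mol2 file."""
    total = 0.0
    in_atoms = False
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped.startswith("@<TRIPOS>"):
                # Only the ATOM record block carries per-atom charges.
                in_atoms = stripped == "@<TRIPOS>ATOM"
                continue
            if in_atoms and stripped:
                fields = stripped.split()
                if len(fields) >= 9:
                    total += float(fields[8])  # ninth column is the charge
    return total


def is_integral(charge: float, tolerance: float = 0.01) -> bool:
    """True if the charge is within the (assumed) tolerance of a whole number."""
    return abs(charge - round(charge)) <= tolerance
```

A ligand whose total comes out at -2.498, like the one in the warning above, would fail this check, while one at -0.998 would pass with the assumed tolerance.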
Lastly, a script is used to convert these properly charged mol2 files to pdbqt files with the same charges.
(NOTE: USE ABSOLUTE PATHS FOR THIS SCRIPT)
Argument 1 is the list of problematic ligand systems, which was generated previously by the Problematic_files_transfer.py script and is located in the AutoDock4_Tutorial/ directory
Argument 2 is the directory containing all the fixed systems
Argument 3 is the main directory for the docking experiments, AutoDock4_Tutorial
bash ../Fixed_ligand_transfer.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/Problematic.txt /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/Problematic_ligands/Fixed /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial
Following this, regenerate the grids as in Part IV and rerun the docking experiment in Part V for just these problematic ligands, using the Problematic.txt file as the list of systems
VI. Rescoring in DOCK6
Following this, the docked molecules need to be rescored with DOCK6 to provide an accurate comparison between the two programs. To accomplish this, Open Babel was used to convert the systems from pdbqt to mol2 files. This conversion also deletes all the hydrogens from the systems. The first script converts the poses from pdbqt to mol2 format using Open Babel
cd AutoDock4_Tutorial
1st argument is the number of different conformations
2nd argument is the docking directory
3rd argument is the list of all the systems being docked
bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/pdbqt_to_mol2_AutoDock4.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all
These are then rescored using DOCK6's all atom model option, which should only take 1 second per system.
This is the script that rescores all the molecules
Argument 1 is the number of ligands generated by AutoDock4. This value should be equal to the number of GA runs
Argument 2 is the docking directory where the AutoDock4 experiment was conducted
Argument 3 is the list of systems to be rescored
bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/run004.AutoDock4.Rescore.with.DOCK.v5.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all
This will split each docking result into pdbqt files, convert these files to mol2 files, then rescore them all in DOCK6. This generates many different files, but the two key files for each system are AutoDock4_Tutorial/sys/sys.output_scored.mol2 and AutoDock4_Tutorial/sys/sys.output.all_scored.mol2. You should see these files when you cd into a directory and run the ls command
cd 121P/Tutorial_Docking
ls
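The split-and-convert part of this step can be sketched as follows. This is an illustration under assumptions, not the lab's rescoring script: the function names are hypothetical. It relies on the AutoDock4 .dlg convention that docked poses are written as lines prefixed with "DOCKED: ", bracketed by MODEL and ENDMDL records, and on the Open Babel command-line options -ipdbqt, -omol2, -O (output file), and -d (delete hydrogens).

```python
from typing import List

PREFIX = "DOCKED: "


def split_docked_models(dlg_text: str) -> List[str]:
    """Return one PDBQT string per docked pose found in the text of a .dlg file."""
    models: List[str] = []
    current = None
    for line in dlg_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        record = line[len(PREFIX):]  # strip the prefix to recover the PDBQT record
        if record.startswith("MODEL"):
            current = []
        if current is not None:
            current.append(record)
            if record.startswith("ENDMDL"):
                models.append("\n".join(current) + "\n")
                current = None
    return models


def obabel_command(pdbqt_path: str, mol2_path: str) -> List[str]:
    """Argument list for converting one pose to mol2 with hydrogens removed."""
    return ["obabel", "-ipdbqt", pdbqt_path, "-omol2", "-O", mol2_path, "-d"]
```

Each string returned by split_docked_models can be written to its own pdbqt file and then passed through the obabel command before rescoring.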
Go back into the main directory. Some of these ligands were prepared improperly by Open Babel, which gave them Hungarian RMSDs of -1000, and a script was developed to fix this. Note: This is performed after rescoring all the molecules, because the script only changes the ligands of the problematic systems
cd AutoDock4_Tutorial
1st argument is the list of all the systems being tested in the experiment
2nd argument is the Docking directory
python /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/atom_type_fix_rescore.py /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial_Docking
This script fixes these ligands and generates a text file named "redo_rescore.txt", which lists all the systems that were altered. Following this, only the problematic systems are rescored with DOCK6.9, using the previous run004 script.
bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/run004.AutoDock4.Rescore.with.DOCK.v5.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/redo_rescore.txt
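After the second rescoring pass, a quick check can confirm that no pose still carries the -1000 placeholder. The sketch below is an assumption-based example and not the atom_type_fix_rescore.py script: the function names are hypothetical, and it assumes the rescored mol2 files contain DOCK6-style header lines of the form "########## HA_RMSDh: value".

```python
import re
from typing import List

# Assumed header format in DOCK6 scored mol2 output: "##########   HA_RMSDh:   1.2345"
RMSDH_PATTERN = re.compile(r"^#{10}\s+HA_RMSDh:\s+(-?\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def rmsdh_values(mol2_text: str) -> List[float]:
    """Extract every Hungarian RMSD value from the header lines of a scored mol2 file."""
    return [float(value) for value in RMSDH_PATTERN.findall(mol2_text)]


def has_failed_rmsdh(mol2_text: str) -> bool:
    """True if any pose still reports the -1000 placeholder RMSDh."""
    return any(value == -1000.0 for value in rmsdh_values(mol2_text))
```

Running has_failed_rmsdh on the text of each sys.output.all_scored.mol2 file gives a simple list of systems that would still need attention.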
After this step has been performed all the systems will have been successfully docked and we can analyze all the results
VII. Create CSV
This script generates a CSV file of the AutoDock4 results, listing all the data relevant to the Pose Reproduction benchmark.
Argument 1 is the docking directory that stores the docking results
Argument 2 is the list of all the systems being tested so far
Argument 3 is the new name of the csv file that will be generated for the docking results, .csv will be appended to the end of it
python /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/bickel_laverty_calculate_autodock4_results.score.noH.py Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial.3.Score
You should be able to see the new csv file generated within this directory
cd AutoDock4_Tutorial
You can open this file with the following command to check that the CSV was generated properly
vim Tutorial.3.Score.csv
VIII. Generating Graphs from the CSV
Lastly, the csv file will be used to generate all the graphs to analyze these results
python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 1
python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 2
python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 3