AutoDock4 Pose Reproduction Tutorial

From Rizzo_Lab

Latest revision as of 11:25, 12 August 2020

The purpose of this tutorial is to give members of the Rizzo lab a way to benchmark AutoDock4 and compare its Pose Reproduction success rates against those of DOCK6. (Note: The program versions used for this experiment were AutoDock4.2.6 and DOCK6.9.)

I. Introduction

AutoDock4 is a commonly used docking program that assesses the affinity of a ligand (a drug candidate) for a target site (protein, enzyme, RNA). To evaluate a program's ability to accurately reproduce experimental results, an experiment called Pose Reproduction was developed.

Pose Reproduction takes an experimentally determined ligand-protein complex from the PDB and attempts to dock the ligand back into its original location. If the lowest-energy (most energetically favorable) pose is within 2.0 Å RMSDh (Hungarian RMSD) of the original position, the result is a docking success. If any pose other than the lowest-energy one is within 2.0 Å RMSDh of the original position, the result is a scoring failure. If none of the poses are within 2.0 Å RMSDh of the original position, the result is a sampling failure.
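The three outcomes defined above can be expressed as a short classification function. This is only an illustrative sketch: the function name, the (score, RMSD) pair layout, and the choice to ignore negative RMSD values are assumptions, not taken from the lab's scripts.

```python
from typing import List, Tuple

RMSD_CUTOFF = 2.0  # Angstroms, the success threshold used in this tutorial


def classify_outcome(poses: List[Tuple[float, float]],
                     cutoff: float = RMSD_CUTOFF) -> str:
    """Classify one system from (score, rmsd) pairs; a lower score is better."""
    if not poses:
        raise ValueError("at least one pose is required")

    def within(rmsd: float) -> bool:
        # Assumption: negative RMSD values are error sentinels, not real hits.
        return 0.0 <= rmsd <= cutoff

    # The lowest-energy pose decides between success and the two failure types.
    _, best_rmsd = min(poses, key=lambda pose: pose[0])
    if within(best_rmsd):
        return "docking success"
    if any(within(rmsd) for _, rmsd in poses):
        return "scoring failure"
    return "sampling failure"
```

For example, a system whose best-scoring pose is at 4.2 Å but which has another pose at 1.0 Å is classified as a scoring failure.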

Pose Reproduction Intro.JPG

II.Prepping Directories

The first step is to prepare a file that lists the systems, one per line. For this docking experiment this file was called clean.systems.all

121P
181L
182L
183L 
184L
etc

The second step is to run the run.000.AutoDock.source.sh script, which prepares a directory for each system in the file. The 1st argument is the list-of-systems file made in the previous step. The 2nd argument is the new directory that will be created, where all the AutoDock4 experiments will be performed

bash ./run.000.AutoDock.source.sh ../clean.systems.all AutoDock4_Tutorial

The directory in which all the system directories are created is

AutoDock4_Tutorial/

Each system will have its own directory inside it

AutoDock4_Tutorial/121P/
AutoDock4_Tutorial/181L/
AutoDock4_Tutorial/182L/
etc
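The directory layout above can be reproduced with a few lines of Python. This is a minimal sketch of the idea only; it is not the contents of run.000.AutoDock.source.sh, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List


def read_systems(list_file: Path) -> List[str]:
    """Return the system names in the list file, skipping blank lines."""
    lines = list_file.read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def make_system_dirs(list_file: Path, root: Path) -> List[Path]:
    """Create root/<system>/ for every system named in the list file."""
    created = []
    for name in read_systems(list_file):
        system_dir = root / name
        # exist_ok lets the function be rerun without failing.
        system_dir.mkdir(parents=True, exist_ok=True)
        created.append(system_dir)
    return created
```

Calling make_system_dirs(Path("clean.systems.all"), Path("AutoDock4_Tutorial")) would create AutoDock4_Tutorial/121P/, AutoDock4_Tutorial/181L/, and so on.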

III.Preparing Receptors and Ligands

For this part of the experiment, the receptors and ligands were converted to pdbqt format, starting from the original mol2 files in the testset. The ligands are assigned Gasteiger charges and the receptors am1bcc charges; this combination produced the highest success rates in previous experiments and follows previous papers. Scripts were developed to process these systems from mol2 to pdbqt.

Command to convert these files

Step 1) Make sure you are in the correct directory

cd AutoDock4_Tutorial/

Step 2) Run the bash script that prepares these molecules

bash ./../run001.AutoDock4.system.prep.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial

This script will create a ligand pdbqt file and a receptor pdbqt file in each system directory

cd 121P/
ls
121P/121P.lig.am1bcc.pdbqt
121P/121P.rec.clean.pdbqt

Further processing may be needed to prepare some systems; this is explained in the Trouble Shooting section.

The scripts used to accomplish this were prepare_ligand4.py and prepare_receptor4.py, found in mgltools/1.5.6

IV.Preparing Grids

For this part of the experiment the grids will be generated for each ligand and receptor

Step 1) Enter the correct directory

cd AutoDock4_Tutorial

Step 2) Run the bash script that creates the grids; submitting it through qsub is recommended

Important Note: This script uses the ligand location as the center of the grid position, since all ligands in these systems are already in the binding pocket


bash ./../run002.AutoDock4.grid.generation.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial > AutoDock4_grid_output

This generates all of the grids from the ligand and receptor files created previously. The grid parameters can be changed by editing the prepare_gpf4.py call in this script: for example, the size of the grid box is set with -p npts='60,60,60', and the -y option centers the grid box on the ligand.

Usage: prepare_gpf4.py -l pdbqt_file -r pdbqt_file
    -l ligand_filename
    -r receptor_filename
Optional parameters:
   [-i reference_gpf_filename]
   [-o output_gpf_filename]
   [-x flexres_filename]
   [-p parameter=newvalue. For example: -p ligand_types='HD,Br,A,C,OA' or p npts='60,60,66' or gridcenter='2.5,6.5,-7.5']
   [-d directory of ligands to use to set types]
   [-y boolean to center grids on center of ligand]
   [-n boolean to NOT size_box_to_include_ligand]
   [-I increment npts in all 3 dimensions by this integer]
   [-v]
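As a sketch of how the options in the usage text fit together, the helper below assembles a prepare_gpf4.py argument list. Only flags shown in the usage text are used (-l, -r, -o, -y, -p). The helper name is hypothetical, and how the script is actually launched (for example through the MGLTools Python interpreter) depends on the local installation.

```python
from typing import List, Optional, Tuple


def build_gpf_command(ligand: str, receptor: str, output: str,
                      npts: Optional[Tuple[int, int, int]] = None,
                      center_on_ligand: bool = True) -> List[str]:
    """Assemble an argument list for prepare_gpf4.py (hypothetical helper)."""
    cmd = ["prepare_gpf4.py", "-l", ligand, "-r", receptor, "-o", output]
    if center_on_ligand:
        cmd.append("-y")  # center the grid on the ligand
    if npts is not None:
        # No shell quoting is needed because this is an argument list.
        cmd += ["-p", "npts={},{},{}".format(*npts)]
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, check=True) once the executable path has been adjusted for the local environment.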


Following this cd into the 121P system directory

cd AutoDock4_Tutorial/121P/

ls

Grid generation always produces a .maps.fld file, a .maps.xyz file, one .map file for each atom type present in the system, and a .glg log file of the results

121P.autogrid.glg
121P.lig.am1bcc.pdbqt
121P.rec.clean.A.map  
121P.rec.clean.e.map   
121P.rec.clean.maps.fld  
121P.rec.clean.N.map   
121P.rec.clean.P.map
121P.rec.clean.C.map  
121P.rec.clean.gpf     
121P.rec.clean.maps.xyz  
121P.rec.clean.OA.map
121P.rec.clean.d.map  
121P.rec.clean.HD.map  
121P.rec.clean.NA.map    
121P.rec.clean.pdbqt
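A quick way to confirm that grid generation finished for a system is to check for the files listed above. The sketch below follows the file naming shown in the listing; the function names are hypothetical, and the list of atom types has to be supplied by the caller.

```python
from pathlib import Path
from typing import Iterable, List


def expected_grid_files(system: str, atom_types: Iterable[str]) -> List[str]:
    """List the grid files expected for one system, following the listing above."""
    stem = f"{system}.rec.clean"
    names = [f"{system}.autogrid.glg", f"{stem}.gpf",
             f"{stem}.maps.fld", f"{stem}.maps.xyz"]
    # One map per atom type, plus the electrostatic (e) and desolvation (d) maps.
    names += [f"{stem}.{kind}.map" for kind in list(atom_types) + ["e", "d"]]
    return names


def missing_grid_files(system_dir: Path, system: str,
                       atom_types: Iterable[str]) -> List[str]:
    """Return the expected grid files that are absent from system_dir."""
    return [name for name in expected_grid_files(system, atom_types)
            if not (system_dir / name).is_file()]
```

For 121P the atom types in the listing are A, C, HD, N, NA, OA and P; an empty return value means every expected file is present.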

The commands used by this script were prepare_gpf4.py, found in mgltools/1.5.6, and autogrid4, found in autodock/4.2.6

V.Docking Ligands

The following step performs the actual AutoDock4 docking. The previously generated ligands, receptors, and grids can be reused for multiple docking experiments. This removes a source of bias that could arise if the systems were converted differently between runs, and it saves time because the earlier steps do not need to be rerun.

To conduct this experiment

cd AutoDock4_Tutorial

Following this, run the script run003.AutoDock4.docking.sh

Argument 1 is the list of systems

Argument 2 is the directory where all the system directories are located

Argument 3 is the docking directory created for each docking experiment

bash ../run003.AutoDock4.docking.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all ../AutoDock4_Tutorial Tutorial_Docking

Once this has completed, every system directory should have the same layout. Run the ls command inside a system directory (121P here) to see all the files

ls

The following files should be listed

121P.autogrid.glg
121P.lig.am1bcc.pdbqt
121P.rec.clean.A.map  
121P.rec.clean.e.map   
121P.rec.clean.maps.fld  
121P.rec.clean.N.map   
121P.rec.clean.P.map
121P.rec.clean.C.map  
121P.rec.clean.gpf     
121P.rec.clean.maps.xyz  
121P.rec.clean.OA.map
121P.rec.clean.d.map  
121P.rec.clean.HD.map  
121P.rec.clean.NA.map    
121P.rec.clean.pdbqt
Tutorial_Docking/

Following this, cd into Tutorial_Docking/ and run ls to view the results

cd Tutorial_Docking/
ls

The following should be within the directory

121P.docking.dlg  
121P.docking.dpf 
121P.dock.parameter.dpf 
summary_of_results_1.0


Trouble Shooting

Some of these systems may present issues such as zero-charge atom types or a non-integral total charge (for example, a ligand with a charge of 2.48). This should only occur with ligands, because the receptors keep their am1bcc charges.

The non-integral charge issue looks like this, and occurs during the docking step

autodock4: *** Caution!  Non-integral total charge (-2.498 e) on ligand may indicate a problem... ***
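A warning of this form can be detected automatically by searching the docking output for the message. The sketch below is an assumption about one way to do this, not the lab's script; the regular expression is written against the exact message shown above.

```python
import re
from typing import Optional

# Matches the charge inside "Non-integral total charge (-2.498 e)".
CHARGE_WARNING = re.compile(
    r"Non-integral total charge \(\s*([-+]?\d+(?:\.\d+)?)\s*e\)"
)


def non_integral_charge(log_text: str) -> Optional[float]:
    """Return the charge reported in the warning, or None if it is absent."""
    match = CHARGE_WARNING.search(log_text)
    return float(match.group(1)) if match else None
```

Running this over the docking output of every system gives the list of ligands that need their charges fixed.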

To troubleshoot, you'll need to run the docking once first to see which systems are problematic


This fix increases the Pose Reproduction success rate by 3%


Following all this, the grids were regenerated for all these systems. Once the ligands, receptors, and grids have been generated, you'll be able to reuse them without repeating all the previous steps.


The following script was used to identify and isolate these problematic ligands. It collects the systems with non-integral charges and the systems with zero-charge atoms into one directory.

Argument 1 is the list of systems

Argument 2 is the docking directory

 python ../Problematic_files_transfer.py /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial_Docking

This creates the directory Problematic_ligands/, and within it the directory Broken/, which stores these problematic ligands

cd AutoDock4_Tutorial/Problematic_ligands/Broken
ls

This lists all the problematically charged ligands

121P.lig.am1bcc.mol2
181L.lig.am1bcc.mol2
184L.lig.am1bcc.mol2
etc

Once these ligands have been collected in their own directory, open them all in Chimera

File->open->All_Ligands

Problematic Ligands.png

Then add Gasteiger charges to all the systems

Tools->Structure Editing->Add Charges

Charges to Ligand.PNG

Then press OK with these settings

Following this, save all of the fixed ligands into one directory. Here this was the Fixed directory

cd Fixed/

This directory contains all the fixed ligands

ls

This should show all the fixed ligands. The directory can be named anything, but the ligand files need to follow this file name format to work

121P.lig.am1bcc.mol2
181L.lig.am1bcc.mol2
182L.lig.am1bcc.mol2
183L.lig.am1bcc.mol2
etc
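Before converting the fixed ligands, it can be useful to confirm that each mol2 file now has an integral total charge. The sketch below sums the partial charges in the ATOM section of a mol2 file, where the charge is the ninth column. The function names and the 0.01 tolerance are assumptions chosen for illustration.

```python
from typing import List


def mol2_charges(mol2_text: str) -> List[float]:
    """Collect the partial charges from the @<TRIPOS>ATOM section."""
    charges = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only records inside the ATOM section carry charges.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 9:
            charges.append(float(fields[8]))
    return charges


def is_integral(total: float, tol: float = 0.01) -> bool:
    """True if the total charge is within tol of a whole number."""
    return abs(total - round(total)) <= tol
```

A ligand with a total charge of -2.498 fails this check, while one that sums to -1.0 passes.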

Lastly a script will be used to convert these properly charged mol2 files to pdbqt files with the same charges.

(NOTE: USE ABSOLUTE PATHS FOR THIS SCRIPT)

Argument 1 is the list of problematic ligand systems (Problematic.txt) generated previously with the Problematic_files_transfer.py script and located in the AutoDock4_Tutorial/ directory

Argument 2 is the directory containing all the fixed systems

Argument 3 is the main directory for the docking experiments, AutoDock4_Tutorial

bash ../Fixed_ligand_transfer.sh /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/Problematic.txt /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/Problematic_ligands/Fixed /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial

Following this, you can regenerate the grids (Part IV) and repeat the docking experiment (Part V) for just these problematic ligands by using the Problematic.txt file

VI.Rescoring in DOCK6

Following this, the docked molecules needed to be rescored with DOCK6 to provide an accurate comparison between the two programs. To accomplish this, Open Babel was used to convert the systems from pdbqt to mol2 files. This conversion removes all the hydrogens from the systems. The first script converts the files from pdbqt to mol2 format using Open Babel

cd AutoDock4_Tutorial

1st argument is the number of different conformations

2nd argument is the docking directory

3rd argument is the list of all the systems being docked

bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/pdbqt_to_mol2_AutoDock4.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all


These are then rescored using DOCK6's all-atom model option, which should take only about 1 second per system.

This is the script that rescores all the molecules

Argument 1 is the number of poses generated by AutoDock4; this value should equal the number of GA runs

Argument 2 is the docking directory where the AutoDock4 experiment was conducted

Argument 3 is the list of systems to rescore

bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/run004.AutoDock4.Rescore.with.DOCK.v5.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all

This splits each docking result into pdbqt files, converts these files to mol2 files, and then rescores them all in DOCK6. Many files are generated, but the two key files for each system are AutoDock4_Tutorial/sys/sys.output_scored.mol2 and AutoDock4_Tutorial/sys/sys.output.all_scored.mol2. You should see these files when you cd into a docking directory and run the ls command
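The splitting step can be illustrated as follows. AutoDock4 writes each docked pose into the .dlg file on lines prefixed with "DOCKED: ", bracketed by MODEL and ENDMDL records. The sketch below strips that prefix and returns one pdbqt-style text block per pose; it is an assumption about how such a split can be done, not the contents of the run004 script.

```python
from typing import List

PREFIX = "DOCKED: "


def split_docked_poses(dlg_text: str) -> List[str]:
    """Return one text block per docked pose found in a .dlg file."""
    poses: List[str] = []
    current: List[str] = []
    for line in dlg_text.splitlines():
        if not line.startswith(PREFIX):
            continue  # skip everything that is not part of a docked pose
        record = line[len(PREFIX):]
        current.append(record)
        if record.startswith("ENDMDL"):
            # ENDMDL closes the current pose.
            poses.append("\n".join(current) + "\n")
            current = []
    return poses
```

With 10 GA runs, the returned list should contain 10 blocks, each of which can be written to its own pdbqt file before conversion.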

cd 121P/Tutorial_Docking
ls

Go back to the main directory. Some of these ligands were converted improperly by Open Babel, which caused them to report Hungarian RMSDs of -1000, and a script was developed to fix this. Note: This is performed after rescoring all the molecules because this script only changes the ligands of the problematic systems

cd AutoDock4_Tutorial

1st argument is the list of all the systems being tested in the experiment

2nd argument is the Docking directory

python /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/atom_type_fix_rescore.py /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial_Docking

This script fixes these ligands and generates a text file named "redo_rescore.txt", which lists all the systems that were altered. Following this, only the problematic systems are rescored with DOCK6.9 using the previous run004 script.
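Systems that need this fix can be spotted by looking for an RMSD of -1000 in the rescored mol2 files. The sketch below assumes that the scored mol2 files carry header lines of the form "########## key: value" and that the Hungarian RMSD is stored under the key HA_RMSDh; both are assumptions and should be checked against the actual DOCK6 output. The function names are hypothetical.

```python
import re
from typing import List

# Assumed header layout: "##########    HA_RMSDh:    -1000.000"
HEADER = re.compile(r"^#+\s*(\S+):\s*(\S+)\s*$")


def header_values(mol2_text: str, key: str) -> List[float]:
    """Collect the numeric values stored under one header key."""
    values = []
    for line in mol2_text.splitlines():
        match = HEADER.match(line)
        if match and match.group(1) == key:
            values.append(float(match.group(2)))
    return values


def needs_fix(mol2_text: str, key: str = "HA_RMSDh",
              sentinel: float = -1000.0) -> bool:
    """True if any pose reports the sentinel RMSD value."""
    return any(value == sentinel for value in header_values(mol2_text, key))
```

Systems for which needs_fix returns True are the ones that would be expected to appear in redo_rescore.txt.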

bash /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/run004.AutoDock4.Rescore.with.DOCK.v5.sh 10 Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/redo_rescore.txt

After this step has been performed, all the systems will have been successfully docked and rescored, and we can analyze the results

VII.Create CSV

This script generates the AutoDock4 results file, which lists all the important data relevant to the Pose Reproduction benchmark.

Argument 1 is the docking directory that stores the docking results

Argument 2 is the list of all the systems being tested so far

Argument 3 is the name of the csv file that will be generated for the docking results; .csv will be appended to it

python /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/slaverty_autodock_runs/AutoDock4_Tutorial/bickel_laverty_calculate_autodock4_results.score.noH.py Tutorial_Docking /gpfs/projects/rizzo/yuchzhou/RCR/DOCK_testset/clean.systems.all Tutorial.3.Score

You should be able to see the new csv file generated within this directory

cd AutoDock4_Tutorial

You can open this file with the following command to check that the CSV was generated properly

vim Tutorial.3.Score.csv

AutoDock4 CSV results.png

VIII.Generating Graphs from the CSV

Lastly, the csv file will be used to generate all the graphs to analyze these results

python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 1

Tutorial Docking 1.png

python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 2

Tutorial Docking 2.png

python ../../DOCK6_Pose_Reproduction_analysis_v4.py Tutorial.3.Score.2.csv 3

Tutorial Docking 3.png
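The headline numbers behind such graphs are the percentages of docking successes, scoring failures, and sampling failures defined in the Introduction. The sketch below computes them from a list of per-system outcome labels. It is not the analysis script itself, and the label strings and function name are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

OUTCOMES = ("docking success", "scoring failure", "sampling failure")


def outcome_rates(outcomes: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of systems falling into each outcome class."""
    counts = Counter(outcomes)
    total = sum(counts[name] for name in OUTCOMES)
    if total == 0:
        raise ValueError("no classified systems")
    return {name: 100.0 * counts[name] / total for name in OUTCOMES}
```

Because every system falls into exactly one class, the three percentages sum to 100.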