Database Enrichment SB2024 V1 DOCK6.10 A
The purpose of this tutorial is to develop a uniform method to test ligand enrichment across the Rizzo lab with the DOCK software.
I.Introduction
Ligand enrichment is a common experiment used to evaluate how accurately a docking program can reproduce in vitro results. The experiment uses active ligands and decoy ligands to assess a docking program's ability to dock successfully to a target site. The active and decoy ligands are roughly the same size but differ in their chemistry. If the docking program accurately models the interactions between the binding site and the ligands, the active ligands should bind more favorably (have a lower energy score) than the decoy ligands.
The 3 major outcomes for this experiment are early enrichment, random enrichment, and late enrichment. Early enrichment indicates that the active ligands score better than the decoys (the goal for all docking programs). Random enrichment indicates that the docking program cannot differentiate between actives and decoys. Late enrichment indicates that the docking software gives the lowest energy scores to the decoys, which is the worst outcome. The other factor to consider is the degree of early or late enrichment.
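One common way to put a number on the degree of enrichment is the area under the ROC curve (AUC): values well above 0.5 suggest early enrichment, values near 0.5 suggest random enrichment, and values well below 0.5 suggest late enrichment. The following is a minimal sketch of that idea and is not part of the lab scripts; the function names, the toy curves, and the 0.05 tolerance around 0.5 are all illustrative assumptions.

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def classify_enrichment(auc, tolerance=0.05):
    """Label a curve; the tolerance around 0.5 is an arbitrary illustrative choice."""
    if auc > 0.5 + tolerance:
        return "early"
    if auc < 0.5 - tolerance:
        return "late"
    return "random"


# Toy curves (made-up numbers, not real docking results)
early_curve = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]
random_curve = [(0.0, 0.0), (1.0, 1.0)]
late_curve = [(0.0, 0.0), (0.9, 0.2), (1.0, 1.0)]

for name, curve in [("early", early_curve), ("random", random_curve), ("late", late_curve)]:
    auc = roc_auc(curve)
    print(name, round(auc, 3), classify_enrichment(auc))
```

A single AUC value hides where along the curve the actives are recovered, so it is best read together with the plotted curve.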
II.Prepping systems
-The first step is to create a directory for the system you are preparing
mkdir 1Q4X
-Next, obtain the active and decoy ligand test sets, which can be found on the Shoichet lab DUD-E website: http://dude.docking.org/targets
-Once the target files are downloaded, decompress them with the gzip command to obtain the active and decoy mol2 files
gzip -d actives_final.mol2.gz
gzip -d decoys_final.mol2.gz
-Prepare the target receptor either by using the receptor provided with the official test set or by manually preparing a receptor from scratch
After following all these steps, your directory should look like the following, using the 1Q4X system as an example
    actives_final.mol2
    decoys_final.mol2
    1Q4X.rec.clean.mol2
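Before docking, it can help to confirm that the directory is complete and to note how many molecules each ligand file holds, since those totals are the denominators used later in the enrichment analysis. The sketch below is one possible way to do this. It relies on the fact that each molecule in a mol2 file begins with an @<TRIPOS>MOLECULE record; the helper names count_mol2_molecules and check_system_dir are hypothetical and not part of DOCK or the lab scripts.

```python
from pathlib import Path


def count_mol2_molecules(path):
    """Count molecules by counting @<TRIPOS>MOLECULE record headers."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("@<TRIPOS>MOLECULE"):
                count += 1
    return count


def check_system_dir(system_dir, system="1Q4X"):
    """Return the expected files that are missing from the system directory."""
    expected = ["actives_final.mol2", "decoys_final.mol2", f"{system}.rec.clean.mol2"]
    return [name for name in expected if not (Path(system_dir) / name).is_file()]


if __name__ == "__main__":
    missing = check_system_dir("1Q4X")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        for name in ("actives_final.mol2", "decoys_final.mol2"):
            print(name, count_mol2_molecules(Path("1Q4X") / name))
```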
III.Docking molecules
-After completing the preparation steps, a virtual screen is run with MPI for the active and decoy ligands separately
-The input parameters are as follows for the active ligands
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 1000
pruning_clustering_cutoff 100
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file actives_final.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec.clust.close.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex.defn
flex_drive_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
ligand_outfile_prefix 1Q4X.active.output.mpi
write_orientations no
num_scored_conformers 1
rank_ligands no
-The input parameters for the decoy ligands are as follows
conformer_search_type flex
write_fragment_libraries no
user_specified_anchor no
limit_max_anchors no
min_anchor_size 5
pruning_use_clustering yes
pruning_max_orients 1000
pruning_clustering_cutoff 100
pruning_conformer_score_cutoff 100.0
pruning_conformer_score_scaling_factor 1.0
use_clash_overlap no
write_growth_tree no
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file decoys_final.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec.clust.close.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary yes
grid_score_secondary no
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_score_grid_prefix /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec
multigrid_score_secondary no
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex.defn
flex_drive_file /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
ligand_outfile_prefix 1Q4X.decoy.output.mpi
write_orientations no
num_scored_conformers 1
rank_ligands no
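The active and decoy input files differ only in two parameters, ligand_atom_file and ligand_outfile_prefix. To keep the two files from drifting apart, one option is to derive the decoy input from the active input. The sketch below illustrates this on a shortened excerpt; it assumes the input file holds one whitespace-separated "parameter value" pair per line, and the helper name override_parameters is hypothetical.

```python
def override_parameters(text, overrides):
    """Return DOCK input text with the values of selected parameters replaced.

    Assumes one whitespace-separated "key value" pair per line.
    """
    remaining = dict(overrides)
    out_lines = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        if fields and fields[0] in remaining:
            # Replace the value of the first occurrence of this parameter
            line = f"{fields[0]} {remaining.pop(fields[0])}"
        out_lines.append(line)
    if remaining:
        raise KeyError(f"parameters not found: {sorted(remaining)}")
    return "\n".join(out_lines) + "\n"


# Shortened excerpt of the active-ligand input, for illustration only
active_input = """\
ligand_atom_file actives_final.mol2
limit_max_ligands no
ligand_outfile_prefix 1Q4X.active.output.mpi
"""

decoy_input = override_parameters(
    active_input,
    {
        "ligand_atom_file": "decoys_final.mol2",
        "ligand_outfile_prefix": "1Q4X.decoy.output.mpi",
    },
)
print(decoy_input)
```

Raising an error when a parameter is not found guards against silently writing a decoy input that still points at the active ligands.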
-Then submit the following script to the queue to dock the molecules in parallel (the script uses SLURM directives, so it is submitted with sbatch). Some of the active and decoy test sets are quite large, so MPI submission is recommended.
#!/bin/bash
#SBATCH --partition=rn-long-40core
#SBATCH --time=48:00:00
#SBATCH --nodes=4
#SBATCH --ntasks=160
#SBATCH --job-name=1B9V_mpi_runs
#SBATCH --output=1B9V_mpi_runs
cd $SLURM_SUBMIT_DIR
module load intel/mpi/64/2018/18.0.3
# Dock the active ligands
mpirun -np 160 dock6.mpi -i 1Q4X_active_mpi.in -o 1Q4X_active_mpi.out
# Dock the decoy ligands
mpirun -np 160 dock6.mpi -i 1Q4X_decoy_mpi.in -o 1Q4X_decoy_mpi.out
IV.Ligand Enrichment Analysis
-Lastly, 2 scripts were developed to analyze the results: one script generates a CSV file, and a second script uses the CSV data to create a graph.
-The script that generates the CSV file takes three parameters: the list of systems, the name of the decoy ligands mol2 file, and the name of the active ligands mol2 file. (NOTE: This script can generate multiple CSV files for different ligand experiments, but the active and decoy mol2 files must be named the same way in each.) The 1Q4X.txt file contains the following text
1Q4X
If you're creating multiple CSVs for multiple systems, the format will be
1Q4X
1LRU
1SYN
etc
This script is run from the directory one level above the 1Q4X directory (not inside the 1Q4X directory)
1Q4X/
Example:
python roc_curve_lig_enrichment_v2.py 1Q4X.txt decoys_final.mol2 actives_final.mol2
This produces the CSV file in the 1Q4X directory
1Q4X_lig_enrichment.csv
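The internals of roc_curve_lig_enrichment_v2.py are not reproduced in this tutorial. As a rough illustration of the kind of data an enrichment analysis needs, the sketch below pulls the molecule name and grid score out of a DOCK scored mol2 file. It assumes that each molecule is preceded by header comment lines of the form "########## Name: ..." and "########## Grid_Score: ...", and that the scored poses are written to a file such as 1Q4X.active.output.mpi_scored.mol2; check both assumptions against your own output. The function name read_grid_scores and the example ligand names and scores are made up.

```python
import re

# Assumed header format of a DOCK 6 scored mol2 file, e.g.
#   ##########    Name:          some_ligand
#   ##########    Grid_Score:    -45.123456
HEADER = re.compile(r"^#+\s*(Name|Grid_Score):\s*(\S+)")


def read_grid_scores(lines):
    """Return (name, grid_score) pairs, one per Grid_Score header found."""
    results = []
    name = None
    for line in lines:
        match = HEADER.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Name":
            name = value
        else:  # key == "Grid_Score"
            results.append((name, float(value)))
            name = None
    return results


# Made-up excerpt standing in for a scored mol2 file
example = """\
##########                                Name:        ligand_A
##########                          Grid_Score:          -52.310001
@<TRIPOS>MOLECULE
ligand_A
##########                                Name:        ligand_B
##########                          Grid_Score:          -31.750000
@<TRIPOS>MOLECULE
ligand_B
""".splitlines()

print(read_grid_scores(example))
```

For a real file, pass an open file handle instead of the example list, since the function only needs an iterable of lines.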
Following this, a Python script is used to create a graph to analyze the results. First change directory into the 1Q4X directory
cd 1Q4X
Then run the script with the following arguments (Note: Name can be anything)
make_roc_curve.py CSV_file Name
Example:
python ../make_roc_curve.py 1Q4X_lig_enrichment.csv DOCK6.9
The ROC curve is generated using 2 formulas. For the decoys, on the x-axis
# of Docked Decoys/Total Decoys
For the actives, on the y-axis
# of Docked Actives/Total Actives
The ROC curve starts with no active or decoy ligands counted.
# of Docked Decoys=0
# of Docked Actives=0
The docked molecules are then sorted from lowest to highest energy score and processed in that order. If the next molecule is an active, 1 is added to the active count
# of Docked Decoys=# of Docked Decoys
# of Docked Actives=# of Docked Actives + 1
If the next molecule is a decoy, 1 is added to the decoy count
# of Docked Decoys=# of Docked Decoys + 1
# of Docked Actives=# of Docked Actives
This continues until all of the active and decoy ligands have been added and the graph reaches [1,1]. (Note: If not all molecules docked, 1,1 is appended to the end of the CSV as the final point.)
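The counting procedure described above can be sketched in a few lines of Python. This is an illustrative reimplementation under stated assumptions, not the code in make_roc_curve.py or roc_curve_lig_enrichment_v2.py: the function name roc_points and the example scores are made up, and ties between an active and a decoy are broken in favor of the active simply because of how the tuples sort.

```python
def roc_points(active_scores, decoy_scores, total_actives=None, total_decoys=None):
    """Build ROC points (x = decoy fraction, y = active fraction) from docking scores.

    Lower scores are better. total_actives / total_decoys default to the number
    of scores supplied; pass the full test set sizes when some molecules
    failed to dock.
    """
    total_actives = total_actives or len(active_scores)
    total_decoys = total_decoys or len(decoy_scores)

    # Tag each score with its class and sort from lowest to highest energy
    ranked = sorted(
        [(score, "active") for score in active_scores]
        + [(score, "decoy") for score in decoy_scores]
    )

    docked_actives = 0
    docked_decoys = 0
    points = [(0.0, 0.0)]  # start with nothing counted
    for _score, label in ranked:
        if label == "active":
            docked_actives += 1
        else:
            docked_decoys += 1
        points.append((docked_decoys / total_decoys, docked_actives / total_actives))

    # If some molecules never docked, close the curve at (1, 1)
    if points[-1] != (1.0, 1.0):
        points.append((1.0, 1.0))
    return points


# Made-up scores for illustration
actives = [-60.0, -45.0]
decoys = [-50.0, -40.0, -30.0]
for x, y in roc_points(actives, decoys):
    print(f"{x:.3f} {y:.3f}")
```

Passing total_decoys=4 with the same three decoy scores mimics a decoy that failed to dock: the last counted point falls short of (1, 1), so the closing point is appended.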
