Difference between revisions of "Database Enrichment SB2024 V1 DOCK6.10 A"

Revision as of 20:22, 24 January 2024

The purpose of this tutorial is to develop a uniform method to test ligand enrichment across the Rizzo lab with the DOCK software.

I.Introduction

Ligand enrichment is an experiment used to evaluate how well a docking program can rank experimentally known binders (termed actives) over decoy molecules for a given target. The active and decoy ligands are ideally property-matched, meaning each active has decoys with similar physicochemical properties. If the docking program accurately models the interactions between the binding site and the ligands, the actives should bind more favorably (have a lower energy score) than the decoys.

The three major outcomes of this experiment are early enrichment, random enrichment, and late enrichment. Early enrichment indicates that the active ligands score better than the decoys and are ranked first (the goal for all docking programs). Random enrichment indicates that the docking program cannot differentiate between actives and decoys. Late enrichment indicates that the docking software gives the lowest energy scores to the decoys, which is the worst outcome.
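These three outcomes are often summarized with a ROC curve and its area under the curve (AUC). The Python sketch below is illustrative only and is not one of the lab's scripts. It assumes the docking scores have already been collected into two lists, ranks all molecules from lowest to highest energy score, and labels the result using an arbitrary tolerance around an AUC of 0.5. The function names and the tolerance value are assumptions made for this example, and reducing enrichment to a single AUC value is a simplification.

```python
def roc_points(active_scores, decoy_scores):
    """Return ROC points as (decoy fraction, active fraction) pairs."""
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy score")
    # Label each score: True for an active, False for a decoy.
    labeled = [(s, True) for s in active_scores]
    labeled += [(s, False) for s in decoy_scores]
    # Lower energy scores are better, so they are ranked first.
    # The sort is stable, so on tied scores actives come before decoys.
    labeled.sort(key=lambda pair: pair[0])
    n_act, n_dec = len(active_scores), len(decoy_scores)
    found_act = found_dec = 0
    points = [(0.0, 0.0)]
    for _, is_active in labeled:
        if is_active:
            found_act += 1
        else:
            found_dec += 1
        points.append((found_dec / n_dec, found_act / n_act))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def classify(area, tolerance=0.05):
    """Label the outcome; the tolerance around 0.5 is an assumption."""
    if area > 0.5 + tolerance:
        return "early"
    if area < 0.5 - tolerance:
        return "late"
    return "random"
```

For example, `classify(auc(roc_points([-60.0, -55.0], [-40.0, -30.0, -20.0])))` returns "early" because both actives score lower than every decoy.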

II.Prepping systems

-The first step is to create a top-level directory and move into it.

    mkdir testset
    cd testset

-Create a subdirectory for each system you will run.

    mkdir 1Q4X

- Then obtain the active and decoy ligands, which can be found on the Shoichet lab DUD-E test set website http://dude.docking.org/targets. Once these files are downloaded, move them into the appropriate subdirectory and decompress them with the gzip command.

    cd 1Q4X
    gzip -d actives_final.mol2.gz
    gzip -d decoys_final.mol2.gz
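
When many systems are being prepared, the same decompression step can be scripted. The following is a minimal Python sketch using only the standard library. The function name `gunzip_into` is hypothetical, and it assumes the downloaded .gz archives are on local disk. Unlike `gzip -d`, it leaves the original archive in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_into(archive, system_dir):
    """Decompress a .gz archive into system_dir and return the new path."""
    archive = Path(archive)
    system_dir = Path(system_dir)
    # Create the system subdirectory (e.g. testset/1Q4X) if it is missing.
    system_dir.mkdir(parents=True, exist_ok=True)
    # actives_final.mol2.gz -> actives_final.mol2
    target = system_dir / archive.with_suffix("").name
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

A call such as `gunzip_into("actives_final.mol2.gz", "testset/1Q4X")` would write `testset/1Q4X/actives_final.mol2`.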

-Prepare the target receptor by either using the official SB2023 test set files (to be published) or preparing the receptor associated with the PDB entry using run000 to run004 in https://github.com/rizzolab/Testset_Protocols, then move the relevant files into the directory ~/testset/1Q4X

Following all these steps you should have a separate subdirectory for each system with the following files:

    actives_final.mol2
    decoys_final.mol2
    1Q4X.rec.clean.mol2
    1Q4X.rec.clust.close.sph
    1Q4X.rec.nrg
    1Q4X.rec.bmp
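
Before docking, it can help to confirm that every system subdirectory is complete. The sketch below is a hypothetical helper, not part of the lab's scripts. It takes the six file names listed above as the expected set and reports which ones are missing for a given system. The function name `missing_files` and the `{system}` placeholder convention are assumptions made for this example.

```python
from pathlib import Path

# File names taken from the list above; {system} stands for the PDB code.
REQUIRED = [
    "actives_final.mol2",
    "decoys_final.mol2",
    "{system}.rec.clean.mol2",
    "{system}.rec.clust.close.sph",
    "{system}.rec.nrg",
    "{system}.rec.bmp",
]


def missing_files(testset, system):
    """Return the expected file names not found in testset/system."""
    system_dir = Path(testset) / system
    expected = [name.format(system=system) for name in REQUIRED]
    return [name for name in expected if not (system_dir / name).is_file()]
```

An empty list from `missing_files("testset", "1Q4X")` would mean the 1Q4X subdirectory contains all six files.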

III.Docking molecules

-Now that the files are ready for docking, a virtual screen will be conducted separately for the active and decoy ligands.

-Pull the Database Enrichment scripts from https://github.com/rizzolab/Benchmarking_and_Validation

- 001.submit.sh has an #SBATCH header for submitting to an HPC cluster, such as SeaWulf. If not using an HPC cluster, delete the #SBATCH lines.

-Enter the required parameters in the script:
    testset=" Path to folder with all system subdirectories"

    system_file=" List of systems to run"
      e.g. 1Q4X
           1BCD
           1SJ0
           ...

    dock=" Path to dock uppermost folder"

    mpi="Yes / No" - do you want to run in parallel

    processes=" Number of processes" - only set if mpi = Yes

    sbatch 001.submit.sh    # on an HPC with Slurm; otherwise run: bash 001.submit.sh
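
The list of systems can be generated rather than typed by hand. The Python sketch below assumes, based on the example above, that system_file is a plain text file with one system name per line, and that every subdirectory of the testset folder is a system to run. The function name `write_system_file` and the output file name in the usage example are hypothetical.

```python
from pathlib import Path


def write_system_file(testset, out_path):
    """Write one subdirectory name per line and return the sorted names."""
    # Every subdirectory of testset is treated as a system (assumption).
    systems = sorted(p.name for p in Path(testset).iterdir() if p.is_dir())
    Path(out_path).write_text("".join(name + "\n" for name in systems))
    return systems
```

For example, `write_system_file("testset", "systems.txt")` would list 1Q4X, 1BCD, and any other system subdirectories in `systems.txt`, which could then be given as the system_file value.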

IV.Ligand Enrichment Analysis

-

@@ Line 2: / Line 2: @@
 ==I.Introduction==
-Ligand Enrichment is a common experiment used to evaluate how well a docking program is capable of accurately modeling in vitro experiments. This experiment uses active ligands and decoy ligands to access a docking programs ability to successfully dock to a target site. These active and decoy ligands are roughly the same size and differ due to chemical similarities. These active ligands should bind more favorably(Have a lower energy score) then the decoy ligands if the docking program can accurately model these binding site and ligand interactions.
+Ligand Enrichment is an experiment used to evaluate how well a docking program can rank experimentally known binders (termed actives) over decoy molecules for a given target. These active and decoy ligands are ideally property matched meaning an active has decoys with similar physiochemical properties. These active ligands should bind more favorably(Have a lower energy score) then the decoy ligands if the docking program can accurately model these binding site and ligand interactions.
-The 3 major outcomes for this experiment are early enrichment, random enrichment, and late enrichment. Early enrichment indicates the active ligands dock more successful in the experiment(The goal for all docking programs). The second is random enrichment indicating that the docking program can differentiate between active and decoy. Late enrichment indicating that docking software gives the lowest energy scores to the decoys which is the worst outcome. The other factor to consider is the degree of early and late enrichment
+The 3 major outcomes for this experiment are early enrichment, random enrichment, and late enrichment. Early enrichment indicates the active ligands dock more successful in the experiment(The goal for all docking programs). The second is random enrichment indicating that the docking program cannot differentiate between active and decoy. Late enrichment indicating that docking software gives the lowest energy scores to the decoys which is the worst outcome.
 ==II.Prepping systems==
--The first step is to create a directory for the system you are preparing
+-The first step is to create directories.
+     mkdir testset
+-Create subdirectory for each system you will run
       mkdir 1Q4X
--The first step is to obtain the active and decoy ligand test set systems which can be found on the Schoichet DUD-E test set website http://dude.docking.org/targets
+- Then obtain the active and decoy ligands which can be found on the Schoichet DUD-E test set website http://dude.docking.org/targets. Once these targets are obtained unzip these files using the gzip command and move them into the appropriate subdirectory.
--Once these targets are obtained unzip these files using the gzip command to get the active and decoy forms
+     cd 1BCD
- gzip -d actives_final.mol2.gz
+     gzip -d actives_final.mol2.gz
- gzip -d decoys_final.mol2.gz
+     gzip -d decoys_final.mol2.gz
--Prepare the target receptor by either using the official test set ligands or manually prepare a receptor target from scratch
+-Prepare the target receptor by either using the official SB2023 test set files (to be published) or prepare the receptor associated with the PDB using run000 to run004 in https://github.com/rizzolab/Testset_Protocols and move relevant files into the directory ~/testset/1Q4X
-Following all these steps your directory should look like the following using the 1QRX system
+Following all these steps you should have a separate subdirectory for each system with the following files:
       actives_final.mol2
       decoys_final.mol2
       1Q4X.rec.clean.mol2
+      1Q4X.rec.clust.close.sph
+      1Q4X.rec.nrg
+      1Q4X.rec.bmp
 ==III.Docking molecules==
--After completing this step a virtual screen will be conducted using mpi for both the active and decoy ligands seperately
+-Now that files are ready for docking step a virtual screen will be conducted for both the active and decoy ligands separately.
--The input parameters are as follows for the active ligands
+-Pull Database Enrichment scripts from https://github.com/rizzolab/Benchmarking_and_Validation
- conformer_search_type                                        flex
+- 001.submit.sh has #SBATCH header for submitting to an HPC, such as seawulf. If not using an HPC, delete #SBATCH lines.
- write_fragment_libraries                                     no
+  Enter required parameters in script
- user_specified_anchor                                        no
+     testset=" Path to folder with all system subdirectories"
- limit_max_anchors                                            no
- min_anchor_size                                              5
- pruning_use_clustering                                       yes
- pruning_max_orients                                          1000
- pruning_clustering_cutoff                                    100
- pruning_conformer_score_cutoff                               100.0
- pruning_conformer_score_scaling_factor                       1.0
- use_clash_overlap                                            no
- write_growth_tree                                            no
- use_internal_energy                                          yes
- internal_energy_rep_exp                                      12
- internal_energy_cutoff                                       100.0
- ligand_atom_file                                             actives_final.mol2
- limit_max_ligands                                            no
- skip_molecule                                                no
- read_mol_solvation                                           no
- calculate_rmsd                                               no
- use_database_filter                                          no
- orient_ligand                                                yes
- automated_matching                                           yes
- receptor_site_file                                           /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec.clust.close.sph
- max_orientations                                             1000
- critical_points                                              no
- chemical_matching                                            no
- use_ligand_spheres                                           no
- bump_filter                                                  no
- score_molecules                                              yes
- contact_score_primary                                        no
- contact_score_secondary                                      no
- grid_score_primary                                           yes
- grid_score_secondary                                         no
- grid_score_rep_rad_scale                                     1
- grid_score_vdw_scale                                         1
- grid_score_es_scale                                          1
- grid_score_grid_prefix                                       /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec
- multigrid_score_secondary                                    no
- dock3.5_score_secondary                                      no
- continuous_score_secondary                                   no
- footprint_similarity_score_secondary                         no
- pharmacophore_score_secondary                                no
- descriptor_score_secondary                                   no
- gbsa_zou_score_secondary                                     no
- gbsa_hawkins_score_secondary                                 no
- SASA_score_secondary                                         no
- amber_score_secondary                                        no
- minimize_ligand                                              yes
- minimize_anchor                                              yes
- minimize_flexible_growth                                     yes
- use_advanced_simplex_parameters                              no
- simplex_max_cycles                                           1
- simplex_score_converge                                       0.1
- simplex_cycle_converge                                       1.0
- simplex_trans_step                                           1.0
- simplex_rot_step                                             0.1
- simplex_tors_step                                            10.0
- simplex_anchor_max_iterations                                500
- simplex_grow_max_iterations                                  500
- simplex_grow_tors_premin_iterations                          0
- simplex_random_seed                                          0
- simplex_restraint_min                                        no
- atom_model                                                   all
- vdw_defn_file                                                /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
- flex_defn_file                                               /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex.defn
- flex_drive_file                                              /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
- ligand_outfile_prefix                                        1Q4X.active.output.mpi
- write_orientations                                           no
- num_scored_conformers                                        1
- rank_ligands                                                 no
--The input parameters for the decoy ligands.
+     system_file=" List of systems to run"
+        ie: 1Q4X
+            1BCD
+            1SJ0
+            ...
- conformer_search_type                                        flex
+     dock=" Path to dock uppermost folder"
- write_fragment_libraries                                     no
- user_specified_anchor                                        no
- limit_max_anchors                                            no
- min_anchor_size                                              5
- pruning_use_clustering                                       yes
- pruning_max_orients                                          1000
- pruning_clustering_cutoff                                    100
- pruning_conformer_score_cutoff                               100.0
- pruning_conformer_score_scaling_factor                       1.0
- use_clash_overlap                                            no
- write_growth_tree                                            no
- use_internal_energy                                          yes
- internal_energy_rep_exp                                      12
- internal_energy_cutoff                                       100.0
- ligand_atom_file                                             decoys_final.mol2
- limit_max_ligands                                            no
- skip_molecule                                                no
- read_mol_solvation                                           no
- calculate_rmsd                                               no
- use_database_filter                                          no
- orient_ligand                                                yes
- automated_matching                                           yes
- receptor_site_file                                           /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec.clust.close.sph
- max_orientations                                             1000
- critical_points                                              no
- chemical_matching                                            no
- use_ligand_spheres                                           no
- bump_filter                                                  no
- score_molecules                                              yes
- contact_score_primary                                        no
- contact_score_secondary                                      no
- grid_score_primary                                           yes
- grid_score_secondary                                         no
- grid_score_rep_rad_scale                                     1
- grid_score_vdw_scale                                         1
- grid_score_es_scale                                          1
- grid_score_grid_prefix                                       /gpfs/projects/rizzo/ccorbo/2020_DUDE_0.3_gridspacing/DUDE_Good_to_go/1Q4X/1Q4X.rec
- multigrid_score_secondary                                    no
- dock3.5_score_secondary                                      no
- continuous_score_secondary                                   no
- footprint_similarity_score_secondary                         no
- pharmacophore_score_secondary                                no
- descriptor_score_secondary                                   no
- gbsa_zou_score_secondary                                     no
- gbsa_hawkins_score_secondary                                 no
- SASA_score_secondary                                         no
- amber_score_secondary                                        no
- minimize_ligand                                              yes
- minimize_anchor                                              yes
- minimize_flexible_growth                                     yes
- use_advanced_simplex_parameters                              no
- simplex_max_cycles                                           1
- simplex_score_converge                                       0.1
- simplex_cycle_converge                                       1.0
- simplex_trans_step                                           1.0
- simplex_rot_step                                             0.1
- simplex_tors_step                                            10.0
- simplex_anchor_max_iterations                                500
- simplex_grow_max_iterations                                  500
- simplex_grow_tors_premin_iterations                          0
- simplex_random_seed                                          0
- simplex_restraint_min                                        no
- atom_model                                                   all
- vdw_defn_file                                                /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
- flex_defn_file                                               /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex.defn
- flex_drive_file                                              /gpfs/projects/rizzo/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
- ligand_outfile_prefix                                        1Q4X.decoy.output.mpi
- write_orientations                                           no
- num_scored_conformers                                        1
- rank_ligands                                                 no
+     mpi="Yes / No" - do you want to run in parallel
--Then submit the script to the qsub to dock the molecule in parallel. Some of the ligand active and decoy testsets are quite large so mpi submission is recommended.
+     processes=" Number of processes" - only set if mpi = Yes
- #!/bin/bash
- #SBATCH --partition=rn-long-40core
- #SBATCH --time=48:00:00
- #SBATCH --nodes=4
- #SBATCH --ntasks=160
- #SBATCH --job-name=1B9V_mpi_runs
- #SBATCH --output=1B9V_mpi_runs
- cd $SLURM_SUBMIT_DIR
+     sbatch or bash 001.submit.sh
- module load intel/mpi/64/2018/18.0.3
- mpirun -np 160 dock6.mpi -i 1Q4X_active_mpi.in -o 1Q4X_decoy_mpi.out
- mpirun -np 160 dock6.mpi -i 1Q4X_decoy_mpi.in -o 1Q4X_decoy_mpi.out
 ==IV.Ligand Enrichment Analysis==
--Lastly, 2 scripts were developed to analyze the results. One script to generate a CSV file and a secondary script that uses the CSV data to create a graph.
+-
--The script that generates the CSV file takes three parameters, the list of systems, name of decoy ligands mol2 file, name of active ligands mol2 file.
-(NOTE: This script can generate multiple CSV files for different ligand experiments, but the naming of the active and decoy mol2 files must be the same,)
-The 1Q4X.txt file has the following text
- 1Q4X
-If your creating multiple csvs for multiple systems the format fill be
- 1Q4X
-LRU
-SYN
- etc
-This script is run one directory before the 1Q4X directory(Not in the 1Q4X directory)
- 1Q4X/
-Example:
- python roc_curve_lig_enrichment_v2.py 1Q4X.txt decoys_final.mol2 actives_final.mol2
-This produces the csv file in the 1Q4X
- 1Q4X_lig_enrichment.csv
-Following this a python script is used to create a graph to analyze the results
-First change directory into the 1Q4X directory
- cd 1Q4X
-Then run the script make_roc_curve.py CSV_file Name
-(Note: Name can be anything)
-Example:
- python ../make_roc_curve.py 1Q4X_lig_enrichment.csv DOCK6.9
   [[File:1Q4X_ligand_enrichment_DOCK6.9.png]]
-This ROC curve is generated using 2 formulas for the graph
-For Decoy, the x-axis
- # of Docked Decoys/Total Decoys
-For Active, the y-axis
- # of Docked Actives/Total Actives
-These ROC curves start with no active and decoy ligands docked.
- # of Docked Decoys=0
- # of Docked Actives=0
-Then these are sorted from lowest to highest energy score, if active is lower then 1 is added to active
- # of Docked Decoys=# of Docked Decoys
- # of Docked Actives=# of Docked Actives + 1
-If decoys are lower than 1 is added to the decoy list
- # of Docked Decoys=# of Docked Decoys + 1
- # of Active Decoys=# of Actives Decoys
-These are all continued until all of these active and decoys ligands are added til graph is [1,1]
-(Note: If all molecules aren't docked 1,1 is appended to the end of the csv as the final location)