Database Enrichment SB2024 V1 DOCK6.10 A

From Rizzo_Lab
Revision as of 20:17, 24 January 2024 by Ccorbo

The purpose of this tutorial is to develop a uniform method for testing ligand enrichment across the Rizzo lab with the DOCK software. Note that any data shown in this tutorial are for example purposes only.

I.Introduction

Ligand enrichment is an experiment used to evaluate how well a docking program can rank experimentally known binders (termed actives) over decoy molecules for a given target. The active and decoy ligands are ideally property matched, meaning each active has decoys with similar physicochemical properties. If the docking program accurately models the interactions between the binding site and the ligands, the actives should bind more favorably (have a lower energy score) than the decoys.


The three major outcomes of this experiment are early enrichment, random enrichment, and late enrichment. Early enrichment indicates that the active ligands score better than the decoys and rank near the top of the list (the goal for all docking programs). Random enrichment indicates that the docking program cannot differentiate between actives and decoys. Late enrichment indicates that the docking software gives the lowest energy scores to the decoys, which is the worst outcome.
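
The three outcomes can be illustrated with a toy calculation. The sketch below is not taken from the lab scripts; it assumes a list already ranked best score first, with True marking an active and False a decoy, and computes the area under the ROC curve as the fraction of (active, decoy) pairs in which the active is ranked ahead of the decoy. The function name roc_auc and the example lists are hypothetical, and tied scores are not handled.

```python
def roc_auc(ranked_labels):
    """Area under the ROC curve for a list ranked best score first.

    ranked_labels holds True for an active and False for a decoy.
    The result is the fraction of (active, decoy) pairs in which the
    active appears before the decoy in the ranking.
    """
    n_actives = sum(1 for is_active in ranked_labels if is_active)
    n_decoys = len(ranked_labels) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")

    decoys_seen = 0
    pairs_won = 0
    for is_active in ranked_labels:
        if is_active:
            # Every decoy not yet seen is ranked below this active.
            pairs_won += n_decoys - decoys_seen
        else:
            decoys_seen += 1
    return pairs_won / (n_actives * n_decoys)


# Toy rankings (illustrative only, not real docking results).
early = [True, True, False, False]    # actives ranked first
mixed = [True, False, False, True]    # no separation
late = [False, False, True, True]     # decoys ranked first

if __name__ == "__main__":
    for name, ranking in [("early", early), ("mixed", mixed), ("late", late)]:
        print(name, roc_auc(ranking))
```

Under these assumptions a value near 1 corresponds to early enrichment, a value near 0.5 to random enrichment, and a value near 0 to late enrichment.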

II.Prepping systems

-The first step is to create a top-level directory for the test set.

    mkdir testset

-Inside testset, create a subdirectory for each system you will run

    cd testset
    mkdir 1Q4X

- Then obtain the active and decoy ligands, which can be found on the Shoichet DUD-E test set website http://dude.docking.org/targets. Once the files for a target are downloaded, move them into the appropriate subdirectory and decompress them with gzip.

    cd 1Q4X
    gzip -d actives_final.mol2.gz 
    gzip -d decoys_final.mol2.gz 
   

-Prepare the target receptor either by using the official SB2023 test set files (to be published) or by preparing the receptor associated with the PDB entry using run000 to run004 in https://github.com/rizzolab/Testset_Protocols, then move the relevant files into the directory ~/testset/1Q4X

Following all these steps you should have a separate subdirectory for each system with the following files:

    actives_final.mol2
    decoys_final.mol2
    1Q4X.rec.clean.mol2
    1Q4X.rec.clust.close.sph
    1Q4X.rec.nrg
    1Q4X.rec.bmp
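
Before docking, it can help to confirm that every system subdirectory is complete. The following is a minimal sketch, not part of the lab scripts: the helper names expected_files and missing_files are hypothetical, and the file names are taken directly from the listing above.

```python
from pathlib import Path


def expected_files(system):
    """File names a prepared system directory should contain."""
    return [
        "actives_final.mol2",
        "decoys_final.mol2",
        f"{system}.rec.clean.mol2",
        f"{system}.rec.clust.close.sph",
        f"{system}.rec.nrg",
        f"{system}.rec.bmp",
    ]


def missing_files(testset_dir, system):
    """Return the expected files that are absent from testset_dir/system."""
    system_dir = Path(testset_dir) / system
    return [
        name for name in expected_files(system)
        if not (system_dir / name).is_file()
    ]


if __name__ == "__main__":
    # Assumes the test set lives in ~/testset, as in this tutorial.
    missing = missing_files(Path.home() / "testset", "1Q4X")
    print("missing:", missing if missing else "none")
```

An empty list means the system is ready for the docking step.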

III.Docking molecules

-Now that the files are ready for docking, a virtual screen will be conducted separately for the active ligands and for the decoy ligands.

-Pull Database Enrichment scripts from https://github.com/rizzolab/Benchmarking_and_Validation

- 001.submit.sh has an #SBATCH header for submitting to an HPC cluster, such as Seawulf. If you are not using an HPC cluster, delete the #SBATCH lines.

- Enter required parameters in script

    testset=" Path to folder with all system subdirectories"
    system_file=" List of systems to run"
       e.g. 1Q4X
           1BCD
           1SJ0
           ...
    dock=" Path to dock uppermost folder"
    mpi="Yes / No" - do you want to run in parallel
    processes=" Number of processes" - only set if mpi = Yes
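
- Then submit the script with sbatch (on an HPC cluster) or run it directly with bash:

    sbatch 001.submit.sh   # or: bash 001.submit.sh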

- After docking has completed, the folder testset/1Q4X will now have the following files, as well as input and output docking files:

    1Q4X_actives.FLX_scored.mol2  
    1Q4X_decoys.FLX_scored.mol2   
    Active_score.txt  
    Decoy_score.txt            
    All_score.txt  
    All_score_sort.txt

- All_score_sort.txt lists the actives and decoys with their scores, sorted from most to least favorable:

    -105.160493 Decoy
    -105.037376 Active
    -104.870392 Decoy
    -103.900323 Decoy
    -103.186615 Active
    -103.178314 Decoy
    ...
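
To show how a ranked list like this turns into a ROC curve, here is a minimal sketch. It is not the code in 002.analysis.sh; it assumes each line holds a score and a label ("Active" or "Decoy") separated by whitespace, as in the excerpt above, and the function names are hypothetical. Each point is the cumulative fraction of decoys found (x) against the cumulative fraction of actives found (y) while walking down the list.

```python
def parse_sorted_scores(lines):
    """Turn 'score label' lines into (score, is_active) tuples."""
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        score, label = parts
        entries.append((float(score), label == "Active"))
    return entries


def roc_points(entries):
    """Cumulative (decoy fraction, active fraction) after each ranked entry."""
    n_actives = sum(1 for _, is_active in entries if is_active)
    n_decoys = len(entries) - n_actives
    if n_actives == 0 or n_decoys == 0:
        raise ValueError("need at least one active and one decoy")

    points = [(0.0, 0.0)]
    actives_found = 0
    decoys_found = 0
    for _, is_active in entries:
        if is_active:
            actives_found += 1
        else:
            decoys_found += 1
        points.append((decoys_found / n_decoys, actives_found / n_actives))
    return points


# The six example lines shown above (example data only).
example_lines = [
    "-105.160493 Decoy",
    "-105.037376 Active",
    "-104.870392 Decoy",
    "-103.900323 Decoy",
    "-103.186615 Active",
    "-103.178314 Decoy",
]

if __name__ == "__main__":
    for x, y in roc_points(parse_sorted_scores(example_lines)):
        print(f"{x:.2f} {y:.2f}")
```

A curve that rises steeply at small x values corresponds to early enrichment; a curve that follows the diagonal corresponds to random enrichment.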


IV.Ligand Enrichment Analysis

-002.analysis.sh assumes anaconda/3 is available as a module. If it is not, edit the bash script so that the python scripts are run with your own python3 installation instead.

-Before running 002.analysis.sh, again fill in the parameters "testset" and "system_file" with the same values used in 001.submit.sh.

    bash 002.analysis.sh

-Some "philosophical" decisions are built into these scripts and are important to be aware of:

    1. Actives and decoys that do not dock successfully are added to the end of the ranked list at a random enrichment rate (actives and decoys equally interspersed).
    2. The active and decoy mol2 files may contain multiple protomers of the same ligand. These scripts retain all protomers for rescoring, although it may be desirable to retain only the best-scoring protomer of each molecule.
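
If only the best-scoring protomer of each molecule is wanted, the filtering could look like the sketch below. This is an illustration rather than the behavior of the provided scripts: it assumes that protomers of one ligand share the same molecule name, that a lower score is better, and that the (name, score) pairs have already been extracted from the scored mol2 files. The function name and example records are hypothetical.

```python
def best_protomer_per_molecule(records):
    """Keep the lowest (most favorable) score for each molecule name.

    records is an iterable of (name, score) pairs; protomers of the
    same ligand are assumed to share one name.
    Returns (name, score) pairs sorted from best to worst score.
    """
    best = {}
    for name, score in records:
        if name not in best or score < best[name]:
            best[name] = score
    return sorted(best.items(), key=lambda item: item[1])


# Made-up records: ligA has two protomers, ligB has one.
records = [("ligA", -50.2), ("ligA", -61.7), ("ligB", -55.0)]

if __name__ == "__main__":
    for name, score in best_protomer_per_molecule(records):
        print(name, score)
```

Applying the filter to actives and decoys alike keeps the comparison consistent, since both sets may contain multiple protomers.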

- This will generate a ROC curve for each system and save it as:

    ~/testset/plots/1Q4X_Enrichment.png


[Figure: ROC curve for 1Q4X ligand enrichment (1Q4X ligand enrichment DOCK6.9.png)]

There will also be a file quantifying the outcome:

    Statistics.txt
    1Q4X
    1%
    AUC is 5.149840284033384
    Actives Count is 77
    Decoys Count is 248
    10%
    AUC is 304.8390844661322
    Actives Count is 393
    Decoys Count is 2861
    100%
    AUC is 8236.886617507042
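
For comparing many systems it can be convenient to read these values programmatically. The sketch below is a hypothetical helper, not part of the provided scripts. It assumes the layout shown above: a system name on its own line, followed by cutoff lines ending in "%" and the "AUC is", "Actives Count is" and "Decoys Count is" lines that belong to each cutoff. It also assumes the leading "Statistics.txt" line above is the file name rather than file content.

```python
def parse_statistics(lines):
    """Parse Statistics.txt-style lines into {system: {cutoff: {...}}}."""
    prefixes = {
        "AUC is ": ("auc", float),
        "Actives Count is ": ("actives", int),
        "Decoys Count is ": ("decoys", int),
    }
    stats = {}
    system = None
    cutoff = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith("%"):
            # A new cutoff block for the current system.
            cutoff = line
            stats.setdefault(system, {}).setdefault(cutoff, {})
            continue
        for prefix, (key, convert) in prefixes.items():
            if line.startswith(prefix):
                block = stats.setdefault(system, {}).setdefault(cutoff, {})
                block[key] = convert(line[len(prefix):])
                break
        else:
            # Any other non-empty line is treated as a system name.
            system = line
            cutoff = None
            stats.setdefault(system, {})
    return stats


# The example values shown above (example data only).
example = """\
1Q4X
1%
AUC is 5.149840284033384
Actives Count is 77
Decoys Count is 248
10%
AUC is 304.8390844661322
Actives Count is 393
Decoys Count is 2861
100%
AUC is 8236.886617507042
""".splitlines()

if __name__ == "__main__":
    print(parse_statistics(example))
```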


SEE THE README FILE FOR ADDITIONAL DETAILS THAT MAY NOT BE COVERED HERE