Latest revision as of 11:52, 17 December 2019
2017 Denovo design tutorial 1 with PDB 1BJU
The de novo module of DOCK6 is a relatively new feature (as of Fall 2016) that constructs new ligand molecules inside a protein active site from a library of user-specified fragments, and then scores them with the chosen scoring function. These fragments are common chemical functional groups -- or building blocks -- that are typically selected from a ZINC library of millions of compounds according to how frequently they appear. Each fragment is classified as a scaffold, linker, or side chain according to the number of atomic positions that are permitted to seed growth: 3, 2, and 1, respectively. Thus, a scaffold can seed growth from three different atoms (for example, with a linker bonded at each of the three positions), a linker from two positions, and a side chain from one position.
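As a small illustration of the classification above, the mapping from the number of growth positions to the fragment class can be written as a lookup. This is only a sketch: classify_fragment is a hypothetical helper made up for this tutorial, not something DOCK6 provides.

```shell
#!/bin/bash
# classify_fragment N
# Hypothetical helper (not part of DOCK6): prints the fragment class for a
# fragment with N positions that are allowed to seed growth.
classify_fragment() {
    case "$1" in
        1) echo "sidechain" ;;
        2) echo "linker" ;;
        3) echo "scaffold" ;;
        *) echo "unclassified" ;;
    esac
}

# Show the class for each number of growth positions named in the text.
for n in 1 2 3; do
    printf '%s growth position(s): %s\n' "$n" "$(classify_fragment "$n")"
done
```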
Once the molecules are built within the active site, their interactions with the protein are scored using the user-specified scoring method. This tutorial will walk through the steps needed to run a de novo calculation on the Beta Trypsin system from the 2016 DOCK tutorial. This method will utilize the multi-grid scoring function, called through the descriptor score. Ensure you have all the folders and files produced by running the 2016 tutorial; if not, they can be found on LIRed through this path: /gpfs/home/guest43/scratch/denovo/wiki_denovo/001.files/
Before running the calculation, it's worth looking through the "Things to Keep in Mind" section at the bottom for some good pieces of information.
The Files Needed For De novo
To run the de novo code with multigrid scoring you need these files:
fraglib_scaffold.mol2
fraglib_linker.mol2
fraglib_sidechain.mol2
anchor_library.mol2
fraglib_torenv.dat
selected_spheres.sph
primary_residues_multigrid.bmp / .nrg
multigrid_minimized_ligand.mol2
vdw_AMBER_parm99.defn
flex.defn
flex_drive.tbl
The fragment libraries and parameter files must be obtained prior to the de novo growth, and can be found on LIRed through the paths:
/PATH/trial_denovo/000.fraglib
/PATH/DOCK6/parameters/
Everything else is generated through this tutorial, prior to running the de novo code.
Preparing The Files
Before running de novo on Beta-Trypsin, please ensure you have gone through the DOCK 2016 tutorial and have all the resulting files. The tutorial can be accessed at http://ringo.ams.sunysb.edu/index.php/2016_DOCK_tutorial_with_Beta_Trypsin.
You should have these files in your directory:
1BJU.pdb
1BJU.lig.mol2
1BJU.rec.clean.mol2
1BJU.rec.noH.mol2
selected_spheres.sph
Additionally, you will also need these parameter files, found in the parameters directory of DOCK6:
vdw_AMBER_parm99.defn
flex.defn
flex_drive.tbl
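Before going further it can save time to confirm that these files are really in place. The following is a minimal sketch, assuming all of them have been collected in a single directory (001.files is used here because the input files in this tutorial point there); report_missing and FILES_DIR are names invented for this example.

```shell
#!/bin/bash
# report_missing DIR FILE...
# Hypothetical helper: prints every FILE that is absent (or empty) in DIR.
# Returns 0 when everything is present and 1 otherwise.
report_missing() {
    local dir=$1
    shift
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            status=1
        fi
    done
    return "$status"
}

# Files listed above; FILES_DIR is an assumed location, adjust as needed.
required="1BJU.pdb 1BJU.lig.mol2 1BJU.rec.clean.mol2 1BJU.rec.noH.mol2
selected_spheres.sph vdw_AMBER_parm99.defn flex.defn flex_drive.tbl"
report_missing "${FILES_DIR:-001.files}" $required \
    || echo "Copy the files reported above before continuing."
```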
In order to run de novo with multigrid scoring, we must first go through several steps:
1). Create a primary residue text file and a reference text file -- this selects the primary residues of interest.
2). Make a multigrid file for each specified residue -- this forms a grid for each residue specified in the previous step.
3). Minimize the ligand mol2 file using the multigrids from the previous step.
4). Rescore the ligand on the multigrid to yield a minimized ligand .mol2 file. This serves as the reference ligand for de novo calculations.
There is one script for each step, but we will only use the simple input files for DOCK6. If you are interested in using the scripts (which will take a lot of debugging), they can be found on LIRed under: /PATH/trial_denovo/run/ .
DOCK Specifying Primary Residues
Create a directory within your working directory titled 008.footprint_rescore. This is where all pertinent files from this step will go, and where we will run our calculation from.
The input file for this step should be titled 1BJU.footprint_rescore.in, and should look like (substitute your own working directory for ${WORKDIR}) :
conformer_search_type rigid
use_internal_energy no
ligand_atom_file ${WORKDIR}/001.files/1BJU.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary no
multigrid_score_secondary no
dock3.5_score_primary no
dock3.5_score_secondary no
continuous_score_primary no
continuous_score_secondary no
footprint_similarity_score_primary yes
footprint_similarity_score_secondary no
fps_use_footprint_reference_mol2 yes
fps_footprint_reference_mol2_filename ${WORKDIR}/001.files/1BJU.lig.mol2
fps_foot_compare_type d
fps_normalize_foot no
fps_foot_comp_all_residue no
fps_choose_foot_range_type threshold
fps_vdw_threshold 1
fps_es_threshold 0.5
fps_hb_threshold 0.5
fps_use_remainder yes
fps_receptor_filename ${WORKDIR}/001.files/1BJU.rec.clean.mol2
fps_vdw_att_exp 6
fps_vdw_rep_exp 12
fps_vdw_rep_rad_scale 1
fps_use_distance_dependent_dielectric yes
fps_dielectric 4.0
fps_vdw_fp_scale 1
fps_es_fp_scale 1
fps_hb_fp_scale 0
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_descriptor_score_secondary no
amber_score_secondary no
minimize_ligand no
atom_model all
vdw_defn_file ${WORKDIR}/001.files/vdw_AMBER_parm99.defn
flex_defn_file ${WORKDIR}/001.files/flex.defn
flex_drive_file ${WORKDIR}/001.files/flex_drive.tbl
ligand_outfile_prefix output
write_footprints yes
write_hbonds no
write_orientations no
num_scored_conformers 1
rank_ligands no
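If you would rather not edit every ${WORKDIR} entry by hand, the substitution can be scripted. The sketch below is one possible way to do it with sed; fill_workdir is a hypothetical helper, and it assumes your working directory path contains no '|', '&', or backslash characters, since those are special to sed here.

```shell
#!/bin/bash
# fill_workdir TEMPLATE WORKDIR
# Hypothetical helper: prints TEMPLATE with every literal ${WORKDIR}
# replaced by WORKDIR. The template file itself is left untouched.
fill_workdir() {
    # \$ keeps the shell from expanding ${WORKDIR}; sed sees it literally.
    sed "s|\${WORKDIR}|$2|g" "$1"
}

# Demonstration on a one-line template written to a temporary file.
template=$(mktemp)
printf '%s\n' 'ligand_atom_file ${WORKDIR}/001.files/1BJU.lig.mol2' > "$template"
fill_workdir "$template" "$PWD"
rm -f "$template"
```

Redirecting the output of fill_workdir to 1BJU.footprint_rescore.in would give a file that can be passed to dock6 directly.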
This calculation should be done very quickly (<10 seconds), and upon finishing you will have three output files:
1BJU.footprint_rescore.out
output_footprint_scored.txt
output_scored.mol2
Now, we must declare the primary residues in the active site and generate a grid file for each. Create a new file in the text editor named 1BJU.primary_residues.sh. Write this inside of it (copied from Brian's script *.fpsrescore.qsub.sh):
#!/bin/bash
grep -A 1 "range_union" 1BJU.footprint_rescore.out | grep -v "range_union" | grep -v "\-" | sed -e '{s/,/\n/g}' | sed -e '{s/ //g}' | sed '/^$/d' | sort -n | uniq > temp.dat
for i in `cat temp.dat`; do printf "%0*d\n" 3 $i; done > 1BJU.primary_residues.dat
for RES in `cat temp.dat`
do
grep " ${RES} " output_footprint_scored.txt | awk -v temp=${RES} '{if ($2 == temp) print $0;}' | awk '{print $1 " " $3 " " $4}' >> reference.txt
done
grep "remainder" output_footprint_scored.txt | sed -e '{s/,/ /g}' | tr -d '\n' | awk '{print $2 " " $3 " " $6}' >> reference.txt
mv reference.txt 1BJU.reference.txt
rm temp.dat
Run the script and you should have two new files:
1BJU.primary_residues.dat
1BJU.reference.txt
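The zero-padding step in the script above is what turns residue numbers such as 40 into the three-digit form (040) that later appears in the grid file names. Here is a minimal standalone sketch of that step; pad_residues is a hypothetical helper, and unlike the original one-liner it forces base 10 so that an already padded value is not misread as octal.

```shell
#!/bin/bash
# pad_residues
# Hypothetical helper: reads residue numbers (one per line) on stdin and
# prints them sorted numerically, de-duplicated, and zero-padded to 3 digits.
pad_residues() {
    sort -n -u | while read -r n; do
        [ -z "$n" ] && continue
        # 10# forces base 10, so input like 081 is not treated as octal.
        printf '%03d\n' "$((10#$n))"
    done
}

# Demonstration with a few residue numbers in arbitrary order.
printf '171\n40\n1\n40\n' | pad_residues
```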
These are our primary residues! Now we need to generate a grid for each one.
Generating the Grids
We must now generate a grid file for each residue. To do so, we will need the aid of another one of Brian's scripts: 1BJU.make_multigrids.qsub.sh. But before we can use his script, we need to generate two input files for Dock which will be called upon by the script. Create a file named 1BJU.multigrid.in inside your 007.multigrid folder with the following inside it:
compute_grids yes
grid_spacing 0.4
output_molecule yes
contact_score no
chemical_score no
energy_score yes
energy_cutoff_distance 9999
atom_model a
attractive_exponent 6
repulsive_exponent 9
distance_dielectric yes
dielectric_factor 4
bump_filter yes
bump_overlap 0.75
receptor_file temp.mol2
box_file ../001.files/1bju.box.pdb
vdw_definition_file ../001.files/vdw_AMBER_parm99.defn
chemical_definition_file ../001.files/chem.defn
score_grid_prefix temp.rec
receptor_out_file temp.rec.grid.mol2
Additionally, create a file named 1BJU.reference_multigridmin.in (this is the name the script further down passes to dock6):
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001.files/1BJU.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename ../001.files/1BJU.lig.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary yes
multigrid_score_secondary no
multigrid_score_rep_rad_scale 1
multigrid_score_vdw_scale 1
multigrid_score_es_scale 1
multigrid_score_number_of_grids 19
multigrid_score_grid_prefix0 ../007.multigrid/1BJU.resid_001
multigrid_score_grid_prefix1 ../007.multigrid/1BJU.resid_040
multigrid_score_grid_prefix2 ../007.multigrid/1BJU.resid_081
multigrid_score_grid_prefix3 ../007.multigrid/1BJU.resid_084
multigrid_score_grid_prefix4 ../007.multigrid/1BJU.resid_171
multigrid_score_grid_prefix5 ../007.multigrid/1BJU.resid_172
multigrid_score_grid_prefix6 ../007.multigrid/1BJU.resid_173
multigrid_score_grid_prefix7 ../007.multigrid/1BJU.resid_174
multigrid_score_grid_prefix8 ../007.multigrid/1BJU.resid_176
multigrid_score_grid_prefix9 ../007.multigrid/1BJU.resid_177
multigrid_score_grid_prefix10 ../007.multigrid/1BJU.resid_191
multigrid_score_grid_prefix11 ../007.multigrid/1BJU.resid_192
multigrid_score_grid_prefix12 ../007.multigrid/1BJU.resid_193
multigrid_score_grid_prefix13 ../007.multigrid/1BJU.resid_194
multigrid_score_grid_prefix14 ../007.multigrid/1BJU.resid_196
multigrid_score_grid_prefix15 ../007.multigrid/1BJU.resid_197
multigrid_score_grid_prefix16 ../007.multigrid/1BJU.resid_204
multigrid_score_grid_prefix17 ../007.multigrid/1BJU.resid_206
multigrid_score_grid_prefix18 /gpfs/home/guest43/scratch/denovo/wiki_denovo/007.multigrid/1BJU.resid_remaining
multigrid_score_fp_ref_mol no
multigrid_score_fp_ref_text yes
multigrid_score_footprint_text /gpfs/home/guest43/scratch/denovo/wiki_denovo/008.footprint_rescore/1BJU.reference.txt
multigrid_score_use_euc yes
multigrid_score_use_norm_euc no
multigrid_score_use_cor no
multigrid_vdw_euc_scale 1
multigrid_es_euc_scale 1
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
ph4_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_descriptor_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 5.0
atom_model all
vdw_defn_file ../zzz.parameters/vdw_AMBER_parm99.defn
flex_defn_file ../zzz.parameters/flex.defn
flex_drive_file ../zzz.parameters/flex_drive.tbl
ligand_outfile_prefix output
write_orientations no
num_scored_conformers 1
rank_ligands no
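Typing the numbered multigrid_score_grid_prefix lines by hand is error-prone, and for a different system the residue list will differ. As a rough sketch, the lines can be generated from 1BJU.primary_residues.dat; grid_prefix_lines is a hypothetical helper, and it assumes the grids follow the PREFIX_RESIDUE naming shown above with one extra "remaining" grid at the end.

```shell
#!/bin/bash
# grid_prefix_lines DATFILE GRID_DIR PREFIX
# Hypothetical helper: prints the multigrid_score_number_of_grids line and one
# multigrid_score_grid_prefixN line per residue in DATFILE, followed by the
# "remaining" grid, in the layout used by the input file above.
grid_prefix_lines() {
    local datfile=$1 grid_dir=$2 prefix=$3
    local i=0 res total
    # Non-empty lines in DATFILE, plus one for the remainder grid.
    total=$(( $(grep -c . "$datfile") + 1 ))
    echo "multigrid_score_number_of_grids $total"
    while read -r res || [ -n "$res" ]; do
        [ -z "$res" ] && continue
        echo "multigrid_score_grid_prefix$i $grid_dir/${prefix}_$res"
        i=$((i + 1))
    done < "$datfile"
    echo "multigrid_score_grid_prefix$i $grid_dir/${prefix}_remaining"
}

# Demonstration with a shortened residue list in a temporary file.
dat=$(mktemp)
printf '001\n040\n081\n' > "$dat"
grid_prefix_lines "$dat" ../007.multigrid 1BJU.resid
rm -f "$dat"
```

The printed lines can then be pasted into the input file in place of the hand-written block.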
Now that we have our input files, we can form the script that will call upon them to generate the grid files for each specified residue. Create a blank file named 1BJU.make_multigrids.qsub.sh in your 007.multigrid folder. Then transcribe into it:
cd /gpfs/home/guest43/scratch/denovo/trial_denovo/009.make-mg/
export PRIMARY_RES=` cat ../008.footprint_rescore/1BJU.primary_residues.dat | sed -e 's/\n/ /g' `
export DOCKHOME="/gpfs/home/guest43/local/dock.6.7_2015-02-17.denovo_paper.2016.05.04/"
python /gpfs/home/guest43/local/dock.6.7_2015-02-17.denovo_paper.2016.05.04/bin/multigrid_fp_gen.py ../001.files/1BJU.rec.clean.mol2 1BJU.resid 1BJU.multigrid.in ${PRIMARY_RES}
rm temp.mol2
rm 1BJU.resid_*.rec.grid.mol2
/gpfs/home/guest43/local/dock.6.7_2015-02-17.denovo_paper.2016.05.04/bin/dock6 -i 1BJU.reference_multigridmin.in -o 1BJU.reference_multigridmin.out
mv output_scored.mol2 1BJU.lig.multigridmin.mol2
cp 1BJU.lig.multigridmin.mol2 ../001.files/
Change the path to Dock and your primary residue file if necessary, and ensure you are using a version of Dock with the de novo code. If you get an error such as "cannot stat *.nrg / *.bmp", check that the directories in your two input files all point to the right places.
After running this script, you should have a plethora of new files. If you are running on the 1BJU system, you should have 19 grids: one for each of the 18 primary residues, and a 19th containing the grid for the rest of the residues. You will have four files for each grid: a .bmp file, a .mol2 file, a .nrg file, and a .out file. Additionally you should have two other files: 1BJU.lig.multigridmin.mol2 and 1BJU.reference_multigridmin.out.
Check your output file for any errors and to make sure everything ran to completion. Visualize your ligand in Chimera to make sure it contains atoms and looks like a real chemical structure.
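With this many grid files it is easy to overlook one that failed to generate. The following sketch checks that a .nrg and a .bmp file exist for every primary residue and for the remainder grid; missing_grids is a hypothetical helper, and the PREFIX_RESIDUE file naming is taken from the input files in this tutorial.

```shell
#!/bin/bash
# missing_grids DATFILE GRID_DIR PREFIX
# Hypothetical helper: for each residue in DATFILE, plus the "remaining" grid,
# reports any .nrg or .bmp file that is absent or empty in GRID_DIR.
# Returns 0 when all grids are present and 1 otherwise.
missing_grids() {
    local datfile=$1 grid_dir=$2 prefix=$3
    local status=0 res ext
    for res in $(cat "$datfile") remaining; do
        for ext in nrg bmp; do
            if [ ! -s "$grid_dir/${prefix}_$res.$ext" ]; then
                echo "missing: $grid_dir/${prefix}_$res.$ext"
                status=1
            fi
        done
    done
    return "$status"
}

# Demonstration in a scratch directory where one grid file is left out.
work=$(mktemp -d)
printf '001\n040\n' > "$work/res.dat"
for f in 1BJU.resid_001.nrg 1BJU.resid_001.bmp 1BJU.resid_040.nrg \
         1BJU.resid_remaining.nrg 1BJU.resid_remaining.bmp; do
    echo data > "$work/$f"
done
missing_grids "$work/res.dat" "$work" 1BJU.resid
rm -rf "$work"
```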
Minimizing Ligand on Grids
We're taking the input here straight from Brian's script for this part (run.003e.mg_rescore.sh). Create an input file named 1BJU.parents_multigridmin.in with this inside it:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file ../001.files/1BJU.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary yes
multigrid_score_secondary no
multigrid_score_rep_rad_scale 1
multigrid_score_vdw_scale 1
multigrid_score_es_scale 1
multigrid_score_number_of_grids 19
multigrid_score_grid_prefix0 ../007.multigrid/1BJU.resid_001
multigrid_score_grid_prefix1 ../007.multigrid/1BJU.resid_040
multigrid_score_grid_prefix2 ../007.multigrid/1BJU.resid_081
multigrid_score_grid_prefix3 ../007.multigrid/1BJU.resid_084
multigrid_score_grid_prefix4 ../007.multigrid/1BJU.resid_171
multigrid_score_grid_prefix5 ../007.multigrid/1BJU.resid_172
multigrid_score_grid_prefix6 ../007.multigrid/1BJU.resid_173
multigrid_score_grid_prefix7 ../007.multigrid/1BJU.resid_174
multigrid_score_grid_prefix8 ../007.multigrid/1BJU.resid_176
multigrid_score_grid_prefix9 ../007.multigrid/1BJU.resid_177
multigrid_score_grid_prefix10 ../007.multigrid/1BJU.resid_191
multigrid_score_grid_prefix11 ../007.multigrid/1BJU.resid_192
multigrid_score_grid_prefix12 ../007.multigrid/1BJU.resid_193
multigrid_score_grid_prefix13 ../007.multigrid/1BJU.resid_194
multigrid_score_grid_prefix14 ../007.multigrid/1BJU.resid_196
multigrid_score_grid_prefix15 ../007.multigrid/1BJU.resid_197
multigrid_score_grid_prefix16 ../007.multigrid/1BJU.resid_204
multigrid_score_grid_prefix17 ../007.multigrid/1BJU.resid_206
multigrid_score_grid_prefix18 ../007.multigrid/1BJU.resid_remaining
multigrid_score_fp_ref_mol no
multigrid_score_fp_ref_text yes
multigrid_score_footprint_text ../008.footprint_rescore/1BJU.reference.txt
multigrid_score_use_euc yes
multigrid_score_use_norm_euc no
multigrid_score_use_cor no
multigrid_vdw_euc_scale 1
multigrid_es_euc_scale 1
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
ph4_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_descriptor_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 5.0
atom_model all
vdw_defn_file /gpfs/home/guest43/scratch/denovo/wiki_denovo/001.files/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/home/guest43/scratch/denovo/wiki_denovo/001.files/flex.defn
flex_drive_file /gpfs/home/guest43/scratch/denovo/wiki_denovo/001.files/flex_drive.tbl
ligand_outfile_prefix output
write_orientations no
num_scored_conformers 1
rank_ligands no
After running this with dock6 you should have an output file (which should be checked for errors, as always) and a .mol2 file named output_scored.mol2. Rename this to 1BJU.parents_multigridmin.mol2, and visualize it in Chimera, to ensure you still have a realistic molecule. This is the mol2 file of the ligand minimized using the multigrid scoring. This will serve as our reference molecule for guided growth!
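A quick check that can be done before opening Chimera is to count the atoms in the minimized mol2 file; an empty or truncated file will show up immediately. This is a minimal sketch that relies only on the standard @<TRIPOS>ATOM section marker of the mol2 format; count_mol2_atoms is a hypothetical helper, and the two-atom file below is made-up demonstration data, not real ligand output.

```shell
#!/bin/bash
# count_mol2_atoms FILE
# Hypothetical helper: prints the number of atom records found in the
# @<TRIPOS>ATOM section(s) of a mol2 file (0 if there are none).
count_mol2_atoms() {
    awk '
        /^@<TRIPOS>/ { in_atoms = ($1 == "@<TRIPOS>ATOM"); next }
        in_atoms && NF > 0 { n++ }
        END { print n + 0 }
    ' "$1"
}

# Demonstration on a tiny, made-up two-atom mol2 file.
demo=$(mktemp)
cat > "$demo" <<'EOF'
@<TRIPOS>MOLECULE
demo
2 1 1
@<TRIPOS>ATOM
1 C1 0.000 0.000 0.000 C.3 1 LIG 0.0000
2 O1 1.400 0.000 0.000 O.3 1 LIG 0.0000
@<TRIPOS>BOND
1 1 2 1
EOF
echo "atoms found: $(count_mol2_atoms "$demo")"
rm -f "$demo"
```

Running count_mol2_atoms on 1BJU.parents_multigridmin.mol2 and on the original 1BJU.lig.mol2 should give the same number, since minimization moves atoms but does not add or remove them.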
Running De novo
We can now run de novo growth! Rejoice! Compared to the previous steps, this part is fairly straightforward: simply create the input file, and create a script to submit it to the cluster. We will be using a generic fragment library built from druglike molecules, provided in the parameters directory of the DOCK6 distribution.
Creating The Input File
Create a folder named 010.denovo. Then, inside this directory, create an input file with the following inside it:
conformer_search_type    denovo
dn_fraglib_scaffold_file    /PATH/trial_denovo/000.fraglib/fraglib_scaffold.mol2
dn_fraglib_linker_file    /PATH/trial_denovo/000.fraglib/fraglib_linker.mol2
dn_fraglib_sidechain_file    /PATH/trial_denovo/000.fraglib/fraglib_sidechain.mol2
dn_user_specified_anchor    yes
dn_fraglib_anchor_file    03_anchors_byfreq.mol2
dn_use_torenv_table    yes
dn_torenv_table    /PATH/trial_denovo/000.fraglib/fraglib_torenv.dat
dn_sampling_method    graph
dn_graph_max_picks    30
dn_graph_breadth    3
dn_graph_depth    2
dn_graph_temperature    100
dn_pruning_conformer_score_cutoff    100.0
dn_pruning_conformer_score_scaling_factor    2.0
dn_pruning_clustering_cutoff    100.0
dn_constraint_mol_wt    750
dn_constraint_rot_bon    15
dn_constraint_formal_charge    2.0
dn_heur_unmatched_num    1
dn_heur_matched_rmsd    2.0
dn_unique_anchors    3
dn_max_grow_layers    9
dn_max_root_size    25
dn_max_layer_size    25
dn_max_current_aps    5
dn_max_scaffolds_per_layer    1
dn_write_checkpoints    yes
dn_write_prune_dump    yes
dn_write_orients    no
dn_write_growth_trees    no
dn_output_prefix    1BJU.final
use_internal_energy    yes
internal_energy_rep_exp    12
internal_energy_cutoff    100.0
use_database_filter    no
orient_ligand    yes
automated_matching    yes
receptor_site_file    ../001.files/selected_spheres.sph
max_orientations    1000
critical_points    no
chemical_matching    no
use_ligand_spheres    no
bump_filter    no
score_molecules    yes
contact_score_primary    no
contact_score_secondary    no
grid_score_primary    no
grid_score_secondary    no
multigrid_score_primary    no
multigrid_score_secondary    no
dock3.5_score_primary    no
dock3.5_score_secondary    no
continuous_score_primary    no
continuous_score_secondary    no
footprint_similarity_score_primary    no
footprint_similarity_score_secondary    no
ph4_score_primary    no
ph4_score_secondary    no
descriptor_score_primary    yes
descriptor_score_secondary    no
descriptor_use_grid_score    no
descriptor_use_multigrid_score    yes
descriptor_use_pharmacophore_score    no
descriptor_use_tanimoto    no
descriptor_use_hungarian    no
descriptor_multigrid_score_rep_rad_scale    1.0
descriptor_multigrid_score_vdw_scale    1.0
descriptor_multigrid_score_es_scale    1.0
descriptor_multigrid_score_number_of_grids    22
descriptor_multigrid_score_grid_prefix0    ../007.multigrid/1BJU.resid_001
descriptor_multigrid_score_grid_prefix1    ../007.multigrid/1BJU.resid_040
descriptor_multigrid_score_grid_prefix2    ../007.multigrid/1BJU.resid_081
descriptor_multigrid_score_grid_prefix3    ../007.multigrid/1BJU.resid_084
descriptor_multigrid_score_grid_prefix4    ../007.multigrid/1BJU.resid_171
descriptor_multigrid_score_grid_prefix5    ../007.multigrid/1BJU.resid_172
descriptor_multigrid_score_grid_prefix6    ../007.multigrid/1BJU.resid_173
descriptor_multigrid_score_grid_prefix7    ../007.multigrid/1BJU.resid_174
descriptor_multigrid_score_grid_prefix8    ../007.multigrid/1BJU.resid_176
descriptor_multigrid_score_grid_prefix9    ../007.multigrid/1BJU.resid_177
descriptor_multigrid_score_grid_prefix10    ../007.multigrid/1BJU.resid_189
descriptor_multigrid_score_grid_prefix11    ../007.multigrid/1BJU.resid_190
descriptor_multigrid_score_grid_prefix12    ../007.multigrid/1BJU.resid_191
descriptor_multigrid_score_grid_prefix13    ../007.multigrid/1BJU.resid_192
descriptor_multigrid_score_grid_prefix14    ../007.multigrid/1BJU.resid_193
descriptor_multigrid_score_grid_prefix15    ../007.multigrid/1BJU.resid_194
descriptor_multigrid_score_grid_prefix16    ../007.multigrid/1BJU.resid_195
descriptor_multigrid_score_grid_prefix17    ../007.multigrid/1BJU.resid_196
descriptor_multigrid_score_grid_prefix18    ../007.multigrid/1BJU.resid_197
descriptor_multigrid_score_grid_prefix19    ../007.multigrid/1BJU.resid_204
descriptor_multigrid_score_grid_prefix20    ../007.multigrid/1BJU.resid_206
descriptor_multigrid_score_grid_prefix21    ../007.multigrid/1BJU.resid_remaining
descriptor_multigrid_score_fp_ref_mol    yes
descriptor_multigrid_score_footprint_ref    ../007.multigrid/1BJU.parents.multigridmin.mol2
descriptor_multigrid_score_use_euc    yes
descriptor_multigrid_score_use_norm_euc    no
descriptor_multigrid_score_use_cor    no
descriptor_multigrid_vdw_euc_scale    1.0
descriptor_multigrid_es_euc_scale    1.0
descriptor_weight_multigrid_score    1
gbsa_zou_score_secondary    no
gbsa_hawkins_score_secondary    no
SASA_descriptor_score_secondary    no
amber_score_secondary    no
minimize_ligand    yes
minimize_anchor    yes
minimize_flexible_growth    yes
use_advanced_simplex_parameters    no
simplex_max_cycles    1
simplex_score_converge    0.1
simplex_cycle_converge    1.0
simplex_trans_step    1.0
simplex_rot_step    0.1
simplex_tors_step    10.0
simplex_anchor_max_iterations    500
simplex_grow_max_iterations    500
simplex_grow_tors_premin_iterations    0
simplex_random_seed    0
simplex_restraint_min    no
atom_model    all
vdw_defn_file    ../001.files/vdw_AMBER_parm99.defn
flex_defn_file    ../001.files/flex.defn
flex_drive_file    ../001.files/flex_drive.tbl
There are a few things to note here. First, you must specify a .mol2 file for each of the scaffold, linker, and sidechain libraries. Second, you must specify your anchor library, which must be tailored prior to the calculation to include the specific anchors you would like to seed from. Finally, even though we call upon the descriptor score, we only do so to call our multigrid scoring function -- we are not using descriptor grid score. This would be the same as running with descriptor_score = no and multigrid_score = yes, but it is standard protocol to call any and all scoring functions through descriptor score, whether or not you are using its other terms.
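Because the multigrid block lists a grid count followed by one prefix line per grid, it is easy for the two to drift apart when residues are added or removed. The check below is a minimal sketch, assuming the input file has one parameter per line; `check_grid_count` is a hypothetical helper, not a DOCK utility.

```shell
#!/bin/bash
# Compare the declared number of grids with the number of grid_prefix lines.
# Works for both the multigrid_score_* and descriptor_multigrid_score_* keys.
# Prints "declared=N found=M" and returns non-zero when they differ.
check_grid_count() {
    awk '
        $1 ~ /multigrid_score_number_of_grids$/   { declared = $2 }
        $1 ~ /multigrid_score_grid_prefix[0-9]+$/ { found++ }
        END {
            printf "declared=%d found=%d\n", declared, found
            exit (declared == found ? 0 : 1)
        }
    ' "$1"
}

# Example call (input file name taken from the submission script):
#   check_grid_count 1BJU.denovo_mg.in
```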
Creating a Script to Submit De novo to LIRED
Now we want to generate a script that will call DOCK to run the input file we just generated. Why do we need a script instead of running DOCK directly? De novo calculations take a good amount of time and can get very computationally expensive, so we will want to submit the job to a cluster using qsub. Generate a script with the following inside it:
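#!/bin/bash
#PBS -l walltime=48:00:00
#PBS -l nodes=1:ppn=24
#PBS -q long
#PBS -N 1BJU.denovo
#PBS -V

# PBS starts jobs in the home directory by default; move to the directory
# the job was submitted from so the relative paths in the input file resolve.
cd "$PBS_O_WORKDIR"

/gpfs/home/guest43/local/dock.6.7_2015-02-17.denovo_paper.2016.05.04/bin/dock6 -i 1BJU.denovo_mg.in -o 1BJU.denovo_mg.out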
Submit this to the queue by typing:
qsub <script_name>
Run this calculation from your 010.denovo directory, and monitor the output file (1BJU.denovo_mg.out) to see which anchor and layer the calculation has reached.
Viewing Your Results
After the calculation has (successfully) finished, you should have a large number of new files in your directory. These files take the form 1BJU.final_anchor_*.prune_dump_layer_*.mol2 and 1BJU.final_anchor_*.root_layer_*.mol2. For each anchor, you will have a number of both of these files equal to the number of molecules per layer you specified in the input file. Additionally, you will have an output file (which, of course, should be checked for errors), and a file named 1BJU.final.denovo_build.mol2 -- this is your final output file containing all of the constructed and scored molecules. We are going to open this in Chimera using ViewDock.
First, in Chimera, open your 1BJU.parents.multigridmin.mol2 file, then on top of that open the cleaned receptor file. Then click Tools > Surface/Binding Analysis > ViewDock and open the 1BJU.final.denovo_build.mol2 file. This file can have upwards of a thousand different molecules in it, depending on how many anchors and layers you used, and can take a little while to open. Once you select the file in ViewDock and click open, most likely Chimera will freeze, and you won't be able to do anything. It must load all the molecules at once, so give it a good five or ten minutes to load before you decide to quit the program. It will open, you just have to be patient.
Once it has loaded, you can sort your molecules by their descriptor (multigrid) score and view them superimposed on the reference ligand in the active site.
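If Chimera is slow to load the full file, the best-scoring molecules can also be listed from the command line first. This is a minimal sketch, assuming each molecule in the build file carries DOCK-style `##########` header lines with a `Name:` field followed by a score field; the default field name `Descriptor_Score:` is an assumption, so check the headers in your own file and pass the right name. `best_scores` is a hypothetical helper.

```shell
#!/bin/bash
# List the N most favorable (most negative) scores with their molecule names.
# Arguments: mol2 file, header field name (assumed default), number of rows.
best_scores() {
    local file="$1" field="${2:-Descriptor_Score:}" n="${3:-10}"
    awk -v f="$field" '
        $1 == "##########" && $2 == "Name:" { name = $3 }
        $1 == "##########" && $2 == f       { print $3, name }
    ' "$file" | LC_ALL=C sort -n | head -n "$n"
}

# Example call (file name taken from the tutorial text):
#   best_scores 1BJU.final.denovo_build.mol2 Descriptor_Score: 20
```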
Things to Keep in Mind
When running de novo for the first time, it is strongly encouraged that you run it through interactive mode first: that is, generate an empty input file, and run the code inputting the parameters manually for each question. This will give you a good idea of what it wants, what it's doing, and where any potential errors you may come across are originating from.
The de novo code takes anywhere from 4-8 hours per anchor for 15 molecules per layer depending on a myriad of factors: the anchor being used, the specific system, the number of grids, the scoring function, etc.
If you submit an anchor library containing more anchors than you will use (e.g., the library has 100 anchors and you are only using five), the de novo code will automatically pick the largest anchors! Thus, if you do not specify your anchors, you will notice a disturbing number of large ring structures when you review your results. To get around this, be sure to use an anchor library which you have personally compiled, and be aware of the order in which the calculation will run (it chooses the anchor with the largest molecular weight first).
It has been determined that the de novo code is sequence independent, meaning that the results do not depend on the order in which the anchors are processed. For example, if your anchor library file contains anchors A, B, and C, you will receive the same results (molecules, conformations, and scores) as if you had run the calculation for A, B, and C individually, each in its own anchor file.
For multigrid scoring, you do not need to specify a dummy atom, or use the corresponding dummy_H parameter file. For other types of scoring functions you will have to specify in your anchor files which atoms are the dummy atoms.
DOCK can be finicky about paths. Sometimes it doesn't want full paths (i.e., originating from the top directory, /gpfs), but other times it wants the explicit path in its entirety. If you keep receiving an error about a file location, and you are positive you have entered the correct path, try either shortening the path as much as possible (starting from your home directory, ~/) or using the full path if you have not already.
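Before experimenting with path styles, it helps to rule out paths that simply do not exist. The following is a minimal sketch, assuming one parameter per line and that every key ending in `_file` names a file relative to the directory you run from; `check_input_files` is a hypothetical helper. It does not expand `~`, and it does not check grid prefixes, which are name stems rather than files.

```shell
#!/bin/bash
# Report every *_file parameter in a DOCK input file whose path does not exist.
# Returns non-zero if any path is missing.
check_input_files() {
    local status=0 key value
    # The "|| [ -n ... ]" clause keeps a final line without a trailing newline.
    while read -r key value _ || [ -n "$key" ]; do
        case "$key" in
            *_file)
                if [ ! -e "$value" ]; then
                    echo "missing: $key -> $value" >&2
                    status=1
                fi
                ;;
        esac
    done < "$1"
    return "$status"
}

# Example call, run from the 010.denovo directory:
#   check_input_files 1BJU.denovo_mg.in
```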
If DOCK does not accept "denovo" as a conformer_search_type, then you are not running a version that contains the de novo code.
We have run the de novo code here linearly, that is, with each anchor run to completion (through all of its layers) before the next one begins. For larger calculations (with more than 5-10 anchors) you may want to consider running the anchors in parallel, essentially making each anchor its own job, and then collating the data. This will finish much more quickly, but will require additional post-processing of the data. Somewhere around fifteen anchors per run is ideal.
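One way to set up one job per anchor is to split the anchor library into single-anchor files and derive an input file for each. This is a minimal sketch under several assumptions: the library is a multi-mol2 file with no comment lines between records (any such lines would end up at the tail of the preceding anchor's file), the input file has one parameter per line, and paths contain no `|` or `&` characters. Both function names and the `anchor_NNN.mol2` naming are hypothetical.

```shell
#!/bin/bash
# Split a multi-mol2 anchor library into one file per anchor.
# Arguments: library file, output directory.
split_anchors() {
    local lib="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^@<TRIPOS>MOLECULE/ {
            if (out) close(out)
            n++
            out = sprintf("%s/anchor_%03d.mol2", dir, n)
        }
        out { print > out }
    ' "$lib"
}

# Print a copy of a template input file that points at a single anchor file,
# uses its own output prefix, and requests one anchor.
# Arguments: template input, anchor file, output prefix.
make_anchor_input() {
    local template="$1" anchor="$2" prefix="$3"
    sed -e "s|^\(dn_fraglib_anchor_file[[:space:]][[:space:]]*\).*|\1$anchor|" \
        -e "s|^\(dn_output_prefix[[:space:]][[:space:]]*\).*|\1$prefix|" \
        -e "s|^\(dn_unique_anchors[[:space:]][[:space:]]*\).*|\11|" \
        "$template"
}

# Example calls (file names follow the tutorial; the anchors/ directory is illustrative):
#   split_anchors 03_anchors_byfreq.mol2 anchors
#   make_anchor_input 1BJU.denovo_mg.in anchors/anchor_001.mol2 1BJU.anchor_001 > anchor_001.in
```

Each generated input can then be submitted with its own copy of the submission script; giving every job a distinct output prefix keeps the per-anchor results from overwriting each other.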
An enormous amount of credit goes to Brian for generating the scripts and general protocol for running de novo DOCK. I could not have figured it out without his groundwork. Thanks Brian!
Length of Denovo Growth
The de novo code can take a large amount of time, especially as the number of anchors and layers is increased. To give an idea of how long the de novo calculations take, below are some details from different runs on Beta-Trypsin (the .out file from the de novo code has the total calculation time in seconds at the bottom):
1) 3 anchors with 9 molecules per layer: ~16.7 hours (~5.6 hours per anchor)
2) 15 anchors with 25 molecules per layer: ~107.7 hours (~7 hours per anchor)
3) 1 anchor with 25 molecules per layer: 8.1 hours
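The per-anchor figures above are just the total run time divided by the number of anchors. As a minimal sketch, the conversion from the seconds reported at the bottom of the .out file looks like this; `hours_per_anchor` is a hypothetical helper, and the 60120 seconds in the example is back-calculated from the 16.7 hours quoted above rather than read from an actual output file.

```shell
#!/bin/bash
# Convert a total run time in seconds into hours per anchor (one decimal place).
# Arguments: total seconds, number of anchors.
hours_per_anchor() {
    LC_ALL=C awk -v s="$1" -v a="$2" 'BEGIN { printf "%.1f\n", s / 3600 / a }'
}

# Example: 60120 s (16.7 h) over 3 anchors
#   hours_per_anchor 60120 3    # prints 5.6
```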