From Rizzo_Lab
==2017 De novo design tutorial 2 with PDB 4QMZ==
  
The ''de novo'' module of DOCK6, a relatively new feature as of Fall 2016, constructs new ligand molecules inside a protein active site from a library of user-specified fragments. The fragments are common chemical functional groups (building blocks), typically selected from a ZINC library of millions of compounds based on how frequently they appear. Each fragment is classified as a scaffold, linker, or side chain according to the number of atomic positions (attachment points) from which growth can be seeded: 3, 2, and 1, respectively. Thus, a scaffold can seed growth from three different atoms, for example with a linker bonded at each of its three positions, while a linker seeds growth from two positions and a side chain from one.
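To make the attachment-point idea concrete, here is a minimal bash/awk sketch that counts attachment points per fragment in a multi-molecule mol2 file. It assumes attachment points are stored as dummy atoms of SYBYL type "Du" (the convention this tutorial uses for the anchor file) and maps 1, 2, and 3 or more attachment points to side chain, linker, and scaffold. The function name classify_fragments is hypothetical and not part of DOCK6.

```shell
#!/bin/bash
# Hypothetical helper (not part of DOCK6): count attachment points per fragment.
# Assumption: attachment points are dummy atoms whose SYBYL type (column 6 of
# the @<TRIPOS>ATOM section) is "Du".
classify_fragments() {
  awk '
    function report() {
      if (name == "") return
      if (ndu == 1)      kind = "sidechain"
      else if (ndu == 2) kind = "linker"
      else if (ndu >= 3) kind = "scaffold"
      else               kind = "no_attachment_points"
      print name, ndu, kind
    }
    # A new molecule starts: report the previous one, then read the name line
    /^@<TRIPOS>MOLECULE/ { report(); ndu = 0; getline name; section = "header"; next }
    # Track which record section we are currently in
    /^@<TRIPOS>/ { section = ($0 ~ /^@<TRIPOS>ATOM/) ? "atom" : "other"; next }
    # Count dummy atoms only inside the ATOM section
    section == "atom" && NF >= 6 && $6 == "Du" { ndu++ }
    END { report() }
  ' "$1"
}
```

Each output line holds the fragment name, its number of "Du" atoms, and the resulting class. Under the stated assumption, running classify_fragments on fraglib_linker.mol2 should label every fragment as a linker.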
Once the molecules are built within the active site, their interactions with the protein are scored with the user-specified DOCK6 scoring method.

This tutorial walks through the steps needed to run ''de novo'' growth on the 4QMZ system from the 2017 DOCK6 tutorial, building a novel ensemble of chemically feasible molecules. It uses the multigrid scoring function (MGS), called through the descriptor score. Make sure you have all the folders and files from the 2017 tutorial; users are encouraged to complete the traditional DOCK6 tutorial for the 4QMZ system first, since many of its files are reused in the ''de novo'' experiments.
 
Before running the calculation, it is worth reading the "Things to Keep in Mind" section at the bottom of this page for some useful tips.

==Additional Files Needed==

To run the ''de novo'' code with multigrid scoring you need these files:

    fraglib_scaffold.mol2                                                  <-- [Fragment Library Generation]
    fraglib_linker.mol2                                                    <-- [Fragment Library Generation]
    fraglib_sidechain.mol2                                                 <-- [Fragment Library Generation]
    anchor_library.mol2                                                    <-- User-defined anchor mol2 file with all attachment points written as "Du"
    fraglib_torenv.dat                                                     <-- [Fragment Library Generation]
    selected_spheres.sph                                                   <-- Generated through sphgen
    primary_residues_multigrid.bmp / .nrg                                  <-- Generated through DOCK6 for each of the primary residues
    multigrid_minimized_ligand.mol2                                        <-- Generated by docking and minimizing the reference molecule
    vdw_AMBER_parm99.defn                                                  <-- Located in the parameters directory of DOCK6
    flex.defn                                                              <-- Located in the parameters directory of DOCK6
    flex_drive.tbl                                                         <-- Located in the parameters directory of DOCK6
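As a quick pre-flight check, the bash sketch below reports which of the listed files are missing or empty. The function name check_required_files is hypothetical (not a DOCK6 tool), and you pass it the paths of your own copies, since your files may live in different directories.

```shell
#!/bin/bash
# Hypothetical helper: report required input files that are missing or empty.
# Pass the paths of your own copies as arguments.
check_required_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing file(s) missing"
  # Succeed only when every file was found
  [ "$missing" -eq 0 ]
}
```

For example: check_required_files fraglib_scaffold.mol2 fraglib_linker.mol2 fraglib_sidechain.mol2 fraglib_torenv.dat selected_spheres.sph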

The fragment libraries must be generated ([Fragment Library Generation]) or obtained prior to the ''de novo'' calculation:

  /PATH/denovo/trial_denovo/000.fraglib

Everything else, apart from the DOCK6 parameter files, is generated in this tutorial prior to running the ''de novo'' code.

==Preparing The Files==

Before running ''de novo'' on 4QMZ, make sure you have completed the 2017 DOCK6 tutorial and have all of its resulting files. The tutorial can be accessed here.

You should have these files in your directory:

    4qmz.pdb
    4qmz.lig.mol2
    4qmz.rec.clean.mol2
    4qmz.rec.noH.mol2
    selected_spheres.sph

You will also need these parameter files, found in the parameters directory of DOCK6:

    vdw_AMBER_parm99.defn
    flex.defn
    flex_drive.tbl

To run ''de novo'' with multigrid scoring, we must first go through several steps:

1). Create a primary residue text file and a reference text file: determine the primary residues of interest and score their interactions with the reference ligand.

2). Make a multigrid file for each specified residue: this forms a grid for each residue identified in the previous step.

3). Minimize the ligand mol2 file using the multigrids from the previous step (the ligand does not have to be minimized with multigrid scoring; single-grid minimization would suffice).

4). Rescore the ligand on the multigrids to yield a minimized ligand .mol2 file. This serves as the reference ligand for ''de novo'' growth.

Each step can be driven by its own script, but here we mostly work directly with simple DOCK6 input files.

==DOCK Specifying Primary Residues==

Create a directory within your working directory titled 09.footprint_rescore (the name used by the paths later in this tutorial). All pertinent files from this step go here, and this is where we run the calculation from.

The input file for this step should be titled 4qmz.footprint_rescore.in and should look like this (substitute your own directory path for /PATH):

  conformer_search_type                                        rigid
  use_internal_energy                                          no
  ligand_atom_file                                            /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                          no
  calculate_rmsd                                              no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                          no
  grid_score_secondary                                        no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                    no
  continuous_score_secondary                                  no
  footprint_similarity_score_primary                          yes
  footprint_similarity_score_secondary                        no
  fps_score_use_footprint_reference_mol2                      yes
  fps_score_footprint_reference_mol2_filename                  /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  fps_score_foot_compare_type                                  Euclidean
  fps_score_normalize_foot                                    no
  fps_score_foot_comp_all_residue                              no
  fps_score_choose_foot_range_type                            threshold
  fps_score_vdw_threshold                                      1
  fps_score_es_threshold                                      0.5
  fps_score_hb_threshold                                      0.5
  fps_score_use_remainder                                      yes
  fps_score_receptor_filename                                  /PATH/dock_tutorial/01.dockprep/4qmz.rec.mol2
  fps_score_vdw_att_exp                                        6
  fps_score_vdw_rep_exp                                        12
  fps_score_vdw_rep_rad_scale                                  1
  fps_score_use_distance_dependent_dielectric                  yes
  fps_score_dielectric                                        4.0
  fps_score_vdw_fp_scale                                      1
  fps_score_es_fp_scale                                        1
  fps_score_hb_fp_scale                                        0
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                  no
  gbsa_zou_score_secondary                                    no
  gbsa_hawkins_score_secondary                                no
  SASA_score_secondary                                        no
  amber_score_secondary                                        no
  minimize_ligand                                              no
  atom_model                                                  all
  vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                              /PATH/DOCK6/parameters/flex.defn
  flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_footprints                                            yes
  write_hbonds                                                no
  write_orientations                                          no
  num_scored_conformers                                        1
  rank_ligands                                                no

Run this input with DOCK6 (for example, dock6 -i 4qmz.footprint_rescore.in -o 4qmz.footprint_rescore.out). The calculation should be very quick (~10 seconds) and produce three output files:

  4qmz.footprint_rescore.out
  output_footprint_scored.txt
  output_scored.mol2

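The run and the check for the three expected outputs can be bundled into a small bash function. This is a sketch: the DOCK6_BIN variable and the function name run_footprint_rescore are assumptions introduced here, and the function only verifies that the files exist and are non-empty, not that the footprint values are sensible.

```shell
#!/bin/bash
# Sketch: run the footprint rescoring step, then confirm the expected outputs.
# DOCK6_BIN is an assumed variable; set it to your dock6 executable.
run_footprint_rescore() {
  local dock_bin="${DOCK6_BIN:-dock6}"
  "$dock_bin" -i 4qmz.footprint_rescore.in -o 4qmz.footprint_rescore.out || return 1
  local f missing=0
  for f in 4qmz.footprint_rescore.out output_footprint_scored.txt output_scored.mol2; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  # Non-zero when any of the three files is absent or empty
  return "$missing"
}
```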
Next, we extract the primary residues of the active site, for which grids will be generated later. Create a new file named 4qmz.primary_residues.sh in a text editor and write the following inside it (based on Brian's script *.fpsrescore.qsub.sh):

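  #!/bin/bash
  # Extract the primary residues from the footprint rescoring output and
  # build the reference footprint text file used by multigrid scoring.
  # Start clean so that reruns do not append duplicate lines
  rm -f reference.txt
  # Residue lists follow each "range_union" line: keep those lines, drop grep's
  # "--" group separators, split on commas, strip spaces, and sort uniquely
  grep -A 1 "range_union" 4qmz.footprint_rescore.out |
  grep -v "range_union" |
  grep -v "\-" |
  sed -e '{s/,/\n/g}' |
  sed -e '{s/ //g}' |
  sed '/^$/d' |
  sort -n |
  uniq > temp.dat
  # Zero-pad residue numbers to three digits (e.g. 17 -> 017)
  for i in `cat temp.dat`; do printf "%0*d\n" 3 $i; done > 4qmz.primary_residues.dat
  # Copy columns 1, 3 and 4 of each primary residue's footprint line into the reference
  for RES in `cat temp.dat`
  do
          grep " ${RES} " output_footprint_scored.txt  |
          awk -v temp=${RES} '{if ($2 == temp) print $0;}' |
          awk '{print $1 "  " $3 "  " $4}' >> reference.txt
  done
  # Append the "remainder" line, which lumps together all non-primary residues
  grep "remainder" output_footprint_scored.txt |
  sed -e '{s/,/  /g}' |
  tr -d '\n' |
  awk '{print $2 "  " $3 "  " $6}' >> reference.txt
  mv reference.txt 4qmz.reference.txt
  rm temp.dat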
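
Run the script (e.g. bash 4qmz.primary_residues.sh) from the footprint rescoring directory, and you should have two new files:

  4qmz.primary_residues.dat
  4qmz.reference.txt

4qmz.primary_residues.dat lists our primary residues, and 4qmz.reference.txt holds their reference footprint values. Now we need to generate a grid for each primary residue.
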
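A rough sanity check on these two files is sketched below. It assumes, based on how the script appends to reference.txt, that 4qmz.reference.txt ends up with one line per primary residue plus one remainder line, and that residue numbers are below 1000 so zero-padding always yields three digits. check_primary_outputs is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical sanity check for the outputs of 4qmz.primary_residues.sh.
# Assumption: reference file = one line per primary residue + one remainder line.
check_primary_outputs() {
  local resfile="${1:-4qmz.primary_residues.dat}"
  local reffile="${2:-4qmz.reference.txt}"
  local nres nref
  nres=$(grep -c . "$resfile")   # non-empty lines = primary residues
  nref=$(grep -c . "$reffile")
  echo "primary residues: $nres"
  if [ "$nref" -ne $((nres + 1)) ]; then
    echo "expected $((nres + 1)) reference lines, found $nref" >&2
    return 1
  fi
  # Every residue ID should be a three-digit, zero-padded number
  if grep -vqE '^[0-9]{3}$' "$resfile"; then
    echo "unexpected residue ID format in $resfile" >&2
    return 1
  fi
}
```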
==Generating the Grids==

We must now generate a grid file for each primary residue. The script that does this calls two input files, which we create first. Create a file named 4qmz.multigrid.in inside your 007.multigrid folder with the following contents:

  compute_grids                  yes
  grid_spacing                  0.4
  output_molecule                yes
  contact_score                  no
  chemical_score                no
  energy_score                  yes
  energy_cutoff_distance        9999
  atom_model                    a
  attractive_exponent            6
  repulsive_exponent            9
  distance_dielectric            yes
  dielectric_factor              4
  bump_filter                    yes
  bump_overlap                  0.75
  receptor_file                  temp.mol2
  box_file                      ../03.box-grid/4qmz.box.pdb
  vdw_definition_file            /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  chemical_definition_file      /PATH/DOCK6/parameters/chem.defn
  score_grid_prefix              temp.rec
  receptor_out_file              temp.rec.grid.mol2

Additionally, create a file named 4qmz.reference_multigridmin.in (the name the script below expects):

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                      100.0
  ligand_atom_file                                            /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                          no
  calculate_rmsd                                              yes
  use_rmsd_reference_mol                                      yes
  rmsd_reference_filename                                      /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                          no
  grid_score_secondary                                        no
  multigrid_score_primary                                      yes
  multigrid_score_secondary                                    no
  multigrid_score_rep_rad_scale                                1
  multigrid_score_vdw_scale                                    1
  multigrid_score_es_scale                                    1
  multigrid_score_number_of_grids                              19
  multigrid_score_grid_prefix0                                ../10.multigrid/4qmz.resid_017
  multigrid_score_grid_prefix1                                ../10.multigrid/4qmz.resid_018
  multigrid_score_grid_prefix2                                ../10.multigrid/4qmz.resid_019
  multigrid_score_grid_prefix3                                ../10.multigrid/4qmz.resid_026
  multigrid_score_grid_prefix4                                ../10.multigrid/4qmz.resid_039
  multigrid_score_grid_prefix5                                ../10.multigrid/4qmz.resid_071
  multigrid_score_grid_prefix6                                ../10.multigrid/4qmz.resid_087
  multigrid_score_grid_prefix7                                ../10.multigrid/4qmz.resid_088
  multigrid_score_grid_prefix8                                ../10.multigrid/4qmz.resid_089
  multigrid_score_grid_prefix9                                ../10.multigrid/4qmz.resid_090
  multigrid_score_grid_prefix10                                ../10.multigrid/4qmz.resid_091
  multigrid_score_grid_prefix11                                ../10.multigrid/4qmz.resid_093
  multigrid_score_grid_prefix12                                ../10.multigrid/4qmz.resid_097
  multigrid_score_grid_prefix13                                ../10.multigrid/4qmz.resid_100
  multigrid_score_grid_prefix14                                ../10.multigrid/4qmz.resid_139
  multigrid_score_grid_prefix15                                ../10.multigrid/4qmz.resid_279
  multigrid_score_grid_prefix16                                ../10.multigrid/4qmz.resid_280
  multigrid_score_grid_prefix17                                ../10.multigrid/4qmz.resid_283
  multigrid_score_grid_prefix18                                /PATH/dock_tutorial/10.multigrid/4qmz.resid_remaining
  multigrid_score_fp_ref_mol                                  no
  multigrid_score_fp_ref_text                                  yes
  multigrid_score_footprint_text                              /PATH/dock_tutorial/09.footprint_rescore/4qmz.reference.txt
  multigrid_score_use_euc                                      yes
  multigrid_score_use_norm_euc                                no
  multigrid_score_use_cor                                      no
  multigrid_vdw_euc_scale                                      1
  multigrid_es_euc_scale                                      1
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                  no
  footprint_similarity_score_secondary                        no
  ph4_score_secondary                                          no
  descriptor_score_secondary                                  no
  gbsa_zou_score_secondary                                    no
  gbsa_hawkins_score_secondary                                no
  SASA_descriptor_score_secondary                              no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                      1000
  simplex_tors_premin_iterations                              0
  simplex_max_cycles                                          1
  simplex_score_converge                                      0.1
  simplex_cycle_converge                                      1.0
  simplex_trans_step                                          1.0
  simplex_rot_step                                            0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        yes
  simplex_coefficient_restraint                                5.0
  atom_model                                                  all
  vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                              /PATH/DOCK6/parameters/flex.defn
  flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_orientations                                          no
  num_scored_conformers                                        1
  rank_ligands                                                no

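Typing the 19 grid-prefix lines by hand is error-prone. The bash sketch below generates them from 4qmz.primary_residues.dat. It assumes the naming used above (PREFIX_RESID for each primary residue plus PREFIX_remaining), and print_multigrid_prefixes is a hypothetical helper, not part of DOCK6; paste its output into the input file and adjust any directory prefix yourself.

```shell
#!/bin/bash
# Hypothetical helper (not part of DOCK6): print the multigrid grid-prefix
# lines for a DOCK6 input file from a zero-padded primary residue list.
# Assumes one grid per residue plus a final "<prefix>_remaining" grid.
print_multigrid_prefixes() {
  local resfile="$1" prefix="$2"
  local n=0 res
  local -a lines=()
  while read -r res; do
    [ -z "$res" ] && continue
    lines+=("$(printf '%-60s %s_%s' "multigrid_score_grid_prefix${n}" "$prefix" "$res")")
    n=$((n + 1))
  done < "$resfile"
  # The last grid covers all remaining (non-primary) residues
  lines+=("$(printf '%-60s %s_remaining' "multigrid_score_grid_prefix${n}" "$prefix")")
  printf '%-60s %d\n' "multigrid_score_number_of_grids" "$((n + 1))"
  printf '%s\n' "${lines[@]}"
}
```

For example, print_multigrid_prefixes 4qmz.primary_residues.dat ../10.multigrid/4qmz.resid prints the multigrid_score_number_of_grids line followed by one multigrid_score_grid_prefixN line per grid.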
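Now that we have our input files, we can write the script that calls them to generate a grid file for each primary residue and then minimize the reference ligand on those grids. Create a blank file named 4qmz.make_multigrids.qsub.sh in your 007.multigrid folder and copy the following into it. Note that the script first changes into 09.footprint_rescore and reads 4qmz.rec.mol2, 4qmz.primary_residues.dat, and both input files from there, so adjust the paths if your files are elsewhere:
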
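  #!/bin/bash
  # All relative paths below are resolved inside this directory
  cd /PATH/dock_tutorial/09.footprint_rescore
  # Space-separated list of the zero-padded primary residues
  export PRIMARY_RES=$(tr '\n' ' ' < 4qmz.primary_residues.dat)
  export DOCKHOME="/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/"
  # Build one grid per primary residue (PRIMARY_RES is left unquoted so each residue is a separate argument)
  python /PATH/DOCK6/bin/multigrid_fp_gen.py 4qmz.rec.mol2 4qmz.resid 4qmz.multigrid.in ${PRIMARY_RES}
  # Remove intermediate files left over from grid generation
  rm -f temp.mol2
  rm -f 4qmz.resid_*.rec.grid.mol2
  # Minimize the reference ligand on the new multigrids
  /PATH/DOCK6/bin/dock6.dn -i 4qmz.reference_multigridmin.in -o 4qmz.reference_multigridmin.out
  mv output_scored.mol2 4qmz.lig.multigridmin.mol2
  cp 4qmz.lig.multigridmin.mol2 ../10.multigrid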

Change the path to DOCK6 and to your primary residue file if necessary, and make sure you are using a version of DOCK6 that includes the ''de novo'' code. If you get an error such as "cannot stat *.nrg / *.bmp", check that the directories in your two input files point to the right places. After running this script you should have many new files. For the 4QMZ system there should be 19 grids: one for each of the 18 primary residues and a 19th covering all remaining residues. Each residue has four files: a .bmp file, a .mol2 file, a .nrg file, and a .out file. You should also have two other files: 4qmz.lig.multigridmin.mol2 and 4qmz.reference_multigridmin.out. Check the output file for errors and to make sure everything ran to completion. Then visualize your ligand in Chimera to confirm that it contains atoms and looks like a real chemical structure. You should see something like this:

[[File:4qmz multigrid min.png]]

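To confirm the grid files are all there before moving on, the sketch below checks for a non-empty .bmp and .nrg file for every primary residue plus the remaining-residue grid. It assumes grid files are named from the prefixes used in the input files (for example 4qmz.resid_017.bmp and 4qmz.resid_017.nrg); check_multigrids is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: every primary residue (plus "remaining") should have a
# non-empty .bmp and .nrg grid file named <prefix>_<resid>.<ext>.
check_multigrids() {
  local resfile="$1" prefix="$2" res ext missing=0
  for res in $(cat "$resfile") remaining; do
    for ext in bmp nrg; do
      if [ ! -s "${prefix}_${res}.${ext}" ]; then
        echo "missing: ${prefix}_${res}.${ext}" >&2
        missing=$((missing + 1))
      fi
    done
  done
  echo "missing files: $missing"
  [ "$missing" -eq 0 ]
}
```

For example, check_multigrids 4qmz.primary_residues.dat 4qmz.resid run in the directory holding the grids.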
In addition to checking that the ligand still looks reasonable, it can be worthwhile to visualize the ligand together with the primary residues, giving a distilled-down view of the active site like the ones below (ligand highlighted in green for ease of visualization):

[[File:4qmz w prim res.png]]

[[File:4qmz w prim res2.png]]

[[File:4qmz w prim res3.png]]

[[File:4qmz w prim res4.png]]

==Minimizing Ligand on the Grids==

Next, we rescore and minimize the reference ligand in multigrid space. Create an input file named 4qmz.parents_multigridmin.in containing:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                      100.0
  ligand_atom_file                                            /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                          no
  calculate_rmsd                                              no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                          no
  grid_score_secondary                                        no
  multigrid_score_primary                                      yes
  multigrid_score_secondary                                    no
  multigrid_score_rep_rad_scale                                1
  multigrid_score_vdw_scale                                    1
  multigrid_score_es_scale                                    1
  multigrid_score_number_of_grids                              19
  multigrid_score_grid_prefix0                                4qmz.resid_017
  multigrid_score_grid_prefix1                                4qmz.resid_018
  multigrid_score_grid_prefix2                                4qmz.resid_019
  multigrid_score_grid_prefix3                                4qmz.resid_026
  multigrid_score_grid_prefix4                                4qmz.resid_039
  multigrid_score_grid_prefix5                                4qmz.resid_071
  multigrid_score_grid_prefix6                                4qmz.resid_087
  multigrid_score_grid_prefix7                                4qmz.resid_088
  multigrid_score_grid_prefix8                                4qmz.resid_089
  multigrid_score_grid_prefix9                                4qmz.resid_090
  multigrid_score_grid_prefix10                                4qmz.resid_091
  multigrid_score_grid_prefix11                                4qmz.resid_093
  multigrid_score_grid_prefix12                                4qmz.resid_097
  multigrid_score_grid_prefix13                                4qmz.resid_100
  multigrid_score_grid_prefix14                                4qmz.resid_139
  multigrid_score_grid_prefix15                                4qmz.resid_279
  multigrid_score_grid_prefix16                                4qmz.resid_280
  multigrid_score_grid_prefix17                                4qmz.resid_283
  multigrid_score_grid_prefix18                                4qmz.resid_remaining
  multigrid_score_fp_ref_mol                                  no
  multigrid_score_fp_ref_text                                  yes
  multigrid_score_footprint_text                              4qmz.reference.txt
  multigrid_score_foot_compare_type                            Euclidean
  multigrid_score_normalize_foot                              no
  multigrid_score_vdw_euc_scale                                1.0
  multigrid_score_es_euc_scale                                1.0
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                  no
  footprint_similarity_score_secondary                        no
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                  no
  gbsa_zou_score_secondary                                    no
  gbsa_hawkins_score_secondary                                no
  SASA_score_secondary                                        no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                      1000
  simplex_tors_premin_iterations                              0
  simplex_max_cycles                                          1
  simplex_score_converge                                      0.1
  simplex_cycle_converge                                      1.0
  simplex_trans_step                                          1.0
  simplex_rot_step                                            0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        yes
  simplex_coefficient_restraint                                5.0
  atom_model                                                  all
  vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                              /PATH/DOCK6/parameters/flex.defn
  flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_footprints                                            no
  write_orientations                                          no
  num_scored_conformers                                        1
  rank_ligands                                                no

Run this input with DOCK6 and check the resulting output file for errors, as always. DOCK6 also writes a .mol2 file named output_scored.mol2; rename it to 4qmz.parents_multigridmin.mol2 and visualize it in Chimera to make sure you still have a realistic molecule. This is the ligand minimized with multigrid scoring, and it will serve as our reference molecule for guided growth!

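The run-and-rename step can be sketched in bash as follows. DOCK6_BIN and minimize_parent are assumptions introduced here. The function stops if DOCK6 exits with an error status or does not write output_scored.mol2; otherwise it renames the pose and checks that the mol2 file contains an atom section, which is a minimal check and not a substitute for inspecting the molecule in Chimera.

```shell
#!/bin/bash
# Sketch: minimize the reference ligand on the multigrids and keep the pose
# under the tutorial's file name. DOCK6_BIN is an assumed variable; point it at
# your de novo-enabled dock6 executable (e.g. /PATH/DOCK6/bin/dock6.dn).
minimize_parent() {
  local dock_bin="${DOCK6_BIN:-dock6}"
  "$dock_bin" -i 4qmz.parents_multigridmin.in -o 4qmz.parents_multigridmin.out || return 1
  if [ ! -s output_scored.mol2 ]; then
    echo "output_scored.mol2 was not written; check 4qmz.parents_multigridmin.out" >&2
    return 1
  fi
  mv output_scored.mol2 4qmz.parents_multigridmin.mol2
  # Minimal completeness check: the renamed file should have an ATOM section
  grep -q '@<TRIPOS>ATOM' 4qmz.parents_multigridmin.mol2
}
```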
==Running ''de novo''==

We can now run ''de novo'' growth! Rejoice! Compared to the previous steps, this part is fairly straightforward. We will use a generic fragment library built from a library of drug-like molecules provided in the parameters directory of the DOCK6 distribution. Simply create the input file and a script to submit it to the cluster.

===Creating the Input File===

Create a folder named 010.denovo. Inside this directory, create an input file with the following contents:

  conformer_search_type                                        denovo
  dn_fraglib_scaffold_file                                    /PATH/dock_tutorial/000.fraglib/fraglib_scaffold.mol2
  dn_fraglib_linker_file                                      /PATH/dock_tutorial/000.fraglib/fraglib_linker.mol2
  dn_fraglib_sidechain_file                                    /PATH/dock_tutorial/000.fraglib/fraglib_sidechain.mol2
  dn_user_specified_anchor                                    yes
  dn_fraglib_anchor_file                                      anchor1.mol2
  dn_use_torenv_table                                          yes
  dn_torenv_table                                              /PATH/dock_tutorial/000.fraglib/fraglib_torenv.dat
  dn_sampling_method                                          graph
  dn_graph_max_picks                                          30
  dn_graph_breadth                                            3
  dn_graph_depth                                              2
  dn_graph_temperature                                        100
  dn_pruning_conformer_score_cutoff                            100.0
  dn_pruning_conformer_score_scaling_factor                    2.0
  dn_pruning_clustering_cutoff                                100.0
  dn_constraint_mol_wt                                        750
  dn_constraint_rot_bon                                        15
  dn_constraint_formal_charge                                  2.0
  dn_heur_unmatched_num                                        1
  dn_heur_matched_rmsd                                        2.0
  dn_unique_anchors                                            3
  dn_max_grow_layers                                          9
  dn_max_root_size                                            25
  dn_max_layer_size                                            25
  dn_max_current_aps                                          5
  dn_max_scaffolds_per_layer                                  1
  dn_write_checkpoints                                        yes
  dn_write_prune_dump                                          yes
  dn_write_orients                                            no
  dn_write_growth_trees                                        no
  dn_output_prefix                                            4qmz.final
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                      100.0
  use_database_filter                                          no
  orient_ligand                                                yes
  automated_matching                                          yes
  receptor_site_file                                          ../02.surface-spheres/selected_spheres.sph
  max_orientations                                            1000
  critical_points                                              no
  chemical_matching                                            no
  use_ligand_spheres                                          no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                          no
  grid_score_secondary                                        no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                    no
  continuous_score_secondary                                  no
  footprint_similarity_score_primary                          no
  footprint_similarity_score_secondary                        no
  ph4_score_primary                                            no
  ph4_score_secondary                                          no
  descriptor_score_primary                                    yes
  descriptor_score_secondary                                  no
  descriptor_use_grid_score                                    no
  descriptor_use_multigrid_score                              yes
  descriptor_use_pharmacophore_score                          no
  descriptor_use_tanimoto                                      no
  descriptor_use_hungarian                                    no
  descriptor_multigrid_score_rep_rad_scale                    1.0
 +
descriptor_multigrid_score_vdw_scale                        1.0
 +
descriptor_multigrid_score_es_scale                          1.0
 +
descriptor_multigrid_score_number_of_grids                  19
 +
descriptor_multigrid_score_grid_prefix0                      ../09.footprint_rescore/4qmz.resid_017
 +
descriptor_multigrid_score_grid_prefix1                      ../09.footprint_rescore/4qmz.resid_018
 +
descriptor_multigrid_score_grid_prefix2                      ../09.footprint_rescore/4qmz.resid_019
 +
descriptor_multigrid_score_grid_prefix3                      ../09.footprint_rescore/4qmz.resid_026
 +
descriptor_multigrid_score_grid_prefix4                      ../09.footprint_rescore/4qmz.resid_039
 +
descriptor_multigrid_score_grid_prefix5                      ../09.footprint_rescore/4qmz.resid_071
 +
descriptor_multigrid_score_grid_prefix6                      ../09.footprint_rescore/4qmz.resid_087
 +
descriptor_multigrid_score_grid_prefix7                      ../09.footprint_rescore/4qmz.resid_088
 +
descriptor_multigrid_score_grid_prefix8                      ../09.footprint_rescore/4qmz.resid_089
 +
descriptor_multigrid_score_grid_prefix9                      ../09.footprint_rescore/4qmz.resid_090
 +
descriptor_multigrid_score_grid_prefix10                    ../09.footprint_rescore/4qmz.resid_091
 +
descriptor_multigrid_score_grid_prefix11                    ../09.footprint_rescore/4qmz.resid_093
 +
descriptor_multigrid_score_grid_prefix12                    ../09.footprint_rescore/4qmz.resid_097
 +
descriptor_multigrid_score_grid_prefix13                    ../09.footprint_rescore/4qmz.resid_100
 +
descriptor_multigrid_score_grid_prefix14                    ../09.footprint_rescore/4qmz.resid_139
 +
descriptor_multigrid_score_grid_prefix15                    ../09.footprint_rescore/4qmz.resid_279
 +
descriptor_multigrid_score_grid_prefix16                    ../09.footprint_rescore/4qmz.resid_280
 +
descriptor_multigrid_score_grid_prefix17                    ../09.footprint_rescore/4qmz.resid_283
 +
descriptor_multigrid_score_grid_prefix18                    ../09.footprint_rescore/4qmz.resid_remaining
 +
descriptor_multigrid_score_fp_ref_mol                        yes
 +
descriptor_multigrid_score_footprint_ref                    ../09.footprint_rescore/4qmz.parents_multigridmin.mol2
 +
descriptor_multigrid_score_use_euc                          yes
 +
descriptor_multigrid_score_use_norm_euc                      no
 +
descriptor_multigrid_score_use_cor                          no
 +
descriptor_multigrid_vdw_euc_scale                          1.0
 +
descriptor_multigrid_es_euc_scale                            1.0
 +
descriptor_weight_multigrid_score                            1
 +
gbsa_zou_score_secondary                                    no
 +
gbsa_hawkins_score_secondary                                no
 +
SASA_descriptor_score_secondary                              no
 +
amber_score_secondary                                        no
 +
minimize_ligand                                              yes
 +
minimize_anchor                                              yes
 +
minimize_flexible_growth                                    yes
 +
use_advanced_simplex_parameters                              no
 +
simplex_max_cycles                                          1
 +
simplex_score_converge                                      0.1
 +
simplex_cycle_converge                                      1.0
 +
simplex_trans_step                                          1.0
 +
simplex_rot_step                                            0.1
 +
simplex_tors_step                                            10.0
 +
simplex_anchor_max_iterations                                500
 +
simplex_grow_max_iterations                                  500
 +
simplex_grow_tors_premin_iterations                          0
 +
simplex_random_seed                                          0
 +
simplex_restraint_min                                        no
 +
atom_model                                                  all
 +
vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
 +
flex_defn_file                                              /PATH/DOCK6/parameters/flex.defn
 +
flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
 +
 +
 +
There are a few things to note here. First, you must specify a .mol2 file for each of the scaffold, linker, and sidechain libraries. Second, you must specify your anchor library, which must be tailored before the calculation so that it contains only the specific anchors you want to seed growth from. Finally, although the input calls the descriptor score, it does so only to invoke the multigrid scoring function (descriptor_use_multigrid_score = yes); the descriptor grid score is not used (descriptor_use_grid_score = no). This is equivalent to running with descriptor_score_primary = no and multigrid_score_primary = yes, but it is standard protocol to call any and all scoring functions through the descriptor score, regardless of whether you are using the descriptor score itself.
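In an input file this long, a single missing or mistyped line is easy to overlook, so a quick programmatic check can help. The sketch below is a minimal illustration, assuming each line of the input file is a parameter name followed by whitespace and its value, as in the listing above; parse_dock_input and check_multigrid_settings are hypothetical helpers, not part of DOCK6. It confirms that scoring goes through the descriptor score with multigrid enabled, and that descriptor_multigrid_score_number_of_grids matches the number of grid_prefix lines.

```python
def parse_dock_input(text):
    """Parse DOCK6-style 'name   value' lines into a dict.

    Assumption: one parameter per line, name and value separated by
    whitespace; blank lines are skipped, lines without a value map to "".
    """
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def check_multigrid_settings(params):
    """Return a list of problems with the descriptor/multigrid settings."""
    problems = []
    if params.get("descriptor_score_primary") != "yes":
        problems.append("descriptor_score_primary should be yes")
    if params.get("descriptor_use_multigrid_score") != "yes":
        problems.append("descriptor_use_multigrid_score should be yes")
    n_grids = int(params.get("descriptor_multigrid_score_number_of_grids", "0"))
    prefixes = [key for key in params
                if key.startswith("descriptor_multigrid_score_grid_prefix")]
    if len(prefixes) != n_grids:
        problems.append(f"number_of_grids is {n_grids}, "
                        f"but {len(prefixes)} grid prefixes are listed")
    return problems
```

For example, `check_multigrid_settings(parse_dock_input(open("denovo.in").read()))` returning an empty list means these particular settings are consistent; it does not validate any other parameter.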

===Creating a script to submit to Seawulf===

Now we want to generate a script that calls DOCK6 with the input file we just generated. Why write a script instead of running DOCK6 directly? ''De novo'' generic growth takes a considerable amount of time (approximately 5-10 hours per anchor) and is computationally expensive, so we submit it to the cluster with qsub. Create a script named denovo.sh containing the following:

  #!/bin/bash
  #PBS -l walltime=48:00:00
  #PBS -l nodes=1:ppn=24
  #PBS -q long
  #PBS -N 4qmz.denovo
  #PBS -V
  cd $PBS_O_WORKDIR
  /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/dock6.dn -i denovo.in -o 4qmz.denovo_mg.out

Submit this to the queue by typing:

  qsub denovo.sh

Run this calculation from your 010.denovo directory, and monitor the output file (4qmz.denovo_mg.out) to see which anchor and layer the calculation has reached.

===Viewing the Results===

After the calculation has finished (successfully), your directory should contain a large number of new files. These take the form 4qmz.final_anchor_*.prune_dump_layer_*.mol2 and 4qmz.final_anchor_*.root_layer_*.mol2; for each anchor there is one file of each kind per growth layer (up to dn_max_grow_layers). You will also have an output file (which, of course, should be checked for errors) and a file named 4qmz.final.denovo_build.mol2. This is your final output file, containing all of the constructed and scored molecules, and we will open it in Chimera using ViewDock.
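To get a quick overview of how far each anchor grew, the per-layer files can be grouped by name. This is a sketch assuming the naming pattern described above, with an anchor label that contains no dots; summarize_layers and PATTERN are hypothetical names used only for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the file names described above:
#   <prefix>_anchor_<A>.<kind>_layer_<L>.mol2   with kind = prune_dump | root
PATTERN = re.compile(
    r"^(?P<prefix>.+?)_anchor_(?P<anchor>[^.]+)\."
    r"(?P<kind>prune_dump|root)_layer_(?P<layer>\d+)\.mol2$"
)


def summarize_layers(filenames):
    """Map anchor -> file kind -> sorted list of layer numbers."""
    summary = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # e.g. 4qmz.final.denovo_build.mol2 or the .out file
        summary[match["anchor"]][match["kind"]].append(int(match["layer"]))
    # Convert the nested defaultdicts to plain dicts with sorted layers.
    return {anchor: {kind: sorted(layers) for kind, layers in kinds.items()}
            for anchor, kinds in summary.items()}
```

Usage: `import os; print(summarize_layers(os.listdir(".")))` run from your 010.denovo directory.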

First, in Chimera, open your 4qmz.parents_multigridmin.mol2 reference file, then open the cleaned receptor file on top of it. Next, click Tools > Surface/Binding Analysis > ViewDock and open the 4qmz.final.denovo_build.mol2 file. Depending on how many anchors and layers you used, this file can contain upwards of a thousand molecules, so it can take a while to open: after you select the file in ViewDock and click Open, Chimera will most likely appear frozen while it loads all of the molecules at once. Give it a good five to ten minutes before deciding to quit the program; it will open if you are patient.

Once the file has loaded, you can sort your molecules by the various parameters contained in the mol2 file. Descriptor (multigrid) score, molecular weight, and chain/fragment sequence are a few useful metrics for examining the newly created molecules in the receptor's active site.
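ViewDock does this sorting interactively; for a quick text-only ranking, a script can read the score fields directly from the mol2 file. The sketch below assumes that each molecule's properties are written as comment lines of the form `########## Label: value` before its @<TRIPOS>MOLECULE record, and that the score label is Descriptor_Score; both are assumptions to check against your own 4qmz.final.denovo_build.mol2. It also assumes lower (more negative) scores are better, as with DOCK's energy-based scores. The function names are hypothetical.

```python
def read_scores(mol2_text, field="Descriptor_Score"):
    """Return [(name, score)] from '##########  Label: value' header lines.

    Assumption: each molecule's headers (including 'Name:' and `field`)
    precede its @<TRIPOS>MOLECULE line; check the labels in your file.
    """
    results = []
    name, score = None, None
    for line in mol2_text.splitlines():
        if line.startswith("##########"):
            label, _, value = line.lstrip("#").partition(":")
            label, value = label.strip(), value.strip()
            if label == "Name":
                name = value
            elif label == field:
                score = float(value)
        elif line.startswith("@<TRIPOS>MOLECULE"):
            # Headers for this molecule are complete; record and reset.
            if name is not None and score is not None:
                results.append((name, score))
            name, score = None, None
    return results


def rank(mol2_text, field="Descriptor_Score"):
    """Sort molecules from most negative (assumed best) to least negative."""
    return sorted(read_scores(mol2_text, field), key=lambda pair: pair[1])
```

Usage: `for name, score in rank(open("4qmz.final.denovo_build.mol2").read())[:10]: print(name, score)`.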

Top descriptor-score hits along with their fragment strings, which identify families of related compounds:

[[File:Viewdock1.png]]

A related family of molecules:

[[File:Viewdock2.png]]

A related family of compounds with varying conformations in the active site:

[[File:Viewdock3.png]]

A related family with a conserved pose in the binding pocket:

[[File:Viewdock4.png]]

Top descriptor scores vs. the cognate ligand (in red):

[[File:Viewdock6.png]]

==Things to Keep in Mind==

When running ''de novo'' for the first time, you are strongly encouraged to run it in interactive mode first: generate an empty input file and run the code, entering each parameter manually as it is requested. This gives you a good idea of what the code expects, what it is doing, and where any errors you encounter originate.

The ''de novo'' code takes anywhere from 4-8 hours per anchor for 15 molecules per layer, depending on many factors: the anchor being used, the specific system, the number of grids, the scoring function, etc.

If you submit an anchor library containing more anchors than you will use (e.g., the library has 100 anchors but you are only using five), the ''de novo'' code will automatically pick the largest anchors! Thus, if you do not specify your anchors, you will notice a disturbing number of large ring structures when you review your finished structures. To avoid this, use an anchor library that you have compiled yourself, and be aware of the order in which the calculation will run (it chooses the anchor with the largest molecular weight first).
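To see which anchors a library will seed from first, you can estimate each anchor's molecular weight. This is a minimal sketch, assuming element symbols can be read from the SYBYL atom type in the sixth column of each ATOM line and using approximate standard atomic masses; DOCK6's own molecular-weight calculation may differ slightly, and dummy (Du) atoms are counted as zero mass here. The function names are hypothetical.

```python
# Approximate standard atomic masses (g/mol) for common elements.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
          "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904}


def molecular_weights(mol2_text):
    """Return [(name, mw)] for each molecule in a multi-molecule mol2 string.

    The element is taken from the SYBYL atom type (e.g. 'C.ar' -> 'C');
    'Du' and unknown types contribute 0.
    """
    weights = []
    name, mw, section, expect_name = None, 0.0, None, False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            section = line.strip()
            if section == "@<TRIPOS>MOLECULE":
                if name is not None:
                    weights.append((name, mw))  # finish the previous molecule
                name, mw, expect_name = None, 0.0, True
            continue
        if expect_name:  # the line after @<TRIPOS>MOLECULE holds the name
            name, expect_name = line.strip(), False
            continue
        fields = line.split()
        if section == "@<TRIPOS>ATOM" and len(fields) >= 6:
            mw += MASSES.get(fields[5].split(".")[0], 0.0)
    if name is not None:
        weights.append((name, mw))
    return weights


def anchor_run_order(mol2_text):
    """Anchor names from heaviest to lightest (the order described above)."""
    ranked = sorted(molecular_weights(mol2_text), key=lambda p: p[1], reverse=True)
    return [name for name, _ in ranked]
```

Usage: `print(anchor_run_order(open("anchor_library.mol2").read()))`.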

The ''de novo'' code has been found to be sequence independent, meaning that the results do not depend on the order in which anchors are calculated. For example, if your anchor library file contains anchors A, B, and C, you will obtain the same results (molecules, conformations, and scores) as if you had run A, B, and C individually, each in its own anchor file.

For multigrid scoring, you do not need to specify a dummy atom or use the corresponding dummy_H parameter file. For other scoring functions, you will have to specify in your anchor files which atoms are the dummy atoms.

DOCK can be finicky about paths. Sometimes it does not want full paths (i.e., paths starting from the top directory, /gpfs), but other times it wants the explicit path in its entirety. If you keep receiving an error about a file location and you are sure the path is correct, try either shortening the path as much as possible (starting from your home directory, ~/) or, if you have not already, using the full path.

If DOCK6 does not accept "denovo" as a conformer_search_type, then you are not running a version of DOCK6 that contains the ''de novo'' code.

===Length of ''De novo'' Calculations===

The ''de novo'' code can take a long time, especially as the number of anchors and layers increases. To give an idea of how long these calculations take, below are details from several runs on the 4qmz system (the .out file from the ''de novo'' code reports the total calculation time in seconds at the bottom):

1.) Single anchor (Methylene): ~3.92 hours

      dn_max_grow_layers                                          9
      dn_max_root_size                                            25
      dn_max_layer_size                                           25

2.) Single anchor (Carbonyl): ~6.3 hours

      dn_max_grow_layers                                          9
      dn_max_root_size                                            25
      dn_max_layer_size                                           25

3.) Single anchor (Amine): ~4.99 hours

      dn_max_grow_layers                                          9
      dn_max_root_size                                            25
      dn_max_layer_size                                           25

4.) Single anchor (Methylene): not calculated

      dn_max_grow_layers                                          9
      dn_max_root_size                                            100
      dn_max_layer_size                                           100

5.) Single anchor (Methylene): ~18.1 hours

      dn_max_grow_layers                                          9
      dn_max_root_size                                            100
      dn_max_layer_size                                           25

6.) Single anchor (Methylene): ~24.2 hours

      dn_max_grow_layers                                          9
      dn_max_root_size                                            25
      dn_max_layer_size                                           100
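Because the .out file reports the total time in seconds, a small conversion makes runs easy to compare. The sketch below converts seconds to hours and expresses the single-methylene-anchor runs listed above (runs 5 and 6) relative to the 25/25 baseline of run 1; the helper names are hypothetical and the hour values are copied from the list.

```python
def seconds_to_hours(seconds):
    """Convert the total time reported at the bottom of the .out file."""
    return seconds / 3600.0


# Wall-clock hours for the single methylene-anchor runs listed above.
BASELINE_HOURS = 3.92  # run 1: root size 25, layer size 25
RUNS = {
    "root 100, layer 25 (run 5)": 18.1,
    "root 25, layer 100 (run 6)": 24.2,
}


def slowdown_vs_baseline(runs=RUNS, baseline=BASELINE_HOURS):
    """Ratio of each run's time to the 25/25 baseline, rounded to 0.1."""
    return {label: round(hours / baseline, 1) for label, hours in runs.items()}
```

In these runs, raising dn_max_layer_size to 100 cost about 6.2 times the baseline, versus about 4.6 times for raising dn_max_root_size to 100.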

== Make Unique Script to prune de novo results==

I was able to test two scripts that filter the mol2 libraries produced by ''de novo'' runs, collecting unique results and omitting repeated molecules: zzz.002.makeunique_new_sub.sh and split_on_tanimoto_new.py. These scripts need a Tanimoto score for every molecule, so either include the Tanimoto score in the original ''de novo'' calculation or, if it was not included, rescore the ''de novo'' results with the Tanimoto scoring function afterward; either approach works without a problem.

split_on_tanimoto_new.py takes the full ''de novo'' mol2 output and compares each molecule's Tanimoto to a reference ligand. It is easiest to use the cognate ligand from your docking trials, but any reference should be appropriate. This sorts the molecules into many "bins", each containing molecules with the same Tanimoto to the reference. zzz.002.makeunique_new_sub.sh then compares the first molecule in each bin (the highest-scoring molecule) with the rest of the molecules in that bin and deletes any molecule whose Tanimoto to it is 1. This process repeats until every molecule has been tested, at which point the output should contain a single copy of each of the best-scoring molecules generated by the ''de novo'' calculation.
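The binning and deduplication logic described above can be sketched as follows. This is not the code of split_on_tanimoto_new.py or zzz.002.makeunique_new_sub.sh: molecules are hypothetical (name, score, fingerprint) tuples, fingerprints are Python sets of feature identifiers, Tanimoto is the size of the intersection divided by the size of the union, and bins use the Tanimoto to the reference rounded to three decimals (an assumption).

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def make_unique(molecules, reference, ndigits=3):
    """molecules: list of (name, score, fingerprint); lower score = better.

    Returns one copy of each molecule, keeping the best-scoring duplicate.
    """
    # 1. Bin molecules by their (rounded) Tanimoto to the reference.
    bins = defaultdict(list)
    for mol in molecules:
        bins[round(tanimoto(mol[2], reference), ndigits)].append(mol)
    unique = []
    for members in bins.values():
        # 2. Best (most negative) score first within each bin.
        remaining = sorted(members, key=lambda m: m[1])
        # 3. Keep the first molecule, drop anything with Tanimoto 1 to it, repeat.
        while remaining:
            first = remaining.pop(0)
            unique.append(first)
            remaining = [m for m in remaining if tanimoto(m[2], first[2]) < 1.0]
    return sorted(unique, key=lambda m: m[1])
```

Binning first means the pairwise comparisons only happen within a bin, which is why the described workflow splits on Tanimoto to the reference before removing duplicates.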

Latest revision as of 11:55, 17 December 2019

2017 De novo design tutorial 2 with PDB 4QMZ

The de novo module of DOCK6 is a relatively new feature (as of Fall 2016) that constructs new ligand molecules inside a protein active site from a library of user-specified fragments. The fragments are common chemical functional groups, or building blocks, typically selected from a ZINC library of millions of compounds based on how frequently they appear. They are classified as scaffolds, linkers, or side chains according to the number of atomic positions permitted to seed growth: 3, 2, and 1, respectively. Thus, a scaffold can seed growth from three different atoms (for example, with a linker bonded at each position), a linker from two, and a side chain from one. Once the molecules are built within the active site, their interactions with the protein are scored using the scoring method the user specifies in DOCK6.
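As an illustration of the attachment-point rule above, the sketch below counts dummy atoms in a fragment. It assumes fragments mark attachment points with "Du" atoms (the same convention this tutorial uses for the anchor library) and reads the SYBYL atom type from the sixth column of each ATOM line. Treating three or more attachment points as a scaffold is an assumption beyond the 3/2/1 rule stated here, and the function names are hypothetical.

```python
def count_attachment_points(mol2_block):
    """Count dummy ('Du') atoms in the ATOM section of one mol2 molecule."""
    count, in_atoms = 0, False
    for line in mol2_block.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6 and fields[5] == "Du":
            count += 1
    return count


def classify_fragment(mol2_block):
    """Classify by attachment points: 1 sidechain, 2 linker, 3+ scaffold."""
    n = count_attachment_points(mol2_block)
    if n >= 3:
        return "scaffold"
    return {2: "linker", 1: "sidechain"}.get(n, "no attachment points")
```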

This tutorial walks through the steps needed to run de novo growth on the 4QMZ system from the 2017 DOCK6 tutorial, building a novel, chemically feasible ensemble of molecules. The method uses the multigrid scoring function (MGS), called through the descriptor score. Make sure you have all the folders and files produced by the 2017 tutorial; users are encouraged to run through the traditional DOCK6 tutorial for the 4qmz system first, since many of its files are reused in the de novo experiments. Before running the calculation, it is worth reading the "Things to Keep in Mind" section at the bottom for some useful information.

Additional Files Needed

To run the de novo code with multigrid scoring you need these files:

    fraglib_scaffold.mol2                                                  <-- [Fragment Library Generation]
    fraglib_linker.mol2                                                    <-- [Fragment Library Generation]
    fraglib_sidechain.mol2                                                 <-- [Fragment Library Generation]
    anchor_library.mol2                                                    <-- User-defined anchor mol2 file with all attachment points written as "Du"
    fraglib_torenv.dat                                                     <-- [Fragment Library Generation]
    selected_spheres.sph                                                   <-- Generated through sphgen
    primary_residues_multigrid.bmp / .nrg                                  <-- Generated through DOCK6 for each of the primary residues
    multigrid_minimized_ligand.mol2                                        <-- Generated by docking and minimizing the reference molecule
    vdw_AMBER_parm99.defn                                                  <-- Located in the parameters directory of DOCK6
    flex.defn                                                              <-- Located in the parameters directory of DOCK6
    flex_drive.tbl                                                         <-- Located in the parameters directory of DOCK6
  

The fragment libraries must be generated ([Fragment Library Generation]) or obtained prior to the de novo calculation:

  /PATH/denovo/trial_denovo/000.fraglib 


Everything else is generated through this tutorial, prior to running the de novo code.


Preparing The Files

Before running de novo on 4QMZ, please ensure you have gone through the DOCK6 2017 tutorial and have all the resulting files. The tutorial can be accessed through here. You should have these files in your directory:

    4qmz.pdb
    4qmz.lig.mol2
    4qmz.rec.clean.mol2
    4qmz.rec.noH.mol2
    selected_spheres.sph

Additionally, you will also need these parameter files found in the parameters directory of DOCK6:

    vdw_AMBER_parm99.defn
    flex.defn
    flex_drive.tbl

In order to run de novo with multigrid scoring, we must first go through several steps:

1). Create a primary residue file and a reference text file: determine the primary residues of interest and score their interactions with the reference ligand.
2). Make a multigrid file for each specified residue: generate a grid for each residue identified in the previous step.
3). Minimize the ligand mol2 file using the multigrids from the previous step (the ligand does not need to be minimized on the multigrid; a single grid would suffice).
4). Rescore the ligand on the multigrid to yield a minimized ligand .mol2 file, which serves as the reference ligand for de novo growth.

There is one script for each step, but we will only use the simple input files for DOCK6.

DOCK Specifying Primary Residues

Create a directory within your working directory titled 008.footprint_rescore. All pertinent files from this step go here, and this is where we will run the calculation from. The input file for this step should be titled 4qmz.footprint_rescore.in and should look like the following (replace /PATH with your own directory path, e.g. ~/your/own/directory/01.dockprep/4qmz.lig.mol2):

  conformer_search_type                                        rigid
  use_internal_energy                                          no
  ligand_atom_file                                             /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     no
  continuous_score_secondary                                   no
  footprint_similarity_score_primary                           yes
  footprint_similarity_score_secondary                         no
  fps_score_use_footprint_reference_mol2                       yes
  fps_score_footprint_reference_mol2_filename                  /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  fps_score_foot_compare_type                                  Euclidean
  fps_score_normalize_foot                                     no
  fps_score_foot_comp_all_residue                              no
  fps_score_choose_foot_range_type                             threshold
  fps_score_vdw_threshold                                      1
  fps_score_es_threshold                                       0.5
  fps_score_hb_threshold                                       0.5
  fps_score_use_remainder                                      yes
  fps_score_receptor_filename                                  /PATH/dock_tutorial/01.dockprep/4qmz.rec.mol2
  fps_score_vdw_att_exp                                        6
  fps_score_vdw_rep_exp                                        12
  fps_score_vdw_rep_rad_scale                                  1
  fps_score_use_distance_dependent_dielectric                  yes
  fps_score_dielectric                                         4.0
  fps_score_vdw_fp_scale                                       1
  fps_score_es_fp_scale                                        1
  fps_score_hb_fp_scale                                        0
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              no
  atom_model                                                   all
  vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /PATH/DOCK6/parameters/flex.defn
  flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_footprints                                             yes
  write_hbonds                                                 no
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

This calculation should be very quick (~10 seconds) and result in three output files:

    4qmz.footprint_rescore.out
    output_footprint_scored.txt
    output_scored.mol2


Now we must identify the primary residues in the active site so that a grid file can later be generated for each. Create a new file named 4qmz.primary_residues.sh and write the following inside it (copied from Brian's script *.fpsrescore.qsub.sh):

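  #!/bin/bash 
  # Residue numbers follow each "range_union" line of the footprint rescore output.
  # Use the output file name you passed to dock6 with -o.
  grep -A 1 "range_union" 4qmz.footprint_rescore.out |
  grep -v "range_union" |
  grep -v "\-" |
  sed -e '{s/,/\n/g}' |
  sed -e '{s/ //g}' |
  sed '/^$/d' |
  sort -n |
  uniq > temp.dat
  # Zero-pad each residue number to three digits (e.g. 17 -> 017).
  for i in `cat temp.dat`; do printf "%0*d\n" 3 $i; done > 4qmz.primary_residues.dat
  # Start a fresh reference file, then add one line per primary residue.
  rm -f reference.txt
  for RES in `cat temp.dat`
  do
          grep " ${RES} " output_footprint_scored.txt  |
          awk -v temp=${RES} '{if ($2 == temp) print $0;}' |
          awk '{print $1 "  " $3 "  " $4}' >> reference.txt
  done
  # Append one line for all remaining residues.
  grep "remainder" output_footprint_scored.txt |
  sed -e '{s/,/  /g}' |
  tr -d '\n' |
  awk '{print $2 "  " $3 "  " $6}' >> reference.txt
  mv reference.txt 4qmz.reference.txt
  rm temp.dat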

Run the script and you should have two new files:

    4qmz.primary_residues.dat
    4qmz.reference.txt

These are our primary residues! Now we need to generate a grid for each one.
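The selection encoded by the threshold settings in 4qmz.footprint_rescore.in (fps_score_choose_foot_range_type threshold, with vdW and ES thresholds of 1 and 0.5) can be sketched in a few lines. This is an illustration, not DOCK6's implementation: it assumes a residue is selected when the magnitude of its vdW or ES interaction reaches the corresponding threshold, keeping the union of both sets, and it works on a hypothetical in-memory dictionary rather than parsing output_footprint_scored.txt. The function names are hypothetical.

```python
def select_primary_residues(footprint, vdw_threshold=1.0, es_threshold=0.5):
    """footprint: dict residue_number -> (vdw, es) interaction energies.

    Assumption: a residue is kept if |vdw| >= vdw_threshold or
    |es| >= es_threshold (the union of the two ranges).
    """
    selected = [res for res, (vdw, es) in footprint.items()
                if abs(vdw) >= vdw_threshold or abs(es) >= es_threshold]
    return sorted(selected)


def format_residue_ids(residues):
    """Zero-pad to three digits, like printf "%0*d" 3 in the script above."""
    return [f"{res:03d}" for res in residues]
```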


Generating the Grids

We must now generate a grid file for each residue. This requires two DOCK6 input files, which the script will call. Create a file named 4qmz.multigrid.in inside your 007.multigrid folder with the following contents:

  compute_grids                  yes
  grid_spacing                   0.4
  output_molecule                yes
  contact_score                  no
  chemical_score                 no
  energy_score                   yes
  energy_cutoff_distance         9999
  atom_model                     a
  attractive_exponent            6
  repulsive_exponent             9
  distance_dielectric            yes
  dielectric_factor              4
  bump_filter                    yes
  bump_overlap                   0.75
  receptor_file                  temp.mol2
  box_file                       ../03.box-grid/4qmz.box.pdb
  vdw_definition_file            /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  chemical_definition_file       /PATH/DOCK6/parameters/chem.defn
  score_grid_prefix              temp.rec
  receptor_out_file              temp.rec.grid.mol2

Additionally, create a file named 4qmz.reference_multigrid.in:

  conformer_search_type                                        rigid
  use_internal_energy                                          yes
  internal_energy_rep_exp                                      12
  internal_energy_cutoff                                       100.0
  ligand_atom_file                                             /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               yes
  use_rmsd_reference_mol                                       yes
  rmsd_reference_filename                                      /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      yes
  multigrid_score_secondary                                    no
  multigrid_score_rep_rad_scale                                1
  multigrid_score_vdw_scale                                    1
  multigrid_score_es_scale                                     1
  multigrid_score_number_of_grids                              19
  multigrid_score_grid_prefix0                                 ../10.multigrid/4qmz.resid_017
  multigrid_score_grid_prefix1                                 ../10.multigrid/4qmz.resid_018
  multigrid_score_grid_prefix2                                 ../10.multigrid/4qmz.resid_019
  multigrid_score_grid_prefix3                                 ../10.multigrid/4qmz.resid_026
  multigrid_score_grid_prefix4                                 ../10.multigrid/4qmz.resid_039
  multigrid_score_grid_prefix5                                 ../10.multigrid/4qmz.resid_071
  multigrid_score_grid_prefix6                                 ../10.multigrid/4qmz.resid_087
  multigrid_score_grid_prefix7                                 ../10.multigrid/4qmz.resid_088
  multigrid_score_grid_prefix8                                 ../10.multigrid/4qmz.resid_089
  multigrid_score_grid_prefix9                                 ../10.multigrid/4qmz.resid_090
  multigrid_score_grid_prefix10                                ../10.multigrid/4qmz.resid_091
  multigrid_score_grid_prefix11                                ../10.multigrid/4qmz.resid_093
  multigrid_score_grid_prefix12                                ../10.multigrid/4qmz.resid_097
  multigrid_score_grid_prefix13                                ../10.multigrid/4qmz.resid_100
  multigrid_score_grid_prefix14                                ../10.multigrid/4qmz.resid_139
  multigrid_score_grid_prefix15                                ../10.multigrid/4qmz.resid_279
  multigrid_score_grid_prefix16                                ../10.multigrid/4qmz.resid_280
  multigrid_score_grid_prefix17                                ../10.multigrid/4qmz.resid_283
  multigrid_score_grid_prefix18                                /PATH/dock_tutorial/10.multigrid/4qmz.resid_remaining
  multigrid_score_fp_ref_mol                                   no
  multigrid_score_fp_ref_text                                  yes
  multigrid_score_footprint_text                               /PATH/dock_tutorial/09.footprint_rescore/4qmz.reference.txt
  multigrid_score_use_euc                                      yes
  multigrid_score_use_norm_euc                                 no
  multigrid_score_use_cor                                      no
  multigrid_vdw_euc_scale                                      1
  multigrid_es_euc_scale                                       1
  dock3.5_score_secondary                                      no
  continuous_score_secondary                                   no
  footprint_similarity_score_secondary                         no
  ph4_score_secondary                                          no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_descriptor_score_secondary                              no
  amber_score_secondary                                        no
  minimize_ligand                                              yes
  simplex_max_iterations                                       1000
  simplex_tors_premin_iterations                               0
  simplex_max_cycles                                           1
  simplex_score_converge                                       0.1
  simplex_cycle_converge                                       1.0
  simplex_trans_step                                           1.0
  simplex_rot_step                                             0.1
  simplex_tors_step                                            10.0
  simplex_random_seed                                          0
  simplex_restraint_min                                        yes
  simplex_coefficient_restraint                                5.0
  atom_model                                                   all
  vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /PATH/DOCK6/parameters/flex.defn
  flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no

Now that we have our input files, we can write the script that uses them to generate the grid files for each specified residue. Create a blank file named 4qmz.make_multigrids.qsub.sh in your 007.multigrid folder, and add the following to it:

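#!/bin/bash
# Work in the directory that holds the receptor, footprint reference, and input files
cd /PATH/dock_tutorial/09.footprint_rescore
# Join the primary residue list into one space-separated string
# (sed works line by line and cannot replace newlines, so use tr instead)
export PRIMARY_RES=$(tr '\n' ' ' < 4qmz.primary_residues.dat)
export DOCKHOME="/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/"
# Build one grid per primary residue plus one grid for all remaining residues
python /PATH/DOCK6/bin/multigrid_fp_gen.py 4qmz.rec.mol2 4qmz.resid 4qmz.multigrid.in ${PRIMARY_RES}
# Remove intermediate files
rm -f temp.mol2
rm -f 4qmz.resid_*.rec.grid.mol2
# Minimize the reference ligand on the new multigrids
/PATH/DOCK6/bin/dock6.dn -i 4qmz.reference_multigridmin.in -o 4qmz.reference_multigridmin.out
mv output_scored.mol2 4qmz.lig.multigridmin.mol2
cp 4qmz.lig.multigridmin.mol2 ../10.multigrid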

Change the paths to DOCK6 and to your primary residue file if necessary, and make sure you are using a version of DOCK6 that includes the de novo code. If you get an error such as "cannot stat *.nrg / *.bmp", check that every path in your two input files (4qmz.multigrid.in and 4qmz.reference_multigridmin.in) points to the right place. After this script runs, you should see many new files. For the 4qmz system there should be 19 grids: 18 for individual primary residues and a 19th for all remaining residues. Each grid comes with four files: a .bmp file, a .mol2 file, a .nrg file, and a .out file. You should also have two additional files: 4qmz.lig.multigridmin.mol2 and 4qmz.reference_multigridmin.out. Check the output file for errors and make sure the run finished. Then visualize the ligand in Chimera to confirm that it still contains atoms and looks like a reasonable chemical structure. You should see something like this:

 4qmz multigrid min.png
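Rather than counting grid files by eye, a short Python check can confirm that all 19 grids were written. This is a minimal sketch that assumes each grid prefix (for example 4qmz.resid_017) produces <prefix>.nrg and <prefix>.bmp in the directory where multigrid_fp_gen.py was run; the residue numbers are copied from the grid prefixes in this tutorial's input files, and the function names are hypothetical helpers, not part of DOCK6.

```python
import os

# Primary residues for 4qmz, copied from the grid prefixes used in this tutorial
RESIDUES_4QMZ = [17, 18, 19, 26, 39, 71, 87, 88, 89, 90, 91, 93, 97, 100, 139, 279, 280, 283]


def grid_prefixes(residues, system="4qmz"):
    """One prefix per primary residue (zero-padded to 3 digits) plus the 'remaining' grid."""
    return [f"{system}.resid_{r:03d}" for r in residues] + [f"{system}.resid_remaining"]


def missing_grid_files(prefixes, available):
    """Return the expected .nrg/.bmp file names that are absent from `available`, sorted."""
    expected = {f"{p}{ext}" for p in prefixes for ext in (".nrg", ".bmp")}
    return sorted(expected - set(available))


if __name__ == "__main__":
    # Run from the directory that holds the grids (assumed: 09.footprint_rescore)
    prefixes = grid_prefixes(RESIDUES_4QMZ)
    print(len(prefixes), "grids expected")
    for name in missing_grid_files(prefixes, os.listdir(".")):
        print("missing:", name)
```

Passing a list of file names instead of touching the filesystem inside missing_grid_files keeps the check easy to reuse, for example on the output of `ls` from a remote cluster.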

Besides confirming that the ligand still looks reasonable, it can be useful to display the ligand together with only the primary residues, giving a pared-down view of the active site like this (ligand shown in green for ease of visualization):

4qmz w prim res.png
4qmz w prim res2.png
4qmz w prim res3.png
4qmz w prim res4.png

Minimizing Ligand on the Grids

Next, we rescore and minimize the reference ligand in multigrid space. Create an input file named 4qmz.parents_multigridmin.in containing the following:

conformer_search_type                                        rigid
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             /PATH/dock_tutorial/01.dockprep/4qmz.lig.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               no
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           no
grid_score_secondary                                         no
multigrid_score_primary                                      yes
multigrid_score_secondary                                    no
multigrid_score_rep_rad_scale                                1
multigrid_score_vdw_scale                                    1
multigrid_score_es_scale                                     1
multigrid_score_number_of_grids                              19
multigrid_score_grid_prefix0                                 4qmz.resid_017
multigrid_score_grid_prefix1                                 4qmz.resid_018
multigrid_score_grid_prefix2                                 4qmz.resid_019
multigrid_score_grid_prefix3                                 4qmz.resid_026
multigrid_score_grid_prefix4                                 4qmz.resid_039
multigrid_score_grid_prefix5                                 4qmz.resid_071
multigrid_score_grid_prefix6                                 4qmz.resid_087
multigrid_score_grid_prefix7                                 4qmz.resid_088
multigrid_score_grid_prefix8                                 4qmz.resid_089
multigrid_score_grid_prefix9                                 4qmz.resid_090
multigrid_score_grid_prefix10                                4qmz.resid_091
multigrid_score_grid_prefix11                                4qmz.resid_093
multigrid_score_grid_prefix12                                4qmz.resid_097
multigrid_score_grid_prefix13                                4qmz.resid_100
multigrid_score_grid_prefix14                                4qmz.resid_139
multigrid_score_grid_prefix15                                4qmz.resid_279
multigrid_score_grid_prefix16                                4qmz.resid_280
multigrid_score_grid_prefix17                                4qmz.resid_283
multigrid_score_grid_prefix18                                4qmz.resid_remaining
multigrid_score_fp_ref_mol                                   no
multigrid_score_fp_ref_text                                  yes
multigrid_score_footprint_text                               4qmz.reference.txt
multigrid_score_foot_compare_type                            Euclidean
multigrid_score_normalize_foot                               no
multigrid_score_vdw_euc_scale                                1.0
multigrid_score_es_euc_scale                                 1.0
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
pharmacophore_score_secondary                                no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_score_secondary                                         no
amber_score_secondary                                        no
minimize_ligand                                              yes
simplex_max_iterations                                       1000
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        yes
simplex_coefficient_restraint                                5.0
atom_model                                                   all
vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /PATH/DOCK6/parameters/flex.defn
flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl
ligand_outfile_prefix                                        output
write_footprints                                             no
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

After running this input file with DOCK6, you should have an output file (check it for errors, as always) and a .mol2 file named output_scored.mol2. Rename it to 4qmz.parents_multigridmin.mol2 and visualize it in Chimera to make sure it is still a realistic molecule. This file contains the ligand minimized with multigrid scoring. This will serve as our reference molecule for guided growth!
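A quick way to confirm that a minimized mol2 file (such as output_scored.mol2 or 4qmz.parents_multigridmin.mol2) actually contains atoms, before opening Chimera, is to count the lines in each @<TRIPOS>ATOM section. The sketch below assumes the standard Tripos mol2 layout, in which the line right after @<TRIPOS>MOLECULE holds the molecule name; atom_counts is a hypothetical helper, not a DOCK6 tool.

```python
import sys


def atom_counts(text):
    """Return (name, number_of_atom_lines) for each molecule in a mol2 string."""
    counts = []          # one [name, atoms] entry per @<TRIPOS>MOLECULE record
    in_atoms = False     # True while inside an @<TRIPOS>ATOM section
    expect_name = False  # True on the line right after @<TRIPOS>MOLECULE
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atoms = stripped == "@<TRIPOS>ATOM"
            if stripped == "@<TRIPOS>MOLECULE":
                counts.append(["", 0])
                expect_name = True
            continue
        if expect_name:
            counts[-1][0] = stripped
            expect_name = False
        elif in_atoms and stripped and counts:
            counts[-1][1] += 1
    return [tuple(entry) for entry in counts]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, n_atoms in atom_counts(handle.read()):
            flag = "  <-- no atoms!" if n_atoms == 0 else ""
            print(f"{name}: {n_atoms} atoms{flag}")
```

Usage (assumed file name): `python count_atoms.py 4qmz.parents_multigridmin.mol2`. A count of zero means the minimization produced an empty molecule and the input paths are worth rechecking.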

Running de novo

We can now run de novo growth! Rejoice! Compared to the previous steps, this part is fairly straightforward. We will use a generic fragment library built from a set of drug-like molecules, which is provided in the parameters directory of the DOCK6 distribution. All you need to do is create the input file and a script to submit it to the cluster.

Creating the Input File

Create a folder named 010.denovo. Inside this directory, create an input file named denovo.in with the following contents:

conformer_search_type                                        denovo
dn_fraglib_scaffold_file                                     /PATH/dock_tutorial/000.fraglib/fraglib_scaffold.mol2
dn_fraglib_linker_file                                       /PATH/dock_tutorial/000.fraglib/fraglib_linker.mol2
dn_fraglib_sidechain_file                                    /PATH/dock_tutorial/000.fraglib/fraglib_sidechain.mol2
dn_user_specified_anchor                                     yes
dn_fraglib_anchor_file                                       anchor1.mol2
dn_use_torenv_table                                          yes
dn_torenv_table                                              /PATH/dock_tutorial/000.fraglib/fraglib_torenv.dat
dn_sampling_method                                           graph
dn_graph_max_picks                                           30
dn_graph_breadth                                             3
dn_graph_depth                                               2
dn_graph_temperature                                         100
dn_pruning_conformer_score_cutoff                            100.0
dn_pruning_conformer_score_scaling_factor                    2.0
dn_pruning_clustering_cutoff                                 100.0
dn_constraint_mol_wt                                         750
dn_constraint_rot_bon                                        15
dn_constraint_formal_charge                                  2.0
dn_heur_unmatched_num                                        1
dn_heur_matched_rmsd                                         2.0
dn_unique_anchors                                            3
dn_max_grow_layers                                           9
dn_max_root_size                                             25
dn_max_layer_size                                            25
dn_max_current_aps                                           5
dn_max_scaffolds_per_layer                                   1
dn_write_checkpoints                                         yes
dn_write_prune_dump                                          yes
dn_write_orients                                             no
dn_write_growth_trees                                        no
dn_output_prefix                                             4qmz.final
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           yes
receptor_site_file                                           ../02.surface-spheres/selected_spheres.sph
max_orientations                                             1000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           no
grid_score_secondary                                         no
multigrid_score_primary                                      no
multigrid_score_secondary                                    no
dock3.5_score_primary                                        no
dock3.5_score_secondary                                      no
continuous_score_primary                                     no
continuous_score_secondary                                   no
footprint_similarity_score_primary                           no
footprint_similarity_score_secondary                         no
ph4_score_primary                                            no
ph4_score_secondary                                          no
descriptor_score_primary                                     yes
descriptor_score_secondary                                   no
descriptor_use_grid_score                                    no
descriptor_use_multigrid_score                               yes
descriptor_use_pharmacophore_score                           no
descriptor_use_tanimoto                                      no
descriptor_use_hungarian                                     no
descriptor_multigrid_score_rep_rad_scale                     1.0
descriptor_multigrid_score_vdw_scale                         1.0
descriptor_multigrid_score_es_scale                          1.0
descriptor_multigrid_score_number_of_grids                   19
descriptor_multigrid_score_grid_prefix0                      ../09.footprint_rescore/4qmz.resid_017
descriptor_multigrid_score_grid_prefix1                      ../09.footprint_rescore/4qmz.resid_018
descriptor_multigrid_score_grid_prefix2                      ../09.footprint_rescore/4qmz.resid_019
descriptor_multigrid_score_grid_prefix3                      ../09.footprint_rescore/4qmz.resid_026
descriptor_multigrid_score_grid_prefix4                      ../09.footprint_rescore/4qmz.resid_039
descriptor_multigrid_score_grid_prefix5                      ../09.footprint_rescore/4qmz.resid_071
descriptor_multigrid_score_grid_prefix6                      ../09.footprint_rescore/4qmz.resid_087
descriptor_multigrid_score_grid_prefix7                      ../09.footprint_rescore/4qmz.resid_088
descriptor_multigrid_score_grid_prefix8                      ../09.footprint_rescore/4qmz.resid_089
descriptor_multigrid_score_grid_prefix9                      ../09.footprint_rescore/4qmz.resid_090
descriptor_multigrid_score_grid_prefix10                     ../09.footprint_rescore/4qmz.resid_091
descriptor_multigrid_score_grid_prefix11                     ../09.footprint_rescore/4qmz.resid_093
descriptor_multigrid_score_grid_prefix12                     ../09.footprint_rescore/4qmz.resid_097
descriptor_multigrid_score_grid_prefix13                     ../09.footprint_rescore/4qmz.resid_100
descriptor_multigrid_score_grid_prefix14                     ../09.footprint_rescore/4qmz.resid_139
descriptor_multigrid_score_grid_prefix15                     ../09.footprint_rescore/4qmz.resid_279
descriptor_multigrid_score_grid_prefix16                     ../09.footprint_rescore/4qmz.resid_280
descriptor_multigrid_score_grid_prefix17                     ../09.footprint_rescore/4qmz.resid_283
descriptor_multigrid_score_grid_prefix18                     ../09.footprint_rescore/4qmz.resid_remaining
descriptor_multigrid_score_fp_ref_mol                        yes
descriptor_multigrid_score_footprint_ref                     ../09.footprint_rescore/4qmz.parents_multigridmin.mol2
descriptor_multigrid_score_use_euc                           yes
descriptor_multigrid_score_use_norm_euc                      no
descriptor_multigrid_score_use_cor                           no
descriptor_multigrid_vdw_euc_scale                           1.0
descriptor_multigrid_es_euc_scale                            1.0
descriptor_weight_multigrid_score                            1
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_descriptor_score_secondary                              no
amber_score_secondary                                        no
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                /PATH/DOCK6/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /PATH/DOCK6/parameters/flex.defn
flex_drive_file                                              /PATH/DOCK6/parameters/flex_drive.tbl


There are a few things to note here. First, you must specify a .mol2 file for each of the scaffold, linker, and sidechain libraries. Second, you must specify your anchor library, which must be prepared before the calculation so that it contains the specific anchors you want to seed growth from. Finally, although the descriptor score is turned on, it is used only to call the multigrid scoring function; the descriptor grid score is not used (descriptor_use_grid_score is set to no). This would be the same as running with descriptor_score_primary = no and multigrid_score_primary = yes, but standard protocol is to call all scoring functions through the descriptor score, even when only one of them is used.
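The 19 grid-prefix lines appear in every multigrid input file in this tutorial, each time with a different parameter prefix (multigrid_score or descriptor_multigrid_score) and sometimes a directory in front. Typing them by hand invites typos, so a small generator may help. This is a sketch under stated assumptions: the residue file is assumed to hold only whitespace- or newline-separated residue numbers, prefixes are assumed to be zero-padded to three digits as in 4qmz.resid_017, and the column alignment is cosmetic (DOCK6 input values are separated from parameter names by whitespace). The function names are hypothetical.

```python
import sys


def read_residues(path):
    """Read residue numbers from a whitespace- or newline-separated file."""
    with open(path) as handle:
        return [int(token) for token in handle.read().split()]


def multigrid_prefix_lines(residues, base="multigrid_score", path="", system="4qmz"):
    """Return the number_of_grids line and one grid_prefix line per grid.

    base: parameter prefix, e.g. "multigrid_score" or "descriptor_multigrid_score"
    path: optional directory prepended to every grid prefix
    """
    prefixes = [f"{path}{system}.resid_{r:03d}" for r in residues]
    prefixes.append(f"{path}{system}.resid_remaining")  # grid for all other residues
    lines = [f"{base + '_number_of_grids':<61}{len(prefixes)}"]
    for i, prefix in enumerate(prefixes):
        lines.append(f"{base + '_grid_prefix' + str(i):<61}{prefix}")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python make_prefixes.py 4qmz.primary_residues.dat
    residues = read_residues(sys.argv[1])
    print("\n".join(multigrid_prefix_lines(
        residues, base="descriptor_multigrid_score", path="../09.footprint_rescore/")))
```

Because the number of grids is computed from the same list as the prefixes, the count and the prefix lines cannot drift out of sync.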

Creating a script to submit to Seawulf

Now we want to write a script that calls DOCK6 with the input file we just created. Why write a script instead of running DOCK6 directly? De novo generic growth takes a long time (approximately 5-10 hours per anchor) and can become very computationally expensive, so we submit it to the cluster with qsub. Create a script named denovo.sh with the following contents:

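#!/bin/bash
#PBS -l walltime=48:00:00
#PBS -l nodes=1:ppn=24
#PBS -q long
#PBS -N 4qmz.denovo
#PBS -V
# Run in the directory the job was submitted from (010.denovo)
cd "$PBS_O_WORKDIR"
# Run de novo growth with the denovo.in input file
/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/dock6.dn -i denovo.in -o 4qmz.denovo_mg.out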


Submit this to the queue by typing:

 qsub denovo.sh

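Run this from your 010.denovo directory: the script changes to $PBS_O_WORKDIR, so qsub must be called from the directory that contains denovo.in. While the job runs, monitor the output file (4qmz.denovo_mg.out) to see which anchor and layer the calculation has reached.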

Viewing the Results

After the calculation has finished successfully, your directory will contain a large number of new files of the form 4qmz.final_anchor_*.prune_dump_layer_*.mol2 and 4qmz.final_anchor_*.root_layer_*.mol2. For each anchor, you will have one of each of these files for every layer that was grown (up to dn_max_grow_layers). You will also have an output file (which, of course, should be checked for errors) and a file named 4qmz.final.denovo_build.mol2. This is your final output file, containing all of the constructed and scored molecules.

We are going to open this file in Chimera using ViewDock. First, open your 4qmz.parents_multigridmin.mol2 file in Chimera, then open the cleaned receptor file on top of it. Next, click Tools > Surface/Binding Analysis > ViewDock and open 4qmz.final.denovo_build.mol2. Depending on how many anchors and layers you used, this file can contain more than a thousand molecules, so it can take a while to open. After you select the file in ViewDock and click Open, Chimera will most likely appear to freeze while it loads all of the molecules at once. Give it a good five to ten minutes before deciding to quit; it will open, you just have to be patient. Once the file has loaded, you can sort the molecules by the various fields stored in the mol2 file. Descriptor (multigrid) score, molecular weight, and chain/fragment sequence are a few useful metrics for examining the newly built molecules in the receptor's active site.


Top hits by descriptor score, shown with their fragment strings, which identify families of related compounds:

Viewdock1.png


Related family of molecules:

Viewdock2.png


Related family of compounds with varying active site conformation:

Viewdock3.png


Related family with conserved binding pocket pose:

Viewdock4.png


Top-scoring molecules by descriptor score compared with the cognate ligand (red):

Viewdock6.png

Things to Keep in Mind

When running de novo for the first time, it is strongly recommended that you first run it in interactive mode: create an empty input file, run DOCK6 with it, and enter each parameter manually as DOCK6 prompts for it. This will give you a good idea of what the code expects, what it is doing, and where any errors you encounter are coming from.

The de novo code takes anywhere from 4-8 hours per anchor with 15 molecules per layer, depending on many factors: the anchor being used, the specific system, the number of grids, the scoring function, and so on.

If you provide an anchor library containing more anchors than you will use (for example, a library of 100 anchors when you are only using five), the de novo code will automatically pick the largest anchors! As a result, if you do not choose your anchors yourself, you will find a disturbing number of large ring structures when you review the finished structures. To avoid this, use an anchor library that you have compiled yourself, and be aware of the order in which the anchors will be run (the anchor with the largest molecular weight is chosen first).
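Because the largest anchors are picked first, it helps to preview the molecular-weight ordering of an anchor library before submitting. The sketch below estimates each molecule's weight from the element part of its SYBYL atom type (C.3 becomes C) using average atomic masses; it assumes a standard mol2 layout, treats dummy atoms (Du) and any element not in the small mass table as weight zero, and only mimics the "heaviest first" behavior described above rather than reproducing DOCK6's internal selection code. The function name is hypothetical.

```python
import sys

# Average atomic masses for elements common in drug-like fragments (g/mol)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
               "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904}


def anchors_by_weight(text):
    """Return (name, approximate molecular weight) per mol2 molecule, heaviest first."""
    molecules = []  # [name, weight] per @<TRIPOS>MOLECULE record
    in_atoms = expect_name = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atoms = stripped == "@<TRIPOS>ATOM"
            if stripped == "@<TRIPOS>MOLECULE":
                molecules.append(["", 0.0])
                expect_name = True
            continue
        if expect_name:
            molecules[-1][0] = stripped
            expect_name = False
        elif in_atoms and stripped and molecules:
            atom_type = stripped.split()[5]      # 6th column of an ATOM line, e.g. "C.3"
            element = atom_type.split(".")[0]    # "C.3" -> "C", "Du" -> "Du"
            molecules[-1][1] += ATOMIC_MASS.get(element, 0.0)
    ranked = [(name, round(weight, 3)) for name, weight in molecules]
    return sorted(ranked, key=lambda m: m[1], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for rank, (name, weight) in enumerate(anchors_by_weight(handle.read()), start=1):
            print(f"{rank:3d}  {weight:8.2f}  {name}")
```

If the top of this list is not the set of anchors you intend to grow from, trim the anchor file before running de novo.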

It has been determined that the de novo code is sequence independent, meaning that the results do not depend on the order in which anchors are calculated. For example, if your anchor library file contains anchors A, B, and C, you will obtain the same results (molecules, conformations, and scores) as if you had run A, B, and C individually, each in its own anchor file.

For multigrid scoring, you do not need to specify a dummy atom, or use the corresponding dummy_H parameter file. For other types of scoring functions you will have to specify in your anchor files which atoms are the dummy atoms.

DOCK can be finicky about paths. Sometimes it does not want full paths (that is, absolute paths starting from the top directory, such as /gpfs), but other times it requires the complete absolute path. If you keep getting an error about a file location and you are sure the path is correct, try shortening the path as much as possible (a path relative to the directory you run DOCK6 from), or try the full absolute path if you have not already. Note that ~/ is expanded by the shell, so it may not be understood when written inside a DOCK input file.
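Many path errors can be caught before submitting a long job by checking every file-like value in the input file against the directory DOCK6 will run from. This is a minimal sketch with stated assumptions: a value is treated as a path if it contains "/" or ends in one of the listed extensions, grid_prefix values are checked through their .nrg file, and "~" is deliberately not expanded, on the assumption that DOCK6 does not expand it either. The helper is hypothetical and does not validate parameter names.

```python
import os
import sys

# Extensions that usually mark a DOCK6 input value as a file path (assumption)
PATH_EXTENSIONS = (".mol2", ".sph", ".defn", ".tbl", ".dat", ".txt", ".in")


def missing_paths(input_text, run_dir="."):
    """Return (parameter, value) pairs whose file cannot be found from run_dir."""
    missing = []
    for line in input_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not "key value"
        key, value = parts
        is_grid_prefix = "grid_prefix" in key
        if not is_grid_prefix and "/" not in value and not value.endswith(PATH_EXTENSIONS):
            continue  # plain values such as yes, no, 100.0, denovo
        # A grid prefix names a family of files; check its .nrg member
        target = value + ".nrg" if is_grid_prefix else value
        if not os.path.isabs(target):
            target = os.path.join(run_dir, target)  # relative to where DOCK6 runs
        if not os.path.exists(target):
            missing.append((key, value))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_paths.py denovo.in   (run from 010.denovo)
    with open(sys.argv[1]) as handle:
        for key, value in missing_paths(handle.read()):
            print(f"{key}: cannot find {value}")
```

Running the check from the same directory where qsub is called mirrors how relative paths will be resolved when the job starts.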

If DOCK6 does not accept "denovo" as a conformer_search_type then you are not running a version of DOCK6 that contains the de novo code.


Length of De Novo Calculations

The de novo code can take a large amount of time, especially as the number of anchors and layers is increased. To give an idea of how long the de novo calculations take, below are some details from different runs on the 4qmz system (the .out file from the de novo code has the total calculation time in seconds at the bottom):

1.) Single anchor (Methylene): ~3.92 hours
     dn_max_grow_layers                                           9
     dn_max_root_size                                             25
     dn_max_layer_size                                            25

2.) Single anchor (Carbonyl): ~6.3 hours
     dn_max_grow_layers                                           9
     dn_max_root_size                                             25
     dn_max_layer_size                                            25

3.) Single anchor (Amine): ~4.99 hours
     dn_max_grow_layers                                           9
     dn_max_root_size                                             25
     dn_max_layer_size                                            25

4.) Single anchor (Methylene): not calculated
     dn_max_grow_layers                                           9
     dn_max_root_size                                             100
     dn_max_layer_size                                            100

5.) Single anchor (Methylene): ~18.1 hours
     dn_max_grow_layers                                           9
     dn_max_root_size                                             100
     dn_max_layer_size                                            25

6.) Single anchor (Methylene): ~24.2 hours
     dn_max_grow_layers                                           9
     dn_max_root_size                                             25
     dn_max_layer_size                                            100

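These timings can be used to sanity-check the 48-hour walltime in denovo.sh before submitting. The sketch below assumes that anchors in one job run back to back, so the total is roughly the sum of the per-anchor times, and it adds a 25% safety margin; both the additivity and the margin are assumptions for planning, not measured DOCK6 behavior. The function names are hypothetical.

```python
def walltime_hours(walltime):
    """Convert a PBS walltime string 'HH:MM:SS' to hours."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours + minutes / 60 + seconds / 3600


def fits_walltime(hours_per_anchor, walltime="48:00:00", safety=1.25):
    """Return (fits, estimated_hours, limit_hours) for anchors run one after another."""
    needed = sum(hours_per_anchor) * safety  # assumed: per-anchor times simply add up
    limit = walltime_hours(walltime)
    return needed <= limit, needed, limit


if __name__ == "__main__":
    # Methylene, carbonyl, and amine runs from the list above (root/layer size 25)
    ok, needed, limit = fits_walltime([3.92, 6.3, 4.99])
    print(f"estimated {needed:.1f} h of {limit:.0f} h walltime: {'OK' if ok else 'too long'}")
```

With the larger layer size (run 6, about 24.2 hours per anchor), even two anchors would exceed 48 hours under these assumptions, which suggests splitting anchors across separate jobs.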

Make Unique Script to prune de novo results

I was able to test a couple of scripts that filter the mol2 libraries produced by de novo runs, collecting unique results and omitting any repeated molecules. This process uses two scripts, zzz.002.makeunique_new_sub.sh and split_on_tanimoto_new.py. The scripts require a Tanimoto score in the de novo output, so it must be included in the original de novo calculation; if it was not, the results can be rescored with the Tanimoto scoring function without a problem. split_on_tanimoto_new.py takes the full de novo mol2 output and compares each molecule's Tanimoto to a reference ligand. The cognate ligand from your docking trials is the easiest choice, but any reference should be appropriate. This produces many "bins", each containing molecules with the same Tanimoto to the reference. Next, zzz.002.makeunique_new_sub.sh compares the first molecule in each bin (the highest-scoring molecule) with the rest of the molecules in that bin and deletes any molecule with a Tanimoto of 1 to it. This process continues until every molecule has been tested. At that point, the output should contain a single copy of each of the best-scoring molecules generated by the de novo calculation.
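The binning and pruning logic described above can be sketched in a few lines. This is an illustration of the idea, not the code of split_on_tanimoto_new.py or zzz.002.makeunique_new_sub.sh: fingerprints are represented here as hypothetical Python sets of on-bits, molecules as (name, score, fingerprint) tuples, and "best" means the lowest (most negative) score, as for DOCK energy-based scores. Binning is safe because two identical fingerprints always have the same Tanimoto to the reference, so duplicates can only occur within one bin.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_unique(molecules, reference):
    """Keep one copy of each molecule, preferring the best (lowest) score.

    molecules: list of (name, score, fingerprint) tuples (hypothetical format)
    reference: fingerprint of the reference ligand, e.g. the cognate ligand
    """
    # 1. Bin molecules by their Tanimoto to the reference
    bins = {}
    for mol in molecules:
        key = round(tanimoto(mol[2], reference), 6)
        bins.setdefault(key, []).append(mol)

    # 2. Within each bin, keep the best molecule and drop exact matches (Tanimoto 1)
    unique = []
    for members in bins.values():
        remaining = sorted(members, key=lambda m: m[1])  # best score first
        while remaining:
            best = remaining.pop(0)
            unique.append(best)
            remaining = [m for m in remaining if tanimoto(m[2], best[2]) < 1.0]
    return sorted(unique, key=lambda m: m[1])
```

For example, two molecules with identical fingerprints and scores of -30 and -35 fall into the same bin, and only the -35 copy survives; molecules in other bins are never compared with them.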