Difference between revisions of "2017 Denovo design tutorial 2 with PDB 4QMZ"

From Rizzo_Lab

Revision as of 10:17, 28 March 2017

2017 Denovo design tutorial 2 with PDB 4QMZ

The Denovo module of DOCK, a relatively new feature as of Fall 2016, constructs new ligand molecules inside a protein active site from a library of user-specified "fragments." The fragments are common chemical functional groups, or building blocks, that are typically selected from a ZINC library of millions of compounds based on how frequently they appear. Fragments are classified as scaffolds, linkers, or side chains according to the number of atomic positions that are permitted to seed growth: 3, 2, and 1 atoms, respectively. Thus a scaffold can seed growth from three different atoms, with a linker bonded at each position; a linker can seed growth from two positions; and a side chain from only one. Once the molecules are built within the active site, their interactions with the protein are scored using the scoring method the user specifies.

This tutorial walks through the steps needed to run a Denovo calculation on the 4QMZ system from the 2017 DOCK tutorial. It uses the multigrid scoring function, called through the descriptor score. Users are encouraged to complete the traditional DOCK tutorial for the 4QMZ system first, since many of its files are reused in the Denovo experiments; make sure you have all the folders and files from that tutorial. Before running the calculation, it is also worth reading the "Things to Keep in Mind" section at the bottom for some useful information.
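To make the scaffold/linker/side-chain classification concrete, here is a small sketch that counts the attachment points in a one-fragment mol2 file and maps that count to a fragment class. It rests on an assumption not stated in this tutorial: that attachment points are written as dummy atoms with SYBYL type `Du` in the `@<TRIPOS>ATOM` section. The functions `count_attachment_points` and `classify_fragment` are hypothetical helpers, not part of DOCK.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of DOCK.

# Count atoms whose SYBYL type (6th column of an ATOM record) is "Du".
# ASSUMPTION: attachment points are stored as Du dummy atoms.
count_attachment_points() {
    awk '
        /^@<TRIPOS>ATOM/       { in_atoms = 1; next }
        /^@<TRIPOS>/           { in_atoms = 0 }
        in_atoms && $6 == "Du" { n++ }
        END                    { print n + 0 }
    ' "$1"
}

# Map an attachment-point count to the fragment classes described above.
classify_fragment() {
    case "$1" in
        1) echo "sidechain" ;;
        2) echo "linker" ;;
        3) echo "scaffold" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, `classify_fragment "$(count_attachment_points frag.mol2)"` would print the class of a single-fragment file named frag.mol2 (a placeholder name).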

Additional Files Needed

To run the Denovo code with multigrid scoring you need these files:

    fraglib_scaffold.mol2                                                  <-- LIRed
    fraglib_linker.mol2                                                    <-- LIRed
    fraglib_sidechain.mol2                                                 <-- LIRed
    anchor_library.mol2                                                    <-- LIRed
    fraglib_torenv.dat                                                     <-- LIRed
    selected_spheres.sph                                                   <-- make your own
    primary_residues_multigrid.bmp / .nrg                                  <-- make your own
    multigrid_minimized_ligand.mol2                                        <-- make your own
    vdw_AMBER_parm99.defn                                                  <-- needed for regular DOCK
    flex.defn                                                              <-- needed for regular DOCK
    flex_drive.tbl                                                         <-- needed for regular DOCK
    vdw_DumHyd.defn (needed for scoring functions other than multigrid)    <-- LIRed 

The fragment libraries and parameter files must be obtained before running the Denovo calculation; on LIRed they can be found at these paths:

  /gpfs/home/guest43/scratch/denovo/trial_denovo/000.fraglib 
  /trial_denovo/zzz.parameters/

Everything else is generated through this tutorial, prior to running the Denovo code.
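After copying the fragment libraries, a quick way to confirm that each file arrived intact is to count its molecules: every entry in a mol2 file begins with a `@<TRIPOS>MOLECULE` record. The sketch below assumes the library files sit in the current directory; `count_mol2_entries` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: print "<file>: <number of molecules>" for each mol2 file.
# Each molecule in a mol2 file starts with a @<TRIPOS>MOLECULE record.
count_mol2_entries() {
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "$f: missing" >&2
            continue
        fi
        echo "$f: $(grep -c '^@<TRIPOS>MOLECULE' "$f")"
    done
}
```

Usage: `count_mol2_entries fraglib_scaffold.mol2 fraglib_linker.mol2 fraglib_sidechain.mol2 anchor_library.mol2`. An empty or truncated copy shows up as a zero or an unexpectedly small count.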


Preparing The Files

Before running Denovo on 4QMZ, please ensure you have gone through the DOCK 2017 tutorial and have all the resulting files. The tutorial can be accessed through here. You should have these files in your directory:

    4qmz.pdb
    4qmz.lig.mol2
    4qmz.rec.clean.mol2
    4qmz.rec.noH.mol2
    selected_spheres.sph

Additionally, you will also need these parameter files:

    vdw_AMBER_parm99.defn
    flex.defn
    flex_drive.tbl
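Before going further, it can save time to confirm that all of these prerequisite files are present and non-empty. A minimal sketch, with `check_inputs` as a hypothetical helper that returns non-zero if anything is missing:

```shell
#!/bin/bash
# Hypothetical helper: report any required input file that is missing or empty.
# Returns non-zero if at least one file is absent.
check_inputs() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `check_inputs 4qmz.pdb 4qmz.lig.mol2 4qmz.rec.clean.mol2 4qmz.rec.noH.mol2 selected_spheres.sph vdw_AMBER_parm99.defn flex.defn flex_drive.tbl`, adding directory prefixes if the files live elsewhere.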

In order to run Denovo with multigrid scoring, we must first go through several steps:

1) Create a primary residue text file and a reference text file, which select the primary residues of interest.
2) Make a multigrid file for each residue specified in the previous step.
3) Minimize the ligand mol2 file using the multigrids from the previous step.
4) Rescore the ligand on the multigrids to yield a minimized ligand .mol2 file. This serves as the reference ligand for the Denovo calculations.

Luckily, our good friend Brian wrote some helpful scripts to make this process easier. There is one script for each step, but for the most part we will work directly with the simpler DOCK input files. If you are interested in using the scripts themselves (expect a fair amount of debugging), they can be found on LIRed under /gpfs/home/guest43/scratch/denovo/trial_denovo/run/ .

Specifying Primary Residues with DOCK

Create a directory within your working directory named 008.footprint_rescore. All pertinent files from this step go here, and this is where we will run the calculation. The input file for this step should be named 4qmz.footprint_rescore.in and should look like the following (substitute your own directory paths, e.g. ~/your/own/directory/01.dockprep/4qmz.lig.mol2):

  conformer_search_type                                        rigid
  use_internal_energy                                          no
  ligand_atom_file                                             /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
  limit_max_ligands                                            no
  skip_molecule                                                no
  read_mol_solvation                                           no
  calculate_rmsd                                               no
  use_database_filter                                          no
  orient_ligand                                                no
  bump_filter                                                  no
  score_molecules                                              yes
  contact_score_primary                                        no
  contact_score_secondary                                      no
  grid_score_primary                                           no
  grid_score_secondary                                         no
  multigrid_score_primary                                      no
  multigrid_score_secondary                                    no
  dock3.5_score_primary                                        no
  dock3.5_score_secondary                                      no
  continuous_score_primary                                     no
  continuous_score_secondary                                   no
  footprint_similarity_score_primary                           yes
  footprint_similarity_score_secondary                         no
  fps_score_use_footprint_reference_mol2                       yes
  fps_score_footprint_reference_mol2_filename                  /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
  fps_score_foot_compare_type                                  Euclidean
  fps_score_normalize_foot                                     no
  fps_score_foot_comp_all_residue                              no
  fps_score_choose_foot_range_type                             threshold
  fps_score_vdw_threshold                                      1
  fps_score_es_threshold                                       0.5
  fps_score_hb_threshold                                       0.5
  fps_score_use_remainder                                      yes
  fps_score_receptor_filename                                  /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.rec.mol2
  fps_score_vdw_att_exp                                        6
  fps_score_vdw_rep_exp                                        12
  fps_score_vdw_rep_rad_scale                                  1
  fps_score_use_distance_dependent_dielectric                  yes
  fps_score_dielectric                                         4.0
  fps_score_vdw_fp_scale                                       1
  fps_score_es_fp_scale                                        1
  fps_score_hb_fp_scale                                        0
  pharmacophore_score_secondary                                no
  descriptor_score_secondary                                   no
  gbsa_zou_score_secondary                                     no
  gbsa_hawkins_score_secondary                                 no
  SASA_score_secondary                                         no
  amber_score_secondary                                        no
  minimize_ligand                                              no
  atom_model                                                   all
  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex.defn
  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex_drive.tbl
  ligand_outfile_prefix                                        output
  write_footprints                                             yes
  write_hbonds                                                 no
  write_orientations                                           no
  num_scored_conformers                                        1
  rank_ligands                                                 no
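This step does not show the run command itself. Following the `dock6.dn -i <input> -o <output>` invocation used in the multigrid script later in this tutorial, a minimal wrapper might look like the sketch below. The `DOCK6` variable and the `run_dock` function are assumptions for illustration; the default path is the one used in that later script.

```shell
#!/bin/bash
# Path to the Denovo-enabled DOCK binary (default taken from the multigrid
# script in this tutorial); override DOCK6 if your installation differs.
DOCK6="${DOCK6:-/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/dock6.dn}"

# Hypothetical wrapper: run DOCK on an input file, writing the given output
# file, and fail loudly if the binary is not executable.
run_dock() {
    local input="$1" output="$2"
    if [ ! -x "$DOCK6" ]; then
        echo "DOCK binary not found: $DOCK6" >&2
        return 1
    fi
    "$DOCK6" -i "$input" -o "$output"
}
```

Usage: `run_dock 4qmz.footprint_rescore.in 4qmz.footprint_rescore.out`, run from inside 008.footprint_rescore.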

This calculation should be very quick (~10 seconds) and result in three output files:

    4qmz.footprint_rescore.out
    output_footprint_scored.txt
    output_scored.mol2
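Since the next script parses the "range_union" lines of the .out file, it is worth confirming that all three outputs exist and that those lines are present before moving on. A sketch, with `check_footprint_outputs` as a hypothetical helper that takes the file prefix (default 4qmz):

```shell
#!/bin/bash
# Hypothetical check: confirm the three footprint-rescoring outputs exist and
# that the .out file contains the "range_union" lines the next script parses.
check_footprint_outputs() {
    local prefix="${1:-4qmz}" f ok=0
    for f in "${prefix}.footprint_rescore.out" output_footprint_scored.txt output_scored.mol2; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            ok=1
        fi
    done
    if [ -s "${prefix}.footprint_rescore.out" ] &&
       ! grep -q "range_union" "${prefix}.footprint_rescore.out"; then
        echo "no range_union lines in ${prefix}.footprint_rescore.out" >&2
        ok=1
    fi
    return "$ok"
}
```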


Next, we must identify the primary residues in the active site; grids for each one are generated in the following section. Create a new file named 4qmz.primary_residues.sh containing the following (adapted from Brian's script *.fpsrescore.qsub.sh):

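  #!/bin/bash
  # Start from a clean reference file, since the loop below appends to it.
  rm -f reference.txt
  # Take the line after each "range_union" header, drop lines containing "-"
  # (e.g. grep's "--" group separators), split on commas, and keep a sorted,
  # unique list of residue numbers.
  grep -A 1 "range_union" 4qmz.footprint_rescore.out |
  grep -v "range_union" |
  grep -v "\-" |
  sed -e '{s/,/\n/g}' |
  sed -e '{s/ //g}' |
  sed '/^$/d' |
  sort -n |
  uniq > temp.dat
  # Zero-pad each residue number to three digits (e.g. 17 -> 017).
  for i in $(cat temp.dat); do printf "%0*d\n" 3 "$i"; done > 4qmz.primary_residues.dat
  # For each residue, copy columns 1, 3 and 4 of its footprint line.
  for RES in $(cat temp.dat)
  do
          grep " ${RES} " output_footprint_scored.txt |
          awk -v temp="${RES}" '{if ($2 == temp) print $0;}' |
          awk '{print $1 "  " $3 "  " $4}' >> reference.txt
  done
  # Append the "remainder" entry covering all other residues.
  grep "remainder" output_footprint_scored.txt |
  sed -e '{s/,/  /g}' |
  tr -d '\n' |
  awk '{print $2 "  " $3 "  " $6}' >> reference.txt
  mv reference.txt 4qmz.reference.txt
  rm temp.dat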

Run the script from the same directory, and you should have two new files:

    4qmz.primary_residues.dat
    4qmz.reference.txt

These are our primary residues! Now we need to generate a grid for each one.
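A quick consistency check on these two files, assuming output_footprint_scored.txt contributes exactly one line per primary residue: 4qmz.reference.txt should then contain one more line than 4qmz.primary_residues.dat, the extra line being the remainder. For 4QMZ that would be 18 residues plus 1, matching the 19 grids used below. `check_reference_counts` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical check: the reference file should list every primary residue
# plus one "remainder" line. ASSUMPTION: one footprint line per residue.
check_reference_counts() {
    local dat="$1" ref="$2" n_res n_ref
    n_res=$(grep -c . "$dat")   # non-empty lines only
    n_ref=$(grep -c . "$ref")
    echo "primary residues: $n_res, reference lines: $n_ref"
    [ "$n_ref" -eq $((n_res + 1)) ]
}
```

Usage: `check_reference_counts 4qmz.primary_residues.dat 4qmz.reference.txt`.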


Generating the Grids

We must now generate a grid file for each residue. To do so, we will need another of Brian's scripts: 4qmz.make_multigrids.qsub.sh. Before we can use it, however, we need to create the two DOCK input files that it calls. Create a file named 4qmz.multigrid.in inside your 007.multigrid folder containing the following:

  compute_grids                  yes
  grid_spacing                   0.4
  output_molecule                yes
  contact_score                  no
  chemical_score                 no
  energy_score                   yes
  energy_cutoff_distance         9999
  atom_model                     a
  attractive_exponent            6
  repulsive_exponent             9
  distance_dielectric            yes
  dielectric_factor              4
  bump_filter                    yes
  bump_overlap                   0.75
  receptor_file                  temp.mol2
  box_file                       ../03.box-grid/4qmz.box.pdb
  vdw_definition_file            /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
  chemical_definition_file       /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/chem.defn
  score_grid_prefix              temp.rec
  receptor_out_file              temp.rec.grid.mol2

Additionally, create a file named 4qmz.reference_multigridmin.in (the name the script below expects):

conformer_search_type                                        rigid 
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
ligand_atom_file                                             /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       yes
rmsd_reference_filename                                      /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           no
grid_score_secondary                                         no
multigrid_score_primary                                      yes
multigrid_score_secondary                                    no
multigrid_score_rep_rad_scale                                1
multigrid_score_vdw_scale                                    1
multigrid_score_es_scale                                     1
multigrid_score_number_of_grids                              19
multigrid_score_grid_prefix0                                 ../10.multigrid/4qmz.resid_017
multigrid_score_grid_prefix1                                 ../10.multigrid/4qmz.resid_018
multigrid_score_grid_prefix2                                 ../10.multigrid/4qmz.resid_019
multigrid_score_grid_prefix3                                 ../10.multigrid/4qmz.resid_026
multigrid_score_grid_prefix4                                 ../10.multigrid/4qmz.resid_039
multigrid_score_grid_prefix5                                 ../10.multigrid/4qmz.resid_071
multigrid_score_grid_prefix6                                 ../10.multigrid/4qmz.resid_087
multigrid_score_grid_prefix7                                 ../10.multigrid/4qmz.resid_088
multigrid_score_grid_prefix8                                 ../10.multigrid/4qmz.resid_089
multigrid_score_grid_prefix9                                 ../10.multigrid/4qmz.resid_090
multigrid_score_grid_prefix10                                ../10.multigrid/4qmz.resid_091
multigrid_score_grid_prefix11                                ../10.multigrid/4qmz.resid_093
multigrid_score_grid_prefix12                                ../10.multigrid/4qmz.resid_097
multigrid_score_grid_prefix13                                ../10.multigrid/4qmz.resid_100
multigrid_score_grid_prefix14                                ../10.multigrid/4qmz.resid_139
multigrid_score_grid_prefix15                                ../10.multigrid/4qmz.resid_279
multigrid_score_grid_prefix16                                ../10.multigrid/4qmz.resid_280
multigrid_score_grid_prefix17                                ../10.multigrid/4qmz.resid_283
multigrid_score_grid_prefix18                                /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/10.multigrid/4qmz.resid_remaining
multigrid_score_fp_ref_mol                                   no
multigrid_score_fp_ref_text                                  yes
multigrid_score_footprint_text                               /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/09.footprint_rescore/4qmz.reference.txt
multigrid_score_use_euc                                      yes
multigrid_score_use_norm_euc                                 no
multigrid_score_use_cor                                      no
multigrid_vdw_euc_scale                                      1
multigrid_es_euc_scale                                       1
dock3.5_score_secondary                                      no
continuous_score_secondary                                   no
footprint_similarity_score_secondary                         no
ph4_score_secondary                                          no
descriptor_score_secondary                                   no
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_descriptor_score_secondary                              no
amber_score_secondary                                        no
minimize_ligand                                              yes
simplex_max_iterations                                       1000
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        yes
simplex_coefficient_restraint                                5.0
atom_model                                                   all
vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex.defn
flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex_drive.tbl
ligand_outfile_prefix                                        output
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no
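Typing the grid prefix lines by hand is error-prone. A sketch that generates the `multigrid_score_number_of_grids` and `multigrid_score_grid_prefixN` lines from 4qmz.primary_residues.dat is shown below; it assumes the grids are named `4qmz.resid_<residue>` plus `4qmz.resid_remaining`, as in the input file above. `make_grid_prefix_lines` is a hypothetical helper, and its output is meant to replace the corresponding hand-written lines.

```shell
#!/bin/bash
# Hypothetical helper: print the multigrid grid-count and grid-prefix lines for
# a DOCK input file from the zero-padded residue list, plus the final
# "remaining" grid. Keys are padded to the same column as the input file.
make_grid_prefix_lines() {
    local dat="$1" griddir="$2" i=0 res n
    n=$(grep -c . "$dat")
    printf '%-61s%s\n' "multigrid_score_number_of_grids" "$((n + 1))"
    # "|| [ -n ... ]" also handles a last line with no trailing newline.
    while read -r res || [ -n "$res" ]; do
        [ -z "$res" ] && continue
        printf '%-61s%s\n' "multigrid_score_grid_prefix${i}" "${griddir}/4qmz.resid_${res}"
        i=$((i + 1))
    done < "$dat"
    printf '%-61s%s\n' "multigrid_score_grid_prefix${i}" "${griddir}/4qmz.resid_remaining"
}
```

Usage: `make_grid_prefix_lines 4qmz.primary_residues.dat ../10.multigrid`. With the 18 primary residues of 4QMZ this prints a grid count of 19 followed by prefixes 0 through 18.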

Now that we have our input files, we can write the script that uses them to generate a grid file for each specified residue. Create a blank file named 4qmz.make_multigrids.qsub.sh in your 007.multigrid folder and copy the following into it:

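  #!/bin/bash
  # Work from the directory holding 4qmz.primary_residues.dat and the input files.
  cd /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/09.footprint_rescore
  # Space-separated residue list (sed cannot match "\n" within a line, so use tr).
  export PRIMARY_RES=$(tr '\n' ' ' < 4qmz.primary_residues.dat)
  export DOCKHOME="/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/"
  # Build the per-residue grids (left unquoted so each residue is a separate argument).
  python /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/multigrid_fp_gen.py 4qmz.rec.mol2 4qmz.resid 4qmz.multigrid.in ${PRIMARY_RES}
  # Clean up intermediate receptor files.
  rm temp.mol2
  rm 4qmz.resid_*.rec.grid.mol2
  # Minimize the crystal ligand on the multigrids and keep the result.
  /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/dock6.dn -i 4qmz.reference_multigridmin.in -o 4qmz.reference_multigridmin.out
  mv output_scored.mol2 4qmz.lig.multigridmin.mol2
  cp 4qmz.lig.multigridmin.mol2 ../10.multigrid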

Change the paths to DOCK and to your primary residue file if necessary, and make sure you are using a version of DOCK that includes the Denovo code. If you get an error such as "cannot stat *.nrg / *.bmp", check that the directories in both input files point to the right places. After the script runs, you should have many new files. For the 4QMZ system there should be 19 grids: one for each of the 18 primary residues and a 19th covering the rest of the residues. Each grid comes with four files: a .bmp file, a .mol2 file, a .nrg file, and a .out file. You should also have two other files: 4qmz.lig.multigridmin.mol2 and 4qmz.reference_multigridmin.out. Check the output file for errors and to confirm that everything ran to completion. Then visualize your minimized ligand in Chimera to make sure it contains atoms and looks like a real chemical structure. You should have something that looks like this:

 Image: 4qmz_multigrid_min.png (http://ringo.ams.sunysb.edu/index.php/File:4qmz_multigrid_min.png)
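Because missing grid files are the usual cause of "cannot stat *.nrg / *.bmp" errors, a check that every residue grid (plus the "remaining" grid) has both files can help narrow things down. This sketch assumes the grids are named `4qmz.resid_<residue>.bmp` / `.nrg`, following the grid prefixes in 4qmz.reference_multigridmin.in; `check_grid_files` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical check: every residue grid, plus the "remaining" grid, should
# have both a .bmp and a .nrg file in the grid directory.
check_grid_files() {
    local dat="$1" griddir="$2" missing=0 res ext
    for res in $(grep . "$dat") remaining; do
        for ext in bmp nrg; do
            if [ ! -s "${griddir}/4qmz.resid_${res}.${ext}" ]; then
                echo "missing: ${griddir}/4qmz.resid_${res}.${ext}" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}
```

Usage: `check_grid_files 4qmz.primary_residues.dat ../10.multigrid`, pointing the second argument at wherever your grids were written.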

Minimizing Ligand on the Grids

Running Denovo

Creating the Input File

Creating a script to submit to Seawulf

Viewing the Results

Things to Keep in Mind

Length of Denovo Calculations