Difference between revisions of "2017 Denovo design tutorial 2 with PDB 4QMZ"
Stonybrook (→Generating the Grids)
Stonybrook (→Minimizing Ligand on the Grids)
Revision as of 10:27, 28 March 2017
2017 Denovo design tutorial 2 with PDB 4QMZ
The Denovo module of DOCK, a relatively new feature as of Fall 2016, constructs new ligand molecules inside a protein active site from a library of user-specified "fragments." These novel ligand molecules are scored according to the scoring algorithms and criteria the user specifies. The fragments are common chemical functional groups (building blocks), typically selected from a ZINC library of millions of compounds based on how frequently they appear. They are classified as scaffolds, linkers, or side chains according to the number of atomic positions that are permitted to seed growth: 3, 2, and 1, respectively. Thus, a scaffold can seed growth from three different atoms (for example, with a linker bonded at each position), a linker from two positions, and a side chain from one position. Once the molecules are built within the active site, their interactions with the protein are scored using the user-specified scoring method.
This tutorial walks through the steps needed to run a Denovo calculation on the 4QMZ system from the 2017 DOCK tutorial. The method uses the multigrid scoring function, called through the descriptor score. Users are encouraged to run through the traditional DOCK tutorial for the 4QMZ system first, as many of its files are recycled for the Denovo experiments; ensure you have all the folders and files that tutorial produces. Before running the calculation, it is worth reading the "Things to Keep in Mind" section at the bottom for some good pieces of information.
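As a small illustration of the fragment classes described above, the sketch below maps a number of growth positions to a class name. The function name classify_fragment is hypothetical and is not part of DOCK; it only restates the 3/2/1 rule from the text.

```shell
#!/bin/bash
# Hypothetical helper (not part of DOCK): map the number of positions
# that can seed growth to the fragment class named in the tutorial.
classify_fragment() {
    case "$1" in
        1) echo "sidechain" ;;
        2) echo "linker" ;;
        3) echo "scaffold" ;;
        *) echo "unknown" ;;
    esac
}

# Print the class for each allowed number of growth positions
for n in 1 2 3; do
    echo "$n growth position(s): $(classify_fragment "$n")"
done
```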
Additional Files Needed
To run the Denovo code with multigrid scoring you need these files:
fraglib_scaffold.mol2 <-- LIRed
fraglib_linker.mol2 <-- LIRed
fraglib_sidechain.mol2 <-- LIRed
anchor_library.mol2 <-- LIRed
fraglib_torenv.dat <-- LIRed
selected_spheres.sph <-- make your own
primary_residues_multigrid.bmp / .nrg <-- make your own
multigrid_minimized_ligand.mol2 <-- make your own
vdw_AMBER_parm99.defn <-- needed for regular dock
flex.defn <-- needed for regular dock
flex_drive.tbl <-- needed for regular dock
vdw_DumHyd.defn (needed for scoring functions other than multigrid) <-- LIRed
The fragment libraries and parameter files must be obtained prior to the Denovo calculation, and can be found on LIRed through the paths:
/gpfs/home/guest43/scratch/denovo/trial_denovo/000.fraglib
/trial_denovo/zzz.parameters/
Everything else is generated through this tutorial, prior to running the Denovo code.
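Before going further, it may help to confirm that the library and parameter files are in place. The following is a minimal sketch, assuming you have copied the files into a single directory; missing_files and WORKDIR are hypothetical names introduced here, and the file names are the ones from the list above that are not generated by this tutorial.

```shell
#!/bin/bash
# Hypothetical helper: print each required file that is absent from a directory.
# Usage: missing_files DIR FILE...
missing_files() {
    local dir="$1"; shift
    local f
    for f in "$@"; do
        [ -e "$dir/$f" ] || echo "$f"
    done
}

# Files that must be obtained before the run (names from the list above)
REQUIRED="fraglib_scaffold.mol2 fraglib_linker.mol2 fraglib_sidechain.mol2
anchor_library.mol2 fraglib_torenv.dat vdw_AMBER_parm99.defn flex.defn flex_drive.tbl"

# WORKDIR is an assumption: the directory you copied everything into
WORKDIR="${WORKDIR:-.}"
missing=$(missing_files "$WORKDIR" $REQUIRED)
if [ -n "$missing" ]; then
    echo "Missing in $WORKDIR:"
    echo "$missing"
else
    echo "All listed files found in $WORKDIR"
fi
```

If your fragment libraries and parameter files live in separate directories, call missing_files once per directory instead.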
Preparing The Files
Before running Denovo on 4QMZ, please ensure you have gone through the DOCK 2017 tutorial and have all the resulting files. The tutorial can be accessed through here. You should have these files in your directory:
4qmz.pdb
4qmz.lig.mol2
4qmz.rec.clean.mol2
4qmz.rec.noH.mol2
selected_spheres.sph
Additionally, you will also need these parameter files:
vdw_AMBER_parm99.defn
flex.defn
flex_drive.tbl
In order to run Denovo with multigrid scoring, we must first go through several steps:
1). Create a primary residue text file and a reference text file -- this selects the primary residues of interest.
2). Make a multigrid file for each specified residue -- this forms a grid for each residue chosen in the previous step.
3). Minimize the ligand mol2 file using the multigrids from the previous step.
4). Rescore the ligand on the multigrid to yield a minimized ligand .mol2 file, which serves as the reference ligand for Denovo calculations.
Luckily, our good friend Brian wrote some extremely robust scripts to make this process easier. There is one script for each step, but here we will only use the simple input files for DOCK. If you are interested in using the scripts (expect a lot of debugging), they can be found on LIRed under: /gpfs/home/guest43/scratch/denovo/trial_denovo/run/ .
Dock Specifying Primary Residues
Create a directory within your working directory titled 008.footprint_rescore. This is where all pertinent files from this step will go, and where we will run our calculation from. The input file for this step should be titled 4qmz.footprint_rescore.in, and should look like (substitute your own directory path ~/your/own/directory/01.dockprep/4qmz.lig.mol2) :
conformer_search_type rigid
use_internal_energy no
ligand_atom_file /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary no
multigrid_score_secondary no
dock3.5_score_primary no
dock3.5_score_secondary no
continuous_score_primary no
continuous_score_secondary no
footprint_similarity_score_primary yes
footprint_similarity_score_secondary no
fps_score_use_footprint_reference_mol2 yes
fps_score_footprint_reference_mol2_filename /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
fps_score_foot_compare_type Euclidean
fps_score_normalize_foot no
fps_score_foot_comp_all_residue no
fps_score_choose_foot_range_type threshold
fps_score_vdw_threshold 1
fps_score_es_threshold 0.5
fps_score_hb_threshold 0.5
fps_score_use_remainder yes
fps_score_receptor_filename /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.rec.mol2
fps_score_vdw_att_exp 6
fps_score_vdw_rep_exp 12
fps_score_vdw_rep_rad_scale 1
fps_score_use_distance_dependent_dielectric yes
fps_score_dielectric 4.0
fps_score_vdw_fp_scale 1
fps_score_es_fp_scale 1
fps_score_hb_fp_scale 0
pharmacophore_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand no
atom_model all
vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex.defn
flex_drive_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex_drive.tbl
ligand_outfile_prefix output
write_footprints yes
write_hbonds no
write_orientations no
num_scored_conformers 1
rank_ligands no
This calculation should be very quick (~10 seconds) and result in three output files:
4qmz.footprint_rescore.out
output_footprint_scored.txt
output_scored.mol2
Now, we must declare the primary residues in the active site and generate a grid file for each. Create a new file in the text editor named 4qmz.primary_residues.sh and write this inside of it (copied from Brian's script *.fpsrescore.qsub.sh):
#!/bin/bash
# Pull the residue numbers listed after "range_union" in the DOCK output (the file produced in the previous step)
grep -A 1 "range_union" 4qmz.footprint_rescore.out | grep -v "range_union" | grep -v "\-" | sed -e '{s/,/\n/g}' | sed -e '{s/ //g}' | sed '/^$/d' | sort -n | uniq > temp.dat
# Zero-pad each residue number to three digits
for i in `cat temp.dat`; do printf "%0*d\n" 3 $i; done > 4qmz.primary_residues.dat
# Collect the footprint entries for each primary residue
for RES in `cat temp.dat`
do
grep " ${RES} " output_footprint_scored.txt | awk -v temp=${RES} '{if ($2 == temp) print $0;}' | awk '{print $1 " " $3 " " $4}' >> reference.txt
done
# Append the remainder entry covering all other residues
grep "remainder" output_footprint_scored.txt | sed -e '{s/,/ /g}' | tr -d '\n' | awk '{print $2 " " $3 " " $6}' >> reference.txt
mv reference.txt 4qmz.reference.txt
rm temp.dat
Run the script and you should have two new files:
4qmz.primary_residues.dat
4qmz.reference.txt
These are our primary residues! Now we need to generate a grid for each one.
Generating the Grids
We must now generate a grid file for each residue. To do so, we will need the aid of another one of Brian's scripts: 4qmz.make_multigrids.qsub.sh. But before we can use his script, we need to generate two input files for Dock which will be called upon by the script. Create a file named 4qmz.multigrid.in inside your 007.multigrid folder with the following inside it:
compute_grids yes
grid_spacing 0.4
output_molecule yes
contact_score no
chemical_score no
energy_score yes
energy_cutoff_distance 9999
atom_model a
attractive_exponent 6
repulsive_exponent 9
distance_dielectric yes
dielectric_factor 4
bump_filter yes
bump_overlap 0.75
receptor_file temp.mol2
box_file ../03.box-grid/4qmz.box.pdb
vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
chemical_definition_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/chem.defn
score_grid_prefix temp.rec
receptor_out_file temp.rec.grid.mol2
Additionally, create a file named 4qmz.reference_multigridmin.in (the name that the script further down passes to DOCK):
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd yes
use_rmsd_reference_mol yes
rmsd_reference_filename /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary yes
multigrid_score_secondary no
multigrid_score_rep_rad_scale 1
multigrid_score_vdw_scale 1
multigrid_score_es_scale 1
multigrid_score_number_of_grids 19
multigrid_score_grid_prefix0 ../10.multigrid/4qmz.resid_017
multigrid_score_grid_prefix1 ../10.multigrid/4qmz.resid_018
multigrid_score_grid_prefix2 ../10.multigrid/4qmz.resid_019
multigrid_score_grid_prefix3 ../10.multigrid/4qmz.resid_026
multigrid_score_grid_prefix4 ../10.multigrid/4qmz.resid_039
multigrid_score_grid_prefix5 ../10.multigrid/4qmz.resid_071
multigrid_score_grid_prefix6 ../10.multigrid/4qmz.resid_087
multigrid_score_grid_prefix7 ../10.multigrid/4qmz.resid_088
multigrid_score_grid_prefix8 ../10.multigrid/4qmz.resid_089
multigrid_score_grid_prefix9 ../10.multigrid/4qmz.resid_090
multigrid_score_grid_prefix10 ../10.multigrid/4qmz.resid_091
multigrid_score_grid_prefix11 ../10.multigrid/4qmz.resid_093
multigrid_score_grid_prefix12 ../10.multigrid/4qmz.resid_097
multigrid_score_grid_prefix13 ../10.multigrid/4qmz.resid_100
multigrid_score_grid_prefix14 ../10.multigrid/4qmz.resid_139
multigrid_score_grid_prefix15 ../10.multigrid/4qmz.resid_279
multigrid_score_grid_prefix16 ../10.multigrid/4qmz.resid_280
multigrid_score_grid_prefix17 ../10.multigrid/4qmz.resid_283
multigrid_score_grid_prefix18 /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/10.multigrid/4qmz.resid_remaining
multigrid_score_fp_ref_mol no
multigrid_score_fp_ref_text yes
multigrid_score_footprint_text /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/09.footprint_rescore/4qmz.reference.txt
multigrid_score_use_euc yes
multigrid_score_use_norm_euc no
multigrid_score_use_cor no
multigrid_vdw_euc_scale 1
multigrid_es_euc_scale 1
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
ph4_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_descriptor_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 5.0
atom_model all
vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex.defn
flex_drive_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex_drive.tbl
ligand_outfile_prefix output
write_orientations no
num_scored_conformers 1
rank_ligands no
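The grid-prefix block in this input depends on your own primary residue list, so for a different system it has to be rewritten by hand. The sketch below shows one way to generate it from a residue file. It assumes the file holds one zero-padded residue number per line (the format 4qmz.primary_residues.dat is written in) and that the grids follow the 4qmz.resid_NNN naming used above, with one extra grid for the remaining residues. The function name grid_prefix_lines is hypothetical, and the output uses a single space between keyword and value.

```shell
#!/bin/bash
# Hypothetical helper: print the multigrid prefix block for a residue list.
# Usage: grid_prefix_lines RESIDUE_FILE [GRID_DIR]
grid_prefix_lines() {
    local resfile="$1" dir="${2:-.}"
    local n
    # Count non-empty lines; the "remaining" grid adds one more
    n=$(grep -c . "$resfile")
    echo "multigrid_score_number_of_grids $((n + 1))"
    awk -v dir="$dir" '
        NF  { printf "multigrid_score_grid_prefix%d %s/4qmz.resid_%s\n", i++, dir, $1 }
        END { printf "multigrid_score_grid_prefix%d %s/4qmz.resid_remaining\n", i, dir }
    ' "$resfile"
}

# Demonstration on a three-residue list written to a temporary file
demo=$(mktemp)
printf '017\n018\n019\n' > "$demo"
grid_prefix_lines "$demo" ../10.multigrid
rm -f "$demo"
```

Paste the printed lines into the input file in place of the hand-written block, then compare the grid count against the number of lines in your residue file plus one.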
Now that we have our input files, we can form the script that will call upon them to generate the grid files for each specified residue. Create a blank file named 4qmz.make_multigrids.qsub.sh in your 007.multigrid folder. Then transcribe into it:
cd /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/09.footprint_rescore
export PRIMARY_RES=` cat 4qmz.primary_residues.dat | sed -e 's/\n/ /g' `
export DOCKHOME="/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/"
python /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/multigrid_fp_gen.py 4qmz.rec.mol2 4qmz.resid 4qmz.multigrid.in ${PRIMARY_RES}
rm temp.mol2
rm 4qmz.resid_*.rec.grid.mol2
/gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/bin/dock6.dn -i 4qmz.reference_multigridmin.in -o 4qmz.reference_multigridmin.out
mv output_scored.mol2 4qmz.lig.multigridmin.mol2
cp 4qmz.lig.multigridmin.mol2 ../10.multigrid
Change the path to DOCK and to your primary residue file if necessary, and ensure you are using a version of DOCK that includes the Denovo code. If you get an error that says something like "cannot stat *.nrg / *.bmp", check that the directories in your two input files all point to the right places. After running this script, you should have a large number of new files. If you are running on the 4qmz system, you should have 19 grids: 18 for individual residues, and a 19th containing the grid for the rest of the residues. Each of the 19 has four files: a .bmp file, a .mol2 file, a .nrg file, and a .out file. Additionally you should have two other files: 4qmz.lig.multigridmin.mol2 and 4qmz.reference_multigridmin.out. Check your output file for any errors and to make sure everything ran to completion. Visualize your ligand in Chimera to make sure it contains atoms and looks like a real chemical structure. You should have something that looks like this:
In addition to ensuring the ligand still seems reasonable, it may be worthwhile and interesting to visualize the minimized ligand with the primary residues to create a distilled down active site like this (ligand is highlighted green for ease of visualization):
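To confirm that the grid generation in this section produced a complete set of grids, a quick file check can save a failed DOCK run later. This is a sketch only: missing_grids is a hypothetical name, and it assumes each grid is stored as PREFIX.nrg and PREFIX.bmp with the 4qmz.resid_NNN prefixes used in the input files above, plus 4qmz.resid_remaining.

```shell
#!/bin/bash
# Hypothetical helper: list grid files that are absent or empty.
# Usage: missing_grids GRID_DIR RESIDUE_FILE
missing_grids() {
    local dir="$1" resfile="$2" res ext
    # Every primary residue plus the "remaining" grid needs both files
    for res in $(cat "$resfile") remaining; do
        for ext in nrg bmp; do
            [ -s "$dir/4qmz.resid_$res.$ext" ] || echo "4qmz.resid_$res.$ext"
        done
    done
}

# Run only if the residue list from the footprint step is present here
if [ -f 4qmz.primary_residues.dat ]; then
    missing_grids . 4qmz.primary_residues.dat
fi
```

No output means every expected .nrg and .bmp file was found and is non-empty; pass your multigrid directory as the first argument if the grids are stored elsewhere.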
Minimizing Ligand on the Grids
We're taking the input here straight from Brian's script for this part (run.003e.mg_rescore.sh). Create an input file named 4qmz.parents_multigridmin.in with this inside it:
conformer_search_type rigid
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
ligand_atom_file /gpfs/home/stelehany/rizzo_rot_research/dock_tutorial/01.dockprep/4qmz.lig.mol2
limit_max_ligands no
skip_molecule no
read_mol_solvation no
calculate_rmsd no
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary yes
multigrid_score_secondary no
multigrid_score_rep_rad_scale 1
multigrid_score_vdw_scale 1
multigrid_score_es_scale 1
multigrid_score_number_of_grids 19
multigrid_score_grid_prefix0 4qmz.resid_017
multigrid_score_grid_prefix1 4qmz.resid_018
multigrid_score_grid_prefix2 4qmz.resid_019
multigrid_score_grid_prefix3 4qmz.resid_026
multigrid_score_grid_prefix4 4qmz.resid_039
multigrid_score_grid_prefix5 4qmz.resid_071
multigrid_score_grid_prefix6 4qmz.resid_087
multigrid_score_grid_prefix7 4qmz.resid_088
multigrid_score_grid_prefix8 4qmz.resid_089
multigrid_score_grid_prefix9 4qmz.resid_090
multigrid_score_grid_prefix10 4qmz.resid_091
multigrid_score_grid_prefix11 4qmz.resid_093
multigrid_score_grid_prefix12 4qmz.resid_097
multigrid_score_grid_prefix13 4qmz.resid_100
multigrid_score_grid_prefix14 4qmz.resid_139
multigrid_score_grid_prefix15 4qmz.resid_279
multigrid_score_grid_prefix16 4qmz.resid_280
multigrid_score_grid_prefix17 4qmz.resid_283
multigrid_score_grid_prefix18 4qmz.resid_remaining
multigrid_score_fp_ref_mol no
multigrid_score_fp_ref_text yes
multigrid_score_footprint_text 4qmz.reference.txt
multigrid_score_foot_compare_type Euclidean
multigrid_score_normalize_foot no
multigrid_score_vdw_euc_scale 1.0
multigrid_score_es_euc_scale 1.0
dock3.5_score_secondary no
continuous_score_secondary no
footprint_similarity_score_secondary no
pharmacophore_score_secondary no
descriptor_score_secondary no
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_score_secondary no
amber_score_secondary no
minimize_ligand yes
simplex_max_iterations 1000
simplex_tors_premin_iterations 0
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_random_seed 0
simplex_restraint_min yes
simplex_coefficient_restraint 5.0
atom_model all
vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex.defn
flex_drive_file /gpfs/projects/AMS536/zzz.programs/dn_dock.6.7/parameters/flex_drive.tbl
ligand_outfile_prefix output
write_footprints no
write_orientations no
num_scored_conformers 1
rank_ligands no
After running this with dock6 you should have an output file (which should be checked for errors, as always) and a .mol2 file named output_scored.mol2. Rename this to 4qmz.parents_multigridmin.mol2, and visualize it in Chimera, to ensure you still have a realistic molecule. This is the mol2 file of the ligand minimized using the multigrid scoring. This will serve as our reference molecule for guided growth!
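The rename and a first sanity check can be scripted before opening Chimera. The sketch below is an assumption-laden convenience, not part of the tutorial's scripts: mol2_atom_count is a hypothetical helper that reads the atom count from the line two below the first @<TRIPOS>MOLECULE record, which is where the Tripos mol2 format stores it. A count of zero or an empty result suggests the minimization did not write a usable molecule.

```shell
#!/bin/bash
# Hypothetical helper: print the atom count declared in the first
# @<TRIPOS>MOLECULE record of a mol2 file (name line, then counts line).
mol2_atom_count() {
    awk '/^@<TRIPOS>MOLECULE/ { getline; getline; print $1; exit }' "$1"
}

# Rename the DOCK output as described above, then report its atom count
if [ -f output_scored.mol2 ]; then
    mv output_scored.mol2 4qmz.parents_multigridmin.mol2
    echo "atoms: $(mol2_atom_count 4qmz.parents_multigridmin.mol2)"
fi
```

This only checks that atoms were written; it does not replace inspecting the geometry in Chimera.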