De novo Developer Progress
This is the Rizzo lab wiki page for coordinating bugs and progress on the de novo project.
Version of the code on the cluster that the Rizzo lab should be using:
Lauren:
/gpfs/home/leprentis/dock6.8_merge_06.14.2017
This version includes the orienting-fragments fix and the rotatable-bond fix.
DOCK 6.8 merge: the de novo code and DOCK 6.8 were merged (/gpfs/home/leprentis/dock6.8_merge_06.14.2017)
Path to Generic Fragment Library:
/gpfs/projects/rizzo/leprentis/gen-frags-12
Path to Frequency Anchors:
/gpfs/projects/rizzo/leprentis/zinc1_ancs_freq
Current Coding Progress:
Working on these currently:
- Lauren: Run the 663 (5through15_ch2) systems with the merged DOCK 6.8 and compare the analysis against the de novo paper analysis.
- Lauren: Fix Frag_String output into Chimera for refinement situations
- Lauren: Implement roulette fragment picking as an option for the graph and random sampling methods
- Lauren: Change the scaling factor to a decay function (currently a straight line down to the lowest score cutoff)
- Lauren: Test the simple build function with the merged de novo code
- Lauren: Test on SB2012 with default values for the number of molecules passed on to the next layer and root (looking at timing and efficiency).
- Steven: Editing a script to calculate SMILES strings of de novo molecules in Open Babel
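Roulette (fitness-proportionate) fragment picking, as in the roulette item above, can be sketched in a few lines of Python. This is a minimal illustration, not the DOCK implementation: it assumes lower (more negative) scores are better and converts them to positive weights with a Boltzmann-like factor controlled by a `temperature` parameter. The function `roulette_pick` and this weighting scheme are hypothetical.

```python
import bisect
import itertools
import math
import random


def roulette_pick(scores, temperature=100.0, rng=random):
    """Return the index of one fragment chosen by roulette-wheel selection.

    Hypothetical sketch: lower scores are assumed better and are turned into
    positive weights exp(-(s - best) / T), so every fragment keeps a nonzero
    chance of being picked.
    """
    if not scores:
        raise ValueError("no fragments to pick from")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    best = min(scores)
    # Shift by the best score so the largest exponent is 0 (avoids overflow).
    weights = [math.exp(-(s - best) / temperature) for s in scores]
    # Cumulative weights mark the upper edge of each slice of the wheel.
    cumulative = list(itertools.accumulate(weights))
    spin = rng.random() * cumulative[-1]
    # First slice whose upper edge exceeds the spin; clamp guards float rounding.
    index = bisect.bisect_right(cumulative, spin)
    return min(index, len(scores) - 1)
```

For example, with scores `[-50.0, -40.0, -10.0]` and `temperature=10.0`, the weights are roughly 1, 0.37, and 0.02, so the first fragment is picked most often but the others are still occasionally chosen. Compared with always keeping the top-ranked fragments, this preserves some diversity; a higher temperature flattens the wheel.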
Need to be fixed:
- score_molecules and internal_energy problem (for simple_build)
- HMS needs to be fixed for cases where no heavy atoms match
- Remove some scoring functions and use only the descriptor score
Not working on these right now:
- Lauren: Adjacency matrix vs. torenv
- Lauren: Addition of "3mer" combination fragment check (post tors check)
- Lauren: Min and max charge limits to replace the absolute value of charge (Dwight's suggestion)
- Lauren: Capping groups for post growth process (halogens and methyls)
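The min/max charge item above replaces one symmetric limit with an explicit window. A minimal sketch, assuming the molecule's formal charge is available as a number; both function names are hypothetical and are not DOCK parameters.

```python
def passes_abs_charge(formal_charge, cutoff=2.0):
    """Absolute-value style check: |q| must not exceed one symmetric cutoff."""
    return abs(formal_charge) <= cutoff


def passes_charge_window(formal_charge, min_charge=-2.0, max_charge=2.0):
    """Window style check: q must lie in [min_charge, max_charge], which may be asymmetric."""
    if min_charge > max_charge:
        raise ValueError("min_charge must not exceed max_charge")
    return min_charge <= formal_charge <= max_charge
```

With a window of -1 to +2, a +2 molecule passes but a -2 molecule is rejected, which a single absolute-value cutoff cannot express.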
Completed:
- Dwight & Lauren: MPI wrapper for 192 processors (8 nodes) for test sets on the Rizzo cluster
- Lauren: Implement csingleton fix for orienting fragments with fewer than 3 heavy atoms
- Lauren: Test bfochtman fix for rotatable bonds within a user-defined anchor
List of features that we definitely want for the 6.8 release:
| Task | Owner | Complete? | 
|---|---|---|
| Smooth pruning scaling function | LEP | |
| Roulette function to Random and Graph as an option | LEP | |
| Overhaul the simple build function | LEP | |
| When minimizing with descriptor score, make sure fingerprint is turned off | xxx | |
| Speed up fingerprint calculations by saving reference ligand as a permanent object | WJA | yep | 
| Add pre-min conformations to growth trees | WJA | yep | 
| Add verbose flag options | WJA | yep | 
| Put molecular properties (RB, MW, etc) in mol2 header | WJA | yep | 
| Put ensemble properties (RB, MW, etc) output stream at the end of each layer | WJA | yep | 
| Check formal charge prune | BCF | yep | 
| Combination of horizontal pruning metrics (let's consider dropping tanimoto prune and just using hungarian prune) | WJA | yep | 
| Finish implementing growth trees | WJA | yep | 
| Revisit orienting to make sure it is working as intended | WJA | yep | 
| Fixed a bug where we were marking scaffold_this_layer as true for any fragment | WJA | yep | 
| Update random sampling function to use last layer changes in graph function | WJA | yep | 
| Do the same for the exhaustive function | WJA | yep |
| I don't think we ever clear the scaf_link_sid vector; we definitely should do that somewhere | WJA | yep |
| Update exhaustive to combine all frags into one library, just like graph / random. | WJA | yep | 
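For the "Smooth pruning scaling function" task in the table above, one option is to compare a straight-line ramp of the cutoff across layers with an exponential decay that reaches the same end point. This is a hedged sketch: the exact DOCK formula is not given on this page, so `linear_cutoff`, `exp_decay_cutoff`, and the `rate` parameter are all hypothetical. It assumes the cutoff starts at a `start` value in the first layer and ends at a `lowest` value in the last layer.

```python
import math


def linear_cutoff(layer, n_layers, start, lowest):
    """Straight-line ramp from `start` (first layer) to `lowest` (last layer)."""
    if n_layers < 2:
        return lowest
    frac = layer / (n_layers - 1)
    return start + (lowest - start) * frac


def exp_decay_cutoff(layer, n_layers, start, lowest, rate=1.0):
    """Exponential decay from `start` to `lowest`, reaching `lowest` exactly at
    the last layer. Larger `rate` tightens the cutoff in earlier layers;
    rate == 0 reduces to the straight line."""
    if n_layers < 2:
        return lowest
    frac = layer / (n_layers - 1)
    if rate == 0:
        weight = 1.0 - frac
    else:
        # Normalized so weight == 1 at the first layer and 0 at the last.
        weight = (math.exp(-rate * frac) - math.exp(-rate)) / (1.0 - math.exp(-rate))
    return lowest + (start - lowest) * weight
```

With 9 layers, `start=100.0`, and `lowest=0.0`, the linear ramp gives 50.0 at the middle layer while `rate=3.0` gives about 18.2, so most of the tightening happens early and the curve flattens out toward the last layers instead of dropping at a constant rate.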
List of features/ideas for future releases:
- Stereo centers / volume overlap pruning
- Capping group functions (H, CH3, Halogen)
- Incorporate GA at the end of each layer
- Overhaul the simple-build function
- Monte carlo algorithm that checks bond frequency
- Scaling max root / layer size with layer
- Select the torenv before selecting the fragment. This will require overhauling fraggraph, but it will keep us from assembling molecules that will not pass the torenv check.
- Add fragname string to restart and dump files, already done for final and fraglib files.
- Add ZINC name to torenv table
- Unusual behavior during library generation when frequency cutoff == 0
- Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
- building from anchor 0 -> building from scf.98
- Possible torenv check for dump molecules after capping before printing.
- Keep tables of what fragments (and torsion types) are already included in a growing molecule (i.e., the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to avoid over- or under-including certain fragment and bond types (essentially, use data mining to build molecules only within certain boundaries).
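The last idea above (tracking which fragments a growing molecule already contains and accepting a new one only within certain ranges) could start as simple per-fragment count limits. This is a minimal sketch. The fragment names, the limit table, and the `accept_fragment` helper are hypothetical, and recovering the fragment list from the name string is assumed rather than shown. A data-mined version would fill `max_counts` (or acceptance probabilities) from a reference set.

```python
from collections import Counter


def accept_fragment(current_frags, new_frag, max_counts, default_max=None):
    """Return True if adding `new_frag` keeps its count within its limit.

    current_frags: fragment names already in the growing molecule (assumed to
        be parsed from the fragment name string; that step is not shown).
    max_counts: hypothetical table of the maximum allowed copies per fragment.
    default_max: limit for fragments missing from the table (None = no limit).
    """
    limit = max_counts.get(new_frag, default_max)
    if limit is None:
        return True
    counts = Counter(current_frags)
    # Accept only if one more copy still fits under the limit.
    return counts[new_frag] + 1 <= limit
```

For example, with `max_counts = {"benzene": 2}`, a molecule that already contains two benzene fragments would reject a third, while any fragment not in the table is accepted unless `default_max` is set.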
List of SB2012 systems that we will use for tests:
For now, let's use 5-15 rotatable bonds inclusive; total = 709 systems ("drug-like" size molecules). The de novo paper used only 663 systems, removing the 46 systems whose cognate ligand did not fall within a +/-2 formal charge. (5through15 = 709, 5through15_ch2 = 663)
{5RB = 107; 6RB = 96; 7RB = 103; 8RB = 75; 9RB = 66; 10RB = 75; 11RB = 57; 12RB = 41; 13RB = 38; 14RB = 26; 15RB = 25}
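As a quick consistency check on the counts above, the per-rotatable-bond numbers sum to the 709-system 5through15 set, and removing the 46 charge-filtered systems leaves the 663 systems in 5through15_ch2.

```python
# Number of SB2012 systems per rotatable-bond (RB) count, from the list above.
systems_per_rb = {
    5: 107, 6: 96, 7: 103, 8: 75, 9: 66, 10: 75,
    11: 57, 12: 41, 13: 38, 14: 26, 15: 25,
}

total_5through15 = sum(systems_per_rb.values())
removed_by_charge = 46  # cognate ligand outside the +/-2 formal charge window
total_5through15_ch2 = total_5through15 - removed_by_charge

print(total_5through15, total_5through15_ch2)  # 709 663
```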
The input file we should be using:
Note: these are currently all the defaults (except for the scoring function). Once we figure out the best set of parameters for both (1) focused and (2) generic libraries, we need to update the defaults accordingly.
conformer_search_type denovo
dn_fraglib_scaffold_file fraglib_scaffold.mol2
dn_fraglib_linker_file fraglib_linker.mol2
dn_fraglib_sidechain_file fraglib_sidechain.mol2
dn_user_specified_anchor yes
dn_fraglib_anchor_file fraglib_anchor.mol2
dn_use_torenv_table yes
dn_torenv_table torenv_table.dat
dn_sampling_method graph
dn_graph_starting_points 10
dn_graph_breadth 5
dn_graph_depth 2
dn_graph_temperature 100.0
dn_pruning_conformer_score_cutoff 100.0
dn_pruning_conformer_score_scaling_factor 1
dn_pruning_clustering_cutoff 100.0
dn_constraint_mol_wt 1000.0
dn_constraint_rot_bon 15
dn_constraint_formal_charge 2.0
dn_heur_unmatched_num 1
dn_heur_matched_rmsd 2.0
dn_unique_anchors 3
dn_max_grow_layers 9
dn_max_root_size 100
dn_max_layer_size 100
dn_max_current_aps 5
dn_max_scaffolds_per_layer 1
dn_write_checkpoints yes
dn_write_prune_dump no
dn_write_orients no
dn_write_growth_trees no
dn_output_prefix output
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
use_database_filter no
orient_ligand yes
automated_matching yes
receptor_site_file receptor.sph
max_orientations 1000
critical_points no
chemical_matching no
use_ligand_spheres no
bump_filter no
score_molecules yes
contact_score_primary no
contact_score_secondary no
grid_score_primary no
grid_score_secondary no
multigrid_score_primary no
multigrid_score_secondary no
dock3.5_score_primary no
dock3.5_score_secondary no
continuous_score_primary no
continuous_score_secondary no
footprint_similarity_score_primary no
footprint_similarity_score_secondary no
ph4_score_primary no
ph4_score_secondary no
descriptor_score_primary yes
descriptor_score_secondary no
descriptor_use_grid_score no
descriptor_use_multigrid_score yes
descriptor_use_pharmacophore_score no
descriptor_use_tanimoto yes
descriptor_use_hungarian yes
descriptor_multigrid_score_rep_rad_scale 1
descriptor_multigrid_score_vdw_scale 1
descriptor_multigrid_score_es_scale 1
...
descriptor_multigrid_score_number_of_grids N
descriptor_multigrid_score_grid_prefix0 grid0
descriptor_multigrid_score_grid_prefixN-2 gridN
descriptor_multigrid_score_grid_prefixN-1 grid_remaining
...
descriptor_multigrid_score_fp_ref_mol yes
descriptor_multigrid_score_footprint_ref ../001.files/8ABP.lig.multigridmin.mol2
descriptor_multigrid_score_use_euc yes
descriptor_multigrid_score_use_norm_euc no
descriptor_multigrid_score_use_cor no
descriptor_multigrid_vdw_euc_scale 1
descriptor_multigrid_es_euc_scale 1
descriptor_fingerprint_ref_filename ../001.files/8ABP.lig.multigridmin.mol2
descriptor_hungarian_ref_filename ../001.files/8ABP.lig.am1bcc.mol2
descriptor_hungarian_matching_coeff -5
descriptor_hungarian_rmsd_coeff 1
descriptor_weight_multigrid_score 1
descriptor_weight_fingerprint_tanimoto 0
descriptor_weight_hungarian 0
gbsa_zou_score_secondary no
gbsa_hawkins_score_secondary no
SASA_descriptor_score_secondary no
amber_score_secondary no
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file vdw.defn
flex_defn_file flex.defn
flex_drive_file flex_drive.tbl
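Because the input file is a flat list of `parameter value` pairs, a small script can load it into a dictionary and diff it against another input, for example when updating the defaults as described in the note above. This is a hedged sketch, not part of DOCK. It assumes each pair sits on its own line, with the value being everything after the first whitespace, and it skips blank lines and bare `...` placeholder lines. `parse_dock_input` and `diff_inputs` are hypothetical names.

```python
def parse_dock_input(text):
    """Parse `parameter value` lines into a dict (hypothetical helper).

    Assumes one pair per line; the value is everything after the first run of
    whitespace. Blank lines and bare "..." placeholder lines are skipped.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[key] = value
    return params


def diff_inputs(old, new):
    """Return {parameter: (old_value, new_value)} for every parameter that differs.

    A parameter present in only one input shows None on the other side.
    """
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

For instance, diffing an input with `dn_max_root_size 100` against one with `dn_max_root_size 50` reports `{"dn_max_root_size": ("100", "50")}`, which makes it easy to see exactly which defaults a test run changed.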
