De novo Developer Progress

The input file we should be using:

Note: these are currently all the defaults (except for the scoring function). Once we determine the best set of parameters for both (1) focused and (2) generic libraries, we need to update the defaults accordingly.

conformer_search_type                                        denovo
dn_fraglib_scaffold_file                                     fraglib_scaffold.mol2
dn_fraglib_linker_file                                       fraglib_linker.mol2
dn_fraglib_sidechain_file                                    fraglib_sidechain.mol2
dn_user_specified_anchor                                     yes
dn_fraglib_anchor_file                                       fraglib_anchor.mol2
dn_use_torenv_table                                          yes
dn_torenv_table                                              torenv_table.dat
dn_sampling_method                                           graph
dn_graph_starting_points                                     10
dn_graph_breadth                                             5
dn_graph_depth                                               2
dn_graph_temperature                                         100.0
dn_pruning_conformer_score_cutoff                            100.0
dn_pruning_conformer_score_scaling_factor                    1
dn_pruning_clustering_cutoff                                 100.0
dn_constraint_mol_wt                                         1000.0
dn_constraint_rot_bon                                        15
dn_constraint_formal_charge                                  2.0
dn_heur_unmatched_num                                        1
dn_heur_matched_rmsd                                         2.0
dn_unique_anchors                                            3
dn_max_grow_layers                                           9
dn_max_root_size                                             100
dn_max_layer_size                                            100
dn_max_current_aps                                           5
dn_max_scaffolds_per_layer                                   1
dn_write_checkpoints                                         yes
dn_write_prune_dump                                          no
dn_write_orients                                             no
dn_write_growth_trees                                        no
dn_output_prefix                                             output
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100.0
use_database_filter                                          no
orient_ligand                                                yes
automated_matching                                           yes
receptor_site_file                                           receptor.sph
max_orientations                                             1000
critical_points                                              no
chemical_matching                                            no
use_ligand_spheres                                           no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           no
grid_score_secondary                                         no
multigrid_score_primary                                      no
multigrid_score_secondary                                    no
dock3.5_score_primary                                        no
dock3.5_score_secondary                                      no
continuous_score_primary                                     no
continuous_score_secondary                                   no
footprint_similarity_score_primary                           no
footprint_similarity_score_secondary                         no
ph4_score_primary                                            no
ph4_score_secondary                                          no
descriptor_score_primary                                     yes
descriptor_score_secondary                                   no
descriptor_use_grid_score                                    no
descriptor_use_multigrid_score                               yes
descriptor_use_pharmacophore_score                           no
descriptor_use_tanimoto                                      yes
descriptor_use_hungarian                                     yes
descriptor_multigrid_score_rep_rad_scale                     1
descriptor_multigrid_score_vdw_scale                         1
descriptor_multigrid_score_es_scale                          1
...
descriptor_multigrid_score_number_of_grids                   N
descriptor_multigrid_score_grid_prefix0                      grid0
descriptor_multigrid_score_grid_prefixN-2                    gridN
descriptor_multigrid_score_grid_prefixN-1                    grid_remaining
...
descriptor_multigrid_score_fp_ref_mol                        yes
descriptor_multigrid_score_footprint_ref                     ../001.files/8ABP.lig.multigridmin.mol2
descriptor_multigrid_score_use_euc                           yes
descriptor_multigrid_score_use_norm_euc                      no
descriptor_multigrid_score_use_cor                           no
descriptor_multigrid_vdw_euc_scale                           1
descriptor_multigrid_es_euc_scale                            1
descriptor_fingerprint_ref_filename                          ../001.files/8ABP.lig.multigridmin.mol2
descriptor_hungarian_ref_filename                            ../001.files/8ABP.lig.am1bcc.mol2
descriptor_hungarian_matching_coeff                          -5
descriptor_hungarian_rmsd_coeff                              1
descriptor_weight_multigrid_score                            1
descriptor_weight_fingerprint_tanimoto                       0
descriptor_weight_hungarian                                  0
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
SASA_descriptor_score_secondary                              no
amber_score_secondary                                        no
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          0
simplex_random_seed                                          0
simplex_restraint_min                                        no
atom_model                                                   all
vdw_defn_file                                                vdw.defn
flex_defn_file                                               flex.defn
flex_drive_file                                              flex_drive.tbl
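
When comparing input files against the intended defaults, it can help to load the parameters into a dictionary first. The following is a minimal sketch, assuming each line holds one whitespace-separated `parameter value` pair as in the listing above, and that `...` lines are placeholders to skip. The function names `parse_dock_input` and `active_primary_scores` are hypothetical helpers for illustration, not part of DOCK.

```python
def parse_dock_input(text):
    """Parse 'parameter value' lines into a dict, keeping file order."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the '...' placeholders used in the listing.
        if not line or line == "...":
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = value
    return params


def active_primary_scores(params):
    """Return the names of all *_score_primary parameters set to 'yes'."""
    return [
        name
        for name, value in params.items()
        if name.endswith("_score_primary") and value == "yes"
    ]


# A short excerpt of the listing, used here only as sample input.
EXAMPLE = """\
conformer_search_type                    denovo
grid_score_primary                       no
descriptor_score_primary                 yes
descriptor_weight_multigrid_score        1
"""

if __name__ == "__main__":
    params = parse_dock_input(EXAMPLE)
    print(active_primary_scores(params))
```

Checking that exactly one primary scoring function is enabled is one possible sanity check before launching a test set; whether that is the right check for a given run is an assumption.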
 

Latest revision as of 11:53, 4 February 2019

This is the Rizzo lab wiki page for coordinating bugs and progress on the de novo project.

Valgrind-clean version of the code on the cluster that the Rizzo lab should be using:

Lauren:

/gpfs/projects/rizzo/zzz.programs/dock6.9_release
This version includes all changes of the merge.

Path to Generic Fragment Library:

/gpfs/projects/rizzo/leprentis/gen-frags-12

Path to Frequency Anchors:

/gpfs/projects/rizzo/leprentis/zinc1_ancs_freq


Current Coding Progress:

Working on these currently:

  1. Lauren: Check MGS+(-50)TAN before and after fingerprinting fix for 663 systems
  2. Lauren: Implement Roulette fragment picking into graph and random as an option
  3. Lauren: Implement Adjacency Matrix into fraglib/dn (initialize matrix and utilize matrix for graph and random fragment picking)
  4. Lauren: Testing with sb2012 default values for molecules passed on to the next layer and root (looking for timing and efficiency).
  5. Lauren&John: Rework VS protocol to integrate de novo protocol more smoothly
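
For the roulette fragment picking item above, the general idea of roulette-wheel selection can be sketched as follows. This is only an illustration of the generic technique, not the DOCK implementation: the function name `roulette_pick`, the fragment labels, and the use of a per-fragment weight (for example a library frequency) are all assumptions.

```python
import random


def roulette_pick(fragments, weights, rng=random):
    """Pick one fragment with probability proportional to its weight."""
    if len(fragments) != len(weights) or not fragments:
        raise ValueError("fragments and weights must be non-empty and equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    # Spin the wheel: draw a point along the cumulative weight line.
    threshold = rng.uniform(0, total)
    cumulative = 0.0
    for fragment, weight in zip(fragments, weights):
        cumulative += weight
        if threshold < cumulative:
            return fragment
    # Guard against floating-point rounding at the upper end.
    return fragments[-1]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    frags = ["scaffold_A", "linker_B", "sidechain_C"]  # hypothetical labels
    freqs = [10, 30, 60]  # hypothetical frequencies
    picks = [roulette_pick(frags, freqs, rng) for _ in range(1000)]
    for frag in frags:
        print(frag, picks.count(frag))
```

Passing a seeded `random.Random` instance keeps test runs reproducible, which matters when comparing graph and random sampling results across builds.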



Need to be fixed:

  1. score_molecules and internal_energy problem (for simple_build)
  2. HMS needs to be fixed when no heavy atoms match
  3. Get rid of some scoring functions and only use descriptor score


Not working on these right now:

  1. Lauren: Adjacency matrix vs tors env
  2. Lauren: Addition of "3mer" combination fragment check (post tors check)
  3. Lauren: Min and Max for charge to replace absolute value of charge. (Broke everything)
  4. Lauren: Capping groups for post growth process (halogens and methyls)
  5. Lauren: Fix Frag_String output into chimera for Refinement situations (temp fix for now: remove the spaces in the mol2 file)
  6. Stephen: Change scaling factor to a function of decay (currently a straight line to lowest score cutoff)
  7. Lauren&John: SMILES and ZINC script (for dn and ga)
  8. Lauren: incorporate tan pruning as final step (post growth) as user option (replace make_unique script)
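
To make the scaling-factor item above concrete, here is a rough sketch contrasting a straight-line cutoff with an exponential-decay cutoff as a function of growth layer. How DOCK actually applies `dn_pruning_conformer_score_scaling_factor` is not stated on this page, so the formulas, the function names `linear_cutoff` and `decay_cutoff`, and the `rate` parameter are all assumptions for illustration.

```python
import math


def linear_cutoff(layer, start, lowest, max_layers):
    """Straight line from `start` at layer 0 down to `lowest` at `max_layers`."""
    frac = min(max(layer / max_layers, 0.0), 1.0)
    return start + (lowest - start) * frac


def decay_cutoff(layer, start, lowest, rate):
    """Exponential decay from `start` toward `lowest` (assumed functional form)."""
    return lowest + (start - lowest) * math.exp(-rate * layer)


if __name__ == "__main__":
    # Hypothetical values: cutoff starts at 100.0 and tightens toward 10.0
    # over 9 layers (9 matches dn_max_grow_layers in the input file).
    for layer in range(10):
        print(
            layer,
            round(linear_cutoff(layer, 100.0, 10.0, 9), 2),
            round(decay_cutoff(layer, 100.0, 10.0, 0.5), 2),
        )
```

With these assumed forms, the decay version tightens the cutoff quickly in early layers and then levels off, while the linear version tightens by the same amount every layer.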


Completed:

  1. Lauren: determine if random seed is reset for each aps
  2. Lauren: Create testset for each dn function
  3. Lauren: Test simple build function with merged de novo
  4. Lauren&Stephen: clean make_unique script for release
  5. Lauren: merge GA into dock/dn
  6. Dwight & Lauren: MPI wrapper for 192 processors (8 nodes) for testsets on rizzo cluster
  7. Lauren: Create short testsets for denovo frag gen, focused fragment generic for DOCK6.9 release
  8. Dwight+Lauren: merge parameter files of de novo with DOCK
  9. Lauren: add dn_defn file for separate defn with Hydrogens
  10. Lauren: Implement csingleton fix for orienting fragments with fewer than 3 heavy atoms
  11. Lauren: Test bfochtman fix for rotatable bonds within a user-defined anchor
  12. Lauren: Test csingleton fix for orienting fragments with Du
  13. Lauren: Test MGS focused fragment library results with dn paper
  14. Stephen: editing script to calculate SMILES strings of de novo molecules in Open Babel


List of features that we definitely want for the 6.9 release:

Task | Owner | Complete?
Smooth pruning scaling function | LEP |
Roulette function to Random and Graph as an option | LEP |
Overhaul the simple build function | LEP |
When minimizing with descriptor score, make sure fingerprint is turned off | xxx |
Speed up fingerprint calculations by saving reference ligand as a permanent object | WJA | yep
Add pre-min conformations to growth trees | WJA | yep
Add verbose flag options | WJA | yep
Put molecular properties (RB, MW, etc) in mol2 header | WJA | yep
Put ensemble properties (RB, MW, etc) output stream at the end of each layer | WJA | yep
Check formal charge prune | BCF | yep
Combination of horizontal pruning metrics (let's consider dropping tanimoto prune and just using hungarian prune) | WJA | yep
Finish implementing growth trees | WJA | yep
Revisit orienting to make sure it is working as intended | WJA | yep
Fixed a bug where we were marking scaffold_this_layer as true for any fragment | WJA | yep
Update random sampling function to use last layer changes in graph function | WJA | yep
Do that same thing for the exhaustive function | WJA | yep
I don't think we ever clear the scaf_link_sid vector, we definitely should do that somewhere | WJA | yep
Update exhaustive to combine all frags into one library, just like graph / random. | WJA | yep


List of features/ideas for future releases:

  • Using different references for different layers of dn growth
  • Stereo centers / volume overlap pruning
  • Capping group functions (H, CH3, Halogen)
  • Incorporate GA at the end of each layer
  • Overhaul the simple-build function
  • Monte carlo algorithm that checks bond frequency
  • Scaling max root / layer size with layer
  • Select torenv before selecting fragment. Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv.
  • Add fragname string to restart and dump files, already done for final and fraglib files.
  • Add ZINC name to torenv table
  • Unusual behavior during library generation when frequency cutoff == 0
  • Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
  • building from anchor 0 -> building from scf.98
  • Possible torenv check for dump molecules after capping before printing.
  • Keep tables of which fragments (and torsion types) are already included in a growing molecule (i.e., the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to avoid over-including or under-including certain fragment and bond types (essentially, use data mining to help us build only molecules within certain boundaries).


List of SB2012 systems that we will use for tests:

For now, let's use 5-15 rotatable bonds inclusive; total = 709 systems ("drug-like" size molecules). The de novo paper used only 663 systems, after removing 46 systems where the cognate ligand did not fall within a +/-2 formal charge. (5through15 = 709, 5through15_ch2 = 663)

{5RB = 107; 6RB = 96; 7RB = 103; 8RB = 75; 9RB = 66; 10RB = 75; 11RB = 57; 12RB = 41; 13RB = 38; 14RB = 26; 15RB = 25}
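
The totals quoted above can be re-derived from the per-rotatable-bond counts. A small sketch, using only the numbers listed on this page (the names `RB_COUNTS`, `REMOVED_FOR_CHARGE`, and `total_systems` are made up for the example):

```python
# Number of SB2012 systems per rotatable-bond count, as listed above.
RB_COUNTS = {
    5: 107, 6: 96, 7: 103, 8: 75, 9: 66, 10: 75,
    11: 57, 12: 41, 13: 38, 14: 26, 15: 25,
}

# Systems dropped because the cognate ligand is outside +/-2 formal charge.
REMOVED_FOR_CHARGE = 46


def total_systems(counts):
    """Sum the systems across all rotatable-bond bins."""
    return sum(counts.values())


if __name__ == "__main__":
    total = total_systems(RB_COUNTS)
    print("5through15     =", total)
    print("5through15_ch2 =", total - REMOVED_FOR_CHARGE)
```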