Difference between revisions of "DOCK DN Development Goals"

From Rizzo_Lab
Latest revision as of 10:37, 17 February 2022

This is the Rizzo lab wiki page for coordinating bugs and progress on the de novo project.

Valgrind-clean version of the code on the cluster that the Rizzo lab should be using:

Lauren:

/gpfs/projects/rizzo/zzz.programs/dock6.9_release
This version includes all changes of the merge.

Path to Generic Fragment Library:

/gpfs/projects/rizzo/leprentis/gen-frags-12

Path to Frequency Anchors:

/gpfs/projects/rizzo/leprentis/zinc1_ancs_freq


Current Coding Progress:

Working on these currently:

  1. John: Implement fragment frequency picking as an option
  2. John: Professional web page
  3. John: Implement Adjacency Matrix into fraglib/dn (initialize matrix and utilize matrix for graph and random fragment picking)
  4. Chris: Guided FPS
  5. Lauren: Covalent anchor for denovo growth
  6. Guilherme: rdkit implementation for logp etc
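
For the fragment frequency picking item, a minimal sketch of frequency-weighted selection is given here, assuming each fragment carries an observed frequency count. The function name `pick_fragment`, the fragment names, and the counts are hypothetical and are not taken from the DOCK source; the real option may weight or normalize differently.

```python
import random


def pick_fragment(frequencies, rng=random):
    """Pick one fragment name with probability proportional to its frequency.

    `frequencies` maps a fragment name to a non-negative count (hypothetical data).
    """
    names = list(frequencies)
    weights = [frequencies[name] for name in names]
    if not names or sum(weights) <= 0:
        raise ValueError("need at least one fragment with a positive frequency")
    # random.choices draws with replacement using the given relative weights.
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical library: fragment name -> frequency count
library = {"benzene": 500, "amide": 300, "cyclopropane": 20}
rng = random.Random(0)  # fixed seed so a run can be repeated
picks = [pick_fragment(library, rng) for _ in range(1000)]
```

Fragments with a frequency of zero are never selected under this scheme, so a frequency cutoff could be applied before picking simply by zeroing the rare entries.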



Need to be fixed/added:

  1. methyl and amine capping groups
  2. add print out anchor with frequency option into fraglib code
  3. MPI option for each anchor
  4. aromatic rings
  5. QED
  6. fraglib generation chirality issue
  7. chiral centers
  8. score_molecules and internal_energy problem (for simple_build)
  9. HMS needs to be fixed when no heavy atoms match


Not working on these right now:

  1. Addition of "3mer" combination fragment check (post tors check)
  2. Min and max formal charge to replace absolute value of charge (broke everything). Step down as layers of growth proceed (layers 1-3 FC = 4, layers 4-5 FC = 3, layers 6-8 FC = 2)
  3. Capping groups for post growth process (halogens and methyls)
  4. Incorporate tan pruning as final step (post growth) as user option (replace make_unique script) as database filter not dn
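
The layer-dependent formal charge step-down in item 2 can be sketched as a simple lookup. This is only an illustration of the thresholds listed above; the function names are hypothetical, the symmetric min/max bounds are an assumption, and behavior beyond layer 8 is not specified on this page, so the sketch refuses those layers.

```python
def max_formal_charge(layer):
    """Return the formal charge limit for a growth layer (1-based)."""
    if 1 <= layer <= 3:
        return 4
    if 4 <= layer <= 5:
        return 3
    if 6 <= layer <= 8:
        return 2
    # The page gives no limit outside layers 1-8.
    raise ValueError(f"no formal charge limit defined for layer {layer}")


def passes_charge_filter(formal_charge, layer):
    """Check a molecule's formal charge against min/max bounds for the layer.

    Assumption: the bounds are symmetric, min = -limit and max = +limit.
    """
    limit = max_formal_charge(layer)
    return -limit <= formal_charge <= limit
```

For example, a molecule with formal charge +3 would pass at layer 5 but be pruned at layer 6 under these limits.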


Completed:

  1. Lauren: hbond accept/donor descriptor implementation
  2. Chris: increase orienting verbose statistics for dn
  3. John: acceptance based on freq of torsenv
  4. Lauren&John: secondary torenv check of prune dump molecules and testing
  5. Lauren&John: SMILES and ZINC script (for dn and ga)
  6. Lauren: add dn name with date and counter function
  7. Lauren: Check MGS+(-50)TAN before and after fingerprinting fix for 663 systems
  8. Lauren: determine if random seed is reset for each aps
  9. Lauren: Create testset for each dn function
  10. Lauren: Test simple build function with merged de novo
  11. Lauren&Stephen: clean make_unique script for release
  12. Lauren: merge GA into dock/dn
  13. Dwight & Lauren: MPI wrapper for 192 processors (8 nodes) for testsets on rizzo cluster
  14. Lauren: Create short testsets for denovo frag gen, focused fragment generic for DOCK6.9 release
  15. Dwight+Lauren: merge parameter files of de novo with DOCK
  16. Lauren: add dn_defn file for separate defn with Hydrogens
  17. Lauren: Implement csingleton fix for orienting fragments with fewer than 3 heavy atoms
  18. Lauren: Test bfochtman fix for rotatable bonds within a user-defined anchor
  19. Lauren: Test csingleton fix for orienting fragments with Du
  20. Lauren: Test MGS focused fragment library results with dn paper
  21. Stephen: editing script to calculate SMILES strings of de novo molecules in OpenBabel
  22. Stephen: smooth function cutoff for mw
  23. Lauren&John: Rework VS protocol to integrate de novo protocol more smoothly
  24. Lauren&John: Fix torsion problem for prune_dump molecules
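
Item 22 does not say which smooth function is used for the molecular weight cutoff. As one possible reading, the sketch below replaces a hard MW limit with a logistic acceptance probability. The function name, the logistic form, and the `soft_limit` and `width` values are all assumptions for illustration, not the implemented DOCK behavior.

```python
import math


def mw_acceptance_probability(mw, soft_limit=500.0, width=25.0):
    """Probability of keeping a molecule of molecular weight `mw`.

    Assumed logistic form: close to 1 well below `soft_limit`, exactly 0.5
    at `soft_limit`, and close to 0 well above it. `width` sets how
    gradual the transition is.
    """
    x = (mw - soft_limit) / width
    # Clamp the exponent so math.exp cannot overflow for extreme inputs.
    x = max(-50.0, min(50.0, x))
    return 1.0 / (1.0 + math.exp(x))
```

A caller could compare this probability against a uniform random number to decide whether a molecule survives, instead of rejecting everything above a fixed weight.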


List of features that we definitely want for the 6.9 release:

Task | Owner | Complete?
When minimizing with descriptor score, make sure fingerprint is turned off | xxx |
Speed up fingerprint calculations by saving reference ligand as a permanent object | WJA | yep
Add pre-min conformations to growth trees | WJA | yep
Add verbose flag options | WJA | yep
Put molecular properties (RB, MW, etc) in mol2 header | WJA | yep
Put ensemble properties (RB, MW, etc) output stream at the end of each layer | WJA | yep
Check formal charge prune | BCF | yep
Combination of horizontal pruning metrics (let's consider dropping tanimoto prune and just using hungarian prune) | WJA | yep
Finish implementing growth trees | WJA | yep
Revisit orienting to make sure it is working as intended | WJA | yep
Fixed a bug where we were marking scaffold_this_layer as true for any fragment | WJA | yep
Update random sampling function to use last layer changes in graph function | WJA | yep
Do that same thing for the exhaustive function | WJA | yep
I don't think we ever clear the scaf_link_sid vector, we definitely should do that somewhere | WJA | yep
Update exhaustive to combine all frags into one library, just like graph / random. | WJA | yep


List of features/ideas for future releases:

  • Using different references for different layers of dn growth (GFPS protocol) Guided footprint similarity - divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth)
  • Stereo centers / volume overlap pruning
  • Capping group functions (H, CH3, Halogen)
  • Incorporate GA at the end of each layer (not easy)
  • Overhaul the simple-build function
  • Monte Carlo algorithm that checks bond frequency
  • Scaling max root / layer size with layer
  • Select torenv before selecting fragment. Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv.
  • Add fragname string to restart and dump files, already done for final and fraglib files.
  • Add ZINC name to torenv table
  • Unusual behavior during library generation when frequency cutoff == 0
  • Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
  • building from anchor 0 -> building from scf.98
  • Possible torenv check for dump molecules after capping before printing.
  • Keep tables of what fragments (and torsion types) are already included in a growing molecule (i.e. the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to keep from over-including or under-including certain fragment and bond types (essentially use data mining to help us only build molecules within certain boundaries)
  • De novo design with scaled VDW parameters. Exaggerate them and ramp them down or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem.
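
The fragment-table idea above could look roughly like the following sketch, which tracks how many times each fragment already appears in a growing molecule and rejects a candidate that would exceed a per-fragment maximum. The function name, the fragment names, and the limits are hypothetical; the probability-based acceptance mentioned in the bullet is not covered here.

```python
from collections import Counter


def accept_fragment(current_fragments, candidate, max_counts, default_max=None):
    """Decide whether adding `candidate` keeps the molecule within bounds.

    current_fragments: fragment names already in the growing molecule
    max_counts: hypothetical table, fragment name -> maximum allowed count
    default_max: limit for fragments missing from the table (None = no limit)
    """
    counts = Counter(current_fragments)
    limit = max_counts.get(candidate, default_max)
    if limit is None:
        return True
    return counts[candidate] + 1 <= limit


# Hypothetical limits, as if derived from data mining of known molecules
limits = {"amide": 2, "nitro": 0}
molecule = ["benzene", "amide", "amide"]
```

With these example limits, a third amide or any nitro group would be rejected, while another benzene would be accepted because no limit is set for it.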


List of SB2012 systems that we will use for tests:

For now, let's use 5-15 rotatable bonds inclusive; total = 709 systems ("drug-like" size molecules). The de novo paper used only 663 systems, after removing 46 systems where the cognate ligand did not fall within a +/-2 formal charge. (5through15 = 709, 5through15_ch2 = 663)

{5RB = 107; 6RB = 96; 7RB = 103; 8RB = 75; 9RB = 66; 10RB = 75; 11RB = 57; 12RB = 41; 13RB = 38; 14RB = 26; 15RB = 25}
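
As a quick consistency check, the per-rotatable-bond counts listed above can be summed and compared with the stated totals. The variable names are only for illustration.

```python
# Number of SB2012 systems per rotatable-bond count, as listed above
systems_per_rb = {
    5: 107, 6: 96, 7: 103, 8: 75, 9: 66, 10: 75,
    11: 57, 12: 41, 13: 38, 14: 26, 15: 25,
}

total = sum(systems_per_rb.values())
charge_filtered = total - 46  # systems left after the +/-2 formal charge filter

print(total)            # 709
print(charge_filtered)  # 663
```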