Difference between revisions of "Developer's Info Goals"

From Rizzo_Lab
(Using this page)
(General DOCK Goals)
 
(54 intermediate revisions by 2 users not shown)
Line 4: Line 4:
 
=DOCK 6.10 Information=
 
New features include: an enhanced chemical search method termed molecular evolution DOCK (DOCK_GA), an evolution-based method for ligand construction that employs principles of breeding and mutation (see Prentis et al.); a new fragment library generation function in the docking protocol; simplex minimization step ramping for enhanced speed during docking; a new scoring function (internal energy score) that allows generation and scoring of molecules without a protein; and a molecular weight smoothing function for de novo design that allows a softer curve of weight distributions in the final ensemble. Secondary score, introduced in 6.1, has been fully removed in this version.
 
=General DOCK Goals=
 
  
 
=Using this page=
 
Line 14: Line 12:
  
 
If you have any questions about this page, please direct them to any of the current graduate students or post-docs listed on the [[Rizzo Lab Members and Contact Information|Contact]] page. If they don't know the answer, they will be able to direct you to someone who does.
 +
 +
 +
=General DOCK Goals=
 +
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 +
|- style="background:lightblue"
 +
! style="width:50%" !|Tasks
 +
! style="width:10%" !|Owner
 +
! style="width:30%" !|Notes
 +
|-
 +
|Web server || Steve ||
 +
|-
 +
|Integrated development pipeline, including IDE and unit tests || Brian, John ||
 +
|-
 +
|Add the DUDE systems created by Jiaye, Brian, and Yuchen to the standard DOCK test set || Chris ||
 +
|-
 +
|Create an RNA test set using systems suggested by Al-Hashimi|| Rodger, John ||
 +
|-
 +
|HMS Islands || John || Use HMS to determine clusters of best overlaid congeneric poses
 +
|-
 +
|Anchor names || Brock || Output anchor names alongside molecule number when writing out conformers
 +
|-
 +
|}
 +
 +
<br>
  
 
=DOCK_VS - Virtual Screening/Traditional Docking=
 
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
|- style="background:lightblue"
 
|- style="background:lightblue"
! style="width:75%" !|Tasks
+
! style="width:50%" !|Tasks
! style="width:25%" !|src
+
! style="width:10%" !|Owner
! style="width:13%" !|Owner
+
! style="width:30%" !|Notes
! style="width:10%" !|Complete?
 
 
|-
 
|-
| put in warning flag for missing flex defn type instead of segfaulting || || || no
+
| put in warning flag for missing flex defn type instead of segfaulting || ||  
 
|-
 
|-
| update torenv in dock6 beta || fraglib_torenv.dat || JDB || no
+
| update torenv in dock6 beta || JDB ||
 
|-
 
|-
| Add total conformers samples || || ||  
+
| Add total conformers samples || ||  
 
|-
 
|-
| Document in the manual that minimize = 0 in flex.defn is a deprecated feature ||manual ||LEP || no
+
| Document in the manual that minimize = 0 in flex.defn is a deprecated feature ||LEP ||
 
|-
 
|-
|Add input checks in the VS protocol scripts to verify that the previous step finalized its MPI routines || || || no
+
|Add input checks in the VS protocol scripts to verify that the previous step finalized its MPI routines || ||
 
|-
 
|-
|SYLVIA Score || || ||no
+
|SYLVIA Score || ||  
 
|-
 
|-
|determine which library generation outputs are appended rather than overwritten, and change to overwrite || || ||no
+
|determine which library generation outputs are appended rather than overwritten, and change to overwrite || ||
 
|-
 
|-
|put in best first clustering option for database filter || || ||no
+
|put in best first clustering option for database filter || ||
 
|-
 
|-
|Web server || || Open || No
+
|Consensus score (within descriptor score)|| Open ||  
 
|-
 
|-
|Consensus score (within descriptor score)|| || Open || No
+
|Clean GNU warnings  || ||  
 
|-
 
|-
|Clean GNU warnings  || || || No
+
|Multigrid footprint text file formatting needs adjustment || LEP ||
 
|-
 
|-
|Multigrid footprint text file formatting needs adjustment || || LEP || No
+
|Fix minimization issue with perfectly linear (alkyne) compounds|| Open || Options: add a dummy atom at 90° (as in other codes) so the dihedral is defined; treat the hydrogen as part of the carbon or heavy atom (united-atom approach); or flag dihedrals that are undefined or close to 180° as non-rotatable
 
|-
 
|-
|Add the DUDE systems created by Jiaye, Brian, and Yuchen to the standard DOCK test set || || || No
+
|HMS Islands || Open || Use HMS to determine clusters of best overlaid congeneric poses
 
|-
 
|-
|Create an RNA test set using systems suggested by Al-Hashimi|| || Rodger, John || No
+
|Anchor names || Brock || Output anchor names alongside molecule number when writing out conformers
 
|-
 
|-
|Fix minimization issue with perfectly linear (alkyne) compounds|| Options: add a dummy atom at 90° (as in other codes) so the dihedral is defined; treat the hydrogen as part of the carbon or heavy atom (united-atom approach); or flag dihedrals that are undefined or close to 180° as non-rotatable|| Open || No
+
|Fuzzy Docking || || examine multiple conformations and take an average based on fuzzy logic
 
|-
 
|-
|HMS Islands || Use HMS to determine clusters of best overlaid congeneric poses || Open || No
+
|Gilson's mining minima || || another approach to incorporating multiple minima (double check this)
|-
 
|Anchor names || Output anchor names alongside molecule number when writing out conformers || Brock || No
 
 
|-
 
|-
 
|}
 
|}
Line 61: Line 80:
 
<br>
 
<br>
  
 
+
=DOCK_DN - De Novo Design=
 +
=== Current Coding Progress: ===
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
|- style="background:lightblue"
 
|- style="background:lightblue"
! style="width:75%" !|Tasks
+
! style="width:50%" !|Task
! style="width:25%" !|src
+
! style="width:10%" !|Owner
! style="width:13%" !|Owner
+
! style="width:30%" !|Notes
! style="width:10%" !|Complete?
+
|-
 +
|Timing for each anchor - why is it the way it is? || Someone || Goal for 6.11?
 +
|-
 +
|Make the verbose output more readable, and include dev verbose || John || Goal for 6.11
 +
|-
 +
|Cleanse GA from the standard denovo file. || John || Goal for 6.11
 +
|-
 +
|Remove all the unnecessary and commented out code || John || Goal for 6.11
 +
|-
 +
|Completed molecules concatenation || || Need to remove, warn about, or otherwise address the concatenation issue
 +
|-
 +
|Fragment frequency selection || John || Currently testing
 +
|-
 +
|Roulette torsion acceptance || John || Currently testing
 +
|-
 +
|Fragment pair selection of fragments matrix method || John || Currently implementing
 +
|-
 +
|Dynamic reference || Chris ||
 +
|-
 +
|Addition of 3-mer combination fragment checks || John, kinda || Post torsion
 +
|-
 +
|Adding chirality into internal DOCK fingerprints || Brock || Would help w/ uniqueness pruning and fragmentation
 +
|-
 +
|Uniqueness pruning postprocess || John ||
 
|-
 
|-
| verbose ==2 option in dock6 beta || utils.cpp|| LEP || yes
+
|Uniqueness pruning during growth || Brock ||  
 
|-
 
|-
| Add total conformers samples || || ||  
+
|MPI integration - each anchor/all anchors || Brock ||  
 
|-
 
|-
| Check amide bond rotation during sampling - it's not a bug; it was fixed back in 2014 || || LEP||yes
+
|Using RDKit for guided growth - QED, SynthA, etc. || Pak, Guilherme || Planned for 6.11
 
|-
 
|-
| Write out # of HBond Donors and Acceptors || conf_gen_dn, library_file || LEP || yes
+
|When minimizing with descriptor score, make sure fingerprint is turned off || xxx ||
 
|-
 
|-
|put in compiler directives to compile with or without timespec || dock.cpp || LEP || yes
+
|HMS needs to be fixed when no heavy atoms matching. || ||  
 
|-
 
|-
|Fix bug that prints out 2/3 sigfigs instead of 6 for MW and FC  || library_file, filter, amber_typer || LEP || Yes
+
|Add print out anchor with frequency option into fraglib code. || ||  
 
|-
 
|-
|Fix nano/micro/millisecond timer  || dock.cpp || GDRM || Yes
+
|Using different references for different layers of dn growth (GFPS protocol) Guided footprint similarity - divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth) || ||
 
|-
 
|-
|ga flag and verbose == 2 for premin_mol in simplex  || simplex.cpp || LEP || Yes
+
|Min and max formal charge. Step down as layers of growth proceed || || FC = 4 for layers 1-3, FC =3 for 4-5, etc.
 
|-
 
|-
|Merge Hackathon changes to beta for clean faster code  || pow/memcpy/mpi pointers everywhere || LEP || Yes
+
|Stereo centers / volume overlap pruning || ||  
 
|-
 
|-
|Add Tip3p atom type to dock ||vdw.defn fingerprint || LEP || Yes
+
|Capping group functions (H, CH3, NH2, NH3, Halogen) || ||
 
|-
 
|-
|Hide secondary scoring function permanently ||lots || LEP || Yes
+
|Incorporate GA at the end of each layer (not easy) || ||
 
|-
 
|-
|Merge GIST into latest dock ||grid, master_score, score_descriptor, score_gist || LEP || Yes
+
|Monte carlo algorithm that checks bond frequency || ||
 
|-
 
|-
|Add second layer of verbosity ||  utils, conf_gen_dn so far || LEP || Yes
+
|Scaling max root / layer size with layer || ||
 
|-
 
|-
|RDKit integration with DOCK || || GDM || Yes
+
|Select torenv before selecting fragment.  Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv. || ||
 +
|-
 +
|Add fragname string to restart and dump files, already done for final and fraglib files. || ||
 +
|-
 +
|Add ZINC name to torenv table || ||
 +
|-
 +
|Unusual behavior during library generation when frequency cutoff == 0 || ||
 +
|-
 +
|Print out how many molecules cannot be capped. (Difference between ensemble size and dump.) || ||
 +
|-
 +
|building from anchor 0 -> building from scf.98 || ||
 +
|-
 +
|keep tables of what fragments (and torsion types) are already included in a growing molecule (i.e., the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities.  In other words, use knowledge of chemical makeup probabilities to keep from over-including or under-including certain fragment and bond types (essentially use data mining to help us only build molecules within certain boundaries) || ||
 +
|-
 +
|De novo design with scaled VDW parameters. Exaggerate them and ramp them down, or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem. || ||
 +
|-
 +
|Change term storing fragment name to something more meaningful #energy || || the name of the variable is energy
 +
|-
 +
|Chair vs. boat conformations & stereochemistry - only first is picked, and don't account for the rest of the isomers/conformations. Should keep those. || ||  
 
|-
 
|-
|Modify Grid to show error on nonintegrality || || BTB || Yes
 
 
|}
 
|}
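
The fragment frequency selection and roulette-style acceptance tasks in the table above both come down to sampling in proportion to an observed frequency. The following is a minimal Python sketch of that idea only; it is not DOCK's implementation, and the function name pick_fragment, the fragment labels, and the counts are hypothetical.

```python
import random


def pick_fragment(frequencies, rng=None):
    """Pick one fragment name with probability proportional to its frequency.

    `frequencies` maps a (hypothetical) fragment name to a non-negative count.
    """
    rng = rng or random.Random()
    names = list(frequencies)
    weights = [frequencies[name] for name in names]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("frequencies must be non-negative with a positive total")
    # random.choices performs weighted (roulette-wheel) sampling with replacement.
    return rng.choices(names, weights=weights, k=1)[0]


# Illustrative counts only; real values would come from a fragment library.
example_frequencies = {"scaffold_A": 120, "linker_B": 30, "sidechain_C": 0}
```

A fragment with a count of zero is never selected, which may or may not be the desired behavior when a frequency cutoff is in use.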
<br>
 
 
=DOCK_DN - De Novo Design=
 
This is the Rizzo lab wiki page for coordinating bugs and progress on the de novo project.
 
<br>
 
 
=== Valgrind clean version of the code on cluster that Rizzo lab should be using: ===
 
Lauren:
 
/gpfs/projects/rizzo/zzz.programs/dock6.9_release
 
This version includes all changes of the merge.
 
 
Path to Generic Fragment Library:
 
/gpfs/projects/rizzo/leprentis/gen-frags-12
 
 
Path to Frequency Anchors:
 
/gpfs/projects/rizzo/leprentis/zinc1_ancs_freq
 
<br>
 
 
=== Current Coding Progress: ===
 
Working on these currently:
 
 
# John: Implement fragment frequency picking as an option
 
# John: Professional web page
 
# John: Implement Adjacency Matrix into fraglib/dn (initialize matrix and utilize matrix for graph and random fragment picking)
 
# Chris: Guided FPS
 
# Lauren: Covalent anchor for denovo growth
 
# Guilherme: rdkit implementation for logp etc
 
 
  
 +
=DOCK_GA - Genetic Algorithm=
  
<br>
+
==General To-Do==
 
 
Need to be fixed/added:
 
# methyl and amine capping groups
 
# add print out anchor with frequency option into fraglib code
 
# MPI option for each anchor
 
# aromatic rings
 
# QED
 
# fraglib generation chirality issue
 
# chiral centers
 
# score_molecules and internal_energy problem (for simple_build)
 
# HMS needs to be fixed when no heavy atoms match
 
 
 
<br>
 
 
 
Not working on these right now:
 
# Addition of "3mer" combination fragment check (post tors check)
 
# Min and Max formal charge to replace absolute value of charge.(Broke everything) Step down as layers of growth proceed (layers 1-3 FC = 4, Layers 4-5 FC = 3, Layers 6-8 FC = 2)
 
# Capping groups for post growth process (halogens and methyls)
 
# Incorporate tan pruning as final step (post growth) as user option (replace make_unique script) as database filter not dn
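
The stepped formal charge limit in the list above (layers 1-3 FC = 4, layers 4-5 FC = 3, layers 6-8 FC = 2) can be written as a small lookup. This is a sketch under assumptions, not code from DOCK: the function names are hypothetical, the bounds are taken to be symmetric (min = -max), and the limit is assumed to stay at 2 beyond layer 8, which the list does not specify.

```python
def formal_charge_bounds(layer):
    """Return (min, max) formal charge allowed at a 1-based growth layer."""
    if layer < 1:
        raise ValueError("layers are numbered from 1")
    # (last layer covered, limit) pairs from the list item above.
    schedule = [(3, 4), (5, 3), (8, 2)]
    for last_layer, limit in schedule:
        if layer <= last_layer:
            return -limit, limit
    # Assumption: past layer 8 the tightest limit stays in force.
    tightest = schedule[-1][1]
    return -tightest, tightest


def passes_charge_prune(formal_charge, layer):
    """True if the molecule's formal charge lies within the layer's bounds."""
    low, high = formal_charge_bounds(layer)
    return low <= formal_charge <= high
```

Returning separate minimum and maximum values, rather than one absolute limit, leaves room for asymmetric bounds later.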
 
 
 
<br>
 
Completed:
 
# <strike>Lauren: hbond accept/donor descriptor implementation</strike>
 
# <strike>Chris: increase orienting verbose statistics for dn</strike>
 
# <strike>John: acceptance based on freq of torsenv</strike>
 
# <strike>Lauren&John: secondary torenv check of prune dump molecules and testing</strike>
 
# <strike>Lauren&John: SMILES and ZINC script (for dn and ga)</strike>
 
# <strike>Lauren: add dn name with date and counter function</strike>
 
# <strike>Lauren: Check MGS+(-50)TAN before and after fingerprinting fix for 663 systems</strike>
 
# <strike>Lauren: determine if random seed is reset for each aps</strike>
 
# <strike>Lauren: Create testset for each dn function </strike>
 
# <strike>Lauren: Test simple build function with merged de novo </strike>
 
# <strike>Lauren&Stephen: clean make_unique script for release</strike>
 
# <strike>Lauren: merge GA into dock/dn </strike>
 
# <strike>Dwight & Lauren: MPI wrapper for 192 processors (8 nodes) for testsets on rizzo cluster </strike>
 
# <strike>Lauren: Create short testsets for denovo frag gen, focused fragment generic for DOCK6.9 release </strike>
 
# <strike>Dwight+Lauren: merge parameter files of de novo with DOCK </strike>
 
# <strike>Lauren: add dn_defn file for separate defn with Hydrogens </strike>
 
# <strike>Lauren: Implement csingleton fix for orienting fragments with less than 3 heavy atoms </strike>
 
# <strike>Lauren: Test bfochtman fix for rotatable bonds within an user defined anchor </strike>
 
# <strike>Lauren: Test csingleton fix for orienting fragments with Du </strike>
 
# <strike>Lauren: Test MGS focused fragment library results with dn paper </strike>
 
# <strike>Stephen: editing script to calculate SMILES string of de novo molecules in OpenBabel </strike>
 
# <strike> Stephen: smooth function cutoff for mw </strike>
 
# <strike> Lauren&John: Rework VS protocol to integrate de novo protocol more smoothly </strike>
 
# <strike> Lauren&John: Fix torsion problem for prune_dump molecules </strike>
 
<br>
 
 
 
=== List of features that we definitely want for the 6.9 release: ===
 
 
 
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
|- style="background:lightblue"
 
|- style="background:lightblue"
! style="width:75%" !|Task
+
! style="width:50%" !|Tasks
! style="width:15%" !|Owner
+
! style="width:10%" !|Owner
! style="width:10%" !|Complete?
+
! style="width:30%" !|Notes
 
|-
 
|-
|When minimizing with descriptor score, make sure fingerprint is turned off || xxx ||
+
|Adaptive maintenance of ensemble based on convergence (extinction, delta max, etc.). || BTB ||  
 
|-
 
|-
|Speed up fingerprint calculations by saving reference ligand as a permanent object || WJA || yep
+
|Add leading 0's to xover output filenames || JDB ||
 
|-
 
|-
|Add pre-min conformations to growth trees || WJA || yep
+
|DNM replacement unable to build list? || JDB ||
 
|-
 
|-
|Add verbose flag options || WJA || yep
+
|Multi-layer replacement for Amides || JDB ||
 
|-
 
|-
|Put molecular properties (RB, MW, etc) in mol2 header || WJA || yep
+
|Compute delta slope of fitness score || BTB ||
 
|-
 
|-
|Put ensemble properties (RB, MW, etc) output stream at the end of each layer || WJA || yep
+
|slow down molecular evolution so there are less drastic changes between each successive generation - "delta max/max change" || BTB ||
 
|-
 
|-
|Check formal charge prune || BCF || yep
+
|bring in new parents (e.g. from a pool of molecules) based on convergence - "migration" || BTB ||
 
|-
 
|-
|Combination of horizontal pruning metrics (let's consider dropping tanimoto prune and just using hungarian prune) || WJA  || yep
+
|user defined point vs on-the-fly convergence || BTB ||
 
|-
 
|-
|Finish implementing growth trees || WJA || yep
+
|metropolis selection for tournament/roulette || BTB  ||
 
|-
 
|-
|Revisit orienting to make sure it is working as intended || WJA || yep
+
|tanimoto coefficient percent change - might be inaccurate due to tan coef behavior || ||  
 
|-
 
|-
|Fixed a bug where we were marking scaffold_this_layer as true for any fragment || WJA || yep
+
|Rotatable bond changes (???) || || Still unsure what this means - JDB
 
|-
 
|-
|Update random sampling function to use last layer changes in graph function || WJA || yep
+
|Limit number of aromatic rings.|| ||
 
|-
 
|-
|Do that same thing for the exhaustive function || WJA || yep
+
|Adaptively keep differing #'s of parents and children based on some internal criteria (similarity?) || ||  
 
|-
 
|-
|I don't think we ever clear the scaf_link_sid vector, we definitely should do that somewhere || WJA || yep
+
|Determine whether the dummy vdw file is necessary for all DN (including GA) or just DN. || ||  
 
|-
 
|-
|Update exhaustive to combine all frags into one library, just like graph / random. || WJA || yep
+
|Change term storing mutation name to something more meaningful #energy || || the name of the variable is energy
 
|-
 
|-
 
|}
 
|}
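
Several rows above (delta slope of fitness score, on-the-fly convergence) depend on measuring how fast the best fitness is still changing. One possible reading, sketched in Python: fit a least-squares line to the best score of the last few generations and treat a near-zero slope as convergence. The function names, window size, and tolerance are assumptions for illustration and do not come from conf_gen_ga.

```python
def fitness_slope(best_scores, window=5):
    """Least-squares slope of the last `window` best-fitness values."""
    if window < 2 or len(best_scores) < window:
        raise ValueError("need at least `window` >= 2 generations of scores")
    ys = best_scores[-window:]
    x_mean = (window - 1) / 2
    y_mean = sum(ys) / window
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    denominator = sum((x - x_mean) ** 2 for x in range(window))
    return numerator / denominator


def has_converged(best_scores, window=5, tol=0.01):
    """True when the recent fitness trend is flatter than `tol` per generation."""
    return abs(fitness_slope(best_scores, window)) < tol
```

A user-defined convergence point would skip this check and stop at a fixed generation instead.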
 
<br>
 
<br>
  
=== List of features/ideas for future releases: ===
+
==Mutations==
 
 
* Using different references for different layers of dn growth (GFPS protocol) Guided footprint similarity - divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth)
 
* Stereo centers / volume overlap pruning
 
* Capping group functions (H, CH3, Halogen)
 
* Incorporate GA at the end of each layer (not easy)
 
* Overhaul the simple-build function
 
* Monte carlo algorithm that checks bond frequency
 
* Scaling max root / layer size with layer
 
* Select torenv before selecting fragment.  Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv.
 
* Add fragname string to restart and dump files, already done for final and fraglib files.
 
* Add ZINC name to torenv table
 
* Unusual behavior during library generation when frequency cutoff == 0
 
* Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
 
* building from anchor 0 -> building from scf.98
 
* Possible torenv check for dump molecules after capping before printing.
 
* keep tables of what fragments (and torsion types) are already included in a growing molecule (i.e., the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities.  In other words, use knowledge of chemical makeup probabilities to keep from over-including or under-including certain fragment and bond types (essentially use data mining to help us only build molecules within certain boundaries)

*De novo design with scaled VDW parameters. Exaggerate them and ramp them down, or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem.
 
<br>
 
 
 
=== List of SB2012 systems that we will use for tests: ===
 
 
 
For now, let's use 5-15 rotatable bonds inclusive; total = 709 systems ("drug-like" size molecules). The de novo paper used only 663 systems, after removing 46 systems where the cognate ligand did not fall within a ±2 formal charge. (5through15 = 709, 5through15_ch2 = 663)
 
 
 
{5RB = 107; 6RB = 96; 7RB = 103; 8RB = 75; 9RB = 66; 10RB = 75; 11RB = 57; 12RB = 41; 13RB = 38; 14RB = 26; 15RB = 25}
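
As a quick consistency check, the per-rotatable-bond counts above can be summed in Python. The variable names are illustrative; the numbers are copied from the list.

```python
# Number of SB2012 systems per rotatable-bond (RB) count, as listed above.
systems_per_rb = {
    5: 107, 6: 96, 7: 103, 8: 75, 9: 66, 10: 75,
    11: 57, 12: 41, 13: 38, 14: 26, 15: 25,
}

total_systems = sum(systems_per_rb.values())
removed_for_charge = 46  # cognate ligand outside the +/-2 formal charge range
charge_filtered = total_systems - removed_for_charge

print(total_systems, charge_filtered)  # prints: 709 663
```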
 
 
 
<br>
 
 
 
=DOCK_GA - Genetic Algorithm=
 
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 
|- style="background:lightblue"
 
|- style="background:lightblue"
! style="width:75%" !|Tasks
+
! style="width:50%" !|Tasks
! style="width:25%" !|src
+
! style="width:10%" !|Owner
! style="width:13%" !|Owner
+
! style="width:30%" !|Notes
! style="width:10%" !|Complete?
 
 
|-
 
|-
|Horizontal pruning issue not catching chemically identical molecules || LEP || no
+
|Adaptive mutation rate || Brock || Change rate of mutation based on some internal criteria
 
|-
 
|-
|Fix bug that collapsed atom coordinates  || everywhere? nowhere? somewhere. || LEP || YES MUAHAHAHAHA!
+
|Pick location of mutation based on some internal criteria || ||  
 
|-
 
|-
|Added Delimiter header ||conf_gen_ga || LEP || Yes
+
|Pick mutation type based on ensemble behavior.|| ||  
 
|-
 
|-
|Fix xover only feature ||conf_gen_ga || LEP || Yes
+
|If molecules are too large, boost deletion (useful for elitism).|| ||  
 
|-
 
|-
|Put in error messages for mut_rate > 1 || conf_gen_ga ||LEP || Yes
+
|If molecules are too small, boost additive mutations. || ||  
 
|-
 
|-
|Manual user-defined mutation type ||conf_gen_ga || LEP || Yes
+
|If molecules are too similar, boost replacements and substitutions. || ||  
 
|-
 
|-
|Remove check only option || conf_gen_ga || LEP || Yes
+
|Mutation type selection based on probability vs. ensemble || ||  
 
|-
 
|-
|Add single molecule evolution in testcase in install dir. || install/test/genetic || LEP ||yes
+
|Change chance of performing a mutation type based on number already tried || || Higher success = lower attempts, Lower success = more attempts, etc.
 
|-
 
|-
|Add leading 0's to xover output filenames || conf_gen_ga.cpp || JDB || no
+
|2 layer replacements || ||  
 
|-
 
|-
|DNM replacement unable to build list? || conf_gen_ga.cpp || JDB || no
+
|Multilayer substitutions || || Could be kinda difficult or may not even work
 +
|-
 +
|}
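
The boosting rules in the table above (too large: boost deletion; too small: boost additive mutations; too similar: boost replacements and substitutions) can be expressed as a reweighting of mutation-type probabilities. The Python sketch below is one hypothetical way to do that. The thresholds, the boost factor, the use of mean molecular weight and mean Tanimoto similarity as the "internal criteria", and the function name are all assumptions, not DOCK_GA behavior.

```python
def adapt_mutation_weights(base, mean_mw, mean_tanimoto,
                           mw_low=300.0, mw_high=500.0,
                           tan_high=0.7, boost=2.0):
    """Return normalized mutation-type probabilities for the next generation.

    `base` maps the mutation types "deletion", "addition", "replacement",
    and "substitution" to positive starting weights.
    """
    weights = dict(base)  # copy so the caller's weights are left untouched
    if mean_mw > mw_high:
        weights["deletion"] *= boost      # ensemble too large
    elif mean_mw < mw_low:
        weights["addition"] *= boost      # ensemble too small
    if mean_tanimoto > tan_high:
        weights["replacement"] *= boost   # ensemble too similar
        weights["substitution"] *= boost
    total = sum(weights.values())
    return {name: value / total for name, value in weights.items()}
```

Normalizing at the end keeps the result usable directly as selection probabilities, whichever rules fired.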
 +
<br>
 +
 
 +
==Crossover==
 +
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 +
|- style="background:lightblue"
 +
! style="width:50%" !|Tasks
 +
! style="width:10%" !|Owner
 +
! style="width:30%" !|Notes
 
|-
 
|-
|Multi-layer replacement for Amides || conf_gen_ga.cpp || JDB || no
+
|Crossover on multiple points simultaneously, rather than just 1 || ||  
 
|-
 
|-
|Compute delta slope of fitness score || congen_ga.cpp || BTB ||no
+
|Nonexhaustive crossover - pick subset based on probability || ||  
 
|-
 
|-
| slow down molecular evolution so there are less drastic changes between each successive generation || congen_ga.cpp || BTB ||no
+
|Nonexhaustive crossover for each pair of parents || || Try a specific number of bonds rather than exhaustively sampling all of them.
 
|-
 
|-
| bring in new parents (e.g. from a pool of molecules) based on convergence || confgen_ga.cpp || BTB || no
+
|Guided crossover based on score || || Good v Good, Bad v Good, Bad v Bad
 
|-
 
|-
| user defined point vs on-the-fly convergence || confgen_ga.cpp || BTB || no
+
|Choose crossover pairs based on measure of similarity (boost crossovers between dissimilar molecules). || ||
 +
|}
 +
<br>
 +
 
 +
==Fitness/Pruning/Diversity/Other==
 +
{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
 +
|- style="background:lightblue"
 +
! style="width:50%" !|Tasks
 +
! style="width:10%" !|Owner
 +
! style="width:30%" !|Notes
 
|-
 
|-
| metropolis selection for tournament/roulette || conf_gen_ga.cpp || BTB  || no
+
|Turn on and off niching adaptive/extinction || Brock ||  
 
|-
 
|-
 +
|Reduce boost of fragments and all poor mols with niching || ||
 +
|-
 +
|Pareto/multiobjective ga || ||
 +
|-
 +
|best first pruning - currently uses descriptor score even when niching is on; needs to switch to the fitness/niching score when niching is used|| Brock ||
 +
|-
 +
|Geometric diversity using Hungarian and Tan pruning || Brock ||
 +
|-
 +
|Horizontal pruning issue not catching chemically identical molecules || LEP ||
 
|}
 
|}
 
<br>
 
<br>
  
 
+
== Known Potential Bugs ==
==To Do List==
 
# tanimoto coefficient percent change - might be inaccurate due to tan coef behavior
 
# Rotatable bond changes (???)
 
# Limit number of aromatic rings.
 
#-xover (guided based on score) - Good v Good ; Bad v Good ; Bad v Bad THIS
 
#Nonexhaustive xover (pick subset of xover based on probability)
 
#Nonexhaustive xover for each pair of parents - have a set number of bonds for xover rather than trying all of them.
 
#Crossover on multiple points simultaneously (2-3 crossovers on a given pair to make 3+ children rather than 2+).
 
#Adaptive maintenance of ensemble based on convergence (extinction, delta max, etc.).
 
#Stop convergence.
 
#Mutations-
 
##Adaptive mutation rate - change rate of mutation based on some internal criteria.
 
##Pick location of mutation based on some internal criteria.
 
##Pick mutation type based on ensemble behavior.
 
##If molecules are too large, boost deletion (useful for elitism).
 
##If molecules are too small, boost additive mutations.
 
##If molecules are too similar, boost replacements and substitutions.
 
##mutation type selection based on probability vs ensemble
 
##Track how many mutations of each type have been completed so far and make heavily used types less prevalent, etc.
 
##Note: 3 layer substitutions probably aren't going to work.
 
##2-layer replacements.
 
#fitness-
 
##turn on and off niching adaptive/extinction
 
##reduce boost of fragments and all poor mols with niching
 
##pareto/multiobjective ga
 
#selection-
 
##Adaptively keep differing #'s of parents and children based on some internal criteria (similarity?)
 
#extinction-
 
#which molecules are best-
 
##best first pruning - currently uses descriptor score even when niching is on; needs to switch to the fitness/niching score when niching is used

## geometric diversity using Hungarian and Tan pruning
 
#Determine whether the dummy vdw file is necessary for all DN (including GA) or just DN.
 
#Choose crossover pairs based on measure of similarity (boost crossovers between dissimilar molecules).
 
 
 
== Known Bugs ==
 
 
#Molecules Processed bug (dock.cpp)
 
 
#verbose mol stats (amber typer)
 

Latest revision as of 04:31, 30 August 2023

DOCK 6.11 Information

Planned additions:

DOCK 6.10 Information

New features include: an enhanced chemical search method termed molecular evolution DOCK (DOCK_GA), an evolution-based method for ligand construction that employs principles of breeding and mutation (see Prentis et al.); a new fragment library generation function in the docking protocol; simplex minimization step ramping for enhanced speed during docking; a new scoring function (internal energy score) that allows generation and scoring of molecules without a protein; and a molecular weight smoothing function for de novo design that allows a softer curve of weight distributions in the final ensemble. Secondary score, introduced in 6.1, has been fully removed in this version.

Using this page

This page is designed to be a unified location for development goals we have set for each sampling/experimental type we are able to perform in DOCK. New tasks and goals can be included in the following tables, and can link to their own development pages if need be.

The current page setup includes an Archive page for completed tasks. When a task is deemed complete, please move it to the designated section on the Archive, sign it with your initials, and date it. Any relevant information should also be included in notes, being brief but descriptive (ex. 'pushed addition/fix to GitHub on MM/DD/YYYY, Commit Hash: ######')


If you have any questions about this page, please direct them to any of the current graduate students or post-docs listed on the Contact page. If they don't know the answer, they will be able to direct you to someone who does.


General DOCK Goals

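{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
|- style="background:lightblue"
! style="width:50%" !|Tasks
! style="width:10%" !|Owner
! style="width:30%" !|Notes
|-
|Web server || Steve ||
|-
|Integrated development pipeline, including IDE and unit tests || Brian, John ||
|-
|Add the DUDE systems created by Jiaye, Brian, and Yuchen to the standard DOCK test set || Chris ||
|-
|Create an RNA test set using systems suggested by Al-Hashimi || Rodger, John ||
|-
|HMS Islands || John || Use HMS to determine clusters of best overlaid congeneric poses
|-
|Anchor names || Brock || Output anchor names alongside molecule number when writing out conformers
|}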


DOCK_VS - Virtual Screening/Traditional Docking

{| border="1" cellpadding="8" cellspacing="0" style="background:white; text-align:left; width:90%"
|- style="background:lightblue"
! style="width:50%" !|Tasks
! style="width:10%" !|Owner
! style="width:30%" !|Notes
|-
|put in warning flag for missing flex defn type instead of segfaulting || ||
|-
|update torenv in dock6 beta || JDB ||
|-
|Add total conformers samples || ||
|-
|Document in the manual that minimize = 0 in flex.defn is a deprecated feature || LEP ||
|-
|Add input checks in the VS protocol scripts to verify that the previous step finalized its MPI routines || ||
|-
|SYLVIA Score || ||
|-
|determine which library generation outputs are appended rather than overwritten, and change to overwrite || ||
|-
|put in best first clustering option for database filter || ||
|-
|Consensus score (within descriptor score) || Open ||
|-
|Clean GNU warnings || ||
|-
|Multigrid footprint text file formatting needs adjustment || LEP ||
|-
|Fix minimization issue with perfectly linear (alkyne) compounds || Open || Options: add a dummy atom at 90° (as in other codes) so the dihedral is defined; treat the hydrogen as part of the carbon or heavy atom (united-atom approach); or flag dihedrals that are undefined or close to 180° as non-rotatable
|-
|HMS Islands || Open || Use HMS to determine clusters of best overlaid congeneric poses
|-
|Anchor names || Brock || Output anchor names alongside molecule number when writing out conformers
|-
|Fuzzy Docking || || examine multiple conformations and take an average based on fuzzy logic
|-
|Gilson's mining minima || || another approach to incorporating multiple minima (double check this)
|}
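
For the linear (alkyne) minimization issue listed above, one of the suggested options is to flag ill-defined dihedrals as non-rotatable. The Python sketch below shows a geometric test for that case. It assumes the "close to 180" wording refers to the two bond angles flanking the central bond, since a dihedral a-b-c-d is undefined when a-b-c or b-c-d is collinear. The function names and the 5 degree tolerance are hypothetical and are not taken from DOCK.

```python
import math


def angle_deg(p, q, r):
    """Bond angle p-q-r in degrees for three 3D points."""
    v1 = [p[i] - q[i] for i in range(3)]
    v2 = [r[i] - q[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident atoms: angle is undefined")
    cos_theta = sum(a * b for a, b in zip(v1, v2)) / (n1 * n2)
    # Clamp to guard against round-off slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_torsion_ill_defined(a, b, c, d, tol_deg=5.0):
    """True if the dihedral a-b-c-d is undefined or nearly so.

    The dihedral loses meaning when either flanking bond angle approaches
    180 degrees, so such a bond could be flagged as non-rotatable.
    """
    return (180.0 - angle_deg(a, b, c) < tol_deg
            or 180.0 - angle_deg(b, c, d) < tol_deg)
```

The dummy-atom and united-atom options from the same row would change the molecule representation instead and are not covered by this check.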


DOCK_DN - De Novo Design

Current Coding Progress:

Task Owner Notes
Timing for each anchor - why is it the way it is? Someone Goal for 6.11?
Make the verbose output more readable, and include dev verbose John Goal for 6.11
Cleanse GA from the standard denovo file. John Goal for 6.11
Remove all the unnecessary and commented out code John Goal for 6.11
Completed molecules concatenation Need to remove/warning/something the concatenation issue
Fragment frequency selection John Currently testing
Roulette torsion acceptance John Currently testing
Fragment pair selection of fragments matrix method John Currently implementing
Dynamic reference Chris
Addition of 3-mer combination fragment checks John, kinda Post torsion
Adding chirality into internal DOCK fingerprints Brock Would help w/ uniqueness pruning and fragmentation
Uniqueness pruning postprocess John
Uniqueness pruning during growth Brock
MPI integration - each anchor/all anchors Brock
Using RDKit for guided growth - QED, SynthA, etc. Pak, Guilherme Planned for 6.11
When minimizing with descriptor score, make sure fingerprint is turned off xxx
HMS needs to be fixed when no heavy atoms match.
Add print out anchor with frequency option into fraglib code.
Using different references for different layers of dn growth (GFPS protocol) Guided footprint similarity - divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth)
Min and max formal charge. Step down as layers of growth proceed FC = 4 for layers 1-3, FC =3 for 4-5, etc.
Stereo centers / volume overlap pruning
Capping group functions (H, CH3, NH2, NH3, Halogen)
Incorporate GA at the end of each layer (not easy)
Monte carlo algorithm that checks bond frequency
Scaling max root / layer size with layer
Select torenv before selecting fragment. Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv.
Add fragname string to restart and dump files, already done for final and fraglib files.
Add ZINC name to torenv table
Unusual behavior during library generation when frequency cutoff == 0
Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
building from anchor 0 -> building from scf.98
Keep tables of what fragments (and torsion types) are already included in a growing molecule (i.e. the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to keep from over-including or under-including certain fragment and bond types (essentially use data mining to help us only build molecules within certain boundaries)
De novo design with scaled VDW parameters. Exaggerate them and ramp them down or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem.
Change term storing fragment name to something more meaningful #energy the name of the variable is energy
Chair vs. boat conformations & stereochemistry - only the first is picked, and the rest of the isomers/conformations are not accounted for. Should keep those.
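
The formal charge step-down idea from the table above (FC = 4 for layers 1-3, FC = 3 for layers 4-5, etc.) could look roughly like this. The table does not say how the schedule continues, so the sketch assumes the limit keeps dropping by one every two layers down to a floor of 0, and that the same limit applies to positive and negative charge. Both function names are hypothetical.

```python
def max_formal_charge(layer, start=4, floor=0):
    """Return the allowed absolute formal charge for a growth layer.

    Layers 1-3 use `start`; layers 4-5 use start - 1 (as in the notes).
    Continuing to drop by one every two layers is an assumption.
    """
    if layer < 1:
        raise ValueError("layers are numbered from 1")
    if layer <= 3:
        return start
    return max(floor, start - 1 - (layer - 4) // 2)

def within_charge_limit(formal_charge, layer):
    """Assume a symmetric min/max: -limit <= formal_charge <= +limit."""
    return abs(formal_charge) <= max_formal_charge(layer)
```

With these defaults, a molecule carrying a formal charge of 4 would pass in layer 3 but be pruned in layer 4.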

DOCK_GA - Genetic Algorithm

General To-Do

Tasks Owner Notes
Adaptive maintenance of ensemble based on convergence (extinction, delta max, etc.). BTB
Add leading 0's to xover output filenames JDB
DNM replacement unable to build list? JDB
Multi-layer replacement for Amides JDB
Compute delta slope of fitness score BTB
slow down molecular evolution so there are less drastic changes between each successive generation - "delta max/max change" BTB
bring in new parents (e.g. from a pool of molecules) based on convergence - "migration" BTB
user defined point vs on-the-fly convergence BTB
metropolis selection for tournament/roulette BTB
tanimoto coefficient percent change - might be inaccurate due to tan coef behavior
Rotatable bond changes (???) Still unsure what this means - JDB
Limit number of aromatic rings.
Adaptively keep differing #'s of parents and children based on some internal criteria (similarity?)
Determine whether the dummy vdw file is necessary for all DN (including GA) or just DN.
Change term storing mutation name to something more meaningful #energy the name of the variable is energy
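
One way to read the "compute delta slope of fitness score" and "on-the-fly convergence" items is a least-squares slope over the best fitness of the last few generations. The sketch below is only an illustration under that reading; the window size, the tolerance, and the function names are all hypothetical.

```python
def fitness_slope(scores):
    """Least-squares slope of scores against generation index 0..n-1."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two generations")
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(scores))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def has_converged(history, window=5, tol=0.05):
    """Report convergence when the slope over the last `window`
    generations is flatter than `tol` (both values are assumptions)."""
    if len(history) < window:
        return False
    return abs(fitness_slope(history[-window:])) < tol
```

A flag like this could then trigger the adaptive steps listed above, such as extinction or migration of new parents.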


Mutations

Tasks Owner Notes
Adaptive mutation rate Brock Change rate of mutation based on some internal criteria
Pick location of mutation based on some internal criteria
Pick mutation type based on ensemble behavior.
If molecules are too large, boost deletion (useful for elitism).
If molecules are too small, boost additive mutations.
If molecules are too similar, boost replacements and substitutions.
Mutation type selection based on probability vs. ensemble
Change chance of performing a mutation type based on number already tried Higher success = lower attempts, Lower success = more attempts, etc.
2 layer replacements
Multilayer substitutions Could be kinda difficult or may not even work
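
The "pick mutation type based on ensemble behavior" rules above can be sketched as a weight table that is adjusted per generation. This is a hypothetical illustration: measuring size by mean molecular weight and similarity by mean Tanimoto coefficient, and every threshold and the boost factor, are assumptions rather than DOCK_GA parameters.

```python
def mutation_weights(mean_mw, mean_tanimoto,
                     mw_low=300.0, mw_high=500.0,
                     sim_high=0.7, boost=2.0):
    """Return normalized selection probabilities per mutation type."""
    weights = {"deletion": 1.0, "addition": 1.0,
               "replacement": 1.0, "substitution": 1.0}
    if mean_mw > mw_high:
        # Molecules too large: boost deletion.
        weights["deletion"] *= boost
    elif mean_mw < mw_low:
        # Molecules too small: boost additive mutations.
        weights["addition"] *= boost
    if mean_tanimoto > sim_high:
        # Molecules too similar: boost replacements and substitutions.
        weights["replacement"] *= boost
        weights["substitution"] *= boost
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

The returned probabilities could be passed to a roulette-style selection when choosing which mutation to attempt next.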


Crossover

Tasks Owner Notes
Crossover on multiple points simultaneously, rather than just 1
Nonexhaustive crossover - pick subset based on probability
Nonexhaustive crossover for each pair of parents Try a specific number of bonds rather than exhaustively sampling all of them.
Guided crossover based on score Good v Good, Bad v Good, Bad v Bad
Choose crossover pairs based on measure of similarity (boost crossovers between dissimilar molecules).
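
Choosing crossover pairs by similarity, with a boost for dissimilar molecules, might be sketched as follows. The example assumes fingerprints are available as Python sets of "on" bits and weights each pair by 1 - Tanimoto; both the weighting and the function names are hypothetical, not the DOCK_GA implementation.

```python
import random
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def pick_crossover_pair(fingerprints, rng=random):
    """Pick one pair of parent indices, favoring dissimilar parents."""
    pairs = list(combinations(range(len(fingerprints)), 2))
    if not pairs:
        raise ValueError("need at least two parents")
    weights = [1.0 - tanimoto(fingerprints[i], fingerprints[j])
               for i, j in pairs]
    if not any(weights):
        # All parents identical: fall back to a uniform choice.
        return rng.choice(pairs)
    return rng.choices(pairs, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible between runs.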


Fitness/Pruning/Diversity/Other

Tasks Owner Notes
Turn on and off niching adaptive/extinction Brock
Reduce boost of fragments and all poor mols with niching
Pareto/multiobjective ga
best first pruning - now uses descriptor score even if niching; need to delta to fitness/niching when used Brock
Geometric diversity using Hungarian and Tan pruning Brock
Horizontal pruning issue not catching chemically identical molecules LEP
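
For the Pareto/multiobjective GA item, the core operation is finding the non-dominated members of the ensemble. The sketch below is a generic illustration and assumes every objective is minimized (for example, a docking score and molecular weight); the choice of objectives and the function name are hypothetical.

```python
def pareto_front(points):
    """Return indices of non-dominated points (all objectives minimized)."""
    def dominates(p, q):
        # p dominates q if it is no worse everywhere and better somewhere.
        return (all(a <= b for a, b in zip(p, q))
                and any(a < b for a, b in zip(p, q)))
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]
```

Molecules on the front could be kept as parents regardless of how they rank on any single score.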


Known Potential Bugs

  1. Molecules Processed bug (dock.cpp)
  2. verbose mol stats (amber typer)
  3. molecule being renamed when going into repl even though it's the same molecule

DOCK_CV - Covalent Docking

As of right now, this has only been tested on pose reproduction for two test cases.

Create a sphere file by using the sidechain of the CYS residue. In this case, isolate the sulfur and two carbons, and save them as a pdb. Using the script pdbtosph, convert that pdb to a .sph file to be read by DOCK. Three atoms are required for spheres and orienting.

Prepare the receptor: Remove the covalently bound ligand. Add hydrogens and charge to the receptor. Remove only the CYS sulfur and the next carbon, with their associated hydrogens. Save as mol2.

Prepare the ligand: Remove all of the protein except for the CYS residue. Delete the backbone and some of the sidechain, leaving the sulfur and attached carbon. Add hydrogens and charge, and manually delete the hydrogens added onto the residue sidechain (i.e. sulfur and carbon). Save the ligand as mol2. Open and edit the ligand.mol2 in vi and change the name of the sulfur to D1 and the atomtype to Du. Change the name of the carbon to D2 and the atomtype to Du. Save the ligand mol2.
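
The manual vi edit of the ligand mol2 could also be scripted. The following is a hypothetical helper, not part of DOCK: it assumes the standard Tripos ATOM record layout (id, name, x, y, z, type, ...), takes the atom ids of the sulfur and carbon from the user, and rewrites the column spacing of the lines it edits.

```python
def rename_to_dummy(mol2_text, renames):
    """Rename selected atoms and set their atom type to Du.

    renames maps atom id -> new atom name, e.g. {1: "D1", 2: "D2"}.
    Only lines inside the @<TRIPOS>ATOM section are touched.
    """
    out, in_atoms = [], False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
        elif in_atoms and line.split():
            f = line.split()
            if int(f[0]) in renames:
                f[1], f[5] = renames[int(f[0])], "Du"
                head = "{:>7} {:<8} {:>10} {:>10} {:>10} {:<6} ".format(*f[:6])
                line = head + " ".join(f[6:])
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, if the sulfur is atom 1 and the carbon is atom 2 in ligand.mol2, `rename_to_dummy(text, {1: "D1", 2: "D2"})` returns the edited file contents; check the result in a viewer before docking.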