Developer's Info Goals
DOCK 6.11 Information
Planned additions:
DOCK 6.10 Information
New features include: an enhanced chemical searching method termed molecular evolution DOCK (DOCK_GA), an evolution-based method for ligand construction that employs principles of breeding and mutation (see Prentis et al.); a new fragment library generation function added to the docking protocol; simplex minimization step ramping for enhanced speed during docking; a new scoring function (internal energy score) that allows generation and scoring of molecules without a protein; and a molecular weight smoothing function for de novo design that allows a softer curve of weight distributions in the final ensemble. Secondary score, introduced in 6.1, has been fully removed in this version.
Using this page
This page is designed to be a unified location for development goals we have set for each sampling/experimental type we are able to perform in DOCK. New tasks and goals can be included in the following tables, and can link to their own development pages if need be.
The current page setup includes an Archive page for completed tasks. When a task is deemed complete, please move it to the designated section on the Archive, sign it with your initials, and date it. Any relevant information should also be included in the notes, brief but descriptive (e.g., 'pushed addition/fix to GitHub on MM/DD/YYYY, Commit Hash: ######').
If you have any questions about this page, please direct them to any of the current graduate students or post-docs listed on the Contact page. If they don't know the answer to your inquiry, they will be able to direct you to someone who does.
General DOCK Goals
Tasks | Owner | Notes | |
---|---|---|---|
Web server | Open | Steve | |
Integrated development pipeline, including IDE and unit tests | Brian, John | ||
Add the DUDE systems created by Jiaye, Brian, and Yuchen to the standard DOCK test set | Chris | ||
Create an RNA test set using systems suggested by Al-Hashimi | Rodger, John | ||
HMS Islands | John | Use HMS to determine clusters of best overlaid congeneric poses | |
Anchor names | Brock | Output anchor names alongside molecule number when writing out conformers |
DOCK_VS - Virtual Screening/Traditional Docking
Tasks | Owner | Notes |
---|---|---|
put in warning flag for missing flex defn type instead of segfaulting | ||
update torenv in dock6 beta | JDB | |
Add total conformers sampled | ||
Note in the manual that minimize = 0 in flex.defn is a deprecated feature | LEP | |
Add input checks in the VS protocol scripts to verify that the previous step finalized its MPI routines | ||
SYLVIA Score | ||
determine which library generation outputs are appended rather than overwritten, and change to overwrite | ||
put in best first clustering option for database filter | ||
Consensus score (within descriptor score) | Open | |
Clean GNU warnings | ||
Multigrid footprint text file formatting needs adjustment | LEP | |
Fix minimization issue with perfectly linear (alkyne) compounds | Open | Options: add a dummy atom at 90° as in other codes so the dihedral is defined; treat the hydrogen as part of the carbon or heavy atom (united-atom approach); or flag dihedrals that are undefined or close to 180° as non-rotatable |
HMS Islands | Open | Use HMS to determine clusters of best overlaid congeneric poses |
Anchor names | Brock | Output anchor names alongside molecule number when writing out conformers |
Fuzzy Docking | | Examine multiple conformations and take an average based on fuzzy logic |
Gilson's mining minima | | Another approach to incorporating multiple minima (double check this) |
DOCK_DN - De Novo Design
Current Coding Progress:
Task | Owner | Notes |
---|---|---|
Timing for each anchor - why is it the way it is? | Someone | Goal for 6.11? |
Make the verbose output more readable, and include dev verbose | John | Goal for 6.11 |
Cleanse GA from the standard denovo file. | John | Goal for 6.11 |
Remove all the unnecessary and commented out code | John | Goal for 6.11 |
Completed molecules concatenation | | Need to remove, warn about, or otherwise handle the concatenation issue |
Fragment frequency selection | John | Currently testing |
Roulette torsion acceptance | John | Currently testing |
Fragment pair selection of fragments matrix method | John | Currently implementing |
Dynamic reference | Chris | |
Addition of 3-mer combination fragment checks | John, kinda | Post torsion |
Adding chirality into internal DOCK fingerprints | Brock | Would help w/ uniqueness pruning and fragmentation |
Uniqueness pruning postprocess | John | |
Uniqueness pruning during growth | Brock | |
MPI integration - each anchor/all anchors | Brock | |
Using RDKit for guided growth - QED, SynthA, etc. | Pak, Guilherme | Planned for 6.11 |
When minimizing with descriptor score, make sure fingerprint is turned off | xxx | |
HMS needs to be fixed when no heavy atoms matching. | ||
Add print out anchor with frequency option into fraglib code. | ||
Using different references for different layers of dn growth (GFPS protocol) Guided footprint similarity - divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth) | ||
Min and max formal charge. Step down as layers of growth proceed | | FC = 4 for layers 1-3, FC = 3 for layers 4-5, etc. |
Stereo centers / volume overlap pruning | ||
Capping group functions (H, CH3, NH2, NH3, Halogen) | ||
Incorporate GA at the end of each layer (not easy) | ||
Monte carlo algorithm that checks bond frequency | ||
Scaling max root / layer size with layer | ||
Select torenv before selecting fragment. Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv. | ||
Add fragname string to restart and dump files, already done for final and fraglib files. | ||
Add ZINC name to torenv table | ||
Unusual behavior during library generation when frequency cutoff == 0 | ||
Print out how many molecules cannot be capped. (Difference between ensemble size and dump.) | ||
building from anchor 0 -> building from scf.98 | ||
Keep tables of which fragments (and torsion types) are already included in a growing molecule (i.e., the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to keep from over-including or under-including certain fragment and bond types (essentially, use data mining to help us build only molecules within certain boundaries) | ||
De novo design with scaled VDW parameters. Exaggerate them and ramp them down, or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem. | ||
Change the variable storing the fragment name to something more meaningful | | The variable is currently named energy |
Chair vs. boat conformations and stereochemistry: only the first is picked, and the rest of the isomers/conformations are not accounted for. Should keep those. |
DOCK_GA - Genetic Algorithm
General To-Do
Tasks | Owner | Notes |
---|---|---|
Adaptive maintenance of ensemble based on convergence (extinction, delta max, etc.). | BTB | |
Add leading 0's to xover output filenames | JDB | |
DNM replacement unable to build list? | JDB | |
Multi-layer replacement for Amides | JDB | |
Compute delta slope of fitness score | BTB | |
slow down molecular evolution so there are less drastic changes between each successive generation - "delta max/max change" | BTB | |
bring in new parents (e.g. from a pool of molecules) based on convergence - "migration" | BTB | |
user defined point vs on-the-fly convergence | BTB | |
metropolis selection for tournament/roulette | BTB | |
tanimoto coefficient percent change - might be inaccurate due to tan coef behavior | ||
Rotatable bond changes (???) | | Still unsure what this means - JDB |
Limit number of aromatic rings. | ||
Adaptively keep differing #'s of parents and children based on some internal criteria (similarity?) | ||
Determine whether the dummy vdw file is necessary for all DN (including GA) or just DN. | ||
Change the variable storing the mutation name to something more meaningful | | The variable is currently named energy |
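Several of the tasks above (delta slope of fitness score, on-the-fly convergence) depend on measuring how fast the best fitness is still changing. A minimal sketch of one way to do that follows, using a least-squares slope over the most recent generations. The function names, the window size, and the tolerance are all hypothetical and not taken from the DOCK_GA source.

```python
def fitness_slope(best_scores, window=5):
    """Least-squares slope of the last `window` best-fitness values,
    in score units per generation. Returns 0.0 with fewer than 2 points."""
    ys = best_scores[-window:]
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def has_converged(best_scores, window=5, tol=0.01):
    """Report convergence once a full window is available and the
    magnitude of the slope drops below tol (a hypothetical cutoff)."""
    return (len(best_scores) >= window and
            abs(fitness_slope(best_scores, window)) < tol)
```

An on-the-fly check like `has_converged` could trigger the adaptive steps listed in the table (extinction, migration of new parents), while a user-defined convergence point would simply bypass it.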
Mutations
Tasks | Owner | Notes |
---|---|---|
Adaptive mutation rate | Brock | Change rate of mutation based on some internal criteria |
Pick location of mutation based on some internal criteria | ||
Pick mutation type based on ensemble behavior. | ||
If molecules are too large, boost deletion (useful for elitism). | ||
If molecules are too small, boost additive mutations. | ||
If molecules are too similar, boost replacements and substitutions. | ||
Mutation type selection based on probability vs. ensemble | ||
Change chance of performing a mutation type based on number already tried | | Higher success = fewer attempts, lower success = more attempts, etc. |
2 layer replacements | ||
Multilayer substitutions | | Could be kinda difficult or may not even work |
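The ensemble-driven rules in the table (boost deletion when molecules are too large, additions when too small, replacements and substitutions when too similar) can be sketched as a reweighting of mutation-type probabilities. This is only an illustration under stated assumptions: the size thresholds, similarity cutoff, and boost factor are hypothetical, and the ensemble statistics are assumed to be computed elsewhere.

```python
def mutation_weights(mean_heavy_atoms, mean_tanimoto,
                     min_size=15, max_size=35,
                     max_similarity=0.8, boost=2.0):
    """Return normalized selection probabilities for each mutation type.
    All thresholds and the boost factor are hypothetical defaults."""
    weights = {"addition": 1.0, "deletion": 1.0,
               "replacement": 1.0, "substitution": 1.0}
    if mean_heavy_atoms > max_size:
        weights["deletion"] *= boost       # molecules too large
    elif mean_heavy_atoms < min_size:
        weights["addition"] *= boost       # molecules too small
    if mean_tanimoto > max_similarity:
        weights["replacement"] *= boost    # ensemble too similar
        weights["substitution"] *= boost
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

The returned probabilities could feed a roulette-style choice of mutation type each generation, recomputed as the ensemble changes.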
Crossover
Tasks | Owner | Notes |
---|---|---|
Crossover on multiple points simultaneously, rather than just 1 | ||
Nonexhaustive crossover - pick subset based on probability | ||
Nonexhaustive crossover for each pair of parents | | Try a specific number of bonds rather than exhaustively sampling all of them. |
Guided crossover based on score | | Good v Good, Bad v Good, Bad v Bad |
Choose crossover pairs based on measure of similarity (boost crossovers between dissimilar molecules). |
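The similarity-based pairing idea above could be combined with nonexhaustive sampling, roughly as in the sketch below. It assumes each parent is represented by a set of fingerprint bits and weights each pair by 1 minus its Tanimoto coefficient; the function names and the set-based fingerprint representation are assumptions for illustration, not the DOCK fingerprint implementation.

```python
import itertools
import random

def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of fingerprint bits."""
    union = len(bits_a | bits_b)
    return len(bits_a & bits_b) / union if union else 1.0

def pair_weights(fingerprints):
    """Map each parent pair (i, j) to a weight of 1 - Tanimoto,
    so dissimilar parents are favored for crossover."""
    return {(i, j): 1.0 - tanimoto(fingerprints[i], fingerprints[j])
            for i, j in itertools.combinations(range(len(fingerprints)), 2)}

def pick_pairs(fingerprints, n_pairs, seed=0):
    """Sample n_pairs parent pairs (with replacement) instead of
    crossing every pair exhaustively."""
    weights = pair_weights(fingerprints)
    pairs = list(weights)
    rng = random.Random(seed)
    if sum(weights.values()) == 0:
        # All parents identical: fall back to uniform sampling.
        return rng.choices(pairs, k=n_pairs)
    return rng.choices(pairs, weights=[weights[p] for p in pairs], k=n_pairs)
```

With this weighting, a pair of chemically identical parents has zero weight and is never selected unless every pair is identical.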
Fitness/Pruning/Diversity/Other
Tasks | Owner | Notes |
---|---|---|
Turn on and off niching adaptive/extinction | Brock | |
Reduce boost of fragments and all poor mols with niching | ||
Pareto/multiobjective ga | ||
Best-first pruning: currently uses descriptor score even when niching is on; need to change to fitness/niching when used | Brock | |
Geometric diversity using Hungarian and Tan pruning | Brock | |
Horizontal pruning issue not catching chemically identical molecules | LEP |
Known Potential Bugs
- Molecules Processed bug (dock.cpp)
- verbose mol stats (amber typer)
- molecule being renamed when going into repl even though it's the same molecule
DOCK_CV - Covalent Docking
Currently tested only on pose reproduction for two test cases.
Create a sphere file by using the sidechain of the CYS residue. In this case, isolate the sulfur and two carbons, and save them as a pdb. Using the script pdbtosph, convert that pdb to a .sph file to be read by DOCK. Three atoms are required for spheres and orienting.
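The atom isolation in the step above can be scripted instead of done by hand. The sketch below is one possible approach, assuming standard fixed-column PDB records and assuming the three atoms are the cysteine SG, CB, and CA (the text only says sulfur, carbon, and carbon, so these names are a guess). The function name and arguments are hypothetical.

```python
def extract_cys_atoms(pdb_lines, chain, resseq, atom_names=("SG", "CB", "CA")):
    """Return the ATOM/HETATM records for the chosen atoms of one CYS residue.
    atom_names defaults to SG, CB, CA, which is an assumption about
    which sulfur and carbons are meant."""
    kept = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()      # atom name, columns 13-16
        resname = line[17:20].strip()   # residue name, columns 18-20
        chain_id = line[21]             # chain identifier, column 22
        seq = int(line[22:26])          # residue number, columns 23-26
        if (resname == "CYS" and chain_id == chain
                and seq == resseq and name in atom_names):
            kept.append(line.rstrip("\n"))
    return kept
```

The returned lines can be written to a small pdb file and then passed to pdbtosph as described above.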
Prepare the receptor: Remove the covalently bound ligand. Add hydrogens and charge to the receptor. Remove only the CYS sulfur and the next carbon, along with their associated hydrogens. Save as mol2.
Prepare the ligand: Remove all of the protein except for the CYS residue. Delete the backbone and some of the sidechain, leaving the sulfur and attached carbon. Add hydrogens and charge, and manually delete the hydrogens added onto the residue sidechain (i.e., the sulfur and carbon). Save the ligand as mol2. Open and edit the ligand.mol2 in vi and change the name of the sulfur to D1 and the atom type to Du. Change the name of the carbon to D2 and the atom type to Du. Save the ligand mol2.
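The manual vi edit of the ligand mol2 could also be scripted. Below is a minimal sketch, assuming a whitespace-delimited Tripos mol2 file in which the sulfur and carbon can be identified by their current atom names (for example SG and CB, which are assumptions about the input file). The function name is hypothetical, and the column widths used when rewriting a line are a formatting choice, not a DOCK requirement.

```python
def rename_to_dummy(mol2_text, renames):
    """Rename atoms in the @<TRIPOS>ATOM section and set their type to Du.
    renames maps the current atom name to the new one,
    e.g. {"SG": "D1", "CB": "D2"}."""
    out = []
    in_atoms = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Track whether we are inside the ATOM record section.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6 and fields[1] in renames:
            fields[1] = renames[fields[1]]   # atom name -> D1 / D2
            fields[5] = "Du"                 # atom type -> dummy
            line = "{:>7} {:<8} {:>10} {:>10} {:>10} {:<5} {}".format(
                *fields[:6], " ".join(fields[6:])).rstrip()
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `rename_to_dummy(open("ligand.mol2").read(), {"SG": "D1", "CB": "D2"})` would return the edited text, which can then be written back out as the ligand mol2. Check the result by eye before docking, since atom names vary between preparation tools.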