Revision as of 21:23, 21 October 2022

DOCK 6.11 Information

General DOCK Goals

DOCK_VS - Virtual Screening/Traditional Docking

Tasks	src	Owner	Complete?
put in warning flag for missing flex defn type instead of segfaulting			no
update torenv in dock6 beta	fraglib_torenv.dat	JDB	no
Add total conformers samples
Note in the manual that minimize = 0 in flex.defn is a deprecated feature	manual	LEP	no
Add input checks to the VS protocol scripts to verify that the previous step finalized its MPI routines			no
SYLVIA Score			no
determine which library generation outputs are appended rather than overwritten, and change to overwrite			no
put in best first clustering option for database filter			no
Web server		Open	No
Consensus score (within descriptor score)		Open	No
Clean GNU warnings			No
Multigrid footprint text file formatting needs adjustment		LEP	No
Add the DUDE systems created by Jiaye, Brian, and Yuchen to the standard DOCK test set			No
Create an RNA test set using systems suggested by Al-Hashimi		Rodger, John	No
Fix minimization issue with perfectly linear (alkyne) compounds	Options: add a dummy atom at 90° as in other codes so the dihedral is defined; treat the hydrogen as part of the carbon or heavy atom (united-atom approach); or flag dihedrals that are undefined or close to 180° as non-rotatable	Open	No
HMS Islands	Use HMS to determine clusters of best overlaid congeneric poses	Open	No
Anchor names	Output anchor names alongside molecule number when writing out conformers	Brock	No
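
One of the options listed for the linear (alkyne) minimization task is to flag ill-defined dihedrals as non-rotatable. The sketch below is a minimal Python illustration, not DOCK code. It assumes Cartesian coordinates for the four atoms a-b-c-d, reads "close to 180°" as referring to the bond angles flanking the central bond, and uses a hypothetical 5° tolerance; the function names are made up for illustration.

```python
import math

def bond_angle_deg(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    v1 = [p[i] - q[i] for i in range(3)]
    v2 = [r[i] - q[i] for i in range(3)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_t))

def is_rotatable(a, b, c, d, tol_deg=5.0):
    """Return False when the a-b-c-d dihedral is ill-defined.

    The dihedral is treated as ill-defined when either flanking bond
    angle (a-b-c or b-c-d) is within tol_deg of 180 degrees.
    tol_deg is a hypothetical tolerance, not a DOCK parameter.
    """
    for angle in (bond_angle_deg(a, b, c), bond_angle_deg(b, c, d)):
        if abs(180.0 - angle) < tol_deg:
            return False
    return True
```

A collinear a-b-c triple (as in an alkyne) makes `is_rotatable` return False, so the bond would be skipped during torsion minimization rather than producing an undefined dihedral.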

Tasks	src	Owner	Complete?
verbose ==2 option in dock6 beta	utils.cpp	LEP	yes
Add total conformers samples
Check amide bond rotation during sampling - it's not a bug; it was fixed back in 2014		LEP	yes
Write out # of HBond Donors and Acceptors	conf_gen_dn, library_file	LEP	yes
put in compiler directives to compile with or without timespec	dock.cpp	LEP	yes
Fix bug that prints out 2/3 sigfigs instead of 6 for MW and FC	library_file, filter, amber_typer	LEP	Yes
Fix nano/micro/millisecond timer	dock.cpp	GDRM	Yes
ga flag and verbose == 2 for premin_mol in simplex	simplex.cpp	LEP	Yes
Merge Hackathon changes to beta for clean faster code	pow/memcpy/mpi pointers everywhere	LEP	Yes
Add Tip3p atom type to dock	vdw.defn fingerprint	LEP	Yes
Hide secondary scoring function permanently	lots	LEP	Yes
Merge GIST into latest dock	grid, master_score, score_descriptor, score_gist	LEP	Yes
Add second layer of verbosity	utils, conf_gen_dn so far	LEP	Yes
RDKit integration with DOCK		GDM	Yes
Modify Grid to show error on nonintegrality		BTB	Yes

DOCK_DN - De Novo Design

This is the Rizzo lab wiki page for coordinating bugs and progress on the de novo project.

Valgrind clean version of the code on cluster that Rizzo lab should be using:

Lauren:

/gpfs/projects/rizzo/zzz.programs/dock6.9_release
This version includes all changes of the merge.

Path to Generic Fragment Library:

/gpfs/projects/rizzo/leprentis/gen-frags-12

Path to Frequency Anchors:

/gpfs/projects/rizzo/leprentis/zinc1_ancs_freq

Current Coding Progress:

Working on these currently:

John: Implement fragment frequency picking as an option
John: Professional web page
John: Implement Adjacency Matrix into fraglib/dn (initialize matrix and utilize matrix for graph and random fragment picking)
Chris: Guided FPS
Lauren: Covalent anchor for denovo growth
Guilherme: rdkit implementation for logp etc

Need to be fixed/added:

methyl and amine capping groups
add print out anchor with frequency option into fraglib code
MPI option for each anchor
aromatic rings
QED
fraglib generation chirality issue
chiral centers
score_molecules and internal_energy problem (for simple_build)
HMS needs to be fixed for the case where no heavy atoms match

Not working on these right now:

Addition of "3mer" combination fragment check (post tors check)
Min and max formal charge to replace absolute value of charge (broke everything). Step down as layers of growth proceed (layers 1-3 FC = 4, layers 4-5 FC = 3, layers 6-8 FC = 2)
Capping groups for post growth process (halogens and methyls)
Incorporate Tanimoto pruning as a final post-growth step and user option (replacing the make_unique script), implemented as a database filter rather than in dn
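
The formal charge step-down noted in the list above can be sketched as a small lookup. This is a hedged Python illustration only: the function names are hypothetical, symmetric min/max bounds are an assumption, and layers beyond 8 are assumed to keep the tightest cutoff since the note does not cover them.

```python
def charge_bounds(layer):
    """Return (min_fc, max_fc) allowed at a given growth layer.

    Schedule from the note: layers 1-3 FC = 4, layers 4-5 FC = 3,
    layers 6-8 FC = 2. Symmetric bounds are an assumption.
    """
    if layer < 1:
        raise ValueError("layer numbering starts at 1")
    if layer <= 3:
        limit = 4
    elif layer <= 5:
        limit = 3
    else:
        limit = 2  # layers 6-8; later layers assumed to reuse this cutoff
    return (-limit, limit)

def passes_charge_filter(formal_charge, layer):
    """True when the molecule's formal charge lies within the layer's bounds."""
    low, high = charge_bounds(layer)
    return low <= formal_charge <= high
```

Returning a (min, max) pair rather than a single absolute value leaves room for asymmetric bounds later, which is what replacing the absolute-value check would allow.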

Completed:

~~Lauren: hbond accept/donor descriptor implementation~~
~~Chris: increase orienting verbose statistics for dn~~
~~John: acceptance based on freq of torsenv~~
~~Lauren&John: secondary torenv check of prune dump molecules and testing~~
~~Lauren&John: SMILES and ZINC script (for dn and ga)~~
~~Lauren: add dn name with date and counter function~~
~~Lauren: Check MGS+(-50)TAN before and after fingerprinting fix for 663 systems~~
~~Lauren: determine if random seed is reset for each aps~~
~~Lauren: Create testset for each dn function~~
~~Lauren: Test simple build function with merged de novo~~
~~Lauren&Stephen: clean make_unique script for release~~
~~Lauren: merge GA into dock/dn~~
~~Dwight & Lauren: MPI wrapper for 192 processors (8 nodes) for testsets on rizzo cluster~~
~~Lauren: Create short testsets for denovo frag gen, focused fragment generic for DOCK6.9 release~~
~~Dwight+Lauren: merge parameter files of de novo with DOCK~~
~~Lauren: add dn_defn file for separate defn with Hydrogens~~
~~Lauren: Implement csingleton fix for orienting fragments with less than 3 heavy atoms~~
~~Lauren: Test bfochtman fix for rotatable bonds within a user-defined anchor~~
~~Lauren: Test csingleton fix for orienting fragments with Du~~
~~Lauren: Test MGS focused fragment library results with dn paper~~
~~Stephen: editing script to calculate SMILES string of de novo molecules in OpenBabel~~
~~Stephen: smooth function cutoff for mw~~
~~Lauren&John: Rework VS protocol to integrate de novo protocol more smoothly~~
~~Lauren&John: Fix torsion problem for prune_dump molecules~~

List of features that we definitely want for the 6.9 release:

Task	Owner	Complete?
When minimizing with descriptor score, make sure fingerprint is turned off	xxx
Speed up fingerprint calculations by saving reference ligand as a permanent object	WJA	yep
Add pre-min conformations to growth trees	WJA	yep
Add verbose flag options	WJA	yep
Put molecular properties (RB, MW, etc) in mol2 header	WJA	yep
Put ensemble properties (RB, MW, etc) output stream at the end of each layer	WJA	yep
Check formal charge prune	BCF	yep
Combination of horizontal pruning metrics (let's consider dropping tanimoto prune and just using hungarian prune)	WJA	yep
Finish implementing growth trees	WJA	yep
Revisit orienting to make sure it is working as intended	WJA	yep
Fixed a bug where we were marking scaffold_this_layer as true for any fragment	WJA	yep
Update random sampling function to use last layer changes in graph function	WJA	yep
Do that same thing for the exhaustive function	WJA	yep
I don't think we ever clear the scaf_link_sid vector, we definitely should do that somewhere	WJA	yep
Update exhaustive to combine all frags into one library, just like graph / random.	WJA	yep

List of features/ideas for future releases:

Use different references for different layers of dn growth (GFPS protocol, guided footprint similarity): divide the reference into smaller pieces (layers) to help guide the growth paths more efficiently (i.e. directed growth)
Stereo centers / volume overlap pruning
Capping group functions (H, CH3, Halogen)
Incorporate GA at the end of each layer (not easy)
Overhaul the simple-build function
Monte carlo algorithm that checks bond frequency
Scaling max root / layer size with layer
Select torenv before selecting fragment. Will need to overhaul fraggraph, will keep us from needing to assemble mols that will not pass torenv.
Add fragname string to restart and dump files, already done for final and fraglib files.
Add ZINC name to torenv table
Unusual behavior during library generation when frequency cutoff == 0
Print out how many molecules cannot be capped. (Difference between ensemble size and dump.)
building from anchor 0 -> building from scf.98
Possible torenv check for dump molecules after capping before printing.
Keep tables of which fragments (and torsion types) are already included in a growing molecule (i.e. the name string has this info) and only accept a new fragment (or torsion type) within certain ranges and probabilities. In other words, use knowledge of chemical makeup probabilities to avoid over-including or under-including certain fragment and bond types (essentially, use data mining to build only molecules within certain boundaries)
De novo design with scaled VDW parameters. Exaggerate them and ramp them down, or vice versa. May help to eliminate the anchor and slop or anchor and slosh problem.

List of SB2012 systems that we will use for tests:

For now, let's use systems with 5-15 rotatable bonds inclusive; total = 709 systems ("drug-like" size molecules). The de novo paper used only 663 of these, after removing 46 systems where the cognate ligand did not fall within a +/-2 formal charge. (5through15 = 709, 5through15_ch2 = 663)

{5RB = 107; 6RB = 96; 7RB = 103; 8RB = 75; 9RB = 66; 10RB = 75; 11RB = 57; 12RB = 41; 13RB = 38; 14RB = 26; 15RB = 25}
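
As a quick consistency check on the counts above, the per-rotatable-bond tallies can be summed and the 46 charge-filtered systems subtracted. The variable names here are just for illustration.

```python
# Number of SB2012 systems per rotatable-bond count, copied from the list above.
systems_by_rb = {
    5: 107, 6: 96, 7: 103, 8: 75, 9: 66, 10: 75,
    11: 57, 12: 41, 13: 38, 14: 26, 15: 25,
}

total = sum(systems_by_rb.values())   # the 5through15 set
charge_filtered = total - 46          # the 5through15_ch2 set

print(total, charge_filtered)  # prints: 709 663
```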

DOCK_GA - Genetic Algorithm

Tasks	src	Owner	Complete?
Horizontal pruning issue not catching chemically identical molecules		LEP	no
Fix bug that collapsed atom coordinates	everywhere? nowhere? somewhere.	LEP	YES MUAHAHAHAHA!
Added Delimiter header	conf_gen_ga	LEP	Yes
Fix xover only feature	conf_gen_ga	LEP	Yes
Put in error messages for mut_rate > 1	conf_gen_ga	LEP	Yes
Manual user-defined mutation type	conf_gen_ga	LEP	Yes
Remove check only option	conf_gen_ga	LEP	Yes
Add single molecule evolution in testcase in install dir.	install/test/genetic	LEP	yes
Add leading 0's to xover output filenames	conf_gen_ga.cpp	JDB	no
DNM replacement unable to build list?	conf_gen_ga.cpp	JDB	no
Multi-layer replacement for Amides	conf_gen_ga.cpp	JDB	no
Compute delta slope of fitness score	conf_gen_ga.cpp	BTB	no
slow down molecular evolution so there are fewer drastic changes between each successive generation	conf_gen_ga.cpp	BTB	no
bring in new parents (e.g. from a pool of molecules) based on convergence	conf_gen_ga.cpp	BTB	no
user defined point vs on-the-fly convergence	conf_gen_ga.cpp	BTB	no
metropolis selection for tournament/roulette	conf_gen_ga.cpp	BTB	no
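
The "delta slope of fitness score" and on-the-fly convergence tasks in the table above could be prototyped along these lines. This is a minimal Python sketch under stated assumptions, not the conf_gen_ga implementation: it assumes one best-fitness value is recorded per generation, and the `window` and `tol` parameters and both function names are hypothetical.

```python
def fitness_slope(best_scores, window=5):
    """Least-squares slope of the last `window` best-fitness values."""
    ys = best_scores[-window:]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two generations")
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def has_converged(best_scores, window=5, tol=0.05):
    """Treat the run as converged when the recent slope is nearly flat."""
    if len(best_scores) < window:
        return False
    return abs(fitness_slope(best_scores, window)) < tol
```

A check like `has_converged` could then serve as the trigger for bringing in new parents from a pool, while a fixed generation number would correspond to the user-defined convergence point.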

To Do List

tanimoto coefficient percent change - might be inaccurate due to tan coef behavior
Rotatable bond changes (???)
Limit number of aromatic rings.
-xover (guided based on score) - Good v Good ; Bad v Good ; Bad v Bad THIS
Nonexhaustive xover (pick subset of xover based on probability)
Nonexhaustive xover for each pair of parents - have a set number of bonds for xover rather than trying all of them.
Crossover on multiple points simultaneously (2-3 crossovers on a given pair to make 3+ children rather than 2+).
Adaptive maintenance of ensemble based on convergence (extinction, delta max, etc.).
Stop convergence.
Mutations-
1. Adaptive mutation rate - change rate of mutation based on some internal criteria.
2. Pick location of mutation based on some internal criteria.
3. Pick mutation type based on ensemble behavior.
4. If molecules are too large, boost deletion (useful for elitism).
5. If molecules are too small, boost additive mutations.
6. If molecules are too similar, boost replacements and substitutions.
7. mutation type selection based on probability vs ensemble
8. Track how many mutations of each type have been completed so far, and make over-used types less prevalent, etc.
9. Note: 3 layer substitutions probably aren't going to work.
10. 2-layer replacements.
fitness-
1. turn on and off niching adaptive/extinction
2. reduce boost of fragments and all poor mols with niching
3. pareto/multiobjective ga
selection-
1. Adaptively keep differing #'s of parents and children based on some internal criteria (similarity?)
extinction-
which molecules are best-
1. best-first pruning - currently uses descriptor score even when niching is on; needs to change to fitness/niching score when niching is used
2. geometric diversity using Hungarian and Tan pruning
Determine whether the dummy vdw file is necessary for all DN (including GA) or just DN.
Choose crossover pairs based on measure of similarity (boost crossovers between dissimilar molecules).
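
The adaptive mutation ideas in the list above (boost deletion when molecules are too large, additions when too small, replacements and substitutions when too similar) can be sketched as a weight table. This is a hedged Python illustration only: the thresholds, the `boost` factor, the mutation-type labels, and the function names are all assumptions, not DOCK parameters.

```python
import random

def mutation_weights(mean_heavy_atoms, mean_tanimoto,
                     max_atoms=40, min_atoms=15,
                     max_similarity=0.8, boost=2.0):
    """Return relative selection weights per mutation type for an ensemble."""
    weights = {"addition": 1.0, "deletion": 1.0,
               "replacement": 1.0, "substitution": 1.0}
    if mean_heavy_atoms > max_atoms:
        weights["deletion"] *= boost        # molecules too large
    elif mean_heavy_atoms < min_atoms:
        weights["addition"] *= boost        # molecules too small
    if mean_tanimoto > max_similarity:
        weights["replacement"] *= boost     # ensemble too similar
        weights["substitution"] *= boost
    return weights

def pick_mutation(weights, rng=random):
    """Draw one mutation type with probability proportional to its weight."""
    types = list(weights)
    return rng.choices(types, weights=[weights[t] for t in types], k=1)[0]
```

Passing a seeded `random.Random` instance as `rng` keeps the draw reproducible, which helps when comparing runs against a test set.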

Known Bugs

Molecules Processed bug (dock.cpp)
verbose mol stats (amber typer)
molecule being renamed when going into repl even though it's the same molecule

DOCK_CV - Covalent Docking

As of right now only tested on pose reproduction for 2 testcases.

Create a sphere file using the side chain of the CYS residue. In this case, isolate the sulfur and the two carbons, and save them as a pdb. Using the script pdbtosph, convert that pdb to a .sph file to be read by DOCK. Three atoms are required for spheres and orienting.

Prepare receptor: Remove the covalently bound ligand. Add Hydrogens and charge to the receptor. Remove the CYS sulfur and the next carbon with their associated Hydrogens only. Save as mol2.

Prepare the ligand: Remove all of the protein except for the CYS residue. Delete the backbone and some of the side chain, leaving the sulfur and attached carbon. Add hydrogens and charge, and manually delete the hydrogens added onto the residue side chain (i.e. on the sulfur and carbon). Save the ligand as mol2. Open the ligand.mol2 in vi and change the name of the sulfur to D1 and its atom type to Du. Change the name of the carbon to D2 and its atom type to Du. Save the ligand mol2.
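
The manual mol2 edit in the ligand step above could also be scripted. The following is a minimal Python sketch, not a Rizzo lab script: it assumes whitespace-delimited Tripos ATOM records (atom id, name, x, y, z, atom type, then optional fields) and assumes the sulfur and carbon carry the standard PDB names SG and CB, which may not hold for every input file. The function name is hypothetical.

```python
def mark_attachment_atoms(mol2_text, sulfur_name="SG", carbon_name="CB"):
    """Rename the sulfur to D1 and the carbon to D2, both with atom type Du."""
    renames = {sulfur_name: "D1", carbon_name: "D2"}
    out, in_atoms = [], False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only lines inside the ATOM section are candidates for editing.
            in_atoms = line.strip() == "@<TRIPOS>ATOM"
            out.append(line)
            continue
        fields = line.split()
        if in_atoms and len(fields) >= 6 and fields[1] in renames:
            fields[1] = renames[fields[1]]   # atom name
            fields[5] = "Du"                 # atom type
            head = "{:>7} {:<8} {:>10} {:>10} {:>10} {:<6} ".format(*fields[:6])
            line = (head + " ".join(fields[6:])).rstrip()
        out.append(line)
    return "\n".join(out) + "\n"
```

Rewriting only the matching ATOM lines leaves the bond table and charges untouched, which mirrors what the manual vi edit does.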


Difference between revisions of "Developer's Info Goals"
