Make Unique script usage
Latest revision as of 15:35, 6 September 2018
This page covers one component of our post-processing procedure for the de novo algorithm. During a de novo experiment, multiple copies of the same molecule are typically created and saved to the final mol2 file. For visualization and for selecting molecules to purchase, we have found it best to remove the redundant molecules, keeping only the best-scored copy or conformer of each molecule. The make_unique scripts that do this are provided here:
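- make_unique.sh: http://ringo.ams.sunysb.edu/downloads/SB2010/make_unique_scripts/zzz.makeunique_clean_no_comments.sh
- split_on_tanimoto.py: http://ringo.ams.sunysb.edu/downloads/SB2010/make_unique_scripts/split_on_tanimoto_clean_no_comments.py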
Before running, update the working directory, the dock home directory, and the paths to the parameter files in make_unique.sh for your particular environment.
Both components, make_unique.sh and split_on_tanimoto.py, are required to run make_unique.sh, and it is easiest to keep both scripts in the same directory.
The usage is as follows:
./make_unique.sh path/to/multimol2 scoring_function:
- The multimol2 is usually the concatenation of the output mol2 files from each individual anchor in your de novo experiment.
- The scoring function entry is a string that designates which score or scoring function is used to select the "best conformer" when more than one copy of a molecule exists.
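If the per-anchor outputs have not been concatenated yet, one way to build the multimol2 is sketched below. This is an illustration only and not part of the make_unique scripts: the function name `concatenate_mol2` and the file names in the usage comment are hypothetical, and the sketch assumes each per-anchor file is a complete, well-formed mol2 file.

```python
from pathlib import Path


def concatenate_mol2(inputs, output):
    """Write the contents of each input mol2 file, in sorted path order, to `output`."""
    output = Path(output)
    with output.open("w") as out:
        for path in sorted(Path(p) for p in inputs):
            text = path.read_text()
            out.write(text)
            # Make sure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output


# Hypothetical usage; adjust the directory and pattern to your own output names:
# concatenate_mol2(Path("denovo_run").glob("anchor_*.mol2"), "all_anchors.mol2")
```

The resulting file is what would be passed as path/to/multimol2.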
The algorithm works by first binning compounds according to their Tanimoto score to the initial reference. Each bin is then compared internally using a Tanimoto calculation to a new reference, different from the one used initially (the new reference is a compound contained in one of the bins created in the first step). Compounds that have the same Tanimoto score to the initial reference and a Tanimoto of 1 to the new reference have to be identical. Because this two-pass filter first separates compounds by the Tanimoto score already computed to *any* reference, the second pass only needs to compare compounds within the same bin, which greatly reduces the time it requires.
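The two-pass idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in split_on_tanimoto.py: fingerprints are represented as plain Python sets of on-bits, the first-pass Tanimoto is computed here rather than read from the mol2 file, lower scores are assumed to be better, and the names `tanimoto` and `make_unique` are hypothetical.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_unique(molecules, reference_fp, lower_is_better=True, ndigits=4):
    """Keep the best-scored copy of each molecule.

    molecules: list of (name, fingerprint_set, score) tuples.
    """
    # Pass 1: bin compounds by their Tanimoto to the initial reference.
    bins = defaultdict(list)
    for mol in molecules:
        bins[round(tanimoto(mol[1], reference_fp), ndigits)].append(mol)

    def better(x, y):
        return x < y if lower_is_better else x > y

    unique = []
    # Pass 2: inside each bin, pick a member as the new reference and
    # collapse every compound whose Tanimoto to it is exactly 1.
    for key in sorted(bins):
        remaining = bins[key]
        while remaining:
            new_ref = remaining[0]
            best = new_ref
            rest = []
            for mol in remaining[1:]:
                if tanimoto(mol[1], new_ref[1]) == 1.0:
                    if better(mol[2], best[2]):
                        best = mol
                else:
                    rest.append(mol)
            unique.append(best)
            remaining = rest
    return unique


mols = [
    ("a", {1, 2, 3}, -40.0),
    ("a_copy", {1, 2, 3}, -45.0),
    ("b", {1, 2, 4}, -30.0),
    ("c", {7, 8}, -20.0),
]
kept = make_unique(mols, reference_fp={1, 2, 3, 4})
print([m[0] for m in kept])  # ['c', 'a_copy', 'b']
```

In this example "a", "a_copy" and "b" all land in the same first-pass bin (Tanimoto 0.75 to the reference), but only "a" and "a_copy" have a Tanimoto of 1 to each other, so "b" survives as a distinct molecule and "a_copy" is kept as the better-scored duplicate.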
Caveat emptor: this protocol will only work if the multimol2 used as input was scored or rescored using the Tanimoto scoring function to some (any) reference.