3D Analog Library Generation Using Pubchem and Zinc
Latest revision as of 10:36, 22 May 2018
Hello! This short write-up is designed to make it easier for the group, and other users who may stumble across it, to curate a library of compounds analogous to compounds identified experimentally as active, for the purpose of a secondary or follow-up virtual screen. First, we want a list of each active compound's ZINC ID and SMILES string. The example I'll use here is ZINC000019831888, whose SMILES string is OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl.
After collecting the pertinent information for the compounds we're interested in, we can head to PubChem ( https://pubchem.ncbi.nlm.nih.gov ). This will bring up a screen that looks like this:
We want to select the Structure Search bar on the right hand side of the screen:
Then we are taken to this page:
https://pubchem.ncbi.nlm.nih.gov/search/search.cgi
We want to select the Identity/Similarity tab:
That will bring up this screen:
From here we want to select the "CID, SMILES or InChI" tab and paste in our SMILES string. Then we can set the search parameters (Tanimoto similarity greater than 0.80, and compounds only from ZINC):
After clicking search, we will be brought to a screen like this and we can select the "send to" tab on the right hand side:
After clicking create file, a file will be downloaded to our downloads directory and look something like this:
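From here we can grep out all of the ZINC IDs obtained from the search using something along the lines of the command below. Note that the pattern has to include the digits: with -o, a bare 'ZINC' pattern prints only the literal string ZINC for every match, not the IDs themselves.
grep -oE 'ZINC[0-9]+' path_to_summary_file.out > all_zinc_ids.txt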
Here is an example of what all_zinc_ids.txt might look like:
Caveat emptor!: The list of ZINC IDs may contain trailing punctuation (in my experience, semicolons) that should be removed before querying ZINC. This can be done fairly simply using awk or sed. It is best to look over each list of ZINC IDs individually, provided the lists aren't prohibitively large.
Depending on the size of the analog library collected (that is, the number of ZINC IDs in the list), we can break the list up into chunks of 1000 IDs each. This can be done using the split command from the terminal.
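As a minimal sketch of the cleanup step, the sed call below strips trailing semicolons, commas, and whitespace from each line. The demo input is built inline so the example runs on its own; the second and third IDs are made up, and the output name all_zinc_ids_clean.txt is only a suggestion. Removing duplicates with sort -u is an optional extra that is not part of the original procedure.

```shell
# Demo list: only the first ID is real; the other two are made up for illustration.
printf 'ZINC000019831888;\nZINC000000000001;\nZINC000000000002\n' > all_zinc_ids.txt

# Remove trailing semicolons, commas, and whitespace from every line,
# then sort the list and drop duplicate IDs (optional).
sed 's/[;,[:space:]]*$//' all_zinc_ids.txt | sort -u > all_zinc_ids_clean.txt

cat all_zinc_ids_clean.txt
```

It is still worth opening the cleaned file and checking it by eye before uploading it to ZINC.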
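The chunking step might look something like the sketch below. The 2500 IDs are generated placeholders standing in for a real list, and the zinc_chunk_ prefix is an arbitrary name chosen for this example.

```shell
# Demo input: 2500 made-up IDs standing in for a real list of analogs.
awk 'BEGIN { for (i = 1; i <= 2500; i++) printf "ZINC%012d\n", i }' > all_zinc_ids.txt

# Write chunks of at most 1000 lines each: zinc_chunk_aa, zinc_chunk_ab, ...
split -l 1000 all_zinc_ids.txt zinc_chunk_

# Confirm how many IDs ended up in each chunk.
wc -l zinc_chunk_*
```

Each resulting chunk file can then be uploaded to ZINC as its own query.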
From here our library curation turns to the ZINC database to download the 3-dimensional structures for docking.
If we are on the zinc15 substances page ( http://zinc15.docking.org/substances/home/ ) we can click the "choose file" tab under "search using many" and upload our all_zinc_ids.txt file to start a query. The only parameter to change is that we want our output in mol2 format rather than summary format (see picture):
After selecting the appropriate list of ZINC IDs and selecting the mol2 output, you can click the "Search many" box. This will eventually begin downloading a file to your downloads directory, usually called resolved-3.mol2 or something along those lines. It is important to move this resulting mol2 file to an appropriate directory and rename it accordingly!
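One possible way to handle the move and rename from the terminal is sketched here. The destination directory and the new file name are hypothetical, and the first step only creates a tiny stand-in for the browser download so the example runs on its own; with a real download, skip that step and point the variable at the actual file.

```shell
# Stand-in for the downloaded file (two dummy records; the second ID is made up).
downloaded="resolved-3.mol2"
printf '@<TRIPOS>MOLECULE\nZINC000019831888\n@<TRIPOS>MOLECULE\nZINC000000000001\n' > "$downloaded"

# Hypothetical destination directory and descriptive file name.
dest_dir="analog_library/ZINC000019831888"
mkdir -p "$dest_dir"
mv "$downloaded" "$dest_dir/analogs_chunk_aa.mol2"

# Each molecule record in a mol2 file starts with an @<TRIPOS>MOLECULE line,
# so counting those lines gives the number of structures in the file.
grep -c '@<TRIPOS>MOLECULE' "$dest_dir/analogs_chunk_aa.mol2"
```

Comparing that count against the number of IDs submitted is a quick way to see whether every ID was resolved.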









