3D Analog Library Generation Using PubChem and ZINC
Hello! This short write-up is designed to make it easier for the group, and for other users who may stumble across it, to curate a library of compounds analogous to compounds identified experimentally as active, for use in a secondary or follow-up virtual screen. First, we want a list of each active compound's ZINC ID and SMILES string. The example I'll use here is ZINC000019831888, whose SMILES string is: OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl.
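If you are tracking more than a handful of actives, it can help to keep the ZINC ID and SMILES pairs in a small text file and load them with a script. The snippet below is only a minimal sketch: the function name parse_actives, the "ZINC_ID SMILES" line layout, and the assumption that a ZINC ID is "ZINC" followed by digits are my own choices for illustration, not something required by ZINC or PubChem.

```python
import re

# Assumption: a ZINC ID is the prefix "ZINC" followed only by digits,
# as in the example ID used in this write-up.
ZINC_ID_PATTERN = re.compile(r"^ZINC\d+$")


def parse_actives(lines):
    """Parse 'ZINC_ID SMILES' lines into a {zinc_id: smiles} dictionary.

    Blank lines and lines starting with '#' are skipped.
    """
    actives = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'ZINC_ID SMILES', got: {line!r}")
        zinc_id, smiles = parts
        if not ZINC_ID_PATTERN.match(zinc_id):
            raise ValueError(f"does not look like a ZINC ID: {zinc_id!r}")
        actives[zinc_id] = smiles
    return actives


# The example active from this write-up.
EXAMPLE_LINES = [
    "# zinc_id smiles",
    "ZINC000019831888 OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl",
]

actives = parse_actives(EXAMPLE_LINES)
```

Keeping the pairs in a dictionary makes it easy to loop over each active later and run one similarity search per SMILES string.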
After collecting the pertinent information for the compounds we're interested in, we can head to https://pubchem.ncbi.nlm.nih.gov. This will bring up a screen that looks like this:
We want to select the Structure Search bar on the right-hand side of the screen:
Then we are taken to this page:
https://pubchem.ncbi.nlm.nih.gov/search/search.cgi
We want to select the Identity/Similarity tab:
That will bring up this screen:
From here we want to select the CID, SMILES or InChI tab, paste in our SMILES string, and then set some parameters (Tanimoto similarity greater than 0.80 and compounds only from ZINC):
This will bring up a screen like this, where we can select the Send to tab on the right-hand side:
We can select the following options from the resulting menu:
Finally, this generates a list of SMILES strings for molecules analogous to the original query:
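For anyone who would rather script the similarity search than click through the web pages, PubChem also offers a programmatic interface, PUG REST, with a fastsimilarity_2d operation that takes a Threshold (in percent) and a MaxRecords option. The sketch below is not the procedure used in this write-up, just a rough equivalent under a few assumptions: the function names are my own, the SMILES string is percent-encoded into the URL path (sending it in a POST body is an alternative if that gives trouble), and the request returns PubChem CIDs rather than SMILES strings. It also does not reproduce the "compounds only from ZINC" filter from the web form, so the hits would still need to be filtered by source afterwards.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_similarity_url(smiles, threshold=80, max_records=100):
    """Build a PUG REST 2D-similarity URL for a SMILES query.

    threshold is the Tanimoto cutoff in percent (80 corresponds to 0.80).
    """
    # Percent-encode everything, since SMILES uses characters such as
    # '=', '(' and ')' that are not safe in a URL path.
    encoded = quote(smiles, safe="")
    options = urlencode({"Threshold": threshold, "MaxRecords": max_records})
    return (
        f"{PUG_REST}/compound/fastsimilarity_2d/smiles/"
        f"{encoded}/cids/JSON?{options}"
    )


def parse_cids(payload):
    """Extract the CID list from a decoded PUG REST 'cids/JSON' response."""
    return list(payload.get("IdentifierList", {}).get("CID", []))


def fetch_similar_cids(smiles, threshold=80, max_records=100):
    """Run the search over the network and return the matching CIDs."""
    url = build_similarity_url(smiles, threshold, max_records)
    with urlopen(url, timeout=60) as response:
        return parse_cids(json.load(response))
```

Calling fetch_similar_cids with the example SMILES string from the start of this write-up would return a list of CIDs for compounds at or above the 0.80 Tanimoto cutoff. Nothing here contacts PubChem until that function is actually called.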
From here, it is best to copy and paste the resulting SMILES strings into a text file for further manipulation. The next portion of this tutorial will deal with obtaining three-dimensional structures from the two-dimensional SMILES strings that were just obtained from PubChem.
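If the SMILES strings end up in a script rather than on the clipboard, the text file can be written programmatically. This is a minimal sketch: the function name write_smiles_file and the one-SMILES-per-line layout are assumptions for illustration. Duplicates are removed by exact string comparison only, so two different SMILES spellings of the same molecule would both be kept.

```python
from pathlib import Path


def write_smiles_file(smiles_list, path):
    """Write unique, non-empty SMILES strings to `path`, one per line.

    Order of first appearance is preserved. Returns the number of
    SMILES strings written.
    """
    seen = set()
    unique = []
    for smiles in smiles_list:
        smiles = smiles.strip()
        if smiles and smiles not in seen:
            seen.add(smiles)
            unique.append(smiles)
    Path(path).write_text(
        "".join(s + "\n" for s in unique), encoding="utf-8"
    )
    return len(unique)
```

For example, write_smiles_file(analog_smiles, "analogs.smi") would write the list held in a hypothetical analog_smiles variable to a file named analogs.smi in the current directory.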