3D Analog Library Generation Using PubChem and ZINC

From Rizzo_Lab

Hello! This short write-up is designed to make it easier for the group, and any other users who stumble across it, to curate a library of compounds analogous to compounds identified experimentally as active, for use in a secondary or follow-up virtual screen. First, we want a list of each active compound's ZINC ID and SMILES string. The example I'll use here is ZINC000019831888, whose SMILES string is: OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl.

After collecting the pertinent information for the compounds we're interested in, we can head to https://pubchem.ncbi.nlm.nih.gov. This will bring up a screen that looks like this:

Screen Shot 2018-05-07 at 12.00.26 PM.png

We want to select the Structure Search bar on the right-hand side of the screen:

Structure search selection pubchem.JPG

Then we are taken to this page: https://pubchem.ncbi.nlm.nih.gov/search/search.cgi

Screen Shot 2018-05-07 at 12.17.39 PM.png

We want to select the Identity/Similarity tab:

Select similarity Search.JPG

That will bring up this screen:

Similarity search window.JPG

From here we want to select the CID, SMILES or InChI tab, paste in our SMILES string, and then set some parameters (Tanimoto greater than 0.80 and compounds only from ZINC):

Similarity search w parameters.png

Pubchem paramters for sim search.png
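
For readers unfamiliar with the Tanimoto threshold set above, here is a minimal sketch of how the coefficient is computed: the number of fingerprint bits two molecules share, divided by the number of bits set in either one. The bit sets below are toy values made up for illustration, not real PubChem fingerprints, and the function name is hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints.

    Each fingerprint is given as a set of the bit positions that are 'on'.
    """
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Toy fingerprints (illustrative only): four shared bits, one unique bit each.
query = {1, 4, 7, 9, 12}
candidate = {1, 4, 7, 9, 15}

score = tanimoto(query, candidate)
print(f"Tanimoto = {score:.3f}")
print("passes 0.80 cutoff" if score > 0.80 else "rejected by 0.80 cutoff")
```

A cutoff of 0.80 therefore keeps only compounds whose fingerprints overlap heavily with the query; lowering it broadens the analog set at the cost of less similar hits.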

This will bring up a screen like this, where we can select the Send to tab on the right-hand side:

Tanimoto search output 2.png

We can select the following options from the resulting menu:

Strcutrue downlaod menu.JPG

Now, finally, we will generate some output: a list of SMILES strings of molecules analogous to the original query:

Similarity search output.JPG
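
The point-and-click search above can probably also be approximated programmatically through PubChem's PUG REST interface. The sketch below is an assumption-laden illustration and not part of the original workflow: it assumes the `fastsimilarity_2d` operation with a `Threshold` option (given as a percentage), the helper names are hypothetical, and it returns PubChem CIDs rather than SMILES strings. It also does not reproduce the "compounds only from ZINC" filter, which would still need to be applied separately.

```python
import json
import urllib.parse
import urllib.request

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def similarity_url(smiles, threshold=80):
    """Build a PUG REST 2D-similarity URL for a SMILES query.

    threshold is the minimum Tanimoto score as a percentage (80 -> 0.80).
    The SMILES is percent-encoded so characters like '=' and '(' are safe
    inside the URL path.
    """
    encoded = urllib.parse.quote(smiles, safe="")
    return (f"{PUG_BASE}/compound/fastsimilarity_2d/smiles/{encoded}"
            f"/cids/JSON?Threshold={int(threshold)}")


def fetch_similar_cids(smiles, threshold=80):
    """Query PubChem (network access required) and return the matching CIDs."""
    with urllib.request.urlopen(similarity_url(smiles, threshold), timeout=60) as resp:
        data = json.load(resp)
    # Assumed response layout: {"IdentifierList": {"CID": [...]}}
    return data.get("IdentifierList", {}).get("CID", [])


query = "OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl"
print(similarity_url(query))

# Uncomment to run the search against PubChem:
# cids = fetch_similar_cids(query, threshold=80)
# print(len(cids), "similar compounds found")
```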

From here, it is best to copy and paste the resulting SMILES strings into a text file for further manipulation. The next portion of this tutorial deals with obtaining three-dimensional structures from the two-dimensional SMILES strings that were just obtained from PubChem.
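
Pasted output often picks up blank lines, stray whitespace, and repeated entries. As a small sketch of the "further manipulation" step, the snippet below tidies a list of pasted lines before saving them. The function names and the file name `analogs.smi` are hypothetical, and duplicates are detected by exact string match only, so two different SMILES spellings of the same molecule would both be kept.

```python
def clean_smiles_lines(lines):
    """Return unique SMILES strings in their original order.

    Takes the first whitespace-separated field of each line as the SMILES,
    skips blank lines, and drops exact duplicates.
    """
    seen = set()
    cleaned = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank or whitespace-only line
        smiles = fields[0]
        if smiles not in seen:
            seen.add(smiles)
            cleaned.append(smiles)
    return cleaned


def write_smiles(path, smiles_list):
    """Write one SMILES string per line to a text file."""
    with open(path, "w") as handle:
        for smiles in smiles_list:
            handle.write(smiles + "\n")


# Example of pasted text with a blank line and a duplicate entry.
pasted = [
    "OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl",
    "",
    "  OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl  ",
]
unique = clean_smiles_lines(pasted)
print(len(unique), "unique SMILES string(s)")

# Uncomment to save the cleaned list:
# write_smiles("analogs.smi", unique)
```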