Difference between revisions of "3D Analog Library Generation Using Pubchem and Zinc"

Latest revision as of 10:36, 22 May 2018

Hello! This short write-up is meant to make it easier for the group, and for other users who may stumble across it, to curate a library of compounds analogous to compounds identified experimentally as active, for use in a secondary or follow-up virtual screen. First, we want a list of each active compound's ZINC ID and SMILES string. The example I'll use here is ZINC000019831888, whose SMILES string is:

OC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl
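If there are several actives, one convenient way to keep track of them is a tab-separated file with one compound per line (ZINC ID, then SMILES). The file name actives.tsv and the loop below are illustrative assumptions, not part of the original workflow. The loop simply prints each pair so the SMILES can be copied into the PubChem search box one active at a time.

```shell
#!/bin/sh
# Hypothetical bookkeeping file: one active per line, ZINC ID <TAB> SMILES.
# The single entry is the example compound from this tutorial.
printf 'ZINC000019831888\tOC(COC=1C=CC(=CC1)C(=O)C=2C=CC=CC2)CN3CCN(CC3)C=4C=CC=CC4Cl\n' > actives.tsv

# Read each line back, splitting on the tab character, and print the pair.
tab=$(printf '\t')
while IFS="$tab" read -r zinc_id smiles; do
    echo "Query ${zinc_id}: ${smiles}"
done < actives.tsv
```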

After collecting the pertinent information for the compounds we're interested in, we can head to PubChem (https://pubchem.ncbi.nlm.nih.gov). This will bring up a screen that looks like this:

We want to select the Structure Search bar on the right-hand side of the screen:

Then we are taken to this page: https://pubchem.ncbi.nlm.nih.gov/search/search.cgi

We want to select the Identity/Similarity tab:

That will bring up this screen:

From here we want to select the "CID, SMILES or InChI" tab, paste in our SMILES string, and then set some search parameters (here, a Tanimoto similarity threshold of 0.80 and only compounds from ZINC):
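For readers unfamiliar with the cutoff: the Tanimoto coefficient compares two binary fingerprints A and B as T = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number set in both. A threshold of 0.80 therefore keeps only compounds whose shared bits make up roughly 80% or more of all bits set in either fingerprint. The sketch below computes T with awk for two short, made-up bit strings; real PubChem searches use PubChem's own, much longer substructure fingerprint, so the toy strings only illustrate the arithmetic.

```shell
#!/bin/sh
# Tanimoto coefficient of two equal-length bit strings: T = c / (a + b - c)
#   a = bits set in A, b = bits set in B, c = bits set in both.
# The fingerprints below are toy examples, not real PubChem fingerprints.
tanimoto() {
    awk -v A="$1" -v B="$2" 'BEGIN {
        a = b = c = 0
        for (i = 1; i <= length(A); i++) {
            x = substr(A, i, 1); y = substr(B, i, 1)
            if (x == "1") a++
            if (y == "1") b++
            if (x == "1" && y == "1") c++
        }
        # Two empty fingerprints: define T as 0 to avoid dividing by zero.
        if (a + b - c == 0) { print "0.00"; exit }
        printf "%.2f\n", c / (a + b - c)
    }'
}

fp_query="1101101011"
fp_analog="1101001011"
t=$(tanimoto "$fp_query" "$fp_analog")
echo "Tanimoto = $t"

# Keep the analog only if T clears the 0.80 cutoff (treated here as >=).
if awk -v t="$t" 'BEGIN { exit !(t >= 0.80) }'; then
    echo "passes 0.80 cutoff"
fi
```

Here the query has 7 bits set, the analog 6, and they share 6, so T = 6/7.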

After clicking Search, we are brought to a results screen like this, where we can select the "Send to" tab on the right-hand side:

After clicking "Create File", a file will be downloaded to our Downloads directory; it looks something like this:

From here we can grep out all of the ZINC IDs obtained from the search using something along the lines of:

grep -oE 'ZINC[0-9]+' path_to_summary_file.out > all_zinc_ids.txt
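To see what the extraction does, the sketch below runs it on a small, made-up stand-in for the PubChem summary file (the real file's layout may differ, and the IDs other than ZINC000019831888 are placeholders). The pattern has to include the digits: `grep -o 'ZINC'` on its own prints only the literal word ZINC for each match. The `sort -u` step is an optional addition that drops IDs listed more than once.

```shell
#!/bin/sh
# Made-up stand-in for a PubChem summary download (format is an assumption).
cat > summary_example.txt <<'EOF'
CID: 1001
Synonyms: ZINC000019831888; example name A
CID: 1002
Synonyms: ZINC000000000001; ZINC000000000002; example name B
CID: 1003
Synonyms: ZINC000000000001; example name C
EOF

# -o prints only the matching text and -E enables "+" (one or more digits),
# so each ID lands on its own line without the trailing semicolon.
# sort -u removes IDs that appear under more than one record.
grep -oE 'ZINC[0-9]+' summary_example.txt | sort -u > all_zinc_ids.txt
cat all_zinc_ids.txt
```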

Here is an example of what all_zinc_ids.txt might look like:

Caveat emptor! The list of ZINC IDs may contain trailing punctuation (in my experience, semicolons), which should be removed before querying ZINC. This is easy to do with awk or sed. It is also best to look over each list of ZINC IDs by eye, provided it isn't prohibitively large.
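A minimal sketch of that cleanup with sed, assuming one ID per line and stray semicolons or spaces at the end of the line. The grep afterwards lists any line that still does not look like a bare ZINC ID, which makes the check by eye quicker. The input file and its contents are made up for illustration.

```shell
#!/bin/sh
# Made-up example of an ID list with trailing punctuation.
printf 'ZINC000019831888;\nZINC000000000001; \nZINC000000000002\n' > raw_ids.txt

# Delete any run of semicolons and whitespace at the end of each line.
sed -E 's/[;[:space:]]+$//' raw_ids.txt > all_zinc_ids.txt

# Report (with line numbers) lines that are still not "ZINC" followed by digits.
if grep -vnE '^ZINC[0-9]+$' all_zinc_ids.txt; then
    echo "check the lines above before querying ZINC" >&2
else
    echo "all IDs look clean"
fi
```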

Depending on the size of the analog library (the list of ZINC IDs collected), we can break it into chunks of 1000 IDs each using the split command from the terminal.
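For example, `split -l 1000` writes consecutive pieces of at most 1000 lines each; the `zinc_chunk_` prefix is an arbitrary choice. So that the result can be checked, the sketch generates a made-up list of 2500 placeholder IDs, which should come out as three files of 1000, 1000 and 500 lines.

```shell
#!/bin/sh
# Made-up list of 2500 placeholder IDs, one per line (stands in for all_zinc_ids.txt).
for i in $(seq 1 2500); do
    printf 'ZINC%012d\n' "$i"
done > all_zinc_ids.txt

# Remove chunks left over from an earlier run, then split into pieces of
# at most 1000 lines named zinc_chunk_aa, zinc_chunk_ab, ...
rm -f zinc_chunk_*
split -l 1000 all_zinc_ids.txt zinc_chunk_

# Show how many IDs ended up in each chunk.
wc -l zinc_chunk_*
```

Each chunk can then be uploaded to ZINC as a separate query.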

From here our library curation turns to the ZINC database to download the 3-dimensional structures for docking.

If we are on the ZINC15 substances page (http://zinc15.docking.org/substances/home/), we can click the "Choose File" button under "search using many" and upload our all_zinc_ids.txt file to start a query. The only parameter we need to change is the output format: we want mol2 rather than summary (see picture):

After selecting the appropriate list of ZINC IDs and the mol2 output format, click the "Search many" button. This will eventually start downloading a file to your Downloads directory, usually called resolved-3.mol2 or something along those lines. It is important to move this mol2 file to an appropriate directory and rename it accordingly!
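A sketch of that bookkeeping step. The download location, destination directory and new file name are hypothetical; resolved-3.mol2 is the download name mentioned above, and a truncated mock file stands in for it. Counting `@<TRIPOS>MOLECULE` lines (the record that begins each molecule in a mol2 file) gives a quick check of how many structures came back. That count need not equal the number of IDs submitted, since some IDs may not resolve and ZINC may return more than one structure for a single ID.

```shell
#!/bin/sh
# Stand-in for the ZINC download: a truncated, made-up mol2 file with two
# @<TRIPOS>MOLECULE records (real files also contain full ATOM/BOND sections).
downloads="./Downloads_demo"          # hypothetical; normally your Downloads folder
mkdir -p "$downloads"
cat > "$downloads/resolved-3.mol2" <<'EOF'
@<TRIPOS>MOLECULE
ZINC000000000001
@<TRIPOS>ATOM
@<TRIPOS>MOLECULE
ZINC000000000002
@<TRIPOS>ATOM
EOF

# Move the download into a project directory under a descriptive name
# (both names are hypothetical examples).
dest="analogs_ZINC000019831888"
mkdir -p "$dest"
mv "$downloads/resolved-3.mol2" "$dest/zinc_chunk_aa.mol2"

# Each molecule in a mol2 file starts with a @<TRIPOS>MOLECULE line.
n=$(grep -c '^@<TRIPOS>MOLECULE' "$dest/zinc_chunk_aa.mol2")
echo "$dest/zinc_chunk_aa.mol2 contains $n molecules"
```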

