Difference between revisions of "2025 DOCK tutorial 1 with PDBID 1O86"
| Stonybrook (talk | contribs)  (→008: 5k Virtual Screen) | Stonybrook (talk | contribs)   (→Slurm and Job Submission) | ||
| (335 intermediate revisions by the same user not shown) | |||
| Line 1: | Line 1: | ||
| − | == DOCK Tutorial using PDB 1O86 == | + | ==  DOCK Tutorial using PDB 1O86 == | 
| − | [ | + | This tutorial is a series of modules designed to familiarize a new user with the basic functions and features of DOCK6, as well as provide guidance on its execution and analysis of results. Modules should be read in a linear fashion, as each module builds on the prior one.  | 
| + | |||
| + | The [https://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm the full DOCK manual] contains much more information than this tutorial, and as such is recommended as a resource if the information provided herein proves unsatisfactory. | ||
| == 000: Foundations == | == 000: Foundations == | ||
| − | + | ||
| + | === Chimera === | ||
| + | |||
| + | UCSF Chimera is a python-based, open-source molecular visualization and manipulation software suite. It is extremely helpful for both preparing molecules/receptors for docking and for visually analyzing the results of those calculations.  | ||
| + | |||
| + | It can be downloaded from [https://www.cgl.ucsf.edu/chimera/download.html the official UCSF site]; make sure to select the version that matches your operating system (Mac or Windows). Although Chimera is no longer under active development, it remains a relevant software for molecular modeling. | ||
| + | |||
| + | Once Chimera has installed, you can open it to find a blank blue-ish window, with a row of tabs along the top. Throughout this tutorial, you will be instructed to perform different actions contained within these tabs. We will denote the specific tab and sub-tab to be accessed by >> signs. For example, File >> Open PDB would indicate that you should click on the File tab, then mouse to and click Open PDB. This is necessary because some actions are nested in multiple sub-tabs (for instance, selecting all hydrogens in a model would require Select >> Chemistry >> element >> H, as shown below) | ||
| + | |||
| + | [[File:ChimeraMenu.png]] | ||
| + | |||
| + | More extensive documentation on Chimera and its functions is available [https://www.cgl.ucsf.edu/chimera/docindex.html on the official site]. | ||
| + | |||
| + | ===Seawulf=== | ||
| + | 	To complete this tutorial as a Stony Brook student, you will need an account on Seawulf. A ticket to obtain an account can be [https://rci.stonybrook.edu/HPC/faqs/requesting-a-seawulf-or-li-red-account submitted on the Seawulf website]; Dr. Rizzo will need to provide approval for account activation. | ||
| + | |||
| + | 	The Seawulf website also has a [https://rci.stonybrook.edu/HPC/faqs/getting-started?field_hpc_cluster_target_id=All list of best practices] for using a High Performance Computing (HPC) cluster. We recommend reading through them before attempting to run any intensive programs on Seawulf. Of particular note is the job management system SLURM - we provide a full tutorial on this system in section 008 of this tutorial if you want to read up on it now, but it will not be necessary for this tutorial until that point. | ||
| + | |||
| + | ===SSH=== | ||
| + | You will need to use a Secure Shell (SSH) connection to connect to Seawulf remotely. A guide to this process is [https://rci.stonybrook.edu/HPC/faqs/logging-into-seawulf available on the Seawulf website]. | ||
| + | |||
| + | ===Basic Unix Commands and Environment=== | ||
| + | 	Seawulf uses a terminal-based Unix operating system. If you have never used a terminal-based OS before, you should familiarize yourself with some [https://info-ee.surrey.ac.uk/Teaching/Unix/index.html basic Unix tutorials]. Broadly, you should know how to: | ||
| + | |||
| + | * List your working directory with <code>pwd</code>, change your working directory with <code>cd</code>, make new directories with <code>mkdir</code>, remove directories with <code>rmdir</code>, and list the contents of a directory with <code>ls</code> | ||
| + | * Edit files using the text editor <code>vim</code> (use the <code>vimtutor</code> to read about basic functionality) | ||
| + | * Create an empty file with <code>touch</code>, move files and directories around with <code>mv</code>, copy directories and files with <code>cp</code>, and (very carefully!) remove directories and files with <code>rm</code> | ||
| + | * Determine if commands are available with <code>which</code> | ||
| + | * Understand how to use commands and how flags/input parameters should be formatted | ||
| + | |||
| + | You should also understand the concept of filepaths, and how they are used by commands. Much of DOCK relies on you using the correct filepaths to pass files and parameters into the program, and if the paths are wrong, then the program will not work. When possible, it is advisable to use '''absolute''' filepaths (paths that start at the root directory), as opposed to paths relative to your current working directory. For example, if you are in the directory  <code>/gpfs/home/yourusername/tutorial/002_spheres</code>, referencing the absolute path <code>/gpfs/home/yourusername/tutorial/001_structure/important_structure.pdb</code> for a needed structure file is more reliable than the relative path <code>../001_structure/important_structure.pdb</code>, as the latter path is relative to <code>002_spheres/</code>, and meaningless outside of that context. The command <code>realpath</code>will return the absolute path of any file passed to it, which is useful for quickly and accurately determining the absolute path of any file. | ||
| + | |||
| + | To work with DOCK6, it may also be necessary to add a path to your config file that tells Unix where to look for DOCK-related commands. You can check if you need to do this using the command <code>which dock6</code>. If <code>which</code> informs you that no <code>dock6</code> can be found, you will need to edit your .bashrc file: | ||
| + | #Move to your home directory by either entering <code>cd</code> or <code>cd ~</code> | ||
| + | #Use the command <code>vi .bashrc</code> to start editing your personal config file | ||
| + | #At the bottom of the file, add the following line: | ||
| + | <code>export PATH=$PATH:/Path/To/Your/DOCK6/bin</code> | ||
| + | |||
| + | where Path/To/Your/DOCK6/ should be replaced with whatever the path is to your local DOCK installation (note: the full path should end with /bin). At the time of this writing, there is a compiled version of DOCK6.12 at /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/, but if this does not work then ask Dr. Rizzo or a lab member where a compiled instance can be found | ||
| + | |||
| + | :4. Save and exit the file, then enter the command <code>source .bashrc</code> into the terminal. | ||
| + | :5. Now when you request <code>which dock6</code>, the terminal should return the path to the directory you just provided. | ||
| + | |||
| + | After all this, you should have a good understanding of how to navigate the Unix terminal, and have the DOCK6 suite ready to use in your environment. | ||
| + | |||
| + | === SCP File Transfer === | ||
| + | 	Another important aspect of working on Seawulf is uploading and downloading files to your local machine. This is accomplished using the Secure Copy Protocol (scp) terminal command. See the pertinent section on the [https://rci.stonybrook.edu/hpc/faqs/transferring-files-to-and-from-seawulf Seawulf website]. For Windows, it may be advisable to download a third party program; [https://filezilla-project.org/ FileZilla] works for this task. | ||
| + | |||
| + | ===Directory setup=== | ||
| + | 	Finally, assuming all necessary base Unix knowledge has been obtained, you will want to set up a set of directories for the work you will be doing in this tutorial. The directories will help you compartmentalize your work and help to prevent important files from being overwritten or lost. For the modules of this tutorial, we recommend setting up this directory structure, and setting up folders on your local machine with the same names for ease of file transfer: | ||
| + | |||
| + | First, create a tutorial directory in your home directory: | ||
| + | |||
| + | <pre> | ||
| + | cd ~ | ||
| + | mkdir tutorial | ||
| + | </pre> | ||
| + | |||
| + | Then, make a series of subdirectories inside of that tutorial directory: | ||
| + | |||
| + | <pre> | ||
| + | cd ~/tutorial | ||
| + | mkdir 001_structure | ||
| + | mkdir 002_spheres | ||
| + | mkdir 003_gridbox | ||
| + | mkdir 004_energy_min | ||
| + | mkdir 005_rigid_dock | ||
| + | mkdir 006_flex_dock | ||
| + | mkdir 007_footprint | ||
| + | mkdir 008_virtual_screen | ||
| + | mkdir 009_gen_alg | ||
| + | mkdir 010_de_novo | ||
| + | </pre> | ||
| + | |||
| + | ===A Note on Troubleshooting=== | ||
| + | |||
| + | 	Finally, despite our best efforts at accuracy, a healthy amount of this tutorial will likely not go smoothly for you. There will be errors and inconsistencies between what you see here and what happens on Seawulf, and there may not be an immediate answer for what you should do to fix them. The instinct when this happens is to find someone who knows what might have happened and ask them for advice, and while this will usually be helpful, it may be inefficient. As such, we advise you to first please ''troubleshoot''. No program you will use in this tutorial is intended to be a black box. All of them do their best to return errors that tell you what is going wrong, and where to look to fix it. If you get an error, read it! Often it will tell you that it can't find a particular file ("doesn't exist" is the favorite wording), or that some piece of the input is giving it trouble— these tell you what paths to check or lines of a file to review. Also, copy-pasting an entire error into Google (or ChatGPT if you must) is a completely valid line of inquiry, and often yields helpful results.  | ||
| + | |||
| + | Having said that, here are some common errors and tactics you can use to fix them: | ||
| + | |||
| + | <code>"[File] could not be opened"</code> or <code>"[file] does not exist"</code> usually indicates that a path you gave the program is not accurate. You can "sanity check" paths by copying them directly from the input and putting them into an ls command. If it throws an error, fix the path! | ||
| + | |||
| + | <code>"version `GLIBCXX_whatever' not found"</code> is generally an issue with how the DOCK installation was compiled. Try to find a different version (/gpfs/projects/AMS536/zzz.programs usually has several versions available) or try running the command on a different node (Milan or Login) | ||
| + | |||
| + | Errors mentioning FORTRAAN: Make sure the input files are named correctly(ex INSPH for sphgen) and there are no extra spaces or lines anywhere in the input files. Sometimes they sneak in at the start or end of a line and are parsed literally. | ||
| + | |||
| + | Often, output is sent into a .out file instead of written to the console. If the program terminates immediately but does not return expected files, check whatever .out file was written for errors at the very bottom. | ||
| == 001: Structure Prep == | == 001: Structure Prep == | ||
| − | Download PDB,  | + | |
| + | === Downloading the PDB === | ||
| + | |||
| + | Having setup your necessary environment to work on Seawulf, let's navigate to your local computer and begin | ||
| + | the protein preparation process: | ||
| + | |||
| + | To begin protein preparation you will need the 1O86 PDB file. Using this link: https://www.rcsb.org/structure/1O86, | ||
| + | you will see the RCSB main page opened to our protein of choice: | ||
| + | |||
| + | [[File:1O86_RCSB_1.png|600px]] | ||
| + | |||
| + | Next, you'll want to navigate to the top right corner where it says Download Files. Then, select the dropdown arrow.  | ||
| + | The following dropdown menu will appear on the screen: | ||
| + | |||
| + | |||
| + | [[File:1O86_RCSB_2.png|600px]] | ||
| + | |||
| + | |||
| + | Select 'Legacy PDB Format'. Now the PDB file should be downloaded to your local computer. | ||
| + | |||
| + | Now that you have the PDB file, lets navigate to Chimera program to open the file. | ||
| + | |||
| + | === Protein Isolation === | ||
| + | |||
| + | Now that you have the downloaded PDB, the first step is to isolate and modify our protein, if necessary. | ||
| + | |||
| + | First, open the PDB in Chimera using File >> Open, zoom into where the ligand is, hold the control key and click to select an atom on the ligand, and press the up arrow key to highlight the ligand. Then go to Actions >> Focus. | ||
| + | |||
| + | The ligand should now be highlighted in green; you will see it is surrounded by water molecules: | ||
| + | |||
| + | [[File:Ligand_Embed.png|600px]] | ||
| + | |||
| + | We're going to get rid of the ligand and the water molecules.  | ||
| + | |||
| + | First, since the ligand is highlighted, go to Actions >> Atoms/Bonds >> Delete. | ||
| + | |||
| + | There may be a glycine molecule remaining in the binding pocket. For the sake of simplicity, we are going to delete this as well. Control+click to select an atom in the glycine, up arrow to highlight the whole molecule, and Actions >> Atoms/Bonds >> Delete it in the same manner as the ligand. | ||
| + | |||
| + | Next, to get rid of the water molecules, navigate to Select >> Residue >> HOH | ||
| + | |||
| + | You'll notice that all of the water molecules are highlighted: | ||
| + | |||
| + | [[File:Water.png|600px]] | ||
| + | |||
| + | Just like the ligand, navigate to Actions >> Atoms/Bonds >> Delete | ||
| + | |||
| + | Now you have just a protein: | ||
| + | |||
| + | [[File:Protein_only.png|600px]] | ||
| + | |||
| + | === Inspecting the Protein === | ||
| + | |||
| + | With just the protein file, lets inspect if there are any missing loops along the exterior. First, to make the view clearer, go to View >> Focus again to put the whole protein in focus. | ||
| + | |||
| + | You'll notice that there is a missing loop between residues 434 and 439. | ||
| + | |||
| + | [[File:Missing_loop.png|600px]] | ||
| + | |||
| + | To refine the missing loop, go to Tools >> Structure Editing >> select Model/Refine Loops | ||
| + | |||
| + | The following pop up menus will appear: | ||
| + | |||
| + | [[File:Residue List.png|600px]] | ||
| + | |||
| + | This is a list of the residues in the protein. The residues 435-438 boxed in red denote the ones involved with the missing loop. | ||
| + | |||
| + | [[File:Model_refine.png|600px]] | ||
| + | |||
| + | This is the Modeller/Refine Menu. If you have not used it before, the Modeller License Key field may be empty, which will prevent the program from running. | ||
| + | You will first need to register for a Modeller key [https://www.salilab.org/modeller/registration.html on the official page]. They will email you with the registration key relatively promptly (within a day but usually sooner). Note that because this program connects to a webserver for the loop modeling, you will require a stable internet connection to use it. | ||
| + | |||
| + | You can leave all the other options selected as is. Once you select apply, wait a couple of minutes. | ||
| + | The program is generating different refinements to the missing loops. The residue menu will say "Running Modeller Web Service".  | ||
| + | Additionally, you'll be able to track the process completed on the search bar menu on the bottom part of the screen on Chimera. | ||
| + | |||
| + | Once the modeler is 100% completed, the following popup menu will appear on Chimera: | ||
| + | |||
| + | [[File:Modeller_menu.png|600px]] | ||
| + | |||
| + | You can see that the modeller program generated five alternative protein models that refined the missing loop. | ||
| + | If you press the shift key and highlight all of the model numbers you will see the following proteins with fixed loops. | ||
| + | |||
| + | [[File:Fixed_loop.png|600px]] | ||
| + | |||
| + | Now, using the drop down menu, from earlier, you'll want to assess what's the best model to choose that has refined loops. | ||
| + | Sometimes the difference in scores are negligible and you may want to select based on zDope, which would mean any loop is sufficient.  | ||
| + | However, there may be another score that could narrow your decision making.  | ||
| + | |||
| + | In the 'Modeller Results' menu, select Fetch Scores. You'll see in your chimera window that a job was submitted. | ||
| + | This is signaling to the program to retrieve the scores from the program. Once the job is finished, the Estimated RMSD | ||
| + | and Estimated Overlap columns will contain values: | ||
| + | |||
| + | [[File:Scores.png|600px]] | ||
| + | |||
| + | From adding more scores, you'll notice that the models differ in their Estimated RMSD values. Based on these values, | ||
| + | Models 1.2 or 1.5 are a good choice based on RMSD. So, let's choose model 1.2 as our protein model to use in this tutorial. | ||
| + | We will also save this model as our new PDB file. | ||
| + | |||
| + | Go to File >> Save PDB. When the save menu appears, make sure that only the second model is selected like so: | ||
| + | |||
| + | [[File:Save_Menu.png|600px]] | ||
| + | |||
| + | Let's also give this PDB a new title, the one supplied is <code> 1O86_fixed_protein.pdb </code>. Additionally, we will need the original PDB for ligand preparation. | ||
| + | |||
| + | Press save and now you have a refined protein to prepare. | ||
| + | |||
| + | === Protein Preparation === | ||
| + | |||
| + | After saving a protein with modelled loops as a PDB, use File >> Close Session to clear your Chimera session. Then, File >> Open the <code> 1O86_fixed_protein.pdb </code>  | ||
| + | |||
| + | In order to use this protein for docking calculations, we will need to generate a charged, protonated protein and ligand files. Charged molecules must be saved in Mol2 format, as the PDB file format does not support partial atomic charges. | ||
| + | |||
| + | We will first save a version of the protein in Mol2 format that does not have any charges or hydrogens. | ||
| + | Go to File >> Save Mol2. You will want to give this mol2 file a title that describes the fact that  | ||
| + | the protein doesn't have hydrogens or charges. So let's name this file <code> 1O86_protein_nH_ncH.mol2 </code> | ||
| + | |||
| + | The nH refers to "no hydrogens" and the ncH refers to "no charges". | ||
| + | |||
| + | Now that the mol2 is saved, we can protonate and charge the receptor in preparation for docking.  | ||
| + | |||
| + | '''Adding Hydrogens''' | ||
| + | |||
| + | Go to Tools >> Structure Editing >> AddH  | ||
| + | |||
| + | The following menu will appear: | ||
| + | |||
| + | [[File:wm.png|600px]] | ||
| + | |||
| + | All the defaults that are selected are acceptable. Select OK. | ||
| + | |||
| + | |||
| + | You will get a confirmation message on the chimera text bar that says 'Hydrogens Added'. | ||
| + | Now you have added hydrogens to the protein file. The last thing to do is to add charges. | ||
| + | |||
| + | '''Adding Charges''' | ||
| + | |||
| + | Before we can save the second mol2 for the protein we must also add charges to the protein. | ||
| + | |||
| + | Go to Tools >> Structure Editing >> press Add Charge | ||
| + | |||
| + | The following menu will appear: | ||
| + | |||
| + | [[File:cm.png|600px]]  | ||
| + | |||
| + | By default Chimera selects AM1-BCC charges, but for this tutorial we are going to use Gasteiger charges.  | ||
| + | Every other selection on the menu is okay, so after changing the charge type, go ahead and select apply. You'll notice another pop up menu appears: | ||
| + | |||
| + | [[File:specify.png|600px]] | ||
| + | |||
| + | This deals with ions. Just as before, make sure that Gasteiger is selected. All other defaults are okay. | ||
| + | Select OK. The Chimera window will inform you that standard charges are added. | ||
| + | |||
| + | Now that your protein has hydrogens and charges, we can go ahead and generate the second mol2 file. | ||
| + | |||
| + | Go to File >> Save Mol2. This time the name should reflect that the protein has hydrogens and charges. | ||
| + | Let's name it <code> 1O86_protein_wH_wcH.mol2 </code>. | ||
| + | |||
| + | Now our protein files are prepared for docking, let's now turn our attention to preparing the ligand. | ||
| + | |||
| + | === Isolating the Ligand === | ||
| + | |||
| + | Having made a Protein PDB and generated two mol2 files, lets now do the same thing for the ligand. The first step is to File >> Close Session to clear the current environment, then open the original PDB file that we downloaded. | ||
| + | |||
| + | Just like in the protein section, press control and select an atom on the ligand. Then press the up arrow to highlight the entire ligand: | ||
| + | |||
| + | [[File:Ligand_Embed.png|600px]] | ||
| + | |||
| + | Go to Select >> Invert (selected models) | ||
| + | |||
| + | [[File:all_highlight.png|600px]] | ||
| + | |||
| + | You'll notice that everything around the ligand is highlighted.  | ||
| + | |||
| + | Now go to Actions >> Atoms/Bonds >> delete | ||
| + | |||
| + | Now we have a file that just contains the ligand.  | ||
| + | |||
| + | [[File:1O86_ligand.png|600px]] | ||
| + | |||
| + | Go to File >> Save PDB and name the file <code> 1O86_ligand.pdb </code> | ||
| + | |||
| + | This will be the file that we will use to prepare the mol2 files for the dock program. | ||
| + | |||
| + | You can now close your current session on Chimera by going to File >> Close Session | ||
| + | |||
| + | === Preparing the Ligand === | ||
| + | |||
| + | Now that we have a ligand only PDB, lets go ahead and generate the two mol2 files with the ligand. | ||
| + | |||
| + | The first step is to go to open the <code> 1O86_ligand.pdb </code> on chimera. | ||
| + | |||
| + | Go to File >> Save Mol2  | ||
| + | |||
| + | Lets give the title of this first mol2 to denote no hydrogens or charge on the ligand. | ||
| + | |||
| + | Call it <code> 1O86_ligand_nH_ncH.mol2 </code> | ||
| + | |||
| + | '''Running Dock Prep''' | ||
| + | |||
| + | To generate the second mol2 you are more than welcome to revisit the steps taken to generate the second | ||
| + | protein mol2. In the case of the ligand, we will introduce an alternative with a program called Dock Prep. | ||
| + | |||
| + | First, return to <code> 1O86_ligand.pdb </code> on chimera: | ||
| + | |||
| + | Go to Tools >> Structure Editing >> DockPrep | ||
| + | |||
| + | The following pop-up menu will appear: | ||
| + | |||
| + | [[File:dock_prep.png|600px]] | ||
| + | |||
| + | All of the default parameters are sufficient. Select OK. | ||
| + | |||
| + | The next set of drop down menus will be the same as adding hydrogens and charges to a protein. | ||
| + | |||
| + | '''Remember''' all of the defaults are okay for adding hydrogens, but make sure you switch from | ||
| + | |||
| + | AM1-BCC to Gasteiger charges when adding the charges for the ligand.  | ||
| + | |||
| + | All other defaults are sufficient for each of the other pop-up menus. | ||
| + | |||
| + | The following images show the pop-up menus that will appear after you accept using Dock Prep and contain the acceptable defaults: | ||
| + | |||
| + | [[File:Lh_menu.png|600px]] | ||
| + | |||
| + | The Hydrogen Menu | ||
| + | |||
| + | [[File:Lch_menu.png|600px]] | ||
| + | |||
| + | The Charge Menu | ||
| + | |||
| + | [[File:LncH.png|600px]] | ||
| + | |||
| + | The Net Charge Menu | ||
| + | |||
| + | What's neat about Dock Prep is that the last pop-up menu will be the Save Mol2 menu: | ||
| + | |||
| + | [[File:mol2_menu.png|600px]] | ||
| + | |||
| + | Lets give this mol2 a file name that specifies that it has hydrogens and charges. | ||
| + | |||
| + | Call it <code> 1O86_ligand_wH_wcH.mol2 </code> | ||
| + | |||
| + | Congratulations, you've generated all the necessary mol2 files to advance to the next steps of this tutorial. | ||
| + | |||
| + | Lets transfer the mol2 files onto seawulf using the following command | ||
| + | |||
| + |    scp *mol2 yourusername@stonybrook.edu:'/gpfs/home/yourusername/tutorial/001_structure' | ||
| == 002: Spheres == | == 002: Spheres == | ||
| − | + | ||
| + | === Generate the Required Surface File === | ||
| + | |||
| + | # Open the <code>1O86_protein_nH_ncH.mol2</code> (with no hydrogens, no charges) in Chimera and hit Actions >> Surface >> Show | ||
| + | # Write the DMS file by choosing Tools > Structure Editing > Write DMS; save the file as <code>1O86.dms</code> | ||
| + | # Upload the DMS file to your <code>~/tutorial/002_spheres</code> directory on Seawulf using <code>scp</code> | ||
| + | # Now, move to the Seawulf terminal interface and navigate to your <code>~/tutorial/002_spheres</code> directory. Create a sphgen input file using the following command:  | ||
| + |  vi INSPH | ||
| + | |||
| + | :5. Paste the following into your input file, ensuring that no additional lines are pasted:  | ||
| + |  ./1O86.dms | ||
| + |  R | ||
| + |  X | ||
| + |  0.0 | ||
| + |  4.0 | ||
| + |  1.4 | ||
| + |  1O86.sph | ||
| + | |||
| + | :6. Run the program with the following command | ||
| + | |||
| + |  sphgen -i INSPH -o OUTSPH | ||
| + | |||
| + | :7. Download the output file to your local directory and File >> Open it in the same scene as your protein file in Chimera. This will overlay the generated spheres on the protein. | ||
| + | |||
| + | [[File:1O86_allsphereoverlay.png|600px]] | ||
| + | |||
| + | Based on the overlay the ribbons are aligned with the spheres indicating the generation of surface spheres was successful. | ||
| + | |||
| + | === Selecting Spheres Localized on Binding Site === | ||
| + | #Run program on dock 6 called sphere_selector: | ||
| + |  sphere_selector 1O86.sph ../001_structure/1O86_ligand_nH_ncH.mol2 10.0  | ||
| + | :2. Output file will automatically be written as selected_spheres.sph | ||
| + | :3. Download this file to your local computer and overlay the file with your original protein file to ensure the spheres are localized over your ligand in the same manner as the 1O86.sph file in the previous step.  | ||
| + | :4. Select and hide binding spheres to ensure they are where the ligand is located | ||
| + | |||
| + | [[File:1O86_selsphereoverlay.png|600px]] | ||
| + | |||
| + | These spheres will be used for initial orientation of the ligand during docking, and provide a rough approximation of the negative space of the receptor. | ||
| == 003: Grid/box == | == 003: Grid/box == | ||
| − | showbox, grid generation | + | |
| + | Generating box and grid. The box and grid generation specify the 3-D space where the docking simulation will take place. The box defines the 3-D space where the docking algorithm will search for binding orientations and conformations of the ligand. The box size should be large enough to include the binding space, but not too large to conserve computational cost. The grid pattern allows for energy calculations between protein and ligand. A smaller grid step would allow for more precise calculations but would be computationally expensive and time consuming. A larger grid step would be computationally cheaper but would be lower in accuracy.  | ||
| + | |||
| + | === Generating the Box === | ||
| + | Move to your <code>~/tutorial/003_gridbox</code> directory. Create an input file for generating a box around the ligand binding site.  | ||
| + |  vi showbox.in | ||
| + | Insert the following code into the file. These are parameters for the showbox program:  | ||
| + |  Y | ||
| + |  8.0 | ||
| + |  ../002_spheres/selected_spheres.sph | ||
| + |  1 | ||
| + |  1O86.box.pdb | ||
| + | |||
| + | This will construct a box that will enclose the binding spheres that are localized on the binding side. The box will have dimensions of 8.0 Angstroms from the binding spheres.  | ||
| + | |||
| + | Run the input file using the following command:  | ||
| + |   showbox < showbox.in | ||
| + | |||
| + | An output file titled <code>1O86.box.pdb</code> will be generated in your local directory. | ||
| + | |||
| + | === Generating the Grid === | ||
| + | |||
| + |  vi grid.in  | ||
| + | |||
| + | Insert the following into your input file:  | ||
| + |  allow_non_integral_charges                no | ||
| + |  compute_grids                             yes | ||
| + |  grid_spacing                              0.4 | ||
| + |  output_molecule                           no | ||
| + |  contact_score                             no | ||
| + |  energy_score                              yes | ||
| + |  energy_cutoff_distance                    9999 | ||
| + |  atom_model                                a | ||
| + |  attractive_exponent                       6 | ||
| + |  repulsive_exponent                        9 | ||
| + |  distance_dielectric                       yes | ||
| + |  dielectric_factor                         4 | ||
| + |  bump_filter                               yes | ||
| + |  bump_overlap                              0.75 | ||
| + |  receptor_file                             ../001_structure/1O86_protein_nH_ncH.mol2 | ||
| + |  box_file                                  1O86.box.pdb | ||
| + |  vdw_definition_file                       /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + |  score_grid_prefix                         grid | ||
| + | |||
| + | If grid fails because it cannot find the .defn file, you may need to edit the path of the <code>vdw_definition_file</code> parameter to point to a VDW defn file that actually exists. It should be in the <code>parameters</code> folder of your DOCK6 installation. | ||
| + | |||
| + | Save the file and run using the following command:  | ||
| + |   grid -i grid.in -o 1O86_gridinfo.out | ||
| + | |||
| + | An output file with your Grid generation will appear in your directory. | ||
| == 004: Minimization == | == 004: Minimization == | ||
| − | + | ||
| + | Before moving toward preforming docking of any fragments/ligands onto your receptor of interest it is important that your original ligand is energy-minimized. This is important before dock is able to calculate any energy interactions between atoms on the rector and ligand.  | ||
| + | |||
| + | Move to your <code>~/tutorial/004_energy_min</code> directory and create an input file using: | ||
| + | |||
| + |  vi min.in | ||
| + | |||
| + | In this input file, use the following parameters to minimize the ligand without re-orienting it: | ||
| + | |||
| + |  conformer_search_type                                        rigid | ||
| + |  use_internal_energy                                          yes  | ||
| + |  internal_energy_rep_exp                                      12   | ||
| + |  internal_energy_cutoff                                       100.0 | ||
| + |  ligand_atom_file                                             ../001_structure/1O86_ligand_wH_wcH.mol2 | ||
| + |  limit_max_ligands                                            no   | ||
| + |  skip_molecule                                                no   | ||
| + |  read_mol_solvation                                           no   | ||
| + |  calculate_rmsd                                               yes  | ||
| + |  use_rmsd_reference_mol                                       yes  | ||
| + |  rmsd_reference_filename                                      ../001_structure/1O86_ligand_wH_wcH.mol2 | ||
| + |  use_database_filter                                          no   | ||
| + |  orient_ligand                                                no   | ||
| + |  bump_filter                                                  no   | ||
| + |  score_molecules                                              yes  | ||
| + |  contact_score_primary                                        no   | ||
| + |  grid_score_primary                                           yes  | ||
| + |  grid_score_rep_rad_scale                                     1    | ||
| + |  grid_score_vdw_scale                                         1    | ||
| + |  grid_score_es_scale                                          1    | ||
| + |  grid_lig_efficiency                                          no   | ||
| + |  grid_score_grid_prefix                                       ../003_gridbox/grid | ||
| + |  minimize_ligand                                              yes  | ||
| + |  simplex_max_iterations                                       1000 | ||
| + |  simplex_tors_premin_iterations                               0    | ||
| + |  simplex_max_cycles                                           1    | ||
| + |  simplex_score_converge                                       0.1  | ||
| + |  simplex_cycle_converge                                       1.0  | ||
| + |  simplex_trans_step                                           1.0  | ||
| + |  simplex_rot_step                                             0.1  | ||
| + |  simplex_tors_step                                            10.0 | ||
| + |  simplex_final_min                                            no   | ||
| + |  simplex_random_seed                                          0    | ||
| + |  simplex_restraint_min                                        no   | ||
| + |  atom_model                                                   all  | ||
| + |  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + |  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.defn | ||
| + |  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + |  ligand_outfile_prefix                                        1O86_min | ||
| + |  write_mol_solvation                                          no   | ||
| + |  write_orientations                                           no   | ||
| + |  num_scored_conformers                                        1    | ||
| + |  score_threshold                                              100.0 | ||
| + |  rank_ligands                                                 no | ||
| + | |||
| + | Finally, run the minimization using DOCK6 as such: | ||
| + |  dock6 -i min.in -o min.out | ||
| + | |||
| + | The program should run relatively quickly and create a new Mol2 file called <code>1O86_min_ranked.mol2</code>. Download this file to your local machine and open it in Chimera along with the original <code>1O86_ligand_wH_wcH.mol2</code> to compare the pre- and post-minimized poses. There should be minimal difference between them. | ||
| + | |||
| + | [[File:1O86_ligminoverlay.png|600px]] | ||
| + | |||
| + | (note that in the above image hydrogens have been hidden for simplicity; the generated Mol2 file and the original ligand file should both be versions where hydrogens are present) | ||
| == 005: Rigid Docking == | == 005: Rigid Docking == | ||
| − | + | Rigid docking allows the ligand to be treated as a rigid molecule (no confirmational changes) and determines best orientation for molecule to fit directly into the receptor site.  | |
| + | |||
| + | Move to your <code>~/tutorial/005_rigid_dock</code> directory and create an input file using the following command: | ||
| + |  vi rigid.in | ||
| + | |||
| + | Add the following parameters: | ||
| + |  conformer_search_type                                        rigid | ||
| + |  use_internal_energy                                          yes  | ||
| + |  internal_energy_rep_exp                                      12   | ||
| + |  internal_energy_cutoff                                       100.0 | ||
| + |  ligand_atom_file                                             ../004_energy_min/1O86_min_scored.mol2 | ||
| + |  limit_max_ligands                                            no   | ||
| + |  skip_molecule                                                no   | ||
| + |  read_mol_solvation                                           no   | ||
| + |  calculate_rmsd                                               yes  | ||
| + |  use_rmsd_reference_mol                                       yes  | ||
| + |  rmsd_reference_filename                                      ../004_energy_min/1O86_min_scored.mol2 | ||
| + |  use_database_filter                                          no   | ||
| + |  orient_ligand                                                yes  | ||
| + |  automated_matching                                           yes  | ||
| + |  receptor_site_file                                           ../002_spheres/selected_spheres.sph | ||
| + |  max_orientations                                             1000 | ||
| + |  critical_points                                              no   | ||
| + |  chemical_matching                                            no   | ||
| + |  use_ligand_spheres                                           no | ||
| + |  bump_filter                                                  no | ||
| + |  score_molecules                                              yes | ||
| + |  contact_score_primary                                        no | ||
| + |  grid_score_primary                                           yes | ||
| + |  grid_score_rep_rad_scale                                     1 | ||
| + |  grid_score_vdw_scale                                         1 | ||
| + |  grid_score_es_scale                                          1 | ||
| + |  grid_lig_efficiency                                          no | ||
| + |  grid_score_grid_prefix                                       ../003_gridbox/grid | ||
| + |  minimize_ligand                                              yes | ||
| + |  simplex_max_iterations                                       1000 | ||
| + |  simplex_tors_premin_iterations                               0 | ||
| + |  simplex_max_cycles                                           1 | ||
| + |  simplex_score_converge                                       0.1 | ||
| + |  simplex_cycle_converge                                       1.0 | ||
| + |  simplex_trans_step                                           1.0 | ||
| + |  simplex_rot_step                                             0.1 | ||
| + |  simplex_tors_step                                            10.0 | ||
| + |  simplex_final_min                                            no | ||
| + |  simplex_random_seed                                          0 | ||
| + |  simplex_restraint_min                                        no | ||
| + |  atom_model                                                   all | ||
| + |  vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + |  flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn | ||
| + |  flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + |  ligand_outfile_prefix                                        1O86_rigid | ||
| + |  write_mol_solvation                                          no | ||
| + |  write_orientations                                           no | ||
| + |  num_scored_conformers                                        100 | ||
| + |  write_conformations                                          no | ||
| + |  cluster_conformations                                        yes | ||
| + |  cluster_rmsd_threshold                                       2.0 | ||
| + |  score_threshold                                              100.0 | ||
| + |  rank_ligands                                                 no | ||
| + | |||
| + | Run using the following command:  | ||
| + |  dock6 -i rigid.in -o rigid.out | ||
| + | |||
| + | The job should take no longer than a minute.  | ||
| + | |||
| + | Or run using the following slurm script:  | ||
| + |  #!/bin/bash | ||
| + |  #SBATCH --job-name=1O86_rigid           # Name of the job | ||
| + |  #SBATCH --output=rigid.txt  # Standard output file | ||
| + |  #SBATCH --error=1O86_error.txt    # Standard error file | ||
| + |  #SBATCH --ntasks-per-node=1        # Number of tasks (adjust as needed) | ||
| + |  #SBATCH --mail-user=vanie.seecharan@stonybrook.edu          # user | ||
| + |  #SBATCH --nodes=1                   # nodes | ||
| + |  #SBATCH --time=4:00:00            # Maximum walltime (adjust as needed) | ||
| + |  #SBATCH --partition=short-40core    # Correct partition name | ||
| + |  # Ensure the input file exists before running the program | ||
| + |  if [ ! -f grid.in ]; then | ||
| + |     echo "Error: grid.in file not found!" | ||
| + |     exit 1 | ||
| + |  fi | ||
| + |  # Run the program | ||
| + |  /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/bin/dock6 -i rigid.in -o rigid.out 10.0 | ||
| + | |||
| + | The following output files will be produced in your directory:  | ||
| + | * 1O86_rigid_scored.mol2 | ||
| + | * rigid.out | ||
| + | |||
| + | Download the Mol2 file to your computer. This Mol2 file will contain an ensemble of poses which will be messy to visualize all at once. As such, you should use Chimera's ViewDock functionality to view the rigid docked poses. Use Tools >> Surface/Binding Analysis >> ViewDock and open the <code>1O86_rigid_scored.mol2</code> file that was created in the docking. The file will be of type DOCK 4, 5, or 6. A separate dialogue box will open that will allow you to select which pose you would like to view. | ||
| + | |||
| + | [[File:1O86_rigiddockingviewdock2.png|600px]] | ||
| == 006: Flexible Docking == | == 006: Flexible Docking == | ||
| − | + | Flexible docking accounts for the conformational flexibility of the ligand by sampling the torsions present in it.  | |
| + | |||
| + | Navigate to your <code>~/tutorial/006_flex_dock</code> and create an input file using the following command: | ||
| + | <pre> | ||
| + | vi flex.in | ||
| + | </pre> | ||
| + | |||
| + | Add the following script: | ||
| + | |||
| + | <pre> | ||
| + | conformer_search_type                                        flex | ||
| + | write_fragment_libraries                                     no | ||
| + | user_specified_anchor                                        no | ||
| + | limit_max_anchors                                            no | ||
| + | min_anchor_size                                              5 | ||
| + | pruning_use_clustering                                       yes | ||
| + | pruning_max_orients                                          1000 | ||
| + | pruning_clustering_cutoff                                    100 | ||
| + | pruning_orient_score_cutoff                                  1000.0 | ||
| + | pruning_conformer_score_cutoff                               100.0 | ||
| + | pruning_conformer_score_scaling_factor                       1.0 | ||
| + | use_clash_overlap                                            no | ||
| + | write_growth_tree                                            no | ||
| + | use_internal_energy                                          yes | ||
| + | internal_energy_rep_exp                                      12 | ||
| + | internal_energy_cutoff                                       100.0 | ||
| + | ligand_atom_file                                             ../004_dock/1o86_lig_min_scored.mol2 | ||
| + | limit_max_ligands                                            no | ||
| + | skip_molecule                                                no | ||
| + | read_mol_solvation                                           no | ||
| + | calculate_rmsd                                               yes | ||
| + | use_rmsd_reference_mol                                       yes | ||
| + | rmsd_reference_filename                                      ../004_dock/1o86_lig_min_scored.mol2 | ||
| + | use_database_filter                                          no | ||
| + | orient_ligand                                                yes | ||
| + | automated_matching                                           yes | ||
| + | receptor_site_file                                           ../002_surface_spheres/selected_spheres.sph | ||
| + | max_orientations                                             1000 | ||
| + | critical_points                                              no | ||
| + | chemical_matching                                            no | ||
| + | use_ligand_spheres                                           no | ||
| + | bump_filter                                                  no | ||
| + | score_molecules                                              yes | ||
| + | contact_score_primary                                        no | ||
| + | grid_score_primary                                           yes | ||
| + | grid_score_rep_rad_scale                                     1 | ||
| + | grid_score_vdw_scale                                         1 | ||
| + | grid_score_es_scale                                          1 | ||
| + | grid_lig_efficiency                                          no | ||
| + | grid_score_grid_prefix                                       ../003_grid/grid | ||
| + | minimize_ligand                                              yes | ||
| + | minimize_anchor                                              yes | ||
| + | minimize_flexible_growth                                     yes | ||
| + | use_advanced_simplex_parameters                              no | ||
| + | minimize_flexible_growth_ramp                                yes | ||
| + | simplex_max_cycles                                           1 | ||
| + | simplex_score_converge                                       0.1 | ||
| + | simplex_initial_score_coverge                                5 | ||
| + | simplex_cycle_converge                                       1.0 | ||
| + | simplex_trans_step                                           1.0 | ||
| + | simplex_rot_step                                             0.1 | ||
| + | simplex_tors_step                                            10.0 | ||
| + | simplex_anchor_max_iterations                                500 | ||
| + | simplex_grow_max_iterations                                  500 | ||
| + | simplex_grow_tors_premin_iterations                          0 | ||
| + | simplex_final_min                                            no | ||
| + | simplex_random_seed                                          0 | ||
| + | simplex_restraint_min                                        no | ||
| + | atom_model                                                   all | ||
| + | vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + | flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn | ||
| + | flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + | ligand_outfile_prefix                                        flex.out | ||
| + | write_mol_solvation                                          no | ||
| + | write_orientations                                           no | ||
| + | num_scored_conformers                                        1 | ||
| + | score_threshold                                              100.0 | ||
| + | rank_ligands                                                 no | ||
| + | </pre> | ||
| + | |||
| + | Now, you can run the above file, flex.in, by creating a shell script (flex.sh) as stated below. This will be helpful to have if you need to re-run the job for any reason; rather than re-typing the full dock command, you can save it in the shell script file and run that script instead.  | ||
| + | |||
| + | <pre> | ||
| + | #!/bin/bash | ||
| + | |||
| + | dock6 -i flex.in -o flex.out | ||
| + | </pre> | ||
| + | |||
| + | Now run the script- | ||
| + | |||
| + | <pre> | ||
| + | bash flex.sh | ||
| + | </pre> | ||
| + | |||
| + | DOCK6 program should finish running in a few minutes and should generate the following two files- | ||
| + | a) flex.out | ||
| + | b) flex_scored.mol2 | ||
| + | |||
| + | Now we can transfer the docked files to our local system and visualize the results in chimera. | ||
| + | |||
| + | [[File:flex_1o86.png|600px]] | ||
| + | |||
| + | The docked ligand (colored in cyan) shows good structural overlap with the reference structure (purple). One way of confirming this is by looking at the RMSD value (below) which is below 1 A for this case. | ||
| + | |||
| + | [[File:flex_param.png|600px]] | ||
| + | |||
| + | We can also overlay the structures from the three docking types and compare with the reference to observe the best binding pose. The ligand structure obtained due to rigid docking (green)  gives the lowest RMSD score of 0.59. Next best RMSD score is obtained from flexible docking (cyan) and then from fixed anchor docking (orange), giving an RMSD of 1.27.  | ||
| + | |||
| + | [[File:Overlap_dock_types.png|600px]] | ||
| == 007: Footprint Scoring == | == 007: Footprint Scoring == | ||
| − | |||
| − | == 008:  | + | Footprint analysis maps out the interaction landscape between a ligand and its target protein. By breaking down the energetics of binding into electrostatic and van der Waals (VDW) contributions, it provides a detailed visualization of how different parts of a ligand interact with specific regions of the protein. | 
| − | Slurm and queue  | + | |
| + | Move to your <~/tutorial/007_footprint</code> directory and create an input file (fps.in). | ||
| + | <pre> | ||
| + | vi fps.in | ||
| + | </pre> | ||
| + | |||
| + | Add the following- | ||
| + | <pre> | ||
| + | conformer_search_type                                        rigid | ||
| + | use_internal_energy                                          no | ||
| + | ligand_atom_file                                             ../004_dock/1O86_min_scored.mol2 | ||
| + | limit_max_ligands                                            no | ||
| + | skip_molecule                                                no | ||
| + | read_mol_solvation                                           no | ||
| + | calculate_rmsd                                               no | ||
| + | use_database_filter                                          no | ||
| + | orient_ligand                                                no | ||
| + | bump_filter                                                  no | ||
| + | score_molecules                                              yes | ||
| + | contact_score_primary                                        no | ||
| + | grid_score_primary                                           no | ||
| + | gist_score_primary                                           no | ||
| + | multigrid_score_primary                                      no | ||
| + | dock3.5_score_primary                                        no | ||
| + | continuous_score_primary                                     no | ||
| + | footprint_similarity_score_primary                           yes | ||
| + | fps_score_use_footprint_reference_mol2                       yes | ||
| + | fps_score_footprint_reference_mol2_filename                  ../001_structure/1O86_ligand_wH_wcH.mol2 | ||
| + | fps_score_foot_compare_type                                  Euclidean | ||
| + | fps_score_normalize_foot                                     no | ||
| + | fps_score_foot_comp_all_residue                              yes | ||
| + | fps_score_receptor_filename                                  ../001_structure/1O86_protein_wH_wcH.mol2 | ||
| + | fps_score_vdw_att_exp                                        6 | ||
| + | fps_score_vdw_rep_exp                                        9 | ||
| + | fps_score_vdw_rep_rad_scale                                  1 | ||
| + | fps_score_use_distance_dependent_dielectric                  yes | ||
| + | fps_score_dielectric                                         4.0 | ||
| + | fps_score_vdw_fp_scale                                       1 | ||
| + | fps_score_es_fp_scale                                        1 | ||
| + | fps_score_hb_fp_scale                                        0 | ||
| + | minimize_ligand                                              no | ||
| + | simplex_final_min                                            no | ||
| + | atom_model                                                   all | ||
| + | vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + | flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn | ||
| + | flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + | ligand_outfile_prefix                                        fps | ||
| + | write_mol_solvation                                          no | ||
| + | write_footprints                                             yes | ||
| + | write_hbonds                                                 yes | ||
| + | write_orientations                                           no | ||
| + | num_scored_conformers                                        1 | ||
| + | score_threshold                                              100.0 | ||
| + | rank_ligands                                                 no | ||
| + | |||
| + | </pre> | ||
| + | |||
| + | Now create a bash script to run the above create file- | ||
| + | <pre> | ||
| + | vi fps.sh | ||
| + | </pre> | ||
| + | |||
| + | <pre> | ||
| + | #!/bin/bash | ||
| + | |||
| + | echo "Running DOCK for footprint generation..." | ||
| + | |||
| + | dock6 -i fps.in -o fps.out >&fps.log1 | ||
| + | |||
| + | echo "Generating plot..." | ||
| + | |||
| + | python /gpfs/projects/AMS536/zzz.programs/plot_footprint_single_magnitude.py fps_footprint_scored.txt  50 >&fps.log2 | ||
| + | |||
| + | echo "Done" | ||
| + | </pre> | ||
| + | |||
| + | As shown above, the .sh script also includes the path to a python file that generates and saves the footprint analysis results as a PDF. Once the script has been executed, transfer the generated PDF to your local system to review the results. The output should resemble the example shown below, showing a visual representation of the electrostatic and van der Waals interactions between the ligand and the protein. | ||
| + | |||
| + | [[File:Footprint_score_1o86.png|600px]] | ||
| + | |||
| + | == 008: 5K Virtual Screening == | ||
| + | |||
| + | In this section, you will conduct a virtual screening on a set of druglike potential leads against the 1O86 receptor. Virtual screening of large ligand libraries is one of the main uses of DOCK and other molecular docking suites in drug discovery, as it allows for lead compound identification at much lower cost than traditional ''in vitro'' assays. | ||
| + | |||
| + | === Input File ===   | ||
| + | |||
| + | Because this section generates a high volume of molecule poses, you may need to make use of your scratch directory on Seawulf. While your home directory has a maximum storage volume of 20 gigabytes, scratch has a storage volume of 20 TERAbytes, so it will easily accommodate the output of your virtual screen. Your scratch directory should be located at <code>/gpfs/scratch/yourusername/</code>. Move there and create a directory for your virtual screening.  | ||
| + | |||
| + | Copy the input file from your <code>006_flex_dock</code> directory into your present directory. We can re-use the majority of input from this file because the docking process is the same in both cases; the only difference is in the number of molecules to be docked. If you like, you can re-name the input file using <code>mv</code> to something like <code>flex_vs.in</code> for clarity.  | ||
| + | |||
| + | The only parameter that needs to be changed in the input file is the <code>ligand_atom_file</code> parameter. Instead of the single <code>1O86_min_scored.mol2</code>, we want to dock a Mol2 file that contains multiple potential drug candidates. You should be able to find several such Mol2 files in <code>/gpfs/projects/AMS536/zzz.programs/VS_libraries/</code>. For this tutorial, we will screen the 5,000 molecule library (<code>VS_library_5K.mol2</code>), but you can elect to do the 100- (or 25,000) molecule library by simply changing the input file name. Just copy the path to the file (should be <code>/gpfs/projects/AMS536/zzz.programs/VS_libraries/VS_library_5K.mol2</code>), open the .in file, delete the current value for <code>ligand_atom_file</code>, and paste in the path. If you don't want to deal with the long filepath, you can also copy the VS_library file to your working directory, and just have the filename as the parameter instead. It is probably also advisable to change the names of other output files in the .in file so that the output is distinguishable from the previous flexible docking. | ||
| + | |||
| + | Note that if you have moved to your scratch directory and used relative paths in the .in file for the receptor sphere file, gridbox, et cetera, those will also need to be changed to reflect your new location. Absolute paths will likely be the most efficient solution for this. | ||
| + | |||
| + | === Slurm and Job Submission === | ||
| + | |||
| + | Up until this point, you have probably been running dock6 on the head node of Seawulf. This is fine for short, low-intensity jobs, like docking a single molecule or (debatably) generating a grid. However, the head node caters to all users currently accessing Seawulf, and has limited computing power at its disposal. As such, attempting to run a longer job (such as a 5,000 molecule screen) will fully consume the head node's limited resources, slowing down all user sessions in the process until the Seawulf admins notice and kick you. To safely utilize the full computing power of Seawulf, you need to submit a job to one of its allocated high-power computing nodes using its queueing system. This system tracks all jobs that have been submitted by various users and allocates them to processors as they are freed up, maximizing the productivity and efficiency of the computing cluster while ensuring a smooth user experience. The system Seawulf uses for this queue tracking is called Slurm (Simple Linux Utility and Resource Management). | ||
| + | |||
| + | Slurm is implemented as a ''module'' on Seawulf, and needs to be loaded into your environment before it can be used. You can load it by typing <code>module load slurm</code> into your terminal from any directory. Of note, you will need to do this every time you start a new SSH session. If you want the module to always be loaded when you log in, you can add the command to your .bashrc file (see section 000 for instructions on how to modify your .bashrc).  | ||
| + | |||
| + | Once the module has loaded, you can use the command <code>sinfo</code> to view what processors are present on Seawulf. If you are on Milan, you should see 40- and 96-core queues of varying lengths (short, medium, long), as well as some specialized GPU cores and high memory cores. The [https://rci.stonybrook.edu/HPC/faqs/seawulf-queues Seawulf website] has some good descriptions of what cores are available and the duration of jobs allowed on them. <code>sinfo --state=idle</code> allows you to see what queues have cores idle; this is helpful for knowing where a job can start running immediately.  | ||
| + | |||
| + | To submit a job to a queue, you need to create a special job script file that will tell the queue everything it needs to know about the job, as well as what commands to run. Here is what that file will look like for our VS run: | ||
| + | |||
| + |  <nowiki> | ||
| + | #!/bin/bash | ||
| + | # | ||
| + | #SBATCH --job-name=1o86_vs | ||
| + | #SBATCH --output=vs_output.txt | ||
| + | #SBATCH --nodes=1 | ||
| + | #SBATCH --ntasks-per-node=96 | ||
| + | #SBATCH --time=48:00:00 | ||
| + | #SBATCH -p long-96core | ||
| + | |||
| + | module load intel/mpi/64/2018/18.0.3 | ||
| + | |||
| + | mpirun -np 96 dock6.mpi -i flex_vs.in -o flex_vs.out | ||
| + | </nowiki> | ||
| + | |||
| + | A line-by-line breakdown of this file: | ||
| + | * <code>#!/bin/bash</code> is called a shebang, and is used in unix shell scripts to indicate that bash should be used to interpret and execute the script. | ||
| + | * <code>#SBATCH --job-name=1O86_vs</code> tells slurm what our job is called; this can be changed to whatever strikes your fancy without consequence | ||
| + | * <code>#SBATCH --output=vs_output.txt</code> tells slurm where to output information about the job itself. If jobs are not running properly, check this file for errors. | ||
| + | * <code>#SBATCH --nodes=1</code> tells slurm how many computing nodes we want for this job. 1 is appropriate here as the job is not very large. | ||
| + | * <code>#SBATCH --ntasks-per-node=96</code> tells slurm how many tasks it should expect to run per node. There are no strict guidelines on this, but setting it equal to the number of cores in the node (96 or 40 on Milan) is a good default. | ||
| + | * <code>#SBATCH --time=48:00:00</code> tells slurm the maximum amount of time it should allocate for this job. The best thing to do here is request the maximum time possible for the queue; check the [https://rci.stonybrook.edu/HPC/faqs/seawulf-queues Seawulf website] for each queue's parameters. Here, because we are submitting to the long-96core queue, we are requesting the queue maximum of 48 hours; our job will likely not take that long. | ||
| + | * <code>#SBATCH -p long-96core</code> tells slurm which queue we would like to submit to; in this case, we are submitting to the long-term 96-core queue.  | ||
| + | * <code>module load intel/mpi/64/2018/18.0.3</code> loads a module that allows for parallel processing, greatly speeding up job progress. | ||
| + | * <code>mpirun -np 96 dock6.mpi -i flex_vs.in -o flex_vs.out</code> is the actual command we would like to run in the job. This particular command asks <code>mpirun</code>, which implements parallel processing, to run the MPI version of dock6 for our screening. The <code>np</code> parameter provides information on the number of processors available for MPI; this can be calculated by multiplying <code>nodes</code> by <code>ntasks-per-node</code> (96 * 1 in this case). | ||
| + | |||
| + | Now that you have a sense of what we're trying to do, create a shell file with a recognizable name (ex, job.sh) and either copy-paste or re-type the above input into the file, making any modifications to the dock path or input filename as needed. Save and exit the file. | ||
| + | |||
| + | Finally, you are ready to submit your first job to the queue. Use the command <code>sbatch job.sh</code> to submit your job file. If slurm accepts it, you will be provided a job number.  | ||
| + | |||
| + | [[File:JobSubmitted.png|600px]] | ||
| + | |||
| + | You can use the command <code>squeue -u yourusername</code> to check the status of your job. If you do not see it in the queue directly after submitting, it has likely encountered an error, and you will need to troubleshoot. A good place to start will be reading any generated output files. | ||
| + | |||
| + | [[File:JobRunning.png|600px]] | ||
| + | |||
| + | Otherwise, it will likely start as PD (pending) for a time (depending on how full the queue is, this can vary from a minute to a day. Sorry.) before switching to R (running). As it runs, you should see .mol2 files spontaneously created in the directory. Make sure to check semi-regularly that the job is still running.  | ||
| + | |||
| + | Once the virtual screen has finished, you should have a .mol2 file in your directory that contains the best poses of the 5,000 ligands that were screened, according to purely grid-based docking. | ||
| + | |||
| + | === Re-scoring Docked Molecules === | ||
| + | |||
| + | Although grid scoring provides a good initial look at the most favorable poses of the given molecules, it is generally a good idea to re-score the poses provided by a different criteria to determine a final order. In this case, we will use footprint similarity scoring (FPS) as our re-scoring function. Once again, we can copy the input file that we used in the single-molecule footprint scoring example, and change the <code>ligand_atom_file</code> parameter from the 1O86 ligand to the output poses from the virtual screening. You should also change <code>ranked_ligands</code> to <code>yes</code> (towards the end of the file), and add a line with the parameter <code>max_ranked_ligands</code> set to <code>500</code> at the end of the file. This will ensure that the final .mol2 is a manageable size, while retaining the best poses of the docking. Finally, set the <code>ligand_outfile_prefix</code> parameter to something that will distinguish it from the other output already in the directory. Write and exit the file, and rename it to something appropriate like <code>rescore.in</code>. | ||
| + | |||
| + | Because this re-scoring uses rigid docking and no grid calculations, it should be relatively lightweight and safe to run on the head node. If you want to be safe (or if it is still running after a couple minutes), you can cancel it and use the command <code>srun -p short-96core -t 1:00:00 --pty bash</code> to instead open up an interactive bash session on one of the computing nodes. This will let you use an interactive terminal in a manner identical to the head node, but direct all processing requests to an available computing node instead of using the head node's resources. Unfortunately, it is also subject to the whims of the queue availability, so you may have to wait a while or time strategically to obtain an available node. | ||
| + | |||
| + | Once you run the re-scoring using <code> dock6 -i rescore.in -o rescore.out</code>, you should see a re-scored ligand file in the directory with the prefix indicated in the rescore.in file. If you want, you can copy this .mol2 file to your home directory so that it is not purged after 30 days (a convenient feature of the scratch directory). After that, download it to your local machine using <code>scp</code>. | ||
| + | |||
| + | === Viewing VS Results in Chimera === | ||
| + | |||
| + | Thanks to our infallible foresight, we have saved only the 500 best ligands to visualize in Chimera, as opposed to the original, crash-inducing number of 5,000. To analyze the poses of these ligands, launch Chimera and File >> Open your receptor .mol2 file. Then, navigate to Tools >> Surface/Binding Analysis >> ViewDock, and open the 500-ligand .mol2 file that you have just downloaded. Chimera may lag for a minute depending on your system specs, but should eventually provide a dialogue box and display one ligand in (hopefully) your receptor's binding site. | ||
| + | |||
| + | [[File:VS ViewDock.png|600px]] | ||
| + | |||
| + | The dialogue box allows you to select one or multiple ligands at a time to view, and displays their scores in a nice table format. You can view additional columns of data in the dialogue box by navigating to Column >> Show and selecting the desired parameter, and view multiple ligands at a time by clicking one, then holding shift and clicking another.  | ||
| + | |||
| + | Congratulations! You have successfully conducted a virtual screening, learned how to use slurm in the process, and now have results for potential lead compounds targeting this receptor. | ||
| + | |||
| + | == 009: Genetic Algorithm == | ||
| + | |||
| + | In addition to providing a platform for traditional virtual screening methods, DOCK also implements several unique algorithms for ''de novo''-type drug design. One of these is the genetic algorithm (GA), which accepts one or several "parent" molecules and "mutates" them with basic substitutions, insertions, or deletions using a standard molecular fragment library before culling the resultant "population" based on user-specified metrics. The intent is to iteratively improve molecules over several "generations" of mutation-cull steps so that new chemical space can be explored, and more potent ligands found.  | ||
| + | |||
| + | In order for the GA to mutate the provided parent molecules, it needs a standard library of molecular fragments to draw from. DOCK provides a foundational library that is generally effective in most GA applications, but can also accept .mol2 files to fragment and use as a custom library. To make this tutorial as educational as possible, you will learn how to fragment a molecule and add it to the foundational fragment library before running the genetic algorithm. | ||
| + | |||
| + | First, move to your <code>009_gen_alg</code> directory and create a directory within it called <code>fraglib/</code>. Then, move into that directory in preparation for creating the fraglib generation input file. | ||
| + | |||
| + | === Generating Fragments ===  | ||
| + | |||
| + | The overall goal in this section is to use DOCK to fragment the ligand present in the 1O86 system, and add those fragments to the foundational library. The backbone of this operation is the <code>write_fragment_libraries</code> option in the DOCK input file. Below is what a sample input file would look like; you may copy and paste it directly, but we recommend moving through DOCK's decision tree and understanding what defaults the program has that we are changing. In this instance, we override several defaults such as <code>orient_ligand</code> and <code>score_molecule</code>, as these are unnecessary for the pure operation of generating ligand fragments. We will refer to the input file as <code>fraglib_gen.in</code> Some parameters are left as exercises to the reader; refer back to previous steps if confused. | ||
| + | |||
| + |  <nowiki> | ||
| + | conformer_search_type                                    	flex | ||
| + | write_fragment_libraries                                 	yes | ||
| + | fragment_library_prefix                                  	1O86_lig_fraglib | ||
| + | fragment_library_freq_cutoff                             	1 | ||
| + | fragment_library_sort_method                             	freq | ||
| + | fragment_library_trans_origin                            	no | ||
| + | use_internal_energy                                      	no | ||
| + | ligand_atom_file                                         	FILL IN 1O86 LIGAND MOL2 | ||
| + | limit_max_ligands                                        	no | ||
| + | skip_molecule                                            	no | ||
| + | read_mol_solvation                                       	no | ||
| + | calculate_rmsd                                           	no | ||
| + | use_database_filter                                      	no | ||
| + | orient_ligand                                            	no | ||
| + | bump_filter                                              	no | ||
| + | score_molecules                                          	no | ||
| + | atom_model                                               	all | ||
| + | vdw_defn_file                                            	FILL IN VDW.DEFN FILEPATH | ||
| + | flex_defn_file                                           	FILL IN FLEX.DEFN FILEPATH | ||
| + | flex_drive_file                                          	FILL IN FLEX_DRIVE FILEPATH | ||
| + | ligand_outfile_prefix                                    	1o86_fraglib | ||
| + | write_mol_solvation                                      	no | ||
| + | write_orientations                                       	no | ||
| + | num_scored_conformers                                    	1 | ||
| + | score_threshold                                          	100.0 | ||
| + | rank_ligands                                             	no | ||
| + | </nowiki> | ||
| + | |||
| + | Once you have created this input file and inserted the relevant filepaths, you can run DOCK to generate the fragments: | ||
| + | |||
| + | <code>dock6 -i fraglib_gen.in -o fraglib_gen.out</code> | ||
| + | |||
| + | Because we are only fragmenting one molecule, this program should be fine to run on the head node, and should finish in a couple seconds. Once it finishes, you should see present in the directory: | ||
| + | * Four .mol2 files containing "linker", "rigid", "scaffold", and "sidechain" in their names; will be referred to as <code>1o86_fraglib_TYPE.mol2</code> | ||
| + | * A file ending in "torenv.dat" (referred to as 1o86_lig_torenv.dat hereafter) | ||
| + | * An output_scored.mol2 file | ||
| + | * A .out file with whatever name you specified | ||
| + | |||
| + | The set of four .mol2 files contain all the fragments that DOCK was able to generate from the original molecule, and the *torenv.dat file is a record of all the torsion-able bonds that DOCK cleaved to generate the fragments.  | ||
| + | |||
| + | === Combining Fragment Libraries === | ||
| + | |||
| + | Now that we have generated fragments and associated torsions from our ligand, we need to combine them with the standard DOCK fragment library and associated torsion environments. The torsion environment combination can be accomplished via a python script.  | ||
| + | |||
| + | First, determine the path to the default DOCK torsion environment collection. It should be located in the <code>parameters</code> folder associated with your DOCK installation. At the time of writing, the full path is <code>/gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_torenv.dat</code> | ||
| + | |||
| + | Next, determine the path for the python script used to combine torsion environments. It should be in the same <code>bin</code> directory as your dock6 command. At the time of writing, there is an instance at <code>/gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/bin/combine_torenv.py</code> | ||
| + | |||
| + | Once you have found paths that work for both of these files, you can combine the standard torsion environment with the newly generated custom one by running | ||
| + | |||
| + | <code>python insert_path_here/combine_torenv.py ./1o86_lig_torenv.dat insert_path_here/fraglib_torenv.dat</code> | ||
| + | |||
| + | (note that you may need to use the command <code>module load python</code> if you encounter a python-related error in running this) Once the script finishes, you should have a <code>full_fraglib.dat</code>, <code>full_sorted_fraglib.dat</code> and <code>unique_full_sorted_fraglib.dat</code> files in your directory. We will be using the "unique" variant for our genetic algorithm process. | ||
| + | |||
| + | We also need to combine the .mol2 files that we have generated from our ligand with the standard DOCK-provided ones. Because .mol2 files just contain lists of molecules without any special header/footer text, they can just be concatenated together by a shell script. Combine the linker, scaffold, and sidechain .mol2 files with their counterparts in the <code>parameters/</code> folder using <code>cat</code>; an example would be: | ||
| + | |||
| + | <code>cat insert_path_here/parameters/fraglib_linker.mol2 ./1o86_fraglib_linker.mol2 >> combined_fraglib_linker.mol2</code> | ||
| + | |||
| + | (Note that we do not do anything with the "rigid" .mol2; this set of fragments is not needed to run the GA). After you do this process for all three variants of the fraglib .mol2s, you will be ready to run the genetic algorithm. It may be advisable to move the ligand-specific fragment library mol2s to a sub-directory in order to prevent cluttering/confusion. | ||
| + | |||
| + | === Running the GA === | ||
| + | |||
| + | With all preparation completed, we can now start optimizing molecules using the genetic algorithm. We will use our original ligand as the "parent" molecule for the first generation, and allow the algorithm to mutate it into new molecules for fifty generations.  | ||
| + | |||
| + | Aside from the filepaths/filenames for the various fraglibs, ligand, protein spheres, and .defn files, the only parameters we are changing from default in the .in file are <code>ga_max_generations</code> and <code>ga_mutate_parents</code>. Maximum generations are lowered to 50 for computational tractability, and parents must be mutable because we only have one molecule to start with; the population would grow very slowly or even die off if we did not allow parental recombination. A full sample input file is given below (but as always, do please try to do it yourself): | ||
| + | |||
| + |  <nowiki> | ||
| + | conformer_search_type                                    	genetic | ||
| + | ga_molecule_file                                         	ORIGINAL 1O86 LIGAND MOL2 | ||
| + | ga_utilities                                             	no | ||
| + | ga_fraglib_scaffold_file                                 	FRAGLIB_SCAFFOLD MOL2 | ||
| + | ga_fraglib_linker_file                                   	FRAGLIB_LINKER MOL2 | ||
| + | ga_fraglib_sidechain_file                                	FRAGLIB_SIDECHAIN MOL2 | ||
| + | ga_torenv_table                                          	FRAGLIB TORENV DAT | ||
| + | ga_max_generations                                       	50 | ||
| + | ga_xover_on                                              	yes | ||
| + | ga_xover_sampling_method_rand                            	yes | ||
| + | ga_xover_max                                             	150 | ||
| + | ga_bond_tolerance                                        	0.5 | ||
| + | ga_angle_cutoff                                          	0.14 | ||
| + | ga_check_overlap                                         	no | ||
| + | ga_mutate_addition                                       	yes | ||
| + | ga_mutate_deletion                                       	yes | ||
| + | ga_mutate_substitution                                   	yes | ||
| + | ga_mutate_replacement                                    	yes | ||
| + | ga_mutate_parents                                        	yes | ||
| + | ga_pmut_rate                                             	0.3 | ||
| + | ga_omut_rate                                             	0.7 | ||
| + | ga_max_mut_cycles                                        	5 | ||
| + | ga_mut_sampling_method                                   	rand | ||
| + | ga_num_random_picks                                      	15 | ||
| + | ga_max_root_size                                         	5 | ||
| + | ga_energy_cutoff                                         	100 | ||
| + | ga_heur_unmatched_num                                    	1 | ||
| + | ga_heur_matched_rmsd                                     	0.5 | ||
| + | ga_constraint_mol_wt                                     	500 | ||
| + | ga_constraint_rot_bon                                    	10 | ||
| + | ga_constraint_H_accept                                   	10 | ||
| + | ga_constraint_H_don                                      	5 | ||
| + | ga_constraint_formal_charge                              	2 | ||
| + | ga_ensemble_size                                         	200 | ||
| + | ga_selection_method                                      	elitism | ||
| + | ga_elitism_combined                                      	yes | ||
| + | ga_elitism_option                                        	max | ||
| + | ga_max_num_gen_with_no_crossover                         	25 | ||
| + | ga_name_identifier                                       	1o86_ga | ||
| + | ga_output_prefix                                         	1086_ga_out | ||
| + | use_internal_energy                                      	yes | ||
| + | internal_energy_rep_exp                                  	12 | ||
| + | internal_energy_cutoff                                   	100.0 | ||
| + | use_database_filter                                      	no | ||
| + | orient_ligand                                            	yes | ||
| + | automated_matching                                       	yes | ||
| + | receptor_site_file                                       	RECEPTOR SPHERES .SPH | ||
| + | max_orientations                                         	1000 | ||
| + | critical_points                                          	no | ||
| + | chemical_matching                                        	no | ||
| + | use_ligand_spheres                                       	no | ||
| + | bump_filter                                              	no | ||
| + | score_molecules                                          	yes | ||
| + | contact_score_primary                                    	no | ||
| + | grid_score_primary                                       	yes | ||
| + | grid_score_rep_rad_scale                                 	1 | ||
| + | grid_score_vdw_scale                                     	1 | ||
| + | grid_score_es_scale                                      	1 | ||
| + | grid_lig_efficiency                                      	no | ||
| + | grid_score_grid_prefix                                   	GRID PATH | ||
| + | minimize_ligand                                          	yes | ||
| + | minimize_anchor                                          	yes | ||
| + | minimize_flexible_growth                                 	yes | ||
| + | use_advanced_simplex_parameters                          	no | ||
| + | simplex_max_cycles                                       	1 | ||
| + | simplex_score_converge                                   	0.1 | ||
| + | simplex_cycle_converge                                   	1.0 | ||
| + | simplex_trans_step                                       	1.0 | ||
| + | simplex_rot_step                                         	0.1 | ||
| + | simplex_tors_step                                        	10.0 | ||
| + | simplex_anchor_max_iterations                            	500 | ||
| + | simplex_grow_max_iterations                              	500 | ||
| + | simplex_grow_tors_premin_iterations                      	0 | ||
| + | simplex_final_min                                        	no | ||
| + | simplex_random_seed                                      	0 | ||
| + | simplex_restraint_min                                    	no | ||
| + | atom_model                                               	all | ||
| + | vdw_defn_file                                            	VDW DEFN FILE | ||
| + | flex_defn_file                                           	FLEX DEFN FILE | ||
| + | flex_drive_file                                          	FLEX_DRIVE FILE | ||
| + | </nowiki> | ||
| + | |||
| + | We will also need to create a batch file to run the GA using slurm, as it is an intensive process that will take a number of hours to run. A sample is provided below; you should adopt yours to the appropriate queue, job name, time to run, DOCK path, and input filename: | ||
| + | |||
| + |  <nowiki> | ||
| + | #!/bin/bash | ||
| + | # | ||
| + | #SBATCH --job-name=1o86_GA | ||
| + | #SBATCH --output=ga_output.txt | ||
| + | #SBATCH --nodes=1 | ||
| + | #SBATCH --ntasks-per-node=40 | ||
| + | #SBATCH --time=4:00:00 | ||
| + | #SBATCH -p short-40core | ||
| + | |||
| + | change_this_to_your_path_to_dock/bin/dock6 -i genalg.in -o genalg.out | ||
| + | </nowiki> | ||
| + | |||
| + | Use <code>sbatch job.sh</code> to submit this per usual, and check up on it regularly. The first few generations should go relatively quickly, with a slowdown as the molecular population increases past generation 8-10. If you want to track the population growth, you can use <code>grep -c "Name" *mol2</code> to view the molecule populations of every .mol2 file that has been written so far. | ||
| + | |||
| + | === Analysis === | ||
| + | |||
| + | Once the genetic algorithm has finished running, you can view the final generation's best molecules and poses in Chimera via the same process outlined in section 008. To review, this will entail downloading the final .mol2 to your local machine (generation 50 in this case), launching Chimera, opening the receptor .mol2 via File >> Open, and using the Tools >> Structure/Binding Analysis >> ViewDock tool to load the ligand .mol2 file in a conveniently viewable format. An example of this can be seen below: | ||
| + | |||
| + | [[File:GA_ViewDock.png|600px]] | ||
| + | |||
| + | == 010: De Novo Design Example == | ||
| + | |||
| + | === De Novo Design === | ||
| + | '''De Novo Design''' is a dock based algorithm that generates new ligands from scratch. This is | ||
| + | done by selecting a dummy atom, which is the 'seed' that 'grows' scaffolds, linkers, or side chains | ||
| + | based on user defined parameters. For example, say you only wanted to use de novo design to only 'grow' | ||
| + | drug-like molecules. The way this is accomplished is ensuring the input file contains parameters | ||
| + | that bias the algorithm to abide by [https://en.wikipedia.org/wiki/Lipinski%27s_rule_of_five Lipinski's Rule of 5] | ||
| + | The guiding principle for using De Novo design is because there is a limit to the amount of new | ||
| + | molecules that you could generate using a general virtual screening. Nevertheless, this method  | ||
| + | will certainly aid in enhancing your search space in generating numerous new compounds.  | ||
| + | |||
| + | In this tutorial, you will be guided through a '''Denovo refinement calculation'''. This kind of  | ||
| + | calculation involves deleting a part of a ligand, creating a dummy atom, and using the De Novo design | ||
| + | algorithm to 'regrow' that part of the ligand based on user defined parameters.  | ||
| + | The way in which the 'regrow' process happens comes from the algorithm pulling molecules from fragment libraries | ||
| + | that contain scaffolds, linkers, and sidechains. The user defined parameters will dictate which libraries the  | ||
| + | algorithm can pull from.  | ||
| + | |||
| + | The outcome will be a series of refinements that will retain a desired interaction in 1O86. | ||
| + | |||
| + | === Selecting a Dummy Atom === | ||
| + | |||
| + | To prepare our molecule for a De Novo calculation, we must first select a dummy atom to 'grow'  | ||
| + | from. To do this, first open your 1O86_protein_wH_wcH.mol2 file, then your 1O86_ligand_wH_wcH.mol2. | ||
| + | The rationale for this is we would like to delete an atom on the ligand that contains a group  | ||
| + | that interacts with the protein. This will help to produce meaningful results, from a drug design standpoint: | ||
| + | |||
| + | [[File:Ligand_In_1O86_DN.png|600px]] | ||
| + | |||
| + | As you can see it is a little difficult to see which atoms are interacting with the protein. | ||
| + | To refine this inspection, hit Control and select an atom on the ligand.  | ||
| + | Then, hit the up arrow to highlight the entire ligand. | ||
| + | Next, hit Select >> Zone and the following menu appears: | ||
| + | |||
| + | [[File:1O86_Zone.png|600px]] | ||
| + | |||
| + | Lets modify the Zone and change the number from 5.0 angstroms to 3.0 angstroms.  | ||
| + | Additionally, make sure that that the third box is checked off entitled that selects neighboring residues. | ||
| + | Then, press okay. You will notice that your ligand and the neighboring residues are highlighted: | ||
| + | |||
| + | [[File:highlight.png|600px]] | ||
| + | |||
| + | To modify this image even further: | ||
| + | Go to Actions >> Atoms/Bonds >> Show  | ||
| + | Next, navigate to Select >> Invert(selected models), here you'll notice most of the protein is highlighted | ||
| + | Lastly, Return to Actions >> Atoms/Bonds >> delete | ||
| + | You now see that there is a clearer picture of specifically, which atoms are interacting closely with the protein | ||
| + | |||
| + | [[File:Clear_picture.png|600px]] | ||
| + | |||
| + | You'll notice that there are two oxygens interacting with neighboring residues in the protein. Tracing your cursor  | ||
| + | in between the oxygens, you'll highlight a Carbon atom labeled C9. This will be the atom of choice for this tutorial. | ||
| + | |||
| + | === Generating a Dummy Atom === | ||
| + | |||
| + | Now that we have our atom of choice, we need to modify the ligand as well as the mol2 file itself.  | ||
| + | |||
| + | First, open the <code>1O86_ligand_wH_wcH.mol2</code> in Chimera. | ||
| + | |||
| + | Locate the C9 atom >> select the two oxygens attached to C9 >> Atoms/Bonds >> delete. | ||
| + | Then, save the mol2 file, lets call it 1O86_ligand_Du.mol2. | ||
| + | |||
| + | Finally, we must open the mol2 file on our terminal or a text editor and change the atom type of C9 to Du: | ||
| + | |||
| + | First in the terminal, type the command | ||
| + | |||
| + |    vi 1O86_ligand_Du.mol2  | ||
| + | |||
| + | Or, open the Mol2 file in your text editor of choice. | ||
| + | |||
| + | Find the C9 atom and modify the atom type. Your input file should look like this: | ||
| + | |||
| + | [[File:Input_file.png|600px]] | ||
| + | |||
| + | Save it.  | ||
| + | |||
| + | Now lets verify this change by opening the mol2 file on Chimera: | ||
| + | |||
| + | [[File:New_dummy_Ligand.png|600px]] | ||
| + | |||
| + | As you can see C9 is now a dummy atom as shown in purple | ||
| + | |||
| + | Now, the Mol2 is ready for De Novo calculations | ||
| + | |||
| + | As a last step, transfer the mol2 to your working directory on Seawulf | ||
| + | |||
| + |   scp 1O86_ligand_Du.mol2 yourusername@login.seawulf.stonybrook.edu:'/gpfs/home/yourusername/tutorial/010_de_novo/' | ||
| + | |||
| + | === Running The Denovo Calculation === | ||
| + | |||
| + | In your 010_De_Novo folder create an empty input file: | ||
| + | |||
| + |    touch DN.in  | ||
| + | |||
| + | Then prompt the question tree with the dock program: | ||
| + | |||
| + |    dock6 -i DN.in  | ||
| + | |||
| + | Follow the question tree and use the following sample input file as a template | ||
| + | |||
| + |    conformer_search_type                                        denovo | ||
| + |    dn_fraglib_scaffold_file                                     /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_scaffold.mol2 | ||
| + |    dn_fraglib_linker_file                                       /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_linker.mol2 | ||
| + |    dn_fraglib_sidechain_file                                    /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2 | ||
| + |    dn_user_specified_anchor                                     yes | ||
| + |    dn_fraglib_anchor_file                                       1O86_ligand_Du.mol2 | ||
| + |    dn_torenv_table                                              ../ga_calc/unique_full_sorted_fraglib.dat | ||
| + |    dn_name_identifier                                           denovo | ||
| + |    dn_sampling_method                                           graph | ||
| + |    dn_graph_max_picks                                           30 | ||
| + |    dn_graph_breadth                                             3 | ||
| + |    dn_graph_depth                                               2 | ||
| + |    dn_graph_temperature                                         100.0 | ||
| + |    dn_pruning_conformer_score_cutoff                            100.0 | ||
| + |    dn_pruning_conformer_score_scaling_factor                    2.0 | ||
| + |    dn_pruning_clustering_cutoff                                 100.0 | ||
| + |    dn_remove_duplicates                                         yes | ||
| + |    dn_max_duplicates_per_mol                                    0 | ||
| + |    dn_write_pruned_duplicates                                   no | ||
| + |    dn_advanced_pruning                                          yes | ||
| + |    dn_prune_initial_sample                                      yes | ||
| + |    dn_sample_torsions                                           yes | ||
| + |    dn_prune_individual_torsions                                 yes | ||
| + |    dn_prune_combined_torsions                                   yes | ||
| + |    dn_random_root_selection                                     no | ||
| + |    dn_mol_wt_cutoff_type                                        soft | ||
| + |    dn_upper_constraint_mol_wt                                   1000 | ||
| + |    dn_lower_constraint_mol_wt                                   0.0 | ||
| + |    dn_mol_wt_std_dev                                            35.0 | ||
| + |    dn_constraint_rot_bon                                        15 | ||
| + |    dn_constraint_formal_charge                                  2.0 | ||
| + |    dn_heur_unmatched_num                                        1 | ||
| + |    dn_heur_matched_rmsd                                         2.0 | ||
| + |    dn_unique_anchors                                            1 | ||
| + |    dn_max_grow_layers                                           1 | ||
| + |    dn_max_root_size                                             25 | ||
| + |    dn_max_layer_size                                            25 | ||
| + |    dn_max_current_aps                                           5 | ||
| + |    dn_max_scaffolds_per_layer                                   1 | ||
| + |    dn_max_successful_att_per_root                               50000 | ||
| + |    dn_write_checkpoints                                         yes | ||
| + |    dn_write_prune_dump                                          yes | ||
| + |    dn_write_orients                                             no | ||
| + |    dn_write_growth_trees                                        no | ||
| + |    dn_output_prefix                                             DN.out | ||
| + |    use_internal_energy                                          yes | ||
| + |    internal_energy_rep_exp                                      12 | ||
| + |    internal_energy_cutoff                                       100.0 | ||
| + |    use_database_filter                                          no | ||
| + |    orient_ligand                                                no | ||
| + |    bump_filter                                                  no | ||
| + |    score_molecules                                              yes | ||
| + |    contact_score_primary                                        no | ||
| + |    grid_score_primary                                           yes | ||
| + |    grid_score_rep_rad_scale                                     1 | ||
| + |    grid_score_vdw_scale                                         1 | ||
| + |    grid_score_es_scale                                          1 | ||
| + |    grid_lig_efficiency                                          no | ||
| + |    grid_score_grid_prefix                                       ../003_gridbox/grid | ||
| + |    minimize_ligand                                              yes | ||
| + |    minimize_anchor                                              yes | ||
| + |    minimize_flexible_growth                                     yes | ||
| + |    use_advanced_simplex_parameters                              no | ||
| + |    simplex_max_cycles                                           1 | ||
| + |    simplex_score_converge                                       0.1 | ||
| + |    simplex_cycle_converge                                       1.0 | ||
| + |    simplex_trans_step                                           1.0 | ||
| + |    simplex_rot_step                                             0.1 | ||
| + |    simplex_tors_step                                            10.0 | ||
| + |    simplex_anchor_max_iterations                                500 | ||
| + |    simplex_grow_max_iterations                                  500 | ||
| + |    simplex_grow_tors_premin_iterations                          0 | ||
| + |    simplex_final_min                                            no | ||
| + |    simplex_random_seed                                          0 | ||
| + |    simplex_restraint_min                                        no | ||
| + |    atom_model                                                   all | ||
| + |    vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn | ||
| + |    flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn | ||
| + |    flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + | |||
| + | |||
| + | Once you've filled out the input file, you can then go ahead and run the calculation | ||
| + | |||
| + |    dock6 -i DN.in -o DN.out | ||
| + | |||
| + | After running the calculations, the following output files will be generated: | ||
| + | |||
| + |   DN.out | ||
| + |   DN.out.anchor_1.root_layer_1.mol | ||
| + |   DN.out.completed.denovo_build.mol2 | ||
| + |   DN.out.denovo_build.mol2 | ||
| + | |||
| + | Examining the output file you can see how many molecules were grown. | ||
| + | |||
| + | [[File:Attach_out.png|600px]] | ||
| + | |||
| + | In this case there are 21 different attached molecules. | ||
| + | |||
| + | Now, copy over the DN.out.completed.denovo_build.mol2 to your home computer | ||
| + | |||
| + |     scp yourusername@login.seawulf.stonybrook.edu:'/gpfs/home/yourusername/tutorial/010_de_novo/DN.out.completed.denovo_build.mol2' . | ||
| + | |||
| + | === Visualizing the Molecules in Chimera === | ||
| + | |||
| + | Head over to Chimera and go to Tools >> Surface/Binding Analysis >> Scroll down to where you see ViewDock and select it | ||
| + | |||
| + | Then select the DN.out.completed_denovo.mol2 | ||
| + | |||
| + | The following menu will appear: | ||
| + | |||
| + | [[File:View_Dock.png|600px]] | ||
| + | |||
| + | You can navigate through the list of molecules that were grown from the calculations. | ||
| + | |||
| + | You'll notice that the best scoring ligand retained the oxygens from the original ligand: | ||
| + | |||
| + | [[File:Best_Scoring_Ligand.png|600px]] | ||
| + | |||
| + | Opening the original ligand mol2, you'll see that the molecules are almost identical | ||
| + | |||
| + | [[File:Overlap_Ligand.png|600px]] | ||
| + | |||
| + | If you return to the ViewDock menu and highlight all of the ligands.  | ||
| + | You will be able to see a depiction of all of the different refinements. | ||
| + | |||
| + | This can be done by going to Tools >> ViewDock >> Raise | ||
| + | Now that the menu has been prompted hold down your shift key and select the first ligand,  | ||
| + | then scroll down and select the last ligand. All your molecules will be shown overlapping on Chimera: | ||
| + | |||
| + | [[File:Refinements.png|600px]] | ||
| + | |||
| + | Well done, you've now completed a simple denovo calculation! | ||
| + | |||
| + | == Running a Generic De Novo Calculation == | ||
| + | |||
| + | In this part of the tutorial, you will be running a '''generic denovo calculation'''. | ||
| + | What this means is that you take an anchor, or don't specify one, and grow a ligand around parts of the molecule.  | ||
| + | These parts of the molecule are dictated by the selected spheres that were calculated in the second part of the VS tutorial. | ||
| + | The end result becomes a series of newly 'grown' ligands based on the user defined parameters. These parameters are what will bias  | ||
| + | the denovo growth to meet a certain criterian (i.e. a specific molecular weight). For this tutorial, you will select an anchor and use that  | ||
| + | to grow a series of ligands, which you will then be able to see on chimera. | ||
| + | |||
| + | To start, lets create a new directory in your <code> 010_de_novo </code> directory.  | ||
| + | Lets call it generic. | ||
| + | |||
| + |     mkdir generic | ||
| + | |||
| + | === Selecting an Anchor === | ||
| + | |||
| + | Now that you have a working directory for this calculation, lets go ahead and create an anchor | ||
| + | |||
| + | file to run the generic denovo calculation. For our anchor of choice, we will use the most popular | ||
| + | |||
| + | sidechain from dock6's current library. To start off, copy the library mol2 file to your home directory from  | ||
| + | |||
| + | seawulf: | ||
| + | |||
| + |      scp yourusername@milan.seawulf.stonybrook.edu:'/gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2' . | ||
| + | |||
| + | It's a pretty hefty file and the way you are going to open the file is by using ViewDock on Chimera. | ||
| + | |||
| + | Navigate to Chimera and select Tools --> Surface/Binding Analysis --> View Dock  | ||
| + | |||
| + | Select the fraglib_sidechain.mol2 file. | ||
| + | |||
| + | On chimera you'll see a methyl-ethyl sidechain popup and the View Dock menu to toggle through other sidechains: | ||
| + | |||
| + | [[File:methyl_ethyl.png|600px]] | ||
| + | |||
| + | [[File:view_dock_frag.png|600px]] | ||
| + | |||
| + | As mentioned before, you are going to use the most popular sidechain as the anchor for this calculation. | ||
| + | |||
| + | On the View Dock Menu go to Column --> Show --> FREQ | ||
| + | |||
| + | This is the frequency in which these particular sidechains occur in the dock6 testset: | ||
| + | |||
| + | [[File:freq.png|600px]] | ||
| + | |||
| + | Ironically, the methy ethyl sidechain appears to be the most popular sidechain. This will be the anchor of choice. | ||
| + | |||
| + | Go to File --> Save Mol2  | ||
| + | |||
| + | Lets call it Anchor.mol2  | ||
| + | |||
| + | Additionally, make sure that only the first model is selected in the Select Model panel like so: | ||
| + | |||
| + | [[File:save_anch.png|600px]] | ||
| + | |||
| + | Select Save. You can also verify that there is only one sidechain in the mol2 by viewing it on the terminal | ||
| + | |||
| + |    vi Anchor.mol2 | ||
| + | |||
| + | The file should look like this: | ||
| + | |||
| + | [[File:vi_anch.png|600px]] | ||
| + | |||
| + | Now you can go ahead and copy this file over to the <code> generic </code> directory in seawulf | ||
| + | |||
| + |   scp Anchor.mol2 yourusername@milan.stonybrook.edu:'/gpfs/yourusername/tutorial/010_de_novo_/generic' | ||
| + | |||
| + | === Running the Calculations === | ||
| + | |||
| + | Now that you're on seawulf in the <code> generic </code> directory with your Anchor.mol2 file, | ||
| + | |||
| + | lets create the necessary files to run the generic denovo calculation.  | ||
| + | |||
| + | First, create an input file, lets call it GDN.in. | ||
| + | |||
| + | Just like when running the refinement calculation, you are encouraged to create an empty input file and follow the dock6 question tree. | ||
| + | |||
| + | To create an empty input file run the following command: | ||
| + | |||
| + |      touch GDN.in  | ||
| + | |||
| + | Then, use dock6 to initiate the question tree: | ||
| + | |||
| + |      dock6 -i GDN.in | ||
| + | |||
| + | Use the following sample input file as a template when navigating the question tree: | ||
| + | |||
| + |    conformer_search_type                                        denovo | ||
| + |    dn_fraglib_scaffold_file                                     /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_scaffold.mol2 | ||
| + |    dn_fraglib_linker_file                                       /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_linker.mol2 | ||
| + |    dn_fraglib_sidechain_file                                    /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2 | ||
| + |    dn_user_specified_anchor                                     yes | ||
| + |    dn_fraglib_anchor_file                                       ../generic/Anchor.mol2 | ||
| + |    dn_torenv_table                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_torenv.dat | ||
| + |    dn_name_identifier                                           generic | ||
| + |    dn_sampling_method                                           graph | ||
| + |    dn_graph_max_picks                                           30 | ||
| + |    dn_graph_breadth                                             3 | ||
| + |    dn_graph_depth                                               2 | ||
| + |    dn_graph_temperature                                         100.0 | ||
| + |    dn_pruning_conformer_score_cutoff                            100.0 | ||
| + |    dn_pruning_conformer_score_scaling_factor                    2.0 | ||
| + |    dn_pruning_clustering_cutoff                                 100.0 | ||
| + |    dn_remove_duplicates                                         yes | ||
| + |    dn_max_duplicates_per_mol                                    0 | ||
| + |    dn_write_pruned_duplicates                                   no | ||
| + |    dn_advanced_pruning                                          yes | ||
| + |    dn_prune_initial_sample                                      yes | ||
| + |    dn_sample_torsions                                           yes | ||
| + |    dn_prune_individual_torsions                                 yes | ||
| + |    dn_prune_combined_torsions                                   yes | ||
| + |    dn_random_root_selection                                     no | ||
| + |    dn_mol_wt_cutoff_type                                        soft | ||
| + |    dn_upper_constraint_mol_wt                                   550.0 | ||
| + |    dn_lower_constraint_mol_wt                                   0.0 | ||
| + |    dn_mol_wt_std_dev                                            35.0 | ||
| + |    dn_constraint_rot_bon                                        15 | ||
| + |    dn_constraint_formal_charge                                  2.0 | ||
| + |    dn_heur_unmatched_num                                        1 | ||
| + |    dn_heur_matched_rmsd                                         2.0 | ||
| + |    dn_unique_anchors                                            1 | ||
| + |    dn_max_grow_layers                                           9 | ||
| + |    dn_max_root_size                                             25 | ||
| + |    dn_max_layer_size                                            25 | ||
| + |    dn_max_current_aps                                           5 | ||
| + |    dn_max_scaffolds_per_layer                                   1 | ||
| + |    dn_max_successful_att_per_root                               50000 | ||
| + |    dn_write_checkpoints                                         yes | ||
| + |    dn_write_prune_dump                                          no | ||
| + |    dn_write_orients                                             no | ||
| + |    dn_write_growth_trees                                        no | ||
| + |    dn_output_prefix                                             generic | ||
| + |    use_internal_energy                                          yes | ||
| + |    internal_energy_rep_exp                                      12 | ||
| + |    internal_energy_cutoff                                       100.0 | ||
| + |    use_database_filter                                          no | ||
| + |    orient_ligand                                                yes | ||
| + |    automated_matching                                           yes | ||
| + |    receptor_site_file                                           ../002_spheres/selected_spheres.sph | ||
| + |    max_orientations                                             1000 | ||
| + |    critical_points                                              no | ||
| + |    chemical_matching                                            no | ||
| + |    use_ligand_spheres                                           no | ||
| + |    bump_filter                                                  no | ||
| + |    score_molecules                                              yes | ||
| + |    contact_score_primary                                        no | ||
| + |    grid_score_primary                                           yes | ||
| + |    grid_score_rep_rad_scale                                     1 | ||
| + |    grid_score_vdw_scale                                         1 | ||
| + |    grid_score_es_scale                                          1 | ||
| + |    grid_lig_efficiency                                          no | ||
| + |    grid_score_grid_prefix                                       ../003_gridbox/grid | ||
| + |    minimize_ligand                                              yes | ||
| + |    minimize_anchor                                              yes | ||
| + |    minimize_flexible_growth                                     yes | ||
| + |    use_advanced_simplex_parameters                              no | ||
| + |    simplex_max_cycles                                           1 | ||
| + |    simplex_score_converge                                       0.1 | ||
| + |    simplex_cycle_converge                                       1.0 | ||
| + |    simplex_trans_step                                           1.0 | ||
| + |    simplex_rot_step                                             0.1 | ||
| + |    simplex_tors_step                                            10.0 | ||
| + |    simplex_anchor_max_iterations                                500 | ||
| + |    simplex_grow_max_iterations                                  500 | ||
| + |    simplex_grow_tors_premin_iterations                          0 | ||
| + |    simplex_final_min                                            no | ||
| + |    simplex_random_seed                                          0 | ||
| + |    simplex_restraint_min                                        no | ||
| + |    atom_model                                                   all | ||
| + |    vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_de_novo.defn | ||
| + |    flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn      | ||
| + |    flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl | ||
| + | |||
| + | After creating the input file, you will need to create a slurm input file to run this calculation.  | ||
| + | |||
| + |   vi gdn.slurm  | ||
| + | |||
| + | Type the following into the slurm file: | ||
| + | |||
| + |   #!/bin/bash | ||
| + |   # | ||
| + |   #SBATCH --job-name=1O86_generic_de_novo | ||
| + |   #SBATCH --output=generic_dn_output.txt | ||
| + |   #SBATCH --ntasks-per-node=24 | ||
| + |   #SBATCH --nodes=4 | ||
| + |   #SBATCH --time=4:00:00 | ||
| + |   #SBATCH -p short-96core | ||
| + | |||
| + |   dock6 -i GDN.in -o GDN.out | ||
| + | |||
| + | To run the calculation submit a job with the slurm file: | ||
| + | |||
| + |     sbatch gdn.slurm | ||
| + | |||
| + | The job itself may take awhile to finish. It is highly recommended that you check to see if your job is still running and that your .out file to see if there are any errors. | ||
| + | |||
| + | After the calculations have run, the following files will show up in your directory: | ||
| + |   GDN.out                             generic_dn_output.txt | ||
| + |   generic.anchor_1.root_layer_1.mol2  generic.anchor_1.root_layer_5.mol2       | ||
| + |   generic.anchor_1.root_layer_2.mol2  generic.anchor_1.root_layer_6.mol2 | ||
| + |   generic.anchor_1.root_layer_3.mol2  generic.anchor_1.root_layer_7.mol2 | ||
| + |   generic.anchor_1.root_layer_4.mol2  generic.completed.denovo_build.mol2 | ||
| + |                                       generic.denovo_build.mol2 | ||
| + | |||
| + | These illustrate the different layers of growth that were specified in the input file, the <code> generic.completed.denovo_build.mol2 </code> contains all of the finished molecules regardless of how many layers of growth. To view how many molecules are in this output file, type the following line in your terminal: | ||
| + | |||
| + |    grep -c "SMALL" generic.completed.denovo_build.mol2  | ||
| + | |||
| + | You'll notice that the number of molecules in this file come out to be 2350. That's okay, go ahead and transfer this file onto your home computer: | ||
| + | |||
| + |     scp yourusername@milan.stonybrook.edu:'gpfs/yourusername/tutorial/010_de_novo/generic/generic.completed.denovo_build.mol2' . | ||
| + | |||
| + | === Visualizing The Results === | ||
| + | |||
| + | To view the molecules that were grown, use the View Dock tool in Chimera.  | ||
| + | The procedure is the same as when you were selecting an anchor.  | ||
| + | You should wait about 30 seconds as there are a large chunk of molecules being loaded into chimera.  | ||
| + | Once Chimera is finished loading you will see an image of a finished sidechain and the View Dock Menu: | ||
| + | |||
| + | [[File:First_side_chain.png|600px]] | ||
| + | |||
| + | [[File:Sc_menu.png|600px]] | ||
| + | |||
| + | You want to view the sidechains that have better overall grid scores.  | ||
| + | On the View Dock menu, go to Column --> Show --> Grid_Score. | ||
| + | You'll see that the Grid_Score column is columned. Let's sort the Grid_Score | ||
| + | column in descending order(i.e. from greatest to least). To do this, click on the  | ||
| + | Grid_Score title, you'll notice that there's now an arrow head over the column.  | ||
| + | Keep clicking on the column until you see that the arrow head is pointing up: | ||
| + | |||
| + | [[File:sort.png|600px]] | ||
| + | |||
| + | Now the best scoring molecule that was grown is at the top of the list and here's what it looks like: | ||
| + | |||
| + | [[File:bg.png|600px]] | ||
| + | |||
| + | Load in the <code> 1O86_fixed_protein.pdb </code> and the <code> 1O86_ligand_nH_ncH.mol2 </code>. | ||
| + | On Chimera, select Actions --> Ribbon --> hide. | ||
| + | This will give a clear picture of the ligand having slight overlap with the grown ligand in the protein: | ||
| + | |||
| + | [[File:overlap.png|600px]] | ||
| + | |||
| + | Congratulations, you've successfully run a generic denovo calculation. | ||
Latest revision as of 17:13, 13 August 2025
Contents
- 1 DOCK Tutorial using PDB 1O86
- 2 000: Foundations
- 3 001: Structure Prep
- 4 002: Spheres
- 5 003: Grid/box
- 6 004: Minimization
- 7 005: Rigid Docking
- 8 006: Flexible Docking
- 9 007: Footprint Scoring
- 10 008: 5K Virtual Screening
- 11 009: Genetic Algorithm
- 12 010: De Novo Design Example
- 13 Running a Generic De Novo Calculation
DOCK Tutorial using PDB 1O86
This tutorial is a series of modules designed to familiarize a new user with the basic functions and features of DOCK6, as well as provide guidance on its execution and analysis of results. Modules should be read in a linear fashion, as each module builds on the prior one.
The the full DOCK manual contains much more information than this tutorial, and as such is recommended as a resource if the information provided herein proves unsatisfactory.
000: Foundations
Chimera
UCSF Chimera is a python-based, open-source molecular visualization and manipulation software suite. It is extremely helpful for both preparing molecules/receptors for docking and for visually analyzing the results of those calculations.
It can be downloaded from the official UCSF site; make sure to select the version that matches your operating system (Mac or Windows). Although Chimera is no longer under active development, it remains a relevant software for molecular modeling.
Once Chimera has installed, you can open it to find a blank blue-ish window, with a row of tabs along the top. Throughout this tutorial, you will be instructed to perform different actions contained within these tabs. We will denote the specific tab and sub-tab to be accessed by >> signs. For example, File >> Open PDB would indicate that you should click on the File tab, then mouse to and click Open PDB. This is necessary because some actions are nested in multiple sub-tabs (for instance, selecting all hydrogens in a model would require Select >> Chemistry >> element >> H, as shown below)
More extensive documentation on Chimera and its functions is available on the official site.
Seawulf
To complete this tutorial as a Stony Brook student, you will need an account on Seawulf. A ticket to obtain an account can be submitted on the Seawulf website; Dr. Rizzo will need to provide approval for account activation.
The Seawulf website also has a list of best practices for using a High Performance Computing (HPC) cluster. We recommend reading through them before attempting to run any intensive programs on Seawulf. Of particular note is the job management system SLURM - we provide a full tutorial on this system in section 008 of this tutorial if you want to read up on it now, but it will not be necessary for this tutorial until that point.
SSH
You will need to use a Secure Shell (SSH) connection to connect to Seawulf remotely. A guide to this process is available on the Seawulf website.
Basic Unix Commands and Environment
Seawulf uses a terminal-based Unix operating system. If you have never used a terminal-based OS before, you should familiarize yourself with some basic Unix tutorials. Broadly, you should know how to:
-  List your working directory with pwd, change your working directory withcd, make new directories withmkdir, remove directories withrmdir, and list the contents of a directory withls
-  Edit files using the text editor vim(use thevimtutorto read about basic functionality)
-  Create an empty file with touch, move files and directories around withmv, copy directories and files withcp, and (very carefully!) remove directories and files withrm
-  Determine if commands are available with which
- Understand how to use commands and how flags/input parameters should be formatted
You should also understand the concept of filepaths, and how they are used by commands. Much of DOCK relies on you using the correct filepaths to pass files and parameters into the program, and if the paths are wrong, then the program will not work. When possible, it is advisable to use absolute filepaths (paths that start at the root directory), as opposed to paths relative to your current working directory. For example, if you are in the directory  /gpfs/home/yourusername/tutorial/002_spheres, referencing the absolute path /gpfs/home/yourusername/tutorial/001_structure/important_structure.pdb for a needed structure file is more reliable than the relative path ../001_structure/important_structure.pdb, as the latter path is relative to 002_spheres/, and meaningless outside of that context. The command realpathwill return the absolute path of any file passed to it, which is useful for quickly and accurately determining the absolute path of any file.
To work with DOCK6, it may also be necessary to add a path to your config file that tells Unix where to look for DOCK-related commands. You can check if you need to do this using the command which dock6. If which informs you that no dock6 can be found, you will need to edit your .bashrc file:
- Move to your home directory by either entering cdorcd ~
- Use the command vi .bashrcto start editing your personal config file
- At the bottom of the file, add the following line:
export PATH=$PATH:/Path/To/Your/DOCK6/bin
where Path/To/Your/DOCK6/ should be replaced with whatever the path is to your local DOCK installation (note: the full path should end with /bin). At the time of this writing, there is a compiled version of DOCK6.12 at /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/, but if this does not work then ask Dr. Rizzo or a lab member where a compiled instance can be found
- 4. Save and exit the file, then enter the command source .bashrcinto the terminal.
- 5. Now when you request which dock6, the terminal should return the path to the directory you just provided.
After all this, you should have a good understanding of how to navigate the Unix terminal, and have the DOCK6 suite ready to use in your environment.
SCP File Transfer
Another important aspect of working on Seawulf is uploading and downloading files to your local machine. This is accomplished using the Secure Copy Protocol (scp) terminal command. See the pertinent section on the Seawulf website. For Windows, it may be advisable to download a third party program; FileZilla works for this task.
Directory setup
Finally, assuming all necessary base Unix knowledge has been obtained, you will want to set up a set of directories for the work you will be doing in this tutorial. The directories will help you compartmentalize your work and help to prevent important files from being overwritten or lost. For the modules of this tutorial, we recommend setting up this directory structure, and setting up folders on your local machine with the same names for ease of file transfer:
First, create a tutorial directory in your home directory:
cd ~ mkdir tutorial
Then, make a series of subdirectories inside of that tutorial directory:
cd ~/tutorial mkdir 001_structure mkdir 002_spheres mkdir 003_gridbox mkdir 004_energy_min mkdir 005_rigid_dock mkdir 006_flex_dock mkdir 007_footprint mkdir 008_virtual_screen mkdir 009_gen_alg mkdir 010_de_novo
A Note on Troubleshooting
Finally, despite our best efforts at accuracy, a healthy amount of this tutorial will likely not go smoothly for you. There will be errors and inconsistencies between what you see here and what happens on Seawulf, and there may not be an immediate answer for what you should do to fix them. The instinct when this happens is to find someone who knows what might have happened and ask them for advice, and while this will usually be helpful, it may be inefficient. As such, we advise you to first please troubleshoot. No program you will use in this tutorial is intended to be a black box. All of them do their best to return errors that tell you what is going wrong, and where to look to fix it. If you get an error, read it! Often it will tell you that it can't find a particular file ("doesn't exist" is the favorite wording), or that some piece of the input is giving it trouble— these tell you what paths to check or lines of a file to review. Also, copy-pasting an entire error into Google (or ChatGPT if you must) is a completely valid line of inquiry, and often yields helpful results.
Having said that, here are some common errors and tactics you can use to fix them:
"[File] could not be opened" or "[file] does not exist" usually indicates that a path you gave the program is not accurate. You can "sanity check" paths by copying them directly from the input and putting them into an ls command. If it throws an error, fix the path!
"version `GLIBCXX_whatever' not found" is generally an issue with how the DOCK installation was compiled. Try to find a different version (/gpfs/projects/AMS536/zzz.programs usually has several versions available) or try running the command on a different node (Milan or Login)
Errors mentioning FORTRAAN: Make sure the input files are named correctly(ex INSPH for sphgen) and there are no extra spaces or lines anywhere in the input files. Sometimes they sneak in at the start or end of a line and are parsed literally.
Often, output is sent into a .out file instead of written to the console. If the program terminates immediately but does not return expected files, check whatever .out file was written for errors at the very bottom.
001: Structure Prep
Downloading the PDB
Having setup your necessary environment to work on Seawulf, let's navigate to your local computer and begin the protein preparation process:
To begin protein preparation you will need the 1O86 PDB file. Using this link: https://www.rcsb.org/structure/1O86, you will see the RCSB main page opened to our protein of choice:
Next, you'll want to navigate to the top right corner where it says Download Files. Then, select the dropdown arrow. The following dropdown menu will appear on the screen:
Select 'Legacy PDB Format'. Now the PDB file should be downloaded to your local computer.
Now that you have the PDB file, lets navigate to Chimera program to open the file.
Protein Isolation
Now that you have the downloaded PDB, the first step is to isolate and modify our protein, if necessary.
First, open the PDB in Chimera using File >> Open, zoom into where the ligand is, hold the control key and click to select an atom on the ligand, and press the up arrow key to highlight the ligand. Then go to Actions >> Focus.
The ligand should now be highlighted in green; you will see it is surrounded by water molecules:
We're going to get rid of the ligand and the water molecules.
First, since the ligand is highlighted, go to Actions >> Atoms/Bonds >> Delete.
There may be a glycine molecule remaining in the binding pocket. For the sake of simplicity, we are going to delete this as well. Control+click to select an atom in the glycine, up arrow to highlight the whole molecule, and Actions >> Atoms/Bonds >> Delete it in the same manner as the ligand.
Next, to get rid of the water molecules, navigate to Select >> Residue >> HOH
You'll notice that all of the water molecules are highlighted:
Just like the ligand, navigate to Actions >> Atoms/Bonds >> Delete
Now you have just a protein:
Inspecting the Protein
With just the protein file, lets inspect if there are any missing loops along the exterior. First, to make the view clearer, go to View >> Focus again to put the whole protein in focus.
You'll notice that there is a missing loop between residues 434 and 439.
To refine the missing loop, go to Tools >> Structure Editing >> select Model/Refine Loops
The following pop up menus will appear:
This is a list of the residues in the protein. The residues 435-438 boxed in red denote the ones involved with the missing loop.
This is the Modeller/Refine Menu. If you have not used it before, the Modeller License Key field may be empty, which will prevent the program from running. You will first need to register for a Modeller key on the official page. They will email you with the registration key relatively promptly (within a day but usually sooner). Note that because this program connects to a webserver for the loop modeling, you will require a stable internet connection to use it.
You can leave all the other options selected as is. Once you select apply, wait a couple of minutes. The program is generating different refinements to the missing loops. The residue menu will say "Running Modeller Web Service". Additionally, you'll be able to track the process completed on the search bar menu on the bottom part of the screen on Chimera.
Once the modeler is 100% completed, the following popup menu will appear on Chimera:
You can see that the modeller program generated five alternative protein models that refined the missing loop. If you press the shift key and highlight all of the model numbers you will see the following proteins with fixed loops.
Now, using the drop down menu, from earlier, you'll want to assess what's the best model to choose that has refined loops. Sometimes the difference in scores are negligible and you may want to select based on zDope, which would mean any loop is sufficient. However, there may be another score that could narrow your decision making.
In the 'Modeller Results' menu, select Fetch Scores. You'll see in your chimera window that a job was submitted. This is signaling to the program to retrieve the scores from the program. Once the job is finished, the Estimated RMSD and Estimated Overlap columns will contain values:
From adding more scores, you'll notice that the models differ in their Estimated RMSD values. Based on these values, Models 1.2 or 1.5 are a good choice based on RMSD. So, let's choose model 1.2 as our protein model to use in this tutorial. We will also save this model as our new PDB file.
Go to File >> Save PDB. When the save menu appears, make sure that only the second model is selected like so:
Let's also give this PDB a new title, the one supplied is  1O86_fixed_protein.pdb . Additionally, we will need the original PDB for ligand preparation.
Press save and now you have a refined protein to prepare.
Protein Preparation
After saving a protein with modelled loops as a PDB, use File >> Close Session to clear your Chimera session. Then, File >> Open the  1O86_fixed_protein.pdb  
In order to use this protein for docking calculations, we will need to generate a charged, protonated protein and ligand files. Charged molecules must be saved in Mol2 format, as the PDB file format does not support partial atomic charges.
We will first save a version of the protein in Mol2 format that does not have any charges or hydrogens.
Go to File >> Save Mol2. You will want to give this mol2 file a title that describes the fact that 
the protein doesn't have hydrogens or charges. So let's name this file  1O86_protein_nH_ncH.mol2 
The nH refers to "no hydrogens" and the ncH refers to "no charges".
Now that the mol2 is saved, we can protonate and charge the receptor in preparation for docking.
Adding Hydrogens
Go to Tools >> Structure Editing >> AddH
The following menu will appear:
All the defaults that are selected are acceptable. Select OK.
You will get a confirmation message on the chimera text bar that says 'Hydrogens Added'.
Now you have added hydrogens to the protein file. The last thing to do is to add charges.
Adding Charges
Before we can save the second mol2 for the protein we must also add charges to the protein.
Go to Tools >> Structure Editing >> press Add Charge
The following menu will appear:
By default Chimera selects AM1-BCC charges, but for this tutorial we are going to use Gasteiger charges. Every other selection on the menu is okay, so after changing the charge type, go ahead and select apply. You'll notice another pop up menu appears:
This deals with ions. Just as before, make sure that Gasteiger is selected. All other defaults are okay. Select OK. The Chimera window will inform you that standard charges are added.
Now that your protein has hydrogens and charges, we can go ahead and generate the second mol2 file.
Go to File >> Save Mol2. This time the name should reflect that the protein has hydrogens and charges.
Let's name it  1O86_protein_wH_wcH.mol2 .
Now our protein files are prepared for docking, let's now turn our attention to preparing the ligand.
Isolating the Ligand
Having made a Protein PDB and generated two mol2 files, lets now do the same thing for the ligand. The first step is to File >> Close Session to clear the current environment, then open the original PDB file that we downloaded.
Just like in the protein section, press control and select an atom on the ligand. Then press the up arrow to highlight the entire ligand:
Go to Select >> Invert (selected models)
You'll notice that everything around the ligand is highlighted.
Now go to Actions >> Atoms/Bonds >> delete
Now we have a file that just contains the ligand.
Go to File >> Save PDB and name the file  1O86_ligand.pdb 
This will be the file that we will use to prepare the mol2 files for the dock program.
You can now close your current session on Chimera by going to File >> Close Session
Preparing the Ligand
Now that we have a ligand only PDB, lets go ahead and generate the two mol2 files with the ligand.
The first step is to go to open the  1O86_ligand.pdb  on chimera.
Go to File >> Save Mol2
Lets give the title of this first mol2 to denote no hydrogens or charge on the ligand.
Call it  1O86_ligand_nH_ncH.mol2 
Running Dock Prep
To generate the second mol2 you are more than welcome to revisit the steps taken to generate the second protein mol2. In the case of the ligand, we will introduce an alternative with a program called Dock Prep.
First, return to  1O86_ligand.pdb  on chimera:
Go to Tools >> Structure Editing >> DockPrep
The following pop-up menu will appear:
All of the default parameters are sufficient. Select OK.
The next set of drop down menus will be the same as adding hydrogens and charges to a protein.
Remember all of the defaults are okay for adding hydrogens, but make sure you switch from
AM1-BCC to Gasteiger charges when adding the charges for the ligand.
All other defaults are sufficient for each of the other pop-up menus.
The following images show the pop-up menus that will appear after you accept using Dock Prep and contain the acceptable defaults:
The Hydrogen Menu
The Charge Menu
The Net Charge Menu
What's neat about Dock Prep is that the last pop-up menu will be the Save Mol2 menu:
Lets give this mol2 a file name that specifies that it has hydrogens and charges.
Call it  1O86_ligand_wH_wcH.mol2 
Congratulations, you've generated all the necessary mol2 files to advance to the next steps of this tutorial.
Lets transfer the mol2 files onto seawulf using the following command
scp *mol2 yourusername@stonybrook.edu:'/gpfs/home/yourusername/tutorial/001_structure'
002: Spheres
Generate the Required Surface File
-  Open the 1O86_protein_nH_ncH.mol2(with no hydrogens, no charges) in Chimera and hit Actions >> Surface >> Show
-  Write the DMS file by choosing Tools > Structure Editing > Write DMS; save the file as 1O86.dms
-  Upload the DMS file to your ~/tutorial/002_spheresdirectory on Seawulf usingscp
-  Now, move to the Seawulf terminal interface and navigate to your ~/tutorial/002_spheresdirectory. Create a sphgen input file using the following command:
vi INSPH
- 5. Paste the following into your input file, ensuring that no additional lines are pasted:
./1O86.dms R X 0.0 4.0 1.4 1O86.sph
- 6. Run the program with the following command
sphgen -i INSPH -o OUTSPH
- 7. Download the output file to your local directory and File >> Open it in the same scene as your protein file in Chimera. This will overlay the generated spheres on the protein.
Based on the overlay the ribbons are aligned with the spheres indicating the generation of surface spheres was successful.
Selecting Spheres Localized on Binding Site
- Run program on dock 6 called sphere_selector:
sphere_selector 1O86.sph ../001_structure/1O86_ligand_nH_ncH.mol2 10.0
- 2. Output file will automatically be written as selected_spheres.sph
- 3. Download this file to your local computer and overlay the file with your original protein file to ensure the spheres are localized over your ligand in the same manner as the 1O86.sph file in the previous step.
- 4. Select and hide binding spheres to ensure they are where the ligand is located
These spheres will be used for initial orientation of the ligand during docking, and provide a rough approximation of the negative space of the receptor.
003: Grid/box
Generating box and grid. The box and grid generation specify the 3-D space where the docking simulation will take place. The box defines the 3-D space where the docking algorithm will search for binding orientations and conformations of the ligand. The box size should be large enough to include the binding space, but not too large to conserve computational cost. The grid pattern allows for energy calculations between protein and ligand. A smaller grid step would allow for more precise calculations but would be computationally expensive and time consuming. A larger grid step would be computationally cheaper but would be lower in accuracy.
Generating the Box
Move to your ~/tutorial/003_gridbox directory. Create an input file for generating a box around the ligand binding site. 
vi showbox.in
Insert the following code into the file. These are parameters for the showbox program:
Y 8.0 ../002_spheres/selected_spheres.sph 1 1O86.box.pdb
This will construct a box that will enclose the binding spheres that are localized on the binding side. The box will have dimensions of 8.0 Angstroms from the binding spheres.
Run the input file using the following command:
showbox < showbox.in
An output file titled 1O86.box.pdb will be generated in your local directory.
Generating the Grid
vi grid.in
Insert the following into your input file:
allow_non_integral_charges no compute_grids yes grid_spacing 0.4 output_molecule no contact_score no energy_score yes energy_cutoff_distance 9999 atom_model a attractive_exponent 6 repulsive_exponent 9 distance_dielectric yes dielectric_factor 4 bump_filter yes bump_overlap 0.75 receptor_file ../001_structure/1O86_protein_nH_ncH.mol2 box_file 1O86.box.pdb vdw_definition_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn score_grid_prefix grid
If grid fails because it cannot find the .defn file, you may need to edit the path of the vdw_definition_file parameter to point to a VDW defn file that actually exists. It should be in the parameters folder of your DOCK6 installation.
Save the file and run using the following command:
grid -i grid.in -o 1O86_gridinfo.out
An output file with your Grid generation will appear in your directory.
004: Minimization
Before moving toward preforming docking of any fragments/ligands onto your receptor of interest it is important that your original ligand is energy-minimized. This is important before dock is able to calculate any energy interactions between atoms on the rector and ligand.
Move to your ~/tutorial/004_energy_min directory and create an input file using:
vi min.in
In this input file, use the following parameters to minimize the ligand without re-orienting it:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../001_structure/1O86_ligand_wH_wcH.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../001_structure/1O86_ligand_wH_wcH.mol2 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix ../003_gridbox/grid minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl ligand_outfile_prefix 1O86_min write_mol_solvation no write_orientations no num_scored_conformers 1 score_threshold 100.0 rank_ligands no
Finally, run the minimization using DOCK6 as such:
dock6 -i min.in -o min.out
The program should run relatively quickly and create a new Mol2 file called 1O86_min_ranked.mol2. Download this file to your local machine and open it in Chimera along with the original 1O86_ligand_wH_wcH.mol2 to compare the pre- and post-minimized poses. There should be minimal difference between them.
(note that in the above image hydrogens have been hidden for simplicity; the generated Mol2 file and the original ligand file should both be versions where hydrogens are present)
005: Rigid Docking
Rigid docking allows the ligand to be treated as a rigid molecule (no confirmational changes) and determines best orientation for molecule to fit directly into the receptor site.
Move to your ~/tutorial/005_rigid_dock directory and create an input file using the following command:
vi rigid.in
Add the following parameters:
conformer_search_type rigid use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../004_energy_min/1O86_min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../004_energy_min/1O86_min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix ../003_gridbox/grid minimize_ligand yes simplex_max_iterations 1000 simplex_tors_premin_iterations 0 simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl ligand_outfile_prefix 1O86_rigid write_mol_solvation no write_orientations no num_scored_conformers 100 write_conformations no cluster_conformations yes cluster_rmsd_threshold 2.0 score_threshold 100.0 rank_ligands no
Run using the following command:
dock6 -i rigid.in -o rigid.out
The job should take no longer than a minute.
Or run using the following slurm script:
#!/bin/bash #SBATCH --job-name=1O86_rigid # Name of the job #SBATCH --output=rigid.txt # Standard output file #SBATCH --error=1O86_error.txt # Standard error file #SBATCH --ntasks-per-node=1 # Number of tasks (adjust as needed) #SBATCH --mail-user=vanie.seecharan@stonybrook.edu # user #SBATCH --nodes=1 # nodes #SBATCH --time=4:00:00 # Maximum walltime (adjust as needed) #SBATCH --partition=short-40core # Correct partition name # Ensure the input file exists before running the program if [ ! -f grid.in ]; then echo "Error: grid.in file not found!" exit 1 fi # Run the program /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/bin/dock6 -i rigid.in -o rigid.out 10.0
The following output files will be produced in your directory:
- 1O86_rigid_scored.mol2
- rigid.out
Download the Mol2 file to your computer. This Mol2 file will contain an ensemble of poses which will be messy to visualize all at once. As such, you should use Chimera's ViewDock functionality to view the rigid docked poses. Use Tools >> Surface/Binding Analysis >> ViewDock and open the 1O86_rigid_scored.mol2 file that was created in the docking. The file will be of type DOCK 4, 5, or 6. A separate dialogue box will open that will allow you to select which pose you would like to view.
006: Flexible Docking
Flexible docking accounts for the conformational flexibility of the ligand by sampling the torsions present in it.
Navigate to your ~/tutorial/006_flex_dock and create an input file using the following command:
vi flex.in
Add the following script:
conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 1000 pruning_clustering_cutoff 100 pruning_orient_score_cutoff 1000.0 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file ../004_dock/1o86_lig_min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd yes use_rmsd_reference_mol yes rmsd_reference_filename ../004_dock/1o86_lig_min_scored.mol2 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_surface_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix ../003_grid/grid minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no minimize_flexible_growth_ramp yes simplex_max_cycles 1 simplex_score_converge 0.1 simplex_initial_score_coverge 5 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl ligand_outfile_prefix flex.out write_mol_solvation no write_orientations no num_scored_conformers 1 score_threshold 100.0 rank_ligands no
Now, you can run the above file, flex.in, by creating a shell script (flex.sh) as stated below. This will be helpful to have if you need to re-run the job for any reason; rather than re-typing the full dock command, you can save it in the shell script file and run that script instead.
#!/bin/bash dock6 -i flex.in -o flex.out
Now run the script-
bash flex.sh
DOCK6 program should finish running in a few minutes and should generate the following two files- a) flex.out b) flex_scored.mol2
Now we can transfer the docked files to our local system and visualize the results in chimera.
The docked ligand (colored in cyan) shows good structural overlap with the reference structure (purple). One way of confirming this is by looking at the RMSD value (below) which is below 1 A for this case.
We can also overlay the structures from the three docking types and compare with the reference to observe the best binding pose. The ligand structure obtained due to rigid docking (green) gives the lowest RMSD score of 0.59. Next best RMSD score is obtained from flexible docking (cyan) and then from fixed anchor docking (orange), giving an RMSD of 1.27.
007: Footprint Scoring
Footprint analysis maps out the interaction landscape between a ligand and its target protein. By breaking down the energetics of binding into electrostatic and van der Waals (VDW) contributions, it provides a detailed visualization of how different parts of a ligand interact with specific regions of the protein.
Move to your <~/tutorial/007_footprint</code> directory and create an input file (fps.in).
vi fps.in
Add the following-
conformer_search_type rigid use_internal_energy no ligand_atom_file ../004_dock/1O86_min_scored.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no grid_score_primary no gist_score_primary no multigrid_score_primary no dock3.5_score_primary no continuous_score_primary no footprint_similarity_score_primary yes fps_score_use_footprint_reference_mol2 yes fps_score_footprint_reference_mol2_filename ../001_structure/1O86_ligand_wH_wcH.mol2 fps_score_foot_compare_type Euclidean fps_score_normalize_foot no fps_score_foot_comp_all_residue yes fps_score_receptor_filename ../001_structure/1O86_protein_wH_wcH.mol2 fps_score_vdw_att_exp 6 fps_score_vdw_rep_exp 9 fps_score_vdw_rep_rad_scale 1 fps_score_use_distance_dependent_dielectric yes fps_score_dielectric 4.0 fps_score_vdw_fp_scale 1 fps_score_es_fp_scale 1 fps_score_hb_fp_scale 0 minimize_ligand no simplex_final_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl ligand_outfile_prefix fps write_mol_solvation no write_footprints yes write_hbonds yes write_orientations no num_scored_conformers 1 score_threshold 100.0 rank_ligands no
Now create a bash script to run the above create file-
vi fps.sh
#!/bin/bash echo "Running DOCK for footprint generation..." dock6 -i fps.in -o fps.out >&fps.log1 echo "Generating plot..." python /gpfs/projects/AMS536/zzz.programs/plot_footprint_single_magnitude.py fps_footprint_scored.txt 50 >&fps.log2 echo "Done"
As shown above, the .sh script also includes the path to a python file that generates and saves the footprint analysis results as a PDF. Once the script has been executed, transfer the generated PDF to your local system to review the results. The output should resemble the example shown below, showing a visual representation of the electrostatic and van der Waals interactions between the ligand and the protein.
008: 5K Virtual Screening
In this section, you will conduct a virtual screening on a set of druglike potential leads against the 1O86 receptor. Virtual screening of large ligand libraries is one of the main uses of DOCK and other molecular docking suites in drug discovery, as it allows for lead compound identification at much lower cost than traditional in vitro assays.
Input File
Because this section generates a high volume of molecule poses, you may need to make use of your scratch directory on Seawulf. While your home directory has a maximum storage volume of 20 gigabytes, scratch has a storage volume of 20 TERAbytes, so it will easily accommodate the output of your virtual screen. Your scratch directory should be located at /gpfs/scratch/yourusername/. Move there and create a directory for your virtual screening. 
Copy the input file from your 006_flex_dock directory into your present directory. We can re-use the majority of input from this file because the docking process is the same in both cases; the only difference is in the number of molecules to be docked. If you like, you can re-name the input file using mv to something like flex_vs.in for clarity. 
The only parameter that needs to be changed in the input file is the ligand_atom_file parameter. Instead of the single 1O86_min_scored.mol2, we want to dock a Mol2 file that contains multiple potential drug candidates. You should be able to find several such Mol2 files in /gpfs/projects/AMS536/zzz.programs/VS_libraries/. For this tutorial, we will screen the 5,000 molecule library (VS_library_5K.mol2), but you can elect to do the 100- (or 25,000) molecule library by simply changing the input file name. Just copy the path to the file (should be /gpfs/projects/AMS536/zzz.programs/VS_libraries/VS_library_5K.mol2), open the .in file, delete the current value for ligand_atom_file, and paste in the path. If you don't want to deal with the long filepath, you can also copy the VS_library file to your working directory, and just have the filename as the parameter instead. It is probably also advisable to change the names of other output files in the .in file so that the output is distinguishable from the previous flexible docking.
Note that if you have moved to your scratch directory and used relative paths in the .in file for the receptor sphere file, gridbox, et cetera, those will also need to be changed to reflect your new location. Absolute paths will likely be the most efficient solution for this.
Slurm and Job Submission
Up until this point, you have probably been running dock6 on the head node of Seawulf. This is fine for short, low-intensity jobs, like docking a single molecule or (debatably) generating a grid. However, the head node caters to all users currently accessing Seawulf, and has limited computing power at its disposal. As such, attempting to run a longer job (such as a 5,000 molecule screen) will fully consume the head node's limited resources, slowing down all user sessions in the process until the Seawulf admins notice and kick you. To safely utilize the full computing power of Seawulf, you need to submit a job to one of its allocated high-power computing nodes using its queueing system. This system tracks all jobs that have been submitted by various users and allocates them to processors as they are freed up, maximizing the productivity and efficiency of the computing cluster while ensuring a smooth user experience. The system Seawulf uses for this queue tracking is called Slurm (Simple Linux Utility and Resource Management).
Slurm is implemented as a module on Seawulf, and needs to be loaded into your environment before it can be used. You can load it by typing module load slurm into your terminal from any directory. Of note, you will need to do this every time you start a new SSH session. If you want the module to always be loaded when you log in, you can add the command to your .bashrc file (see section 000 for instructions on how to modify your .bashrc). 
Once the module has loaded, you can use the command sinfo to view what processors are present on Seawulf. If you are on Milan, you should see 40- and 96-core queues of varying lengths (short, medium, long), as well as some specialized GPU cores and high memory cores. The Seawulf website has some good descriptions of what cores are available and the duration of jobs allowed on them. sinfo --state=idle allows you to see what queues have cores idle; this is helpful for knowing where a job can start running immediately. 
To submit a job to a queue, you need to create a special job script file that will tell the queue everything it needs to know about the job, as well as what commands to run. Here is what that file will look like for our VS run:
#!/bin/bash # #SBATCH --job-name=1o86_vs #SBATCH --output=vs_output.txt #SBATCH --nodes=1 #SBATCH --ntasks-per-node=96 #SBATCH --time=48:00:00 #SBATCH -p long-96core module load intel/mpi/64/2018/18.0.3 mpirun -np 96 dock6.mpi -i flex_vs.in -o flex_vs.out
A line-by-line breakdown of this file:
-  #!/bin/bashis called a shebang, and is used in unix shell scripts to indicate that bash should be used to interpret and execute the script.
-  #SBATCH --job-name=1O86_vstells slurm what our job is called; this can be changed to whatever strikes your fancy without consequence
-  #SBATCH --output=vs_output.txttells slurm where to output information about the job itself. If jobs are not running properly, check this file for errors.
-  #SBATCH --nodes=1tells slurm how many computing nodes we want for this job. 1 is appropriate here as the job is not very large.
-  #SBATCH --ntasks-per-node=96tells slurm how many tasks it should expect to run per node. There are no strict guidelines on this, but setting it equal to the number of cores in the node (96 or 40 on Milan) is a good default.
-  #SBATCH --time=48:00:00tells slurm the maximum amount of time it should allocate for this job. The best thing to do here is request the maximum time possible for the queue; check the Seawulf website for each queue's parameters. Here, because we are submitting to the long-96core queue, we are requesting the queue maximum of 48 hours; our job will likely not take that long.
-  #SBATCH -p long-96coretells slurm which queue we would like to submit to; in this case, we are submitting to the long-term 96-core queue.
-  module load intel/mpi/64/2018/18.0.3loads a module that allows for parallel processing, greatly speeding up job progress.
-  mpirun -np 96 dock6.mpi -i flex_vs.in -o flex_vs.outis the actual command we would like to run in the job. This particular command asksmpirun, which implements parallel processing, to run the MPI version of dock6 for our screening. Thenpparameter provides information on the number of processors available for MPI; this can be calculated by multiplyingnodesbyntasks-per-node(96 * 1 in this case).
Now that you have a sense of what we're trying to do, create a shell file with a recognizable name (ex, job.sh) and either copy-paste or re-type the above input into the file, making any modifications to the dock path or input filename as needed. Save and exit the file.
Finally, you are ready to submit your first job to the queue. Use the command sbatch job.sh to submit your job file. If slurm accepts it, you will be provided a job number. 
You can use the command squeue -u yourusername to check the status of your job. If you do not see it in the queue directly after submitting, it has likely encountered an error, and you will need to troubleshoot. A good place to start will be reading any generated output files.
Otherwise, it will likely start as PD (pending) for a time (depending on how full the queue is, this can vary from a minute to a day. Sorry.) before switching to R (running). As it runs, you should see .mol2 files spontaneously created in the directory. Make sure to check semi-regularly that the job is still running.
Once the virtual screen has finished, you should have a .mol2 file in your directory that contains the best poses of the 5,000 ligands that were screened, according to purely grid-based docking.
Re-scoring Docked Molecules
Although grid scoring provides a good initial look at the most favorable poses of the given molecules, it is generally a good idea to re-score the poses provided by a different criteria to determine a final order. In this case, we will use footprint similarity scoring (FPS) as our re-scoring function. Once again, we can copy the input file that we used in the single-molecule footprint scoring example, and change the ligand_atom_file parameter from the 1O86 ligand to the output poses from the virtual screening. You should also change ranked_ligands to yes (towards the end of the file), and add a line with the parameter max_ranked_ligands set to 500 at the end of the file. This will ensure that the final .mol2 is a manageable size, while retaining the best poses of the docking. Finally, set the ligand_outfile_prefix parameter to something that will distinguish it from the other output already in the directory. Write and exit the file, and rename it to something appropriate like rescore.in.
Because this re-scoring uses rigid docking and no grid calculations, it should be relatively lightweight and safe to run on the head node. If you want to be safe (or if it is still running after a couple minutes), you can cancel it and use the command srun -p short-96core -t 1:00:00 --pty bash to instead open up an interactive bash session on one of the computing nodes. This will let you use an interactive terminal in a manner identical to the head node, but direct all processing requests to an available computing node instead of using the head node's resources. Unfortunately, it is also subject to the whims of the queue availability, so you may have to wait a while or time strategically to obtain an available node.
Once you run the re-scoring using  dock6 -i rescore.in -o rescore.out, you should see a re-scored ligand file in the directory with the prefix indicated in the rescore.in file. If you want, you can copy this .mol2 file to your home directory so that it is not purged after 30 days (a convenient feature of the scratch directory). After that, download it to your local machine using scp.
Viewing VS Results in Chimera
Thanks to our infallible foresight, we have saved only the 500 best ligands to visualize in Chimera, as opposed to the original, crash-inducing number of 5,000. To analyze the poses of these ligands, launch Chimera and File >> Open your receptor .mol2 file. Then, navigate to Tools >> Surface/Binding Analysis >> ViewDock, and open the 500-ligand .mol2 file that you have just downloaded. Chimera may lag for a minute depending on your system specs, but should eventually provide a dialogue box and display one ligand in (hopefully) your receptor's binding site.
The dialogue box allows you to select one or multiple ligands at a time to view, and displays their scores in a nice table format. You can view additional columns of data in the dialogue box by navigating to Column >> Show and selecting the desired parameter, and view multiple ligands at a time by clicking one, then holding shift and clicking another.
Congratulations! You have successfully conducted a virtual screening, learned how to use slurm in the process, and now have results for potential lead compounds targeting this receptor.
009: Genetic Algorithm
In addition to providing a platform for traditional virtual screening methods, DOCK also implements several unique algorithms for de novo-type drug design. One of these is the genetic algorithm (GA), which accepts one or several "parent" molecules and "mutates" them with basic substitutions, insertions, or deletions using a standard molecular fragment library before culling the resultant "population" based on user-specified metrics. The intent is to iteratively improve molecules over several "generations" of mutation-cull steps so that new chemical space can be explored, and more potent ligands found.
In order for the GA to mutate the provided parent molecules, it needs a standard library of molecular fragments to draw from. DOCK provides a foundational library that is generally effective in most GA applications, but can also accept .mol2 files to fragment and use as a custom library. To make this tutorial as educational as possible, you will learn how to fragment a molecule and add it to the foundational fragment library before running the genetic algorithm.
First, move to your 009_gen_alg directory and create a directory within it called fraglib/. Then, move into that directory in preparation for creating the fraglib generation input file.
Generating Fragments
The overall goal in this section is to use DOCK to fragment the ligand present in the 1O86 system, and add those fragments to the foundational library. The backbone of this operation is the write_fragment_libraries option in the DOCK input file. Below is what a sample input file would look like; you may copy and paste it directly, but we recommend moving through DOCK's decision tree and understanding what defaults the program has that we are changing. In this instance, we override several defaults such as orient_ligand and score_molecule, as these are unnecessary for the pure operation of generating ligand fragments. We will refer to the input file as fraglib_gen.in Some parameters are left as exercises to the reader; refer back to previous steps if confused.
conformer_search_type flex write_fragment_libraries yes fragment_library_prefix 1O86_lig_fraglib fragment_library_freq_cutoff 1 fragment_library_sort_method freq fragment_library_trans_origin no use_internal_energy no ligand_atom_file FILL IN 1O86 LIGAND MOL2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand no bump_filter no score_molecules no atom_model all vdw_defn_file FILL IN VDW.DEFN FILEPATH flex_defn_file FILL IN FLEX.DEFN FILEPATH flex_drive_file FILL IN FLEX_DRIVE FILEPATH ligand_outfile_prefix 1o86_fraglib write_mol_solvation no write_orientations no num_scored_conformers 1 score_threshold 100.0 rank_ligands no
Once you have created this input file and inserted the relevant filepaths, you can run DOCK to generate the fragments:
dock6 -i fraglib_gen.in -o fraglib_gen.out
Because we are only fragmenting one molecule, this program should be fine to run on the head node, and should finish in a couple seconds. Once it finishes, you should see present in the directory:
-  Four .mol2 files containing "linker", "rigid", "scaffold", and "sidechain" in their names; will be referred to as 1o86_fraglib_TYPE.mol2
- A file ending in "torenv.dat" (referred to as 1o86_lig_torenv.dat hereafter)
- An output_scored.mol2 file
- A .out file with whatever name you specified
The set of four .mol2 files contain all the fragments that DOCK was able to generate from the original molecule, and the *torenv.dat file is a record of all the torsion-able bonds that DOCK cleaved to generate the fragments.
Combining Fragment Libraries
Now that we have generated fragments and associated torsions from our ligand, we need to combine them with the standard DOCK fragment library and associated torsion environments. The torsion environment combination can be accomplished via a python script.
First, determine the path to the default DOCK torsion environment collection. It should be located in the parameters folder associated with your DOCK installation. At the time of writing, the full path is /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_torenv.dat
Next, determine the path for the python script used to combine torsion environments. It should be in the same bin directory as your dock6 command. At the time of writing, there is an instance at /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/bin/combine_torenv.py
Once you have found paths that work for both of these files, you can combine the standard torsion environment with the newly generated custom one by running
python insert_path_here/combine_torenv.py ./1o86_lig_torenv.dat insert_path_here/fraglib_torenv.dat
(note that you may need to use the command module load python if you encounter a python-related error in running this) Once the script finishes, you should have a full_fraglib.dat, full_sorted_fraglib.dat and unique_full_sorted_fraglib.dat files in your directory. We will be using the "unique" variant for our genetic algorithm process.
We also need to combine the .mol2 files that we have generated from our ligand with the standard DOCK-provided ones. Because .mol2 files just contain lists of molecules without any special header/footer text, they can just be concatenated together by a shell script. Combine the linker, scaffold, and sidechain .mol2 files with their counterparts in the parameters/ folder using cat; an example would be:
cat insert_path_here/parameters/fraglib_linker.mol2 ./1o86_fraglib_linker.mol2 >> combined_fraglib_linker.mol2
(Note that we do not do anything with the "rigid" .mol2; this set of fragments is not needed to run the GA). After you do this process for all three variants of the fraglib .mol2s, you will be ready to run the genetic algorithm. It may be advisable to move the ligand-specific fragment library mol2s to a sub-directory in order to prevent cluttering/confusion.
Running the GA
With all preparation completed, we can now start optimizing molecules using the genetic algorithm. We will use our original ligand as the "parent" molecule for the first generation, and allow the algorithm to mutate it into new molecules for fifty generations.
Aside from the filepaths/filenames for the various fraglibs, ligand, protein spheres, and .defn files, the only parameters we are changing from default in the .in file are ga_max_generations and ga_mutate_parents. Maximum generations are lowered to 50 for computational tractability, and parents must be mutable because we only have one molecule to start with; the population would grow very slowly or even die off if we did not allow parental recombination. A full sample input file is given below (but as always, do please try to do it yourself):
conformer_search_type genetic ga_molecule_file ORIGINAL 1O86 LIGAND MOL2 ga_utilities no ga_fraglib_scaffold_file FRAGLIB_SCAFFOLD MOL2 ga_fraglib_linker_file FRAGLIB_LINKER MOL2 ga_fraglib_sidechain_file FRAGLIB_SIDECHAIN MOL2 ga_torenv_table FRAGLIB TORENV DAT ga_max_generations 50 ga_xover_on yes ga_xover_sampling_method_rand yes ga_xover_max 150 ga_bond_tolerance 0.5 ga_angle_cutoff 0.14 ga_check_overlap no ga_mutate_addition yes ga_mutate_deletion yes ga_mutate_substitution yes ga_mutate_replacement yes ga_mutate_parents yes ga_pmut_rate 0.3 ga_omut_rate 0.7 ga_max_mut_cycles 5 ga_mut_sampling_method rand ga_num_random_picks 15 ga_max_root_size 5 ga_energy_cutoff 100 ga_heur_unmatched_num 1 ga_heur_matched_rmsd 0.5 ga_constraint_mol_wt 500 ga_constraint_rot_bon 10 ga_constraint_H_accept 10 ga_constraint_H_don 5 ga_constraint_formal_charge 2 ga_ensemble_size 200 ga_selection_method elitism ga_elitism_combined yes ga_elitism_option max ga_max_num_gen_with_no_crossover 25 ga_name_identifier 1o86_ga ga_output_prefix 1086_ga_out use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file RECEPTOR SPHERES .SPH max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix GRID PATH minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file VDW DEFN FILE flex_defn_file FLEX DEFN FILE flex_drive_file FLEX_DRIVE FILE
We will also need to create a batch file to run the GA using slurm, as it is an intensive process that will take a number of hours to run. A sample is provided below; you should adopt yours to the appropriate queue, job name, time to run, DOCK path, and input filename:
#!/bin/bash # #SBATCH --job-name=1o86_GA #SBATCH --output=ga_output.txt #SBATCH --nodes=1 #SBATCH --ntasks-per-node=40 #SBATCH --time=4:00:00 #SBATCH -p short-40core change_this_to_your_path_to_dock/bin/dock6 -i genalg.in -o genalg.out
Use sbatch job.sh to submit this per usual, and check up on it regularly. The first few generations should go relatively quickly, with a slowdown as the molecular population increases past generation 8-10. If you want to track the population growth, you can use grep -c "Name" *mol2 to view the molecule populations of every .mol2 file that has been written so far.
Analysis
Once the genetic algorithm has finished running, you can view the final generation's best molecules and poses in Chimera via the same process outlined in section 008. To review, this will entail downloading the final .mol2 to your local machine (generation 50 in this case), launching Chimera, opening the receptor .mol2 via File >> Open, and using the Tools >> Structure/Binding Analysis >> ViewDock tool to load the ligand .mol2 file in a conveniently viewable format. An example of this can be seen below:
010: De Novo Design Example
De Novo Design
De Novo Design is a dock based algorithm that generates new ligands from scratch. This is done by selecting a dummy atom, which is the 'seed' that 'grows' scaffolds, linkers, or side chains based on user defined parameters. For example, say you only wanted to use de novo design to only 'grow' drug-like molecules. The way this is accomplished is ensuring the input file contains parameters that bias the algorithm to abide by Lipinski's Rule of 5 The guiding principle for using De Novo design is because there is a limit to the amount of new molecules that you could generate using a general virtual screening. Nevertheless, this method will certainly aid in enhancing your search space in generating numerous new compounds.
In this tutorial, you will be guided through a Denovo refinement calculation. This kind of calculation involves deleting a part of a ligand, creating a dummy atom, and using the De Novo design algorithm to 'regrow' that part of the ligand based on user defined parameters. The way in which the 'regrow' process happens comes from the algorithm pulling molecules from fragment libraries that contain scaffolds, linkers, and sidechains. The user defined parameters will dictate which libraries the algorithm can pull from.
The outcome will be a series of refinements that will retain a desired interaction in 1O86.
Selecting a Dummy Atom
To prepare our molecule for a De Novo calculation, we must first select a dummy atom to 'grow' from. To do this, first open your 1O86_protein_wH_wcH.mol2 file, then your 1O86_ligand_wH_wcH.mol2. The rationale for this is we would like to delete an atom on the ligand that contains a group that interacts with the protein. This will help to produce meaningful results, from a drug design standpoint:
As you can see it is a little difficult to see which atoms are interacting with the protein. To refine this inspection, hit Control and select an atom on the ligand. Then, hit the up arrow to highlight the entire ligand. Next, hit Select >> Zone and the following menu appears:
Lets modify the Zone and change the number from 5.0 angstroms to 3.0 angstroms. Additionally, make sure that that the third box is checked off entitled that selects neighboring residues. Then, press okay. You will notice that your ligand and the neighboring residues are highlighted:
To modify this image even further: Go to Actions >> Atoms/Bonds >> Show Next, navigate to Select >> Invert(selected models), here you'll notice most of the protein is highlighted Lastly, Return to Actions >> Atoms/Bonds >> delete You now see that there is a clearer picture of specifically, which atoms are interacting closely with the protein
You'll notice that there are two oxygens interacting with neighboring residues in the protein. Tracing your cursor in between the oxygens, you'll highlight a Carbon atom labeled C9. This will be the atom of choice for this tutorial.
Generating a Dummy Atom
Now that we have our atom of choice, we need to modify the ligand as well as the mol2 file itself.
First, open the 1O86_ligand_wH_wcH.mol2 in Chimera.
Locate the C9 atom >> select the two oxygens attached to C9 >> Atoms/Bonds >> delete. Then, save the mol2 file, lets call it 1O86_ligand_Du.mol2.
Finally, we must open the mol2 file on our terminal or a text editor and change the atom type of C9 to Du:
First in the terminal, type the command
vi 1O86_ligand_Du.mol2
Or, open the Mol2 file in your text editor of choice.
Find the C9 atom and modify the atom type. Your input file should look like this:
Save it.
Now lets verify this change by opening the mol2 file on Chimera:
As you can see C9 is now a dummy atom as shown in purple
Now, the Mol2 is ready for De Novo calculations
As a last step, transfer the mol2 to your working directory on Seawulf
scp 1O86_ligand_Du.mol2 yourusername@login.seawulf.stonybrook.edu:'/gpfs/home/yourusername/tutorial/010_de_novo/'
Running The Denovo Calculation
In your 010_De_Novo folder create an empty input file:
touch DN.in
Then prompt the question tree with the dock program:
dock6 -i DN.in
Follow the question tree and use the following sample input file as a template
conformer_search_type denovo dn_fraglib_scaffold_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_scaffold.mol2 dn_fraglib_linker_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_linker.mol2 dn_fraglib_sidechain_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2 dn_user_specified_anchor yes dn_fraglib_anchor_file 1O86_ligand_Du.mol2 dn_torenv_table ../ga_calc/unique_full_sorted_fraglib.dat dn_name_identifier denovo dn_sampling_method graph dn_graph_max_picks 30 dn_graph_breadth 3 dn_graph_depth 2 dn_graph_temperature 100.0 dn_pruning_conformer_score_cutoff 100.0 dn_pruning_conformer_score_scaling_factor 2.0 dn_pruning_clustering_cutoff 100.0 dn_remove_duplicates yes dn_max_duplicates_per_mol 0 dn_write_pruned_duplicates no dn_advanced_pruning yes dn_prune_initial_sample yes dn_sample_torsions yes dn_prune_individual_torsions yes dn_prune_combined_torsions yes dn_random_root_selection no dn_mol_wt_cutoff_type soft dn_upper_constraint_mol_wt 1000 dn_lower_constraint_mol_wt 0.0 dn_mol_wt_std_dev 35.0 dn_constraint_rot_bon 15 dn_constraint_formal_charge 2.0 dn_heur_unmatched_num 1 dn_heur_matched_rmsd 2.0 dn_unique_anchors 1 dn_max_grow_layers 1 dn_max_root_size 25 dn_max_layer_size 25 dn_max_current_aps 5 dn_max_scaffolds_per_layer 1 dn_max_successful_att_per_root 50000 dn_write_checkpoints yes dn_write_prune_dump yes dn_write_orients no dn_write_growth_trees no dn_output_prefix DN.out use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 use_database_filter no orient_ligand no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix ../003_gridbox/grid minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl
Once you've filled out the input file, you can then go ahead and run the calculation
dock6 -i DN.in -o DN.out
After running the calculations, the following output files will be generated:
DN.out DN.out.anchor_1.root_layer_1.mol DN.out.completed.denovo_build.mol2 DN.out.denovo_build.mol2
Examining the output file you can see how many molecules were grown.
In this case there are 21 different attached molecules.
Now, copy over the DN.out.completed.denovo_build.mol2 to your home computer
scp yourusername@login.seawulf.stonybrook.edu:'/gpfs/home/yourusername/tutorial/010_de_novo/DN.out.completed.denovo_build.mol2' .
Visualizing the Molecules in Chimera
Head over to Chimera and go to Tools >> Surface/Binding Analysis >> Scroll down to where you see ViewDock and select it
Then select the DN.out.completed_denovo.mol2
The following menu will appear:
You can navigate through the list of molecules that were grown from the calculations.
You'll notice that the best scoring ligand retained the oxygens from the original ligand:
Opening the original ligand mol2, you'll see that the molecules are almost identical
If you return to the ViewDock menu and highlight all of the ligands. You will be able to see a depiction of all of the different refinements.
This can be done by going to Tools >> ViewDock >> Raise Now that the menu has been prompted hold down your shift key and select the first ligand, then scroll down and select the last ligand. All your molecules will be shown overlapping on Chimera:
Well done, you've now completed a simple denovo calculation!
Running a Generic De Novo Calculation
In this part of the tutorial, you will be running a generic denovo calculation. What this means is that you take an anchor, or don't specify one, and grow a ligand around parts of the molecule. These parts of the molecule are dictated by the selected spheres that were calculated in the second part of the VS tutorial. The end result becomes a series of newly 'grown' ligands based on the user defined parameters. These parameters are what will bias the denovo growth to meet a certain criterian (i.e. a specific molecular weight). For this tutorial, you will select an anchor and use that to grow a series of ligands, which you will then be able to see on chimera.
To start, lets create a new directory in your  010_de_novo  directory. 
Lets call it generic.
mkdir generic
Selecting an Anchor
Now that you have a working directory for this calculation, lets go ahead and create an anchor
file to run the generic denovo calculation. For our anchor of choice, we will use the most popular
sidechain from dock6's current library. To start off, copy the library mol2 file to your home directory from
seawulf:
scp yourusername@milan.seawulf.stonybrook.edu:'/gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2' .
It's a pretty hefty file and the way you are going to open the file is by using ViewDock on Chimera.
Navigate to Chimera and select Tools --> Surface/Binding Analysis --> View Dock
Select the fraglib_sidechain.mol2 file.
On chimera you'll see a methyl-ethyl sidechain popup and the View Dock menu to toggle through other sidechains:
As mentioned before, you are going to use the most popular sidechain as the anchor for this calculation.
On the View Dock Menu go to Column --> Show --> FREQ
This is the frequency in which these particular sidechains occur in the dock6 testset:
Ironically, the methy ethyl sidechain appears to be the most popular sidechain. This will be the anchor of choice.
Go to File --> Save Mol2
Lets call it Anchor.mol2
Additionally, make sure that only the first model is selected in the Select Model panel like so:
Select Save. You can also verify that there is only one sidechain in the mol2 by viewing it on the terminal
vi Anchor.mol2
The file should look like this:
Now you can go ahead and copy this file over to the  generic  directory in seawulf
scp Anchor.mol2 yourusername@milan.stonybrook.edu:'/gpfs/yourusername/tutorial/010_de_novo_/generic'
Running the Calculations
Now that you're on seawulf in the  generic  directory with your Anchor.mol2 file,
lets create the necessary files to run the generic denovo calculation.
First, create an input file, lets call it GDN.in.
Just like when running the refinement calculation, you are encouraged to create an empty input file and follow the dock6 question tree.
To create an empty input file run the following command:
touch GDN.in
Then, use dock6 to initiate the question tree:
dock6 -i GDN.in
Use the following sample input file as a template when navigating the question tree:
conformer_search_type denovo dn_fraglib_scaffold_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_scaffold.mol2 dn_fraglib_linker_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_linker.mol2 dn_fraglib_sidechain_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2 dn_user_specified_anchor yes dn_fraglib_anchor_file ../generic/Anchor.mol2 dn_torenv_table /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_torenv.dat dn_name_identifier generic dn_sampling_method graph dn_graph_max_picks 30 dn_graph_breadth 3 dn_graph_depth 2 dn_graph_temperature 100.0 dn_pruning_conformer_score_cutoff 100.0 dn_pruning_conformer_score_scaling_factor 2.0 dn_pruning_clustering_cutoff 100.0 dn_remove_duplicates yes dn_max_duplicates_per_mol 0 dn_write_pruned_duplicates no dn_advanced_pruning yes dn_prune_initial_sample yes dn_sample_torsions yes dn_prune_individual_torsions yes dn_prune_combined_torsions yes dn_random_root_selection no dn_mol_wt_cutoff_type soft dn_upper_constraint_mol_wt 550.0 dn_lower_constraint_mol_wt 0.0 dn_mol_wt_std_dev 35.0 dn_constraint_rot_bon 15 dn_constraint_formal_charge 2.0 dn_heur_unmatched_num 1 dn_heur_matched_rmsd 2.0 dn_unique_anchors 1 dn_max_grow_layers 9 dn_max_root_size 25 dn_max_layer_size 25 dn_max_current_aps 5 dn_max_scaffolds_per_layer 1 dn_max_successful_att_per_root 50000 dn_write_checkpoints yes dn_write_prune_dump no dn_write_orients no dn_write_growth_trees no dn_output_prefix generic use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../002_spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no grid_score_primary yes grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_lig_efficiency no grid_score_grid_prefix ../003_gridbox/grid minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_final_min no simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_de_novo.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl
After creating the input file, you will need to create a slurm input file to run this calculation.
vi gdn.slurm
Type the following into the slurm file:
#!/bin/bash # #SBATCH --job-name=1O86_generic_de_novo #SBATCH --output=generic_dn_output.txt #SBATCH --ntasks-per-node=24 #SBATCH --nodes=4 #SBATCH --time=4:00:00 #SBATCH -p short-96core
dock6 -i GDN.in -o GDN.out
To run the calculation submit a job with the slurm file:
sbatch gdn.slurm
The job itself may take awhile to finish. It is highly recommended that you check to see if your job is still running and that your .out file to see if there are any errors.
After the calculations have run, the following files will show up in your directory:
 GDN.out                             generic_dn_output.txt
 generic.anchor_1.root_layer_1.mol2  generic.anchor_1.root_layer_5.mol2      
 generic.anchor_1.root_layer_2.mol2  generic.anchor_1.root_layer_6.mol2
 generic.anchor_1.root_layer_3.mol2  generic.anchor_1.root_layer_7.mol2
 generic.anchor_1.root_layer_4.mol2  generic.completed.denovo_build.mol2
                                     generic.denovo_build.mol2
These illustrate the different layers of growth that were specified in the input file, the  generic.completed.denovo_build.mol2  contains all of the finished molecules regardless of how many layers of growth. To view how many molecules are in this output file, type the following line in your terminal:
grep -c "SMALL" generic.completed.denovo_build.mol2
You'll notice that the number of molecules in this file come out to be 2350. That's okay, go ahead and transfer this file onto your home computer:
scp yourusername@milan.stonybrook.edu:'gpfs/yourusername/tutorial/010_de_novo/generic/generic.completed.denovo_build.mol2' .
Visualizing The Results
To view the molecules that were grown, use the View Dock tool in Chimera. The procedure is the same as when you were selecting an anchor. You should wait about 30 seconds as there are a large chunk of molecules being loaded into chimera. Once Chimera is finished loading you will see an image of a finished sidechain and the View Dock Menu:
You want to view the sidechains that have better overall grid scores. On the View Dock menu, go to Column --> Show --> Grid_Score. You'll see that the Grid_Score column is columned. Let's sort the Grid_Score column in descending order(i.e. from greatest to least). To do this, click on the Grid_Score title, you'll notice that there's now an arrow head over the column. Keep clicking on the column until you see that the arrow head is pointing up:
Now the best scoring molecule that was grown is at the top of the list and here's what it looks like:
Load in the  1O86_fixed_protein.pdb  and the  1O86_ligand_nH_ncH.mol2 .
On Chimera, select Actions --> Ribbon --> hide.
This will give a clear picture of the ligand having slight overlap with the grown ligand in the protein:
Congratulations, you've successfully run a generic denovo calculation.
























































