2025 DOCK tutorial 1 with PDBID 1O86
Revision as of 18:04, 21 February 2025
DOCK Tutorial using PDB 1O86
[intro text]
000: Foundations
Chimera
UCSF Chimera is a Python-based, open-source molecular visualization and manipulation suite. It is extremely helpful both for preparing molecules and receptors for docking and for visually analyzing the results of those calculations.
It can be downloaded from the official UCSF site; make sure to select the version that matches your operating system (Mac or Windows). Although Chimera is no longer under active development, it remains useful software for molecular modeling.
Once Chimera is installed, opening it shows a blank blue-ish window with a row of menus along the top. Throughout this tutorial, you will be instructed to perform actions contained within these menus. We denote the specific menu and submenu to access with >> signs. For example, File >> Open PDB means you should click the File menu, then move to and click Open PDB. This notation is necessary because some actions are nested in multiple submenus (for instance, selecting all hydrogens in a model requires Select >> Chemistry >> element >> H, as shown below). More extensive documentation on Chimera and its functions is available on the official site.
Seawulf
To complete this tutorial as a Stony Brook student, you will need an account on Seawulf. A ticket to obtain an account can be submitted on the Seawulf website; Dr. Rizzo will need to provide approval for account activation.
The Seawulf website also has a list of best practices for using a High Performance Computing (HPC) cluster. We recommend reading through them before attempting to run any intensive programs on Seawulf.
SSH
You will need to use a Secure Shell (SSH) connection to connect to Seawulf remotely. A guide to this process is available on the Seawulf website.
Basic Unix Commands and Environment
Seawulf uses a terminal-based Unix operating system. If you have never used a terminal-based OS before, you should work through some basic Unix tutorials. Broadly, you should know how to:
- Print your working directory with pwd, change your working directory with cd, make new directories with mkdir, remove empty directories with rmdir, and list the contents of a directory with ls
- Edit files using the text editor vim (run vimtutor to learn its basic functionality)
- Create an empty file with touch, move files and directories with mv, copy files and directories with cp, and (very carefully!) remove files and directories with rm
- Determine whether commands are available with which
- Understand how to use commands and how their flags/input parameters should be formatted
You should also understand the concept of filepaths and how commands use them. Much of DOCK relies on you giving the correct filepaths to pass files and parameters into the program; if the paths are wrong, the program will not work. When possible, use absolute filepaths (paths that start at the root directory) rather than paths relative to your current working directory. For example, if you are in the directory /gpfs/home/yourusername/tutorial/002_spheres, referencing a needed structure file by the absolute path /gpfs/home/yourusername/tutorial/001_structure/important_structure.pdb is more reliable than the relative path ../001_structure/important_structure.pdb, because the latter is relative to 002_spheres/ and meaningless outside of that context. The command realpath returns the absolute path of any file passed to it, which is useful for quickly and accurately determining a file's absolute path.
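To see why the relative path above only works from one place, the sketch below resolves it the way the shell would, by joining it onto the working directory and collapsing the `..`. It is a minimal illustration using Python's standard `posixpath` and `os.path.realpath`; the helper name `resolve` and the example paths are ours, taken from the paragraph above.

```python
import os
import posixpath


def resolve(relative_path, working_dir):
    """Join a relative path onto working_dir and collapse '..' segments.

    posixpath keeps Unix-style separators on any OS; unlike `realpath`,
    normpath does not follow symbolic links.
    """
    return posixpath.normpath(posixpath.join(working_dir, relative_path))


cwd = "/gpfs/home/yourusername/tutorial/002_spheres"
rel = "../001_structure/important_structure.pdb"
print(resolve(rel, cwd))
# /gpfs/home/yourusername/tutorial/001_structure/important_structure.pdb

# os.path.realpath is the closest Python analogue of the `realpath` command:
# it resolves against the real current directory and follows symlinks.
print(os.path.realpath("."))
```

The same relative string points somewhere else from any other directory, which is why absolute paths are the safer choice in DOCK input files.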
To work with DOCK6, you may also need to add a path to your shell config file that tells Unix where to look for DOCK-related commands. You can check whether you need to do this with the command which dock6. If which reports that no dock6 can be found, edit your .bashrc file:
1. Move to your home directory by entering either cd or cd ~
2. Start editing your personal config file with the command vi .bashrc
3. At the bottom of the file, add the following line:
export PATH=$PATH:/Path/To/Your/DOCK6/bin
where /Path/To/Your/DOCK6/ should be replaced with the path to your local DOCK installation (note: the full path should end with /bin). At the time of this writing, there is a compiled version of DOCK6.12 at /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/, but if this does not work, ask Dr. Rizzo or a lab member where a compiled instance can be found.
4. Save and exit the file, then enter the command source .bashrc into the terminal.
5. Now when you run which dock6, the terminal should return the full path to dock6 inside the bin directory you just added.
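The lookup that which performs, and the effect of appending a directory to PATH, can be sketched in a few lines. This is a minimal illustration, not how the shell is implemented: the helpers `find_on_path` and `append_to_path` are hypothetical names, and the demo uses a fake, empty dock6 script in a temporary directory rather than a real DOCK installation.

```python
import os
import stat
import tempfile


def find_on_path(command, path_string):
    """Return the first executable named `command` in the directories of
    path_string (colon-separated on Unix), or None; a rough analogue of `which`."""
    for directory in path_string.split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def append_to_path(path_string, new_dir):
    """Mirror `export PATH=$PATH:new_dir` by adding new_dir at the end."""
    return path_string + os.pathsep + new_dir if path_string else new_dir


# Demo with a fake dock6 in a throwaway directory (not a real DOCK install).
with tempfile.TemporaryDirectory() as empty_dir, tempfile.TemporaryDirectory() as fake_bin:
    fake_dock6 = os.path.join(fake_bin, "dock6")
    with open(fake_dock6, "w") as fh:
        fh.write("#!/bin/sh\n")
    os.chmod(fake_dock6, os.stat(fake_dock6).st_mode | stat.S_IXUSR)
    print(find_on_path("dock6", empty_dir))  # None: not found yet
    print(find_on_path("dock6", append_to_path(empty_dir, fake_bin)) == fake_dock6)  # True
```

Because the new directory is appended at the end, any other dock6 earlier in PATH would still win. On the cluster, Python's standard `shutil.which("dock6")` performs the same lookup against your real PATH.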
After all this, you should have a good understanding of how to navigate the Unix terminal, and have the DOCK6 suite ready to use in your environment.
SCP File Transfer
Another important aspect of working on Seawulf is transferring files between the cluster and your local machine. This is done with the Secure Copy Protocol (scp) terminal command; see the pertinent section on the Seawulf website. On Windows, it may be easier to download a third-party program; FileZilla works for this task.
Directory setup
Once you have the necessary Unix background, set up a set of directories for the work you will do in this tutorial. Separate directories help you compartmentalize your work and prevent important files from being overwritten or lost. For the modules of this tutorial, we recommend the directory structure below, with folders of the same names on your local machine for ease of file transfer:
~/001_structure
~/002_spheres
~/003_gridbox
~/004_energy_min
~/005_rigid_dock
~/006_flex_dock
~/007_footprint
~/008_virtual_screen
~/009_gen_alg
~/010_de_novo
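If you would rather not type ten mkdir commands, the same structure can be created from a short script. This is a minimal sketch using the standard `pathlib` module; the function name `make_tutorial_dirs` is hypothetical, and the demo writes into a temporary directory instead of your home directory.

```python
import tempfile
from pathlib import Path

# Module directory names recommended in this tutorial.
MODULES = [
    "001_structure", "002_spheres", "003_gridbox", "004_energy_min",
    "005_rigid_dock", "006_flex_dock", "007_footprint",
    "008_virtual_screen", "009_gen_alg", "010_de_novo",
]


def make_tutorial_dirs(root):
    """Create one directory per module under `root`; safe to re-run."""
    created = []
    for name in MODULES:
        directory = Path(root) / name
        # exist_ok=True means an existing folder is left untouched, not an error.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


# Demo in a throwaway directory; on Seawulf you would pass Path.home().
with tempfile.TemporaryDirectory() as scratch:
    for directory in make_tutorial_dirs(scratch):
        print(directory.name)
```

Running the same function on your local machine gives you matching folder names for scp transfers.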
A Note on Troubleshooting
Finally, despite our best efforts at accuracy, parts of this tutorial will likely not go smoothly for you. There will be errors and inconsistencies between what you see here and what happens on Seawulf, and there may not be an immediate answer for how to fix them. The instinct when this happens is to find someone who might know what happened and ask for advice; while this is usually helpful, it can be inefficient. As such, we ask that you troubleshoot first. No program you will use in this tutorial is intended to be a black box. All of them do their best to return errors that tell you what is going wrong and where to look to fix it. If you get an error, read it! Often it will tell you that it can't find a particular file ("doesn't exist" is the favorite wording), or that some piece of the input is giving it trouble; these messages tell you which paths to check or which lines of a file to review. Copy-pasting an entire error into Google (or ChatGPT if you must) is also a completely valid line of inquiry and often yields helpful results.
Having said that, here are some common errors and tactics you can use to fix them:
- "[File] could not be opened" or "[file] does not exist" usually indicates that a path you gave the program is not accurate. You can sanity-check paths by copying them directly from the input into an ls command. If ls throws an error, fix the path!
- "version `GLIBCXX_whatever' not found" is generally an issue with how the DOCK installation was compiled. Try a different version (/gpfs/projects/AMS536/zzz.programs usually has several available) or try running the command on a different node (Milan or Login).
- Errors mentioning FORTRAN: make sure the input files are named correctly (e.g., INSPH for sphgen) and that there are no extra spaces or lines anywhere in the input files. Stray spaces sometimes sneak in at the start or end of a line and are parsed literally.
- Often, output is sent to a .out file instead of being written to the console. If the program terminates immediately but does not produce the expected files, check the bottom of whatever .out file was written for errors.
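Two of these checks, reading the end of a .out file and hunting for stray whitespace in an input file, are easy to script. The sketch below is a minimal illustration with hypothetical helper names (`tail`, `whitespace_problems`); it assumes plain-text files and only flags blank lines and leading or trailing spaces, not every formatting problem sphgen or DOCK might complain about.

```python
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a (possibly large) output file, like `tail -n`."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def whitespace_problems(path):
    """List (line_number, issue) pairs for blank lines or leading/trailing
    whitespace, which line-by-line readers such as sphgen take literally."""
    problems = []
    lines = Path(path).read_text().split("\n")
    # One final newline is normal; drop the empty string it leaves behind.
    if lines and lines[-1] == "":
        lines.pop()
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append((number, "blank line"))
        elif line != line.strip():
            problems.append((number, "leading/trailing whitespace"))
    return problems
```

For example, `print("\n".join(tail("DN.out")))` shows where a run stopped, and `whitespace_problems("INSPH")` should return an empty list for a clean input file.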
001: Structure Prep
Download PDB, separate lig/rec, model loops, addH/charge
Downloading the PDB
Having set up your environment to work on Seawulf, let's move to your local computer and begin the protein preparation process:
To begin protein preparation, you need the PDB file to work with. Following this link, https://www.rcsb.org/structure/1O86, you will see the RCSB page opened to our protein of choice:
Next, navigate to the top right corner where it says Download Files, then select the dropdown arrow. The following pulldown menu will appear on the screen:
Select 'Download PDB'. The PDB file is now downloaded to your local computer.
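If you prefer to fetch the file from a script (for example, directly on Seawulf), RCSB also serves entries at URLs of the form https://files.rcsb.org/download/<ID>.pdb. A minimal sketch, assuming that URL pattern and network access; `rcsb_pdb_url` and `download_pdb` are hypothetical helper names built on the standard `urllib.request.urlretrieve`.

```python
import urllib.request


def rcsb_pdb_url(pdb_id):
    """Build the RCSB download URL for a PDB entry (IDs are case-insensitive)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, dest):
    """Save the PDB-format file for pdb_id to dest; requires network access."""
    urllib.request.urlretrieve(rcsb_pdb_url(pdb_id), dest)
    return dest


print(rcsb_pdb_url("1o86"))  # https://files.rcsb.org/download/1O86.pdb
# download_pdb("1O86", "1O86.pdb")  # uncomment to actually download
```

The downloaded file is the same PDB-format text you get from the Download Files menu.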
Now that you have the PDB file, let's open it in Chimera.
002: Spheres
surface generation, sphgen, selecting spheres, visualization in Chimera
Generate the required surface file
1. Open the 1O86 protein-only file in Chimera and choose Actions >> Surface >> show.
2. Write the DMS file by choosing Tools >> Structure Editing >> Write DMS.
3. Upload the DMS file to your 002_spheres directory on Seawulf.
4. Create a sphere input file with the following command:
vi INSPH
5. Paste the following into your input file, one value per line with no extra spaces:
./1O86.dms
R
X
0.0
4.0
1.4
1O86.sph
6. Run the program with the following command:
sphgen -i INSPH -o OUTSPH
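Because sphgen reads INSPH line by line and stray spaces cause the FORTRAN errors described in the troubleshooting notes, generating the file from a script avoids copy-paste whitespace. This is a minimal sketch, assuming the surface file is named 1O86.dms (use whatever name you gave the DMS file). The per-line comments are our reading of the sphgen section of the DOCK 6 manual, so check them against your DOCK version; `write_insph` is a hypothetical helper.

```python
from pathlib import Path

# One value per line, in the order used in this tutorial.
INSPH_LINES = [
    "./1O86.dms",  # molecular surface (DMS) file written by Chimera
    "R",           # generate spheres outside the surface (R) rather than inside (L)
    "X",           # use all surface points
    "0.0",         # limit that discourages large spheres with close surface contacts
    "4.0",         # maximum sphere radius, in angstroms
    "1.4",         # minimum sphere radius, in angstroms
    "1O86.sph",    # output file of clustered spheres
]


def write_insph(path="INSPH", lines=INSPH_LINES):
    """Write sphgen's INSPH with no stray spaces and no blank lines."""
    text = "\n".join(line.strip() for line in lines) + "\n"
    Path(path).write_text(text)
    return text
```

Calling `write_insph()` in your 002_spheres directory and then running `sphgen -i INSPH -o OUTSPH` follows the same steps as above.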
7. Download the sphere file named on the last line of INSPH to your local directory, open it in Chimera, and overlay it with the protein file:
File:Screenshot 2025-02-19 130820.png
Based on the overlay, the spheres follow the shape of the protein ribbon, indicating that the surface spheres were generated successfully.
Generate Spheres localized on binding site
003: Grid/box
showbox, grid generation, visualization in Chimera
004: Minimization
Explanation of .in file for minimization and process (Chimera visuals after)
005: Rigid Docking
Explanation of .in file for rigid docking and process (Chimera visuals after)
006: Flexible Docking
Explanation of .in for flex docking (Chimera visuals after)
007: Footprint Scoring
Explanation of .in for FPS, use of Python script to generate graph
008: 5k Virtual Screen
Slurm and queue etiquette, VS .in explanation and queue submission, ViewDock in Chimera
009: Genetic Algorithm Example
Explanation of rationale for GA and basic functionality, sample input file and expected outputs
010: De Novo Design Example
Explanation of rationale for DN and basic functionality, sample input file and expected outputs
De novo design is a DOCK-based algorithm that generates new ligands from scratch. This is done by selecting a dummy atom, the 'seed', from which scaffolds, linkers, or side chains are 'grown' according to user-defined parameters. For example, if you wanted de novo design to grow only drug-like molecules, you would make sure the input file contains parameters that bias the algorithm toward Lipinski's Rule of 5. The motivation for de novo design is that a conventional virtual screen can only evaluate molecules that already exist in its library. This method therefore expands your search space and can generate numerous new compounds.
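As a reference for what "drug-like" means here, the sketch below counts Lipinski Rule of 5 violations from precomputed properties. It is only an illustration of the rule itself, not how DOCK filters molecules: the sample DN.in later in this section constrains properties through parameters such as dn_upper_constraint_mol_wt and dn_constraint_rot_bon instead. The function names are hypothetical, and allowing one violation is a common convention rather than part of DOCK.

```python
def lipinski_violations(mol_wt, logp, h_donors, h_acceptors):
    """Count Rule of 5 violations: molecular weight > 500 Da, logP > 5,
    more than 5 H-bond donors, more than 10 H-bond acceptors."""
    return sum([
        mol_wt > 500,
        logp > 5,
        h_donors > 5,
        h_acceptors > 10,
    ])


def is_drug_like(mol_wt, logp, h_donors, h_acceptors, max_violations=1):
    """Apply the common convention that at most one violation is tolerated."""
    return lipinski_violations(mol_wt, logp, h_donors, h_acceptors) <= max_violations


print(lipinski_violations(350.0, 2.1, 2, 5))  # 0
print(is_drug_like(520.0, 6.2, 2, 5))         # False (two violations)
```

The property values in the example are made up for illustration; in practice they would come from a cheminformatics toolkit.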
Selecting a Dummy Atom
To prepare our molecule for a de novo calculation, we must first select a dummy atom to 'grow' from. To do this, first open your 1O86_fixed_protein_H_cH.mol2 file, then your 1O86_ligand_H_cH.mol2. The rationale is that we want to replace a group on the ligand that interacts with the protein, which helps produce meaningful results from a drug design standpoint:
As you can see, it is a little difficult to tell which atoms are interacting with the protein. To refine this inspection, hold Control and click an atom on the ligand, then press the up arrow to extend the selection to the entire ligand. Next, choose Select >> Zone and the following menu appears:
Change the zone distance from 5.0 angstroms to 3.0 angstroms, and make sure the third checkbox, which selects neighboring residues, is checked. Then press OK. Your ligand and the neighboring residues are now highlighted:
To simplify the view further:
1. Go to Actions >> Atoms/Bonds >> show.
2. Navigate to Select >> Invert (selected models); most of the protein is now highlighted.
3. Return to Actions >> Atoms/Bonds >> delete.
You now have a clearer picture of which atoms interact closely with the protein.
You'll notice that two oxygens interact with neighboring residues in the protein. Hovering your cursor over the atom between the two oxygens reveals a carbon labeled C9. This will be the atom of choice for this tutorial.
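The 3.0 angstrom zone selection above is essentially a distance cutoff. The sketch below reproduces that idea on plain coordinates, returning residues with at least one atom within the cutoff of any ligand atom. It is a rough analogue for intuition, not Chimera's implementation; the function name and all coordinates and residue labels are hypothetical, not taken from 1O86.

```python
import math


def residues_near_ligand(ligand_xyz, protein_atoms, cutoff=3.0):
    """Return sorted labels of residues with any atom within `cutoff`
    angstroms of any ligand atom.

    ligand_xyz:    list of (x, y, z) ligand coordinates
    protein_atoms: list of (residue_label, (x, y, z)) pairs
    """
    near = set()
    for residue, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            near.add(residue)
    return sorted(near)


# Hypothetical coordinates, for illustration only.
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
protein = [
    ("TYR1", (0.0, 0.0, 2.9)),    # about 2.9 A from the first ligand atom
    ("GLU2", (4.4, 0.0, 0.0)),    # about 2.9 A from the second ligand atom
    ("HIS3", (10.0, 10.0, 10.0)),  # far away
]
print(residues_near_ligand(ligand, protein))  # ['GLU2', 'TYR1']
```

Shrinking the cutoff from 5.0 to 3.0 angstroms keeps only the closest contacts, which is why the two oxygens near C9 stand out.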
Generating a Dummy Atom
Now that we have our atom of choice, we need to modify the ligand as well as the mol2 file itself.
First, open the 1O86_ligand_H_cH.mol2 in Chimera.
Locate the C9 atom, select the two oxygens attached to C9, then choose Actions >> Atoms/Bonds >> delete. Then save the mol2 file; let's call it 1O86_ligand_Du.mol2.
Finally, we must open the mol2 file in the terminal and change the atom type of C9 to Du:
First, in the terminal, type the command
vi 1O86_ligand_Du.mol2
Find the C9 atom and change its atom type to Du. Your file should look like this:
Save it.
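The same edit can be scripted, which is handy if you repeat it for several ligands. A minimal sketch, assuming a standard Tripos mol2 layout in which each @<TRIPOS>ATOM record lists atom_id, atom_name, x, y, z, and then the SYBYL atom type as the sixth column. `set_atom_type` is a hypothetical helper; it changes every atom with the given name, so make sure C9 is unique in your file.

```python
import re


def set_atom_type(mol2_text, atom_name, new_type="Du"):
    """Return mol2 text in which every @<TRIPOS>ATOM record whose atom name
    is `atom_name` has its atom type (6th column) replaced by `new_type`."""
    out = []
    in_atom_section = False
    for line in mol2_text.splitlines():
        if line.startswith("@<TRIPOS>"):
            # Only edit lines inside the ATOM section, not BOND or SUBSTRUCTURE.
            in_atom_section = line.strip() == "@<TRIPOS>ATOM"
        elif in_atom_section:
            tokens = list(re.finditer(r"\S+", line))
            # ATOM columns: atom_id atom_name x y z atom_type [subst_id subst_name charge]
            if len(tokens) >= 6 and tokens[1].group() == atom_name:
                start, end = tokens[5].span()
                old_type = tokens[5].group()
                # Pad to the old width so the remaining columns stay aligned.
                line = line[:start] + new_type.ljust(len(old_type)) + line[end:]
        out.append(line)
    return "\n".join(out) + "\n"


# Usage with this tutorial's file name:
# with open("1O86_ligand_Du.mol2") as fh:
#     text = fh.read()
# with open("1O86_ligand_Du.mol2", "w") as fh:
#     fh.write(set_atom_type(text, "C9"))
```

Editing by column position, rather than splitting and re-joining, leaves the rest of the file byte-for-byte unchanged apart from the atom type.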
Now let's verify this change by opening the mol2 file in Chimera:
As you can see, C9 is now shown as a dummy atom (in purple).
Now, the mol2 is ready for De Novo calculations
As a last step, transfer the mol2 file to your working directory on Seawulf:
scp 1O86_ligand_Du.mol2 username@login.seawulf.stonybrook.edu:'/gpfs/home/username/010_de_novo'
Running the De Novo Calculation
In your 010_de_novo folder, create an empty input file:
touch DN.in
Then run dock6 on the empty file to start the interactive question tree:
dock6 -i DN.in
Follow the question tree, using the following sample input file as a template:
conformer_search_type denovo
dn_fraglib_scaffold_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_scaffold.mol2
dn_fraglib_linker_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_linker.mol2
dn_fraglib_sidechain_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/fraglib_sidechain.mol2
dn_user_specified_anchor yes
dn_fraglib_anchor_file 1O86_ligand_Du.mol2
dn_torenv_table ../ga_calc/unique_full_sorted_fraglib.dat
dn_name_identifier denovo
dn_sampling_method graph
dn_graph_max_picks 30
dn_graph_breadth 3
dn_graph_depth 2
dn_graph_temperature 100.0
dn_pruning_conformer_score_cutoff 100.0
dn_pruning_conformer_score_scaling_factor 2.0
dn_pruning_clustering_cutoff 100.0
dn_remove_duplicates yes
dn_max_duplicates_per_mol 0
dn_write_pruned_duplicates no
dn_advanced_pruning yes
dn_prune_initial_sample yes
dn_sample_torsions yes
dn_prune_individual_torsions yes
dn_prune_combined_torsions yes
dn_random_root_selection no
dn_mol_wt_cutoff_type soft
dn_upper_constraint_mol_wt 1000
dn_lower_constraint_mol_wt 0.0
dn_mol_wt_std_dev 35.0
dn_constraint_rot_bon 15
dn_constraint_formal_charge 2.0
dn_heur_unmatched_num 1
dn_heur_matched_rmsd 2.0
dn_unique_anchors 1
dn_max_grow_layers 1
dn_max_root_size 25
dn_max_layer_size 25
dn_max_current_aps 5
dn_max_scaffolds_per_layer 1
dn_max_successful_att_per_root 50000
dn_write_checkpoints yes
dn_write_prune_dump yes
dn_write_orients no
dn_write_growth_trees no
dn_output_prefix DN.out
use_internal_energy yes
internal_energy_rep_exp 12
internal_energy_cutoff 100.0
use_database_filter no
orient_ligand no
bump_filter no
score_molecules yes
contact_score_primary no
grid_score_primary yes
grid_score_rep_rad_scale 1
grid_score_vdw_scale 1
grid_score_es_scale 1
grid_lig_efficiency no
grid_score_grid_prefix ../003_gridbox/grid
minimize_ligand yes
minimize_anchor yes
minimize_flexible_growth yes
use_advanced_simplex_parameters no
simplex_max_cycles 1
simplex_score_converge 0.1
simplex_cycle_converge 1.0
simplex_trans_step 1.0
simplex_rot_step 0.1
simplex_tors_step 10.0
simplex_anchor_max_iterations 500
simplex_grow_max_iterations 500
simplex_grow_tors_premin_iterations 0
simplex_final_min no
simplex_random_seed 0
simplex_restraint_min no
atom_model all
vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/vdw_AMBER_parm99.defn
flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex.defn
flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.12_ams536/parameters/flex_drive.tbl
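Since dock6 stops when an input path is wrong, it can help to check the file-valued parameters before running. A minimal sketch, assuming DN.in holds one `parameter value` pair per line and that relative paths are resolved from the directory where you run dock6. It only checks parameters whose names end in _file or _table (our heuristic); grid_score_grid_prefix names a prefix rather than a single file, so it is skipped. Both helper names are hypothetical.

```python
import os


def read_dock_input(path):
    """Parse a DOCK-style input file of `parameter value` lines into a dict."""
    params = {}
    with open(path) as fh:
        for line in fh:
            parts = line.split(None, 1)
            if len(parts) == 2:
                params[parts[0]] = parts[1].strip()
            elif len(parts) == 1:
                params[parts[0]] = ""
    return params


def missing_files(params, base_dir="."):
    """Return (parameter, path) pairs whose value names a file that does not
    exist; relative paths are resolved against base_dir."""
    missing = []
    for key, value in params.items():
        if key.endswith(("_file", "_table")) and value:
            full = value if os.path.isabs(value) else os.path.join(base_dir, value)
            if not os.path.exists(full):
                missing.append((key, value))
    return missing


# Usage from inside 010_de_novo:
# print(missing_files(read_dock_input("DN.in")))
```

An empty list means every checked path exists; any pair it returns is a path worth testing with ls, as suggested in the troubleshooting notes.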
Once you've filled out the input file, run the calculation:
dock6 -i DN.in -o DN.out
After running the calculation, the following output files will be generated:
DN.out
DN.out.anchor_1.root_layer_1.mol
DN.out.completed.denovo_build.mol2
DN.out.denovo_build.mol2
Examining the DN.out file, you can see how many molecules were grown.
In this case, 21 different molecules were grown.
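A quick way to count the molecules written to a multi-molecule mol2 file is to count its @<TRIPOS>MOLECULE records, since each molecule in the Tripos mol2 format starts with one. A minimal sketch; `count_mol2_molecules` is a hypothetical helper, and it counts only what was written to the file you pass it.

```python
def count_mol2_molecules(mol2_text):
    """Count molecules in a multi-molecule mol2 file; each one starts with
    a @<TRIPOS>MOLECULE record."""
    return sum(1 for line in mol2_text.splitlines()
               if line.strip() == "@<TRIPOS>MOLECULE")


# Usage:
# with open("DN.out.completed.denovo_build.mol2") as fh:
#     print(count_mol2_molecules(fh.read()))
```

This is equivalent to running grep -c "@<TRIPOS>MOLECULE" on the file.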
Now copy DN.out.completed.denovo_build.mol2 to your local computer:
scp username@login.seawulf.stonybrook.edu:'/gpfs/home/username/010_de_novo/DN.out.completed.denovo_build.mol2' .
Visualizing the Molecules on Chimera
In Chimera, go to Tools >> Surface/Binding Analysis, scroll down to ViewDock, and select it.
Then select DN.out.completed.denovo_build.mol2.
The following menu will appear:
You can navigate through the list of molecules that were grown from the calculations.
You'll notice that the best-scoring ligand regrew the two oxygens of the original ligand:
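Outside Chimera, the same best-first ordering that ViewDock shows can be approximated by reading the score comments DOCK writes above each molecule. This is a minimal sketch that assumes header lines of the form `##########   Name:   ...` and `##########   Grid_Score:   -42.0` precede each @<TRIPOS>MOLECULE block; check the field names in your own output file before relying on it. `ranked_by_score` is a hypothetical helper, and lower (more negative) grid scores are treated as better.

```python
def ranked_by_score(mol2_text, score_field="Grid_Score"):
    """Return (name, score) pairs sorted from best (most negative) to worst,
    read from '##########'-prefixed header lines before each molecule."""
    results = []
    header = {}
    for line in mol2_text.splitlines():
        if line.startswith("##########"):
            key, sep, value = line.lstrip("#").partition(":")
            if sep:
                header[key.strip()] = value.strip()
        elif line.strip() == "@<TRIPOS>MOLECULE":
            # A MOLECULE record closes the header block that precedes it.
            if score_field in header:
                results.append((header.get("Name", "?"), float(header[score_field])))
            header = {}
    return sorted(results, key=lambda pair: pair[1])


# Usage:
# with open("DN.out.completed.denovo_build.mol2") as fh:
#     for name, score in ranked_by_score(fh.read())[:5]:
#         print(name, score)
```

If your file uses a different primary score label, pass it as score_field.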