Difference between revisions of "2022 DOCK tutorial 1 with PDBID 6ME2"

From Rizzo_Lab
Jump to: navigation, search
(Virtual Screening)
(Virtual Screening)
Line 226: Line 226:
 
= Virtual Screening =
 
= Virtual Screening =
  
hello
+
Now that we have finished preparing both the ligand and receptor and verified that the ligand can be reasonably docked back into the receptor, we can now begin with the virtual screen. The idea behind virtual screening is that you can take a library of known molecules, dock them to a receptor, and then score how well those molecules are binding to the receptor. In this tutorial, our library is a file named "VS_library_5K.mol2", which contains 5,000 small molecules for us to dock to 6ME2. Let's begin by generating that library file of 5,000 molecules. Cd into the directory "005_virtual_screen" and copy the "VS_library_25K.mol2" file from "/gpfs/projects/AMS536/zzz.programs" into that directory with the following command:
 +
 
 +
cp /gpfs/projects/AMS536/zzz.programs/VS_library_25K.mol2 ./
 +
 
 +
Now we need to truncate this library for the top 5,000 molecules. We can do this with a very simple python script. Vi into "truncate_library.py" and write the following to that file (you can delete the comments, but they will give context to what is happening line-by-line if you don't know):
 +
 
 +
#Keep track of how many molecules have been copied over.
 +
moleculenumber=0                                                       
 +
#Create a list to hold each line being copied.
 +
libraryfile = []                                                 
 +
#Open the 25K library file, and number each line so it can be read line-by-line.               
 +
read_file = open("VS_library_25K.mol2")
 +
lines = read_file.readlines()
 +
#Add each individual line to the list.
 +
for i, line in enumerate(lines):
 +
        libraryfile.append(line)
 +
#Create the new library file so that molecules can be written to it.
 +
with open("VS_library_5K.mol2", "w") as new_f:
 +
#Copy over each line in the list for the first 5,000 molecules
 +
        for d in libraryfile:
 +
                new_f.write(d)
 +
                if "MOLECULE" in d:
 +
                        moleculenumber += 1
 +
                if moleculenumber == 5001:
 +
                        break
 +
 
 +
Once you have finished doing that, save and exit the file. To run this, type the following command into the terminal:
 +
 
 +
python truncate_library.py
 +
 
 +
Once you have your library of molecules, the next thing we want to do is write the input file. Vi into virtual.in and write the following to that file:
 +
 
 +
conformer_search_type                                        flex
 +
write_fragment_libraries                                    no
 +
user_specified_anchor                                        no
 +
limit_max_anchors                                            no
 +
min_anchor_size                                              5
 +
pruning_use_clustering                                      yes
 +
pruning_max_orients                                          1000
 +
pruning_clustering_cutoff                                    100
 +
pruning_conformer_score_cutoff                              100.0
 +
pruning_conformer_score_scaling_factor                      1.0
 +
use_clash_overlap                                            no
 +
write_growth_tree                                            no
 +
use_internal_energy                                          yes
 +
internal_energy_rep_exp                                      12
 +
internal_energy_cutoff                                      100.0
 +
ligand_atom_file                                            VS_library_5K.mol2
 +
limit_max_ligands                                            no
 +
skip_molecule                                                no
 +
read_mol_solvation                                          no
 +
calculate_rmsd                                              no
 +
use_database_filter                                          no
 +
orient_ligand                                                yes
 +
automated_matching                                          yes
 +
receptor_site_file                                          ../02.spheres/selected_spheres.sph
 +
max_orientations                                            1000
 +
critical_points                                              no
 +
chemical_matching                                            no
 +
use_ligand_spheres                                          no
 +
bump_filter                                                  no
 +
score_molecules                                              yes
 +
contact_score_primary                                        no
 +
contact_score_secondary                                      no
 +
grid_score_primary                                          yes
 +
grid_score_secondary                                        no
 +
grid_score_rep_rad_scale                                    1
 +
grid_score_vdw_scale                                        1
 +
grid_score_es_scale                                          1
 +
grid_score_grid_prefix                                      ../03.grid/grid
 +
multigrid_score_secondary                                    no
 +
dock3.5_score_secondary                                      no
 +
continuous_score_secondary                                  no
 +
footprint_similarity_score_secondary                        no
 +
pharmacophore_score_secondary                                no
 +
descriptor_score_secondary                                  no
 +
gbsa_zou_score_secondary                                    no
 +
gbsa_hawkins_score_secondary                                no
 +
SASA_score_secondary                                        no
 +
amber_score_secondary                                        no
 +
minimize_ligand                                              yes
 +
minimize_anchor                                              yes
 +
minimize_flexible_growth                                    yes
 +
use_advanced_simplex_parameters                              no
 +
simplex_max_cycles                                          1
 +
simplex_score_converge                                      0.1
 +
simplex_cycle_converge                                      1.0
 +
simplex_trans_step                                          1.0
 +
simplex_rot_step                                            0.1
 +
simplex_tors_step                                            10.0
 +
simplex_anchor_max_iterations                                500
 +
simplex_grow_max_iterations                                  500
 +
simplex_grow_tors_premin_iterations                          0
 +
simplex_random_seed                                          0
 +
simplex_restraint_min                                        no
 +
atom_model                                                  all
 +
vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn
 +
flex_defn_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn
 +
flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl
 +
ligand_outfile_prefix                                        virtual.out
 +
write_orientations                                          no
 +
num_scored_conformers                                        1
 +
rank_ligands                                                no
 +
 
 +
Once you have finished doing that, save and exit the file. Now, we could try running the virtual screen similarly to previous steps, but it would be really slow. The best way to do this is to parallelize the tasks as many time as possible, and to do this we are going to use MPI.
  
 
= Virtual Screening Using MPI =  
 
= Virtual Screening Using MPI =  

Revision as of 10:36, 6 March 2022

Introduction =

Required Software

UCSF Chimera, UCSF DOCK 6.9

Chimera

UCSF Chimera is a software which aids the interactive 3D visualization, editing, and analysis of biological macromolecules.

[insert screenshot of chimera]

Chimera will be used in this tutorial to prepare molecules for docking, as well as for simply exploring the structures of molecules of interest.

DOCK

UCSF DOCK is a molecular docking program, used to find the optimal orientation of a ligand binding to a receptor. DOCK can be used for a number of different molecular modeling/design tasks, such as testing binding affinities, virtual screening, and de novo design.

6ME2

6ME2 is the modeled structure of receptor MT1 of melatonin complexed with ramelteon (the ligand). Further details on the structure from the original researchers can be found at https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6696938/

Seawulf: Getting Started

Basics of the Command Line/Terminal

There are some basic commands for the command line/terminal that are necessary to know in order to complete this tutorial effectively.

When one first opens a command line window, it will begin in their home directory. To change this directory to, say, Desktop, type:

 cd ~/Desktop

To get to a folder, say folder_1, in Desktop, type

 cd ~/Desktop/folder_1

To list out all of the directories/files contained within this current directory, type

 ls

To change the directory to one folder in the hierarchy above the current, type

 cd ../

to change directories a folder, say folder_2 inside the current directory, type

 cd ./folder_2

To create a file in the current directory, called "file_example", using vim, type

 vi filename_example

This will open up a prompt where one can type in, paste, or edit (if the file already exists in the current directory) the contents of the file. To edit the file once opened, type "i", which will change the environment to "INSERT" mode. "INSERT" is displayed at the bottom of the window when in this mode. To exit "INSERT" mode, press "Escape". If in "INSERT" mode, to save the file, press "Escape" and then type

 :wq

which will save the file and open the terminal window back up to commands, in the current directory. If one types

 ls

the newly created "file_example" should be visible in the outputted list of files.

To move files between directories, say "file_example" is the file of interest, from Directory_1 to Directory_2, type

 mv /Directory_1/file_example /Directory_2/file_example

The name of file_example may also be changed (to "file_new_name") during this move:

 mv /Directory_1/file_example /Directory_2/file_new_name

Note this should be used to rename directories--move the file to the same directory and change the name to rename the file.

To copy files between directories, type the following

 cp /Directory_1/file_example /Directory_2/file_new_name

And one can copy directories as well, by typing

 cp -R /Directory_1/sub_directory /Directory_2/sub_directory_new_name

Note that the "-R" flag specifies to copy over the sub_directory folder and all of its contents from Directory_1 to Directory_2.

To create a new directory, called "new_directory" for example, type

 mkdir new_directory

To delete a file in the current directory, "file_example", type

 rm file_example

To delete a directory called "directory_example", including all of its subdirectories and associated files, type

 rm -r

Note that if one wishes to perform the same action on multiple files at once, this can be done on the same line by separating the filenames with a space. For example to delete the files filename_1 in Directory_1 and filename_2 in Directory_2, type

 rm Directory_1/filename_1 Directory_2/filename_2

Important note: the action of deleting files with rm cannot be undone. Use extreme caution when removing files/directories as there is no Ctrl + Z in the terminal.

Logging in and navigating SeaWulf

SeaWulf is the High Performance Computing (HPC) cluster which researchers and students at Stony Brook University can gain access to use for computationally intensive tasks. Note Stony Brook students often do not automatically have access to SeaWulf, but students can request access from IT or be given access by their labs/instructors. For help getting access to SeaWulf, or questions/problems encountered when using the cluster at any point, submit a ticket to IT at https://it.stonybrook.edu/help/kb/understanding-seawulf.

If one has access to SeaWulf, they can access the cluster by typing the following into their terminal

 ssh -X NETID@login.seawulf.stonybrook.edu

Note that the flag -X enables visualizations when running programs in SeaWulf, and NETID is the user's assigned NETID username, used to sign into Blackboard and all associated Stony Brook University domains. The terminal will then prompt the user for the password associated with the entered NETID, and then will give a list of options to use DUO Authentication to grant access. After cooperating with DUO, the user will be in their SeaWulf home directory.

To navigate to the AMS 536 space within SeaWulf, type

 cd /gpfs/projects/AMS536

To access a user's personal scratch space, which is where users can keep files and test programs (but note all files in this directory will be automatically deleted if not modified in over 30 days), type

 cd /gpfs/scratch/NETID

File transfers to/from SeaWulf

To transfer files between the user's local machine and the SeaWulf cluster, this can be done using scp. For example, to transfer the file "file_1" from the directory Directory_1 on the user's local directory to the user's scratch space on SeaWulf, type the following into the user's local terminal window (not connected to SeaWulf):

 scp Directory_1/file_1 NETID@login.seawulf.stonybrook.edu:/gpfs/scratch/NETID

Likewise, to transfer file_1 from the scratch space in SeaWulf back to a local directory, type the following into the local terminal window:

 scp NETID@login.seawulf.stonybrook.edu:/gpfs/scratch/NETID/file_1 /Directory_1

Chimera: Getting Started

Searching for/Downloading files from the PDB

The RCSB Protein Data Bank is a useful tool for searching for and downloading files containing the 3D structures of biological macromolecules. One can search for a molecule by using an exact 4 letter code (such as "6ME2") or by searching for a short description (such as "melatonin with ramelteon complex"). Note that there may be many molecules that fit a given description, so one may need to scroll through results shown after the search to find the one of interest.

(insert picture of PDB w/ search bar, results below after searching)

After finding the page for the structure of interest, one can download it in PDB or Mol2 (or other) format by selecting the dark blue Download Files drop-down menu to the right of the page. Select the desired format, and the download will begin.

First-steps for visualizations in Chimera

Opening files in Chimera

Once one opens Chimera, they are met with a blue screen, and at the bottom right corner are two options called Browse... and Fetch.... If the structure of interest is downloaded already to a local computer, choose Browse...; it is possible to download the structure from the PDB directly in Chimera using the Fetch... option, but note this requires knowledge of the 4-digit code for the structure listed in the PDB.

Visual configurations

Chimera can be a helpful tool for producing publication-ready visualizations of molecules. When Chimera is first opened, the structure will be drawn against a black screen; if a different appearance is desired, click the Presets tab at the top, where there are a number of options. A recommended preset is Publication 1 (silhouette, rounded ribbons).

Selecting substructures

Note that it is possible to select portions of the current structure. To do so, click the Select tab at the top. Options will become visible to select by chain, residue, structure, and so on. One can also hit Control + Click to interactively select a small portion of the structure. If one presses the "up" arrow on their keyboard, this will select a larger substructure containing the initial selection. Hitting "up" again will select more, until the entire structure is selected. Alternatively, hitting the "down" arrow will go backwards in selection scope, until the initial small selection is restored.

Modifying selected substructures

One of the easiest ways to make a visualization in Chimera more clear is through color editing. If one is dealing with a protein-ligand complex, for example, it may be useful to color the two structures differently to contrast the two. To do this, select one of the structures (this can be done by interactively clicking and hitting the "up" arrow, or more easily by selecting by structure), then, while the substructure is selected, click the Actions tab, click Color, and make a color choice.

If one wishes to isolate specific parts of a structure from the rest, it is necessary to understand how to delete substructures. This can be done by selecting a substructure of interest, clicking "Actions", hovering over Atoms/Bonds, then under the shown options clicking delete at the bottom. This will remove the selected substructure, and cannot be immediately undone, so make sure that parts of the substructure have been colored to make their separation as clear as possible to avoid deleting any parts by accident.

Saving sessions

If one wishes to save an entire session, with all configurations (such as visual orientation, focus, coloring, etc. which cannot be captures by just saving as a Mol2 or PDB file), one can go to File -> Save Session As... and this will save the current configurations.

(put in a picture?)

Organizing File Directories

It is imperative to one's ability to keep and access records, for any purpose, by having a detailed and organized directory list wherein any necessary files may be kept. While following this docking tutorial, it is recommended to have the following directories on both the local machine as well as in the user's designated space on SeaWulf:

 000_files
 001_structure
 002_surface_spheres
 003_gridbox
 004_dock
 005_virtual_screen
 006_virtual_screen_mpi
 007_cartesian_min
 008_rescore

Dock Prep

Initial investigation/preparation

First, download the 6ME2 file from the PDB, using either Mol2 or PDB format. Save this in 000_files. Then, open this structure using Chimera. It should look as follows:

[insert image of PL complex]

Inspect the structure of the complex. Note that other non-standard residues besides ramelteon (code "JEV") are present, and will be deleted. Note that to begin the docking preparation process, all solvent must be removed from the file. There is water present in this file, so go to Select -> Residue -> HOH, and click Actions -> Atoms/Bonds -> sphere then Actions -> Atoms/Bonds -> show, which will show the selected as red spheres, shown below

[insert image of HOH selected]

Next, hit Select -> Residue and select all non-standard residues outside of JEV. While these are selected, go to Actions -> Atoms/Bonds -> delete, leaving only the JEV ligand and receptor. Save this as a Mol2 file called 6me2_noH_noDetergents.mol2 in the 000_files directory.

[insert photo of ligand-protein complex without extra residues]

Isolating the ligand and receptor

To prepare the ligand and receptor for docking, it is necessary to have two separate files--one for the prepared ligand, and one for the prepared receptor, which will be used separately at points.

To obtain the isolated ligand, open up the 6me2_noH_noDetergents.mol2 file, and go to Select -> Structure -> protein. Then, to delete the selected receptor, go to Actions -> Atoms/Bonds -> delete. This will leave only the isolated ligand in the session. Save the isolated ligand as a Mol2 file called 6me2_ligand.mol2 in the 001_dockprep folder. Close the current session.

To isolate the receptor, re-open the 6me2_noH_noDetergents.mol2 file, and go to Select -> Structure -> ligand. Then, to delete the selected ligand, go to Actions -> Atoms/Bonds -> delete, leaving only the isolated receptor in the session. Save the isolated receptor as a Mol2 file called 6me2_receptor.mol2 in the 001_dockprep folder.

Prepping the ligand

(show how to add hydrogens/charges in the isolated ligand)

Prepping the receptor

(show how to add hydrogens/charges to the isolated receptor)

Surface Spheres

Generating DMS of receptor

(show how to get the surface/save as a DMS file)

Creating Surface Spheres

(show how to create the INSPH file in the correct directory, how to run sphgen using the INSPH file, show output).

Selecting Relevant Spheres

(show how to run sphere_selector and create necessary input file)

Generating Box/Grid

Box Generation

(show how to create showbox.in, how to run showbox)

Grid Generation

(show how to create the file grid.in, how to run grid)

Docking

Energy Minimization

Rigid Docking

Fixed Anchor Docking

Flexible Docking

Calculating Molecular Footprint

Virtual Screening

Now that we have finished preparing both the ligand and receptor and verified that the ligand can be reasonably docked back into the receptor, we can now begin with the virtual screen. The idea behind virtual screening is that you can take a library of known molecules, dock them to a receptor, and then score how well those molecules are binding to the receptor. In this tutorial, our library is a file named "VS_library_5K.mol2", which contains 5,000 small molecules for us to dock to 6ME2. Let's begin by generating that library file of 5,000 molecules. Cd into the directory "005_virtual_screen" and copy the "VS_library_25K.mol2" file from "/gpfs/projects/AMS536/zzz.programs" into that directory with the following command:

cp /gpfs/projects/AMS536/zzz.programs/VS_library_25K.mol2 ./

Now we need to truncate this library for the top 5,000 molecules. We can do this with a very simple python script. Vi into "truncate_library.py" and write the following to that file (you can delete the comments, but they will give context to what is happening line-by-line if you don't know):

  1. Keep track of how many molecules have been copied over.

moleculenumber=0

  1. Create a list to hold each line being copied.

libraryfile = []

  1. Open the 25K library file, and number each line so it can be read line-by-line.

read_file = open("VS_library_25K.mol2") lines = read_file.readlines()

  1. Add each individual line to the list.

for i, line in enumerate(lines):

       libraryfile.append(line)
  1. Create the new library file so that molecules can be written to it.

with open("VS_library_5K.mol2", "w") as new_f:

  1. Copy over each line in the list for the first 5,000 molecules
       for d in libraryfile:
               new_f.write(d)
               if "MOLECULE" in d:
                       moleculenumber += 1
               if moleculenumber == 5001:
                       break

Once you have finished doing that, save and exit the file. To run this, type the following command into the terminal:

python truncate_library.py

Once you have your library of molecules, the next thing we want to do is write the input file. Vi into virtual.in and write the following to that file:

conformer_search_type flex write_fragment_libraries no user_specified_anchor no limit_max_anchors no min_anchor_size 5 pruning_use_clustering yes pruning_max_orients 1000 pruning_clustering_cutoff 100 pruning_conformer_score_cutoff 100.0 pruning_conformer_score_scaling_factor 1.0 use_clash_overlap no write_growth_tree no use_internal_energy yes internal_energy_rep_exp 12 internal_energy_cutoff 100.0 ligand_atom_file VS_library_5K.mol2 limit_max_ligands no skip_molecule no read_mol_solvation no calculate_rmsd no use_database_filter no orient_ligand yes automated_matching yes receptor_site_file ../02.spheres/selected_spheres.sph max_orientations 1000 critical_points no chemical_matching no use_ligand_spheres no bump_filter no score_molecules yes contact_score_primary no contact_score_secondary no grid_score_primary yes grid_score_secondary no grid_score_rep_rad_scale 1 grid_score_vdw_scale 1 grid_score_es_scale 1 grid_score_grid_prefix ../03.grid/grid multigrid_score_secondary no dock3.5_score_secondary no continuous_score_secondary no footprint_similarity_score_secondary no pharmacophore_score_secondary no descriptor_score_secondary no gbsa_zou_score_secondary no gbsa_hawkins_score_secondary no SASA_score_secondary no amber_score_secondary no minimize_ligand yes minimize_anchor yes minimize_flexible_growth yes use_advanced_simplex_parameters no simplex_max_cycles 1 simplex_score_converge 0.1 simplex_cycle_converge 1.0 simplex_trans_step 1.0 simplex_rot_step 0.1 simplex_tors_step 10.0 simplex_anchor_max_iterations 500 simplex_grow_max_iterations 500 simplex_grow_tors_premin_iterations 0 simplex_random_seed 0 simplex_restraint_min no atom_model all vdw_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/vdw_AMBER_parm99.defn flex_defn_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex.defn flex_drive_file /gpfs/projects/AMS536/zzz.programs/dock6.9_release/parameters/flex_drive.tbl ligand_outfile_prefix virtual.out write_orientations no num_scored_conformers 1 rank_ligands no

Once you have finished doing that, save and exit the file. Now, we could try running the virtual screen similarly to previous steps, but it would be really slow. The best way to do this is to parallelize the tasks as many time as possible, and to do this we are going to use MPI.

Virtual Screening Using MPI

Cartesian Minimization of Docked Molecules

Rescoring Docked Molecules