Difference between revisions of "2021 DOCK tutorial 2 with PDBID 2ZD1"

From Rizzo_Lab
(Directory and File Setup)
(Directory and File Setup)
 
(55 intermediate revisions by the same user not shown)
Line 4: Line 4:
 
This tutorial will guide the student in performing a structure-based virtual screen of a large number of small molecule compounds (ligands) to assess the effectiveness of each in binding to a protein drug target (receptor) through the use of molecular docking. The student will additionally learn how to use tools to visualize and manipulate the components of a receptor-ligand complex, along with the results (docked ligand poses) of the virtual screen.
 
This tutorial will guide the student in performing a structure-based virtual screen of a large number of small molecule compounds (ligands) to assess the effectiveness of each in binding to a protein drug target (receptor) through the use of molecular docking. The student will additionally learn how to use tools to visualize and manipulate the components of a receptor-ligand complex, along with the results (docked ligand poses) of the virtual screen.
  
The techniques, computational tools, and biological system used in this tutorial are detailed below:
+
The techniques, computer-based tools, and biological system used in this tutorial are detailed below.
  
 
== '''Techniques''' ==
 
== '''Techniques''' ==
  
=== '''Virtual Screening''' ===
+
=== '''Structure-Based Virtual Screening''' ===
  
vf
+
A computational method used in drug discovery that evaluates a pre-defined set of compounds for relative likelihood of binding to a potential drug target, such as a protein receptor. This technique aids researchers in narrowing down a large set of small molecule drug candidates into a relatively smaller number of leads which can be further refined or purchased for subsequent <i>in vitro</i> or <i>in vivo</i> studies.
  
 +
The structure-based flavor of virtual screening utilizes molecular docking to "fit" each ligand to a particular site on the drug target. One or more scoring functions subsequently rank each ligand with respect to predicted binding affinity.
  
 +
'''Background''': Maia, E.H.B., Assis, L.C., Olivera, T.A. et al. Structure-Based Virtual Screening: From Classical to Artificial Intelligence. Front Chem 8, 343 (2020). https://doi.org/10.3389/fchem.2020.00343
  
  
 
=== '''Molecular Docking''' ===
 
=== '''Molecular Docking''' ===
  
https://ringo.ams.stonybrook.edu/~rizzo/StonyBrook/teaching/AMS532_AMS535_AMS536/References/rizzo014.pdf
+
In the context of computer-aided drug design, a technique that computationally samples the interaction states between ligands and drug targets (usually protein receptors), both geometrically and energetically. Given variable degrees of flexibility conferred to the ligand and receptor by the parameters of the simulation, the most favorable conformation of each ligand is identified through the use of different scoring functions, which are traditionally based on energies calculated from molecular mechanics force fields.
  
The DOCKING program specifically computes the interaction energy
+
Docking algorithms can be broadly classified as flexible or rigid. A rigid docking algorithm begins with a fully-formed ligand and allows for sampling of the rigid placement of the given experimental pose in the binding site of the receptor while varying the translational and rotational degrees of freedom of the whole ligand within the three spatial dimensions. Internal angle rotational degrees of freedom are not explicitly sampled with basic rigid docking.
between the receptor and candidate ligand at different orientations. The ligand
 
with the highest activity (usually the ideal candidate for a synthetic ligand or drug)
 
would have the lowest computed energy.  
 
  
 +
Traditional flexible docking, or fixed anchor docking (FAD), starts with a ligand scaffold, which is usually the largest substructure in a ligand, identified as such after the molecule is divided into substructures at its rotatable bonds. By a chosen method, such as Monte Carlo sampling or simulated annealing, multiple poses of this “anchor” substructure are then generated within the receptor binding pocket and scored. The next substructure’s layers of atoms are then added to the most favorable subset of initial anchor poses, and the process repeats until the molecule is fully rebuilt within the receptor. This on-the-fly flexible conformer growth and minimization process is known as “anchor and grow.”
  
Docking algorithms can be broadly classified as flexible (anchor and grow) or
+
A second form of flexible docking, known as "flex docking," allows sampling of all internal degrees of freedom of the ligand. This is generally considered the most accurate of the three approaches, with the least chance of "missing out" on an optimal ligand pose. Across the docking methods described above, increased ligand flexibility provides a broader conformational search and potentially more realistic poses, but it also increases computational cost.
rigid. A rigid docking algorithm begins with a fully-formed ligand (whose structure
 
is obtained from experiment) and allows for sampling of the rigid placement of
 
the given experimental pose in the binding site of the receptor while varying the
 
translational and rotational degrees of freedom of the whole ligand within the
 
three spatial dimensions. Internal angle rotational degrees of freedom are not
 
explicitly sampled with basic rigid docking.
 
  
 +
In this tutorial, we will perform all three docking techniques listed above.
  
 +
'''Background''':
 +
<ul>
 +
<li>Fan, J., Fu, A. & Zhang, L. Progress in molecular docking. Quant Biol 7, 83–89 (2019). https://doi.org/10.1007/s40484-019-0172-y
 +
<li>Meng, Xuan-Yu et al. Molecular docking: a powerful approach for structure-based drug discovery. Current computer-aided drug design vol. 7,2 (2011): 146-57. https://doi.org/10.2174/157340911795677602
 +
</ul>
 +
<p></p>
 +
<p></p>
  
 +
== '''Computer-Based Tools''' ==
  
Traditional flexible docking starts with a ligand scaffold, which is usually the
+
=== '''Protein DataBank (PDB)''' ===
largest substructure in a ligand, identified as such after the molecule is divided
 
into substructures at its rotatable bonds. By a chosen method, such as Monte
 
Carlo sampling or simulated annealing, multiple poses of this “anchor”
 
substructure are then generated within the receptor binding pocket and scored.
 
The next substructure’s layers of atoms are then added to the most favorable
 
subset of initial anchor poses, and the process repeats until all the molecule is
 
fully rebuilt within the receptor. This on-the-fly flexible conformer growth and
 
minimization process is known as “anchor and grow.”
 
  
 +
A publicly-accessible database that houses downloadable 3-D structural information of proteins and other large biomolecules obtained mainly from X-ray crystallography and NMR experiments. This information is provided in a format (.pdb file) that is easily manipulated by molecular visualization and modeling software. We will use the PDB to learn about and obtain the structure data for the reference protein-ligand complex used in this tutorial.
  
 +
'''Organization Home''': https://www.wwpdb.org/
  
<i>Conformal space search</i>
+
'''Search Home''': https://www.rcsb.org/
  
A sequence of complexes of receptors and ligands in specific poses are constructed for subsequent evaluation by a set of scoring functions.
 
  
 +
=== '''UCSF Chimera''' ===
  
 +
<i>Version 1.15 for Windows is used in this tutorial.</i>
  
 +
A computer program that enables visualization and manipulation of molecules using structural data. We will use this tool to prepare our system for the virtual screen and to perform important visual verifications and observations.
  
 +
'''Information / Download''': https://www.cgl.ucsf.edu/chimera/
  
  
 +
=== '''Seawulf Computational Cluster''' ===
  
 +
A high performance computing (HPC) cluster located on the Stony Brook University campus containing 164 compute nodes with up to 40 CPU cores per node. We will perform our molecular docking calculations on this cluster, and in some instances, take advantage of its multiple cores by performing these calculations in parallel.
  
<i>Scoring</i>
+
'''Background''': https://it.stonybrook.edu/help/kb/understanding-seawulf
  
zzz
 
  
 +
=== '''GNU/Linux''' ===
  
 +
<i>CentOS Linux release 7.8.2003 (Core) with bash shell is used in this tutorial.</i>
  
 +
The command-line operating system used to interact with Seawulf. Practice with this environment is highly recommended for those unfamiliar with it before beginning this tutorial.
  
 +
'''Tutorials''': See the "Basic Linux Tools" section of https://ringo.ams.stonybrook.edu/index.php/Rizzo_Lab_Information_and_Tutorials
  
  
=== '''Docking Assessment''' ===
+
=== '''Vi / Vim''' ===
  
zzz
+
<i>Vim version 7.4 is used in this tutorial.</i>
  
== '''Computational Tools''' ==
+
The command-line text editor that is used to create and manipulate the various files needed to perform the virtual screen. As with GNU/Linux, practice with Vim is highly recommended before beginning this tutorial.
  
=== '''Protein DataBank (PDB)''' ===
+
'''Primer''': https://ringo.ams.stonybrook.edu/index.php/Vi
  
A publicly-accessible database that houses downloadable 3-D structural information of proteins and other large biomolecules obtained mainly from X-ray crystallography and NMR experiments. This information is provided in a format that is easily manipulated by molecular visualization and modeling software. We will use the PDB to learn about and obtain the structure data for the reference protein-ligand complex used in this tutorial.
 
  
Organization Home: https://www.wwpdb.org/
+
=== '''DOCK''' ===
  
Search Home: https://www.rcsb.org/
+
<i>Version 6.9 is used in this tutorial.</i>
  
=== '''UCSF Chimera''' ===
+
A computer program that performs molecular docking to predict favorable ligand binding geometries and interactions with a receptor. The program includes several scoring functions to assess the relative ranking of ligands and poses. The functions of DOCK are diverse; a primary use is virtual screening (the subject of this tutorial) of large numbers of molecules obtained from a library or database.
  
<i>Version 1.15 for Windows used in this tutorial.</i>
+
'''Background''':
 +
<ul>
 +
<li>Ewing, T.J., Makino, S., Skillman, A.G. et al. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 15, 411–428 (2001). https://doi.org/10.1023/A:1011115820450
 +
<li>Moustakas, D.T., Lang, P.T., Pegg, S. et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 20, 601–619 (2006). https://doi.org/10.1007/s10822-006-9060-4
 +
</ul>
  
 +
'''Dock 6.9 Users Manual''': http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm
  
 +
== '''2ZD1: Crystal Structure of HIV-1 Reverse Transcriptase in Complex with Rilpivirine''' ==
  
https://www.cgl.ucsf.edu/chimera/
+
In this tutorial, we will use the complex of the HIV-1 reverse transcriptase (RT) receptor with the TMC278 (Rilpivirine) ligand as the basis for our virtual screen. HIV-1 RT is an enzyme used by the HIV-1 virus to produce DNA from its viral RNA template, a process that is essential for replication of the virus within the host. The DNA generated by the reverse transcription process is integrated into the genome of the host and replicates along with endogenous DNA, creating the starting material for the construction of new virions. TMC278 is a diarylpyrimidine (DAPY) nonnucleoside reverse transcriptase inhibitor (NNRTI) that binds to wild type and various mutant HIV-1 RT receptors and is highly successful in  blocking their function. '''2ZD1''' is the PDB code for this complex, with structural data obtained from X-ray diffraction experiments.
 
 
=== '''Seawulf''' ===
 
 
 
https://it.stonybrook.edu/help/kb/understanding-seawulf
 
 
 
 
 
=== '''DOCK''' ===
 
 
 
<i>Version 6.9 used in this tutorial.</i>
 
  
is one of the many tools available to computational biologists that predicts ligand binding geometries and interactions. The functions of DOCK 6.9 are diverse and have several general applications. A primary use of the program involves a virtual screening of thousands of molecules for an intended purpose. These purposes can include database screenings for molecules that inhibit enzyme activity, bind a particular protein, or even bind to larger complexes. As more versions of the program are released, new features are added such as the inclusion of solvation and receptor flexibility considerations in its calculations.
+
'''PDB Information''': https://www.rcsb.org/structure/2zd1
 
 
== '''2ZD1: Crystal Structure of HIV-1 Reverse Transcriptase in Complex with Rilpivirine''' ==
 
 
 
, A Non-nucleoside RT Inhibitor
 
'''2ZD1''' is the PDB code for the catalytic complex between human HMG-CoA reductase (HMGR) and Simvastatin. HMGR is considered a rate-controlling enzyme in the metabolic pathway responsible for the biosynthesis of cholesterol. Inhibitors of HMGR, known as statins, are often prescribed as treatment therapies for high cholesterol patients. While statins inhibit the catalytic effect of HMGR, they also provide other positive biochemical effects such as the stimulation of bone growth and anti-inflammatory responses. Studying statin binding using this complex can potentially aid in the discovery of drugs capable of producing these off-target effects.
 
  
 
= '''Directory and File Setup''' =
 
= '''Directory and File Setup''' =
Before beginning the docking procedure, we will create a set of directories to store the various files we will be generating in an organized manner. We will also download the initial PDB file for the 2ZD1 complex from the RCSB PDB.
+
Before beginning the virtual screen/molecular docking process, we will create a set of directories to store the various files we will be generating in an organized manner. We will also download the initial .pdb file for the 2ZD1 complex from the RCSB PDB.
  
 
Notes:
 
Notes:
Line 115: Line 111:
 
<li><i>Italics</i> are used in directory and file names to signify terms that may differ for each student.
 
<li><i>Italics</i> are used in directory and file names to signify terms that may differ for each student.
 
</ul>
 
</ul>
 +
 +
Procedure:
  
 
<ol>
 
<ol>
Line 121: Line 119:
 
<p><code> cd /gpfs/projects/AMS536/<i>year</i>/students/<i>name</i> </code></p>
 
<p><code> cd /gpfs/projects/AMS536/<i>year</i>/students/<i>name</i> </code></p>
 
<li>Create a directory to store all files used and generated in this tutorial:
 
<li>Create a directory to store all files used and generated in this tutorial:
<p><code> mkdir dock_vs_<i>2ZD1</i></code></p>
+
<p><code> mkdir <i>2zd1</i>_dock_vs</code></p>
 
<li>Navigate into this new directory:
 
<li>Navigate into this new directory:
<p><code> cd dock_vs_<i>2ZD1</i></code></p>
+
<p><code> cd <i>2zd1</i>_dock_vs</code></p>
<li>Create all directories required for this tutorial:
+
<li>Create all subdirectories required for this tutorial:
<p><code> mkdir 01_structure&nbsp;&nbsp;&nbsp;02_surface_spheres&nbsp;&nbsp;&nbsp;03_grid_box&nbsp;&nbsp;&nbsp;04_dock&nbsp;&nbsp;&nbsp;05_virtual_screen&nbsp;&nbsp;&nbsp;06_virtual_screen_mpi&nbsp;&nbsp;&nbsp;07_cartesian_min&nbsp;&nbsp;&nbsp;08_rescore </code></p>
+
<p><code> mkdir 01_structure&nbsp;02_surface_spheres&nbsp;03_grid_box&nbsp;04_dock&nbsp;05_virtual_screen&nbsp;06_virtual_screen_mpi&nbsp;07_cartesian_min&nbsp;08_rescore</code></p>
<li> Download the PDB file to a local directory  
+
<li> Download the <i>2zd1</i>.pdb file from the PDB to a local directory  
<li> Copy the PDB file to <code>01_structure</code> using <code>scp</code> or <code>rsync</code>
+
<li> Copy the PDB file to <code>01_structure/</code> using <code>scp</code> or <code>rsync</code>
 
</ol>
 
</ol>
  
Line 134: Line 132:
 
<li>Confirm the existence of all newly-created directories by executing the <code>ls</code> command
 
<li>Confirm the existence of all newly-created directories by executing the <code>ls</code> command
 
<li>Set up a dedicated local directory to store files that will be utilized in Chimera
 
<li>Set up a dedicated local directory to store files that will be utilized in Chimera
<li>Set up an alias in <code>.bashrc</code> for a <code>scp</code> or <code>rsync</code> command that will allow for easy transfer of files between the relevant local and Seawulf directories
+
<li>Set up an alias in <code>.bashrc</code> for an <code>scp</code> or <code>rsync</code> command that will allow for easy transfer of files between the relevant local and remote Seawulf directories
 
</ul>
 
</ul>
  
 
= '''Receptor and Ligand Preparation''' =
 
= '''Receptor and Ligand Preparation''' =
We will extract the individual receptor and ligand structure from our PDB file and add hydrogens and charges as appropriate.  
+
We will extract the individual receptor and ligand structures from our .pdb file and add hydrogens and charges to each as appropriate.  
 
== '''Structure Verification''' ==
 
== '''Structure Verification''' ==
 
We will verify that the protein structure downloaded from the PDB aligns with the description in the submitted experimental paper.
 
We will verify that the protein structure downloaded from the PDB aligns with the description in the submitted experimental paper.
Line 144: Line 142:
 
<li> Open the Chimera application.   
 
<li> Open the Chimera application.   
 
<li> Open the PDB file in Chimera:  
 
<li> Open the PDB file in Chimera:  
<p><code> File -> Open, <i>navigate to PDB file and click</i>Open </code></p>
+
<p><code> File -> Open, <i>navigate to PDB file and click</i>&nbsp;Open</code></p>
You should now be able to visualize the complex as shown below  
+
You should now be able to visualize the complex as shown below
 +
</ol>
 
[[File:2zd1 rec ligand.png|thumb|center|800px]]
 
[[File:2zd1 rec ligand.png|thumb|center|800px]]
 +
 +
== '''Receptor Preparation''' ==
  
 
<li> Begin to prepare the receptor by deleting all non-receptor atoms, including those in water molecules, salts, and the ligand...
 
<li> Begin to prepare the receptor by deleting all non-receptor atoms, including those in water molecules, salts, and the ligand...
Line 155: Line 156:
 
<code>Actions -> Atoms/Bonds -> Delete </code></p>
 
<code>Actions -> Atoms/Bonds -> Delete </code></p>
 
<li> Save the prepared receptor locally in .mol2 format.  
 
<li> Save the prepared receptor locally in .mol2 format.  
<p><code> File -> Save mol2... -> "2ZD1_rec_nH.mol2" </code></p>
+
<p><code> File -> Save mol2... -> "<i>2zd1</i>_rec_nH.mol2" </code></p>
 
<li> Copy the mol2 file to the <code>01_structure</code> directory using <code>scp</code> or <code>rsync</code>
 
<li> Copy the mol2 file to the <code>01_structure</code> directory using <code>scp</code> or <code>rsync</code>
 
</ol>
 
</ol>
Line 165: Line 166:
 
<p><code> Select -> Residue -> All-nonstandard
 
<p><code> Select -> Residue -> All-nonstandard
 
Actions -> Atoms/Bonds -> Delete. </code>
 
Actions -> Atoms/Bonds -> Delete. </code>
<li> Save the prepared receptor locally in mol2 format.  
+
<li> Save the prepared receptor locally in .mol2 format.  
<p><code> File -> Save Mol2... -> "<i>2ZD1</i>_rec_nH.mol2"
+
<p><code> File -> Save Mol2... -> "<i>2zd1</i>_rec_nH.mol2"
 
<li> Copy the mol2 file to <code> 01_structure</code> using <code> scp or rsync</code>
 
<li> Copy the mol2 file to <code> 01_structure</code> using <code> scp or rsync</code>
 
</ol>
 
</ol>
  
= '''test''' =
+
== '''Ligand Preparation''' ==
 +
 
 +
= '''Surface Sphere Generation and Selection''' =
 +
 
 +
== '''File Preparation''' ==
 +
 
 +
== '''Sphere Generation''' ==
 +
 
 +
== '''Sphere Selection''' ==
 +
 
 +
= '''Cutoff Box and Energy Grid Generation''' =
 +
 
 +
== '''Cutoff Box Generation''' ==
 +
 
 +
== '''Grid Generation''' ==
 +
 
 +
= '''Single-Molecule Docking: Pose Reproduction''' =
 +
 
 +
== '''Energy Minimization''' ==
 +
 
 +
== '''Rigid Docking''' ==
 +
 
 +
== '''Fixed Anchor Docking''' ==
 +
 
 +
== '''Flex Docking''' ==
 +
 
 +
= '''Virtual Screening''' =
 +
 
 +
== '''Test Run''' ==
 +
 
 +
== '''Parallel Virtual Screening''' ==
 +
 
 +
== '''Cartesian Minimization''' ==
 +
 
 +
== '''Cartesian Minimization''' ==

Latest revision as of 23:25, 5 April 2021

Introduction

Learning Goals for this Tutorial

This tutorial will guide the student in performing a structure-based virtual screen of a large number of small molecule compounds (ligands) to assess the effectiveness of each in binding to a protein drug target (receptor) through the use of molecular docking. The student will additionally learn how to use tools to visualize and manipulate the components of a receptor-ligand complex, along with the results (docked ligand poses) of the virtual screen.

The techniques, computer-based tools, and biological system used in this tutorial are detailed below.

Techniques

Structure-Based Virtual Screening

A computational method used in drug discovery that evaluates a pre-defined set of compounds for relative likelihood of binding to a potential drug target, such as a protein receptor. This technique aids researchers in narrowing down a large set of small molecule drug candidates into a relatively smaller number of leads which can be further refined or purchased for subsequent in vitro or in vivo studies.

The structure-based flavor of virtual screening utilizes molecular docking to "fit" each ligand to a particular site on the drug target. One or more scoring functions subsequently rank each ligand with respect to predicted binding affinity.

Background: Maia, E.H.B., Assis, L.C., Olivera, T.A. et al. Structure-Based Virtual Screening: From Classical to Artificial Intelligence. Front Chem 8, 343 (2020). https://doi.org/10.3389/fchem.2020.00343
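
The ranking step described above can be sketched on the command line. This is a minimal illustration only: the file scores.txt, its two-column layout (ligand name, then score), and the helper name rank_ligands are all assumptions made for the example and are not produced by any tool in this tutorial. It also assumes that a more negative score means a more favorable predicted interaction.

```shell
#!/usr/bin/env bash
# Hypothetical input: one ligand per line, "name score", e.g. "ligA -42.1".
# More negative scores are treated as more favorable (assumption).

rank_ligands() {
    # $1 = score file, $2 = how many top-ranked ligands to print (default 10)
    LC_ALL=C sort -k2,2n "$1" | head -n "${2:-10}"
}
```

For example, `rank_ligands scores.txt 5` would print the five ligands with the lowest scores in such a file.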


Molecular Docking

In the context of computer-aided drug design, a technique that computationally samples the interaction states between ligands and drug targets (usually protein receptors), both geometrically and energetically. Given variable degrees of flexibility conferred to the ligand and receptor by the parameters of the simulation, the most favorable conformation of each ligand is identified through the use of different scoring functions, which are traditionally based on energies calculated from molecular mechanics force fields.
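
As a rough illustration of what a force-field-based scoring function looks like, a generic nonbonded interaction energy between ligand atoms i and receptor atoms j can be written as the sum of a Lennard-Jones (van der Waals) term and a Coulombic (electrostatic) term. The symbols below are generic and introduced here only for illustration: r_ij is the interatomic distance, A_ij and B_ij are repulsive and attractive van der Waals parameters, q_i and q_j are partial charges, D is a dielectric function, and 332 converts the electrostatic term to kcal/mol. The exact functional form and parameters used by any particular program should be taken from its manual.

```latex
E_{\mathrm{int}} = \sum_{i \in \mathrm{lig}} \sum_{j \in \mathrm{rec}}
  \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}
       + 332\,\frac{q_i q_j}{D\, r_{ij}} \right)
```

A lower (more negative) value of this energy corresponds to a more favorable predicted interaction, which is why poses and ligands are ranked from lowest to highest score.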

Docking algorithms can be broadly classified as flexible or rigid. A rigid docking algorithm begins with a fully-formed ligand and allows for sampling of the rigid placement of the given experimental pose in the binding site of the receptor while varying the translational and rotational degrees of freedom of the whole ligand within the three spatial dimensions. Internal angle rotational degrees of freedom are not explicitly sampled with basic rigid docking.

Traditional flexible docking, or fixed anchor docking (FAD), starts with a ligand scaffold, which is usually the largest substructure in a ligand, identified as such after the molecule is divided into substructures at its rotatable bonds. By a chosen method, such as Monte Carlo sampling or simulated annealing, multiple poses of this “anchor” substructure are then generated within the receptor binding pocket and scored. The next substructure’s layers of atoms are then added to the most favorable subset of initial anchor poses, and the process repeats until the molecule is fully rebuilt within the receptor. This on-the-fly flexible conformer growth and minimization process is known as “anchor and grow.”

A second form of flexible docking, known as "flex docking," allows sampling of all internal degrees of freedom of the ligand. This is generally considered the most accurate of the three approaches, with the least chance of "missing out" on an optimal ligand pose. Across the docking methods described above, increased ligand flexibility provides a broader conformational search and potentially more realistic poses, but it also increases computational cost.

In this tutorial, we will perform all three docking techniques listed above.

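Background:

  • Fan, J., Fu, A. & Zhang, L. Progress in molecular docking. Quant Biol 7, 83–89 (2019). https://doi.org/10.1007/s40484-019-0172-y
  • Meng, Xuan-Yu et al. Molecular docking: a powerful approach for structure-based drug discovery. Current computer-aided drug design vol. 7,2 (2011): 146-57. https://doi.org/10.2174/157340911795677602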

Computer-Based Tools

Protein DataBank (PDB)

A publicly-accessible database that houses downloadable 3-D structural information of proteins and other large biomolecules obtained mainly from X-ray crystallography and NMR experiments. This information is provided in a format (.pdb file) that is easily manipulated by molecular visualization and modeling software. We will use the PDB to learn about and obtain the structure data for the reference protein-ligand complex used in this tutorial.

Organization Home: https://www.wwpdb.org/

Search Home: https://www.rcsb.org/


UCSF Chimera

Version 1.15 for Windows is used in this tutorial.

A computer program that enables visualization and manipulation of molecules using structural data. We will use this tool to prepare our system for the virtual screen and to perform important visual verifications and observations.

Information / Download: https://www.cgl.ucsf.edu/chimera/


Seawulf Computational Cluster

A high-performance computing (HPC) cluster located on the Stony Brook University campus containing 164 compute nodes with up to 40 CPU cores per node. We will perform our molecular docking calculations on this cluster and, in some instances, take advantage of its multiple cores by performing these calculations in parallel.

Background: https://it.stonybrook.edu/help/kb/understanding-seawulf


GNU/Linux

CentOS Linux release 7.8.2003 (Core) with bash shell is used in this tutorial.

The command-line operating system used to interact with Seawulf. Practice with this environment is highly recommended for those unfamiliar with it before beginning this tutorial.

Tutorials: See the "Basic Linux Tools" section of https://ringo.ams.stonybrook.edu/index.php/Rizzo_Lab_Information_and_Tutorials


Vi / Vim

Vim version 7.4 is used in this tutorial.

The command-line text editor that is used to create and manipulate the various files needed to perform the virtual screen. As with GNU/Linux, practice with Vim is highly recommended before beginning this tutorial.

Primer: https://ringo.ams.stonybrook.edu/index.php/Vi


DOCK

Version 6.9 is used in this tutorial.

A computer program that performs molecular docking to predict favorable ligand binding geometries and interactions with a receptor. The program includes several scoring functions to assess the relative ranking of ligands and poses. The functions of DOCK are diverse; a primary use is virtual screening (the subject of this tutorial) of large numbers of molecules obtained from a library or database.

Background:

  • Ewing, T.J., Makino, S., Skillman, A.G. et al. DOCK 4.0: Search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 15, 411–428 (2001). https://doi.org/10.1023/A:1011115820450
  • Moustakas, D.T., Lang, P.T., Pegg, S. et al. Development and validation of a modular, extensible docking program: DOCK 5. J Comput Aided Mol Des 20, 601–619 (2006). https://doi.org/10.1007/s10822-006-9060-4

Dock 6.9 Users Manual: http://dock.compbio.ucsf.edu/DOCK_6/dock6_manual.htm

2ZD1: Crystal Structure of HIV-1 Reverse Transcriptase in Complex with Rilpivirine

In this tutorial, we will use the complex of the HIV-1 reverse transcriptase (RT) receptor with the TMC278 (Rilpivirine) ligand as the basis for our virtual screen. HIV-1 RT is an enzyme used by the HIV-1 virus to produce DNA from its viral RNA template, a process that is essential for replication of the virus within the host. The DNA generated by the reverse transcription process is integrated into the genome of the host and replicates along with endogenous DNA, creating the starting material for the construction of new virions. TMC278 is a diarylpyrimidine (DAPY) nonnucleoside reverse transcriptase inhibitor (NNRTI) that binds to wild type and various mutant HIV-1 RT receptors and is highly successful in blocking their function. 2ZD1 is the PDB code for this complex, with structural data obtained from X-ray diffraction experiments.

PDB Information: https://www.rcsb.org/structure/2zd1

Directory and File Setup

Before beginning the virtual screen/molecular docking process, we will create a set of directories to store the various files we will be generating in an organized manner. We will also download the initial .pdb file for the 2ZD1 complex from the RCSB PDB.

Notes:

  • The directory and file nomenclature used throughout the tutorial is not required but is recommended for most efficient use of subsequently-provided commands and scripts.
  • Italics are used in directory and file names to signify terms that may differ for each student.

Procedure:

  1. Log in to Seawulf
  2. Navigate to your personal student directory for AMS 536:

    cd /gpfs/projects/AMS536/year/students/name

  3. Create a directory to store all files used and generated in this tutorial:

    mkdir 2zd1_dock_vs

  4. Navigate into this new directory:

    cd 2zd1_dock_vs

  5. Create all subdirectories required for this tutorial:

    mkdir 01_structure 02_surface_spheres 03_grid_box 04_dock 05_virtual_screen 06_virtual_screen_mpi 07_cartesian_min 08_rescore

  6. Download the 2zd1.pdb file from the PDB to a local directory
  7. Copy the PDB file to 01_structure/ using scp or rsync
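
Steps 3 through 5 can also be collected into one short script. The sketch below assumes a variable BASE that stands in for your personal student directory; when BASE is not set it falls back to a temporary directory, so the script can be tried safely before running it on Seawulf. The variable names BASE and PROJECT are introduced here for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# BASE is a placeholder for /gpfs/projects/AMS536/year/students/name (assumption).
# If it is not set, a throwaway temporary directory is used instead.
BASE="${BASE:-$(mktemp -d)}"
PROJECT="$BASE/2zd1_dock_vs"

# Create the top-level tutorial directory and move into it.
mkdir -p "$PROJECT"
cd "$PROJECT"

# Create all subdirectories; -p makes the script safe to re-run.
mkdir -p 01_structure 02_surface_spheres 03_grid_box 04_dock \
         05_virtual_screen 06_virtual_screen_mpi 07_cartesian_min 08_rescore

# List the numbered subdirectories to confirm that all eight exist.
ls -1d 0[1-8]_*
```

Using `mkdir -p` rather than plain `mkdir` means that running the script a second time does not produce an error for directories that already exist.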

Recommendations:

  • Confirm the existence of all newly-created directories by executing the ls command
  • Set up a dedicated local directory to store files that will be utilized in Chimera
  • Set up an alias in .bashrc for an scp or rsync command that will allow for easy transfer of files between the relevant local and remote Seawulf directories
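
One possible way to set up the transfer shortcut is sketched below. Because a plain alias cannot take arguments in the middle of a command, the sketch uses small shell functions instead. Everything here is a placeholder to be replaced with your own values: the user name netid, the login host (assumed here to be login.seawulf.stonybrook.edu; confirm it against the Seawulf documentation), the remote path, and the function names sw_push and sw_pull.

```shell
# Lines to append to ~/.bashrc on the local machine. All values are placeholders.
SW_USER="netid"                           # hypothetical user name
SW_HOST="login.seawulf.stonybrook.edu"    # assumed login node; verify before use
SW_DIR="/gpfs/projects/AMS536/year/students/name/2zd1_dock_vs"

# Push one or more local files to a tutorial subdirectory on Seawulf.
# Usage: sw_push 01_structure 2zd1.pdb
sw_push() {
    subdir="$1"
    shift
    rsync -av "$@" "${SW_USER}@${SW_HOST}:${SW_DIR}/${subdir}/"
}

# Pull one remote file (path relative to SW_DIR) into the current local directory.
# Usage: sw_pull 01_structure/2zd1_rec_nH.mol2
sw_pull() {
    rsync -av "${SW_USER}@${SW_HOST}:${SW_DIR}/$1" .
}
```

After editing `.bashrc`, open a new terminal or run `source ~/.bashrc` so that the functions become available.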

Receptor and Ligand Preparation

We will extract the individual receptor and ligand structures from our .pdb file and add hydrogens and charges to each as appropriate.

Structure Verification

We will verify that the protein structure downloaded from the PDB aligns with the description in the submitted experimental paper.

  1. Open the Chimera application.
  2. Open the PDB file in Chimera:

    File -> Open, navigate to PDB file and click Open

    You should now be able to visualize the complex as shown below.

2zd1 rec ligand.png
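
As a command-line complement to the visual check, it can help to list which chains and non-standard (HETATM) residues the .pdb file contains before deleting anything in Chimera. The sketch below relies on the fixed-width layout of PDB coordinate records (residue name in columns 18-20, chain ID in column 22); the function names pdb_chains and pdb_hetero are hypothetical helpers introduced for this example.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical. Column positions follow the
# fixed-width PDB format (residue name in columns 18-20, chain ID in column 22).

# Print each distinct chain ID found in ATOM records.
pdb_chains() {
    awk '/^ATOM/ { print substr($0, 22, 1) }' "$1" | sort -u
}

# Print each HETATM residue name with its atom count (waters, ions, ligand).
pdb_hetero() {
    awk '/^HETATM/ { n[substr($0, 18, 3)]++ }
         END { for (r in n) print r, n[r] }' "$1" | sort
}
```

Running `pdb_chains 2zd1.pdb` and `pdb_hetero 2zd1.pdb` in `01_structure/` gives a quick inventory that can be compared with what is seen in Chimera and with the description on the PDB entry page.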

Receptor Preparation

  • Begin to prepare the receptor by deleting all non-receptor atoms, including those in water molecules, salts, and the ligand...

    Select -> Residue -> all nonstandard
    Actions -> Atoms/Bonds -> Delete

    ...and any full receptor chains that are not of interest in the virtual screen and/or do not coordinate with the ligand.

    Select -> Chain -> B
    Actions -> Atoms/Bonds -> Delete

  • Save the prepared receptor locally in .mol2 format.

    File -> Save mol2... -> "2zd1_rec_nH.mol2"

  • Copy the mol2 file to the 01_structure directory using scp or rsync

    Note that the file we have just created does not contain any hydrogen atoms. We will now generate a second receptor file with hydrogens and charges added to the molecule.
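
    The claim that the saved file contains no hydrogens can be checked on the command line. The sketch below counts the atoms listed in the @<TRIPOS>ATOM block of a mol2 file and, separately, those whose atom type (sixth column) is hydrogen. The function names are hypothetical, and the check assumes the standard whitespace-separated Tripos mol2 atom records.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical. Assumes standard Tripos mol2
# atom records (id, name, x, y, z, type, ...) inside the @<TRIPOS>ATOM block.

# Count all atoms in the @<TRIPOS>ATOM block.
mol2_atom_count() {
    awk '{ sub(/\r$/, "") }                                   # tolerate Windows line endings
         /^@<TRIPOS>/ { in_atoms = ($1 == "@<TRIPOS>ATOM"); next }
         in_atoms && NF >= 6 { n++ }
         END { print n + 0 }' "$1"
}

# Count atoms whose type is hydrogen ("H", or "H." followed by a subtype).
mol2_hydrogen_count() {
    awk '{ sub(/\r$/, "") }
         /^@<TRIPOS>/ { in_atoms = ($1 == "@<TRIPOS>ATOM"); next }
         in_atoms && $6 ~ /^H(\.|$)/ { n++ }
         END { print n + 0 }' "$1"
}
```

    For the file saved above, `mol2_hydrogen_count 2zd1_rec_nH.mol2` would be expected to print 0 if no hydrogens are present.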

  • Begin to prepare the receptor by deleting all non-receptor atoms. This also includes deleting any chains in the receptor that are not of interest and/or do not coordinate with the ligand.

    Select -> Chain -> (Undesired chain)
    Actions -> Atoms/Bonds -> Delete

    Select -> Residue -> all nonstandard
    Actions -> Atoms/Bonds -> Delete

  • Save the prepared receptor locally in .mol2 format.

    File -> Save Mol2... -> "2zd1_rec_nH.mol2"

  • Copy the mol2 file to 01_structure using scp or rsync

Ligand Preparation

Surface Sphere Generation and Selection

File Preparation

Sphere Generation

Sphere Selection

Cutoff Box and Energy Grid Generation

Cutoff Box Generation

Grid Generation

Single-Molecule Docking: Pose Reproduction

Energy Minimization

Rigid Docking

Fixed Anchor Docking

Flex Docking

Virtual Screening

Test Run

Parallel Virtual Screening

Cartesian Minimization