Virtual Screening Protocol

From Rizzo_Lab
Revision as of 11:16, 30 June 2010 by Pholden

In this document, the current Rizzo group virtual screening protocol is described in detail. This protocol has gone through successive iterations and has been used to select compounds for Flu and HIV gp41 (n=112).

ZINC Database

ZINC is a free, online database of commercially available compounds. In June 2009, the "big" vendor libraries at pH 7 were downloaded and split into chunks of approximately 100,000 molecules each. The ZINC molecules come with AMSOL partial charges already assigned.

Each chunk is then processed in MOE and sorted by number of rotatable bonds in ascending order. The processed chunks are available on BG at ~pholden/RCR/projects_ZINC8/ZINC8, or on ringo at /media/sdb1/pholden/ZINC8.descending.rot.bonds. The lab is currently using the ChemDiv library for its virtual screen.
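The chunk-then-sort step can be illustrated with a minimal Python sketch. It assumes each molecule is already represented as a (ZINC ID, rotatable-bond count) pair, with the counts computed elsewhere (for example exported from MOE); the function chunk_then_sort is hypothetical and is not part of the lab's scripts.

```python
from typing import Iterable, List, Tuple

# (zinc_id, rotatable_bond_count); counts are assumed to be precomputed (e.g. by MOE).
Record = Tuple[str, int]


def chunk_then_sort(records: Iterable[Record], chunk_size: int = 100_000) -> List[List[Record]]:
    """Split records into chunks of `chunk_size`, then sort each chunk by rotatable bonds (ascending)."""
    items = list(records)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    # sorted() is stable, so molecules with equal counts keep their original order.
    return [sorted(chunk, key=lambda rec: rec[1]) for chunk in chunks]


if __name__ == "__main__":
    demo = [("ZINC_A", 7), ("ZINC_B", 2), ("ZINC_C", 5), ("ZINC_D", 0)]
    for number, chunk in enumerate(chunk_then_sort(demo, chunk_size=2), start=1):
        print(number, [zinc_id for zinc_id, _ in chunk])
```

Sorting happens within each chunk, mirroring the order described above (chunks are formed first, then sorted), so each chunk starts with its most rigid molecules.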

Downloading and Preparation of Receptor

Preparation of Reference Molecule for Footprint Rescoring

The reference molecule is the molecule whose footprint guides the selection of compounds during footprint rescoring. It will usually be a native substrate or a known inhibitor. Partial charges should be assigned to the reference molecule unless it already carries them; this can be done with MOE or with antechamber (AmberTools). According to Sudipto's test-set paper, the choice of charge model is not critical.
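Whether a reference mol2 file already carries charges can be checked before deciding to run MOE or antechamber. The following is a minimal sketch, assuming a standard Tripos mol2 file in which the partial charge is the optional 9th column of each @<TRIPOS>ATOM line; the helper names are hypothetical, and treating "all charges zero" as "no charges assigned" is a heuristic.

```python
from typing import List

EXAMPLE_MOL2 = """@<TRIPOS>MOLECULE
lig
 2 1 0 0 0
SMALL
USER_CHARGES

@<TRIPOS>ATOM
      1 C1     0.0000    0.0000    0.0000 C.3     1  LIG1   -0.1200
      2 O1     1.4300    0.0000    0.0000 O.3     1  LIG1   -0.5000
@<TRIPOS>BOND
     1     1     2    1
"""


def read_mol2_charges(text: str) -> List[float]:
    """Return partial charges from the first @<TRIPOS>ATOM section of a mol2 string."""
    charges: List[float] = []
    in_atoms = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            if in_atoms:
                break  # the first ATOM section has ended
            in_atoms = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atoms and stripped:
            fields = stripped.split()
            # The charge is the 9th column; it is optional in the format, so default to 0.0.
            charges.append(float(fields[8]) if len(fields) >= 9 else 0.0)
    return charges


def needs_charges(text: str, tol: float = 1e-6) -> bool:
    """True if all atomic charges are (near) zero, i.e. charges were probably never assigned."""
    return all(abs(q) < tol for q in read_mol2_charges(text))


if __name__ == "__main__":
    print(read_mol2_charges(EXAMPLE_MOL2), needs_charges(EXAMPLE_MOL2))
```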

Once the molecule is prepared, it should be minimized in the receptor using tethered minimization, which restrains it to its starting position.

An example DOCK input file:

ligand_atom_file                                             ligand.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               yes
use_rmsd_reference_mol                                       no
use_database_filter                                          no
orient_ligand                                                no
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
flexible_ligand                                              no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
contact_score_secondary                                      no
grid_score_primary                                           no
grid_score_secondary                                         no
dock3.5_score_primary                                        no
dock3.5_score_secondary                                      no
continuous_score_primary                                     yes
continuous_score_secondary                                   no
cont_score_rec_filename                                      receptor.mol2
cont_score_att_exp                                           6
cont_score_rep_exp                                           12
cont_score_rep_rad_scale                                     1
cont_score_use_dist_dep_dielectric                           yes
cont_score_dielectric                                        4.0
cont_score_vdw_scale                                         1
cont_score_es_scale                                          1
gbsa_zou_score_secondary                                     no
gbsa_hawkins_score_secondary                                 no
amber_score_secondary                                        no
minimize_ligand                                              yes
simplex_max_iterations                                       1000
simplex_tors_premin_iterations                               0
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1.0
simplex_trans_step                                           1.0
simplex_rot_step                                             0.1
simplex_tors_step                                            10.0
simplex_random_seed                                          0
simplex_restraint_min                                        yes
simplex_restraint_coefficient                                10.0
atom_model                                                   all
vdw_defn_file                                                /sbhome0/sudipto/RCR/projects_BNL/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /sbhome0/sudipto/RCR/projects_BNL/parameters/flex.defn
flex_drive_file                                              /sbhome0/sudipto/RCR/projects_BNL/parameters/flex_drive.tbl
ligand_outfile_prefix                                        xtalmin
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no
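Because a DOCK input file is a list of whitespace-separated "parameter value" lines, the settings that make this a tethered, in-place minimization can be sanity-checked with a short script. This is a minimal sketch; the choice of which keys to check and the helper names are assumptions for illustration, not part of DOCK or the lab's tooling.

```python
from typing import Dict, List

# Settings this sketch treats as essential for the tethered, in-place minimization.
EXPECTED = {
    "orient_ligand": "no",              # keep the starting pose; no re-orientation
    "continuous_score_primary": "yes",  # score against the receptor file, not the grid
    "minimize_ligand": "yes",
    "simplex_restraint_min": "yes",     # tether the ligand to its starting coordinates
}


def parse_dock_input(text: str) -> Dict[str, str]:
    """Parse 'key value' lines of a DOCK input file; values are kept as strings."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


def check_tethered_min(params: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the checked keys look right."""
    problems = [
        f"{key} should be {want!r}, found {params.get(key)!r}"
        for key, want in EXPECTED.items()
        if params.get(key) != want
    ]
    if params.get("simplex_restraint_min") == "yes" and "simplex_restraint_coefficient" not in params:
        problems.append("simplex_restraint_coefficient is missing")
    return problems


if __name__ == "__main__":
    sample = """ligand_atom_file          ligand.mol2
orient_ligand             no
continuous_score_primary  yes
minimize_ligand           yes
simplex_restraint_min     yes
simplex_restraint_coefficient 10.0
"""
    print(check_tethered_min(parse_dock_input(sample)))  # [] when everything matches
```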


The version of DOCK currently used for virtual screening is dock6_09-09-08.footprint. The most up-to-date screening script is Trent's modified version, which includes the database filter. The database filter removes molecules with a charge greater than +2 or less than -2, as well as molecules with more than 15 rotatable bonds.
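The filtering rule can be expressed as a simple predicate. This is a minimal sketch of the thresholds stated above, assuming each molecule's net charge and rotatable-bond count are already known; the Ligand class and function names are hypothetical and do not reflect how DOCK implements the filter internally.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Ligand:
    name: str
    charge: int           # net charge of the molecule (assumed integer formal charge)
    rotatable_bonds: int


def passes_database_filter(lig: Ligand, min_charge: int = -2, max_charge: int = 2,
                           max_rot_bonds: int = 15) -> bool:
    """Keep molecules with -2 <= charge <= +2 and at most 15 rotatable bonds."""
    return min_charge <= lig.charge <= max_charge and lig.rotatable_bonds <= max_rot_bonds


def apply_database_filter(ligands: Iterable[Ligand]) -> List[Ligand]:
    return [lig for lig in ligands if passes_database_filter(lig)]


if __name__ == "__main__":
    demo = [Ligand("a", 0, 5), Ligand("b", 3, 2), Ligand("c", -2, 15), Ligand("d", 1, 16)]
    print([lig.name for lig in apply_database_filter(demo)])  # ['a', 'c']
```

Note that the bounds are inclusive: a molecule with a charge of exactly -2 or +2, or exactly 15 rotatable bonds, is kept.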

The script also splits each processed ZINC chunk into two subsets: the first 60,000 molecules and the remainder. Because the chunks are sorted by rotatable bonds, the first 60,000 molecules should dock fairly quickly and finish within the allowed wallclock limit. The second subset (molecule 60,001 onward) contains molecules with more rotatable bonds, each of which takes exponentially more time to dock. As per Sudipto's test-set paper, we also have less confidence in results for molecules with many rotatable bonds. If this second job does not finish, DO NOT restart it: the molecules that were not docked are the ones with the most rotatable bonds. (Should this be revised in light of the database filtering?)
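The split into a fast job and a slow job can be sketched as follows, assuming a chunk is a list of (name, rotatable-bond count) pairs already sorted in ascending order. The helper names are hypothetical; the real script works on mol2 files rather than Python lists.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, int]  # (molecule name, rotatable-bond count); chunk already sorted ascending


def split_chunk(chunk: Sequence[Record], first_n: int = 60_000) -> Tuple[List[Record], List[Record]]:
    """Split a sorted chunk into a fast job (first `first_n` molecules) and a slow job (the rest)."""
    return list(chunk[:first_n]), list(chunk[first_n:])


def max_rot_bonds(records: Sequence[Record]) -> int:
    """Largest rotatable-bond count in a job (0 for an empty job)."""
    return max((rot for _, rot in records), default=0)


if __name__ == "__main__":
    # Toy chunk of 10 molecules, sorted by rotatable bonds; split after 6 instead of 60,000.
    chunk = [(f"mol{i + 1}", rot) for i, rot in enumerate([0, 1, 1, 2, 3, 3, 5, 8, 12, 15])]
    fast, slow = split_chunk(chunk, first_n=6)
    print(len(fast), len(slow), max_rot_bonds(fast), max_rot_bonds(slow))  # 6 4 3 15
```

Because the input is sorted, every molecule in the slow job has at least as many rotatable bonds as any molecule in the fast job, which is why an unfinished second job only loses the most flexible compounds.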

Note that the virtual screen uses the script's default parameter values.

Path for script: ~balius/RCR/projects_BG/screening/run.dock2grid.max_lig_tebmod.csh

Minimization of Poses

In this step, the two multi-mol2 files generated by the virtual screen are recombined and minimized against the receptor using the continuous score rather than the grid. This removes artifacts introduced by the lower-resolution grid, generally by resolving clashes with side chains. The minimization is tethered with a 10 kcal/mol restraint, so the molecule cannot move far from its starting position at any step; as a result, the original DOCK pose is more likely to be preserved.
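The effect of the tether can be illustrated with a generic harmonic restraint. This is a minimal sketch under an assumed functional form (k times the summed squared atomic displacement from the starting pose, with k = 10.0 mirroring the simplex_restraint_coefficient used earlier); DOCK's internal restraint term may be defined differently, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def tether_penalty(current: Sequence[Coord], reference: Sequence[Coord], k: float = 10.0) -> float:
    """Harmonic tether: k times the summed squared displacement of each atom from its start.

    Assumed functional form for illustration; DOCK's internal restraint term may differ.
    """
    if len(current) != len(reference):
        raise ValueError("current and reference must contain the same atoms")
    return k * sum(
        (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        for (x, y, z), (x0, y0, z0) in zip(current, reference)
    )


def restrained_score(raw_score: float, current: Sequence[Coord],
                     reference: Sequence[Coord], k: float = 10.0) -> float:
    """What the minimizer would see: interaction score plus the tether penalty."""
    return raw_score + tether_penalty(current, reference, k)


if __name__ == "__main__":
    start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    moved = [(0.1, 0.0, 0.0), (1.0, 0.0, 0.2)]
    print(tether_penalty(moved, start))  # about 0.5
```

A larger k makes any displacement more costly, so the minimizer relieves clashes with small adjustments instead of drifting to a different pose.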

Path of script: ~pholden/RCR/projects_BG/screening/HIV/run.minoffgrid.csh

Pose Rescoring using Molecular Footprints

In this step, the minimized molecules are rescored using molecular footprints. A footprint is a per-residue decomposition of the interaction energy between the ligand and the receptor. Each molecule's footprint is compared with the footprint of the reference molecule prepared earlier, and the comparison is reduced to a single numerical value. The available comparison methods are Pearson correlation, Euclidean distance, normalized Euclidean distance, and a threshold-based method in which only residues whose energies exceed a given cutoff are compared. The quantities compared are the van der Waals energy (vdw_fp), the electrostatic energy (es_fp), and the number of hydrogen bonds (hb_fp). Combinations of these footprints can also be used.
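The comparison metrics can be sketched in a few lines of Python, assuming each footprint is a list of per-residue values aligned residue-by-residue with the reference. Pearson and Euclidean are the standard definitions; the unit-length normalization and the threshold rule (keep a residue if either footprint reaches the cutoff) are assumptions for illustration and may not match the footprint code in DOCK.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two per-residue footprints (1.0 = identical shape)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant footprint")
    return cov / (sd_a * sd_b)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between footprints (0.0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance after scaling each footprint to unit length (one assumed normalization)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return euclidean([x / norm_a for x in a], [y / norm_b for y in b])


def threshold_residues(a: Sequence[float], b: Sequence[float], cutoff: float = 1.0) -> List[int]:
    """Indices of residues where either footprint has |energy| >= cutoff (hypothetical rule)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x) >= cutoff or abs(y) >= cutoff]


def thresholded_euclidean(a: Sequence[float], b: Sequence[float], cutoff: float = 1.0) -> float:
    """Euclidean distance restricted to the residues selected by threshold_residues."""
    keep = threshold_residues(a, b, cutoff)
    return euclidean([a[i] for i in keep], [b[i] for i in keep])


if __name__ == "__main__":
    ref_vdw = [-2.0, -0.5, -3.0, 0.0]   # made-up per-residue vdW energies (kcal/mol)
    pose_vdw = [-1.5, -0.4, -2.5, -0.1]
    print(pearson(ref_vdw, pose_vdw), euclidean(ref_vdw, pose_vdw))
    print(threshold_residues(ref_vdw, pose_vdw))  # [0, 2]
```

The same functions apply to es_fp or hb_fp vectors; Pearson rewards a matching pattern across residues, while the Euclidean variants also penalize differences in magnitude.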

The script reads in a multi-mol2 file and a reference molecule, and reports the per-residue decomposed energies and the resulting footprint value for each molecule.

For Pearson, values close to 1 are best. For Euclidean, values close to 0 are best.
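Ranking molecules by footprint score therefore needs the sort direction to depend on the metric. A minimal sketch, assuming the footprint values have already been collected into a name-to-score mapping; the metric labels and the function name are hypothetical, not options of the footprint script.

```python
from typing import Dict, List

HIGHER_IS_BETTER = {"pearson": True, "euclidean": False, "normalized_euclidean": False}


def rank_by_footprint(scores: Dict[str, float], metric: str) -> List[str]:
    """Return molecule names ordered best-first for the given footprint metric."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    return sorted(scores, key=scores.__getitem__, reverse=HIGHER_IS_BETTER[metric])


if __name__ == "__main__":
    pearson_scores = {"mol1": 0.21, "mol2": 0.93, "mol3": 0.55}
    print(rank_by_footprint(pearson_scores, "pearson"))  # ['mol2', 'mol3', 'mol1']
```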

Path: ~pholden/RCR/projects_BG/screening/HIV/run.footprint.csh

Clustering using MOE

Compound Selection