2024 DOCK GA tutorial 1 with 1NDV

From Rizzo_Lab
Latest revision as of 06:46, 6 May 2024

I. Introduction

A genetic algorithm (GA) is a form of de novo design. It grows new molecules through an evolutionary process that includes crossover of parent molecules, mutation, and a "natural selection" step. Crossover can occur when two parent molecules have rotatable bonds in the same region of the binding site: fragments are exchanged between the parents, creating two new offspring. Four types of mutation can occur during genetic growth: deletion of an existing fragment, addition of a new fragment, substitution of a terminal fragment, and replacement of a linker fragment. After crossover and mutation, the new molecules are scored and a selected number are carried over into the next generation. In this tutorial we will use the elitism selection method, so only the top-scoring molecules move on to the next generation.
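
To make the generational cycle described above concrete, here is a toy sketch in Python. It is not DOCK's implementation: a "molecule" is just a list of integer fragment ids, the score is invented (lower is better), the mutation types are reduced to addition, deletion, and substitution, and every name (`score`, `crossover`, `mutate`, `evolve`) is hypothetical. Parents and offspring are pooled before selection, which is only one possible elitism variant.

```python
import random

# Toy stand-ins only: none of this is DOCK code.
FRAGMENTS = list(range(10))

def score(mol):
    # Hypothetical score: distance of the fragment sum from a target of 20.
    return abs(sum(mol) - 20)

def crossover(a, b, rng):
    # Cut each parent once and swap the tails, giving two offspring.
    cut_a = rng.randint(1, len(a) - 1) if len(a) > 1 else 1
    cut_b = rng.randint(1, len(b) - 1) if len(b) > 1 else 1
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]

def mutate(mol, rng):
    mol = list(mol)
    kind = rng.choice(["addition", "deletion", "substitution"])
    if kind == "addition":
        mol.append(rng.choice(FRAGMENTS))
    elif kind == "deletion" and len(mol) > 1:
        mol.pop(rng.randrange(len(mol)))
    else:
        # Substitution (also used when a one-fragment molecule cannot shrink).
        mol[rng.randrange(len(mol))] = rng.choice(FRAGMENTS)
    return mol

def evolve(parents, generations=20, keep=10, mut_rate=0.7, seed=0):
    rng = random.Random(seed)
    population = [list(p) for p in parents]
    for _ in range(generations):
        offspring = []
        for _ in range(len(population)):
            a, b = rng.sample(population, 2)
            offspring.extend(crossover(a, b, rng))
        offspring = [mutate(m, rng) if rng.random() < mut_rate else m
                     for m in offspring]
        # Elitism: keep only the best-scoring molecules for the next round.
        population = sorted(population + offspring, key=score)[:keep]
    return population

start = [[1, 2], [3, 4], [5], [9, 9, 9]]
best = evolve(start)[0]
print(best, score(best))
```

Because the best molecules are always retained in this variant, the best score in the population can never get worse from one generation to the next.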

PDB 1NDV

II. Fragment Library Generation for 1NDV

For the GA to work, we first need to produce a large and diverse fragment library. Make a directory called "1NDV_GA" in which to run the GA. Enter that directory and make a new directory named "fraglib"; this is where we will build the fragment library. Create an empty input file for the fragment library and pass it to dock6 so that the parameters can be entered interactively, using the following commands.

touch 1NDV.fraglib
dock6 -i 1NDV.fraglib

Now answer the prompts using the responses below as a guide.

conformer_search_type                                        flex
write_fragment_libraries                                     yes
fragment_library_prefix                                      1ndv.fraglib
fragment_library_freq_cutoff                                 1
fragment_library_sort_method                                 freq
fragment_library_trans_origin                                no
use_internal_energy                                          no
ligand_atom_file                                             /gpfs/projects/AMS536/2024/students/group_2_1NDV/dock_screen/001.structure/1ndv_ligand_only_H_charge.mol2
limit_max_ligands                                            no
skip_molecule                                                no
read_mol_solvation                                           no
calculate_rmsd                                               no
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              no
atom_model                                                   all
vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex.defn
flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6.10/parameters/flex_drive.tbl
ligand_outfile_prefix                                        1ndv.frag.output
write_orientations                                           no
num_scored_conformers                                        1
rank_ligands                                                 no

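Now combine the torsion environment file written in the previous step with the default torsion environment table distributed with DOCK. The file is named after fragment_library_prefix, so with the prefix used above it should be 1ndv.fraglib_torenv.dat. The Python script combine_torenv.py performs the merge with the following command. The combined table will be used for the GA; the input in the next section expects it at ../fraglib/unique_full_sorted_fraglib.dat.

python ${DOCK_DIR}/bin/combine_torenv.py 1ndv.fraglib_torenv.dat ${DOCK_DIR}/parameters/fraglib_torenv.dat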

III. Performing GA

It is now time to set up the GA. Make a new directory called "1NDV_GA_results" to store the results (alongside "fraglib", since the input below refers to ../fraglib). Enter this new directory and create a new input file for the GA called 1NDV_GA.in. As before, we recommend entering the parameters interactively: give dock6 the empty input file and answer the prompts using the values below as a guide. Settings such as the number of generations, the selection method, and the allowed mutations can be adjusted to suit your needs. Do not submit this job yet! We will create a Slurm job in the next step.

conformer_search_type                                        genetic
ga_molecule_file                                             /Path/1ndv_ligand_only_H_charge.mol2
ga_utilities                                                 no
ga_fraglib_scaffold_file                                     /gpfs/projects/rizzo/zzz.programs/dock6.10/parameters/fraglib_ga_scaffold.mol2
ga_fraglib_linker_file                                       /gpfs/projects/rizzo/zzz.programs/dock6.10/parameters/fraglib_linker.mol2
ga_fraglib_sidechain_file                                    /gpfs/projects/rizzo/zzz.programs/dock6.10/parameters/fraglib_sidechain.mol2
ga_torenv_table                                              ../fraglib/unique_full_sorted_fraglib.dat
ga_max_generations                                           100
ga_xover_on                                                  yes
ga_xover_sampling_method_rand                                yes
ga_xover_max                                                 150
ga_bond_tolerance                                            0.5
ga_angle_cutoff                                              0.14
ga_check_overlap                                             no
ga_mutate_addition                                           yes
ga_mutate_deletion                                           yes
ga_mutate_substitution                                       yes
ga_mutate_replacement                                        yes
ga_mutate_parents                                            yes
ga_pmut_rate                                                 0.3
ga_omut_rate                                                 0.7
ga_max_mut_cycles                                            5
ga_mut_sampling_method                                       rand
ga_num_random_picks                                          10
ga_max_root_size                                             5
ga_energy_cutoff                                             100
ga_heur_unmatched_num                                        2
ga_heur_matched_rmsd                                         2
ga_constraint_mol_wt                                         550
ga_constraint_rot_bon                                        10
ga_constraint_H_accept                                       10
ga_constraint_H_don                                          5
ga_constraint_formal_charge                                  4
ga_ensemble_size                                             200
ga_selection_method                                          elitism
ga_elitism_combined                                          no
ga_elitism_option                                            max
ga_max_num_gen_with_no_crossover                             1000
ga_name_identifier                                           ga
ga_output_prefix                                             1NDV_GA_output
use_internal_energy                                          yes
internal_energy_rep_exp                                      12
internal_energy_cutoff                                       100
use_database_filter                                          no
orient_ligand                                                no
bump_filter                                                  no
score_molecules                                              yes
contact_score_primary                                        no
grid_score_primary                                           no
multigrid_score_primary                                      no
dock3.5_score_primary                                        no
continuous_score_primary                                     no
footprint_similarity_score_primary                           no
pharmacophore_score_primary                                  no
hbond_score_primary                                          no
internal_energy_score_primary                                no
descriptor_score_primary                                     yes
descriptor_use_grid_score                                    yes
descriptor_use_pharmacophore_score                           no
descriptor_use_tanimoto                                      no
descriptor_use_hungarian                                     no
descriptor_use_volume_overlap                                yes
descriptor_grid_score_rep_rad_scale                          1
descriptor_grid_score_vdw_scale                              1
descriptor_grid_score_es_scale                               1
descriptor_grid_score_grid_prefix                            grid
descriptor_volume_score_reference_mol2_filename              descriptor_volume_score_reference.mol2
descriptor_volume_score_overlap_compute_method               analytical
descriptor_weight_grid_score                                 1
descriptor_weight_volume_overlap_score                       -1
minimize_ligand                                              yes
minimize_anchor                                              yes
minimize_flexible_growth                                     yes
use_advanced_simplex_parameters                              no
simplex_max_cycles                                           1
simplex_score_converge                                       0.1
simplex_cycle_converge                                       1
simplex_trans_step                                           1
simplex_rot_step                                             0.1
simplex_tors_step                                            10
simplex_anchor_max_iterations                                500
simplex_grow_max_iterations                                  500
simplex_grow_tors_premin_iterations                          00
simplex_random_seed                                          0
simplex_restraint_min                                        yes
simplex_coefficient_restraint                                10
atom_model                                                   all
vdw_defn_file                                                /gpfs/projects/AMS536/zzz.programs/dock6/parameters/vdw_AMBER_parm99.defn
flex_defn_file                                               /gpfs/projects/AMS536/zzz.programs/dock6/parameters/flex.defn
flex_drive_file                                              /gpfs/projects/AMS536/zzz.programs/dock6/parameters/flex_drive.tbl
chem_defn_file                                               /gpfs/projects/AMS536/zzz.programs/chem.defn
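
Because the input file is a plain list of parameter/value pairs, it can be sanity-checked before a long job is queued. The sketch below is a hypothetical helper, not part of DOCK. It assumes one whitespace-separated `parameter value` pair per line, as in the listing above, and it assumes that parameters ending in `_file`, `_table`, or `_filename` hold file paths. It reads the pairs into a dictionary and lists the path parameters whose targets do not exist.

```python
import os

def parse_dock_input(text):
    """Return {parameter: value} from 'parameter   value' lines."""
    params = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params

def missing_files(params, base_dir="."):
    """List (parameter, value) pairs whose file does not exist.

    Assumption: only parameters with the suffixes below are file paths;
    relative paths are resolved against base_dir.
    """
    missing = []
    for name, value in params.items():
        if name.endswith(("_file", "_table", "_filename")):
            path = value if os.path.isabs(value) else os.path.join(base_dir, value)
            if not os.path.exists(path):
                missing.append((name, value))
    return missing

# A few lines copied from the listing above, used as sample input.
example = """\
conformer_search_type                                        genetic
ga_torenv_table                                              ../fraglib/unique_full_sorted_fraglib.dat
ga_max_generations                                           100
ga_pmut_rate                                                 0.3
"""

params = parse_dock_input(example)
print(params["ga_max_generations"])
for name, value in missing_files(params):
    print("missing:", name, value)
```

To check a real input, read 1NDV_GA.in from inside 1NDV_GA_results and pass its text to `parse_dock_input`, so that relative paths such as ../fraglib/... resolve the same way they will for dock6.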

To create a Slurm submission script, first create a file called "1NDV_GA.sh" and enter the following lines. Note that the number of MPI processes passed to mpirun (144) matches the 6 nodes × 24 tasks per node requested.

#!/bin/bash
#
#SBATCH --job-name=1ndv_GA
#SBATCH --output=GA.txt
#SBATCH --ntasks-per-node=24
#SBATCH --nodes=6
#SBATCH --time=1:00:00
#SBATCH -p debug-28core
module load intel/mpi/64/2018/18.0.3

mpirun -np 144  dock6.mpi -i 1NDV_GA.in -o 1NDV_GA.out
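
If you change the node or task counts, the `-np` value must change with them. As a rough illustration, the following hypothetical Python check (not part of DOCK or Slurm) compares the two. It assumes the script uses the long-form `--nodes=` and `--ntasks-per-node=` directives and a literal `mpirun ... -np N`, as in the script above.

```python
import re

def requested_tasks(script_text):
    """Return nodes * ntasks-per-node read from the #SBATCH lines."""
    nodes = re.search(r"^#SBATCH\s+--nodes=(\d+)", script_text, re.M)
    per_node = re.search(r"^#SBATCH\s+--ntasks-per-node=(\d+)", script_text, re.M)
    if not (nodes and per_node):
        raise ValueError("expected --nodes= and --ntasks-per-node= directives")
    return int(nodes.group(1)) * int(per_node.group(1))

def mpirun_tasks(script_text):
    """Return N from the first 'mpirun ... -np N' line."""
    match = re.search(r"^\s*mpirun\b.*?-np\s+(\d+)", script_text, re.M)
    if not match:
        raise ValueError("no 'mpirun -np N' line found")
    return int(match.group(1))

# Relevant lines of the submission script above.
script = """\
#!/bin/bash
#SBATCH --ntasks-per-node=24
#SBATCH --nodes=6
mpirun -np 144  dock6.mpi -i 1NDV_GA.in -o 1NDV_GA.out
"""

if requested_tasks(script) == mpirun_tasks(script):
    print("task counts agree")
else:
    print("mismatch between #SBATCH request and mpirun -np")
```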

You can now submit the job using:

sbatch 1NDV_GA.sh

The GA will take some time to run, but check early on that the job is not reporting errors. Once the GA has finished, you should have a directory with mol2 files for every generation of the GA. You can download these individually, but it may be easier to use the "cat" command to combine them into one file first.

Once the desired generations are on your local computer, start a new Chimera session. (Note that this works best in Chimera, not ChimeraX.) Start by opening the mol2 file that contains only the protein. Then, from the menu bar, choose "Tools" → "Surface/Binding Analysis" → "ViewDock" and open the mol2 files for each generation of the GA. Be sure to select DOCK6 as the file type. In the ViewDock window, select "Column" → "Show" → "Generation" and "Column" → "Show" → "Type". These columns show which generation each ligand is from and which crossover or mutation generated the structure. An example is shown below.

PDB 1NDV
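
For the step of merging the per-generation mol2 files, a Python equivalent of "cat" that orders the files by generation number may be convenient. The file naming is an assumption: the sketch treats the last integer in each file name (with the extension removed) as the generation, so check the names actually written under ga_output_prefix in your results directory. The function names and the example pattern are hypothetical.

```python
import glob
import os
import re

def generation_of(path):
    """Last integer in the file name (extension removed), or -1 if none.

    Assumption: that integer is the generation number.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    numbers = re.findall(r"\d+", stem)
    return int(numbers[-1]) if numbers else -1

def concatenate_mol2(pattern, out_path):
    """Write all files matching pattern into out_path, in generation order."""
    out_abs = os.path.abspath(out_path)
    # Skip the output file itself in case it matches the pattern.
    paths = sorted(
        (p for p in glob.glob(pattern) if os.path.abspath(p) != out_abs),
        key=generation_of,
    )
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as handle:
                text = handle.read()
            if text:
                # Make sure each file ends with a newline before the next one.
                out.write(text if text.endswith("\n") else text + "\n")
    return paths
```

A call such as `concatenate_mol2("1NDV_GA_output*.mol2", "all_generations.mol2")` would then produce a single file to download and open in ViewDock; adjust the pattern to match your actual file names.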