Useful Slurm Commands

From Rizzo_Lab
Latest revision as of 15:40, 22 February 2024


Basic Slurm commands

  • squeue -u $(whoami) - displays your running jobs and their job ids
  • scancel <jobid> - remove the job from the queue
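A quick dry run of how `$(whoami)` expands in the commands above (this just echoes the commands for the current user; nothing is submitted or cancelled):

```shell
# Show the exact commands that would be issued for the current user
user=$(whoami)
echo "squeue -u $user"
echo "scancel -u $user"
```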


Advanced Tricks

Delete all your jobs (use with caution)

scancel -u $(whoami)

Delete only your queued jobs. Leaves all running jobs alone.

scancel -u $(whoami) -t "PENDING" 

View only idle nodes - displays available nodes (note: some nodes may have exclusive access to certain lab groups)

sinfo -l -a | grep idle 
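To see what the `grep idle` filter keeps, here is a sketch run against hypothetical sinfo-style lines (illustrative only, not real cluster output); only the lines whose state column is `idle` survive:

```shell
# Hypothetical sinfo -l style output (partition, avail, timelimit, nodes, state, nodelist)
sinfo_lines='long-28core   up   infinite   2   idle    node[01-02]
long-28core   up   infinite   1   alloc   node03
short-28core  up   infinite   1   idle    node04'
# Keep only the idle lines, as the pipeline above does
printf '%s\n' "$sinfo_lines" | grep idle
```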

Check out a node to run jobs interactively (so that you don't use the login nodes!!!)

srun -N 1 -n 28 -t 8:00:00 -p long-28core --pty bash

Sample Slurm script for many independent jobs

#!/bin/bash
#SBATCH --time=2-00:00:00  # lower times get higher priority in the queuing system
#SBATCH --nodes=3  # number of nodes
#SBATCH --ntasks=<ntasks>  # (# cores per node) * (# nodes); should equal the number of jobs you want to run at the same time
#SBATCH --job-name=<your_job_name>
#SBATCH --output=<std_out_filename>
#SBATCH -p <partition_name>
 
DOCK_PATH="<path to dock bin/executable>"
JOB_FILE="<your common input file>"
CDIR=$(pwd)
# Assumes your per-experiment directories are listed one per line in a file
# named paths.txt; the number of paths should equal the number of tasks (cores).
while IFS= read -r path; do
  cp "$JOB_FILE" -t "$path"
  cd "$path"
  base=$(basename -s .in "$JOB_FILE")
  # Adjust this srun command to your needs: -n1 requests 1 core, -N1 requests
  # 1 node, --exclusive keeps this step's cores from being shared with other
  # steps, and -W 0 (--wait=0) waits indefinitely for all tasks instead of
  # killing the remaining ones shortly after the first task exits.
  srun --mem=6090 --exclusive -N1 -n1 -W 0 "$DOCK_PATH"/dock6.rdkit -i "$base".in -o "$base".out &
  cd "$CDIR"
done < paths.txt
wait  # necessary: without this the script exits and your background srun steps are terminated
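The fan-out pattern in the script (loop over paths.txt, launch each job in the background, then `wait`) can be exercised locally without Slurm. The sketch below substitutes a stub command for the `srun ... dock6.rdkit` step; all directory and file names here are illustrative only:

```shell
# Minimal local sketch of the fan-out pattern above, with a stub command
# standing in for srun/dock6 (names are illustrative, not from the cluster).
workdir=$(mktemp -d)
mkdir -p "$workdir/exp1" "$workdir/exp2"
printf '%s\n' "$workdir/exp1" "$workdir/exp2" > "$workdir/paths.txt"
CDIR=$(pwd)
while IFS= read -r path; do
  cd "$path"
  # In the real script this line is the backgrounded srun invocation
  ( echo "done" > result.txt ) &
  cd "$CDIR"
done < "$workdir/paths.txt"
wait  # without this, the script could exit before the background jobs finish
cat "$workdir/exp1/result.txt" "$workdir/exp2/result.txt"
```

Each iteration returns to the starting directory before reading the next path, so relative paths in paths.txt also work.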