Difference between revisions of "Useful Slurm Commands"

Revision as of 16:31, 22 February 2024

Sample slurm script for many independent jobs

#!/bin/bash
#SBATCH --time=<desired time up to node upper time limit> #shorter time limits get higher priority in the queuing system
#SBATCH --nodes=3
#SBATCH --ntasks=<(cores per node) * (nodes)> #this should equal the number of jobs you want to run at the same time
#SBATCH --job-name=<your_job_name>
#SBATCH --output=<std_out_filename>
#SBATCH -p <partition_name>  
 
DOCK_PATH="<path to dock bin/executable>"
JOB_FILE="<your common input file>"
CDIR=$(pwd)
# Assuming the paths to your experiment directories are listed one per line in a file named paths.txt
while IFS= read -r path; do
  cp "$JOB_FILE" -t "$path"
  cd "$path" || continue
  base=$(basename -s .in "$JOB_FILE")
  # Modify this srun command to suit your requirements:
  #   -n1          requests 1 task (1 core) for this job step
  #   -N1          requests 1 node
  #   --exclusive  keeps this step's cores from being shared with other job steps
  #   -W 0         (--wait=0) waits indefinitely for remaining tasks after the first task exits
  srun --mem=6090 --exclusive -N1 -n1 -W 0 "$DOCK_PATH/dock6.rdkit" -i "$base.in" -o "$base.out" &
  cd "$CDIR" || exit 1
done < paths.txt
wait # necessary to prevent the script from ending and terminating your jobs
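
The --ntasks value in the script above is simple arithmetic, and it can help to compare it with the number of directories listed in paths.txt before submitting. A minimal sketch, assuming a hypothetical 28 cores per node (check your partition's actual hardware); count_paths is an illustrative helper, not part of Slurm:

```shell
#!/bin/sh
# Hypothetical values; adjust to your cluster and partition.
nodes=3
cores_per_node=28   # assumption: check your partition's actual core count

# --ntasks = (cores per node) * (nodes)
ntasks=$((nodes * cores_per_node))

# Illustrative helper: count the non-empty lines (one path per line) in a file.
count_paths() {
  grep -c . "$1"
}

echo "#SBATCH --nodes=$nodes"
echo "#SBATCH --ntasks=$ntasks"

# Report how many jobs paths.txt would launch, if the file exists.
if [ -f paths.txt ]; then
  echo "paths.txt lists $(count_paths paths.txt) jobs; $ntasks can run at once"
fi
```

With these example values the script prints --ntasks=84.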

Basic Slurm commands

squeue -u $(whoami) - displays your running and pending jobs and their job ids
sinfo -l -a | grep idle - displays available nodes (note: some nodes may have exclusive access to certain lab groups)
scancel <jobid> - removes the job from the queue
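
The commands above cover monitoring and cancelling; jobs themselves are submitted with sbatch. A minimal sketch of the submit, check, cancel cycle follows. The SBATCH_CMD variable and the submit_job helper are hypothetical names introduced here so the snippet can be tried without a scheduler, and job.sh is a placeholder for your own script.

```shell
#!/bin/sh
# SBATCH_CMD is a hypothetical override so the helper can be tested off-cluster.
SBATCH_CMD=${SBATCH_CMD:-sbatch}

# Submit a script and print only the numeric job id.
# --parsable makes sbatch print "jobid" or "jobid;cluster" instead of a sentence.
submit_job() {
  out=$("$SBATCH_CMD" --parsable "$1") || return 1
  echo "${out%%;*}"
}

# Only touch the queue when Slurm is installed and a job script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
  jobid=$(submit_job job.sh)
  squeue -j "$jobid"     # show the state of this one job
  # scancel "$jobid"     # uncomment to remove it from the queue
fi
```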

Use the man or info command in Unix to get more details on the usage of these commands. You can also use PBS directives inside your script. These are usually optional, but can be useful. For an example script, see NAMD on Seawulf.

#PBS -l nodes=8:ppn=2
#PBS -l walltime=08:00:00
#PBS -N my_job_name
#PBS -M user@ic.sunysb.edu
#PBS -o namd.md01.out
#PBS -e namd.md01.err
#PBS -V

These PBS directives set the job name, the output and error files, and so on.

#PBS -j oe
#PBS -o pbs.out

This joins the output and error streams into the output file. $PBS_O_WORKDIR is an environment variable that contains the path of the directory the script was submitted from. Usually you want to define a specific working directory and use that instead of relying on this variable.
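
For scripts submitted with sbatch, the PBS directives above have Slurm counterparts. The following is a sketch of one plausible translation, not a tested port of the NAMD script. The file names and address are copied from the PBS example, and WORKDIR is a variable name introduced here for illustration. Slurm sends stderr to the --output file when no --error is given, which matches "#PBS -j oe", and $SLURM_SUBMIT_DIR plays the role of $PBS_O_WORKDIR.

```shell
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=2
#SBATCH --time=08:00:00
#SBATCH --job-name=my_job_name
#SBATCH --mail-user=user@ic.sunysb.edu
#SBATCH --output=namd.md01.out
#SBATCH --error=namd.md01.err
#SBATCH --export=ALL
# Note: add a --mail-type option (for example END) to actually receive mail.

# Prefer an explicitly chosen working directory; fall back to the
# submission directory, then to the current directory when run by hand.
WORKDIR=${WORKDIR:-${SLURM_SUBMIT_DIR:-$PWD}}
cd "$WORKDIR" || exit 1
echo "Working directory: $(pwd)"
```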

Advanced Tricks

Delete all your jobs (either will work; which one you need depends on your version of PBS)

qstat -u sudipto | awk -F. '/sudipto/{print $1}' | xargs qdel
qstat | grep sudipto | awk -F. '{print $1}' | xargs qdel

Delete only your queued jobs. Leaves all running jobs alone.

qstat -u sudipto | awk -F. '/00 Q /{printf $1" "}' | xargs qdel
qstat -u sudipto | awk -F. '/ Q   --/{printf $1" " }' | xargs qdel
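
With Slurm the same two clean-ups need no output parsing, because scancel can filter by user and by job state. A sketch, in which cancel_my_jobs and the DRY_RUN switch are hypothetical helpers added so the command can be inspected before it is run:

```shell
#!/bin/sh
# Print (DRY_RUN=1, the default here) or run a scancel for the current user.
# Optional first argument: a job state such as PENDING or RUNNING.
cancel_my_jobs() {
  user=${USER:-$(whoami)}
  if [ -n "${1:-}" ]; then
    set -- scancel -u "$user" --state="$1"
  else
    set -- scancel -u "$user"
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

cancel_my_jobs            # all of your jobs
cancel_my_jobs PENDING    # only jobs still waiting in the queue
```

Set DRY_RUN=0 to actually cancel the jobs instead of printing the command.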

List all free nodes in the queue

pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node

set NODELIST=`pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node`
foreach node ($NODELIST)
  /usr/local/torque-2.1.6/bin/qsub -l nodes=${node} ${HOME}/get.nodes.stats.csh
end
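
A Slurm counterpart to the loop above might look like the following sketch. It assumes that "sinfo -N -h -t idle -o %N" prints one idle node name per line and that "sbatch -w <node>" pins the job to that node. get.nodes.stats.sh is a hypothetical Slurm version of the stats script, and DRY_RUN is a switch introduced here for illustration.

```shell
#!/bin/sh
# List idle node names, one per line (a node in several partitions is listed once).
idle_nodes() {
  sinfo -N -h -t idle -o %N | sort -u
}

# Read node names on stdin and submit (or, with DRY_RUN=1, print) one job per node.
submit_per_node() {
  while IFS= read -r node; do
    [ -n "$node" ] || continue
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "sbatch -w $node $1"
    else
      sbatch -w "$node" "$1"
    fi
  done
}

# Only query the scheduler when Slurm is installed.
if command -v sinfo >/dev/null 2>&1; then
  idle_nodes | submit_per_node "$HOME/get.nodes.stats.sh"
fi
```

By default the loop only prints the sbatch commands; set DRY_RUN=0 to submit them.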

Run on a particular type of node

 qsub -l nodes=1:beta:ppn=1 ${script}
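
In Slurm, node properties such as beta are usually exposed as features and requested with --constraint. Whether a beta feature exists depends on how the cluster is configured, so treat the name as an assumption. build_submit_cmd is a hypothetical helper that only prints the command it would run:

```shell
#!/bin/sh
# Print the sbatch command for one task on one node with a given feature.
build_submit_cmd() {
  feature=$1
  script=$2
  echo "sbatch --nodes=1 --ntasks-per-node=1 --constraint=$feature $script"
}

build_submit_cmd beta my_script.sh
# The features defined on each node can be listed with: sinfo -o "%N %f"
```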

