Difference between revisions of "Useful Slurm Commands"

Latest revision as of 16:40, 22 February 2024

Basic Slurm commands

squeue -u $(whoami) - displays your pending and running jobs and their job IDs
scancel <jobid> - remove the job from the queue

Advanced Tricks

Delete all your jobs (use with caution)

scancel -u $(whoami)

Delete all your queued jobs only. Leaves all running jobs alone.

scancel -u $(whoami) -t "PENDING"
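
Before cancelling, it can help to see which jobs would be affected. The following is a minimal sketch, assuming the standard squeue options -t (state filter), -h (no header) and -o (output format); the function name list_pending is hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Sketch: list your pending jobs (job id and job name) so you can review
# them before running scancel. list_pending is a hypothetical helper name.
list_pending() {
  # -t PENDING : only jobs still waiting in the queue
  # -h         : suppress the header line
  # -o "%i %j" : print job id and job name
  squeue -u "$(whoami)" -t PENDING -h -o "%i %j"
}
```

Calling list_pending on a login node prints one line per pending job; if the list looks right, the scancel command above removes those jobs.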

View only idle nodes - displays available nodes (note: some nodes may have exclusive access to certain lab groups)

sinfo -l -a | grep idle
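
If you want a per-partition count rather than the raw grep output, something like the sketch below may be useful. It assumes sinfo is asked for partition, compact state and node count via -o "%P %t %D"; the function name count_idle is hypothetical.

```shell
#!/bin/bash
# Sketch: read lines of "<partition> <state> <node count>" on stdin and
# print the total number of idle nodes per partition, sorted by name.
# count_idle is a hypothetical helper name.
count_idle() {
  awk '$2 == "idle" { n[$1] += $3 } END { for (p in n) print p, n[p] }' | sort
}

# Usage on a cluster (not executed here):
#   sinfo -a -h -o "%P %t %D" | count_idle
```

Only nodes whose state is exactly idle are counted, so nodes reported with a modified state are left out. As with the grep version, an idle node may still be reserved for a particular lab group.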

Check out a node to run jobs interactively (so that you don't use the login nodes!!!)

srun -N 1 -n 28 -t 8:00:00 -p long-28core --pty bash

Sample slurm script for many independent jobs

#!/bin/bash
#SBATCH --time=2-00:00:00  #format is D-HH:MM:SS (here 2 days); lower times get higher priority in the queuing system
#SBATCH --nodes=3 #number of nodes
#SBATCH --ntasks=<(# cores per node) * (# nodes)> #this should equal the number of jobs you want to run at the same time
#SBATCH --job-name=<your_job_name>
#SBATCH --output=<std_out_filename>
#SBATCH -p <partition_name>  
 
DOCK_PATH="<path to dock bin/executable>"
JOB_FILE="<your common input file>"
CDIR=$(pwd)
# Assuming the paths to your experiments are listed one per line in a file named paths.txt, which should have # paths = # cores = # tasks
while IFS= read -r path; do
  cp -t "$path" "$JOB_FILE"
  cd "$path" || continue  # skip any path that cannot be entered
  base=$(basename -s .in "$JOB_FILE")
  # You can modify this srun command based on your requirements: -n1 requests 1 task (one core by default) for the job step, -N1 requests 1 node, --exclusive keeps this step's cores from being shared with other job steps, --mem=6090 requests 6090 MB of memory, and -W 0 tells srun to wait indefinitely for remaining tasks after the first task exits (it does not change the job's time limit)
  srun --mem=6090 --exclusive -N1 -n1 -W 0 "$DOCK_PATH/dock6.rdkit" -i "$base.in" -o "$base.out" &
  cd "$CDIR"
done < paths.txt
done < paths.txt
wait #necessary to prevent the script from ending and terminating your jobs
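
Since the script expects # paths = # tasks = (# cores per node) * (# nodes), a quick check before submitting can catch a mismatch. The sketch below is one possible way to do that; the function name check_paths and its argument order are hypothetical.

```shell
#!/bin/bash
# Sketch: compare the number of non-empty lines in a paths file with the
# number of tasks implied by (cores per node) * (nodes).
# check_paths is a hypothetical helper name.
check_paths() {
  local paths_file=$1 cores_per_node=$2 nodes=$3
  local ntasks=$(( cores_per_node * nodes ))
  local npaths
  # Count lines that are not empty or whitespace-only
  npaths=$(grep -c -v '^[[:space:]]*$' "$paths_file")
  if [ "$npaths" -eq "$ntasks" ]; then
    echo "OK: $npaths paths match $ntasks tasks"
  else
    echo "MISMATCH: $npaths paths but $ntasks tasks"
    return 1
  fi
}
```

For example, with 3 nodes of 28 cores each, check_paths paths.txt 28 3 succeeds only if paths.txt lists 84 experiment directories.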


@@ Line 1: / Line 1: @@
+==Basic Slurm commands==
+* squeue -u $(whoami) - displays your running jobs and their job ids
+* scancel <jobid> - remove the job from the queue
+==Advanced Tricks==
+Delete all your jobs (use with caution)
+ scancel -u $(whoami)
+Delete all your queued jobs only. Leaves all runnings jobs alone.
+ scancel -u $(whoami) -t "PENDING"
+View only idle nodes - displays available nodes (note: some nodes may have exclusive access to certain lab groups)
+ sinfo -l -a | grep idle
+Check out a node to run jobs interactively (so that you don't use the login nodes!!!)
+ srun -N 1 -n 28 -t 8:00:00 -p long-28core --pty bash
 ==Sample slurm script for many independent jobs==
   #!/bin/bash
-  #SBATCH --time=<desired time up to node upper time limit> #lower times get higher priority in the queuing system
+  #SBATCH --time=2-00:00:00  #lower times get higher priority in the queuing system
-  #SBATCH --nodes=3
+  #SBATCH --nodes=3 #number
   #SBATCH --ntasks=(# cores per node) * (# nodes) #this should equal the number of jobs you want to run at the same time
   #SBATCH --job-name=<your_job_name>
@@ Line 11: / Line 33: @@
   JOB_FILE="<your common input file>"
   CDIR=$(pwd)
-  # Assuming your paths to experiments are listed line by line in a file named paths.txt
+  # Assuming your paths to experiments are listed line by line in a file named paths.txt, which should have # paths = # cores = # tasks
   while IFS= read -r path; do
     cp $JOB_FILE -t $path
@@ Line 21: / Line 43: @@
   done < paths.txt
   wait #necessary to prevent the script from ending and terminating your jobs
-==Basic PBS commands==
-* qsub <script> - will submit your script to the queue
-* qstat - will display a list of jobs running
-* qdel <jobid> - remove the job from the queue
- Job id                    Name             User            Time Use S Queue
- ------------------------- ---------------- --------------- -------- - -----
-.nagling            STDIN            liuxt12         1657:13: R batch
-.nagling            STDIN            liuxt12         3366:45: R batch
-.nagling            jet3dKn0.002PR20 wli             874:42:4 R batch
-.nagling            ...dKn0.002PR100 wli             855:59:2 R batch
-.nagling            jet3dKn0.002PR10 wli             596:48:0 R batch
-.nagling            SAM              mriessen        542:37:3 R batch
-.nagling            STDIN            justin          00:00:32 R batch
-.nagling            latency_test     penzhang               0 Q batch
-.nagling            flex_1.dock.csh  xinyu           34:26:46 R batch
-.nagling            flex_2.dock.csh  xinyu           31:22:33 R batch
-.nagling            flex_5.dock.csh  xinyu           26:33:35 R batch
-.nagling            ....1_2.dock.csh xinyu           06:13:43 R batch
-.nagling            p0.10            hjli            326:10:5 R batch
-.nagling            p0.4             hjli            325:49:5 R batch
-.nagling            p0.7             hjli            332:11:4 R batch
-.nagling            ...1_p0.0_11.csh xinyu           17:16:57 R batch
-.nagling            STDIN            lli             00:34:43 R batch
-On seawulf the 'nodes' command shows a list of free nodes
-        avail   alloc   down
- nodes:  35      165     25
- wulfie: 35      165     25
-Use the man or info command in Unix to get more details of usage for these commands.
-You can use PBS commands inside your script. These are usually optional, but can be useful.
-There is an example script see [[NAMD on Seawulf]]
- #PBS -l nodes=8:ppn=2
- #PBS -l walltime=08:00:00
- #PBS -N my_job_name
- #PBS -M user@ic.sunysb.edu
- #PBS -o namd.md01.out
- #PBS -e namd.md01.err
- #PBS -V
-Using these PBS directives will name the job, the output files etc.
- #PBS -j oe
- #PBS -o pbs.out
-This will join the output and error streams into the output file.
-$PBS_O_WORKDIR is an environment variable that contains the path the script was submitted in. Usually you want to define a specific workdir and use that instead of relying on this variable.
-==Advanced Tricks==
-Delete all your jobs (either will work, check different versions of PBS)
- qstat -u sudipto | awk -F. '/sudipto/{print $1}' | xargs qdel
- qstat | grep sudipto | awk -F. '{print $1}' | xargs qdel
-Delete all your queued jobs only. Leaves all runnings jobs alone.
- qstat -u sudipto | awk -F. '/00 Q /{printf $1" "}' | xargs qdel
- qstat -u sudipto | awk -F. '/ Q   --/{printf $1" " }' | xargs qdel
-List all working nodes in the queue
- pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node
- set NODELIST=`pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node`
- foreach node ($NODELIST)
-/usr/local/torque-2.1.6/bin/qsub -l nodes=${NODE} ${HOME}/get.nodes.stats.csh
-done
-Run on a particular type of node
-  qsub -l nodes=1:beta:ppn=1 ${script}