Useful Slurm Commands

 


Basic Slurm commands

  • squeue -u $(whoami) - displays your pending and running jobs along with their job ids
  • sinfo -l -a | grep idle - lists nodes that are currently idle (note: some nodes may be reserved for exclusive use by certain lab groups)
  • scancel <jobid> - removes the job with the given id from the queue, whether it is pending or running (see the example session below)
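
A quick example session tying the three commands together. The job id, job name, partition, and node names below are placeholders for illustration, not real output from any particular cluster:

  $ squeue -u $(whoami)
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    123456 <partition>  myjob   <user>  R    1:23:45      1 <node>
  $ sinfo -l -a | grep idle
  <partition>   up   2-00:00:00   ...   idle   <node list>
  $ scancel 123456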



Advanced Tricks

Delete all your jobs (use with caution)

scancel -u $(whoami)

Delete all your queued jobs only. Leaves all running jobs alone.

scancel -u $(whoami) -t "PENDING" 
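
A few more scancel/squeue filters that can be useful (a sketch; substitute your own job name and partition). scancel can select jobs by name or partition, and squeue's output format options can feed job ids into other commands:

scancel -u $(whoami) --name=<your_job_name>         # cancel only your jobs with this name
scancel -u $(whoami) --partition=<partition_name>   # cancel only your jobs in this partition
squeue -u $(whoami) -h -t PENDING -o "%i" | xargs scancel   # equivalent to the -t "PENDING" form above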


Sample Slurm script for many independent jobs

#!/bin/bash
#SBATCH --time=2-00:00:00  #requested wall time; shorter time limits get higher priority in the queuing system
#SBATCH --nodes=3  #number of nodes to request
#SBATCH --ntasks=<(# cores per node) * (# nodes)>  #this should equal the number of jobs you want to run at the same time
#SBATCH --job-name=<your_job_name>
#SBATCH --output=<std_out_filename>
#SBATCH -p <partition_name>  
 
DOCK_PATH="<path to dock bin/executable>"
JOB_FILE="<your common input file>"
CDIR=$(pwd)
# Assuming your paths to experiments are listed line by line in a file named paths.txt, which should have # paths = # cores = # tasks
while IFS= read -r path; do
  cp "$JOB_FILE" -t "$path"
  cd "$path"
  base=$(basename -s .in "$JOB_FILE")
  # You can modify this srun command based on your requirements: -n1 requests 1 task (one core) for this step,
  # -N1 requests 1 node, --exclusive keeps this step's cores from being shared with other srun steps in the
  # allocation, and -W 0 (--wait=0) tells srun not to kill the remaining tasks after the first task exits
  srun --mem=6090 --exclusive -N1 -n1 -W 0 "$DOCK_PATH"/dock6.rdkit -i "$base.in" -o "$base.out" &
  cd "$CDIR"
done < paths.txt
wait  # necessary to prevent the batch script from exiting and terminating your background srun jobs
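
To run this, save the script (for example as run_many.slurm, a name chosen here just for illustration), list one experiment directory per line in paths.txt, and submit it with sbatch. A hypothetical session:

$ cat paths.txt
/path/to/experiments/run_001
/path/to/experiments/run_002
/path/to/experiments/run_003
$ sbatch run_many.slurm
Submitted batch job 123456
$ squeue -u $(whoami)   # the single batch job fans out into one srun step per line of paths.txt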