Useful Slurm Commands
Sample slurm script for many independent jobs
#!/bin/bash
#SBATCH --time=<desired time up to node upper time limit>    # lower times get higher priority in the queuing system
#SBATCH --nodes=3
#SBATCH --ntasks=(# cores per node) * (# nodes)              # this should equal the number of jobs you want to run at the same time
#SBATCH --job-name=<your_job_name>
#SBATCH --output=<std_out_filename>
#SBATCH -p <partition_name>

DOCK_PATH="<path to dock bin/executable>"
JOB_FILE="<your common input file>"
CDIR=$(pwd)

# Assuming your paths to experiments are listed line by line in a file named paths.txt
while IFS= read -r path; do
    cp $JOB_FILE -t $path
    cd $path
    base=$(basename -s .in $JOB_FILE)
    # You can modify this srun command based on your requirements: -n1 requests 1 core for the srun step,
    # -N1 requests 1 node, --exclusive keeps this step's cores from being used by other steps,
    # and -W 0 tells srun not to impose a time limit
    srun --mem=6090 --exclusive -N1 -n1 -W 0 $DOCK_PATH/dock6.rdkit -i $base.in -o $base.out &
    cd $CDIR
done < paths.txt
wait    # necessary to prevent the script from ending and terminating your jobs
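As a minimal sketch of how this might be driven (the example paths and the script name submit_many.sh below are hypothetical), list one experiment directory per line in paths.txt and submit the script with sbatch:

# paths.txt -- one experiment directory per line (hypothetical paths)
#   /gpfs/projects/mygroup/experiments/run_001
#   /gpfs/projects/mygroup/experiments/run_002

sbatch submit_many.sh    # submit the job script above, assumed to be saved as submit_many.sh
squeue -u $USER          # check that the job and its srun steps are running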
Basic PBS commands
- qsub <script> - submits your script to the queue
- qstat - displays a list of jobs in the queue
- qdel <jobid> - removes the job from the queue
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
514722.nagling            STDIN            liuxt12         1657:13: R batch
514723.nagling            STDIN            liuxt12         3366:45: R batch
514724.nagling            jet3dKn0.002PR20 wli             874:42:4 R batch
514725.nagling            ...dKn0.002PR100 wli             855:59:2 R batch
514803.nagling            jet3dKn0.002PR10 wli             596:48:0 R batch
514809.nagling            SAM              mriessen        542:37:3 R batch
514811.nagling            STDIN            justin          00:00:32 R batch
514815.nagling            latency_test     penzhang        0        Q batch
514822.nagling            flex_1.dock.csh  xinyu           34:26:46 R batch
514839.nagling            flex_2.dock.csh  xinyu           31:22:33 R batch
514856.nagling            flex_5.dock.csh  xinyu           26:33:35 R batch
514859.nagling            ....1_2.dock.csh xinyu           06:13:43 R batch
514862.nagling            p0.10            hjli            326:10:5 R batch
514863.nagling            p0.4             hjli            325:49:5 R batch
514864.nagling            p0.7             hjli            332:11:4 R batch
514884.nagling            ...1_p0.0_11.csh xinyu           17:16:57 R batch
514920.nagling            STDIN            lli             00:34:43 R batch
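For example, a typical submit, check, and delete cycle might look like this (the script name and job ID below are hypothetical):

qsub my_dock_job.csh    # prints a job ID such as 514925.nagling
qstat -u $USER          # list only your own jobs (S column: R = running, Q = queued)
qdel 514925             # remove that job from the queue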
On Seawulf, the 'nodes' command shows a list of free nodes:
        avail alloc down
nodes:     35   165   25
wulfie:    35   165   25
Use the man or info command in Unix to get more usage details for these commands. You can also place PBS directives inside your script; these are usually optional, but can be useful. For an example script, see NAMD on Seawulf.
#PBS -l nodes=8:ppn=2        # request 8 nodes with 2 processors per node
#PBS -l walltime=08:00:00    # maximum wall-clock time of 8 hours
#PBS -N my_job_name          # job name
#PBS -M user@ic.sunysb.edu   # e-mail address for job notifications
#PBS -o namd.md01.out        # standard output file
#PBS -e namd.md01.err        # standard error file
#PBS -V                      # export the current environment variables to the job
Using these PBS directives sets the resource request, job name, output and error files, and so on.
#PBS -j oe
#PBS -o pbs.out
This will join the output and error streams into the output file. $PBS_O_WORKDIR is an environment variable that contains the directory the job was submitted from. Usually you want to define a specific working directory and use that instead of relying on this variable.
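A minimal sketch of both approaches in a bash job script (the scratch path below is hypothetical):

# Option 1: change into the directory the job was submitted from
cd $PBS_O_WORKDIR

# Option 2 (usually preferred): define an explicit working directory
WORKDIR=/gpfs/scratch/<username>/my_run    # hypothetical path
cd $WORKDIR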
Advanced Tricks
Delete all your jobs (either command should work, depending on your version of PBS)
qstat -u sudipto | awk -F. '/sudipto/{print $1}' | xargs qdel
qstat | grep sudipto | awk -F. '{print $1}' | xargs qdel
Delete only your queued jobs. Leaves all running jobs alone.
qstat -u sudipto | awk -F. '/00 Q /{printf $1" "}' | xargs qdel
qstat -u sudipto | awk -F. '/ Q --/{printf $1" " }' | xargs qdel
List all free (available) nodes in the queue
pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node
# csh loop: submit ${HOME}/get.nodes.stats.csh to each free node
set NODELIST=`pbsnodes | egrep '^node|state =' | grep -B 1 'state = free' | grep ^node`
foreach node ($NODELIST)
    /usr/local/torque-2.1.6/bin/qsub -l nodes=${node} ${HOME}/get.nodes.stats.csh
end
Run on a particular type of node ('beta' below is a node property defined on the cluster)
qsub -l nodes=1:beta:ppn=1 ${script}