How does Slurm decide what job to start next?
**These servers are ONLY for debugging/testing - long-running or multiple processes per user will be killed without prior notice.**
GPUs | Feature | Hostname |
---|---|---|
72 | NVIDIA GeForce RTX 3090 | n-3xx |
20 | A5000 | n-5xx |
16 | A6000 | n-6xx |
32 | NVIDIA A100-SXM-80GB | n-4xx |
24 | NVIDIA H100-80GB HBM3 | n-1xx |
16 | Tesla V100-SXM2-32GB | rack-xxx-dgx1 |
8 | Quadro RTX 8000 | rack-omerl-g01 |
32 | NVIDIA GeForce RTX 2080 Ti | n-2xx |
48 | Nvidia Titan XP | s-xxx |
CPUs | Feature | Hostname |
---|---|---|
21*40 | Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz | rack-iscb-[01-21] |
9*72 | Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz | rack-iscb-[31-39] |
256 | AMD EPYC 7713 64-Core Processor | rack-ai-01 |
You must specify --partition in your job script in order for your job to run on the appropriate type of node, and you can add --constraint to request specific hardware within that partition (see the sketch after the partition table below). Available partitions and node features can be listed with the sinfo command.
Max Run Time | Notes | Partition name | Group |
---|---|---|---|
1 day | Default partition | killable | Research |
1 day | Partition for a100 resources only | gpu-a100-killable | Research |
1-5 days | Priority partition. You need to get permission to use it! | gpu-<research-group> | Research |
5 days | For CPU jobs | cpu-killable | Research |
1 day | Low Priority partition | studentkillable | Students |
3 days | For batch jobs - Limit 6 batch jobs per user | studentbatch | Students |
3 Hours | Open interactive session - Mainly for testing | studentrun | Students |
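For illustration only, here is a minimal job-script header that combines the two options. The partition and feature names are taken from the tables on this page and the script name is a placeholder; substitute a partition your group is actually allowed to use:
#!/bin/bash
#SBATCH --partition=killable                 # partition name from the table above
#SBATCH --constraint="geforce_rtx_3090"      # node feature, see the --constraint option further down
#SBATCH --gres=gpu:1
python my_script.py                          # placeholder for your own command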
Example: installing Anaconda under your course/lab storage path
cd /home/<YOUR-COURSE/LAB-PATH>
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
Welcome to Anaconda3 2020.11 (Follow the interactive instructions)…
Anaconda3 will now be installed into this location:
- Press ENTER to confirm the location
- Press CTRL-C to abort the installation
- Or specify a different location below ## CHANGE THE PREFIX TO YOUR PATH
[<SUGGESTED-PATH>/anaconda3] >>> <YOUR-PATH>/anaconda3
.
.
.
Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no] ##(This will change your .bashrc file for future sessions)
[no] >>> yes
.
.
.
Thank you for installing Anaconda3!
===========================================================================
**Note: do NOT install in your HOME-DIR, as you don't have enough quota for it.
**Note: you can change the location of the conda package cache with conda config --add pkgs_dirs <path>.
Make sure to set the PATH environment variable to point to your netapp storage installation in your shell's rc file (e.g. .bashrc; the Anaconda installer does this automatically for bash). Otherwise, your default python or pip commands will not point to your personal network installation but to the version already installed by root on the specific machine you are running on.
After Anaconda is successfully installed, you can install any package you need via conda install or pip install. It is recommended to use conda install.
bash
## The above command will activate your conda env, provided that you selected "yes" for running "conda init" during the installation
(base) <user>@c-002:~$ pip install torch
Collecting torch
Downloading torch-1.8.1-cp38-cp38-manylinux1_x86_64.whl (804.1 MB)
|████████████████████████████████| 804.1 MB 3.3 kB/s
Installing collected packages: torch
Successfully installed torch-1.8.1
(base) <user>@c-002:~$ conda deactivate
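Since conda install is the recommended route, here is a minimal sketch of creating a dedicated environment and installing into it (the environment name and packages are placeholders):
(base) <user>@c-002:~$ conda create -n myenv python=3.9
(base) <user>@c-002:~$ conda activate myenv
(myenv) <user>@c-002:~$ conda install numpy
(myenv) <user>@c-002:~$ conda deactivate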
First of all, you'll need to receive login and work permissions for the Slurm Cluster. If you're studying a relevant course, your supervising TA will make a request to add you to the right partition. If you've enrolled in a project which requires use of the cluster, ask your project supervisor to contact the IT team (system@cs.tau.ac.il) and request usage permissions for you and your project partner. The request must include your moodle username and the required project resources, as those are used for authentication.
Note: 'op-controller2' and the clients c-[001-008] are not accessible from outside the University, so if you're working off campus you will need to connect to the TAU network via the University VPN, as described in the following link: https://computing.tau.ac.il/helpdesk/remote-access/communication/vpn
After receiving login permissions, SSH to 'op-controller2.cs.tau.ac.il' or to one of the Slurm client nodes c-[001-008].cs.tau.ac.il:
ssh c-001.cs.tau.ac.il
Slurm has various mechanisms for prioritizing resource allocation. One of these mechanisms is a partition system which prioritizes certain jobs over others on select resources.
To check partitions available to your user
sacctmgr -P -i show user -s <username>
You will get a list of partitions and the accounts attached to them (you can use -p for full column width). If you would like to use a partition that is NOT in your default account, you MUST use --account (see the example below).
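For example (the account and partition names below are placeholders; use the values reported by sacctmgr for your user):
> sbatch --account=<your-account> --partition=<partition-name> my_job.slurm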
To check the partitions available to your group
sinfo
Output example:
To check resources (GPUs/CPUs/memory):
sinfo -o "%20N %10c %10m %25f %10G "
When you submit a job on the CS Slurm Cluster, it gets an initial priority. The job's priority at any given time is a weighted sum of the multiple factors we have enabled (these are still being fine-tuned).
When there are free nodes, an approximate model of SLURM's behavior is this: it walks the queue in priority order and starts each job whose resource request can currently be satisfied. As soon as a new job is submitted and as soon as a job finishes, SLURM restarts this evaluation from the top of the queue, so most of the time only jobs at the top of the queue are tested for the possibility to start. As a side effect of this restart behavior, START_TIME approximations are normally NOT CALCULATED FOR ALL JOBS.
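You can inspect how this plays out for your own jobs with standard Slurm tools; a small sketch (the exact factor weights depend on the cluster configuration):
> sprio -l          # per-job breakdown of the priority factors for pending jobs
> squeue --start    # estimated start times, where SLURM has computed them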
sinfo                 # show all available partitions and nodes
squeue                # view the queue
squeue --me           # show only your jobs
                      # squeue documentation: https://slurm.schedmd.com/squeue.html
scancel <jobid>       # cancel a job
                      # scancel documentation: https://slurm.schedmd.com/scancel.html
sacct -l -j <jobid>   # list accounting info about a job
* You can find all information and options for using each command by running 'man <cmd>' or '<cmd> --help' to see the command manual, e.g. 'man sinfo' or 'sinfo --help'
The command "sbatch" should be the default command for running jobs. "srun" may be used ONLY in specific partitions and only for testing and development.
With sbatch, you submit your job and it is handled by Slurm; you can disconnect, kill your terminal, etc. with no consequence. Your job is no longer linked to a running process. Also, failures involving sbatch jobs typically result in the job being requeued and executed again.
The srun command is designed for interactive use, with someone monitoring the output. The output of the application is seen as output of the srun command, typically at the user's terminal. Failures involving srun typically result in an error message being generated, with the expectation that the user will respond in an appropriate fashion - the Slurm session WILL NOT stop and resources WILL NOT be released.
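As a minimal illustration of the difference (partition names are taken from the table above, and the script and image are placeholders; remember that srun is allowed only in partitions intended for testing):
> sbatch --partition=studentbatch my_job.slurm                        # detached: survives logout, output goes to a file
> srun --partition=studentrun --pty easy_ngc nvcr.io/nvidia/pytorch   # interactive: blocks your terminal until you exit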
Basic options available to the sbatch command in order to request the correct allocation of resources for your jobs (can be used in a script or at the command line):
Meaning: | Option: |
Partition name (MANDATORY) | --partition |
Job name (preferably one that's easy to identify/manage) | --job-name |
Redirect stdout instead of slurm-%j.out in the current directory | --output |
Redirect stderr instead of job output file (see -o above) | --error |
Maximum duration (in minutes). Default based on the partition | --time |
How to end job when time's up | --signal |
Number of cluster servers to be used | --nodes |
Number of processes | --ntasks |
CPU cores per process | --cpus-per-task |
CPU memory (in MB) | --mem |
Ask for X GPUs. Note that if you combine this with the -N option, you will get X GPUs on each of the nodes you asked for with -N, not X GPUs in total. SLURM does not yet support a varying number of GPUs per node in a job. | --gres=gpu:x |
Nodes have features assigned to them. Users can specify which of these features are required by their job using the constraint option. Supported features: tesla_v100, quadro_rtx_8000, geforce_rtx_3090, titan_xp, geforce_rtx_2080, a100, a5000, a6000. Example: --constraint="tesla_v100|quadro_rtx_8000" indicates that the job requires a GPU server with feature tesla_v100 OR quadro_rtx_8000 | --constraint |
For more info: https://slurm.schedmd.com/sbatch.html
Below is an example of how to run a simple batch job with minimum allocation of resources (1 node + 1 GPU):
Write python script - awesome.py:
# Author: Cs System Example
# Name: awesome.py
print('hello awesome world')
Write submit file - awesome.slurm:
#! /bin/sh
#SBATCH --job-name=awesome
#SBATCH --output=<your_dir>/awesome.out   # redirect stdout
#SBATCH --error=<your_dir>/awesome.err    # redirect stderr
#SBATCH --partition=studentbatch          # (see resources section)
#SBATCH --time=1                          # max time (minutes)
#SBATCH --signal=USR1@120                 # how to end job when time's up
#SBATCH --nodes=1                         # number of machines
#SBATCH --ntasks=1                        # number of processes
#SBATCH --mem=50000                       # CPU memory (MB)
#SBATCH --cpus-per-task=4                 # CPU cores per process
#SBATCH --gpus=1                          # GPUs in total
python awesome.py
Submit job:
$ sbatch awesome.slurm
Submitted batch job 214726
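After submission you can follow the job with the commands from the cheat sheet above (the job ID is the one printed by sbatch):
> squeue --me                   # is job 214726 still pending or running?
> sacct -l -j 214726            # accounting info once it has started or finished
> cat <your_dir>/awesome.out    # stdout written by the job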
A script can also be directly run from the command line/terminal as follows:
Shell script - my_awesome_script.sh
#! /bin/sh
python nlp_is_awesome.py --everything=cool
sbatch --job-name=awesome --output=<your_dir>/awesome.out \
--error=<your_dir>/awesome.err --partition=studentbatch \
--time=1440 --signal=USR1@120 --nodes=1 --ntasks=1 --mem=50000 \
--cpus-per-task=4 --gpus=2 ./my_awesome_script.sh
*The job will be executed in a new shell, but from the same directory. This means that relative paths are resolved relative to the location from which the job was launched.
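If you want the job to run from a specific directory regardless of where you launched sbatch, you can set the working directory explicitly; a sketch (the path is a placeholder):
> sbatch --chdir=<your_dir> --partition=studentbatch ./my_awesome_script.sh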
Sometimes working with a conda environment is not enough and you require other tools to be installed. Slurm supports working with Docker containers.
1. Create slurm file like this (run.slurm):
#!/bin/bash
#SBATCH --job-name=awesome
#SBATCH --output=sample.out
#SBATCH --error=sample.err
#SBATCH --time=150
#SBATCH --partition=studentkillable
#SBATCH --gpus=1
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=50000          # 50,000 MB
#SBATCH --cpus-per-task=4
#SBATCH --gpus-per-task=1

CMD=/path/to/script.sh
srun easy_ngc --cmd ${CMD} nvcr.io/nvidia/pytorch:20.12-py3
Remember to change the partition, memory, time and the other values to your own parameters.
2. Create the shell script to run (/path/to/script.sh). Everything relevant should go inside this script, because the value passed to --cmd can't be more than one word! For example, ```script.sh -flag``` will NOT work:
#!/bin/bash
source ~/.bashrc
cd /path/to/python
python awesome.py --param1 --param2 "lorem ipsum"
echo 'Done'
3. Verify that the script can be executed:
> chmod ug+rx /path/to/script.sh
4. Run:
> sbatch run.slurm
If you get an error that looks like this:
usage: easy_ngc_impl [-h] [--cmd CMD] [--version VERSION] [--modules MODULES]
...
--ssh_key_prefix ...
...
It is most probably because you used more than one word for the --cmd flag. There is no way to work around that except to put everything into the script of step 2. So go back to step 2 and fix your command accordingly. For clarity, these lines:
FULL_CMD=${SCRIPT_PATH}/${SCRIPT_NAME} \
    --flag -f 'string with spaces'
srun easy_ngc --cmd ${FULL_CMD} nvcr.io/nvidia/pytorch
Will cause a problem. Change it like this - inside script.sh:
/path/script.sh --flag parameter1 parameter2
And inside run.slurm:
FULL_CMD=${SCRIPT_PATH}/script.sh
srun easy_ngc --cmd ${FULL_CMD} nvcr.io/nvidia/pytorch
DL containers from Nvidia (ngc):
> srun --gpus=2 --pty easy_ngc \
    nvcr.io/nvidia/tensorflow:20.11-tf2-py3
The srun command works similarly to sbatch, but runs the job in interactive (blocking) mode. The --pty easy_ngc option tells slurm to run the job in a container, which emulates a shell in a safe environment (like Docker). Exiting the container (ctrl+d) ends the job and releases the resource. Besides the --pty option, srun and sbatch share almost all other options, such as --gpus.
srun documentation: https://slurm.schedmd.com/srun.html
> srun -G 3 --pty easy_ngc --cmd nvidia-smi nvcr.io/nvidia/pytorch
Use 3 GPUs and run the 'nvidia-smi' command using container with latest version of pytorch
> srun -G 2 --pty easy_ngc --jupyter mxnet
Use 2 GPUs and run jupyter notebook server using container with latest version of mxnet
> srun -G 5 --mem 120G --pty easy_ngc \
    nvcr.io/nvidia/tensorflow:19.03-py3
Use 5 GPUs, allocate 120GB of system memory and use the March 2019 release of Tensorflow NGC container with python 3
> srun -G 2 --pty easy_ngc --modules=imagehash \
--packages=julia nvcr.io/nvidia/pytorch
Use 2 GPUs, run a container with the latest version of pytorch, apt-get install julia and pip install imagehash inside the container
> srun -G 2 --pty easy_ngc \
--modules=/home_dir/requirements.txt \
nvcr.io/nvidia/tensorflow:20.11-tf2-py3
Use 2 GPUs, run container with latest version of tensorflow, pip install list of python packages with latest/specific version (You need to create your own requirements.txt file)
> srun -G 3 --pty easy_ngc \
nvcr.io/hpc/vmd:cuda9-ubuntu1604-egl-1.9.4a17
Pull the VMD HPC container from NGC and run a command line on it (with 3 GPUs)
> srun -G 5 --pty easy_ngc \
--cmd 'nvidia-smi' docker.io/library/ubuntu:19.10
Pull generic Ubuntu 19.10 container from Dockerhub and run nvidia-smi on it (with 5 GPUs)
One might need to use a tool that is not installed on the school servers, for example R, gcc-11, python3.9 (without Anaconda) and so on. For that, a tool named udocker is installed. Some examples of using udocker with Slurm:
Interactive session of R
udocker pull rocker/r-base
udocker create --name=r-container rocker/r-base
srun -p killable --pty udocker run r-container
R script - two options (you can also put these lines inside a slurm script):
srun -p killable udocker run \
    --volume=/directory/of/R/script:/name/you/choose \
    r-container R --vanilla -f /name/you/choose/script.r

srun -p killable udocker run \
    --volume=/directory/of/R/script:/name/you/choose \
    r-container Rscript /name/you/choose/script.r
Example of a slurm script that runs R container:
#!/bin/bash
#SBATCH --job-name=awesome
#SBATCH --output=awesome.out
#SBATCH --partition=cpu-killable
srun udocker run --bindhome r-container Rscript ~/script.r
NOTE: when using Rscript, it is recommended to put the following line as first line in the script:
#!/usr/bin/Rscript
NOTE: You might need to change the permissions of the script, like this:
chmod ug+rx /path/to/script.r
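The same pattern works for the other tools mentioned above. For example, a hypothetical Python 3.9 setup (the image comes from Docker Hub; paths and the partition are placeholders):
udocker pull python:3.9
udocker create --name=py39-container python:3.9
srun -p cpu-killable udocker run \
    --volume=/directory/of/your/code:/code \
    py39-container python3 /code/script.py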
Here is an example of condor '.cmd' file and a matching '.slurm' script. An explanation about the conversion process appears after the example.
Condor .cmd file:
DIR = $ENV(HOME)/condor
Executable = $(DIR)/sample1
Log = sample1.log
Error = sample1.error.$(Process)
Output = sample1.output.$(Process)
notification = Always
Universe = Vanilla
Queue 4
A matching .slurm script:
#!/bin/bash
#SBATCH --job-name=sample1
#SBATCH --output=sample1.output.%A.%a    # %j can be useful
#SBATCH --error=sample1.error.%A.%a
#SBATCH --partition=studentkillable
#SBATCH --mail-type=ALL,TIME_LIMIT_80    # notification
#SBATCH --time=1440                      # minutes
#SBATCH --array=0-3                      # not the same as --ntasks=4
#SBATCH --gres=gpu:1

DIR=$HOME/condor
EXE=${DIR}/sample1
srun ${EXE}
Run the job with:
> sbatch <filename>.slurm
NOTE: sample1 execute bit has to be set:
> chmod ug+rx ~/condor/sample1
Useful environment variables are listed here:
https://slurm.schedmd.com/sbatch.html#SECTION_OUTPUT-ENVIRONMENT-VARIABLES
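For example, a few of these variables used inside a job-array script (a sketch; the job name and paths are placeholders):
#!/bin/bash
#SBATCH --job-name=env-demo
#SBATCH --partition=studentkillable
#SBATCH --array=0-3
echo "job id:        ${SLURM_JOB_ID}"
echo "array task id: ${SLURM_ARRAY_TASK_ID}"
echo "node list:     ${SLURM_JOB_NODELIST}"
srun ${HOME}/condor/sample1 ${SLURM_ARRAY_TASK_ID}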
Explanation of the conversion process:
The conversion process has 4 parts: 1. Direct conversion, 2. Addition of missing commands/instructions, 3. Elimination of unnecessary lines and 4. Re-ordering of the slurm script to its final form.
Note that in the slurm script the spaces around '=' were removed and the round brackets were replaced with curly brackets. That is because slurm scripts are also valid bash scripts.
Here is a table for the Direct conversion:
CMD file | slurm file |
---|---|
DIR = $ENV(HOME)/condor Executable = $(DIR)/sample1 | EXE="~/condor/sample1" |
Error = <filename>.err.$(Process) | #SBATCH --error=<filename>.err.%a |
Output = <filename>.out.$(Process) | #SBATCH --output=<filename>.out.%a |
Arguments = $(Process) | ARGS=${SLURM_ARRAY_TASK_ID} |
notification = Always | #SBATCH --mail-type=ALL |
Queue 100 | #SBATCH --array=0-99 |
Usually, one needs to add the following to the slurm script:
#!/bin/bash # as a first line
#SBATCH --account=<relevant-account>
#SBATCH --partition=<relevant-partition>
#SBATCH --gres=gpu:n
#SBATCH -c <number of requested CPUs> # optional
srun ${EXE} ${ARGS}
These lines have no direct replacement and should be removed (except in special cases):
Log = sample.log          # no direct equivalent; the same info can be obtained with
                          #   scontrol show job <job-number>
                          # or
                          #   sacct -j <job-number>
Universe = Vanilla # should be removed
Finally, check that the slurm script is well formed:
1. The line starting with #! (shebang) should be the first line of the file.
2. After it should come all the lines that start with '#SBATCH'.
3. Then the variable definitions.
4. Finally, the run line, i.e. the line that starts with 'srun'.
The example we saw at the beginning shows exactly this.
Possible errors
These are usually caused by not defining the right account/partition, running on the wrong cluster, or not having the required permissions or resources.
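A quick way to see why a job is stuck or was rejected, using standard Slurm commands (a sketch):
> squeue --me -o "%.10i %.12P %.8T %r"   # %r shows the reason a job is pending or failed
> scontrol show job <jobid>              # full job record, including Reason= and the account/partition used
> sacctmgr -P show user -s $USER         # verify which accounts and partitions your user may use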