Slurm, GPU, CGroups, ConstrainDevices

Has any site managed to make Slurm + ConstrainDevices + cgroups work well enough that it actually prevents people from manually setting CUDA_VISIBLE_DEVICES and double-booking processes on a GPU?

I can see the cgroup constraints working for other things (CPU, etc.):

cat /proc/$$/cgroup

but so far not for the GPUs. Slurm tracks the allocation correctly: one job requesting 1 GPU on node A (which has 2 GPUs) gets CUDA_VISIBLE_DEVICES=0 as expected, and a second job requesting 1 GPU on node A also gets CUDA_VISIBLE_DEVICES=0, again as expected.

Unfortunately they both end up running on the same GPU, so the proper device is not being assigned. On top of that, nothing stops people from manually setting CUDA_VISIBLE_DEVICES to another GPU, which interferes with other users.
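
For reference, one way to see whether the device cgroup is actually being applied to a job is to look at the devices controller for that job's cgroup. This is a sketch assuming cgroup v1 and Slurm's default cgroup layout, so the exact path may differ on other setups:

# From inside the job: list the device rules this job's cgroup allows.
# An unconstrained job shows something like "a *:* rwm"; a constrained one
# lists only the permitted devices (the /dev/nvidiaN files are major 195).
cat /sys/fs/cgroup/devices/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/devices.list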

Certainly with all the GPU sites out there, someone has made the configs cooperate???

Are you using the NVIDIA persistence daemon or legacy mode?

We are seeing the same, or a very similar, problem to what you describe here, so I don't know for sure that I have the answer you need. However, I believe I know what change I need to make to get this working. I just want to notify our users before we make the change, because I believe it will break some of their current workflows: we have users who are not requesting --gres=gpu but are still using the GPUs (see the sketch below).
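
To illustrate the difference (hypothetical commands with a made-up partition name, not output from our cluster): once ConstrainDevices=yes is enforced, a job that does not request a GPU GRES gets no GPU device files at all, so those workflows would stop seeing the cards.

srun --partition=gpu nvidia-smi -L                 # no --gres=gpu: expect no usable GPUs
srun --partition=gpu --gres=gpu:1 nvidia-smi -L    # with a GRES request: only the allocated GPU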

I believe that the fix is to make sure you have the following line in your cgroup.conf

ConstrainDevices=yes

If you already have that set then we may need to hear from others on what else needs to be done.
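
One practical note, assuming a typical setup: cgroup.conf is read by slurmd, so after adding the line it needs to be distributed to the compute nodes and slurmd restarted there, e.g.:

systemctl restart slurmd     # on each GPU node, after updating cgroup.conf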

We are running Slurm 20.02.6 (via Bright Cluster Manager 9.0) on RHEL 8.1, and it seems to work correctly for us.

Define GRES for GPUs; in /etc/slurm/slurm.conf have:

GresTypes=gpu
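
The node definitions in slurm.conf also need a matching Gres= entry so the scheduler knows how many GPUs each node has. A sketch, assuming the same nodes and GPU type as the gres.conf below:

# CPUs, RealMemory, etc. omitted here -- they are site specific
NodeName=gpu[001-012] Gres=gpu:v100:4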

In /etc/slurm/gres.conf have:

NodeName=gpu[001-012] Name=gpu Type=v100 File=/dev/nvidia0 Cores=0,1,2,3,4,5,6,7,8,9,10,11
NodeName=gpu[001-012] Name=gpu Type=v100 File=/dev/nvidia1 Cores=12,13,14,15,16,17,18,19,20,21,22,23
NodeName=gpu[001-012] Name=gpu Type=v100 File=/dev/nvidia2 Cores=24,25,26,27,28,29,30,31,32,33,34,35
NodeName=gpu[001-012] Name=gpu Type=v100 File=/dev/nvidia3 Cores=36,37,38,39,40,41,42,43,44,45,46,47

In /etc/slurm/cgroup.conf, have:

CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
TaskAffinity=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=yes
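
For the device constraint to actually be enforced, slurm.conf also has to load the cgroup task plugin (and typically the cgroup proctrack plugin); roughly the following, though some sites also list task/affinity in TaskPlugin instead of using TaskAffinity in cgroup.conf:

# ConstrainDevices is enforced by the task/cgroup plugin
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup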

In a simple test, a job requesting two GPUs sees both devices:

$ srun --time=1:00:00 --partition=gpu --gres=gpu:2 --pty /bin/bash
gpu007$ nvidia-smi
Tue May 11 17:40:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:18:00.0 Off |                    0 |
| N/A   35C    P0    44W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000000:3B:00.0 Off |                    0 |
| N/A   37C    P0    71W / 300W |      0MiB / 32510MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
gpu007$ env |grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0,1
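
A follow-up check along the same lines (sketch only, not captured output): a job that requests a single GPU should see exactly one device, and the other /dev/nvidiaN files should be blocked by the device cgroup no matter what CUDA_VISIBLE_DEVICES is set to.

$ srun --time=1:00:00 --partition=gpu --gres=gpu:1 --pty /bin/bash
gpu007$ nvidia-smi -L                     # expect exactly one GPU listed
gpu007$ env | grep CUDA_VISIBLE_DEVICES   # expect CUDA_VISIBLE_DEVICES=0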