I want to be able to list the nodes in a Slurm-managed cluster along with specific properties: how many cores, which processor, how much memory, whether it has a GPU, and which features are available. How do I do that?
CURATOR: Kristina Plazonic KrisP
ANSWER: The short answer is:
sinfo -o "%20N %10c %10m %25f %10G "
You can see the options of sinfo by running sinfo --help. In particular, sinfo -o specifies the format of the output, and the fields used above are:
%N = node name
%c = number of cores
%m = memory (in MB)
%f = features, often the architecture or the type of associated GPU
%G = GRES type and count, e.g. gpu:2
The %20 reserves 20 characters for that field. For example, for easy import into a Confluence page, you would want to separate the fields with |, so your command would be
sinfo -o "|%20N | %10c | %10m | %25f | %10G|"
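If you find yourself changing the column list often, a small helper can assemble the pipe-delimited format string for you. This is only a sketch: make_sinfo_format is a hypothetical name of my own (not part of Slurm), and it assumes each argument is a width followed by one of the sinfo field letters listed above.

```shell
# make_sinfo_format is a hypothetical helper (not part of Slurm).
# Each argument is "<width><letter>", e.g. 20N or 10c; the result is a
# pipe-delimited format string that can be passed to `sinfo -o`.
make_sinfo_format() {
    fmt="|"
    for spec in "$@"; do
        fmt="${fmt} %${spec} |"
    done
    printf '%s\n' "$fmt"
}

# Only call sinfo where it is installed (e.g. on a cluster login node).
if command -v sinfo >/dev/null 2>&1; then
    sinfo -o "$(make_sinfo_format 20N 10c 10m 25f 10G)"
fi
```

Adding or dropping a column is then a matter of editing the argument list rather than the quoted format string.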
+1! And here is an example of the output above:
$ sinfo -o "%20N %10c %10m %25f %10G "
NODELIST             CPUS       MEMORY     AVAIL_FEATURES            GRES
sh-01-[01-36],sh-02- 16         64000+     CPU_GEN:IVB,CPU_SKU:E5-26 (null)
sh-03-01,sh-07-[25-3 16         64000+     CPU_GEN:HSW,CPU_SKU:E5-26 (null)
sh-101-[01-58,61-72] 20         128000+    CPU_GEN:BDW,CPU_SKU:E5-26 (null)
sh-112-01            56         3072000    CPU_GEN:BDW,CPU_SKU:E5-46 (null)
sh-02-[13-14]        32         1536000    CPU_GEN:SNB,CPU_SKU:E5-46 (null)
sh-112-[02-03],sh-11 32         512000+    CPU_GEN:BDW,CPU_SKU:E5-26 (null)
sh-09-[03-05],sh-13- 16         64000+     CPU_GEN:IVB,CPU_SKU:E5-26 gpu:8
sh-112-[04,06-07],sh 20         256000+    CPU_GEN:BDW,CPU_SKU:E5-26 gpu:4
sh-113-[12-14],sh-11 20         256000     CPU_GEN:BDW,CPU_SKU:E5-26 gpu:4
sh-09-[01-02]        16         256000     CPU_GEN:IVB,CPU_SKU:E5-26 gpu:8
sh-112-05,sh-113-08  20         256000     CPU_GEN:BDW,CPU_SKU:E5-26 gpu:4
sh-17-29,sh-27-[21,3 16         128000+    CPU_GEN:HSW,CPU_SKU:E5-26 gpu:8
sh-103-[25-36],sh-10 24         191000+    CPU_GEN:SKX,CPU_SKU:5118, (null)
sh-09-[07-10]        16         64000      CPU_GEN:IVB,CPU_SKU:E5-26 gpu:4
sh-15-[01-08],sh-16- 16         128000+    CPU_GEN:IVB,CPU_SKU:E5-26 gpu:8
sh-17-12,sh-25-23    48         1536000    CPU_GEN:HSW,CPU_SKU:E7-48 (null)
sh-17-[23-28]        32         256000     CPU_GEN:HSW,CPU_SKU:E5-26 (null)
sh-28-10             20         256000     CPU_GEN:HSW,CPU_SKU:E5-26 (null)
sh-112-[08-12],sh-11 20         128000+    CPU_GEN:BDW,CPU_SKU:E5-26 gpu:4
sh-112-[13-17],sh-11 20         256000+    CPU_GEN:BDW,CPU_SKU:E5-26 gpu:8
sh-114-[01-04]       20         256000     CPU_GEN:BDW,CPU_SKU:E5-26 gpu:4
sh-15-[09-10],sh-19- 16         128000     CPU_GEN:HSW,CPU_SKU:E5-26 gpu:8
sh-18-[01-10]        24         96000      CPU_GEN:HSW,CPU_SKU:E5-26 (null)
sh-18-11             28         128000     CPU_GEN:HSW,CPU_SKU:E5-26 (null)
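If you only care about the GPU nodes in a listing like this, you can filter on the last column. A minimal sketch, assuming nodes without GRES show up as (null) the way they do in the output above (other sites or Slurm versions may print something different); gpu_nodes is a hypothetical helper name, not a Slurm command.

```shell
# gpu_nodes is a hypothetical filter (not part of Slurm). It reads sinfo
# output on stdin, keeps the header line, and keeps every row whose last
# column (GRES) is something other than "(null)".
gpu_nodes() {
    awk 'NR == 1 || $NF != "(null)"'
}

# Only call sinfo where it is installed (e.g. on a cluster login node).
if command -v sinfo >/dev/null 2>&1; then
    sinfo -o "%20N %10c %10m %25f %10G " | gpu_nodes
fi
```

Because awk splits on runs of whitespace, the padding that sinfo adds between columns does not affect the filter.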
If you want to do a custom format, take a look at the formatting examples and options for sinfo on this page.
One thing I always had a hard time with (and I don't have a good answer beyond looking at cluster-specific documentation) is "What in the heck do the features actually mean?" The best I could do was poke around in the Slurm configuration file at /etc/slurm/slurm.conf. I think comments like these are non-standard (and not found across clusters); ours exist because our cluster admin is a badass. But they helped explain the features listed by sinfo.
# -- Nodes --------------------------------------------------------------------
# >> Nodes features
#
# -- CPU features
# * CPU_GEN: CPU generation: SNB|IVB|HSW|BDW|SKX|CNL
# * CPU_SKU: CPU model : E5-2640v2
# * CPU_FRQ: CPU frequency : 2.60GHz
#
# -- GPU features
# * GPU_GEN: GPU generation: KPL|MXW|PSC|VLT
# * GPU_BRD: GPU brand : GEFORCE|TESLA
# * GPU_SKU: GPU model : TITAN_{BLACK,X,Xp}|TESLA_{K{20,80},P40,P100}
# * GPU_MEM: GPU memory : 8GB
# * GPU_CC: GPU Compute Capability : 3.5|3.7|6.1
# >> Weights
#
# * Nodes with lower weight will be selected first.
# * Nodes with more memory or GRES are given a higher weight, so they're
# selected last and saved for jobs that really need it
# * Nodes with more recent CPU generation will be selected first to give the
# best performance
#
# Weight mask: 1 | #GRES | Memory | #Cores | CPUgen | 1
# prefix is to avoid octal conversion
# suffix is to avoid having null weights
#
# Values:
#   #GRES    none: 0    Memory    64 GB: 0    #Cores  16: 0    CPUgen  ???: 3
#           1 GPU: 1              96 GB: 1            20: 1            CNL: 4
#           2 GPU: 2             128 GB: 2            24: 2            SKX: 5
#           3 GPU: 3             192 GB: 3            28: 3            BDW: 6
#           4 GPU: 4             256 GB: 4            32: 4            HSW: 7
#           6 GPU: 5             384 GB: 5            48: 5            IVB: 8
#           8 GPU: 6             512 GB: 6            56: 6            SNB: 9
#          10 GPU: 7            1024 GB: 7
#          16 GPU: 8            1534 GB: 8
#                               3072 GB: 9
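Before trying to decode the tags, it can help to see the full set your cluster actually advertises, one per line, so you know what to look up. A minimal sketch, assuming features are comma-separated as in the sinfo output earlier in this thread; unique_features is a hypothetical helper name, and -h just suppresses the sinfo header line.

```shell
# unique_features is a hypothetical helper (not part of Slurm). It reads
# comma-separated feature lists on stdin and prints each distinct
# feature once, sorted.
unique_features() {
    tr ',' '\n' | sed '/^$/d' | sort -u
}

# Only call sinfo where it is installed (e.g. on a cluster login node).
# "%f" without a width is used so the feature lists are printed in full.
if command -v sinfo >/dev/null 2>&1; then
    sinfo -h -o "%f" | unique_features
fi
```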
All that said, knowing that "HSW" is a kind of CPU generation doesn't really help me much. I would never know how or when it is appropriate to ask for these features. Does anyone else have thoughts on this?