How can one determine the amount of RAM a node has in an HPC environment?
The best way is to consult your site’s documentation.
On SLURM, one can do this in two steps:
- invoke sinfo to see the list of nodes and their states
- invoke srun to run the free command on the desired compute node. For example, for a node named r001, invoke: srun -w r001 free
Here is a real example from the PSC Bridges supercomputer:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
RM* up 2-00:00:00 1 drain* r242
RM* up 2-00:00:00 1 comp r400
RM* up 2-00:00:00 1 drain r668
RM* up 2-00:00:00 9 resv r[405-412,670]
RM* up 2-00:00:00 702 alloc r[006-241,243-399,401-404,413-667,669,671-719]
RM-shared up 2-00:00:00 21 mix r[720-721,733-747,749-752]
RM-shared up 2-00:00:00 3 alloc r[723-724,748]
RM-shared up 2-00:00:00 9 idle r[722,725-732]
...
Nodes in the "alloc", "drain", and "resv" states cannot be reached, so let's check the free memory on r720 (it is in the non-default partition RM-shared, so we have to specify that):
$ srun -p RM-shared -w r720 free
total used free shared buff/cache available
Mem: 131734464 13434632 107366248 560604 10933584 116069696
Swap: 17591292 2375984 15215308
Looks like we have 128 GB of total RAM. That matches the system configuration page (r720 is one of the regular-memory nodes): https://www.psc.edu/bridges/user-guide/system-configuration
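Note that free reports sizes in kibibytes by default, so 131734464 KiB is roughly 125.6 GiB, which is consistent with a 128 GiB node once memory reserved by the kernel and firmware is accounted for. If the free on the node supports it (most current procps versions do), the -h flag prints human-readable units:
$ srun -p RM-shared -w r720 free -h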
On Grid Engine family systems, you can pull up what the queue scheduler sees with qhost.
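For example (a sketch; the exact columns vary between Grid Engine versions), you can restrict the output to a single execution host with -h, and the MEMTOT column shows that host's total physical memory (replace r001 with one of your own host names):
$ qhost -h r001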
If you are inspecting directly and need more detail than free gives, most systems will also let you run other tools, such as lshw, on the job node.
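For instance, assuming lshw is installed on the compute node, a quick way to summarize the memory devices (full per-DIMM details usually require root) is something like this, reusing the r720 node from the Slurm example above:
$ srun -p RM-shared -w r720 lshw -short -C memory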
The method depends on the resource manager or scheduler in use at your site. With Slurm you can quickly see this in the output of
scontrol show node $NODENAME
Replace $NODENAME with the actual name of the node you're interested in. If you leave the node name off, you get a listing that includes all nodes.
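For example, on the r720 node from above, you can pick out the memory-related fields; RealMemory is the memory configured for the node in megabytes, and FreeMem is the free memory Slurm last recorded:
$ scontrol show node r720 | grep -i mem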