How can I use SLURM’s sacct command to show memory usage statistics on a cluster that I am administering?

I want to see the memory footprint of all jobs currently running on a cluster that uses the SLURM scheduler. When I run the sacct command, the output does not include any memory usage information. The man page for sacct shows a long and somewhat confusing array of options, and it is hard to tell which one is best.


This topic has been addressed on Stack Overflow at:

Rephrased and enhanced by me:

As stated in the sacct man page:

sacct - displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database

The man page documents the options and output formatting, but, as the Stack Overflow answer points out, the fields you probably need are MaxRSS (the maximum resident set size of all tasks in the job) and CPUTime, selected with the --format option.

For example:

[battelle@mio001 ~]$ sacct -j 4296946.batch --format="CPUTime,MaxRSS"
   CPUTime     MaxRSS 
---------- ---------- 
  00:08:00    669060K 

Here the job ID is 4296946. I appended .batch because I submitted the job in batch mode and it has no other job steps; the usage of the batch script itself is recorded under the .batch step.
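If you need these numbers in a script rather than on screen, sacct's --parsable2 (-P) and --noheader (-n) options produce '|'-delimited output without a header, which is easy to parse. The following is a minimal Python sketch, not a definitive tool: the function names are my own, and it assumes that Slurm's K/M/G/T memory suffixes are multiples of 1024 and that times use the [days-]hours:minutes:seconds format shown above.

```python
import subprocess

# Assumption: Slurm memory suffixes are binary multiples (K = 1024 bytes).
MEM_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def mem_to_bytes(value):
    """Convert a MaxRSS string such as '669060K' to bytes (None if blank)."""
    value = value.strip()
    if not value:
        return None  # sacct leaves the field empty when nothing was recorded
    if value[-1] in MEM_UNITS:
        return int(float(value[:-1]) * MEM_UNITS[value[-1]])
    return int(float(value))  # no suffix: treat as plain bytes


def time_to_seconds(value):
    """Convert a Slurm time such as '00:08:00' or '1-02:03:04' to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    while len(parts) < 3:  # 'MM:SS' style values have no hours field
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def job_step_usage(job_step):
    """Return (CPUTime in s, MaxRSS in bytes) for e.g. '4296946.batch'.

    Requires sacct on the PATH, i.e. a node of the Slurm cluster.
    """
    out = subprocess.run(
        ["sacct", "-j", job_step, "--format=CPUTime,MaxRSS",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    cpu_time, max_rss = out.splitlines()[0].split("|")
    return time_to_seconds(cpu_time), mem_to_bytes(max_rss)
```

With the values from the example above (00:08:00 and 669060K), job_step_usage("4296946.batch") would return (480.0, 685117440).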

An alternative to sacct is sstat, which reports on running jobs only; the job must still be running for this command to produce any output.
Example:

[battelle@mio001 tbDocu52a]$ sstat -j 4296949.batch --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID
    AveCPU   AvePages     AveRSS  AveVMSize        JobID 
---------- ---------- ---------- ---------- ------------ 
 00:29.000          0    761920K  10872144K 4296949.bat+ 

The fields are analogous to those of sacct; for example, AveVMSize is the average virtual memory size of all tasks in the job.
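Because sstat only sees running jobs, one way to get a live snapshot of the whole cluster is to ask squeue for the IDs of running jobs and then call sstat on each of them. The following is a rough Python sketch with hypothetical function names, under these assumptions: it runs as root or a Slurm administrator (ordinary users may only be able to query their own jobs), squeue's %A format prints one job ID per line, and your Slurm version supports sstat's --allsteps option.

```python
import subprocess

# Fields requested from sstat; the same names are used as dict keys below.
FIELDS = ["JobID", "AveCPU", "AveRSS", "MaxRSS", "AveVMSize"]


def running_job_ids():
    """Return the IDs of all running jobs via squeue (requires Slurm)."""
    out = subprocess.run(
        ["squeue", "--noheader", "--states=RUNNING", "--format=%A"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def sstat_command(job_id):
    """Build an sstat call that reports every step of one running job."""
    return ["sstat", "--allsteps", "--jobs", job_id,
            "--parsable2", "--noheader", "--format=" + ",".join(FIELDS)]


def parse_rows(text):
    """Split '|'-delimited sstat output into one dict per job step."""
    return [dict(zip(FIELDS, line.split("|")))
            for line in text.splitlines() if line.strip()]


def snapshot():
    """Collect live memory statistics for all running jobs."""
    rows = []
    for job_id in running_job_ids():
        result = subprocess.run(sstat_command(job_id),
                                capture_output=True, text=True)
        if result.returncode == 0:  # the job may have ended in the meantime
            rows.extend(parse_rows(result.stdout))
    return rows
```

Calling snapshot() returns one dict per job step, keyed by the sstat field names, so the memory columns can be printed or sorted as needed. Jobs that finish between the squeue and sstat calls are simply skipped.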

At our site we also run Ganglia on the head node, and all users can access it; it shows many properties, including real-time memory use for a given node.

ANSWER:

As an addendum to my previous reply: you can also view this information for all jobs on the cluster by adding the -a (--allusers) option. Here is one example (the actual output is omitted; it looks like the output for a single job):

[battelle@mio001 ~]$ sacct -a --format="CPUTime,MaxRSS,JobID"
   CPUTime     MaxRSS        JobID 
---------- ---------- ------------
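To turn the cluster-wide listing into a ranking of memory footprints, the sacct output can be reduced to the largest MaxRSS reported for any step of each job. This is a minimal Python sketch with hypothetical function names, assuming binary K/M/G/T suffixes. Two caveats: the per-job value is the peak of a single step, not the sum of steps that ran at the same time, and for jobs that are still running sacct may leave MaxRSS blank until a step finishes (sstat covers that case), so blank values are skipped.

```python
import subprocess

# Assumption: binary suffixes, as Slurm uses for memory (K = 1024 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(value):
    """Convert '669060K' to bytes; a blank field (no data recorded) gives None."""
    value = value.strip()
    if not value:
        return None
    if value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(float(value))


def sacct_all_jobs():
    """Fetch JobID|MaxRSS rows for every user's jobs (needs sacct and admin rights)."""
    return subprocess.run(
        ["sacct", "--allusers", "--format=JobID,MaxRSS",
         "--parsable2", "--noheader"],
        capture_output=True, text=True, check=True,
    ).stdout


def peak_rss_per_job(text):
    """Return [(job_id, peak_bytes), ...] sorted from largest to smallest."""
    peaks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_step, max_rss = line.split("|")
        job = job_step.split(".")[0]  # '4296946.batch' -> '4296946'
        rss = to_bytes(max_rss)
        if rss is not None:  # skip steps with no recorded MaxRSS
            peaks[job] = max(rss, peaks.get(job, 0))
    return sorted(peaks.items(), key=lambda item: item[1], reverse=True)
```

For example, peak_rss_per_job(sacct_all_jobs())[:10] lists the ten jobs with the largest recorded MaxRSS in sacct's default time window.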