I want to see the memory footprint of all jobs currently running on a cluster that uses the SLURM scheduler. When I run the sacct command, the output does not include information about memory usage. The man page for sacct shows a long and somewhat confusing array of options, and it is hard to tell which one is best.
This topic has been addressed on Stack Overflow.
Rephrased and enhanced by me:
As stated in the sacct man page:
sacct - displays accounting data for all jobs and job steps in the Slurm job accounting log or Slurm database
The man page offers help with options and output formatting, but as stated in the Stack Overflow response, MaxRSS and CPUTime are probably the fields you need.
For example:
[battelle@mio001 ~]$ sacct -j 4296946.batch --format="CPUTime,MaxRSS"
CPUTime MaxRSS
---------- ----------
00:08:00 669060K
Here the job ID is 4296946. I added the .batch suffix because the job was submitted in batch mode and has no additional job steps.
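If you are unsure which fields sacct can report, it can list them for you. A minimal sketch, assuming sacct is on your PATH; the grep at the end is just one way to narrow the list to memory-related fields:

```shell
# Print every field name sacct knows, one per line, and keep
# only those whose names mention RSS or memory (MaxRSS, ReqMem, ...).
sacct --helpformat | tr -s ' \n' '\n\n' | grep -i -E 'rss|mem'
```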
An alternative to sacct is sstat; note that the job must be running for this command to produce output.
Example:
[battelle@mio001 tbDocu52a]$ sstat -j 4296949.batch --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID
AveCPU AvePages AveRSS AveVMSize JobID
---------- ---------- ---------- ---------- ------------
00:29.000 0 761920K 10872144K 4296949.bat+
The fields are analogous to those of sacct; AveVMSize refers to the average virtual memory size of all tasks in the job.
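To check every running job you own without typing IDs by hand, squeue can feed sstat. A hedged sketch, assuming a working Slurm installation and batch-submitted jobs (so the .batch step exists, as above):

```shell
# For each of the current user's running jobs, print its ID and the
# average/maximum resident set size. squeue's -h suppresses its header,
# %A prints the bare job ID, and sstat's -n suppresses its header too.
for jobid in $(squeue -u "$USER" -h -t RUNNING -o "%A"); do
    sstat -j "${jobid}.batch" --format=JobID,AveRSS,MaxRSS -n
done
```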
At our site we run a program called Ganglia on the head node, which all users can access; it shows many job properties, including real-time memory use for a given node.
ANSWER:
As an addendum to my previous reply: one can also view this information for all jobs on the cluster. Here is one example of how to do so (without the actual output, which is similar to that for a specific job):
[battelle@mio001 ~]$ sacct -a --format="CPUTime,MaxRSS,JobID"
CPUTime MaxRSS JobID
---------- ---------- ------------
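The same cluster-wide query can be narrowed by time and made script-friendly. A sketch, assuming GNU date is available and that pipe-delimited output (-P) is acceptable for your parsing:

```shell
# All jobs on the cluster that started since yesterday, printed as
# pipe-delimited records (easier to parse than fixed-width columns).
sacct -a -S "$(date -d 'yesterday' +%F)" \
      --format=JobID,User,CPUTime,MaxRSS -P
```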