How can I view the logs for my SLURM jobs over some period of time? I ran a number of jobs and would like to know how much memory each job used, how long it took, and its exit status.
CURATOR: Katia
ANSWER:
The sacct command is a good way to extract information about previous SLURM jobs. Its man page ($ man sacct) lists and explains the options for selecting which jobs to view and for specifying which fields to output and in what format.
The options -S and -E set the start and end of the time window, respectively. -S selects jobs in any state (e.g. COMPLETED, FAILED) after the given time, and -E selects jobs in any state before the given time. Together they select jobs that were active at some point within the window, so a job that started before the start time but was still running after it is included. The -o option lists the fields to display, and -a includes jobs from all users. Here is an example:
$ sacct -a -S2018-03-15-10:30 -E2018-03-31-10:30 -X -o jobid,start,end,state
A sample output is below:
4194000 2018-03-16T00:58:26 2018-03-16T02:20:38 COMPLETED
4194001 2018-03-16T02:20:43 2018-03-16T02:22:00 COMPLETED
4194002 2018-03-16T02:22:04 2018-03-16T02:23:07 COMPLETED
4194562 2018-03-15T07:41:25 2018-03-17T09:57:14 CANCELLED+
4194563 2018-03-15T07:43:40 2018-03-17T09:59:59 CANCELLED+
4194564 2018-03-15T07:48:45 2018-03-19T16:19:36 PREEMPTED
4194565 2018-03-15T07:49:45 2018-03-19T16:19:36 PREEMPTED
4194566 2018-03-15T07:51:10 2018-03-19T16:21:38 PREEMPTED
4194585 2018-03-15T09:22:37 2018-03-15T10:33:03 FAILED
4194586 2018-03-15T09:22:37 2018-03-15T10:31:44 FAILED
The -X option displays only cumulative stats for each job, leaving out the individual job steps.
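To get the memory use, run time, and exit status asked about in the question, the same approach works with a different field list. The sketch below wraps it in a small helper; the function name job_report is hypothetical, and it assumes your site records job accounting data and that $USER holds your login name. It uses the sacct fields elapsed, maxrss, and exitcode, and leaves out -X because MaxRSS is normally recorded per job step rather than on the allocation line.

```shell
# Hypothetical helper (not part of SLURM): report elapsed time, peak memory,
# and exit status for the calling user's jobs active between two dates.
# Usage: job_report START END
job_report() {
    # -X is deliberately omitted: MaxRSS is reported on the job-step lines,
    # so those lines are needed to see memory usage.
    sacct -u "$USER" -S "$1" -E "$2" \
        -o jobid,jobname,elapsed,maxrss,state,exitcode
}
```

For example, $ job_report 2018-03-15 2018-03-31 would list one line per job and job step in that window. The ExitCode column is printed as two numbers separated by a colon: the exit code, then the signal number if the job was terminated by a signal.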