Some of your processes may have been killed by the cgroup out-of-memory handler

Hi,

I sometimes encounter the following error.

/var/spool/slurm/d/job55019920/slurm_script: line 16: 13119 Killed python functions_to_select_pages.py --sample sample7500
slurmstepd: error: Detected 3 oom-kill event(s) in step 55019920.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

If I run my code (a Python file) on its own, I do not think the error occurs.
It happens when I run multiple jobs that execute the same code.

What does the error mean? Is there a way to trace it and understand why it happens?

Thank you!

Hi Antoine,

The error message shows that your job was killed because it used more memory than the limit requested for the job. While the job is running, you can use ‘top’ to watch in real time how much memory your Python program is using. To profile the code's memory usage in more detail, you can use the methods described in this article:

https://www.pluralsight.com/blog/tutorials/how-to-profile-memory-usage-in-python
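
As a quick first check before reaching for a full profiler, something like the following minimal sketch can report the peak memory allocated while a function runs. It uses the standard-library `tracemalloc` module. The names `peak_memory_mib` and `build_list` are hypothetical, and `build_list` is only a stand-in for the real work in your script. Note that `tracemalloc` only sees allocations made through Python's memory allocator, so the memory counted by the cgroup can be higher than the figure printed here.

```python
import tracemalloc


def peak_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 2**20


def build_list(n):
    # Hypothetical stand-in for the real workload in the script
    return [i for i in range(n)]


if __name__ == "__main__":
    result, peak = peak_memory_mib(build_list, 1_000_000)
    print(f"peak traced memory: {peak:.1f} MiB")
```

If the reported peak is close to, or above, the memory requested for the job, requesting more memory or reducing what the script holds in memory at once are the two obvious things to try.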

Ping
