Hi,
I sometimes encounter the following error.
/var/spool/slurm/d/job55019920/slurm_script: line 16: 13119 Killed python functions_to_select_pages.py --sample sample7500
slurmstepd: error: Detected 3 oom-kill event(s) in step 55019920.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
When I run my code (a Python script) on its own, I don't think the error occurs.
It seems to happen only when I submit multiple jobs that run the same code.
What does the error mean? Is there a way to trace it and understand why it happens?
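To help trace this, here is a minimal sketch (hypothetical, not part of functions_to_select_pages.py as it stands) of how the script could log its own peak memory use. It assumes a Linux compute node and uses only Python's standard `resource` and `atexit` modules; the helper names `peak_rss_mib` and `report_peak_memory` are made up for illustration.

```python
import atexit
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size (RSS) in MiB.

    ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def report_peak_memory() -> None:
    # Print to stderr so the value ends up in the Slurm job's output/error file.
    print(f"peak RSS: {peak_rss_mib():.1f} MiB", file=sys.stderr, flush=True)


# Hypothetical hook: report the peak when the script exits normally.
atexit.register(report_peak_memory)
```

A process killed by the out-of-memory handler receives SIGKILL and never runs its `atexit` hooks, so this only reports the peak for runs that finish. Comparing those numbers across different `--sample` values with the memory requested for the job may still show how close each run gets to the limit. Slurm's `sacct` command can also report per-job `MaxRSS` and `ReqMem` after the fact, if accounting is enabled on the cluster.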
Thank you!