I’m running a SageMath script I wrote, using GNU parallel for local parallelization, and I am now looking to scale up to a cluster that uses Slurm as its workload manager.
I am using a simple GNU parallel script locally:
#!/bin/bash
time parallel --timeout 10 -j$(nproc) -N0 …/sage ./loader.sage.py ::: {1…4000} --progress echo {} >/tmp/out
I’m not sure how to go about recreating this with Slurm, so I would appreciate any suggestions of resources and guides to look at. I would also like to implement an output file that uses a first-in-first-out semaphore and, ideally, an input file to read from. If any of this doesn’t make sense, I’m happy to clarify.
I’m not too familiar with GNU parallel or Sage, but this looks like it would work well as a Job Array, which Slurm supports.
Slurm thinks in terms of tasks. For you, each task would be an individual Sage process. Every task in the array runs the same script, but each one can do something different in that script based on its task ID, such as running on a different parameter or set of parameters. In my experience, creating a separate task for each parameter is a bad idea: it bogs down the scheduler and slows scheduling for everyone. You also end up spending a lot of time on startup (starting your program, loading packages if needed, and so on), which adds up if you have a lot of tasks. It’s best to batch up your parameters: pick a fixed number of tasks, say 4, and have each one iterate through its share of the parameters.
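To make the batching idea concrete, here is a minimal Python sketch of one way to split parameters across a fixed number of tasks. The function name params_for_task, the stride-based split, and the 1 to 4000 range (borrowed from the GNU parallel command in the question) are illustrative assumptions, not something Slurm provides.

```python
def params_for_task(params, task_id, num_tasks):
    """Return the parameters one task should handle.

    task_id is assumed to count from 0. Striding through the list
    (every num_tasks-th item) keeps the batches the same size to
    within one item.
    """
    return params[task_id::num_tasks]


params = list(range(1, 4001))  # same 1..4000 range as the local run
num_tasks = 4                  # chosen by you, not by the parameter count

batches = [params_for_task(params, t, num_tasks) for t in range(num_tasks)]
for t, batch in enumerate(batches):
    print(f"task {t}: {len(batch)} parameters")
```

Each parameter lands in exactly one batch, so the scheduler sees 4 tasks rather than 4000.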
I have some examples of this in a GitHub repo for a cluster at MIT. Take a look at this Python Example; from a quick look, Sage seems to use a lot of the same syntax as Python. You’ll want to look at both the Python code and the submission script. The trick is to first convert your code to one big for loop that iterates over the parameters you want to use. Then it’s a matter of adding about three lines to your code, and using a submission script like the one I have in the repo.
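As a rough illustration of the "one big for loop plus about three lines" pattern, here is a sketch; it is not the code from the repo. It assumes the job array is submitted with indices starting at 0 (for example 0 through 3) and reads the SLURM_ARRAY_TASK_ID and SLURM_ARRAY_TASK_COUNT environment variables that Slurm sets for array tasks. run_one is a hypothetical stand-in for the real per-parameter work.

```python
import os


def run_one(param):
    """Hypothetical stand-in for the real per-parameter computation."""
    return param * param


def run_task(params, env=os.environ):
    # Added line 1: which array task is this? (assumes indices start at 0)
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    # Added line 2: how many array tasks are there in total?
    num_tasks = int(env.get("SLURM_ARRAY_TASK_COUNT", "1"))
    # Added line 3: keep only this task's share of the parameters.
    my_params = params[task_id::num_tasks]

    # The one big for loop, now running over this task's share only.
    results = []
    for p in my_params:
        results.append((p, run_one(p)))
    return results


if __name__ == "__main__":
    results = run_task(list(range(1, 4001)))
    print(f"processed {len(results)} parameters")
```

Outside Slurm the defaults make the script behave as a single task that processes everything, which is handy for testing locally before submitting.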
Happy to answer any questions.
Thank you very much! I’m actually working on Northeastern’s Discovery cluster. Crossing my fingers this will work for both my Python bits and my SAGE stuff!