Hey HPC nerds! I’m putting together a little tutorial and introduction to job arrays (very simple), and the most important detail is having a list of compelling reasons to use them in the first place, say, over something standard like submitting each job separately with sbatch. Let’s put our heads together and think! I’m relatively new to using them, so my list is likely limited.
- Running a randomized simulation many times, with output files numbered 1…N: the array index serves as the variable that names each output file.
- Running an analysis over a set of inputs, where each input is named according to the array index and the outputs follow suit. The same trick works for directory names.
- Any large data operation where the results don’t depend on each other and the inputs and outputs can be designated by the iterator.

Here’s what we put together to demo the problem and some approaches:
In this case the scheduler is Slurm, but the approaches (and some of the problems addressed) carry over to other schedulers.
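To make the index-as-filename idea from the list above concrete, the simplest pattern is to drop the array index straight into the output file name. A minimal sketch, where the simulation command, its flags, and the file names are all made up:

#!/bin/bash
#SBATCH --array=1-100                      # example range: tasks 1..100
# Each task runs one replicate and names its output by its array index.
./run_simulation --seed "${SLURM_ARRAY_TASK_ID}" > "result_${SLURM_ARRAY_TASK_ID}.txt"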
An example that reads multiple parameters from a separate file is also handy, e.g. in the job script:
#!/bin/bash
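#SBATCH --array=1-100   # example range: one array task per line of the parameter file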
# An input file with a line for each array element
# and parameters separated by spaces.
PARAMS=/path/to/parameter/file.txt
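# sed 'Nq;d' prints only line N of the file, so this pulls out the line
# for this array task; read -a splits it on whitespace into the array "params".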
read -ra params <<< "$(sed "${SLURM_ARRAY_TASK_ID}q;d" "$PARAMS")"
After which params is a bash array holding the whitespace-separated fields from line ${SLURM_ARRAY_TASK_ID} of the $PARAMS file. Using an array allows the number of parameters per line to vary, so the next lines in the script might be something like this (my_program here just stands in for whatever the task actually runs):
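# my_program is a placeholder for the real executable
my_program "${params[@]}"
# or pick out individual fields:
my_program --input "${params[0]}" --threshold "${params[1]}"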
edit: Although I use the line read from the file as parameters here, there’s no reason the entire command can’t live in the parameter file, so the array can really be used to run any arbitrary set of commands as its tasks.
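A sketch of that variant, assuming $PARAMS now holds one complete shell command per line:

# run the command stored on this task's line of the file
eval "$(sed "${SLURM_ARRAY_TASK_ID}q;d" "$PARAMS")"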
Also, when working on a cluster which limits the number of job array elements, using a step in the array spec and then having each array task run several commands for its step can be useful. Say with
--array=1-100:10
Then each element can work on lines ${SLURM_ARRAY_TASK_ID} through ${SLURM_ARRAY_TASK_ID} + 9 of the input file (ten lines per task, since the task IDs step as 1, 11, 21, …, 91). Season to taste, of course.
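For completeness, a rough sketch of that chunked loop inside each task, using the same $PARAMS file as above and with my_program still a placeholder:

STEP=10                                    # must match the :10 in the array spec
FIRST=${SLURM_ARRAY_TASK_ID}
LAST=$(( FIRST + STEP - 1 ))
for line in $(seq "$FIRST" "$LAST"); do
    read -ra params <<< "$(sed "${line}q;d" "$PARAMS")"
    my_program "${params[@]}"              # placeholder command, as above
done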