I am exploring a parameter space and need to launch several hundred variants of the same small job. What can I do to ensure the shortest completion time?

I have several hundred small jobs to run on a shared system with a queue scheduler.
What are the relevant factors to optimize for highest throughput?

CURATOR: John Goodhue

For shared systems with a range of usage types, the highest throughput generally comes from having fewer and more easily met job requirements. In practice this means asking for as few cores as possible (ideally one) and as little memory as possible.

Array jobs are usually great for managing this (see the documentation for your specific batch scheduler for details).
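As a rough illustration, here is a minimal Python sketch of how one task in an array job might pick its own input. It assumes a Slurm-style scheduler that exposes the array index through the SLURM_ARRAY_TASK_ID environment variable, and a hypothetical inputs.txt listing one input file per line plus a hypothetical ./my_analysis program; other schedulers use different variable names.

```python
import os
import subprocess
import sys

# Slurm exposes each array task's index via SLURM_ARRAY_TASK_ID;
# other schedulers use different environment variables.
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])

# Hypothetical list of input files, one per line; task N processes line N.
with open("inputs.txt") as f:
    inputs = [line.strip() for line in f if line.strip()]

input_file = inputs[task_id]

# Run the (hypothetical) analysis program on this task's single input.
result = subprocess.run(["./my_analysis", input_file])
sys.exit(result.returncode)
```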

If the job requirements vary widely, for example memory requirements ranging from 100 MB to 10 GB, and this can be determined from the input files, it may help to group the jobs into smaller batches by range, so that each batch only requests the memory it actually needs.
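One way to do this grouping is a small pre-processing script. The Python sketch below is only an illustration under stated assumptions: a hypothetical estimate_memory_mb() function that guesses a job's memory need from its input file size, a hypothetical inputs/ directory, and memory buckets you would tune to your own workload. It writes one input list per range, which can then be submitted as separate array jobs with appropriately sized memory requests.

```python
import glob
import os

# Hypothetical stand-in: estimate a job's memory need (in MB) from its input.
# Here we simply assume memory scales with input file size.
def estimate_memory_mb(path):
    return max(100, os.path.getsize(path) // (1024 * 1024) * 4)

# Memory ranges (in MB) used to bucket the jobs; adjust to your workload.
buckets = [("small", 0, 500), ("medium", 500, 2000), ("large", 2000, 10240)]

groups = {name: [] for name, _, _ in buckets}
for path in sorted(glob.glob("inputs/*.dat")):  # hypothetical input location
    mem = estimate_memory_mb(path)
    for name, lo, hi in buckets:
        if lo <= mem < hi:
            groups[name].append(path)
            break

# Write one file list per bucket; each list can then be submitted as its own
# array job that requests only the memory that range actually needs.
for name, paths in groups.items():
    with open(f"inputs_{name}.txt", "w") as f:
        f.write("\n".join(paths) + "\n")
```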

Note that pushing these limits too far is often not considered good use of the resource: there is a point at which the time and resources spent starting and finishing each job become a major part of the total usage, reducing the overall efficiency of the machine. A common compromise is to batch the jobs, so that while each task still requests minimal resources, several jobs run sequentially within it.
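A minimal sketch of this batching idea, again assuming a Slurm-style array index and the same hypothetical inputs.txt and ./my_analysis program as above: each array task takes one contiguous chunk of the input list and works through it sequentially, so an array of N tasks covers N × CHUNK_SIZE jobs while still requesting only one core per task.

```python
import os
import subprocess

CHUNK_SIZE = 10  # number of small jobs each array task runs back to back

task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])  # Slurm-style array index

with open("inputs.txt") as f:                      # hypothetical input list
    inputs = [line.strip() for line in f if line.strip()]

# Each array task handles one contiguous slice of the input list,
# running its jobs one after another within a single scheduler task.
start = task_id * CHUNK_SIZE
for input_file in inputs[start:start + CHUNK_SIZE]:
    subprocess.run(["./my_analysis", input_file], check=True)
```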