I have a very large computation planned: working with MRI data, I want to run a Monte Carlo simulation of my machine learning models. Previously, I’ve been working in the realm of ~7,700,000 models (around 700,000 models per participant, times 11 participants), but this project is going to dramatically increase this number (~1,000,000,000 models).
I asked someone with more experience, and they recommended the following settings for the batch script:

```bash
#SBATCH -t 40:00:00
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem-per-cpu=30GB
```
Does anyone have recommendations on updates for these settings? And specifically, is it possible to request more than one node on Europa to add some extra power to this project?
Can you provide a little more information? What language is your program written in, how is your data split up, and so on?
Requesting more than one node on Europa is definitely possible, but how we can go about it depends on the specifics of your project. It sounds like you have a high throughput setup, so Launcher will likely be the way to go: https://portal.tacc.utexas.edu/software/launcher
Yes, glad to provide more information. My code is written in MATLAB, and the parallelization is managed by the Parallel Computing Toolbox, so all I had to add to my code were the following lines, which might be relevant to the discussion:
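From memory, it’s roughly this (the exact pool size argument may differ; I sized it to the 16 CPUs in the batch script):

```matlab
delete(gcp('nocreate'));   % shut down any existing parallel pool / stale cluster connection
parpool('local', 16);      % open a fresh pool, one worker per requested CPU
```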
The delete() command is probably legacy from my previous institution. We had the ability to run jobs interactively, so I included it to reset the cluster connection. From there, I simply run a parfor loop.
My data is separated by subject, saved as one huge matrix per participant. Within the parfor loop, I select a subset of the larger matrix and run computations on it. I then store the output from the parfor loop in another variable.
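In skeleton form, the loop looks something like this (the variable and function names are made up for illustration):

```matlab
% bigMatrix: one participant's large matrix
% subsetIdx: cell array of row-index vectors, one per iteration
nIter   = numel(subsetIdx);
results = cell(nIter, 1);        % sliced output variable

parfor i = 1:nIter
    subset     = bigMatrix(subsetIdx{i}, :);   % select this iteration's subset
    results{i} = runModel(subset);             % stand-in for the actual computation
end
```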
Let me know if I can provide any more relevant information!
Are you working with the same 11 participants, just with more sampling?
Are the samples independent between AND within each participant (e.g., sample 1 from participant 1 does not affect anything from participant 2 or sample 2 from participant 1)?
Is the matrix subset selected at random or systematically?
To my knowledge, we can’t split a single MATLAB run across multiple nodes on Europa, but we can certainly slice and dice your data to make it suitable for Launcher so you can run multiple MATLAB jobs at once.
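For reference, a Launcher run generally boils down to a plain-text job file with one independent command per line, plus a batch script that points Launcher at it. A rough sketch (the module name, the MATLAB invocation, and run_subject are placeholders; check the Europa docs for the real details):

```bash
# jobs.txt -- one independent MATLAB run per line, e.g. one per participant
matlab -batch "run_subject(1)"
matlab -batch "run_subject(2)"
# ...one line for each remaining participant
```

```bash
#!/bin/bash
# launcher.slurm -- hands jobs.txt to Launcher
#SBATCH -t 40:00:00
#SBATCH --nodes=11             # e.g., one node per participant
#SBATCH --tasks-per-node=1     # one MATLAB run at a time per node
#SBATCH --cpus-per-task=16

module load launcher           # module name may differ on Europa
export LAUNCHER_WORKDIR=$PWD
export LAUNCHER_JOB_FILE=jobs.txt
$LAUNCHER_DIR/paramrun
```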
The matrix stores information from each voxel. I am running the classification on beta estimates from a general linear model, so the matrix has dimensions [# voxels] x [# betas]. I pull specific voxels and betas out; it’s a searchlight algorithm.
I plan to run 1,000 simulations for each participant, within each searchlight of voxels.
I am pretty sure each sample is independent. I pull certain voxels and betas from the large matrix and save them as a temporary variable. Occasionally I need the same values across different iterations of a loop. Then I perform computations on the temporary variable and poke the resulting output into another variable.
These subsets are systematic: the matrix represents values from voxels, and I pull a subset of voxels (i.e., a searchlight) and then run the computation.
Definitely disappointed to hear we can’t split runs across nodes. I can easily slice the data so I run one node per subject. Hopefully I don’t need to slice it any smaller than that!
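A per-subject entry point should make that slicing straightforward. A rough sketch (the file names, the betas variable, and run_searchlights are all made-up placeholders):

```matlab
function run_subject(subjID)
    % Full run for one participant; assumes the per-subject .mat file
    % stores the large [nVoxels x nBetas] matrix in a variable named betas
    S = load(sprintf('subject%02d_betas.mat', subjID));
    delete(gcp('nocreate'));               % reset any stale pool
    parpool('local', 16);                  % match --cpus-per-task
    results = run_searchlights(S.betas);   % the parfor loop from earlier
    save(sprintf('subject%02d_results.mat', subjID), 'results');
end
```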