Config to select infiniband nodes

Is there a way to select only nodes with infiniband using slurm?

Ordinarily this isn’t a problem because the first racks on Europa all have infiniband, so by default my jobs end up on a suitable set of nodes. But as usage increases it will be useful to have this functionality. I tried running VASP on a set of nodes without infiniband and the performance hit was crippling. It would be preferable to wait in the queue than to run on nodes without it.

I’ve read that this can be done using the #SBATCH --gres flag, but I think this requires configuration that Europa does not yet have.
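From the docs, I believe the job script would look something like this if that configuration existed. The GRES name ib is just a guess on my part, and vasp_std is the binary from my own setup:

```shell
#!/bin/bash
# Hypothetical sketch only: assumes the admins have defined a GRES
# named "ib" for the infiniband nodes in slurm.conf / gres.conf,
# which Europa does not have today. "vasp_std" is just my build.
#SBATCH --job-name=vasp-ib
#SBATCH --nodes=4
#SBATCH --gres=ib:1        # request one "ib" resource on each node
srun vasp_std
```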

Thanks,
Patrick

Hey Patrick,

Really good question. Longer term, I was thinking of having two queues, normal and batch. Normal would have the infiniband interconnect but batch wouldn’t; batch jobs could run on the normal queue but not vice versa.
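For concreteness, the slurm.conf side of that could look roughly like the fragment below. The node ranges are placeholders, not Europa’s actual layout:

```shell
# Hypothetical slurm.conf fragment for the two-queue idea. "normal"
# covers only the IB racks; "batch" covers everything, so batch jobs
# can land on normal's nodes but not vice versa. Node ranges are
# placeholders.
PartitionName=normal Nodes=compute-1-[1-4]-[1-16] Default=YES State=UP
PartitionName=batch Nodes=compute-1-[1-4]-[1-16],compute-2-[1-3]-[1-16] State=UP
```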

However, in the interim, here is what I recommend. You can specify the list of nodes eligible to run a job with the -w flag. I’d recommend using the hostlist range syntax to specify them; check the man pages for more details. First, though, we need to know which racks have IB and which don’t.
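For example, a job script pinned to specific nodes might look like this (the node names are placeholders until we know which racks have IB):

```shell
#!/bin/bash
# Pin the job to an explicit node list with -w / --nodelist. Slurm's
# hostlist range syntax expands compute-1-1-[1-4] to four node names.
# These names are placeholders until we know which racks have IB.
#SBATCH --nodes=4
#SBATCH -w compute-1-1-[1-4]
srun vasp_std
```

Note that -w requires the job to include every node listed, so keep the list consistent with --nodes. Alternatively, --exclude=&lt;hostlist&gt; works the other way around, blacklisting the racks without IB instead of whitelisting the ones with it.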

@solj when you get a second, can you let us know which racks have IB?

Thanks,
Chris

This is strange. I had intended that only nodes with infiniband would be available via slurm (I thought I had removed those without from the queue). I will double-check that the nodes missing IB connections are marked unusable (for now).
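For the record, marking them out of service is a one-liner (the node range below is a placeholder until I confirm which nodes actually lack IB):

```shell
# DRAIN lets running jobs finish but stops new jobs from being
# scheduled on these nodes. The node range is a placeholder.
scontrol update NodeName=compute-2-1-[1-16] State=DRAIN Reason="no IB"
```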

The problem I encountered was on compute-2-x-x nodes. Previously I had run all my tests and calculations on the compute-1-x-x nodes, and largely everything worked fine. But on 2-x-x VASP had extremely slow startup times. I assumed this was due to lack of infiniband, but if that’s not the case there may be something else going on that I didn’t notice.

@prconlin if you can send me a specific list of nodes that had issues, that’d be great. I did a cursory check just to verify IB connectivity and all the nodes that you would be able to submit to at least have IB. It’s still possible something is misconfigured on certain nodes but I’d need the list to check.
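If it helps, here is roughly how I spot-checked connectivity; ibstat lists each HCA port and a working port shows "State: Active" (the node list is a placeholder):

```shell
# Rough IB spot-check across a set of nodes. A node with a working
# HCA port reports "State: Active" in ibstat output.
for n in compute-2-1-{1..16}; do
    if ssh "$n" 'ibstat | grep -q "State: Active"'; then
        echo "$n: IB up"
    else
        echo "$n: no active IB port"
    fi
done
```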

Unfortunately I cleaned the directory I was working in and no longer have the slurm.out. I tried to reproduce the issue and tested a few nodes on compute 2-1, 2-2, and 2-3, but everything is behaving as expected.

Please let us know if you encounter the issue again. It’s certainly possible something changed, since Europa is in constant development (it’s a beta resource that we are still improving).

I found the problem - and it’s happening again! Another user is running enormous launcher jobs, and somehow that’s killing the startup time of my jobs. Maybe it’s a bandwidth issue?

This thread started when I ran VASP on the 2-x-x nodes for the first time; I did that because another user was occupying 250+ nodes with a single job. When I noticed a severe performance hit, I incorrectly assumed there was an issue with the nodes (missing infiniband), but later I couldn’t reproduce the problem. Right now, someone is running a 500-node job, and I’m seeing the same startup-time issue even with tiny jobs.

Hey Patrick,

Thanks for the heads up. I am seeing similar issues and it is related to /scratch being effectively DDoS’d. We are working on some tuning parameters and will get back to you when we have a solution.

Thanks,
Chris