How do I check the status of HPC systems?

tellison · March 16, 2020, 8:51pm

How can I check the current status reports on any of the HPC clusters at OSC?

tellison · March 16, 2020, 8:54pm

There are currently three methods to check the status of HPC clusters at OSC. The most general method is through OnDemand. After logging into the OnDemand Dashboard, click “show status” under the “Clusters” tab to see a page reporting the status of every system. This includes the nodes and cores in use, and the number of running, queued, and blocked jobs for each cluster. Clicking on any cluster takes you to a more in-depth rundown of that system’s performance. More information, as well as a picture of the OnDemand system status page, can be found at https://www.osc.edu/resources/online_portals/ondemand under “system status”.

The second method is the showq command. When you are logged in to one of the HPC clusters, typing showq lists all the current job information as seen by the scheduler, grouped by job state (running, idle, or blocked). For each job, showq shows the job ID, user, state, cores used, time remaining or walltime, and start time or queue time. More information can be found at https://www.osc.edu/supercomputing/batch-processing-at-osc/monitoring-and-managing-your-job under “showq”.
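
As a rough sketch of how this might be used day to day, the helper below narrows the showq listing to a single user. It assumes the cluster's showq accepts the standard Moab -u <user> filter, which is not stated above, and the function name my_jobs is made up for illustration. Check the showq documentation on your cluster before relying on it.

```shell
# Sketch only: list one user's jobs as seen by the scheduler.
# Assumption: showq supports the Moab-style "-u <user>" filter.
# my_jobs is a hypothetical helper name, not an OSC command.
my_jobs() {
    if command -v showq >/dev/null 2>&1; then
        # Default to the current user when no name is given.
        showq -u "${1:-$(id -un)}"
    else
        echo "showq not found: run this on a cluster login node" >&2
        return 127
    fi
}
```

Calling my_jobs with no argument lists your own jobs, and my_jobs followed by a username lists that user's jobs.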

The last method is the checkgpu command, which reports information about OSC’s GPU nodes. It can report:

The total number of jobs using GPU nodes: checkgpu -j/--jobs
The total number of running jobs using GPU nodes: checkgpu -j/--jobs -r/--run
The total number of queued jobs using GPU nodes: checkgpu -j/--jobs -q/--queued
The total usage of the GPU nodes: checkgpu -n/--node
The total number of used GPU nodes: checkgpu -n/--node -u/--used
The total number of available GPU nodes: checkgpu -n/--node -a/--avail

Any report can be expanded to show individual job and node reports using -v/--verbose.

For more information please refer to https://www.osc.edu/resources/getting_started/osc_custom_commands/checkgpu_command
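
The checkgpu options listed above can be combined into a quick summary. The following is a minimal sketch that uses only the long-form flags already described. The function name gpu_summary and the section headings it prints are made up for illustration, and the exact output of checkgpu itself is not shown here because it depends on the cluster.

```shell
# Sketch only: print a short GPU status summary using checkgpu.
# gpu_summary is a hypothetical helper name, not an OSC command.
gpu_summary() {
    if ! command -v checkgpu >/dev/null 2>&1; then
        echo "checkgpu not found: run this on an OSC login node" >&2
        return 127
    fi
    echo "== Running GPU jobs =="
    checkgpu --jobs --run       # running jobs using GPU nodes
    echo "== Queued GPU jobs =="
    checkgpu --jobs --queued    # queued jobs using GPU nodes
    echo "== Available GPU nodes =="
    checkgpu --node --avail     # GPU nodes currently available
}
```

Adding --verbose to any of the checkgpu calls should expand that section into individual job or node reports, as noted above.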