How can I check the current status reports on any of the HPC clusters at OSC?
There are currently three methods to check the status of HPC clusters at OSC. The most general method is through OnDemand. After logging in to the OnDemand Dashboard, click “show status” under the “Clusters” tab to see a page reporting the status of the entire system. This includes the nodes and cores in use and the number of running, queued, and blocked jobs for each cluster. Clicking on any cluster takes you to a more in-depth rundown of that system’s performance. More information, as well as a picture of the OnDemand system status page, can be found at https://www.osc.edu/resources/online_portals/ondemand under “system status”.
The second method is the showq command. When you are logged in to one of the HPC clusters, typing showq lists the current job information as seen by the scheduler, grouped by job state (running, idle, or blocked). For each job, showq shows the job ID, user, state, cores used, time remaining or walltime, and start time or queue time. More information can be found at https://www.osc.edu/supercomputing/batch-processing-at-osc/monitoring-and-managing-your-job under “showq”.
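As a small illustration of working with showq output, the sketch below filters the listing down to the lines that mention one user. The function name my_jobs is hypothetical and not an OSC command. It relies only on plain showq with no options plus a text filter, so it assumes nothing about which flags the installed showq accepts.

```shell
# Hypothetical helper (not an OSC command): show only the showq lines
# that mention a given user. Defaults to the current user.
my_jobs() {
    user="${1:-$USER}"
    # Only call showq if it exists on this machine.
    if command -v showq >/dev/null 2>&1; then
        # -w matches the username as a whole word, so "alice" does not match "alice2".
        showq | grep -w -- "$user"
    else
        echo "showq not found; run this on an OSC cluster login node" >&2
        return 127
    fi
}
```

After defining the function in a shell on a cluster login node, typing my_jobs lists your own jobs, and my_jobs followed by a username lists that user's jobs. Because this is a plain text filter, section headers printed by showq are dropped from the result.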
The last method is the checkgpu command, which reports information about OSC’s GPU nodes. It can report:
- The total number of jobs using GPU nodes
checkgpu -j/--jobs
- The total number of running jobs using GPU nodes
checkgpu -j/--jobs -r/--run
- The total number of queued jobs using GPU nodes
checkgpu -j/--jobs -q/--queued
- The total usage of the GPU nodes
checkgpu -n/--node
- The total number of used GPU nodes
checkgpu -n/--node -u/--used
- The total number of available GPU nodes
checkgpu -n/--node -a/--avail
- Any report can be expanded to show individual jobs and nodes by adding
-v/--verbose
For more information, please refer to https://www.osc.edu/resources/getting_started/osc_custom_commands/checkgpu_command
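The checkgpu options listed above can be combined into a quick summary. The sketch below is one possible way to do that, using only the long-form options named in this answer. The function name gpu_report and the section labels it prints are hypothetical, and the exact output of checkgpu is not shown here because it depends on the cluster.

```shell
# Hypothetical helper (not an OSC command): print two checkgpu reports in a row.
gpu_report() {
    # Only call checkgpu if it exists on this machine.
    if ! command -v checkgpu >/dev/null 2>&1; then
        echo "checkgpu not found; run this on an OSC cluster login node" >&2
        return 127
    fi
    echo "== Queued GPU jobs =="
    checkgpu --jobs --queued      # total number of queued jobs using GPU nodes
    echo "== Available GPU nodes =="
    checkgpu --node --avail       # total number of available GPU nodes
}
```

After defining the function on a cluster login node, typing gpu_report prints both reports. Adding --verbose to either checkgpu line would expand that report to individual jobs or nodes, as described above.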