I am running an MPI job using 8 nodes with 16 cores each. When I run the command qstat -u username,
it shows only the master node. How can I view all the nodes that are used for my job?
Curator: Katia
This will not give you a precise answer if you have several multi-node MP jobs running, but it will show all the nodes that your jobs are running on:
http://moo.nac.uci.edu/~hjm/qbetta
It’s a quick and dirty Perl script that merges the output of ‘qstat -s r’ and ‘qhost -h (host) -q’ and does some rough math on the result to show which nodes are under- or overloaded.
Grep the result for anything you want (usually hostnames or usernames).
Here’s a stanza of output. The script takes no options; just grep for what you want.
Shows most of the info shown from 'qhost -q' and 'qstat -s r' but in one
line. Also shows whether a node is over (+) or under(-) loaded. At the end
of each line is the status of all Qs that use this node. Only compute nodes
are shown in this output.
              under/ CPUs         RAM             (Assigned/Total)
HOSTNAME      over   USED/TOTAL   USED/TOTAL      Queue v [flags] users,jobs
compute-1-10         64.06 / 64   3.5G / 126.2G   free64(64/64) vturlo,64 tw(0/64)
compute-1-11         64.03 / 64   3.9G / 126.2G   free64(64/64) vturlo,64 tw(0/64)
compute-1-12  -      27.01 / 64   1.8G / 126.2G   free64(0/64)[S] tw(24/64) frankes,24
compute-1-13  -      0.05 / 64    3.8G / 252.4G
compute-1-14  -      0.53 / 24    7.2G / 94.7G    free24i(0/24)[S] gpu(3/24) staimour,2 yoshitom,1
compute-1-2   -      3.99 / 64    3.9G / 252.4G   abio(0/64) free64(64/64) jfarran,61 vojh1,3 sf(0/64)
compute-1-3          64.04 / 64   6.2G / 252.4G   free64(64/64) meganjm1,64
compute-1-4          64.07 / 64   4.2G / 252.4G   air(0/32) chem(0/32) free64(64/64) vturlo,64
To run it, you’ll also need scut.
By default qstat aims for easy reading and limits its output to one line per job, but there are several ways to customize the output. The man page
$ man qstat
or
http://gridscheduler.sourceforge.net/htmlman/htmlman1/qstat.html
lists a number of output options.
In this case the -g
flag might do the trick: add -g t
and qstat will print one line per process/processor, with a bit more information.
For example, if you want to keep a snapshot of this for later parsing:
$ qstat -g t -u username >> myQstatOutput
As the man page explains:
With -g t parallel jobs are displayed verbosely in a one line per parallel job task fashion. By default, parallel job tasks are displayed in a single line. Also with the -g t option, the function of each parallel task is displayed, rather than the jobs slot amount (see section OUTPUT FORMATS).
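If all you want is the list of distinct nodes, the one-line-per-task output can be reduced with a small filter. The following is only a sketch: hosts_from_qstat is a hypothetical helper name, and it assumes each task line carries a queue instance written as queue@host, which is how the queue column usually looks but may differ on your cluster.

```shell
# Hypothetical helper: read `qstat -g t` output on stdin and print each host once.
# Assumption: every task line contains a queue instance of the form queue@host,
# and no other field (job name, user) contains an "@".
hosts_from_qstat() {
    awk '{
        for (i = 1; i <= NF; i++) {
            # Keep only fields that split into exactly queue and host.
            n = split($i, parts, "@")
            if (n == 2 && parts[2] != "") print parts[2]
        }
    }' | sort -u
}
```

It would be used as: qstat -g t -u username | hosts_from_qstat (username is a placeholder). Header and separator lines contain no "@", so they are skipped without special handling.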
** EDIT: Additionally, qhost -j
will list the jobs running on each host.
Both commands can select jobs by user name with the -u flag.