SGE Accounting Logs

Greetings all,

We are trying to pull historical information from our old SGE accounting logs. I'm able to get the entire job history from the logs we have using qacct -j. However, the one thing I cannot seem to find in the logs is the number of compute nodes each job used.

I know I can see that info for currently running SGE jobs, but I need historical data. I’ve Googled until my fingers bled, but can’t find anything that seems to work.

Any help or being pointed in the right direction would be greatly appreciated.

Many thanks!

Brent
UTSA

Hi Brent!

In our setup, we define different parallel environments (PEs) based on the number of cores of each machine in that environment. So, for example, our 'mpi-24' PE runs on 24-core nodes. The output from qacct shows the number of slots for a given job along with the PE it ran in. For a particular job, I see:

# qacct -j 258039 | grep -E 'slots|granted_pe'
granted_pe   mpi-24
slots        96

From this information, I know that the job used 96 / 24 = 4 nodes. If your site is set up similarly, this should let you calculate it.
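If your PEs encode the per-node core count in their names like ours do (an assumption — adjust the parsing to your own PE naming scheme), the division can be scripted. The job values here come from the example above; at your site you would replace the printf with a real qacct -j call:

```shell
#!/bin/sh
# Sketch: derive node count as slots / cores-per-node, assuming the PE
# name ends in the core count (e.g. 'mpi-24' -> 24-core nodes).
# The printf stands in for real "qacct -j <jobid>" output.
printf 'granted_pe   mpi-24\nslots        96\n' |
awk '
    /^granted_pe/ { n = split($2, a, "-"); cores = a[n] }
    /^slots/      { slots = $2 }
    END           { if (cores > 0) print "nodes:", slots / cores }'
# prints "nodes: 4"
```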

Hope this helps.

Hi Brent,

Check out fields $34 and $35 in each record of the SGE accounting file (colon-delimited, one record per line). These are granted_pe and slots, respectively.

Here is the list of all the fields in the accounting file for your reference:

            qname=plinesplit[0];
            hostname=plinesplit[1];
            group=plinesplit[2];
            owner=plinesplit[3];
            job_name=plinesplit[4];
            job_number=plinesplit[5];
            account=plinesplit[6];
            priority=plinesplit[7];
            submission_time=plinesplit[8];
            start_time=plinesplit[9];
            end_time=plinesplit[10];
            failed=plinesplit[11];
            exit_status=plinesplit[12];
            ru_wallclock=plinesplit[13];
            ru_utime=plinesplit[14];
            ru_stime=plinesplit[15];
            ru_maxrss=plinesplit[16];
            ru_ixrss=plinesplit[17];
            ru_ismrss=plinesplit[18];
            ru_idrss=plinesplit[19];
            ru_isrss=plinesplit[20];
            ru_minflt=plinesplit[21];
            ru_majflt=plinesplit[22];
            ru_nswap=plinesplit[23];
            ru_inblock=plinesplit[24];
            ru_oublock=plinesplit[25];
            ru_msgsnd=plinesplit[26];
            ru_msgrcv=plinesplit[27];
            ru_nsignals=plinesplit[28];
            ru_nvcsw=plinesplit[29];
            ru_nivcsw=plinesplit[30];
            project=plinesplit[31];
            department=plinesplit[32];
            granted_pe=plinesplit[33];
            slots=plinesplit[34];
            task_number=plinesplit[35];
            cpu=plinesplit[36];
            mem=plinesplit[37];
            io=plinesplit[38];
            category=plinesplit[39];
            iow=plinesplit[40];
            pe_taskid=plinesplit[41];
            maxvmem=plinesplit[42];
            arid=plinesplit[43];
            ar_submission_time=plinesplit[44];
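Since the accounting file is colon-delimited, those two fields can be pulled out with a one-line awk. The record below is an abbreviated, made-up sample; in practice you would point awk at the real file (typically $SGE_ROOT/<cell>/common/accounting) and skip the comment lines at the top:

```shell
#!/bin/sh
# Sketch: print job_number ($6), granted_pe ($34) and slots ($35) from a
# colon-delimited accounting record. The printf record is hypothetical;
# against a real file you would run something like:
#   awk -F: '!/^#/ { print $6, $34, $35 }' $SGE_ROOT/default/common/accounting
printf 'all.q:node01:staff:brent:testjob:258039:sge:0:1000:1010:1100:0:0:3600:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:proj:dept:mpi-24:96\n' |
awk -F: '!/^#/ { print $6, $34, $35 }'
# prints "258039 mpi-24 96"
```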

Hope this helps!

Thanks,

TV