What are cgroups and how are people using them for cluster administration?

I have heard that cgroups might be very useful for cluster administrators. Could someone explain what cgroups are and how they are used?

CGroups (in the RedHat/CentOS sense) are a binding mechanism for limiting a “job” to specific cores and RAM. This is useful for localizing memory references, as well as for minimizing job interaction on a multi-core, multi-gigabyte compute node. I’ve used them extensively on SGI hardware (UV systems) in conjunction with PBSPro in the past, and they provide great localization and isolation, but if a user’s job bloats beyond its RAM allocation, that process will swap itself silly. If you have multiple processes doing that, then swap space and swap performance both become serious contention problems. There are other uses, like limiting CPU cycles for a given user group, but I have no hands-on experience with those.
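
To make the "bind a job to specific cores and RAM" idea concrete, here is a minimal sketch of the file writes involved, assuming the cgroup v1 layout with separate `cpuset` and `memory` hierarchies. The function name `job_cgroup_plan`, the job name, and the default mount root are illustrative assumptions (the mount point varies by distribution), and a scheduler such as PBSPro would normally do this for you. The function only returns the planned writes; it does not touch the filesystem.

```python
from pathlib import PurePosixPath


def job_cgroup_plan(job, cpus, mems, mem_bytes, pid, root="/sys/fs/cgroup"):
    """Return the (path, value) writes that would confine `pid` to the given
    cores, NUMA memory nodes, and RAM limit under cgroup v1.

    `job` and `root` are hypothetical; adjust them to the local setup.
    """
    cpuset = PurePosixPath(root) / "cpuset" / job
    memory = PurePosixPath(root) / "memory" / job
    return [
        # cpuset.cpus and cpuset.mems must be set before tasks can be attached.
        (str(cpuset / "cpuset.cpus"), cpus),
        (str(cpuset / "cpuset.mems"), mems),
        # Hard cap on RAM for the group, in bytes.
        (str(memory / "memory.limit_in_bytes"), str(mem_bytes)),
        # Attach the job's process to both hierarchies.
        (str(cpuset / "tasks"), str(pid)),
        (str(memory / "tasks"), str(pid)),
    ]


if __name__ == "__main__":
    # Cores 0-3, memory node 0, 8 GiB of RAM, process 1234 (all made-up values).
    for path, value in job_cgroup_plan("job42", "0-3", "0", 8 * 1024**3, 1234):
        print(f"echo {value} > {path}")
```

Note that `memory.limit_in_bytes` caps RAM only, which is why an oversized job ends up swapping. When swap accounting is enabled, cgroup v1 also provides `memory.memsw.limit_in_bytes` to cap RAM plus swap together.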

CGroups caused problems on RHEL 6.x (lost kernel memory), but Red Hat says the problem is fixed in RHEL 7. We’re repurposing some old hardware into a new test environment, and that will be RHEL 7 based, so I can get some hands-on time with CGroups and check whether the claims about it “working fine in 7” hold up.

I’ve heard that one use of cgroups is on login nodes. Since many more users are likely to be on a login node than on a compute node, it is important to make sure that no single user can grab all the memory and CPU resources. I don’t have first-hand experience, but I think it is pretty common. I think we will be trying it on RHEL 7 soon where I work.
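
As a rough illustration of per-user limits on a login node, the sketch below builds a `systemctl set-property` command for a user's systemd slice. It assumes a systemd-based system such as RHEL 7, where each logged-in user gets a `user-<UID>.slice` unit, and it uses the `MemoryLimit=` and `CPUShares=` properties. The function name `login_node_limit_cmd` and the limit values are made up for illustration, not recommendations. The function only constructs the argument list and does not run anything.

```python
def login_node_limit_cmd(uid, memory_limit="4G", cpu_shares=512):
    """Build (but do not run) a command that caps one user's slice.

    Assumes the user's slice already exists, i.e. the user is logged in.
    The default limits are hypothetical placeholders.
    """
    unit = f"user-{uid}.slice"
    return [
        "systemctl",
        "set-property",
        unit,
        f"MemoryLimit={memory_limit}",  # cap on RAM for everything the user runs
        f"CPUShares={cpu_shares}",      # relative CPU weight under contention
    ]


if __name__ == "__main__":
    # Print the command for UID 1000 so an admin can review it before running it.
    print(" ".join(login_node_limit_cmd(1000)))
```

`CPUShares=` is a relative weight, so it only matters when the node is busy; an idle login node still lets one user consume all the cores.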