Many sites have multiple generations of clusters. How do you deal with the proliferation of models and remote management controllers? Every new system seems to bring another challenge: finding it on the network, booting it, and getting the initial OS installed. Sure, the top three vendors typically have their own semi-consistent lights-out management controllers, but after that it is the flavor of the day from ASUS, GigaByte, Supermicro, and I’m not even accounting for all the various iterations of BIOS vendors.
How does your site deal with this?
Personally, I would like to see all non-open-source BMC, BIOS, and PXE firmware implementations go away, but that is another question for another day.
All major manufacturers of BMC chips, including those of the Open Compute Project, now support Redfish as a replacement for IPMI. Redfish is available in all major HPC and server-class vendor hardware, including the vendors that you mentioned. HPC software, including cluster configuration tools, has however been slow to implement this capability so far, and we are working on this at TTU. For example, one of our students got one of the OpenHPC mentorships this fall to implement tools that can do the discovery phase of building a cluster much faster using Redfish than with IPMI, and plans to release them shortly. Before going further, for example into the boot provisioning and image discovery parts of the problem, it would help to have a clean description of the problem to be solved. If you’d like to work with us on this, please respond or feel free to email me directly.
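To give a rough idea of what vendor-neutral discovery over Redfish can look like, here is a minimal Python sketch. It is not the tool mentioned above, just an illustration under some assumptions: the BMC accepts HTTP Basic authentication, and it exposes the standard `/redfish/v1/Systems` collection. The helper names (`make_fetcher`, `inventory_systems`, `pxe_once_payload`) are hypothetical and made up for this example.

```python
import base64
import json
import ssl
import urllib.request


def make_fetcher(base_url, user, password, verify_tls=True):
    """Return a function that GETs a Redfish path and decodes the JSON body.

    Assumption: the BMC accepts HTTP Basic auth; some sites require
    Redfish session tokens instead.
    """
    context = ssl.create_default_context()
    if not verify_tls:
        # Many BMCs ship self-signed certificates. Only skip verification
        # on a trusted management network.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch(path):
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            headers={"Authorization": "Basic " + token,
                     "Accept": "application/json"},
        )
        with urllib.request.urlopen(request, context=context, timeout=10) as resp:
            return json.load(resp)

    return fetch


def inventory_systems(fetch):
    """Walk the Systems collection and summarize each member.

    `fetch` is any callable mapping a Redfish path to a decoded JSON dict,
    so the same code works against a real BMC or a canned test fixture.
    """
    collection = fetch("/redfish/v1/Systems")
    summary = []
    for member in collection.get("Members", []):
        system = fetch(member["@odata.id"])
        summary.append({
            "path": member["@odata.id"],
            "manufacturer": system.get("Manufacturer"),
            "model": system.get("Model"),
            "serial": system.get("SerialNumber"),
            "power": system.get("PowerState"),
        })
    return summary


def pxe_once_payload():
    """Body for a PATCH to a system resource requesting a one-time PXE boot."""
    return {"Boot": {"BootSourceOverrideEnabled": "Once",
                     "BootSourceOverrideTarget": "Pxe"}}
```

Against a real controller you would call something like `inventory_systems(make_fetcher("https://bmc-node01.example", "admin", "secret", verify_tls=False))`, where the host name and credentials are placeholders. The point of the sketch is that the same few requests should work regardless of whose BMC is on the other end, which is exactly what the vendor-specific IPMI extensions never gave us.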