Measuring Cluster Performance

Attempting to come up with a realistic distributed benchmark that includes not only the hardware but also the software, such as the scheduler, the data plane and internode communication

I’m trying to do something a bit silly. Again.

I want to measure the performance of an HPC system (full-on supercomputer, tiny Beowulf cluster, whatever) in a way that includes the impact not only of the hardware but of all the software that makes it go.

I know we already have HPL, the most useful benchmark in HPC if not the world, but that rather cunningly manages to evade having to use much else apart from MPI.

Somewhat in the spirit of COREx, I want something that is representative of real systems, so it needs to include an actual scheduler and all the related duct tape and string that goes with it. Here’s the tricky bit though: It also needs to be able to swap out the scheduler and various bits of string.

My initial aim is to do this using COREx as the workload but the engineer in me is already trying to generalise the solution to swap in any workload.

Of course, the problem is compounded by the fact that the amount and shape of tape and string needed varies rather a lot. For example, something like IBM Symphony natively has the ability to pass data to the workload and has a client API. Swap that out for Flux and perhaps I could use Flux KVS to pass around some of that data but I’d need a bit more tape to do it. Plus, I’d need to install and use the Flux REST API. Swap it for Slurm and I probably need to pull in a KV cache too, I think?
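To make the "swappable tape and string" idea a bit more concrete, here is a minimal sketch of the seam I have in mind. Everything in it is hypothetical: `SchedulerAdapter`, `LocalAdapter`, `run_benchmark` and `toy_workload` are names made up for illustration, not part of Symphony, Flux or Slurm. The local adapter just uses a thread pool and a dict, standing in for a real scheduler and data plane. The point is only that the timed region covers staging the data and dispatching the work, not just the compute.

```python
import time
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class SchedulerAdapter(ABC):
    """Hypothetical seam: what the benchmark needs from any scheduler stack."""

    @abstractmethod
    def put_data(self, key: str, value: bytes) -> None:
        """Stage input data where the workload can reach it."""

    @abstractmethod
    def get_data(self, key: str) -> bytes:
        """Fetch staged data from inside a task."""

    @abstractmethod
    def run_tasks(self, task: Callable[[bytes], int], keys: List[str]) -> List[int]:
        """Run one task per key and return results in key order."""


class LocalAdapter(SchedulerAdapter):
    """Stand-in backend: a dict as the data plane, a thread pool as the scheduler."""

    def __init__(self, workers: int = 4) -> None:
        self._store: Dict[str, bytes] = {}
        self._workers = workers

    def put_data(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get_data(self, key: str) -> bytes:
        return self._store[key]

    def run_tasks(self, task: Callable[[bytes], int], keys: List[str]) -> List[int]:
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            # pool.map preserves input order, so results line up with keys.
            return list(pool.map(lambda k: task(self.get_data(k)), keys))


def run_benchmark(adapter: SchedulerAdapter,
                  task: Callable[[bytes], int],
                  inputs: Dict[str, bytes]) -> dict:
    """Time data staging plus task dispatch, so the software stack is included."""
    start = time.perf_counter()
    for key, value in inputs.items():
        adapter.put_data(key, value)
    results = adapter.run_tasks(task, list(inputs))
    elapsed = time.perf_counter() - start
    return {"tasks": len(results), "seconds": elapsed, "results": results}


def toy_workload(payload: bytes) -> int:
    """Placeholder workload (not COREx): sum the bytes of the payload."""
    return sum(payload)
```

Calling `run_benchmark(LocalAdapter(workers=2), toy_workload, inputs)` gives a timing for the stand-in stack. A Symphony, Flux or Slurm adapter would implement the same three methods, each with however much extra tape it needs, and the workload could be swapped independently of the adapter.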

All of the above is eminently possible, but not trivial. It leaves me wishing we’d already achieved some of the aims of the Supercomputing Strategy Group!

In all of this, I do have a question for you, especially those of you responsible for delivering HPC platforms to your users. How important is it that the cluster benchmark be truly representative? What parts of your stack would you want to swap out to measure the impact on performance? Where can we compromise and standardise?