Improving Utilisation in HPC & AI – Part 2
Part 2 of how to get more from your expensive HPC and AI compute resources without spending more money
Improving utilisation in HPC and AI can sometimes mean changes that appear counterintuitive. This is the first of those.
Imagine you have several different workloads, all requiring the same (or similar) hardware, but each one demanding that it is not impacted by the others. This could be different trading desks running intraday risk, different AI models being used for inference, end-of-day risk batch jobs, training a new LLM and so on.
The usual solution to this is to carve up the compute capacity and allocate a proportion to each type of workload. This certainly solves the problem: there is no way your end-of-day risk or LLM training will impact your intraday risk or inference. Job done (pun intended. Sorry). Not so fast.
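To make the carve-up concrete, here is a tiny Python sketch of a statically partitioned cluster. The workload names and GPU counts are invented for illustration; the point is simply that each pool only ever sees its own slice of the hardware, however idle the other slices are.

```python
# Hypothetical example: a 128-GPU cluster carved into fixed pools per workload.
TOTAL_GPUS = 128

static_pools = {
    "intraday_risk": 48,
    "inference": 32,
    "eod_risk_batch": 24,
    "llm_training": 24,
}

def can_admit(pool: str, gpus_requested: int, gpus_in_use: dict) -> bool:
    """A job is admitted only if it fits within its own pool's fixed allocation."""
    return gpus_in_use.get(pool, 0) + gpus_requested <= static_pools[pool]

# Even with llm_training completely idle, intraday_risk cannot borrow its GPUs.
in_use = {"intraday_risk": 40, "llm_training": 0}
print(can_admit("intraday_risk", 16, in_use))  # False: 40 + 16 > 48
```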
Depending on how predictable (and consistent) your workloads are, this may be enough. You have several happy customers and high utilisation. Chances are, though, that the demand from each type of workload isn't so static, while the capacity you segregated is. Which means that your utilisation rates probably aren't so great. Not only that, but shuffling capacity around between use cases by hand is probably getting old quickly.
What if you didn’t split up the resources? Sharing is caring, right?
The immediate solution is of course to simply share the entire resource across all workloads, but this means that if your LLM training overruns you might not have the capacity to serve your intraday trading risk. Oops. You’ll be getting an irate phone call faster than your scheduler can queue a new job. However, some schedulers (and apologies, but this tip is scheduler dependent) allow guarantees to be made for each type of workload (capacity, hours of operation) and will automatically pre-empt other workloads to meet those guarantees.
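What this looks like in practice depends entirely on which scheduler you run (Slurm, LSF, Kubernetes batch schedulers and others all express it differently), so rather than quote any one product's configuration, here is a minimal Python sketch of the accounting behind the idea: one shared pool, a guaranteed share per workload class, and pre-emption of opportunistic jobs when a guaranteed class claims capacity it is entitled to. The class names and numbers are made up for the example.

```python
# Sketch of a shared pool with per-class capacity guarantees and pre-emption.
# Not a real scheduler; just the core accounting logic.
TOTAL_GPUS = 128

# Guaranteed minimum GPUs per workload class (illustrative numbers). Classes
# with a guarantee of 0 run opportunistically on whatever happens to be free.
guarantees = {"intraday_risk": 48, "inference": 32, "eod_risk_batch": 0, "llm_training": 0}

# Currently running jobs: (job_id, workload_class, gpus).
running = [("train-1", "llm_training", 96), ("infer-1", "inference", 16)]

def used(cls=None):
    return sum(g for _, c, g in running if cls is None or c == cls)

def submit(job_id, cls, gpus):
    """Admit a job, pre-empting opportunistic jobs if a guaranteed class needs room."""
    # Pre-emption is only triggered while the class stays within its guarantee.
    within_guarantee = used(cls) + gpus <= guarantees.get(cls, 0)
    while TOTAL_GPUS - used() < gpus and within_guarantee:
        # Pick the most recently started opportunistic job as the victim.
        victim = next((j for j in reversed(running) if guarantees.get(j[1], 0) == 0), None)
        if victim is None:
            break
        running.remove(victim)  # in real life: checkpoint and requeue it
    if TOTAL_GPUS - used() >= gpus:
        running.append((job_id, cls, gpus))
        return True
    return False

# Intraday risk spikes: its 48-GPU guarantee forces the training job to yield.
print(submit("risk-1", "intraday_risk", 40))  # True, and train-1 was pre-empted
print(running)
```

The pre-empted training job isn't lost: it goes back in the queue and picks up again once the guaranteed workload releases the capacity, which is exactly where snapshotting earns its keep.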
Often this is coupled with the ability to snapshot workloads (especially valuable if your jobs are long-running or slow to start up) so that pre-emption costs as little compute time as possible.
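The snapshot mechanism is workload-specific (many ML frameworks can checkpoint natively, while batch risk jobs may need it added at the task level), but the pattern is the same everywhere: persist enough state, often enough, that a pre-empted job can resume close to where it stopped. A hand-rolled sketch, with invented state and file names:

```python
# Periodic checkpointing so a pre-empted job resumes with little lost work.
# The state layout, chunk count and file name are made up for the example.
import os
import pickle

CHECKPOINT = "job_state.pkl"

def load_state():
    """Resume from the last snapshot if one exists, otherwise start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"next_chunk": 0, "partial_results": []}

def save_state(state):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(state, f)

def run(total_chunks=1_000, checkpoint_every=50):
    state = load_state()
    for chunk in range(state["next_chunk"], total_chunks):
        state["partial_results"].append(chunk * chunk)  # stand-in for real work
        state["next_chunk"] = chunk + 1
        if state["next_chunk"] % checkpoint_every == 0:
            save_state(state)  # at most checkpoint_every chunks are ever lost
    save_state(state)

if __name__ == "__main__":
    run()
```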
In practice I’ve seen this sort of change result in upwards of 20 percentage points of improvement in utilisation rates with little downside (assuming your scheduler can support capacity guarantees).
Pooling decorrelated workloads like this also tends to produce a more predictable overall load on the compute resource. That may seem counterintuitive, but it mirrors other systems that operate at large scale, such as AWS S3.
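The reason is statistical rather than magical: independent workloads rarely peak at the same moment, so their combined demand is smoother, relative to its mean, than any single one of them. A quick simulation illustrates the effect; the bursty demand model below is entirely made up.

```python
# The pooled demand of several independent, bursty workloads has a lower
# coefficient of variation than any individual workload.
import random

random.seed(1)
HOURS = 10_000
N_WORKLOADS = 8

# Each workload is bursty: usually quiet, occasionally spiking (made-up model).
workloads = [
    [random.choice([5, 5, 5, 5, 80]) for _ in range(HOURS)]
    for _ in range(N_WORKLOADS)
]
pooled = [sum(hour) for hour in zip(*workloads)]

def coeff_of_variation(xs):
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return (variance ** 0.5) / mean

print("single workload CV:", round(coeff_of_variation(workloads[0]), 2))
print("pooled demand CV:  ", round(coeff_of_variation(pooled), 2))
# The pooled figure is markedly lower, so you can size capacity against a far
# steadier aggregate than any one consumer would present on its own.
```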

