Ingredients of a Benchmark
How would you go about creating a single representative benchmark for all your different workloads? This isn't a rhetorical question where I give you the answer at the end. I'm genuinely asking!
I don't have any wisdom to share today. Instead, I have a question for you. I suppose it being Friday the 13th does have some spooky effects.
Enough ghosts, let's get down to business. Imagine (or maybe you don't need to) that you work in a large organisation that runs a diverse set of workloads on a shared compute infrastructure. Call it a compute grid, a supercomputer, an AI factory, whatever. It's all the same thing 😁
Examples could include a bank running risk for multiple asset classes, combinations of portfolio and market risk, both real-time and end-of-day. Or an AI startup running inference on several different models while also running a training job for its new model and RL for custom enterprise customers. You get the idea.
How would you, given that diversity of workload, define a truly representative single benchmark?
My own answer to this question has always been to simply not answer it. I have rejected the premise and pushed back to reframe the question. But that doesn't always work, and frankly it is sometimes the wrong thing to do. There are valid reasons for wanting an answer.
What would your methodology be for determining such a benchmark? A friend of mine once said that any workload can be represented as some combination of a set of standard CPU/GPU/IO benchmarks. I'm sure he's right. What's the correct methodology for determining the proportions, though? For a single workload that's fairly clear. What about when there are many that differ significantly?
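To make the question concrete, here is a minimal sketch of one possible reading of my friend's claim, not a recommendation or anything I've validated: characterise each micro-benchmark and each workload by the same vector of measured features (compute intensity, memory bandwidth, IO throughput), then solve a non-negative least-squares problem to express each workload as a mixture of the micro-benchmarks. Every name and number below is made up for illustration.

```python
# A sketch under stated assumptions: all profiles and usage shares
# here are purely illustrative, not real measurements.
import numpy as np
from scipy.optimize import nnls

# Rows: features (compute intensity, memory bandwidth, IO throughput).
# Columns: one standard micro-benchmark each (hypothetical examples:
# a LINPACK-like, a STREAM-like, and an fio-like benchmark).
basis = np.array([
    [0.90, 0.10, 0.05],
    [0.20, 0.85, 0.10],
    [0.05, 0.10, 0.90],
])

# Feature profiles measured for three workloads, same feature order.
workloads = {
    "eod_risk":  np.array([0.70, 0.40, 0.20]),
    "inference": np.array([0.50, 0.70, 0.10]),
    "training":  np.array([0.80, 0.60, 0.30]),
}

# Share of total grid hours each workload consumes (also made up).
usage_share = {"eod_risk": 0.5, "inference": 0.3, "training": 0.2}

mix = np.zeros(basis.shape[1])
for name, profile in workloads.items():
    # Non-negative weights expressing this workload as a benchmark mix.
    weights, residual = nnls(basis, profile)
    print(f"{name}: weights={np.round(weights, 2)}, residual={residual:.3f}")
    mix += usage_share[name] * weights

# One candidate "representative" blend across all workloads.
print("blended benchmark proportions:", np.round(mix / mix.sum(), 2))
```

Even this toy version exposes the hard parts: the residual tells you when a workload simply isn't captured by the chosen basis, and weighting the per-workload mixes by usage share is only one of many possible aggregations. Whether that aggregation is the right one is exactly the question I'm asking.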
I'm looking for some top-notch benchmark alchemy here, folks!
Answers on a postcard to HMx Labs Central. Or, you know, just in the comments or by DM/email.
