An AI Future Will Need HPC
If AI is indeed the future and we have many AI models, we will also need HPC to manage them
I was watching an infomercial by Fireship (at least he’s up front about it!) earlier and something he said made me sit up and think… AI will need HPC.
In his latest video, Fireship posits that if AI actually delivers everything it currently promises, the result will likely take the form of many specialised models rather than a single general-purpose model.
He suggests we’ll have models for lawyers, doctors, software developers and so on. In such a world, all these models will therefore compete for resource to actually run. The video then goes on to talk about NVIDIA NIM, built on Kubernetes, as a solution, but by that point I had stopped listening.
Firstly, Kubernetes used to allocate large-scale compute, at the scale of HPC, at the scale AI will no doubt require, doesn’t work. I’ve tried it. I’ve seen many other very large HPC users try it. I’ve yet to see it succeed.
Multiple applications all making large demands on a finite pool of compute resource. Where have we seen that before? We already have a solution to this problem. HPC applications or LLMs, it doesn’t really matter. Sharing resource between competing users that all have large demands… isn’t that what an HPC scheduler does?
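To make that concrete, here is a minimal sketch of the fair-share idea at the heart of HPC schedulers like Slurm, stripped to its essence: competing jobs request GPUs from a finite pool, and owners who have consumed the least so far go first. Everything here is hypothetical for illustration (the `Job` and `FairShareScheduler` names, the model names, the single-cycle loop); a real scheduler also handles priorities, backfill, preemption and much more.

```python
# A minimal sketch (not any real scheduler's API) of fair-share scheduling:
# competing jobs -- HPC applications or LLM serving jobs alike -- request
# GPUs from a finite pool, and the scheduler favours owners who have
# consumed the least so far.

from dataclasses import dataclass, field

@dataclass
class Job:
    owner: str   # e.g. a specialised model: "legal-model", "medical-model"
    gpus: int    # GPUs requested

@dataclass
class FairShareScheduler:
    total_gpus: int
    usage: dict[str, int] = field(default_factory=dict)  # GPUs granted per owner so far

    def schedule(self, queue: list[Job]) -> list[Job]:
        """Pick the jobs to start this cycle; the rest wait for a later cycle."""
        free = self.total_gpus
        started = []
        # Owners with the lowest past usage go first -- the essence of fair share.
        for job in sorted(queue, key=lambda j: self.usage.get(j.owner, 0)):
            if job.gpus <= free:
                free -= job.gpus
                self.usage[job.owner] = self.usage.get(job.owner, 0) + job.gpus
                started.append(job)
        return started

# Three specialised models compete for a pool of 16 GPUs: two fit, one waits.
sched = FairShareScheduler(total_gpus=16)
queue = [Job("legal-model", 8), Job("medical-model", 8), Job("coding-model", 8)]
print([j.owner for j in sched.schedule(queue)])  # ['legal-model', 'medical-model']
```

Swap the hypothetical model names for user accounts and the GPU pool for a cluster partition and this is, in spirit, the problem HPC schedulers have been solving for decades.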
So we come back full circle. From HPC enabling the initial training of many AI models to it scheduling and allocating resource for competing models in the future.
Oh, and I don’t buy into the future your cloud solution architect has been selling you for HPC: that compute on the cloud is essentially unlimited, so a scheduler isn’t required. That isn’t completely true for CPUs in the cloud today, and it certainly won’t be true for GPUs in the cloud for quite some time if AI actually delivers on what it’s promising.