Dreams and Wishes for HPC in 2026

At the start of 2025 I avoided the temptation to write my predictions for HPC and instead chose to write what my own hopes for 2025 were. A year on, it’s time to revisit that: what moved, what didn’t, and what I’ve changed my mind on.

Dreams and Wishes for HPC in 2025: A contrarian look at what I hope will be forthcoming for HPC in 2025 rather than any real predictions.

As per last year, I’m not going to be drawn into the trap of imagining I have any special insight that you don’t and calling these predictions. They are merely what I would love to see in the evolution of the industry, based on the problems we see working with our clients: the gap between the art of the possible and the vision of what could be.

I think it’s no longer possible to talk about HPC without also talking about AI. At the very least the parts of AI that overlap with HPC, though to be honest it tends to be difficult to isolate it to that. This is both good and bad. It’s good because it has increased the visibility of and demand for HPC in a way that nothing else ever has, even if that often takes the form of not recognising that what is being demanded is in fact HPC. To the extent that AI, whether that is training or inference, is just another HPC workload (despite the AI bros’ protestations otherwise), I think it needs to be a part of the HPC conversation. I think we are also seeing a slow acknowledgement from the AI industry that there is experience and expertise in the HPC community that is required to solve the problems that AI has today, namely utilisation and cost optimisation.

Breaking out of its Niche

I think HPC is still very much stuck in its little cubby hole. Despite the huge prevalence of AI. Despite all the talk about poor utilisation levels. Despite more and more systems running at large scale.

I think the problem here is that HPC is a very broad niche (I’ve talked about this before). Many of the problems that need to be solved in other areas are often not seen as HPC, either because they don’t sit under the classical HPC umbrella of use cases or because people don’t want to associate with “old” technology.

Either way, AI’s fundamental problem of being power constrained hasn’t shifted, and if anything, the lack of sensible utilisation levels has only been highlighted further in 2025. That said, I’m hopeful that events such as Nvidia’s acquisition of SchedMD provide a signal that HPC expertise, if not HPC products, will become more highly valued.

Stop Reinventing the Wheel

Somewhat in line with the above, I do wonder if we’ll see Nvidia build on Slurm (as well as no doubt integrate it within its control plane offering).

Last year we worked on a tool to help select an HPC scheduler. One of the things that struck me was quite how many schedulers there are. Far more than I had imagined, but also many that are relatively new and under active development. A far cry from the perception of 30-year-old tech.

Choice is great. They serve different purposes. That’s fine. But the cloud and distributed-computing community converging on Kubernetes as a solution served to further the industry as a whole much faster than multiple competing and fragmented products ever could have, even if K8s is not the right solution for many.

Sadly, my gut feel is that even though Slurm is the dominant scheduler and is now owned by a company with a market cap bigger than the GDP of most countries, we still won’t see that happen.

The features such a scheduler would require in order to drive mass adoption, while they may be forthcoming, will likely be limited to Nvidia hardware. 

I keep promising (and failing) to write a piece on what we need to see in a true next-generation scheduler. It’s coming real soon. With a twist and a crazy experiment. Watch this space! (For the uninitiated, that means hit subscribe.)

Go From “It Kind of Works” to “It Just Works”

Yea… we’re still a long way from this. If anything, we’re getting further away. Increased hardware diversity is great, but the software to hide it from the people who just want to do science, finance or AI isn’t keeping pace. More on that below.

Explode the Monolith

This is another area where the needle hasn’t moved much. We tried, but ran out of money and time in 2025; we will continue this year. IYKYK.

Most vendors still want to build the thing that runs your supercomputer as a single solution that does everything. Given that most users can’t even pick a scheduler that does everything they need, the hope that your control, data and observability planes will also do everything is misplaced.

The sooner we evolve into an ecosystem where components can be swapped at will, the sooner we can progress to solving the problems the end users care about, and with a bit of luck burn a bit less energy in the process.
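
To make “swapped at will” a little more concrete, here’s a minimal sketch under my own assumptions of what narrow contracts between planes could look like. None of these names exist anywhere today; the point is simply that if each plane sits behind a small interface, any conforming implementation can be dropped in without the others noticing.

```python
# A toy illustration of swappable planes: each one hides behind a narrow,
# structural interface. All names here are hypothetical, not an existing standard.
from typing import Protocol


class Scheduler(Protocol):
    def submit(self, script: str, nodes: int) -> str: ...


class ObservabilityPlane(Protocol):
    def record(self, metric: str, value: float) -> None: ...


class FakeScheduler:
    """Stand-in scheduler; a Slurm- or Kubernetes-backed one would satisfy the same contract."""
    def submit(self, script: str, nodes: int) -> str:
        return f"job-{abs(hash((script, nodes))) % 10_000}"


class PrintMetrics:
    """Stand-in observability plane; swap for a metrics store or log shipper."""
    def record(self, metric: str, value: float) -> None:
        print(f"{metric}={value}")


def run_workload(sched: Scheduler, obs: ObservabilityPlane) -> None:
    # Neither component knows the other's implementation; a control or data
    # plane would plug in the same way.
    job_id = sched.submit("simulate.sh", nodes=4)
    obs.record("jobs_submitted", 1.0)
    print(f"submitted {job_id}")


run_workload(FakeScheduler(), PrintMetrics())
```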

Embrace Diversity

On this front at least we’ve seen some great progress. ARM has made brilliant inroads in 2025, with significant vendors adding ARM support for their software in the HPC space. Further, we’ve even seen growing diversity in, and adoption of, GPUs and APUs.

This is great news. Now we just need to make it so that the cancer researchers, weather forecasters, finance quants, and yes, even AI scientists can just define their problem and hit go. Let the software figure out what’s free, what’s fastest/cheapest/best and just run it on that. Because you know what, they still don’t give a FLOP if you ran on x86 or ARM, or on an A100 or an MI300. Get the results back as fast as possible with what’s available. Period.
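
As a rough illustration of what “define the problem, hit go” might look like in practice, here is a minimal sketch assuming a Slurm-style front end. Slurm does let you OR node features together with a ‘|’ in a constraint, but the feature labels and the wrapper itself are hypothetical, site-specific assumptions rather than anything that ships today.

```python
# A toy sketch of hardware-agnostic submission: the user states which
# accelerators are acceptable and the scheduler picks whatever is free.
# The feature labels ("a100", "h100", "mi300") are hypothetical site tags.
import shlex


def build_submit_command(script: str, gpus: int, acceptable: list[str]) -> list[str]:
    """Build an sbatch command that leaves the hardware choice to the scheduler."""
    constraint = "|".join(acceptable)  # Slurm ORs node features with '|'
    return [
        "sbatch",
        f"--gres=gpu:{gpus}",
        f"--constraint={constraint}",
        script,
    ]


cmd = build_submit_command("run_inference.sh", gpus=4,
                           acceptable=["a100", "h100", "mi300"])
print(shlex.join(cmd))  # on a real cluster, hand this to subprocess.run
```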

HPC and AI

Well, this topic got a whole lot more convoluted this year.

It’s now almost impossible to find HPC news that isn’t just AI news wearing an HPC hat. Partly because HPC is a broad niche, I guess; partly because AI is just eating the world; and partly, I think, because we just lack the language to be more precise.

HPC and AI are now incestuous bedfellows and there’s not much that’s going to separate them. There are a few different ways this takes shape. Firstly, we have AI (in all its flavours, not just GenAI, though that is currently the most prevalent) seeing increasing usage in the problem domains themselves. I.e. we’re seeing finance quants, scientists and other end users adopt AI solutions as part of their workflow. This has an obvious impact on the hardware requirements of the underlying compute. I don’t see that changing.

What we haven’t seen, and I’d still like to see, is that AI cannon being pointed back at the HPC software stack: using ML techniques to optimise the actual compute.

There is of course a good reason we haven’t seen that. The whole process is a black box. There’s nothing to train on. Yet.

That needs to change.
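
What might that look like? At minimum, the stack needs to start emitting structured records of what every job actually did, so there is eventually something to learn from. Below is a minimal sketch under my own assumptions; the field names are illustrative, not a proposed standard.

```python
# A toy sketch of the kind of per-job telemetry that would have to become
# routine before ML could be trained to optimise the stack itself.
# All field names and values here are illustrative assumptions.
import json
import time


def record_job(job_id: str, app: str, nodes: int, gpu_type: str,
               wall_seconds: float, energy_joules: float) -> None:
    """Append one job's outcome to a local, append-only dataset."""
    record = {
        "job_id": job_id,
        "app": app,
        "nodes": nodes,
        "gpu_type": gpu_type,
        "wall_seconds": wall_seconds,
        "energy_joules": energy_joules,
        "recorded_at": time.time(),
    }
    with open("job_history.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")


# Example: one finished job becomes one training example for a future model.
record_job("12345", "weather_ensemble", nodes=8, gpu_type="mi300",
           wall_seconds=3600.0, energy_joules=2.1e9)
```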

Final Thoughts

Broadly, many of the problems I saw in the HPC world at the start of 2025 still exist, and many of them have been amplified by the demands of AI. I’m quietly optimistic that this means we’ll see better progress in solving some of them. But I’m also pessimistic that, given the current power structures in play, those solutions may only serve the few. That would be a shame.