Quantum #89

Issue #89 of the weekly HPC newsletter by HMx Labs. Some HPC updates from AWS re:Invent, agentic AI for HPC and tackling numerical stability to reduce HPC costs.

Whilst most of the HPC-specific announcements from AWS came a couple of weeks ago during SC25, we did have some interesting things released at re:Invent last week too. AWS have cooked up another VM based on AMD’s 5th generation EPYC CPUs, but this one runs at a 5GHz clock speed. I’m quite keen to see how this compares to the regular M8a family. That might make for quite an interesting data point to add to our paper looking at 4th and 5th generation EPYC processors.

The bigger CPU news though was the release of Graviton5 as a 192 core socket, twice as many cores as the Graviton4. I wonder if we’ll see an HPC-specific VM with Graviton this time. An hpc9g if you will. A full roundup of what else was released is in the HPC Cloud Release Notes linked below.

By happy coincidence, not only did Microsoft publish a blog post about automating HPC with Copilot agents, but I also came across a paper in which researchers at a national lab have done some work in this area, using Gemini instead. Note that this is still work in progress.

agentic-hpc-sochat-milroy.pdf

Both are use cases that I hadn’t considered and, quite honestly, I would have just put them down to another AI hype pump if it wasn’t for the fact that the second paper is written by some very credible people I know personally. So maybe we don’t need to make HPC easy for everyone. It’s fairly clear we’ve been failing at that for long enough. Maybe we just need to point an AI at the complicated stuff and let it deal with it for us.


In The News

AWS re:Invent and all the other updates this week that are HPC and cloud related.

HPC Cloud Updates WE 07 Dec 2025
Updates to AWS, Azure & GCP in the last week relevant for HPC practitioners. AWS re:Invent ensures plenty of AWS news this week!

And for a little more detail on that new Graviton5 CPU, Next Platform has a pretty good write-up. We’re waiting on our preview access to be approved (or not), but given the usual NDA requirements around preview access we won’t be able to share much more until it becomes GA anyway 😦

AWS Graviton5 Strikes A Different Balance For Server CPUs
Updated: We have obtained new information in the wake of publishing our story. We have been expecting a new Arm server CPU design out of the Annapurna

Glenn Lockwood’s SC25 recap is available now and, as in previous years, well worth a read.

SC′25 recap
The annual SC conference was held last week, drawing over 16,000 registrants and 560 exhibitors to St. Louis, Missouri to talk about…

From HMx Labs

Both shorts this week are based on the talk I gave last month at Quant Minds on reducing costs for CPU compute workloads by having code that is portable across CPU manufacturers and architectures… while avoiding the ghost of numerical instability.

Numerical stability and a very brief introduction to where to find it

Battling Numerical Instability: Stop Blaming the CPU
Before you blame the new CPU for breaking your VaR, maybe ask what your compiler flags and math libraries have been up to after hours.

and the benefits of doing so

Comparing Cloud VM Performance Across Architectures
Is it worth making your code portable across CPU architectures when running on the cloud? 3 Clouds, 3 CPU Architectures, 3,000 Hours of Benchmarks. Here’s the Verdict

Know someone else who might like to read this newsletter? Forward this on to them or even better, ask them to sign up here: https://cloudhpc.news