HPC Cloud Updates WE 04 May 2025

Updates to AWS, Azure & GCP in the last week relevant for HPC practitioners. Azure Compute Fleet is GA, and Google and OCP plan for a 1 MW rack.

AWS

Want an easy way to make use of Capacity Blocks? Combine them with AWS Batch. The ML-focused example below should get you started.

How to use Capacity Blocks for ML with AWS Batch | Amazon Web Services
Capacity Blocks for ML (CBML) are a powerful feature that allows you to reserve highly sought-after GPU based EC2 instances for a future date to support your short-duration machine learning (ML) workloads. Since the reservations are “for a future date” you must have a mechanism to launch the instances that you have paid for and place jobs onto them at that specific time. This is where AWS Batch comes in. With an always-on queue ready to accept jobs, and the ability to scale your capacity block reservation at the correct time, AWS Batch provides you with everything you need to maximize your CBML reservations.
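
To make the mechanism a little more concrete, here is a minimal sketch of the two request payloads involved: an EC2 launch template that targets a Capacity Block reservation, and an AWS Batch compute environment that uses that template. It is not taken from the linked post. The function names and every ID, name and size are hypothetical placeholders, and the field layout follows my reading of the EC2 and Batch API shapes, so check it against the current API references before relying on it.

```python
# Hypothetical sketch: all names and IDs are placeholders, not values from the post.

def capacity_block_launch_template_data(reservation_id: str) -> dict:
    """Build LaunchTemplateData that pins instances to one Capacity Block."""
    return {
        # Capacity Blocks are launched through their own market type.
        "InstanceMarketOptions": {"MarketType": "capacity-block"},
        "CapacityReservationSpecification": {
            "CapacityReservationTarget": {"CapacityReservationId": reservation_id}
        },
    }


def batch_compute_environment_request(
    name: str,
    launch_template_name: str,
    instance_type: str,
    max_vcpus: int,
    subnet_id: str,
    security_group_id: str,
    instance_role: str,
) -> dict:
    """Build keyword arguments for Batch's create_compute_environment call."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",
            # Scale to zero outside the reservation window.
            "minvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": [instance_type],
            # Assumption: this subnet is in the Availability Zone of the reservation.
            "subnets": [subnet_id],
            "securityGroupIds": [security_group_id],
            "instanceRole": instance_role,
            "launchTemplate": {
                "launchTemplateName": launch_template_name,
                "version": "$Latest",
            },
        },
    }


template_data = capacity_block_launch_template_data("cr-EXAMPLE")
request = batch_compute_environment_request(
    name="cbml-example",
    launch_template_name="cbml-example-template",
    instance_type="p5.48xlarge",
    max_vcpus=192,  # one p5.48xlarge has 192 vCPUs
    subnet_id="subnet-EXAMPLE",
    security_group_id="sg-EXAMPLE",
    instance_role="ecsInstanceRole",
)

# With boto3 installed and credentials configured, the payloads would be sent as:
#   import boto3
#   boto3.client("ec2").create_launch_template(
#       LaunchTemplateName="cbml-example-template",
#       LaunchTemplateData=template_data,
#   )
#   boto3.client("batch").create_compute_environment(**request)
```

Keeping the payload builders separate from the API calls means the structure can be inspected or unit tested without touching an AWS account.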

Need some fast I/O for your HPC? The new I7i instance types might be a useful base to build it on if you need persistent storage.

Introducing Amazon EC2 I7i high performance Storage Optimized instances - AWS

Azure

Azure Compute Fleet is now GA in all regions. I need to spend more time playing with it and with the new features of VM Scale Sets, as it is not yet clear to me which of the two is the better choice for scaling my HPC estate effectively.

Azure updates | Microsoft Azure

Want an intro to running CAE on Azure that’s written as a bit of an advertorial?

Computer-Aided Engineering “CAE” on Azure | Microsoft Community Hub
Table of Contents: What is Computer-Aided Engineering (CAE)? Why Moving CAE to Cloud? Cloud vs. On-Premises What Makes Azure Special for CAE Workloads? What…

Google Cloud

Earlier this year NVIDIA introduced us to the idea of a 600 kW rack, and now Google and the Open Compute Project are pushing that even further to a 1 MW rack. Liquid cooled, of course, by necessity.

Enabling 1 MW IT racks and liquid cooling at OCP EMEA Summit | Google Cloud Blog
At the 2025 Open Compute Project Summit, we announced a +/-400 VDC enabling 1 MW IT racks, and the Project Deschutes liquid cooling distribution unit.
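
A quick back-of-the-envelope calculation helps show why bus voltage matters at this scale. The sketch below applies I = P / V to a 1 MW rack. The 48 VDC comparison point and the choice between a 400 V and an 800 V load connection are my own assumptions for illustration, not figures from the announcement, and conversion losses are ignored.

```python
def bus_current_amps(power_watts: float, bus_volts: float) -> float:
    """Ideal DC bus current for a given load power, ignoring conversion losses."""
    return power_watts / bus_volts


RACK_POWER_W = 1_000_000  # 1 MW rack

# Assumed comparison points; only the +/-400 VDC figure comes from the announcement.
cases = [
    ("48 VDC", 48.0),
    ("400 VDC (one pole to midpoint)", 400.0),
    ("800 VDC (pole to pole of +/-400 VDC)", 800.0),
]

for label, volts in cases:
    print(f"{label}: {bus_current_amps(RACK_POWER_W, volts):,.0f} A")
```

Under these assumptions the current falls from roughly 20,800 A at 48 VDC to 1,250 A across the full +/-400 VDC bus, which is a far more practical amount to carry through busbars and connectors.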