HPC Cloud Updates WE 06 Apr 2025

Updates to AWS, Azure & GCP from the last week that are relevant to HPC practitioners. Includes news from KubeCon, a new version of AWS ParallelCluster and running containerised workloads with Slurm on Azure CycleCloud. And yes, lots of K8s-related updates, of course!


AWS

AWS ParallelCluster 3.13 has been released. It adds support for Slurm 24.05.7, Ubuntu 24.04 and EFA-enabled Amazon FSx for Lustre.

AWS ParallelCluster 3.13 with support for Ubuntu 24.04 and support for EFA-enabled with Amazon FSx for Lustre - AWS

Second-generation Amazon FSx for NetApp ONTAP is now available in Stockholm and Singapore.

Second-generation Amazon FSx for NetApp ONTAP now available in additional EMEA and APAC Regions - AWS

AWS Direct Connect now has a location in Athens, Greece.

AWS announces new AWS Direct Connect location in Athens, Greece - AWS

Azure

This looks interesting: using CycleCloud to run containerised, multi-node, multi-GPU workloads under Slurm.

Running Container Workloads in CycleCloud-Slurm – Multi-Node, Multi-GPU Jobs (NCCL Benchmark) | Microsoft Community Hub
Running high-performance computing (HPC) and AI workloads in the cloud requires a flexible and scalable orchestration platform. Microsoft Azure CycleCloud,…
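The post benchmarks multi-node, multi-GPU communication with NCCL. Assuming the benchmark is nccl-tests' `all_reduce_perf` (my assumption), its output reports two bandwidth columns per message size, `algbw` and `busbw`. Below is a minimal Python sketch of how the two relate for all_reduce, following the formula in the nccl-tests performance notes; the function name and the example numbers are hypothetical.

```python
# Minimal sketch: turn an all_reduce timing into the two bandwidths that
# nccl-tests reports. For all_reduce (per the nccl-tests performance notes):
#   algbw = bytes / time
#   busbw = algbw * 2 * (n - 1) / n
# allreduce_bandwidth is a hypothetical helper, not part of nccl-tests.


def allreduce_bandwidth(size_bytes: int, time_s: float, n_ranks: int) -> tuple[float, float]:
    """Return (algbw, busbw) in GB/s (1e9 bytes/s) for an all_reduce over n_ranks."""
    if time_s <= 0 or n_ranks < 1:
        raise ValueError("time_s must be positive and n_ranks must be >= 1")
    algbw = size_bytes / time_s / 1e9              # bandwidth as seen by the application
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks    # normalised to per-link hardware traffic
    return algbw, busbw


if __name__ == "__main__":
    # Hypothetical run: 8 GiB reduced across 16 GPUs (2 nodes x 8 GPUs) in 50 ms.
    alg, bus = allreduce_bandwidth(8 * 1024**3, 0.050, 16)
    print(f"algbw = {alg:.1f} GB/s, busbw = {bus:.1f} GB/s")
```

`busbw` is the number to compare against the interconnect's link bandwidth, because the `2 * (n - 1) / n` factor removes the dependence on the number of ranks.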

Some inference benchmarks for the Azure ND GB200 v6 VMs (powered by NVIDIA GB200s).

Azure ND GB200 v6 Delivers Record Performance for Inference Workloads
Achieving peak AI performance requires both cutting-edge hardware and a finely optimized infrastructure. Azure’s ND GB200 v6 Virtual Machines, accelerated by…

Google Cloud

A behind-the-scenes look at Colossus, the storage system that powers Google's products (and maybe your HPC workloads).

How Colossus optimizes data placement for performance | Google Cloud Blog
Learn how the Google Colossus distributed storage system determines how to place files on HDD vs. SSD to balance cost and performance.
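The core trade-off behind HDD vs. SSD placement can be shown with a back-of-envelope model: HDDs are cheap per GB but deliver few IOPS, so a frequently read file can cost more on HDD (you buy drives for their IOPS, not their capacity) than on SSD. The Python sketch below is a generic illustration of that trade-off under stated assumptions, not Colossus's actual placement algorithm; `DriveType`, `choose_tier` and all drive specs and prices are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriveType:
    """Hypothetical drive model: each drive supplies both capacity and IOPS."""
    name: str
    capacity_gb: float
    iops: float
    cost: float  # hypothetical price per drive (any currency)

    def cost_for(self, size_gb: float, read_iops: float) -> float:
        # Data is spread over a large fleet, so fractional drives are allowed.
        # Enough drives must be bought to cover whichever resource runs out first.
        drives = max(size_gb / self.capacity_gb, read_iops / self.iops)
        return drives * self.cost


# Illustrative numbers only: HDDs are cheap per GB but IOPS-poor, SSDs the opposite.
HDD = DriveType("HDD", capacity_gb=20_000, iops=200, cost=400)
SSD = DriveType("SSD", capacity_gb=4_000, iops=500_000, cost=400)


def choose_tier(size_gb: float, read_iops: float, tiers=(HDD, SSD)) -> str:
    """Return the name of the cheapest tier that can hold the data and serve its reads."""
    return min(tiers, key=lambda t: t.cost_for(size_gb, read_iops)).name


if __name__ == "__main__":
    print(choose_tier(1_000, 1))  # large, cold file -> HDD
    print(choose_tier(10, 100))   # small, hot file  -> SSD
```

With these made-up numbers, SSD becomes the cheaper home once a file is read at more than about 0.05 IOPS per GB; a real system would also have to account for caching, replication and write traffic.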

I'm still uncertain about K8s for HPC, but this ability to run workloads across multiple clusters in different regions certainly adds weight in its favour.

Multi-Cluster Orchestrator for cross-region Kubernetes workloads | Google Cloud Blog
The new Multi-Cluster Orchestrator service helps platform and application teams manage workloads across Kubernetes clusters across regions.

And since it was KubeCon last week, if you're using GPUs this might be of interest as well.

Using MultiKueue to provision global GPU resources | Google Cloud Blog
Together, MultiKueue, GKE, and Dynamic Workload Scheduler let you provision GPU resources in a GKE cluster regardless of region.
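As I understand the Kueue documentation, MultiKueue's basic idea is that a manager cluster mirrors each queued workload into several worker clusters (possibly in different regions) and keeps only the copy that is admitted first, withdrawing the rest. The toy Python simulation below illustrates that "first admission wins" pattern; it is not Kueue's code or API, and `WorkerCluster`, `dispatch`, `seconds_to_admit` and the region/GPU numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerCluster:
    region: str
    free_gpus: int
    # Hypothetical: time until this cluster could admit the job
    # (e.g. while capacity is provisioned by Dynamic Workload Scheduler).
    seconds_to_admit: float


def dispatch(job_gpus: int, workers: list[WorkerCluster]) -> tuple[Optional[str], dict[str, str]]:
    """Toy model of multi-cluster dispatch where the first worker to admit wins."""
    # 1. The manager mirrors the workload into every worker cluster.
    copies = {w.region: "pending" for w in workers}
    # 2. Workers that can fit the job race to admit it; the fastest one wins.
    candidates = [w for w in workers if w.free_gpus >= job_gpus]
    if not candidates:
        return None, copies  # the workload stays queued everywhere
    winner = min(candidates, key=lambda w: w.seconds_to_admit)
    winner.free_gpus -= job_gpus
    # 3. The job runs on the winner; the copies on the other workers are withdrawn.
    for region in copies:
        copies[region] = "admitted" if region == winner.region else "removed"
    return winner.region, copies


if __name__ == "__main__":
    workers = [
        WorkerCluster("us-central1", free_gpus=4, seconds_to_admit=30.0),
        WorkerCluster("europe-west4", free_gpus=8, seconds_to_admit=120.0),
        WorkerCluster("asia-southeast1", free_gpus=16, seconds_to_admit=10.0),
    ]
    print(dispatch(8, workers))
```

The appeal for GPU users is that the job lands wherever capacity shows up first, without the submitter having to pick a region.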