By Hamza in Random Bytes — Aug 30, 2024

AWS Parallel Computing Service

Amazon releases a managed Slurm-based HPC service.

I’m sure you can’t have missed it. My news feeds have been full of posts from AWS promoting it, but just in case you did, AWS released a managed HPC service this week. Focus on running your jobs. Let AWS deal with the compute infrastructure.

AWS Parallel Computing Service (PCS) is about as close to a turnkey solution as any of the cloud providers offers for running HPC workloads using Slurm. While it’s not quite as simple as giving it your worker code and off you go, it’s not far off. About a morning of work should see you up and running with a scalable cluster.

You pay for the resources you consume (naturally), such as EC2, the Lustre file system, networking and so on, but in addition there are two further costs for PCS. Firstly, a fee for running the Slurm controller, about 60 cents to $6 an hour depending on how many concurrent jobs you need to be able to run.

Then there’s a cost to manage the nodes with OS updates, security patches and so on, around 8 cents per hour per instance. Both fees are billed per minute, so we’re truly into the world of ephemeral HPC. Nice.
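To get a feel for what those two PCS fees add up to, here's a rough back-of-the-envelope sketch. The rates are the approximate figures quoted above, not official pricing, and the function name and parameters are made up for illustration. It assumes simple per-minute proration and ignores EC2, storage and networking entirely.

```python
def pcs_overhead_cost(minutes, instances,
                      controller_rate_per_hour=0.60,
                      node_rate_per_hour=0.08):
    """Estimate the PCS-specific fees for one cluster run, in dollars.

    Rates default to the approximate figures from this post (assumptions,
    not official AWS pricing). The controller rate runs from roughly 0.60
    to 6.00 an hour depending on the controller size you pick.
    """
    hours = minutes / 60  # per-minute billing, prorated from the hourly rate
    controller_fee = controller_rate_per_hour * hours
    node_fee = node_rate_per_hour * hours * instances
    return round(controller_fee + node_fee, 2)


# A 90 minute run on 10 instances with the smallest controller
small_run = pcs_overhead_cost(minutes=90, instances=10)

# The same run with the largest controller
large_run = pcs_overhead_cost(minutes=90, instances=10,
                              controller_rate_per_hour=6.00)

print(f"small controller: ${small_run:.2f}, large controller: ${large_run:.2f}")
```

On these assumed numbers the management overhead for a short run is a couple of dollars with the small controller, and the controller size dominates once you go to the top tier.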

Should you move to PCS? Well, if you’re running Slurm on AWS and want to pay someone else to manage it for you then yeah, you probably should.

If you use another scheduler, say IBM Spectrum Symphony or TIBCO DataGrid 🤨 or something you cooked up in the basement, then moving to PCS also means changing schedulers. How big a job that is varies hugely, but it’s rarely trivial.

If you have regulatory requirements to be multi-cloud you’re also going to be out of luck. Don’t expect PCS to support running on Azure or GCP anytime soon. (Though maybe AWS will prove me wrong on that yet.)

So it’s probably not one for the HPC in FSI folk. But for a lot of engineering and science teams that already use Slurm, don’t have an infrastructure team to manage their estate and just want to get on with running their code, this might be just what the solution architect ordered.
