Misadventures in HPC Cloud Migration #8

BYO Scheduler & Cloud Aware

We’ve established that lift n shift isn’t a great idea, but we still need to get to cloud and, rightly or wrongly, the thought of having to replace the scheduler fills most enterprises with the kind of dread that envelops developers having to return to the office.

This leaves us with an approach built around retaining the scheduler. Bring your own scheduler, if you will.

As we have already ruled out running a fixed set of infrastructure on reserved instances, we are pushed down a path of dynamically controlling, at the very least, the worker compute resources: turning capacity on and off as required. Something to make things cloud aware, if not cloud native. Yes, I am coining a new term: CloudAware.

The resulting architecture would look something like the picture below. The additional components, beyond what is required for lift n shift, are in blue. I have deliberately left the component responsible for controlling the cloud infrastructure unnamed. This could be as simple as a bash script and a cron job, or something as fully fledged and cloud native as CycleCloud. Whether it is controlled directly by your scheduler, or by something else that is integrated with your monitoring, will vary greatly depending on your scheduler (and a host of other things). Hence the dotted blue lines from both.
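
To make that unnamed component a little more concrete, here is a minimal sketch of the decision it has to make on every tick, whatever ends up running it. Everything in it is an assumption for illustration: the names `QueueState`, `desired_workers` and `scaling_delta`, the idea that the scheduler can report pending and running task counts, and the fixed number of slots per worker. It deliberately stops short of talking to any real scheduler or cloud API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueState:
    """Hypothetical snapshot of the scheduler, however you obtain it."""
    pending_tasks: int  # tasks waiting in the queue
    running_tasks: int  # tasks currently executing on workers


def desired_workers(state: QueueState, slots_per_worker: int,
                    min_workers: int = 0, max_workers: int = 100) -> int:
    """Return how many workers we would like, clamped to [min, max]."""
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    # Total demand is everything queued plus everything already running.
    demand = state.pending_tasks + state.running_tasks
    # Round up: a partially used worker is still a whole worker.
    needed = math.ceil(demand / slots_per_worker)
    return max(min_workers, min(max_workers, needed))


def scaling_delta(current_workers: int, target_workers: int) -> int:
    """Positive means start workers, negative means stop them."""
    return target_workers - current_workers


if __name__ == "__main__":
    # Assumed example numbers: 10 queued, 6 running, 4 slots per worker.
    state = QueueState(pending_tasks=10, running_tasks=6)
    target = desired_workers(state, slots_per_worker=4, max_workers=50)
    print(scaling_delta(current_workers=2, target_workers=target))
```

Run from a cron job, the output of `scaling_delta` is what you would hand to whatever actually starts and stops instances. A real controller would also need some damping (cool-down periods, draining workers before stopping them), which is left out here to keep the sketch short.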