Misadventures in HPC Cloud Migration #5
Your HPC scheduler does not exist in a vacuum.
Yes, your #HPC scheduler is probably the most important part of your HPC application (though the #quants probably disagree!), but it is not the only part. Ignore that fact at your peril.
Last week I shared a diagram of HPC applications as viewed by your non-HPC-savvy board or CTO, just before they ask you to move it all to the cloud in a week. Well, here’s what it looks like with some of the missing (ignored) components included (in blue).
If you’re running real-world, mission-critical HPC workloads such as #FinancialRisk calculations, then the scheduler is just one (critical) element in your HPC periodic table. Without some form of log aggregation and metric collection you’re running blind. The Bank of England peering over your shoulder at the results will soon cure you of any illusion that running blind is a viable option.
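To make "not running blind" a little more concrete, here is a minimal sketch of a worker-side wrapper that emits one JSON line per task with its status and duration. The names (`run_with_metrics`, `task_id`, `emit`) are hypothetical, and it assumes some log shipper or aggregator of your choice is collecting whatever the worker writes out. It is not tied to any particular scheduler.

```python
import json
import time


def run_with_metrics(task_id, fn, emit=print):
    """Run fn() and emit one JSON line with the task's status and duration.

    `emit` is a hypothetical hook: by default it prints to stdout, on the
    assumption that a log aggregator picks up the worker's output.
    """
    start = time.perf_counter()
    status = "ok"
    try:
        return fn()
    except Exception:
        status = "failed"
        raise  # the caller (or the scheduler) still sees the failure
    finally:
        record = {
            "task_id": task_id,
            "status": status,
            "duration_s": round(time.perf_counter() - start, 6),
        }
        emit(json.dumps(record))
```

Something like `run_with_metrics("trade-batch-17", price_batch)` (with `price_batch` being your own function) would then leave a record behind whether the task succeeds or fails, which is the bare minimum you will want when someone asks where last night's numbers came from.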
I see very few apps actually making use of the scheduler’s ability (for those schedulers that even provide it) to distribute data to the worker processes. Instead, there are a variety of different and creative solutions to achieve the same results for each type of workload.
And naturally there will be at least one application that has a long-forgotten dependency on a random file share for a small bit of configuration. It won’t work without it, of course.
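One way to go hunting for that forgotten file share before it finds you is to scan your configuration and script files for anything that looks like a share path. The sketch below is a rough starting point, not a complete audit: the patterns (UNC paths and a few common Unix mount prefixes), the file suffixes, and the function names are all assumptions you would adapt to your own estate, and it will not catch paths that are assembled at runtime.

```python
import re
from pathlib import Path

# Assumed patterns: UNC paths (\\server\share) and a few common Unix
# mount prefixes. Adjust these to match how shares are mounted on your estate.
SHARE_PATTERN = re.compile(
    r"(\\\\[A-Za-z0-9_.$-]+\\[A-Za-z0-9_.$-]+|/(?:mnt|net|nfs)/[^\s\"']+)"
)

# Assumed list of file types worth scanning.
DEFAULT_SUFFIXES = (
    ".cfg", ".ini", ".properties", ".xml", ".json", ".yaml", ".yml", ".sh", ".bat",
)


def find_share_references(text):
    """Return every substring of text that looks like a file-share path."""
    return SHARE_PATTERN.findall(text)


def scan_tree(root, suffixes=DEFAULT_SUFFIXES):
    """Map each matching file under root to the share-like paths it mentions."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            found = find_share_references(path.read_text(errors="ignore"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `scan_tree` over a checkout of each application gives you a list of files to go and ask awkward questions about, which is usually how the forgotten share gets rediscovered.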
All this before we even start to consider the #devops processes that are needed to get all the different bits of code to the right place.
The whole shebang needs a way to move to cloud. Even that forgotten configuration file. Bear all that in mind and you’re closer to a successful migration.
If you think that’s all you’ll need on cloud though, then you’re still in for a nasty surprise. Stay tuned.