The Impact of DevOps and Cloud Scale Computing on HPC

A look at the impact DevOps, cloud scale computing and other software engineering advances have had on HPC

I had many interesting conversations at HPC Club, some of which are worth sharing. The first started on a familiar favourite: hiring for HPC, particularly younger people, as many in the field slowly retire. It then evolved into a conversation about skill sets that have always been important in HPC but are now much more widely adopted, such as DevOps.

Unlike most of my posts, the ideas here are not all my own. But in accordance with HPC Club rules I won’t say whom they came from, unless that person chooses to claim credit themselves, of course 😉

I have on multiple occasions asked the question “What is HPC?”. Not because I’m suffering from retrograde amnesia, but because I have been trying, perhaps too subtly, to make a point. Before the era of cloud-scale compute, HPC was unique in its ability to manage hundreds to thousands of servers. Today there are SaaS companies that quietly run larger compute estates.

As such, many skills that were perhaps once unique to HPC are now far more common. Where click ops was once the only way people managed their compute estate, we have seen the emergence and establishment of DevOps and numerous tools such as Ansible and Terraform, and, along with them, growing expertise in this field.

Dealing with concurrency and parallelism was perhaps once reserved for people in the HPC domain. Today it is commonplace in even the smallest distributed systems. Things that were once complex and touched by few (probably writing code in C++) are now commonplace and used daily (in C# and other modern languages). Not only have programming languages and tools evolved to give us constructs that ease parallel computing, but the average engineer’s understanding of how to take advantage of them has evolved too.

While HPC is perhaps still niche, with a smaller percentage of people working in the domain, many of the skills needed to do so are now available in the larger and much more prevalent domain of large-scale compute.