Misadventures in HPC Cloud Migration #15

It’s always DNS

You read my last post. You’ve worked out your network traffic volumes. You know which bits of your network are metered on cloud and what they cost. You know what your network costs will be. Sorted.

One more problem. DNS. It’s always DNS.

When you were running on premises and your HPC code needed to reach out and grab its input data, that wasn’t a problem. You simply connected to

hpc-cache.some.odd.network.name.that.makes.no.sense.evilcorp.com

from HPC code executing on the worker node. Both machines were on the same network, both used the same DNS. It all just worked. With zero thought or input from the HPC application team. 

You’re not that lucky on cloud. Remember the diagram from the last post? Your HPC worker node is running in a VPC/subscription (choose your preferred term depending on your cloud provider and network setup). Your HPC cache service is running, well, somewhere else.

Let’s be generous. Let’s say you migrated it to the cloud too. (You really should have.) So, it’s running in a different VPC/subscription. That’s fine though. You’ve allocated distinct non-routable IPv4 ranges to each VPC. They can communicate via the hub. We’re all good here.
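If you want to be sure those ranges really are distinct, a quick check is cheap. Here is a minimal sketch using Python’s standard ipaddress module. The network names and CIDR blocks are made up for illustration; substitute whatever your network team actually allocated.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges):
    """Return pairs of network names whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


# Hypothetical allocations, not taken from any real environment.
vpc_ranges = {
    "hpc-workers": "10.10.0.0/16",
    "hpc-cache": "10.20.0.0/16",
    "on-prem": "10.10.128.0/20",
}

# Any pair reported here cannot be routed cleanly via the hub.
print(find_overlaps(vpc_ranges))
```

With these made-up values, the on-prem block sits inside the worker range, which is exactly the kind of clash that is far cheaper to find before you build anything.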

What DNS is each one using though? Are they in the same domain? Can you resolve host names in one VPC from another? Or are you back to IP addresses or modifying the hosts file in your machine image? 

What if your HPC cache service is running on premises? Using your existing on-prem network infrastructure. Including its old, rather static, DNS service. Can you still resolve your hostname from your HPC worker node?
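One way to answer these questions before your first production run is to ask the worker node directly. The sketch below uses only Python’s standard socket module and whatever resolver the node is configured with. The function name check_resolution and its resolver parameter are my own inventions for illustration, and this is a rough smoke test under those assumptions, not a full DNS diagnostic.

```python
import socket


def check_resolution(hostnames, resolver=socket.getaddrinfo):
    """Map each hostname to its sorted resolved addresses, or None if lookup fails."""
    results = {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
            # Each entry ends with a sockaddr tuple; its first field is the address.
            results[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # The node's configured DNS could not resolve this name.
            results[name] = None
    return results


# On a worker node you might call, for example:
#   check_resolution(["hpc-cache.some.odd.network.name.that.makes.no.sense.evilcorp.com"])
```

Run it from a worker node in each VPC/subscription, not from your laptop. A None against your cache hostname means you are heading for IP addresses or hosts file edits unless somebody fixes the DNS setup first.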

The simplest things you take for granted on premises can become a hot mess very quickly in the cloud. Make sure your application teams spend the time to understand their network requirements.