The HPC Experience
A look at the relative absence, and the importance, of UX, UI and observability in HPC
You won’t read this. I know you won’t because every metric on my writing for the last year shows me that anytime I talk about observability, UI or UX in HPC I get virtually no interest. From anyone.
But we need to talk about this. It’s important. Even if you don’t realise it.
Look, I get it. HPC grew up in a world where every cent, penny and spare minute was spent on getting more compute. In a market where calculating the next risk metric meant less money lost. In a space race where getting the correct exit pressure on a nozzle meant winning. On a planet where predicting temperature rise due to climate change means… well, that depends who you ask. How pretty the user interface or management console for any of that looked, how long it took to debug when it failed, or how frustrating the interface was for the user really wasn’t top of mind. Or anywhere in mind.
Guess what, though. Failed runs cost money too. Miss that hot node until it burns out and that’s expensive as well. We might not be able to put a number on it, ironically because we’ve not invested in the observability to be able to do so. But what’s the point of building exaflop supercomputers if the workloads we throw at them barely tickle them, and we don’t even know?
Worse still, where is the data to convince your C-suite to fund the next machine going to come from? A poor intern working for weeks trying to scrape it together with a mash-up of sed, awk and grep? Or can you just pull up a screen that illustrates what’s going on so it’s obvious even to a three-year-old? Maybe you don’t care because you only need to have that conversation once every 4 to 6 years? OK, how are you training up your grads and juniors? How long does that take? Want to make it cheaper? Want to attract new people into the field in the first place?
I want to see my scheduler playing Tetris with workloads in real time. I want heatmaps that look like summer in Australia showing me how hot my estate is running. Both literally and figuratively.
HPC is a little stuck in its ways in a lot of regards, but nowhere more so than in user experience, data visualisation and observability. Isn’t it about time we started to fix this?