Death by HPC

Friday fun. Which parts of an HPC system are most likely to kill you? Or: the components of HPC for beginners, in disguise.

It’s Friday morning, I went to bed late, got woken up at 5am by the neighbour’s cat, and my brain is in a funk, but I know I need to write something lastminute.com this morning because yesterday was too frantic to breathe.

After a few minutes of insipid idea hunting inside my own brain I gave my old pal C. Gippity a shout. After first freaking me out by repeating a line of yesterday’s LinkedIn post (pure off the cuff Hamza, that one), which left me thinking WTAF … I’m hoping it’s coincidence rather than SkyNet emergence and real-time intelligence … we settled on amusing you this morning with a list of HPC components and how likely they are to result in your early demise…


Compute Nodes

What is it: Densely packed CPUs and GPUs in steel drawers. Sometimes with hidden sharp edges and a penchant for crushing feet in open-toed shoes.

Mortal Danger: 3/5

GPUs and Accelerator Cards

What is it: Silicon that’s been chugging energy drinks and injecting roids. Imminently responsible for your heat death if not that of the universe.

Mortal Danger: 5/5

Schedulers & Resource Managers

What is it: The team captain picking out who gets to come and play. The skinny kid left standing at the end? Yeah, that’s your critical job that needed completing yesterday. Will kill your social life before it poses any physical danger to you.

Mortal Danger: 0/5

High-Speed Interconnects

What is it: Shiny bits of copper or glass wrapped in plastic, sending smoke signals from one machine to another faster than Usain Bolt running from a horde of zombies. Ever ready to trip you up. Literally and figuratively.

Mortal Danger: 2/5

Parallel File Systems

What is it: An excuse to keep living in the past and not adopt modern technology to handle your storage like a grown-up. Oops. I mean a pack of howling SSDs running in unison to keep up with the pointless data being spat out by the compute nodes, which you’ll throw away later. Always one rm -rf away from losing your job.

Mortal Danger: 3/5

Monitoring and Telemetry

What is it: A ghost. It doesn’t exist for most of you. Most of you who do have it never look at any of it. Mostly harmless. Till a misconfigured alert fills your mailbox and you miss that critical email from your boss, resulting in you working three weekends in a row.

Mortal Danger: 2/5


Thank you, folks. I’ll be here all week. You can pay me to stop at any time. Or add your own.

Gippity had quite a lot more and you can read our conversation here