HAL Update: A Confidence Game
An update on HAL, the HPC Vibe Coding experiment. One step forward and two steps back?
Our little HPC vibe coding experiment is going backwards. I deleted all of the test code it generated last week.
The first interesting thing about that is how easy it was to delete large amounts of vibe code. I feel no attachment to it. I’m not worried about what I might lose (which I sometimes do despite the git history). git rm -r test/. 💥Boom. It’s gone. Don’t care.
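For what it’s worth, the git history is what makes this kind of deletion low-risk. A minimal sketch in a throwaway repo (the test/ path and file contents are illustrative, not from HAL itself):

```shell
# Demonstrate that "git rm -r" only removes from the working tree;
# the committed content remains recoverable from history.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email hal@example.com
git config user.name HAL

mkdir test
echo "generated test" > test/t1.txt
git add .
git commit -qm "vibe-coded tests"

git rm -rq test/                 # 💥 Boom. Gone from the working tree...
git commit -qm "delete generated tests"

git show HEAD~1:test/t1.txt      # ...but still there one commit back
```

So “don’t care” is a perfectly rational stance: the worst case is a `git show` or `git checkout` away.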
I didn’t delete it because it was wrong. It might have been. I don’t know. And that’s why I really deleted it. I didn’t trust it; it was verbose, and gaining confidence that those tests passing meant I had working software would have required reading and understanding it all. Sure, I had asked AI to validate it, and it did. Still didn’t trust it. I wasn’t about to read 4500 lines of AI-generated code. Not gonna happen.
I think the key difference here is that the level of trust and confidence I’m demanding for this is far higher than it was for our various vibe-coded internal tools. If our ATS doesn’t work I’m somewhat inconvenienced, but somehow I’ll manage. If your calculations for a nuclear reactor, or the risk exposures for a “too big to fail” bank, aren’t correct, people tend to get a tiny bit more annoyed. The same level of trust is therefore demanded of the software that runs those calculations, i.e. this (HAL).
So, I deleted it all and started again. A bit less vibey this time, but still trying to do this AI first. I recreated the tests, providing more direction on how I wanted the test framework built: small steps, checking the output more frequently. I still didn’t write any code by hand, and I haven’t really read all of it, but it feels a bit more correct. Yeah, pretty vibey still 🤣
My workflow for this includes tracking the prompt history itself in a markdown file which led to a funny thing happening. I had started to write the prompts into the markdown file first and then paste them into Codex. The file was open in VS Code with GitHub Copilot. Yes, you guessed it. Copilot (set to use Opus 4.6) started writing the prompts that were then fed to Codex. 😆 What’s that called? Meta vibe coding?
Now I have a new test harness. I still don’t trust it, but it’s less verbose and makes a little more sense than the last one. I’ve asked the LLMs to validate it again too, but I’m honestly not sure at this point how to build a greater level of trust in this without resorting to writing the test code myself (AI assisted, perhaps, but really writing it all myself).
I think it’s no accident that the recent large-scale vibe coding experiments by Cursor and Anthropic, building a browser and a C compiler respectively, chose to recreate products that had existing, and fairly comprehensive, test suites.
I might be missing something, and please fill me in if I am, but I don’t see anything out there yet on how to solve this. Answers on a postcard.
As ever all the gory details are in the git commit history and the prompt-history.md files.
P.S.: I’ve just kicked it off to go and implement the data control plane… Guess we’ll see what we get.
