Evaluating AI Written Code

When evaluating code, we’ve usually defaulted to simple, limited metrics like line counts, often shunning the hard work of finding a better method. Will AI finally force us to change?

An early study in 2023 by GitClear showed an increase in code churn (code that is changed or reverted shortly after being written) since the advent of GitHub Copilot. The study presents this as a bad outcome. But we’re judging that outcome by the standards and expectations of code quality we apply to code written by humans. I’m not sure those standards will necessarily hold for AI-written code.

I think we are rather unprepared for a world in which AI might write more code than humans do. Many of our software engineering practices won’t really hold any longer, and we have not even started to adapt them.

The most important of these, and the one we have struggled with most so far, is judging the quality and acceptability of software. Perhaps AI will finally force us to solve this? Probably not. Will we instead focus on defining the parameters of the problem more clearly, something we as an industry tend to struggle with today (see the rise and fall of agile)? Certainly, any kind of AI agent will benefit from clearly defined requirements. Perhaps, if we are also able to judge its output by those same requirements, we will no longer care about the implementation?

I believe that if we are to move from AI working simply as copilots and assistants to AI acting as fully fledged replacements for engineers (as Devin claims to be), then we also need to change the paradigm under which these tools operate. This is especially true if we expect such AI software engineers to be used by non-engineers.

After all, when you hire a motor mechanic to repair your car, you really only look at the outcome (unless, like me, you can do the work yourself and check).

This is an excerpt from:

Thoughts on AI for 2025
I am in the fun position not only of using AI as an engineer but also of needing to make hiring decisions in the face of tools such as Cursor and Devin. I thought I’d share some thoughts.