Ephemeral Test Suites

Updated: September 8th, 2025

I realized something the other day that now feels obvious: Opus 4.1 and other models are so good at writing bash that it makes sense to rethink how we treat tests. Unit tests are naturally 'write once, run forever'. But integration and e2e tests don’t scale that way: each one wants its own environment setup, fixture resets, and timeouts. Some of them can now be 'write once, run once, throw away'.

So when making a change that might affect many interfaces, like a major version update of your web app framework (Next.js, Rails, Express, etc.), you can just ask Cursor or Claude Code to walk your route table, write bash to hit each endpoint, and make a plan to fix whatever problems are found.
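To make that concrete, here is a rough sketch of the kind of throwaway script an agent might produce. I've written it in Python rather than bash so the network call is easy to stub out. The route list, the `smoke_test` and `fetch_status` names, and the "anything 5xx or unreachable is a problem" rule are all my own assumptions for illustration, not something a particular framework or tool prescribes.

```python
import urllib.error
import urllib.request

# Hypothetical route list. In practice the agent would derive this from the
# framework's route table (for example, from the output of `rails routes`).
ROUTES = ["/", "/health", "/api/users"]


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the status code is still useful.
        return err.code


def smoke_test(base_url, paths, fetch=fetch_status):
    """GET every path and return {path: problem} for 5xx or unreachable routes."""
    failures = {}
    for path in paths:
        try:
            status = fetch(base_url.rstrip("/") + path)
        except OSError as err:  # connection refused, timeout, DNS failure
            failures[path] = f"unreachable: {err}"
            continue
        if status >= 500:  # assumption: only server errors count as breakage
            failures[path] = f"HTTP {status}"
    return failures
```

Running `smoke_test("http://localhost:3000", ROUTES)` before and after the upgrade (the port is just a guess at a typical dev server) and comparing the two results gives the agent a list of broken routes to plan around.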

Or, to take another common angle: when making a change to one endpoint that you're worried might have downstream impacts on subsequent calls in the same real-world workflow, you can pivot 90 degrees and ask the LLM to produce many inputs for the same sequence and push it into some ugly edge cases.
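A minimal sketch of that second shape, again in Python: one fixed sequence of steps, run once per combination of edge-case inputs. Everything named here is hypothetical. The `create_cart` and `checkout` steps are in-memory stand-ins for what would really be HTTP calls, and the values in `EDGE_CASES` are the sort of thing I'd ask the LLM to come up with.

```python
import itertools

# Hypothetical edge-case values per field; an LLM would be asked to propose these.
EDGE_CASES = {
    "quantity": [0, 1, -1, 10**9],
    "coupon": [None, "", "SAVE10"],
}


def input_grid(edge_cases):
    """Yield one payload dict per combination of field values."""
    fields = list(edge_cases)
    for values in itertools.product(*(edge_cases[f] for f in fields)):
        yield dict(zip(fields, values))


def run_workflow(steps, payload):
    """Run steps in order, threading state through; return the first failure or None."""
    state = {}
    for step in steps:
        try:
            state = step(state, payload)
        except Exception as err:
            return f"{step.__name__}: {err}"
    return None


def explore(steps, edge_cases):
    """Run the workflow once per input combination and collect the failures."""
    failures = []
    for payload in input_grid(edge_cases):
        problem = run_workflow(steps, payload)
        if problem is not None:
            failures.append((payload, problem))
    return failures


# Stand-in steps; real ones would call the endpoints in the workflow.
def create_cart(state, payload):
    return {"quantity": payload["quantity"]}


def checkout(state, payload):
    if state["quantity"] <= 0:
        raise ValueError("empty cart")
    return {**state, "paid": True}
```

Calling `explore([create_cart, checkout], EDGE_CASES)` returns each failing payload together with the step that broke, which is usually all I need before deciding whether the suite is worth keeping.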

In the former case, I might want to commit those tests to the codebase and get them running in CI. In the latter case, this might be such a bespoke use case that it makes more sense to just throw away the suite.

Test suite ephemerality is appetizing when writing a test suite takes an agent a few minutes, rather than the hours it might take to write by hand. If I'm putting hours or even days into writing tests, it's pretty likely I'm going to want to treat them as precious things to be held onto and run forever. Whereas knowing they can be regenerated on a whim, I'm much more likely to just throw them away.

A lightweight framework for deciding might look like this:

Add it to CI when: we already have a way to deterministically run integration tests in CI (not a given in every codebase, sadly), the tests are performant (not easy), and it’s a stable contract, a recurring risk, or a fix for a regression that would be horrendous egg-on-face if it came back. (I've written before about not really being a fan of integration tests in CI, so if this list skews towards keeping tests out of CI, that's the bias to account for.)
Ephemeral when: it’s exploratory, migration-specific, multi-system brittle, you’re mapping blast radius, or (probably most common) it might be a useful suite but getting it deterministic, performant, and isolated from other tests running in parallel is intractable.
