jonoropeza.com

Making software and software development teams. Mostly the people parts.


On Using AI To Make Software (in early 2026)

1

"Watch out for other people's changemaking campaigns" is the type of advice I find myself liking to give, as I push well into my third decade in tech.

So when I hear the roar and fuss of AI hype, the tone and assertions of all it can do, the suggestions of what tomorrow might look like, the manic motions to hitch your own content to the train (like this post of mine), the tangible impact the hype has had on the market, and the resistance forming against it, all of it seeming to build toward the biggest IPOs in history, public offerings with the unstated suggestion of "we shall capture every inch of margin in every meaningful knowledge work vertical"? My antennae are up: this is likely the awareness phase of an attempt to effect massive change.

Where are we on a hype cycle model? It's hard to pick an exact inflection point, but sometime between the releases of Opus 4.5 + GPT 5.2 and now 4.6 and 5.4, it became imperative for tech companies (the little slice of the world I'm most exposed to) to signal (a) you're an AI company, not an xyz company, and (b) your engineers (or you) are using xyz tooling to ship features and fix bugs at a tremendous rate. We've gone from "IntelliSense is so good now it's almost creepy" to CEOs bragging about their best engineers not writing any code, in under four years. Extrapolating even linear growth from here makes 2030 seem like a wildly different place. And none of the suggestions are self-limiting to linearity.

Yes, there are massive incentives for OpenAI and Anthropic in telling a story of exponential competence growth. And there are historical precedents for populations pushing back against rapid technological change, Luddism and general technophobia being no new things. Those two poles will attract the masses. I'm more interested in the grey areas where most of us live: Where go we makers of software who cringe at both extremes, and instead want to understand the nuances, utility and limits of the thing?

This middle ground is what I will attempt to find in the rest of this piece.


2

Acknowledging the hype and the distaste it can leave in your mouth, it's apparent to me that there's been a shift over the last six months. Andrej Karpathy said it really well here, emphasis mine:

It is hard to communicate how much programming has changed due to AI in the last 2-3 months: not gradually and over time in the "progress as usual" way, but specifically this last December. There are a number of asterisks but imo coding agents basically didn’t work before December [2025] and basically work since - the models have significantly higher quality, long-term coherence and tenacity and they can power through large and long tasks, well past enough that it is extremely disruptive to the default programming workflow.

A lot of energy is going into discourse on code generation, but I'm finding LLMs useful across the entire software lifecycle:

  • Discovery / ideation - AI is much less likely than a human to fall for recency bias, power of personality or its own whims when parsing customer feedback. Gathering evidence for what competitor roadmaps might look like is a full-time role at many big companies; now a full table with your choice of columns is available within a 20-minute Extended Think. These are all divergent activities a human could do - they just take a lot of time, and unlike a human, an agent won't 80/20 the thing. Diligent task completion will be a theme...
  • Requirements & Design - I don't like AI for doing the actual writing of docs like PRDs or tech specs, but LLMs are a phenomenally persistent editing partner. This is another theme: I feel strongly about the words in text pieces being written by the writer, with the LLM as coach.
  • Implementation - Yes, Opus 4.6 and GPT 5.4 code really well, at least in my experience in the TypeScript and Python web services and experiences I work in. And they can be very, very good when prompted well inside of a well-thought-out loop... I'll say more about that in a moment.
  • Verification (testing & QA) - Building and maintaining run-in-any-env, deterministic integration test suites, and building toward natural-language e2e tests, is my current approach. I never want to debug or add yet another await to another flaky Cypress script again - and why should any of us, when agentic click-throughs will be better, more thorough testers anyway? Again, this idea of diligence...
  • Code Review & Refinement - Combo of hand-rolled skills and tools like Copilot or CodeRabbit. By hand-rolled, I mean collaborated on with an agent. Yet another theme: Everything is meta in this world, everything is turtles all the way down. The best way to learn to use the tools better is to ask the tool...
  • Release and Operations (monitoring, rollback, runbooks, triaging bugs, etc) - Again, a lot of AI's usefulness isn't that it can do what humans can't, it's that it can do what humans won't. Cursor and Claude Code using the DataDog MCP or the pup cli have the conscientiousness and orderliness you always wished your on-call engineers had. They monitor without fail and dig deep on every anomaly.
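
The run-anywhere, deterministic-suite idea from the verification bullet can be sketched in a few lines. This is a minimal illustration, not code from the author's suites; all names (`FixedClock`, `build_report`) are hypothetical stand-ins. The point is pinning every nondeterministic input - clock, randomness - so the test produces the same result in any environment:

```python
# Sketch: a deterministic integration-style test.
# FixedClock and build_report are hypothetical stand-ins.
import random


class FixedClock:
    """Injectable clock so the test never depends on wall time."""

    def __init__(self, now: float):
        self._now = now

    def now(self) -> float:
        return self._now


def build_report(clock, rng, events):
    """Toy system under test: summarizes events with a timestamp."""
    sample = rng.sample(events, k=min(2, len(events)))
    return {"generated_at": clock.now(), "sampled": sorted(sample)}


def test_report_is_deterministic():
    # Pin the nondeterminism: fixed clock, seeded RNG, no network.
    clock = FixedClock(now=1_700_000_000.0)
    rng = random.Random(42)
    report = build_report(clock, rng, ["signup", "churn", "upgrade"])
    assert report["generated_at"] == 1_700_000_000.0
    assert report["sampled"] == sorted(report["sampled"])


test_report_is_deterministic()
```

Inject the clock and RNG rather than reaching for `time.time()` or module-level `random` inside the code under test; that one design choice is most of what makes a suite run the same on a laptop, in CI, or in an agent's sandbox.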

Is AI performing magic in any of these aspects? No. As I go through my day to day, maybe waiting for a prompt or four to resolve, I'll ponder what's happening here: I have a small army of extremely diligent assistants, whose feelings don't get hurt if I have to countermand my instructions or abandon a thread unfinished. I'm not sure what this means for human relations - I suspect we will struggle as more and more of our textual and eventually voice interactions play out this way, and we slip into talking to our peer humans this way, too.

Does using AI ever slow me down? Yes. Whether it's inducing analysis paralysis in the ideation phase, following bad paths that a sycophantic agent is reinforcing, falling for red herrings in monitoring, or picking tiny nits in code reviews, there are plenty of ways using AI can actually make a task more tedious and the output less accurate and slower to produce.

Given all of this, is using AI net-net making me a more productive builder in a real world, established, brownfield environment with plenty of legacy code to work in and around?

Yes.

Read on for an example of how I'm using LLMs in the programming itself.


3

So: What does a real-world coding loop that works in a productionized, highly in-use codebase where bugs and mistakes cost real money look like?

Here's a loop I'm having success with. This will change as the wrapper apps (Cursor, Claude Code, Kiro et al) adapt and subsume logic like this into their layers. I'm guessing this will look completely archaic when I read it in 2027, or maybe even by mid-2026:

  • Develop a few specialist skills. Or just steal Garry Tan's.
  • Make a PLAN.md which is a mostly empty markdown file, maybe one header that says "Next steps for project".
  • Make a HISTORY.md, which will be a decision-and-change and how-we-did-this log.
  • Make an AGENTS.md or a CLAUDE.md, and reference the skills, PLAN.md and HISTORY.md, and what each is for.
  • New chat, "/plan-ceo-review I'm thinking of a new feature to do x, y, z, help me fill out PLAN.md", have a dialogue, end result is a filled out PLAN.md
  • New chat, "/plan-end-review review the next plan", have a dialogue, at the end of the dialogue "build it".
  • Next-to-last step of the build: Ask it to add to CLAUDE.md and HISTORY.md, and to clear PLAN.md.
  • Last step of the build: "/review", go through review and testing

Why is this working for me?

  • I have not lost touch with the code or turned over parts of the codebase to the agents; I still look at every diff and make decisions at the which-module-goes-where level. For that reason, I currently prefer Cursor for most tasks, though Claude Code has its spots too.
  • For the same reason, this works much better in parts of our codebase that I know well than it does in something new.
  • Because I know the roadmap and structure of most of the projects I'm working on (we use about half a dozen different patterns through our TypeScript & Python codebases), I'm able to start each PLAN myself and assess the iterations the agent is making against both reality and desired future shape.
  • Because I know most of the code myself or my team is working on well, I'm able to assess and pre-review each specific diff and prune out undesirable directions or offer quick corrections, and usually assess the quality of the reviews as well.
  • Knowing all this helps me understand when an addition to the CLAUDE file is likely worthwhile or likely a bad path.

All this comes together at a higher level in ways like the three-layered system Gurwinder suggests here. My iterations are faster, and I'm able to much more quickly assess whether a change is feasible or likely to lead to a mess - staying out of one of the classic traps of software making, the dreaded three-point ticket that turns into a thirteen.

Note that I'm not using AI much for little shortcuts. Creating a commit message and pushing the code takes me, what, 15 seconds? For now I prefer to do it myself. I'm using it for tasks that might take me minutes, but often would take me hours, and require determination and continuous focus over that period of time.

What happens when I don't use a loop like this, when I just fire into a new codebase and prompt Claude or a model in Cursor to go do something? Sometimes it gets it. Other times I end up in one or more of the traps I mentioned above under "Does using AI ever slow me down". This week a naive prompt in a part of the codebase I know poorly took down a component in a dev env. And then because I didn't have permissions to restart the component, I had to shoulder tap a teammate who does. My poor usage of AI slowed us both down.

I facepalmed humbly and went back to working on my loop.


4

So that's a workflow that works in at least one mature codebase. What about all these breathless claims we read regularly on socials about one-shotting someone's version of some app or another?

My typical first response to a breathless post is to click the bio. If it says founder of an AI company that's a wrapper around the big provider APIs, well, there you go: someone else's changemaking framework at work.

The rest? Well, it's often the first 80/20 of a perhaps-demoable web or mobile app. And that's always been easy to stand up. Yes, it's easier now. But we shouldn't mistake a lightweight, happy-path-only web app for full stack+data software systems that are observable, secure, resilient, testable, accessible and extensible, with great DevX, the right abstractions and the right consistency / performance tradeoffs. That's always been where the difficulty of software engineering lies.

And those aspects of software that make for durable systems and delightful experiences sit downstream in importance from product direction, principles, taste, market fit. And even those product aspects sit downstream of distribution, founder/market fit, right place / right time.

It's funny even having to remind people of that. I see excitement over demos of one or two aspects of a broad software surface, and yes I know the excitement comes from projecting what else might become possible if the trajectory holds. But that doesn't mean that making high quality, high uptime productionized software systems no longer requires specific knowledge.


5

I can't leave it there without taking a stab at which skills I see getting more valuable in this new world.

The work of software development has most often been like a game of chess: a series of puzzles that, when solved in sequence, results in a winning system. As a chess player, you have ideas and themes in mind, but you'll also pivot if you discover another opportunity or your opponent makes a blunder. LLMs writing software don't work that way (at least not today). They play the game best when you start with what you want. Often the best way to define what you want seems to be Q&A with a model.
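
That Q&A-to-requirements idea can be sketched as a loop: the model interviews you, then summarizes. Everything here is hypothetical - `ask_model` is a canned stub standing in for whatever chat API you use, and the function names are my own:

```python
# Sketch of "define what you want via Q&A with a model".
# ask_model is a canned stub standing in for a real chat API call.
def ask_model(prompt: str) -> str:
    """Stand-in for a real chat-completion call."""
    if "Summarize" in prompt:
        return "Requirements: ..."
    return "Who is the primary user of this feature?"


def qa_requirements(goal: str, answer_fn, rounds: int = 3) -> str:
    """Alternate model questions with human answers, then summarize."""
    transcript = [f"Goal: {goal}"]
    for _ in range(rounds):
        question = ask_model(
            "\n".join(transcript) + "\nAsk one clarifying question."
        )
        transcript.append(f"Q: {question}")
        transcript.append(f"A: {answer_fn(question)}")  # the human answers
    return ask_model("\n".join(transcript) + "\nSummarize the requirements.")
```

The shape is the point: the human supplies answers, not prose, and the accumulated transcript becomes the context from which the model drafts the requirements you then edit.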

Pulling up a layer of abstraction has always been about this kind of trade. Lower-level programmers have always been critical of how those working at the higher levels are ignorant of their memory and compute usage.

One good way to think about how writing code might change in the back half of the 20s is to look into how the tools themselves are being made. I suggest looking over OpenClaw's architecture, either via a summary like this or by reading over the codebase with your LLM of choice. I haven't spun up my own lobster yet - this is not the kind of thing you're likely to see allowed in a productionized environment yet - but I've learned a lot just from how the workspace config files are laid out and how dynamic context is managed.

What kinds of skills will be more valuable as syntax production gets cheaper and faster? This seems a likely place to start:

  • knowing what you want
  • decomposing problems
  • creating good context
  • recognizing bad output
  • verifying rigorously
  • exercising taste
  • owning outcomes
  • designing loops and guardrails
  • laying out workspaces for ergonomics and efficiency


6

So, despite being annoyed by the tone of the hype, I'm an eager adopter of AI. At my core I'm a builder, a software maker. It's a thing in the world I've felt drawn to since I was a small boy. In college, I studied CS, not business. So this idea that I can make complex software systems working solo or in a small team is very exciting.

A lot of technologists reacted with distaste to the code academy / bootcamp boom of the 2010s. I embraced it. Some of my favorite peers got into the industry via the bootcamp route. A few things they have in common: they took the profession seriously, they've continued furthering their educations (including pursuing CS degrees), and they each got a helping hand into the profession via entry level, internship or junior roles.

So now I ask myself this: If you come through a bootcamp, or even if you've just gotten a CS degree, how do you get into this profession anymore with entry level roles being slashed? While it's true that, like mate selection, an individual usually only needs one job, there's something to be said for abundance states and what that does for the psyche of the searcher. And, with the end of the ZIRP era, we made a hard pivot from wild abundance in 2021ish to a scary scarcity of entry level / pre-senior roles here in 2026.

The CS background, never a hard requirement, does represent a depth of knowledge that bootcamps don't and won't approach. Shallow breadth was valued simply because the supply of even basic competence in web engineering - the ability to glue this to that - lagged the demand, demand fueled by near-zero-interest-rate investment philosophies.

It seems like LLMs are now at the point where their output can fill much of the remaining demand. The ability to stitch together a few React components - absent any deep knowledge of systems design, how to balance consistency with availability, how to achieve quality in all its dimensions - will no longer be sufficiently rare as to warrant huge offers to mid-level web software engineers. The wave, having crested in the final throes of ZIRP, will begin to recede. And that is causing, and will continue to cause, emotional reactions.

I have an emotional reaction: I get sad when I see folks advising people not to go to a bootcamp, or even not to study Computer Science, in 2026. Not least because people told me when I was in high school in the 90s that "all the software developer jobs were going away" - overseas, according to that narrative. Also because, WTF: in a university system that produces thousands of art history majors a year, why would there not be value in studying Comp Sci? Degree programs have never focused on applied knowledge. You are learning how to learn and think; gaining deep subject knowledge is a second-order effect. Acquiring applied programming skills mostly happens at the workbench.

So it is today with AI.

Yes, the role may look much different by the time you get there.

But I think the wise move is to trust that this hiring slowdown for juniors is either temporary, or that there will be new paths for the bolder and brighter to follow. Just as I ultimately trusted that "all the software developer jobs were going away" would be washed away - as it was, by the rise of something nobody expected: the dot com boom. And how much of it is part of the hype, anyway? After all, as someone on Reddit pointed out, even Spotify, with its latest breathless claim this winter of its best software engineers writing no code, is still hiring software engineers.


7

So that's where I'm at, as of March 2026: Between the hype and the fear, I believe there's huge and hugely enjoyable leverage to be found in using agentic tools in every aspect of the SDLC to make software. But they can slow you down or lead you down bad paths if used poorly.

Definitive conclusions seem ridiculous when each day lately brings fresh capabilities or ideas about them. So I'm going to let this fan out in a few directions. Will any or all of them seem silly in a few years, or even a few months? Seems likely. Which? Seems really hard to say.

  1. Something I mentioned above: I don't think using an LLM to write text is the right move. AI writing currently has a few tells that, as soon as I encounter them on socials or in work Slack, make my brain shut off. I deeply believe in writing to think, and if you skip the writing of the words, you're skipping the most important part. Maybe I'm like the teacher telling the students to write out their essays by hand: I'll admit I'm open to being wrong here. But for now, every word you read on here or - if you're my coworker reading this - on Slack or in Confluence, I wrote myself.
  2. Turtles all the way down: Something else I mentioned above: The best way to learn how to use AI better might be to ask AI. And the better you are at using AI, the better you are at asking AI how to use AI. So there's a huge advantage to putting in 60, 30, even 10 minutes a day getting better, and letting that compound.
  3. Any task where a primary mode of failure is that the human who's doing it might give up halfway due to tedium or an 80/20 assumption or just natural laziness? Top candidate for using AI.
  4. To get better results from tasking up an LLM, let the LLM task you up. A well-written markdown doc helps the agent understand from your first principles what you really want. And a Q&A session is a great way to get there.
  5. Dunning-Kruger and Parkinson's Law will be amplified in the world of LLMs. If there's anything I know about human nature, it's that humans hate thinking any more than we have to. Given a task that is expected to take an hour, and then accomplishing that task / getting to an acceptable answer in 5 minutes with Claude or ChatGPT, there is a very low probability that the next move will be to use the remaining 55 minutes (a) to go 11x deeper on the subject or (b) to accomplish 11 more tasks. It is our nature to use the 55 minutes in relaxation, or writing another breathless post on something we know a little about but have a lot of articulate words to say about - thanks to AI. Only those who have gone deep into the hurt locker of programming themselves - or who have the 'gift' of deep anxiety, or an actual forcing function like their company being about to go bankrupt - will be able to power through their biology to get to either depth or breadth in a problem space.
  6. This might be nothing, or it might be my most profound idea about using AI: Using AI resembles petitionary prayer, where the most valuable result is the forcing function for the user to decide what it is they actually want, and whether or not it's delivered to them might be best considered a second order effect.
  7. Our deep desire to not think any harder than we have to will remain a primary limiting factor in the application of agentic tooling.
  8. The jerkiness and time compression are real right now. I started musing on this piece in January. And then 4.6 came out, and I had to rewrite it to reflect a new point of view. If this continues, this piece and the points of view it contains will seem archaic in 6-9 months.
  9. All of this assumes LLMs will remain a tool, and that AGI including full agency is not coming in this wave. If I'm wrong about that, then little of this matters anyway, because everything changes. 
posted in Artificial Intelligence