Letting o1 Cook
An experiment: could OpenAI's o1 model recreate the '80s Nintendo classic Duck Hunt using JavaScript, CSS, and an HTML canvas element, guided only by my prompts and with no other help from me (I wasn't allowed to edit the code)?
As you can see above, while the result is missing quite a few features and animations, o1 and I got to working code that runs the core of the game in about 30 minutes of wall time, not counting the additional 30 minutes I spent capturing the deltas as git commits and making notes.
Not bad for an hour's work. I estimate it would have taken me four hours without o1, with most of that time spent understanding the canvas API, which is not an area I have much experience in.
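For context, the API surface I was dreading mostly boils down to calls like these (a minimal sketch, assuming a `<canvas id="game" width="640" height="480">` element on the page; this is not o1's actual code):

```javascript
// Grab the canvas and its 2D drawing context.
const canvas = document.getElementById('game');
const ctx = canvas.getContext('2d');

// Sky and grass backdrop.
ctx.fillStyle = '#87ceeb';
ctx.fillRect(0, 0, canvas.width, canvas.height);
ctx.fillStyle = '#3cb043';
ctx.fillRect(0, canvas.height - 60, canvas.width, 60);

// A stand-in "duck": a filled rectangle at an arbitrary position.
ctx.fillStyle = '#5b3a29';
ctx.fillRect(200, 150, 40, 30);
```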
The game above is the result of five prompts, with a correction made after the last prompt. You can see the code that o1 produced, along with my prompts, here.
When I tried to give it a sixth prompt, o1 started making a series of mistakes that I can only attribute to it reaching the limit of the complexity it can juggle.
The mistakes started innocently enough: ~200 lines of code hand-waved away with the comment "the rest of the code omitted for brevity".
From that point on, however, I wasn't able to get o1 to reconcile its mistakes and get the game back to where it had been. Each attempt to add another feature, starting with a reloading sequence, broke several others.
Rather than take over the coding, I let it stand: forever in o1's version of Duck Hunt, you will be able to machine-gun ducks into submission.
Other thoughts:
o1 is likely better suited, for now, to writing classes or functions with defined inputs and outputs than to writing an entire program. This is how I use Cursor these days: writing specs and letting it fill in an initial implementation.
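For illustration, here's the kind of spec I mean; spawnDuck is a hypothetical helper, not something from o1's code. I write the contract, the model writes the body:

```javascript
/**
 * Spawn a duck just above the grass line, flying toward the top of the canvas.
 * @param {number} canvasWidth - canvas width in pixels
 * @param {number} grassY - y coordinate of the grass line in pixels
 * @param {number} speed - flight speed in pixels per frame
 * @returns {{x: number, y: number, vx: number, vy: number}} initial duck state
 */
function spawnDuck(canvasWidth, grassY, speed) {
  // Body intentionally left as a stub: this is where the model's
  // initial implementation goes.
}
```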
One obvious exception to that approach is an API you don't know. If, like me, you learn best by experimenting with a real-world example, it might be best to let a model write a program that fully exercises the API, giving you a working example you can tinker with.
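Something like this throwaway sketch is what I have in mind: it exercises the drawing context, the animation loop, and click handling in one place (again assuming a `<canvas id="game">` element; none of this is o1's code):

```javascript
const canvas = document.getElementById('game');
const ctx = canvas.getContext('2d');
const duck = { x: 0, y: 150, width: 40, height: 30, speed: 3 };

// Hit-test clicks against the duck's bounding box.
canvas.addEventListener('click', (event) => {
  const rect = canvas.getBoundingClientRect();
  const x = event.clientX - rect.left;
  const y = event.clientY - rect.top;
  const hit = x >= duck.x && x <= duck.x + duck.width &&
              y >= duck.y && y <= duck.y + duck.height;
  console.log(hit ? 'hit' : 'miss');
});

// Move the duck and redraw it every frame.
function frame() {
  duck.x = (duck.x + duck.speed) % canvas.width; // wrap at the right edge
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = '#5b3a29';
  ctx.fillRect(duck.x, duck.y, duck.width, duck.height);
  requestAnimationFrame(frame);
}
requestAnimationFrame(frame);
```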
While the game works, several of the architectural decisions make me question its extensibility and maintainability. Specifically, placing the checkCollisions() function and all its logic in the Game class seems likely to cause Game to fall into the God Class anti-pattern (https://en.wikipedia.org/wiki/God_object) as more and more physical interactions are added to the system.
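To make that concern concrete, here is a rough sketch (not o1's code) of how collision checks could live in their own small system instead of inside Game; the entity shape (x, y, width, height) is an assumption:

```javascript
class CollisionSystem {
  // Axis-aligned bounding-box overlap test.
  intersects(a, b) {
    return a.x < b.x + b.width &&
           a.x + a.width > b.x &&
           a.y < b.y + b.height &&
           a.y + a.height > b.y;
  }

  // Return every colliding pair so Game (or anything else) can react,
  // without Game owning the geometry itself.
  checkCollisions(entities) {
    const hits = [];
    for (let i = 0; i < entities.length; i++) {
      for (let j = i + 1; j < entities.length; j++) {
        if (this.intersects(entities[i], entities[j])) {
          hits.push([entities[i], entities[j]]);
        }
      }
    }
    return hits;
  }
}
```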
I fired up Cursor after the session and asked Claude 3.5 Sonnet to critique the code with regard to maintainability and extensibility. Here are the highlights:
Class Dependencies
- Tight coupling between classes
Collision Detection
- Collision logic is tightly coupled with game logic
- No separate physics system
- Hard to extend for different types of collisions
Code Organization
- Mixed concerns (game logic, rendering, input handling)
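For what it's worth, the separation Sonnet is hinting at might look roughly like this sketch, with input, simulation, and rendering behind their own seams and the loop only orchestrating; none of these class names come from o1's code:

```javascript
class InputHandler {
  constructor(canvas) {
    this.clicks = [];
    // Translate raw DOM events into game intents.
    canvas.addEventListener('click', (e) =>
      this.clicks.push({ x: e.offsetX, y: e.offsetY }));
  }
  poll() { return this.clicks.splice(0); } // drain queued clicks
}

class World {
  constructor() { this.ducks = []; }
  update(intents, timestamp) { /* move ducks, apply hits from intents */ }
}

class Renderer {
  constructor(ctx) { this.ctx = ctx; }
  draw(world) { /* clear the canvas and draw world.ducks */ }
}

class GameLoop {
  constructor(input, world, renderer) {
    this.input = input;
    this.world = world;
    this.renderer = renderer;
  }
  tick(timestamp) {
    this.world.update(this.input.poll(), timestamp);
    this.renderer.draw(this.world);
    requestAnimationFrame((t) => this.tick(t));
  }
}
```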
There were other critiques, but most were either things I didn't ask o1 to consider, like error handling, or relatively easy to fix later, like magic numbers and string constants.
Even with AI doing all the coding, it's still a tool: one that will likely increase the surface area an individual engineer can work on without eliminating the need for software engineering.
My contrarian guess is that AI coding tools will create more work for software engineers; in an instance of Jevons paradox, more low-quality code will be written because these tools lower the barrier to entry, letting people who can't recognize high-quality code produce far more of it.
It seems possible that, with enough competent models, day-to-day software engineering will look like defining an interface and requirements, asking 3-10 different models for implementations, then reviewing them, weighing them against each other, and choosing the best. Because a lazy developer is a good developer, most will stay at the low end of that range, and one way to stand out from your peers will be to test rigorously with more models.
Last thought: That was fun, can't wait to try the same thing using o3.