Claude Code vs OpenAI Codex vs Cursor
I pitted these three against each other to see which would yield the best result, and this blog post documents that effort. Let's take a look at the prompts, how the tests were run, and the takeaways that came out of it.
Before I got too far into developing the server component, releasing a version of this into the wild, and trying to get folks onboarded into the ecosystem I'm building, I wanted to highlight some of the progress being made on the new FinTech website!
To do this, I wanted to showcase the different experiences I had using OpenAI's Codex, Anthropic's Claude Code and Cursor.
I created a repo and set up the scaffolding for an Angular project I'm building: the front end to my new and upgraded FinTech website. Then I created an AGENTS.md at the root of the project and in subdirectories where it made sense, and leveraged those files to add context at each level of the directory tree: generic Angular/TypeScript guidance near the root of the repo, getting more specific as you drill down into the details.
You can see the nitty-gritty details on GitHub.
(NOTE: This is just the scaffolding + AGENTS.md files. The real code is in a private repo to keep TikTok clones from replicating my site under typosquatting domains.)
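To give a sense of what that looks like, here's a rough sketch of the layout. The AGENTS.md placement mirrors the idea described above, but the folder and file names around it are purely illustrative:

```
fintech-frontend/              # illustrative project name
├── AGENTS.md                  # generic Angular/TypeScript conventions, build/test commands
├── src/
│   ├── app/
│   │   ├── AGENTS.md          # app-wide routing, state, and component conventions
│   │   ├── features/
│   │   │   └── AGENTS.md      # feature-specific rules (naming, lazy loading, etc.)
│   │   └── shared/
│   │       └── AGENTS.md      # rules for shared components, pipes, and services
│   └── assets/
└── package.json
```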
Let's get into the results. Starting with the worst: OpenAI Codex.
OpenAI's Codex
I gave it the repo, told it the instructions and where to find the rest of them, and let it go. It took a while mulling through all the files and setting things up.
When it was done, these were the results:
Now, granted, it got no feedback from me about anything. It was the farthest removed: it just took my instructions at face value and did its best to implement the solution as written. Also, keep in mind, I didn't have the cal.com integration set up properly, hence the 404 "this page does not exist" errors you're seeing in the images above.
I was by no means expecting it to produce a five-star website, but this is barebones minimalist and just crying out for a bit of color!
I might also add: this was part of my $20/mo subscription. I didn't pay anything extra for OpenAI to implement this.
Anthropic's Claude Code
This was a little more interesting ... I gave it the same prompts, links, and everything, and let it run without obstruction on my part. I whitelisted the npm (test|run build) commands so it could test and build to its heart's content without me approving something every 3 seconds, and I also let it auto-apply changes instead of waiting on my approval.
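For anyone wanting to do the same, the whitelist lives in Claude Code's permission settings. Here's a minimal sketch of a project-level .claude/settings.json using the permissions.allow format; the rule patterns are an approximation of what the whitelist above describes, not a verbatim copy of my config:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm test)",
      "Bash(npm run build)"
    ]
  }
}
```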
It took several minutes to produce at least this much.
It was a little better. At least it used some of the available photos and more of the color palette.
I will say: Codex vs. Claude Code is probably the best apples-to-apples test of the three because of the way those two runs were constructed. I basically gave each of them a really elaborate prompt in a set of files and expected them to understand and build everything I wanted from that. They got no feedback from me from the initial instruction to the final line of code written.
Cursor
I spent the most time with Cursor, and boy, did I bash it across the head!
(And we wonder why my boyfriend calls me "Abuelita" 😂️)
Many, many, many times, I had to tell it to avoid deprecated code, avoid unused imports, stop inventing new ways to run tests (I swear, it would create 3-4 different ways to run the same command on every iteration over the tests), and a swath of other things I found it repeatedly doing that annoyed me. Sometimes it got so bad that I just wrote the code myself.
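If you hit the same loop, a project rules file is one way to stop repeating yourself, since Cursor picks it up automatically on every request. This is a hypothetical sketch capturing the kind of feedback I kept giving, not my actual config (newer Cursor versions use .cursor/rules instead of a root .cursorrules file):

```
# .cursorrules (illustrative)
- Do not use deprecated Angular or TypeScript APIs; prefer the current stable equivalents.
- Remove unused imports before presenting a change.
- Run tests only through the existing npm scripts (npm test); do not invent new test commands.
- Prefer editing existing files over creating parallel copies of the same logic.
```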
However, the results were worth it: once I lost patience, I just got the site put together!
Cursor's results are probably not a fair comparison, given my hands-on role in building the site and giving it feedback, in addition to searching for and creating additional assets for it to use in the website.
Conclusion
Those who say AI will build whatever you want on a whim are lost in the sauce here. You have to work it! Garbage in -- garbage out.
That goes for data.
That goes for prompts.
That goes for whatever you're building.
The more refinement and detail and time you put into it, the better the outcome.
The key is in how you use it. For me, it's helped me stay focused on goals, alignment, and higher-level tasks, instead of getting buried in code that doesn't really serve my higher goals or purpose.
By the way, if you want to explore decentralized federated education, you should follow along, because I'll be releasing Kizano's FinTech once I get the server component done!
See y'all soon!