AI Project
I programmed a project from start to finish using AI and it didn't save me any time.
Not because I'm some anti-AI Luddite, and not because I dislike AI, but because it didn't spare me any of the critical steps involved in testing an app and making sure it met business criteria.
I'll explain in this post.
However, what I will say is it vastly increased my ability to effectively communicate what needed to be done to get the project to completion.
Now, this is probably the leveling-out of the "J" curve for me: I've seen the worst of it and am starting to see better results as I continue to build things out and rework my workflows.
Anyone who has ever prompted their way to finished work knows the dopamine hit you get. You put some instructions into the textbox and watch it go. It comes back with results far faster than you could have typed by hand. However, once you step into the details and try to make sense of what was just crafted, you realize:
- It's not at all correct except for a few places where it could have been easily assumed.
- There's nuance in many places you didn't think about and didn't instruct.
- It made assumptions that turned out to be incorrect.
- There are details missing, or expectations/preferences unmet, that you would have handled differently.
So you spend the time correcting the work, making tweaks, and bringing it up to the standard you would have delivered in the first place.
Has anyone ever timed themselves doing a task, then done a side-by-side comparison: taking a similar task and completing it with a language model as an extra hand?
Does anyone have objective results that quantify the work involved?
I'll take coding and programming as an example because it's what I know best.
I can usually spec something out to 4 hours, give or take 10 minutes, pretty damn accurately.
That's not 4 hours in the day, that's 4 concrete focus hours on the goal with zero distractions and 100% of my attention on the task at hand; pushing all else to the side for the work that needs to get done.
Sometimes, that work spans 6 hours, sometimes it spans 2 days.
Either way, I can tell when something is going to run over because of unexpected nuance or questions that had to be answered, and I measure the time I spend on everything. That's how I know I'm so accurate.
I have a time study that I sit next to.
What is a Time Study? you might ask...
Alex Hormozi breaks it down in less than five minutes, but I'll give a 1-liner:
It's a notebook I use to track my time. Every line contains a snippet about what I worked on in that moment.
In this way, I can track my day down to 15-minute accuracy and know exactly how much time I spent on each thing. This is also why I'm so accurate.
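To make the mechanics concrete, here's a minimal sketch of how those notebook lines tally up. The entry format and task names are my own assumptions, not the actual notebook layout; the point is just that each line is a 15-minute block you can sum per task later.

```python
from collections import defaultdict

# Hypothetical time-study entries: each tuple is one 15-minute block
# tagged with what was being worked on at that moment.
entries = [
    ("09:00", "spec review"),
    ("09:15", "spec review"),
    ("09:30", "coding: auth flow"),
    ("09:45", "coding: auth flow"),
    ("10:00", "coding: auth flow"),
    ("10:15", "slack interruption"),
]

def tally_minutes(entries, block_minutes=15):
    """Sum minutes per task, one fixed-size block per entry."""
    totals = defaultdict(int)
    for _, task in entries:
        totals[task] += block_minutes
    return dict(totals)

print(tally_minutes(entries))
```

Because every line represents one block, the totals resolve exactly to that 15-minute granularity, which is what makes the estimates auditable after the fact.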
I have spent enough time with these tools to know that AI and LLMs do not actually save time. However, they are really great at convincing you that they do.
What they do accomplish is clarity of getting the task done. The bottleneck is no longer typing the words out, it's clarifying the end goal. It's communicating the requirements in a way that can be translated to algebraic English for the computer to compile to bytecode and execute.
For this task, I let my work shift to see what it would be like driving an agent to completion, and I have to say: even though the agent made fewer mistakes than I've seen Cursor and ChatGPT|Codex make in the past, I'm still not impressed.
It's definitely better. My experience is less frustration, less complaining, less whingeing about extra whitespace, unused variables, invalid syntax, and deprecated code. However, there are also a lot of assumptions being made that are often incorrect, and the model didn't come back to me for guidance or clarity; it just made the best decision it could based on what it saw (as most humans do, mind you).
I still spent the time I expected to spend validating that the code worked as intended. I still spent a great deal of time correcting the assumptions that were made. Instead of bitching at the agent about what it did wrong, I just issued corrections as if it were a fresh slate. It did well enough to avoid the bombardment of ridicule it got in the past.
The fact remains: It still got it wrong 50-75% of the time.
I will admit -- I did spend a good several hours building a master prompt that would define and describe EVERYTHING it needed to do from start to finish. I did my best to spell out every nuance and every detail, and I even told it where to find the screenshots!
Alas... my experiments revealed that even with all that context, it still couldn't get it completely right in the first shot. I cannot blame it though -- neither I nor the tool had the full instructions in our context window. So I forgive you, past me, dear Claude Code. I forgive you.
A lot of the work is done in discovery and understanding. There's something about novel problems that result in a discovery of sorts. We cannot assume we have all the details immediately.
Describing the problem, the solution, and the definition of done is difficult, in general.
Doing it all in one shot seems nearly impossible right now, although I'm doing my best to work toward it. @Nate B Jones talks about 4 core skills that build on each other to excel beyond just prompt engineering.
I'll quickly touch on the 4 key skills here:
- Prompt Engineering: This is no longer a special skill; it's now just table stakes. It's simply expected. You should know how to craft a decent prompt with clear expectations, goals, examples, and counter-examples for a given task or goal.
- Context Engineering: This is not just crafting the perfect prompt, but understanding what information will drive a model to behave a certain way or produce certain output.
- Intent Engineering: Being able to describe what you want from the model instead of just dictating what it needs to do. This reduces ambiguity and improves the assumptions the model makes, keeping it from going off the rails on long-running tasks.
- Specification Engineering: Being able to write out the specification and details that guide a long-running model completion. This includes nuance and assumptions spelled out to a degree that reduces the chance of ambiguous output or interpretation.
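The first skill above, a prompt with clear expectations, goals, examples, and counter-examples, can be sketched as a simple template. The section names and the sample content here are my own invention for illustration, not a standard or anyone's recommended format.

```python
def build_prompt(goal, expectations, examples, counter_examples):
    """Assemble a prompt from the four elements named in the list above:
    a goal, expectations, examples, and counter-examples."""
    sections = [
        f"Goal:\n{goal}",
        "Expectations:\n" + "\n".join(f"- {e}" for e in expectations),
        "Examples of good output:\n" + "\n".join(f"- {e}" for e in examples),
        "Counter-examples (do NOT do this):\n"
        + "\n".join(f"- {e}" for e in counter_examples),
    ]
    return "\n\n".join(sections)

prompt = build_prompt(
    goal="Summarize the attached changelog in three bullet points.",
    expectations=["Plain English", "No marketing language"],
    examples=["Fixed a crash when the config file is missing"],
    counter_examples=["Exciting new improvements you'll love!"],
)
print(prompt)
```

Nothing about the template is magic; the value is that every element the model might otherwise have to assume is stated up front, which is the whole thrust of the skills that follow it.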
Even if I had a tool that could get it done perfectly in the first shot (which, let's be honest, at this point it probably can with the right crafting of instructions), I'm not sure the work would get done that much quicker. There's still judgement and taste that can only be applied in the moment: observing, measuring, and just seeing it work and operate in the environment.
The takeaway I want you to gather from this is that your job is safe for the time being, because we still have the opportunity to learn these tools as much as they learn us, and our workflows will have to change over time to accommodate the upgrade.
Cheers!