An increasingly regular question at WitFoo is one that I suspect every team building with AI is asking themselves right now: "Could AI do part or all of this project?" It sounds simple. It is not.
In my earlier post on OODA loops and feedback cycles in AI development, I laid out Anthropic's 4D framework for AI Fluency: Delegation, Description, Discernment and Diligence. That framework has been remarkably useful for answering the capability side of the question. We can determine what's possible to delegate and select the right models for the right work. We can learn how to describe the problem to the models that will be doing portions of the work. We can discern whether the capabilities actually exist for a given task. And we can apply diligence in the review of what comes back. The 4D's answer a lot of questions.
But they don't answer all of them.
The Missing Variable
What the 4D's don't directly address is cost and time. You can determine that a task can be delegated to AI. You can describe the problem beautifully. You can discern that the model is capable. You can apply all the diligence in the world. And the whole thing can still be the wrong decision if the economics don't work.
This is where tokens become interesting as more than just a billing unit.
I know how much we pay for a million tokens. That's a fixed, knowable number. Which means that if I can estimate how many tokens a piece of work will consume, I can immediately translate a project into a hard cost. We've been doing this at WitFoo for a while now (I wrote about the shifting economics in The Closing Window). But what's new is extending that same token framework beyond AI and into the human side of the equation.
Human Tokens
Here's the trick that changed how we plan projects. If we create a ten-phase project, we can assign each phase "human tokens" and "AI tokens" to represent time and effort. As a baseline, I can bind one million tokens to one clock hour for a given model. I can also look at different human experts, their capacity per hour and their labor rates.
On each sub-task, we ask both sides the same question: "What is the estimated amount of time and cost for this task?" The AI answers in tokens. The human answers in labor hours at a rate.
If the AI comes back with 10 million tokens and the human comes back with 100 labor hours at $200 per hour ($20,000), then for this specific work, 10 million tokens equals $20,000 of human value. If (and it's a big if) the quality is similar.
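The equivalence above is simple enough to sketch as arithmetic. This uses the figures from the example; the variable names are just illustrative:

```python
# Token-to-dollar equivalence for a single task, using the figures above.

AI_TOKENS = 10_000_000   # AI's token estimate for the task
HUMAN_HOURS = 100        # human expert's estimate
HUMAN_RATE = 200         # dollars per labor hour

human_cost = HUMAN_HOURS * HUMAN_RATE                         # $20,000
dollars_per_million_tokens = human_cost / (AI_TOKENS / 1_000_000)

print(f"Human cost: ${human_cost:,}")
print(f"Implied human value per 1M tokens: ${dollars_per_million_tokens:,.0f}")
```

For this task, every million tokens the AI consumes displaces $2,000 of human labor, which is the number you then weigh against what you actually pay per million tokens.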
That "if" is where the whole thing gets interesting.
The Quality Variable
In some cases, the quality of AI output will be better than most human experts at a fraction of the time. Coding is the obvious example. I've written extensively about building WitFoo's analytics platform with Claude and migrating legacy code into AI-assisted workflows. For many development tasks, the AI produces cleaner, more consistent, better-documented code than a typical engineer would. Not always. But often enough that the economics are overwhelming.
In other cases (producing a movie, for instance), AI will be much quicker at a much lower quality. Speed without quality isn't value. It's just fast garbage.
The power of the token framework is that it lets you break the phases into smaller and smaller sub-tasks until you find the right balance of cost, speed and quality for each one. You're not making a single binary decision ("AI or human?"). You're making dozens of granular decisions across a project, optimizing each one independently.
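That per-sub-task decision can be sketched as a small allocation pass: for each sub-task, take the cheaper of the two resources among those that clear a quality bar. The quality scores, token price, and task names here are illustrative assumptions, not WitFoo data:

```python
# Per-sub-task allocation: pick the cheapest resource that clears the
# quality bar. All scores and prices below are illustrative assumptions.

TOKEN_PRICE = 10.0  # assumed dollars per million tokens

def allocate(subtasks, quality_bar=0.8):
    """Return {task_name: (resource, cost)}, choosing the cheaper option
    among those meeting the quality bar; flag tasks neither can clear."""
    plan = {}
    for t in subtasks:
        options = []
        if t["ai_quality"] >= quality_bar:
            options.append(("AI", t["ai_mtokens"] * TOKEN_PRICE))
        if t["human_quality"] >= quality_bar:
            options.append(("human", t["human_hours"] * t["human_rate"]))
        if options:
            plan[t["name"]] = min(options, key=lambda o: o[1])
        else:
            plan[t["name"]] = ("split further", None)  # break it down again
    return plan

tasks = [
    {"name": "draft summary", "ai_mtokens": 1, "ai_quality": 0.9,
     "human_hours": 4, "human_rate": 200, "human_quality": 0.9},
    {"name": "stakeholder review", "ai_mtokens": 50, "ai_quality": 0.4,
     "human_hours": 2, "human_rate": 200, "human_quality": 0.95},
]
print(allocate(tasks))
```

The "split further" branch is the important one: when neither resource clears the bar at an acceptable cost, the answer is usually to decompose the task again rather than force a bad assignment.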
A Real Example
A recent project at WitFoo rolled out like this with our current AI tooling:
- WitFoo Employee (2 hours): Write the rough draft of a voice script for a video
- AI (1M tokens / ~$10): Research tone, accuracy and flow; suggest edits to the voice script
- Contracted Voice Actor ($300): Record the voiceover
- WitFoo Employee (1 hour) + AI (1M tokens / ~$10): Research WitFoo documentation, brand guide and screenplay details to produce a highly detailed video brief to accompany the voiceover
- Contracted Overseas Video Production with internal AI tools ($1,000): Create the video from the highly detailed instructions
A traditional pure-human production bill for this kind of work would run $10,000 to $40,000 and take roughly three times as long. Finding the right places to invest in human quality, and the right places to use AI tools, gave us a higher-quality product at a fraction of the time and cost.
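The bill of materials above rolls up roughly like this. The contractor and AI figures are the ones listed; the internal labor rate is an assumption for illustration:

```python
# Rough cost roll-up of the video project above.
# INTERNAL_RATE is an assumed internal labor rate; other figures are as listed.

INTERNAL_RATE = 100  # dollars/hour (assumption)

line_items = {
    "employee draft (2 h)":        2 * INTERNAL_RATE,
    "AI script review (1M tok)":   10,
    "voice actor":                 300,
    "employee brief (1 h)":        1 * INTERNAL_RATE,
    "AI brief research (1M tok)":  10,
    "contracted video production": 1_000,
}

total = sum(line_items.values())
print(f"Hybrid total: ${total:,}")   # vs. a $10,000-$40,000 pure-human bill
print(f"Savings vs. the $10,000 floor: {1 - total / 10_000:.0%}")
```

Even against the low end of the traditional quote, the hybrid plan comes in at a small fraction of the cost, and the AI line items are rounding error next to the contracted work.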
That's the new math. Not "replace humans with AI" or "ignore AI and do it the old way." It's a surgical allocation of each sub-task to the resource (human or AI) that delivers the best intersection of cost, speed and quality.
Three Classes of Problems
In working through dozens of projects this way, I've noticed that AI tasks tend to fall into three classes:
Class 1: The No-Brainer. One million tokens or less to get a very high quality deliverable. Research, summarization, first-draft editing, code generation for well-defined tasks, documentation. These should almost always be delegated to AI. The cost is trivial. The quality is high. The speed is extraordinary. If you're paying a human to do Class 1 work, you're burning money.
Class 2: The Collaboration Zone. Hundreds of millions of tokens with a significant risk of a deficient deliverable. This is where things get nuanced. The work can be done by AI, but only if experts with deeper domain knowledge apply the 4D's in close collaboration with the model. Our video production example above is a Class 2 problem. The contracted production firm brought AI tools and human expertise together. Neither alone would have delivered the result.
Class 3: The Other No-Brainer. AI can't reliably do it at all. Creative judgment, complex stakeholder management, novel strategic decisions, anything requiring physical presence. This is human work. Trying to force it through AI is a waste of tokens and a guarantee of a bad outcome.
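One way to operationalize the three classes is a simple triage function over two estimates: token cost and the risk of a deficient deliverable. The thresholds here are illustrative assumptions, not fixed rules:

```python
# Triage a task into the three classes described above.
# The thresholds (1M tokens, 0.2 and 0.8 risk) are illustrative assumptions.

def classify(est_mtokens, deliverable_risk):
    """est_mtokens: estimated millions of tokens to complete the task.
    deliverable_risk: 0.0 (AI reliably delivers) to 1.0 (AI can't do it)."""
    if deliverable_risk >= 0.8:
        return 3  # human-only work
    if est_mtokens <= 1 and deliverable_risk <= 0.2:
        return 1  # delegate to AI
    return 2      # collaboration zone: retrain, retool, or outsource

print(classify(0.5, 0.1))   # e.g. research or summarization
print(classify(300, 0.5))   # e.g. the video project
print(classify(10, 0.9))    # e.g. a novel strategic decision
```

The point of writing it down this way is that Class 2 is the default: a task only escapes the collaboration zone when both the cost and the risk estimates clearly place it at one of the extremes.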
The Sorting Problem
In building a project, the new trick is getting Class 1 and Class 3 tasks grouped together so AI or humans can do that work cleanly and efficiently. Those are the easy decisions. Everyone knows what to do.
The real challenge is Class 2. When a task sits in the collaboration zone, you have three options: retrain (invest in teaching people how to work with AI on this specific type of task), retool (find or build better AI tooling for this category), or outsource to a firm that has already solved the human-AI collaboration for this domain (like we did with video production). Short of one of those three, the task has to be treated as Class 3 work: human only, slower, more expensive.
The comparison of capital, time and quality across AI and human labor is a powerful set of lenses for producing better products faster. It doesn't eliminate hard decisions. It does make them visible and quantifiable in a way that "should we use AI for this?" never could.
A Word About the Window
At the current costs of AI generation (especially text LLMs), there are many sub-tasks that should certainly be delegated to AI. Class 1 work at today's token prices is practically free. But as I wrote in The Closing Window, those economics are shifting. The value-to-cost ratio that makes Class 1 work a no-brainer at current prices might look very different in 12 months. Some of today's Class 1 tasks could become Class 2 (or even economically unjustifiable) as token prices rise.
Which means the framework isn't just useful for planning individual projects. It's useful for understanding where to invest right now while the economics are favorable, and where to build contingencies for when they aren't.
Wrap Up
The token-as-unit-of-effort concept started as a back-of-the-napkin shortcut for estimating AI costs. It's become something more than that. By applying the same metric to human labor (what is the token-equivalent value of a human hour on this specific task?), we've built a planning language that lets us compare apples to apples across AI and human work. It's not perfect. Quality is subjective, estimates are estimates, and the 4D's still have to be applied rigorously to avoid garbage-in-garbage-out at scale. But it's a better framework than "let's just try AI and see what happens," which is what I see most teams doing.
The new math isn't complicated. Break the work down. Classify each task. Assign the right resource. Compare cost, speed and quality. Adjust and repeat. The hard part isn't the math. The hard part is being honest about which class each task belongs in, and resisting the temptation to force a Class 2 problem into a Class 1 box because you want it to be cheaper than it is.
Build honestly. The tokens don't lie. (Even if they're getting more expensive.)