Why tokens are not a product metric, and how AI credits fix multi model chaos

When people talk about AI usage, they usually end up talking about tokens. That makes sense because tokens are how most providers bill for text.

But tokens are not a human unit.

If you have ever tried to explain an AI bill to a customer, a colleague in finance, or a founder who is not deep in the weeds, you know the feeling. Tokens sound abstract. The conversation quickly turns into questions like: is that a lot, why did this one cost more, and why am I blocked if I still have usage left.

That confusion is not because people are slow. It is because tokens are an engineering unit, and we keep trying to use them as a product metric.

This is a plain English explanation of what tokens are, why costs vary so much, and why AI credits are a simple, scalable way to control usage once your product uses more than one model.

What tokens really mean

Every AI request has two parts.

Input is the text you send to the model. That includes your prompt, instructions, chat history, documents you attach, system messages, and anything your app adds behind the scenes.

Output is the text the model sends back, meaning the answer.

Models do not count words or characters the way humans do. They break text into tokens, which are small chunks of text. Sometimes a token is a full word. Sometimes it is half a word. Sometimes it is punctuation. It depends on the language and the content.

A useful mental shortcut for English is that one token is roughly four characters on average.

It is not exact, and it gets less accurate with other languages, code, emojis, and unusual formatting, but as a back of the napkin estimate it is good enough.

So when someone says a request used 2,000 tokens, you can translate that into roughly a few pages of text moved through the model.
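The shortcut above can be turned into a one-line estimator. This is only the rough heuristic described here, not a real tokenizer, so treat the numbers as ballpark figures:

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: about four characters per token for English prose."""
    return max(1, len(text) // 4)

# 2,000 tokens corresponds to roughly 8,000 characters,
# i.e. a few pages of plain text.
print(estimate_tokens("x" * 8_000))  # -> 2000
```

For anything where accuracy matters, such as billing, use the token counts the provider reports; the heuristic is only good for pre-call estimates.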

Why the bill splits input and output

Here is the first reason costs feel unpredictable. Most AI pricing splits input tokens and output tokens, and those two are priced differently.

It costs one rate for the model to read your input. It costs another rate for it to generate the output.

That is why a short question can still be expensive if it triggers a long answer. Output tokens can dominate.

And a small user prompt can still become costly if your app includes a lot of hidden context like chat history, system instructions, or attached documents.

Model choice changes the exchange rate

The second reason costs feel chaotic is that different models have very different prices.

A smaller model can be extremely cheap. A larger, more capable model can cost many times more because it is doing more work, using more compute, and often delivering higher quality or stronger reasoning.

So the exact same number of tokens can cost very different amounts depending on which model you used. That is not a bug. That is how the market works. Capability has a price tag.

This is the core tension.

Tokens measure volume. Models determine value.

Why token limits break once you have multiple models

If your product uses only one model, token limits are workable. You can say you get 200,000 tokens per month.

Even if it is not super user friendly, it is one bucket, one rate, one meter.

But as your product grows, you naturally start using multiple models. For example, you add a smaller model for quick cheap tasks, a bigger model for complex reasoning, a long context model for documents, or a user facing setting like fast, balanced, best.

Now you have a real product problem.

If you try to manage usage with tokens, you end up creating separate token buckets: 200,000 tokens for the small model and 5,000 tokens for the big model.

It sounds logical until a workflow uses both.

Imagine a feature that does this.

Step one uses a small model for cheap preprocessing.

Step two uses a big model for high quality final output.

A user can still have tons of small model tokens left, but if they run out of big model tokens, the whole feature breaks.

From their perspective it feels unfair. I still have AI usage left, so why am I blocked?

From your perspective it is technically correct. You ran out of the expensive bucket.

This mismatch is exactly where customers lose trust. They do not care about buckets. They care about completing work.

So the question becomes: how do you let the product use different models without turning usage into a confusing checklist.

The simplest fix: blend usage into one metric

This is where AI credits come in.

Credits are not magic. They are just a decision. Instead of tracking multiple token buckets, you track one budget unit that represents spend, regardless of which model created it.

Here is the clean mental model:

1 – You count input tokens and output tokens because that is what providers report.

2 – You compute the real cost using that model’s pricing because different models have different rates.

3 – You convert that cost into credits using an internal conversion rule.

4 – You deduct from one shared credit balance.
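The four steps above can be sketched in a few lines. The rates and the conversion factor below are placeholders for illustration, not real provider prices:

```python
def charge_request(balance: int, rate_per_1k: dict,
                   input_tokens: int, output_tokens: int,
                   credits_per_unit: int) -> int:
    """Steps 1-4: count tokens, price them, convert to credits, deduct."""
    # Step 2: real cost at this model's rates (quoted per 1,000 tokens).
    cost = (input_tokens * rate_per_1k["input"]
            + output_tokens * rate_per_1k["output"]) / 1000
    # Step 3: convert cost into credits with the internal conversion rule.
    credits = round(cost * credits_per_unit)
    # Step 4: deduct from the one shared balance.
    return balance - credits

# Placeholder rates: $0.01 per 1k input tokens, $0.05 per 1k output tokens.
balance = charge_request(50_000, {"input": 0.01, "output": 0.05},
                         input_tokens=5_000, output_tokens=2_000,
                         credits_per_unit=10_000)
print(balance)  # -> 48500
```

Step 1, the token counts themselves, comes from the provider response, which is why this function takes them as inputs rather than estimating them.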

To keep it simple and avoid tying it to any specific currency, you can phrase it like this: assume one credit represents a tiny fixed amount of value. The conversion is then just a factor, N credits per one unit of currency.

Example calculation

Take two model tiers with the following hypothetical prices:

Model tier | Price per 1k input tokens | Price per 1k output tokens
Heavy      | $0.01                     | $0.05
Light      | $0.0005                   | $0.001

Using the prices above, let’s calculate the cost of one workflow with this usage:

Heavy model: 5,000 input tokens and 2,000 output tokens
Light model: 40,000 input tokens and 5,000 output tokens

Model tier | Direction | Calculation (usage × rate)  | Cost
Heavy      | Input     | 5,000 × ($0.01 / 1,000)     | $0.05
Heavy      | Output    | 2,000 × ($0.05 / 1,000)     | $0.10
Light      | Input     | 40,000 × ($0.0005 / 1,000)  | $0.02
Light      | Output    | 5,000 × ($0.001 / 1,000)    | $0.005

Adding these up, the example leads to a total usage cost of $0.175.

Now that we have the real usage cost, we can convert it into a single product friendly unit: AI credits.

The key idea is simple. Tokens are the technical measurement, but credits are the user facing budget. Instead of forcing users to understand different model prices and separate token buckets, you translate every request into one shared balance that always means the same thing.

AI credits conversion and calculation

First, define a conversion factor. For example:

10,000 credits equals 1 dollar of usage value.

Second, convert the usage cost into AI credits:

0.175 USD × (10,000 AI credits / 1 USD) = 1,750 AI credits
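Running the same numbers in code, using the example prices and the 10,000 credits per dollar conversion factor (both hypothetical values from this walkthrough):

```python
# Hypothetical per-1k-token prices from the example table, in USD.
PRICES = {
    "heavy": {"input": 0.01, "output": 0.05},
    "light": {"input": 0.0005, "output": 0.001},
}
CREDITS_PER_USD = 10_000  # example conversion factor

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD: each direction is billed at its own per-1k rate."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1000

workflow_cost = (cost_usd("heavy", 5_000, 2_000)
                 + cost_usd("light", 40_000, 5_000))
print(round(workflow_cost, 3))                 # -> 0.175
print(round(workflow_cost * CREDITS_PER_USD))  # -> 1750
```

In production you would use integer or decimal arithmetic for money rather than floats, but the structure is the same: one pricing table, one conversion factor, one balance.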

What this means for the user

Now the user does not need to understand token pricing tables, model differences, or input output rates.

They just see:

I have 50,000 credits and I have used 1,750 of them.

It is easy to understand that:

A fast model run costs fewer credits.
A premium model run costs more credits.

Everything comes out of one blended balance.

That is the whole point. One meter instead of several incompatible meters.

Honest accounting: estimate first, then charge actual

There is one technical detail that makes credits reliable rather than hand wavy.

You need to decide before running the request whether the user can afford it. But you only know the exact token usage after the model finishes.

So a robust credit system does two passes.

Before the call, estimate.

Approximate input size. Even a rough four characters per token estimate helps.

Assume the maximum output you allow.

Convert estimated cost to estimated credits.

Reserve that amount so two parallel requests cannot overspend the budget.

After the call, reconcile.

Read the actual input output token counts from the provider response.

Compute the actual cost using the same pricing table.

Convert that to final credits.

Settle the real amount and release any unused reservation.
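Here is a minimal sketch of that two-pass flow, assuming a hypothetical in-memory ledger. A real system would persist balances and guard against concurrent updates, but the shape is the same:

```python
from dataclasses import dataclass, field

@dataclass
class CreditLedger:
    """Hypothetical ledger for the estimate-then-settle flow."""
    balance: int                                   # credits the user still owns
    reserved: dict = field(default_factory=dict)   # request_id -> held credits

    def reserve(self, request_id: str, estimated_credits: int) -> bool:
        """Before the call: hold the worst-case amount, or refuse the request."""
        if self.balance < estimated_credits:
            return False                           # user cannot afford it
        self.balance -= estimated_credits
        self.reserved[request_id] = estimated_credits
        return True

    def settle(self, request_id: str, actual_credits: int) -> None:
        """After the call: charge the real amount, release the unused hold."""
        held = self.reserved.pop(request_id)
        self.balance += held - actual_credits

ledger = CreditLedger(balance=50_000)
ledger.reserve("req-1", 3_000)   # estimate before the model call
ledger.settle("req-1", 1_750)    # actual usage known after the call
print(ledger.balance)  # -> 48250
```

Because the reservation comes out of the balance immediately, two parallel requests cannot both spend the same credits, and settling releases whatever the estimate overshot.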

That is why credits can be a single metric without becoming inaccurate.

Why this ends up being better for everyone

For users, it is simple. A single number they can reason about.

For product and marketing, credits are a clean packaging tool. You can offer monthly credit plans, add on top ups, premium model access, and usage breakdowns by fast versus premium instead of token math.

For finance, credits become a stable forecasting unit because they are anchored to real cost.

And for engineering, tokens are still the truth underneath. Credits do not replace token tracking. They translate it into something the business can use.

Tokens are how AI providers measure text. Credits are how products measure value.

If you are running one model, tokens might be enough. But once you run multiple models or give users options, credits stop usage from turning into a confusing set of incompatible limits and start making it feel like what it really is: a budget.