AI Product

Pricing AI Features When Every Call Costs You Money

TL;DR

Traditional SaaS could charge a flat $29/month because serving one more user cost almost nothing. AI features broke that math: every call has real cost of goods sold, and your power users can quietly turn your best plan into a money pit. I walk through the four pricing models I've actually used — usage-based, credits, tiered caps, and hybrid — when each fits, how to protect margin without nickel-and-diming people, and why you should price the outcome, not the tokens. Flat-rate AI pricing isn't brave, it's a slow leak.

April 28, 20268 min read
PricingAI ProductSaaSUnit EconomicsFounders

For about fifteen years, SaaS pricing was almost a solved problem. You built the thing once, and serving the thousandth customer cost roughly the same as serving the first: a sliver of compute, a row in a database, some bandwidth. Marginal cost was basically zero. That single fact is why you could charge a flat $29/month, why "unlimited" plans existed, and why a free tier didn't bankrupt anyone. The economics forgave a lot.

AI features quietly deleted that fact.

Now every call has a real, measurable cost of goods sold. When a user clicks your shiny "Summarize" button, you pay for it — in tokens, in inference time, in some provider's invoice that lands at the end of the month whether the customer was delighted or not. The first time I watched a single enthusiastic user generate a four-figure model bill on a plan we'd priced at $49, I understood that I wasn't pricing software anymore. I was pricing a commodity I had to buy wholesale and resell, and I'd forgotten to check the wholesale price.

The marginal cost problem, drawn out

Here's the shift in one picture. The old world on the left, the AI world on the right.

   Classic SaaS                        AI Feature
   ────────────                        ──────────
   Cost
   per     │                           Cost     │        ╱
   user    │                           per      │      ╱
           │                           call     │    ╱
   ~$0 ────┼──────────────             $$$  ────┼──╱──────────
           └───────────────                     └─────────────
              more usage →                         more usage →

   Flat fee covers everyone.       Each call adds real COGS.
   Power users are free.           Power users cost you money.

In classic SaaS, your heaviest user and your lightest user cost you about the same, so a flat fee averages out beautifully. In AI, your heaviest user can cost 50x your lightest, and a flat fee averages out into a loss the moment your usage distribution skews — which it always does. Usage in any product follows a power law. A small slice of users drives the majority of consumption. With near-zero marginal cost, those people are your champions. With AI, those same people are your largest line item.

Flat-rate AI pricing is a bet you'll lose at scale

A flat unlimited plan on an AI feature works fine until it doesn't. It's a bet that no one will use it "too much." That bet pays off in the demo and in the first 100 friendly customers. Then someone wires your feature into their own automation, runs it 8,000 times a day, and you discover you've been subsidizing their business with your runway.

You can't price what you don't measure

Before you choose a model, you need to know your cost per action — not per month, per action. What does one summary, one generation, one agent run actually cost you, fully loaded? That means the model tokens, yes, but also the retrieval calls, the reranking, the retries when a call fails, and the occasional expensive fallback to a bigger model.

I track this obsessively, and I've written a whole separate piece on the discipline of it in the daily prompt audit. The short version: instrument cost the way you'd instrument latency. Tag every call with the user, the feature, and the model. If you can't answer "what does my median user cost me per month, and what does my 95th-percentile user cost me," you are not ready to set a price. You're guessing.

The four models I've actually used

There's no universally correct model, but there are four that work, and choosing between them is mostly about who your buyer is and how predictable their usage is.

ModelHow it worksBest whenThe downside
Usage-basedPay per unit (call, token, generation)Developer/API products; sophisticated buyersUnpredictable bills scare non-technical buyers
CreditsBuy a bucket of credits, spend on actionsMixed feature costs; you want flexibilityUsers hoard or resent "expiring" credits
Tiered with capsFlat tiers, each with a usage ceilingSimple products; predictable buyersHitting a hard cap mid-task feels punishing
HybridSubscription + included allowance + overageMost B2B SaaSMore complex to explain and bill

Usage-based is the most honest mapping of cost to price, and it's why every foundation model API uses it. But honesty isn't always sellable. I've watched a perfectly good usage-based plan die in procurement because the buyer couldn't predict next month's invoice and their finance team wouldn't approve an open-ended line item. Engineers love metering. CFOs hate surprises.

Credits are a psychological trick as much as a pricing model, and I mean that as a compliment. They let you charge different amounts for different features (a quick classification costs 1 credit, a long agent run costs 20) without exposing the raw token math. They also decouple the purchase from the consumption, which smooths your revenue. The trap is expiration: nothing generates angrier support tickets than credits that vanished at month's end. If you expire credits, say so in 48-point font.

Tiered with caps is the simplest to understand and the easiest to sell, which is why I reach for it in products aimed at small businesses and non-technical buyers. The danger is the hard cap. There's nothing worse than a user hitting their limit in the middle of real work and getting a wall. Soft caps — where you warn, then throttle, then offer an upgrade — keep the relationship intact.

Hybrid is where I land for most B2B products, because it gives both sides what they need. The customer gets a predictable base price and a generous included allowance, so 90% of them never think about usage at all. You get a meter underneath that catches the 10% who would otherwise quietly destroy your margin.

   Hybrid pricing, visualized

   $ ┤                                    ╱  ← overage (per unit,
     │                                  ╱       priced with margin)
     │                                ╱
     │━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╱
     │   flat base fee              ↑
     │   (covers included          included
     │    allowance)               allowance
     └────────────────────────────┴──────────────→ usage
                                median        power
                                user          user

Protect your margin without nickel-and-diming

The hard part isn't the model. It's protecting your economics without making customers feel like every click runs a taxi meter. A few things I've learned, mostly the painful way:

Set the included allowance where the median user never thinks about it. If your typical customer is constantly bumping into limits, your pricing is sending a hostile message on every interaction. The allowance should feel like abundance to normal users and only bind on the outliers.

Cache and dedupe aggressively, and let the savings flow to your margin, not your customer's bill. If two users ask the same question, you shouldn't pay twice — and you certainly shouldn't charge twice for a cached answer. Cheap optimizations on your side are the most polite way to improve unit economics, because the customer never feels them.

Charge for the expensive things and give away the cheap ones. A classification that costs you a fraction of a cent? Make it free, make it unlimited, make it the thing people fall in love with. The 40-step agentic workflow that costs real money? That's what the meter is for. Aligning your pricing pressure with your actual cost pressure feels fair because it is fair.

Price the outcome, not the tokens

Customers don't want tokens. They want the resume screened, the ticket resolved, the report drafted. Bill on the unit of value they recognize — "per screened candidate," "per resolved ticket" — not "per 1,000 tokens." It maps to their ROI, it survives model price drops without renegotiation, and it lets you switch to a cheaper model later and keep the savings instead of handing them back.

That last point is the one I'd tattoo on a junior founder's arm. If you price in tokens, you've chained your revenue to a number that providers cut every few months. When the model gets 60% cheaper, a token-priced plan forces an awkward conversation about lowering your price. An outcome-priced plan lets you quietly improve your margin and reinvest it. Same feature, same value to the customer, very different business.

The heavy-user conversation you should plan for

You will, eventually, have a customer whose usage doesn't fit any tier. In classic SaaS this is a celebration — a whale! In AI it's a negotiation. The right answer is almost never to cut them off or to silently eat the cost. It's to have a frank, friendly conversation: "You're getting enormous value here, and your usage is well beyond the plan. Let's design something that works for both of us." Build the muscle for that conversation early, because the demo that makes everyone fall in love is also the demo that creates these accounts — which, as it happens, is a whole separate failure mode worth its own discussion.

Pricing AI is genuinely harder than pricing software was, and anyone who tells you they've nailed it on the first try is either lying or hasn't seen their power-user bill yet. But the principle underneath it is old and simple: know your costs, align price with value, and don't let your most enthusiastic customers become your biggest losses. Get that right and the meter stops being scary. It becomes the thing that lets you say yes to growth instead of fearing it.

Frequently Asked Questions

Don't miss a post

Articles on AI, engineering, and lessons I learn building things. No spam, I promise.

OR

Osvaldo Restrepo

Senior Full Stack AI & Software Engineer. Building production AI systems that solve real problems.