AI is breaking SaaS economics: why the Law of Large Numbers, entropy, and independence may be your best allies in designing a pricing model that works.
As an insider observing the core business principles that make SaaS companies successful, I find AI natives playing by a completely different rulebook. Originally I wanted to write about how AI companies are fundamentally different from SaaS, but decided to explore one topic at a time in detail. What began as a curiosity exercise (why is AI pricing different from SaaS pricing?) soon turned into a rabbit hole of tokens, inference costs, and stochastic behaviour. The deeper I went, the clearer it became: this isn't just about pricing models. It's about reconciling two opposing forces: stochastic usage versus predictable revenue.
Founders trying to understand price modelling in the AI era need a new latticework of mental models.
The Disappearing Religion of Predictability :
SaaS businesses are built on a single truth: predictable revenue, the cornerstone of success and "certainty". Pricing is straightforward, based on user seats and contract length. The multi-tenant approach of writing code once and replicating it for new customers at near-zero marginal cost makes SaaS businesses a darling for investors, with 80-95% gross margins.
Then there is AI, where every customer requires fine-tuning and telemetry. Predictability is far out of reach, and margins erode to around 55% even at frontier AI labs like Anthropic, with millions of users, where unit economics should ideally catapult gross margins. SaaS usage was linear: one user = one seat = one predictable stream of revenue. AI doesn't work like that. Every interaction consumes tokens, the atomic currency of large language models. If the term is new to you: think of a token as a tiny slice of a word, roughly four characters, or about three-quarters of a word. When GPT responds to you with:
“Hello, how are you?”
you spend about six to eight tokens. A thousand words works out to roughly 1,300-1,500 tokens. Every model call burns tokens, whether it is summarising emails, generating captions, or transcribing audio. Take a simple example: one customer sends a handful of short prompts a day, while another pipes hours of audio through your transcription feature every morning.
Both pay the same flat subscription fee, but one quietly costs 500× more in compute. Revenue per user remains predictable; your compute cost does not. That randomness is stochastic usage: unpredictable consumption driven by factors like prompt length, model response, and human curiosity. Mathematically speaking, it introduces variance into the process. Operationally, for any business, variance introduces chaos.
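To make the gap concrete, here is a minimal sketch with hypothetical numbers; the vendor rate, prompt sizes, and flat fee are all assumptions, not anyone's real pricing:

```python
# Hypothetical numbers throughout: a blended vendor rate and two user profiles.
RATE_PER_TOKEN = 3.00 / 1_000_000   # assumed $ per token paid to the LLM vendor
FLAT_FEE = 20.00                    # same subscription price for both users

light_user_tokens = 20 * 500                  # ~20 short prompts/day, ~500 tokens each
heavy_user_tokens = 500 * light_user_tokens   # transcription-heavy user: 500x the load

for name, tokens in [("light", light_user_tokens), ("heavy", heavy_user_tokens)]:
    cost = tokens * RATE_PER_TOKEN
    print(f"{name}: {tokens:,} tokens/day, compute=${cost:.2f}, "
          f"margin=${FLAT_FEE - cost:.2f}")
```

At these assumed numbers the heavy user consumes most of the flat fee in raw compute alone; push the volume a little further and the account's margin goes negative while revenue never moves.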
Most AI-native startups I meet these days are on the watch for these wild usage swings, because one heavy user is all it takes to blow your margins to kingdom come. Which makes me wonder: how does a company like OpenAI or Anthropic price a seat while erratic stochasticity runs amok across hundreds of millions of everyday users? Upon further reading, I was reintroduced to three probabilistic concepts: the Law of Large Numbers, entropy, and independence.
The New Pricing Geometry :
In simple words, the Law of Large Numbers says that the more times you repeat a random event, the closer the average of your results gets to the true, theoretical average.
When you operate at scale with a high degree of variance, the Law of Large Numbers (LLN) gives you a principled way to frame a common cost per user: as independent usage events grow, the average cost converges to a stable mean, and the variance of that average falls as 1/n. That holds right up until entropy breaks the independence assumption.
Experiment : I asked GPT to create a random 10K-user dataset [assume these to be customers], assign token usage with high variance, and run a Python script to estimate the average compute cost using the LLN.
The resulting theoretical average compute cost per user is a first-level observation of what the LLN converges to. However, it is not that simple: the Law of Large Numbers breaks down where variance meets entropy and dependence.
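For readers who want to replicate the idea, here is a sketch of that kind of script. The distribution (lognormal, to get a heavy tail) and the vendor rate are my assumptions; any heavy-tailed usage distribution makes the same point:

```python
import random

random.seed(42)
RATE = 3.00 / 1_000_000   # assumed $ per token paid to the vendor

# 10K simulated customers with heavy-tailed (high-variance) monthly token usage.
tokens_per_user = [random.lognormvariate(11, 1.5) for _ in range(10_000)]
cost_per_user = [t * RATE for t in tokens_per_user]

full_mean = sum(cost_per_user) / len(cost_per_user)

# LLN in action: the running average tightens toward the full-sample mean.
for n in (10, 100, 1_000, 10_000):
    sample_mean = sum(cost_per_user[:n]) / n
    print(f"n={n:>6}: average cost/user = ${sample_mean:.3f}")
```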
Take the dataset to be true, reflecting a real-world scenario of 10,000 customers using your AI features. Two worlds of possibilities emerge:
All 10K customers send different prompts or audio files to process during the day, most follow similar usage patterns, and compute stays stable. This is calm, steady usage, where the LLN tames variance and keeps the model predictable.
Now imagine a viral event: a new model release, a popular prompt from influencers, or everyone running a cron job at 10 am to summarise emails. The system sees a huge synchronised spike. The average usage is still the same, but variance has exploded, and this time it lands directly on your compute bill. You can't call it calm usage. You must add markups, overage, or burst pricing, not for the extra compute itself, but to insure against chaos.
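The two worlds can be simulated side by side. In this sketch (all parameters assumed), the correlated world keeps the same long-run average as the independent one, but its day-to-day variance explodes:

```python
import random
import statistics

random.seed(7)
N_USERS, N_DAYS = 1_000, 30
MEAN_TOKENS = 50_000   # assumed average daily tokens per user

# World 1: independent usage. Each user draws their own daily token count.
indep_daily = [sum(random.expovariate(1 / MEAN_TOKENS) for _ in range(N_USERS))
               for _ in range(N_DAYS)]

# World 2: correlated usage. A shared multiplier hits everyone on the same
# days (a launch, a viral prompt, a 10am cron). The multipliers average to
# exactly 1.0, so the long-run mean is unchanged; only the variance explodes.
multipliers = [5.5 if day % 10 == 0 else 0.5 for day in range(N_DAYS)]
corr_daily = [m * sum(random.expovariate(1 / MEAN_TOKENS) for _ in range(N_USERS))
              for m in multipliers]

for name, series in [("independent", indep_daily), ("correlated", corr_daily)]:
    print(f"{name}: mean tokens/day={statistics.mean(series):,.0f}, "
          f"stdev={statistics.stdev(series):,.0f}")
```

The standard deviation of daily load in the correlated world comes out dozens of times larger, even though both worlds have the same average, which is exactly why "average cost per user" alone is not a safe pricing basis.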
Think of independence as how differently your users behave from one another, and entropy as how unpredictable that behaviour is. These two factors stress-test the average user cost under different scenarios. Understanding these variables helps us design pricing tiers and resurrect predictability, at least to an extent.
Experiment : I asked GPT to append random independence and entropy values to all 10K users and run a script that bands the LLN variance by independence [predictability] and entropy [unpredictability]. The resulting chart shows the different types of users in the system, based not just on token consumption but also on usage behaviour.
Based on the outcome, we infer :
Low entropy (independent users) → You can rely on the Law of Large Numbers; costs even out; charge lower prices.
High entropy (correlated users) → Everyone acts together; your cost curve becomes unpredictable; charge higher prices or require commitments.
Independence buys you predictability; entropy taxes you for chaos.
Add a markup to the per-token cost you pay your LLM vendor, and you can build a pricing plan around your current user groups and their usage behaviours.
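As a toy illustration of that markup step, with an assumed vendor rate and made-up bands and multipliers:

```python
# All numbers are illustrative assumptions, not real vendor or market pricing.
VENDOR_RATE = 3.00 / 1_000_000   # $ per token paid to the LLM vendor

# (band, assumed avg monthly tokens, markup multiplier)
# Markup grows with entropy: predictable cohorts get thin margins, while
# chaotic/correlated cohorts pay for insurance against synchronised spikes.
bands = [
    ("low entropy / independent",    200_000, 1.3),
    ("mid entropy",                  500_000, 1.8),
    ("high entropy / correlated",  2_000_000, 2.5),
]

for name, tokens, markup in bands:
    cost = tokens * VENDOR_RATE
    price = cost * markup
    print(f"{name}: vendor cost=${cost:.2f} -> monthly price=${price:.2f}")
```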
In this way, we understand randomness and safeguard against spikes in usage and concurrency. Once you accept that usage is stochastic, you stop pretending flat pricing will save you. Frontier AI companies know this well. Take a look at the heavyweights in this category and how they model pricing :
OpenAI bills per token — predictability through scale.
Anthropic & Mistral use context-window pricing tiers to segment compute load.
The emerging pattern I see: most AI natives introduce a Base Fee + Credit Allowance + Overage for sane unit economics. The base fee buys subscription predictability, credits cap average usage, and overage pricing captures heavy consumption.
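A minimal sketch of how such an invoice might be computed; the plan parameters here are hypothetical:

```python
# Base Fee + Credit Allowance + Overage, with assumed plan parameters.
BASE_FEE = 99.00                  # buys subscription predictability
INCLUDED_CREDITS = 1_000_000      # tokens covered by the base fee
OVERAGE_RATE = 8.00 / 1_000_000   # $ per token beyond the allowance

def monthly_bill(tokens_used: int) -> float:
    """Flat base fee, plus overage only for usage beyond the credit allowance."""
    overage = max(0, tokens_used - INCLUDED_CREDITS)
    return BASE_FEE + overage * OVERAGE_RATE

print(f"light user: ${monthly_bill(400_000):.2f}")    # within allowance
print(f"heavy user: ${monthly_bill(3_500_000):.2f}")  # base fee + overage
```

The base fee floors revenue per account, the allowance bounds the cost of an average user, and the overage line makes the heavy tail pay for itself instead of eroding margin.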
It’s a new level of Pricing Geometry: you’re finding the curve where cost, usage, and value intersect without breaking trust or margin. SaaS once priced by access; AI must price by entropy and independence — the unpredictability of workload.
The Dilemma :
Investors want predictable revenue but models produce stochastic usage.
You can’t promise linearity in a non-linear world. The same applies to your customers. As people selling premium deep-tech software, we are accountable for explaining these concepts in detail, not to upsell a higher plan but to de-risk the customer against erratic usage spikes. Engineer synthetic predictability via tiers, credits, usage caps, and statistical assumptions. The trick is not to eliminate variance, but to design with it:
Encourage pre-committed usage (“buy 1M tokens/month at a discount”).
Forecast cohorts statistically, not contractually.
Leverage PLG: observe the first 50,000 turns of API usage, calculate the token consumption per call, and plot the user types.
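A sketch of that last step, classifying users from their early API turns; the thresholds and labels are illustrative, not a standard:

```python
import statistics

def classify(call_tokens: list[int]) -> str:
    """Bucket a user by average tokens per call and call-to-call variability.
    Thresholds are assumed for illustration; calibrate on your own cohorts."""
    avg = statistics.mean(call_tokens)
    spread = statistics.pstdev(call_tokens)
    if avg < 1_000 and spread < 500:
        return "calm"       # LLN-friendly: low price, thin markup
    if avg < 10_000:
        return "variable"   # credits + caps
    return "heavy"          # commitments / overage required

print(classify([300, 450, 500, 350]))        # chatty, steady user
print(classify([2_000, 9_000, 400, 7_000]))  # bursty mid-size user
print(classify([50_000, 80_000, 120_000]))   # batch/pipeline user
```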
So — does stochastic pricing actually work? Yes — but conditionally. You can make it work if you have:
Scale → so LLN smooths variance;
Capital → to endure variance until your model flips positive.
If you do not have the scale yet, the LLN may not apply. In these scenarios, predictability is best engineered through contracts.
Consider the following ideas:
1. Minimum commitment plans : customers pay a base fee to use the platform.
2. Credit packs with an expiry date : unused credits become prepaid buffer margin.
3. Concurrency- and usage-based billing.
The predictability crossover is something you plan to reach at perhaps your first 50 independent customers.
Until then, absorb volatility as growth spend and classify heavy-usage costs under R&D or broad “Cost of Revenue” lines. Compute spend is often treated as a “growth investment” rather than a cost in the AI era.
Welcome to the “Growth at All Costs 2.0” race, stochastic by design. AI profitability isn’t about eliminating variance; it’s about outlasting it.
In SaaS you priced certainty and delivered stability. In AI, you price variance and deliver expectation. The companies that master this translation — from randomness to reliability — will define the next decade of software economics.
Over time, token costs fall (a Moore’s law for inference), routing and caching improve, and usage distributions compress. Variance becomes background noise. In SaaS, predictability was engineered. In AI, it must be earned through scale. Founders who master modelling variance, designing buffers, and aligning price with value will own the decade. Because in this new economy your revenue line isn’t straight anymore. It’s a probability distribution.
What are your thoughts?