Cloud Pricing Is A UX Problem

Cloud platforms price units. Developers ship systems.

RecallMEM looks like a chat app until you try to price it.

Then it becomes a web app, Postgres, pgvector, background memory jobs, embeddings, file uploads, provider keys, voice sessions, transcript storage, model routing, and one very expensive mistake where background tasks quietly burned through Claude API tokens like an idiot.

That is when cloud pricing starts getting weird. Not because the math is fake. The math is usually real. CPU costs money. Memory costs money. Storage costs money. Model calls cost money. Unfortunately, no one is running a charity for my app.

The weird part is that the thing you think you are pricing is often not the thing you are actually building.

I thought I was pricing a request. I was pricing a small system wearing a request costume.

That’s the part pricing pages are bad at showing you.

The Wrong Model

A lot of cloud pricing arguments start with the same move: pick a unit that makes your platform look sane.

CPU milliseconds. RAM. Requests. Seats. Machines. Function invocations. Bandwidth. Active time. Idle time.

All of those units can be honest. None of them are the whole product. That is why these debates get annoying so fast. Someone says a platform is expensive. Someone else says the comparison is unfair because the workload is mostly idle. Someone else points out all the tiny metered charges hiding around the edges.

Everyone is technically arguing about pricing. They are really arguing about what the app is.

Cloud pricing is not just a spreadsheet problem. It is a UX problem. A good pricing model helps me understand what my app depends on before the bill shows up. A bad one explains my architecture back to me after it already charged my card.

CPU Is Not The Whole Computer

There was a recent X discussion about Vercel sandbox pricing versus a small always-available machine. Jacob Paris made a fair point: if a workload only uses active CPU for a tiny percentage of the month, pricing it like a fully busy VM is misleading.

I agree with that. But CPU is not the whole computer.

Sometimes the thing you care about is memory staying provisioned. Or files staying where they were. Or a database living nearby. Or an open socket. Or a runtime with packages installed because an agent just spent ten minutes setting up its little workbench and it would be annoying if all of that vanished.

This is where AI apps get hard to price. The CPU graph can look boring while the product is still very much alive. A voice agent might be waiting on audio. A memory app might be writing facts after the user leaves. An agent might be holding files, tools, logs, and a half-finished plan. A chat app might have background jobs doing the actual expensive work after the visible response is done.

You thought you were pricing a request. Then the product needed a database. Then a background worker. Then embeddings. Then storage. Then a voice socket. Then a model call. Then another model call. Then the cheap background model accidentally became the expensive model and now your “small feature” has a daily burn rate.

That was RecallMEM for me.

The infrastructure bill was not the scariest part. I was testing on a Fly Sprite, not a normal Fly Machine, and the sleep/wake behavior made the compute side feel reasonable. The app could sleep when inactive and wake fast enough that, in testing, it did not feel like the whole thing had gone cold.

The scary part was the model bill. Some background jobs were routing through a more expensive Claude model than I intended. Fact extraction, profile updates, title generation. Tiny product behaviors. Normal usage. Quietly $30+ per day.

The app looked calm. The bill absolutely did not. That is pricing surprise: not “this costs money,” because of course it costs money, but “which behavior caused this?” after the damage was already visible.
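Multiplying the tiny behaviors out makes the problem easier to see. Below is a back-of-envelope sketch in Python. Everything in it is an assumption for illustration, not RecallMEM's real numbers. The job list mirrors the three behaviors above, but the token counts, the 200 messages per day, the model names, and the per-million-token prices are all placeholders. It also assumes every job runs once per message, which overcounts things like title generation. Swap in your provider's current rates before trusting any output.

```python
from dataclasses import dataclass

@dataclass
class BackgroundJob:
    name: str
    input_tokens: int   # tokens sent per run (placeholder)
    output_tokens: int  # tokens generated per run (placeholder)

# Placeholder prices in USD per million tokens: (input, output).
# These are NOT real list prices; check your provider's pricing page.
PRICES = {
    "cheap-model": (0.25, 1.25),
    "expensive-model": (15.00, 75.00),
}

# Hypothetical per-message background jobs, named after the behaviors above.
JOBS = [
    BackgroundJob("fact_extraction", input_tokens=3_000, output_tokens=400),
    BackgroundJob("profile_update", input_tokens=2_000, output_tokens=300),
    BackgroundJob("title_generation", input_tokens=500, output_tokens=20),
]

def daily_cost(jobs, messages_per_day, model):
    """USD per day if every job runs once per user message on `model`."""
    price_in, price_out = PRICES[model]
    per_message = sum(
        job.input_tokens * price_in / 1_000_000
        + job.output_tokens * price_out / 1_000_000
        for job in jobs
    )
    return per_message * messages_per_day

for model in PRICES:
    print(f"{model}: ${daily_cost(JOBS, 200, model):.2f}/day")
```

With these placeholder prices, the expensive model charges 60 times as much per token in both directions. The same three jobs land around $27 a day instead of well under a dollar. The exact figure is made up, but the shape is the point: the multiplier lives in a routing decision that never shows up in the UI.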

Requests Hide Systems

Most AI demos are one request. Ask a question. Get an answer. Nice.

But the product version of that demo is rarely that clean.

In RecallMEM, one user message can trigger retrieval, prompt assembly, provider routing, usage tracking, transcript storage, fact extraction, quote validation, embedding, profile updates, and future recall. The user sees a chat bubble. The system sees a little parade of work.
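Writing the parade down as data makes it easier to see. This is a hypothetical breakdown, not RecallMEM's actual pipeline. The step names come from the list above, but several details are assumptions: which steps call a model, which ones run after the reply, and every model name.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Step:
    name: str
    model: Optional[str] = None  # None: no model call (DB write, bookkeeping)
    after_reply: bool = False    # True: runs after the user already sees the answer

def steps_for_message(chat_model: str, background_model: str) -> List[Step]:
    """Hypothetical breakdown of one chat message. Step names follow the
    essay; which steps call a model is an assumption for illustration."""
    embed = "embedding-model"  # placeholder model name
    return [
        Step("retrieval", model=embed),  # assumed: embed the query for vector search
        Step("prompt_assembly"),
        Step("provider_routing"),
        Step("chat_reply", model=chat_model),
        Step("usage_tracking"),
        Step("transcript_storage"),
        Step("fact_extraction", model=background_model, after_reply=True),
        Step("quote_validation", model=background_model, after_reply=True),
        Step("embedding", model=embed, after_reply=True),
        Step("profile_update", model=background_model, after_reply=True),
    ]

def summarize(steps: List[Step]) -> Tuple[int, int, int]:
    """Return (total steps, model calls, model calls after the reply)."""
    calls = [s for s in steps if s.model is not None]
    hidden = [s for s in calls if s.after_reply]
    return len(steps), len(calls), len(hidden)

total, calls, hidden = summarize(steps_for_message("chat-model", "background-model"))
print(f"{total} steps, {calls} model calls, {hidden} after the reply")
```

Under those assumptions, one chat bubble becomes ten steps and six model calls. Four of those calls happen after the user has already seen the answer.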

Voice makes this even more obvious. The user is just talking. But the app has to listen, stream audio, respond quickly, preserve context, avoid interruptions, save turns, and make the conversation useful later.

A demo usually looks like a request. A product is usually a small system wearing a request costume.

That does not make request-based pricing bad. Sometimes it is great. It just means the developer has to know when the request stopped being the product.

Platform Constraints Become Product Constraints

This stops being theoretical once the platform starts shaping product decisions.

The other recent thread that stuck with me was Zach Wilson’s post about DataExpert leaving Heroku after nine years. He said they had deployed the app more than 2,500 times on Heroku, were paying roughly $1,200/month, and moved to Fly.io for about $175/month with better performance and more flexibility.

The savings are the obvious part. The more interesting part is why they left.

It was not just “hosting got expensive.” It was wildcard SSL constraints, separate apps, pricing jumps, and database decisions. Those are product constraints. If one certificate limitation means running a second app, well then, that is not just billing. If a pricing tier changes how you split services, that is architecture. If your database choice is shaped by what keeps the bill tolerable, the platform is already inside the product.

That is the thing old platforms can hide for a long time. They feel simple until your product shape stops matching their pricing shape. Then the platform starts making product decisions with you. Not because anyone was malicious. Because constraints compound.

Localhost Is Lying To You

I felt a smaller version of this trying to get RecallMEM out of my local setup.

On my machine, everything felt obvious. Postgres was there. Ollama was there. Files were there. Environment variables were familiar. Model downloads had already happened. Random setup steps from three weeks ago were still quietly helping.

My computer was doing a lot of unpaid emotional labor.

Then I tried to run it somewhere else, and now I had to ask the rude questions. Where do secrets live? Does Postgres have pgvector? Are migrations running? Can file uploads survive? What happens to voice sessions? Which parts assume Ollama is nearby? Which background jobs run after the request? Which model do they use? If someone connects an OpenAI key, do we switch embeddings to OpenAI, or do we keep using Gemma?

That is not just deployment. That is discovering what the app actually is.

This is where I like explicit primitives more than magic. If I am using a machine, I want to know what size it is. If something sleeps, I want to know what wakes it. If storage persists, I want to know where.

That does not mean every app should use the same platform. Sometimes you want serverless. Sometimes you want a VPS. Sometimes you want a machine. Sometimes you want a Sprite because you are experimenting and want a real little computer that does not punish you for walking away.

The point is not that one model wins every time. The point is that I want the model to be legible before the invoice.

Surprise Is The Bad UX

Most developers do not compare cloud platforms by list price. They compare them by surprise.

Heroku surprise is: why did this become so expensive?

Vercel surprise is: why are there all these little metered things on this bill?

Traditional VPS surprise is: wait, I own all of this now?

Fly surprise is: okay, I need to understand machines, memory, regions, volumes, and usage.

That last one is real. Explicit primitives ask more from you. If a platform gives you machines, it is asking you to think about machines. Sometimes you do not want that. Sometimes you just want to ship the thing and go be happy.

But visible complexity has one big advantage.

You can build a mental model before the bill shows up. You can look at the machine size. You can look at the memory. You can look at the volume. You can understand what stays running, what sleeps, where the database is, and what happens when the app sits there doing almost nothing.

That does not automatically make it cheaper. It makes it legible. And legible is underrated.

The Bill Should Be Boring

The bad outcome is not paying money. Apps cost money. The bad outcome is not knowing which behavior caused the bill.

Trust me, you do not want to learn this from the invoice.

That is when pricing becomes bad UX. The platform had a secret, and the invoice was the reveal.

By the way, this applies to model APIs too. If I choose an expensive model for an important user-facing answer, fine. I made that tradeoff. If background jobs quietly use that same expensive model because my routing logic ignored the dropdown, that is not a tradeoff.

That is a bug with a credit card attached.
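Here is a minimal sketch of one plausible shape of that bug and one way to close it. It is not RecallMEM's code. The task names, the model names, and the idea of a separate `background_model` setting are all hypothetical. The buggy router lets every task inherit the chat model. The fixed one routes by task and raises on anything it does not recognize, instead of silently borrowing the expensive default.

```python
from dataclasses import dataclass

# Hypothetical task names for work that runs after the visible reply.
BACKGROUND_TASKS = {"fact_extraction", "profile_update", "title_generation"}

@dataclass(frozen=True)
class ModelSettings:
    chat_model: str        # what the user picked for user-facing replies
    background_model: str  # what background jobs are supposed to use

def pick_model_buggy(task: str, settings: ModelSettings) -> str:
    # The bug class: the task is ignored, so background work
    # quietly inherits whatever model the chat is using.
    return settings.chat_model

def pick_model(task: str, settings: ModelSettings) -> str:
    if task == "chat_reply":
        return settings.chat_model
    if task in BACKGROUND_TASKS:
        return settings.background_model
    # Unknown task: fail loudly instead of borrowing the chat model.
    raise ValueError(f"no model route for task {task!r}")

settings = ModelSettings(chat_model="expensive-model", background_model="cheap-model")
for task in ["chat_reply", "fact_extraction", "title_generation"]:
    print(f"{task}: {pick_model(task, settings)} (buggy router: {pick_model_buggy(task, settings)})")
```

Raising on unknown tasks is a deliberate choice. A crashed background job shows up in the logs the same day. A silent fallback shows up on the invoice.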

The pricing I trust is pricing I can reason about. Not because it always gives me the lowest number. Because when the bill shows up, I want it to feel boring.

The worst bill is not the expensive one. The worst bill is the one that teaches you what your app was.


Chris Dabatos - Staff DevRel Engineer

Staff DevRel Engineer, Builder, and Technical Storyteller based in Las Vegas. He builds things with AI and writes about what breaks.
