How we built it: Usage-based billing

Taras Mitran, Billing Experience
Karan Dhabalia, Usage-Based Billing

Illustration by Álvaro Bernis

Usage-based billing (UBB) is rapidly becoming the preferred pricing model for many businesses due to the way it provides flexibility to customers while allowing businesses to precisely align cost and value.

At Stripe, we’ve seen a significant increase in businesses requesting high-throughput UBB solutions. We began to address this last year at Stripe Sessions, where we debuted our UBB product, which is part of Stripe Billing. Then in October, we released a set of major upgrades, including support for credit burndown pricing models and the capacity to process up to 100,000 events per second per business.

As we’ve built our UBB product, we’ve emphasized three core features:

  1. An accurate, highly available revenue ledger
  2. Real-time event processing with ultrahigh-throughput billing
  3. Support for complex pricing models and accurate billing, even in the face of delayed events

Any one of these features is hard to build on its own. They’re even more challenging to engineer together because, for example, high throughput puts pressure on real-time processing. To have everything we wanted in our UBB product, we had to find ways to scale our data systems far beyond what was considered state of the art when we started out. And, as we explain in the rest of this post, many of our design decisions created additional challenges that we then had to solve. We think the product architecture we ended up with offers useful lessons for building a highly scaled, highly reliable event streaming platform—in addition to enabling great UBB experiences for our users.

Lesson 1: Asynchronous events processing increases speed and lowers costs—but it needs to be combined with additional developer observability

As we built the second version of our UBB API, we wanted to increase throughput 100x while maintaining 99.999% availability with zero data loss and low latency. We also wanted to keep costs low, aiming to create a system that could process millions of events per second in a cost-effective manner. This required coming up with a new API design.

Most Stripe APIs process new events through a series of steps—authentication, validation, internal routing, RPC function calls—before running business logic within Stripe. This stateful process works well for most of our APIs, including payments, but it requires holding synchronous requests open across many internal servers; this is prohibitively slow and expensive at the scale of an event stream, which involves orders of magnitude more requests.

We solve this in our UBB API by sending events to an edge router, which does stateless authentication and API validation, then loads them directly onto an event bus. This form of asynchronous processing allows the event to be conveyed to other parts of our UBB solution immediately (such as the Dashboard or billing logic) without slowing the event stream. But the challenge with asynchronous processing is that when processing does fail, developers don’t necessarily know it right away. This makes failures hard to debug. 
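To make the hand-off concrete, here's a minimal sketch of stateless edge ingestion in Python. The function name, field names, and the in-process queue standing in for the event bus are all hypothetical, not Stripe's internal API; the point is that the caller gets an acknowledgment as soon as the event is enqueued, with no downstream RPC in the request path.

```python
import json
import queue
import time
import uuid

# Hypothetical stand-in for the event bus: a plain queue models the
# "fire and forget" hand-off to downstream consumers.
event_bus: "queue.Queue[dict]" = queue.Queue()

REQUIRED_FIELDS = {"event_name", "payload", "identifier"}


def ingest_meter_event(raw_body: str, api_key: str) -> dict:
    """Stateless edge handling: authenticate, validate shape, enqueue.

    No downstream call is awaited; the caller is acked as soon as the
    event is on the bus, and deeper processing happens asynchronously.
    """
    if not api_key.startswith("sk_"):  # cheap, stateless auth check
        return {"status": 401, "error": "invalid api key"}

    event = json.loads(raw_body)
    missing = REQUIRED_FIELDS - event.keys()
    if missing:  # schema-level validation only; no state lookups
        return {"status": 400, "error": f"missing fields: {sorted(missing)}"}

    event["received_at"] = time.time()  # standardized ingest timestamp
    event["event_id"] = str(uuid.uuid4())
    event_bus.put(event)  # hand off to the bus; do not wait
    return {"status": 202, "event_id": event["event_id"]}
```

A failed validation here is exactly the case where the asynchronous webhooks and Dashboard visibility described below become important: the 400 is returned at the edge, but deeper failures surface only after the ack.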

We addressed this challenge by building two developer tools: a Dashboard that lets you follow asynchronous events processing in real time, and webhooks that notify you when validation fails. These tools reflect a lesson we learned while building our UBB product, which is that asynchronous APIs are a good way to enable fast, reliable, cost-effective event streams—but only if they’re combined with improved developer observability.


Lesson 2: An active-active setup solves the problem of downtime, but it needs to be enriched with metadata that enables accurate reconciliation

Now that we had a usage data pipeline capable of ingesting 100,000 events per second per user, we had to come up with a way to process that data accurately, without compromising reliability or undermining latency. We started by choosing Apache Flink for its distributed stream processing capabilities, low latency, and exactly-once processing guarantees. 

Like any complex events processing system, Flink has occasional downtime. For some streaming applications, such as batch reporting or data analysis tools, this isn’t a problem. But downtime doesn’t work for applications with real-time financial implications such as usage-based billing: a business needs accurate aggregation and invoicing, and its customers need to know exactly how much usage they’ve accrued. To address this, we decided to deploy Flink in an active-active setup, in which we process the same event in two geographic regions simultaneously.

This solves the problem of downtime: if one region goes down, a failover mechanism automatically routes traffic to the other. But it also presents a challenge of its own: ensuring data consistency across regions. To achieve this, we implemented an event metadata system that tags each event with a standardized time stamp, generated before the event reaches either Flink application, along with relevant details such as event type and source. Because the time stamp is assigned upstream, it is identical across both streams, allowing us to accurately track and compare data. If one stream is delayed, the metadata lets us reconcile events by their time stamps, ensuring accurate aggregation and continuity in our billing calculations. This process—which we’re always evolving—helps maintain the integrity of our real-time rating, even during downtime.
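To picture the reconciliation, here's a simplified batch sketch in Python. The `reconcile` helper and the event shape (an `event_id` plus a pre-fan-out time stamp) are illustrative, not Stripe's actual implementation: because the ID and time stamp are stamped before the streams diverge, a union keyed on `event_id` dedupes cleanly, and any ID seen in only one region flags a lagging stream.

```python
from typing import Iterable


def reconcile(region_a: Iterable[dict], region_b: Iterable[dict]) -> tuple[dict, set]:
    """Merge two active-active streams keyed by event_id.

    The same event carries an identical (event_id, ts) pair in both
    regions, so duplicates collapse; events present in only one region
    are returned separately so the other region can catch up.
    """
    merged: dict[str, dict] = {}
    seen_a, seen_b = set(), set()
    for ev in region_a:
        merged[ev["event_id"]] = ev
        seen_a.add(ev["event_id"])
    for ev in region_b:
        merged.setdefault(ev["event_id"], ev)  # duplicate of region A? keep one copy
        seen_b.add(ev["event_id"])
    lagging = seen_a ^ seen_b  # IDs present in exactly one region
    return merged, lagging
```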

Lesson 3: Use a “fast” path, “slow” path approach to matching usage with pricing


After we’d determined how to ingest and process usage-based events, we had to figure out how to match them with your billing logic, essentially converting a record of events into an invoice. We knew it was important to solve this in a way that allowed you to change your pricing on the fly—such as upgrading a customer to a new plan or applying a discount—and to apply those changes accurately without interrupting your customers’ usage.

To do that, we made the decision to turn your pricing structure into a stream of changes. The state of the stream at any given moment reflects the pricing structure assigned to a customer and when you change that pricing structure, it’s reflected in the stream. Then we had to match that pricing structure stream against a customer’s event stream, so that the two come together like two sides of a zipper—perfectly matching usage events with the rate at which they’re billed.
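One way to picture the zipper is as a lookup of "the latest pricing change at or before each event's time stamp." The sketch below is a batch simplification of that idea (the real system operates on live streams); `rate_events` and its data shapes are hypothetical.

```python
import bisect


def rate_events(price_changes, usage_events):
    """Match each usage event with the price in effect at its timestamp.

    price_changes: list of (effective_ts, unit_price) sorted ascending.
    The "state of the stream" at any moment is the most recent change
    at or before that moment.
    """
    change_times = [ts for ts, _ in price_changes]
    rated = []
    for ev in sorted(usage_events, key=lambda e: e["ts"]):
        # Index of the latest pricing change at or before this event.
        i = bisect.bisect_right(change_times, ev["ts"]) - 1
        if i < 0:
            raise ValueError("event precedes first known price")
        _, unit_price = price_changes[i]
        rated.append({**ev, "amount": ev["units"] * unit_price})
    return rated
```

In the retroactive-discount case described next, replaying a lookback window of past events through the same matching logic with the updated pricing list re-rates them without touching the live stream.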

This gets tricky when the two streams fall out of sync. Maybe you want to apply a discount retroactive to the beginning of the day. That means opening a lookback window to resync past events with the newly updated pricing structure. This would be easy if we could just pause the event stream, but that would undo all of the throughput and processing optimizations. So, we had to find a way to adjust the matched streams (pricing and events) while keeping the event stream moving full speed. To do this, we created a dual-path aggregation system. It’s essentially two different pipelines moving at two different speeds for processing events. 

One is a fast path with a 30-second tumbling window, which we store in memory in real time. This path is what we use to trigger billing alerts when someone has spent too much money or is running out of credits. It guarantees we never lose an event, and it enables fast alerting so that your customers don’t have the experience of their balances going negative.  

The second path is slower, with a five-minute window. It’s where we write streaming events to disk, in a kind of transactional ledger where our systems don’t face the real-time pressure they do on the fast path. This allows us to accommodate any delayed or out-of-order events and handle edge cases effectively. It’s also where we generate product analytics, invoicing data, and financial records for revenue recognition.
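Both paths boil down to the same aggregation run over different tumbling-window widths. Here's a toy Python version, with hypothetical names and a plain dict in place of real state stores: the fast path would run it with a 30-second window in memory, while the slow path uses a wider window that gives delayed or out-of-order events time to land in their proper bucket.

```python
from collections import defaultdict


def tumbling_window_totals(events, window_seconds):
    """Bucket usage into fixed, non-overlapping (tumbling) windows.

    Each event falls into exactly one window, keyed by the window's
    start time; totals within a window are simply summed.
    """
    totals: dict[int, int] = defaultdict(int)
    for ev in events:
        window_start = (ev["ts"] // window_seconds) * window_seconds
        totals[window_start] += ev["units"]
    return dict(totals)
```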

This dual-path approach allows us to have the best of both worlds in the context of highly scaled, reliable events processing.

A usage-based billing solution that can scale with your ambitions

With these main pieces in place, our usage-based billing solution can handle 100,000 events per second per user, with a P95 latency under 30 seconds for time-sensitive operations. For most use cases, we maintain an end-to-end latency of about five minutes from usage ingestion to rated output. This performance enables us to offer near-real-time usage insights, support complex pricing models, and provide timely billing for our users.

Learn more about Stripe’s usage-based billing solution. And if you find this work exciting, come join us.
