Carl Eubanks
The bathroom scale and the myth of scalability: Why most systems fail at rush hour—not at rest

Most systems don’t break because of a missing feature. They break because they weren’t designed to survive growth, chaos, or time—mainly chaos.
That’s the first thing Designing Data-Intensive Applications wants you to understand. Chapter 1 isn’t about tech stacks or tuning knobs. It’s about how to build systems that don’t collapse under their own weight.
And you know what popped into my head? A bathroom scale.
Scalability: What my bathroom scale taught me about load
Scalability isn’t just a buzzword. It’s a design constraint. And I realized this while staring at my bathroom scale.
That little device does one job: measure weight. But it has a maximum load—mine tops out at 350 lbs. Go over that, and it stops being useful. The reading becomes inaccurate, or worse, nonexistent. Software is the same way.
Every system, from web servers to queues and databases, has its equivalent of that weight limit. But in software, load isn’t just how much; it’s how fast and how often.
You can have 100K users, and that might be fine. But if 50K show up at once during a ticket drop? That’s rush hour. And rush hour is what tests your scale.
Scalability isn’t about handling more users. It’s about handling more users at once.
That’s why we ask:
- Can our DB handle fan-out writes to 10M timelines?
- Can our app layer handle 500 RPS bursts?
- Can workers scale up fast enough when the queue spikes?
- WHAT IF ARIANA GRANDE POSTS A VIDEO? THE COMMENTS. WILL MY YOUTUBE COMMENTS DISAPPEAR?
Scalability is architectural. It’s built around two questions: What’s the real load this system needs to carry? And can it carry more of that than the cognitive load I’m carrying right now?
🚦 Rush hour (traffic) and the Twitter timeline
Let’s apply this to a real-world problem: Twitter’s timeline architecture. Now, every system has a set of critical business problems that make heads hurt. For payment processing, it’s preventing double processing (let’s not mess with people’s money). For a toddler’s bedtime routine, it’s getting them to bed. So, of course, news feeds will have their own set of issues.
Let’s start big. When a celebrity tweets:
- Do you push that tweet to every follower’s timeline immediately (fan-out on write)?
- Or compute timelines per user when they open the app (fan-out on read)?
Each has tradeoffs:
- Fan-out on write = expensive, bursty writes, but fast reads
- Fan-out on read = cheap writes, but slow, expensive reads
Neither scales forever.
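Here’s a minimal sketch of the two strategies, using toy in-memory dicts instead of real databases (all names here are made up for illustration):

```python
from collections import defaultdict

# Toy in-memory stores standing in for real databases (all hypothetical).
followers = defaultdict(set)   # author -> users who follow them
follows = defaultdict(set)     # user -> authors they follow
tweets = defaultdict(list)     # author -> tweets they wrote
timelines = defaultdict(list)  # user -> precomputed home timeline

def follow(user, author):
    followers[author].add(user)
    follows[user].add(author)

def fan_out_on_write(author, tweet):
    """Push the tweet into every follower's timeline right now.
    Write cost grows with follower count; reads become a cheap lookup."""
    tweets[author].append(tweet)
    for fan in followers[author]:
        timelines[fan].append(tweet)

def fan_out_on_read(user):
    """Build the timeline only when the user opens the app.
    Writes stay O(1); read cost grows with how many accounts you follow."""
    merged = []
    for author in follows[user]:
        merged.extend(tweets[author])
    return merged
```

With ten million followers, that loop in `fan_out_on_write` is the massive burst load; following a thousand accounts makes `fan_out_on_read` the slow open-the-app moment.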
That’s why Twitter uses hybrid models: caching, deferred writes, async workers (you gotta handle computationally heavy timelines somehow, right?), priority queues, x, y, z, this, that, and the third. All so that you, the hopefully-didn’t-just-dust-off-the-old-iPhone-and-haven’t-touched-Twitter-in-3-years user, can open the app and get a fresh timeline (with async fetches and recommended posts so you never put down your phone).
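The core of the hybrid idea boils down to: fan out on write for normal accounts, fan out on read for the Ariana Grandes. A toy sketch, again with in-memory dicts and a made-up follower cutoff (not Twitter’s real number):

```python
from collections import defaultdict

# Toy in-memory stores; all names and numbers here are hypothetical.
followers = defaultdict(set)   # author -> users who follow them
follows = defaultdict(set)     # user -> authors they follow
tweets = defaultdict(list)     # author -> tweets they wrote
timelines = defaultdict(list)  # user -> precomputed home timeline

CELEBRITY_CUTOFF = 10_000  # made-up threshold for "too many followers"

def follow(user, author):
    followers[author].add(user)
    follows[user].add(author)

def post(author, tweet):
    tweets[author].append(tweet)
    # Normal accounts: fan out on write, so reads stay fast.
    if len(followers[author]) < CELEBRITY_CUTOFF:
        for fan in followers[author]:
            timelines[fan].append(tweet)
    # Celebrities: skip the million-timeline write burst entirely.

def home_timeline(user):
    merged = list(timelines[user])
    # Celebrity tweets get merged in lazily, at read time.
    for author in follows[user]:
        if len(followers[author]) >= CELEBRITY_CUTOFF:
            merged.extend(tweets[author])
    return merged
```

One write path for the many, one read path for the few: each strategy is applied only where its cost stays bounded.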
You can’t just buy a bigger scale. You need different kinds of scales for different kinds of load.
This is the key lesson from scalability: you can’t optimize for the average, because the average ignores the lived experience of many, many people. That’s why we measure percentiles; the p99 is the one unlucky user in a hundred. There are many different philosophies and approaches to this, but, in my opinion, if that slow tail will get enough people to write bad reviews and skew your J curve, you should probably design with it in mind.
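Here’s the average hiding the pain, with numbers I made up: 95 fast requests and 5 awful ones.

```python
# 100 hypothetical response times in milliseconds: mostly fine, 5% awful.
latencies = sorted([20] * 95 + [2000] * 5)

mean = sum(latencies) / len(latencies)  # the "average" everyone optimizes
p50 = latencies[49]                     # median: what a typical user sees
p99 = latencies[98]                     # what 1 in 100 users actually suffers

print(mean, p50, p99)  # 119.0 20 2000
```

The mean says 119 ms, which looks fine on a dashboard. The p99 says one request in a hundred takes two full seconds, and that’s the one people tweet about.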
A couple questions to keep your mind busy on your days off:
What breaks first?
Why does one person have 5 variations of hydroflasks?
Next in this series: What my toddler taught me about fault tolerance