Building Products That Scale (Without Dying)
I've built systems that scaled to millions of users and systems that collapsed at thousands. The difference is boring.

Scaling is unsexy. Nobody wants to talk about database indexes and connection pools. Everyone wants to talk about features and AI.
Then load increases, the system falls over, and suddenly scaling is the only thing anyone talks about.
I've been on both sides. Here's what I've learned.
You don't fix scale by throwing hardware at it. Well, sometimes you do. But usually, scale problems reveal architecture flaws.
Common patterns:
- Missing indexes on commonly queried fields
- No connection pooling, or an unconfigured pool
- Everything synchronous when most of it could be queued
- No caching in front of the database
- N+1 queries that multiply under load
The fix isn't more servers. It's fixing the architecture.
Every time you open a database connection, there's overhead. Open too many, and the database falls over.
Connection pools reuse connections. This is not exciting. It's essential.
I've seen production outages because someone forgot to configure the connection pool. 10,000 simultaneous connections → database melts.
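A minimal sketch of a bounded pool using SQLAlchemy; the connection URL and the specific numbers are placeholders to tune for your workload:

```python
from sqlalchemy import create_engine, text

# Placeholder connection string -- substitute your own.
DATABASE_URL = "postgresql://app:secret@db-host:5432/appdb"

# Bounded pool: at most pool_size + max_overflow connections ever
# reach the database, no matter how many requests arrive at once.
engine = create_engine(
    DATABASE_URL,
    pool_size=20,        # steady-state connections kept open
    max_overflow=10,     # extra connections allowed under burst
    pool_timeout=30,     # seconds to wait for a free connection
    pool_recycle=1800,   # recycle connections before the server drops them
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))  # connection returns to the pool on exit
```

The point is the bound: under a traffic spike, requests queue for a connection instead of opening 10,000 of them.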
"Just add an index" is the answer to half of all performance problems.
No index on a commonly queried field = full table scan = O(n) = system dies as data grows.
Add index = O(log n) = system keeps working.
Review slow queries weekly. Add indexes. Remove unused indexes. Basic hygiene.
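As an illustration, here is what diagnosing and fixing a missing index might look like in Postgres via SQLAlchemy. The `orders` table and `customer_id` column are hypothetical:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:secret@db-host:5432/appdb")  # placeholder URL

with engine.begin() as conn:
    # Before the index, EXPLAIN reports a sequential scan:
    # every row examined on every query.
    for row in conn.execute(
        text("EXPLAIN SELECT * FROM orders WHERE customer_id = :cid"), {"cid": 42}
    ):
        print(row[0])

    # One statement turns O(n) scans into O(log n) B-tree lookups.
    conn.execute(
        text("CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders (customer_id)")
    )
```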
Not everything needs to hit the database. User profiles that change rarely? Cache them. Reference data? Cache it. Expensive calculations? Cache the results.
Cache invalidation is famously hard. But "no caching" is worse than "imperfect caching" at scale.
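A read-through cache sketch with redis-py, assuming a Redis instance is available; `load_profile_from_db` and the TTL are stand-ins:

```python
import json
import redis  # redis-py

r = redis.Redis(host="localhost", port=6379)

PROFILE_TTL = 300  # seconds; stale-but-bounded beats hammering the database

def get_profile(user_id: int) -> dict:
    """Read-through cache: try Redis first, fall back to the database."""
    key = f"profile:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    profile = load_profile_from_db(user_id)
    # Expiry instead of perfect invalidation: imperfect, but good enough.
    r.setex(key, PROFILE_TTL, json.dumps(profile))
    return profile

def load_profile_from_db(user_id: int) -> dict:
    # Stand-in for the real query.
    return {"id": user_id, "name": "example"}
```

The TTL is the pragmatic answer to invalidation: accept data that's at most five minutes stale rather than building a perfect invalidation scheme on day one.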
User submits order → process payment → update inventory → send email → generate invoice → update analytics.
If all of this is synchronous, the user waits. If any step fails, everything fails.
Async: process the payment, respond to the user, and queue the rest. Inventory, email, invoice, and analytics run in the background; if one fails, it can retry without failing the order.
Faster response, better resilience.
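A minimal in-process sketch of the pattern using the standard library. In production the queue would be a durable broker (the message queue in the architecture below), and the step functions are stand-ins:

```python
import queue
import threading

work_queue: queue.Queue = queue.Queue()

def handle_order(order: dict) -> str:
    charge_payment(order)            # the only step the user must wait for
    work_queue.put(("inventory", order))
    work_queue.put(("email", order))
    work_queue.put(("invoice", order))
    work_queue.put(("analytics", order))
    return "order accepted"          # respond now; the rest happens in the background

def worker() -> None:
    while True:
        task, order = work_queue.get()
        try:
            HANDLERS[task](order)
        except Exception as exc:     # log and keep the worker alive
            print(f"{task} failed for order {order.get('id')}: {exc}")
        finally:
            work_queue.task_done()

# Stand-ins for the real steps.
def charge_payment(order): ...
def update_inventory(order): ...
def send_email(order): ...
def generate_invoice(order): ...
def update_analytics(order): ...

HANDLERS = {
    "inventory": update_inventory,
    "email": send_email,
    "invoice": generate_invoice,
    "analytics": update_analytics,
}

threading.Thread(target=worker, daemon=True).start()
```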
Without rate limits, one bad actor can take down your system. Or one buggy client. Or one legitimate user doing something unexpected.
Rate limits protect you from all three:
- Bad actors hammering your endpoints
- Buggy clients stuck in retry loops
- Legitimate users doing something you didn't anticipate
Add rate limits early. Increase them as needed. Never remove them.
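A minimal token-bucket limiter sketch; the rate and burst numbers are arbitrary:

```python
import threading
import time

class TokenBucket:
    """Token bucket: `rate` requests/second sustained, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill according to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

# One bucket per client: 5 req/s sustained, bursts of 10.
limiter = TokenBucket(rate=5, capacity=10)
if not limiter.allow():
    pass  # reject with HTTP 429 in a real handler
```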
"Don't prematurely optimize." True.
"Don't design for scale you'll never reach." True.
But also: design so that 10x growth doesn't require a rewrite.
Questions to ask:
- What breaks first at 10x traffic?
- What breaks first at 10x data?
- Which of today's decisions would take a rewrite to undo?
You don't need to build for 10x. But you need to know what 10x requires.
Database becomes the bottleneck. It always does eventually. Solutions: read replicas, sharding, caching layer in front.
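A sketch of read/write routing with SQLAlchemy; the hostnames are placeholders, and replication lag means a read immediately after a write may be briefly stale:

```python
from sqlalchemy import create_engine, text

# Hypothetical URLs: one primary for writes, one read replica.
primary = create_engine("postgresql://app:secret@db-primary:5432/appdb")
replica = create_engine("postgresql://app:secret@db-replica:5432/appdb")

def run_read(sql: str, **params):
    # Reads go to the replica; tolerate slightly stale data.
    with replica.connect() as conn:
        return conn.execute(text(sql), params).fetchall()

def run_write(sql: str, **params):
    # Writes always go to the primary.
    with primary.begin() as conn:
        conn.execute(text(sql), params)

# run_write("UPDATE users SET name = :n WHERE id = :i", n="Ada", i=1)
# rows = run_read("SELECT name FROM users WHERE id = :i", i=1)  # may lag the write
```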
Third-party services become unreliable. That API that worked fine at low volume? It'll rate limit you or go down. Solutions: circuit breakers, fallbacks, queuing.
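A stripped-down circuit breaker sketch. Real libraries add a proper half-open state; here, one trial request is let through after the cooldown. `fetch_exchange_rates` is hypothetical:

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, fail fast for `cooldown`
    seconds instead of hammering a dying service."""

    def __init__(self, threshold: int = 5, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, fallback=None, **kwargs):
        if self.failures >= self.threshold:
            if time.monotonic() - self.opened_at < self.cooldown:
                return fallback  # circuit open: skip the call entirely
            self.failures = 0    # cooldown elapsed: allow one attempt through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.opened_at = time.monotonic()
            return fallback      # degrade gracefully instead of erroring out
        self.failures = 0
        return result

breaker = CircuitBreaker()
# rates = breaker.call(fetch_exchange_rates, fallback=CACHED_RATES)
```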
Monitoring becomes essential. At low scale, you can notice problems. At high scale, you need dashboards, alerts, and logging (see A Manifest for Better Logging). Build observability before you need it.
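One possible shape of a structured log line, which is what dashboards and alerts get built from; the field names are illustrative:

```python
import json
import logging
import time

logger = logging.getLogger("api")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_request(route: str, status: int, duration_ms: float) -> None:
    # One machine-parseable line per request: queryable, graphable, alertable.
    logger.info(json.dumps({
        "event": "request",
        "route": route,
        "status": status,
        "duration_ms": round(duration_ms, 1),
        "ts": time.time(),
    }))

log_request("/orders", 201, 42.3)
```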
Simple bugs become critical. A memory leak that takes 12 hours to matter? At scale, with more processes and more load, it matters much sooner. That N+1 query? Multiplied across thousands of requests.
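The N+1 shape, and its fix, in a hypothetical orders schema:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://app:secret@db-host:5432/appdb")  # placeholder URL

with engine.connect() as conn:
    # N+1: one query for the orders, then one more query per order.
    orders = conn.execute(text("SELECT id FROM orders WHERE status = 'open'")).fetchall()
    for (order_id,) in orders:
        conn.execute(
            text("SELECT * FROM order_items WHERE order_id = :oid"), {"oid": order_id}
        )

    # Fix: one query with a join -- constant query count regardless of result size.
    conn.execute(text(
        "SELECT o.id, i.* FROM orders o "
        "JOIN order_items i ON i.order_id = o.id "
        "WHERE o.status = 'open'"
    ))
```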
Before launching anything that might see real traffic:

Database:
- Indexes on every commonly queried field
- Connection pool configured with sane limits
- Slow queries reviewed regularly

Application:
- Non-critical work moved off the request path and onto a queue
- Caching in front of expensive or rarely-changing reads
- No N+1 queries on hot paths

Infrastructure:
- Stateless API servers behind a load balancer
- Message queue and workers for background processing
- A plan for read replicas when the database saturates

Observability:
- Dashboards for key metrics
- Alerts that fire before users notice
- Structured logging in place

Protection:
- Rate limits on every public endpoint
- Circuit breakers and fallbacks around third-party calls
Premature optimization is bad. But so is ignoring warning signs.
Scale when:
- Metrics show latency or error rates climbing with load
- A resource is measurably approaching its limit
- Real usage data says you'll hit a wall soon

Don't scale when:
- You're anticipating traffic you have no evidence for
- The bottleneck is a guess rather than a measurement
- Fear of hypothetical traffic is the only driver
Real scaling decisions are based on data, not fear.
After building several systems that scaled well, this is my default shape:
[CDN / Edge]
↓
[Load Balancer]
↓
[API Servers] (stateless, horizontally scalable)
↓
[Message Queue] (async work)
↓
[Workers] (background processing)
↓
[Cache Layer] (Redis/Memcached)
↓
[Database] (with read replicas)
Nothing revolutionary. Just solid fundamentals.
The teams that scale well aren't doing magic. They're doing the boring things consistently.
Scaling is earned through discipline, not cleverness. The systems that scale are the ones where someone cared about the boring parts: connection pools, indexes, caching, async processing, rate limits.
Not exciting. Just necessary.