Jun 8, 2024 · 5 min read

Building Products That Scale (Without Dying)

I've built systems that scaled to millions of users and systems that collapsed at thousands. The difference is boring.

Building at Scale

Scaling is unsexy. Nobody wants to talk about database indexes and connection pools. Everyone wants to talk about features and AI.

Then load increases, the system falls over, and suddenly scaling is the only thing anyone talks about.

I've been on both sides. Here's what I've learned.

Scale Problems Are Architecture Problems

You don't fix scale by throwing hardware at it. Well, sometimes you do. But usually, scale problems reveal architecture flaws.

Common patterns:

  • System works fine at 100 users, dies at 1,000 → synchronous dependencies
  • Works fine for months, dies during peak event → no capacity planning
  • Works fine for writes, dies for reads → missing read replicas or caching
  • Works fine for simple queries, dies for complex ones → N+1 queries or missing indexes (sketched just below)

The fix isn't more servers. It's fixing the architecture.
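
Take the N+1 pattern from that list. Here's a minimal sketch of what it looks like and what the fix reads like (the orders/users tables and the SQLite connection are purely illustrative, not from any real codebase):

import sqlite3

# N+1: one query for the orders, then one more query per order for its user.
def get_orders_with_users_slow(conn: sqlite3.Connection):
    orders = conn.execute("SELECT id, user_id, total FROM orders").fetchall()
    return [
        (order, conn.execute(
            "SELECT name FROM users WHERE id = ?", (order[1],)
        ).fetchone())
        for order in orders
    ]  # 1 + N queries: fine at 100 rows, painful at 100,000

# The fix: one query with a join does the same work in a single round trip.
def get_orders_with_users(conn: sqlite3.Connection):
    return conn.execute(
        "SELECT o.id, o.total, u.name "
        "FROM orders o JOIN users u ON u.id = o.user_id"
    ).fetchall()

One query per row is invisible at demo scale and fatal at production scale; the join costs the same no matter how many rows come back.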

The Boring Things That Matter

1. Connection Pooling

Every time you open a database connection, there's overhead. Open too many, and the database falls over.

Connection pools reuse connections. This is not exciting. It's essential.

I've seen production outages because someone forgot to configure the connection pool. 10,000 simultaneous connections → database melts.
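
As a sketch of what "configured" actually looks like, here's a pooled engine with SQLAlchemy (the connection URL and pool sizes are placeholders, not recommendations):

from sqlalchemy import create_engine, text

# One engine per process; it owns the pool. Connections are borrowed and
# returned, not opened per request.
engine = create_engine(
    "postgresql://app:secret@db-host/app",   # placeholder URL
    pool_size=10,        # steady-state connections kept open
    max_overflow=5,      # temporary extras under burst load
    pool_timeout=30,     # seconds to wait for a free connection before erroring
    pool_pre_ping=True,  # check the connection is alive before handing it out
)

def fetch_user(user_id: int):
    # "with" borrows a pooled connection and returns it on exit.
    with engine.connect() as conn:
        row = conn.execute(
            text("SELECT id, name FROM users WHERE id = :id"), {"id": user_id}
        ).fetchone()
    return row

The exact numbers matter less than the fact that there's a ceiling at all: the database sees a bounded, reused set of connections instead of one per request.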

2. Indexes

"Just add an index" is the answer to half of all performance problems.

No index on a commonly queried field = full table scan = O(n) = system dies as data grows.

Add index = O(log n) = system keeps working.

Review slow queries weekly. Add indexes. Remove unused indexes. Basic hygiene.
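
A self-contained way to see the difference is SQLite's query planner (the table and column names here are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# Without an index: the planner scans the whole table for every lookup.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42"
).fetchall()
print(plan)  # "SCAN ... orders"  -> O(n), grows with the table

# With an index: the same query becomes a B-tree search.
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42"
).fetchall()
print(plan)  # "SEARCH ... USING INDEX idx_orders_user_id"  -> O(log n)

Every serious database has an equivalent of EXPLAIN; running it on your slowest queries is most of that weekly hygiene.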

3. Caching

Not everything needs to hit the database. User profiles that change rarely? Cache them. Reference data? Cache it. Expensive calculations? Cache the results.

Cache invalidation is famously hard. But "no caching" is worse than "imperfect caching" at scale.
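
A minimal in-process sketch of the idea (production systems would more likely put this in Redis or Memcached so instances share the cache; the profile loader below is a stand-in):

import time
from functools import wraps

def ttl_cache(seconds: float):
    """Cache a function's results in memory for a fixed time-to-live."""
    def decorator(fn):
        store = {}  # args -> (expires_at, value)

        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit and hit[0] > now:
                return hit[1]                      # still fresh: skip the database
            value = fn(*args)                      # miss or stale: recompute
            store[args] = (now + seconds, value)
            return value
        return wrapper
    return decorator

def load_profile_from_db(user_id: int) -> dict:
    return {"id": user_id, "name": "…"}            # stand-in for the real query

@ttl_cache(seconds=300)
def get_user_profile(user_id: int) -> dict:
    return load_profile_from_db(user_id)           # hits the database at most once per 5 minutes per user

The TTL is the "imperfect" part: profiles can be up to five minutes stale. At scale that trade is almost always worth it.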

4. Async Processing

User submits order → process payment → update inventory → send email → generate invoice → update analytics.

If all of this is synchronous, the user waits. If any step fails, everything fails.

Async:

  • User submits order → respond immediately
  • Queue the rest for background processing
  • If email fails, retry later

Faster response, better resilience.
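
A bare-bones sketch of that shape using only the standard library (a real system would use a durable queue like RabbitMQ or SQS plus a job framework; the handlers here are placeholders):

import queue
import threading

# Placeholders so the sketch runs; real versions would call payment, email,
# and invoicing services.
def charge_payment(order: dict) -> str:
    return "pay_123"

def process(job: dict) -> None:
    print("processed", job["kind"])

jobs: queue.Queue = queue.Queue()

def worker():
    while True:
        job = jobs.get()
        try:
            process(job)          # send the email, build the invoice, etc.
        except Exception:
            jobs.put(job)         # crude retry; real queues add backoff and dead-lettering
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_order(order: dict) -> dict:
    # Do only what the user has to wait for, then queue the rest.
    payment_id = charge_payment(order)
    for kind in ("send_email", "generate_invoice", "update_analytics"):
        jobs.put({"kind": kind, "order": order})
    return {"status": "accepted", "payment_id": payment_id}   # respond immediately

The user waits for the payment and nothing else; a failed email retries in the background instead of failing the whole order.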

5. Rate Limiting

Without rate limits, one bad actor can take down your system. Or one buggy client. Or one legitimate user doing something unexpected.

Rate limits protect you from:

  • DDoS attacks
  • Runaway loops
  • Expensive operations overwhelming resources
  • Bill shock from API costs

Add rate limits early. Increase them as needed. Never remove them.
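
A minimal token-bucket sketch, one bucket per client (in production this would more likely live at the gateway or in Redis so every instance sees the same counts):

import time

class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429

# One bucket per client key (API key, user id, IP...). Numbers are illustrative.
buckets: dict[str, TokenBucket] = {}

def check_rate_limit(client_key: str) -> bool:
    bucket = buckets.setdefault(client_key, TokenBucket(rate=5, capacity=10))
    return bucket.allow()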

Design for 10x

"Don't prematurely optimize." True.

"Don't design for scale you'll never reach." True.

But also: design so that 10x growth doesn't require a rewrite.

Questions to ask:

  • If traffic 10x'd tomorrow, what breaks first?
  • Can we add capacity by adding machines, or do we need architectural changes?
  • Where are the single points of failure?

You don't need to build for 10x. But you need to know what 10x requires.

What Actually Happens at Scale

Database becomes the bottleneck. It always does eventually. Solutions: read replicas, sharding, caching layer in front.

Third-party services become unreliable. That API that worked fine at low volume? It'll rate limit you or go down. Solutions: circuit breakers, fallbacks, queuing.
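
For the circuit-breaker part, a bare-bones sketch (thresholds are arbitrary, and most ecosystems have libraries that do this better):

import time

class CircuitBreaker:
    """Stop calling a failing dependency for a while instead of piling on."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.failures >= self.max_failures:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping call")   # fail fast
            self.failures = self.max_failures - 1   # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()   # (re)open the circuit
            raise
        self.failures = 0                           # success closes the circuit
        return result

payments_breaker = CircuitBreaker()
# payments_breaker.call(payment_client.charge, order)   # payment_client is hypothetical

When the breaker is open you fail fast, serve a fallback, or queue the work, instead of letting a dying dependency tie up every request thread.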

Monitoring becomes essential. At low scale, you can notice problems. At high scale, you need dashboards, alerts, and logging (see A Manifest for Better Logging). Build observability before you need it.
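
For the logging piece, a small structured-logging sketch using the standard library (field names are arbitrary; the point is logs you can query rather than grep):

import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
            # Fields passed via `extra=` end up as attributes on the record.
            **{k: v for k, v in record.__dict__.items()
               if k in ("request_id", "user_id", "latency_ms")},
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])

logging.getLogger("api").info(
    "order accepted", extra={"request_id": "req-123", "latency_ms": 87}
)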

Simple bugs become critical. A memory leak that takes 12 hours to matter? At scale you have more processes and more traffic, so it matters far sooner. That N+1 query? Multiplied across thousands of requests.

My Scaling Checklist

Before launching anything that might see real traffic:

Database:

  • [ ] Connection pooling configured
  • [ ] Indexes on queried columns
  • [ ] Query analyzer run on slow queries
  • [ ] Read replica for read-heavy loads (or plan for it)

Application:

  • [ ] Stateless where possible
  • [ ] Heavy work moved to background jobs
  • [ ] Circuit breakers on external dependencies
  • [ ] Graceful degradation when services fail

Infrastructure:

  • [ ] Horizontal scaling possible (can add more instances)
  • [ ] Load balancer in front
  • [ ] Health checks configured
  • [ ] Auto-scaling rules defined

Observability:

  • [ ] Key metrics dashboarded
  • [ ] Alerts on critical thresholds
  • [ ] Logs structured and queryable
  • [ ] Distributed tracing for request flows

Protection:

  • [ ] Rate limiting on all endpoints (critical for AI systems too)
  • [ ] Authentication/authorization checked
  • [ ] Input validation everywhere
  • [ ] DDoS protection at edge

When to Actually Scale

Premature optimization is bad. But so is ignoring warning signs.

Scale when:

  • P95 latency is creeping up
  • Database CPU consistently above 70%
  • You're hitting rate limits on dependencies
  • Error rate is increasing with traffic

Don't scale when:

  • "We might get featured on TechCrunch"
  • "What if we go viral?"
  • "Better safe than sorry"

Real scaling decisions are based on data, not fear.

The Architecture That Scales

After building several systems that scaled well, this is the shape I keep coming back to:

[CDN / Edge]
    ↓
[Load Balancer]
    ↓
[API Servers] (stateless, horizontally scalable)
    ↓
[Message Queue] (async work)
    ↓
[Workers] (background processing)
    ↓
[Cache Layer] (Redis/Memcached)
    ↓
[Database] (with read replicas)

Nothing revolutionary. Just solid fundamentals.

The teams that scale well aren't doing magic. They're doing the boring things consistently.


Scaling is earned through discipline, not cleverness. The systems that scale are the ones where someone cared about the boring parts: connection pools, indexes, caching, async processing, rate limits.

Not exciting. Just necessary.