Break It Down

API Rate Limiting: The Nightclub Bouncer Mental Model

Qentium Team · Nov 25, 2024 · 4 min read

Think of your API as a nightclub. Without rate limiting, it's like a club with no bouncer, no capacity limit, and no line management.

Sounds fun? It's not. The club gets overcrowded, the music stops, everyone's unhappy.

That's where the bouncer comes in.

The Nightclub Analogy

No Limit = Disaster

Without rate limiting, anyone can hit your API as hard as they want. One user accidentally (or intentionally) sends 10,000 requests per second. Your servers choke. Everyone suffers.

When it's intentional, this is a "Denial of Service" (DoS) attack. When it's accidental, it's not an attack at all, but the effect on your servers is the same.

The Bouncer (Rate Limiter)

The bouncer decides:

  • **How many people can enter at once** (requests per second)
  • **How many people can wait in line** (queue length)
  • **What happens when the line is full** (429 Too Many Requests)

Good rate limiting protects everyone. It lets legitimate users in while keeping bad actors out.

How Rate Limiting Works

Token Bucket

Imagine a bucket of tokens. Every request costs a token. Tokens refill at a fixed rate.

  • **Burst:** You can use up to the bucket size instantly
  • **Sustained:** After the bucket empties, you're limited by the refill rate

This handles traffic spikes well while preventing long-term abuse.
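A minimal token bucket sketch in Python (the class name, parameters, and the `allow` method are illustrative, not a standard API):

```python
import time

class TokenBucket:
    """Token bucket: capacity allows bursts; refill_rate caps sustained throughput."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max tokens = burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity          # start full
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=5, refill_rate=1.0)  # burst of 5, then 1 req/s
results = [bucket.allow() for _ in range(6)]
# The first 5 calls pass (the burst); the 6th is rejected until tokens refill
```

Note the two parameters map directly to the bullets above: `capacity` is the burst, `refill_rate` is the sustained limit.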

Leaky Bucket

Requests go into a "bucket" and leak out at a fixed rate. Like water in a leaky bucket.

Great for smoothing out traffic. If requests come in bursts, they queue up and process at a steady rate.
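One way to sketch the leaky bucket is as a bounded queue that drains at a fixed rate (again, names and parameters here are illustrative):

```python
import time
from collections import deque

class LeakyBucket:
    """Leaky bucket: requests queue up and 'leak out' at a steady rate."""

    def __init__(self, capacity: int, leak_rate: float):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self) -> None:
        now = time.monotonic()
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()  # these requests get processed
            self.last_leak = now

    def allow(self, request) -> bool:
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # bucket overflowed: shed the request

bucket = LeakyBucket(capacity=3, leak_rate=2.0)
accepted = [bucket.allow(i) for i in range(5)]
# The queue holds 3; the 4th and 5th requests in the burst are shed
```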

Fixed Window

Count requests in fixed time windows (e.g., "100 requests per minute"). Simple to implement, but leaky at window boundaries: a client can burst at the end of one window and again at the start of the next, getting up to double the limit in a short span.
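A fixed-window counter sketch that also demonstrates the boundary problem (class and parameter names are illustrative; timestamps are passed in explicitly to make the example deterministic):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed window: count requests per client per window; counts reset at boundaries."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)  # (client, window_start) -> request count

    def allow(self, client, now=None) -> bool:
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        key = (client, window_start)
        if self.counts[key] < self.limit:
            self.counts[key] += 1
            return True
        return False

limiter = FixedWindowLimiter(limit=2, window_seconds=60)
# Boundary edge case: 2 requests at t=59 and 2 at t=61 land in different
# windows, so all 4 pass within 2 seconds despite a "2 per minute" limit.
burst = [limiter.allow("alice", t) for t in (59, 59, 61, 61)]
```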

Sliding Window

More accurate than fixed window. Tracks requests in a rolling time window. Harder to implement but fairer.
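One common implementation is the sliding window *log*: store the timestamp of each accepted request and expire entries as the window rolls forward. A sketch (illustrative names, deterministic timestamps):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Sliding window log: track timestamps of recent requests in a rolling window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests

    def allow(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Expire timestamps that have rolled out of the window
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False

limiter = SlidingWindowLog(limit=2, window_seconds=60)
r = [limiter.allow(t) for t in (59, 59, 61, 61)]
# Unlike the fixed window, the t=59 requests still count at t=61,
# so the boundary burst is rejected.
```

The trade-off: this is fair and accurate, but storing one timestamp per request costs more memory than a single counter.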

What to Return

When a user hits the limit:

**429 Too Many Requests** — the standard response. Include headers:

  • `Retry-After`: how long to wait before retrying
  • `X-RateLimit-Remaining`: requests left in the current window
  • `X-RateLimit-Limit`: total requests allowed

Don't just drop requests! Tell users when they can try again.
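A small helper that assembles those headers for a 429 response (the function name and signature are hypothetical; the header names are the ones listed above):

```python
import math

def rate_limit_headers(limit: int, remaining: int, reset_at: float, now: float) -> dict:
    """Build standard rate-limit headers; reset_at and now are Unix timestamps."""
    return {
        # Whole seconds until the client may retry, rounded up, never negative
        "Retry-After": str(max(0, math.ceil(reset_at - now))),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
    }

# A user who exhausted a 100-request limit, 29.5 seconds before the window resets:
headers = rate_limit_headers(limit=100, remaining=0, reset_at=1700.0, now=1670.5)
# headers["Retry-After"] == "30"
```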

Strategy by Use Case

Public APIs: Strict limits. 100-1000 requests/hour. Require authentication.

Internal APIs: Looser limits. Trust your colleagues (a bit).

Third-party integrations: Tiered limits. Pay more, get more.

Login/Auth: Tightest limits. This is where brute-force attacks happen.
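These use cases can be captured in a per-tier policy table; a hypothetical config sketch (all names and numbers are illustrative, not recommendations for your specific API):

```python
# Hypothetical per-tier rate-limit policy: limit per window, in seconds
RATE_LIMITS = {
    "public":   {"limit": 1_000,  "window_seconds": 3_600, "auth_required": True},
    "internal": {"limit": 10_000, "window_seconds": 3_600, "auth_required": False},
    "partner":  {"limit": 50_000, "window_seconds": 3_600, "auth_required": True},
    "login":    {"limit": 5,      "window_seconds": 300,   "auth_required": False},
}
```

Note that the login endpoint gets by far the tightest budget: five attempts per five minutes is plenty for a human, and hostile to a brute-forcer.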

The Philosophy

Rate limiting isn't about blocking users. It's about fairness.

The goal isn't "let in as few people as possible." It's "let in as many people as possible without destroying the experience."

Be generous with your limits. Be clear about when you're not. And always, always tell users why they were blocked.

That's how you keep the party going.