43 — LLD: Rate Limiter
Frequently Asked at Salesforce — Reported repeatedly in SMTS technical rounds (2024-2026).
Understanding the Problem
A rate limiter caps how many requests a given caller can make in a given window. It returns allowed=true or allowed=false with a retryAfterMs hint. The interesting bit is choosing the algorithm: fixed window, sliding window log, or token bucket, each with different burst semantics.
What is rate limiting?
It is admission control at the gate of your service. Instead of your downstream systems buckling under a traffic spike, the limiter rejects the overflow early and tells the caller when to try again. The algorithm choice is about how much burstiness you tolerate.
Requirements
Clarifying Questions
You: In-memory per node, or distributed?
Interviewer: In-memory for v1, but design the interface so distributed is swappable.
Keep a narrow interface so a Redis-backed implementation drops in later.
You: Hard reject or queue?
Interviewer: Hard reject with retry-after hint.
No queue. Return rejection atomically.
You: Limits dynamic per user?
Interviewer: Yes — free tier 60/min, paid 600/min.
Introduce a LimitResolver so the rate limiter itself does not know the business rules.
You: Per endpoint too?
Interviewer: Yes, the key should be (userId, endpoint).
Key is a composite string built by the caller.
Final Requirements
- tryAcquire(key, now) returns {allowed, retryAfterMs}.
- Three algorithm choices: fixed window, sliding window log, token bucket.
- Pluggable via a common interface.
- Thread-safe per key.
- Configurable limit and window.
- Friendly to distributed follow-up (Redis).
Out of scope: cost-based limiting (not all requests are equal), queueing, cross-region coordination.
Core Entities and Relationships
| Entity | Responsibility |
|---|---|
| RateCheck | {allowed: bool, retryAfterMs: int}. |
| RateLimiter | Interface. tryAcquire(key, now). |
| FixedWindowLimiter | Counter reset per window boundary. |
| SlidingWindowLogLimiter | Deque of timestamps; evict older than window. |
| TokenBucketLimiter | Refill over time; spend 1 per acquire. |
| LimitConfig | limit, windowMs, refillPerMs. |
| LimitResolver | Returns config for a given key. |
| RateLimiterFactory | Builds a limiter given a policy enum. |
Why a Factory? So callers do not know or care which algorithm is wired — they pass "TOKEN_BUCKET" in config.
Why LimitResolver separate? Keeps the limiter pure. The resolver is where tier lookup (free vs paid) happens.
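A minimal sketch of that split in TypeScript — the resolver's responsibility is stated above, but the tier lookup and the "userId:endpoint" key format are assumptions for illustration:

```typescript
interface LimitConfig { limit: number; windowMs: number; refillPerMs?: number; }
interface LimitResolver { resolve(key: string): LimitConfig; }

// Hypothetical tier-aware resolver: free tier 60/min, paid 600/min, per
// the requirements above. The paid-user set stands in for a real tier store.
class TierLimitResolver implements LimitResolver {
  constructor(private paidUsers: Set<string>) {}
  resolve(key: string): LimitConfig {
    const userId = key.split(":")[0]; // assumes the caller builds "userId:endpoint"
    return this.paidUsers.has(userId)
      ? { limit: 600, windowMs: 60_000 }
      : { limit: 60, windowMs: 60_000 };
  }
}
```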
Design is iterative — each algorithm is about 20 lines once the interface exists.
Class Design
RateLimiter (interface)
Methods:
tryAcquire(key, now) -> RateCheck
FixedWindowLimiter
State:
limit: int, windowMs: int
counters: Map<key, {windowStart, count}>
SlidingWindowLogLimiter
State:
limit: int, windowMs: int
log: Map<key, Deque<timestamp>>
TokenBucketLimiter
State:
capacity: int, refillPerMs: float
buckets: Map<key, {tokens, lastRefill}>
Design principles at play:
- Strategy Pattern — three algorithms behind one interface.
- Factory Pattern — RateLimiterFactory.create(policy, config).
- Open/Closed — adding a fourth algorithm (GCRA, leaky bucket) is a new class, zero edits (see the sketch below).
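To make the Open/Closed claim concrete, here is a hedged sketch of one such fourth algorithm, GCRA, against the same interface (the RateCheck/RateLimiter shapes match the TypeScript definitions later in this chapter; the tolerance formula is one common GCRA variant, not the only one):

```typescript
// GCRA keeps one number per key: the theoretical arrival time (TAT) of
// the next conforming request. No counters, no token refill math.
class GcraLimiter implements RateLimiter {
  private tat = new Map<string, number>();
  private emissionMs: number; // ideal spacing between requests

  constructor(private limit: number, windowMs: number) {
    this.emissionMs = windowMs / limit;
  }

  tryAcquire(key: string, now = Date.now()): RateCheck {
    const tat = Math.max(this.tat.get(key) ?? now, now);
    // Permit a burst of `limit` back-to-back requests, then pace at one
    // request per emissionMs — equivalent to a full token bucket.
    const toleranceMs = this.emissionMs * (this.limit - 1);
    if (tat - now > toleranceMs) {
      return { allowed: false, retryAfterMs: Math.ceil(tat - now - toleranceMs) };
    }
    this.tat.set(key, tat + this.emissionMs);
    return { allowed: true, retryAfterMs: 0 };
  }
}
```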
Implementation
Core Logic
Fixed window — cheapest but spiky at boundaries. A user at the edge can make 2*limit requests in a second if they straddle the boundary.
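A quick way to see this boundary burst, using the TypeScript FixedWindowLimiter defined later in this chapter (the timestamps are fabricated for the example):

```typescript
// limit 5 per 1000ms window; windows are [0,1000) and [1000,2000).
const fw = new FixedWindowLimiter(5, 1000);
for (let i = 0; i < 5; i++) fw.tryAcquire("alice", 999);  // all 5 allowed
for (let i = 0; i < 5; i++) fw.tryAcquire("alice", 1001); // fresh window: 5 more
// 10 requests in 2ms — double the nominal rate. Sliding log or token
// bucket would have rejected the second burst.
```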
Sliding window log — most accurate but memory is O(limit) per key. Great for small limits, bad for high-throughput endpoints.
Token bucket — allows bursts up to capacity, then smooths to refill rate. Most production APIs use this.
Verification — Token Bucket
Capacity 5, refill 1 per second.
- t=0, tryAcquire(alice) → bucket {tokens=5, lastRefill=0}. Elapsed 0. Tokens stay 5. Spend 1 → 4. Allowed.
- t=100ms, tryAcquire(alice) → elapsed 100ms, refill 0.1 → tokens 4.1. Spend 1 → 3.1. Allowed.
- Rapid 5 more calls within a second → tokens drop below 1 → allowed=false, retryAfterMs = ceil((1 - tokens) / refillPerMs).
- Wait 1s → tokens climb back; next call allowed.
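The same trace as a runnable check against the TypeScript TokenBucketLimiter defined below (capacity 5, refillPerMs 0.001, i.e., 1 token per second; the small tolerance on the retry hint absorbs floating-point noise):

```typescript
const tb = new TokenBucketLimiter(5, 0.001); // 1 token per second
console.assert(tb.tryAcquire("alice", 0).allowed);       // tokens 5 -> 4
console.assert(tb.tryAcquire("alice", 100).allowed);     // +0.1 refill, 4.1 -> 3.1
for (let i = 0; i < 3; i++) tb.tryAcquire("alice", 100); // 3.1 -> 0.1
const r = tb.tryAcquire("alice", 100);                   // 0.1 < 1 -> reject
console.assert(!r.allowed && Math.abs(r.retryAfterMs - 900) <= 1); // ~ceil((1-0.1)/0.001)
console.assert(tb.tryAcquire("alice", 1100).allowed);    // 1s later: refilled past 1
```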
Thread Safety
- Per-key entries must be read, mutated, and written atomically.
- In Java, ConcurrentHashMap.compute(key, fn) locks only that bin — clean and fast.
- Sliding log needs a per-key Deque under a lock, because while (...) arr.shift() plus push is a multi-step operation.
- Distributed version: Redis with a Lua script for atomic INCR + EXPIRE (fixed window), or ZADD + ZREMRANGEBYSCORE + ZCARD (sliding log). Lua guarantees atomicity on a single Redis node.
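For the fixed-window case, a minimal sketch of that Lua script embedded as a string in TypeScript — assuming a client such as ioredis whose eval takes (script, numKeys, ...keysAndArgs); the constant and variable names are illustrative:

```typescript
// Atomic fixed-window counter: INCR the key, set its TTL on first hit.
// KEYS[1] = rate key, ARGV[1] = window length in ms. Returns the count.
const FIXED_WINDOW_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return count
`;

// Caller side (sketch): the request is allowed if count <= limit.
// const count = await redis.eval(FIXED_WINDOW_LUA, 1, key, windowMs);
```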
Complete Code Implementation
```java
import java.util.*;
import java.util.concurrent.*;
record RateCheck(boolean allowed, long retryAfterMs) {}
interface RateLimiter { RateCheck tryAcquire(String key, long now); }
class FixedWindowLimiter implements RateLimiter {
private final long limit, windowMs;
private final ConcurrentHashMap<String, long[]> counters = new ConcurrentHashMap<>();
public FixedWindowLimiter(long limit, long windowMs) { this.limit = limit; this.windowMs = windowMs; }
public RateCheck tryAcquire(String key, long now) {
long windowStart = (now / windowMs) * windowMs;
long[] state = counters.compute(key, (k, v) -> {
// New key or new window: start a fresh counter at 1.
if (v == null || v[0] != windowStart) return new long[]{windowStart, 1};
v[1]++; // count every attempt, allowed or not, so allow/deny stays unambiguous
return v;
});
// The request fits if it is within the first `limit` attempts of this window.
if (state[1] <= limit) return new RateCheck(true, 0);
return new RateCheck(false, windowStart + windowMs - now);
}
}
class SlidingWindowLogLimiter implements RateLimiter {
private final int limit; private final long windowMs;
private final ConcurrentHashMap<String, Deque<Long>> log = new ConcurrentHashMap<>();
public SlidingWindowLogLimiter(int limit, long windowMs) { this.limit = limit; this.windowMs = windowMs; }
public RateCheck tryAcquire(String key, long now) {
Deque<Long> q = log.computeIfAbsent(key, k -> new ArrayDeque<>());
synchronized (q) {
long cutoff = now - windowMs;
while (!q.isEmpty() && q.peekFirst() <= cutoff) q.pollFirst();
if (q.size() < limit) { q.offerLast(now); return new RateCheck(true, 0); }
return new RateCheck(false, q.peekFirst() + windowMs - now);
}
}
}
class TokenBucketLimiter implements RateLimiter {
private final double capacity, refillPerMs;
private final ConcurrentHashMap<String, double[]> buckets = new ConcurrentHashMap<>();
public TokenBucketLimiter(double capacity, double refillPerMs) {
this.capacity = capacity; this.refillPerMs = refillPerMs;
}
public RateCheck tryAcquire(String key, long now) {
double[] res = new double[2];
buckets.compute(key, (k, v) -> {
double[] b = v == null ? new double[]{capacity, now} : v;
double elapsed = now - b[1];
b[0] = Math.min(capacity, b[0] + elapsed * refillPerMs);
b[1] = now;
if (b[0] >= 1) { b[0] -= 1; res[0] = 1; res[1] = 0; }
else { res[0] = 0; res[1] = (long) Math.ceil((1 - b[0]) / refillPerMs); }
return b;
});
return new RateCheck(res[0] == 1, (long) res[1]);
}
}
```

```cpp
#include <algorithm> // std::min
#include <chrono>
#include <cmath>     // std::ceil
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>
struct RateCheck { bool allowed; long long retryAfterMs; };
struct RateLimiter { virtual RateCheck tryAcquire(const std::string& key, long long now) = 0; };
class TokenBucketLimiter : public RateLimiter {
public:
TokenBucketLimiter(double cap, double r) : capacity(cap), refillPerMs(r) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
auto& b = buckets[key];
if (b.lastRefill == 0) { b.tokens = capacity; b.lastRefill = now; }
double elapsed = now - b.lastRefill;
b.tokens = std::min(capacity, b.tokens + elapsed * refillPerMs);
b.lastRefill = now;
if (b.tokens >= 1) { b.tokens -= 1; return {true, 0}; }
return {false, static_cast<long long>(std::ceil((1 - b.tokens) / refillPerMs))};
}
private:
struct Bucket { double tokens = 0; long long lastRefill = 0; };
double capacity, refillPerMs;
std::unordered_map<std::string, Bucket> buckets;
std::mutex mu;
};
class FixedWindowLimiter : public RateLimiter {
public:
FixedWindowLimiter(long long lim, long long w) : limit(lim), windowMs(w) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
long long windowStart = (now / windowMs) * windowMs;
auto& s = counters[key];
if (s.windowStart != windowStart) { s = {windowStart, 1}; return {true, 0}; }
if (s.count < limit) { s.count++; return {true, 0}; }
return {false, windowStart + windowMs - now};
}
private:
struct Counter { long long windowStart = -1; long long count = 0; };
long long limit, windowMs;
std::unordered_map<std::string, Counter> counters;
std::mutex mu;
};
class SlidingWindowLogLimiter : public RateLimiter {
public:
SlidingWindowLogLimiter(int lim, long long w) : limit(lim), windowMs(w) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
auto& q = log[key];
long long cutoff = now - windowMs;
while (!q.empty() && q.front() <= cutoff) q.pop_front();
if ((int)q.size() < limit) { q.push_back(now); return {true, 0}; }
return {false, q.front() + windowMs - now};
}
private:
int limit; long long windowMs;
std::unordered_map<std::string, std::deque<long long>> log;
std::mutex mu;
};
```

```typescript
interface RateCheck { allowed: boolean; retryAfterMs: number; }
interface RateLimiter { tryAcquire(key: string, now?: number): RateCheck; }
class FixedWindowLimiter implements RateLimiter {
private counters = new Map<string, { windowStart: number; count: number }>();
constructor(private limit: number, private windowMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const entry = this.counters.get(key);
const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
if (!entry || entry.windowStart !== windowStart) {
this.counters.set(key, { windowStart, count: 1 });
return { allowed: true, retryAfterMs: 0 };
}
if (entry.count < this.limit) { entry.count += 1; return { allowed: true, retryAfterMs: 0 }; }
return { allowed: false, retryAfterMs: windowStart + this.windowMs - now };
}
}
class SlidingWindowLogLimiter implements RateLimiter {
private log = new Map<string, number[]>();
constructor(private limit: number, private windowMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const arr = this.log.get(key) ?? [];
const cutoff = now - this.windowMs;
while (arr.length && arr[0] <= cutoff) arr.shift();
if (arr.length < this.limit) { arr.push(now); this.log.set(key, arr); return { allowed: true, retryAfterMs: 0 }; }
return { allowed: false, retryAfterMs: arr[0] + this.windowMs - now };
}
}
class TokenBucketLimiter implements RateLimiter {
private buckets = new Map<string, { tokens: number; lastRefill: number }>();
constructor(private capacity: number, private refillPerMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const b = this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: now };
const elapsed = now - b.lastRefill;
b.tokens = Math.min(this.capacity, b.tokens + elapsed * this.refillPerMs);
b.lastRefill = now;
if (b.tokens >= 1) { b.tokens -= 1; this.buckets.set(key, b); return { allowed: true, retryAfterMs: 0 }; }
this.buckets.set(key, b);
return { allowed: false, retryAfterMs: Math.ceil((1 - b.tokens) / this.refillPerMs) };
}
}
type Policy = "FIXED_WINDOW" | "SLIDING_LOG" | "TOKEN_BUCKET";
class RateLimiterFactory {
static create(policy: Policy, limit: number, windowMs: number): RateLimiter {
switch (policy) {
case "FIXED_WINDOW": return new FixedWindowLimiter(limit, windowMs);
case "SLIDING_LOG": return new SlidingWindowLogLimiter(limit, windowMs);
case "TOKEN_BUCKET": return new TokenBucketLimiter(limit, limit / windowMs);
}
}
}
```

Extensibility
1. "Burst allowance"
Token bucket already supports bursts via capacity greater than refillPerMs * windowMs. For fixed/sliding window, compose a second limiter with a shorter window and smaller limit — both must allow (see the composite sketch under item 3 below).
2. "Distributed across N nodes"
Move the state map to Redis. The shape of the interface survives, but tryAcquire becomes async (Promise-returning) once a network hop is involved; the Redis-backed impl uses a Lua script per call for atomicity. Local copies can optionally cache denials for a short TTL.
```typescript
// Sketch: a network-backed limiter returns Promise<RateCheck>, so it
// implements an async variant of the interface rather than RateLimiter.
class RedisTokenBucket {
  async tryAcquire(key: string, now = Date.now()): Promise<RateCheck> {
    // EVAL a script that reads the bucket, refills, decrements or rejects,
    // and writes back — all atomically on the Redis node.
    throw new Error("not implemented"); // sketch only
  }
}
```

3. "Per-IP plus per-user combined"
Compose two limiters; a request is allowed only if both allow. Return the larger retryAfterMs. Decorator pattern fits cleanly.
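A hedged sketch of that decorator (the Req and Guard shapes are illustrative; note this version consults every guard even after a rejection, so each one still records the attempt):

```typescript
type Req = { ip: string; userId: string };       // illustrative request shape
type Guard = { limiter: RateLimiter; keyOf: (req: Req) => string };

class CompositeLimiter {
  constructor(private guards: Guard[]) {}
  tryAcquire(req: Req, now = Date.now()): RateCheck {
    let allowed = true, retryAfterMs = 0;
    for (const g of this.guards) {
      const r = g.limiter.tryAcquire(g.keyOf(req), now);
      if (!r.allowed) {
        allowed = false;
        retryAfterMs = Math.max(retryAfterMs, r.retryAfterMs); // larger hint wins
      }
    }
    return { allowed, retryAfterMs };
  }
}

// Usage sketch: per-IP and per-user token buckets combined.
// const combined = new CompositeLimiter([
//   { limiter: new TokenBucketLimiter(100, 0.1), keyOf: r => `ip:${r.ip}` },
//   { limiter: new TokenBucketLimiter(600, 0.01), keyOf: r => `user:${r.userId}` },
// ]);
```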
What is Expected at Each Level
| Level | Expectations |
|---|---|
| Mid | One algorithm, in-memory, no retry-after. |
| Senior / SMTS | Three strategies, clean interface, retry-after, thread-safe per key, mentions distributed option. |
| Staff | Exact semantics of sliding-window-counter approximation, cost of Redis round-trip, fallback on Redis outage, GCRA or leaky bucket variants. |