15 — LLD: Rate Limiter (Class-Level Design)
Understanding the Problem
A rate limiter controls how many requests a client can make within a time window. It protects services from abuse, ensures fair usage, and prevents overload. This is the class-level (LLD) design — focused on the algorithms, interfaces, and data structures. For the distributed/HLD version, see 07 — HLD: Rate Limiter.
What makes this an LLD problem?
The HLD version asks "where does the rate limiter sit in the architecture?" This LLD version asks "how do you implement the rate limiting algorithm itself?" You are designing classes, interfaces, and the internal state management for algorithms like Token Bucket, Fixed Window, and Sliding Window.
Requirements
Clarifying Questions
You: Should I support multiple rate limiting algorithms, or just one?
Interviewer: Design it so the algorithm is swappable. Implement at least two: Token Bucket and Sliding Window.
You: Should rate limiting be per-user, per-endpoint, or configurable?
Interviewer: Configurable. You should support per-user, per-endpoint, or per-user-per-endpoint.
You: What is the API surface? A single `is_allowed(request)` method?
Interviewer: Yes. The rate limiter receives a request and returns whether it should be allowed or rejected. Also return how long until the client can retry.
You: Should I handle thread safety? Multiple requests from the same user could arrive concurrently.
Interviewer: Yes. The rate limiter must be safe for concurrent access.
You: Should the configuration be changeable at runtime? Like updating the rate limit for an endpoint?
Interviewer: That would be good. Discuss it at minimum.
Final Requirements
- `RateLimiter` interface with `isAllowed(key) -> RateLimitResult`.
- Swappable algorithms via Strategy pattern: Token Bucket, Fixed Window, Sliding Window.
- Configurable rate limit rules: per-user, per-endpoint, or composite keys.
- Thread-safe implementations.
- Return both allow/deny decision and retry-after time.
- Runtime-configurable limits.
Out of scope: Distributed rate limiting (Redis, shared state), HTTP middleware integration, metrics/monitoring dashboards.
Core Entities and Relationships
| Entity | Responsibility |
|---|---|
| `RateLimitResult` | Result of a rate limit check — allowed, remaining, retryAfter |
| `RateLimitAlgorithm` | Interface — implements the core algorithm logic |
| `TokenBucket` | Algorithm: steady refill of tokens, burst-friendly |
| `FixedWindowCounter` | Algorithm: count requests per fixed time window |
| `SlidingWindowLog` | Algorithm: exact count using timestamps of each request |
| `SlidingWindowCounter` | Algorithm: weighted average of current and previous window |
| `RateLimitConfig` | Configuration: max requests, window size, burst size |
| `RateLimitRule` | Maps a key pattern (user, endpoint) to a config |
| `RateLimiter` | Orchestrator: resolves the key, picks the right algorithm and config |
Why Strategy pattern? Different endpoints need different algorithms. An API endpoint with bursty traffic benefits from Token Bucket. A login endpoint needs strict per-window limits (Fixed Window). The Strategy pattern lets you swap algorithms without changing the rate limiter's public API.
Why separate RateLimitConfig from RateLimitAlgorithm? Config is data (numbers). Algorithm is behavior (logic). A single algorithm class can serve multiple keys, each with its own config. Separating them avoids creating one algorithm instance per key.
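To make the separation concrete, here is a minimal sketch of one algorithm instance serving two keys with different configs. The `CountingAlgorithm` stand-in and the trimmed interfaces are illustrative only — they are not part of the design in this section:

```typescript
interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
}
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
}
interface RateLimitAlgorithm {
  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}

// Minimal stand-in algorithm: a plain per-key counter with no time handling,
// just to show one instance holding per-key state while each call supplies
// its own config.
class CountingAlgorithm implements RateLimitAlgorithm {
  private counts = new Map<string, number>();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const used = this.counts.get(key) ?? 0;
    if (used < config.maxRequests) {
      this.counts.set(key, used + 1);
      return { allowed: true, remaining: config.maxRequests - used - 1 };
    }
    return { allowed: false, remaining: 0 };
  }
}

const algo = new CountingAlgorithm();
const freeTier: RateLimitConfig = { maxRequests: 2, windowSeconds: 60 };
const proTier: RateLimitConfig = { maxRequests: 100, windowSeconds: 60 };

// Same instance, separate per-key state, per-call config.
algo.tryAcquire("user:alice", freeTier);
algo.tryAcquire("user:bob", proTier);
```

One `CountingAlgorithm` serves both Alice and Bob; only the numbers differ per call, which is exactly why config stays out of the algorithm class.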
Class Design
RateLimitResult
State:
- `allowed: boolean`
- `remaining: number` — how many more requests are allowed in the current window
- `retryAfter: number` — seconds until the client can retry (0 if allowed)
- `limit: number` — the configured max requests
RateLimitAlgorithm (Interface)
Methods:
`tryAcquire(key: string, config: RateLimitConfig) -> RateLimitResult`
Every algorithm implements this single method. The key identifies the client/endpoint. The config provides the limits.
RateLimitConfig
State:
- `maxRequests: number` — maximum requests per window
- `windowSeconds: number` — time window duration
- `burstSize: number | undefined` — for Token Bucket, the burst capacity (defaults to `maxRequests`)
Final Class Design
interface RateLimitConfig {
maxRequests: number;
windowSeconds: number;
burstSize?: number; // For token bucket; defaults to maxRequests
}
interface RateLimitResult {
allowed: boolean;
remaining: number;
retryAfter: number; // seconds; 0 if allowed
limit: number;
}
interface RateLimitAlgorithm {
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}

Design principles at play:
- Strategy Pattern: Each algorithm is a concrete strategy. The rate limiter delegates to whichever strategy is configured for the given rule.
- Single Responsibility: `RateLimitConfig` holds numbers. `RateLimitAlgorithm` holds logic. `RateLimiter` holds routing/orchestration.
- Open/Closed: Adding a new algorithm (e.g., Leaky Bucket) means implementing one new class. Nothing else changes.
Implementation
Algorithm 1: Token Bucket
The Token Bucket algorithm works like this: imagine a bucket that holds tokens. Tokens are added at a steady rate (refill). Each request consumes one token. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst size), so tokens cannot accumulate beyond that.
Why Token Bucket? It naturally handles bursts. A client that was idle for a while has a full bucket and can make a burst of requests. This is good for APIs where traffic is spiky.
interface TokenBucketState {
tokens: number;
lastRefill: number;
}
class TokenBucket implements RateLimitAlgorithm {
private buckets: Map<string, TokenBucketState> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000; // seconds
const refillRate = config.maxRequests / config.windowSeconds;
const burst = config.burstSize ?? config.maxRequests;
if (!this.buckets.has(key)) {
this.buckets.set(key, { tokens: burst, lastRefill: now });
}
const state = this.buckets.get(key)!;
// Refill tokens based on elapsed time
const elapsed = now - state.lastRefill;
const tokensToAdd = elapsed * refillRate;
state.tokens = Math.min(burst, state.tokens + tokensToAdd);
state.lastRefill = now;
if (state.tokens >= 1.0) {
state.tokens -= 1.0;
return {
allowed: true,
remaining: Math.floor(state.tokens),
retryAfter: 0,
limit: config.maxRequests,
};
} else {
// How long until one token is available?
const wait = (1.0 - state.tokens) / refillRate;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(wait * 100) / 100,
limit: config.maxRequests,
};
}
}
}

Algorithm 2: Fixed Window Counter
The Fixed Window algorithm divides time into fixed windows (e.g., 0:00-1:00, 1:00-2:00). It counts requests in the current window and rejects when the count exceeds the limit.
Weakness: Boundary spike. If a client sends 100 requests at 0:59 and 100 at 1:01, they made 200 requests in 2 seconds but each window only sees 100.
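The spike is easy to demonstrate with explicit timestamps. A standalone sketch (independent of the `FixedWindowCounter` class in this section), using 60-second windows and a limit of 100:

```typescript
// Fixed windows of 60s, limit 100. A burst of 100 requests at t=59s and
// another 100 at t=61s land in different windows, so every request passes
// even though 200 arrived within ~2 seconds.
function windowOf(t: number, windowSeconds: number): number {
  return t - (t % windowSeconds);
}

const limit = 100;
const counts = new Map<number, number>();
let allowedTotal = 0;

const timestamps = [...Array(100).fill(59), ...Array(100).fill(61)];
for (const t of timestamps) {
  const w = windowOf(t, 60);
  const c = counts.get(w) ?? 0;
  if (c < limit) {
    counts.set(w, c + 1);
    allowedTotal += 1;
  }
}
// allowedTotal ends up at 200: each window saw only 100 requests.
```

This is the failure mode that Sliding Window variants exist to close.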
interface FixedWindowState {
windowStart: number;
count: number;
}
class FixedWindowCounter implements RateLimitAlgorithm {
private windows: Map<string, FixedWindowState> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const windowStart = now - (now % config.windowSeconds);
const state = this.windows.get(key);
if (!state || state.windowStart !== windowStart) {
this.windows.set(key, { windowStart, count: 0 });
}
const current = this.windows.get(key)!;
if (current.count < config.maxRequests) {
current.count += 1;
return {
allowed: true,
remaining: config.maxRequests - current.count,
retryAfter: 0,
limit: config.maxRequests,
};
} else {
const retryAfter = windowStart + config.windowSeconds - now;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(Math.max(0, retryAfter) * 100) / 100,
limit: config.maxRequests,
};
}
}
}

Algorithm 3: Sliding Window Log
This algorithm stores the timestamp of every request. To check the rate, it counts how many timestamps fall within the last windowSeconds. It is the most accurate but uses the most memory.
class SlidingWindowLog implements RateLimitAlgorithm {
private logs: Map<string, number[]> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const windowStart = now - config.windowSeconds;
if (!this.logs.has(key)) {
this.logs.set(key, []);
}
const log = this.logs.get(key)!;
// Remove timestamps outside the window
while (log.length > 0 && log[0] <= windowStart) {
log.shift();
}
if (log.length < config.maxRequests) {
log.push(now);
return {
allowed: true,
remaining: config.maxRequests - log.length,
retryAfter: 0,
limit: config.maxRequests,
};
} else {
// The oldest request in the window — when it expires, a slot opens
const oldest = log[0];
const retryAfter = oldest + config.windowSeconds - now;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(Math.max(0, retryAfter) * 100) / 100,
limit: config.maxRequests,
};
}
}
}

Algorithm Comparison
| Algorithm | Accuracy | Memory | Burst Handling | Best For |
|---|---|---|---|---|
| Token Bucket | Smooth rate | O(1) per key | Naturally allows bursts | API rate limiting, bursty traffic |
| Fixed Window | Boundary spikes possible | O(1) per key | No burst control | Simple rate limits, login attempts |
| Sliding Window Log | Exact | O(n) per key (n = maxRequests) | No burst control | When accuracy is critical |
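The entities table also lists `SlidingWindowCounter`, which this section never implements. It approximates the log variant in O(1) memory per key by weighting the previous fixed window's count by how much of it still overlaps the sliding window. A sketch, following the interfaces above (the state shape and the retry-after approximation are my choices, not from the original design):

```typescript
// Interfaces repeated from the section so the sketch is self-contained.
interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
  burstSize?: number;
}
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfter: number;
  limit: number;
}
interface RateLimitAlgorithm {
  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}

interface SlidingCounterState {
  windowStart: number; // start of the current fixed window
  count: number;       // requests seen in the current window
  prevCount: number;   // requests seen in the previous window
}

class SlidingWindowCounter implements RateLimitAlgorithm {
  private states: Map<string, SlidingCounterState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const win = config.windowSeconds;
    const windowStart = now - (now % win);

    let state = this.states.get(key);
    if (!state || state.windowStart !== windowStart) {
      // Roll forward. Only the immediately preceding window contributes;
      // after a longer gap the old counts are irrelevant.
      const adjacent =
        state && Math.round((windowStart - state.windowStart) / win) === 1;
      state = { windowStart, count: 0, prevCount: adjacent ? state!.count : 0 };
      this.states.set(key, state);
    }

    // Weight the previous window by how much of it still overlaps the
    // sliding window [now - win, now].
    const prevWeight = 1 - (now - windowStart) / win;
    const estimated = state.count + state.prevCount * prevWeight;

    if (estimated < config.maxRequests) {
      state.count += 1;
      return {
        allowed: true,
        remaining: Math.max(0, Math.floor(config.maxRequests - estimated) - 1),
        retryAfter: 0,
        limit: config.maxRequests,
      };
    }
    // Approximation: suggest retrying when the current fixed window rolls over.
    const retryAfter = windowStart + win - now;
    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.round(Math.max(0, retryAfter) * 100) / 100,
      limit: config.maxRequests,
    };
  }
}
```

The estimate assumes requests in the previous window were evenly distributed, which is why the counter can be slightly off at boundaries while keeping Fixed Window's memory footprint.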
The Orchestrator: RateLimiter
interface RateLimitRule {
keyType: string; // "user", "endpoint", "user_endpoint"
config: RateLimitConfig;
algorithm: RateLimitAlgorithm;
}
class RateLimiter {
private rules: Map<string, RateLimitRule> = new Map();
private defaultRule: RateLimitRule | null = null;
addRule(name: string, rule: RateLimitRule): void {
this.rules.set(name, rule);
}
setDefaultRule(rule: RateLimitRule): void {
this.defaultRule = rule;
}
updateConfig(name: string, config: RateLimitConfig): void {
const rule = this.rules.get(name);
if (rule) {
rule.config = config;
}
}
check(options: {
userId?: string;
endpoint?: string;
ruleName?: string;
}): RateLimitResult {
const rule = options.ruleName
? this.rules.get(options.ruleName)
: this.defaultRule;
if (!rule) {
// No rule found — allow by default
return { allowed: true, remaining: -1, retryAfter: 0, limit: -1 };
}
const key = this.buildKey(rule.keyType, options.userId, options.endpoint);
return rule.algorithm.tryAcquire(key, rule.config);
}
private buildKey(
keyType: string,
userId?: string,
endpoint?: string,
): string {
switch (keyType) {
case "user":
return `user:${userId ?? "anon"}`;
case "endpoint":
return `endpoint:${endpoint ?? "unknown"}`;
case "user_endpoint":
return `user:${userId ?? "anon"}:ep:${endpoint ?? "unknown"}`;
default:
return "global";
}
}
}

Edge Cases
- First request from a new user — State is initialized on first call. Token Bucket starts with a full bucket. Fixed Window starts a new window.
- Clock drift — In a single-process system, `Date.now()` is consistent. In distributed systems, use a centralized time source.
- Config update during active window — Old state may be inconsistent with new limits. This is acceptable for rate limiting — eventual correctness is fine.
- Very long idle period — Token Bucket refills to max. Fixed Window's old window is discarded. Sliding Window Log's old timestamps are pruned.
- Zero window size — Should be validated at config creation time. A zero-second window means infinite rate, which is not useful.
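The validation mentioned in the last point can be a small guard run when a rule is registered or updated. A sketch — the `validateConfig` helper is an assumption, not part of the design above:

```typescript
interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
  burstSize?: number;
}

// Reject configs that would make the limiter meaningless or divide by zero
// (the refill rate is maxRequests / windowSeconds).
function validateConfig(config: RateLimitConfig): void {
  if (!Number.isFinite(config.maxRequests) || config.maxRequests < 1) {
    throw new Error(`maxRequests must be >= 1, got ${config.maxRequests}`);
  }
  if (!Number.isFinite(config.windowSeconds) || config.windowSeconds <= 0) {
    throw new Error(`windowSeconds must be > 0, got ${config.windowSeconds}`);
  }
  if (config.burstSize !== undefined && config.burstSize < 1) {
    throw new Error(`burstSize must be >= 1, got ${config.burstSize}`);
  }
}
```

Calling this from `addRule` and `updateConfig` keeps bad configs out of every algorithm rather than defending inside each one.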
Verification
Setup: Token Bucket, maxRequests=5 per 10 seconds, burstSize=5.
Step 1: 5 rapid requests at t=0.
- Bucket starts with 5 tokens.
- Request 1: tokens 5 -> 4. Allowed, remaining=4.
- Request 2: tokens 4 -> 3. Allowed, remaining=3.
- Request 3: tokens 3 -> 2. Allowed, remaining=2.
- Request 4: tokens 2 -> 1. Allowed, remaining=1.
- Request 5: tokens 1 -> 0. Allowed, remaining=0.
Step 2: 6th request at t=0.1.
- Elapsed: 0.1s. Refill rate: 5/10 = 0.5 tokens/sec. Tokens added: 0.05.
- Tokens: 0 + 0.05 = 0.05. Not enough (need 1.0).
- Denied. retryAfter = (1.0 - 0.05) / 0.5 = 1.9 seconds.
Step 3: Request at t=2.0.
- Elapsed since last refill: 1.9s. Tokens added: 1.9 * 0.5 = 0.95.
- Tokens: 0.05 + 0.95 = 1.0. Exactly enough.
- Allowed, tokens -> 0.0, remaining=0.
Step 4: Wait until t=12.0 (long idle).
- Elapsed: 10s. Tokens added: 10 * 0.5 = 5.0. Capped at burstSize=5.
- Tokens: 5.0. Full burst available again.
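This trace is mechanical enough to automate. One way — an assumption, not part of the design above — is to give the algorithm an injectable clock so the timestamps t=0, 0.1, and 2.0 can be replayed exactly; the `TestableTokenBucket` below is a hypothetical single-key variant written for that purpose:

```typescript
// Token Bucket with an injectable clock (in seconds), so the verification
// trace can be replayed deterministically.
class TestableTokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxRequests: number,
    private windowSeconds: number,
    private burst: number,
    private clock: () => number,
  ) {
    this.tokens = burst;
    this.lastRefill = clock();
  }

  tryAcquire(): { allowed: boolean; retryAfter: number } {
    const now = this.clock();
    const refillRate = this.maxRequests / this.windowSeconds;
    this.tokens = Math.min(
      this.burst,
      this.tokens + (now - this.lastRefill) * refillRate,
    );
    this.lastRefill = now;
    if (this.tokens >= 1.0) {
      this.tokens -= 1.0;
      return { allowed: true, retryAfter: 0 };
    }
    return { allowed: false, retryAfter: (1.0 - this.tokens) / refillRate };
  }
}

// Replay the walkthrough: 5 requests per 10 seconds, burstSize 5.
let t = 0;
const bucket = new TestableTokenBucket(5, 10, 5, () => t);

// Step 1: five rapid requests at t=0 should all pass.
const step1 = Array.from({ length: 5 }, () => bucket.tryAcquire().allowed);

// Step 2: sixth request at t=0.1 should be denied with retryAfter near 1.9s.
t = 0.1;
const step2 = bucket.tryAcquire();

// Step 3: at t=2.0 roughly one token has accumulated
// (exactly 1.0 up to floating-point rounding).
t = 2.0;
const step3 = bucket.tryAcquire();
```

The same trick makes the other algorithms testable; hard-coding `Date.now()` inside `tryAcquire` is the main obstacle to unit-testing the production classes.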
Complete Code Implementation
// ---- Config and Result ----
interface RateLimitConfig {
maxRequests: number;
windowSeconds: number;
burstSize?: number;
}
interface RateLimitResult {
allowed: boolean;
remaining: number;
retryAfter: number;
limit: number;
}
interface RateLimitAlgorithm {
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}
// ---- Token Bucket ----
interface TokenBucketState {
tokens: number;
lastRefill: number;
}
class TokenBucket implements RateLimitAlgorithm {
private buckets: Map<string, TokenBucketState> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const refillRate = config.maxRequests / config.windowSeconds;
const burst = config.burstSize ?? config.maxRequests;
if (!this.buckets.has(key)) {
this.buckets.set(key, { tokens: burst, lastRefill: now });
}
const state = this.buckets.get(key)!;
const elapsed = now - state.lastRefill;
state.tokens = Math.min(burst, state.tokens + elapsed * refillRate);
state.lastRefill = now;
if (state.tokens >= 1.0) {
state.tokens -= 1.0;
return { allowed: true, remaining: Math.floor(state.tokens), retryAfter: 0, limit: config.maxRequests };
} else {
const wait = (1.0 - state.tokens) / refillRate;
return { allowed: false, remaining: 0, retryAfter: Math.round(wait * 100) / 100, limit: config.maxRequests };
}
}
}
// ---- Fixed Window Counter ----
interface FixedWindowState {
windowStart: number;
count: number;
}
class FixedWindowCounter implements RateLimitAlgorithm {
private windows: Map<string, FixedWindowState> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const windowStart = now - (now % config.windowSeconds);
const state = this.windows.get(key);
if (!state || state.windowStart !== windowStart) {
this.windows.set(key, { windowStart, count: 0 });
}
const current = this.windows.get(key)!;
if (current.count < config.maxRequests) {
current.count += 1;
return {
allowed: true,
remaining: config.maxRequests - current.count,
retryAfter: 0,
limit: config.maxRequests,
};
} else {
const retry = windowStart + config.windowSeconds - now;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(Math.max(0, retry) * 100) / 100,
limit: config.maxRequests,
};
}
}
}
// ---- Sliding Window Log ----
class SlidingWindowLog implements RateLimitAlgorithm {
private logs: Map<string, number[]> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const windowStart = now - config.windowSeconds;
if (!this.logs.has(key)) {
this.logs.set(key, []);
}
const log = this.logs.get(key)!;
while (log.length > 0 && log[0] <= windowStart) {
log.shift();
}
if (log.length < config.maxRequests) {
log.push(now);
return {
allowed: true,
remaining: config.maxRequests - log.length,
retryAfter: 0,
limit: config.maxRequests,
};
} else {
const oldest = log[0];
const retry = oldest + config.windowSeconds - now;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(Math.max(0, retry) * 100) / 100,
limit: config.maxRequests,
};
}
}
}
// ---- Rate Limiter Orchestrator ----
interface RateLimitRule {
keyType: string;
config: RateLimitConfig;
algorithm: RateLimitAlgorithm;
}
class RateLimiter {
private rules: Map<string, RateLimitRule> = new Map();
private defaultRule: RateLimitRule | null = null;
addRule(name: string, rule: RateLimitRule): void {
this.rules.set(name, rule);
}
setDefaultRule(rule: RateLimitRule): void {
this.defaultRule = rule;
}
updateConfig(name: string, config: RateLimitConfig): void {
const rule = this.rules.get(name);
if (rule) rule.config = config;
}
check(options: {
userId?: string;
endpoint?: string;
ruleName?: string;
}): RateLimitResult {
const rule = options.ruleName
? this.rules.get(options.ruleName)
: this.defaultRule;
if (!rule) {
return { allowed: true, remaining: -1, retryAfter: 0, limit: -1 };
}
const key = this.buildKey(rule.keyType, options.userId, options.endpoint);
return rule.algorithm.tryAcquire(key, rule.config);
}
private buildKey(keyType: string, userId?: string, endpoint?: string): string {
switch (keyType) {
case "user":
return `user:${userId ?? "anon"}`;
case "endpoint":
return `endpoint:${endpoint ?? "unknown"}`;
case "user_endpoint":
return `user:${userId ?? "anon"}:ep:${endpoint ?? "unknown"}`;
default:
return "global";
}
}
}
// ---- Demo ----
const limiter = new RateLimiter();
// Rule: 5 requests per 10 seconds using Token Bucket
const apiRule: RateLimitRule = {
keyType: "user",
config: { maxRequests: 5, windowSeconds: 10 },
algorithm: new TokenBucket(),
};
limiter.addRule("api", apiRule);
// Rule: 3 login attempts per 60 seconds using Fixed Window
const loginRule: RateLimitRule = {
keyType: "user_endpoint",
config: { maxRequests: 3, windowSeconds: 60 },
algorithm: new FixedWindowCounter(),
};
limiter.addRule("login", loginRule);
// Simulate API requests
console.log("API rate limiting (Token Bucket):");
for (let i = 0; i < 7; i++) {
const result = limiter.check({ userId: "alice", ruleName: "api" });
const status = result.allowed ? "ALLOWED" : "DENIED";
console.log(
` Request ${i + 1}: ${status} (remaining=${result.remaining}, retryAfter=${result.retryAfter}s)`
);
}
// Simulate login attempts
console.log("\nLogin rate limiting (Fixed Window):");
for (let i = 0; i < 5; i++) {
const result = limiter.check({
userId: "alice",
endpoint: "/login",
ruleName: "login",
});
const status = result.allowed ? "ALLOWED" : "DENIED";
console.log(
` Attempt ${i + 1}: ${status} (remaining=${result.remaining}, retryAfter=${result.retryAfter}s)`
);
}

Extensibility
1. Adding Leaky Bucket Algorithm
The Leaky Bucket processes requests at a fixed rate, like water dripping from a bucket. It queues excess requests instead of rejecting them immediately.
class LeakyBucket implements RateLimitAlgorithm {
private queues: Map<string, number[]> = new Map();
private lastLeak: Map<string, number> = new Map();
tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
const now = Date.now() / 1000;
const leakRate = config.maxRequests / config.windowSeconds;
const burst = config.burstSize ?? config.maxRequests;
if (!this.queues.has(key)) {
this.queues.set(key, []);
}
const queue = this.queues.get(key)!;
    // Leak (process) queued requests. Advance lastLeak only by the whole
    // requests actually leaked, so fractional progress is not lost when
    // calls arrive more often than the leak interval.
    const lastLeakTime = this.lastLeak.get(key);
    if (lastLeakTime === undefined) {
      this.lastLeak.set(key, now);
    } else {
      const leaked = Math.floor((now - lastLeakTime) * leakRate);
      if (leaked > 0) {
        queue.splice(0, Math.min(leaked, queue.length));
        this.lastLeak.set(key, lastLeakTime + leaked / leakRate);
      }
    }
if (queue.length < burst) {
queue.push(now);
return {
allowed: true,
remaining: burst - queue.length,
retryAfter: 0,
limit: config.maxRequests,
};
} else {
const wait = 1.0 / leakRate;
return {
allowed: false,
remaining: 0,
retryAfter: Math.round(wait * 100) / 100,
limit: config.maxRequests,
};
}
}
}

Tradeoff: Leaky Bucket smooths traffic perfectly but does not allow bursts. Good for downstream services that cannot handle spikes.
2. Composite Rate Limiting (Multiple Rules Per Request)
Apply multiple rules to the same request — e.g., per-user AND per-endpoint limits:
checkAll(
userId: string,
endpoint: string,
ruleNames: string[],
): RateLimitResult {
for (const name of ruleNames) {
const result = this.check({ userId, endpoint, ruleName: name });
if (!result.allowed) {
return result; // First rule that denies wins
}
}
return { allowed: true, remaining: 0, retryAfter: 0, limit: 0 };
}

Tradeoff: Order matters. If rule A denies, rule B's counter is not incremented. This prevents double-counting but means rule B's state is not accurate.
3. Warm-Up Period
New users get a reduced rate limit that gradually increases:
interface RateLimitConfig {
maxRequests: number;
windowSeconds: number;
burstSize?: number;
warmupRequests?: number; // lower limit for new users
warmupDuration?: number; // seconds before full rate kicks in
}

Tradeoff: Adds complexity to every algorithm. The algorithm must track when a key was first seen and interpolate between warmup and full limits.
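The interpolation itself is small. A sketch of computing the effective limit for a key, assuming the limiter records a `firstSeen` timestamp per key — neither the `effectiveLimit` helper nor that field exists in the design above:

```typescript
interface WarmupConfig {
  maxRequests: number;
  warmupRequests?: number; // lower limit for brand-new keys
  warmupDuration?: number; // seconds to ramp from warmup to full limit
}

// Linearly interpolate between warmupRequests and maxRequests based on
// how long ago the key was first seen. Without warmup fields, the full
// limit applies immediately.
function effectiveLimit(
  config: WarmupConfig,
  firstSeen: number,
  now: number,
): number {
  const { warmupRequests, warmupDuration } = config;
  if (warmupRequests === undefined || warmupDuration === undefined) {
    return config.maxRequests;
  }
  const age = now - firstSeen;
  if (age >= warmupDuration) return config.maxRequests;
  const fraction = Math.max(0, age / warmupDuration);
  return Math.floor(warmupRequests + (config.maxRequests - warmupRequests) * fraction);
}
```

Each algorithm would call this in place of reading `config.maxRequests` directly, which is exactly the cross-cutting change the tradeoff warns about.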
What is Expected at Each Level
| Level | Expectations |
|---|---|
| Junior | Implement one algorithm (usually Fixed Window). Track counts per key. May not handle retry-after or thread safety. |
| Mid-level | Strategy pattern with at least two algorithms. Clean interface. Thread-safe implementations. Explain pros/cons of each algorithm (boundary spike problem, memory usage). Return retry-after. |
| Senior | Everything above plus: all three algorithms implemented, composable rules (per-user + per-endpoint), runtime config updates, discussion of distributed rate limiting (how to extend this to Redis), analysis of Token Bucket math (refill rate, burst capacity), Sliding Window Counter as a memory optimization over Sliding Window Log. |