
15 — LLD: Rate Limiter (Class-Level Design)

Understanding the Problem

A rate limiter controls how many requests a client can make within a time window. It protects services from abuse, ensures fair usage, and prevents overload. This is the class-level (LLD) design — focused on the algorithms, interfaces, and data structures. For the distributed/HLD version, see 07 — HLD: Rate Limiter.

What makes this an LLD problem?

The HLD version asks "where does the rate limiter sit in the architecture?" This LLD version asks "how do you implement the rate limiting algorithm itself?" You are designing classes, interfaces, and the internal state management for algorithms like Token Bucket, Fixed Window, and Sliding Window.


Requirements

Clarifying Questions

You: Should I support multiple rate limiting algorithms, or just one?

Interviewer: Design it so the algorithm is swappable. Implement at least two: Token Bucket and Sliding Window.

You: Should rate limiting be per-user, per-endpoint, or configurable?

Interviewer: Configurable. You should support per-user, per-endpoint, or per-user-per-endpoint.

You: What is the API surface? A single is_allowed(request) method?

Interviewer: Yes. The rate limiter receives a request and returns whether it should be allowed or rejected. Also return how long until the client can retry.

You: Should I handle thread safety? Multiple requests from the same user could arrive concurrently.

Interviewer: Yes. The rate limiter must be safe for concurrent access.

You: Should the configuration be changeable at runtime? Like updating the rate limit for an endpoint?

Interviewer: That would be good to have. At a minimum, be ready to discuss how you would support it.

Final Requirements

  1. RateLimiter interface with a single check(key) -> RateLimitResult entry point.
  2. Swappable algorithms via Strategy pattern: Token Bucket, Fixed Window, Sliding Window.
  3. Configurable rate limit rules: per-user, per-endpoint, or composite keys.
  4. Thread-safe implementations.
  5. Return both allow/deny decision and retry-after time.
  6. Runtime-configurable limits.

Out of scope: Distributed rate limiting (Redis, shared state), HTTP middleware integration, metrics/monitoring dashboards.


Core Entities and Relationships

  • RateLimitResult — Result of a rate limit check: allowed, remaining, retryAfter
  • RateLimitAlgorithm — Interface: implements the core algorithm logic
  • TokenBucket — Algorithm: steady refill of tokens, burst-friendly
  • FixedWindowCounter — Algorithm: count requests per fixed time window
  • SlidingWindowLog — Algorithm: exact count using timestamps of each request
  • SlidingWindowCounter — Algorithm: weighted average of current and previous window
  • RateLimitConfig — Configuration: max requests, window size, burst size
  • RateLimitRule — Maps a key pattern (user, endpoint) to a config
  • RateLimiter — Orchestrator: resolves the key, picks the right algorithm and config

Why Strategy pattern? Different endpoints need different algorithms. An API endpoint with bursty traffic benefits from Token Bucket. A login endpoint needs strict per-window limits (Fixed Window). The Strategy pattern lets you swap algorithms without changing the rate limiter's public API.

Why separate RateLimitConfig from RateLimitAlgorithm? Config is data (numbers). Algorithm is behavior (logic). A single algorithm class can serve multiple keys, each with its own config. Separating them avoids creating one algorithm instance per key.


Class Design

RateLimitResult

State:

  • allowed: boolean
  • remaining: number — how many more requests are allowed in the current window
  • retryAfter: number — seconds until the client can retry (0 if allowed)
  • limit: number — the configured max requests

RateLimitAlgorithm (Interface)

Methods:

  • tryAcquire(key: string, config: RateLimitConfig) -> RateLimitResult

Every algorithm implements this single method. The key identifies the client/endpoint. The config provides the limits.

RateLimitConfig

State:

  • maxRequests: number — maximum requests per window
  • windowSeconds: number — time window duration
  • burstSize: number | undefined — for Token Bucket, the burst capacity (defaults to maxRequests)

Final Class Design

typescript
interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
  burstSize?: number; // For token bucket; defaults to maxRequests
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfter: number; // seconds; 0 if allowed
  limit: number;
}

interface RateLimitAlgorithm {
  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}

Design principles at play:

  • Strategy Pattern: Each algorithm is a concrete strategy. The rate limiter delegates to whichever strategy is configured for the given rule.
  • Single Responsibility: RateLimitConfig holds numbers. RateLimitAlgorithm holds logic. RateLimiter holds routing/orchestration.
  • Open/Closed: Adding a new algorithm (e.g., Leaky Bucket) means implementing one new class. Nothing else changes.
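
The config/algorithm split is easy to see in use. A quick sketch (TokenBucket is implemented in the next section; the tier configs here are made up for the example):

typescript
// One algorithm instance serves many keys, each checked against its own config.
const algo: RateLimitAlgorithm = new TokenBucket();
const freeTier: RateLimitConfig = { maxRequests: 10, windowSeconds: 60 };
const paidTier: RateLimitConfig = { maxRequests: 1000, windowSeconds: 60 };

algo.tryAcquire("user:alice", freeTier); // alice is limited to 10/min
algo.tryAcquire("user:bob", paidTier);   // bob is limited to 1000/min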

Implementation

Algorithm 1: Token Bucket

The Token Bucket algorithm works like this: imagine a bucket that holds tokens. Tokens are added at a steady rate (refill). Each request consumes one token. If the bucket is empty, the request is denied. The bucket has a maximum capacity (burst size), so tokens cannot accumulate beyond that.

Why Token Bucket? It naturally handles bursts. A client that was idle for a while has a full bucket and can make a burst of requests. This is good for APIs where traffic is spiky.

typescript
interface TokenBucketState {
  tokens: number;
  lastRefill: number;
}

class TokenBucket implements RateLimitAlgorithm {
  private buckets: Map<string, TokenBucketState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000; // seconds
    const refillRate = config.maxRequests / config.windowSeconds;
    const burst = config.burstSize ?? config.maxRequests;

    if (!this.buckets.has(key)) {
      this.buckets.set(key, { tokens: burst, lastRefill: now });
    }

    const state = this.buckets.get(key)!;

    // Refill tokens based on elapsed time
    const elapsed = now - state.lastRefill;
    const tokensToAdd = elapsed * refillRate;
    state.tokens = Math.min(burst, state.tokens + tokensToAdd);
    state.lastRefill = now;

    if (state.tokens >= 1.0) {
      state.tokens -= 1.0;
      return {
        allowed: true,
        remaining: Math.floor(state.tokens),
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      // How long until one token is available?
      const wait = (1.0 - state.tokens) / refillRate;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(wait * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}

Algorithm 2: Fixed Window Counter

The Fixed Window algorithm divides time into fixed windows (e.g., with a 60-second window: 0:00-1:00, 1:00-2:00). It counts requests in the current window and rejects once the window already holds the maximum.

Weakness: Boundary spike. If a client sends 100 requests at 0:59 and 100 more at 1:01, they made 200 requests in two seconds, yet each window only sees 100.

typescript
interface FixedWindowState {
  windowStart: number;
  count: number;
}

class FixedWindowCounter implements RateLimitAlgorithm {
  private windows: Map<string, FixedWindowState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const windowStart = now - (now % config.windowSeconds);

    const state = this.windows.get(key);
    if (!state || state.windowStart !== windowStart) {
      this.windows.set(key, { windowStart, count: 0 });
    }

    const current = this.windows.get(key)!;

    if (current.count < config.maxRequests) {
      current.count += 1;
      return {
        allowed: true,
        remaining: config.maxRequests - current.count,
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      const retryAfter = windowStart + config.windowSeconds - now;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(Math.max(0, retryAfter) * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}

Algorithm 3: Sliding Window Log

This algorithm stores the timestamp of every request. To check the rate, it counts how many timestamps fall within the last windowSeconds. It is the most accurate but uses the most memory.

typescript
class SlidingWindowLog implements RateLimitAlgorithm {
  private logs: Map<string, number[]> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const windowStart = now - config.windowSeconds;

    if (!this.logs.has(key)) {
      this.logs.set(key, []);
    }
    const log = this.logs.get(key)!;

    // Remove timestamps outside the window
    while (log.length > 0 && log[0] <= windowStart) {
      log.shift();
    }

    if (log.length < config.maxRequests) {
      log.push(now);
      return {
        allowed: true,
        remaining: config.maxRequests - log.length,
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      // The oldest request in the window — when it expires, a slot opens
      const oldest = log[0];
      const retryAfter = oldest + config.windowSeconds - now;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(Math.max(0, retryAfter) * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}
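
The entity list above also names SlidingWindowCounter: instead of storing every timestamp, it keeps one counter for the current fixed window and one for the previous window, and estimates the sliding count as a weighted sum. A minimal sketch, assuming the weighted-average scheme from the entity table (it is not one of the two algorithms the interviewer required):

typescript
interface SlidingCounterState {
  windowIndex: number; // index of the current fixed window
  current: number;     // requests counted in the current window
  previous: number;    // requests counted in the previous window
}

class SlidingWindowCounter implements RateLimitAlgorithm {
  private counters: Map<string, SlidingCounterState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const windowIndex = Math.floor(now / config.windowSeconds);

    let state = this.counters.get(key);
    if (!state || state.windowIndex !== windowIndex) {
      // Roll windows: the old count becomes "previous" only if it belongs
      // to the window immediately before this one
      const previous =
        state && state.windowIndex === windowIndex - 1 ? state.current : 0;
      state = { windowIndex, current: 0, previous };
      this.counters.set(key, state);
    }

    // Weight the previous window by how much of it still overlaps the
    // sliding window covering the last windowSeconds
    const elapsedInWindow = now - windowIndex * config.windowSeconds;
    const overlap = 1 - elapsedInWindow / config.windowSeconds;
    const estimated = state.current + state.previous * overlap;

    if (estimated < config.maxRequests) {
      state.current += 1;
      return {
        allowed: true,
        remaining: Math.max(0, Math.floor(config.maxRequests - estimated - 1)),
        retryAfter: 0,
        limit: config.maxRequests,
      };
    }
    // Approximate: pressure eases as the previous window ages out
    const retry = (windowIndex + 1) * config.windowSeconds - now;
    return {
      allowed: false,
      remaining: 0,
      retryAfter: Math.round(Math.max(0, retry) * 100) / 100,
      limit: config.maxRequests,
    };
  }
}

Compared with the log, this keeps O(1) state per key at the cost of an approximate count; this is the memory optimization referenced in the expectations at the end of this section.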

Algorithm Comparison

  • Token Bucket — Accuracy: smooth rate. Memory: O(1) per key. Burst handling: naturally allows bursts. Best for: API rate limiting, bursty traffic.
  • Fixed Window — Accuracy: boundary spikes possible. Memory: O(1) per key. Burst handling: no burst control. Best for: simple rate limits, login attempts.
  • Sliding Window Log — Accuracy: exact. Memory: O(n) per key, where n = maxRequests. Burst handling: no burst control. Best for: when accuracy is critical.

The Orchestrator: RateLimiter

typescript
interface RateLimitRule {
  keyType: string; // "user", "endpoint", "user_endpoint"
  config: RateLimitConfig;
  algorithm: RateLimitAlgorithm;
}

class RateLimiter {
  private rules: Map<string, RateLimitRule> = new Map();
  private defaultRule: RateLimitRule | null = null;

  addRule(name: string, rule: RateLimitRule): void {
    this.rules.set(name, rule);
  }

  setDefaultRule(rule: RateLimitRule): void {
    this.defaultRule = rule;
  }

  updateConfig(name: string, config: RateLimitConfig): void {
    const rule = this.rules.get(name);
    if (rule) {
      rule.config = config;
    }
  }

  check(options: {
    userId?: string;
    endpoint?: string;
    ruleName?: string;
  }): RateLimitResult {
    const rule = options.ruleName
      ? this.rules.get(options.ruleName)
      : this.defaultRule;

    if (!rule) {
      // No rule found — allow by default
      return { allowed: true, remaining: -1, retryAfter: 0, limit: -1 };
    }

    const key = this.buildKey(rule.keyType, options.userId, options.endpoint);
    return rule.algorithm.tryAcquire(key, rule.config);
  }

  private buildKey(
    keyType: string,
    userId?: string,
    endpoint?: string,
  ): string {
    switch (keyType) {
      case "user":
        return `user:${userId ?? "anon"}`;
      case "endpoint":
        return `endpoint:${endpoint ?? "unknown"}`;
      case "user_endpoint":
        return `user:${userId ?? "anon"}:ep:${endpoint ?? "unknown"}`;
      default:
        return "global";
    }
  }
}

Edge Cases

  1. First request from a new user — State is initialized on first call. Token Bucket starts with a full bucket. Fixed Window starts a new window.
  2. Clock drift — In a single-process system, Date.now() is consistent. In distributed systems, use a centralized time source.
  3. Config update during active window — Old state may be inconsistent with new limits. This is acceptable for rate limiting — eventual correctness is fine.
  4. Very long idle period — Token Bucket refills to max. Fixed Window's old window is discarded. Sliding Window Log's old timestamps are pruned.
  5. Zero window size — Should be validated at config creation time. A zero-second window makes the refill rate infinite and the window math divide by zero, so it is not a useful limit. A validation sketch follows this list.
  6. Concurrent requests for the same key — In a single-threaded runtime such as Node.js, each synchronous tryAcquire call runs to completion before the next one starts, so the per-key state updates are effectively atomic. In a multi-threaded runtime, guard each key's state with a lock or use atomic counters.
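
A minimal validation sketch for edge case 5 (validateConfig is a hypothetical helper, not part of the interfaces above):

typescript
// Hypothetical guard: reject configs that would make the limit meaningless
// or break the window/refill arithmetic
function validateConfig(config: RateLimitConfig): void {
  if (!Number.isFinite(config.windowSeconds) || config.windowSeconds <= 0) {
    throw new Error("windowSeconds must be a positive number");
  }
  if (!Number.isInteger(config.maxRequests) || config.maxRequests <= 0) {
    throw new Error("maxRequests must be a positive integer");
  }
  if (config.burstSize !== undefined && config.burstSize < 1) {
    throw new Error("burstSize must be at least 1");
  }
}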

Verification

Setup: Token Bucket, maxRequests=5 per 10 seconds, burstSize=5.

Step 1: 5 rapid requests at t=0.

  • Bucket starts with 5 tokens.
  • Request 1: tokens 5 -> 4. Allowed, remaining=4.
  • Request 2: tokens 4 -> 3. Allowed, remaining=3.
  • Request 3: tokens 3 -> 2. Allowed, remaining=2.
  • Request 4: tokens 2 -> 1. Allowed, remaining=1.
  • Request 5: tokens 1 -> 0. Allowed, remaining=0.

Step 2: 6th request at t=0.1.

  • Elapsed: 0.1s. Refill rate: 5/10 = 0.5 tokens/sec. Tokens added: 0.05.
  • Tokens: 0 + 0.05 = 0.05. Not enough (need 1.0).
  • Denied. retryAfter = (1.0 - 0.05) / 0.5 = 1.9 seconds.

Step 3: Request at t=2.0.

  • Elapsed since last refill: 1.9s. Tokens added: 1.9 * 0.5 = 0.95.
  • Tokens: 0.05 + 0.95 = 1.0. Exactly enough.
  • Allowed, tokens -> 0.0, remaining=0.

Step 4: Wait until t=12.0 (long idle).

  • Elapsed: 10s. Tokens added: 10 * 0.5 = 5.0. Capped at burstSize=5.
  • Tokens: 5.0. Full burst available again.
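
These steps can be replayed as a deterministic test if the algorithm reads time from an injectable clock instead of calling Date.now() directly. A sketch, assuming a hypothetical Clock interface and a ClockedTokenBucket variant (neither is part of the design above; the algorithm body is unchanged):

typescript
// Hypothetical clock abstraction so tests can set time explicitly
interface Clock {
  now(): number; // seconds
}

class FakeClock implements Clock {
  private t = 0;
  now(): number { return this.t; }
  set(seconds: number): void { this.t = seconds; }
}

// Same logic as TokenBucket above, with `Date.now() / 1000` replaced by
// `this.clock.now()`
class ClockedTokenBucket implements RateLimitAlgorithm {
  private buckets: Map<string, TokenBucketState> = new Map();

  constructor(private clock: Clock) {}

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = this.clock.now();
    const refillRate = config.maxRequests / config.windowSeconds;
    const burst = config.burstSize ?? config.maxRequests;

    if (!this.buckets.has(key)) {
      this.buckets.set(key, { tokens: burst, lastRefill: now });
    }
    const state = this.buckets.get(key)!;
    state.tokens = Math.min(burst, state.tokens + (now - state.lastRefill) * refillRate);
    state.lastRefill = now;

    if (state.tokens >= 1.0) {
      state.tokens -= 1.0;
      return { allowed: true, remaining: Math.floor(state.tokens), retryAfter: 0, limit: config.maxRequests };
    }
    const wait = (1.0 - state.tokens) / refillRate;
    return { allowed: false, remaining: 0, retryAfter: Math.round(wait * 100) / 100, limit: config.maxRequests };
  }
}

// Replay the walkthrough: 5 requests at t=0, a denied 6th at t=0.1,
// an allowed request at t=2.0
const fake = new FakeClock();
const bucket = new ClockedTokenBucket(fake);
const cfg: RateLimitConfig = { maxRequests: 5, windowSeconds: 10, burstSize: 5 };

for (let i = 0; i < 5; i++) bucket.tryAcquire("alice", cfg); // all allowed
fake.set(0.1);
console.log(bucket.tryAcquire("alice", cfg)); // denied, retryAfter = 1.9
fake.set(2.0);
console.log(bucket.tryAcquire("alice", cfg)); // allowed, remaining = 0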

Complete Code Implementation

typescript
// ---- Config and Result ----

interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
  burstSize?: number;
}

interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  retryAfter: number;
  limit: number;
}

interface RateLimitAlgorithm {
  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult;
}

// ---- Token Bucket ----

interface TokenBucketState {
  tokens: number;
  lastRefill: number;
}

class TokenBucket implements RateLimitAlgorithm {
  private buckets: Map<string, TokenBucketState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const refillRate = config.maxRequests / config.windowSeconds;
    const burst = config.burstSize ?? config.maxRequests;

    if (!this.buckets.has(key)) {
      this.buckets.set(key, { tokens: burst, lastRefill: now });
    }

    const state = this.buckets.get(key)!;
    const elapsed = now - state.lastRefill;
    state.tokens = Math.min(burst, state.tokens + elapsed * refillRate);
    state.lastRefill = now;

    if (state.tokens >= 1.0) {
      state.tokens -= 1.0;
      return { allowed: true, remaining: Math.floor(state.tokens), retryAfter: 0, limit: config.maxRequests };
    } else {
      const wait = (1.0 - state.tokens) / refillRate;
      return { allowed: false, remaining: 0, retryAfter: Math.round(wait * 100) / 100, limit: config.maxRequests };
    }
  }
}

// ---- Fixed Window Counter ----

interface FixedWindowState {
  windowStart: number;
  count: number;
}

class FixedWindowCounter implements RateLimitAlgorithm {
  private windows: Map<string, FixedWindowState> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const windowStart = now - (now % config.windowSeconds);

    const state = this.windows.get(key);
    if (!state || state.windowStart !== windowStart) {
      this.windows.set(key, { windowStart, count: 0 });
    }

    const current = this.windows.get(key)!;
    if (current.count < config.maxRequests) {
      current.count += 1;
      return {
        allowed: true,
        remaining: config.maxRequests - current.count,
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      const retry = windowStart + config.windowSeconds - now;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(Math.max(0, retry) * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}

// ---- Sliding Window Log ----

class SlidingWindowLog implements RateLimitAlgorithm {
  private logs: Map<string, number[]> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const windowStart = now - config.windowSeconds;

    if (!this.logs.has(key)) {
      this.logs.set(key, []);
    }
    const log = this.logs.get(key)!;

    while (log.length > 0 && log[0] <= windowStart) {
      log.shift();
    }

    if (log.length < config.maxRequests) {
      log.push(now);
      return {
        allowed: true,
        remaining: config.maxRequests - log.length,
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      const oldest = log[0];
      const retry = oldest + config.windowSeconds - now;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(Math.max(0, retry) * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}

// ---- Rate Limiter Orchestrator ----

interface RateLimitRule {
  keyType: string;
  config: RateLimitConfig;
  algorithm: RateLimitAlgorithm;
}

class RateLimiter {
  private rules: Map<string, RateLimitRule> = new Map();
  private defaultRule: RateLimitRule | null = null;

  addRule(name: string, rule: RateLimitRule): void {
    this.rules.set(name, rule);
  }

  setDefaultRule(rule: RateLimitRule): void {
    this.defaultRule = rule;
  }

  updateConfig(name: string, config: RateLimitConfig): void {
    const rule = this.rules.get(name);
    if (rule) rule.config = config;
  }

  check(options: {
    userId?: string;
    endpoint?: string;
    ruleName?: string;
  }): RateLimitResult {
    const rule = options.ruleName
      ? this.rules.get(options.ruleName)
      : this.defaultRule;
    if (!rule) {
      return { allowed: true, remaining: -1, retryAfter: 0, limit: -1 };
    }
    const key = this.buildKey(rule.keyType, options.userId, options.endpoint);
    return rule.algorithm.tryAcquire(key, rule.config);
  }

  private buildKey(keyType: string, userId?: string, endpoint?: string): string {
    switch (keyType) {
      case "user":
        return `user:${userId ?? "anon"}`;
      case "endpoint":
        return `endpoint:${endpoint ?? "unknown"}`;
      case "user_endpoint":
        return `user:${userId ?? "anon"}:ep:${endpoint ?? "unknown"}`;
      default:
        return "global";
    }
  }
}

// ---- Demo ----

const limiter = new RateLimiter();

// Rule: 5 requests per 10 seconds using Token Bucket
const apiRule: RateLimitRule = {
  keyType: "user",
  config: { maxRequests: 5, windowSeconds: 10 },
  algorithm: new TokenBucket(),
};
limiter.addRule("api", apiRule);

// Rule: 3 login attempts per 60 seconds using Fixed Window
const loginRule: RateLimitRule = {
  keyType: "user_endpoint",
  config: { maxRequests: 3, windowSeconds: 60 },
  algorithm: new FixedWindowCounter(),
};
limiter.addRule("login", loginRule);

// Simulate API requests
console.log("API rate limiting (Token Bucket):");
for (let i = 0; i < 7; i++) {
  const result = limiter.check({ userId: "alice", ruleName: "api" });
  const status = result.allowed ? "ALLOWED" : "DENIED";
  console.log(
    `  Request ${i + 1}: ${status} (remaining=${result.remaining}, retryAfter=${result.retryAfter}s)`
  );
}

// Simulate login attempts
console.log("\nLogin rate limiting (Fixed Window):");
for (let i = 0; i < 5; i++) {
  const result = limiter.check({
    userId: "alice",
    endpoint: "/login",
    ruleName: "login",
  });
  const status = result.allowed ? "ALLOWED" : "DENIED";
  console.log(
    `  Attempt ${i + 1}: ${status} (remaining=${result.remaining}, retryAfter=${result.retryAfter}s)`
  );
}
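
Approximate output. The denied token-bucket requests report roughly the 2 seconds needed to refill one token at 0.5 tokens/sec; the denied login attempts report whatever remains of the current 60-second window, which depends on when the demo runs:

  API rate limiting (Token Bucket):
    Request 1: ALLOWED (remaining=4, retryAfter=0s)
    Request 2: ALLOWED (remaining=3, retryAfter=0s)
    Request 3: ALLOWED (remaining=2, retryAfter=0s)
    Request 4: ALLOWED (remaining=1, retryAfter=0s)
    Request 5: ALLOWED (remaining=0, retryAfter=0s)
    Request 6: DENIED (remaining=0, retryAfter≈2s)
    Request 7: DENIED (remaining=0, retryAfter≈2s)

  Login rate limiting (Fixed Window):
    Attempt 1: ALLOWED (remaining=2, retryAfter=0s)
    Attempt 2: ALLOWED (remaining=1, retryAfter=0s)
    Attempt 3: ALLOWED (remaining=0, retryAfter=0s)
    Attempt 4: DENIED (remaining=0, retryAfter=<time left in window>s)
    Attempt 5: DENIED (remaining=0, retryAfter=<time left in window>s)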

Extensibility

1. Adding Leaky Bucket Algorithm

The Leaky Bucket processes requests at a fixed rate, like water dripping from a bucket. It queues excess requests instead of rejecting them immediately.

typescript
class LeakyBucket implements RateLimitAlgorithm {
  private queues: Map<string, number[]> = new Map();
  private lastLeak: Map<string, number> = new Map();

  tryAcquire(key: string, config: RateLimitConfig): RateLimitResult {
    const now = Date.now() / 1000;
    const leakRate = config.maxRequests / config.windowSeconds;
    const burst = config.burstSize ?? config.maxRequests;

    if (!this.queues.has(key)) {
      this.queues.set(key, []);
      this.lastLeak.set(key, now);
    }
    const queue = this.queues.get(key)!;

    // Leak (process) queued requests. Advance lastLeak only by the time
    // accounted for by whole leaked requests, so fractional progress is not
    // lost when checks arrive faster than the leak interval.
    const lastLeakTime = this.lastLeak.get(key)!;
    const leaked = Math.floor((now - lastLeakTime) * leakRate);
    if (leaked > 0) {
      queue.splice(0, Math.min(leaked, queue.length));
      this.lastLeak.set(key, lastLeakTime + leaked / leakRate);
    }

    if (queue.length < burst) {
      queue.push(now);
      return {
        allowed: true,
        remaining: burst - queue.length,
        retryAfter: 0,
        limit: config.maxRequests,
      };
    } else {
      // A slot opens when the next leak fires
      const wait = this.lastLeak.get(key)! + 1.0 / leakRate - now;
      return {
        allowed: false,
        remaining: 0,
        retryAfter: Math.round(Math.max(0, wait) * 100) / 100,
        limit: config.maxRequests,
      };
    }
  }
}

Tradeoff: Leaky Bucket smooths traffic perfectly but does not allow bursts. Good for downstream services that cannot handle spikes.

2. Composite Rate Limiting (Multiple Rules Per Request)

Apply multiple rules to the same request — e.g., per-user AND per-endpoint limits:

typescript
// Added as a method on the RateLimiter class above
checkAll(
  userId: string,
  endpoint: string,
  ruleNames: string[],
): RateLimitResult {
  let tightest: RateLimitResult | null = null;
  for (const name of ruleNames) {
    const result = this.check({ userId, endpoint, ruleName: name });
    if (!result.allowed) {
      return result; // First rule that denies wins
    }
    if (tightest === null || result.remaining < tightest.remaining) {
      tightest = result; // Track the rule with the least quota left
    }
  }
  // All rules allowed; report the tightest remaining quota
  return tightest ?? { allowed: true, remaining: -1, retryAfter: 0, limit: -1 };
}

Tradeoff: Order matters. If rule A denies, rule B's counter is not incremented. This prevents double-counting but means rule B's state is not accurate.

3. Warm-Up Period

New users get a reduced rate limit that gradually increases:

typescript
interface RateLimitConfig {
  maxRequests: number;
  windowSeconds: number;
  burstSize?: number;
  warmupRequests?: number;  // lower limit for new users
  warmupDuration?: number;  // seconds before full rate kicks in
}

Tradeoff: Adds complexity to every algorithm. The algorithm must track when a key was first seen and interpolate between warmup and full limits.
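
One way the interpolation could look (a sketch; effectiveLimit and the firstSeen timestamp are hypothetical, and an algorithm would use the returned value in place of config.maxRequests):

typescript
// Hypothetical helper: ramp linearly from warmupRequests to maxRequests
// over warmupDuration, based on when the key was first seen
function effectiveLimit(
  config: RateLimitConfig,
  firstSeen: number, // seconds; when this key made its first request
  now: number, // seconds
): number {
  if (config.warmupRequests === undefined || config.warmupDuration === undefined) {
    return config.maxRequests;
  }
  const progress = Math.min(1, Math.max(0, (now - firstSeen) / config.warmupDuration));
  return Math.floor(
    config.warmupRequests + (config.maxRequests - config.warmupRequests) * progress,
  );
}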


What is Expected at Each Level

  • Junior — Implement one algorithm (usually Fixed Window). Track counts per key. May not handle retry-after or thread safety.
  • Mid-level — Strategy pattern with at least two algorithms. Clean interface. Thread-safe implementations. Explain pros/cons of each algorithm (boundary spike problem, memory usage). Return retry-after.
  • Senior — Everything above plus: all three algorithms implemented, composable rules (per-user + per-endpoint), runtime config updates, discussion of distributed rate limiting (how to extend this to Redis), analysis of Token Bucket math (refill rate, burst capacity), Sliding Window Counter as a memory optimization over Sliding Window Log.
