43 — LLD: Rate Limiter
Frequently Asked at Salesforce — Reported repeatedly in SMTS technical rounds (2024-2026).
Understanding the Problem
A rate limiter caps how many requests a given caller can make in a given window. It returns allowed=true or allowed=false with a retryAfterMs hint. The interesting bit is choosing the algorithm: fixed window, sliding window log, or token bucket, each with different burst semantics.
What is rate limiting?
It is admission control at the gate of your service. Instead of your downstream systems buckling under a traffic spike, the limiter rejects the overflow early and tells the caller when to try again. The algorithm choice is about how much burstiness you tolerate.
Requirements
Clarifying Questions
You: In-memory per node, or distributed?
Interviewer: In-memory for v1, but design the interface so distributed is swappable.
Keep a narrow interface so a Redis-backed implementation drops in later.
You: Hard reject or queue?
Interviewer: Hard reject with retry-after hint.
No queue. Return rejection atomically.
You: Limits dynamic per user?
Interviewer: Yes — free tier 60/min, paid 600/min.
Introduce a LimitResolver so the rate limiter itself does not know the business rules.
You: Per endpoint too?
Interviewer: Yes, the key should be (userId, endpoint).
Key is a composite string built by the caller.
Final Requirements
- tryAcquire(key, now) returns {allowed, retryAfterMs}.
- Three algorithm choices: fixed window, sliding window log, token bucket.
- Pluggable via a common interface.
- Thread-safe per key.
- Configurable limit and window.
- Friendly to distributed follow-up (Redis).
Out of scope: cost-based limiting (not all requests are equal), queueing, cross-region coordination.
Core Entities and Relationships
| Entity | Responsibility |
|---|---|
| RateCheck | {allowed: bool, retryAfterMs: int}. |
| RateLimiter | Interface. tryAcquire(key, now). |
| FixedWindowLimiter | Counter reset per window boundary. |
| SlidingWindowLogLimiter | Deque of timestamps; evict older than window. |
| TokenBucketLimiter | Refill over time; spend 1 per acquire. |
| LimitConfig | limit, windowMs, refillPerMs. |
| LimitResolver | Returns config for a given key. |
| RateLimiterFactory | Builds a limiter given a policy enum. |
Why a Factory? So callers do not know or care which algorithm is wired — they pass "TOKEN_BUCKET" in config.
Why LimitResolver separate? Keeps the limiter pure. The resolver is where tier lookup (free vs paid) happens.
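A minimal sketch of that split in TypeScript — the resolver's responsibility is stated above, but the tier lookup and the "userId:endpoint" key format are assumptions for illustration:

```typescript
interface LimitConfig { limit: number; windowMs: number; refillPerMs?: number; }
interface LimitResolver { resolve(key: string): LimitConfig; }

// Hypothetical tier-aware resolver: free tier 60/min, paid 600/min, per
// the requirements above. The paid-user set stands in for a real tier store.
class TierLimitResolver implements LimitResolver {
  constructor(private paidUsers: Set<string>) {}
  resolve(key: string): LimitConfig {
    const userId = key.split(":")[0]; // assumes the caller builds "userId:endpoint"
    return this.paidUsers.has(userId)
      ? { limit: 600, windowMs: 60_000 }
      : { limit: 60, windowMs: 60_000 };
  }
}
```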
Design is iterative — each algorithm is about 20 lines once the interface exists.
Class Design
RateLimiter (interface)
Methods:
tryAcquire(key, now) -> RateCheck
FixedWindowLimiter
State:
limit: int, windowMs: int
counters: Map<key, {windowStart, count}>
SlidingWindowLogLimiter
State:
limit: int, windowMs: int
log: Map<key, Deque<timestamp>>
TokenBucketLimiter
State:
capacity: int, refillPerMs: float
buckets: Map<key, {tokens, lastRefill}>
Design principles at play:
- Strategy Pattern — three algorithms behind one interface.
- Factory Pattern — RateLimiterFactory.create(policy, config).
- Open/Closed — adding a fourth algorithm (GCRA, leaky bucket) is a new class, zero edits (see the sketch below).
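To make the Open/Closed claim concrete, here is a hedged sketch of one such fourth algorithm, GCRA, against the same interface (the RateCheck/RateLimiter shapes match the TypeScript definitions later in this chapter; the tolerance formula is one common GCRA variant, not the only one):

```typescript
// GCRA keeps one number per key: the theoretical arrival time (TAT) of
// the next conforming request. No counters, no token refill math.
class GcraLimiter implements RateLimiter {
  private tat = new Map<string, number>();
  private emissionMs: number; // ideal spacing between requests

  constructor(private limit: number, windowMs: number) {
    this.emissionMs = windowMs / limit;
  }

  tryAcquire(key: string, now = Date.now()): RateCheck {
    const tat = Math.max(this.tat.get(key) ?? now, now);
    // Permit a burst of `limit` back-to-back requests, then pace at one
    // request per emissionMs — equivalent to a full token bucket.
    const toleranceMs = this.emissionMs * (this.limit - 1);
    if (tat - now > toleranceMs) {
      return { allowed: false, retryAfterMs: Math.ceil(tat - now - toleranceMs) };
    }
    this.tat.set(key, tat + this.emissionMs);
    return { allowed: true, retryAfterMs: 0 };
  }
}
```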
Implementation
Core Logic
Fixed window — cheapest but spiky at boundaries. A user at the edge can make 2*limit requests in a second if they straddle the boundary.
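A quick way to see this boundary burst, using the TypeScript FixedWindowLimiter defined later in this chapter (the timestamps are fabricated for the example):

```typescript
// limit 5 per 1000ms window; windows are [0,1000) and [1000,2000).
const fw = new FixedWindowLimiter(5, 1000);
for (let i = 0; i < 5; i++) fw.tryAcquire("alice", 999);  // all 5 allowed
for (let i = 0; i < 5; i++) fw.tryAcquire("alice", 1001); // fresh window: 5 more
// 10 requests in 2ms — double the nominal rate. Sliding log or token
// bucket would have rejected the second burst.
```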
Sliding window log — most accurate but memory is O(limit) per key. Great for small limits, bad for high-throughput endpoints.
Token bucket — allows bursts up to capacity, then smooths to refill rate. Most production APIs use this.
Verification — Token Bucket
Capacity 5, refill 1 per second.
- t=0, tryAcquire(alice) → bucket {tokens=5, lastRefill=0}. Elapsed 0. Tokens stay 5. Spend 1 → 4. Allowed.
- t=100ms, tryAcquire(alice) → elapsed 100ms, refill 0.1 → tokens 4.1. Spend 1 → 3.1. Allowed.
- Rapid 5 more calls within a second → tokens drop below 1 → allowed=false, retryAfterMs = ceil((1 - tokens) / refillPerMs).
- Wait 1s → tokens climb back; next call allowed.
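The same trace as a runnable check against the TypeScript TokenBucketLimiter defined below (capacity 5, refillPerMs 0.001, i.e., 1 token per second; the small tolerance on the retry hint absorbs floating-point noise):

```typescript
const tb = new TokenBucketLimiter(5, 0.001); // 1 token per second
console.assert(tb.tryAcquire("alice", 0).allowed);       // tokens 5 -> 4
console.assert(tb.tryAcquire("alice", 100).allowed);     // +0.1 refill, 4.1 -> 3.1
for (let i = 0; i < 3; i++) tb.tryAcquire("alice", 100); // 3.1 -> 0.1
const r = tb.tryAcquire("alice", 100);                   // 0.1 < 1 -> reject
console.assert(!r.allowed && Math.abs(r.retryAfterMs - 900) <= 1); // ~ceil((1-0.1)/0.001)
console.assert(tb.tryAcquire("alice", 1100).allowed);    // 1s later: refilled past 1
```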
Thread Safety
- Per-key entries must be read, mutated, and written atomically.
- In Java, ConcurrentHashMap.compute(key, fn) locks only that bin — clean and fast.
- Sliding log needs a per-key Deque under a lock, because while (...) arr.shift() plus push is a multi-step operation.
- Distributed version: Redis with a Lua script for atomic INCR + EXPIRE (fixed window), or ZADD + ZREMRANGEBYSCORE + ZCARD (sliding log). Lua guarantees atomicity on a single Redis node.
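For the fixed-window case, a minimal sketch of that Lua script embedded as a string in TypeScript — assuming a client such as ioredis whose eval takes (script, numKeys, ...keysAndArgs); the constant and variable names are illustrative:

```typescript
// Atomic fixed-window counter: INCR the key, set its TTL on first hit.
// KEYS[1] = rate key, ARGV[1] = window length in ms. Returns the count.
const FIXED_WINDOW_LUA = `
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('PEXPIRE', KEYS[1], ARGV[1])
end
return count
`;

// Caller side (sketch): the request is allowed if count <= limit.
// const count = await redis.eval(FIXED_WINDOW_LUA, 1, key, windowMs);
```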
Complete Code Implementation
```java
import java.util.*;
import java.util.concurrent.*;
record RateCheck(boolean allowed, long retryAfterMs) {}
interface RateLimiter { RateCheck tryAcquire(String key, long now); }
class FixedWindowLimiter implements RateLimiter {
private final long limit, windowMs;
private final ConcurrentHashMap<String, long[]> counters = new ConcurrentHashMap<>();
public FixedWindowLimiter(long limit, long windowMs) { this.limit = limit; this.windowMs = windowMs; }
public RateCheck tryAcquire(String key, long now) {
long windowStart = (now / windowMs) * windowMs;
long[] state = counters.compute(key, (k, v) -> {
// New key or new window: start a fresh counter at 1.
if (v == null || v[0] != windowStart) return new long[]{windowStart, 1};
v[1]++; // count every attempt, allowed or not, so allow/deny stays unambiguous
return v;
});
// The request fits if it is within the first `limit` attempts of this window.
if (state[1] <= limit) return new RateCheck(true, 0);
return new RateCheck(false, windowStart + windowMs - now);
}
}
class SlidingWindowLogLimiter implements RateLimiter {
private final int limit; private final long windowMs;
private final ConcurrentHashMap<String, Deque<Long>> log = new ConcurrentHashMap<>();
public SlidingWindowLogLimiter(int limit, long windowMs) { this.limit = limit; this.windowMs = windowMs; }
public RateCheck tryAcquire(String key, long now) {
Deque<Long> q = log.computeIfAbsent(key, k -> new ArrayDeque<>());
synchronized (q) {
long cutoff = now - windowMs;
while (!q.isEmpty() && q.peekFirst() <= cutoff) q.pollFirst();
if (q.size() < limit) { q.offerLast(now); return new RateCheck(true, 0); }
return new RateCheck(false, q.peekFirst() + windowMs - now);
}
}
}
class TokenBucketLimiter implements RateLimiter {
private final double capacity, refillPerMs;
private final ConcurrentHashMap<String, double[]> buckets = new ConcurrentHashMap<>();
public TokenBucketLimiter(double capacity, double refillPerMs) {
this.capacity = capacity; this.refillPerMs = refillPerMs;
}
public RateCheck tryAcquire(String key, long now) {
double[] res = new double[2];
buckets.compute(key, (k, v) -> {
double[] b = v == null ? new double[]{capacity, now} : v;
double elapsed = now - b[1];
b[0] = Math.min(capacity, b[0] + elapsed * refillPerMs);
b[1] = now;
if (b[0] >= 1) { b[0] -= 1; res[0] = 1; res[1] = 0; }
else { res[0] = 0; res[1] = (long) Math.ceil((1 - b[0]) / refillPerMs); }
return b;
});
return new RateCheck(res[0] == 1, (long) res[1]);
}
}
```

```cpp
#include <algorithm> // std::min
#include <chrono>
#include <cmath>     // std::ceil
#include <deque>
#include <mutex>
#include <string>
#include <unordered_map>
struct RateCheck { bool allowed; long long retryAfterMs; };
struct RateLimiter { virtual RateCheck tryAcquire(const std::string& key, long long now) = 0; };
class TokenBucketLimiter : public RateLimiter {
public:
TokenBucketLimiter(double cap, double r) : capacity(cap), refillPerMs(r) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
auto& b = buckets[key];
if (b.lastRefill == 0) { b.tokens = capacity; b.lastRefill = now; }
double elapsed = now - b.lastRefill;
b.tokens = std::min(capacity, b.tokens + elapsed * refillPerMs);
b.lastRefill = now;
if (b.tokens >= 1) { b.tokens -= 1; return {true, 0}; }
return {false, static_cast<long long>(std::ceil((1 - b.tokens) / refillPerMs))};
}
private:
struct Bucket { double tokens = 0; long long lastRefill = 0; };
double capacity, refillPerMs;
std::unordered_map<std::string, Bucket> buckets;
std::mutex mu;
};
class FixedWindowLimiter : public RateLimiter {
public:
FixedWindowLimiter(long long lim, long long w) : limit(lim), windowMs(w) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
long long windowStart = (now / windowMs) * windowMs;
auto& s = counters[key];
if (s.windowStart != windowStart) { s = {windowStart, 1}; return {true, 0}; }
if (s.count < limit) { s.count++; return {true, 0}; }
return {false, windowStart + windowMs - now};
}
private:
struct Counter { long long windowStart = -1; long long count = 0; };
long long limit, windowMs;
std::unordered_map<std::string, Counter> counters;
std::mutex mu;
};
class SlidingWindowLogLimiter : public RateLimiter {
public:
SlidingWindowLogLimiter(int lim, long long w) : limit(lim), windowMs(w) {}
RateCheck tryAcquire(const std::string& key, long long now) override {
std::lock_guard<std::mutex> g(mu);
auto& q = log[key];
long long cutoff = now - windowMs;
while (!q.empty() && q.front() <= cutoff) q.pop_front();
if ((int)q.size() < limit) { q.push_back(now); return {true, 0}; }
return {false, q.front() + windowMs - now};
}
private:
int limit; long long windowMs;
std::unordered_map<std::string, std::deque<long long>> log;
std::mutex mu;
};
```

```typescript
interface RateCheck { allowed: boolean; retryAfterMs: number; }
interface RateLimiter { tryAcquire(key: string, now?: number): RateCheck; }
class FixedWindowLimiter implements RateLimiter {
private counters = new Map<string, { windowStart: number; count: number }>();
constructor(private limit: number, private windowMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const entry = this.counters.get(key);
const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
if (!entry || entry.windowStart !== windowStart) {
this.counters.set(key, { windowStart, count: 1 });
return { allowed: true, retryAfterMs: 0 };
}
if (entry.count < this.limit) { entry.count += 1; return { allowed: true, retryAfterMs: 0 }; }
return { allowed: false, retryAfterMs: windowStart + this.windowMs - now };
}
}
class SlidingWindowLogLimiter implements RateLimiter {
private log = new Map<string, number[]>();
constructor(private limit: number, private windowMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const arr = this.log.get(key) ?? [];
const cutoff = now - this.windowMs;
while (arr.length && arr[0] <= cutoff) arr.shift();
if (arr.length < this.limit) { arr.push(now); this.log.set(key, arr); return { allowed: true, retryAfterMs: 0 }; }
return { allowed: false, retryAfterMs: arr[0] + this.windowMs - now };
}
}
class TokenBucketLimiter implements RateLimiter {
private buckets = new Map<string, { tokens: number; lastRefill: number }>();
constructor(private capacity: number, private refillPerMs: number) {}
tryAcquire(key: string, now = Date.now()): RateCheck {
const b = this.buckets.get(key) ?? { tokens: this.capacity, lastRefill: now };
const elapsed = now - b.lastRefill;
b.tokens = Math.min(this.capacity, b.tokens + elapsed * this.refillPerMs);
b.lastRefill = now;
if (b.tokens >= 1) { b.tokens -= 1; this.buckets.set(key, b); return { allowed: true, retryAfterMs: 0 }; }
this.buckets.set(key, b);
return { allowed: false, retryAfterMs: Math.ceil((1 - b.tokens) / this.refillPerMs) };
}
}
type Policy = "FIXED_WINDOW" | "SLIDING_LOG" | "TOKEN_BUCKET";
class RateLimiterFactory {
static create(policy: Policy, limit: number, windowMs: number): RateLimiter {
switch (policy) {
case "FIXED_WINDOW": return new FixedWindowLimiter(limit, windowMs);
case "SLIDING_LOG": return new SlidingWindowLogLimiter(limit, windowMs);
case "TOKEN_BUCKET": return new TokenBucketLimiter(limit, limit / windowMs);
}
}
}
```

Extensibility
1. "Burst allowance"
Token bucket already supports bursts via capacity greater than refillPerMs * windowMs. For fixed/sliding window, compose a second limiter with a shorter window and smaller limit — both must allow (see the composite sketch under item 3 below).
2. "Distributed across N nodes"
Move the state map to Redis. The shape of the interface survives, but tryAcquire becomes async (Promise-returning) once a network hop is involved; the Redis-backed impl uses a Lua script per call for atomicity. Local copies can optionally cache denials for a short TTL.
```typescript
// Sketch: a network-backed limiter returns Promise<RateCheck>, so it
// implements an async variant of the interface rather than RateLimiter.
class RedisTokenBucket {
  async tryAcquire(key: string, now = Date.now()): Promise<RateCheck> {
    // EVAL a script that reads the bucket, refills, decrements or rejects,
    // and writes back — all atomically on the Redis node.
    throw new Error("not implemented"); // sketch only
  }
}
```

3. "Per-IP plus per-user combined"
Compose two limiters; a request is allowed only if both allow. Return the larger retryAfterMs. Decorator pattern fits cleanly.
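A hedged sketch of that decorator (the Req and Guard shapes are illustrative; note this version consults every guard even after a rejection, so each one still records the attempt):

```typescript
type Req = { ip: string; userId: string };       // illustrative request shape
type Guard = { limiter: RateLimiter; keyOf: (req: Req) => string };

class CompositeLimiter {
  constructor(private guards: Guard[]) {}
  tryAcquire(req: Req, now = Date.now()): RateCheck {
    let allowed = true, retryAfterMs = 0;
    for (const g of this.guards) {
      const r = g.limiter.tryAcquire(g.keyOf(req), now);
      if (!r.allowed) {
        allowed = false;
        retryAfterMs = Math.max(retryAfterMs, r.retryAfterMs); // larger hint wins
      }
    }
    return { allowed, retryAfterMs };
  }
}

// Usage sketch: per-IP and per-user token buckets combined.
// const combined = new CompositeLimiter([
//   { limiter: new TokenBucketLimiter(100, 0.1), keyOf: r => `ip:${r.ip}` },
//   { limiter: new TokenBucketLimiter(600, 0.01), keyOf: r => `user:${r.userId}` },
// ]);
```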
What is Expected at Each Level
| Level | Expectations |
|---|---|
| Mid | One algorithm, in-memory, no retry-after. |
| Senior / SMTS | Three strategies, clean interface, retry-after, thread-safe per key, mentions distributed option. |
| Staff | Exact semantics of sliding-window-counter approximation, cost of Redis round-trip, fallback on Redis outage, GCRA or leaky bucket variants. |