06 — Concurrency: Correctness Primitives ​
Why Concurrency Shows Up in LLD Interviews ​
Open any senior LLD problem and trace the data for thirty seconds. You will almost always find a contended resource: a seat in a booking system, a token bucket in a rate limiter, a counter in a cache, a stock level in inventory, a locker slot, a payment ledger row, a leader-election flag. The interviewer is not quizzing you on pthread APIs — they want to see whether you notice the race before they point it out, and whether you reach for the smallest tool that fixes it.
Production backends do not run single-threaded. A Spring service has a thread pool. A Go service has goroutines. A Node service has async callbacks interleaving on one thread but often fronted by multiple worker processes sharing a Redis or a database. Every design you produce in an interview is implicitly going to live in that environment. "Single-threaded, won't think about it" is rarely a correct answer past SDE-1.
The senior framing is different from the junior one. A junior candidate often reaches for synchronized as a reflex. A senior candidate asks, in order:
- Can I make this stateless so the question doesn't arise?
- If not, can the state live outside the process (DB row, Redis key) so the concurrency story becomes "whatever that store gives me"?
- If not, which tiny piece of in-process state is actually shared, and what is the smallest primitive that makes it correct?
This note is the vocabulary for those conversations. It covers the primitives interviewers expect you to name out loud, the tradeoff tables you should be able to draw on a whiteboard, and the scripts for the moments an interviewer says "can this deadlock?" or "what happens under contention?"
Scope note
This file is about correctness: mutexes, atomics, thread-safety categories, deadlock, idempotency. The coordination side — producer/consumer queues, barriers, semaphores for resource pools, futures/promises, actor model — lives in file 07. The two form a pair. Don't pull Producer/Consumer examples into this note; it's the canonical coordination problem, saved for next.
How interviewers probe concurrency in LLD
The probes are rarely "implement a concurrent hashmap." They look like:
- "What if two users click 'book seat A1' at the same millisecond?" — seat booking.
- "What happens when the same webhook fires twice because the provider retried?" — idempotency.
- "How do you handle the cache stampede when the key expires?" — coordination + locks.
- "If one request is decrementing inventory and another is reading it, what could go wrong?" — read/write consistency.
- "Can this deadlock?" — lock-order walkthrough.
Each of these has a two-sentence answer you should have pre-canned. The rest of this document is those answers, with the vocabulary to defend them.
Threads vs Processes ​
A thread is a schedulable unit that shares an address space with its siblings. A process is a schedulable unit that owns its address space. Everything downstream — how expensive the primitives are, what a race even means, what "shared state" refers to — falls out of that one distinction.
| Axis | Thread | Process |
|---|---|---|
| Address space | Shared with siblings | Isolated |
| Creation cost | Microseconds | Milliseconds (fork) or hundreds of ms (new JVM/Node) |
| IPC cost | Direct memory read/write | Pipe, socket, shared memory, queue |
| Crash blast radius | Whole process dies | Just that process |
| Default concurrency model | Java, C++, Rust, Python (with GIL caveat) | Node cluster, Python multiprocessing, Nginx worker model |
| Correctness burden | High — every shared mutable field is suspect | Lower — shared-nothing by default |
"Why not just use a process per request?" At low scale, you can. Classic CGI did exactly this. The reason it does not survive modern load is that each request pays the full cost of process creation and re-establishing every connection pool, cache, config load, JIT warmup, and TLS session. A 1 ms handler gets wrapped in 30 ms of startup. The economics reverse only when the per-request work is expensive enough (ML inference, video transcoding) or the isolation is worth paying for (sandboxing untrusted code, e.g. Lambda firecracker VMs, Cloudflare isolates).
In an LLD interview, the thread/process split matters when the interviewer asks "how does this scale across machines?" You answer by naming where the state lives. If it lives in process memory, horizontal scale means moving it out (to Redis, a DB, a queue). If it lives in a shared store already, horizontal scale is mostly a load-balancer problem.
Shared State — The Core Problem ​
Concurrency bugs almost always reduce to one sentence: two threads mutated the same memory and neither knew about the other. The canonical example is a counter:
```typescript
class Counter {
  private count = 0;

  increment(): void {
    this.count = this.count + 1; // NOT atomic
  }
}
```

The reason this breaks is that this.count = this.count + 1 is not one operation at the machine level. It decomposes into roughly:
```
1. LOAD  count -> register R
2. ADD   R, 1  -> register R
3. STORE R     -> count
```

Two threads running this in lockstep can both read 5, both compute 6, both write 6. One increment is lost. This is a data race: concurrent access to the same location, at least one access a write, with no ordering between them. In C++, Go, and (unsafe) Rust a data race is formally undefined behavior; in Java the memory model keeps it defined but deliberately weak. Either way, the compiler and CPU are free to cache and reorder as if no other thread were watching, which means seemingly impossible states become possible (torn reads, stale caches, instructions reordered across what looked like a boundary).
The fix is to make the read-modify-write atomic — either by wrapping it in a lock, or by using a hardware atomic like AtomicInteger.incrementAndGet(), or by making the counter not-shared in the first place (one counter per thread, merged later). Which one you pick is a tradeoff question, and that's the rest of this document.
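A minimal Java sketch of the two in-process fixes named above — a CAS-backed counter and the one-counter-per-thread variant (java.util.concurrent's LongAdder); the class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class RequestCounters {
    private final AtomicInteger atomic = new AtomicInteger();  // one shared cell, CAS retry under the hood
    private final LongAdder perThread = new LongAdder();       // per-thread cells, summed on read

    void record() {
        atomic.incrementAndGet();  // indivisible read-modify-write: no lost updates
        perThread.increment();     // cheaper under heavy contention; sum() is a moment-in-time estimate
    }

    long atomicTotal()    { return atomic.get(); }
    long perThreadTotal() { return perThread.sum(); }
}
```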
The interview signal
When you spot shared mutable state in your own design, say it out loud before the interviewer does: "This field is read by both the booking path and the cleanup path — let me think about how to serialize that." That single sentence moves you a level up in the interviewer's scorecard.
Stateless Design Wins Where It Can ​
Before reaching for any lock, ask: does this method need to read or write shared state at all? A pure function — same input, same output, no side effects — is thread-safe by construction. No locks, no atomics, no contention.
Three techniques push problems from "shared mutable" to "not a concurrency problem":
1. Pure functions. A price calculator that takes a cart and a discount table and returns a total has no state to share. Ten threads can call it in parallel and the only "contention" is on the inputs, which are already owned by the caller.
2. Thread-local state. When a piece of state is expensive to construct but only used inside one request, give each thread its own copy.
```java
private static final ThreadLocal<SimpleDateFormat> FORMATTER =
    ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));
```

SimpleDateFormat is notoriously not thread-safe. A junior fix is to synchronize every call. The senior fix is to give each thread its own formatter — no sharing means no race. Same trick applies to Random, scratch buffers in a hot loop, per-request tracing context.
3. Idempotent APIs. Make createOrder(id) safe to call twice with the same id. Then if a retry happens because of a timeout, the second call observes "already created" and returns the same result. The state is still shared — but the concurrency semantics become trivial because the second caller can't do damage. More on this below under Idempotency.
The senior framing to verbalize in an interview: "My first instinct is to see how much of this I can make stateless — the state I can't eliminate is the state I actually need to protect." That's a strong opener before any lock discussion.
Thread Safety — A Taxonomy ​
Every class in your design falls into one of five categories. A senior candidate can classify a class instantly and defend the choice.
| Category | What it means | LLD example | When to use |
|---|---|---|---|
| Immutable | State cannot change after construction. Every thread sees the same bytes. | Money(amount, currency), a Config loaded at startup, a Selection {start, end} value object. | Value objects, configuration snapshots, keys. Default choice. |
| Synchronized / thread-safe | Mutable state, but all access is serialized via lock, atomic, or lock-free structure. | ConcurrentHashMap, a rate-limiter class with internal synchronized methods. | Shared mutable collections, counters, caches. |
| Thread-confined | Mutable, but only ever touched by one thread. Ownership is the contract. | Swing components (EDT only), a request-scoped context object. | When you can enforce ownership by policy. |
| Thread-local | Mutable, but each thread has its own copy. | ThreadLocal<SimpleDateFormat>, per-thread DB connection in a pool. | Expensive-to-construct, per-thread-scoped objects. |
| Not thread-safe | Mutable, no protection. Caller must provide it. | ArrayList, HashMap, most plain domain entities. | Internal state that's already protected by an enclosing synchronized class. |
The mistake to avoid: letting a not-thread-safe object leak into a synchronized class without saying so. If a RateLimiter has a synchronized method that returns its internal HashMap, the returned reference lives outside the lock. The interviewer will catch this.
The rule: classify every field. If a field's category doesn't match the class's category (e.g. a HashMap inside a class claiming to be thread-safe), you have a bug or an undocumented caller contract.
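For the first row of the table, a minimal immutable value object in Java — a sketch; the field and class names are illustrative:

```java
import java.math.BigDecimal;
import java.util.Objects;

final class Money {                   // final class: no subclass can sneak in mutable state
    private final BigDecimal amount;  // final fields, set once in the constructor
    private final String currency;

    Money(BigDecimal amount, String currency) {
        this.amount = Objects.requireNonNull(amount);
        this.currency = Objects.requireNonNull(currency);
    }

    // "Mutation" returns a new instance; the original can be shared across threads freely.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(amount.add(other.amount), currency);
    }
}
```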
Mutex / Locks ​
A mutex is the oldest, crudest, most widely understood primitive. One thread holds it, everyone else waits. The interview conversation is almost never "do we need a lock" — it's "what is this lock protecting, and how granular should it be?"
Coarse-grained vs fine-grained ​
Coarse-grained: one lock per object. Simple, obviously correct, and a scalability ceiling.
```typescript
class Cache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly lock = new Mutex();

  async get(k: K): Promise<V | undefined> {
    return this.lock.withLock(() => this.map.get(k));
  }

  async put(k: K, v: V): Promise<void> {
    return this.lock.withLock(() => { this.map.set(k, v); });
  }
}
```

Two threads doing get("a") and get("b") serialize on the same lock even though they touch disjoint keys. Under load, most of your thread pool is waiting on this.lock and CPU utilization looks fine while throughput flatlines.
Fine-grained (lock striping): split the lock into N shards. Each key hashes to one shard; unrelated keys don't block each other.
```typescript
class StripedCache<K, V> {
  private readonly stripes: { map: Map<K, V>; lock: Mutex }[];

  constructor(private readonly stripeCount = 16) {
    this.stripes = Array.from({ length: stripeCount }, () => ({
      map: new Map(),
      lock: new Mutex(),
    }));
  }

  private stripeFor(k: K) {
    return this.stripes[Math.abs(hash(k)) % this.stripeCount];
  }

  async get(k: K): Promise<V | undefined> {
    const s = this.stripeFor(k);
    return s.lock.withLock(() => s.map.get(k));
  }
}
```

ConcurrentHashMap in Java is essentially this pattern taken further (segment-level in Java 7, per-bucket CAS in Java 8+).
| Granularity | Throughput under contention | Memory overhead | Code complexity | Cross-shard operations |
|---|---|---|---|---|
| One lock (coarse) | Low — all ops serialize | Minimal | Trivial | Easy — everything is under the lock |
| N stripes (fine) | ~N× better if hash distributes well | N locks + per-stripe structures | Medium | Hard — a "clear all" needs N locks in order |
| Per-key lock | Near-linear for uncontended keys | One lock per entry | High | Very hard, easy to deadlock |
| Lock-free (CAS-based) | Highest | Low | High, easy to get wrong | Requires algorithm redesign |
Rule of thumb: start coarse. Measure. Move to striping when you have evidence the lock is the bottleneck. Per-key locking and lock-free designs are a cost most LLDs don't justify, but naming them shows range.
Reentrancy ​
A reentrant lock lets the same thread acquire it multiple times without deadlocking itself. The lock tracks a holder and a recursion count.
```java
ReentrantLock lock = new ReentrantLock();

void outer() {
  lock.lock();
  try {
    inner(); // calls lock.lock() again — fine, count goes to 2
  } finally {
    lock.unlock(); // count back to 1
  }
}

void inner() {
  lock.lock();
  try { ... }
  finally { lock.unlock(); }
}
```

Java's synchronized and ReentrantLock are both reentrant. Many other languages' primitives are not. std::mutex in C++ is non-reentrant — calling lock() twice on the same thread is undefined behavior; you need std::recursive_mutex. Go's sync.Mutex is non-reentrant by design and the Go team considers that a feature — they argue reentrancy hides design bugs.
The surprise: in a non-reentrant world, a publicMethodA that takes the lock and then calls publicMethodB on the same object (which also takes the lock) deadlocks the caller against itself. The usual fix is to split into a public lock-taking entry point and a private already-locked helper:
```java
public void transfer(Account from, Account to, Money amount) {
  lock.lock();
  try {
    transferLocked(from, to, amount);
  } finally {
    lock.unlock();
  }
}

private void transferLocked(Account from, Account to, Money amount) {
  // callers must hold `lock`; no re-acquire here
  ...
}
```

In an interview, if you pick a reentrant lock, say so and why (API ergonomics — public methods can call each other). If you pick non-reentrant, say why (forces cleaner separation between locked helpers and entry points).
Read-Write Locks ​
A read-write lock allows either one writer or many readers, never both. The win is real when readers massively outnumber writers — configuration caches, routing tables, permission lookups, leaderboards read hundreds of times per write.
```java
ReadWriteLock rw = new ReentrantReadWriteLock();

V get(K k) {
  rw.readLock().lock();
  try { return map.get(k); }
  finally { rw.readLock().unlock(); }
}

void put(K k, V v) {
  rw.writeLock().lock();
  try { map.put(k, v); }
  finally { rw.writeLock().unlock(); }
}
```

The trap: RW locks are not a free upgrade over mutexes. They have higher overhead per acquisition (more state to track: reader count, writer flag, possibly fairness queue), and under write-biased load they can perform worse than a plain mutex.
| Workload | Mutex | RW lock |
|---|---|---|
| 99% read, 1% write | Bottleneck — readers needlessly serialize behind one lock | Huge win — readers parallelize |
| 50/50 read/write | Comparable, maybe mutex edges it out | Worse than mutex (writer starvation risk + overhead) |
| Write-heavy | Works fine | Worse than mutex |
| Short critical sections | Mutex simpler, faster | RW overhead may eat the savings |
| Very long critical sections | Holds up everyone | Readers overlap, genuine win |
Rule of thumb: RW locks pay off when the read:write ratio is roughly 10:1 or better and the critical section is long enough that the extra overhead is noise. Below that, a mutex is simpler and often faster.
Starvation hazard: unfair RW lock implementations let a steady stream of readers starve a waiting writer indefinitely. Most real RW locks ship in both fair and unfair modes. Fair = writer arrives, future readers block until writer runs; costs some throughput, prevents starvation.
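In Java the fair/unfair choice is a constructor flag on ReentrantReadWriteLock; a minimal illustration (the wrapper class is mine, not a library type):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockFairness {
    // Unfair (default): best throughput, but a steady stream of readers can starve a queued writer.
    final ReentrantReadWriteLock unfair = new ReentrantReadWriteLock();

    // Fair: once a writer is waiting, newly arriving readers queue behind it — slower, no starvation.
    final ReentrantReadWriteLock fair = new ReentrantReadWriteLock(true);
}
```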
LLD example. A rate limiter with a per-tenant config (rules, limits, allowlist). Reads happen on every request. Writes happen when an operator updates config, which is rare. RW lock is the right call — reads parallelize, the occasional config update briefly blocks.
Anti-example. A cache with a hot key rewritten on every request (LRU recency updates, hit counters). Every read also mutates state, so reads effectively behave as writes. Mutex or lock-free structure beats RW here.
Compare-And-Swap (CAS) and Atomics ​
CAS is a hardware-level instruction: "atomically, if memory location M still equals expected E, replace it with new value N, and tell me whether you succeeded." Every modern CPU has it (lock cmpxchg on x86, ldrex/strex on ARM). On top of CAS you build atomic counters, lock-free queues, and the guts of ConcurrentHashMap.
The usage pattern is a retry loop:
```java
AtomicInteger seats = new AtomicInteger(100);

boolean reserve() {
  while (true) {
    int current = seats.get();
    if (current == 0) return false;
    if (seats.compareAndSet(current, current - 1)) return true;
    // CAS failed — somebody else reserved. Loop and read again.
  }
}
```

The attractive property: no thread ever blocks. If a contender wins, the loser just re-reads and tries again. Under low-to-medium contention this crushes a mutex — no OS-level waiting, no context switch.
Under heavy contention, the retry loop becomes wasteful. Many threads read, all try to CAS, most fail, all read again. You burn CPU making no progress. Above some threshold (workload-dependent, typically dozens of contenders) a mutex with a proper wait queue becomes competitive again.
| Tool | Best at | Fails at |
|---|---|---|
| CAS / atomics | Short critical sections, low-to-medium contention, simple state (counter, flag, single pointer) | Complex multi-field invariants, high contention |
| Mutex | Arbitrary critical sections, moderate contention, fairness guarantees | Very short sections (overhead dominates), very high contention with long holds |
| RW lock | Read-heavy with long reads | Short reads, write-heavy |
| Lock-free data structure | Highest throughput, predictable latency | Complexity, ABA, memory reclamation |
The ABA problem ​
CAS checks value equality, not "hasn't been touched." Consider a lock-free stack with a top pointer. Thread 1 reads top = A, gets descheduled. Thread 2 pops A, pops B, pushes A back (maybe reusing the same object, or a recycled address). Thread 1 wakes, CAS's top from A to what-it-thinks-is-A's-next. The CAS succeeds — the value is A again — but the list structure has changed under it. Data corruption.
Fixes:
- Tagged pointers / version counters. Pair the pointer with a monotonic counter. CAS the whole pair. Every modification bumps the counter, so "A-then-back-to-A" is now "A#5" vs "A#7" and the CAS fails (sketched after this list).
- Hazard pointers / epoch-based reclamation. Never reuse memory that some other thread might still be holding a pointer into. Bigger hammer, whole-algorithm concern.
- In GC'd languages. A removed node can't be recycled while another thread still references it, which kills the classic ABA for free — but a logical ABA can still occur if the algorithm re-inserts the same object.
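Java packages the version-counter fix as AtomicStampedReference; a minimal sketch (the wrapper class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicStampedReference;

class VersionedRef<T> {
    // The reference is paired with an int stamp; every successful swap bumps the stamp,
    // so "A, then back to A" no longer matches — the stamp has moved on.
    private final AtomicStampedReference<T> ref = new AtomicStampedReference<>(null, 0);

    boolean swap(T expected, T next) {
        int[] stampHolder = new int[1];
        T current = ref.get(stampHolder);  // reads value and stamp together
        if (current != expected) return false;
        return ref.compareAndSet(expected, next, stampHolder[0], stampHolder[0] + 1);
    }
}
```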
When to reach for atomics in an LLD interview ​
- Counters (AtomicInteger, AtomicLong) — request counts, active connections, token bucket levels.
- Flags that flip once (AtomicBoolean, or a simple volatile boolean for visibility-only cases).
- Single-pointer swaps (AtomicReference) — snapshot swap, lock-free singleton.
- First-thread-wins races — compareAndSet(null, value) is the idiomatic way to install something exactly once (a minimal sketch follows this list).
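A minimal sketch of the first-thread-wins idiom from the last bullet (names are illustrative, not from any specific library):

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

class InstallOnce<T> {
    private final AtomicReference<T> slot = new AtomicReference<>();

    // Whoever lands compareAndSet(null, candidate) first wins; every later caller sees the winner's value.
    T getOrInstall(Supplier<T> factory) {
        T existing = slot.get();
        if (existing != null) return existing;
        T candidate = factory.get();                 // may be built by several racers...
        return slot.compareAndSet(null, candidate)
                ? candidate
                : slot.get();                        // ...but only one is ever published
    }
}
```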
Don't invent lock-free data structures on the whiteboard. Name them, know that Java's ConcurrentHashMap, Go's sync/atomic, and Rust's std::sync::atomic give you the pieces, and move on.
Memory visibility — the quieter half of atomicity ​
Atomicity is "the read-modify-write happens indivisibly." Visibility is a separate property: "when one thread writes a value, when do other threads see it?" Naively you'd assume "immediately." In practice, CPUs have per-core caches and store buffers, compilers reorder loads and stores, and the answer is "when something forces a synchronization."
```java
class WorkerFlag {
  boolean done = false; // PROBLEM: non-volatile

  void signal() { done = true; }
  void waitFor() { while (!done) {} } // may loop forever
}
```

The spinning thread may cache done in a register and never re-read it. Adding volatile (Java), atomic<bool> with the right ordering (C++), or going through a mutex gives you the visibility guarantee. This is why "I just need a flag, a plain boolean is fine" is a trap. Name it in the interview: "I'll mark it volatile so the other thread is guaranteed to observe the write."
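The fixed version is one keyword away — a sketch; Thread.onSpinWait() is a Java 9+ busy-wait hint and is safe to omit:

```java
class WorkerFlag {
    volatile boolean done = false;   // volatile: the write in signal() is guaranteed visible to the reader

    void signal() { done = true; }

    void waitFor() {
        while (!done) {
            Thread.onSpinWait();     // the volatile read can't be hoisted out of the loop
        }
    }
}
```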
TypeScript / Node note ​
JavaScript is single-threaded per event loop, so within one isolate you rarely need atomics — two lines of synchronous code can't race with each other. The exceptions:
- SharedArrayBuffer + Atomics — real shared memory across workers.
- Across Node cluster workers or processes, "atomicity" means "in Redis" (INCR, SETNX), not JS atomics.
- Interleavings between awaits are a concurrency hazard (one request's await lets another request's handler run); they behave like cooperative multitasking on a single thread. A classic bug: if (!cache.has(k)) { const v = await load(k); cache.set(k, v); } — two concurrent callers both see "not in cache," both load, last write wins. Textbook check-then-act race even in a single-threaded runtime.
Deadlock ​
Deadlock is the failure mode where two or more threads each hold a resource the others need and none can make progress. Once deadlocked, the system stays deadlocked until external intervention (timeout, restart, thread dump + kill). It is not a crash — the process is "healthy" and unresponsive, which is often worse because alerts don't fire.
The four conditions ​
Coffman's four conditions. A deadlock requires all four to hold simultaneously. Break any one, the deadlock cannot occur.
| # | Condition | What it means | How you'd break it |
|---|---|---|---|
| 1 | Mutual exclusion | A resource is held exclusively; others wait. | Make the resource shareable (e.g. read-only, CAS-based). Often not possible — the resource is genuinely exclusive. |
| 2 | Hold and wait | A thread holding one resource blocks trying to acquire another. | Acquire all needed resources at once (tryLock everything or back out). |
| 3 | No preemption | A held resource can only be released by the holder voluntarily. | Use tryLock with timeout; release on failure. |
| 4 | Circular wait | There's a cycle in the wait-for graph (T1 waits on T2 waits on T1). | Impose a global ordering on resources; acquire in order. |
Of these four, circular wait is the one you actually control in code. Breaking mutual exclusion requires rethinking what the resource is. Breaking hold-and-wait and no-preemption pushes you toward tryLock-with-backoff, which introduces livelock risk. Breaking circular wait is almost free and is the default professional discipline.
Prevention techniques ​
Lock ordering. If every thread that needs locks A and B acquires A first, then B, no cycle can form. In an account transfer, sort the two accounts by ID:
```java
void transfer(Account a, Account b, Money amount) {
  Account first = a.id < b.id ? a : b;
  Account second = a.id < b.id ? b : a;
  synchronized (first) {
    synchronized (second) {
      a.debit(amount);
      b.credit(amount);
    }
  }
}
```

Without the sort, transfer(x, y) and transfer(y, x) running concurrently deadlock. With the sort, both grab the lower-id account first. Simple, bulletproof, the answer for almost every "can this deadlock?" question in an interview.
tryLock with timeout. Acquire with a deadline. On failure, release everything you hold and retry from scratch.
```java
boolean transfer(Account a, Account b, Money amount) {
  while (true) {
    if (a.lock.tryLock(50, MILLISECONDS)) {
      try {
        if (b.lock.tryLock(50, MILLISECONDS)) {
          try { a.debit(amount); b.credit(amount); return true; }
          finally { b.lock.unlock(); }
        }
      } finally { a.lock.unlock(); }
    }
    // backoff
  }
}
```

Breaks the "no preemption" condition. The catch: without randomized backoff, two threads can repeatedly grab-and-release in lockstep — that's livelock, see next section.
Lock timeouts + assertions. In production, hang a monitoring wrapper that logs any lock held more than N seconds. Not prevention, but you learn about your deadlocks instead of waking up to an outage.
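There is no standard library class for this — a sketch of what such a wrapper could look like, with the threshold and logging deliberately crude:

```java
import java.util.concurrent.locks.ReentrantLock;

class MonitoredLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long warnAfterMillis;

    MonitoredLock(long warnAfterMillis) { this.warnAfterMillis = warnAfterMillis; }

    void withLock(Runnable criticalSection) {
        lock.lock();
        long start = System.nanoTime();
        try {
            criticalSection.run();
        } finally {
            long heldMillis = (System.nanoTime() - start) / 1_000_000;
            lock.unlock();
            if (heldMillis > warnAfterMillis) {  // suspiciously long hold: surface it for later diagnosis
                System.err.printf("lock held %d ms by %s%n", heldMillis, Thread.currentThread().getName());
            }
        }
    }
}
```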
Avoid nested locks entirely. If a design requires holding two locks, ask whether a different decomposition would need one.
Interview script: "Can this deadlock?" ​
When the interviewer asks, walk them through the four conditions out loud. This is a set piece — have it ready.
"Let me check the four conditions. Mutual exclusion — yes, these locks are exclusive. Hold-and-wait — yes, in method
transferI hold lock A and ask for lock B. No preemption — yes, I'm usingsynchronized, no timeout. So all we need for deadlock is circular wait.Can circular wait happen here?
transfer(X, Y)locks X then Y. If another thread callstransfer(Y, X)concurrently, it locks Y then waits for X, while the first waits for Y. That's a cycle — so yes, this can deadlock.I'd fix it by imposing a lock order. Sort the two accounts by ID and always acquire the lower-ID lock first. That breaks circular wait, and the other three conditions are fine to leave as they are."
The interviewer is listening for three things: (1) you named the four conditions, (2) you identified which one applies, (3) you picked the cheapest fix rather than escalating to tryLock-and-retry.
Starvation and Livelock ​
Deadlock's cousins. Less famous, equally production-lethal.
Starvation is when a thread makes no progress because other threads keep beating it to a resource. The thread is runnable, the lock is repeatedly available — but every time it becomes available, someone else grabs it first.
- Unfair locks. Most default mutex implementations are unfair (barging). A thread releasing the lock is more likely to re-acquire it than a queued waiter, because the releasing thread is hot in cache. Throughput is higher; tail latency is worse. If one low-priority writer is queued behind a torrent of readers on an RW lock, it may wait forever.
- Priority inversion. A low-priority thread holds a lock; a high-priority thread waits for it; a medium-priority thread is CPU-bound and preempts the low-priority holder. High-priority waits, low-priority can't run, medium-priority has effectively inverted the priority order. The Mars Pathfinder bug. Fixes: priority inheritance (holder temporarily adopts the waiter's priority), priority ceiling protocols.
Livelock is when threads are actively running but none make useful progress, because they keep reacting to each other.
- Two tryLock-with-retry threads that repeatedly acquire-then-release in lockstep.
- Two conflict-resolution routines that each back off to let the other proceed — and both back off simultaneously, forever.
- Two hallway-walkers who keep side-stepping in the same direction.
Fix for livelock: randomized backoff. Each retry waits a random amount of time; the symmetry that caused the lockstep is broken. Exponential backoff with jitter is the production default.
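A minimal sketch of exponential backoff with full jitter (the base and cap values are illustrative):

```java
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {
    // attempt 0, 1, 2... -> random sleep in [0, min(cap, base * 2^attempt)]
    static long nextDelayMillis(int attempt) {
        long base = 10;
        long cap = 1_000;
        long ceiling = Math.min(cap, base << Math.min(attempt, 20)); // shift guard avoids overflow
        return ThreadLocalRandom.current().nextLong(ceiling + 1);    // the randomness is what breaks lockstep
    }
}
```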
LLD example — starvation. A connection pool with an unfair semaphore and a thread that keeps re-requesting connections in a tight loop. Other threads that request infrequently are constantly out-competed. The fix is a fair semaphore (FIFO queue).
LLD example — livelock. Two replicas trying to become leader: each sees the other's heartbeat, demotes itself, sees the other demote, re-promotes, sees the other re-promote... forever. Fix: randomized election timeouts (the Raft answer).
In an interview, "fair vs unfair" is a phrase to have ready when discussing locks. Default unfair, flip to fair when you see a starvation risk.
Idempotency ​
Idempotency is the property that running an operation once has the same effect as running it N times. It is the single most important correctness tool in distributed systems, and it shows up in LLD whenever the interviewer mentions retries, message processing, webhooks, or at-least-once delivery.
Why it matters: networks lose replies. The client sends chargeCard($50), the server succeeds, the ACK is dropped, the client retries. Without idempotency, you charged the card twice. With idempotency, the second call observes "already charged, request ID X" and returns the first result.
The idempotency key pattern ​
The client generates a unique key per logical operation and sends it with the request. The server stores a mapping: key → (result, status). On retry with the same key, the server returns the stored result without re-executing.
```typescript
interface PaymentRequest {
  idempotencyKey: string; // UUID the client mints once, reuses on retries
  amount: Money;
  accountId: string;
}

async function charge(req: PaymentRequest): Promise<PaymentResult> {
  const existing = await store.getByKey(req.idempotencyKey);
  if (existing) return existing.result;

  const result = await processCharge(req);
  await store.save(req.idempotencyKey, result, { ttl: "24h" });
  return result;
}
```

This has two subtle concurrency issues worth naming:
- Between the get and the save, two retries can both read "nothing stored" and both execute the charge. Fix: INSERT ... ON CONFLICT at the DB level, or SETNX in Redis, to make the "reserve this key" step atomic. Whoever wins the insert processes; the loser reads the winner's result (an in-process sketch of this shape follows below).
- In-flight retries. A retry arrives while the first call is still processing. Options: (a) block the retry until the first completes, (b) return a "still processing, retry later" response, (c) trust at-least-once semantics and let both run with a lower-level serialization (row lock on the account). Stripe's approach is (b) for most endpoints.
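An in-process analogue of that reserve-then-compute shape, assuming a single JVM — across processes the putIfAbsent becomes the SETNX or unique-constraint insert described above; the class and method names are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class IdempotentExecutor<R> {
    private final ConcurrentHashMap<String, CompletableFuture<R>> byKey = new ConcurrentHashMap<>();

    R execute(String idempotencyKey, Supplier<R> work) {
        CompletableFuture<R> mine = new CompletableFuture<>();
        CompletableFuture<R> winner = byKey.putIfAbsent(idempotencyKey, mine); // atomic "reserve this key"
        if (winner != null) {
            return winner.join();               // lost the race — wait for and reuse the winner's result
        }
        try {
            R result = work.get();
            mine.complete(result);
            return result;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            byKey.remove(idempotencyKey, mine); // a failed attempt doesn't poison the key; a retry can re-reserve
            throw e;
        }
    }
}
```

As a side effect this also gives you option (a) for in-flight retries: the second caller blocks on join() until the first attempt finishes.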
Request-ID design ​
Two levels of identifier to keep separate:
- Request ID / trace ID — unique per HTTP call. Changes on retry. Used for logging and tracing.
- Idempotency key — stable across retries of the same logical operation. Used for deduplication.
Conflating them is a common junior mistake. If the "ID" changes when the client retries, you have a request ID, not an idempotency key, and you haven't made the operation idempotent.
Natural vs manufactured idempotency ​
Some operations are naturally idempotent. setBalance(100) is — running it twice still leaves the balance at 100. debit(50) is not — running it twice leaves it $50 lower than intended. You can manufacture idempotency for any operation by layering a key check on top.
| Operation | Natural? | Fix if not |
|---|---|---|
| PUT /user/123 { ... } | Yes | — |
| POST /charge { amount } | No | Idempotency key |
| DELETE /user/123 | Yes (idempotent by HTTP spec; second call = 404 or 204) | — |
| incrementCounter() | No | Operation-ID check |
| sendEmail(to, body) | No | Dedup key on (to, hash(body), recentWindow) |
| createOrder(items) | No | Client-supplied order key |
LLD examples ​
Payment service. Idempotency key is non-negotiable. PCI and financial correctness require it. Store (key, result) for 24 hours minimum.
Notification service. Same email or push should not fire twice because of a retry. Key on (user, notification type, payload hash, time window). Cheaper failure: miss a notification. More expensive failure: double-send.
Message processing. Kafka is at-least-once; SQS is at-least-once. Your consumer must be idempotent or you will process the same message twice. Usually done by keeping a "processed message IDs" set (bounded, e.g. last 100k) or by making the downstream effect naturally idempotent.
Inventory decrement. decrement(productId, quantity) is not idempotent. You handle it by either (a) keying on an order ID and checking "has this order already decremented" before applying, or (b) moving to event-sourced inventory where each decrement has its own immutable event and replaying applies each event once.
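A sketch of option (a), keyed on the order ID. In production the "already applied" check would be a unique-keyed row committed in the same transaction as the stock update, not a JVM set; the names here are illustrative:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class InventoryLevel {
    private final AtomicInteger stock = new AtomicInteger(100);
    private final Set<String> appliedOrders = ConcurrentHashMap.newKeySet();

    // Returns true if stock is (or already was) decremented for this order; false if stock ran out.
    boolean decrementForOrder(String orderId, int quantity) {
        if (!appliedOrders.add(orderId)) return true;  // retry of an already-applied order: no-op
        while (true) {
            int current = stock.get();
            if (current < quantity) {
                appliedOrders.remove(orderId);         // release the claim so a later attempt can retry
                return false;
            }
            if (stock.compareAndSet(current, current - quantity)) return true;
        }
    }
}
```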
In the interview: when you design any write API, the sentence "I'd make this endpoint idempotent with a client-supplied key stored for 24 hours" is a senior flag. Don't wait to be asked.
Reentrancy (Beyond Locks) ​
Reentrancy has a second meaning beyond "can the same thread re-acquire this lock." A reentrant function is one that's safe to call recursively, including from a callback invoked in the middle of its own execution.
The classic failure mode is the observer pattern:
```typescript
class EventBus {
  private listeners: Listener[] = [];
  private firing = false;

  addListener(l: Listener): void {
    this.listeners.push(l);
  }

  fire(event: Event): void {
    this.firing = true;
    for (const l of this.listeners) {
      l.handle(event); // BUG: what if handle() calls addListener()?
    }
    this.firing = false;
  }
}
```

If a listener's handle calls bus.addListener(newL), we're mutating the array mid-iteration. In Java you get ConcurrentModificationException. In JavaScript the behavior is defined but weird (the new element may or may not be visited in this iteration). In either case the programmer almost certainly did not intend it.
Fixes:
- Copy-on-iterate. Snapshot the list before iterating; new additions land in the next-fire's list: for (const l of [...this.listeners]) l.handle(event);
- Queue mutations. During firing, append to a pending list; drain pending into listeners after firing.
- Forbid reentrant fires. Assert !this.firing at entry and throw or drop.
The same shape appears in lots of LLD designs:
- A cache whose value-loader calls back into the cache for a related key. If the loader holds the cache's lock and tries to reacquire it, you'll hit reentrancy (fine under a reentrant lock, deadlock otherwise). If the loader triggers an eviction, you're mutating the cache mid-put.
- A state machine whose onEnter(state) transitions to another state. Usually solved by queueing the transition instead of executing it synchronously.
- A DOM / UI event handler that triggers another event (click → update → second click handler). React's reconciler has explicit handling for this (batched updates).
Ask yourself in every public method: could this, directly or via a callback, re-enter my own class? If yes, name the policy — "reentrant, supported" or "not reentrant, guarded by a flag/assertion" or "queued." Silent reentrancy bugs are among the harder ones to reproduce.
The Race-Detection Checklist ​
A five-question checklist to apply to every class you design in an interview. Say these out loud. Interviewers love it.
Is this state shared? If only one thread ever touches this field, stop. No concurrency question. Make the confinement explicit in a comment or a @GuardedBy("this thread") annotation.

Is it mutable? If immutable, you're done — publish safely (final in Java, readonly in TypeScript, ownership transfer in Rust) and share freely. The overwhelming majority of "thread-safety" in well-designed systems is achieved by immutability, not locking.

Is it mutated under a lock (or atomic, or in a lock-free structure)? Every write path and every read path that depends on consistent multi-field state must go through the same protection. A single unprotected writer ruins the scheme.
Is the lock granularity right for the workload? One lock per class is fine until you have evidence of contention. Stripe only when you need to. Don't per-key-lock unless you have a specific reason.
What happens under 1000× concurrency? Mentally (or on the whiteboard) pile 1000 threads onto the hottest path. Where do they queue? What's the tail latency? Is there a starvation risk? A deadlock opportunity? A retry storm?
A senior candidate routinely applies this to their own designs before the interviewer asks. You're not just answering "is this thread-safe" — you're demonstrating the habit of mind.
Language Notes ​
You should know the default concurrency model of the language you're interviewing in. One paragraph per big four:
Java. Threads map 1:1 to OS threads (until virtual threads / Project Loom, which flip the default for many use cases). Primitives in java.util.concurrent: synchronized, ReentrantLock, ReadWriteLock, Semaphore, CountDownLatch, CyclicBarrier, CompletableFuture, ExecutorService, ConcurrentHashMap, atomics in java.util.concurrent.atomic. Memory model is defined by the JMM — volatile gives visibility + ordering (not atomicity of compound ops), synchronized and Lock give both. Default lock fairness is unfair; ReentrantLock(true) flips it. This is the most concurrency-aware mainstream language by convention, and interviewers expect you to wield the vocabulary.
Go. Lightweight goroutines, M:N scheduled over OS threads by the runtime. Idiom is "share memory by communicating" — channels first, mutexes when channels don't fit. sync.Mutex (non-reentrant by design), sync.RWMutex, sync.WaitGroup, sync/atomic, context.Context for cancellation propagation. The race detector (go test -race) is excellent and production-usable. Goroutine leaks are the equivalent of memory leaks and come up in interviews (always pair a goroutine with a cancel path).
Rust. Threads are OS threads; async is cooperative, runtime-of-your-choice (tokio dominant). The distinctive feature is compile-time race prevention: Send (type can be moved across threads), Sync (type can be shared across threads). Arc<Mutex<T>> is the textbook "shared mutable state across threads." RwLock, atomic::*, channels in std::sync::mpsc and crossbeam. Data races are a compile error. Higher-level races (deadlock, logic races, starvation) are still your problem.
Node.js / TypeScript. Single-threaded event loop per process. Worker threads + SharedArrayBuffer + Atomics exist but are uncommon. Most "concurrency" you discuss in a Node LLD is really about interleaving of awaits on one thread — no data races inside a module, but plenty of race-condition-like bugs (check-then-act across an await). For multi-process scale (cluster, PM2, Kubernetes pods), state moves to Redis / Postgres and the concurrency discussion becomes about DB transactions, advisory locks, and distributed coordination. In a Node interview, naming async/await's cooperative nature and the SharedArrayBuffer + Atomics escape hatch signals you understand the model.
Python. The GIL means CPython threads don't run Python bytecode in parallel, though they do release the GIL around I/O — so thread pools still help for I/O-bound work. CPU-bound parallelism goes through multiprocessing or native extensions that release the GIL (NumPy, PyTorch). threading.Lock, threading.RLock (reentrant), asyncio for cooperative concurrency, concurrent.futures as the high-level executor. When someone asks you to design something concurrent in Python, the first senior question is "is this CPU-bound or I/O-bound?" — the answer dictates threads vs processes vs asyncio.
The interview point is not to recite these — it's to show that when you pick a primitive, you know what it costs in the runtime you're targeting. "I'd use a ConcurrentHashMap" is fine in Java; "I'd use a ConcurrentHashMap" in Go is a red flag.
What to Take to the Interview ​
- A mental taxonomy of state (immutable / synchronized / thread-local / confined / not-safe) and the discipline to classify every field.
- A script for deadlock that walks through Coffman's four conditions and lands on lock ordering.
- The idempotency sentence ("I'd make this endpoint idempotent with a client-supplied key stored for N hours") as a default reflex for any write API.
- A contention tradeoff table for coarse vs striped vs per-key vs lock-free, and the judgment to start coarse unless you have a reason.
- The five-question race checklist applied proactively, not just when asked.
- Enough language-specific vocabulary to make your primitive choices credible in the language you're interviewing in.
Coordination primitives — the other half of concurrency, where you stop defending shared state and start orchestrating workers — are file 07.