
07 - Concurrency: Coordination Primitives

Correctness vs Coordination

File 06 covered the primitives that keep you out of trouble when threads touch the same state: mutexes, read-write locks, atomics, memory ordering. That vocabulary is about correctness: preventing torn reads, lost updates, and visibility bugs. It's defensive work. You hold a lock so that nothing bad happens while you're in the critical section.

This file is about the other half of concurrency: coordination. Locks don't get work done in parallel; they stop it. Coordination primitives are how threads cooperate to produce throughput. A semaphore admits N workers into a pool. A blocking queue hands off work between a fast producer and a slow consumer. A thread pool keeps a bounded fleet of workers busy. A future lets one thread fan out work and collect the result later.

The distinction that signals seniority

Junior candidates reach for synchronized / mutex.lock() for every concurrency problem, because that's the tool they learned first. Senior candidates pick the coordination primitive that matches the shape of the problem. "Limit concurrent DB connections" isn't a mutex problem; it's a semaphore. "Fan out work to workers" isn't a lock problem; it's a queue + pool. Using a mutex where a semaphore fits (or vice versa) is a vocabulary tell. The interviewer hears "I know this one tool" instead of "I know the toolbox."

The mental split to internalize:

| Concern | Primitives | What they answer |
| --- | --- | --- |
| Correctness | Mutex, RWLock, atomic, memory barriers | "How do I safely mutate shared state?" |
| Coordination | Semaphore, blocking queue, thread pool, future, condition variable | "How do threads hand off work and wait for each other?" |
| Scheduling | Executor, scheduler, delay queue, cron | "When and where does this work run?" |

Every LLD problem that gets past the happy path needs both halves. A rate limiter needs a lock and a semaphore. An order processor needs idempotent writes and a bounded queue. The senior reflex is to name which primitive solves which concern separately, instead of reaching for one hammer.


Semaphore

What it is

A semaphore is a counter with two atomic operations: acquire() (decrement; block if the count would go below zero) and release() (increment; wake a waiter if any). The count represents how many permits are available: how many workers are allowed into the guarded resource right now.

java
Semaphore permits = new Semaphore(10); // 10 permits

void handleRequest() throws InterruptedException {
  permits.acquire();      // blocks if all 10 are in use
  try {
    callDownstream();
  } finally {
    permits.release();    // always return the permit
  }
}

A mutex is a semaphore with a permit count of exactly 1, but that framing buries the distinguishing feature: a semaphore has no owner. Any thread can call release() regardless of whether it was the one that called acquire(). That's the property that makes it a coordination primitive rather than a mutual-exclusion one.

Binary semaphore vs counting semaphore

| Variant | Permit count | Typical use | How it differs from a mutex |
| --- | --- | --- | --- |
| Binary semaphore | 1 | Signaling between threads (thread A finishes, thread B proceeds). | No ownership: the releaser doesn't have to be the acquirer. Use when you're signaling, not protecting state. |
| Counting semaphore | N > 1 | Bounded resource pool (connections, workers, API quotas). | Lets N threads in concurrently. Mutex tops out at 1, the wrong tool for resource-pool problems. |
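
A semaphore used purely as a signal, matching the binary-semaphore row above: a minimal sketch in which thread B blocks until thread A says "go", and the releaser is deliberately not the acquirer (the produceResult / consumeResult helpers are hypothetical).

java
import java.util.concurrent.Semaphore;

Semaphore ready = new Semaphore(0);   // zero permits: nobody may proceed yet

// Thread B: waits for the signal
void consumer() throws InterruptedException {
  ready.acquire();                    // blocks until thread A releases a permit
  consumeResult();                    // hypothetical follow-up work
}

// Thread A: finishes its work, then signals
void producer() {
  produceResult();                    // hypothetical setup work
  ready.release();                    // wakes B; note A never called acquire()
}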

LLD use cases

  1. Limit concurrent downstream calls. Your service fans out to a DB that melts above 20 concurrent queries. Wrap the call path in Semaphore(20). Excess callers queue; none of them kill the DB.
  2. Resource pool (connections, file handles, GPUs). The pool size is the permit count. acquire() hands you a connection; release() returns it. Classic.
  3. Bulkhead pattern. Partition your thread budget so that one slow dependency can't starve another. Dependency A gets 20 permits, B gets 20, C gets 10; if C hangs, A and B keep serving. (See the sketch after this list.)
  4. Rate-limit fan-out. Processing 10k notifications but the SMS provider caps you at 100 concurrent sends? Counting semaphore with 100 permits, acquired inside each worker before the send call.
  5. Admission control on expensive operations. PDF rendering, video transcoding, ML inference: work that consumes a lot of memory. Semaphore gates how many run at once so the process doesn't OOM under a traffic spike.
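
A minimal sketch of the bulkhead idea from item 3, using only JDK types; the dependency names and permit counts are made up, and the downstream call is passed in as a Callable:

java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class Bulkheads {
  // One permit budget per downstream dependency
  private final Map<String, Semaphore> budgets = Map.of(
      "payments", new Semaphore(20),
      "search",   new Semaphore(20),
      "email",    new Semaphore(10));

  <T> T call(String dependency, Callable<T> downstreamCall) throws Exception {
    Semaphore budget = budgets.get(dependency);
    // Bounded wait instead of blocking forever: fail fast when this bulkhead is saturated
    if (!budget.tryAcquire(100, TimeUnit.MILLISECONDS)) {
      throw new IllegalStateException("bulkhead full for " + dependency);
    }
    try {
      return downstreamCall.call();   // the actual downstream RPC / query
    } finally {
      budget.release();               // the permit goes back even if the call threw
    }
  }
}

A hung "email" dependency can exhaust at most its own 10 permits; callers of "payments" and "search" are unaffected.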

The mistake people make

The most common error is reaching for a semaphore to guard mutable state:

java
// WRONG: semaphore to protect a shared counter
Semaphore s = new Semaphore(1);
s.acquire();
counter++;
s.release();

This works, but it abuses the primitive. The intent here is mutual exclusion, so use a mutex. Readers of your code who see a semaphore are looking for a resource count, not a critical section. Using a binary semaphore as a mutex hides the ownership expectation: any thread can release it, so if your code has a bug where one thread acquires and a different thread releases, a mutex would catch it and a semaphore won't.

The reverse mistake is using a mutex where concurrency should be allowed:

java
// WRONG: mutex when 10 concurrent workers are fine
synchronized void callDb() { ... }  // serializes every caller!

You wanted to protect the DB from more than 10 concurrent queries; you accidentally protected it from more than one. Throughput tanks.

Rule of thumb: if the answer to "how many threads should be inside at once?" is literally 1, use a mutex. If it's a tunable number, use a semaphore.


Producer-Consumer

The pattern

One or more producer threads generate work items; one or more consumer threads pull them off and process them. Between them sits a queue. That's it. The pattern is fossilized into every language's standard library because it's the backbone of every asynchronous pipeline ever built.

[Producer] --put()--> [ Queue ] --take()--> [Consumer]
[Producer] --put()------^                   [Consumer]
                                            [Consumer]

Why it decouples speed

The producer and consumer run at different rates. An HTTP handler can accept 10k requests per second but your email provider tops out at 500/s. Without the queue, either the handler blocks on the email call (request latency balloons) or the email provider melts. With a queue, the handler drops the work item and returns in 10ms; consumers drain the queue at whatever rate the provider supports.

Decoupling isn't just about speed; it's about isolation of failure domains. If the email provider goes down, consumers block or retry, but the handler keeps accepting requests (up to the queue's capacity). The blast radius of a downstream failure stops at the queue boundary.
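
A minimal sketch of the pattern with a bounded ArrayBlockingQueue; sendEmail stands in for the hypothetical slow provider call:

java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class EmailPipeline {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000); // bounded on purpose

  // Producer side: called from the request handler, returns almost immediately
  void submit(String emailPayload) throws InterruptedException {
    queue.put(emailPayload);          // blocks (backpressure) only when the queue is full
  }

  // Consumer side: run by one or more worker threads
  void drainLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      String payload = queue.take();  // blocks while the queue is empty
      sendEmail(payload);             // drains at whatever rate the provider supports
    }
  }

  private void sendEmail(String payload) { /* hypothetical provider call, ~500/s total */ }
}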

Bounded vs unbounded

Every producer-consumer design has to choose a queue capacity. The wrong default kills you in production.

| Choice | Behavior | When it bites |
| --- | --- | --- |
| Unbounded (default in most libs!) | Producer never blocks. Queue grows until the process OOMs. | Consumer gets slow → queue grows unboundedly → eventually the JVM dies. You lose everything in the queue, not just the slow item. |
| Bounded | Producer blocks (or rejects) when full. Applies backpressure upstream. | Producer latency gets worse under load, but that's the correct signal. Load shedding beats cascading failure. |

The senior default is bounded. An unbounded queue is an unexercised memory leak: the leak only materializes when something downstream slows down, which is exactly when you can least afford it. A bounded queue forces you to answer "what happens when this is full?" before production asks the question at 3am.

java
// DANGEROUS: default LinkedBlockingQueue is unbounded
ExecutorService pool = Executors.newFixedThreadPool(10);

// BETTER: bounded queue + explicit rejection policy
ExecutorService pool = new ThreadPoolExecutor(
  10, 10, 0, TimeUnit.MILLISECONDS,
  new ArrayBlockingQueue<>(1000),        // bounded
  new ThreadPoolExecutor.CallerRunsPolicy() // backpressure
);

Real LLD problems where it shows up

  • Notification service fan-out. One event β†’ push notification, email, SMS, in-app. Each channel has its own queue and workers. A slow SMS provider doesn't stall the email channel.
  • Order processing pipeline. Place order β†’ validate β†’ charge β†’ allocate inventory β†’ notify. Each stage is a queue + workers. Inventory allocation can fall behind at peak without dropping orders.
  • Log / metric ingestion. Application threads drop log events into a bounded in-process queue. A background thread batches and ships them. Application latency stays predictable; the logger is allowed to fall behind or drop.
  • Webhook delivery. Incoming webhook β†’ enqueue β†’ deliver. Decouples the receive path (must be fast) from the delivery path (retries, backoff, dead-letter).
  • Image/video processing. Upload β†’ enqueue β†’ transcode workers. The upload handler returns immediately with a job ID; the client polls or subscribes.

Blocking Queue

The BlockingQueue is the implementation detail that makes producer-consumer pleasant. Without it, you're juggling a queue plus a mutex plus two condition variables plus "full" and "empty" predicates, all of which you'll get subtly wrong at 11pm. The blocking queue bundles them.

The shape of the API everyone expects:

java
interface BlockingQueue<T> {
  void put(T item) throws InterruptedException;   // blocks if full
  T    take()     throws InterruptedException;    // blocks if empty

  boolean offer(T item, long timeout, TimeUnit unit);  // bounded wait, returns false on timeout
  T       poll(long timeout, TimeUnit unit);           // bounded wait, returns null on timeout

  boolean offer(T item);  // non-blocking: returns false if full
  T       poll();         // non-blocking: returns null if empty
}

Why this is cleaner than wait() / notify() by hand:

| Hand-rolled | BlockingQueue |
| --- | --- |
| You forget notifyAll vs notify and wake the wrong thread. | Correct signalling is baked in. |
| You re-check the predicate in a loop, or forget to and ship a spurious-wakeup bug. | put/take handle the loop for you. |
| Timeouts require extra bookkeeping. | offer / poll have timeout overloads. |
| Adding a second producer means another round of review. | No producer-count assumption in the API. |

Common implementations and when to reach for each:

| Implementation | Backing structure | When to pick it |
| --- | --- | --- |
| ArrayBlockingQueue | Fixed-size ring buffer. | Bounded capacity fixed at construction. Single lock: simpler, slightly lower throughput under heavy contention. The default senior choice. |
| LinkedBlockingQueue | Linked list. Optional capacity. | Watch out: unbounded by default! If you use it, always set a capacity. Dual-lock (head & tail) means higher throughput at high concurrency. |
| SynchronousQueue | Zero capacity; direct handoff. | Each put waits for a take. Used under the hood by Executors.newCachedThreadPool. Pure coordination, no buffering. |
| PriorityBlockingQueue | Heap. | Delayed/prioritized jobs. Unbounded; beware. |
| DelayQueue | Heap keyed on delay timestamp. | "Do X at time T" scheduling. Cornerstone of the seat-lock-timeout problem below. |

TypeScript note: Node doesn't have a built-in blocking queue because the event loop is single-threaded. The equivalent pattern is an async queue guarded by a Promise-based semaphore: conceptually identical, just awaited instead of blocked on. Libraries like p-queue or async.queue implement it.


Condition Variables

When they beat plain mutexes

A mutex answers "can I enter the critical section?" A condition variable answers a different question: "the section is empty; wake me when the state becomes X." It's the primitive you need when a thread has to wait for something to become true, not just wait its turn.

The canonical scenario is any "wait until available" pattern: bounded buffer full/empty, connection pool empty, latch not yet counted down, shutdown not yet signalled.

java
ReentrantLock lock = new ReentrantLock();
Condition notEmpty = lock.newCondition();
Condition notFull  = lock.newCondition();
Queue<T> buf = new ArrayDeque<>();
int capacity = 100;

void put(T item) throws InterruptedException {
  lock.lock();
  try {
    while (buf.size() == capacity) notFull.await();  // note: WHILE, not IF
    buf.add(item);
    notEmpty.signal();
  } finally { lock.unlock(); }
}

T take() throws InterruptedException {
  lock.lock();
  try {
    while (buf.isEmpty()) notEmpty.await();
    T item = buf.poll();
    notFull.signal();
    return item;
  } finally { lock.unlock(); }
}

Two condition variables, not one. If you used a single condition and called signal(), the wakeup could land on the wrong kind of waiter: a producer might be woken when only a consumer can make progress, and the signal is wasted. signalAll() works but wakes every waiter every time (a thundering herd). Separate conditions per predicate is the clean design.

Spurious wakeups: why you always re-check the predicate in a loop

Condition.await() is allowed to return without any corresponding signal(). This is not a bug: the OS and JVM are permitted to wake threads for implementation reasons (signal delivery, kernel event coalescing). Your code must assume any wakeup is meaningless unless the predicate actually holds.

java
// WRONG
if (buf.isEmpty()) notEmpty.await();  // spurious wakeup → you read from an empty buffer
T item = buf.poll();                   // returns null, crashes downstream

// RIGHT
while (buf.isEmpty()) notEmpty.await();
T item = buf.poll();

Every await / wait call in production code should be inside a while (!predicate) loop. Treating await as if it were "sleep until signalled" is the single most common bug with condition variables, and a senior interviewer will catch it the instant they see if (...) wait().

The same applies in POSIX (pthread_cond_wait), C# (Monitor.Wait), Python (Condition.wait), and every other implementation. Spurious wakeups are a language-agnostic reality.


Thread Pools

Why not spawn a thread per task

"Just start a thread" feels simple until you look at what it costs:

| Cost | Magnitude | Why it matters |
| --- | --- | --- |
| Thread creation | ~1 ms, several MB stack allocation | Spawning per request ties request latency to thread-creation latency. |
| OS thread limit | ~10k–50k per process (ulimit, kernel scheduler) | A request spike → fork-bomb → OutOfMemoryError: unable to create native thread. |
| Context-switch overhead | ~1–10 μs per switch | 10k threads all runnable on 8 cores means the CPU spends most of its time switching, not working. |
| Kernel memory | ~8 KB per thread (kernel stack + TCB) | Thousands of idle threads still cost memory. |
| GC pressure (Java) | Each thread's stack is a GC root | More threads → longer GC pauses. |

A thread pool amortizes the creation cost across many tasks, caps the total thread count, and replaces "create a thread" with "submit a task." The pool itself is a producer-consumer system: submitters are producers, workers are consumers, the queue is the task queue.
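
A minimal sketch of that trade: submit a task instead of creating a thread, and touch the result through a Future (expensiveChecksum and payload are hypothetical):

java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService pool = Executors.newFixedThreadPool(8);   // 8 reusable workers, created once

// submit() returns immediately; the task runs on one of the pooled workers
Future<Integer> checksum = pool.submit(() -> expensiveChecksum(payload));

Integer result = checksum.get();   // block only at the point the caller actually needs the value
pool.shutdown();                   // stop accepting new tasks; queued work still completes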

Pool sizing rules of thumb

The two rules you'll be asked for:

| Workload shape | Sweet spot | Rationale |
| --- | --- | --- |
| CPU-bound (image transcode, compression, math) | ~N_cores (often N_cores + 1) | More workers than cores just context-switches. One extra absorbs an occasional page fault / blocking syscall. |
| IO-bound (DB calls, HTTP calls, disk) | N_cores × (1 + wait_time / compute_time), often 50–200 | Workers sit blocked on IO; more workers → higher utilization of each core. Bounded by downstream tolerance and memory. |
| Mixed | Split into two pools | One CPU-bound pool, one IO-bound pool. Prevents IO-heavy tasks from starving CPU-heavy ones. |

The formula for IO-bound is Little's Law in disguise. If each task waits 90ms and computes for 10ms, one core can usefully run 10 such tasks in parallel, so 8 cores → ~80 workers. Memorize the shape; don't worry about the exact numbers in an interview.

Senior move: always pair pool size with a bounded queue and an explicit rejection policy. Pool size alone doesn't bound memory: a pool of 10 with an unbounded queue is still an OOM waiting to happen.

Rejection policies

When the queue is full and every worker is busy, the pool has to decide what happens to the new task:

| Policy | Behavior | When it's right |
| --- | --- | --- |
| Abort (throw) | Caller gets RejectedExecutionException. | Default for most cases. Forces the caller to make the decision: retry, log-and-drop, surface 503. |
| Caller-runs | The submitting thread runs the task itself. | Applies natural backpressure: the producer gets slow, so upstream gets slow. Great for ingress paths that have their own thread pool upstream (e.g., an HTTP server). |
| Discard | Silently drop the new task. | Only for work where loss is acceptable (metrics, sampled logs). Dangerous default: hides overload. |
| Discard-oldest | Drop the head of the queue, then enqueue the new task. | LIFO semantics for time-sensitive work (live telemetry where stale is useless). Niche. |

The one people get wrong: silently defaulting to discard. The symptoms look like "work just... didn't happen" with no error log. Always pick a policy explicitly and make the failure loud, even if it's just log.error("task rejected") in a custom handler.
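
A minimal sketch of a loud handler, using only JDK types, that logs the overload and then rethrows so the caller still has to decide:

java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.logging.Logger;

class LoudRejectHandler implements RejectedExecutionHandler {
  private static final Logger log = Logger.getLogger("pool");

  @Override
  public void rejectedExecution(Runnable task, ThreadPoolExecutor pool) {
    // Make overload visible before failing: the symptom should never be silence
    log.severe("task rejected: queued=" + pool.getQueue().size()
        + " active=" + pool.getActiveCount());
    throw new RejectedExecutionException("pool saturated");
  }
}

Pass it as the last constructor argument of ThreadPoolExecutor, in place of CallerRunsPolicy in the earlier snippet, when you want rejection to be an explicit error rather than backpressure.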

Executor frameworks

| Platform | Model | Notes |
| --- | --- | --- |
| Java ExecutorService | Pool of OS threads + queue + policy. | The reference implementation everyone else is compared against. ThreadPoolExecutor exposes every knob; Executors.newFixedThreadPool hides them (and ships an unbounded queue, so be careful). |
| Java 21 virtual threads | User-mode "green" threads multiplexed over carrier OS threads. | Changes the rules for IO-bound. Spawn a virtual thread per request; the JVM parks them cheaply on IO. Thread-pool sizing for IO becomes less critical. Not a drop-in for CPU-bound; you still need a carrier pool. |
| Go goroutines | M:N scheduler. go f() is a scheduler-managed green thread. | The runtime multiplexes goroutines over GOMAXPROCS OS threads. You don't size a "pool"; you lean on goroutines and use channels for backpressure. The equivalent of pool size becomes channel capacity + semaphore. |
| Node.js worker_threads | Actual OS threads, each with its own V8 isolate + event loop. | Heavy. Use for CPU-bound offload, not per-request. The main event loop is the "pool" for IO. |
| Python concurrent.futures | ThreadPoolExecutor (IO only; the GIL blocks CPU parallelism) or ProcessPoolExecutor (real parallelism, process overhead). | The GIL means the thread pool is IO-only. For CPU-bound work, use a process pool or native extensions. |
| Async / coroutine runtimes (Tokio, asyncio) | Task scheduler over a thread pool. | Tasks are cooperatively scheduled; blocking syscalls must go through runtime-aware APIs or you starve the pool. |

The senior takeaway: the platform dictates the idiom, but the concepts are identical. "Bounded workers + backpressure queue + rejection" is the universal shape whether you call them goroutines, virtual threads, or workers.


Futures / Promises

Synchronous vs asynchronous composition

A future (Java) / promise (JS) is a handle to a result that will exist later. get() / await blocks until it does. That's the easy part.

The interesting part is composition. The moment you have more than one future, you need to decide how they combine:

| Combinator | Meaning |
| --- | --- |
| all(f1, f2, f3) | Wait for all. First failure fails the whole thing. |
| allSettled | Wait for all, whether they succeed or fail. Returns the outcomes. |
| any(f1, f2, f3) | First success wins. All failures → aggregate failure. |
| race(f1, f2, f3) | First settled wins, success or failure. Useful with timeouts. |
| select(ch1, ch2) (Go) | Wait on multiple channels at once; first ready wins. |
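
The same combinators exist on the JVM. A minimal CompletableFuture sketch of fan-out-then-gather, assuming hypothetical fetchUser / fetchOrders / fetchCredits calls and an id in scope:

java
import java.util.List;
import java.util.concurrent.CompletableFuture;

CompletableFuture<String> user    = CompletableFuture.supplyAsync(() -> fetchUser(id));
CompletableFuture<String> orders  = CompletableFuture.supplyAsync(() -> fetchOrders(id));
CompletableFuture<String> credits = CompletableFuture.supplyAsync(() -> fetchCredits(id));

// allOf waits for every future; join() afterwards just reads the already-completed values
CompletableFuture.allOf(user, orders, credits).join();
List<String> profile = List.of(user.join(), orders.join(), credits.join());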

Why await isn't enough: sometimes you want explicit composition

typescript
// sequential: 600ms total
const user    = await getUser(id);       // 200ms
const orders  = await getOrders(id);     // 200ms
const credits = await getCredits(id);    // 200ms

// concurrent: 200ms total
const [user, orders, credits] = await Promise.all([
  getUser(id), getOrders(id), getCredits(id)
]);

The first version serializes three independent requests. In an interview, writing the first when the second is correct is a tell that you're thinking of await as "wait here" instead of "resolve this particular handle." Senior candidates reach for explicit composition when the dependency graph is wider than one.

Timeout via race:

typescript
Promise.race([
  fetch(url),
  new Promise((_, rej) => setTimeout(() => rej(new Error("timeout")), 5000))
])

Every external call should have a timeout. The race pattern is the composable way to add one.

Cancellation

"Can this be cancelled?" is the senior question on any async API. An in-flight future whose caller has given up should stop doing work; otherwise you're paying for results nobody will read.

Language-specific idioms all converge on "pass a cancellation token":

| Platform | Pattern |
| --- | --- |
| Go | context.Context, the first parameter of every blocking function. ctx.Done() returns a channel that closes on cancel. |
| .NET | CancellationToken, passed through call chains, polled by cooperative code. |
| JS | AbortSignal (from AbortController), passed to fetch, timers, etc. |
| Java | Future.cancel(mayInterrupt) + Thread.interrupt(); cooperative, so the task must check Thread.currentThread().isInterrupted(). |
| Python asyncio | task.cancel() raises CancelledError at the next await point. |

Cancellation is cooperative, not preemptive. You can't actually force a thread to stop mid-computation, not safely; Java's Thread.stop tried and was deprecated because it broke invariants everywhere. What you can do is signal intent and let the task check. Which means:

  1. Long-running tasks must periodically check the cancellation signal.
  2. Blocking IO must be cancellation-aware (e.g., InterruptibleChannel, AbortSignal-aware fetch).
  3. Cancellation propagates: if A waits on B, cancelling A should cancel B.

Interviewers will ask "what if the user closes the tab while this is running?" If your answer is "uh, the promise resolves and we throw away the result," you've wasted work. The right answer is "we propagate the abort signal down through the call chain and each stage bails early."
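
A minimal Java sketch of cooperative cancellation: the worker polls the interrupt flag between units of work, and the caller cancels through the Future (loadChunks and processChunk are hypothetical):

java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

ExecutorService pool = Executors.newSingleThreadExecutor();
List<String> chunks = loadChunks();          // hypothetical: work split into small units

Future<?> job = pool.submit(() -> {
  for (String chunk : chunks) {
    if (Thread.currentThread().isInterrupted()) {
      return;                                // caller gave up: bail early instead of finishing
    }
    processChunk(chunk);                     // hypothetical unit of work, short enough to poll often
  }
});

// Caller side, e.g. the user closed the tab or the request timed out:
job.cancel(true);   // true = interrupt the worker thread, which sets the flag polled above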


Async Task Execution

Fire-and-forget

"Kick off work and don't wait for it" is dangerous when it's wrong and fine when it's right. The axis is whether the caller's correctness depends on the work completing.

| Case | Fire-and-forget? | Why |
| --- | --- | --- |
| Emitting a metric counter | ✅ Fine | If the metric is lost, you lose observability, not correctness. |
| Updating a non-critical cache | ✅ Fine | Cache miss on next read is self-healing. |
| Writing an audit log | ⚠️ Only if audit has a backup path | Regulators will ask. |
| Sending a payment side-effect | ❌ Bug | If the process crashes before it runs, money is wrong. |
| Decrementing inventory after order | ❌ Bug | Oversell. |
| Sending a confirmation email | ❌ Bug | Customer never sees it, support tickets flood in. |

The senior framing: if losing the work would require a human to investigate, it's not fire-and-forget. It's a durable async job: it needs a queue that survives process death (DB, Redis, SQS, Kafka), and a consumer with retry semantics.

A common junior pattern:

typescript
// WRONG: handler returns 200 but the email may never send
app.post("/order", async (req, res) => {
  const order = await createOrder(req.body);
  sendConfirmationEmail(order);    // <-- no await, no queue, no retry
  res.json({ id: order.id });
});

If sendConfirmationEmail throws after the response, the error is unhandled and the email is lost. The fix is to enqueue into a durable job system and let a worker handle retries:

typescript
app.post("/order", async (req, res) => {
  const order = await createOrder(req.body);
  await jobs.enqueue("send-email", { orderId: order.id });  // durable
  res.json({ id: order.id });
});

Background job patterns

For work that must happen later or repeatedly:

| Pattern | Mechanism | LLD example |
| --- | --- | --- |
| Scheduled executor | ScheduledExecutorService.schedule(task, delay) / setTimeout | "Send reminder 24h before event." |
| Delay queue | DelayQueue<T>: tasks dequeue when their delay elapses. | "Release the seat lock 5 min after booking started if not confirmed." |
| Cron / periodic | scheduleAtFixedRate / cron / K8s CronJob | "Recompute leaderboard every 10 min." |
| Durable job queue | SQS, Redis queue, Sidekiq, Temporal | Multi-process, survives restarts, at-least-once. |
| Workflow engine | Temporal, Airflow, Step Functions | Multi-step sagas with checkpoints. |

The seat-lock example from the movie-booking problem: when a user starts booking, the seat is soft-locked. If they don't confirm in 5 minutes, the lock must be released. Three implementations, in order of sophistication:

  1. Poll-based sweep. A cron job runs every minute, scans for expired locks, releases them. Simple, up to 1min latency, survives restart (lock state is in DB).
  2. In-process DelayQueue. On lock creation, enqueue a release-task with a 5min delay. Single worker thread dequeues as delays expire. Low latency, lost on process restart; you'd back it with a DB sweep anyway.
  3. Durable delayed job. SQS with DelaySeconds, or Redis ZADD with score = expiry timestamp + poller. Survives restart, scales across processes. The right answer for production.

Interviewers are happy with (1) or (2) if you explain the tradeoff and name (3) as the production path.
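
A minimal sketch of option (2): an in-process DelayQueue of release tasks drained by one background thread. The seatLockStore is a hypothetical stand-in for the DB-backed lock state.

java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class SeatLockRelease implements Delayed {
  final String seatId;
  final long releaseAtMillis;

  SeatLockRelease(String seatId, long holdMillis) {
    this.seatId = seatId;
    this.releaseAtMillis = System.currentTimeMillis() + holdMillis;
  }

  @Override public long getDelay(TimeUnit unit) {
    return unit.convert(releaseAtMillis - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
  }
  @Override public int compareTo(Delayed other) {
    return Long.compare(getDelay(TimeUnit.MILLISECONDS), other.getDelay(TimeUnit.MILLISECONDS));
  }
}

DelayQueue<SeatLockRelease> expiries = new DelayQueue<>();

// On lock creation: expiries.put(new SeatLockRelease(seatId, 5 * 60_000));

// Single background worker: take() blocks until the earliest delay has elapsed
void sweepLoop() throws InterruptedException {
  while (true) {
    SeatLockRelease expired = expiries.take();
    seatLockStore.releaseIfStillUnconfirmed(expired.seatId);  // hypothetical DB-backed release
  }
}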

At-least-once vs at-most-once vs exactly-once

The delivery-semantics taxonomy is a senior-interview staple:

| Semantics | Meaning | How you achieve it | Cost |
| --- | --- | --- | --- |
| At-most-once | Message delivered 0 or 1 times. Loss is possible. | Fire-and-forget, no retry. | Simplest. Losses on crash / network error. |
| At-least-once | Message delivered ≥ 1 time. Duplicates possible. | Ack after processing. Retry on no-ack. | Consumer must be idempotent. |
| Exactly-once | Delivered exactly 1 time. | (Mostly) a lie in distributed systems. Requires transactional coordination across queue + consumer. | Expensive or not really what's offered (Kafka's "exactly-once" is scoped within Kafka). |

Exactly-once is usually a lie

Any real production system you build at scale will be at-least-once plus idempotent consumers, because that's what you can actually achieve without pinning your infrastructure to one vendor's special transaction. The framing in an LLD interview: "I'll build at-least-once delivery and design each consumer to be idempotent via a unique message ID / dedup key, which gives us effective exactly-once semantics."

Idempotency in practice means the consumer stores a dedup key (message ID, idempotency key) and checks it before acting:

java
void handle(Message m) {
  if (!dedupStore.claim(m.id)) return;  // already processed: drop
  doWork(m);
  dedupStore.markDone(m.id);
}

The dedup store is a DB table with a unique key, or Redis with SET NX EX. The claim+doWork+markDone sequence has its own subtleties (what if we crash between claim and markDone? → background cleanup), but the core pattern is idempotency keys + dedup lookup. Any LLD discussion of async task delivery should land here.
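
A minimal sketch of claim() backed by a unique key, assuming plain JDBC, a Postgres-style now(), and a processed_messages table whose primary key is message_id; real code would inspect the SQLException to distinguish a duplicate key from other failures:

java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Returns true only for the first claim of this message id; the unique constraint makes it atomic
boolean claim(Connection db, String messageId) {
  try (PreparedStatement ps = db.prepareStatement(
      "INSERT INTO processed_messages (message_id, status, claimed_at) VALUES (?, 'CLAIMED', now())")) {
    ps.setString(1, messageId);
    ps.executeUpdate();
    return true;                       // insert succeeded: we own this message
  } catch (SQLException e) {
    return false;                      // assumed unique-key violation: someone already claimed it
  }
}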


Putting It Together: A Worked Example

Problem: Design a notification dispatcher that handles ~1M events/day with per-user rate limits (e.g., no more than 5 notifications per user per minute).

The input is an event stream. For each event, we look up the user's notification preferences, check the rate limit, and send via one or more channels (push, email, SMS). Each channel has its own throughput ceiling.

Primitives in play:

  1. Bounded blocking queue: the event intake. Applies backpressure to the event source when consumers fall behind. Size ~10k events; if full, the producer blocks or the upstream publisher (Kafka, SQS) gets the backpressure signal.

  2. Thread pool (worker pool): N workers (say 50) pull from the queue. Sized for IO-bound work (most latency is waiting on downstream channels).

  3. Per-user rate-limit semaphore: not one global semaphore. A map of userId → Semaphore(5) with sliding-window accounting, protecting per-user quota. Cheaper alternative: a token-bucket per user (one in-memory structure, not a thread-blocking primitive).

  4. Per-channel semaphore: pushSemaphore = Semaphore(200), emailSemaphore = Semaphore(100), smsSemaphore = Semaphore(20). Each send acquires and releases. This is the bulkhead; a slow SMS provider can't starve push delivery.

  5. Future-based retry with timeout: each send is wrapped in a timeout (race with a 5s timer). On failure, re-enqueue with exponential backoff, capped at 3 retries, then to dead-letter.

  6. Durable dead-letter queue: messages that fail after all retries land in a DLQ table for human review. At-least-once + idempotency keys throughout.

Sketch:

typescript
class NotificationDispatcher {
  private queue = new BoundedQueue<Event>(10_000);
  private workers: Worker[];
  private perUserLimiter = new TokenBucketMap(5 /* per min */);
  private channels = {
    push:  new Semaphore(200),
    email: new Semaphore(100),
    sms:   new Semaphore(20),
  };

  async accept(e: Event): Promise<void> {
    await this.queue.put(e);           // blocks → backpressure
  }

  private async worker(): Promise<void> {
    while (!this.shutdown) {
      const e = await this.queue.take();
      if (!this.perUserLimiter.allow(e.userId)) {
        this.metrics.rateLimitDropped(); continue;
      }
      for (const channel of e.channels) {
        await this.sendWithRetry(e, channel);  // acquires channel semaphore inside
      }
    }
  }

  private async sendWithRetry(e: Event, channel: Channel): Promise<void> {
    const sem = this.channels[channel];
    await sem.acquire();
    try {
      await Promise.race([
        this.channelClient(channel).send(e),
        timeout(5000)
      ]);
    } catch (err) {
      if (e.retries < 3) {
        this.queue.putDelayed(withRetry(e), backoff(e.retries));
      } else {
        this.deadLetter.put(e);
      }
    } finally {
      sem.release();
    }
  }
}

Notice how every primitive has a specific job:

  • Queue β†’ decouples intake from dispatch, applies backpressure.
  • Pool β†’ bounds concurrency.
  • Per-user rate limiter β†’ fairness.
  • Per-channel semaphore β†’ bulkheading.
  • Future + race β†’ timeout.
  • Durable retry + DLQ β†’ at-least-once with idempotency.

A junior answer is "I'll have a thread pool that processes events." A senior answer names every primitive and the specific pressure it absorbs.


Coordination Anti-Patterns

The anti-patterns to call out, either because you recognize them in your own code, or because an interviewer will hand you code containing one and ask "what's wrong here?":

  • Busy-wait / spin loops. while (!condition) {} burns a core. Use a condition variable, a blocking queue, or at minimum Thread.yield() / sleep(small). Spin-waits are only acceptable in tight kernel-level code where the wait is known to be microseconds.

  • Object.wait() / Condition.await() without a while loop. Spurious wakeups will bite you. Always re-check the predicate.

  • Fire-and-forget for correctness-critical work. If losing the call requires human escalation, it needs a durable queue, not a setTimeout.

  • Unbounded queues. LinkedBlockingQueue with no capacity, Executors.newFixedThreadPool (unbounded queue under the hood), raw ArrayList as a job buffer. Every unbounded queue is a latent OOM.

  • Shared mutable state in a worker that you swore was independent. "Each worker has its own state" β€” except that Map they all read from, which you later make mutable. Worker independence is a claim to audit, not assume.

  • Spawning a thread per request. new Thread(() -> handle(req)).start() inside a handler. Fine in a 3-request demo, fork-bomb at load.

  • Treating Promise.all / Future.allOf as error-safe. The first rejection rejects the aggregate, but the other promises keep running and their rejections become unhandled if nobody catches them. Use allSettled when you want to wait for all outcomes.

  • Global mutex for "simplicity." "I'll just synchronized the whole service." You've serialized the entire request path; throughput caps at 1.

  • Using a mutex across an await / blocking IO. In async code, holding a lock across await can deadlock (if the awaited work needs the lock) or kill throughput. In threaded code, holding a mutex across a downstream RPC makes the lock's hold time include network latency. Reduce the critical section before the blocking call.

  • Shutdown that never happens. Pools with no awaitTermination, workers with no interrupt handling, while (true) { take() } with no shutdown check. "How do we cleanly stop?" is an interviewer favorite.

  • Exception swallowing in pool workers. while (true) { try { run(task); } catch (Exception e) {} }. One bug makes every task silently no-op. Log and surface.

  • Re-entering your own pool from within a task. Task A submits task B to the same pool and get()s it. If the pool is single-threaded (or saturated), you've deadlocked. Use separate pools for dependent stages, or CompletableFuture composition.


Interview Cheatsheet

When the interviewer describes a problem, map the phrasing to the primitive on first pass:

| Problem phrasing | Primitive to reach for |
| --- | --- |
| "Limit concurrent access to N..." | Counting semaphore. Permits = N. |
| "Only one thread can X at a time" | Mutex. Not a semaphore. |
| "Work queue with backpressure" | Bounded blocking queue + producer blocks on put. |
| "Decouple producer speed from consumer speed" | Producer-consumer + bounded queue. |
| "Wait until N tasks finish" | CountDownLatch (one-shot) or Phaser (reusable). |
| "Wait for a specific condition on shared state" | Condition variable + while(!pred) loop. |
| "Run work without blocking the caller" | Thread pool + future. |
| "Concurrent map access" | Concurrent map (CAS-based), not a mutex around HashMap. |
| "Fan out then gather" | Future.allOf / Promise.all. |
| "First one wins / timeout" | Future.anyOf / Promise.race. |
| "Cancel this if the caller bails" | Cancellation token (Context, CancellationToken, AbortSignal). |
| "Do X after Y minutes / at time T" | Delay queue or scheduled executor. |
| "Retry on failure" | Future + backoff + cap; idempotent consumer. |
| "Rate-limit per user" | Token bucket per user, not a per-user semaphore. |
| "Rate-limit fan-out to a downstream" | Semaphore(N) around the call site. |
| "Keep one failing dependency from starving others" | Bulkhead = separate pools / separate semaphores per dependency. |
| "Work must not be lost on crash" | Durable queue (SQS/Kafka/Redis) + idempotent consumer. |
| "Process the oldest first" | FIFO queue. |
| "Process the highest priority first" | Priority queue. |
| "Multiple stages of processing, each with its own speed" | Pipeline: bounded queue between every pair of stages. |

The meta-pattern: name the primitive, name the pressure it absorbs, name the failure mode if you chose wrong. That's the senior vocabulary.
