Backend Fundamentals for Uber L5A (R2 / R4 / HM)
This is your fast-reference for backend CS concepts that Uber interviewers probe in R2 deep-dive coding, R4 HLD follow-ups, and HM behavioral. Each item has: a one-liner definition, when to use / when not to use, and a likely interview follow-up.
Pair this with the HLD problem files (50-56) — HLD rounds will probe these fundamentals behind every architectural decision.
Quick Reference Cheat Sheet
| Concept | One-line |
|---|---|
| ACID | Atomicity, Consistency, Isolation, Durability — the transactional guarantees of classic RDBMS |
| Read Committed | Default in Postgres; prevents dirty reads, allows non-repeatable reads and phantoms |
| Repeatable Read | Prevents non-repeatable reads; Postgres implements it with snapshot isolation |
| Serializable | Strongest isolation; implemented via SSI (Postgres) or strict 2PL |
| B-tree index | Ordered, O(log n), supports range queries — default for most RDBMS |
| Hash index | O(1) point lookup, no range queries |
| Cache-aside | App checks cache, loads DB on miss, populates cache — default pattern |
| Write-through | Writes go to cache + DB atomically — strong consistency, higher write latency |
| LRU | Evict least recently used — good for temporal locality |
| Thundering herd | Many concurrent misses on the same key — mitigate with coalescing or jitter |
| Kafka partition | Unit of ordering + parallelism; one consumer per partition per group |
| At-least-once | Kafka default; consumer must be idempotent |
| Idempotency key | Client-supplied token to dedupe retries; store with TTL in Redis |
| DLQ | Dead-letter queue for poison messages; retry with exponential backoff + jitter |
| HTTP/2 | Multiplexed, header compression, server push; binary framing |
| gRPC | Protobuf + HTTP/2; 4 streaming types |
| CAP | During a partition you must choose consistency or availability; partition tolerance is not optional |
| PACELC | Extends CAP with latency vs consistency tradeoff in normal operation |
| Raft | Leader-based consensus; used by etcd, Consul, CockroachDB |
| Vector clock | Captures causality across distributed writes |
| Circuit breaker | Fail fast when downstream is unhealthy; half-open to probe recovery |
| Bulkhead | Isolate thread pools / connections per dependency to contain failure |
| CQRS | Split read model from write model; scale independently |
| Saga | Distributed transaction via compensating actions |
| Outbox | Transactionally publish events by writing them with your DB write |
| JWT | Stateless bearer token signed by issuer |
| mTLS | Both sides present certs; standard for service-to-service auth |
| L4 LB | Balance on TCP/UDP; no app awareness |
| L7 LB | Balance on HTTP; routing by path, headers |
| Consistent hashing | Minimize remaps on node add/remove; used in sharding + caches |
| Hot partition | A shard getting disproportionate traffic; mitigate with re-keying or salting |
How to Deploy This Knowledge in Interviews
- In R2 (coding deep-dive), your interviewer may ask the concurrency / DB question under the guise of a coding problem. Know the primitives and their cost.
- In R4 (HLD), every architectural choice you make will get probed: "why Redis over Memcached?", "why Kafka over RabbitMQ?", "why Cassandra over Postgres?" You answer in under 15 seconds with specific tradeoffs — that is the L5A bar.
- In HM (hiring manager round), use these terms to back the technical decisions in your stories. "We chose sagas over 2PC because..." lands.
Specifically at Uber, interviewers are looking for:
- Numbers, not adjectives. "Fast" is useless; "p99 < 10 ms" is an answer.
- Named tech. "A message queue" earns a follow-up; "Kafka with 64 partitions per topic, keyed by userId" ends it.
- Failure thinking. Every design must survive a node, AZ, or region going down.
Section 1: Databases
ACID
Definition. Atomicity (all or nothing), Consistency (DB invariants preserved), Isolation (concurrent txns don't interfere), Durability (committed writes survive crashes).
When to use. Any system where partial updates corrupt state: payments, inventory, trip lifecycle.
When NOT. Append-only event logs, analytics, eventually-consistent caches.
Follow-up. "How does Postgres implement durability?" -> WAL (write-ahead log) flushed to disk on commit; fsync cost is why synchronous_commit=off is a real tuning knob.
Isolation Levels
Four levels defined by the SQL standard, plus snapshot isolation as a common extension.
| Level | Dirty Read | Non-Repeatable Read | Phantom | Typical impl |
|---|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes | No locks |
| Read Committed | No | Yes | Yes | Per-statement snapshot (PG default); short read locks in lock-based DBs |
| Repeatable Read | No | No | Yes (strict SQL) | Locks held until txn commit; PG uses snapshot |
| Serializable | No | No | No | SSI (PG) or strict 2PL |
Definitions of anomalies:
- Dirty read. Read uncommitted data from another txn.
- Non-repeatable read. Same row read twice returns different values.
- Phantom. Same predicate read twice returns different sets of rows.
- Write skew. Two txns each read a set, make disjoint writes based on it, both commit, invariant broken. Only Serializable prevents this.
When to use each. Read Committed is default and fine for most CRUD. Use Repeatable Read for read-heavy reporting queries that need a stable view. Use Serializable for multi-row invariants (double-booking, balance > 0).
Follow-up. "Show me a case where Repeatable Read is not enough." -> Write skew example: two doctors both on-call, each reads the set {on_call} sees 2 doctors, each marks themselves off; both commit; invariant (>= 1 on-call) violated.
Indexes
B-tree. Ordered, balanced, O(log n). Supports equality and range. Default for most RDBMS.
Hash. O(1) equality, no range. Postgres hash indexes became crash-safe only in PG10; rarely worth it over B-tree.
Covering index. Includes all columns a query needs; DB answers from index alone (index-only scan). Huge win for hot queries.
Composite index. Multi-column; leftmost-prefix rule — an index on `(a, b, c)` serves queries on `(a)`, `(a, b)`, and `(a, b, c)`, but not `(b)`, `(c)`, or `(b, c)`.
Partial index. WHERE deleted_at IS NULL style; smaller, faster for that slice.
GIN / GiST. Postgres; full-text search, JSONB, geo.
When indexes hurt:
- Write amplification: every additional index must be updated on each INSERT/UPDATE, so write cost grows with index count.
- Small tables: planner ignores index.
- Low-cardinality columns (e.g., boolean): an index is rarely selective enough to beat a sequential or bitmap scan.
Follow-up. "I have SELECT * FROM trips WHERE driver_id = ? AND status = ? ORDER BY created_at DESC LIMIT 10. What index?" -> (driver_id, status, created_at DESC) composite; index answers the WHERE and provides ordering so no sort.
Normalization vs Denormalization
1NF. Atomic columns (no arrays in cells).
2NF. 1NF + non-key columns depend on the whole primary key.
3NF. 2NF + no transitive dependencies (non-key -> non-key).
When to denormalize. Read-heavy workloads where joins are expensive; reporting tables; materialized views. OLAP systems denormalize by default (star/snowflake schemas).
When to stay normalized. OLTP with frequent writes; update anomalies must be avoided.
Follow-up. "Uber stores each completed trip with the driver's name snapshot at trip time. Why not just join to drivers?" -> Historical audit (driver name can change; the receipt should show name at time of trip) and to avoid cross-shard joins at query time.
SQL vs NoSQL
Pick SQL when:
- You need multi-row transactions.
- Schema is stable and relational.
- You want ad-hoc queries with joins.
Pick NoSQL when:
- Single-entity access patterns (get by id / by user).
- Horizontal scale beyond a single machine (Cassandra, DynamoDB, Mongo).
- Flexible schema (document stores).
- Extreme write throughput.
Uber's mix. Schemaless (MySQL-backed but operated like NoSQL) for trips/payments. Cassandra for time-series. Redis for hot state. Postgres for internal tools.
NoSQL Types
- Document (MongoDB). JSON-like documents, secondary indexes, limited transactions. Good for flexible schema.
- Key-value (Redis, DynamoDB). Pure get/put. Simplest and fastest. Redis adds data structures (lists, sets, streams, hashes).
- Wide-column (Cassandra, HBase, ScyllaDB). Rows grouped by partition key, sorted within partition by clustering key. Huge write throughput, eventual consistency tunable. Not for ad-hoc queries.
- Graph (Neo4j, JanusGraph). Traversal queries, relationship-heavy. Social graphs, fraud rings.
Sharding
Range-based. Partition by a key range (e.g., userId 0-1M -> shard A). Range queries are efficient. Hotspot risk at edges (recent time ranges hot).
Hash-based. shard = hash(key) % N. Uniform distribution. Range queries require scatter-gather.
Directory-based. A lookup service maps key -> shard. Flexible; adds a hop; lookup svc itself must be HA.
Consistent hashing. Nodes placed on a ring; keys hash to the ring and assigned to the next clockwise node. Adding/removing a node only remaps ~1/N of keys. Use virtual nodes (each physical node owns many points on the ring) to smooth distribution.
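A minimal ring sketch in Java: CRC32 stands in for a production hash (murmur3, xxHash), and the `#i` vnode suffix scheme is illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>(); // hash -> node
    private final int vnodes;

    ConsistentHashRing(int vnodes) { this.vnodes = vnodes; }

    private long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    void addNode(String node) {
        // each physical node owns many points on the ring to smooth distribution
        for (int i = 0; i < vnodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        // next clockwise node; wrap around to the first entry past the top
        SortedMap<Long, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }
}
```

Adding a node only moves the keys between its vnode points and their predecessors, which is the "~1/N remapped" property interviewers probe for.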
Hotspot mitigation:
- Key salting. Append a random prefix to hot keys to spread across shards.
- Splitting. Hot shard splits into two; rebalance.
- Caching. Offload reads from the hot shard into a cache tier.
Follow-up. "Hot shard on a celebrity user — what do you do?" -> salt the key by partition count, write writes to salted keys, aggregate on read; or extract hot user to its own dedicated shard pool.
Replication
Master-slave (single leader). Writes to master, replicated to slaves sync or async. Simple, consistent writes, reads can scale via slaves. Failover requires promoting a slave.
Master-master (multi-leader). Writes on any node. Requires conflict resolution (last-write-wins, CRDTs, vector clocks). Use for multi-region active-active.
Sync vs async. Sync replication blocks commit until replicas ack — strong durability, higher latency. Async replication acknowledges as soon as the leader persists locally — low latency, risk of data loss on leader crash. Semi-sync (one replica acks) is a middle ground.
Follow-up. "What happens if your async replica is 10s behind and master dies?" -> lose up to 10s of writes; if promoted, clients may see data disappear ("read-your-writes" violated). Mitigations: semi-sync, external consistency check, or bounded staleness clients.
Transactions: 2PC, Sagas, Outbox
2PC (Two-Phase Commit). Coordinator asks all participants to prepare, then to commit. Blocking: participant awaiting commit holds locks indefinitely if coordinator fails. Rarely used in microservices.
Saga. Sequence of local transactions, each with a compensating action. Orchestration (a workflow engine drives) or choreography (services react to events).
Outbox pattern. To reliably publish an event alongside a DB write, write the event into an outbox table in the same transaction; a separate relay reads the table and publishes to Kafka. Solves the dual-write problem (DB commits, Kafka publish fails).
```java
@Transactional
public void completeTrip(Trip trip) {
    tripRepo.save(trip);
    outboxRepo.save(new OutboxEvent("trip.completed", trip.getId(), payload));
    // both rows committed atomically; relay picks up outbox rows
}
```

```typescript
await db.transaction(async (tx) => {
  await tx.trips.update(trip);
  await tx.outbox.insert({
    topic: "trip.completed",
    key: trip.id,
    payload: JSON.stringify(trip),
  });
});
```

Follow-up. "Why not publish to Kafka first and DB second?" -> if Kafka publish succeeds and DB write fails, downstream consumers act on a non-existent trip. Outbox makes DB the source of truth.
Section 2: Caching
Cache Patterns
- Cache-aside (lazy loading). App reads cache; on miss, reads DB and populates cache. Default and simplest.
- Read-through. Cache library owns the DB read on miss. App only talks to cache. Centralizes cache policy.
- Write-through. App writes cache + DB atomically. Strong consistency; writes are slower.
- Write-behind (write-back). App writes cache; cache asynchronously flushes to DB. Fast writes; risk of data loss on cache crash.
- Refresh-ahead. Cache predicts expiry and refreshes before a miss. Good for predictable hot keys.
```java
String value = cache.getIfPresent(key);
if (value == null) {
    value = db.load(key);
    cache.put(key, value, Duration.ofMinutes(5));
}
return value;
```

```typescript
async function getWithCache(key: string): Promise<string> {
  const cached = await redis.get(key);
  if (cached) return cached;
  const value = await db.load(key);
  await redis.set(key, value, "EX", 300);
  return value;
}
```

```cpp
auto it = cache.find(key);
if (it != cache.end()) return it->second;
auto value = db.load(key);
cache.put(key, value, std::chrono::minutes(5));
return value;
```

Eviction Policies
- LRU. Least recently used. Good for temporal locality; default for most systems.
- LFU. Least frequently used. Better for long-tail where some keys are accessed rarely but always useful; harder to implement efficiently (approximate LFU in Redis).
- FIFO. First in, first out. Simple, ignores access patterns.
- TTL. Time-based; evict when expired. Use alongside LRU/LFU.
Follow-up. "Which policy for a hot-trip cache?" -> LRU with short TTL; once a trip completes, its cache entry goes cold.
Consistency Models for Caches
- Cache-aside with TTL. Eventually consistent. Window of staleness = TTL.
- Write-through. Strong consistency between cache and DB; but reads can still be stale if the cache was populated before the write.
- Write-behind. Weakest: DB can lag cache.
- Invalidate-on-write. On DB write, delete the cache key. Next read rehydrates. Must handle the race (writer deletes cache; concurrent reader reads stale DB and repopulates cache with stale data). Redis `SET NX` or version-stamped keys mitigate.
Thundering Herd / Cache Stampede
Problem. Popular key expires; thousands of concurrent requests miss cache, all hit DB, DB falls over.
Mitigations:
- Request coalescing (single-flight). Per-key mutex: one request fetches, others wait (see the sketch after this list).
- Jittered TTL. Randomize TTL +/- 20% so keys don't all expire at once.
- Probabilistic early expiration. Refresh key before expiry with probability that increases as expiry approaches (XFetch algorithm).
- Bloom filter guard. Negative cache / bloom filter to reject definitely-missing keys before hitting DB.
- Lock on miss. First miss acquires a lock (SETNX in Redis), fetches, populates. Others poll or wait.
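A minimal single-flight sketch in Java: the `loader` function is a stand-in for your DB read, and production code would add timeouts and negative-result caching.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SingleFlight<K, V> {
    // one in-flight load per key; concurrent callers share the same future
    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    CompletableFuture<V> get(K key, Function<K, V> loader) {
        return inFlight.computeIfAbsent(key, k ->
            CompletableFuture.supplyAsync(() -> loader.apply(k))
                .whenComplete((v, e) -> inFlight.remove(k))); // allow future reloads
    }
}
```

With this in place, a thousand simultaneous misses on a hot key produce exactly one DB read.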
Follow-up. "Why not just cache forever?" -> no invalidation story; stale data forever; memory grows unbounded.
Distributed Caches: Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data types | Strings, lists, sets, hashes, streams, geo, bitmaps | Strings only |
| Persistence | RDB + AOF | None |
| Replication | Primary-replica + Redis Cluster | None built-in |
| Transactions | MULTI/EXEC + Lua scripts | No |
| Pub/sub | Yes | No |
| Use cases | Almost everywhere | Pure simple cache |
Redis Cluster. 16,384 hash slots distributed across nodes; client-side routing; gossip-based topology. Multi-key ops limited to same hash tag {tag}:key.
Follow-up. "When Memcached over Redis?" -> when you want only a cache with simpler ops and zero features. Operational simplicity. Uber and most modern stacks use Redis.
Section 3: Concurrency
Primitives
- Mutex. Mutual exclusion; one holder at a time.
- Semaphore. Counting; up to N holders.
- Read-write lock. Many readers OR one writer. Good for read-heavy caches.
- Condition variable. Wait for a predicate to become true; paired with a mutex.
- Atomic operations. Lock-free primitives: CAS (compare-and-swap), fetch-and-add. Basis for lock-free data structures.
Java Specifics
- `synchronized`. Intrinsic lock on an object monitor. Simple; less flexible.
- `ReentrantLock`. Explicit lock; supports `tryLock`, fairness, `Condition` objects.
- `ReadWriteLock`. Separate read and write locks; high-concurrency caches.
- `ConcurrentHashMap`. Segment-based locking (Java 7) / CAS + bin-level synchronization (Java 8+). Much faster than `Collections.synchronizedMap`.
- `BlockingQueue`. Producer-consumer pattern; `ArrayBlockingQueue`, `LinkedBlockingQueue`, `SynchronousQueue`.
- `CompletableFuture`. Composable async primitives; `thenCompose`, `thenCombine`, `allOf`.
```java
private final ReentrantLock lock = new ReentrantLock();
private final Condition notEmpty = lock.newCondition();

public void put(T item) {
    lock.lock();
    try {
        queue.add(item);
        notEmpty.signal();
    } finally {
        lock.unlock();
    }
}

public T take() throws InterruptedException {
    lock.lock();
    try {
        while (queue.isEmpty()) notEmpty.await();
        return queue.poll();
    } finally {
        lock.unlock();
    }
}
```

```cpp
std::mutex m;
std::condition_variable cv;
std::queue<T> q;

void put(T item) {
    {
        std::lock_guard<std::mutex> lk(m);
        q.push(std::move(item));
    }
    cv.notify_one();
}

T take() {
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&]{ return !q.empty(); });
    T item = std::move(q.front()); q.pop();
    return item;
}
```

```typescript
// Node is single-threaded; use a semaphore for concurrency limits
class Semaphore {
  private pending: Array<() => void> = [];
  constructor(private permits: number) {}
  async acquire(): Promise<void> {
    if (this.permits > 0) { this.permits--; return; }
    return new Promise((res) => this.pending.push(res));
  }
  release(): void {
    const next = this.pending.shift();
    if (next) next(); else this.permits++;
  }
}
```

Race Conditions
- TOCTOU (time-of-check-to-time-of-use). Check a state, act on it; state changes in between.
- Check-then-act. Variant of TOCTOU (`if (!map.containsKey(k)) map.put(k, v)` — two threads both find missing and both put).
- Compound actions. Read-modify-write that must be atomic (increment a counter). Use `AtomicInteger` or a single DB UPDATE; a sketch follows.
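A sketch of the fix in Java: `computeIfAbsent` makes the check-then-act atomic per key, and `AtomicInteger` makes the read-modify-write atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    private final ConcurrentHashMap<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    // Racy version: if (!counters.containsKey(k)) counters.put(k, new AtomicInteger());
    // two threads can both see "missing" and both insert, losing one counter.
    void increment(String k) {
        // computeIfAbsent runs the factory at most once per key;
        // incrementAndGet is a lock-free atomic read-modify-write.
        counters.computeIfAbsent(k, x -> new AtomicInteger()).incrementAndGet();
    }

    int get(String k) {
        AtomicInteger v = counters.get(k);
        return v == null ? 0 : v.get();
    }
}
```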
Deadlock — 4 Conditions (Coffman)
- Mutual exclusion. Resources held exclusively.
- Hold and wait. Holder waits for another resource while holding.
- No preemption. Resources can't be taken away.
- Circular wait. Cycle in the wait-for graph.
Prevention: break any one.
- Impose a global lock ordering to break circular wait.
- Use tryLock with timeout to break hold-and-wait.
- Use lock-free data structures to eliminate mutual exclusion.
- Detect deadlocks via the wait-for graph and abort one transaction (what Postgres does).
Follow-up. "Two threads, lock A then B, and lock B then A — design." -> always acquire in ascending id order.
Livelock and Starvation
- Livelock. Threads aren't blocked but make no progress (e.g., both back off and retry in sync).
- Starvation. A thread never gets the resource because others keep acquiring (unfair schedulers).
Mitigations: randomized backoff, fair locks (new ReentrantLock(true) in Java).
Actor Model
Definition. Concurrency unit is an actor — isolated state, message-passing, single-threaded processing of its mailbox.
When to use. Stateful entities with many independent instances (1M ride sessions). Akka (JVM), Erlang/Elixir, Orleans (.NET).
Uber relevance. Cadence workflows are actor-like: each workflow instance is independent and durable.
Event Loop
Definition. Single thread processes events from a queue; non-blocking I/O. Node.js and browser JS.
When to use. High-concurrency I/O-bound workloads (gateways, proxies). Not for CPU-bound work (blocks the loop).
Section 4: Messaging and Queues
Kafka
- Partitions. Topic split into N partitions; each partition is an ordered log. Parallelism unit.
- Offsets. Monotonic sequence within a partition; consumers track offsets.
- Consumer groups. One consumer per partition per group; rebalancing on membership change.
- Keys. Messages with the same key land on the same partition (preserves per-key ordering).
- Retention. Time-based or size-based; default 7 days.
- Compaction. Keep only the latest message per key, turning the topic into a snapshot of current values; used for configs and changelogs.
- Exactly-once. Enabled via idempotent producer (per-partition sequence numbers) + transactional API for consumer-producer pipelines.
When to use. High-throughput event streaming, log aggregation, CDC, event sourcing.
When NOT. Low-latency point-to-point RPC; simple work queues with complex routing (use RabbitMQ).
Follow-up. "How many partitions?" -> depends on expected throughput / consumer capacity. Rule of thumb: partitions >= max consumers per group; budget 1-10 MB/s per partition.
RabbitMQ
- Exchanges. Route messages to queues: direct, topic, fanout, headers.
- Queues. Durable, message acks required for delivery guarantee.
- Bindings. Connect exchange -> queue by routing key pattern.
- Acks. Consumer acks after processing; unacked returns to queue.
When to use. Task queues with complex routing, priority queues, per-message TTLs, RPC-over-queue.
When NOT. Replayable event streams (use Kafka); high-throughput firehose (Kafka throughput > RabbitMQ by orders of magnitude).
Delivery Semantics
- At-most-once. Fire and forget; may lose messages.
- At-least-once. Kafka default. May duplicate; consumer must be idempotent.
- Exactly-once. Requires idempotent producer + consumer transactionally updating offset + side-effect store. Possible but complex.
Idempotency
- Client idempotency key. Token supplied by client; server dedupes within window.
- Unique constraint on primary key. DB rejects duplicate insert.
- Dedup table. Separate table `(idempotency_key, response)`, checked first.
- Natural idempotency. `SET x = 5` (not `x = x + 1`).
```java
public Response create(CreateReq req) {
    String key = req.getIdempotencyKey();
    Optional<Response> prior = dedupRepo.find(key);
    if (prior.isPresent()) return prior.get();
    Response resp = doCreate(req);
    try {
        dedupRepo.insert(key, resp); // unique on key
    } catch (DuplicateKeyException e) {
        return dedupRepo.find(key).orElseThrow();
    }
    return resp;
}
```

```typescript
async function create(req: CreateReq): Promise<Response> {
  const prior = await redis.get(`idem:${req.idempotencyKey}`);
  if (prior) return JSON.parse(prior);
  const resp = await doCreate(req);
  await redis.set(
    `idem:${req.idempotencyKey}`,
    JSON.stringify(resp),
    "EX",
    86400,
    "NX",
  );
  return resp;
}
```

DLQ and Retry
- Retry with exponential backoff. `delay = base * 2^attempt`.
- Jitter. Add randomness to avoid synchronized retries (thundering herd). Full jitter: `sleep(random(0, base * 2^attempt))`.
- Max attempts. After N, route to DLQ.
- DLQ consumer. Human review, alerting, or replay after bug fix.
Follow-up. "Why jitter?" -> without it, N clients retry at identical times, amplifying load. AWS wrote the canonical article on this.
Section 5: Networking
HTTP
- Methods. GET (safe, idempotent), POST (non-idempotent create), PUT (idempotent upsert), DELETE (idempotent), PATCH (partial update).
- Status codes. 2xx success, 3xx redirect, 4xx client error (400 bad request, 401 auth, 403 forbidden, 404 not found, 409 conflict, 429 rate limited), 5xx server error (500 internal, 502 bad gateway, 503 unavailable, 504 gateway timeout).
- Headers. `Cache-Control`, `ETag`, `If-None-Match` (conditional GET), `Authorization`, `Content-Type`.
HTTP/1.1. Text protocol; one request per connection (or keep-alive with head-of-line blocking).
HTTP/2. Binary, multiplexed streams on one TCP connection, header compression (HPACK), server push (rarely used, being removed).
HTTP/3 (QUIC). UDP-based; avoids TCP head-of-line blocking at the transport layer; faster connection setup (0-RTT). Default for Google properties and increasingly for mobile traffic.
Follow-up. "Why did HTTP/2 not solve HoL blocking?" -> it solved application-layer HoL but TCP still has packet-loss HoL; HTTP/3 moves to QUIC over UDP to fix that.
WebSocket
Definition. Upgrade a single HTTP connection to a bidirectional framed protocol.
When to use. Server-push with low latency: chat, live location, notifications. Uber uses it for driver and rider streams.
When NOT. One-off request/response (HTTP is fine). Very short-lived interactions (SSE / long-poll may be simpler).
Alternatives. Server-Sent Events (server -> client only, auto-reconnect built in), long-poll, gRPC streaming.
gRPC
Definition. RPC framework over HTTP/2 with protobuf serialization.
Four streaming types:
- Unary. Single request, single response.
- Server streaming. Single request, stream of responses.
- Client streaming. Stream of requests, single response.
- Bidirectional streaming. Full duplex.
Why gRPC at Uber. Service-to-service default; efficient binary wire format; strong schemas; native cross-language codegen.
When NOT. Browser-facing APIs (gRPC-Web exists but clunky); simple CRUD where REST is enough.
DNS
- Resolution. Recursive resolver walks root -> TLD -> authoritative.
- Caching. TTL on every record; resolvers and clients cache aggressively.
- Load balancing. Multiple A records; round-robin at resolver. GeoDNS for region-based routing. SRV records for service discovery.
Follow-up. "Why is DNS TTL a knob for deployments?" -> short TTL (30s) lets you fail over quickly; long TTL (24h) increases cache hit rate but slows recovery.
TLS
- Handshake. ClientHello -> ServerHello + cert -> key exchange -> finished. TLS 1.3 reduces to 1-RTT (or 0-RTT resumption).
- Cert validation. Chain to a trusted CA; check expiry, hostname, revocation (OCSP).
- Mutual TLS (mTLS). Both sides present certs; standard for internal service-to-service auth at Uber.
Section 6: Distributed Systems Primitives
CAP Theorem
The classic formulation: you can have at most two of Consistency, Availability, Partition tolerance. Since partitions do happen, the real tradeoff is consistency vs availability when a partition occurs (CP vs AP).
- CP. Payments, ledger, inventory — reject writes to preserve consistency.
- AP. Feed, search suggestions, trending — keep serving, reconcile later.
Follow-up. "Is Postgres CP or AP?" -> traditionally CP (single leader). With replication lag, reads from replicas can be AP-flavored. Spanner is CP globally via TrueTime.
PACELC
Extension: if Partition, choose A or C; else, choose Latency or Consistency.
- Dynamo: PA/EL (prefer availability in partition, low latency otherwise).
- Spanner: PC/EC (prefer consistency always).
Consistency Models
Strongest to weakest:
- Linearizable (strong). Operations appear to occur instantly in some global order.
- Sequential. Same global order but may not match real time.
- Causal. Preserves cause-effect; concurrent ops may differ across replicas.
- Read-your-writes. A client sees its own writes.
- Monotonic-read. A client never sees state go backward.
- Eventual. If no more writes, replicas converge.
Follow-up. "A user updates their profile pic and refreshes — they see the old one. Which model failed?" -> read-your-writes. Mitigate by routing the client to the primary or to the replica that acknowledged their write (session consistency).
Consensus
Definition. Multiple nodes agree on a single value despite failures.
- Paxos. Classical, hard to implement correctly.
- Raft. Leader-based, simpler than Paxos. Used by etcd, Consul, CockroachDB, TiDB.
- Zab. Used by ZooKeeper; similar to Raft.
When you need consensus:
- Leader election (who is the single writer).
- Distributed configuration (K/V that all nodes must agree on).
- Distributed locks.
- Stored offsets in exactly-once pipelines.
When you don't. Per-user state (just shard it); analytics (AP is fine).
Follow-up. "Why is consensus expensive?" -> majority quorum requires RTT to a majority of nodes; cross-region Paxos/Raft adds WAN latency to every write.
Time
- Clock skew. NTP keeps hosts within a few ms; never trust wall clocks for ordering or correctness in a distributed system.
- Lamport timestamps. `t = max(local, received) + 1`. Partial order.
- Vector clocks. Per-node counter map; captures causality. Used by Dynamo, Riak.
- Hybrid Logical Clocks (HLC). Combine physical time with a logical counter; used by CockroachDB.
- TrueTime. Spanner's bounded-uncertainty clock using GPS + atomic clocks.
Follow-up. "Why not just use NTP timestamps?" -> skew of even 10 ms causes lost updates at high write rate; need logical ordering for correctness.
Leader Election
- Raft. Election triggered by heartbeat timeout; candidate requests votes; majority wins.
- ZooKeeper ephemeral nodes. Create ephemeral sequential node; lowest seq wins. On disconnect, node disappears and next takes over.
- etcd. Same role; TTL-based leases.
Distributed Locks
- Redis Redlock. Acquire lock across N Redis nodes; majority required. Martin Kleppmann critiqued it (GC pauses can break safety); Antirez responded. Know both sides.
- ZooKeeper. Ephemeral sequential nodes; well-known pattern. Safer than Redlock.
- etcd leases. TTL-backed leases; if client dies, lease expires.
Follow-up. "Can you build a correct distributed lock with Redis?" -> single-instance Redis lock is only as reliable as that instance. Redlock with fencing tokens (lock returns a monotonically increasing token, resource checks the token) mitigates GC-pause issues.
Rate Limiting (Distributed)
- Token bucket / leaky bucket. Standard algorithms.
- Sliding window. Fixed buckets lead to edge bursts; sliding window smoother.
- Implementation. Redis + Lua script for atomicity:

```lua
-- KEYS[1] = user key, ARGV[1] = limit, ARGV[2] = window seconds
local current = redis.call("INCR", KEYS[1])
if current == 1 then
  redis.call("EXPIRE", KEYS[1], ARGV[2])
end
if current > tonumber(ARGV[1]) then
  return 0
end
return 1
```

```java
Long allowed = redis.eval(
    LUA_SCRIPT,
    Collections.singletonList("rate:" + userId),
    "100", "60"
);
if (allowed == 0) throw new RateLimitedException();
```

Circuit Breaker
States. Closed (normal), Open (fail fast), Half-open (probe).
Rule. In closed, count failures. Above threshold over a window -> open. After cooldown -> half-open, allow N probes. If probes succeed -> closed; else -> open.
Use. Wrap every outbound network call to a dependency. Protects thread pools, surfaces failures fast.
Libraries. Hystrix (deprecated), Resilience4j (Java), Polly (.NET), gobreaker (Go).
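A minimal sketch of the state machine; thresholds and cooldown are illustrative, and libraries like Resilience4j replace the simple failure counter with sliding windows and metrics.

```java
class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failures = 0;
    private long openedAt = 0;
    private final int threshold = 5;          // consecutive failures before opening
    private final long cooldownMillis = 10_000;

    synchronized boolean allowRequest() {
        if (state == State.OPEN && System.currentTimeMillis() - openedAt > cooldownMillis) {
            state = State.HALF_OPEN; // cooldown elapsed: let one probe through
        }
        return state != State.OPEN;
    }

    synchronized void onSuccess() {
        failures = 0;
        state = State.CLOSED;
    }

    synchronized void onFailure() {
        failures++;
        if (state == State.HALF_OPEN || failures >= threshold) {
            state = State.OPEN; // fail fast until the next cooldown expires
            openedAt = System.currentTimeMillis();
        }
    }
}
```

Callers check `allowRequest()` before the network call and report `onSuccess()` / `onFailure()` afterward; while open, requests fail in microseconds instead of tying up threads on a dead dependency.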
Bulkhead
Definition. Isolate resources per dependency — dedicated thread pool or connection pool per downstream service — so one failing dependency can't exhaust shared resources.
Example. Your API calls Pricing + Maps. Without bulkheads, a Maps slowdown eats the shared thread pool; Pricing calls starve. With bulkheads, Maps calls use their own pool; Pricing unaffected.
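A sketch of bulkheads as dedicated pools; pool sizes and call shapes are illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class Bulkheads {
    // one bounded pool per downstream dependency
    private final ExecutorService pricingPool = Executors.newFixedThreadPool(20);
    private final ExecutorService mapsPool = Executors.newFixedThreadPool(10);

    Future<String> quote(Callable<String> pricingCall) {
        return pricingPool.submit(pricingCall); // a Maps stall cannot consume these threads
    }

    Future<String> route(Callable<String> mapsCall) {
        return mapsPool.submit(mapsCall); // at worst, 10 threads stuck on Maps
    }
}
```

The failure mode changes from "the whole service hangs" to "Maps calls queue or get rejected while everything else keeps working".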
Section 7: Observability
Logs, Metrics, Traces
- Logs. Discrete events with context. High cardinality, hard to aggregate. Use for debugging.
- Metrics. Numeric time series. Low cardinality, cheap to aggregate. Use for SLOs and alerting.
- Traces. Per-request spans across services. Use to debug latency and failures across service boundaries.
Distributed Tracing
- Trace ID. Identifies a single user-facing request end-to-end.
- Span ID. One segment of work; nested inside trace.
- Propagation headers. W3C TraceContext (`traceparent`, `tracestate`) or B3 (Zipkin). Gateway creates the trace; each service propagates headers on outbound calls.
Uber's Jaeger. Open-source, widely adopted. Sampling is critical (head-based or tail-based) since storing every span is infeasible.
Prometheus + Grafana
- Prometheus. Pull-based; scrapes a `/metrics` endpoint exposing counters, gauges, histograms. PromQL query language. Local storage limited; federate or remote-write to long-term storage (Thanos, Cortex, Uber's M3).
- Grafana. Dashboards over Prometheus / other sources.
SLI / SLO / SLA
- SLI (Indicator). A measured metric (success rate, p99 latency).
- SLO (Objective). Internal target for the SLI (99.9% of requests succeed).
- SLA (Agreement). External contract with penalties (rare for internal services).
Error budget. 1 - SLO. If SLO is 99.9%, error budget is 0.1% of requests per month. Running low -> freeze deploys.
Follow-up. "Why not just aim for 100%?" -> cost grows nonlinearly; SLO should match user expectations. Zero budget means zero room for improvement.
Section 8: Security
AuthN vs AuthZ
- Authentication (who). Verify identity (login, JWT validation, mTLS cert).
- Authorization (what). Verify permissions (RBAC, ABAC, scopes on a token).
Do both, in that order, at the gateway.
JWT
- Structure. `header.payload.signature`, base64url-encoded JSON.
- Signing. HS256 (shared secret) or RS256 / ES256 (asymmetric; issuer signs, consumers verify with public key).
- Revocation. Hard because tokens are stateless. Options: short TTL + refresh tokens, revocation list, opaque tokens with introspection.
When to use. Short-lived API access tokens, service-to-service tokens.
When NOT. Long-lived sessions you want to revoke instantly; browser cookie-based auth is usually simpler.
Follow-up. "How do you log out a JWT?" -> you don't, unless you maintain a blacklist or use a token introspection endpoint. Preferred design: short access token (5 min) + server-side session for refresh.
OAuth 2.0 + OIDC
- OAuth 2.0. Authorization framework; delegated access.
- OIDC. Identity layer on top (ID token, user info).
Flows:
- Authorization Code. Web apps with backends; exchange code for token server-side.
- Auth Code + PKCE. SPAs and mobile; code challenge prevents code interception.
- Client Credentials. Service-to-service; no user.
- Device Code. TVs, CLIs without browser.
- Resource Owner Password. Deprecated; insecure.
- Implicit. Deprecated; replaced by PKCE.
mTLS
Definition. Both sides present certificates; each validates the other.
Use. Service-to-service in a zero-trust network. Istio and Linkerd bootstrap mTLS automatically.
Operational. Short-lived certs (hours) issued by internal CA (e.g., SPIFFE/SPIRE).
Secrets Management
- Vault (HashiCorp). Central secrets store with dynamic creds (DB passwords rotated on every issue).
- KMS (AWS/GCP). Key-management for encryption keys; apps call KMS to encrypt/decrypt.
- Principles. Never commit secrets to git. Rotate regularly. Audit access.
Section 9: Scaling Patterns
Horizontal vs Vertical
- Vertical. Bigger machine. Easy, limited, single point of failure.
- Horizontal. More machines. Harder (requires stateless services or sharding), but the ceiling is far higher.
Default to horizontal for stateless services; vertical is acceptable for single-writer databases until sharding pays off.
Load Balancers
- L4 (transport). Balance TCP/UDP flows. Fast; no application awareness. AWS NLB, HAProxy in TCP mode.
- L7 (application). HTTP-aware; route by path, header, cookie. TLS termination. AWS ALB, Envoy, NGINX.
Algorithms:
- Round-robin. Simplest.
- Least connections. Fairer for uneven request durations.
- Consistent hashing. Sticky by key (useful for caches, session affinity).
- Weighted. Assign capacity weights to heterogeneous backends.
Follow-up. "When would you use L4 over L7?" -> pure TCP services (databases, gRPC without path routing), or when you need maximum throughput with no per-request overhead.
CDN
Definition. Geographically distributed edge cache for static (and some dynamic) content.
- TTLs. Per-asset. Short for frequently-changing, long for static.
- Purge. Invalidate on demand when content changes; can be slow (tens of seconds).
- Edge compute. Cloudflare Workers, Lambda@Edge for dynamic logic at edge.
Sharding (Cross-Reference)
See Section 1. Remember: sharding key choice dominates everything. For Uber, sharding by userId works for most user-scoped data; for global state (marketplace), shard by geo (H3 cell).
Hot Partition Detection and Mitigation
- Detection. Per-shard QPS, CPU, and storage dashboards. Alert when one shard > 2x median.
- Mitigation.
- Rebalance: split the shard.
- Salt the key: prepend random bytes to hot keys and read from all variants.
- Cache: offload read traffic.
- Dedicated node: move the hot key to its own shard.
Section 10: Architecture-Level Design Patterns
CQRS (Command Query Responsibility Segregation)
Definition. Separate the write model (commands) from the read model (queries). Each optimized independently.
When to use. Read-heavy with complex query needs (reporting, dashboards) on top of write-focused domain model. Often paired with event sourcing.
Pitfalls. Two data stores to maintain; eventual consistency between write and read sides. Don't adopt it for simple CRUD — it's overkill.
Follow-up. "How do you keep the read side in sync?" -> subscribe to domain events (Outbox pattern -> Kafka -> read projection).
Event Sourcing
Definition. Persist every state-changing event as an immutable record; derive current state by replaying events.
Pros. Audit log by default; time-travel; easy to add new projections.
Cons. Complexity; migrations of event schema; storage grows unboundedly without snapshots.
Snapshots. Periodically persist derived state to skip replaying from the start.
When to use. Payments, trip lifecycle — anywhere audit is required. Uber's Cadence is event-sourced internally.
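A sketch of replay as a fold over the log; the event and state shapes here are hypothetical.

```java
import java.util.List;

class TripEvent {
    final String type;
    final long amountCents;
    TripEvent(String type, long amountCents) { this.type = type; this.amountCents = amountCents; }
}

class TripState {
    boolean completed;
    long totalCents;

    // Current state is derived, never stored directly; a snapshot is just
    // a persisted intermediate fold that later replays resume from.
    static TripState replay(List<TripEvent> events) {
        TripState s = new TripState();
        for (TripEvent e : events) {
            switch (e.type) {
                case "trip.completed": s.completed = true; break;
                case "fare.charged": s.totalCents += e.amountCents; break;
                default: break; // ignore unknown events for forward compatibility
            }
        }
        return s;
    }
}
```

Adding a new projection (say, a fraud view) is just another fold over the same immutable log.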
Saga Pattern
Orchestration. Central coordinator (workflow engine) drives each step. Easier to reason about, single source of truth.
Choreography. Each service reacts to events and emits new events. No central coordinator. Harder to debug flows.
At Uber: Cadence workflows are orchestrated sagas. Prefer orchestration for complex multi-step flows like trip lifecycle, refunds, driver onboarding.
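An orchestrated saga in miniature: run steps in order and, on failure, run the compensations of completed steps in reverse. The step interface is hypothetical; Cadence/Temporal add durability, retries, and timers on top.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface SagaStep {
    void execute();     // local transaction, e.g. charge card
    void compensate();  // undo, e.g. refund card
}

class SagaOrchestrator {
    void run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                completed.push(step);
            } catch (RuntimeException e) {
                // unwind in reverse order; compensations must themselves be idempotent
                while (!completed.isEmpty()) completed.pop().compensate();
                throw e; // surface the original failure after compensating
            }
        }
    }
}
```

Unlike 2PC, no step holds locks while waiting on others; the price is that intermediate states are externally visible until the saga completes or compensates.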
Outbox Pattern
See Section 1. This is how you reliably publish events from a DB transaction to a message bus.
Strangler Fig
Definition. Gradually migrate from a legacy system by routing a subset of traffic to the new system, expanding the subset over time until the legacy can be deleted.
When to use. Large monolith decomposition. Uber's move from a Python monolith to Go/Java microservices was a textbook strangler-fig migration.
Tools. Feature flags, canary routing, service mesh route rules.
Interview-Ready Phrases
Drop these verbatim when asked "how would you..." and an interviewer will nod.
- "Idempotency key in Redis with a 24-hour TTL backed by a unique constraint in the DB."
- "Consistent hashing with virtual nodes to bound remapping to 1/N on topology changes."
- "Saga orchestrated by Cadence with compensating activities for each step."
- "Outbox pattern to atomically write state and publish to Kafka."
- "Circuit breaker with half-open probing and a 50-ms-bucketed failure rate window."
- "L7 load balancer with consistent hashing on user id for sticky caching."
- "Write-through cache with jittered TTL to prevent stampedes."
- "Tail-based sampling for tracing so every slow request is captured."
- "PACELC: we choose PC/EC — consistency always for ledger writes."
- "mTLS with SPIFFE identities, rotated hourly."
Final Study Tips for Uber
- Know Uber's own tech names cold: H3, Cadence, Ringpop, Schemaless, M3, Jaeger, Peloton, Pinot.
- Every answer has a number. If you say "low latency", follow with "p99 < 50 ms".
- Know at least one failure mode per technology you mention. "Redis goes down — here's what happens."
- Read the Uber Engineering blog for a week before your interview. Recent posts often surface in loops.
- Pair this file with the HLD problem files (50-56) and practice whiteboarding 3-4 full HLDs out loud. Draw the components, speak the numbers, commit to tradeoffs.