HLD: Uber Ride Matching Service ​
Frequently Asked at Uber — Uber's signature system design problem; reported in nearly every L5A loop.
Understanding the Problem ​
What is the Ride Matching Service? ​
The ride matching (or dispatch) service is the brain of Uber. A rider opens the app, taps "Request", and within about 2 seconds the system must pick the best available driver nearby, notify both sides, and start a trip. It sits at the intersection of real-time geospatial indexing, streaming data, stateful sharding, and hard consistency (a driver must never be double-booked). Because every line of Uber's business depends on it, interviewers treat this as a vehicle to probe your ability to reason about latency, throughput, and partial failure at millions-of-events-per-second scale.
Functional Requirements ​
Core (above the line):
- Rider requests a ride at a pickup location with a destination.
- System finds the optimal nearby driver (typically within 5 minutes ETA).
- Driver receives the offer and accepts/declines within a timeout (~15s).
- On accept, both sides receive real-time location updates until pickup.
- Trip state transitions:
REQUESTED -> MATCHED -> EN_ROUTE -> ARRIVED -> IN_TRIP -> COMPLETED.
Below the line (out of scope):
- Pricing / surge — separate service (consumed via RPC).
- ETA computation — separate service, consumed via RPC.
- Payments — separate service, invoked on trip completion.
- Fraud detection, driver onboarding, ratings — not on the hot path.
- Multi-stop trips and Pool/shared rides — call out that you would handle these with a dedicated matcher.
Non-Functional Requirements ​
Core:
- Scale — 25M trips/day, peak 10K requests/sec globally, 1–2K/sec in the largest city. 6M active drivers pinging at 4 Hz.
- Latency — p99 dispatch decision < 2s from request to driver-offered. Location write < 100ms.
- Consistency — A driver cannot be double-booked (strong consistency on driver state). Location data is eventually consistent (1–2s staleness tolerated).
- Availability — 99.99%. A city going dark for 5 minutes is a board-level incident.
- Geo — Active-active across regions; a rider request is served in the nearest region.
Below the line:
- Analytical queries (dashboards, retention curves) — offline pipeline.
- Historical route replay — data warehouse.
Capacity Estimation ​
Write these on the whiteboard:
- Location writes: 6M drivers × 4 pings/sec × 0.5 peak-concurrency factor (roughly half the fleet pinging at peak) = 12M writes/sec global; 100K–500K/sec per region.
- Match QPS: 10K requests/sec globally; at ~5 driver candidates per match, 50K driver-reads/sec.
- Trip records: 25M/day x 2KB = 50 GB/day ~ 18 TB/year in Schemaless.
- Location stream: 12M writes/sec x 200 B = 2.4 GB/sec ~ 200 TB/day (retain 24–72h hot, archive to cold).
- WebSocket fan-out: 5M concurrent trips x 200 B at 1 Hz = 1 GB/sec outbound.
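The table collapses to a few multiplications. If you want to sanity-check it, here is a throwaway C++ check with the constants copied straight from the list above (nothing here is Uber data beyond those constants):

```cpp
// Sanity-check the capacity estimates above.
#include <cstdio>

int main() {
    const double locWrites = 6e6 * 4 * 0.5;     // 12M location writes/sec
    const double streamBps = locWrites * 200;   // 2.4 GB/sec at 200 B/ping
    const double tripGBday = 25e6 * 2e3 / 1e9;  // 50 GB/day of trip records
    std::printf("location writes/sec: %.0fM\n", locWrites / 1e6);
    std::printf("location stream: %.1f GB/s = %.0f TB/day\n",
                streamBps / 1e9, streamBps * 86400 / 1e12);
    std::printf("trip records: %.0f GB/day = %.1f TB/year\n",
                tripGBday, tripGBday * 365 / 1e3);
    return 0;
}
```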
The Set Up ​
Core Entities ​
| Entity | Description |
|---|---|
| Rider | userId, currentLocation, paymentMethod |
| Driver | driverId, vehicleId, status (OFFLINE/AVAILABLE/ON_TRIP), location, h3CellId |
| Trip | tripId, riderId, driverId, pickup, dropoff, state, createdAt |
| Offer | offerId, tripId, driverId, expiresAt, status (PENDING/ACCEPTED/DECLINED/EXPIRED) |
| LocationPing | driverId, lat, lng, heading, speed, timestamp |
API Design ​
We use REST on the client edge (for easy mobile SDKs) and gRPC internally between services. Every write carries an idempotency key because mobile networks retry aggressively.
```
// Rider-facing
POST /v1/trips
Headers: Authorization, Idempotency-Key
Body: {
  pickup: { lat: number; lng: number };
  dropoff: { lat: number; lng: number };
  productType: "UberX" | "Comfort" | "Black";
}
Response 202: { tripId: string; state: "REQUESTED" }

GET /v1/trips/:tripId
Response 200: { tripId; state; driverId?; etaSeconds?; ... }

// Driver-facing
POST /v1/drivers/me/location
Body: { lat; lng; heading; speed; ts }
Response 204

POST /v1/offers/:offerId/accept
Response 200: { tripId; pickup; ... }

// WebSocket channel for real-time
WS /v1/stream?token=...
-> server pushes { type: "OFFER" | "LOCATION_UPDATE" | "STATE_CHANGE", ... }
```

Internally, the same operations map onto a gRPC `Dispatch` service:

```proto
service Dispatch {
  rpc RequestTrip(RequestTripReq) returns (RequestTripResp);
  rpc GetTrip(GetTripReq) returns (Trip);
  rpc AcceptOffer(AcceptOfferReq) returns (AcceptOfferResp);
  rpc StreamDriverLocation(stream LocationPing) returns (google.protobuf.Empty);
  rpc SubscribeTripEvents(SubscribeReq) returns (stream TripEvent);
}

message RequestTripReq {
  string idempotency_key = 1;
  LatLng pickup = 2;
  LatLng dropoff = 3;
  string product_type = 4;
}
```

A caller such as Trip Service sets an explicit deadline on every RPC:

```cpp
#include <chrono>
#include <string>

#include <grpcpp/grpcpp.h>
#include "dispatch.grpc.pb.h"  // generated from the proto above

grpc::Status requestTrip(const std::string& idempotencyKey,
                         double lat, double lng) {
    auto channel = grpc::CreateChannel("dispatch.uber.internal:443",
                                       grpc::SslCredentials({}));
    auto stub = Dispatch::NewStub(channel);
    RequestTripReq req;
    req.set_idempotency_key(idempotencyKey);
    req.mutable_pickup()->set_lat(lat);
    req.mutable_pickup()->set_lng(lng);
    RequestTripResp resp;
    grpc::ClientContext ctx;
    // 2s deadline: the caller's share of the p99 dispatch budget.
    ctx.set_deadline(std::chrono::system_clock::now() +
                     std::chrono::seconds(2));
    return stub->RequestTrip(&ctx, req, &resp);
}
```

High-Level Design
```
+----------+     +------------------+     +--------------------+
|  Rider   | --> | API Gateway (L7) | --> |    Trip Service    |
|   app    |     |  (mTLS, authZ)   |     | (Cadence workflow) |
+----------+     +------------------+     +--------------------+
                                                     |
                                                     v
+----------+     +------------------+     +--------------------+
|  Driver  | --> |  Connection Svc  | <-- |  Dispatch Service  |
|   app    |<--  |  (WebSocket GW)  |     |  (matching engine) |
+----------+     +------------------+     +--------------------+
                          ^                          |
                          |                          v
                 +--------------+           +-------------------+
                 | Fan-out svc  |           | Location Service  |
                 | (Kafka topic |<----------| (Ringpop-sharded  |
                 |  per region) |           |  by H3 cell)      |
                 +--------------+           +-------------------+
                                                     |
                                                     v
                                            +-------------------+
                                            |  In-memory geo    |
                                            |  index per shard  |
                                            | (H3 res 9 cells)  |
                                            +-------------------+
```

End-to-End Flow for a Trip Request
- Rider app calls `POST /v1/trips` with an idempotency key.
- API Gateway authenticates (JWT), applies rate limits, and routes to Trip Service.
- Trip Service starts a Cadence workflow (`TripWorkflow`) keyed on `tripId` and writes the initial `REQUESTED` state to Schemaless (the source of truth).
- The workflow invokes Dispatch Service: `FindDrivers(pickup, productType)`.
- Dispatch computes the H3 cell for the pickup (resolution 9, ~174m hex), queries the Ringpop shard that owns that cell, and returns the top-K candidate drivers ranked by ETA.
- Dispatch offers the best driver via Connection Service (WebSocket push) and sets a 15s timer in Cadence.
- Driver accepts → Dispatch writes the `MATCHED` state and notifies the rider via Connection Service. If the driver declines or times out, the offer cascades to the next candidate; if all K are exhausted, expand the search radius and retry.
- For the rest of the trip, driver location pings (4 Hz) flow Connection Service → Location Service → Kafka → Fan-out Service → the rider's WebSocket.
- On trip completion, Cadence kicks off the billing workflow.
Why These Components ​
- Cadence gives us durable state machines. A TripWorkflow survives host crashes and picks up where it left off — critical when offers span multiple seconds.
- Ringpop pins each H3 cell and each driverId to exactly one node so state mutations are serialized without distributed locks.
- Connection Service is a dedicated WebSocket tier that insulates business services from FD limits and mobile-connection churn.
Data Model Detail ​
It is worth noting the physical layout because interviewers probe it:
- Trip table (Schemaless): Partition key `tripId`. Columns: `riderId`, `driverId?`, `state`, `pickup`, `dropoff`, `product`, `createdAt`, `updatedAt`, `surgeToken`, `fareCents`. Secondary index on `riderId` for history.
- Driver status (Redis / Ringpop): Key `driver:{id}`. Fields: `status`, `vehicleId`, `currentLocation`, `h3Cell`, `lastPingTs`, `ownedByNode`. TTL 90s — stale entries are treated as offline.
- Offer table (Cassandra): Partition key `tripId`, clustering key `offerSequence`. Stores every offer sent with its fate (accepted/declined/expired). Useful for dispatch analytics and driver-fairness audits.
- Location history (Cassandra): Partition key `(driverId, bucket=dayHour)`, clustering key `ts`. TTL 72h. Bucketed partitions prevent wide-row hot partitions for long-driving drivers.
Capacity Walkthrough ​
Let's size a single region (e.g., NYC):
- 100K active drivers × 4 pings/sec × peak factor 0.6 = 240K writes/sec in the region.
- 50 Ringpop Location Service nodes (from 500 global), each holding ~2000 drivers and handling ~5K writes/sec — fits comfortably in a modern core.
- H3 res 9 cells per region: ~200K active cells at any time. Each node owns ~4K cells and the ~2K drivers in them.
- Match QPS in region: 2K/sec peak × 5 candidates = 10K candidate lookups/sec; negligible at the in-process speed of a hash map.
- Trip Service: 2K requests/sec × ~150ms synchronous hold = 300 workflows in the matching phase at any instant. Workflows then stay open for the life of the trip; at a few KB of history each, active histories total ~1 GB per region, persisted in Cadence shards.
Potential Deep Dives ​
1) How do we find the nearest drivers at scale? ​
You are asking 50K driver-reads/sec to search over 6M live driver locations that each update 4x/sec. A naive geo-query is catastrophic.
Bad Solution — SQL table + bounding box ​
- Approach: Store `(driverId, lat, lng)` in a SQL table. On each match, `SELECT * FROM drivers WHERE lat BETWEEN ... AND lng BETWEEN ...` and compute haversine in app code.
- Challenges: Fails past a few hundred drivers per city: full index scans, no hotspot isolation, and writes compete with reads. Location writes alone saturate the DB within minutes.
Good Solution — Geohash or PostGIS ​
- Approach: PostgreSQL + PostGIS with a `GIST` index on a `GEOGRAPHY(POINT)` column, or bucket drivers by geohash prefix.
- Challenges: Decent for ~10K QPS, but writes still compete with reads, and geohash edges cause "nearest neighbor" gaps — two drivers a meter apart can sit in different top-level cells.
Great Solution — H3-indexed, Ringpop-sharded in-memory geo index ​
- Approach (see the sketch after this list):
  - Each driver's location ping is tagged with its H3 res-9 cell id (computed client-side or at ingress).
  - Ringpop consistent-hashes H3 cells onto Location Service nodes. Each node holds an in-memory map `H3CellId -> List<DriverState>`.
  - To find the nearest drivers, compute the H3 cell of the pickup, call `kRing(cell, k=2)` to get the cell plus all hexagonal neighbors within 2 rings (19 cells), fetch driver lists from the owning nodes, and rank by precise haversine + real-time ETA.
- Numbers: 12M writes/sec distributed across ~500 Ringpop nodes = 24K writes/sec/node, trivially handled in-process. Candidate lookup sub-10ms p99. Writes are lock-free per cell with a single-writer pattern per shard.
- Challenges: Hot cells (airports, stadiums) concentrate load on one node. Mitigate by cell-splitting to res 10 (~66m edge) when per-cell QPS exceeds a threshold, or by replicating a hot cell across N nodes with a scatter-gather read.
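A minimal sketch of that lookup path, assuming the H3 v3 C API the text names (`geoToH3`, `kRing`; H3 v4 renames these to `latLngToCell` and `gridDisk`). `DriverState` and the per-shard `DriverIndex` map are illustrative stand-ins for the shard's state, not Uber code:

```cpp
// Candidate lookup against the per-shard in-memory index (H3 v3 C API).
#include <h3/h3api.h>

#include <string>
#include <unordered_map>
#include <vector>

struct DriverState { std::string driverId; double lat, lng; };
using DriverIndex = std::unordered_map<H3Index, std::vector<DriverState>>;

std::vector<DriverState> candidates(const DriverIndex& index,
                                    double pickupLat, double pickupLng) {
    GeoCoord g{degsToRads(pickupLat), degsToRads(pickupLng)};
    H3Index origin = geoToH3(&g, 9);              // res 9: ~174 m edge
    std::vector<H3Index> cells(maxKringSize(2));  // k=2 -> 19 cells
    kRing(origin, 2, cells.data());
    std::vector<DriverState> out;
    for (H3Index c : cells) {
        if (c == 0) continue;  // kRing zero-pads near pentagons
        auto it = index.find(c);
        if (it != index.end())
            out.insert(out.end(), it->second.begin(), it->second.end());
    }
    return out;  // rank by precise haversine + bulk ETA upstream
}
```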
2) Matching algorithm — greedy or batched? ​
Once you have candidates, how do you pick which driver goes to which rider? Greedy is easy but globally wasteful.
Bad Solution — First-come-first-served nearest driver ​
- Approach: On every request, grab the single closest available driver.
- Challenges: Locally optimal but globally wasteful. A driver 30s from rider A might be 10s from rider B who shows up 5s later.
Good Solution — Greedy with a scoring function ​
- Approach: Score = `alpha * eta + beta * driverAcceptanceRate - gamma * idleTime`. Balances ETA, quality, and fairness.
- Challenges: Still myopic. Works at low QPS but leaves aggregate wait time on the table.
Great Solution — Batched matching with Hungarian algorithm ​
- Approach: Buffer requests for 1–2 seconds per geo region. Build a bipartite graph of pending riders × available drivers in the region, with edge weight = expected total wait time + deadhead distance. Solve with the Hungarian algorithm (`O(n^3)`) for up to ~500×500; beyond that, use auction algorithms or LP relaxation. (A compact implementation follows this list.)
- Numbers: Adds 1–2s of latency before the offer, but Uber's internal data shows a 5–15% aggregate ETA reduction at city scale.
- Challenges: For tail regions with sparse supply, bypass batching and fall back to greedy — don't make a rider wait when only one driver is available. Also need per-region window tuning so dense markets get more batching than rural ones.
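For completeness, the solver core is small. This is the textbook O(n³) Hungarian (Kuhn–Munkres) algorithm in its standard potentials formulation, not Uber's matcher: rows are riders, columns are drivers, `cost[i][j]` could be expected wait seconds, and it assumes riders <= drivers (pad or swap otherwise):

```cpp
// Textbook Hungarian algorithm with row/column potentials, O(n^2 * m).
#include <algorithm>
#include <limits>
#include <vector>

// Returns match[i] = driver (column) assigned to rider (row) i.
std::vector<int> hungarian(const std::vector<std::vector<long long>>& cost) {
    int n = cost.size(), m = cost[0].size();  // n riders <= m drivers
    const long long INF = std::numeric_limits<long long>::max() / 4;
    std::vector<long long> u(n + 1), v(m + 1), minv(m + 1);
    std::vector<int> p(m + 1), way(m + 1);
    for (int i = 1; i <= n; ++i) {
        p[0] = i;
        int j0 = 0;
        std::fill(minv.begin(), minv.end(), INF);
        std::vector<char> used(m + 1, 0);
        do {  // Dijkstra-like search for an augmenting path
            used[j0] = 1;
            int i0 = p[j0], j1 = -1;
            long long delta = INF;
            for (int j = 1; j <= m; ++j)
                if (!used[j]) {
                    long long cur = cost[i0 - 1][j - 1] - u[i0] - v[j];
                    if (cur < minv[j]) { minv[j] = cur; way[j] = j0; }
                    if (minv[j] < delta) { delta = minv[j]; j1 = j; }
                }
            for (int j = 0; j <= m; ++j)
                if (used[j]) { u[p[j]] += delta; v[j] -= delta; }
                else minv[j] -= delta;
            j0 = j1;
        } while (p[j0] != 0);
        do { int j1 = way[j0]; p[j0] = p[j1]; j0 = j1; } while (j0);
    }
    std::vector<int> match(n, -1);
    for (int j = 1; j <= m; ++j)
        if (p[j] > 0) match[p[j] - 1] = j - 1;
    return match;
}
```

At the ~500×500 ceiling named above, this comfortably finishes inside the 1–2s batching window.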
3) How do we prevent double-booking? ​
A driver accepting two offers at once is the worst failure mode — both riders are mad and the driver can only serve one.
Bad Solution — Read-modify-write in app code ​
- Approach: Read driver state, mark `BOOKED` in app code, write back.
- Challenges: Classic TOCTOU race. Two dispatcher threads can both read `AVAILABLE` and both write `BOOKED`; rider A and rider B get the same driver.
Good Solution — Optimistic concurrency ​
- Approach: Version-stamp driver state: `UPDATE driver SET status='BOOKED', version=version+1 WHERE id=? AND version=? AND status='AVAILABLE'`. Retry on conflict.
- Challenges: Works, but creates contention at high QPS in hot cells and retry storms during airport surges.
Great Solution — Ringpop-pinned state + Cadence workflow ​
- Approach (sketched after this list):
  - Driver state is owned by a single Ringpop node (consistent-hashed by `driverId`). All state mutations are serialized per driver on that node — no locks needed across the cluster.
  - The Cadence `TripWorkflow` is the authoritative state machine for the trip itself; it issues offers via activities and transitions trip state deterministically. If the Ringpop node fails, Ringpop reassigns the range and the new owner replays state from Schemaless.
- Challenges: During network partitions you get brief unavailability rather than inconsistency — the gossip-layer invariant is "one owner at a time". Worth explicitly noting this CAP choice: we prefer CP for driver state.
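A toy illustration of why single-owner serialization kills the TOCTOU race: once Ringpop guarantees every mutation for a driver lands on one node, a plain in-process critical section suffices. `DriverRegistry` is hypothetical:

```cpp
// Single-owner booking on the Ringpop node that owns this driverId. The
// check and the transition happen inside one critical section on the only
// node allowed to mutate this driver, so two trips can never both observe
// AVAILABLE and both write BOOKED.
#include <mutex>
#include <string>
#include <unordered_map>

enum class DriverStatus { OFFLINE, AVAILABLE, BOOKED, ON_TRIP };
struct DriverState {
    DriverStatus status = DriverStatus::OFFLINE;
    std::string tripId;
};

class DriverRegistry {
    std::mutex mu_;
    std::unordered_map<std::string, DriverState> drivers_;
public:
    // Returns true iff this trip won the driver.
    bool tryBook(const std::string& driverId, const std::string& tripId) {
        std::lock_guard<std::mutex> lk(mu_);
        auto it = drivers_.find(driverId);
        if (it == drivers_.end() ||
            it->second.status != DriverStatus::AVAILABLE)
            return false;
        it->second = {DriverStatus::BOOKED, tripId};
        return true;
    }
};
```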
4) How does TripWorkflow handle offer timeouts and cascades? ​
A driver has 15 seconds to accept an offer. If they don't, we need to cascade to the next candidate without losing time.
Bad Solution — Polling in dispatch thread ​
- Approach: Block the dispatch request thread for 15s waiting on an acceptance.
- Challenges: Thread starvation at any respectable QPS. Cannot free the worker for other requests.
Good Solution — Timer wheel in Dispatch Service ​
- Approach: Register a timer callback per offer; when it fires, move to the next candidate.
- Challenges: Timer state is lost on crash. If Dispatch restarts during an offer, the rider is stuck.
Great Solution — Cadence TripWorkflow with durable timers ​
- Approach:
  - `TripWorkflow` is a durable state machine. Issuing an offer is an activity followed by a `workflow.Sleep(15s)`.
  - On activity return or sleep expiry, the workflow transitions to the next candidate or fails the request.
  - If the Cadence worker crashes, the workflow resumes on a new worker from the last checkpoint — no lost timers.
  - The rider sees a responsive "finding a driver" state backed by a long-lived `tripId`. (The cascade control flow is sketched below.)
- Challenges: Cadence history size grows with cascades; use `continue-as-new` after ~500 events to keep histories bounded.
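Cadence's SDKs are Go and Java, so the sketch below only shows the control flow the workflow encodes, in this document's C++. Nothing here is a real Cadence API; `awaitOfferOutcome` is a hypothetical stub standing in for the offer activity racing a durable 15-second timer:

```cpp
// Control flow of the offer cascade a durable TripWorkflow encodes. On a
// worker crash, Cadence replays this loop from the last checkpointed event,
// so the 15s timer is never lost.
#include <chrono>
#include <optional>
#include <string>
#include <vector>

enum class Outcome { ACCEPTED, DECLINED, EXPIRED };

Outcome awaitOfferOutcome(const std::string& /*driverId*/,
                          std::chrono::seconds /*timeout*/) {
    return Outcome::EXPIRED;  // pessimistic stub for illustration
}

std::optional<std::string> cascade(const std::vector<std::string>& ranked) {
    using namespace std::chrono_literals;
    for (const auto& driverId : ranked) {
        Outcome out = awaitOfferOutcome(driverId, 15s);
        if (out == Outcome::ACCEPTED) return driverId;  // -> MATCHED
        // DECLINED or EXPIRED: fall through to the next candidate.
    }
    return std::nullopt;  // top-K exhausted: widen the radius and retry
}
```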
5) How do we avoid sending the same rider back-to-back to the same unresponsive driver? ​
If a driver's phone is wedged, we'll burn 15 seconds offering them the trip. We must not do this repeatedly.
Good Solution — Blacklist per trip ​
- Approach: The Dispatch Service tracks offered-but-declined drivers per trip and skips them.
- Challenges: Works for a single trip but offers no memory across trips; the bad driver cycles back into the pool minutes later.
Great Solution — Rolling driver responsiveness score + circuit breaker ​
- Approach (sketched below):
- Per-driver rolling 10-minute acceptance rate, published to Redis.
- Drivers below threshold (e.g., 2 consecutive no-response events) are suppressed for 5 minutes.
- Connection Service health pings double-check: if WebSocket has been idle > 60s despite location pings arriving, warn dispatch.
- Challenges: Need to distinguish buggy clients from legitimate misses (driver carrying luggage, driver in a bad cell). Pair with a gentle reactivation test offer after cooldown.
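A sketch of that tracker and circuit breaker with the thresholds from the bullets above (10-minute window, 2 consecutive misses, 5-minute cooldown); in production this state lives in Redis, not in-process:

```cpp
// Rolling acceptance tracker plus suppression circuit breaker.
#include <chrono>
#include <deque>
#include <utility>

class DriverResponsiveness {
    using Clock = std::chrono::steady_clock;
    std::deque<std::pair<Clock::time_point, bool>> window_;  // (ts, responded)
    Clock::time_point suppressedUntil_{};
    int consecutiveMisses_ = 0;
public:
    void recordOffer(bool responded) {
        const auto now = Clock::now();
        window_.emplace_back(now, responded);
        while (!window_.empty() &&
               now - window_.front().first > std::chrono::minutes(10))
            window_.pop_front();              // keep a 10-minute window
        consecutiveMisses_ = responded ? 0 : consecutiveMisses_ + 1;
        if (consecutiveMisses_ >= 2)          // circuit opens
            suppressedUntil_ = now + std::chrono::minutes(5);
    }
    double acceptanceRate() const {
        if (window_.empty()) return 1.0;
        int ok = 0;
        for (const auto& e : window_) ok += e.second;
        return static_cast<double>(ok) / window_.size();
    }
    bool eligibleForOffers() const { return Clock::now() >= suppressedUntil_; }
};
```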
6) How do we handle airport/stadium hotspots? ​
Drop zones at airports can concentrate 500+ drivers into a single H3 cell.
Good Solution — Cell splitting to higher resolution ​
- Approach: When a cell exceeds a threshold, automatically split to res 10 (~66m) internally (see the sketch below). Dispatch queries then span the 7 sub-cells.
- Challenges: Split/merge overhead. Need stable cell ownership during migration to prevent double-matching.
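The split itself is a single H3 call. A sketch against the H3 v3 C API (`h3ToChildren`; v4 renames it `cellToChildren`):

```cpp
// Split a hot res-9 cell into its res-10 children (H3 v3 API).
#include <h3/h3api.h>

#include <algorithm>
#include <vector>

std::vector<H3Index> splitHotCell(H3Index res9Cell) {
    auto n = maxH3ToChildrenSize(res9Cell, 10);  // 7 children per hexagon
    std::vector<H3Index> children(static_cast<size_t>(n));
    h3ToChildren(res9Cell, 10, children.data());
    // Pentagon parents yield fewer children; drop the zero padding.
    children.erase(std::remove(children.begin(), children.end(), H3Index{0}),
                   children.end());
    return children;
}
```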
Great Solution — Per-cell replication with scatter-gather ​
- Approach:
- Hot cells are replicated across N=3 nodes with a designated primary for writes.
- Reads scatter to all N and merge locally.
- Control plane auto-promotes cells hitting an ops threshold (QPS or driver count) and demotes them after a cooldown.
- Challenges: Divergence risk during a primary failover — use fencing tokens to ensure the old primary stops accepting writes.
7) How do we support a multi-region active-active topology?
A request in Paris must not wait on latency to us-east.
Good Solution — Region-local Dispatch + async global replication ​
- Approach: Each region runs its own Dispatch and Location Services. Async replication via uReplicator sends trip data to a global analytics cluster. No cross-region calls on the hot path.
- Challenges: A driver near a regional boundary (rare) might be discoverable only in their home region; acceptable tradeoff.
Great Solution — Region-local with data-residency guards ​
- Approach:
- Riders and drivers pinned to their home region based on auth region.
- Cross-region requests rare (roaming rider); handled by a thin global gateway that routes to the rider's home region.
- Data-residency laws (India, Brazil) enforced by keeping storage in-region and failing over to in-country DR sites.
- Cadence clusters are per-region; workflows don't cross region boundaries.
- Challenges: Roaming users cannot use local-currency trips without a cross-region handoff. Document this as an explicit product decision.
8) How do we fan out real-time events to 5M concurrent trips?
Every matched trip pushes driver locations to the rider at ~1 Hz. That is a persistent-connection fan-out problem, not a throughput problem.
Bad Solution — Direct socket per service ​
- Approach: Each backend service opens WebSockets directly to clients.
- Challenges: Fails at tens of thousands of connections — epoll limits, FD exhaustion, no broadcast primitive.
Good Solution — Dedicated Connection Service tier ​
- Approach: A Node.js or Netty service holds the persistent WebSocket connections and exposes an internal pub-sub. Dispatch publishes to `trip:{tripId}`; Connection Service routes to the right socket.
- Challenges: Scales to ~100K connections per node, but loses state if a node crashes, taking thousands of users offline briefly.
Great Solution — Sharded gateway with Kafka-backed fan-out ​
- Approach (the routing lookups are sketched below):
  - 500 Connection Service nodes × 200K connections/node = 100M concurrent capacity (Uber's public number).
  - Each node subscribes to a partition of the `trip-events` Kafka topic. The producer keys events by `tripId` so all events for a trip land on one partition.
  - For presence: when a client connects, the node writes `(userId -> nodeId)` into a Redis hash. Publishers look up the `nodeId` and push over a per-node internal gRPC stream.
  - Heartbeat every 30s; dead connections are cleaned up via Redis TTL.
- Challenges: Ingress writes (driver pings) fan out to at most 1 recipient (the matched rider), so fan-out ratio is small — the real challenge is connection count, not fan-out depth. For fleet-ops dashboards watching thousands of drivers in a city, use wildcard subscriptions keyed by H3 cell rather than per-driver pub-sub.
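A sketch of the two lookups on the publish path: keyed partitioning so a trip's events stay ordered on one partition, and the presence lookup. The in-memory map stands in for the Redis hash, `std::hash` stands in for Kafka's murmur2 partitioner, and the names are illustrative:

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Keyed partitioning: all events for one tripId land on one partition.
int partitionFor(const std::string& tripId, int numPartitions) {
    return static_cast<int>(std::hash<std::string>{}(tripId) % numPartitions);
}

// Presence: userId -> Connection Service nodeId, written on connect and
// expired by the 30s heartbeat TTL (a Redis hash in production).
using PresenceTable = std::unordered_map<std::string, std::string>;

// Returns the gateway node holding this user's socket, or nullptr if offline.
const std::string* nodeFor(const PresenceTable& presence,
                           const std::string& userId) {
    auto it = presence.find(userId);
    return it == presence.end() ? nullptr : &it->second;
}
```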
Alternatives Considered ​
- Quadtree vs H3: Quadtrees have variable cell depth, useful for adaptive density, but lack uniform neighbor distances. H3 wins for ride matching because of `kRing` uniformity.
- S2 cells (Google): Similar to H3 in spirit, spherical, but squares rather than hexagons. H3 is the Uber house choice.
- Global Dispatch pool vs regional: A global matcher could in theory optimize across cities, but network latency makes it infeasible. Regional is correct.
- Strong consistency store (Spanner) vs sharded + gossip (Ringpop): Spanner-style gives you correctness for free but at 10x the latency budget. Ringpop trades brief unavailability on partitions for normal-case ms latency.
- DB-first (Postgres with GIST) vs in-memory geo index: DB-first is easier to reason about and crashes-safe, but read QPS and write amplification make it impractical at 12M writes/sec.
Frequently Asked Follow-ups ​
- "What if two riders request from the same pickup cell at the same millisecond?" — They both go through Dispatch; Ringpop serializes state mutations per driver, so the first one to mark a driver BOOKED wins. The second cascades.
- "How do you handle drivers going offline mid-trip?" — A missed connection heartbeat makes Connection Service publish a `driver.disconnected` event. Trip Service reacts: keep trying to reconnect for 60s; if no reconnect, treat the trip as needing rider intervention (cancel with no fee, rematch).
- "What about latency for a rider in a foreign country (e.g., roaming)?" — DNS geo-routes to the nearest region; the rider is served region-locally. Cross-region calls are avoided on the hot path.
- "How do you roll out a new matching algorithm safely?" — Shadow-mode: run new algorithm in parallel, log its decisions but don't act. Measure aggregate ETA and driver-wait metrics. Canary at 1% → 10% with per-city feature flags. Keep previous algorithm warm for rollback.
- "What if Cadence itself is down?" — Cadence is itself sharded across many hosts with its state persisted in Cassandra/MySQL; a full outage is rare. Degraded mode: fall back to in-memory state machines in Trip Service with the loss of durability acknowledged to ops.
What's Expected at Each Level ​
Mid-level (L4) ​
- Sensible API, single-region design, correct basic components (dispatch svc, location svc, DB).
- Can discuss geohashing and the need for an index. Picks a reasonable storage choice.
- Misses sharding subtlety; may need prompting on double-booking and on fan-out cost.
Senior (L5 / L5A) ​
- Names H3 or explains why hexagons beat squares. Proposes sharding strategy (Ringpop or equivalent) and discusses hot-shard mitigation.
- Understands the double-booking problem and resolves it explicitly with versioning or single-owner serialization.
- Back-of-envelope QPS with units written on the board.
- Discusses at least one partial-failure scenario — e.g., what happens if Dispatch Service crashes mid-offer (Cadence resumes the workflow; offer-timeout fires and we move to the next candidate).
Staff+ (L6) ​
- Multi-region active-active with data-residency and failover story (e.g., a rider in EU is served from the EU region; Cadence workflows are region-local; cross-region replication is async via uReplicator).
- Discusses matching optimality (batched vs greedy) with quantitative tradeoffs.
- Designs for tail latency (p99.9) and cost. Proposes canary deployments for algorithm changes with shadow traffic.
- Can argue for or against Cadence vs a bespoke state machine (e.g., Uber pre-2017 used Riak-backed state machines — Cadence won because durability + retries + timers are built in).
Common Pitfalls ​
- Skipping capacity estimation. Don't start drawing boxes before writing QPS and storage numbers on the board. Uber interviewers deduct for this every time.
- Waving hands on geohash vs H3. If you say "geohash" without naming the edge-case problem (two nearby points in different cells), senior interviewers will push back.
- Forgetting Cadence. Any state machine over > 2s deserves Cadence; hand-rolling trip lifecycle on Redis is a red flag at L5A.
- Ignoring the accept-timeout cascade. Many candidates stop at "dispatch the nearest driver". The interesting engineering is offer timeouts, cascades, and the feedback loop.
- Assuming one region. Global active-active is table stakes for Uber.
- Missing the double-booking failure mode. If you don't proactively address it, expect a targeted probe.
Walkthrough: Interview Dialogue Example ​
Interviewer: "Walk me through what happens in the first 500ms of a trip request."
You should answer:
- 0ms: Rider app calls `POST /v1/trips`. API Gateway validates the JWT (~5ms with a cached key), applies a per-user rate limit, and routes to Trip Service.
- 20ms: Trip Service starts a Cadence workflow. The workflow's first activity writes the `REQUESTED` state to Schemaless (~10ms).
- 40ms: The workflow calls Dispatch Service. Dispatch computes the H3 cell for the pickup (< 1ms) and contacts the owning Ringpop node via gRPC (~2ms RTT intra-DC).
- 60ms: The Ringpop node runs `kRing(cell, 2)` to collect 19 cells and gathers the driver lists (sub-ms in memory). For a dense city, that's ~500 candidates.
- 80ms: Dispatch calls the ETA Service for the top 20 by haversine distance. The bulk ETA call returns in ~50ms.
- 140ms: Dispatch ranks candidates, issues an offer to the top one via Connection Service (WebSocket push). Sets a 15s timer in Cadence.
- 150ms onward: We're waiting for the driver. If no accept in 15s, cascade.
The entire path from request to offer is ~150ms in the happy path; the remainder of the 2s budget is offer-acceptance slack.
What If They Pivot Mid-Interview? ​
- "Now design Uber Pool / shared rides." — Separate matcher with a combinatorial component: group compatible riders based on pickup/drop-off overlap. Optimization objective changes from pure ETA to expected total detour. Still reuses the H3-based candidate lookup.
- "Handle Uber Eats / courier dispatch." — Replace rider with order, driver with courier. Additional constraints: restaurant prep time, multi-pickup routing, cold-food penalty. Batched matching becomes more valuable.
- "What if we wanted to dispatch based on driver earning fairness over time?" — Add a `fairness_score` term to the matching scoring function using recent earnings history. A Cadence workflow maintains per-driver earnings stats.
- "How would you handle autonomous vehicles?" — AVs have a different state machine (no accept/decline, no acceptance rate). Replace that subtree of the workflow. Cells with AV coverage get AV-first preference policies.
Reliability and Observability ​
Reserve the last 3 minutes for these:
- SLO statement: "99.99% availability, p99 dispatch < 2s, p99 location write < 100ms."
- Failure modes:
- Single-node: Ringpop reassigns the key range to a peer within 5s; TripWorkflow replays from Schemaless.
- Datacenter: active-active regions; DNS failover re-routes traffic, but in-flight trips in the failed DC complete via their last-known state in Schemaless (cross-region replicated via uReplicator).
- Region: full-region DNS flip with a 2-minute RTO; sessions reconnect via long-backoff.
- Deployment: Canary 1% → 10% → 100% with shadow traffic on matching changes. Feature flags behind a control plane so emergency disables take seconds.
- Monitoring: Golden signals per service (latency, traffic, errors, saturation). Per-city dashboards for matching success rate, median ETA, driver acceptance. M3 + Jaeger end-to-end.
- Runbooks: Hot-cell detection → auto-split to res 10; Kafka lag → pause analytics consumers, preserve hot path; PSP timeouts → circuit breaker trips.
Uber-Specific Notes ​
- H3 is the canonical answer for any geo-indexing question at Uber. Know that res 9 ≈ 174m average edge and res 10 ≈ 66m. `kRing(cell, k)` and `polyfill` are the two functions you should mention.
- Ringpop is used across Uber for stateful services that need application-layer sharding — mention it by name. Alternatives the interviewer may probe: Raft-based sharded services (like etcd) or pure hash-ring libraries (like libketama).
- Cadence/Temporal for the trip lifecycle is the modern Uber pattern. Pre-2017 Uber used a bespoke state machine on top of Riak.
- Schemaless is the trip system-of-record; Cassandra for high-write location history; Redis for real-time caches.
- Mention Jaeger for tracing a single trip across Trip Service → Dispatch → Location → Connection. Every trip gets a root span on request and child spans for each RPC.
- Be ready to explain why hexagons beat squares: a hexagon's six neighbors are all equidistant, so `kRing` searches produce near-circular zones without the corner distortion you get with squares.
- Close with: "SLO is 99.99% availability, p99 < 2s dispatch. On datacenter failure we fail the region over via DNS; on single-node failure Ringpop reassigns ownership in under 5 seconds. We've quantified our matching gains from batched dispatch at 5–15% ETA reduction."
Scaling Milestones ​
How does this design evolve as you scale? Useful lens to narrate:
- 10 drivers, 100 trips/day (seed city): Single Postgres, no H3 needed, geohash + haversine filter suffices. API and matcher run in one Go process.
- 1K drivers, 10K trips/day (first market): Split matcher from API. Introduce Redis for driver status. Geohash buckets as pseudo-shards.
- 50K drivers, 500K trips/day (major metro): Introduce H3 + in-memory geo index. Cadence for trip lifecycle. Multi-AZ for availability.
- 500K drivers, 5M trips/day (national): Ringpop sharding. Kafka backbone. Connection Service tier emerges.
- 6M drivers, 25M trips/day (global): Multi-region active-active. uReplicator for analytics. Batched matching. Hot-cell splitting.
At each stage, narrate what broke and why you added the next layer; don't jump to the endgame if the interviewer wants incremental reasoning.
Summary Checklist ​
Before you finish, make sure you've covered:
- [ ] Clarifying questions upfront: scope, SLOs, scale.
- [ ] Functional requirements split above/below the line with reasons.
- [ ] NFRs with concrete numbers.
- [ ] Capacity estimation written on the board (QPS, storage, bandwidth).
- [ ] Core entities + API signatures with idempotency keys.
- [ ] High-level architecture with labeled components and data flow.
- [ ] At least 3 deep dives with Bad/Good/Great.
- [ ] Double-booking addressed explicitly.
- [ ] H3 and Ringpop named.
- [ ] Cadence called out for trip lifecycle.
- [ ] Multi-region story for L6+.
- [ ] Reliability/observability closing.
- [ ] Operational concerns: canaries, feature flags, SLO alerts.
Key Numbers to Memorize ​
| Metric | Value |
|---|---|
| Global trips/day | 25M |
| Peak match QPS globally | 10K |
| Peak match QPS largest city | 1–2K |
| Location writes/sec (peak) | 12M |
| Location writes/sec (steady) | 1–2M |
| Concurrent WebSocket clients | 100M |
| Dispatch decision p99 | < 2s |
| Location write p99 | < 100ms |
| Fan-out p99 | < 500ms |
| H3 res 9 edge | ~174m |
| H3 res 10 edge | ~66m |
One-Liner You Should Remember ​
"H3 for geo indexing, Ringpop for stateful sharding, Cadence for trip lifecycle, Kafka for the durable backbone, Schemaless for the system of record, Redis for hot state. Dispatch decision in under 2 seconds p99, 12M location writes/sec globally, 10K matches/sec peak, no double-bookings."