HLD: Uber Driver Location Streaming / Dispatch Pipes
Frequently Asked at Uber — Uber's signature system design problem; reported in nearly every L5A loop.
Understanding the Problem
What is Driver Location Streaming?
This is the ingress and egress system that moves GPS pings around the Uber stack. Millions of drivers stream location updates into the platform at 4 Hz; tens of millions of riders and internal consumers subscribe to live positions for matched trips, fleet-ops dashboards, and heatmaps. Unlike ride matching, this problem is about the pipes — how you manage 100M concurrent WebSocket connections, hit sub-100ms write latency for pings, and fan out events without building a fleet of blocking proxies.
Functional Requirements
Core (above the line):
- Ingest driver location pings at 4 Hz per active driver.
- Store the latest location with < 100ms write p99.
- Retain 24–72h of location history for disputes and analytics.
- Push location updates to subscribed clients (rider during trip, fleet-ops dashboards) at < 500ms p99 end-to-end.
- Connection management for 100M concurrent mobile clients.
Below the line (out of scope):
- Matching logic — covered by the Ride Matching HLD.
- Privacy redaction for long-term storage — batch job.
- Map-matching (snapping GPS to roads) — separate service.
Non-Functional Requirements
Core:
- Scale — 6M active drivers × 4 Hz peak = 24M writes/sec; steady state 1–2M/sec. 100M concurrent mobile connections (drivers + riders).
- Latency — write p99 < 100ms. Fan-out p99 < 500ms end-to-end.
- Durability — last-known location is durable; ping history is best-effort (a missed ping is fine).
- Availability — 99.99%.
Below the line:
- Cross-region replication of historical pings — lives in a batch pipeline.
- Turn-by-turn telemetry — handled by Nav service.
Capacity Estimation
Be prepared to show the math:
- Bandwidth ingress: 2M pings/sec × 200 B = 400 MB/s steady. Double that at peak (~800 MB/s). With protobuf (~40 B/ping), ~80 MB/s steady.
- History storage: 2M pings/sec × 200 B × 72h = ~100 TB hot in Cassandra. RF=3 → 300 TB replicated. Compressed 3× → 100 TB on disk.
- Connections: 100M clients / 500 WS gateway nodes = 200K connections/node. At ~1 KB/connection state (with optimized layout), 200 MB RAM/node — modest.
- Hot store: 6M driver keys in Redis × 200 B = 1.2 GB — trivial.
- Kafka cluster: 200 brokers at 4 MB/s ingress each = 800 MB/s capacity, with headroom. RF=3 brings total write I/O to 2.4 GB/s (ingress plus two replica copies).
The Set Up
Core Entities
| Entity | Description |
|---|---|
| Driver | driverId, status, h3Cell, lastLocation |
| LocationPing | driverId, lat, lng, heading, speed, accuracy, ts |
| Connection | connId, userId, gatewayNodeId, protocol (WS/gRPC), subscriptions[] |
API Design
Clients use WebSocket for the long-lived connection; internal services use gRPC. Location reads are also exposed as REST for internal batch jobs.
// Driver app -> backend
WS /v1/driver/stream
-> client sends: { type: "ping", lat, lng, heading, speed, ts }
<- server sends: { type: "offer", ... } | { type: "ack", seq }
// Rider subscription
WS /v1/rider/trip/:tripId/stream
<- server sends: { type: "driver_location", lat, lng, eta }
// Internal read
GET /v1/drivers/:driverId/location
Response: { lat, lng, ts, staleness_ms }

service Location {
rpc StreamPings(stream Ping) returns (stream Ack);
rpc GetLastKnown(GetLastKnownReq) returns (Location);
rpc SubscribeDriver(SubscribeReq) returns (stream Location);
}
message Ping {
string driver_id = 1;
double lat = 2;
double lng = 3;
float heading = 4;
float speed = 5;
int64 ts_ms = 6;
uint64 seq = 7;
}

// Per-connection event loop (io_uring-backed)
while (io.run_one()) {
for (auto& frame : gateway.pollFrames()) {
if (frame.type == PING) {
ingressQueue.enqueueSharded(frame.driverId, frame.payload);
gateway.send(frame.connId, Ack{frame.seq});
}
}
}

High-Level Design
Drivers (6M) Riders/Ops (100M)
| ^
v |
+-------------------+ +-------------------+
| Connection Svc | | Connection Svc |
| (WebSocket GW) | | (egress nodes) |
| 500 nodes | | shared with above |
+-------------------+ +-------------------+
| ^
| binary frame, gRPC | per-trip channel
v |
+-------------------+ +-------------------+
| Location Ingress |---------->| Fan-out Service |
| (Ringpop shard | | (Kafka consumers) |
| by driverId) | +-------------------+
+-------------------+ ^
| | |
| +-------- Kafka -----------+ (driver-locations topic)
v
+-------------------+
| Hot store: Redis |
| (driverId -> loc) | Cassandra (history, 72h hot)
| TTL 60s |
+-------------------+

End-to-End Flow for a Driver Ping
- Connection setup. Driver app calls WSS /v1/driver/stream with a JWT. DNS routes to the nearest edge POP. Edge handles TLS and sticky-routes to a Connection Service node based on (userId, nodeHealth). On success, Connection Service writes (userId -> nodeId) to Redis with a 60s TTL and starts a heartbeat loop.
- Ping send. Driver app emits a ping at 4 Hz. Client SDK batches up to 3 pings per frame if network latency is bad. Frame format is protobuf for compactness (~40 B per ping vs 200 B for JSON).
- Gateway forwarding. Connection Service receives the frame over WebSocket. It forwards via internal gRPC stream to Location Ingress, consistent-hashed by driverId. A single open gRPC stream between each Connection node and each Ingress shard keeps overhead low.
- Ingress write path. Location Ingress applies a sequence-number CAS in Redis via Lua (IF current.seq < incoming.seq THEN SET). Then it produces to Kafka driver-locations with acks=1, linger.ms=5, batch.size=64KB, compression lz4. Acks flow back to the driver client as a status frame.
- Durable history. Cassandra Sink consumer reads from Kafka in batches of 100 per partition, writes via UNLOGGED BATCH (rows share partition key (driverId, dayHour)). TTL 72h. A sink sketch follows this list.
- Fan-out path. Fan-out Service consumes the same Kafka topic. For each record, it looks up (tripId -> subscribers) in Redis. Subscribers are rider, ops dashboards, and analytics mirror — typically 1–3 per driver. It posts each subscriber an update via an internal gRPC stream to the gateway node holding their connection.
- Egress delivery. The receiving gateway pushes a WS frame to the rider/ops client. Total end-to-end latency (driver frame in → rider frame out) is 100–150ms p99.
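A minimal sketch of that sink batching step, assuming the gocql client; the table name location_history and the ping struct are illustrative, not Uber's actual schema:

// Batch up to 100 rows that share a partition key, then write them as a
// single UNLOGGED BATCH; single-partition batches skip coordinator overhead.
package sink

import "github.com/gocql/gocql"

type ping struct {
	DriverID string
	Bucket   string // dayHour bucket, e.g. "2024-06-01T13"
	TsMs     int64
	Lat, Lng float64
}

// flushBatch assumes every row in rows shares (driver_id, bucket).
func flushBatch(session *gocql.Session, rows []ping) error {
	b := session.NewBatch(gocql.UnloggedBatch)
	for _, p := range rows {
		b.Query(`INSERT INTO location_history (driver_id, bucket, ts_ms, lat, lng)
		         VALUES (?, ?, ?, ?, ?) USING TTL 259200`, // 72h = 259200s
			p.DriverID, p.Bucket, p.TsMs, p.Lat, p.Lng)
	}
	return session.ExecuteBatch(b)
}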
Why These Components
- Terminating WebSockets on a dedicated tier isolates connection churn from business services.
- Ringpop-sharded ingress serializes per-driver writes without global locks.
- Kafka is the durable backbone for both history sink and fan-out.
- Redis gives sub-ms reads for last-known location.
Data Model Detail
- Driver last-known (Redis): Key driver:{id}:loc. Value: packed binary (lat, lng, heading, speed, ts), ~32 B (a packing sketch follows this list). TTL 60s.
- Connection registry (Redis): Hash conn:{userId} with nodeId, connId, lastSeenTs. TTL 60s. Heartbeats refresh.
- Location history (Cassandra): Partition key (driverId, bucket=dayHour), clustering key ts DESC. TTL 72h. 72h × 4 Hz × 100 B = 100 MB per driver; bucketing keeps each partition under 10 MB.
- Kafka topic: driver-locations, 500 partitions, RF=3, acks=1, compression=lz4, retention 24h.
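A sketch of that 32-byte packed value in Go; the big-endian, fixed-width field layout is an assumption for illustration, not a documented wire format:

// Pack (lat, lng float64, heading, speed float32, ts int64) into exactly
// 32 bytes: 8 + 8 + 4 + 4 + 8.
package hotstore

import (
	"encoding/binary"
	"math"
)

func packLocation(lat, lng float64, heading, speed float32, tsMs int64) [32]byte {
	var b [32]byte
	binary.BigEndian.PutUint64(b[0:8], math.Float64bits(lat))
	binary.BigEndian.PutUint64(b[8:16], math.Float64bits(lng))
	binary.BigEndian.PutUint32(b[16:20], math.Float32bits(heading))
	binary.BigEndian.PutUint32(b[20:24], math.Float32bits(speed))
	binary.BigEndian.PutUint64(b[24:32], uint64(tsMs))
	return b
}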
Capacity Walkthrough
- Steady state: 2M writes/sec × 200 B = 400 MB/s Kafka ingress; 3× with replication = 1.2 GB/s cluster I/O.
- Cassandra sink: 2M/sec / 100-batch = 20K ops/sec at the sink layer; with RF=3, ~60K replica writes/sec, trivial for a 12-node cluster.
- Connection Service: 500 nodes × 200K conns × 1 KB/conn = 100 GB total memory. Each node ~200 MB of connection state — 16 GB host is plenty.
- Redis hot store: 6M keys × ~200 B (key + packed value + overhead) = ~1.2 GB, trivial.
Potential Deep Dives
1) How do we manage 100M concurrent connections?
FD limits, TLS handshakes, and per-connection memory are the bottlenecks.
Bad Solution — One node per million users
- Approach: Terminate 1M connections per process with aggressive kernel tuning.
- Challenges: FD limits (~1M/process) barely hold; memory explodes at 10+ GB per process; single-node failures drop 1M users at once. TLS handshake thundering herd on restart.
Good Solution — Horizontal scale with epoll/kqueue
- Approach: 500 nodes × 200K connections using epoll (Linux) or kqueue (BSD). Netty or Go for the gateway. Heartbeats every 30s with aggressive TCP keepalive.
- Challenges: TLS CPU cost still high on the gateway. Node restarts trigger reconnect storms.
Great Solution — Edge-POP TLS termination + internal gRPC fan-in
- Approach:
  - Terminate TLS at edge POPs (CDN-style), forward decrypted WS frames to Connection Service over internal mTLS gRPC streams. Crypto cost shifts to the edge where it parallelizes easily.
  - Use TCP_NODELAY on short messages, SO_REUSEPORT for accept-side scaling (see the sketch after this list), io_uring on newer kernels for 2–3× throughput per core.
  - Keep only (userId, seq, sub-list-ref) in node memory; put subscription metadata in Redis. Drops per-connection memory to ~1 KB.
- Numbers: 200K conns/node × 1 KB = 200 MB per node for connection state. 1 ms p99 ping acknowledgment on-node.
- Challenges: Reconnect storms after a POP failover — rate-limit new connections per region to prevent cascading overload. Graceful drain on deploys so active trips keep their sockets.
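A minimal sketch of the SO_REUSEPORT knob in Go, assuming golang.org/x/sys/unix; several acceptor processes can bind the same port and let the kernel spread incoming connections across them (Go's net package already enables TCP_NODELAY on TCP connections by default):

package gateway

import (
	"context"
	"net"
	"syscall"

	"golang.org/x/sys/unix"
)

// listenReusePort opens a TCP listener with SO_REUSEPORT set, letting
// multiple acceptor processes share one port for accept-side scaling.
func listenReusePort(addr string) (net.Listener, error) {
	lc := net.ListenConfig{
		Control: func(network, address string, c syscall.RawConn) error {
			var sockErr error
			if err := c.Control(func(fd uintptr) {
				sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
			}); err != nil {
				return err
			}
			return sockErr
		},
	}
	return lc.Listen(context.Background(), "tcp", addr)
}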
2) How do we balance durability and throughput on the write path?
Every ping needs to land somewhere durable, but synchronous DB writes would collapse under 2M ops/sec.
Bad Solution — Direct Cassandra writes with QUORUM
- Approach: Every ping triggers a QUORUM write to Cassandra.
- Challenges: Write p99 > 20ms, spiky under GC. Cannot hit 2M writes/sec without a massive cluster.
Good Solution — Kafka-first, async persistence
- Approach: Kafka is the durable write, followed by Redis hot-state write and a Cassandra sink consumer that batches writes.
- Challenges: Better, but producer-side batching and sink consumer tuning are essential. Kafka acks=all costs latency.
Great Solution — Batched Kafka produce + Ringpop-sharded in-process cache
- Approach:
  - Producer side: linger.ms=5, batch.size=64KB, compression=lz4, acks=1 on the hot path (acks=all on a sampled audit stream). Achieves 1M+ records/sec/producer (sketch below).
  - Consumer side: Cassandra writes batched per partition (up to 100 rows) via UNLOGGED BATCH — no coordinator overhead because rows share the partition key (driverId-bucket). Drops writes from 2M ops/sec to 20K Cassandra ops/sec (100× amortization).
  - Hot Redis state set with SET EX 60. Cluster sized so each shard handles ~50K ops/sec.
- Numbers: Hot-path write p99 ~5–10ms (Kafka ack + Redis write overlap). Cassandra sink lag < 2s under normal load.
- Challenges: acks=1 can lose a ping if the leader fails before replication — acceptable given "best-effort" history durability. For last-known location, tolerate brief staleness on failover; the Redis key-by-driverId state is rebuilt from Kafka replay.
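A sketch of those producer knobs using the sarama Go client; the library choice and broker addresses are assumptions, and equivalent settings exist in any Kafka client:

package ingress

import (
	"time"

	"github.com/IBM/sarama"
)

// newPingProducer returns an async producer tuned for the hot path:
// small linger window, large batches, lz4, leader-only acks.
func newPingProducer(brokers []string) (sarama.AsyncProducer, error) {
	cfg := sarama.NewConfig()
	cfg.Producer.RequiredAcks = sarama.WaitForLocal      // acks=1
	cfg.Producer.Compression = sarama.CompressionLZ4     // compression=lz4
	cfg.Producer.Flush.Frequency = 5 * time.Millisecond  // linger.ms=5
	cfg.Producer.Flush.Bytes = 64 * 1024                 // batch.size=64KB
	cfg.Producer.Partitioner = sarama.NewHashPartitioner // partition by key
	return sarama.NewAsyncProducer(brokers, cfg)
}

Records are then published with Key set to the driverId (e.g., sarama.StringEncoder(driverID)), so the hash partitioner preserves per-driver ordering within a partition.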
2.5) How do we handle massive reconnect storms after an outage?
Post-outage, millions of clients reconnect within seconds. Without a plan, the reconnect wave itself DDoSes the gateway.
Good Solution — Client-side jittered backoff
- Approach: SDK reconnects with min(cap, base * 2^attempts) + rand(0, jitter), as sketched below.
- Challenges: Helps but doesn't cap overall throughput; dense regions still spike.
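A sketch of that formula in Go; the base, cap, and jitter constants are illustrative:

package sdk

import (
	"math/rand"
	"time"
)

const (
	baseDelay = 1 * time.Second
	maxDelay  = 30 * time.Second
	jitterMax = 1 * time.Second
)

// reconnectDelay implements min(cap, base * 2^attempts) + rand(0, jitter).
func reconnectDelay(attempts int) time.Duration {
	d := baseDelay << uint(attempts) // base * 2^attempts
	if d > maxDelay || d <= 0 {      // the shift can overflow; clamp both ways
		d = maxDelay
	}
	return d + time.Duration(rand.Int63n(int64(jitterMax)))
}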
Great Solution — Server-advertised reconnect hints + regional rate limiting
- Approach:
  - Edge returns 503 with a Retry-After: <seconds> header during capacity pressure, driving clients toward uniform reconnection (sketch below).
  - Regional rate limit on new connections at the edge (e.g., cap at 20K new conns/sec/region).
  - Connection Service bulk-rebinds old sessions from a cache once the client returns, avoiding re-authentication costs.
- Challenges: Legacy clients that ignore Retry-After — sunset them via minimum-version enforcement.
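A minimal sketch of the edge-side gate, assuming golang.org/x/time/rate for the regional cap; the handler shape and the 5-second hint are illustrative:

package edge

import (
	"net/http"
	"strconv"

	"golang.org/x/time/rate"
)

// One limiter per region: 20K new connections/sec with a small burst.
var newConnLimiter = rate.NewLimiter(rate.Limit(20000), 40000)

// gateNewConnection admits or rejects a WebSocket upgrade attempt.
// Rejected clients get a 503 with Retry-After so reconnects spread out.
func gateNewConnection(w http.ResponseWriter, r *http.Request, next http.Handler) {
	if !newConnLimiter.Allow() {
		w.Header().Set("Retry-After", strconv.Itoa(5))
		http.Error(w, "reconnect later", http.StatusServiceUnavailable)
		return
	}
	next.ServeHTTP(w, r)
}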
3) How do we fan out efficiently without overwhelming the broker?
Fan-out depth is low per driver (~1–3 subscribers: rider + ops + analytics), but aggregate fan-out is millions of pushes per second.
Bad Solution — Per-ping subscriber loop
- Approach: For every ping, server-side loop over all subscribers, write to each individually.
- Challenges: Tail latency blows up for drivers with many watchers. N×M path between nodes.
Good Solution — Centralized subscriber table in Redis
- Approach: Maintain driverId -> List<subscriberConnId> in Redis. On ping, push once per subscriber.
- Challenges: Throughput bounded by Redis roundtrips. Centralized state is a scaling ceiling.
Great Solution — Kafka-backed fan-out with gateway-local subscriber tables
- Approach:
  - Fan-out service consumes driver-locations partitioned by driverId.
  - A gateway node owning a subscriber's connection also subscribes to an internal topic keyed by driverId. Subscription state lives close to the connection, not centralized (see the sketch after this list).
  - For popular drivers (fleet ops watching all drivers in a city), use a "wildcard" subscription by H3 cell — clients subscribe to a cell, not a driver list.
  - Pure push, no polling.
- Numbers: Observed fan-out depth per driver: 1–3. The hard problem is connection count, already solved.
- Challenges: Wildcard subscriptions can blow up during fleet-ops debug tools (a user subscribing to an entire market). Enforce per-user subscription caps (e.g., max 5K concurrent subs per ops user).
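A sketch of a gateway-local subscriber table; all names here are illustrative, and a real gateway would also unsubscribe on disconnect and handle wildcard H3-cell channels:

package gateway

import "sync"

type connID string

// subTable maps driverId -> local connections watching that driver.
// It lives in the gateway process, so a location push touches no
// centralized store on the hot path.
type subTable struct {
	mu   sync.RWMutex
	subs map[string]map[connID]struct{}
}

func newSubTable() *subTable {
	return &subTable{subs: make(map[string]map[connID]struct{})}
}

func (t *subTable) subscribe(driverID string, c connID) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.subs[driverID] == nil {
		t.subs[driverID] = make(map[connID]struct{})
	}
	t.subs[driverID][c] = struct{}{}
}

// fanOut returns the local connections to push a driver update to;
// observed depth is 1–3, so the copy is cheap.
func (t *subTable) fanOut(driverID string) []connID {
	t.mu.RLock()
	defer t.mu.RUnlock()
	out := make([]connID, 0, len(t.subs[driverID]))
	for c := range t.subs[driverID] {
		out = append(out, c)
	}
	return out
}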
4) How do we sequence pings so out-of-order deliveries don't corrupt state?
Networks reorder. A ping from 3 seconds ago can land after a fresh one.
Bad Solution — Always write the incoming ping
- Approach: Overwrite the last-known key with every received ping.
- Challenges: An out-of-order old ping overwrites a fresher one. Rider sees the driver "jump back in time".
Good Solution — Compare timestamps before writing
- Approach: Read-before-write: fetch current ts, compare, conditional write if newer.
- Challenges: Extra round trip per ping; at 2M pings/sec that's crippling.
Great Solution — Lua CAS at Redis + sequence number per driver
- Approach:
  - Client-assigned monotonic sequence number per connection.
  - Lua script: IF current.seq < incoming.seq THEN SET — an atomic compare-and-set (sketch below).
  - Kafka partition keyed by driverId preserves order end-to-end to consumers.
- Challenges: Clients must reset seq on reconnect; server treats the new connection as authoritative based on a session token.
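A sketch of the CAS with the go-redis client; storing (seq, loc) as a hash under driver:{id}:loc is an assumption made so the script has a sequence number to compare:

package ingress

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// casScript writes the ping only if its seq is newer than the stored one,
// atomically, and refreshes the 60s TTL on success.
var casScript = redis.NewScript(`
local cur = tonumber(redis.call('HGET', KEYS[1], 'seq') or '-1')
local inc = tonumber(ARGV[1])
if inc > cur then
  redis.call('HSET', KEYS[1], 'seq', ARGV[1], 'loc', ARGV[2])
  redis.call('EXPIRE', KEYS[1], 60)
  return 1
end
return 0
`)

// writeIfNewer reports whether the incoming ping won the CAS.
func writeIfNewer(ctx context.Context, rdb *redis.Client, driverID string, seq int64, packedLoc []byte) (bool, error) {
	res, err := casScript.Run(ctx, rdb,
		[]string{"driver:" + driverID + ":loc"}, seq, packedLoc).Int()
	return res == 1, err
}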
4.5) How do we keep bandwidth and battery under control?
Drivers drive all day; battery drain from streaming kills their willingness to stay online.
Good Solution — Batch pings in the SDK
- Approach: Instead of one frame per ping, batch every 1s into a single frame of 4 pings.
- Challenges: Adds 250–1000ms latency to location updates.
Great Solution — Adaptive batching + binary protobuf
- Approach:
- Protobuf over raw WebSocket frames: ~40 B per ping vs 200 B for JSON.
- When stationary (speed < 2 km/h), reduce frequency to 0.5 Hz and skip redundant pings (cadence sketch after this list).
- When accelerating/turning, boost to 4 Hz for accuracy.
- Keep the WebSocket connection open but idle-optimized (TCP keepalive every 60s).
- Challenges: Requires client-side logic; sensor fusion for stationary detection; careful backward-compat.
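A sketch of the cadence decision; the maneuvering flag is assumed to come from client-side sensor fusion, and the thresholds mirror the list above:

package sdk

import "time"

// pingInterval picks the upload cadence from motion state: full 4 Hz
// while moving or maneuvering, 0.5 Hz while stationary.
func pingInterval(speedKmh float64, maneuvering bool) time.Duration {
	if speedKmh < 2 && !maneuvering {
		return 2 * time.Second // stationary: 0.5 Hz, skip redundant pings
	}
	return 250 * time.Millisecond // 4 Hz for accuracy
}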
5) How do we handle buggy or abusive clients?
A driver phone stuck in a loop can spam 100 Hz pings and exhaust a Kafka partition.
Good Solution — Client-side throttle
- Approach: SDK enforces 4 Hz.
- Challenges: Cannot trust the client; must enforce server-side.
Great Solution — Token-bucket per-driver rate limit at gateway
- Approach:
  - Gateway keeps an in-process token bucket per driverId (N = 10, refill 4/s), as sketched below.
  - Over-rate pings are silently dropped with a sampled log line; after sustained abuse, a kill-switch closes the connection and requires re-auth.
  - Anomaly detector on pings/sec/driver surfaces buggy app versions.
- Challenges: False positives for legitimate high-frequency devices (e.g., new telemetry features). Coordinate rollouts via feature flag.
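A sketch of the per-driver bucket, assuming golang.org/x/time/rate (burst 10, refill 4/s, matching the numbers above); a production gateway would also evict buckets for idle drivers:

package gateway

import (
	"sync"

	"golang.org/x/time/rate"
)

// pingLimiter holds one token bucket per driverId: burst 10, refill 4/s.
type pingLimiter struct {
	mu      sync.Mutex
	buckets map[string]*rate.Limiter
}

func newPingLimiter() *pingLimiter {
	return &pingLimiter{buckets: make(map[string]*rate.Limiter)}
}

// allow reports whether this driver's ping fits within 4 Hz (+ burst).
func (l *pingLimiter) allow(driverID string) bool {
	l.mu.Lock()
	b, ok := l.buckets[driverID]
	if !ok {
		b = rate.NewLimiter(rate.Limit(4), 10)
		l.buckets[driverID] = b
	}
	l.mu.Unlock()
	return b.Allow()
}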
Rapid-Fire Q&A Anticipations
- "What's the connection recovery time after a gateway restart?" ~30s worst-case, dominated by the backoff cap: TCP SYN (1–3s) + TLS handshake (100ms–1s) + auth + subscription re-registration.
- "How do we test failover?" Chaos experiments kill random gateway nodes weekly; SLO on connection-drop percentage tracked in M3.
- "Why not use MQTT broker (like AWS IoT Core)?" Licensing and scale limits; Uber's WebSocket+Kafka stack is cheaper and more flexible at this size.
- "How do we roll out a gateway change?" Blue/green at the regional level; new color takes 10% of new connections first, then ramps.
- "What about jitter introduced by Kafka?" Typical lag 50–200ms; fits easily under the 500ms fan-out p99 budget.
Alternatives Considered
- MQTT vs WebSocket: MQTT is lighter, better for constrained devices; WebSocket is simpler for smartphones and integrates with HTTP infrastructure. Uber uses WebSocket.
- Server-Sent Events (SSE) vs WebSocket: SSE is one-way; we need bidirectional (ping + offer/ack). WebSocket wins.
- Raw TCP vs WebSocket: Raw TCP avoids HTTP overhead but loses proxy compatibility. WebSocket over TLS is the practical choice.
- Kafka vs Pulsar: Both work. Pulsar has cleaner tiered storage; Kafka has the bigger ecosystem. Uber is a Kafka shop.
- Direct Cassandra writes vs Kafka-first: Kafka-first gives backpressure and multi-consumer fan-out; direct writes couple everything to DB availability.
Frequently Asked Follow-ups
- "What if a driver's phone loses connectivity for 5 minutes?" — Pings buffer on-device up to a cap (e.g., last 60s); on reconnect, send a compressed delta. Last-known location goes stale in Redis (TTL 60s); consumers see a staleness_ms field.
- "How do you do geofencing (alert when driver enters region)?" — Flink job on driver-locations computes H3 cell per ping; compares against subscribed geofences; emits alerts to the notification pipeline.
- "What's the impact of GPS noise?" — Client-side Kalman filter smooths coordinates before sending. Server does not re-filter.
- "How do we handle driver auth for the WS?" — mTLS or JWT handshake on connect. Token refresh via a keep-alive frame.
- "Cross-region?" — Each region has its own Connection Service + Kafka cluster. uReplicator mirrors to a global analytics cluster. Hot path stays in-region.
Visual Aids to Draw
- Connection topology: drivers + riders fanning into gateway nodes, with edge POPs at the perimeter.
- Ingress → Kafka → egress pipeline with partition keys labeled (driverId).
- Ringpop ring illustrating how driverIds hash to owner nodes.
- Fan-out tree showing 1 driver ping → 1–3 subscribers (rider + ops + analytics).
- Reconnect storm mitigation: timeline with jittered backoff vs. synchronized reconnect spikes.
What's Expected at Each Level
Mid-level (L4)
- WebSocket gateway, Redis hot store, basic fan-out.
- Understands why Kafka sits between ingress and persistence.
- Misses connection-per-node math and backpressure story.
Senior (L5 / L5A)
- Explicit sharding story (Ringpop by driverId), Kafka as the durable backbone, tiered storage hot/warm/cold.
- Quantifies connection-per-node, bandwidth, and cluster size.
- Handles rate limits and reconnect storms.
Staff+ (L6)
- Edge/POP terminations, backpressure at the gateway (if downstream Kafka lag rises, drop pings or downsample to 2 Hz).
- Sliding-window rate limits per driver to protect against buggy clients.
- Cross-region design: regional Kafka clusters mirrored via uReplicator for analytics, but hot path stays in-region.
- Observability: per-connection span sampling, pings_dropped_total dashboards, SLOs for fan-out latency.
Common Pitfalls
- Single gateway tier. 100M connections don't fit on any reasonable node count without sharding.
- Direct DB writes per ping. Kafka-first is mandatory at this scale.
- Ignoring reconnect storms. After any outage, 10% of clients reconnect simultaneously. Backoff + jitter are essential.
- Assuming durability for pings. Location history is best-effort; state so explicitly.
- Over-engineering fan-out. Actual fan-out depth per driver is 1–3; solve the connection-count problem first.
Walkthrough: Interview Dialogue Example
Interviewer: "Walk through a single driver ping end-to-end, including fan-out to the rider."
You should answer:
- 0ms: Driver GPS emits ping at 4 Hz. SDK sends WS frame to the edge POP (~20ms).
- 20ms: Edge terminates TLS, forwards over mTLS gRPC to Connection Service node (1ms intra-DC).
- 21ms: Connection Service enqueues to Location Ingress sharded by driverId.
- 25ms: Location Ingress consistent-hashes to a Ringpop owner, writes to Redis (SET EX 60), produces to Kafka with acks=1 (~5ms).
- 30ms: ACK back to driver.
- In parallel, Fan-out Service consumes the Kafka record (~50ms lag steady-state), looks up subscribers in Redis (trip:{id} → rider connId on node X), sends via gRPC to node X.
- Node X forwards WS frame to rider over edge POP (~20ms).
- Total end-to-end: ~100–150ms from driver to rider, comfortably inside the 500ms fan-out p99.
What If They Pivot Mid-Interview?
- "Design a live-location share (rider shares trip with a friend)." — Same fan-out pipeline; add an auth scope (tripId, viewerId, expiresAt). Viewer connects to the same WS gateway and subscribes to the trip channel.
- "What if all drivers went to 10 Hz instead of 4?" — 2.5× the writes (5M/sec steady). Need a 2.5× larger Kafka + Cassandra cluster. Connection Service roughly unchanged. Good prompt to demonstrate capacity rethinking.
- "Fleet-ops wants to see every driver in a city." — Wildcard subscription by H3 region, not per-driver. Fan-out service publishes per-cell aggregated batches rather than per-ping messages.
- "Protocol evolution — HTTP/3?" — Yes, QUIC eliminates head-of-line blocking and gives faster reconnect. Mobile carriers are increasingly QUIC-capable. Run WS over QUIC where supported with fallback to WSS/TCP.
Reliability and Observability
- SLO: 99.99% connection uptime, p99 write < 100ms, p99 fan-out < 500ms.
- Failure modes:
- Single gateway node failure → ~200K connections drop; clients reconnect with exponential backoff 1s/2s/4s capped at 30s (plus jitter) to avoid thundering herd.
- Kafka partition loss → producer retries cover the gap; consumer offset tracked in Cassandra/Redis so the sink catches up.
- Regional failover → clients resolve a regional DNS name; new connections route to the standby region.
- Deployment: Canary gateway deploys at a single AZ first; connection drain gives existing sessions 60s to migrate via server-initiated reconnect hints.
- Monitoring: connection_count_per_node, ping_rate_per_driver, fanout_latency_p99, kafka_consumer_lag. Alert on gateway CPU > 70% sustained.
- Runbook: If a gateway node is flagged hot, halt new-connection routing to it (health check returns 503) and let load shed gracefully.
Uber-Specific Notes
- Uber's gateway tier is built on a custom Go framework. Mentioning Go or Netty is fine.
- Ringpop is often the answer for "how do you shard a stateful in-process service at Uber."
- uReplicator ships Kafka events across regions for analytics; the hot path stays in-region.
- M3 ingests pings_rate, connection_count, and fanout_latency metrics. Jaeger traces sample 0.1% of connections for debugging.
- Name-drop QUIC / HTTP/3 as a future direction — mobile clients benefit from fast reconnects and built-in congestion control. Most Uber deployments still run WebSocket over TLS 1.3.
- Interviewers sometimes pivot to "how would you do geofence alerts" — answer: same fan-out pipeline, but publisher is the geofence evaluator (Flink job on H3 pings) rather than the dispatcher.
- Close with: "On a gateway failure we drop ~200K connections; clients reconnect with exponential backoff 1s/2s/4s capped at 30s to avoid thundering herd."
Scaling Milestones
- Small fleet (100 drivers): HTTP long-polling + Postgres; good enough for pilot.
- Growing (10K drivers): WebSocket gateway, Redis for last-known, single region.
- Major markets (500K drivers): Kafka backbone, Ringpop sharding, Cassandra history tier.
- Global (6M drivers): Edge-POP TLS, io_uring gateways, regional Kafka, wildcard fleet-ops subscriptions.
Each phase adds capacity without replacing the previous foundations — evolutionary, not revolutionary.
Summary Checklist
- [ ] 100M connections → 500 nodes × 200K each.
- [ ] Edge POP TLS termination.
- [ ] Ringpop-sharded Location Ingress.
- [ ] Kafka-first writes with batched Cassandra sink.
- [ ] Redis hot store for last-known.
- [ ] Fan-out via Kafka + gateway-local subscriber tables.
- [ ] Rate limiting per driver (token bucket).
- [ ] Sequence-number CAS for out-of-order pings.
- [ ] Reconnect backoff + jitter story.
- [ ] Cross-region independence.
Key Numbers to Memorize
| Metric | Value |
|---|---|
| Concurrent clients | 100M |
| Pings/sec (peak) | 24M |
| Pings/sec (steady) | 1–2M |
| Connections/node | 200K |
| Gateway node count | 500 |
| Ping frequency | 4 Hz |
| Write p99 | < 100ms |
| Fan-out p99 | < 500ms |
| History retention hot | 72h |
| Per-ping bandwidth | ~200 B (JSON) / ~40 B (protobuf) |
One-Liner You Should Remember
"500 WebSocket gateways × 200K connections, Ringpop-sharded Location Ingress, Kafka-first persistence with batched Cassandra sink, Redis for last-known. 2M pings/sec steady, p99 write < 100ms, p99 fan-out < 500ms, 100M concurrent clients."