HLD: Uber ETA Service ​

Understanding the Problem ​

What is the ETA Service? ​

Given an origin and destination (or just an origin and a driver), the ETA service returns the estimated travel time. It is a hidden dependency of almost every other Uber service: dispatch scoring, rider fare quotes, driver heatmaps, restaurant delivery windows, and supply forecasting. Interviewers love this problem because it touches graph algorithms, ML systems, caching, and streaming freshness all at once.

Functional Requirements ​

Core (above the line):

  1. Compute ETA between two points over the driving road network.
  2. Incorporate real-time traffic.
  3. Return in < 200ms p99 per call.
  4. Bulk endpoint: compute ETAs from N origins to M destinations (N, M up to ~100) for dispatch candidate scoring.

Below the line (out of scope):

  • Turn-by-turn navigation — separate nav service.
  • Map tile rendering — map tiles service.
  • Historical trip replay — data warehouse.
  • Map ingestion and edits — separate upstream pipeline.

Non-Functional Requirements ​

Core:

  1. Scale — 200K ETA calls/sec globally (dispatch candidate lookups, surge heatmap recomputes, and rider quotes combined).
  2. Latency — p99 < 200ms for point-to-point; < 500ms for bulk (100×100 = 10K entries).
  3. Freshness — traffic incorporation < 60s.
  4. Accuracy — MAPE (Mean Absolute Percentage Error) target < 12% vs actual trip time.

Below the line:

  • Global deep analytics on ETA accuracy — offline pipeline.
  • Replay determinism for A/B experiments — handled by shadow pipeline.

Capacity Estimation ​

Lead with these:

  • Road network: Global graph ~1B edges. Per region, a metro area ~50M edges. At ~32 bytes per edge in a CH-processed adjacency list, a metro graph is ~1.6 GB in RAM.
  • Traffic tiles: H3 res 8 cells (~0.74 km² each, roughly 2 monitored edges per cell); ~500K hot cells globally × 100 B ≈ 50 MB of traffic state, so each region's slice is easily memory-resident.
  • Request volume: 200K QPS × (200 B request + 200 B response) ≈ 80 MB/s of bandwidth, spread across regions and gateway nodes.
  • Cache footprint: 70% hit rate at Redis tier with ~500M keys (H3-rounded pairs) × 40 B = 20 GB, plus pointers — 32 GB cluster with headroom.
  • Routing engine compute: 6K queries/sec per region post-cache × 10 ms per query = ~60 core-seconds/sec = 60 cores continuously busy; provision 100 cores per region for peak.
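
A quick sanity check of the arithmetic above, as a sketch; every constant is an assumption already stated in this list, not a measured value.

typescript
// Back-of-envelope check of the capacity numbers above (all inputs are the stated assumptions).
const metroEdges = 50e6;                 // edges in one metro graph
const bytesPerEdge = 32;                 // CH-processed adjacency entry
const graphGB = (metroEdges * bytesPerEdge) / 1e9;          // ~1.6 GB in RAM

const globalQps = 200_000;
const bytesPerCall = 200 + 200;          // request + response
const bandwidthMBs = (globalQps * bytesPerCall) / 1e6;      // ~80 MB/s

const regionQps = 20_000;                // one large region's share
const cacheHitRate = 0.7;
const engineQps = regionQps * (1 - cacheHitRate);           // ~6K QPS reach the routing engine
const coresBusy = engineQps * 0.010;                        // 10 ms/query -> ~60 cores busy

console.log({ graphGB, bandwidthMBs, engineQps, coresBusy });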

The Set Up ​

Core Entities ​

Entity       | Description
RoadSegment  | segmentId, geometry, baseSpeedKmh, currentSpeedKmh
TrafficTile  | cellId, avgSpeed, congestionLevel, updatedAt
ETAResult    | etaSeconds, distanceMeters, polyline, confidence
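
The same entities expressed as TypeScript interfaces, as a sketch; the field names mirror the table, while the coordinate and polyline encodings are assumptions.

typescript
type H3CellId = string;                       // H3 cell index (res 8 for traffic, res 10 for cache keys)

interface RoadSegment {
  segmentId: string;
  geometry: [number, number][];               // [lat, lng] vertices (assumed encoding)
  baseSpeedKmh: number;                       // free-flow speed
  currentSpeedKmh: number;                    // live speed from the traffic feed
}

interface TrafficTile {
  cellId: H3CellId;
  avgSpeed: number;                           // km/h, aggregated over the 60s window
  congestionLevel: number;                    // 0 (free flow) .. 1 (jammed), assumed scale
  updatedAt: number;                          // epoch millis
}

interface ETAResult {
  etaSeconds: number;
  distanceMeters: number;
  polyline: string;                           // encoded route polyline (assumed)
  confidence: number;                         // 0..1; lower when the ML layer is bypassed
}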

API Design ​

Point-to-point for interactive flows; bulk for dispatch scoring, which needs an N×M matrix in one round trip.

typescript
POST /v1/eta/point-to-point
Body: { origin: LatLng; destination: LatLng; mode: "car" | "walk" }
Response: { etaSeconds: number; distanceMeters: number; confidence: number }

POST /v1/eta/bulk
Body: { origins: LatLng[]; destinations: LatLng[] }
Response: { matrix: number[][] } // etaSeconds[i][j]
protobuf
service ETA {
  rpc GetETA(GetETAReq) returns (ETAResp);
  rpc GetETAMatrix(GetMatrixReq) returns (MatrixResp);
}

message GetETAReq {
  LatLng origin = 1;
  LatLng destination = 2;
  Mode mode = 3;
}

message MatrixResp {
  // flattened row-major matrix for wire efficiency
  repeated float eta_seconds = 1;
  int32 rows = 2;
  int32 cols = 3;
}
cpp
// Contraction-Hierarchies-based query (sketch; graph and CH search are engine-internal).
// Snap origin/destination to the nearest routable graph nodes via the spatial index.
uint32_t srcNode = graph.nearestNode(origin);
uint32_t dstNode = graph.nearestNode(destination);

// Bidirectional upward search over the CH, weighted by the live traffic array.
uint32_t etaSeconds = 0;
bidirectional_dijkstra_ch(graph, srcNode, dstNode,
                          trafficWeights, /*out*/ etaSeconds);

High-Level Design ​

+-------+   +--------------+   +----------------+
| Call  |-->| ETA Gateway  |-->| Routing Engine |
|       |   | (bulk-batch) |   | (Dijkstra/A*/  |
+-------+   +--------------+   |  contraction   |
                ^              |   hierarchies) |
                |              +----------------+
                |                      |
                |                      v
                |              +----------------+
                |              | Road Graph     |
                |              | (memory-mapped |
                |              |  per region)   |
                |              +----------------+
                |                      ^
                |                      |
                |              +----------------+
                |<-------------| Traffic Feed   |
                               | (Flink stream, |
                               |  60s window)   |
                               +----------------+
                                       ^
                                       |
                               Driver GPS pings

End-to-End Flow ​

  1. A caller (dispatch service, pricing, rider app) hits the ETA Gateway with origin + destination, plus a product type for mode-specific routing.
  2. Gateway validates auth (internal service mesh via mTLS + SPIFFE) and applies a per-caller rate limit. Then it checks a Redis cache keyed by (h3Origin, h3Dest, product) at res 10 (~50m).
  3. On a miss, Gateway forwards to the Routing Engine pinned to the region owning the origin. Routing Engine selection is via consistent hashing on the origin's region to keep caches warm per node.
  4. Routing Engine consults a memory-mapped road graph with contraction-hierarchies preprocessing. It snaps origin and destination to nearest graph nodes (nearest-node lookup via spatial index), runs the CH query, overlays live traffic weights from the current mmap'd edge-weight array, and returns ETA + distance + polyline.
  5. Result is written through to Redis with a 30–60s TTL to respect traffic freshness. Gateway returns to caller.
  6. Traffic Feed runs a Flink pipeline consuming driver GPS pings from driver-locations Kafka topic. It produces per-edge speed deltas every 60s, bucketed by H3 cell. Routing Engine nodes subscribe to traffic-updates Kafka topic, maintain a shadow edge-weight array, and atomically swap pointers every minute.
  7. For bulk NxM calls from dispatch, Gateway batches candidates into a single Routing Engine request which uses source contraction to amortize the shortest-path cost across shared forward trees.
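
A minimal sketch of the gateway's read-through cache path from steps 2 and 5, assuming the h3-js v4 latLngToCell call and a generic async cache client; routeViaEngine is a placeholder for the regional Routing Engine RPC.

typescript
import { latLngToCell } from "h3-js";         // h3-js v4; buckets both endpoints to res 10 (~50 m)

interface LatLng { lat: number; lng: number; }
interface ETAResult { etaSeconds: number; distanceMeters: number; }

interface Cache {
  get(key: string): Promise<ETAResult | null>;
  set(key: string, value: ETAResult, ttlSeconds: number): Promise<void>;
}

export async function getEta(
  cache: Cache,
  routeViaEngine: (o: LatLng, d: LatLng, product: string) => Promise<ETAResult>,  // placeholder RPC
  origin: LatLng,
  dest: LatLng,
  product: string,
): Promise<ETAResult> {
  // Step 2: round both endpoints to H3 res 10 so nearby requests share a cache key.
  const key = `eta:${latLngToCell(origin.lat, origin.lng, 10)}:${latLngToCell(dest.lat, dest.lng, 10)}:${product}`;

  const hit = await cache.get(key);
  if (hit) return hit;

  // Steps 3-4: cache miss, so compute on the routing engine pinned to the origin's region.
  const result = await routeViaEngine(origin, dest, product);

  // Step 5: write-through with a short TTL so stale traffic ages out quickly.
  await cache.set(key, result, 30);
  return result;
}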

Why These Components ​

  • CH preprocessing turns expensive Dijkstra queries into near-constant lookups at the cost of longer offline build.
  • Region-local routing keeps graph memory footprint manageable and minimizes cross-region latency.
  • Flink-based traffic updates give us 60s freshness without coupling the routing engine's hot path to a database.

Data Model Detail ​

  • RoadSegment (on-disk in routing engine): Adjacency list keyed by node ID. Each edge stores fromNode, toNode, baseSpeedKmh, lengthM, restrictions (one-way, truck-only), and shortcutInfo from contraction hierarchies.
  • TrafficTile (Redis / in-memory): Key traffic:{cellId}. Value: avgSpeed, congestionLevel, sampleCount, ts. TTL 90s.
  • ETA cache (Redis): Key eta:{h3Origin}:{h3Dest}:{product}. Value: etaSeconds, distanceM, computedAt. TTL 30s.

Capacity Walkthrough ​

  • 200K QPS across regions → 20K QPS per large region. With 70% cache hit rate, ~6K QPS hit the routing engine.
  • Routing engine cluster: 30 nodes per region × 200 QPS each at < 10ms p99 using CH. Comfortable margin for spikes.
  • Redis cache: 10M keys × 60 B = 600 MB per region; 3-node cluster suffices.
  • Road graph per region: 50M edges × 32 B = 1.6 GB on disk, mmap'd into RAM. Each routing node carries the whole graph.

Potential Deep Dives ​

1) How do we route at scale without blowing the latency budget? ​

You have 200ms p99 and 200K QPS. Plain Dijkstra is too slow.

Bad Solution — Plain Dijkstra per query ​

  • Approach: Run Dijkstra on a 50M-edge graph per query, even with priority-queue optimizations.
  • Challenges: 100–500ms per query in a single thread. Does not fit the 200ms p99 budget under load. Memory hot for priority queue allocations.

Good Solution — Bidirectional A* with haversine heuristic ​

  • Approach: A* with a haversine heuristic cuts explored nodes ~10×. Combine with bidirectional search from both origin and destination.
  • Challenges: Brings queries to 30–80ms, which is workable but leaves no headroom for bulk 100×100 queries.

Great Solution — Contraction Hierarchies (CH) precomputed offline ​

  • Approach:
    • CH preprocesses the graph by adding shortcut edges between "important" nodes (those with higher node level).
    • Queries become effectively polylogarithmic in practice, typically < 10ms for cross-metro routes.
    • Precompute per region overnight; traffic updates modify edge weights but do not invalidate the hierarchy (use CH-based time-dependent variants or reweight at query time).
    • Bulk N×M queries amortize further via source-contraction (run one Dijkstra tree from each origin; reuse for all destinations).
  • Challenges: CH preprocess builds take hours and must be scheduled off-peak. Network edits (new roads, closures) require incremental rebuilds or delta layers.
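
A sketch of the CH query shape described above: both searches relax only "upward" edges (including shortcuts), and the answer is the best sum over nodes both searches settle. The upward adjacency layout is an assumption, and a real engine would use a heap and stopping criteria.

typescript
// Bidirectional CH query sketch. `up[v]` / `upReverse[v]` hold only edges (and shortcuts)
// that lead to higher-ranked nodes; that ordering comes from the offline preprocessing.

interface Edge { to: number; weightSeconds: number; }

interface CHGraph {
  up: Edge[][];           // forward upward edges, indexed by node id
  upReverse: Edge[][];    // reverse upward edges, indexed by node id
}

// Dijkstra restricted to an upward graph; returns all settled distances.
function upwardSearch(adj: Edge[][], start: number): Map<number, number> {
  const dist = new Map<number, number>([[start, 0]]);
  const done = new Set<number>();
  while (true) {
    let u = -1;
    let best = Infinity;
    for (const [node, d] of dist) if (!done.has(node) && d < best) { u = node; best = d; }  // O(n) min for brevity
    if (u === -1) return dist;
    done.add(u);
    for (const e of adj[u] ?? []) {
      const nd = best + e.weightSeconds;
      if (nd < (dist.get(e.to) ?? Infinity)) dist.set(e.to, nd);
    }
  }
}

export function chQuery(g: CHGraph, src: number, dst: number): number {
  const fwd = upwardSearch(g.up, src);            // forward search from the origin
  const bwd = upwardSearch(g.upReverse, dst);     // backward search from the destination
  let eta = Infinity;
  for (const [node, dFwd] of fwd) {               // the shortest path meets at some shared node
    const dBwd = bwd.get(node);
    if (dBwd !== undefined) eta = Math.min(eta, dFwd + dBwd);
  }
  return eta;                                      // seconds; Infinity if unreachable
}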

2) What is the caching strategy? ​

Most ETA calls are to common dispatch flows and identical corridor queries. A good cache pays huge dividends.

Bad Solution — Cache by raw lat/lng ​

  • Approach: Cache every (origin, destination) tuple using full-precision floats.
  • Challenges: Cache hit rate ~0% because float precision makes every key unique. Memory bloat with no benefit.

Good Solution — Round coordinates via H3 ​

  • Approach: Round coordinates to H3 res 10 (~50m). (h3Origin, h3Dest) becomes the key. TTL 30–60s.
  • Challenges: Hit rate ~30–40% in dense cities. Better but still pays full price on ~60% of queries.

Great Solution — Tile-based precompute + origin-only caching ​

  • Approach:
    • For common dispatch flows ("nearest 5 drivers to this rider"), the destination is the rider. Cache h3Origin -> etaToRider keyed by rider H3 res 10 with a 60s window.
    • For long-haul ETAs, precompute tile-to-tile ETAs for major corridors at H3 res 7 granularity nightly; patch with real-time traffic multipliers on read.
    • Redis Cluster with 32 shards handles the load.
  • Numbers: Hit rate ~70%. Tail latency < 50ms on hits, < 200ms on misses.
  • Challenges: Nightly precompute blows up for sparse markets — skip those. Stale traffic multipliers can miscalibrate precomputed corridors during incidents; flag and evict when traffic deltas exceed a threshold.
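
A sketch of the read path for precomputed corridors: a nightly tile-to-tile baseline at H3 res 7 patched by a live traffic multiplier, with a fallback to live routing when the multiplier suggests the precomputed path is no longer representative. The store interface and the 1.8 threshold are illustrative assumptions.

typescript
// Patch a nightly tile-to-tile ETA with a live traffic multiplier at read time.

interface CorridorStore {
  baselineEtaSeconds(originRes7: string, destRes7: string): number | undefined;  // nightly precompute
  liveMultiplier(originRes7: string, destRes7: string): number;                  // ~1.0 = free flow
}

const MAX_MULTIPLIER = 1.8;   // beyond this, assume an incident changed the best path entirely

export function corridorEta(store: CorridorStore, origin: string, dest: string): number | undefined {
  const base = store.baselineEtaSeconds(origin, dest);
  if (base === undefined) return undefined;        // sparse market: no precompute, use live routing

  const m = store.liveMultiplier(origin, dest);
  if (m > MAX_MULTIPLIER) return undefined;         // flag/evict and recompute on the routing engine

  return Math.round(base * m);
}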

2.5) How do we handle off-network points (pickup on a campus, inside a parking lot)? ​

A user's GPS sometimes puts them off the road network. You must snap to the nearest routable node.

Good Solution — Nearest graph node via k-d tree ​

  • Approach: Build a k-d tree over graph node coordinates. On each query, snap origin and destination to the nearest routable node.
  • Challenges: Nearest node may be across a barrier (fence, river). ETA then understates walking time to the car.

Great Solution — Nearest-node with access segment heuristic ​

  • Approach:
    • K-d tree + access-segment lookup: if nearest node is an interior node without road access, use the next best one on a public road.
    • Add a fixed-cost "access time" (typically 30–60s) to the ETA to represent walking from curb to vehicle.
    • For airports, use pre-computed PUDO (pickup/drop-off) zones with fixed coordinates.
  • Challenges: Requires map annotations for PUDO zones. Keep a separate "pickup point" service that enhances this over time.
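
A sketch of the snap-plus-access-time heuristic; a linear scan stands in for the k-d tree, the hasRoadAccess flag stands in for the access-segment annotation, and the 45s penalty is an assumed value inside the 30–60s range above.

typescript
// Snap an off-network point to the nearest *routable* node and add a fixed access cost.

interface GraphNode { id: number; lat: number; lng: number; hasRoadAccess: boolean; }

const ACCESS_TIME_SECONDS = 45;   // assumed curb-to-vehicle walking allowance

function haversineMeters(aLat: number, aLng: number, bLat: number, bLng: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(bLat - aLat);
  const dLng = toRad(bLng - aLng);
  const h = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRad(aLat)) * Math.cos(toRad(bLat)) * Math.sin(dLng / 2) ** 2;
  return 2 * 6_371_000 * Math.asin(Math.sqrt(h));
}

export function snapWithAccessTime(nodes: GraphNode[], lat: number, lng: number) {
  let best: GraphNode | undefined;
  let bestDist = Infinity;
  for (const n of nodes) {
    if (!n.hasRoadAccess) continue;                       // skip interior / fenced-off nodes
    const d = haversineMeters(lat, lng, n.lat, n.lng);
    if (d < bestDist) { best = n; bestDist = d; }
  }
  return best
    ? { nodeId: best.id, extraSeconds: ACCESS_TIME_SECONDS }
    : undefined;                                          // no routable node nearby: fall back to PUDO zones
}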

3) How do we propagate traffic updates without lock contention? ​

Routing nodes need fresh edge weights but cannot afford a reader-writer lock on every query.

Bad Solution — Pull every N seconds ​

  • Approach: Routing nodes poll a traffic service.
  • Challenges: Updates hit different servers at different times; two concurrent requests can see different ETAs for the same route. Poll cost adds up.

Good Solution — Multicast push ​

  • Approach: Push updates via a multicast channel; each routing node consumes and updates its in-memory edge weights.
  • Challenges: Requires a lock or copy-on-write scheme to avoid torn reads. Multicast is fragile across cloud networks.

Great Solution — Kafka-streamed deltas with copy-on-write pointer swap ​

  • Approach:
    • Flink streaming job aggregates GPS pings into per-edge speed distributions with a 60s sliding window.
    • Deltas ship via Kafka traffic-updates topic partitioned by region.
    • Routing nodes per region subscribe and mmap-swap an atomic pointer to the updated edge-weight array once per minute (copy-on-write).
    • Reads remain lock-free. Worst-case staleness is 60–90s.
    • For hot corridors, a low-latency side channel (in-memory gRPC push) can update edge weights within 5s.
  • Challenges: Memory-doubled during swap (old + new array live briefly); size regions so one region's graph fits comfortably in half the host RAM.
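
A sketch of the copy-on-write swap from the consumer's point of view: queries always read through the current reference, and each 60s update builds a fresh array and replaces that reference in one step (an atomic pointer store in the C++ engine). The delta format is an assumption.

typescript
// Copy-on-write edge weights: lock-free reads, one reference swap per traffic window.

interface TrafficDelta { edgeId: number; speedKmh: number; }

export class EdgeWeights {
  private current: Float32Array;                          // travel time in seconds, indexed by edgeId

  constructor(private lengthsMeters: Float32Array, baseSpeedsKmh: Float32Array) {
    this.current = EdgeWeights.toSeconds(lengthsMeters, baseSpeedsKmh);
  }

  weightSeconds(edgeId: number): number {                 // hot path: plain read, no locks
    return this.current[edgeId];
  }

  applyDeltas(deltas: TrafficDelta[]): void {             // called once per 60s window by the Kafka consumer
    const next = this.current.slice();                    // old array stays valid for in-flight queries
    for (const d of deltas) {
      next[d.edgeId] = (this.lengthsMeters[d.edgeId] / 1000 / d.speedKmh) * 3600;
    }
    this.current = next;                                  // the "pointer swap"
  }

  private static toSeconds(lengths: Float32Array, speeds: Float32Array): Float32Array {
    const out = new Float32Array(lengths.length);
    for (let i = 0; i < lengths.length; i++) out[i] = (lengths[i] / 1000 / speeds[i]) * 3600;
    return out;
  }
}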

4) How do we optimize the bulk N×M endpoint for dispatch? ​

Dispatch calls GetETAMatrix with up to 100 origins × 100 destinations, at p99 < 500ms. Naive per-pair calls are DOA.

Bad Solution — Loop per pair ​

  • Approach: Call GetETA 10K times.
  • Challenges: Even at 10ms per call, that is 100 seconds sequentially. Parallelizing them saturates the cluster for a single caller.

Good Solution — Parallel fan-out with coalescing ​

  • Approach: Gateway fans out to routing nodes in parallel; each node handles a slice.
  • Challenges: 100 origins × 100 destinations still issues 10K queries. Better but not great.

Great Solution — Many-to-many shortest-path via CH source-contraction ​

  • Approach:
    • For each origin, run a forward upward-search Dijkstra in CH space producing a "search tree" rooted at the origin.
    • For each destination, run a backward upward-search.
    • Intersect trees: shortest-path distance is the min over shared meeting nodes.
    • Reuses one forward tree across all destinations → O(N + M) × CH-query-cost rather than O(N × M).
  • Numbers: 100×100 matrix computed in ~30ms in modern CH libraries.
  • Challenges: Implementation complexity; use a library like RoutingKit or an in-house implementation.
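
A plain-Dijkstra illustration of the amortization idea (one shortest-path tree per origin, read off for all M destinations), deliberately not the full CH bucket algorithm; the adjacency-list representation is assumed.

typescript
// Amortized N x M: N trees instead of N * M point-to-point queries.

interface Edge { to: number; weightSeconds: number; }

function dijkstraTree(adj: Edge[][], src: number): Float64Array {
  const dist = new Float64Array(adj.length).fill(Infinity);
  const done: boolean[] = new Array(adj.length).fill(false);
  dist[src] = 0;
  for (let i = 0; i < adj.length; i++) {
    let u = -1;
    let best = Infinity;
    for (let v = 0; v < adj.length; v++) if (!done[v] && dist[v] < best) { u = v; best = dist[v]; }  // O(n) min for brevity
    if (u === -1) break;
    done[u] = true;
    for (const e of adj[u]) {
      if (best + e.weightSeconds < dist[e.to]) dist[e.to] = best + e.weightSeconds;
    }
  }
  return dist;
}

export function etaMatrix(adj: Edge[][], origins: number[], destinations: number[]): number[][] {
  return origins.map((o) => {
    const tree = dijkstraTree(adj, o);            // one tree, reused across every destination
    return destinations.map((d) => tree[d]);
  });
}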

4.5) How do we handle time-dependent routing (rush hour matters)? ​

Traffic patterns vary sharply by time of day. A static edge weight cannot capture this.

Good Solution — Separate graphs per time bucket ​

  • Approach: Precompute 4–6 graphs (morning rush, midday, evening rush, night). Route uses the appropriate one.
  • Challenges: Discrete buckets introduce jumps at bucket boundaries; doesn't capture day-of-week effects.

Great Solution — Time-dependent weights + predicted speeds ​

  • Approach:
    • Store historical speed profile per segment as a 168-slot array (hour-of-week).
    • At query time, ETA computation integrates over time — use current time to index into the profile, advance time as you move along the route.
    • Blend with real-time traffic: weight = alpha * historical + (1 - alpha) * realtime, alpha learned per region.
    • Include special-event overrides (parades, concerts) via a calendar service.
  • Challenges: Compute cost per query rises; offset via CH that supports time-dependent weights (TCH). For cost control, apply time-dependence only to long-haul routes (> 10 min).
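
A sketch of the blended, time-advancing weight described above: a 168-slot hour-of-week profile indexed by the time you reach each segment, mixed with the live speed when available. The slot convention, field names, and default alpha are assumptions.

typescript
// Time-dependent ETA for one route: blend historical hour-of-week speed with live traffic,
// advancing the clock as each segment is traversed.

interface Segment {
  lengthMeters: number;
  hourOfWeekSpeedKmh: number[];     // 168 slots; slot 0 = Monday 00:00 UTC (assumed convention)
  liveSpeedKmh?: number;            // absent when no recent probes cover this edge
}

function segmentSeconds(seg: Segment, departureMs: number, alpha: number): number {
  const t = new Date(departureMs);
  const hourOfWeek = ((t.getUTCDay() + 6) % 7) * 24 + t.getUTCHours();   // Monday-based index
  const historical = seg.hourOfWeekSpeedKmh[hourOfWeek];
  const speed = seg.liveSpeedKmh !== undefined
    ? alpha * historical + (1 - alpha) * seg.liveSpeedKmh                // weight = alpha*hist + (1-alpha)*live
    : historical;                                                        // fall back to the profile
  return (seg.lengthMeters / 1000 / speed) * 3600;
}

export function routeEtaSeconds(route: Segment[], departureMs: number, alpha = 0.4): number {
  let clockMs = departureMs;
  let total = 0;
  for (const seg of route) {
    const s = segmentSeconds(seg, clockMs, alpha);   // index the profile at the time this segment is reached
    total += s;
    clockMs += s * 1000;                             // advance time along the route
  }
  return total;
}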

5) How do we improve accuracy beyond the graph? ​

Graph shortest-path with live traffic is a great baseline, but you can do better with ML.

Good Solution — Post-hoc multiplier ​

  • Approach: Multiply graph ETA by a static correction factor per region, derived offline from historical actuals.
  • Challenges: Ignores time-of-day, weather, and driver skill. Cap on accuracy gains.

Great Solution — Two-stage ML correction ("uETA") ​

  • Approach:
    • Graph-based routing produces a baseline ETA and path features (segment speeds, turns, signals).
    • A gradient-boosted tree takes baseline + real-time features (weather, event flags, recent actuals, time of day, driver type) and outputs a corrected ETA.
    • Online feature pipeline ships features via Kafka; model serving is a lightweight C++ library embedded in Routing Engine.
    • A/B framework compares MAPE across model versions.
  • Numbers: Uber has publicly reported single-digit percent MAPE improvements stacking this layer on top of shortest-path.
  • Challenges: Model drift on new markets; rebuild feature pipelines when introducing new products. Added p99 from model inference must stay under 30ms to preserve the 200ms budget.
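
A sketch of the two-stage shape: the graph baseline plus request-time features feed a correction model, with a hard bypass when inference is slow or unavailable so the 200ms budget holds. The feature list, CorrectionModel interface, and 30ms budget are illustrative assumptions.

typescript
// Two-stage ETA: graph baseline -> ML correction, with a latency-budgeted bypass.

interface BaselineEta { etaSeconds: number; distanceMeters: number; pathFeatures: number[]; }

interface CorrectionModel {
  predict(features: number[]): Promise<number>;        // e.g. a GBT scorer; returns corrected ETA in seconds
}

const ML_BUDGET_MS = 30;                               // inference must stay inside this slice of the 200ms budget

export async function correctedEta(
  baseline: BaselineEta,
  realtimeFeatures: number[],                          // weather, event flags, recent actuals, time of day, ...
  model?: CorrectionModel,
): Promise<{ etaSeconds: number; confidence: number }> {
  if (!model) return { etaSeconds: baseline.etaSeconds, confidence: 0.6 };     // graph-only fallback

  const budget = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error("ml-budget-exceeded")), ML_BUDGET_MS));

  try {
    const corrected = await Promise.race([
      model.predict([
        baseline.etaSeconds,
        baseline.distanceMeters,
        ...baseline.pathFeatures,
        ...realtimeFeatures,
      ]),
      budget,
    ]);
    return { etaSeconds: corrected, confidence: 0.9 };
  } catch {
    return { etaSeconds: baseline.etaSeconds, confidence: 0.6 };               // bypass on slow or failed inference
  }
}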

Rapid-Fire Q&A Anticipations ​

  • "What's the fallback if the ML model is slow?" Bypass and return graph-only ETA; confidence field in response indicates absence of ML layer.
  • "How do you handle driver diversions (they deviate from the route)?" Recompute ETA from new origin; cache invalidation on path deviation.
  • "What's the update rule for a running trip's ETA?" Recompute every 30s; client smooths displayed value.
  • "Cache consistency across regions?" Each region has its own cache; no cross-region consistency needed since most ETAs are intra-region.
  • "What happens for a brand-new road added yesterday?" The rebuild window picks it up (nightly); until then, routes using the new road fall back to adjacent roads — acceptable for most use cases.

Alternatives Considered ​

  • OSRM vs in-house: OSRM is open source and solid, but Uber's scale + ML layering motivated building/forking internal tooling.
  • CH vs Hub Labels: Hub Labels give even faster queries but larger preprocess size. CH is the standard.
  • Graph-only vs ML-corrected: Graph-only is simpler; ML layering is needed to hit sub-12% MAPE.
  • Client-side routing vs server-side: Client can't see real-time traffic across the network. Server-side is mandatory.
  • Separate nav and dispatch ETAs vs unified: Uber runs them as separate services with different SLOs. Navigation wants every 30s refresh; dispatch wants point-to-point in bulk. Unified API but distinct tuning.

Frequently Asked Follow-ups ​

  • "How do you update the graph when a road closes?" — Edits are applied to the live edge-weight array atomically. Structural edits (new edges) wait for the next preprocessing rebuild.
  • "What's the failure mode if traffic feed stalls?" — Engine falls back to historical speeds keyed by time-of-day. Accuracy drops ~5% but service stays up.
  • "Do we handle walking ETAs?" — Different mode in the request; uses a pedestrian-oriented graph. Often runs on a smaller, cheaper routing engine pool.
  • "How do we prevent a rider from seeing a jumping ETA?" — Client-side smoothing: new ETA replaces old only if delta > threshold. Server returns ETAs with small EMA applied.
  • "Cold region spinup?" — Precompute takes hours. For new cities, run a background job; service boots in read-only mode until graph is ready.

Visual Aids to Draw ​

  • Road graph sketch with a few nodes, edges labeled with baseline speed and live multiplier.
  • CH shortcut overlay showing how a long-distance query jumps via shortcut edges instead of individual street segments.
  • Traffic pipeline from driver pings → Flink windowing → edge-weight deltas → routing engine atomic swap.
  • Cache tier diagram with Redis res-10 cache in front of Routing Engine, with hit rate annotation.
  • MAPE chart showing graph-only vs graph + ML correction error curves — evidence for the second stage.

What's Expected at Each Level ​

Mid-level (L4) ​

  • Dijkstra / A* with a spatial index. Redis cache.
  • Understands graph data model and cache TTL tradeoffs.
  • Misses CH-level preprocessing and traffic propagation design.

Senior (L5 / L5A) ​

  • Mentions contraction hierarchies or similar preprocessing.
  • Discusses freshness vs cache hit rate tradeoff.
  • Bulk-endpoint optimization (reuse Dijkstra tree / source contraction).
  • Back-of-envelope on graph memory, cache sizing, and per-region sharding.

Staff+ (L6) ​

  • ML-based ETA refinement with training/serving split, feature store, and A/B framework.
  • Accuracy metrics (MAPE, p90 absolute error) and how they roll up.
  • Multi-region replication story; how a failover region's graph stays reasonably fresh.
  • Cost/performance: when to push a route to the ML layer vs stay on the graph-only fast path.

Common Pitfalls ​

  • Using Dijkstra without acknowledging CH. At 200K QPS on a 50M-edge graph, you must precompute.
  • Caching on raw floats. Always bucket to H3 first; otherwise your cache hit rate is zero.
  • Ignoring traffic staleness. ETAs without fresh traffic are useless in dense cities at rush hour. State the 60s freshness target explicitly.
  • Forgetting bulk NxM. Dispatch always calls ETA in bulk; optimize for it.
  • Single global graph. Regionalize the graph for memory + latency.

Walkthrough: Interview Dialogue Example ​

Interviewer: "A dispatch request needs ETAs for 5 candidates. Walk through."

You should answer:

  1. Dispatch calls GetETAMatrix({origins=[rider_pickup], destinations=[5 driver locations]}).
  2. Gateway cache lookup: (h3_pickup, h3_driver_i) × 5. Hit rate ~60%; 2 miss, 3 hit.
  3. For misses, Gateway calls regional Routing Engine with a single batched request. Routing Engine does one forward CH tree from pickup, then 5 backward trees from drivers — O(N+M) rather than O(N*M).
  4. Routing Engine overlays live traffic weights from mmap'd edge-weight array (atomically swapped every 60s).
  5. Results return in ~15ms to Gateway; cached with a 30s TTL.
  6. Gateway returns matrix to Dispatch with ETAs.

Total: ~30ms end-to-end, well inside the 500ms bulk budget.

What If They Pivot Mid-Interview? ​

  • "Now design the full navigation service." — Explain that nav is a superset: it reuses the same road graph and traffic feed, but adds turn-by-turn instructions, re-routing on deviation, and voice synthesis. p99 budget loosens to 500ms. Shares caching infra with ETA.
  • "What about food delivery ETAs?" — Adds restaurant prep time prediction (separate ML model), courier mode (driving + walking + biking), and multi-leg routing (courier to restaurant, restaurant to customer). Reuses the graph; adds mode selection.
  • "Could we serve ETAs from a CDN?" — Tile-level precomputed ETAs yes; point-to-point no. Traffic freshness and specific origin/destination granularity are incompatible with CDN caching.
  • "Explain the cost." — Routing engine nodes are the dominant cost: ~50 nodes per region × ~50 regions globally × $0.50/hr = $18K/month. Plus Redis cache + traffic pipeline. Well-justified by the business value.

Reliability and Observability ​

  • SLO: 99.99% availability, p99 < 200ms point-to-point, p99 < 500ms bulk.
  • Failure modes:
    • Routing engine OOM due to graph swap → size regions so the graph fits comfortably within half of available RAM; use memory-mapped files to let the kernel manage paging.
    • Traffic feed stalled → fall back to historical time-of-day speeds; degrade accuracy gracefully rather than return errors.
    • ML model service down → bypass the correction layer; graph-only ETA is still serviceable.
  • Deployment: Blue-green on graph builds; both color groups running with the previous snapshot while the new one primes caches.
  • Monitoring: Per-region MAPE tracked nightly against actual trip times. Alert when MAPE > 15% for 24h, or when p99 latency > 300ms for 10 minutes.
  • Runbook: On routing-engine hotspot, reshuffle H3 cell ownership across nodes; on cache overload, temporarily lower TTL and accept reduced hit rate.

Uber-Specific Notes ​

  • Uber uses OSM + internal map corrections. ETA is often a two-stage model: graph-based prediction + ML correction layer ("uETA").
  • Map data pipelines are large; don't go deep on map ingestion unless asked.
  • H3 cells anchor both caching and traffic aggregation.
  • Jaeger traces every ETA call; instrument the routing engine with span events for graph lookup, traffic overlay, and ML inference.
  • For a stretch answer, bring up cross-mode ETA (car + walk for pickup, transit for UberX Share) as a graph-of-graphs design.
  • Mention that Uber Eats shares this ETA infrastructure but layers in restaurant prep time prediction on top — handy if the interviewer pivots to Eats.
  • When in doubt on graph preprocessing, Customizable Contraction Hierarchies (CCH) is the more modern variant that supports fast edge-weight updates; mention it as a stretch.

Scaling Milestones ​

  • Single region, small graph: Dijkstra on Postgres+PostGIS. OK for 100 QPS.
  • Regional dispatch: A* with haversine, in-memory graph. Up to 1K QPS.
  • High traffic market: Contraction Hierarchies + Redis H3 caching. 10K QPS per region.
  • Global: Per-region CH graphs, Flink traffic pipelines, ML correction layer, bulk source-contraction endpoint. 200K QPS globally.

Each jump solved a specific pain point: Dijkstra latency, traffic staleness, bulk cost, accuracy.

Summary Checklist ​

  • [ ] p99 200ms budget articulated.
  • [ ] CH (or similar) for routing, not plain Dijkstra.
  • [ ] H3-based caching with tile precompute.
  • [ ] Flink traffic feed with 60s windows.
  • [ ] Bulk NxM endpoint with source contraction.
  • [ ] ML correction layer (uETA) for accuracy.
  • [ ] Region-local graphs, mmap'd.
  • [ ] Fallback to historical speeds on traffic outage.

Key Numbers to Memorize ​

Metric                     | Value
ETA QPS (global)           | 200K
Point-to-point p99         | < 200ms
Bulk (100x100) p99         | < 500ms
Road graph edges (global)  | 1B
Road graph edges (metro)   | 50M
Traffic freshness          | < 60s
Target MAPE                | < 12%
Cache hit rate target      | ~70%
CH query cost              | < 10ms

One-Liner You Should Remember ​

"Contraction Hierarchies over region-local road graphs for sub-10ms queries; H3-tile cache for 70% hit rate; Flink traffic feed with atomic edge-weight swap every 60s; bulk NxM via source contraction; ML correction layer to hit sub-12% MAPE. 200K QPS, p99 < 200ms point-to-point."
