08 - HLD & System Design (Food Delivery)
Cross-Reference
For foundational system design concepts (CAP theorem, SQL vs NoSQL, indexing, sharding, caching, message queues, load balancing, ACID), see paytm-prep/notes/04-hld-system-design.md. This file focuses on food-delivery-specific system designs relevant to Temple (ex-Zomato founding team).
Quick Reference (scan in 5 min)
| System | Key Components | Key Patterns | Scale Challenges |
|---|---|---|---|
| Notification System | Kafka, Orchestrator, Channel Workers (push/SMS/email), Retry Queue, DLQ | Fan-out per channel, Exponential backoff, Rate limiting per user | Millions of concurrent sends during promos, priority ordering, delivery guarantees |
| Real-Time Updates | WebSocket Gateway, Redis Pub/Sub, Location Service, GPS ingestion pipeline | Geohash bucketing, Sticky sessions, Heartbeat + reconnect + polling fallback | High-frequency GPS writes, fan-out to watchers, horizontal WebSocket scaling |
| Food Delivery (Full) | User/Restaurant/Order/Delivery/Payment/Search/Notification services, API Gateway | Database-per-service, Saga for orders, CQRS for search, Queue-based load leveling | Peak hour auto-scaling, geo-partitioned orders, ETA estimation accuracy |
Design 1: Notification System at Scale
Requirements
- Multi-channel: push notifications, SMS, and email
- Volume: millions of users; promotional blasts during peak hours (lunch/dinner)
- Priority handling: order updates (high) vs promotional (low)
- Reliability: retry on transient failure, dead-letter for permanent failure
- User respect: rate limiting to prevent notification fatigue
Architecture

+----------------+   +----------------+   +----------------+
| Order Service  |   | Promo Service  |   | Delivery Svc   |
+-------+--------+   +-------+--------+   +-------+--------+
        |                    |                    |
        v                    v                    v
+------------------------------------------------------------+
|                        Kafka Cluster                        |
|       (partitioned by user_id for per-user ordering)        |
|                                                              |
|       Topics: notification.high | notification.low          |
+------------------------------+-------------------------------+
                               |
                               v
+------------------------------------------------------------+
|                  Notification Orchestrator                  |
|                                                              |
|  1. Read event from Kafka                                    |
|  2. Resolve user preferences (opt-ins, channels)             |
|  3. Check rate limit (Redis counter per user)                |
|  4. Determine channels + priority                            |
|  5. Dispatch to channel-specific queues                      |
+--------+-------------------+-------------------+-------------+
         |                   |                   |
         v                   v                   v
   +------------+     +------------+     +------------+
   |    Push    |     |    SMS     |     |   Email    |
   |   Workers  |     |   Workers  |     |   Workers  |
   | (FCM/APNs) |     |  (Twilio)  |     |  (SES/SG)  |
   +-----+------+     +-----+------+     +-----+------+
         |                  |                  |
         +----------- on failure --------------+
                            |
                            v
+------------------------------------------------------------+
|                     Retry Queue (Kafka)                     |
|         Exponential backoff: 1s -> 2s -> 4s -> 8s           |
|         Max retries: 3 (push), 2 (SMS), 3 (email)           |
+------------------------------+-------------------------------+
                               | after max retries
                               v
+------------------------------------------------------------+
|                   Dead Letter Queue (DLQ)                   |
|         Permanent failures logged for manual review         |
|             Alert on DLQ depth > threshold                  |
+------------------------------------------------------------+

Notification Log (Audit Database)
Every notification attempt is logged for audit, debugging, and analytics.
Table: notification_log
-------------------------------------------------
id             UUID PK
user_id        UUID FK -> users (indexed)
event_type     VARCHAR(50)  -- 'order_update', 'promo', 'delivery_status'
channel        VARCHAR(20)  -- 'push', 'sms', 'email'
priority       VARCHAR(10)  -- 'high', 'low'
status         VARCHAR(20)  -- 'sent', 'failed', 'dlq'
payload        JSONB        -- full notification content
attempt_count  INT
created_at     TIMESTAMP (indexed)
updated_at     TIMESTAMP

Key Design Decisions
Why Kafka (not RabbitMQ)?
- Partitioning by user_id guarantees per-user message ordering. A user always sees "order confirmed" before "order picked up." (See the producer sketch below.)
- High throughput for promotional blasts (millions of messages during dinner push).
- Consumer groups allow independent scaling of the orchestrator.
- Replay capability: if a bug in the orchestrator misprocesses events, rewind the offset and reprocess.
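A minimal producer-side sketch of the first two points, assuming kafkajs; the topic names and the NotificationEvent shape are illustrative, not a fixed contract:

import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "order-service", brokers: ["kafka-1:9092"] });
const producer = kafka.producer(); // producer.connect() assumed to run at service startup

interface NotificationEvent {
  userId: string;
  type: "order_update" | "promo" | "delivery_status";
  payload: Record<string, unknown>;
}

// Keying by userId routes all of a user's events to the same partition, which is
// what gives per-user ordering; the event type selects the priority topic.
async function publishNotification(event: NotificationEvent): Promise<void> {
  const topic = event.type === "promo" ? "notification.low" : "notification.high";
  await producer.send({
    topic,
    messages: [{ key: event.userId, value: JSON.stringify(event) }],
  });
}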
Why separate worker pools per channel?
- Different latency SLAs: push is expected in < 1 second, email can tolerate 30 seconds.
- Different failure modes: FCM may rate-limit you, Twilio may have regional outages, SES has sending quotas.
- Independent scaling: during a promo blast, email workers scale 10x while push workers stay steady.
- Isolating failures: an SMS provider outage does not back-pressure push delivery.
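Tying the worker pools back to the retry queue and DLQ in the diagram above, a rough per-channel handler might look like this. It is a sketch only: sendPush, the retry/DLQ topic names, and the delayed-redelivery mechanism are assumptions (Kafka has no native delayed messages, so the retry consumer is assumed to honor the delay header).

import { Kafka } from "kafkajs";

const producer = new Kafka({ clientId: "push-worker", brokers: ["kafka-1:9092"] }).producer();

// FCM/APNs call, assumed to be implemented elsewhere in the worker
declare function sendPush(userId: string, payload: Record<string, unknown>): Promise<void>;

const MAX_PUSH_RETRIES = 3;

interface QueuedNotification {
  userId: string;
  payload: Record<string, unknown>;
  attempt: number; // 0 on the first delivery attempt
}

async function handlePushMessage(msg: QueuedNotification): Promise<void> {
  try {
    await sendPush(msg.userId, msg.payload);
  } catch (err) {
    if (msg.attempt + 1 >= MAX_PUSH_RETRIES) {
      // Permanent failure: park it in the DLQ for manual review and alerting
      await producer.send({
        topic: "notification.push.dlq",
        messages: [{ key: msg.userId, value: JSON.stringify({ ...msg, error: String(err) }) }],
      });
      return;
    }
    // Transient failure: requeue with exponential backoff (1s, 2s, 4s, ...)
    const delayMs = 1000 * 2 ** msg.attempt;
    await producer.send({
      topic: "notification.push.retry",
      messages: [{
        key: msg.userId,
        value: JSON.stringify({ ...msg, attempt: msg.attempt + 1 }),
        // the retry-topic consumer is assumed to wait this long before reprocessing
        headers: { "retry-after-ms": String(delayMs) },
      }],
    });
  }
}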
Why separate Kafka topics for priority?
- High-priority consumers (order updates) get dedicated resources and are never starved by a promo flood.
- Low-priority consumers can be throttled or paused during peak order load.
Rate Limiting per User
// Redis-based sliding window rate limiter per user per channel
async function canSendNotification(
userId: string,
channel: "push" | "sms" | "email"
): Promise<boolean> {
const key = `ratelimit:notif:${channel}:${userId}`;
const now = Date.now();
const windowMs = 3600_000; // 1-hour window
const limits: Record<string, number> = {
push: 10, // max 10 push notifications per hour
sms: 3, // max 3 SMS per hour (cost + annoyance)
email: 5, // max 5 emails per hour
};
// Redis sorted set: score = timestamp, member = unique event id
const pipeline = redis.pipeline();
pipeline.zremrangebyscore(key, 0, now - windowMs); // prune old entries
pipeline.zcard(key); // count in window
  const results = await pipeline.exec(); // ioredis: array of [error, result] per queued command
  const count = Number(results?.[1]?.[1] ?? 0); // zcard result
  // Note: the send path must also ZADD the event id (score = now) and PEXPIRE the key,
  // otherwise the window never fills and the limiter never triggers.
  return count < limits[channel];
}

Design 2: Real-Time Updates System
Use Case
Live order tracking on the customer app: the map shows the delivery driver's location updating every few seconds, much like Zomato's live tracking. It is also used for:
- "Your order is being prepared" status updates
- Estimated time of arrival countdown
- Driver en-route path visualization
Architecture

+----------------+                      +---------------------------+
|  Driver App    |                      |       Customer App        |
|  (GPS every    |                      |     (shows live map)      |
|   3-5 sec)     |                      +-------------^-------------+
+-------+--------+                                    |
        |                                             |
        | HTTP POST /location                         | WebSocket (wss://)
        v                                             |
+----------------+                      +-------------+-------------+
|  API Gateway   |                      |     WebSocket Gateway     |
|  (auth, rate   |                      |   (sticky sessions via    |
|   limit)       |                      |    IP hash or conn ID)    |
+-------+--------+                      +-------------^-------------+
        |                                             |
        v                                             | subscribe to
+----------------+            +-----------------------+-------------+
|   Location     | --write--> |                Redis                |
|   Service      |            |                                     |
|                | -publish-> |  Pub/Sub channels:                  |
|                |            |    location:{order_id}              |
|                |            |                                     |
|                |            |  Key-value store:                   |
|                |            |    driver:{driver_id} ->            |
|                |            |      {lat, lng, ts, heading}        |
|                |            |    TTL: 60s                         |
+----------------+            +-------------------------------------+
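The write-and-publish step in the middle of this diagram is small; a minimal sketch assuming ioredis, with key and channel names taken from the diagram:

import Redis from "ioredis";

const redis = new Redis();          // key-value writes
const redisPublisher = new Redis(); // dedicated connection for publishing

interface DriverLocation {
  lat: number;
  lng: number;
  heading: number;
  ts: number;
}

// Called for every GPS ping from the driver app (every 3-5 seconds)
async function ingestLocation(driverId: string, orderId: string, loc: DriverLocation): Promise<void> {
  // Latest-known position, expiring if the driver goes silent for 60s
  await redis.set(`driver:${driverId}`, JSON.stringify(loc), "EX", 60);
  // Fan out to whoever is watching this order
  await redisPublisher.publish(`location:${orderId}`, JSON.stringify(loc));
}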
Fan-out flow:
1. Driver app POSTs GPS coordinates every 3-5 seconds
2. Location Service writes to Redis (key: driver:{id}, TTL 60s)
3. Location Service publishes to Redis Pub/Sub channel location:{order_id}
4. WebSocket Gateway instances subscribe to relevant channels
5. Gateway pushes the update to the connected customer via WebSocket

Scaling WebSockets Across Multiple Instances
The challenge: a customer's WebSocket connects to Instance A, but the location update arrives at Instance B.
+---------------+   +---------------+   +---------------+
|  WS Gateway   |   |  WS Gateway   |   |  WS Gateway   |
|  Instance A   |   |  Instance B   |   |  Instance C   |
| (1000 conns)  |   | (1200 conns)  |   |  (950 conns)  |
+-------+-------+   +-------+-------+   +-------+-------+
        |                   |                   |
        +-------------------+-------------------+
                            |
                    +-------+-------+
                    |     Redis     |
                    |    Pub/Sub    |
                    |               |
                    | Each instance |
                    | subscribes to |
                    | the channels  |
                    | for its own   |
                    | connected     |
                    | orders        |
                    +---------------+

Solution: every WS Gateway instance subscribes to the Redis Pub/Sub channels
for the orders whose customers are connected to that instance. When a location
update is published, every instance subscribed to that channel receives it,
and each forwards the update only over the WebSocket connections it holds.

Connection Management
// Server-side WebSocket connection lifecycle
// (assumes the "ws" package and a dedicated ioredis connection used only for SUBSCRIBE)
import WebSocket from "ws";
import Redis from "ioredis";

const redisSubscriber = new Redis();
interface TrackingConnection {
orderId: string;
userId: string;
ws: WebSocket;
lastHeartbeat: number;
}
const connections = new Map<string, TrackingConnection>();
function handleConnection(ws: WebSocket, orderId: string, userId: string) {
const conn: TrackingConnection = {
orderId,
userId,
ws,
lastHeartbeat: Date.now(),
};
connections.set(userId, conn);
// Subscribe this instance to the order's location channel
redisSubscriber.subscribe(`location:${orderId}`);
// Heartbeat: client must send ping every 30s
  ws.on("message", (msg) => {
    if (msg.toString() === "ping") {
conn.lastHeartbeat = Date.now();
ws.send("pong");
}
});
ws.on("close", () => {
connections.delete(userId);
// Unsubscribe if no other connections care about this order
if (!hasOtherSubscribers(orderId)) {
redisSubscriber.unsubscribe(`location:${orderId}`);
}
});
}
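// Forwarding path (sketch): each gateway instance receives Pub/Sub messages only
// for the channels it subscribed to above, then relays the update over the local
// WebSocket connections watching that order. Assumes one tracked order per
// connected user, matching the connections map above.
redisSubscriber.on("message", (channel, message) => {
  const orderId = channel.replace("location:", "");
  for (const conn of connections.values()) {
    if (conn.orderId === orderId && conn.ws.readyState === WebSocket.OPEN) {
      conn.ws.send(message);
    }
  }
});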
// Stale connection reaper: runs every 60s
setInterval(() => {
const now = Date.now();
for (const [userId, conn] of connections) {
if (now - conn.lastHeartbeat > 90_000) {
      // No heartbeat in 90s: consider the connection dead
conn.ws.terminate();
connections.delete(userId);
}
}
}, 60_000);

Client-Side Reconnection with Fallback
class OrderTracker {
private ws: WebSocket | null = null;
private reconnectAttempts = 0;
private maxReconnectAttempts = 5;
  private pollingInterval: ReturnType<typeof setInterval> | null = null;
  private heartbeatInterval: ReturnType<typeof setInterval> | null = null;
connect(orderId: string) {
const url = `wss://tracking.temple.app/ws/orders/${orderId}`;
this.ws = new WebSocket(url);
this.ws.onopen = () => {
this.reconnectAttempts = 0;
this.stopPolling();
this.startHeartbeat();
};
this.ws.onmessage = (event) => {
const update = JSON.parse(event.data);
this.onLocationUpdate(update); // update map marker
};
    this.ws.onclose = () => {
      this.stopHeartbeat();
      if (this.reconnectAttempts < this.maxReconnectAttempts) {
// Exponential backoff: 1s, 2s, 4s, 8s, 16s
const delay = Math.pow(2, this.reconnectAttempts) * 1000;
this.reconnectAttempts++;
setTimeout(() => this.connect(orderId), delay);
} else {
// Fallback to HTTP polling every 5s
this.startPolling(orderId);
}
};
}
private startPolling(orderId: string) {
this.pollingInterval = setInterval(async () => {
const res = await fetch(`/api/orders/${orderId}/location`);
const update = await res.json();
this.onLocationUpdate(update);
}, 5_000);
}
private stopPolling() {
if (this.pollingInterval) {
clearInterval(this.pollingInterval);
this.pollingInterval = null;
}
}
  private startHeartbeat() {
    this.heartbeatInterval = setInterval(() => {
      if (this.ws?.readyState === WebSocket.OPEN) {
        this.ws.send("ping");
      }
    }, 30_000);
  }

  private stopHeartbeat() {
    if (this.heartbeatInterval) {
      clearInterval(this.heartbeatInterval);
      this.heartbeatInterval = null;
    }
  }
private onLocationUpdate(update: {
lat: number;
lng: number;
heading: number;
eta: number;
}) {
    // Render on the map; implementation depends on the map library
}
}

Key Metrics
| Metric | Target | Why It Matters |
|---|---|---|
| GPS update frequency | Every 3-5 seconds | Smooth map animation without excessive bandwidth |
| WebSocket message latency | < 200ms end-to-end | User perceives real-time movement |
| Fan-out ratio | 1 driver update → 1-3 watchers | Low for delivery (usually 1 customer + maybe 1 support agent) |
| Connection density per instance | ~10,000 concurrent WS | Memory-bound; each connection holds minimal state |
| Reconnection success rate | > 99% within 3 attempts | Users should rarely fall back to polling |
| Redis Pub/Sub message throughput | ~100K messages/sec | Handles all active deliveries in a city during peak |
Design 3: Food Delivery System (Full)
This is the comprehensive end-to-end design. In an interview, you would not draw all of this; you would focus on 2-3 services and their interactions. But knowing the full picture lets you zoom into any part confidently.
Microservices Architecture

Customer App (React Native)
        |
        v
API Gateway (Kong / Nginx)
  - Auth (JWT)
  - Rate limiting
  - Routing
  - Request ID
        |
        +--> User Service       (Register, Login, Profile, Addresses)     -> PostgreSQL (users)
        +--> Restaurant Service (Menu CRUD, Hours, Ratings, Availability) -> PostgreSQL (menus, restaurants)
        +--> Order Service      (Place, Status, Cancel, History)          -> PostgreSQL (orders)
        +--> Delivery Service   (Assign, Track, ETA, Route)               -> Redis (driver locations)
        +--> Payment Service    (Charge, Refund, Wallet, Idempotency)     -> PostgreSQL (transactions, ledger)
                 |
                 v  (all services publish domain events)
Kafka Event Bus
  Topics:
    order.created     payment.completed
    order.confirmed   delivery.assigned
    order.cancelled   delivery.picked_up
    order.delivered   driver.location
        |
        +--> Search Service       (Elasticsearch)
        +--> Notification Service (Push / SMS / Email)
        +--> Analytics Service    (ClickHouse / Data Lake)

Database Choices
| Service | Database | Why |
|---|---|---|
| User Service | PostgreSQL | ACID for profile/address data, relational joins for preferences |
| Restaurant Service | PostgreSQL + Elasticsearch | PostgreSQL for source-of-truth menu/restaurant data; Elasticsearch synced via Kafka for full-text search + geo queries |
| Order Service | PostgreSQL (partitioned by city) | ACID for order state machine, partitioning isolates city-level failures |
| Delivery Service | Redis + PostgreSQL | Redis for real-time driver locations (key-value with TTL); PostgreSQL for assignment history and driver profiles |
| Payment Service | PostgreSQL | ACID is non-negotiable for money; append-only ledger pattern |
| Search Service | Elasticsearch | Geo-distance queries, fuzzy text matching, faceted filters (cuisine, rating, price) |
| Notification Service | Kafka + PostgreSQL | Kafka for reliable delivery pipeline; PostgreSQL for notification log/audit |
| Analytics | ClickHouse or BigQuery | Columnar storage for fast aggregations across millions of orders |
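Two of the patterns named in the Payment Service row, idempotency and an append-only ledger, are worth being able to sketch. A rough illustration using node-postgres; the table names, the unique constraint on order_id, and the paymentGateway client are all assumptions:

import { Pool } from "pg";

const payments = new Pool(); // connection settings via env (assumed)

// External gateway client (Razorpay/Stripe wrapper) - assumed; returns a reference id
declare const paymentGateway: {
  charge(userId: string, amountPaise: number): Promise<string>;
};

// Idempotent charge keyed by order_id: a retried event for the same order hits the
// unique constraint and is skipped instead of double-charging.
async function chargeOrder(orderId: string, userId: string, amountPaise: number) {
  const attempt = await payments.query(
    `INSERT INTO payment_attempts (order_id, user_id, amount_paise, status)
     VALUES ($1, $2, $3, 'pending')
     ON CONFLICT (order_id) DO NOTHING
     RETURNING id`,
    [orderId, userId, amountPaise]
  );
  if (attempt.rowCount === 0) {
    return { alreadyProcessed: true }; // another worker already handled this order
  }

  const gatewayRef = await paymentGateway.charge(userId, amountPaise);

  // Append-only ledger: money movements are only ever inserted, never updated
  await payments.query(
    `INSERT INTO ledger_entries (order_id, direction, amount_paise, gateway_ref)
     VALUES ($1, 'debit', $2, $3)`,
    [orderId, amountPaise, gatewayRef]
  );
  await payments.query(
    `UPDATE payment_attempts SET status = 'completed' WHERE order_id = $1`,
    [orderId]
  );
  return { alreadyProcessed: false };
}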
Key Flow 1: Order Placement

1. Search for restaurants
   GET /search?q=biryani&lat=..&lng=..
   → Search Service (Elasticsearch geo query)
   ← Restaurant list with menus

2. Add items to cart (client-side state)

3. Place order
   POST /orders  {restaurantId, items[], addressId, paymentMethod}
   → Order Service
     - Validate items + prices with Restaurant Service
     - Calculate total (subtotal + tax + delivery fee)
     - Create order record (status: PENDING_PAYMENT)
     - Publish: order.created → Kafka

4. Process payment
   → Payment Service (triggered by order.created)
     - Idempotency check (idempotency_key = order_id)
     - Charge via payment gateway (Razorpay/Stripe)
     - On success: publish payment.completed → Kafka
     - On failure: publish payment.failed → Kafka

5. Confirm order
   → Order Service (triggered by payment.completed)
     - Update order status: CONFIRMED
     - Publish: order.confirmed → Kafka

6. Notify restaurant
   → Notification Service (triggered by order.confirmed)
     - Push notification to restaurant tablet app
     - Restaurant accepts → status: PREPARING

7. Assign delivery driver
   → Delivery Service (triggered by order.confirmed)
     - Run driver assignment algorithm
     - Notify driver via push
     - Driver accepts → publish: delivery.assigned
     - Update order status: DRIVER_ASSIGNED

8. Pickup & delivery
   → Delivery Service
     - Driver reaches restaurant → PICKED_UP
     - GPS tracking starts (see Design 2)
     - Driver reaches customer → DELIVERED
     - Publish: order.delivered → Kafka

9. Post-delivery
   → Notification Service
     - Send "Rate your order" push to customer
   → Analytics Service
     - Log delivery time, distance, rating

Key Flow 2: Driver Assignment Algorithm
The goal is to find the best available driver when an order is confirmed. "Best" balances proximity, current load, and fairness.
interface Driver {
id: string;
lat: number;
lng: number;
activeOrders: number; // currently carrying 0, 1, or 2 orders
maxConcurrentOrders: number; // typically 2
rating: number; // 1-5 average
  lastAssignedAt: number; // timestamp, used for fairness
}
interface Restaurant {
id: string;
lat: number;
lng: number;
}
interface ScoredDriver {
driver: Driver;
score: number;
}
function assignDriver(
restaurant: Restaurant,
candidateDrivers: Driver[]
): Driver | null {
  // Step 1: Filter to drivers with spare capacity
const available = candidateDrivers.filter(
(d) => d.activeOrders < d.maxConcurrentOrders
);
if (available.length === 0) return null;
// Step 2: Score each driver
const scored: ScoredDriver[] = available.map((driver) => {
const distance = haversineDistance(
driver.lat, driver.lng,
restaurant.lat, restaurant.lng
);
// Weights (tuned based on business metrics)
const distanceScore = Math.max(0, 1 - distance / 10); // 0-1, closer is better, 10km max
const loadScore = 1 - driver.activeOrders / driver.maxConcurrentOrders; // prefer less loaded
const ratingScore = driver.rating / 5; // prefer higher rated
    // Fairness: drivers who have waited longer since their last assignment score higher
    const minutesSinceAssignment = (Date.now() - driver.lastAssignedAt) / 60_000;
    const waitScore = 1 - 1 / (1 + minutesSinceAssignment); // 0 just after an assignment, approaches 1 with a long wait
const score =
0.4 * distanceScore + // proximity matters most
0.25 * loadScore + // don't overload drivers
0.15 * ratingScore + // quality of delivery
0.2 * waitScore; // fairness to idle drivers
return { driver, score };
});
// Step 3: Pick highest score
scored.sort((a, b) => b.score - a.score);
return scored[0].driver;
}
function haversineDistance(
lat1: number, lng1: number,
lat2: number, lng2: number
): number {
const R = 6371; // Earth's radius in km
const dLat = toRad(lat2 - lat1);
const dLng = toRad(lng2 - lng1);
const a =
Math.sin(dLat / 2) ** 2 +
Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
Math.sin(dLng / 2) ** 2;
return R * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}
function toRad(deg: number): number {
return (deg * Math.PI) / 180;
}

Finding nearby drivers efficiently:
Rather than scoring every driver in the city, use Redis geospatial queries to find candidates within a radius.
// Store driver locations in Redis using GEOADD
await redis.geoadd("drivers:active", driverLng, driverLat, driverId);
// Find drivers within 5km of the restaurant
const nearbyDriverIds = await redis.georadius(
"drivers:active",
restaurantLng,
restaurantLat,
5, // radius
"km", // unit
"ASC", // sort by distance ascending
"COUNT", 20 // limit to 20 candidates
);
// Fetch full driver objects, then run the scoring algorithm

Scaling Considerations
Restaurant Search

+----------------+    CDC / Kafka     +----------------------+
|  PostgreSQL    | -----------------> |    Elasticsearch     |
|  (source of    |                    |                      |
|   truth)       |                    |  - Geo-distance      |
|                |                    |    queries           |
|                |                    |  - Full-text search  |
+----------------+                    |  - Faceted filters   |
                                      |    (cuisine, price,  |
                                      |     rating, veg)     |
                                      +----------------------+

// Elasticsearch query: "biryani" within 5km, rating >= 4, currently open
const query = {
bool: {
must: [
{ match: { menu_items: "biryani" } },
{ range: { rating: { gte: 4.0 } } },
{ term: { is_open: true } },
],
filter: {
geo_distance: {
distance: "5km",
location: { lat: 12.9716, lon: 77.5946 }, // user's location
},
},
},
};
// Sort by a blend of relevance, distance, and rating
const sort = [
"_score", // text relevance
{
_geo_distance: {
location: { lat: 12.9716, lon: 77.5946 },
order: "asc",
unit: "km",
},
},
{ rating: { order: "desc" } },
];

Order Service: Partitioning by City/Region
+-----------------------------------------------------------+
|                       Order Service                        |
|                                                             |
|          Routing logic: order.city_id -> shard              |
|                                                             |
|  +-------------+   +-------------+   +-------------+       |
|  | PG Shard 1  |   | PG Shard 2  |   | PG Shard 3  |       |
|  | Delhi NCR   |   | Mumbai +    |   | Bangalore + |       |
|  |             |   | Pune        |   | Hyderabad   |       |
|  +-------------+   +-------------+   +-------------+       |
|                                                             |
|  Benefits:                                                  |
|  - City-level failure isolation                             |
|  - Independent scaling (Mumbai shard gets more replicas)    |
|  - Cross-shard queries rare (users order in one city)       |
|  - Compliance: data stays in region if needed               |
+-------------------------------------------------------------+

Why city-based and not hash-based?
- Users order from restaurants in their city. Cross-city queries are near-zero.
- City-level isolation means a Mumbai database outage does not affect Bangalore orders.
- Operational: you can scale up the Mumbai shard independently during IPL cricket season (order spike in Mumbai).
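The routing layer itself can be a thin lookup in front of per-shard connection pools. A minimal sketch (the city-to-shard map and pool hosts are illustrative):

import { Pool } from "pg";

// Hypothetical static shard map: city_id -> shard name
const CITY_TO_SHARD: Record<string, string> = {
  "delhi-ncr": "shard-1",
  "mumbai": "shard-2",
  "pune": "shard-2",
  "bangalore": "shard-3",
  "hyderabad": "shard-3",
};

// One connection pool per shard, each pointing at a different Postgres cluster
const SHARD_POOLS: Record<string, Pool> = {
  "shard-1": new Pool({ host: "orders-shard-1.internal" }),
  "shard-2": new Pool({ host: "orders-shard-2.internal" }),
  "shard-3": new Pool({ host: "orders-shard-3.internal" }),
};

function poolForCity(cityId: string): Pool {
  const shard = CITY_TO_SHARD[cityId];
  if (!shard) throw new Error(`No shard configured for city ${cityId}`);
  return SHARD_POOLS[shard];
}

// Usage: all order reads/writes for a city hit exactly one shard
async function getOrder(cityId: string, orderId: string) {
  const { rows } = await poolForCity(cityId).query(
    "SELECT * FROM orders WHERE id = $1",
    [orderId]
  );
  return rows[0];
}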
Peak Hour Handling
Normal load: ~1,000 orders/min
Peak (lunch): ~10,000 orders/min (10x spike)
Flash sale: ~50,000 orders/min (50x spike, temporary)
Strategy:
+--------------------------------------------------------+
|               Queue-Based Load Leveling                 |
|                                                          |
|   Spike          +---------+     steady     +---------+ |
|   traffic  --->  |  Kafka  |  ---drain--->  |  Order  | |
|   (10K/min)      |  Queue  |  (controlled)  | Workers | |
|                  +---------+                +---------+ |
|                                                          |
|   - Kafka absorbs the burst                              |
|   - Workers process at a sustainable rate                |
|   - Auto-scaling adds more workers within 2-3 min        |
|   - Backpressure: if queue depth > threshold,            |
|     show a "high demand" message in the app              |
+----------------------------------------------------------+
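The backpressure bullet can be as simple as a gate fed by the order topic's consumer lag; a sketch, assuming the lag value is pushed in by whatever metrics poller is already running:

// Gate new orders behind a "high demand" signal when the order queue backs up.
const ORDER_QUEUE_LAG_THRESHOLD = 5_000; // messages

// Assumed to be updated by a metrics poller (e.g. consumer-group lag scraped every few seconds)
let currentOrderQueueLag = 0;

export function updateOrderQueueLag(lag: number): void {
  currentOrderQueueLag = lag;
}

export function isHighDemand(): boolean {
  return currentOrderQueueLag > ORDER_QUEUE_LAG_THRESHOLD;
}

// In the order-placement API handler: if isHighDemand(), show a "high demand,
// longer wait times" banner but still accept the order - the queue, not the
// user, absorbs the spike.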
Auto-Scaling Triggers:
- Kafka consumer lag > 5,000 messages → scale up order workers
- CPU > 70% on order service pods → scale up pods
- Active WebSocket connections > 8,000/instance → scale up WS gateway
- Scale down after 10 min of low utilization (avoid flapping)

ETA Estimation
ETA is one of the most visible features. If the app says 30 minutes and food arrives in 50, trust is broken.
ETA = Restaurant Prep Time + Driver Pickup Time + Delivery Travel Time + Buffer
Where:

Restaurant Prep Time:
- Base: restaurant's average prep time (tracked per restaurant)
- Adjusted by: current order queue depth at that restaurant
- Example: base 20 min, 8 orders in queue → 20 + (8 * 2) = 36 min

Driver Pickup Time:
- Distance from driver to restaurant (Google Maps / OSRM)
- Adjusted by: current traffic conditions (Google Traffic API)
- Example: 3 km, light traffic → 8 min

Delivery Travel Time:
- Distance from restaurant to customer
- Adjusted by: traffic, time of day, historical delivery times on this route
- Example: 5 km, moderate traffic → 15 min

Buffer:
- Static buffer: +3 min (accounts for parking, stairs, finding the address)
- Dynamic: increased during rain or peak hours

Total ETA: 36 + 8 + 15 + 3 = 62 min

interface ETAComponents {
restaurantPrepMin: number;
driverToRestaurantMin: number;
restaurantToCustomerMin: number;
bufferMin: number;
}
async function estimateETA(
restaurantId: string,
driverLocation: { lat: number; lng: number },
customerLocation: { lat: number; lng: number }
): Promise<{ totalMin: number; breakdown: ETAComponents }> {
// 1. Restaurant prep time
const restaurant = await restaurantService.get(restaurantId);
const queueDepth = await orderService.getActiveOrderCount(restaurantId);
const restaurantPrepMin =
restaurant.avgPrepTimeMin + queueDepth * 2; // 2 min per queued order
// 2. Driver to restaurant
const driverToRestaurant = await mapsService.getETA(
driverLocation,
{ lat: restaurant.lat, lng: restaurant.lng }
);
const driverToRestaurantMin = driverToRestaurant.durationMin;
// 3. Restaurant to customer
const restaurantToCustomer = await mapsService.getETA(
{ lat: restaurant.lat, lng: restaurant.lng },
customerLocation
);
const restaurantToCustomerMin = restaurantToCustomer.durationMin;
  // 4. Buffer: increases during rain or peak hours
const isRaining = await weatherService.isRaining(customerLocation);
const isPeakHour = isPeak(new Date());
let bufferMin = 3;
if (isRaining) bufferMin += 5;
if (isPeakHour) bufferMin += 3;
const totalMin =
restaurantPrepMin +
driverToRestaurantMin +
restaurantToCustomerMin +
bufferMin;
return {
totalMin: Math.ceil(totalMin),
breakdown: {
restaurantPrepMin,
driverToRestaurantMin,
restaurantToCustomerMin,
bufferMin,
},
};
}
function isPeak(now: Date): boolean {
const hour = now.getHours();
return (hour >= 12 && hour <= 14) || (hour >= 19 && hour <= 22);
}

Improving ETA accuracy over time:
- Track actual vs predicted ETA for every order (see the sketch after this list).
- Feed into an ML model (features: restaurant, time of day, weather, traffic, order size).
- Use the ML model's output as the ETA instead of the formula, once accuracy exceeds the heuristic.
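The first step is just one write per delivered order. A minimal sketch; the eta_accuracy table and the analyticsDb client are assumptions:

// Record predicted vs actual ETA for every delivered order so accuracy can be
// measured per restaurant, hour of day, weather, etc.
declare const analyticsDb: {
  query(sql: string, params: unknown[]): Promise<unknown>;
};

interface EtaAccuracyRecord {
  orderId: string;
  restaurantId: string;
  predictedMin: number; // what the customer was shown at order time
  actualMin: number;    // delivered_at - confirmed_at, in minutes
  deliveredAt: Date;
}

async function recordEtaAccuracy(rec: EtaAccuracyRecord): Promise<void> {
  await analyticsDb.query(
    `INSERT INTO eta_accuracy
       (order_id, restaurant_id, predicted_min, actual_min, error_min, delivered_at)
     VALUES ($1, $2, $3, $4, $5, $6)`,
    [rec.orderId, rec.restaurantId, rec.predictedMin, rec.actualMin,
     rec.actualMin - rec.predictedMin, rec.deliveredAt]
  );
}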
Interview Tips
- Start with requirements, not architecture. Ask clarifying questions: "How many users? What channels? What's the latency SLA?" This shows maturity.
- Draw the happy path first, then failure handling. Interviewers want to see you think about retries, DLQs, idempotency, and circuit breakers.
- Justify every database choice. Do not say "I'd use MongoDB because it's fast." Say "I'd use PostgreSQL for orders because state transitions require ACID guarantees."
- Know your numbers. Kafka throughput (~1M messages/sec per broker), Redis latency (~1ms), WebSocket connection limits (~10K per instance), Elasticsearch query latency (~10-50ms).
- Mention observability. Distributed tracing (Jaeger), metrics (Prometheus + Grafana), centralized logging (ELK). This signals production experience.
- For Temple specifically: their engineering blog and Zomato's tech blog cover exactly these topics. Name-dropping specifics like "Zomato uses Kafka for order events and Redis for driver locations" shows you did your homework.