
20 - Problem: Food Delivery System

Understanding the Problem

A food delivery platform is not primarily a CRUD problem, a search problem, or a payments problem; it is a coordination problem. Three independent actors (customer, restaurant, delivery agent) each hold partial state, each can fail independently, and all three must transition together through a shared order lifecycle. Every interesting question in this design ("who gets the order?", "what happens if the agent cancels?", "how do we charge the right price?") is a question about state transitions, not about data models.

What senior candidates do in the first 5 minutes

A junior candidate opens by listing entities: Customer, Restaurant, Order, Agent. A senior candidate opens by drawing the order state machine on the whiteboard and naming the two or three transitions where two actors race for the same outcome (agent assignment, simultaneous cancellation, restaurant rejection after customer payment). Identify the state machine first, then hang entities off it. The rest of the design falls out of that choice.

The problem is famous in interviews because it has three hard sub-problems woven together: (1) a multi-actor state machine where valid transitions depend on who is acting, (2) a real-time assignment decision with concurrent writers fighting for a scarce resource (idle agents), and (3) a pricing pipeline composed of many independent rules (subtotal, taxes, delivery fee, surge, promos, tips). A good design keeps those three concerns surgically separated.


Clarifying Questions

You: What's the geographic scope: one city, one country, or global from day one?

Interviewer: Design for a single city first; call out what changes when we go multi-city. Assume we eventually operate in many cities but each city is an independent operational unit.

You: Is the restaurant catalog open-ended (anyone can list) or curated?

Interviewer: Curated. Restaurants are onboarded by an ops team. Menus are edited by restaurant admins.

You: Are scheduled orders (order for 7 PM tonight) in scope?

Interviewer: Mention how you'd extend to scheduled orders, but design the happy path for on-demand delivery.

You: Group orders, where multiple customers add to one cart?

Interviewer: Out of scope for the base design. Call it out as an extension.

You: At which states can a customer cancel? What about the restaurant? The agent?

Interviewer: Customer can cancel freely until PREPARING, and with a fee after. Restaurant can reject at ACCEPTED. Agent can decline at assignment but not after PICKED_UP. Work out the full matrix.

You: Tips: at order placement, after delivery, or both?

Interviewer: Both. Customer can add a tip at checkout and modify it up to 1 hour post-delivery.

You: Contactless delivery: how do we confirm the handoff without a signature?

Interviewer: Agent marks as delivered with a photo upload. Customer can dispute within 24 hours.

You: Live tracking: is that in scope?

Interviewer: Note that it exists but don't design the streaming layer. We care about the data model for live tracking, not the pub/sub plumbing. That's an HLD conversation.

You: What's the scale we're designing for?

Interviewer: Assume a large metro area: 10k concurrent orders at peak, 50k active agents, 100k restaurants. Design should survive 10x growth without a rewrite.

You: Payments: do we handle PCI ourselves or integrate a processor?

Interviewer: Integrate Stripe/Razorpay. Our job is to orchestrate, not to hold card numbers.


Functional Requirements

  1. Discover restaurants: a customer in a location can browse/search open, in-range restaurants and view their menus.
  2. Cart: add/remove items, apply promo codes, see a live price breakdown.
  3. Checkout and place order: convert cart → order with payment authorization. Produces an immutable order record.
  4. Restaurant flow: restaurant sees incoming orders, can accept or reject, marks as preparing and then as ready.
  5. Agent assignment: on order acceptance, the system picks an idle agent near the restaurant. Agent can accept/decline the offer. On decline, reassign.
  6. Pickup and delivery: agent marks PICKED_UP at the restaurant and DELIVERED at the customer's address (photo confirmation for contactless).
  7. Cancellation: customer, restaurant, and system can cancel within their allowed windows. Fees apply per policy.
  8. Pricing: compute a breakdown of subtotal + taxes + delivery fee + surge + promo discount + tip. Pricing is quoted at cart time and re-validated at checkout.
  9. Tips: at checkout and post-delivery (up to 1 hour).
  10. Order history: customer and agent can view past orders.

Out of scope: live GPS streaming to the customer UI (HLD), restaurant-side inventory, surge prediction models (ML), chat between customer and agent, multi-city routing (we design one city).


Non-Functional Requirements

| Requirement | Target | Why it matters here |
| --- | --- | --- |
| Restaurant search latency | p99 < 300 ms | Customers churn on slow browse. Catalog is mostly static, so it is cacheable. |
| Assignment correctness | No double-assignment under any race | A single agent assigned to two orders silently breaks physical-world invariants. Worse than a slow assignment. |
| Assignment latency | p95 < 2 s end-to-end | Delays here cascade: the restaurant sits idle waiting for a rider. |
| Order placement availability | 99.95% | Failed placements are directly lost revenue. |
| Fairness to agents | Bounded starvation | An agent at the edge of a zone shouldn't be starved by eager nearest-first logic. |
| Partial outage tolerance | Must degrade, not fail | If the surge pricing service is down, fall back to base pricing rather than failing the order. |
| Idempotency | All state-change APIs | Mobile networks drop. Double-tap of "Accept" must not assign twice. |

The tension worth calling out: correctness of assignment beats latency of assignment. A design that picks an agent in 50 ms but 0.1% of the time assigns the same agent to two orders is worse than a design that picks in 800 ms with zero double-assignments. Interviewers probe this: don't optimize latency at the cost of correctness.
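The idempotency requirement can be illustrated with a small sketch. It assumes the client sends an idempotency key with each state-changing request; `IdempotencyStore` and the key format are hypothetical names introduced here, and the in-memory `Map` stands in for a durable store.

```typescript
// Hypothetical sketch, not part of the design above: remember the outcome
// of each client-supplied idempotency key and replay it on retries.
class IdempotencyStore<T> {
  private readonly results = new Map<string, T>();

  // Runs `handler` at most once per key; later calls return the stored result.
  run(key: string, handler: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T;
    }
    const result = handler();
    this.results.set(key, result);
    return result;
  }
}

let assignments = 0;
const store = new IdempotencyStore<string>();

// The handler has a side effect that must not be repeated.
const accept = (): string => {
  assignments += 1;
  return `assigned#${assignments}`;
};

const first = store.run("agent-7:order-42:accept", accept);
const retry = store.run("agent-7:order-42:accept", accept); // double-tap
console.log(first, retry, assignments); // assigned#1 assigned#1 1
```

In a real deployment the lookup and the write would need to be a single atomic step (for example, a unique constraint on the key); single-threaded execution is what makes the sketch safe as written.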


Core Entities and Relationships

| Entity | Responsibility |
| --- | --- |
| Customer | Identity, addresses, saved payment methods, order history. |
| Restaurant | Identity, location, hours, menu, acceptance state. Owns a Menu. |
| Menu | A collection of MenuItems. Versioned: price/availability changes produce a new effective menu. |
| MenuItem | Name, description, base price, modifiers (toppings, sizes), availability flag. |
| Cart | Customer-scoped, restaurant-scoped, mutable. Holds CartItems and a quoted PriceBreakdown. Ephemeral. |
| Order | The immutable commitment. Snapshot of cart items, pricing, addresses, and the live state machine. Owns OrderItems. |
| OrderItem | Snapshot of a MenuItem at order time (name, price, modifiers). Immutable. |
| DeliveryAgent | Identity, vehicle, rating, current AgentState, current assignment, last known AgentLocation. |
| AgentLocation | {lat, lng, updatedAt}. Updated on a heartbeat. |
| Address | Structured location with geocoded lat/lng. Customer-owned. |
| Payment | A payment attempt on an order. Has its own state (authorized, captured, refunded, failed). |
| Assignment | The join between an Order and a DeliveryAgent with offer/accept/decline state. |
| PriceBreakdown | Value object: subtotal, taxes, delivery fee, surge, promo, tip, total. Immutable per quote. |

Why is OrderItem a snapshot of MenuItem and not a reference? Menus change. A restaurant bumps a burger from $9 to $10 mid-day. An order placed at 11:59 must remain at $9 forever, for receipts, refunds, disputes, and accounting. The order is an immutable legal record of what was agreed to; the menu is a mutable presentation of what's currently offered. Never conflate them.
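A minimal sketch of the snapshot idea, using the burger example. The field names (`basePriceCents`, `unitPriceCents`) and the helper `snapshotItem` are assumptions made for illustration, not the document's actual types.

```typescript
// Hypothetical shapes for illustration; field names are assumptions.
interface MenuItem {
  id: string;
  name: string;
  basePriceCents: number;
}

interface OrderItemSnapshot {
  readonly menuItemId: string;
  readonly name: string;
  readonly unitPriceCents: number;
  readonly quantity: number;
}

// Copy the fields by value and freeze the result, so later menu edits
// cannot reach into an order that has already been placed.
function snapshotItem(item: MenuItem, quantity: number): OrderItemSnapshot {
  return Object.freeze({
    menuItemId: item.id,
    name: item.name,
    unitPriceCents: item.basePriceCents,
    quantity,
  });
}

const burger: MenuItem = { id: "m1", name: "Burger", basePriceCents: 900 };
const ordered = snapshotItem(burger, 2);

burger.basePriceCents = 1000; // the restaurant raises the price after the order

console.log(ordered.unitPriceCents); // 900
console.log(burger.basePriceCents); // 1000
```

Holding a reference to the `MenuItem` instead would have silently repriced the placed order to $10.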

Why is Cart separate from Order? Cart is working-state; Order is committed-state. Cart can be abandoned, restored from another device, emptied. Order, once placed, never changes except through its state machine. Mixing the two produces bugs like "customer edited cart → altered a placed order's items."

Why separate Payment from Order? An order can have multiple payment attempts (declined card, retry, split payment). Payment has its own lifecycle with processor webhooks. Keeping them separate avoids the order's state machine carrying payment-specific states, which would balloon the diagram.


Interfaces

Seams worth isolating. Each one is a plug point for strategy swaps, testing, and team ownership boundaries.

typescript
type Location = { lat: number; lng: number };

interface ILocationService {
  // Distance/ETA abstraction: haversine in V1, routing API later.
  distance(a: Location, b: Location): number;
  eta(from: Location, to: Location): number; // minutes
  // Spatial query: which agents are near this point?
  nearbyIdleAgents(origin: Location, radiusKm: number, limit: number): DeliveryAgent[];
}

interface IAssignmentStrategy {
  // Pick an agent for an order. Returns null if none suitable.
  // Must be called inside an atomic block that transitions the agent
  // from IDLE → ASSIGNED.
  pick(order: Order, candidates: DeliveryAgent[]): DeliveryAgent | null;
}

interface IPricingRule {
  // Each rule contributes a named line item (positive or negative).
  apply(ctx: PricingContext, current: PriceBreakdown): PriceBreakdown;
}

interface IPricingStrategy {
  // Composes rules in order: subtotal → tax → delivery fee → surge → promo → tip.
  quote(ctx: PricingContext): PriceBreakdown;
}

interface IOrderRepository {
  save(order: Order): void;
  findById(id: OrderId): Order | null;
  // Atomic state transition with optimistic concurrency check.
  transitionState(
    id: OrderId,
    fromState: OrderState,
    toState: OrderState,
    actorId: string,
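  ): boolean;
  // Records the reserved agent on the order (used by OrderService below).
  attachAgent(id: OrderId, agentId: AgentId): void;
}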

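interface IAgentRepository {
  findById(id: AgentId): DeliveryAgent | null;
  // Atomic reserve: flip IDLE → ASSIGNED only if currently IDLE.
  tryReserve(id: AgentId, orderId: OrderId): boolean;
  // Return a reserved agent to IDLE (used on decline and cancellation below).
  release(id: AgentId): void;
  // Conditional state change: applies only if the agent is currently in `from`.
  updateState(id: AgentId, from: AgentState, to: AgentState): boolean;
  updateLocation(id: AgentId, loc: Location): void;
}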

interface INotificationService {
  notifyCustomer(customerId: CustomerId, event: OrderEvent): void;
  notifyRestaurant(restaurantId: RestaurantId, event: OrderEvent): void;
  notifyAgent(agentId: AgentId, event: OrderEvent): void;
}

Each interface maps to a replaceable component. IAssignmentStrategy can be nearest-idle, batched, or ML-scored without touching the caller. IPricingRule is the hook for a surge rule, a promo rule, or a city-specific tax rule: rules are composed, not hardcoded.


Class Diagram

                       ┌────────────────────────┐
                       │     OrderService       │──────────┐
                       │  (application facade)  │          │
                       └───────┬────────────────┘          │
                               │ uses                      │ uses
                 ┌─────────────┼────────────────┐          │
                 │             │                │          │
                 ▼             ▼                ▼          ▼
        ┌──────────────┐ ┌──────────────┐ ┌──────────┐ ┌────────────────┐
        │IPricing      │ │IAssignment   │ │IOrder    │ │INotification   │
        │Strategy      │ │Strategy      │ │Repository│ │Service         │
        └──────┬───────┘ └──────┬───────┘ └────┬─────┘ └────────────────┘
               │                │              │
               │ composes       │ consults     │ persists
               ▼                ▼              ▼
        ┌──────────────┐ ┌──────────────┐ ┌──────────┐
        │IPricingRule  │ │ILocation     │ │  Order   │────────┐
        │(many)        │ │Service       │ │          │        │ 1..*
        └──────────────┘ └──────────────┘ └────┬─────┘        ▼
                                               │       ┌────────────┐
                                               │       │ OrderItem  │
                                               │       └────────────┘
                                               │
                     ┌─────────────────────────┼─────────────────────┐
                     │                         │                     │
                     ▼                         ▼                     ▼
              ┌────────────┐           ┌─────────────┐        ┌──────────────┐
              │  Customer  │           │ Restaurant  │        │DeliveryAgent │
              └─────┬──────┘           └──────┬──────┘        └──────┬───────┘
                    │ 1..*                    │ 1              1     │
                    ▼                         ▼                      ▼
              ┌───────────┐              ┌────────┐           ┌─────────────┐
              │  Address  │              │  Menu  │───┐       │AgentLocation│
              └───────────┘              └────────┘   │       └─────────────┘
                                                      │ 1..*
                                                      ▼
                                                ┌──────────┐
                                                │ MenuItem │
                                                └──────────┘

             ┌─────────┐                  ┌────────────┐
             │  Cart   │──── converts ───▶│   Order    │
             └─────────┘   (one-way)      └────────────┘

The rigid separation: Cart lives in the working layer and is mutable; Order is the committed artifact. Arrows from OrderService go out to abstractions, not in from concrete classes: the service doesn't know what pricing rules are active or how agents are picked, only that the abstractions exist.


Class Design

Order and its state machine

The order state machine is the spine of the system. Every other class exists to cause or respond to these transitions.

          ┌──────────┐
          │  PLACED  │◀── Customer submits checkout (payment authorized).
          └────┬─────┘
               │ restaurant accepts
               │ OR restaurant rejects → CANCELLED
               ▼
          ┌──────────┐
          │ ACCEPTED │◀── Triggers agent assignment.
          └────┬─────┘
               │ restaurant starts cooking
               ▼
          ┌───────────┐
          │ PREPARING │◀── Customer cancellation now costs a fee.
          └────┬──────┘
               │ food is ready
               ▼
          ┌──────────┐
          │  READY   │◀── Waiting for agent arrival.
          └────┬─────┘
               │ agent scans / confirms pickup
               ▼
          ┌───────────┐
          │ PICKED_UP │◀── Point of no return for cancellation.
          └────┬──────┘
               │ agent confirms handoff
               ▼
          ┌───────────┐
          │ DELIVERED │◀── Terminal.
          └───────────┘

 CANCELLED (terminal) reachable from PLACED, ACCEPTED, PREPARING
                      by customer, restaurant, or system.

Transitions are not symmetric across actors. The who can do what when matrix:

| From → To | Customer | Restaurant | Agent | System |
| --- | --- | --- | --- | --- |
| PLACED → ACCEPTED | | ✓ | | |
| PLACED → CANCELLED | ✓ (free) | ✓ (reject) | | ✓ (timeout) |
| ACCEPTED → PREPARING | | ✓ | | |
| ACCEPTED → CANCELLED | ✓ (free) | ✓ (rare) | | ✓ (no agent) |
| PREPARING → READY | | ✓ | | |
| PREPARING → CANCELLED | ✓ (fee) | ✓ (emergency) | | |
| READY → PICKED_UP | | | ✓ | |
| PICKED_UP → DELIVERED | | | ✓ | |

Encoded as a guard table in code, not as branchy conditionals scattered through handlers. That's the difference between a system that's analyzable and one that isn't.

Cart → Order transition

Conversion is a one-way trapdoor. Once the customer hits "Place Order," the cart snapshot becomes an immutable Order:

  1. Freeze: read current cart items.
  2. Quote: call IPricingStrategy.quote() with the frozen snapshot. Store the PriceBreakdown on the order.
  3. Authorize payment: call PaymentService.authorize(total). On failure, return error β€” no order created.
  4. Persist: save Order with state=PLACED. At this point the order exists and is visible to the restaurant.
  5. Clear cart.
  6. Notify restaurant (via INotificationService).

Between steps 3 and 4 is the dangerous window: payment authorized but order not persisted. Handle it with a saga-style compensating action: if step 4 fails, void the authorization.

DeliveryAgent state machine

            ┌─────────┐
            │ OFFLINE │◀── Agent logs in / goes online → IDLE
            └────┬────┘
                 │ go online
                 ▼
            ┌─────────┐
            │  IDLE   │◀── Eligible for assignment.
            └────┬────┘
                 │ accepts offered order
                 ▼
            ┌──────────┐
            │ ASSIGNED │◀── Traveling to restaurant.
            └────┬─────┘
                 │ confirms pickup at restaurant
                 ▼
            ┌────────────┐
            │ PICKING_UP │  (transient; some designs collapse this)
            └────┬───────┘
                 │
                 ▼
            ┌────────────┐
            │ DELIVERING │◀── Has food, traveling to customer.
            └────┬───────┘
                 │ confirms handoff → IDLE
                 ▼
            ┌─────────┐
            │  IDLE   │
            └─────────┘

    OFFLINE reachable from IDLE (explicit go-offline) or any state
    via emergency (rare; triggers recovery workflow).

The critical invariant: AgentState.IDLE ↔ eligible for assignment. That single bit is the concurrency-control primitive for the entire assignment subsystem. Any state flip to or from IDLE must be atomic with the corresponding order change.
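A minimal sketch of that invariant as a compare-and-swap. `InMemoryAgentStore` is a hypothetical stand-in for the agent repository, and the two-argument `release` is an assumption added so that a stale release cannot free an agent held by a different order.

```typescript
// Hypothetical in-memory stand-in for the agent store. A real store would
// express the same check as a conditional update (for example,
// UPDATE ... WHERE state = 'IDLE') so the test and the write are one step.
type AgentStatus = "OFFLINE" | "IDLE" | "ASSIGNED" | "DELIVERING";

interface AgentRecord {
  state: AgentStatus;
  currentOrderId: string | null;
}

class InMemoryAgentStore {
  private readonly agents = new Map<string, AgentRecord>();

  add(agentId: string, state: AgentStatus): void {
    this.agents.set(agentId, { state, currentOrderId: null });
  }

  // Compare-and-swap: succeed only if the agent is IDLE right now.
  // Single-threaded JS runs this method without interleaving, which is
  // what stands in for atomicity in this sketch.
  tryReserve(agentId: string, orderId: string): boolean {
    const agent = this.agents.get(agentId);
    if (!agent || agent.state !== "IDLE") return false;
    agent.state = "ASSIGNED";
    agent.currentOrderId = orderId;
    return true;
  }

  // Release only if the agent is still held by this exact order.
  release(agentId: string, orderId: string): boolean {
    const agent = this.agents.get(agentId);
    if (!agent || agent.state !== "ASSIGNED" || agent.currentOrderId !== orderId) {
      return false;
    }
    agent.state = "IDLE";
    agent.currentOrderId = null;
    return true;
  }
}

const agentStore = new InMemoryAgentStore();
agentStore.add("agent-1", "IDLE");

const wonByA = agentStore.tryReserve("agent-1", "order-A");
const wonByB = agentStore.tryReserve("agent-1", "order-B");
console.log(wonByA, wonByB); // true false
```

Two orders racing for the same agent produce exactly one winner; the loser sees `false` and moves on to its next candidate.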

Assignment logic

Default strategy: nearest-idle agent, with tiebreakers:

  1. Primary: haversine distance from agent's current location to the restaurant, ascending.
  2. Tiebreaker 1: agent rating descending (prefer better agents).
  3. Tiebreaker 2: orders completed today ascending (fairness β€” don't starve agents).
  4. Tiebreaker 3: random (to avoid deterministic starvation at the margin).

Implemented as sort-then-reserve: query the spatial index for agents within radius R (start at 2 km, then expand to 5 km and 10 km if no match), sort by the tuple above, then walk the list and attempt tryReserve on each until one succeeds. Reserve is atomic: a compare-and-swap on the agent's state. Details in Concurrency Considerations below.
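The ranking tuple can be sketched as a comparator. The `Candidate` shape, the function names, and the 100 m distance bucketing are assumptions made for illustration; bucketing is there because raw floating-point distances almost never tie, so without it the tiebreakers would never fire.

```typescript
type Point = { lat: number; lng: number };

interface Candidate {
  id: string;
  location: Point;
  rating: number;
  ordersToday: number;
}

// Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
function haversineKm(a: Point, b: Point): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(h));
}

// Sort by (distance asc, rating desc, ordersToday asc, random).
// The random key is drawn once per candidate so the comparator stays consistent.
function rankCandidates(
  restaurant: Point,
  candidates: Candidate[],
  rand: () => number = Math.random,
): Candidate[] {
  const keyed = candidates.map(c => ({
    c,
    bucket: Math.round(haversineKm(c.location, restaurant) * 10), // 100 m buckets
    jitter: rand(),
  }));
  keyed.sort(
    (x, y) =>
      x.bucket - y.bucket ||
      y.c.rating - x.c.rating ||
      x.c.ordersToday - y.c.ordersToday ||
      x.jitter - y.jitter,
  );
  return keyed.map(k => k.c);
}

const restaurantAt: Point = { lat: 12.9716, lng: 77.5946 };
const near: Point = { lat: 12.9726, lng: 77.5946 }; // roughly 0.11 km away
const far: Point = { lat: 12.9916, lng: 77.5946 }; // roughly 2.2 km away

const rankedIds = rankCandidates(
  restaurantAt,
  [
    { id: "a", location: near, rating: 4.9, ordersToday: 8 },
    { id: "b", location: near, rating: 4.9, ordersToday: 3 },
    { id: "c", location: near, rating: 4.2, ordersToday: 0 },
    { id: "d", location: far, rating: 5.0, ordersToday: 0 },
  ],
  () => 0.5, // fixed "random" key to keep the example deterministic
).map(c => c.id);

console.log(rankedIds.join(",")); // b,a,c,d
```

Agent `d` has the best rating but ranks last because distance dominates; `b` beats `a` only on the fairness tiebreaker.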

Pricing composition

Pricing is an ordered pipeline of IPricingRules applied over a running PriceBreakdown:

start:          subtotal=0, taxes=0, deliveryFee=0, surge=0, promo=0, tip=0
SubtotalRule:   subtotal = Σ item.price * qty
TaxRule:        taxes = subtotal * cityTaxRate
DeliveryRule:   deliveryFee = base + distanceFee
SurgeRule:      surge = (subtotal + deliveryFee) * surgeMultiplier (if active)
PromoRule:      promo = -discountAmount (validates eligibility)
TipRule:        tip = customer-chosen
FinalizeRule:   total = sum, rounded to currency precision

Each rule is independently testable. Adding "student discount" means writing a new IPricingRule and slotting it between PromoRule and FinalizeRule. No existing rule changes.
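A sketch of the pipeline with a worked example. It assumes amounts in integer minor units (cents) to sidestep floating-point drift, treats `surgeMultiplier` as the uplift fraction from the formula above (0.2 means 20% extra), and skips promo eligibility checks; `QuoteContext`, `Rule`, and `quote` are hypothetical names.

```typescript
interface Breakdown {
  subtotal: number;
  taxes: number;
  deliveryFee: number;
  surge: number;
  promo: number;
  tip: number;
  total: number;
}

interface QuoteContext {
  items: { priceCents: number; qty: number }[];
  taxRate: number; // e.g. 0.05 for 5%
  deliveryFeeCents: number; // base + distance fee, precomputed here
  surgeMultiplier: number; // 0 when surge is inactive
  promoCents: number; // discount amount as a positive number
  tipCents: number;
}

// Each rule reads the running breakdown and returns an updated copy.
type Rule = (ctx: QuoteContext, current: Breakdown) => Breakdown;

const subtotalRule: Rule = (ctx, b) => ({
  ...b,
  subtotal: ctx.items.reduce((sum, i) => sum + i.priceCents * i.qty, 0),
});
const taxRule: Rule = (ctx, b) => ({ ...b, taxes: Math.round(b.subtotal * ctx.taxRate) });
const deliveryRule: Rule = (ctx, b) => ({ ...b, deliveryFee: ctx.deliveryFeeCents });
const surgeRule: Rule = (ctx, b) => ({
  ...b,
  surge: Math.round((b.subtotal + b.deliveryFee) * ctx.surgeMultiplier),
});
const promoRule: Rule = (ctx, b) => ({ ...b, promo: -ctx.promoCents });
const tipRule: Rule = (ctx, b) => ({ ...b, tip: ctx.tipCents });
const finalizeRule: Rule = (_ctx, b) => ({
  ...b,
  total: b.subtotal + b.taxes + b.deliveryFee + b.surge + b.promo + b.tip,
});

// Fold the rules, in order, over an all-zero starting breakdown.
function quote(ctx: QuoteContext, rules: Rule[]): Breakdown {
  const zero: Breakdown = {
    subtotal: 0, taxes: 0, deliveryFee: 0, surge: 0, promo: 0, tip: 0, total: 0,
  };
  return rules.reduce((acc, rule) => rule(ctx, acc), zero);
}

const pipeline: Rule[] = [
  subtotalRule, taxRule, deliveryRule, surgeRule, promoRule, tipRule, finalizeRule,
];

const quoted = quote(
  {
    items: [{ priceCents: 900, qty: 2 }, { priceCents: 300, qty: 1 }],
    taxRate: 0.05,
    deliveryFeeCents: 400,
    surgeMultiplier: 0.2,
    promoCents: 300,
    tipCents: 200,
  },
  pipeline,
);

// 2100 + 105 + 400 + 500 - 300 + 200 = 3005
console.log(quoted.total); // 3005
```

Because order matters (surge reads the delivery fee, finalize reads everything), the array position of each rule is part of the contract; a new rule is added by inserting it into `pipeline` rather than by editing an existing rule.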


Key Methods

The core operations, in TypeScript. Focus is on correctness invariants and race handling, since these are what interviewers probe.

typescript
// ---- Types ----

type OrderId = string;
type CustomerId = string;
type RestaurantId = string;
type AgentId = string;

enum OrderState {
  PLACED = "PLACED",
  ACCEPTED = "ACCEPTED",
  PREPARING = "PREPARING",
  READY = "READY",
  PICKED_UP = "PICKED_UP",
  DELIVERED = "DELIVERED",
  CANCELLED = "CANCELLED",
}

enum AgentState {
  OFFLINE = "OFFLINE",
  IDLE = "IDLE",
  ASSIGNED = "ASSIGNED",
  PICKING_UP = "PICKING_UP",
  DELIVERING = "DELIVERING",
}

type Actor =
  | { kind: "customer"; id: CustomerId }
  | { kind: "restaurant"; id: RestaurantId }
  | { kind: "agent"; id: AgentId }
  | { kind: "system" };

interface PriceBreakdown {
  subtotal: number;
  taxes: number;
  deliveryFee: number;
  surge: number;
  promo: number;
  tip: number;
  total: number;
  currency: string;
}

class Order {
  constructor(
    public readonly id: OrderId,
    public readonly customerId: CustomerId,
    public readonly restaurantId: RestaurantId,
    public readonly items: ReadonlyArray<OrderItem>,
    public readonly deliveryAddress: Address,
    public price: PriceBreakdown,
    public state: OrderState = OrderState.PLACED,
    public assignedAgentId: AgentId | null = null,
    public readonly createdAt: Date = new Date(),
    public version: number = 0, // for optimistic concurrency
  ) {}
}

// ---- Valid transitions, encoded as a guard table ----

const VALID_TRANSITIONS: ReadonlyMap<OrderState, ReadonlySet<OrderState>> = new Map([
  [OrderState.PLACED,    new Set([OrderState.ACCEPTED, OrderState.CANCELLED])],
  [OrderState.ACCEPTED,  new Set([OrderState.PREPARING, OrderState.CANCELLED])],
  [OrderState.PREPARING, new Set([OrderState.READY, OrderState.CANCELLED])],
  [OrderState.READY,     new Set([OrderState.PICKED_UP])],
  [OrderState.PICKED_UP, new Set([OrderState.DELIVERED])],
  [OrderState.DELIVERED, new Set()],
  [OrderState.CANCELLED, new Set()],
]);

function isValidTransition(from: OrderState, to: OrderState): boolean {
  return VALID_TRANSITIONS.get(from)?.has(to) ?? false;
}

// Who is allowed to trigger each transition?
const TRANSITION_ACTORS: ReadonlyMap<string, ReadonlySet<Actor["kind"]>> = new Map([
  ["PLACED->ACCEPTED",    new Set(["restaurant"])],
  ["PLACED->CANCELLED",   new Set(["customer", "restaurant", "system"])],
  ["ACCEPTED->PREPARING", new Set(["restaurant"])],
  ["ACCEPTED->CANCELLED", new Set(["customer", "restaurant", "system"])],
  ["PREPARING->READY",    new Set(["restaurant"])],
  ["PREPARING->CANCELLED",new Set(["customer", "restaurant"])],
  ["READY->PICKED_UP",    new Set(["agent"])],
  ["PICKED_UP->DELIVERED",new Set(["agent"])],
]);

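// ---- Payment seam (signatures inferred from the calls below; an assumption) ----

interface IPaymentService {
  // Places a hold on the customer's payment method and returns a reference.
  authorize(customerId: CustomerId, amount: number, currency: string): string;
  // Releases a hold. Callers below pass either the auth reference or the order id.
  void(ref: string): void;
  // Settles the hold for an order, optionally for a specific amount.
  capture(orderId: OrderId, amount?: number): void;
}

// ---- OrderService: the application facade ----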

class OrderService {
  constructor(
    private readonly orders: IOrderRepository,
    private readonly agents: IAgentRepository,
    private readonly pricing: IPricingStrategy,
    private readonly assignment: IAssignmentStrategy,
    private readonly location: ILocationService,
    private readonly notifications: INotificationService,
    private readonly payments: IPaymentService,
  ) {}

  /**
   * Cart → Order conversion. This is a trapdoor: cart is consumed, order
   * is persisted only if payment authorization succeeds.
   */
  placeOrder(cart: Cart, address: Address): Order {
    // 1. Freeze and quote
    const items = cart.items.map(ci => OrderItem.fromCartItem(ci));
    const priceCtx = new PricingContext(items, address, cart.promoCode, cart.tip);
    const price = this.pricing.quote(priceCtx);

    // 2. Authorize payment (external; may throw)
    const authRef = this.payments.authorize(cart.customerId, price.total, price.currency);

    // 3. Persist order, but only after payment is authorized
    const order = new Order(
      newId(),
      cart.customerId,
      cart.restaurantId,
      items,
      address,
      price,
    );
    try {
      this.orders.save(order);
    } catch (e) {
      // Persistence failed after auth succeeded: void to avoid an orphan charge
      this.payments.void(authRef);
      throw e;
    }

    // 4. Notify restaurant
    this.notifications.notifyRestaurant(order.restaurantId, {
      type: "ORDER_PLACED",
      orderId: order.id,
    });

    return order;
  }

  /**
   * Valid-transition check + atomic state flip + side-effects.
   * Throws on an invalid transition or a disallowed actor. Returns true on
   * success, false if a concurrent writer changed the order first (lost CAS).
   */
  updateOrderState(orderId: OrderId, newState: OrderState, actor: Actor): boolean {
    const order = this.orders.findById(orderId);
    if (!order) throw new Error(`order ${orderId} not found`);

    if (!isValidTransition(order.state, newState)) {
      throw new Error(`invalid transition ${order.state} -> ${newState}`);
    }

    const key = `${order.state}->${newState}`;
    const allowedActors = TRANSITION_ACTORS.get(key);
    if (!allowedActors || !allowedActors.has(actor.kind)) {
      throw new Error(`actor ${actor.kind} cannot perform ${key}`);
    }

    // Atomic state flip: compare-and-swap on the state column.
    // If someone else already transitioned the order, we lose the race and return false.
    const actorId = actor.kind === "system" ? "system" : `${actor.kind}:${actor.id}`;
    const ok = this.orders.transitionState(orderId, order.state, newState, actorId);
    if (!ok) return false;

    // Side effects run only after the state flip succeeds
    this.onStateChanged(order, newState, actor);
    return true;
  }

  private onStateChanged(order: Order, newState: OrderState, actor: Actor): void {
    switch (newState) {
      case OrderState.ACCEPTED:
        // Restaurant accepted → trigger agent assignment. Called inline here;
        // a production system would enqueue it so the accept call returns quickly.
        this.assignAgent(order.id);
        this.notifications.notifyCustomer(order.customerId, {
          type: "ORDER_ACCEPTED", orderId: order.id,
        });
        break;

      case OrderState.READY:
        if (order.assignedAgentId) {
          this.notifications.notifyAgent(order.assignedAgentId, {
            type: "ORDER_READY_FOR_PICKUP", orderId: order.id,
          });
        }
        break;

      case OrderState.DELIVERED:
        this.payments.capture(order.id);
        // Release agent: DELIVERING → IDLE, atomically.
        if (order.assignedAgentId) {
          this.agents.updateState(order.assignedAgentId, AgentState.DELIVERING, AgentState.IDLE);
        }
        this.notifications.notifyCustomer(order.customerId, {
          type: "ORDER_DELIVERED", orderId: order.id,
        });
        break;

      case OrderState.CANCELLED:
        this.handleCancellation(order, actor);
        break;
    }
  }

  /**
   * Race-safe assignment. Returns null if no agent is available within radius
   * (caller should retry or escalate).
   *
   * Race model: multiple concurrent `assignAgent` calls can be in flight for
   * different orders. They all query the same spatial index and may sort the
   * same agent to the top. We resolve via `tryReserve`, which atomically
   * transitions the agent IDLE → ASSIGNED and returns false on contention.
   */
  assignAgent(orderId: OrderId): DeliveryAgent | null {
    const order = this.orders.findById(orderId);
    if (!order || order.state !== OrderState.ACCEPTED) return null;

    const restaurant = this.getRestaurantLocation(order.restaurantId);
    const radii = [2, 5, 10]; // km, progressive expansion

    for (const radiusKm of radii) {
      const candidates = this.location.nearbyIdleAgents(restaurant, radiusKm, 20);
      if (candidates.length === 0) continue;

      const picked = this.assignment.pick(order, candidates);
      if (!picked) continue;

      // Try the strategy's pick first, then the remaining candidates as fallbacks.
      // tryReserve atomically flips IDLE → ASSIGNED; on contention, move to the next.
      const ordered = [picked, ...candidates.filter(c => c.id !== picked.id)];
      const reserved = this.tryReserveInOrder(order, ordered);
      if (reserved) return reserved;
    }

    // No agent found anywhere: caller's responsibility to retry or cancel.
    return null;
  }

  private tryReserveInOrder(order: Order, ranked: DeliveryAgent[]): DeliveryAgent | null {
    for (const agent of ranked) {
      if (this.agents.tryReserve(agent.id, order.id)) {
        // Persist the assignment on the order side too
        this.orders.attachAgent(order.id, agent.id);
        this.notifications.notifyAgent(agent.id, {
          type: "OFFER", orderId: order.id, expiresInSec: 30,
        });
        return agent;
      }
      // else: lost the race for this agent, try next
    }
    return null;
  }

  /**
   * Cancellation policy: fees depend on current state.
   */
  private handleCancellation(order: Order, actor: Actor): void {
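    // `order` is the snapshot read before the transition, so `order.state` is the
    // state the order was cancelled from. PREPARING is the only fee-bearing state
    // that can reach CANCELLED in the guard table (READY → CANCELLED is not allowed).
    const wasPreparing = order.state === OrderState.PREPARING;
    if (actor.kind === "customer" && wasPreparing) {
      // Simplification: captures the full authorized total as the fee.
      this.payments.capture(order.id, order.price.total);
    } else {
      this.payments.void(order.id);
    }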
    if (order.assignedAgentId) {
      this.agents.release(order.assignedAgentId); // back to IDLE
    }
    this.notifications.notifyCustomer(order.customerId, {
      type: "ORDER_CANCELLED", orderId: order.id,
    });
    this.notifications.notifyRestaurant(order.restaurantId, {
      type: "ORDER_CANCELLED", orderId: order.id,
    });
  }

  /**
   * Pricing is pure: same inputs → same breakdown. Safe to call at cart-view,
   * at checkout, and at re-quote. Quoted price must be re-validated at
   * checkout to guard against surge-change between view and submit.
   */
  computePrice(ctx: PricingContext): PriceBreakdown {
    return this.pricing.quote(ctx);
  }

  private getRestaurantLocation(_id: RestaurantId): Location {
    // Indirection via restaurant repository, omitted for brevity.
    return { lat: 0, lng: 0 };
  }
}

// ---- Agent-side handlers ----

class AgentService {
  constructor(
    private readonly agents: IAgentRepository,
    private readonly orders: IOrderRepository,
    private readonly orderService: OrderService,
  ) {}

  /**
   * Agent accepts an offer. Idempotent: repeated calls for the same
   * (agentId, orderId) are no-ops after the first success.
   */
  acceptOffer(agentId: AgentId, orderId: OrderId): boolean {
    const agent = this.agents.findById(agentId);
    if (!agent) return false;
    // Agent should already be in ASSIGNED (set by tryReserve). Idempotency:
    // if we're already ASSIGNED to this order, treat as success.
    if (agent.state === AgentState.ASSIGNED && agent.currentOrderId === orderId) {
      return true;
    }
    return false;
  }

  /**
   * Agent declines an offer. Must release the reservation AND trigger
   * re-assignment. Idempotent.
   */
  declineOffer(agentId: AgentId, orderId: OrderId): void {
    this.agents.release(agentId); // ASSIGNED → IDLE (only if still ASSIGNED to this order)
    // Penalize lightly? Business policy. Then re-trigger assignment.
    this.orderService.assignAgent(orderId);
  }

  /**
   * High-frequency endpoint. Update agent location.
   * NOT a state-changing operation β€” just writes lat/lng/timestamp.
   * Idempotent and safe to drop under load (staleness is acceptable).
   */
  updateLocation(agentId: AgentId, loc: Location): void {
    this.agents.updateLocation(agentId, loc);
  }
}

A few conventions worth noting:

  • Every state-changing operation is idempotent: mobile clients retry freely, and a duplicate request should be a no-op at the domain level.
  • Compare-and-swap is the concurrency primitive for both agent reservation and order state transitions. No lock-based designs; locks held across network calls scale poorly under contention.
  • Side effects fire after the state flip commits, never before. A notification sent before the DB commit means a user might see "accepted" for an order that didn't actually transition.
  • The Order object carries a version field for optimistic locking. The repository's transitionState uses WHERE id = ? AND state = ? AND version = ? and bumps on write.

Design Decisions & Tradeoffs ​

1. Assignment: greedy-nearest vs. batched optimization ​

| Approach | When it wins | When it loses |
| --- | --- | --- |
| Greedy nearest-idle (our default) | Low latency to decision (ms). Simple to reason about. Good at normal load. | Suboptimal at peak: sends the closest agent to this order, missing a globally better pairing for a concurrent order arriving 200 ms later. |
| Batched optimization (collect 5-10 seconds of orders and agents, solve an assignment matrix) | Higher global utilization. Can honor "Zomato Plus" priority. Better fairness. | 5-10 s added latency. Complex code. Harder to explain to customers ("why isn't anyone accepting?"). |

Decision: greedy-nearest for V1, with IAssignmentStrategy pluggable so batched can replace it per-zone during peak without touching callers. DoorDash's Dasher Batching and Uber's batching logic are both variants of this: swap strategies based on load signals.
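A minimal sketch of that seam, under stated assumptions: the text names IAssignmentStrategy but does not show its signature, so the `rank` method, the `Point` and `AgentCandidate` shapes, and the planar distance are all hypothetical stand-ins (a real ranking would likely use road distance or ETA).

```typescript
// Hypothetical shapes for illustration; the real interfaces may differ.
interface Point { lat: number; lng: number; }
interface AgentCandidate { id: string; position: Point; }

interface IAssignmentStrategy {
  // Returns candidate agent ids, best first.
  rank(pickup: Point, idleAgents: AgentCandidate[]): string[];
}

// Squared planar distance: enough to order nearby candidates in a sketch.
function squaredDistance(a: Point, b: Point): number {
  const dLat = a.lat - b.lat;
  const dLng = a.lng - b.lng;
  return dLat * dLat + dLng * dLng;
}

class GreedyNearestStrategy implements IAssignmentStrategy {
  rank(pickup: Point, idleAgents: AgentCandidate[]): string[] {
    return idleAgents
      .slice() // do not mutate the caller's array
      .sort((x, y) => squaredDistance(x.position, pickup) - squaredDistance(y.position, pickup))
      .map((agent) => agent.id);
  }
}

const assignmentStrategy: IAssignmentStrategy = new GreedyNearestStrategy();
const rankedAgentIds = assignmentStrategy.rank(
  { lat: 12.97, lng: 77.59 },
  [
    { id: "far", position: { lat: 13.1, lng: 77.7 } },
    { id: "near", position: { lat: 12.971, lng: 77.591 } },
  ],
);
```

Because callers only see the ranked list, a batched strategy could implement the same interface and be swapped in per zone without changing the reservation loop.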

2. Location updates: push frequency vs. pull-on-demand ​

Agents post GPS every few seconds to the server. Two extreme designs:

  • High-frequency push (every 3 seconds): fresh data, huge write volume (50k agents × 20 updates/min = 1M writes/min just on location).
  • Pull-on-demand: only read location when assigning or tracking, massive reduction in writes, but risk of stale data.

Realistic middle ground: 5-10 second push from the agent, write to an in-memory geo-index (Redis GEO, or a dedicated spatial service), with TTL. Database of record is updated lazily. Assignment queries hit the in-memory index β€” never the primary DB. This lets location be "fresh enough" without saturating the system-of-record.
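The middle ground can be illustrated with a small in-memory index that ignores entries older than a TTL. This is only a sketch: `InMemoryLocationIndex`, its methods, and the 30-second TTL are hypothetical, and a production system would use something like Redis GEO rather than a plain object. The helper at the end reproduces the write-volume arithmetic from the list above.

```typescript
interface GeoPoint { lat: number; lng: number; }
interface LocationEntry { point: GeoPoint; updatedAtMs: number; }

// Hypothetical stand-in for an in-memory geo-index with TTL.
class InMemoryLocationIndex {
  private entries: Record<string, LocationEntry> = {};

  constructor(private readonly ttlMs: number) {}

  // Last write wins; dropping an update only makes the entry slightly staler.
  update(agentId: string, point: GeoPoint, nowMs: number): void {
    this.entries[agentId] = { point, updatedAtMs: nowMs };
  }

  // Ids of agents whose last update is still within the TTL.
  freshAgentIds(nowMs: number): string[] {
    return Object.keys(this.entries).filter(
      (id) => nowMs - this.entries[id].updatedAtMs <= this.ttlMs,
    );
  }
}

// Location writes per minute for a fleet pushing at a fixed interval.
function writesPerMinute(agentCount: number, intervalSeconds: number): number {
  return agentCount * (60 / intervalSeconds);
}

const locationIndex = new InMemoryLocationIndex(30000); // assumed 30 s TTL
locationIndex.update("a1", { lat: 0, lng: 0 }, 0);
locationIndex.update("a2", { lat: 0, lng: 0 }, 25000);
const freshAt40s = locationIndex.freshAgentIds(40000); // a1 has expired by now
```

With 50k agents, moving from a 3-second to a 10-second push drops the volume from 1,000,000 to 300,000 writes per minute.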

(A deep dive on the streaming layer is HLD territory: mention it here and move on. In an interview, explicitly scope it out.)

3. Order state storage: row-based vs. event-sourced ​

  • Row-based (one row per order, state column updated in place): simple, easy queries. Loses history unless an order_audit table mirrors transitions.
  • Event-sourced (append OrderEvents, derive current state): perfect audit, easy to replay, but every read costs aggregation unless snapshotted.

Sensible hybrid: row-based for the "current" view (fast reads, O(1) state check), with a complementary append-only order_events table written transactionally. Current-row is the source of truth for business logic; events are the source of truth for audit, dispute resolution, and analytics. This is the pattern most real systems land on.
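A rough in-memory sketch of the hybrid, assuming hypothetical names (`HybridOrderStore`, `OrderEventRecord`) and a reduced set of states. In a real database the row update and the event append would share one transaction; here they simply happen in the same method.

```typescript
type OrderState = "PLACED" | "ACCEPTED" | "PREPARING" | "CANCELLED";
interface OrderRow { id: string; state: OrderState; version: number; }
interface OrderEventRecord {
  orderId: string; from: OrderState; to: OrderState; actor: string; atMs: number;
}

class HybridOrderStore {
  private rows: Record<string, OrderRow> = {};   // current view, O(1) state check
  private eventLog: OrderEventRecord[] = [];     // append-only audit trail

  create(id: string): void {
    this.rows[id] = { id, state: "PLACED", version: 1 };
  }

  // Updates the row and appends the event together; rejects stale transitions.
  transition(id: string, from: OrderState, to: OrderState, actor: string, atMs: number): boolean {
    const row = this.rows[id];
    if (!row || row.state !== from) return false;
    this.rows[id] = { id, state: to, version: row.version + 1 };
    this.eventLog.push({ orderId: id, from, to, actor, atMs });
    return true;
  }

  currentState(id: string): OrderState | undefined {
    const row = this.rows[id];
    return row ? row.state : undefined;
  }

  historyOf(id: string): OrderEventRecord[] {
    return this.eventLog.filter((e) => e.orderId === id);
  }
}

const orderStore = new HybridOrderStore();
orderStore.create("o1");
const acceptedOk = orderStore.transition("o1", "PLACED", "ACCEPTED", "restaurant", 1000);
const staleCancel = orderStore.transition("o1", "PLACED", "CANCELLED", "customer", 1001);
```

Business logic reads `currentState`; dispute review and analytics read `historyOf`, which never contains rejected transitions.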

4. Cart lifetime ​

  • Session-scoped (expires on logout or 30 min of inactivity): simple, but loses cart across devices.
  • User-scoped, persistent (saved server-side per customer): better UX, but you have to decide what to do with a 2-week-old cart from a closed restaurant. Usually: show it as "some items unavailable" and re-quote.

Real platforms persist carts server-side. Worth it. The price-changed-since-cart case is the interesting one; we handle it in the edge cases below.

5. Pricing quote lifetime and re-validation ​

Price shown in the cart is a quote. Between cart view and checkout, surge can kick in, promo can expire, or a menu item can go out of stock. The rule: re-quote at checkout. If the total changes beyond a small tolerance (say > 1%), show the user a "price updated" dialog before they submit. Never silently charge a different amount.
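The re-quote rule reduces to a small comparison. The sketch below assumes prices in integer cents and a hypothetical `checkRequote` helper; the 1% tolerance is the illustrative figure from the paragraph above, not a recommended value.

```typescript
type RequoteDecision = "PROCEED" | "PRICE_UPDATED";

// Compares the cart-time quote with the checkout-time quote.
// Amounts are integer cents to avoid floating-point drift in totals.
function checkRequote(
  quotedCents: number,
  currentCents: number,
  toleranceRatio: number,
): RequoteDecision {
  const delta = Math.abs(currentCents - quotedCents);
  return delta > quotedCents * toleranceRatio ? "PRICE_UPDATED" : "PROCEED";
}

const surgeCase = checkRequote(2000, 2300, 0.01);    // $20.00 quoted, $23.00 now
const roundingCase = checkRequote(2000, 2010, 0.01); // ten-cent drift
```

On "PRICE_UPDATED" the server would return a specific error code and the client would show the dialog; the order is never created at the stale price.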


Patterns Used ​

| Pattern | Where |
| --- | --- |
| State | Order and DeliveryAgent each encode a state machine with per-state allowed transitions. Implemented with guard tables over conditionals. |
| Strategy | IAssignmentStrategy (nearest vs. batched vs. ML-scored), IPricingStrategy (composed of IPricingRules). Each strategy is swap-at-runtime. |
| Observer | Order state changes notify customer + restaurant + agent via INotificationService. In a real system this is backed by pub/sub; in the design it's a single interface. |
| Factory | Order creation lives in OrderService.placeOrder, which centralizes the invariant that orders only come into existence via the checkout trapdoor (never new Order() elsewhere in the codebase). |
| Command | State transitions as audited events: every call to updateOrderState writes to order_events with actor, timestamp, from-state, to-state. Replayable, auditable. |
| Repository | IOrderRepository, IAgentRepository: abstract persistence. Lets the domain not care whether state lives in Postgres, DynamoDB, or a hybrid. |
| Facade | OrderService and AgentService present small, task-oriented APIs over a complex domain. Clients don't assemble pricing + assignment + notification themselves. |
| Chain of Responsibility | Pricing rules applied in sequence, each adding or modifying a line item. New rules slot into the chain without modification of existing ones. |
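As an illustration of the pricing chain in the last row, here is a sketch in which each rule receives the line items so far and returns an extended list. The `apply` signature, the `PricingInput` fields, and the three rules are assumptions made for this example; the text only names IPricingRule.

```typescript
interface LineItem { label: string; amountCents: number; }
interface PricingInput {
  subtotalCents: number;
  baseDeliveryFeeCents: number;
  surgeMultiplier: number;
  promoCents: number;
}
// Hypothetical rule shape: take the lines so far, return the lines after this rule.
interface IPricingRule { apply(input: PricingInput, lines: LineItem[]): LineItem[]; }

function totalCents(lines: LineItem[]): number {
  return lines.reduce((sum, line) => sum + line.amountCents, 0);
}

const subtotalRule: IPricingRule = {
  apply: (input, lines) => lines.concat([{ label: "subtotal", amountCents: input.subtotalCents }]),
};

const deliveryFeeRule: IPricingRule = {
  apply: (input, lines) => lines.concat([{
    label: "delivery_fee",
    amountCents: Math.round(input.baseDeliveryFeeCents * input.surgeMultiplier),
  }]),
};

const promoRule: IPricingRule = {
  // Discount is capped so the running total never goes negative.
  apply: (input, lines) => {
    const discount = Math.min(input.promoCents, totalCents(lines));
    return discount > 0 ? lines.concat([{ label: "promo", amountCents: -discount }]) : lines;
  },
};

// Pure: the same rules and input always yield the same breakdown.
function quote(rules: IPricingRule[], input: PricingInput): LineItem[] {
  return rules.reduce((lines, rule) => rule.apply(input, lines), [] as LineItem[]);
}

const breakdown = quote(
  [subtotalRule, deliveryFeeRule, promoRule],
  { subtotalCents: 2000, baseDeliveryFeeCents: 300, surgeMultiplier: 1.5, promoCents: 200 },
);
```

Adding a tax or tip rule means appending one more object to the array; rule order is part of the contract, since the promo cap depends on the lines before it.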

Concurrency Considerations ​

This is the section interviewers linger on. Food delivery is where textbook distributed-systems races become concrete.

Race 1: Two orders racing for the same idle agent ​

Two customers in the same zone place orders within 100 ms. Both assignment calls query the spatial index, both find AgentX as the nearest idle agent, and both try to reserve it. Without coordination, both orders get AgentX.

Resolution options:

  1. Optimistic lock on the agent row (our default). tryReserve(agentId, orderId) is a single atomic statement:

    sql
    UPDATE agents
    SET state = 'ASSIGNED', current_order_id = ?, version = version + 1
    WHERE id = ? AND state = 'IDLE' AND version = ?

    Only one of the two transactions can succeed; the other sees 0 rows updated and moves to the next candidate in its ranked list. No locks held across the network. Cheap. Default choice.

  2. Central dispatcher per region: a single-threaded (per-region) process serializes all assignment decisions for that region. No race: only one caller can make the decision at a time. Simple to reason about, but becomes a bottleneck; scale by sharding regions. DoorDash's historical architecture. Use when per-order optimization complexity exceeds what optimistic locking can handle (e.g., batching logic).

  3. Geohash-based partitioning: route all orders and agents in a geohash to the same worker. Each worker is the single writer for its cell. No cross-worker race, but you pay for cross-cell assignments (agent in adjacent cell). Hybrid: optimistic lock within cell, route-to-worker across cells.

For V1, the CAS-based optimistic lock is the right default. Call out the alternatives for peak-load scenarios.
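The retry-across-candidates behavior of option 1 can be simulated in memory. This sketch assumes a hypothetical `InMemoryAgentTable` whose `tryReserve` mirrors the WHERE clause of the SQL above; a real repository would issue that UPDATE and check the affected-row count instead.

```typescript
type AgentStatus = "IDLE" | "ASSIGNED";
interface AgentRow {
  id: string; state: AgentStatus; currentOrderId: string | null; version: number;
}

class InMemoryAgentTable {
  private rows: Record<string, AgentRow> = {};

  addIdle(id: string): void {
    this.rows[id] = { id, state: "IDLE", currentOrderId: null, version: 1 };
  }

  // Returns a copy so callers hold a snapshot, not the live row.
  read(id: string): AgentRow | undefined {
    const row = this.rows[id];
    return row ? { ...row } : undefined;
  }

  // Mirrors: UPDATE ... WHERE id = ? AND state = 'IDLE' AND version = ?
  tryReserve(agentId: string, orderId: string, expectedVersion: number): boolean {
    const row = this.rows[agentId];
    if (!row || row.state !== "IDLE" || row.version !== expectedVersion) return false;
    this.rows[agentId] = {
      id: agentId, state: "ASSIGNED", currentOrderId: orderId, version: row.version + 1,
    };
    return true;
  }
}

// Walks the ranked list and stops at the first successful reservation.
function assignFirstAvailable(
  table: InMemoryAgentTable, rankedIds: string[], orderId: string,
): string | null {
  for (const id of rankedIds) {
    const snapshot = table.read(id);
    if (snapshot && table.tryReserve(id, orderId, snapshot.version)) return id;
  }
  return null;
}

const agentTable = new InMemoryAgentTable();
agentTable.addIdle("agentX");
agentTable.addIdle("agentY");
// order2 reads AgentX before order1 reserves it, then tries with the stale version.
const staleSnapshot = agentTable.read("agentX");
const order1Agent = assignFirstAvailable(agentTable, ["agentX", "agentY"], "order1");
const staleAttempt = staleSnapshot
  ? agentTable.tryReserve("agentX", "order2", staleSnapshot.version)
  : false;
const order2Agent = assignFirstAvailable(agentTable, ["agentX", "agentY"], "order2");
```

The losing order never blocks: its reservation attempt fails immediately and it falls through to the next-ranked agent.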

Race 2: Order state transition contention ​

Customer clicks "Cancel" and restaurant clicks "Start Preparing" at the same moment. Both try to transition the order.

Resolution: the same CAS pattern on order rows. transitionState(orderId, fromState, toState, actor) becomes:

sql
UPDATE orders
SET state = ?, version = version + 1
WHERE id = ? AND state = ? AND version = ?

Exactly one wins. The loser gets a "state has changed" response and must re-fetch and decide whether to retry (e.g., if the customer's cancel raced with ACCEPTED → PREPARING, the cancel might still be valid for a fee; re-check and retry).

Race 3: Double-submit / retry idempotency ​

Mobile app retries a "place order" request because the first one timed out. Without guards, the customer's card is authorized twice and two orders appear.

Resolution: the client generates a unique requestId (UUID) per user action. The server enforces a unique index on (customer_id, request_id). Retries collide and return the result of the first request. This is cheap and universally applicable: every state-changing endpoint carries a requestId.

Race 4: Restaurant accept + customer cancel ​

Restaurant taps "Accept" and customer taps "Cancel" simultaneously. Two valid transitions from PLACED: one to ACCEPTED, one to CANCELLED. The CAS resolves: whichever writes first wins. The other sees the changed state and gets a friendly error. In practice, we also add a tiny (say 500 ms) "cancel lockout" window after acceptance to avoid the jarring UX of "cancelled just as it was accepted"; this is a product decision, not a technical one.

Race 5: Agent goes offline during ASSIGNED ​

Agent's phone dies while en route to pickup. System detects via heartbeat timeout. We must safely transition the agent OFFLINE and reassign the order. The danger: if the agent comes back online and thinks they still have the order, they'll show up at the restaurant after we've already given it to someone else.

Resolution: the server is the source of truth for the assignment. When we reassign, we flip the agent's current_order_id to null atomically. When the agent app reconnects, it pulls state from the server (not its own cache). The app UI must always be a projection of server state on reconnect.

Idempotency checklist ​

Every state-change endpoint:

  • Accepts a client-generated requestId.
  • Stores (endpoint, requestId) with the transition outcome.
  • On replay, returns the cached outcome β€” never re-executes.

Specifically: acceptOrder, rejectOrder, acceptOffer, declineOffer, markReady, markPickedUp, markDelivered, cancelOrder. All idempotent.
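The checklist can be sketched as a small wrapper that stores outcomes by (endpoint, requestId). `IdempotencyStore` and its `execute` method are hypothetical names; a production version would persist outcomes and rely on a unique index so that two concurrent first attempts cannot both execute.

```typescript
class IdempotencyStore<T> {
  private outcomes: Record<string, T> = {};

  // Runs `action` at most once per (endpoint, requestId); replays get the stored outcome.
  execute(endpoint: string, requestId: string, action: () => T): T {
    const key = endpoint + ":" + requestId;
    if (Object.prototype.hasOwnProperty.call(this.outcomes, key)) {
      return this.outcomes[key];
    }
    const outcome = action();
    this.outcomes[key] = outcome;
    return outcome;
  }
}

const idempotency = new IdempotencyStore<string>();
let executions = 0;
const placeOrderOnce = (): string => {
  executions += 1;
  return "order-" + executions;
};

const firstAttempt = idempotency.execute("placeOrder", "req-1", placeOrderOnce);
const retryAttempt = idempotency.execute("placeOrder", "req-1", placeOrderOnce); // replay
const separateAction = idempotency.execute("placeOrder", "req-2", placeOrderOnce);
```

The retry returns the first order's id without running the action again, so the card is authorized once.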


Scale & Extensibility ​

Geosharding (per-city services) ​

Each city is an independent operational unit: its own restaurants, agents, surge pricing, ops team. Shard by city_id. An order never crosses cities, so cross-shard transactions are essentially zero. This is the single biggest scale lever: you can run 50 cities on 50 independent clusters with no coupling.

Restaurant catalog caching ​

Catalog is read-heavy (every open of the app reads the restaurant list) and write-light (menu changes once a day, usually). Push the catalog into a CDN-fronted cache, invalidate on menu write. Search (filter by cuisine, sort by distance) runs against a search index (Elasticsearch / Typesense) rebuilt on catalog change. The database is never in the hot read path for browse.

New payment methods ​

The IPaymentService hides the processor. Adding Apple Pay, UPI, or a new regional processor is a new implementation of the interface. Payment flows (authorize-then-capture, authorize-and-capture-now, etc.) differ; model them with a small set of payment-flow strategies, not as branching logic in the order service.

Scheduled orders ​

Currently placeOrder triggers immediate restaurant notification. For a scheduled order (pickup at 7 PM), add a deliverBy field and defer the restaurant notification until deliverBy - prepTime - deliveryTime. A scheduled-job subsystem (which you already have for timeouts) handles the delay. The state machine is unchanged.
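The deferral time is simple arithmetic. In this sketch, `restaurantNotifyAtMs` is a hypothetical helper, and clamping to the current time (so an order scheduled too close to its deadline is notified immediately) is an added assumption rather than something the design specifies.

```typescript
// All values are epoch milliseconds or durations in milliseconds.
function restaurantNotifyAtMs(
  deliverByMs: number,
  prepTimeMs: number,
  deliveryTimeMs: number,
  nowMs: number,
): number {
  const ideal = deliverByMs - prepTimeMs - deliveryTimeMs;
  // Assumption: never schedule in the past; notify right away instead.
  return Math.max(ideal, nowMs);
}

const MINUTE_MS = 60 * 1000;
// Deliver in 120 min, 20 min prep, 30 min ride: notify 70 min from now.
const notifyAt = restaurantNotifyAtMs(120 * MINUTE_MS, 20 * MINUTE_MS, 30 * MINUTE_MS, 0);
```

The scheduled-job subsystem would enqueue the notification at `notifyAt`; everything after that point follows the existing order lifecycle.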

Group orders ​

One Order → many Carts, one per participant. Each cart quotes independently; the order sums them. Payment splits or single payer — both are payment-flow variants. The state machine doesn't change; only the cart→order transition gains a "merge carts" step.

Surge pricing from signal streams ​

SurgeRule queries a surge service that ingests supply (idle agent count) and demand (orders-per-minute) signals, outputs a multiplier per geohash. The rule is unchanged; the signal-processing pipeline is a separate HLD concern. Fallback: if surge service is unreachable, the rule returns zero surge. Graceful degradation.
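A sketch of the fallback, assuming a hypothetical synchronous `ISurgeService` (a real client would be asynchronous and time-bounded) and interpreting "zero surge" as a multiplier of 1.0:

```typescript
// Hypothetical interface; may throw when the surge service is unreachable.
interface ISurgeService { multiplierFor(geohash: string): number; }

const NO_SURGE = 1.0; // assumption: "zero surge" means the fee is multiplied by 1

function surgeMultiplierOrDefault(service: ISurgeService, geohash: string): number {
  try {
    const multiplier = service.multiplierFor(geohash);
    // Treat nonsensical values (NaN, below 1) the same as an outage.
    return isFinite(multiplier) && multiplier >= NO_SURGE ? multiplier : NO_SURGE;
  } catch (_err) {
    return NO_SURGE;
  }
}

const healthySurge: ISurgeService = { multiplierFor: () => 1.4 };
const brokenSurge: ISurgeService = {
  multiplierFor: () => { throw new Error("surge service unreachable"); },
};

const surgeWhenHealthy = surgeMultiplierOrDefault(healthySurge, "tdr1");
const surgeWhenDown = surgeMultiplierOrDefault(brokenSurge, "tdr1");
```

Degrading to no surge favors the customer during an outage; the pricing pipeline keeps working and only the surge line item is lost.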

Multi-leg orders ​

Customer orders from two restaurants on one checkout. Either: (a) two independent orders, one delivery each (simplest, probably twice the fee); (b) one order, one agent, two pickup legs. Option (b) requires the agent state machine to grow PICKING_UP_1, PICKING_UP_2 or we model it as a sequence of sub-orders with shared delivery. Doable, but a meaningful complication; design around it only if the feature is explicitly in scope.

Alcohol delivery ​

Requires ID check at handoff. Extend the PICKED_UP → DELIVERED transition with a precondition: a scanned ID confirming the recipient is of legal age. A new field requiresIdCheck: boolean on OrderItem, propagated to Order, and guarded in the transition. State machine unchanged; transition guard gains a predicate.
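One way to picture "the transition guard gains a predicate" is a table of guard functions keyed by transition. Every name below (`DeliveryItem`, `HandoffContext`, `transitionGuards`, `guardsPass`) is hypothetical, and the sketch covers only the guard predicates; whether a transition is allowed at all is assumed to be checked separately by the state machine.

```typescript
interface DeliveryItem { sku: string; requiresIdCheck: boolean; }
interface HandoffContext { items: DeliveryItem[]; idVerified: boolean; }
type TransitionGuard = (ctx: HandoffContext) => boolean;

// The order-level flag is derived from its items.
function orderRequiresIdCheck(items: DeliveryItem[]): boolean {
  return items.some((item) => item.requiresIdCheck);
}

// Extra preconditions per transition; transitions without an entry have none.
const transitionGuards: Record<string, TransitionGuard[]> = {
  "PICKED_UP->DELIVERED": [
    (ctx) => !orderRequiresIdCheck(ctx.items) || ctx.idVerified,
  ],
};

function guardsPass(from: string, to: string, ctx: HandoffContext): boolean {
  const guards = transitionGuards[from + "->" + to] || [];
  return guards.every((guard) => guard(ctx));
}

const wineOrder: DeliveryItem[] = [
  { sku: "pizza", requiresIdCheck: false },
  { sku: "wine", requiresIdCheck: true },
];
const blockedHandoff = guardsPass("PICKED_UP", "DELIVERED", { items: wineOrder, idVerified: false });
const allowedHandoff = guardsPass("PICKED_UP", "DELIVERED", { items: wineOrder, idVerified: true });
```

The set of states and transitions stays the same; only the list of predicates attached to one transition grows.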

Live tracking (HLD-flavored mention) ​

Data model: AgentLocation keyed by agentId, updated on heartbeat, with TTL. Customer's tracking view reads the agent assigned to their order's location, subscribed via websocket or server-sent events. The streaming layer (pub/sub topology, fan-out, backpressure) is HLD. LLD-wise: ILocationService.subscribeToAgentLocation(agentId, callback) is the seam. Mention it and move on; deep streaming is not what we're testing.


Edge Cases ​

  1. Restaurant rejects after acceptance. Transition ACCEPTED → CANCELLED by the restaurant. Rare (we already accepted), happens when ingredients run out. System triggers full refund, customer notified, any assigned agent released. UX: apologize loudly, offer a credit.

  2. Agent declines after assignment. Agent was offered the order and declined within the 30s window. tryReserve had succeeded, now we reverse: agent back to IDLE, call assignAgent again. The offer is a soft lock, not a commitment.

  3. Agent accepts, goes offline mid-delivery. Heartbeat times out. System transitions agent to OFFLINE and the order to a recovery state. Options: reassign (if still READY or earlier), escalate to ops (if food is already picked up), or in the worst case full refund + write-off. Rare; needs ops tooling, not code automation.

  4. Customer address unreachable. Agent arrives, can't find the customer. Agent calls support from in-app. Support can either (a) reschedule, (b) mark delivered with evidence, or (c) cancel with partial refund. The state machine gains an UNDELIVERABLE terminal state.

  5. Payment failure mid-checkout. Authorization declined. No order is created. Customer sees "try another card." Cart is preserved.

  6. Simultaneous cancellation by customer and agent. Two independent cancel requests arriving concurrently. Both are valid PREPARING → CANCELLED transitions. The CAS resolves: first write wins, second sees "already cancelled" and returns success (idempotent cancel). Cancellation reason is whichever wrote first; the tie is noted in the audit log.

  7. Surge kicks in during checkout. Cart quoted at $20, surge activates, checkout quote is $23. Server re-quotes at placeOrder. If delta exceeds tolerance (e.g., > 5% or > $2), return a specific error code; client shows a "price updated" dialog and lets the user re-submit.

  8. Restaurant closed at assignment time. Should be caught earlier, at cart creation (restaurant marked as closed → cart rejected). But cover the race: if a restaurant closes between cart and checkout, placeOrder rejects with RESTAURANT_UNAVAILABLE. If it closes between PLACED and the restaurant's first check, the restaurant rejects on their end.

  9. Agent already has an order. In V1 (single-batch), tryReserve only flips IDLE → ASSIGNED. An agent in ASSIGNED or DELIVERING is not IDLE, so the query never returns them, and even if it did the CAS would fail. Safe by construction. If we later add multi-batch (one agent carrying two orders), the agent's eligibility changes to IDLE ∨ (ASSIGNED ∧ slotsAvailable) and the reserve op becomes assign to slot k.

  10. Promo code exhausted between cart and checkout. PromoRule re-evaluates at placeOrder. If the code is no longer valid (out of budget, expired, already used by this customer), the rule returns promo = 0 and marks the quote changed. Same "price updated" UX as surge.

  11. Customer tip added post-delivery. After DELIVERED, allow a tip update up to 1 hour. Model as an addendum to the payment: a second capture on the saved payment method, not a modification of the completed charge. The Order.price.tip is updated; the rest of the breakdown is frozen.

  12. Delivery to wrong address. Customer claims they never received food. Agent's photo proof + GPS trail at time of DELIVERED go to dispute review. State doesn't move back; dispute resolution produces a refund as a compensating transaction, not a state rewind. Immutable order principle: don't rewrite history, add an adjustment.


Follow-up Questions ​

The ones an interviewer actually asks, with the hook into the depth answer:

  1. "How do you batch multiple orders to one agent?" Agents have capacity > 1. Assignment groups orders by pickup-window overlap and geographic proximity. Eligibility changes from state=IDLE to slotsAvailable > 0. State machine gains per-slot tracking. Pickup and delivery become per-order events, not per-agent events.

  2. "How do you model live tracking efficiently?" (HLD territory but worth a short answer.) Geohash-keyed pub/sub topics. Agent publishes to loc/{geohash}. Clients of that agent's order subscribe. Fan-out stays bounded. Data model: TTL'd location, never persisted.

  3. "What prevents the same agent being assigned to two orders?" Optimistic CAS on the agent row (WHERE state='IDLE'). Covered in detail in Concurrency.

  4. "How do you handle an agent refusing an offer repeatedly?" Track refusal rate per agent. If > threshold in a window, temporary de-prioritize in ranking (not blacklist). Business policy, not technical.

  5. "Scale to 10M orders/day: what changes?" Geoshard by city. Per-city clusters, independent. Restaurant catalog on CDN. Spatial index in-memory (Redis GEO). Audit/events stream to a data warehouse. Payments through a dedicated payments service. Nothing in the core design fundamentally changes; you just operate N copies of it.

  6. "How do you A/B test pricing?" Pricing rules carry a variant key. Experiment service assigns each order to a variant at cart time; the rule uses the variant. Breakdowns are tagged with variant for downstream analysis.

  7. "How do you add tips post-delivery?" Separate capture on saved payment, bumped Order.price.tip, agent earnings payout adjusted by the delta. One-hour window, enforced at the service layer.

  8. "How do you implement no-contact delivery without losing delivery confirmation?" Transition predicate on PICKED_UP → DELIVERED requires a photo uploaded by the agent (with geo-tag matching customer address). Customer can dispute within 24 h, which opens a review ticket.

  9. "What happens if the payments service is down?" placeOrder fails explicitly; we do not create orders without authorization. Show a "try again shortly" error. Queuing is not appropriate here; stale orders are worse than rejected orders.

  10. "How do you prevent a restaurant from being overloaded with orders?" Per-restaurant concurrency cap (e.g., maxSimultaneousOrders=30). At PLACED, restaurant can auto-reject if over cap. Dynamic cap based on observed kitchen throughput is a future extension.

  11. "Can two customers share delivery?" That's group-order or multi-leg. Addressed in Scale & Extensibility.

  12. "How do you handle the 'restaurant forgot to mark ready' case?" System timeout from PREPARING (e.g., 45 min). Transitions to an ops-review state, pings the restaurant, and optionally auto-releases the agent if they've been waiting > 10 min.


SDE2 vs SDE3: How the Bar Rises

| Dimension | SDE2 (strong) | SDE3 (senior) |
| --- | --- | --- |
| State machine completeness | Draws the order state machine with main states; names valid transitions. | Draws order and agent machines; encodes transition matrix with allowed actors; discusses how the two machines synchronize (agent IDLE↔ASSIGNED tied to order ACCEPTED). |
| Assignment race handling | Identifies the race. Proposes a lock. | Chooses CAS over locking; articulates the alternatives (central dispatcher, geohash partition) and when each wins; explains graceful retry across ranked candidates on reserve failure. |
| Idempotency | Mentions it for placement. | Treats it as a universal property; designs the requestId mechanism; names which endpoints need it (all state-change ones) and why. |
| Geosharding awareness | Knows we'd shard eventually. | Designs around city as the unit of sharding from the start; shows no cross-shard transactions in the core flow; calls out catalog caching and surge signal scoping per shard. |
| Failure modes | Handles the happy path + cancel. | Enumerates 10+ failure paths (payment down, agent offline mid-delivery, surge mid-checkout, double-cancel) with specific mitigations and explicit "what the user sees." |
| Extensibility for batching | Calls out "we could batch." | Shows the exact interface seam (IAssignmentStrategy), how agent eligibility changes, how the state machine extends per-slot, and the migration path (strategy-swap at zone level under peak). |
| Scheduled orders | Mentions as future feature. | Identifies exact point of divergence (defer notification), reuses existing state machine, names the dependency (job scheduler already present for timeouts). |
| Pricing composition | One computePrice function. | Ordered IPricingRule chain, each independently testable; calls out re-quote-at-checkout and the delta-tolerance UX; surge as graceful-degradation source. |
| Event sourcing vs. row-based | Picks one, probably row-based. | Articulates the hybrid (row for current truth, append-only events for audit) and names the specific uses (disputes, analytics, replay). |
| Interview scope management | Tries to design live tracking, runs out of time. | Explicitly scopes live tracking out ("that's an HLD question, here's the data-model seam and how the LLD is unaffected"), which signals senior judgment about what fits in the room. |

The senior move in this problem is not knowing more patterns. It is (a) identifying the state-machine-first framing in the first five minutes, (b) being ruthlessly precise about concurrency primitives (CAS, not "a lock"), and (c) showing which choices are load-bearing (assignment correctness, idempotency, per-city sharding) versus which are interchangeable defaults (greedy vs. batched, row vs. event-sourced). That ratio of decisions framed and ranked to patterns named is what separates the two levels.
