12 — Problem: Amazon Locker
Understanding the Problem
At its heart, an Amazon Locker is two problems wearing the same uniform. The first is an assignment problem: given an incoming parcel of some size and a location with a finite inventory of lockers of varying sizes, pick the "right" locker — one that fits, uses capacity sensibly, and doesn't collide with another simultaneous drop-off. The second is a state machine: each locker is either AVAILABLE, RESERVED, OCCUPIED, or out for MAINTENANCE, and every real-world event (reserve, drop-in, pickup, OTP expiry, hardware failure) is a transition that must be legal, observable, and recoverable.
Why this problem gets asked
Amazon Locker is a favourite LLD question for senior backend rounds because it forces you to reason about all three of the hard things simultaneously:
- Multi-entity modelling — Locations, lockers, parcels, users, OTPs, reservations, and assignment policy all need to live in the same object graph without turning into spaghetti.
- Assignment strategy — Smallest-fit vs largest-fit isn't a throwaway choice. It's a durable tradeoff between utilization today and flexibility tomorrow, and the interviewer wants to hear you articulate it.
- Expiry semantics and concurrency — Reservations have TTLs. Two couriers can arrive at the same moment. A pickup can race a forced-expiry. How you model these is a depth-probe for "do you know how real distributed systems fail?"
If you only model the happy path — parcel drops in, user picks it up — you've missed the interview. The points are in the state machine, the assignment seam, and the concurrency story.
Clarifying Questions
You: What locker sizes are we supporting, and is the set fixed or extensible?
Interviewer: Start with SMALL, MEDIUM, LARGE. Design so new sizes (say XL) can be added without changing core assignment logic.
You: Do we know the dimensions of an incoming parcel, or just a pre-computed size category?
Interviewer: Assume the upstream shipping service tags each parcel with a size category. Your system consumes that.
You: What happens if no locker of the required size is available? Can we upsize?
Interviewer: Yes, upsize is acceptable — a MEDIUM parcel can go into a LARGE locker. A LARGE parcel must never go into a SMALL. If even upsizing fails, the parcel is rejected back to the courier; that's a legitimate outcome.
You: Can a single user have multiple parcels in the same or different lockers at the same location?
Interviewer: Yes. Each parcel gets its own reservation and its own OTP. Don't bundle them.
You: OTP mechanics — how long is it valid, is it single-use, and what's the character space?
Interviewer: 6-digit numeric, valid for 72 hours from dropoff, single-use. Design so the expiry duration is configurable per-location.
You: If the OTP expires before pickup, what happens?
Interviewer: The parcel enters a "needs return to sender" workflow. The locker stays OCCUPIED until a courier reclaims it. That's a separate flow; just make sure your state machine has a place for it.
You: Can the OTP be reissued if the user loses it?
Interviewer: Yes — reissue is a supported operation. Invalidates the old OTP, extends the expiry by a configurable amount.
You: Concurrency — can two couriers simultaneously try to reserve the same locker?
Interviewer: Absolutely can happen in practice. Your design has to prevent double-booking. This is the core of the concurrency discussion.
You: Multi-location. Does a single service instance own a single location, or are we coordinating across thousands?
Interviewer: Thousands of locations globally. Assume each location is independently routable but availability queries need to be fast across regions.
You: Out of scope — hardware protocol for the physical locker door, payment, identity verification beyond OTP, real-time courier routing?
Interviewer: All out of scope. Assume the locker hardware exposes an idealised open(lockerId) RPC that either succeeds or raises a hardware error.
Functional Requirements
- Assign a locker — Given a parcel and a location, reserve the best-fit available locker atomically. Generate an OTP. Notify the recipient.
- Drop-off — A courier physically drops the parcel into the reserved locker; the system transitions the locker from RESERVED to OCCUPIED and starts the pickup TTL.
- Pickup — The recipient presents the OTP at the kiosk; if valid and matching the locker, the system opens the locker, marks the parcel picked-up, and returns the locker to AVAILABLE.
- Expire stale reservations — Reservations that go unpaired with a dropoff (courier never showed) auto-expire and release the locker.
- Expire stale pickups — Parcels that sit in an OCCUPIED locker past the pickup TTL enter a RETURN_TO_SENDER flow.
- Reissue OTP — User can request a new OTP for an active reservation; the old one is invalidated.
- Real-time availability — Queries like "how many lockers of size S are available at location L?" are served in near-real-time.
- Reassign on hardware failure — If a reserved locker breaks before pickup, reassign the parcel to another locker at the same location.
- Maintenance mode — Admins can mark a locker as unavailable without affecting existing reservations.
Out of scope: Physical hardware protocol, payments, identity verification beyond OTP, courier routing, warehouse-to-locker logistics.
Non-Functional Requirements
| Concern | Target / Expectation |
|---|---|
| Consistency (critical path) | Two concurrent reservations must never both hold the same locker. The reserveLocker method is the single strong-consistency hotspot in the system. |
| Availability | Locker lookup (available lockers at location L) should tolerate replica lag. Near-real-time (seconds) is acceptable. |
| Scale | ~10k locker locations globally, average ~50 lockers per location, ~5M parcels/day peak. A single location handles ~100 reservations/day (low QPS per location, but global fan-out is significant). |
| Latency | Reservation p99 < 200ms. Pickup (OTP validate + open) p99 < 500ms — includes hardware call. |
| Durability | Reservations and OTPs are durable the moment they're issued. Loss means a customer showing up to a locker that won't open. |
| Auditability | Every state transition is logged with actor, timestamp, reason. Required for disputes and loss investigations. |
| Offline tolerance | A locker location can lose network connectivity for minutes. It must not accept new reservations while offline, but in-flight pickups (OTP already issued) should still work locally. |
| Security | OTPs must be brute-force-resistant (rate limit per locker + lockout). OTPs are short-lived and single-use. |
Core Entities and Relationships
| Entity | Responsibility |
|---|---|
| Location | A physical site. Owns a set of Lockers, a timezone, and a configuration (OTP TTL, reservation TTL). |
| Locker | A single physical compartment. Has a size, a state (AVAILABLE / RESERVED / OCCUPIED / MAINTENANCE), and at most one active Reservation. |
| LockerSize | Enum with an ordering. SMALL < MEDIUM < LARGE < XL. Used for size-compatible matching. |
| Package / Parcel | Interchangeable here. Has an id, a declared size, recipient user id, origin metadata. A Parcel is the runtime instance that flows through the system. |
| User | Has an id, a contact method (email / SMS), and a preference for notification channel. |
| Otp | A value object: 6-digit code, creation timestamp, expiry timestamp, locker id, parcel id, attempts-remaining counter. |
| Reservation | Binds a Parcel → Locker → Otp with a state (PENDING_DROPOFF / AWAITING_PICKUP / COMPLETED / EXPIRED / RETURNED) and timestamps. |
| AssignmentStrategy | Policy object that, given a list of candidate lockers and a parcel, picks one. Pluggable (smallest-fit, largest-fit, bin-packing, etc.). |
| ExpiryPolicy | Policy object that computes the expiry timestamp for a new reservation or OTP. Per-location overrides. |
| LocationRegistry | Lookup service for locations by id or geospatial query. |
| NotificationService | Sends OTP + pickup instructions to the user. Abstracted so email / SMS / push are interchangeable. |
Key relationships:
- Location 1 ↔ N Locker (composition — a locker can't exist without a location).
- Locker 1 ↔ 0..1 Reservation (a locker has at most one active reservation).
- Parcel 1 ↔ 0..1 Reservation (a parcel has at most one active reservation at a time).
- Reservation 1 ↔ 1 Otp (current OTP; reissue replaces it but keeps history).
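A note on LockerSize: the assignment strategies later compare sizes numerically (l.size >= parcel.size, sorting candidates by size), which assumes the enum carries numeric values. A minimal sketch of that assumption:

// Assumed numeric encoding: SMALL < MEDIUM < LARGE < XL, so the assignment
// strategies can compare and sort sizes with plain arithmetic.
enum LockerSize {
  SMALL = 1,
  MEDIUM = 2,
  LARGE = 3,
  XL = 4,
}

// A parcel fits any locker at least as large as its declared size.
function fits(lockerSize: LockerSize, parcelSize: LockerSize): boolean {
  return lockerSize >= parcelSize;
}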
Interfaces
// Assignment policy — the seam where smallest-fit, largest-fit, etc. plug in.
interface IAssignmentStrategy {
select(candidates: Locker[], parcel: Parcel): Locker | null;
}
// OTP generation — pluggable for random vs HMAC-deterministic.
interface IOtpGenerator {
generate(reservationId: string, expiresAt: Date): Otp;
verify(otp: Otp, presentedCode: string): boolean;
}
// Notifications — push, SMS, email, chatbot — all hide behind this.
interface INotificationService {
notifyReserved(user: User, reservation: Reservation): Promise<void>;
notifyExpiring(user: User, reservation: Reservation): Promise<void>;
notifyPickupComplete(user: User, reservation: Reservation): Promise<void>;
}
// Expiry policy — per-location TTL configuration lives here.
interface IExpiryPolicy {
reservationExpiresAt(location: Location, createdAt: Date): Date;
pickupExpiresAt(location: Location, droppedAt: Date): Date;
otpExpiresAt(location: Location, createdAt: Date): Date;
}
// The narrow contract the locker kiosk / courier app talks to.
interface ILockerService {
reserveLocker(parcelId: string, locationId: string): Promise<Reservation>;
recordDropoff(reservationId: string): Promise<void>;
pickup(locationId: string, otpCode: string): Promise<Parcel>;
reissueOtp(reservationId: string): Promise<Otp>;
markMaintenance(lockerId: string, reason: string): Promise<void>;
releaseMaintenance(lockerId: string): Promise<void>;
}

The interfaces above are the only seams where new behaviour plugs in. Everything else is internal choreography.
Class Diagram
+--------------------+
| LocationRegistry |
+---------+----------+
|
v
+-----------+ +----------+ +---------+ +--------------+
| Parcel |---->| Reservation |<--| Locker |<>-----| Location |
+-----------+ +-----+----+ ^ +---+-----+ +-------+------+
| | | |
| | | |
v | v v
+-------+ | +---------+ +---------+
| Otp | | | State | | Config |
+-------+ | | Machine | | (TTLs) |
| +---------+ +---------+
|
+------------------+------------------+
| | |
v v v
+-------------------+ +-------------------+ +---------------+
| IAssignment | | IExpiryPolicy | | IOtpGenerator |
| Strategy | +-------------------+ +---------------+
+-------------------+
^ ^
| |
+---------+ +--------+
| Smallest| | Largest|
| Fit | | Fit |
+---------+ +--------+
+--------------------------+
| LockerService (facade) |
| - reserve/dropoff/pickup|
| - reissueOtp |
| - maintenance |
+--------------------------+
| |
v v
+-------------+ +-----------------------+
| Repository | | INotificationService |
| (persistence) +-----------------------+
+-------------+

Class Design
Locker and its state machine
The Locker is the single source of truth for physical compartment state. Four states, and transitions are strictly gated:
AVAILABLE --reserve--> RESERVED --dropoff--> OCCUPIED --releasePickup--> AVAILABLE
RESERVED  --releaseExpired--> AVAILABLE                  (courier never showed)
AVAILABLE --markMaintenance--> MAINTENANCE --releaseMaintenance--> AVAILABLE

Every transition is a method on Locker that validates the current state and atomically swaps it. Illegal transitions throw — never silently no-op.
enum LockerState {
AVAILABLE = "AVAILABLE",
RESERVED = "RESERVED",
OCCUPIED = "OCCUPIED",
MAINTENANCE = "MAINTENANCE",
}
class Locker {
constructor(
public readonly id: string,
public readonly locationId: string,
public readonly size: LockerSize,
private _state: LockerState = LockerState.AVAILABLE,
private _reservationId: string | null = null,
private _version: number = 0, // for optimistic concurrency
) {}
get state(): LockerState { return this._state; }
get reservationId(): string | null { return this._reservationId; }
get version(): number { return this._version; }
reserve(reservationId: string): void {
this.assertState(LockerState.AVAILABLE, "reserve");
this._state = LockerState.RESERVED;
this._reservationId = reservationId;
this._version++;
}
markDropoff(): void {
this.assertState(LockerState.RESERVED, "markDropoff");
this._state = LockerState.OCCUPIED;
this._version++;
}
releasePickup(): void {
this.assertState(LockerState.OCCUPIED, "releasePickup");
this._state = LockerState.AVAILABLE;
this._reservationId = null;
this._version++;
}
releaseExpired(): void {
// Called only for stale RESERVED lockers (courier never showed).
this.assertState(LockerState.RESERVED, "releaseExpired");
this._state = LockerState.AVAILABLE;
this._reservationId = null;
this._version++;
}
markMaintenance(): void {
if (this._state === LockerState.OCCUPIED || this._state === LockerState.RESERVED) {
throw new IllegalStateError("cannot enter maintenance with an active reservation");
}
this._state = LockerState.MAINTENANCE;
this._version++;
}
releaseMaintenance(): void {
this.assertState(LockerState.MAINTENANCE, "releaseMaintenance");
this._state = LockerState.AVAILABLE;
this._version++;
}
private assertState(expected: LockerState, op: string): void {
if (this._state !== expected) {
throw new IllegalStateError(`cannot ${op} from state ${this._state}`);
}
}
}

Why the version counter? It's the hook for optimistic concurrency at the persistence layer. Every write increments; the repository refuses to persist if the stored version has moved on. More on this in the concurrency section.
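To make that concrete, here is a minimal sketch of what saveIfVersion could look like against a SQL store. The table and column names and the Tx query API are illustrative assumptions — the design only requires "the write succeeds iff the stored version still matches what we read".

// Illustrative transaction handle — a stand-in for whatever DB client the repositories use.
interface Tx {
  query(sql: string, params: unknown[]): Promise<{ rowCount: number; rows: any[] }>;
}

// Hypothetical repository method: a compare-and-set on the locker's version column.
// The UPDATE succeeds only if nobody else has written the row since we read it.
class SqlLockerRepository {
  async saveIfVersion(locker: Locker, expectedVersion: number, tx: Tx): Promise<void> {
    const result = await tx.query(
      `UPDATE lockers
          SET state = $1, reservation_id = $2, version = $3
        WHERE id = $4 AND version = $5`,
      [locker.state, locker.reservationId, locker.version, locker.id, expectedVersion],
    );
    if (result.rowCount === 0) {
      // Either the row is gone or another writer bumped the version first — the caller
      // treats this as a conflict and retries with the next candidate locker.
      throw new OptimisticLockError(`stale write on locker ${locker.id}`);
    }
  }
}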
Reservation
A reservation is the contract between a parcel and a locker: which parcel, which locker, which OTP, when it expires, and what state it's in. Its own state machine mirrors and slightly extends the locker's.
enum ReservationState {
PENDING_DROPOFF = "PENDING_DROPOFF", // locker reserved, courier hasn't arrived
AWAITING_PICKUP = "AWAITING_PICKUP", // parcel in locker, OTP active
COMPLETED = "COMPLETED",
EXPIRED_BEFORE_DROPOFF = "EXPIRED_BEFORE_DROPOFF",
EXPIRED_AFTER_DROPOFF = "EXPIRED_AFTER_DROPOFF", // RETURN_TO_SENDER
}
class Reservation {
constructor(
public readonly id: string,
public readonly parcelId: string,
public readonly lockerId: string,
public readonly locationId: string,
public otp: Otp,
public state: ReservationState,
public readonly createdAt: Date,
public dropoffDeadline: Date,
public pickupDeadline: Date | null,
public version: number = 0,
) {}
}

Reservation is deliberately near-anaemic — most of the behaviour lives in the service, because transitioning a reservation usually requires touching the locker too, and we want those two writes inside one transaction.
AssignmentStrategy (Strategy Pattern)
class SmallestFitStrategy implements IAssignmentStrategy {
select(candidates: Locker[], parcel: Parcel): Locker | null {
const fitting = candidates
.filter(l => l.state === LockerState.AVAILABLE && l.size >= parcel.size)
.sort((a, b) => a.size - b.size);
return fitting[0] ?? null;
}
}
class LargestFitStrategy implements IAssignmentStrategy {
select(candidates: Locker[], parcel: Parcel): Locker | null {
const fitting = candidates
.filter(l => l.state === LockerState.AVAILABLE && l.size >= parcel.size)
.sort((a, b) => b.size - a.size);
return fitting[0] ?? null;
}
}

OtpService
Two implementations worth discussing — random with DB lookup, or HMAC-deterministic for stateless verification. We'll default to random because it's simpler; HMAC variant is in the tradeoffs section.
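The verify method below leans on a constantTimeEquals helper that isn't defined in the snippet; a minimal sketch of it, assuming a Node.js runtime with the built-in crypto module:

import { timingSafeEqual } from "node:crypto";

// Constant-time string comparison so OTP verification doesn't leak, via response
// timing, how many leading characters of a guess matched.
function constantTimeEquals(a: string, b: string): boolean {
  const bufA = Buffer.from(a, "utf8");
  const bufB = Buffer.from(b, "utf8");
  if (bufA.length !== bufB.length) return false; // timingSafeEqual throws on length mismatch
  return timingSafeEqual(bufA, bufB);
}

With that helper in place, the default generator: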
class RandomOtpGenerator implements IOtpGenerator {
generate(reservationId: string, expiresAt: Date): Otp {
const code = this.sixDigits();
return new Otp(code, new Date(), expiresAt, reservationId, /* attempts */ 5);
}
verify(stored: Otp, presented: string): boolean {
if (stored.attemptsRemaining <= 0) return false;
if (new Date() > stored.expiresAt) return false;
// constant-time compare avoids timing side channels
return constantTimeEquals(stored.code, presented);
}
private sixDigits(): string {
const n = crypto.randomInt(0, 1_000_000);
return n.toString().padStart(6, "0");
}
}

Key Methods
reserveLocker — the race-safe critical path
This is the method the interview is built around. The interviewer will push on every line. Here's the structure — DB-level optimistic concurrency, retry on conflict, fallback across candidates.
class LockerService implements ILockerService {
constructor(
private readonly lockerRepo: ILockerRepository,
private readonly reservationRepo: IReservationRepository,
private readonly parcelRepo: IParcelRepository,
private readonly locationRegistry: LocationRegistry,
private readonly strategy: IAssignmentStrategy,
private readonly expiryPolicy: IExpiryPolicy,
private readonly otpGen: IOtpGenerator,
private readonly notifier: INotificationService,
private readonly clock: Clock,
) {}
async reserveLocker(parcelId: string, locationId: string): Promise<Reservation> {
const parcel = await this.parcelRepo.findOrThrow(parcelId);
const location = await this.locationRegistry.findOrThrow(locationId);
// 1. Read a fresh snapshot of candidate lockers.
const candidates = await this.lockerRepo.findByLocationAndState(
locationId,
LockerState.AVAILABLE,
);
// 2. Ask the strategy to rank. This is the only piece that changes
// between smallest-fit, largest-fit, or future bin-packing heuristics.
const ranked = this.rankAllFitting(candidates, parcel);
if (ranked.length === 0) {
throw new NoLockerAvailableError(locationId, parcel.size);
}
// 3. Try each candidate in order. Optimistic concurrency means two
// concurrent reservers can race on the same locker; the loser retries
// with the next candidate.
for (const locker of ranked) {
try {
return await this.tryReserve(parcel, location, locker);
} catch (err) {
if (err instanceof OptimisticLockError) continue;
throw err;
}
}
throw new NoLockerAvailableError(locationId, parcel.size);
}
private async tryReserve(
parcel: Parcel,
location: Location,
locker: Locker,
): Promise<Reservation> {
const now = this.clock.now();
const reservationId = generateId();
const otp = this.otpGen.generate(
reservationId,
this.expiryPolicy.otpExpiresAt(location, now),
);
const reservation = new Reservation(
reservationId,
parcel.id,
locker.id,
location.id,
otp,
ReservationState.PENDING_DROPOFF,
now,
this.expiryPolicy.reservationExpiresAt(location, now),
null,
);
// 4. Mutate the in-memory locker and persist locker + reservation
// inside ONE transaction guarded by the locker's version.
locker.reserve(reservationId);
await this.reservationRepo.withTransaction(async tx => {
// throws OptimisticLockError if the DB's version has advanced
await this.lockerRepo.saveIfVersion(locker, locker.version - 1, tx);
await this.reservationRepo.create(reservation, tx);
});
// 5. Fire-and-forget notification — must not block the critical path,
// and must not fail the reservation if the notifier is down.
this.notifier.notifyReserved(parcel.recipient, reservation).catch(err => {
logger.warn({ err, reservationId }, "notification failed; will retry async");
});
return reservation;
}
private rankAllFitting(candidates: Locker[], parcel: Parcel): Locker[] {
const ranked: Locker[] = [];
let pool = [...candidates];
// Strategy may only return the top choice; we loop to get a fallback list.
while (pool.length) {
const pick = this.strategy.select(pool, parcel);
if (!pick) break;
ranked.push(pick);
pool = pool.filter(l => l.id !== pick.id);
}
return ranked;
}
async pickup(locationId: string, otpCode: string): Promise<Parcel> {
// Look up reservation by (locationId, otpCode) using an indexed table.
const reservation = await this.reservationRepo.findActiveByOtp(locationId, otpCode);
if (!reservation) {
// Intentionally generic error — no info leak about which field failed.
throw new InvalidOtpError();
}
if (!this.otpGen.verify(reservation.otp, otpCode)) {
await this.reservationRepo.decrementOtpAttempts(reservation.id);
throw new InvalidOtpError();
}
if (reservation.state !== ReservationState.AWAITING_PICKUP) {
throw new InvalidReservationStateError(reservation.state);
}
const locker = await this.lockerRepo.findOrThrow(reservation.lockerId);
locker.releasePickup();
reservation.state = ReservationState.COMPLETED;
await this.reservationRepo.withTransaction(async tx => {
await this.lockerRepo.saveIfVersion(locker, locker.version - 1, tx);
await this.reservationRepo.update(reservation, tx);
});
const parcel = await this.parcelRepo.findOrThrow(reservation.parcelId);
this.notifier.notifyPickupComplete(parcel.recipient, reservation).catch(() => {});
return parcel;
}
async expireStaleReservations(now: Date = this.clock.now()): Promise<number> {
// Batched cleanup. Called by a scheduler — cron, timer wheel, or
// DB-TTL-driven. See concurrency section for tradeoffs.
const stale = await this.reservationRepo.findStale(now, /* batch */ 500);
let expired = 0;
for (const r of stale) {
try {
await this.expireOne(r);
expired++;
} catch (err) {
// Don't fail the whole batch for one bad row.
logger.error({ err, id: r.id }, "failed to expire reservation");
}
}
return expired;
}
private async expireOne(r: Reservation): Promise<void> {
const locker = await this.lockerRepo.findOrThrow(r.lockerId);
if (r.state === ReservationState.PENDING_DROPOFF) {
locker.releaseExpired();
r.state = ReservationState.EXPIRED_BEFORE_DROPOFF;
} else if (r.state === ReservationState.AWAITING_PICKUP) {
// Parcel IS in the locker. Do NOT free the locker — enter RTS flow.
r.state = ReservationState.EXPIRED_AFTER_DROPOFF;
} else {
return;
}
await this.reservationRepo.withTransaction(async tx => {
if (r.state === ReservationState.EXPIRED_BEFORE_DROPOFF) {
await this.lockerRepo.saveIfVersion(locker, locker.version - 1, tx);
}
await this.reservationRepo.update(r, tx);
});
}
}

A few things to notice:
- No explicit lock around the read-decide-write loop. The atomicity lives inside saveIfVersion, which fails if another writer moved the row. That means the critical section is exactly one DB round-trip — no app-level mutexes, no distributed locks for the hot path.
- Fallback list. If two reservers race on the same best-fit locker, the loser retries with the next-best. The tail latency cost of a conflict is bounded by the length of the fallback list, not by a lock-wait queue.
- Notifications are off the critical path. A down email service must not block deliveries. Reliability for notifications is a separate retry queue.
- saveIfVersion is the only place strong consistency is required. Everywhere else we can read from a replica.
Design Decisions & Tradeoffs
Smallest-fit vs largest-fit assignment
| Strategy | Pro | Con | When to pick |
|---|---|---|---|
| Smallest-fit | Maximises utilization. Keeps large lockers free for parcels that genuinely need them. Reduces the probability of rejecting a future oversized parcel. | A burst of small parcels consumes small lockers; when one more small parcel arrives and the small lockers are all taken, you upsize to medium, kicking off a cascade. | Default. Works for most locations, most of the time. |
| Largest-fit | Keeps the small lockers as buffer inventory for the common case (small parcels dominate). Simpler mental model for couriers. | Wastes a lot of cubic volume — a 10cm parcel sitting in a LARGE locker. You run out of LARGE lockers faster than you should. | Specific locations where oversized parcels are rare and small-parcel volume is bursty. |
| Bin-packing (offline) | Optimal utilization if you have a window of upcoming parcels to look ahead at. | Requires batching — defers each reservation by the batch window. Unacceptable latency for drive-up couriers. | Never in real-time. Possible for scheduled overnight batching. |
| Round-robin within size | Even hardware wear across lockers. | No utilization benefit — assumes all lockers of a given size are equivalent, which is true physically but irrelevant to assignment. | As a tiebreaker inside smallest-fit / largest-fit, not as the primary strategy. |
Default: smallest-fit with round-robin tiebreak. It's the sensible starting point, and the interviewer expects you to justify it rather than reach for it unexamined.
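A sketch of how the round-robin tiebreak could compose with smallest-fit — the per-size cursor and its naming are illustrative, not part of the core design:

// Hypothetical smallest-fit strategy with a round-robin tiebreak: among the
// smallest lockers that fit, rotate a per-size cursor so hardware wear spreads
// evenly instead of always landing on the same compartment.
class SmallestFitRoundRobinStrategy implements IAssignmentStrategy {
  private cursorBySize = new Map<LockerSize, number>();

  select(candidates: Locker[], parcel: Parcel): Locker | null {
    const fitting = candidates
      .filter(l => l.state === LockerState.AVAILABLE && l.size >= parcel.size)
      .sort((a, b) => a.size - b.size);
    if (fitting.length === 0) return null;

    // All lockers sharing the smallest fitting size are equivalent; rotate among them.
    const smallest = fitting[0].size;
    const tied = fitting.filter(l => l.size === smallest);
    const cursor = this.cursorBySize.get(smallest) ?? 0;
    this.cursorBySize.set(smallest, cursor + 1);
    return tied[cursor % tied.length];
  }
}

The cursor is in-memory and resets on restart, which is fine — the tiebreak is about wear-levelling, not correctness.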
OTP generation: random with DB lookup vs HMAC-deterministic
| Approach | Pro | Con |
|---|---|---|
| Random + DB lookup | Pure random; no key management. Easy to rate-limit and track attempts because verify hits the DB. Revocation is trivial — delete the row. | Verification requires a DB read. OTP collisions possible (mitigated by scoping uniqueness to (locationId, code) with retry on insert). |
| HMAC-deterministic (code = truncate(HMAC(secret, reservationId || expiry))) | Stateless verification — kiosk can validate offline if it has the secret and a cached reservation. No collisions. | Key distribution and rotation become a real problem. Can't revoke a single OTP without invalidating a whole generation. Brute-force counter has to live somewhere anyway, so you're back to DB. |
Default: random + DB lookup. HMAC is tempting for offline operation but the operational cost of key management rarely pays off for a system where the DB is already in the hot path.
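For completeness, a minimal sketch of the HMAC-deterministic variant, assuming a Node.js HMAC, a shared secret distributed to kiosks, and an Otp object that exposes the reservation id it was issued for (all names illustrative):

import { createHmac } from "node:crypto";

// Hypothetical HMAC-based generator: the code is derived from the reservation id
// and expiry, so any holder of the secret can re-derive and verify it offline.
class HmacOtpGenerator implements IOtpGenerator {
  constructor(private readonly secret: string) {}

  generate(reservationId: string, expiresAt: Date): Otp {
    const code = this.derive(reservationId, expiresAt);
    return new Otp(code, new Date(), expiresAt, reservationId, /* attempts */ 5);
  }

  verify(stored: Otp, presented: string): boolean {
    if (new Date() > stored.expiresAt) return false;
    // Re-derive instead of comparing against a stored random value.
    // Note: the brute-force attempt counter still has to live in the DB —
    // re-derivation alone can't track attempts, as the table above points out.
    return this.derive(stored.reservationId, stored.expiresAt) === presented;
  }

  private derive(reservationId: string, expiresAt: Date): string {
    const mac = createHmac("sha256", this.secret)
      .update(`${reservationId}:${expiresAt.getTime()}`)
      .digest();
    // Truncate to 6 decimal digits, HOTP-style.
    return (mac.readUInt32BE(0) % 1_000_000).toString().padStart(6, "0");
  }
}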
Reservation storage: DB row per reservation vs locker.reservation field
| Approach | Pro | Con |
|---|---|---|
| Separate reservations table | Reservations have history — a locker accumulates dozens. History is cheap and queryable. Multi-entity analytics (pickup latency distribution, expiry rates) just work. | Two rows to write on every transition. Slightly more complex consistency story. |
| Single field on locker | Simpler schema. One row per locker, current reservation embedded. | Loses history. Auditing ("when did this locker last malfunction?") becomes impossible without a separate log. Expiry/cleanup requires scanning lockers instead of an indexed reservations table. |
Default: separate reservations table. The history is free and the ops/analytics value is large.
Patterns Used
| Pattern | Where | Why |
|---|---|---|
| Strategy | IAssignmentStrategy, IExpiryPolicy, IOtpGenerator | The three knobs Amazon wants to tune per-region or per-experiment without touching core code. |
| State | Locker and Reservation state machines | Makes illegal transitions impossible at compile + runtime. Self-documenting. |
| Observer | INotificationService wired off the critical path; onReserved, onPickedUp, onExpired events on an internal bus | Notifications, analytics, and audit logs subscribe without coupling into the core flow. |
| Factory | LockerFactory (instantiates Locker + persistence row given size and location); LocationFactory (same for location provisioning) | Keeps construction centralized so new sizes or config defaults don't require ripping up call sites. |
| Repository | ILockerRepository, IReservationRepository, IParcelRepository | Abstracts the DB; the service stays pure domain logic. Makes the concurrency story concrete (saveIfVersion lives on the repo). |
| Facade | LockerService | Hides the five-to-eight collaborators (repos, strategies, notifier, clock) behind one coherent surface. |
| Command (optional extension) | Wrapping reservation state transitions for undo / audit replay | Not essential for the base design, but trivial to bolt on. |
Concurrency Considerations
The central race is embarrassingly concrete: two couriers arrive at the same location within milliseconds of each other, both wanting a MEDIUM locker. The strategy picks the same locker for both. Without coordination, both succeed, and one of them ends up dropping the parcel into a locker that the other already claimed — a disaster.
Here are the credible options.
| Approach | How it works | Pro | Con |
|---|---|---|---|
| Pessimistic DB row lock | SELECT ... FOR UPDATE on the candidate locker row inside the reservation transaction. | Dead simple to reason about. Zero chance of double-booking. | Lock contention if many couriers target the same locker. Holding a lock across a slow notification = deadlock risk. Requires care to not lock the whole table. |
| Optimistic concurrency (version column) | Read locker with version V; write only if DB version is still V. On failure, retry with next candidate. | No lock contention. Scales horizontally. Matches the "try candidates in order" fallback we already need. | Retry storms if conflict rate is high (many couriers competing for last locker). Correctness requires atomic compare-and-set — every repo call has to opt in. |
| Distributed lock (Redis / ZooKeeper) | Acquire a lock keyed by lockerId before the transaction; release after. | Works across DBs; supports cross-shard reservations. | Another moving part to operate. Lock expiry vs transaction duration is an eternal source of bugs. Slower than DB-native options. |
| Single-writer per location via a queue | All reservation requests for location L go through a single partition (Kafka key = locationId, one consumer processes sequentially). | Zero contention — serialized by construction. Natural place to add FIFO fairness. | Adds end-to-end latency (serialization overhead per request). Partition hotspotting on popular locations. Operationally heavier than DB-only. |
Senior take — pick one, justify:
I'd go with optimistic concurrency + retry with fallback list as the default, for three reasons. First, the conflict rate at any given location is low (~100 reservations/day per location is roughly one every 15 minutes on average — contention is rare). Second, the retry logic is something we need anyway, because even with a lock, a locker can go MAINTENANCE between the read and the write, and we'd want to fall back to the next candidate. Third, it keeps the hot path to a single DB round-trip, which matters for p99 latency.

I'd reach for a distributed lock only for cross-location operations (e.g., reassigning a broken locker's parcel to a different location). I'd reach for a per-location queue if and when we see pathological conflict rates at flagship locations during peak — but that's a scale-up problem, not a day-one problem.
The queue approach is also the right answer if the interviewer pivots to "what if the physical hardware itself needs serialized access?" — a single locker door can only be physically operated by one process at a time, and queuing hardware commands per-location is the clean way.
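For contrast, the pessimistic variant from the table is worth being able to sketch on demand. A minimal illustration built on SELECT ... FOR UPDATE, reusing the illustrative Tx handle from the saveIfVersion sketch earlier (table and column names are assumptions):

// Hypothetical pessimistic reservation step: take a row lock on the candidate locker
// so concurrent reservers on the same locker serialize instead of conflicting.
async function tryReservePessimistic(
  tx: Tx,
  lockerId: string,
  reservationId: string,
): Promise<boolean> {
  // Concurrent transactions targeting this row block here until we commit or roll back.
  const found = await tx.query(
    `SELECT state FROM lockers WHERE id = $1 FOR UPDATE`,
    [lockerId],
  );
  if (found.rowCount === 0 || found.rows[0].state !== "AVAILABLE") {
    return false; // lost the locker (taken or MAINTENANCE) — caller falls back to the next candidate
  }
  await tx.query(
    `UPDATE lockers SET state = 'RESERVED', reservation_id = $1 WHERE id = $2`,
    [reservationId, lockerId],
  );
  return true;
}

The lock is held for the whole transaction, which is exactly why the earlier design keeps notifications and hardware calls outside it.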
Scale & Extensibility
- Multi-region partitioning. Lockers are geographically immutable — a locker in Seattle is never used to reserve a parcel in Tokyo. Shard the lockers and reservations databases by locationId. A single region owns writes for its locations; other regions replicate read-only.
- Eventual consistency on availability lookups. The "how many MEDIUM lockers are free at this location?" query is served from a replica or a cache with a TTL of 1-5 seconds. Stale by a few seconds is fine — the reservation itself is the source of truth and will reject if the cached count lied.
- Offline locker operation. Each location runs a thin edge service that caches active reservations (OTP + locker id) on a rolling basis. If the location loses connectivity, pickups work locally — the edge validates OTPs from cache and opens the door (see the sketch after this list). New reservations are refused while offline (can't guarantee no double-booking). When connectivity returns, the edge replays its pickup log.
- Maintenance mode rollout. Admin action sets state = MAINTENANCE on one or more lockers. Existing reservations are unaffected. Availability queries automatically skip maintenance lockers.
- Multi-parcel reservation. A single user with multiple parcels today gets N reservations and N OTPs. Extension: a "bundle reservation" type that groups N parcels into one OTP. Cleanly slots in behind a new ReservationType.BUNDLE and a BundleReservation subclass — no change to locker logic.
- Locker-to-locker transfer. Needed when a locker fails after dropoff. Admin flow: open the source locker, move the parcel to a new available locker, update the reservation's lockerId (new version bump); the OTP is unchanged. Because Reservation references Locker by id, the swap is atomic.
- Oversize handling via partner carrier. If no locker of any size fits, the strategy returns null and reserveLocker throws NoLockerAvailableError; the upstream shipping service either reroutes to a different location or hands off to a partner (UPS home delivery, etc.). The caller decides the fallback.
- New locker size (XL) rollout. Add the enum member, provision hardware at target locations, configure which locations have XL. The smallest-fit strategy handles the new size automatically (ordering in the enum is all it cares about). No deploy needed on the service side beyond the enum change.
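A minimal sketch of the offline pickup path at the edge — the cache shape, hardware client, and replay log are illustrative assumptions about how the edge service could be structured:

// Hypothetical edge-side pickup: validate the OTP against a locally cached snapshot
// of active reservations, open the door, and append to a replay log that is synced
// back to the region once connectivity returns.
interface CachedReservation {
  reservationId: string;
  lockerId: string;
  otpCode: string;
  otpExpiresAt: number; // epoch millis
}

class OfflinePickupHandler {
  constructor(
    private readonly cache: Map<string, CachedReservation>, // keyed by otpCode
    private readonly hardware: { open(lockerId: string): Promise<void> },
    private readonly replayLog: CachedReservation[],
  ) {}

  async pickup(otpCode: string): Promise<void> {
    const hit = this.cache.get(otpCode);
    if (!hit || Date.now() > hit.otpExpiresAt) {
      throw new Error("invalid or expired OTP"); // same generic error as the online path
    }
    await this.hardware.open(hit.lockerId);
    this.cache.delete(otpCode); // single-use locally too
    this.replayLog.push(hit);   // replayed to the regional service when back online
  }
}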
Edge Cases
- No locker fits (undersized + can't upsize). Throw NoLockerAvailableError. The shipping service returns the parcel to the courier. Log for capacity planning.
- Locker hardware malfunction mid-pickup. The hardware.open() RPC fails after we transitioned OCCUPIED → AVAILABLE in the DB. Compensate: roll the locker into MAINTENANCE, roll the reservation back to AWAITING_PICKUP, notify the user to go to a kiosk attendant. This is why we want the DB transaction to commit before the hardware call, and an explicit compensating action for a hardware failure.
- OTP expired before user arrives. The reservation transitions to EXPIRED_AFTER_DROPOFF. The user can still pick up via an attendant (separate flow) or a reissued OTP. Default: auto-reissue a single time with a notification.
- Parcel never picked up (multi-day). After the pickup TTL (e.g., 72h), the reservation enters EXPIRED_AFTER_DROPOFF / RTS. A nightly job emits events to the returns pipeline. A courier swings by to reclaim. The locker stays OCCUPIED until the courier physically retrieves the parcel.
- Concurrent pickup attempts with the same OTP. The same user tries twice, or two devices use the same OTP simultaneously. The first successful verify transitions the reservation to COMPLETED; the second sees state !== AWAITING_PICKUP and throws. Single-use is enforced by that state transition rather than by the attempts counter.
- OTP brute force. Per-OTP attempt counter (5 tries). On exhaustion, invalidate the OTP; require a new reissue initiated from the user's verified account. Per-location rate limit (say, 10 failed OTPs/minute from one kiosk) to catch scanner attacks.
- Locker opened but parcel not removed. The kiosk physically opens the door, the user walks away. We don't know. Mitigation: a weight sensor or IR beam reports "door closed, contents still present." Without sensors: mark COMPLETED and handle the re-discovery at the next maintenance sweep.
- Courier drops into the wrong locker. The courier scans the parcel at a kiosk; the kiosk says "locker 17." The courier opens 18 and dumps the parcel. Sensor-free systems can't detect this. Mitigation: require the courier to scan a barcode on the inside of the locker door at dropoff — a mismatched barcode is a hard error. In the model, this is a validation on recordDropoff(reservationId, scannedLockerId).
- Race between pickup and forced-expiry. A user scans the OTP at 00:00:00 while the cron job is expiring their reservation at 00:00:01. Optimistic concurrency resolves this cleanly: whichever write commits first wins; the other sees a stale version and re-reads. If pickup wins, the reservation is COMPLETED — expiry sees that and skips. If expiry wins, pickup sees EXPIRED_AFTER_DROPOFF and surfaces the "see attendant" flow.
- Duplicate reservation for the same parcel. The caller retries a reserveLocker call after a timeout, not knowing the first succeeded. Mitigation: an idempotency key on the reserve endpoint (parcel id + caller request id). The second call returns the first reservation (see the sketch after this list).
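A minimal sketch of that idempotency guard, assuming an idempotency-key store in front of the service (the store interface and key format are illustrative):

// Hypothetical idempotent wrapper around reserveLocker: the same (parcelId, requestId)
// pair always maps to the same reservation, so client retries are safe.
class IdempotentReserveEndpoint {
  constructor(
    private readonly service: ILockerService,
    private readonly keyStore: Map<string, Reservation>, // stand-in for a durable KV store
  ) {}

  async reserve(parcelId: string, locationId: string, requestId: string): Promise<Reservation> {
    const key = `${parcelId}:${requestId}`;
    const existing = this.keyStore.get(key);
    if (existing) return existing; // the first attempt already succeeded

    const reservation = await this.service.reserveLocker(parcelId, locationId);
    // A durable store with a conditional put would close the remaining race between
    // two concurrent retries; the Map here only illustrates the shape.
    this.keyStore.set(key, reservation);
    return reservation;
  }
}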
Follow-up Questions
How do you expire stale reservations at scale? Three common approaches and their tradeoffs:
| Approach | Pro | Con |
|---|---|---|
| Cron job scanning WHERE expires_at < now AND state = PENDING | Simple. Runs on any SQL DB. Tunable batch sizes. | Scan load grows with total reservations. Latency = cron interval (minute-granularity at best). |
| Timer wheel (per-service in-memory) | Millisecond precision. No DB scan. | Lost on restart without persistence. Doesn't scale past one box without sharding. |
| DB-native TTL (DynamoDB TTL, Redis EXPIRE) | Zero app code. | Expiry is "best effort within minutes" in DynamoDB. Doesn't give you compensating actions (can't release the locker in the expiry callback — it's a delete, not a trigger). |

Senior answer: cron at the DB level for the decision, combined with a publish to a work queue that fans out the compensating actions (release lockers, send notifications). The cron is just "find expired rows, emit events" — everything else is event-driven and horizontally scalable.
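A minimal sketch of that split — the sweeper only finds rows and emits events; the event name, queue client, and repo method are illustrative assumptions:

// Hypothetical expiry sweeper: the cron tick finds expired reservations and publishes
// one event per row; workers elsewhere perform the compensating actions (release the
// locker, notify the user, kick off return-to-sender).
interface EventQueue {
  publish(topic: string, payload: unknown): Promise<void>;
}

async function sweepExpiredReservations(
  reservationRepo: { findStale(now: Date, limit: number): Promise<Reservation[]> },
  queue: EventQueue,
  now: Date = new Date(),
): Promise<number> {
  const stale = await reservationRepo.findStale(now, 500);
  for (const r of stale) {
    // Emitting is idempotency-friendly: consumers dedupe on reservation id + state.
    await queue.publish("reservation.expired", { reservationId: r.id, state: r.state });
  }
  return stale.length;
}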
How do you handle a locker hardware failure mid-flow? Split the operation into "commit DB state" followed by "call hardware." If hardware fails after the DB commit, fire a compensating workflow: mark the locker MAINTENANCE, roll back the reservation state if appropriate, page a technician. Never attempt to re-synthesize physical state — you can't trust it.

How do you prevent OTP brute force? Per-OTP attempt counter (invalidate on exhaustion). Per-kiosk + per-locker rate limits at the edge. Exponential backoff between attempts on the kiosk UI. Alert on anomalous OTP-failure rates per location.

How do you support multi-parcel pickup in one OTP? A new BundleReservation type that owns a set of (lockerId, parcelId) tuples and a single OTP. Pickup validates the OTP once, then iterates, opening each locker. The state machine gains a partial-pickup state for when the user takes some but not all parcels.

How do you roll out a new locker size (XL) globally? Three-phase: (a) extend the enum and deploy the service; smallest-fit ignores XL until lockers exist. (b) Provision XL lockers at pilot locations; assignment picks them up automatically. (c) Expand location coverage over time. No coupling between rollout and service deploy.

How do you reassign a reservation when the original locker breaks? Admin-triggered workflow: locate an alternative locker at the same location using the current assignment strategy, transition the original locker to MAINTENANCE, and update the reservation's lockerId inside one transaction. The OTP is unchanged. Notify the user of the new locker number. If no alternative exists, treat it as "no locker fits" and escalate.

What does your observability story look like? Metrics: reservations/sec per location, failed assignments (and reason), OTP verification failures, pickup latency, hardware open success rate, expired-without-pickup rate. Traces for each of the four critical-path methods. Structured logs with the reservation id as the trace correlation. Alarms on a failed-assignment spike (suggests a capacity problem) and a hardware-failure spike (suggests a location-wide issue).

How would you audit a "user claims the locker was empty" dispute? Query the reservations table for the reservation id → get the state transition history. Query the hardware log at the edge for the open event + any sensor data. Query the dropoff event for the scanning courier. The separate reservations table (vs a single locker field) is what makes this trivial.

Could this be a sharded monolith or must it be microservices? A sharded monolith by locationId is genuinely sufficient at 5M parcels/day. A microservice split (reservations, lockers, notifications, hardware adapter) is a team boundary, not a performance one. I'd split when on-call burden or deploy contention made it worth it, not before.

How do you guarantee the hardware "open" is only called after the DB commit? The reservation/locker transaction commits first; then we publish an OPEN_REQUESTED event to an at-least-once queue. The edge consumer is idempotent on (lockerId, reservationId). If the queue delivers twice, the second open is a no-op. If the process crashes between commit and publish, a reconciler notices the state mismatch and re-publishes.
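A minimal sketch of that idempotent consumer — the event shape and dedupe store are illustrative:

// Hypothetical edge consumer for OPEN_REQUESTED events: dedupes on
// (lockerId, reservationId) so at-least-once delivery can't open a door twice
// for the same reservation.
interface OpenRequested {
  lockerId: string;
  reservationId: string;
}

class OpenRequestConsumer {
  private readonly handled = new Set<string>(); // stand-in for a durable dedupe store

  constructor(private readonly hardware: { open(lockerId: string): Promise<void> }) {}

  async handle(event: OpenRequested): Promise<void> {
    const key = `${event.lockerId}:${event.reservationId}`;
    if (this.handled.has(key)) return; // duplicate delivery — already opened
    await this.hardware.open(event.lockerId);
    this.handled.add(key);
  }
}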
SDE2 vs SDE3 — How the Bar Rises
| Dimension | SDE2 | SDE3 |
|---|---|---|
| Concurrency handling | Mentions "we need a lock"; reaches for a DB FOR UPDATE or a global mutex. Handles the happy-path race. | Articulates optimistic vs pessimistic explicitly, justifies choice against conflict rate, designs a retry-with-fallback loop. Calls out the per-location queue escape hatch for hotspotting at scale. |
| Assignment strategy justification | Picks smallest-fit because "it seems better." | Walks through smallest-fit, largest-fit, bin-packing. Argues tradeoffs on utilization vs future-flexibility. Calls out that the choice can vary per-location or per-region as a product lever. |
| Expiry mechanism | "A cron job scans expired rows." | Compares cron / timer wheel / DB TTL; splits the decision from the compensating action; routes the compensation through an event queue for horizontal scale; discusses race with in-flight pickups. |
| State machine rigor | Implements available → reserved → occupied and moves on. | Enforces transitions at the class level (not just repo), handles all reverse transitions (expiry before dropoff vs after), maps MAINTENANCE pre-conditions, proves illegal transitions are impossible. |
| Observability | "We'd log things." | Names the specific metrics (reservations/sec, failed-assignment rate, hardware-open success), defines what alarm thresholds mean operationally, proposes a dispute-audit workflow grounded in the schema. |
| Degradation modes | Assumes the happy path; mentions "what if the DB is down." | Designs explicit offline-locker operation (edge cache, pickup-only, reject new reservations), splits hot-path consistency from cold-path eventual, walks through the hardware-failure compensating workflow end-to-end. |
| Scope control | Tries to design the hardware protocol, the payments flow, and the routing. | Explicitly names what's out of scope and defends the boundary. Invites the interviewer to redirect rather than burning time on off-topic subsystems. |
| Extensibility seams | Adds a comment "we could make this pluggable later." | Points at the IAssignmentStrategy / IExpiryPolicy / IOtpGenerator interfaces and walks through what changes when (e.g.) we want HMAC OTPs or a bundle-reservation feature. Shows that extensibility was designed in, not retrofitted. |