16. Problem: Movie Ticket Booking
Understanding the Problem
On the surface, movie-ticket booking reads like a CRUD modeling exercise: theatres own screens, screens host shows, shows have seats, users book seats, payments confirm bookings. You could fill a whiteboard with entity boxes, call it done, and get politely rejected. The interview is about none of that. The interview is about two users clicking the same seat at the same millisecond and exactly one of them walking away with a ticket. Every senior-level question on this problem eventually becomes a concurrency question.
The real test
The interviewer is waiting to hear you say "seat lock with TTL," ideally in the first five minutes. If you spend twenty minutes on UML before acknowledging that seat reservation is a distributed critical section, the signal is: this candidate thinks of backend systems as data stores, not as coordination problems. The modeling is the easy half. The concurrency is the interview.
The pathological cases tell you what the system actually has to do:
- A hit movie's opening night: 300 seats, 10,000 users refreshing at 6 PM.
- A concert on-sale: 50,000 seats, 2 million people in queue, most will see "sold out" within 90 seconds.
- A normal Tuesday: two friends both click seat F7 within 200 ms of each other because they didn't coordinate.
All three require the same core primitive: atomic, time-bounded ownership of a seat during the checkout window. Everything else in the design (the entity graph, the payment flow, the state machines) exists to serve that primitive.
Clarifying Questions
You: Are we designing for a single theatre, a chain, or a multi-tenant platform spanning many cities?
Interviewer: Multi-tenant, BookMyShow-style. Many cities, many theatres per city, many screens per theatre.
You: Do seats have categories (silver, gold, platinum, recliner) with different prices?
Interviewer: Yes. Price is per (show, seat-category). Pricing can also vary by day (weekday vs weekend) and time (matinee vs prime).
You: Rough scale? Seats per show, shows per day, peak bookings per second?
Interviewer: Assume 200 seats per show average, a popular show runs 10x/day across a city, peak is ~5k bookings/sec during flash sales.
You: Group bookings: does a user always book a contiguous block, or can they pick scattered seats?
Interviewer: Any combination. Up to 10 seats per booking.
You: Is payment integration in scope, or do we assume a payment service exists?
Interviewer: Assume an external payment gateway. You define the contract. The flow (lock, pay, confirm) is in scope.
You: Cancellations and refunds?
Interviewer: Cancellation allowed up to 2 hours before showtime with partial refund. Refund semantics themselves are out of scope; the state machine should support it.
You: Waitlist for sold-out shows?
Interviewer: Out of scope for v1. Mention how you'd add it.
You: What happens to a lock if the user abandons checkout?
Interviewer: That's exactly what I want to hear your answer on.
You: Is this a monolith or a distributed system? Specifically: is the seat-lock store in-process or shared across app servers?
Interviewer: Distributed. Assume multiple app-server replicas. The lock store is whatever you design.
You: Fairness: on a concert flash-sale, does first-click win, or do we have a queue / lottery?
Interviewer: First-click wins for v1. But talk about why that's a terrible answer for real concert sales and what you'd do instead.
Functional Requirements
- Browse: list cities, theatres in a city, shows in a theatre on a date, and the seat map of a show.
- Hold: user selects 1-10 seats; the system atomically reserves them for that user for a bounded TTL (5 min default).
- Release on abandon: if the user closes the tab or times out, seats return to AVAILABLE.
- Pay: user submits payment against a held lock; the payment gateway confirms asynchronously (webhook) or synchronously.
- Confirm: on payment success, seats transition LOCKED → BOOKED and the booking record is persisted.
- Cancel: user cancels a confirmed booking before the allowed cutoff; seats return to AVAILABLE; a refund is queued.
- Search: by movie, city, date, theatre, language, format (2D/3D/IMAX).
- Idempotency: retrying a lock, confirm, or payment call with the same key must not double-book or double-charge.
Non-Functional Requirements
| Concern | Target | Why it matters |
|---|---|---|
| Seat-selection latency | P99 < 200 ms for lock attempt | Users click, expect instant feedback. Lock failure must also be fast. |
| Consistency | Strict: no seat double-booked, ever | This is the one requirement that cannot degrade. Availability can dip; correctness cannot. |
| Availability during payment outage | Booking-creation path degrades gracefully | If the payment gateway is down, existing locks should still expire cleanly and the UI should say "try again" rather than stranding seats. |
| Fairness under contention | First-click wins (v1); pluggable later | Flash-sale fairness is a product call (queue / lottery / captcha), not a correctness one. |
| Throughput | 5k lock attempts/sec sustained | Dominated by the lock store. Redis handles this; a SQL row-lock approach does not. |
| Durability | Confirmed bookings survive any single-node failure | Locks can be ephemeral. Bookings cannot. |
Correctness is non-negotiable; availability is negotiable. If Redis is slow, some users see "try again." If Redis loses a lock during a cluster failover and we double-book, we refund one user and apologize, but we've violated the one invariant the business trusts us with. Every design choice in the concurrency section follows from this asymmetry.
Core Entities and Relationships
| Entity | Responsibility | Key relationships |
|---|---|---|
| City | Geographic grouping for browse. | 1 → N Theatre |
| Theatre | Physical venue. | 1 → N Screen |
| Screen | Auditorium inside a theatre. Owns the seat layout. | 1 → N Seat (template); 1 → N Show |
| Movie | The film. Metadata (title, duration, rating, language). | – |
| Show | A scheduled instance: Movie × Screen × startTime. | Owns a mutable seat grid (ShowSeat[]) derived from the screen template. |
| ShowSeat | Per-show seat record with state. Decoupled from the screen-template Seat so two shows in the same screen have independent availability. | belongs to Show |
| SeatLock | Transient reservation: (showId, seatId, userId, lockId, expiresAt). Lives in the lock store, not the primary DB. | – |
| Booking | Durable record of a purchase. State machine (see below). | 1 → N ShowSeat (at time of booking), 1 → 1 Payment |
| Payment | Gateway transaction. Idempotent. | 1 → 1 Booking |
| User | Customer. | 1 → N Booking |
Why ShowSeat and not just Seat? The screen has a template (row F, col 7, category gold) shared across every show in that screen. Availability, price, and lock state are per show. Conflating the two forces every seat read to join by show and makes lock state harder to reason about. Splitting them is a one-line cost that saves grief forever.
Why is SeatLock outside the primary DB? Locks are high-churn (created, expired, released thousands of times a second) and short-lived (5 min TTL). Storing them in Postgres rows works but puts write pressure on the same tables that hold your durable bookings. Redis (or any TTL-capable KV) is the natural home. We'll defend this choice in the concurrency section.
Interfaces
// ------------ Lock service ------------
interface LockToken {
readonly lockId: string; // opaque, server-generated
readonly showId: string;
readonly seatIds: readonly string[];
readonly userId: string;
readonly expiresAt: number; // unix ms
}
interface ISeatLockService {
/**
* Atomic across seats. Either all seats are locked or none are.
* Throws SeatUnavailableError with the conflicting seat ids.
*/
lockSeats(
showId: string,
seatIds: readonly string[],
userId: string,
ttlMs: number,
idempotencyKey: string,
): Promise<LockToken>;
/** Extend a lock's TTL. Used when payment is slow but still alive. */
extendLock(lockId: string, additionalMs: number): Promise<LockToken>;
/** Release all seats under a lock. Idempotent. */
releaseLock(lockId: string): Promise<void>;
/** Validate a lock is still live and owned by user. */
validateLock(lockId: string, userId: string): Promise<LockToken>;
}
// ------------ Pricing ------------
interface PriceBreakdown {
readonly subtotal: number;
readonly taxes: number;
readonly fees: number;
readonly total: number;
readonly currency: string;
}
interface IPricingStrategy {
price(show: Show, seats: readonly ShowSeat[]): PriceBreakdown;
}
// ------------ Payment ------------
interface PaymentRequest {
readonly bookingId: string;
readonly amount: number;
readonly currency: string;
readonly userId: string;
readonly method: PaymentMethod;
readonly idempotencyKey: string; // MUST be the bookingId-derived key
}
interface PaymentResult {
readonly paymentId: string;
readonly status: "SUCCEEDED" | "FAILED" | "PENDING";
readonly gatewayRef: string;
}
interface IPaymentGateway {
  charge(req: PaymentRequest): Promise<PaymentResult>;
  refund(paymentId: string, amount: number, idempotencyKey: string): Promise<void>;
  /** Reconciliation: poll the gateway for a payment's current status. */
  status(paymentId: string): Promise<PaymentResult["status"]>;
}
// ------------ Booking ------------
interface IBookingRepository {
  save(booking: Booking): Promise<void>;
  findById(id: string): Promise<Booking | null>;
  findByIdempotencyKey(key: string): Promise<Booking | null>;
  /** Bookings stuck in the given states since before the cutoff (unix ms). */
  findStuckBookings(states: readonly BookingState[], olderThanMs: number): Promise<Booking[]>;
}
interface IBookingService {
  createBooking(
    showId: string,
    seatIds: readonly string[],
    userId: string,
    idempotencyKey: string,
  ): Promise<Booking>;
confirmBooking(
bookingId: string,
paymentDetails: PaymentDetails,
idempotencyKey: string,
): Promise<Booking>;
cancelBooking(bookingId: string, userId: string): Promise<Booking>;
}
Note the idempotency keys everywhere external effects happen. That isn't decoration; it's the only reason the system can be safely retried end-to-end. We'll come back to key design in the Tradeoffs section.
Class Diagram
+-----------+
| City |
+-----------+
| 1..N
v
+-----------+ +-----------+
| Theatre |------->| Movie |
+-----------+ +-----------+
| 1..N ^
v |
+-----------+ 1..N +-----------+
| Screen |------>| Show |
+-----------+ +-----------+
| 1..N | 1..N
v v
+-----------+ +-----------+
| Seat |<------| ShowSeat |
| (template)| | (stateful)|
+-----------+ +-----------+
^
| references (by id)
|
+----------------+ +----------------+ |
| SeatLockService|<---->| SeatLock |---+
| (Redis-backed)| | (TTL, ephemeral)|
+----------------+ +----------------+
^
| (1) lock
|
+----------------+ +----------------+ +----------------+
| BookingService |---->| Booking |---->| Payment |
+----------------+ +----------------+ +----------------+
| ^ ^
| (2) create | |
| (3) charge | state: |
| (4) confirm | CREATED→SEATS_LOCKED |
| | →PAYMENT_PENDING |
v | →CONFIRMED / EXPIRED |
+----------------+ | /CANCELLED |
| IPaymentGateway| | |
+----------------+ +-----------------------+
^
|
+---------------------+
| IPricingStrategy |
| (weekday/weekend, |
| tier, dynamic) |
+---------------------+
Class Design
Show owns the seat grid
class Show {
constructor(
public readonly id: string,
public readonly movieId: string,
public readonly screenId: string,
public readonly startTime: Date,
public readonly endTime: Date,
public readonly format: ScreenFormat, // 2D | 3D | IMAX
public readonly language: string,
private readonly seats: Map<string, ShowSeat>,
) {}
getSeat(seatId: string): ShowSeat | undefined {
return this.seats.get(seatId);
}
getSeats(seatIds: readonly string[]): ShowSeat[] {
return seatIds.map((id) => {
const s = this.seats.get(id);
if (!s) throw new SeatNotFoundError(id);
return s;
});
}
isBookable(now: Date): boolean {
// Reject booking if the show starts within 5 min: avoids locks
// colliding with the projectionist closing doors.
return this.startTime.getTime() - now.getTime() > 5 * 60 * 1000;
}
}
ShowSeat: state is a computed projection
type ShowSeatState = "AVAILABLE" | "LOCKED" | "BOOKED";
class ShowSeat {
constructor(
public readonly id: string,
public readonly showId: string,
public readonly row: string,
public readonly column: number,
public readonly category: SeatCategory,
public readonly basePrice: number,
private _bookedByBookingId: string | null = null,
) {}
get isBooked(): boolean {
return this._bookedByBookingId !== null;
}
/**
* Compute state by consulting the lock store. State is NOT a mutable
* field; it's derived from (isBooked, hasActiveLock). This is the only
* way to avoid "state cached and now stale" bugs.
*/
stateAt(now: Date, activeLock: SeatLock | null): ShowSeatState {
if (this._bookedByBookingId) return "BOOKED";
if (activeLock && activeLock.expiresAt > now.getTime()) return "LOCKED";
return "AVAILABLE";
}
markBooked(bookingId: string): void {
if (this._bookedByBookingId) {
throw new SeatAlreadyBookedError(this.id);
}
this._bookedByBookingId = bookingId;
}
markAvailable(): void {
this._bookedByBookingId = null;
}
}
Why isn't state a field? Because then you have two sources of truth (the seat row and the lock store) and they will diverge. A lock expires at T+300s but the seat row still says LOCKED until something sweeps it; that's a bug waiting to happen. By deriving state at read time from (bookedFlag, lockLookup) you eliminate the divergence entirely. The only mutable field is _bookedByBookingId, which changes at most twice in a seat's lifetime (booked, then cancelled).
Booking: state machine
+---------+
| CREATED | (booking row persisted; no lock yet)
+----+----+
| lockSeats() ok
v
+---------------+
| SEATS_LOCKED |---- TTL expires -----> +---------+
+-------+-------+ | EXPIRED |
| +---------+
| user submits payment
v
+------------------+
| PAYMENT_PENDING |---- gateway FAIL --> +-----------+
+---------+--------+ | CANCELLED |
| +-----------+
| gateway SUCCESS ^
v | user cancels
+--------------------+ | (before cutoff)
| PAYMENT_SUCCEEDED | |
+----------+---------+ |
| confirm step (commit seats) |
v |
+-----------+ |
| CONFIRMED |-----------------------------+
+-----------+

type BookingState =
| "CREATED"
| "SEATS_LOCKED"
| "PAYMENT_PENDING"
| "PAYMENT_SUCCEEDED"
| "CONFIRMED"
| "EXPIRED"
| "CANCELLED";
class Booking {
constructor(
public readonly id: string,
public readonly userId: string,
public readonly showId: string,
public readonly seatIds: readonly string[],
public readonly lockId: string,
public readonly price: PriceBreakdown,
public readonly idempotencyKey: string,
public state: BookingState = "CREATED",
public paymentId: string | null = null,
public readonly createdAt: Date = new Date(),
public updatedAt: Date = new Date(),
) {}
transition(next: BookingState): void {
if (!this.canTransitionTo(next)) {
throw new IllegalStateTransitionError(this.state, next);
}
this.state = next;
this.updatedAt = new Date();
}
private canTransitionTo(next: BookingState): boolean {
const allowed: Record<BookingState, readonly BookingState[]> = {
CREATED: ["SEATS_LOCKED", "EXPIRED"],
SEATS_LOCKED: ["PAYMENT_PENDING", "EXPIRED", "CANCELLED"],
PAYMENT_PENDING: ["PAYMENT_SUCCEEDED", "CANCELLED", "EXPIRED"],
PAYMENT_SUCCEEDED: ["CONFIRMED", "CANCELLED"],
CONFIRMED: ["CANCELLED"],
EXPIRED: [],
CANCELLED: [],
};
return allowed[this.state].includes(next);
}
}
Explicit terminal states and an explicit transition guard are worth the extra lines. Without the guard, a late webhook delivering PAYMENT_SUCCEEDED on an already-EXPIRED booking will silently flip you into an inconsistent state, and that inconsistency is what double-bookings are made of.
SeatLockService: the critical section
class RedisSeatLockService implements ISeatLockService {
private static readonly KEY_PREFIX = "seatlock";
constructor(
private readonly redis: RedisClient,
private readonly clock: Clock,
) {}
private key(showId: string, seatId: string): string {
return `${RedisSeatLockService.KEY_PREFIX}:${showId}:${seatId}`;
}
async lockSeats(
showId: string,
seatIds: readonly string[],
userId: string,
ttlMs: number,
idempotencyKey: string,
): Promise<LockToken> {
// 1. Idempotency short-circuit: if this key already produced a lock,
// return the same token. Retries from flaky networks cannot
// double-spend seats this way.
const existing = await this.redis.get(`idem:${idempotencyKey}`);
if (existing) return JSON.parse(existing);
const lockId = randomUUID();
const expiresAt = this.clock.now() + ttlMs;
const token: LockToken = { lockId, showId, seatIds, userId, expiresAt };
// 2. Atomically SETNX every seat key in a Lua script, or bail and
//    roll back. Redis Lua is single-threaded: no partial lock.
const script = `
local acquired = {}
for i, key in ipairs(KEYS) do
local ok = redis.call('SET', key, ARGV[1], 'NX', 'PX', ARGV[2])
if ok then
table.insert(acquired, key)
else
-- rollback
for j, acquiredKey in ipairs(acquired) do
redis.call('DEL', acquiredKey)
end
return {0, key} -- 0 = failure, name the offender
end
end
return {1}
`;
const keys = seatIds.map((sid) => this.key(showId, sid));
const lockPayload = JSON.stringify(token);
const result = await this.redis.eval(script, keys, [lockPayload, ttlMs.toString()]);
if (result[0] === 0) {
throw new SeatUnavailableError(showId, [result[1]]);
}
// 3. Store idempotency mapping with the same TTL. If the caller retries,
// they get the exact same token back.
await this.redis.set(`idem:${idempotencyKey}`, lockPayload, "PX", ttlMs);
    // 4. Store a reverse index (lockId -> full token) so release, extend,
    //    and validate can recover the seat keys and the owner.
    await this.redis.set(`lock:${lockId}`, lockPayload, "PX", ttlMs);
return token;
}
  async releaseLock(lockId: string): Promise<void> {
    const tokenJson = await this.redis.get(`lock:${lockId}`);
    if (!tokenJson) return; // already released / expired: idempotent no-op
    const token: LockToken = JSON.parse(tokenJson);
    const keys = token.seatIds.map((sid) => this.key(token.showId, sid));
    await this.redis.del(...keys, `lock:${lockId}`);
  }
  async extendLock(lockId: string, additionalMs: number): Promise<LockToken> {
    // Only extend if still live. Guards against "lock expired -> payment
    // webhook lands -> we extend and resurrect a zombie lock."
    const token = await this.readToken(lockId);
    const keys = token.seatIds.map((sid) => this.key(token.showId, sid));
    // PEXPIRE on each key; if any returns 0, the lock has partially expired.
    // Senior-level answer: treat partial expiry as full failure.
    for (const key of keys) {
      const ok = await this.redis.pexpire(key, additionalMs);
      if (ok === 0) throw new LockExpiredError(lockId);
    }
    // Refresh the reverse index and the token's expiry to match.
    const extended: LockToken = { ...token, expiresAt: this.clock.now() + additionalMs };
    await this.redis.set(`lock:${lockId}`, JSON.stringify(extended), "PX", additionalMs);
    return extended;
  }
  async validateLock(lockId: string, userId: string): Promise<LockToken> {
    const token = await this.readToken(lockId);
    if (token.userId !== userId) throw new ForbiddenError();
    return token;
  }
  private async readToken(lockId: string): Promise<LockToken> {
    const tokenJson = await this.redis.get(`lock:${lockId}`);
    if (!tokenJson) throw new LockNotFoundError(lockId);
    return JSON.parse(tokenJson) as LockToken;
  }
}
The Lua script is the important part. It's the single primitive that makes multi-seat locking race-safe: Redis runs the whole script atomically against its single-threaded event loop, so the SET NX chain either commits every seat or rolls back every seat. Without it, you could end up with seat F7 locked, seat F8 taken by someone else, and a torn state the application has to clean up. The script erases that class of bug.
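To make the guarantee concrete, a minimal usage sketch: two users race for the same pair of seats and exactly one promise fulfills. The show id and idempotency keys here are illustrative, not part of the chapter's API.
async function demoRace(lockSvc: ISeatLockService): Promise<void> {
  const seats = ["F7", "F8"];
  const [r1, r2] = await Promise.allSettled([
    lockSvc.lockSeats("show-42", seats, "user-1", 300_000, "idem-user-1"),
    lockSvc.lockSeats("show-42", seats, "user-2", 300_000, "idem-user-2"),
  ]);
  // The Lua script serializes both attempts on Redis's event loop, so there
  // is no interleaving in which each user ends up holding one of the seats.
  const winners = [r1, r2].filter((r) => r.status === "fulfilled").length;
  console.assert(winners === 1, "expected exactly one lock holder");
}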
Key Methods
The three methods below are the spine of the system. Everything else is plumbing.
class BookingService implements IBookingService {
constructor(
private readonly lockSvc: ISeatLockService,
private readonly paymentGw: IPaymentGateway,
private readonly pricing: IPricingStrategy,
private readonly bookings: IBookingRepository,
private readonly shows: IShowRepository,
private readonly clock: Clock,
private readonly events: IEventBus,
) {}
// -----------------------------------------------------------------
  // 1. LOCK: the race-safe critical section
// -----------------------------------------------------------------
async createBooking(
showId: string,
seatIds: readonly string[],
userId: string,
idempotencyKey: string,
): Promise<Booking> {
    // Idempotent: same key => same booking.
const prior = await this.bookings.findByIdempotencyKey(idempotencyKey);
if (prior) return prior;
const show = await this.shows.findById(showId);
if (!show) throw new ShowNotFoundError(showId);
    if (!show.isBookable(new Date(this.clock.now()))) {
throw new ShowNoLongerBookableError(showId);
}
const seats = show.getSeats(seatIds);
    // Pre-flight DB check: not authoritative (TOCTOU), but it filters
    // obviously-booked seats before we hit Redis.
for (const s of seats) {
if (s.isBooked) throw new SeatAlreadyBookedError(s.id);
}
// THE atomic step. If two users race here, exactly one proceeds.
const ttlMs = 5 * 60 * 1000; // 5 min
const lockToken = await this.lockSvc.lockSeats(
showId,
seatIds,
userId,
ttlMs,
idempotencyKey, // same key stops retries from double-locking
);
const price = this.pricing.price(show, seats);
const booking = new Booking(
randomUUID(),
userId,
showId,
seatIds,
lockToken.lockId,
price,
idempotencyKey,
"SEATS_LOCKED",
);
await this.bookings.save(booking);
this.events.publish({ type: "booking.seats_locked", bookingId: booking.id });
return booking;
}
// -----------------------------------------------------------------
  // 2. CONFIRM: idempotent end-to-end
// -----------------------------------------------------------------
async confirmBooking(
bookingId: string,
paymentDetails: PaymentDetails,
idempotencyKey: string,
): Promise<Booking> {
const booking = await this.bookings.findById(bookingId);
if (!booking) throw new BookingNotFoundError(bookingId);
// Idempotent: already confirmed? Just return it.
if (booking.state === "CONFIRMED") return booking;
if (booking.state === "EXPIRED" || booking.state === "CANCELLED") {
throw new BookingNoLongerActiveError(booking.state);
}
// Validate the lock is still live. If TTL fired while user was
// typing their CVV, fail closed before charging the card.
const lock = await this.lockSvc.validateLock(
booking.lockId,
booking.userId,
).catch(() => null);
if (!lock || lock.expiresAt <= this.clock.now()) {
booking.transition("EXPIRED");
await this.bookings.save(booking);
throw new LockExpiredError(booking.lockId);
}
// Optionally extend the lock to cover payment latency.
await this.lockSvc.extendLock(booking.lockId, 2 * 60 * 1000);
booking.transition("PAYMENT_PENDING");
await this.bookings.save(booking);
    // Idempotency key for the gateway derives from bookingId. Retrying
    // confirmBooking (from a client retry or a saga redelivery)
    // MUST reuse the same key so the gateway returns the same charge.
const paymentIdemKey = `pay:${booking.id}`;
let result: PaymentResult;
try {
result = await this.paymentGw.charge({
bookingId: booking.id,
amount: booking.price.total,
currency: booking.price.currency,
userId: booking.userId,
method: paymentDetails.method,
idempotencyKey: paymentIdemKey,
});
} catch (err) {
      // Gateway unreachable. DO NOT mark as failed: we don't know.
      // Keep state PAYMENT_PENDING; a reconciler will settle it.
throw new PaymentAttemptFailedError(err);
}
if (result.status === "FAILED") {
booking.transition("CANCELLED");
await this.bookings.save(booking);
await this.lockSvc.releaseLock(booking.lockId);
throw new PaymentDeclinedError(result);
}
if (result.status === "PENDING") {
// Async gateway. Webhook will complete the flow.
booking.paymentId = result.paymentId;
await this.bookings.save(booking);
return booking;
}
// SUCCEEDED: flip bookings and seats to permanent state.
booking.paymentId = result.paymentId;
booking.transition("PAYMENT_SUCCEEDED");
// Commit: mark seats BOOKED, release the lock. If the process
// dies between these two steps, the reconciliation job (keyed
// by booking state) finishes the job.
const show = await this.shows.findById(booking.showId);
if (!show) throw new ShowNotFoundError(booking.showId);
for (const seatId of booking.seatIds) {
show.getSeat(seatId)!.markBooked(booking.id);
}
await this.shows.save(show);
booking.transition("CONFIRMED");
await this.bookings.save(booking);
await this.lockSvc.releaseLock(booking.lockId);
this.events.publish({ type: "booking.confirmed", bookingId: booking.id });
return booking;
}
// -----------------------------------------------------------------
  // 3. RELEASE EXPIRED LOCKS: background reconciler
// -----------------------------------------------------------------
async releaseExpiredLocks(): Promise<void> {
    // Redis TTL frees the seat keys automatically; this job handles
    // the *booking-side* cleanup: any booking stuck in SEATS_LOCKED
    // or PAYMENT_PENDING past the lock TTL must be resolved.
const stuck = await this.bookings.findStuckBookings(
["SEATS_LOCKED", "PAYMENT_PENDING"],
this.clock.now() - 10 * 60 * 1000, // older than 10 min
);
for (const booking of stuck) {
      // If PAYMENT_PENDING, reconcile with the gateway first: did it succeed?
if (booking.state === "PAYMENT_PENDING" && booking.paymentId) {
const status = await this.paymentGw.status(booking.paymentId);
if (status === "SUCCEEDED") {
// Gateway succeeded but we never confirmed. Recover.
await this.forceConfirm(booking);
continue;
}
}
booking.transition("EXPIRED");
await this.bookings.save(booking);
await this.lockSvc.releaseLock(booking.lockId);
}
}
async cancelBooking(bookingId: string, userId: string): Promise<Booking> {
const booking = await this.bookings.findById(bookingId);
if (!booking) throw new BookingNotFoundError(bookingId);
if (booking.userId !== userId) throw new ForbiddenError();
if (booking.state !== "CONFIRMED") {
throw new BookingNotConfirmedError(booking.state);
}
const show = await this.shows.findById(booking.showId);
if (!show) throw new ShowNotFoundError(booking.showId);
const hoursToShow = (show.startTime.getTime() - this.clock.now()) / 3_600_000;
if (hoursToShow < 2) throw new CancellationWindowClosedError();
booking.transition("CANCELLED");
for (const seatId of booking.seatIds) {
show.getSeat(seatId)!.markAvailable();
}
await this.shows.save(show);
await this.bookings.save(booking);
    // Refund is queued as a saga step; don't block the cancel call.
this.events.publish({
type: "booking.refund_requested",
bookingId: booking.id,
paymentId: booking.paymentId!,
amount: booking.price.total * 0.9, // 10% cancellation fee
});
return booking;
}
private async forceConfirm(booking: Booking): Promise<void> {
// Recovery path for PAYMENT_PENDING bookings whose payment
// succeeded at the gateway but we missed the callback.
const show = await this.shows.findById(booking.showId);
if (!show) return;
for (const seatId of booking.seatIds) {
const seat = show.getSeat(seatId);
if (seat && !seat.isBooked) seat.markBooked(booking.id);
}
await this.shows.save(show);
booking.transition("PAYMENT_SUCCEEDED");
booking.transition("CONFIRMED");
await this.bookings.save(booking);
await this.lockSvc.releaseLock(booking.lockId);
}
}
Three structural rules to notice:
- Idempotency keys live in the contract, not the implementation. createBooking, confirmBooking, and the payment call all take an explicit key. A retry on any boundary is safe.
- State transitions are guarded. A late webhook for an EXPIRED booking cannot drag it back to CONFIRMED. The state machine rejects it, and the reconciler handles compensation.
- The lock is validated before the charge. If the TTL fires and you then charge a card for seats you no longer own, that's the worst UX possible. We validate and extend.
Design Decisions & Tradeoffs
Lock TTL: how long?
| TTL | Upside | Downside |
|---|---|---|
| 30 s | Frees seats fast on abandons. | Users on slow networks / 3DS challenge lose their seats mid-checkout. |
| 5 min | Covers normal checkout including card auth. | On sold-out shows, a single abandoner ties up seats for 5 min. |
| 15 min | Accommodates wallet-selection hesitancy. | During flash sales, the effective inventory is halved because abandoned locks dominate. |
Pick 5 min with extension on payment submit. Gives a normal checkout plenty of headroom; flash sales get an aggressive 90 s TTL with no extension.
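A minimal sketch of that policy, assuming a per-show flash-sale flag; the flag name and the thresholds are illustrative, not fixed product rules.
interface TtlPolicy {
  initialMs: number;   // granted at lockSeats()
  extensionMs: number; // granted once, on payment submit (0 = not extendable)
}

function ttlPolicyFor(show: { isFlashSale: boolean }): TtlPolicy {
  return show.isFlashSale
    ? { initialMs: 90_000, extensionMs: 0 }         // aggressive: abandoners can't strand inventory
    : { initialMs: 300_000, extensionMs: 120_000 }; // normal checkout, incl. card auth / 3DS
}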
Lock granularity: per-seat vs per-show
Per-show is a single mutex, a sledgehammer. 200 users trying to book a show serialize through one lock; throughput collapses. Per-seat is the default. Per-show appears as a short-held meta-lock only when modifying the seat template (adding/removing seats from a screen), and it's held by admins, not booking flows.
Lock storage: in-memory Map vs Redis vs DB
| Option | Throughput | Correctness across replicas | Operational cost |
|---|---|---|---|
| In-memory Map (single process) | Highest | Fails: two replicas don't share state | Trivial |
| Shared DB row + FOR UPDATE | Moderate | Correct | Adds write pressure to primary DB; long-held row locks hurt OLTP |
| Redis with Lua SET NX PX | Very high | Correct (single Redis primary) | Redis HA, failover semantics, key-hashtags for cluster routing |
| Distributed consensus (Raft/Zookeeper) | Moderate | Strongest | Massive operational tax for this problem |
Redis is the default for this problem at this scale. It's fast, single-threaded Lua gives you atomic multi-key ops, TTL is native, and failover semantics are well-understood. The one thing to address explicitly in an interview: a Redis primary failover can briefly expose the "two lock holders" window (async replication can lose the latest SET). For a ticketing system where each failover risks seat double-allocation, the mitigations are (a) keep Redis replication latency tiny, (b) use Redlock across independent Redis nodes if your correctness bar is absolute, (c) accept that the lock is advisory and enforce real commitment via the DB's uniqueness constraint on (showId, seatId, state=BOOKED). Option (c) is what production systems usually do.
Pessimistic vs optimistic concurrency
Pessimistic (Redis/DB lock) reserves the seat up front. Optimistic (version column) lets any number of transactions attempt, and the DB's UPDATE ... WHERE version = ? catches the loser at commit time.
For seat booking, pessimistic wins because the critical section is long-running: lock → user enters card details → gateway roundtrip → confirm. Optimistic means the loser has already typed their card info when they find out they lost. Users hate that. Optimistic is for quick, server-side retries, not multi-minute user flows.
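For contrast, a sketch of what the optimistic variant looks like, assuming a Postgres-style store; the table and column names are illustrative. The loser is only detected when this UPDATE matches zero rows, i.e. after the human has already invested in the checkout flow.
async function tryClaimOptimistic(
  db: { execute(sql: string, params: unknown[]): Promise<{ rowCount: number }> },
  seatId: string,
  expectedVersion: number,
): Promise<boolean> {
  const res = await db.execute(
    `UPDATE show_seats
        SET state = 'LOCKED', version = version + 1
      WHERE id = $1 AND version = $2 AND state = 'AVAILABLE'`,
    [seatId, expectedVersion],
  );
  return res.rowCount === 1; // 0 rows => somebody else won; the user finds out late
}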
Payment flow: sync vs async
Gateways increasingly push you to webhooks. You get a PENDING on charge, then a callback says SUCCEEDED or FAILED. Design assumes both modes are possible:
- Sync SUCCEEDED: inline confirm, fastest UX.
- PENDING: booking sits in PAYMENT_PENDING; the webhook handler drives it forward.
The webhook is a distinct endpoint that looks up the booking by paymentId, validates the signature, and calls the same forceConfirm path the reconciler uses. Both webhook and reconciler use the same idempotent confirm logic; double-confirming is impossible.
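A sketch of that endpoint with its dependencies injected; verifySignature, seenEvent, and findByPaymentId are assumed helpers, not part of the interfaces above.
async function handlePaymentWebhook(
  rawBody: string,
  signature: string,
  deps: {
    verifySignature(body: string, sig: string): boolean;
    seenEvent(eventId: string): Promise<boolean>; // dedupe store keyed by gateway event id
    findByPaymentId(paymentId: string): Promise<Booking | null>;
    forceConfirm(booking: Booking): Promise<void>; // same path the reconciler uses
  },
): Promise<void> {
  if (!deps.verifySignature(rawBody, signature)) throw new Error("bad webhook signature");
  const event = JSON.parse(rawBody) as { id: string; paymentId: string; status: string };
  if (await deps.seenEvent(event.id)) return; // duplicate delivery: no-op
  if (event.status !== "SUCCEEDED") return;   // failures flow through the cancel/reconcile path
  const booking = await deps.findByPaymentId(event.paymentId);
  if (booking) await deps.forceConfirm(booking);
}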
Idempotency key design
Three boundaries, three key scopes:
- Client → createBooking: client-generated UUID. "This click is one booking intent." Replays on network retry are safe.
- BookingService → gateway: pay:{bookingId}. Deterministic. If confirm runs twice (reconciler + client retry), the gateway dedupes.
- Webhook → BookingService: the gateway's event id. Guarantees we don't apply the same PAYMENT_SUCCEEDED event twice.
The key that matters most is the second one. The client can't supply it (it doesn't know the bookingId until createBooking returns), so the service-to-gateway key is the one that prevents double-charges under any retry scenario.
Patterns Used
| Pattern | Where | Why |
|---|---|---|
| State | Booking state machine | Explicit transitions prevent "late event drags booking back to CONFIRMED." Terminal states (EXPIRED, CANCELLED) are enforced. |
| Strategy | IPricingStrategy | Weekday/weekend/dynamic pricing swap without touching booking logic. |
| Observer | IEventBus | booking.seats_locked, booking.confirmed, booking.refund_requested. Email, analytics, recommendation models all subscribe without changes to the booking path. |
| Command | Booking operations as audit records | Every state transition writes an immutable event. Auditors ask "why did this seat get assigned to user X?"; you read the command log. |
| Saga | Payment → Confirm → Email → Refund-on-cancel | No 2PC across booking DB and the payment gateway. Each step is idempotent with a compensating action (release lock, refund). |
| Repository | IBookingRepository, IShowRepository | Testable: inject in-memory repos for unit tests; Postgres impls in prod. |
| Factory | ShowFactory.forScreen(screen, movie, time) | Builds the ShowSeat grid from the screen's seat template. Keeps show construction from leaking into call sites. |
Concurrency Considerations
This is the section the interviewer is grading. Every other section is table stakes; this one sorts SDE2 from SDE3.
The core race
Two users, U1 and U2, both see seat F7 as AVAILABLE and both click "book." Without coordination, both succeed. Both show up to the theatre, and the usher's face drops. The design problem: a transient, distributed, time-bounded ownership claim that is atomic across multiple seats and resilient to the claim-holder dying.
Options compared
| Approach | Mechanism | Throughput | Correctness | Ops complexity | Fits us? |
|---|---|---|---|---|---|
| (a) Per-seat mutex in-process | Map<seatId, Mutex> in the app server | Very high | Broken: doesn't span replicas | Trivial | No. Any horizontal scaling breaks it. |
| (b) Optimistic lock (DB version column) | UPDATE seats SET state='LOCKED', version=v+1 WHERE id=? AND version=? | High | Correct for the update step, but the window between read and write is the user's entire checkout session; optimistic is wrong for human-latency critical sections | Low | No. User finds out they lost after entering card info. |
| (c) Pessimistic DB row lock | SELECT ... FOR UPDATE held across lock/pay/confirm | Low (long-held row locks serialize) | Correct | Medium | Acceptable for small scale. Breaks at flash-sale throughput. |
| (d) Redis SET NX PX (our choice) | Atomic SETNX with TTL, Lua script for multi-seat | Very high (~100k/s per node) | Correct under single-master; edge cases on failover | Medium (Redis HA) | Yes. Modern default. |
| (e) Redlock across independent Redis | SET NX on majority of N nodes | High | Strongest with correct TTL math | High | Overkill unless you're bank-grade. |
| (f) Consensus (Zookeeper, etcd) | Ephemeral nodes with session TTL | Moderate | Strongest | Very high | Overkill for this problem. |
Senior take: Redis SET NX PX with a Lua script for atomic multi-seat lock is the modern, proportional answer. Pair it with a DB-level uniqueness constraint on confirmed seats as belt-and-suspenders: even if Redis fails over in a pathological way, the DB refuses to commit two (showId, seatId, bookedBookingId) rows.
What if the lock-holder dies between lock and pay?
The TTL fires. Redis evicts the keys. Any user refreshing the seat map sees the seats available again. The dead booking sits in the DB as SEATS_LOCKED; the reconciler finds it on its next sweep and flips it to EXPIRED. No human intervention. No stranded inventory.
What if the TTL fires mid-payment?
Worst case. The gateway already charged the card but the lock is gone. Someone else can now grab the seat, and if they pay first they get it β we now have one happy customer and one customer who will call support.
Mitigations in layered order:
- Extend the lock before charging. In confirmBooking we call extendLock(+2 min) to cover gateway latency.
- Validate the lock before charging. If validation fails, transition to EXPIRED and do not charge.
- Compensating refund. If both layers fail (concurrent eviction races our validate), the reconciler detects a PAYMENT_SUCCEEDED booking whose seats are now booked by someone else, publishes a refund event, and emails the user. This is rare (tail-of-tail territory), but the recovery path must exist and must be designed, not improvised at 3 AM.
What about Redis failover?
Async replication means a SET NX on the primary may not have reached the replica when the primary dies. The replica is promoted; the same seat key may not be set; a second user acquires it. Two lock tokens, same seat.
The DB commit is the last line of defense. confirmBooking flips ShowSeat.bookedByBookingId under a DB transaction. The seat row has a unique constraint that allows only one non-null bookedByBookingId. The second UPDATE fails. The second booking compensates via refund. This is the correctness invariant: Redis gives us throughput, but the DB gives us the final "at most one booking per seat" guarantee.
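A sketch of that last line of defense, assuming a Postgres-style store; show_seats and booked_by_booking_id are illustrative names. The conditional WHERE is the invariant: zero rows updated means another booking committed first, so we compensate instead of double-booking.
async function commitSeat(
  db: { execute(sql: string, params: unknown[]): Promise<{ rowCount: number }> },
  bookingId: string,
  showId: string,
  seatId: string,
): Promise<void> {
  const res = await db.execute(
    `UPDATE show_seats
        SET booked_by_booking_id = $1
      WHERE show_id = $2 AND seat_id = $3
        AND booked_by_booking_id IS NULL`, // claim only if still unclaimed
    [bookingId, showId, seatId],
  );
  if (res.rowCount === 0) {
    // A second lock holder (e.g. post-failover) lands here: refund saga.
    throw new SeatAlreadyBookedError(seatId);
  }
}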
Fairness under flash sale
First-click-wins is a terrible UX for a concert on-sale where 2M people hit the page at the same millisecond. The winning strategy is:
- Admission queue: users enter a queue; only N are admitted to the actual booking page per second. The queue is a Redis sorted set keyed by arrival timestamp (sketched below).
- Bounded concurrency: capacity × 2 admitted at a time; once they lock or abandon, the next batch enters.
- Bot/scalp defenses: CAPTCHA on entry, phone verification before checkout, purchase limits per account.
None of this changes the lock design β it throttles the rate at which lock attempts hit the core. Seat-lock remains the same primitive.
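A minimal admission-queue sketch, assuming a Redis client that exposes the standard sorted-set commands; the AdmissionRedis interface and the once-per-second admit cadence are illustrative.
interface AdmissionRedis {
  zadd(key: string, score: number, member: string): Promise<number>;
  zrange(key: string, start: number, stop: number): Promise<string[]>;
  zrem(key: string, ...members: string[]): Promise<number>;
}

class AdmissionQueue {
  constructor(
    private readonly redis: AdmissionRedis,
    private readonly showId: string,
  ) {}

  private get queueKey(): string {
    return `admit:${this.showId}`;
  }

  /** User hits the on-sale page: enqueue by arrival time. */
  async join(userId: string, nowMs: number): Promise<void> {
    await this.redis.zadd(this.queueKey, nowMs, userId);
  }

  /** Called by a worker once per second: admit the k oldest waiters. */
  async admitBatch(k: number): Promise<string[]> {
    const admitted = await this.redis.zrange(this.queueKey, 0, k - 1);
    if (admitted.length > 0) await this.redis.zrem(this.queueKey, ...admitted);
    return admitted; // hand these users a short-lived "may book" token
  }
}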
Metrics that matter
| Metric | What it tells you |
|---|---|
| lock.acquired.count vs lock.conflict.count | How contested is the show? The ratio predicts churn. |
| lock.expired.count | How many users are abandoning checkout? A spike means a UX regression. |
| confirm.p99_latency | How long from charge to commit? Drives TTL sizing. |
| booking.reconciled_stuck.count | Bookings the reconciler had to rescue. Non-zero is acceptable; trending up is a bug. |
| payment_pending.age.max | Stuck webhooks. Alerts ops before users complain. |
Scale & Extensibility
Handling spikes (concert on-sale)
- Admission queue: users queue via a Redis sorted set; only k per second enter the booking page. Keeps the seat-lock service from being stampeded.
- Request coalescing: the seat-map read is cached per show with a 1 s TTL (sketched below). A million users refreshing during an on-sale cost us one Redis read per second, not a million.
- Read-replica fanout: seat-map reads go to read replicas; locks and bookings go to the primary.
- Backpressure: if lock-attempt latency exceeds a budget, return 503 and let the client retry with backoff rather than melt the lock store.
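A sketch of the coalesced seat-map read from the list above; SeatMap and the injected loader are assumed shapes, not the chapter's API.
type SeatMap = ReadonlyArray<{ seatId: string; state: ShowSeatState }>;

class SeatMapCache {
  private readonly cache = new Map<string, { value: SeatMap; expiresAt: number }>();
  private readonly inflight = new Map<string, Promise<SeatMap>>();

  constructor(
    private readonly loadSeatMap: (showId: string) => Promise<SeatMap>,
    private readonly ttlMs: number = 1_000,
  ) {}

  async get(showId: string, now: number = Date.now()): Promise<SeatMap> {
    const hit = this.cache.get(showId);
    if (hit && hit.expiresAt > now) return hit.value; // fresh within the 1 s window

    // Coalesce: concurrent misses share one loader call instead of stampeding.
    let pending = this.inflight.get(showId);
    if (!pending) {
      pending = this.loadSeatMap(showId).finally(() => this.inflight.delete(showId));
      this.inflight.set(showId, pending);
    }
    const value = await pending;
    this.cache.set(showId, { value, expiresAt: now + this.ttlMs });
    return value;
  }
}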
Sharding
Shard by showId. All seat keys for a show route to the same Redis node (Redis-cluster hash tag: {showId}:seatId). All bookings for a show route to the same DB partition. This keeps the per-show critical section on one machine and avoids cross-shard lock acquisition, which is how you accidentally reinvent 2PC.
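In code the trick is nothing more than the key scheme; a sketch of a cluster-aware variant of the key() helper from the lock service:
// Redis Cluster hashes only the {showId} segment, so every seat key for a
// show lands in the same slot and the multi-key Lua lock script stays legal.
function clusterSeatKey(showId: string, seatId: string): string {
  return `seatlock:{${showId}}:${seatId}`;
}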
Multi-region
Shows are physically regional. A Mumbai show is booked by users in Mumbai. Route by show's city to the nearest region; each region runs its own Redis + DB for its shows. Cross-region consistency is not a problem because bookings are regional. The one cross-region need is user identity (a user traveling from Delhi to Mumbai); keep user identity in a global table with eventual replication.
CDN & caching
- Static metadata (cities, movies, theatres, showtimes) behind a CDN with long TTL + purge on admin update.
- Seat map (stateful) is short-TTL (~1 s) at the app layer; never at the CDN. The seat map is never "served stale": users get their own live read when they click "pick seats."
Async email / notification
Fire-and-forget booking.confirmed event to SNS/Kafka. Email service, analytics, recommendations all subscribe. The booking confirm path returns to the user in tens of milliseconds; notifications are best-effort.
Payment provider failover
Abstract behind IPaymentGateway. Primary gateway (Stripe). Secondary (Razorpay). If the primary returns 5xx or circuit-breaker trips, attempt on secondary with a different idempotency key (different gateway, different namespace). One booking may have two payment attempts in the log; at most one settles because the state machine only accepts PAYMENT_SUCCEEDED from the first acknowledging gateway.
Cancellation and refund
Cancel is a state transition + seat release + event publish. Refund is a separate consumer of the event, calling gateway.refund(paymentId, amount, idem). The refund consumer has its own retry and its own idempotency key. The user sees the cancellation immediately; the refund reflects in 3-5 days, matching user expectations.
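A sketch of that consumer; the event shape is illustrative and the retry/dead-letter wiring is left to the queue runtime.
async function onRefundRequested(
  event: { bookingId: string; paymentId: string; amount: number },
  gateway: IPaymentGateway,
): Promise<void> {
  // Deterministic key: a redelivered event produces the same key, so the
  // gateway dedupes and a double refund is impossible.
  await gateway.refund(event.paymentId, event.amount, `refund:${event.bookingId}`);
}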
New screen formats / seat layouts
The Screen holds a Seat[] template. IMAX-recliner, drive-in, couple-seat ("LoveSeat" category spanning two columns): all just new SeatCategory values and new layout templates. Show.getSeats doesn't care. IPricingStrategy gets a new rule. Nothing else moves.
Waitlist (deferred)
When a show sells out, users join a per-show waitlist. On any seat release (cancellation, or a lock's TTL expiry; lock expiry only ever frees locked seats, never booked ones), pop the next waitlist entry and send a 10-min lock-grant notification. Implement as a Redis list per show plus a fan-out worker on seat-release events.
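A sketch under those assumptions; WaitlistRedis and the notify hook are illustrative, and seat choice is simplified to the single released seat.
interface WaitlistRedis {
  rpush(key: string, member: string): Promise<number>;
  lpop(key: string): Promise<string | null>;
}

async function onSeatReleased(
  showId: string,
  seatId: string,
  redis: WaitlistRedis,
  lockSvc: ISeatLockService,
  notify: (userId: string, token: LockToken) => Promise<void>,
): Promise<void> {
  const userId = await redis.lpop(`waitlist:${showId}`);
  if (!userId) return; // nobody waiting
  // Grant a 10-minute hold; if the user never acts, the TTL frees the
  // seat and the next release event pops the next waiter.
  const token = await lockSvc.lockSeats(
    showId,
    [seatId],
    userId,
    10 * 60 * 1000,
    `waitlist:${showId}:${seatId}:${userId}`,
  );
  await notify(userId, token);
}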
Edge Cases
- User locks seats, closes tab: TTL fires; Redis evicts; the reconciler flips the booking to EXPIRED; the seat is available. No intervention.
- Payment succeeds at gateway, confirm fails: a PAYMENT_PENDING booking with a paymentId. The reconciler polls the gateway, sees SUCCEEDED, runs forceConfirm. User gets the ticket. Worst-case delay: one reconciler cycle (1 min).
- Two users click the same seat simultaneously: the Redis Lua SETNX chain lets exactly one succeed; the other receives SeatUnavailableError with the conflicting seat name, and the UI shows "that seat was just taken, please pick another."
- Show starts while user is in checkout: Show.isBookable(now) gates at the createBooking step. If the user takes so long that the show starts mid-payment, confirm rejects with ShowNoLongerBookableError and triggers a compensating refund.
- Seat removed from screen (maintenance) mid-lock: the admin operation takes a per-show meta-lock, refunds any active locks and bookings involving the seat (compensating saga), then removes it. Rare; requires admin tooling.
- Partial group lock: user requests 4 seats, only 3 are lockable. The Lua script rolls back the 3 acquired and returns the offending seat id. The UI offers "3 of 4 available here, want to pick a different 4th?" or alternative seat blocks.
- Duplicate webhook: the gateway fires PAYMENT_SUCCEEDED twice on a network retry. The handler is keyed by event id; the second handler sees the booking is already CONFIRMED and no-ops. The state machine rejects the transition as well.
- Clock skew: the Redis TTL is authoritative because it's evaluated on Redis's clock; app-server clocks don't matter for eviction. For validateLock(lockId) we use Redis's returned TTL, not Date.now().
- Lock-holder retries createBooking after a network glitch: the idempotency key on createBooking returns the same Booking + LockToken. No double lock.
- Confirm called with a lockToken for someone else's lock: validateLock checks that userId matches. Mismatch throws ForbiddenError.
- Seat renumbered between lock and confirm: shouldn't happen because admin ops take a meta-lock. If it does, confirm fails, the lock is released, and a refund issues if the card was charged.
- Refund idempotency: the refund consumer uses refund:{bookingId} as its idempotency key. Duplicate refund events are no-ops at the gateway.
Follow-up Questions
- Payment service is down for 20 minutes. What happens to in-flight bookings? Bookings stuck in PAYMENT_PENDING pile up; lock TTLs expire; seats release. New bookings fail fast with a circuit-breaker-driven "try again" message. The reconciler backs off exponentially on payment-status polls.
- How do you implement refund? A separate consumer off the booking.refund_requested event. Idempotent call to paymentGw.refund. Retry with exponential backoff on transient failures; dead-letter queue after N attempts for manual intervention.
- Flash sale: 100k people, 1k seats, sells out in 30 s. Design it. Admission queue throttles to ~200/s into the lock service; the seat-selection UI pre-filters unavailable seats to reduce futile lock attempts; short TTL (90 s, no extension) so abandoners don't strand inventory; bot defenses at the queue ingress.
- How do you detect and prevent scalping bots? Device fingerprinting, phone-OTP at checkout, purchase caps per account and per payment instrument, CAPTCHA on queue entry, anomaly detection on booking patterns (same IP, 50 bookings/min).
- How do you test the race conditions? Deterministic unit tests: inject a fake clock and a fake Redis that interleaves operations in a chosen order. Chaos tests: run thousands of concurrent lock attempts against real Redis and assert post-conditions (each seat booked by exactly one booking, or none); see the sketch after this list. Formal methods: if the interviewer presses, mention TLA+ for the state-machine invariants.
- What metric tells you when lock contention is degrading UX? The lock.conflict.count / lock.attempt.count ratio. Also lock.attempt.p99_latency: if it's above ~200 ms you're losing users who abandon before the UI responds.
- How do you evolve from in-memory lock to distributed lock without downtime? Deploy a new ISeatLockService implementation. Shadow mode: for each lock call, call both implementations, compare outcomes, alert on divergence. After a week of clean shadow runs, flip traffic, keep the old implementation as a dark backup for a release cycle, then remove it. No user-visible downtime.
- Why not 2PC across the booking DB and the payment gateway? The gateway doesn't speak XA. Even if it did, 2PC blocks on coordinator failure, which is unacceptable for a user-facing flow. Saga with compensating transactions (refund on confirm-failure) gives better availability with correctness we can reason about.
- How would you support a "book for a friend" flow where payer ≠ seat-holder? Add a heldForUserId field on Booking. The payer-side flow is unchanged. The ticket QR encodes heldForUserId; gate-check validates against that user's ID. The refund goes to the payer.
- What's the failure mode if Redis is unreachable? Booking creation returns 503. Existing bookings in PAYMENT_PENDING / PAYMENT_SUCCEEDED continue (they don't need Redis anymore; the locks have served their purpose). The system degrades gracefully: no new bookings, existing ones complete.
- How would you add seat recommendations ("best available")? A small recommender service reads the seat map, ranks by (center-ness, contiguity, user preference), and returns the top k. Pure read; no effect on the lock path.
- If we discover a seat is defective mid-show, how does the system learn the booking was bad? An admin-triggered compensating event: seat.defective finds the active booking, transitions it to CANCELLED, refunds, and notifies. The state machine tolerates admin-forced cancels.
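As referenced in the testing answer above, a chaos-style sketch of the core invariant check; it assumes any ISeatLockService implementation wired to a real or in-memory Redis.
async function assertNoDoubleLock(lockSvc: ISeatLockService): Promise<void> {
  const attempts = Array.from({ length: 1_000 }, (_, i) =>
    lockSvc
      .lockSeats("show-42", ["F7"], `user-${i}`, 60_000, `idem-${i}`)
      .then(() => true)
      .catch(() => false),
  );
  const results = await Promise.all(attempts);
  const winners = results.filter(Boolean).length;
  // Post-condition: the seat has exactly one owner, never two.
  if (winners !== 1) throw new Error(`invariant violated: ${winners} lock holders`);
}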
SDE2 vs SDE3: How the Bar Rises
| Dimension | SDE2 | SDE3 |
|---|---|---|
| Concurrency strategy | "Lock the seat before booking." Picks Redis or DB locks; explains the basic race. | Compares pessimistic/optimistic/DB/Redis/Redlock with numbers; names Redis failover risk and the DB-uniqueness backstop; argues proportionality. |
| Idempotency | Mentions idempotency keys on the API. | Names three key boundaries (client→service, service→gateway, webhook→service) with distinct scopes and failure analyses for each. |
| TTL reasoning | Picks 5 min, moves on. | Discusses TTL-vs-abandonment tradeoff, TTL extension on payment submit, the mid-payment TTL-expiry edge case and its compensating flow. |
| State machine | Draws the booking states. | Enforces transitions with guard rails; explains why late webhook for EXPIRED booking must be rejected; handles PAYMENT_PENDING reconciliation. |
| Observability | Logs errors. | Names the specific metrics (lock.conflict, payment_pending.age), defines alert thresholds, explains the dashboard they'd build day-1. |
| Saga vs 2PC | Doesn't distinguish. | Explains why 2PC is infeasible (gateway doesn't cooperate), walks the saga with compensation steps, gives the dead-letter plan for failed refunds. |
| Spike handling | "Add caching." | Admission queue + bounded admission + short TTL + bot defenses, with the specific Redis primitives. |
| Testing | Unit tests. | Deterministic race simulation (faked clock + ordered interleaving), chaos tests with assertions on invariants, alludes to formal model when invariant-heavy. |
| Failure modes | Handles happy path. | Enumerates partial-failure scenarios (Redis failover, webhook lost, gateway down) and shows the recovery path for each. |
| System evolution | Ships v1. | Designs the v1 → v2 migration (shadow-mode lock-service swap) with a zero-downtime plan and rollback. |
The gap between SDE2 and SDE3 on this problem isn't knowledge of more patterns; it's the reflex to ask "how does this fail?" for every component and to have a designed recovery path before the system is built. The interviewer isn't looking for someone who can model a theatre. They're looking for someone who can name the seven ways the lock can leak and compensate for each.