
HLD: Flash Sale System

Frequently Asked at Salesforce SMTS: a common HLD problem in recent SMTS interviews (2024-2026).

Understanding the Problem

What is a Flash Sale System?

A flash sale is a merchant campaign where a fixed quantity of a SKU (say 1,000 units) goes on sale at an exact moment (12:00:00 PM sharp) and typically sells out in seconds. The system must absorb 10-100x normal traffic for a 60-120-second burst, decrement inventory atomically (selling exactly 1,000 units, not 1,001 and not 999), apply some notion of fairness to buyers, and, critically, isolate one org's sale from every other org's normal commerce traffic. This is a classic "burst handling + inventory correctness" system design problem that shows up in Salesforce Commerce contexts.

Functional Requirements

Core (above the line):

  1. Schedule a flash sale: a merchant (org) configures a SKU, stock count, start/end time, and optional per-user cap.
  2. Accept buy intents with atomic decrement: oversell is a P0 bug. We must never sell the 1,001st unit.
  3. Virtual waiting room: during overflow, buyers queue fairly, get an ETA, and are admitted at a controlled rate.
  4. Checkout with payment integration: reservation → payment → order. Reservations auto-release on abandon/timeout.
  5. Real-time stock counter on the product page (eventually consistent is fine; < 1 s staleness).

Below the line (out of scope):

  • Marketing funnel attribution: we only record raw conversion; the warehouse handles attribution.
  • Advanced bot detection: we assume an edge provider (Cloudflare, Akamai Bot Manager) handles CAPTCHA and fingerprinting.
  • Post-sale analytics: CDC export to the warehouse handles reports.
  • Cart with multiple SKUs: the flash sale is single-SKU for simplicity.

Non-Functional Requirements

Core:

  • Scale: normal commerce 5k RPS; flash peak 200-500k RPS for 60s.
  • Latency: admission response < 100 ms; checkout < 1 s end-to-end.
  • Consistency: inventory must be strongly consistent (atomic decrement, no oversell). Other pages (product description, reviews) can be eventually consistent.
  • Availability: 99.99% during the sale window. Degrade reads before writes: we can serve a cached page, but we never fail a checkout because of an infra blip.
  • Multi-tenancy: Org A's mega-sale cannot starve Org B's normal checkout.

Below the line:

  • Global fairness across orgs (we do per-sale fairness; cross-org priorities are out of scope).
  • End-to-end exactly-once payments (we rely on payment-provider idempotency keys).

Capacity Estimation

  • 500k RPS × 60 s burst = 30M total requests in the sale window. Plan for 60M with 2x headroom.
  • Admission bucket at 10k users/s admitted into real checkout.
  • Redis cluster target: 1M ops/s plus 20% headroom = 1.2M ops/s → 15-node cluster (each node ~85k ops/s).
  • 1,000 units of stock ÷ 10k admissions/s = sold out in < 1 s if everyone converts. Plan for 3-10% conversion, so ~10k-33k admitted buyers yield 1,000 purchases.
  • Reservation TTL: 120 s. Cleanup via Redis key expiry.
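
The envelope math above is easy to sanity-check in a few lines (all inputs restate this section's assumptions; the ~85k ops/s per Redis node is a planning figure, not a benchmark):

```typescript
// Recompute the envelope numbers for the sale window.
const peakRps = 500_000;          // flash peak
const burstSec = 60;              // burst duration
const burstRequests = peakRps * burstSec;        // 30M requests
const planned = burstRequests * 2;               // 60M with 2x headroom

// Redis sizing: 1M ops/s target plus 20% headroom at ~85k ops/s per node
// (the per-node figure is a planning assumption, not a benchmark).
const redisNodes = Math.ceil((1_000_000 * 1.2) / 85_000);

// Admissions needed to sell 1,000 units at 3-10% conversion.
const stock = 1_000;
const triesAt10pct = Math.ceil(stock / 0.10);
const triesAt3pct = Math.ceil(stock / 0.03);

console.log({ burstRequests, planned, redisNodes, triesAt10pct, triesAt3pct });
// → { burstRequests: 30000000, planned: 60000000, redisNodes: 15,
//     triesAt10pct: 10000, triesAt3pct: 33334 }
```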

The Set Up

Core Entities

  • Organization: orgId, tier (determines shared vs dedicated resources).
  • Sale: saleId, orgId, skuId, stock (initial count), startAt, endAt, perUserCap.
  • Inventory: hot counter in Redis (inv:{saleId}), durable mirror in Postgres.
  • Reservation: reservationId, saleId, userId, expiresAt. Ephemeral, Redis-resident.
  • Order: orderId, saleId, userId, status (pending / paid / refunded). Postgres.
  • WaitingRoomToken: token, saleId, issuedAt, admitAt. Signed JWT scoped to the sale.

The API

POST /v1/orgs/{orgId}/sales                        (merchant creates a sale)
GET  /v1/orgs/{orgId}/sales/{saleId}

POST /v1/sales/{saleId}/waiting-room               → { token, etaSec }
GET  /v1/sales/{saleId}/waiting-room/{token}       → { status: "waiting" | "admitted" }

POST /v1/sales/{saleId}/reserve                    (admitted only)
  → { reservationId, expiresIn }

POST /v1/sales/{saleId}/checkout
  → { orderId }
  • Waiting-room endpoints are public (buyer tokens only; no user auth required beyond a session).
  • Reserve and checkout require both an admitted token and user auth.

High-Level Design

Architecture

               ┌───────────┐   ┌──────────────┐    ┌────────────────┐
  Buyers ─────▶│  CDN +    │──▶│  Waiting     │───▶│ Admission Svc  │
               │  Edge WAF │   │  Room Svc    │    │ (token bucket  │
               └───────────┘   │  (FIFO queue │    │ per sale)      │
                               │  in Redis)   │    └───────┬────────┘
                               └──────────────┘            │
                                                           ▼
                                                   ┌───────────────┐
                                                   │ Reservation   │
                                                   │ Svc (Redis    │
                                                   │ + Lua atomic) │
                                                   └──────┬────────┘
                                                          │
                                                          ▼
                                                   ┌───────────────┐      ┌──────────┐
                                                   │ Checkout Svc  │─────▶│ Payments │
                                                   └──────┬────────┘      └──────────┘
                                                          │
                                                          ▼
                                                   ┌───────────────┐
                                                   │ Postgres      │
                                                   │ (orders,      │
                                                   │ durable inv.) │
                                                   └───────────────┘

End-to-end flow: a buyer in a flash sale

  1. At T-10 minutes, CDN pre-warm caches the product page everywhere; a JS widget handles the live countdown and inventory ticker.
  2. At T+0, users rush in. CDN serves the static page; only the JS widget's inventory poll and the "buy" button POSTs hit origin.
  3. User clicks "Buy." Request hits Edge WAF (Cloudflare/Akamai) for bot filtering.
  4. Request reaches the Waiting Room Service, which issues a signed token with a FIFO position in a Redis list (LPUSH wr:{saleId}).
  5. Client polls /waiting-room/{token} every 2s. Waiting Room Service checks position + current admission rate.
  6. Admission Service drains the queue at 10k/s (RPOP from the tail of the LPUSHed list, gated by a token bucket, so order stays FIFO). When the user is admitted, the Waiting Room updates their token status to admitted.
  7. Client calls POST /reserve. Reservation Service runs an atomic Lua script on Redis:
    • Check the current stock in inv:{saleId}.
    • Check the per-user cap via SISMEMBER on buyers:{saleId}.
    • DECR the inventory, SADD the buyer, and create resv:{saleId}:{resvId} with TTL 120 s.
    • Return reservationId or error code.
  8. Client has 120s to complete payment. Calls POST /checkout with the reservationId.
  9. Checkout Service verifies the reservation is still valid, calls Payments (Stripe/Braintree) with an idempotency key, writes an orders row in Postgres, and confirms the reservation.
  10. Async reconciliation job mirrors Redis inventory changes to Postgres every second for durability.
  11. If the reservation expires without checkout, a Redis keyspace notification triggers a rollback: INCR inv:{saleId} and SREM buyers:{saleId} userId. Expired-key events carry only the key name, so the userId must be recoverable (e.g. from a parallel non-expiring mapping), and because notifications are best-effort, a periodic sweep catches any leaked reservations.
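
Step 6's per-sale token bucket can be sketched as follows. This is a minimal in-memory version for one sale; in production the bucket state would live in Redis next to the queue so every admission node shares it:

```typescript
// Minimal token-bucket admission controller for one sale.
// Refills continuously at ratePerSec, capped at `burst` tokens; each
// admission consumes one token. In-memory sketch only -- production state
// would be a couple of Redis fields updated by the drain loop.
class AdmissionBucket {
  private tokens: number;
  private lastRefillMs: number;

  constructor(private ratePerSec: number, private burst: number, nowMs: number) {
    this.tokens = burst;
    this.lastRefillMs = nowMs;
  }

  // Returns true if one user may be admitted at time nowMs.
  tryAdmit(nowMs: number): boolean {
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// 10k/s admission rate with an instantaneous burst allowance of 100.
const bucket = new AdmissionBucket(10_000, 100, 0);
let admitted = 0;
for (let i = 0; i < 1_000; i++) if (bucket.tryAdmit(0)) admitted++;
console.log(admitted); // 100 -- only the burst allowance gets through at t=0
```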

Data model

  • sales in Postgres: (org_id, sale_id) PK, sku_id, stock_initial, stock_remaining_durable, start_at, end_at, per_user_cap.
  • orders in Postgres: (org_id, sale_id, order_id) PK, user_id, status, payment_ref, created_at.
  • inventory_counter in Redis: inv:{saleId} → integer. Authoritative during the sale.
  • reservations in Redis: resv:{saleId}:{resvId} → userId, TTL 120 s.
  • buyers in Redis: SET buyers:{saleId} contains userId entries for per-user cap.
  • Every second: async job reads Redis stock, writes to stock_remaining_durable in Postgres.

Multi-Tenancy Strategy

Isolation level: L1 shared DB + shared schema for the baseline, with dedicated Redis shards for mega-tenant sales. Choosing L1 for Postgres keeps cost sane at 100k merchant orgs. But the Redis hot path, which drives the entire correctness of the sale, gets dedicated per-mega-tenant shards, because a single sale is a burst event that can evict co-tenants' cached data.

Tenant context flow:

  • orgId is extracted from the buyer's session / merchant's JWT.
  • Tenant context stamped in every log, metric, and trace. Admission tokens are signed with the orgId baked in.
  • Every DB query filters by org_id. Every Redis key carries a {orgId} hash tag (e.g. inv:{orgId}:{saleId}) so Redis Cluster colocates all of an org's sale keys on one slot.
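
A small helper makes the hash-tag convention concrete. Redis Cluster hashes only the text inside the first {...} of a key, so giving every key the same {orgId} tag pins a sale's keys to one slot, which the multi-key Lua script relies on. The helper and key names are illustrative, not from a real client library:

```typescript
// Build Redis keys with a literal {orgId} hash tag. Redis Cluster computes
// the slot from the tag alone, so inv/buyers/resvSeq keys for one org all
// land on the same node and can be touched atomically by one Lua script.
type SaleKeyKind = "inv" | "buyers" | "resvSeq";

function saleKey(kind: SaleKeyKind, orgId: string, saleId: string): string {
  return `${kind}:{${orgId}}:${saleId}`;
}

console.log(saleKey("inv", "org42", "sale7"));    // inv:{org42}:sale7
console.log(saleKey("buyers", "org42", "sale7")); // buyers:{org42}:sale7
```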

Noisy-neighbor mitigations:

  • Per-sale rate limits distinct from per-org rate limits. An org running 3 concurrent sales gets 3 independent capacity budgets.
  • Separate service pool for flash sales vs normal commerce. Even inside the flash-sale pool, traffic hits a shuffle-sharded subset of admission nodes: each sale maps to k=8 of n=100 nodes, so one misbehaving sale cannot take down the whole admission tier.
  • Dedicated Redis shards for Enterprise-tier flash sales. inv:{saleId} keys land on a pod that no other tenant shares.
  • Autoscale ahead of start time. When a sale is scheduled, scheduler proactively warms capacity (k8s HPA with scheduled scaling) to handle 10x normal load at T-5 min.
  • Bulkhead on payments: flash-sale checkouts go through a dedicated payments pool so they cannot saturate the shared payment circuit breakers used by normal commerce.

Per-tenant observability:

  • Per-sale dashboards: arrivals, admitted, reserved, checked out, abandoned, oversells (must always be zero).
  • Per-org metrics labeled with org_id + sale_id.
  • Alerts: oversell > 0 is SEV-1. Reservation leak rate > 10%. Admission queue depth.

Potential Deep Dives

1) How do we atomically decrement inventory without overselling?

Bad Solution: Read-then-write.

  • Approach: SELECT stock FROM inv → check if > 0 → UPDATE inv SET stock = stock - 1.
  • Challenges: Classic race condition. Two concurrent buyers both read stock=1, both update to 0. Oversold. At 500k RPS, this blows up instantly.

Good Solution: Atomic DB update with WHERE clause.

  • Approach: UPDATE inv SET stock = stock - 1 WHERE sale_id = ? AND stock > 0 RETURNING stock. Single SQL statement; DB enforces atomicity.
  • Challenges: Works correctly, but Postgres row-level lock contention on the single hot row is brutal at 500k RPS. Lock wait times dominate. You saturate Postgres.

Great Solution: Redis + Lua atomic script, with bucket sharding for huge sales.

  • Approach: A single Lua script runs atomically on Redis, performing stock check, per-user cap check, decrement, and reservation creation in one step:
```java
// KEYS[1] = inv:{saleId}  ARGV[1] = userId  ARGV[2] = ttlSec
String LUA = """
  local left = tonumber(redis.call('GET', KEYS[1]) or '0')
  if left <= 0 then return -1 end
  if redis.call('SISMEMBER', KEYS[1]..':buyers', ARGV[1]) == 1 then
    return -2   -- already bought (per-user cap)
  end
  redis.call('DECR', KEYS[1])
  redis.call('SADD', KEYS[1]..':buyers', ARGV[1])
  local resv = redis.call('INCR', KEYS[1]..':resvSeq')
  redis.call('SET', KEYS[1]..':resv:'..resv, ARGV[1], 'EX', ARGV[2])
  return resv
""";
Long resv = redis.eval(LUA, List.of("inv:" + saleId), List.of(userId, "120"));
```

```cpp
const char* kLua = R"LUA(
  local left = tonumber(redis.call('GET', KEYS[1]) or '0')
  if left <= 0 then return -1 end
  if redis.call('SISMEMBER', KEYS[1]..':buyers', ARGV[1]) == 1 then return -2 end
  redis.call('DECR', KEYS[1])
  redis.call('SADD', KEYS[1]..':buyers', ARGV[1])
  local resv = redis.call('INCR', KEYS[1]..':resvSeq')
  redis.call('SET', KEYS[1]..':resv:'..resv, ARGV[1], 'EX', ARGV[2])
  return resv
)LUA";
auto reply = redis.Eval(kLua,
    {"inv:" + sale_id},
    {user_id, "120"});
```

```typescript
const lua = `
  local left = tonumber(redis.call('GET', KEYS[1]) or '0')
  if left <= 0 then return -1 end
  if redis.call('SISMEMBER', KEYS[1]..':buyers', ARGV[1]) == 1 then return -2 end
  redis.call('DECR', KEYS[1])
  redis.call('SADD', KEYS[1]..':buyers', ARGV[1])
  local resv = redis.call('INCR', KEYS[1]..':resvSeq')
  redis.call('SET', KEYS[1]..':resv:'..resv, ARGV[1], 'EX', ARGV[2])
  return resv
`;
const resv = await redis.eval(lua, 1, `inv:${saleId}`, userId, "120");
```

For very large sales (stock > 100k units), pre-shard the counter into N buckets: inv:{saleId}:0..N-1. The admission service picks a bucket at random. This removes the single-key hot spot while preserving atomicity per bucket. Total remaining stock = sum across buckets, computed asynchronously for display.
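
The bucket-sharded counter can be sketched in memory. In production each bucket is its own Redis key decremented by the same atomic Lua pattern shown above; here plain numbers stand in for those keys:

```typescript
// In-memory sketch of a bucket-sharded stock counter. Each array slot stands
// in for a Redis key inv:{saleId}:0..N-1; the per-bucket check-then-decrement
// models what the Lua script does atomically per key.
class ShardedCounter {
  private buckets: number[];

  constructor(totalStock: number, nBuckets: number) {
    // Spread stock as evenly as possible across buckets.
    const base = Math.floor(totalStock / nBuckets);
    this.buckets = Array.from({ length: nBuckets }, (_, i) =>
      base + (i < totalStock % nBuckets ? 1 : 0));
  }

  // Try one random bucket; on failure the caller retries (or walks buckets).
  tryDecrement(): boolean {
    const i = Math.floor(Math.random() * this.buckets.length);
    if (this.buckets[i] > 0) {
      this.buckets[i] -= 1;
      return true;
    }
    return false;
  }

  // Display-only total, summed across buckets (asynchronously in production).
  remaining(): number {
    return this.buckets.reduce((a, b) => a + b, 0);
  }
}

const counter = new ShardedCounter(1_000, 8);
let sold = 0;
for (let i = 0; i < 20_000; i++) if (counter.tryDecrement()) sold++;
console.log(sold, counter.remaining()); // never oversells; sells out given enough tries
```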

  • Challenges: Redis is a single point of failure; mitigate with Redis Cluster and replica failover. If Redis goes down mid-sale, we must fail closed (reject buys) rather than oversell. Bucket sharding means a user might see bucket-0 sold out while bucket-1 has stock; a refresh + retry handles this UX issue.

2) How do we prevent cache stampede on the product page?

Bad Solution: Every request reads from Postgres on cache miss.

  • Approach: Standard cache with TTL; on miss, fetch from DB.
  • Challenges: At T+0, every request hits cache miss simultaneously. 500k RPS pierces to Postgres. Origin DB melts.

Good Solution: Pre-warm + single-flight + TTL jitter.

  • Approach: Pre-populate the cache 10 minutes before the sale for all known hot SKUs. On miss, use SETNX as a lock so only one request recomputes; others wait. TTL includes ±10% jitter so not all keys expire together.
  • Challenges: Even with pre-warming, the moment the cache expires mid-sale can spike origin. Single-flight locks can hold many requests briefly.
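
Both mitigations are small enough to sketch. The single-flight version below is in-process; the SETNX-lock variant described above plays the same role across processes (function names are illustrative):

```typescript
// TTL jitter: +/-10% around the base TTL so hot keys don't all expire together.
function jitteredTtl(baseSec: number, rand: () => number = Math.random): number {
  const factor = 0.9 + 0.2 * rand(); // uniform in [0.9, 1.1)
  return Math.round(baseSec * factor);
}

// Single-flight: collapse concurrent misses for one key into one recompute.
const inflight = new Map<string, Promise<string>>();

function singleFlight(key: string, recompute: () => Promise<string>): Promise<string> {
  const existing = inflight.get(key);
  if (existing) return existing;          // someone is already recomputing
  const p = recompute().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}

// Ten concurrent cache misses trigger exactly one recompute.
(async () => {
  let recomputes = 0;
  const fetchPage = async () => { recomputes++; return "<html>...</html>"; };
  const results = await Promise.all(
    Array.from({ length: 10 }, () => singleFlight("sale:1:page", fetchPage)));
  console.log(recomputes, results.length); // 1 10
})();
```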

Great Solution: Static HTML at CDN + live widget hits only inventory counter.

  • Approach:
    • Product page is pre-rendered as static HTML and pushed to the CDN at T-10 min. 99% of requests never reach origin.
    • A small JS widget on the page polls GET /sales/{saleId}/stock (cached at the edge with 1s TTL) to show the live counter.
    • The only origin-hitting path is the "buy" button POST, and that goes through the admission/reservation flow.
    • Cache-Control: stale-while-revalidate=60 so the CDN can serve slightly stale stock while asynchronously refreshing.
    • Origin sees < 1% of total page traffic.
  • Challenges: Static HTML means personalized elements (user-specific recommendations) have to be client-side-rendered post-load. Stock counter drifts ~1s from reality, which is fine for display; the atomic decrement is still truth.

3) How do we enforce fairness in the waiting room?

Bad Solution: First-come-first-served by client-side timer.

  • Approach: "Buy now" button becomes clickable at exactly 12:00:00; first 1,000 clicks win.
  • Challenges: Users with faster networks, pre-warmed connections, and bots dominate. Real users with 4G connections lose every time. Not fair.

Good Solution: Server-side FIFO queue with fixed admission rate.

  • Approach: On arrival, user gets appended to a Redis list. Admission service pops at 10k/s. Each user gets a signed JWT with their position and ETA. Client polls for status.
  • Challenges: Bots can automate the arrival step to land at queue position 1. A sophisticated bot with pre-established TCP connections and a 1-ms-before-open script still wins over a human.

Great Solution: Randomized batching + per-user cap + fairness rule per sale.

  • Approach:
    • Queue accumulates arrivals for a brief window (e.g., 500 ms), then admits a randomized batch from that window. Bots that arrive at T-1ms still have to compete fairly within the window batch.
    • Per-user cap enforced at both admission layer (can't hold 10 tokens for the same user) and reservation layer (can't reserve if already a buyer).
    • Per-sale fairness rule configurable:
      • FIFO for commodity sales where arrival time matters.
      • Lottery for scarce SKUs (Supreme drops, limited-edition shoes): everyone who arrives in the first N seconds has equal shot.
      • Weighted for loyalty programs: gold members get 3x lottery tickets.
    • Tokens are signed JWTs that carry (saleId, userId, admissionRank). Re-use is prevented by a Redis SET of used tokens.
    • Queue state persists across Redis replica failover so a blip doesn't reset positions.
  • Challenges: Lottery frustrates power users who feel they earned first position. FIFO frustrates regular users beaten by bots. Pick per-sale; document the UX clearly. Token distribution attacks (one user registers 1000 accounts) need additional fraud controls.
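
The weighted-lottery rule (gold members get 3x tickets) can be sketched as a draw without replacement over one batch window; the entrant shape and function names are illustrative:

```typescript
// Weighted lottery over one batch window: each entrant holds `weight` tickets
// and winners are drawn without replacement, so one user wins at most once.
// `rand` is injectable for testing; production would use a seeded CSPRNG.
interface Entrant { userId: string; weight: number; }

function drawWinners(entrants: Entrant[], nWinners: number,
                     rand: () => number = Math.random): string[] {
  const pool = entrants.map(e => ({ ...e }));
  const winners: string[] = [];
  while (winners.length < nWinners && pool.length > 0) {
    const total = pool.reduce((sum, e) => sum + e.weight, 0);
    let pick = rand() * total;                       // point on the ticket line
    const i = pool.findIndex(e => (pick -= e.weight) < 0);
    winners.push(pool[i].userId);
    pool.splice(i, 1);                               // without replacement
  }
  return winners;
}

// Gold member holds 3 tickets vs 1 each for standard members.
const batch: Entrant[] = [
  { userId: "gold-1", weight: 3 },
  { userId: "std-1", weight: 1 },
  { userId: "std-2", weight: 1 },
];
console.log(drawWinners(batch, 2));
```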

4) How do we isolate tenants during a mega-sale?

Bad Solution: Shared admission service across all orgs.

  • Approach: One pool of admission nodes for everyone.
  • Challenges: Org X's 500k-RPS sale takes down Org Y's normal checkout traffic. CPU starvation, network saturation, Redis slot contention all hit shared infrastructure.

Good Solution: Feature-flagged dedicated pool for scheduled flash sales.

  • Approach: Ops flags a sale as "large." Scheduler pre-autoscales a dedicated admission pool. Regular commerce stays on its pool.
  • Challenges: Still one pool for all "large" sales: two concurrent mega-sales from different orgs can collide. Pool autoscaling is also reactive; the initial burst may still overwhelm it.

Great Solution: Cell-based architecture with shuffle sharding.

  • Approach:
    • Each cell is a full vertical stack: admission nodes + dedicated Redis cluster + Postgres writer + payment pool. A cell is the unit of blast radius.
    • Orgs map to cells via shuffle sharding: each org assigned to k=4 of n=50 cells. Cell failure costs 4/50 = 8% of capacity, not 100%. Mega-sales from different orgs use disjoint cells with high probability.
    • Per-cell monitoring and per-org-per-cell dashboards. If Org X's sale saturates 3 of its 4 cells, alert fires.
    • Pre-scheduled mega-sales can request a dedicated cell for 2 hours (contractually for Enterprise tier).
    • Graceful degradation: if all cells for an org are saturated, admit to the overflow pool (shared) at reduced SLA rather than dropping.
  • Challenges: A cell-based architecture has operational cost: more clusters to run, more configs, more ops. Data is duplicated for things like SKU metadata. It requires mature deployment automation. Worth it at Salesforce scale.
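
The org-to-cell mapping can be sketched as a hash-seeded partial shuffle that picks k of n cells. The hash and PRNG below are illustrative choices, not Salesforce's actual scheme:

```typescript
// Shuffle sharding: deterministically map each org to k of n cells by
// partially shuffling the cell list with a PRNG seeded from the orgId.
// Two orgs end up on identical cell sets only if their seeds collide, so
// concurrent mega-sales overlap on few (often zero) cells.
function hashSeed(s: string): number {
  let h = 2166136261; // FNV-1a
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return h >>> 0;
}

function cellsForOrg(orgId: string, k: number, n: number): number[] {
  const cells = Array.from({ length: n }, (_, i) => i);
  let seed = hashSeed(orgId);
  // Partial Fisher-Yates driven by a small LCG; only the first k slots matter.
  for (let i = 0; i < k; i++) {
    seed = (Math.imul(seed, 1664525) + 1013904223) >>> 0;
    const j = i + (seed % (n - i));
    [cells[i], cells[j]] = [cells[j], cells[i]];
  }
  return cells.slice(0, k).sort((a, b) => a - b);
}

// Same org always maps to the same 4 of 50 cells.
const assigned = cellsForOrg("org-A", 4, 50);
console.log(assigned, cellsForOrg("org-A", 4, 50)); // identical both times
```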

5) How do we reconcile inventory between Redis and Postgres?

Bad Solution: Redis-only, never persist.

  • Approach: Keep the hot counter in Redis for the duration of the sale.
  • Challenges: A Redis crash loses the current stock state. Worst case: Redis fails over to a replica lagging by 200 ms, and we lose the most recent 200 ms of sales.

Good Solution: Write-through to Postgres on every decrement.

  • Approach: Lua script decrements Redis and also writes the new value to Postgres in the same logical operation.
  • Challenges: Cannot be truly atomic across Redis + Postgres without a distributed transaction. Write amplification kills throughput.

Great Solution: Redis authoritative + async 1s reconciliation + post-sale replay.

  • Approach:
    • Redis is authoritative during the sale window.
    • Async reconciler job reads Redis stock every 1 second and writes to Postgres stock_remaining_durable. RPO = 1 second.
    • All reservation-creation events emit a Kafka message with (saleId, userId, resvId, timestamp) to sales.events. Durable log; even if Redis dies, we can replay.
    • After the sale closes, a final reconciliation compares event-log sum to durable stock and flags any discrepancies. In practice, this is zero if the Lua script is correct.
    • If Redis fails mid-sale, we pause admissions (fail closed), restore from replica + event log, and resume. Observable downtime: 10-30s; oversells: 0.
  • Challenges: The event log adds throughput load. Reconciler lag means durable state is always ~1 s behind real time, which is acceptable for post-sale analytics. Replay logic for Redis recovery is complex and must be tested with chaos drills.

What is Expected at Each Level?

Mid-level (SMTS-junior)

Correct atomic decrement (at least the Good DB-level solution). Basic waiting room with FIFO. Clear "no oversell" property. Can be prompted for cache stampede and cell isolation.

Senior (SMTS / LMTS) ​

Redis Lua atomicity. Per-user cap enforced at the Lua level. Cache stampede mitigation via pre-warming + static HTML. Payment reconciliation with idempotency keys. Per-sale rate limits. Back-of-envelope for Redis cluster sizing.

Staff+ (PMTS)

Cell-based architecture with shuffle sharding. Fairness math (lottery vs FIFO vs weighted). Pre-warming and autoscale schedules. Graceful degradation plan. Post-mortem-friendly observability (every oversell is a SEV-1 with full audit trail). Cost model (dedicated cells vs shared capacity). Chaos testing plan.


Salesforce-Specific Considerations

  • Product analog: Salesforce B2C Commerce / SFRA (Storefront Reference Architecture) flash campaigns.
  • Governor-limit parallel: each checkout transaction should stay within a bounded number of DB writes. Expensive post-order work (email confirmations, analytics events) goes async via Platform Events. Mirrors Salesforce's "keep sync short, push async" philosophy.
  • Platform Events: emit SaleInventoryLow__e when stock drops below a threshold and OrderPlaced__e on successful checkout. Other parts of the platform (analytics, Flow triggers) subscribe without adding latency to the critical path.
  • Shield Event Monitoring: every admission and reservation is logged for compliance audit; this becomes the tamper-evident record of who bought what at what time.
  • Hyperforce data residency: the sale's home region is the merchant org's region; buyers from other regions are served via CDN but their checkout POSTs land in the home region.
  • Governor-style fairness: the core Salesforce philosophy of "cap the worst case rather than optimize the best" shows up here as our fairness lottery and per-user caps. We would rather frustrate a power user than let a bot cartel buy out the whole sale.

Example snippet: async reconciler

```java
@Scheduled(fixedDelay = 1000)
public void reconcile() {
    for (String saleId : activeSales()) {
        Long redisStock = redis.getLong("inv:" + saleId);
        if (redisStock != null) {
            jdbc.update(
              "UPDATE sales SET stock_remaining_durable = ? WHERE sale_id = ?",
              redisStock, saleId);
        }
    }
}
```

```cpp
void Reconciler::Tick() {
  for (const auto& sale_id : ActiveSales()) {
    auto stock = redis_.GetLong("inv:" + sale_id);
    if (stock.has_value()) {
      db_.Execute(
        "UPDATE sales SET stock_remaining_durable = $1 WHERE sale_id = $2",
        *stock, sale_id);
    }
  }
}
```

```typescript
setInterval(async () => {
  for (const saleId of await activeSales()) {
    const stock = await redis.get(`inv:${saleId}`);
    if (stock !== null) {
      await db.query(
        "UPDATE sales SET stock_remaining_durable = $1 WHERE sale_id = $2",
        [Number(stock), saleId]
      );
    }
  }
}, 1000);
```
