System Design (HLD) — Salesforce SMTS
Salesforce system-design rounds blend classic web-scale problems with multi-tenant SaaS flavour. Even frontend-leaning HLDs are expected to cover capacity, API shape, data model, and at least one deep-dive with tradeoffs. This file gives you the framework, a multi-tenancy primer, then eight detailed walkthroughs.
System Design Framework for Salesforce
Follow this sequence every time. Roughly twenty-five minutes of the hour goes into everything before the deep dives.
- Clarify requirements (5 min) — functional + non-functional. Ask about tenants, offline, internationalisation, analytics vs OLTP. Six to eight pointed questions.
- Capacity estimation (3 min) — users, QPS peak vs average, storage growth, bandwidth. Numbers must be defensible, not precise.
- Core entities (3 min) — top-level nouns, cardinalities.
- API design (5 min) — REST / GraphQL / WebSocket. Sketch 3-5 key endpoints with request / response shapes.
- Architecture (5 min) — client, edge (CDN, WAF), gateway, services, queues, stores, caches. Draw a block diagram.
- Data model (3 min) — SQL vs NoSQL vs blob vs time-series. Tenant-ID as partition key where relevant.
- Multi-tenancy (3 min) — isolation level, noisy-neighbour mitigation, per-tenant quotas.
- Deep dives (15 min) — interviewer picks one to three. Present Bad → Good → Great progression.
- Operability and risk (3 min) — observability, rollout, failure modes.
Multi-Tenancy Patterns Primer
Salesforce is the canonical multi-tenant SaaS, so reaching for the right isolation model is table stakes. There are three classical levels:
| Level | Shape | Pros | Cons | When |
|---|---|---|---|---|
| Shared DB, shared schema | Tenant-ID column on every row | Cheapest, densest | Noisy neighbour, hard per-tenant backups | Default for most SaaS, Salesforce Platform |
| Shared DB, separate schema | Schema per tenant | Easier per-tenant schema evolution, backups | Schema migrations explode with tenant count | Mid-market SaaS |
| Separate DB | DB per tenant | Strong isolation, regulated data | Costly, operationally heavy | Enterprise / healthcare / finance tier |
Rules of thumb:
- Tenant-ID goes in the primary-key prefix and in every index. Never let a query run without it.
- Enforce at the framework level, not the application level — a missing `WHERE tenant_id = ?` is a data breach. Use row-level security or a data-access gateway (see the sketch after this list).
- Cache keys always start with tenant-ID. Two tenants' `user:42` must not collide.
- Quotas (QPS, storage, compute) per tenant prevent noisy neighbours. Enforce at the gateway.
- Hot-tenant isolation: the top 1% of tenants often drive 50% of load. Give them their own pool or shard.
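These rules compress into a small amount of framework code. A minimal sketch, assuming a generic SQL pool — `TenantScopedDb` and `cacheKey` are illustrative names, not a real library:

```ts
// Illustrative framework-level tenant scoping: queries and cache keys cannot be
// built without a tenant_id, so a missing WHERE clause is impossible by construction.
interface SqlPool { query(sql: string, params: unknown[]): Promise<unknown[]>; }

class TenantScopedDb {
  constructor(private tenantId: string, private pool: SqlPool) {}

  // Caller's SQL must select tenant_id and number its own params from $2 up;
  // $1 is always the tenant and is always applied.
  query(sql: string, params: unknown[] = []): Promise<unknown[]> {
    return this.pool.query(`SELECT * FROM (${sql}) t WHERE t.tenant_id = $1`, [this.tenantId, ...params]);
  }
}

// Cache keys are always tenant-prefixed: tenant A's user:42 can never collide with tenant B's.
const cacheKey = (tenantId: string, key: string) => `tenant:${tenantId}:${key}`;
```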
Salesforce-Specific Context
Interviewers will probe these themes even on "generic" system design questions:
- Org-level customisation — each tenant (org) can define custom fields, custom objects, custom validation rules. Your data model should explain how you support this without schema migrations. Hint: a single `custom_fields` table with `(tenant_id, object_id, field_key, value)` rows, or EAV, or a wide sparse table.
- Governor limits — Salesforce ships limits (per-tenant API calls, DB rows returned). Mention them; show you'd build similar caps.
- Trust and audit — every write should produce an audit event. Retention and tamper-evident logs matter.
- Global with data residency — EU / US / APAC data residency. Pick partition strategy accordingly.
1. Design WhatsApp Web
Reported Frequency: Nov 2025 SMTS Frontend HLD round.
Problem
Build the WhatsApp web client backed by a messaging service. Real-time 1:1 and small group chat. Offline send with a local queue. Read receipts and typing indicators. Multi-device sync when the phone is the source of truth. Works in a single tab; optimistic UI on send.
Clarifying Questions
- You: "Is the phone still the source of truth, or is it a first-class device?" — Interviewer: "First-class. Both can go offline."
- You: "Group size cap?" — Interviewer: "Up to 256 for v1."
- You: "Voice / video / media?" — Interviewer: "Text + media. No voice."
- You: "E2E encryption?" — Interviewer: "Yes, but you can treat the crypto layer as a black box."
- You: "Retention?" — Interviewer: "Server keeps messages until delivered to all recipient devices, then purges."
- You: "Scale — DAU?" — Interviewer: "2 billion DAU globally, but design for a single region first."
- You: "Delivery guarantee?" — Interviewer: "At-least-once with de-dup on client."
- You: "Cursor position and typing indicator scale?" — Interviewer: "Yes, typing indicator must fan out."
Functional Requirements
- Send / receive text and media messages.
- Delivery states: Sent → Delivered → Read.
- Typing indicators.
- Offline queue on client; flush on reconnect.
- Multi-device sync.
- Group chat up to 256.
- Search message history.
Non-Functional Requirements
- P99 message delivery: under 500 ms at steady state.
- Availability: 99.99% for the messaging path.
- Durability: no message loss after server ACK.
- Scale: 1M concurrent connections per messaging pod.
- Privacy: E2E encryption, server cannot read plaintext.
Capacity Estimation
- 2B DAU, peak 100M concurrent connections → ~1,000 pods at 100k connections each (the 1M-connections-per-pod NFR is a ceiling; smaller pods limit blast radius).
- 50 messages/user/day → 100B messages/day → ~1.2M msg/s average, 5M msg/s peak.
- Avg message 200 B encrypted → ~20 TB/day of text; media adds an order of magnitude on top.
- Retention until delivered: a ~7-day buffer bounds the text store at ~140 TB worst-case, but most messages are delivered and purged within seconds, so the working set is far smaller. Media goes to blob storage with its own lifecycle.
Core Entities
User, Device, Conversation, Message, DeliveryReceipt, ConversationMember.
High-Level Architecture
Browser (React)
└── Service Worker (offline queue, background sync)
└── IndexedDB (message history cache)
│
▼ WSS
[Edge WebSocket Gateway] ── terminates TLS, pins to pod
│
▼ (gRPC)
[Message Router] ── sharded by conversation_id
│ ├── [Outbox Kafka] ── durable log
│ ├── [Presence Service] ── Redis
│ └── [Push Service] ── FCM / APNs
▼
[Message Store] ── Cassandra, partitioned by conversation_id
[Media Store] ── S3 + CDN (signed URLs)
[Search Index] ── Elasticsearch per user, E2E-encrypted tokens
API Design
REST for control plane, WebSocket for data plane.
// Control plane (HTTPS)
POST /v1/conversations { memberIds: string[] } -> { id }
GET /v1/conversations/:id/messages?before=t -> Message[]
POST /v1/media/upload multipart -> { mediaId, cdnUrl }
// Data plane (WSS, framed)
// client -> server
{ type: "SEND", clientId: "uuid", convId, ciphertext, sentAt }
{ type: "DELIVERED", convId, messageId }
{ type: "READ", convId, messageId }
{ type: "TYPING", convId, on: boolean }
// server -> client
{ type: "ACK", clientId, serverId, serverTs }
{ type: "DELIVER", convId, message: {...} }
{ type: "RECEIPT", convId, messageId, by: userId, state: "DELIVERED" | "READ" }Data Model
Cassandra, partitioned by conversation_id with clustering on (server_ts, message_id):
CREATE TABLE messages (
conv_id uuid,
server_ts timestamp,
msg_id timeuuid,
sender_id uuid,
ciphertext blob,
media_ids list<uuid>,
PRIMARY KEY ((conv_id), server_ts, msg_id)
) WITH CLUSTERING ORDER BY (server_ts DESC);
Receipts in a separate table keyed by `(conv_id, msg_id)` with user + state columns. Delivery state is a per-message set of `(device_id, state)` tuples.
Multi-Tenancy Strategy
WhatsApp isn't multi-tenant in the SaaS sense, but partitioning by conv_id gives you the same noisy-neighbour protection. A single hot group chat must be shardable: introduce sub-shards `(conv_id, shard_no)` once a conversation exceeds a QPS threshold, as sketched below.
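A sketch of that routing decision; the shard count and the hotness signal are assumed values:

```ts
// Hypothetical sub-sharding for hot conversations: spread writes across N
// sub-partitions, keyed by sender so one sender's messages stay ordered.
const SUB_SHARDS = 4; // assumed fan-out once a conversation is flagged hot

function partitionKey(convId: string, senderId: string, isHot: boolean): string {
  if (!isHot) return convId; // normal conversations keep a single partition
  return `${convId}:${hash(senderId) % SUB_SHARDS}`;
}

function hash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
  return Math.abs(h);
}
// Readers fetch all sub-partitions and merge-sort by server_ts.
```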
Deep Dives
1. Connection management
- Bad: every client opens a long WebSocket to a single origin. A pod restart disconnects millions.
- Good: edge gateway pool with sticky session via user_id hashing. Health-checked; clients reconnect with exponential backoff + jitter. Resume token returned on first connect lets the gateway skip re-auth on reconnect.
- Great: edge also maintains a durable subscription table in Redis (`user_id → pod_id`). On pod restart, the router replays undelivered messages from Kafka since the client's last acked offset. Client sends `SYNC { lastSeenServerId }` on reconnect.
2. Offline queue
- Bad: keep outbox in memory; on refresh the queue is lost.
- Good: IndexedDB outbox + Service Worker Background Sync. Each message has a `clientId`. On reconnect the SW flushes in order and de-dups by `clientId` when the server echoes it back (see the flush sketch below).
- Great: outbox entries carry monotonic local sequence numbers so the server can preserve client ordering even when it receives a burst out of order. If the app is killed mid-upload, the SW resumes partial media uploads via a resumable-upload protocol such as tus.
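A sketch of the reconnect flush; `OutboxStore` stands in for the IndexedDB object store, and the frame shape follows the data-plane protocol above:

```ts
interface OutboxEntry { clientId: string; convId: string; ciphertext: string; localSeq: number; }
interface OutboxStore {
  allInOrder(): Promise<OutboxEntry[]>;     // ordered by localSeq
  remove(clientId: string): Promise<void>;
}

// Flush pending messages in local order; skip anything the server has already
// ACKed so at-least-once delivery still de-dups to exactly-once at the UI.
async function flushOutbox(store: OutboxStore, send: (frame: object) => void, acked: Set<string>): Promise<void> {
  for (const entry of await store.allInOrder()) {
    if (acked.has(entry.clientId)) { await store.remove(entry.clientId); continue; }
    send({ type: "SEND", clientId: entry.clientId, convId: entry.convId, ciphertext: entry.ciphertext, sentAt: Date.now() });
  }
}
// On each server ACK { clientId }, add it to `acked` and remove the outbox entry.
```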
3. Multi-device sync
- Bad: each device treats its inbox independently; users see mismatched unread badges.
- Good: server maintains per-device high-water marks. A READ receipt from one device moves the mark and fans out to the user's other devices.
- Great: with E2E encryption, the server can't read content but can still route. Each device has its own key; sender encrypts the message once per recipient device (Signal protocol sender keys). Key rotation on device-add / revoke.
Frontend Considerations
- Component tree: `ChatList` (virtualised) / `Conversation` (virtualised messages) / `Composer`. Virtualisation is critical for 10k-message threads.
- State: conversation metadata in Zustand or Redux Toolkit; messages in a per-conversation slice keyed by `convId`. Do not put all messages in one flat map; re-renders explode.
- Optimistic send: append a pending message with `clientId` to the slice. On `ACK`, patch with `serverId` + `serverTs` and re-sort. On timeout, mark as failed with a retry action.
- Typing indicator: throttled at 1 Hz, with a 3 s inactivity timeout (sketch after this list).
- A11y: `aria-live="polite"` for incoming messages, focus trap in composer on reply.
- Perf: message list uses `react-window`; media thumbnails lazy-load via IntersectionObserver; images use `<img loading="lazy">` and `decoding="async"`.
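The typing indicator from the list above fits in a few lines; a minimal sketch:

```ts
// At most one TYPING frame per second, plus an automatic "off" after 3 s of silence.
function createTypingEmitter(send: (on: boolean) => void) {
  let lastSent = 0;
  let offTimer: ReturnType<typeof setTimeout> | undefined;
  return function onKeystroke(): void {
    const now = Date.now();
    if (now - lastSent >= 1000) { send(true); lastSent = now; } // 1 Hz throttle
    clearTimeout(offTimer);
    offTimer = setTimeout(() => send(false), 3000);             // inactivity timeout
  };
}
```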
What's Expected at Each Level
- SMTS (IC4): full architecture, at least two deep dives, explicit offline + multi-device story, crisp frontend state model.
- Lead (IC5): capacity numbers that match real-world systems, trade-offs between CRDT and server-authoritative ordering, pod failure recovery.
- Staff (IC6): key rotation at scale, regional failover, privacy + legal boundaries.
2. Design Google Maps
Reported Frequency: Nov 2025 SMTS Frontend HLD round, often paired with "render a 10k-point dataset smoothly".
Problem
Frontend-heavy. Build a pan-and-zoom map that renders tiles over a large geographic area, supports markers, clustering, and a search-autocomplete for places. Backend serves tiles and routing.
Clarifying Questions
- You: "Raster or vector tiles?" — Interviewer: "Both — talk about trade-offs and pick one."
- You: "Offline support?" — Interviewer: "Nice to have, cache the last viewport."
- You: "Is routing a requirement?" — Interviewer: "Yes, turn-by-turn for driving."
- You: "How many markers on screen at peak?" — Interviewer: "Can be 100k for a business search."
- You: "Mobile or desktop?" — Interviewer: "Desktop primary, but think about touch gestures."
- You: "Do we support overlays like traffic?" — Interviewer: "Yes, as an opt-in layer."
Functional Requirements
- Pan, zoom, rotate.
- Display tiles for the viewport.
- Place markers and clusters.
- Search autocomplete.
- Routing between two points.
- Traffic overlay layer.
Non-Functional Requirements
- 60 fps pan / zoom on mid-tier laptops.
- Tile load under 200 ms P95 from cache.
- First meaningful paint under 1.5 s on 4G.
- Bundle under 250 KB gzipped core.
Capacity Estimation
- World, zoom 0–22. Tile count ≈ 4^22 ≈ 17 trillion at max zoom — pre-render only popular zooms, generate the rest on demand.
- Average viewport 16 tiles. With 100M DAU, ~1.6B tile requests/day → ~20k QPS average, 100k peak. CDN absorbs ~99%.
Core Entities
Tile(x, y, z), Viewport, Marker, Cluster, Place, Route.
High-Level Architecture
Browser (Canvas / WebGL)
├── Tile Cache (Map<tileKey, ImageBitmap>) + IndexedDB
├── Marker Quadtree
└── Gesture Controller (momentum, pinch)
│
▼ HTTPS
[CDN] ── Tile origin miss → [Tile Service]
[Places Service] ── autocomplete → Elasticsearch
[Routing Service] ── contraction hierarchies → PostgreSQL + pgRouting
[Traffic Service] ── live aggregated speeds → Redis Streams
API Design
GET /tiles/{layer}/{z}/{x}/{y}.pbf // vector tile
GET /tiles/{layer}/{z}/{x}/{y}.png // raster fallback
GET /places/autocomplete?q=coffee&near=lat,lng // ranked results
GET /routes?from=a,b&to=c,d&mode=drive // polyline + turn list
GET /traffic/segments?bbox=... // sparse speed deltas
Data Model
- Tiles: S3 + CDN, `{layer}/{z}/{x}/{y}.pbf`. Vector tiles are Protocol-Buffer-encoded feature collections. Content-addressed by version, cache-forever with a new version on data refresh.
- Places: Elasticsearch index with geohash prefixes for proximity boost, n-gram tokenizer for prefix autocomplete.
- Routing graph: PostGIS + custom adjacency. Pre-computed contraction hierarchies for fast queries.
Vector vs Raster Tiles
| | Raster | Vector |
|---|---|---|
| Rendering | Image blit | Parse + draw on GPU |
| Styling | Fixed per tile | Restyle on client, dark mode, zoom interpolation |
| File size | 20–40 KB each | 10–30 KB each |
| CPU | Low | Higher, but amortised |
| Choice | Legacy / simple | Modern / preferred (Mapbox GL, MapLibre) |
Pick vector — smoother zoom interpolation between levels, smaller payloads, and restyling without re-fetching.
Deep Dives
1. Tile loading performance
- Bad: fetch tiles as `<img>` on each pan step. DNS + HTTP overhead tanks FPS.
- Good: precompute tile keys for the current viewport plus one buffer ring (sketch below). Cache in memory by `tileKey`. Use HTTP/2 for multiplexed requests. LRU eviction capped at ~500 tiles.
- Great: a WebWorker parses vector tiles off the main thread. Progressive rendering — draw the lower-zoom tile under the higher-zoom tile so the user sees something immediately. Prefetch tiles in the direction of pan momentum. A Service Worker caches tiles per version in Cache Storage.
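A sketch of the tile-key precomputation for the Good tier, using standard web-mercator z/x/y math:

```ts
// Convert lon/lat (degrees) to a slippy-map tile coordinate at zoom z.
function lonLatToTile(lon: number, lat: number, z: number): { x: number; y: number } {
  const n = 2 ** z;
  const latRad = (lat * Math.PI) / 180;
  return {
    x: Math.floor(((lon + 180) / 360) * n),
    y: Math.floor(((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n),
  };
}

// All tile keys covering the viewport plus one buffer ring, ready for the
// LRU cache lookup / fetch pipeline.
function viewportTileKeys(b: { west: number; south: number; east: number; north: number }, z: number, buffer = 1): string[] {
  const min = lonLatToTile(b.west, b.north, z); // top-left tile
  const max = lonLatToTile(b.east, b.south, z); // bottom-right tile
  const keys: string[] = [];
  for (let x = min.x - buffer; x <= max.x + buffer; x++)
    for (let y = min.y - buffer; y <= max.y + buffer; y++)
      keys.push(`${z}/${x}/${y}`);
  return keys;
}
```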
2. Infinite canvas / gestures
- Bad: listen to `wheel` and `mousedown` and translate a `div` on every event. Layout thrash, jank.
- Good: canvas or WebGL. Gesture controller integrates velocity; a rAF loop applies the current transform. Hit-testing via a quadtree.
- Great: GPU transforms only; never trigger layout. Decouple input handling from the render loop through a single shared transform. At zoom transitions, cross-fade old and new tile sets over 150 ms for subjective smoothness.
3. Marker clustering at 100k scale
- Bad: render all markers. Browser dies at 10k DOM nodes.
- Good: Supercluster-style grid clustering. Compute clusters per zoom at ingest time; client consumes pre-clustered GeoJSON.
- Great: GPU instancing on WebGL — one draw call for all markers. Clustering happens in a WebWorker; the main thread only renders. Incremental updates on pan so only the new viewport region gets re-clustered.
Frontend Considerations
- Rendering: WebGL via MapLibre / Mapbox GL for vector; fallback to canvas for low-end devices.
- State: React for chrome (side panel, search box), imperative canvas for map. Do not put tile state in React state — re-renders will kill frame budget.
- Gestures: PointerEvents, momentum via requestAnimationFrame. Pinch-to-zoom on touch.
- Perf budgets: 16.67 ms per frame; profile with Performance panel Frame Rendering Stats.
- A11y: keyboard pan (arrow keys), zoom (+/−), list view fallback for screen readers.
What's Expected at Each Level
- SMTS: tile cache strategy, gesture perf, vector tile reasoning, marker clustering.
- Lead: WebWorker architecture, progressive rendering, CDN versioning.
- Staff: data freshness, global edge topology, cost of pre-rendering.
3. Design Notification Service
Reported Frequency: Very common. Pure backend HLD.
Problem
Multi-channel notification service: email, SMS, push (FCM/APNs), in-app. Per-user preferences per channel per category. Rate limits per user. Durable, idempotent, retryable. Multi-tenant.
Clarifying Questions
- You: "Who's the caller — internal services or external?" — Interviewer: "Internal services publish notification intents."
- You: "Latency target?" — Interviewer: "Transactional in 10s P99, marketing can be minutes."
- You: "Do we de-dup?" — Interviewer: "Yes, based on idempotency key."
- You: "Templates?" — Interviewer: "Server-rendered with variables."
- You: "Opt-out granularity?" — Interviewer: "Per channel per category per tenant."
- You: "Scale?" — Interviewer: "1B notifications/day at steady state."
Functional Requirements
- Accept notification intents with recipient, template, channel preferences, priority.
- Resolve recipient preferences.
- Render templates.
- Dispatch through channel providers.
- Retry with backoff.
- De-dup by idempotency key.
- Observability — delivery state queryable.
Non-Functional Requirements
- 1B notifications/day → ~12k/s average, 60k/s peak.
- Transactional P99 < 10s end-to-end.
- At-least-once delivery with de-dup.
- Tenant isolation.
Capacity Estimation
- 60k/s peak × 1 KB payload = 60 MB/s.
- Kafka throughput: ~500 MB/s per broker; easily fits in a modest cluster.
- Storage of delivery state: 1B rows/day × 200 B = 200 GB/day. 30-day retention = 6 TB, then archive.
Core Entities
NotificationIntent, UserPreferences, Template, DeliveryAttempt, Channel.
High-Level Architecture
Producer Services
│ (POST /notify)
▼
[Ingest API] ── validate, idempotency check (Redis), enqueue
│
▼
[Kafka: notifications]
partitions keyed by (tenant_id, user_id) for ordering
│
▼
[Dispatcher Workers] ── per channel
├── resolve prefs (PrefsService, cached)
├── render template (TemplateService)
├── apply rate limit (Redis token bucket)
├── call channel provider (SES, Twilio, FCM, APNs)
└── write DeliveryAttempt (Postgres + Kafka: delivery_log)
│
retry ◄──┤ failed (transient)
DLQ ◄────┘ failed (permanent)
API Design
POST /v1/notifications
Headers: Idempotency-Key: <uuid>, X-Tenant-Id: <uuid>
Body: {
recipient: { userId: string } | { email: string } | { phone: string },
template: { id: string, data: Record<string, any> },
channels: ("EMAIL" | "SMS" | "PUSH" | "INAPP")[],
category: string, // "transactional" | "marketing" | ...
priority: "HIGH" | "NORMAL" | "LOW",
scheduleAt?: string
}
-> 202 Accepted { notificationId }
GET /v1/notifications/:id -> { state, attempts: DeliveryAttempt[] }
PUT /v1/preferences/:userId -> { channels: { EMAIL: { marketing: false, ... } } }
Data Model
- Notifications (Postgres, partitioned by tenant_id, month): id, tenant_id, user_id, category, priority, state, created_at.
- DeliveryAttempts (same partitioning): notification_id, channel, provider_id, state, attempt_no, error_code, attempted_at.
- UserPreferences (DynamoDB or Postgres JSONB): tenant_id + user_id → {channel: {category: bool}}.
- IdempotencyKeys (Redis, 24h TTL): `tenant_id:idem_key → notification_id`.
Multi-Tenancy Strategy
- Kafka partition key `hash(tenant_id, user_id)` so one user's notifications stay in order and isolated.
- Per-tenant Kafka quotas so a single tenant can't flood the cluster.
- Rate limiter keyed by `(tenant_id, user_id, channel)` — see the token-bucket sketch after this list.
- DB rows all carry tenant_id; row-level security enforces scoping on queries.
- Separate "priority" topic or partition ranges for tenants with enterprise SLAs.
Deep Dives
1. Fan-out strategy for push to all devices
- Bad: a single intent with 10k recipients serialises through one worker.
- Good: fan-out at ingest — expand recipient group into N intents, each with one recipient, published to Kafka. Parallelism scales with partitions.
- Great: tiered fan-out: intent → batch job → per-recipient messages. Batch job is idempotent (Flink or Spark), checkpointed. For marketing blasts, use a shuffler to avoid hotspotting single users across retries.
2. Idempotency
- Bad: check-then-insert; races produce duplicates.
- Good: Redis `SET NX` with key `tenant:idem` and 24h TTL. If the key exists, return the stored notification_id (sketch below).
- Great: two-level check — Redis in the hot path, a Postgres unique constraint as the last line of defense. On Redis outage, fall back to the DB. Store a hash of the rendered body so a re-send with a modified payload is detected.
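A sketch of the Good tier, assuming an ioredis client and an illustrative `idem:` key prefix:

```ts
import Redis from "ioredis";

const redis = new Redis();

// SET ... NX EX atomically claims the key; a null reply means another request
// already claimed it, so we surface the original notification id instead.
async function claimIdempotencyKey(tenantId: string, idemKey: string, newId: string): Promise<string> {
  const key = `idem:${tenantId}:${idemKey}`;
  const claimed = await redis.set(key, newId, "EX", 24 * 60 * 60, "NX");
  if (claimed === "OK") return newId;        // first request — proceed to enqueue
  return (await redis.get(key)) ?? newId;    // duplicate — return the stored id
}
```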
3. Priority and starvation
- Bad: single queue; low-priority marketing blocks transactional OTPs when producer misbehaves.
- Good: separate Kafka topics per priority. Dispatcher pool per topic.
- Great: weighted pool: high-priority workers can temporarily steal low-priority capacity but not vice versa. Back-pressure signals to ingest when HIGH topic lag exceeds budget.
What's Expected at Each Level
- SMTS: clean layered architecture, idempotency, rate limiting, retries with DLQ, template rendering.
- Lead: priority lanes, fan-out strategy, provider failover (SES → SendGrid).
- Staff: global delivery, compliance (opt-out enforcement at ingest), cost optimisation.
4. Design Google Docs (Collaborative Editor)
Reported Frequency: Common. Often run like a pair-programming session on the HLD — they'll ask about OT/CRDT in depth.
Problem
Real-time collaborative text editor. Multiple users edit the same document with sub-second convergence. Cursor presence. Conflict resolution. Offline editing. Version history.
Clarifying Questions
- You: "Rich text or plain?" — Interviewer: "Rich text with basic formatting."
- You: "How many concurrent editors per doc?" — Interviewer: "Up to 100 typical, 200 max."
- You: "Offline edits?" — Interviewer: "Yes, merge on reconnect."
- You: "Version history granularity?" — Interviewer: "Every 30s auto-snapshot, plus named versions."
- You: "Access control?" — Interviewer: "Per-doc ACL: owner, editor, commenter, viewer."
- You: "E2E encryption?" — Interviewer: "No, server sees content."
Functional Requirements
- Collaborative edit with sub-second convergence.
- Cursor + selection presence.
- Undo / redo local.
- Version history with named versions.
- Offline edit + merge.
- Comments (threaded).
Non-Functional Requirements
- Latency: 99% of operations under 200 ms end-to-end.
- Durability: no edit loss after server ACK.
- Scale: 10M docs, 100k concurrent docs active.
Capacity Estimation
- 100k concurrent docs × 50 ops/min avg = 83k ops/s average, ~500k peak during business hours.
- Each op ≈ 100 B. 50 MB/s of op traffic.
- Snapshots every 30s: 100k × 2/min × 20 KB = 67 MB/s, ~6 TB/day before compression.
Core Entities
Document, Operation, Snapshot, Presence, Comment, AccessGrant.
High-Level Architecture
Browser (rich-text editor, e.g. Slate / TipTap)
├── Local CRDT or OT state
├── Pending ops queue (IndexedDB for offline)
└── WebSocket to Doc Shard
│
▼
[Doc Gateway] ── routes to shard by doc_id
│
▼
[Doc Shard] (stateful; single-writer per doc)
├── Apply and transform ops
├── Broadcast to other clients on this doc
├── Flush ops to Kafka log (durability)
└── Snapshot every N ops or 30s to S3
│
▼
[Ops Log: Kafka] [Snapshots: S3] [Metadata: Postgres]
[Presence: Redis pub/sub]
API Design
WebSocket frame protocol:
// client -> server
{ type: "JOIN", docId, fromRev }
{ type: "OP", docId, clientId, rev, op } // op = OT or CRDT delta
{ type: "CURSOR", docId, position, selection }
// server -> client
{ type: "SYNC", snapshot, rev } // on first join
{ type: "OP", docId, authorId, rev, op } // fan-out
{ type: "CURSOR", docId, authorId, position, selection }
{ type: "ACK", clientId, rev }Data Model
- Documents (Postgres): id, owner_id, title, current_rev, created_at.
- Snapshots (S3): `docs/{docId}/snap/{rev}.bin` — serialised rich-text tree.
- Ops log (Kafka topic per shard): each message is (doc_id, rev, author, op bytes). Retention 30 days.
- ACL (Postgres): doc_id, user_id, role, grant_ts.
- Presence (Redis): ephemeral, TTL 30s.
OT vs CRDT (be prepared to compare)
OT (Operational Transform): central server applies ops and transforms concurrent ops against each other. Client sends op at its known revision, server transforms it against any ops in between, applies, returns new revision. Used by Google Docs historically.
- Pros: server-authoritative, smaller client state, rich-text transforms well-studied.
- Cons: needs an authoritative serialiser per doc, tricky transform functions, hard offline.
CRDT (Conflict-free Replicated Data Types): ops are commutative; any apply order converges. Examples: RGA, Yjs, Automerge.
- Pros: true peer-to-peer, excellent offline, no central transform.
- Cons: larger metadata (timestamps, tombstones), GC is hard, rich-text CRDTs less mature.
For a "server-authoritative, reasonable offline" Salesforce-style answer, pick OT and name-drop CRDT as the "going further" path.
Deep Dives
1. OT transform and doc shard
- Bad: every client and the server independently apply ops — divergence.
- Good: single-writer per doc shard. The server holds current state plus a history ring buffer of the last N ops. An incoming op at rev `r` is transformed against ops `r..current`, applied, broadcast (transform sketch below). The client rebases its pending ops against the server's echoed rev.
- Great: shard failover — each doc shard is a Raft group of 3 replicas. The leader applies; followers tail the Kafka log. On leader loss, a follower promotes and replays from the last snapshot.
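A minimal transform sketch for the simplest pair — two concurrent plain-text inserts. Full rich-text OT needs the whole insert/delete/format matrix; this shows only the core position shift and the convergence tie-break:

```ts
interface InsertOp { pos: number; text: string; author: string; }

// Transform op `a` against concurrent op `b` that the server serialised first.
function transformInsert(a: InsertOp, b: InsertOp): InsertOp {
  // If b landed at or before a's position, a shifts right by b's length.
  // Equal positions tie-break on author id so every replica converges.
  if (b.pos < a.pos || (b.pos === a.pos && b.author < a.author)) {
    return { ...a, pos: a.pos + b.text.length };
  }
  return a;
}

// Server side: rebase an incoming op across everything between the client's
// known rev and the current rev, then apply and broadcast.
function rebase(incoming: InsertOp, concurrent: InsertOp[]): InsertOp {
  return concurrent.reduce(transformInsert, incoming);
}
```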
2. Presence scaling
- Bad: broadcast cursor on every keystroke over the OT channel — saturates WebSocket.
- Good: separate presence channel, throttled to 10 Hz, ephemeral via Redis pub/sub keyed by doc_id.
- Great: spatial subscription — a client with 50 collaborators subscribes only to cursors within its viewport (rich doc case). Presence deltas compressed.
3. Offline editing
- Bad: user types offline; on reconnect the local doc overwrites the server.
- Good: local pending op queue. On reconnect, send ops one by one starting from last-acked rev. Server transforms each against concurrent ops and returns ACK with rebased rev. Client reapplies local state.
- Great: long offline windows use a CRDT-style per-session log; the server reconciles via a merge procedure that preserves user intent (heuristics for insert-after vs delete-through-typed-region). Present user with a merge conflict UI only for irreconcilable cases.
What's Expected at Each Level
- SMTS: OT flow, shard-per-doc, snapshot + log, presence, offline queue.
- Lead: Raft-based HA, explicit transform function, back-pressure.
- Staff: CRDT trade-off with concrete scale numbers, export / import, privacy and retention of comment history.
5. Design Autocomplete / Typeahead
Reported Frequency: Very common in frontend rounds. Confirmed Salesforce question.
Problem
Input field that suggests completions as the user types. Client handles debouncing, race conditions, caching. Backend serves ranked completions from a prefix index. Personalisation and accessibility required.
Clarifying Questions
- You: "Data size?" — Interviewer: "10M strings, rebuild index daily."
- You: "Latency target?" — Interviewer: "Under 100 ms from keystroke to render."
- You: "Personalised?" — Interviewer: "Yes, user history boosts."
- You: "Offline?" — Interviewer: "Cache recent queries."
- You: "Multilingual?" — Interviewer: "Unicode, but no translation."
- You: "Min prefix length?" — Interviewer: "2 characters."
- You: "A11y?" — Interviewer: "Screen-reader support is mandatory."
Functional Requirements
- Suggestions after 2+ chars.
- Keyboard nav (arrows, Enter, Escape).
- Personalised ranking.
- Recent searches.
- Highlight match.
Non-Functional Requirements
- Keystroke to render: P99 < 100 ms.
- Client cache hit: > 50% during a session.
- Abort in-flight on new keystroke.
Core Entities
Query, Suggestion, UserContext, PrefixIndex.
High-Level Architecture
Input (React)
├── Debounce (150 ms)
├── LRU cache (prefix → suggestions[])
├── AbortController per in-flight request
└── aria-live status
│
▼
[Edge Cache / CDN] ← short TTL (60s) for hot prefixes
│
▼
[Autocomplete Service]
├── Prefix index (Trie in memory OR Elasticsearch completion suggester)
├── Ranker (popularity + personalisation)
└── Feature store (user recent queries)
API Design
GET /v1/autocomplete?q=sa&ctx=home&userId=...
-> {
prefix: "sa",
results: [
{ text: "salesforce", score: 0.97, highlight: "<b>sa</b>lesforce" },
{ text: "samsung", score: 0.90, highlight: "<b>sa</b>msung" }
],
servedAt: "2026-04-19T09:02:11Z"
}
Client Implementation
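The hook below assumes a `Suggestion` shape (matching the API response above) and a small `LRUCache`. A minimal sketch of both — the cache is illustrative, not a library import:

```ts
interface Suggestion { text: string; score: number; highlight: string; }

// Minimal LRU: Map preserves insertion order, so re-inserting on read marks an
// entry most-recently-used and the first key is always the eviction victim.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}
  get(key: K): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) { this.map.delete(key); this.map.set(key, v); }
    return v;
  }
  put(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) this.map.delete(this.map.keys().next().value as K);
    this.map.set(key, value);
  }
}
```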
import { useEffect, useRef, useState } from "react";

interface UseAutocompleteResult {
query: string;
setQuery: (v: string) => void;
suggestions: Suggestion[];
loading: boolean;
error: Error | null;
}
function useAutocomplete(fetcher: (q: string, signal: AbortSignal) => Promise<Suggestion[]>): UseAutocompleteResult {
const [query, setQuery] = useState("");
const [suggestions, setSuggestions] = useState<Suggestion[]>([]);
const [loading, setLoading] = useState(false);
const [error, setError] = useState<Error | null>(null);
const cacheRef = useRef(new LRUCache<string, Suggestion[]>(50));
const activeController = useRef<AbortController | null>(null);
  useEffect(() => {
    if (query.length < 2) { setSuggestions([]); return; }
    // Session LRU first — repeat prefixes never hit the network.
    const cached = cacheRef.current.get(query);
    if (cached) { setSuggestions(cached); return; }
    // Debounce: fire only after 150 ms of keystroke silence.
    const t = setTimeout(() => {
      // Abort the previous in-flight request so a slow, stale response
      // can never overwrite a newer one (the classic typeahead race).
      activeController.current?.abort();
      const controller = new AbortController();
      activeController.current = controller;
      setLoading(true);
      fetcher(query, controller.signal)
        .then((res) => {
          cacheRef.current.put(query, res);
          setSuggestions(res);
          setLoading(false);
        })
        .catch((e) => {
          // AbortError is expected when the user keeps typing — not a failure.
          if (e.name !== "AbortError") { setError(e); setLoading(false); }
        });
    }, 150);
    return () => clearTimeout(t); // cancel the pending debounce on each keystroke
  }, [query, fetcher]);
return { query, setQuery, suggestions, loading, error };
}
Deep Dives
1. Cache invalidation
- Bad: client cache forever; stale suggestions after index refresh.
- Good: short TTL per entry (5 min) + server `servedAt` check on response.
- Great: cache key includes user context + index version; server returns an `indexVersion` header; client evicts cross-version entries on change.
2. Offline
- Bad: no suggestions without network.
- Good: cache last N queries; respond from cache when offline; show subtle "offline" marker.
- Great: Service Worker precaches a static top-10k completions shard for cold-start offline use.
3. Ranking signals
- Bad: alphabetical.
- Good: popularity (frequency in logs) + recency decay.
- Great: per-user learned ranker. Features: user history, time-of-day, current session queries, geo. Lightweight logistic model served from a feature store; latency budget 10 ms.
Frontend Considerations
- A11y: listbox role. `aria-controls`, `aria-activedescendant` points to the highlighted option ID. An `aria-live="polite"` status region announces "3 suggestions". Esc closes. Enter selects.
- Keyboard: Down/Up moves the active option, with wrap. Home/End optional.
- Race conditions: AbortController cancels stale fetches. Additionally check `if (responseQuery !== currentQuery) return;` after await.
- Perf: virtualise when suggestions > 20.
What's Expected at Each Level
- SMTS: debounce + abort + cache + a11y; calls out race conditions unprompted.
- Lead: ranking, index refresh, client / server cache layering.
- Staff: ML ranker, cold-start, multi-locale indexing.
6. Design Infinite Scroll / Virtualised List
Reported Frequency: Common frontend HLD, often merged with a Feed / Inbox problem.
Problem
Render a list of up to 100k items smoothly. Support variable heights, scroll restoration, prefetching of next page, keyboard accessibility.
Core Entities
Item, Viewport, ItemMeasureCache, Page.
Architecture (client-side)
ListContainer
├── IntersectionObserver for sentinel near bottom → load next page
├── Windowing (react-window / react-virtuoso)
├── ItemMeasureCache (ResizeObserver on each mounted row)
└── ScrollRestoration hook
Windowing Sketch
import React, { useMemo, useRef, useState } from "react";

function VirtualList<T>({ items, estimatedHeight, Row }: {
items: T[];
estimatedHeight: number;
Row: React.FC<{ item: T; index: number }>;
}) {
const [range, setRange] = useState({ start: 0, end: 20 });
const parentRef = useRef<HTMLDivElement>(null);
const measured = useRef(new Map<number, number>());
  // Prefix-sum offsets: offsets[i] is the top of row i. Unmeasured rows fall
  // back to the estimate; deep dive 2 below upgrades this to O(log n) updates.
  const offsets = useMemo(() => {
    const out = [0];
    for (let i = 0; i < items.length; i++) {
      out.push(out[i] + (measured.current.get(i) ?? estimatedHeight));
    }
    return out;
  }, [items.length, estimatedHeight]);
  // On scroll, scan the offsets for the first and last visible rows and pad
  // the window with a 4-row buffer on each side.
  const onScroll = () => {
const el = parentRef.current!;
const top = el.scrollTop;
const bottom = top + el.clientHeight;
let start = 0, end = items.length;
for (let i = 0; i < offsets.length - 1; i++) {
if (offsets[i + 1] >= top) { start = Math.max(0, i - 4); break; }
}
for (let i = start; i < offsets.length - 1; i++) {
if (offsets[i] > bottom) { end = Math.min(items.length, i + 4); break; }
}
setRange({ start, end });
};
return (
<div ref={parentRef} onScroll={onScroll} style={{ height: 600, overflow: "auto", position: "relative" }}>
<div style={{ height: offsets[items.length] }}>
{items.slice(range.start, range.end).map((item, idx) => (
<div
key={range.start + idx}
style={{ position: "absolute", top: offsets[range.start + idx], left: 0, right: 0 }}
>
<Row item={item} index={range.start + idx} />
</div>
))}
</div>
</div>
);
}
Deep Dives
1. Scroll performance
- Bad: render all 100k nodes. Layout time seconds, memory GBs.
- Good: windowing with buffer rings (4 items above / below).
- Great: GPU-friendly transforms (`translate3d(0, Y, 0)`) instead of changing `top`; `will-change: transform`; avoid `box-shadow` on scrolling items.
2. Variable heights
- Bad: assume fixed height, mis-align on tall items.
- Good: ResizeObserver on mounted rows writes measured height into cache; offsets recompute lazily.
- Great: estimated-height placeholder until measured; on measurement, translate subsequent items by delta without triggering reflow cascade. Maintain a prefix-sum tree for O(log n) offset queries on 100k items.
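A sketch of that prefix-sum structure as a Fenwick tree; `OffsetTree` is an illustrative name:

```ts
// Point-update a measured row height, query any row's offset in O(log n) —
// comfortably fast at 100k rows, unlike rebuilding a flat prefix-sum array.
class OffsetTree {
  private tree: number[];
  private heights = new Map<number, number>();
  constructor(private n: number, private estimate: number) {
    this.tree = new Array(n + 1).fill(0);
    for (let i = 0; i < n; i++) this.add(i, estimate); // seed every row with the estimate
  }
  // Called from the ResizeObserver when row i reports a real height.
  setHeight(i: number, h: number): void {
    const prev = this.heights.get(i) ?? this.estimate;
    this.heights.set(i, h);
    this.add(i, h - prev);
  }
  // Offset of row i = sum of heights of rows 0..i-1.
  offsetOf(i: number): number {
    let sum = 0;
    for (let j = i; j > 0; j -= j & -j) sum += this.tree[j];
    return sum;
  }
  private add(i: number, delta: number): void {
    for (let j = i + 1; j <= this.n; j += j & -j) this.tree[j] += delta;
  }
}
```

In the sketch above, `offsetOf` would replace the linear `offsets` array and `setHeight` would be fed by the ResizeObserver.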
3. Scroll restoration
- Bad: on navigation back, list is at top.
- Good: persist `scrollTop` + `firstVisibleKey` in history state; restore on mount.
- Great: anchor on a stable item key (not index) — if the list changed while away, align the anchor item to its previous viewport position, then reflow around it.
4. Prefetch
- Observe a sentinel 2 pages before the end. On intersect, fetch next page; update state via functional setter to avoid stale closures. Respect React 18 transitions so scroll stays smooth.
A11y
role="list"on container,role="listitem"on rows.- Keyboard focus must not break when the focused row is virtualised out — pre-render focused row outside the window or use
tabindex="-1"fallbacks. aria-rowcountfor screen readers so they announce "item 57 of 10000".
What's Expected at Each Level
- SMTS: windowing, measurement cache, IntersectionObserver prefetch, scroll restoration.
- Lead: prefix-sum tree for variable heights, memory ceiling at 100k.
- Staff: integration with offline cache, user-visible loading patterns.
7. Design Flash Sale System
Reported Frequency: Hyderabad SMTS report, Jan 2026.
Problem
A limited-inventory flash sale. 10× normal traffic burst at sale start. Fairness: first-come-first-served (within tolerance). Prevent oversell. Multi-tenant — one platform hosts many vendors' sales.
Clarifying Questions
- You: "Fairness level?" — Interviewer: "Best-effort FIFO within a 1s bucket; strict ordering not required."
- You: "Pay or reserve?" — Interviewer: "Reserve inventory for 2 min, user pays within that window."
- You: "Known bad actors?" — Interviewer: "Yes, expect bots; bot mitigation at edge."
- You: "Inventory scale?" — Interviewer: "10k units per SKU, many SKUs."
- You: "Global?" — Interviewer: "India region only for v1."
Functional Requirements
- Inventory check + reserve with a hold.
- Queue when capacity exceeded.
- Convert hold to order on payment.
- Release on timeout or failed payment.
- Per-tenant config (start time, max per user).
Non-Functional Requirements
- Peak 200k RPS for 30s.
- P99 reserve under 300 ms during burst.
- Zero oversell.
- Tenant isolation.
Capacity Estimation
- 200k RPS × 30s = 6M attempts. For 100k units, ~60× contention.
- Inventory writes: 100k successful reservations in 30s ≈ 3k writes/s — trivial in absolute terms, but hotspotted on a few SKU rows.
High-Level Architecture
Client
▼
[Edge WAF / Bot mitigation (reCAPTCHA, fingerprint)]
▼
[Global LB]
▼
[Queue Gate] ── admission controller
│ admitted?
├── No → enqueue in virtual waiting room (SSE updates position)
└── Yes ▼
[Reserve Service]
├── atomic decrement (Redis cluster, per-SKU hash slot)
├── write hold (DynamoDB with TTL)
└── publish to Kafka for persistence + audit
▼
[Order Service]
├── checkout form
└── on pay → convert hold to order, settle payment
▼
[Primary DB (Postgres, per-tenant partition)]
Data Model
- Inventory counter: Redis key `sale:{tenant}:{sku}:remaining`, initialised at sale start. Use `DECRBY 1` atomically; reject if < 0.
- Holds: DynamoDB `(tenant, hold_id)` with TTL 120 s. Secondary index by user_id + sku for per-user limits.
- Orders: Postgres, tenant_id partitioned.
Deep Dives
1. Preventing oversell under concurrency
- Bad: read inventory, check, decrement. Classic TOCTOU race.
- Good: atomic `DECR`; if the resulting value is < 0, restore and reject. A single Redis primary per SKU avoids CRDT counters.
- Great: a Lua script that decrements and writes the hold atomically (sketch below). If the Redis node fails, fail over to a replica with a write-fencing token so a stale primary can't re-accept.
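A sketch of the Great tier, assuming an ioredis client. For brevity the hold lands in Redis with a 120 s TTL; the design above keeps holds in DynamoDB — same atomicity idea, different store:

```ts
import Redis from "ioredis";

const redis = new Redis();

// Decrement and hold-write happen inside one Lua script, so no interleaving
// request can observe the counter between the two steps.
const RESERVE_LUA = `
local remaining = redis.call('DECRBY', KEYS[1], 1)
if remaining < 0 then
  redis.call('INCRBY', KEYS[1], 1)  -- restore: the counter never goes below zero
  return -1
end
redis.call('SET', KEYS[2], ARGV[1], 'EX', 120)
return remaining
`;

async function reserve(tenant: string, sku: string, holdId: string, userId: string): Promise<boolean> {
  const counterKey = `sale:${tenant}:${sku}:remaining`;
  const holdKey = `hold:${tenant}:${holdId}`;
  const remaining = (await redis.eval(RESERVE_LUA, 2, counterKey, holdKey, userId)) as number;
  return remaining >= 0; // -1 means sold out; anything else is a confirmed hold
}
```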
2. Cache stampede / hot-key
- Bad: every request reads SKU metadata fresh from DB at T-0.
- Good: pre-warm SKU metadata into every edge cache before sale start; cache-aside with mutex on miss.
- Great: request coalescing (singleflight) in the gateway — one back-end call per SKU per node during warm-up bursts.
3. Fairness queue
- Bad: let the LB distribute freely; last request wins during bursts.
- Good: virtual waiting room — on oversubscription, admission controller issues a queue token. Client polls position over SSE.
- Great: admission rate tracks drain rate of Reserve Service; feedback loop adjusts in real time. Token signed with HMAC so client can't skip line.
4. Multi-tenant isolation / noisy neighbour
- Per-tenant Redis slot for counters so a hot sale in tenant A doesn't evict tenant B.
- Per-tenant rate limits at gateway.
- Dedicated worker pool for enterprise-tier tenants.
- Each sale has its own Kafka topic partition — easy to throttle and monitor independently.
Frontend Considerations
- Queue screen with estimated time, SSE reconnection on drop.
- Disable buy button on submit to avoid double-clicks; still idempotent by client-side token.
- Optimistic UI inappropriate here: show clear "reserved" state once server confirms.
- Local timer for hold expiry, synced from server.
What's Expected at Each Level
- SMTS: oversell prevention, virtual queue, cache warming, tenant quotas.
- Lead: Lua-atomicity, fencing, singleflight, capacity planning.
- Staff: global deployment, fraud detection, regulator-grade audit, cost of over-provisioning.
8. Design a Multi-Tenant CRM Dashboard
Reported Frequency: Salesforce-specific — expect some variant in the HLD round.
Problem
A dashboard view in a multi-tenant CRM. Each tenant defines custom fields on core objects (Account, Contact, Opportunity). Users build dashboards from widgets (chart, table, KPI). Role-based access; row-level sharing rules.
Clarifying Questions
- You: "Do tenants share any data?" — Interviewer: "No, strict isolation."
- You: "Custom field types?" — Interviewer: "String, number, date, picklist, lookup."
- You: "Data volume per tenant?" — Interviewer: "Median 100k records, P99 tenant has 50M."
- You: "Dashboard refresh cadence?" — Interviewer: "On open, plus optional auto-refresh every 60s."
- You: "Are dashboards shared?" — Interviewer: "Within org, yes. Role-based viewers."
Functional Requirements
- Render dashboards with N widgets.
- Drag-and-drop layout; save per user.
- Widget queries the customised data model.
- Role-based access: only see rows the current user is allowed to see.
- Export to CSV.
Non-Functional Requirements
- P95 dashboard open < 2s.
- Strong tenant isolation.
- Custom schema changes must not require deploys.
High-Level Architecture
Browser (React)
├── Dashboard framework (grid layout, widget SDK)
├── Per-widget query hooks (React Query)
└── Local cache with tenant+user key
│
▼ GraphQL
[BFF / GraphQL Gateway]
├── tenant-scoping middleware (authz + tenant_id injection)
└── per-tenant persisted queries
│
▼
[Query Service]
├── object registry (tenant custom schema)
├── sharing-rule resolver
└── cache layer (per-tenant)
│
▼
[Primary DB (Postgres)]
├── core tables: accounts, contacts, opportunities (tenant_id column)
├── custom_fields (tenant_id, object, field_key, type)
└── custom_values (tenant_id, object_id, field_key, value)
[Analytics DB (ClickHouse)] ── for heavy aggregations
[Search (Elasticsearch)] ── per-tenant index
Data Model — Schema Flexibility
Two common approaches for custom fields:
A. EAV (Entity-Attribute-Value)
CREATE TABLE custom_values (
tenant_id uuid,
object_type text,
object_id uuid,
field_key text,
value_text text,
value_number numeric,
value_date timestamp,
PRIMARY KEY (tenant_id, object_type, object_id, field_key)
);
- Pros: unlimited fields, no migrations.
- Cons: joins explode for reporting; typed indexing is manual.
B. Sparse wide table / JSONB
ALTER TABLE accounts ADD COLUMN custom jsonb;
CREATE INDEX ON accounts ((custom->>'industry_tier')) WHERE tenant_id = '...';
- Pros: single row fetch returns everything; GIN / functional indexes on JSONB.
- Cons: per-tenant functional indexes don't scale to thousands of tenants.
Salesforce's own pattern is a pivoted physical model (wide table with VARCHAR columns + a metadata table mapping tenant field → column). For an interview, propose JSONB with tenant-specific functional indexes for top-N tenants and EAV fallback.
Multi-Tenancy Strategy
- Every table has `tenant_id`. The primary key starts with tenant_id.
- The GraphQL gateway injects tenant_id from the auth token; the query builder composes it into every `WHERE` (middleware sketch after this list).
- Postgres row-level security as a backstop: `CREATE POLICY tenant_iso ON accounts USING (tenant_id = current_setting('app.tenant')::uuid)`.
- Cache keys prefixed `tenant:{id}:...`.
- Large tenants (>10M records) optionally pinned to a dedicated schema or replica.
- Per-tenant query cost caps (governor limits): reject queries above N returned rows or T ms.
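A sketch of the gateway-side scoping, assuming Express-style middleware and a node-postgres-style pool; the `app.tenant` setting matches the RLS policy above, and the auth-claim shape is an assumption:

```ts
import type { NextFunction, Request, Response } from "express";

interface PgClient { query(sql: string, params?: unknown[]): Promise<unknown>; release(): void; }
interface PgPool { connect(): Promise<PgClient>; }

// Tenant comes from the verified token only — never from the request body.
function tenantScope(req: Request & { tenantId?: string }, res: Response, next: NextFunction): void {
  const tenantId = (req as any).auth?.claims?.tenant_id; // populated by upstream JWT middleware (assumed)
  if (!tenantId) { res.status(401).json({ error: "missing tenant claim" }); return; }
  req.tenantId = tenantId;
  next();
}

// set_config(..., true) is transaction-local, so pooled connections can't
// leak one tenant's scope into another tenant's query.
async function withTenant<T>(pool: PgPool, tenantId: string, fn: (c: PgClient) => Promise<T>): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("SELECT set_config('app.tenant', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}
```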
Sharing Rules / RBAC
- Roles hierarchy: Org admin / manager / member.
- Record-level ACL computed as: owner grants + role-up-the-tree grants + explicit shares.
- Materialise `user_visible_accounts(tenant_id, user_id, account_id)` for fast joins on read-heavy dashboards; refresh on ACL change via CDC.
Deep Dives
1. Schema evolution at tenant-time
- Bad: custom field add triggers a DDL migration. Thousands of tenants × thousands of fields → unmanageable.
- Good: a metadata table describes fields; values go in `custom_values` or JSONB. No DDL per tenant.
- Great: the hot path uses a pre-joined materialised view refreshed lazily; a cold metadata change triggers a view rebuild within seconds; the plan cache is invalidated per tenant.
2. Dashboard query performance
- Bad: run 8 widget queries in parallel against OLTP, dashboard hits 4s.
- Good: route aggregates to ClickHouse with tenant_id as first sort key. OLTP for drill-downs.
- Great: per-widget result cache keyed by tenant + query hash + data epoch; partial results streamed via HTTP streaming / GraphQL `@defer` so the dashboard renders incrementally.
3. Noisy neighbour
- Bad: one tenant's slow query pegs the shared DB.
- Good: per-tenant connection pools with caps; query budgets enforced.
- Great: pool per tier; enterprise tenants get dedicated read replicas; analytics queries segregated to a separate cluster.
Frontend Considerations
- Dashboard rendered via a `Widget` component tree. Each widget is isolated — errors bounded by an ErrorBoundary so one broken widget doesn't crash the page.
- State: layout in a store (Zustand), widget data via React Query with key `[tenantId, userId, widgetId, params]`.
- Drag-and-drop using react-grid-layout or custom; save the layout diff, not the full state.
- A11y: each widget has a heading; the dashboard announces updates via `aria-live="polite"` on auto-refresh.
- Performance: widgets lazy-load via Suspense; below-the-fold ones use IntersectionObserver.
What's Expected at Each Level
- SMTS: custom schema strategy, tenant-ID everywhere, sharing rule model, two-store pattern (OLTP + OLAP).
- Lead: materialised visibility, governor limits, hot-tenant isolation.
- Staff: schema-at-rest evolution (online reindex), cross-region residency, cost model per tenant.
Cross-Problem Playbook
Use these transitions when an interviewer nudges you toward a specific area:
- "What about scale?" → capacity numbers, shard key, hot-tenant handling.
- "What if the DB goes down?" → replication topology, RTO/RPO, degraded mode (read-only dashboards, reject writes).
- "How do you test this?" → contract tests, chaos tests, load tests with recorded traffic.
- "What metrics would you track?" → RED (Rate, Errors, Duration) per service + per tenant; end-to-end message journey timer; inventory oversell counter (must be zero); dashboard widget P95 open time.
- "How do you roll this out?" → feature flags per tenant, canary to 1% → 10% → 50% → 100%, backout plan, schema changes online-safe (expand, migrate, contract).
Personal Experience Hooks
When you can, anchor an answer in Pixis work so the interviewer sees proof:
- "At Pixis we built an autocomplete for ad-targeting keywords over ~2M terms. We debounced at 150 ms, aborted in-flight fetches on new keystrokes, and cached per-session in an LRU — hit rate was ~60% which halved backend QPS."
- "We had a real-time dashboard for ad performance; the original implementation re-rendered the whole grid on every WebSocket tick and dropped to 20 fps. I moved to per-cell subscriptions with a selector pattern and got back to 60 fps."
- "Multi-tenant context matters because our ad accounts partition at the customer level — I've debugged at least one incident where a missing tenant scope in a cache key leaked data across tenants in a preview environment."
These hooks turn a generic HLD into "I've actually done this." Salesforce interviewers weight that heavily at SMTS.