System Design (HLD) — Salesforce SMTS
Salesforce system-design rounds blend classic web-scale problems with multi-tenant SaaS flavour. Even frontend-leaning HLDs are expected to cover capacity, API shape, data model, and at least one deep-dive with tradeoffs. This file gives you the framework, a multi-tenancy primer, then eight detailed walkthroughs.
System Design Framework for Salesforce
Follow this sequence every time. Roughly twenty-five minutes of the hour goes into everything before the deep dives.
- Clarify requirements (5 min) — functional + non-functional. Ask about tenants, offline, internationalisation, analytics vs OLTP. Six to eight pointed questions.
- Capacity estimation (3 min) — users, QPS peak vs average, storage growth, bandwidth. Numbers must be defensible, not precise.
- Core entities (3 min) — top-level nouns, cardinalities.
- API design (5 min) — REST / GraphQL / WebSocket. Sketch 3-5 key endpoints with request / response shapes.
- Architecture (5 min) — client, edge (CDN, WAF), gateway, services, queues, stores, caches. Draw a block diagram.
- Data model (3 min) — SQL vs NoSQL vs blob vs time-series. Tenant-ID as partition key where relevant.
- Multi-tenancy (3 min) — isolation level, noisy-neighbour mitigation, per-tenant quotas.
- Deep dives (15 min) — interviewer picks one to three. Present Bad → Good → Great progression.
- Operability and risk (3 min) — observability, rollout, failure modes.
Multi-Tenancy Patterns Primer
Salesforce is the canonical multi-tenant SaaS, so reaching for the right isolation model is table stakes. There are three classical levels:
| Level | Shape | Pros | Cons | When |
|---|---|---|---|---|
| Shared DB, shared schema | Tenant-ID column on every row | Cheapest, densest | Noisy neighbour, hard per-tenant backups | Default for most SaaS, Salesforce Platform |
| Shared DB, separate schema | Schema per tenant | Easier per-tenant schema evolution, backups | Schema migrations explode with tenant count | Mid-market SaaS |
| Separate DB | DB per tenant | Strong isolation, regulated data | Costly, operationally heavy | Enterprise / healthcare / finance tier |
Rules of thumb:
- Tenant-ID goes in the primary-key prefix and in every index. Never let a query run without it.
- Enforce at the framework level, not the application level — a missing `WHERE tenant_id = ?` is a data breach. Use row-level security or a data-access gateway (see the sketch after this list).
- Cache keys always start with tenant-ID. Two tenants' `user:42` must not collide.
- Quotas (QPS, storage, compute) per tenant prevent noisy neighbours. Enforce at the gateway.
- Hot-tenant isolation: the top 1% of tenants often drive 50% of load. Give them their own pool or shard.
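These rules compress into a small amount of framework code. A minimal sketch, assuming a generic SQL pool — `TenantScopedDb` and `cacheKey` are illustrative names, not a real library:

```ts
// Illustrative framework-level tenant scoping: queries and cache keys cannot be
// built without a tenant_id, so a missing WHERE clause is impossible by construction.
interface SqlPool { query(sql: string, params: unknown[]): Promise<unknown[]>; }

class TenantScopedDb {
  constructor(private tenantId: string, private pool: SqlPool) {}

  // Caller's SQL must select tenant_id and number its own params from $2 up;
  // $1 is always the tenant and is always applied.
  query(sql: string, params: unknown[] = []): Promise<unknown[]> {
    return this.pool.query(`SELECT * FROM (${sql}) t WHERE t.tenant_id = $1`, [this.tenantId, ...params]);
  }
}

// Cache keys are always tenant-prefixed: tenant A's user:42 can never collide with tenant B's.
const cacheKey = (tenantId: string, key: string) => `tenant:${tenantId}:${key}`;
```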
Salesforce-Specific Context
Interviewers will probe these themes even on "generic" system design questions:
- Org-level customisation — each tenant (org) can define custom fields, custom objects, custom validation rules. Your data model should explain how you support this without schema migrations. Hint: a single `custom_fields` table with `(tenant_id, object_id, field_key, value)` rows, or EAV, or a wide sparse table.
- Governor limits — Salesforce ships limits (per-tenant API calls, DB rows returned). Mention them; show you'd build similar caps.
- Trust and audit — every write should produce an audit event. Retention and tamper-evident logs matter.
- Global with data residency — EU / US / APAC data residency. Pick partition strategy accordingly.
1. Design WhatsApp Web
Reported Frequency: Nov 2025 SMTS Frontend HLD round.
Problem
Build the WhatsApp web client backed by a messaging service. Real-time 1:1 and small group chat. Offline send with a local queue. Read receipts and typing indicators. Multi-device sync when the phone is the source of truth. Works in a single tab; optimistic UI on send.
Clarifying Questions
- You: "Is the phone still the source of truth, or is it a first-class device?" — Interviewer: "First-class. Both can go offline."
- You: "Group size cap?" — Interviewer: "Up to 256 for v1."
- You: "Voice / video / media?" — Interviewer: "Text + media. No voice."
- You: "E2E encryption?" — Interviewer: "Yes, but you can treat the crypto layer as a black box."
- You: "Retention?" — Interviewer: "Server keeps messages until delivered to all recipient devices, then purges."
- You: "Scale — DAU?" — Interviewer: "2 billion DAU globally, but design for a single region first."
- You: "Delivery guarantee?" — Interviewer: "At-least-once with de-dup on client."
- You: "Cursor position and typing indicator scale?" — Interviewer: "Yes, typing indicator must fan out."
Functional Requirements
- Send / receive text and media messages.
- Delivery states: Sent → Delivered → Read.
- Typing indicators.
- Offline queue on client; flush on reconnect.
- Multi-device sync.
- Group chat up to 256.
- Search message history.
Non-Functional Requirements
- P99 message delivery: under 500 ms at steady state.
- Availability: 99.99% for the messaging path.
- Durability: no message loss after server ACK.
- Scale: 1M concurrent connections per messaging pod.
- Privacy: E2E encryption, server cannot read plaintext.
Capacity Estimation
- 2B DAU, peak 100M concurrent connections → ~1,000 pods at 100k connections each (the 1M-connections-per-pod NFR is a ceiling; smaller pods limit blast radius).
- 50 messages/user/day → 100B messages/day → ~1.2M msg/s average, 5M msg/s peak.
- Avg message 200 B encrypted → ~20 TB/day of text; media adds an order of magnitude on top.
- Retention until delivered: a ~7-day buffer bounds the text store at ~140 TB worst-case, but most messages are delivered and purged within seconds, so the working set is far smaller. Media goes to blob storage with its own lifecycle.
Core Entities
User, Device, Conversation, Message, DeliveryReceipt, ConversationMember.
High-Level Architecture
Browser (React)
└── Service Worker (offline queue, background sync)
└── IndexedDB (message history cache)
│
▼ WSS
[Edge WebSocket Gateway] ── terminates TLS, pins to pod
│
▼ (gRPC)
[Message Router] ── sharded by conversation_id
│ ├── [Outbox Kafka] ── durable log
│ ├── [Presence Service] ── Redis
│ └── [Push Service] ── FCM / APNs
▼
[Message Store] ── Cassandra, partitioned by conversation_id
[Media Store] ── S3 + CDN (signed URLs)
[Search Index] ── Elasticsearch per user, E2E-encrypted tokens
API Design
REST for control plane, WebSocket for data plane.
// Control plane (HTTPS)
POST /v1/conversations { memberIds: string[] } -> { id }
GET /v1/conversations/:id/messages?before=t -> Message[]
POST /v1/media/upload multipart -> { mediaId, cdnUrl }
// Data plane (WSS, framed)
// client -> server
{ type: "SEND", clientId: "uuid", convId, ciphertext, sentAt }
{ type: "DELIVERED", convId, messageId }
{ type: "READ", convId, messageId }
{ type: "TYPING", convId, on: boolean }
// server -> client
{ type: "ACK", clientId, serverId, serverTs }
{ type: "DELIVER", convId, message: {...} }
{ type: "RECEIPT", convId, messageId, by: userId, state: "DELIVERED" | "READ" }Data Model
Cassandra, partitioned by conversation_id with clustering on (server_ts, message_id):
CREATE TABLE messages (
conv_id uuid,
server_ts timestamp,
msg_id timeuuid,
sender_id uuid,
ciphertext blob,
media_ids list<uuid>,
PRIMARY KEY ((conv_id), server_ts, msg_id)
) WITH CLUSTERING ORDER BY (server_ts DESC);
Receipts in a separate table keyed by `(conv_id, msg_id)` with user + state columns. Delivery state is a per-message set of `(device_id, state)` tuples.
Multi-Tenancy Strategy
WhatsApp isn't multi-tenant in the SaaS sense, but partitioning by conv_id gives you the same noisy-neighbour protection. A single hot group chat must be shardable: introduce sub-shards `(conv_id, shard_no)` once a conversation exceeds a QPS threshold, as sketched below.
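A sketch of that routing decision; the shard count and the hotness signal are assumed values:

```ts
// Hypothetical sub-sharding for hot conversations: spread writes across N
// sub-partitions, keyed by sender so one sender's messages stay ordered.
const SUB_SHARDS = 4; // assumed fan-out once a conversation is flagged hot

function partitionKey(convId: string, senderId: string, isHot: boolean): string {
  if (!isHot) return convId; // normal conversations keep a single partition
  return `${convId}:${hash(senderId) % SUB_SHARDS}`;
}

function hash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) | 0;
  return Math.abs(h);
}
// Readers fetch all sub-partitions and merge-sort by server_ts.
```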
Deep Dives
1. Connection management
- Bad: every client opens a long WebSocket to a single origin. A pod restart disconnects millions.
- Good: edge gateway pool with sticky session via user_id hashing. Health-checked; clients reconnect with exponential backoff + jitter. Resume token returned on first connect lets the gateway skip re-auth on reconnect.
- Great: edge also maintains a durable subscription table in Redis (`user_id → pod_id`). On pod restart, the router replays undelivered messages from Kafka since the client's last acked offset. Client sends `SYNC { lastSeenServerId }` on reconnect.
2. Offline queue
- Bad: keep outbox in memory; on refresh the queue is lost.
- Good: IndexedDB outbox + Service Worker Background Sync. Each message has a `clientId`. On reconnect the SW flushes in order and de-dups by `clientId` when the server echoes it back (see the flush sketch below).
- Great: outbox entries carry monotonic local sequence numbers so the server can preserve client ordering even when it receives a burst out of order. If the app is killed mid-upload, the SW resumes partial media uploads via a resumable-upload protocol such as tus.
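A sketch of the reconnect flush; `OutboxStore` stands in for the IndexedDB object store, and the frame shape follows the data-plane protocol above:

```ts
interface OutboxEntry { clientId: string; convId: string; ciphertext: string; localSeq: number; }
interface OutboxStore {
  allInOrder(): Promise<OutboxEntry[]>;     // ordered by localSeq
  remove(clientId: string): Promise<void>;
}

// Flush pending messages in local order; skip anything the server has already
// ACKed so at-least-once delivery still de-dups to exactly-once at the UI.
async function flushOutbox(store: OutboxStore, send: (frame: object) => void, acked: Set<string>): Promise<void> {
  for (const entry of await store.allInOrder()) {
    if (acked.has(entry.clientId)) { await store.remove(entry.clientId); continue; }
    send({ type: "SEND", clientId: entry.clientId, convId: entry.convId, ciphertext: entry.ciphertext, sentAt: Date.now() });
  }
}
// On each server ACK { clientId }, add it to `acked` and remove the outbox entry.
```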
3. Multi-device sync
- Bad: each device treats its inbox independently; users see mismatched unread badges.
- Good: server maintains per-device high-water marks. A READ receipt from one device moves the mark and fans out to the user's other devices.
- Great: with E2E encryption, the server can't read content but can still route. Each device has its own key; sender encrypts the message once per recipient device (Signal protocol sender keys). Key rotation on device-add / revoke.
Frontend Considerations
- Component tree: `ChatList` (virtualised) / `Conversation` (virtualised messages) / `Composer`. Virtualisation is critical for 10k-message threads.
- State: conversation metadata in Zustand or Redux Toolkit; messages in a per-conversation slice keyed by `convId`. Do not put all messages in one flat map; re-renders explode.
- Optimistic send: append a pending message with `clientId` to the slice. On `ACK`, patch with `serverId` + `serverTs` and re-sort. On timeout, mark as failed with a retry action.
- Typing indicator: throttled at 1 Hz, with a 3 s inactivity timeout (sketch after this list).
- A11y: `aria-live="polite"` for incoming messages, focus trap in composer on reply.
- Perf: message list uses `react-window`; media thumbnails lazy-load via IntersectionObserver; images use `<img loading="lazy">` and `decoding="async"`.
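The typing indicator from the list above fits in a few lines; a minimal sketch:

```ts
// At most one TYPING frame per second, plus an automatic "off" after 3 s of silence.
function createTypingEmitter(send: (on: boolean) => void) {
  let lastSent = 0;
  let offTimer: ReturnType<typeof setTimeout> | undefined;
  return function onKeystroke(): void {
    const now = Date.now();
    if (now - lastSent >= 1000) { send(true); lastSent = now; } // 1 Hz throttle
    clearTimeout(offTimer);
    offTimer = setTimeout(() => send(false), 3000);             // inactivity timeout
  };
}
```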
What's Expected at Each Level
- SMTS (IC4): full architecture, at least two deep dives, explicit offline + multi-device story, crisp frontend state model.
- Lead (IC5): capacity numbers that match real-world systems, trade-offs between CRDT and server-authoritative ordering, pod failure recovery.
- Staff (IC6): key rotation at scale, regional failover, privacy + legal boundaries.
2. Design Google Maps
Reported Frequency: Nov 2025 SMTS Frontend HLD round, often paired with "render a 10k-point dataset smoothly".
Problem
Frontend-heavy. Build a pan-and-zoom map that renders tiles over a large geographic area, supports markers, clustering, and a search-autocomplete for places. Backend serves tiles and routing.
Clarifying Questions
- You: "Raster or vector tiles?" — Interviewer: "Both — talk about trade-offs and pick one."
- You: "Offline support?" — Interviewer: "Nice to have, cache the last viewport."
- You: "Is routing a requirement?" — Interviewer: "Yes, turn-by-turn for driving."
- You: "How many markers on screen at peak?" — Interviewer: "Can be 100k for a business search."
- You: "Mobile or desktop?" — Interviewer: "Desktop primary, but think about touch gestures."
- You: "Do we support overlays like traffic?" — Interviewer: "Yes, as an opt-in layer."
Functional Requirements
- Pan, zoom, rotate.
- Display tiles for the viewport.
- Place markers and clusters.
- Search autocomplete.
- Routing between two points.
- Traffic overlay layer.
Non-Functional Requirements
- 60 fps pan / zoom on mid-tier laptops.
- Tile load under 200 ms P95 from cache.
- First meaningful paint under 1.5 s on 4G.
- Bundle under 250 KB gzipped core.
Capacity Estimation
- World, zoom 0–22. Tile count ≈ 4^22 ≈ 17 trillion at max zoom — pre-render only popular zooms, generate the rest on demand.
- Average viewport 16 tiles. With 100M DAU, ~1.6B tile requests/day → ~20k QPS average, 100k peak. CDN absorbs ~99%.
Core Entities
Tile(x, y, z), Viewport, Marker, Cluster, Place, Route.
High-Level Architecture
Browser (Canvas / WebGL)
├── Tile Cache (Map<tileKey, ImageBitmap>) + IndexedDB
├── Marker Quadtree
└── Gesture Controller (momentum, pinch)
│
▼ HTTPS
[CDN] ── Tile origin miss → [Tile Service]
[Places Service] ── autocomplete → Elasticsearch
[Routing Service] ── contraction hierarchies → PostgreSQL + pgRouting
[Traffic Service] ── live aggregated speeds → Redis Streams
API Design
GET /tiles/{layer}/{z}/{x}/{y}.pbf // vector tile
GET /tiles/{layer}/{z}/{x}/{y}.png // raster fallback
GET /places/autocomplete?q=coffee&near=lat,lng // ranked results
GET /routes?from=a,b&to=c,d&mode=drive // polyline + turn list
GET /traffic/segments?bbox=... // sparse speed deltas
Data Model
- Tiles: S3 + CDN, `{layer}/{z}/{x}/{y}.pbf`. Vector tiles are Protocol-Buffer-encoded feature collections. Content-addressed by version, cache-forever with a new version on data refresh.
- Places: Elasticsearch index with geohash prefixes for proximity boost, n-gram tokenizer for prefix autocomplete.
- Routing graph: PostGIS + custom adjacency. Pre-computed contraction hierarchies for fast queries.
Vector vs Raster Tiles
| | Raster | Vector |
|---|---|---|
| Rendering | Image blit | Parse + draw on GPU |
| Styling | Fixed per tile | Restyle on client, dark mode, zoom interpolation |
| File size | 20–40 KB each | 10–30 KB each |
| CPU | Low | Higher, but amortised |
| Choice | Legacy / simple | Modern / preferred (Mapbox GL, MapLibre) |
Pick vector — smoother zoom interpolation between levels, smaller payloads, and restyling without re-fetching.
Deep Dives
1. Tile loading performance
- Bad: fetch tiles as `<img>` on each pan step. DNS + HTTP overhead tanks FPS.
- Good: precompute tile keys for the current viewport plus one buffer ring (sketch below). Cache in memory by `tileKey`. Use HTTP/2 for multiplexed requests. LRU eviction capped at ~500 tiles.
- Great: a WebWorker parses vector tiles off the main thread. Progressive rendering — draw the lower-zoom tile under the higher-zoom tile so the user sees something immediately. Prefetch tiles in the direction of pan momentum. A Service Worker caches tiles per version in Cache Storage.
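A sketch of the tile-key precomputation for the Good tier, using standard web-mercator z/x/y math:

```ts
// Convert lon/lat (degrees) to a slippy-map tile coordinate at zoom z.
function lonLatToTile(lon: number, lat: number, z: number): { x: number; y: number } {
  const n = 2 ** z;
  const latRad = (lat * Math.PI) / 180;
  return {
    x: Math.floor(((lon + 180) / 360) * n),
    y: Math.floor(((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n),
  };
}

// All tile keys covering the viewport plus one buffer ring, ready for the
// LRU cache lookup / fetch pipeline.
function viewportTileKeys(b: { west: number; south: number; east: number; north: number }, z: number, buffer = 1): string[] {
  const min = lonLatToTile(b.west, b.north, z); // top-left tile
  const max = lonLatToTile(b.east, b.south, z); // bottom-right tile
  const keys: string[] = [];
  for (let x = min.x - buffer; x <= max.x + buffer; x++)
    for (let y = min.y - buffer; y <= max.y + buffer; y++)
      keys.push(`${z}/${x}/${y}`);
  return keys;
}
```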
2. Infinite canvas / gestures
- Bad: listen to `wheel` and `mousedown` and translate a `div` on every event. Layout thrash, jank.
- Good: canvas or WebGL. Gesture controller integrates velocity; a rAF loop applies the current transform. Hit-testing via a quadtree.
- Great: GPU transforms only; never trigger layout. Decouple input handling from the render loop through a single shared transform. At zoom transitions, cross-fade old and new tile sets over 150 ms for subjective smoothness.
3. Marker clustering at 100k scale
- Bad: render all markers. Browser dies at 10k DOM nodes.
- Good: Supercluster-style grid clustering. Compute clusters per zoom at ingest time; client consumes pre-clustered GeoJSON.
- Great: GPU instancing on WebGL — one draw call for all markers. Clustering happens in a WebWorker; the main thread only renders. Incremental updates on pan so only the new viewport region gets re-clustered.
Frontend Considerations
- Rendering: WebGL via MapLibre / Mapbox GL for vector; fallback to canvas for low-end devices.
- State: React for chrome (side panel, search box), imperative canvas for map. Do not put tile state in React state — re-renders will kill frame budget.
- Gestures: PointerEvents, momentum via requestAnimationFrame. Pinch-to-zoom on touch.
- Perf budgets: 16.67 ms per frame; profile with Performance panel Frame Rendering Stats.
- A11y: keyboard pan (arrow keys), zoom (+/−), list view fallback for screen readers.
What's Expected at Each Level
- SMTS: tile cache strategy, gesture perf, vector tile reasoning, marker clustering.
- Lead: WebWorker architecture, progressive rendering, CDN versioning.
- Staff: data freshness, global edge topology, cost of pre-rendering.
3. Design Notification Service
Reported Frequency: Very common. Pure backend HLD.
Problem
Multi-channel notification service: email, SMS, push (FCM/APNs), in-app. Per-user preferences per channel per category. Rate limits per user. Durable, idempotent, retryable. Multi-tenant.
Clarifying Questions
- You: "Who's the caller — internal services or external?" — Interviewer: "Internal services publish notification intents."
- You: "Latency target?" — Interviewer: "Transactional in 10s P99, marketing can be minutes."
- You: "Do we de-dup?" — Interviewer: "Yes, based on idempotency key."
- You: "Templates?" — Interviewer: "Server-rendered with variables."
- You: "Opt-out granularity?" — Interviewer: "Per channel per category per tenant."
- You: "Scale?" — Interviewer: "1B notifications/day at steady state."
Functional Requirements
- Accept notification intents with recipient, template, channel preferences, priority.
- Resolve recipient preferences.
- Render templates.
- Dispatch through channel providers.
- Retry with backoff.
- De-dup by idempotency key.
- Observability — delivery state queryable.
Non-Functional Requirements
- 1B notifications/day → ~12k/s average, 60k/s peak.
- Transactional P99 < 10s end-to-end.
- At-least-once delivery with de-dup.
- Tenant isolation.
Capacity Estimation
- 60k/s peak × 1 KB payload = 60 MB/s.
- Kafka throughput: ~500 MB/s per broker; easily fits in a modest cluster.
- Storage of delivery state: 1B rows/day × 200 B = 200 GB/day. 30-day retention = 6 TB, then archive.
Core Entities
NotificationIntent, UserPreferences, Template, DeliveryAttempt, Channel.
High-Level Architecture
Producer Services
│ (POST /notify)
▼
[Ingest API] ── validate, idempotency check (Redis), enqueue
│
▼
[Kafka: notifications]
partitions keyed by (tenant_id, user_id) for ordering
│
▼
[Dispatcher Workers] ── per channel
├── resolve prefs (PrefsService, cached)
├── render template (TemplateService)
├── apply rate limit (Redis token bucket)
├── call channel provider (SES, Twilio, FCM, APNs)
└── write DeliveryAttempt (Postgres + Kafka: delivery_log)
│
retry ◄──┤ failed (transient)
DLQ ◄────┘ failed (permanent)
API Design
POST /v1/notifications
Headers: Idempotency-Key: <uuid>, X-Tenant-Id: <uuid>
Body: {
recipient: { userId: string } | { email: string } | { phone: string },
template: { id: string, data: Record<string, any> },
channels: ("EMAIL" | "SMS" | "PUSH" | "INAPP")[],
category: string, // "transactional" | "marketing" | ...
priority: "HIGH" | "NORMAL" | "LOW",
scheduleAt?: string
}
-> 202 Accepted { notificationId }
GET /v1/notifications/:id -> { state, attempts: DeliveryAttempt[] }
PUT /v1/preferences/:userId -> { channels: { EMAIL: { marketing: false, ... } } }
Data Model
- Notifications (Postgres, partitioned by tenant_id, month): id, tenant_id, user_id, category, priority, state, created_at.
- DeliveryAttempts (same partitioning): notification_id, channel, provider_id, state, attempt_no, error_code, attempted_at.
- UserPreferences (DynamoDB or Postgres JSONB): tenant_id + user_id → {channel: {category: bool}}.
- IdempotencyKeys (Redis, 24h TTL): `tenant_id:idem_key → notification_id`.
Multi-Tenancy Strategy
- Kafka partition key `hash(tenant_id, user_id)` so one user's notifications stay in order and isolated.
- Per-tenant Kafka quotas so a single tenant can't flood the cluster.
- Rate limiter keyed by `(tenant_id, user_id, channel)` — see the token-bucket sketch after this list.
- DB rows all carry tenant_id; row-level security enforces scoping on queries.
- Separate "priority" topic or partition ranges for tenants with enterprise SLAs.
Deep Dives
1. Fan-out strategy for push to all devices
- Bad: a single intent with 10k recipients serialises through one worker.
- Good: fan-out at ingest — expand recipient group into N intents, each with one recipient, published to Kafka. Parallelism scales with partitions.
- Great: tiered fan-out: intent → batch job → per-recipient messages. Batch job is idempotent (Flink or Spark), checkpointed. For marketing blasts, use a shuffler to avoid hotspotting single users across retries.
2. Idempotency
- Bad: check-then-insert; races produce duplicates.
- Good: Redis `SET NX` with key `tenant:idem` and 24h TTL. If the key exists, return the stored notification_id (sketch below).
- Great: two-level check — Redis in the hot path, a Postgres unique constraint as the last line of defense. On Redis outage, fall back to the DB. Store a hash of the rendered body so a re-send with a modified payload is detected.
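A sketch of the Good tier, assuming an ioredis client and an illustrative `idem:` key prefix:

```ts
import Redis from "ioredis";

const redis = new Redis();

// SET ... NX EX atomically claims the key; a null reply means another request
// already claimed it, so we surface the original notification id instead.
async function claimIdempotencyKey(tenantId: string, idemKey: string, newId: string): Promise<string> {
  const key = `idem:${tenantId}:${idemKey}`;
  const claimed = await redis.set(key, newId, "EX", 24 * 60 * 60, "NX");
  if (claimed === "OK") return newId;        // first request — proceed to enqueue
  return (await redis.get(key)) ?? newId;    // duplicate — return the stored id
}
```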
3. Priority and starvation
- Bad: single queue; low-priority marketing blocks transactional OTPs when producer misbehaves.
- Good: separate Kafka topics per priority. Dispatcher pool per topic.
- Great: weighted pool: high-priority workers can temporarily steal low-priority capacity but not vice versa. Back-pressure signals to ingest when HIGH topic lag exceeds budget.
What's Expected at Each Level
- SMTS: clean layered architecture, idempotency, rate limiting, retries with DLQ, template rendering.
- Lead: priority lanes, fan-out strategy, provider failover (SES → SendGrid).
- Staff: global delivery, compliance (opt-out enforcement at ingest), cost optimisation.
4. Design Google Docs (Collaborative Editor)
Reported Frequency: Common. Often run like a pair-programming session on the HLD — they'll ask about OT/CRDT in depth.
Problem
Real-time collaborative text editor. Multiple users edit the same document with sub-second convergence. Cursor presence. Conflict resolution. Offline editing. Version history.
Clarifying Questions
- You: "Rich text or plain?" — Interviewer: "Rich text with basic formatting."
- You: "How many concurrent editors per doc?" — Interviewer: "Up to 100 typical, 200 max."
- You: "Offline edits?" — Interviewer: "Yes, merge on reconnect."
- You: "Version history granularity?" — Interviewer: "Every 30s auto-snapshot, plus named versions."
- You: "Access control?" — Interviewer: "Per-doc ACL: owner, editor, commenter, viewer."
- You: "E2E encryption?" — Interviewer: "No, server sees content."
Functional Requirements
- Collaborative edit with sub-second convergence.
- Cursor + selection presence.
- Undo / redo local.
- Version history with named versions.
- Offline edit + merge.
- Comments (threaded).
Non-Functional Requirements
- Latency: 99% of operations under 200 ms end-to-end.
- Durability: no edit loss after server ACK.
- Scale: 10M docs, 100k concurrent docs active.
Capacity Estimation
- 100k concurrent docs × 50 ops/min avg = 83k ops/s average, ~500k peak during business hours.
- Each op ≈ 100 B. 50 MB/s of op traffic.
- Snapshots every 30s: 100k × 2/min × 20 KB = 67 MB/s, ~6 TB/day before compression.
Core Entities
Document, Operation, Snapshot, Presence, Comment, AccessGrant.
High-Level Architecture
Browser (rich-text editor, e.g. Slate / TipTap)
├── Local CRDT or OT state
├── Pending ops queue (IndexedDB for offline)
└── WebSocket to Doc Shard
│
▼
[Doc Gateway] ── routes to shard by doc_id
│
▼
[Doc Shard] (stateful; single-writer per doc)
├── Apply and transform ops
├── Broadcast to other clients on this doc
├── Flush ops to Kafka log (durability)
└── Snapshot every N ops or 30s to S3
│
▼
[Ops Log: Kafka] [Snapshots: S3] [Metadata: Postgres]
[Presence: Redis pub/sub]
API Design
WebSocket frame protocol:
// client -> server
{ type: "JOIN", docId, fromRev }
{ type: "OP", docId, clientId, rev, op } // op = OT or CRDT delta
{ type: "CURSOR", docId, position, selection }
// server -> client
{ type: "SYNC", snapshot, rev } // on first join
{ type: "OP", docId, authorId, rev, op } // fan-out
{ type: "CURSOR", docId, authorId, position, selection }
{ type: "ACK", clientId, rev }Data Model
- Documents (Postgres): id, owner_id, title, current_rev, created_at.
- Snapshots (S3): `docs/{docId}/snap/{rev}.bin` — serialised rich-text tree.
- Ops log (Kafka topic per shard): each message is (doc_id, rev, author, op bytes). Retention 30 days.
- ACL (Postgres): doc_id, user_id, role, grant_ts.
- Presence (Redis): ephemeral, TTL 30s.
OT vs CRDT (be prepared to compare)
OT (Operational Transform): central server applies ops and transforms concurrent ops against each other. Client sends op at its known revision, server transforms it against any ops in between, applies, returns new revision. Used by Google Docs historically.
- Pros: server-authoritative, smaller client state, rich-text transforms well-studied.
- Cons: needs an authoritative serialiser per doc, tricky transform functions, hard offline.
CRDT (Conflict-free Replicated Data Types): ops are commutative; any apply order converges. Examples: RGA, Yjs, Automerge.
- Pros: true peer-to-peer, excellent offline, no central transform.
- Cons: larger metadata (timestamps, tombstones), GC is hard, rich-text CRDTs less mature.
For a "server-authoritative, reasonable offline" Salesforce-style answer, pick OT and name-drop CRDT as the "going further" path.
Deep Dives
1. OT transform and doc shard
- Bad: every client and the server independently apply ops — divergence.
- Good: single-writer per doc shard. The server holds current state plus a history ring buffer of the last N ops. An incoming op at rev `r` is transformed against ops `r..current`, applied, broadcast (transform sketch below). The client rebases its pending ops against the server's echoed rev.
- Great: shard failover — each doc shard is a Raft group of 3 replicas. The leader applies; followers tail the Kafka log. On leader loss, a follower promotes and replays from the last snapshot.
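A minimal transform sketch for the simplest pair — two concurrent plain-text inserts. Full rich-text OT needs the whole insert/delete/format matrix; this shows only the core position shift and the convergence tie-break:

```ts
interface InsertOp { pos: number; text: string; author: string; }

// Transform op `a` against concurrent op `b` that the server serialised first.
function transformInsert(a: InsertOp, b: InsertOp): InsertOp {
  // If b landed at or before a's position, a shifts right by b's length.
  // Equal positions tie-break on author id so every replica converges.
  if (b.pos < a.pos || (b.pos === a.pos && b.author < a.author)) {
    return { ...a, pos: a.pos + b.text.length };
  }
  return a;
}

// Server side: rebase an incoming op across everything between the client's
// known rev and the current rev, then apply and broadcast.
function rebase(incoming: InsertOp, concurrent: InsertOp[]): InsertOp {
  return concurrent.reduce(transformInsert, incoming);
}
```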
2. Presence scaling
- Bad: broadcast cursor on every keystroke over the OT channel — saturates WebSocket.
- Good: separate presence channel, throttled to 10 Hz, ephemeral via Redis pub/sub keyed by doc_id.
- Great: spatial subscription — a client with 50 collaborators subscribes only to cursors within its viewport (rich doc case). Presence deltas compressed.
3. Offline editing
- Bad: user types offline; on reconnect the local doc overwrites the server.
- Good: local pending op queue. On reconnect, send ops one by one starting from last-acked rev. Server transforms each against concurrent ops and returns ACK with rebased rev. Client reapplies local state.
- Great: long offline windows use a CRDT-style per-session log; the server reconciles via a merge procedure that preserves user intent (heuristics for insert-after vs delete-through-typed-region). Present user with a merge conflict UI only for irreconcilable cases.
What's Expected at Each Level
- SMTS: OT flow, shard-per-doc, snapshot + log, presence, offline queue.
- Lead: Raft-based HA, explicit transform function, back-pressure.
- Staff: CRDT trade-off with concrete scale numbers, export / import, privacy and retention of comment history.
5. Design Autocomplete / Typeahead
Reported Frequency: Very common in frontend rounds. Confirmed Salesforce question.
Problem
Input field that suggests completions as the user types. Client handles debouncing, race conditions, caching. Backend serves ranked completions from a prefix index. Personalisation and accessibility required.
Clarifying Questions
- You: "Data size?" — Interviewer: "10M strings, rebuild index daily."
- You: "Latency target?" — Interviewer: "Under 100 ms from keystroke to render."
- You: "Personalised?" — Interviewer: "Yes, user history boosts."
- You: "Offline?" — Interviewer: "Cache recent queries."
- You: "Multilingual?" — Interviewer: "Unicode, but no translation."
- You: "Min prefix length?" — Interviewer: "2 characters."
- You: "A11y?" — Interviewer: "Screen-reader support is mandatory."
Functional Requirements
- Suggestions after 2+ chars.
- Keyboard nav (arrows, Enter, Escape).
- Personalised ranking.
- Recent searches.
- Highlight match.
Non-Functional Requirements
- Keystroke to render: P99 < 100 ms.
- Client cache hit: > 50% during a session.
- Abort in-flight on new keystroke.
Core Entities
Query, Suggestion, UserContext, PrefixIndex.
High-Level Architecture
Input (React)
├── Debounce (150 ms)
├── LRU cache (prefix → suggestions[])
├── AbortController per in-flight request
└── aria-live status
│
▼
[Edge Cache / CDN] ← short TTL (60s) for hot prefixes
│
▼
[Autocomplete Service]
├── Prefix index (Trie in memory OR Elasticsearch completion suggester)
├── Ranker (popularity + personalisation)
└── Feature store (user recent queries)
API Design
GET /v1/autocomplete?q=sa&ctx=home&userId=...
-> {
prefix: "sa",
results: [
{ text: "salesforce", score: 0.97, highlight: "<b>sa</b>lesforce" },
{ text: "samsung", score: 0.90, highlight: "<b>sa</b>msung" }
],
servedAt: "2026-04-19T09:02:11Z"
}
Client Implementation
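The hook below assumes a `Suggestion` shape (matching the API response above) and a small `LRUCache`. A minimal sketch of both — the cache is illustrative, not a library import:

```ts
interface Suggestion { text: string; score: number; highlight: string; }

// Minimal LRU: Map preserves insertion order, so re-inserting on read marks an
// entry most-recently-used and the first key is always the eviction victim.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private capacity: number) {}
  get(key: K): V | undefined {
    const v = this.map.get(key);
    if (v !== undefined) { this.map.delete(key); this.map.set(key, v); }
    return v;
  }
  put(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    else if (this.map.size >= this.capacity) this.map.delete(this.map.keys().next().value as K);
    this.map.set(key, value);
  }
}
```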
import { useEffect, useRef, useState } from "react";

interface UseAutocompleteResult {
query: string;
setQuery: (v: string) => void;
suggestions: Suggestion[];
loading: boolean;
error: Error | null;
}
function useAutocomplete(fetcher: (q: string, signal: AbortSignal) => Promise<Suggestion[]>): UseAutocompleteResult {
const [query, setQuery] = useState("");
const [suggestions, setSuggestions] = useState<Suggestion[]>([]);
const [loading, setLoading] = useState(false);
const [error, setError] = useState<Error | null>(null);
const cacheRef = useRef(new LRUCache<string, Suggestion[]>(50));
const activeController = useRef<AbortController | null>(null);
  useEffect(() => {
    if (query.length < 2) { setSuggestions([]); return; }
    // Session LRU first — repeat prefixes never hit the network.
    const cached = cacheRef.current.get(query);
    if (cached) { setSuggestions(cached); return; }
    // Debounce: fire only after 150 ms of keystroke silence.
    const t = setTimeout(() => {
      // Abort the previous in-flight request so a slow, stale response
      // can never overwrite a newer one (the classic typeahead race).
      activeController.current?.abort();
      const controller = new AbortController();
      activeController.current = controller;
      setLoading(true);
      fetcher(query, controller.signal)
        .then((res) => {
          cacheRef.current.put(query, res);
          setSuggestions(res);
          setLoading(false);
        })
        .catch((e) => {
          // AbortError is expected when the user keeps typing — not a failure.
          if (e.name !== "AbortError") { setError(e); setLoading(false); }
        });
    }, 150);
    return () => clearTimeout(t); // cancel the pending debounce on each keystroke
  }, [query, fetcher]);
return { query, setQuery, suggestions, loading, error };
}
Deep Dives
1. Cache invalidation
- Bad: client cache forever; stale suggestions after index refresh.
- Good: short TTL per entry (5 min) + server `servedAt` check on response.
- Great: cache key includes user context + index version; server returns an `indexVersion` header; client evicts cross-version entries on change.
2. Offline
- Bad: no suggestions without network.
- Good: cache last N queries; respond from cache when offline; show subtle "offline" marker.
- Great: Service Worker precaches a static top-10k completions shard for cold-start offline use.
3. Ranking signals
- Bad: alphabetical.
- Good: popularity (frequency in logs) + recency decay.
- Great: per-user learned ranker. Features: user history, time-of-day, current session queries, geo. Lightweight logistic model served from a feature store; latency budget 10 ms.
Frontend Considerations
- A11y: listbox role. `aria-controls`, `aria-activedescendant` points to the highlighted option ID. An `aria-live="polite"` status region announces "3 suggestions". Esc closes. Enter selects.
- Keyboard: Down/Up moves the active option, with wrap. Home/End optional.
- Race conditions: AbortController cancels stale fetches. Additionally check `if (responseQuery !== currentQuery) return;` after await.
- Perf: virtualise when suggestions > 20.
What's Expected at Each Level
- SMTS: debounce + abort + cache + a11y; calls out race conditions unprompted.
- Lead: ranking, index refresh, client / server cache layering.
- Staff: ML ranker, cold-start, multi-locale indexing.
6. Design Infinite Scroll / Virtualised List
Reported Frequency: Common frontend HLD, often merged with a Feed / Inbox problem.
Problem
Render a list of up to 100k items smoothly. Support variable heights, scroll restoration, prefetching of next page, keyboard accessibility.
Core Entities
Item, Viewport, ItemMeasureCache, Page.
Architecture (client-side)
ListContainer
├── IntersectionObserver for sentinel near bottom → load next page
├── Windowing (react-window / react-virtuoso)
├── ItemMeasureCache (ResizeObserver on each mounted row)
└── ScrollRestoration hook
Windowing Sketch
import React, { useMemo, useRef, useState } from "react";

function VirtualList<T>({ items, estimatedHeight, Row }: {
items: T[];
estimatedHeight: number;
Row: React.FC<{ item: T; index: number }>;
}) {
const [range, setRange] = useState({ start: 0, end: 20 });
const parentRef = useRef<HTMLDivElement>(null);
const measured = useRef(new Map<number, number>());
  // Prefix-sum offsets: offsets[i] is the top of row i. Unmeasured rows fall
  // back to the estimate; deep dive 2 below upgrades this to O(log n) updates.
  const offsets = useMemo(() => {
    const out = [0];
    for (let i = 0; i < items.length; i++) {
      out.push(out[i] + (measured.current.get(i) ?? estimatedHeight));
    }
    return out;
  }, [items.length, estimatedHeight]);
  // On scroll, scan the offsets for the first and last visible rows and pad
  // the window with a 4-row buffer on each side.
  const onScroll = () => {
const el = parentRef.current!;
const top = el.scrollTop;
const bottom = top + el.clientHeight;
let start = 0, end = items.length;
for (let i = 0; i < offsets.length - 1; i++) {
if (offsets[i + 1] >= top) { start = Math.max(0, i - 4); break; }
}
for (let i = start; i < offsets.length - 1; i++) {
if (offsets[i] > bottom) { end = Math.min(items.length, i + 4); break; }
}
setRange({ start, end });
};
return (
<div ref={parentRef} onScroll={onScroll} style={{ height: 600, overflow: "auto", position: "relative" }}>
<div style={{ height: offsets[items.length] }}>
{items.slice(range.start, range.end).map((item, idx) => (
<div
key={range.start + idx}
style={{ position: "absolute", top: offsets[range.start + idx], left: 0, right: 0 }}
>
<Row item={item} index={range.start + idx} />
</div>
))}
</div>
</div>
);
}
Deep Dives
1. Scroll performance
- Bad: render all 100k nodes. Layout time seconds, memory GBs.
- Good: windowing with buffer rings (4 items above / below).
- Great: GPU-friendly transforms (`translate3d(0, Y, 0)`) instead of changing `top`; `will-change: transform`; avoid `box-shadow` on scrolling items.
2. Variable heights
- Bad: assume fixed height, mis-align on tall items.
- Good: ResizeObserver on mounted rows writes measured height into cache; offsets recompute lazily.
- Great: estimated-height placeholder until measured; on measurement, translate subsequent items by delta without triggering reflow cascade. Maintain a prefix-sum tree for O(log n) offset queries on 100k items.
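A sketch of that prefix-sum structure as a Fenwick tree; `OffsetTree` is an illustrative name:

```ts
// Point-update a measured row height, query any row's offset in O(log n) —
// comfortably fast at 100k rows, unlike rebuilding a flat prefix-sum array.
class OffsetTree {
  private tree: number[];
  private heights = new Map<number, number>();
  constructor(private n: number, private estimate: number) {
    this.tree = new Array(n + 1).fill(0);
    for (let i = 0; i < n; i++) this.add(i, estimate); // seed every row with the estimate
  }
  // Called from the ResizeObserver when row i reports a real height.
  setHeight(i: number, h: number): void {
    const prev = this.heights.get(i) ?? this.estimate;
    this.heights.set(i, h);
    this.add(i, h - prev);
  }
  // Offset of row i = sum of heights of rows 0..i-1.
  offsetOf(i: number): number {
    let sum = 0;
    for (let j = i; j > 0; j -= j & -j) sum += this.tree[j];
    return sum;
  }
  private add(i: number, delta: number): void {
    for (let j = i + 1; j <= this.n; j += j & -j) this.tree[j] += delta;
  }
}
```

In the sketch above, `offsetOf` would replace the linear `offsets` array and `setHeight` would be fed by the ResizeObserver.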
3. Scroll restoration
- Bad: on navigation back, list is at top.
- Good: persist `scrollTop` + `firstVisibleKey` in history state; restore on mount.
- Great: anchor on a stable item key (not index) — if the list changed while away, align the anchor item to its previous viewport position, then reflow around it.
4. Prefetch
- Observe a sentinel 2 pages before the end. On intersect, fetch next page; update state via functional setter to avoid stale closures. Respect React 18 transitions so scroll stays smooth.
A11y
role="list"on container,role="listitem"on rows.- Keyboard focus must not break when the focused row is virtualised out — pre-render focused row outside the window or use
tabindex="-1"fallbacks. aria-rowcountfor screen readers so they announce "item 57 of 10000".
What's Expected at Each Level
- SMTS: windowing, measurement cache, IntersectionObserver prefetch, scroll restoration.
- Lead: prefix-sum tree for variable heights, memory ceiling at 100k.
- Staff: integration with offline cache, user-visible loading patterns.
7. Design Flash Sale System
Reported Frequency: Hyderabad SMTS report, Jan 2026.
Problem
A limited-inventory flash sale. 10× normal traffic burst at sale start. Fairness: first-come-first-served (within tolerance). Prevent oversell. Multi-tenant — one platform hosts many vendors' sales.
Clarifying Questions
- You: "Fairness level?" — Interviewer: "Best-effort FIFO within a 1s bucket; strict ordering not required."
- You: "Pay or reserve?" — Interviewer: "Reserve inventory for 2 min, user pays within that window."
- You: "Known bad actors?" — Interviewer: "Yes, expect bots; bot mitigation at edge."
- You: "Inventory scale?" — Interviewer: "10k units per SKU, many SKUs."
- You: "Global?" — Interviewer: "India region only for v1."
Functional Requirements
- Inventory check + reserve with a hold.
- Queue when capacity exceeded.
- Convert hold to order on payment.
- Release on timeout or failed payment.
- Per-tenant config (start time, max per user).
Non-Functional Requirements
- Peak 200k RPS for 30s.
- P99 reserve under 300 ms during burst.
- Zero oversell.
- Tenant isolation.
Capacity Estimation
- 200k RPS × 30s = 6M attempts. For 100k units, ~60× contention.
- Inventory writes: 100k successful reservations in 30s ≈ 3k writes/s — trivial in absolute terms, but hotspotted on a few SKU rows.
High-Level Architecture
Client
▼
[Edge WAF / Bot mitigation (reCAPTCHA, fingerprint)]
▼
[Global LB]
▼
[Queue Gate] ── admission controller
│ admitted?
├── No → enqueue in virtual waiting room (SSE updates position)
└── Yes ▼
[Reserve Service]
├── atomic decrement (Redis cluster, per-SKU hash slot)
├── write hold (DynamoDB with TTL)
└── publish to Kafka for persistence + audit
▼
[Order Service]
├── checkout form
└── on pay → convert hold to order, settle payment
▼
[Primary DB (Postgres, per-tenant partition)]
Data Model
- Inventory counter: Redis key `sale:{tenant}:{sku}:remaining`, initialised at sale start. Use `DECRBY 1` atomically; reject if < 0.
- Holds: DynamoDB `(tenant, hold_id)` with TTL 120 s. Secondary index by user_id + sku for per-user limits.
- Orders: Postgres, tenant_id partitioned.
Deep Dives
1. Preventing oversell under concurrency
- Bad: read inventory, check, decrement. Classic TOCTOU race.
- Good: atomic `DECR`; if the resulting value is < 0, restore and reject. A single Redis primary per SKU avoids CRDT counters.
- Great: a Lua script that decrements and writes the hold atomically (sketch below). If the Redis node fails, fail over to a replica with a write-fencing token so a stale primary can't re-accept.
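A sketch of the Great tier, assuming an ioredis client. For brevity the hold lands in Redis with a 120 s TTL; the design above keeps holds in DynamoDB — same atomicity idea, different store:

```ts
import Redis from "ioredis";

const redis = new Redis();

// Decrement and hold-write happen inside one Lua script, so no interleaving
// request can observe the counter between the two steps.
const RESERVE_LUA = `
local remaining = redis.call('DECRBY', KEYS[1], 1)
if remaining < 0 then
  redis.call('INCRBY', KEYS[1], 1)  -- restore: the counter never goes below zero
  return -1
end
redis.call('SET', KEYS[2], ARGV[1], 'EX', 120)
return remaining
`;

async function reserve(tenant: string, sku: string, holdId: string, userId: string): Promise<boolean> {
  const counterKey = `sale:${tenant}:${sku}:remaining`;
  const holdKey = `hold:${tenant}:${holdId}`;
  const remaining = (await redis.eval(RESERVE_LUA, 2, counterKey, holdKey, userId)) as number;
  return remaining >= 0; // -1 means sold out; anything else is a confirmed hold
}
```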
2. Cache stampede / hot-key
- Bad: every request reads SKU metadata fresh from DB at T-0.
- Good: pre-warm SKU metadata into every edge cache before sale start; cache-aside with mutex on miss.
- Great: request coalescing (singleflight) in the gateway — one back-end call per SKU per node during warm-up bursts.
3. Fairness queue
- Bad: let the LB distribute freely; last request wins during bursts.
- Good: virtual waiting room — on oversubscription, admission controller issues a queue token. Client polls position over SSE.
- Great: admission rate tracks drain rate of Reserve Service; feedback loop adjusts in real time. Token signed with HMAC so client can't skip line.
4. Multi-tenant isolation / noisy neighbour
- Per-tenant Redis slot for counters so a hot sale in tenant A doesn't evict tenant B.
- Per-tenant rate limits at gateway.
- Dedicated worker pool for enterprise-tier tenants.
- Each sale has its own Kafka topic partition — easy to throttle and monitor independently.
Frontend Considerations
- Queue screen with estimated time, SSE reconnection on drop.
- Disable buy button on submit to avoid double-clicks; still idempotent by client-side token.
- Optimistic UI inappropriate here: show clear "reserved" state once server confirms.
- Local timer for hold expiry, synced from server.
What's Expected at Each Level
- SMTS: oversell prevention, virtual queue, cache warming, tenant quotas.
- Lead: Lua-atomicity, fencing, singleflight, capacity planning.
- Staff: global deployment, fraud detection, regulator-grade audit, cost of over-provisioning.
8. Design a Multi-Tenant CRM Dashboard
Reported Frequency: Salesforce-specific — expect some variant in the HLD round.
Problem
A dashboard view in a multi-tenant CRM. Each tenant defines custom fields on core objects (Account, Contact, Opportunity). Users build dashboards from widgets (chart, table, KPI). Role-based access; row-level sharing rules.
Clarifying Questions
- You: "Do tenants share any data?" — Interviewer: "No, strict isolation."
- You: "Custom field types?" — Interviewer: "String, number, date, picklist, lookup."
- You: "Data volume per tenant?" — Interviewer: "Median 100k records, P99 tenant has 50M."
- You: "Dashboard refresh cadence?" — Interviewer: "On open, plus optional auto-refresh every 60s."
- You: "Are dashboards shared?" — Interviewer: "Within org, yes. Role-based viewers."
Functional Requirements
- Render dashboards with N widgets.
- Drag-and-drop layout; save per user.
- Widget queries the customised data model.
- Role-based access: only see rows the current user is allowed to see.
- Export to CSV.
Non-Functional Requirements
- P95 dashboard open < 2s.
- Strong tenant isolation.
- Custom schema changes must not require deploys.
High-Level Architecture
Browser (React)
├── Dashboard framework (grid layout, widget SDK)
├── Per-widget query hooks (React Query)
└── Local cache with tenant+user key
│
▼ GraphQL
[BFF / GraphQL Gateway]
├── tenant-scoping middleware (authz + tenant_id injection)
└── per-tenant persisted queries
│
▼
[Query Service]
├── object registry (tenant custom schema)
├── sharing-rule resolver
└── cache layer (per-tenant)
│
▼
[Primary DB (Postgres)]
├── core tables: accounts, contacts, opportunities (tenant_id column)
├── custom_fields (tenant_id, object, field_key, type)
└── custom_values (tenant_id, object_id, field_key, value)
[Analytics DB (ClickHouse)] ── for heavy aggregations
[Search (Elasticsearch)] ── per-tenant index
Data Model — Schema Flexibility
Two common approaches for custom fields:
A. EAV (Entity-Attribute-Value)
CREATE TABLE custom_values (
tenant_id uuid,
object_type text,
object_id uuid,
field_key text,
value_text text,
value_number numeric,
value_date timestamp,
PRIMARY KEY (tenant_id, object_type, object_id, field_key)
);
- Pros: unlimited fields, no migrations.
- Cons: joins explode for reporting; typed indexing is manual.
B. Sparse wide table / JSONB
ALTER TABLE accounts ADD COLUMN custom jsonb;
CREATE INDEX ON accounts ((custom->>'industry_tier')) WHERE tenant_id = '...';
- Pros: single row fetch returns everything; GIN / functional indexes on JSONB.
- Cons: per-tenant functional indexes don't scale to thousands of tenants.
Salesforce's own pattern is a pivoted physical model (wide table with VARCHAR columns + a metadata table mapping tenant field → column). For an interview, propose JSONB with tenant-specific functional indexes for top-N tenants and EAV fallback.
Multi-Tenancy Strategy
- Every table has `tenant_id`. The primary key starts with tenant_id.
- The GraphQL gateway injects tenant_id from the auth token; the query builder composes it into every `WHERE` (middleware sketch after this list).
- Postgres row-level security as a backstop: `CREATE POLICY tenant_iso ON accounts USING (tenant_id = current_setting('app.tenant')::uuid)`.
- Cache keys prefixed `tenant:{id}:...`.
- Large tenants (>10M records) optionally pinned to a dedicated schema or replica.
- Per-tenant query cost caps (governor limits): reject queries above N returned rows or T ms.
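A sketch of the gateway-side scoping, assuming Express-style middleware and a node-postgres-style pool; the `app.tenant` setting matches the RLS policy above, and the auth-claim shape is an assumption:

```ts
import type { NextFunction, Request, Response } from "express";

interface PgClient { query(sql: string, params?: unknown[]): Promise<unknown>; release(): void; }
interface PgPool { connect(): Promise<PgClient>; }

// Tenant comes from the verified token only — never from the request body.
function tenantScope(req: Request & { tenantId?: string }, res: Response, next: NextFunction): void {
  const tenantId = (req as any).auth?.claims?.tenant_id; // populated by upstream JWT middleware (assumed)
  if (!tenantId) { res.status(401).json({ error: "missing tenant claim" }); return; }
  req.tenantId = tenantId;
  next();
}

// set_config(..., true) is transaction-local, so pooled connections can't
// leak one tenant's scope into another tenant's query.
async function withTenant<T>(pool: PgPool, tenantId: string, fn: (c: PgClient) => Promise<T>): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    await client.query("SELECT set_config('app.tenant', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (e) {
    await client.query("ROLLBACK");
    throw e;
  } finally {
    client.release();
  }
}
```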
Sharing Rules / RBAC
- Roles hierarchy: Org admin / manager / member.
- Record-level ACL computed as: owner grants + role-up-the-tree grants + explicit shares.
- Materialise `user_visible_accounts(tenant_id, user_id, account_id)` for fast joins on read-heavy dashboards; refresh on ACL change via CDC.
Deep Dives
1. Schema evolution at tenant-time
- Bad: custom field add triggers a DDL migration. Thousands of tenants × thousands of fields → unmanageable.
- Good: a metadata table describes fields; values go in `custom_values` or JSONB. No DDL per tenant.
- Great: the hot path uses a pre-joined materialised view refreshed lazily; a cold metadata change triggers a view rebuild within seconds; the plan cache is invalidated per tenant.
2. Dashboard query performance
- Bad: run 8 widget queries in parallel against OLTP, dashboard hits 4s.
- Good: route aggregates to ClickHouse with tenant_id as first sort key. OLTP for drill-downs.
- Great: per-widget result cache keyed by tenant + query hash + data epoch; partial results streamed via HTTP streaming / GraphQL `@defer` so the dashboard renders incrementally.
3. Noisy neighbour
- Bad: one tenant's slow query pegs the shared DB.
- Good: per-tenant connection pools with caps; query budgets enforced.
- Great: pool per tier; enterprise tenants get dedicated read replicas; analytics queries segregated to a separate cluster.
Frontend Considerations
- Dashboard rendered via a `Widget` component tree. Each widget is isolated — errors bounded by an ErrorBoundary so one broken widget doesn't crash the page.
- State: layout in a store (Zustand), widget data via React Query with key `[tenantId, userId, widgetId, params]`.
- Drag-and-drop using react-grid-layout or custom; save the layout diff, not the full state.
- A11y: each widget has a heading; the dashboard announces updates via `aria-live="polite"` on auto-refresh.
- Performance: widgets lazy-load via Suspense; below-the-fold ones use IntersectionObserver.
What's Expected at Each Level
- SMTS: custom schema strategy, tenant-ID everywhere, sharing rule model, two-store pattern (OLTP + OLAP).
- Lead: materialised visibility, governor limits, hot-tenant isolation.
- Staff: schema-at-rest evolution (online reindex), cross-region residency, cost model per tenant.
Cross-Problem Playbook
Use these transitions when an interviewer nudges you toward a specific area:
- "What about scale?" → capacity numbers, shard key, hot-tenant handling.
- "What if the DB goes down?" → replication topology, RTO/RPO, degraded mode (read-only dashboards, reject writes).
- "How do you test this?" → contract tests, chaos tests, load tests with recorded traffic.
- "What metrics would you track?" → RED (Rate, Errors, Duration) per service + per tenant; end-to-end message journey timer; inventory oversell counter (must be zero); dashboard widget P95 open time.
- "How do you roll this out?" → feature flags per tenant, canary to 1% → 10% → 50% → 100%, backout plan, schema changes online-safe (expand, migrate, contract).
Personal Experience Hooks
When you can, anchor an answer in Pixis work so the interviewer sees proof:
- "At Pixis we built an autocomplete for ad-targeting keywords over ~2M terms. We debounced at 150 ms, aborted in-flight fetches on new keystrokes, and cached per-session in an LRU — hit rate was ~60% which halved backend QPS."
- "We had a real-time dashboard for ad performance; the original implementation re-rendered the whole grid on every WebSocket tick and dropped to 20 fps. I moved to per-cell subscriptions with a selector pattern and got back to 60 fps."
- "Multi-tenant context matters because our ad accounts partition at the customer level — I've debugged at least one incident where a missing tenant scope in a cache key leaked data across tenants in a preview environment."
These hooks turn a generic HLD into "I've actually done this." Salesforce interviewers weight that heavily at SMTS.