Backend Fundamentals - Salesforce SMTS
Broad-surface backend recall targeted at Salesforce SMTS Backend interviews (~3 YoE). The fundamentals show up in R3/R4/R5 follow-ups: they probe multi-tenancy, consistency, and concurrency. Answer the core question first, then volunteer one Salesforce-angle hook (tenant isolation, governor limits, bulkification) to signal you understand the platform.
Quick reference cheat sheet
Rapid one-liners. If the interviewer throws these as a lightning round, you should produce the one-liner in under 10 seconds.
| Concept | One-line recall |
|---|---|
| ACID | Atomicity, Consistency, Isolation, Durability: all-or-nothing, valid state, no interference, survives crash |
| Read Committed | Prevents dirty reads; Postgres default; non-repeatable reads still possible |
| Repeatable Read | Same row reads are stable; phantoms possible in SQL standard (MySQL InnoDB blocks phantoms via gap locks) |
| Serializable | Full isolation; behaves as if transactions ran one at a time |
| B-tree index | Ordered tree, O(log n) lookup, supports range and equality |
| Hash index | Equality only, O(1), no range scans |
| Covering index | Index contains all queried columns, avoids heap lookup |
| Selectivity | Fraction of unique values; low selectivity = index is useless |
| Sharding | Partition data across nodes by key (range, hash, directory) |
| Consistent hashing | Keys and nodes on a ring, minimal rebalance on node churn |
| Replication lag | Time between write on primary and visibility on replica |
| 2PC | Prepare + commit across participants, blocks on coordinator failure |
| Saga | Long-running tx as sequence of local tx + compensations |
| Outbox | Write event to same DB tx, separate process publishes |
| CDC | Stream DB changes (logical WAL) to downstream consumers |
| Cache-aside | App reads cache, on miss reads DB and backfills |
| Write-through | Writes go through cache to DB synchronously |
| LRU | Evict least recently used |
| Cache stampede | Many requests hit DB when hot key expires |
| Mutex | Exclusive lock |
| Semaphore | N permits, gate concurrency |
| CAS | Compare-and-swap, lock-free primitive |
| Deadlock | 4 conditions: mutex, hold-and-wait, no preemption, circular wait |
| Kafka partition | Unit of parallelism and ordering |
| At-least-once | Retries may produce duplicates; needs idempotency |
| Exactly-once (Kafka) | Transactional producer + idempotent consumer + read-committed |
| DLQ | Dead letter queue for poisoned messages |
| HTTP/2 | Multiplexed streams over one TCP, HPACK headers |
| gRPC | HTTP/2 + protobuf, supports bidi streaming |
| CAP | Under partition, pick Consistency or Availability |
| PACELC | Else (no partition), pick Latency or Consistency |
| Raft | Leader + log replication + majority quorum |
| Lamport ts | Scalar logical clock, total order |
| Vector clock | Per-node counter, detects concurrent events |
| Redlock | Redis multi-node distributed lock (controversial) |
| Circuit breaker | Closed → open on failures → half-open to probe |
| Bulkhead | Isolate resources so one tenant can't drown others |
| Jitter | Random offset on retry to avoid herds |
| JWT | base64url(header).base64url(payload).signature |
| OIDC | OAuth 2.0 + ID token for identity |
| mTLS | Both client and server present certs |
| RBAC | Role-based access control, permissions attached to roles |
| ABAC | Attribute-based; policy over subject/resource/env attributes |
| Row-level security | DB-enforced filter on tenant_id/owner |
| CQRS | Split command and query models |
| Event sourcing | State = fold(events), append-only log |
How to deploy this in Salesforce interviews
At SMTS level the interviewer cares less about textbook definitions and more about how you reason under multi-tenant constraints. Use this mental model when answering:
- Default to tenant isolation. Every data structure, cache key, queue name, thread pool, and log line should carry a `tenantId` (`orgId` in Salesforce lingo). If your answer doesn't mention it, the interviewer will push until you do.
- Fairness beats peak throughput. Governor limits exist because one noisy tenant should never starve the other 150k orgs on the same pod. When you pick a pattern, articulate how it bounds the blast radius.
- Consistency is a product decision, not just a DB flag. Salesforce records must be consistent within an org (CP), but search indexes and reports can be eventually consistent (AP). State which side you're on.
- Bulkify by default. Don't loop a network call per record; batch. When you propose a solution, describe the batch boundary (200 records, 10MB, 30s window, whichever comes first).
- Always describe failure modes. For every happy path, mention retry behavior, idempotency key, DLQ, and how you'd detect it in observability.
Interview followup pattern you'll see: "OK, that works for 1 tenant. What breaks at 10k tenants? What breaks at 150k?" Always have the next-order answer ready.
Section 1 - Databases
Databases are the heaviest fundamentals topic for Salesforce because the platform is fundamentally a database-as-a-service with a programmable layer. Expect isolation, indexing, and multi-tenant schema design to eat 20+ minutes in a 60-minute loop.
ACID
Definition. Atomicity (tx is all-or-nothing), Consistency (tx moves DB from valid state to valid state respecting constraints), Isolation (concurrent tx don't interfere per the chosen level), Durability (committed data survives crashes, typically via WAL fsync).
When ACID. Financial ledgers, inventory debits, anything where a half-applied change corrupts the business (a Salesforce Opportunity and its OpportunityLineItems must commit together; orphan line items mean a bug report from a Fortune 500 customer).
When not ACID. Analytical pipelines, activity streams, audit logs that only append. Eventual consistency is fine and cheaper.
Interview followup. "What does Consistency in ACID mean vs Consistency in CAP?" They're different: ACID-C is about invariants (unique constraints, FK, check constraints), CAP-C is about replica agreement.
Salesforce angle. A Salesforce transaction inside Apex is ACID within an org ā inserts, triggers, and workflows run in a single DB transaction and roll back together.
Isolation levels
Ordered weakest to strongest:
| Level | Dirty read | Non-repeatable read | Phantom read | Typical engine |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | SQL Server (if set) |
| Read Committed | Blocked | Possible | Possible | Postgres default, Oracle default |
| Repeatable Read | Blocked | Blocked | Possible (standard); Blocked in InnoDB via gap locks | MySQL InnoDB default |
| Serializable | Blocked | Blocked | Blocked | Rare default; Postgres SSI |
- Dirty read: T1 reads T2's uncommitted write.
- Non-repeatable read: T1 reads a row, T2 updates it and commits, T1 reads again and sees different data.
- Phantom read: T1 runs `SELECT ... WHERE age > 30`, T2 inserts a matching row and commits, T1 reruns and sees a new row.
Postgres SSI (Serializable Snapshot Isolation) detects conflicts at commit and aborts one tx; requires retry logic.
Interview followup. "Your code does `SELECT balance`, then `UPDATE balance = balance - 100`. Is Read Committed safe?" No: classic lost update. Use `SELECT ... FOR UPDATE`, an atomic `UPDATE balance = balance - 100 WHERE balance >= 100`, or bump isolation to Repeatable Read.
Salesforce angle. Salesforce's DB layer is Oracle; most workloads run at Read Committed with row-level locks (FOR UPDATE) for things like approval processes or sequence generation.
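The lost-update fix has the same compare-and-act shape in application code. Here is a minimal in-memory analog of the atomic conditional `UPDATE` (the class and amounts are illustrative, not platform code): the check and the debit happen in one atomic step, so two concurrent debits cannot both observe the old balance.

```java
import java.util.concurrent.atomic.AtomicLong;

// Analog of `UPDATE balance = balance - 100 WHERE balance >= 100`:
// check and debit are one atomic step, so there is no lost update.
class Account {
    private final AtomicLong balance;

    Account(long initial) { balance = new AtomicLong(initial); }

    /** Returns true if the debit applied, false if funds were insufficient. */
    boolean tryDebit(long amount) {
        long before = balance.getAndUpdate(b -> b >= amount ? b - amount : b);
        return before >= amount;
    }

    long balance() { return balance.get(); }
}
```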
Indexes
B-tree is the default. Ordered, supports equality, range (BETWEEN, <, >), prefix matches (LIKE 'abc%'), and ORDER BY. O(log n) lookup.
Hash is equality-only, O(1). No range scans. Postgres has hash indexes; most engines default to B-tree anyway.
Covering index includes all columns needed by a query, so the engine never visits the heap. Postgres INCLUDE clause, MySQL secondary indexes already carry PK so they can be covering by accident.
Composite `(a, b, c)` supports `WHERE a = ?`, `WHERE a = ? AND b = ?`, and `WHERE a = ? AND b = ? AND c = ?`. It does not support `WHERE b = ?` alone (leftmost prefix rule).
Partial indexes cover a subset: `CREATE INDEX ... WHERE deleted = false`. Cheaper to build and smaller.
When indexes hurt. Write-heavy tables (every insert updates every index). Low-selectivity columns (a boolean, or a status with 3 values: a scan is faster). Huge wide indexes on small tables.
Selectivity. `unique_values / total_rows`. An index on `is_active` with a 50/50 split is useless; the planner will seq scan. An index on `email` with near-1.0 selectivity is ideal.
Interview followup. "You have a composite index `(org_id, created_at)`. Which queries use it?" Anything filtering on `org_id` alone, or on `org_id` + `created_at`. Not `created_at` alone.
Salesforce angle. Multi-tenant tables start every index with org_id. A query without a leading org_id predicate will scan across tenants and is usually rejected at code review.
Query optimization

- EXPLAIN / EXPLAIN ANALYZE. Read bottom-up. Watch for `Seq Scan` on large tables, `Nested Loop` with high row counts (should be a Hash Join), and huge `rows` estimates that are off by 10x (stale statistics: run `ANALYZE`).
- Index hints. MySQL `USE INDEX`, Oracle `/*+ INDEX(...) */`. Last resort; usually means the stats are wrong.
- Query rewriting. Turn correlated subqueries into joins. Replace `OR` with `UNION ALL` when each branch is selective. Push predicates down before joins.
- N+1. Loading a parent, then looping a query per child. Fix with a join, an `IN (...)` batch, or the dataloader pattern.
Interview followup. "The query was fast yesterday and slow today, same data size. What happened?" Plan flip due to stale stats, parameter sniffing, or a new index changing planner choices.
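The N+1 fix can be shown concretely: one batched fetch for all parents, grouped in memory, instead of one query per parent. The types and the in-memory table stand-in below are hypothetical sketches of a real DAO.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical dataloader sketch: the list stands in for the order_line
// table; one batched lookup replaces a per-parent query loop.
record OrderLine(long orderId, String sku) {}

class LineBatchLoader {
    private final List<OrderLine> table;

    LineBatchLoader(List<OrderLine> table) { this.table = table; }

    // Equivalent of `SELECT * FROM order_line WHERE order_id IN (...)`
    // followed by a client-side group-by.
    Map<Long, List<OrderLine>> loadFor(Set<Long> orderIds) {
        return table.stream()
            .filter(l -> orderIds.contains(l.orderId()))
            .collect(Collectors.groupingBy(OrderLine::orderId));
    }
}
```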
Normalization and when to denormalize

- 1NF: atomic columns, no repeating groups.
- 2NF: 1NF + no partial dependency on a composite key.
- 3NF: 2NF + no transitive dependency.
- BCNF: stricter 3NF.
Denormalize when. Read-heavy, reporting, dashboards, hot paths where the join cost dominates. Duplicate data with a refresh job or CDC pipeline, and document the source of truth.
Interview followup. "How do you keep a denormalized column in sync?" Triggers (fragile), CDC into materialized view, or rebuild-on-write in application code guarded by a single writer.
Salesforce angle. Reports and list views use denormalized summary fields (rollup summary, formula fields). The platform maintains them via background jobs.
SQL vs NoSQL decision tree
Pick SQL when:
- You need multi-row transactions.
- Schema is stable and you value constraints.
- Ad-hoc analytics with joins.
- Regulatory audit trails.
Pick NoSQL when:
- Schema is genuinely flexible (event payloads, product catalogs with vendor-specific attrs).
- Scale of writes exceeds a single primary and sharding SQL is operationally painful.
- You need a specific data model (graph for relationships, time-series for metrics).
- Latency budgets demand in-memory (Redis).
Most Salesforce backend services are SQL first; Redis for cache, Kafka for events, document stores only where justified.
NoSQL types
- Document (MongoDB, DocumentDB). JSON-ish documents, flexible schema, indexes on fields. Good for content-heavy data.
- Key-value (Redis, DynamoDB KV mode). Single key → value. Fastest. Caches, session stores, rate-limit counters.
- Columnar (Cassandra, HBase, ScyllaDB). Wide rows keyed by partition; great for time-series and write-heavy workloads. Tunable consistency.
- Graph (Neo4j, Neptune). Nodes and edges with traversal queries. Fraud detection, social graphs.
- Time-series (InfluxDB, TimescaleDB). Optimized for append + time-range reads with retention policies. Metrics, IoT.
Interview followup. "When would you choose Cassandra over MongoDB?" Wide-column workloads with heavy writes, multi-DC active-active, tunable consistency (QUORUM, LOCAL_QUORUM). MongoDB is better for flexible schemas and secondary indexes.
Sharding
Splitting a logical table across physical nodes.
- Range sharding. Shard 1: a-m, Shard 2: n-z. Simple range queries, but hotspots (e.g., if the key is a timestamp, all writes hit the latest shard).
- Hash sharding. `shard = hash(key) % N`. Even distribution. Range queries must scatter.
- Directory sharding. Lookup table from key → shard. Flexible, can rebalance per-key, but the directory is a SPOF and a hot read.
- Consistent hashing. Keys and nodes hashed onto a ring; a key lives on the next clockwise node. Adding/removing a node only moves 1/N of the keys. Use virtual nodes (vnodes, typically 100-256 per physical node) to even out distribution and make rebalancing granular.
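The ring described above can be sketched with a sorted map. The vnode count and hash function here are illustrative (a production ring would use murmur3 or xxHash rather than `String.hashCode`):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Consistent-hash ring with virtual nodes: each physical node is hashed
// onto the ring many times; a key is owned by the first vnode clockwise
// from its hash. Adding a node only steals ~1/N of the keys.
class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int vnodes;

    HashRing(int vnodes) { this.vnodes = vnodes; }

    void addNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.put(hash(node + "#" + i), node);
    }

    void removeNode(String node) {
        for (int i = 0; i < vnodes; i++) ring.remove(hash(node + "#" + i));
    }

    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        // Wrap around the ring if we fell past the last vnode.
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static int hash(String s) {
        // Illustrative only; use murmur3/xxHash in practice.
        int h = s.hashCode();
        return h ^ (h >>> 16);
    }
}
```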
Hotspot mitigation. Salt the key (`{tenantId}:{randomBucket0-15}:{id}`), add a secondary prefix, time-bucket, or route power-user tenants to dedicated shards.
Interview followup. "A tenant is 100x the next largest. What do you do?" Isolate them: move to their own shard (pod in Salesforce terms), or use a different storage tier for their large objects.
Salesforce angle. Orgs are assigned to pods; large orgs may get dedicated pods. Data within an org stays on one pod.
Replication
- Master-slave (primary-replica). One writer, N readers. Simple, read scaling, async lag.
- Master-master (multi-primary). Writes to any node; needs conflict resolution (last-write-wins, CRDTs, app-level merge). Operationally hard.
- Sync replication. Writer waits for replica ack. Zero data loss on failover, higher write latency. Postgres synchronous commit, Oracle Data Guard SYNC.
- Async replication. Writer returns immediately, replicas catch up. Risk of data loss on primary crash.
Read replicas serve stale data (lag from ms to minutes under load). Don't read your own writes from a replica ā pin the read to primary after a write, or use a session token to verify the replica has caught up (MySQL GTID, Postgres LSN).
Interview followup. "How do you handle a replication lag spike?" Alert on lag metric; throttle write-heavy jobs; fall back to primary for critical reads; investigate long transactions blocking apply.
Multi-tenant DB patterns
Three canonical choices:
- Shared DB + shared schema (pool model). Every table has `tenant_id`. Every query has `WHERE tenant_id = :t`. Cheapest, scales to millions of tenants. Risk: a missing predicate leaks data across tenants. Mitigate with row-level security (Postgres RLS) or a mandatory repository layer that injects `tenant_id`.
- Shared DB + separate schema (bridge model). One DB, one schema per tenant. Schema migrations run per tenant (slow at scale). Moderate isolation; noisy neighbors still share the buffer pool.
- DB per tenant (silo model). Best isolation, easy backup/restore per tenant, easy compliance. Expensive at >1k tenants; migrations become orchestration problems.
Interview followup. "Customer demands their data in EU-only, others stay in US." DB-per-tenant or at least shard-per-region. The pool model can't satisfy data-residency by itself.
Salesforce angle. Historically Salesforce uses a shared schema on Oracle with org_id on every row, plus a sophisticated metadata layer. It's the textbook pool model, proven at 150k+ orgs per pod.
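The "mandatory repository layer" mitigation for the pool model can be sketched as a store whose only read API takes a tenant ID, so a missing tenant predicate is a compile error rather than a data leak. This is an in-memory stand-in; a real version would wrap SQL and append `WHERE tenant_id = :t`.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Pool-model guardrail: every read is tenant-scoped by construction.
class TenantScopedStore {
    record Row(String tenantId, String id, String payload) {}

    private final Map<String, Row> rows = new ConcurrentHashMap<>();

    void insert(Row r) { rows.put(r.tenantId() + "/" + r.id(), r); }

    // The only read path; there is no findAll() without a tenant.
    List<Row> findAll(String tenantId) {
        return rows.values().stream()
            .filter(r -> r.tenantId().equals(tenantId))
            .collect(Collectors.toList());
    }
}
```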
Partitioning

- Horizontal (aka sharding when across nodes). Rows split by a key. Postgres 10+ declarative partitioning: `PARTITION BY RANGE (created_at)`, `LIST (region)`, or `HASH (tenant_id)`.
- Vertical. Split wide tables into narrow ones by column usage (hot columns vs blob columns).
Tenant-aware partitioning. Hash-partition on org_id so tenant data co-locates on one partition. Aids partition pruning on every query that filters by org_id.
Interview followup. "Why partition if you have indexes?" Partition pruning skips whole partitions (smaller index to walk), it aids maintenance (dropping a partition = dropping a table, no VACUUM), and it supports tiered storage (old partitions → slow disk).
Distributed transactions

- 2PC (two-phase commit). Prepare phase: each participant writes a prepare record and votes yes/no. Commit phase: the coordinator broadcasts the decision. Blocks if the coordinator dies between prepare and commit; participants hold locks until they recover the decision.
- 3PC. Adds a pre-commit phase to reduce blocking. Rare in practice; network assumptions don't hold.
- Sagas. Sequence of local transactions; each step has a compensating transaction. Orchestration: a central coordinator drives the steps (easier to reason about, but the coordinator is a SPOF unless itself HA). Choreography: services publish/subscribe to events (scales, but the flow is scattered across services and hard to debug).
- Outbox pattern. Write domain change and event row to the same DB transaction. A poller or CDC process reads the outbox and publishes to Kafka. Guarantees at-least-once publish atomically with state change.
- TCC (Try-Confirm-Cancel). Participants expose try/confirm/cancel APIs. Try reserves resources, confirm commits, cancel undoes. More explicit than saga; business logic must support reservation.
Interview followup. "2PC vs Saga?" 2PC needs all participants online, supports same-transaction semantics, blocks under coordinator failure. Sagas give up atomicity for availability; you write compensations and accept intermediate visibility.
Salesforce angle. Cross-org or cross-service flows use saga-style orchestration with an idempotency key; Platform Events + Outbox publishes changes to subscribers.
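A toy version of the outbox mechanics, with a synchronized block standing in for the DB transaction and a drain method standing in for the poller/CDC process (all names and the event format are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy outbox: the state change and the event row "commit" together under
// one lock (standing in for one DB transaction); a separate drain step
// publishes pending events at-least-once.
class OutboxStore {
    private final Map<String, String> state = new HashMap<>();    // domain table
    private final Deque<String> outbox = new ArrayDeque<>();      // outbox table

    synchronized void update(String key, String value) {
        state.put(key, value);                       // domain write
        outbox.add("changed:" + key + "=" + value);  // event row, same "tx"
    }

    /** The poller: drain pending events to the broker (at-least-once). */
    synchronized List<String> drain() {
        List<String> batch = new ArrayList<>(outbox);
        outbox.clear();
        return batch;
    }

    synchronized String get(String key) { return state.get(key); }
}
```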
CDC (Change Data Capture)
Stream row-level changes out of a DB.
- Debezium. Reads the WAL/binlog via logical replication slots; publishes to Kafka.
- Maxwell's Daemon. MySQL-only; simpler.
- Postgres logical replication. Built-in, subscribers can be other Postgres instances or third-party sinks.
Why CDC. Keep caches, search indexes (Elasticsearch), analytics DBs, and downstream services in sync without dual-writes. Replaces trigger-based event publishing.
Interview followup. "How do you handle schema changes under CDC?" Use a schema registry (Avro + Confluent), make additive changes, handle the consumer side with schema evolution.
Section 2 - Caching
Caching is where you buy performance with complexity. Every cache pattern introduces a consistency window; the question is whether you can tolerate it.
Cache patterns

- Cache-aside (lazy loading). App reads cache → miss → app reads DB → backfill cache. Writes invalidate or update the cache. Most common. The cache only holds what's been requested.
- Read-through. Cache client transparently loads from DB on miss. App sees a single API. Requires a cache layer that can reach DB (e.g., library or proxy).
- Write-through. Write goes through cache synchronously to DB. Cache is always consistent with DB. Slightly higher write latency.
- Write-behind (write-back). Write to cache, async flush to DB. Fast writes, risk of data loss on cache node death.
- Refresh-ahead. Proactively refresh hot keys before TTL expiry. Avoids miss spikes; wasted work on keys that aren't actually requested again.
When NOT to cache. Strongly consistent financial reads, write-heavy workloads (cache churn > benefit), data that's cheap to compute (hit the DB).
Interview followup. "Cache-aside vs write-through for user profile?" Cache-aside is simpler and fits the read-heavy pattern. Write-through if profile updates must be immediately visible across services.
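Cache-aside in miniature: the maps stand in for Redis and the DB, and a real implementation would add TTLs and error handling. Note the write path invalidates rather than updates the cached copy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Cache-aside: read cache first, on miss read the source and backfill;
// writes go to the source and invalidate the cached copy.
class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Map<K, V> db;   // stand-in for the system of record

    CacheAside(Map<K, V> db) { this.db = db; }

    V read(K key) {
        V hit = cache.get(key);
        if (hit != null) return hit;
        V loaded = db.get(key);                      // miss path: hit the DB
        if (loaded != null) cache.put(key, loaded);  // backfill
        return loaded;
    }

    void write(K key, V value) {
        db.put(key, value);   // write DB first...
        cache.remove(key);    // ...then invalidate (not update) the cache
    }
}
```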
Eviction policies

- LRU (Least Recently Used). Evict the item untouched the longest. Redis approximates it via sampling (`maxmemory-policy allkeys-lru`).
- LFU (Least Frequently Used). Evict the least accessed overall. Better for Zipfian traffic where some keys are permanently hot. Redis `allkeys-lfu`.
- FIFO. Evict oldest inserted. Simple, rarely ideal.
- TTL. Time-based; every key has a deadline. Combine with LRU.
- Random. Evict a random key. Surprisingly okay when you have many equal-weight keys and want cheap eviction.
Interview followup. "You see cache hit rate drop after a deploy. What do you check?" Key format change (prefix bumped), TTL too short, memory pressure triggering eviction, cold cache right after deploy (warm it).
Consistency
Cache-aside with TTL trades freshness for simplicity. If you need strong consistency:
- Write-through (cache+DB synchronously).
- Invalidate-on-write (delete the cache key on write; but what if the cache delete succeeds and the DB rolls back? Order matters: write the DB first, then invalidate. If invalidation fails, a short TTL bounds the damage).
- Double-delete pattern: delete before write, write DB, delete again after short delay (defeats racing readers backfilling stale data).
Event-driven invalidation. CDC stream publishes to a topic; cache consumers invalidate. Decouples producers from caches.
Interview followup. "Two readers miss simultaneously; both read the DB and write the cache. Which wins?" The last writer, but both values are the same so it's fine. The real risk is a stale reader (from before a write) racing a fresh reader to backfill.
Thundering herd / cache stampede

Hot key expires → 10k requests miss simultaneously → 10k DB reads. Mitigations:
- Jittered TTL. `base + rand(0, jitter)`. Spreads expiration.
- Request coalescing / single-flight. The first miss triggers the load; other requests wait on the same promise. Go's `singleflight`, Java `CompletableFuture` reuse, Guava `LoadingCache`.
- Probabilistic early expiration (XFetch). Each reader has a small chance to refresh before TTL; the probability rises as expiry approaches. Avoids any cliff.
- Stale-while-revalidate. Return the stale value, refresh async. Good for read-heavy, eventually consistent data.
- Bloom filter gate. Avoids cache penetration (misses for keys that don't exist in the DB either).
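The jittered-TTL mitigation is a one-liner worth having ready. A sketch, assuming TTLs measured in seconds:

```java
import java.util.concurrent.ThreadLocalRandom;

// Spread expirations so a cohort of keys written together doesn't expire
// together: ttl = base + rand(0, jitter), inclusive.
final class Ttl {
    private Ttl() {}

    static long jittered(long baseSeconds, long jitterSeconds) {
        return baseSeconds + ThreadLocalRandom.current().nextLong(jitterSeconds + 1);
    }
}
```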
Cache invalidation strategies

"There are only two hard things in computer science: cache invalidation and naming things." (Phil Karlton)
- TTL. Set-and-forget; bounded staleness.
- Explicit invalidation on write. App code deletes keys after state change.
- Tag-based invalidation. Associate keys with tags; invalidate by tag. Varnish, Rails cache tags.
- Event-driven (CDC or pub-sub). Most robust for multi-service systems; decouples writers and readers.
Distributed caches
- Redis Cluster. Sharded (16384 hash slots), single-threaded per node (fast, but a long command blocks the slot), supports data structures (lists, sorted sets, streams, hashes), Lua scripting for atomic multi-op, persistence (RDB, AOF), pub/sub.
- Memcached. Simpler, multi-threaded, strings only, no persistence, no replication built-in. Good for plain K/V cache at very high throughput.
When Redis. You need data structures, atomic ops, pub/sub, or persistence. When Memcached. Pure ephemeral K/V, extreme throughput, minimal ops overhead.
Tenant-aware caching

- Per-tenant key namespace. `cache:v1:{tenantId}:user:{userId}`. Never omit the tenantId; easy to grep in incidents.
- Per-tenant quota. Track per-tenant memory (e.g., sample keys with `MEMORY USAGE`), or run dedicated Redis instances per shard. Avoids one tenant monopolizing cache memory.
- Eviction fairness. LRU across tenants is unfair to quiet tenants with occasional important reads. Consider separate cache databases per tier (premium vs standard).
- Cache key versioning. A `v1` prefix lets you deploy new serialization without invalidation storms: roll to `v2` in code and the old keys expire naturally.
Interview followup. "A large tenant fills 90% of the cache. How do you protect other tenants?" Namespace + quota tracking, separate Redis DB per shard/tier, or evict by tenant memory fairness. Salesforce style: per-pod caches so tenants can't cross pod boundaries.
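A small key-builder makes the namespace convention hard to skip at call sites. The class name is illustrative; the format follows the `cache:v1:{tenantId}:...` scheme above.

```java
import java.util.Objects;

// Centralizes the cache key format so no call site can omit the tenant.
final class CacheKeys {
    private static final String VERSION = "v1"; // bump to roll serialization

    private CacheKeys() {}

    static String user(String tenantId, String userId) {
        Objects.requireNonNull(tenantId, "tenantId is mandatory");
        return String.join(":", "cache", VERSION, tenantId, "user", userId);
    }
}
```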
Section 3 - Concurrency
Concurrency is where SMTS interviews actually stress-test you. Expect live code, expect follow-ups on the JMM, expect deadlock scenarios.
Primitives

- Mutex. Exclusive lock, one holder. `synchronized` block, `ReentrantLock`.
- Semaphore. N permits. Bounds concurrency (e.g., max 10 DB connections). `java.util.concurrent.Semaphore`.
- Read-write lock. Many readers OR one writer. Best when reads dominate. `ReentrantReadWriteLock`.
- Condition variable. Wait for a predicate while holding a lock. `Object.wait/notify`, `Condition.await/signal`.
- Monitor. Lock + condition bundled per object. Java's intrinsic lock on every Object.
- Barrier / Latch. `CountDownLatch` (one-shot), `CyclicBarrier` (reusable, all threads meet before proceeding), `Phaser`.
Java-specific toolbox

- `synchronized`: intrinsic, reentrant, always releases on exception.
- `ReentrantLock`: explicit, supports `tryLock(timeout)`, fair mode, interruptible wait.
- `ReentrantReadWriteLock`: read-mostly shared state; write lock exclusive, read lock shared.
- `StampedLock`: optimistic reads without blocking writers; validate the stamp before use.
- `ConcurrentHashMap`: striped locks historically, now CAS-based buckets. Use `computeIfAbsent` for atomic memoization.
- `BlockingQueue`: producer-consumer handoff; `ArrayBlockingQueue`, `LinkedBlockingQueue`, `SynchronousQueue` (zero-capacity handoff).
- `CompletableFuture`: async composition, `thenApply`, `thenCompose`, `allOf`, `anyOf`, custom executor.
- `ForkJoinPool`: work-stealing, for recursive divide-and-conquer. Backs parallel streams.
- `ThreadLocal`: per-thread slot; beware memory leaks in thread pools (always `remove()` in finally).
- `AtomicInteger`, `AtomicReference`, `LongAdder`: CAS-based lock-free counters. `LongAdder` scales better under contention than `AtomicLong`.
- `VarHandle` (Java 9+): low-level CAS, fences.
- Virtual threads (Java 21): lightweight threads scheduled on carrier threads; blocking I/O no longer costs a platform thread. Perfect for multi-tenant request-per-thread models.
Race conditions

- TOCTOU (Time-Of-Check-To-Time-Of-Use). Check a condition, then act on it; another thread changes state in between. The classic `if (!map.containsKey(k)) map.put(k, v)`: use `putIfAbsent` or `computeIfAbsent`.
- Check-then-act. Subset of TOCTOU.
- Compound actions. Increment, read-modify-write. Must be atomic or locked.
Deadlock
Four necessary conditions (Coffman):
- Mutual exclusion.
- Hold and wait.
- No preemption.
- Circular wait.
Break any one to prevent deadlock:
- Lock ordering. Always acquire locks in a global canonical order (e.g., by object hashcode or ID).
- Timeouts. `tryLock(timeout)`: if you can't get the lock in time, back off and retry.
- Detection. Wait-for graph + periodic cycle detection; abort one tx. DBs do this natively.
- Prevention via single big lock. Simplest, worst throughput.
Interview followup. "Two services, each calls the other and they deadlock." Same principles: canonical call ordering or timeouts and compensating actions.
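Lock ordering in code: acquire by a canonical key (account IDs here, purely illustrative) so that `transfer(a, b)` and `transfer(b, a)` acquire the same locks in the same order and can never deadlock.

```java
import java.util.concurrent.locks.ReentrantLock;

class BankAccount {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    long balance;

    BankAccount(long id, long balance) { this.id = id; this.balance = balance; }
}

class Transfers {
    // Global lock order: always lock the account with the smaller ID first,
    // regardless of transfer direction, breaking the circular-wait condition.
    static void transfer(BankAccount from, BankAccount to, long amount) {
        BankAccount first = from.id < to.id ? from : to;
        BankAccount second = first == from ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally { second.lock.unlock(); }
        } finally { first.lock.unlock(); }
    }
}
```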
Livelock and starvation
- Livelock. Threads actively change state but no one makes progress (two polite people dodging in a hallway). Fix with random backoff or priority.
- Starvation. A thread never gets the resource (unfair lock, high-priority threads monopolizing). Fix with fair locks or priority inheritance.
Actor model
State is encapsulated in an actor; communication only via asynchronous messages. Actors process messages one at a time, so no shared mutable state. Great for multi-tenant: one actor per tenant/session. Frameworks: Akka (Scala/Java), Erlang/OTP, Microsoft Orleans.
When. Naturally concurrent, state-ful entities (one connection, one device, one org). When not. Transactional cross-entity updates (you'll end up doing distributed coordination, same hard problems).
Java Memory Model

- Happens-before. A partial ordering on actions. Writes before a `volatile` write are visible after a `volatile` read of the same variable. A monitor release happens-before the next acquire of that monitor. A thread's start happens-before the first action of that thread.
- `volatile`. Visibility (no caching in registers) + prevents reordering across the volatile access. Not atomicity for compound actions.
- `synchronized`. Mutual exclusion + a full happens-before edge on entry/exit.
- `final` fields. Safely published after the constructor returns (with a small caveat: don't leak `this` from the constructor).

Interview followup. "Is `volatile int counter; counter++` safe?" No: `counter++` is read-modify-write, not atomic. Use `AtomicInteger`.
Structured concurrency (Java 21 preview)
StructuredTaskScope bounds the lifetime of spawned virtual threads to the scope. Errors propagate; cancellation fans out. Replaces ad-hoc Future juggling. Cleaner for fan-out reads (fetch user + org + permissions in parallel with a deadline).
Key Java concurrency idiom - counter

```java
// Atomic counter (lock-free, scales under contention)
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

class Counter {
    private final AtomicInteger count = new AtomicInteger();
    public int increment() {
        return count.incrementAndGet(); // CAS under the hood
    }
}

// LongAdder scales better than AtomicLong under heavy contention
class HighContentionCounter {
    private final LongAdder count = new LongAdder();
    public void increment() { count.increment(); }
    public long value() { return count.sum(); }
}

// synchronized alternative (coarse lock, simpler, slower under contention)
class SyncCounter {
    private int count = 0;
    public synchronized int increment() { return ++count; }
}
```

```cpp
// std::atomic is the C++ equivalent; memory order matters
#include <atomic>
#include <mutex>

class Counter {
    std::atomic<int> count{0};
public:
    int increment() {
        return count.fetch_add(1, std::memory_order_relaxed) + 1;
    }
};

// Mutex alternative
class MutexCounter {
    int count = 0;
    std::mutex m;
public:
    int increment() {
        std::lock_guard<std::mutex> lk(m);
        return ++count;
    }
};
```

```typescript
// Node is single-threaded for JS; no data races on primitives.
// Concurrency problems show up as logical races across async boundaries.
// Here's a "limit N concurrent" pool - analog of Java's Semaphore.
async function pMap<T, R>(
  items: T[],
  fn: (item: T) => Promise<R>,
  concurrency = 10,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let cursor = 0;
  async function worker() {
    while (true) {
      const i = cursor++;
      if (i >= items.length) return;
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return results;
}

// For true shared-memory concurrency in Node: SharedArrayBuffer + Atomics
const sab = new SharedArrayBuffer(4);
const view = new Int32Array(sab);
Atomics.add(view, 0, 1); // atomic increment across worker threads
```

Read-write scenarios
```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class Cache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    public V get(K key) {
        lock.readLock().lock();
        try { return map.get(key); }
        finally { lock.readLock().unlock(); }
    }
    public void put(K key, V value) {
        lock.writeLock().lock();
        try { map.put(key, value); }
        finally { lock.writeLock().unlock(); }
    }
}
// For read-heavy, prefer ConcurrentHashMap. RWLock is only a win if writes
// are rare and reads are long.
```

```cpp
#include <shared_mutex>
#include <unordered_map>

template <typename K, typename V>
class Cache {
    std::unordered_map<K, V> map;
    mutable std::shared_mutex m;
public:
    V get(const K& k) const {
        std::shared_lock lk(m); // reader lock
        return map.at(k);
    }
    void put(const K& k, V v) {
        std::unique_lock lk(m); // writer lock
        map[k] = std::move(v);
    }
};
```

```typescript
// Node: no data races on a JS Map in a single event loop.
// Logical races across async boundaries still require coordination.
class AsyncCache<K, V> {
  private map = new Map<K, V>();
  private inflight = new Map<K, Promise<V>>();
  async get(key: K, loader: (k: K) => Promise<V>): Promise<V> {
    const cached = this.map.get(key);
    if (cached !== undefined) return cached;
    // single-flight: coalesce concurrent misses
    const existing = this.inflight.get(key);
    if (existing) return existing;
    const p = loader(key).then((v) => {
      this.map.set(key, v);
      this.inflight.delete(key);
      return v;
    });
    this.inflight.set(key, p);
    return p;
  }
}
```

Section 4 - Messaging and queues
SMTS interviews love async: "how do you make this not block the request?" Know Kafka and RabbitMQ well enough to defend your choice.
Kafka
- Topic. Logical stream of events.
- Partition. Ordered, append-only log; the unit of parallelism and ordering.
- Offset. Position in partition. Consumers track their own offset.
- Consumer group. Set of consumers collaborating on a topic; each partition is owned by exactly one consumer in the group at a time.
- Retention. Time-based (e.g., 7 days) or size-based. Messages stay on disk, consumers replay by seeking.
- Log compaction. Keeps latest value per key. Useful as a database changelog.
- Exactly-once semantics (EOS). Transactional producer (writes to multiple partitions are atomic) + idempotent producer (dedupes on producer ID + sequence) +
isolation.level=read_committedon consumer + consumer commits offsets transactionally. EOS only within Kafka; side effects outside Kafka still need idempotency. - Rebalancing. When a consumer joins/leaves, partitions reshuffle. Cooperative sticky assignor minimizes churn.
Ordering. Per partition only. If you need per-entity order, hash by entity key to a partition.
Pitfalls. Too many partitions ā metadata overhead, long rebalances. Too few ā limits consumer parallelism. Partition count is hard to change (new partitions change hashing ā order breaks).
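To make "hash by entity key" concrete, here is a toy partitioner in Java. It is not Kafka's real default (which hashes the serialized key bytes with murmur2); `KeyPartitioner.partitionFor` is a hypothetical stand-in that shows why the same key always lands on the same partition, and why changing the partition count remaps keys and breaks existing ordering:

```java
public class KeyPartitioner {
    // Toy stand-in for Kafka's default partitioner; the real one
    // hashes the serialized key bytes with murmur2.
    static int partitionFor(String entityKey, int numPartitions) {
        // Mask the sign bit so the modulo result is non-negative.
        return (entityKey.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 12;
        // Same customer key -> same partition -> per-customer ordering holds.
        System.out.println(partitionFor("customer-42", partitions)
                == partitionFor("customer-42", partitions)); // prints true
        // Adding partitions changes numPartitions, which remaps keys:
        // that's why growing the partition count breaks per-key order.
    }
}
```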
RabbitMQ
- Exchange types.
- Direct. Routing key equality.
- Topic. Wildcard routing (`orders.*.us`).
- Fanout. Broadcast to all bound queues.
- Headers. Match on headers instead of routing key.
- Queue. Ordered FIFO buffer.
- Binding. Rule connecting exchange to queue.
- Acks. Consumer acks after processing; broker redelivers on channel close without ack.
- Prefetch (QoS). Limit unacked messages per consumer to prevent one slow consumer hoarding.
Kafka vs RabbitMQ
| Aspect | Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed log | Smart broker, dumb consumer |
| Throughput | Very high (100k+ msg/s/broker) | High (tens of k msg/s) |
| Ordering | Per partition | Per queue |
| Retention | Days/weeks; replay | Until ack |
| Consumer model | Pull, offset-based | Push (or pull), ack-based |
| Routing | Client-side (by key) | Server-side (exchanges) |
| Best for | Event streaming, log pipelines, CDC, analytics | Task queues, RPC patterns, complex routing |
Interview followup. "Why pick Kafka for an order event stream?" Replay for new consumers, retention, high throughput, partition ordering per customer.
Delivery semantics
- At-most-once. Fire and forget; may lose.
- At-least-once. Retry until ack; may duplicate. Default assumption; design consumers to be idempotent.
- Exactly-once. Hard in distributed systems. Kafka EOS within Kafka; for external side effects, achieve "effectively-once" via idempotency keys.
Idempotency
- Idempotency key. Client-generated UUID in header. Server stores processed keys with result; retry returns stored result.
- Unique constraint. DB unique index on business key; second insert errors cleanly.
- Dedup table. `(key, expires_at)`; background cleanup. Sized for your retry window.
- Natural idempotency. `UPDATE ... SET state = 'shipped' WHERE id = ? AND state = 'paid'`; re-running is a no-op.
Interview followup. "How long do you keep idempotency keys?" Longer than the max retry window (e.g., 7 days). Size the table; partition by day; drop old partitions.
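A minimal in-memory sketch of the idempotency-key option: the server stores processed keys with their result, and a retry returns the stored result instead of re-running the operation. `IdempotentHandler` is a hypothetical name; a real store would be a DB or Redis with an `expires_at` and eviction sized to the retry window:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class IdempotentHandler {
    // key -> stored result; a real store would also keep expires_at
    // and evict keys older than the retry window.
    private final Map<String, String> processed = new ConcurrentHashMap<>();

    String handle(String idempotencyKey, Supplier<String> operation) {
        // First call runs the operation; retries return the stored result.
        return processed.computeIfAbsent(idempotencyKey, k -> operation.get());
    }

    public static void main(String[] args) {
        IdempotentHandler h = new IdempotentHandler();
        int[] charges = {0};
        Supplier<String> chargeCard = () -> { charges[0]++; return "receipt-1"; };
        h.handle("req-uuid-1", chargeCard);
        h.handle("req-uuid-1", chargeCard); // retry: no second charge
        System.out.println(charges[0]);     // prints 1
    }
}
```

`computeIfAbsent` gives atomic check-and-run per key on a single node; across nodes you need the DB unique constraint or a shared store.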
DLQ and retry
- DLQ (dead letter queue). Messages that fail after N retries go here for manual inspection.
- Exponential backoff with jitter. `delay = min(cap, base * 2^attempt) + rand(0, base)`. Jitter avoids retry herds.
- Retry budgets. Cap retries per unit time to avoid amplifying outages.
- Poison message. Same message fails repeatedly; route to DLQ early.
- Replay tooling. DLQ consumer with manual dispatch; do not auto-replay without operator approval.
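The backoff-with-jitter formula above, sketched in Java. `delayMillis` is a hypothetical helper (`base` and `cap` in milliseconds):

```java
import java.util.concurrent.ThreadLocalRandom;

public class Backoff {
    // delay = min(cap, base * 2^attempt) + rand(0, base), per the formula above.
    static long delayMillis(int attempt, long baseMs, long capMs) {
        long exp = baseMs * (1L << Math.min(attempt, 30)); // clamp shift to avoid overflow
        long delay = Math.min(capMs, exp);
        return delay + ThreadLocalRandom.current().nextLong(baseMs);
    }

    public static void main(String[] args) {
        // base 100ms, cap 10s: 100, 200, 400, 800, ... then flat at the cap,
        // each with up to 100ms of jitter.
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.printf("attempt %d -> %d ms%n",
                    attempt, delayMillis(attempt, 100, 10_000));
        }
    }
}
```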
Order guarantees
- Kafka: per-partition only. Hash by entity key.
- RabbitMQ: per-queue only, and only with a single consumer (or the `single active consumer` flag). Multiple consumers of one queue parallelize but lose order.
Salesforce angle. Platform Events are Kafka-backed; per-org events get ordered within their partition assignment. Bulk API uses batch jobs (BlockingQueue analog) to serialize work per org.
Section 5: Networking
Network fundamentals come up in R3/R4 when the interviewer probes "what happens between services."
HTTP methods
- GET. Safe, idempotent, cacheable. Never mutates.
- POST. Create or process. Not idempotent by default.
- PUT. Replace the resource at the URL. Idempotent.
- PATCH. Partial update. Not necessarily idempotent (depends on the body).
- DELETE. Remove. Idempotent.
- HEAD, OPTIONS. Metadata and CORS preflight.
Status codes
- 2xx success: 200 OK, 201 Created (Location header), 202 Accepted, 204 No Content.
- 3xx redirection: 301 permanent, 302/307 temporary, 304 Not Modified.
- 4xx client: 400 Bad Request, 401 Unauthorized (missing auth), 403 Forbidden (auth but denied), 404, 409 Conflict, 422 Unprocessable, 429 Too Many Requests.
- 5xx server: 500, 502 Bad Gateway, 503 Service Unavailable (with Retry-After), 504 Gateway Timeout.
HTTP/1.1 vs HTTP/2 vs HTTP/3
- HTTP/1.1. Text-based, one request at a time per TCP connection (pipelining is broken in practice), head-of-line blocking. Keep-alive reuses connections.
- HTTP/2. Binary framing, multiplexed streams over one TCP (many parallel requests, no HOL at HTTP layer). HPACK header compression. Server push (mostly deprecated).
- HTTP/3. Over QUIC (UDP-based). Solves TCP HOL (packet loss doesn't block other streams). Faster handshake (0-RTT with session resumption). Good over lossy networks (mobile).
REST vs GraphQL vs gRPC
- REST. Resource-oriented, HTTP verbs, JSON. Simple, cacheable, ubiquitous. Over-fetching/under-fetching on nested resources.
- GraphQL. Client selects exactly the fields it needs. One endpoint, complex caching, n+1 risk on the server (use dataloader). Schema-first.
- gRPC. HTTP/2 + protobuf, typed contracts, unary + server/client/bidi streaming, code-gen. Best for service-to-service where you control both ends. Not browser-native without grpc-web.
When each.
- REST for public APIs and simple CRUD.
- GraphQL for aggregating many sources under a single flexible client (BFF).
- gRPC for internal microservices, low latency, streaming.
WebSocket
- Handshake. HTTP/1.1 request with `Upgrade: websocket` + `Sec-WebSocket-Key`. Server responds 101 Switching Protocols; the connection becomes full-duplex.
- Ping/pong. Keep-alive and liveness; close if no pong within the timeout.
- Backpressure. TCP provides backpressure, but application must handle queue buildup (close or drop).
- Scaling. Sticky routing or a shared pub/sub (Redis) for fanout across instances.
gRPC
- Protobuf. Binary schema, forward/backward compatible if you follow rules (never change field numbers; make new fields optional).
- Unary. Request/response.
- Server streaming. One request, stream of responses.
- Client streaming. Stream of requests, one response.
- Bidi streaming. Both sides stream.
- Interceptors. Cross-cutting concerns (auth, tracing, retries). The analog of HTTP middleware.
- Deadlines. Every call must carry a deadline; propagate through fan-outs.
DNS
- Resolution. Recursive resolver → root → TLD → authoritative NS → answer.
- TTL. Caching duration. Short TTL (30-60s) enables fast failover; long TTL (hours) reduces load.
- Geo-DNS. Returns different IPs per client geography.
- Round-robin DNS. Multiple A records, client picks. Crude load balancing; no health checks. Prefer a real LB.
TLS
- Handshake (TLS 1.3). ClientHello (cipher suites + key share) → ServerHello + cert + Finished → client Finished. One RTT. 0-RTT resumption with a prior session.
- Cert validation. Chain to trusted root; check name, expiry, revocation (OCSP or CRL). SNI lets one IP host many certs.
- mTLS. Both sides present certs. Used for service-to-service trust inside a mesh; replaces or complements token auth.
Load balancers
- L4 (TCP/UDP). Routes by IP+port. Fast, protocol-agnostic. No HTTP awareness, no per-request routing.
- L7 (HTTP). Routes by path, host, header, cookie. Can do TLS termination, header rewrites, WAF. Modern LBs (Envoy, NGINX, HAProxy, ALB) are L7.
Algorithms.
- Round-robin. Simple; ignores load.
- Least connections. Sends to the LB-tracked least-busy backend. Good for long connections.
- Consistent hashing. Same key ā same backend. Cache affinity, sticky sessions without cookies.
- Power of two choices. Pick 2 random backends, send to less-loaded. Nearly optimal, cheap.
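A sketch of power of two choices, assuming the balancer tracks active connections per backend. The names here are illustrative, not from any LB library; the point is that comparing just two random backends keeps load nearly as balanced as scanning all of them:

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicIntegerArray;

public class PowerOfTwoChoices {
    final AtomicIntegerArray activeConns;

    PowerOfTwoChoices(int backends) { activeConns = new AtomicIntegerArray(backends); }

    // Pick two random backends, route to the one with fewer active connections.
    int pick() {
        ThreadLocalRandom r = ThreadLocalRandom.current();
        int a = r.nextInt(activeConns.length());
        int b = r.nextInt(activeConns.length());
        int chosen = activeConns.get(a) <= activeConns.get(b) ? a : b;
        activeConns.incrementAndGet(chosen);
        return chosen;
    }

    void release(int backend) { activeConns.decrementAndGet(backend); }

    public static void main(String[] args) {
        PowerOfTwoChoices lb = new PowerOfTwoChoices(10);
        for (int i = 0; i < 10_000; i++) lb.pick(); // long-lived conns, never released
        int max = 0, min = Integer.MAX_VALUE;
        for (int i = 0; i < 10; i++) {
            max = Math.max(max, lb.activeConns.get(i));
            min = Math.min(min, lb.activeConns.get(i));
        }
        // Spread stays tiny; plain random assignment drifts much further apart.
        System.out.println("spread " + min + ".." + max);
    }
}
```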
Section 6: Distributed systems primitives
At SMTS, the interviewer expects you to name patterns and describe tradeoffs in the same sentence. Practice that pairing.
CAP theorem
Under a network partition, pick Consistency or Availability; you can't have both. Without a partition you can have both; CAP only bites during failure.
- CP. Rejects requests that would violate consistency (e.g., primary unreachable → reads/writes fail). Examples: Spanner, ZooKeeper, HBase.
- AP. Answers, possibly stale. Examples: Cassandra (tunable), Dynamo, DNS.
Salesforce angle. Writes to a Salesforce org are CP (single primary per pod). Read replicas for reports and search can be AP; you'll see slightly stale data briefly after a write.
PACELC
Extends CAP: if Partition, choose A or C; Else, choose Latency or Consistency. Spanner is CP/EC (strong even during normal ops, paying latency). Dynamo is AP/EL (cheap latency, eventual normally). PACELC forces you to reason about the normal case, not just the partition.
Consistency models
Strongest to weakest:
- Strict/Linearizable. Operations appear to happen in real-time order. Single system image.
- Sequential. All nodes see the same order, not necessarily real-time.
- Causal. Preserves happens-before relationships; concurrent writes may reorder.
- Read-your-writes. A client sees its own writes.
- Monotonic-read. Successive reads never go backwards.
- Eventual. Converges if no new writes.
Interview followup. "Session consistency?" Combines read-your-writes + monotonic reads within a session. Achievable with a session token (LSN, GTID) forwarded to the DB.
Consensus
- Paxos. Classic; hard to implement. Rarely used directly; Multi-Paxos for log replication.
- Raft. Understandable alternative. Leader election via randomized timeouts and votes; log replication with majority quorum; commit when majority persists.
- Leader election. Node becomes candidate after election timeout, increments term, requests votes. Wins on majority.
- Log replication. Leader appends entries, replicates to followers, commits when majority ack. Followers overwrite on conflict.
- Safety. Only leaders with up-to-date logs can win elections.
- Used in etcd, Consul, CockroachDB, TiKV.
Logical time
- Lamport timestamp. Counter per node; increment on each local event and send, and on receive set `local = max(local, msg_ts) + 1`. Total order by `(ts, node_id)`. Cannot detect concurrency.
- Vector clock. `[c1, c2, ..., cN]`, one counter per node. Can determine whether A happens-before B, B happens-before A, or they are concurrent.
- Hybrid Logical Clock (HLC). Physical time + logical counter. Close to real time but preserves causality. Used by CockroachDB, YugabyteDB.
When. Distributed DBs and event systems needing causal order without a central sequencer.
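A minimal Lamport clock sketch in Java following the tick/receive rules above (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

public class LamportClock {
    private final AtomicLong time = new AtomicLong();

    // Local event or send: tick and attach the timestamp to the message.
    long tick() { return time.incrementAndGet(); }

    // Receive: jump past the sender's timestamp, then tick.
    long receive(long msgTs) {
        return time.updateAndGet(local -> Math.max(local, msgTs) + 1);
    }

    public static void main(String[] args) {
        LamportClock a = new LamportClock();
        LamportClock b = new LamportClock();
        long sendTs = a.tick();          // A at 1; message stamped 1
        long recvTs = b.receive(sendTs); // B jumps to max(0, 1) + 1 = 2
        System.out.println(sendTs + " " + recvTs); // prints "1 2"
        // recvTs > sendTs, so the receive is ordered after the send,
        // even though B's physical clock was never consulted.
    }
}
```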
Leader election
- ZooKeeper. Ephemeral sequential znodes; smallest sequence becomes leader. Watchers on predecessor for failover.
- etcd. Lease + atomic compare-and-swap on a key.
- Redis Sentinel. Quorum-based election among sentinels; failover the Redis primary.
- Raft-based. Built-in, used in etcd, Consul.
Distributed locks
- Redis Redlock. Lock across N Redis nodes; require a majority. Martin Kleppmann criticized its timing assumptions; safer with fencing tokens (a monotonically increasing token passed to the resource, which rejects stale tokens).
- ZooKeeper-based. Ephemeral sequential znode; smallest holds the lock. Automatic release on session expiry. Stronger than Redlock.
- DB-based. `SELECT ... FOR UPDATE` on a sentinel row. Easy; bounded by DB contention.
Interview followup. "Why fencing tokens?" A lock holder can be paused (GC, stop-the-world), lock expires, another acquires, original wakes and writes. Fencing token lets the downstream reject the stale write.
Distributed rate limiting
- Redis + Lua. Atomic multi-key ops. Token bucket: store tokens and last refill; Lua script refills and decrements atomically.
- Token bucket. Smooth rate, allows bursts up to bucket size.
- Sliding window log. Store request timestamps; count last N seconds. Precise, memory heavy.
- Sliding window counter. Two counters (current/previous window); interpolate. Good balance.
- Leaky bucket. Constant drain rate; queued requests. Smooths bursts.
Interview followup. "Per-user or global?" Usually both: per-tenant quota + global safety. Salesforce governor limits are per-tenant and per-transaction.
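A single-node token bucket sketch with time injected for determinism. In the Redis + Lua variant described above, this same refill-and-decrement logic lives inside the Lua script so it executes atomically; the class here is a hypothetical illustration:

```java
public class TokenBucket {
    private final double capacity, refillPerSec;
    private double tokens;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSec, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSec = refillPerSec;
        this.tokens = capacity; // start full: allows an initial burst
        this.lastNanos = nowNanos;
    }

    // Refill based on elapsed time, then try to take one token.
    synchronized boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastNanos) / 1e9;
        tokens = Math.min(capacity, tokens + elapsedSec * refillPerSec);
        lastNanos = nowNanos;
        if (tokens >= 1.0) { tokens -= 1.0; return true; }
        return false;
    }

    public static void main(String[] args) {
        // 5-token burst, 1 token/sec refill; time passed in, not read from the clock.
        TokenBucket tb = new TokenBucket(5, 1, 0);
        int allowed = 0;
        for (int i = 0; i < 10; i++) if (tb.tryAcquire(0)) allowed++;
        System.out.println(allowed);                       // prints 5 (burst only)
        System.out.println(tb.tryAcquire(2_000_000_000L)); // prints true (2s of refill)
    }
}
```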
Circuit breaker
States: Closed (normal), Open (fail fast after failure threshold), Half-open (after cooldown, allow a trial request).
Parameters: failure rate threshold, minimum requests, open duration, half-open trial count. Libraries: Resilience4j (Java), Hystrix (legacy), Polly (.NET).
When. Calling a dependency that may go down; protect yourself from slow failures blowing your thread pool.
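A minimal sketch of the three-state machine. It uses a consecutive-failure threshold for simplicity; production libraries like Resilience4j use sliding-window failure rates with a minimum request count. Time is injected so the transitions are testable:

```java
public class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMs;
    private final int failureThreshold;
    private final long openDurationMs;

    CircuitBreaker(int failureThreshold, long openDurationMs) {
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
    }

    // Gate a call before making it.
    synchronized boolean allowRequest(long nowMs) {
        if (state == State.OPEN) {
            if (nowMs - openedAtMs >= openDurationMs) {
                state = State.HALF_OPEN; // cooldown elapsed: allow one trial
                return true;
            }
            return false; // fail fast, protect the thread pool
        }
        return true;
    }

    synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void onFailure(long nowMs) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN; // trip (or re-trip after a failed trial)
            openedAtMs = nowMs;
            consecutiveFailures = 0;
        }
    }

    public static void main(String[] args) {
        CircuitBreaker cb = new CircuitBreaker(3, 1000);
        for (int i = 0; i < 3; i++) cb.onFailure(0); // trip at 3 failures
        System.out.println(cb.allowRequest(10));     // prints false (open)
        System.out.println(cb.allowRequest(1500));   // prints true (half-open trial)
    }
}
```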
Bulkhead pattern
Isolate resources so one failure doesn't drown the system. Per-tenant or per-dependency thread pools, connection pools, or queue quotas. If tenant A's provider is slow, tenant B's requests still flow.
Salesforce angle. Governor limits are effectively a bulkhead: CPU time, heap, and query counts are capped per transaction per tenant. One customer's bad loop can't starve another.
Retries
- Exponential backoff. `delay = base * 2^attempt`, capped.
- Jitter. Add randomness; full jitter (`rand(0, delay)`) is often best.
- Retry budget. Don't retry forever; cap retries as a fraction of normal traffic.
- Idempotency required. Every retry assumes an idempotent target; otherwise you double-charge.
Section 7: Observability
"If it's not measured, it's broken in production." Expect to be asked how you'd debug a latency spike.
Logs, metrics, traces
- Logs. Discrete events with context. Expensive per-event; index selectively. Structured (JSON) so you can query.
- Metrics. Aggregates over time; cheap and queryable. Counters (always increase), gauges (point-in-time), histograms (distribution), summaries (client-side percentiles).
- Traces. A single request's journey across services. Invaluable for latency attribution.
Use logs for cause, metrics for trend, traces for flow.
Structured logging
Emit JSON with stable fields: `ts`, `level`, `service`, `tenantId`, `traceId`, `userId`, `message`. Makes grep trivial and feeds log analytics.
Correlation ID. A request ID propagated through all downstream calls (in HTTP headers, gRPC metadata, Kafka headers). Stitches logs across services.
Distributed tracing
- W3C Trace Context. `traceparent: 00-{trace-id}-{span-id}-{flags}`. Standardized header; OpenTelemetry emits it by default.
- Spans. One per operation; parent/child links form the tree. Attributes, events, status.
- Sampling. Head-based (decide at entry) or tail-based (sample slow/error traces). Full sampling on errors, low sampling on the happy path.
Prometheus + Grafana
- Counter. Monotonic; rate with `rate()`.
- Gauge. Goes up and down; memory, queue depth.
- Histogram. Bucketed counts; compute percentiles with `histogram_quantile`.
- Summary. Client-side quantiles; cheaper to read but can't be aggregated across instances.
Golden signals (Google SRE). Latency, traffic, errors, saturation.
SLI / SLO / SLA
- SLI (Indicator). What you measure, e.g., "fraction of HTTP 2xx responses."
- SLO (Objective). Internal target: "99.9% of requests succeed over 30 days."
- SLA (Agreement). External contract with consequences: "99.9% or we refund."
Error budget. 1 - SLO. If SLO is 99.9%, you can be down 43.2 min/month. Spend the budget on velocity (risky deploys) when healthy; freeze when exhausted.
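The error-budget arithmetic, worked in Java (`budgetMinutes` is a hypothetical helper, not a library call):

```java
public class ErrorBudget {
    // Allowed downtime = (1 - SLO) * window.
    static double budgetMinutes(double slo, double windowDays) {
        return (1.0 - slo) * windowDays * 24 * 60;
    }

    public static void main(String[] args) {
        // 99.9% over 30 days: 0.001 * 43,200 min = 43.2 min, matching the text.
        System.out.println(budgetMinutes(0.999, 30));
        // Each extra nine divides the budget by 10: 99.99% leaves ~4.3 min/month,
        // so detection and rollback must be correspondingly faster.
        System.out.println(budgetMinutes(0.9999, 30));
    }
}
```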
Section 8: Security
SaaS is security-critical. Salesforce enforces tenant isolation at every layer; expect probing questions.
AuthN vs AuthZ
- Authentication. Who are you? Credentials → identity.
- Authorization. What can you do? Identity → permissions.
Separate layers; blur them at your peril.
JWT
Structure: `base64url(header).base64url(payload).base64url(signature)`. Claims: `iss`, `sub`, `aud`, `exp`, `iat`, `nbf`, `jti`, plus custom.
- HS256. Symmetric HMAC with shared secret. Simpler, fine for same-service.
- RS256 / ES256. Asymmetric. Public key verifies; private key signs. Required when clients should verify without holding signing key.
Revocation. JWT is stateless, which makes revocation hard. Options:
- Short-lived access tokens (5-15 min) + refresh tokens.
- `jti` denylist with TTL = token lifetime.
- Token introspection endpoint (stateful; negates the statelessness win).
Interview followup. "Access vs refresh tokens?" Access: short-lived, sent on every request. Refresh: long-lived, sent only to auth server to mint a new access token. Refresh rotation (one-time-use) catches theft.
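A minimal HS256 sign/verify sketch using only the JDK (`javax.crypto.Mac`). This is to show the mechanics; real services should use a vetted JWT library (jjwt, Nimbus) and a constant-time signature comparison:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Hs256Jwt {
    static String b64url(byte[] b) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(b);
    }

    // token = b64url(header) . b64url(payload) . b64url(HMAC-SHA256(header.payload))
    static String sign(String payloadJson, byte[] secret) throws Exception {
        String header = b64url("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = b64url(payloadJson.getBytes(StandardCharsets.UTF_8));
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] sig = mac.doFinal((header + "." + payload).getBytes(StandardCharsets.UTF_8));
        return header + "." + payload + "." + b64url(sig);
    }

    static boolean verify(String token, byte[] secret) throws Exception {
        int lastDot = token.lastIndexOf('.');
        String signingInput = token.substring(0, lastDot);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        String expected = b64url(mac.doFinal(signingInput.getBytes(StandardCharsets.UTF_8)));
        // Production code must compare signatures in constant time.
        return expected.equals(token.substring(lastDot + 1));
    }

    public static void main(String[] args) throws Exception {
        byte[] secret = "shared-secret".getBytes(StandardCharsets.UTF_8);
        String token = sign("{\"sub\":\"user-1\",\"exp\":1700000000}", secret);
        System.out.println(verify(token, secret));              // prints true
        System.out.println(verify(token, "wrong".getBytes(StandardCharsets.UTF_8))); // prints false
    }
}
```

Note this only checks the signature; a real verifier must also validate `exp`, `aud`, and `iss` claims after decoding.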
OAuth 2.0 and OIDC
- OAuth 2.0. Delegated authorization.
- OIDC. Identity layer on top; adds the `id_token` (a JWT with user info).
Flows:
- Authorization code + PKCE. For public clients (SPAs, mobile). PKCE prevents code interception by tying the code to a verifier only the caller knows.
- Client credentials. Machine-to-machine.
- Device code. TVs, CLI.
- Avoid Implicit (deprecated) and Resource owner password (don't collect passwords).
Tokens. `id_token` (identity, JWT), `access_token` (authz, opaque or JWT), `refresh_token` (mints new access tokens).
mTLS
Both sides present X.509 certs. Used for service mesh (Istio, Linkerd, Consul Connect) ā every hop authenticated by cert. Rotate certs frequently; SPIFFE/SPIRE automates.
Secrets management
- HashiCorp Vault. Central secret store; dynamic secrets (short-lived DB creds), transit encryption, PKI.
- AWS KMS / Secrets Manager. Managed; tight IAM integration; automatic rotation.
- Rotation. Automated; services fetch on startup or subscribe to updates. Avoid env vars for long-lived secrets.
RBAC vs ABAC
- RBAC. Role → permissions; user → roles. Simple; scales to hundreds of roles.
- ABAC. Policy evaluates attributes: subject (role, department), resource (owner, classification), action, environment (time, IP). Expressive; harder to audit. Policies in Rego (OPA) or XACML.
Salesforce angle. Profiles and permission sets are RBAC. Record-level security (owner, role hierarchy, sharing rules) is ABAC-ish: it's policy over row attributes.
Row-level security
Every read/write must carry `tenantId`. Enforcement options:
- App layer. The repository always injects `WHERE tenant_id = :t`. Easy to bypass if someone writes raw SQL.
- DB RLS. Postgres policies enforce at the DB; app bugs can't bypass them. The connection runs `SET app.current_tenant = :t`; the policy filters on it.
- Per-tenant schemas/DBs. Physical isolation.
Interview followup. "How do you prevent a missing tenant_id predicate from leaking data?" DB-level RLS + lint rules that reject raw SQL outside the repository layer + pen tests with a cross-tenant query.
OWASP Top 10 highlights
- Injection (SQL/NoSQL/OS). Always parameterize. Never string-concatenate query fragments from input. ORM protects if used correctly.
- XSS. Escape output by context (HTML, attribute, JS, URL). Content Security Policy as defense-in-depth.
- CSRF. State-changing requests need anti-CSRF token or SameSite cookies. JSON APIs with no cookies are less vulnerable but check origin.
- SSRF. The server fetches a URL the attacker controls; block the metadata endpoint (169.254.169.254) and private IP ranges. Allowlist destinations.
- Insecure deserialization. Don't deserialize untrusted data. Whitelist types.
- Secrets in logs. Never. Redact cards, tokens, passwords at the logger.
Section 9: Scaling patterns
Horizontal vs vertical
- Vertical (scale-up). Bigger box. Simple, limited by hardware, single failure domain.
- Horizontal (scale-out). More boxes. Requires statelessness or a coordination layer. Linear scaling if done right.
Default horizontal for stateless services; vertical for single-node DBs until you must shard.
Auto-scaling
- Reactive. Scale on CPU, request rate, queue depth. Lags spikes; overshoots.
- Predictive. ML on historical load; scale ahead of known patterns (9am Monday).
- Scheduled. Scale up on a schedule for known events (Black Friday).
Cooldowns to prevent flapping; scale up fast, scale down slow.
CDN
Edge caching at PoPs close to users. Caches by URL + Vary headers. Purge strategies:
- Soft purge. Mark stale; revalidate on next request. Cheap.
- Hard purge. Remove now. Expensive; use for incidents.
- Tag-based purge. Surrogate-Key or Cache-Tag headers; purge all assets with a tag.
Cache key. URL + headers in Vary (e.g., `Vary: Accept-Language`). Beware `Cookie` in Vary: essentially uncacheable.
Database read replicas
- Easy read scaling; offload reports and search.
- Replication lag. ms to minutes. Writes + immediate reads must go to primary.
- Read-your-writes. Session stickiness, LSN tokens, or read from primary for a bounded window after write.
Hot partition detection
Symptom: one partition CPU/IO near 100%, others idle. Fix:
- Re-shard. Rare and expensive.
- Salt hot keys. Add a random prefix to split one logical key into N.
- Dedicated shard. Move the big tenant to its own partition.
- Caching layer. Absorb reads before they hit the hot partition.
Detect via per-partition metrics, latency P99 by partition, query plans with partition pruning.
Section 10: Architectural patterns
Architecture questions escalate in R4/R5. Know these names and their failure modes.
CQRS (Command Query Responsibility Segregation)
Split write model (commands, normalized, transactional) from read model (queries, denormalized, fast).
When. Read/write workloads diverge dramatically; reporting needs shapes the transactional model can't serve cheaply.
Pitfalls. Eventual consistency between command and query sides; operational complexity (two models to maintain); stale reads visible to users (design UX for it).
Salesforce angle. List views and reports query a denormalized projection; writes hit the normalized transactional tables. CDC/materialized views bridge the two.
Event sourcing
State = replay of immutable events. The log is the source of truth; current state is a projection.
Benefits. Full audit, time-travel, rebuild projections, natural fit with CQRS.
Pitfalls. Schema evolution (event versioning is permanent), snapshots to avoid replaying millions of events, harder to query current state directly.
When NOT. Simple CRUD with no audit demands. The complexity tax isn't worth it.
Saga
Long-running workflow as a sequence of local transactions with compensating actions.
- Orchestration. A central saga coordinator invokes each step and compensates on failure. Pros: flow is explicit, easy to debug. Cons: coordinator is central (needs HA).
- Choreography. Services publish events, peers react. Pros: loose coupling. Cons: the flow is implicit and debugging is a nightmare at scale.
Always design compensating transactions carefully: they're rarely the exact inverse (you can't "uncharge" a credit card silently; you issue a refund).
Outbox pattern
Within the DB transaction that changes state, also insert a row into an outbox table. A separate process (polling query or CDC on the outbox table) publishes to the message bus and deletes/marks the row. Guarantees exactly the events that correspond to committed state changes get published ā no dual-write race.
When. Any time a service must publish events on state changes. Standard pattern for microservices with a transactional core.
Strangler fig
Gradually replace a legacy system by routing specific endpoints to the new system behind a facade. Old system shrinks over time and eventually gets strangled (removed).
When. Large legacy migrations where big-bang rewrites are too risky.
Backends for Frontends (BFF)
One backend per client type (web, iOS, Android). Each BFF aggregates downstream services and shapes responses for its client.
When. Clients differ in data needs, screen sizes, latency budgets. Avoids the monolith API that pleases no one.
Pitfalls. Duplicated logic across BFFs; discipline to keep business logic in domain services, not BFFs.
Section 11: Salesforce-specific patterns
These patterns are the ones interviewers use to tell platform-aware SMTSes from generic backend engineers. Even if you haven't written Apex, speak the vocabulary.
Multi-tenancy at every layer
`tenantId` (`org_id`) flows from the request through every layer:
- Request. The auth layer extracts `org_id` from the JWT/session and attaches it to the request context.
- Service. Every domain method takes an `OrgContext` or reads from a `ThreadLocal`. Never a bare `id`.
- DB. Every query filters on `org_id`. Enforced via the repository layer + DB RLS as defense-in-depth.
- Cache. Every key is prefixed with `org_id`. No global keys except true app config.
- Async jobs. The job payload includes `org_id`; executors re-establish `OrgContext` before running.
- Logs and metrics. Every structured log has `org_id`. Metrics are tagged with `org_id` where cardinality permits (or bucketed by tier).
Interview followup. "Show me the code path where `org_id` could leak." A missing predicate on raw JDBC, a shared cache key, a cron job that forgets to restore context. Your lint rules and RLS must assume engineers will make this mistake.
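A sketch of the `ThreadLocal`-based `OrgContext` idea, with a wrapper that re-establishes context for async jobs. The class is a hypothetical illustration (not Salesforce code); the key choices are failing closed when context is missing and clearing it so pooled threads can't leak one tenant's context into the next request:

```java
public class OrgContext {
    private static final ThreadLocal<String> CURRENT_ORG = new ThreadLocal<>();

    static void set(String orgId) { CURRENT_ORG.set(orgId); }

    static String require() {
        String orgId = CURRENT_ORG.get();
        // Fail closed: code that forgot to establish context blows up
        // instead of silently querying without a tenant filter.
        if (orgId == null) throw new IllegalStateException("no org context");
        return orgId;
    }

    static void clear() { CURRENT_ORG.remove(); } // pooled threads must not leak context

    // Async-job wrapper: re-establish context from the job payload.
    static Runnable scoped(String orgId, Runnable job) {
        return () -> {
            set(orgId);
            try { job.run(); } finally { clear(); }
        };
    }

    public static void main(String[] args) {
        Runnable job = scoped("00Dxx0000001", () ->
                System.out.println("running for " + require()));
        job.run(); // prints "running for 00Dxx0000001"
        try {
            require();
        } catch (IllegalStateException e) {
            System.out.println("context cleared"); // context gone after the job
        }
    }
}
```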
Governor limits philosophy
Salesforce caps per-transaction resources (CPU ms, heap, SOQL queries, DML rows, callouts). The philosophy: fairness over raw performance. A runaway transaction should be killed before it noisy-neighbors the pod.
In your own service design: budget per request (timeouts on every downstream, max rows processed per call, max memory). Surface the limit clearly (429 with Retry-After; error telling the caller what limit they hit).
Async job patterns
Apex patterns and Java analogs:
| Apex | Java analog | Use |
|---|---|---|
| @future | CompletableFuture.runAsync / ExecutorService | Fire-and-forget async |
| Batch Apex | Chunked ExecutorService loop, or Spring Batch | Process millions of records in chunks of 200 |
| Queueable | BlockingQueue + worker pool | Enqueue work with chained follow-ups |
| Scheduled Apex | Quartz, ScheduledExecutorService | Cron-style |
| Platform Events | Kafka topic + consumer group | Pub/sub across services and triggers |
Interview followup. "Design a Salesforce-style batch job in pure Java." ExecutorService with a bounded queue, chunk iteration of 200 IDs, per-chunk transaction, progress persisted to a job_state table, idempotent on retry. Governor-limit analogs: per-chunk CPU budget, per-job row cap, per-tenant quota.
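A minimal, single-threaded sketch of that design: chunks of 200, progress persisted after each chunk so a retry resumes instead of reprocessing from scratch. The in-memory `jobState` map is a stand-in for the `job_state` table; a real version adds a bounded worker pool, a per-chunk transaction, and idempotency keys per (job, chunk):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class BatchJob {
    static final int CHUNK_SIZE = 200;
    // Stand-in for a job_state table: jobId -> index of the next chunk to run.
    static final Map<String, Integer> jobState = new ConcurrentHashMap<>();

    interface ChunkProcessor { void process(List<Integer> ids); }

    // Resumable: on retry, restart from the last persisted chunk index.
    static void run(String jobId, List<Integer> ids, ChunkProcessor processor) {
        int chunk = jobState.getOrDefault(jobId, 0);
        while (chunk * CHUNK_SIZE < ids.size()) {
            int from = chunk * CHUNK_SIZE;
            int to = Math.min(from + CHUNK_SIZE, ids.size());
            processor.process(ids.subList(from, to)); // one transaction per chunk
            jobState.put(jobId, ++chunk);             // persist progress after commit
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 1_050; i++) ids.add(i);
        int[] chunks = {0};
        run("sync-job-1", ids, c -> chunks[0]++);
        System.out.println(chunks[0]); // prints 6 (5 full chunks of 200 + one of 50)
    }
}
```

Governor-limit analogs bolt on here: a CPU budget check inside the chunk loop, a per-job row cap before the loop, and a per-tenant quota before the job is enqueued.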
Bulkification
Process records in batches. The cardinal sin is a per-record network/DB call inside a loop.
- Batch size: 200 is the Salesforce default; pick yours based on payload size and downstream limits.
- Fail atomically per chunk, not per record, if you can (easier retry).
- For idempotency, include batch ID + record ID in the idempotency key.
Example. 50k records to sync to an external system. Bad: 50k HTTP calls. Good: chunks of 200, one bulk API call per chunk, DLQ for chunks that fail after retries.
Platform Events analog
Event bus semantics:
- At-least-once delivery.
- Per-org ordering.
- Retention (Salesforce: hours to days depending on type).
- Subscribers replay from an offset.
In your own Java service: Kafka topic per event type, key by org_id for ordering, Outbox pattern for publishing, idempotent consumers keyed by event ID.
Final checklist for the interview
Before each system-design or fundamentals follow-up, mentally walk this list:
- Tenant isolation. Where does `org_id` go? Where could it leak?
- Consistency stance. CP or AP? Strong or eventual? Why is that acceptable?
- Concurrency model. Threads, actors, async? What's the contention hotspot?
- Failure modes. What happens when this dependency is slow, down, or partitioned? Circuit breaker? DLQ? Retry budget?
- Idempotency. What's the idempotency key? How long do you remember it?
- Observability. What do you log, meter, and trace? What does the alarm look like?
- Scale story. How does it behave at 1x, 100x, 10000x tenants?
- Security. AuthN, AuthZ, row-level security. Where's the blast radius?
- Bulkification. Am I doing one-at-a-time where I should batch?
- Governor-style limits. What protects other tenants from this tenant's worst day?
If you can hit these ten in every answer, you'll sound like an SMTS Salesforce actually hires.