HLD: Multi-Tenant CRM Dashboard
Understanding the Problem
What is a Multi-Tenant CRM Dashboard?
A multi-tenant CRM dashboard is a platform where each customer org sees its own leads, opportunities, accounts, and custom objects through configurable widgets (counters, charts, tables) with role-based access. Think of a simplified Salesforce Lightning Dashboard platform. The core design challenge is that every org defines its own schema (custom objects and fields) without a platform deploy, while you still deliver sub-second dashboard loads with strict data isolation. If you cannot design this, you cannot design Salesforce Core, so this problem tests your grasp of Salesforce's actual architecture more than any other.
Functional Requirements
Core (above the line):
- Record CRUD: users in an org can view, create, update, and query their records (Accounts, Opportunities, Leads, custom objects).
- Per-org schema customization: admins define custom objects (e.g., `Project__c`) and custom fields on standard objects (`Account.Industry__c`) without any deploy.
- Dashboards with widgets: composed of counters, charts, tables. Widgets are configurable by end users at render time.
- Role-based access: users see only records they are permitted to via roles, sharing rules, and permission sets.
- Aggregations at scale: counters, top-N, trends over potentially millions of records per org, returned in sub-second time.
Below the line (out of scope):
- Mobile-responsive rendering (UI concern, separate project).
- Embedded AI predictions (Einstein, separate ML service).
- Complex cross-org analytics (Data Cloud handles this).
- Custom code execution (Apex / Flow triggers are a separate runtime).
Non-Functional Requirements
Core:
- Scale: 100k orgs, average 200 users per org, median 5M records per org, max org 1B records. 50k concurrent dashboard viewers.
- Latency: dashboard load p95 < 1.5 s, widget refresh p95 < 500 ms.
- Consistency: eventual for aggregations (5-30 s staleness acceptable). Strong read-your-writes for record CRUD (a user who saves a record and re-reads it must see their change).
- Multi-tenancy: absolute data isolation. No cross-tenant leakage ever. Per-org schema customization without physical tables per org.
Below the line:
- Global cross-org queries (operator-only; out of scope).
- Real-time streaming widgets (polling at 30s is fine).
Capacity Estimation
- Writes: 10k records/s across the fleet at peak.
- Reads: 100k QPS for dashboards and record lookups.
- Storage: 100k orgs × 5M records × 2 KB = 1 PB raw. With 3x replication and indexes, ~4-5 PB total.
- Dashboard fan-out: 6 widgets × 50k concurrent viewers = 300k queries/s at peak, many of them aggregations over millions of rows. Impossible to serve from OLTP directly; we must pre-compute.
- Metadata storage: each org has ~50 standard + 100 custom objects with ~20 fields each. Per 1k-org shard, that is 150k object definitions + 3M field definitions, which fits in the shard's RAM.
The Set Up
Core Entities
- Organization: `orgId`, tier, region, feature flags.
- ObjectDefinition: keyed by `(orgId, objectApiName)`; list of fields, relationships.
- FieldDefinition: keyed by `(orgId, objectApiName, fieldApiName)`; type (string/number/date/reference/picklist), required, defaultValue.
- Record: keyed by `(orgId, objectApiName, recordId)`; key-value map of `fieldValues`, `ownerId`, timestamps.
- User, Role, SharingRule, Permission: the RBAC graph.
- Dashboard, Widget: `(orgId, dashboardId)` with a list of widgets; each widget has a layout and a query spec.
The API
REST with tenant-scoped URLs, plus a SOQL-style query endpoint (mimicking Salesforce).
Record operations:

```
POST  /v1/orgs/{orgId}/sobjects/{objectApiName}
GET   /v1/orgs/{orgId}/sobjects/{objectApiName}/{id}
PATCH /v1/orgs/{orgId}/sobjects/{objectApiName}/{id}
POST  /v1/orgs/{orgId}/query
  Body: { "soql": "SELECT Id, Name, Industry__c FROM Account WHERE CreatedDate > LAST_WEEK" }
```

Metadata operations (admin only):

```
POST /v1/orgs/{orgId}/metadata/objects
POST /v1/orgs/{orgId}/metadata/fields
```

Dashboards and widgets:

```
GET  /v1/orgs/{orgId}/dashboards/{dashId}
POST /v1/orgs/{orgId}/dashboards/{dashId}/widgets
GET  /v1/orgs/{orgId}/widgets/{widgetId}/data   (returns materialized result)
```

Return a 202 + job token when a query exceeds the p95 budget and must run async (a sketch follows).
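To make the async fallback concrete, here is a minimal sketch of the sync-vs-async decision. The `Planner`, `CostModel`, `Jobs`, and `Executor` collaborators are assumptions for illustration, not part of the API above:

```java
import java.util.Map;

/** Sketch of the sync-vs-async decision; all collaborator types are assumed. */
final class QueryEndpoint {
    interface Planner { Plan plan(long orgId, String soql); }
    interface CostModel { long estimatedMillis(Plan p); }
    interface Jobs { String submit(long orgId, Plan p); }
    interface Executor { Object execute(Plan p); }
    record Plan(long orgId, String physicalSql) {}
    record Response(int status, Object body) {}

    private static final long P95_BUDGET_MS = 500; // widget-refresh p95 budget from the NFRs

    private final Planner planner;
    private final CostModel cost;
    private final Jobs jobs;
    private final Executor executor;

    QueryEndpoint(Planner p, CostModel c, Jobs j, Executor e) {
        planner = p; cost = c; jobs = j; executor = e;
    }

    Response query(long orgId, String soql) {
        Plan plan = planner.plan(orgId, soql);
        if (cost.estimatedMillis(plan) > P95_BUDGET_MS) {
            // Over budget: enqueue the plan and hand back a job token with HTTP 202.
            return new Response(202, Map.of("jobToken", jobs.submit(orgId, plan)));
        }
        return new Response(200, executor.execute(plan)); // within budget: run synchronously
    }
}
```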
High-Level Design
Architecture

```
┌────────┐   ┌──────────┐   ┌───────────────────┐   ┌─────────────────┐
│ Client │──▶│ Gateway  │──▶│  Query Planner    │──▶│ Metadata Cache  │
└────────┘   │ (authz,  │   │  (parse SOQL →    │   │ (objects,       │
             │  RLS     │   │   physical plan)  │   │  fields,        │
             │  ctx)    │   └─────────┬─────────┘   │  shares)        │
             └──────────┘             │             └─────────────────┘
                                      ▼
                       ┌──────────────────────────┐
                       │      Storage Router      │
                       │  (org → shard → engine)  │
                       └──────┬────────────┬──────┘
                              ▼            ▼
                      ┌─────────────┐ ┌───────────────┐
                      │ OLTP Shard  │ │  OLAP Engine  │
                      │ (Postgres   │ │  (ClickHouse  │
                      │  per 1k     │ │   or Druid    │
                      │  orgs)      │ │   per region) │
                      └─────────────┘ └───────────────┘

Widget Cache ◀── materialized aggregates per (org, widget)
CDC stream   ──▶ OLAP / Search Index / Cache invalidation
```

End-to-end flow: a user opens a dashboard
- User navigates to `/dashboards/{dashId}`. Client fetches the dashboard definition (static JSON with a list of widget IDs).
- For each widget, the client issues `GET /widgets/{widgetId}/data`.
- Widget Cache (Redis) holds pre-computed results keyed by `(orgId, widgetId, filterHash)`. Cache hit → return in < 30 ms (see the read-through sketch after this list).
- On cache miss, the Query Planner parses the widget's SOQL-equivalent into a physical plan, consulting the Metadata Cache (in-memory LRU keyed by `(orgId, metadataVersion)`) to translate `Industry__c` into its physical column in the universal data model.
- Storage Router decides OLTP vs OLAP:
  - Aggregations (COUNT, SUM, GROUP BY) → OLAP engine (ClickHouse/Druid).
  - Single-record lookups or filtered lists → OLTP (the Postgres shard assigned to this org).
- Query executes with `WHERE org_id = ?` injected automatically. Sharing rules from the `record_access` table are joined in.
- Result returned to the client; cached in the Widget Cache with 30 s TTL + jitter.
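A minimal read-through sketch of the Widget Cache step. The `Cache` and `QueryService` interfaces are illustrative; the key layout and TTL-plus-jitter follow the flow above:

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

/** Read-through widget cache sketch; Cache and QueryService are assumed interfaces. */
final class WidgetDataService {
    interface Cache {
        Optional<String> get(String key);
        void put(String key, String value, Duration ttl);
    }
    interface QueryService {
        String runWidgetQuery(long orgId, String widgetId, String filterHash);
    }

    private final Cache cache;
    private final QueryService queries;

    WidgetDataService(Cache cache, QueryService queries) {
        this.cache = cache;
        this.queries = queries;
    }

    String widgetData(long orgId, String widgetId, String filterHash) {
        String key = orgId + ":" + widgetId + ":" + filterHash; // (orgId, widgetId, filterHash)
        return cache.get(key).orElseGet(() -> {
            String result = queries.runWidgetQuery(orgId, widgetId, filterHash);
            // 30 s TTL plus jitter so a popular dashboard's keys don't all expire together.
            long jitterMs = ThreadLocalRandom.current().nextLong(5_000);
            cache.put(key, result, Duration.ofSeconds(30).plusMillis(jitterMs));
            return result;
        });
    }
}
```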
End-to-end flow: a user updates a record
- Client sends `PATCH /sobjects/Account/{id}`.
- Gateway validates the JWT, extracts `orgId` and `userId`.
- Query Planner checks metadata: does this object exist in this org? Are the fields valid?
- Permission check: does this user have edit permission on this record (via `record_access`)?
- Update is written to the universal data model in Postgres: `UPDATE records SET fields = jsonb_set(fields, '{industry__c}', '"SaaS"') WHERE org_id = ? AND record_id = ?` (sketched below).
- Postgres WAL → Debezium → Kafka topic `cdc.records`.
- Downstream consumers: OLAP loader (ClickHouse), search indexer (Elasticsearch), widget-cache invalidator (publishes to Redis to purge affected widget keys).
- Response returns to the client in < 200 ms. Widget caches refresh within ~5 s.
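A sketch of that write, assuming Spring's `JdbcTemplate`; the table shape is the `records` table from the data-model section below, while `RecordWriter` itself and its validation contract are illustrative:

```java
import java.util.UUID;
import org.springframework.jdbc.core.JdbcTemplate;

/** Sketch of the CRUD write path. Field name and value are assumed to be already
 *  validated against the metadata catalog by the planner. */
final class RecordWriter {
    private final JdbcTemplate jdbc;

    RecordWriter(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    /** Sets one validated field on a record, always scoped by org_id. */
    void updateField(long orgId, int objectId, UUID recordId,
                     String fieldApiName, String jsonValue) {
        // jsonValue must be a JSON literal (e.g. "\"SaaS\"" for a string); metadata
        // validation guarantees this upstream. The org_id predicate is the
        // tenant-isolation invariant and is injected on every statement.
        jdbc.update("""
            UPDATE records
               SET fields = jsonb_set(fields, ARRAY[?::text], ?::jsonb),
                   updated_at = now()
             WHERE org_id = ? AND object_id = ? AND record_id = ?
            """, fieldApiName, jsonValue, orgId, objectId, recordId);
    }
}
```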
Data model: the core decision
There are three ways to model custom fields. This is the most-discussed design decision in a Salesforce HLD:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| EAV (Entity-Attribute-Value) | Rows like (record_id, field_id, value). | Unlimited fields, no DDL. | Joins expensive; bad for aggregations. |
| Physical DDL per custom field | ALTER TABLE account ADD COLUMN custom_x VARCHAR. | Best query performance. | DDL storms, lock contention, schema explosion (millions of columns). |
| Universal data model with JSONB | Fixed records table; fields stored in a JSONB column; metadata catalog maps fieldApiName β physical slot. | Salesforce's actual approach. No DDL. Indexable via GIN or expression indexes. | Complexity in query translation. |
Pick the universal data model. It matches Salesforce reality and scales. Schema sketch in Postgres:
```sql
CREATE TABLE records (
    org_id     BIGINT NOT NULL,
    object_id  INT NOT NULL,        -- FK to object_definitions
    record_id  UUID NOT NULL,
    fields     JSONB NOT NULL,      -- { "name": "Acme", "industry__c": "SaaS" }
    owner_id   UUID,
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ,
    PRIMARY KEY (org_id, object_id, record_id)
) PARTITION BY HASH (org_id);

CREATE INDEX idx_records_org_obj_updated
    ON records (org_id, object_id, updated_at DESC);

-- Per-field indexes get created on demand for hot fields:
CREATE INDEX idx_records_industry
    ON records (org_id, object_id, (fields->>'industry__c'))
    WHERE object_id = 17;
```

Multi-Tenancy Strategy
This is the problem where Salesforce interviewers grade multi-tenancy hardest. Nail it.
Isolation level: L1 (shared DB + shared schema), with `org_id` as the first column of every PK and every index. We choose L1 because running a per-org cluster for 100k orgs is infeasible on cost. Mega-tenants (> 50M records) get promoted to a dedicated shard (an L2-ish hybrid).
Tenant context flow:
- `orgId` is extracted from the JWT at the gateway and stamped into a `TenantContext` object (see the sketch after this list).
- Every request carries it via thread-local / async context, propagated to:
  - RPC calls (`X-Org-Id` header).
  - Kafka message keys for CDC ordering.
  - Log MDC for grep by org.
  - Prometheus labels for per-org dashboards.
- No raw DB access. All DB calls go through a Tenant-Aware DAO that injects `WHERE org_id = :ctx.orgId`. Reviewers look for this.
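A minimal sketch of `TenantContext` propagation: a thread-local carrier plus an RPC client decorator. The field set and the `X-Org-Id` header come from the list above; everything else is illustrative:

```java
import java.util.List;
import java.util.Map;

/** Minimal TenantContext sketch: thread-local storage plus header propagation. */
record TenantContext(long orgId, String userId, List<String> principals) {

    private static final ThreadLocal<TenantContext> CURRENT = new ThreadLocal<>();

    static void set(TenantContext ctx) { CURRENT.set(ctx); }

    static TenantContext get() {
        TenantContext ctx = CURRENT.get();
        if (ctx == null) throw new IllegalStateException("no tenant context bound");
        return ctx;
    }

    static void clear() { CURRENT.remove(); } // always clear in a finally block
}

/** Example RPC client decorator that stamps X-Org-Id on every outbound call. */
final class TenantAwareRpcClient {
    interface Rpc { String call(String url, Map<String, String> headers, String body); }

    private final Rpc delegate;

    TenantAwareRpcClient(Rpc delegate) { this.delegate = delegate; }

    String call(String url, String body) {
        return delegate.call(url,
                Map.of("X-Org-Id", Long.toString(TenantContext.get().orgId())),
                body);
    }
}
```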
Row-level security (RLS):
- Every query is wrapped by a server-side context that appends `WHERE org_id = :ctx.orgId AND record_id IN (SELECT record_id FROM record_access WHERE principal_id IN :ctx.principals)`.
- The `record_access` materialized table (described in the deep dives) pre-computes sharing.
- Enforced by the query planner, not the caller. Even a buggy app cannot bypass it.
Routing layer:
- `orgId → shard` lookup via a directory service (consistent hash with a rebalancer). Each shard holds ~1k orgs by default.
- Mega-tenants (Fortune 100 orgs with hundreds of millions of records) get a dedicated shard; their data lives alone, preserving noisy-neighbor isolation at the DB level.
- Directory is cached in-process with 1-min TTL + CDC invalidation (see the lookup sketch after this list).
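A sketch of the routing lookup with the in-process TTL cache. `DirectoryService` is an assumed remote API; the 1-minute TTL and the CDC-driven `evict` follow the bullets above:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Org → shard router sketch with an in-process 1-minute TTL cache. */
final class ShardRouter {
    interface DirectoryService { String shardFor(long orgId); }

    private record Entry(String shardId, Instant fetchedAt) {}

    private static final Duration TTL = Duration.ofMinutes(1);

    private final DirectoryService directory;
    private final Map<Long, Entry> cache = new ConcurrentHashMap<>();

    ShardRouter(DirectoryService directory) { this.directory = directory; }

    String shardFor(long orgId) {
        Entry e = cache.get(orgId);
        if (e == null || e.fetchedAt().plus(TTL).isBefore(Instant.now())) {
            e = new Entry(directory.shardFor(orgId), Instant.now()); // refresh from directory
            cache.put(orgId, e);
        }
        return e.shardId();
    }

    /** Called by the CDC-driven invalidator when an org is rebalanced or promoted. */
    void evict(long orgId) { cache.remove(orgId); }
}
```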
Noisy-neighbor mitigations:
- Per-org query time budget: small queries 50 ms, large queries 500 ms, enforced at the planner level. Queries exceeding the budget get terminated and converted to async jobs, with a job token returned to the client.
- Per-org CPU / IOPS quotas at the Postgres level via cgroups on the DB pods.
- Shuffle sharding at the dashboard cache layer: each org maps to `k=8` of `n=100` cache pods (see the sketch after this list).
- Query concurrency cap per org: max 10 concurrent dashboard queries per org; beyond that, 429.
- Bulkhead thread pools per org tier in the query service; Enterprise gets wider pools than Essentials.
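A sketch of the shuffle-sharding assignment, deterministic per org. The `k=8` of `n=100` numbers follow the bullet above; the hashing scheme is an illustrative choice:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Shuffle-sharding sketch: deterministically map each org to k of n cache pods,
 *  so two orgs rarely share their full pod set. */
final class ShuffleShard {
    static List<Integer> podsFor(long orgId, int k, int n) {
        List<Integer> pods = new ArrayList<>(n);
        for (int i = 0; i < n; i++) pods.add(i);
        // Deterministic per-org shuffle: order pods by hash(orgId, podId), take the first k.
        pods.sort((a, b) -> Long.compare(hash(orgId, a), hash(orgId, b)));
        return pods.subList(0, k);
    }

    private static long hash(long orgId, int pod) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] d = md.digest((orgId + ":" + pod).getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff); // first 8 bytes as long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

`ShuffleShard.podsFor(42L, 8, 100)` always returns the same 8 pods for org 42, and the probability that two orgs collide on all 8 pods is vanishingly small, which is the point of shuffle sharding.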
Per-tenant observability (a metrics sketch follows this list):
- Dashboards per org: query count, p95 latency, rows scanned, CPU time, storage.
- Alerts when an org approaches its query-time budget consistently (indicates bad schema / missing indexes).
- Metadata change audit per org.
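If these are exposed as Prometheus metrics, a per-org query timer might look like the following sketch, assuming Micrometer; the metric and tag names are illustrative, and per-org labels are high-cardinality, so budget for that:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.Callable;

/** Per-org query latency metrics sketch using Micrometer. */
final class OrgQueryMetrics {
    private final MeterRegistry registry;

    OrgQueryMetrics(MeterRegistry registry) { this.registry = registry; }

    <T> T timed(long orgId, String objectApi, Callable<T> query) throws Exception {
        Timer timer = Timer.builder("crm.query.latency")
                .tag("org", Long.toString(orgId)) // high-cardinality at 100k orgs: plan capacity
                .tag("object", objectApi)
                .publishPercentiles(0.95)         // p95 per the NFR budget
                .register(registry);              // registry de-duplicates by meter id
        return timer.recordCallable(query);
    }
}
```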
Potential Deep Dives
1) How do we support per-org schema flexibility (custom objects and fields)?
Bad Solution: Per-tenant DDL.
- Approach: Every new custom field triggers `ALTER TABLE`.
- Challenges: Lock storms on shared tables. `pg_class` bloat from millions of columns across 100k orgs. Online DDL tooling breaks at this scale. Cross-org deploys take hours.
Good Solution: Entity-Attribute-Value (EAV).
- Approach: Rows of `(record_id, field_id, value)`. Every field is a row; the record is reconstructed by joining.
- Challenges: Joins kill dashboard aggregations. Aggregating `SUM(Amount) GROUP BY Stage` requires joining two EAV rows per record, then grouping. Terrible performance at 5M records.
Great Solution: Universal data model with JSONB + metadata-driven query rewrite + opt-in expression indexes.
- Approach:
  - All records live in one `records` table with a JSONB `fields` column. Partition by `HASH(org_id)`.
  - Metadata catalog (`object_definitions`, `field_definitions`) maps `fieldApiName` → JSON path + type + indexing policy.
  - Query planner reads metadata and rewrites `SELECT Industry__c FROM Account WHERE CreatedDate > ...` into `SELECT fields->>'industry__c' FROM records WHERE org_id = ? AND object_id = 17 AND created_at > ?` (see the rewrite sketch after this list).
  - Opt-in expression indexes: when a field becomes "hot" (many queries filter on it), the platform creates a concurrent expression index on `(org_id, object_id, (fields->>'field_name'))`. An automated index advisor service suggests and creates them.
  - This mirrors what Salesforce Core does: metadata-driven storage with "skinny table" optimization for frequently accessed field sets.
- Challenges: JSONB is less efficient than native columns for full scans. Expression indexes double write amplification. Query planner complexity is real: translating SOQL subqueries, aggregations, and sharing rules into Postgres-native SQL is a non-trivial piece of software.
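A toy sketch of the metadata-driven rewrite for a single-field filter. `MetadataCatalog` is an assumed interface, and real SOQL parsing, type coercion, and sharing-rule injection are omitted:

```java
/** Toy SOQL → physical-SQL rewrite for one field filter; MetadataCatalog is assumed. */
final class SoqlRewriter {
    interface MetadataCatalog {
        int objectId(long orgId, String objectApiName);                         // Account → 17
        String jsonKey(long orgId, String objectApiName, String fieldApiName);  // Industry__c → industry__c
    }

    private final MetadataCatalog catalog;

    SoqlRewriter(MetadataCatalog catalog) { this.catalog = catalog; }

    /** Rewrites SELECT <field> FROM <object> WHERE <filterField> > ? into physical SQL. */
    String rewrite(long orgId, String objectApi, String selectField, String filterField) {
        int objectId = catalog.objectId(orgId, objectApi);
        // JSON keys come from the trusted metadata catalog, never from user input,
        // so splicing them into the SQL text is safe; the filter value stays a bind.
        String selectKey = catalog.jsonKey(orgId, objectApi, selectField);
        String filterKey = catalog.jsonKey(orgId, objectApi, filterField);
        return "SELECT fields->>'" + selectKey + "' FROM records"
             + " WHERE org_id = " + orgId               // tenant scope always injected
             + " AND object_id = " + objectId
             + " AND (fields->>'" + filterKey + "') > ?";
    }
}
```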
2) How do we enforce row-level security at scale?
Bad Solution: Application-layer ACL after fetching.
- Approach: Fetch N records from DB, check ACL in app code, return only allowed ones.
- Challenges: Pagination breaks (you asked for 50, got 50, but only 30 are visible). Risk of leakage if a code path skips the check. Fetches more than needed, wasting DB capacity.
Good Solution: WHERE-clause push-down.
- Approach: Every query is rewritten to include `AND (owner_id = :user OR owner_id IN (:user_teams) OR record_id IN (:shared_to_user))`. Enforced by the query planner, not the caller.
- Challenges: Joining per-query against sharing rules is expensive. The sharing-rule graph can be deep (Role → Role hierarchy → Permission Set → Sharing Rule); resolving it at query time is slow.
Great Solution: Pre-materialized sharing table (Salesforce's "Record Access").
- Approach:
  - A background job computes the full `record_access` table: `(org_id, record_id, principal_id)` for every record × every principal that can see it. Updated via CDC whenever roles, sharing rules, or record ownership change.
  - Dashboard queries join: `SELECT ... FROM records JOIN record_access USING (record_id) WHERE principal_id IN (:user, :user_teams, :user_roles, 'Everyone')`.
  - Idempotent re-computation: when a sharing rule changes, only affected records are reprocessed. For a big org, reprocessing takes minutes but runs async without blocking reads.
  - Hot-path reads are simple joins with proper indexes, so they stay fast (see the join sketch after this list).
- Challenges: The record access table can be huge: a public record in a 10k-user org creates 10k rows. Mitigate with a synthetic `Everyone` principal for truly public records. Re-materialization on sharing-rule changes is a background workload you must capacity-plan for.
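A sketch of the hot-path join, assuming Spring's `NamedParameterJdbcTemplate` (which expands the `IN (:principals)` collection into bind parameters). The table shapes match the design; the DAO itself is illustrative:

```java
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

/** Hot-path sharing join sketch; principal expansion (user, teams, roles) happens upstream. */
final class SharedRecordReader {
    private final NamedParameterJdbcTemplate jdbc;

    SharedRecordReader(NamedParameterJdbcTemplate jdbc) { this.jdbc = jdbc; }

    List<String> visibleRecordJson(long orgId, int objectId, List<String> principals, int limit) {
        // NamedParameterJdbcTemplate expands :principals into an IN list of binds.
        return jdbc.queryForList("""
            SELECT r.fields::text
              FROM records r
              JOIN record_access a
                ON a.org_id = r.org_id AND a.record_id = r.record_id
             WHERE r.org_id = :orgId AND r.object_id = :objectId
               AND a.principal_id IN (:principals)
             LIMIT :limit
            """,
            Map.of("orgId", orgId, "objectId", objectId,
                   "principals", principals, "limit", limit),
            String.class);
    }
}
```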
3) How do we serve dashboard aggregations at 300k QPS?
Bad Solution: Live aggregations on OLTP.
- Approach: Every widget runs `SELECT COUNT(*), SUM(amount) FROM records WHERE ...` on render.
- Challenges: At 300k QPS with aggregations over millions of rows each, OLTP melts. Postgres is optimized for point reads, not analytical scans.
Good Solution: CDC → OLAP, cache results.
- Approach: Postgres WAL → Debezium → Kafka → ClickHouse / Druid. Dashboards query the OLAP store. Widget cache with 30 s TTL on results.
- Challenges: OLAP freshness lag ~5-30 s. Acceptable for most widgets. Cache invalidation is still hard β when an underlying record changes, all widgets that aggregate it must invalidate.
Great Solution: Pre-computed materialized widget results with incremental updates.
- Approach:
  - For every widget configured by a user, the platform registers a materialization job. The job subscribes to CDC events for the objects it aggregates over.
  - On each CDC event, the job incrementally updates the aggregate: `count += 1`, `sum += new.amount - old.amount`, etc. A stream-processing engine (Flink / Kafka Streams) maintains this per widget (see the sketch after this list).
  - Widget cache key: `(orgId, widgetId, filterHash)`. Incrementally updated; near-real-time (< 5 s lag).
  - For rarely used filters or ad-hoc queries, fall back to a live OLAP query with admission control and a per-org concurrency cap.
  - Widgets can be tagged "real-time" (< 5 s) or "batch" (refresh every 5 min); batch widgets are cheaper and cover most use cases.
- Challenges: Materialization jobs consume real resources, so you must quota the number of widgets per org. Incremental aggregation for complex operators (median, top-N with ties) is subtle; use approximate structures (HyperLogLog for distinct counts, t-digest for quantiles). Cold start for a new widget requires a backfill, which is expensive on large orgs.
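A minimal Kafka Streams sketch of one incremental aggregate: a running SUM keyed by `(orgId, widgetId, filterHash)`. The topic names and the `"widgetKey|amountDelta"` value encoding are illustrative stand-ins for parsing Debezium change events and routing by widget spec:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

/** Incremental widget SUM sketch on Kafka Streams. */
public class WidgetAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "widget-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("cdc.records", Consumed.with(Serdes.String(), Serdes.String()))
               // value = "orgId:widgetId:filterHash|amountDelta", delta = new.amount - old.amount
               .map((key, value) -> {
                   String[] parts = value.split("\\|");
                   return KeyValue.pair(parts[0], Double.parseDouble(parts[1]));
               })
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
               // The state store keeps the running sum per widget key; each event is one update.
               .reduce(Double::sum,
                       Materialized.<String, Double, KeyValueStore<Bytes, byte[]>>as("widget-sums"))
               .toStream()
               .to("widget-aggregates", Produced.with(Serdes.String(), Serdes.Double()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A cache-loader service would consume `widget-aggregates` and write each updated value into the Redis widget cache, giving the < 5 s lag described above.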
4) How do we apply metadata changes (new fields, new objects) without downtime?
Bad Solution: Lock the table during schema change.
- Approach: `ALTER TABLE records ... LOCK IN EXCLUSIVE MODE`.
- Challenges: Blocks all reads and writes on the shared table, affecting every other org. Unacceptable in a multi-tenant system.
Good Solution: Metadata-only change.
- Approach: Adding a field is an insert into `field_definitions` plus a metadata cache invalidation. No DDL. Existing rows naturally read NULL for the new field because it lives in JSONB.
- Challenges: Queries on the new field are slow until an index is created. No enforcement of "required" on existing rows.
Great Solution: Metadata change + background backfill + concurrent index build + per-org feature flag.
- Approach:
  - The metadata change is instant (insert + cache invalidate).
  - A background backfill job populates default values for existing rows in batches, rate-limited by the org's backfill quota. It takes minutes to hours for a 100M-row org and runs in the background (see the batched-backfill sketch after this list).
  - The expression index is built with `CREATE INDEX CONCURRENTLY`, which does not lock reads/writes.
  - A per-org feature flag gates the rollout of the new field. If the backfill or index build fails, flip the flag to hide the field without removing metadata.
  - A quota on concurrent schema changes per org prevents a misconfigured admin script from triggering 1000 ALTER-scale operations.
- Challenges: Long-running backfills consume IOPS budget. Monitor and throttle. Concurrent index builds still take a long time on huge tables; plan for days on mega-tenants. Rollback of a partially-backfilled field requires care.
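A sketch of the batched, rate-limited backfill, assuming Spring's `JdbcTemplate`; the batch size and pacing are illustrative stand-ins for the per-org backfill quota:

```java
import org.springframework.jdbc.core.JdbcTemplate;

/** Batched backfill sketch: writes a default value into the JSONB column for rows
 *  that are still missing the new field, one short transaction at a time. */
final class FieldBackfill {
    private static final int BATCH = 10_000;

    private final JdbcTemplate jdbc;

    FieldBackfill(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    void run(long orgId, int objectId, String fieldApiName, String defaultJson)
            throws InterruptedException {
        int updated;
        do {
            // ctid batching keeps each transaction short and avoids rescanning rows
            // that already carry the key; jsonb_exists is the function form of the
            // JSONB ? operator (which would collide with JDBC placeholders).
            updated = jdbc.update("""
                UPDATE records
                   SET fields = jsonb_set(fields, ARRAY[?::text], ?::jsonb)
                 WHERE ctid IN (
                       SELECT ctid FROM records
                        WHERE org_id = ? AND object_id = ?
                          AND NOT jsonb_exists(fields, ?)
                        LIMIT ?)
                """, fieldApiName, defaultJson, orgId, objectId, fieldApiName, BATCH);
            Thread.sleep(200); // crude pacing; a real job consults the org's IOPS quota
        } while (updated == BATCH);
    }
}
```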
5) How do we handle mega-tenant isolation?
Bad Solution: Force all orgs onto shared shards.
- Approach: Every org is on a shared shard holding ~1k orgs.
- Challenges: A single mega-tenant with 1B records can saturate a shard's IOPS, cache, and disk. The other 999 orgs on that shard get degraded performance.
Good Solution: Shard rebalance based on size.
- Approach: Periodically rebalance: move big orgs to dedicated shards; consolidate small orgs.
- Challenges: Rebalancing is hours-long and risky. "Big" is fuzzy: an org with 10M rows but hot query patterns can hurt worse than one with 100M rows that is mostly cold.
Great Solution: Tiered shard assignment + proactive promotion + per-shard resource caps.
- Approach:
- Shared shards for small/medium orgs (up to 50M records).
- Dedicated shards for mega-tenants, provisioned per-org based on size, tier, or contractual SLA.
- Hot-spot detector monitors per-org IOPS, query time, row scans. An org crossing a threshold (e.g., 20% of shard IOPS for > 1 hour) is flagged for promotion (see the detector sketch after this list).
- Promotion is a live migration: new shard spun up, data copied via CDC, traffic switched with a brief writes-paused window (seconds).
- Per-shard resource caps enforce that any single org cannot consume more than, say, 50% of a shared shard's capacity. Excess queries queue or 429.
- Challenges: Live migration requires robust tooling: CDC plus read-your-writes consistency during cutover is non-trivial. Dedicated shards cost more; tie them to tier pricing. Promotion decisions benefit from ML over time (predicting growth) rather than reactive thresholds.
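A sketch of the hot-spot detector's promotion rule. `ShardMetrics` is an assumed metrics source; the 20%-for-an-hour threshold follows the bullet above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Flags an org for dedicated-shard promotion when it holds more than a threshold
 *  share of its shard's IOPS for a sustained window. */
final class PromotionDetector {
    interface ShardMetrics { Map<Long, Double> iopsShareByOrg(String shardId); }

    private static final double IOPS_SHARE_THRESHOLD = 0.20; // 20% of shard IOPS
    private static final int SUSTAINED_SAMPLES = 60;         // 60 one-minute samples ≈ 1 hour

    private final ShardMetrics metrics;
    private final Map<Long, Integer> consecutiveBreaches = new HashMap<>();

    PromotionDetector(ShardMetrics metrics) { this.metrics = metrics; }

    /** Call once per sampling interval; returns orgs that should be promoted. */
    List<Long> sample(String shardId) {
        List<Long> flagged = new ArrayList<>();
        metrics.iopsShareByOrg(shardId).forEach((orgId, share) -> {
            if (share > IOPS_SHARE_THRESHOLD) {
                if (consecutiveBreaches.merge(orgId, 1, Integer::sum) >= SUSTAINED_SAMPLES) {
                    flagged.add(orgId);
                }
            } else {
                consecutiveBreaches.remove(orgId); // streak broken, start over
            }
        });
        return flagged;
    }
}
```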
What is Expected at Each Level?
Mid-level (SMTS-junior)
Multi-tenant tables with org_id as a column (making it the PK prefix is a bonus). Basic RBAC. Metadata cache for object definitions. Can be prompted through the EAV vs JSONB vs DDL tradeoffs.
Senior (SMTS / LMTS)
Universal data model with JSONB. Push-down RLS with pre-materialized record access table. CDC β OLAP for dashboards. Widget caching with TTL. Concrete capacity math. Metadata changes as online operations.
Staff+ (PMTS)
Mega-tenant isolation strategy (tiered shards + promotion). Online metadata evolution with backfill quotas. Cross-region consistency for geo-distributed orgs. Per-org query cost analysis (CPU per query, rows scanned per dashboard). Approximate aggregations (HLL, t-digest) for real-time widgets. Proactive index advisor service.
Salesforce-Specific Considerations
- Direct analog: this is literally the Salesforce Multitenant Kernel (Force.com's Universal Data Dictionary). Use these terms: "pivot tables," "skinny tables," "field history tracking," "Record Access table."
- Governor limit parallel: mirror SOQL row limit (50,000 rows returned per transaction) and CPU time (10 s sync) at the query planner. Queries approaching these limits get converted to async jobs or rejected.
- Sharing model chain: Organization → Profile → Permission Set → Sharing Rule → Record Access. The pre-materialized sharing table captures the output of this chain; mention it as you walk through the design.
- Shield Field Audit Trail: every record change is logged for up to 10 years. The CDC stream to cold storage is the mechanism.
- Platform Cache: Salesforce's per-org in-memory cache maps to our Widget Cache + Metadata Cache. Both sit in Redis with per-org quotas.
- Hyperforce regional isolation: the storage router maps `orgId → region → shard`. No cross-region joins on hot paths.
- Metadata API: admins use the Metadata API to deploy custom objects and fields. That API lands on the same metadata catalog we described; our query planner just reads it.
Example snippet: tenant-aware query middleware
Java:

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class TenantAwareDao {
    private final JdbcTemplate jdbc;
    private final MetadataCache metadataCache; // resolves (orgId, objectApiName) → object_id

    public TenantAwareDao(JdbcTemplate jdbc, MetadataCache metadataCache) {
        this.jdbc = jdbc;
        this.metadataCache = metadataCache;
    }

    public List<Record> find(TenantContext ctx, String objectApi,
                             String whereClause, Object... args) {
        int objectId = metadataCache.objectId(ctx.orgId(), objectApi);
        // whereClause comes from the trusted query planner, never from user input;
        // the org_id predicate and the record_access join are injected on every query.
        String sql = """
            SELECT record_id, fields FROM records r
            JOIN record_access a ON r.record_id = a.record_id
            WHERE r.org_id = ? AND r.object_id = ?
              AND a.principal_id = ANY(?)
              AND %s
            """.formatted(whereClause);
        Object[] bound = prepend(new Object[]{
            ctx.orgId(), objectId, ctx.principals().toArray()
        }, args); // prepend(...) and mapRow(...) are small private helpers, omitted here
        return jdbc.query(sql, bound, this::mapRow);
    }
}
```

C++:

```cpp
std::vector<Record> TenantAwareDao::Find(const TenantContext& ctx,
                                         const std::string& object_api,
                                         const std::string& where_clause,
                                         const ParamList& params) {
  int object_id = metadata_cache_.ObjectId(ctx.org_id(), object_api);
  // where_clause is produced by the trusted planner; tenant scoping is injected here.
  const std::string sql =
      "SELECT record_id, fields FROM records r "
      "JOIN record_access a ON r.record_id = a.record_id "
      "WHERE r.org_id = $1 AND r.object_id = $2 "
      "  AND a.principal_id = ANY($3) AND " + where_clause;
  ParamList bound = {ctx.org_id(), object_id, ctx.principals()};
  bound.Append(params);
  return db_.Query<Record>(sql, bound);
}
```

TypeScript:

```typescript
async function findRecords(
  ctx: TenantContext,
  objectApi: string,
  whereClause: string,
  params: unknown[]
): Promise<Record[]> {
  const objectId = metadataCache.objectId(ctx.orgId, objectApi);
  // whereClause is planner-generated (its placeholders start at $4); the org_id
  // predicate and the record_access join are injected unconditionally.
  const sql = `
    SELECT record_id, fields FROM records r
    JOIN record_access a ON r.record_id = a.record_id
    WHERE r.org_id = $1 AND r.object_id = $2
      AND a.principal_id = ANY($3)
      AND ${whereClause}
  `;
  return db.query<Record>(sql, [ctx.orgId, objectId, ctx.principals, ...params]);
}
```