HLD: Multi-Tenant CRM Dashboard
Understanding the Problem
What is a Multi-Tenant CRM Dashboard?
A multi-tenant CRM dashboard is a platform where each customer org sees its own leads, opportunities, accounts, and custom objects through configurable widgets (counters, charts, tables) with role-based access. Think of a simplified Salesforce Lightning Dashboard platform. The core design challenge is that every org defines its own schema (custom objects and fields) without a platform deploy, while you still deliver sub-second dashboard loads with strict data isolation. If you cannot design this, you cannot design Salesforce Core, so this problem tests your grasp of Salesforce's actual architecture more than any other.
Functional Requirements
Core (above the line):
- Record CRUD: users in an org can view, create, update, and query their records (Accounts, Opportunities, Leads, custom objects).
- Per-org schema customization: admins define custom objects (e.g., `Project__c`) and custom fields on standard objects (`Account.Industry__c`) without any deploy.
- Dashboards with widgets: composed of counters, charts, tables. Widgets are configurable by end users at render time.
- Role-based access: users see only records they are permitted to via roles, sharing rules, and permission sets.
- Aggregations at scale: counters, top-N, trends over potentially millions of records per org, returned in sub-second time.
Below the line (out of scope):
- Mobile-responsive rendering (UI concern, separate project).
- Embedded AI predictions (Einstein, separate ML service).
- Complex cross-org analytics (Data Cloud handles this).
- Custom code execution (Apex / Flow triggers are a separate runtime).
Non-Functional Requirements
Core:
- Scale: 100k orgs, average 200 users per org, median 5M records per org, max org 1B records. 50k concurrent dashboard viewers.
- Latency: dashboard load p95 < 1.5 s, widget refresh p95 < 500 ms.
- Consistency: eventual for aggregations (5-30 s staleness acceptable). Strong read-your-writes for record CRUD (a user who saves a record and re-reads it must see their change).
- Multi-tenancy: absolute data isolation. No cross-tenant leakage ever. Per-org schema customization without physical tables per org.
Below the line:
- Global cross-org queries (operator-only; out of scope).
- Real-time streaming widgets (polling at 30s is fine).
Capacity Estimation
- Writes: 10k records/s across the fleet at peak.
- Reads: 100k QPS for dashboards and record lookups.
- Storage: 100k orgs × 5M records × 2 KB = 1 PB raw. With 3x replication and indexes, ~4-5 PB total.
- Dashboard fan-out: 6 widgets × 50k concurrent viewers = 300k queries/s at peak, many of them aggregations over millions of rows. Impossible to serve from OLTP directly; we must pre-compute.
- Metadata storage: each org has ~50 standard + 100 custom objects with ~20 fields each. Per 1k-org shard, that is 150k object definitions + 3M field definitions, which fits in the shard's RAM.
The Set Up
Core Entities
- Organization: `orgId`, tier, region, feature flags.
- ObjectDefinition: keyed by `(orgId, objectApiName)`; list of fields, relationships.
- FieldDefinition: keyed by `(orgId, objectApiName, fieldApiName)`; type (string/number/date/reference/picklist), required, defaultValue.
- Record: keyed by `(orgId, objectApiName, recordId)`; key-value map of `fieldValues`, `ownerId`, timestamps.
- User, Role, SharingRule, Permission: the RBAC graph.
- Dashboard, Widget: `(orgId, dashboardId)` with a list of widgets; each widget has a layout and a query spec.
The API
REST with tenant-scoped URLs, plus a SOQL-style query endpoint (mimicking Salesforce).
Record operations:

```
POST  /v1/orgs/{orgId}/sobjects/{objectApiName}
GET   /v1/orgs/{orgId}/sobjects/{objectApiName}/{id}
PATCH /v1/orgs/{orgId}/sobjects/{objectApiName}/{id}
POST  /v1/orgs/{orgId}/query
  Body: { "soql": "SELECT Id, Name, Industry__c FROM Account WHERE CreatedDate > LAST_WEEK" }
```

Metadata operations (admin only):

```
POST /v1/orgs/{orgId}/metadata/objects
POST /v1/orgs/{orgId}/metadata/fields
```

Dashboards and widgets:

```
GET  /v1/orgs/{orgId}/dashboards/{dashId}
POST /v1/orgs/{orgId}/dashboards/{dashId}/widgets
GET  /v1/orgs/{orgId}/widgets/{widgetId}/data   (returns materialized result)
```

Return a 202 + job token when a query exceeds the p95 budget and must run async (a sketch follows).
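To make the async fallback concrete, here is a minimal sketch of the sync-vs-async decision. The `Planner`, `CostModel`, `Jobs`, and `Executor` collaborators are assumptions for illustration, not part of the API above:

```java
import java.util.Map;

/** Sketch of the sync-vs-async decision; all collaborator types are assumed. */
final class QueryEndpoint {
    interface Planner { Plan plan(long orgId, String soql); }
    interface CostModel { long estimatedMillis(Plan p); }
    interface Jobs { String submit(long orgId, Plan p); }
    interface Executor { Object execute(Plan p); }
    record Plan(long orgId, String physicalSql) {}
    record Response(int status, Object body) {}

    private static final long P95_BUDGET_MS = 500; // widget-refresh p95 budget from the NFRs

    private final Planner planner;
    private final CostModel cost;
    private final Jobs jobs;
    private final Executor executor;

    QueryEndpoint(Planner p, CostModel c, Jobs j, Executor e) {
        planner = p; cost = c; jobs = j; executor = e;
    }

    Response query(long orgId, String soql) {
        Plan plan = planner.plan(orgId, soql);
        if (cost.estimatedMillis(plan) > P95_BUDGET_MS) {
            // Over budget: enqueue the plan and hand back a job token with HTTP 202.
            return new Response(202, Map.of("jobToken", jobs.submit(orgId, plan)));
        }
        return new Response(200, executor.execute(plan)); // within budget: run synchronously
    }
}
```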
High-Level Design
Architecture

```
┌────────┐   ┌──────────┐   ┌───────────────────┐   ┌─────────────────┐
│ Client │──▶│ Gateway  │──▶│  Query Planner    │──▶│ Metadata Cache  │
└────────┘   │ (authz,  │   │  (parse SOQL →    │   │ (objects,       │
             │  RLS     │   │   physical plan)  │   │  fields,        │
             │  ctx)    │   └─────────┬─────────┘   │  shares)        │
             └──────────┘             │             └─────────────────┘
                                      ▼
                       ┌──────────────────────────┐
                       │      Storage Router      │
                       │  (org → shard → engine)  │
                       └──────┬────────────┬──────┘
                              ▼            ▼
                      ┌─────────────┐ ┌───────────────┐
                      │ OLTP Shard  │ │  OLAP Engine  │
                      │ (Postgres   │ │  (ClickHouse  │
                      │  per 1k     │ │   or Druid    │
                      │  orgs)      │ │   per region) │
                      └─────────────┘ └───────────────┘

Widget Cache ◀── materialized aggregates per (org, widget)
CDC stream   ──▶ OLAP / Search Index / Cache invalidation
```

End-to-end flow: a user opens a dashboard
- User navigates to `/dashboards/{dashId}`. Client fetches the dashboard definition (static JSON with a list of widget IDs).
- For each widget, the client issues `GET /widgets/{widgetId}/data`.
- Widget Cache (Redis) holds pre-computed results keyed by `(orgId, widgetId, filterHash)`. Cache hit → return in < 30 ms (see the read-through sketch after this list).
- On cache miss, the Query Planner parses the widget's SOQL-equivalent into a physical plan, consulting the Metadata Cache (in-memory LRU keyed by `(orgId, metadataVersion)`) to translate `Industry__c` into its physical column in the universal data model.
- Storage Router decides OLTP vs OLAP:
  - Aggregations (COUNT, SUM, GROUP BY) → OLAP engine (ClickHouse/Druid).
  - Single-record lookups or filtered lists → OLTP (the Postgres shard assigned to this org).
- Query executes with `WHERE org_id = ?` injected automatically. Sharing rules from the `record_access` table are joined in.
- Result returned to the client; cached in the Widget Cache with 30 s TTL + jitter.
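A minimal read-through sketch of the Widget Cache step. The `Cache` and `QueryService` interfaces are illustrative; the key layout and TTL-plus-jitter follow the flow above:

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;

/** Read-through widget cache sketch; Cache and QueryService are assumed interfaces. */
final class WidgetDataService {
    interface Cache {
        Optional<String> get(String key);
        void put(String key, String value, Duration ttl);
    }
    interface QueryService {
        String runWidgetQuery(long orgId, String widgetId, String filterHash);
    }

    private final Cache cache;
    private final QueryService queries;

    WidgetDataService(Cache cache, QueryService queries) {
        this.cache = cache;
        this.queries = queries;
    }

    String widgetData(long orgId, String widgetId, String filterHash) {
        String key = orgId + ":" + widgetId + ":" + filterHash; // (orgId, widgetId, filterHash)
        return cache.get(key).orElseGet(() -> {
            String result = queries.runWidgetQuery(orgId, widgetId, filterHash);
            // 30 s TTL plus jitter so a popular dashboard's keys don't all expire together.
            long jitterMs = ThreadLocalRandom.current().nextLong(5_000);
            cache.put(key, result, Duration.ofSeconds(30).plusMillis(jitterMs));
            return result;
        });
    }
}
```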
End-to-end flow: a user updates a record
- Client sends `PATCH /sobjects/Account/{id}`.
- Gateway validates the JWT, extracts `orgId` and `userId`.
- Query Planner checks metadata: does this object exist in this org? Are the fields valid?
- Permission check: does this user have edit permission on this record (via `record_access`)?
- Update is written to the universal data model in Postgres: `UPDATE records SET fields = jsonb_set(fields, '{industry__c}', '"SaaS"') WHERE org_id = ? AND record_id = ?` (sketched below).
- Postgres WAL → Debezium → Kafka topic `cdc.records`.
- Downstream consumers: OLAP loader (ClickHouse), search indexer (Elasticsearch), widget-cache invalidator (publishes to Redis to purge affected widget keys).
- Response returns to the client in < 200 ms. Widget caches refresh within ~5 s.
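A sketch of that write, assuming Spring's `JdbcTemplate`; the table shape is the `records` table from the data-model section below, while `RecordWriter` itself and its validation contract are illustrative:

```java
import java.util.UUID;
import org.springframework.jdbc.core.JdbcTemplate;

/** Sketch of the CRUD write path. Field name and value are assumed to be already
 *  validated against the metadata catalog by the planner. */
final class RecordWriter {
    private final JdbcTemplate jdbc;

    RecordWriter(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    /** Sets one validated field on a record, always scoped by org_id. */
    void updateField(long orgId, int objectId, UUID recordId,
                     String fieldApiName, String jsonValue) {
        // jsonValue must be a JSON literal (e.g. "\"SaaS\"" for a string); metadata
        // validation guarantees this upstream. The org_id predicate is the
        // tenant-isolation invariant and is injected on every statement.
        jdbc.update("""
            UPDATE records
               SET fields = jsonb_set(fields, ARRAY[?::text], ?::jsonb),
                   updated_at = now()
             WHERE org_id = ? AND object_id = ? AND record_id = ?
            """, fieldApiName, jsonValue, orgId, objectId, recordId);
    }
}
```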
Data model: the core decision
There are three ways to model custom fields. This is the most-discussed design decision in a Salesforce HLD:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| EAV (Entity-Attribute-Value) | Rows like (record_id, field_id, value). | Unlimited fields, no DDL. | Joins expensive; bad for aggregations. |
| Physical DDL per custom field | ALTER TABLE account ADD COLUMN custom_x VARCHAR. | Best query performance. | DDL storms, lock contention, schema explosion (millions of columns). |
| Universal data model with JSONB | Fixed records table; fields stored in a JSONB column; metadata catalog maps fieldApiName β physical slot. | Salesforce's actual approach. No DDL. Indexable via GIN or expression indexes. | Complexity in query translation. |
Pick the universal data model. It matches Salesforce reality and scales. Schema sketch in Postgres:
```sql
CREATE TABLE records (
    org_id     BIGINT NOT NULL,
    object_id  INT NOT NULL,        -- FK to object_definitions
    record_id  UUID NOT NULL,
    fields     JSONB NOT NULL,      -- { "name": "Acme", "industry__c": "SaaS" }
    owner_id   UUID,
    created_at TIMESTAMPTZ,
    updated_at TIMESTAMPTZ,
    PRIMARY KEY (org_id, object_id, record_id)
) PARTITION BY HASH (org_id);

CREATE INDEX idx_records_org_obj_updated
    ON records (org_id, object_id, updated_at DESC);

-- Per-field indexes get created on demand for hot fields:
CREATE INDEX idx_records_industry
    ON records (org_id, object_id, (fields->>'industry__c'))
    WHERE object_id = 17;
```

Multi-Tenancy Strategy
This is the problem where Salesforce interviewers grade multi-tenancy hardest. Nail it.
Isolation level: L1 (shared DB + shared schema), with `org_id` as the first column of every PK and every index. We choose L1 because running a per-org cluster for 100k orgs is infeasible on cost. Mega-tenants (> 50M records) get promoted to a dedicated shard (an L2-ish hybrid).
Tenant context flow:
- `orgId` is extracted from the JWT at the gateway and stamped into a `TenantContext` object (see the sketch after this list).
- Every request carries it via thread-local / async context, propagated to:
  - RPC calls (`X-Org-Id` header).
  - Kafka message keys for CDC ordering.
  - Log MDC for grep by org.
  - Prometheus labels for per-org dashboards.
- No raw DB access. All DB calls go through a Tenant-Aware DAO that injects `WHERE org_id = :ctx.orgId`. Reviewers look for this.
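A minimal sketch of `TenantContext` propagation: a thread-local carrier plus an RPC client decorator. The field set and the `X-Org-Id` header come from the list above; everything else is illustrative:

```java
import java.util.List;
import java.util.Map;

/** Minimal TenantContext sketch: thread-local storage plus header propagation. */
record TenantContext(long orgId, String userId, List<String> principals) {

    private static final ThreadLocal<TenantContext> CURRENT = new ThreadLocal<>();

    static void set(TenantContext ctx) { CURRENT.set(ctx); }

    static TenantContext get() {
        TenantContext ctx = CURRENT.get();
        if (ctx == null) throw new IllegalStateException("no tenant context bound");
        return ctx;
    }

    static void clear() { CURRENT.remove(); } // always clear in a finally block
}

/** Example RPC client decorator that stamps X-Org-Id on every outbound call. */
final class TenantAwareRpcClient {
    interface Rpc { String call(String url, Map<String, String> headers, String body); }

    private final Rpc delegate;

    TenantAwareRpcClient(Rpc delegate) { this.delegate = delegate; }

    String call(String url, String body) {
        return delegate.call(url,
                Map.of("X-Org-Id", Long.toString(TenantContext.get().orgId())),
                body);
    }
}
```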
Row-level security (RLS):
- Every query is wrapped by a server-side context that appends `WHERE org_id = :ctx.orgId AND record_id IN (SELECT record_id FROM record_access WHERE principal_id IN :ctx.principals)`.
- The `record_access` materialized table (described in the deep dives) pre-computes sharing.
- Enforced by the query planner, not the caller. Even a buggy app cannot bypass it.
Routing layer:
- `orgId → shard` lookup via a directory service (consistent hash with a rebalancer). Each shard holds ~1k orgs by default.
- Mega-tenants (Fortune 100 orgs with hundreds of millions of records) get a dedicated shard; their data lives alone, preserving noisy-neighbor isolation at the DB level.
- Directory is cached in-process with 1-min TTL + CDC invalidation (see the lookup sketch after this list).
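A sketch of the routing lookup with the in-process TTL cache. `DirectoryService` is an assumed remote API; the 1-minute TTL and the CDC-driven `evict` follow the bullets above:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Org → shard router sketch with an in-process 1-minute TTL cache. */
final class ShardRouter {
    interface DirectoryService { String shardFor(long orgId); }

    private record Entry(String shardId, Instant fetchedAt) {}

    private static final Duration TTL = Duration.ofMinutes(1);

    private final DirectoryService directory;
    private final Map<Long, Entry> cache = new ConcurrentHashMap<>();

    ShardRouter(DirectoryService directory) { this.directory = directory; }

    String shardFor(long orgId) {
        Entry e = cache.get(orgId);
        if (e == null || e.fetchedAt().plus(TTL).isBefore(Instant.now())) {
            e = new Entry(directory.shardFor(orgId), Instant.now()); // refresh from directory
            cache.put(orgId, e);
        }
        return e.shardId();
    }

    /** Called by the CDC-driven invalidator when an org is rebalanced or promoted. */
    void evict(long orgId) { cache.remove(orgId); }
}
```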
Noisy-neighbor mitigations:
- Per-org query time budget: small queries 50 ms, large queries 500 ms, enforced at the planner level. Queries exceeding the budget get terminated and converted to async jobs, with a job token returned to the client.
- Per-org CPU / IOPS quotas at the Postgres level via cgroups on the DB pods.
- Shuffle sharding at the dashboard cache layer: each org maps to `k=8` of `n=100` cache pods (see the sketch after this list).
- Query concurrency cap per org: max 10 concurrent dashboard queries per org; beyond that, 429.
- Bulkhead thread pools per org tier in the query service; Enterprise gets wider pools than Essentials.
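A sketch of the shuffle-sharding assignment, deterministic per org. The `k=8` of `n=100` numbers follow the bullet above; the hashing scheme is an illustrative choice:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

/** Shuffle-sharding sketch: deterministically map each org to k of n cache pods,
 *  so two orgs rarely share their full pod set. */
final class ShuffleShard {
    static List<Integer> podsFor(long orgId, int k, int n) {
        List<Integer> pods = new ArrayList<>(n);
        for (int i = 0; i < n; i++) pods.add(i);
        // Deterministic per-org shuffle: order pods by hash(orgId, podId), take the first k.
        pods.sort((a, b) -> Long.compare(hash(orgId, a), hash(orgId, b)));
        return pods.subList(0, k);
    }

    private static long hash(long orgId, int pod) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] d = md.digest((orgId + ":" + pod).getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xff); // first 8 bytes as long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e); // SHA-256 is guaranteed by the JDK
        }
    }
}
```

`ShuffleShard.podsFor(42L, 8, 100)` always returns the same 8 pods for org 42, and the probability that two orgs collide on all 8 pods is vanishingly small, which is the point of shuffle sharding.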
Per-tenant observability (a metrics sketch follows this list):
- Dashboards per org: query count, p95 latency, rows scanned, CPU time, storage.
- Alerts when an org approaches its query-time budget consistently (indicates bad schema / missing indexes).
- Metadata change audit per org.
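If these are exposed as Prometheus metrics, a per-org query timer might look like the following sketch, assuming Micrometer; the metric and tag names are illustrative, and per-org labels are high-cardinality, so budget for that:

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.Callable;

/** Per-org query latency metrics sketch using Micrometer. */
final class OrgQueryMetrics {
    private final MeterRegistry registry;

    OrgQueryMetrics(MeterRegistry registry) { this.registry = registry; }

    <T> T timed(long orgId, String objectApi, Callable<T> query) throws Exception {
        Timer timer = Timer.builder("crm.query.latency")
                .tag("org", Long.toString(orgId)) // high-cardinality at 100k orgs: plan capacity
                .tag("object", objectApi)
                .publishPercentiles(0.95)         // p95 per the NFR budget
                .register(registry);              // registry de-duplicates by meter id
        return timer.recordCallable(query);
    }
}
```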
Potential Deep Dives
1) How do we support per-org schema flexibility (custom objects and fields)?
Bad Solution: Per-tenant DDL.
- Approach: Every new custom field triggers `ALTER TABLE`.
- Challenges: Lock storms on shared tables. `pg_class` bloat from millions of columns across 100k orgs. Online DDL tooling breaks at this scale. Cross-org deploys take hours.
Good Solution: Entity-Attribute-Value (EAV).
- Approach: Rows of `(record_id, field_id, value)`. Every field is a row; the record is reconstructed by joining.
- Challenges: Joins kill dashboard aggregations. Aggregating `SUM(Amount) GROUP BY Stage` requires joining two EAV rows per record, then grouping. Terrible performance at 5M records.
Great Solution: Universal data model with JSONB + metadata-driven query rewrite + opt-in expression indexes.
- Approach:
  - All records live in one `records` table with a JSONB `fields` column. Partition by `HASH(org_id)`.
  - Metadata catalog (`object_definitions`, `field_definitions`) maps `fieldApiName` → JSON path + type + indexing policy.
  - Query planner reads metadata and rewrites `SELECT Industry__c FROM Account WHERE CreatedDate > ...` into `SELECT fields->>'industry__c' FROM records WHERE org_id = ? AND object_id = 17 AND created_at > ?` (see the rewrite sketch after this list).
  - Opt-in expression indexes: when a field becomes "hot" (many queries filter on it), the platform creates a concurrent expression index on `(org_id, object_id, (fields->>'field_name'))`. An automated index advisor service suggests and creates them.
  - This mirrors what Salesforce Core does: metadata-driven storage with "skinny table" optimization for frequently accessed field sets.
- Challenges: JSONB is less efficient than native columns for full scans. Expression indexes double write amplification. Query planner complexity is real: translating SOQL subqueries, aggregations, and sharing rules into Postgres-native SQL is a non-trivial piece of software.
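A toy sketch of the metadata-driven rewrite for a single-field filter. `MetadataCatalog` is an assumed interface, and real SOQL parsing, type coercion, and sharing-rule injection are omitted:

```java
/** Toy SOQL → physical-SQL rewrite for one field filter; MetadataCatalog is assumed. */
final class SoqlRewriter {
    interface MetadataCatalog {
        int objectId(long orgId, String objectApiName);                         // Account → 17
        String jsonKey(long orgId, String objectApiName, String fieldApiName);  // Industry__c → industry__c
    }

    private final MetadataCatalog catalog;

    SoqlRewriter(MetadataCatalog catalog) { this.catalog = catalog; }

    /** Rewrites SELECT <field> FROM <object> WHERE <filterField> > ? into physical SQL. */
    String rewrite(long orgId, String objectApi, String selectField, String filterField) {
        int objectId = catalog.objectId(orgId, objectApi);
        // JSON keys come from the trusted metadata catalog, never from user input,
        // so splicing them into the SQL text is safe; the filter value stays a bind.
        String selectKey = catalog.jsonKey(orgId, objectApi, selectField);
        String filterKey = catalog.jsonKey(orgId, objectApi, filterField);
        return "SELECT fields->>'" + selectKey + "' FROM records"
             + " WHERE org_id = " + orgId               // tenant scope always injected
             + " AND object_id = " + objectId
             + " AND (fields->>'" + filterKey + "') > ?";
    }
}
```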
2) How do we enforce row-level security at scale?
Bad Solution: Application-layer ACL after fetching.
- Approach: Fetch N records from DB, check ACL in app code, return only allowed ones.
- Challenges: Pagination breaks (you asked for 50, got 50, but only 30 are visible). Risk of leakage if a code path skips the check. Fetches more than needed, wasting DB capacity.
Good Solution: WHERE-clause push-down.
- Approach: Every query is rewritten to include `AND (owner_id = :user OR owner_id IN (:user_teams) OR record_id IN (:shared_to_user))`. Enforced by the query planner, not the caller.
- Challenges: Joining per-query against sharing rules is expensive. The sharing-rule graph can be deep (Role → Role hierarchy → Permission Set → Sharing Rule); resolving it at query time is slow.
Great Solution: Pre-materialized sharing table (Salesforce's "Record Access").
- Approach:
  - A background job computes the full `record_access` table: `(org_id, record_id, principal_id)` for every record × every principal that can see it. Updated via CDC whenever roles, sharing rules, or record ownership change.
  - Dashboard queries join: `SELECT ... FROM records JOIN record_access USING (record_id) WHERE principal_id IN (:user, :user_teams, :user_roles, 'Everyone')`.
  - Idempotent re-computation: when a sharing rule changes, only affected records are reprocessed. For a big org, reprocessing takes minutes but runs async without blocking reads.
  - Hot-path reads are simple joins with proper indexes, so they stay fast (see the join sketch after this list).
- Challenges: The record access table can be huge: a public record in a 10k-user org creates 10k rows. Mitigate with a synthetic `Everyone` principal for truly public records. Re-materialization on sharing-rule changes is a background workload you must capacity-plan for.
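A sketch of the hot-path join, assuming Spring's `NamedParameterJdbcTemplate` (which expands the `IN (:principals)` collection into bind parameters). The table shapes match the design; the DAO itself is illustrative:

```java
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

/** Hot-path sharing join sketch; principal expansion (user, teams, roles) happens upstream. */
final class SharedRecordReader {
    private final NamedParameterJdbcTemplate jdbc;

    SharedRecordReader(NamedParameterJdbcTemplate jdbc) { this.jdbc = jdbc; }

    List<String> visibleRecordJson(long orgId, int objectId, List<String> principals, int limit) {
        // NamedParameterJdbcTemplate expands :principals into an IN list of binds.
        return jdbc.queryForList("""
            SELECT r.fields::text
              FROM records r
              JOIN record_access a
                ON a.org_id = r.org_id AND a.record_id = r.record_id
             WHERE r.org_id = :orgId AND r.object_id = :objectId
               AND a.principal_id IN (:principals)
             LIMIT :limit
            """,
            Map.of("orgId", orgId, "objectId", objectId,
                   "principals", principals, "limit", limit),
            String.class);
    }
}
```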
3) How do we serve dashboard aggregations at 300k QPS?
Bad Solution: Live aggregations on OLTP.
- Approach: Every widget runs `SELECT COUNT(*), SUM(amount) FROM records WHERE ...` on render.
- Challenges: At 300k QPS with aggregations over millions of rows each, OLTP melts. Postgres is optimized for point reads, not analytical scans.
Good Solution: CDC → OLAP, cache results.
- Approach: Postgres WAL → Debezium → Kafka → ClickHouse / Druid. Dashboards query the OLAP store. Widget cache with 30 s TTL on results.
- Challenges: OLAP freshness lag ~5-30 s. Acceptable for most widgets. Cache invalidation is still hard β when an underlying record changes, all widgets that aggregate it must invalidate.
Great Solution: Pre-computed materialized widget results with incremental updates.
- Approach:
  - For every widget configured by a user, the platform registers a materialization job. The job subscribes to CDC events for the objects it aggregates over.
  - On each CDC event, the job incrementally updates the aggregate: `count += 1`, `sum += new.amount - old.amount`, etc. A stream-processing engine (Flink / Kafka Streams) maintains this per widget (see the sketch after this list).
  - Widget cache key: `(orgId, widgetId, filterHash)`. Incrementally updated; near-real-time (< 5 s lag).
  - For rarely used filters or ad-hoc queries, fall back to a live OLAP query with admission control and a per-org concurrency cap.
  - Widgets can be tagged "real-time" (< 5 s) or "batch" (refresh every 5 min); batch widgets are cheaper and cover most use cases.
- Challenges: Materialization jobs consume real resources, so you must quota the number of widgets per org. Incremental aggregation for complex operators (median, top-N with ties) is subtle; use approximate structures (HyperLogLog for distinct counts, t-digest for quantiles). Cold start for a new widget requires a backfill, which is expensive on large orgs.
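A minimal Kafka Streams sketch of one incremental aggregate: a running SUM keyed by `(orgId, widgetId, filterHash)`. The topic names and the `"widgetKey|amountDelta"` value encoding are illustrative stand-ins for parsing Debezium change events and routing by widget spec:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

/** Incremental widget SUM sketch on Kafka Streams. */
public class WidgetAggregator {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "widget-aggregator");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("cdc.records", Consumed.with(Serdes.String(), Serdes.String()))
               // value = "orgId:widgetId:filterHash|amountDelta", delta = new.amount - old.amount
               .map((key, value) -> {
                   String[] parts = value.split("\\|");
                   return KeyValue.pair(parts[0], Double.parseDouble(parts[1]));
               })
               .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
               // The state store keeps the running sum per widget key; each event is one update.
               .reduce(Double::sum,
                       Materialized.<String, Double, KeyValueStore<Bytes, byte[]>>as("widget-sums"))
               .toStream()
               .to("widget-aggregates", Produced.with(Serdes.String(), Serdes.Double()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A cache-loader service would consume `widget-aggregates` and write each updated value into the Redis widget cache, giving the < 5 s lag described above.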
4) How do we apply metadata changes (new fields, new objects) without downtime?
Bad Solution: Lock the table during schema change.
- Approach: `ALTER TABLE records ... LOCK IN EXCLUSIVE MODE`.
- Challenges: Blocks all reads and writes on the shared table, affecting every other org. Unacceptable in a multi-tenant system.
Good Solution: Metadata-only change.
- Approach: Adding a field is an insert into `field_definitions` plus a metadata cache invalidation. No DDL. Existing rows naturally read NULL for the new field because it lives in JSONB.
- Challenges: Queries on the new field are slow until an index is created. No enforcement of "required" on existing rows.
Great Solution: Metadata change + background backfill + concurrent index build + per-org feature flag.
- Approach:
  - The metadata change is instant (insert + cache invalidate).
  - A background backfill job populates default values for existing rows in batches, rate-limited by the org's backfill quota. It takes minutes to hours for a 100M-row org and runs in the background (see the batched-backfill sketch after this list).
  - The expression index is built with `CREATE INDEX CONCURRENTLY`, which does not lock reads/writes.
  - A per-org feature flag gates the rollout of the new field. If the backfill or index build fails, flip the flag to hide the field without removing metadata.
  - A quota on concurrent schema changes per org prevents a misconfigured admin script from triggering 1000 ALTER-scale operations.
- Challenges: Long-running backfills consume IOPS budget. Monitor and throttle. Concurrent index builds still take a long time on huge tables; plan for days on mega-tenants. Rollback of a partially-backfilled field requires care.
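A sketch of the batched, rate-limited backfill, assuming Spring's `JdbcTemplate`; the batch size and pacing are illustrative stand-ins for the per-org backfill quota:

```java
import org.springframework.jdbc.core.JdbcTemplate;

/** Batched backfill sketch: writes a default value into the JSONB column for rows
 *  that are still missing the new field, one short transaction at a time. */
final class FieldBackfill {
    private static final int BATCH = 10_000;

    private final JdbcTemplate jdbc;

    FieldBackfill(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    void run(long orgId, int objectId, String fieldApiName, String defaultJson)
            throws InterruptedException {
        int updated;
        do {
            // ctid batching keeps each transaction short and avoids rescanning rows
            // that already carry the key; jsonb_exists is the function form of the
            // JSONB ? operator (which would collide with JDBC placeholders).
            updated = jdbc.update("""
                UPDATE records
                   SET fields = jsonb_set(fields, ARRAY[?::text], ?::jsonb)
                 WHERE ctid IN (
                       SELECT ctid FROM records
                        WHERE org_id = ? AND object_id = ?
                          AND NOT jsonb_exists(fields, ?)
                        LIMIT ?)
                """, fieldApiName, defaultJson, orgId, objectId, fieldApiName, BATCH);
            Thread.sleep(200); // crude pacing; a real job consults the org's IOPS quota
        } while (updated == BATCH);
    }
}
```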
5) How do we handle mega-tenant isolation?
Bad Solution: Force all orgs onto shared shards.
- Approach: Every org is on a shared shard holding ~1k orgs.
- Challenges: A single mega-tenant with 1B records can saturate a shard's IOPS, cache, and disk. The other 999 orgs on that shard get degraded performance.
Good Solution: Shard rebalance based on size.
- Approach: Periodically rebalance: move big orgs to dedicated shards; consolidate small orgs.
- Challenges: Rebalancing is hours-long and risky. "Big" is fuzzy: an org with 10M rows but hot query patterns can hurt worse than one with 100M rows that is mostly cold.
Great Solution: Tiered shard assignment + proactive promotion + per-shard resource caps.
- Approach:
- Shared shards for small/medium orgs (up to 50M records).
- Dedicated shards for mega-tenants, provisioned per-org based on size, tier, or contractual SLA.
- Hot-spot detector monitors per-org IOPS, query time, row scans. An org crossing a threshold (e.g., 20% of shard IOPS for > 1 hour) is flagged for promotion (see the detector sketch after this list).
- Promotion is a live migration: new shard spun up, data copied via CDC, traffic switched with a brief writes-paused window (seconds).
- Per-shard resource caps enforce that any single org cannot consume more than, say, 50% of a shared shard's capacity. Excess queries queue or 429.
- Challenges: Live migration requires robust tooling: CDC plus read-your-writes consistency during cutover is non-trivial. Dedicated shards cost more; tie them to tier pricing. Promotion decisions benefit from ML over time (predicting growth) rather than reactive thresholds.
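A sketch of the hot-spot detector's promotion rule. `ShardMetrics` is an assumed metrics source; the 20%-for-an-hour threshold follows the bullet above:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Flags an org for dedicated-shard promotion when it holds more than a threshold
 *  share of its shard's IOPS for a sustained window. */
final class PromotionDetector {
    interface ShardMetrics { Map<Long, Double> iopsShareByOrg(String shardId); }

    private static final double IOPS_SHARE_THRESHOLD = 0.20; // 20% of shard IOPS
    private static final int SUSTAINED_SAMPLES = 60;         // 60 one-minute samples ≈ 1 hour

    private final ShardMetrics metrics;
    private final Map<Long, Integer> consecutiveBreaches = new HashMap<>();

    PromotionDetector(ShardMetrics metrics) { this.metrics = metrics; }

    /** Call once per sampling interval; returns orgs that should be promoted. */
    List<Long> sample(String shardId) {
        List<Long> flagged = new ArrayList<>();
        metrics.iopsShareByOrg(shardId).forEach((orgId, share) -> {
            if (share > IOPS_SHARE_THRESHOLD) {
                if (consecutiveBreaches.merge(orgId, 1, Integer::sum) >= SUSTAINED_SAMPLES) {
                    flagged.add(orgId);
                }
            } else {
                consecutiveBreaches.remove(orgId); // streak broken, start over
            }
        });
        return flagged;
    }
}
```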
What is Expected at Each Level?
Mid-level (SMTS-junior)
Multi-tenant tables with org_id as a column (making it the PK prefix is a bonus). Basic RBAC. Metadata cache for object definitions. Can be prompted through the EAV vs JSONB vs DDL tradeoffs.
Senior (SMTS / LMTS)
Universal data model with JSONB. Push-down RLS with pre-materialized record access table. CDC β OLAP for dashboards. Widget caching with TTL. Concrete capacity math. Metadata changes as online operations.
Staff+ (PMTS)
Mega-tenant isolation strategy (tiered shards + promotion). Online metadata evolution with backfill quotas. Cross-region consistency for geo-distributed orgs. Per-org query cost analysis (CPU per query, rows scanned per dashboard). Approximate aggregations (HLL, t-digest) for real-time widgets. Proactive index advisor service.
Salesforce-Specific Considerations
- Direct analog: this is literally the Salesforce Multitenant Kernel (Force.com's Universal Data Dictionary). Use these terms: "pivot tables," "skinny tables," "field history tracking," "Record Access table."
- Governor limit parallel: mirror SOQL row limit (50,000 rows returned per transaction) and CPU time (10 s sync) at the query planner. Queries approaching these limits get converted to async jobs or rejected.
- Sharing model chain: Organization → Profile → Permission Set → Sharing Rule → Record Access. The pre-materialized sharing table captures the output of this chain; mention it as you walk through the design.
- Shield Field Audit Trail: every record change is logged for up to 10 years. The CDC stream to cold storage is the mechanism.
- Platform Cache: Salesforce's per-org in-memory cache maps to our Widget Cache + Metadata Cache. Both sit in Redis with per-org quotas.
- Hyperforce regional isolation: the storage router maps `orgId → region → shard`. No cross-region joins on hot paths.
- Metadata API: admins use the Metadata API to deploy custom objects and fields. That API lands on the same metadata catalog we described; our query planner just reads it.
Example snippet: tenant-aware query middleware
Java:

```java
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;

public class TenantAwareDao {
    private final JdbcTemplate jdbc;
    private final MetadataCache metadataCache; // resolves (orgId, objectApiName) → object_id

    public TenantAwareDao(JdbcTemplate jdbc, MetadataCache metadataCache) {
        this.jdbc = jdbc;
        this.metadataCache = metadataCache;
    }

    public List<Record> find(TenantContext ctx, String objectApi,
                             String whereClause, Object... args) {
        int objectId = metadataCache.objectId(ctx.orgId(), objectApi);
        // whereClause comes from the trusted query planner, never from user input;
        // the org_id predicate and the record_access join are injected on every query.
        String sql = """
            SELECT record_id, fields FROM records r
            JOIN record_access a ON r.record_id = a.record_id
            WHERE r.org_id = ? AND r.object_id = ?
              AND a.principal_id = ANY(?)
              AND %s
            """.formatted(whereClause);
        Object[] bound = prepend(new Object[]{
            ctx.orgId(), objectId, ctx.principals().toArray()
        }, args); // prepend(...) and mapRow(...) are small private helpers, omitted here
        return jdbc.query(sql, bound, this::mapRow);
    }
}
```

C++:

```cpp
std::vector<Record> TenantAwareDao::Find(const TenantContext& ctx,
                                         const std::string& object_api,
                                         const std::string& where_clause,
                                         const ParamList& params) {
  int object_id = metadata_cache_.ObjectId(ctx.org_id(), object_api);
  // where_clause is produced by the trusted planner; tenant scoping is injected here.
  const std::string sql =
      "SELECT record_id, fields FROM records r "
      "JOIN record_access a ON r.record_id = a.record_id "
      "WHERE r.org_id = $1 AND r.object_id = $2 "
      "  AND a.principal_id = ANY($3) AND " + where_clause;
  ParamList bound = {ctx.org_id(), object_id, ctx.principals()};
  bound.Append(params);
  return db_.Query<Record>(sql, bound);
}
```

TypeScript:

```typescript
async function findRecords(
  ctx: TenantContext,
  objectApi: string,
  whereClause: string,
  params: unknown[]
): Promise<Record[]> {
  const objectId = metadataCache.objectId(ctx.orgId, objectApi);
  // whereClause is planner-generated (its placeholders start at $4); the org_id
  // predicate and the record_access join are injected unconditionally.
  const sql = `
    SELECT record_id, fields FROM records r
    JOIN record_access a ON r.record_id = a.record_id
    WHERE r.org_id = $1 AND r.object_id = $2
      AND a.principal_id = ANY($3)
      AND ${whereClause}
  `;
  return db.query<Record>(sql, [ctx.orgId, objectId, ctx.principals, ...params]);
}
```