
HLD: Mini Google Drive (File Storage System) ​

L4 scoping note. This is NOT "design GFS." This is "design a file storage service for ~100M users at 10 GB avg." Emphasize the split between metadata (structured, small, queryable) and blob storage (unstructured, huge, content-addressed). Chunking and resumable uploads are the meaty deep dives. Deduplication and sharing are great follow-ups. Do NOT design a consensus protocol for metadata -- use boring, reliable tech.


Understanding the Problem ​

What is Google Drive? ​

A cloud file storage service. Users upload files of arbitrary size, organize them in folders, share with others, and access from multiple devices. Think Dropbox, Google Drive, OneDrive. The interview value: it's a meaty full-stack system with interesting trade-offs in chunking, dedup, metadata/blob split, permissions, and large-file handling.

Functional Requirements ​

Core (above the line):

  1. Upload -- upload files (small and large, up to several GB), supporting resumable uploads.
  2. Download -- retrieve files by path or ID.
  3. List / navigate -- list files and folders, navigate directory tree.
  4. Delete -- remove files (soft delete / trash with eventual purge).
  5. Share -- share a file with another user (view or edit) or generate a public link.

Below the line (out of scope):

  • Real-time collaborative editing (that's Docs/Sheets, a different product)
  • Offline sync with conflict resolution on multiple devices (mention if asked)
  • Server-side file preview / thumbnail generation (separate service)
  • Full-text search across file contents (separate pipeline)
  • Versioning beyond a simple "keep last N versions"
  • Zero-knowledge end-to-end encryption (possible but out of scope for L4)

Non-Functional Requirements ​

Core:

  1. Durability -- 11 nines ("eleven 9s" = 99.999999999%). Data loss is catastrophic.
  2. Availability -- 99.9% for reads/writes. Downtime is annoying but not fatal.
  3. Scale -- 100M users * 10 GB avg = 1 EB total. Upload throughput: 10K uploads/sec peak. Download throughput: 100K/sec (read-heavy).
  4. Large file support -- individual files up to 5 GB. Resumable uploads across flaky networks.

Below the line:

  • Global sub-100ms file metadata access from every continent (mention CDN for blobs, skip deep multi-region metadata design)
  • Sub-second list consistency across devices (eventual is fine within seconds)

L4 sanity check: 1 EB is a lot of storage, but it's just a numbers game -- S3 / GCS handle it today. 10K uploads/sec and 100K downloads/sec spread across thousands of machines is routine. The architecture is conceptually simple; the rigor is in the details of each layer.
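
A quick back-of-envelope check of those numbers (illustrative only; it borrows the 4 MB chunk size and 5 GB max file size from later sections):

python
# Back-of-envelope for the scale numbers above (illustrative).
users = 100_000_000
avg_bytes_per_user = 10 * 10**9               # 10 GB
total_bytes = users * avg_bytes_per_user      # 10**18 bytes = 1 EB

chunk_size = 4 * 2**20                        # 4 MB chunks (see chunking deep dive)
max_file = 5 * 2**30                          # 5 GB max file size
chunks_per_max_file = max_file // chunk_size  # 1280 chunks

print(total_bytes / 10**18, "EB total")
print(chunks_per_max_file, "chunks for a max-size file")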


The Set Up ​

Core Entities ​

  • User -- userId, email, quotaBytes, usedBytes
  • File -- fileId, ownerId, parentFolderId, name, size, contentHash, createdAt, updatedAt, trashed
  • Folder -- folderId, ownerId, parentFolderId, name
  • Chunk -- chunkId = SHA256(contents), size, blobLocation
  • FileChunk -- join table: fileId, chunkIndex, chunkId (the ordered list of chunks making up a file)
  • Permission -- fileId, granteeUserId (or publicToken), role (viewer/editor), expiresAt

The API ​

Initiate a resumable upload:

POST /api/files/upload/init
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "report.pdf",
  "parentFolderId": "f_123",
  "size": 48234782,
  "contentHash": "sha256:abcd1234...",  // optional, enables dedup
  "mimeType": "application/pdf"
}

Response: 200 OK
{
  "uploadId": "u_xyz987",
  "uploadUrl": "https://uploads.example.com/u/xyz987",
  "chunkSize": 4194304,  // 4 MB
  "expiresAt": "2026-04-20T10:00:00Z"
}

POST because initiating creates state on the server.

Upload a chunk:

PUT /uploads/{uploadId}/chunks/{chunkIndex}
Content-Type: application/octet-stream
Content-Range: bytes 0-4194303/48234782

<binary chunk data>

Response: 200 OK
{
  "chunkIndex": 0,
  "received": true,
  "nextExpected": 1
}

Complete the upload:

POST /api/files/upload/{uploadId}/complete

Response: 201 Created
{
  "fileId": "f_new123",
  "name": "report.pdf",
  "size": 48234782,
  "contentHash": "sha256:abcd1234..."
}

Download a file:

GET /api/files/{fileId}/download
Authorization: Bearer <token>

Response: 302 Found
Location: https://<signed-blob-url>?token=...&expires=...

The response is a signed URL to the blob store (S3 presigned URL or GCS signed URL). Browser fetches the content directly -- bypasses our app servers entirely for the download bytes.

List folder contents:

GET /api/folders/{folderId}/contents?cursor=<opaque>&limit=100

Response: 200 OK
{
  "folder": { "folderId": "f_123", "name": "Projects" },
  "entries": [
    { "type": "file", "fileId": "f_abc", "name": "spec.doc", "size": 12034, "updatedAt": "..." },
    { "type": "folder", "folderId": "f_xyz", "name": "Archive" },
    ...
  ],
  "nextCursor": "...",
  "hasMore": true
}

Share a file:

POST /api/files/{fileId}/permissions
{
  "granteeEmail": "bob@example.com",
  "role": "viewer"
}

Response: 200 OK
{ "permissionId": "p_123" }

Delete (soft):

DELETE /api/files/{fileId}

Moves to trash. Permanent delete after 30 days via a background job.


High-Level Design ​

[Client] -> [CDN] -> [API gateway] -> [Metadata service] -> [PostgreSQL (metadata)]
                                              |
                                              +-> [Auth service]
                                              |
                                              +-> [Upload coordinator] -> [Blob storage (S3/GCS)]
                                                                                   ^
                                                                                   |
                                                                    [Garbage collector] (periodic)

Flow 1: Upload a file (large, resumable) ​

  1. Client calls POST /upload/init with file metadata.
  2. Metadata service creates an UploadSession row (uploadId, draft fileId, total size, received-chunks map) in Postgres.
  3. Returns uploadUrl pointing to the Upload Coordinator.
  4. Client splits the file into 4 MB chunks (a client-side sketch follows this flow). For each chunk:
    a. Compute the SHA-256 of the chunk contents.
    b. PUT /uploads/{uploadId}/chunks/{i}.
    c. Upload Coordinator:
      • Writes the chunk to blob storage at key chunks/<sha256>.
      • Updates UploadSession.receivedChunks[i] = sha256.
      • Returns 200.
  5. On network failure, client resumes from last confirmed chunk -- state is preserved on server.
  6. Once all chunks received, client calls POST /upload/{uploadId}/complete.
  7. Upload Coordinator:
    a. Validates that all chunks were received.
    b. Creates a File row in Postgres with the finalized metadata.
    c. Inserts FileChunk rows linking the file to its ordered chunks.
    d. Marks the UploadSession complete (or deletes it).
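
A minimal client-side sketch of this flow. Endpoint paths follow the API section; the GET {uploadUrl}/status call and its highestContiguous field are assumptions about the resume protocol detailed later, and the per-chunk hashing from step 4a is omitted for brevity:

python
# Hypothetical client-side resumable upload loop (illustrative sketch).
import os
import requests

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as returned by /upload/init

def upload_file(path, name, parent_folder_id, api_base, token):
    headers = {"Authorization": f"Bearer {token}"}
    size = os.path.getsize(path)

    # Steps 1-3: initiate the upload session and get the coordinator URL.
    init = requests.post(
        f"{api_base}/api/files/upload/init",
        headers=headers,
        json={"name": name, "parentFolderId": parent_folder_id, "size": size},
    ).json()
    upload_url = init["uploadUrl"]

    # Step 5: on resume, ask which chunk prefix the server already has
    # (assumed response field; -1 means nothing received yet).
    status = requests.get(f"{upload_url}/status", headers=headers).json()
    next_index = status.get("highestContiguous", -1) + 1

    # Step 4: upload the remaining 4 MB chunks in order.
    with open(path, "rb") as f:
        f.seek(next_index * CHUNK_SIZE)
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            requests.put(
                f"{upload_url}/chunks/{next_index}",
                headers={**headers, "Content-Type": "application/octet-stream"},
                data=chunk,
            ).raise_for_status()
            next_index += 1

    # Steps 6-7: finalize; server validates all chunks and creates the File row.
    done = requests.post(f"{upload_url}/complete", headers=headers).json()
    return done["fileId"]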

Flow 2: Download a file ​

  1. Client calls GET /files/{fileId}/download.
  2. Metadata service checks permissions (caller is owner or has a valid Permission row).
  3. Metadata service looks up the chunk list for the file.
  4. Two paths:
    • Simple: the app server streams the file as a single response, concatenating chunks from blob storage. Fine for small files.
    • Better: return a 302 redirect to a signed blob URL and let the browser download directly from blob storage. Saves app-server bandwidth; trivial when the file is stored as a single blob.
  5. To make signed URLs work for multi-chunk files, pick one:
    • Reassemble the chunks into a single blob at upload completion (copy-on-complete), OR
    • Return a list of signed chunk URLs and let the client concatenate them, OR
    • Run a thin "chunk concatenation proxy" that streams from the blob store to the client.

Most implementations store files as single blobs post-upload-complete (copy-on-complete) for download simplicity. Trade-off discussed below.
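
A sketch of the download handler for that common case, assuming one blob per completed file, S3-style presigned URLs, and hypothetical db helpers for the metadata lookups:

python
# Illustrative download handler for the single-blob (copy-on-complete) case.
# The bucket name, blob key layout, and db helpers are assumptions.
import boto3

s3 = boto3.client("s3")
BLOB_BUCKET = "drive-blobs"  # hypothetical bucket

def download(db, caller_id, file_id):
    file = db.get_file(file_id)                    # metadata lookup (Postgres)
    if file is None or file["trashed"]:
        return 404, None
    if not db.has_permission(caller_id, file_id):  # owner, grantee, or public token
        return 403, None

    # Signed URL so the browser fetches the bytes straight from blob storage.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BLOB_BUCKET, "Key": f"files/{file_id}"},
        ExpiresIn=600,  # 10 minutes
    )
    return 302, url  # caller sets the Location header from this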

Flow 3: List a folder ​

  1. Client calls GET /folders/{folderId}/contents.
  2. Metadata service runs SELECT * FROM files WHERE parent_folder_id = ? AND NOT trashed (plus a similar query for subfolders).
  3. Results are cursor-paginated by (updated_at, file_id); see the query sketch after this flow.
  4. Permissions checked at the folder level.
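
A sketch of the keyset-pagination query behind this listing, assuming the files table from the schema deep dive below and a psycopg-style cursor (the opaque cursor encoding is glossed over):

python
# Keyset pagination over one folder's files (illustrative).
LIST_PAGE_SQL = """
    SELECT file_id, name, size, updated_at
    FROM files
    WHERE parent_folder_id = %(folder_id)s
      AND NOT trashed
      AND (updated_at, file_id) > (%(after_updated_at)s, %(after_file_id)s)
    ORDER BY updated_at, file_id
    LIMIT %(limit)s
"""

def list_folder_page(cur, folder_id, after_updated_at, after_file_id, limit=100):
    # (after_updated_at, after_file_id) come from the decoded opaque cursor;
    # use an epoch timestamp and the nil UUID for the first page.
    cur.execute(LIST_PAGE_SQL, {
        "folder_id": folder_id,
        "after_updated_at": after_updated_at,
        "after_file_id": after_file_id,
        "limit": limit,
    })
    return cur.fetchall()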

Flow 4: Share ​

  1. Caller POSTs a permission.
  2. Metadata service inserts a Permission row.
  3. Recipient's next list/access call sees the shared file via the permission check.

Flow 5: Delete + GC ​

  1. Client DELETEs a file. File.trashed = true, File.trashedAt = now().
  2. Not immediately removed from chunk storage -- files in trash are recoverable for 30 days.
  3. Background Garbage Collector job (sketched after this flow):
    • Finds files with trashedAt < now - 30 days.
    • Deletes FileChunk rows.
    • For each chunk, checks if any other file references it (SELECT COUNT(*) FROM file_chunks WHERE chunk_id = ?). If zero, delete the chunk blob.
    • Deletes the File row.
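
A sketch of that GC pass, assuming the schema from the deep dive below; delete_blob is a hypothetical blob-store helper, and the reference-check race it leaves open is discussed in the dedup deep dive:

python
# Trash purge + chunk GC (illustrative; tables follow the schema section).
def purge_expired_trash(conn, delete_blob):
    with conn.cursor() as cur:
        cur.execute("""
            SELECT file_id FROM files
            WHERE trashed AND trashed_at < NOW() - INTERVAL '30 days'
        """)
        expired = [row[0] for row in cur.fetchall()]

    for file_id in expired:
        with conn.cursor() as cur:
            # Capture which chunks this file used, then drop its metadata.
            cur.execute("SELECT chunk_id FROM file_chunks WHERE file_id = %s", (file_id,))
            chunk_ids = [row[0] for row in cur.fetchall()]
            cur.execute("DELETE FROM file_chunks WHERE file_id = %s", (file_id,))
            cur.execute("DELETE FROM files WHERE file_id = %s", (file_id,))
        conn.commit()

        # Delete blobs nothing references anymore. (Racy against a concurrent
        # upload that re-references the chunk -- the dedup deep dive covers
        # mark-and-sweep as the safer alternative.)
        for chunk_id in chunk_ids:
            with conn.cursor() as cur:
                cur.execute("SELECT COUNT(*) FROM file_chunks WHERE chunk_id = %s", (chunk_id,))
                (refs,) = cur.fetchone()
            if refs == 0:
                delete_blob(f"chunks/{chunk_id}")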

Potential Deep Dives ​

1) Metadata service vs blob storage split ​

This is THE structural decision. Articulate it early.

Good Solution: Split ​

  • Metadata in PostgreSQL: small, structured, queryable. Files, folders, permissions, upload sessions. Tens of KB of metadata per user -- orders of magnitude smaller than the file bytes. A single Postgres cluster (with read replicas, sharding later) handles it.
  • Blob storage in S3 / GCS (or an internal blob store like Colossus at Google): the bytes. Huge. Key-value access pattern only.
  • Why: they have totally different access patterns. SQL is great at joins and filtering; it's terrible at storing EB-scale bytes. S3 is great at EB of bytes; it's terrible at "find all files Bob can edit."

Challenges ​

  • Two systems to keep in sync. If the blob write succeeds but the metadata write fails, you have an orphan chunk. Periodic GC handles it.
  • Transactional guarantees are weaker than single-DB. Accept it; this is the industry standard pattern.

2) Chunking: why and how big? ​

Bad Solution: Single-blob per file ​

  • Approach: Upload files as a single blob. Done.
  • Challenges: Resume after failure means re-uploading the whole file. A 2 GB file over flaky wifi is painful. Dedup impossible at sub-file granularity.

Good Solution: Fixed-size chunks (4 MB) ​

  • Approach: Split files into 4 MB chunks. Each chunk uploaded independently. Metadata tracks ordered chunk list.
  • Why 4 MB? Small enough that retrying a failed chunk upload is cheap. Large enough that per-chunk overhead (HTTP headers, auth, etc.) is a small fraction of the transfer. It's in the same ballpark as S3 multipart part sizes and GCS resumable-upload chunk sizes.
  • Pros: Resumable by chunk. Parallel chunk upload possible. Dedup at chunk level.

Great Solution: Fixed 4 MB chunks + content-addressable storage ​

  • Approach: Name each chunk by its content hash (SHA-256). On upload, if the chunk hash already exists in blob storage, skip the upload ("already have it"). This is dedup.
  • Savings: If 1000 users upload the same corporate template, we store it ONCE.
  • Reality check: Dropbox-style systems have reported storage savings on the order of 30-50% from dedup on typical corporate workloads.

L4 note: Mention variable-size chunking (content-defined chunking via rolling hash like rsync / Rabin fingerprint) as an advanced option for deduplicating near-identical files. Don't go deeper.

3) Resumable upload protocol ​

The problem ​

A user's network drops mid-upload of a 2 GB file. We must NOT make them start over.

Good Solution: Google Resumable Upload-style protocol ​

  • Init: client calls POST /upload/init, gets an uploadId and uploadUrl.
  • Upload chunks: PUT {uploadUrl}/chunks/{index} with Content-Range header (standard HTTP). Server stores which chunks received.
  • Resume: on reconnect, client sends GET {uploadUrl}/status and server responds with the highest contiguous chunk received. Client continues from there.
  • Complete: POST {uploadUrl}/complete finalizes. Server validates all chunks present.
  • Expiry: upload sessions expire after 24 hours -- GC unfinished sessions and their orphan chunks.

State storage ​

  • UploadSession row in Postgres: uploadId, userId, fileName, size, receivedChunks (bitmap or list of indices), expiresAt.
  • For large files with many chunks, use a bitmap for efficient storage.
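
A minimal sketch of that bitmap (a 5 GB file at 4 MB chunks is 1280 chunks, so the bitmap is only 160 bytes):

python
# Received-chunks bitmap for an upload session (illustrative).
def empty_bitmap(total_chunks: int) -> bytearray:
    return bytearray((total_chunks + 7) // 8)

def mark_received(bitmap: bytearray, index: int) -> None:
    bitmap[index // 8] |= 1 << (index % 8)

def is_received(bitmap: bytearray, index: int) -> bool:
    return bool(bitmap[index // 8] & (1 << (index % 8)))

def highest_contiguous(bitmap: bytearray, total_chunks: int) -> int:
    """What /status can report: last index of the unbroken prefix, -1 if none."""
    for i in range(total_chunks):
        if not is_received(bitmap, i):
            return i - 1
    return total_chunks - 1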

Challenges ​

  • Idempotency: a retry of chunk N must not overwrite existing chunk N with different content. Use the chunk hash as the blob key -- retries converge to the same blob.
  • Partial failures at finalize: if complete fails after inserting File row but before inserting all FileChunk rows, we have a half-state. Use a single transaction.

4) Deduplication design ​

Good Solution: Chunk-level content-addressing ​

  • Before uploading a chunk, client computes SHA-256 locally and asks HEAD /chunks/{hash}.
  • If blob exists, skip upload. Client proceeds as if chunk was uploaded successfully.
  • Metadata records the reference.
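
A sketch of that client-side check; the exact path of the HEAD endpoint and its 200/404 semantics are assumptions:

python
# Client-side chunk dedup check (illustrative).
import hashlib
import requests

def upload_chunk_with_dedup(api_base, upload_url, index, chunk_bytes, headers):
    chunk_hash = hashlib.sha256(chunk_bytes).hexdigest()

    # Ask whether a chunk with this content hash is already stored.
    exists = requests.head(f"{api_base}/chunks/{chunk_hash}", headers=headers)
    if exists.status_code == 200:
        return chunk_hash  # blob already there; just record the reference

    resp = requests.put(
        f"{upload_url}/chunks/{index}",
        headers={**headers, "Content-Type": "application/octet-stream"},
        data=chunk_bytes,
    )
    resp.raise_for_status()
    return chunk_hash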

Challenges ​

  • Reference counting: when a file is deleted, we must not delete chunk blobs that other files still reference. Options:
    • Reference-counted blobs: on each add/remove, update a counter. Race conditions possible without a transactional store.
    • Mark-and-sweep GC: periodically scan metadata for all referenced chunks, delete blobs not referenced. Slower but simpler. Used by large systems.
  • Security concern: content-addressing means anyone who happens to have the same file content gets a "free" upload. That's fine -- they already had the content. But cross-account dedup can leak presence info ("is this file on the system?"). Usually scope dedup within an account.

For L4, per-account dedup with mark-and-sweep GC is the safe answer.

5) Database schema and scaling ​

Schema sketch ​

sql
CREATE TABLE users (
  user_id        UUID PRIMARY KEY,
  email          TEXT UNIQUE NOT NULL,
  used_bytes     BIGINT DEFAULT 0,
  quota_bytes    BIGINT DEFAULT 16106127360  -- 15 GB
);

CREATE TABLE files (
  file_id           UUID PRIMARY KEY,
  owner_id          UUID REFERENCES users,
  parent_folder_id  UUID,
  name              TEXT NOT NULL,
  size              BIGINT,
  content_hash      TEXT,
  mime_type         TEXT,
  trashed           BOOLEAN DEFAULT FALSE,
  trashed_at        TIMESTAMP,
  created_at        TIMESTAMP DEFAULT NOW(),
  updated_at        TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON files (parent_folder_id, updated_at) WHERE NOT trashed;  -- supports folder listing + cursor pagination
CREATE INDEX ON files (owner_id, updated_at);

CREATE TABLE file_chunks (
  file_id     UUID,
  chunk_index INT,
  chunk_id    TEXT,  -- SHA-256 hex
  PRIMARY KEY (file_id, chunk_index)
);

CREATE TABLE permissions (
  permission_id  UUID PRIMARY KEY,
  file_id        UUID,
  grantee_id     UUID,   -- nullable if public link
  public_token   TEXT,   -- nullable
  role           TEXT,   -- 'viewer' | 'editor'
  expires_at     TIMESTAMP
);
CREATE INDEX ON permissions (grantee_id);
CREATE INDEX ON permissions (file_id);

CREATE TABLE upload_sessions (
  upload_id       UUID PRIMARY KEY,
  user_id         UUID,
  file_name       TEXT,
  size            BIGINT,
  received_chunks BYTEA,  -- bitmap
  expires_at      TIMESTAMP
);

Scaling strategy ​

  • Start: single Postgres cluster (primary + read replicas). Handles millions of users.
  • Sharding: shard by user_id. Each shard owns all files/folders/permissions for a range of user IDs. Cross-user queries (shared-with-me) become scatter-gather across shards. Mention this as a later step, once a single primary can't keep up.
  • Partition key choice: sharding by user works because the most common query patterns are scoped to a user. "Shared with me" is the tricky query -- one option is to materialize a separate "shared_with_me" table per user that's written when a permission is granted.
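
A sketch of that materialization, assuming a shared_with_me(grantee_id, file_id, owner_id, role) table on each shard and a hypothetical shard router; in practice the second write goes through an outbox or async worker so a crash between the two inserts can be repaired:

python
# Materialized "shared with me" (illustrative). On grant, also write a row into
# the recipient's shard so their listing never fans out across shards.
def grant_share(shards, owner_id, grantee_id, file_id, role):
    # 1. Source of truth: permission row on the file owner's shard.
    with shards.for_user(owner_id).cursor() as cur:   # hypothetical shard router
        cur.execute(
            "INSERT INTO permissions (permission_id, file_id, grantee_id, role) "
            "VALUES (gen_random_uuid(), %s, %s, %s)",
            (file_id, grantee_id, role),
        )
    # 2. Denormalized copy on the recipient's shard.
    with shards.for_user(grantee_id).cursor() as cur:
        cur.execute(
            "INSERT INTO shared_with_me (grantee_id, file_id, owner_id, role) "
            "VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING",
            (grantee_id, file_id, owner_id, role),
        )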

6) Permissions model ​

Good Solution: Direct permission rows ​

  • Each share = one row: (file_id, grantee_id, role).
  • On access, check: owner? grantee? Public link with valid token?
  • Indexed on both file_id and grantee_id.

Great Solution: Inheritance + explicit override ​

  • Folders have permissions that children inherit. Changing a folder's permission effectively updates all of its children.
  • Permission rows stored on the nearest ancestor; access check walks up the folder tree (bounded by tree depth, usually small).
  • Explicit overrides at the file level beat inherited permissions.

For L4, direct-per-file is fine and widely used. Mention inheritance as an extension.
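
A sketch of the access check with inheritance, walking up the folder tree when no explicit file-level permission exists (the db helpers are assumptions):

python
# Access check with folder inheritance (illustrative; db helpers are assumed).
ROLE_RANK = {"viewer": 1, "editor": 2}

def can_access(db, caller_id, file_id, needed_role="viewer"):
    file = db.get_file(file_id)
    if file is None:
        return False
    if file["owner_id"] == caller_id:
        return True

    # An explicit permission on the file itself beats anything inherited.
    role = db.get_permission_role(file_id, caller_id)
    if role is not None:
        return ROLE_RANK[role] >= ROLE_RANK[needed_role]

    # Otherwise walk up the folder tree; depth is small in practice.
    folder_id = file["parent_folder_id"]
    while folder_id is not None:
        role = db.get_permission_role(folder_id, caller_id)
        if role is not None:
            return ROLE_RANK[role] >= ROLE_RANK[needed_role]
        folder_id = db.get_folder(folder_id)["parent_folder_id"]
    return False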

Public sharing ​

  • A Permission row with public_token (random string) and no grantee_id.
  • URL format: https://drive.example.com/p/{public_token}.
  • Revoke by deleting the row.

7) Consistency concerns ​

  • Blob before metadata: always write chunks to blob storage first, THEN write metadata. If metadata write fails, the orphan chunk is cleaned up by GC (content-addressed, might even be reused by another file).
  • Listings: eventually consistent with uploads. A file just uploaded might not appear in listings for a few seconds if we have read replicas. Usually acceptable.
  • Strong consistency for permission revocation: when a share is revoked, the check must reflect it immediately. Read from primary, not replica, for permission checks.

8) Bandwidth and CDN ​

  • Downloads are huge volume. App servers should NEVER stream bytes if avoidable.
  • Use signed URLs to blob storage so the browser downloads directly.
  • For public files, CDN in front of the blob store (CloudFront / Cloud CDN). Cache-Control aligned with file mutability.

9) What NOT to design at L4 ​

  • Don't design your own blob store (GFS / Colossus). Use GCS / S3 as a given.
  • Don't design geo-replication for EB-scale data -- blob stores already do this.
  • Don't design a real-time sync protocol -- that's Google Drive desktop client, separate concern.
  • Don't do end-to-end encryption unless explicitly asked; it complicates sharing, search, and dedup significantly.

What is Expected at Each Level ​

L3 / Mid-level ​

  • Separate metadata and blob storage.
  • Basic upload/download/list APIs.
  • Might propose simple single-blob upload without resumable protocol.
  • Permissions as a simple table.

L4 ​

  • Chunk-based storage with resumable upload.
  • Content-addressed chunks for deduplication.
  • Signed URLs for download (app servers don't stream bytes).
  • Schema with appropriate indexes, back-of-envelope on row counts.
  • Soft delete + GC for chunks.
  • Discussion of the consistency model (orphan chunks, metadata-vs-blob ordering).
  • Permission checks with indexed lookup on both directions.

L5 / Senior ​

  • Variable-size / content-defined chunking for cross-file dedup.
  • Multi-region strategy: regional metadata primary with async replication, blob store already multi-region.
  • Reference counting vs mark-sweep trade-offs in GC.
  • Schema for large-scale sharing: materialized "shared with me" views, handling the fan-out query.
  • Operational concerns: EB-scale cost accounting, storage-class tiering (hot/cold/archive), quota enforcement race conditions.
  • Client sync protocol (events, tokens for resumable sync).
  • E2E encryption trade-offs (no server-side preview, no dedup, harder sharing).
