HLD: Mini Google Drive (File Storage System)
L4 scoping note. This is NOT "design GFS." This is "design a file storage service for ~100M users at 10 GB avg." Emphasize the split between metadata (structured, small, queryable) and blob storage (unstructured, huge, content-addressed). Chunking and resumable uploads are the meaty deep dives. Deduplication and sharing are great follow-ups. Do NOT design a consensus protocol for metadata -- use boring, reliable tech.
Understanding the Problem
What is Google Drive?
A cloud file storage service. Users upload files of arbitrary size, organize them in folders, share with others, and access from multiple devices. Think Dropbox, Google Drive, OneDrive. The interview value: it's a meaty full-stack system with interesting trade-offs in chunking, dedup, metadata/blob split, permissions, and large-file handling.
Functional Requirements
Core (above the line):
- Upload -- upload files (small and large, up to several GB), supporting resumable uploads.
- Download -- retrieve files by path or ID.
- List / navigate -- list files and folders, navigate directory tree.
- Delete -- remove files (soft delete / trash with eventual purge).
- Share -- share a file with another user (view or edit) or generate a public link.
Below the line (out of scope):
- Real-time collaborative editing (that's Docs/Sheets, a different product)
- Offline sync with conflict resolution on multiple devices (mention if asked)
- Server-side file preview / thumbnail generation (separate service)
- Full-text search across file contents (separate pipeline)
- Versioning beyond a simple "keep last N versions"
- Zero-knowledge end-to-end encryption (possible but out of scope for L4)
Non-Functional Requirements
Core:
- Durability -- 11 nines ("eleven 9s" = 99.999999999%). Data loss is catastrophic.
- Availability -- 99.9% for reads/writes. Downtime is annoying but not fatal.
- Scale -- 100M users * 10 GB avg = 1 EB total. Upload throughput: 10K uploads/sec peak. Download throughput: 100K/sec (read-heavy).
- Large file support -- individual files up to 5 GB. Resumable uploads across flaky networks.
Below the line:
- Global sub-100ms file metadata access from every continent (mention CDN for blobs, skip deep multi-region metadata design)
- Sub-second list consistency across devices (eventual is fine within seconds)
L4 sanity check: 1 EB is a lot of storage, but it's a numbers game -- S3 / GCS store this much today. 10K uploads/sec and 100K downloads/sec spread across thousands of machines is routine. The architecture is conceptually simple; the rigor is in the details of each layer.
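A quick back-of-envelope check of those numbers (the average upload size used for the chunk-rate estimate is an assumption for illustration):

```python
# Sanity-check the stated scale. Figures marked ASSUMPTION are illustrative only.
users = 100_000_000                 # 100M users (requirement)
avg_bytes_per_user = 10 * 10**9     # 10 GB average (requirement)
print(users * avg_bytes_per_user / 10**18)   # 1.0 -> ~1 EB of blob storage

uploads_per_sec = 10_000            # peak uploads/sec (requirement)
chunk_size = 4 * 2**20              # 4 MB chunks
avg_upload_bytes = 50 * 2**20       # ASSUMPTION: ~50 MB average upload
print(uploads_per_sec * avg_upload_bytes // chunk_size)  # ~125,000 chunk PUTs/sec at peak
```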
The Set Up
Core Entities
| Entity | Description |
|---|---|
| User | userId, email, quotaBytes, usedBytes |
| File | fileId, ownerId, parentFolderId, name, size, contentHash, createdAt, updatedAt, trashed |
| Folder | folderId, ownerId, parentFolderId, name |
| Chunk | chunkId = SHA256(contents), size, blobLocation |
| FileChunk | Join table: fileId, chunkIndex, chunkId -- the ordered list of chunks making up a file |
| Permission | fileId, granteeUserId (or publicToken), role (viewer/editor), expiresAt |
The API
Initiate a resumable upload:
POST /api/files/upload/init
Authorization: Bearer <token>
Content-Type: application/json
{
"name": "report.pdf",
"parentFolderId": "f_123",
"size": 48234782,
"contentHash": "sha256:abcd1234...", // optional, enables dedup
"mimeType": "application/pdf"
}
Response: 200 OK
{
"uploadId": "u_xyz987",
"uploadUrl": "https://uploads.example.com/u/xyz987",
"chunkSize": 4194304, // 4 MB
"expiresAt": "2026-04-20T10:00:00Z"
}
POST because initiating creates state on the server.
Upload a chunk:
PUT /uploads/{uploadId}/chunks/{chunkIndex}
Content-Type: application/octet-stream
Content-Range: bytes 0-4194303/48234782
<binary chunk data>
Response: 200 OK
{
"chunkIndex": 0,
"received": true,
"nextExpected": 1
}
Complete the upload:
POST /api/files/upload/{uploadId}/complete
Response: 201 Created
{
"fileId": "f_new123",
"name": "report.pdf",
"size": 48234782,
"contentHash": "sha256:abcd1234..."
}
Download a file:
GET /api/files/{fileId}/download
Authorization: Bearer <token>
Response: 302 Found
Location: https://<signed-blob-url>?token=...&expires=...
The response is a signed URL to the blob store (S3 presigned URL or GCS signed URL). The browser fetches the content directly -- bypassing our app servers entirely for the download bytes.
List folder contents:
GET /api/folders/{folderId}/contents?cursor=<opaque>&limit=100
Response: 200 OK
{
"folder": { "folderId": "f_123", "name": "Projects" },
"entries": [
{ "type": "file", "fileId": "f_abc", "name": "spec.doc", "size": 12034, "updatedAt": "..." },
{ "type": "folder", "folderId": "f_xyz", "name": "Archive" },
...
],
"nextCursor": "...",
"hasMore": true
}
Share a file:
POST /api/files/{fileId}/permissions
{
"granteeEmail": "bob@example.com",
"role": "viewer"
}
Response: 200 OK
{ "permissionId": "p_123" }
Delete (soft):
DELETE /api/files/{fileId}
Moves to trash. Permanent delete after 30 days via a background job.
High-Level Design
[Client] -> [CDN] -> [API gateway] -> [Metadata service] -> [PostgreSQL (metadata)]
                         |
                         +----------> [Auth service]
                         |
                         +----------> [Upload coordinator] -> [Blob storage (S3/GCS)]
                                                                       ^
                                                                       |
                                                         [Garbage collector] (periodic)
Flow 1: Upload a file (large, resumable)
1. Client calls POST /upload/init with file metadata.
2. Metadata service creates an UploadSession row (uploadId, draft fileId, total size, received-chunks map) in Postgres.
3. Returns an uploadUrl pointing to the Upload Coordinator.
4. Client splits the file into 4 MB chunks. For each chunk (client sketch below):
   a. Compute SHA-256 of the chunk contents.
   b. PUT /uploads/{uploadId}/chunks/{i}.
   c. Upload Coordinator:
      - Writes the chunk to blob storage at key chunks/<sha256>.
      - Updates UploadSession.receivedChunks[i] = sha256.
      - Returns 200.
5. On network failure, the client resumes from the last confirmed chunk -- state is preserved on the server.
6. Once all chunks are received, the client calls POST /upload/{uploadId}/complete.
7. Upload Coordinator:
   a. Validates all chunks received.
   b. Creates a File row in Postgres with finalized metadata.
   c. Inserts FileChunk rows linking the file to its ordered chunks.
   d. Marks the UploadSession complete (or deletes it).
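A minimal client-side sketch of this flow, assuming the endpoints and response fields shown in the API section. The /status response shape and the X-Chunk-SHA256 header are assumptions for illustration, not part of the spec above.

```python
# Client-side resumable chunked upload (Flow 1), sketched with the `requests` library.
import hashlib
import os
import requests

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, matching the chunkSize returned by /upload/init

def upload_file(base_url: str, token: str, path: str, parent_folder_id: str) -> str:
    headers = {"Authorization": f"Bearer {token}"}
    size = os.path.getsize(path)

    # 1. Initiate the upload session.
    init = requests.post(
        f"{base_url}/api/files/upload/init",
        headers=headers,
        json={"name": os.path.basename(path), "parentFolderId": parent_folder_id, "size": size},
    ).json()
    upload_url = init["uploadUrl"]

    # 2. Ask where to (re)start -- "nextExpected" is an assumed field name.
    status = requests.get(f"{upload_url}/status", headers=headers).json()
    index = status.get("nextExpected", 0)

    # 3. Upload chunks with Content-Range; retries/resumes simply continue from `index`.
    with open(path, "rb") as f:
        f.seek(index * CHUNK_SIZE)
        while chunk := f.read(CHUNK_SIZE):
            start = index * CHUNK_SIZE
            end = start + len(chunk) - 1
            requests.put(
                f"{upload_url}/chunks/{index}",
                headers={**headers,
                         "Content-Type": "application/octet-stream",
                         "Content-Range": f"bytes {start}-{end}/{size}",
                         # Hypothetical header: how the chunk hash is transported isn't specified above.
                         "X-Chunk-SHA256": hashlib.sha256(chunk).hexdigest()},
                data=chunk,
            ).raise_for_status()
            index += 1

    # 4. Finalize: server validates all chunks and creates the File row.
    done = requests.post(f"{upload_url}/complete", headers=headers).json()
    return done["fileId"]
```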
Flow 2: Download a file
1. Client calls GET /files/{fileId}/download.
2. Metadata service checks permissions (caller is the owner or has a valid Permission row).
3. Metadata service looks up the chunk list for the file.
4. Two paths:
   - Simple (small files): stream the file as a single response, concatenating chunks from blob storage.
   - Better (large files): where possible -- single-chunk files, or clients that can handle it -- return a 302 redirect to a signed blob URL. The browser downloads directly from blob storage, saving app-server bandwidth.
5. For multi-chunk files behind signed URLs, either:
   - Reassemble chunks into a single blob at upload completion (copy-on-complete), or
   - Return a list of signed URLs for the chunks and let the client concatenate, or
   - Use a tiny "chunk concatenation proxy" that streams from the blob store to the client.
Most implementations store files as single blobs post-upload-complete (copy-on-complete) for download simplicity. Trade-off discussed below.
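For the redirect path, a sketch of generating the signed URL with boto3, assuming copy-on-complete (one blob per file) and an illustrative bucket/key layout that is not part of the design above:

```python
# Build a short-lived presigned S3 URL for the 302-redirect download path.
import boto3

s3 = boto3.client("s3")

def download_redirect_url(file_row: dict, ttl_seconds: int = 300) -> str:
    """Return a presigned URL the API server can 302-redirect the client to."""
    return s3.generate_presigned_url(
        "get_object",
        Params={
            "Bucket": "drive-blobs",                 # assumed bucket name
            "Key": f"files/{file_row['file_id']}",   # assumed copy-on-complete key layout
            "ResponseContentDisposition": f"attachment; filename=\"{file_row['name']}\"",
        },
        ExpiresIn=ttl_seconds,
    )
```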
Flow 3: List a folder
1. Client calls GET /folders/{folderId}/contents.
2. Metadata service queries SELECT * FROM files WHERE parent_folder_id = ? AND NOT trashed, and similarly for folders.
3. Results are cursor-paginated by (updated_at, fileId).
4. Permissions are checked at the folder level.
Flow 4: Share
1. Caller POSTs a permission.
2. Metadata service inserts a Permission row.
3. The recipient's next list/access call sees the shared file via the permission check.
Flow 5: Delete + GC
1. Client DELETEs a file. File.trashed = true, File.trashedAt = now().
2. The file is not immediately removed from chunk storage -- files in trash are recoverable for 30 days.
3. A background Garbage Collector job:
   - Finds files with trashedAt < now - 30 days.
   - Deletes their FileChunk rows.
   - For each chunk, checks whether any other file still references it (SELECT COUNT(*) FROM file_chunks WHERE chunk_id = ?). If zero, deletes the chunk blob.
   - Deletes the File row.
Potential Deep Dives
1) Metadata service vs blob storage split
This is THE structural decision. Articulate it early.
Good Solution: Split
- Metadata in PostgreSQL: small, structured, queryable. Files, folders, permissions, upload sessions. Tens of KB per user -- orders of magnitude smaller than the blob bytes. A single Postgres cluster handles early scale; shard by user later (deep dive 5).
- Blob storage in S3 / GCS (or a Colossus-backed blob store if internal at Google): the bytes. Huge. Key-value access pattern only.
- Why: they have totally different access patterns. SQL is great at joins and filtering; it's terrible at storing EB-scale bytes. S3 is great at EB of bytes; it's terrible at "find all files Bob can edit."
Challenges
- Two systems to keep in sync. If the blob write succeeds but the metadata write fails, you have an orphan chunk. Periodic GC handles it.
- Transactional guarantees are weaker than single-DB. Accept it; this is the industry standard pattern.
2) Chunking: why and how big?
Bad Solution: Single-blob per file
- Approach: Upload files as a single blob. Done.
- Challenges: Resume after failure means re-uploading the whole file. A 2 GB file over flaky wifi is painful. Dedup impossible at sub-file granularity.
Good Solution: Fixed-size chunks (4 MB)
- Approach: Split files into 4 MB chunks. Each chunk is uploaded independently. Metadata tracks the ordered chunk list.
- Why 4 MB? Small enough that a failed chunk upload is cheap to retry. Large enough that per-chunk overhead (HTTP headers, auth, etc.) is a small fraction of the transfer. In the same ballpark as S3 / GCS multipart upload part sizes.
- Pros: Resumable by chunk. Parallel chunk upload possible. Dedup at chunk level.
Great Solution: Fixed 4 MB chunks + content-addressable storage
- Approach: Name each chunk by its content hash (SHA-256). On upload, if the chunk hash already exists in blob storage, skip the upload ("already have it"). This is dedup (sketched below).
- Savings: If 1000 users upload the same corporate template, we store it ONCE.
- Reality: Google- and Dropbox-class systems have reported 30-50% storage savings from dedup on typical corporate workloads.
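A small sketch of the fixed-size, content-addressed chunking described above. The chunks/<hash> key prefix follows the upload flow; the helper name is illustrative.

```python
# Split a file into fixed 4 MB chunks and derive a content-addressed blob key per chunk.
import hashlib
from typing import Iterator, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB

def chunk_keys(path: str) -> Iterator[Tuple[int, str, bytes]]:
    """Yield (chunk_index, blob_key, chunk_bytes) for each fixed-size chunk."""
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            digest = hashlib.sha256(chunk).hexdigest()
            yield index, f"chunks/{digest}", chunk
            index += 1

# Identical chunks in different files (or different users' files) map to the same
# blob key, which turns dedup into a "does this key already exist?" check.
```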
L4 note: Mention variable-size chunking (content-defined chunking via rolling hash like rsync / Rabin fingerprint) as an advanced option for deduplicating near-identical files. Don't go deeper.
3) Resumable upload protocol
The problem
A user's network drops mid-upload of a 2 GB file. We must NOT make them start over.
Good Solution: Google Resumable Upload-style protocol
- Init: client calls POST /upload/init, gets an uploadId and uploadUrl.
- Upload chunks: PUT {uploadUrl}/chunks/{index} with a Content-Range header (standard HTTP). The server records which chunks have been received.
- Resume: on reconnect, the client sends GET {uploadUrl}/status and the server responds with the highest contiguous chunk received. The client continues from there.
- Complete: POST {uploadUrl}/complete finalizes. The server validates that all chunks are present.
- Expiry: upload sessions expire after 24 hours -- GC unfinished sessions and their orphan chunks.
State storage
- UploadSession row in Postgres: uploadId, userId, fileName, size, receivedChunks (bitmap or list of indices), expiresAt.
- For large files with many chunks, use a bitmap for efficient storage (sketched below).
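A sketch of that receivedChunks bitmap: one bit per chunk index, storable in the BYTEA column. Helper names are illustrative.

```python
# Bitmap bookkeeping for received chunks (a 2 GB file at 4 MB chunks = 512 bits = 64 bytes).
def mark_received(bitmap: bytearray, chunk_index: int) -> None:
    byte, bit = divmod(chunk_index, 8)
    if byte >= len(bitmap):
        bitmap.extend(b"\x00" * (byte - len(bitmap) + 1))
    bitmap[byte] |= 1 << bit

def is_received(bitmap: bytes, chunk_index: int) -> bool:
    byte, bit = divmod(chunk_index, 8)
    return byte < len(bitmap) and bool(bitmap[byte] & (1 << bit))

def resume_index(bitmap: bytes, total_chunks: int) -> int:
    """Index of the first missing chunk -- where a resumed upload should continue."""
    for i in range(total_chunks):
        if not is_received(bitmap, i):
            return i
    return total_chunks
```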
Challenges
- Idempotency: a retry of chunk N must not overwrite existing chunk N with different content. Use the chunk hash as the blob key -- retries converge to the same blob.
- Partial failures at finalize: if complete fails after inserting the File row but before inserting all FileChunk rows, we are left in a half-written state. Use a single transaction (sketched below).
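A sketch of that single finalize transaction, assuming a psycopg2-style connection and the table names from the schema in deep dive 5 below; the function shape is illustrative.

```python
# Finalize an upload atomically: File row + FileChunk rows + session cleanup commit together,
# so a crash mid-way cannot leave a File row without its chunk list.
def complete_upload(conn, upload_id: str, file_row: dict, chunk_hashes: list[str]) -> None:
    with conn:  # psycopg2: commits on success, rolls back on exception
        with conn.cursor() as cur:
            cur.execute(
                """INSERT INTO files (file_id, owner_id, parent_folder_id, name, size, content_hash)
                   VALUES (%(file_id)s, %(owner_id)s, %(parent_folder_id)s,
                           %(name)s, %(size)s, %(content_hash)s)""",
                file_row,
            )
            for index, chunk_hash in enumerate(chunk_hashes):
                cur.execute(
                    "INSERT INTO file_chunks (file_id, chunk_index, chunk_id) VALUES (%s, %s, %s)",
                    (file_row["file_id"], index, chunk_hash),
                )
            cur.execute("DELETE FROM upload_sessions WHERE upload_id = %s", (upload_id,))
```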
4) Deduplication design
Good Solution: Chunk-level content-addressing
- Before uploading a chunk, the client computes its SHA-256 locally and asks HEAD /chunks/{hash}.
- If the blob exists, skip the upload. The client proceeds as if the chunk was uploaded successfully.
- Metadata records the reference.
Challenges
- Reference counting: when a file is deleted, we must not delete chunk blobs that other files still reference. Options:
- Reference-counted blobs: on each add/remove, update a counter. Race conditions possible without a transactional store.
- Mark-and-sweep GC: periodically scan metadata for all referenced chunks, delete blobs not referenced. Slower but simpler. Used by large systems.
- Security concern: content-addressing means anyone who happens to have the same file content gets a "free" upload. That's fine -- they already had the content. But cross-account dedup can leak presence info ("is this file on the system?"). Usually scope dedup within an account.
For L4, per-account dedup with mark-and-sweep GC is the safe answer.
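A sketch of mark-and-sweep chunk GC under those assumptions. Table names follow the schema in the next deep dive; the blob-store listing interface and the grace period are illustrative, not a specific API.

```python
# Mark every chunk_id still referenced in metadata, then sweep unreferenced blobs that are
# old enough to be outside any in-flight upload window.
from datetime import datetime, timedelta, timezone

GRACE = timedelta(days=1)  # don't sweep blobs newer than this (may belong to in-flight uploads)

def mark_referenced_chunks(cur) -> set[str]:
    cur.execute("SELECT DISTINCT chunk_id FROM file_chunks")
    return {row[0] for row in cur.fetchall()}

def sweep(blob_store, cur) -> int:
    referenced = mark_referenced_chunks(cur)
    now = datetime.now(timezone.utc)
    deleted = 0
    for key, last_modified in blob_store.list("chunks/"):   # assumed blob-store interface
        chunk_id = key.removeprefix("chunks/")
        if chunk_id not in referenced and now - last_modified > GRACE:
            blob_store.delete(key)                          # assumed blob-store interface
            deleted += 1
    return deleted
```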
5) Database schema and scaling
Schema sketch
CREATE TABLE users (
user_id UUID PRIMARY KEY,
email TEXT UNIQUE NOT NULL,
used_bytes BIGINT DEFAULT 0,
quota_bytes BIGINT DEFAULT 16106127360 -- 15 GB
);
CREATE TABLE files (
file_id UUID PRIMARY KEY,
owner_id UUID REFERENCES users,
parent_folder_id UUID,
name TEXT NOT NULL,
size BIGINT,
content_hash TEXT,
mime_type TEXT,
trashed BOOLEAN DEFAULT FALSE,
trashed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX ON files (parent_folder_id) WHERE NOT trashed;
CREATE INDEX ON files (owner_id, updated_at);
CREATE TABLE file_chunks (
file_id UUID,
chunk_index INT,
chunk_id TEXT, -- SHA-256 hex
PRIMARY KEY (file_id, chunk_index)
);
CREATE TABLE permissions (
permission_id UUID PRIMARY KEY,
file_id UUID,
grantee_id UUID, -- nullable if public link
public_token TEXT, -- nullable
role TEXT, -- 'viewer' | 'editor'
expires_at TIMESTAMP
);
CREATE INDEX ON permissions (grantee_id);
CREATE INDEX ON permissions (file_id);
CREATE TABLE upload_sessions (
upload_id UUID PRIMARY KEY,
user_id UUID,
file_name TEXT,
size BIGINT,
received_chunks BYTEA, -- bitmap
expires_at TIMESTAMP
);
Scaling strategy
- Start: single Postgres cluster (primary + read replicas). Handles millions of users.
- Sharding: shard by user_id. Each shard owns all files/folders/permissions for a user-id range. Cross-user queries (shared-with-me) become a federation. Mention this as a later step, once the primary can't keep up (routing sketched below).
- Partition key choice: sharding by user works because the most common query patterns are scoped to a user. "Shared with me" is the tricky query -- one option is to materialize a separate "shared_with_me" table per user, written when a permission is granted.
6) Permissions model
Good Solution: Direct permission rows
- Each share = one row: (file_id, grantee_id, role).
- On access, check: owner? grantee? public link with valid token? (sketched below)
- Indexed on both file_id and grantee_id.
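A sketch of that access check against the schema in deep dive 5; the function shape and parameters are illustrative.

```python
# Access check for the direct-permission model: owner, explicit grantee, or valid public token.
def can_access(cur, file_id: str, user_id: str | None, public_token: str | None, need_edit: bool) -> bool:
    cur.execute("SELECT owner_id FROM files WHERE file_id = %s AND NOT trashed", (file_id,))
    row = cur.fetchone()
    if row is None:
        return False
    if user_id is not None and row[0] == user_id:
        return True  # owners can always read and write

    roles = ("editor",) if need_edit else ("viewer", "editor")
    cur.execute(
        """SELECT 1 FROM permissions
           WHERE file_id = %s
             AND role IN %s
             AND (expires_at IS NULL OR expires_at > NOW())
             AND (grantee_id = %s OR public_token = %s)
           LIMIT 1""",
        (file_id, roles, user_id, public_token),
    )
    return cur.fetchone() is not None
```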
Great Solution: Inheritance + explicit override
- Folders have permissions that children inherit. Editing a folder's permission updates all children in effect.
- Permission rows are stored on the nearest ancestor; the access check walks up the folder tree (bounded by tree depth, usually small).
- Explicit overrides at the file level beat inherited permissions.
For L4, direct-per-file is fine and widely used. Mention inheritance as an extension.
Public sharing
- A Permission row with a public_token (random string) and no grantee_id.
- URL format: https://drive.example.com/p/{public_token}.
- Revoke by deleting the row.
7) Consistency concerns
- Blob before metadata: always write chunks to blob storage first, THEN write metadata. If metadata write fails, the orphan chunk is cleaned up by GC (content-addressed, might even be reused by another file).
- Listings: eventually consistent with uploads. A file just uploaded might not appear in listings for a few seconds if we have read replicas. Usually acceptable.
- Strong consistency for permission revocation: when a share is revoked, the check must reflect it immediately. Read from primary, not replica, for permission checks.
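Putting the blob-before-metadata rule above into code, a sketch of the per-chunk handler on the Upload Coordinator; the blob_store and sessions interfaces are illustrative assumptions.

```python
# Write the bytes first, then record receipt. If the metadata update fails, the orphaned,
# content-addressed blob is reclaimed later by GC (or reused by another upload).
def handle_chunk_put(blob_store, sessions, upload_id: str, index: int, chunk: bytes, sha256_hex: str) -> dict:
    key = f"chunks/{sha256_hex}"
    if not blob_store.exists(key):        # content-addressed: identical retries converge
        blob_store.put(key, chunk)        # 1. durable bytes first
    sessions.mark_received(upload_id, index, sha256_hex)  # 2. then metadata
    return {"chunkIndex": index, "received": True, "nextExpected": index + 1}
```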
8) Bandwidth and CDN
- Downloads are huge volume. App servers should NEVER stream bytes if avoidable.
- Use signed URLs to blob storage so the browser downloads directly.
- For public files, CDN in front of the blob store (CloudFront / Cloud CDN). Cache-Control aligned with file mutability.
9) What NOT to design at L4
- Don't design your own blob store (GFS / Colossus). Use GCS / S3 as a given.
- Don't design geo-replication for EB of data -- blob stores already do this.
- Don't design a real-time sync protocol -- that's Google Drive desktop client, separate concern.
- Don't do end-to-end encryption unless explicitly asked; it complicates sharing, search, and dedup significantly.
What is Expected at Each Level
L3 / Mid-level
- Separate metadata and blob storage.
- Basic upload/download/list APIs.
- Might propose simple single-blob upload without resumable protocol.
- Permissions as a simple table.
L4
- Chunk-based storage with resumable upload.
- Content-addressed chunks for deduplication.
- Signed URLs for download (app servers don't stream bytes).
- Schema with appropriate indexes, back-of-envelope on row counts.
- Soft delete + GC for chunks.
- Discussion of the consistency model (orphan chunks, metadata-vs-blob ordering).
- Permission checks with indexed lookups in both directions (by file and by grantee).
L5 / Senior
- Variable-size / content-defined chunking for cross-file dedup.
- Multi-region strategy: regional metadata primary with async replication, blob store already multi-region.
- Reference counting vs mark-sweep trade-offs in GC.
- Schema for large-scale sharing: materialized "shared with me" views, handling the fan-out query.
- Operational concerns: EB-scale cost accounting, storage-class tiering (hot/cold/archive), quota enforcement race conditions.
- Client sync protocol (events, tokens for resumable sync).
- E2E encryption trade-offs (no server-side preview, no dedup, harder sharing).