04 - HLD & System Design
Quick Reference (scan in 5 min)
| Topic | Key Points | Interview Tip |
|---|---|---|
| CAP Theorem | Pick 2 of 3: Consistency, Availability, Partition Tolerance. Payments = CP. | "Paytm wallet debits must be CP — stale balance means double-spend." |
| SQL vs NoSQL | SQL for transactions (ACID), NoSQL for scale + flexible schema. | Always justify choice with access patterns, not hype. |
| Indexing | B-tree for range queries, Hash for exact lookup. Composite index order matters. | "I'd add a composite index on (user_id, created_at) for payment history." |
| Sharding | Hash-based for even distribution, range-based for time-series, geo for compliance. | Mention consistent hashing with virtual nodes to handle rebalancing. |
| Caching | Cache-aside is default. Write-through for consistency. Write-behind for throughput. | "For Paytm's merchant dashboard, cache-aside with 30s TTL on analytics." |
| Message Queues | Kafka for high-throughput event streaming. RabbitMQ for task routing. | "Payment events go to Kafka — ordered per partition by payment_id." |
| Load Balancing | L4 for raw TCP speed, L7 for content-based routing. | Mention health checks and graceful draining during deploys. |
| ACID + Isolation | Serializable prevents all anomalies but kills throughput. Read Committed is the practical default. | "For payment writes I'd use Serializable; for read-heavy dashboards, Read Committed." |
| Payment System Design | Idempotency keys + Saga pattern + Event sourcing + Circuit breaker. | Lead with idempotency — it's the #1 thing interviewers want to hear for fintech. |
1. CAP Theorem
The Three Guarantees
- Consistency (C): Every read returns the most recent write. All nodes see the same data at the same time.
- Availability (A): Every request receives a response (success or failure), even if some nodes are down.
- Partition Tolerance (P): The system continues to operate despite network partitions between nodes.
You can only guarantee 2 of 3 when a network partition occurs. Since partitions are inevitable in distributed systems, the real choice is CP vs AP.
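In practice the dial is often per-request rather than per-system: quorum-replicated stores like Cassandra and DynamoDB let you trade consistency for availability by choosing read and write quorum sizes. A minimal sketch of the arithmetic (function name is illustrative):

```typescript
// Sketch: quorum arithmetic for tunable consistency.
// With N replicas, R read acks, and W write acks, R + W > N guarantees the
// read and write quorums overlap in at least one replica, so every read
// sees the most recent committed write.
function isStronglyConsistent(n: number, r: number, w: number): boolean {
  return r + w > n;
}

// Typical settings for N = 3:
isStronglyConsistent(3, 2, 2); // QUORUM/QUORUM → true (strong, overlap guaranteed)
isStronglyConsistent(3, 1, 1); // ONE/ONE → false (eventual: fast, may read stale)
isStronglyConsistent(3, 3, 1); // read-all/write-one → true, but reads fail if any replica is down
```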
CP vs AP — Real Examples
CP Systems (Consistency + Partition Tolerance)
├── Banking / Payment systems (Paytm wallet)
├── ZooKeeper (leader election)
├── HBase (strong consistency reads)
├── MongoDB (with majority write concern)
└── etcd (Raft consensus)
AP Systems (Availability + Partition Tolerance)
├── Social media feeds (eventual consistency is fine)
├── DNS (stale records are tolerable)
├── Cassandra (tunable consistency)
├── DynamoDB (default eventual consistency)
└── Couchbase
Why Paytm's Payment System is CP
// Scenario: User has Rs 500 in wallet, tries to pay Rs 400
// CP system — consistent balance across nodes
async function debitWallet(userId: string, amount: number): Promise<PaymentResult> {
// Step 1: Acquire distributed lock on the user's wallet (TTL caps the leak if the process crashes)
const lock = await acquireLock(`wallet:${userId}`, { ttl: 5000 });
try {
// Step 2: Read current balance (from primary/leader node)
const balance = await db.readFromPrimary(`SELECT balance FROM wallets WHERE user_id = $1`, [userId]);
// Step 3: Check sufficient funds
if (balance < amount) {
return { status: 'INSUFFICIENT_FUNDS' };
}
// Step 4: Debit atomically (the balance >= $1 guard is a second line of defense)
await db.execute(
`UPDATE wallets SET balance = balance - $1 WHERE user_id = $2 AND balance >= $1`,
[amount, userId]
);
return { status: 'SUCCESS', newBalance: balance - amount };
} finally {
// Release in finally so the lock doesn't leak until TTL if the read or update throws
await releaseLock(lock);
}
}
// AP system would risk: two concurrent requests both reading Rs 500,
// both succeeding, resulting in -Rs 300 balance (double-spend)Key insight: During a network partition, a CP payment system will refuse requests (return errors) rather than risk processing with stale data. This is the correct trade-off for money.
2. SQL vs NoSQL Decision Framework
Comparison Table
| Dimension | SQL (PostgreSQL, MySQL) | NoSQL (MongoDB, DynamoDB, Cassandra) |
|---|---|---|
| Data Model | Relational tables with fixed schema | Document, Key-Value, Column-Family, Graph |
| Schema | Strict schema, migrations required | Flexible / schema-on-read |
| ACID | Full ACID transactions | Varies — MongoDB supports multi-document ACID (4.0+), many others are BASE/eventual |
| Scaling | Vertical primarily; horizontal with effort (read replicas, sharding) | Horizontal by design |
| Joins | Native, efficient | Manual (application-level) or denormalized |
| Query Language | SQL (standardized, powerful) | Vendor-specific APIs / query languages |
| Consistency | Strong by default | Tunable (eventual to strong) |
| Best For | Transactions, complex queries, relationships | High write throughput, flexible data, massive scale |
When to Use Each in Fintech
// SQL — Use for transactional, relational data
// Paytm examples:
const sqlUseCases = {
payments: "ACID transactions for debit/credit",
userAccounts: "Relational data with KYC, addresses, bank links",
merchantSettlements: "Complex joins across payments, fees, payouts",
ledger: "Double-entry bookkeeping requires strict consistency",
walletBalances: "Cannot tolerate eventual consistency on money",
};
// NoSQL — Use for high-volume, flexible, or denormalized data
const noSqlUseCases = {
sessionStore: "Redis — ephemeral, key-value access pattern",
activityLogs: "DynamoDB — high write volume, simple access by userId",
productCatalog: "MongoDB — varied product attributes, nested data",
analytics: "Cassandra — time-series write-heavy analytics events",
cache: "Redis — hot data like exchange rates, OTP attempts",
};
Decision Flowchart
Need ACID transactions?
├── Yes → SQL
└── No
├── Need flexible schema?
│ ├── Yes → Document DB (MongoDB)
│ └── No
│ ├── Simple key-value access?
│ │ ├── Yes → Redis / DynamoDB
│ │ └── No
│ │ ├── Write-heavy time-series?
│ │ │ ├── Yes → Cassandra / TimescaleDB
│ │ │ └── No → Evaluate SQL first
│ │ └──
│ └──
└──
3. Indexing & Sharding
Index Types
B-tree Index (default in most RDBMSs)
- Balanced tree structure, O(log n) lookups
- Supports range queries (`WHERE created_at > '2024-01-01'`)
- Supports ordering (`ORDER BY created_at DESC`)
- Good for: payment history by date range, user lookups
Hash Index
- O(1) exact-match lookups
- No range query support
- Good for: idempotency key lookups, session lookups
Composite Index
- Multi-column index — column order matters
- Follows the leftmost prefix rule: an index on `(a, b, c)` supports queries on `(a)`, `(a, b)`, and `(a, b, c)`, but NOT `(b)` or `(c)` alone
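A small, hypothetical checker makes the leftmost prefix rule concrete (it models equality predicates only; real query planners are more nuanced about ranges and sort orders):

```typescript
// Sketch: does a composite index serve an equality query on `queryColumns`?
// Leftmost prefix rule: the queried columns must exactly cover a prefix of
// the index's column list.
function indexServesQuery(indexColumns: string[], queryColumns: string[]): boolean {
  const queried = new Set(queryColumns);
  // Count how many leading index columns are constrained by the query
  let prefixLen = 0;
  while (prefixLen < indexColumns.length && queried.has(indexColumns[prefixLen])) {
    prefixLen++;
  }
  // Every queried column must fall inside that prefix
  return queried.size === prefixLen;
}

const idx = ["a", "b", "c"];
indexServesQuery(idx, ["a"]);      // true
indexServesQuery(idx, ["a", "b"]); // true
indexServesQuery(idx, ["b"]);      // false (skips the leading column)
```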
// Example: Payment history queries for Paytm
// Common queries:
// 1. All payments for a user → WHERE user_id = ?
// 2. User's payments in a date range → WHERE user_id = ? AND created_at BETWEEN ? AND ?
// 3. User's payments of a specific status → WHERE user_id = ? AND status = ?
// Best composite index:
// CREATE INDEX idx_payments_user_date ON payments(user_id, created_at DESC);
// CREATE INDEX idx_payments_user_status ON payments(user_id, status);
// Covering index — includes all columns needed, avoids table lookup
// CREATE INDEX idx_payments_cover ON payments(user_id, created_at DESC)
// INCLUDE (amount, status, merchant_id);
// Query: SELECT amount, status, merchant_id FROM payments
// WHERE user_id = ? ORDER BY created_at DESC LIMIT 20;
// This is answered entirely from the index — no heap access.
Sharding Strategies
Range-Based Sharding
Shard by created_at:
Shard 1: Jan-Mar 2024
Shard 2: Apr-Jun 2024
Shard 3: Jul-Sep 2024
Pros: Good for time-range queries, natural archival
Cons: Hot shard problem (latest shard gets all writes)
Use case: Log/analytics data where old data is rarely accessed
Hash-Based Sharding
Shard = hash(user_id) % num_shards
Pros: Even distribution of writes
Cons: Range queries require scatter-gather across all shards
Use case: Paytm user wallets — even distribution of balance lookups
Geographic Sharding
Shard by region:
Shard India-North: Delhi, UP, Punjab users
Shard India-South: Karnataka, TN, Kerala users
Shard India-West: Maharashtra, Gujarat users
Pros: Data locality, compliance (data residency), lower latency
Cons: Uneven shard sizes, cross-region transactions are complex
Use case: Compliance requirements, latency-sensitive fintech apps
Consistent Hashing with Virtual Nodes
// Problem: Adding/removing a shard with hash % N redistributes almost ALL keys
// Solution: Consistent hashing — only K/N keys move when a node is added
class ConsistentHashRing {
private ring: Map<number, string> = new Map(); // position → nodeId
private sortedPositions: number[] = [];
private virtualNodesPerNode: number;
constructor(virtualNodesPerNode: number = 150) {
this.virtualNodesPerNode = virtualNodesPerNode;
}
addNode(nodeId: string): void {
// Each physical node gets multiple positions on the ring (virtual nodes)
// This ensures even distribution even with few physical nodes
for (let i = 0; i < this.virtualNodesPerNode; i++) {
const virtualKey = `${nodeId}:vn${i}`;
const position = this.hash(virtualKey);
this.ring.set(position, nodeId);
this.sortedPositions.push(position);
}
this.sortedPositions.sort((a, b) => a - b);
}
removeNode(nodeId: string): void {
for (let i = 0; i < this.virtualNodesPerNode; i++) {
const virtualKey = `${nodeId}:vn${i}`;
const position = this.hash(virtualKey);
this.ring.delete(position);
}
this.sortedPositions = this.sortedPositions.filter(p => this.ring.has(p));
}
getNode(key: string): string {
const keyHash = this.hash(key);
// Walk clockwise around the ring to find the first node
for (const position of this.sortedPositions) {
if (position >= keyHash) {
return this.ring.get(position)!;
}
}
// Wrap around to the first node
return this.ring.get(this.sortedPositions[0])!;
}
private hash(key: string): number {
// Simplified — in production use MurmurHash3 or xxHash
let hash = 0;
for (let i = 0; i < key.length; i++) {
hash = ((hash << 5) - hash + key.charCodeAt(i)) | 0;
}
return Math.abs(hash);
}
}
// Usage: Distributing Paytm user wallets across DB shards
const ring = new ConsistentHashRing(150);
ring.addNode("db-shard-1");
ring.addNode("db-shard-2");
ring.addNode("db-shard-3");
const shard = ring.getNode("user:12345"); // → "db-shard-2"
// Adding db-shard-4 only moves ~25% of keys, not 75%
Why virtual nodes? Without them, 3 physical nodes might end up with a 60/25/15 distribution. With 150 virtual nodes each, distribution approaches 33/33/33.
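The claim that `hash % N` remaps almost all keys can be checked directly. This sketch re-implements the simple string hash from the ring above so it runs standalone:

```typescript
// Sketch: measure how many keys change shards when shard count grows from
// 3 to 4 under naive modulo placement.
function simpleHash(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = ((h << 5) - h + key.charCodeAt(i)) | 0;
  }
  return Math.abs(h);
}

function movedFraction(numKeys: number, fromShards: number, toShards: number): number {
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    const h = simpleHash(`user:${i}`);
    if (h % fromShards !== h % toShards) moved++; // key lands on a different shard
  }
  return moved / numKeys;
}

// A key stays put only when h % 3 === h % 4 (i.e., h % 12 < 3), so roughly
// 75% of keys move. Consistent hashing would move only ~25% (1/N of keys).
const frac = movedFraction(10_000, 3, 4);
```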
4. Caching Strategies
Cache-Aside (Lazy Loading)
Read path:
┌────────┐ 1. GET ┌───────┐
│ App │──────────────→│ Cache │
│ Server │←──────────────│(Redis)│
│ │ 2. HIT? │ │
│ │ └───────┘
│ │ 3. MISS → read DB
│ │──────────────→┌────────┐
│ │←──────────────│ DB │
│ │ 4. Return └────────┘
│ │
│ │──5. SET cache─→ Cache
└────────┘
Write path:
App writes to DB directly, then invalidates/deletes the cache key.
async function getPaymentDetails(paymentId: string): Promise<Payment> {
// 1. Check cache first
const cached = await redis.get(`payment:${paymentId}`);
if (cached) return JSON.parse(cached);
// 2. Cache miss — read from DB
const payment = await db.query('SELECT * FROM payments WHERE id = $1', [paymentId]);
// 3. Populate cache with TTL
await redis.setex(`payment:${paymentId}`, 300, JSON.stringify(payment)); // 5 min TTL
return payment;
}
async function updatePaymentStatus(paymentId: string, status: string): Promise<void> {
await db.query('UPDATE payments SET status = $1 WHERE id = $2', [status, paymentId]);
// Invalidate cache — next read will fetch fresh data
await redis.del(`payment:${paymentId}`);
}
When to use: Most common pattern. Good default. App controls cache population. Trade-off: First request after a miss is slow. Possible stale data if invalidation fails.
Read-Through
┌────────┐ 1. GET ┌──────────────────┐ 2. MISS → auto-fetch ┌────────┐
│ App │──────────→│ Cache (manages │──────────────────────→│ DB │
│ Server │←──────────│ its own loading) │←──────────────────────│ │
│ │ 3. Data │ │ 4. Data └────────┘
└────────┘ └──────────────────┘
Cache auto-populates on miss
When to use: The cache library handles DB fetching, so app code stays simpler. Works best with a single data source. Trade-off: The cache library must know how to query your DB. Less flexible than cache-aside.
Write-Through
┌────────┐ 1. WRITE ┌───────┐ 2. Sync write ┌────────┐
│ App │───────────→│ Cache │─────────────────→│ DB │
│ Server │←───────────│(Redis)│←─────────────────│ │
│ │ 4. ACK │ │ 3. ACK └────────┘
└────────┘ └───────┘
Cache + DB always in sync
async function createPayment(payment: Payment): Promise<void> {
// Write-through: write to cache AND DB synchronously
await db.query('INSERT INTO payments ...', [payment]);
await redis.setex(`payment:${payment.id}`, 3600, JSON.stringify(payment));
// Both are updated before returning to caller
}
When to use: Need strong consistency between cache and DB. Read-heavy after write. Trade-off: Higher write latency (two writes per operation). Writes to cache even for data never read.
Write-Behind (Write-Back)
┌────────┐ 1. WRITE ┌───────┐
│ App │───────────→│ Cache │ ── 2. ACK (immediate)
│ Server │←───────────│(Redis)│
└────────┘ │ │ 3. Async batch write (later)
│ │─────────────────→┌────────┐
└───────┘ │ DB │
└────────┘
When to use: High write throughput needed. Can tolerate brief inconsistency. Trade-off: Data loss risk if cache crashes before flushing to DB. Complex failure handling. Fintech note: Generally NOT suitable for payment transactions. Acceptable for analytics counters (e.g., page views on merchant dashboard).
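A sketch of the pattern for the analytics-counter case; `persistBatch` is a hypothetical sink (e.g., one batched UPSERT against the DB):

```typescript
// Sketch: write-behind for analytics counters. Increments ACK immediately
// against an in-memory buffer; a periodic flush batches them to the DB.
class WriteBehindCounter {
  private buffer = new Map<string, number>();
  constructor(private persistBatch: (batch: Map<string, number>) => Promise<void>) {}

  increment(key: string, by = 1): void {
    this.buffer.set(key, (this.buffer.get(key) ?? 0) + by); // fast path, no DB call
  }

  // Call on a timer (e.g., every few seconds). If the process dies before a
  // flush, buffered increments are lost, which is why this is unfit for payments.
  async flush(): Promise<void> {
    if (this.buffer.size === 0) return;
    const batch = this.buffer;
    this.buffer = new Map(); // swap first so new writes land in a fresh buffer
    await this.persistBatch(batch);
  }
}
```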
Redis vs Memcached
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams | Strings only |
| Persistence | RDB snapshots + AOF | None (pure cache) |
| Replication | Built-in master-replica | None |
| Clustering | Redis Cluster (auto-sharding) | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes (atomic operations) | No |
| Memory Efficiency | Higher overhead per key | More memory-efficient for simple strings |
| Multithreading | Single-threaded (I/O threads in 6.0+) | Multithreaded |
| Use in Fintech | Rate limiting, session store, leaderboards, distributed locks | Simple high-throughput caching |
Default choice for Paytm: Redis, because you need data structures (sorted sets for rate limiting), persistence (survive restarts), and pub/sub (real-time notifications).
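The rate-limiting use mentioned above can be sketched with a sliding window. This in-memory version mirrors the Redis sorted-set pattern (ZADD the request timestamp, ZREMRANGEBYSCORE to expire old entries, ZCARD to count):

```typescript
// Sketch: sliding-window rate limiter. Allows at most `limit` requests per
// `windowMs` for each key (e.g., a user ID or merchant ID).
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key → request timestamps
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window
    const recent = (this.hits.get(key) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

In production the same logic runs in Redis so all API gateway instances share one counter per key.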
5. Message Queues
Kafka vs RabbitMQ
| Dimension | Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed commit log (pull-based) | Message broker (push-based) |
| Ordering | Per-partition ordering guaranteed | Per-queue ordering (single consumer) |
| Throughput | Millions of messages/sec | Tens of thousands/sec |
| Retention | Configurable (days/weeks/forever) | Messages deleted after consumption |
| Consumer Model | Consumer groups (each group gets all messages) | Competing consumers (each message to one consumer) |
| Replay | Yes — consumers can seek to any offset | No — once consumed, gone |
| Delivery | At-least-once (exactly-once with transactions) | At-most-once or at-least-once (configurable) |
| Routing | Topic + partition | Exchanges, bindings, routing keys (flexible) |
| Best For | Event streaming, audit logs, high throughput | Task queues, RPC, complex routing |
When to Use Each in Fintech
// Kafka — Event streaming, audit trail, analytics pipeline
// "Every payment state change is an event appended to a Kafka topic"
// Topic: payment-events, partitioned by payment_id (ordering per payment)
interface PaymentEvent {
eventId: string;
paymentId: string; // partition key — all events for a payment land in same partition
eventType: 'CREATED' | 'AUTHORIZED' | 'CAPTURED' | 'SETTLED' | 'REFUNDED' | 'FAILED';
timestamp: number;
payload: Record<string, unknown>;
}
// Consumer group: settlement-service reads all payment events
// Consumer group: analytics-service reads all payment events (independently)
// Consumer group: notification-service reads all payment events
// Each group maintains its own offset — Kafka retains messages for all groups
// RabbitMQ — Task queue for async jobs with complex routing
// "Send email notification after payment success"
// Exchange: notifications (type: topic)
// Routing key: payment.success.email → Queue: email-notifications
// Routing key: payment.success.sms → Queue: sms-notifications
// Routing key: payment.failure.* → Queue: failure-alerts
// RabbitMQ excels here because:
// 1. Complex routing rules (topic exchange with wildcards)
// 2. Per-message acknowledgment (re-queue on failure)
// 3. Dead letter queue for failed messages
// 4. Priority queues (VIP merchant notifications first)
Ordering Guarantees
Kafka:
- Ordering guaranteed WITHIN a partition
- No ordering across partitions
- Strategy: Use payment_id as partition key
→ All events for payment "PAY_123" go to same partition
→ CREATED always comes before CAPTURED for that payment
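The partition-key strategy can be sketched as a deterministic key-to-partition mapping; Kafka's default partitioner does the same with a murmur2 hash of the key bytes (the hash below is a simplified stand-in):

```typescript
// Sketch: deterministic partition assignment by message key.
function partitionFor(key: string, numPartitions: number): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = ((h << 5) - h + key.charCodeAt(i)) | 0;
  }
  return Math.abs(h) % numPartitions;
}

// Every event keyed by the same payment_id lands in the same partition, so
// CREATED is consumed before CAPTURED for that payment:
const stable = partitionFor("PAY_123", 12) === partitionFor("PAY_123", 12); // always true
```

Note the flip side: changing `numPartitions` on an existing topic breaks this mapping, which is why partition counts are chosen generously up front.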
RabbitMQ:
- Ordering guaranteed within a single queue with a single consumer
- Multiple consumers on same queue → no ordering guarantee
- Use single consumer or message grouping for ordering
Delivery Semantics
At-most-once: Fire and forget. May lose messages. Fast.
Use: Analytics events where losing 0.1% is acceptable.
At-least-once: Retry until ACK. May duplicate. Most common.
Use: Payment notifications — duplicates are safe if idempotent.
Exactly-once: No loss, no duplicates. Hard / expensive.
Kafka: Achieved via idempotent producer + transactional consumers.
Use: Financial ledger entries — duplicates cause accounting errors.
6. Load Balancing
Algorithms
Round Robin
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Pros: Simple, even distribution
Cons: Ignores server load — a slow server gets same traffic as a fast one
Use: Stateless services with homogeneous servers
Weighted Round Robin
Server A (weight 5): Gets 5 out of every 8 requests
Server B (weight 2): Gets 2 out of every 8 requests
Server C (weight 1): Gets 1 out of every 8 requests
Use: Mixed hardware — beefy servers get more traffic
Least Connections
Server A: 12 active connections → skip
Server B: 3 active connections → ROUTE HERE
Server C: 7 active connections → skip
Pros: Adapts to actual server load
Cons: Requires tracking connection counts
Use: Long-lived connections (WebSockets for real-time payment status)
Consistent Hashing
hash(user_id) → always same server (unless topology changes)
Pros: Session affinity without sticky sessions, minimal redistribution on scale
Cons: Uneven if hash function is poor
Use: Caching layers, stateful services
L4 vs L7 Load Balancing
| Aspect | L4 (Transport Layer) | L7 (Application Layer) |
|---|---|---|
| Operates On | TCP/UDP packets (IP + port) | HTTP headers, URL path, cookies, body |
| Speed | Faster — no payload inspection | Slower — inspects full request |
| Routing | By IP and port only | By URL path, header, cookie, etc. |
| SSL Termination | Pass-through (end-to-end encryption) | Terminates SSL, re-encrypts to backend |
| Use Case | Raw TCP services, database connections | API routing, A/B testing, canary deploys |
| Example | AWS NLB | AWS ALB, Nginx, HAProxy (L7 mode) |
Paytm Load Balancing Architecture:
Internet → CDN (static assets)
→ L7 ALB
├── /api/payments/* → Payment Service cluster
├── /api/merchants/* → Merchant Service cluster
├── /api/wallets/* → Wallet Service cluster
└── /api/analytics/* → Analytics Service cluster
Each service cluster has its own internal L4 NLB for service-to-service gRPC calls.
7. ACID + Isolation Levels
ACID Properties
| Property | Meaning | Fintech Example |
|---|---|---|
| Atomicity | All or nothing — partial transactions roll back | Debit wallet + credit merchant: both succeed or both fail |
| Consistency | DB moves from one valid state to another | Balance never goes negative (CHECK constraint) |
| Isolation | Concurrent transactions don't interfere | Two simultaneous wallet debits don't double-spend |
| Durability | Committed data survives crashes | Payment confirmed = written to disk, not just memory |
Isolation Levels & Anomalies
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Fast (PostgreSQL default) |
| Repeatable Read | Prevented | Prevented | Possible | Medium (MySQL InnoDB default) |
| Serializable | Prevented | Prevented | Prevented | Slowest |
Anomaly definitions:
- Dirty Read: Reading data from an uncommitted transaction (it might roll back).
- Non-Repeatable Read: Re-reading a row gives different values because another transaction committed between reads.
- Phantom Read: Re-running a range query returns new rows that were inserted by another committed transaction.
// Practical example: Double-spend prevention
// BAD — Read Committed allows this race condition:
// T1: SELECT balance FROM wallets WHERE user_id = 'U1'; → 500
// T2: SELECT balance FROM wallets WHERE user_id = 'U1'; → 500
// T1: UPDATE wallets SET balance = 500 - 400 WHERE user_id = 'U1'; → 100
// T2: UPDATE wallets SET balance = 500 - 400 WHERE user_id = 'U1'; → 100
// Both succeed! User spent Rs 800 with only Rs 500.
// SOLUTION 1: Serializable isolation (shown here as lock-based two-phase
// locking; PostgreSQL's serializable mode instead aborts one transaction at commit)
// T1: SELECT balance ... (acquires lock)
// T2: SELECT balance ... (BLOCKS until T1 commits)
// T1: UPDATE ... COMMIT → balance = 100
// T2: SELECT balance ... → 100, insufficient for Rs 400 → ABORT
// SOLUTION 2: Optimistic locking (better performance)
async function debitWithOptimisticLock(userId: string, amount: number): Promise<boolean> {
const { balance, version } = await db.query(
'SELECT balance, version FROM wallets WHERE user_id = $1',
[userId]
);
if (balance < amount) return false;
const result = await db.query(
`UPDATE wallets
SET balance = balance - $1, version = version + 1
WHERE user_id = $2 AND version = $3`,
[amount, userId, version]
);
// If another transaction changed the version, rowCount = 0 → retry
if (result.rowCount === 0) {
// Version mismatch — another transaction modified the row.
// Retry from the top (in production, cap attempts to avoid livelock).
return debitWithOptimisticLock(userId, amount);
}
return true;
}
// SOLUTION 3: Pessimistic locking (SELECT ... FOR UPDATE)
async function debitWithPessimisticLock(userId: string, amount: number): Promise<boolean> {
return db.transaction(async (tx) => {
// Acquires row-level lock — other transactions block here
const { balance } = await tx.query(
'SELECT balance FROM wallets WHERE user_id = $1 FOR UPDATE',
[userId]
);
if (balance < amount) return false;
await tx.query(
'UPDATE wallets SET balance = balance - $1 WHERE user_id = $2',
[amount, userId]
);
return true;
});
}
Optimistic vs Pessimistic Locking
| Aspect | Optimistic | Pessimistic |
|---|---|---|
| Mechanism | Version column; detect conflict at write time | Lock row at read time (SELECT ... FOR UPDATE) |
| Blocking | No blocking on reads | Blocks other transactions on locked rows |
| Conflict Rate | Best when conflicts are rare | Best when conflicts are frequent |
| Retry | Application must retry on version mismatch | No retry needed — waits for lock |
| Deadlock Risk | None | Possible (if locking order isn't consistent) |
| Use Case | Merchant profile updates (low contention) | Wallet balance debits (high contention on popular wallets) |
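One practical note on the "Retry" row: the recursive retry shown earlier is unbounded. A hedged sketch of a bounded wrapper with backoff, where `attempt` is any function that returns false on a version conflict:

```typescript
// Sketch: bounded retry for optimistic-lock conflicts. Unbounded recursion
// can livelock under heavy contention; this caps attempts and backs off.
async function withOptimisticRetry(
  attempt: () => Promise<boolean>,
  maxAttempts = 3,
  baseDelayMs = 10
): Promise<boolean> {
  for (let i = 0; i < maxAttempts; i++) {
    if (await attempt()) return true;
    // Exponential backoff with jitter reduces repeated collisions
    const delay = baseDelayMs * 2 ** i * (0.5 + Math.random());
    await new Promise(res => setTimeout(res, delay));
  }
  return false; // give up; the caller surfaces a retryable error to the client
}
```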
8. Design Problem: Payment System at Scale
This is the big one. A payment system at Paytm scale: millions of transactions/day, multiple payment methods, strict consistency, full audit trail.
Functional Requirements
- Initiate payment — User pays merchant via UPI / card / wallet / netbanking
- Process payment — Validate, authorize, capture funds
- Payment status — Real-time status tracking (PENDING → AUTHORIZED → CAPTURED → SETTLED)
- Refunds — Full or partial refund with money returned to source
- Payment history — User and merchant can view past transactions
- Merchant settlement — Batch settle funds to merchant bank accounts (T+1 / T+2)
- Notifications — Real-time updates via webhook + push + SMS
Non-Functional Requirements
| Requirement | Target |
|---|---|
| Availability | 99.99% (< 53 min downtime/year) |
| Latency | Payment initiation < 500ms p99 |
| Throughput | 10,000+ TPS peak (festive sales) |
| Consistency | Strong consistency for balance operations |
| Durability | Zero data loss for financial transactions |
| Idempotency | Duplicate requests must not double-charge |
| Auditability | Full audit trail, immutable event log |
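The availability target translates directly into a downtime budget:

```typescript
// Sketch: downtime budget implied by an availability target.
function downtimeMinutesPerYear(availability: number): number {
  const minutesPerYear = 365 * 24 * 60; // 525,600
  return (1 - availability) * minutesPerYear;
}

downtimeMinutesPerYear(0.9999); // ≈ 52.6 minutes/year ("four nines")
downtimeMinutesPerYear(0.999);  // ≈ 525.6 minutes/year (~8.8 hours, "three nines")
```

The jump from three to four nines is a 10x tighter budget, which is what justifies the multi-AZ, health-checked, gracefully-draining architecture below.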
Architecture Diagram
┌─────────────────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Mobile │ │ Web App │ │ Merchant SDK │ │ Merchant │ │
│ │ App │ │ │ │ (checkout) │ │ Server API │ │
│ └────┬─────┘ └────┬─────┘ └──────┬───────┘ └─────┬──────┘ │
└───────┼──────────────┼───────────────┼────────────────┼─────────────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ API GATEWAY (L7 LB) │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Rate Limiting │ Auth (JWT/OAuth) │ Request Routing │ Idempotency Check│ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└─────────┬───────────────┬───────────────────┬──────────────────┬────────────────┘
│ │ │ │
▼ ▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Payment │ │ Wallet │ │ Settlement │ │ Notification │
│ Service │ │ Service │ │ Service │ │ Service │
│ │ │ │ │ │ │ │
│ - Initiate │ │ - Balance │ │ - Batch settle │ │ - Webhooks │
│ - Authorize │ │ - Debit │ │ - Reconcile │ │ - Push / SMS │
│ - Capture │ │ - Credit │ │ - Payout │ │ - Email │
│ - Refund │ │ - Freeze │ │ │ │ │
└──────┬───────┘ └──────┬───────┘ └────────┬─────────┘ └──────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ DATA LAYER │
│ │
│ ┌──────────────────┐ ┌───────────────┐ ┌──────────────┐ ┌───────────────┐ │
│ │ PostgreSQL │ │ Redis │ │ Kafka │ │ S3 / Object │ │
│ │ (primary DB) │ │ (cache + │ │ (event bus) │ │ Store │ │
│ │ │ │ locks + │ │ │ │ (receipts, │ │
│ │ - Payments │ │ rate limit) │ │ - Payment │ │ statements) │ │
│ │ - Wallets │ │ │ │ events │ │ │ │
│ │ - Merchants │ │ │ │ - Audit log │ │ │ │
│ │ - Ledger │ │ │ │ │ │ │ │
│ └──────────────────┘ └───────────────┘ └──────────────┘ └───────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL PAYMENT GATEWAYS │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ UPI/NPCI │ │ Card │ │ Net │ │ Paytm │ │ Bank APIs │ │
│ │ │ │ Networks │ │ Banking │ │ Wallet │ │ (settlements)│ │
│ │ │ │(Visa/MC) │ │ │ │ │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
Idempotency Keys
Why critical: Network failures, timeouts, and retries mean the same request can hit your server multiple times. Without idempotency, a user gets charged twice.
// Client sends a unique idempotency key with every payment request
// POST /payments
// Headers: { "Idempotency-Key": "idem_abc123xyz" }
interface IdempotencyRecord {
key: string;
status: 'PROCESSING' | 'COMPLETED' | 'FAILED';
requestHash: string; // Hash of request body — detect mismatched reuse
response: unknown; // Cached response to return on duplicate
createdAt: Date;
expiresAt: Date; // Auto-cleanup after 24-48 hours
}
async function handlePaymentWithIdempotency(
idempotencyKey: string,
request: CreatePaymentRequest
): Promise<PaymentResponse> {
const requestHash = hash(JSON.stringify(request));
// Step 1: Check if we've seen this key before.
// (Steps 1-2 must run inside one transaction; FOR UPDATE only holds its
// row lock for the duration of a transaction.)
const existing = await db.query(
'SELECT * FROM idempotency_keys WHERE key = $1 FOR UPDATE',
[idempotencyKey]
);
if (existing) {
// Verify the request body matches — same key, different body = error
if (existing.requestHash !== requestHash) {
throw new Error('Idempotency key reused with different request body');
}
if (existing.status === 'COMPLETED') {
// Return cached response — no re-processing
return existing.response as PaymentResponse;
}
if (existing.status === 'PROCESSING') {
// Another request is in-flight — return 409 Conflict or wait
throw new ConflictError('Payment is already being processed');
}
// FAILED — allow retry with same key
}
// Step 2: Insert idempotency record atomically; ON CONFLICT detects a
// concurrent duplicate that slipped in between Steps 1 and 2
const inserted = await db.query(
`INSERT INTO idempotency_keys (key, status, request_hash, created_at, expires_at)
VALUES ($1, 'PROCESSING', $2, NOW(), NOW() + INTERVAL '24 hours')
ON CONFLICT (key) DO NOTHING`,
[idempotencyKey, requestHash]
);
if (inserted.rowCount === 0) {
// Lost the race to a concurrent request with the same key
throw new ConflictError('Payment is already being processed');
}
try {
// Step 3: Process payment
const response = await processPayment(request);
// Step 4: Cache response
await db.query(
`UPDATE idempotency_keys SET status = 'COMPLETED', response = $1 WHERE key = $2`,
[JSON.stringify(response), idempotencyKey]
);
return response;
} catch (error) {
await db.query(
`UPDATE idempotency_keys SET status = 'FAILED' WHERE key = $1`,
[idempotencyKey]
);
throw error;
}
}
Key design decisions:
- Key is client-generated (UUID v4) so retries use the same key
- `FOR UPDATE` lock prevents races between duplicate concurrent requests
- Request hash catches misuse (same key, different amount)
- TTL-based cleanup prevents unbounded table growth
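Stripped of the SQL and failure states, the core invariant looks like this (an in-memory sketch; it omits the PROCESSING/FAILED handling and TTL cleanup that the full version covers):

```typescript
// Sketch: the idempotency invariant. The first caller computes; duplicates
// get the cached response; the same key with a different body is rejected.
class IdempotencyStore<R> {
  private records = new Map<string, { requestHash: string; response: R }>();

  async run(key: string, requestHash: string, process: () => Promise<R>): Promise<R> {
    const existing = this.records.get(key);
    if (existing) {
      if (existing.requestHash !== requestHash) {
        throw new Error("Idempotency key reused with different request body");
      }
      return existing.response; // duplicate: cached response, no re-processing
    }
    const response = await process();
    this.records.set(key, { requestHash, response });
    return response;
  }
}
```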
Saga Pattern for Distributed Transactions
A payment involves multiple services (wallet, payment gateway, ledger, notifications). Traditional 2PC is slow and fragile. Sagas break the transaction into steps with compensating actions for rollback.
// Orchestrator-based Saga for a payment flow
interface SagaStep {
name: string;
execute: () => Promise<void>;
compensate: () => Promise<void>; // Undo this step if a later step fails
}
class PaymentSaga {
private executedSteps: SagaStep[] = [];
async run(paymentId: string, request: CreatePaymentRequest): Promise<void> {
const steps: SagaStep[] = [
{
name: 'VALIDATE_PAYMENT',
execute: async () => {
await paymentService.validate(request);
await paymentService.setStatus(paymentId, 'VALIDATED');
},
compensate: async () => {
await paymentService.setStatus(paymentId, 'VALIDATION_REVERSED');
},
},
{
name: 'FREEZE_FUNDS',
execute: async () => {
// Put a hold on funds (don't debit yet)
await walletService.freezeAmount(request.userId, request.amount, paymentId);
await paymentService.setStatus(paymentId, 'FUNDS_FROZEN');
},
compensate: async () => {
// Release the hold
await walletService.unfreezeAmount(request.userId, request.amount, paymentId);
await paymentService.setStatus(paymentId, 'FUNDS_RELEASED');
},
},
{
name: 'AUTHORIZE_WITH_GATEWAY',
execute: async () => {
// Call external payment gateway (UPI/card network)
const authResult = await paymentGateway.authorize(request);
await paymentService.setGatewayRef(paymentId, authResult.gatewayRef);
await paymentService.setStatus(paymentId, 'AUTHORIZED');
},
compensate: async () => {
// Void the authorization with the gateway
await paymentGateway.voidAuthorization(paymentId);
await paymentService.setStatus(paymentId, 'AUTH_VOIDED');
},
},
{
name: 'CAPTURE_AND_DEBIT',
execute: async () => {
// Actually debit the frozen funds
await walletService.captureFreeze(request.userId, request.amount, paymentId);
await paymentService.setStatus(paymentId, 'CAPTURED');
},
compensate: async () => {
// Refund back to wallet
await walletService.credit(request.userId, request.amount, paymentId);
await paymentService.setStatus(paymentId, 'CAPTURE_REVERSED');
},
},
{
name: 'RECORD_IN_LEDGER',
execute: async () => {
// Double-entry bookkeeping: debit user, credit merchant
await ledgerService.recordEntry({
debit: { account: `user:${request.userId}`, amount: request.amount },
credit: { account: `merchant:${request.merchantId}`, amount: request.amount },
paymentId,
});
await paymentService.setStatus(paymentId, 'SETTLED');
},
compensate: async () => {
// Reverse ledger entry
await ledgerService.recordReversal(paymentId);
await paymentService.setStatus(paymentId, 'LEDGER_REVERSED');
},
},
];
for (const step of steps) {
try {
await step.execute();
this.executedSteps.push(step);
} catch (error) {
console.error(`Saga step ${step.name} failed:`, error);
await paymentService.setStatus(paymentId, 'FAILED');
// Compensate in reverse order
await this.rollback();
throw new PaymentFailedError(paymentId, step.name, error);
}
}
}
private async rollback(): Promise<void> {
// Execute compensating actions in reverse order
// Copy before reversing so executedSteps itself isn't mutated
for (const step of [...this.executedSteps].reverse()) {
try {
await step.compensate();
} catch (compensateError) {
// Log for manual intervention — compensation failures are critical
console.error(`CRITICAL: Compensation failed for ${step.name}`, compensateError);
await alertOpsTeam(step.name, compensateError);
// Continue compensating remaining steps
}
}
}
}
// Usage
const saga = new PaymentSaga();
await saga.run('PAY_12345', {
userId: 'U_001',
merchantId: 'M_042',
amount: 1500,
method: 'UPI',
});
Why Saga over 2PC?
- 2PC holds locks across services — kills throughput at Paytm scale
- Saga allows each service to commit independently
- Compensation handles failures gracefully
- Works across heterogeneous systems (SQL + NoSQL + external APIs)
Event Sourcing for Audit Trail
Instead of storing just current state, store every state change as an immutable event. This gives you a complete, tamper-evident audit trail — critical for financial compliance.
// Every payment state change is an immutable event in Kafka + event store
interface PaymentEvent {
eventId: string; // Globally unique
paymentId: string; // Aggregate ID
eventType: string;
version: number; // Monotonically increasing per payment
timestamp: string; // ISO 8601
actor: string; // Who triggered this (user, system, admin)
data: Record<string, unknown>;
metadata: {
correlationId: string; // Trace across services
source: string; // Which service emitted this
};
}
// Example event stream for a single payment:
const paymentEvents: PaymentEvent[] = [
{
eventId: "evt_001", paymentId: "PAY_12345", eventType: "PAYMENT_INITIATED",
version: 1, timestamp: "2024-12-01T10:00:00Z", actor: "user:U_001",
data: { amount: 1500, currency: "INR", method: "UPI", merchantId: "M_042" },
metadata: { correlationId: "corr_abc", source: "payment-service" }
},
{
eventId: "evt_002", paymentId: "PAY_12345", eventType: "FUNDS_FROZEN",
version: 2, timestamp: "2024-12-01T10:00:01Z", actor: "system:wallet-service",
data: { frozenAmount: 1500, walletBalance: 3500 },
metadata: { correlationId: "corr_abc", source: "wallet-service" }
},
{
eventId: "evt_003", paymentId: "PAY_12345", eventType: "GATEWAY_AUTHORIZED",
version: 3, timestamp: "2024-12-01T10:00:03Z", actor: "system:gateway-service",
data: { gatewayRef: "UPI_REF_789", rrn: "432109876543" },
metadata: { correlationId: "corr_abc", source: "gateway-service" }
},
{
eventId: "evt_004", paymentId: "PAY_12345", eventType: "PAYMENT_CAPTURED",
version: 4, timestamp: "2024-12-01T10:00:04Z", actor: "system:payment-service",
data: { capturedAmount: 1500 },
metadata: { correlationId: "corr_abc", source: "payment-service" }
},
];
// Rebuild current state from events (event replay)
function rebuildPaymentState(events: PaymentEvent[]): PaymentState {
return events.reduce((state, event) => {
switch (event.eventType) {
case 'PAYMENT_INITIATED':
return {
...state,
id: event.paymentId,
amount: event.data.amount as number,
status: 'INITIATED',
method: event.data.method as string,
};
case 'FUNDS_FROZEN':
return { ...state, status: 'FUNDS_FROZEN' };
case 'GATEWAY_AUTHORIZED':
return { ...state, status: 'AUTHORIZED', gatewayRef: event.data.gatewayRef as string };
case 'PAYMENT_CAPTURED':
return { ...state, status: 'CAPTURED' };
case 'PAYMENT_REFUNDED':
return { ...state, status: 'REFUNDED', refundedAmount: event.data.amount as number };
default:
return state;
}
}, {} as PaymentState);
}
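Temporal queries fall out of the same replay idea: filter the stream to events at or before a cutoff, then fold. A minimal sketch (the `Evt` shape and `statusAt` helper are illustrative, not part of the code above):

```typescript
// Minimal event shape for the sketch (subset of PaymentEvent above)
interface Evt {
  eventType: string;
  timestamp: string; // ISO 8601, UTC
}

// Replay only events at or before the cutoff, then fold into a status.
// Same-format ISO 8601 UTC strings compare correctly lexicographically.
function statusAt(events: Evt[], cutoff: string): string {
  return events
    .filter((e) => e.timestamp <= cutoff)
    .reduce((status, e) => {
      switch (e.eventType) {
        case 'PAYMENT_INITIATED': return 'INITIATED';
        case 'FUNDS_FROZEN': return 'FUNDS_FROZEN';
        case 'GATEWAY_AUTHORIZED': return 'AUTHORIZED';
        case 'PAYMENT_CAPTURED': return 'CAPTURED';
        default: return status;
      }
    }, 'UNKNOWN');
}

const evts: Evt[] = [
  { eventType: 'PAYMENT_INITIATED', timestamp: '2024-12-01T10:00:00Z' },
  { eventType: 'FUNDS_FROZEN', timestamp: '2024-12-01T10:00:01Z' },
  { eventType: 'GATEWAY_AUTHORIZED', timestamp: '2024-12-01T10:00:03Z' },
  { eventType: 'PAYMENT_CAPTURED', timestamp: '2024-12-01T10:00:04Z' },
];

console.log(statusAt(evts, '2024-12-01T10:00:02Z')); // FUNDS_FROZEN
```

In production the cutoff filter would be a `WHERE created_at <= $1 ORDER BY version` on the event store rather than an in-memory filter.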
// Benefits for Paytm:
// 1. Full audit trail for RBI compliance
// 2. Can replay events to debug disputes ("show me exactly what happened at 10:00:03")
// 3. Build new read models without changing write path (CQRS)
// 4. Temporal queries: "What was the payment status at 10:00:02?"
Circuit Breaker Pattern for External Gateways
External payment gateways (UPI/NPCI, Visa, bank APIs) can fail or slow down. Without a circuit breaker, your system queues up requests, exhausts connections, and cascades failures.
enum CircuitState {
CLOSED = 'CLOSED', // Normal operation — requests pass through
OPEN = 'OPEN', // Failures exceeded threshold — reject immediately
HALF_OPEN = 'HALF_OPEN', // Testing — allow limited requests to check recovery
}
class CircuitBreaker {
private state: CircuitState = CircuitState.CLOSED;
private failureCount: number = 0;
private successCount: number = 0;
private lastFailureTime: number = 0;
private readonly failureThreshold: number;
private readonly recoveryTimeout: number; // ms before trying HALF_OPEN
private readonly halfOpenMaxAttempts: number;
constructor(options: {
failureThreshold: number; // e.g., 5 failures
recoveryTimeout: number; // e.g., 30000ms (30 seconds)
halfOpenMaxAttempts: number; // e.g., 3 test requests
}) {
this.failureThreshold = options.failureThreshold;
this.recoveryTimeout = options.recoveryTimeout;
this.halfOpenMaxAttempts = options.halfOpenMaxAttempts;
}
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === CircuitState.OPEN) {
// Check if recovery timeout has elapsed
if (Date.now() - this.lastFailureTime >= this.recoveryTimeout) {
this.state = CircuitState.HALF_OPEN;
this.successCount = 0;
console.log('Circuit breaker → HALF_OPEN: testing recovery');
} else {
throw new CircuitOpenError('Circuit is OPEN — gateway unavailable, try again later');
}
}
try {
const result = await fn();
if (this.state === CircuitState.HALF_OPEN) {
this.successCount++;
if (this.successCount >= this.halfOpenMaxAttempts) {
this.state = CircuitState.CLOSED;
this.failureCount = 0;
console.log('Circuit breaker → CLOSED: gateway recovered');
}
} else {
this.failureCount = 0; // Reset on success in CLOSED state
}
return result;
} catch (error) {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.failureThreshold) {
this.state = CircuitState.OPEN;
console.log(`Circuit breaker → OPEN: ${this.failureCount} consecutive failures`);
}
throw error;
}
}
}
// Usage: One circuit breaker per external gateway
const upiCircuit = new CircuitBreaker({
failureThreshold: 5,
recoveryTimeout: 30_000,
halfOpenMaxAttempts: 3,
});
const visaCircuit = new CircuitBreaker({
failureThreshold: 5,
recoveryTimeout: 60_000,
halfOpenMaxAttempts: 2,
});
async function processUPIPayment(request: PaymentRequest): Promise<GatewayResponse> {
return upiCircuit.execute(async () => {
const response = await fetch('https://upi-gateway.npci.org.in/authorize', {
method: 'POST',
body: JSON.stringify(request),
signal: AbortSignal.timeout(5000), // 5s timeout per request
});
if (!response.ok) throw new GatewayError(response.status);
return response.json();
});
}
// When UPI circuit is OPEN, payment service can:
// 1. Show user "UPI is temporarily unavailable, try card/wallet"
// 2. Queue the payment for retry
// 3. Route to backup UPI provider
Circuit Breaker State Machine:
success (count < threshold)
┌──────────┐
│ │
▼ │
┌─────────┐ │ failure count >= threshold ┌────────┐
│ CLOSED │─────┼─────────────────────────────────→│ OPEN │
│(normal) │ │ │(reject)│
└─────────┘ │ └───┬────┘
▲ │ │
│ │ recovery timeout elapsed
│ ┌────┴──────┐ │
│ │ HALF_OPEN │◄──────────────────────────────┘
└─────│ (test) │
N successes└───────────┘
failure → back to OPEN
Database Schema Sketch
// Core tables for the payment system
// payments — Main payment record
// CREATE TABLE payments (
// id VARCHAR(26) PRIMARY KEY, -- ULID (sortable, unique)
// idempotency_key VARCHAR(64) UNIQUE NOT NULL,
// user_id VARCHAR(26) NOT NULL,
// merchant_id VARCHAR(26) NOT NULL,
// amount BIGINT NOT NULL, -- Store in smallest unit (paise)
// currency VARCHAR(3) NOT NULL DEFAULT 'INR',
// status VARCHAR(20) NOT NULL, -- INITIATED, AUTHORIZED, CAPTURED, SETTLED, FAILED, REFUNDED
// method VARCHAR(20) NOT NULL, -- UPI, CARD, WALLET, NETBANKING
// gateway_ref VARCHAR(128), -- External gateway reference
// description TEXT,
// metadata JSONB, -- Flexible key-value (order_id, etc.)
// created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
// updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
// INDEX idx_payments_user (user_id, created_at DESC),
// INDEX idx_payments_merchant (merchant_id, created_at DESC),
// INDEX idx_payments_status (status, created_at),
// INDEX idx_payments_gateway (gateway_ref)
// );
// wallets — User wallet balances
// CREATE TABLE wallets (
// user_id VARCHAR(26) PRIMARY KEY,
// balance BIGINT NOT NULL DEFAULT 0, -- In paise, never negative (CHECK constraint)
// frozen_amount BIGINT NOT NULL DEFAULT 0, -- Amount held for in-flight payments
// version INTEGER NOT NULL DEFAULT 0, -- Optimistic locking
// updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
// CHECK (balance >= 0),
// CHECK (frozen_amount >= 0),
// CHECK (balance >= frozen_amount)
// );
// ledger_entries — Double-entry bookkeeping (immutable, append-only)
// CREATE TABLE ledger_entries (
// id BIGSERIAL PRIMARY KEY,
// payment_id VARCHAR(26) NOT NULL,
// debit_account VARCHAR(64) NOT NULL, -- e.g., "user:U_001"
// credit_account VARCHAR(64) NOT NULL, -- e.g., "merchant:M_042"
// amount BIGINT NOT NULL,
// entry_type VARCHAR(20) NOT NULL, -- PAYMENT, REFUND, SETTLEMENT, FEE
// created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
// INDEX idx_ledger_payment (payment_id),
// INDEX idx_ledger_account (debit_account, created_at),
// INDEX idx_ledger_account2 (credit_account, created_at)
// );
// payment_events — Event sourcing store (immutable, append-only)
// CREATE TABLE payment_events (
// id BIGSERIAL PRIMARY KEY,
// event_id VARCHAR(36) UNIQUE NOT NULL,
// payment_id VARCHAR(26) NOT NULL,
// event_type VARCHAR(40) NOT NULL,
// version INTEGER NOT NULL,
// actor VARCHAR(64) NOT NULL,
// data JSONB NOT NULL,
// metadata JSONB,
// created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
// UNIQUE (payment_id, version), -- No gaps, no duplicates per payment
// INDEX idx_events_payment (payment_id, version)
// );
// idempotency_keys — Prevents duplicate processing
// CREATE TABLE idempotency_keys (
// key VARCHAR(64) PRIMARY KEY,
// request_hash VARCHAR(64) NOT NULL,
// status VARCHAR(20) NOT NULL, -- PROCESSING, COMPLETED, FAILED
// response JSONB,
// created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
// expires_at TIMESTAMPTZ NOT NULL
// );
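This table backs the idempotency check that gates every payment request. A simplified in-memory sketch of the flow (in production the claim step is an atomic `INSERT ... ON CONFLICT DO NOTHING` on the primary key, so concurrent duplicates can't both win):

```typescript
type IdemRecord = { status: 'PROCESSING' | 'COMPLETED'; response?: string };

// Stand-in for the idempotency_keys table
const idemStore = new Map<string, IdemRecord>();

// Returns the cached response for a duplicate; otherwise processes exactly once.
function handlePayment(idempotencyKey: string, process: () => string): string {
  const existing = idemStore.get(idempotencyKey);
  if (existing?.status === 'COMPLETED') return existing.response!; // replay cached result
  if (existing?.status === 'PROCESSING') throw new Error('request already in flight');
  idemStore.set(idempotencyKey, { status: 'PROCESSING' }); // claim the key before doing work
  const response = process();
  idemStore.set(idempotencyKey, { status: 'COMPLETED', response });
  return response;
}

let charges = 0;
const doCharge = () => { charges++; return 'pay_01HXK4'; };

handlePayment('idem_abc', doCharge);
handlePayment('idem_abc', doCharge); // duplicate — served from cache, no second charge
console.log(charges); // 1
```

The `request_hash` column (not shown here) additionally catches clients that reuse a key with a different request body, which should be rejected rather than replayed.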
// refunds — Track refunds separately for reconciliation
// CREATE TABLE refunds (
// id VARCHAR(26) PRIMARY KEY,
// payment_id VARCHAR(26) NOT NULL REFERENCES payments(id),
// amount BIGINT NOT NULL,
// reason TEXT,
// status VARCHAR(20) NOT NULL, -- INITIATED, PROCESSING, COMPLETED, FAILED
// gateway_ref VARCHAR(128),
// created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
// updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
// INDEX idx_refunds_payment (payment_id)
// );
Design decisions:
- BIGINT for money: Store amounts in paise (the smallest unit). Never use FLOAT/DOUBLE for money — binary floating point cannot represent most decimal fractions exactly. DECIMAL is safe inside the database, but integer paise keep arithmetic exact end to end, including in application code.
- ULID for IDs: Sortable (unlike UUIDv4), globally unique, encodes timestamp. Better for B-tree index locality.
- Separate ledger table: Immutable, append-only. Makes reconciliation and auditing straightforward.
- Event sourcing table: Separate from the payments table. Payments table holds current state (for fast reads), events table holds full history.
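The integer-paise decision is easy to demonstrate: binary floating point cannot represent amounts like 10 paise (0.1 rupees) exactly, while integer arithmetic stays exact (within `Number`'s 2^53 safe range; beyond that, use `BigInt`):

```typescript
// Floating point: three 0.10-rupee amounts don't sum to 0.30 rupees
const floatSum = 0.1 + 0.1 + 0.1;
console.log(floatSum === 0.3); // false — floatSum is 0.30000000000000004

// Integer paise: exact
const paiseSum = 10 + 10 + 10;
console.log(paiseSum === 30); // true

// Convert to rupees for display only at the edge of the system
function formatINR(paise: number): string {
  const rupees = Math.trunc(paise / 100);
  const rem = Math.abs(paise % 100).toString().padStart(2, '0');
  return `Rs ${rupees}.${rem}`;
}

console.log(formatINR(150050)); // Rs 1500.50
```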
API Design
// ========================
// POST /v1/payments
// ========================
// Create a new payment
// Request
// Headers:
// Authorization: Bearer <token>
// Idempotency-Key: idem_<uuid>
// Content-Type: application/json
interface CreatePaymentRequest {
amount: number; // In paise (e.g., 150000 = Rs 1500)
currency: 'INR';
method: 'UPI' | 'CARD' | 'WALLET' | 'NETBANKING';
description?: string;
merchantId: string;
metadata?: Record<string, string>; // { orderId: "ORD_123" }
// Method-specific fields
upi?: { vpa: string };
card?: { token: string }; // Tokenized card (never raw card number)
netbanking?: { bankCode: string };
}
// Response (201 Created)
interface PaymentResponse {
id: string; // "pay_01HXK4..."
status: 'INITIATED' | 'AUTHORIZED' | 'CAPTURED' | 'SETTLED' | 'FAILED' | 'REFUNDED';
amount: number;
currency: string;
method: string;
merchantId: string;
gatewayRef?: string;
createdAt: string; // ISO 8601
updatedAt: string;
metadata?: Record<string, string>;
// For UPI: includes a deep link or collect request ID
nextAction?: {
type: 'UPI_COLLECT' | 'REDIRECT' | 'OTP';
url?: string;
collectRef?: string;
};
}
// Error Response (4xx/5xx)
interface ErrorResponse {
error: {
code: string; // "INSUFFICIENT_FUNDS", "INVALID_VPA", "GATEWAY_TIMEOUT"
message: string;
details?: Record<string, unknown>;
};
requestId: string; // For debugging
}
// ========================
// GET /v1/payments/:id
// ========================
// Retrieve payment details
// Response: PaymentResponse (same as above)
// 404 if not found
// Only accessible by the user or merchant associated with the payment
// ========================
// GET /v1/payments
// ========================
// List payments with filtering and pagination
// Query params:
// ?userId=U_001
// &status=CAPTURED
// &from=2024-01-01T00:00:00Z
// &to=2024-12-31T23:59:59Z
// &limit=20
// &cursor=pay_01HXK4... (cursor-based pagination)
interface PaymentListResponse {
data: PaymentResponse[];
pagination: {
hasMore: boolean;
nextCursor?: string; // Opaque cursor for next page
};
}
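Because ULIDs sort lexicographically by creation time, the cursor query reduces to a keyset predicate. A minimal in-memory sketch of one page fetch (the IDs are illustrative; the usual trick is to over-fetch one row to learn `hasMore`):

```typescript
interface Page { data: string[]; hasMore: boolean; nextCursor?: string }

// ids must be sorted descending (newest first), as a ULID index scan would return.
// SQL equivalent: SELECT id FROM payments WHERE id < $cursor ORDER BY id DESC LIMIT $limit + 1
function listPayments(ids: string[], limit: number, cursor?: string): Page {
  const filtered = cursor ? ids.filter((id) => id < cursor) : ids; // keyset: strictly past the cursor
  const page = filtered.slice(0, limit + 1); // over-fetch one row to detect another page
  const hasMore = page.length > limit;
  const data = page.slice(0, limit);
  return { data, hasMore, nextCursor: hasMore ? data[data.length - 1] : undefined };
}

const ids = ['pay_05', 'pay_04', 'pay_03', 'pay_02', 'pay_01']; // newest first
const page1 = listPayments(ids, 2);
console.log(page1.data, page1.nextCursor); // ['pay_05', 'pay_04'] 'pay_04'
const page2 = listPayments(ids, 2, page1.nextCursor);
console.log(page2.data, page2.hasMore); // ['pay_03', 'pay_02'] true
```

Unlike OFFSET pagination, a row inserted between page fetches never shifts or duplicates results, because the cursor pins an absolute position in the sort order.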
// ========================
// POST /v1/payments/:id/refund
// ========================
// Initiate a refund
interface RefundRequest {
amount?: number; // Partial refund amount in paise (omit for full refund)
reason?: string;
}
interface RefundResponse {
id: string; // "ref_01HXK5..."
paymentId: string;
amount: number;
status: 'INITIATED' | 'PROCESSING' | 'COMPLETED' | 'FAILED';
createdAt: string;
}
// ========================
// POST /v1/webhooks
// ========================
// Merchant registers a webhook URL to receive payment status updates
// Paytm sends POST to merchant's URL with:
interface WebhookPayload {
event: 'payment.authorized' | 'payment.captured' | 'payment.failed' | 'refund.completed';
data: PaymentResponse | RefundResponse;
timestamp: string;
signature: string; // HMAC-SHA256(payload, merchant_secret) — verify authenticity
}
// Webhook delivery:
// - Retry with exponential backoff (1s, 2s, 4s, 8s, ... up to 24h)
// - Expect 2xx response within 5 seconds
// - After 24h of failures, mark webhook as dead, alert merchant
API design decisions:
- Cursor-based pagination over offset: More efficient for large datasets, stable under concurrent writes. The cursor is the last payment ID (ULID is sortable).
- Paise for amounts: Avoids floating point issues. Rs 1500.50 = 150050 paise.
- Webhook signatures: HMAC-SHA256 so merchants can verify the payload came from Paytm, not an attacker.
- nextAction pattern: Different payment methods require different user actions (UPI collect, card 3DS redirect, OTP). The API tells the client what to do next instead of leaving the client to guess.
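On the merchant side, signature verification is a few lines with Node's built-in `crypto` module; the comparison must be constant-time to avoid leaking signature bytes through timing. A sketch under the assumption that the signature is hex-encoded HMAC-SHA256 of the raw request body:

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Compute HMAC-SHA256 over the raw body with the shared merchant secret
function sign(rawBody: string, secret: string): string {
  return createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Constant-time comparison — never compare signatures with ===
function verifyWebhook(rawBody: string, signature: string, secret: string): boolean {
  const expected = sign(rawBody, secret);
  if (signature.length !== expected.length) return false; // timingSafeEqual needs equal lengths
  return timingSafeEqual(Buffer.from(signature), Buffer.from(expected));
}

const secret = 'merchant_secret_42'; // illustrative value
const body = JSON.stringify({ event: 'payment.captured', paymentId: 'PAY_12345' });
console.log(verifyWebhook(body, sign(body, secret), secret)); // true
console.log(verifyWebhook(body, sign(body, 'wrong'), secret)); // false
```

Note that the merchant must verify against the raw bytes it received, not a re-serialized JSON object — re-serialization can reorder keys and change the HMAC input.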