
04 - HLD & System Design

Quick Reference (scan in 5 min)

| Topic | Key Points | Interview Tip |
|---|---|---|
| CAP Theorem | During a network partition, choose Consistency or Availability (Partition Tolerance is not optional). Payments = CP. | "Paytm wallet debits must be CP: a stale balance means double-spend." |
| SQL vs NoSQL | SQL for transactions (ACID), NoSQL for scale + flexible schema. | Always justify the choice with access patterns, not hype. |
| Indexing | B-tree for range queries, hash for exact lookup. Composite index column order matters. | "I'd add a composite index on (user_id, created_at) for payment history." |
| Sharding | Hash-based for even distribution, range-based for time-series, geo for compliance. | Mention consistent hashing with virtual nodes to handle rebalancing. |
| Caching | Cache-aside is the default. Write-through for consistency. Write-behind for throughput. | "For Paytm's merchant dashboard, cache-aside with a 30s TTL on analytics." |
| Message Queues | Kafka for high-throughput event streaming. RabbitMQ for task routing. | "Payment events go to Kafka, ordered per partition by payment_id." |
| Load Balancing | L4 for raw TCP speed, L7 for content-based routing. | Mention health checks and graceful draining during deploys. |
| ACID + Isolation | Serializable prevents all the standard anomalies but costs throughput (blocking or transaction retries). Read Committed is the practical default. | "For payment writes I'd use Serializable; for read-heavy dashboards, Read Committed." |
| Payment System Design | Idempotency keys + Saga pattern + Event sourcing + Circuit breaker. | Lead with idempotency: it's the #1 thing interviewers want to hear for fintech. |

1. CAP Theorem

The Three Guarantees

  • Consistency (C): Every read returns the most recent write. All nodes see the same data at the same time.
  • Availability (A): Every request receives a response (success or failure), even if some nodes are down.
  • Partition Tolerance (P): The system continues to operate despite network partitions between nodes.

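The popular shorthand is "pick 2 of 3", but the trade-off only applies when a network partition occurs: at that point the system must give up either consistency or availability. Since partitions are inevitable in distributed systems, the real choice is CP vs AP.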

CP vs AP — Real Examples

CP Systems (Consistency + Partition Tolerance)
├── Banking / Payment systems (Paytm wallet)
├── ZooKeeper (leader election)
├── HBase (strong consistency reads)
├── MongoDB (with majority write concern)
└── etcd (Raft consensus)

AP Systems (Availability + Partition Tolerance)
├── Social media feeds (eventual consistency is fine)
├── DNS (stale records are tolerable)
├── Cassandra (tunable consistency)
├── DynamoDB (default eventual consistency)
└── Couchbase

Why Paytm's Payment System is CP

```ts
// Scenario: User has Rs 500 in wallet, tries to pay Rs 400
// acquireLock, releaseLock and db are hypothetical helpers (e.g. a Redis- or
// etcd-backed lock and a DB client that can pin reads to the primary)

// CP system: consistent balance across nodes
async function debitWallet(userId: string, amount: number): Promise<PaymentResult> {
  // Step 1: Acquire distributed lock on user's wallet
  const lock = await acquireLock(`wallet:${userId}`, { ttl: 5000 });
  try {
    // Step 2: Read current balance from the primary/leader, never a lagging replica
    const balance = await db.readFromPrimary(`SELECT balance FROM wallets WHERE user_id = $1`, [userId]);

    // Step 3: Check sufficient funds
    if (balance < amount) {
      return { status: 'INSUFFICIENT_FUNDS' };
    }

    // Step 4: Debit atomically. The balance >= $1 guard is a second line of defence
    // in case the lock's TTL expired while this request was still running.
    const result = await db.execute(
      `UPDATE wallets SET balance = balance - $1 WHERE user_id = $2 AND balance >= $1`,
      [amount, userId]
    );
    if (result.rowCount === 0) {
      return { status: 'INSUFFICIENT_FUNDS' };
    }

    return { status: 'SUCCESS', newBalance: balance - amount };
  } finally {
    // Always release the lock, even if a query throws
    await releaseLock(lock);
  }
}

// AP system would risk: two concurrent requests both reading Rs 500,
// both succeeding, resulting in -Rs 300 balance (double-spend)
```

Key insight: During a network partition, a CP payment system will refuse requests (return errors) rather than risk processing with stale data. This is the correct trade-off for money.
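A toy model makes the difference visible. The sketch below uses a hypothetical `ReplicatedBalance` class that is deliberately naive. Real CP stores such as etcd use a consensus protocol like Raft instead. The class replicates one wallet balance to three nodes. In CP mode, a write is refused unless a majority of replicas is reachable. In AP mode, the write is applied to whatever replicas can be reached, so the copies diverge.

```typescript
type Mode = "CP" | "AP";

interface Replica {
  balance: number;
  reachable: boolean; // false = cut off from the coordinator by a network partition
}

// Hypothetical toy model of one replicated wallet balance (not a real database client)
class ReplicatedBalance {
  constructor(readonly replicas: Replica[], private readonly mode: Mode) {}

  write(balance: number): void {
    const live = this.replicas.filter((r) => r.reachable);
    const majority = Math.floor(this.replicas.length / 2) + 1;
    if (this.mode === "CP" && live.length < majority) {
      // CP: refuse the request instead of letting replicas disagree
      throw new Error("UNAVAILABLE: no majority quorum");
    }
    // AP (or CP with a quorum): apply to every replica we can reach
    for (const r of live) r.balance = balance;
  }
}

// Three replicas holding Rs 500; a partition isolates replicas 1 and 2
const makeReplicas = (): Replica[] => [
  { balance: 500, reachable: true },
  { balance: 500, reachable: false },
  { balance: 500, reachable: false },
];

const cp = new ReplicatedBalance(makeReplicas(), "CP");
let cpError = "";
try {
  cp.write(100);
} catch (e) {
  cpError = e instanceof Error ? e.message : String(e);
}
console.log(cpError, cp.replicas.map((r) => r.balance)); // error; all replicas still agree

const ap = new ReplicatedBalance(makeReplicas(), "AP");
ap.write(100);
console.log(ap.replicas.map((r) => r.balance)); // replicas now hold different balances
```

Under the partition, the CP store answers with an error and every replica still agrees. The AP store keeps serving, but it now holds two different balances that must be reconciled later.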


2. SQL vs NoSQL Decision Framework

Comparison Table

| Dimension | SQL (PostgreSQL, MySQL) | NoSQL (MongoDB, DynamoDB, Cassandra) |
|---|---|---|
| Data Model | Relational tables with fixed schema | Document, Key-Value, Column-Family, Graph |
| Schema | Strict schema, migrations required | Flexible / schema-on-read |
| ACID | Full ACID transactions | Varies: MongoDB has single-document atomicity plus multi-document transactions (4.0+), DynamoDB has transactional APIs, many stores default to BASE |
| Scaling | Vertical primarily; horizontal with effort (read replicas, sharding) | Horizontal by design |
| Joins | Native, efficient | Manual (application-level) or denormalized |
| Query Language | SQL (standardized, powerful) | Vendor-specific APIs / query languages |
| Consistency | Strong by default (on the primary; async read replicas can lag) | Tunable (eventual to strong) |
| Best For | Transactions, complex queries, relationships | High write throughput, flexible data, massive scale |

When to Use Each in Fintech

```ts
// SQL: use for transactional, relational data
// Paytm examples:
const sqlUseCases = {
  payments: "ACID transactions for debit/credit",
  userAccounts: "Relational data with KYC, addresses, bank links",
  merchantSettlements: "Complex joins across payments, fees, payouts",
  ledger: "Double-entry bookkeeping requires strict consistency",
  walletBalances: "Cannot tolerate eventual consistency on money",
};

// NoSQL: use for high-volume, flexible, or denormalized data
const noSqlUseCases = {
  sessionStore: "Redis: ephemeral, key-value access pattern",
  activityLogs: "DynamoDB: high write volume, simple access by userId",
  productCatalog: "MongoDB: varied product attributes, nested data",
  analytics: "Cassandra: time-series write-heavy analytics events",
  cache: "Redis: hot data like exchange rates, OTP attempts",
};
```

Decision Flowchart

Need ACID transactions?
├── Yes → SQL
└── No
    └── Need flexible schema?
        ├── Yes → Document DB (MongoDB)
        └── No
            └── Simple key-value access?
                ├── Yes → Redis / DynamoDB
                └── No
                    └── Write-heavy time-series?
                        ├── Yes → Cassandra / TimescaleDB
                        └── No → Evaluate SQL first
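The flowchart is an ordered list of checks, so it translates directly into a function. This is a minimal sketch. The `DataNeeds` field names are made up for illustration, and a real choice also weighs team skills, ops cost and query patterns.

```typescript
// Hypothetical encoding of the flowchart's questions
interface DataNeeds {
  acidTransactions: boolean;
  flexibleSchema: boolean;
  simpleKeyValueAccess: boolean;
  writeHeavyTimeSeries: boolean;
}

// Checks run in the same order as the flowchart; the first "Yes" wins
function chooseDatastore(needs: DataNeeds): string {
  if (needs.acidTransactions) return "SQL";
  if (needs.flexibleSchema) return "Document DB (MongoDB)";
  if (needs.simpleKeyValueAccess) return "Redis / DynamoDB";
  if (needs.writeHeavyTimeSeries) return "Cassandra / TimescaleDB";
  return "Evaluate SQL first";
}

// Paytm-style examples from the use-case lists above
const walletNeeds: DataNeeds = {
  acidTransactions: true, flexibleSchema: false, simpleKeyValueAccess: false, writeHeavyTimeSeries: false,
};
const sessionNeeds: DataNeeds = {
  acidTransactions: false, flexibleSchema: false, simpleKeyValueAccess: true, writeHeavyTimeSeries: false,
};
console.log(chooseDatastore(walletNeeds));  // SQL
console.log(chooseDatastore(sessionNeeds)); // Redis / DynamoDB
```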

3. Indexing & Sharding

Index Types

B-tree Index (default in most RDBMSs)

  • Balanced tree structure, O(log n) lookups
  • Supports range queries (WHERE created_at > '2024-01-01')
  • Supports ordering (ORDER BY created_at DESC)
  • Good for: payment history by date range, user lookups

Hash Index

  • O(1) exact-match lookups
  • No range query support
  • Good for: idempotency key lookups, session lookups

Composite Index

  • Multi-column index — column order matters
  • Follows the leftmost prefix rule: index on (a, b, c) supports queries on (a), (a, b), and (a, b, c) but NOT (b) or (c) alone

```ts
// Example: Payment history queries for Paytm

// Common queries:
// 1. All payments for a user               → WHERE user_id = ?
// 2. User's payments in a date range       → WHERE user_id = ? AND created_at BETWEEN ? AND ?
// 3. User's payments of a specific status  → WHERE user_id = ? AND status = ?

// Best composite indexes (query 1 is served by the user_id prefix of either one):
// CREATE INDEX idx_payments_user_date ON payments(user_id, created_at DESC);
// CREATE INDEX idx_payments_user_status ON payments(user_id, status);

// Covering index: includes every column the query needs, avoiding table lookups
// (INCLUDE is PostgreSQL 11+ syntax)
// CREATE INDEX idx_payments_cover ON payments(user_id, created_at DESC)
//   INCLUDE (amount, status, merchant_id);
// Query: SELECT amount, status, merchant_id FROM payments
//        WHERE user_id = ? ORDER BY created_at DESC LIMIT 20;
// PostgreSQL can answer this with an index-only scan, skipping the heap for
// pages the visibility map marks as all-visible.
```
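To check a query against the leftmost prefix rule, count how many leading index columns the query filters on. The function name below is hypothetical, and the model is simplified: it treats every filtered column like an equality predicate. Real planners stop using later columns after a range predicate, and some add extras such as skip scans.

```typescript
// How many leading index columns can a query seek on?
// 0 means the index can't be used for seeking (it might still be scanned).
// Simplification: every filtered column is treated as an equality predicate.
function usablePrefixLength(indexColumns: string[], filteredColumns: string[]): number {
  const filtered = new Set(filteredColumns); // order in the WHERE clause doesn't matter
  let length = 0;
  for (const column of indexColumns) {
    if (!filtered.has(column)) break; // the first gap ends the usable prefix
    length++;
  }
  return length;
}

const userDateIndex = ["user_id", "created_at"];
console.log(usablePrefixLength(userDateIndex, ["user_id"]));               // 1
console.log(usablePrefixLength(userDateIndex, ["user_id", "created_at"])); // 2
console.log(usablePrefixLength(userDateIndex, ["created_at"]));            // 0
console.log(usablePrefixLength(userDateIndex, ["status", "user_id"]));     // 1
```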

Sharding Strategies

Range-Based Sharding

Shard by created_at:
  Shard 1: Jan-Mar 2024
  Shard 2: Apr-Jun 2024
  Shard 3: Jul-Sep 2024

Pros: Good for time-range queries, natural archival
Cons: Hot shard problem (latest shard gets all writes)
Use case: Log/analytics data where old data is rarely accessed

Hash-Based Sharding

Shard = hash(user_id) % num_shards

Pros: Even distribution of writes
Cons: Range queries require scatter-gather across all shards
Use case: Paytm user wallets — even distribution of balance lookups
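The catch with `hash % num_shards` shows up when resharding. The sketch below counts how many keys change shard when going from 3 to 4 shards. It assumes integer keys stand in for already-hashed values.

```typescript
// Fraction of keys whose shard changes when num_shards goes from `before` to `after`
// under shard = key % num_shards (keys are assumed to be already-hashed integers)
function fractionMovedByModulo(numKeys: number, before: number, after: number): number {
  let moved = 0;
  for (let key = 0; key < numKeys; key++) {
    if (key % before !== key % after) moved++;
  }
  return moved / numKeys;
}

console.log(fractionMovedByModulo(12000, 3, 4)); // 0.75
```

Three quarters of the data would have to move to a different shard. That cost is the motivation for consistent hashing.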

Geographic Sharding

Shard by region:
  Shard India-North: Delhi, UP, Punjab users
  Shard India-South: Karnataka, TN, Kerala users
  Shard India-West: Maharashtra, Gujarat users

Pros: Data locality, compliance (data residency), lower latency
Cons: Uneven shard sizes, cross-region transactions are complex
Use case: Compliance requirements, latency-sensitive fintech apps

Consistent Hashing with Virtual Nodes

```ts
// Problem: with shard = hash % N, changing N remaps almost ALL keys
// Solution: consistent hashing; adding a node moves only ~K/N keys
// (K = number of keys, N = number of nodes after the change)

class ConsistentHashRing {
  private ring = new Map<number, string>(); // position → nodeId
  private sortedPositions: number[] = [];

  constructor(private readonly virtualNodesPerNode: number = 150) {}

  addNode(nodeId: string): void {
    // Each physical node gets many positions on the ring (virtual nodes),
    // so keys spread evenly even with few physical nodes
    for (let i = 0; i < this.virtualNodesPerNode; i++) {
      const position = this.hash(`${nodeId}:vn${i}`);
      if (this.ring.has(position)) continue; // rare hash collision: keep the existing owner
      this.ring.set(position, nodeId);
      this.sortedPositions.push(position);
    }
    this.sortedPositions.sort((a, b) => a - b);
  }

  removeNode(nodeId: string): void {
    // Delete only the positions this node owns (safe even if hashes collided)
    this.ring.forEach((owner, position) => {
      if (owner === nodeId) this.ring.delete(position);
    });
    this.sortedPositions = this.sortedPositions.filter((p) => this.ring.has(p));
  }

  getNode(key: string): string {
    if (this.sortedPositions.length === 0) throw new Error('Ring is empty');
    const keyHash = this.hash(key);
    // Walk clockwise: binary search for the first position >= keyHash
    let lo = 0;
    let hi = this.sortedPositions.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.sortedPositions[mid] < keyHash) lo = mid + 1;
      else hi = mid;
    }
    // Past the last position: wrap around to the first node
    const position = this.sortedPositions[lo % this.sortedPositions.length];
    return this.ring.get(position)!;
  }

  private hash(key: string): number {
    // Simplified 32-bit string hash; in production use MurmurHash3 or xxHash
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash + key.charCodeAt(i)) | 0;
    }
    return hash >>> 0; // unsigned 32-bit ring position
  }
}

// Usage: Distributing Paytm user wallets across DB shards
const ring = new ConsistentHashRing(150);
ring.addNode("db-shard-1");
ring.addNode("db-shard-2");
ring.addNode("db-shard-3");

const shard = ring.getNode("user:12345"); // same shard every time until nodes are added/removed
// Adding db-shard-4 only moves ~25% of keys, not the ~75% that hash % N would
```

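Why virtual nodes? With one position per physical node, each node owns a single arc of the ring, and arc lengths can be very uneven (3 nodes might end up with a 60/25/15 split). With 150 virtual nodes each, every node owns many small arcs. Given a reasonably uniform hash function, the split then approaches 33/33/33.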


4. Caching Strategies

Cache-Aside (Lazy Loading)

Read path:
┌────────┐    1. GET     ┌───────┐
│  App   │──────────────→│ Cache │
│ Server │←──────────────│(Redis)│
│        │   2. HIT?     │       │
│        │               └───────┘
│        │   3. MISS → read DB
│        │──────────────→┌────────┐
│        │←──────────────│   DB   │
│        │   4. Return   └────────┘
│        │
│        │──5. SET cache─→ Cache
└────────┘

Write path:
App writes to DB directly, then invalidates/deletes cache key.

```ts
async function getPaymentDetails(paymentId: string): Promise<Payment | null> {
  // 1. Check cache first
  const cached = await redis.get(`payment:${paymentId}`);
  if (cached) return JSON.parse(cached);

  // 2. Cache miss: read from DB
  const payment = await db.query('SELECT * FROM payments WHERE id = $1', [paymentId]);
  if (!payment) return null; // don't cache "null" for IDs that don't exist

  // 3. Populate cache with TTL
  await redis.setex(`payment:${paymentId}`, 300, JSON.stringify(payment)); // 5 min TTL

  return payment;
}

async function updatePaymentStatus(paymentId: string, status: string): Promise<void> {
  await db.query('UPDATE payments SET status = $1 WHERE id = $2', [status, paymentId]);
  // Invalidate (delete) rather than update the cache: the next read fetches fresh data
  await redis.del(`payment:${paymentId}`);
}
```

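When to use: Most common pattern and a good default; the app controls cache population. Trade-off: The first request after a miss is slow. Stale data is possible if the invalidation fails, or if a concurrent reader re-populates the cache with a value it read before the update committed. A short TTL bounds how long either problem lasts.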

Read-Through

┌────────┐  1. GET   ┌──────────────────┐  2. MISS → auto-fetch  ┌────────┐
│  App   │──────────→│  Cache (manages   │──────────────────────→│   DB   │
│ Server │←──────────│  its own loading) │←──────────────────────│        │
│        │  3. Data  │                   │  4. Data              └────────┘
└────────┘           └──────────────────┘
                     Cache auto-populates on miss

When to use: Cache library handles DB fetching. Simpler app code. Same data source always. Trade-off: Cache library must know how to query your DB. Less flexible than cache-aside.
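Here is a minimal read-through sketch. It assumes a synchronous `loader` that stands in for the DB query (real loaders are async) and an injectable clock so TTL expiry can be tested. The class name is hypothetical. The point is that callers only ever call `cache.get`, and the cache decides when to hit the source.

```typescript
// The cache owns the loader and populates itself on a miss or after expiry
class ReadThroughCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly loader: (key: string) => V, // stand-in for the DB query
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now,
  ) {}

  get(key: string): V {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.value;
    const value = this.loader(key); // miss or expired: the cache fetches from the source
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}

// Usage: count how often the "DB" is hit
let dbReads = 0;
let clock = 0;
const cache = new ReadThroughCache<string>(
  (key) => {
    dbReads++;
    return `row-for-${key}`;
  },
  30000,
  () => clock,
);
cache.get("merchant:42"); // miss: loader runs
cache.get("merchant:42"); // hit: served from cache
clock = 31000;
cache.get("merchant:42"); // expired after the 30s TTL: loader runs again
console.log(dbReads); // 2
```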

Write-Through

┌────────┐  1. WRITE  ┌───────┐  2. Sync write  ┌────────┐
│  App   │───────────→│ Cache │─────────────────→│   DB   │
│ Server │←───────────│(Redis)│←─────────────────│        │
│        │  4. ACK    │       │  3. ACK          └────────┘
└────────┘            └───────┘
                      Cache + DB always in sync
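
```ts
async function createPayment(payment: Payment): Promise<void> {
  // Application-level write-through: write the DB (source of truth), then the cache,
  // both before returning to the caller. A write-through cache library would
  // perform the DB write itself.
  await db.query('INSERT INTO payments ...', [payment]);
  await redis.setex(`payment:${payment.id}`, 3600, JSON.stringify(payment));
  // The two writes are not atomic: if the cache write fails, the cache can serve
  // an older value for this key until it expires.
}
```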

When to use: The cache must stay in sync with the DB on every write, and data is read soon after it is written. Trade-off: Higher write latency (two writes per operation), and data that is never read still gets cached (a TTL limits this).

Write-Behind (Write-Back)

┌────────┐  1. WRITE  ┌───────┐
│  App   │───────────→│ Cache │ ── 2. ACK (immediate)
│ Server │←───────────│(Redis)│
└────────┘            │       │  3. Async batch write (later)
                      │       │─────────────────→┌────────┐
                      └───────┘                  │   DB   │
                                                 └────────┘

When to use: High write throughput needed. Can tolerate brief inconsistency. Trade-off: Data loss risk if cache crashes before flushing to DB. Complex failure handling. Fintech note: Generally NOT suitable for payment transactions. Acceptable for analytics counters (e.g., page views on merchant dashboard).
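Here is a minimal write-behind sketch for the analytics-counter case. A `Map` stands in for the database, and the class and method names are hypothetical. Writes update the cache and mark the key dirty, and `flush()` pushes the batch later. Anything still dirty when the process dies is lost, which is why this pattern stays away from payments.

```typescript
// Writes are acknowledged from the cache; the DB write is deferred to flush()
class WriteBehindCache {
  private cache = new Map<string, number>();
  private dirty = new Set<string>();

  constructor(private readonly store: Map<string, number>) {} // stand-in for the DB

  increment(key: string, by = 1): void {
    const current = this.cache.get(key) ?? this.store.get(key) ?? 0;
    this.cache.set(key, current + by);
    this.dirty.add(key); // ACK immediately; persisted on the next flush
  }

  get(key: string): number {
    return this.cache.get(key) ?? this.store.get(key) ?? 0;
  }

  // In production this would run on a timer or when the dirty set reaches a size limit
  flush(): number {
    const batch = Array.from(this.dirty);
    for (const key of batch) this.store.set(key, this.cache.get(key)!);
    this.dirty.clear();
    return batch.length; // number of keys written in this batch
  }
}

const db = new Map<string, number>();
const pageViews = new WriteBehindCache(db);
pageViews.increment("views:merchant-1");
pageViews.increment("views:merchant-1");
pageViews.increment("views:merchant-1");
console.log(db.has("views:merchant-1")); // false: nothing persisted yet
pageViews.flush();
console.log(db.get("views:merchant-1")); // 3
```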

Redis vs Memcached

| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes, Streams | Strings only |
| Persistence | RDB snapshots + AOF | None (pure cache) |
| Replication | Built-in master-replica | None |
| Clustering | Redis Cluster (auto-sharding) | Client-side sharding |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes (atomic operations) | No |
| Memory Efficiency | Higher overhead per key | More memory-efficient for simple strings |
| Multithreading | Single-threaded (I/O threads in 6.0+) | Multithreaded |
| Use in Fintech | Rate limiting, session store, leaderboards, distributed locks | Simple high-throughput caching |

Default choice for Paytm: Redis, because you need data structures (sorted sets for rate limiting), persistence (survive restarts), and pub/sub (real-time notifications).
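Sorted sets suit rate limiting because each request can be stored with its timestamp as the score. Trimming entries older than the window and counting what is left gives a sliding-window log. The sketch below mimics those steps with an in-memory array per key rather than a Redis client. In Redis, the steps would map to ZREMRANGEBYSCORE, ZCARD and ZADD, usually run atomically in a Lua script or a MULTI/EXEC block. The class name is hypothetical, and the sketch assumes timestamps arrive in non-decreasing order per key.

```typescript
// Sliding-window log: at most `limit` requests per `windowMs` for each key
class SlidingWindowLimiter {
  private logs = new Map<string, number[]>(); // key → ascending timestamps (like a sorted set)

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    // Drop timestamps that fell out of the window (Redis: ZREMRANGEBYSCORE)
    const log = (this.logs.get(key) ?? []).filter((t) => t > nowMs - this.windowMs);
    this.logs.set(key, log);
    if (log.length >= this.limit) return false; // Redis: ZCARD
    log.push(nowMs);                            // Redis: ZADD
    return true;
  }
}

const limiter = new SlidingWindowLimiter(3, 1000); // 3 requests per second per user
const results = [0, 100, 200, 300, 1001].map((t) => limiter.allow("user:U1", t));
console.log(results); // true, true, true, false, true
```

The 4th request inside the first second is rejected. By t=1001 the request from t=0 has left the window, so there is room again.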


5. Message Queues

Kafka vs RabbitMQ

| Dimension | Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed commit log (pull-based) | Message broker (push-based) |
| Ordering | Per-partition ordering guaranteed | Per-queue ordering (single consumer) |
| Throughput | Millions of messages/sec at cluster scale | Typically tens of thousands/sec |
| Retention | Configurable (days/weeks/forever) | Classic queues delete messages once acknowledged |
| Consumer Model | Consumer groups (each group gets all messages) | Competing consumers (each message to one consumer) |
| Replay | Yes: consumers can seek to any retained offset | No for classic queues: once consumed, gone (RabbitMQ Streams do support replay) |
| Delivery | At-least-once (exactly-once with idempotent producer + transactions) | At-most-once or at-least-once (configurable) |
| Routing | Topic + partition | Exchanges, bindings, routing keys (flexible) |
| Best For | Event streaming, audit logs, high throughput | Task queues, RPC, complex routing |

When to Use Each in Fintech

```ts
// Kafka: event streaming, audit trail, analytics pipeline
// "Every payment state change is an event appended to a Kafka topic"

// Topic: payment-events, partitioned by payment_id (ordering per payment)
interface PaymentEvent {
  eventId: string;
  paymentId: string;      // partition key: all events for a payment land in the same partition
  eventType: 'CREATED' | 'AUTHORIZED' | 'CAPTURED' | 'SETTLED' | 'REFUNDED' | 'FAILED';
  timestamp: number;
  payload: Record<string, unknown>;
}

// Consumer group: settlement-service reads all payment events
// Consumer group: analytics-service reads all payment events (independently)
// Consumer group: notification-service reads all payment events
// Each group maintains its own offset; Kafka retains messages for all groups

// RabbitMQ: task queue for async jobs with complex routing
// "Send email notification after payment success"

// Exchange: notifications (type: topic)
// Binding key: payment.success.email → Queue: email-notifications
// Binding key: payment.success.sms   → Queue: sms-notifications
// Binding key: payment.failure.*     → Queue: failure-alerts
// (Publishers send concrete routing keys such as payment.failure.card;
//  the binding patterns above decide which queues receive them.)

// RabbitMQ excels here because:
// 1. Complex routing rules (topic exchange with wildcards)
// 2. Per-message acknowledgment (re-queue on failure)
// 3. Dead letter queue for failed messages
// 4. Priority queues (VIP merchant notifications first)
```
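Topic-exchange wildcards follow AMQP rules: keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more. The small matcher below shows which queues from the bindings above receive a given routing key. The function names are hypothetical, since RabbitMQ does this matching inside the broker.

```typescript
// Does a binding pattern (with * and #) match a concrete routing key?
function topicMatches(pattern: string, routingKey: string): boolean {
  const p = pattern.split(".");
  const k = routingKey.split(".");

  const match = (i: number, j: number): boolean => {
    if (i === p.length) return j === k.length;
    if (p[i] === "#") {
      // "#" absorbs zero words (move past it) or one more word (stay on it)
      return match(i + 1, j) || (j < k.length && match(i, j + 1));
    }
    if (j === k.length) return false;
    return (p[i] === "*" || p[i] === k[j]) && match(i + 1, j + 1);
  };

  return match(0, 0);
}

const bindings: Array<{ pattern: string; queue: string }> = [
  { pattern: "payment.success.email", queue: "email-notifications" },
  { pattern: "payment.success.sms", queue: "sms-notifications" },
  { pattern: "payment.failure.*", queue: "failure-alerts" },
];

const queuesFor = (routingKey: string): string[] =>
  bindings.filter((b) => topicMatches(b.pattern, routingKey)).map((b) => b.queue);

console.log(queuesFor("payment.failure.card"));       // failure-alerts
console.log(queuesFor("payment.failure.card.retry")); // none: * matches exactly one word
```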

Ordering Guarantees

Kafka:
  - Ordering guaranteed WITHIN a partition
  - No ordering across partitions
  - Strategy: Use payment_id as partition key
    → All events for payment "PAY_123" go to same partition
    → CREATED always comes before CAPTURED for that payment

RabbitMQ:
  - Ordering guaranteed within a single queue with a single consumer
  - Multiple consumers on same queue → no ordering guarantee
  - Use single consumer or message grouping for ordering
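Per-key ordering falls out of deterministic partitioning: the same key always hashes to the same partition, and each partition is an append-only log. In this sketch, FNV-1a stands in for the hash and plain arrays stand in for partitions. Kafka's default partitioner actually uses murmur2 on the key bytes.

```typescript
// FNV-1a 32-bit hash: an illustrative stand-in for Kafka's murmur2-based partitioner
function fnv1a(key: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

function partitionFor(key: string, numPartitions: number): number {
  return fnv1a(key) % numPartitions;
}

// Simulated topic: each partition is an append-only array (a log)
const numPartitions = 6;
const partitions: string[][] = Array.from({ length: numPartitions }, () => [] as string[]);

const produced: Array<{ paymentId: string; type: string }> = [
  { paymentId: "PAY_123", type: "CREATED" },
  { paymentId: "PAY_999", type: "CREATED" },
  { paymentId: "PAY_123", type: "AUTHORIZED" },
  { paymentId: "PAY_999", type: "FAILED" },
  { paymentId: "PAY_123", type: "CAPTURED" },
];
for (const e of produced) {
  partitions[partitionFor(e.paymentId, numPartitions)].push(`${e.paymentId}:${e.type}`);
}

// A consumer of PAY_123's partition sees its events in production order
const pay123Events = partitions[partitionFor("PAY_123", numPartitions)]
  .filter((e) => e.startsWith("PAY_123:"));
console.log(pay123Events); // CREATED, AUTHORIZED, CAPTURED in that order
```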

Delivery Semantics

At-most-once:  Fire and forget. May lose messages. Fast.
               Use: Analytics events where losing 0.1% is acceptable.

At-least-once: Retry until ACK. May duplicate. Most common.
               Use: Payment notifications — duplicates are safe if idempotent.

Exactly-once:  No loss, no duplicates. Hard / expensive.
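               Kafka: Achieved via the idempotent producer + transactions, with consumers
               reading at isolation.level=read_committed. This covers Kafka-to-Kafka
               pipelines; side effects in external systems still need idempotent handling.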
               Use: Financial ledger entries — duplicates cause accounting errors.
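At-least-once delivery becomes effectively-once processing when the consumer deduplicates on a unique event ID. The sketch below is minimal and in-memory, and its names are hypothetical. In production, the processed-ID record and the ledger update would be committed in the same DB transaction so that a crash can't separate them.

```typescript
interface LedgerEvent {
  eventId: string;   // unique per event; redeliveries carry the same ID
  accountId: string;
  amount: number;    // in paise, to avoid floating-point money
}

// Dedup on eventId turns at-least-once delivery into effectively-once processing
class IdempotentLedgerConsumer {
  private processed = new Set<string>();
  readonly balances = new Map<string, number>();

  handle(event: LedgerEvent): boolean {
    if (this.processed.has(event.eventId)) return false; // duplicate redelivery: skip
    const current = this.balances.get(event.accountId) ?? 0;
    this.balances.set(event.accountId, current + event.amount);
    this.processed.add(event.eventId);
    return true;
  }
}

const consumer = new IdempotentLedgerConsumer();
const delivery: LedgerEvent[] = [
  { eventId: "evt-1", accountId: "M1", amount: 40000 },
  { eventId: "evt-1", accountId: "M1", amount: 40000 }, // redelivered after a lost ACK
  { eventId: "evt-2", accountId: "M1", amount: 10000 },
];
delivery.forEach((e) => consumer.handle(e));
console.log(consumer.balances.get("M1")); // 50000
```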

6. Load Balancing

Algorithms

Round Robin

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  (cycle repeats)

Pros: Simple, even distribution
Cons: Ignores server load — a slow server gets same traffic as a fast one
Use: Stateless services with homogeneous servers

Weighted Round Robin

Server A (weight 5): Gets 5 out of every 8 requests
Server B (weight 2): Gets 2 out of every 8 requests
Server C (weight 1): Gets 1 out of every 8 requests

Use: Mixed hardware — beefy servers get more traffic
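Sending A five requests in a row and then B twice satisfies the weights, but it produces bursts. The smooth weighted round robin that nginx uses for weighted upstreams interleaves picks instead. The sketch below uses the 5/2/1 weights above, and with equal weights it reduces to plain round robin.

```typescript
interface WeightedServer {
  name: string;
  weight: number;
  current: number; // running score; start every server at 0
}

// Smooth weighted round robin: on each pick, add every server's weight to its
// score, choose the highest score, then subtract the total weight from the winner
function pickSmooth(servers: WeightedServer[]): string {
  if (servers.length === 0) throw new Error("no servers");
  const total = servers.reduce((sum, s) => sum + s.weight, 0);
  let best = servers[0];
  for (const s of servers) {
    s.current += s.weight;
    if (s.current > best.current) best = s; // ties go to the earlier server
  }
  best.current -= total;
  return best.name;
}

const pool: WeightedServer[] = [
  { name: "A", weight: 5, current: 0 },
  { name: "B", weight: 2, current: 0 },
  { name: "C", weight: 1, current: 0 },
];
const picks: string[] = [];
for (let i = 0; i < 8; i++) picks.push(pickSmooth(pool));
console.log(picks.join(" ")); // A B A A C A B A
```

Each block of 8 picks contains 5 A, 2 B and 1 C, and after the 8th pick all scores are back to 0, so the pattern repeats.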

Least Connections

Server A: 12 active connections → skip
Server B:  3 active connections → ROUTE HERE
Server C:  7 active connections → skip

Pros: Adapts to actual server load
Cons: Requires tracking connection counts
Use: Long-lived connections (WebSockets for real-time payment status)
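A least-connections picker is a minimum search over live connection counts. The sketch below uses the counts from the example. It assumes only healthy backends are passed in and that the caller decrements `activeConnections` when a connection closes.

```typescript
interface Backend {
  name: string;
  activeConnections: number;
}

// Route to the backend with the fewest open connections (first one wins ties)
function pickLeastConnections(backends: Backend[]): Backend {
  if (backends.length === 0) throw new Error("no healthy backends");
  let best = backends[0];
  for (const b of backends) {
    if (b.activeConnections < best.activeConnections) best = b;
  }
  best.activeConnections++; // count the connection we're about to open
  return best;
}

const backends: Backend[] = [
  { name: "A", activeConnections: 12 },
  { name: "B", activeConnections: 3 },
  { name: "C", activeConnections: 7 },
];
console.log(pickLeastConnections(backends).name); // B
```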

Consistent Hashing

hash(user_id) → always same server (unless topology changes)

Pros: Session affinity without sticky sessions, minimal redistribution on scale
Cons: Uneven if hash function is poor
Use: Caching layers, stateful services

L4 vs L7 Load Balancing

| Aspect | L4 (Transport Layer) | L7 (Application Layer) |
|---|---|---|
| Operates On | TCP/UDP packets (IP + port) | HTTP headers, URL path, cookies, body |
| Speed | Faster: no payload inspection | Slower: parses each HTTP request |
| Routing | By IP and port only | By URL path, header, cookie, etc. |
| TLS Termination | Usually pass-through (end-to-end encryption); some, like AWS NLB, can also terminate TLS | Terminates TLS; can re-encrypt to the backend |
| Use Case | Raw TCP services, database connections | API routing, A/B testing, canary deploys |
| Example | AWS NLB | AWS ALB, Nginx, HAProxy (L7 mode) |

Paytm Load Balancing Architecture:

Internet → CDN (static assets)
        → L7 ALB
            ├── /api/payments/*    → Payment Service cluster
            ├── /api/merchants/*   → Merchant Service cluster
            ├── /api/wallets/*     → Wallet Service cluster
            └── /api/analytics/*   → Analytics Service cluster

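Each service cluster has its own internal L4 NLB for service-to-service gRPC calls. Caveat: gRPC multiplexes many calls over long-lived HTTP/2 connections, and an L4 balancer spreads connections, not requests, so load can pile up on a few backends. Client-side or HTTP/2-aware L7 balancing is a common alternative.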
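The path-based rules in the diagram amount to prefix matching. The sketch below uses the routes above, with illustrative service names. It picks the longest matching prefix, the convention Nginx uses for prefix `location` blocks. AWS ALB instead evaluates listener rules in explicit priority order, so treat this as a model of the idea rather than ALB's algorithm.

```typescript
interface Route {
  prefix: string;
  target: string;
}

const routes: Route[] = [
  { prefix: "/api/payments/", target: "payment-service" },
  { prefix: "/api/merchants/", target: "merchant-service" },
  { prefix: "/api/wallets/", target: "wallet-service" },
  { prefix: "/api/analytics/", target: "analytics-service" },
];

// Longest matching prefix wins, so a more specific rule (e.g. "/api/payments/refunds/")
// could be added later without reordering the list
function routeRequest(path: string): string | undefined {
  let best: Route | undefined;
  for (const r of routes) {
    if (path.startsWith(r.prefix) && (!best || r.prefix.length > best.prefix.length)) {
      best = r;
    }
  }
  return best?.target; // undefined → the LB would answer 404 or use a default rule
}

console.log(routeRequest("/api/wallets/U1/balance")); // wallet-service
console.log(routeRequest("/api/unknown"));            // undefined
```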

7. ACID + Isolation Levels

ACID Properties

| Property | Meaning | Fintech Example |
|---|---|---|
| Atomicity | All or nothing: partial transactions roll back | Debit wallet + credit merchant: both succeed or both fail |
| Consistency | DB moves from one valid state to another | Balance never goes negative (CHECK constraint) |
| Isolation | Concurrent transactions don't interfere | Two simultaneous wallet debits don't double-spend |
| Durability | Committed data survives crashes | Payment confirmed = written to disk, not just memory |

Isolation Levels & Anomalies

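| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Fast (PostgreSQL default) |
| Repeatable Read | Prevented | Prevented | Possible | Medium (MySQL InnoDB default) |
| Serializable | Prevented | Prevented | Prevented | Slowest |

These are the SQL-standard minimums, and real engines can be stricter. PostgreSQL treats Read Uncommitted as Read Committed. Its Repeatable Read (snapshot isolation) also prevents phantom reads, though it still allows write skew.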

Anomaly definitions:

  • Dirty Read: Reading data from an uncommitted transaction (it might roll back).
  • Non-Repeatable Read: Re-reading a row gives different values because another transaction committed between reads.
  • Phantom Read: Re-running a range query returns new rows that were inserted by another committed transaction.

```ts
// Practical example: Double-spend prevention

// BAD: a read-then-write "lost update" that Read Committed allows:
// T1: SELECT balance FROM wallets WHERE user_id = 'U1';  → 500
// T2: SELECT balance FROM wallets WHERE user_id = 'U1';  → 500
// T1: UPDATE wallets SET balance = 500 - 400 WHERE user_id = 'U1';  → 100
// T2: UPDATE wallets SET balance = 500 - 400 WHERE user_id = 'U1';  → 100
// Both succeed! User spent Rs 800 with only Rs 500.

// SOLUTION 1: Serializable isolation
// How the conflict surfaces depends on the engine:
// - Lock-based (e.g. MySQL InnoDB SERIALIZABLE): plain SELECTs take shared locks, the two
//   UPDATEs then deadlock, and one transaction is rolled back.
// - PostgreSQL (SSI): both SELECTs succeed without blocking; T2's UPDATE fails with a
//   serialization error (SQLSTATE 40001) once T1 commits.
// Either way the app retries T2, which now reads 100 < 400 → payment rejected.

// SOLUTION 2: Optimistic locking (better performance when conflicts are rare)
// db is a hypothetical client, as in the earlier examples
const MAX_RETRIES = 3;

async function debitWithOptimisticLock(userId: string, amount: number): Promise<boolean> {
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    const row = await db.query(
      'SELECT balance, version FROM wallets WHERE user_id = $1',
      [userId]
    );
    if (!row || row.balance < amount) return false;

    const result = await db.query(
      `UPDATE wallets
       SET balance = balance - $1, version = version + 1
       WHERE user_id = $2 AND version = $3`,
      [amount, userId, row.version]
    );

    // rowCount = 1: our version was still current, debit applied
    // rowCount = 0: another transaction modified the row first → re-read and retry
    if (result.rowCount === 1) return true;
  }
  // Too much contention: fail loudly instead of recursing forever
  throw new Error('Wallet is busy, please retry');
}

// SOLUTION 3: Pessimistic locking (SELECT ... FOR UPDATE)
async function debitWithPessimisticLock(userId: string, amount: number): Promise<boolean> {
  return db.transaction(async (tx) => {
    // Acquires row-level lock; other transactions block here until commit/rollback
    const { balance } = await tx.query(
      'SELECT balance FROM wallets WHERE user_id = $1 FOR UPDATE',
      [userId]
    );

    if (balance < amount) return false;

    await tx.query(
      'UPDATE wallets SET balance = balance - $1 WHERE user_id = $2',
      [amount, userId]
    );

    return true;
  });
}
```

Optimistic vs Pessimistic Locking

| Aspect | Optimistic | Pessimistic |
|---|---|---|
| Mechanism | Version column; detect conflict at write time | Lock row at read time (SELECT ... FOR UPDATE) |
| Blocking | No blocking on reads | Blocks other transactions on locked rows |
| Conflict Rate | Best when conflicts are rare | Best when conflicts are frequent |
| Retry | Application must retry on version mismatch | No retry needed: waits for lock |
| Deadlock Risk | None | Possible (if locking order isn't consistent) |
| Use Case | Merchant profile updates (low contention) | Wallet balance debits (high contention on popular wallets) |

8. Design Problem: Payment System at Scale

This is the big one. A payment system at Paytm scale: millions of transactions/day, multiple payment methods, strict consistency, full audit trail.

Functional Requirements

  1. Initiate payment — User pays merchant via UPI / card / wallet / netbanking
  2. Process payment — Validate, authorize, capture funds
  3. Payment status — Real-time status tracking (PENDING → AUTHORIZED → CAPTURED → SETTLED)
  4. Refunds — Full or partial refund with money returned to source
  5. Payment history — User and merchant can view past transactions
  6. Merchant settlement — Batch settle funds to merchant bank accounts (T+1 / T+2)
  7. Notifications — Real-time updates via webhook + push + SMS

Non-Functional Requirements

| Requirement | Target |
|---|---|
| Availability | 99.99% (< 53 min downtime/year) |
| Latency | Payment initiation < 500ms p99 |
| Throughput | 10,000+ TPS peak (festive sales) |
| Consistency | Strong consistency for balance operations |
| Durability | Zero data loss for financial transactions |
| Idempotency | Duplicate requests must not double-charge |
| Auditability | Full audit trail, immutable event log |
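The downtime budget is plain arithmetic on the availability target. The sketch below assumes a 365.25-day year; some SLAs count 365 days or budget per month instead.

```typescript
// Allowed downtime per year for an availability target given in percent
function downtimeMinutesPerYear(availabilityPercent: number): number {
  const minutesPerYear = 365.25 * 24 * 60; // 525,960 minutes
  return (1 - availabilityPercent / 100) * minutesPerYear;
}

console.log(downtimeMinutesPerYear(99.99).toFixed(1)); // 52.6 (the "< 53 min" above)
console.log(downtimeMinutesPerYear(99.9).toFixed(1));  // 526.0, i.e. ~8.8 hours
```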

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT LAYER                                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────────┐  ┌────────────┐                  │
│  │ Mobile   │  │ Web App  │  │ Merchant SDK │  │ Merchant   │                  │
│  │ App      │  │          │  │ (checkout)   │  │ Server API │                  │
│  └────┬─────┘  └────┬─────┘  └──────┬───────┘  └─────┬──────┘                  │
└───────┼──────────────┼───────────────┼────────────────┼─────────────────────────┘
        │              │               │                │
        ▼              ▼               ▼                ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                            API GATEWAY (L7 LB)                                  │
│  ┌────────────────────────────────────────────────────────────────────────┐      │
│  │ Rate Limiting │ Auth (JWT/OAuth) │ Request Routing │ Idempotency Check│      │
│  └────────────────────────────────────────────────────────────────────────┘      │
└─────────┬───────────────┬───────────────────┬──────────────────┬────────────────┘
          │               │                   │                  │
          ▼               ▼                   ▼                  ▼
┌──────────────┐ ┌──────────────┐  ┌──────────────────┐ ┌──────────────────┐
│   Payment    │ │   Wallet     │  │   Settlement     │ │  Notification    │
│   Service    │ │   Service    │  │   Service        │ │  Service         │
│              │ │              │  │                  │ │                  │
│ - Initiate   │ │ - Balance    │  │ - Batch settle   │ │ - Webhooks       │
│ - Authorize  │ │ - Debit      │  │ - Reconcile      │ │ - Push / SMS     │
│ - Capture    │ │ - Credit     │  │ - Payout         │ │ - Email          │
│ - Refund     │ │ - Freeze     │  │                  │ │                  │
└──────┬───────┘ └──────┬───────┘  └────────┬─────────┘ └──────────────────┘
       │                │                   │
       ▼                ▼                   ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│                           DATA LAYER                                            │
│                                                                                 │
│  ┌──────────────────┐  ┌───────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ PostgreSQL       │  │ Redis         │  │ Kafka        │  │ S3 / Object   │  │
│  │ (primary DB)     │  │ (cache +      │  │ (event bus)  │  │ Store         │  │
│  │                  │  │  locks +      │  │              │  │ (receipts,    │  │
│  │ - Payments       │  │  rate limit)  │  │ - Payment    │  │  statements)  │  │
│  │ - Wallets        │  │              │  │   events     │  │               │  │
│  │ - Merchants      │  │              │  │ - Audit log  │  │               │  │
│  │ - Ledger         │  │              │  │              │  │               │  │
│  └──────────────────┘  └───────────────┘  └──────────────┘  └───────────────┘  │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────────┐
│                     EXTERNAL PAYMENT GATEWAYS                                   │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐     │
│  │ UPI/NPCI │  │ Card     │  │ Net      │  │ Paytm    │  │ Bank APIs    │     │
│  │          │  │ Networks │  │ Banking  │  │ Wallet   │  │ (settlements)│     │
│  │          │  │(Visa/MC) │  │          │  │          │  │              │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘  └──────────────┘     │
└─────────────────────────────────────────────────────────────────────────────────┘

Idempotency Keys

Why critical: Network failures, timeouts, and retries mean the same request can hit your server multiple times. Without idempotency, a user gets charged twice.

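```ts
import { createHash } from 'crypto';

// Client sends a unique idempotency key with every payment request
// POST /payments
// Headers: { "Idempotency-Key": "idem_abc123xyz" }
// `db` is assumed to be a node-postgres-style client: query() resolves to { rows, rowCount }.
// processPayment, ConflictError and isDefinitiveFailure are hypothetical helpers.

interface IdempotencyRecord {
  key: string;           // PRIMARY KEY: the unique constraint makes the claim in Step 1 atomic
  status: 'PROCESSING' | 'COMPLETED' | 'FAILED';
  requestHash: string;   // Hash of request body, to detect mismatched reuse
  response: unknown;     // Cached response (JSONB) to return on duplicate
  createdAt: Date;
  expiresAt: Date;       // Auto-cleanup after 24-48 hours
}

async function handlePaymentWithIdempotency(
  idempotencyKey: string,
  request: CreatePaymentRequest
): Promise<PaymentResponse> {
  // JSON.stringify is key-order sensitive; canonicalize the body in production
  const requestHash = createHash('sha256').update(JSON.stringify(request)).digest('hex');

  // Step 1: Try to claim the key. The unique constraint lets only one concurrent
  // request insert it. (SELECT ... FOR UPDATE cannot lock a row that doesn't exist
  // yet, so it can't stop two first-time duplicates from racing.)
  const claim = await db.query(
    `INSERT INTO idempotency_keys (key, status, request_hash, created_at, expires_at)
     VALUES ($1, 'PROCESSING', $2, NOW(), NOW() + INTERVAL '24 hours')
     ON CONFLICT (key) DO NOTHING`,
    [idempotencyKey, requestHash]
  );

  if (claim.rowCount === 0) {
    // Step 2: We've seen this key before; decide based on the stored record
    const { rows } = await db.query(
      'SELECT status, request_hash, response FROM idempotency_keys WHERE key = $1',
      [idempotencyKey]
    );
    const existing = rows[0];
    if (!existing) {
      throw new ConflictError('Idempotency record changed concurrently, retry');
    }

    // Same key, different body = client bug
    if (existing.request_hash !== requestHash) {
      throw new Error('Idempotency key reused with different request body');
    }

    if (existing.status === 'COMPLETED') {
      // Return cached response, no re-processing
      return existing.response as PaymentResponse;
    }

    if (existing.status === 'PROCESSING') {
      // Another request is in-flight: return 409 Conflict (or poll until it finishes)
      throw new ConflictError('Payment is already being processed');
    }

    // FAILED: re-claim atomically so only one retry proceeds
    const reclaim = await db.query(
      `UPDATE idempotency_keys SET status = 'PROCESSING'
       WHERE key = $1 AND status = 'FAILED'`,
      [idempotencyKey]
    );
    if (reclaim.rowCount === 0) {
      throw new ConflictError('Payment is already being processed');
    }
  }

  try {
    // Step 3: Process payment
    const response = await processPayment(request);

    // Step 4: Store the response for future duplicates
    await db.query(
      `UPDATE idempotency_keys SET status = 'COMPLETED', response = $1 WHERE key = $2`,
      [JSON.stringify(response), idempotencyKey]
    );

    return response;
  } catch (error) {
    // Release the key for retry only when the payment definitely did not happen.
    // Ambiguous errors (e.g. a gateway timeout) stay PROCESSING until reconciled,
    // otherwise a client retry could charge twice.
    if (isDefinitiveFailure(error)) {
      await db.query(
        `UPDATE idempotency_keys SET status = 'FAILED' WHERE key = $1`,
        [idempotencyKey]
      );
    }
    throw error;
  }
}
```

Key design decisions:

  • Key is client-generated (UUID v4) so retries use the same key
  • A unique constraint plus INSERT ... ON CONFLICT DO NOTHING makes the claim atomic: exactly one of several concurrent duplicates wins
  • Request hash catches misuse (same key, different amount)
  • Ambiguous failures keep the key in PROCESSING until the outcome is reconciled, so a retry can't double-charge
  • TTL-based cleanup prevents unbounded table growth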

Saga Pattern for Distributed Transactions

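A payment involves multiple services (wallet, payment gateway, ledger, notifications). Traditional two-phase commit (2PC) is slow and fragile. Participants hold locks until the coordinator decides, a coordinator failure can leave them blocked, and external gateways such as UPI or card networks don't take part in your 2PC anyway. Sagas break the transaction into local steps, each paired with a compensating action that undoes it if a later step fails.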

ts
// Orchestrator-based Saga for a payment flow

interface SagaStep {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>;  // Undo this step if a later step fails
}

class PaymentSaga {
  private executedSteps: SagaStep[] = [];

  async run(paymentId: string, request: CreatePaymentRequest): Promise<void> {
    const steps: SagaStep[] = [
      {
        name: 'VALIDATE_PAYMENT',
        execute: async () => {
          await paymentService.validate(request);
          await paymentService.setStatus(paymentId, 'VALIDATED');
        },
        compensate: async () => {
          await paymentService.setStatus(paymentId, 'VALIDATION_REVERSED');
        },
      },
      {
        name: 'FREEZE_FUNDS',
        execute: async () => {
          // Put a hold on funds (don't debit yet)
          await walletService.freezeAmount(request.userId, request.amount, paymentId);
          await paymentService.setStatus(paymentId, 'FUNDS_FROZEN');
        },
        compensate: async () => {
          // Release the hold
          await walletService.unfreezeAmount(request.userId, request.amount, paymentId);
          await paymentService.setStatus(paymentId, 'FUNDS_RELEASED');
        },
      },
      {
        name: 'AUTHORIZE_WITH_GATEWAY',
        execute: async () => {
          // Call external payment gateway (UPI/card network)
          const authResult = await paymentGateway.authorize(request);
          await paymentService.setGatewayRef(paymentId, authResult.gatewayRef);
          await paymentService.setStatus(paymentId, 'AUTHORIZED');
        },
        compensate: async () => {
          // Void the authorization with the gateway
          await paymentGateway.voidAuthorization(paymentId);
          await paymentService.setStatus(paymentId, 'AUTH_VOIDED');
        },
      },
      {
        name: 'CAPTURE_AND_DEBIT',
        execute: async () => {
          // Actually debit the frozen funds
          await walletService.captureFreeze(request.userId, request.amount, paymentId);
          await paymentService.setStatus(paymentId, 'CAPTURED');
        },
        compensate: async () => {
          // Refund back to wallet
          await walletService.credit(request.userId, request.amount, paymentId);
          await paymentService.setStatus(paymentId, 'CAPTURE_REVERSED');
        },
      },
      {
        name: 'RECORD_IN_LEDGER',
        execute: async () => {
          // Double-entry bookkeeping: debit user, credit merchant
          await ledgerService.recordEntry({
            debit: { account: `user:${request.userId}`, amount: request.amount },
            credit: { account: `merchant:${request.merchantId}`, amount: request.amount },
            paymentId,
          });
          await paymentService.setStatus(paymentId, 'SETTLED');
        },
        compensate: async () => {
          // Reverse ledger entry
          await ledgerService.recordReversal(paymentId);
          await paymentService.setStatus(paymentId, 'LEDGER_REVERSED');
        },
      },
    ];

    for (const step of steps) {
      try {
        await step.execute();
        this.executedSteps.push(step);
      } catch (error) {
        console.error(`Saga step ${step.name} failed:`, error);
        // Compensate completed steps in reverse order, then record the terminal status
        // (setting FAILED first would be overwritten by the compensations' own statuses)
        await this.rollback();
        await paymentService.setStatus(paymentId, 'FAILED');
        throw new PaymentFailedError(paymentId, step.name, error);
      }
    }
  }

  private async rollback(): Promise<void> {
    // Execute compensating actions in reverse order (copy first: reverse() mutates in place)
    for (const step of [...this.executedSteps].reverse()) {
      try {
        await step.compensate();
      } catch (compensateError) {
        // Log for manual intervention: compensation failures are critical
        console.error(`CRITICAL: Compensation failed for ${step.name}`, compensateError);
        await alertOpsTeam(step.name, compensateError);
        // Continue compensating remaining steps
      }
    }
  }
}

// Usage (amount in paise, matching the API and schema: 150000 = Rs 1500)
const saga = new PaymentSaga();
await saga.run('PAY_12345', {
  userId: 'U_001',
  merchantId: 'M_042',
  amount: 150_000,
  currency: 'INR',
  method: 'UPI',
});

Why Saga over 2PC?

  • 2PC holds locks across services — kills throughput at Paytm scale
  • Saga allows each service to commit independently
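  • Failures are undone by explicit compensating actions instead of a blocking coordinator; the trade-off is weaker isolation, since other readers can see intermediate states such as FUNDS_FROZEN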
  • Works across heterogeneous systems (SQL + NoSQL + external APIs)
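To see the compensation order without the real wallet, gateway, and ledger services, here is a self-contained, in-memory sketch in which each step only appends to a log. The step names mirror the saga above; runSaga, makeStep, demo, and the simulated gateway failure are hypothetical and exist only for illustration.

```typescript
interface Step {
  name: string;
  execute: () => Promise<void>;
  compensate: () => Promise<void>;
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;
}

// Run steps in order; on the first failure, undo completed steps in reverse order
async function runSaga(steps: Step[]): Promise<SagaResult> {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      await step.execute();
      done.push(step);
    } catch {
      for (const completed of [...done].reverse()) {
        await completed.compensate(); // a real orchestrator would alert if this throws
      }
      return { ok: false, failedStep: step.name };
    }
  }
  return { ok: true };
}

// Simulated step that only records what happened; `fail` forces execute() to throw
function makeStep(name: string, log: string[], fail = false): Step {
  return {
    name,
    execute: async () => {
      if (fail) throw new Error(`${name} failed`);
      log.push(`do:${name}`);
    },
    compensate: async () => {
      log.push(`undo:${name}`);
    },
  };
}

// Scenario: gateway authorization fails after funds were frozen
async function demo(): Promise<{ result: SagaResult; log: string[] }> {
  const log: string[] = [];
  const result = await runSaga([
    makeStep('VALIDATE_PAYMENT', log),
    makeStep('FREEZE_FUNDS', log),
    makeStep('AUTHORIZE_WITH_GATEWAY', log, true),
    makeStep('CAPTURE_AND_DEBIT', log),
  ]);
  return { result, log };
}

demo().then(({ log }) => console.log(log.join(' → ')));
// do:VALIDATE_PAYMENT → do:FREEZE_FUNDS → undo:FREEZE_FUNDS → undo:VALIDATE_PAYMENT
```

CAPTURE_AND_DEBIT never runs, and only the two steps that actually completed are compensated, newest first.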

Event Sourcing for Audit Trail

Instead of storing only the current state, store every state change as an immutable event. This gives you a complete, append-only audit trail (tamper-evident once the events are also hash-chained or signed), which is critical for financial compliance.

ts
// Every payment state change is an immutable event in Kafka + event store

interface PaymentEvent {
  eventId: string;          // Globally unique
  paymentId: string;        // Aggregate ID
  eventType: string;
  version: number;          // Monotonically increasing per payment
  timestamp: string;        // ISO 8601
  actor: string;            // Who triggered this (user, system, admin)
  data: Record<string, unknown>;
  metadata: {
    correlationId: string;  // Trace across services
    source: string;         // Which service emitted this
  };
}

// Example event stream for a single payment (amounts in paise, like the schema: 150000 = Rs 1500):
const paymentEvents: PaymentEvent[] = [
  {
    eventId: "evt_001", paymentId: "PAY_12345", eventType: "PAYMENT_INITIATED",
    version: 1, timestamp: "2024-12-01T10:00:00Z", actor: "user:U_001",
    data: { amount: 150000, currency: "INR", method: "UPI", merchantId: "M_042" },
    metadata: { correlationId: "corr_abc", source: "payment-service" }
  },
  {
    eventId: "evt_002", paymentId: "PAY_12345", eventType: "FUNDS_FROZEN",
    version: 2, timestamp: "2024-12-01T10:00:01Z", actor: "system:wallet-service",
    data: { frozenAmount: 150000, walletBalance: 350000 },
    metadata: { correlationId: "corr_abc", source: "wallet-service" }
  },
  {
    eventId: "evt_003", paymentId: "PAY_12345", eventType: "GATEWAY_AUTHORIZED",
    version: 3, timestamp: "2024-12-01T10:00:03Z", actor: "system:gateway-service",
    data: { gatewayRef: "UPI_REF_789", rrn: "432109876543" },
    metadata: { correlationId: "corr_abc", source: "gateway-service" }
  },
  {
    eventId: "evt_004", paymentId: "PAY_12345", eventType: "PAYMENT_CAPTURED",
    version: 4, timestamp: "2024-12-01T10:00:04Z", actor: "system:payment-service",
    data: { capturedAmount: 150000 },
    metadata: { correlationId: "corr_abc", source: "payment-service" }
  },
];

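// Read model produced by replay
interface PaymentState {
  id: string;
  amount: number;
  status: string;
  method: string;
  gatewayRef?: string;
  refundedAmount?: number;
}

// Rebuild current state from events (event replay), applied in version order
function rebuildPaymentState(events: PaymentEvent[]): PaymentState {
  return [...events].sort((a, b) => a.version - b.version).reduce((state, event) => {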
    switch (event.eventType) {
      case 'PAYMENT_INITIATED':
        return {
          ...state,
          id: event.paymentId,
          amount: event.data.amount as number,
          status: 'INITIATED',
          method: event.data.method as string,
        };
      case 'FUNDS_FROZEN':
        return { ...state, status: 'FUNDS_FROZEN' };
      case 'GATEWAY_AUTHORIZED':
        return { ...state, status: 'AUTHORIZED', gatewayRef: event.data.gatewayRef as string };
      case 'PAYMENT_CAPTURED':
        return { ...state, status: 'CAPTURED' };
      case 'PAYMENT_REFUNDED':
        return { ...state, status: 'REFUNDED', refundedAmount: event.data.amount as number };
      default:
        return state;
    }
  }, {} as PaymentState);
}

// Benefits for Paytm:
// 1. Full audit trail for RBI compliance
// 2. Can replay events to debug disputes ("show me exactly what happened at 10:00:03")
// 3. Build new read models without changing write path (CQRS)
// 4. Temporal queries: "What was the payment status at 10:00:02?"
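Benefit 4 (temporal queries) falls out of replay: keep only the events at or before a point in time, then fold them. A minimal self-contained sketch, assuming ISO 8601 UTC timestamps as in the example events; statusAt and STATUS_BY_EVENT are hypothetical names, and the event shape is trimmed to the fields the query needs.

```typescript
interface TimedEvent {
  version: number;
  timestamp: string; // ISO 8601, e.g. "2024-12-01T10:00:01Z"
  eventType: string;
}

// Status produced by each event type (mirrors rebuildPaymentState above)
const STATUS_BY_EVENT: Record<string, string> = {
  PAYMENT_INITIATED: 'INITIATED',
  FUNDS_FROZEN: 'FUNDS_FROZEN',
  GATEWAY_AUTHORIZED: 'AUTHORIZED',
  PAYMENT_CAPTURED: 'CAPTURED',
  PAYMENT_REFUNDED: 'REFUNDED',
};

// Replay, in version order, only the events that happened at or before `at`
function statusAt(events: TimedEvent[], at: string): string | undefined {
  const cutoff = Date.parse(at);
  return [...events]
    .sort((a, b) => a.version - b.version)
    .filter((e) => Date.parse(e.timestamp) <= cutoff)
    .reduce<string | undefined>((status, e) => STATUS_BY_EVENT[e.eventType] ?? status, undefined);
}

const stream: TimedEvent[] = [
  { version: 1, timestamp: '2024-12-01T10:00:00Z', eventType: 'PAYMENT_INITIATED' },
  { version: 2, timestamp: '2024-12-01T10:00:01Z', eventType: 'FUNDS_FROZEN' },
  { version: 3, timestamp: '2024-12-01T10:00:03Z', eventType: 'GATEWAY_AUTHORIZED' },
  { version: 4, timestamp: '2024-12-01T10:00:04Z', eventType: 'PAYMENT_CAPTURED' },
];

console.log(statusAt(stream, '2024-12-01T10:00:02Z')); // FUNDS_FROZEN
```

A query before the first event returns undefined, meaning the payment didn't exist yet at that time.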

Circuit Breaker Pattern for External Gateways

External payment gateways (UPI/NPCI, Visa, bank APIs) can fail or slow down. Without a circuit breaker, your system queues up requests, exhausts connections, and cascades failures.

ts
enum CircuitState {
  CLOSED = 'CLOSED',         // Normal operation — requests pass through
  OPEN = 'OPEN',             // Failures exceeded threshold — reject immediately
  HALF_OPEN = 'HALF_OPEN',   // Testing: enough trial successes close it, any failure re-opens it
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private successCount: number = 0;
  private lastFailureTime: number = 0;
  private readonly failureThreshold: number;
  private readonly recoveryTimeout: number;     // ms before trying HALF_OPEN
  private readonly halfOpenMaxAttempts: number;

  constructor(options: {
    failureThreshold: number;  // e.g., 5 failures
    recoveryTimeout: number;   // e.g., 30000ms (30 seconds)
    halfOpenMaxAttempts: number; // e.g., 3 successful trial requests needed to close
  }) {
    this.failureThreshold = options.failureThreshold;
    this.recoveryTimeout = options.recoveryTimeout;
    this.halfOpenMaxAttempts = options.halfOpenMaxAttempts;
  }

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      // Check if recovery timeout has elapsed
      if (Date.now() - this.lastFailureTime >= this.recoveryTimeout) {
        this.state = CircuitState.HALF_OPEN;
        this.successCount = 0;
        console.log('Circuit breaker → HALF_OPEN: testing recovery');
      } else {
        throw new CircuitOpenError('Circuit is OPEN — gateway unavailable, try again later');
      }
    }

    try {
      const result = await fn();

      if (this.state === CircuitState.HALF_OPEN) {
        this.successCount++;
        if (this.successCount >= this.halfOpenMaxAttempts) {
          this.state = CircuitState.CLOSED;
          this.failureCount = 0;
          console.log('Circuit breaker → CLOSED: gateway recovered');
        }
      } else {
        this.failureCount = 0; // Reset on success in CLOSED state
      }

      return result;
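    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = Date.now();

      // A failed trial request re-opens immediately; in CLOSED, open once the threshold is hit
      if (this.state === CircuitState.HALF_OPEN || this.failureCount >= this.failureThreshold) {
        this.state = CircuitState.OPEN;
        console.log(`Circuit breaker → OPEN: ${this.failureCount} consecutive failures`);
      }

      throw error;
    }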
  }
}

// Usage: One circuit breaker per external gateway
const upiCircuit = new CircuitBreaker({
  failureThreshold: 5,
  recoveryTimeout: 30_000,
  halfOpenMaxAttempts: 3,
});

const visaCircuit = new CircuitBreaker({
  failureThreshold: 5,
  recoveryTimeout: 60_000,
  halfOpenMaxAttempts: 2,
});

async function processUPIPayment(request: PaymentRequest): Promise<GatewayResponse> {
  return upiCircuit.execute(async () => {
    const response = await fetch('https://upi-gateway.npci.org.in/authorize', { // illustrative URL
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(request),
      signal: AbortSignal.timeout(5000), // 5s timeout per request
    });

    if (!response.ok) throw new GatewayError(response.status);
    return response.json();
  });
}

// When UPI circuit is OPEN, payment service can:
// 1. Show user "UPI is temporarily unavailable, try card/wallet"
// 2. Queue the payment for retry
// 3. Route to backup UPI provider
Circuit Breaker State Machine:

         success, or failure with count < threshold
         ┌──────────┐
         │          │
         ▼          │
    ┌─────────┐     │    failure count >= threshold    ┌────────┐
    │ CLOSED  │─────┼─────────────────────────────────→│  OPEN  │
    │(normal) │     │                                  │(reject)│
    └─────────┘     │                                  └───┬────┘
         ▲          │                                      │
         │          │                          recovery timeout elapsed
         │     ┌────┴──────┐                               │
         │     │ HALF_OPEN │◄──────────────────────────────┘
         └─────│  (test)   │
    N successes└───────────┘
               failure → back to OPEN
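The diagram can also be written as a pure transition function, which is easy to unit-test apart from timers and network calls. A sketch under the same rules as the class above (open at the failure threshold, half-open after the recovery timeout, close after N trial successes, re-open on any trial failure); transition, BreakerEvent, BreakerSnapshot, and BreakerConfig are hypothetical names.

```typescript
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

type BreakerEvent =
  | { kind: 'success' }
  | { kind: 'failure' }
  | { kind: 'timeoutElapsed' }; // recovery timeout passed while OPEN

interface BreakerSnapshot {
  state: BreakerState;
  failures: number;  // consecutive failures while CLOSED
  successes: number; // successful trial requests while HALF_OPEN
}

interface BreakerConfig {
  failureThreshold: number;
  halfOpenSuccesses: number;
}

function transition(s: BreakerSnapshot, e: BreakerEvent, cfg: BreakerConfig): BreakerSnapshot {
  switch (s.state) {
    case 'CLOSED':
      if (e.kind === 'success') return { ...s, failures: 0 };
      if (e.kind === 'failure') {
        const failures = s.failures + 1;
        return failures >= cfg.failureThreshold
          ? { state: 'OPEN', failures, successes: 0 }
          : { ...s, failures };
      }
      return s;
    case 'OPEN':
      // Requests are rejected; only the recovery timeout moves the breaker on
      return e.kind === 'timeoutElapsed' ? { state: 'HALF_OPEN', failures: 0, successes: 0 } : s;
    case 'HALF_OPEN':
      if (e.kind === 'failure') return { state: 'OPEN', failures: 0, successes: 0 };
      if (e.kind === 'success') {
        const successes = s.successes + 1;
        return successes >= cfg.halfOpenSuccesses
          ? { state: 'CLOSED', failures: 0, successes: 0 }
          : { ...s, successes };
      }
      return s;
    default:
      return s;
  }
}

const cfg: BreakerConfig = { failureThreshold: 3, halfOpenSuccesses: 2 };
const initial: BreakerSnapshot = { state: 'CLOSED', failures: 0, successes: 0 };
const trace: BreakerEvent[] = [
  { kind: 'failure' }, { kind: 'failure' }, { kind: 'failure' }, // 3rd failure: OPEN
  { kind: 'timeoutElapsed' },                                     // HALF_OPEN
  { kind: 'success' }, { kind: 'success' },                       // 2nd success: CLOSED
];
const end = trace.reduce((s, e) => transition(s, e, cfg), initial);
console.log(end.state); // CLOSED
```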

Database Schema Sketch

ts
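// Core tables for the payment system (PostgreSQL types; the inline INDEX lines are
// shorthand, since PostgreSQL needs a separate CREATE INDEX statement for each)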

// payments — Main payment record
// CREATE TABLE payments (
//   id              VARCHAR(26) PRIMARY KEY,     -- ULID (sortable, unique)
//   idempotency_key VARCHAR(64) UNIQUE NOT NULL,
//   user_id         VARCHAR(26) NOT NULL,
//   merchant_id     VARCHAR(26) NOT NULL,
//   amount          BIGINT NOT NULL,             -- Store in smallest unit (paise)
//   currency        VARCHAR(3) NOT NULL DEFAULT 'INR',
//   status          VARCHAR(20) NOT NULL,        -- INITIATED, AUTHORIZED, CAPTURED, SETTLED, FAILED, REFUNDED
//   method          VARCHAR(20) NOT NULL,        -- UPI, CARD, WALLET, NETBANKING
//   gateway_ref     VARCHAR(128),                -- External gateway reference
//   description     TEXT,
//   metadata        JSONB,                       -- Flexible key-value (order_id, etc.)
//   created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//   updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
//   INDEX idx_payments_user      (user_id, created_at DESC),
//   INDEX idx_payments_merchant  (merchant_id, created_at DESC),
//   INDEX idx_payments_status    (status, created_at),
//   INDEX idx_payments_gateway   (gateway_ref)
// );

// wallets — User wallet balances
// CREATE TABLE wallets (
//   user_id         VARCHAR(26) PRIMARY KEY,
//   balance         BIGINT NOT NULL DEFAULT 0,   -- In paise, never negative (CHECK constraint)
//   frozen_amount   BIGINT NOT NULL DEFAULT 0,   -- Amount held for in-flight payments
//   version         INTEGER NOT NULL DEFAULT 0,  -- Optimistic locking
//   updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
//   CHECK (balance >= 0),
//   CHECK (frozen_amount >= 0),
//   CHECK (balance >= frozen_amount)
// );

// ledger_entries — Double-entry bookkeeping (immutable, append-only)
// CREATE TABLE ledger_entries (
//   id              BIGSERIAL PRIMARY KEY,
//   payment_id      VARCHAR(26) NOT NULL,
//   debit_account   VARCHAR(64) NOT NULL,        -- e.g., "user:U_001"
//   credit_account  VARCHAR(64) NOT NULL,        -- e.g., "merchant:M_042"
//   amount          BIGINT NOT NULL,
//   entry_type      VARCHAR(20) NOT NULL,        -- PAYMENT, REFUND, SETTLEMENT, FEE
//   created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
//   INDEX idx_ledger_payment  (payment_id),
//   INDEX idx_ledger_account  (debit_account, created_at),
//   INDEX idx_ledger_account2 (credit_account, created_at)
// );

// payment_events — Event sourcing store (immutable, append-only)
// CREATE TABLE payment_events (
//   id              BIGSERIAL PRIMARY KEY,
//   event_id        VARCHAR(36) UNIQUE NOT NULL,
//   payment_id      VARCHAR(26) NOT NULL,
//   event_type      VARCHAR(40) NOT NULL,
//   version         INTEGER NOT NULL,
//   actor           VARCHAR(64) NOT NULL,
//   data            JSONB NOT NULL,
//   metadata        JSONB,
//   created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
//   UNIQUE (payment_id, version)                 -- No duplicate versions per payment; its index also serves replay
//                                                -- (gap-free versions are the writer's job: next = last + 1)
// );

// idempotency_keys — Prevents duplicate processing
// CREATE TABLE idempotency_keys (
//   key             VARCHAR(64) PRIMARY KEY,
//   request_hash    VARCHAR(64) NOT NULL,
//   status          VARCHAR(20) NOT NULL,        -- PROCESSING, COMPLETED, FAILED
//   response        JSONB,
//   created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//   expires_at      TIMESTAMPTZ NOT NULL
// );

// refunds — Track refunds separately for reconciliation
// CREATE TABLE refunds (
//   id              VARCHAR(26) PRIMARY KEY,
//   payment_id      VARCHAR(26) NOT NULL REFERENCES payments(id),
//   amount          BIGINT NOT NULL,
//   reason          TEXT,
//   status          VARCHAR(20) NOT NULL,        -- INITIATED, PROCESSING, COMPLETED, FAILED
//   gateway_ref     VARCHAR(128),
//   created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//   updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
//
//   INDEX idx_refunds_payment (payment_id)
// );

Design decisions:

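  • BIGINT for money: Store amounts as integer paise (the smallest unit). FLOAT/DOUBLE can't represent values like 0.1 exactly, and every JavaScript number is a double, so rupee amounts with decimals pick up rounding errors in application code. Integers are exact in JS up to 2^53 − 1, far beyond any single payment.
  • ULID for IDs: Lexicographically sortable by creation time (unlike UUIDv4) because the timestamp is the prefix, while still globally unique. Time-ordered keys are inserted at the right-hand edge of the B-tree instead of on random pages, which gives better index locality.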
  • Separate ledger table: Immutable, append-only. Makes reconciliation and auditing straightforward.
  • Event sourcing table: Separate from the payments table. Payments table holds current state (for fast reads), events table holds full history.
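Converting a user-entered rupee string to paise without ever passing through a float can be sketched like this. It assumes plain non-negative amounts with at most two decimal places and no grouping commas; toPaise and formatPaise are hypothetical helpers.

```typescript
// Parse "1500.50" into 150050 paise using only string and integer operations
function toPaise(rupees: string): number {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(rupees.trim());
  if (!match) throw new Error(`Invalid amount: ${rupees}`);
  const whole = Number(match[1]);
  const fraction = Number((match[2] ?? '').padEnd(2, '0')); // "5" means 50 paise
  const paise = whole * 100 + fraction;
  if (!Number.isSafeInteger(paise)) throw new Error('Amount too large');
  return paise;
}

// Format paise for display: 150050 becomes "1500.50"
function formatPaise(paise: number): string {
  const whole = Math.floor(paise / 100);
  const fraction = String(paise % 100).padStart(2, '0');
  return `${whole}.${fraction}`;
}

console.log(toPaise('1500.50'));  // 150050
console.log(formatPaise(150050)); // 1500.50
```

Inputs with three or more decimal places are rejected rather than silently rounded, which keeps the conversion lossless.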
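A ULID packs a 48-bit millisecond timestamp and 80 random bits into 26 Crockford base32 characters (10 for the time, 16 for the randomness), time first, which is why plain string comparison sorts ULIDs by creation time. A minimal sketch of that layout using node:crypto; an existing ULID library is the safer choice in production, since the spec's monotonic mode (incrementing the random part within the same millisecond) is not implemented here.

```typescript
import { randomBytes } from 'node:crypto';

// Crockford base32 alphabet (no I, L, O, U), in ascending ASCII order
const ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// 48-bit millisecond timestamp as 10 base32 chars, most significant first
function encodeTime(ms: number): string {
  let out = '';
  let t = ms;
  for (let i = 0; i < 10; i++) {
    out = ALPHABET[t % 32] + out;
    t = Math.floor(t / 32);
  }
  return out;
}

// 16 chars x 5 bits = 80 random bits; byte & 31 is uniform because 256 is a multiple of 32
function encodeRandom(): string {
  const bytes = randomBytes(16);
  let out = '';
  for (let i = 0; i < bytes.length; i++) out += ALPHABET[bytes[i] & 31];
  return out;
}

function ulid(now: number = Date.now()): string {
  return encodeTime(now) + encodeRandom();
}

console.log(ulid()); // 26 chars; the first 10 change only as time advances
```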

API Design

ts
// ========================
// POST /v1/payments
// ========================
// Create a new payment

// Request
// Headers:
//   Authorization: Bearer <token>
//   Idempotency-Key: idem_<uuid>
//   Content-Type: application/json

interface CreatePaymentRequest {
  amount: number;           // In paise (e.g., 150000 = Rs 1500)
  currency: 'INR';
  method: 'UPI' | 'CARD' | 'WALLET' | 'NETBANKING';
  description?: string;
  merchantId: string;
  metadata?: Record<string, string>;  // { orderId: "ORD_123" }
  // Method-specific fields
  upi?: { vpa: string };
  card?: { token: string };           // Tokenized card (never raw card number)
  netbanking?: { bankCode: string };
}

// Response (201 Created)
interface PaymentResponse {
  id: string;               // "pay_01HXK4..."
  status: 'INITIATED' | 'AUTHORIZED' | 'CAPTURED' | 'SETTLED' | 'FAILED' | 'REFUNDED';
  amount: number;
  currency: string;
  method: string;
  merchantId: string;
  gatewayRef?: string;
  createdAt: string;        // ISO 8601
  updatedAt: string;
  metadata?: Record<string, string>;
  // For UPI: includes a deep link or collect request ID
  nextAction?: {
    type: 'UPI_COLLECT' | 'REDIRECT' | 'OTP';
    url?: string;
    collectRef?: string;
  };
}

// Error Response (4xx/5xx)
interface ErrorResponse {
  error: {
    code: string;           // "INSUFFICIENT_FUNDS", "INVALID_VPA", "GATEWAY_TIMEOUT"
    message: string;
    details?: Record<string, unknown>;
  };
  requestId: string;        // For debugging
}

// ========================
// GET /v1/payments/:id
// ========================
// Retrieve payment details

// Response: PaymentResponse (same as above)
// 404 if not found
// Only accessible by the user or merchant associated with the payment

// ========================
// GET /v1/payments
// ========================
// List payments with filtering and pagination

// Query params:
//   ?userId=U_001
//   &status=CAPTURED
//   &from=2024-01-01T00:00:00Z
//   &to=2024-12-31T23:59:59Z
//   &limit=20
//   &cursor=pay_01HXK4...   (cursor-based pagination)

interface PaymentListResponse {
  data: PaymentResponse[];
  pagination: {
    hasMore: boolean;
    nextCursor?: string;    // Opaque cursor for next page
  };
}

// ========================
// POST /v1/payments/:id/refund
// ========================
// Initiate a refund

interface RefundRequest {
  amount?: number;          // Partial refund amount in paise (omit for full refund)
  reason?: string;
}

interface RefundResponse {
  id: string;               // "ref_01HXK5..."
  paymentId: string;
  amount: number;
  status: 'INITIATED' | 'PROCESSING' | 'COMPLETED' | 'FAILED';
  createdAt: string;
}

// ========================
// POST /v1/webhooks
// ========================
// Merchant registers a webhook URL to receive payment status updates
// Paytm sends POST to merchant's URL with:

interface WebhookPayload {
  event: 'payment.authorized' | 'payment.captured' | 'payment.failed' | 'refund.completed';
  data: PaymentResponse | RefundResponse;
  timestamp: string;
  signature: string;        // HMAC-SHA256(payload without this field, merchant_secret); many providers send it in a header instead so the raw body can be verified as-is
}

// Webhook delivery:
// - Retry with exponential backoff (1s, 2s, 4s, 8s, ... up to 24h)
// - Expect 2xx response within 5 seconds
// - After 24h of failures, mark webhook as dead, alert merchant
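The delivery policy above (doubling delays, give up after 24 hours) can be turned into a concrete retry schedule. A minimal sketch; retrySchedule is a hypothetical helper, and the 1-hour cap on a single delay and the absence of jitter are assumptions the policy doesn't specify (many systems add random jitter so retries to many merchants don't fire in lockstep).

```typescript
// Delays (in seconds) between webhook delivery attempts: 1s, 2s, 4s, ... doubling,
// each delay capped at maxDelaySec, stopping before the cumulative wait exceeds windowSec.
function retrySchedule(
  baseDelaySec = 1,
  maxDelaySec = 3600,       // assumption: cap any single wait at 1 hour
  windowSec = 24 * 60 * 60, // stop retrying after 24h, per the policy above
): number[] {
  const delays: number[] = [];
  let elapsed = 0;
  let delay = baseDelaySec;
  while (elapsed + delay <= windowSec) {
    delays.push(delay);
    elapsed += delay;
    delay = Math.min(delay * 2, maxDelaySec);
  }
  return delays;
}

const schedule = retrySchedule();
console.log(schedule.slice(0, 6).join(', ')); // 1, 2, 4, 8, 16, 32
console.log(schedule.length);                 // 34
```

With these defaults the delays double up to 2048s, then settle at the 1-hour cap for the rest of the day.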

API design decisions:

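  • Cursor-based pagination over offset: OFFSET makes the database read and discard every skipped row, and pages shift when new payments arrive between requests. The cursor is the last payment ID on the page (ULIDs sort by time), so the next query continues with id < cursor in newest-first order, which stays fast on deep pages and stable under concurrent writes.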
  • Paise for amounts: Avoids floating point issues. Rs 1500.50 = 150050 paise.
  • Webhook signatures: HMAC-SHA256 so merchants can verify the payload came from Paytm, not an attacker.
  • nextAction pattern: Different payment methods require different user actions (UPI collect, card 3DS redirect, OTP). The API tells the client what to do next instead of the client guessing.

Frontend interview preparation reference.