06 - Vector Databases
1. What is a Vector Database?
What: A database optimized for storing, indexing, and querying high-dimensional vectors (embeddings). Unlike traditional databases that search by exact match or range, vector databases find the most similar vectors using distance metrics.
Traditional DB: SELECT * FROM docs WHERE category = 'AI' (exact match)
Vector DB: Find 5 vectors closest to query_vector (similarity search)

┌─────────────────────────────────────────┐
│ Vector Database                         │
│                                         │
│ Store:  [0.12, -0.34, 0.56, ..., 0.78]  │ ← 1536-dim vectors
│ Index:  HNSW / IVF / Flat               │ ← fast ANN search
│ Query:  Find k nearest neighbors        │ ← cosine / L2 / dot product
│ Filter: + metadata filtering            │ ← combine with traditional filters
│                                         │
└─────────────────────────────────────────┘

2. ANN (Approximate Nearest Neighbor) Search
What: Finding the exact nearest neighbors in high-dimensional space is slow (O(n): every vector must be checked). ANN algorithms trade a small amount of accuracy for massive speed improvements.
Exact search: Check all 1M vectors → 100% accurate, slow
ANN search: Check ~1000 vectors → 95-99% accurate, 100x faster

Why approximate is fine: In practice, the top-5 results from ANN almost always include the true top-5. The "missed" results are usually nearly as relevant.
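To make the O(n) cost concrete, here is a minimal brute-force baseline in NumPy; the corpus size, dimensionality, and variable names are illustrative assumptions (scaled down from the 1M example above). It scores the query against every stored vector, which is exactly the full scan that ANN indexes avoid.

import numpy as np

# Illustrative corpus: 100k unit-normalized vectors, 384-dim (assumed sizes)
rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 384)).astype(np.float32)
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

query = rng.normal(size=384).astype(np.float32)
query /= np.linalg.norm(query)

# Exact search: compare the query with every stored vector (the O(n) scan)
scores = vectors @ query                # dot product == cosine on unit vectors
top5 = np.argsort(-scores)[:5]          # indices of the 5 most similar vectors
print(top5, scores[top5])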
3. HNSW (Hierarchical Navigable Small World)
What: The most popular ANN index. Builds a multi-layer graph where each layer is progressively sparser, enabling fast traversal from coarse to fine.
Layer 3:  A ──────────── D                    (sparse, long-range)
          │              │
Layer 2:  A ── B ─────── D                    (medium density)
          │    │         │
Layer 1:  A ── B ── C ── D ── E ── F          (dense, short-range)
          │    │    │    │    │    │
Layer 0:  A    B    C    D    E    F    G  H  I  J    (all nodes)

How search works:
- Start at top layer → find closest node using greedy traversal
- Drop to next layer → continue greedy search from that node
- Repeat until reaching bottom layer
- Return k nearest neighbors
Parameters:
- M: Max connections per node (higher = better recall, more memory)
- ef_construction: Search depth during build (higher = better index quality)
- ef_search: Search depth at query time (higher = better recall, slower)
Trade-offs:
- Fast query time: O(log n)
- High memory usage (graph structure)
- Slow to build
- Best for: high-recall, low-latency requirements
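A minimal sketch of how M, ef_construction, and ef_search show up in practice, using the hnswlib library; the dataset sizes and parameter values are illustrative assumptions, not tuned recommendations.

# pip install hnswlib
import hnswlib
import numpy as np

dim, num_elements = 384, 10_000                      # assumed sizes
data = np.random.rand(num_elements, dim).astype(np.float32)

# Build the graph: M and ef_construction control index quality and memory
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, M=16, ef_construction=200)
index.add_items(data, np.arange(num_elements))

# ef (ef_search) trades recall for latency at query time
index.set_ef(64)
labels, distances = index.knn_query(data[:1], k=5)   # 5 nearest neighbors
print(labels, distances)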
4. IVF (Inverted File Index)
What: Partitions vectors into clusters (using k-means), then only searches the nearest clusters at query time.
Build phase:
All vectors → K-means clustering → N clusters (centroids)
Query phase:
Query vector → Find nprobe nearest centroids → Search only those clusters
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│Cluster 1│  │Cluster 2│  │Cluster 3│  │Cluster 4│
│ • • •   │  │ • • •   │  │ • •     │  │ • • • • │
│ • •     │  │ • • • • │  │ • • •   │  │ • •     │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
     ↑            ↑
     nprobe=2: only search these two clusters

Parameters:
- nlist: Number of clusters (typically sqrt(n))
- nprobe: Number of clusters to search (higher = better recall, slower)
Trade-offs:
- Lower memory than HNSW
- Faster build time
- Requires training (k-means on representative data)
- Best for: large datasets, memory-constrained environments
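As a sketch of nlist and nprobe in code, here is an IVF index built with the FAISS library; the sizes and parameter values are illustrative assumptions.

# pip install faiss-cpu
import faiss
import numpy as np

dim, n = 384, 100_000                         # assumed sizes
data = np.random.rand(n, dim).astype(np.float32)

nlist = 316                                   # number of clusters, ~sqrt(n)
quantizer = faiss.IndexFlatL2(dim)            # assigns vectors to centroids
index = faiss.IndexIVFFlat(quantizer, dim, nlist)

index.train(data)                             # k-means on representative data
index.add(data)

index.nprobe = 8                              # clusters to scan per query
distances, ids = index.search(data[:1], 5)    # 5 nearest neighbors
print(ids, distances)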
5. Vector Database Comparison
| Database | Type | Index Types | Best For |
|---|---|---|---|
| Pinecone | Managed cloud | Proprietary | Production RAG, zero ops |
| Chroma | Embedded / local | HNSW | Prototyping, local dev |
| pgvector | Postgres extension | IVF, HNSW | Existing Postgres stack |
| Weaviate | Self-hosted / cloud | HNSW | Multi-modal, GraphQL API |
| Qdrant | Self-hosted / cloud | HNSW | Filtering + vector search |
| Milvus | Self-hosted / cloud | IVF, HNSW, DiskANN | Large-scale, GPU support |
| FAISS | Library (not DB) | IVF, HNSW, PQ | Research, custom pipelines |
6. Pinecone
What: Fully managed vector database. No infrastructure to manage.
from pinecone import Pinecone
pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")
# Upsert vectors
index.upsert(vectors=[
{"id": "doc1", "values": [0.1, 0.2, ...], "metadata": {"source": "docs"}},
{"id": "doc2", "values": [0.3, 0.4, ...], "metadata": {"source": "blog"}},
])
# Query
results = index.query(
vector=[0.15, 0.25, ...],
top_k=5,
filter={"source": {"$eq": "docs"}}, # metadata filtering
include_metadata=True
)
Key features: Serverless tier, namespaces, metadata filtering, hybrid search (sparse + dense).
7. Chroma
What: Open-source embedding database. Runs in-process (no server needed) or client-server. Great for prototyping.
import chromadb
client = chromadb.Client() # in-memory
# client = chromadb.PersistentClient(path="./chroma_db") # persistent
collection = client.create_collection("my_docs")
# Add documents (Chroma can auto-embed with default model)
collection.add(
documents=["Doc about AI", "Doc about cooking"],
ids=["doc1", "doc2"],
metadatas=[{"topic": "ai"}, {"topic": "food"}]
)
# Query
results = collection.query(
query_texts=["machine learning"],
n_results=5,
where={"topic": "ai"}
)

8. pgvector
What: PostgreSQL extension that adds vector similarity search. Use your existing Postgres database for embeddings; no separate infrastructure.
-- Enable extension
CREATE EXTENSION vector;
-- Create table with vector column
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536) -- OpenAI embedding dimension
);
-- Insert
INSERT INTO documents (content, embedding)
VALUES ('About AI', '[0.1, 0.2, ...]');
-- Cosine similarity search
SELECT content, 1 - (embedding <=> query_vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.15, 0.25, ...]' -- <=> is cosine distance
LIMIT 5;
-- Create HNSW index for faster queries
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Operators: <-> L2 distance, <#> negative inner product, <=> cosine distance.
Trade-offs:
- Pro: No new infrastructure, ACID transactions, join with relational data
- Con: Not as fast as purpose-built vector DBs at scale, limited to Postgres
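If you query pgvector from application code rather than psql, a sketch with the psycopg driver and the pgvector Python helper might look like this; the connection string, table, and embedding below are placeholders.

# pip install "psycopg[binary]" pgvector
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=mydb user=me")   # placeholder connection
register_vector(conn)                           # adapt numpy arrays <-> vector

query_embedding = np.random.rand(1536).astype(np.float32)  # placeholder

rows = conn.execute(
    """
    SELECT content, 1 - (embedding <=> %s) AS similarity
    FROM documents
    ORDER BY embedding <=> %s
    LIMIT 5
    """,
    (query_embedding, query_embedding),
).fetchall()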
9. Key Considerations
Choosing a vector database:
Small project / prototype? → Chroma (embedded, zero setup)
Already using Postgres? → pgvector (no new infra)
Production, want managed? → Pinecone (serverless)
Need advanced filtering? → Qdrant or Weaviate
Massive scale (100M+ vectors)? → Milvus or Pinecone
Research / custom pipeline? → FAISS (library)

Important metrics:
- QPS (Queries Per Second): How many searches can you serve?
- Recall@k: What % of true top-k results does your ANN return?
- Latency (p99): Worst-case query time
- Memory per vector: Storage cost at scale
- Build time: How long to index your data?
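Recall@k is straightforward to measure yourself: run a sample of queries through both a brute-force exact search and your ANN index, then compare the returned IDs. A minimal sketch, with hypothetical result lists:

def recall_at_k(exact_ids, ann_ids, k=5):
    """Fraction of the true top-k that the ANN index actually returned."""
    return len(set(exact_ids[:k]) & set(ann_ids[:k])) / k

# Hypothetical results for one query: exact search vs. ANN search
print(recall_at_k(["d1", "d2", "d3", "d4", "d5"],
                  ["d1", "d3", "d2", "d9", "d5"]))   # 0.8 → 4 of the true 5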