
06 - Vector Databases


1. What is a Vector Database?

What: A database optimized for storing, indexing, and querying high-dimensional vectors (embeddings). Unlike traditional databases that search by exact match or range, vector databases find the most similar vectors using distance metrics.

Traditional DB:  SELECT * FROM docs WHERE category = 'AI'   (exact match)
Vector DB:       Find 5 vectors closest to query_vector      (similarity search)
┌──────────────────────────────────────────────┐
│              Vector Database                 │
│                                              │
│  Store:   [0.12, -0.34, 0.56, ..., 0.78]     │  ← 1536-dim vectors
│  Index:   HNSW / IVF / Flat                  │  ← fast ANN search
│  Query:   Find k nearest neighbors           │  ← cosine / L2 / dot product
│  Filter:  + metadata filtering               │  ← combine with traditional filters
│                                              │
└──────────────────────────────────────────────┘
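
The core operation can be sketched in a few lines of plain Python: a toy store of 3-dimensional vectors (made-up values; real embeddings are e.g. 1536-dim) searched by cosine similarity with a linear scan — exactly the scan the index types below exist to avoid at scale.

```python
import math

# Toy "database" of 3-dim embeddings (values invented for illustration)
docs = {
    "doc1": [0.9, 0.1, 0.0],   # about AI
    "doc2": [0.8, 0.2, 0.1],   # about ML
    "doc3": [0.0, 0.1, 0.9],   # about cooking
}

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def search(query_vec, k=2):
    # "Find k vectors closest to query_vector": rank every doc by similarity
    ranked = sorted(docs, key=lambda d: cosine(docs[d], query_vec), reverse=True)
    return ranked[:k]

search([1.0, 0.0, 0.0])  # -> ["doc1", "doc2"]: the two AI-related docs
```

A real vector database performs the same ranking, but over millions of vectors and through an ANN index rather than scoring every entry.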

2. Approximate Nearest Neighbor (ANN) Search

What: Finding the exact nearest neighbors in high-dimensional space is slow (O(n): every stored vector must be checked). ANN algorithms trade a small amount of accuracy for massive speed improvements.

Exact search:  Check all 1M vectors → 100% accurate, slow
ANN search:    Check ~1000 vectors  → 95-99% accurate, 100x faster

Why approximate is fine: In practice, the top-5 results from ANN almost always include the true top-5. The "missed" results are usually nearly as relevant.
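
The exact baseline is easy to write down. A brute-force scan (NumPy, random data purely for illustration) computes a distance to every stored vector — the O(n) cost that ANN indexes avoid:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(100_000, 64)).astype(np.float32)  # 100k stored vectors
query = rng.normal(size=64).astype(np.float32)

def exact_knn(q, k=5):
    # O(n): touches every one of the 100k vectors on every query
    dists = np.linalg.norm(db - q, axis=1)
    return np.argsort(dists)[:k]        # indices of the k nearest vectors

top5 = exact_knn(query)
```

This is what ANN results are scored against: recall is the fraction of these true top-k indices that the approximate index returns.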


3. HNSW (Hierarchical Navigable Small World)

What: The most popular ANN index. Builds a multi-layer graph where each layer is progressively sparser, enabling fast traversal from coarse to fine.

Layer 3:  A ──────────────────── D          (sparse, long-range)
          │                      │
Layer 2:  A ──── B ──────── D    │          (medium density)
          │      │          │    │
Layer 1:  A ── B ── C ── D ── E ── F        (dense, short-range)
          │    │    │    │    │    │
Layer 0:  A  B  C  D  E  F  G  H  I  J      (all nodes)

How search works:

  1. Start at top layer → find closest node using greedy traversal
  2. Drop to next layer → continue greedy search from that node
  3. Repeat until reaching bottom layer
  4. Return k nearest neighbors
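
A minimal sketch of that descent, using a hand-built two-layer graph over 1-D points (a real HNSW assigns layers probabilistically at insert time, and step 4's k-neighbor collection is omitted for brevity — this returns only the single nearest node):

```python
# Toy layered graph: each layer maps node -> neighbors; the top layer is sparser
points = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0, "E": 4.0, "F": 5.0}
layers = [
    # dense bottom layer: short-range links
    {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
     "D": ["C", "E"], "E": ["D", "F"], "F": ["E"]},
    # sparse top layer: one long-range link
    {"A": ["D"], "D": ["A"]},
]

def dist(name, q):
    return abs(points[name] - q)

def greedy_search(query, entry="A"):
    current = entry
    for layer in reversed(layers):      # start at the top (sparse) layer
        improved = True
        while improved:                 # greedy: hop to any closer neighbor
            improved = False
            for nb in layer.get(current, []):
                if dist(nb, query) < dist(current, query):
                    current, improved = nb, True
                    break
        # drop to the next layer, continuing from the node found here
    return current

greedy_search(4.2)  # -> "E"
```

The top layer makes one long jump (A → D); the dense layer then refines locally (D → E), so far-away nodes are never visited.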

Parameters:

  • M: Max connections per node (higher = better recall, more memory)
  • ef_construction: Search depth during build (higher = better index quality)
  • ef_search: Search depth at query time (higher = better recall, slower)

Trade-offs:

  • Fast query time: O(log n)
  • High memory usage (graph structure)
  • Slow to build
  • Best for: high-recall, low-latency requirements

4. IVF (Inverted File Index)

What: Partitions vectors into clusters (using k-means), then only searches the nearest clusters at query time.

Build phase:
  All vectors → k-means clustering → nlist clusters (centroids)

Query phase:
  Query vector → Find nprobe nearest centroids → Search only those clusters

┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│Cluster 1│  │Cluster 2│  │Cluster 3│  │Cluster 4│
│ • • •   │  │ • • •   │  │ • •     │  │ • • • • │
│ • •     │  │ • • • • │  │ • • •   │  │ • •     │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
                  ↑                         ↑
              nprobe=2: only search these two clusters

Parameters:

  • nlist: Number of clusters (typically sqrt(n))
  • nprobe: Number of clusters to search (higher = better recall, slower)

Trade-offs:

  • Lower memory than HNSW
  • Faster build time
  • Requires training (k-means on representative data)
  • Best for: large datasets, memory-constrained environments

5. Vector Database Comparison

Database   Type                 Index Types          Best For
Pinecone   Managed cloud        Proprietary          Production RAG, zero ops
Chroma     Embedded / local     HNSW                 Prototyping, local dev
pgvector   Postgres extension   IVF, HNSW            Existing Postgres stack
Weaviate   Self-hosted / cloud  HNSW                 Multi-modal, GraphQL API
Qdrant     Self-hosted / cloud  HNSW                 Filtering + vector search
Milvus     Self-hosted / cloud  IVF, HNSW, DiskANN   Large-scale, GPU support
FAISS      Library (not DB)     IVF, HNSW, PQ        Research, custom pipelines

6. Pinecone

What: Fully managed vector database. No infrastructure to manage.

python
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("my-index")

# Upsert vectors
index.upsert(vectors=[
    {"id": "doc1", "values": [0.1, 0.2, ...], "metadata": {"source": "docs"}},
    {"id": "doc2", "values": [0.3, 0.4, ...], "metadata": {"source": "blog"}},
])

# Query
results = index.query(
    vector=[0.15, 0.25, ...],
    top_k=5,
    filter={"source": {"$eq": "docs"}},  # metadata filtering
    include_metadata=True
)

Key features: Serverless tier, namespaces, metadata filtering, hybrid search (sparse + dense).


7. Chroma

What: Open-source embedding database. Runs in-process (no server needed) or client-server. Great for prototyping.

python
import chromadb

client = chromadb.Client()  # in-memory
# client = chromadb.PersistentClient(path="./chroma_db")  # persistent

collection = client.create_collection("my_docs")

# Add documents (Chroma can auto-embed with default model)
collection.add(
    documents=["Doc about AI", "Doc about cooking"],
    ids=["doc1", "doc2"],
    metadatas=[{"topic": "ai"}, {"topic": "food"}]
)

# Query
results = collection.query(
    query_texts=["machine learning"],
    n_results=5,
    where={"topic": "ai"}
)

8. pgvector

What: PostgreSQL extension that adds vector similarity search. Use your existing Postgres database for embeddings; no separate infrastructure to run.

sql
-- Enable extension
CREATE EXTENSION vector;

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(1536)  -- OpenAI embedding dimension
);

-- Insert
INSERT INTO documents (content, embedding)
VALUES ('About AI', '[0.1, 0.2, ...]');

-- Cosine similarity search
SELECT content, 1 - (embedding <=> '[0.15, 0.25, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.15, 0.25, ...]'  -- <=> is cosine distance
LIMIT 5;

-- Create HNSW index for faster queries
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

Operators: <-> L2 distance, <#> negative inner product, <=> cosine distance.
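
The arithmetic behind those operators can be checked in plain Python (made-up 3-dimensional vectors):

```python
import math

a = [0.1, 0.2, 0.3]
b = [0.2, 0.1, 0.3]

l2 = math.dist(a, b)                       # what <-> computes
inner = sum(x * y for x, y in zip(a, b))   # <#> returns -inner (negated)
cosine_dist = 1 - inner / (math.hypot(*a) * math.hypot(*b))  # what <=> computes

# 1 - cosine_dist recovers the cosine similarity reported in the SELECT above
```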

Trade-offs:

  • Pro: No new infrastructure, ACID transactions, join with relational data
  • Con: Not as fast as purpose-built vector DBs at scale, limited to Postgres

9. Key Considerations

Choosing a vector database:

Small project / prototype?     → Chroma (embedded, zero setup)
Already using Postgres?        → pgvector (no new infra)
Production, want managed?      → Pinecone (serverless)
Need advanced filtering?       → Qdrant or Weaviate
Massive scale (100M+ vectors)? → Milvus or Pinecone
Research / custom pipeline?    → FAISS (library)

Important metrics:

  • QPS (Queries Per Second): How many searches can you serve?
  • Recall@k: What % of true top-k results does your ANN return?
  • Latency (p99): Worst-case query time
  • Memory per vector: Storage cost at scale
  • Build time: How long to index your data?

Frontend interview preparation reference.