Vector Database Comparison — Pinecone, Weaviate, Chroma, Qdrant, and 4 More
8 vector databases compared across 10 dimensions including query latency at scale, pricing models, filtering capabilities, and operational complexity with a selection decision matrix.
You Chose Your Vector Database for a 10,000-Document Prototype — Now It Needs to Handle 10 Million Documents and the Architecture Won’t Scale
Every vector database works at demo scale. The differences emerge at production scale: query latency at 10M+ vectors, filtering performance on metadata-heavy queries, cost at sustained throughput, and operational complexity when something breaks at 3 AM. The vector database market has fragmented rapidly — 8+ viable options with fundamentally different architectures, pricing models, and operational requirements. This guide provides a 10-dimension comparison of the major options, performance data at production scale, and a selection framework that matches database capabilities to application requirements.
The Comparison Matrix
| Dimension | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector | Vespa | LanceDB |
|---|---|---|---|---|---|---|---|---|
| Type | Managed SaaS | Open-source + cloud | Open-source + cloud | Open-source | Open-source + cloud | PostgreSQL extension | Open-source + cloud | Open-source |
| Index type | Proprietary | HNSW | HNSW, scalar/product quantization | HNSW (via hnswlib) | IVF_FLAT, HNSW, DiskANN | IVF_FLAT, HNSW | HNSW, ANN | IVF_PQ, DiskANN |
| Max vectors | Billions (serverless) | 100M+ (tested) | 100M+ (tested) | 1M (practical limit) | Billions (distributed) | 10M+ (depends on RAM) | Billions (distributed) | 100M+ (disk-based) |
| Query latency (1M vectors, p50) | 10-30ms | 5-20ms | 3-15ms | 5-30ms | 5-25ms | 10-50ms | 5-15ms | 8-25ms |
| Query latency (10M vectors, p50) | 15-50ms | 10-40ms | 8-30ms | N/A (too slow) | 10-35ms | 30-150ms | 8-25ms | 15-40ms |
| Metadata filtering | Excellent (server-side) | Excellent (inverted index) | Excellent (payload index) | Basic | Good (scalar index) | Excellent (SQL WHERE) | Excellent (native) | Good |
| Hybrid search (vector + keyword) | No native BM25 | Built-in BM25 | Built-in BM25 (v1.7+) | No | Sparse vector support | tsvector + pgvector | Built-in BM25 + vector | No native BM25 |
| Multi-tenancy | Namespaces | Native tenant isolation | Collection-per-tenant or payload filtering | Collections | Partitions | Row-level security | Built-in | Separate tables |
| Self-hosted option | No (SaaS only) | Yes (Docker, K8s) | Yes (Docker, K8s) | Yes (embedded, Docker) | Yes (Docker, K8s) | Yes (PostgreSQL) | Yes (Docker, K8s) | Yes (embedded) |
| Operational complexity | None (fully managed) | Medium | Medium | Low (embedded) | High (distributed) | Low (if you know PostgreSQL) | High | Low (embedded) |
Pricing Comparison
Pricing is the most confusing dimension — every vendor uses a different model.
Managed/Cloud Pricing (1M vectors, 1024 dimensions)
| Database | Storage cost | Query cost | Monthly total (10K queries/day) | Pricing model |
|---|---|---|---|---|
| Pinecone (Serverless) | $0.33/1M vectors/mo | $8/1M read units | $80-150/mo | Per-read-unit + storage |
| Pinecone (Standard) | Based on pod type | Included in pod | $70-280/mo | Pod-based (p1, s1 pods) |
| Weaviate Cloud | $25/mo (sandbox free) | Included | $95-250/mo | Cluster-based |
| Qdrant Cloud | $0.045/GB RAM | Included | $65-180/mo | RAM-based |
| Milvus (Zilliz Cloud) | $0.03/CU-hour | Included | $100-300/mo | Compute-unit hours |
| Vespa Cloud | $0.12/GB content | Included | $150-400/mo | Content-size based |
Self-Hosted Cost (1M vectors, 1024 dimensions)
| Database | Min RAM | Min storage | Recommended instance | Monthly cloud cost |
|---|---|---|---|---|
| Weaviate | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Qdrant | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Chroma | 2 GB | 4 GB SSD | t3.large (2 vCPU, 8 GB) | $60/mo |
| Milvus | 8 GB + etcd + MinIO | 20 GB SSD | m5.2xlarge (8 vCPU, 32 GB) | $280/mo |
| pgvector | 4 GB (shared with PostgreSQL) | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| LanceDB | 1 GB (disk-based) | 8 GB SSD | t3.medium (2 vCPU, 4 GB) | $30/mo |
Key insight: At 1M vectors, most databases cost $60-300/month whether managed or self-hosted. The cost divergence happens at 10M+ vectors where managed pricing scales linearly while self-hosted can scale more efficiently with disk-based solutions.
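The self-hosted RAM minimums above follow from simple arithmetic. A rough sizing sketch, assuming float32 vectors and an in-memory HNSW index; the link count and overhead factor are illustrative assumptions, not vendor-published numbers:

```python
def estimate_ram_gb(n_vectors: int, dims: int, hnsw_m: int = 16) -> float:
    """Rough RAM estimate for an in-memory HNSW index.

    Assumes float32 storage (4 bytes per dimension) plus neighbor
    lists (~2 * M links of 8 bytes each per vector) and 20% overhead.
    """
    raw_vectors = n_vectors * dims * 4          # float32 vector payload
    graph_links = n_vectors * hnsw_m * 2 * 8    # approximate HNSW edge lists
    return (raw_vectors + graph_links) * 1.2 / 1024**3
```

For 1M vectors at 1,024 dimensions this lands near 5 GiB, consistent with the 4 GB minimums in the table; at 10M it is closer to 50 GiB, which is why disk-based options like LanceDB change the cost curve.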
Cost at Scale (10M vectors, 1024 dimensions)
| Database | Managed cost/mo | Self-hosted cost/mo | Cost ratio |
|---|---|---|---|
| Pinecone | $800-1,500 | N/A (SaaS only) | — |
| Weaviate | $500-1,200 | $300-600 | 2x |
| Qdrant | $400-1,000 | $250-500 | 2x |
| Milvus | $600-1,500 | $400-800 | 1.5x |
| pgvector | N/A | $300-700 (large PostgreSQL) | — |
| LanceDB | N/A | $80-200 (disk-optimized) | — |
Performance at Scale
Query Latency by Vector Count
| Vector count | Pinecone | Qdrant | Weaviate | pgvector | Milvus | LanceDB |
|---|---|---|---|---|---|---|
| 100K | 5-15ms | 2-8ms | 3-10ms | 5-20ms | 3-10ms | 5-15ms |
| 1M | 10-30ms | 3-15ms | 5-20ms | 10-50ms | 5-25ms | 8-25ms |
| 5M | 12-40ms | 5-25ms | 8-35ms | 20-100ms | 8-30ms | 12-35ms |
| 10M | 15-50ms | 8-30ms | 10-40ms | 30-150ms | 10-35ms | 15-40ms |
| 50M | 20-60ms | 15-50ms | 20-60ms | >200ms | 15-45ms | 25-60ms |
Observations: Qdrant and Milvus maintain the best latency scaling characteristics. pgvector degrades fastest beyond 5M vectors — HNSW index builds become slow and RAM-dependent. LanceDB’s disk-based architecture trades slightly higher latency for dramatically lower RAM requirements.
Filtered Query Performance
Filtering (e.g., “find similar vectors WHERE category = ‘tech’ AND date > 2025-01-01”) is where databases diverge most:
| Database | Pre-filter strategy | Filter impact on latency | Filter impact on recall |
|---|---|---|---|
| Pinecone | Server-side metadata filtering | +5-15ms | No recall impact (exact filter) |
| Weaviate | Inverted index + HNSW | +3-10ms | Minimal (<2% recall loss) |
| Qdrant | Payload index with HNSW | +2-8ms | Minimal (<2% recall loss) |
| Chroma | In-memory filtering | +10-50ms (slow at scale) | No recall impact |
| Milvus | Scalar index + vector search | +5-15ms | Minimal |
| pgvector | SQL WHERE + ANN search | +10-50ms | Can drop 5-15% recall (post-filter) |
| Vespa | Native filtered ANN | +2-5ms | Minimal |
| LanceDB | Column-based filtering | +5-15ms | Minimal |
Critical distinction: Some databases filter before vector search (pre-filter — accurate but potentially slower on small filter sets) while others filter after (post-filter — fast but may miss relevant vectors that were pruned during ANN search). pgvector’s default post-filtering behavior is the most common source of recall degradation.
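The distinction is easy to model with a toy exact search standing in for a real ANN index. A minimal sketch (the data layout and predicate are hypothetical; real databases implement this inside the index traversal):

```python
from math import dist

def top_k(query, items, k):
    """Exact nearest neighbors by Euclidean distance (stand-in for ANN)."""
    return sorted(items, key=lambda it: dist(query, it["vec"]))[:k]

def post_filter(query, items, k, pred):
    # Search first, filter second: matching items that fell outside
    # the top-k candidate set are lost, so fewer than k may survive.
    return [it for it in top_k(query, items, k) if pred(it)]

def pre_filter(query, items, k, pred):
    # Filter first, search second: always returns up to k true matches.
    return top_k(query, [it for it in items if pred(it)], k)
```

When the overall nearest neighbors fail the filter, post-filtering can return an empty result even though matches exist; that is the failure mode behind the recall drop noted for pgvector above.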
Feature Comparison for Production Requirements
| Feature | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector |
|---|---|---|---|---|---|---|
| Backup/restore | Automatic | Manual + collections API | Snapshots | Manual (copy files) | Backup API | PostgreSQL pg_dump |
| High availability | Built-in | Replication (3+ nodes) | Replication + sharding | None (single node) | Replica groups | PostgreSQL HA (Patroni, etc.) |
| Horizontal scaling | Automatic (serverless) | Manual shard management | Manual sharding | Not supported | Automatic sharding | Read replicas only |
| Access control | API key + roles | OIDC, API key | API key, JWT | No auth (embedded) | RBAC | PostgreSQL roles |
| SDK languages | Python, JS, Java, Go | Python, JS, Java, Go, .NET | Python, JS, Rust, Go | Python, JS | Python, JS, Java, Go | Any PostgreSQL driver |
| Upsert performance | 1,000-5,000 vectors/sec | 2,000-8,000 vectors/sec | 3,000-10,000 vectors/sec | 1,000-3,000 vectors/sec | 5,000-20,000 vectors/sec | 500-2,000 vectors/sec |
| Index build time (1M vectors) | Automatic | 5-15 min | 3-10 min | 2-8 min | 5-20 min | 10-30 min |
Selection Decision Matrix
| Your situation | Recommended database | Why |
|---|---|---|
| Prototype / POC (<100K vectors) | Chroma or LanceDB | Zero infrastructure; embedded in your application; free |
| Production SaaS, want zero ops | Pinecone | Fully managed; no infrastructure to maintain; auto-scales |
| Production, need hybrid search | Weaviate or Qdrant | Built-in BM25 + vector search; critical for keyword-dependent queries |
| Already using PostgreSQL | pgvector | No new infrastructure; leverage existing PostgreSQL expertise |
| 10M+ vectors, cost-sensitive | Qdrant (self-hosted) or LanceDB | Best latency-per-dollar at scale |
| Enterprise, strict data sovereignty | Qdrant or Milvus (self-hosted) | Full control over data location; no third-party data access |
| High-throughput ingestion | Milvus | Best upsert performance (5K-20K vectors/sec) |
| Need advanced ranking beyond ANN | Vespa | Combines ML ranking, BM25, vector search, and business logic in one query |
| Serverless / edge deployment | LanceDB | Disk-based, low memory; can run on minimal infrastructure |
| Team knows PostgreSQL, nothing else | pgvector | Familiar ops, familiar tooling, acceptable for <5M vectors |
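The matrix above can be encoded as a first-match-wins rule chain. A sketch only; the rule ordering and thresholds are judgment calls drawn from the table, not a definitive policy:

```python
def recommend(vectors: int, needs_hybrid: bool = False,
              has_postgres: bool = False, zero_ops: bool = False) -> str:
    """First matching rule wins; extend with more rows from the matrix."""
    if vectors < 100_000:
        return "Chroma or LanceDB"            # prototype scale, embedded
    if needs_hybrid:
        return "Weaviate or Qdrant"           # built-in BM25 + vector search
    if zero_ops:
        return "Pinecone"                     # fully managed, no infrastructure
    if has_postgres and vectors < 5_000_000:
        return "pgvector"                     # reuse existing PostgreSQL ops
    return "Qdrant (self-hosted) or LanceDB"  # best latency-per-dollar at scale
```

Treat the output as a shortlist of candidates to benchmark, not a final answer.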
Migration Difficulty
| Migration | Difficulty | Why |
|---|---|---|
| Chroma → Pinecone | Low | Export embeddings, re-import; metadata is simple |
| Chroma → Qdrant/Weaviate | Low | Same — export and re-import |
| pgvector → Qdrant | Medium | Schema translation; SQL filters → payload filters |
| Pinecone → self-hosted | Medium | Export via API; rebuild indexes; set up infrastructure |
| Milvus → Qdrant | Medium | Different collection schemas; re-indexing required |
| Any → Any (10M+ vectors) | High | Re-embedding isn’t needed, but re-indexing at scale takes hours to days |
Common Mistakes
| Mistake | What happens | Better approach |
|---|---|---|
| Choosing based on benchmark alone | Benchmarks use uniform data; your data has hotspots, filters, variable dimensions | Test with your actual data and query patterns |
| pgvector at 10M+ vectors | Latency degrades to 100-200ms; index rebuilds take hours | Migrate to purpose-built vector DB before reaching 5M |
| Ignoring filtered search | Pure ANN works for demos; production always has metadata filters | Evaluate filtered query performance, not just pure vector search |
| Embedding lock-in | Chose database that limits embedding dimensions or models | Ensure database supports arbitrary dimensions and multiple collections |
| No backup strategy | Vector index corruption means re-embedding entire corpus | Implement backup on day one; test restore quarterly |
| Premature distributed deployment | Running Milvus cluster for 500K vectors | Single-node Qdrant or Weaviate handles 10M vectors; don’t distribute early |
How to Apply This
Use the token-counter tool to estimate your embedding corpus size — this determines vector count, which drives database selection and cost.
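As a back-of-envelope alternative, vector count follows directly from corpus tokens and your chunking scheme. A sketch assuming fixed-size chunks with overlap; the 512/64 defaults are illustrative, not recommendations:

```python
from math import ceil

def estimate_vector_count(corpus_tokens: int, chunk_size: int = 512,
                          overlap: int = 64) -> int:
    """One vector per chunk; overlapping chunks advance by a shorter stride."""
    stride = chunk_size - overlap
    return max(1, ceil((corpus_tokens - overlap) / stride))
```

A 1M-token corpus at these defaults yields roughly 2,200 vectors; even a 1B-token corpus stays near 2.2M, comfortably inside single-node territory.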
Start with Chroma or LanceDB for prototyping. Zero infrastructure, embedded in your application, free. Migrate when you exceed 500K vectors or need production features (HA, backup, access control).
Test with your actual data. Load 10% of your production vectors and queries into your top 2 candidates. Measure latency, recall, and filtered search performance. Generic benchmarks predict poorly for specific workloads.
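Recall is the metric generic benchmarks hide. A minimal way to score a candidate, assuming you compute exact top-k once with brute force as ground truth (the ID lists below are hypothetical):

```python
def recall_at_k(ann_ids, exact_ids, k=None):
    """Fraction of the exact top-k that the ANN result set recovered."""
    k = k or len(exact_ids)
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k
```

Run it over a few hundred held-out queries per candidate and compare the distributions, not a single averaged number.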
Plan for migration. Your first vector database won’t be your last. Store embeddings in a format that’s portable (vectors + metadata as JSON/Parquet). Re-indexing is fast; re-embedding is expensive.
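A portable store can be as simple as JSON Lines, one record per vector with its metadata. A standard-library-only sketch; Parquet is more compact at scale but adds a dependency:

```python
import json

def export_jsonl(records, path):
    """Write {id, vector, metadata} records as JSON Lines,
    readable by any database's bulk-import tooling."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps({"id": rec["id"],
                                "vector": rec["vector"],
                                "metadata": rec.get("metadata", {})}) + "\n")

def import_jsonl(path):
    """Read the records back as a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Because every database in this comparison accepts plain vectors plus key-value metadata, this format survives any of the migrations in the table above.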
Don’t over-engineer early. 90% of production RAG systems have fewer than 1M vectors. At that scale, every database in this comparison works acceptably. Choose for operational simplicity first, performance second.
Honest Limitations
Performance numbers are based on published benchmarks and community reports; your specific workload (dimension count, filter complexity, concurrent queries) will produce different results. Pricing changes frequently — Pinecone, Weaviate, and Qdrant have all revised pricing in the past 12 months. Self-hosted costs exclude engineering time for setup, maintenance, and troubleshooting — which can be 10-40 hours/month for distributed databases.

The “pgvector degrades at 10M+” finding depends on instance size and index configuration; heavily tuned PostgreSQL setups can push this boundary further. New entrants (Turbopuffer, LanceDB Cloud, etc.) are emerging rapidly; the market hasn’t consolidated. HNSW recall numbers assume default configurations; tuning `ef_construction` and `ef_search` can improve recall at the cost of latency.

The comparison focuses on vector search; Vespa and Weaviate offer significantly more functionality (ranking, aggregation, real-time indexing) that may justify their complexity for advanced use cases.