Vector Database Comparison — Pinecone, Weaviate, Chroma, Qdrant, and 4 More
8 vector databases compared across 10 dimensions including query latency at scale, pricing models, filtering capabilities, and operational complexity with a selection decision matrix.
You Chose Your Vector Database for a 10,000-Document Prototype — Now It Needs to Handle 10 Million Documents and the Architecture Won’t Scale
Every vector database works at demo scale. The differences emerge at production scale: query latency at 10M+ vectors, filtering performance on metadata-heavy queries, cost at sustained throughput, and operational complexity when something breaks at 3 AM. The vector database market has fragmented rapidly — 8+ viable options with fundamentally different architectures, pricing models, and operational requirements. This guide provides a 10-dimension comparison of the major options, performance data at production scale, and a selection framework that matches database capabilities to application requirements.
The Comparison Matrix
| Dimension | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector | Vespa | LanceDB |
|---|---|---|---|---|---|---|---|---|
| Type | Managed SaaS | Open-source + cloud | Open-source + cloud | Open-source | Open-source + cloud | PostgreSQL extension | Open-source + cloud | Open-source |
| Index type | Proprietary | HNSW | HNSW, scalar/product quantization | HNSW (via hnswlib) | IVF_FLAT, HNSW, DiskANN | IVF_FLAT, HNSW | HNSW, ANN | IVF_PQ, DiskANN |
| Max vectors | Billions (serverless) | 100M+ (tested) | 100M+ (tested) | 1M (practical limit) | Billions (distributed) | 10M+ (depends on RAM) | Billions (distributed) | 100M+ (disk-based) |
| Query latency (1M vectors, p50) | 10-30ms | 5-20ms | 3-15ms | 5-30ms | 5-25ms | 10-50ms | 5-15ms | 8-25ms |
| Query latency (10M vectors, p50) | 15-50ms | 10-40ms | 8-30ms | N/A (too slow) | 10-35ms | 30-150ms | 8-25ms | 15-40ms |
| Metadata filtering | Excellent (server-side) | Excellent (inverted index) | Excellent (payload index) | Basic | Good (scalar index) | Excellent (SQL WHERE) | Excellent (native) | Good |
| Hybrid search (vector + keyword) | No native BM25 | Built-in BM25 | Built-in BM25 (v1.7+) | No | Sparse vector support | tsvector + pgvector | Built-in BM25 + vector | No native BM25 |
| Multi-tenancy | Namespaces | Native tenant isolation | Collection-per-tenant or payload filtering | Collections | Partitions | Row-level security | Built-in | Separate tables |
| Self-hosted option | No (SaaS only) | Yes (Docker, K8s) | Yes (Docker, K8s) | Yes (embedded, Docker) | Yes (Docker, K8s) | Yes (PostgreSQL) | Yes (Docker, K8s) | Yes (embedded) |
| Operational complexity | None (fully managed) | Medium | Medium | Low (embedded) | High (distributed) | Low (if you know PostgreSQL) | High | Low (embedded) |
Pricing Comparison
Pricing is the most confusing dimension — every vendor uses a different model.
Managed/Cloud Pricing (1M vectors, 1024 dimensions)
| Database | Storage cost | Query cost | Monthly total (10K queries/day) | Pricing model |
|---|---|---|---|---|
| Pinecone (Serverless) | $0.33/1M vectors/mo | $8/1M read units | $80-150/mo | Per-read-unit + storage |
| Pinecone (Standard) | Based on pod type | Included in pod | $70-280/mo | Pod-based (p1, s1 pods) |
| Weaviate Cloud | $25/mo (sandbox free) | Included | $95-250/mo | Cluster-based |
| Qdrant Cloud | $0.045/GB RAM | Included | $65-180/mo | RAM-based |
| Milvus (Zilliz Cloud) | $0.03/CU-hour | Included | $100-300/mo | Compute-unit hours |
| Vespa Cloud | $0.12/GB content | Included | $150-400/mo | Content-size based |
Self-Hosted Cost (1M vectors, 1024 dimensions)
| Database | Min RAM | Min storage | Recommended instance | Monthly cloud cost |
|---|---|---|---|---|
| Weaviate | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Qdrant | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Chroma | 2 GB | 4 GB SSD | t3.large (2 vCPU, 8 GB) | $60/mo |
| Milvus | 8 GB + etcd + MinIO | 20 GB SSD | m5.2xlarge (8 vCPU, 32 GB) | $280/mo |
| pgvector | 4 GB (shared with PostgreSQL) | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| LanceDB | 1 GB (disk-based) | 8 GB SSD | t3.medium (2 vCPU, 4 GB) | $30/mo |
Key insight: At 1M vectors, most databases cost $60-300/month whether managed or self-hosted. The cost divergence happens at 10M+ vectors where managed pricing scales linearly while self-hosted can scale more efficiently with disk-based solutions.
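The self-hosted RAM minimums above follow from simple arithmetic. A rough sizing sketch, assuming float32 vectors and an in-memory HNSW index; the link count and overhead factor are illustrative assumptions, not vendor-published numbers:

```python
def estimate_ram_gb(n_vectors: int, dims: int, hnsw_m: int = 16) -> float:
    """Rough RAM estimate for an in-memory HNSW index.

    Assumes float32 storage (4 bytes per dimension) plus neighbor
    lists (~2 * M links of 8 bytes each per vector) and 20% overhead.
    """
    raw_vectors = n_vectors * dims * 4          # float32 vector payload
    graph_links = n_vectors * hnsw_m * 2 * 8    # approximate HNSW edge lists
    return (raw_vectors + graph_links) * 1.2 / 1024**3
```

For 1M vectors at 1,024 dimensions this lands near 5 GiB, consistent with the 4 GB minimums in the table; at 10M it is closer to 50 GiB, which is why disk-based options like LanceDB change the cost curve.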
Cost at Scale (10M vectors, 1024 dimensions)
| Database | Managed cost/mo | Self-hosted cost/mo | Cost ratio |
|---|---|---|---|
| Pinecone | $800-1,500 | N/A (SaaS only) | — |
| Weaviate | $500-1,200 | $300-600 | 2x |
| Qdrant | $400-1,000 | $250-500 | 2x |
| Milvus | $600-1,500 | $400-800 | 1.5x |
| pgvector | N/A | $300-700 (large PostgreSQL) | — |
| LanceDB | N/A | $80-200 (disk-optimized) | — |
Performance at Scale
Query Latency by Vector Count
| Vector count | Pinecone | Qdrant | Weaviate | pgvector | Milvus | LanceDB |
|---|---|---|---|---|---|---|
| 100K | 5-15ms | 2-8ms | 3-10ms | 5-20ms | 3-10ms | 5-15ms |
| 1M | 10-30ms | 3-15ms | 5-20ms | 10-50ms | 5-25ms | 8-25ms |
| 5M | 12-40ms | 5-25ms | 8-35ms | 20-100ms | 8-30ms | 12-35ms |
| 10M | 15-50ms | 8-30ms | 10-40ms | 30-150ms | 10-35ms | 15-40ms |
| 50M | 20-60ms | 15-50ms | 20-60ms | >200ms | 15-45ms | 25-60ms |
Observations: Qdrant and Milvus maintain the best latency scaling characteristics. pgvector degrades fastest beyond 5M vectors — HNSW index builds become slow and RAM-dependent. LanceDB’s disk-based architecture trades slightly higher latency for dramatically lower RAM requirements.
Filtered Query Performance
Filtering (e.g., “find similar vectors WHERE category = ‘tech’ AND date > 2025-01-01”) is where databases diverge most:
| Database | Pre-filter strategy | Filter impact on latency | Filter impact on recall |
|---|---|---|---|
| Pinecone | Server-side metadata filtering | +5-15ms | No recall impact (exact filter) |
| Weaviate | Inverted index + HNSW | +3-10ms | Minimal (<2% recall loss) |
| Qdrant | Payload index with HNSW | +2-8ms | Minimal (<2% recall loss) |
| Chroma | In-memory filtering | +10-50ms (slow at scale) | No recall impact |
| Milvus | Scalar index + vector search | +5-15ms | Minimal |
| pgvector | SQL WHERE + ANN search | +10-50ms | Can drop 5-15% recall (post-filter) |
| Vespa | Native filtered ANN | +2-5ms | Minimal |
| LanceDB | Column-based filtering | +5-15ms | Minimal |
Critical distinction: Some databases filter before vector search (pre-filter — accurate but potentially slower on small filter sets) while others filter after (post-filter — fast but may miss relevant vectors that were pruned during ANN search). pgvector’s default post-filtering behavior is the most common source of recall degradation.
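The distinction is easy to model with a toy exact search standing in for a real ANN index. A minimal sketch (the data layout and predicate are hypothetical; real databases implement this inside the index traversal):

```python
from math import dist

def top_k(query, items, k):
    """Exact nearest neighbors by Euclidean distance (stand-in for ANN)."""
    return sorted(items, key=lambda it: dist(query, it["vec"]))[:k]

def post_filter(query, items, k, pred):
    # Search first, filter second: matching items that fell outside
    # the top-k candidate set are lost, so fewer than k may survive.
    return [it for it in top_k(query, items, k) if pred(it)]

def pre_filter(query, items, k, pred):
    # Filter first, search second: always returns up to k true matches.
    return top_k(query, [it for it in items if pred(it)], k)
```

When the overall nearest neighbors fail the filter, post-filtering can return an empty result even though matches exist; that is the failure mode behind the recall drop noted for pgvector above.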
Feature Comparison for Production Requirements
| Feature | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector |
|---|---|---|---|---|---|---|
| Backup/restore | Automatic | Manual + collections API | Snapshots | Manual (copy files) | Backup API | PostgreSQL pg_dump |
| High availability | Built-in | Replication (3+ nodes) | Replication + sharding | None (single node) | Replica groups | PostgreSQL HA (Patroni, etc.) |
| Horizontal scaling | Automatic (serverless) | Manual shard management | Manual sharding | Not supported | Automatic sharding | Read replicas only |
| Access control | API key + roles | OIDC, API key | API key, JWT | No auth (embedded) | RBAC | PostgreSQL roles |
| SDK languages | Python, JS, Java, Go | Python, JS, Java, Go, .NET | Python, JS, Rust, Go | Python, JS | Python, JS, Java, Go | Any PostgreSQL driver |
| Upsert performance | 1,000-5,000 vectors/sec | 2,000-8,000 vectors/sec | 3,000-10,000 vectors/sec | 1,000-3,000 vectors/sec | 5,000-20,000 vectors/sec | 500-2,000 vectors/sec |
| Index build time (1M vectors) | Automatic | 5-15 min | 3-10 min | 2-8 min | 5-20 min | 10-30 min |
Selection Decision Matrix
| Your situation | Recommended database | Why |
|---|---|---|
| Prototype / POC (<100K vectors) | Chroma or LanceDB | Zero infrastructure; embedded in your application; free |
| Production SaaS, want zero ops | Pinecone | Fully managed; no infrastructure to maintain; auto-scales |
| Production, need hybrid search | Weaviate or Qdrant | Built-in BM25 + vector search; critical for keyword-dependent queries |
| Already using PostgreSQL | pgvector | No new infrastructure; leverage existing PostgreSQL expertise |
| 10M+ vectors, cost-sensitive | Qdrant (self-hosted) or LanceDB | Best latency-per-dollar at scale |
| Enterprise, strict data sovereignty | Qdrant or Milvus (self-hosted) | Full control over data location; no third-party data access |
| High-throughput ingestion | Milvus | Best upsert performance (5K-20K vectors/sec) |
| Need advanced ranking beyond ANN | Vespa | Combines ML ranking, BM25, vector search, and business logic in one query |
| Serverless / edge deployment | LanceDB | Disk-based, low memory; can run on minimal infrastructure |
| Team knows PostgreSQL, nothing else | pgvector | Familiar ops, familiar tooling, acceptable for <5M vectors |
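The matrix above can be encoded as a first-match-wins rule chain. A sketch only; the rule ordering and thresholds are judgment calls drawn from the table, not a definitive policy:

```python
def recommend(vectors: int, needs_hybrid: bool = False,
              has_postgres: bool = False, zero_ops: bool = False) -> str:
    """First matching rule wins; extend with more rows from the matrix."""
    if vectors < 100_000:
        return "Chroma or LanceDB"            # prototype scale, embedded
    if needs_hybrid:
        return "Weaviate or Qdrant"           # built-in BM25 + vector search
    if zero_ops:
        return "Pinecone"                     # fully managed, no infrastructure
    if has_postgres and vectors < 5_000_000:
        return "pgvector"                     # reuse existing PostgreSQL ops
    return "Qdrant (self-hosted) or LanceDB"  # best latency-per-dollar at scale
```

Treat the output as a shortlist of candidates to benchmark, not a final answer.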
Migration Difficulty
| Migration | Difficulty | Why |
|---|---|---|
| Chroma → Pinecone | Low | Export embeddings, re-import; metadata is simple |
| Chroma → Qdrant/Weaviate | Low | Same — export and re-import |
| pgvector → Qdrant | Medium | Schema translation; SQL filters → payload filters |
| Pinecone → self-hosted | Medium | Export via API; rebuild indexes; set up infrastructure |
| Milvus → Qdrant | Medium | Different collection schemas; re-indexing required |
| Any → Any (10M+ vectors) | High | Re-embedding isn’t needed, but re-indexing at scale takes hours to days |
Common Mistakes
| Mistake | What happens | Better approach |
|---|---|---|
| Choosing based on benchmark alone | Benchmarks use uniform data; your data has hotspots, filters, variable dimensions | Test with your actual data and query patterns |
| pgvector at 10M+ vectors | Latency degrades to 100-200ms; index rebuilds take hours | Migrate to purpose-built vector DB before reaching 5M |
| Ignoring filtered search | Pure ANN works for demos; production always has metadata filters | Evaluate filtered query performance, not just pure vector search |
| Embedding lock-in | Chose database that limits embedding dimensions or models | Ensure database supports arbitrary dimensions and multiple collections |
| No backup strategy | Vector index corruption means re-embedding entire corpus | Implement backup on day one; test restore quarterly |
| Premature distributed deployment | Running Milvus cluster for 500K vectors | Single-node Qdrant or Weaviate handles 10M vectors; don’t distribute early |
How to Apply This
Use the token-counter tool to estimate your embedding corpus size — this determines vector count, which drives database selection and cost.
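As a back-of-envelope alternative, vector count follows directly from corpus tokens and your chunking scheme. A sketch assuming fixed-size chunks with overlap; the 512/64 defaults are illustrative, not recommendations:

```python
from math import ceil

def estimate_vector_count(corpus_tokens: int, chunk_size: int = 512,
                          overlap: int = 64) -> int:
    """One vector per chunk; overlapping chunks advance by a shorter stride."""
    stride = chunk_size - overlap
    return max(1, ceil((corpus_tokens - overlap) / stride))
```

A 1M-token corpus at these defaults yields roughly 2,200 vectors; even a 1B-token corpus stays near 2.2M, comfortably inside single-node territory.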
Start with Chroma or LanceDB for prototyping. Zero infrastructure, embedded in your application, free. Migrate when you exceed 500K vectors or need production features (HA, backup, access control).
Test with your actual data. Load 10% of your production vectors and queries into your top 2 candidates. Measure latency, recall, and filtered search performance. Generic benchmarks predict poorly for specific workloads.
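Recall is the metric generic benchmarks hide. A minimal way to score a candidate, assuming you compute exact top-k once with brute force as ground truth (the ID lists below are hypothetical):

```python
def recall_at_k(ann_ids, exact_ids, k=None):
    """Fraction of the exact top-k that the ANN result set recovered."""
    k = k or len(exact_ids)
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k
```

Run it over a few hundred held-out queries per candidate and compare the distributions, not a single averaged number.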
Plan for migration. Your first vector database won’t be your last. Store embeddings in a format that’s portable (vectors + metadata as JSON/Parquet). Re-indexing is fast; re-embedding is expensive.
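A portable store can be as simple as JSON Lines, one record per vector with its metadata. A standard-library-only sketch; Parquet is more compact at scale but adds a dependency:

```python
import json

def export_jsonl(records, path):
    """Write {id, vector, metadata} records as JSON Lines,
    readable by any database's bulk-import tooling."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps({"id": rec["id"],
                                "vector": rec["vector"],
                                "metadata": rec.get("metadata", {})}) + "\n")

def import_jsonl(path):
    """Read the records back as a list of dicts."""
    with open(path) as f:
        return [json.loads(line) for line in f]
```

Because every database in this comparison accepts plain vectors plus key-value metadata, this format survives any of the migrations in the table above.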
Don’t over-engineer early. 90% of production RAG systems have fewer than 1M vectors. At that scale, every database in this comparison works acceptably. Choose for operational simplicity first, performance second.
Honest Limitations
Performance numbers are based on published benchmarks and community reports; your specific workload (dimension count, filter complexity, concurrent queries) will produce different results. Pricing changes frequently — Pinecone, Weaviate, and Qdrant have all revised pricing in the past 12 months. Self-hosted costs exclude engineering time for setup, maintenance, and troubleshooting — which can be 10-40 hours/month for distributed databases.

The “pgvector degrades at 10M+” finding depends on instance size and index configuration; heavily tuned PostgreSQL setups can push this boundary further. New entrants (Turbopuffer, LanceDB Cloud, etc.) are emerging rapidly; the market hasn’t consolidated. HNSW recall numbers assume default configurations; tuning `ef_construction` and `ef_search` can improve recall at the cost of latency.

The comparison focuses on vector search; Vespa and Weaviate offer significantly more functionality (ranking, aggregation, real-time indexing) that may justify their complexity for advanced use cases.