You Chose Your Vector Database for a 10,000-Document Prototype — Now It Needs to Handle 10 Million Documents and the Architecture Won’t Scale

Every vector database works at demo scale. The differences emerge at production scale: query latency at 10M+ vectors, filtering performance on metadata-heavy queries, cost at sustained throughput, and operational complexity when something breaks at 3 AM. The vector database market has fragmented rapidly into eight-plus viable options with fundamentally different architectures, pricing models, and operational requirements. This guide provides a 10-dimension comparison across the major options, performance data at production scale, and a selection framework that matches database capabilities to application requirements.

The Comparison Matrix

| Dimension | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector | Vespa | LanceDB |
|---|---|---|---|---|---|---|---|---|
| Type | Managed SaaS | Open-source + cloud | Open-source + cloud | Open-source | Open-source + cloud | PostgreSQL extension | Open-source + cloud | Open-source |
| Index type | Proprietary | HNSW | HNSW, scalar/product quantization | HNSW (via hnswlib) | IVF_FLAT, HNSW, DiskANN | IVFFlat, HNSW | HNSW, ANN | IVF_PQ, DiskANN |
| Max vectors | Billions (serverless) | 100M+ (tested) | 100M+ (tested) | 1M (practical limit) | Billions (distributed) | 10M+ (depends on RAM) | Billions (distributed) | 100M+ (disk-based) |
| Query latency (1M vectors, p50) | 10-30ms | 5-20ms | 3-15ms | 5-30ms | 5-25ms | 10-50ms | 5-15ms | 8-25ms |
| Query latency (10M vectors, p50) | 15-50ms | 10-40ms | 8-30ms | N/A (too slow) | 10-35ms | 30-150ms | 8-25ms | 15-40ms |
| Metadata filtering | Excellent (server-side) | Excellent (inverted index) | Excellent (payload index) | Basic | Good (scalar index) | Excellent (SQL WHERE) | Excellent (native) | Good |
| Hybrid search (vector + keyword) | No native BM25 | Built-in BM25 | Built-in BM25 (v1.7+) | No | Sparse vector support | tsvector + pgvector | Built-in BM25 + vector | No native BM25 |
| Multi-tenancy | Namespaces | Native tenant isolation | Collection-per-tenant or payload filtering | Collections | Partitions | Row-level security | Built-in | Separate tables |
| Self-hosted option | No (SaaS only) | Yes (Docker, K8s) | Yes (Docker, K8s) | Yes (embedded, Docker) | Yes (Docker, K8s) | Yes (PostgreSQL) | Yes (Docker, K8s) | Yes (embedded) |
| Operational complexity | None (fully managed) | Medium | Medium | Low (embedded) | High (distributed) | Low (if you know PostgreSQL) | High | Low (embedded) |

Pricing Comparison

Pricing is the most confusing dimension — every vendor uses a different model.

Managed/Cloud Pricing (1M vectors, 1024 dimensions)

| Database | Storage cost | Query cost | Monthly total (10K queries/day) | Pricing model |
|---|---|---|---|---|
| Pinecone (Serverless) | $0.33/1M vectors/mo | $8/1M read units | $80-150/mo | Per-read-unit + storage |
| Pinecone (Standard) | Based on pod type | Included in pod | $70-280/mo | Pod-based (p1, s1 pods) |
| Weaviate Cloud | $25/mo (sandbox free) | Included | $95-250/mo | Cluster-based |
| Qdrant Cloud | $0.045/GB RAM | Included | $65-180/mo | RAM-based |
| Milvus (Zilliz Cloud) | $0.03/CU-hour | Included | $100-300/mo | Compute-unit hours |
| Vespa Cloud | $0.12/GB content | Included | $150-400/mo | Content-size based |

Self-Hosted Cost (1M vectors, 1024 dimensions)

| Database | Min RAM | Min storage | Recommended instance | Monthly cloud cost |
|---|---|---|---|---|
| Weaviate | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Qdrant | 4 GB | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| Chroma | 2 GB | 4 GB SSD | t3.large (2 vCPU, 8 GB) | $60/mo |
| Milvus | 8 GB + etcd + MinIO | 20 GB SSD | m5.2xlarge (8 vCPU, 32 GB) | $280/mo |
| pgvector | 4 GB (shared with PostgreSQL) | 8 GB SSD | t3.xlarge (4 vCPU, 16 GB) | $120/mo |
| LanceDB | 1 GB (disk-based) | 8 GB SSD | t3.medium (2 vCPU, 4 GB) | $30/mo |

Key insight: At 1M vectors, most databases cost $60-300/month whether managed or self-hosted. The cost divergence happens at 10M+ vectors where managed pricing scales linearly while self-hosted can scale more efficiently with disk-based solutions.
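
The RAM arithmetic behind this divergence is easy to sketch. A minimal back-of-envelope estimate, assuming float32 vectors and a roughly 1.5x HNSW graph overhead (an assumption; real overhead varies with index parameters):

```python
# Back-of-envelope RAM footprint for an in-memory HNSW index.
# Assumptions (not vendor figures): float32 storage, ~1.5x overhead
# for the HNSW graph links on top of the raw vectors.
def index_ram_gb(n_vectors: int, dims: int = 1024, overhead: float = 1.5) -> float:
    raw_bytes = n_vectors * dims * 4  # float32 = 4 bytes per dimension
    return raw_bytes * overhead / 1024**3

print(f"1M vectors:  ~{index_ram_gb(1_000_000):.1f} GB RAM")
print(f"10M vectors: ~{index_ram_gb(10_000_000):.1f} GB RAM")
```

At 1M vectors (~6 GB) a 16 GB instance is comfortable; at 10M (~57 GB) a RAM-resident index forces a much larger instance, which is exactly where disk-based designs like LanceDB pull ahead on cost.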

Cost at Scale (10M vectors, 1024 dimensions)

| Database | Managed cost/mo | Self-hosted cost/mo | Cost ratio (managed : self-hosted) |
|---|---|---|---|
| Pinecone | $800-1,500 | N/A (SaaS only) | — |
| Weaviate | $500-1,200 | $300-600 | 2x |
| Qdrant | $400-1,000 | $250-500 | 2x |
| Milvus | $600-1,500 | $400-800 | 1.5x |
| pgvector | N/A | $300-700 (large PostgreSQL) | — |
| LanceDB | N/A | $80-200 (disk-optimized) | — |

Performance at Scale

Query Latency by Vector Count

| Vector count | Pinecone | Qdrant | Weaviate | pgvector | Milvus | LanceDB |
|---|---|---|---|---|---|---|
| 100K | 5-15ms | 2-8ms | 3-10ms | 5-20ms | 3-10ms | 5-15ms |
| 1M | 10-30ms | 3-15ms | 5-20ms | 10-50ms | 5-25ms | 8-25ms |
| 5M | 12-40ms | 5-25ms | 8-35ms | 20-100ms | 8-30ms | 12-35ms |
| 10M | 15-50ms | 8-30ms | 10-40ms | 30-150ms | 10-35ms | 15-40ms |
| 50M | 20-60ms | 15-50ms | 20-60ms | >200ms | 15-45ms | 25-60ms |

Observations: Qdrant and Milvus maintain the best latency scaling characteristics. pgvector degrades fastest beyond 5M vectors — HNSW index builds become slow and RAM-dependent. LanceDB’s disk-based architecture trades slightly higher latency for dramatically lower RAM requirements.

Filtered Query Performance

Filtering (e.g., “find similar vectors WHERE category = ‘tech’ AND date > 2025-01-01”) is where databases diverge most:

| Database | Filter strategy | Filter impact on latency | Filter impact on recall |
|---|---|---|---|
| Pinecone | Server-side metadata filtering | +5-15ms | No recall impact (exact filter) |
| Weaviate | Inverted index + HNSW | +3-10ms | Minimal (<2% recall loss) |
| Qdrant | Payload index with HNSW | +2-8ms | Minimal (<2% recall loss) |
| Chroma | In-memory filtering | +10-50ms (slow at scale) | No recall impact |
| Milvus | Scalar index + vector search | +5-15ms | Minimal |
| pgvector | SQL WHERE + ANN search | +10-50ms | Can drop 5-15% recall (post-filter) |
| Vespa | Native filtered ANN | +2-5ms | Minimal |
| LanceDB | Column-based filtering | +5-15ms | Minimal |

Critical distinction: Some databases filter before vector search (pre-filter — accurate but potentially slower on small filter sets) while others filter after (post-filter — fast but may miss relevant vectors that were pruned during ANN search). pgvector’s default post-filtering behavior is the most common source of recall degradation.
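
The trade-off is easy to see in a toy simulation (pure Python, independent of any particular database; the corpus, scores, and candidate count are invented for illustration):

```python
import random

random.seed(0)

# Toy corpus: each item has a score standing in for vector similarity
# and a metadata category. This illustrates the pre- vs post-filter
# trade-off; it is not any database's actual implementation.
corpus = [{"id": i,
           "score": random.random(),
           "category": random.choice(["tech", "news", "sports"])}
          for i in range(10_000)]

K = 10              # results the caller wants
ANN_CANDIDATES = 20  # what the ANN stage returns before filtering

def post_filter(corpus, k, candidates):
    # ANN first: take top `candidates` by similarity, THEN apply the filter.
    top = sorted(corpus, key=lambda x: -x["score"])[:candidates]
    return [x for x in top if x["category"] == "tech"][:k]

def pre_filter(corpus, k):
    # Filter first, then rank only the matching subset (exact recall).
    matching = [x for x in corpus if x["category"] == "tech"]
    return sorted(matching, key=lambda x: -x["score"])[:k]

truth = pre_filter(corpus, K)
approx = post_filter(corpus, K, ANN_CANDIDATES)
recall = len({x["id"] for x in approx} & {x["id"] for x in truth}) / K
print(f"post-filter returned {len(approx)} results, recall {recall:.0%}")
```

With only ~20 candidates and a filter matching a third of the corpus, the post-filter path typically returns fewer than K results and misses relevant ones. Raising the candidate count (e.g. `ef_search` in HNSW-based engines; recent pgvector releases also offer iterative index scans) recovers recall at the cost of latency.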

Feature Comparison for Production Requirements

| Feature | Pinecone | Weaviate | Qdrant | Chroma | Milvus | pgvector |
|---|---|---|---|---|---|---|
| Backup/restore | Automatic | Manual + collections API | Snapshots | Manual (copy files) | Backup API | PostgreSQL pg_dump |
| High availability | Built-in | Replication (3+ nodes) | Replication + sharding | None (single node) | Replica groups | PostgreSQL HA (Patroni, etc.) |
| Horizontal scaling | Automatic (serverless) | Manual shard management | Manual sharding | Not supported | Automatic sharding | Read replicas only |
| Access control | API key + roles | OIDC, API key | API key, JWT | No auth (embedded) | RBAC | PostgreSQL roles |
| SDK languages | Python, JS, Java, Go | Python, JS, Java, Go, .NET | Python, JS, Rust, Go | Python, JS | Python, JS, Java, Go | Any PostgreSQL driver |
| Upsert performance | 1,000-5,000 vectors/sec | 2,000-8,000 vectors/sec | 3,000-10,000 vectors/sec | 1,000-3,000 vectors/sec | 5,000-20,000 vectors/sec | 500-2,000 vectors/sec |
| Index build time (1M vectors) | Automatic | 5-15 min | 3-10 min | 2-8 min | 5-20 min | 10-30 min |
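
The upsert figures above assume batched writes; sending one vector per request is typically an order of magnitude slower. A minimal batching helper in pure Python, where `upsert_batch` is a hypothetical stand-in for your client's bulk-insert call:

```python
from typing import Callable, Sequence

def batched_upsert(vectors: Sequence[dict],
                   upsert_batch: Callable[[list], None],
                   batch_size: int = 500) -> int:
    """Send vectors in fixed-size batches; returns the number of batches sent."""
    batches = 0
    for start in range(0, len(vectors), batch_size):
        upsert_batch(list(vectors[start:start + batch_size]))
        batches += 1
    return batches

# Demo with a recording stub in place of a real client call:
sent = []
n = batched_upsert([{"id": i, "vec": [0.0]} for i in range(1_250)],
                   upsert_batch=sent.append, batch_size=500)
print(n, [len(b) for b in sent])  # 3 [500, 500, 250]
```

Typical batch sizes are a few hundred to a few thousand vectors; consult your database's client documentation for its recommended limit.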

Selection Decision Matrix

| Your situation | Recommended database | Why |
|---|---|---|
| Prototype / POC (<100K vectors) | Chroma or LanceDB | Zero infrastructure; embedded in your application; free |
| Production SaaS, want zero ops | Pinecone | Fully managed; no infrastructure to maintain; auto-scales |
| Production, need hybrid search | Weaviate or Qdrant | Built-in BM25 + vector search; critical for keyword-dependent queries |
| Already using PostgreSQL | pgvector | No new infrastructure; leverage existing PostgreSQL expertise |
| 10M+ vectors, cost-sensitive | Qdrant (self-hosted) or LanceDB | Best latency-per-dollar at scale |
| Enterprise, strict data sovereignty | Qdrant or Milvus (self-hosted) | Full control over data location; no third-party data access |
| High-throughput ingestion | Milvus | Best upsert performance (5K-20K vectors/sec) |
| Need advanced ranking beyond ANN | Vespa | Combines ML ranking, BM25, vector search, and business logic in one query |
| Serverless / edge deployment | LanceDB | Disk-based, low memory; can run on minimal infrastructure |
| Team knows PostgreSQL, nothing else | pgvector | Familiar ops, familiar tooling, acceptable for <5M vectors |

Migration Difficulty

| From | To | Difficulty | Why |
|---|---|---|---|
| Chroma | Pinecone | Low | Export embeddings, re-import; metadata is simple |
| Chroma | Qdrant/Weaviate | Low | Same — export and re-import |
| pgvector | Qdrant | Medium | Schema translation; SQL filters → payload filters |
| Pinecone | Self-hosted | Medium | Export via API; rebuild indexes; set up infrastructure |
| Milvus | Qdrant | Medium | Different collection schemas; re-index required |
| Any | Any (10M+ vectors) | High | Re-embedding not needed, but re-indexing at scale takes hours to days |
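
Every migration in the table reduces to the same export/re-import loop. A sketch in pure Python, where `scroll_source` and `upsert_target` are hypothetical stand-ins for the source database's pagination API and the target's bulk-insert API:

```python
def migrate(scroll_source, upsert_target, page_size=1000):
    """Page through the source collection and re-insert into the target."""
    offset, moved = 0, 0
    while True:
        page = scroll_source(offset=offset, limit=page_size)
        if not page:
            break
        # Keep id, vector, and metadata together so nothing needs re-embedding.
        upsert_target([{"id": p["id"], "vector": p["vector"],
                        "metadata": p.get("metadata", {})} for p in page])
        moved += len(page)
        offset += page_size
    return moved

# Demo with in-memory stand-ins for both databases:
source = [{"id": i, "vector": [float(i)], "metadata": {"k": i}} for i in range(2_500)]
target = []
moved = migrate(lambda offset, limit: source[offset:offset + limit], target.extend)
print(moved, len(target))  # 2500 2500
```

Real clients paginate with cursors rather than offsets and need retry handling, but the shape of the job is the same: only the index is rebuilt, never the embeddings.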

Common Mistakes

| Mistake | What happens | Better approach |
|---|---|---|
| Choosing based on benchmarks alone | Benchmarks use uniform data; your data has hotspots, filters, variable dimensions | Test with your actual data and query patterns |
| pgvector at 10M+ vectors | Latency degrades to 100-200ms; index rebuilds take hours | Migrate to a purpose-built vector DB before reaching 5M |
| Ignoring filtered search | Pure ANN works for demos; production always has metadata filters | Evaluate filtered query performance, not just pure vector search |
| Embedding lock-in | Chose a database that limits embedding dimensions or models | Ensure the database supports arbitrary dimensions and multiple collections |
| No backup strategy | Vector index corruption means re-embedding the entire corpus | Implement backup on day one; test restore quarterly |
| Premature distributed deployment | Running a Milvus cluster for 500K vectors | Single-node Qdrant or Weaviate handles 10M vectors; don't distribute early |

How to Apply This

Use the token-counter tool to estimate your embedding corpus size — this determines vector count, which drives database selection and cost.
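
A rough sizing sketch, going from total tokens to vector count. The 512-token chunk size and 15% overlap are assumptions for illustration; substitute your own chunking settings:

```python
# Rough sizing: corpus tokens -> chunks -> vectors.
# Chunk size and overlap are assumed values, not recommendations.
def estimate_vectors(total_tokens: int, chunk_tokens: int = 512,
                     overlap: float = 0.15) -> int:
    effective_chunk = chunk_tokens * (1 - overlap)  # tokens of new text per chunk
    return int(total_tokens / effective_chunk)

# Example: 10M documents averaging 800 tokens each.
total = 10_000_000 * 800
print(f"~{estimate_vectors(total):,} vectors")
```

In this example, 10M documents become roughly 18M vectors, which is already past the comfortable range for pgvector and Chroma in the tables above.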

Start with Chroma or LanceDB for prototyping. Zero infrastructure, embedded in your application, free. Migrate when you exceed 500K vectors or need production features (HA, backup, access control).

Test with your actual data. Load 10% of your production vectors and queries into your top 2 candidates. Measure latency, recall, and filtered search performance. Generic benchmarks predict poorly for specific workloads.

Plan for migration. Your first vector database won’t be your last. Store embeddings in a format that’s portable (vectors + metadata as JSON/Parquet). Re-indexing is fast; re-embedding is expensive.
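
As a concrete sketch of portable storage: JSONL with only the standard library (Parquet via pyarrow follows the same shape for larger corpora). The record fields shown are illustrative:

```python
import json

# Portable export: one JSON object per line (JSONL), vectors and metadata
# kept together so any database in this comparison can re-import them.
records = [
    {"id": "doc-1", "vector": [0.12, -0.45, 0.88], "metadata": {"category": "tech"}},
    {"id": "doc-2", "vector": [0.03, 0.27, -0.61], "metadata": {"category": "news"}},
]

with open("embeddings.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Re-loading is symmetric:
with open("embeddings.jsonl") as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]["id"])  # 2 doc-1
```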

Don’t over-engineer early. 90% of production RAG systems have fewer than 1M vectors. At that scale, every database in this comparison works acceptably. Choose for operational simplicity first, performance second.

Honest Limitations

Performance numbers are based on published benchmarks and community reports; your specific workload (dimension count, filter complexity, concurrent queries) will produce different results. Pricing changes frequently — Pinecone, Weaviate, and Qdrant have all revised pricing in the past 12 months. Self-hosted costs exclude engineering time for setup, maintenance, and troubleshooting, which can be 10-40 hours/month for distributed databases.

The “pgvector degrades at 10M+” finding depends on instance size and index configuration; heavily tuned PostgreSQL setups can push this boundary further. HNSW recall numbers assume default configurations; tuning ef_construction and ef_search can improve recall at the cost of latency.

New entrants (Turbopuffer, LanceDB Cloud, etc.) are emerging rapidly; the market hasn’t consolidated. The comparison focuses on vector search; Vespa and Weaviate offer significantly more functionality (ranking, aggregation, real-time indexing) that may justify their complexity for advanced use cases.