Embedding Similarity Calculator — cosine / Euclidean / dot product × ANN algorithm recommender
Embedding similarity is linear algebra — but which metric, which threshold, and which indexing algorithm depend on application and scale. Cosine for text, Euclidean for images. 0.8 threshold for semantic search, 0.95+ for deduplication. HNSW under 1M vectors, IVF+PQ past 100M. This calculator takes your two vectors, computes all five metrics, interprets the similarity in the context of your use case (search / classify / dedup / recommend / anomaly), and recommends the right ANN algorithm for your corpus scale and latency budget. Model profiles for OpenAI, Voyage, Cohere, Gemini, and custom with dimension + pricing + Matryoshka-truncation support.
1. Vectors
2. Metric
3. Model
4. Scale
5. Analysis
Why cosine similarity dominates text embedding comparisons
Quick answer: text embedding models (OpenAI, Voyage, Cohere, BERT-family) output unit-normalized vectors by default — each vector has magnitude 1. For unit vectors, cosine similarity equals the dot product (cos θ = A·B because ||A||×||B|| = 1). Cosine is magnitude-invariant: it compares direction only, so vector magnitude (which can vary with text length in non-normalized models) never inflates similarity. Euclidean distance is magnitude-sensitive, so it's often a better fit for image embeddings (where image complexity may legitimately correlate with magnitude) but worse for text.
Practical consequence: if you're using OpenAI / Voyage / Cohere text embeddings, default to cosine similarity. If you're using image embeddings or custom non-normalized models, test both cosine and Euclidean and pick whichever correlates better with ground-truth labels on a validation set.
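The unit-vector identities above can be verified directly. A minimal pure-Python sketch (the example vectors are arbitrary):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

a = normalize([0.3, 0.8, 0.5])
b = normalize([0.2, 0.9, 0.4])

# For unit vectors, cosine similarity equals the raw dot product,
# and squared Euclidean distance is a monotone function of it: d^2 = 2 - 2*cos.
assert abs(cosine(a, b) - dot(a, b)) < 1e-12
assert abs(euclidean(a, b) ** 2 - (2 - 2 * cosine(a, b))) < 1e-12
```

The second identity is why, for normalized vectors, ranking by cosine, dot product, or Euclidean distance returns the same nearest-neighbor order.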
Cosine threshold values by use case
Quick answer: "cosine 0.8 means similar" is an oversimplification. The useful threshold depends on what you're DOING with the result. For semantic search, 0.7+ is relevant, 0.85+ is strongly relevant. For deduplication, you need 0.95+ to avoid false positives (which are catastrophic in a dedup context). For recommendation, 0.65-0.8 gives a good candidate pool with room for diversity. For anomaly detection, flip it: LOW similarity to known distribution = high anomaly score.
| Use case | Threshold | Rationale |
|---|---|---|
| Semantic search / retrieval | 0.7-0.85 | Top-K retrieval returns everything above; production thresholds for "clearly relevant" 0.75-0.85. |
| Classification / clustering | Measure on held-out set | Use k-means or HDBSCAN on embeddings, not pairwise thresholds. Depends on class separation. |
| Duplicate detection | 0.95-0.98 | False positives are catastrophic. Conservative threshold, accept false negatives. |
| Recommendation | 0.65-0.8 candidate pool | Mix high-similarity + diverse items to prevent filter bubble. |
| Anomaly detection | Bottom 1-5% of distribution | LOW similarity to nearest neighbor = high anomaly. Percentile-based threshold. |
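The table above can be encoded as a simple lookup. A sketch of one possible shape — the cutoffs mirror the table, but the labels are illustrative and all thresholds should be tuned on your own validation data:

```python
# Cutoffs taken from the table above; labels are illustrative.
# Entries are checked highest-first: the first cutoff <= score wins.
THRESHOLDS = {
    "search": [(0.85, "strongly relevant"), (0.70, "relevant"), (0.0, "not relevant")],
    "dedup": [(0.95, "likely duplicate"), (0.0, "distinct")],
    "recommend": [(0.80, "very similar (watch diversity)"), (0.65, "good candidate"), (0.0, "weak candidate")],
}

def interpret(use_case: str, cosine_sim: float) -> str:
    """Map a cosine similarity to a use-case-specific verdict."""
    for cutoff, label in THRESHOLDS[use_case]:
        if cosine_sim >= cutoff:
            return label
    return "below all thresholds"
```

Note that anomaly detection is deliberately absent: it needs a percentile over the corpus similarity distribution, not a fixed cutoff on one pair.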
Embedding model landscape as of 2026-04
Quick answer: five major production embedding models worth comparing. Quality ranking (MTEB benchmark) as of early 2026: Voyage-3 > OpenAI text-embedding-3-large > OpenAI text-embedding-3-small ≈ Cohere embed-english-v3 > Gemini text-embedding-004. Price ranking (per 1M tokens): OpenAI-small $0.02 < Cohere $0.10 < OpenAI-large $0.13 < Voyage-3 $0.18. Gemini-004 has free tier (lowest cost for low-volume). Dimensions range 768 (Gemini) to 3072 (OpenAI-large). OpenAI-large + Gemini-004 support Matryoshka dimension truncation; others don't.
Selection heuristic: for most text tasks, OpenAI text-embedding-3-small at 1536 dims is the cost-effective default. For retrieval-quality-critical applications, Voyage-3 at 1024 dims with retrieval-tuned training outperforms at modest cost premium. For multimodal (text + image jointly), Gemini is worth evaluating. For high-scale with storage concern, Cohere has int8 and binary variants out-of-the-box.
ANN algorithm trade-offs at scale
Quick answer: approximate nearest neighbor algorithms trade recall for latency and memory. At small scale (under 10K vectors) use brute-force / FLAT — it\u0027s fast enough and gives 100% recall. At medium scale (10K-1M) use HNSW — it\u0027s the industry default with good recall (~95-99%), fast queries (under 10ms), and moderate memory (~2× baseline). At large scale (1M-100M) switch to IVF (Inverted File) or IVF+PQ (Product Quantization) — lower memory (0.1-0.3× with PQ) at modest recall cost (~85-95%). At very large scale (100M+) you need distributed indexes + tiered storage + 2-stage retrieval.
| Algorithm | Recall | Latency | Memory | Good for |
|---|---|---|---|---|
| FLAT / brute-force | 100% | O(N) per query | 1× baseline | <10K vectors |
| HNSW | 95-99% @ M=16 | O(log N) | 2× baseline | 10K-1M, industry default |
| IVF | 85-95% | O(√N) | 1.2× baseline | 1M-100M |
| IVF + PQ (8-bit) | 80-90% | O(√N) | 0.25× baseline | 10M+ with memory constraint |
| LSH | 70-90% | O(1) amortized | Depends | Binary or sparse; less common for dense text |
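At the FLAT end of the table, the algorithm is just "score everything and keep the top K". A pure-Python sketch (use Faiss or a vector database for anything beyond toy scale):

```python
import heapq
import math

def top_k_flat(query, corpus, k=5):
    """Exact (FLAT / brute-force) nearest-neighbor search by cosine similarity.
    O(N*d) per query, 100% recall — fine below roughly 10K vectors."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        return num / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    scored = ((cos(query, v), i) for i, v in enumerate(corpus))
    return heapq.nlargest(k, scored)  # list of (similarity, corpus_index)
```

Everything else in the table (HNSW, IVF, PQ, LSH) exists to avoid this full O(N) scan; they trade some recall for sublinear query time and, with quantization, smaller memory.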
Matryoshka truncation — the free storage win
Quick answer: some embedding models are trained with Matryoshka representation learning — the first N dimensions alone are useful at progressively coarser quality. OpenAI text-embedding-3-large (3072 dim) can truncate to 512 dim and preserve ~90% of retrieval quality. Gemini-004 (768 dim) truncates to 256 or 512. If storage cost matters, Matryoshka truncation is a 3-6× storage reduction for 5-10% recall cost — almost always a good trade.
Implementation: truncate to first-N dimensions, then re-normalize to unit magnitude (||A|| = 1). Cosine similarity on truncated vectors is NOT exactly the same as on full vectors but preserves rank order well for retrieval. Not all models support Matryoshka — only those trained with the loss function designed for it. OpenAI text-embedding-3-large and Gemini-004 are the main options as of 2026-04.
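The truncate-then-renormalize step described above is a few lines. A minimal sketch:

```python
import math

def matryoshka_truncate(vec, dims):
    """Keep the first `dims` components, then re-normalize to unit magnitude.
    Only meaningful for models trained with Matryoshka representation learning;
    truncating an ordinary embedding this way just discards information."""
    head = vec[:dims]
    n = math.sqrt(sum(x * x for x in head))
    return [x / n for x in head]
```

After truncation, compare vectors with cosine (or dot product, since they are unit-normalized again); never mix truncated and full-dimension vectors in the same index.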
Product Quantization — 4× to 32× storage reduction
Quick answer: Product Quantization (PQ) from Jegou et al. 2011 compresses vectors by splitting each vector into M sub-vectors, then quantizing each sub-vector to one of K codebook entries (K typically 256 = 1 byte per sub-vector). Original float32 at 1536 dim = 6144 bytes per vector. PQ with M=96 sub-vectors of 16-dim each, K=256 = 96 bytes per vector. 64× compression ratio with typically 5-15% recall loss.
Stacked with IVF clustering: IVF narrows the search to a handful of clusters; PQ compresses each cluster's residual vectors. This is the "IVF-PQ" pattern that's production-default at billion-vector scale (Spotify, Pinterest, YouTube recommendation systems). Storage becomes the dominant cost at billions-of-vectors scale; PQ's 10-50× compression is what makes it economically feasible.
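The encode/decode core of PQ is small once codebooks exist (in practice they are learned with k-means per sub-space; Faiss does all of this for you). A toy sketch with pre-supplied codebooks:

```python
def pq_encode(vec, codebooks):
    """codebooks[m] is a list of K centroid sub-vectors for sub-space m.
    Returns one codebook index per sub-vector (1 byte each when K <= 256)."""
    m = len(codebooks)
    sub_dim = len(vec) // m
    codes = []
    for i in range(m):
        sub = vec[i * sub_dim:(i + 1) * sub_dim]
        # nearest centroid by squared Euclidean distance
        codes.append(min(
            range(len(codebooks[i])),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, codebooks[i][k])),
        ))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating chosen centroids."""
    out = []
    for i, c in enumerate(codes):
        out.extend(codebooks[i][c])
    return out
```

Storage math follows directly: with M=96 sub-vectors and K=256 centroids each, a vector costs 96 bytes instead of 6144, exactly the 64× ratio quoted above.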
The curse of dimensionality — what happens at very high dims
Quick answer: in very high dimensions (1000+), random vectors tend toward orthogonality (cosine → 0) because the distribution concentrates in a thin shell. Meaningfully-similar vectors stand out more clearly, but poor-quality embeddings can collapse toward mutual-similarity (all items look similar because all distances are large). Good embeddings preserve the contrast between same-topic and different-topic vectors even at high dims; bad embeddings don\u0027t.
Symptom of curse-of-dim: cosine similarities between all pairs in your corpus cluster in a narrow band (0.65-0.72 for all pairs). Diagnostic: compute similarity distribution for random pairs vs known-similar pairs; the gap should be LARGE (random centered ~0.2, similar centered ~0.8). If the gap is small (random 0.4, similar 0.6), your embedding model is underperforming — either switch models or fine-tune for your domain.
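The concentration claim is easy to check empirically. A sketch that samples random Gaussian vector pairs (dimension and sample count are arbitrary):

```python
import math
import random

def mean_random_cosine(dims, n_pairs=50, seed=0):
    """Mean cosine similarity between independent random Gaussian vectors.
    In high dimensions this concentrates near 0, with spread ~ 1/sqrt(dims)."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0, 1) for _ in range(dims)]

    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        return num / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))

    sims = [cos(vec(), vec()) for _ in range(n_pairs)]
    return sum(sims) / len(sims)
```

The same harness, run on your real corpus (random pairs vs labeled-similar pairs), gives the gap diagnostic described above: a healthy embedding should separate the two distributions widely.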
When to re-embed — model migration costs
Quick answer: embedding models update every 6-18 months with meaningful quality improvements. Migrating to a new model requires re-embedding your entire corpus. At 1M documents × 100 tokens average × $0.02/M tokens = $2. At 100M documents × 500 tokens × $0.18/M tokens = $9,000. Plus the operational cost of re-indexing, blue-green deployment, validation on held-out queries.
Migration decision: re-embed when the new model's MTEB score is 5+ points higher (~10-20% retrieval quality gain). Don't re-embed for 1-3 point MTEB gains (noise-level). DO re-embed when jumping a model generation (switching from OpenAI text-embedding-ada-002 to the newer text-embedding-3 family was a significant quality jump worth the cost). Plan re-embedding as a scheduled 6-12 month operation; budget for double-indexing during cutover.
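The API-cost side of the migration math above reduces to one formula. A sketch (operational costs of re-indexing and cutover are extra):

```python
def reembed_cost_usd(num_docs, avg_tokens_per_doc, price_per_m_tokens):
    """Embedding-API cost to re-embed an entire corpus, in USD."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_m_tokens

# Examples from the text:
# 1M docs x 100 tokens at $0.02/M tokens  -> ~$2
# 100M docs x 500 tokens at $0.18/M tokens -> ~$9,000
assert abs(reembed_cost_usd(1_000_000, 100, 0.02) - 2.0) < 1e-6
assert abs(reembed_cost_usd(100_000_000, 500, 0.18) - 9_000.0) < 1e-3
```

The spread between those two examples is the real lesson: re-embedding is trivial at small scale and a budgeted engineering project at large scale.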
What this model does not capture
The tool computes similarity on provided vectors; it does not generate embeddings. If you need to embed text, use the vendor API/SDK (OpenAI, Voyage, Cohere, Google) — running embedding models in-browser is impractical for production-quality embeddings.
Real retrieval quality depends on more than similarity math. Query preprocessing (normalization, lowercase, diacritic removal), chunking strategy for long documents, hybrid retrieval (combining semantic + keyword BM25), reranking with cross-encoders (Cohere Rerank, BGE-reranker), and prompt design for the downstream LLM all matter. The tool focuses on the similarity + ANN piece; retrieval pipeline engineering extends well beyond.
ANN benchmarks vary by dataset + hardware. HNSW on 1M SIFT descriptors (128-dim synthetic) has different characteristics than HNSW on 1M text embeddings (1024-dim). Tune M, ef_construction, ef_search on your actual data; published defaults are starting points.
Sources and further reading
- MTEB Leaderboard (huggingface.co/spaces/mteb/leaderboard) — canonical empirical quality ranking of embedding models, covering 50+ tasks across retrieval, classification, clustering, and reranking.
- Malkov, Y.A. & Yashunin, D.A., "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs," IEEE TPAMI 2020 — the HNSW paper.
- Jegou, H., Douze, M. & Schmid, C., "Product Quantization for Nearest Neighbor Search," IEEE TPAMI 33(1):117-128 (2011) — the PQ paper.
- Kusupati, A. et al., "Matryoshka Representation Learning," NeurIPS 2022 — the Matryoshka training paper.
- Lewis, P. et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS 2020 — the founding RAG paper.
- Production patterns: Faiss (github.com/facebookresearch/faiss), Qdrant (qdrant.tech), pgvector (github.com/pgvector/pgvector), and Weaviate (weaviate.io) documentation.
- Embedding fine-tuning: Matryoshka-adapted loss functions; the sentence-transformers library (sbert.net).
Embedding Similarity Calculator Tool v1 · canonical sources cited inline above · runs entirely client-side, no data transmitted