Embedding Similarity Calculator — cosine / Euclidean / dot product × ANN algorithm recommender
Embedding similarity is linear algebra — but which metric, which threshold, and which indexing algorithm depend on application and scale. Cosine for text, Euclidean for images. 0.8 threshold for semantic search, 0.95+ for deduplication. HNSW under 1M vectors, IVF+PQ past 100M. This calculator takes your two vectors, computes all five metrics, interprets the similarity in the context of your use case (search / classify / dedup / recommend / anomaly), and recommends the right ANN algorithm for your corpus scale and latency budget. Model profiles for OpenAI, Voyage, Cohere, Gemini, and custom with dimension + pricing + Matryoshka-truncation support.
1. Vectors
2. Metric
3. Model
4. Scale
5. Analysis
Why cosine similarity dominates text embedding comparisons
Quick answer: text embedding models (OpenAI, Voyage, Cohere, BERT-family) output unit-normalized vectors by default — each vector has magnitude 1. For unit vectors, cosine similarity equals the dot product (cos θ = A·B because ||A||×||B|| = 1). Cosine is magnitude-invariant: it compares direction only, so vector magnitude (which can vary with text length in non-normalized models) never inflates similarity. Euclidean distance is magnitude-sensitive, so it's often a better fit for image embeddings (where image complexity may legitimately correlate with magnitude) but worse for text.
Practical consequence: if you're using OpenAI / Voyage / Cohere text embeddings, default to cosine similarity. If you're using image embeddings or custom non-normalized models, test both cosine and Euclidean and pick whichever correlates better with ground-truth labels on a validation set.
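The unit-vector identities above can be verified directly. A minimal pure-Python sketch (the example vectors are arbitrary):

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cosine(a, b):
    return dot(a, b) / (norm(a) * norm(b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(a):
    n = norm(a)
    return [x / n for x in a]

a = normalize([0.3, 0.8, 0.5])
b = normalize([0.2, 0.9, 0.4])

# For unit vectors, cosine similarity equals the raw dot product,
# and squared Euclidean distance is a monotone function of it: d^2 = 2 - 2*cos.
assert abs(cosine(a, b) - dot(a, b)) < 1e-12
assert abs(euclidean(a, b) ** 2 - (2 - 2 * cosine(a, b))) < 1e-12
```

The second identity is why, for normalized vectors, ranking by cosine, dot product, or Euclidean distance returns the same nearest-neighbor order.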
Cosine threshold values by use case
Quick answer: "cosine 0.8 means similar" is an oversimplification. The useful threshold depends on what you're DOING with the result. For semantic search, 0.7+ is relevant, 0.85+ is strongly relevant. For deduplication, you need 0.95+ to avoid false positives (which are catastrophic in a dedup context). For recommendation, 0.65-0.8 gives a good candidate pool with room for diversity. For anomaly detection, flip it: LOW similarity to known distribution = high anomaly score.
| Use case | Threshold | Rationale |
|---|---|---|
| Semantic search / retrieval | 0.7-0.85 | Top-K retrieval returns everything above; production thresholds for "clearly relevant" 0.75-0.85. |
| Classification / clustering | Measure on held-out set | Use k-means or HDBSCAN on embeddings, not pairwise thresholds. Depends on class separation. |
| Duplicate detection | 0.95-0.98 | False positives are catastrophic. Conservative threshold, accept false negatives. |
| Recommendation | 0.65-0.8 candidate pool | Mix high-similarity + diverse items to prevent filter bubble. |
| Anomaly detection | Bottom 1-5% of distribution | LOW similarity to nearest neighbor = high anomaly. Percentile-based threshold. |
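The table above can be encoded as a simple lookup. A sketch of one possible shape — the cutoffs mirror the table, but the labels are illustrative and all thresholds should be tuned on your own validation data:

```python
# Cutoffs taken from the table above; labels are illustrative.
# Entries are checked highest-first: the first cutoff <= score wins.
THRESHOLDS = {
    "search": [(0.85, "strongly relevant"), (0.70, "relevant"), (0.0, "not relevant")],
    "dedup": [(0.95, "likely duplicate"), (0.0, "distinct")],
    "recommend": [(0.80, "very similar (watch diversity)"), (0.65, "good candidate"), (0.0, "weak candidate")],
}

def interpret(use_case: str, cosine_sim: float) -> str:
    """Map a cosine similarity to a use-case-specific verdict."""
    for cutoff, label in THRESHOLDS[use_case]:
        if cosine_sim >= cutoff:
            return label
    return "below all thresholds"
```

Note that anomaly detection is deliberately absent: it needs a percentile over the corpus similarity distribution, not a fixed cutoff on one pair.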
Embedding model landscape as of 2026-04
Quick answer: five major production embedding models worth comparing. Quality ranking (MTEB benchmark) as of early 2026: Voyage-3 > OpenAI text-embedding-3-large > OpenAI text-embedding-3-small ≈ Cohere embed-english-v3 > Gemini text-embedding-004. Price ranking (per 1M tokens): OpenAI-small $0.02 < Cohere $0.10 < OpenAI-large $0.13 < Voyage-3 $0.18. Gemini-004 has free tier (lowest cost for low-volume). Dimensions range 768 (Gemini) to 3072 (OpenAI-large). OpenAI-large + Gemini-004 support Matryoshka dimension truncation; others don't.
Selection heuristic: for most text tasks, OpenAI text-embedding-3-small at 1536 dims is the cost-effective default. For retrieval-quality-critical applications, Voyage-3 at 1024 dims with retrieval-tuned training outperforms at modest cost premium. For multimodal (text + image jointly), Gemini is worth evaluating. For high-scale with storage concern, Cohere has int8 and binary variants out-of-the-box.
ANN algorithm trade-offs at scale
Quick answer: approximate nearest neighbor algorithms trade recall for latency and memory. At small scale (under 10K vectors) use brute-force / FLAT — it\u0027s fast enough and gives 100% recall. At medium scale (10K-1M) use HNSW — it\u0027s the industry default with good recall (~95-99%), fast queries (under 10ms), and moderate memory (~2× baseline). At large scale (1M-100M) switch to IVF (Inverted File) or IVF+PQ (Product Quantization) — lower memory (0.1-0.3× with PQ) at modest recall cost (~85-95%). At very large scale (100M+) you need distributed indexes + tiered storage + 2-stage retrieval.
| Algorithm | Recall | Latency | Memory | Good for |
|---|---|---|---|---|
| FLAT / brute-force | 100% | O(N) per query | 1× baseline | <10K vectors |
| HNSW | 95-99% @ M=16 | O(log N) | 2× baseline | 10K-1M, industry default |
| IVF | 85-95% | O(√N) | 1.2× baseline | 1M-100M |
| IVF + PQ (8-bit) | 80-90% | O(√N) | 0.25× baseline | 10M+ with memory constraint |
| LSH | 70-90% | O(1) amortized | Depends | Binary or sparse; less common for dense text |
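At the FLAT end of the table, the algorithm is just "score everything and keep the top K". A pure-Python sketch (use Faiss or a vector database for anything beyond toy scale):

```python
import heapq
import math

def top_k_flat(query, corpus, k=5):
    """Exact (FLAT / brute-force) nearest-neighbor search by cosine similarity.
    O(N*d) per query, 100% recall — fine below roughly 10K vectors."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        return num / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))
    scored = ((cos(query, v), i) for i, v in enumerate(corpus))
    return heapq.nlargest(k, scored)  # list of (similarity, corpus_index)
```

Everything else in the table (HNSW, IVF, PQ, LSH) exists to avoid this full O(N) scan; they trade some recall for sublinear query time and, with quantization, smaller memory.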
Matryoshka truncation — the free storage win
Quick answer: some embedding models are trained with Matryoshka representation learning — the first N dimensions alone are useful at progressively coarser quality. OpenAI text-embedding-3-large (3072 dim) can truncate to 512 dim and preserve ~90% of retrieval quality. Gemini-004 (768 dim) truncates to 256 or 512. If storage cost matters, Matryoshka truncation is a 3-6× storage reduction for 5-10% recall cost — almost always a good trade.
Implementation: truncate to first-N dimensions, then re-normalize to unit magnitude (||A|| = 1). Cosine similarity on truncated vectors is NOT exactly the same as on full vectors but preserves rank order well for retrieval. Not all models support Matryoshka — only those trained with the loss function designed for it. OpenAI text-embedding-3-large and Gemini-004 are the main options as of 2026-04.
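The truncate-then-renormalize step described above is a few lines. A minimal sketch:

```python
import math

def matryoshka_truncate(vec, dims):
    """Keep the first `dims` components, then re-normalize to unit magnitude.
    Only meaningful for models trained with Matryoshka representation learning;
    truncating an ordinary embedding this way just discards information."""
    head = vec[:dims]
    n = math.sqrt(sum(x * x for x in head))
    return [x / n for x in head]
```

After truncation, compare vectors with cosine (or dot product, since they are unit-normalized again); never mix truncated and full-dimension vectors in the same index.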
Product Quantization — 4× to 32× storage reduction
Quick answer: Product Quantization (PQ) from Jegou et al. 2011 compresses vectors by splitting each vector into M sub-vectors, then quantizing each sub-vector to one of K codebook entries (K typically 256 = 1 byte per sub-vector). Original float32 at 1536 dim = 6144 bytes per vector. PQ with M=96 sub-vectors of 16-dim each, K=256 = 96 bytes per vector. 64× compression ratio with typically 5-15% recall loss.
Stacked with IVF clustering: IVF narrows the search to a handful of clusters; PQ compresses each cluster's residual vectors. This is the "IVF-PQ" pattern that's production-default at billion-vector scale (Spotify, Pinterest, YouTube recommendation systems). Storage becomes the dominant cost at billions-of-vectors scale; PQ's 10-50× compression is what makes it economically feasible.
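The encode/decode core of PQ is small once codebooks exist (in practice they are learned with k-means per sub-space; Faiss does all of this for you). A toy sketch with pre-supplied codebooks:

```python
def pq_encode(vec, codebooks):
    """codebooks[m] is a list of K centroid sub-vectors for sub-space m.
    Returns one codebook index per sub-vector (1 byte each when K <= 256)."""
    m = len(codebooks)
    sub_dim = len(vec) // m
    codes = []
    for i in range(m):
        sub = vec[i * sub_dim:(i + 1) * sub_dim]
        # nearest centroid by squared Euclidean distance
        codes.append(min(
            range(len(codebooks[i])),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(sub, codebooks[i][k])),
        ))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating chosen centroids."""
    out = []
    for i, c in enumerate(codes):
        out.extend(codebooks[i][c])
    return out
```

Storage math follows directly: with M=96 sub-vectors and K=256 centroids each, a vector costs 96 bytes instead of 6144, exactly the 64× ratio quoted above.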
The curse of dimensionality — what happens at very high dims
Quick answer: in very high dimensions (1000+), random vectors tend toward orthogonality (cosine → 0) because the distribution concentrates in a thin shell. Meaningfully-similar vectors stand out more clearly, but poor-quality embeddings can collapse toward mutual-similarity (all items look similar because all distances are large). Good embeddings preserve the contrast between same-topic and different-topic vectors even at high dims; bad embeddings don\u0027t.
Symptom of curse-of-dim: cosine similarities between all pairs in your corpus cluster in a narrow band (0.65-0.72 for all pairs). Diagnostic: compute similarity distribution for random pairs vs known-similar pairs; the gap should be LARGE (random centered ~0.2, similar centered ~0.8). If the gap is small (random 0.4, similar 0.6), your embedding model is underperforming — either switch models or fine-tune for your domain.
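The concentration claim is easy to check empirically. A sketch that samples random Gaussian vector pairs (dimension and sample count are arbitrary):

```python
import math
import random

def mean_random_cosine(dims, n_pairs=50, seed=0):
    """Mean cosine similarity between independent random Gaussian vectors.
    In high dimensions this concentrates near 0, with spread ~ 1/sqrt(dims)."""
    rng = random.Random(seed)

    def vec():
        return [rng.gauss(0, 1) for _ in range(dims)]

    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        return num / (math.sqrt(sum(x * x for x in a)) *
                      math.sqrt(sum(x * x for x in b)))

    sims = [cos(vec(), vec()) for _ in range(n_pairs)]
    return sum(sims) / len(sims)
```

The same harness, run on your real corpus (random pairs vs labeled-similar pairs), gives the gap diagnostic described above: a healthy embedding should separate the two distributions widely.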
When to re-embed — model migration costs
Quick answer: embedding models update every 6-18 months with meaningful quality improvements. Migrating to a new model requires re-embedding your entire corpus. At 1M documents × 100 tokens average × $0.02/M tokens = $2. At 100M documents × 500 tokens × $0.18/M tokens = $9,000. Plus the operational cost of re-indexing, blue-green deployment, validation on held-out queries.
Migration decision: re-embed when the new model's MTEB score is 5+ points higher (~10-20% retrieval quality gain). Don't re-embed for 1-3 point MTEB gains (noise-level). DO re-embed when jumping a model generation (switching from OpenAI text-embedding-ada-002 to the newer text-embedding-3 family was a significant quality jump worth the cost). Plan re-embedding as a scheduled 6-12 month operation; budget for double-indexing during cutover.
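The API-cost side of the migration math above reduces to one formula. A sketch (operational costs of re-indexing and cutover are extra):

```python
def reembed_cost_usd(num_docs, avg_tokens_per_doc, price_per_m_tokens):
    """Embedding-API cost to re-embed an entire corpus, in USD."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_m_tokens

# Examples from the text:
# 1M docs x 100 tokens at $0.02/M tokens  -> ~$2
# 100M docs x 500 tokens at $0.18/M tokens -> ~$9,000
assert abs(reembed_cost_usd(1_000_000, 100, 0.02) - 2.0) < 1e-6
assert abs(reembed_cost_usd(100_000_000, 500, 0.18) - 9_000.0) < 1e-3
```

The spread between those two examples is the real lesson: re-embedding is trivial at small scale and a budgeted engineering project at large scale.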
What this model does not capture
The tool computes similarity on provided vectors; it does not generate embeddings. If you need to embed text, use the vendor API/SDK (OpenAI, Voyage, Cohere, Google) — running embedding models in-browser is impractical for production-quality embeddings.
Real retrieval quality depends on more than similarity math. Query preprocessing (normalization, lowercase, diacritic removal), chunking strategy for long documents, hybrid retrieval (combining semantic + keyword BM25), reranking with cross-encoders (Cohere Rerank, BGE-reranker), and prompt design for the downstream LLM all matter. The tool focuses on the similarity + ANN piece; retrieval pipeline engineering extends well beyond.
ANN benchmarks vary by dataset + hardware. HNSW on 1M SIFT descriptors (128-dim synthetic) has different characteristics than HNSW on 1M text embeddings (1024-dim). Tune M, ef_construction, ef_search on your actual data; published defaults are starting points.
Sources and further reading
- MTEB Leaderboard (huggingface.co/spaces/mteb/leaderboard) — canonical empirical quality ranking of embedding models, covering 50+ tasks across retrieval, classification, clustering, and reranking.
- Malkov, Y.A. & Yashunin, D.A., "Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs," IEEE TPAMI 2020 — the HNSW paper.
- Jegou, H., Douze, M. & Schmid, C., "Product Quantization for Nearest Neighbor Search," IEEE TPAMI 33(1):117-128 (2011) — the PQ paper.
- Kusupati, A. et al., "Matryoshka Representation Learning," NeurIPS 2022 — the Matryoshka training paper.
- Lewis, P. et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS 2020 — the founding RAG paper.
- Production patterns: Faiss (github.com/facebookresearch/faiss), Qdrant (qdrant.tech), pgvector (github.com/pgvector/pgvector), and Weaviate (weaviate.io) documentation.
- Embedding fine-tuning: Matryoshka-adapted loss functions; the sentence-transformers library (sbert.net).
Embedding Similarity Calculator Tool v1 · canonical sources cited inline above · runs entirely client-side, no data transmitted