AI Embedding Uses

What embeddings are used for — retrieval, classification, clustering, deduplication, and more.

Reference · Updated Apr 19, 2026

Use case	Pattern
Semantic search	Embed query and docs; rank by cosine similarity
RAG	Retrieve the top-k passages by embedding similarity, then pass them to the LLM as context
Classification	Train a small classifier head on the embeddings; fine-tuning the embedding model is often unnecessary
Clustering	k-means or HDBSCAN on embeddings to group similar items
Deduplication	Near-duplicate detection via similarity threshold
Recommendation	Embed users and items; nearest-neighbor search finds related items
Anomaly detection	A large distance to the nearest cluster centroid flags an outlier
Multilingual search	Cross-lingual embeddings find matches across languages
Multimodal search	CLIP-style joint embedding of image and text
Reranking	Rescore the candidate set from first-stage retrieval with a second, more precise model
Semantic cache	Look up semantically similar earlier LLM prompts; on a match, serve the cached response
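
Several rows above (semantic search, RAG retrieval, deduplication, semantic cache) share one primitive: scoring vectors by cosine similarity. The minimal sketch below, in plain Python with no vector database, shows brute-force top-k ranking and threshold-based near-duplicate detection. The function names, the 0.95 threshold, and the toy three-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions, and production systems replace the linear scan with an approximate nearest-neighbor index from one of the databases below.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: Vector, docs: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Semantic search / RAG retrieval: brute-force scan, highest cosine similarity first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(docs: Dict[str, Vector], threshold: float = 0.95) -> List[Tuple[str, str]]:
    """Deduplication: every ID pair whose similarity meets the threshold (O(n^2) comparisons)."""
    ids = list(docs)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_similarity(docs[a], docs[b]) >= threshold:
                pairs.append((a, b))
    return pairs
```

With the made-up vectors `{"cats": [0.9, 0.1, 0.0], "kittens": [0.88, 0.12, 0.01], "stocks": [0.0, 0.2, 0.95]}` and the query `[1.0, 0.0, 0.0]`, `top_k(..., k=2)` ranks "cats" then "kittens", and `near_duplicates` returns the single pair `("cats", "kittens")`. A semantic cache is the same lookup with `k=1` plus a similarity threshold that decides whether the cached response is close enough to serve.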

Database	Type	Notes
pgvector	Postgres extension	Fits alongside relational data
Pinecone	Managed SaaS	Fully hosted; simple API
Weaviate	Self-host / cloud	Hybrid search (vector + keyword)
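Qdrant	Self-host / cloud	Open source (Apache 2.0), written in Rust
Milvus	Self-host / Zilliz Cloud	Open source (Apache 2.0), distributed; built for very large collections
Chroma	Local / self-host	Python-first; popular for prototyping
LanceDB	Embedded (in-process)	Written in Rust; Arrow-based Lance storage format
FAISS	Library	From Meta (Facebook AI Research); in-memory similarity search and clustering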
Redis Vector	Redis module	Combined KV + vector
Elasticsearch / OpenSearch	Search engine	Vector + text hybrid

Chunk size: chunks of 200–500 tokens often retrieve better than whole documents.
Overlap adjacent chunks by ~10% so that context spanning a chunk boundary is not lost.
Hybrid search (BM25 + vector) usually beats vector search alone on long-tail queries, such as those containing rare terms or exact identifiers.
Reranking the candidates with a cross-encoder (e.g. bge-reranker) often markedly improves the quality of the final top-k.
Applying metadata filters before vector search shrinks the candidate set, which cuts cost and improves relevance.
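
The chunking, overlap, and hybrid-search tips can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: whitespace-split words stand in for model tokens (a real pipeline would count tokens with the embedding model's own tokenizer), and hybrid results are merged with reciprocal rank fusion (RRF), one common fusion method among several, using the frequently cited constant k = 60. The function names are hypothetical; reranking and metadata filtering are not shown.

```python
from typing import Dict, List


def chunk_tokens(tokens: List[str], size: int = 300, overlap: int = 30) -> List[List[str]]:
    """Split tokens into windows of `size`, each repeating the last `overlap` tokens
    of the previous window (defaults: 300 tokens with ~10% overlap)."""
    # Assumption: `tokens` is already tokenized; whitespace words are a rough stand-in.
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end; stop
            break
    return chunks


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists (e.g. BM25 hits and vector hits) with RRF:
    score(doc) = sum over lists of 1 / (k + rank), with ranks starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
```

For example, the ten words of "the quick brown fox jumps over the lazy dog again" with `size=4, overlap=1` give three chunks, each starting with the last word of the previous one. Fusing the BM25 ranking `["d3", "d1", "d2"]` with the vector ranking `["d1", "d4", "d3"]` yields `["d1", "d3", "d4", "d2"]`: documents found by both retrievers rise to the top. RRF uses only ranks, so it needs no calibration between BM25 scores and cosine similarities.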
