AI Embedding Uses

What embeddings are used for — retrieval, classification, clustering, deduplication, and more.

Updated Apr 19, 2026

Common applications

Use case | Pattern
Semantic search | Embed the query and the documents; rank documents by cosine similarity to the query
RAG (retrieval-augmented generation) | Retrieve the top-k passages by embedding similarity, then pass them to an LLM as context
Classification | Train a small classifier head on the embeddings; fine-tuning the embedding model is often unnecessary
Clustering | Run k-means or HDBSCAN on the embeddings to group similar items
Deduplication | Flag pairs whose similarity exceeds a threshold as near-duplicates
Recommendation | Embed users and items; use nearest-neighbor search to find related items
Anomaly detection | Flag items that lie far from their cluster centroid as outliers
Multilingual search | Cross-lingual embeddings match queries and documents across languages
Multimodal search | CLIP-style models embed images and text in a shared space
Reranking | Re-score the candidate set from first-stage retrieval with a second, more accurate model
Semantic cache | Match a new prompt to similar earlier LLM calls and serve the cached response
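The semantic search row can be sketched in a few lines. This is a minimal brute-force example, assuming embeddings are plain Python lists of floats; the document IDs and 3-dimensional vectors are toy values, and `cosine` and `top_k` are hypothetical helper names. Vector databases replace the linear scan with an approximate nearest-neighbor index, but the ranking criterion is the same.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Score every document against the query and return the k most similar."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real models emit hundreds of dimensions.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "return-window": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]
for doc_id, score in top_k(query, docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```

For RAG, the same top-k result would simply be concatenated into the LLM prompt as context. If embeddings are normalized to unit length up front, cosine similarity reduces to a dot product.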
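For the classification row, a "small head" is often a logistic-regression or small MLP layer trained on frozen embeddings. As a deliberately simpler stand-in, the sketch below uses a nearest-centroid classifier. Each label's training embeddings are averaged into a centroid, and a new item takes the label of its closest centroid. The labels, vectors, and the helper names `fit_centroids` and `predict` are hypothetical.

```python
import math
from collections import defaultdict


def fit_centroids(examples: list[tuple[list[float], str]]) -> dict[str, list[float]]:
    """Average the training embeddings of each label into one centroid per label."""
    grouped = defaultdict(list)
    for vec, label in examples:
        grouped[label].append(vec)
    return {
        label: [sum(dim) / len(vecs) for dim in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(vec: list[float], centroids: dict[str, list[float]]) -> str:
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy 2-dimensional embeddings of support tickets (illustrative values only).
train = [
    ([0.9, 0.1], "billing"),
    ([0.8, 0.2], "billing"),
    ([0.1, 0.9], "shipping"),
    ([0.2, 0.8], "shipping"),
]
centroids = fit_centroids(train)
print(predict([0.85, 0.15], centroids))  # billing
```

A trained head can learn non-spherical decision boundaries that centroids cannot. Still, nearest-centroid is a cheap baseline to check whether the embeddings already separate the classes.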
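Deduplication by similarity threshold can be sketched as a pairwise comparison of unit-normalized vectors. This assumes a small collection. The nested loop is O(n²), so large corpora usually query an approximate nearest-neighbor index for each item instead. The item IDs, vectors, and the 0.98 threshold are hypothetical, and a usable threshold has to be tuned per embedding model.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def near_duplicates(vecs: dict[str, list[float]], threshold: float = 0.95) -> list[tuple[str, str, float]]:
    """Return every pair of items whose cosine similarity is at least `threshold`."""
    ids = list(vecs)
    unit = {item_id: normalize(vecs[item_id]) for item_id in ids}
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):  # compare each unordered pair once
            a, b = ids[i], ids[j]
            sim = sum(x * y for x, y in zip(unit[a], unit[b]))
            if sim >= threshold:
                pairs.append((a, b, sim))
    return pairs


vecs = {
    "doc-1": [0.70, 0.70, 0.10],
    "doc-1-copy": [0.71, 0.69, 0.11],  # lightly edited copy of doc-1
    "doc-2": [0.10, 0.20, 0.97],
}
for a, b, sim in near_duplicates(vecs, threshold=0.98):
    print(f"{a} ~ {b} (cosine {sim:.3f})")
```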
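The clustering and anomaly-detection rows combine naturally. The sketch below runs plain k-means (Lloyd's algorithm) and then flags points whose distance to their own cluster centroid exceeds a threshold. It assumes a fixed k and seeds centroids with the first k points, which is a simplification (library implementations typically use k-means++ seeding). The points, the threshold of 2.0, and the helper names are hypothetical.

```python
import math


def kmeans(points: list[list[float]], k: int, iters: int = 20) -> tuple[list[list[float]], list[int]]:
    """Lloyd's k-means with centroids seeded from the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def outliers(points: list[list[float]], centroids: list[list[float]], labels: list[int], threshold: float) -> list[int]:
    """Indices of points farther than `threshold` from their own cluster centroid."""
    return [
        i for i, (p, label) in enumerate(zip(points, labels))
        if math.dist(p, centroids[label]) > threshold
    ]


points = [
    [0.0, 0.0], [5.0, 5.0],  # first two points seed the two centroids
    [0.1, 0.0], [5.1, 5.0],
    [0.0, 0.1], [5.0, 5.1],
    [2.5, 9.0],              # far from both groups
]
centroids, labels = kmeans(points, k=2)
print(outliers(points, centroids, labels, threshold=2.0))  # [6]
```

Note that the outlier still drags its cluster's centroid toward itself. HDBSCAN sidesteps this by labeling low-density points as noise instead of forcing them into a cluster.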
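A semantic cache differs from an exact-match cache because it keys on prompt embeddings. A new prompt that is close enough to a stored one reuses the stored LLM response. The class below is a hypothetical in-memory sketch with a linear scan. The 0.95 threshold is an assumption; set too low, it returns answers to questions that were not actually asked.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by prompt embeddings rather than exact text."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold  # minimum cosine similarity that counts as a hit
        self.entries: list[tuple[list[float], str]] = []

    def put(self, prompt_vec: list[float], response: str) -> None:
        """Store an LLM response under the embedding of the prompt that produced it."""
        self.entries.append((prompt_vec, response))

    def get(self, prompt_vec: list[float]) -> Optional[str]:
        """Return the response for the most similar stored prompt, if similar enough."""
        best_sim, best_response = -1.0, None
        for vec, response in self.entries:
            sim = cosine(prompt_vec, vec)
            if sim > best_sim:
                best_sim, best_response = sim, response
        return best_response if best_sim >= self.threshold else None


cache = SemanticCache(threshold=0.95)
cache.put([1.0, 0.0, 0.0], "cached answer to prompt A")
print(cache.get([0.99, 0.05, 0.0]))  # near-identical prompt: cache hit
print(cache.get([0.0, 1.0, 0.0]))    # unrelated prompt: None, so call the LLM
```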

Vector database options

Database | Type | Notes
pgvector | Postgres extension | Stores vectors alongside relational data in the same database
Pinecone | Managed SaaS | Fully hosted; simple API
Weaviate | Self-host / cloud | Hybrid search (vector + keyword)
Qdrant | Self-host / cloud | Open source (Apache 2.0), written in Rust
Milvus | Self-host / Zilliz Cloud | Built for scale; Apache 2.0 license
Chroma | Local / self-host | Python-first; common for prototyping
LanceDB | Local / embedded | Rust core; Arrow-based columnar storage
FAISS | Library | From Meta (originally Facebook AI Research); in-memory vector indexes, not a database server
Redis (vector search) | Redis module | Key-value store and vector search in one system
Elasticsearch / OpenSearch | Search engine | Hybrid vector + full-text search

Implementation tips

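  • Chunk size: chunks of roughly 200–500 tokens often retrieve better than whole documents.
  • Overlap consecutive chunks by about 10% so content at chunk boundaries is not lost.
  • Hybrid search (BM25 + vector) usually beats vector search alone on long-tail queries, such as those containing rare terms, names, or exact identifiers.
  • Reranking the top candidates with a cross-encoder (e.g. bge-reranker) often substantially improves the quality of the final top-k.
  • Applying metadata filters before vector search shrinks the candidate set, which cuts cost and improves relevance.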
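The chunking and overlap tips can be sketched as a sliding window over a token list. This assumes whitespace-split words as a stand-in for real model tokens (a production pipeline would count tokens with the embedding model's own tokenizer). `chunk_tokens` is a hypothetical helper, and its defaults of 300 and 30 are one choice within the ranges above.

```python
def chunk_tokens(tokens: list[str], size: int = 300, overlap: int = 30) -> list[list[str]]:
    """Split tokens into windows of `size` tokens; each window repeats the last
    `overlap` tokens of the previous one (30 of 300 is about 10%)."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end; avoid a tail-only chunk
    return chunks


# Whitespace-split words stand in for model tokens in this sketch.
words = "a b c d e f g h i j".split()
for chunk in chunk_tokens(words, size=4, overlap=1):
    print(chunk)
```

With `size=4` and `overlap=1`, the window advances three tokens at a time, so "d" and "g" each appear in two chunks.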
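The hybrid-search tip does not say how the BM25 and vector result lists are merged. Reciprocal rank fusion (RRF) is one common choice, swapped in here as an assumption. Each document scores the sum of 1/(k + rank) over the lists it appears in, so documents ranked well by both retrievers rise to the top. The constant k = 60 comes from the original RRF paper and is a frequent default. The document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one list ordered by RRF score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


bm25_ranking = ["d3", "d1", "d7"]    # keyword retriever's top results
vector_ranking = ["d1", "d4", "d3"]  # embedding retriever's top results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])
```

RRF uses only ranks, so it avoids normalizing BM25 scores and cosine similarities onto a common scale. A cross-encoder reranker would then typically re-score the top of the fused list.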
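Filtering before ranking can be sketched by discarding items whose metadata fails exact-match filters and scoring only the survivors. The item layout (`id`, `vec`, `meta` keys) and the helper `filtered_search` are hypothetical. Real vector databases implement filtered approximate search inside the index, which this brute-force sketch does not attempt.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query: list[float], items: list[dict], filters: dict, k: int = 3) -> list[str]:
    """Apply exact-match metadata filters first, then rank only the survivors."""
    candidates = [
        item for item in items
        if all(item["meta"].get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda item: cosine(query, item["vec"]), reverse=True)
    return [item["id"] for item in candidates[:k]]


items = [
    {"id": "a", "vec": [0.9, 0.1], "meta": {"lang": "en"}},
    {"id": "b", "vec": [0.95, 0.05], "meta": {"lang": "de"}},
    {"id": "c", "vec": [0.2, 0.8], "meta": {"lang": "en"}},
]
query = [1.0, 0.0]
print(filtered_search(query, items, {"lang": "en"}, k=2))  # ['a', 'c']
```

Without the filter, "b" would rank first. The filter removes it before any similarity is computed, which is where the cost saving comes from.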
