AI Embedding Uses
What embeddings are used for — retrieval, classification, clustering, deduplication, and more.
Reference
Common applications
| Use case | Pattern |
|---|---|
| Semantic search | Embed query and docs; rank by cosine similarity |
| RAG | Retrieve top-k passages by embedding similarity, then pass them to the LLM as context |
| Classification | Train a small classifier head (e.g. logistic regression) on frozen embeddings; fine-tuning the encoder is often unnecessary |
| Clustering | k-means or HDBSCAN on embeddings to group similar items |
| Deduplication | Near-duplicate detection via similarity threshold |
| Recommendation | User / item embeddings; nearest-neighbor for related items |
| Anomaly detection | Items far from their nearest cluster centroid are flagged as outliers |
| Multilingual search | Cross-lingual embeddings find matches across languages |
| Multimodal search | CLIP-style joint embedding of image and text |
| Reranking | Rescore the candidate set from first-stage retrieval with a second, more precise model |
| Semantic cache | Embed the incoming prompt; if a sufficiently similar previous LLM call exists, serve its cached response |
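
The semantic search row above can be sketched in a few lines. This is a minimal illustration using only the standard library: the three-dimensional vectors and document IDs are made up and stand in for real embedding-model output, and `cosine_similarity` and `top_k` are hypothetical helper names, not part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (doc_id, score) pairs for the k most similar documents."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
DOCS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.3, 0.1],
}
QUERY = [1.0, 0.2, 0.0]

results = top_k(QUERY, DOCS, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The same similarity function can serve the deduplication and semantic cache rows: instead of taking the top k, compare the score against a threshold tuned on your own data. A brute-force scan like this is only practical for small collections; larger ones typically rely on an approximate nearest-neighbor index.
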
Vector database options
| Database | Type | Notes |
|---|---|---|
| pgvector | Postgres extension | Fits alongside relational data |
| Pinecone | Managed SaaS | Fully hosted; simple API |
| Weaviate | Self-host / cloud | Hybrid search (vector + keyword) |
| Qdrant | Self-host / cloud | Open source (Apache 2.0), written in Rust |
| Milvus | Self-host / Zilliz Cloud | Built for large-scale deployments; Apache 2.0 licensed |
| Chroma | Local / self-host | Python-first for prototyping |
| LanceDB | Local embedded | Rust, Arrow-backed |
| FAISS | Library | From Meta (formerly Facebook) AI Research; in-memory index, not a standalone server |
| Redis Vector | Redis module | Key-value store and vector search in one system |
| Elasticsearch / OpenSearch | Search engine | Vector + text hybrid |
Implementation tips
- Chunk size: chunks of 200–500 tokens often retrieve better than whole documents.
- Overlap chunks by ~10% to avoid losing context at boundaries.
- Hybrid search (BM25 + vector) usually beats vector alone for long-tail queries.
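- Reranking the first-stage candidates with a cross-encoder (e.g. bge-reranker) often improves top-k quality noticeably, at the cost of extra latency per query.
- Applying metadata filters before vector search shrinks the candidate set, which cuts cost and can improve relevance.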
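
The chunk size and overlap tips above can be made concrete with a small sliding-window splitter. This is a sketch under stated assumptions: `chunk_tokens` is a hypothetical helper, the input is an already-tokenized list (in practice you would use your embedding model's tokenizer), and the defaults of 300 tokens and 10% overlap are just one point inside the ranges given above.

```python
def chunk_tokens(tokens, chunk_size=300, overlap_ratio=0.10):
    """Split a token list into fixed-size chunks that share a small overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)  # tokens repeated between neighbors
    step = chunk_size - overlap                # how far the window advances each time

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# Placeholder tokens (assumption): stand-ins for real tokenizer output.
tokens = [f"tok{i}" for i in range(25)]
chunks = chunk_tokens(tokens, chunk_size=10, overlap_ratio=0.2)

for i, chunk in enumerate(chunks):
    print(i, chunk[0], "...", chunk[-1], f"({len(chunk)} tokens)")
```

With a chunk size of 10 and 20% overlap, the window advances 8 tokens at a time, so each chunk starts with the last 2 tokens of the previous one. The final chunk may be shorter than the rest.
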
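
The hybrid search tip above does not say how the BM25 and vector result lists are combined. One common option, shown here as an assumption rather than the only approach, is reciprocal rank fusion (RRF), which needs only the rank positions from each retriever and no score normalization. The function name, document IDs, and the constant `k=60` (a conventional default) are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1; higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a keyword retriever and a vector retriever.
bm25_ranking = ["doc-a", "doc-b", "doc-c"]
vector_ranking = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that appear in both lists ("doc-a" and "doc-b" here) rise above documents found by only one retriever. The fused list can then be handed to a reranker if one is in use.
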