AI Embedding Models Reference
Popular text, image, and multimodal embedding models — dimensions, context, license, and use case.
Text (commercial)
| Model | Dim | Max tokens | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-small | 1 536 (truncatable) | 8 192 | ~$0.02 / 1M tokens |
| OpenAI text-embedding-3-large | 3 072 (truncatable) | 8 192 | ~$0.13 / 1M tokens |
| Cohere embed-v3 | 1 024 | 512 | English + multilingual variants |
| Voyage-3 | 1 024 | 32 000 | Strong retrieval benchmarks |
| Anthropic (via Voyage) | 1 024 | 32 000 | No first-party model; Anthropic recommends Voyage |
| Google gecko / text-embedding-004 | 768 | 2 048 | Vertex AI |
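The commercial rows are API calls. A minimal sketch against OpenAI's embeddings endpoint (the `dimensions` parameter is what makes the 3-series "truncatable": the server returns a shortened vector); model name and default size are taken from the table above, everything else is an assumption:

```python
# Sketch of a call to OpenAI's embeddings endpoint (pip install openai;
# assumes OPENAI_API_KEY is set in the environment). `dimensions` asks
# the API for a shortened vector instead of the full 1 536 dims.
def get_embedding(text, model="text-embedding-3-small", dimensions=1536):
    """Return one embedding vector; billed per input token (see table)."""
    from openai import OpenAI
    client = OpenAI()
    resp = client.embeddings.create(
        model=model, input=[text], dimensions=dimensions
    )
    return resp.data[0].embedding

# Usage (network call):
#   vec = get_embedding("vector search", dimensions=512)
#   len(vec)  # 512
```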
Text (open weights)
| Model | Dim | Max tokens | License |
|---|---|---|---|
| BGE-large-en-v1.5 | 1 024 | 512 | MIT |
| BGE-m3 | 1 024 | 8 192 | MIT — multilingual |
| E5-large-v2 | 1 024 | 512 | MIT |
| E5-mistral-7b-instruct | 4 096 | 32 768 | MIT — top-tier quality, but 7B-scale (costly to serve) |
| Jina-embeddings-v3 | 1 024 | 8 192 | CC-BY-NC-4.0 (commercial use via API) |
| Nomic embed v1.5 | 768 | 8 192 | Apache 2.0 |
| gte-large-en-v1.5 | 1 024 | 8 192 | Apache 2.0 |
| all-MiniLM-L6-v2 | 384 | 256 | Apache 2.0 — small & fast |
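Any of the open-weights rows can be run locally. A minimal sketch using the sentence-transformers library (an assumption — the table doesn't prescribe a runtime), with the MiniLM model from the last row:

```python
# Minimal local-inference sketch (pip install sentence-transformers).
# The model id is the Hugging Face name for the MiniLM row above; swap
# in e.g. "BAAI/bge-large-en-v1.5" for a stronger 1 024-dim model.
def embed(texts):
    """Return unit-length embeddings, shape (len(texts), 384) for MiniLM."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    return model.encode(texts, normalize_embeddings=True)

# Usage (downloads weights on first run):
#   vecs = embed(["a red apple", "quantum physics"])
#   vecs.shape         # (2, 384)
#   vecs[0] @ vecs[1]  # cosine similarity, since rows are normalized
```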
Image / multimodal
| Model | Dim | Modality | Notes |
|---|---|---|---|
| CLIP ViT-L/14 | 768 | Image + text | OpenAI, 2021 — baseline |
| OpenCLIP ViT-H/14 | 1 024 | Image + text | LAION open reimplementation |
| SigLIP | 1 152 | Image + text | Google — sigmoid loss |
| DINOv2 ViT-L/14 | 1 024 | Image only | Self-supervised, dense features |
| ImageBind | 1 024 | 6 modalities | Meta — image/text/audio/depth/thermal/IMU |
| Voyage multimodal 3 | 1 024 | Image + text | Commercial |
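For the multimodal rows, one common way to run CLIP locally is via Hugging Face transformers (again an assumption, not prescribed by the table); the text side of the ViT-L/14 row looks roughly like:

```python
# Sketch: text-side CLIP embeddings via transformers (pip install
# transformers torch). "openai/clip-vit-large-patch14" is the Hugging
# Face id for the CLIP ViT-L/14 row above; projection dim is 768.
def clip_text_embed(texts):
    """Return L2-normalized CLIP text embeddings, shape (n, 768)."""
    import torch
    from transformers import CLIPModel, CLIPProcessor
    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")
    inputs = proc(text=texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Image-side embeddings work the same way through
# proc(images=...) and model.get_image_features(...).
```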
Notes
- MTEB (Massive Text Embedding Benchmark) is the standard leaderboard for text embedding quality.
- Higher dimension is not always better — many embeddings support truncation/matryoshka for cheaper storage.
- L2-normalize vectors before storing; on unit vectors, inner product equals cosine similarity.
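The last two notes as a runnable sketch (pure numpy; random vectors stand in for real embeddings):

```python
import numpy as np

def normalize(v):
    # Scale to unit L2 norm, so dot product == cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cosine(a, b):
    return float(np.dot(normalize(a), normalize(b)))

def truncate(v, dim):
    """Matryoshka-style shortening: keep the first `dim` dims, re-normalize."""
    return normalize(v[:dim])

rng = np.random.default_rng(0)
a, b = rng.normal(size=1536), rng.normal(size=1536)
full = cosine(a, b)                                    # full 1 536-dim score
short = float(np.dot(truncate(a, 256), truncate(b, 256)))  # 6x cheaper storage
print(round(full, 3), round(short, 3))
```

With a real matryoshka-trained model (e.g. the OpenAI 3-series or Nomic v1.5), `short` tracks `full` closely; with the random stand-ins here the two scores are unrelated, which is the point of training for truncatability.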
Last updated: