AI Embedding Models Reference

Popular text, image, and multimodal embedding models — dimensions, context, license, and use case.

Updated Apr 19, 2026 · 2 min read

Text (commercial)

Model	Dim	Max tokens	Notes
OpenAI text-embedding-3-small	1 536 (truncatable)	8 192	~$0.02 / 1M tokens
OpenAI text-embedding-3-large	3 072 (truncatable)	8 192	~$0.13 / 1M tokens
Cohere embed-v3	1 024	512	English + multilingual variants
Voyage-3	1 024	32 000	Strong retrieval benchmarks
Anthropic (via Voyage)	1 024	32 000	No first-party model; Anthropic's docs recommend Voyage AI
Google gecko / text-embedding-004	768	2 048	Vertex AI

Text (open weights)

Model	Dim	Max tokens	License
BGE-large-en-v1.5	1 024	512	MIT
BGE-m3	1 024	8 192	MIT — multilingual
E5-large-v2	1 024	512	MIT
E5-mistral-7b-instruct	4 096	32 768	MIT — very strong
Jina-embeddings-v3	1 024	8 192	CC BY-NC 4.0 (non-commercial)
Nomic embed v1.5	768	8 192	Apache 2.0
gte-large-en-v1.5	1 024	8 192	Apache 2.0
MiniLM-L6-v2	384	256	Apache 2.0 — small & fast

Image / multimodal

Model	Dim	Modality	Notes
CLIP ViT-L/14	768	Image + text	OpenAI, 2021 — baseline
OpenCLIP ViT-H/14	1 024	Image + text	LAION open reimplementation
SigLIP	1 152	Image + text	Google — sigmoid loss
DINOv2 ViT-L/14	1 024	Image only	Self-supervised, dense features
ImageBind	1 024	6 modalities	Meta — image/text/audio/depth/thermal/IMU
Voyage multimodal 3	1 024	Image + text	Commercial

Notes

MTEB (Massive Text Embedding Benchmark) is the standard leaderboard for text embedding quality.
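Higher dimension is not always better: models trained with Matryoshka representation learning (for example OpenAI text-embedding-3 and Nomic embed v1.5) can be truncated to fewer dimensions for cheaper storage, usually at some cost in retrieval quality.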
Normalize vectors to unit length and compare them with cosine similarity. On unit-length vectors the inner product equals cosine similarity, so either metric gives the same ranking.
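
The last two notes can be illustrated with a minimal sketch in plain Python. It is not tied to any provider's SDK: the helper names (`l2_normalize`, `truncate_embedding`, `storage_bytes`) and the toy 4-dimensional vectors are hypothetical, and the storage figure assumes uncompressed float32 values. Truncation only preserves quality for models trained to support it, so treat this as an illustration of the arithmetic rather than a recipe for every model in the tables.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed directly from un-normalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then re-normalize.

    Assumption: the model was trained so that leading components carry
    most of the signal (Matryoshka-style). For other models, truncating
    this way may hurt quality badly.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    return l2_normalize(vec[:dim])


def storage_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for an index, assuming float32 (4 bytes per value)."""
    return num_vectors * dim * bytes_per_value


# Toy vectors standing in for real embeddings.
a = [3.0, 4.0, 0.0, 0.0]
b = [4.0, 3.0, 0.0, 1.0]

# Inner product on unit-length vectors matches cosine similarity.
cos_direct = cosine_similarity(a, b)
cos_via_dot = dot(l2_normalize(a), l2_normalize(b))
print(abs(cos_direct - cos_via_dot) < 1e-12)

# Truncated vectors are unit length again after re-normalization.
a_short = truncate_embedding(a, 2)
print(a_short, math.sqrt(dot(a_short, a_short)))

# Storage for one million vectors at full vs. truncated dimension.
print(storage_bytes(1_000_000, 1536), storage_bytes(1_000_000, 256))
```

Re-normalizing after truncation matters because dropping components shortens the vector, and an inner-product index would otherwise no longer return cosine scores.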
