AI Model Parameter Sizes

Notable LLM and vision model sizes — parameter counts, context windows, and notes.

Reference · Updated Apr 19, 2026

Figures are approximate and drawn from public sources. Private lab models vary, and closed-source parameter counts are third-party estimates.

Large language models

| Model | Parameters | Context | Notes |
|---|---|---|---|
| GPT-2 | 1.5 B | 1 024 | Early OpenAI transformer (2019) |
| GPT-3 | 175 B | 2 048 | 2020, few-shot breakthrough |
| GPT-3.5-turbo | ~20 B (estimate) | 16 K | ChatGPT launch model |
| GPT-4 | undisclosed (~1 T MoE estimate) | 8 K / 32 K | March 2023 |
| GPT-4-turbo | undisclosed | 128 K | Nov 2023 |
| GPT-4o | undisclosed | 128 K | Omni, May 2024 |
| Claude 1 | undisclosed | 100 K | Anthropic |
| Claude 2 / 2.1 | undisclosed | 100 K / 200 K | 200 K from Claude 2.1 |
| Claude 3 (Opus/Sonnet/Haiku) | undisclosed | 200 K | March 2024 |
| Claude 4 | undisclosed | 200 K / 1 M | 2025 |
| Llama 2 | 7 / 13 / 70 B | 4 K | Meta open weights |
| Llama 3 / 3.1 | 8 / 70 / 405 B | 8 K / 128 K | 405 B and 128 K are Llama 3.1 |
| Mistral 7B | 7 B | 8 K / 32 K | Sliding-window attention + GQA |
| Mixtral 8×7B | 47 B total / 13 B active | 32 K | Mixture of experts |
| Gemini 1.5 Pro | undisclosed | 2 M | Google long-context |
| Gemini 2.0 / 2.5 | undisclosed | 1 M | |
| Qwen 2.5 72B | 72 B | 128 K | Alibaba open weights |
| DeepSeek V3 | 671 B total / 37 B active | 128 K | MoE, Dec 2024 |
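Parameter counts map directly to memory: weights occupy parameters × bytes per parameter, so a 70 B model needs roughly 140 GB at fp16 or 35 GB at 4-bit. A back-of-envelope helper (illustrative, not a library function; sizes taken from the table above):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for model weights alone, in decimal GB.
    Excludes KV cache, activations, and optimizer state."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Llama 3.1 405B is out of reach for a single 80 GB GPU even at 4-bit
print(weight_memory_gb(70, 16))   # 140.0  (fp16)
print(weight_memory_gb(70, 4))    # 35.0   (4-bit quantized)

# MoE caveat: active parameters drive per-token compute, but *all*
# experts must stay resident, so DeepSeek V3 stores 671 B of weights
# even though only 37 B are active per token.
print(weight_memory_gb(671, 8))   # 671.0  (8-bit)
```

Real deployments add KV-cache memory that grows with context length and batch size, which is why long-context serving is expensive even for small models.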

Embedding models

| Model | Dimensions | Notes |
|---|---|---|
| OpenAI text-embedding-3-small | 1 536 (truncatable) | Replaces ada-002 |
| OpenAI text-embedding-3-large | 3 072 (truncatable) | |
| Cohere embed-v3 | 1 024 | |
| Voyage v3 | 1 024 | |
| BGE-large-en | 1 024 | Open weights, BAAI |
| E5-mistral-7b-instruct | 4 096 | Top-tier on MTEB |
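The OpenAI text-embedding-3 models can be shortened below their native dimension; a truncated vector should be re-normalized before computing cosine similarity. A minimal sketch in plain Python (the toy 6-dim vectors stand in for real 1 536-dim API output):

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length,
    so dot product equals cosine similarity."""
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # For unit vectors, the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for API embeddings, truncated from 6 to 4 dims
e1 = truncate_and_normalize([0.3, 0.1, 0.4, 0.0, 0.2, 0.1], 4)
e2 = truncate_and_normalize([0.2, 0.2, 0.5, 0.1, 0.0, 0.3], 4)
print(round(cosine(e1, e2), 3))
```

Shorter vectors trade a little retrieval quality for much cheaper storage and faster search, which is the usual motivation for truncating.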

Image / vision

| Model | Size | Notes |
|---|---|---|
| Stable Diffusion 1.5 | 860 M | U-Net parameters |
| Stable Diffusion XL | 3.5 B | |
| Stable Diffusion 3 | 2 B – 8 B | MM-DiT architecture |
| FLUX.1 (dev) | 12 B | Black Forest Labs |
| DALL·E 3 | undisclosed | |
| CLIP ViT-L/14 | 428 M | OpenAI |

Notes

  • Parameter count alone doesn't determine quality: training data, compute, and alignment matter at least as much. Compare models with benchmarks and task-specific evaluation, not raw size.
