AI Model Parameter Sizes

Notable LLM and vision model sizes — parameter counts, context windows, and notes.

Reference · Updated Apr 19, 2026

Figures are approximate and drawn from public sources. Private lab models vary, and closed-source parameter counts are third-party estimates.

Large language models

| Model | Parameters | Context | Notes |
|---|---|---|---|
| GPT-2 | 1.5 B | 1 024 | Early OpenAI transformer (2019) |
| GPT-3 | 175 B | 2 048 | 2020, few-shot breakthrough |
| GPT-3.5-turbo | ~20 B (estimate) | 16 K | ChatGPT launch model |
| GPT-4 | undisclosed (~1 T MoE estimate) | 8 K / 32 K | March 2023 |
| GPT-4-turbo | undisclosed | 128 K | Nov 2023 |
| GPT-4o | undisclosed | 128 K | Omni, May 2024 |
| Claude 1 | undisclosed | 100 K | Anthropic |
| Claude 2 / 2.1 | undisclosed | 100 K / 200 K | 200 K from Claude 2.1 |
| Claude 3 (Opus/Sonnet/Haiku) | undisclosed | 200 K | March 2024 |
| Claude 4 | undisclosed | 200 K / 1 M | 2025 |
| Llama 2 | 7 / 13 / 70 B | 4 K | Meta open weights |
| Llama 3 / 3.1 | 8 / 70 / 405 B | 8 K / 128 K | 405 B and 128 K are Llama 3.1 |
| Mistral 7B | 7 B | 8 K / 32 K | Sliding-window attention + GQA |
| Mixtral 8×7B | 47 B total / 13 B active | 32 K | Mixture of experts |
| Gemini 1.5 Pro | undisclosed | 2 M | Google long-context |
| Gemini 2.0 / 2.5 | undisclosed | 1 M | |
| Qwen 2.5 72B | 72 B | 128 K | Alibaba open weights |
| DeepSeek V3 | 671 B total / 37 B active | 128 K | MoE, Dec 2024 |
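Parameter counts map directly to memory: weights occupy parameters × bytes per parameter, so a 70 B model needs roughly 140 GB at fp16 or 35 GB at 4-bit. A back-of-envelope helper (illustrative, not a library function; sizes taken from the table above):

```python
def weight_memory_gb(params_billion: float, bits_per_param: int = 16) -> float:
    """Approximate memory for model weights alone, in decimal GB.
    Excludes KV cache, activations, and optimizer state."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# Llama 3.1 405B is out of reach for a single 80 GB GPU even at 4-bit
print(weight_memory_gb(70, 16))   # 140.0  (fp16)
print(weight_memory_gb(70, 4))    # 35.0   (4-bit quantized)

# MoE caveat: active parameters drive per-token compute, but *all*
# experts must stay resident, so DeepSeek V3 stores 671 B of weights
# even though only 37 B are active per token.
print(weight_memory_gb(671, 8))   # 671.0  (8-bit)
```

Real deployments add KV-cache memory that grows with context length and batch size, which is why long-context serving is expensive even for small models.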

Embedding models

| Model | Dimensions | Notes |
|---|---|---|
| OpenAI text-embedding-3-small | 1 536 (truncatable) | Replaces ada-002 |
| OpenAI text-embedding-3-large | 3 072 (truncatable) | |
| Cohere embed-v3 | 1 024 | |
| Voyage v3 | 1 024 | |
| BGE-large-en | 1 024 | Open weights, BAAI |
| E5-mistral-7b-instruct | 4 096 | Top-tier on MTEB |
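The OpenAI text-embedding-3 models can be shortened below their native dimension; a truncated vector should be re-normalized before computing cosine similarity. A minimal sketch in plain Python (the toy 6-dim vectors stand in for real 1 536-dim API output):

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length,
    so dot product equals cosine similarity."""
    v = vec[:dim]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # For unit vectors, the dot product is the cosine similarity
    return sum(x * y for x, y in zip(a, b))

# Toy stand-ins for API embeddings, truncated from 6 to 4 dims
e1 = truncate_and_normalize([0.3, 0.1, 0.4, 0.0, 0.2, 0.1], 4)
e2 = truncate_and_normalize([0.2, 0.2, 0.5, 0.1, 0.0, 0.3], 4)
print(round(cosine(e1, e2), 3))
```

Shorter vectors trade a little retrieval quality for much cheaper storage and faster search, which is the usual motivation for truncating.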

Image / vision

| Model | Size | Notes |
|---|---|---|
| Stable Diffusion 1.5 | 860 M | U-Net parameters |
| Stable Diffusion XL | 3.5 B | |
| Stable Diffusion 3 | 2 B – 8 B | MM-DiT architecture |
| FLUX.1 (dev) | 12 B | Black Forest Labs |
| DALL·E 3 | undisclosed | |
| CLIP ViT-L/14 | 428 M | OpenAI |

Notes

  • Parameter count alone doesn't determine quality: training data, compute, and alignment matter at least as much. Compare models with benchmarks and task-specific evaluation, not raw size.
