AI Model Types

Categories of modern AI models, and when to use an LLM, a diffusion model, a classifier, or an RL agent.

Updated Apr 19, 2026

By architecture

Family                   | What it does               | Examples
Transformer LLM          | Generate / classify text   | GPT, Claude, Llama, Gemini
Encoder-only             | Classify, extract, embed   | BERT, RoBERTa, DeBERTa
Decoder-only             | Autoregressive generation  | GPT family, Llama
Encoder-decoder          | Translation, summarization | T5, BART, Flan-T5
Vision Transformer (ViT) | Image classification       | ViT-L/14, DINOv2
CNN                      | Image / dense prediction   | ResNet, EfficientNet, YOLO
Diffusion                | Generate images / video    | Stable Diffusion, FLUX, Sora
GAN                      | Generate images (legacy)   | StyleGAN, CycleGAN
VAE                      | Representation learning    | VQ-VAE
Audio transformer        | Speech, music              | Whisper, MusicGen
Graph NN                 | Molecules, social graphs   | GraphSAGE, GAT
Reinforcement learning   | Decision making            | DQN, PPO, AlphaZero
Multimodal               | Vision + language + more   | CLIP, GPT-4o, Gemini
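
The encoder-only and decoder-only rows differ mainly in which positions each token may attend to: encoder-only models attend in both directions, while decoder-only models use a causal mask so that generation can proceed left to right. A minimal sketch of the two masks in plain Python follows; the function name `attention_mask` is hypothetical and chosen only for illustration, and real models apply such masks inside their attention layers rather than as standalone lists.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Return a seq_len x seq_len mask; 1 means position i may attend to position j."""
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Encoder-only style: every token can see every other token.
encoder_mask = attention_mask(4, causal=False)

# Decoder-only style: token i can see only tokens 0..i (no peeking ahead).
decoder_mask = attention_mask(4, causal=True)

for row in decoder_mask:
    print(row)
```

The causal mask is lower-triangular, which is what makes autoregressive generation possible: each new token depends only on the tokens already produced.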

By task

Task                     | Best fit
Classify short text      | Fine-tuned BERT or small LLM
Generate / reason / chat | Instruction-tuned LLM
Summarize                | LLM or T5 variant
Translate                | Encoder-decoder or LLM
Extract structured data  | LLM with function-calling / JSON mode
Search / RAG             | Dense embedding + vector DB
Generate images          | Diffusion model
Detect objects           | YOLO / DETR (SAM for segmentation masks)
Transcribe speech        | Whisper / Wav2Vec2
Play a game              | RL (PPO, AlphaZero-style)
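
The Search / RAG row pairs a dense embedding model with a vector database. The retrieval step can be sketched roughly as below, under loud assumptions: the 3-dimensional vectors are hand-written toy values standing in for real embeddings, a plain Python list stands in for the vector DB, and the names `cosine`, `top_k`, and the `doc_*` identifiers are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "vector DB": (document id, embedding). Values are made up for illustration.
index = [
    ("doc_cats", [0.9, 0.1, 0.0]),
    ("doc_dogs", [0.8, 0.3, 0.1]),
    ("doc_tax", [0.0, 0.2, 0.9]),
]

query = [1.0, 0.2, 0.0]  # stand-in for the embedded user query
results = top_k(query, index, k=2)
print(results)
```

In a RAG setup the retrieved documents would then be passed to an LLM as context. A production system would typically replace the linear scan with an approximate nearest-neighbor index.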
