AI Model Types
Categories of modern AI models, and when to use an LLM, a diffusion model, a classifier, or an RL agent.
Reference
By architecture
| Family | What it does | Examples |
|---|---|---|
| Transformer LLM | Generate / classify text | GPT, Claude, Llama, Gemini |
| Encoder-only | Classify, extract, embed | BERT, RoBERTa, DeBERTa |
| Decoder-only | Autoregressive generation | GPT family, Llama |
| Encoder-decoder | Translation, summarization | T5, BART, Flan-T5 |
| Vision Transformer (ViT) | Image classification, visual feature extraction | ViT-L/14, DINOv2 |
| CNN | Image / dense prediction | ResNet, EfficientNet, YOLO |
| Diffusion | Generate images / video | Stable Diffusion, FLUX, Sora |
| GAN | Generate images (legacy) | StyleGAN, CycleGAN |
| VAE | Representation learning | VQ-VAE |
| Audio transformer | Speech, music | Whisper, MusicGen |
| Graph NN | Molecules, social graphs | GraphSAGE, GAT |
| Reinforcement learning | Sequential decision making (a training paradigm rather than an architecture) | DQN, PPO, AlphaZero |
| Multimodal | Vision + language + more | CLIP, GPT-4o, Gemini |
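One way to see the difference between the encoder-only and decoder-only rows is the attention mask: encoder-only models let every token attend to every other token, while decoder-only models restrict each token to itself and earlier positions. The sketch below is a minimal illustration in plain Python; the helper name `attention_mask` is hypothetical and not taken from any library.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[int]]:
    """Build a seq_len x seq_len mask; 1 means position i may attend to position j.

    causal=False gives the bidirectional pattern used by encoder-only models.
    causal=True gives the lower-triangular pattern used by decoder-only models.
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Illustrative 3-token sequence
encoder_mask = attention_mask(3, causal=False)
decoder_mask = attention_mask(3, causal=True)

for row in decoder_mask:
    print(row)
```

The causal mask is what makes autoregressive generation possible: each position is predicted without seeing later tokens. Encoder-decoder models typically combine both patterns, with a bidirectional encoder and a causal decoder.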
By task
| Task | Best fit |
|---|---|
| Classify short text | Fine-tuned BERT or small LLM |
| Generate / reason / chat | Instruction-tuned LLM |
| Summarize | LLM or T5 variant |
| Translate | Encoder-decoder or LLM |
| Extract structured data | LLM with function-calling / JSON mode |
| Search / RAG | Dense embedding + vector DB |
| Generate images | Diffusion model |
| Detect objects | YOLO / DETR (SAM for segmentation masks) |
| Transcribe speech | Whisper / Wav2Vec2 |
| Play a game | RL (PPO, AlphaZero-style) |
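The "Search / RAG" row can be sketched as a nearest-neighbour lookup over embedding vectors. The example below is a toy illustration only: the 3-dimensional vectors are hand-written stand-ins for the output of a real embedding model, a plain dictionary stands in for a vector database, and the names `cosine`, `top_k`, and the document IDs are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k documents most similar to the query vector."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Assumption: toy vectors in place of real embeddings
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_tax": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, index, k=2)
print(results)
```

In a RAG setup, the retrieved documents would then be passed to an LLM as context. A brute-force scan like this is only reasonable for small collections; vector databases typically use approximate nearest-neighbour indexes instead.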