
AI Model Types

Categories of modern AI models, and when to use an LLM, a diffusion model, a classifier, or an RL agent.

By architecture

Family | What it does | Examples
Transformer LLM | Generate / classify text | GPT, Claude, Llama, Gemini
Encoder-only | Classify, extract, embed | BERT, RoBERTa, DeBERTa
Decoder-only | Autoregressive generation | GPT family, Llama
Encoder-decoder | Translation, summarization | T5, BART, Flan-T5
Vision Transformer (ViT) | Image classification | ViT-L/14, DINOv2
CNN | Image / dense prediction | ResNet, EfficientNet, YOLO
Diffusion | Generate images / video | Stable Diffusion, FLUX, Sora
GAN | Generate images (legacy) | StyleGAN, CycleGAN
VAE | Representation learning | VQ-VAE
Audio transformer | Speech, music | Whisper, MusicGen
Graph NN | Molecules, social graphs | GraphSAGE, GAT
Reinforcement learning | Decision making | DQN, PPO, AlphaZero
Multimodal | Vision + language + more | CLIP, GPT-4o, Gemini
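
The encoder-only and decoder-only families differ mainly in which positions each token may attend to: encoder-only models see the whole sequence in both directions, while decoder-only models use a causal mask so each token sees only itself and earlier tokens. A minimal sketch of the two masks in plain Python follows; the function name `attention_mask` is a hypothetical helper for illustration, not part of any library.

```python
def attention_mask(seq_len: int, causal: bool) -> list[list[bool]]:
    """Build a seq_len x seq_len mask (illustrative helper, not a library API).

    mask[i][j] is True when position i is allowed to attend to position j.
    causal=True  -> decoder-only style (each token sees itself and the past)
    causal=False -> encoder-only style (each token sees the full sequence)
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


decoder_mask = attention_mask(3, causal=True)
encoder_mask = attention_mask(3, causal=False)
```

The causal mask is what makes left-to-right generation possible, which is why autoregressive generation sits in the decoder-only row, while classification and embedding sit in the encoder-only row.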

By task

Task | Best fit
Classify short text | Fine-tuned BERT or small LLM
Generate / reason / chat | Instruction-tuned LLM
Summarize | LLM or T5 variant
Translate | Encoder-decoder or LLM
Extract structured data | LLM with function-calling / JSON mode
Search / RAG | Dense embedding + vector DB
Generate images | Diffusion model
Detect objects | YOLO / DETR (SAM for segmentation masks)
Transcribe speech | Whisper / Wav2Vec2
Play a game | RL (PPO, AlphaZero-style)
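
As a rough illustration, a few rows of the task table can be encoded as a lookup for use in a routing script or a project checklist. This is only a sketch: the names `BEST_FIT` and `best_fit` are hypothetical, and the entries are copied from the table rather than from any benchmark.

```python
from typing import Optional

# Hypothetical lookup copied from a subset of the "By task" rows.
BEST_FIT = {
    "classify short text": "Fine-tuned BERT or small LLM",
    "generate / reason / chat": "Instruction-tuned LLM",
    "summarize": "LLM or T5 variant",
    "translate": "Encoder-decoder or LLM",
    "search / rag": "Dense embedding + vector DB",
    "generate images": "Diffusion model",
    "play a game": "RL (PPO, AlphaZero-style)",
}


def best_fit(task: str) -> Optional[str]:
    """Return the suggested model family for a task, or None if it is not listed."""
    return BEST_FIT.get(task.strip().lower())


suggestion = best_fit("Summarize")  # "LLM or T5 variant"
```

Returning `None` for unlisted tasks keeps the helper from guessing; the table is a starting point, and the right choice still depends on data size, latency, and cost constraints.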