Regular expressions — regex — are a tiny language for describing patterns in text. They power search-and-replace, input validation, log parsing and data extraction in virtually every programming language and editor. They look intimidating, but the secret is that regex is learned in layers: a handful of symbols at a time, each unlocking a class of problems. This guide builds them up that way.
Build and test patterns live, with instant highlighting of what matches, in the Regex Tester.
Layer 1: literal characters
At its simplest, a regex is just text that matches itself. The pattern cat matches the letters “cat” anywhere in a string. Most characters are literals. The power comes from a small set of special characters that mean something more — and the rest of regex is learning what those mean.
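As a minimal sketch, assuming Python's built-in re module (the guide itself is language-agnostic, and the sample strings are made up for illustration), a literal pattern behaves much like a substring search:

```python
import re

# A literal pattern matches itself anywhere in the text.
found = re.search("cat", "concatenate")
print(found.group())   # cat
print(found.start())   # 3 (zero-based position where the match begins)

# When nothing matches, re.search returns None instead of raising an error.
missing = re.search("cat", "dog")
print(missing)         # None
```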
Layer 2: character classes
A character class matches any one character from a set, written in square brackets:
| Pattern | Matches |
|---|---|
| [aeiou] | Any single vowel |
| [0-9] | Any single digit (a range) |
| [a-zA-Z] | Any single letter |
| [^0-9] | Any character that is not a digit (^ negates) |
Common classes have shorthands: \d = any digit, \w = any word character (letter, digit or underscore), \s = any whitespace, and the dot . = any character except a line break (unless the engine's “dot matches newline” option is switched on).
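The classes and shorthands above can be tried out with a short sketch. It assumes Python's re module, and the sample strings are invented for illustration; other engines may differ in small details such as which characters count as digits.

```python
import re

text = "Room 42, floor 7b"  # made-up sample text

vowels = re.findall(r"[aeiou]", text)        # ['o', 'o', 'o', 'o']
digits = re.findall(r"[0-9]", text)          # ['4', '2', '7']
non_digits = re.findall(r"[^0-9]", "a1b2")   # ['a', 'b']

# Shorthand classes
word_chars = re.findall(r"\w", "a_1 !")      # ['a', '_', '1']
spaces = re.findall(r"\s", "a b\tc")         # [' ', '\t']

# In Python the dot skips newlines unless the re.DOTALL flag is passed.
dots = re.findall(r".", "a\nb")                        # ['a', 'b']
dots_all = re.findall(r".", "a\nb", flags=re.DOTALL)   # ['a', '\n', 'b']
```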
Layer 3: quantifiers — how many
Quantifiers say how many times the preceding item may repeat:
| Symbol | Meaning |
|---|---|
| * | Zero or more |
| + | One or more |
| ? | Zero or one (optional) |
| {3} | Exactly 3 |
| {2,5} | Between 2 and 5 |
So \d{3} matches exactly three digits, and \w+ matches a whole word. Combine the layers and patterns start doing real work: \d{3}-\d{4} matches a phone fragment like 555-1234.
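Here is a small sketch of each quantifier in action, again assuming Python's re module and invented sample strings:

```python
import re

star = re.findall(r"ab*", "a ab abbb")              # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")              # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']
exact = re.findall(r"\d{3}", "12 345")              # ['345']
ranged = re.findall(r"\d{2,5}", "1 22 4444")        # ['22', '4444']

# Combining layers: word runs and the phone fragment from the text.
words = re.findall(r"\w+", "two words")             # ['two', 'words']
phone = re.search(r"\d{3}-\d{4}", "Call 555-1234 today").group()  # '555-1234'
```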
Layer 4: anchors — where
Anchors match a position rather than a character:
- ^ matches the start of the string (or line)
- $ matches the end of the string (or line)
- \b matches a word boundary (the edge between a word and a non-word character)
Anchors are what turn “contains” into “is.” \d+ matches if there are any digits; ^\d+$ matches only if the entire string is digits — exactly what you want for validating a numeric input.
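The difference between “contains” and “is” can be sketched as follows, assuming Python's re module. One Python-specific detail worth knowing: $ also matches just before a trailing newline, so re.fullmatch is the stricter choice for validation.

```python
import re

contains_digits = re.search(r"\d+", "abc123") is not None     # True: digits appear somewhere
is_all_digits = re.search(r"^\d+$", "abc123") is not None     # False: not the whole string
pure_number = re.search(r"^\d+$", "123") is not None          # True

# \b keeps "cat" from matching inside longer words.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']

# Python quirk: $ tolerates one trailing newline, fullmatch does not.
dollar_allows_newline = re.search(r"^\d+$", "123\n") is not None  # True
fullmatch_is_strict = re.fullmatch(r"\d+", "123\n") is None       # True
```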
Layer 5: groups and alternation
Parentheses group part of a pattern, so a quantifier can apply to the whole group and so you can capture what matched for extraction. The pipe | means “or.”
- (ab)+ matches one or more repetitions of “ab”
- (cat|dog|fish) matches any one of those words
- (\d{4})-(\d{2})-(\d{2}) matches a date, with year, month and day each captured separately for extraction
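The three patterns above might be exercised like this. It is a sketch assuming Python's re module; the sample sentences are invented, and re.fullmatch is used for the first pattern so that the whole string has to match.

```python
import re

# A quantifier applied to a group repeats the whole group.
repeated = re.fullmatch(r"(ab)+", "ababab") is not None    # True
not_repeated = re.fullmatch(r"(ab)+", "aba") is not None   # False

# Alternation picks whichever option matches at each position.
animals = re.findall(r"cat|dog|fish", "a dog and a fish")  # ['dog', 'fish']

# Capture groups hand back each parenthesised piece separately.
date_match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2026-06-12.")
year, month, day = date_match.groups()
print(year, month, day)   # 2026 06 12
```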
Run the date pattern against 2026-06-12 and you get back 2026, 06 and 12 as separate pieces.
A classic trap: greedy matching
By default, quantifiers are greedy: they grab as much as possible. Run <.+> against <a><b> and it matches the entire <a><b>, not just <a>, because .+ greedily swallows everything up to the last >. Adding ? after the quantifier makes it lazy: <.+?> matches as little as possible, giving just <a>. This single character is behind a huge share of “why did my regex match too much?” confusion.
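The greedy and lazy behaviour described above can be reproduced in a few lines, assuming Python's re module. The third pattern, a negated character class, is an alternative not mentioned in the text that is often used for the same job.

```python
import re

html = "<a><b>"

greedy = re.findall(r"<.+>", html)      # ['<a><b>']    one match, as long as possible
lazy = re.findall(r"<.+?>", html)       # ['<a>', '<b>'] each match as short as possible

# Alternative (an addition, not from the text): forbid ">" inside the tag.
negated = re.findall(r"<[^>]+>", html)  # ['<a>', '<b>']
```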
When not to use regex
Regex is superb for regular patterns but a poor fit for deeply nested structures like HTML or JSON — use a real parser for those. And resist cramming an entire validation into one monstrous expression; a simple regex plus a little code is usually clearer and more correct than a 200-character one-liner nobody can maintain.
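One way to picture “a simple regex plus a little code” is date validation: the regex checks only the shape, and ordinary code checks whether the date exists. This is a sketch assuming Python's re and datetime modules; the helper name parse_iso_date is hypothetical.

```python
import re
from datetime import date
from typing import Optional

# The regex only checks the shape: four digits, two digits, two digits.
DATE_SHAPE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_iso_date(text: str) -> Optional[date]:
    """Hypothetical helper: return a date, or None if text is not a real date."""
    match = DATE_SHAPE.fullmatch(text)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        # The calendar rules (month lengths, leap years) live in code, not in the regex.
        return date(year, month, day)
    except ValueError:
        return None

print(parse_iso_date("2026-06-12"))  # 2026-06-12
print(parse_iso_date("2026-02-30"))  # None (right shape, impossible date)
print(parse_iso_date("2026-6-12"))   # None (wrong shape)
```

Encoding month lengths and leap years in the pattern itself would be possible, but the result would be far longer and harder to read than this split.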
In practice
Learn regex in the five layers above and the cryptic symbols become a readable mini-language. Build patterns incrementally, test each addition against real input, and watch out for greedy quantifiers. The Regex Tester highlights matches as you type, which is the fastest way to learn — and see how regex fits the wider toolkit in Developer Data Essentials.
Frequently asked questions
What is a regular expression?
A regular expression (regex) is a compact pattern that describes a set of strings. It is used to search, match, validate and replace text — for example, to check that an input looks like an email address, or to find every date in a document. Almost every programming language and text editor supports them.
Why does regex look so cryptic?
Because it packs a lot of meaning into a few symbols. But it is learnable in layers: literal characters match themselves, a handful of special symbols add power (any digit, one-or-more, start-of-line), and everything else builds on those. Reading it slowly, symbol by symbol, makes it far less intimidating.
Can I validate an email with regex?
You can approximate it, and a simple pattern catches the obvious mistakes (no @, spaces, missing domain). But fully validating every legal email address with regex is famously impractical — the real specification is enormously complex. For most apps, a loose check plus a confirmation email is the right approach.
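A loose check of that kind might look like the sketch below, assuming Python's re module. The pattern and the helper name looks_like_email are illustrative choices, not a standard; the check deliberately accepts many strings that are not deliverable addresses and leaves real verification to the confirmation email.

```python
import re

# Loose shape: something, one "@", something, a dot, something; no spaces anywhere.
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value: str) -> bool:
    """Hypothetical helper: catch obvious mistakes only."""
    return LOOSE_EMAIL.fullmatch(value) is not None

print(looks_like_email("user@example.com"))    # True
print(looks_like_email("userexample.com"))     # False (no @)
print(looks_like_email("user @example.com"))   # False (contains a space)
print(looks_like_email("user@"))               # False (missing domain)
```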
What is the difference between greedy and lazy matching?
By default quantifiers are greedy: they match as much as possible. Adding a ? after the quantifier makes it lazy: it matches as little as possible. This matters when extracting content between delimiters, where a greedy pattern can swallow far more than you intended.