Regular expressions (regex) are powerful tools for pattern matching and text manipulation. They can seem cryptic at first, but once you understand the basics, they become invaluable.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. They're used to:
- Validate input (emails, phone numbers)
- Find and replace text
- Extract information from strings
- Parse log files and data
Here's a simple example that matches email-like patterns:
[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}
Looks intimidating? By the end of this article, you'll understand every character.
Basic Pattern Matching
Literal Characters
Most characters match themselves:
Pattern: hello
Matches: "hello" in "Say hello to regex"
The Dot (.) - Any Character
The dot matches any single character (except newline):
Pattern: h.t
Matches: "hat", "hot", "hit", "h9t", "h@t"
Character Classes []
Match any one character from a set:
Pattern: [aeiou]
Matches: any single vowel
Pattern: [0-9]
Matches: any single digit
Pattern: [a-zA-Z]
Matches: any single ASCII letter
Negated classes with ^:
Pattern: [^0-9]
Matches: anything that's NOT a digit
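To make these basics concrete, here is a minimal Python sketch using the standard re module. The sample strings are invented for illustration.

```python
import re

# Literal characters match themselves.
literal = re.search(r"hello", "Say hello to regex")

# The dot matches any single character except a newline,
# so "ht" (nothing in between) and "h<newline>t" are skipped.
dot_hits = re.findall(r"h.t", "hat hot h9t h@t ht h\nt")

# A character class matches exactly one character from the set.
vowels = re.findall(r"[aeiou]", "regex")

# A negated class matches one character that is NOT in the set.
non_digits = re.findall(r"[^0-9]", "a1b2")

print(literal.group())  # hello
print(dot_hits)         # ['hat', 'hot', 'h9t', 'h@t']
print(vowels)           # ['e', 'e']
print(non_digits)       # ['a', 'b']
```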
Shorthand Character Classes
| Pattern | Meaning | Equivalent |
|---|---|---|
| \d | Digit | [0-9] |
| \D | Non-digit | [^0-9] |
| \w | Word character | [a-zA-Z0-9_] |
| \W | Non-word character | [^a-zA-Z0-9_] |
| \s | Whitespace | [ \t\n\r\f\v] |
| \S | Non-whitespace | [^ \t\n\r\f\v] |
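A short Python sketch of the shorthand classes in action. The sample text is made up. Note one Python-specific caveat: for ordinary (str) patterns the shorthands are Unicode-aware by default, so the ASCII equivalents in the table are approximations unless you pass re.ASCII.

```python
import re

text = "Order 66 shipped\tto user_42"

digits = re.findall(r"\d", text)   # individual digits
words = re.findall(r"\w+", text)   # runs of word characters
gaps = re.findall(r"\s", text)     # three spaces and one tab

print(digits)  # ['6', '6', '4', '2']
print(words)   # ['Order', '66', 'shipped', 'to', 'user_42']
print(gaps)    # [' ', ' ', '\t', ' ']

# Python caveat: \w matches non-ASCII letters unless re.ASCII is set.
unicode_words = re.findall(r"\w+", "na\u00efve")
ascii_words = re.findall(r"\w+", "na\u00efve", re.ASCII)
print(unicode_words)  # ['naïve']
print(ascii_words)    # ['na', 've']
```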
Quantifiers (*, +, ?)
Quantifiers specify how many times a pattern should match:
* (Zero or More)
Pattern: ab*c
Matches: "ac", "abc", "abbc", "abbbc"...
+ (One or More)
Pattern: ab+c
Matches: "abc", "abbc", "abbbc"...
NOT: "ac" (needs at least one 'b')
? (Zero or One)
Pattern: colou?r
Matches: "color", "colour"
{n}, {n,}, {n,m} (Specific Counts)
Pattern: a{3}
Matches: exactly 3 a's → "aaa"
Pattern: a{2,}
Matches: 2 or more a's → "aa", "aaa", "aaaa"...
Pattern: a{2,4}
Matches: 2 to 4 a's → "aa", "aaa", "aaaa"
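The quantifier examples can be checked with a small Python sketch. The helper full_matches is a hypothetical name introduced here; it uses re.fullmatch so that each candidate must match from start to end.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"ab*c", ["ac", "abc", "abbc"])
plus = full_matches(r"ab+c", ["ac", "abc", "abbc"])
optional = full_matches(r"colou?r", ["color", "colour", "colouur"])
counted = full_matches(r"a{2,4}", ["a", "aa", "aaaa", "aaaaa"])

print(star)      # ['ac', 'abc', 'abbc']
print(plus)      # ['abc', 'abbc']
print(optional)  # ['color', 'colour']
print(counted)   # ['aa', 'aaaa']
```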
Anchors (^, $)
Anchors don't match characters; they match positions:
^ (Start of String)
Pattern: ^hello
Matches: "hello world"
NOT: "say hello"
$ (End of String)
Pattern: world$
Matches: "hello world"
NOT: "world domination"
Combined
Pattern: ^hello$
Matches: "hello" (exactly, nothing before or after)
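Here is a minimal Python sketch of the anchor examples. The helper name found is hypothetical. It also shows one Python-specific wrinkle: $ matches just before a trailing newline as well, so re.fullmatch is the stricter choice when the whole string must match.

```python
import re

def found(pattern, text):
    """True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None

print(found(r"^hello", "hello world"))       # True
print(found(r"^hello", "say hello"))         # False
print(found(r"world$", "hello world"))       # True
print(found(r"world$", "world domination"))  # False
print(found(r"^hello$", "hello"))            # True

# Python detail: $ also matches right before a final newline.
print(found(r"^hello$", "hello\n"))          # True
strict = re.fullmatch(r"hello", "hello\n")   # None: the newline is left over
print(strict)                                # None
```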
Groups and Capturing
Parentheses create groups:
Basic Grouping
Pattern: (ab)+
Matches: "ab", "abab", "ababab"...
Capturing Groups
Groups capture matched text for later use:
const regex = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2024-01-15".match(regex);
// match[1] = "2024" (year)
// match[2] = "01" (month)
// match[3] = "15" (day)
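For comparison, a rough Python equivalent of the same date pattern. It also sketches one common "later use" of captured text: referring back to groups in a replacement string. The sample sentence is invented.

```python
import re

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
sentence = "Released on 2024-01-15."

match = DATE.search(sentence)
year, month, day = match.groups()
print(year, month, day)  # 2024 01 15

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = DATE.sub(r"\3/\2/\1", sentence)
print(reordered)  # Released on 15/01/2024.
```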
Non-Capturing Groups (?:)
When you need grouping but don't need to capture:
Pattern: (?:https?://)?example\.com
Matches: "example.com", "http://example.com", "https://example.com"
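The difference between the two kinds of group shows up in what the match object stores. A small Python sketch, where the capturing variant is added only for contrast:

```python
import re

non_capturing = re.compile(r"(?:https?://)?example\.com")
capturing = re.compile(r"(https?://)?example\.com")

# Both patterns accept the same strings...
accepted = [
    s for s in ["example.com", "http://example.com", "https://example.com"]
    if non_capturing.fullmatch(s) and capturing.fullmatch(s)
]
print(accepted)  # all three strings

# ...but only the capturing version records the protocol.
print(non_capturing.fullmatch("https://example.com").groups())  # ()
print(capturing.fullmatch("https://example.com").groups())      # ('https://',)
print(capturing.fullmatch("example.com").groups())              # (None,)
```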
Common Patterns
Email Validation
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Breakdown:
- ^ : Start of string
- [a-zA-Z0-9._%+-]+ : Username (letters, digits, dots, etc.)
- @ : Literal @
- [a-zA-Z0-9.-]+ : Domain name
- \. : Literal dot (escaped)
- [a-zA-Z]{2,} : TLD (2+ letters)
- $ : End of string
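A minimal Python sketch that wraps this pattern in a helper. The function name looks_like_email and the sample addresses are assumptions for illustration. As the last case shows, the pattern checks shape only and still accepts questionable input such as consecutive dots.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value):
    """Shape check only; it does not prove the address exists."""
    # fullmatch also rejects a trailing newline, which $ alone would allow.
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))                 # True
print(looks_like_email("first.last+tag@mail.example.org"))  # True
print(looks_like_email("user@example"))                     # False (no TLD)
print(looks_like_email("user@example.c"))                   # False (TLD too short)
print(looks_like_email("@example.com"))                     # False (no username)
print(looks_like_email("a..b@example.com"))                 # True (pattern is lenient)
```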
Phone Numbers (US)
^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Matches: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567
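Because the three digit blocks are captured, the same pattern can normalize a number as well as validate it. A sketch in Python, with normalize_us_phone as a hypothetical helper name. Note that the optional parentheses are independent of each other, so an unbalanced "(555 123-4567" is accepted too.

```python
import re

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_us_phone(raw):
    """Return the number as NNN-NNN-NNNN, or None if it does not match."""
    match = PHONE_RE.fullmatch(raw)
    if match is None:
        return None
    return "-".join(match.groups())

print(normalize_us_phone("(555) 123-4567"))  # 555-123-4567
print(normalize_us_phone("555.123.4567"))    # 555-123-4567
print(normalize_us_phone("5551234567"))      # 555-123-4567
print(normalize_us_phone("555-1234"))        # None
print(normalize_us_phone("(555 123-4567"))   # 555-123-4567 (unbalanced paren accepted)
```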
URLs
https?://[^\s/$.?#].[^\s]*
This is only a basic pattern; URL validation is notoriously complex!
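A quick Python sketch of what this pattern does and does not handle. The sample sentence is invented. The second hit swallows the sentence's final period, which is one reason a regex is usually used only to locate candidate URLs, with a real URL parser (here the standard library's urllib.parse) taking over from there.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s/$.?#].[^\s]*")
text = "See https://example.com/docs and http://test.org."

candidates = URL_RE.findall(text)
print(candidates)  # ['https://example.com/docs', 'http://test.org.']

# Hand the candidate to a parser to split it into components.
parts = urlparse(candidates[0])
print(parts.scheme, parts.netloc, parts.path)  # https example.com /docs
```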
Regex Flags
Flags modify how the pattern is interpreted:
| Flag | Meaning |
|---|---|
| i | Case-insensitive |
| g | Global (find all matches) |
| m | Multiline (^ and $ match line boundaries) |
| s | Dotall (. matches newlines too) |
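Flag spellings vary by language. As a sketch, Python exposes i, m, and s as constants on the re module, and has no g flag: re.findall (or re.finditer) plays that role.

```python
import re

# i: case-insensitive
case_hit = re.search(r"hello", "HELLO", re.IGNORECASE) is not None
print(case_hit)  # True

# "global": findall returns every match
all_as = re.findall(r"a", "banana")
print(all_as)  # ['a', 'a', 'a']

# m: ^ and $ match at line boundaries
lines = "one\ntwo\nthree"
first_only = re.findall(r"^\w+", lines)
each_line = re.findall(r"^\w+", lines, re.MULTILINE)
print(first_only)  # ['one']
print(each_line)   # ['one', 'two', 'three']

# s: the dot matches newlines too
plain_dot = re.search(r"a.b", "a\nb") is not None
dotall = re.search(r"a.b", "a\nb", re.DOTALL) is not None
print(plain_dot, dotall)  # False True
```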
// Case-insensitive matching
/hello/i.test("HELLO") // true
// Find all matches
"banana".match(/a/g) // ["a", "a", "a"]
Testing and Debugging Regex
Build incrementally
Start simple and add complexity:
1. Start: example
2. Add domain: example\.com
3. Add protocol: https?://example\.com
4. Add optional www: https?://(www\.)?example\.com
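The four steps above can be replayed against a test URL to see why each refinement is needed. This is a Python sketch; the step labels and the sample URL are assumptions. Step 3 fails on a www address, which is exactly what step 4 fixes.

```python
import re

steps = [
    ("literal", r"example"),
    ("domain", r"example\.com"),
    ("protocol", r"https?://example\.com"),
    ("optional www", r"https?://(www\.)?example\.com"),
]

url = "https://www.example.com"
results = {name: re.search(pattern, url) is not None for name, pattern in steps}

for name, ok in results.items():
    print(f"{name}: {ok}")
# literal: True
# domain: True
# protocol: False   (the "www." sits between "://" and "example")
# optional www: True
```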
Use testing tools
Online testers (like our Regex Tester!) can:
- Highlight matches in real time
- Explain each part of your pattern
Test edge cases
- Empty strings
- Very long strings
- Special characters
- Unicode characters
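One lightweight way to cover these cases is a table of inputs and expected results. A Python sketch, where LOWER_RE is a hypothetical pattern under test (one or more lowercase ASCII letters):

```python
import re

# Hypothetical pattern under test: one or more lowercase ASCII letters.
LOWER_RE = re.compile(r"[a-z]+")

cases = {
    "": False,             # empty string: + needs at least one character
    "a" * 100_000: True,   # very long string
    "a.b": False,          # special character
    "caf\u00e9": False,    # Unicode letter outside a-z
}

results = {text: LOWER_RE.fullmatch(text) is not None for text in cases}
failures = [text for text, expected in cases.items() if results[text] != expected]
print(failures)  # []
```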
When NOT to Use Regex
Regex isn't always the answer:
Don't parse HTML with regex
// WRONG - HTML is not a regular language
/<div>(.*)<\/div>/
// RIGHT - Use an HTML parser
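To see the problem concretely, here is a Python sketch that runs the same regex on two sibling divs and then does the job with the standard library's html.parser. The DivText class is a hypothetical minimal collector, not a full HTML solution.

```python
import re
from html.parser import HTMLParser

html = "<div>first</div><div>second</div>"

# The greedy .* runs to the LAST closing tag and swallows both divs.
greedy = re.findall(r"<div>(.*)</div>", html)
print(greedy)  # ['first</div><div>second']

class DivText(HTMLParser):
    """Collect text that appears inside <div> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <div> elements are currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts.append(data)

parser = DivText()
parser.feed(html)
parser.close()
print(parser.texts)  # ['first', 'second']
```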
Don't validate complex formats
For JSON, XML, or other structured data, use a proper parser.
Keep it readable
If your regex is unreadable, consider:
- Breaking into multiple steps
- Using named groups
- Using a dedicated parsing library
- Adding comments (in languages that support the /x flag)
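Named groups and comments can be combined. In Python the /x behavior is spelled re.VERBOSE, and named groups use the (?P<name>...) syntax. A sketch based on the date pattern from earlier:

```python
import re

DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,  # ignore pattern whitespace and allow # comments
)

match = DATE_RE.fullmatch("2024-01-15")
print(match.group("year"))  # 2024
print(match.groupdict())    # {'year': '2024', 'month': '01', 'day': '15'}
```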
Regex in Different Languages
JavaScript
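// Literal syntax: /pattern/flags
const regex = /hello/i;
// Constructor: new RegExp("pattern", "flags")
const sameRegex = new RegExp("hello", "i");
// Methods
const string = "Say Hello to regex";
regex.test(string); // Returns a boolean: true
string.match(regex); // Returns the match (or null): ["Hello"]
string.replace(regex, "goodbye"); // Returns a new string: "Say goodbye to regex"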
Python
import re
# Methods
re.search(pattern, string) # Find first match
re.match(pattern, string) # Match at start
re.findall(pattern, string) # Find all matches
re.sub(pattern, replacement, string) # Replace
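The difference between these functions is easiest to see on one input. A short sketch with an invented sample string:

```python
import re

text = "cat bat rat"
pattern = r"[br]at"

first = re.search(pattern, text)      # scans the whole string
at_start = re.match(pattern, text)    # only tries position 0
every = re.findall(pattern, text)     # list of all matches
replaced = re.sub(pattern, "hat", text)

print(first.group())  # bat
print(at_start)       # None ("cat" does not match at the start)
print(every)          # ['bat', 'rat']
print(replaced)       # cat hat hat
```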
Common gotcha: Escaping
In strings, backslashes need escaping:
// These are equivalent:
/\d+/
new RegExp("\\d+")
Python raw strings help:
r"\d+" # Raw string, no double escaping needed
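A small Python sketch of the same gotcha, plus re.escape for the related case where literal text (such as a user-supplied domain) is inserted into a pattern:

```python
import re

# A raw string and a doubled backslash spell the same two characters.
same = "\\d+" == r"\d+"
print(same)  # True

# Without the r prefix, "\n" is one newline character; with it, two characters.
print(len("\n"), len(r"\n"))  # 1 2

# re.escape backslash-escapes metacharacters so the text matches literally.
literal_domain = re.escape("example.com")
exact = re.fullmatch(literal_domain, "example.com") is not None
loose = re.fullmatch(literal_domain, "exampleXcom") is not None
unescaped = re.fullmatch("example.com", "exampleXcom") is not None
print(exact, loose, unescaped)  # True False True
```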
Summary
Regular expressions are powerful but require practice:
- Literal characters match themselves
- . matches any character
- [ ] define character classes
- * + ? are quantifiers
- ^ $ are anchors
- ( ) create groups
Start simple, test frequently, and build complexity gradually. When regex becomes unreadable, consider alternative approaches.