Regular Expressions: A Practical Introduction

Regular expressions (regex) are powerful tools for pattern matching and text manipulation. They can seem cryptic at first, but once you understand the basics, they become invaluable.

What Are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. Regular expressions are used to:

Validate input (emails, phone numbers)
Find and replace text
Extract information from strings
Parse log files and data

Here's a simple example that matches email-like patterns:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

Looks intimidating? By the end of this article, you'll understand every character.

Basic Pattern Matching

Literal Characters

Most characters match themselves:

Pattern: hello
Matches: "hello" in "Say hello to regex"

The Dot (.) - Any Character

The dot matches any single character (except newline):

Pattern: h.t
Matches: "hat", "hot", "hit", "h9t", "h@t"

Character Classes []

Match any one character from a set:

Pattern: [aeiou]
Matches: any single lowercase vowel

Pattern: [0-9]
Matches: any single digit

Pattern: [a-zA-Z]
Matches: any single ASCII letter

Negated classes with ^:

Pattern: [^0-9]
Matches: any single character that's NOT a digit
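
Here is a small sketch of those classes in action, using Python's re module. The sample string is made up for illustration.

```python
import re

text = "Room 42, Floor 7"  # made-up sample string

vowels = re.findall(r"[aeiou]", text)       # each lowercase vowel, one at a time
digits = re.findall(r"[0-9]", text)         # each digit, one at a time
digits_only = re.sub(r"[^0-9]", "", text)   # delete every character that is NOT a digit

print(vowels)       # ['o', 'o', 'o', 'o']
print(digits)       # ['4', '2', '7']
print(digits_only)  # 427
```

Note that a class always matches exactly one character; repeating it is the job of quantifiers.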

Shorthand Character Classes

Pattern	Meaning	Equivalent
`\d`	Digit	`[0-9]`
`\D`	Non-digit	`[^0-9]`
`\w`	Word character	`[a-zA-Z0-9_]`
`\W`	Non-word character	`[^a-zA-Z0-9_]`
`\s`	Whitespace	`[ \t\n\r\f\v]`
`\S`	Non-whitespace	`[^ \t\n\r\f\v]`
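
To see the shorthands at work, here is a rough Python sketch on a made-up log line. One caveat: the "Equivalent" column is the ASCII view. In Python 3, `\w`, `\d`, and `\s` are Unicode-aware on text strings unless you pass re.ASCII, and other engines may differ, so check your engine's documentation.

```python
import re

line = "user_42 logged in\tat 09:15"  # made-up log line

numbers = re.findall(r"\d+", line)   # runs of digits
words = re.findall(r"\w+", line)     # runs of word characters
fields = re.split(r"\s+", line)      # split on any whitespace (space or tab here)

print(numbers)  # ['42', '09', '15']
print(words)    # ['user_42', 'logged', 'in', 'at', '09', '15']
print(fields)   # ['user_42', 'logged', 'in', 'at', '09:15']

# Python-specific behavior: \w is Unicode-aware unless re.ASCII is passed.
unicode_words = re.findall(r"\w+", "caf\u00e9")          # the accented letter counts as a word character
ascii_words = re.findall(r"\w+", "caf\u00e9", re.ASCII)  # restricted to [a-zA-Z0-9_]

print(unicode_words)  # ['café']
print(ascii_words)    # ['caf']
```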

Quantifiers (*, +, ?)

Quantifiers specify how many times a pattern should match:

* (Zero or More)

Pattern: ab*c
Matches: "ac", "abc", "abbc", "abbbc"...

+ (One or More)

Pattern: ab+c
Matches: "abc", "abbc", "abbbc"...
NOT: "ac" (needs at least one 'b')

? (Zero or One)

Pattern: colou?r
Matches: "color", "colour"

{n}, {n,}, {n,m} (Specific Counts)

Pattern: a{3}
Matches: exactly 3 a's: "aaa"

Pattern: a{2,}
Matches: 2 or more a's: "aa", "aaa", "aaaa"...

Pattern: a{2,4}
Matches: 2 to 4 a's: "aa", "aaa", "aaaa"
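
The quantifier rules can be checked with a short Python sketch. The helper name `whole_string_matches` is made up for this example; it wraps re.fullmatch so that a pattern only counts if it covers the entire string.

```python
import re

def whole_string_matches(pattern, text):
    """Return True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, text) is not None

star_ok = whole_string_matches(r"ab*c", "ac")              # zero b's is allowed
plus_ok = whole_string_matches(r"ab+c", "ac")              # at least one b is required
optional_ok = whole_string_matches(r"colou?r", "colour")   # the u may appear once
exact_ok = whole_string_matches(r"a{3}", "aaa")            # exactly three a's
too_many = whole_string_matches(r"a{2,4}", "aaaaa")        # five a's exceeds the maximum

print(star_ok, plus_ok, optional_ok, exact_ok, too_many)
# True False True True False
```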

Anchors (^, $)

Anchors don't match characters; they match positions:

^ (Start of String)

Pattern: ^hello
Matches: "hello world"
NOT: "say hello"

$ (End of String)

Pattern: world$
Matches: "hello world"
NOT: "world domination"

Combined

Pattern: ^hello$
Matches: "hello" (exactly, nothing before or after)
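
A quick Python sketch of the anchor examples above. It also shows one Python-specific quirk worth knowing: `$` matches just before a trailing newline too, so re.fullmatch is the stricter choice when the whole string must match.

```python
import re

starts = re.search(r"^hello", "hello world") is not None        # True
not_start = re.search(r"^hello", "say hello") is not None       # False
ends = re.search(r"world$", "hello world") is not None          # True
not_end = re.search(r"world$", "world domination") is not None  # False
exact = re.search(r"^hello$", "hello") is not None              # True

# Python quirk: $ also matches right before a newline at the very end.
lenient = re.search(r"^hello$", "hello\n") is not None   # True
strict = re.fullmatch(r"hello", "hello\n") is not None   # False

print(starts, not_start, ends, not_end, exact, lenient, strict)
```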

Groups and Capturing

Parentheses create groups:

Basic Grouping

Pattern: (ab)+
Matches: "ab", "abab", "ababab"...

Capturing Groups

Groups capture matched text for later use:

const regex = /(\d{4})-(\d{2})-(\d{2})/;
const match = "2024-01-15".match(regex);
// match[1] = "2024" (year)
// match[2] = "01" (month)
// match[3] = "15" (day)
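
For comparison, here is roughly the same idea in Python, plus one common "later use": referring back to the groups in a replacement string. The sample sentence is made up.

```python
import re

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

match = DATE_RE.search("Released on 2024-01-15")
year, month, day = match.groups()
print(year, month, day)  # 2024 01 15

# In a replacement string, \1, \2, \3 refer back to the captured groups.
reordered = DATE_RE.sub(r"\3/\2/\1", "2024-01-15")
print(reordered)  # 15/01/2024
```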

Non-Capturing Groups (?:)

When you need grouping but don't need to capture:

Pattern: (?:https?://)?example\.com
Matches: "example.com", "http://example.com", "https://example.com"
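
The difference shows up in what the match object stores. A minimal Python sketch comparing the two group styles on the same input:

```python
import re

url = "https://example.com"

capturing = re.fullmatch(r"(https?://)?example\.com", url)
non_capturing = re.fullmatch(r"(?:https?://)?example\.com", url)

# Both patterns match the same text...
print(capturing.group(0) == non_capturing.group(0))  # True

# ...but only the capturing version stores the protocol as a group.
print(capturing.groups())      # ('https://',)
print(non_capturing.groups())  # ()
```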

Common Patterns

Email Validation

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

Breakdown:

^ - Start
[a-zA-Z0-9._%+-]+ - Username (one or more letters, digits, or any of . _ % + -)
@ - Literal @
[a-zA-Z0-9.-]+ - Domain name
\. - Literal dot (escaped)
[a-zA-Z]{2,} - TLD (2+ letters)
$ - End
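
Here is one way the pattern might be wrapped in a helper, sketched in Python. The function name `looks_like_email` and the sample addresses are made up for this example. The last case shows that the pattern is a sanity check rather than full validation: it accepts a domain with two consecutive dots.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(text):
    """Return True if text has the rough shape of an email address."""
    # fullmatch also rejects a trailing newline, which $ alone would allow in Python
    return EMAIL_RE.fullmatch(text) is not None

print(looks_like_email("alice@example.com"))                # True
print(looks_like_email("bob.smith+news@mail.example.org"))  # True
print(looks_like_email("alice@example"))                    # False: no dot and TLD
print(looks_like_email("no-at-sign.example.com"))           # False: no @
print(looks_like_email("a@b..com"))                         # True: a false positive
```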

Phone Numbers (US)

^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$

Matches: (555) 123-4567, 555-123-4567, 555.123.4567, 5551234567
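
Because the three digit runs are capturing groups, the same pattern can also normalize input. A minimal Python sketch, where `normalize_phone` is a made-up helper name:

```python
import re

PHONE_RE = re.compile(r"^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$")

def normalize_phone(text):
    """Return the number as 555-123-4567, or None if it doesn't match."""
    match = PHONE_RE.match(text)
    if match is None:
        return None
    return "-".join(match.groups())  # area code, prefix, line number

for raw in ["(555) 123-4567", "555.123.4567", "5551234567", "555-1234"]:
    print(raw, "->", normalize_phone(raw))
# (555) 123-4567 -> 555-123-4567
# 555.123.4567 -> 555-123-4567
# 5551234567 -> 555-123-4567
# 555-1234 -> None
```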

URLs

https?://[^\s/$.?#].[^\s]*

This is only a basic pattern; URL validation is notoriously complex!
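
A short Python sketch shows both what the pattern finds and one of its limits. The sample sentence is made up. Notice that the second result keeps the sentence's final period, because the pattern accepts any non-whitespace character after the start of the host.

```python
import re

URL_RE = re.compile(r"https?://[^\s/$.?#].[^\s]*")

text = "Docs at https://example.com/guide and http://test.org."
urls = URL_RE.findall(text)

print(urls)  # ['https://example.com/guide', 'http://test.org.']
```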

Regex Flags

Flags modify how the pattern is interpreted:

Flag	Meaning
`i`	Case-insensitive
`g`	Global (find all matches)
`m`	Multiline (^ and $ match line boundaries)
`s`	Dotall (. matches newlines too)

// Case-insensitive matching
/hello/i.test("HELLO")  // true

// Find all matches
"banana".match(/a/g)    // ["a", "a", "a"]

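For comparison, here is a rough Python sketch of the same flags. Python's re module has no `g` flag; re.findall (or re.finditer) plays that role. The other three map to re.IGNORECASE, re.MULTILINE, and re.DOTALL.

```python
import re

case_insensitive = re.search(r"hello", "HELLO", re.IGNORECASE) is not None
all_matches = re.findall(r"a", "banana")

text = "one two\nthree four"
first_word_only = re.findall(r"^\w+", text)                     # ^ matches at string start only
first_word_per_line = re.findall(r"^\w+", text, re.MULTILINE)   # ^ matches at each line start

dot_stops_at_newline = re.search(r"a.b", "a\nb") is None
dot_crosses_newline = re.search(r"a.b", "a\nb", re.DOTALL) is not None

print(case_insensitive)       # True
print(all_matches)            # ['a', 'a', 'a']
print(first_word_only)        # ['one']
print(first_word_per_line)    # ['one', 'three']
print(dot_stops_at_newline)   # True
print(dot_crosses_newline)    # True
```
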
Testing and Debugging Regex

Build incrementally

Start simple and add complexity:

1. Start: example
2. Add domain: example\.com
3. Add protocol: https?://example\.com
4. Add optional www: https?://(www\.)?example\.com
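
One way to follow the four steps above is to run every intermediate pattern against the same small set of samples and watch the results change. A sketch in Python, with made-up sample strings:

```python
import re

steps = [
    r"example",
    r"example\.com",
    r"https?://example\.com",
    r"https?://(www\.)?example\.com",
]
samples = [
    "example.com",
    "http://example.com",
    "https://www.example.com",
    "https://examplexcom",  # should drop out once the dot is escaped
]

# For each step, collect the samples in which the pattern is found.
hits_per_step = [[s for s in samples if re.search(p, s)] for p in steps]

for pattern, hits in zip(steps, hits_per_step):
    print(f"{pattern!r:40} -> {hits}")
```

Step 3 loses the www sample because the protocol must now sit directly before "example"; step 4 brings it back.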

Use testing tools

Online testers (like our Regex Tester!) can:

Show matches highlighted in real time
Explain each part of your pattern

Test edge cases

Empty strings
Very long strings
Special characters
Unicode characters
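
These cases are easy to turn into a small table of checks. The sketch below assumes a hypothetical "digits only" pattern as the one under test. The Unicode case is the surprising one: in Python 3, `\d` on text strings accepts digits from other scripts unless re.ASCII is set.

```python
import re

PATTERN = r"\d+"  # hypothetical pattern under test: "digits only"

cases = {
    "empty string": "",
    "very long string": "7" * 100_000,
    "special characters": "12.5",
    "unicode digits": "\u0661\u0662\u0663",  # Arabic-Indic digits 1, 2, 3
}

results = {name: re.fullmatch(PATTERN, value) is not None
           for name, value in cases.items()}
print(results)
# {'empty string': False, 'very long string': True,
#  'special characters': False, 'unicode digits': True}

# Restricting \d to 0-9 changes the last answer.
ascii_only = re.fullmatch(PATTERN, cases["unicode digits"], re.ASCII) is not None
print(ascii_only)  # False
```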

When NOT to Use Regex

Regex isn't always the answer:

Don't parse HTML with regex

// WRONG - HTML is not a regular language
/<div>(.*)<\/div>/

// RIGHT - Use an HTML parser
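
To make the failure concrete, here is a sketch in Python comparing that regex with the standard library's html.parser. The class name `DivText` and the sample markup are made up for this example; a real project might prefer a fuller HTML library.

```python
import re
from html.parser import HTMLParser

html = "<div>first</div><div>second</div>"

# The greedy .* runs to the LAST </div>, swallowing the markup in between.
greedy = re.search(r"<div>(.*)</div>", html).group(1)
print(greedy)  # first</div><div>second

class DivText(HTMLParser):
    """Collect text that appears inside <div> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div> elements are currently open
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts.append(data)

parser = DivText()
parser.feed(html)
parser.close()
print(parser.texts)  # ['first', 'second']
```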

Don't validate complex formats

For JSON, XML, or other structured data, use a proper parser.

Keep it readable

If your regex is unreadable, consider:

Breaking into multiple steps
Using named groups
Using a dedicated parsing library
Adding comments (in languages that support the x "verbose" flag, such as Python, Perl, and PCRE)
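
As an illustration of two of these options, here is a date pattern written with named groups and Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows # comments. This is a sketch of the style, not the only way to lay it out.

```python
import re

DATE_RE = re.compile(r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
""", re.VERBOSE)

match = DATE_RE.fullmatch("2024-01-15")
print(match.group("year"))  # 2024
print(match.groupdict())    # {'year': '2024', 'month': '01', 'day': '15'}
```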

Regex in Different Languages

JavaScript

// Literal syntax: /pattern/flags
const literal = /\d+/g;

// Constructor: new RegExp("pattern", "flags")
const constructed = new RegExp("\\d+", "g");

// Methods
/\d+/.test("abc 123");           // Returns boolean: true
"abc 123 45".match(literal);     // Returns matches: ["123", "45"]
"abc 123".replace(/\d+/, "#");   // Returns new string: "abc #"

Python

import re

# Methods
re.search(pattern, string)   # Find first match
re.match(pattern, string)    # Match at start
re.findall(pattern, string)  # Find all matches
re.sub(pattern, replacement, string)  # Replace
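
A quick run of the four functions on one made-up string shows how they differ, especially re.search versus re.match:

```python
import re

text = "order 66, order 99"

first = re.search(r"\d+", text).group()   # first match anywhere in the string
at_start = re.match(r"\d+", text)         # None: the string starts with "order"
every = re.findall(r"\d+", text)          # all matches, as a list of strings
masked = re.sub(r"\d+", "#", text)        # replace every match

print(first)     # 66
print(at_start)  # None
print(every)     # ['66', '99']
print(masked)    # order #, order #
```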

Common gotcha: Escaping

Inside a string literal, the backslash is itself an escape character, so every backslash meant for the regex must be doubled:

// These are equivalent:
/\d+/
new RegExp("\\d+")

Python raw strings help:

r"\d+"  # Raw string, no double escaping needed
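
A small Python sketch of why this matters. The `\b` case is the classic trap: in a normal string it is a backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

# The doubled-backslash string and the raw string are the same three characters.
same = "\\d+" == r"\d+"
print(same, len(r"\d+"))  # True 3

# "\b" in a normal string is one character: backspace (0x08).
is_backspace = "\b" == "\x08"
print(is_backspace)  # True

text = "cat concat cat"
with_raw = re.findall(r"\bcat\b", text)    # word boundaries: whole words only
without_raw = re.findall("\bcat\b", text)  # looks for backspace + "cat" + backspace

print(with_raw)     # ['cat', 'cat']
print(without_raw)  # []
```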

Summary

Regular expressions are powerful but require practice:

Literal characters match themselves
. matches any single character (except newline)
[ ] define character classes
* + ? are quantifiers
^ $ are anchors
( ) create groups

Start simple, test frequently, and build complexity gradually. When regex becomes unreadable, consider alternative approaches.

Ready to test your patterns? Try our Regex Tester!