PHP Regular Expressions
Introduction
Regular expressions (regex) are powerful tools for pattern matching and text manipulation. In PHP, regular expressions allow you to search, extract, and modify strings based on complex patterns. Whether you need to validate email addresses, parse data, or find specific text patterns, regular expressions provide a concise and flexible solution.
This guide will walk you through PHP's regular expression functionality, starting with the basics and progressing to more advanced techniques.
What are Regular Expressions?
Regular expressions are special text strings used to describe search patterns. Think of them as a mini-language for defining patterns within text. PHP implements regular expressions through the PCRE (Perl Compatible Regular Expressions) library, giving you access to powerful pattern matching capabilities.
Basic PHP Regex Functions
PHP offers several functions for working with regular expressions:
Function | Purpose |
---|---|
preg_match() | Checks whether a pattern matches a string; returns 1 on a match, 0 on no match, and false on error |
preg_match_all() | Finds all matches of a pattern in a string |
preg_replace() | Replaces text that matches the pattern |
preg_split() | Splits a string using a pattern as a delimiter |
preg_grep() | Returns array elements that match the pattern |
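The examples later in this guide focus on preg_match(), preg_match_all(), and preg_replace(). As a minimal sketch of the remaining two functions in the table, the snippet below splits a string with preg_split() and filters an array with preg_grep(). The sample data is made up for illustration.

```php
<?php
// Split on commas or semicolons, tolerating optional surrounding whitespace.
$csv = "apple, banana;cherry ,  date";
$parts = preg_split('/\s*[,;]\s*/', $csv);
print_r($parts);
// Array ( [0] => apple [1] => banana [2] => cherry [3] => date )

// Keep only the array elements that consist entirely of digits.
$values = ["42", "abc", "2023", "7up"];
$numeric = preg_grep('/^\d+$/', $values);
print_r($numeric);
// Array ( [0] => 42 [2] => 2023 )  (original keys are preserved)
```

Note that preg_grep() keeps the original array keys, so wrap the result in array_values() if you need a re-indexed list.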
Creating Your First Regex Pattern
In PHP, regular expression patterns are enclosed between delimiters, commonly forward slashes (/). Let's start with a simple example:
<?php
// Check if a string contains the word "PHP"
$string = "Learn PHP programming";
$pattern = "/PHP/";
if (preg_match($pattern, $string)) {
echo "Pattern found!";
} else {
echo "Pattern not found!";
}
// Output: Pattern found!
?>
In this example:
- /PHP/ is our regular expression pattern
- The pattern searches for the exact text "PHP"
- preg_match() returns 1 (which PHP treats as true in the if condition) because "PHP" exists in our string
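Forward slashes are only the most common delimiter. PCRE in PHP also accepts other non-alphanumeric characters, such as #, which can be handy when the pattern itself contains slashes. The sketch below compares both styles; the URL is a hypothetical value chosen for illustration.

```php
<?php
// With "/" as the delimiter, every slash inside the pattern must be escaped.
$withSlashes = '/^https:\/\/example\.com\/docs\//';

// With "#" as the delimiter, the slashes can be written as-is.
$withHashes = '#^https://example\.com/docs/#';

$url = "https://example.com/docs/regex";   // hypothetical URL for illustration

var_dump(preg_match($withSlashes, $url));                  // int(1)
var_dump(preg_match($withHashes, $url));                   // int(1)
var_dump(preg_match($withHashes, "http://example.org/"));  // int(0)
```

Both patterns describe the same text; the second is simply easier to read.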
Basic Pattern Matching
Matching Exact Text
The simplest form of a regular expression is matching exact text:
<?php
$text = "The quick brown fox jumps over the lazy dog";
$pattern = "/quick/";
preg_match($pattern, $text, $matches);
print_r($matches);
// Output: Array ( [0] => quick )
?>
Character Classes
Character classes let you match any one character from a set:
<?php
$text = "The year is 2023";
$pattern = "/[0-9]/"; // Matches any single digit
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
// Output: Array ( [0] => 2 [1] => 0 [2] => 2 [3] => 3 )
?>
Common character classes include:
- [0-9] or \d: Any digit
- [a-z]: Any lowercase letter
- [A-Z]: Any uppercase letter
- [a-zA-Z]: Any letter
- [a-zA-Z0-9_] or \w: Any alphanumeric character or underscore
- \s: Any whitespace character (space, tab, newline)
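To see a few of these classes side by side, here is a small sketch that runs them against one sample string (made up for illustration). It also shows a negated class, written with a leading ^ inside the brackets, which matches any character not in the set.

```php
<?php
$text = "Order #A17 shipped to Berlin on 2023-04-15";

// \d matches a single digit; collect every digit in the text.
preg_match_all('/\d/', $text, $digits);
echo implode('', $digits[0]), "\n";      // 1720230415

// [A-Z] matches a single uppercase letter.
preg_match_all('/[A-Z]/', $text, $upper);
echo implode('', $upper[0]), "\n";       // OAB

// A leading ^ inside the brackets negates the class:
// [^a-zA-Z0-9\s] matches anything that is not a letter, digit, or whitespace.
preg_match_all('/[^a-zA-Z0-9\s]/', $text, $other);
echo implode('', $other[0]), "\n";       // #--
```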
Quantifiers
Quantifiers specify how many times an element should occur:
<?php
$text = "PHP PHP PHP";
$pattern = "/PHP+/"; // Matches "PH" followed by one or more "P" (the + applies only to the element just before it)
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
// Output: Array ( [0] => PHP [1] => PHP [2] => PHP )
?>
Common quantifiers:
- *: Zero or more times
- +: One or more times
- ?: Zero or one time
- {n}: Exactly n times
- {n,}: At least n times
- {n,m}: Between n and m times
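The quantifiers listed above can be tried out with a short sketch like the following. The sample strings are made up for illustration. The last case uses a non-capturing group, (?:...), to repeat a whole sequence rather than a single character.

```php
<?php
// ? makes the preceding element optional: matches "color" and "colour".
var_dump(preg_match('/^colou?r$/', 'color'));   // int(1)
var_dump(preg_match('/^colou?r$/', 'colour'));  // int(1)

// {n} requires an exact count: exactly five digits.
var_dump(preg_match('/^\d{5}$/', '12345'));     // int(1)
var_dump(preg_match('/^\d{5}$/', '1234'));      // int(0)

// {n,m} accepts a range: runs of two to three digits.
preg_match_all('/\d{2,3}/', '7 42 365 2023', $m);
print_r($m[0]);
// Array ( [0] => 42 [1] => 365 [2] => 202 )

// A quantifier applies only to the element just before it;
// wrap a sequence in a group to repeat it as a whole.
preg_match('/(?:ab)+/', 'ababab!', $grouped);
echo $grouped[0], "\n";   // ababab
```

Quantifiers are greedy by default, which is why \d{2,3} takes "202" from "2023" and leaves the lone "3" unmatched.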
Anchors and Boundaries
Anchors help you match patterns at specific positions in the text:
<?php
$text = "PHP is great";
$pattern1 = "/^PHP/"; // Matches "PHP" at the beginning of the string
$pattern2 = "/great$/"; // Matches "great" at the end of the string
echo preg_match($pattern1, $text) ? "Pattern 1 matched\n" : "Pattern 1 not matched\n";
echo preg_match($pattern2, $text) ? "Pattern 2 matched\n" : "Pattern 2 not matched\n";
// Output:
// Pattern 1 matched
// Pattern 2 matched
?>
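Common anchors and boundaries:
- ^: Beginning of the string (or of each line with the m modifier)
- $: End of the string, or just before a final newline (end of each line with the m modifier)
- \b: Word boundary
- \B: Not a word boundary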
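The anchor example above covers ^ and $. As a small sketch of \b and \B, the snippet below counts how often "cat" appears in a made-up sentence, with and without boundaries.

```php
<?php
$text = "The cat sat on the catalog; concatenate!";

// Without boundaries, "cat" is found inside longer words too.
echo preg_match_all('/cat/', $text), "\n";       // 3

// \b on both sides restricts the match to the standalone word.
echo preg_match_all('/\bcat\b/', $text), "\n";   // 1

// \B requires a position that is NOT a word boundary, so this finds
// "cat" only when it is preceded by another word character.
echo preg_match_all('/\Bcat/', $text), "\n";     // 1
```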
Practical Example: Email Validation
Let's create a simple email validator using regular expressions:
<?php
function validateEmail($email) {
$pattern = "/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/";
return preg_match($pattern, $email) === 1;
}
$emails = [
"[email protected]",
"invalid-email",
"[email protected]",
"[email protected]"
];
foreach ($emails as $email) {
echo "$email: " . (validateEmail($email) ? "Valid" : "Invalid") . "\n";
}
// Output:
// [email protected]: Valid
// invalid-email: Invalid
// [email protected]: Valid
// [email protected]: Invalid
?>
Let's break down this email pattern:
- ^[a-zA-Z0-9._%+-]+: Username can contain letters, numbers, and certain special characters
- @: The @ symbol
- [a-zA-Z0-9.-]+: Domain name can contain letters, numbers, dots, and hyphens
- \.: A literal dot (escaped with a backslash)
- [a-zA-Z]{2,}$: Top-level domain must be at least 2 letters
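A hand-written pattern like this is a reasonable learning exercise, but PHP also ships a built-in validator that may be preferable in practice. The sketch below wraps filter_var() with FILTER_VALIDATE_EMAIL; the helper name isValidEmail and the sample addresses are assumptions made for illustration.

```php
<?php
// Hypothetical helper name; filter_var() and FILTER_VALIDATE_EMAIL are PHP built-ins.
function isValidEmail(string $email): bool
{
    // filter_var() returns the filtered string on success and false on failure.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Sample addresses made up for illustration.
var_dump(isValidEmail('alice@example.com'));   // bool(true)
var_dump(isValidEmail('invalid-email'));       // bool(false)
```

Neither approach proves that a mailbox exists; both only check that the text has the shape of an address.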
Finding and Replacing Text
The preg_replace() function allows you to replace text that matches a pattern:
<?php
$text = "Contact us at [email protected] or [email protected]";
$pattern = "/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/";
$replacement = "<a href=\"mailto:$1\">$1</a>";
$result = preg_replace($pattern, $replacement, $text);
echo $result;
// Output: Contact us at <a href="mailto:[email protected]">[email protected]</a> or <a href="mailto:[email protected]">[email protected]</a>
?>
In this example, we're finding email addresses and wrapping them in HTML anchor tags for easy clicking.
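When the replacement needs logic rather than a fixed template, preg_replace_callback() passes each match to a function and uses its return value as the replacement. The following is a minimal sketch that masks the user part of each address; the sample text and addresses are made up for illustration.

```php
<?php
// Sample text and addresses are made up for illustration.
$text = "Write to alice@example.com or bob@example.org";
$pattern = '/([a-zA-Z0-9._%+-]+)@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})/';

// The callback receives the match array: [0] full match, [1] user, [2] domain.
$masked = preg_replace_callback($pattern, function (array $m): string {
    // Keep the first character of the user part and hide the rest.
    return $m[1][0] . str_repeat('*', strlen($m[1]) - 1) . '@' . $m[2];
}, $text);

echo $masked, "\n";
// Write to a****@example.com or b**@example.org
```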
Capturing Groups
Parentheses () create capturing groups that allow you to extract specific parts of the matched pattern:
<?php
$date = "2023-04-15";
$pattern = "/(\d{4})-(\d{2})-(\d{2})/";
preg_match($pattern, $date, $matches);
print_r($matches);
// Output:
// Array (
// [0] => 2023-04-15
// [1] => 2023
// [2] => 04
// [3] => 15
// )
echo "Year: {$matches[1]}, Month: {$matches[2]}, Day: {$matches[3]}";
// Output: Year: 2023, Month: 04, Day: 15
?>
In this example:
- The entire match is stored in $matches[0]
- The first group (year) is in $matches[1]
- The second group (month) is in $matches[2]
- The third group (day) is in $matches[3]
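The same groups work with preg_match_all() when a string contains several matches. As a sketch, the PREG_SET_ORDER flag arranges the results so that each entry has the same layout as the $matches array described above. The sample string is made up for illustration.

```php
<?php
$log = "Created 2023-04-15, updated 2023-11-02";
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';

// PREG_SET_ORDER groups the results per match, so each $m looks like
// the $matches array that preg_match() would produce.
preg_match_all($pattern, $log, $all, PREG_SET_ORDER);

foreach ($all as $m) {
    echo "Full: {$m[0]}, year: {$m[1]}, month: {$m[2]}, day: {$m[3]}\n";
}
// Full: 2023-04-15, year: 2023, month: 04, day: 15
// Full: 2023-11-02, year: 2023, month: 11, day: 02
```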
Named Capturing Groups
For better readability, you can name your capturing groups:
<?php
$date = "2023-04-15";
$pattern = "/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/";
preg_match($pattern, $date, $matches);
print_r($matches);
// Output includes named keys:
// Array (
// [0] => 2023-04-15
// [year] => 2023
// [1] => 2023
// [month] => 04
// [2] => 04
// [day] => 15
// [3] => 15
// )
echo "Year: {$matches['year']}, Month: {$matches['month']}, Day: {$matches['day']}";
// Output: Year: 2023, Month: 04, Day: 15
?>
Pattern Modifiers
PHP regular expressions support modifiers that change how the pattern is interpreted:
<?php
$text = "PHP is a programming language.\nphp is used for web development.";
// Case-insensitive search
$pattern = "/php/i";
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
// Output: Array ( [0] => PHP [1] => php )
// Multi-line search
$text = "First line\nSecond line\nThird line";
$pattern = "/^Second/m"; // ^ normally matches start of string, but with m modifier, it matches start of line
preg_match($pattern, $text, $matches);
print_r($matches);
// Output: Array ( [0] => Second )
?>
Common modifiers:
- i: Case-insensitive
- m: Multi-line mode (^ and $ match beginning/end of each line)
- s: Dot matches all (including newlines)
- x: Extended mode (allows comments and whitespace in the pattern)
- u: Unicode mode (pattern and subject are treated as UTF-8)
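The example above demonstrates i and m. The sketch below illustrates the other three modifiers from the list, using made-up sample strings. The "\u{E9}" escape (PHP 7 and later) produces the UTF-8 encoding of "é" regardless of how the source file is saved.

```php
<?php
// s: without it the dot stops at a newline; with it the dot matches across lines.
$block = "<b>first\nsecond</b>";
var_dump(preg_match('/<b>.*<\/b>/', $block));    // int(0)
var_dump(preg_match('/<b>.*<\/b>/s', $block));   // int(1)

// x: whitespace and #-comments inside the pattern are ignored.
$datePattern = '/
    (\d{4})  # year
    -
    (\d{2})  # month
    -
    (\d{2})  # day
/x';
var_dump(preg_match($datePattern, '2023-04-15'));  // int(1)

// u: pattern and subject are treated as UTF-8, so the dot matches
// a whole character rather than a single byte.
$e = "\u{E9}";                        // "é", two bytes in UTF-8
preg_match_all('/./', $e, $bytes);
preg_match_all('/./u', $e, $chars);
echo count($bytes[0]), ' ', count($chars[0]), "\n";   // 2 1
```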
Real-World Example: Parsing HTML
Here's how you might extract all images from an HTML string:
<?php
$html = '<div class="content">
<img src="image1.jpg" alt="Image 1" />
<p>Some text</p>
<img src="image2.jpg" alt="Image 2" />
</div>';
$pattern = '/<img\s+[^>]*src=["\']([^"\']*)["\'][^>]*>/i';
preg_match_all($pattern, $html, $matches);
print_r($matches[1]); // This will contain all image sources
// Output: Array ( [0] => image1.jpg [1] => image2.jpg )
?>
Note: While regex can parse HTML in simple cases, it's generally better to use a DOM parser for complex HTML.
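For comparison, here is a sketch of the same extraction using PHP's DOMDocument class. It assumes the DOM extension is available, which is the case in most PHP installations.

```php
<?php
$html = '<div class="content">
    <img src="image1.jpg" alt="Image 1" />
    <p>Some text</p>
    <img src="image2.jpg" alt="Image 2" />
</div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Read the src attribute of every <img> element in document order.
$sources = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}
print_r($sources);
// Array ( [0] => image1.jpg [1] => image2.jpg )
```

Unlike the regex, this approach does not depend on attribute order, quoting style, or line breaks inside the tag.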
Common Regex Patterns for PHP
Here are some useful patterns for common PHP tasks:
URL Validation
$pattern = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)\/?$/';
Strong Password Validation
$pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/';
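Each (?=...) in this pattern is a lookahead: it checks that a lowercase letter, an uppercase letter, a digit, and one of the listed special characters appear somewhere in the string, without consuming any text. The final character class then requires at least 8 characters from the allowed set. A quick sketch with made-up sample passwords shows the effect:

```php
<?php
$pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/';

// Sample passwords, made up for illustration.
$candidates = ['Passw0rd!', 'password1!', 'Passw0rd', 'Pa1!', 'Passw0rd#'];

$results = [];
foreach ($candidates as $password) {
    $results[$password] = preg_match($pattern, $password) === 1;
    echo $password, ': ', $results[$password] ? 'accepted' : 'rejected', "\n";
}
// Passw0rd!: accepted
// password1!: rejected   (no uppercase letter)
// Passw0rd: rejected     (no special character)
// Pa1!: rejected         (shorter than 8 characters)
// Passw0rd#: rejected    (# is not in the allowed set)
```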
Date Validation (YYYY-MM-DD)
$pattern = '/^([0-9]{4})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$/';
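A pattern like this checks the format of the string, not whether the date exists in the calendar (February 30 has a valid shape, for example). One possible approach is to combine a shape check with PHP's built-in checkdate(). The helper name isValidIsoDate below is an assumption made for illustration.

```php
<?php
// Hypothetical helper name; checkdate() is a PHP built-in.
function isValidIsoDate(string $date): bool
{
    // The pattern checks the shape only: four digits, two digits, two digits.
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $date, $m) !== 1) {
        return false;
    }
    // checkdate(month, day, year) verifies that the date exists in the calendar.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}

var_dump(isValidIsoDate('2023-04-15'));  // bool(true)
var_dump(isValidIsoDate('2023-02-30'));  // bool(false)
var_dump(isValidIsoDate('2024-02-29'));  // bool(true), 2024 is a leap year
var_dump(isValidIsoDate('15/04/2023'));  // bool(false)
```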
Performance Considerations
Regular expressions are powerful but can become complex and impact performance. Here are some tips:
- Be specific: The more specific your pattern, the faster it will run
- Avoid excessive backtracking with careful use of quantifiers
- Use alternative functions when appropriate (e.g., strpos() for simple string searches)
- Test performance-critical regex on large datasets
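As a sketch of the third tip, the fixed-text check from the first example in this guide can be written without a regular expression at all:

```php
<?php
$haystack = "Learn PHP programming";

// Fixed-text search: strpos() returns the offset or false, so compare with
// !== false (offset 0 is a valid match position but is falsy).
$foundPlain = strpos($haystack, "PHP") !== false;

// The same check written as a regular expression.
$foundRegex = preg_match('/PHP/', $haystack) === 1;

var_dump($foundPlain, $foundRegex);   // bool(true) bool(true)

// Case-insensitive fixed-text search: stripos() instead of the i modifier.
var_dump(stripos($haystack, "php") !== false);   // bool(true)
```

Both versions give the same answer here; the string functions simply avoid compiling and running a pattern when no pattern features are needed.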
Summary
Regular expressions in PHP provide a powerful way to handle string manipulations and pattern matching. They can seem intimidating at first, but with practice, they become an essential tool for tasks like:
- Form validation
- Data extraction
- Search and replace operations
- Text parsing and formatting
Remember that while regular expressions are powerful, they aren't always the best solution. For simple string operations, PHP's built-in string functions are often more efficient and readable.
Practice Exercises
- Create a regex pattern to validate a phone number in the format (XXX) XXX-XXXX
- Write a PHP function that extracts all URLs from a given text
- Create a function that censors profanity in a string by replacing certain words with asterisks
- Write a pattern to validate usernames (alphanumeric, 3-16 characters, may include underscore)
- Create a function to extract hashtags from a social media post
Additional Resources
- PHP.net Official Documentation on PCRE
- Regular Expressions 101 - An online tool for testing regex with PHP flavor
- RegexOne - Interactive regex tutorial
- PHP The Right Way: Regular Expressions
Happy coding with PHP regular expressions!