How to Use Regex to Match in PHP

Regular expressions (regex) match patterns in strings, and PHP supports them through its preg_* functions, which are built on the PCRE library. In this guide, we'll explore how to use regex to match in PHP, covering the basics, common edge cases, and performance tips.

Quick Example

Here's a minimal example that matches email addresses in a string:

$string = "Contact me at john.doe@example.com or jane.doe@example.com";
$pattern = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]);

This code will output:

Array
(
    [0] => john.doe@example.com
    [1] => jane.doe@example.com
)

Step-by-Step Breakdown

Let's break down the code:

  1. $string is the input string containing the text to search.
  2. $pattern is the regex pattern to match email addresses. We'll explain this in detail below.
  3. preg_match_all is the PHP function that performs the regex search. Here it takes three arguments: the pattern, the input string, and an array to store the matches. It returns the number of full matches found, or false on error.
  4. The print_r statement outputs the matched email addresses.
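
To make the return value and the shape of $matches more concrete, here is a minimal sketch. The capture groups and the PREG_SET_ORDER flag are additions for illustration; they are not part of the example above.

```php
<?php
// Illustrative sketch: capture groups around the local part and the domain.
$string  = 'Contact me at john.doe@example.com or jane.doe@example.com';
$pattern = '/\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b/';

// preg_match_all returns the number of full matches (or false on error).
$count = preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

// With PREG_SET_ORDER, each element of $matches describes one match:
// index 0 is the full match, 1 and 2 are the capture groups.
foreach ($matches as $match) {
    echo $match[1], ' at ', $match[2], PHP_EOL;
}
```

Without the flag, preg_match_all groups results by pattern part instead, which is why the quick example reads the full matches from $matches[0].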

The regex pattern /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/ breaks down as follows:

  • \b is a word boundary, ensuring we match whole email addresses, not parts of other words.
  • [A-Za-z0-9._%+-]+ matches the local part: one or more letters, digits, or the characters . _ % + and -.
  • @ matches the @ symbol literally.
  • [A-Za-z0-9.-]+ matches the domain name: one or more letters, digits, dots, or hyphens.
  • \. matches the dot (.) literally.
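  • [A-Za-z]{2,} matches the top-level domain, which must be at least 2 letters long. (Inside a character class, | is a literal pipe rather than alternation, so writing [A-Z|a-z] would also accept the | character.)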
  • \b is another word boundary.
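
The same breakdown can be kept next to the pattern itself. The sketch below uses the x (extended) modifier, which ignores whitespace in the pattern and treats # as the start of a comment. The sample sentence is made up for illustration.

```php
<?php
// The email pattern, written with the x modifier so each part
// can carry its own comment.
$pattern = '/
    \b                    # word boundary before the local part
    [A-Za-z0-9._%+-]+     # local part: letters, digits, and . _ % + -
    @                     # literal at sign
    [A-Za-z0-9.-]+        # domain name: letters, digits, dots, hyphens
    \.                    # literal dot before the top-level domain
    [A-Za-z]{2,}          # top-level domain, at least two letters
    \b                    # word boundary after the address
/x';

$count = preg_match_all($pattern, 'Write to john.doe@example.com today', $matches);
// $count is 1 and $matches[0][0] is 'john.doe@example.com'
```

Comments inside an x-mode pattern must not contain the delimiter character, here the forward slash.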

Handling Edge Cases

Here are some common edge cases to consider:

Empty/Null Input

$string = "";
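$pattern = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/";
$count = preg_match_all($pattern, $string, $matches);
var_dump($count);      // outputs: int(0)
var_dump($matches[0]); // outputs: array(0) { }

In this case, the input string is empty, so preg_match_all returns 0 and $matches[0] is an empty array. Note that $matches itself is not empty: it still contains one element, the empty list of full matches. Passing null as the subject is deprecated as of PHP 8.1, so check for null or convert it to a string first.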
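
One way to handle both empty and null input in a single place is a small wrapper. This is only a sketch: find_emails is a hypothetical helper name, not a PHP built-in, and returning an empty array on a regex error is an assumption you may want to change.

```php
<?php
// Hypothetical helper (not a PHP built-in): returns every address found,
// treating null and empty input as "no matches".
function find_emails(?string $text): array
{
    if ($text === null || $text === '') {
        return [];
    }
    $pattern = '/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/';
    // preg_match_all returns false on error; treat that as no matches here.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0];
}
```

Callers can then write find_emails($input) without checking the input first.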

Invalid Input

$string = " invalid email address ";
$pattern = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/";
$count = preg_match_all($pattern, $string, $matches);
var_dump($count);      // outputs: int(0)
var_dump($matches[0]); // outputs: array(0) { }

In this case, the input string contains no valid email address, so preg_match_all returns 0 and $matches[0] is an empty array.

Large Input

$string = str_repeat("example@example.com ", 1000);
$pattern = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/";
$count = preg_match_all($pattern, $string, $matches);
echo $count; // outputs: 1000

In this case, the input string is large, but preg_match_all still returns the expected matches.

Unicode/Special Characters

$string = "john.doe@example.com with accents éàü";
$pattern = "/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/";
preg_match_all($pattern, $string, $matches);
print_r($matches[0]); // outputs: Array ( [0] => john.doe@example.com )

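In this case, the accented characters sit outside the email address, so preg_match_all still returns the expected match. The pattern itself is ASCII-only: an address that contains non-ASCII letters is not matched in full.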
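
If you do need to match addresses that contain non-ASCII letters, one option is the u modifier together with Unicode property classes. The sketch below is an illustrative variant, not a complete model of internationalized addresses, and it assumes both the source file and the input are UTF-8 encoded.

```php
<?php
// Assumes this file and the input are UTF-8 encoded.
$string = 'Contact josé@example.com';

// ASCII-only pattern: "é" is not in the character class before the @.
$ascii = '/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/';

// Illustrative variant: the u modifier treats pattern and subject as UTF-8,
// and \p{L} / \p{N} match any Unicode letter / number.
$unicode = '/\b[\p{L}\p{N}._%+-]+@[\p{L}\p{N}.-]+\.\p{L}{2,}\b/u';

$asciiCount   = preg_match_all($ascii, $string, $asciiMatches);
$unicodeCount = preg_match_all($unicode, $string, $unicodeMatches);
// $asciiCount is 0; $unicodeCount is 1 with 'josé@example.com' as the match.
```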

Common Mistakes

Here are some common mistakes developers make when using regex in PHP:

1. Forgetting to escape special characters

$pattern = "/example.com/"; // incorrect: an unescaped '.' matches any character, so "exampleXcom" also matches
$pattern = "/example\.com/"; // correct: '\.' matches only a literal dot

2. Using preg_match instead of preg_match_all

preg_match($pattern, $string, $matches); // returns only the first match
preg_match_all($pattern, $string, $matches); // returns all matches
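
The two functions also fill $matches differently, which is easy to trip over when switching between them. A small sketch with a made-up input string:

```php
<?php
$string  = 'a1 b2 c3';
$pattern = '/[a-z]\d/';

$firstCount = preg_match($pattern, $string, $first);   // stops after the first match
$allCount   = preg_match_all($pattern, $string, $all); // collects every match

// $first[0] is a string:  'a1'
// $all[0] is an array:    ['a1', 'b2', 'c3']
```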

3. Not checking for errors

$result = preg_match_all($pattern, $string, $matches);
if ($result === false) {
    // handle error; preg_last_error() returns the error code
}
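
To see the error path in action, the sketch below forces a failure on purpose: the subject is an invalid UTF-8 byte, which cannot be matched with the u modifier. The input is contrived for illustration.

```php
<?php
// "\xFF" is not valid UTF-8, so matching it with the u modifier fails.
$result = preg_match_all('/./u', "\xFF", $matches);

if ($result === false) {
    // preg_last_error() returns one of the PREG_*_ERROR constants.
    $code = preg_last_error();
    echo 'Regex error code: ', $code, PHP_EOL;
} else {
    echo 'Found ', $result, ' match(es)', PHP_EOL;
}
```

On PHP 8.0 and later, preg_last_error_msg() returns a readable message for the same error.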

Performance Tips

Here are some practical performance tips for using regex in PHP:

1. Use preg_match_all instead of preg_match with a loop

// slow
while (preg_match($pattern, $string, $match)) {
    $matches[] = $match[0];
    $string = substr($string, strpos($string, $match[0]) + strlen($match[0]));
}
// fast
preg_match_all($pattern, $string, $matches);

2. Use preg_quote to escape special characters

$string = "example.com";
$pattern = "/".preg_quote($string, "/")."/";
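
The difference shows up as soon as the text contains a metacharacter. In this sketch, $term stands in for a hypothetical user-supplied search term:

```php
<?php
// Hypothetical user-supplied search term containing a regex metacharacter.
$term = 'example.com';

$unquoted = '/' . $term . '/';                  // '.' matches any character
$quoted   = '/' . preg_quote($term, '/') . '/'; // '.' becomes '\.'

$text = 'visit exampleXcom';
$a = preg_match($unquoted, $text); // 1: the unescaped dot matches "X"
$b = preg_match($quoted, $text);   // 0: only a literal dot is accepted
```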

3. Avoid using .* in your patterns

$pattern = "/.*example\.com/"; // slow: the leading .* only adds backtracking
$pattern = "/example\.com/"; // fast
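
When .* sits in the middle of a pattern, a negated character class is a common replacement. The sketch below uses a different, made-up task (extracting quoted strings) to show the idea. Besides doing less backtracking, the tighter pattern also returns the matches most people expect.

```php
<?php
$string = 'say "one" and "two"';

// Greedy .* runs to the end of the line, then backtracks to the last quote.
preg_match_all('/".*"/', $string, $greedy);

// A negated class stops at the first closing quote.
preg_match_all('/"[^"]*"/', $string, $tight);

// $greedy[0] is ['"one" and "two"']
// $tight[0]  is ['"one"', '"two"']
```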

FAQ

Q: What is the difference between preg_match and preg_match_all?

A: preg_match stops after the first match and returns 1, 0, or false on error. preg_match_all finds every match and returns how many it found, or false on error.

Q: How do I escape special characters in my regex pattern?

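A: Use preg_quote on any literal text you insert into a pattern, and pass your delimiter as the second argument so it is escaped too. For characters you type directly into the pattern, prefix them with a backslash.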

Q: What is the performance impact of using .* in my regex pattern?

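A: A greedy .* first consumes as much of the line as it can and then backtracks, which can slow matching noticeably on long inputs. A leading .* in an unanchored pattern rarely adds anything, so drop it or use a more specific character class.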

Q: Can I use regex to validate email addresses?

A: Yes, but be aware that a regex like the one in this guide only approximates the address syntax: it will reject some valid addresses and accept some invalid ones.
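
For checking a single address, PHP's built-in filter_var with FILTER_VALIDATE_EMAIL is often a simpler starting point than a hand-written pattern. A minimal sketch with made-up inputs:

```php
<?php
// filter_var returns the value itself when it is valid, or false otherwise.
$ok  = filter_var('john.doe@example.com', FILTER_VALIDATE_EMAIL);
$bad = filter_var('not an email', FILTER_VALIDATE_EMAIL);

// $ok is 'john.doe@example.com'; $bad is false.
```

This validates one string at a time. To find addresses inside longer text, you still need preg_match_all.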

Q: How do I handle errors when using preg_match_all?

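A: Compare the return value of preg_match_all with false using ===, since a return value of 0 just means no matches. After a failure, preg_last_error() tells you what went wrong.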
