How to Use Regex to Match in Java

Regular expressions (regex) are a powerful tool for matching patterns in strings, and Java provides a robust API for working with regex. In this guide, we'll cover the basics of using regex to match in Java, including a quick example, a step-by-step breakdown, and tips for handling edge cases, common mistakes, and performance optimization.

Quick Example

Here's a minimal example that demonstrates how to use regex to match a pattern in a string:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample {
    public static void main(String[] args) {
        String input = "Hello, my email is john.doe@example.com";
        String pattern = "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b";
        Pattern regex = Pattern.compile(pattern);
        Matcher matcher = regex.matcher(input);

        if (matcher.find()) {
            System.out.println("Email found: " + matcher.group());
        } else {
            System.out.println("No email found");
        }
    }
}

This code uses the Pattern and Matcher classes to compile a regex pattern and search for a match in the input string.

Step-by-Step Breakdown

Let's walk through the code line by line:

  • import java.util.regex.Matcher; and import java.util.regex.Pattern;: We import the Matcher and Pattern classes, which are part of Java's regex API.
  • String input = "Hello, my email is john.doe@example.com";: We define the input string that we want to search.
  • String pattern = "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b";: We define the regex pattern as a string. This pattern matches most common email address formats.
  • Pattern regex = Pattern.compile(pattern);: We compile the regex pattern into a Pattern object using the compile() method.
  • Matcher matcher = regex.matcher(input);: We create a Matcher object by passing the input string to the matcher() method of the Pattern object.
  • if (matcher.find()) {...}: We use the find() method to search for a match in the input string. If a match is found, we execute the code inside the if statement.
  • System.out.println("Email found: " + matcher.group());: If a match is found, we print the matched text to the console using the group() method.
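
The quick example stops at the first match. As a minimal sketch of the same steps applied to every match, the loop below calls find() repeatedly, since each call resumes after the previous match. The class name FindAllEmails and the helper findAll are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllEmails {
    // Same pattern as the quick example, compiled once and shared.
    private static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    // Hypothetical helper: returns every match in order of appearance.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(input);
        while (matcher.find()) { // each call resumes after the previous match
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [john.doe@example.com, jane@example.org]
        System.out.println(findAll("Write to john.doe@example.com or jane@example.org."));
    }
}
```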

Handling Edge Cases

Here are some common edge cases to consider:

Empty/Null Input

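If the input string is null, the matcher() method throws a NullPointerException. An empty string is safe to pass; it simply produces no match for this pattern. A null check guards against the exception, and the isEmpty() check just skips a search that cannot succeed: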

if (input != null && !input.isEmpty()) {
    Matcher matcher = regex.matcher(input);
    // ...
}
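
One way to keep this check out of the calling code is to wrap the search in a helper that returns an Optional. This is only a sketch: SafeEmailFinder and firstEmail are hypothetical names, and returning Optional.empty() for null input is a design assumption (some codebases prefer to fail fast instead).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeEmailFinder {
    private static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    // Hypothetical helper: null or empty input yields Optional.empty()
    // instead of a NullPointerException or a pointless search.
    static Optional<String> firstEmail(String input) {
        if (input == null || input.isEmpty()) {
            return Optional.empty();
        }
        Matcher matcher = EMAIL.matcher(input);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstEmail(null));                            // Optional.empty
        System.out.println(firstEmail(""));                              // Optional.empty
        System.out.println(firstEmail("Contact: john.doe@example.com")); // Optional[john.doe@example.com]
    }
}
```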

Invalid Input

find() succeeds as soon as any part of the input matches, so it accepts input that has other text around the address. To check that the entire input is an email address and nothing else, call matches() on the Matcher instead. This reuses the compiled Pattern, whereas String.matches() recompiles the pattern on every call:

if (regex.matcher(input).matches()) {
    System.out.println("Input matches the pattern");
}
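
To make the difference concrete, here is a small sketch that runs the same compiled pattern through both find() and matches(). The class and method names (FindVersusMatches, containsEmail, isEmail) are hypothetical.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    private static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    // True if the pattern matches somewhere inside the input.
    static boolean containsEmail(String input) {
        return EMAIL.matcher(input).find();
    }

    // True only if the whole input, start to end, matches the pattern.
    static boolean isEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String sentence = "Hello, my email is john.doe@example.com";
        System.out.println(containsEmail(sentence));          // true
        System.out.println(isEmail(sentence));                // false
        System.out.println(isEmail("john.doe@example.com"));  // true
    }
}
```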

Large Input

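If the input string is very large, scanning all of it may take a long time. When we know the match can only occur in part of the input, we can use the region() method to limit the search to that part. The region bounds must lie within the input; otherwise region() throws an IndexOutOfBoundsException, so we clamp the end index:

matcher.region(0, Math.min(1000, input.length())); // Search at most the first 1000 characters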
if (matcher.find()) {
    System.out.println("Match found in the first 1000 characters");
}
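
As a sketch of how a region changes the result, the hypothetical helper below (foundInPrefix, in a class named RegionSearch for illustration) searches only a prefix of the input and clamps the limit so that short inputs do not trigger an IndexOutOfBoundsException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionSearch {
    private static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    // Hypothetical helper: looks for a match only in the first `limit` characters.
    static boolean foundInPrefix(String input, int limit) {
        Matcher matcher = EMAIL.matcher(input);
        // region() rejects an end index beyond the input length, so clamp it.
        matcher.region(0, Math.min(limit, input.length()));
        return matcher.find();
    }

    public static void main(String[] args) {
        String input = "john.doe@example.com is the contact";
        System.out.println(foundInPrefix(input, 25));  // true: the address fits in the region
        System.out.println(foundInPrefix(input, 10));  // false: the region cuts the address short
    }
}
```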

Unicode/Special Characters

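If the input string contains non-ASCII characters, classes such as [a-zA-Z] and \w will not match them, so we may need to adjust the pattern. The \uXXXX escape matches only one specific character by its hexadecimal code, so it is not a general solution (and a bare \u with no digits is a syntax error). To match letters and digits in any script, use Unicode property classes such as \p{L} and \p{N}, and compile with Pattern.UNICODE_CHARACTER_CLASS so that predefined classes such as \w follow Unicode definitions:

Pattern regex = Pattern.compile("\\b[\\p{L}\\p{N}._%+-]+@[\\p{L}\\p{N}.-]+\\.\\p{L}{2,}\\b", Pattern.UNICODE_CHARACTER_CLASS);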
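
The effect of Unicode-aware classes is easiest to see on a single word. The sketch below compares the default \w, \w compiled with Pattern.UNICODE_CHARACTER_CLASS, and the \p{L} property class. The class name UnicodeWords and the helper firstMatch are hypothetical, and the test word is written with a \u00fc escape only to keep the source file ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeWords {
    // Default \w covers only [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    // With this flag, \w follows the Unicode definition of a word character.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
    // \p{L} matches a letter in any script, with or without the flag.
    static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    // Hypothetical helper: returns the first match, or "" if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // "Mueller" spelled with u-umlaut
        System.out.println(firstMatch(ASCII_WORD, name).length());   // 1: stops before the umlaut
        System.out.println(firstMatch(UNICODE_WORD, name).length()); // 6: whole word
        System.out.println(firstMatch(LETTERS, name).length());      // 6: whole word
    }
}
```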

Common Mistakes

Here are three common mistakes to avoid:

Mistake 1: Incorrect Pattern

A pattern that is too loose accepts input you meant to reject. For example, the following pattern matches most email addresses, but it also matches strings that have no dot-separated top-level domain, such as user@localhost:

String pattern = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+";

Corrected code:

String pattern = "\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b";
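
A quick way to catch this kind of mistake is to run both patterns against inputs you expect to be rejected. The sketch below does that for the two patterns above; PatternStrictness, looseFinds, and strictFinds are hypothetical names.

```java
import java.util.regex.Pattern;

class PatternStrictness {
    private static final Pattern LOOSE =
            Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+");
    private static final Pattern STRICT =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    static boolean looseFinds(String input) {
        return LOOSE.matcher(input).find();
    }

    static boolean strictFinds(String input) {
        return STRICT.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(looseFinds("user@localhost"));     // true: no top-level domain required
        System.out.println(strictFinds("user@localhost"));    // false: needs a dot and 2+ letters
        System.out.println(strictFinds("user@example.com"));  // true
    }
}
```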

Mistake 2: Not Compiling the Pattern

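Matcher has no public constructor, so a matcher cannot be created directly; it has to come from a compiled Pattern. The following line does not compile. (A PatternSyntaxException is a different problem: Pattern.compile() throws it at runtime when the pattern text itself is malformed.)

Matcher matcher = new Matcher(input); // Compile error: Matcher has no accessible constructor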

Corrected code:

Pattern regex = Pattern.compile(pattern);
Matcher matcher = regex.matcher(input);
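
Pattern.compile() throws the unchecked PatternSyntaxException when the pattern text is malformed, which matters most when the pattern comes from user input or configuration. A minimal sketch of handling that case follows; SafeCompile and tryCompile are hypothetical names, and returning an empty Optional is just one possible policy.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompile {
    // Hypothetical helper: compiles the pattern, or reports why it is invalid.
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            // getDescription() explains the error; getIndex() gives its position.
            System.err.println("Invalid regex: " + e.getDescription()
                    + " (index " + e.getIndex() + ")");
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("[a-z]+").isPresent()); // true
        System.out.println(tryCompile("[a-z").isPresent());   // false: unclosed character class
    }
}
```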

Mistake 3: Not Checking for Null

Failing to check for null input can lead to a NullPointerException. For example:

Matcher matcher = regex.matcher(input);

Corrected code:

if (input != null) {
    Matcher matcher = regex.matcher(input);
    // ...
}

Performance Tips

Here are three performance tips to keep in mind:

Tip 1: Compile the Pattern Once

Pattern.compile() parses the pattern every time it is called, so compile it once and reuse the resulting Pattern, which is immutable and safe to share between threads. For example:

Pattern regex = Pattern.compile(pattern);
Matcher matcher1 = regex.matcher(input1);
Matcher matcher2 = regex.matcher(input2);
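
In practice this usually means storing the Pattern in a static final field so that every call shares one compiled instance. The sketch below assumes a hypothetical class CompiledOnce with a helper countInputsWithEmail.

```java
import java.util.List;
import java.util.regex.Pattern;

class CompiledOnce {
    // Compiled a single time when the class is loaded, then shared.
    private static final Pattern EMAIL =
            Pattern.compile("\\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}\\b");

    // Hypothetical helper: counts the inputs that contain at least one match.
    static long countInputsWithEmail(List<String> inputs) {
        long count = 0;
        for (String input : inputs) {
            // A new Matcher per input is cheap; the Pattern is not recompiled.
            if (EMAIL.matcher(input).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("a@b.co", "none", "x john@example.org y");
        System.out.println(countInputsWithEmail(inputs)); // 2
    }
}
```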

Tip 2: Use a Matcher Reset

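find() already resumes after the previous match, so a loop over all matches needs no reset. Calling reset() inside the loop rewinds the matcher to the start of the input, which makes the loop find the same first match forever. Where reset helps is reuse: reset(CharSequence) points an existing Matcher at a new input instead of allocating another one. For example:

Matcher matcher = regex.matcher(input1);
while (matcher.find()) {
    System.out.println("Match found: " + matcher.group());
}
matcher.reset(input2); // Reuse the same matcher for the next input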
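
As a sketch of reusing one Matcher across many inputs, the hypothetical helper below creates the matcher once and rebinds it with reset(CharSequence) for each input. Note that a Matcher is stateful and not safe to share between threads, so this only suits single-threaded loops; MatcherReuse and countMatches are illustrative names.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuse {
    // Hypothetical helper: total number of matches across all inputs.
    static int countMatches(Pattern pattern, List<String> inputs) {
        Matcher matcher = pattern.matcher(""); // created once, rebound below
        int total = 0;
        for (String input : inputs) {
            matcher.reset(input); // new input, match state cleared
            while (matcher.find()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(countMatches(digits, List.of("a1b22", "333", "none"))); // 3
    }
}
```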

Tip 3: Use a Region

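When a match can only occur in part of the input, limiting the search to that region avoids scanning the rest. Clamp the end index to the input length, because region() throws an IndexOutOfBoundsException for out-of-range bounds. For example:

matcher.region(0, Math.min(1000, input.length())); // Search at most the first 1000 characters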
if (matcher.find()) {
    System.out.println("Match found in the first 1000 characters");
}

FAQ

Q: What is the difference between find() and matches()?

A: find() searches for a match anywhere in the input string, while matches() matches the entire input string against the pattern.

Q: How do I match Unicode characters in my regex pattern?

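A: Use Unicode property classes such as \p{L} (any letter) and \p{N} (any digit), or compile the pattern with Pattern.UNICODE_CHARACTER_CLASS so that classes such as \w follow Unicode definitions. The \uXXXX escape matches only the single character with that hexadecimal code.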

Q: Can I use regex to validate email addresses?

A: Yes, but be aware that regex patterns may not cover all possible valid email address formats.

Q: How do I improve the performance of my regex search?

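A: Compile the pattern once and reuse it, reuse a matcher with reset(CharSequence) when processing many inputs on one thread, and limit the search to a specific region of the input string when you know where the match can occur.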

Q: What is the difference between Pattern and Matcher?

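A: Pattern is the compiled, immutable representation of the regex. Matcher is the stateful object that applies a Pattern to one input string and exposes the result of each match through methods such as find(), matches(), and group().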