1. Overview
Regular expressions can be used for a variety of text processing tasks, such as word-counting algorithms or validation of text inputs.
In this tutorial, we’ll take a look at how to use regular expressions to count the number of matches in some text.
2. Use Case
Let’s develop an algorithm that counts how many valid email addresses appear in a string.
To detect an email address, we’ll use a simple regular expression pattern:
Note that this is a trivial pattern for demonstration purposes only, as the actual regex for matching valid email addresses is quite complex.
We’ll need this regular expression inside a Pattern object so we can use it:
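As a minimal sketch of both steps, the class below compiles one possible demonstration pattern into a Pattern constant. The regex itself and the class name EmailPatternSketch are assumptions made for illustration, and the pattern only accepts lowercase letters, digits, and a few punctuation characters, so it should not be treated as a real email validator.

```java
import java.util.regex.Pattern;

class EmailPatternSketch {
    // Assumed demo pattern: a local part, an '@', and a domain that ends in a letter.
    // Deliberately simplistic; it is not a complete email validator.
    static final Pattern EMAIL_ADDRESS_PATTERN =
        Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");

    public static void main(String[] args) {
        // find() reports whether the pattern occurs anywhere in the input.
        boolean found = EMAIL_ADDRESS_PATTERN.matcher("writer@baeldung.com").find();
        System.out.println(found);
    }
}
```

Compiling the pattern once into a static constant avoids recompiling the regex every time a new Matcher is needed.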
We’ll look at two main approaches, one of which depends on using Java 9 or later.
For our example text, we will try to find the three emails in the string:
"You can contact me through writer@baeldung.com, editor@baeldung.com, and team@bealdung.com"
3. Counting Matches for Java 8 and Older
Firstly, let’s see how to count the matches using Java 8 or older.
A simple way of counting the matches is to iterate over the find method of the Matcher class. This method attempts to find the next subsequence of the input sequence that matches the pattern:
int count = 0;
while (countEmailMatcher.find()) {
    count++;
}
Using this approach, we’ll find three matches, as expected:
assertEquals(3, count);
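To try the loop end to end, here is a self-contained sketch that wraps it in a helper method. The class name FindLoopCount, the method countMatches, and the regex are assumptions made for illustration; only the find loop itself and the example text come from the article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoopCount {
    // Assumed demo pattern; not a complete email validator.
    static final Pattern EMAIL_ADDRESS_PATTERN =
        Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");

    // Counts non-overlapping matches by calling find() until it returns false.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "You can contact me through writer@baeldung.com, "
            + "editor@baeldung.com, and team@bealdung.com";
        System.out.println(countMatches(EMAIL_ADDRESS_PATTERN, text)); // prints 3
    }
}
```

Creating a fresh Matcher inside the helper means each call starts from the beginning of the input, so the count does not depend on any earlier use of the pattern.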
Note that the find method does not reset the Matcher after each match. It resumes at the character after the end of the previous match, so it cannot find overlapping email addresses.
For instance, let’s consider this example:
String OVERLAPPING_EMAIL_ADDRESSES = "Try to contact us at team@baeldung.comeditor@baeldung.com, support@baeldung.com.";
Matcher countOverlappingEmailsMatcher = EMAIL_ADDRESS_PATTERN.matcher(OVERLAPPING_EMAIL_ADDRESSES);
int count = 0;
while (countOverlappingEmailsMatcher.find()) {
    count++;
}
assertEquals(2, count);
When the regex engine searches the given String, it first finds “team@baeldung.comeditor” as a match. The next search resumes right after that match, at the second “@”. Since no local part precedes that “@” in the remaining input, the second “@baeldung.com” is ignored. Moving on, the matcher finds “support@baeldung.com” as the second match.
As shown above, we only have two matches in the overlapping email example.
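To see exactly which substrings are consumed, the following sketch collects the text of each match instead of only counting. The class name OverlappingMatches, the method listMatches, and the regex are assumptions made for illustration; with a different pattern the matched substrings could differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Assumed demo pattern; not a complete email validator.
    static final Pattern EMAIL_ADDRESS_PATTERN =
        Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");

    // Collects the text of every match that find() reports, in order.
    static List<String> listMatches(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Try to contact us at team@baeldung.comeditor@baeldung.com, "
            + "support@baeldung.com.";
        // prints [team@baeldung.comeditor, support@baeldung.com]
        System.out.println(listMatches(EMAIL_ADDRESS_PATTERN, text));
    }
}
```

The first match swallows “editor” as part of the domain, which is why the remaining “@baeldung.com” has nothing left to serve as its local part.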
4. Counting Matches for Java 9 and Later
However, if we have a newer version of Java available, we can use the results method of the Matcher class. This method, added in Java 9, returns a sequential stream of match results, allowing us to count the matches more easily:
long count = countEmailMatcher.results()
  .count();
assertEquals(3, count);
As we saw with find, the Matcher is not reset while the stream from the results method is processed. Consequently, the results method cannot find overlapping matches either.
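The stream-based approach can be tried with the self-contained sketch below, which needs Java 9 or later. The class name ResultsCount, the method countMatches, and the regex are assumptions made for illustration; Matcher.results() and Stream.count() are the standard library calls described above.

```java
import java.util.regex.Pattern;

class ResultsCount {
    // Assumed demo pattern; not a complete email validator.
    static final Pattern EMAIL_ADDRESS_PATTERN =
        Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");

    // Requires Java 9+: results() returns a Stream<MatchResult> of non-overlapping matches.
    static long countMatches(Pattern pattern, String text) {
        return pattern.matcher(text).results().count();
    }

    public static void main(String[] args) {
        String text = "You can contact me through writer@baeldung.com, "
            + "editor@baeldung.com, and team@bealdung.com";
        System.out.println(countMatches(EMAIL_ADDRESS_PATTERN, text)); // prints 3
    }
}
```

Note that count() returns a long, so the result is declared as long rather than int.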
5. Conclusion
In this short article, we’ve learned how to count the matches of a regular expression.
First, we learned how to use the find method with a while loop. Then we saw how the results method, added in Java 9, lets us do the same with less code.
As always, the code samples are available over on GitHub.