How to Count the Number of Matches for a Regex?

Last modified: July 4, 2020

1. Overview

Regular expressions can be used for a variety of text processing tasks, such as word-counting algorithms or validation of text inputs.

In this tutorial, we’ll take a look at how to use regular expressions to count the number of matches in some text.

2. Use Case

Let’s develop an algorithm capable of counting how many times a valid email appears in a string.

To detect an email address, we’ll use a simple regular expression pattern:

([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])

Note that this is a trivial pattern for demonstration purposes only, as the actual regex for matching valid email addresses is quite complex.

We’ll need this regular expression inside a Pattern object so we can use it:

Pattern EMAIL_ADDRESS_PATTERN = 
  Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");
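As a minimal sketch of how the compiled pattern behaves, the following class applies it to a piece of text and returns the two capture groups of the first match: the part before the @ and the part after it. The class name EmailPatternSetup and the helper firstMatchParts are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailPatternSetup {
    // Same simplified pattern as above; it is not a full email validator.
    static final Pattern EMAIL_ADDRESS_PATTERN =
      Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");

    // Hypothetical helper: returns the first match as "local part | domain part",
    // or null when the text contains no match.
    static String firstMatchParts(String text) {
        Matcher matcher = EMAIL_ADDRESS_PATTERN.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        // group(1) is the text before the @, group(2) the text after it
        return matcher.group(1) + " | " + matcher.group(2);
    }
}
```

For the input "mail writer@baeldung.com now", this helper would return "writer | baeldung.com".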

We’ll look at two main approaches, one of which depends on using Java 9 or later.

For our example text, we will try to find the three emails in the string:

String TEXT_CONTAINING_EMAIL_ADDRESSES = "You can contact me through writer@baeldung.com, editor@baeldung.com, and team@baeldung.com";

3. Counting Matches for Java 8 and Older

First, let’s see how to count the matches using Java 8 or older.

A simple way of counting the matches is to call the find method of the Matcher class in a loop. Each call attempts to find the next subsequence of the input sequence that matches the pattern:

Matcher countEmailMatcher = EMAIL_ADDRESS_PATTERN.matcher(TEXT_CONTAINING_EMAIL_ADDRESSES);

int count = 0;
while (countEmailMatcher.find()) {
    count++;
}

Using this approach, we’ll find three matches, as expected:

assertEquals(3, count);
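The loop above can be wrapped in a small reusable method. This is only a sketch: the class name FindLoopCounter and the method countMatches are hypothetical, and the sample text in main uses the same three addresses as the example string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindLoopCounter {
    // Counts non-overlapping matches by calling find() until it returns false.
    static int countMatches(Pattern pattern, CharSequence text) {
        Matcher matcher = pattern.matcher(text); // a fresh Matcher starts at index 0
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern emailPattern = Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");
        String text = "You can contact me through writer@baeldung.com, "
          + "editor@baeldung.com, and team@baeldung.com";
        System.out.println(countMatches(emailPattern, text)); // prints 3
    }
}
```

Creating the Matcher inside the method means every call starts from the beginning of the text, so the count does not depend on any earlier matching.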

Note that the find method does not reset the Matcher after each match. It resumes at the first character after the end of the previous match, so it can’t find overlapping email addresses.

For instance, let’s consider this example:

String OVERLAPPING_EMAIL_ADDRESSES = "Try to contact us at team@baeldung.comeditor@baeldung.com, support@baeldung.com.";

Matcher countOverlappingEmailsMatcher = EMAIL_ADDRESS_PATTERN.matcher(OVERLAPPING_EMAIL_ADDRESSES);

int count = 0;
while (countOverlappingEmailsMatcher.find()) {
    count++;
}

assertEquals(2, count);

When the regex searches the given String, it first matches “team@baeldung.comeditor”. The search then resumes at the second “@”; since no local part precedes it, “@baeldung.com” is skipped. Moving on, it finds “support@baeldung.com” as the second match.

As shown above, we only have two matches in the overlapping email example.
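To see which substrings were actually matched in the overlapping example, we can collect the text of each match instead of only counting. The sketch below assumes a hypothetical helper class MatchLister; it relies only on the find and group methods of Matcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchLister {
    // Returns the full text of every non-overlapping match, in order of appearance.
    static List<String> listMatches(Pattern pattern, CharSequence text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group()); // group() is the whole current match
        }
        return matches;
    }

    public static void main(String[] args) {
        Pattern emailPattern = Pattern.compile("([a-z0-9_.-]+)@([a-z0-9_.-]+[a-z])");
        String text = "Try to contact us at team@baeldung.comeditor@baeldung.com, "
          + "support@baeldung.com.";
        // prints [team@baeldung.comeditor, support@baeldung.com]
        System.out.println(listMatches(emailPattern, text));
    }
}
```

The first entry shows that the greedy domain part swallowed "editor", which is why the second address never gets a local part of its own.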

4. Counting Matches for Java 9 and Later

However, if we have a newer version of Java available, we can use the results method of the Matcher class. This method, added in Java 9, returns a sequential stream of match results, allowing us to count the matches more easily:

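// Use a fresh Matcher: results() continues from where an earlier find() stopped
long count = EMAIL_ADDRESS_PATTERN.matcher(TEXT_CONTAINING_EMAIL_ADDRESSES)
  .results()
  .count();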

assertEquals(3, count);

As we saw with find, the Matcher is not reset while processing the stream from the results method. Consequently, the results method can’t find overlapping matches either.
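The following sketch, which requires Java 9 or later, puts the stream-based approach into a hypothetical helper class named ResultsStreamCounter. Besides counting, it shows how the matched text can be collected, and how a results() call on a Matcher that has already consumed one match with find() only sees the remaining matches.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ResultsStreamCounter {
    // Counts matches with Matcher.results(), always starting from a fresh Matcher.
    static long countMatches(Pattern pattern, CharSequence text) {
        return pattern.matcher(text).results().count();
    }

    // Collects the matched substrings instead of only counting them.
    static List<String> listMatches(Pattern pattern, CharSequence text) {
        return pattern.matcher(text)
          .results()
          .map(MatchResult::group)
          .collect(Collectors.toList());
    }

    // Consumes one match with find(), then counts what results() still sees.
    static long countAfterOneFind(Pattern pattern, CharSequence text) {
        Matcher matcher = pattern.matcher(text);
        matcher.find();                   // moves past the first match, if any
        return matcher.results().count(); // continues from there, no reset happens
    }
}
```

With the three-address example text, countMatches would return 3, while countAfterOneFind would return 2 because the first address has already been consumed. Calling reset() on the Matcher before results() would restore the full count.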

5. Conclusion

In this short article, we’ve learned how to count the matches of a regular expression.

First, we learned how to use the find method in a while loop. Then we saw how the results method, added in Java 9, lets us do the same with less code.

As always, the code samples are available over on GitHub.
