1. Overview
In this tutorial, we’ll see the benefits of pre-compiling a regex pattern and look at the new Pattern methods introduced in Java 8 and 11.
This is not a regex how-to; for that purpose, we have an excellent Guide To Java Regular Expressions API.
2. Benefits
Reuse usually brings a performance gain, since we avoid creating instances of the same objects time after time. So, reuse and performance are often linked.
Let’s see how this principle applies to Pattern#compile, using a simple benchmark:
- We have a list with 5,000,000 numbers from 1 to 5,000,000
- Our regex will match even numbers
So, let’s test matching these numbers with each of the following regex calls:
- String.matches(regex)
- Pattern.matches(regex, charSequence)
- Pattern.compile(regex).matcher(charSequence).matches()
- Pre-compiled regex with many calls to preCompiledPattern.matcher(value).matches()
- Pre-compiled regex with one Matcher instance and many calls to matcherFromPreCompiledPattern.reset(value).matches()
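As a rough illustration, the five forms listed above could be written as below. This is only a sketch: the regex is an assumption (the text only says it matches even numbers), and the class and method names are hypothetical. It counts the matches in a small list so we can check that all five forms agree.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class EvenNumberMatching {

    // Assumed regex: any digits followed by an even last digit.
    static final String PATTERN = "\\d*[02468]";

    // Compiled once and reused by the last two forms.
    static final Pattern PRE_COMPILED = Pattern.compile(PATTERN);

    // Form 1: String.matches(regex) compiles the regex on every call.
    static long countWithStringMatches(List<String> values) {
        return values.stream().filter(v -> v.matches(PATTERN)).count();
    }

    // Form 2: Pattern.matches(regex, input) also compiles on every call.
    static long countWithPatternMatches(List<String> values) {
        return values.stream().filter(v -> Pattern.matches(PATTERN, v)).count();
    }

    // Form 3: explicit compile on every call.
    static long countWithCompileEachTime(List<String> values) {
        return values.stream()
          .filter(v -> Pattern.compile(PATTERN).matcher(v).matches())
          .count();
    }

    // Form 4: one Pattern, a new Matcher per value.
    static long countWithPreCompiledPattern(List<String> values) {
        return values.stream().filter(v -> PRE_COMPILED.matcher(v).matches()).count();
    }

    // Form 5: one Pattern and one Matcher, reset for each value.
    // A Matcher is not thread-safe, so it stays local to this method.
    static long countWithReusedMatcher(List<String> values) {
        Matcher matcher = PRE_COMPILED.matcher("");
        long count = 0;
        for (String value : values) {
            if (matcher.reset(value).matches()) {
                count++;
            }
        }
        return count;
    }

    // Builds the list "1", "2", ..., upTo as strings.
    static List<String> numbers(int upTo) {
        return IntStream.rangeClosed(1, upTo)
          .mapToObj(String::valueOf)
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = numbers(10);
        System.out.println(countWithStringMatches(values));
        System.out.println(countWithPatternMatches(values));
        System.out.println(countWithCompileEachTime(values));
        System.out.println(countWithPreCompiledPattern(values));
        System.out.println(countWithReusedMatcher(values));
    }
}
```

All five methods return the same count; they differ only in how many Pattern and Matcher objects they create along the way.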
Actually, if we look at the implementation of String#matches:
public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}
And at Pattern#matches:
public static boolean matches(String regex, CharSequence input) {
    Pattern p = compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}
So, we can expect the first three expressions to perform similarly. That’s because the first calls the second, and the second performs the same steps as the third.
The second point is that these methods do not reuse the Pattern and Matcher instances they create. And, as we’ll see in the benchmark, this makes them roughly five to six times slower than the fastest form:
// One Pattern and one Matcher, reset for each value
@Benchmark
public void matcherFromPreCompiledPatternResetMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(matcherFromPreCompiledPattern.reset(value).matches());
    }
}

// One Pattern, a new Matcher for each value
@Benchmark
public void preCompiledPatternMatcherMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(preCompiledPattern.matcher(value).matches());
    }
}

// A new Pattern and a new Matcher for each value
@Benchmark
public void patternCompileMatcherMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(Pattern.compile(PATTERN).matcher(value).matches());
    }
}

@Benchmark
public void patternMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(Pattern.matches(PATTERN, value));
    }
}

@Benchmark
public void stringMatchs(Blackhole bh) {
    for (String value : values) {
        bh.consume(value.matches(PATTERN));
    }
}
Looking at the benchmark results, there’s no doubt that the pre-compiled Pattern with a reused Matcher is the winner: it is between five and six and a half times faster than the three forms that compile the regex on every call:
Benchmark                                                               Mode  Cnt     Score     Error  Units
PatternPerformanceComparison.matcherFromPreCompiledPatternResetMatches  avgt   20   278.732 ±  22.960  ms/op
PatternPerformanceComparison.preCompiledPatternMatcherMatches           avgt   20   500.393 ±  34.182  ms/op
PatternPerformanceComparison.stringMatchs                               avgt   20  1433.099 ±  73.687  ms/op
PatternPerformanceComparison.patternCompileMatcherMatches               avgt   20  1774.429 ± 174.955  ms/op
PatternPerformanceComparison.patternMatches                             avgt   20  1792.874 ± 130.213  ms/op
Beyond execution time, the forms also differ in the number of objects they create:
- First three forms:
  - 5,000,000 Pattern instances created
  - 5,000,000 Matcher instances created
- preCompiledPattern.matcher(value).matches()
  - 1 Pattern instance created
  - 5,000,000 Matcher instances created
- matcherFromPreCompiledPattern.reset(value).matches()
  - 1 Pattern instance created
  - 1 Matcher instance created
So, instead of delegating our regex to String#matches or Pattern#matches, which always create new Pattern and Matcher instances, we should pre-compile our regex to gain performance and create fewer objects.
To learn more about regex performance, check out our Overview of Regular Expressions Performance in Java.
3. New Methods
Since the introduction of functional interfaces and streams, reuse has become easier.
The Pattern class has evolved in new Java versions to provide integration with streams and lambdas.
3.1. Java 8
Java 8 introduced two new methods: splitAsStream and asPredicate.
Let’s look at some code for splitAsStream that creates a stream from the given input sequence around matches of the pattern:
@Test
public void givenPreCompiledPattern_whenCallSplitAsStream_thenReturnArraySplitByThePattern() {
    Pattern splitPreCompiledPattern = Pattern.compile("__");
    Stream<String> textSplitAsStream = splitPreCompiledPattern.splitAsStream("My_Name__is__Fabio_Silva");
    String[] textSplit = textSplitAsStream.toArray(String[]::new);

    assertEquals("My_Name", textSplit[0]);
    assertEquals("is", textSplit[1]);
    assertEquals("Fabio_Silva", textSplit[2]);
}
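Because splitAsStream returns a Stream, the pieces can also be processed directly without building an intermediate array. The following is a minimal sketch of that idea; the class name, the method name, and the comma-separated input format are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

class SplitAsStreamPipeline {

    // Compiled once: a comma followed by optional whitespace.
    static final Pattern COMMA = Pattern.compile(",\\s*");

    // Splits the input around the pattern and sums the resulting integers.
    static int sumOf(String commaSeparatedNumbers) {
        return COMMA.splitAsStream(commaSeparatedNumbers)
          .mapToInt(Integer::parseInt)
          .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOf("3, 5,8"));
    }
}
```

The pattern is compiled a single time and shared by every call to sumOf, which follows the same reuse principle discussed in the benchmark section.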
The asPredicate method creates a predicate that behaves as if it creates a matcher from the input sequence and then calls find:
string -> matcher(string).find();
Let’s create a pattern that finds the names in a list that contain at least a first and a last name, each with three or more letters:
@Test
public void givenPreCompiledPattern_whenCallAsPredicate_thenReturnPredicateToFindPatternInTheList() {
    List<String> namesToValidate = Arrays.asList("Fabio Silva", "Mr. Silva");
    Pattern firstLastNamePreCompiledPattern = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    Predicate<String> patternsAsPredicate = firstLastNamePreCompiledPattern.asPredicate();
    List<String> validNames = namesToValidate.stream()
      .filter(patternsAsPredicate)
      .collect(Collectors.toList());

    assertEquals(1, validNames.size());
    assertTrue(validNames.contains("Fabio Silva"));
}
3.2. Java 11
Java 11 introduced the asMatchPredicate method that creates a predicate that behaves as if it creates a matcher from the input sequence and then calls matches:
string -> matcher(string).matches();
Let’s create a pattern that matches the names in a list that consist of only a first and a last name, each with three or more letters:
@Test
public void givenPreCompiledPattern_whenCallAsMatchPredicate_thenReturnMatchPredicateToMatchesPattern() {
    List<String> namesToValidate = Arrays.asList("Fabio Silva", "Fabio Luis Silva");
    Pattern firstLastNamePreCompiledPattern = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    Predicate<String> patternAsMatchPredicate = firstLastNamePreCompiledPattern.asMatchPredicate();
    List<String> validatedNames = namesToValidate.stream()
      .filter(patternAsMatchPredicate)
      .collect(Collectors.toList());

    assertTrue(validatedNames.contains("Fabio Silva"));
    assertFalse(validatedNames.contains("Fabio Luis Silva"));
}
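To make the difference between the two predicates concrete, here is a small sketch that applies both to the same list, using the same name pattern as the tests. It requires Java 11 or later; the class and method names are hypothetical. Since asPredicate uses find, it accepts any input that contains a match, while asMatchPredicate uses matches and accepts only inputs that match in their entirety.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PredicateComparison {

    static final Pattern FIRST_LAST_NAME = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    // Keeps every name that contains the pattern somewhere (find semantics).
    static List<String> keepByFind(List<String> names) {
        return names.stream()
          .filter(FIRST_LAST_NAME.asPredicate())
          .collect(Collectors.toList());
    }

    // Keeps only names that match the pattern from start to end (matches semantics).
    static List<String> keepByFullMatch(List<String> names) {
        return names.stream()
          .filter(FIRST_LAST_NAME.asMatchPredicate())
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Fabio Silva", "Fabio Luis Silva", "Mr. Silva");
        System.out.println(keepByFind(names));      // [Fabio Silva, Fabio Luis Silva]
        System.out.println(keepByFullMatch(names)); // [Fabio Silva]
    }
}
```

Here "Fabio Luis Silva" passes the find-based predicate because the substring "Fabio Luis" satisfies the pattern, but it fails the full-match predicate.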
4. Conclusion
In this tutorial, we saw that using pre-compiled patterns gives us far better performance than compiling the regex on every call.
We also learned about three new methods introduced in JDK 8 and JDK 11 that make our lives easier.
The code for these examples is available over on GitHub in core-java-11 for the JDK 11 snippets and core-java-regex for the others.