1. Overview
When using regular expressions in Java, sometimes we need to match regex patterns in their literal form – without processing any metacharacters present in those sequences.
In this quick tutorial, let’s see how we can escape metacharacters inside regular expressions both manually and using the Pattern.quote() method provided by Java.
2. Without Escaping Metacharacters
Let’s consider a string holding a list of dollar amounts:
String dollarAmounts = "$100.25, $100.50, $150.50, $100.50, $100.75";
Now, let’s imagine we need to search it for occurrences of a specific dollar amount. Let’s initialize a regular expression pattern string accordingly:
String patternStr = "$100.50";
First off, let’s find out what happens if we execute our regex search without escaping any metacharacters:
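// Assumes JUnit's @Test and assertEquals plus java.util.regex.Pattern and Matcher are imported
@Test
public void whenMetacharactersNotEscaped_thenNoMatchesFound() {
    // '$' and '.' in patternStr are interpreted as regex metacharacters here
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(dollarAmounts);

    // Count every occurrence the matcher finds
    int matches = 0;
    while (matcher.find()) {
        matches++;
    }
    assertEquals(0, matches);
}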
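As we can see, the matcher fails to find even a single occurrence of $100.50 within our dollarAmounts string. This is simply because patternStr starts with a dollar sign, which is a regular expression metacharacter that matches the end of a line, so the characters that follow it in the pattern can never match. In addition, the unescaped dot is a metacharacter that matches any character, not just a literal period.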
As you may have guessed, we’d face the same issue with any regex metacharacter. For example, we couldn’t search for mathematical expressions that use carets (^) for exponents, such as "5^3", or for text that contains backslashes (\), such as "users\bob".
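The following self-contained sketch illustrates these examples. The class name UnescapedMetacharacters and the helper rawFind are hypothetical, introduced only for this demonstration. The helper compiles each string as-is, without any escaping, and reports whether the matcher finds it in the given input:

```java
import java.util.regex.Pattern;

public class UnescapedMetacharacters {

    // Hypothetical helper: compiles the string as a regex WITHOUT escaping
    // and reports whether it is found anywhere in the input
    static boolean rawFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // '^' anchors to the start of the input, so it can never match after the '5'
        System.out.println(rawFind("5^3", "5^3")); // false

        // In the regex, "\b" is a word boundary, not a literal backslash followed by 'b'
        System.out.println(rawFind("users\\bob", "users\\bob")); // false

        // '.' matches any character, so "100.50" also matches "100x50"
        System.out.println(rawFind("100.50", "100x50")); // true
    }
}
```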
3. Manually Ignore Metacharacters
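Next, let’s escape the metacharacters in our regular expression before performing the search. Here, we wrap the pattern in the \Q and \E quoting delimiters, which tell the regex engine to treat everything between them as literal characters: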
@Test
public void whenMetacharactersManuallyEscaped_thenMatchingSuccessful() {
    // Wrap the pattern in \Q...\E so everything in between is matched literally
    String metaEscapedPatternStr = "\\Q" + patternStr + "\\E";
    Pattern pattern = Pattern.compile(metaEscapedPatternStr);
    Matcher matcher = pattern.matcher(dollarAmounts);

    int matches = 0;
    while (matcher.find()) {
        matches++;
    }
    // "$100.50" appears twice in dollarAmounts
    assertEquals(2, matches);
}
This time, the search succeeds and finds both occurrences. However, this isn’t an ideal solution, for a couple of reasons:
- The string concatenation needed to escape the metacharacters makes the code harder to follow.
- The hard-coded \Q and \E delimiters make the code less clean.
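Manual quoting also has a subtler weakness. If the text itself contains the sequence \E, that sequence ends the quoted section early, and the rest of the text is interpreted as regex syntax again. The sketch below compares naive concatenation with Pattern.quote() on such a string. The class and method names are hypothetical. The naive method treats a PatternSyntaxException as a failed match, since, depending on what follows the embedded \E, the resulting pattern may not even compile:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class QuotingPitfall {

    // Naive manual quoting: wraps the text in \Q...\E by concatenation
    static boolean naiveLiteralMatch(String text) {
        String regex = "\\Q" + text + "\\E";
        try {
            return Pattern.matches(regex, text);
        } catch (PatternSyntaxException e) {
            // An embedded \E can leave an invalid pattern behind
            return false;
        }
    }

    // Pattern.quote() also handles text that contains \E
    static boolean quotedLiteralMatch(String text) {
        return Pattern.matches(Pattern.quote(text), text);
    }

    public static void main(String[] args) {
        String text = "a\\Eb"; // the four characters a, \, E, b
        System.out.println(naiveLiteralMatch(text));  // false
        System.out.println(quotedLiteralMatch(text)); // true
    }
}
```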
4. Use Pattern.quote()
Finally, let’s see the easiest and cleanest way to ignore metacharacters in our regular expressions.
Java’s Pattern class provides a static quote() method that returns a literal pattern string for any given string:
@Test
public void whenMetacharactersEscapedUsingPatternQuote_thenMatchingSuccessful() {
    // Let the library build a literal pattern from patternStr
    String literalPatternStr = Pattern.quote(patternStr);
    Pattern pattern = Pattern.compile(literalPatternStr);
    Matcher matcher = pattern.matcher(dollarAmounts);

    int matches = 0;
    while (matcher.find()) {
        matches++;
    }
    assertEquals(2, matches);
}
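To see what quote() does under the hood, the sketch below prints its result for our pattern. For a string that contains no \E, it simply wraps the string in \Q and \E, which is the same quoting we applied by hand in the previous section. The sketch also shows Pattern.LITERAL, a compile flag that treats the entire pattern string as literal characters, as another built-in option. The class name QuoteDemo and the helper countMatches are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDemo {

    static final String DOLLAR_AMOUNTS = "$100.25, $100.50, $150.50, $100.50, $100.75";
    static final String PATTERN_STR = "$100.50";

    // Hypothetical helper: counts how many times the compiled pattern is found
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        // For input without "\E", quote() just wraps the string in \Q...\E
        String quoted = Pattern.quote(PATTERN_STR);
        System.out.println(quoted); // \Q$100.50\E

        // Both approaches treat '$' and '.' as ordinary characters
        System.out.println(countMatches(Pattern.compile(quoted), DOLLAR_AMOUNTS)); // 2
        System.out.println(countMatches(Pattern.compile(PATTERN_STR, Pattern.LITERAL), DOLLAR_AMOUNTS)); // 2
    }
}
```

Pattern.quote() is handy when the literal text has to be combined with other regex syntax, while Pattern.LITERAL suits the case where the whole pattern is literal.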
5. Conclusion
In this article, we looked at how we can process regular expression patterns in their literal forms.
We saw how leaving regex metacharacters unescaped fails to produce the expected results, and how to escape them inside regex patterns, both manually and with the Pattern.quote() method.
The full source code for all the code samples used here can be found over on GitHub.