Case-Insensitive String Matching in Java

Last modified: February 8, 2020

1. Overview

There are many ways to check whether a String contains a substring. In this article, we’ll look at case-insensitive alternatives to String.contains() in Java and give an example of each.

2. The Simplest Solution: String.toLowerCase

The simplest solution is to use String.toLowerCase(). We convert both strings to lowercase and then call the contains() method:

assertTrue(src.toLowerCase().contains(dest.toLowerCase()));

We can also use String.toUpperCase(), which gives the same result for typical input.
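
One detail worth keeping in mind is that the no-argument toLowerCase() and toUpperCase() use the JVM’s default locale, so the result can change between machines (under a Turkish locale, for example, "I" lowercases to a dotless ı). The following is a minimal sketch, not part of the original code, that pins the locale to Locale.ROOT; the class and method names are hypothetical.

```java
import java.util.Locale;

final class LowerCaseContains {

    // Hypothetical helper: lowercases both strings with a fixed locale
    // so the comparison does not depend on the JVM's default locale.
    static boolean containsIgnoreCase(String src, String dest) {
        return src.toLowerCase(Locale.ROOT).contains(dest.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "WORLD")); // true
        System.out.println(containsIgnoreCase("Hello World", "moon"));  // false
    }
}
```

Both calls allocate new strings on every check, which is acceptable for occasional lookups but may matter in a hot loop.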

3. String.matches With Regular Expressions

Another option is to use String.matches() with a regular expression:

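assertTrue(src.matches("(?i).*" + Pattern.quote(dest) + ".*"));

The matches() method takes a String that represents the regular expression and succeeds only if the whole of src matches it. (?i) enables case-insensitive matching, each .* matches any sequence of characters except line terminators, and Pattern.quote() keeps dest literal even if it contains regex metacharacters.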
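
Because . skips line terminators, a src that contains a newline never matches this kind of pattern. Here is a sketch of a variant that adds the (?s) (DOTALL) flag and quotes dest with Pattern.quote() so that characters such as $ or ( are treated literally. The class name, method name, and sample strings are illustrative assumptions.

```java
import java.util.regex.Pattern;

final class RegexMatchesSketch {

    // Hypothetical helper: (?i) ignores case, (?s) lets '.' match line
    // terminators, and Pattern.quote() treats dest as a literal string.
    static boolean containsIgnoreCase(String src, String dest) {
        return src.matches("(?is).*" + Pattern.quote(dest) + ".*");
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Total: $5.00 (USD)", "$5.00"));       // true
        System.out.println(containsIgnoreCase("first line\nSecond Line", "second")); // true
        System.out.println(containsIgnoreCase("Hello World", "moon"));               // false
    }
}
```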

4. String.regionMatches

We can also use String.regionMatches(). It checks whether two String regions match, and passing true for the ignoreCase parameter makes the comparison case-insensitive:

public static boolean processRegionMatches(String src, String dest) {
    // Scan backwards from the last index at which dest could still fit inside src
    for (int i = src.length() - dest.length(); i >= 0; i--) {
        if (src.regionMatches(true, i, dest, 0, dest.length())) {
            return true;
        }
    }
    return false;
}

assertTrue(processRegionMatches(src, dest));

To avoid pointless comparisons, the loop starts at src.length() - dest.length(), the last index at which dest could still fit inside src. It then decrements the index until it finds a match or reaches the start of src.
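
To make the starting index concrete, here is a small runnable trace of the same backward scan. The wrapper class and the sample strings are illustrative and not part of the original code.

```java
final class RegionMatchesSketch {

    // Same backward scan as processRegionMatches in the article.
    static boolean processRegionMatches(String src, String dest) {
        for (int i = src.length() - dest.length(); i >= 0; i--) {
            if (src.regionMatches(true, i, dest, 0, dest.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "Hello World" has length 11 and "WORLD" has length 5,
        // so the scan starts at index 6 and matches immediately.
        System.out.println(processRegionMatches("Hello World", "WORLD")); // true
        // dest is longer than src: the start index is negative,
        // so the loop body never runs.
        System.out.println(processRegionMatches("Hi", "Hello")); // false
    }
}
```

Unlike the toLowerCase() approach, this version does not allocate any new strings.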

5. Pattern With the CASE_INSENSITIVE Option

The java.util.regex.Pattern class lets us match strings through the Matcher returned by its matcher() method. Here, we use the quote() method to escape any special characters in dest and pass the CASE_INSENSITIVE flag to compile(). Let’s take a look:

assertTrue(Pattern.compile(Pattern.quote(dest), Pattern.CASE_INSENSITIVE)
    .matcher(src)
    .find());
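
By default, CASE_INSENSITIVE only folds case for US-ASCII characters. As a sketch under that consideration, the variant below also sets Pattern.UNICODE_CASE so that non-ASCII letters are compared case-insensitively as well. The class name, method name, and sample strings are hypothetical.

```java
import java.util.regex.Pattern;

final class PatternFindSketch {

    // Hypothetical helper: UNICODE_CASE extends CASE_INSENSITIVE beyond US-ASCII.
    static boolean containsIgnoreCase(String src, String dest) {
        Pattern pattern = Pattern.compile(
            Pattern.quote(dest),
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return pattern.matcher(src).find();
    }

    public static void main(String[] args) {
        // Parentheses in dest are matched literally thanks to Pattern.quote().
        System.out.println(containsIgnoreCase("Price (USD): 5.00", "(usd)")); // true
        // \u00C4 and \u00E4 are the upper- and lowercase forms of A with diaeresis.
        System.out.println(containsIgnoreCase("\u00C4PFEL", "\u00E4pfel")); // true
    }
}
```

If the same dest is searched for repeatedly, the compiled Pattern can be stored and reused instead of being rebuilt on every call.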

6. Apache Commons StringUtils.containsIgnoreCase

Finally, we can use the StringUtils class from Apache Commons Lang:

assertTrue(StringUtils.containsIgnoreCase(src, dest));

7. Performance Comparison

As in the general article about checking for substrings with the contains method, we used the open-source Java Microbenchmark Harness (JMH) to compare the performance of each method, measured in nanoseconds:

  1. Pattern CASE_INSENSITIVE Regular Expression: 399.387 ns
  2. String toLowerCase: 434.064 ns
  3. Apache Commons StringUtils: 496.313 ns
  4. String Region Matches: 718.842 ns
  5. String matches with Regular Expression: 3964.346 ns

As we can see, the fastest option is Pattern with the CASE_INSENSITIVE flag enabled, closely followed by toLowerCase(). At the other end, String.matches() is roughly ten times slower than the winner. We also noticed a clear performance improvement between Java 8 and Java 11.

8. Conclusion

In this tutorial, we looked at several ways to check whether a String contains a substring while ignoring case in Java.

We looked at using String.toLowerCase() and toUpperCase(), String.matches(), String.regionMatches(), Apache Commons StringUtils.containsIgnoreCase(), and Pattern.matcher().find().

We also evaluated the performance of each solution and found that Pattern.compile() with the CASE_INSENSITIVE flag performed best.

As always, the code is available over on GitHub.
