1. Overview
In this tutorial, we’ll review several ways of checking if a String contains a substring, and we’ll compare the performance of each.
2. String.indexOf
Let’s first try using the String.indexOf method. indexOf gives us the first position where the substring is found, or -1 if it isn’t found at all.
When we search for “Rhap”, it will return 9:
Assert.assertEquals(9, "Bohemian Rhapsodyan".indexOf("Rhap"));
When we search for “rhap”, it’ll return -1 because indexOf is case-sensitive. Converting the String to lower case first makes the search succeed:
Assert.assertEquals(-1, "Bohemian Rhapsodyan".indexOf("rhap"));
Assert.assertEquals(9, "Bohemian Rhapsodyan".toLowerCase().indexOf("rhap"));
It’s also important to note that if we search for the substring “an”, it’ll return 6 because indexOf returns the first occurrence:
Assert.assertEquals(6, "Bohemian Rhapsodyan".indexOf("an"));
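Since indexOf only reports the first occurrence, finding later ones means calling the overload that takes a start index. The following is a minimal sketch of that idea; the class name AllIndexesOf and the helper allIndexesOf are hypothetical names introduced here for illustration, not part of the original code:

```java
import java.util.ArrayList;
import java.util.List;

class AllIndexesOf {

    // Hypothetical helper: collects the start index of every occurrence of needle in text.
    static List<Integer> allIndexesOf(String text, String needle) {
        List<Integer> positions = new ArrayList<>();
        if (needle.isEmpty()) {
            return positions; // avoid reporting the empty string at every index
        }
        int from = 0;
        int index;
        // indexOf(String, int) starts searching at the given position
        while ((index = text.indexOf(needle, from)) >= 0) {
            positions.add(index);
            from = index + 1; // step by one so overlapping matches are found too
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(allIndexesOf("Bohemian Rhapsodyan", "an")); // prints [6, 17]
    }
}
```

The first element is the same 6 that the plain indexOf call returns; the second is the trailing “an” at the end of the String.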
3. String.contains
Next, let’s try String.contains. contains searches for a substring throughout the entire String and returns true if it’s found and false otherwise.
In this example, contains returns true because “Hey” is found.
Assert.assertTrue("Hey Ho, let's go".contains("Hey"));
If the string is not found, contains returns false:
Assert.assertFalse("Hey Ho, let's go".contains("jey"));
In the last example, “hey” isn’t found because String.contains is case-sensitive, whereas lower-casing the String first makes the search succeed:
Assert.assertFalse("Hey Ho, let's go".contains("hey"));
Assert.assertTrue("Hey Ho, let's go".toLowerCase().contains("hey"));
An interesting point is that contains internally calls indexOf to determine whether the substring is present.
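The relationship between the two methods can be illustrated with a small sketch. The helper containsViaIndexOf below is a hypothetical name and only mirrors the observable behavior of contains; it isn’t a copy of the JDK source:

```java
class ContainsViaIndexOf {

    // Hypothetical helper: a substring is "contained" exactly when indexOf finds a position.
    static boolean containsViaIndexOf(String text, String needle) {
        return text.indexOf(needle) >= 0;
    }

    public static void main(String[] args) {
        String text = "Hey Ho, let's go";
        // Both expressions on each line are expected to agree.
        System.out.println(text.contains("Hey") + " " + containsViaIndexOf(text, "Hey"));
        System.out.println(text.contains("hey") + " " + containsViaIndexOf(text, "hey"));
    }
}
```

In practice, contains reads better when we only need a yes/no answer, while indexOf is the one to pick when the position matters.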
4. StringUtils.containsIgnoreCase
Our third approach will be using StringUtils#containsIgnoreCase from the Apache Commons Lang library:
Assert.assertTrue(StringUtils.containsIgnoreCase("Runaway train", "train"));
Assert.assertTrue(StringUtils.containsIgnoreCase("Runaway train", "Train"));
We can see that it checks whether a substring is contained in a String, ignoring case. That’s why containsIgnoreCase returns true when we search for both “train” and “Train” inside “Runaway train”.
This approach won’t be as efficient as the previous approaches as it takes additional time to ignore the case. containsIgnoreCase internally converts every letter to upper-case and compares the converted letters instead of the original ones.
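To get a feel for where the extra work comes from, here’s a minimal sketch of a case-insensitive search written with only the JDK. It’s an approximation built on String.regionMatches, assuming a simple scan over every start position; the class and method are hypothetical and shouldn’t be read as the Commons Lang source:

```java
class CaseInsensitiveContains {

    // Hypothetical helper: tries every start position and compares the region ignoring case.
    static boolean containsIgnoreCase(String text, String needle) {
        if (text == null || needle == null) {
            return false; // assumption: treat null inputs as "not contained"
        }
        int length = needle.length();
        int lastStart = text.length() - length;
        for (int i = 0; i <= lastStart; i++) {
            // regionMatches(true, ...) compares characters case-insensitively
            if (text.regionMatches(true, i, needle, 0, length)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Runaway train", "Train"));
        System.out.println(containsIgnoreCase("Runaway train", "plane"));
    }
}
```

Every candidate position requires a per-character, case-folding comparison, which is why this style of search tends to cost more than a plain indexOf.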
5. Using Pattern
Our last approach will be using a Pattern with a regular expression:
Pattern pattern = Pattern.compile("(?<!\\S)" + "road" + "(?!\\S)");
We need to build the Pattern first, then create a Matcher, and finally call the find method to check whether the substring occurs. Note that the lookarounds (?<!\S) and (?!\S) only let “road” match when it’s delimited by whitespace or by the start or end of the input:
Matcher matcher = pattern.matcher("Hit the road Jack");
Assert.assertTrue(matcher.find());
Here, find returns true because the word “road” is contained in the string “Hit the road Jack”. When we look for the same word in the string “and don’t you come back no more”, it returns false:
matcher = pattern.matcher("and don't you come back no more");
Assert.assertFalse(matcher.find());
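The effect of the two lookarounds is easiest to see by comparing the Pattern with a plain contains call. The sketch below wraps the same regular expression in a hypothetical helper named containsWord; the use of Pattern.quote is an addition not present in the original, assumed here so that regex metacharacters in the search word are treated literally:

```java
import java.util.regex.Pattern;

class WholeWordSearch {

    // Hypothetical helper: true only if word appears delimited by whitespace
    // (or by the start/end of the text) on both sides.
    static boolean containsWord(String text, String word) {
        Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Hit the road Jack", "road"));    // true
        System.out.println(containsWord("Hit the roadmap Jack", "road")); // false
        System.out.println("Hit the roadmap Jack".contains("road"));      // true
        System.out.println(containsWord("Hit the road, Jack", "road"));   // false: the comma is non-whitespace
    }
}
```

So this Pattern answers a slightly different question than indexOf or contains: it looks for a whitespace-delimited word rather than any substring. Compiling the Pattern on every call, as this helper does, is only for brevity; reusing a compiled Pattern is usually preferable.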
6. Performance Comparison
We’ll use an open-source micro-benchmark framework called Java Microbenchmark Harness (JMH) in order to decide which method is the most efficient in terms of execution time.
6.1. Benchmark Setup
As in every JMH benchmark, we can write a setup method to have certain things in place before our benchmarks run:
@Setup
public void setup() {
    message = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, " +
      "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. " +
      "Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris " +
      "nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in " +
      "reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. " +
      "Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt " +
      "mollit anim id est laborum";
    pattern = Pattern.compile("(?<!\\S)" + "eiusmod" + "(?!\\S)");
}
In the setup method, we’re initializing the message field. We’ll use this as the source text for our various searching implementations.
We’re also initializing pattern so that we can use it later in one of our benchmarks.
6.2. The String.indexOf Benchmark
Our first benchmark will use indexOf:
@Benchmark
public int indexOf() {
    return message.indexOf("eiusmod");
}
We’ll search for the position at which “eiusmod” occurs in the message variable.
6.3. The String.contains Benchmark
Our second benchmark will use contains:
@Benchmark
public boolean contains() {
    return message.contains("eiusmod");
}
We’ll try to find if the message value contains “eiusmod”, the same substring used in the previous benchmark.
6.4. The StringUtils.containsIgnoreCase Benchmark
Our third benchmark will use StringUtils#containsIgnoreCase:
@Benchmark
public boolean containsStringUtilsIgnoreCase() {
    return StringUtils.containsIgnoreCase(message, "eiusmod");
}
As with the previous benchmarks, we’ll search for the substring in the message value.
6.5. The Pattern Benchmark
And our last benchmark will use Pattern:
@Benchmark
public boolean searchWithPattern() {
    return pattern.matcher(message).find();
}
We’ll use the pattern initialized in the setup method to create a Matcher and call its find method, searching for the same substring as before.
6.6. Analysis of Benchmark Results
It’s important to note that we’re evaluating the benchmark results in nanoseconds.
After running our JMH test, we can see the average time each took:
- contains: 14.736 ns
- indexOf: 14.200 ns
- containsStringUtilsIgnoreCase: 385.632 ns
- searchWithPattern: 1014.633 ns
The indexOf method is the most efficient one, closely followed by contains. It makes sense that contains took slightly longer because it uses indexOf internally.
containsStringUtilsIgnoreCase took considerably more time than the previous two because it has to perform a case-insensitive comparison.
searchWithPattern took an even higher average time than containsStringUtilsIgnoreCase, showing that using a Pattern is the slowest of these alternatives for this task.
7. Conclusion
In this article, we’ve explored various ways to search for a substring in a String. We’ve also benchmarked the performance of the different solutions.
As always, the code is available over on GitHub.