1. Overview
In this quick tutorial, we’ll focus on the substring functionality of Strings in Java.
We’ll mostly use methods from the String class and a few from Apache Commons’ StringUtils class.
In all of the following examples, we’re going to use this simple String:
String text = "Julia Evans was born on 25-09-1984. "
+ "She is currently living in the USA (United States of America).";
2. Basics of substring
Let’s start with a very simple example, extracting a substring from a start index:
assertEquals("USA (United States of America).",
text.substring(67));
Note how we extracted Julia’s country of residence in our example here.
We can also specify an end index; without one, substring goes all the way to the end of the String. The end index is exclusive, so the character at that position isn’t included.
Let’s pass an end index to get rid of the extra period at the end of the example above:
assertEquals("USA (United States of America)",
text.substring(67, text.length() - 1));
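Because the end index is exclusive, the result always has end - begin characters, and a range outside the String (or a begin index greater than the end index) makes substring throw an IndexOutOfBoundsException. The following sketch illustrates both points; the class and method names are illustrative only.

```java
final class SubstringBounds {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

    // Returns the characters from begin (inclusive) up to end (exclusive)
    static String slice(int begin, int end) {
        return TEXT.substring(begin, end);
    }

    // Reports whether substring(begin, end) rejects the given range
    static boolean rejects(int begin, int end) {
        try {
            TEXT.substring(begin, end);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String country = slice(67, TEXT.length() - 1);
        // The result has exactly end - begin characters
        System.out.println(country + " -> " + country.length() + " characters");
        // An end index past the end of the String, or begin > end, is rejected
        System.out.println(rejects(67, TEXT.length() + 1) + " " + rejects(10, 5));
    }
}
```

An empty range such as slice(5, 5) is valid and simply returns an empty String.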
In the examples above, we’ve used the exact position to extract the substring.
2.1. Getting a Substring Starting at a Specific Character
If the position needs to be calculated dynamically based on a character or String, we can use the indexOf method:
assertEquals("United States of America",
text.substring(text.indexOf('(') + 1, text.indexOf(')')));
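Both indexOf calls above return -1 when the character is absent, in which case the substring call either silently starts at index 0 or throws. A defensive variant might look like the sketch below; the between helper and its Optional return type are our own illustration, not part of the JDK.

```java
import java.util.Optional;

final class BetweenChars {

    // Hypothetical helper: the text between the first `open` and the next `close`
    // after it, or Optional.empty() if either character is missing
    static Optional<String> between(String text, char open, char close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        // Look for the closing character only after the opening one
        int end = text.indexOf(close, start + 1);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start + 1, end));
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
                + "She is currently living in the USA (United States of America).";
        System.out.println(between(text, '(', ')').orElse("<none>"));
        System.out.println(between(text, '[', ']').orElse("<none>"));
    }
}
```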
A similar method that can help us locate our substring is lastIndexOf. Let’s use lastIndexOf to extract the year “1984”. It’s the portion of text between the last dash and the first period:
assertEquals("1984",
text.substring(text.lastIndexOf('-') + 1, text.indexOf('.')));
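The difference between the two methods is easiest to see on the date alone: indexOf('-') finds the dash after the day, while lastIndexOf('-') finds the one before the year. This sketch assumes a well-formed dd-mm-yyyy input with exactly two dashes; DateParts is a hypothetical name.

```java
final class DateParts {

    // Splits a "dd-mm-yyyy" String using the first and last dash positions
    static String[] parts(String date) {
        int firstDash = date.indexOf('-');    // dash after the day
        int lastDash = date.lastIndexOf('-'); // dash before the year
        String day = date.substring(0, firstDash);
        String month = date.substring(firstDash + 1, lastDash);
        String year = date.substring(lastDash + 1);
        return new String[] {day, month, year};
    }

    public static void main(String[] args) {
        String[] p = parts("25-09-1984");
        System.out.println(p[0] + " / " + p[1] + " / " + p[2]);
    }
}
```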
Both indexOf and lastIndexOf can take a character or a String as a parameter. Let’s extract the text “USA” together with the parenthesized text that follows it:
assertEquals("USA (United States of America)",
text.substring(text.indexOf("USA"), text.indexOf(')') + 1));
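When the argument is a String, indexOf returns the position where that String starts, so extracting the text that follows a marker means skipping the marker’s length as well. A minimal sketch, with after as a hypothetical helper that returns null when the marker is missing:

```java
final class AfterMarker {

    // Hypothetical helper: everything after the first occurrence of `marker`,
    // or null when the marker does not occur
    static String after(String text, String marker) {
        int pos = text.indexOf(marker);
        if (pos < 0) {
            return null;
        }
        // indexOf points at the start of the marker, so skip its length too
        return text.substring(pos + marker.length());
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
                + "She is currently living in the USA (United States of America).";
        System.out.println(after(text, "living in "));
    }
}
```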
3. Using subSequence
The String class provides another method called subSequence, which works similarly to the substring method.
The only differences are that it returns a CharSequence instead of a String, and it always requires both a start and an end index:
assertEquals("USA (United States of America)",
text.subSequence(67, text.length() - 1));
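Since subSequence is declared by the CharSequence interface, code written against CharSequence works with String, StringBuilder, and other implementations alike, and toString() turns the result back into a String when String methods are needed. In this sketch, firstWord is an illustrative helper:

```java
final class CharSequenceSlices {

    // Works with any CharSequence: the characters before the first space,
    // or the whole input if there is no space
    static CharSequence firstWord(CharSequence cs) {
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) == ' ') {
                return cs.subSequence(0, i);
            }
        }
        return cs;
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984.";
        StringBuilder builder = new StringBuilder(text);
        // toString() converts the CharSequence back into a String
        System.out.println(firstWord(text).toString());
        System.out.println(firstWord(builder).toString());
    }
}
```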
4. Using Regular Expressions
Regular expressions will come to our rescue if we have to extract a substring that matches a specific pattern.
In the example String, Julia’s date of birth is in the format “dd-mm-yyyy”. We can match this pattern using the Java regular expression API.
First of all, we need to create a pattern for “dd-mm-yyyy”:
Pattern pattern = Pattern.compile("\\d{2}-\\d{2}-\\d{4}");
Then, we’ll apply the pattern to find a match from the given text:
Matcher matcher = pattern.matcher(text);
Upon a successful match, we can extract the matched String:
if (matcher.find()) {
assertEquals("25-09-1984", matcher.group());
}
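If only part of the match is needed, capturing groups avoid a second round of substring calls, and Matcher.start() and end() report where the match sits in the original text. A sketch assuming the same dd-mm-yyyy format; the DateGroups class is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateGroups {

    // Same dd-mm-yyyy shape, with each part wrapped in a capturing group
    private static final Pattern DATE = Pattern.compile("(\\d{2})-(\\d{2})-(\\d{4})");

    // Returns the year of the first date found, or null if there is none
    static String year(String text) {
        Matcher matcher = DATE.matcher(text);
        if (matcher.find()) {
            return matcher.group(3); // third group: yyyy
        }
        return null;
    }

    // Returns {start, end} of the first match (end exclusive), or null
    static int[] span(String text) {
        Matcher matcher = DATE.matcher(text);
        return matcher.find() ? new int[] {matcher.start(), matcher.end()} : null;
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
                + "She is currently living in the USA (United States of America).";
        System.out.println(year(text));
        int[] s = span(text);
        System.out.println(text.substring(s[0], s[1]));
    }
}
```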
For more details on Java regular expressions, check out this tutorial.
5. Using split
We can use the split method from the String class to extract a substring. Say we want to extract the first sentence from the example String. This is quite easy to do using split:
String[] sentences = text.split("\\.");
Since the split method accepts a regex, we had to escape the period character. The result is an array of two sentences; the second one starts with a space, because we split on the period alone.
We can use the first sentence (or iterate through the whole array):
assertEquals("Julia Evans was born on 25-09-1984", sentences[0]);
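Two variations on the split call: Pattern.quote(".") produces the same literal-period regex without manual escaping, and adding \s* to the pattern also consumes the whitespace after each period, so the second sentence no longer starts with a space. A sketch, where SentenceSplit is an illustrative name:

```java
import java.util.regex.Pattern;

final class SentenceSplit {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

    // Pattern.quote(".") yields a regex matching a literal period, like "\\."
    static String[] onPeriod(String text) {
        return text.split(Pattern.quote("."));
    }

    // Also swallows any whitespace after each period
    static String[] onPeriodAndSpaces(String text) {
        return text.split("\\.\\s*");
    }

    public static void main(String[] args) {
        for (String sentence : onPeriodAndSpaces(TEXT)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

In both cases split drops the trailing empty String produced by the final period, so each array has two elements.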
Please note that there are better ways for sentence detection and tokenization using Apache OpenNLP. Check out this tutorial to learn more about the OpenNLP API.
6. Using Scanner
We generally use Scanner to parse primitive types and Strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.
Let’s find out how to use this to get the first sentence from the example text:
try (Scanner scanner = new Scanner(text)) {
scanner.useDelimiter("\\.");
assertEquals("Julia Evans was born on 25-09-1984", scanner.next());
}
In the above example, we have set the example String as the source for the scanner to use.
Then we set the period character as the delimiter. It needs to be escaped; otherwise, it would be treated as the regular expression metacharacter that matches any character.
Finally, we assert on the first token of the delimited input.
If required, we can iterate through all the tokens with a while loop inside the same try-with-resources block, while the scanner is still open:
try (Scanner scanner = new Scanner(text)) {
    scanner.useDelimiter("\\.");
    while (scanner.hasNext()) {
        String token = scanner.next(); // process each token here
    }
}
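Putting the pieces together, the sketch below collects every token inside a single try-with-resources block, and also shows the default whitespace delimiter mentioned earlier. ScannerTokens and its methods are hypothetical names, and the \.\s* delimiter is our own choice so that the second sentence carries no leading space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class ScannerTokens {

    // Collects every token, splitting on the given delimiter regex
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        // The loop runs inside try-with-resources, so the scanner is still open
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    // Without useDelimiter, Scanner splits on whitespace
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
                + "She is currently living in the USA (United States of America).";
        System.out.println(tokens(text, "\\.\\s*"));
        System.out.println(words(text));
    }
}
```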
7. Maven Dependencies
We can go a bit further and use a handy utility, the StringUtils class, which is part of the Apache Commons Lang library:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.12.0</version>
</dependency>
The latest version of this library is available on Maven Central.
8. Using StringUtils
The Apache Commons libraries add some useful methods for manipulating core Java types. Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods.
In this example, we’re going to see how to extract a substring nested between two Strings:
assertEquals("United States of America",
StringUtils.substringBetween(text, "(", ")"));
There is a simplified version of this method in case the substring is nested in between two instances of the same String:
substringBetween(String str, String tag)
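To make the behavior of both overloads concrete, here is a rough plain-Java approximation based on their documented contract (null when the input is null or a delimiter isn’t found). It is not the library’s actual source, and BetweenSketch is a hypothetical name.

```java
final class BetweenSketch {

    // Approximates StringUtils.substringBetween(str, open, close)
    static String substringBetween(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start < 0) {
            return null;
        }
        // Search for `close` only after the end of `open`
        int end = str.indexOf(close, start + open.length());
        if (end < 0) {
            return null;
        }
        return str.substring(start + open.length(), end);
    }

    // Approximates StringUtils.substringBetween(str, tag): the same tag on both sides
    static String substringBetween(String str, String tag) {
        return substringBetween(str, tag, tag);
    }

    public static void main(String[] args) {
        System.out.println(substringBetween("wx[b]yz", "[", "]"));
        System.out.println(substringBetween("tagabctag", "tag"));
    }
}
```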
The substringAfter method from the same class gets the substring after the first occurrence of a separator.
The separator isn’t returned:
assertEquals("the USA (United States of America).",
StringUtils.substringAfter(text, "living in "));
Similarly, the substringBefore method gets the substring before the first occurrence of a separator.
The separator isn’t returned:
assertEquals("Julia Evans",
StringUtils.substringBefore(text, " was born"));
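The two methods differ when the separator is missing: per the StringUtils Javadoc, substringAfter returns an empty String, while substringBefore returns the whole input. The plain-Java approximation below reproduces that behavior with indexOf and substring; it is a sketch, not the library’s source, and BeforeAfterSketch is a hypothetical name.

```java
final class BeforeAfterSketch {

    // Approximates StringUtils.substringAfter: "" when the separator is not found
    static String substringAfter(String str, String separator) {
        if (str == null) {
            return null;
        }
        if (separator == null) {
            return "";
        }
        int pos = str.indexOf(separator);
        return pos < 0 ? "" : str.substring(pos + separator.length());
    }

    // Approximates StringUtils.substringBefore: the whole input when the separator is not found
    static String substringBefore(String str, String separator) {
        if (str == null || separator == null) {
            return str;
        }
        int pos = str.indexOf(separator);
        return pos < 0 ? str : str.substring(0, pos);
    }

    public static void main(String[] args) {
        System.out.println("[" + substringAfter("abc", "d") + "]");
        System.out.println("[" + substringBefore("abc", "d") + "]");
    }
}
```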
You can check out this tutorial to find out more about String processing with the Apache Commons Lang API.
9. Conclusion
In this quick article, we looked at various ways to extract a substring from a String in Java. You can explore our other tutorials on String manipulation in Java.
As always, code snippets can be found over on GitHub.