Splitting a Java String by Multiple Delimiters

Last modified: October 30, 2021

1. Introduction

We all know that splitting a string is a very common task. However, we often split using just one delimiter.

In this tutorial, we’ll discuss in detail several options for splitting a string by multiple delimiters.

2. Splitting a Java String by Multiple Delimiters

In order to show how each of the solutions below performs splitting, we’ll use the same example string:

String example = "Mary;Thomas:Jane-Kate";
String[] expectedArray = new String[]{"Mary", "Thomas", "Jane", "Kate"};

2.1. Regex Solution

Programmers often use different regular expressions to define a search pattern for strings. They’re also a very popular solution when it comes to splitting a string. So, let’s see how we can use a regular expression to split a string by multiple delimiters in Java.

First, we don’t need to add a new dependency, since regular expressions are available in the java.util.regex package. We just have to define the input string we want to split and a pattern.

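The next step is to apply the pattern. A pattern can match zero or more times in the input. To split by several different delimiters, we can list all of the delimiter characters in a single character class, such as [;:-], which matches any one of them. Because the hyphen is the last character in the class, it’s treated as a literal character rather than as a range operator.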

We’ll write a simple test to demonstrate this approach:

String[] names = example.split("[;:-]");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);

We’ve defined a test string with names that should be split by characters in the pattern. The pattern itself contains a semicolon, a colon, and a hyphen. When applied to the example string, we’ll get four names in the array.
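String.split also accepts more elaborate patterns. As a minimal plain-Java sketch (the class MultiDelimiterRegexDemo and its methods are hypothetical names used only for illustration), the following compares the character class from the test with a variant that adds the + quantifier, using an input that contains two delimiters in a row:

```java
import java.util.Arrays;

// Hypothetical demo class: compares splitting on each delimiter
// with splitting on runs of delimiters.
final class MultiDelimiterRegexDemo {

    // Splits on any single ';', ':' or '-'. The hyphen is last in the class,
    // so it is a literal character rather than a range operator.
    static String[] splitOnEach(String input) {
        return input.split("[;:-]");
    }

    // The + quantifier treats a run of consecutive delimiters as one separator.
    static String[] splitOnRuns(String input) {
        return input.split("[;:-]+");
    }

    public static void main(String[] args) {
        String input = "Mary;;Thomas:-Jane-Kate";
        // prints [Mary, , Thomas, , Jane, Kate]
        System.out.println(Arrays.toString(splitOnEach(input)));
        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(Arrays.toString(splitOnRuns(input)));
    }
}
```

With [;:-], each pair of adjacent delimiters yields an empty string in the result, while [;:-]+ treats the whole run as a single separator. In both cases, String.split without a limit argument discards trailing empty strings.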

2.2. Guava Solution

Guava also offers a solution for splitting a string by multiple delimiters. Its solution is based on the Splitter class. This class extracts the substrings from an input string using the separator sequence. We can define this sequence in multiple ways:

  • a single character
  • a fixed string
  • a regular expression
  • a CharMatcher instance

Furthermore, the Splitter class has two methods for defining the delimiters with a regular expression: on(Pattern) and onPattern(String). Let’s test both of them.

First, we’ll add the Guava dependency to our pom.xml file:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>

Then, we’ll start with the on method: public static Splitter on(Pattern separatorPattern)

It takes a compiled Pattern that defines the delimiters to split on. First, we’ll define the combination of delimiters and compile the pattern. After that, we can split the string.

In our example, we’ll use a regular expression to specify the delimiters:

Iterable<String> names = Splitter.on(Pattern.compile("[;:-]")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);
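The on method expects a Pattern that has already been compiled. The same compile-once idea also works without Guava. Below is a minimal JDK-only sketch, not Guava code; the class PrecompiledPatternDemo and its methods are hypothetical names. It keeps the compiled delimiter pattern in a constant and reuses it through Pattern.split and Pattern.splitAsStream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class: compiles the delimiter pattern once and reuses it.
final class PrecompiledPatternDemo {

    // Compiled a single time when the class is initialized.
    private static final Pattern DELIMITERS = Pattern.compile("[;:-]");

    // Returns the pieces as an array, like String.split.
    static String[] splitToArray(String input) {
        return DELIMITERS.split(input);
    }

    // Returns the pieces as a List, via a Stream.
    static List<String> splitToList(String input) {
        return DELIMITERS.splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(Arrays.toString(splitToArray("Mary;Thomas:Jane-Kate")));
        // prints [Anna, Paul, Lisa]
        System.out.println(splitToList("Anna-Paul;Lisa"));
    }
}
```

Keeping the Pattern in a static final field avoids recompiling the regular expression on every call, and the Pattern Javadoc states that its instances are immutable and safe for use by multiple concurrent threads.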

The other method is the onPattern method: public static Splitter onPattern(String separatorPattern)

The difference between this and the previous method is that the onPattern method takes the pattern as a string, so there’s no need to compile it ourselves as we did for the on method. We’ll use the same combination of delimiters to test the onPattern method:

Iterable<String> names = Splitter.onPattern("[;:-]").split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);

In both tests, we managed to split the string and get an Iterable with four names.
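Because onPattern takes a regular expression string, delimiters that contain regex metacharacters (such as | or .) or span several characters have to be escaped first. As a minimal sketch, assuming a hypothetical helper named QuotedDelimiters.toRegex, we can quote each delimiter with Pattern.quote and join the results with |. The sketch only exercises String.split; the resulting string should also be usable wherever a regex string is accepted, such as onPattern:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: turns a list of literal delimiters into one regex.
final class QuotedDelimiters {

    // Wraps every delimiter in \Q...\E via Pattern.quote and joins them with |,
    // so each delimiter is matched literally, even if it contains metacharacters.
    static String toRegex(List<String> delimiters) {
        return delimiters.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        String regex = toRegex(Arrays.asList("||", ".", "->"));
        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(Arrays.toString("Mary||Thomas.Jane->Kate".split(regex)));
    }
}
```

Alternation tries the alternatives from left to right, so if one delimiter is a prefix of another (for example, - and ->), the longer one should come first in the list.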

Since we’re splitting an input string with multiple delimiters, we can also use the anyOf method in the CharMatcher class:

Iterable<String> names = Splitter.on(CharMatcher.anyOf(";:-")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);

This option is available only through the on method of the Splitter class, because onPattern accepts only a regular expression string. The outcome is the same as in the previous two tests.
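To see what splitting on any one of several characters involves, here is a plain-Java sketch of similar behavior. It is not Guava’s implementation; the AnyOfSplitter class and its splitOnAnyOf method are hypothetical. Like Guava’s Splitter in its default configuration, it keeps empty strings, including a trailing one, whereas String.split discards trailing empty strings:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on any character contained in a separator set.
final class AnyOfSplitter {

    // Cuts the input at every character that appears in separators.
    // Empty pieces (adjacent or trailing separators) are kept.
    static List<String> splitOnAnyOf(String input, String separators) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (separators.indexOf(input.charAt(i)) >= 0) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start)); // the piece after the last separator
        return parts;
    }

    public static void main(String[] args) {
        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(splitOnAnyOf("Mary;Thomas:Jane-Kate", ";:-"));
        // prints [Mary, , Kate, ]
        System.out.println(splitOnAnyOf("Mary;;Kate-", ";:-"));
    }
}
```

When empty or padded pieces aren’t wanted, Guava’s Splitter offers omitEmptyStrings() and trimResults().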

2.3. Apache Commons Solution

The last option we’ll discuss is available in the Apache Commons Lang 3 library.

We’ll start by adding the Apache Commons Lang dependency to our pom.xml file:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>

Next, we’ll use the split method from the StringUtils class:

String[] names = StringUtils.split(example, ";:-");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);

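We only have to list all the characters we’ll use to split the string. The second argument is treated as a set of separator characters rather than as a single separator string, and adjacent separators are treated as one, so no empty strings are returned. Calling the split method divides the example string into four names.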
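For comparison, the JDK’s java.util.StringTokenizer also treats each character of its delimiter argument as a separator and skips runs of them, which is close to how StringUtils.split behaves. Here is a minimal sketch; the TokenizerDemo class and its tokenize method are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class: collects StringTokenizer tokens into a List.
final class TokenizerDemo {

    // Every character in delimiters is a separator; leading, trailing, and
    // consecutive separators never produce empty tokens.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(tokenize("Mary;Thomas:Jane-Kate", ";:-"));
        // prints [Mary, Thomas, Kate]
        System.out.println(tokenize(";Mary;;Thomas:-Kate-", ";:-"));
    }
}
```

The StringTokenizer Javadoc describes it as a legacy class whose use is discouraged in new code, so the sketch is only meant to show the semantics. Also, unlike StringUtils.split, which returns null for a null input, this helper throws a NullPointerException.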

3. Conclusion

In this article, we’ve seen different options for splitting an input string by multiple delimiters. First, we discussed a solution based on regular expressions and plain Java. Later, we showed different options available in Guava. Finally, we wrapped up our examples with a solution based on the Apache Commons Lang 3 library.

As always, the code for these examples is available over on GitHub.
