1. Overview
When working with a string collection, concatenating these strings with specific separators is a common task. Fortunately, various solutions are at our disposal, including using String.join() and Collectors.joining().
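As a quick refresher, the two standard approaches mentioned above can be sketched as follows. The class and method names here (PlainJoinExample, joinWithStringJoin, joinWithCollector) are illustrative assumptions rather than part of the original examples:

```java
import java.util.List;
import java.util.stream.Collectors;

public class PlainJoinExample {

    // Joins the elements with a fixed separator using String.join()
    static String joinWithStringJoin(List<String> items) {
        return String.join(", ", items);
    }

    // Produces the same result using the Stream API collector
    static String joinWithCollector(List<String> items) {
        return items.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> items = List.of("A", "B", "C", "D");
        System.out.println(joinWithStringJoin(items)); // A, B, C, D
        System.out.println(joinWithCollector(items));  // A, B, C, D
    }
}
```

Both calls place the same separator between every pair of elements, which is exactly the limitation the rest of this tutorial works around.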
In this quick tutorial, we’ll explore an interesting string concatenation problem: joining strings in a more natural language-like manner.
2. Introduction to the Problem
Let’s understand the problem with an example. Say we have a list of strings {“A”, “B”, “C”, “D”}. If we join them with a comma as the separator, the result is “A, B, C, D”. So far, so good.
However, if we want the joined result to follow English grammar, the expected outcome should be “A, B, C and D” or “A, B, C, and D”. We’ll see why there are two variants shortly. At the very least, it’s clear that we can’t obtain this result directly from a single String.join() or Collectors.joining() call.
The comma between “C” and “and” in the second variant above is called the Oxford comma or Harvard comma. There’s an ongoing debate about which style is more precise, but that isn’t our focus. We aim to create a method that supports both styles.
So, given a list with more than two string elements, for instance, {“A”, “B”, “C”, … “X”, “Y”}, we may have two results depending on the requirement:
- With Oxford comma – “A, B, C, … X, and Y”
- Without Oxford comma – “A, B, C, … X and Y”
Moreover, so far we’ve only discussed lists with at least three elements. The result is different if the list holds fewer than three elements:
- For an empty list, return an empty string, so, { } becomes “”
- For a list with a single element, return that element. For example, {“A”} becomes “A”
- When dealing with a list containing two string elements, combine them with the word “and” without using a comma. For instance, {“A”, “B”} becomes “A and B”
Next, let’s create a method to join a list of strings in a natural language-like manner. For simplicity, we assume the input list isn’t null and doesn’t contain null or empty string elements. In practice, if the list carries empty or null strings, we can filter out those elements first.
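The pre-filtering step mentioned above could look roughly like this. This is a minimal sketch: the class InputCleanupExample and the helper removeNullAndEmpty() are hypothetical names introduced only for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class InputCleanupExample {

    // Hypothetical helper: drops null and empty elements before joining
    static List<String> removeNullAndEmpty(List<String> input) {
        return input.stream()
          .filter(Objects::nonNull)
          .filter(s -> !s.isEmpty())
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arrays.asList() is used here because List.of() rejects null elements
        List<String> raw = Arrays.asList("A", null, "", "B");
        System.out.println(removeNullAndEmpty(raw)); // [A, B]
    }
}
```

The cleaned list can then be passed to the joining method, so the method itself doesn’t need to deal with null or empty elements.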
3. Creating the joinItemsAsNaturalLanguage() Method
First, let’s look at the method implementation and then understand how it works:
    String joinItemsAsNaturalLanguage(List<String> list, boolean oxfordComma) {
        if (list.size() < 3) {
            return String.join(" and ", list);
        }
        // list has at least three elements
        int lastIdx = list.size() - 1;
        StringBuilder sb = new StringBuilder();
        return sb.append(String.join(", ", list.subList(0, lastIdx)))
          .append(oxfordComma ? ", and " : " and ")
          .append(list.get(lastIdx))
          .toString();
    }
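Now, let’s walk through the code quickly. First, we handle the cases where the list contains fewer than three elements using String.join(" and ", list). This call returns an empty string for an empty list, the element itself for a single-element list, and “A and B” for two elements.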
Then, if the list contains three or more strings, we use ", " as the separator to join the elements of a sublist of the input that excludes the last string. Finally, we append " and " followed by the last element. If the oxfordComma option is true, we append ", and " instead.
It’s worth noting that we shouldn’t take the approach of joining all elements by commas first and replacing the last comma with “and”. This is because the last element might contain commas, too.
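To make this pitfall concrete, here’s a sketch of the naive approach. The class NaiveReplaceExample, the method naiveJoin(), and the sample element “Smith, John” are assumptions made up for this illustration:

```java
import java.util.List;

public class NaiveReplaceExample {

    // Naive approach (for illustration only): join everything with ", ",
    // then replace the last ", " with " and "
    static String naiveJoin(List<String> list) {
        String joined = String.join(", ", list);
        int lastComma = joined.lastIndexOf(", ");
        if (lastComma < 0) {
            return joined; // zero or one element, and no comma inside it
        }
        return joined.substring(0, lastComma) + " and " + joined.substring(lastComma + 2);
    }

    public static void main(String[] args) {
        // Works when no element contains a comma
        System.out.println(naiveJoin(List.of("A", "B", "C")));           // A, B and C
        // Breaks when the last element contains a comma
        System.out.println(naiveJoin(List.of("A", "B", "Smith, John"))); // A, B, Smith and John
    }
}
```

In the second call, the replacement hits the comma inside the last element instead of the separator before it, so the element itself gets corrupted.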
Let’s test our solution without an Oxford comma:
    assertEquals("", joinItemsAsNaturalLanguage(emptyList(), false));
    assertEquals("A", joinItemsAsNaturalLanguage(List.of("A"), false));
    assertEquals("A and B", joinItemsAsNaturalLanguage(List.of("A", "B"), false));
    assertEquals("A, B, C, D and I have a comma (,)",
      joinItemsAsNaturalLanguage(List.of("A", "B", "C", "D", "I have a comma (,)"), false));
Finally, let’s test with an Oxford comma:
    assertEquals("", joinItemsAsNaturalLanguage(emptyList(), true));
    assertEquals("A", joinItemsAsNaturalLanguage(List.of("A"), true));
    assertEquals("A and B", joinItemsAsNaturalLanguage(List.of("A", "B"), true));
    assertEquals("A, B, C, D, and I have a comma (,)",
      joinItemsAsNaturalLanguage(List.of("A", "B", "C", "D", "I have a comma (,)"), true));
4. Conclusion
In this article, we discussed the problem of joining a list of strings in a natural language-like manner. Also, we learned how to create a method to solve this problem.
As always, the complete source code for the examples is available over on GitHub.