1. Overview
In this tutorial, we’ll discuss several techniques for removing repeated characters from a string in Java.
For each technique, we’ll also talk briefly about its time and space complexity.
2. Using distinct
Let’s start by removing the duplicates from our string using the distinct method introduced in Java 8.
Below, we’re obtaining an instance of an IntStream from a given string object. Then, we’re using the distinct method to remove the duplicates. Finally, we’re calling the forEach method to loop over the distinct characters and append them to our StringBuilder:
    StringBuilder sb = new StringBuilder();
    str.chars().distinct().forEach(c -> sb.append((char) c));
Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string
Auxiliary Space: O(n) – since distinct has to remember every character it has already seen (typically in a hash-based set) and we’re also storing the resulting string in a StringBuilder object
Maintains Order: Yes – on an ordered stream such as the one returned by chars, distinct preserves the encounter order and keeps the first occurrence of each character
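To make the snippet above easier to try out, here is a minimal, self-contained sketch. The class and method names (DistinctDemo, removeDuplicates, removeDuplicateCodePoints) are hypothetical and chosen only for illustration. The second method is an assumption beyond the original snippet: it works on code points instead of char values, which may be preferable if the input can contain characters outside the Basic Multilingual Plane.

```java
public class DistinctDemo {

    // Hypothetical helper: wraps the snippet from this section in a method.
    // Works on UTF-16 code units and keeps the first occurrence of each one.
    static String removeDuplicates(String str) {
        StringBuilder sb = new StringBuilder();
        str.chars().distinct().forEach(c -> sb.append((char) c));
        return sb.toString();
    }

    // Hypothetical variant: works on code points, so a surrogate pair
    // is treated as one character instead of two separate char values.
    static String removeDuplicateCodePoints(String str) {
        return str.codePoints()
          .distinct()
          .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
          .toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicates("programming")); // prints "progamin"
    }
}
```

For plain ASCII input, both methods should return the same result.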
While it’s convenient that Java 8 handles this task for us, let’s compare it with a few attempts to roll our own.
3. Using indexOf
The naive approach to removing duplicates from a string simply involves looping over the input and using the indexOf method to check whether the current character occurs again later in the string. We append a character to the result only if it has no later occurrence:
    StringBuilder sb = new StringBuilder();
    int idx;
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        idx = str.indexOf(c, i + 1);
        if (idx == -1) {
            sb.append(c);
        }
    }
Time Complexity: O(n * n) – for each character, the indexOf method runs through the remaining string
Auxiliary Space: O(n) – linear space is required since we’re using the StringBuilder to store the result
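Maintains Order: Yes – although this loop keeps the last occurrence of each character rather than the first, so the result can differ from that of distinct (for example, "abca" becomes "bca" instead of "abc")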
This method has the same space complexity as the first approach but is much slower.
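Because the loop in this section only appends a character when it doesn't occur again later, it's the last occurrence of each character that survives. The sketch below contrasts that behavior with a variant that keeps the first occurrence. The class and method names are hypothetical, and the first-occurrence variant is an illustration rather than part of the original snippet:

```java
public class IndexOfDemo {

    // The approach from this section: a character is appended only if it
    // does not occur again later, so the last occurrence of each one is kept.
    static String keepLastOccurrences(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (str.indexOf(c, i + 1) == -1) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Illustrative variant: a character is appended only if its first
    // occurrence in the input is the current position.
    static String keepFirstOccurrences(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (str.indexOf(c) == i) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepLastOccurrences("banana"));  // prints "bna"
        System.out.println(keepFirstOccurrences("banana")); // prints "ban"
    }
}
```

Both variants are still O(n * n) in the worst case, since each call to indexOf may scan a large part of the input.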
4. Using a Character Array
We can also remove duplicates from our string by converting it into a char array and then looping over each character and comparing it to all subsequent characters.
As we can see below, we’re using two nested for loops to check whether each element is repeated later in the string. If a duplicate is found, we don’t append the current element to the StringBuilder:
    char[] chars = str.toCharArray();
    StringBuilder sb = new StringBuilder();
    boolean repeatedChar;
    for (int i = 0; i < chars.length; i++) {
        repeatedChar = false;
        for (int j = i + 1; j < chars.length; j++) {
            if (chars[i] == chars[j]) {
                repeatedChar = true;
                break;
            }
        }
        if (!repeatedChar) {
            sb.append(chars[i]);
        }
    }
Time Complexity: O(n * n) – we have an inner and an outer loop both traversing the input string
Auxiliary Space: O(n) – linear space is required since the chars variable stores a new copy of the string input and we’re also using the StringBuilder to save the result
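Maintains Order: Yes – since each character is compared only with the characters after it, the last occurrence of each character is the one that is kept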
Again, our second attempt performs poorly compared to the Core Java offering, but let’s see where we get with our next attempt.
5. Using Sorting
Alternatively, repeated characters can be eliminated by sorting our input string to group duplicates. In order to do that, we have to convert the string to a char array and sort it using the Arrays.sort method. Finally, we’ll iterate over the sorted char array.
During every iteration, we’re going to compare each element of the array with the previous element. If the elements are different then we’ll append the current character to the StringBuilder:
    StringBuilder sb = new StringBuilder();
    if (!str.isEmpty()) {
        char[] chars = str.toCharArray();
        Arrays.sort(chars);
        sb.append(chars[0]);
        for (int i = 1; i < chars.length; i++) {
            if (chars[i] != chars[i - 1]) {
                sb.append(chars[i]);
            }
        }
    }
Time Complexity: O(n log n) – the sort uses a dual-pivot Quicksort which offers O(n log n) performance on many data sets
Auxiliary Space: O(n) – since the toCharArray method makes a copy of the input String
Maintains Order: No – the characters come out in sorted order rather than in the order in which they appear in the input
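To illustrate why the sorting approach doesn't maintain order, here is a minimal sketch that wraps the snippet above in a method and runs it on a sample input. The class and method names are hypothetical:

```java
import java.util.Arrays;

public class SortingDemo {

    // Hypothetical helper: sorts a copy of the input so that duplicates
    // become adjacent, then keeps one character per run of equal characters.
    static String removeDuplicates(String str) {
        StringBuilder sb = new StringBuilder();
        if (!str.isEmpty()) {
            char[] chars = str.toCharArray();
            Arrays.sort(chars);
            sb.append(chars[0]);
            for (int i = 1; i < chars.length; i++) {
                if (chars[i] != chars[i - 1]) {
                    sb.append(chars[i]);
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The input order is b, a, n, but the result is sorted
        System.out.println(removeDuplicates("banana")); // prints "abn"
    }
}
```

The isEmpty check matters here: without it, chars[0] would throw an ArrayIndexOutOfBoundsException for an empty input.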
Let’s try that again with our final attempt.
6. Using a Set
Another way to remove repeated characters from a string is through the use of a Set. If we do not care about the order of characters in our output string we can use a HashSet. Otherwise, we can use a LinkedHashSet to maintain the insertion order.
In both cases, we’ll loop over the input string and add each character to the Set. Once the characters are inserted into the set, we’ll iterate over it to add them to the StringBuilder and return the resulting string:
    StringBuilder sb = new StringBuilder();
    Set<Character> linkedHashSet = new LinkedHashSet<>();
    for (int i = 0; i < str.length(); i++) {
        linkedHashSet.add(str.charAt(i));
    }
    for (Character c : linkedHashSet) {
        sb.append(c);
    }
Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string
Auxiliary Space: O(n) – space required for the Set depends on the size of the input string; also, we’re using the StringBuilder to store the result
Maintains Order: LinkedHashSet – Yes, HashSet – No
And now, we’ve matched the Core Java approach! Unsurprisingly, this logic is very similar to what distinct does for us.
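The choice between HashSet and LinkedHashSet can be sketched with a single helper that accepts the Set to use. The class name, method name, and the idea of passing the set in as a parameter are assumptions made for illustration only:

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class SetDemo {

    // Hypothetical helper: the caller decides which Set implementation to use.
    // The set is expected to be empty when it is passed in.
    static String removeDuplicates(String str, Set<Character> set) {
        for (int i = 0; i < str.length(); i++) {
            set.add(str.charAt(i));
        }
        StringBuilder sb = new StringBuilder(set.size());
        for (Character c : set) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order
        System.out.println(removeDuplicates("banana", new LinkedHashSet<>())); // prints "ban"

        // HashSet makes no guarantee about iteration order, so the same
        // three characters may be printed in a different order
        System.out.println(removeDuplicates("banana", new HashSet<>()));
    }
}
```

With a LinkedHashSet, the first occurrence of each character determines its position, which matches the result of the distinct approach.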
7. Conclusion
In this article, we covered a few ways to remove repeated characters from a string in Java. We also looked at the time and space complexity of each of these methods.
As always, code snippets can be found over on GitHub.