Remove Characters From a String That Are in the Other String

Last modified: January 8, 2024

1. Overview

When working with Java strings, we often need to filter one string based on the contents of another. Removing characters from a string based on their presence in another string is one such problem.

In this tutorial, we’ll explore various techniques to achieve this task.

2. Introduction to the Problem

As usual, an example can help us understand the problem quickly. Let’s say we have two strings:

String STRING = "a b c d e f g h i j";
String OTHER = "bdfhj";

Our goal is to remove every character from STRING that is present in OTHER. Thus, we expect to get this string as the result:

"a  c  e  g  i "

We’ll learn various approaches to solving this problem in this tutorial. Also, we’ll unit test these solutions to verify whether they produce the expected result.

3. Using Nested Loops

We know a string can be easily split into a char array using the standard toCharArray() method. So, a straightforward and classic approach is first converting the two strings to two char arrays. Then, for each character in STRING, we decide whether to remove it or not by checking if it’s present in OTHER.

We can use nested for loops to implement this logic:

String nestedLoopApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    for (char c : theString.toCharArray()) {
        boolean found = false;
        for (char o : other.toCharArray()) {
            if (c == o) {
                found = true;
                break;
            }
        }
        if (!found) {
            sb.append(c);
        }
    }
    return sb.toString();
}

It’s worth noting that, since Java strings are immutable, we use StringBuilder instead of the ‘+’ operator to build the result, which avoids creating a new String object on every concatenation.

Next, let’s create a test:

String result = nestedLoopApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);

The test passes if we give it a run, so the method does the job.

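Since we scan the string OTHER for each character in STRING, the time complexity of this solution is O(n × m), where n and m are the lengths of the two strings. When the strings are of similar length, this amounts to O(n²).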
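
To make the cost concrete, the following sketch counts the character comparisons the nested loops perform on our example input. The class name ComparisonCount and the method countComparisons() are hypothetical names introduced only for this illustration; the loop structure mirrors nestedLoopApproach().

```java
public class ComparisonCount {

    // Same nested-loop scan as nestedLoopApproach(), but it counts
    // character comparisons instead of building the result string.
    static int countComparisons(String theString, String other) {
        int comparisons = 0;
        for (char c : theString.toCharArray()) {
            for (char o : other.toCharArray()) {
                comparisons++;
                if (c == o) {
                    break; // found: stop scanning 'other' early
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        String string = "a b c d e f g h i j"; // 19 characters
        String other = "bdfhj";                // 5 characters
        // The upper bound is 19 * 5 = 95; the early break saves a few comparisons.
        System.out.println(countComparisons(string, other)); // prints 85
    }
}
```

The count stays close to the product of the two lengths, which is why the work grows quadratically when both strings get longer.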

4. Replacing the Inner Loop With the indexOf() Method

In the nested loops solution, we created the boolean flag found to record whether the current character has been found in the OTHER string, and then decided whether to keep or discard the character by checking that flag.

Java provides the String.indexOf() method that allows us to locate a given character in a string. Further, if the string doesn’t contain the given character, the method returns -1.
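
As a quick illustration of this behavior, here is a minimal sketch. The class IndexOfDemo and the helper isPresent() are hypothetical names used only for this example; String.indexOf() is the standard method.

```java
public class IndexOfDemo {

    // True when 'other' contains the character c.
    static boolean isPresent(String other, char c) {
        return other.indexOf(c) != -1;
    }

    public static void main(String[] args) {
        String other = "bdfhj";
        System.out.println(other.indexOf('d'));    // prints 1 (zero-based index)
        System.out.println(other.indexOf('a'));    // prints -1 (not found)
        System.out.println(isPresent(other, 'j')); // prints true
    }
}
```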

So, if we make use of the String.indexOf() method, the inner loop and the found flag aren’t required:

String loopAndIndexOfApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    for (char c : theString.toCharArray()) {
        if (other.indexOf(c) == -1) {
            sb.append(c);
        }
    }
    return sb.toString();
}

As we can see, this method’s code is easier to understand than the nested loops one, and it passes the test as well:

String result = loopAndIndexOfApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);

Although this implementation is compact and easy to read, the String.indexOf() method internally searches for the target character by looping through the string, so the time complexity is still O(n²).

Next, let’s see if we can find a solution with lower time complexity.

5. Using a HashSet

HashSet is a commonly used collection data structure. It stores the elements in an internal HashMap.

Since computing a character's hash takes constant time, HashSet's contains() method is an O(1) operation on average.
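
The lookup can be tried in isolation with a small sketch like the one below. The class HashSetLookupDemo and the helper toCharSet() are hypothetical names chosen for this illustration; HashSet and its add() and contains() methods come from java.util.

```java
import java.util.HashSet;
import java.util.Set;

public class HashSetLookupDemo {

    // Collects the distinct characters of 'text' into a HashSet.
    static Set<Character> toCharSet(String text) {
        Set<Character> set = new HashSet<>();
        for (char c : text.toCharArray()) {
            set.add(c); // char is autoboxed to Character
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Character> set = toCharSet("bdfhj");
        System.out.println(set.contains('d'));          // prints true
        System.out.println(set.contains('a'));          // prints false
        System.out.println(toCharSet("banana").size()); // prints 3
    }
}
```

Because a set keeps only distinct elements, repeated characters in the source string don't enlarge it, so each lookup stays cheap regardless of how often a character repeats.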

Therefore, we can first store all characters in the OTHER string in a HashSet and then check each character from STRING in the HashSet:

// Requires: import java.util.HashSet; import java.util.Set;
String hashSetApproach(String theString, String other) {
    StringBuilder sb = new StringBuilder();
    Set<Character> set = new HashSet<>(other.length());
    for (char c : other.toCharArray()) {
        set.add(c);
    }

    for (char i : theString.toCharArray()) {
        if (set.contains(i)) {
            continue;
        }
        sb.append(i);
    }
    return sb.toString();
}

As the code above shows, the implementation is quite straightforward. Now, let’s delve into its performance.

First, we iterate through OTHER to populate the Set object, which is an O(n) operation. Then, for each character in STRING, we call the set.contains() method. This results in n times O(1), which is another O(n) operation. Therefore, the entire solution comprises two O(n) operations.

However, since the factor of two is a constant, the overall time complexity of the solution remains O(n). This is a significant improvement over the previous O(n²) solutions, and we can expect it to run considerably faster on long inputs.

Finally, if we test the hashSetApproach() method, it gives the expected result:

String result = hashSetApproach(STRING, OTHER);
assertEquals("a  c  e  g  i ", result);
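
Beyond the single example input, it can be useful to check that the three approaches agree on a few edge cases such as empty strings, repeated characters, and differing letter case. The sketch below restates the three methods as static methods so it runs on its own; the class RemoveCharsEdgeCases, the helper allAgree(), and the extra inputs are assumptions added for illustration and aren't part of the original test suite.

```java
import java.util.HashSet;
import java.util.Set;

public class RemoveCharsEdgeCases {

    static String nestedLoopApproach(String theString, String other) {
        StringBuilder sb = new StringBuilder();
        for (char c : theString.toCharArray()) {
            boolean found = false;
            for (char o : other.toCharArray()) {
                if (c == o) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String loopAndIndexOfApproach(String theString, String other) {
        StringBuilder sb = new StringBuilder();
        for (char c : theString.toCharArray()) {
            if (other.indexOf(c) == -1) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static String hashSetApproach(String theString, String other) {
        StringBuilder sb = new StringBuilder();
        Set<Character> set = new HashSet<>();
        for (char c : other.toCharArray()) {
            set.add(c);
        }
        for (char c : theString.toCharArray()) {
            if (!set.contains(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // True when all three approaches return 'expected' for the given input.
    static boolean allAgree(String theString, String other, String expected) {
        return expected.equals(nestedLoopApproach(theString, other))
            && expected.equals(loopAndIndexOfApproach(theString, other))
            && expected.equals(hashSetApproach(theString, other));
    }

    public static void main(String[] args) {
        System.out.println(allAgree("a b c d e f g h i j", "bdfhj", "a  c  e  g  i ")); // prints true
        System.out.println(allAgree("", "bdfhj", ""));       // prints true: nothing to remove from
        System.out.println(allAgree("abc", "", "abc"));      // prints true: nothing to remove
        System.out.println(allAgree("banana", "aa", "bnn")); // prints true: duplicates are harmless
        System.out.println(allAgree("Hello", "h", "Hello")); // prints true: matching is case-sensitive
    }
}
```

Note that all three approaches compare char values, so the matching is case-sensitive and works on individual UTF-16 code units.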

6. Conclusion

In this article, we explored three different approaches to removing characters from one string based on their presence in another.

Furthermore, we analyzed the time complexity of each approach. Both the nested loops and the loop using indexOf() have O(n²) time complexity, while the HashSet solution runs in O(n) time and is the most efficient of the three.

As always, the complete source code for the examples is available on GitHub.
