Find the Difference Between Two Sets

Last modified: March 28, 2022

1. Overview

Set is one of the commonly used collection types in Java. Today, we’ll discuss how to find the difference between two given sets.

2. Introduction to the Problem

Before we take a closer look at the implementations, we first need to understand the problem. As usual, an example helps us understand the requirement quickly.

Let’s say we have two Set objects, set1 and set2:

set1: {"Kotlin", "Java", "Rust", "Python", "C++"}
set2: {"Kotlin", "Java", "Rust", "Ruby", "C#"}

As we can see, both sets contain some programming language names. The requirement “Finding the difference between two Sets” may have two variants:

  • Asymmetric difference – Finding those elements that are contained by set1 but not contained by set2; in this case, the expected result is {"Python", "C++"}
  • Symmetric difference – Finding the elements in either of the sets but not in their intersection; if we look at our example, the result should be {"Python", "C++", "Ruby", "C#"}
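
In set notation, the two variants can be written as follows. This is only a restatement of the definitions above, with A standing for set1 and B for set2 (the symbols are our own shorthand, not part of the Java API):

```latex
A \setminus B = \{\, x \mid x \in A \text{ and } x \notin B \,\}

A \,\triangle\, B = (A \setminus B) \cup (B \setminus A) = (A \cup B) \setminus (A \cap B)
```

For our example, the intersection is {"Kotlin", "Java", "Rust"}, so A \ B is {"Python", "C++"}, B \ A is {"Ruby", "C#"}, and their union is the symmetric difference.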

In this tutorial, we’ll address both scenarios. First, we’ll focus on finding the asymmetric difference. After that, we’ll explore finding the symmetric difference between the two sets.

Next, let’s see them in action.

3. Asymmetric Difference

3.1. Using the Standard removeAll Method

The Set interface provides a removeAll method, which it inherits from the Collection interface.

The removeAll method accepts a Collection object as the parameter and removes all of its elements from the given Set object. So, if we pass set2 as the parameter by calling set1.removeAll(set2), the elements remaining in set1 are the result.

For simplicity, let’s show it as a unit test:

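// Collectors.toSet() makes no guarantee about mutability, so we request a HashSet explicitly
Set<String> set1 = Stream.of("Kotlin", "Java", "Rust", "Python", "C++").collect(Collectors.toCollection(HashSet::new));
Set<String> set2 = Stream.of("Kotlin", "Java", "Rust", "Ruby", "C#").collect(Collectors.toCollection(HashSet::new));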
Set<String> expectedOnlyInSet1 = Set.of("Python", "C++");

set1.removeAll(set2);

assertThat(set1).isEqualTo(expectedOnlyInSet1);

As the method above shows, first, we initialize the two Set objects using Stream. Then, after calling the removeAll method, the set1 object contains the expected elements.

This approach is pretty straightforward. However, the drawback is obvious: After removing the common elements from set1, the original set1 is modified.

Therefore, we need to back up the original set1 object if we still need it after calling the removeAll method, or we have to create a new mutable set object if set1 is an immutable Set.
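
One way to do this is to call removeAll on a copy. The following is a minimal sketch using only the JDK; the class name RemoveAllOnCopy and the method name differenceOnCopy are hypothetical and chosen just for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, named here only for illustration.
final class RemoveAllOnCopy {

    // Returns the elements of 'source' that are not in 'toRemove',
    // leaving 'source' itself unchanged.
    static <T> Set<T> differenceOnCopy(Set<T> source, Set<T> toRemove) {
        // The HashSet copy is mutable even when 'source' is immutable.
        Set<T> copy = new HashSet<>(source);
        copy.removeAll(toRemove);
        return copy;
    }

    public static void main(String[] args) {
        Set<String> set1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> set2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        Set<String> onlyInSet1 = differenceOnCopy(set1, set2);

        System.out.println(onlyInSet1.equals(Set.of("Python", "C++"))); // the difference
        System.out.println(set1.size());                                // set1 still has all its elements
    }
}
```

The cost of this variant is one extra copy of set1, which is usually acceptable for small sets.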

Next, let’s take a look at another approach to returning the asymmetric difference in a new Set object without modifying the original set.

3.2. Using the Stream.filter Method

The Stream API has been around since Java 8. It allows us to filter elements from a collection using the Stream.filter method.

We can also solve this problem using Stream.filter without modifying the original set1 object. Let’s first initialize the two sets as immutable sets:

Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");
Set<String> expectedOnlyInSet1 = Set.of("Python", "C++");

Java 9 introduced the static of method on the Set interface. It allows us to initialize an immutable Set object conveniently. As a result, if we attempt to modify immutableSet1, an UnsupportedOperationException will be thrown.
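
To make the immutability concrete, here is a small sketch that checks whether removeAll is rejected. The class ImmutableSetDemo and the method rejectsRemoveAll are hypothetical names introduced only for this illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class, named here only for illustration.
final class ImmutableSetDemo {

    // Returns true if calling removeAll on 'target' is rejected
    // with an UnsupportedOperationException.
    static <T> boolean rejectsRemoveAll(Set<T> target, Set<?> toRemove) {
        try {
            target.removeAll(toRemove);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        // A set created by Set.of rejects the modification.
        System.out.println(rejectsRemoveAll(immutableSet1, immutableSet2));

        // A HashSet copy of the same elements accepts it.
        System.out.println(rejectsRemoveAll(new HashSet<>(immutableSet1), immutableSet2));
    }
}
```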

Next, let’s write a unit test that uses Stream.filter to find the difference:

Set<String> actualOnlyInSet1 = immutableSet1.stream().filter(e -> !immutableSet2.contains(e)).collect(Collectors.toSet());
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

As we can see in the test above, the key is filter(e -> !immutableSet2.contains(e)). Here, we only take the elements that are in immutableSet1 but not in immutableSet2.

If we execute this test method, it passes without any exception. It means this approach works, and the original sets are not modified.
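
If we need this in more than one place, the same filter can be wrapped in a generic helper. This is only a sketch; the class name StreamDifference and the method name difference are hypothetical and not part of any library:

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, named here only for illustration.
final class StreamDifference {

    // Collects the elements of 'first' that 'second' does not contain
    // into a new Set; neither input set is modified.
    static <T> Set<T> difference(Set<T> first, Set<?> second) {
        return first.stream()
          .filter(e -> !second.contains(e))
          .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        Set<String> onlyInSet1 = difference(immutableSet1, immutableSet2);
        System.out.println(onlyInSet1.equals(Set.of("Python", "C++")));
    }
}
```

One caveat worth keeping in mind: sets created by Set.of throw a NullPointerException when contains is called with null, so this sketch assumes the first set holds no null elements when the second one is such a set.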

3.3. Using the Guava Library

Guava is a popular Java library that ships with some new collection types and convenient helper methods. Guava provides a method to find the asymmetric difference between two sets, so we can use it to solve our problem easily.

But first, we need to include the library in our classpath. Let’s say we manage the project dependencies with Maven. In that case, we need to add the Guava dependency to the pom.xml:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.1-jre</version>
</dependency>

Once Guava is available in our Java project, we can use its Sets.difference method to get the expected result:

Set<String> actualOnlyInSet1 = Sets.difference(immutableSet1, immutableSet2);
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

It’s worth mentioning that the Sets.difference method returns an unmodifiable Set view containing the result. It means:

  • We cannot modify the returned set
  • Because the result is a view, if either original set is mutable, later changes to it are reflected in the resulting set view

3.4. Using the Apache Commons Library

Apache Commons is another widely used library. The Apache Commons Collections4 library provides many handy collection-related methods that complement the standard Collection API.

Before we start using it, let’s add the dependency to our pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-collections4</artifactId>
    <version>4.4</version>
</dependency>

We can find the latest version in the Maven Central repository.

The commons-collections4 library has a CollectionUtils.removeAll method. It’s similar to the standard Collection.removeAll method but returns the result in a new Collection object instead of modifying the first Collection object.

Next, let’s test it with two immutable Set objects:

Set<String> actualOnlyInSet1 = new HashSet<>(CollectionUtils.removeAll(immutableSet1, immutableSet2));
assertThat(actualOnlyInSet1).isEqualTo(expectedOnlyInSet1);

The test will pass if we execute it. But, we should note that the CollectionUtils.removeAll method returns the result in the Collection type.

If a more specific type is required, for instance Set in our case, we’ll need to convert it manually. In the test method above, we’ve initialized a new HashSet object using the returned collection.

4. Symmetric Difference

So far, we’ve learned how to get the asymmetric difference between two sets. Now, let’s take a closer look at the other scenario: finding the symmetric difference between two sets.

We’ll address two approaches to get the symmetric difference from our two immutable set examples.

The expected result is:

Set<String> expectedDiff = Set.of("Python", "C++", "Ruby", "C#");

Next, let’s see how to solve the problem.

4.1. Using HashMap

One way to solve the problem is to first create a Map<T, Integer> object.

Then, we iterate through the two given sets and put each element into the map as the key. If the key already exists in the map, the element is common to both sets, so we mark it with a special value, for example Integer.MAX_VALUE. Otherwise, we put the element and the value 1 as a new entry in the map.

Finally, we find out the keys whose value is 1 in the map, and these keys are the symmetric difference between two given sets.

Next, let’s implement the idea in Java:

public static <T> Set<T> findSymmetricDiff(Set<T> set1, Set<T> set2) {
    Map<T, Integer> map = new HashMap<>();
    set1.forEach(e -> putKey(map, e));
    set2.forEach(e -> putKey(map, e));
    return map.entrySet().stream()
      .filter(e -> e.getValue() == 1)
      .map(Map.Entry::getKey)
      .collect(Collectors.toSet());
}

private static <T> void putKey(Map<T, Integer> map, T key) {
    if (map.containsKey(key)) {
        map.replace(key, Integer.MAX_VALUE);
    } else {
        map.put(key, 1);
    }
}

Now, let’s test our solution and see if it can give the expected result:

Set<String> actualDiff = SetDiff.findSymmetricDiff(immutableSet1, immutableSet2);
assertThat(actualDiff).isEqualTo(expectedDiff);

The test passes if we run it. That is to say, our implementation works as expected.
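
As a point of comparison, the symmetric difference can also be computed with the standard Set operations alone, as the union of the two sets minus their intersection. This is a different technique from the HashMap approach above, shown only as a sketch; the class name SymmetricDiffBySetOps and the method name symmetricDiff are hypothetical:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, named here only for illustration.
final class SymmetricDiffBySetOps {

    // Computes (a union b) minus (a intersect b) on copies,
    // so neither input set is modified.
    static <T> Set<T> symmetricDiff(Set<T> a, Set<T> b) {
        Set<T> union = new HashSet<>(a);
        union.addAll(b);

        Set<T> intersection = new HashSet<>(a);
        intersection.retainAll(b);

        union.removeAll(intersection);
        return union;
    }

    public static void main(String[] args) {
        Set<String> immutableSet1 = Set.of("Kotlin", "Java", "Rust", "Python", "C++");
        Set<String> immutableSet2 = Set.of("Kotlin", "Java", "Rust", "Ruby", "C#");

        Set<String> diff = symmetricDiff(immutableSet1, immutableSet2);
        System.out.println(diff.equals(Set.of("Python", "C++", "Ruby", "C#")));
    }
}
```

This version creates two temporary sets, whereas the HashMap approach creates a single map, so which one reads better is largely a matter of taste.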

4.2. Using the Apache Commons Library

We’ve already introduced the Apache Commons library when finding the asymmetric difference between two sets. Actually, the commons-collections4 library has a handy SetUtils.disjunction method to return the symmetric difference between two sets directly:

Set<String> actualDiff = SetUtils.disjunction(immutableSet1, immutableSet2);
assertThat(actualDiff).isEqualTo(expectedDiff);

As the method above shows, unlike the CollectionUtils.removeAll method, the SetUtils.disjunction method returns a Set object. We don’t need to manually convert it to Set.

5. Conclusion

In this article, we’ve explored how to find differences between two Set objects through examples. Further, we’ve discussed two variants of this problem: finding asymmetric differences and symmetric differences.

We’ve addressed solving the two variants using the standard Java API and widely used external libraries, such as Apache Commons-Collections and Guava.

As always, the source code used in this tutorial is available over on GitHub.
