Merge Two Arrays and Remove Duplicates in Java – 用 Java 合并两个数组并删除重复数据

最后修改: 2023年 9月 23日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

In this tutorial, we’ll explore various methods of merging the contents of two arrays and subsequently eliminating the duplicates. It is similar to a union operation:

在本教程中,我们将探讨 合并两个数组的内容并随后删除重复内容的各种方法。这类似于联合操作:

  • array1 Union array2

For this, we’ll consider two arrays of integers. Basically, if the following are the two arrays:

为此,我们将考虑两个整数数组。基本上,如果下面是两个数组:

  • arr1 = { 3, 2, 1, 4, 5, 6, 8, 7, 6, 9 }
  • arr2 = { 8, 9, 10, 11, 12, 13, 15, 14, 15, 14, 16, 17 }

Then the result should be:

那么结果应该是

  • mergedArray = { 3, 2, 1, 4, 5, 6, 8, 7, 9, 10, 11, 12, 13, 14, 15, 16, 17 }

2. Approach with Basic Array Operations

2.基本数组操作方法

Normally, this requirement can be implemented by using Set or Stream API, explained in the later sections. But those libraries are resource-intensive. Hence, we’ll focus more on a traditional approach by using basic array operations while iterating through it.

通常,这一要求可以通过使用 SetStream API 来实现,这将在后面的章节中解释。但这些库都是资源密集型的。因此,我们将更多地关注传统方法,即在遍历数组时使用基本的数组操作。

2.1. Approach for Unsorted Arrays

2.1.无排序数组的处理方法

First, let’s define a function that would merge the arrays. Then, we’ll create a method to remove the duplicates from it. Alternatively, we can remove the duplicates from the arrays and then merge them. But this won’t make much of a difference with respect to performance.

首先,让我们定义一个合并数组的函数。或者,我们也可以先删除数组中的重复数据,然后再进行合并。但这样做在性能方面不会有太大影响。

Hence, let’s begin with the method that merges two arrays:

因此,让我们从合并两个数组的方法开始:

static int[] mergeArrays(int[] arr1, int[] arr2) {
    int[] mergedArrays = new int[arr1.length + arr2.length];
    System.arraycopy(arr1, 0, mergedArrays, 0, arr1.length);
    System.arraycopy(arr2, 0, mergedArrays, arr1.length, arr2.length);
    return mergedArrays;
}

Going forward, we’re going to use the above method for merging the arrays in all the sections.

接下来,我们将使用上述方法合并所有部分的数组。

Now, let’s implement the method to remove the duplicates from the merged arrays:

现在,让我们来实现从合并数组中删除重复数据的方法:

static int[] removeDuplicate(int[] arr) {
    int[] withoutDuplicates = new int[arr.length];
    int i = 0;

    for (int element : arr) {
        if (!isElementPresent(withoutDuplicates, element)) {
            withoutDuplicates[i] = element;
            i++;
        }
    }
    int[] truncatedArray = new int[i];
    System.arraycopy(withoutDuplicates, 0, truncatedArray, 0, i);
    return truncatedArray;
}

static boolean isElementPresent(int[] arr, int element) {
    for (int el : arr) {
        if (el == element) {
            return true;
        }
    }
    return false;
}

In the above method, removeDuplicate(), the withoutDuplicates array is populated with the elements of the merged array only if the element is not pre-existing in it.

在上述方法removeDuplicate()中,withoutDuplicates数组只有在合并后的数组中不存在元素的情况下才会被填充。

By using the above methods, let’s define mergeAndRemoveDuplicates():

通过使用上述方法,我们来定义 mergeAndRemoveDuplicates()

public static int[] mergeAndRemoveDuplicates(int[] arr1, int[] arr2) {
    return removeDuplicate(mergeArrays(arr1, arr2));
}

Let’s see how that works:

让我们拭目以待:

@Test
public void givenNoLibraryAndUnSortedArrays_whenArr1andArr2_thenMergeAndRemoveDuplicates() {
    int[] arr1 = {3, 2, 1, 4, 5, 6, 8, 7, 9};
    int[] arr2 = {8, 9, 10, 11, 12, 13, 15, 14, 15, 14, 16, 17};
    int[] expectedArr = {3, 2, 1, 4, 5, 6, 8, 7, 9, 10, 11, 12, 13, 15, 14, 16, 17};

    int[] mergedArr = MergeArraysAndRemoveDuplicate.mergeAndRemoveDuplicates(arr1, arr2);

    assertArrayEquals(expectedArr, mergedArr);
}

It proves we achieved the expected result. Interestingly, this approach preserves the order of the array elements as well.

这证明我们达到了预期效果。有趣的是,这种方法还保留了数组元素的顺序。

Since removing duplicates involves comparing each element in the merged array with all other elements, the time complexity of this approach is close to O(n x n).

由于删除重复内容涉及将合并数组中的每个元素与所有其他元素进行比较,因此这种方法的时间复杂度接近 O(n x n)。

2.2. Approach for Sorted Arrays

2.2.排序数组的方法

Let’s consider a scenario where the elements are already sorted in the array. Also, in the second array, the first element is equal to or greater than the last element of the first array.

假设数组中的元素已经排序。另外,在第二个数组中,第一个元素等于或大于第一个数组的最后一个元素。

In such a case, we can use the following method to get rid of duplicate elements:

在这种情况下,我们可以使用下面的方法来删除重复元素:

public static int[] removeDuplicateOnSortedArray(int[] arr) {
    int[] uniqueArray = new int[arr.length];
    uniqueArray[0] = arr[0];
    int uniqueCount = 1;

    for (int i = 1; i < arr.length; i++) {
        if (arr[i] != arr[i - 1]) {
            uniqueArray[uniqueCount] = arr[i];
            uniqueCount++;
        }
    }
    int[] truncatedArray = new int[uniqueCount];
    System.arraycopy(uniqueArray, 0, truncatedArray, 0, uniqueCount);
    return truncatedArray;
}

This approach is more efficient because it only compares adjacent elements. If they are not equal, they are added to the uniqueArray array. In contrast, earlier, the isElementPresent() method compared the array element with all the other elements in the array.

这种方法更高效,因为它只比较相邻的元素。如果它们不相等,则将它们添加到 uniqueArray 数组中。相比之下,先前的 isElementPresent() 方法将数组元素与数组中的所有其他元素进行比较。

Let’s see removeDuplicateOnSortedArray() in action:

让我们看看 removeDuplicateOnSortedArray() 的实际操作:

@Test
public void givenNoLibraryAndSortedArrays_whenArr1andArr2_thenMergeAndRemoveDuplicates() {
    int[] arr1 = {1, 2, 3, 4, 5, 5, 6, 7, 7, 8};
    int[] arr2 = {8, 9, 10, 11, 12, 13, 14, 15, 15, 16, 17};
    int[] expectedArr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17};

    int[] mergedArr = MergeArraysAndRemoveDuplicate.mergeAndRemoveDuplicatesOnSortedArray(arr1, arr2);

    assertArrayEquals(expectedArr, mergedArr);
}

In a worst-case scenario, the time complexity of this approach is O(n).

在最坏的情况下,这种方法的时间复杂度为 O(n)。

3. Merge Arrays Using Set

3.使用 Set 合并数组

The Set doesn’t allow duplicate elements. Hence, this feature helps remove duplicates.

Set不允许使用重复元素。因此,该功能有助于删除重复元素

Let’s have a look at the method mergeAndRemoveDuplicatesUsingSet():

让我们来看看 mergeAndRemoveDuplicatesUsingSet() 方法:

public static int[] mergeAndRemoveDuplicatesUsingSet(int[] arr1, int[] arr2) {
    int[] mergedArr = mergeArrays(arr1, arr2);
    Set<Integer> uniqueInts = new HashSet<>();

    for (int el : mergedArr) {
        uniqueInts.add(el);
    }

    return getArrayFromSet(uniqueInts);
}

While adding the array elements into the Set uniqueInts, the elements are ignored if they are already present init. At the end, getArrayFromSet() converts the Set into an array.

在向 Set uniqueInts 中添加数组元素时,如果元素已经存在 init,则会被忽略。最后,getArrayFromSet()Set 转换为数组。

Let’s see how the above method behaves:

让我们看看上述方法是如何运行的:

@Test
public void givenSet_whenArr1andArr2_thenMergeAndRemoveDuplicates() {
    int[] arr1 = {3, 2, 1, 4, 5, 6, 8, 7, 9};
    int[] arr2 = {8, 9, 10, 11, 12, 13, 15, 14, 15, 14, 16, 17};
    int[] expectedArr = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17};

    int[] mergedArr = MergeArraysAndRemoveDuplicate.mergeAndRemoveDuplicatesUsingSet(arr1, arr2);

    assertArrayEquals(expectedArr, mergedArr);
}

It’s evident. that the arrays no longer preserve the order of the elements.

很明显,数组不再保留元素的顺序。

The execution time with this approach is dependent on the number of array elements. Hence, its time complexity is O(n).

这种方法的执行时间取决于数组元素的数量。因此,其时间复杂度为 O(n)。

4. Merge Arrays Using Stream

4.使用 Stream 合并数组

The powerful Stream API provides a very concise and declarative way of merging and removing duplicates from the arrays. For further details, let’s check out the method mergeAndRemoveDuplicatesUsingStream():

功能强大的 Stream API 提供了一种非常简洁和声明式的方法来合并和删除数组中的重复数据。有关详细信息,让我们查看方法 mergeAndRemoveDuplicatesUsingStream()

public static int[] mergeAndRemoveDuplicatesUsingStream(int[] arr1, int[] arr2) {
    Stream<Integer> s1 = Arrays.stream(arr1).boxed();
    Stream<Integer> s2 = Arrays.stream(arr2).boxed();
    return Stream.concat(s1, s2)
      .distinct()
      .mapToInt(Integer::intValue)
      .toArray();
}

Firstly, the arrays are converted individually into Stream. Then, the boxed method wraps the int elements in the arrays as Integer. Further, the Stream pipeline merges the arrays and then eliminates the duplicates from them.

首先,将数组单独转换为 Stream 。然后,盒式方法将数组中的 int 元素封装为 Integer 。此外,Stream 管道会合并数组,然后删除其中的重复内容。

Let’s see how it works:

让我们看看它是如何工作的:

@Test
public void givenStream_whenArr1andArr2_thenMergeAndRemoveDuplicates() {
    int[] arr1 = {3, 2, 1, 4, 5, 6, 8, 7, 9};
    int[] arr2 = {8, 9, 10, 11, 12, 13, 15, 14, 15, 14, 16, 17};
    int[] expectedArr = {3, 2, 1, 4, 5, 6, 8, 7, 9, 10, 11, 12, 13, 15, 14, 16, 17};

    int[] mergedArr = MergeArraysAndRemoveDuplicate.mergeAndRemoveDuplicatesUsingStream(arr1, arr2);

    assertArrayEquals(expectedArr, mergedArr);
}

It’s evident that the Stream approach preserves the order of the elements in the array.

很明显,Stream 方法保留了数组中元素的顺序。

The dominant factor in deciding the time complexity is the distinct() method in the stream, which is O(n). Overall, we can say that the time complexity of this approach is O(n).

决定时间复杂度的主要因素是流中的 distinct() 方法,其时间复杂度为 O(n)。总的来说,我们可以说这种方法的时间复杂度为 O(n)。

5. Conclusion

5.结论

In this tutorial, we examined various ways to merge two arrays and then remove the duplicates from them. The most concise approach uses Stream API. The set doesn’t preserve the order of the elements inserted in them. However, Set is extremely effective in removing duplicates. The other two approaches preserve the order of the elements in the array.

在本教程中,我们研究了合并两个数组并从中删除重复数据的各种方法。最简洁的方法是使用 Stream API。集合不保留插入元素的顺序。但是,Set 在删除重复元素方面非常有效。其他两种方法会保留数组中元素的顺序。

It is worth noting that Stream and Set are quite resource-intensive, and hence, it is advisable to avoid them wherever possible.

值得注意的是,StreamSet 非常耗费资源,因此建议尽可能避免使用它们。

As usual, the code examples are available over on GitHub.

与往常一样,代码示例可在 GitHub 上获取