Filtering a Java Collection by a List – 通过一个列表过滤一个Java集合

最后修改: 2019年 3月 8日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

Filtering a Collection by a List is a common business logic scenario. There are plenty of ways to achieve this. However, some may lead to under-performing solutions if not done properly.

通过ListCollection进行过滤是一种常见的业务逻辑场景。有很多方法可以实现这一点。但是,如果操作不当,有些方法可能会导致性能不佳的解决方案。

In this tutorial, we’ll compare some filtering implementations and discuss their advantages and drawbacks.

在本教程中,我们将比较一些过滤实现,并讨论其优点和缺点

2. Using a For-Each Loop

2.使用For-Each循环

We’ll begin with the most classic syntax, a for-each loop.

我们将从最经典的语法开始,一个for-each循环。

For this and all other examples in this article, we’ll use the following class:

对于本文和其他所有的例子,我们将使用以下类:

public class Employee {

    private Integer employeeNumber;
    private String name;
    private Integer departmentId;
    //Standard constructor, getters and setters.
}

We’ll also use the following methods for all examples, for simplicity’s sake:

为了简单起见,我们还将在所有的例子中使用以下方法。

private List<Employee> buildEmployeeList() {
    return Arrays.asList(
      new Employee(1, "Mike", 1),
      new Employee(2, "John", 1),
      new Employee(3, "Mary", 1),
      new Employee(4, "Joe", 2),
      new Employee(5, "Nicole", 2),
      new Employee(6, "Alice", 2),
      new Employee(7, "Bob", 3),
      new Employee(8, "Scarlett", 3));
}

private List<String> employeeNameFilter() {
    return Arrays.asList("Alice", "Mike", "Bob");
}

For our example, we’ll filter the first list of Employees based on the second list with Employee names to find only the Employees with those specific names.

在我们的例子中,我们将根据带有雇员名字的第二个列表来过滤第一个列表中的雇员,以便只找到带有这些特定名字的雇员

Now, let’s see the traditional approach – looping through both lists looking for matches:

现在,让我们看看传统的方法–在两个列表中循环寻找匹配:

@Test
public void givenEmployeeList_andNameFilterList_thenObtainFilteredEmployeeList_usingForEachLoop() {
    List<Employee> filteredList = new ArrayList<>();
    List<Employee> originalList = buildEmployeeList();
    List<String> nameFilter = employeeNameFilter();

    for (Employee employee : originalList) {
        for (String name : nameFilter) {
            if (employee.getName().equals(name)) {
                filteredList.add(employee);
                // break;
            }
        }
    }

    assertThat(filteredList.size(), is(nameFilter.size()));
}

This is a simple syntax, but it’s quite verbose, and actually quite inefficient. Simply put, it iterates through the Cartesian product of the two sets in order to get our answer.

这是一个简单的语法,但它相当啰嗦,而且实际上效率很低。简单地说,它迭代了两个集合的笛卡尔积,以便得到我们的答案。

Even adding a break to exit early will still iterate on the same order as a Cartesian product in the average case.

即使添加一个break来提前退出,在一般情况下,仍然会按照笛卡尔乘积的相同顺序进行迭代。

If we call the size of the employee list n, then nameFilter will be on the order just as big, giving us an O(n2classification.

如果我们把雇员名单的大小称为n,那么nameFilter将在顺序上一样大,给我们一个O(n2分类。

3. Using Streams and List#contains

3.使用流和List#contains

We’ll now refactor the previous method by using lambdas to simplify syntax and improve readability. Let’s also use the List#contains method as the lambda filter:

我们现在将通过使用lambdas来重构之前的方法,以简化语法并提高可读性。让我们也使用List#contains方法作为lambda过滤器

@Test
public void givenEmployeeList_andNameFilterList_thenObtainFilteredEmployeeList_usingLambda() {
    List<Employee> filteredList;
    List<Employee> originalList = buildEmployeeList();
    List<String> nameFilter = employeeNameFilter();

    filteredList = originalList.stream()
      .filter(employee -> nameFilter.contains(employee.getName()))
      .collect(Collectors.toList());

    assertThat(filteredList.size(), is(nameFilter.size()));
}

By using the Stream API, readability has been greatly improved, but our code remains as inefficient as our previous method because it’s still iterating through the Cartesian product internally. Thus, we have the same O(n2classification.

通过使用Stream API,可读性得到了极大的改善,但是我们的代码仍然和之前的方法一样效率低下,因为它仍然在内部迭代笛卡尔积。因此,我们有相同的O(n2) 分类。

4. Using Streams with HashSet

4.使用流与HashSet

To improve performance, we must use the HashSet#contains method. This method differs from List#contains because it performs a hash code lookup, giving us a constant-time number of operations:

为了提高性能,我们必须使用HashSet#contains方法。该方法与List#contains不同,因为它执行了一个哈希代码查找,为我们提供了一个恒定时间的操作数:

@Test
public void givenEmployeeList_andNameFilterList_thenObtainFilteredEmployeeList_usingLambdaAndHashSet() {
    List<Employee> filteredList;
    List<Employee> originalList = buildEmployeeList();
    Set<String> nameFilterSet = employeeNameFilter().stream().collect(Collectors.toSet());

    filteredList = originalList.stream()
      .filter(employee -> nameFilterSet.contains(employee.getName()))
      .collect(Collectors.toList());

    assertThat(filteredList.size(), is(nameFilterSet.size()));
}

By using HashSet, our code efficiency has vastly improved while not affecting readability. Since HashSet#contains runs in constant time, we’ve improved our classification to O(n).

通过使用HashSet我们的代码效率得到了极大的提高,同时不影响可读性。由于HashSet#contains在恒定时间内运行,我们的分类已经改进为O(n)。

5. Conclusion

5.结论

In this quick tutorial, we learned how to filter a Collection by a List of values and the drawbacks of using what may seem like the most straightforward method.

在这个快速教程中,我们学习了如何通过一个List的值来过滤Collection,以及使用看似最直接的方法的弊端。

We must always consider efficiency because our code might end up running in huge data sets, and performance issues could have catastrophic consequences in such environments.

我们必须始终考虑效率,因为我们的代码最终可能会在巨大的数据集中运行,而性能问题在这种环境中可能会产生灾难性的后果。

All code presented in this article is available over on GitHub.

本文介绍的所有代码都可以在GitHub上找到。