Performance of contains() in a HashSet vs ArrayList – HashSet与ArrayList中contains()的性能比较

最后修改: 2018年 8月 14日

中文/混合/英文(键盘快捷键:t)

1. Introduction

1.介绍

In this quick guide, we’re going to discuss the performance of the contains() method available in java.util.HashSet and java.util.ArrayList. They are both collections for storing and manipulating objects.

在这份快速指南中,我们将讨论contains()方法的性能,该方法在java.util.HashSet和java.util.ArrayList中可用。它们都是用于存储和操作对象的集合。

HashSet is a collection for storing unique elements. To learn more about the HashSet, check out this link.

HashSet是一个用于存储独特元素的集合。要了解关于HashSet的更多信息,请查看这个链接

ArrayList is a popular implementation of the java.util.List interface.

ArrayListjava.util.List接口的一个流行实现。

We have an extended article about the ArrayList available here.

我们有一篇关于ArrayList的扩展文章,可在这里查阅。

2. HashSet.contains()

2.HashSet.contains()

Internally, the HashSet implementation is based on a HashMap instance. The contains() method calls HashMap.containsKey(object).

在内部,HashSet的实现是基于一个HashMap实例的。contains()/em>方法调用HashMap.containsKey(object)/em>。

Here, it’s checking whether the object is in the internal map or not. The internal map stores data inside of the Nodes, known as buckets. Each bucket corresponds to a hash code generated with hashCode() method. So contains() is actually using hashCode() method to find the object’s location.

这里,它在检查object是否在内部地图中。内部地图在节点内部存储数据,被称为桶。每个桶对应于用hashCode()方法生成的哈希代码。所以contains()实际上是使用hashCode()方法来寻找对象的位置。

Now let’s determine the lookup time complexity. Before moving ahead, make sure you are familiar with Big-O notation.

现在让我们来确定查找时间的复杂性。在继续前进之前,请确保你熟悉Big-O记号

On average, the contains() of HashSet runs in O(1) time. Getting the object’s bucket location is a constant time operation. Taking into account possible collisions, the lookup time may rise to log(n) because the internal bucket structure is a TreeMap.

平均而言,HashSetcontains()运行时间O(1)获取对象的桶的位置是一个恒定的时间操作。考虑到可能的碰撞,查找时间可能会上升到log(n) ,因为内部的桶结构是一个TreeMap

This is an improvement from Java 7 which used a LinkedList for the internal bucket structure. In general, hash code collisions are rare. So we can consider the elements lookup complexity as O(1).

这是对Java 7的改进,后者使用LinkedList作为内部桶结构。一般来说,哈希代码的碰撞是很少的。所以我们可以认为元素查找的复杂性为O(1)

3. ArrayList.contains()

3.ArrayList.contains()

Internally, ArrayList uses the indexOf(object) method to check if the object is in the list. The indexOf(object) method iterates the entire array and compares each element with the equals(object) method.

在内部,ArrayList使用indexOf(object)方法来检查该对象是否在列表中indexOf(object)方法遍历整个数组,并用equals(object)方法比较每个元素。

Getting back to complexity analysis, the ArrayList.contains() method requires O(n) time. So the time we spend to find a specific object here depends on the number of items we have in the array.

回到复杂性分析,ArrayList.contains()方法需要O(n)时间。所以我们在这里找到一个特定对象所花费的时间取决于我们在数组中的项目数量。

4. Benchmark Testing

4.基准测试

Now, let’s warm up the JVM with the performance benchmark test. We’ll use the JMH (Java Microbenchmark Harness) OpenJDK product. To learn more about setup and execution, check out our useful guide.

现在,让我们用性能基准测试为JVM预热。我们将使用JMH(Java Microbenchmark Harness)OpenJDK产品。要了解有关设置和执行的更多信息,请查看我们的实用指南

To start, let’s create a simple CollectionsBenchmark class:

首先,让我们创建一个简单的CollectionsBenchmark类。

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
public class CollectionsBenchmark {

    @State(Scope.Thread)
    public static class MyState {
        private Set<Employee> employeeSet = new HashSet<>();
        private List<Employee> employeeList = new ArrayList<>();

        private long iterations = 1000;

        private Employee employee = new Employee(100L, "Harry");

        @Setup(Level.Trial)
        public void setUp() {

            for (long i = 0; i < iterations; i++) {
                employeeSet.add(new Employee(i, "John"));
                employeeList.add(new Employee(i, "John"));
            }

            employeeList.add(employee);
            employeeSet.add(employee);
        }
    }
}

Here, we create and initialize HashSet and an ArrayList of Employee objects:

在这里,我们创建并初始化HashSet和一个ArrayListEmployee对象。

public class Employee {

    private Long id;
    private String name;

    // constructor and getter setters go here
}

We add the employee = new Employee(100L, “Harry”) instance as the last elements to both collections. So we test the employee object’s lookup time for the worst possible case.

我们将employee = new Employee(100L, “Harry”) 实例作为最后一个元素添加到两个集合中。所以我们测试了employee对象的查找时间,这是最坏的情况。

@OutputTimeUnit(TimeUnit.NANOSECONDS) indicates that we want the results in nanoseconds. The number of default @Warmup iterations are 5 in our case. The @BenchmarkMode is set to Mode.AverageTime, which means we’re interested in calculating an average running time. For the first execution, we put iterations = 1000 items in our collections.

@OutputTimeUnit(TimeUnit.NANOSECONDS)表示我们希望得到纳秒级的结果。在我们的例子中,默认的@Warmup迭代次数是5次。@BenchmarkMode被设置为Mode.AverageTime,这意味着我们对计算平均运行时间感兴趣。对于第一次执行,我们把iterations = 1000项放在我们的集合中。

After, we add our benchmark methods to the CollectionsBenchmark class:

之后,我们将我们的基准方法添加到CollectionsBenchmark类。

@Benchmark
public boolean testArrayList(MyState state) {
    return state.employeeList.contains(state.employee);
}

Here we check whether the employeeList contains employee object.

这里我们检查employeeList是否包含employee对象。

Likewise, we have the familiar test for employeeSet:

同样地,我们有熟悉的employeeSet的测试。

@Benchmark
public boolean testHashSet(MyState state) {
    return state.employeeSet.contains(state.employee);
}

Finally, we can run the test:

最后,我们可以运行测试。

public static void main(String[] args) throws Exception {
    Options options = new OptionsBuilder()
      .include(CollectionsBenchmark.class.getSimpleName())
      .forks(1).build();
    new Runner(options).run();
}

Here are the results:

以下是结果。

Benchmark                           Mode  Cnt     Score     Error  Units
CollectionsBenchmark.testArrayList  avgt   20  4035.646 ± 598.541  ns/op
CollectionsBenchmark.testHashSet    avgt   20     9.456 ±   0.729  ns/op

We can clearly see that the testArrayList method has 4035.646 ns average lookup score, while the testHashSet performs faster with 9.456 ns on average.

我们可以清楚地看到,testArrayList方法的平均查找分数为4035.646 ns,而testHashSet的表现更快,平均为9.456 ns

Now, let’s increase the elements count in our test and run it for iterations = 10.000 items:

现在,让我们增加测试中的元素数,并在迭代=10.000项时运行。

Benchmark                           Mode  Cnt      Score       Error  Units
CollectionsBenchmark.testArrayList  avgt   20  57499.620 ± 11388.645  ns/op
CollectionsBenchmark.testHashSet    avgt   20     11.802 ±     1.164  ns/op

Here also, the contains() in HashSet has a huge performance advantage over the ArrayList.

在这里,HashSet中的contains()也比ArrayList有很大的性能优势。

5. Conclusion

5.结论

This quick write-up explains the performance of the contains() method of the HashSet and ArrayList collections. With the help of the JMH benchmarking, we’ve presented the performance of contains() for each type of collection.

这篇快速文章解释了HashSetArrayList集合的contains()方法的性能。在JMH基准测试的帮助下,我们介绍了每种类型集合的contains()的性能。

As a conclusion, we can learn, that the contains() method works faster in HashSet compared to an ArrayList.

作为结论,我们可以了解到,contains()方法在HashSet中的工作速度比ArrayList快。

As usual, the complete code for this article is over on GitHub project.

像往常一样,本文的完整代码在GitHub项目上