The Java HashMap Under the Hood – 罩子下的Java HashMap

最后修改: 2016年 12月 28日

中文/混合/英文(键盘快捷键:t)

1. Overview

1.概述

In this article, we are going to explore the most popular implementation of Map interface from the Java Collections Framework in more detail, picking up where our intro article left off.

在这篇文章中,我们将更详细地探讨Java集合框架中最流行的Map接口的实现,继续我们的intro文章中的内容。

Before we get started with the implementation, it’s important to point out that the primary List and Set collection interfaces extend Collection but Map does not.

在我们开始实现之前,有必要指出,主要的ListSet集合接口扩展了Collection,但Map没有。

Simply put, the HashMap stores values by key and provides APIs for adding, retrieving and manipulating stored data in various ways. The implementation is based on the the principles of a hashtable, which sounds a little complex at first but is actually very easy to understand.

简单地说,HashMap通过键来存储值,并提供了用于以各种方式添加、检索和操纵存储数据的API。该实现基于hashtable的原则,初听起来有点复杂,但实际上非常容易理解。

Key-value pairs are stored in what is known as buckets which together make up what is called a table, which is actually an internal array.

键值对被存储在所谓的桶中,这些桶共同构成了所谓的表,这实际上是一个内部数组。

Once we know the key under which an object is stored or is to be stored, storage and retrieval operations occur in constant time, O(1) in a well-dimensioned hash map.

一旦我们知道一个对象被存储或将被存储的密钥,存储和检索操作就会在恒定的时间内发生O(1) 在一个尺寸良好的哈希图中。

To understand how hash maps work under the hood, one needs to understand the storage and retrieval mechanism employed by the HashMap. We’ll focus a lot on these.

要了解哈希图是如何工作的,就需要了解哈希图采用的存储和检索机制。我们将大量关注这些。

Finally, HashMap related questions are quite common in interviews, so this is a solid way to either prepare an interview or prepare for it.

最后,哈希图相关的问题在面试中相当常见,因此这是准备面试或准备面试的一个可靠方法。

2. The put() API

2、put() API

To store a value in a hash map, we call the put API which takes two parameters; a key and the corresponding value:

为了在哈希图中存储一个值,我们调用put API,它需要两个参数;一个键和相应的值。

V put(K key, V value);

When a value is added to the map under a key, the hashCode() API of the key object is called to retrieve what is known as the initial hash value.

当一个值被添加到地图的一个键下时,键对象的hashCode() API被调用以检索所谓的初始哈希值。

To see this in action, let us create an object that will act as a key. We will only create a single attribute to use as a hash code to simulate the first phase of hashing:

为了看到这个动作,让我们创建一个将作为钥匙的对象。我们将只创建一个单一的属性来作为哈希代码,以模拟哈希的第一阶段。

public class MyKey {
    private int id;
   
    @Override
    public int hashCode() {
        System.out.println("Calling hashCode()");
        return id;
    }

    // constructor, setters and getters 
}

We can now use this object to map a value in the hash map:

现在我们可以使用这个对象来映射哈希图中的一个值。

@Test
public void whenHashCodeIsCalledOnPut_thenCorrect() {
    MyKey key = new MyKey(1);
    Map<MyKey, String> map = new HashMap<>();
    map.put(key, "val");
}

Nothing much happening in the above code, but pay attention to the console output. Indeed the hashCode method gets invoked:

上面的代码中没有发生什么,但要注意控制台的输出。事实上,hashCode方法被调用了。

Calling hashCode()

Next, the hash() API of the hash map is called internally to compute the final hash value using the initial hash value.

接下来,哈希图的hash() API被内部调用,以使用初始哈希值计算最终的哈希值。

This final hash value ultimately boils down to an index in the internal array or what we call a bucket location.

这个最终的哈希值最终会归结为内部数组中的一个索引,或者我们称之为桶的位置。

The hash function of HashMap looks like this:

HashHashMap函数看起来像这样。

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

What we should note here is only the use of the hash code from the key object to compute a final hash value.

我们在这里应该注意的只是使用钥匙对象的哈希代码来计算最终的哈希值。

While inside the put function, the final hash value is used like this:

put函数内,最后的哈希值是这样使用的。

public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

Notice that an internal putVal function is called and given the final hash value as the first parameter.

请注意,一个内部的putVal函数被调用,并给出最终的哈希值作为第一个参数。

One may wonder why the key is again used inside this function since we have already used it to compute the hash value.

人们可能会问,既然我们已经用它来计算哈希值了,为什么还要在这个函数中再次使用密钥?

The reason is that hash maps store both key and value in the bucket location as a Map.Entry object.

原因是,哈希地图将键和值都以Map.Entry对象的形式存储在桶的位置

As discussed before, all Java collections framework interfaces extend Collection interface but Map does not. Compare the declaration of Map interface we saw earlier to that of Set interface:

如前所述,所有的Java集合框架接口都扩展了Collection接口,但Map没有。比较我们前面看到的Map接口的声明和Set接口的声明。

public interface Set<E> extends Collection<E>

The reason is that maps do not exactly store single elements as do other collections but rather a collection of key-value pairs.

原因是,地图并不像其他集合那样确切地存储单个元素,而是一个键值对的集合。

So the generic methods of Collection interface such as add, toArray do not make sense when it comes to Map.

因此,Collection接口的通用方法,如addtoArray在涉及Map时没有意义。

The concept we have covered in the last three paragraphs makes for one of the most popular Java Collections Framework interview questions. So, it’s worth understanding.

我们在过去三段中所涉及的概念,使其成为最受欢迎的Java集合框架面试问题之一。所以,这是值得理解的。

One special attribute with the hash map is that it accepts null values and null keys:

哈希图的一个特殊属性是它接受null值和空键。

@Test
public void givenNullKeyAndVal_whenAccepts_thenCorrect(){
    Map<String, String> map = new HashMap<>();
    map.put(null, null);
}

When a null key is encountered during a put operation, it is automatically assigned a final hash value of 0, which means it becomes the first element of the underlying array.

当在put操作中遇到一个空键时,它会被自动分配一个0的最终哈希值,这意味着它成为底层数组的第一个元素。

This also means that when the key is null, there is no hashing operation and therefore, the hashCode API of the key is not invoked, ultimately avoiding a null pointer exception.

这也意味着,当密钥为空时,没有散列操作,因此,密钥的hashCode API不被调用,最终避免了空指针异常。

During a put operation, when we use a key that was already used previously to store a value, it returns the previous value associated with the key:

put操作中,当我们使用一个之前已经使用过的键来存储一个值时,它会返回与该键相关的前一个值。

@Test
public void givenExistingKey_whenPutReturnsPrevValue_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("key1", "val1");

    String rtnVal = map.put("key1", "val2");

    assertEquals("val1", rtnVal);
}

otherwise, it returns null:

否则,它返回null:

@Test
public void givenNewKey_whenPutReturnsNull_thenCorrect() {
    Map<String, String> map = new HashMap<>();

    String rtnVal = map.put("key1", "val1");

    assertNull(rtnVal);
}

When put returns null, it could also mean that the previous value associated with the key is null, not necessarily that it’s a new key-value mapping:

put返回null时,也可能意味着之前与键相关的值为null,而不一定是新的键值映射。

@Test
public void givenNullVal_whenPutReturnsNull_thenCorrect() {
    Map<String, String> map = new HashMap<>();

    String rtnVal = map.put("key1", null);

    assertNull(rtnVal);
}

The containsKey API can be used to distinguish between such scenarios as we will see in the next subsection.

containsKey API可以用来区分这种情况,我们将在下一小节看到。

3. The get API

3、get API

To retrieve an object already stored in the hash map, we must know the key under which it was stored. We call the get API and pass to it the key object:

要检索一个已经存储在哈希图中的对象,我们必须知道它所存储的键。我们调用get API并向其传递key对象。

@Test
public void whenGetWorks_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("key", "val");

    String val = map.get("key");

    assertEquals("val", val);
}

Internally, the same hashing principle is used. The hashCode() API of the key object is called to obtain the initial hash value:

在内部,使用的是相同的散列原则。密钥对象的hashCode() API被调用以获得初始哈希值。

@Test
public void whenHashCodeIsCalledOnGet_thenCorrect() {
    MyKey key = new MyKey(1);
    Map<MyKey, String> map = new HashMap<>();
    map.put(key, "val");
    map.get(key);
}

This time, the hashCode API of MyKey is called twice; once for put and once for get:

这一次,MyKeyhashCode API被调用两次;一次是put,一次是get

Calling hashCode()
Calling hashCode()

This value is then rehashed by calling the internal hash() API to obtain the final hash value.

然后通过调用内部的hash() API来重构这个值,以获得最终的哈希值。

As we saw in the previous section, this final hash value ultimately boils down to a bucket location or an index of the internal array.

正如我们在上一节看到的,这个最终的哈希值最终归结为一个桶的位置或内部数组的索引。

The value object stored in that location is then retrieved and returned to the calling function.

然后,存储在该位置的值对象被检索并返回给调用函数。

When the returned value is null, it could mean that the key object is not associated with any value in the hash map:

当返回值为空时,可能意味着键对象没有与哈希图中的任何值相关联。

@Test
public void givenUnmappedKey_whenGetReturnsNull_thenCorrect() {
    Map<String, String> map = new HashMap<>();

    String rtnVal = map.get("key1");

    assertNull(rtnVal);
}

Or it could simply mean that the key was explicitly mapped to a null instance:

或者它可能仅仅意味着键被明确地映射到一个空实例。

@Test
public void givenNullVal_whenRetrieves_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("key", null);
        
    String val=map.get("key");
        
    assertNull(val);
}

To distinguish between the two scenarios, we can use the containsKey API, to which we pass the key and it returns true if and only if a mapping was created for the specified key in the hash map:

为了区分这两种情况,我们可以使用containsKey API,我们向其传递密钥,如果并且只有在哈希图中为指定的密钥创建了映射,它才会返回true。

@Test
public void whenContainsDistinguishesNullValues_thenCorrect() {
    Map<String, String> map = new HashMap<>();

    String val1 = map.get("key");
    boolean valPresent = map.containsKey("key");

    assertNull(val1);
    assertFalse(valPresent);

    map.put("key", null);
    String val = map.get("key");
    valPresent = map.containsKey("key");

    assertNull(val);
    assertTrue(valPresent);
}

For both cases in the above test, the return value of the get API call is null but we are able to distinguish which one is which.

对于上述测试中的两种情况,get API调用的返回值都是空的,但我们能够区分哪个是哪个。

4. Collection Views in HashMap

4.HashMap中的集合视图

HashMap offers three views that enable us to treat its keys and values as another collection. We can get a set of all keys of the map:

HashMap提供了三个视图,使我们能够将其键和值视为另一个集合。我们可以得到地图的所有键的集合

@Test
public void givenHashMap_whenRetrievesKeyset_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    Set<String> keys = map.keySet();

    assertEquals(2, keys.size());
    assertTrue(keys.contains("name"));
    assertTrue(keys.contains("type"));
}

The set is backed by the map itself. So any change made to the set is reflected in the map:

集合的背后是地图本身的支持。因此,对集合所做的任何改变都会反映在地图上

@Test
public void givenKeySet_whenChangeReflectsInMap_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    assertEquals(2, map.size());

    Set<String> keys = map.keySet();
    keys.remove("name");

    assertEquals(1, map.size());
}

We can also obtain a collection view of the values:

我们还可以获得一个数值的集合视图

@Test
public void givenHashMap_whenRetrievesValues_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    Collection<String> values = map.values();

    assertEquals(2, values.size());
    assertTrue(values.contains("baeldung"));
    assertTrue(values.contains("blog"));
}

Just like the key set, any changes made in this collection will be reflected in the underlying map.

就像钥匙集一样,在这个集合中的任何变化都会反映在底层地图中

Finally, we can obtain a set view of all entries in the map:

最后,我们可以获得地图中所有条目的集合视图

@Test
public void givenHashMap_whenRetrievesEntries_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    Set<Entry<String, String>> entries = map.entrySet();

    assertEquals(2, entries.size());
    for (Entry<String, String> e : entries) {
        String key = e.getKey();
        String val = e.getValue();
        assertTrue(key.equals("name") || key.equals("type"));
        assertTrue(val.equals("baeldung") || val.equals("blog"));
    }
}

Remember that a hash map specifically contains unordered elements, therefore we assume any order when testing the keys and values of entries in the for each loop.

请记住,哈希图特别包含无序的元素,因此我们在测试for each循环中的条目的键和值时假设任何顺序。

Many times, you will use the collection views in a loop as in the last example, and more specifically using their iterators.

很多时候,你会像上一个例子那样在一个循环中使用集合视图,更确切地说,是使用其迭代器。

Just remember that the iterators for all the above views are fail-fast.

只要记住,上述所有观点的迭代器是失败-快速

If any structural modification is made on the map, after the iterator has been created, a concurrent modification exception will be thrown:

如果在迭代器创建后,在地图上进行任何结构性修改,将抛出一个并发修改异常。

@Test(expected = ConcurrentModificationException.class)
public void givenIterator_whenFailsFastOnModification_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    Set<String> keys = map.keySet();
    Iterator<String> it = keys.iterator();
    map.remove("type");
    while (it.hasNext()) {
        String key = it.next();
    }
}

The only allowed structural modification is a remove operation performed through the iterator itself:

唯一允许的结构修改是通过迭代器本身执行的remove操作。

public void givenIterator_whenRemoveWorks_thenCorrect() {
    Map<String, String> map = new HashMap<>();
    map.put("name", "baeldung");
    map.put("type", "blog");

    Set<String> keys = map.keySet();
    Iterator<String> it = keys.iterator();

    while (it.hasNext()) {
        it.next();
        it.remove();
    }

    assertEquals(0, map.size());
}

The final thing to remember about these collection views is the performance of iterations. This is where a hash map performs quite poorly compared with its counterparts linked hash map and tree map.

关于这些集合视图,需要记住的最后一点是迭代的性能。这就是哈希图与它的同类链接哈希图和树形图相比表现相当差的地方。

Iteration over a hash map happens in worst case O(n) where n is the sum of its capacity and the number of entries.

散列图的迭代在最坏的情况下发生O(n),其中n是其容量和条目数之和。

5. HashMap Performance

5.HashMap性能

The performance of a hash map is affected by two parameters: Initial Capacity and Load Factor. The capacity is the number of buckets or the underlying array length and the initial capacity is simply the capacity during creation.

哈希图的性能受两个参数的影响。初始容量负载因子。容量是指桶的数量或底层数组的长度,初始容量只是创建时的容量。

The load factor or LF, in short, is a measure of how full the hash map should be after adding some values before it is resized.

负载因子或LF,简而言之,是衡量在添加一些值之后,在调整大小之前,哈希图应该有多满。

The default initial capacity is 16 and default load factor is 0.75. We can create a hash map with custom values for initial capacity and LF:

默认的初始容量为16,默认的负载系数为0.75。我们可以用初始容量和LF的自定义值创建一个哈希图。

Map<String,String> hashMapWithCapacity=new HashMap<>(32);
Map<String,String> hashMapWithCapacityAndLF=new HashMap<>(32, 0.5f);

The default values set by the Java team are well optimized for most cases. However, if you need to use your own values, which is very okay, you need to understand the performance implications so that you know what you are doing.

Java团队设置的默认值在大多数情况下是很好的优化。然而,如果你需要使用你自己的值,这是非常可以的,你需要了解对性能的影响,以便你知道你在做什么。

When the number of hash map entries exceeds the product of LF and capacity, then rehashing occurs i.e. another internal array is created with twice the size of the initial one and all entries are moved over to new bucket locations in the new array.

当哈希图条目的数量超过LF和容量的乘积时,就会发生重新洗牌,即创建另一个内部数组,其大小是初始数组的两倍,所有条目都被转移到新数组中的新桶位置

A low initial capacity reduces space cost but increases the frequency of rehashing. Rehashing is obviously a very expensive process. So as a rule, if you anticipate many entries, you should set a considerably high initial capacity.

一个低的初始容量可以减少空间成本,但增加了重洗的频率。重洗显然是一个非常昂贵的过程。因此,作为一项规则,如果你预计会有很多条目,你应该设置一个相当高的初始容量。

On the flip side, if you set the initial capacity too high, you will pay the cost in iteration time. As we saw in the previous section.

反过来说,如果你把初始容量设置得太高,你将在迭代时间上付出代价。正如我们在上一节看到的那样。

So a high initial capacity is good for a large number of entries coupled with little to no iteration.

因此,高初始容量对大量条目加上很少或没有迭代的情况下是很好的

A low initial capacity is good for few entries with a lot of iteration.

一个低的初始容量适合少量条目,大量迭代

6. Collisions in the HashMap

6.HashMap中的碰撞

A collision, or more specifically, a hash code collision in a HashMap, is a situation where two or more key objects produce the same final hash value and hence point to the same bucket location or array index.

碰撞,或者更具体地说,在HashMap中的哈希代码碰撞,是指两个或更多的键对象产生相同的最终哈希值,从而指向同一个桶的位置或数组索引的情况。

This scenario can occur because according to the equals and hashCode contract, two unequal objects in Java can have the same hash code.

这种情况可能发生,因为根据equalshashCode合约,Java中两个不相等的对象可以有相同的哈希代码

It can also happen because of the finite size of the underlying array, that is, before resizing. The smaller this array, the higher the chances of collision.

它也可能因为底层数组的有限大小而发生,也就是说,在调整大小之前。这个数组越小,碰撞的几率就越大。

That said, it’s worth mentioning that Java implements a hash code collision resolution technique which we will see using an example.

也就是说,值得一提的是,Java实现了一种哈希码碰撞解决技术,我们将用一个例子来看看。

Keep in mind that it’s the hash value of the key that determines the bucket the object will be stored in. And so, if the hash codes of any two keys collide, their entries will still be stored in the same bucket.

请记住,是键的哈希值决定了该对象将被存储在哪个桶里。因此,如果任何两个键的哈希代码发生冲突,它们的条目仍将被存储在同一个桶中。

And by default, the implementation uses a linked list as the bucket implementation.

而在默认情况下,该实现使用一个链接列表作为桶的实现。

The initially constant time O(1) put and get operations will occur in linear time O(n) in the case of a collision. This is because after finding the bucket location with the final hash value, each of the keys at this location will be compared with the provided key object using the equals API.

最初的恒定时间O(1)putget操作在发生碰撞的情况下将以线性时间O(n)发生。这是因为在找到具有最终哈希值的桶的位置后,这个位置的每个键将使用equals API与提供的键对象进行比较。

To simulate this collision resolution technique, let’s modify our earlier key object a little:

为了模拟这种碰撞解决技术,让我们稍微修改一下我们先前的关键对象。

public class MyKey {
    private String name;
    private int id;

    public MyKey(int id, String name) {
        this.id = id;
        this.name = name;
    }
    
    // standard getters and setters
 
    @Override
    public int hashCode() {
        System.out.println("Calling hashCode()");
        return id;
    } 
 
    // toString override for pretty logging

    @Override
    public boolean equals(Object obj) {
        System.out.println("Calling equals() for key: " + obj);
        // generated implementation
    }

}

Notice how we’re simply returning the id attribute as the hash code – and thus force a collision to occur.

注意我们是如何简单地将id属性作为哈希代码返回的–从而迫使碰撞发生。

Also, note that we’ve added log statements in our equals and hashCode implementations – so that we know exactly when the logic is called.

另外,请注意,我们在equalshashCode实现中加入了日志语句–这样我们就能准确地知道逻辑被调用的时间。

Let’s now go ahead to store and retrieve some objects that collide at some point:

现在让我们去存储和检索一些在某一点上发生碰撞的对象。

@Test
public void whenCallsEqualsOnCollision_thenCorrect() {
    HashMap<MyKey, String> map = new HashMap<>();
    MyKey k1 = new MyKey(1, "firstKey");
    MyKey k2 = new MyKey(2, "secondKey");
    MyKey k3 = new MyKey(2, "thirdKey");

    System.out.println("storing value for k1");
    map.put(k1, "firstValue");
    System.out.println("storing value for k2");
    map.put(k2, "secondValue");
    System.out.println("storing value for k3");
    map.put(k3, "thirdValue");

    System.out.println("retrieving value for k1");
    String v1 = map.get(k1);
    System.out.println("retrieving value for k2");
    String v2 = map.get(k2);
    System.out.println("retrieving value for k3");
    String v3 = map.get(k3);

    assertEquals("firstValue", v1);
    assertEquals("secondValue", v2);
    assertEquals("thirdValue", v3);
}

In the above test, we create three different keys – one has a unique id and the other two have the same id. Since we use id as the initial hash value, there will definitely be a collision during both storage and retrieval of data with these keys.

在上述测试中,我们创建了三个不同的键–一个有唯一的id,另外两个有相同的id。由于我们使用id作为初始哈希值,在用这些键存储和检索数据时肯定会发生碰撞

In addition to that, thanks to the collision resolution technique we saw earlier, we expect each of our stored values to be retrieved correctly, hence the assertions in the last three lines.

除此之外,由于我们前面看到的碰撞解决技术,我们期望我们的每个存储值都能被正确检索,因此最后三行的断言。

When we run the test, it should pass, indicating that collisions were resolved and we will use the logging produced to confirm that the collisions indeed occurred:

当我们运行测试时,它应该通过,表明碰撞被解决了,我们将使用产生的日志来确认碰撞确实发生了。

storing value for k1
Calling hashCode()
storing value for k2
Calling hashCode()
storing value for k3
Calling hashCode()
Calling equals() for key: MyKey [name=secondKey, id=2]
retrieving value for k1
Calling hashCode()
retrieving value for k2
Calling hashCode()
retrieving value for k3
Calling hashCode()
Calling equals() for key: MyKey [name=secondKey, id=2]

Notice that during storage operations, k1 and k2 were successfully mapped to their values using only the hash code.

请注意,在存储操作中,k1k2仅使用哈希代码就成功映射到了它们的值。

However, storage of k3 was not so simple, the system detected that its bucket location already contained a mapping for k2. Therefore, equals comparison was used to distinguish them and a linked list was created to contain both mappings.

然而,k3的存储并不那么简单,系统检测到其桶的位置已经包含了k2的映射。因此,equals比较被用来区分它们,并且创建了一个链接列表来包含这两个映射。

Any other subsequent mapping whose key hashes to the same bucket location will follow the same route and end up replacing one of the nodes in the linked list or be added to the head of the list if equals comparison returns false for all existing nodes.

任何其他后续的映射,其键哈希到相同的桶的位置将遵循相同的路线,并最终取代链接列表中的一个节点,或者如果equals比较对所有现有节点返回错误,则被添加到列表的头部。

Likewise, during retrieval, k3 and k2 were equals-compared to identify the correct key whose value should be retrieved.

同样,在检索过程中,k3k2equals比较,以确定其值应被检索的正确键。

On a final note, from Java 8, the linked lists are dynamically replaced with balanced binary search trees in collision resolution after the number of collisions in a given bucket location exceed a certain threshold.

最后,从Java 8开始,在碰撞解决中,当某个桶的位置的碰撞数量超过一定的阈值后,链接列表会被动态地替换为平衡的二进制搜索树。

This change offers a performance boost, since, in the case of a collision, storage and retrieval happen in O(log n).

这一变化带来了性能上的提升,因为在发生碰撞的情况下,存储和检索只需O(log n)。

This section is very common in technical interviews, especially after the basic storage and retrieval questions.

这一部分在技术面试中非常常见,特别是在基本的存储和检索问题之后。

7. Conclusion

7.结论

In this article, we have explored HashMap implementation of Java Map interface.

在这篇文章中,我们探讨了Java Map接口的HashMap实现。

The full source code for all the examples used in this article can be found in the GitHub project.

本文中使用的所有示例的完整源代码可以在GitHub项目中找到。