1. Overview
1.概述
HashMap is a well-known class from the Java Collections library. It implements the Map interface and allows the storage of key-value pairs. An instance of HashMap does not have a limitation on its number of entries. In some specific scenarios, we might want to change this behavior. In this tutorial, we’ll look at a few possible ways of enforcing a size limit on a HashMap.
HashMap 是 Java 集合库中的一个著名类。它实现了 Map 接口,允许存储键值对。HashMap的实例没有条目的数量限制。在某些特定场景中,我们可能希望改变这种行为。在本教程中,我们将探讨对 HashMap 执行大小限制的几种可能方法。
2. Notions on Java HashMap
2.关于 Java HashMap 的概念
The core of a HashMap is essentially a hash table. A hash table is a fundamental data structure based on two other basic structures: arrays and linked lists.
HashMap 的核心本质上是散列表。散列表是一种基于其他两种基本结构的基本数据结构:数组和 链接列表。
2.1. Inner Structure
2.1.内部结构
An array is the basic storage entity of a HashMap. Each position of the array contains a reference to a linked list. The linked list can contain a set of entries made of a key and a value. Both keys and values are Java objects, not primitive types, and keys are unique. The interface of HashMap has a put method defined as:
数组是 HashMap 的基本存储实体。数组的每个位置都包含对链表的引用。链表可以包含一组由键和值组成的条目。键和值都是 Java 对象,而不是原始类型,并且键是唯一的。HashMap 的接口有一个 put 方法,其定义如下:
V put(K key, V value)
It uses a so-called “hash function” that calculates a number called the “hash” from the input key. Then, starting from the hash and based on the current size of the array, it calculates the index in which to insert the entry.
它使用所谓的 “散列函数”,根据输入键计算出一个称为 “散列 “的数字。然后,从散列数开始,根据数组的当前大小,计算出插入条目的索引。
Different key values could have the same hash value and, thus, the same index. This results in a conflict, and when this happens, the entry is inserted in the next position of the linked list for that index.
不同的键值可能具有相同的哈希值,从而具有相同的索引。这将导致冲突,当冲突发生时,条目将被插入该索引的下一个链表位置。
If the number of entries in the linked list is greater than a specific threshold, defined by the TREEIFY_THRESHOLD constant, HashMap replaces the linked list with a tree, improving the runtime performance from O(n) to O(log(n)).
如果链表中的条目数大于特定阈值(由 TREEIFY_THRESHOLD 常量定义),HashMap 将用树形替换链表,从而将运行时性能从 O(n) 提高到 O(log(n)) 。
2.2. Resizing, Re-Hashing, and the Load Factor
2.2.调整大小、重新散列和负载系数
From the standpoint of performance, the ideal situation is the one in which the entries are spread over the whole array, with a maximum of one entry per position. As the number of entries grows, though, the number of conflicts rises, and so does the linked list size in each position.
从性能的角度来看,最理想的情况是条目遍布整个数组,每个位置最多只有一个条目。不过,随着条目的增加,冲突的数量也会增加,每个位置的链表大小也会增加。
To maintain a situation as close as possible to the ideal one, the HashMap resizes itself when the number of entries reaches a certain limit and then performs a recalculation of the hashes and indexes.
为了保持尽可能接近理想状态,HashMap 会在条目数达到一定限制时调整自身大小,然后重新计算哈希值和索引。
HashMap resizes itself based on the “load factor”. We can define the load factor as the maximum value of the fraction between the number of entries divided by the available positions before a resizing and re-hashing is needed. The default load factor of HashMap is 0.75f.
HashMap 根据 “负载因子 “调整自身大小。我们可以将负载因子定义为在需要调整大小和重新散列之前,条目数除以可用位置数的最大值。HashMap 的默认负载因子为 0.75f。
3. Scenarios in Which the Need for a Max Size of HashMap Could Arise
3.可能需要 HashMap 最大大小的情况
The size of a HashMap is limited only by the Java Virtual Machine memory available. This is how the class has been designed, and it’s consistent with the regular use cases for the hash table data structure.
HashMap的大小仅受 Java 虚拟机可用内存的限制。该类就是这样设计的,它与散列表数据结构的常规用例一致。
Still, there might be some specific scenarios in which we may need to impose a custom limit. For instance:
不过,在某些特殊情况下,我们可能需要施加自定义限制。例如
- Implementing a cache
- Gathering metrics on HashMap in a well-defined condition, avoiding its automatic re-sizing phase
As we’ll see in the following sections, for the first case, we can use the LinkedHashMap extension and its removeEldestEntry method. For the second case, we have to implement a custom extension of HashMap instead.
我们将在下面的章节中看到,对于第一种情况,我们可以使用 LinkedHashMap 扩展及其 removeEldestEntry 方法。对于第二种情况,我们必须实现 HashMap 的自定义扩展。
These scenarios are far from the original purpose of the HashMap design. A cache is a broader concept than a simple map, and the need to extend the original class makes it impossible to test it as if it were a pure black box.
这些情况与 HashMap 设计的初衷相去甚远。缓存是一个比简单的地图更宽泛的概念,而且由于需要扩展原始类,因此无法像测试纯黑盒那样测试缓存。
4. Limiting Max Size Using LinkedHashMap
4.使用 LinkedHashMap 限制最大大小
One possible way of limiting the max size is using LinkedHashMap, a subclass of HashMap. LinkedHashMap is a combination of a HashMap and a LinkedList: it stores a pointer to the previous and next entry. It can, therefore, maintain the insertion order of its entries, while HashMap doesn’t support that.
限制最大大小的一个可行方法是使用 链接哈希表,它是 HashMap 的一个子类。 LinkedHashMap 是 HashMap 和 LinkedList 的组合:它存储了指向上一个和下一个条目的指针。因此,它可以保持条目的插入顺序,而 HashMap 则不支持这一点。
4.1. Example with LinkedHashMap
4.1.使用 LinkedHashMap 的示例
We can limit the max size by overriding the removeEldestEntry() method. This method returns a boolean and is internally invoked to decide if the eldest entry should be removed after a new insert.
我们可以通过覆盖 removeEldestEntry() 方法来限制最大大小。此方法返回一个 布尔值,内部调用此方法是为了决定是否应在新插入后删除最长条目。
The eldest entry would be the least recently inserted key or the least recently accessed entry, depending on whether we create the LinkedHashMap instance with the accessOrder parameter as false, which is the default, or true.
最长条目将是最近插入最少的键,还是最近访问最少的条目,这取决于我们创建 LinkedHashMap 实例时,是将 accessOrder 参数设置为默认的 false 还是 true 。
By overriding the removeEldestEntry() method we define our own rule:
通过重载 removeEldestEntry() 方法,我们可以定义自己的规则:
@Test
void givenLinkedHashMapObject_whenAddingNewEntry_thenEldestEntryIsRemoved() {
final int MAX_SIZE = 4;
LinkedHashMap<Integer, String> linkedHashMap;
linkedHashMap = new LinkedHashMap<Integer, String>() {
@Override
protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
return size() > MAX_SIZE;
}
};
linkedHashMap.put(1, "One");
linkedHashMap.put(2, "Two");
linkedHashMap.put(3, "Three");
linkedHashMap.put(4, "Four");
linkedHashMap.put(5, "Five");
String[] expectedArrayAfterFive = { "Two", "Three", "Four", "Five" };
assertArrayEquals(expectedArrayAfterFive, linkedHashMap.values()
.toArray());
linkedHashMap.put(6, "Six");
String[] expectedArrayAfterSix = { "Three", "Four", "Five", "Six" };
assertArrayEquals(expectedArrayAfterSix, linkedHashMap.values()
.toArray());
}
In the above example, we created an instance of LinkedHashMap, overriding its removeEldestEntry method. Then, when the size of the entries set becomes greater than the given limit upon adding a new entry, the instance itself will remove its eldest key before inserting the new one.
在上述示例中,我们创建了一个 LinkedHashMap 实例,并重载了其 removeEldestEntry 方法。然后,在添加新条目时,当条目集的大小大于给定的限制时,该实例本身将在插入新条目之前移除其最长键。
In the test case, we preliminarily set a max size of 4 in the setUp method and fill the LinkedHashMap object with four entries. We can see that for each further insert, the number of entries is always 4, and the LinkedHashMap instance removes its eldest entry each time.
在测试用例中,我们在 setUp 方法中将最大大小初步设置为 4,并在 LinkedHashMap 对象中填入 4 个条目。我们可以看到,每次进一步插入时,条目数始终为 4,而且 LinkedHashMap 实例每次都会删除其最长的条目。
We must note that this is not a strict interpretation of the requirement of limiting the maximum size as the insertion operation is still allowed: The maximum size is kept by removing the eldest keys.
我们必须注意,这并不是对限制最大容量要求的严格解释,因为插入操作仍然是允许的:通过删除最长的密钥来保持最大容量。
We can implement a so-called LRU (Least Recently Used) cache by creating the LinkedHashMap with the accessOrder parameter as true by using the appropriate constructor.
我们可以通过使用适当的构造函数创建 LinkedHashMap 并将 accessOrder 参数设置为 true 来实现所谓的 LRU(最近最少使用)缓存。
4.2. Performance Considerations
4.2.性能考虑因素
A regular HashMap has the performance that we would expect from a hash table data structure. LinkedHashMap, on the other hand, has to keep the order of its entries, making its insertion and deletion operations slower. The access operation’s performance does not change unless we set the accessOrder flag to true.
普通的 HashMap 具有我们期望的哈希表数据结构的性能。另一方面,LinkedHashMap 必须保持条目的顺序,因此插入和删除操作的速度较慢。除非我们将 accessOrder 标志设置为 true,否则访问操作的性能不会改变。
5. Limiting Max Size Using a Custom HashMap
5.使用自定义 HashMap 限制最大尺寸
Another possible strategy is extending the HashMap with a custom implementation of the put method. We can make this implementation raise an exception when the HashMap instance reaches an imposed limit.
另一种可能的策略是使用 put 方法的自定义实现来扩展 HashMap 。我们可以使该实现在 HashMap 实例达到规定限制时引发异常。
5.1. Implementing the Custom HashMap
5.1.实现自定义 HashMap
For this approach, we have to override the put method, which makes a preliminary check:
对于这种方法,我们必须覆盖 put 方法,该方法会进行初步检查:
public class HashMapWithMaxSizeLimit<K, V> extends HashMap<K, V> {
private int maxSize = -1;
public HashMapWithMaxSizeLimit(int maxSize) {
super();
this.maxSize = maxSize;
}
@Override
public V put(K key, V value) {
if (this.maxSize == -1 || this.containsKey(key) || this.size() < this.maxSize) {
return super.put(key, value);
}
throw new RuntimeException("Max size exceeded!");
}
}
In the above example, we have an extension of HashMap. It has a maxSize attribute with -1 as the default value, which implicitly means “no limit”. We can modify this attribute by a specific constructor. In the put implementation, if maxSize equals the default, the key is already present, or the number of entries is lower than the specified maxSize, the extension calls the corresponding method of the superclass.
在上例中,我们扩展了 HashMap. 它有一个 maxSize 属性,默认值为 -1,隐含的意思是 “无限制”。我们可以通过特定的构造函数修改该属性。在 put 实现中,如果 maxSize 等于默认值、键已存在或条目数小于指定的 maxSize,扩展将调用超类的相应方法。
If the extension does not meet the above conditions, it raises an unchecked exception. We cannot use a checked exception because the put method does not explicitly throw any exception and we cannot redefine its signature. In our example, this limit can be circumvented by using putAll. To avoid this, we may also want to improve the example by overriding putAll, with the same logic.
如果扩展不满足上述条件,就会引发 未检查异常。我们不能使用已检查异常,因为 put 方法没有显式地抛出任何异常,我们也不能重新定义其签名。在我们的示例中,可以通过使用 putAll 来规避这一限制。为了避免这种情况,我们可能还想改进示例,以相同的逻辑覆盖 putAll 方法。
To keep the example simple, we haven’t redefined all the constructors of the superclass. We should do it, though, in a real use case to keep the original design of HashMap. In that case, we should use the max size logic also in the HashMap(Map<? extends K, ? extends V> m) constructor because it adds all the entries of the map argument without using putAll.
为了使示例保持简单,我们没有重新定义超类的所有构造函数。不过,为了保持 HashMap 的原始设计,我们应该在实际用例中这样做。在这种情况下,我们也应该在 HashMap(Map<? extends K, ? extends V> m) 构造函数中使用最大大小逻辑,因为它添加了 map 参数的所有条目,而无需使用 putAll。
This solution follows the requirements in a stricter way than the one described in the previous section. In this case, the HashMap inhibits any insert operation beyond the limit, without removing any existing element.
与上一节所述的解决方案相比,该解决方案更严格地遵循了相关要求。在这种情况下,HashMap 会抑制任何超出限制的插入操作,而不会删除任何现有元素。
5.2. Testing the Custom HashMap
5.2.测试自定义 HashMap
As an example, we can test the above custom class:
例如,我们可以测试上述自定义类:
private final int MAX_SIZE = 4;
private HashMapWithMaxSizeLimit<Integer, String> hashMapWithMaxSizeLimit;
@Test
void givenCustomHashMapObject_whenThereIsNoLimit_thenDoesNotThrowException() {
hashMapWithMaxSizeLimit = new HashMapWithMaxSizeLimit<Integer, String>();
assertDoesNotThrow(() -> {
for (int i = 0; i < 10000; i++) {
hashMapWithMaxSizeLimit.put(i, i + "");
}
});
}
@Test
void givenCustomHashMapObject_whenLimitNotReached_thenDoesNotThrowException() {
hashMapWithMaxSizeLimit = new HashMapWithMaxSizeLimit<Integer, String>(MAX_SIZE);
assertDoesNotThrow(() -> {
for (int i = 0; i < 4; i++) {
hashMapWithMaxSizeLimit.put(i, i + "");
}
});
}
@Test
void givenCustomHashMapObject_whenReplacingValueWhenLimitIsReached_thenDoesNotThrowException() {
hashMapWithMaxSizeLimit = new HashMapWithMaxSizeLimit<Integer, String>(MAX_SIZE);
assertDoesNotThrow(() -> {
hashMapWithMaxSizeLimit.put(1, "One");
hashMapWithMaxSizeLimit.put(2, "Two");
hashMapWithMaxSizeLimit.put(3, "Three");
hashMapWithMaxSizeLimit.put(4, "Four");
hashMapWithMaxSizeLimit.put(4, "4");
});
}
@Test
void givenCustomHashMapObject_whenLimitExceeded_thenThrowsException() {
hashMapWithMaxSizeLimit = new HashMapWithMaxSizeLimit<Integer, String>(MAX_SIZE);
Exception exception = assertThrows(RuntimeException.class, () -> {
for (int i = 0; i < 5; i++) {
hashMapWithMaxSizeLimit.put(i, i + "");
}
});
String messageThrownWhenSizeExceedsLimit = "Max size exceeded!";
String actualMessage = exception.getMessage();
assertTrue(actualMessage.equals(messageThrownWhenSizeExceedsLimit));
}
In the above code, we have four test cases:
在上述代码中,我们有四个测试用例:
- We instantiate the class with the default no-limit behavior. Even if we add a high number of entries, we don’t expect any exceptions.
- We instantiate the class with a maxSize of 4. If we don’t reach the limit, we don’t expect any exceptions.
- We instantiate the class with a maxSize of 4. If we have reached the limit and we replace a value, we don’t expect any exceptions.
- We instantiate the class with a maxSize of 4. If we exceed the limit, we expect a RuntimeException with a specific message.
6. Conclusion
6.结论
The need to enforce a limit for the maximum size of HashMap might arise in some specific scenarios. In this article, we’ve seen some examples that address this requirement.
在某些特定场景中,可能需要对 HashMap 的最大大小执行限制。在本文中,我们看到了一些满足这一需求的示例。
Extending LinkedHashMap and overriding its removeEldestEntry() method can be useful if we want to quickly implement an LRU cache or something similar. If we want to enforce a stricter bound, we can extend HashMap and override its insertion methods to fit our needs.
如果我们想快速实现 LRU 缓存或类似功能,扩展 LinkedHashMap 并覆盖其 removeEldestEntry() 方法将非常有用。如果我们想执行更严格的约束,可以扩展 HashMap 并重写其插入方法,以满足我们的需求。
As usual, the example code is available over on GitHub.
与往常一样,示例代码可在 GitHub 上获取。