1. Overview
Hashing is a fundamental concept of computer science.
In Java, efficient hashing algorithms underpin some of the most popular collections, such as HashMap and HashSet.
In this tutorial, we’ll focus on how hashCode() works, how it plays into collections and how to implement it correctly.
2. Using hashCode() in Data Structures
The simplest operations on collections can be inefficient in certain situations.
To illustrate, the following contains() call triggers a linear search, which is highly inefficient for huge lists:
List<String> words = Arrays.asList("Welcome", "to", "Baeldung");
if (words.contains("Baeldung")) {
    System.out.println("Baeldung is in the list");
}
Java provides a number of data structures for dealing with this issue specifically. For example, several Map interface implementations are hash tables.
When using a hash table, these collections calculate the hash value for a given key using the hashCode() method. Then they use this value internally to store the data so that access operations are much more efficient.
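As a rough illustration of the difference, the sketch below runs the same membership check against a List and against a HashSet built from it. The class and method names (HashLookupSketch, containsViaHash) are made up for this example:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashLookupSketch {

    // Copies the words into a HashSet so that the membership check goes through hashCode()
    static boolean containsViaHash(List<String> words, String target) {
        Set<String> index = new HashSet<>(words);
        return index.contains(target);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Welcome", "to", "Baeldung");

        // List.contains() compares the target with the elements one by one
        System.out.println(words.contains("Baeldung"));

        // HashSet.contains() goes straight to the bucket selected by the hash code
        System.out.println(containsViaHash(words, "Baeldung"));
    }
}
```

Building the set costs one pass over the list, so this only pays off when the same collection is queried repeatedly.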
3. Understanding How hashCode() Works
Simply put, hashCode() returns an integer value, generated by a hashing algorithm.
Objects that are equal (according to their equals()) must return the same hash code. Different objects do not need to return different hash codes.
The general contract of hashCode() states:
- Whenever it is invoked on the same object more than once during an execution of a Java application, hashCode() must consistently return the same value, provided no information used in equals comparisons on the object is modified. This value doesn’t need to stay consistent from one execution of an application to another execution of the same application.
- If two objects are equal according to the equals(Object) method, calling the hashCode() method on each of the two objects must produce the same value.
- If two objects are unequal according to the equals(java.lang.Object) method, calling the hashCode method on each of the two objects doesn’t need to produce distinct integer results. However, developers should be aware that producing distinct integer results for unequal objects improves the performance of hash tables.
“As much as is reasonably practical, the hashCode() method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)”
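The two central points of the contract can be checked with plain String objects. This is only a small sketch; the class name HashContractSketch is hypothetical, and the colliding pair "Aa" and "BB" follows from the documented String.hashCode() formula:

```java
class HashContractSketch {

    // Two distinct but equal String instances must report the same hash code
    static boolean equalObjectsShareHash() {
        String a = new String("Baeldung");
        String b = new String("Baeldung");
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    // Unequal objects are allowed to share a hash code: both strings hash to 2112
    static boolean unequalObjectsMayCollide() {
        return !"Aa".equals("BB") && "Aa".hashCode() == "BB".hashCode();
    }

    public static void main(String[] args) {
        System.out.println(equalObjectsShareHash());
        System.out.println(unequalObjectsMayCollide());
    }
}
```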
4. A Naive hashCode() Implementation
A naive hashCode() implementation that fully adheres to the above contract is actually quite straightforward.
To demonstrate this, we’re going to define a sample User class that overrides the method’s default implementation:
public class User {

    private long id;
    private String name;
    private String email;

    // standard getters/setters/constructors

    @Override
    public int hashCode() {
        return 1;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null) return false;
        if (this.getClass() != o.getClass()) return false;
        User user = (User) o;
        return id == user.id
          && (name.equals(user.name)
          && email.equals(user.email));
    }
}
The User class provides custom implementations for both equals() and hashCode() that fully adhere to the respective contracts. What's more, there's nothing illegitimate about having hashCode() return a fixed value.
However, this implementation reduces the benefit of a hash table to basically zero, as every object would be stored in the same, single bucket.
In this context, a hash table lookup is performed linearly and does not give us any real advantage. We talk more about this in Section 7.
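To make this concrete, here is a minimal sketch with a hypothetical Key class whose hashCode() always returns 1. The set still behaves correctly, which is why the implementation is legal, but every element lands in the same bucket:

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashSketch {

    // Hypothetical key type: legal equals()/hashCode(), but a constant hash code
    static final class Key {
        private final String value;

        Key(String value) {
            this.value = value;
        }

        @Override
        public int hashCode() {
            return 1; // every instance collides with every other instance
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && value.equals(((Key) o).value);
        }
    }

    // Adds n distinct keys; all of them end up in one bucket
    static Set<Key> buildSet(int n) {
        Set<Key> keys = new HashSet<>();
        for (int i = 0; i < n; i++) {
            keys.add(new Key("key-" + i));
        }
        return keys;
    }

    public static void main(String[] args) {
        Set<Key> keys = buildSet(1000);
        System.out.println(keys.size());
        // Correct result, but found by comparing keys inside a single bucket
        System.out.println(keys.contains(new Key("key-999")));
    }
}
```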
5. Improving the hashCode() Implementation
Let’s improve the current hashCode() implementation by including all fields of the User class so that it can produce different results for unequal objects:
@Override
public int hashCode() {
    return (int) id * name.hashCode() * email.hashCode();
}
This basic hashing algorithm is definitely much better than the previous one. The hash code now depends on the id and on the hash codes of the name and email fields, which are simply multiplied together, so unequal objects usually end up with different hash codes.
In general terms, we can say that this is a reasonable hashCode() implementation, as long as we keep the equals() implementation consistent with it.
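One limitation of pure multiplication is worth keeping in mind: as soon as any factor is 0, the whole product is 0. The sketch below uses a hypothetical helper, multiplicativeHash(), that mirrors the expression above to show two different users with id 0 colliding:

```java
class MultiplicativeHashSketch {

    // Same expression as the multiplication-based hashCode() above, extracted for illustration
    static int multiplicativeHash(long id, String name, String email) {
        return (int) id * name.hashCode() * email.hashCode();
    }

    public static void main(String[] args) {
        // Both calls print 0 because the id factor is 0
        System.out.println(multiplicativeHash(0L, "John", "john@domain.com"));
        System.out.println(multiplicativeHash(0L, "Mary", "mary@domain.com"));
    }
}
```

This is one reason the implementations in the next section combine fields with addition and a prime multiplier instead.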
6. Standard hashCode() Implementations
The better the hashing algorithm that we use to compute hash codes, the better the performance of hash tables.
Let’s have a look at a “standard” implementation that uses two prime numbers to add even more uniqueness to computed hash codes:
@Override
public int hashCode() {
    int hash = 7;
    hash = 31 * hash + (int) id;
    hash = 31 * hash + (name == null ? 0 : name.hashCode());
    hash = 31 * hash + (email == null ? 0 : email.hashCode());
    return hash;
}
While we need to understand the roles that the hashCode() and equals() methods play, we don't have to implement them from scratch every time. This is because most IDEs can generate custom hashCode() and equals() implementations. And since Java 7, we have the Objects.hash() utility method for convenient hashing:
Objects.hash(name, email)
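In context, this could look roughly like the sketch below. The nested User class is a simplified stand-in for the one used in this tutorial, and the choice to pass id along with name and email (so that hashCode() covers the same fields as equals()) is an assumption of this example:

```java
import java.util.Objects;

class ObjectsHashSketch {

    // Simplified stand-in for the tutorial's User class
    static final class User {
        private final long id;
        private final String name;
        private final String email;

        User(long id, String name, String email) {
            this.id = id;
            this.name = name;
            this.email = email;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            User user = (User) o;
            // Objects.equals() tolerates null fields
            return id == user.id
              && Objects.equals(name, user.name)
              && Objects.equals(email, user.email);
        }

        @Override
        public int hashCode() {
            // Objects.hash() also tolerates null arguments
            return Objects.hash(id, name, email);
        }
    }

    public static void main(String[] args) {
        User first = new User(1L, "John", "john@domain.com");
        User second = new User(1L, "John", "john@domain.com");
        System.out.println(first.equals(second));
        System.out.println(first.hashCode() == second.hashCode());
    }
}
```

Note that Objects.hash() boxes primitive arguments and allocates a varargs array on each call, so hand-written arithmetic may still be preferable in very hot code paths.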
IntelliJ IDEA generates the following implementation:
@Override
public int hashCode() {
    int result = (int) (id ^ (id >>> 32));
    result = 31 * result + name.hashCode();
    result = 31 * result + email.hashCode();
    return result;
}
And Eclipse produces this one:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((email == null) ? 0 : email.hashCode());
    result = prime * result + (int) (id ^ (id >>> 32));
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}
In addition to the above IDE-based hashCode() implementations, it’s also possible to automatically generate an efficient implementation, for example using Lombok.
In this case, we need to add the lombok-maven dependency to pom.xml:
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok-maven</artifactId>
    <version>1.16.18.0</version>
    <type>pom</type>
</dependency>
It’s now enough to annotate the User class with @EqualsAndHashCode:
@EqualsAndHashCode
public class User {
    // fields and methods here
}
Similarly, if we want Apache Commons Lang’s HashCodeBuilder class to generate a hashCode() implementation for us, we include the commons-lang Maven dependency in the pom file:
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
</dependency>
And hashCode() can be implemented like this:
public class User {
    public int hashCode() {
        return new HashCodeBuilder(17, 37).
          append(id).
          append(name).
          append(email).
          toHashCode();
    }
}
In general, there’s no universal recipe when it comes to implementing hashCode(). We highly recommend reading Joshua Bloch’s Effective Java. It provides a list of thorough guidelines for implementing efficient hashing algorithms.
Notice here that most of these implementations use the number 31 in some form. This is because 31 has a nice property: multiplying by it can be replaced by a bitwise shift and a subtraction, which can be faster than a standard multiplication:
31 * i == (i << 5) - i
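The identity also holds when the multiplication overflows, since both sides wrap around in the same way under Java's int arithmetic. A quick sketch to check it on a few sample values (the class and method names are made up for this example):

```java
class ShiftIdentitySketch {

    // Checks 31 * i == (i << 5) - i for a single int, including overflowing values
    static boolean identityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int sample : samples) {
            System.out.println(sample + " -> " + identityHolds(sample));
        }
    }
}
```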
7. Handling Hash Collisions
The intrinsic behavior of hash tables brings up a relevant aspect of these data structures: Even with an efficient hashing algorithm, two or more objects might have the same hash code even if they’re unequal. So, their hash codes would point to the same bucket even though they would have different hash table keys.
This situation is commonly known as a hash collision, and various methods exist for handling it, each with its own pros and cons. Java's HashMap uses the separate chaining method for handling collisions:
“When two or more objects point to the same bucket, they’re simply stored in a linked list. In such a case, the hash table is an array of linked lists, and each object with the same hash is appended to the linked list at the bucket index in the array.
In the worst case, several buckets would have a linked list bound to it, and the retrieval of an object in the list would be performed linearly.”
In a nutshell, the way hash collisions are handled shows why it's so important to implement hashCode() efficiently.
Java 8 brought an interesting enhancement to the HashMap implementation. If a bucket grows beyond a certain threshold, a balanced tree replaces the linked list. This brings the worst-case lookup down from O(n) to O(log n).
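A collision does not cost us any data; both entries stay in the map and are told apart by equals(). The sketch below uses the strings "Aa" and "BB", which share the hash code 2112, as keys. The class name CollisionSketch is hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSketch {

    // Stores two keys that share a hash code and therefore a bucket
    static Map<String, Integer> buildMap() {
        Map<String, Integer> map = new HashMap<>();
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildMap();
        System.out.println("Aa".hashCode() == "BB".hashCode());
        // Both entries survive; equals() picks the right one within the bucket
        System.out.println(map.get("Aa"));
        System.out.println(map.get("BB"));
    }
}
```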
8. Creating a Trivial Application
Now we’ll test the functionality of a standard hashCode() implementation.
Let’s create a simple Java application that adds some User objects to a HashMap and uses SLF4J to log a message to the console each time the hashCode() method is called.
Here’s the sample application’s entry point:
public class Application {

    public static void main(String[] args) {
        Map<User, User> users = new HashMap<>();
        User user1 = new User(1L, "John", "john@domain.com");
        User user2 = new User(2L, "Jennifer", "jennifer@domain.com");
        User user3 = new User(3L, "Mary", "mary@domain.com");

        users.put(user1, user1);
        users.put(user2, user2);
        users.put(user3, user3);
        if (users.containsKey(user1)) {
            System.out.print("User found in the collection");
        }
    }
}
And this is the hashCode() implementation:
public class User {

    // ...

    public int hashCode() {
        int hash = 7;
        hash = 31 * hash + (int) id;
        hash = 31 * hash + (name == null ? 0 : name.hashCode());
        hash = 31 * hash + (email == null ? 0 : email.hashCode());
        logger.info("hashCode() called - Computed hash: " + hash);
        return hash;
    }
}
Here, it’s important to note that each time an object is stored in the hash map and checked with the containsKey() method, hashCode() is invoked and the computed hash code is printed out to the console:
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: 1255477819
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: -282948472
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: -1540702691
[main] INFO com.baeldung.entities.User - hashCode() called - Computed hash: 1255477819
User found in the collection
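The four log lines correspond to three put() calls and one containsKey() call. The same count can be reproduced without a logging dependency; the following sketch swaps the logger for a plain counter and uses a hypothetical, simplified Key class in place of User:

```java
import java.util.HashMap;
import java.util.Map;

class HashCodeCallCountSketch {

    // Counts hashCode() invocations; a plain counter stands in for the logger
    static int calls = 0;

    // Simplified, hypothetical key type identified only by its id
    static final class Key {
        private final long id;

        Key(long id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            calls++;
            return Long.hashCode(id);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && id == ((Key) o).id;
        }
    }

    // Mirrors the sample application: three insertions followed by one lookup
    static int countCalls() {
        calls = 0;
        Map<Key, String> users = new HashMap<>();
        Key first = new Key(1L);
        users.put(first, "John");
        users.put(new Key(2L), "Jennifer");
        users.put(new Key(3L), "Mary");
        users.containsKey(first);
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("hashCode() calls: " + countCalls());
    }
}
```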
9. Conclusion
It’s clear that producing efficient hashCode() implementations often requires a mixture of a few mathematical concepts (e.g. prime and arbitrary numbers) together with logical and basic mathematical operations.
Regardless, we can implement hashCode() effectively without resorting to these techniques at all. We just need to make sure that the hashing algorithm is consistent with the implementation of equals() and that, as far as is practical, it produces different hash codes for unequal objects.
As always, all the code examples shown in this article are available over on GitHub.