Generating Unique Positive long Using UUID in Java

Last modified: November 30, 2023

1. Overview

A Universally Unique Identifier (UUID) is a 128-bit number designed to be globally unique. In practice, UUIDs suit situations that require unique identification, such as creating the primary key for a database table.

Java provides the long primitive, a 64-bit numeric type that is easier for humans to read than a 36-character UUID string. In many cases, a 64-bit long provides sufficient uniqueness with a low collision probability. Furthermore, databases such as MySQL and PostgreSQL are optimized to work efficiently with numeric data types.

In this article, we’ll discuss generating unique positive long values using UUID, focusing on version 4 UUIDs.

2. Generate Unique Positive long

This scenario presents an interesting challenge because a UUID has 128 bits, while a long has only 64. Converting one to the other discards at least half of the bits, which reduces the potential for uniqueness. We’ll discuss how to obtain unique positive long values using randomly generated UUIDs and how effective this approach is.

2.1. Using getLeastSignificantBits()

The method getLeastSignificantBits() of the UUID class returns the lowest 64 bits of the UUID. This means it only provides half of the 128-bit UUID value.

So, let’s call it after the randomUUID() method:

long randomPositiveLong = Math.abs(UUID.randomUUID().getLeastSignificantBits());

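We’ll continue to use Math.abs() in each of the following methods to ensure positive values. One caveat: Math.abs(Long.MIN_VALUE) returns Long.MIN_VALUE, which is negative, so the result isn’t strictly guaranteed to be non-negative. For the least significant bits of a version 4 UUID, this happens only when all 62 random bits are zero.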
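Because Math.abs() returns a negative result for Long.MIN_VALUE, one alternative is to clear the sign bit instead of taking the absolute value. The following is a minimal sketch and not part of the article's examples; the class and method names (NonNegativeLongs, toNonNegative) are hypothetical:

```java
import java.util.UUID;

// Hypothetical helper class, shown only to illustrate the sign-bit mask.
class NonNegativeLongs {

    // Clears the sign bit, so the result is always in [0, Long.MAX_VALUE].
    static long toNonNegative(long value) {
        return value & Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        // Math.abs() cannot represent the absolute value of Long.MIN_VALUE.
        System.out.println(Math.abs(Long.MIN_VALUE) < 0);

        long lsb = UUID.randomUUID().getLeastSignificantBits();
        System.out.println(toNonNegative(lsb) >= 0);
    }
}
```

Note that masking doesn't produce the same numbers as Math.abs(): a negative input maps to a different non-negative value. It only guarantees the sign of the result.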

2.2. Using getMostSignificantBits()

Similarly, the getMostSignificantBits() method in the UUID class also returns a 64-bit long. The difference is that it returns the highest 64 bits of the 128-bit UUID value.

Again, we’ll chain it after the randomUUID() method:

long randomPositiveLong = Math.abs(UUID.randomUUID().getMostSignificantBits());

Let’s assert that the resulting value isn’t negative. For the most significant bits, the fixed version bits mean the raw value can never be 0 or Long.MIN_VALUE, so the result is in fact strictly positive:

assertThat(randomPositiveLong).isNotNegative();

3. How Effective?

Let’s discuss how effective the use of UUIDs is in this case. To find out, we’ll look at several factors.

3.1. Security and Efficiency

UUID.randomUUID() in Java uses the SecureRandom class to generate secure random numbers to produce a version 4 UUID. This is especially important if the resulting UUID will be used in a security or cryptographic context.

To assess the efficiency and relevance of using UUIDs for generating unique positive long values, let’s see the source code:

public static UUID randomUUID() {
    SecureRandom ng = Holder.numberGenerator;

    byte[] randomBytes = new byte[16];
    ng.nextBytes(randomBytes);
    randomBytes[6]  &= 0x0f;  /* clear version        */
    randomBytes[6]  |= 0x40;  /* set to version 4     */
    randomBytes[8]  &= 0x3f;  /* clear variant        */
    randomBytes[8]  |= (byte) 0x80;  /* set to IETF variant  */
    return new UUID(randomBytes);
}

The method uses SecureRandom to generate 16 random bytes forming a UUID, then adjusts several bits in those bytes to specify the UUID version (version 4) and UUID variant (IETF).
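To make the byte layout concrete, the following sketch repeats the same bit adjustments outside the JDK and builds the UUID through the public UUID(long, long) constructor. It's an illustration under stated assumptions rather than the JDK's code: the class name Version4Sketch and the method randomVersion4() are hypothetical, and ByteBuffer is used only to split the 16 bytes into two big-endian longs.

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.UUID;

// Hypothetical class, shown only to illustrate the version/variant bit layout.
class Version4Sketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Applies the same bit adjustments as the JDK snippet, then splits the
    // 16 bytes into two longs for the public UUID(long, long) constructor.
    static UUID randomVersion4() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        bytes[6] &= 0x0f;        // clear the version nibble
        bytes[6] |= 0x40;        // set version 4
        bytes[8] &= 0x3f;        // clear the variant bits
        bytes[8] |= (byte) 0x80; // set the IETF variant

        ByteBuffer buffer = ByteBuffer.wrap(bytes); // big-endian by default
        long mostSigBits = buffer.getLong();        // bytes 0..7
        long leastSigBits = buffer.getLong();       // bytes 8..15
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID uuid = randomVersion4();
        System.out.println(uuid + " version=" + uuid.version() + " variant=" + uuid.variant());
    }
}
```

Byte 6 ends up in the most significant half and byte 8 is the first byte of the least significant half, which is why the version affects getMostSignificantBits() and the variant affects getLeastSignificantBits().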

While UUIDs offer powerful features, we don’t need them here: a simpler approach can produce a random long directly, without generating 16 random bytes and discarding half of them. Consequently, alternative methods might offer improved efficiency and a better fit.

Additionally, because the version and variant bits are fixed, 6 of the 128 bits carry no randomness, so each 64-bit half contains fewer than 64 random bits.

3.2. Uniqueness and Collision Probability

Although UUID v4 has a range of 128 bits, four bits are used to indicate version 4, and two bits are used to indicate the variant. As we know, in the UUID display format, each character represents a four-bit hexadecimal digit:

xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx

The digit 4 is the four-bit UUID version (in this case, version 4). Meanwhile, the digit ‘y’ holds the two-bit IETF variant in its top two bits, so it’s always 8, 9, a, or b. Together with the two remaining bits of ‘y’, the x digits contain 122 random bits. So, the probability that two independently generated version 4 UUIDs collide is 1/2^122.
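The display format can be checked directly on the string form. The sketch below is only an illustration: the class UuidFormatCheck and the method looksLikeVersion4() are hypothetical names, and the check assumes the lowercase hexadecimal output that UUID.toString() produces.

```java
import java.util.UUID;

// Hypothetical class, shown only to illustrate the textual layout.
class UuidFormatCheck {

    // True when the text has '4' as the version digit (index 14) and
    // 8, 9, a, or b as the variant digit (index 19).
    static boolean looksLikeVersion4(String text) {
        return text.length() == 36
            && text.charAt(14) == '4'
            && "89ab".indexOf(text.charAt(19)) >= 0;
    }

    public static void main(String[] args) {
        String text = UUID.randomUUID().toString();
        System.out.println(text + " -> " + looksLikeVersion4(text));
    }
}
```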

The RFC explains in general how UUID v4 is created. To see more clearly how the bit positions are distributed, we can look again at the implementation of UUID.randomUUID():

randomBytes[6]  &= 0x0f;  /* clear version        */
randomBytes[6]  |= 0x40;  /* set to version 4     */

We can see that in randomBytes[6], four bits are fixed as the version marker in the most significant bits (MSB). So, there are only 60 truly random bits in the MSB, and the probability that two independently generated MSB values collide is 1/2^60.

Math.abs() can only create an extra collision when both a value and its negation can occur. Negating a value whose version nibble is 4 always changes that nibble to b or c, so the negation of a valid MSB value is never itself a valid MSB value. Adding Math.abs() therefore leaves the collision probability of a positive long value from the MSB at 1 in 2^60.
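To see how Math.abs() interacts with the version bits, the following sketch prints the version nibble (bits 12 to 15 of the MSB) for a few values and for their negations. In two's complement, negating a value whose nibble is 4 yields a nibble of b or c, so a value and its negation can't both come from a version 4 UUID. The class and helper names (MsbNegationCheck, versionNibble) are hypothetical:

```java
import java.util.UUID;

// Hypothetical class, shown only to illustrate the effect of negation.
class MsbNegationCheck {

    // Extracts bits 12..15 of the most significant half, where the version lives.
    static int versionNibble(long mostSigBits) {
        return (int) ((mostSigBits >>> 12) & 0x0f);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long msb = UUID.randomUUID().getMostSignificantBits();
            System.out.printf("nibble=%x, nibble of negation=%x%n",
                versionNibble(msb), versionNibble(-msb));
        }
    }
}
```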

Then, in randomBytes[8], two bits are fixed as the IETF variant at the top of the least significant bits (LSB):

randomBytes[8] &= 0x3f; /* clear variant */ 
randomBytes[8] |= (byte) 0x80; /* set to IETF variant */

So, there are only 62 truly random bits in the LSB, and the probability that two independently generated LSB values collide is 1/2^62.

Because the variant sets the top bit of the LSB to 1, the raw value is always negative. Math.abs() therefore just flips the sign of every value (except Long.MIN_VALUE, which it returns unchanged) and never merges two different values. The probability of a collision of a positive long value from the LSB remains 1 in 2^62.
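The effect of the variant bits on the sign can be observed with a short sketch. The class LsbSignCheck and the helper variantBits() are hypothetical names used only for this illustration; the top two bits of the LSB should always be binary 10, which makes the raw long negative:

```java
import java.util.UUID;

// Hypothetical class, shown only to illustrate the sign of the LSB.
class LsbSignCheck {

    // Returns the top two bits of the least significant half (the variant field).
    static int variantBits(long leastSigBits) {
        return (int) (leastSigBits >>> 62);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            long lsb = UUID.randomUUID().getLeastSignificantBits();
            System.out.println(lsb + " variantBits=" + variantBits(lsb));
        }
    }
}
```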

So, we can see that the probability of collision is small. However, if we ask whether the contents of UUIDs are completely random, then the answer is no.
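The figures above describe a single pair of values. Across many generated values, the chance of at least one collision grows roughly with the square of the count (the birthday problem). The sketch below uses the standard approximation 1 - exp(-n(n-1) / 2^(bits+1)); the class name, method name, and the sample count of one billion are illustrative assumptions:

```java
// Hypothetical class, shown only to illustrate the birthday approximation.
class BirthdayEstimate {

    // Approximate probability of at least one collision among n values drawn
    // uniformly from 2^bits possibilities: 1 - exp(-n(n-1) / 2^(bits+1)).
    static double collisionProbability(double n, int bits) {
        double pairs = n * (n - 1) / 2.0;
        double space = Math.pow(2.0, bits);
        return -Math.expm1(-pairs / space);
    }

    public static void main(String[] args) {
        double n = 1_000_000_000d; // one billion generated IDs (illustrative)
        for (int bits : new int[] {60, 62, 122}) {
            System.out.printf("%d random bits: %.3e%n", bits, collisionProbability(n, bits));
        }
    }
}
```

Math.expm1() is used instead of Math.exp() so that very small probabilities aren't rounded to zero.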

3.3. Suitability for Purpose

UUIDs are designed for global uniqueness, identification, security, and portability. For generating unique positive long values, more efficient methods exist, making UUIDs unnecessary in this context.

While UUID.randomUUID() produces a 128-bit value, we’ve seen that only 122 bits are actually random, and Java’s long data type holds only 64 bits. When we keep just one 64-bit half of the UUID, at least 60 of those random bits are discarded. Consider this trade-off if uniqueness is critical.

4. SecureRandom as an Alternative

If we need unique long values, it makes more sense to use a random number generator with an appropriate range, for example SecureRandom. This gives us random long values across the full 64-bit range, without discarding most of the generated bits as happens when we use UUIDs:

SecureRandom secureRandom = new SecureRandom();
long randomPositiveLong = Math.abs(secureRandom.nextLong());

It also has a lower probability of collision because it generates completely random 64-bit long values.

To ensure positive values, we simply add Math.abs(). Since Math.abs() maps each value and its negation to the same result, about 2^63 distinct results remain, and the probability that two generated values collide is approximately 1/2^63. In decimal form, this probability is approximately 1.08 × 10^-19. For most practical applications, we can consider this low probability to be insignificant.
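As a variation on the snippet above, the sign bit can be dropped with an unsigned shift instead of Math.abs(). This is a sketch rather than the article's own code: the class SecureRandomLongs and the method nextNonNegativeLong() are hypothetical names. The shift keeps 63 random bits, so all 2^63 non-negative values are equally likely and the Long.MIN_VALUE edge case of Math.abs() doesn't arise:

```java
import java.security.SecureRandom;

// Hypothetical class, shown only to illustrate a Math.abs()-free variant.
class SecureRandomLongs {

    // SecureRandom is safe for use by multiple threads, so one instance is reused.
    private static final SecureRandom RANDOM = new SecureRandom();

    // Shifts out one bit so the result has 63 random bits and is never negative.
    static long nextNonNegativeLong() {
        return RANDOM.nextLong() >>> 1;
    }

    public static void main(String[] args) {
        System.out.println(nextNonNegativeLong());
    }
}
```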

5. Conclusion

In conclusion, although UUIDs provide globally unique identifiers, they may not be the most efficient choice for generating unique positive long values, as significant bit loss occurs.

Utilizing methods like getMostSignificantBits() and getLeastSignificantBits() can still provide low collision probabilities, but using a random number generator like SecureRandom might be more efficient and suitable for generating unique positive long values directly.

As always, the full source code is available over on GitHub.
