1. Overview
Since Java’s early days, its integral numeric types, with the exception of char, have all been signed. In many situations, however, we need unsigned values. For example, when we count the number of occurrences of an event, we never expect to encounter a negative value.
Support for unsigned arithmetic has been part of the JDK since version 8. It comes in the form of the Unsigned Integer API, which consists primarily of static methods in the Integer and Long classes.
In this tutorial, we’ll go over this API and give instructions on how to use unsigned numbers correctly.
2. Bit-Level Representations
To understand how to handle signed and unsigned numbers, let’s first take a look at their representation at the bit level.
In Java, integers are encoded using the two’s complement system. With this encoding, many basic arithmetic operations, including addition, subtraction, and multiplication, are carried out in the same way whether the operands are signed or unsigned.
Things should be clearer with a code example. For the sake of simplicity, we’ll use variables of the byte primitive data type. Operations are similar for the other integral numeric types, such as short, int, or long.
Assume we have a variable of type byte with the value 100. This number has the binary representation 0110_0100.
Let’s double this value:
byte b1 = 100;
byte b2 = (byte) (b1 << 1);
The left shift operator in the given code moves all the bits in variable b1 one position to the left, which doubles the value. Here, b1 is first promoted to int, so the shift yields 200, and the cast back to byte keeps only the low eight bits. The binary representation of variable b2 is then 1100_1000.
In an unsigned type system, this value represents a decimal number equivalent to 2^7 + 2^6 + 2^3, or 200. In a signed system, however, the left-most bit works as the sign bit. Therefore, the result is -2^7 + 2^6 + 2^3, or -56.
A quick test can verify the outcome:
assertEquals(-56, b2);
We can see that the computations on signed and unsigned numbers are the same. Differences appear only when the JVM interprets a binary representation as a decimal number.
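As a small illustration of that point, the sketch below reads the bit pattern of b2 both ways. The class name BitPatternDemo and the helper bits are placeholders made up for this example; Byte.toUnsignedInt and Integer.toBinaryString are standard JDK methods.

```java
public class BitPatternDemo {

    // Returns the eight bits of a byte as a string of 0s and 1s.
    static String bits(byte value) {
        String binary = Integer.toBinaryString(Byte.toUnsignedInt(value));
        // Left-pad with zeros so that the result always has eight digits.
        return "00000000".substring(binary.length()) + binary;
    }

    public static void main(String[] args) {
        byte b1 = 100;
        byte b2 = (byte) (b1 << 1);

        System.out.println(bits(b1));               // 01100100
        System.out.println(bits(b2));               // 11001000
        System.out.println(b2);                     // signed reading: -56
        System.out.println(Byte.toUnsignedInt(b2)); // unsigned reading: 200
    }
}
```

The stored bits never change; only the method used to read them decides whether we see -56 or 200.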
The addition, subtraction, and multiplication operations can therefore work with unsigned numbers without requiring any changes in the JDK. Other operations, such as comparison or division, handle signed and unsigned numbers differently.
This is where the Unsigned Integer API comes into play.
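Before moving on, here is a minimal sketch of why addition and multiplication need no dedicated unsigned methods. It performs ordinary int arithmetic and switches to the unsigned view only when producing text. The class and method names are illustrative and not part of the JDK.

```java
public class UnsignedArithmeticDemo {

    // Adds two ints with the ordinary + operator and reads the result as unsigned.
    static String addAsUnsigned(int a, int b) {
        return Integer.toUnsignedString(a + b);
    }

    // Multiplies two ints with the ordinary * operator and reads the result as unsigned.
    static String multiplyAsUnsigned(int a, int b) {
        return Integer.toUnsignedString(a * b);
    }

    public static void main(String[] args) {
        int twoBillion = 2_000_000_000;

        // The sum overflows the signed range ...
        System.out.println(twoBillion + twoBillion);               // -294967296
        // ... but the same bits read as unsigned give the expected sum.
        System.out.println(addAsUnsigned(twoBillion, twoBillion)); // 4000000000
        System.out.println(multiplyAsUnsigned(twoBillion, 2));     // 4000000000
    }
}
```

The results are only correct modulo 2^32, so a sum above 4294967295 would still wrap around.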
3. The Unsigned Integer API
The Unsigned Integer API, introduced in Java 8, provides support for unsigned integer arithmetic. Most members of this API are static methods in the Integer and Long classes.
Methods in these classes work similarly. We’ll thus focus on the Integer class only, leaving out the Long class for brevity.
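For readers who want a quick look at the Long side anyway, the following sketch calls a few of its counterparts. The class name UnsignedLongDemo and the helper isBelow are made up for this example; the Long methods themselves are part of the JDK since version 8.

```java
public class UnsignedLongDemo {

    // Returns true if a is smaller than b when both are read as unsigned 64-bit values.
    static boolean isBelow(long a, long b) {
        return Long.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        long max = Long.MAX_VALUE;
        long min = Long.MIN_VALUE;

        System.out.println(isBelow(max, min));                // true
        System.out.println(Long.divideUnsigned(min, max));    // 1
        System.out.println(Long.remainderUnsigned(min, max)); // 1
        System.out.println(Long.toUnsignedString(min));       // 9223372036854775808
        System.out.println(Long.parseUnsignedLong("9223372036854775808") == min); // true
    }
}
```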
3.1. Comparison
The Integer class defines a method named compareUnsigned to compare unsigned numbers. This method treats all binary values as unsigned, ignoring the notion of the sign bit.
Let’s start with two numbers at the boundaries of the int data type:
int positive = Integer.MAX_VALUE;
int negative = Integer.MIN_VALUE;
If we compare these numbers as signed values, positive is obviously greater than negative:
int signedComparison = Integer.compare(positive, negative);
assertEquals(1, signedComparison);
When comparing numbers as unsigned values, the left-most bit is considered the most significant bit instead of the sign bit. Thus, the result is different, with positive being smaller than negative:
int unsignedComparison = Integer.compareUnsigned(positive, negative);
assertEquals(-1, unsignedComparison);
It should be clearer if we take a look at the binary representation of those numbers:
- MAX_VALUE -> 0111_1111_…_1111
- MIN_VALUE -> 1000_0000_…_0000
When the left-most bit is treated as a regular value bit, MIN_VALUE is exactly one larger than MAX_VALUE. This test confirms that:
assertEquals(negative, positive + 1);
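One way to picture what compareUnsigned does is to flip the sign bit of both operands and then compare them as signed values. The sketch below takes that approach. It illustrates the idea under that assumption and isn’t necessarily how the JDK implements the method; the class and method names are hypothetical.

```java
public class UnsignedCompareDemo {

    // Flipping the top bit maps the unsigned order onto the signed order:
    // 0 becomes MIN_VALUE and 0xFFFF_FFFF becomes MAX_VALUE.
    static int compareByFlippingSignBit(int x, int y) {
        return Integer.compare(x ^ Integer.MIN_VALUE, y ^ Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        int positive = Integer.MAX_VALUE;
        int negative = Integer.MIN_VALUE;

        // Both approaches agree that positive is below negative.
        System.out.println(compareByFlippingSignBit(positive, negative) < 0); // true
        System.out.println(Integer.compareUnsigned(positive, negative) < 0);  // true

        // -1 has all bits set, so it is the largest unsigned value.
        System.out.println(compareByFlippingSignBit(-1, 0) > 0);              // true
    }
}
```

The example checks only the sign of each result, since the documentation of the compare methods promises a negative, zero, or positive value, not a specific number.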
3.2. Division and Modulo
Just like the comparison operation, the unsigned division and modulo operations process all bits as value bits. The quotients and remainders can therefore differ between the signed and unsigned versions of these operations:
int positive = Integer.MAX_VALUE;
int negative = Integer.MIN_VALUE;
assertEquals(-1, negative / positive);
assertEquals(1, Integer.divideUnsigned(negative, positive));
assertEquals(-1, negative % positive);
assertEquals(1, Integer.remainderUnsigned(negative, positive));
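To make the unsigned results easier to check by hand, the following sketch widens both operands to long with Integer.toUnsignedLong and uses the ordinary operators. The helper names are hypothetical, and this is only one way to cross-check the API, not a description of its internals.

```java
public class UnsignedDivisionDemo {

    // Divides two ints as unsigned values by widening them to long first.
    static long divideViaLong(int dividend, int divisor) {
        return Integer.toUnsignedLong(dividend) / Integer.toUnsignedLong(divisor);
    }

    // Computes the unsigned remainder the same way.
    static long remainderViaLong(int dividend, int divisor) {
        return Integer.toUnsignedLong(dividend) % Integer.toUnsignedLong(divisor);
    }

    public static void main(String[] args) {
        int dividend = -1; // 4294967295 when read as unsigned
        int divisor = 10;

        System.out.println(dividend / divisor);                           // 0
        System.out.println(Integer.divideUnsigned(dividend, divisor));    // 429496729
        System.out.println(divideViaLong(dividend, divisor));             // 429496729
        System.out.println(Integer.remainderUnsigned(dividend, divisor)); // 5
        System.out.println(remainderViaLong(dividend, divisor));          // 5
    }
}
```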
3.3. Parsing
When parsing a String using the parseUnsignedInt method, the text argument can represent a number greater than MAX_VALUE.
A value that large cannot be parsed with the parseInt method, which can only handle textual representations of numbers from MIN_VALUE to MAX_VALUE.
The following test case verifies the parsing results:
Throwable thrown = catchThrowable(() -> Integer.parseInt("2147483648"));
assertThat(thrown).isInstanceOf(NumberFormatException.class);
assertEquals(Integer.MAX_VALUE + 1, Integer.parseUnsignedInt("2147483648"));
Notice that the parseUnsignedInt method can parse a string representing a number larger than MAX_VALUE, up to 2^32 - 1, but it fails to parse any negative representation.
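The limits of parseUnsignedInt can be probed with a short sketch like the one below. The class and the helper isParsableAsUnsignedInt are hypothetical names introduced for this example.

```java
public class UnsignedParsingDemo {

    // Returns true if the text is accepted by Integer.parseUnsignedInt.
    static boolean isParsableAsUnsignedInt(String text) {
        try {
            Integer.parseUnsignedInt(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The largest accepted value is 2^32 - 1, which is stored as the int -1.
        System.out.println(Integer.parseUnsignedInt("4294967295")); // -1

        System.out.println(isParsableAsUnsignedInt("4294967295"));  // true
        System.out.println(isParsableAsUnsignedInt("4294967296"));  // false
        System.out.println(isParsableAsUnsignedInt("-1"));          // false
    }
}
```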
3.4. Formatting
Similar to parsing, when formatting a number, an unsigned operation regards all bits as value bits. Consequently, we can produce the textual representation of a number about twice as large as MAX_VALUE.
The following test case confirms the formatting result of MIN_VALUE in both cases, signed and unsigned:
String signedString = Integer.toString(Integer.MIN_VALUE);
assertEquals("-2147483648", signedString);
String unsignedString = Integer.toUnsignedString(Integer.MIN_VALUE);
assertEquals("2147483648", unsignedString);
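A few related conversions may be worth a look as well. The sketch below, with a hypothetical class name and helper, formats the int whose bits are all set. It uses the radix overload of toUnsignedString and the widening conversion toUnsignedLong, both of which belong to the Integer class.

```java
public class UnsignedFormattingDemo {

    // Formats an int as an unsigned decimal number by widening it to long first.
    static String formatViaLong(int value) {
        return Long.toString(Integer.toUnsignedLong(value));
    }

    public static void main(String[] args) {
        int allBitsSet = -1;

        System.out.println(Integer.toUnsignedString(allBitsSet));     // 4294967295
        System.out.println(Integer.toUnsignedString(allBitsSet, 16)); // ffffffff
        System.out.println(Integer.toUnsignedLong(allBitsSet));       // 4294967295
        System.out.println(formatViaLong(Integer.MIN_VALUE));         // 2147483648
    }
}
```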
4. Pros and Cons
Many developers, especially those coming from a language that supports unsigned data types, such as C, welcome the introduction of unsigned arithmetic operations. However, this isn’t necessarily a good thing.
There are two main reasons for the demand for unsigned numbers.
First, there are cases in which a negative value can never occur, and using an unsigned type can rule out such a value in the first place. Second, with an unsigned type, we can double the range of usable positive values compared to its signed counterpart.
Let’s analyze the rationale behind each of these arguments.
Even when a variable should always be non-negative, a value less than 0 can be handy for indicating an exceptional situation.
For instance, the String.indexOf method returns the position of the first occurrence of a certain character in a string. The index -1 can easily denote the absence of such a character.
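The indexOf convention can be seen in a short sketch. The class SentinelDemo and the method describe are invented for this illustration.

```java
public class SentinelDemo {

    // Describes where a character occurs in a text, relying on -1 as "not found".
    static String describe(String text, char target) {
        int index = text.indexOf(target);
        if (index < 0) {
            return "'" + target + "' not found";
        }
        return "'" + target + "' found at index " + index;
    }

    public static void main(String[] args) {
        System.out.println(describe("unsigned", 's')); // 's' found at index 2
        System.out.println(describe("unsigned", 'x')); // 'x' not found
    }
}
```

With an unsigned return type, the method would need some other way to report a missing character, such as an exception or a wrapper object.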
The other argument for unsigned numbers is the expansion of the value space. However, if the range of a signed type isn’t enough, it’s unlikely that a doubled range would suffice.
If a data type isn’t large enough, we need to use another data type that supports much larger values, such as long instead of int, or BigInteger rather than long.
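As a rough sketch of that advice, the example below accumulates int values in a long and then moves to BigInteger for a value that no 64-bit type can hold. The class name and the sum helper are hypothetical.

```java
import java.math.BigInteger;

public class WiderTypeDemo {

    // Sums int values in a long so that the total cannot overflow the int range.
    static long sum(int... values) {
        long total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int max = Integer.MAX_VALUE;

        // Three times MAX_VALUE exceeds even the unsigned int range (2^32 - 1).
        System.out.println(sum(max, max, max)); // 6442450941

        // 2^64 does not fit in 64 bits, whether they are read as signed or unsigned.
        BigInteger twoToThe64 = BigInteger.ONE.shiftLeft(64);
        System.out.println(twoToThe64);         // 18446744073709551616
    }
}
```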
Another problem with the Unsigned Integer API is that the binary form of a number is the same regardless of whether it’s signed or unsigned. Nothing in the type system marks an int as holding an unsigned value, so it’s easy to mix signed and unsigned values, which may lead to unexpected results.
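A minimal sketch of such a mix-up follows. It assumes a counter that was parsed as an unsigned value and is later handled with the ordinary signed operators; the class and method names are made up for the example.

```java
public class MixedSignednessDemo {

    // Halves a value that is meant to be unsigned, using the signed operator by mistake.
    static int halveSigned(int unsignedValue) {
        return unsignedValue / 2;
    }

    // Halves the same value with the unsigned method.
    static int halveUnsigned(int unsignedValue) {
        return Integer.divideUnsigned(unsignedValue, 2);
    }

    public static void main(String[] args) {
        int count = Integer.parseUnsignedInt("3000000000");

        // The signed operators see a negative number.
        System.out.println(count > 0);            // false
        System.out.println(halveSigned(count));   // -647483648
        System.out.println(halveUnsigned(count)); // 1500000000
    }
}
```

The compiler accepts both versions without complaint, so the mistake only shows up at runtime.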
5. Conclusion
Support for unsigned arithmetic in Java came at the request of many developers. However, the benefits it brings are unclear. We should exercise caution when using this feature to avoid unexpected outcomes.
As always, the source code for this article is available over on GitHub.