1. Introduction
We often need to convert between a String and a byte array in Java. In this tutorial, we’ll examine these operations in detail.
First, we’ll look at various ways to convert a String to a byte array. Then we’ll look at similar operations in reverse.
2. Converting a String to Byte Array
In Java, a String represents a sequence of Unicode characters. To convert it to a byte array, we translate the sequence of characters into a sequence of bytes. For this translation, we use an instance of Charset, a class that specifies a mapping between sequences of chars and sequences of bytes.
We refer to the above process as encoding.
In Java, we can encode a String into a byte array in multiple ways. Let’s look at each of them in detail with examples.
2.1. Using String.getBytes()
The String class provides three overloaded getBytes methods to encode a String into a byte array:
- getBytes() – encodes using the platform’s default charset
- getBytes(String charsetName) – encodes using the named charset
- getBytes(Charset charset) – encodes using the provided charset
First, let’s encode a string using the platform’s default charset:
String inputString = "Hello World!";
byte[] byteArray = inputString.getBytes();
The above method is platform-dependent, as it uses the platform’s default charset. We can get this charset by calling Charset.defaultCharset().
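As a minimal sketch of this platform dependence, the snippet below prints the default charset and checks that the no-argument getBytes() produces the same bytes as passing that charset explicitly. The class name DefaultCharsetDemo and its helper method are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo class, not part of the original tutorial.
class DefaultCharsetDemo {

    // Returns true when getBytes() matches encoding with the platform default charset.
    static boolean defaultMatchesExplicit(String input) {
        Charset platformDefault = Charset.defaultCharset();
        byte[] implicit = input.getBytes();
        byte[] explicit = input.getBytes(platformDefault);
        return Arrays.equals(implicit, explicit);
    }

    public static void main(String[] args) {
        // The printed name varies between machines and JVM settings.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(defaultMatchesExplicit("Hello World!"));
    }
}
```

Because the printed charset can differ from one machine to another, passing an explicit charset is usually the safer choice when the bytes leave the current process.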
Then let’s encode a string using a named charset:
@Test
public void whenGetBytesWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String inputString = "Hello World!";
    String charsetName = "IBM01140";

    byte[] byteArray = inputString.getBytes(charsetName);

    assertArrayEquals(
      new byte[] { -56, -123, -109, -109, -106, 64, -26,
        -106, -103, -109, -124, 90 },
      byteArray);
}
This method throws an UnsupportedEncodingException if the named charset isn’t supported.
The behavior of the above two versions is unspecified if the input contains characters that the charset doesn’t support. In contrast, the third version always replaces such input with the charset’s default replacement byte array.
Next, let’s call the third version of the getBytes() method, and pass an instance of Charset:
@Test
public void whenGetBytesWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = Charset.forName("ASCII");

    byte[] byteArray = inputString.getBytes(charset);

    // Each unsupported character is replaced with 63 ('?')
    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63,
        63, 63, 33 },
      byteArray);
}
Here we’re using the factory method Charset.forName to get an instance of the Charset. This method throws a runtime exception if the name of the requested charset is invalid (IllegalCharsetNameException). It also throws a runtime exception if the charset isn’t supported in the current JVM (UnsupportedCharsetException).
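To make the two failure modes of Charset.forName concrete, here is a small sketch that wraps the lookup and returns an empty Optional instead of propagating the exception. The class CharsetLookup and its find method are hypothetical helpers introduced for illustration; only Charset.forName and the two exception types come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Optional;

// Hypothetical helper class, not part of the original tutorial.
class CharsetLookup {

    // Looks up a charset by name; empty when the name is illegal or unsupported.
    static Optional<Charset> find(String name) {
        try {
            return Optional.of(Charset.forName(name));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Illegal name (e.g. contains a space) or charset unknown to this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(find("UTF-8").isPresent());
        System.out.println(find("no-such-charset-xyz").isPresent());
    }
}
```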
However, some charsets are guaranteed to be available on every Java platform. The StandardCharsets class defines constants for these charsets.
Finally, let’s encode using one of the standard charsets:
@Test
public void whenGetBytesWithStandardCharset_thenOK() {
    String inputString = "Hello World!";
    Charset charset = StandardCharsets.UTF_16;

    byte[] byteArray = inputString.getBytes(charset);

    // The first two bytes (-2, -1) are the UTF-16 byte order mark
    assertArrayEquals(
      new byte[] { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
        111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 },
      byteArray);
}
Thus, we have completed the review of the various getBytes versions. Next, let’s look into the method provided by Charset itself.
2.2. Using Charset.encode()
The Charset class provides encode(), a convenience method that encodes Unicode characters into bytes. This method always replaces invalid input and unmappable characters using the charset’s default replacement byte array.
Let’s use the encode method to convert a String into a byte array:
@Test
public void whenEncodeWithCharset_thenOK() {
    String inputString = "Hello ਸੰਸਾਰ!";
    Charset charset = StandardCharsets.US_ASCII;

    byte[] byteArray = charset.encode(inputString).array();

    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 63, 63, 63, 63, 63, 33 },
      byteArray);
}
As we can see above, unsupported characters have been replaced with the charset’s default replacement byte 63.
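One caveat with the example above: ByteBuffer.array() returns the buffer’s whole backing array, which may be larger than the encoded content for some charsets. A more defensive sketch copies only the bytes that remain between the buffer’s position and its limit. The class EncodeToArray and its encode method are hypothetical names used for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original tutorial.
class EncodeToArray {

    // Encodes the input and copies exactly the encoded bytes out of the buffer.
    static byte[] encode(String input, Charset charset) {
        ByteBuffer buffer = charset.encode(input);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hello World!", StandardCharsets.UTF_8);
        System.out.println(bytes.length);
    }
}
```

This way the result doesn’t depend on how much capacity the encoder happened to allocate.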
The approaches we have used so far use the CharsetEncoder class internally to perform encoding. Let’s examine this class in the next section.
2.3. CharsetEncoder
CharsetEncoder transforms Unicode characters into a sequence of bytes for a given charset. Moreover, it provides fine-grained control over the encoding process.
Let’s use this class to convert a String into a byte array:
@Test
public void whenUsingCharsetEncoder_thenOK()
  throws CharacterCodingException {
    String inputString = "Hello ਸੰਸਾਰ!";
    CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
    encoder.onMalformedInput(CodingErrorAction.IGNORE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith(new byte[] { 0 });

    byte[] byteArray = encoder.encode(CharBuffer.wrap(inputString))
      .array();

    // Unmappable characters are replaced with the custom byte 0
    assertArrayEquals(
      new byte[] { 72, 101, 108, 108, 111, 32, 0, 0, 0, 0, 0, 33 },
      byteArray);
}
Here we’re creating an instance of CharsetEncoder by calling the newEncoder method on a Charset object.
Then we’re specifying actions for error conditions by calling the onMalformedInput() and onUnmappableCharacter() methods. We can specify the following actions:
- IGNORE – drop the erroneous input
- REPLACE – replace the erroneous input
- REPORT – report the error by returning a CoderResult object or throwing a CharacterCodingException
Furthermore, we’re using the replaceWith() method to specify the replacement byte array.
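The REPORT action can be sketched as follows: with both error actions set to REPORT, encode() throws a CharacterCodingException as soon as it meets a character that US-ASCII can’t represent. The class StrictAsciiEncoder and its isPureAscii method are hypothetical names used only for illustration.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original tutorial.
class StrictAsciiEncoder {

    // Returns true only if every character of the input can be encoded as US-ASCII.
    static boolean isPureAscii(String input) {
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown because the error action is REPORT.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPureAscii("Hello World!"));
        System.out.println(isPureAscii("Hello ਸੰਸਾਰ!"));
    }
}
```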
Thus, we have completed the review of various approaches to convert a String to a byte array. Next, let’s look at the reverse operation.
3. Converting a Byte Array to String
We refer to the process of converting a byte array to a String as decoding. Similar to encoding, this process requires a Charset.
However, we can’t just use any charset for decoding a byte array. We should use the same charset that was used to encode the String into the byte array; otherwise, the decoded text may come out garbled.
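A short sketch of what goes wrong with a mismatched charset: the text below is encoded as UTF-8 and then decoded once with UTF-8 and once with ISO-8859-1. The class CharsetMismatch and its roundTrip method are hypothetical names chosen for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original tutorial.
class CharsetMismatch {

    // Encodes with one charset and decodes with another (possibly different) one.
    static String roundTrip(String input, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = input.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // "héllo"
        // Same charset on both sides: the text survives.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // Mismatched charsets: the two UTF-8 bytes of 'é' decode as two Latin-1 characters.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

In the mismatched case, no exception is thrown; the result is simply a different string, which is why this kind of bug is easy to miss.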
We can also convert a byte array to a String in many ways. Let’s examine each of them in detail.
3.1. Using the String Constructor
The String class has a few constructors which take a byte array as input. They’re all similar to the getBytes method, but work in reverse.
So let’s convert a byte array to String using the platform’s default charset:
@Test
public void whenStringConstructorWithDefaultCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };

    String string = new String(byteArray);

    assertNotNull(string);
}
Note that we don’t assert anything here about the contents of the decoded string. This is because it may decode to something different, depending on the platform’s default charset.
For this reason, we should generally avoid this method.
Then let’s use a named charset for decoding:
@Test
public void whenStringConstructorWithNamedCharset_thenOK()
  throws UnsupportedEncodingException {
    String charsetName = "IBM01140";
    byte[] byteArray = { -56, -123, -109, -109, -106, 64, -26, -106,
      -103, -109, -124, 90 };

    String string = new String(byteArray, charsetName);

    assertEquals("Hello World!", string);
}
This constructor throws an UnsupportedEncodingException if the named charset isn’t available on the JVM.
Next, let’s use a Charset object to do decoding:
@Test
public void whenStringConstructorWithCharSet_thenOK() {
    Charset charset = Charset.forName("UTF-8");
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, 87, 111, 114,
      108, 100, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}
Finally, let’s use a standard Charset for the same:
@Test
public void whenStringConstructorWithStandardCharSet_thenOK() {
    Charset charset = StandardCharsets.UTF_16;
    byte[] byteArray = { -2, -1, 0, 72, 0, 101, 0, 108, 0, 108, 0,
      111, 0, 32, 0, 87, 0, 111, 0, 114, 0, 108, 0, 100, 0, 33 };

    String string = new String(byteArray, charset);

    assertEquals("Hello World!", string);
}
So far, we have converted a byte array into a String using the constructor, and now we’ll look into the other approaches.
3.2. Using Charset.decode()
The Charset class provides the decode() method that converts a ByteBuffer to String:
@Test
public void whenDecodeWithCharset_thenOK() {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111,
      114, 108, -63, 33 };
    Charset charset = StandardCharsets.US_ASCII;

    String string = charset.decode(ByteBuffer.wrap(byteArray))
      .toString();

    // U+FFFD is the replacement character used for the two invalid bytes
    assertEquals("Hello \uFFFDorl\uFFFD!", string);
}
Here, the invalid input is replaced with the charset’s default replacement character, U+FFFD (�).
3.3. CharsetDecoder
Note that all of the previous approaches for decoding internally use the CharsetDecoder class. We can use this class directly for fine-grained control on the decoding process:
@Test
public void whenUsingCharsetDecoder_thenOK()
  throws CharacterCodingException {
    byte[] byteArray = { 72, 101, 108, 108, 111, 32, -10, 111, 114,
      108, -63, 33 };
    CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder();
    decoder.onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
      .replaceWith("?");

    String string = decoder.decode(ByteBuffer.wrap(byteArray))
      .toString();

    assertEquals("Hello ?orl?!", string);
}
Here we’re replacing invalid inputs and unsupported characters with “?”.
If we want to be informed in case of invalid inputs, we can change the decoder:
decoder.onMalformedInput(CodingErrorAction.REPORT)
  .onUnmappableCharacter(CodingErrorAction.REPORT);
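With REPORT configured, decode() throws a CharacterCodingException when it meets a byte the charset can’t interpret. The sketch below, which assumes a hypothetical StrictAsciiDecoder class with a decode helper, turns that exception into an empty Optional so the caller can decide how to react.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper class, not part of the original tutorial.
class StrictAsciiDecoder {

    // Decodes the bytes as US-ASCII; empty if any byte is not valid ASCII.
    static Optional<String> decode(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.US_ASCII.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(bytes)).toString());
        } catch (CharacterCodingException e) {
            // Raised instead of silently substituting a replacement character.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] { 72, 105 }));
        System.out.println(decode(new byte[] { 72, -10 }));
    }
}
```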
4. Conclusion
In this article, we investigated multiple ways to convert a String to a byte array, and vice versa. We should choose the appropriate method based on the input data, as well as the level of control required for invalid inputs.
As usual, the full source code can be found over on GitHub.