Creating Unicode Character From Its Code Point Hex String

Last modified: January 29, 2024

1. Overview

Java’s support for Unicode makes it straightforward to work with characters from diverse languages and scripts.

In this tutorial, we’ll explore and learn how to obtain Unicode characters from their code points in Java.

2. Introduction to the Problem

Java’s Unicode support allows us to build internationalized applications quickly. Let’s look at a couple of examples:

static final String U_CHECK = "✅"; // U+2705
static final String U_STRONG = "强"; // U+5F3A

In the example above, both the check mark “✅” and “强” (“strong” in Chinese) are Unicode characters.

We know that a Java string literal represents a Unicode character correctly if it consists of an escaped ‘u’ followed by a four-digit hexadecimal number, for example:

String check = "\u2705";
assertEquals(U_CHECK, check);

String strong = "\u5F3A";
assertEquals(U_STRONG, strong);

In some scenarios, we’re given the hexadecimal number after “\u” and need to get the corresponding Unicode character. For instance, the check mark “✅” should be produced when we receive the number “2705” as a string.

The first idea we might come up with is concatenating “\\u” and the number. However, this doesn’t do the job:

String check = "\\u" + "2705";
assertEquals("\\u2705", check);

String strong = "\\u" + "5F3A";
assertEquals("\\u5F3A", strong);

As the test shows, concatenating “\\u” and a number, such as “2705”, produces the literal string “\\u2705” instead of the check mark “✅”.
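
The likely reason is that the Java compiler translates Unicode escapes while it reads the source file, so an escape that is only assembled at run time is never interpreted. The following minimal sketch makes the difference visible by comparing string lengths. The class name EscapeVsConcatDemo and the helper concatenated() are hypothetical names used only for this illustration:

```java
class EscapeVsConcatDemo {

    // The compiler resolves this escape: the string holds one character, U+2705.
    static final String ESCAPED = "\u2705";

    // Hypothetical helper: plain concatenation of a backslash, the letter 'u',
    // and the given digits. Nothing interprets the result as an escape.
    static String concatenated(String hex) {
        return "\\u" + hex;
    }

    public static void main(String[] args) {
        String joined = concatenated("2705");
        System.out.println(ESCAPED.length());        // 1
        System.out.println(joined.length());         // 6
        System.out.println(ESCAPED.equals(joined));  // false
    }
}
```

The escaped literal is a single char, while the concatenated value is six separate characters that merely look like an escape sequence.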

Next, let’s explore how to convert the given number to the Unicode string.

3. Understanding the Hexadecimal Number After “\u”

Unicode assigns a unique code point to every character, providing a universal way to represent text across different languages and scripts. A code point is a numerical value that uniquely identifies a character in the Unicode standard.

To create a Unicode character in Java, we need to know the code point of the desired character. Once we have the code point, we can use Java’s char data type and the escape sequence ‘\u’ to represent the Unicode character.

In the “\uxxxx” notation, “xxxx” is the character’s code point in hexadecimal representation (this holds for characters up to U+FFFF, which fit in four hexadecimal digits). For example, the hexadecimal ASCII code of ‘A’ is 41 (decimal: 65). Therefore, we can get the string “A” if we use the Unicode notation “\u0041”:

assertEquals("A", "\u0041");
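
To check the relationship in the other direction, we can read a character’s code point and print it as four hexadecimal digits. This is only an illustrative sketch: the class CodePointHexDemo and the method firstCodePointHex() are hypothetical names, and the helper looks at the first code point of the string only:

```java
class CodePointHexDemo {

    // Hypothetical helper: returns the first code point of s as uppercase hex,
    // zero-padded to at least four digits.
    static String firstCodePointHex(String s) {
        return String.format("%04X", s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(firstCodePointHex("A"));       // 0041
        System.out.println(firstCodePointHex("\u2705"));  // 2705
        System.out.println(firstCodePointHex("\u5F3A"));  // 5F3A
    }
}
```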

So next, let’s see how to get the desired Unicode character from the hexadecimal number after “\u”.

4. Using the Character.toChars() Method

Now we understand what the hexadecimal number after “\u” indicates. When we receive “2705”, it’s the hexadecimal representation of a character’s code point.

If we get the code point integer, the Character.toChars(int codePoint) method can give us the char array that holds the code point’s UTF-16 representation. Finally, we can call String.valueOf() to get the target string:

Given "2705"
 |_ Hex(codePoint) = 0x2705
     |_ codePoint = 9989 (decimal)
         |_ char[] chars = Character.toChars(9989) 
            |_ String.valueOf(chars)
               |_"✅"

As we can see, to obtain our target character, we must find the code point first.

We can obtain the code point integer by parsing the provided string with radix 16 (hexadecimal) using the Integer.parseInt() method.

So next, let’s put everything together:

int codePoint = Integer.parseInt("2705", 16); // Decimal int: 9989
char[] checkChar = Character.toChars(codePoint);
String check = String.valueOf(checkChar);
assertEquals(U_CHECK, check);

codePoint = Integer.parseInt("5F3A", 16); // Decimal int: 24378
char[] strongChar = Character.toChars(codePoint);
String strong = String.valueOf(strongChar);
assertEquals(U_STRONG, strong);
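
The same steps also cover code points above U+FFFF, which is one reason Character.toChars() returns a char array rather than a single char. As a rough sketch, the example below uses U+1F600 (an emoji that isn’t among the article’s examples) and a hypothetical helper named fromHex():

```java
class SupplementaryCodePointDemo {

    // Hypothetical helper following the same parse-then-convert steps as above.
    static String fromHex(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String grinning = fromHex("1F600"); // U+1F600 lies outside the 16-bit range

        // Two UTF-16 code units (a surrogate pair), but a single code point.
        System.out.println(grinning.length());                             // 2
        System.out.println(grinning.codePointCount(0, grinning.length())); // 1
        System.out.println(grinning.equals("\uD83D\uDE00"));               // true
    }
}
```

A single char can’t hold such a character, so casting the parsed integer to char would silently produce the wrong result here.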

It’s worth noting that if we work with Java 11 or later, we can get the string directly from the code point integer using the Character.toString() method, for example:

// For Java 11 and later versions:
assertEquals(U_STRONG, Character.toString(codePoint));
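
On versions before Java 11, a couple of other standard library calls give the same result in one expression. The sketch below shows two of them, StringBuilder.appendCodePoint() and the String(int[], int, int) constructor. The class and method names are hypothetical and chosen only for this illustration:

```java
class CodePointToStringAlternatives {

    // Appends the code point to an empty builder and returns the result.
    static String viaStringBuilder(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    // Builds a string from a one-element array of code points.
    static String viaCodePointArray(int codePoint) {
        return new String(new int[] {codePoint}, 0, 1);
    }

    public static void main(String[] args) {
        int codePoint = Integer.parseInt("5F3A", 16);
        System.out.println(viaStringBuilder(codePoint).equals("\u5F3A"));  // true
        System.out.println(viaCodePointArray(codePoint).equals("\u5F3A")); // true
    }
}
```

The StringBuilder variant can be convenient when several code points need to be appended to the same string.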

Of course, we can wrap the implementation above in a method:

String stringFromCodePointHex(String codePointHex) {
    int codePoint = Integer.parseInt(codePointHex, 16);
    // For Java 11 and later versions: return Character.toString(codePoint)
    char[] chars = Character.toChars(codePoint);
    return String.valueOf(chars);
}

Finally, let’s test the method to make sure it produces the expected result:

assertEquals("A", stringFromCodePointHex("0041"));
assertEquals(U_CHECK, stringFromCodePointHex("2705"));
assertEquals(U_STRONG, stringFromCodePointHex("5F3A"));
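
The method above assumes well-formed input. Integer.parseInt() throws a NumberFormatException for a non-hexadecimal string, and Character.toChars() throws an IllegalArgumentException for a value outside the Unicode range. If the hex string comes from an untrusted source, one possible defensive variant returns an Optional instead. This is a sketch under that assumption, and the names SafeCodePointParser and tryStringFromCodePointHex() are hypothetical:

```java
import java.util.Optional;

class SafeCodePointParser {

    // Hypothetical variant: returns an empty Optional instead of throwing
    // when the input is null, not hexadecimal, or not a valid code point.
    static Optional<String> tryStringFromCodePointHex(String codePointHex) {
        if (codePointHex == null) {
            return Optional.empty();
        }
        int codePoint;
        try {
            codePoint = Integer.parseInt(codePointHex, 16);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        // Valid code points range from 0x0 to 0x10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            return Optional.empty();
        }
        return Optional.of(String.valueOf(Character.toChars(codePoint)));
    }

    public static void main(String[] args) {
        System.out.println(tryStringFromCodePointHex("2705").isPresent());   // true
        System.out.println(tryStringFromCodePointHex("xyz").isPresent());    // false
        System.out.println(tryStringFromCodePointHex("110000").isPresent()); // false
    }
}
```

Whether to return an Optional or let the exceptions propagate is a design choice that depends on how the caller wants to handle bad input.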

5. Conclusion

In this article, we first learned the significance of “xxxx” in the “\uxxxx” notation, then explored how to obtain the target Unicode string from the hexadecimal representation of a given code point.

As always, the complete source code for the examples is available over on GitHub.
