1. Overview
In this tutorial, we'll start by briefly going through some of the general category types that Unicode assigns to each code point, to understand the difference between letters and alphabetic characters.
Then, we'll look at the isAlphabetic() and isLetter() methods of Java's Character class. Finally, we'll cover the similarities and differences between these methods.
2. General Category Types of Unicode Characters
The Unicode Character Set (UCS) contains 1,114,112 code points, from U+0000 to U+10FFFF. Characters and code point ranges are grouped into categories.
The Character class provides two overloaded versions of the getType() method, each of which returns a value indicating the character's general category type.
Let's look at the signature of the first method:
public static int getType(char ch)
This method takes a single char, so it cannot handle supplementary characters. To handle all Unicode characters, including supplementary ones, the Character class provides an overloaded getType method with the following signature:
public static int getType(int codePoint)
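To make the difference between the two overloads concrete, here is a minimal sketch. The class name GetTypeOverloads and its helper methods are hypothetical names chosen for illustration. It uses U+1D400 (MATHEMATICAL BOLD CAPITAL A), a supplementary character that a Java String stores as a surrogate pair:

```java
final class GetTypeOverloads {
    // U+1D400 lies outside the Basic Multilingual Plane, so a Java String
    // stores it as two char values (a high and a low surrogate).
    static final String BOLD_A = "\uD835\uDC00";

    // Category seen by the char overload, which only receives the high surrogate.
    static int typeOfFirstChar(String s) {
        return Character.getType(s.charAt(0));
    }

    // Category seen by the int overload, which receives the full code point.
    static int typeOfFirstCodePoint(String s) {
        return Character.getType(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(typeOfFirstChar(BOLD_A) == Character.SURROGATE);
        System.out.println(typeOfFirstCodePoint(BOLD_A) == Character.UPPERCASE_LETTER);
    }
}
```

The char overload reports SURROGATE because it sees only half of the pair, while the int overload reports the category of the whole character.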
Next, let’s start looking at some general category types.
2.1. UPPERCASE_LETTER
The UPPERCASE_LETTER general category type represents upper-case letters.
When we call the Character#getType method on an upper-case letter, for example 'U', the method returns the value 1, which is the value of the Character.UPPERCASE_LETTER constant:
assertEquals(Character.UPPERCASE_LETTER, Character.getType('U'));
2.2. LOWERCASE_LETTER
The LOWERCASE_LETTER general category type represents lower-case letters.
When we call the Character#getType method on a lower-case letter, for instance 'u', the method returns the value 2, which is the value of the Character.LOWERCASE_LETTER constant:
assertEquals(Character.LOWERCASE_LETTER, Character.getType('u'));
2.3. TITLECASE_LETTER
Next, the TITLECASE_LETTER general category represents title-case characters.
Some characters look like pairs of Latin letters, with the first in upper case and the second in lower case. When we call the Character#getType method on such a character, it returns the value 3, which is the value of the Character.TITLECASE_LETTER constant:
assertEquals(Character.TITLECASE_LETTER, Character.getType('\u01f2'));
Here, the Unicode character '\u01f2' represents the Latin capital letter 'D' followed by a small letter 'z', combined into a single character.
2.4. MODIFIER_LETTER
A modifier letter, in the Unicode Standard, is "a letter or symbol typically written next to another letter that it modifies in some way".
The MODIFIER_LETTER general category type represents such modifier letters.
For example, when we pass the modifier letter small H, 'ʰ', to the Character#getType method, it returns the value 4, which is the value of the Character.MODIFIER_LETTER constant:
assertEquals(Character.MODIFIER_LETTER, Character.getType('\u02b0'));
The Unicode character '\u02b0' represents the modifier letter small H.
2.5. OTHER_LETTER
The OTHER_LETTER general category type represents an ideograph or a letter in a unicase alphabet. An ideograph is a graphic symbol representing an idea or a concept, independent of any particular language.
A unicase alphabet has just one case for its letters. For example, Hebrew is a unicase writing system.
Let's look at an example with the Hebrew letter Alef, 'א'. When we pass it to the Character#getType method, it returns the value 5, which is the value of the Character.OTHER_LETTER constant:
assertEquals(Character.OTHER_LETTER, Character.getType('\u05d0'));
The Unicode character '\u05d0' represents the Hebrew letter Alef.
2.6. LETTER_NUMBER
Finally, the LETTER_NUMBER category represents numerals composed of letters or letterlike symbols.
For example, the Roman numerals fall under the LETTER_NUMBER general category. When we call the Character#getType method with the Roman numeral five, 'Ⅴ', it returns the value 10, which is the value of the Character.LETTER_NUMBER constant:
assertEquals(Character.LETTER_NUMBER, Character.getType('\u2164'));
The Unicode character '\u2164' represents the Roman numeral five.
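As a recap of the six categories above, the following sketch maps the int returned by Character#getType back to the name of the matching constant. The CategoryNames class and its categoryName helper are hypothetical names introduced here for illustration, and only the categories discussed in this section are covered:

```java
final class CategoryNames {
    // Returns the name of the Character constant matching the general category
    // of the given code point; anything outside the six categories is "OTHER".
    static String categoryName(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.UPPERCASE_LETTER: return "UPPERCASE_LETTER";
            case Character.LOWERCASE_LETTER: return "LOWERCASE_LETTER";
            case Character.TITLECASE_LETTER: return "TITLECASE_LETTER";
            case Character.MODIFIER_LETTER:  return "MODIFIER_LETTER";
            case Character.OTHER_LETTER:     return "OTHER_LETTER";
            case Character.LETTER_NUMBER:    return "LETTER_NUMBER";
            default:                         return "OTHER";
        }
    }

    public static void main(String[] args) {
        // The sample characters used throughout this section.
        int[] samples = {'U', 'u', 0x01F2, 0x02B0, 0x05D0, 0x2164};
        for (int cp : samples) {
            System.out.printf("U+%04X %s%n", cp, categoryName(cp));
        }
    }
}
```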
Next, let’s look at the Character#isAlphabetic method.
3. Character#isAlphabetic
First, let's look at the signature of the isAlphabetic method:
public static boolean isAlphabetic(int codePoint)
The method takes a Unicode code point as its input parameter and returns true if that code point is alphabetic, and false otherwise.
A character is alphabetic if its general category type is any of the following:
- UPPERCASE_LETTER
- LOWERCASE_LETTER
- TITLECASE_LETTER
- MODIFIER_LETTER
- OTHER_LETTER
- LETTER_NUMBER
Additionally, a character is alphabetic if it has the contributory property Other_Alphabetic, as defined by the Unicode Standard.
Let's look at a few examples of characters that are alphabetic:
assertTrue(Character.isAlphabetic('A'));
assertTrue(Character.isAlphabetic('\u01f2'));
In the above examples, we pass the UPPERCASE_LETTER 'A' and the TITLECASE_LETTER '\u01f2', which represents the Latin capital letter 'D' followed by a small letter 'z', to the isAlphabetic method, and it returns true in both cases.
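The Other_Alphabetic rule is easier to see with a character whose general category is not in the list above. The sketch below uses U+093E (DEVANAGARI VOWEL SIGN AA) and assumes the JDK's Unicode data marks it as Other_Alphabetic, as the Unicode property list does. The class and constant names are hypothetical:

```java
final class OtherAlphabeticExample {
    // U+093E DEVANAGARI VOWEL SIGN AA has the general category
    // COMBINING_SPACING_MARK, which is not one of the letter categories.
    static final int DEVANAGARI_VOWEL_SIGN_AA = 0x093E;

    public static void main(String[] args) {
        int cp = DEVANAGARI_VOWEL_SIGN_AA;
        // Its category is a combining mark, so isLetter() rejects it ...
        System.out.println(Character.getType(cp) == Character.COMBINING_SPACING_MARK);
        System.out.println(Character.isLetter(cp));
        // ... while isAlphabetic() accepts it through Other_Alphabetic.
        System.out.println(Character.isAlphabetic(cp));
    }
}
```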
4. Character#isLetter
Java's Character class provides the isLetter() method to determine whether a specified character is a letter. Let's look at the method signature:
public static boolean isLetter(char ch)
It takes a character as its input parameter and returns true if the specified character is a letter, and false otherwise.
A character is considered a letter if its general category type, as returned by the Character#getType method, is any of the following:
- UPPERCASE_LETTER
- LOWERCASE_LETTER
- TITLECASE_LETTER
- MODIFIER_LETTER
- OTHER_LETTER
However, this method cannot handle supplementary characters. To handle all Unicode characters, including supplementary ones, the Character class provides an overloaded version of the isLetter() method:
public static boolean isLetter(int codePoint)
This method can handle all Unicode characters because it takes a Unicode code point as its input parameter. It returns true if the specified code point is a letter according to the definition above.
Let's look at a few examples of characters that are letters:
assertTrue(Character.isLetter('a'));
assertTrue(Character.isLetter('\u02b0'));
In the above examples, we pass the LOWERCASE_LETTER 'a' and the MODIFIER_LETTER '\u02b0', which represents the modifier letter small H, to the isLetter method, and it returns true in both cases.
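The two isLetter overloads give different results on text that contains supplementary characters. Here is a minimal sketch, with the hypothetical class name LetterCounter, that counts letters once per code point and once per char:

```java
final class LetterCounter {
    // Counts letters per code point, so a surrogate pair is tested as one character.
    static long countLetters(String text) {
        return text.codePoints().filter(Character::isLetter).count();
    }

    // Counts letters per char; each half of a surrogate pair is tested separately
    // and neither half is a letter on its own.
    static long countLettersByChar(String text) {
        long count = 0;
        for (char ch : text.toCharArray()) {
            if (Character.isLetter(ch)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', then U+1D400 MATHEMATICAL BOLD CAPITAL A as a surrogate pair, then '1'.
        String text = "a" + "\uD835\uDC00" + "1";
        System.out.println(countLetters(text));
        System.out.println(countLettersByChar(text));
    }
}
```

Iterating with codePoints() is one way to make sure the int overload sees whole characters. The char-based loop misses the supplementary letter.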
5. Compare and Contrast
Finally, we can see that all letters are alphabetic characters, but not all alphabetic characters are letters.
In other words, the isAlphabetic method returns true if a character is a letter or has the general category LETTER_NUMBER. It also returns true if the character has the Other_Alphabetic property defined by the Unicode Standard.
First, let's look at a character that is both a letter and alphabetic, the character 'a':
assertTrue(Character.isLetter('a'));
assertTrue(Character.isAlphabetic('a'));
Both isLetter() and isAlphabetic() return true when we pass them the character 'a'.
Next, let's look at a character that is alphabetic but not a letter. In this case, we'll use the Unicode character '\u2164', which represents the Roman numeral five:
assertFalse(Character.isLetter('\u2164'));
assertTrue(Character.isAlphabetic('\u2164'));
When we pass the Unicode character '\u2164' to the isLetter() method, it returns false. On the other hand, when we pass it to the isAlphabetic() method, it returns true.
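To spot such characters in real text, we could filter a string for code points where the two methods disagree. This is only a sketch, and the AlphabeticNonLetters class and its find method are hypothetical names:

```java
import java.util.List;
import java.util.stream.Collectors;

final class AlphabeticNonLetters {
    // Returns the code points in the text that are alphabetic but not letters.
    static List<Integer> find(String text) {
        return text.codePoints()
                .filter(cp -> Character.isAlphabetic(cp) && !Character.isLetter(cp))
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 'a' is a letter, U+2164 is a LETTER_NUMBER, and '5' is a decimal digit.
        String text = "a" + "\u2164" + "5";
        for (int cp : find(text)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

Only the Roman numeral is reported for this input: 'a' is both a letter and alphabetic, and '5' is neither.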
For English text, the distinction makes no difference, since every letter of the English alphabet is both a letter and alphabetic. In other scripts, however, some characters are alphabetic without being letters.
6. Conclusion
In this article, we learned about several general categories of Unicode code points. We also covered the similarities and differences between the isAlphabetic() and isLetter() methods.
As always, all these code samples are available over on GitHub.