Check if a String Contains Only Unicode Letters

Last modified: November 6, 2023

1. Overview

In this tutorial, we’ll explore different ways to check if a string contains only Unicode letters.

Unicode is a character encoding standard that covers most of the world’s written languages. In Java, checking that a string contains only Unicode letters is a common input validation step that helps maintain data integrity and avoid unexpected behavior.

2. Character Class

Java’s Character class provides a set of static methods that can be used to check various properties of characters. To determine if a string contains only Unicode letters, we can iterate through each character in the string and verify it using the Character.isLetter() method:

public class UnicodeLetterChecker {
    public boolean characterClassCheck(String input) {
        for (char c : input.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }
}

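This approach checks each character one by one and returns false as soon as it encounters a non-letter. Because it inspects individual char values, a letter outside the Basic Multilingual Plane, which Java stores as a surrogate pair, is rejected. A test with a simple input confirms the basic behavior: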

@Test
public void givenString_whenUsingIsLetter_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.characterClassCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
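
To illustrate the surrogate-pair limitation of the char-based loop, here is a minimal sketch that compares it with a variant stepping through full code points. The class name CodePointLetterDemo and the method codePointBasedCheck are hypothetical names introduced only for this illustration:

```java
class CodePointLetterDemo {

    // Same logic as characterClassCheck(): inspects one 16-bit char at a time.
    static boolean charBasedCheck(String input) {
        for (char c : input.toCharArray()) {
            if (!Character.isLetter(c)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical variant: reads a full code point, then advances by its char count.
    static boolean codePointBasedCheck(String input) {
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (!Character.isLetter(codePoint)) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+10400 DESERET CAPITAL LETTER LONG I, stored as the surrogate pair D801 DC00.
        String deseret = "\uD801\uDC00";

        System.out.println(charBasedCheck(deseret));      // false: each surrogate is tested alone
        System.out.println(codePointBasedCheck(deseret)); // true: the pair is read as one letter
        System.out.println(charBasedCheck("na\u00efve")); // true: U+00EF fits in a single char
    }
}
```

For input that stays within the Basic Multilingual Plane, both methods should give the same answer, so the simpler loop may be enough when supplementary characters aren't expected.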

3. Regular Expressions

Java provides powerful regular expression support for string manipulation. We can use the Pattern and Matcher classes from the java.util.regex package, together with a Unicode property pattern, to verify if a string consists solely of Unicode letters:

import java.util.regex.Pattern;

public class UnicodeLetterChecker {
    // Compiled once and reused; \p{L} matches any Unicode letter
    private static final Pattern LETTERS_ONLY = Pattern.compile("^\\p{L}+$");

    public boolean regexCheck(String input) {
        return LETTERS_ONLY.matcher(input).matches();
    }
}

In this example, the regular expression \p{L}+ (written as \\p{L}+ in a Java string literal) matches one or more Unicode letters. Since matches() only succeeds when the entire input matches, the ^ and $ anchors are optional. If the string contains only Unicode letters, the method returns true:

@Test
public void givenString_whenUsingRegex_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.regexCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
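
As a further illustration, the sketch below contrasts a precompiled Pattern with the String.matches() shortcut and shows how the + quantifier treats empty input. The class and method names are hypothetical and exist only for this example:

```java
import java.util.regex.Pattern;

class RegexLetterDemo {

    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");

    static boolean precompiledCheck(String input) {
        return LETTERS_ONLY.matcher(input).matches();
    }

    // Shortcut form: String.matches() compiles the pattern on every call.
    static boolean stringMatchesCheck(String input) {
        return input.matches("\\p{L}+");
    }

    public static void main(String[] args) {
        System.out.println(precompiledCheck("HelloWorld"));    // true
        System.out.println(precompiledCheck("Hello World"));   // false: the space is not a letter
        System.out.println(precompiledCheck(""));              // false: + needs at least one letter
        System.out.println(stringMatchesCheck("na\u00efve"));  // true: U+00EF is a letter
    }
}
```

Both forms should return the same results. The precompiled constant is usually the better fit when the check runs frequently.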

4. Apache Commons Lang Library

The Apache Commons Lang library provides the StringUtils.isAlpha() method, which checks whether a string contains only Unicode letters. It returns false for null and for an empty string:

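// Assumes the commons-lang3 artifact is on the classpath
import org.apache.commons.lang3.StringUtils;

public class UnicodeLetterChecker {
    public boolean isAlphaCheck(String input) {
        return StringUtils.isAlpha(input);
    }
}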

This gives us a convenient way to check if a string contains only letters, including non-Latin ones, without writing custom logic:

@Test
public void givenString_whenUsingIsAlpha_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.isAlphaCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
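
StringUtils.isAlpha() is documented to return false for null and for the empty string, whereas the loop-based check from the Character class section throws a NullPointerException on null and returns true for an empty string. Where adding the dependency isn’t an option, a similar contract can be sketched with the standard library alone. The helper isAlphaLike below is a hypothetical stand-in written for this illustration, not the library’s implementation:

```java
class NullSafeLetterDemo {

    // Hypothetical helper, not part of Commons Lang: rejects null and empty
    // input up front, then requires every char to be a letter.
    static boolean isAlphaLike(String input) {
        if (input == null || input.isEmpty()) {
            return false;
        }
        for (int i = 0; i < input.length(); i++) {
            if (!Character.isLetter(input.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAlphaLike(null));         // false
        System.out.println(isAlphaLike(""));           // false
        System.out.println(isAlphaLike("HelloWorld")); // true
        System.out.println(isAlphaLike("Hello1"));     // false
    }
}
```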

5. Java Streams

The Stream API offers a concise way to determine if a string contains only Unicode letters.

By combining the String’s codePoints() method with allMatch(), we can check whether every code point in the input string is a letter:

public class UnicodeLetterChecker {
    public boolean streamsCheck(String input) {
        return input.codePoints().allMatch(Character::isLetter);
    }
}

The above example uses the codePoints() method to convert the String into a stream of Unicode code points and then uses the allMatch() method to ensure that all code points are letters. Because it works on code points rather than char values, letters stored as surrogate pairs are handled correctly:

@Test
public void givenString_whenUsingStreams_thenReturnTrue() {
    UnicodeLetterChecker checker = new UnicodeLetterChecker();

    boolean isUnicodeLetter = checker.streamsCheck("HelloWorld");
    assertTrue(isUnicodeLetter);
}
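
One case worth illustrating is text that contains combining marks. An accented letter can be stored as a base letter followed by a combining mark, and the mark itself is not classified as a letter. The sketch below assumes that composing the input to NFC with java.text.Normalizer before the check is acceptable for the use case. The class name and the normalizedStreamsCheck helper are hypothetical:

```java
import java.text.Normalizer;

class NormalizedLetterDemo {

    // The code-point stream check from this section.
    static boolean streamsCheck(String input) {
        return input.codePoints().allMatch(Character::isLetter);
    }

    // Hypothetical helper: compose to NFC first so that a base letter plus a
    // combining mark becomes a single precomposed letter where one exists.
    static boolean normalizedStreamsCheck(String input) {
        String composed = Normalizer.normalize(input, Normalizer.Form.NFC);
        return streamsCheck(composed);
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 COMBINING ACUTE ACCENT
        String decomposed = "Cafe\u0301";

        System.out.println(streamsCheck(decomposed));           // false: U+0301 is a mark
        System.out.println(normalizedStreamsCheck(decomposed)); // true: composes to U+00E9
    }
}
```

Normalization only helps when a precomposed form exists. Sequences without one still contain a combining mark after NFC and are rejected.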

6. Conclusion

In this article, we’ve explored various methods for determining if a string comprises solely Unicode letters.

Regular expressions offer a powerful and concise way, while the Character class provides fine-grained control. Libraries like Apache Commons Lang can simplify the process, and Java Streams offer a modern, functional approach. Depending on our specific use case, one of these methods should serve us well to validate strings for Unicode letters.

As always, the full source code is available over on GitHub.
