Introduction to Apache Commons Text

Last modified: July 1, 2017

1. Overview

Simply put, the Apache Commons Text library contains a number of useful utility methods for working with Strings, beyond what core Java offers.

In this quick introduction, we’ll see what Apache Commons Text is, and what it is used for, as well as some practical examples of using the library.

2. Maven Dependency

Let’s start by adding the following Maven dependency to our pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.1</version>
</dependency>

You can find the latest version of the library at the Maven Central Repository.

3. Overview

The root package org.apache.commons.text is divided into different sub-packages:

  • org.apache.commons.text.diff – diffs between Strings
  • org.apache.commons.text.similarity – similarities and distances between Strings
  • org.apache.commons.text.translate – translating text

Let’s look at what each package can be used for in more detail.

4. Handling Text

The org.apache.commons.text package contains multiple tools for working with Strings.

For instance, WordUtils has APIs capable of capitalizing the first letter of each word in a String, swapping the case of a String, and checking if a String contains all words in a given array.

Let’s see how we can capitalize the first letter of each word in a String:

@Test
public void whenCapitalized_thenCorrect() {
    String toBeCapitalized = "to be capitalized!";
    String result = WordUtils.capitalize(toBeCapitalized);
    
    assertEquals("To Be Capitalized!", result);
}
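
To make the behavior concrete, here is a rough plain-Java sketch of what capitalizing each word involves. The class CapitalizeSketch and its method are hypothetical names used only for illustration; the sketch approximates the behavior for whitespace-delimited words and is not the library's implementation.

```java
class CapitalizeSketch {

    // Title-cases the first character of every whitespace-delimited word.
    static String capitalizeWords(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean startOfWord = true;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                startOfWord = true;          // the next non-space char starts a word
                out.append(c);
            } else if (startOfWord) {
                out.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("to be capitalized!")); // To Be Capitalized!
    }
}
```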

Here is how we can check if a string contains all words in an array:

@Test
public void whenContainsWords_thenCorrect() {
    boolean containsWords = WordUtils
      .containsAllWords("String to search", "to", "search");
    
    assertTrue(containsWords);
}
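
The check is about whole words, not substrings. The following plain-Java sketch shows one way such a check could be written with regular-expression word boundaries. ContainsWordsSketch is a hypothetical class for illustration, and the details (such as returning false for empty input) are assumptions rather than a copy of the library code.

```java
import java.util.regex.Pattern;

class ContainsWordsSketch {

    // Returns true only if every given word occurs in the text as a whole word.
    static boolean containsAllWords(String text, String... words) {
        if (text == null || text.isEmpty() || words.length == 0) {
            return false;
        }
        for (String word : words) {
            // \b anchors the match to word boundaries; quote() treats the word literally.
            Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
            if (!p.matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsAllWords("String to search", "to", "search")); // true
        System.out.println(containsAllWords("String to search", "sea"));          // false
    }
}
```

Note that "sea" is rejected even though it is a prefix of "search", because no word boundary follows it.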

StrSubstitutor (deprecated since version 1.3 in favor of StringSubstitutor) provides a convenient way of building Strings from templates:

@Test
public void whenSubstituted_thenCorrect() {
    Map<String, String> substitutes = new HashMap<>();
    substitutes.put("name", "John");
    substitutes.put("college", "University of Stanford");
    String templateString = "My name is ${name} and I am a student at the ${college}.";
    StrSubstitutor sub = new StrSubstitutor(substitutes);
    String result = sub.replace(templateString);
    
    assertEquals("My name is John and I am a student at the University of Stanford.", result);
}
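
For intuition, the ${...} substitution can be approximated in plain Java with a regular expression. This is only a sketch under stated assumptions: TemplateSketch and render are hypothetical names, nested or escaped placeholders are not handled, and leaving unknown keys untouched is an assumption about the default behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateSketch {

    // Matches ${key} and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replaces each ${key} with its mapped value; unknown keys are left as-is.
    static String render(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "John");
        System.out.println(render("Hi ${name}, meet ${friend}.", values));
        // Hi John, meet ${friend}.
    }
}
```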

StrBuilder (deprecated since version 1.3 in favor of TextStringBuilder) is an alternative to java.lang.StringBuilder. It provides some features that StringBuilder lacks.

For example, we can replace all occurrences of a String in another String or clear a String without assigning a new object to its reference.

Here’s a quick example to replace part of a String:

@Test
public void whenReplaced_thenCorrect() {
    StrBuilder strBuilder = new StrBuilder("example StrBuilder!");
    strBuilder.replaceAll("example", "new");
   
    assertEquals(new StrBuilder("new StrBuilder!"), strBuilder);
}

To clear the builder's contents, we simply call the clear() method on it:

strBuilder.clear();
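
For comparison, here is roughly what the same two operations take with the core StringBuilder, which has no replaceAll() or clear(). The helper class and method below are hypothetical and shown only to illustrate the convenience the library adds.

```java
class StringBuilderReplaceSketch {

    // Replaces every occurrence of target in sb, in place.
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // nothing sensible to search for
        }
        int index = sb.indexOf(target);
        while (index >= 0) {
            sb.replace(index, index + target.length(), replacement);
            // Resume after the inserted text so the replacement is never rescanned.
            index = sb.indexOf(target, index + replacement.length());
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("example StrBuilder!");
        replaceAll(sb, "example", "new");
        System.out.println(sb);          // new StrBuilder!

        sb.setLength(0);                 // the StringBuilder way to empty the buffer
        System.out.println(sb.length()); // 0
    }
}
```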

5. Calculating the Diff Between Strings

The package org.apache.commons.text.diff implements Myers' algorithm for calculating diffs between two Strings.

The diff between two Strings is defined by a sequence of modifications that can convert one String to another.

There are three types of commands that can be used to convert a String to another – InsertCommand, KeepCommand, and DeleteCommand.

An EditScript object holds the script that should be run in order to convert a String to another. Let’s calculate the number of single-char modifications that should be made in order to convert a String to another:

@Test
public void whenEditScript_thenCorrect() {
    StringsComparator cmp = new StringsComparator("ABCFGH", "BCDEFG");
    EditScript<Character> script = cmp.getScript();
    int mod = script.getModifications();
    
    assertEquals(4, mod);
}
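
The expected value of 4 can be checked by hand: the characters B, C, F and G are kept, A and H are deleted, and D and E are inserted. The sketch below reproduces that count with a classic dynamic-programming table for the longest common subsequence. This is a swapped-in technique chosen for readability, not Myers' algorithm, and EditCountSketch is a hypothetical class.

```java
class EditCountSketch {

    // Length of the longest common subsequence, via the classic DP table.
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                dp[i][j] = a.charAt(i - 1) == b.charAt(j - 1)
                    ? dp[i - 1][j - 1] + 1
                    : Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
        return dp[a.length()][b.length()];
    }

    // Every character outside the LCS is either deleted from a or inserted from b.
    static int modifications(String a, String b) {
        int kept = lcsLength(a, b);
        return (a.length() - kept) + (b.length() - kept);
    }

    public static void main(String[] args) {
        System.out.println(modifications("ABCFGH", "BCDEFG")); // 4
    }
}
```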

6. Similarities and Distances Between Strings

The org.apache.commons.text.similarity package contains algorithms useful for finding similarities and distances between Strings.

For example, LongestCommonSubsequence returns the length of the longest common subsequence of two Strings, that is, the largest number of characters that appear in both Strings in the same order:

@Test
public void whenCompare_thenCorrect() {
    LongestCommonSubsequence lcs = new LongestCommonSubsequence();
    int countLcs = lcs.apply("New York", "New Hampshire");
    
    assertEquals(5, countLcs);
}

Similarly, LongestCommonSubsequenceDistance measures how far apart two Strings are by counting the characters, across both Strings, that are not part of their longest common subsequence:

@Test
public void whenCalculateDistance_thenCorrect() {
    LongestCommonSubsequenceDistance lcsd = new LongestCommonSubsequenceDistance();
    int countLcsd = lcsd.apply("New York", "New Hampshire");
    
    assertEquals(11, countLcsd);
}
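
The two results are related: the distance equals the combined length of both Strings minus twice the length of their longest common subsequence, here 8 + 13 - 2 * 5 = 11. The following plain-Java sketch, with hypothetical names, computes both values using two rolling rows. It illustrates the relationship and is not the library's code.

```java
class LcsDistanceSketch {

    // LCS length using two rolling rows instead of a full table.
    static int lcsLength(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        int[] curr = new int[right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                curr[j] = left.charAt(i - 1) == right.charAt(j - 1)
                    ? prev[j - 1] + 1
                    : Math.max(prev[j], curr[j - 1]);
            }
            int[] tmp = prev; // swap rows; the old row is overwritten next round
            prev = curr;
            curr = tmp;
        }
        return prev[right.length()];
    }

    // Characters of either string that are not part of the LCS.
    static int lcsDistance(CharSequence left, CharSequence right) {
        return left.length() + right.length() - 2 * lcsLength(left, right);
    }

    public static void main(String[] args) {
        System.out.println(lcsLength("New York", "New Hampshire"));   // 5
        System.out.println(lcsDistance("New York", "New Hampshire")); // 11
    }
}
```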

7. Text Translation

The org.apache.commons.text.translate package was initially created to allow us to customize the rules provided by StringEscapeUtils.

The package contains a set of classes that translate text into different escaped representations, such as Unicode escapes and numeric character references. We can also create our own customized translation routines.

Let’s see how we can convert a String to its equivalent Unicode text:

@Test
public void whenTranslate_thenCorrect() {
    UnicodeEscaper ue = UnicodeEscaper.above(0);
    String result = ue.translate("ABCD");
    
    assertEquals("\\u0041\\u0042\\u0043\\u0044", result);
}
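
To show what the escaper produces, here is a minimal plain-Java sketch that writes each character above a threshold as a backslash-u escape with four uppercase hex digits. UnicodeEscapeSketch and escapeAbove are hypothetical names, and the sketch works on UTF-16 chars only, so it makes no attempt to handle supplementary code points the way a full implementation would.

```java
class UnicodeEscapeSketch {

    // Escapes every char whose value is greater than the threshold;
    // all other chars are copied through unchanged.
    static String escapeAbove(String input, int threshold) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > threshold) {
                // Backslash, 'u', then the char value as four hex digits.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Threshold 0: all four letters are escaped.
        System.out.println(escapeAbove("ABCD", 0));
        // Threshold 'B': A and B stay literal, C and D are escaped.
        System.out.println(escapeAbove("ABCD", 'B'));
    }
}
```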

Here, the argument passed to above() is a code point threshold: every character whose code point is greater than that value is escaped. Passing 0 therefore escapes all four letters.

LookupTranslator lets us define our own lookup table that maps character sequences to replacement values, and then translate any text using that table.
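
The idea of a lookup-table translation can be sketched in plain Java as follows. LookupSketch is a hypothetical class, and the longest-match-first rule is an assumption about how such a translator would typically resolve overlapping keys; this is an illustration of the concept, not the library's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LookupSketch {

    // Scans left to right; at each position the longest matching key wins.
    static String translate(String input, Map<String, String> table) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            String bestKey = null;
            for (String key : table.keySet()) {
                if (!key.isEmpty() && input.startsWith(key, i)
                        && (bestKey == null || key.length() > bestKey.length())) {
                    bestKey = key;
                }
            }
            if (bestKey != null) {
                out.append(table.get(bestKey));
                i += bestKey.length();
            } else {
                out.append(input.charAt(i)); // no entry: copy the char unchanged
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("<", "&lt;");
        table.put(">", "&gt;");
        table.put("&", "&amp;");
        System.out.println(translate("a < b && b > c", table));
        // a &lt; b &amp;&amp; b &gt; c
    }
}
```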

8. Conclusion

In this quick tutorial, we’ve seen an overview of what Apache Commons Text is all about and some of its common features.

The code samples can be found over on GitHub.
