Java Localization – Formatting Messages

Last modified: May 9, 2019

1. Introduction

In this tutorial, we’ll consider how we can localize and format messages based on Locale.

We’ll use both Java’s MessageFormat and the third-party library, ICU.

2. Localization Use Case

When our application acquires a wide audience of users from all over the world, we may naturally want to show different messages based on the user’s preferences.

The first and most important aspect is the language that the user speaks. Others might include currency, number and date formats. Last but not least are cultural preferences: what is acceptable for users from one country might be intolerable for others.

Suppose that we have an email client and we want to show notifications when a new message arrives.

A simple example of such a message might be this one:

Alice has sent you a message.

That’s fine for English-speaking users, but non-English speakers might not be so happy. For example, French-speaking users would prefer to see this message:

Alice vous a envoyé un message.

Polish-speaking users, in turn, would be pleased to see this one:

Alice wysłała ci wiadomość.

What if we want a properly formatted notification even when Alice sends not just one message, but several?

We might be tempted to address the issue by concatenating various pieces in a single string, like this:

String message = "Alice has sent " + quantity + " messages";

The situation can easily get out of control once we need notifications not only for Alice but also for Bob:

Bob has sent two messages.
Bob a envoyé deux messages.
Bob wysłał dwie wiadomości.

Notice how the verb changes in Polish (wysłała vs wysłał) depending on the sender’s gender. This illustrates that naive string concatenation is rarely acceptable for localizing messages.

As we see, we get two types of issues: one is related to translations and the other is related to formats. Let’s address them in the following sections.

3. Message Localization

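We may define the localization, or l10n, of an application as the process of adapting the application to the user’s language and regional conventions. The closely related term internationalization, or i18n, refers to designing the application so that it can be localized in the first place.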

In order to localize the application, first of all, let’s eliminate all hardcoded messages by moving them into our resources folder:

[Figure: messages localization]

Each file should contain key-value pairs with the messages in the corresponding language. For example, file messages_en.properties should contain the following pair:

label=Alice has sent you a message.

messages_pl.properties should contain the following pair:

label=Alice wysłała ci wiadomość.

Similarly, the other files assign appropriate values to the key label. Now, to load the English version of the notification, we can use ResourceBundle:

ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.UK);
String message = bundle.getString("label");

The value of the variable message will be “Alice has sent you a message.”
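
To try this lookup without creating files on the classpath, we can build a bundle directly from properties text. The following is a minimal sketch: the class InMemoryBundleDemo and its helper bundleOf are hypothetical names, and the strings simply mirror the messages_en.properties and messages_pl.properties entries above. PropertyResourceBundle parses the same key=value syntax as a .properties file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class InMemoryBundleDemo {

    // Parses properties-file text (key=value lines) into a ResourceBundle.
    static ResourceBundle bundleOf(String propertiesText) {
        try {
            return new PropertyResourceBundle(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for messages_en.properties and messages_pl.properties.
        ResourceBundle en = bundleOf("label=Alice has sent you a message.");
        ResourceBundle pl = bundleOf("label=Alice wysłała ci wiadomość.");

        // The same key resolves to a language-specific value in each bundle.
        System.out.println(en.getString("label")); // Alice has sent you a message.
        System.out.println(pl.getString("label")); // Alice wysłała ci wiadomość.
    }
}
```

Because the content arrives through a Reader, the Polish characters need no escaping here. When real .properties files are loaded through ResourceBundle.getBundle, Java 9 and later read them as UTF-8, while earlier versions expect ISO-8859-1 with \uXXXX escapes for characters such as ł.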

Java’s Locale class contains shortcuts to frequently used languages and countries.

In the case of the Polish language, we might write the following:

ResourceBundle bundle
  = ResourceBundle.getBundle("messages", Locale.forLanguageTag("pl-PL"));
String message = bundle.getString("label");

Let’s just mention that if we provide no locale, the system uses a default one. We can find more details on this in our article “Internationalization and Localization in Java 8”. Then, among the available translations, the system chooses the one that most closely matches the currently active locale.
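
To see which translations are considered, and in what order, we can ask ResourceBundle.Control for its candidate locales. This sketch assumes the default control (FORMAT_DEFAULT); CandidateLocalesDemo is a hypothetical name.

```java
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

class CandidateLocalesDemo {

    // Locales that ResourceBundle tries for a base name, most specific first.
    static List<Locale> candidates(String baseName, Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales(baseName, locale);
    }

    public static void main(String[] args) {
        // Corresponds to messages_pl_PL, messages_pl and finally the base bundle messages.
        // Prints [pl_PL, pl, ] where the empty entry is Locale.ROOT.
        System.out.println(candidates("messages", Locale.forLanguageTag("pl-PL")));
    }
}
```

So with only messages_pl.properties on the classpath, a request for pl-PL is satisfied at the second step of this list.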

Placing the messages in the resource files is a good step towards rendering the application more user-friendly. It makes it easier to translate the whole application for the following reasons:

  1. a translator does not have to look through the application in search of the messages
  2. a translator can see the whole phrase which helps to grasp the context and hence facilitates a better translation
  3. we don’t have to recompile the whole application when a translation for a new language is ready

4. Message Format

Even though we have moved the messages from the code into a separate location, they still contain some hardcoded information. It would be nice to be able to customize the names and numbers in the messages in such a way that they remain grammatically correct.

We may define formatting as the process of rendering a string template by replacing its placeholders with their values.

In the following sections, we’ll consider two solutions that allow us to format the messages.

4.1. Java’s MessageFormat

To format strings, java.lang.String offers the printf-style String.format methods. But we can get richer, locale-aware support via java.text.MessageFormat.
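
One practical difference is that MessageFormat formats numeric arguments with the rules of its locale, while a plain %d in String.format prints the digits without grouping separators (unless the , flag is added). A small sketch comparing the two; FormatComparisonDemo is a hypothetical name.

```java
import java.text.MessageFormat;
import java.util.Locale;

class FormatComparisonDemo {

    // printf-style: %d prints the digits without grouping separators.
    static String withStringFormat(String name, int count) {
        return String.format(Locale.UK, "%s has sent you %d messages.", name, count);
    }

    // MessageFormat: {1} is formatted with the locale's NumberFormat.
    static String withMessageFormat(String name, int count) {
        MessageFormat formatter = new MessageFormat("{0} has sent you {1} messages.", Locale.UK);
        return formatter.format(new Object[] {name, count});
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("Bob", 1234));  // Bob has sent you 1234 messages.
        System.out.println(withMessageFormat("Bob", 1234)); // Bob has sent you 1,234 messages.
    }
}
```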

To illustrate, let’s create a pattern and feed it to a MessageFormat instance:

String pattern = "On {0, date}, {1} sent you "
  + "{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}.";
MessageFormat formatter = new MessageFormat(pattern, Locale.UK);

The pattern string has slots for three placeholders.

If we supply each value:

String message = formatter.format(new Object[] {date, "Alice", 2});

Then MessageFormat will fill in the template and render our message:

On 27-Apr-2019, Alice sent you two messages.
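
The snippet above leaves date undefined. Here is a self-contained version, assuming a java.util.Date built from a GregorianCalendar for April 27, 2019; ChoiceNotificationDemo is a hypothetical name. The exact date text (for instance 27-Apr-2019 or 27 Apr 2019) depends on the locale data shipped with the JDK, so only the rest of the message should be taken as stable.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

class ChoiceNotificationDemo {

    static final String PATTERN = "On {0, date}, {1} sent you "
        + "{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}.";

    static String notification(Date date, String sender, int count) {
        MessageFormat formatter = new MessageFormat(PATTERN, Locale.UK);
        return formatter.format(new Object[] {date, sender, count});
    }

    public static void main(String[] args) {
        // Calendar months are zero-based, hence the APRIL constant instead of 4.
        Date date = new GregorianCalendar(2019, Calendar.APRIL, 27).getTime();
        System.out.println(notification(date, "Alice", 2)); // date text depends on the JDK
        System.out.println(notification(date, "Bob", 5));
    }
}
```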

4.2. MessageFormat Syntax

From the example above, we see that the message pattern:

pattern = "On {...}, {...} sent you {...}.";

contains placeholders written in curly braces {…}, each with a required argument index and two optional elements, type and style:

{index}
{index, type}
{index, type, style}

The placeholder’s index corresponds to the position of an element from the array of objects that we want to insert.
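
Because placeholders refer to arguments by index, a pattern may reorder or repeat them, which lets a translator rearrange a sentence without touching the calling code. A quick sketch; ArgumentIndexDemo and its format helper are hypothetical names.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ArgumentIndexDemo {

    static String format(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.UK).format(args);
    }

    public static void main(String[] args) {
        // Arguments can be reordered...
        System.out.println(format("{1}, {0}!", "world", "Hello")); // Hello, world!
        // ...repeated...
        System.out.println(format("{0} wrote to {1}, and {1} replied to {0}.", "Alice", "Bob"));
        // ...and an index with no matching argument is printed literally.
        System.out.println(format("{0} and {2}", "x", "y")); // x and {2}
    }
}
```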

When present, the type and style may take the following values:

| type | style |
|---|---|
| number | integer, currency, percent, custom format |
| date | short, medium, long, full, custom format |
| time | short, medium, long, full, custom format |
| choice | custom format |

The names of the types and styles largely speak for themselves, but we can consult the official documentation for more details.
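
To get a feel for the number styles, here is a sketch that formats a few values with Locale.UK; TypeStyleDemo is a hypothetical name. The third pattern passes a custom DecimalFormat pattern as the style.

```java
import java.text.MessageFormat;
import java.util.Locale;

class TypeStyleDemo {

    static String format(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.UK).format(args);
    }

    public static void main(String[] args) {
        System.out.println(format("{0, number, integer}", 1234.56)); // 1,235
        System.out.println(format("{0, number, percent}", 0.25));    // 25%
        System.out.println(format("{0,number,#.##}", 3.14159));      // 3.14
        // The currency style uses the currency associated with the locale.
        System.out.println(format("{0, number, currency}", 9.5));
    }
}
```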

Let’s take a closer look, though, at the custom format of the choice type.

In the example above, we used the following format expression:

{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}

In general, the choice style has the form of options separated by the vertical bar (or pipe):

[Figure: message format syntax]

Inside each option, the match value $k_i$ and the string $v_i$ are separated by #, or by < when the limit itself should be excluded, as in the last option of our example. Notice that we may nest other patterns into the string $v_i$, as we did for the last option:

{2, choice, ...|2<{2, number, integer} messages}

The choice type is numeric, so the match values $k_i$ have a natural ordering that splits the number line into intervals:

[Figure: choice style ordering]

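If we pass a value $k$ that belongs to the interval $[k_i, k_{i+1})$ (the left end is included, the right one is excluded), then the string $v_i$ is selected. A value below the first limit $k_1$ selects $v_1$, and a value at or above the last limit $k_n$ selects $v_n$.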
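
The interval rule can be checked directly with java.text.ChoiceFormat, the class MessageFormat uses for the choice type. In this sketch the nested pattern of the last option is replaced by the plain text many messages, because ChoiceFormat on its own returns the option strings verbatim; ChoiceIntervalsDemo is a hypothetical name.

```java
import java.text.ChoiceFormat;

class ChoiceIntervalsDemo {

    static final ChoiceFormat FORMAT =
        new ChoiceFormat("0#no messages|1#a message|2#two messages|2<many messages");

    // Picks the option whose interval [k_i, k_{i+1}) contains k.
    static String choose(double k) {
        return FORMAT.format(k);
    }

    public static void main(String[] args) {
        // The limit written as "2<" is stored as the smallest double greater than 2.
        double[] limits = FORMAT.getLimits();
        System.out.println(limits[3] == ChoiceFormat.nextDouble(2.0)); // true

        for (double k : new double[] {-1, 0, 0.5, 1, 1.5, 2, 2.5}) {
            System.out.println(k + " -> " + choose(k));
        }
    }
}
```

This is why 2 itself still maps to two messages, while anything above it falls into the last option.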

Let’s examine the ranges of the choice style in more detail. To this end, we take this pattern:

pattern = "You''ve got "
  + "{0, choice, 0#no messages|1#a message|2#two messages|2<{0, number, integer} messages}.";

and pass various values for its single placeholder:

| n | message |
|---|---|
| -1, 0, 0.5 | You’ve got no messages. |
| 1, 1.5 | You’ve got a message. |
| 2 | You’ve got two messages. |
| 2.5 | You’ve got 2 messages. |
| 5 | You’ve got 5 messages. |
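
The table can be reproduced with a short loop. A sketch, assuming Locale.UK; ChoiceTableDemo is a hypothetical name. Note the doubled apostrophe in You''ve, which MessageFormat requires for a literal single quote.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ChoiceTableDemo {

    static final MessageFormat FORMATTER = new MessageFormat(
        "You''ve got "
            + "{0, choice, 0#no messages|1#a message|2#two messages|2<{0, number, integer} messages}.",
        Locale.UK);

    static String message(Number n) {
        return FORMATTER.format(new Object[] {n});
    }

    public static void main(String[] args) {
        Number[] values = {-1, 0, 0.5, 1, 1.5, 2, 2.5, 5};
        for (Number n : values) {
            System.out.println(n + " -> " + message(n));
        }
    }
}
```

The 2.5 row prints 2 because the integer style rounds half-even.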

4.3. Making Things Better

So, we’re now formatting our messages. But, the message itself remains hardcoded.

From the previous section, we know that we should extract the string patterns into resource files. To separate our concerns, let’s create another set of resource files called formats:

[Figure: messages format]

In those, we’ll create a key called label with language-specific content.

For example, in the English version, we’ll put the following string:

label=On {0, date, full}, {1} has sent you \
  {2, choice, 0#nothing|1#a message|2#two messages|2<{2,number,integer} messages}.
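
Properties files have no + operator: a value continues on the next line when the line ends with a backslash, and the leading whitespace of the continuation line is dropped. The sketch below loads an English formats entry written that way and feeds it to MessageFormat. The in-memory text stands in for a formats_en.properties file and, like the names FormatsBundleDemo, load and notification, is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Date;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class FormatsBundleDemo {

    // Hypothetical formats_en.properties content; "\\\n" is a backslash followed by a newline.
    static final String FORMATS_EN =
        "label=On {0, date, full}, {1} has sent you \\\n"
      + "  {2, choice, 0#nothing|1#a message|2#two messages|2<{2,number,integer} messages}.\n";

    static ResourceBundle load(String content) {
        try {
            return new PropertyResourceBundle(new StringReader(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the pattern by key, then formats it for the given locale.
    static String notification(ResourceBundle formats, Locale locale,
                               Date date, String sender, int count) {
        MessageFormat formatter = new MessageFormat(formats.getString("label"), locale);
        return formatter.format(new Object[] {date, sender, count});
    }

    public static void main(String[] args) {
        ResourceBundle formats = load(FORMATS_EN);
        System.out.println(notification(formats, Locale.UK, new Date(), "Alice", 2));
    }
}
```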

We should slightly modify the French version because of the zero message case:

label={0, date, short}, {1}{2, choice, 0# ne|0<} vous a envoyé \
  {2, choice, 0#aucun message|1#un message|2#deux messages|2<{2,number,integer} messages}.

We’d need to make similar modifications in the Polish and Italian versions as well.

In fact, the Polish version exhibits yet another problem. According to the grammar of the Polish language (and many others), the verb has to agree in gender with the subject. We could resolve this problem by using the choice type, but let’s consider another solution.
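
For comparison, here is roughly what the choice-based workaround could look like. It is only a sketch: it assumes the caller encodes the gender as a number (0 for male, 1 for female, 2 for any other case), a convention the pattern itself cannot express by name; ChoiceGenderDemo is a hypothetical name.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ChoiceGenderDemo {

    // Hypothetical convention: 0 = male, 1 = female, 2 = any other case.
    static final String PATTERN_PL =
        "{0} {1, choice, 0#wysłał|1#wysłała|2#wysłało} ci wiadomość.";

    static String notification(String sender, int genderCode) {
        MessageFormat formatter = new MessageFormat(PATTERN_PL, Locale.forLanguageTag("pl-PL"));
        return formatter.format(new Object[] {sender, genderCode});
    }

    public static void main(String[] args) {
        System.out.println(notification("Bob", 0));   // Bob wysłał ci wiadomość.
        System.out.println(notification("Alice", 1)); // Alice wysłała ci wiadomość.
    }
}
```

The numeric codes are easy to mix up, which is one reason to prefer a format that can select on named values.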

4.4. ICU’s MessageFormat

Let’s use the International Components for Unicode (ICU) library. We have already mentioned it in our Convert a String to Title Case tutorial. It’s a mature and widely-used solution that allows us to customize the application for various languages.

Here, we’re not going to explore it in full detail. We’ll just limit ourselves to what our toy application needs. For the most comprehensive and up-to-date information, we should check the ICU’s official site.

At the time of writing, the latest version of ICU for Java (ICU4J) is 64.2. As usual, in order to start using it, we should add it as a dependency to our project:

<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>64.2</version>
</dependency>

Suppose that we want to have a properly formed notification in various languages and for different numbers of messages:

| N | English | Polish |
|---|---|---|
| 0 | Alice has sent you no messages. | Alice nie wysłała ci żadnej wiadomości. |
| 0 | Bob has sent you no messages. | Bob nie wysłał ci żadnej wiadomości. |
| 1 | Alice has sent you a message. | Alice wysłała ci wiadomość. |
| 1 | Bob has sent you a message. | Bob wysłał ci wiadomość. |
| > 1 | Alice has sent you N messages. | Alice wysłała ci N wiadomości. |
| > 1 | Bob has sent you N messages. | Bob wysłał ci N wiadomości. |

First of all, we should create a pattern in the locale-specific resource files.

Let’s reuse the file formats.properties and add to it a key label-icu with the following content:

label-icu={0} has sent you \
  {2, plural, =0 {no messages} =1 {a message} \
  other {{2, number, integer} messages}}.

The patterns expect three arguments, which we pass as a three-element array holding the sender’s name, the sender’s gender, and the number of messages:

Object[] data = new Object[] { "Alice", "female", 0 };

We see that in the English version, the gender argument {1} is not used at all, while in the Polish one:

label-icu={0} {2, plural, =0 {nie } other {}}\
  {1, select, male {wysłał} female {wysłała} other {wysłało}} \
  ci {2, plural, =0 {żadnej wiadomości} =1 {wiadomość} \
  other {{2, number, integer} wiadomości}}.

we use it in order to distinguish between wysłał/wysłała/wysłało.
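
Putting the pieces together, here is a sketch that formats both label-icu patterns with ICU4J’s com.ibm.icu.text.MessageFormat. It assumes the icu4j dependency above is on the classpath. The pattern constants are assumptions: they are the single-line values the properties entries resolve to once their continuation lines are joined, with the Polish wording taken from the table above. IcuNotificationDemo is a hypothetical name.

```java
import com.ibm.icu.text.MessageFormat;
import java.util.Locale;

class IcuNotificationDemo {

    // label-icu values with continuation lines joined (assumed content).
    static final String LABEL_EN = "{0} has sent you "
        + "{2, plural, =0 {no messages} =1 {a message} other {{2, number, integer} messages}}.";

    static final String LABEL_PL = "{0} {2, plural, =0 {nie } other {}}"
        + "{1, select, male {wysłał} female {wysłała} other {wysłało}} "
        + "ci {2, plural, =0 {żadnej wiadomości} =1 {wiadomość} "
        + "other {{2, number, integer} wiadomości}}.";

    // {0} = sender name, {1} = gender keyword for select, {2} = message count for plural.
    static String notification(String pattern, Locale locale,
                               String name, String gender, int count) {
        MessageFormat formatter = new MessageFormat(pattern, locale);
        return formatter.format(new Object[] {name, gender, count});
    }

    public static void main(String[] args) {
        Locale polish = Locale.forLanguageTag("pl-PL");
        System.out.println(notification(LABEL_EN, Locale.UK, "Bob", "male", 3));     // Bob has sent you 3 messages.
        System.out.println(notification(LABEL_PL, polish, "Alice", "female", 0));    // Alice nie wysłała ci żadnej wiadomości.
        System.out.println(notification(LABEL_PL, polish, "Bob", "male", 1));        // Bob wysłał ci wiadomość.
        System.out.println(notification(LABEL_PL, polish, "Alice", "female", 5));    // Alice wysłała ci 5 wiadomości.
    }
}
```

In Polish, 5 falls into the plural category many, for which the pattern has no branch, so ICU falls back to other.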

5. Conclusion

In this tutorial, we considered how to localize and format the messages that we display to the users of our applications.

As always, the code snippets for this tutorial are on our GitHub repository.