1. Overview
In this tutorial, we’ll learn how to set character encoding in Maven.
We’ll showcase how to set encoding for some common Maven plugins.
Also, we’ll see how to set the encoding at a project level, as well as through the command line.
2. What Is Encoding and Why Should We Care?
There are lots of different languages in the world that use different characters.
One system of mapping characters, called Unicode, has well over 100,000 characters, symbols, and even emoticons (emoji).
To store text compactly, we use a mapping system, called an encoding, to convert between the bits and bytes in a file and the human-readable characters on a screen.
There are now lots of encoding systems. To read a file, we must know which encoding system is used.
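To make this concrete, here is a minimal Java sketch showing that the same character is stored as different bytes under different encodings. The class and method names are illustrative and not part of the tutorial's project. Unicode escapes are used so the example does not depend on the encoding of its own source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class name is hypothetical.
final class EncodingBytesDemo {

    // Encodes the text with the given charset and renders the bytes as hex.
    static String hexBytes(String text, Charset charset) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // the character é
        System.out.println(hexBytes(eAcute, StandardCharsets.UTF_8));      // c3 a9
        System.out.println(hexBytes(eAcute, StandardCharsets.ISO_8859_1)); // e9
        System.out.println(hexBytes(eAcute, StandardCharsets.UTF_16BE));   // 00 e9
    }
}
```

A reader that sees the bytes c3 a9 can only recover the original character if it knows they were written as UTF-8.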
2.1. What Happens if We Don’t Declare Encoding in Maven?
Maven considers encoding important enough that it logs a warning if we don’t declare one.
In fact, this warning occupies the number one spot on the FAQ page of the Apache Maven site.
To see this warning, let’s add a couple of plugins to our build.
Firstly, let’s add maven-resources-plugin, which will copy resources into an output directory:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-resources-plugin</artifactId>
    <version>3.2.0</version>
</plugin>
We’ll also want to compile our code files, so let’s add maven-compiler-plugin:
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
</plugin>
As we’re working inside a multi-module project, a parent POM may have already set the encoding for us. For demo purposes, let’s clear the encoding property by overriding it (don’t worry, we’ll come back to this later):
<properties>
    <project.build.sourceEncoding></project.build.sourceEncoding>
</properties>
Let’s run the build using the standard Maven command:
mvn clean install
Un-setting our encoding like this can break the build! We’ll see in our logging that we get the following warning:
[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[WARNING] Using platform encoding (Cp1252 actually) to copy filtered resources, i.e. build is platform dependent!
The warning states that if no encoding system is specified, Maven will use the platform default.
Normally on Windows, the default is Windows-1252 (aka CP-1252, or Cp1252).
This default could change based on the local environment. We’ll see below how we can remove this platform dependency from our build.
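Since Maven runs on the JVM, one way to see what the platform default looks like on a given machine is to ask the JVM directly. The sketch below is illustrative only (the class name is hypothetical), and it is an assumption that the value Maven reports in its warning corresponds to the JVM default printed here. Note that on JDK 18 and later the JVM default is UTF-8 unless overridden, while older JDKs derive it from the operating system locale.

```java
import java.nio.charset.Charset;

// Illustrative helper; the class name is hypothetical.
final class PlatformEncoding {

    // Name of the charset the JVM uses when no charset is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The printed values vary from machine to machine, which is the problem.
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

Running this on two differently configured machines may print different values, which is exactly the platform dependency the warning describes.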
2.2. What Happens if We Declare an Incorrect Encoding in Maven?
Maven is a build tool that needs to be able to read source files.
In order to read source files, Maven must be set to use the same encoding that the source files are encoded in.
Maven also produces files that are typically distributed to another computer. Therefore, it is important to write output files using an expected encoding. Output files that are not in the expected encoding could fail to be read on a different system.
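The effect of reading an output file with an unexpected encoding can be sketched in plain Java. This is an illustrative example, not code from the tutorial's project: the class name, method name, and temporary file are all assumptions made for the demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative helper; the class name is hypothetical.
final class ExplicitCharsetFileIo {

    // Writes the text to a temporary file in one encoding, then reads it back in another.
    static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        try {
            Path file = Files.createTempFile("encoding-demo", ".txt");
            try {
                Files.write(file, text.getBytes(writeAs));
                return new String(Files.readAllBytes(file), readAs);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00dc\u00dd\u00de"; // the characters ÜÝÞ
        // Same encoding on both sides: the text survives.
        System.out.println(text.equals(
            writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));      // true
        // Written as UTF-8 but read as ISO-8859-1: the text is garbled.
        System.out.println(text.equals(
            writeThenRead(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // false
    }
}
```

Passing the charset explicitly on both sides removes any dependence on the machine's default.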
To show this, let’s add a simple Java class that uses non-ASCII characters:
public class NonAsciiString {
    public static String getNonAsciiString() {
        String nonAsciiŞŧř = "ÜÝÞßàæç";
        return nonAsciiŞŧř;
    }
}
In our POM, let’s set our build to use ASCII encoding:
<properties>
    <project.build.sourceEncoding>US-ASCII</project.build.sourceEncoding>
</properties>
Running this using mvn clean install, we see that we get many build errors of the form:
[ERROR] /Baeldung/tutorials/maven-modules/maven-properties/src/main/java/
com/baeldung/maven/properties/NonAsciiString.java:[15,31] unmappable character (0xC3) for encoding US-ASCII
We’re seeing this because our files contain non-ASCII characters, so they can’t be read through ASCII encoding.
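The same kind of failure can be reproduced outside Maven with a strict decoder. The following is a minimal sketch with hypothetical class and method names. It illustrates the idea of rejecting bytes that are invalid in the declared encoding, and is not a description of how the compiler is implemented.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Illustrative helper; the class name is hypothetical.
final class StrictDecodeCheck {

    // Returns true only if every byte sequence is valid in the given charset.
    static boolean isDecodable(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The character Ü, stored the way a UTF-8 source file would store it.
        byte[] stored = "\u00dc".getBytes(StandardCharsets.UTF_8);
        System.out.println(String.format("0x%02X", stored[0] & 0xff));     // 0xC3
        System.out.println(isDecodable(stored, StandardCharsets.US_ASCII)); // false
        System.out.println(isDecodable(stored, StandardCharsets.UTF_8));    // true
    }
}
```

The first stored byte is 0xC3, the same value reported in the build error above, because US-ASCII only defines bytes up to 0x7F.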
Where possible, it’s a good idea to keep things simple and avoid using non-ASCII characters.
In the next section, we’ll see it’s also a good idea to set Maven to use UTF-8 encoding to avoid any issues.
3. How Do We Set Encoding in Maven Configuration?
Firstly, let’s look at how we set the encoding at a plugin level.
We’ll then see that we can set project-wide properties. This means that we don’t need to declare an encoding in every plugin.
3.1. How Do We Set the encoding Parameter in a Maven Plugin?
Most plugins come with an encoding parameter, which makes this very simple.
We’ll need to set the encoding in the maven-resources-plugin and maven-compiler-plugin. We can simply add the encoding parameter to each of our Maven plugins:
<configuration>
    <encoding>UTF-8</encoding>
</configuration>
Let’s run this code using mvn clean install and take a look at the logging:
[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
We can see that the plugin is now using UTF-8, and we’ve solved the warnings above.
3.2. How Do We Set a Project-Wide encoding Parameter in a Maven Build?
Remembering to set an encoding for each plugin that we declare is very cumbersome.
Thankfully, most Maven plugins use the same global Maven property as a default for their encoding parameter.
Let’s remove the encoding parameters from our plugins and instead set the property we saw earlier:
<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
Running our build produces the same UTF-8 line of logging that we saw above.
In a multi-module project, we would typically look to set this property in the parent POM.
This property will be overridden by any plugin-specific encoding parameters that are set.
It’s important to remember that plugins are not obliged to use this property. For example, earlier versions (<2.2) of the maven-war-plugin would ignore this property.
3.3. How Do We Set a Project-Wide encoding Parameter for a Reporting Plugin?
Perhaps surprisingly, we must set two properties in order to guarantee that we’ve set project-wide encoding for all cases.
To illustrate this, we’ll use properties-maven-plugin:
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>properties-maven-plugin</artifactId>
    <version>1.1.0</version>
</plugin>
Let’s also set a new project-wide property to be empty:
<project.reporting.outputEncoding></project.reporting.outputEncoding>
If we run mvn clean install now, our build will fail with the following log output:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-pmd-plugin:3.13.0:pmd (pmd) on project maven-properties: Execution pmd of goal
org.apache.maven.plugins:maven-pmd-plugin:3.13.0:pmd failed: org.apache.maven.reporting.MavenReportException: : UnsupportedEncodingException -> [Help 1]
Even though we’ve set project.build.sourceEncoding, the failing plugin also uses a different property. To understand why, we must understand the difference between Maven Build Configuration and Maven Report Configuration.
Plugins can be used in either the Build process or the Reporting process, and the two processes use separate property keys.
This means that just setting project.build.sourceEncoding is not enough. We also need to add the following property for the Reporting process:
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
It is advisable to set both of the properties at a project-wide level.
3.4. How Do We Set Maven Encoding on the Command Line?
We are able to set properties through command line arguments without adding any config to POM files. We might do this because we don’t have write-access to the pom.xml files.
Let’s run the following to specify the encoding that the build should use:
mvn clean install -Dproject.build.sourceEncoding=UTF-8 -Dproject.reporting.outputEncoding=UTF-8
Command-line arguments override any existing config.
Therefore, this allows us to run the build successfully even if we remove any encoding properties set in the pom.xml files.
4. Using Multiple Types of Encoding Within the Same Maven Project
It is a good idea to use a single type of encoding across a project.
However, we might be forced to deal with multiple types of encoding in the same build. For example, our resource files may have different encoding systems, which may be beyond our control.
Is there a way we can do this? Well, it depends on the situation.
We saw that we could set encoding parameters on a plugin-by-plugin basis. Hence, if we require our code in CP-1252 but want to output test results in UTF-8, we are able to do this.
We’re even able to use multiple types of encoding within the same plugin by using different executions.
In particular, the maven-resources-plugin, which we saw earlier, has extra functionality built into it.
We saw the encoding parameter earlier. The plugin also provides a propertiesEncoding parameter to allow property files to be encoded in a different way from other resources:
<configuration>
    <encoding>UTF-8</encoding>
    <propertiesEncoding>ISO-8859-1</propertiesEncoding>
</configuration>
When the build is run using mvn clean install, this gives:
[INFO] --- maven-resources-plugin:3.2.0:resources (default-resources) @ maven-properties ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Using 'ISO-8859-1' encoding to copy filtered properties files.
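One reason property files often need their own encoding is that java.util.Properties.load(InputStream) always decodes its input as ISO-8859-1, so .properties files have traditionally been stored that way even in otherwise UTF-8 projects. The sketch below, with hypothetical class and method names, shows what happens when the stored encoding and the loading method disagree.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Illustrative helper; the class name is hypothetical.
final class PropertiesEncodingDemo {

    // Uses the InputStream overload, which always decodes as ISO-8859-1.
    static String loadFromStream(byte[] content, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(content));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Uses the Reader overload, so the caller chooses the encoding.
    static String loadFromReader(byte[] content, Charset charset, String key) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(new ByteArrayInputStream(content), charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String line = "greeting=caf\u00e9"; // greeting=café
        byte[] latin1 = line.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = line.getBytes(StandardCharsets.UTF_8);

        System.out.println("caf\u00e9".equals(loadFromStream(latin1, "greeting")));                        // true
        System.out.println("caf\u00e9".equals(loadFromStream(utf8, "greeting")));                          // false
        System.out.println("caf\u00e9".equals(loadFromReader(utf8, StandardCharsets.UTF_8, "greeting"))); // true
    }
}
```

If the application loads its property files through the InputStream overload, keeping them in ISO-8859-1 via propertiesEncoding avoids the garbled second case.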
It’s always worth referring to the technical documentation on maven.apache.org when investigating how a plugin can use an encoding.
5. Conclusion
In this article, we saw that declaring encoding helps ensure that the code builds in the same way in any environment.
We saw that we could set an encoding parameter at the plugin level.
Then, we learned that there are two properties that we can set at a project level. They are project.build.sourceEncoding and project.reporting.outputEncoding.
We also saw that it is possible to pass encoding in via the command line. This allows us to set the encoding type without editing the Maven POM files.
Finally, we looked at how we could approach using multiple types of encoding within the same project.
As always, the example project is available over on GitHub.