Making Tomcat UTF-8-Ready

Last modified: November 4, 2017

1. Introduction

UTF-8 is the most common character encoding used in web applications. It can represent every character in the Unicode standard, so it covers virtually all written languages, including Chinese, Korean, and Japanese.

In this article, we walk through the configuration needed to make a Tomcat-based web application use UTF-8 consistently.

2. Connector Configuration

A Connector listens for connections on a specific port. We need to make sure that every Connector uses UTF-8 to decode request URIs, including query-string parameters.

Let’s add the attribute URIEncoding="UTF-8" to every Connector in TOMCAT_ROOT/conf/server.xml:

<Connector 
  URIEncoding="UTF-8" 
  port="8080" 
  redirectPort="8443" 
  connectionTimeout="20000" 
  protocol="HTTP/1.1"/>

<Connector 
  URIEncoding="UTF-8" 
  port="8009" 
  redirectPort="8443" 
  protocol="AJP/1.3"/>
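
To see why the URI encoding matters, here is a minimal sketch that percent-decodes the same query value with two different charsets. It uses the JDK's URLDecoder as a stand-in for the container's URI decoding, so it is an illustration and not Tomcat's actual code; the class name UriDecodingDemo is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container's URI decoding.
class UriDecodingDemo {

    // Percent-decodes a query value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of "\u4e2d\u6587" (two Chinese characters), percent-encoded
        String encoded = "%E4%B8%AD%E6%96%87";

        // Decoded as UTF-8, the six bytes become the original two characters
        System.out.println(decode(encoded, "UTF-8").length());

        // Decoded as ISO-8859-1, each byte becomes its own unrelated character
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```

The bytes on the wire are identical in both cases; only the charset used to interpret them differs, which is the setting URIEncoding controls.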

3. Character Set Filter

After configuring the connector, it’s time to force the web application to handle all requests and responses in UTF-8.

Let’s define a class named CharacterSetFilter:

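import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// On Tomcat 10 and later, these types live in jakarta.servlet instead of javax.servlet.
public class CharacterSetFilter implements Filter {

    // ...

    @Override
    public void doFilter(
      ServletRequest request, 
      ServletResponse response, 
      FilterChain next) throws IOException, ServletException {
        // Decode the request body and form parameters as UTF-8
        request.setCharacterEncoding("UTF-8");
        // Default the response to HTML encoded as UTF-8
        response.setContentType("text/html; charset=UTF-8");
        response.setCharacterEncoding("UTF-8");
        next.doFilter(request, response);
    }

    // ...
}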

We need to add the filter to our application’s web.xml so that it’s applied to all requests and responses:

<filter>
    <filter-name>CharacterSetFilter</filter-name>
    <filter-class>com.baeldung.CharacterSetFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>CharacterSetFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
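
The Servlet specification falls back to ISO-8859-1 for the response writer when no encoding has been set, which is why the filter sets one explicitly. The following sketch, assuming a hypothetical class named ResponseEncodingDemo, mimics what happens to non-Latin text under each charset using only JDK string encoding; it does not touch the servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics what a response writer does with a charset.
class ResponseEncodingDemo {

    // Encodes the body the way a writer bound to the given charset would.
    static byte[] encode(String body, Charset charset) {
        return body.getBytes(charset);
    }

    public static void main(String[] args) {
        String body = "\u4e2d\u6587"; // two Chinese characters

        byte[] latin1 = encode(body, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(body, StandardCharsets.UTF_8);

        // ISO-8859-1 cannot represent these characters, so each one is replaced by '?'
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));

        // UTF-8 preserves the text: decoding the bytes gives back the original string
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(body));
    }
}
```

Once characters have been replaced during encoding, the browser cannot recover them, so the charset has to be set before the response body is written.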

4. Server Page Encoding

The other part of our web application that we need to configure is its JavaServer Pages (JSP).

The most reliable way to ensure UTF-8 in server pages is to add this page directive at the top of each JSP file:

<%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

5. HTML Page Encoding

While the server page encoding tells the JSP container how to read and write the page’s characters, the HTML page encoding tells the browser how to interpret them.

We should add this <meta> tag in the head section of all HTML pages:

<meta http-equiv='Content-Type' content='text/html; charset=UTF-8' />

6. MySQL Server Configuration

Now that Tomcat is configured, it’s time to configure the database.

We assume that a MySQL server is used. The configuration file is named my.ini on Windows and my.cnf on Linux.

We need to find the configuration file, search for these parameters, and edit them accordingly:

[client]
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

We need to restart the MySQL server for the changes to take effect.
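
The configuration uses utf8mb4 because MySQL's older utf8 character set stores at most three bytes per character, while some characters, such as emoji, need four bytes in UTF-8. The sketch below, with a hypothetical class name Utf8ByteLengthDemo, checks whether a string contains such characters using only the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; it is not part of any MySQL or Tomcat API.
class Utf8ByteLengthDemo {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if any code point lies outside the Basic Multilingual Plane,
    // which means it takes four bytes in UTF-8 and does not fit a 3-byte charset.
    static boolean needsUtf8mb4(String text) {
        return text.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d";      // a single Chinese character, 3 bytes in UTF-8
        String emoji = "\uD83D\uDE00";  // U+1F600, a single emoji, 4 bytes in UTF-8

        System.out.println(utf8Length(chinese) + " bytes, needs utf8mb4: " + needsUtf8mb4(chinese));
        System.out.println(utf8Length(emoji) + " bytes, needs utf8mb4: " + needsUtf8mb4(emoji));
    }
}
```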

7. MySQL Database Configuration

The server-level character set only applies to newly created databases, so existing ones have to be migrated manually. A few statements are enough.

For each database:

ALTER DATABASE database_name CHARACTER SET = utf8mb4 
    COLLATE = utf8mb4_unicode_ci;

For each table:

ALTER TABLE table_name CONVERT TO 
    CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

For each VARCHAR or TEXT column:

ALTER TABLE table_name CHANGE column_name column_name 
    VARCHAR(69) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
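
When a schema has many tables, the per-table statement can be generated instead of typed by hand. This is only a sketch: the class Utf8mb4Migration and the table names are hypothetical, and it assumes the names come from a trusted source, since they are concatenated directly into SQL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that prints migration statements; it does not run them.
class Utf8mb4Migration {

    // Builds the conversion statement for one table.
    // The table name is assumed to be trusted; it is not escaped beyond backticks.
    static String convertTable(String tableName) {
        return "ALTER TABLE `" + tableName
            + "` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;";
    }

    // Builds one conversion statement per table, in the given order.
    static List<String> convertTables(List<String> tableNames) {
        return tableNames.stream()
            .map(Utf8mb4Migration::convertTable)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Example table names, made up for this sketch
        convertTables(Arrays.asList("users", "orders")).forEach(System.out::println);
    }
}
```

Reviewing the generated statements before running them against a copy of the database is a sensible precaution, because converting a table rewrites its data.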

If we pass UTF-8 data in database queries, we also need to make sure that every database connection uses the UTF-8 encoding.

For a JDBC-based connection, this can be achieved with the following connection URL:

jdbc:mysql://localhost:3306/?useUnicode=true&characterEncoding=UTF-8
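
In application code, the URL is typically assembled once and passed to DriverManager. The sketch below assumes MySQL Connector/J, which separates URL properties with an ampersand; the class Utf8JdbcUrl and its method names are hypothetical, and the open method needs the driver on the classpath and a running server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for building a UTF-8 aware MySQL JDBC URL.
class Utf8JdbcUrl {

    // Builds the connection URL; pass an empty database name to connect without one.
    static String build(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
            + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Opens a connection with the standard JDBC API.
    // Requires MySQL Connector/J on the classpath and a reachable server.
    static Connection open(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        // Only builds and prints the URL; no connection is attempted here
        System.out.println(build("localhost", 3306, ""));
    }
}
```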

8. Conclusion

In this article, we showed how to configure Tomcat, the web application, and MySQL so that UTF-8 is used end to end.