Translating Space Characters in URLEncoder

Last modified: February 9, 2024

1. Introduction

When working with URLs in Java, it’s essential to ensure they are properly encoded to avoid errors and maintain accurate data transmission. URLs may contain special characters, including spaces, that need to be encoded for uniform interpretation across different systems.

In this tutorial, we’ll explore how to handle spaces within URLs using the URLEncoder class.

2. Understand URL Encoding

URLs can’t have spaces directly. To include them, we need to use URL encoding.

URL encoding, also known as percent-encoding, is a standard mechanism for converting special characters and non-ASCII characters into a format suitable for transmission via URLs.

In URL encoding, we replace each character that isn't allowed in a URL with a percent sign ‘%’ followed by the two-digit hexadecimal value of each of its bytes. For example, a space (byte value 0x20) is represented as %20. Letters, digits, and a few other safe characters are left unchanged. This practice ensures that web servers and browsers correctly parse and interpret URLs, preventing ambiguity and errors during data transmission.
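
To make the mechanism concrete, here is a minimal sketch that percent-encodes every byte of a string's UTF-8 form. The class and the percentEncodeAll() helper are hypothetical names used only for illustration; a real encoder such as URLEncoder leaves safe characters untouched rather than encoding everything.

```java
import java.nio.charset.StandardCharsets;

public class PercentEncodingSketch {

    // Hypothetical helper: percent-encodes EVERY byte of the UTF-8 form of the input.
    // Real encoders skip letters, digits, and other safe characters.
    public static String percentEncodeAll(String input) {
        StringBuilder sb = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            // Mask to an unsigned value, then format as two uppercase hex digits
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeAll(" "));      // %20
        System.out.println(percentEncodeAll("\u00E9")); // %C3%A9 (two UTF-8 bytes for 'é')
    }
}
```

A character outside ASCII occupies more than one byte in UTF-8, so it produces more than one %XX group.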

3. Why Use URLEncoder

The URLEncoder class is part of the Java Standard Library, specifically in the java.net package. The purpose of the URLEncoder class is to encode strings into a format suitable for use in URLs. This includes replacing special characters with percent-encoded equivalents.

It offers static methods for encoding strings into the application/x-www-form-urlencoded MIME format, commonly used for transmitting data in HTML forms. The application/x-www-form-urlencoded format is similar to the query component of a URL but with some differences. The main difference lies in encoding the space character as a plus sign (+) instead of %20.
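
As a small illustration of this difference, the sketch below encodes a value with URLEncoder and decodes both spellings of a space with URLDecoder. The class and method names are hypothetical, and the Charset overloads used here assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodingSpaces {

    // Encodes a value in application/x-www-form-urlencoded style (space becomes '+')
    public static String formEncode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Decodes a form-encoded value; both '+' and "%20" come back as a space
    public static String formDecode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b"));   // a+b
        System.out.println(formDecode("a+b"));   // a b
        System.out.println(formDecode("a%20b")); // a b
    }
}
```

A form decoder accepts either spelling, whereas a consumer that doesn't apply form decoding would treat the plus sign as a literal character.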

The URLEncoder class provides overloaded encode() methods: encode(String s), encode(String s, String enc), and, since Java 10, encode(String s, Charset charset). The first method uses the platform's default charset and is deprecated because its result can vary between systems. The other two allow us to specify the charset, such as UTF-8, which is the recommended standard for web applications. When we specify UTF-8 as the encoding scheme, we ensure consistent encoding and decoding of characters across different systems, thereby minimizing the risk of misinterpretation or errors in URL handling.
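
The following sketch contrasts the two ways of naming the charset explicitly. It assumes Java 10 or later, where an encode(String, Charset) overload is also available; the class and method names are illustrative only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeOverloads {

    // encode(String, String): the charset is given by name, so the compiler
    // forces us to handle UnsupportedEncodingException
    public static String withCharsetName(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this is unexpected
            throw new IllegalStateException(e);
        }
    }

    // encode(String, Charset), Java 10+: no checked exception to handle
    public static String withCharset(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(withCharsetName("a b")); // a+b
        System.out.println(withCharset("a b"));     // a+b
    }
}
```

Both variants produce the same output; the Charset overload simply avoids the checked exception and the risk of a misspelled charset name.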

4. Implementation

Let’s now encode the string “Welcome to the Baeldung Website!” for a URL using URLEncoder. In this example, we use the single-argument encode(String) method, which relies on the platform’s default charset and is deprecated. It replaces spaces with the plus sign (+) and percent-encodes the exclamation mark:

String originalString = "Welcome to the Baeldung Website!";
String encodedString = URLEncoder.encode(originalString);
assertEquals("Welcome+to+the+Baeldung+Website%21", encodedString);

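Notably, the single-argument method uses the platform's default charset, which is UTF-8 by default only from Java 18 onward, so it's safer to specify the charset explicitly. The charset only determines how characters are converted to bytes before percent-encoding. It has no effect on spaces, so specifying UTF-8 still encodes them as plus signs: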

String originalString = "Welcome to the Baeldung Website!";
String encodedString = URLEncoder.encode(originalString, StandardCharsets.UTF_8);
assertEquals("Welcome+to+the+Baeldung+Website%21", encodedString);
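
To see what the charset does affect, the sketch below encodes a string containing a non-ASCII character with two different charsets. The class name and sample string are illustrative assumptions, and the Charset overload assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetEffect {

    // Encodes the value with the given charset (illustrative wrapper)
    public static String encodeWith(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 au lait"; // "café au lait"

        // 'é' is two bytes in UTF-8 but one byte in ISO-8859-1;
        // the spaces become '+' in both cases
        System.out.println(encodeWith(text, StandardCharsets.UTF_8));      // caf%C3%A9+au+lait
        System.out.println(encodeWith(text, StandardCharsets.ISO_8859_1)); // caf%E9+au+lait
    }
}
```

The two results differ only in how the accented character is represented, which is why the encoder and the decoder need to agree on the charset.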

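However, a plus sign is treated as a space only when the receiver decodes the value as form data, which is typically the case for query strings. Elsewhere in a URL, such as the path, a plus sign is a literal plus, and some web servers may not recognize it as a space. In those cases, we need to replace the plus sign with %20. We can do this by using the replace() method of the String class: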

String originalString = "Welcome to the Baeldung Website!";
String encodedString = URLEncoder.encode(originalString, StandardCharsets.UTF_8).replace("+", "%20");
assertEquals("Welcome%20to%20the%20Baeldung%20Website%21", encodedString);

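Alternatively, we can use the replaceAll() method with the regular expression \+, written as "\\+" in a Java string literal because the plus sign has a special meaning in regular expressions. The result is the same, since replace() also substitutes every occurrence: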

String originalString = "Welcome to the Baeldung Website!";
String encodedString = URLEncoder.encode(originalString, StandardCharsets.UTF_8).replaceAll("\\+", "%20");
assertEquals("Welcome%20to%20the%20Baeldung%20Website%21", encodedString);
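
One might worry that this replacement also corrupts plus signs that were part of the original text. It doesn't, because URLEncoder has already encoded a literal plus sign as %2B by the time we call replace(). The sketch below wraps the approach in a hypothetical helper, encodeWithPercent20(), and checks a round trip. The names are illustrative, and the Charset overloads assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpaceAsPercent20 {

    // Hypothetical helper: form-encode first, then turn every '+' (an encoded space) into %20.
    // A literal '+' in the input is already "%2B" at this point, so it is left alone.
    public static String encodeWithPercent20(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        String encoded = encodeWithPercent20("1 + 1");
        System.out.println(encoded); // 1%20%2B%201

        // Decoding restores the original text, including the literal plus sign
        System.out.println(URLDecoder.decode(encoded, StandardCharsets.UTF_8)); // 1 + 1
    }
}
```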

5. Conclusion

In this article, we learned the fundamentals of URL encoding in Java, focusing on how the URLEncoder class handles spaces. URLEncoder always encodes a space as a plus sign, and when %20 is required, we can replace the plus signs after encoding. Explicitly specifying the charset, such as UTF-8, ensures that non-ASCII characters are encoded consistently across systems.

As always, the code for the examples is available over on GitHub.
