Guide to Java URL Encoding/Decoding

Last modified: December 12, 2016

1. Overview

Simply put, URL encoding translates special characters from the URL to a representation that adheres to the spec and can be correctly understood and interpreted.

In this tutorial, we’ll focus on how to encode/decode the URL or form data so that it adheres to the spec and transmits over the network correctly.

2. Analyze the URL

Let’s first look at a basic URI syntax:

scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]

The first step in encoding a URI is to examine its parts and then encode only the relevant portions.

Now let’s look at an example of a URI:

String testUrl = 
  "http://www.baeldung.com?key1=value+1&key2=value%40%21%242&key3=value%253";

One way to analyze the URI is to load its String representation into a java.net.URI instance:

@Test
public void givenURL_whenAnalyze_thenCorrect() throws Exception {
    URI uri = new URI(testUrl);

    assertThat(uri.getScheme(), is("http"));
    assertThat(uri.getHost(), is("www.baeldung.com"));
    // getRawQuery() returns the query exactly as written, without decoding it
    assertThat(uri.getRawQuery(),
      is("key1=value+1&key2=value%40%21%242&key3=value%253"));
}

The URI class parses the string representation URL and exposes its parts via a simple API, e.g., getXXX.
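
As a rough sketch of what that API exposes, the standalone class below prints the main components of a made-up URL. The class name UriPartsDemo, the describe helper, and the example.com address are assumptions for illustration only; the getters themselves are standard java.net.URI methods. Note how the "raw" getters return the text as written, while getPath and getQuery return decoded values.

```java
import java.net.URI;

class UriPartsDemo {

    // Lists the components of a URI, one per line.
    // URI.create throws an unchecked IllegalArgumentException on bad input.
    static String describe(String url) {
        URI uri = URI.create(url);
        return "scheme=" + uri.getScheme() + "\n"
             + "host=" + uri.getHost() + "\n"
             + "port=" + uri.getPort() + "\n"
             + "rawPath=" + uri.getRawPath() + "\n"
             + "path=" + uri.getPath() + "\n"
             + "rawQuery=" + uri.getRawQuery() + "\n"
             + "query=" + uri.getQuery() + "\n"
             + "fragment=" + uri.getFragment();
    }

    public static void main(String[] args) {
        // Hypothetical URL, not taken from the article
        System.out.println(describe(
            "http://www.example.com:8080/docs/a%20b?key=value%26more#top"));
    }
}
```

Because getQuery already decodes escaped characters, the raw variant is the safer starting point when the query still has to be split into parameters.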

3. Encode the URL

When encoding a URI, one of the common pitfalls is encoding the complete URI. Typically, we need to encode only the query portion of the URI.
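
To make the pitfall concrete, here is a minimal sketch that contrasts encoding a whole URL with encoding only a parameter value. The class and method names and the example.com URL are hypothetical; only URLEncoder.encode comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class EncodeWholeUrlPitfall {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    // Pitfall: the delimiters ':', '/', '?' and '=' get escaped too,
    // so the result is no longer a usable URL.
    static String encodeWholeUrl(String url) {
        return encode(url);
    }

    // Leaves the URL structure alone and encodes only the value.
    static String encodeValueOnly(String base, String key, String value) {
        return base + "?" + key + "=" + encode(value);
    }

    public static void main(String[] args) {
        System.out.println(encodeWholeUrl("http://www.example.com?q=a b"));
        // http%3A%2F%2Fwww.example.com%3Fq%3Da+b
        System.out.println(encodeValueOnly("http://www.example.com", "q", "a b"));
        // http://www.example.com?q=a+b
    }
}
```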

Let’s encode the data using the encode(data, encodingScheme) method of the URLEncoder class:

private String encodeValue(String value) {
    try {
        return URLEncoder.encode(value, StandardCharsets.UTF_8.toString());
    } catch (UnsupportedEncodingException e) {
        // Cannot happen: every Java platform supports UTF-8
        throw new IllegalStateException(e);
    }
}

@Test
public void givenRequestParam_whenUTF8Scheme_thenEncode() throws Exception {
    // LinkedHashMap keeps insertion order, so the query string is deterministic
    Map<String, String> requestParams = new LinkedHashMap<>();
    requestParams.put("key1", "value 1");
    requestParams.put("key2", "value@!$2");
    requestParams.put("key3", "value%3");

    String encodedURL = requestParams.keySet().stream()
      .map(key -> key + "=" + encodeValue(requestParams.get(key)))
      .collect(joining("&", "http://www.baeldung.com?", ""));

    assertThat(testUrl, is(encodedURL));
}

The encode method accepts two parameters:

  1. data – string to be translated
  2. encodingScheme – name of the character encoding

This encode method converts the string into application/x-www-form-urlencoded format.
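
The small sketch below illustrates what this format means in practice: letters, digits and the characters ".", "-", "*" and "_" pass through unchanged, a space becomes "+", and everything else is percent-escaped. The class name FormEncodingRules and the sample strings are made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FormEncodingRules {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("value 1"));    // value+1    (space becomes '+')
        System.out.println(encode("a-b_c.d*e"));  // a-b_c.d*e  (safe characters are kept)
        System.out.println(encode("1+1=2"));      // 1%2B1%3D2  ('+' and '=' are escaped)
        System.out.println(encode("~user"));      // %7Euser    ('~' is escaped as well)
    }
}
```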

Encoding replaces each special character with one or more “%xy” sequences, where xy is the two-digit hexadecimal value of one byte of that character in the chosen encoding. Whenever we deal with path parameters or add dynamic parameters, we encode the data before sending it to the server.
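
To show where the “%xy” sequences come from, the following sketch escapes every byte of a string's UTF-8 form by hand. It is a simplified illustration, not how URLEncoder is implemented: URLEncoder leaves safe characters untouched, whereas the hypothetical percentEncodeAllBytes helper escapes everything.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodingDemo {

    // Writes each UTF-8 byte of the input as '%' followed by two hex digits.
    static String percentEncodeAllBytes(String input) {
        StringBuilder sb = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so negative byte values format as two hex digits
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeAllBytes("@"));       // %40
        System.out.println(percentEncodeAllBytes("\u00e9"));  // %C3%A9 (e with acute accent, two bytes)
        System.out.println(percentEncodeAllBytes("\u20ac"));  // %E2%82%AC (euro sign, three bytes)
    }
}
```

A single character can therefore expand to several “%xy” groups, one per byte, which is why the character encoding matters.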

Note: The World Wide Web Consortium Recommendation states that we should use UTF-8. Not doing so may introduce incompatibilities. (Reference: https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html)

4. Decode the URL

Let’s now decode the previous URL using the decode method of the URLDecoder:

private String decode(String value) {
    try {
        return URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
    } catch (UnsupportedEncodingException e) {
        // Cannot happen: every Java platform supports UTF-8
        throw new IllegalStateException(e);
    }
}

@Test
public void givenRequestParam_whenUTF8Scheme_thenDecodeRequestParams()
  throws URISyntaxException {
    URI uri = new URI(testUrl);

    String scheme = uri.getScheme();
    String host = uri.getHost();
    String query = uri.getRawQuery();

    // Split the raw query into parameters first, then decode each value
    String decodedQuery = Arrays.stream(query.split("&"))
      .map(param -> param.split("=")[0] + "=" + decode(param.split("=")[1]))
      .collect(Collectors.joining("&"));

    assertEquals(
      "http://www.baeldung.com?key1=value 1&key2=value@!$2&key3=value%3",
      scheme + "://" + host + "?" + decodedQuery);
}

There are two important points to remember here:

  • Analyze URL before decoding
  • Use the same encoding scheme for encoding and decoding

If we were to decode and then analyze, URL portions might not be parsed correctly. If we used another encoding scheme to decode the data, it would result in garbage data.
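
Both points can be reproduced with a short sketch. The class DecodePitfalls, its helper methods, and the sample query are assumptions made for this illustration; the calls to URLEncoder and URLDecoder are the standard JDK ones.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class DecodePitfalls {

    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Pitfall 1: decoding the whole query first turns an escaped '&' into a delimiter.
    static int countParamsDecodeFirst(String rawQuery) {
        return decode(rawQuery, "UTF-8").split("&").length;
    }

    // Splitting the raw query first keeps the escaped '&' inside its value.
    static int countParamsSplitFirst(String rawQuery) {
        return rawQuery.split("&").length;
    }

    public static void main(String[] args) {
        String rawQuery = "title=" + encode("Tom&Jerry", "UTF-8") + "&year=1940";
        System.out.println(rawQuery);                          // title=Tom%26Jerry&year=1940
        System.out.println(countParamsSplitFirst(rawQuery));   // 2
        System.out.println(countParamsDecodeFirst(rawQuery));  // 3

        // Pitfall 2: decoding with a charset other than the one used for encoding.
        String encoded = encode("caf\u00e9", "UTF-8");         // caf%C3%A9
        System.out.println(decode(encoded, "UTF-8").equals("caf\u00e9"));       // true
        System.out.println(decode(encoded, "ISO-8859-1").equals("caf\u00e9"));  // false
    }
}
```

In the second case the two UTF-8 bytes of the accented letter are read as two separate ISO-8859-1 characters, which is the garbage data mentioned above.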

5. Encode a Path Segment

We can’t use URLEncoder to encode path segments of the URL. The path component is the hierarchical part of the URL: a sequence of segments separated by “/” that represents a directory path or otherwise locates a resource.

Reserved characters in path segments differ from those in query parameter values. For example, a “+” sign is a valid character in a path segment and therefore should not be encoded.
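
The difference can be observed with JDK classes alone. The sketch below is only an illustration under assumptions: it uses the multi-argument java.net.URI constructor, which quotes characters that are illegal in a path, as a stand-in for a path encoder, and compares it with URLEncoder. The class name, helper names, and example.com host are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class PathVersusQueryEncoding {

    // Form encoding, intended for query parameter values
    static String encodeAsFormValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Lets java.net.URI quote only the characters that are illegal in a path.
    static String encodeAsPath(String path) {
        try {
            return new URI("http", "www.example.com", path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String path = "/Path 1/Path+2";
        System.out.println(encodeAsPath(path));       // /Path%201/Path+2
        System.out.println(encodeAsFormValue(path));  // %2FPath+1%2FPath%2B2
    }
}
```

URLEncoder escapes the “/” separators and the “+”, and writes the space as “+”, so its output is not a valid replacement for the original path.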

To encode the path segment, we use the UriUtils class by Spring Framework instead.

The UriUtils class provides the encodePath and encodePathSegment methods for encoding a path and a path segment, respectively:

private String encodePath(String path) {
    try {
        path = UriUtils.encodePath(path, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // On failure, log the error and fall through to return the unencoded path
        LOGGER.error("Error encoding parameter {}", e.getMessage(), e);
    }
    return path;
}

@Test
public void givenPathSegment_thenEncodeDecode() 
  throws UnsupportedEncodingException {
    String pathSegment = "/Path 1/Path+2";
    String encodedPathSegment = encodePath(pathSegment);
    String decodedPathSegment = UriUtils.decode(encodedPathSegment, "UTF-8");
    
    // The space is escaped as %20, while '/' and '+' are left as they are
    assertEquals("/Path%201/Path+2", encodedPathSegment);
    assertEquals("/Path 1/Path+2", decodedPathSegment);
}

In the above code snippet, we can see that the encodePath method returns the encoded value, and that + is not encoded because it is a valid character in the path component.

Let’s add a path variable to our test URL:

String testUrl
  = "/path+1?key1=value+1&key2=value%40%21%242&key3=value%253";

And to assemble and assert a properly encoded URL, we’ll change the test from Section 3:

String path = "path+1";
String encodedURL = requestParams.keySet().stream()
  .map(k -> k + "=" + encodeValue(requestParams.get(k)))
  .collect(joining("&", "/" + encodePath(path) + "?", ""));
assertThat(testUrl, CoreMatchers.is(encodedURL));

6. Conclusion

In this article, we saw how to encode and decode the data so that it can be transferred and interpreted correctly.

While the article focused on encoding/decoding URI query parameter values, the approach applies to HTML form parameters as well.

As always, the source code is available on GitHub.