Reducing JSON Data Size

Last modified: September 22, 2020

1. Introduction

Java applications often use JSON as a common format for sending and receiving data. Moreover, it’s used as a serialization protocol for storing data. With smaller JSON data sizes, our applications become cheaper and faster.

In this tutorial, we’ll look at various ways of reducing the size of JSON in our Java applications.

2. Domain Model and Test Data

Let’s create a domain model for a Customer with some contact data:

public class Customer {
    private long id;
    private String firstName;
    private String lastName;
    private String street;
    private String postalCode;
    private String city;
    private String state;
    private String phoneNumber;
    private String email;

    // constructors, getters, and setters omitted
}

Note that all fields will be mandatory, except for phoneNumber and email.

To properly test JSON data size differences, we need at least a few hundred Customer instances. They must have different data to make our tests more lifelike. The data generation web site mockaroo helps us here. We can create 1,000 JSON data records there for free, in our own format, and with authentic test data.

We configure mockaroo to match our domain model. Here are some items to keep in mind:

  • The field names are the ones we specified for our domain model
  • Each field has a data type that we selected
  • 50% of the phone numbers are empty in the mock data
  • 30% of the email addresses are empty, too

All the code examples below use the same data of 1,000 customers from mockaroo. We use the factory method Customer.fromMockFile() to read that file and turn it into Customer objects.

We’ll be using Jackson as our JSON processing library.

3. JSON Data Size with Jackson Default Options

Let’s write a Java object to JSON with the default Jackson options:

Customer[] customers = Customer.fromMockFile();
ObjectMapper mapper = new ObjectMapper();
byte[] feedback = mapper.writeValueAsBytes(customers); 

Let’s see the mock data for the first Customer:

{
  "id" : 1, 
  "firstName" : "Horatius", 
  "lastName" : "Strognell", 
  "street" : "4848 New Castle Point", 
  "postalCode" : "33432", 
  "city" : "Boca Raton", 
  "state" : "FL", 
  "phoneNumber" : "561-824-9105", 
  "email" : "hstrognell0@dailymail.co.uk"
}

When using the default Jackson options, the JSON data byte array with all 1,000 customers is 181.0 KB in size.

4. Compressing with gzip

As text data, JSON data compresses nicely. That’s why gzip is our first option to reduce the JSON data size. Moreover, it can be automatically applied in HTTP, the common protocol for sending and receiving JSON.

Let’s take the JSON produced with the default Jackson options and compress it with gzip. This results in 45.9 KB, just 25.3% of the original size. So if we can enable gzip compression through configuration, we’ll cut down the JSON data size by 75% without any change to our Java code!

If our Spring Boot application delivers the JSON data to other services or front-ends, then we’ll enable gzip compression in the Spring Boot configuration. Let’s see a typical compression configuration in YAML syntax:

server:
  compression:
    enabled: true
    mime-types: text/html,text/plain,text/css,application/javascript,application/json
    min-response-size: 1024

First, we enabled compression in general by setting enabled to true. Then, we specifically enabled JSON data compression by adding application/json to the list of mime-types. Finally, notice that we set min-response-size to 1,024 bytes. This is because compressing very small payloads can produce output that’s larger than the original.
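
To see why a minimum response size makes sense, here is a minimal sketch that gzips a very small JSON payload using only the JDK. The class name SmallPayloadGzip and the sample payload are assumptions made for illustration; they are not part of the tutorial's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class SmallPayloadGzip {

    // Compresses the given bytes with gzip at the JDK's default compression level.
    public static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] tiny = "{\"id\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(tiny);
        // The gzip header and trailer alone take 18 bytes, so a payload
        // this small ends up larger after compression.
        System.out.println(tiny.length + " bytes -> " + compressed.length + " bytes");
    }
}
```

The fixed framing overhead only pays off once there is enough repeated content to compress, which is what a threshold such as min-response-size guards against.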

Often, proxies such as NGINX or web servers such as the Apache HTTP Server deliver the JSON data to other services or front-ends. Configuring JSON data compression in these tools is beyond the scope of this tutorial.

A previous tutorial on gzip tells us that gzip has various compression levels. Our code examples use gzip with the default Java compression level. Spring Boot, proxies, or web servers may get different compression results for the same JSON data.
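
As a rough illustration of how the compression level changes the result, the sketch below gzips the same bytes at several levels. GZIPOutputStream has no level parameter, so the sketch uses a small subclass that sets the level on the inherited Deflater. The class names and the generated sample data are assumptions for illustration and are unrelated to the mockaroo data set.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.GZIPOutputStream;

public class GzipLevels {

    // Sets the compression level on the protected Deflater field "def"
    // that GZIPOutputStream inherits from DeflaterOutputStream.
    static final class LeveledGzipOutputStream extends GZIPOutputStream {
        LeveledGzipOutputStream(OutputStream out, int level) throws IOException {
            super(out);
            def.setLevel(level);
        }
    }

    public static byte[] gzip(byte[] input, int level) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new LeveledGzipOutputStream(buffer, level)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Builds a repetitive JSON array purely for demonstration purposes.
    public static byte[] sampleJson() {
        StringBuilder json = new StringBuilder("[");
        for (int i = 1; i <= 200; i++) {
            if (i > 1) {
                json.append(',');
            }
            json.append("{\"id\":").append(i)
                .append(",\"city\":\"Boca Raton\",\"state\":\"FL\"}");
        }
        return json.append(']').toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleJson();
        System.out.println("original:         " + data.length);
        System.out.println("no compression:   " + gzip(data, Deflater.NO_COMPRESSION).length);
        System.out.println("best speed:       " + gzip(data, Deflater.BEST_SPEED).length);
        System.out.println("default:          " + gzip(data, Deflater.DEFAULT_COMPRESSION).length);
        System.out.println("best compression: " + gzip(data, Deflater.BEST_COMPRESSION).length);
    }
}
```

The exact sizes depend on the data and the JDK, so the percentages in this tutorial should be read as indicative rather than as values every server will reproduce.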

If we use JSON as the serialization protocol to store data, we’ll need to compress and decompress the data ourselves.
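
Compressing and decompressing stored JSON ourselves could look roughly like the following sketch, which relies only on java.util.zip. The class JsonGzipStore and its method names are hypothetical helpers, not part of the tutorial's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class JsonGzipStore {

    // Encodes the JSON text as UTF-8 and gzips it before it is stored.
    public static byte[] compress(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reverses compress(): gunzips the stored bytes and decodes them as UTF-8.
    public static String decompress(byte[] gzipped) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"id\":1,\"firstName\":\"Horatius\",\"lastName\":\"Strognell\"}";
        byte[] stored = compress(json);
        System.out.println(decompress(stored).equals(json));
    }
}
```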

5. Shorter Field Names in JSON

It’s a best practice to use field names that are neither too short nor too long. Let’s set that aside for the sake of demonstration: we’ll use single-character field names in JSON, but we won’t change the Java field names. This reduces the JSON data size but lowers JSON readability. Since it would also require updates to all services and front-ends, we’ll probably use these short field names only when storing data:

{
  "i" : 1,
  "f" : "Horatius",
  "l" : "Strognell",
  "s" : "4848 New Castle Point",
  "p" : "33432",
  "c" : "Boca Raton",
  "a" : "FL",
  "o" : "561-824-9105",
  "e" : "hstrognell0@dailymail.co.uk"
}

It’s easy to change the JSON field names with Jackson while leaving the Java field names intact. We’ll use the @JsonProperty annotation:

@JsonProperty("p")
private String postalCode;
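
For a fuller picture, the following sketch applies @JsonProperty to a trimmed-down class and round-trips it through an ObjectMapper. It assumes Jackson 2.x (jackson-databind) is on the classpath. The class ShortCustomer, with only three public fields, is a hypothetical stand-in for Customer.

```java
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ShortFieldNames {

    // Hypothetical stand-in for Customer: the Java names stay readable,
    // only the JSON names are shortened.
    public static class ShortCustomer {
        @JsonProperty("i")
        public long id;
        @JsonProperty("f")
        public String firstName;
        @JsonProperty("p")
        public String postalCode;
    }

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static String toJson(ShortCustomer customer) {
        try {
            return MAPPER.writeValueAsString(customer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The same annotations map the short JSON names back to the Java fields.
    public static ShortCustomer fromJson(String json) {
        try {
            return MAPPER.readValue(json, ShortCustomer.class);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ShortCustomer customer = new ShortCustomer();
        customer.id = 1;
        customer.firstName = "Horatius";
        customer.postalCode = "33432";
        System.out.println(toJson(customer));
    }
}
```

Because the annotation applies to both writing and reading, every consumer that uses the same annotated class picks up the short names automatically. Consumers that parse the JSON by hand still need to be updated.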

Using single-character field names leads to data that is 72.5% of the original size. Moreover, using gzip will compress that to 23.8%. That’s not much smaller than the 25.3% we got from simply compressing the original data with gzip. We always need to weigh cost against benefit. Losing readability for a small gain in size isn’t advisable in most scenarios.

6. Serializing to an Array

Let’s see how we can further reduce the JSON data size by leaving out the field names altogether. We can achieve this by writing each customer as a JSON array of its field values. Notice that we’ll also be reducing readability. And we’ll also need to update all the services and front-ends that use our JSON data:

[ 1, "Horatius", "Strognell", "4848 New Castle Point", "33432", "Boca Raton", "FL", "561-824-9105", "hstrognell0@dailymail.co.uk" ]

Storing the Customer as an array leads to output that’s 53.1% of the original size, and 22.0% with gzip compression. This is our best result so far. Still, 22% is not significantly smaller than the 25.3% we got from merely compressing the original data with gzip.

In order to serialize a customer as an array, we need to take full control of JSON serialization. Refer again to our Jackson tutorial for more examples.
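
One way to take control of serialization is a custom Jackson serializer registered through a module. The sketch below assumes Jackson 2.x and uses a hypothetical three-field MiniCustomer class to stay short. It is an illustration of the idea, not necessarily the code behind the numbers above.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializerProvider;
import com.fasterxml.jackson.databind.module.SimpleModule;
import com.fasterxml.jackson.databind.ser.std.StdSerializer;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CustomerArrayExample {

    // Hypothetical trimmed-down customer with three fields.
    public static class MiniCustomer {
        public final long id;
        public final String firstName;
        public final String lastName;

        public MiniCustomer(long id, String firstName, String lastName) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Writes the field values in a fixed order, without any field names.
    public static class MiniCustomerArraySerializer extends StdSerializer<MiniCustomer> {

        public MiniCustomerArraySerializer() {
            super(MiniCustomer.class);
        }

        @Override
        public void serialize(MiniCustomer customer, JsonGenerator gen, SerializerProvider provider)
                throws IOException {
            gen.writeStartArray();
            gen.writeNumber(customer.id);
            gen.writeString(customer.firstName);
            gen.writeString(customer.lastName);
            gen.writeEndArray();
        }
    }

    public static String toJson(MiniCustomer customer) {
        SimpleModule module = new SimpleModule();
        module.addSerializer(MiniCustomer.class, new MiniCustomerArraySerializer());
        ObjectMapper mapper = new ObjectMapper();
        mapper.registerModule(module);
        try {
            return mapper.writeValueAsString(customer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toJson(new MiniCustomer(1, "Horatius", "Strognell")));
    }
}
```

Since the position in the array is now the only thing that identifies a value, readers need a matching deserializer, and the field order becomes part of the data contract.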

7. Excluding null Values

Jackson and other JSON processing libraries may not handle JSON null values correctly when reading or writing JSON. For example, Jackson writes a JSON null value by default when it encounters a Java null value. That’s why it’s a good practice to remove empty fields in JSON data. This leaves the initialization of empty values to each JSON processing library and reduces the JSON data size.

In our mock data, we set 50% of the phone numbers, and 30% of the email addresses, as empty. Leaving out these null values reduces our JSON data size to 166.8 KB, or 92.1% of the original data size. Then, gzip compression will drop it to 24.9%.

Now, if we combine ignoring null values with the shorter field names from the previous section, then we’ll get more significant savings: 68.3% of the original size and 23.4% with gzip.

We can configure the omission of null value fields in Jackson per class or globally for all classes.
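
Both options can be sketched as follows, assuming Jackson 2.x. The per-class variant uses the @JsonInclude annotation, and the global variant calls setSerializationInclusion on the ObjectMapper. The two contact classes are hypothetical stand-ins for Customer.

```java
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullExclusion {

    // Per class: null fields of this class are never written.
    @JsonInclude(JsonInclude.Include.NON_NULL)
    public static class AnnotatedContact {
        public long id = 1;
        public String phoneNumber;            // left null on purpose
        public String email = "hstrognell0@dailymail.co.uk";
    }

    // No annotation: the behavior depends on the ObjectMapper configuration.
    public static class PlainContact {
        public long id = 1;
        public String phoneNumber;            // left null on purpose
        public String email = "hstrognell0@dailymail.co.uk";
    }

    // Globally: this mapper skips null fields for every class it serializes.
    public static ObjectMapper nonNullMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.setSerializationInclusion(JsonInclude.Include.NON_NULL);
        return mapper;
    }

    public static String write(ObjectMapper mapper, Object value) {
        try {
            return mapper.writeValueAsString(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ObjectMapper defaults = new ObjectMapper();
        System.out.println(write(defaults, new PlainContact()));         // includes "phoneNumber":null
        System.out.println(write(defaults, new AnnotatedContact()));     // omits phoneNumber
        System.out.println(write(nonNullMapper(), new PlainContact()));  // omits phoneNumber
    }
}
```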

8. New Domain Class

We achieved the smallest JSON data size so far by serializing it to an array. One way of reducing that even further is a new domain model with fewer fields. But why would we do that?

Let’s imagine a front-end for our JSON data that shows all customers as a table with two columns: name and street address. Let’s write JSON data specifically for this front-end:

{
  "id" : 1,
  "name" : "Horatius Strognell",
  "address" : "4848 New Castle Point, Boca Raton FL 33432"
}

Notice how we concatenated the name fields into name and the address fields into address. Also, we left out email and phoneNumber.

This should produce much smaller JSON data. It also saves the front-end from concatenating the Customer fields. But on the downside, this couples our back-end tightly to the front-end.

Let’s create a new domain class CustomerSlim for this front-end:

public class CustomerSlim {
    private long id;
    private String name;
    private String address;

    // constructors, getters, and setters omitted
}

If we convert our test data to this new CustomerSlim domain class, we’ll reduce it to 46.1% of the original size with the default Jackson settings. If we use gzip, it goes down to 15.1%. This last result is already a significant gain over the previous best result of 22.0%.
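
The conversion itself could be sketched like this in plain Java. The factory method from() and the nested stand-in classes are assumptions for illustration. The concatenation follows the format of the sample record shown earlier.

```java
public class CustomerSlimMapping {

    // Hypothetical minimal Customer holding only the fields the mapping reads.
    public static class Customer {
        final long id;
        final String firstName, lastName, street, postalCode, city, state;

        public Customer(long id, String firstName, String lastName, String street,
                        String postalCode, String city, String state) {
            this.id = id;
            this.firstName = firstName;
            this.lastName = lastName;
            this.street = street;
            this.postalCode = postalCode;
            this.city = city;
            this.state = state;
        }
    }

    public static class CustomerSlim {
        public final long id;
        public final String name;
        public final String address;

        private CustomerSlim(long id, String name, String address) {
            this.id = id;
            this.name = name;
            this.address = address;
        }

        // Joins the name parts and the address parts; phone number and email are dropped.
        public static CustomerSlim from(Customer customer) {
            String name = customer.firstName + " " + customer.lastName;
            String address = customer.street + ", " + customer.city + " "
                + customer.state + " " + customer.postalCode;
            return new CustomerSlim(customer.id, name, address);
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer(1, "Horatius", "Strognell",
            "4848 New Castle Point", "33432", "Boca Raton", "FL");
        CustomerSlim slim = CustomerSlim.from(customer);
        System.out.println(slim.name + " | " + slim.address);
    }
}
```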

Next, if we also use one-character field names, this gets us down to 40.7% of the original size, with gzip further reducing this to 14.7%. This is only a small gain over the 15.1% we reached with the Jackson default settings.

No fields in CustomerSlim are optional, so leaving out empty values has no effect on the JSON data size.

Our last optimization is the serialization of an array. By serializing CustomerSlim to an array, we achieve our best result: 34.2% of the original size and 14.2% with gzip. So even without compression, we remove nearly two-thirds of the original data. And compression shrinks our JSON data to just one-seventh of the original size!

9. Conclusion

In this article, we first saw why we need to reduce JSON data sizes. Next, we learned various ways to reduce this JSON data size. Finally, we learned how to further reduce JSON data size with a domain model that’s custom to one front-end.

The complete code is available, as always, over on GitHub.
