1. Overview
We often need to pack multiple files into a single compressed archive to make them easier to transfer and to save space. Zip is a widely used archive file format for this purpose.
Java provides a standard set of classes like ZipFile and ZipInputStream to access zip files. In this tutorial, we’ll learn how to use them to read zip files. Also, we’ll explore their functional differences and evaluate their performance.
2. Create a Zip File
Before we dive into the code for reading zip files, let us review the process of creating a zip file first.
The following code snippet uses two variables: data stores the content to be compressed, and file represents our destination file:
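import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

String data = "..."; // a very long String
try (BufferedOutputStream bos = new BufferedOutputStream(new FileOutputStream(file));
     ZipOutputStream zos = new ZipOutputStream(bos)) {
    ZipEntry zipEntry = new ZipEntry("zip-entry.txt");
    zos.putNextEntry(zipEntry);
    // write(...) accepts bytes, so the String is encoded explicitly
    zos.write(data.getBytes(StandardCharsets.UTF_8));
    zos.closeEntry();
}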
This snippet archives the data to a zip entry called zip-entry.txt and then writes the entry to the target file.
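As a minimal sketch of the same idea extended to several entries, the class below writes a configurable number of text entries into one archive. The class name ZipWriteSketch, the method writeEntries, and the str-data-N.txt naming scheme are assumptions chosen for illustration, not part of the original example code.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, introduced only for this sketch
class ZipWriteSketch {

    // Writes entryCount entries named str-data-<i>.txt, each holding the same data
    static void writeEntries(Path target, int entryCount, String data) throws IOException {
        try (OutputStream os = new BufferedOutputStream(Files.newOutputStream(target));
             ZipOutputStream zos = new ZipOutputStream(os)) {
            byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
            for (int i = 1; i <= entryCount; i++) {
                // Each putNextEntry(...) starts a new entry in the archive
                zos.putNextEntry(new ZipEntry("str-data-" + i + ".txt"));
                zos.write(bytes);
                zos.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("sketch", ".zip");
        writeEntries(target, 10, "hello zip");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Calling putNextEntry(…) once per entry is what separates the entries inside the archive; closing the ZipOutputStream then finishes the file.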
3. Read via ZipFile
First, let’s see how we read all entries from a zip file via the ZipFile class:
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

try (ZipFile zipFile = new ZipFile(compressedFile)) {
    Enumeration<? extends ZipEntry> zipEntries = zipFile.entries();
    while (zipEntries.hasMoreElements()) {
        ZipEntry zipEntry = zipEntries.nextElement();
        try (InputStream inputStream = new BufferedInputStream(zipFile.getInputStream(zipEntry))) {
            // Read data from InputStream
        }
    }
}
We create an instance of ZipFile to open the compressed file. ZipFile.entries() returns an enumeration of all entries in the zip file. For each ZipEntry, we call ZipFile.getInputStream(…) to obtain an InputStream and read the entry’s content.
In addition to entries(), ZipFile has a method getEntry(…), which gives us random access to a specific ZipEntry by its name:
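ZipEntry zipEntry = zipFile.getEntry("str-data-10.txt");
// getEntry(...) returns null when the archive has no entry with this name
if (zipEntry != null) {
    try (InputStream inputStream = new BufferedInputStream(zipFile.getInputStream(zipEntry))) {
        // Read data from InputStream
    }
}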
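To make the "Read data from InputStream" step concrete, here is a minimal sketch that looks up one entry by name and returns its content as text. The class name ZipFileLookupSketch and the method readEntryAsText are hypothetical, and the sketch assumes the entry holds UTF-8 text and that we run on Java 9 or later (for readAllBytes()).

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, introduced only for this sketch
class ZipFileLookupSketch {

    // Returns the entry's content as UTF-8 text, or Optional.empty() if no such entry exists
    static Optional<String> readEntryAsText(File zip, String entryName) throws IOException {
        try (ZipFile zipFile = new ZipFile(zip)) {
            ZipEntry entry = zipFile.getEntry(entryName);
            if (entry == null) {
                return Optional.empty();
            }
            try (InputStream in = zipFile.getInputStream(entry)) {
                // readAllBytes() requires Java 9 or later
                return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ZipFileLookupSketch <zip-file> <entry-name>");
            return;
        }
        System.out.println(readEntryAsText(new File(args[0]), args[1]).orElse("<no such entry>"));
    }
}
```

Returning an Optional is one way to surface the null that getEntry(…) yields for a missing name; throwing an exception would be an equally reasonable choice.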
4. Read via ZipInputStream
Next, we’ll go through a typical example of reading all entries from a zip file via the ZipInputStream:
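import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(compressedFile));
     ZipInputStream zipInputStream = new ZipInputStream(bis)) {
    ZipEntry zipEntry;
    while ((zipEntry = zipInputStream.getNextEntry()) != null) {
        // Read data from ZipInputStream
    }
}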
We create a ZipInputStream to wrap the data source, which is compressedFile in our case. After that, we iterate over the entries by calling getNextEntry() until it returns null.
Within the loop, we read the data of each ZipEntry directly from the ZipInputStream. Once we’ve finished reading an entry, we call getNextEntry() again to move on to the next entry.
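The loop described above can be sketched as follows. The class name ZipStreamReadSketch and the method readAll are hypothetical, and collecting every entry into an in-memory map is only an illustrative choice that suits small archives.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, introduced only for this sketch
class ZipStreamReadSketch {

    // Reads every non-directory entry and maps its name to its uncompressed bytes
    static Map<String, byte[]> readAll(InputStream source) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(source))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read(...) returns -1 at the end of the current entry, not the end of the file
                while ((n = zis.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                contents.put(entry.getName(), out.toByteArray());
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: ZipStreamReadSketch <zip-file>");
            return;
        }
        readAll(new FileInputStream(args[0]))
            .forEach((name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```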
5. Functional Differences
Although both classes can serve the purpose of reading entries from a zip file, they have two distinct functional differences.
5.1. Access Type
The major difference between them is that ZipFile supports random access, whereas ZipInputStream supports sequential access only.
With ZipFile, we can extract a specific entry by calling ZipFile.getEntry(…). This is particularly useful when we need only one specific entry from the archive. To achieve the same with ZipInputStream, we have to loop through the entries until we find a match.
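For comparison, here is a minimal sketch of the sequential search that ZipInputStream forces on us. The class name ZipStreamFindSketch and the method findEntry are hypothetical, and the sketch assumes Java 9 or later for readAllBytes().

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, introduced only for this sketch
class ZipStreamFindSketch {

    // Scans entries in archive order and returns the bytes of the first entry with a matching name
    static Optional<byte[]> findEntry(InputStream source, String entryName) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(source))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // Reads up to the end of the current entry only (Java 9 or later)
                    return Optional.of(zis.readAllBytes());
                }
                // No match: the next getNextEntry() call skips the rest of this entry
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: ZipStreamFindSketch <zip-file> <entry-name>");
            return;
        }
        Optional<byte[]> found = findEntry(new FileInputStream(args[0]), args[1]);
        System.out.println(found.map(b -> b.length + " bytes").orElse("<no such entry>"));
    }
}
```

Even though the sketch stops at the first match, every entry stored before the target still has to be passed over, which is why the position of the entry in the archive matters here.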
5.2. Data Source
ZipFile requires the data source to be a physical file, whereas ZipInputStream only requires an InputStream. Sometimes our data isn’t a file; for example, it may arrive over a network stream. In such a case, we must first write the whole InputStream to a file before we can process it using ZipFile.
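One way to bridge that gap is to spool the stream into a temporary file first. The sketch below is an illustration under assumptions: the class name SpoolToZipFileSketch and the method entryNames are hypothetical, and the in-memory stream in main only stands in for a real network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, introduced only for this sketch
class SpoolToZipFileSketch {

    // Copies the stream to a temporary file, opens it with ZipFile, and lists the entry names
    static List<String> entryNames(InputStream source) throws IOException {
        Path temp = Files.createTempFile("spooled-", ".zip");
        try {
            Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);
            try (ZipFile zipFile = new ZipFile(temp.toFile())) {
                List<String> names = new ArrayList<>();
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
                return names;
            }
        } finally {
            // Remove the temporary copy once the ZipFile has been closed
            Files.deleteIfExists(temp);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small archive in memory; the resulting stream stands in for a network stream
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buf)) {
            zos.putNextEntry(new ZipEntry("a.txt"));
            zos.write(new byte[] {1, 2, 3});
            zos.closeEntry();
        }
        System.out.println(entryNames(new ByteArrayInputStream(buf.toByteArray())));
    }
}
```

The extra copy costs disk space and time, so for a single pass over all entries, reading the stream directly with ZipInputStream is usually the simpler option.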
6. Performance Comparison
We’ve gone through the functional differences between ZipFile and ZipInputStream. Now, let’s explore further differences in terms of performance.
We’ll use JMH (Java Microbenchmark Harness) to compare the processing speed of the two classes. JMH is a framework designed for measuring the performance of code snippets.
Before we proceed to the benchmarking, we need to include the following Maven dependencies in our pom.xml:
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
</dependency>
The latest versions of JMH Core and the JMH annotation processor can be found in Maven Central.
6.1. Read All Entries
In this experiment, we aim to assess the performance of reading all entries from a zip file. In our setup, we have a zip file containing 10 entries, and each comprises 200KB of data. We’ll read them via ZipFile and ZipInputStream separately:
| Class | Running time (ms) |
|---|---|
| ZipFile | 11.072 |
| ZipInputStream | 11.642 |
From the results, we can’t see any significant performance difference between the two classes. Their running times differ by roughly 5%. Both are comparably efficient when reading all entries from a zip file.
6.2. Read the Last Entry
Next, we’ll specifically target reading the last entry from the same zip file:
| Class | Running time (ms) |
|---|---|
| ZipFile | 1.016 |
| ZipInputStream | 12.830 |
This time, there’s a huge difference between them. ZipFile reads a single entry out of 10 in about a tenth of the time it needs to read all entries, while ZipInputStream takes about as long as it did to read all of them.
These results reflect the sequential nature of ZipInputStream. The input stream must be read from the beginning of the zip file until the target entry is located, whereas ZipFile can jump to the target entry without reading the entire file.
The results show the advantage of choosing ZipFile over ZipInputStream, particularly when we need only a small number of entries from an archive that contains many.
7. Conclusion
In software development, it’s common to work with zip-compressed files. Java offers two different classes, ZipFile and ZipInputStream, to read zip files.
In this article, we’ve explored their usage and functional differences. We’ve also compared their performance.
The choice between them depends on our requirements. We’ll choose ZipFile when we’re dealing with a limited number of entries within a large zip archive to ensure optimal performance. In contrast, we’ll choose ZipInputStream if our source of data isn’t a file.
As always, the full source code of our examples can be found over on GitHub.