1. Overview
In this short tutorial, we’ll see how to do Base64 encoding and decoding of a PDF file using Java 8 and Apache Commons Codec.
But first, let’s take a quick peek at the basics of Base64.
2. Basics of Base64
When we send binary data through channels that were designed for text, such as email bodies or JSON and XML documents, some bytes may be altered or misinterpreted along the way, and our data might get corrupted in flight.
So, to have portability and a common standard while transferring binary data, Base64 came into the picture. It represents arbitrary bytes using only 64 printable ASCII characters, which text-oriented channels pass through unchanged.
Since the sender and receiver both understand and have agreed upon using the standard, the probability of our data getting lost or misinterpreted is greatly reduced.
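To make the idea concrete, here is a minimal sketch of what the encoding does to a few short inputs. Every 3 input bytes become 4 output characters, and `=` pads the last group when the input length isn’t a multiple of 3. The class name `Base64Basics` and its `encode` helper are made up for this illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class Base64Basics {

    // Encodes ASCII text with the basic Base64 alphabet and returns the result as a String.
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(encode("Man"));      // TWFu (3 bytes -> 4 characters, no padding)
        System.out.println(encode("Ma"));       // TWE= (2 bytes -> one padding character)
        System.out.println(encode("M"));        // TQ== (1 byte -> two padding characters)
        System.out.println(encode("%PDF-1.4")); // JVBERi0xLjQ= (a typical PDF header)
    }
}
```

Since a PDF file starts with `%PDF-`, its Base64 form starts with `JVBERi0`, which is a handy way to recognize an encoded PDF at a glance.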
Now let’s see a couple of ways to apply this to a PDF.
3. Conversion Using Java 8
Starting with Java 8, we have a utility class, java.util.Base64, that provides encoders and decoders for the Base64 encoding scheme. It supports the Basic, URL-safe, and MIME variants as specified in RFC 4648 and RFC 2045.
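As a rough illustration of how the three variants differ, the sketch below encodes the same bytes with each encoder. The class and method names are hypothetical. Only the `Base64.getEncoder()`, `getUrlEncoder()`, and `getMimeEncoder()` factory methods come from the JDK:

```java
import java.util.Base64;

class Base64Flavors {

    static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    static String mime(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // These two bytes map onto the last two characters of the alphabet.
        byte[] sample = {(byte) 0xfb, (byte) 0xff};
        System.out.println(basic(sample));   // +/8=
        System.out.println(urlSafe(sample)); // -_8= ('+' and '/' replaced by '-' and '_')

        // 100 bytes encode to 136 characters; MIME breaks the line after 76 of them.
        byte[] longer = new byte[100];
        System.out.println(basic(longer).length()); // 136
        System.out.println(mime(longer).length());  // 138 (one CRLF inserted)
    }
}
```

For a PDF that we store in a file or embed in JSON, the basic encoder used in the rest of this tutorial is usually the natural choice. The MIME encoder matters mostly when the receiver expects line-wrapped output.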
3.1. Encoding
To convert a PDF into Base64, we first need to get it in bytes and pass it through java.util.Base64.Encoder’s encode method:
byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
Here, IN_FILE is the path to our input PDF.
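If we need the result as text rather than as a byte array, for example to place it in a JSON field, the encoder’s `encodeToString` method is a convenient alternative. The following is only a sketch: the class `PdfToBase64String`, its methods, and the temporary file standing in for a real PDF are assumptions made for this example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class PdfToBase64String {

    // Reads the whole file into memory and returns its Base64 form as a String.
    static String encodeFile(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Stand-in for a real PDF: a temporary file that holds only a PDF header.
    static String encodeSampleHeader() {
        try {
            Path sample = Files.createTempFile("sample", ".pdf");
            try {
                Files.write(sample, "%PDF-1.4".getBytes(StandardCharsets.US_ASCII));
                return encodeFile(sample);
            } finally {
                Files.deleteIfExists(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSampleHeader()); // JVBERi0xLjQ=
    }
}
```

Like the byte array version, this approach holds both the file and its encoded form in memory, so it suits small documents best.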
3.2. Streaming Encoding
For larger files or systems with limited memory, it’s much more efficient to perform the encoding using a stream instead of reading all the data into memory. Let’s look at how to accomplish this:
try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
     FileInputStream fis = new FileInputStream(IN_FILE)) {
    byte[] bytes = new byte[1024];
    int read;
    while ((read = fis.read(bytes)) > -1) {
        os.write(bytes, 0, read);
    }
}
Here, IN_FILE is the path to our input PDF, and OUT_FILE is the path to a file containing the Base64-encoded document. Instead of reading the entire PDF into memory and then encoding the full document in memory, we are reading up to 1 KB (1024 bytes) of data at a time and passing that data through the encoder into the OutputStream.
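One detail is easy to miss: the wrapping stream has to be closed, because closing is what writes the last group and any padding. The sketch below repeats the same loop over in-memory streams so that it can run without any files, and compares the result with one-shot encoding. The class `StreamingEncodeSketch` and its helper methods are hypothetical names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Base64;

class StreamingEncodeSketch {

    // Copies 'in' to 'out' through a Base64 encoder, up to 1024 bytes at a time.
    static void encode(InputStream in, OutputStream out) throws IOException {
        try (OutputStream encoder = Base64.getEncoder().wrap(out)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                encoder.write(buffer, 0, read);
            }
        } // closing the wrapper writes the final group and padding, and closes 'out' as well
    }

    // Returns true if streaming and one-shot encoding agree for 'size' bytes of sample data.
    static boolean matchesOneShot(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream streamed = new ByteArrayOutputStream();
        try {
            encode(new ByteArrayInputStream(data), streamed);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.equals(Base64.getEncoder().encode(data), streamed.toByteArray());
    }

    public static void main(String[] args) {
        System.out.println(matchesOneShot(5000)); // true
    }
}
```

The buffer size doesn’t have to be a multiple of 3, because the wrapped stream keeps leftover bytes until the next write or until it is closed.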
3.3. Decoding
At the receiving end, we get the encoded file.
So we now need to decode it to get back our original bytes and write them to a FileOutputStream to get the decoded PDF:
byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
// try-with-resources closes (and flushes) the stream even if write() throws
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}
Here, OUT_FILE is the path to our PDF to be created.
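Decoding can be streamed as well, since `java.util.Base64.Decoder` offers a `wrap` method for input streams. The following sketch mirrors the streaming encoder and, as an assumption made to keep it runnable without files, works on in-memory streams. The class `StreamingDecodeSketch` and its methods are illustrative names only:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StreamingDecodeSketch {

    // Reads Base64 text from 'encoded' and writes the decoded bytes to 'out'.
    // The caller stays responsible for closing 'out'.
    static void decode(InputStream encoded, OutputStream out) throws IOException {
        try (InputStream decoder = Base64.getDecoder().wrap(encoded)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = decoder.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }

    // Encodes 'text' in one shot, decodes it through the stream, and returns the result.
    static String roundTrip(String text) {
        byte[] encoded = Base64.getEncoder().encode(text.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream decoded = new ByteArrayOutputStream();
        try {
            decode(new ByteArrayInputStream(encoded), decoded);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(decoded.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("%PDF-1.4")); // %PDF-1.4
    }
}
```

With files, we could pass a FileInputStream for the encoded document and a FileOutputStream for the PDF to be created.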
4. Conversion Using Apache Commons
Next, we’ll be using the Apache Commons Codec package to achieve the same. It’s based on RFC 2045 and predates the Java 8 implementation we discussed earlier. So, when we need to support multiple JDK versions (including legacy ones) or vendors, this comes in handy as a third-party API.
4.1. Maven
To be able to use the Apache library, we need to add a dependency to our pom.xml:
<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>1.14</version>
</dependency>
The latest version of the above can be found on Maven Central.
4.2. Encoding
The steps are the same as for Java 8, except that this time, we pass on our original bytes to the encodeBase64 method of the org.apache.commons.codec.binary.Base64 class:
byte[] inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);
4.3. Streaming Encoding
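The static encodeBase64 method works only on byte arrays, but the library also supports streaming through the Base64OutputStream and Base64InputStream classes in the org.apache.commons.codec.binary package. They wrap an existing stream, much like the JDK’s wrap method. Note that, when encoding, Base64OutputStream by default splits its output into 76-character lines separated by CRLF, so the result won’t be byte-for-byte identical to that of the JDK’s basic encoder unless we disable chunking through the appropriate constructor.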
4.4. Decoding
Again, we simply call the decodeBase64 method and write the result to a file:
byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);
// try-with-resources closes (and flushes) the stream even if write() throws
try (FileOutputStream fos = new FileOutputStream(OUT_FILE)) {
    fos.write(decoded);
}
5. Testing
Now we’ll test our encoding and decoding using a simple JUnit test:
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotEquals;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.junit.BeforeClass;
import org.junit.Test;

public class EncodeDecodeUnitTest {

    private static final String IN_FILE = // path to file to be encoded from;
    private static final String OUT_FILE = // path to file to be decoded into;
    private static byte[] inFileBytes;

    // Read the input PDF once and share its bytes across all tests
    @BeforeClass
    public static void fileToByteArray() throws IOException {
        inFileBytes = Files.readAllBytes(Paths.get(IN_FILE));
    }

    @Test
    public void givenJavaBase64_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);

        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);
        assertArrayEquals(inFileBytes, decoded);
    }

    @Test
    public void givenJavaBase64_whenEncodedStream_thenDecodedStreamOK() throws IOException {
        // Encode through the wrapped stream; OUT_FILE holds Base64 text here, not a PDF
        try (OutputStream os = java.util.Base64.getEncoder().wrap(new FileOutputStream(OUT_FILE));
             FileInputStream fis = new FileInputStream(IN_FILE)) {
            byte[] bytes = new byte[1024];
            int read;
            while ((read = fis.read(bytes)) > -1) {
                os.write(bytes, 0, read);
            }
        }

        // The streamed output must match the in-memory encoding
        byte[] encoded = java.util.Base64.getEncoder().encode(inFileBytes);
        byte[] encodedOnDisk = Files.readAllBytes(Paths.get(OUT_FILE));
        assertArrayEquals(encoded, encodedOnDisk);

        byte[] decoded = java.util.Base64.getDecoder().decode(encoded);
        byte[] decodedOnDisk = java.util.Base64.getDecoder().decode(encodedOnDisk);
        assertArrayEquals(decoded, decodedOnDisk);
    }

    @Test
    public void givenApacheCommons_givenJavaBase64_whenEncoded_thenDecodedOK() throws IOException {
        byte[] encoded = org.apache.commons.codec.binary.Base64.encodeBase64(inFileBytes);
        byte[] decoded = org.apache.commons.codec.binary.Base64.decodeBase64(encoded);

        writeToFile(OUT_FILE, decoded);

        assertNotEquals(encoded.length, decoded.length);
        assertEquals(inFileBytes.length, decoded.length);
        assertArrayEquals(inFileBytes, decoded);
    }

    // try-with-resources closes (and flushes) the stream even if write() throws
    private void writeToFile(String fileName, byte[] bytes) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(bytes);
        }
    }
}
As we can see, we first read the input bytes in a @BeforeClass method. In the streaming test, we check that the file written through the wrapped stream matches the in-memory encoding and decodes to the same bytes. In the two array-based @Test methods, we verify that:
- encoded and decoded byte arrays are of different lengths
- inFileBytes and decoded byte arrays are of the same length and have the same contents
Of course, we can also open up the decoded PDF file that we created and see that the contents are the same as the file we gave as input.
6. Conclusion
In this quick tutorial, we learned more about Java’s Base64 utility.
We also saw code samples for converting a PDF into and from Base64 using Java 8 and Apache Commons Codec. Interestingly, the JDK implementation is much faster than the Apache one.
As always, source code is available over on GitHub.