HTML to PDF Using OpenPDF

Last modified: September 19, 2021

1. Overview

In this quick tutorial, we’ll look at using OpenPDF in Java to convert HTML files to PDF programmatically.

2. OpenPDF

OpenPDF is a free Java library for creating and editing PDF files, available under the LGPL and MPL licenses. It’s a fork of iText 4; in fact, the OpenPDF API for generating PDFs is nearly identical to the iText API before version 5. It’s a well-maintained solution for producing PDFs in Java.

3. Converting Using Flying Saucer

Flying Saucer is a Java library that renders well-formed XML (or XHTML), styled and formatted with CSS 2.1, to PDF, images, and Swing panels.

3.1. Maven Dependencies

We’ll start with Maven dependencies:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>
<dependency>
    <groupId>org.xhtmlrenderer</groupId>
    <artifactId>flying-saucer-pdf-openpdf</artifactId>
    <version>9.1.20</version>
</dependency>

We’ll use the library jsoup for parsing HTML files, input streams, URLs, and even strings. It offers DOM (Document Object Model) traversal capabilities, CSS, and jQuery-like selectors to extract data from HTML.

The flying-saucer-pdf-openpdf library accepts an XML representation of HTML files as input, applies CSS formatting and styling, and outputs PDF.

3.2. HTML to PDF

In this tutorial, we’ll cover some simple cases that you might encounter in HTML to PDF conversions, such as images and styling in HTML, using Flying Saucer and OpenPDF. We’ll also discuss how we can customize the code to accept external styles, images, and fonts.

Let’s take a look at our sample HTML code:

<html>
    <head>
        <style>
            .center_div {
                border: 1px solid gray;
                margin-left: auto;
                margin-right: auto;
                width: 90%;
                background-color: #d0f0f6;
                text-align: left;
                padding: 8px;
            }
        </style>
        <link href="style.css" rel="stylesheet">
    </head>
    <body>
        <div class="center_div">
            <h1>Hello Baeldung!</h1>
            <img src="Java_logo.png">
            <div class="myclass">
                <p>This is the tutorial to convert html to pdf.</p>
            </div>
        </div>
    </body>
</html>

To convert HTML to PDF, we’ll first read the HTML file from the defined location:

File inputHTML = new File(HTML);

As the next step, we’ll use jsoup to parse the above HTML file into a jsoup Document and set its output syntax to XML, so that it serializes as well-formed XHTML:

// Helper method; the name is chosen here for illustration
private Document createWellFormedHtml(File inputHTML) throws IOException {
    Document document = Jsoup.parse(inputHTML, "UTF-8");
    document.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
    return document;
}
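
To see why this step matters, here is a minimal sketch that uses only the JDK’s built-in XML parser. It is an illustration rather than part of the tutorial’s code, and the class name WellFormedCheck and the helper isWellFormed are hypothetical. It shows that markup with an unclosed img tag, as in our sample HTML, is not well-formed XML, while the self-closed form is:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, for illustration only
final class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String html = "<html><body><img src=\"Java_logo.png\"></body></html>";
        String xhtml = "<html><body><img src=\"Java_logo.png\" /></body></html>";
        System.out.println(isWellFormed(html));  // prints false
        System.out.println(isWellFormed(xhtml)); // prints true
    }
}
```

Renderers that expect well-formed XML reject the first form, which is why we let jsoup re-serialize the document before handing it over.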

Now, as the last step, let’s create a PDF from the XHTML document we generated in the previous step. The ITextRenderer will take this XHTML document and create an output PDF file. Note that we’re wrapping our code in a try-with-resources block to ensure the output stream is closed:

try (OutputStream outputStream = new FileOutputStream(outputPdf)) {
    ITextRenderer renderer = new ITextRenderer();
    SharedContext sharedContext = renderer.getSharedContext();
    sharedContext.setPrint(true);
    sharedContext.setInteractive(false);
    renderer.setDocumentFromString(xhtml.html());
    renderer.layout();
    renderer.createPDF(outputStream);
}

3.3. Customizing for External Styling

We can register additional fonts used in the HTML input document to ITextRenderer so that it can include them while generating the PDF:

renderer.getFontResolver().addFont(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").toString(), true);

We may also need to give ITextRenderer a base URL so that it can resolve relative URLs to external styles:

String baseUrl = FileSystems.getDefault()
  .getPath("src/main/resources/")
  .toUri().toURL().toString();
renderer.setDocumentFromString(xhtml.html(), baseUrl);
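
As a rough illustration of what the base URL does, the following sketch uses only JDK classes to build a base URL the same way and resolve relative references such as style.css against it. The class BaseUrlDemo and its helpers are hypothetical, and the temp directory merely stands in for src/main/resources:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only
final class BaseUrlDemo {

    // Builds the base URL for a directory the same way as the snippet above
    static String baseUrlFor(Path directory) {
        try {
            return directory.toUri().toURL().toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Resolves a relative reference such as "style.css" against the base URL
    static String resolve(String baseUrl, String relative) {
        return URI.create(baseUrl).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Stand-in for src/main/resources: a directory that is known to exist
        Path resources = Paths.get(System.getProperty("java.io.tmpdir"));
        String baseUrl = baseUrlFor(resources);
        System.out.println(baseUrl);
        System.out.println(resolve(baseUrl, "style.css"));
        System.out.println(resolve(baseUrl, "Java_logo.png"));
    }
}
```

One thing to watch for: on common JDKs, Path.toUri() typically appends the trailing slash only when the directory actually exists. If the program runs from a working directory where src/main/resources can’t be found, the base URL may lack the slash, and relative references would then resolve against the parent directory.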

We can customize image-related attributes by implementing ReplacedElementFactory:

public ReplacedElement createReplacedElement(LayoutContext lc, BlockBox box, UserAgentCallback uac, int cssWidth, int cssHeight) {
    Element e = box.getElement();
    // Only <img> elements are handled here; everything else falls through to null
    if (e != null && "img".equals(e.getNodeName())) {
        String imagePath = e.getAttribute("src");
        // Prefix the base path and close the stream once the bytes are read
        try (InputStream input = new FileInputStream("src/main/resources/" + imagePath)) {
            byte[] bytes = IOUtils.toByteArray(input);
            Image image = Image.getInstance(bytes);
            FSImage fsImage = new ITextFSImage(image);
            if (cssWidth != -1 || cssHeight != -1) {
                // Use the size requested by the CSS
                fsImage.scale(cssWidth, cssHeight);
            } else {
                // No CSS size given: fall back to a default size
                fsImage.scale(2000, 1000);
            }
            return new ITextImageElement(fsImage);
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    return null;
}

Note: The above code prefixes the base path to the image path and sets the default image size in case it isn’t provided.

Then, we can add the custom ReplacedElementFactory to the SharedContext:

sharedContext.setReplacedElementFactory(new CustomElementFactoryImpl());

4. Converting Using Open HTML

Open HTML to PDF is a Java library that renders well-formed XML/XHTML (and even some HTML5) to PDF or images, using CSS 2.1 (and later standards) for layout and formatting.

4.1. Maven Dependencies

In addition to the jsoup library shown above, we’ll need to add a couple of Open HTML to PDF libraries to our pom.xml file:

<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-core</artifactId>
    <version>1.0.6</version>
</dependency>
<dependency>
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>1.0.6</version>
</dependency>

The openhtmltopdf-core library renders well-formed XML/XHTML, and openhtmltopdf-pdfbox generates a PDF document from the rendered representation of the XHTML.

4.2. HTML to PDF

To convert HTML to PDF using Open HTML to PDF, we’ll use the same HTML shown in section 3.2. We’ll first convert the HTML file to a jsoup Document, as we showed in the previous example.

In the last step, PdfRendererBuilder takes this XHTML document and creates a PDF as the output file. Again, we’re using try-with-resources to wrap our logic:

try (OutputStream os = new FileOutputStream(outputPdf)) {
    PdfRendererBuilder builder = new PdfRendererBuilder();
    builder.toStream(os);
    // Convert the jsoup Document to a W3C DOM; "/" is the base URI for relative resources
    builder.withW3cDocument(new W3CDom().fromJsoup(doc), "/");
    builder.run();
}

4.3. Customizing for External Styling

We can register additional fonts used in the HTML input document to PdfRendererBuilder so that it can include them with the PDF:

builder.useFont(new File(getClass().getClassLoader().getResource("fonts/PRISTINA.ttf").getFile()), "PRISTINA");
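
One caveat with building a File from URL.getFile(): the URL path stays percent-encoded, so a directory name containing a space shows up as %20 and the file may not be found. The sketch below, using only JDK classes, shows one possible way to decode the resource URL first. The class FontFiles, the helper toFile, and the "my fonts" directory are hypothetical, and the approach assumes the resource is a plain file on disk rather than an entry inside a JAR:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only
final class FontFiles {

    // Converts a file: URL (for example, one returned by ClassLoader.getResource)
    // to a File, decoding escapes such as %20
    static File toFile(URL resource) {
        if (resource == null) {
            throw new IllegalArgumentException("Resource not found on the classpath");
        }
        try {
            return Paths.get(resource.toURI()).toFile();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + resource, e);
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        // A made-up location with a space in it; the file doesn't need to exist
        File font = new File("my fonts/PRISTINA.ttf").getAbsoluteFile();
        URL url = font.toURI().toURL();
        System.out.println(url.getFile()); // raw URL path, with the space encoded as %20
        System.out.println(toFile(url));   // decoded file system path
    }
}
```

The resulting File could then be passed to builder.useFont in place of the new File(...getFile()) expression above. The null check also turns a missing font resource into a descriptive error instead of a NullPointerException.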

As in our earlier example, we may also need to give PdfRendererBuilder a base URL so that it can resolve relative URLs to external styles:

String baseUrl = FileSystems.getDefault()
  .getPath("src/main/resources/")
  .toUri().toURL().toString();
builder.withW3cDocument(new W3CDom().fromJsoup(doc), baseUrl);

5. Conclusion

In this article, we’ve learned how to convert HTML into PDF using Flying Saucer and Open HTML to PDF. We’ve also discussed how to register external fonts and styles, and how to customize the conversion.

As is the custom, all the code samples used in this tutorial are available over on GitHub.
