Convert Excel Files to PDF Using Java – 使用 Java 将 Excel 文件转换为 PDF

最后修改: 2023年 12月 20日

中文/混合/英文(键盘快捷键:t)

1. Introduction

1.导言

In this article, we’ll explore how to convert Excel files to PDF in Java using the Apache POI and iText libraries. Apache POI handles Excel file parsing and data extraction, while iText takes care of PDF document creation and formatting. By leveraging their strengths, we can efficiently convert Excel data while retaining its original formatting and styles.

在本文中,我们将探讨如何使用 Apache POIiText 库在 Java 中将 Excel 文件转换为 PDF。Apache POI 负责 Excel 文件的解析和数据提取,而 iText 则负责 PDF 文档的创建和格式化。利用它们的优势,我们可以高效地转换 Excel 数据,同时保留其原始格式和样式。

2. Adding Dependencies

2.添加依赖关系

Before we start the implementation, we need to add the Apache POI and iText libraries to our project. In the pom.xml file, add the following dependencies:

在开始实施之前,我们需要在项目中添加 Apache POI 和 iText 库。在 pom.xml 文件中,添加以下依赖项:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
</dependency>
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
</dependency>

The latest versions of the Apache POI and iText libraries can be downloaded from Maven Central.

Apache POIiText 库的最新版本可从 Maven Central 下载。

3. Loading the Excel File

3.加载 Excel 文件

Load the Excel

With the libraries in place, let’s load the target Excel file using Apache POI. We’ll first open the Excel file using a FileInputStream and create an XSSFWorkbook object representing the loaded workbook:

有了这些库,让我们使用 Apache POI 加载目标 Excel 文件。首先,我们将使用 FileInputStream 打开 Excel 文件,并创建一个 XSSFWorkbook 对象来表示加载的工作簿:

FileInputStream inputStream = new FileInputStream(excelFilePath);
XSSFWorkbook workbook = new XSSFWorkbook(inputStream);

We’ll use this object to access individual sheets and their data.

我们将使用该对象访问各个工作表及其数据。

4. Creating a PDF Document

4.创建 PDF 文档

Next, we’ll utilize iText to create a new PDF document:

接下来,我们将利用 iText 创建一个新的 PDF 文档:

Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(pdfFilePath));
document.open();

This creates a new Document object and associates it with a PDFWriter instance responsible for writing the PDF content. Finally, we specify the desired output location for the PDF through a FileOutputStream.

这将创建一个新的 Document 对象,并将其与负责写入 PDF 内容的 PDFWriter 实例关联起来。最后,我们通过 FileOutputStream 指定 PDF 所需的输出位置。

5. Parsing the Excel Data

5.解析 Excel 数据

With the document ready, we’ll iterate through each row in the worksheet to extract the cell values:

文档准备就绪后,我们将遍历工作表中的每一行,提取单元格值:

void addTableData(PdfPTable table) throws DocumentException, IOException {
    XSSFSheet worksheet = workbook.getSheetAt(0);
    Iterator<Row> rowIterator = worksheet.iterator();
    while (rowIterator.hasNext()) {
        Row row = rowIterator.next();
        if (row.getRowNum() == 0) {
            continue;
        }
        for (int i = 0; i < row.getPhysicalNumberOfCells(); i++) {
            Cell cell = row.getCell(i);
            String cellValue;
            switch (cell.getCellType()) {
                case STRING:
                    cellValue = cell.getStringCellValue();
                    break;
                case NUMERIC:
                    cellValue = String.valueOf(BigDecimal.valueOf(cell.getNumericCellValue()));
                    break;
                case BLANK:
                default:
                    cellValue = "";
                    break;
            }
            PdfPCell cellPdf = new PdfPCell(new Phrase(cellValue));
            table.addCell(cellPdf);
        }
    }
}

The code first creates a PdfTable object matching the number of columns in the first row of the worksheet. Then, iterate through each row in the worksheet, extracting the cell values and weaving them into the PDF table. However, Excel formulas are currently not supported and will be returned with an empty string.

代码首先创建一个与工作表第一行列数相匹配的 PdfTable 对象。然后,遍历工作表中的每一行,提取单元格值并将其编织到 PDF 表格中。但是,目前不支持 Excel 公式,返回的公式将是空字符串。

For each extracted cell value, we create a new PdfPCell object using a Phrase containing the extracted data. Phrase is an iText element that represents a formatted text string.

对于每个提取的单元格值,我们使用包含提取数据的 Phrase 创建一个新的 PdfPCell 对象。Phrase 是一个 iText 元素,表示格式化的文本字符串。

6. Preserving the Excel Styling

6.保留 Excel 样式

One of the key advantages of using Apache POI and iText is the ability to preserve the formatting and styles from the original Excel file. This includes font styles, colors, and alignments.

使用 Apache POI 和 iText 的主要优势之一是能够保留原始 Excel 文件的格式和样式。这包括字体样式、颜色和对齐方式。

By accessing the relevant cell style information from Apache POI, we can apply it to the corresponding elements in the PDF document using iText. However, it’s important to note that this approach, while preserving formatting and styles, may not replicate the exact look and feel of a PDF directly exported from Excel or printed with a printer driver. For more complex formatting needs, additional adjustments may be required.

通过访问 Apache POI 中的相关单元格样式信息,我们可以使用 iText 将其应用到 PDF 文档中的相应元素。不过,需要注意的是,这种方法虽然保留了格式和样式,但可能无法完全复制从 Excel 直接导出或使用打印机驱动程序打印的 PDF 的外观和感觉。对于更复杂的格式化需求,可能需要进行额外的调整。

6.1. Font Styling

6.1.字体样式

We’ll create a dedicated getCellStyle(Cell cell) method to extract styling information like font, color, etc., from the CellStyle object associated with each cell:

我们将创建一个专用的 getCellStyle(Cell cell cell) 方法,以便从与每个单元格关联的 CellStyle 对象中提取字体、颜色等样式信息:

Font getCellStyle(Cell cell) throws DocumentException, IOException {
    Font font = new Font();
    CellStyle cellStyle = cell.getCellStyle();
    org.apache.poi.ss.usermodel.Font cellFont = cell.getSheet()
      .getWorkbook()
      .getFontAt(cellStyle.getFontIndexAsInt());

    if (cellFont.getItalic()) {
        font.setStyle(Font.ITALIC);
    }

    if (cellFont.getStrikeout()) {
        font.setStyle(Font.STRIKETHRU);
    }

    if (cellFont.getUnderline() == 1) {
        font.setStyle(Font.UNDERLINE);
    }

    short fontSize = cellFont.getFontHeightInPoints();
    font.setSize(fontSize);

    if (cellFont.getBold()) {
        font.setStyle(Font.BOLD);
    }

    String fontName = cellFont.getFontName();
    if (FontFactory.isRegistered(fontName)) {
        font.setFamily(fontName);
    } else {
        logger.warn("Unsupported font type: {}", fontName);
        font.setFamily("Helvetica");
    }

    return font;
}

The Phrase object can accept a cell value and a Front object as arguments to its constructor:

Phrase 对象可以接受单元格值和 Front 对象作为其构造函数的参数:

PdfPCell cellPdf = new PdfPCell(new Phrase(cellValue, getCellStyle(cell));

This allows us to control the content and formatting of text within a PDF cell. Note that iText’s built-in fonts are limited to Courier, Helvetica, and TimesRoman. Therefore, we should check if iText supports the extracted cell’s font family before we apply it directly. If the Excel file uses a different font family, it won’t be reflected in the PDF output.

这使我们可以控制 PDF 单元格中文本的内容和格式。请注意,iText 的内置字体仅限于 Courier、Helvetica 和 TimesRoman。因此,在直接应用之前,我们应检查 iText 是否支持提取单元格的字体系列。如果 Excel 文件使用了不同的字体系列,PDF 输出中将无法反映。

6.2. Background Color Styling

6.2.背景颜色样式

In addition to preserving font styles, we also want to ensure that the background colors of cells in the Excel file are accurately reflected in the generated PDF. To achieve this, we’ll create a new method, setBackgroundColor(), to extract the background color information from the Excel cell and apply it to the corresponding PDF cell.

除了保留字体样式外,我们还希望确保 Excel 文件中单元格的背景颜色能准确反映在生成的 PDF 中。为此,我们将创建一个新方法 setBackgroundColor() , 从 Excel 单元格中提取背景颜色信息,并将其应用到相应的 PDF 单元格中。

void setBackgroundColor(Cell cell, PdfPCell cellPdf) {
    short bgColorIndex = cell.getCellStyle()
      .getFillForegroundColor();
    if (bgColorIndex != IndexedColors.AUTOMATIC.getIndex()) {
        XSSFColor bgColor = (XSSFColor) cell.getCellStyle()
          .getFillForegroundColorColor();
        if (bgColor != null) {
            byte[] rgb = bgColor.getRGB();
            if (rgb != null && rgb.length == 3) {
                cellPdf.setBackgroundColor(new BaseColor(rgb[0] & 0xFF, rgb[1] & 0xFF, rgb[2] & 0xFF));
            }
        }
    }
}

6.3. Alignment Styling

6.3.对齐样式

Apache POI provides the getAlignment() method on the CellStyle object. This returns a constant value representing the alignment. Once we have the mapped iText alignment constant, we can set it on the PdfPCell object using the setHorizontalAlignment() method.

Apache POI 为 CellStyle 对象提供了 getAlignment() 方法。该方法会返回一个代表对齐方式的常量值。获得映射的 iText 对齐常量后,我们就可以使用 setHorizontalAlignment() 方法在 PdfPCell 对象上对其进行设置。

Here’s an example of how to incorporate alignment extraction and application:

下面举例说明如何结合对齐提取和应用:

void setCellAlignment(Cell cell, PdfPCell cellPdf) {
    CellStyle cellStyle = cell.getCellStyle();

    HorizontalAlignment horizontalAlignment = cellStyle.getAlignment();

    switch (horizontalAlignment) {
        case LEFT:
            cellPdf.setHorizontalAlignment(Element.ALIGN_LEFT);
            break;
        case CENTER:
            cellPdf.setHorizontalAlignment(Element.ALIGN_CENTER);
            break;
        case JUSTIFY:
        case FILL:
            cellPdf.setVerticalAlignment(Element.ALIGN_JUSTIFIED);
            break;
        case RIGHT:
            cellPdf.setHorizontalAlignment(Element.ALIGN_RIGHT);
            break;
    }
}

Now, let’s update the existing code where we iterate through the cells to include the font and background color styling:

现在,让我们更新迭代单元格的现有代码,使其包含字体和背景颜色样式:

PdfPCell cellPdf = new PdfPCell(new Phrase(cellValue, getCellStyle(cell)));
setBackgroundColor(cell, cellPdf);
setCellAlignment(cell, cellPdf);

Note that the resulting Excel will not look the same as a PDF exported from Excel (or a PDF print via a printer driver).

请注意,生成的 Excel 与从 Excel 导出的 PDF(或通过打印机驱动程序打印的 PDF)看起来并不相同。

7. Saving the PDF Document

7.保存 PDF 文档

Finally, we can save the generated PDF document to the desired location. This involves closing the PDF document object and ensuring all resources are released properly:

最后,我们可以将生成的 PDF 文档保存到所需位置。这需要关闭 PDF 文档对象,并确保所有资源都已正确释放:

document.add(table);
document.close();
workbook.close();Convert Excel to PDF

8. Conclusion

8.结论

We’ve learned to convert Excel files to PDF in Java using Apache POI and iText. By combining the capabilities of Apache POI for Excel handling and iText for PDF generation, we can seamlessly preserve formatting and apply styles from Excel to PDF.

我们已经学会了使用 Apache POI 和 iText 在 Java 中将 Excel 文件转换为 PDF。通过结合 Apache POI 处理 Excel 和 iText 生成 PDF 的功能,我们可以无缝地保留 Excel 到 PDF 的格式并应用样式。

The code examples from this article can be found in the Maven-based project over on GitHub.

本文中的代码示例可在基于 Maven 的项目 over on GitHub 中找到。