Microsoft Word Processing in Java with Apache POI

Last modified: December 26, 2016

1. Overview

Apache POI is a Java library for working with the various file formats based on the Office Open XML standards (OOXML) and Microsoft’s OLE 2 Compound Document format (OLE2).

This tutorial focuses on Apache POI’s support for Microsoft Word, the most commonly used Office file format. It walks through the steps needed to format and generate an MS Word file, and then shows how to parse that file.
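
The two container formats named above can be told apart by their leading bytes: OOXML files such as .docx are ZIP archives that start with the ZIP local-file-header signature 50 4B 03 04 ("PK"), while OLE2 files such as legacy .doc start with D0 CF 11 E0 A1 B1 1A E1. The sketch below is a hypothetical helper, not part of POI’s API; note that any ZIP file passes the first check, so it identifies the container rather than proving the file is a Word document.

```java
import java.util.Arrays;

public class OfficeFormatSniffer {

    // ZIP local file header signature: every OOXML package (.docx, .xlsx, ...) starts with it
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 Compound Document signature used by legacy formats such as .doc and .xls
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    public enum Container { OOXML_ZIP, OLE2, UNKNOWN }

    // Hypothetical helper: classifies a file by its first bytes
    public static Container detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Container.OOXML_ZIP;
        }
        return Container.UNKNOWN;
    }

    // True if data begins with every byte of prefix
    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] docxHeader = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(detect(docxHeader)); // prints OOXML_ZIP
    }
}
```

With a real file, the first eight bytes can be read from an InputStream and passed to detect.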

2. Maven Dependencies

The only dependency that is required for Apache POI to handle MS Word files is:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.15</version>
</dependency>

The latest version of this artifact is available on Maven Central.

3. Preparation

Let’s now look at the elements used to generate the MS Word file.

3.1. Resource Files

We’ll collect the contents of three text files and write them into an MS Word file named rest-with-spring.docx.

In addition, the logo-leaf.png file is used to insert an image into that new file. All these files exist on the classpath and are referenced by several static variables:

public static String logo = "logo-leaf.png";
public static String paragraph1 = "poi-word-para1.txt";
public static String paragraph2 = "poi-word-para2.txt";
public static String paragraph3 = "poi-word-para3.txt";
public static String output = "rest-with-spring.docx";

For the curious: the contents of these resource files, found in the repository linked in the last section of this tutorial, are taken from a course page on this site.

3.2. Helper Method

The main method that generates the MS Word file, described in the following section, uses a helper method:

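import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public String convertTextFileToString(String fileName) {
    // Resolve the classpath resource to a Path and stream its lines (decoded as UTF-8)
    try (Stream<String> stream 
      = Files.lines(Paths.get(ClassLoader.getSystemResource(fileName).toURI()))) {
        // Join all lines into a single String, separated by single spaces
        return stream.collect(Collectors.joining(" "));
    } catch (IOException | URISyntaxException e) {
        // Fail loudly instead of returning null, which would hide a missing or unreadable file
        throw new IllegalStateException("Could not read " + fileName, e);
    }
}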

This method reads the text file on the classpath whose name is passed in as the String argument, joins its lines with a single space, and returns the resulting String.
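
One caveat: Paths.get(...toURI()) works when the resources are plain files on disk, but it fails with a FileSystemNotFoundException once the resource is packaged inside a JAR. A stream-based alternative is sketched below; readResource and joinLines are hypothetical names, and the helper fails fast with an exception when the resource is missing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class ResourceText {

    // Hypothetical JAR-safe variant: reads the resource as a stream instead of a Path
    public static String readResource(String fileName) {
        InputStream in = ResourceText.class.getClassLoader().getResourceAsStream(fileName);
        if (in == null) {
            throw new IllegalArgumentException("Resource not found: " + fileName);
        }
        // Closing the reader also closes the underlying InputStream
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return joinLines(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Joins all lines with a single space, like Collectors.joining(" ") in the helper
    static String joinLines(Reader source) {
        return new BufferedReader(source).lines().collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinLines(new StringReader("first line\nsecond line")));
        // prints "first line second line"
    }
}
```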

4. MS Word File Generation

This section gives instructions on how to format and generate a Microsoft Word file. Prior to working on any part of the file, we need to have an XWPFDocument instance:

XWPFDocument document = new XWPFDocument();

4.1. Formatting Title and Subtitle

To create the title, we first create an XWPFParagraph through the document and set the alignment on the new object:

XWPFParagraph title = document.createParagraph();
title.setAlignment(ParagraphAlignment.CENTER);

The content of a paragraph needs to be wrapped in an XWPFRun object. We may configure this object to set a text value and its associated styles:

XWPFRun titleRun = title.createRun();
titleRun.setText("Build Your REST API with Spring");
titleRun.setColor("009933");
titleRun.setBold(true);
titleRun.setFontFamily("Courier");
titleRun.setFontSize(20);

The purposes of these setters are clear from their names; note that setColor expects a six-digit hexadecimal RGB string and setFontSize takes a size in points.
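
setColor takes the color as a six-digit hexadecimal RGB string, such as "009933" for the title’s green. When colors are available as separate red, green and blue components, a small helper can build that string; toHexRgb below is a hypothetical helper, not part of POI.

```java
public class HexColors {

    // Hypothetical helper: formats 0-255 RGB components as the RRGGBB string setColor expects
    public static String toHexRgb(int red, int green, int blue) {
        for (int component : new int[] {red, green, blue}) {
            if (component < 0 || component > 255) {
                throw new IllegalArgumentException("Component out of range: " + component);
            }
        }
        // %02X pads each component to two uppercase hex digits
        return String.format("%02X%02X%02X", red, green, blue);
    }

    public static void main(String[] args) {
        System.out.println(toHexRgb(0, 153, 51)); // prints 009933 (title color)
        System.out.println(toHexRgb(0, 204, 68)); // prints 00CC44 (subtitle color)
    }
}
```

Used with the title run, this would read titleRun.setColor(HexColors.toHexRgb(0, 153, 51)).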

Similarly, we create an XWPFParagraph instance for the subtitle:

XWPFParagraph subTitle = document.createParagraph();
subTitle.setAlignment(ParagraphAlignment.CENTER);

Let’s format the subtitle as well:

XWPFRun subTitleRun = subTitle.createRun();
subTitleRun.setText("from HTTP fundamentals to API Mastery");
subTitleRun.setColor("00CC44");
subTitleRun.setFontFamily("Courier");
subTitleRun.setFontSize(16);
subTitleRun.setTextPosition(20);
subTitleRun.setUnderline(UnderlinePatterns.DOT_DOT_DASH);

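The setTextPosition method raises or lowers the run’s text relative to the baseline, measured in half-points (positive values raise it), which here adjusts the spacing between the subtitle and the image that follows, while setUnderline selects the underline pattern.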
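
In WordprocessingML, a run’s vertical position (w:position) and font size (w:sz) are both stored in half-points. POI’s setTextPosition writes its argument as half-points directly, so the subtitle’s setTextPosition(20) corresponds to 10 pt, whereas setFontSize takes points and doubles them, so setFontSize(16) is stored as w:sz="32". The hypothetical helpers below only spell out that arithmetic.

```java
public class HalfPoints {

    // One half-point is 1/144 inch; WordprocessingML uses it for w:position and w:sz
    public static double halfPointsToPoints(int halfPoints) {
        return halfPoints / 2.0;
    }

    public static int pointsToHalfPoints(int points) {
        return points * 2;
    }

    public static void main(String[] args) {
        // setTextPosition(20) on the subtitle run: 20 half-points
        System.out.println(halfPointsToPoints(20) + " pt"); // prints 10.0 pt
        // setFontSize(16) on the subtitle run is written as w:sz="32"
        System.out.println(pointsToHalfPoints(16)); // prints 32
    }
}
```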

Notice that we hard-code the text of both the title and the subtitle, since these strings are too short to justify loading them through a helper method.

4.2. Inserting an Image

An image also needs to be wrapped in an XWPFParagraph instance. We want the image to be horizontally centered and placed under the subtitle; since paragraphs appear in the order they are created, the following snippet must come after the subtitle code:

XWPFParagraph image = document.createParagraph();
image.setAlignment(ParagraphAlignment.CENTER);

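As with the subtitle, setTextPosition shifts this run vertically relative to the baseline (in half-points), which adjusts the spacing between the image and the text below it: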

XWPFRun imageRun = image.createRun();
imageRun.setTextPosition(20);

The image is read from a file on the classpath and inserted into the MS Word file with the specified width and height:

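Path imagePath = Paths.get(ClassLoader.getSystemResource(logo).toURI());
// try-with-resources closes the stream once POI has copied the picture data
try (InputStream imageStream = Files.newInputStream(imagePath)) {
    // Arguments: picture data, picture type, file name, width and height in EMUs
    imageRun.addPicture(imageStream, XWPFDocument.PICTURE_TYPE_PNG,
      imagePath.getFileName().toString(), Units.toEMU(50), Units.toEMU(50));
}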
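
The width and height passed to addPicture are in English Metric Units (EMUs), and Units.toEMU converts from points: one inch is 914,400 EMUs and one point (1/72 inch) is 12,700 EMUs, so the 50 x 50 logo is 635,000 EMUs on each side, roughly 0.69 inch. Because toEMU expects points rather than pixels, sizing an image by its pixel dimensions needs a different factor. The sketch below mirrors that arithmetic with hypothetical helper names and assumes 96 pixels per inch.

```java
public class EmuMath {

    // 1 inch = 914,400 EMU; 1 point = 1/72 inch = 12,700 EMU
    public static final int EMU_PER_INCH = 914_400;
    public static final int EMU_PER_POINT = EMU_PER_INCH / 72;
    // Assumption: 96 pixels per inch, a common screen resolution (9,525 EMU per pixel)
    public static final int EMU_PER_PIXEL = EMU_PER_INCH / 96;

    // Converts points to EMUs with the same 12,700 factor that Units.toEMU uses
    public static int pointsToEmu(double points) {
        return (int) Math.round(points * EMU_PER_POINT);
    }

    // Converts pixels to EMUs under the 96 DPI assumption above
    public static int pixelsToEmu(int pixels) {
        return pixels * EMU_PER_PIXEL;
    }

    public static void main(String[] args) {
        System.out.println(pointsToEmu(50)); // prints 635000
        System.out.println(pixelsToEmu(50)); // prints 476250
    }
}
```

In other words, a 50-pixel image at 96 DPI is only 37.5 pt wide, so passing its pixel size to toEMU would enlarge it by a third.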

4.3. Formatting Paragraphs

Here is how we create the first paragraph with contents taken from the poi-word-para1.txt file:

XWPFParagraph para1 = document.createParagraph();
para1.setAlignment(ParagraphAlignment.BOTH);
String string1 = convertTextFileToString(paragraph1);
XWPFRun para1Run = para1.createRun();
para1Run.setText(string1);

Creating a paragraph is thus similar to creating the title or subtitle; the only difference is that the text comes from the helper method instead of a hard-coded string. ParagraphAlignment.BOTH justifies the text, aligning it to both the left and right margins.

In a similar way, we can create two other paragraphs using contents from files poi-word-para2.txt and poi-word-para3.txt:

XWPFParagraph para2 = document.createParagraph();
para2.setAlignment(ParagraphAlignment.RIGHT);
String string2 = convertTextFileToString(paragraph2);
XWPFRun para2Run = para2.createRun();
para2Run.setText(string2);
para2Run.setItalic(true);

XWPFParagraph para3 = document.createParagraph();
para3.setAlignment(ParagraphAlignment.LEFT);
String string3 = convertTextFileToString(paragraph3);
XWPFRun para3Run = para3.createRun();
para3Run.setText(string3);

The creation of these three paragraphs is almost the same, except for some styling such as alignment or italics.

4.4. Generating MS Word File

Now we are ready to write the contents of the document variable out to a Microsoft Word file on disk:

// try-with-resources closes the output stream even if write() fails
try (FileOutputStream out = new FileOutputStream(output)) {
    document.write(out);
} finally {
    document.close();
}

All the code snippets in this section are wrapped in a method named handleSimpleDoc.
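
Because a .docx file is a ZIP package (the OOXML format mentioned in the overview), the output of handleSimpleDoc can be inspected with the JDK alone. The sketch below lists a package’s entry names; listEntries is a hypothetical helper. For a file written by POI, the list would typically include [Content_Types].xml and word/document.xml, with the logo stored under word/media/.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class DocxPeek {

    // Hypothetical helper: returns the names of all entries in a ZIP/OOXML package
    public static List<String> listEntries(InputStream packageStream) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(packageStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny stand-in package in memory; for the real file, pass
        // Files.newInputStream(Paths.get("rest-with-spring.docx")) instead
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("[Content_Types].xml"));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("word/document.xml"));
            out.closeEntry();
        }
        System.out.println(listEntries(new ByteArrayInputStream(bytes.toByteArray())));
        // prints [[Content_Types].xml, word/document.xml]
    }
}
```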

5. Parsing and Testing

This section outlines the parsing of MS Word files and verification of the result.

5.1. Preparation

We declare a static field in the test class:

static WordDocument wordDocument;

This field holds a reference to an instance of the class that contains all the code fragments shown in sections 3 and 4.

Before parsing and testing, we need to initialize the static variable declared right above and generate the rest-with-spring.docx file in the current working directory by invoking the handleSimpleDoc method:

@BeforeClass
public static void generateMSWordFile() throws Exception {
    WordTest.wordDocument = new WordDocument();
    wordDocument.handleSimpleDoc();
}

Let’s move on to the final step: parsing the MS Word file and verifying the outcome.

5.2. Parsing MS Word File and Verification

First, we read the MS Word file generated in the project directory and store its contents in a List of XWPFParagraph objects:

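Path msWordPath = Paths.get(WordDocument.output);
List<XWPFParagraph> paragraphs;
// Close the input stream as well as the document once the paragraphs are read
try (InputStream in = Files.newInputStream(msWordPath)) {
    XWPFDocument document = new XWPFDocument(in);
    paragraphs = document.getParagraphs();
    document.close();
}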

Next, let’s make sure that the content and style of the title are the same as what we set before:

XWPFParagraph title = paragraphs.get(0);
XWPFRun titleRun = title.getRuns().get(0);
 
assertEquals("Build Your REST API with Spring", title.getText());
assertEquals("009933", titleRun.getColor());
assertTrue(titleRun.isBold());
assertEquals("Courier", titleRun.getFontFamily());
assertEquals(20, titleRun.getFontSize());

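For simplicity, we validate only the text of the remaining paragraphs and leave out their styles, which can be checked the same way as the title’s. Index 2, the image paragraph, is skipped. Note that the assertions also expect a section heading, "What makes a good API?", at index 3; the snippets in section 4 do not create that paragraph, so it has to be added between the image and the first body paragraph for the indices below to line up: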

assertEquals("from HTTP fundamentals to API Mastery",
  paragraphs.get(1).getText());
assertEquals("What makes a good API?", paragraphs.get(3).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph1),
  paragraphs.get(4).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph2),
  paragraphs.get(5).getText());
assertEquals(wordDocument.convertTextFileToString(WordDocument.paragraph3),
  paragraphs.get(6).getText());

Now we can be confident that the creation of the rest-with-spring.docx file has been successful.

6. Conclusion

This tutorial introduced Apache POI’s support for the Microsoft Word format. It walked through the steps needed to generate an MS Word file and to verify its contents.

The implementation of all these examples and code snippets can be found in a GitHub project.
