Introduction To Docx4J

Last modified: October 10, 2017

1. Overview

In this article, we’ll focus on creating a .docx document using the docx4j library.

Docx4j is a Java library for creating and manipulating Office OpenXML files. This means it works only with the .docx file type; older versions of Microsoft Word use the binary .doc format instead.

Note that the OpenXML format is supported by Microsoft Office starting with the 2007 version.

2. Maven Setup

To start working with docx4j, we need to add the required dependencies to our pom.xml:

<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j</artifactId>
    <version>3.3.5</version>
</dependency>
<dependency> 
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
    <version>2.1</version>
</dependency>

Note that we can always look up the latest dependency versions in the Maven Central Repository.

The JAXB dependency is needed because docx4j uses this library under the hood to marshal and unmarshal the XML parts of a docx file.

3. Create a Docx File Document

3.1. Text Elements and Styling

Let’s first see how to create a simple docx file – with a text paragraph:

// Create an empty package and get its main document part (word/document.xml)
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
// Add a paragraph using the "Title" style, then a plain paragraph
mainDocumentPart.addStyledParagraphOfText("Title", "Hello World!");
mainDocumentPart.addParagraphOfText("Welcome To Baeldung");
// Write the package to disk
File exportFile = new File("welcome.docx");
wordPackage.save(exportFile);

Here’s the resulting welcome.docx file:

[Image: the resulting welcome.docx document]

To create a new document, we have to make use of the WordprocessingMLPackage, which represents a docx file in OpenXML format, while the MainDocumentPart class holds a representation of the main document.xml part.

To clear things up, let’s unzip the welcome.docx file, and open the word/document.xml file to see what the XML representation looks like:

<w:body>
    <w:p>
        <w:pPr>
            <w:pStyle w:val="Title"/>
        </w:pPr>
        <w:r>
            <w:t>Hello World!</w:t>
        </w:r>
    </w:p>
    <w:p>
        <w:r>
            <w:t>Welcome To Baeldung</w:t>
        </w:r>
    </w:p>
</w:body>

As we can see, each sentence is represented by a run (r) of text (t) inside a paragraph (p), which is exactly the structure that the addParagraphOfText() method creates.

The addStyledParagraphOfText() method does a little more than that: it also creates a paragraph properties element (pPr) that holds the style to apply to the paragraph.

Simply put, paragraphs declare separate runs, and each run contains some text elements:

[Image: diagram of the paragraph (p), run (r), and text (t) hierarchy]

To create a nice-looking document, we need to have full control of these elements (paragraph, run, and text).

So, let’s discover how to stylize our content using the runProperties (RPr) object:

ObjectFactory factory = Context.getWmlObjectFactory();
// Build the paragraph > run > text hierarchy
P p = factory.createP();
R r = factory.createR();
Text t = factory.createText();
t.setValue("Welcome To Baeldung");
r.getContent().add(t);
p.getContent().add(r);
// Run properties: bold, italic, all caps, and a text color
RPr rpr = factory.createRPr();
BooleanDefaultTrue b = new BooleanDefaultTrue();
rpr.setB(b);
rpr.setI(b);
rpr.setCaps(b);
Color green = factory.createColor();
// w:color expects a hex RGB value (or "auto"); 008000 is green
green.setVal("008000");
rpr.setColor(green);
r.setRPr(rpr);
// Add the paragraph to the document and save it
mainDocumentPart.getContent().add(p);
File exportFile = new File("welcome.docx");
wordPackage.save(exportFile);

Here’s what the result looks like:

[Image: the styled paragraph in the resulting document]

After creating a paragraph, a run, and a text element using createP(), createR(), and createText() respectively, we declared a new run properties object (RPr) to add some styling to the text element.

The rpr object is used to set the formatting properties bold (B), italic (I), and capitalized (Caps), as well as the text color. These properties are applied to the text run using the setRPr() method.
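
A note on the color value: the WordprocessingML schema defines the w:color value as a six-digit hexadecimal RGB string (or auto). As a minimal sketch using only the JDK, a small helper could build that string from RGB components before passing it to Color.setVal(). The class and method names here (HexColors, toWordHex) are hypothetical and are not part of docx4j:

```java
public class HexColors {

    // Converts RGB components (0-255 each) into the RRGGBB form used by w:color.
    // Hypothetical helper for illustration; docx4j itself only needs the final string.
    public static String toWordHex(int red, int green, int blue) {
        checkRange(red);
        checkRange(green);
        checkRange(blue);
        return String.format("%02X%02X%02X", red, green, blue);
    }

    private static void checkRange(int component) {
        if (component < 0 || component > 255) {
            throw new IllegalArgumentException("Color component out of range: " + component);
        }
    }

    public static void main(String[] args) {
        System.out.println(toWordHex(0, 128, 0)); // prints 008000
    }
}
```

The resulting string could then be used as the argument to green.setVal(...) in the snippet above.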

3.2. Working With Images

Docx4j offers an easy way to add images to our Word document:

// Read the image into memory
File image = new File("image.jpg");
byte[] fileContent = Files.readAllBytes(image.toPath());
// Register the bytes as an image part of the package
BinaryPartAbstractImage imagePart = BinaryPartAbstractImage
  .createImagePart(wordPackage, fileContent);
// Arguments: filename hint, alt text, two numeric IDs, and a link flag
Inline inline = imagePart.createImageInline(
  "Baeldung Image (filename hint)", "Alt Text", 1, 2, false);
P imageParagraph = addImageToParagraph(inline);
mainDocumentPart.getContent().add(imageParagraph);

And here’s what the implementation of the addImageToParagraph() method looks like:

private static P addImageToParagraph(Inline inline) {
    ObjectFactory factory = new ObjectFactory();
    P p = factory.createP();
    R r = factory.createR();
    p.getContent().add(r);
    Drawing drawing = factory.createDrawing();
    r.getContent().add(drawing);
    drawing.getAnchorOrInline().add(inline);
    return p;
}

First, we read the image file that we want to add to the main document part into a byte array. Then, we created an image part from that byte array, which links the image data to the wordPackage object.

Once the image part is created, we need to create an Inline object using the createImageInline() method.

The addImageToParagraph() method embeds the Inline object into a Drawing so that it can be added to a run.

Finally, like a text paragraph, the paragraph containing the image is added to the mainDocumentPart.

And here’s the resulting document:

[Image: the resulting document with the embedded image]

3.3. Creating Tables

Docx4j also makes it quite easy to manipulate tables (Tbl), rows (Tr), and cells (Tc).

Let’s see how to create a 3×3 table and add some content to it:

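// Width available between the page margins, in twips
int writableWidthTwips = wordPackage.getDocumentModel()
  .getSections().get(0).getPageDimensions().getWritableWidthTwips();
int columnNumber = 3;
// 3 rows, 3 columns, each column one third of the writable width
Tbl tbl = TblFactory.createTable(3, columnNumber, writableWidthTwips / columnNumber);
List<Object> rows = tbl.getContent();
for (Object row : rows) {
    Tr tr = (Tr) row;
    List<Object> cells = tr.getContent();
    for (Object cell : cells) {
        Tc tc = (Tc) cell;
        // p is the paragraph built in section 3.1
        tc.getContent().add(p);
    }
}
// Attach the table to the document so that it appears in the output
mainDocumentPart.getContent().add(tbl);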

Given a number of rows and columns, the createTable() method creates a new Tbl object. The third argument is the column width in twips, a unit of length equal to 1/1440 of an inch.

Once created, we can iterate over the content of the tbl object, and add Paragraph objects into each cell.

Let’s see what the final result looks like:

[Image: the resulting document with the 3×3 table]
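
To make the column-width arithmetic from this section concrete, here is a minimal JDK-only sketch of the twips conversion. The Twips class and its method names are hypothetical helpers, not part of docx4j, and the 6.5-inch writable width is just an assumed example (a US Letter page with one-inch margins); in the snippet above, docx4j reports the real value through getWritableWidthTwips().

```java
public class Twips {

    // One twip is 1/1440 of an inch
    public static final int TWIPS_PER_INCH = 1440;

    public static int fromInches(double inches) {
        return (int) Math.round(inches * TWIPS_PER_INCH);
    }

    public static double toInches(int twips) {
        return twips / (double) TWIPS_PER_INCH;
    }

    // Splits the writable width evenly across the columns (integer division)
    public static int columnWidth(int writableWidthTwips, int columnCount) {
        if (columnCount <= 0) {
            throw new IllegalArgumentException("columnCount must be positive");
        }
        return writableWidthTwips / columnCount;
    }

    public static void main(String[] args) {
        // Assumed example: 8.5in page width minus two 1in margins
        int writableWidthTwips = fromInches(6.5);
        System.out.println(writableWidthTwips);                 // prints 9360
        System.out.println(columnWidth(writableWidthTwips, 3)); // prints 3120
    }
}
```

Under that assumption, each of the three columns would be 3120 twips wide, which is the kind of value passed as the third argument to createTable().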

4. Reading a Docx File Document

Now that we’ve discovered how to use docx4j to create documents, let’s see how to read an existing docx file, and print its content:

File doc = new File("helloWorld.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(doc);
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// Select every text element (w:t) in the main document part
String textNodesXPath = "//w:t";
List<Object> textNodes = mainDocumentPart
  .getJAXBNodesViaXPath(textNodesXPath, true);
for (Object obj : textNodes) {
    // Each result is a JAXBElement wrapping a Text object
    Text text = (Text) ((JAXBElement<?>) obj).getValue();
    System.out.println(text.getValue());
}

In this example, we’ve created a WordprocessingMLPackage object based on an existing helloWorld.docx file, using the load() method.

After that, we used an XPath expression (//w:t) to get all text nodes from the main document part.

The getJAXBNodesViaXPath() method returns a list of JAXBElement objects.

As a result, all text elements inside the mainDocumentPart object are printed in the console.

Note that we can always unzip our docx files to get a better understanding of the XML structure, which helps in analyzing problems, and gives better insight into how to tackle them.

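
Since a docx file is a ZIP archive, the unzipping step can also be done in code. The following is a minimal sketch that uses only the JDK (no docx4j): it opens word/document.xml from the archive and runs the same //w:t XPath expression over it. The class name DocxXmlInspector and its methods are hypothetical, chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DocxXmlInspector {

    // Namespace bound to the "w" prefix in WordprocessingML documents
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads word/document.xml straight out of the docx (ZIP) archive
    public static List<String> textRuns(File docx) throws IOException {
        try (ZipFile zip = new ZipFile(docx)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return textRuns(in);
            }
        }
    }

    // Convenience overload for XML that is already in memory
    public static List<String> textRuns(String documentXml) {
        return textRuns(new ByteArrayInputStream(
            documentXml.getBytes(StandardCharsets.UTF_8)));
    }

    // Returns the content of every w:t element, in document order
    public static List<String> textRuns(InputStream documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document dom = dbf.newDocumentBuilder().parse(documentXml);

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });

            NodeList nodes = (NodeList) xpath.evaluate(
                "//w:t", dom, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document XML", e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a docx file, for example welcome.docx
        if (args.length == 1) {
            textRuns(new File(args[0])).forEach(System.out::println);
        }
    }
}
```

This is only an inspection aid for understanding the XML structure. For actually modifying a document, the typed docx4j objects shown earlier remain the more convenient route.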

5. Conclusion

In this article, we discovered how docx4j makes it easier to perform complex operations on MS Word documents, such as creating paragraphs, tables, and document parts, and adding images.

The code snippets can be found, as always, over on GitHub.
