Pretty-Print XML in Java

Last modified: March 12, 2022

1. Overview

When we need to read an XML file by hand, we usually prefer to see it in a pretty-printed format. Many text editors and IDEs can reformat XML documents, and on Linux we can pretty-print XML files from the command line.

However, sometimes we need to convert a raw XML string to the pretty-printed format inside our Java program. For example, we may want to show a pretty-printed XML document in the user interface so that it's easier to read.

In this tutorial, we’ll explore how to pretty-print XML in Java.

2. Introduction to the Problem

For simplicity, we’ll take a non-formatted emails.xml file as the input:

<emails> <email> <from>Kai</from> <to>Amanda</to> <time>2018-03-05</time>
<subject>I am flying to you</subject></email> <email>
<from>Jerry</from> <to>Tom</to> <time>1992-08-08</time> <subject>Hey Tom, catch me if you can!</subject>
</email> </emails>

As we can see, the emails.xml file is well-formed. However, it’s not easy to read due to the messy format.

Our goal is to create a method to convert this ugly, raw XML string to a pretty-formatted string.

Further, we’ll discuss customizing two common output properties: indent-size (integer) and suppressing XML declaration (boolean).

The indent-size property is straightforward: it's the number of spaces to indent per level. The suppress-declaration option decides whether the generated XML includes the XML declaration. A typical XML declaration looks like:

<?xml version="1.0" encoding="UTF-8"?>

In this tutorial, we’ll address a solution with the standard Java API and another approach using an external library.

Next, let’s see them in action.

3. Pretty-Printing XML With the Transformer Class

The standard Java API provides the javax.xml.transform.Transformer class to do XML transformations.

3.1. Using the Default Transformer

First, let’s see the pretty-print solution using the Transformer class:

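import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public static String prettyPrintByTransformer(String xmlString, int indent, boolean ignoreDeclaration) {
    try {
        // Parse the raw XML string into a DOM Document
        InputSource src = new InputSource(new StringReader(xmlString));
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src);

        // The indent size is configured on the factory, before the transformer is created
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, ignoreDeclaration ? "yes" : "no");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Serialize the document into an in-memory writer and return it as a String
        Writer out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xmlString, e);
    }
}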

Now, let’s walk through the method quickly and figure out how it works:

  • First, we parse the raw XML string and get a Document object.
  • Next, we obtain a TransformerFactory instance and set its indent-number attribute to the required indent size.
  • Then, we get a default transformer instance from the configured transformerFactory object.
  • The transformer object supports various output properties. To decide if we want to skip the declaration, we set the OutputKeys.OMIT_XML_DECLARATION property.
  • Since we would like to have a pretty-formatted String object, finally, we transform() the parsed XML Document to a StringWriter and return the transformed String.
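
To look at the OutputKeys.OMIT_XML_DECLARATION step in isolation, here is a minimal sketch that runs the same tiny document through a default transformer twice, once with the declaration and once without. The class name DeclarationDemo and the render helper are made up for this illustration and are not part of the tutorial's code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, only meant to illustrate one output property.
class DeclarationDemo {

    // Copies the XML through a default (identity) transformer,
    // optionally leaving out the XML declaration.
    static String render(String xml, boolean omitDeclaration) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Could not transform: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // The first result begins with the <?xml ...?> declaration
        System.out.println(render("<note>hi</note>", false));
        // The second result begins directly with the <note> element
        System.out.println(render("<note>hi</note>", true));
    }
}
```

The property takes the strings "yes" and "no" rather than a boolean, which is why the method above maps the flag with a ternary expression.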

We’ve set the indent size on the TransformerFactory object in the method above. Alternatively, we can also define the indent-amount property on the transformer instance:

transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));
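
As a rough, self-contained sketch of this alternative, the class below sets the indent on the transformer instead of the factory. The class and method names (IndentAmountDemo, prettyPrint) are assumptions made for the example. The indent-amount key is specific to Xalan-derived processors such as the one bundled with the JDK, so other TransformerFactory implementations may ignore or reject it:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class showing the transformer-level indent property.
class IndentAmountDemo {

    static String prettyPrint(String xml, int indent) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Implementation-specific key understood by the JDK's built-in processor
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", String.valueOf(indent));

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xml, e);
        }
    }

    public static void main(String[] args) {
        // The input has no whitespace between elements, so only the indent setting shapes the layout
        System.out.println(prettyPrint("<a><b>x</b></a>", 4));
    }
}
```

Setting the value on the transformer keeps the factory untouched, which can be convenient when one factory is shared by callers that want different indent sizes.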

Next, let’s test if the method works as expected.

3.2. Testing the Method

Our Java project is a Maven project, and we've put the input file under src/main/resources/xml/emails.xml. We've created the readFromInputStream method to read the input file as a String, but we won't go into the details of this method since it doesn't have much to do with our topic here. Let's say we want to set the indent-size=2 and skip the XML declaration in the result:

public static void main(String[] args) throws IOException {
    InputStream inputStream = XmlPrettyPrinter.class.getResourceAsStream("/xml/emails.xml");
    String xmlString = readFromInputStream(inputStream);
    System.out.println("Pretty printing by Transformer");
    System.out.println("=============================================");
    System.out.println(prettyPrintByTransformer(xmlString, 2, true));
}

As the main method shows, we read the input file as a String and then call our prettyPrintByTransformer method to get a pretty-printed XML String.
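
The tutorial doesn't show readFromInputStream. Purely as an assumption about what such a helper might look like, here is one possible sketch that reads the whole stream as UTF-8 text. The enclosing class name StreamReading is hypothetical, and the real helper may differ:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical holder class; the tutorial only names the method.
class StreamReading {

    // Reads every byte from the stream and decodes the result as UTF-8.
    // The stream is closed when reading finishes.
    static String readFromInputStream(InputStream inputStream) throws IOException {
        try (InputStream in = inputStream;
             ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("<emails/>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFromInputStream(in));
    }
}
```

The manual copy loop keeps the sketch compatible with Java 8, which matters here because the tutorial runs the same code on both Java 8 and Java 9.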

Next, let’s run the main method with Java 8:

Pretty printing by Transformer
=============================================
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>

As the output above shows, our method works as expected.

However, if we test it once again with Java 9 or a later version, we may see different output.

Next, let’s see what it produces if we run it with Java 9:

Pretty printing by Transformer
=============================================
<emails>
   
  <email>
     
    <from>Kai</from>
     
    <to>Amanda</to>
     
    <time>2018-03-05</time>
    
    <subject>I am flying to you</subject>
  </email>
   
  <email>
    
    <from>Jerry</from>
     
    <to>Tom</to>
     
    <time>1992-08-08</time>
     
    <subject>Hey Tom, catch me if you can!</subject>
    
  </email>
   
</emails>

=============================================

As we can see in the output above, there are unexpected empty lines in the output.

This is because our raw input contains whitespace between elements, for example:

<emails> <email> <from>Kai</from> ...

As of Java 9, the Transformer class’s pretty-print feature doesn’t define the actual format. Therefore, whitespace-only nodes will be outputted as well. This has been discussed in this JDK bug ticket. Also, Java 9’s release note has explained this in the xml/jaxp section.
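
To make the whitespace-only nodes visible, the sketch below parses a shortened version of the input and counts the text nodes that contain nothing but whitespace. It also removes them from the DOM, which is a different technique from the stylesheet-based fix this tutorial uses and is shown only for illustration. The class and method names are assumptions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class for inspecting whitespace-only text nodes.
class WhitespaceNodes {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
              .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException("Could not parse: " + xml, e);
        }
    }

    // Selects every text node whose content is empty after whitespace normalization.
    static NodeList findWhitespaceOnlyTextNodes(Document document) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
              .evaluate("//text()[normalize-space(.) = '']", document, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new RuntimeException("XPath evaluation failed", e);
        }
    }

    // Detaches the whitespace-only text nodes and returns how many were removed.
    static int stripWhitespaceOnlyTextNodes(Document document) {
        NodeList nodes = findWhitespaceOnlyTextNodes(document);
        int count = nodes.getLength();
        for (int i = 0; i < count; i++) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
        }
        return count;
    }

    public static void main(String[] args) {
        Document document = parse("<emails> <email> <from>Kai</from> </email> </emails>");
        // Two whitespace nodes sit inside <emails> and two inside <email>
        System.out.println("removed: " + stripWhitespaceOnlyTextNodes(document));
        System.out.println("left: " + findWhitespaceOnlyTextNodes(document).getLength());
    }
}
```

Each of those whitespace nodes is regular document content as far as the transformer is concerned, which is why newer Java versions write them out next to the indentation they add.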

If we want our pretty-print method to always generate the same format under various Java versions, we need to provide a stylesheet file.

Next, let’s create a simple xsl file to achieve that.

3.3. Providing an XSLT File

First, let’s create the prettyprint.xsl file to define the output format:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" encoding="UTF-8"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

As we can see, in the prettyprint.xsl file, we’ve used the <xsl:strip-space/> element to remove whitespace-only nodes so that they do not appear in the output.

Next, we still need to make a small change to our method. We won’t use the default transformer anymore. Instead, we’ll create a Transformer object with our XSLT document:

Transformer transformer = transformerFactory.newTransformer(new StreamSource(new StringReader(readPrettyPrintXslt())));

Here, the readPrettyPrintXslt() method reads prettyprint.xsl content.
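
Putting the pieces together, a self-contained sketch of the stylesheet-based variant could look like the class below. To keep it runnable on its own, it inlines the stylesheet as a string constant instead of calling readPrettyPrintXslt(), and it reads the input through a StreamSource rather than a parsed Document. Both choices, along with the class name XsltPrettyPrinter, are assumptions made for this example:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the tutorial loads the stylesheet from prettyprint.xsl instead.
class XsltPrettyPrinter {

    // Same rules as prettyprint.xsl: strip whitespace-only nodes, then copy everything.
    static final String PRETTY_PRINT_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:strip-space elements=\"*\"/>"
      + "<xsl:output method=\"xml\" encoding=\"UTF-8\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    static String prettyPrint(String xml, int indent) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            factory.setAttribute("indent-number", indent);
            // Build the transformer from the stylesheet instead of using the default one
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(PRETTY_PRINT_XSLT)));
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xml, e);
        }
    }

    public static void main(String[] args) {
        // Whitespace between the elements is stripped before the output is indented
        System.out.println(prettyPrint("<emails> <email> <from>Kai</from> </email> </emails>", 2));
    }
}
```

Because the stylesheet removes the whitespace-only nodes before serialization, the transformer only has its own indentation to write.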

Now, if we test the method in Java 8 and Java 9, both produce the same output:

Pretty printing by Transformer
=============================================
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
...
</emails>

We’ve solved the problem with the standard Java API. Next, let’s pretty print the emails.xml using an external library.

4. Pretty-Printing XML With the Dom4j Library

Dom4j is a popular XML library. It allows us to easily pretty-print XML documents.

First, let’s add the Dom4j dependency into our pom.xml:

<dependency>
    <groupId>org.dom4j</groupId>
    <artifactId>dom4j</artifactId>
    <version>2.1.3</version>
</dependency>

We’ve used the 2.1.3 version as an example. We can find the latest version in the Maven Central repository.

Next, let’s see how to pretty-print XML using the Dom4j library:

import java.io.StringWriter;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public static String prettyPrintByDom4j(String xmlString, int indent, boolean skipDeclaration) {
    try {
        // Start from Dom4j's predefined pretty-print format and customize it
        OutputFormat format = OutputFormat.createPrettyPrint();
        format.setIndentSize(indent);
        format.setSuppressDeclaration(skipDeclaration);
        format.setEncoding("UTF-8");

        // Parse the raw string and write it back out using the configured format
        org.dom4j.Document document = DocumentHelper.parseText(xmlString);
        StringWriter sw = new StringWriter();
        XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
        return sw.toString();
    } catch (Exception e) {
        throw new RuntimeException("Error occurs when pretty-printing xml:\n" + xmlString, e);
    }
}

Dom4j's OutputFormat class provides a createPrettyPrint method to create a predefined pretty-print OutputFormat object. As the method above shows, we can customize the default pretty-print format. In this case, we set the indent size and decide whether to include the declaration in the result.

Next, we parse the raw XML string and create an XMLWriter object with the prepared OutputFormat instance.

Finally, the XMLWriter object will write the parsed XML document in the required format.

Next, let’s test if it can pretty-print the emails.xml file. This time, let’s say we would like to include the declaration and have an indent size of 8 in the result:

System.out.println("Pretty printing by Dom4j");
System.out.println("=============================================");
System.out.println(prettyPrintByDom4j(xmlString, 8, false));

When we run the method, we’ll see the output:

Pretty printing by Dom4j
=============================================
<?xml version="1.0" encoding="UTF-8"?>

<emails> 
        <email> 
                <from>Kai</from>  
                <to>Amanda</to>  
                <time>2018-03-05</time>  
                <subject>I am flying to you</subject>
        </email>  
        <email> 
                <from>Jerry</from>  
                <to>Tom</to>  
                <time>1992-08-08</time>  
                <subject>Hey Tom, catch me if you can!</subject> 
        </email> 
</emails>

As the output above shows, the method has solved the problem.

5. Conclusion

In this article, we’ve addressed two approaches to pretty-print an XML file in Java.

We can pretty-print XMLs using the standard Java API. However, we need to keep in mind the Transformer object may produce different results depending on the Java version. The solution is to provide an XSLT file.

Alternatively, the Dom4j library can solve the problem straightforwardly.

As always, the full version of the code is available over on GitHub.
