Working with XML in Groovy

Last modified: June 9, 2019

1. Introduction

Groovy provides a substantial number of methods dedicated to traversing and manipulating XML content.

In this tutorial, we’ll demonstrate how to add, edit, or delete elements from XML in Groovy using various approaches. We’ll also show how to create an XML structure from scratch.

2. Defining the Model

Let’s define an XML structure in our resources directory that we’ll use throughout our examples:

<articles>
    <article>
        <title>First steps in Java</title>
        <author id="1">
            <firstname>Siena</firstname>
            <lastname>Kerr</lastname>
        </author>
        <release-date>2018-12-01</release-date>
    </article>
    <article>
        <title>Dockerize your SpringBoot application</title>
        <author id="2">
            <firstname>Jonas</firstname>
            <lastname>Lugo</lastname>
        </author>
        <release-date>2018-12-01</release-date>
    </article>
    <article>
        <title>SpringBoot tutorial</title>
        <author id="3">
            <firstname>Daniele</firstname>
            <lastname>Ferguson</lastname>
        </author>
        <release-date>2018-06-12</release-date>
    </article>
    <article>
        <title>Java 12 insights</title>
        <author id="1">
            <firstname>Siena</firstname>
            <lastname>Kerr</lastname>
        </author>
        <release-date>2018-07-22</release-date>
    </article>
</articles>

Then, let’s read it into an InputStream variable:

def xmlFile = getClass().getResourceAsStream("articles.xml")

3. XmlParser

Let’s start exploring this stream with the XmlParser class.

3.1. Reading

Reading and parsing an XML file is probably the most common XML operation a developer will have to do. The XmlParser provides a very straightforward interface meant for exactly that:

def articles = new XmlParser().parse(xmlFile)
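
If we want to experiment without a resources directory, the same tree can be built from a String with XmlParser#parseText (parse itself also accepts a File, a Reader, or a URI string). The following is a minimal sketch, assuming Groovy 3 or later, where XmlParser lives in the groovy.xml package, and using a shortened inline copy of our model instead of articles.xml:

```groovy
import groovy.xml.XmlParser

// A shortened inline copy of the articles.xml model from Section 2
def xmlText = '''<articles>
    <article>
        <title>First steps in Java</title>
        <author id="1">
            <firstname>Siena</firstname>
            <lastname>Kerr</lastname>
        </author>
        <release-date>2018-12-01</release-date>
    </article>
</articles>'''

// parseText builds the same groovy.util.Node tree that parse(InputStream) returns
def articles = new XmlParser().parseText(xmlText)

println articles.article[0].title.text()
println articles.article[0].author[0].attribute('id')
```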

At this point, we can access the attributes and values of the XML structure using GPath expressions.

Let’s now implement a simple test using Spock to check whether our articles object is correct:

def "Should read XML file properly"() {
    given: "XML file"

    when: "Using XmlParser to read file"
    def articles = new XmlParser().parse(xmlFile)

    then: "Xml is loaded properly"
    articles.'*'.size() == 4
    articles.article[0].author.firstname.text() == "Siena"
    articles.article[2].'release-date'.text() == "2018-06-12"
    articles.article[3].title.text() == "Java 12 insights"
    articles.article.find { it.author.'@id'.text() == "3" }.author.firstname.text() == "Daniele"
}

To understand how to access XML values and how to use the GPath expressions, let’s focus for a moment on the internal structure of the result of the XmlParser#parse operation.

The articles object is an instance of groovy.util.Node. Every Node consists of a name, an attributes map, a value, and a parent (which is either null or another Node).

In our case, the value of articles is a groovy.util.NodeList instance, which is a wrapper class for a collection of Nodes. NodeList extends java.util.ArrayList, so we can extract its elements by index. To obtain the string value of a Node, we use groovy.util.Node#text().
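
To see these pieces in isolation, we can inspect a single author node. This sketch assumes Groovy 3 or later (groovy.xml.XmlParser) and parses a trimmed inline fragment instead of articles.xml:

```groovy
import groovy.xml.XmlParser

def articles = new XmlParser().parseText(
    '<articles><article><author id="1"><firstname>Siena</firstname></author></article></articles>')

// GPath returns a NodeList, so we index into it to get the single author Node
def author = articles.article[0].author[0]

println author.name()                          // the element name
println author.attributes()                    // the attributes map
println author.parent().name()                 // the parent Node
println author.value().getClass().simpleName  // child elements are held in a NodeList
println(author.value() instanceof ArrayList)   // NodeList extends ArrayList
println author.firstname[0].text()             // text() returns the string content
```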

In the above example, we introduced a few GPath expressions:

  • articles.article[0].author.firstname: get the author’s first name for the first article; articles.article[n] accesses the article at index n (indices start at 0)
  • articles.'*': get the list of the articles element’s children; it’s the equivalent of groovy.util.Node#children()
  • author.'@id': get the author element’s id attribute; author.'@attributeName' accesses an attribute value by its name (the equivalents are author['@id'] and author.@id)
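
The sketch below, again assuming Groovy 3 or later and a trimmed inline document, checks that the quoted and subscript forms of attribute access return the same value, that index 1 selects the second article, and that '*' matches children():

```groovy
import groovy.xml.XmlParser

def articles = new XmlParser().parseText(
    '<articles><article><author id="1"/></article><article><author id="3"/></article></articles>')

// Index 1 is the second article, because NodeList indices start at 0
def author = articles.article[1].author

// Two equivalent ways of reading the id attribute
println author.'@id'.text()
println author['@id'].text()

// '*' returns the same children as Node#children()
println articles.'*'.size()
println articles.children().size()
```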

3.2. Adding a Node

Similar to the previous example, let’s read the XML content into a variable first. This will allow us to define a new node and add it to our articles list using groovy.util.Node#append.

Let’s now implement a test which proves our point:

def "Should add node to existing xml using NodeBuilder"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Adding node to xml"
    def articleNode = new NodeBuilder().article(id: '5') {
        title('Traversing XML in the nutshell')
        author {
            firstname('Martin')
            lastname('Schmidt')
        }
        'release-date'('2019-05-18')
    }
    articles.append(articleNode)

    then: "Node is added to xml properly"
    articles.'*'.size() == 5
    articles.article[4].title.text() == "Traversing XML in the nutshell"
}

As we can see in the above example, the process is pretty straightforward.

Let’s also note that we used groovy.util.NodeBuilder, which is a neat alternative to calling the Node constructor for our Node definition.
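
For comparison, here’s roughly what the same article looks like when built with the Node constructors directly. It’s a sketch assuming Groovy 3 or later, with a one-article inline document standing in for articles.xml; each Node(parent, name, ...) call attaches the new node to its parent, so no explicit append is needed:

```groovy
import groovy.xml.XmlParser

def articles = new XmlParser().parseText(
    '<articles><article><title>First steps in Java</title></article></articles>')

// Node(parent, name, attributes): creates <article id="5"> and adds it to articles
def article = new Node(articles, 'article', [id: '5'])
// Node(parent, name, value): a leaf element with text content
new Node(article, 'title', 'Traversing XML in the nutshell')
def author = new Node(article, 'author')
new Node(author, 'firstname', 'Martin')
new Node(author, 'lastname', 'Schmidt')
new Node(article, 'release-date', '2019-05-18')

println articles.'*'.size()
```

Compared with NodeBuilder, this version repeats the parent reference on every line, which is why the builder syntax tends to read better for nested structures.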

3.3. Modifying a Node

We can also modify the values of nodes using XmlParser. To do so, let’s once again parse the content of the XML file. Next, we can edit a node’s content by changing the value field of the Node object.

Let’s remember that GPath expressions on an XmlParser result always return a NodeList instance, so to modify the first (and only) matching element, we have to access it by its index.
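
A small sketch of that rule, assuming Groovy 3 or later and a one-article inline document: the GPath step returns a new NodeList, but its elements are the live nodes of the tree, so changing dates[0] updates the document itself:

```groovy
import groovy.xml.XmlParser

def articles = new XmlParser().parseText(
    '<articles><article><release-date>2018-12-01</release-date></article></articles>')

// Even a single match comes back wrapped in a NodeList
def dates = articles.article[0].'release-date'
println dates.getClass().simpleName   // NodeList
println dates.size()                  // 1

// Index into the list to reach the Node, then replace its value
dates[0].value = '2019-05-18'
println articles.article[0].'release-date'.text()
```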

Let’s check our assumptions by writing a quick test:

def "Should modify node"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Changing value of one of the nodes"
    articles.article.each { it.'release-date'[0].value = "2019-05-18" }

    then: "XML is updated"
    articles.article.findAll { it.'release-date'.text() != "2019-05-18" }.isEmpty()
}

In the above example, we’ve also used the Groovy Collections API to traverse the NodeList.

3.4. Replacing a Node

Next, let’s see how to replace the whole node instead of just modifying one of its values.

Similarly to adding a new element, we’ll use NodeBuilder for the Node definition and then replace one of the existing nodes with it using groovy.util.Node#replaceNode:

def "Should replace node"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Replacing node"
    def articleNode = new NodeBuilder().article(id: '5') {
        title('Traversing XML in the nutshell')
        author {
            firstname('Martin')
            lastname('Schmidt')
        }
        'release-date'('2019-05-18')
    }
    articles.article[0].replaceNode(articleNode)

    then: "Node is replaced properly"
    articles.'*'.size() == 4
    articles.article[0].title.text() == "Traversing XML in the nutshell"
}

3.5. Deleting a Node

Deleting a node using the XmlParser is quite tricky. Although the Node class provides the remove(Node child) method, in most cases, we wouldn’t use it by itself.

Instead, we’ll show how to delete a node whose value fulfills a given condition.

By default, accessing nested elements through a chain of NodeList references returns a new NodeList holding the corresponding child nodes, not the parent’s own list of children. Because of that, calling removeAll (which NodeList inherits from java.util.ArrayList) on our article collection would only change that new list, not the XML tree.

To delete nodes matching a predicate, we first have to find all nodes that fulfill our condition, and then iterate through them, invoking the groovy.util.Node#remove method on the parent each time.
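
The following sketch, assuming Groovy 3 or later and a three-article inline document, illustrates both points: clearing the NodeList returned by articles.article leaves the tree unchanged, while calling remove on the parent node actually detaches the children:

```groovy
import groovy.xml.XmlParser

def articles = new XmlParser().parseText('''<articles>
    <article><author id="1"/></article>
    <article><author id="2"/></article>
    <article><author id="3"/></article>
</articles>''')

// articles.article builds a fresh NodeList, so clearing it leaves the tree intact
articles.article.clear()
def sizeAfterClear = articles.children().size()
println sizeAfterClear   // 3

// Removing through the parent Node is what actually detaches the children;
// findAll returns a separate collection, so we never modify the list we iterate
articles.article
    .findAll { it.author.'@id'.text() != '3' }
    .each { articles.remove(it) }
println articles.children().size()   // 1
```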

Let’s implement a test that removes all articles whose author has an id other than 3:

def "Should remove article from xml"() {
    given: "XML object"
    def articles = new XmlParser().parse(xmlFile)

    when: "Removing all articles but the ones with id==3"
    articles.article
      .findAll { it.author.'@id'.text() != "3" }
      .each { articles.remove(it) }

    then: "There is only one article left"
    articles.children().size() == 1
    articles.article[0].author.'@id'.text() == "3"
}

As we can see, after the remove operation, the XML structure contains only one article, and its author’s id is 3.

4. XmlSlurper

Groovy also provides another class dedicated to working with XML. In this section, we’ll show how to read and manipulate the XML structure using the XmlSlurper.

4.1. Reading

As in our previous examples, let’s start with parsing the XML structure from a file:

def "Should read XML file properly"() {
    given: "XML file"

    when: "Using XmlSlurper to read file"
    def articles = new XmlSlurper().parse(xmlFile)

    then: "Xml is loaded properly"
    articles.'*'.size() == 4
    articles.article[0].author.firstname == "Siena"
    articles.article[2].'release-date' == "2018-06-12"
    articles.article[3].title == "Java 12 insights"
    articles.article.find { it.author.'@id' == "3" }.author.firstname == "Daniele"
}

As we can see, the interface is nearly identical to that of XmlParser. However, the output structure uses groovy.util.slurpersupport.GPathResult, which wraps groovy.util.slurpersupport.Node (a class distinct from groovy.util.Node). GPathResult provides simplified definitions of methods such as equals() and toString() by wrapping Node#text(). As a result, we can read elements and attributes directly using just their names, without calling text().
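
Here’s a short sketch of that behavior, assuming Groovy 3 or later, where GPathResult lives in groovy.xml.slurpersupport rather than groovy.util.slurpersupport, and a one-article inline document:

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.slurpersupport.GPathResult

def articles = new XmlSlurper().parseText(
    '<articles><article><title>First steps in Java</title><author id="1"/></article></articles>')

def title = articles.article[0].title

println(title instanceof GPathResult)      // true
// toString() and equals() both delegate to the element's text
println title.toString()
println(title == 'First steps in Java')    // true
// Attributes are read the same way, without calling text()
println articles.article[0].author.'@id'
```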

4.2. Adding a Node

Adding a Node is also very similar to using XmlParser. In this case, however, groovy.util.slurpersupport.GPathResult#appendNode takes an instance of java.lang.Object as its argument. As a result, we can simplify new Node definitions by following the same convention introduced by NodeBuilder:

def "Should add node to existing xml"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Adding node to xml"
    articles.appendNode {
        article(id: '5') {
            title('Traversing XML in the nutshell')
            author {
                firstname('Martin')
                lastname('Schmidt')
            }
            'release-date'('2019-05-18')
        }
    }

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "Node is added to xml properly"
    articles.'*'.size() == 5
    articles.article[4].title == "Traversing XML in the nutshell"
}

Whenever we modify the structure of our XML with XmlSlurper, we have to reinitialize our articles object to see the results. We can achieve that by combining the groovy.util.XmlSlurper#parseText and groovy.xml.XmlUtil#serialize methods.

4.3. Modifying a Node

As we mentioned before, GPathResult introduces a simplified approach to data manipulation. In contrast to XmlParser, we can modify values directly using the node name or attribute name:

def "Should modify node"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Changing value of one of the nodes"
    articles.article.each { it.'release-date' = "2019-05-18" }

    then: "XML is updated"
    articles.article.findAll { it.'release-date' != "2019-05-18" }.isEmpty()
}

Note that when we only modify the values of the XML object, we don’t have to parse the whole structure again.
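
Attributes behave the same way: assigning to a name prefixed with @ updates the attribute in place. A sketch assuming Groovy 3 or later and a one-article inline document:

```groovy
import groovy.xml.XmlSlurper

def articles = new XmlSlurper().parseText(
    '<articles><article><title>First steps in Java</title><author id="1"/></article></articles>')

// Assigning to an '@'-prefixed name updates the attribute map of the matched node
articles.article[0].author.'@id' = '7'
// Assigning to an element name replaces that element's text content
articles.article[0].title = 'Java basics'

// Both value changes are visible without re-parsing the document
println articles.article[0].author.'@id'
println articles.article[0].title
```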

4.4. Replacing a Node

Now let’s move on to replacing the whole node. Again, GPathResult comes to the rescue. We can easily replace the node using groovy.util.slurpersupport.NodeChild#replaceNode (NodeChild extends GPathResult), which follows the same convention of taking the new node definition as an Object argument:

def "Should replace node"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Replacing node"
    articles.article[0].replaceNode {
        article(id: '5') {
            title('Traversing XML in the nutshell')
            author {
                firstname('Martin')
                lastname('Schmidt')
            }
            'release-date'('2019-05-18')
        }
    }

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "Node is replaced properly"
    articles.'*'.size() == 4
    articles.article[0].title == "Traversing XML in the nutshell"
}

As was the case when adding a node, we’re modifying the structure of the XML, so we have to parse it again.

4.5. Deleting a Node

To remove a node using XmlSlurper, we can reuse the groovy.util.slurpersupport.NodeChild#replaceNode method simply by providing an empty Node definition:

def "Should remove article from xml"() {
    given: "XML object"
    def articles = new XmlSlurper().parse(xmlFile)

    when: "Removing all articles but the ones with id==3"
    articles.article
      .findAll { it.author.'@id' != "3" }
      .replaceNode {}

    articles = new XmlSlurper().parseText(XmlUtil.serialize(articles))

    then: "There is only one article left"
    articles.children().size() == 1
    articles.article[0].author.'@id' == "3"
}

Again, modifying the XML structure requires reinitialization of our articles object.

5. XmlParser vs XmlSlurper

As we showed in our examples, the usages of XmlParser and XmlSlurper are pretty similar. We can more or less achieve the same results with both. However, some differences between them can tilt the scales towards one or the other.

First of all, XmlParser always parses the whole document into a DOM-like structure. Because of that, we can read from and write to it at the same time. We can’t do the same with XmlSlurper, as it evaluates paths lazily. On the other hand, because it builds the whole tree up front, XmlParser can consume more memory.

XmlSlurper, in turn, uses more straightforward definitions, which makes it simpler to work with. We also need to remember that any structural change made to XML using XmlSlurper requires reinitialization, which can cause an unacceptable performance hit when we make many changes one after another.
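
The difference is easiest to see side by side. In this sketch (assuming Groovy 3 or later and a tiny inline document), a node appended with XmlParser is part of the tree right away, whereas with XmlSlurper we serialize and parse the document again before reading the new structure, as in the earlier tests:

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def xml = '<articles><article id="1"/></articles>'

// XmlParser: appendNode(name, attributes) changes the in-memory tree immediately
def parsed = new XmlParser().parseText(xml)
parsed.appendNode('article', [id: '2'])
def parserCount = parsed.article.size()
println parserCount   // 2

// XmlSlurper: the appended closure is only evaluated when the document is
// written out, so we serialize and re-parse before reading the new structure
def slurped = new XmlSlurper().parseText(xml)
slurped.appendNode { article(id: '2') }
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(slurped))
def slurperCount = reparsed.article.size()
println slurperCount   // 2
```

Each extra serialize and parse round trip costs a full pass over the document, which is where the performance concern above comes from.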

Which tool to use should be decided with care, and the choice depends entirely on the use case.

6. MarkupBuilder

Apart from reading and manipulating the XML tree, Groovy also provides tooling to create an XML document from scratch. Let’s now create a document consisting of the first two articles from our first example using groovy.xml.MarkupBuilder:

def "Should create XML properly"() {
    given: "Node structures"

    when: "Using MarkupBuilder to create xml structure"
    def writer = new StringWriter()
    new MarkupBuilder(writer).articles {
        article {
            title('First steps in Java')
            author(id: '1') {
                firstname('Siena')
                lastname('Kerr')
            }
            'release-date'('2018-12-01')
        }
        article {
            title('Dockerize your SpringBoot application')
            author(id: '2') {
                firstname('Jonas')
                lastname('Lugo')
            }
            'release-date'('2018-12-01')
        }
    }

    then: "Xml is created properly"
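    // xmlFile must point to a document holding only these two articles;
    // the four-article articles.xml from Section 2 would not match this output
    XmlUtil.serialize(writer.toString()) == XmlUtil.serialize(xmlFile.text)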
}

In the above example, we can see that MarkupBuilder uses the very same approach for the Node definitions we used with NodeBuilder and GPathResult previously.

To compare output from MarkupBuilder with the expected XML structure, we used the groovy.xml.XmlUtil#serialize method.
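
If exact formatting isn’t important, one alternative (not used in the original test) is to parse the generated markup back and check its content, which is insensitive to indentation and quote style. A minimal sketch assuming Groovy 3 or later, with a single article for brevity:

```groovy
import groovy.xml.MarkupBuilder
import groovy.xml.XmlSlurper

def writer = new StringWriter()
new MarkupBuilder(writer).articles {
    article {
        title('First steps in Java')
        author(id: '1') {
            firstname('Siena')
            lastname('Kerr')
        }
        'release-date'('2018-12-01')
    }
}

// Parse the generated markup instead of comparing strings, so indentation
// and single vs. double attribute quotes don't affect the check
def created = new XmlSlurper().parseText(writer.toString())
println created.article.size()
println created.article[0].author.'@id'
```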

7. Conclusion

In this article, we explored multiple ways of manipulating XML structures using Groovy.

We looked at examples of parsing, adding, editing, replacing, and deleting nodes using two classes provided by Groovy: XmlParser and XmlSlurper. We also discussed differences between them and showed how we could build an XML tree from scratch using MarkupBuilder.

As always, the complete code used in this article is available over on GitHub.
