1. Overview
In this tutorial, we’ll discuss how to parse XML into a DOM with Apache Xerces, a mature and established library for parsing and manipulating XML.
There are multiple options to parse an XML document; we’ll focus on DOM parsing in this article. The DOM parser loads a document and creates an entire hierarchical tree in memory.
For an overview of XML library support in Java, check out our previous article.
2. Our Document
Let’s start with the XML document we’re going to use in our example:
<?xml version="1.0"?>
<tutorials>
    <tutorial tutId="01" type="java">
        <title>Guava</title>
        <description>Introduction to Guava</description>
        <date>04/04/2016</date>
        <author>GuavaAuthor</author>
    </tutorial>
    ...
</tutorials>
Note that our document has a root node called “tutorials” with 4 “tutorial” child nodes. Each of these has 2 attributes: “tutId” and “type”. Also, each “tutorial” has 4 child nodes: “title”, “description”, “date” and “author”.
Now we can continue with parsing this document.
3. Loading XML File
First, we should note that the JDK ships with a JAXP implementation derived from Apache Xerces, so we don’t need any additional setup or dependencies.
Let’s jump right into loading our XML file:
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("src/test/resources/example_jdom.xml"));
doc.getDocumentElement().normalize();
In the example above, we first obtain an instance of the DocumentBuilder class, then use the parse() method on the XML document to get a Document object representing it.
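The same parse() call also accepts an InputStream, which is handy when the XML is already in memory. The following is a minimal sketch under that assumption; the class name LoadFromStringExample and the helper parse(String) are hypothetical and only serve the illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class LoadFromStringExample {

    // Hypothetical helper: parses XML text held in memory instead of a file.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DocumentBuilder.parse(InputStream) reads the whole stream and builds the tree.
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tutorials><tutorial tutId=\"01\" type=\"java\">"
            + "<title>Guava</title></tutorial></tutorials>";
        Document doc = parse(xml);
        // The document element is the root of the tree.
        System.out.println(doc.getDocumentElement().getNodeName()); // prints "tutorials"
    }
}
```

Parsing from a string like this keeps small experiments and unit tests independent of files on disk.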
We also call normalize() on the document element. It merges adjacent text nodes and removes empty ones, so the text inside each node is represented consistently. Note that it doesn’t strip whitespace-only text nodes between elements.
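To make the effect of normalize() concrete, here is a small sketch that builds an element with two adjacent text nodes and counts its children before and after the call. The class and method names are made up for this illustration:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeExample {

    // Returns the number of child nodes of <title> before and after normalize().
    static int[] childCounts() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element title = doc.createElement("title");
        doc.appendChild(title);

        // Two separate text nodes next to each other.
        title.appendChild(doc.createTextNode("Introduction to "));
        title.appendChild(doc.createTextNode("Guava"));
        int before = title.getChildNodes().getLength();

        // normalize() merges the adjacent text nodes into one.
        title.normalize();
        int after = title.getChildNodes().getLength();

        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = childCounts();
        System.out.println(counts[0] + " -> " + counts[1]); // prints "2 -> 1"
    }
}
```

The text content is the same before and after; only the number of nodes holding it changes.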
4. Parsing the DOM
Now, let’s explore our XML file.
Let’s start by retrieving all elements with tag “tutorial”. We can do this using the getElementsByTagName() method, which will return a NodeList:
@Test
public void whenGetElementByTag_thenSuccess() {
    NodeList nodeList = doc.getElementsByTagName("tutorial");
    Node first = nodeList.item(0);
    assertEquals(4, nodeList.getLength());
    assertEquals(Node.ELEMENT_NODE, first.getNodeType());
    assertEquals("tutorial", first.getNodeName());
}
It’s important to note that Node is the primary datatype for DOM components: elements, attributes, and text are all considered nodes.
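As a quick illustration of this point, the sketch below parses a one-element document and reports the node type of the element, its attribute, and its text. The sample XML, the class name, and the describe() helper are assumptions made for the example:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NodeTypesExample {

    // Maps the numeric node type to a readable label.
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: return "element";
            case Node.ATTRIBUTE_NODE: return "attribute";
            case Node.TEXT_NODE: return "text";
            default: return "other";
        }
    }

    // Describes the element, its attribute, and its text child in a tiny sample document.
    static String[] describeSample() throws Exception {
        String xml = "<title lang=\"en\">Guava</title>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element title = doc.getDocumentElement();
        return new String[] {
            describe(title),
            describe(title.getAttributeNode("lang")),
            describe(title.getFirstChild())
        };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(String.join(", ", describeSample())); // prints "element, attribute, text"
    }
}
```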
Next, let’s see how we can get the first element’s attributes using getAttributes():
@Test
public void whenGetFirstElementAttributes_thenSuccess() {
    Node first = doc.getElementsByTagName("tutorial").item(0);
    NamedNodeMap attrList = first.getAttributes();
    assertEquals(2, attrList.getLength());
    // Note: the DOM specification doesn't guarantee any particular order in a NamedNodeMap
    assertEquals("tutId", attrList.item(0).getNodeName());
    assertEquals("01", attrList.item(0).getNodeValue());
    assertEquals("type", attrList.item(1).getNodeName());
    assertEquals("java", attrList.item(1).getNodeValue());
}
Here, we get the NamedNodeMap object, then use the item(index) method to retrieve each node.
For each attribute node, we can use getNodeName() and getNodeValue() to read its name and value.
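Because the DOM specification doesn’t promise any ordering inside a NamedNodeMap, looking attributes up by name is usually more robust than relying on an index. The sketch below shows one way to do that with getNamedItem(); the class name and the attributeValue() helper are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

public class AttributeLookupExample {

    // Returns the value of the named attribute on the root element, or null if it is absent.
    static String attributeValue(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
        // getNamedItem() returns null when no attribute has that name.
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tutorial tutId=\"01\" type=\"java\"/>";
        System.out.println(attributeValue(xml, "type"));    // prints "java"
        System.out.println(attributeValue(xml, "missing")); // prints "null"
    }
}
```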
5. Traversing Nodes
Next, let’s see how to traverse DOM nodes.
In the following test, we’ll traverse the first element’s child nodes and print their content:
@Test
public void whenTraverseChildNodes_thenSuccess() {
    Node first = doc.getElementsByTagName("tutorial").item(0);
    NodeList nodeList = first.getChildNodes();
    int n = nodeList.getLength();
    Node current;
    for (int i = 0; i < n; i++) {
        current = nodeList.item(i);
        if (current.getNodeType() == Node.ELEMENT_NODE) {
            System.out.println(
              current.getNodeName() + ": " + current.getTextContent());
        }
    }
}
First, we get the NodeList using the getChildNodes() method, then iterate through it, and print the node name and text content.
The output will show the contents of the first “tutorial” element in our document:
title: Guava
description: Introduction to Guava
date: 04/04/2016
author: GuavaAuthor
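The ELEMENT_NODE check in the loop matters because, in an indented document, the whitespace between elements shows up as text nodes among the children. The following sketch walks the children with getFirstChild() and getNextSibling() instead of a NodeList and keeps only the elements; the class name and the elementChildNames() helper are assumptions for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class ElementChildrenExample {

    // Collects the names of the root element's element children, skipping text nodes.
    static List<String> elementChildNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> names = new ArrayList<>();
        // Sibling-based traversal: start at the first child and follow getNextSibling().
        for (Node child = doc.getDocumentElement().getFirstChild();
             child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tutorial>\n  <title>Guava</title>\n  <author>GuavaAuthor</author>\n</tutorial>";
        System.out.println(elementChildNames(xml)); // prints "[title, author]"
    }
}
```

Without the type check, the whitespace-only text nodes between the elements would be visited as well.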
6. Modifying the DOM
We can also make changes to the DOM.
As an example, let’s change the value of the type attribute from “java” to “other”:
@Test
public void whenModifyDocument_thenModified() {
    NodeList nodeList = doc.getElementsByTagName("tutorial");
    Element first = (Element) nodeList.item(0);
    assertEquals("java", first.getAttribute("type"));
    first.setAttribute("type", "other");
    assertEquals("other", first.getAttribute("type"));
}
Here, changing the attribute value is a simple matter of calling the Element’s setAttribute() method.
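Other in-place changes follow the same pattern. As a rough sketch, the example below replaces an element’s text with setTextContent() and drops an attribute with removeAttribute(); the sample XML, the class name, and the retitle() helper are invented for the illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ModifyExample {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Replaces the text of the first <title> and removes the root's "type" attribute.
    static void retitle(Document doc, String newTitle) {
        Element title = (Element) doc.getElementsByTagName("title").item(0);
        title.setTextContent(newTitle);
        doc.getDocumentElement().removeAttribute("type");
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<tutorial tutId=\"01\" type=\"java\"><title>Guava</title></tutorial>");
        retitle(doc, "Guava Basics");
        Element root = doc.getDocumentElement();
        System.out.println(root.getTextContent());      // prints "Guava Basics"
        System.out.println(root.hasAttribute("type"));  // prints "false"
    }
}
```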
7. Creating a New Document
Besides modifying the DOM, we can also create new XML documents from scratch.
Let’s first have a look at the file we want to create:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<users>
    <user id="1">
        <email>john@example.com</email>
    </user>
</users>
Our XML contains a users root node with one user element, which in turn has an email child node.
To achieve this, we first have to call the DocumentBuilder’s newDocument() method, which returns a Document object.
Then, we’ll call the createElement() method of the new object:
@Test
public void whenCreateNewDocument_thenCreated() throws Exception {
    Document newDoc = builder.newDocument();
    Element root = newDoc.createElement("users");
    newDoc.appendChild(root);
    Element first = newDoc.createElement("user");
    root.appendChild(first);
    first.setAttribute("id", "1");
    Element email = newDoc.createElement("email");
    email.appendChild(newDoc.createTextNode("john@example.com"));
    first.appendChild(email);
    assertEquals(1, newDoc.getChildNodes().getLength());
    assertEquals("users", newDoc.getChildNodes().item(0).getNodeName());
}
To attach each new element to the tree, we call appendChild() on its parent; createElement() alone doesn’t add it to the document.
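When a document has many simple text elements, the create-then-append steps can be wrapped in a small helper. The sketch below is one possible shape for it; appendTextElement() and the class name are hypothetical and not part of the DOM API:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CreateDocumentExample {

    // Hypothetical helper: creates <name>text</name> and appends it to the given parent.
    static Element appendTextElement(Document doc, Element parent, String name, String text) {
        Element element = doc.createElement(name);
        element.appendChild(doc.createTextNode(text));
        parent.appendChild(element);
        return element;
    }

    // Builds the users document from this section.
    static Document buildUsers() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("users");
        doc.appendChild(root);

        Element user = doc.createElement("user");
        user.setAttribute("id", "1");
        root.appendChild(user);

        appendTextElement(doc, user, "email", "john@example.com");
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildUsers();
        System.out.println(doc.getElementsByTagName("email").item(0).getTextContent());
        // prints "john@example.com"
    }
}
```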
8. Saving a Document
After modifying our document or creating one from scratch, we’ll need to save it in a file.
We’ll start with creating a DOMSource object, then use a simple Transformer to save the document in a file:
private void saveDomToFile(Document document, String fileName)
  throws Exception {
    DOMSource dom = new DOMSource(document);
    Transformer transformer = TransformerFactory.newInstance()
      .newTransformer();
    StreamResult result = new StreamResult(new File(fileName));
    transformer.transform(dom, result);
}
Similarly, we can print our document in the console:
private void printDom(Document document) throws Exception {
    DOMSource dom = new DOMSource(document);
    Transformer transformer = TransformerFactory.newInstance()
      .newTransformer();
    transformer.transform(dom, new StreamResult(System.out));
}
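By default, the Transformer writes the document without adding line breaks or indentation. If readable output is needed, the standard OutputKeys properties can be set before calling transform(). The sketch below serializes to a String using a StringWriter; the class name and toPrettyString() are assumptions for the example, and the exact indentation of the output may differ between JDK versions:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class PrettyPrintExample {

    // Serializes the document to an indented String without the XML declaration.
    static String toPrettyString(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element users = doc.createElement("users");
        doc.appendChild(users);
        Element email = doc.createElement("email");
        email.appendChild(doc.createTextNode("john@example.com"));
        users.appendChild(email);

        System.out.println(toPrettyString(doc));
    }
}
```

Writing to a String first also makes the serialized XML easy to assert on in tests or to pass to a logger.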
9. Conclusion
In this quick article, we learned how to use the Xerces DOM parser to parse, traverse, create, modify, and save an XML document.
As always, the full source code for the examples is available over on GitHub.