Parsing an XML File Using StAX

Last modified: September 1, 2019

1. Introduction

In this tutorial, we’ll illustrate how to parse an XML file using StAX. We’ll implement a simple XML parser and see how it works with an example.

2. Parsing with StAX

StAX (Streaming API for XML) is one of several XML libraries in Java. It’s memory-efficient and has been included in the JDK since Java 6. StAX doesn’t load the entire XML document into memory. Instead, the application pulls data from a stream in a forward-only fashion. In this tutorial, the stream is read by an XMLEventReader object.
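
To make the pull model concrete, here is a minimal sketch that reads a small in-memory document and records the start tags in the order the parser reaches them. The class name PullDemo and the method startTagNames are hypothetical names chosen for illustration; only the javax.xml.stream calls come from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class PullDemo {

    // Pulls events one at a time and records the local names of the start tags
    // in the order the parser encounters them. There is no way to step backwards.
    static List<String> startTagNames(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // the caller asks for the next event
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<websites><website><name>Baeldung</name></website></websites>";
        System.out.println(startTagNames(xml)); // prints [websites, website, name]
    }
}
```

Because only the current event is held at any time, memory use should stay roughly constant regardless of the document size.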

3. XMLEventReader Interface

In StAX, each piece of the document, such as a start tag, an end tag, or a run of text, is an event. XMLEventReader reads an XML file as a stream of XMLEvent objects, and the event types provide the methods we need to parse the XML. The most important methods are:

  • isStartElement(): checks if the current event is a StartElement (start tag)
  • isEndElement(): checks if the current event is an EndElement (end tag)
  • asCharacters(): returns the current event as a Characters event; only valid if the event is character data
  • getName(): gets the name of a StartElement or EndElement
  • getAttributes(): returns an Iterator of a StartElement’s attributes
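
The methods listed above can be seen together in the following sketch, which turns every start tag, end tag, and text event into a one-line description. EventMethodsDemo and describe are hypothetical names, and the output format is just an assumption made for readability.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class EventMethodsDemo {

    // Describes each start tag, end tag, and text event in one line of text.
    static List<String> describe(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    StringBuilder line = new StringBuilder("start ");
                    line.append(start.getName().getLocalPart());
                    // getAttributes() is defined on StartElement, not on every event
                    Iterator<Attribute> attributes = start.getAttributes();
                    while (attributes.hasNext()) {
                        Attribute attribute = attributes.next();
                        line.append(' ')
                            .append(attribute.getName().getLocalPart())
                            .append('=')
                            .append(attribute.getValue());
                    }
                    lines.add(line.toString());
                } else if (event.isEndElement()) {
                    lines.add("end " + event.asEndElement().getName().getLocalPart());
                } else if (event.isCharacters()) {
                    // asCharacters() is only safe after this type check
                    lines.add("text " + event.asCharacters().getData());
                }
            }
        } finally {
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        describe("<name lang=\"en\">Baeldung</name>").forEach(System.out::println);
    }
}
```

Checking the event type before calling asStartElement(), asEndElement(), or asCharacters() avoids a ClassCastException when the event is of a different kind.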

4. Implementing a Simple XML Parser

The first step in parsing an XML file is to read it. We need an XMLInputFactory to create an XMLEventReader for reading our file:

XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = xmlInputFactory.createXMLEventReader(new FileInputStream(path));
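
The two lines above leave resource handling open. As one possible sketch, the class below closes both the file stream and the reader, and also switches off DTD and external entity support, which is a common precaution when the XML comes from an untrusted source. ReaderSetup, hardenedFactory, and countEvents are hypothetical names; the factory properties are standard XMLInputFactory constants.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;

class ReaderSetup {

    // Builds a factory that refuses DTDs and external entities.
    static XMLInputFactory hardenedFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        return factory;
    }

    // Counts the events in a file, closing both the stream and the reader.
    static int countEvents(Path path) throws IOException, XMLStreamException {
        try (InputStream in = Files.newInputStream(path)) {
            XMLEventReader reader = hardenedFactory().createXMLEventReader(in);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    reader.nextEvent();
                    count++;
                }
                return count;
            } finally {
                // XMLEventReader is not AutoCloseable, and close() does not
                // close the underlying stream, hence the try-with-resources above.
                reader.close();
            }
        }
    }
}
```

Whether the hardening is needed depends on where the files come from; for trusted local files, the default factory from the snippet above is usually enough.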

Now that the XMLEventReader is ready, we move forward through the stream with nextEvent():

while (reader.hasNext()) {
    XMLEvent nextEvent = reader.nextEvent();
}

Inside the loop, we look for our desired start tag:

if (nextEvent.isStartElement()) {
    StartElement startElement = nextEvent.asStartElement();
    if (startElement.getName().getLocalPart().equals("desired")) {
        //...
    }
}

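Once we’ve found a start tag, we can read its attributes and the text that follows it:

String url = startElement.getAttributeByName(new QName("url")).getValue();
// The element's text arrives as a separate event after the start tag
nextEvent = reader.nextEvent();
String name = nextEvent.asCharacters().getData();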
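
Two details are easy to miss here: getAttributeByName() returns null when the attribute is absent, and asCharacters() fails if the event after the start tag isn't text (for example, when the element is empty). The sketch below shows one defensive alternative that uses XMLEventReader.getElementText(), a swapped-in technique rather than the approach used in this tutorial. AttributeAndText and firstUrlAndName are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class AttributeAndText {

    // Returns "url|name" for the first <name> element found, using an empty
    // string for a missing url attribute, or null if there is no <name>.
    static String firstUrlAndName(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        try {
            String url = "";
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue;
                }
                StartElement start = event.asStartElement();
                String tag = start.getName().getLocalPart();
                if (tag.equals("website")) {
                    Attribute attribute = start.getAttributeByName(new QName("url"));
                    url = (attribute == null) ? "" : attribute.getValue();
                } else if (tag.equals("name")) {
                    // getElementText() requires that the last event read was the
                    // start tag of an element that contains only text.
                    return url + "|" + reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<website url=\"https://baeldung.com\"><name>Baeldung</name></website>";
        System.out.println(firstUrlAndName(xml));
    }
}
```

Note that getElementText() also consumes the element's end tag, so the loop never sees an EndElement event for that element.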

We can also check if we’ve reached an end tag:

if (nextEvent.isEndElement()) {
    EndElement endElement = nextEvent.asEndElement();
}
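
End tags are what tell us that an element is finished. As a small illustration, the following sketch pairs start and end events to compute how deeply elements are nested. DepthDemo and maxDepth are hypothetical names used only for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class DepthDemo {

    // Returns the maximum nesting depth of elements in the document.
    static int maxDepth(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
            .createXMLEventReader(new StringReader(xml));
        try {
            int depth = 0;
            int max = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++; // entering an element
                    max = Math.max(max, depth);
                } else if (event.isEndElement()) {
                    depth--; // the matching end tag closes it
                }
            }
            return max;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(maxDepth("<a><b><c/></b><d/></a>")); // prints 3
    }
}
```

A self-closing tag such as <c/> still produces both a start and an end event, so the counter stays balanced.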

5. Parsing Example

To get a better understanding, let’s run our parser on a sample XML file:

<?xml version="1.0" encoding="UTF-8"?>
<websites>
    <website url="https://baeldung.com">
        <name>Baeldung</name>
        <category>Online Courses</category>
        <status>Online</status>
    </website>
    <website url="http://example.com">
        <name>Example</name>
        <category>Examples</category>
        <status>Offline</status>
    </website>
    <website url="http://localhost:8080">
        <name>Localhost</name>
        <category>Tests</category>
        <status>Offline</status>
    </website>
</websites>

Let’s parse the XML and store the data in a list of entity objects called websites:

List<WebSite> websites = new ArrayList<>();
WebSite website = null;
while (reader.hasNext()) {
    XMLEvent nextEvent = reader.nextEvent();
    if (nextEvent.isStartElement()) {
        StartElement startElement = nextEvent.asStartElement();
        switch (startElement.getName().getLocalPart()) {
            case "website":
                // A new entity starts with each <website> tag
                website = new WebSite();
                Attribute url = startElement.getAttributeByName(new QName("url"));
                if (url != null) {
                    website.setUrl(url.getValue());
                }
                break;
            case "name":
                // The text is the event right after the start tag
                nextEvent = reader.nextEvent();
                website.setName(nextEvent.asCharacters().getData());
                break;
            case "category":
                nextEvent = reader.nextEvent();
                website.setCategory(nextEvent.asCharacters().getData());
                break;
            case "status":
                nextEvent = reader.nextEvent();
                website.setStatus(nextEvent.asCharacters().getData());
                break;
        }
    }
    if (nextEvent.isEndElement()) {
        EndElement endElement = nextEvent.asEndElement();
        if (endElement.getName().getLocalPart().equals("website")) {
            websites.add(website);
        }
    }
}

To get all the properties of each website, we check startElement.getName().getLocalPart() for each start element and set the corresponding property.

When we reach the website’s end element, we know that our entity is complete, so we add the entity to our websites list.
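
For readers who want to run the example end to end, here is a self-contained sketch of the same loop. The WebSite class is a hypothetical reconstruction (the tutorial only shows its setters), WebSiteParser, parse, and text are illustrative names, and the sample document is inlined as a string instead of being read from a file. Enabling IS_COALESCING is an added assumption so that each text run arrives as a single event.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical entity class; only the setter names come from the tutorial.
class WebSite {
    private String url, name, category, status;

    String getUrl() { return url; }
    String getName() { return name; }
    String getCategory() { return category; }
    String getStatus() { return status; }
    void setUrl(String url) { this.url = url; }
    void setName(String name) { this.name = name; }
    void setCategory(String category) { this.category = category; }
    void setStatus(String status) { this.status = status; }
}

class WebSiteParser {

    // The sample document from this section, without indentation.
    static final String SAMPLE = "<websites>"
        + "<website url=\"https://baeldung.com\"><name>Baeldung</name>"
        + "<category>Online Courses</category><status>Online</status></website>"
        + "<website url=\"http://example.com\"><name>Example</name>"
        + "<category>Examples</category><status>Offline</status></website>"
        + "<website url=\"http://localhost:8080\"><name>Localhost</name>"
        + "<category>Tests</category><status>Offline</status></website>"
        + "</websites>";

    static List<WebSite> parse(Reader source) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: coalescing delivers each text run as one Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        XMLEventReader reader = factory.createXMLEventReader(source);
        List<WebSite> websites = new ArrayList<>();
        WebSite website = null;
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    switch (start.getName().getLocalPart()) {
                        case "website":
                            website = new WebSite();
                            Attribute url = start.getAttributeByName(new QName("url"));
                            if (url != null) {
                                website.setUrl(url.getValue());
                            }
                            break;
                        case "name":
                            website.setName(text(reader));
                            break;
                        case "category":
                            website.setCategory(text(reader));
                            break;
                        case "status":
                            website.setStatus(text(reader));
                            break;
                        default:
                            break;
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("website")) {
                    websites.add(website); // the entity is complete
                }
            }
        } finally {
            reader.close();
        }
        return websites;
    }

    // Reads the Characters event after a start tag; assumes the element is not empty
    // and that name, category, and status only appear inside a <website> element.
    private static String text(XMLEventReader reader) throws XMLStreamException {
        return reader.nextEvent().asCharacters().getData();
    }

    public static void main(String[] args) throws XMLStreamException {
        for (WebSite site : parse(new StringReader(SAMPLE))) {
            System.out.println(site.getName() + " -> " + site.getUrl());
        }
    }
}
```

Taking a Reader rather than a file path keeps the sketch easy to test; a FileInputStream could be passed to createXMLEventReader instead, as in the earlier snippet.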

6. Conclusion

In this tutorial, we learned how to parse an XML file using the StAX library.

The example XML file and the full parser code are available, as always, over on GitHub.
