Univocity Parsers

Last modified: May 20, 2020

1. Introduction

In this tutorial, we’ll take a quick look at Univocity Parsers, a library for parsing CSV, TSV, and fixed-width files in Java.

We’ll start with the basics of reading and writing files, then move on to reading files into Java beans and writing them back out from Java beans. Finally, we’ll take a quick look at the configuration options before wrapping up.

2. Setup

To use the parsers, we need to add the univocity-parsers Maven dependency to our project’s pom.xml file (Maven Central lists the latest version):

<dependency>
    <groupId>com.univocity</groupId>
    <artifactId>univocity-parsers</artifactId>
    <version>2.8.4</version>
</dependency>

3. Basic Usage

3.1. Reading

With Univocity, we can quickly parse an entire file into a list of String arrays. Each array holds the values of one row in the file.

First, let’s parse a CSV file by passing a Reader for it to a CsvParser configured with default settings:

try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.csv")), "UTF-8")) {
    CsvParser parser = new CsvParser(new CsvParserSettings());
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}
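
The sketch below makes the List<String[]> result more concrete. It uses only the standard library to split a single line under the common CSV conventions: commas separate values, double quotes protect values that contain commas, and a doubled quote inside a quoted value stands for one quote character. The class SimpleCsvLineSplitter is hypothetical and is not Univocity's implementation. It also ignores quoted line breaks, which a real CSV parser has to handle.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleCsvLineSplitter {

    // Splits one CSV line: ',' separates values, '"' quotes a value, and "" inside quotes means one '"'.
    public static String[] splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    current.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted value
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // end of the current value
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last value on the line
        return fields.toArray(new String[0]);
    }

    // Builds the same List<String[]> shape that parseAll returns, one array per line.
    public static List<String[]> parseLines(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(splitLine(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] row = splitLine("1001,\"Widget, large\",9.99");
        System.out.println(row.length + " values: " + String.join(" | ", row)); // 3 values: 1001 | Widget, large | 9.99
    }
}
```

Edge cases such as quoted line breaks, surrounding whitespace, and character encodings are the work a parsing library takes off our hands.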

To parse a TSV file instead, we only need to swap in a TsvParser (configured with TsvParserSettings) and give it a Reader for the TSV file.
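
The two formats differ mainly in how values are separated and escaped. In TSV, a tab separates values. A common convention writes special characters inside a value as backslash escapes such as \t and \n, instead of quoting them. The hypothetical SimpleTsvLineSplitter below sketches that convention with the standard library only. It is an illustration and not how TsvParser is implemented; the library's exact escape rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleTsvLineSplitter {

    // Splits one TSV line on tabs and decodes the backslash escapes \t, \n, \r and \\ inside values.
    public static String[] splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                fields.add(current.toString()); // a real tab ends the value
                current.setLength(0);
            } else if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                switch (next) {
                    case 't': current.append('\t'); break;
                    case 'n': current.append('\n'); break;
                    case 'r': current.append('\r'); break;
                    case '\\': current.append('\\'); break;
                    default: current.append('\\').append(next); // unknown escape: keep it as written
                }
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The middle value contains an escaped tab, so the line still has three values
        String[] row = splitLine("1001\tWidget\\tlarge\t9.99");
        System.out.println(row.length); // 3
    }
}
```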

It’s only slightly more complicated to process a fixed-width file. The primary difference is that we need to provide our field widths in the parser settings.

Let’s read a fixed-width file by providing a FixedWidthFields object to our FixedWidthParserSettings:

try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.txt")), "UTF-8")) {
    FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
    FixedWidthParserSettings settings = new FixedWidthParserSettings(fieldLengths);

    FixedWidthParser parser = new FixedWidthParser(settings);
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}
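
Conceptually, the three widths passed to FixedWidthFields (8, 30 and 10) say where each field starts and ends within a line. The hypothetical FixedWidthSlicer below sketches that idea with the standard library. It cuts a line into consecutive slices of the given widths and trims the padding. It assumes space padding; the library's own handling of padding characters and alignment may differ.

```java
import java.util.Arrays;

public class FixedWidthSlicer {

    // Cuts a line into consecutive fields of the given widths and trims the space padding.
    public static String[] slice(String line, int... widths) {
        String[] fields = new String[widths.length];
        int start = 0;
        for (int i = 0; i < widths.length; i++) {
            int end = Math.min(start + widths[i], line.length());
            // A line shorter than the total width yields empty trailing fields
            fields[i] = start < end ? line.substring(start, end).trim() : "";
            start += widths[i];
        }
        return fields;
    }

    public static void main(String[] args) {
        // Build a 48-character line (8 + 30 + 10), left-aligned and space-padded
        String line = String.format("%-8s%-30s%-10s", "1001", "Widget", "9.99");
        System.out.println(Arrays.toString(slice(line, 8, 30, 10))); // [1001, Widget, 9.99]
    }
}
```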

3.2. Writing

Now that we’ve covered reading files with the parsers, let’s learn how to write them.

Writing files is very similar to reading them: we pass a Writer, along with our desired settings, to the writer class that matches our file type.

Let’s create a method that can write a file in any of the three formats:

public boolean writeData(List<Object[]> products, OutputType outputType, String outputPath) {
    // OutputType is an enum declared elsewhere with the constants CSV, TSV and FIXED_WIDTH;
    // logger is a logger field of the enclosing class
    try (Writer outputWriter = new OutputStreamWriter(new FileOutputStream(new File(outputPath)), "UTF-8")) {
        switch (outputType) {
            case CSV: {
                // Braces give each case its own scope, so every branch can declare its own "writer"
                CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
                writer.writeRowsAndClose(products);
                break;
            }
            case TSV: {
                TsvWriter writer = new TsvWriter(outputWriter, new TsvWriterSettings());
                writer.writeRowsAndClose(products);
                break;
            }
            case FIXED_WIDTH: {
                // Column widths: 8, 30 and 10 characters
                FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
                FixedWidthWriterSettings settings = new FixedWidthWriterSettings(fieldLengths);
                FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
                writer.writeRowsAndClose(products);
                break;
            }
            default:
                logger.warn("Invalid OutputType: " + outputType);
                return false;
        }
        return true;
    } catch (IOException e) {
        // handle exception
        return false;
    }
}

As with reading, writing CSV and TSV files is nearly identical. For fixed-width files, we also have to provide the field widths in our settings.
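
The stdlib-only sketch below turns one Object[] row into a line of text in two styles, which shows why the fixed-width case needs extra configuration. The class RowFormatterSketch is hypothetical. Its CSV method quotes a value only when the value contains a comma, a quote, or a line break. Its fixed-width method left-aligns each value and pads it with spaces to the column width. It also cuts values that are too long, which is an assumption of this sketch and not documented library behavior.

```java
import java.util.ArrayList;
import java.util.List;

public class RowFormatterSketch {

    // CSV: quote a value only when it contains a comma, a quote or a line break; double any embedded quotes.
    static String csvValue(Object value) {
        String s = value == null ? "" : value.toString();
        if (s.contains(",") || s.contains("\"") || s.contains("\n") || s.contains("\r")) {
            return "\"" + s.replace("\"", "\"\"") + "\"";
        }
        return s;
    }

    public static String toCsvLine(Object[] row) {
        List<String> values = new ArrayList<>();
        for (Object value : row) {
            values.add(csvValue(value));
        }
        return String.join(",", values);
    }

    // Fixed-width: left-align each value in its column and pad it with spaces.
    // Values longer than their column are cut off (an assumption of this sketch).
    public static String toFixedWidthLine(Object[] row, int... widths) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            String s = i < row.length && row[i] != null ? row[i].toString() : "";
            if (s.length() > widths[i]) {
                s = s.substring(0, widths[i]);
            }
            line.append(s);
            for (int pad = s.length(); pad < widths[i]; pad++) {
                line.append(' ');
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Object[] product = {"1001", "Widget, large", 9.99f};
        System.out.println(toCsvLine(product)); // 1001,"Widget, large",9.99
        System.out.println("[" + toFixedWidthLine(product, 8, 30, 10) + "]"); // 48 characters between the brackets
    }
}
```

The CSV line needs no extra information beyond the delimiter and quote rules. The fixed-width line cannot be produced at all without knowing each column's width.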

3.3. Using Row Processors

Univocity provides a number of ready-made row processors, and it also lets us create our own.

To get a feel for using row processors, let’s use the BatchedColumnProcessor to process a larger CSV file in batches of five rows:

try (Reader inputReader = new InputStreamReader(new FileInputStream(new File(relativePath)), "UTF-8")) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.setProcessor(new BatchedColumnProcessor(5) {
        @Override
        public void batchProcessed(int rowsInThisBatch) {}
    });
    CsvParser parser = new CsvParser(settings);
    List<String[]> parsedRows = parser.parseAll(inputReader);
    return parsedRows;
} catch (IOException e) {
    // handle exception
}

To use this row processor, we register it on our CsvParserSettings with setProcessor. After that, all we have to do is call parseAll.
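
The stdlib-only sketch below shows what batching by column means. It collects the values of each column and hands them to a callback every five rows, plus once more for a final partial batch. The class ColumnBatchSketch and its BatchListener interface are hypothetical. They are only loosely modelled on the batchProcessed(int rowsInThisBatch) hook used above and are not Univocity's classes.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnBatchSketch {

    // Hypothetical callback, loosely modelled on batchProcessed(int rowsInThisBatch).
    public interface BatchListener {
        void batchProcessed(List<List<String>> columns, int rowsInThisBatch);
    }

    private final int batchSize;
    private final BatchListener listener;
    private final List<List<String>> columns = new ArrayList<>();
    private int rowsInBatch = 0;

    public ColumnBatchSketch(int batchSize, BatchListener listener) {
        this.batchSize = batchSize;
        this.listener = listener;
    }

    // Stores each value of the row under its column index, padding missing values with null.
    public void rowProcessed(String[] row) {
        int width = Math.max(row.length, columns.size());
        for (int i = 0; i < width; i++) {
            if (i == columns.size()) {
                // A column seen for the first time is back-filled for earlier rows of this batch
                List<String> column = new ArrayList<>();
                for (int r = 0; r < rowsInBatch; r++) {
                    column.add(null);
                }
                columns.add(column);
            }
            columns.get(i).add(i < row.length ? row[i] : null);
        }
        rowsInBatch++;
        if (rowsInBatch == batchSize) {
            flush();
        }
    }

    // Delivers the last, possibly smaller, batch once the input is exhausted.
    public void processEnded() {
        if (rowsInBatch > 0) {
            flush();
        }
    }

    private void flush() {
        listener.batchProcessed(new ArrayList<>(columns), rowsInBatch);
        columns.clear();
        rowsInBatch = 0;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        ColumnBatchSketch batcher = new ColumnBatchSketch(5, (cols, n) -> batchSizes.add(n));
        for (int i = 0; i < 12; i++) {
            batcher.rowProcessed(new String[] {"row" + i, "x"});
        }
        batcher.processEnded();
        System.out.println(batchSizes); // [5, 5, 2]
    }
}
```

With 12 rows and a batch size of 5, the listener receives batches of 5, 5 and 2 rows. Only one batch of column values is held in memory at a time.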

3.4. Reading and Writing into Java Beans

A list of String arrays works, but we often work with data in Java beans. Univocity also lets us read data into, and write data from, specially annotated Java beans.

Let’s define a Product bean with the Univocity annotations:

public class Product {

    @Parsed(field = "product_no")
    private String productNumber;
    
    @Parsed
    private String description;
    
    @Parsed(field = "unit_price")
    private float unitPrice;

    // getters and setters
}

The main annotation is the @Parsed annotation.

If our column heading matches the field name, we can use @Parsed without specifying any values. If our column heading differs from the field name, we can specify the column heading using the field property.
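
The name-matching rule can be sketched with plain Java reflection. In the example below, @Column is a hypothetical stand-in for @Parsed and toBean is a hypothetical helper. The code shows how a header such as product_no can be matched to the productNumber field, and how an annotation without a name falls back to the field name. It is not how Univocity implements @Parsed, and it converts only String and float values.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

public class HeaderMappingSketch {

    // Hypothetical stand-in for @Parsed: an empty value means "use the Java field name".
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Column {
        String value() default "";
    }

    public static class Product {
        @Column("product_no") public String productNumber;
        @Column public String description;
        @Column("unit_price") public float unitPrice;
    }

    // Creates a bean and copies each value into the field whose column name matches the header.
    public static <T> T toBean(Class<T> type, String[] headers, String[] row) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Column column = field.getAnnotation(Column.class);
                if (column == null) {
                    continue; // unannotated fields are ignored
                }
                String name = column.value().isEmpty() ? field.getName() : column.value();
                for (int i = 0; i < headers.length && i < row.length; i++) {
                    if (headers[i].equals(name)) {
                        field.setAccessible(true);
                        Object value = row[i];
                        if (field.getType() == float.class) {
                            value = Float.parseFloat(row[i]); // only float is converted in this sketch
                        }
                        field.set(bean, value);
                    }
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        String[] headers = {"product_no", "description", "unit_price"};
        Product p = toBean(Product.class, headers, new String[] {"1001", "Widget", "9.99"});
        System.out.println(p.productNumber + " " + p.description + " " + p.unitPrice); // 1001 Widget 9.99
    }
}
```

Because the matching is done by name, the order of the columns in the file does not need to match the order of the fields in the class.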

Now that we’ve defined our Product bean, let’s read our CSV file into a list of Product objects:

try (Reader inputReader = new InputStreamReader(new FileInputStream(
  new File("src/test/resources/productList.csv")), "UTF-8")) {
    BeanListProcessor<Product> rowProcessor = new BeanListProcessor<Product>(Product.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.setHeaderExtractionEnabled(true);
    settings.setProcessor(rowProcessor);
    CsvParser parser = new CsvParser(settings);
    parser.parse(inputReader);
    return rowProcessor.getBeans();
} catch (IOException e) {
    // handle exception
}

We first constructed a special row processor, BeanListProcessor, with our annotated class. Then, we provided that to the CsvParserSettings and used it to read in a list of Products.
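
The snippet above also calls setHeaderExtractionEnabled(true), which tells the parser to treat the first row as the header row rather than as data. The bean processor relies on those header names to find the matching @Parsed fields. The hypothetical HeaderExtractionSketch below sketches the header-extraction step with the standard library only, turning the remaining rows into header-to-value maps. It is an illustration, not the library's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeaderExtractionSketch {

    // Uses rows.get(0) as the header row and turns every later row into a header -> value map.
    public static List<Map<String, String>> withHeaders(List<String[]> rows) {
        List<Map<String, String>> records = new ArrayList<>();
        if (rows.isEmpty()) {
            return records;
        }
        String[] headers = rows.get(0);
        for (String[] row : rows.subList(1, rows.size())) {
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < headers.length; i++) {
                // Short rows get null for their missing trailing values
                record.put(headers[i], i < row.length ? row[i] : null);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"product_no", "description", "unit_price"},
            new String[] {"1001", "Widget", "9.99"});
        System.out.println(withHeaders(rows)); // [{product_no=1001, description=Widget, unit_price=9.99}]
    }
}
```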

Next, let’s write our list of Products out to a fixed-width file:

try (Writer outputWriter = new OutputStreamWriter(new FileOutputStream(new File(outputPath)), "UTF-8")) {
    BeanWriterProcessor<Product> rowProcessor = new BeanWriterProcessor<Product>(Product.class);
    FixedWidthFields fieldLengths = new FixedWidthFields(8, 30, 10);
    FixedWidthWriterSettings settings = new FixedWidthWriterSettings(fieldLengths);
    settings.setHeaders("product_no", "description", "unit_price");
    settings.setRowWriterProcessor(rowProcessor);
    FixedWidthWriter writer = new FixedWidthWriter(outputWriter, settings);
    writer.writeHeaders();
    for (Product product : products) {
        writer.processRecord(product);
    }
    writer.close();
    return true;
} catch (IOException e) {
    // handle exception
}

The notable difference is that we specify our column headers in the settings with setHeaders.

4. Settings

Univocity has a number of settings we can apply to the parsers. As we saw earlier, we can use settings to apply a row processor to the parsers.

There are many other settings that can be changed to suit our needs. Although many of the configurations are common across the three file types, each parser also has format-specific settings.

Let’s adjust our CSV parser settings to put some limits on the data we’re reading:

CsvParserSettings settings = new CsvParserSettings();
settings.setMaxCharsPerColumn(100);
settings.setMaxColumns(50);
// Pass the customized settings; a fresh CsvParserSettings would discard both limits
CsvParser parser = new CsvParser(settings);
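
The stdlib-only sketch below states the rule the two limits express. A row may not have more than maxColumns values, and no value may be longer than maxCharsPerColumn characters. The class RowLimitSketch is hypothetical and checks a row after it has been split. A real parser has to apply such limits while reading, before an oversized value is fully buffered, so this shows only the rule and not how the library enforces it.

```java
import java.util.Collections;

public class RowLimitSketch {

    // Rejects a row with more columns, or a longer value, than the configured limits allow.
    public static void checkLimits(String[] row, int maxColumns, int maxCharsPerColumn) {
        if (row.length > maxColumns) {
            throw new IllegalArgumentException(
                "Row has " + row.length + " columns; the limit is " + maxColumns);
        }
        for (int i = 0; i < row.length; i++) {
            if (row[i] != null && row[i].length() > maxCharsPerColumn) {
                throw new IllegalArgumentException(
                    "Column " + i + " has " + row[i].length() + " characters; the limit is " + maxCharsPerColumn);
            }
        }
    }

    public static void main(String[] args) {
        checkLimits(new String[] {"1001", "Widget", "9.99"}, 50, 100); // within limits, no exception
        String tooLong = String.join("", Collections.nCopies(101, "x"));
        try {
            checkLimits(new String[] {tooLong}, 50, 100);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Column 0 has 101 characters; the limit is 100
        }
    }
}
```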

5. Conclusion

In this quick tutorial, we learned the basics of parsing files using the Univocity library.

We learned how to read and write files using both lists of String arrays and Java beans. Before we got into Java beans, we took a quick look at using different row processors. Finally, we briefly touched on how to customize the settings.

As always, the source code is available over on GitHub.