1. Introduction
1.导言
Understanding data mapping is essential in software development. Excel is a widely used data management software, making it crucial for Java developers to know how to map data between Excel and Java objects.
了解数据映射在软件开发中至关重要。Excel 是一种广泛使用的数据管理软件,因此 Java 开发人员必须了解如何在 Excel 和 Java 对象之间映射数据。
In this tutorial, we’ll explore converting Excel data into a list of Java objects.
在本教程中,我们将探讨如何将 Excel 数据转换为 Java 对象列表。
Several Java libraries are available on the Maven repository to work with Excel files in Java, with Apache POI being the most common. However, in this tutorial, we’ll use four Java Excel libraries, including Apache POI, Poiji, FastExcel, and JExcelApi (Jxl), to convert Excel data into a Java object list.
Maven 代码库中有多个 Java 库可用于在 Java 中处理 Excel 文件,其中 Apache POI 是最常用的 Java 库。不过,在本教程中,我们将使用 Apache POI、Poiji、FastExcel 和 JExcelApi (Jxl) 等四个 Java Excel 库将 Excel 数据转换为 Java 对象列表。
2. Model Setup
2.模型设置
To get started, we need to create our object’s blueprint, the FoodInfo class:
首先,我们需要创建对象的蓝图,即 FoodInfo 类:
public class FoodInfo {
private String category;
private String name;
private String measure;
private double calories;
// standard constructors, toString, getters and setters
}
3. Apache POI
3.阿帕奇 POI
Apache POI (Poor Obfuscation Implementation) is a Java API for Microsoft documents. It’s a collection of pure Java libraries used to read and write data from/to Microsoft Office files such as Word, Outlook, Excel, and others.
Apache POI(Poor Obfuscation Implementation,差混淆实现)是微软文档的 Java API。它是一个纯 Java 库集合,用于从 Microsoft Office 文件读写数据,如 Word、Outlook、Excel 等。
3.1. Maven Dependency
3.1.Maven 依赖
Let’s add our Maven dependencies to the pom.xml file:
让我们在 pom.xml 文件中添加 我们的 Maven 依赖项:
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>5.2.5</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>5.2.5</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml-schemas</artifactId>
<version>4.1.2</version>
</dependency>
3.2. Converting Excel Data into a List of Objects
3.2.将 Excel 数据转换为对象列表
By using its Workbook interface, we can access various features to read the sheets and cells of an Excel file. This interface has two implementations, one for each Excel format – HSSFWorkbook for .xls and XSSFWorkbook for .xlsx.
通过使用其 Workbook 接口,我们可以访问各种功能来读取 Excel 文件的工作表和单元格。该接口有两个实现,每个 Excel 格式一个 – .xls 的 HSSFWorkbook 和 .xlsx 的 XSSFWorkbook 。
This snippet reads and converts Excel data using the Apache POI library from a .xlsx file into a list of FoodInfo objects:
该代码段使用 Apache POI 库从 .xlsx 文件中读取 Excel 数据并将其转换为 FoodInfo 对象列表:
public static List<FoodInfo> excelDataToListOfObjets_withApachePOI(String fileLocation)
throws IOException {
FileInputStream file = new FileInputStream(new File(fileLocation));
Workbook workbook = new XSSFWorkbook(file);
Sheet sheet = workbook.getSheetAt(0);
List<FoodInfo> foodData = new ArrayList<FoodInfo>();
DataFormatter dataFormatter = new DataFormatter();
for (int n = 1; n < sheet.getPhysicalNumberOfRows(); n++) {
Row row = sheet.getRow(n);
FoodInfo foodInfo = new FoodInfo();
int i = row.getFirstCellNum();
foodInfo.setCategory(dataFormatter.formatCellValue(row.getCell(i)));
foodInfo.setName(dataFormatter.formatCellValue(row.getCell(++i)));
foodInfo.setMeasure(dataFormatter.formatCellValue(row.getCell(++i)));
foodInfo.setCalories(row.getCell(++i).getNumericCellValue());
foodData.add(foodInfo);
}
return foodData;
}
To determine the number of non-empty rows in the sheet object, we utilize the getPhysicalNumberOfRows() method. We then loop through the rows, excluding the header row (i = 1).
为了确定 sheet 对象中的非空行数,我们使用了 getPhysicalNumberOfRows() 方法。然后,我们循环查看各行,但不包括标题行(i = 1)。
Depending on the field of the food object that we need to populate, we either use the dataFormatter object or the getNumericValue() method to convert and assign the cell value to the appropriate data type.
根据我们需要填充的食品对象字段,我们可以使用 dataFormatter 对象或 getNumericValue() 方法将单元格值转换并赋值为适当的数据类型。
Let’s verify our code by writing a unit test to make sure it works as expected using an Excel file named food_info.xlsx:
让我们使用名为 food_info.xlsx 的 Excel 文件编写一个单元测试来验证我们的代码,以确保其按预期运行:
@Test
public void whenParsingExcelFileWithApachePOI_thenConvertsToList() throws IOException {
List<FoodInfo> foodInfoList = ExcelDataToListApachePOI
.excelDataToListOfObjets_withApachePOI("src\\main\\resources/food_info.xlsx");
assertEquals("Beverages", foodInfoList.get(0).getCategory());
assertEquals("Dairy", foodInfoList.get(3).getCategory());
}
The Apache POI library offers assistance for both older and newer versions of Excel files, namely .xls and .xlsx.
Apache POI 库可为 Excel 文件的旧版本和新版本(即 .xls 和 .xlsx. )提供帮助。
4. Poiji
4. Poiji
Poiji is a thread-safe Java library that provides an API for one-way data mapping from Excel sheets to Java classes. It’s built on top of the Apache POI library. But unlike Apache POI, it’s much simpler to use and directly converts each Excel row into a Java object.
Poiji 是一个线程安全的 Java 库,它为从 Excel 表到 Java 类的单向数据映射提供了一个 API。它建立在 Apache POI 库之上。但与 Apache POI 不同的是,它的使用要简单得多,可以直接将 Excel 中的每一行转换为 Java 对象。
4.1. Setting Up the Maven Dependency
4.1.设置 Maven 依赖项
Here’s the Poiji Maven dependency we need to add to the pom.xml file:
下面是我们需要添加到 pom.xml 文件中的 Poiji Maven 依赖项:
<dependency>
<groupId>com.github.ozlerhakan</groupId>
<artifactId>poiji</artifactId>
<version>4.1.1</version>
</dependency>
4.2. Setting Up the Class with Annotations
4.2.使用注解设置类
The Poiji library simplifies Excel data retrieval by requiring class fields to be annotated with either @ExcelCellName(String cellName) or @ExcelCell(int cellIndex).
Poiji 库要求用 @ExcelCellName(String cellName) 或 @ExcelCell(int cellIndex) 对类字段进行注释,从而简化了 Excel 数据检索。
Below, we’re setting up our FoodInfo class for the Poiji library by adding annotations:
下面,我们将通过添加注解为 Poiji 库设置 FoodInfo 类:
public class FoodInfo {
@ExcelCellName("Category")
private String category;
@ExcelCellName("Name")
private String name;
@ExcelCellName("Measure")
private String measure;
@ExcelCellName("Calories")
private double calories;
// standard constructors, getters and setters
}
The API offers support for mapping an Excel workbook that has multiple sheets. When our file has several sheets, we can employ the @ExcelSheet(String sheetName) annotation on our class to indicate which sheet we want to work with. Any other sheets will be disregarded.
API 支持映射具有多个工作表的 Excel 工作簿。当我们的文件有多个工作表时,我们可以在类上使用 @ExcelSheet(String sheetName) 注解来指示我们要处理的工作表。任何其他工作表都将被忽略。
However, if we don’t use this annotation, only the first Excel sheet in the workbook will be taken into account.
但是,如果我们不使用该注释,则只会考虑工作簿中的第一个 Excel 表。
In certain situations, we may not need to extract data from every row in the Excel sheet we are targeting. To address this, we can include a private int rowIndex property that is annotated with @ExcelRow in our class. This will allow us to specify the index of the row items we want to access.
在某些情况下,我们可能不需要从目标 Excel 表中的每一行提取数据。为了解决这个问题,我们可以在我们的类中包含一个注释为 @ExcelRow 的 private int rowIndex 属性。这将允许我们指定要访问的行项目的索引。
4.3. Converting Excel Data into a List of Objects
4.3.将 Excel 数据转换为对象列表
Unlike the libraries mentioned in this article, the Poiji library, by default, ignores the header row of the Excel sheet.
与本文中提到的库不同,Poiji 库默认忽略 Excel 表的页眉行。
The following snippet extracts data from an Excel file and converts its data into a list of FoodInfo:
以下代码段从 Excel 文件中提取数据,并将其转换为 FoodInfo: 列表。
public class ExcelDataToListOfObjectsPOIJI {
public static List<FoodInfo> excelDataToListOfObjets_withPOIJI(String fileLocation){
return Poiji.fromExcel(new File(fileLocation), FoodInfo.class);
}
}
The program translates the first Excel sheet of the fileLocation file into a FoodInfo class. Each row becomes an instance of the FoodInfo class, with the cell values representing the object’s properties. The output is a list of FoodInfo objects with a size equivalent to the number of rows (excluding the header row) in the original Excel sheet.
程序将 fileLocation 文件的第一个 Excel 表转换为 FoodInfo 类。每一行都成为 FoodInfo 类的一个实例,单元格值代表对象的属性。输出结果是一个 FoodInfo 对象列表,其大小相当于原始 Excel 表单的行数(不包括标题行)。
There are some cases where a password may protect our Excel sheet. We can define the password via PoijiOptionsBuilder:
在某些情况下,密码可以保护我们的 Excel 表。我们可以通过 PoijiOptionsBuilder 来定义密码:
PoijiOptions options = PoijiOptionsBuilder.settings()
.password("<excel_sheet_password>").build();
List<FoodInfo> foodData = Poiji.fromExcel(new File(fileLocation), FoodInfo.class, options);
To make sure our code works as expected, we write a unit test:
为了确保代码按预期运行,我们要编写一个单元测试:
@Test
public void whenParsingExcelFileWithPOIJI_thenConvertsToList() throws IOException {
List<FoodInfo> foodInfoList =
ExcelDataToListOfObjectsPOIJI
.excelDataToListOfObjets_withPOIJI("src\\main\\resources/food_info.xlsx");
assertEquals("Beverages", foodInfoList.get(0).getCategory());
assertEquals("Dairy", foodInfoList.get(3).getCategory());
}
5. FastExcel
5.FastExcel
FastExcel is an efficient library that utilizes minimal memory and provides high performance for creating and reading basic Excel workbooks in Java. It exclusively supports the newer version of Excel files (.xlsx) and has limited features compared to Apache POI.
FastExcel 是一个高效的库,它只占用极少的内存,并为在 Java 中创建和读取基本 Excel 工作簿提供高性能。它只支持较新版本的 Excel 文件(.xlsx),与 Apache POI 相比功能有限。
It only reads cell content and does not include graphs, styles, or other cell formatting.
它只读取单元格内容,不包括图表、样式或其他单元格格式。
5.1. Setting Up the Maven Dependency
5.1.设置 Maven 依赖项
Below are the FastExcel and the FastExcel reader Maven dependencies added to pom.xml:
下面是添加到 pom.xml 中的 FastExcel 和 FastExcel 阅读器 Maven 依赖项:
<dependency>
<groupId>org.dhatim</groupId>
<artifactId>fastexcel</artifactId>
<version>0.15.7</version>
</dependency>
<dependency>
<groupId>org.dhatim</groupId>
<artifactId>fastexcel-reader</artifactId>
<version>0.15.7</version>
</dependency>
5.2. Converting Excel Data into a List of Objects
5.2.将 Excel 数据转换为对象列表
When dealing with large files, the FastExcel reader is a great option despite its limited features. It’s easy to use, and we can access the entire Excel workbook using the ReadableWorkbook class.
在处理大型文件时,尽管 FastExcel 阅读器的功能有限,但它仍是一个不错的选择。它易于使用,而且 我们可以使用 ReadableWorkbook 类访问整个 Excel 工作簿。
This allows us to retrieve sheets individually, either by name or index.
这样,我们就可以按名称或索引单独检索工作表。
In the method below, we read data from an Excel sheet and convert it into a list of FoodInfo objects:
在下面的方法中,我们从 Excel 表中读取数据,并将其转换为 FoodInfo 对象列表:
public static List<FoodInfo> excelDataToListOfObjets_withFastExcel(String fileLocation)
throws IOException, NumberFormatException {
List<FoodInfo> foodData = new ArrayList<FoodInfo>();
try (FileInputStream file = new FileInputStream(fileLocation);
ReadableWorkbook wb = new ReadableWorkbook(file)) {
Sheet sheet = wb.getFirstSheet();
for (Row row: sheet.read()) {
if (row.getRowNum() == 1) {
continue;
}
FoodInfo food = new FoodInfo();
food.setCategory(row.getCellText(0));
food.setName(row.getCellText(1));
food.setMeasure(row.getCellText(2));
food.setCalories(Double.parseDouble(row.getCellText(3)));
foodData.add(food);
}
}
return foodData;
}
Because the API reads all the rows(including the header row) in the sheet, we need to skip the first row(non-zero-based index) when looping through rows.
由于 应用程序接口会读取 sheet 中的所有行(包括标题行),因此在循环浏览行时,我们需要跳过第一行(非零索引)。
Accessing a cell is done either by instantiating a Cell class: Cell cell = row.getCell(), which has two implementations, one that takes an int cellIndex and the other a String cellAddress(e.g., “C4”) arguments. Or by directly getting the data in the cell: e.g., row.getCellText().
访问单元格可以通过实例化 Cell 类来实现:Cell cell = row.getCell(),它有两种实现方式,一种是使用 int cellIndex 参数,另一种是使用 String cellAddress 参数(例如 “C4″)。或者直接获取单元格中的数据:例如,row.getCellText()。
Either way, after extracting each cell’s content, we need to make sure to cast it, where necessary, to the appropriate food object field type.
无论如何,在提取每个单元格的内容后,我们都需要确保在必要时将其转换为适当的 food 对象字段类型。
Let’s write a unit test to make sure the conversion works:
让我们编写一个单元测试,确保转换工作正常:
@Test
public void whenParsingExcelFileWithFastExcel_thenConvertsToList() throws IOException {
List<FoodInfo> foodInfoList = ExcelDataToListOfObjectsFastExcel
.excelDataToListOfObjets_withFastExcel("src\\main\\resources/food_info.xlsx");
assertEquals("Beverages", foodInfoList.get(0).getCategory());
assertEquals("Dairy", foodInfoList.get(3).getCategory());
}
6. JExcelApi (Jxl)
6.JExcelApi (Jxl)
JExcelApi (or Jxl) is a lightweight Java library for reading, writing, and modifying Excel spreadsheets.
JExcelApi(或 Jxl)是一个轻量级 Java 库,用于读取、写入和修改 Excel 电子表格。
6.1. Setting Up the Maven Dependency
6.1.设置 Maven 依赖项
Let’s add the Maven dependency for JExcelApi to the pom.xml file:
让我们将 JExcelApi 的 Maven 依赖关系添加到 pom.xml 文件中:
<dependency>
<groupId>net.sourceforge.jexcelapi</groupId>
<artifactId>jxl</artifactId>
<version>2.6.12</version>
</dependency>
6.2. Converting Excel Data into a List of Objects
6.2.将 Excel 数据转换为对象列表
Though it only supports the older Excel format(.xls) files, the JExcel library offers a range of classes for manipulating Excel files. The Workbook class is used to access the list of Excel sheets within the file.
虽然 它只支持旧版 Excel 格式(.xls)文件,但 JExcel 库提供了一系列用于操作 Excel 文件的类。Workbook 类用于访问文件中的 Excel 表单列表。
The code below uses the library to convert data from a .xls file to a list of FoodInfo objects, foodData:
下面的代码使用该库将 .xls 文件中的数据转换为 FoodInfo 对象(foodData:)列表。
public static List<FoodInfo> excelDataToListOfObjets_withJxl(String fileLocation)
throws IOException, BiffException {
List<FoodInfo> foodData = new ArrayList<FoodInfo>();
Workbook workbook = Workbook.getWorkbook(new File(fileLocation));
Sheet sheet = workbook.getSheet(0);
int rows = sheet.getRows();
for (int i = 1; i < rows; i++) {
FoodInfo foodInfo = new FoodInfo();
foodInfo.setCategory(sheet.getCell(0, i).getContents());
foodInfo.setName(sheet.getCell(1, i).getContents());
foodInfo.setMeasure(sheet.getCell(2, i).getContents());
foodInfo.setCalories(Double.parseDouble(sheet.getCell(3, i).getContents()));
foodData.add(foodInfo);
}
return foodData;
}
Since the header row is not ignored by the library, we must loop from i = 1. The sheet object is a zero-based index list of rows.
由于 库不会忽略页眉行,因此我们必须从 i = 1 开始循环。工作表对象是一个基于零的行索引列表。
Retrieving cell data using the JExcel library is quite similar to the FastExcel library. Both libraries use the getCell() method with two implementations.
使用 JExcel 库检索单元格数据与 FastExcel 库非常相似。这两个库都使用具有两种实现的 getCell() 方法。
However, in JExcel, this method is directly accessed from a Sheet object rather than a Row object. Additionally, one of the implementations for the getCell() method in JExcel takes two arguments, colNum and rowNum, both of which are integers: sheet.getCell(colNum, rowNum).
但是,在 JExcel 中,该方法是直接从 Sheet 对象而不是 Row 对象访问的。此外,在 JExcel 中,getCell() 方法的实现之一需要两个参数:colNum 和 rowNum,这两个参数都是整数: sheet.getCell(colNum, rowNum). 。
To make sure the conversion works well, let’s write a unit test for our method:
为了确保转换工作顺利进行,让我们为我们的方法编写一个单元测试:
@Test
public void whenParsingExcelFileWithJxl_thenConvertsToList()
throws IOException, BiffException {
List<FoodInfo> foodInfoList = ExcelDataToListOfObjectsJxl
.excelDataToListOfObjets_withJxl("src\\main\\resources/food_info.xls");
assertEquals("Beverages", foodInfoList.get(0).getCategory());
assertEquals("Dairy", foodInfoList.get(3).getCategory());
}
7. Conclusion
7.结论
In this article, we explored the usage of several libraries, such as Apache POI, Poiji, FastExcel, and JExcelApi, to read and convert data from Excel files into Java objects. However, the choice of which library to use depends on the specific needs, considering the advantages and limitations of each of them.
在本文中,我们探讨了如何使用 Apache POI、Poiji、FastExcel 和 JExcelApi 等几个库来读取 Excel 文件中的数据并将其转换为 Java 对象。不过,选择使用哪种库取决于具体需求,并要考虑到每个库的优势和局限性。
For instance, we might choose to use the Poiji library if we prioritize the simplest way to just read data from an Excel file and directly convert it into a list of Java objects.
例如,如果我们优先使用最简单的方法从 Excel 文件读取数据并直接将其转换为 Java 对象列表,那么我们可能会选择使用 Poiji 库。
When it comes to performance and simplicity for two-way Excel data mapping in Java, FastExcel, and JExcelApi libraries are excellent options. However, they offer fewer features compared to Apache POI, which is a feature-rich library that supports styles and graphs.
说到 Java 中 Excel 双向数据映射的性能和简便性,FastExcel 和 JExcelApi 库是极好的选择。不过,与 Apache POI 相比,它们提供的功能较少,而 Apache POI 是一个功能丰富的库,支持样式和图形。
As always, the complete source code for this article is available over on GitHub.
与往常一样,本文的完整源代码可在 GitHub 上获取。