1. Overview
When it comes to Microsoft Excel files, reading values from different cells can be a little tricky. Excel files are spreadsheets organized in rows and cells, which can contain String, Numeric, Date, Boolean, and even Formula values. Apache POI is a library offering a full suite of tools to handle different Excel files and value types.
In this tutorial, we’ll focus on learning how to handle Excel files, iterate through rows and cells, and read each cell value type the proper way.
2. Maven Dependency
Let’s start by adding the Apache POI dependency to pom.xml:
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>5.2.0</version>
</dependency>
The latest versions of poi-ooxml can be found at Maven Central.
3. Apache POI Overview
The hierarchy starts with the workbook, which represents the whole Excel file. Each file can contain one or more worksheets, which are collections of rows and cells. The class prefix depends on the file format: HSSF is the prefix of the classes representing the older Excel format (.xls), whereas XSSF is used for the newer one (.xlsx). Therefore we have:
- The XSSFWorkbook and HSSFWorkbook classes represent the Excel workbook
- The Sheet interface represents Excel worksheets
- The Row interface represents rows
- The Cell interface represents cells
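To make the .xls/.xlsx distinction concrete, here is a minimal sketch that guesses the format from a file's first bytes, using only the JDK. The class and method names (ExcelFormatSniffer, detect, detectFile) are hypothetical and not part of POI. It relies on the fact that .xls files are OLE2 compound documents and .xlsx files are ZIP archives; other file types share those containers (.doc and .docx, for example), so a match is only a hint that HSSFWorkbook or XSSFWorkbook is the right class to try.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
final class ExcelFormatSniffer {

    enum Format { XLS, XLSX, UNKNOWN }

    // OLE2 compound documents (the .xls container) start with these 8 bytes
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP archives (the .xlsx container) start with "PK" 0x03 0x04
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by its leading bytes.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) {
            return Format.XLS;   // candidate for HSSFWorkbook
        }
        if (startsWith(header, ZIP)) {
            return Format.XLSX;  // candidate for XSSFWorkbook
        }
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static Format detectFile(String filePath) throws IOException {
        byte[] header = new byte[8];
        int filled = 0;
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            int read;
            while (filled < header.length
                    && (read = in.read(header, filled, header.length - filled)) != -1) {
                filled += read;
            }
        }
        return detect(Arrays.copyOf(header, filled));
    }

    private static boolean startsWith(byte[] data, int[] signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In practice we may not need such a check at all: POI ships WorkbookFactory.create(...), which picks the matching workbook implementation for us.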
3.1. Handling Excel Files
First, we open the file we want to read as a FileInputStream for further processing. The FileInputStream constructor throws a java.io.FileNotFoundException, so we need to wrap it in a try-catch block and make sure the stream is closed when we are done:
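import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public static void readExcel(String filePath) {
    File file = new File(filePath);
    // try-with-resources closes the stream even if reading throws
    try (FileInputStream inputStream = new FileInputStream(file)) {
        ...
    } catch (IOException e) {
        e.printStackTrace();
    }
}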
3.2. Iterating Through the Excel File
After we successfully open the InputStream, it’s time to create the XSSFWorkbook and iterate through the rows and cells of each sheet. If we know the index or the name of a specific sheet, we can use the getSheetAt(int index) and getSheet(String sheetName) methods of XSSFWorkbook, respectively.
Since we want to read through any kind of Excel file, we’ll iterate through all the sheets using three nested for loops: one for the sheets, one for the rows of each sheet, and finally one for the cells of each row.
For the sake of this tutorial, we’ll only print the data to the console:
FileInputStream inputStream = new FileInputStream(file);
Workbook baeuldungWorkBook = new XSSFWorkbook(inputStream);
for (Sheet sheet : baeuldungWorkBook) {
    ...
}
Then, in order to iterate through the rows of a sheet, we need the indexes of the first and last rows, which we get from the sheet object. Note that the loop starts at firstRow + 1, so the first row of the sheet is skipped:
int firstRow = sheet.getFirstRowNum();
int lastRow = sheet.getLastRowNum();
for (int index = firstRow + 1; index <= lastRow; index++) {
    Row row = sheet.getRow(index);
    if (row == null) {
        continue; // getRow returns null for a row that was never created
    }
}
Finally, we do the same for the cells. While accessing each cell, we can optionally pass a MissingCellPolicy, which tells POI what to return when a cell is missing or blank. The MissingCellPolicy enum contains three values:
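- RETURN_NULL_AND_BLANK: returns null for a missing cell and the cell itself for a blank one
- RETURN_BLANK_AS_NULL: returns null for both missing and blank cells
- CREATE_NULL_AS_BLANK: creates and returns a blank cell where one is missing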
The code for the cell iteration is as follows:
for (int cellIndex = row.getFirstCellNum(); cellIndex < row.getLastCellNum(); cellIndex++) {
    Cell cell = row.getCell(cellIndex, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
    ...
}
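To see how the three policies differ, the following sketch builds a one-row workbook in memory and asks for a blank cell and a missing cell under each policy. It assumes the poi-ooxml dependency from above is on the classpath. The class MissingCellPolicyDemo and its methods are hypothetical names used only for illustration.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Row.MissingCellPolicy;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not taken from the article's project.
final class MissingCellPolicyDemo {

    // Reports what getCell returns: "null" or the name of the cell's type.
    static String describe(Row row, int index, MissingCellPolicy policy) {
        Cell cell = row.getCell(index, policy);
        return cell == null ? "null" : cell.getCellType().name();
    }

    static List<String> demo() {
        try (Workbook workbook = new XSSFWorkbook()) {
            Row row = workbook.createSheet("demo").createRow(0);
            row.createCell(0).setCellValue("text"); // column 0: a string
            row.createCell(1, CellType.BLANK);      // column 1: exists but is blank
            // column 2 is intentionally never created, so it is missing

            return Arrays.asList(
              describe(row, 1, MissingCellPolicy.RETURN_NULL_AND_BLANK),
              describe(row, 1, MissingCellPolicy.RETURN_BLANK_AS_NULL),
              describe(row, 2, MissingCellPolicy.RETURN_NULL_AND_BLANK),
              describe(row, 2, MissingCellPolicy.RETURN_BLANK_AS_NULL),
              // evaluated last because this call adds a cell to the row
              describe(row, 2, MissingCellPolicy.CREATE_NULL_AS_BLANK));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only CREATE_NULL_AS_BLANK guarantees a non-null Cell, which is why the loop above can use the returned cell without a null check. The trade-off is that it modifies the in-memory row by adding the missing cell.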
3.3. Reading Cell Values in Excel
As we mentioned before, Microsoft Excel’s cells can contain different value types, so it’s important to be able to distinguish one cell value type from another and use the appropriate method to extract the value. The CellType enum lists all the value types:
- _NONE
- NUMERIC
- STRING
- FORMULA
- BLANK
- BOOLEAN
- ERROR
We’ll focus on four main cell value types: Numeric, String, Boolean, and Formula. A Formula cell holds a calculated value whose type is typically one of the first three.
Let’s create a helper method that checks the value type of a cell and uses the appropriate method to access the value. It’s also possible to treat the cell value as a String and retrieve it with the corresponding method.
There are two important things worth noting. First, Date values are stored as Numeric values. Second, if the cell’s value type is FORMULA, we need to use getCachedFormulaResultType() instead of getCellType() to check the type of the formula’s calculated result:
public static void printCellValue(Cell cell) {
    // for formula cells, look at the type of the cached result instead
    CellType cellType = cell.getCellType() == CellType.FORMULA
      ? cell.getCachedFormulaResultType() : cell.getCellType();
    if (cellType == CellType.STRING) {
        System.out.print(cell.getStringCellValue() + " | ");
    } else if (cellType == CellType.NUMERIC) {
        // dates are stored as numbers, so the cell's format decides
        if (DateUtil.isCellDateFormatted(cell)) {
            System.out.print(cell.getDateCellValue() + " | ");
        } else {
            System.out.print(cell.getNumericCellValue() + " | ");
        }
    } else if (cellType == CellType.BOOLEAN) {
        System.out.print(cell.getBooleanCellValue() + " | ");
    }
}
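The note that dates are stored as numbers can be illustrated without POI. The sketch below converts an Excel serial number into a java.time value under two assumptions: the workbook uses the default 1900 date system (not the 1904 one), and the serial is 61 or greater, which sidesteps Excel's fictitious 29 February 1900. The class ExcelSerialDates is a hypothetical name. In real code, cell.getDateCellValue() performs the conversion for us.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical helper that mimics how a date-formatted numeric cell is interpreted.
final class ExcelSerialDates {

    // Effective day zero of the 1900 date system for serials >= 61
    private static final LocalDate EPOCH = LocalDate.of(1899, 12, 30);
    private static final long SECONDS_PER_DAY = 86_400;

    static LocalDateTime toDateTime(double serial) {
        if (serial < 61) {
            throw new IllegalArgumentException("serials below 61 are not handled by this sketch");
        }
        // the integer part counts days, the fraction is the time of day
        long wholeDays = (long) Math.floor(serial);
        long secondsOfDay = Math.round((serial - wholeDays) * SECONDS_PER_DAY);
        return EPOCH.plusDays(wholeDays).atStartOfDay().plusSeconds(secondsOfDay);
    }
}
```

For example, ExcelSerialDates.toDateTime(44197.5) yields noon on 1 January 2021. The same number in a cell without a date format is just 44197.5, which is why printCellValue has to ask DateUtil.isCellDateFormatted(cell) before choosing between getDateCellValue() and getNumericCellValue().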
Now, all we need to do is call the printCellValue method inside the cell loop, and we’re done. Here’s a snippet of the full code:
...
for (int cellIndex = row.getFirstCellNum(); cellIndex < row.getLastCellNum(); cellIndex++) {
    Cell cell = row.getCell(cellIndex, Row.MissingCellPolicy.CREATE_NULL_AS_BLANK);
    printCellValue(cell);
}
...
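As mentioned earlier, a cell value can also be read as a String. One way to do that is POI's DataFormatter, sketched below in a self-contained round trip: the code writes a tiny workbook to memory and reads it back with for-each loops. It assumes poi-ooxml is on the classpath. The class RoundTripDemo, its method names, and the sample data are hypothetical.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not taken from the article's project.
final class RoundTripDemo {

    // Writes a small .xlsx workbook into memory so no file on disk is needed.
    static byte[] writeSample() throws IOException {
        try (Workbook workbook = new XSSFWorkbook();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            Row row = workbook.createSheet("people").createRow(0);
            row.createCell(0).setCellValue("Alice");
            row.createCell(1).setCellValue(42.5);
            row.createCell(2).setCellValue(true);
            workbook.write(out);
            return out.toByteArray();
        }
    }

    // Reads every defined cell of every sheet as display text.
    static List<String> readAsText(byte[] xlsx) throws IOException {
        // a fixed locale keeps number formatting independent of the machine
        DataFormatter formatter = new DataFormatter(Locale.US);
        List<String> values = new ArrayList<>();
        try (Workbook workbook = new XSSFWorkbook(new ByteArrayInputStream(xlsx))) {
            for (Sheet sheet : workbook) {
                for (Row row : sheet) {
                    for (Cell cell : row) {
                        values.add(formatter.formatCellValue(cell));
                    }
                }
            }
        }
        return values;
    }
}
```

Unlike the index-based loops, the for-each loops visit only rows and cells that physically exist in the file, so gaps are skipped rather than returned as null or blank. They also include the first row. Which style fits better depends on whether the column positions of missing cells matter.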
4. Conclusion
In this article, we have shown an example project for reading Excel files and accessing different cell values using Apache POI.
The full source code can be found over on GitHub.