1. Overview
In this tutorial, we’ll discuss how to find the last row in an Excel spreadsheet using Java and Apache POI.
First, we'll see how to fetch a single row from the file using Apache POI. Then, we'll look at the methods for counting the rows in a worksheet. Finally, we'll combine them to fetch the last row of a given sheet.
2. Fetch a Single Row
As we already know, Apache POI provides an abstraction layer that represents Microsoft documents, including Excel, in Java. We can access the sheets in a file and even read and modify each cell.
Let’s start by fetching a single row from our Excel file. Before we move on, we need to get the Worksheet from the file:
Workbook workbook = new XSSFWorkbook(fileLocation);
Sheet sheet = workbook.getSheetAt(0);
The Workbook is a Java representation of the Excel file, while Sheet is the main structure within a Workbook. The Worksheet is the most common subtype of Sheet, representing a grid of cells.
When we open our worksheet in Java, we can access the data it contains, i.e., the row data. To fetch a single row, we can use the getRow(int) method:
Row row = sheet.getRow(2);
The method returns the Row object – the high-level representation of a single row from the Excel file, or null if the row doesn’t exist.
As we can see, we need to supply a single parameter, the 0-based index of the requested row. Unfortunately, there is no API that returns the last row directly.
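Because getRow(int) returns null for a row that was never initialized, callers need to handle that case. The sketch below is one way to do it, assuming the poi-ooxml dependency is on the classpath. The RowLookup class and its findRow helper are hypothetical names introduced for illustration, and the workbook is created in memory rather than read from a file:

```java
import java.util.Optional;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class RowLookup {

    // Hypothetical helper (not part of POI): wraps getRow(int) in an Optional
    // so that the "row does not exist" case cannot be overlooked.
    public static Optional<Row> findRow(Sheet sheet, int index) {
        return Optional.ofNullable(sheet.getRow(index));
    }

    public static void main(String[] args) throws Exception {
        // An in-memory workbook keeps the example independent of any file
        try (Workbook workbook = new XSSFWorkbook()) {
            Sheet sheet = workbook.createSheet("demo");
            sheet.createRow(2).createCell(0).setCellValue("third row");

            System.out.println(findRow(sheet, 2).isPresent()); // row 2 was created
            System.out.println(findRow(sheet, 0).isPresent()); // row 0 was never created
        }
    }
}
```

Returning an Optional is only a design choice here; a plain null check on the result of getRow(int) works just as well.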
3. Find the Count of Rows
We’ve just learned how to get a single row from an Excel file using Java. Now, let’s find the index of the last row on a given Sheet.
Apache POI provides two methods that help count rows: getLastRowNum() and getPhysicalNumberOfRows(). Let’s take a look at each of them.
3.1. Using getLastRowNum()
According to the documentation, the getLastRowNum() method returns the number (0-based) of the last initialized row on the worksheet, or -1 if no row exists:
int lastRowNum = sheet.getLastRowNum();
Once we've fetched lastRowNum, we can easily access the last row using the getRow() method.
We should note that rows that once had content and were later cleared might still be counted as rows. Therefore, the result may not be what we expect. To understand this, we need to learn more about physical rows.
3.2. Using getPhysicalNumberOfRows()
Inspecting the Apache POI documentation, we can find a special term related to the rows – the physical row.
A row is always interpreted as physical whenever it contains any data. A row is initialized not only if any of its cells contain text or formulas but also if it carries formatting data, e.g., a background color, a row height, or a non-default font. In other words, each row that is initialized is also physical.
To get the count of physical rows, Apache POI provides the getPhysicalNumberOfRows() method:
int physicalRows = sheet.getPhysicalNumberOfRows();
Given this definition of a physical row, the result may differ from the number obtained with the getLastRowNum() method.
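To make the difference between the two values concrete, here is a simplified model that uses only the Java standard library. It is a sketch, not POI's actual implementation: the SparseSheetModel class and its methods are hypothetical stand-ins that keep only initialized rows in a sorted map keyed by the 0-based row index:

```java
import java.util.TreeMap;

public class SparseSheetModel {

    // Only initialized rows get an entry; untouched rows are simply absent
    private final TreeMap<Integer, String> rows = new TreeMap<>();

    public void createRow(int index, String content) {
        rows.put(index, content);
    }

    // Mirrors the idea of getLastRowNum(): highest initialized index, or -1 if empty
    public int lastRowNum() {
        return rows.isEmpty() ? -1 : rows.lastKey();
    }

    // Mirrors the idea of getPhysicalNumberOfRows(): count of initialized rows
    public int physicalNumberOfRows() {
        return rows.size();
    }

    public static void main(String[] args) {
        SparseSheetModel sheet = new SparseSheetModel();
        sheet.createRow(0, "data");
        sheet.createRow(1, "formula");
        sheet.createRow(6, "more data"); // rows 2 to 5 are never initialized

        System.out.println(sheet.lastRowNum());           // prints 6
        System.out.println(sheet.physicalNumberOfRows()); // prints 3
    }
}
```

In this model, any gap of untouched rows makes the last index larger than the count minus one, which is the same effect we can observe with the two POI methods.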
4. Fetch the Last Row
Now, let’s test both methods against a more complex Excel grid:
Here, the first three rows contain, respectively, text data, a value calculated by a formula (=A1), and a changed background color. The 4th row has a modified height, while the 5th and 6th rows are untouched. The 7th row contains text again. The 8th row once held formatted text that was later cleared. The 9th and subsequent rows were never edited.
Let’s check the results of the count methods:
assertEquals(7, sheet.getLastRowNum());
assertEquals(6, sheet.getPhysicalNumberOfRows());
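As we mentioned before, the last row number and the physical number of rows can differ. Here, getLastRowNum() returns the 0-based index of the 8th row, while getPhysicalNumberOfRows() counts only the six initialized rows and skips the untouched 5th and 6th rows.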
Let’s now fetch rows based on their index:
assertNotNull(sheet.getRow(0)); // data
assertNotNull(sheet.getRow(1)); // formula
assertNotNull(sheet.getRow(2)); // green
assertNotNull(sheet.getRow(3)); // height
assertNull(sheet.getRow(4));
assertNull(sheet.getRow(5));
assertNotNull(sheet.getRow(6)); // last?
assertNotNull(sheet.getRow(7)); // cleared later
assertNull(sheet.getRow(8));
...
As we can see, getPhysicalNumberOfRows() returns the total number of non-null (i.e., initialized) Rows in the worksheet, while getLastRowNum() returns the index of the last non-null Row.
Therefore, we can fetch the last row on the sheet:
Row lastRow = null;
int lastRowNum = sheet.getLastRowNum();
if (lastRowNum >= 0) {
    lastRow = sheet.getRow(lastRowNum);
}
However, we have to remember that the last row returned by Apache POI is not always the last row that shows text or a formula in a UI editor such as Microsoft Excel.
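If we need the last row that actually shows content, one possible approach is to walk upward from getLastRowNum() and skip rows whose cells are all blank. The following is a minimal sketch under a few assumptions: the poi-ooxml dependency is on the classpath, the POI version is 4.x or later (where getCellType() returns a CellType), and "content" means at least one cell that is not BLANK. The LastContentRow class and its helper names are hypothetical:

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class LastContentRow {

    // Assumption: a row "has content" if at least one of its cells is not BLANK
    public static boolean hasContent(Row row) {
        if (row == null) {
            return false;
        }
        for (Cell cell : row) { // iterates only over cells that exist in the row
            if (cell.getCellType() != CellType.BLANK) {
                return true;
            }
        }
        return false;
    }

    // Walks upward from the last initialized row; returns null if no row has content
    public static Row findLastRowWithContent(Sheet sheet) {
        for (int i = sheet.getLastRowNum(); i >= 0; i--) {
            Row row = sheet.getRow(i);
            if (hasContent(row)) {
                return row;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        try (Workbook workbook = new XSSFWorkbook()) {
            Sheet sheet = workbook.createSheet("demo");
            sheet.createRow(6).createCell(0).setCellValue("text");
            sheet.createRow(7).createCell(0); // initialized row with a blank cell

            Row last = findLastRowWithContent(sheet);
            System.out.println(sheet.getLastRowNum()); // index of the last initialized row
            System.out.println(last.getRowNum());      // index of the last row with content
        }
    }
}
```

Note that this check treats a cell holding an empty string as content, since its type is STRING rather than BLANK. Depending on the spreadsheet, a stricter test on the cell value may be more appropriate.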
5. Conclusion
In this article, we inspected the Apache POI API and fetched the last row from a given Excel file.
We started by revisiting some of the basic methods to open a spreadsheet in Java. We then introduced the getRow(int) method to retrieve a single Row. After that, we checked the values of getLastRowNum() and getPhysicalNumberOfRows() and explained their difference. Finally, we checked all the methods against an Excel grid to fetch the last row.
As always, the full version of the code is available over on GitHub.