1. Introduction
In this quick tutorial, we’ll see a few alternatives using core Java and external libraries to search for files in a directory (including sub-directories) that match a specific extension. We’ll go from simple arrays and lists to streams and other newer methods.
2. Setting up Our Filter
Since we need to filter files by extension, let’s start with a simple Predicate implementation. We’ll need a little input sanitization to ensure we match most use cases, like accepting extension names beginning with a dot or not:
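import java.nio.file.Path;
import java.util.Locale;
import java.util.function.Predicate;

public class MatchExtensionPredicate implements Predicate<Path> {

    private final String extension;

    public MatchExtensionPredicate(String extension) {
        // Accept both "txt" and ".txt"
        if (!extension.startsWith(".")) {
            extension = "." + extension;
        }
        // Locale.ROOT keeps the case conversion independent of the default locale
        this.extension = extension.toLowerCase(Locale.ROOT);
    }

    @Override
    public boolean test(Path path) {
        // getFileName() returns null for a root path such as "/"
        if (path == null || path.getFileName() == null) {
            return false;
        }
        return path.getFileName()
            .toString()
            .toLowerCase(Locale.ROOT)
            .endsWith(extension);
    }
}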
We start by writing our constructor, which prepends a dot to the extension name if it doesn’t already start with one. Then, we transform it to lowercase so that comparisons with file names are case-insensitive. Finally, we implement test() by getting the Path’s file name, transforming it to lowercase, and checking whether it ends with the extension we’re looking for.
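To see how this normalization behaves, here’s a minimal sketch that mirrors the same logic as a lambda. The ExtensionFilterDemo class and its byExtension() helper are hypothetical names used only for illustration, not part of the original code:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.function.Predicate;

class ExtensionFilterDemo {

    // Hypothetical helper: same normalization as described above, expressed as a lambda
    static Predicate<Path> byExtension(String extension) {
        String normalized = (extension.startsWith(".") ? extension : "." + extension)
            .toLowerCase(Locale.ROOT);
        return path -> path != null
            && path.getFileName() != null
            && path.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(normalized);
    }

    public static void main(String[] args) {
        Predicate<Path> filter = byExtension("TXT");
        System.out.println(filter.test(Paths.get("notes.txt")));          // true
        System.out.println(filter.test(Paths.get("docs", "README.TXT"))); // true
        System.out.println(filter.test(Paths.get("archive.tar.gz")));     // false
    }
}
```

Passing "TXT", "txt", or ".txt" all produce the same filter, which is the point of sanitizing the input in one place.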
3. Traversing Directories With File.listFiles()
Our first example uses a method that’s been around since the early days of Java: File.listFiles(). Let’s start by instantiating a List to store our results and listing all files in the directory:
List<File> find(File startPath, String extension) {
    List<File> matches = new ArrayList<>();
    File[] files = startPath.listFiles();
    if (files == null) {
        return matches;
    }
    // ...
}
By itself, listFiles() doesn’t operate recursively, so whenever an item turns out to be a directory, we call find() on it:
    MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);
    for (File file : files) {
        if (file.isDirectory()) {
            matches.addAll(find(file, extension));
        } else if (filter.test(file.toPath())) {
            matches.add(file);
        }
    }
    return matches;
We also instantiate our filter and only add the current file to our list if it passes our test() implementation. Ultimately, we’ll have all the results matching our filter. Note that this can cause a StackOverflowError in directory trees that are too deep and an OutOfMemoryError in directories that contain too many files. We’ll see options that perform better later.
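If recursion depth is the main concern, one possible workaround is to replace the recursive calls with an explicit stack of pending directories. The following is only a sketch under that assumption. The IterativeFileSearch class is a hypothetical name, and the extension check is inlined to keep the example self-contained:

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;

class IterativeFileSearch {

    static List<File> find(File startPath, String extension) {
        // Same normalization as our predicate: leading dot, lowercase
        String suffix = (extension.startsWith(".") ? extension : "." + extension)
            .toLowerCase(Locale.ROOT);

        List<File> matches = new ArrayList<>();
        Deque<File> pending = new ArrayDeque<>();
        pending.push(startPath);

        // Each iteration lists one directory; subdirectories are queued instead of recursed into
        while (!pending.isEmpty()) {
            File[] files = pending.pop().listFiles();
            if (files == null) {
                continue; // not a directory, or it couldn't be read
            }
            for (File file : files) {
                if (file.isDirectory()) {
                    pending.push(file);
                } else if (file.getName().toLowerCase(Locale.ROOT).endsWith(suffix)) {
                    matches.add(file);
                }
            }
        }
        return matches;
    }
}
```

This keeps the call stack flat, but it still accumulates every match in memory, so it doesn’t address the OutOfMemoryError risk.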
4. Traversing Directories With Files.walkFileTree() From Java 7 Onwards
Java 7 introduced the NIO2 APIs, which include utilities like the Files class and a new way to represent file locations with the Path class. Using walkFileTree() allows us to traverse a directory recursively with little effort. This method only requires a starting Path and a FileVisitor implementation:
List<Path> find(Path startPath, String extension) throws IOException {
    List<Path> matches = new ArrayList<>();
    // Build the filter once instead of once per visited file
    MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);

    Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
            if (filter.test(file)) {
                matches.add(file);
            }
            return FileVisitResult.CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
            // Skip entries we can't read and keep walking
            return FileVisitResult.CONTINUE;
        }
    });
    return matches;
}
FileVisitor contains callbacks for a few events: before entering a directory, after leaving a directory, when visiting a file, and when this visit fails. But, with SimpleFileVisitor, we only need to implement the callbacks we’re interested in. In this case, it’s visiting a file with visitFile(). So, for every file visited, we test it against our Predicate and add it to a list of matching files.
Also, we’re implementing visitFileFailed() to always return FileVisitResult.CONTINUE. With this, we can continue searching for files even if an exception – like access denied – occurs.
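The directory callbacks are useful too. As a sketch of the “before entering a directory” event, the example below overrides preVisitDirectory() and returns FileVisitResult.SKIP_SUBTREE to leave out a directory by name. The SkippingVisitorDemo class and the skippedDirName parameter are hypothetical, and suffix is assumed to be lowercase with a leading dot:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SkippingVisitorDemo {

    static List<Path> find(Path startPath, String suffix, String skippedDirName) throws IOException {
        List<Path> matches = new ArrayList<>();

        Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                // Called before the directory's entries are visited
                Path name = dir.getFileName();
                if (name != null && name.toString().equals(skippedDirName)) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(suffix)) {
                    matches.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Keep going when an entry can't be read
                return FileVisitResult.CONTINUE;
            }
        });
        return matches;
    }
}
```

For instance, calling find(start, ".txt", "node_modules") would walk everything under start except directories named node_modules.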
5. Streaming With Files.walk() From Java 8 Onwards
Java 8 added a simpler way to traverse directories that integrates with the Stream API. Here’s how our method looks with Files.walk():
Stream<Path> find(Path startPath, String extension) throws IOException {
    return Files.walk(startPath)
        .filter(new MatchExtensionPredicate(extension));
}
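The Stream returned by Files.walk() holds open directories until it’s closed, so it’s best consumed inside a try-with-resources block. Here’s a minimal sketch of that pattern. WalkUsageDemo and findAll() are hypothetical names, suffix is assumed to be lowercase with a leading dot, and the extra Files::isRegularFile filter is our own addition to leave out directories whose names happen to end with the extension:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WalkUsageDemo {

    static List<Path> findAll(Path startPath, String suffix) throws IOException {
        // try-with-resources closes the stream and releases the open directories
        try (Stream<Path> paths = Files.walk(startPath)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(suffix))
                .collect(Collectors.toList());
        }
    }
}
```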
Unfortunately, Files.walk() fails at the first I/O error hit during traversal, such as access denied on a subdirectory. The error surfaces as an UncheckedIOException, and the Stream API gives us no way to skip the failing entry and continue. So, let’s try a different approach. We’ll start by implementing a FileVisitor that contains a Consumer<Path>. This time, we’ll use this Consumer to do whatever we want with our file matches instead of accumulating them in a List:
public class SimpleFileConsumerVisitor extends SimpleFileVisitor<Path> {

    private final Predicate<Path> filter;
    private final Consumer<Path> consumer;

    public SimpleFileConsumerVisitor(MatchExtensionPredicate filter, Consumer<Path> consumer) {
        this.filter = filter;
        this.consumer = consumer;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
        if (filter.test(file)) {
            consumer.accept(file);
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        return FileVisitResult.CONTINUE;
    }
}
Finally, let’s modify our find() method to use it:
void find(Path startPath, String extension, Consumer<Path> consumer) throws IOException {
    MatchExtensionPredicate filter = new MatchExtensionPredicate(extension);
    Files.walkFileTree(startPath, new SimpleFileConsumerVisitor(filter, consumer));
}
Note that we had to go back to Files.walkFileTree() to use our FileVisitor implementation.
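To illustrate what the Consumer buys us, here’s a self-contained sketch that processes matches as they’re found instead of collecting them first. ConsumerSearchDemo is a hypothetical class, and it takes a plain Predicate<Path> with an inline visitor so the example compiles on its own:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Predicate;

class ConsumerSearchDemo {

    static void find(Path startPath, Predicate<Path> filter, Consumer<Path> consumer) throws IOException {
        Files.walkFileTree(startPath, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attributes) {
                if (filter.test(file)) {
                    consumer.accept(file); // hand each match to the caller immediately
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                return FileVisitResult.CONTINUE; // ignore unreadable entries
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path start = Paths.get(args.length > 0 ? args[0] : ".");
        AtomicLong count = new AtomicLong();

        // Print each match as it's visited and keep only a running count in memory
        find(start, p -> p.toString().endsWith(".java"), p -> {
            count.incrementAndGet();
            System.out.println(p);
        });
        System.out.println("matches: " + count.get());
    }
}
```

Because nothing is accumulated unless the caller chooses to, this style suits very large directory trees. A caller that does want a list can simply pass list::add as the Consumer.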
6. Using Apache Commons IO’s FileUtils.iterateFiles()
Another helpful option is FileUtils.iterateFiles() from Apache Commons IO, which returns an Iterator. Let’s include its dependency:
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.15.1</version>
</dependency>
With this dependency in place, we can also use the library’s WildcardFileFilter instead of our MatchExtensionPredicate:
Iterator<File> find(Path startPath, String extension) {
    if (!extension.startsWith(".")) {
        extension = "." + extension;
    }
    return FileUtils.iterateFiles(
        startPath.toFile(),
        WildcardFileFilter.builder().setWildcards("*" + extension).get(),
        TrueFileFilter.INSTANCE);
}
We start our method by ensuring the extension name is in the expected format. Checking if it’s necessary to prepend a dot allows our method to work if we pass “.extension” or just “extension.”
As with the other methods, it just needs a starting directory. But, since this is an older API, it requires a File instead of a Path. The last argument is an optional directory filter. If we pass null, subdirectories aren’t searched. So, we pass TrueFileFilter.INSTANCE to make sure the whole directory tree is visited.
7. Conclusion
In this article, we explored various approaches to searching for files in a directory and its subdirectories based on a specified extension. We started by setting up a flexible extension-matching Predicate. Then, we covered different techniques, ranging from the traditional File.listFiles() and the Java 7 Files.walkFileTree() methods to the stream-based Files.walk() introduced in Java 8. We also looked at Apache Commons IO’s FileUtils.iterateFiles() as an alternative from an external library.
As always, the source code is available over on GitHub.