Read and Write Files in Java Using Separate Threads

Last modified: February 5, 2024

1. Introduction

When it comes to file handling in Java, managing large files without causing performance issues can be challenging. One remedy is to perform file operations in separate threads, so that reading and writing don't block the main thread. In this tutorial, we'll explore how to read and write files using separate threads.

2. Why Use Separate Threads

Using separate threads for file operations can improve performance by allowing concurrent execution of tasks. In a single-threaded program, file operations are performed sequentially. For example, we read the entire file first and then write to another file. This can be time-consuming, especially for large files.

By using separate threads, multiple file operations can run simultaneously, taking advantage of multicore processors and overlapping I/O with computation: while one thread waits for the disk, another can keep processing data. This concurrency can lead to better utilization of system resources and a shorter overall execution time. However, the benefit depends on the nature of the tasks and the I/O involved; for example, several threads competing for the same disk may gain little over a single thread.

3. Implementation of File Operations Using Threads

Reading and writing files can be done using separate threads to improve performance. In this section, we’ll discuss how to implement file operations using threads.

3.1. Reading Files in Separate Threads

To read a file in a separate thread, we can create a new Thread and pass it a Runnable that reads the file. We open the file with the FileReader class and wrap it in a BufferedReader, which buffers the input and lets us read the file line by line efficiently:

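import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// filePath is the path of the file to read (example value)
String filePath = "input.txt";

Thread readerThread = new Thread(new Runnable() {
    @Override
    public void run() {
        // try-with-resources closes the reader even if an exception occurs
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
            String line;
            // readLine() returns null once the end of the file is reached
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
});

readerThread.start();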
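
Since Java 8, the same reader can be written more compactly with a lambda. The sketch below is a variation rather than the article's code: it assumes we want to collect the lines instead of printing them, and the class and method names (ThreadedReader, readInBackground) are hypothetical. Calling join() makes the caller wait for the reader thread, and it also guarantees that the lines the reader added are visible to the caller afterwards:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ThreadedReader {

    // Hypothetical helper: reads every line of `path` on a new thread and
    // returns the lines once that thread has finished.
    static List<String> readInBackground(Path path) throws InterruptedException {
        // A plain ArrayList is enough: only the reader thread touches it until
        // join() returns, and join() makes its writes visible to the caller
        List<String> lines = new ArrayList<>();

        Thread reader = new Thread(() -> {
            try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                String line;
                while ((line = br.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });

        reader.start();
        // ... the caller could do other work here while the file is being read ...
        reader.join(); // wait for the reader thread to terminate
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("demo", ".txt");
        Files.write(file, List.of("Hello,", "Baeldung!"));
        System.out.println(readInBackground(file));
    }
}
```

Joining right after start() only makes sense for a demo; in a real application, the main thread would do useful work before waiting.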

3.2. Writing Files in Separate Threads

Similarly, we create another thread and use the FileWriter class to write data to the file:

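import java.io.FileWriter;
import java.io.IOException;

Thread writerThread = new Thread(new Runnable() {
    @Override
    public void run() {
        // new FileWriter(filePath) truncates an existing file;
        // new FileWriter(filePath, true) would append instead
        try (FileWriter fileWriter = new FileWriter(filePath)) {
            fileWriter.write("Hello, world!");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
});

writerThread.start();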
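
A comparable lambda-based sketch for the writer follows, again with hypothetical names (ThreadedWriter, writeInBackground). It uses Files.newBufferedWriter with an explicit UTF-8 charset instead of FileWriter, and it calls join() so the caller knows the file is complete before using it:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ThreadedWriter {

    // Hypothetical helper: writes `lines` to `path` on a new thread and
    // waits for that thread to finish before returning.
    static void writeInBackground(Path path, List<String> lines) throws InterruptedException {
        Thread writer = new Thread(() -> {
            // try-with-resources flushes and closes the writer when done
            try (BufferedWriter bw = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
                for (String line : lines) {
                    bw.write(line);
                    bw.newLine();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });

        writer.start();
        writer.join(); // the file is complete only after the writer thread ends
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("demo", ".txt");
        writeInBackground(file, List.of("Hello, world!"));
        System.out.println(Files.readAllLines(file));
    }
}
```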

With this approach, reading and writing run concurrently in separate threads. This is particularly beneficial when neither operation depends on the completion of the other.
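
When the read and the write target different files, as in the sketch below, the two tasks share nothing and need no coordination beyond waiting for both results. This version swaps the hand-made threads for an ExecutorService with two threads; the class and method names are hypothetical:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentReadWrite {

    // Hypothetical helper: reads `source` while writing `lines` to `target`.
    // The tasks touch different files, so they need no synchronization.
    static List<String> readAndWrite(Path source, Path target, List<String> lines) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<List<String>> read = pool.submit(() -> Files.readAllLines(source));
            Future<Path> write = pool.submit(() -> Files.write(target, lines));
            write.get(); // waits for the write; an IOException surfaces as ExecutionException
            return read.get();
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }

    public static void main(String[] args) throws Exception {
        Path in = Files.createTempFile("in", ".txt");
        Path out = Files.createTempFile("out", ".txt");
        Files.write(in, List.of("read me"));
        System.out.println(readAndWrite(in, out, List.of("written")));
        System.out.println(Files.readAllLines(out));
    }
}
```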

4. Handling Concurrency

Concurrent access to files by multiple threads requires careful attention to avoid data corruption and unexpected behavior. In the earlier code, the two threads run concurrently, and there is no guarantee about how their operations will be interleaved. If the reader thread accesses the file while the write operation is still in progress, it might read incomplete or partially written data, or even an empty file, since FileWriter truncates the file when it opens it. This can produce misleading results or errors during processing, potentially affecting downstream operations that rely on accurate data.
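
When the reader does depend on the writer, one way to rule out this race is to make the reader wait until the writer has finished with the file. The minimal sketch below uses a CountDownLatch for that purpose; this is our own choice of mechanism, and the class and method names are hypothetical. The reader is deliberately started first to show that the ordering still holds:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

class OrderedWriteThenRead {

    // Hypothetical helper: the reader thread may start first, but it only
    // reads `file` after the writer thread has finished writing it.
    static List<String> writeThenRead(Path file, List<String> lines) throws InterruptedException {
        CountDownLatch written = new CountDownLatch(1);
        List<String> result = new ArrayList<>();

        Thread writer = new Thread(() -> {
            try {
                Files.write(file, lines); // opens, writes, and closes the file
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                written.countDown(); // release the reader even if writing failed
            }
        });
        Thread reader = new Thread(() -> {
            try {
                written.await(); // blocks until countDown() has been called
                result.addAll(Files.readAllLines(file));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                e.printStackTrace();
            }
        });

        reader.start(); // started first on purpose
        writer.start();
        reader.join(); // after join(), `result` is safely visible here
        writer.join();
        return result;
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("ordered", ".txt");
        System.out.println(writeThenRead(file, List.of("Hello,", "Baeldung!")));
    }
}
```

Calling writer.join() inside the reader would not be a safe substitute here: join() on a thread that hasn't been started yet returns immediately.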

Moreover, if two writing threads simultaneously attempt to write data to the file, their writes might interleave and overwrite portions of each other's data. Without proper synchronization, this could result in corrupted or inconsistent information.
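
One simple way to keep concurrent writers from corrupting each other's output, sketched below under our own assumptions, is to let all of them share a single writer and guard each write with a common lock. Each line, including its newline, is then written as a unit: lines from different threads may appear in any order, but never mixed together. The helper name and the line format are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SharedWriterDemo {

    // Hypothetical helper: `threads` workers append `linesPerThread` lines each
    // to one shared writer. The lock makes every "text + newline" pair atomic.
    static void writeConcurrently(Path file, int threads, int linesPerThread)
            throws IOException, InterruptedException {
        Object lock = new Object();
        try (BufferedWriter out = Files.newBufferedWriter(file)) {
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                int id = t; // effectively final copy for the lambda
                Thread worker = new Thread(() -> {
                    for (int i = 0; i < linesPerThread; i++) {
                        synchronized (lock) {
                            try {
                                out.write("thread-" + id + " line-" + i);
                                out.newLine();
                            } catch (IOException e) {
                                throw new UncheckedIOException(e); // ends this worker
                            }
                        }
                    }
                });
                workers.add(worker);
                worker.start();
            }
            // wait for every worker before try-with-resources closes the writer
            for (Thread worker : workers) {
                worker.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared", ".txt");
        writeConcurrently(file, 2, 3);
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```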

To address this, one common approach is to use a producer-consumer model. One or more producer threads read data from files and add it to a queue, and one or more consumer threads take the data from the queue and process it, for example by writing it to another file. This approach allows us to scale our application easily by adding more producers or consumers as needed.
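
The sketch below illustrates the scaling idea in memory, without files: one producer feeds several consumers through a LinkedBlockingQueue. Because a consumer can't tell that the producer is finished just by looking at the queue, the producer enqueues one end marker per consumer. The marker value, the uppercasing step, and all names are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ScaledPipeline {

    // Hypothetical end marker; it only has to differ from every real item
    private static final String DONE = "\n<done>\n";

    // One producer feeds `consumers` worker threads, which uppercase each item.
    // The result order depends on thread scheduling.
    static List<String> process(List<String> items, int consumers) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ConcurrentLinkedQueue<String> results = new ConcurrentLinkedQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (String item : items) {
                    queue.put(item);
                }
                // One marker per consumer, so that every consumer stops
                for (int i = 0; i < consumers; i++) {
                    queue.put(DONE);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        List<Thread> workers = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            Thread worker = new Thread(() -> {
                try {
                    // take() blocks until an item is available
                    for (String item = queue.take(); !DONE.equals(item); item = queue.take()) {
                        results.add(item.toUpperCase(Locale.ROOT));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }

        producer.start();
        producer.join();
        for (Thread worker : workers) {
            worker.join();
        }
        return new ArrayList<>(results);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(process(List.of("alpha", "beta", "gamma"), 2));
    }
}
```

Since every real item is queued before the first marker, a consumer only ever sees a marker once all items have been handed out, and each consumer consumes exactly one marker before exiting.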

5. Concurrent File Processing With BlockingQueue

The producer-consumer model routes all data through a queue, so reads and writes are coordinated; with one producer and one consumer, lines also leave the queue in the same order they entered it. To implement this model, we can use a thread-safe queue data structure, such as a BlockingQueue. The producer can add lines to the queue using the offer() method, and the consumer can retrieve them using the poll() method.
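
The queue methods differ in how they handle a full or empty queue: offer() and poll() return immediately, while put() and take() wait. A small single-threaded sketch on an ArrayBlockingQueue with a capacity of one makes the return values visible; the class name is hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

class QueueMethodsDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1); // room for one element

        System.out.println(queue.offer("first"));  // true: there was room
        System.out.println(queue.offer("second")); // false: full, "second" is discarded
        System.out.println(queue.poll());          // first
        System.out.println(queue.poll());          // null: empty, returns immediately

        // Timed poll() waits up to the timeout before giving up
        System.out.println(queue.poll(100, TimeUnit.MILLISECONDS)); // null after ~100 ms

        queue.put("third");               // would block if the queue were full
        System.out.println(queue.take()); // third; would block if the queue were empty
    }
}
```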

BlockingQueue implementations are thread-safe. The common ones guard their internal data structures (an array, a linked list, etc.) with locks: ArrayBlockingQueue uses a single lock for all operations, while LinkedBlockingQueue uses one lock for insertions and another for removals, so a producer and a consumer can work on it at the same time. Either way, concurrent calls such as offer() or poll() can't corrupt the queue, and we don't need any extra synchronization around them.
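
To see this thread safety in action, the following sketch, with hypothetical names, lets several threads call offer() on the same queue with no locking of their own and then checks the final size. With a plain ArrayList in place of the queue, concurrent adds could lose elements or throw; with a BlockingQueue, every element is accounted for:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ConcurrentOffers {

    // Hypothetical demo: `producers` threads each offer `perProducer` items to
    // the same queue without any external locking; returns the final size.
    static int fill(BlockingQueue<Integer> queue, int producers, int perProducer)
            throws InterruptedException {
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    queue.offer(i); // safe to call from many threads at once
                }
            });
            threads[p].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        return queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        // LinkedBlockingQueue without a capacity: every offer() succeeds
        System.out.println(fill(new LinkedBlockingQueue<>(), 8, 10_000)); // prints 80000
    }
}
```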

By using a BlockingQueue, we decouple the producer and the consumer, allowing each to work at its own pace; one only has to wait for the other when the queue is empty or, for a bounded queue, full. This can improve overall performance.

5.1. Create FileProducer

We begin by creating the FileProducer class, which represents the producer thread responsible for reading lines from an input file and adding them to a shared queue. Its constructor accepts the BlockingQueue that coordinates the producer and consumer threads and serves as thread-safe storage for the lines, so that the consumer thread can access them.

Here is an example of the FileProducer class:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;

class FileProducer implements Runnable {
    private final BlockingQueue<String> queue;
    private final String inputFileName;

    public FileProducer(BlockingQueue<String> queue, String inputFileName) {
        this.queue = queue;
        this.inputFileName = inputFileName;
    }
    // ...
}

Next, in the run() method, we open the file using a BufferedReader for efficient line reading. The try-with-resources statement closes the reader automatically, and we catch any IOException that might occur during file operations:

@Override
public void run() {
    try (BufferedReader reader = new BufferedReader(new FileReader(inputFileName))) {
        String line;
        // ...
    } catch (IOException e) {
        e.printStackTrace();
    }
}

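After we open the file, the code enters a loop that reads the file line by line and adds each line to the queue using the offer() method. Because a LinkedBlockingQueue created without a capacity is effectively unbounded, offer() practically always succeeds here; with a bounded queue, however, offer() returns false when the queue is full and the line is silently dropped, so put(), which waits for free space, is the safer choice in that case: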

while ((line = reader.readLine()) != null) {
    queue.offer(line);
}
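
A variant producer, sketched below as an assumption rather than the article's code, uses put() so that no line is dropped even with a bounded queue, and enqueues an end-of-input sentinel (often called a poison pill) after the last line so that a consumer can tell when the input is exhausted. The class name and sentinel value are hypothetical; the sentinel contains a newline, which readLine() never returns, so it can't collide with a real line:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;

class PillFileProducer implements Runnable {
    // Hypothetical sentinel: readLine() strips line terminators, so no real
    // line from the file can ever be equal to this value
    static final String END_OF_INPUT = "\n<end-of-input>\n";

    private final BlockingQueue<String> queue;
    private final Path input;

    PillFileProducer(BlockingQueue<String> queue, Path input) {
        this.queue = queue;
        this.input = input;
    }

    @Override
    public void run() {
        try {
            try (BufferedReader reader = Files.newBufferedReader(input)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    queue.put(line); // waits for space instead of dropping the line
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            // Always signal the end, even after an I/O error, so a consumer can stop
            queue.put(END_OF_INPUT);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```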

5.2. Create FileConsumer

Following that, we introduce the FileConsumer class, which represents the consumer thread tasked with retrieving lines from the queue and writing them into an output file. This class accepts a BlockingQueue as input for receiving lines from the producer thread:

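import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.BlockingQueue;

class FileConsumer implements Runnable {
    private final BlockingQueue<String> queue;
    private final String outputFileName;

    public FileConsumer(BlockingQueue<String> queue, String outputFileName) {
        this.queue = queue;
        this.outputFileName = outputFileName;
    }

    // ...
}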

Next, in the run() method, we wrap a FileWriter in a BufferedWriter to write to the output file efficiently:

@Override
public void run() {
    try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputFileName))) {
        String line;
        // ...
    } catch (IOException e) {
        e.printStackTrace();
    }
}

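After we open the output file, the code enters a loop that uses the poll() method to retrieve lines from the queue, writes each line to the file, and terminates when poll() returns null. Keep in mind that poll() returns null whenever the queue is empty at that moment, not only once the producer has finished: if the consumer starts before the producer has queued its first line, or catches up with it, the loop ends early and the remaining lines never reach the output file. Having the consumer block in take() until the producer sends an explicit end-of-input marker avoids this race. Here is the poll()-based loop: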

while ((line = queue.poll()) != null) {
    writer.write(line);
    writer.newLine();
}
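
To avoid stopping on a queue that is only momentarily empty, a consumer can block in take() and stop only when it receives an end marker that the producer enqueues after its last line. A minimal sketch follows; the class name and sentinel value are hypothetical, and the sentinel must match whatever value the producer actually sends:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;

class PillFileConsumer implements Runnable {
    // Hypothetical sentinel; must be the same value the producer enqueues last
    static final String END_OF_INPUT = "\n<end-of-input>\n";

    private final BlockingQueue<String> queue;
    private final Path output;

    PillFileConsumer(BlockingQueue<String> queue, Path output) {
        this.queue = queue;
        this.output = output;
    }

    @Override
    public void run() {
        try (BufferedWriter writer = Files.newBufferedWriter(output)) {
            while (true) {
                String line = queue.take(); // blocks while the queue is empty
                if (END_OF_INPUT.equals(line)) {
                    break; // the producer has signalled that it is done
                }
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```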

5.3. Orchestrator of Threads

Finally, we wrap everything together within the main program. First, we create a LinkedBlockingQueue instance to serve as the intermediary for lines between the producer and consumer threads. This queue establishes a synchronized channel for communication and coordination.

BlockingQueue<String> queue = new LinkedBlockingQueue<>();

Next, we create two threads: a FileProducer thread that reads lines from the input file and adds them to the queue, and a FileConsumer thread that retrieves lines from the queue and writes them to the output file:

String fileName = "input.txt";
String outputFileName = "output.txt";

Thread producerThread = new Thread(new FileProducer(queue, fileName));
Thread consumerThread = new Thread(new FileConsumer(queue, outputFileName));

Then we start both threads with the start() method and call join() on each of them, so the main thread waits for both to finish their work before the program exits:

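producerThread.start();
consumerThread.start();

try {
    // wait for both threads to finish
    producerThread.join();
    consumerThread.join();
} catch (InterruptedException e) {
    // restore the interrupt flag so callers can still see it
    Thread.currentThread().interrupt();
    e.printStackTrace();
}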

Now, let’s create an input file and then run the program:

Hello,
Baeldung!
Nice to meet you!

After running the program, we can inspect the output file. Provided the consumer never finds the queue empty before the producer has queued every line, the output file contains the same lines as the input file:

Hello,
Baeldung!
Nice to meet you!

In the provided example, the producer is adding lines to the queue in a loop, and the consumer is retrieving lines from the queue in a loop. This means multiple lines can be in the queue simultaneously, and the consumer may process lines from the queue even as the producer is still adding more lines.
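
Because an unbounded queue lets the producer run arbitrarily far ahead of a slow consumer, a bounded queue is a common way to cap how many items wait in memory at once. In the sketch below (hypothetical names, numbers instead of lines), an ArrayBlockingQueue holds at most capacity items: put() blocks while the queue is full and take() blocks while it's empty. Since this consumer knows how many items to expect, it needs no end marker:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedHandOff {

    // Hypothetical helper: streams `count` numbered items through a queue that
    // holds at most `capacity` of them and returns what the consumer received.
    static List<Integer> transfer(int count, int capacity) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        List<Integer> received = new ArrayList<>(); // only the consumer thread adds to it

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    queue.put(i); // blocks while `capacity` items are waiting
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                // the consumer knows how many items to expect, so no end marker is needed
                for (int i = 0; i < count; i++) {
                    received.add(queue.take()); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join(); // also makes `received` safely visible to this thread
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> result = transfer(1_000, 8);
        System.out.println(result.size() + " items, last = " + result.get(result.size() - 1));
    }
}
```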

6. Conclusion

In this article, we've explored how to use separate threads for efficient file handling in Java. We also demonstrated how to use a BlockingQueue to process files line by line in a synchronized and efficient way.

As always, the source code for the examples is available over on GitHub.
