Why wait() Requires Synchronization?

Last modified: October 23, 2023

1. Introduction

Java provides the wait()/notify() API as one of the ways to coordinate threads. To call any of its methods, the current thread must own the monitor of the object the method is called on (the callee).

In this tutorial, we’ll explore the reasons why this requirement makes sense.

2. How wait() Works

First, let's briefly look at how wait() works. According to the Java Language Specification (JLS), every object has an associated monitor, which means we can synchronize on any object we like. This was arguably not the best design decision, but it is what we have.

With that in mind, calling wait() implicitly does two things. First, the current thread is placed into the JVM-internal wait set of this object's monitor. Second, once the thread is in the wait set, the JVM releases the synchronization lock on this object. To be clear, "this object" here means the object on which we call wait().

The current thread then stays in the wait set until another thread calls notify() or notifyAll() on the same object. Before wait() returns, the thread must reacquire the monitor.
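The release of the monitor can be observed directly. The following is a minimal sketch, not taken from the article's repository: the class name WaitReleasesMonitorDemo, the ready flag, and the CountDownLatch are assumptions introduced only for illustration. The main thread can enter its own synchronized block only because the waiting thread gave up the monitor inside wait().

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; names are illustrative, not from the article.
final class WaitReleasesMonitorDemo {
    private final Object lock = new Object();
    private boolean ready; // guarded by lock

    /** Returns true if the waiting thread was woken up and finished. */
    static boolean run() {
        WaitReleasesMonitorDemo demo = new WaitReleasesMonitorDemo();
        CountDownLatch waiterOwnsMonitor = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            synchronized (demo.lock) {
                waiterOwnsMonitor.countDown(); // the waiter owns the monitor here
                while (!demo.ready) {
                    try {
                        demo.lock.wait(); // joins the wait set and releases the monitor
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        });
        waiter.start();

        try {
            waiterOwnsMonitor.await();
            // The waiter does not leave its synchronized block before "ready" is set,
            // so this block can be entered only because wait() released the monitor.
            synchronized (demo.lock) {
                demo.ready = true;
                demo.lock.notifyAll();
            }
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("waiter finished: " + run());
    }
}
```

The latch is used only to make the ordering deterministic for the demo; it plays no role in the wait()/notify() protocol itself.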

3. Why Is Monitor Ownership Required?

In the previous section, we saw that the second thing the JVM does is release the synchronization lock on the object. To release the lock, the thread obviously has to own it first. The deeper reason is also simple: wait() must be called under synchronization to avoid the lost wake-up problem, in which a waiting thread misses the notify signal. It is typically caused by a race condition between threads. Let's emulate the problem with an example.

Suppose we have the following Java code:

private volatile boolean jobIsDone;

private Object lock = new Object();

public void ensureCondition() {
    while (!jobIsDone) {
        try {
            lock.wait();
        } 
        catch (InterruptedException e) {
            // ...
        }
    }
}

public void complete() {
    jobIsDone = true;
    lock.notify();
}

A quick note: this code fails at runtime with IllegalMonitorStateException because neither method acquires the monitor of the lock object before calling wait() or notify(). It is meant purely for demonstration.
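The fail-fast behavior is easy to reproduce in isolation. Here is a small sketch, assuming a throwaway class named MonitorOwnershipDemo (a hypothetical name, as are its method names), that calls wait() and notify() with and without owning the monitor:

```java
// Hypothetical demo class; names are illustrative, not from the article.
final class MonitorOwnershipDemo {

    /** Returns true if notify() outside a synchronized block throws. */
    static boolean notifyWithoutMonitorThrows() {
        Object lock = new Object();
        try {
            lock.notify(); // the current thread does not own lock's monitor
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    /** Returns true if wait() outside a synchronized block throws. */
    static boolean waitWithoutMonitorThrows() {
        Object lock = new Object();
        try {
            lock.wait(); // rejected before the thread would start waiting
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Returns true if notify() inside a synchronized block completes normally. */
    static boolean notifyWithMonitorSucceeds() {
        Object lock = new Object();
        synchronized (lock) {
            lock.notify(); // legal; no thread is waiting, so nothing is woken
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("notify() without monitor throws: " + notifyWithoutMonitorThrows());
        System.out.println("wait() without monitor throws:   " + waitWithoutMonitorThrows());
        System.out.println("notify() with monitor succeeds:  " + notifyWithMonitorSucceeds());
    }
}
```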

Also, let's assume we have two threads. Thread B does the useful work and, once it is done, calls complete() to signal completion. Thread A waits for B's job to finish and checks the condition by calling ensureCondition(). The check happens in a loop because wait() may return without a matching notify() (a spurious wake-up), but that is another topic.

4. The Problem of the Lost Wake-up

Let's break down the example step by step, imagining for a moment that wait() and notify() could be called without owning the monitor. Assume thread A calls ensureCondition() and enters the while loop. It checks the condition, sees that jobIsDone is false, and enters the try block. Because we are in a multithreaded environment, thread B can enter complete() at the same time. B can therefore set the volatile flag jobIsDone to true and call notify() before thread A calls wait().

In this case, the notification is lost: when A finally calls wait(), nobody is left to wake it up. If thread B never enters complete() again, thread A waits forever, and all resources associated with it live forever too. This can lead not only to deadlocks, if thread A happens to hold another lock, but also to memory leaks: thread A is still considered alive and could resume execution, so the GC is not allowed to collect objects reachable from A's stack frames.
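The root of the problem is that a notification is not remembered: notify() affects only threads that are already in the wait set. The sketch below illustrates just that property; it does not reproduce the race itself. The class name NotifyIsNotRememberedDemo and the method waitAfterEarlyNotify are hypothetical, and the wait is timed and deliberately not guarded by a condition loop so that the demo terminates. A spurious wake-up could, in principle, shorten the measured time.

```java
// Hypothetical demo class; names are illustrative, not from the article.
final class NotifyIsNotRememberedDemo {

    /**
     * Calls notify() while no thread is waiting, then waits with a timeout.
     * Returns how long the wait blocked, in milliseconds.
     */
    static long waitAfterEarlyNotify(long timeoutMillis) {
        Object lock = new Object();

        synchronized (lock) {
            lock.notify(); // the wait set is empty, so this wakes no one
        }

        long start = System.nanoTime();
        synchronized (lock) {
            try {
                // The earlier notify() is not "stored"; this wait is expected to
                // end through the timeout. An untimed wait() would block forever.
                lock.wait(timeoutMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("blocked for about " + waitAfterEarlyNotify(200) + " ms");
    }
}
```

This is why the waiting side must check a shared condition such as jobIsDone, and why the check and the call to wait() have to happen under the same lock as the update and the call to notify().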

5. Solution

To avoid this condition, we need synchronization: the caller must own the monitor of the callee before calling wait() or notify(). Let's rewrite the code with that in mind:

private volatile boolean jobIsDone;
private final Object lock = new Object();

public void ensureCondition() {
    synchronized (lock) {
        while (!jobIsDone) {
            try {
                lock.wait();
            } 
            catch (InterruptedException e) { 
                // ...
            }
        }
    }
}

public void complete() {
    synchronized (lock) {
        jobIsDone = true;
        lock.notify();
    }
}

Here, we added synchronized blocks that acquire the monitor of the lock object before invoking wait() or notify(). This prevents the lost wake-up. Thread B can execute the body of complete() only while A does not own the monitor, and A owns it from the moment it checks the condition until wait() releases it. So B cannot slip in between A's check and A's call to wait(): either B runs first and A sees jobIsDone set to true and never waits, or A is already in the wait set when B calls notify().
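To see both orderings terminate, the synchronized version can be wrapped in a small harness. This is a sketch under stated assumptions: the class name JobTracker and the helper waiterTerminates are hypothetical, ensureCondition() propagates InterruptedException instead of swallowing it, and the sleep only makes it likely that thread A reaches wait() first; correctness does not depend on it.

```java
// Hypothetical wrapper class; the two instance methods mirror the article's
// synchronized example, the harness around them is an illustration only.
final class JobTracker {
    private volatile boolean jobIsDone;
    private final Object lock = new Object();

    void ensureCondition() throws InterruptedException {
        synchronized (lock) {
            while (!jobIsDone) {
                lock.wait(); // releases the monitor while waiting
            }
        }
    }

    void complete() {
        synchronized (lock) {
            jobIsDone = true;
            lock.notify();
        }
    }

    /** Runs thread A (the waiter) against complete(); returns true if A terminates. */
    static boolean waiterTerminates(boolean completeFirst) {
        JobTracker tracker = new JobTracker();
        Thread a = new Thread(() -> {
            try {
                tracker.ensureCondition();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            if (completeFirst) {
                tracker.complete(); // the job finishes before A checks the condition
                a.start();
            } else {
                a.start();
                Thread.sleep(100);  // give A a chance to reach wait() first
                tracker.complete();
            }
            a.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !a.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("complete() first: " + waiterTerminates(true));
        System.out.println("wait() first:     " + waiterTerminates(false));
    }
}
```

In the first ordering A never calls wait() because the condition is already true; in the second, A is in the wait set when notify() arrives.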

6. Conclusion

In this article, we discussed why the Java wait() method requires synchronization. The caller must own the monitor of the callee to avoid the lost wake-up anomaly. If it does not, the JVM takes a fail-fast approach and throws IllegalMonitorStateException.

As always, the source code for these examples is available on GitHub.