1. Introduction
Today, it’s not uncommon for applications to serve thousands or even millions of users concurrently. Such applications need enormous amounts of memory. However, managing all that memory may easily impact application performance.
To address this issue, Java 11 introduced the Z Garbage Collector (ZGC) as an experimental garbage collector (GC) implementation.
In this tutorial, we’ll see how ZGC manages to keep pause times low, even on multi-terabyte heaps.
2. Main Concepts
To understand how ZGC works, we need to understand the basic concepts and terminology behind memory management and garbage collectors.
2.1. Memory Management
Physical memory is the RAM that our hardware provides.
The operating system (OS) allocates virtual memory space for each application.
Of course, virtual memory is ultimately backed by physical memory, and the OS is responsible for maintaining the mapping between the two. This mapping usually involves hardware acceleration.
2.2. Multi-Mapping
Multi-mapping means that there are specific addresses in the virtual memory that point to the same address in physical memory. Since applications access data through virtual memory, they know nothing about this mechanism (and they don’t need to).
Effectively, we map multiple ranges of the virtual memory to the same range in the physical memory.
At first glance, its use cases aren’t obvious, but we’ll see later that ZGC needs it to do its magic. Also, virtual memory provides some security because it separates the memory spaces of the applications.
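To get a feel for multi-mapping from plain Java, we can map the same region of a file twice. This is only an analogy and not how ZGC sets up its heap: the class name `MultiMappingDemo` and the temporary file are made up for this sketch, and we assume a mainstream OS where both mappings of a file region share the same underlying pages.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MultiMappingDemo {

    // Maps the same 4 KB file region twice and reports whether a write made
    // through the first view can be read through the second view.
    static boolean writeIsVisibleThroughSecondMapping() {
        try {
            Path file = Files.createTempFile("multi-mapping", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Two distinct virtual address ranges over the same file region
                MappedByteBuffer viewA = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                MappedByteBuffer viewB = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);

                viewA.put(0, (byte) 42);      // write through the first view
                return viewB.get(0) == 42;    // read through the second view
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeIsVisibleThroughSecondMapping());
    }
}
```

The two buffers are different objects at different virtual addresses, yet they expose the same data, which is the property ZGC relies on.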
2.3. Relocation
Since we use dynamic memory allocation, the memory of an average application becomes fragmented over time. This is because when we free up an object in the middle of the memory, a gap of free space remains there. Over time, these gaps accumulate, and our memory will look like a chessboard made of alternating areas of free and used space.
Of course, we could try to fill these gaps with new objects. To do this, we have to scan the memory for free space that’s big enough to hold our object. This is an expensive operation, especially if we have to do it each time we want to allocate memory. Besides, the memory will still be fragmented, since we probably won’t be able to find a free space that has the exact size we need. Therefore, there will still be gaps between the objects, although smaller ones. We can try to minimize these gaps, but that takes even more processing power.
The other strategy is to frequently relocate objects from fragmented memory areas to free areas in a more compact format. To be more effective, we split the memory space into blocks. We relocate all objects in a block or none of them. This way, memory allocation will be faster since we know there are whole empty blocks in the memory.
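The block-based relocation idea can be sketched in a few lines. This is a toy model, not ZGC code: a block is an `int` array, `0` stands for free space, and the names `CompactionSketch` and `relocate` are made up for this illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class CompactionSketch {

    static final int FREE = 0;

    // Copies every live cell of a fragmented block into an empty target block,
    // packing them tightly, and returns the mapping old index -> new index.
    static Map<Integer, Integer> relocate(int[] source, int[] target) {
        Map<Integer, Integer> forwarding = new LinkedHashMap<>();
        int next = 0;
        for (int i = 0; i < source.length; i++) {
            if (source[i] != FREE) {
                target[next] = source[i];
                forwarding.put(i, next);
                next++;
            }
        }
        Arrays.fill(source, FREE); // the whole source block is empty again
        return forwarding;
    }

    public static void main(String[] args) {
        int[] fragmented = {7, 0, 3, 0, 0, 9, 0, 5};
        int[] empty = new int[8];
        Map<Integer, Integer> forwarding = relocate(fragmented, empty);
        System.out.println(Arrays.toString(empty)); // [7, 3, 9, 5, 0, 0, 0, 0]
        System.out.println(forwarding);             // {0=0, 2=1, 5=2, 7=3}
    }
}
```

Because we move all live objects out of the source block, the entire block becomes available for new allocations, and no search for a fitting gap is needed.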
2.4. Garbage Collection
When we create a Java application, we don’t have to free the memory we allocated, because garbage collectors do it for us. In summary, a GC tracks which objects we can reach from our application through a chain of references and frees up the ones we can’t reach.
A GC needs to track the state of the objects in the heap space to do its work. For example, a possible state is reachable. It means the application holds a reference to the object. This reference might be transitive. The only thing that matters is that the application can access these objects through references. Another example is finalizable: objects that the application can no longer access, except through a finalizer. Once finalization is done, these are the objects we consider garbage.
To achieve this, garbage collectors have multiple phases.
2.5. GC Phase Properties
GC phases can have different properties:
- a parallel phase can run on multiple GC threads
- a serial phase runs on a single thread
- a stop-the-world phase can’t run concurrently with application code
- a concurrent phase can run in the background, while our application does its work
- an incremental phase can terminate before finishing all of its work and continue it later
Note that all of the above techniques have their strengths and weaknesses. For example, let’s say we have a phase that can run concurrently with our application. A serial implementation of this phase requires 1% of the overall CPU performance and runs for 1000ms. In contrast, a parallel implementation utilizes 30% of CPU and completes its work in 50ms.
In this example, the parallel solution uses more CPU overall, because it may be more complex and has to synchronize the threads. For CPU-heavy applications (for example, batch jobs), this is a problem, since we have less computing power left to do useful work.
Of course, the numbers in this example are made up. However, it’s clear that every application has its own characteristics, so applications have different GC requirements.
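To make the comparison concrete, we can multiply each implementation’s CPU share by its duration. The helper below is just arithmetic on the made-up numbers from the example; `GcPhaseCost` and `cpuMillis` are names invented for this sketch.

```java
class GcPhaseCost {

    // CPU work of a phase, expressed as milliseconds of the machine's full
    // CPU capacity: (percentage of CPU used) * (wall-clock duration) / 100.
    static double cpuMillis(int cpuPercent, int durationMillis) {
        return cpuPercent * durationMillis / 100.0;
    }

    public static void main(String[] args) {
        double serial = cpuMillis(1, 1000);   // 1% of the CPU for 1000 ms
        double parallel = cpuMillis(30, 50);  // 30% of the CPU for 50 ms
        System.out.println("serial:   " + serial);   // 10.0
        System.out.println("parallel: " + parallel); // 15.0
    }
}
```

With these numbers, the parallel phase finishes 20 times sooner but consumes 1.5 times as much CPU work in total, which is the trade-off described above.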
For more detailed descriptions, please visit our article on Java memory management.
3. ZGC Concepts
ZGC intends to keep stop-the-world phases as short as possible. It achieves this in such a way that the duration of these pauses doesn’t increase with the heap size. These characteristics make ZGC a good fit for server applications, where large heaps are common and fast application response times are a requirement.
On top of the tried and tested GC techniques, ZGC introduces new concepts, which we’ll cover in the following sections.
But for now, let’s take a look at the overall picture of how ZGC works.
3.1. Big Picture
ZGC has a phase called marking, where we find the reachable objects. A GC can store object state information in multiple ways. For example, we could create a Map, where the keys are memory addresses, and the value is the state of the object at that address. It’s simple but needs additional memory to store this information. Also, maintaining such a map can be challenging.
ZGC uses a different approach: it stores the reference state in the bits of the reference itself. This is called reference coloring. But this approach brings a new challenge. Setting bits of a reference to store metadata about an object means that multiple references can point to the same object, since the state bits don’t hold any information about the location of the object. Multi-mapping to the rescue!
We also want to decrease memory fragmentation. ZGC uses relocation to achieve this. But with a large heap, relocation is a slow process. Since ZGC doesn’t want long pause times, it does most of the relocating concurrently with the application. But this introduces a new problem.
Let’s say we have a reference to an object. ZGC relocates it, and a context switch occurs, where the application thread runs and tries to access this object through its old address. ZGC uses load barriers to solve this. A load barrier is a piece of code that runs when a thread loads a reference from the heap – for example, when we access a non-primitive field of an object.
In ZGC, load barriers check the metadata bits of the reference. Depending on these bits, ZGC may perform some processing on the reference before we get it. Therefore, it might produce an entirely different reference. We call this remapping.
3.2. Marking
ZGC breaks marking into three phases.
The first phase is a stop-the-world phase. In this phase, we look for root references and mark them. Root references are the starting points to reach objects in the heap, for example, local variables or static fields. Since the number of root references is usually small, this phase is short.
The next phase is concurrent. In this phase, we traverse the object graph, starting from the root references. We mark every object we reach. Also, when a load barrier detects an unmarked reference, it marks it too.
The last phase is also a stop-the-world phase to handle some edge cases, like weak references.
At this point, we know which objects we can reach.
ZGC uses the marked0 and marked1 metadata bits for marking.
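The core of marking is a graph traversal that starts from the roots. The sketch below shows only that idea: it’s single-threaded, it models the heap as a map from object names to the names they reference, and it keeps the marks in a separate set instead of in reference bits. The class `MarkingSketch` and the sample graph are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MarkingSketch {

    // Returns every object that can be reached from the roots by following
    // references, directly or transitively.
    static Set<String> mark(List<String> roots, Map<String, List<String>> references) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (marked.add(current)) { // true only the first time we see it
                pending.addAll(references.getOrDefault(current, List.of()));
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("c"),
                "d", List.of("a")); // nothing points to d
        Set<String> reachable = mark(List.of("a"), heap);
        System.out.println(new TreeSet<>(reachable)); // [a, b, c]
    }
}
```

Object `d` holds a reference to `a`, but since nothing reachable points to `d`, it stays unmarked.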
3.3. Reference Coloring
A reference represents the position of a byte in the virtual memory. However, we don’t necessarily have to use all bits of a reference to do that – some bits can represent properties of the reference. That’s what we call reference coloring.
With 32 bits, we can address 4 gigabytes. Since it’s common nowadays for a computer to have more memory than this, we can’t spare any of these 32 bits for coloring. Therefore, ZGC uses 64-bit references, which means ZGC is only available on 64-bit platforms.
ZGC references use 42 bits to represent the address itself. As a result, ZGC references can address 4 terabytes of memory space.
On top of that, we have 4 bits to store reference states:
- finalizable bit – the object is only reachable through a finalizer
- remap bit – the reference is up to date and points to the current location of the object (see relocation)
- marked0 and marked1 bits – these are used to mark reachable objects
We also call these bits metadata bits. In ZGC, precisely one of these metadata bits is 1.
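We can model a colored reference as a Java `long` to see how the address and the metadata bits share one value. Note that the exact bit positions below are an assumption chosen for illustration (the four metadata bits placed directly above the 42 address bits), and `ColoredReference` is a made-up helper, not a JDK class.

```java
class ColoredReference {

    static final int ADDRESS_BITS = 42;
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;

    // Assumed layout for this sketch: metadata bits sit right above the address.
    static final long MARKED0 = 1L << 42;
    static final long MARKED1 = 1L << 43;
    static final long REMAPPED = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    // Strips the metadata bits and keeps only the location of the object.
    static long address(long reference) {
        return reference & ADDRESS_MASK;
    }

    // Returns the same address with exactly one metadata bit set.
    static long withColor(long reference, long color) {
        return address(reference) | color;
    }

    static boolean hasColor(long reference, long color) {
        return (reference & color) != 0;
    }

    public static void main(String[] args) {
        // 42 address bits cover 2^42 bytes, which is 4 TB
        System.out.println(ADDRESS_MASK + 1); // 4398046511104

        long marked = withColor(0x1000L, MARKED0);
        long remapped = withColor(marked, REMAPPED);
        System.out.println(marked != remapped);                    // true
        System.out.println(address(marked) == address(remapped));  // true
    }
}
```

The last two lines show why multi-mapping is needed: the two values differ as numbers, yet they must lead to the same object.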
3.4. Relocation
In ZGC, relocation consists of the following phases:
- A concurrent phase, which looks for blocks we want to relocate and puts them in the relocation set.
- A stop-the-world phase, which relocates all objects in the relocation set that root references point to and updates those references.
- A concurrent phase, which relocates all remaining objects in the relocation set and stores the mapping between the old and new addresses in the forwarding table.
- The rewriting of the remaining references happens in the next marking phase. This way, we don’t have to traverse the object graph twice. Alternatively, load barriers can do it as well.
3.5. Remapping and Load Barriers
Note that in the relocation phase, we didn’t rewrite most of the references to the relocated addresses. Therefore, using those references, we wouldn’t access the objects we wanted to. Even worse, we could access garbage.
ZGC uses load barriers to solve this issue. Load barriers fix the references pointing to relocated objects with a technique called remapping.
When the application loads a reference, it triggers the load barrier, which then takes the following steps to return the correct reference:
- First, we check whether the remap bit is set to 1. If so, the reference is up to date, so we can safely return it.
- Then we check whether the referenced object was in the relocation set. If it wasn’t, that means we didn’t want to relocate it. To avoid this check the next time we load this reference, we set the remap bit to 1 and return the updated reference.
- Now we know that the object we want to access was the target of relocation. The only question is whether the relocation has already happened. If the object has been relocated, we skip to the next step. Otherwise, we relocate it now and create an entry in the forwarding table, which stores the new address for each relocated object. After this, we continue with the next step.
- Now we know that the object was relocated, either by ZGC, by us in the previous step, or by the load barrier during an earlier hit of this object. We update this reference to the new location of the object (either with the address from the previous step or by looking it up in the forwarding table), set the remap bit, and return the reference.
And that’s it. With the steps above, we ensure that each time we try to access an object, we get the most recent reference to it. Because every reference load triggers the load barrier, it decreases application performance, especially the first time we access a relocated object. But this is the price we have to pay if we want short pause times. And since these steps are relatively fast, the impact on application performance isn’t significant.
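The four load barrier steps can be sketched as a small simulation. Everything here is a simplified model under stated assumptions: references are `long` values, the remap bit sits at an assumed position, the relocation set holds individual addresses rather than blocks, and "relocating" an object just hands out the next free address. `LoadBarrierSketch` and its members are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LoadBarrierSketch {

    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long REMAPPED = 1L << 44; // assumed bit position

    final Set<Long> relocationSet = new HashSet<>();          // old addresses picked for relocation
    final Map<Long, Long> forwardingTable = new HashMap<>();  // old address -> new address
    long nextFreeAddress = 0x10_0000L;                        // hypothetical free area

    long loadBarrier(long reference) {
        // Step 1: the remap bit is set, so the reference is already up to date
        if ((reference & REMAPPED) != 0) {
            return reference;
        }
        long address = reference & ADDRESS_MASK;

        // Step 2: the object wasn't selected for relocation, so only set the bit
        if (!relocationSet.contains(address)) {
            return address | REMAPPED;
        }

        // Step 3: relocate the object now unless somebody already did
        Long newAddress = forwardingTable.get(address);
        if (newAddress == null) {
            newAddress = relocate(address);
            forwardingTable.put(address, newAddress);
        }

        // Step 4: hand back a reference to the new location with the remap bit set
        return newAddress | REMAPPED;
    }

    // Stand-in for copying the object: pretends every object takes 16 bytes.
    long relocate(long oldAddress) {
        long newAddress = nextFreeAddress;
        nextFreeAddress += 16;
        return newAddress;
    }

    public static void main(String[] args) {
        LoadBarrierSketch gc = new LoadBarrierSketch();
        gc.relocationSet.add(0x2000L);

        long healed = gc.loadBarrier(0x2000L);       // stale reference gets relocated
        System.out.println(Long.toHexString(healed & ADDRESS_MASK)); // 100000
        System.out.println(gc.loadBarrier(0x2000L) == healed);       // true, via forwarding table
        System.out.println(gc.loadBarrier(healed) == healed);        // true, fast path
    }
}
```

The second and third calls show why the cost is highest on the first access: afterwards, the barrier either finds the entry in the forwarding table or returns immediately because the remap bit is set.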
4. How to Enable ZGC?
We can enable ZGC with the following command-line options when running our application:
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC
Note that ZGC was an experimental GC when it was introduced in Java 11, which is why we need the -XX:+UnlockExperimentalVMOptions flag. Since JDK 15, ZGC is a production feature, and -XX:+UseZGC alone is enough.
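To check which collector the JVM actually picked up, we can list the garbage collector MXBeans at runtime. This is a small sketch using the standard `java.lang.management` API; the class name `ActiveCollectors` is made up, and the exact bean names we’ll see depend on the JDK version and the selected GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

class ActiveCollectors {

    // Returns the names of the garbage collectors registered in this JVM.
    static List<String> names() {
        return ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Run with the ZGC options above and compare with a run without them
        System.out.println(names());
    }
}
```

If the printed names don’t mention ZGC after we passed the options, the JVM most likely rejected or ignored them, and the startup output is the next place to look.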
5. Conclusion
In this article, we saw that ZGC intends to support large heap sizes with low application pause times.
To reach this goal, it uses techniques including colored 64-bit references, load barriers, relocation, and remapping.