1. Introduction
In this tutorial, we’ll look at Thread-Local Allocation Buffers (TLABs). We’ll see what they are, how the JVM uses them, and how we can manage them.
2. Memory Allocation in Java
Certain commands in Java will allocate memory. The most obvious is the new keyword, but there are others – for example, using reflection.
Whenever we do this, the JVM must set aside some memory for the new object on the heap. Specifically, the JVM performs all of these allocations in the Eden, or Young, space.
In a single-threaded application, this is easy. Since only a single memory allocation request can happen at any time, the thread can simply grab the next block of a suitable size, and we’re done.
However, in a multi-threaded application, we can’t do things quite so simply. If we do, then there’s the risk that two threads will request memory at the exact same instant and will both be given the exact same block.
To avoid this, we synchronize memory allocations so that two threads cannot request the same memory block simultaneously. However, synchronizing all memory allocations will make them essentially single-threaded, which can be a huge bottleneck in our application.
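To make this concrete, here’s a toy bump-pointer allocator. The synchronized keyword models the lock the JVM would need if every allocation in a shared Eden space had to be coordinated between threads. The class and field names are purely illustrative, not HotSpot internals:

```java
// A toy shared bump-pointer allocator. Every allocation takes the same
// lock, so under many threads all allocations serialize at this point.
class SharedBumpAllocator {
    private final byte[] heap; // stand-in for the Eden space
    private int top = 0;       // next free offset

    SharedBumpAllocator(int size) {
        this.heap = new byte[size];
    }

    // Grab the next block of a suitable size. The lock prevents two
    // threads from being handed the same block at the same instant.
    synchronized int allocate(int bytes) {
        if (top + bytes > heap.length) {
            throw new OutOfMemoryError("toy heap exhausted");
        }
        int start = top;
        top += bytes;
        return start; // the "address" of the new object
    }
}
```

With two threads hammering allocate(), each call still gets a distinct block, but only because every call waits its turn on the lock.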
3. Thread-Local Allocation Buffers
The JVM addresses this concern using Thread-Local Allocation Buffers, or TLABs. These are areas of heap memory that are reserved for a given thread and are used only by that thread to allocate memory.
By working in this way, no synchronization is necessary since only a single thread can pull from this buffer. The buffer itself is allocated in a synchronized manner, but this is a much less frequent operation.
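We can sketch the idea in the same toy style: each thread reserves a large chunk under a lock once, then bumps a thread-private pointer with no locking on the common path. Again, all names and sizes here are illustrative, not how HotSpot actually implements TLABs:

```java
// A toy TLAB scheme. Refilling a thread's buffer is synchronized (rare);
// individual allocations are a plain bump of a thread-private pointer.
class TlabAllocator {
    private static final int TLAB_SIZE = 1024;
    private final Object lock = new Object();
    private int nextChunk = 0; // next free offset in the shared "heap"

    // Each thread's private buffer, stored as {start, top, end}.
    private final ThreadLocal<int[]> tlab = ThreadLocal.withInitial(() -> {
        synchronized (lock) { // the infrequent, synchronized refill
            int start = nextChunk;
            nextChunk += TLAB_SIZE;
            return new int[] { start, start, start + TLAB_SIZE };
        }
    });

    // The common path: no lock, just bump this thread's own pointer.
    int allocate(int bytes) {
        int[] buf = tlab.get();
        if (buf[1] + bytes > buf[2]) {
            throw new OutOfMemoryError("toy TLAB exhausted");
        }
        int addr = buf[1];
        buf[1] += bytes;
        return addr;
    }
}
```

Two threads calling allocate() concurrently never touch each other’s buffers, so the hot path needs no coordination at all.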
Since allocating memory for objects is a relatively common occurrence, this can be a huge performance improvement. But how much, exactly? We can determine this easily enough with a simple test:
@Test
public void testAllocations() {
    long start = System.currentTimeMillis();

    List<Object> objects = new ArrayList<>();
    for (int i = 0; i < 1_000_000; ++i) {
        objects.add(new Object());
    }
    Assertions.assertEquals(1_000_000, objects.size());

    long end = System.currentTimeMillis();
    System.out.println((end - start) + "ms");
}
This is a relatively simple test, but it does the job. We allocate memory for 1,000,000 new Object instances and record how long it takes. We can then run this a number of times, both with and without TLAB, and compare the average times (we’ll see in section 5 how to turn TLAB off).
We can clearly see the difference. The average time with TLAB is 33 ms, and the average without goes up to 110 ms. That’s an increase of roughly 230%, just by changing this one setting.
3.1. Running out of TLAB Space
Obviously, our TLAB space is finite. So, what happens when we run out?
If our application tries to allocate space for a new object and the TLAB doesn’t have enough available, the JVM has four possible options:
- It can allocate a new amount of TLAB space for this thread, effectively increasing the amount available.
- It can allocate the memory for this object from outside of TLAB space.
- It can attempt to free up some memory using the garbage collector.
- It can fail to allocate the memory and, instead, throw an error.
The fourth option is our catastrophic case, so we want to avoid it wherever possible, but it remains the fallback if none of the others is feasible.
The JVM uses a number of complicated heuristics to determine which of the other options to use, and these heuristics may change between different JVMs and different versions. However, the most important details that feed into this decision include:
- The number of allocations that are likely in a period of time. If we’re likely to be allocating a lot of objects, then increasing TLAB space will be the more efficient choice. If we’re likely to be allocating very few objects, then increasing TLAB space might actually be less efficient.
- The amount of memory being requested. The more memory requested, the more expensive it’ll be to allocate this outside of the TLAB space.
- The amount of available memory. If the JVM has a lot of memory available, then increasing TLAB space is much easier than if the memory usage is very high.
- The amount of memory contention. If the JVM has a lot of threads that each need memory, then increasing TLAB space might be much more expensive than if there are very few threads.
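The real heuristics are JVM-internal and far more involved than anything we could show here, but a toy decision rule can at least capture the shape of the trade-off: requests larger than the whole buffer go outside the TLAB, and a nearly-full TLAB is only retired and refilled when the space we’d throw away is small enough. Everything below, the names, thresholds, and rule itself, is a hypothetical sketch, not HotSpot’s actual policy:

```java
// A toy refill-vs-outside-allocation decision. HotSpot's real heuristic
// adapts TLAB sizes from per-thread allocation rates between GCs; this
// sketch only illustrates the basic trade-off described above.
class RefillPolicy {
    enum Decision { ALLOCATE_IN_TLAB, REFILL_TLAB, ALLOCATE_OUTSIDE }

    static Decision decide(int requestBytes, int tlabFree,
                           int tlabSize, int refillWasteLimit) {
        if (requestBytes <= tlabFree) {
            return Decision.ALLOCATE_IN_TLAB; // fast path: it fits
        }
        if (requestBytes > tlabSize) {
            return Decision.ALLOCATE_OUTSIDE; // could never fit in a TLAB
        }
        // Retiring the TLAB discards tlabFree bytes. Only refill if that
        // waste is acceptable; otherwise allocate this one object outside.
        return tlabFree <= refillWasteLimit
                ? Decision.REFILL_TLAB
                : Decision.ALLOCATE_OUTSIDE;
    }
}
```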
3.2. TLAB Capacity
Using TLAB seems like a fantastic way to improve performance, but there are always costs. The synchronization needed to prevent multiple threads from allocating the same memory area makes the TLAB itself relatively expensive to allocate. We might also need to wait for sufficient memory to be available to allocate from in the first place if the JVM memory usage is especially high. As such, we ideally want to do this as infrequently as possible.
However, if a thread is allocated a larger amount of memory for its TLAB space than it needs, then this memory will just sit there unused and is essentially wasted. Worse, wasting this space makes it more difficult for other threads to obtain memory for TLAB space and can make the entire application slower overall.
As such, there’s a tension over exactly how much space to allocate. Allocate too much, and we’re wasting space. But allocate too little, and we’ll spend more time than we’d like allocating TLAB space.
Thankfully, the JVM will handle all of this for us, though we’ll soon see how we can tune it to our needs if necessary.
4. Seeing TLAB Usage
Now that we know what TLAB is and the impact it can have on our application, how can we see it in action?
Unfortunately, the jconsole tool doesn’t give any visibility into it as it does with the standard memory pools.
However, the JVM itself can output some diagnostic information. This uses the new unified GC logging mechanism, so we must launch the JVM with the -Xlog:gc+tlab=trace flag to see this information. This will then periodically print out information about the current TLAB usage by the JVM. For example, during a GC run, we might see something like:
[0.343s][trace][gc,tlab] GC(0) TLAB: gc thread: 0x000000014000a600 [id: 10499] desired_size: 450KB slow allocs: 4 refill waste: 7208B alloc: 0.99999 22528KB refills: 42 waste 1.4% gc: 161384B slow: 59152B
This tells us that, for this particular thread:
- The current TLAB size is 450 KB (desired_size).
- There have been four allocations outside of TLAB since the last GC (slow allocs).
Note that the exact logging will vary between JVMs and versions.
5. Tuning TLAB Settings
We’ve already seen what the impact can be from turning TLAB on and off, but what else can we do with it? There are a number of settings that we can adjust by providing JVM parameters when starting our application.
First, let’s actually see how to turn it off. This is done by passing the JVM parameter -XX:-UseTLAB. Setting this will stop the JVM from using TLAB and force it to use synchronization on every memory allocation.
We can also leave TLAB enabled but stop it from being resized by setting the JVM parameter -XX:-ResizeTLAB. Doing this will mean that if the TLAB for a given thread fills up, all future allocations will be outside TLAB and require synchronization.
We also have the ability to configure the size of TLAB. We can provide the JVM parameter -XX:TLABSize with a value to use. This defines the suggested initial size that the JVM should use for each TLAB, so it’s the size per thread to allocate. If this is set to 0 – which is the default – then the JVM will dynamically determine how much to allocate per thread based on the current state of the JVM.
We can also specify -XX:MinTLABSize to give a lower limit on what the TLAB size for each thread should be for cases where we’re allowing the JVM to dynamically determine the size. We also have -XX:MaxTLABSize as the upper limit on what the TLAB can grow to for each thread.
All of these settings have sensible defaults already, and it’s usually best to just use these, but if we find there are problems, we do have a level of control.
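If we want to check what values a running JVM has actually settled on, we can query the flags at runtime through HotSpotDiagnosticMXBean. This bean lives in the com.sun.management package, so it works on HotSpot-based JDKs but isn’t part of the standard Java API:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Reads the current values of TLAB-related flags from the running JVM.
class TlabSettings {

    static String readFlag(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption(name);
        return option.getValue();
    }

    public static void main(String[] args) {
        for (String flag : new String[] { "UseTLAB", "ResizeTLAB", "TLABSize", "MinTLABSize" }) {
            System.out.println(flag + " = " + readFlag(flag));
        }
    }
}
```

Running this with default settings prints the flag values the JVM is using, which is a quick way to confirm whether a -XX parameter we passed on the command line actually took effect.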
6. Summary
In this article, we’ve seen what Thread-Local Allocation Buffers are, how they’re used, and how we can manage them. Next time you have any performance issues with your application, consider if this could be something to investigate.