1. Overview
The JVM interprets and executes bytecode at runtime. In addition, it uses just-in-time (JIT) compilation to boost performance.
In earlier versions of Java, we had to manually choose between the two types of JIT compilers available in the HotSpot JVM. One is optimized for faster application start-up, while the other achieves better overall performance. Java 7 introduced tiered compilation to achieve the best of both worlds.
In this tutorial, we’ll look at the client and server JIT compilers. We’ll review tiered compilation and its five compilation levels. Finally, we’ll see how method compilation works by tracking the compilation logs.
2. JIT Compilers
A JIT compiler compiles the bytecode of frequently executed sections to native code. These sections are called hotspots, hence the name HotSpot JVM. As a result, Java can run with performance similar to that of a fully compiled language. Let’s look at the two types of JIT compilers available in the JVM.
2.1. C1 – Client Compiler
The client compiler, also called C1, is a type of JIT compiler optimized for faster start-up time. It tries to optimize and compile the code as soon as possible.
Historically, we used C1 for short-lived applications and applications where start-up time was an important non-functional requirement. Prior to Java 8, we had to specify the -client flag to use the C1 compiler. However, if we use Java 8 or higher, this flag will have no effect.
2.2. C2 – Server Compiler
The server compiler, also called C2, is a type of JIT compiler optimized for better overall performance. C2 observes and analyzes the code over a longer period of time than C1. This allows C2 to make better optimizations in the compiled code.
Historically, we used C2 for long-running server-side applications. Prior to Java 8, we had to specify the -server flag to use the C2 compiler. However, this flag will have no effect in Java 8 or higher.
We should note that the Graal JIT compiler has also been available since Java 10, as an experimental alternative to C2. Unlike C2, Graal can run in both just-in-time and ahead-of-time compilation modes to produce native code.
3. Tiered Compilation
The C2 compiler often takes more time and consumes more memory than C1 to compile the same methods. However, it generates better-optimized native code than C1 does.
The tiered compilation concept was first introduced in Java 7. Its goal was to use a mix of C1 and C2 compilers in order to achieve both fast startup and good long-term performance.
3.1. Best of Both Worlds
On application startup, the JVM initially interprets all bytecode and collects profiling information about it. The JIT compiler then makes use of the collected profiling information to find hotspots.
First, the JIT compiler compiles the frequently executed sections of code with C1 to quickly reach native code performance. Later, C2 kicks in when more profiling information is available. C2 recompiles the code with more aggressive and time-consuming optimizations to boost performance.
In summary, C1 improves performance faster, while C2 makes better performance improvements based on more information about hotspots.
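To make the warm-up effect tangible, we can time repeated batches of calls to a small hot method. The following is only a rough sketch: the class and method names are made up for illustration, and the timings depend on the machine and JVM. This isn't a rigorous benchmark, but later batches typically run faster than the first ones once the JIT compilers have kicked in.

```java
public class WarmUpDemo {

    // A small method that is called often enough to become a hotspot.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    // Runs one batch of calls and returns the elapsed time in microseconds.
    static long timeBatch(int calls) {
        long start = System.nanoTime();
        long checksum = 0;
        for (int i = 0; i < calls; i++) {
            checksum += sumOfSquares(1_000);
        }
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        // Read the checksum so the loop cannot be discarded as dead code.
        if (checksum == 42) {
            System.out.println("unexpected checksum");
        }
        return elapsedMicros;
    }

    public static void main(String[] args) {
        for (int batch = 1; batch <= 10; batch++) {
            System.out.printf("batch %2d: %d us%n", batch, timeBatch(20_000));
        }
    }
}
```

Running the same class with -Xint gives a point of comparison, since no batch gets compiled in that mode.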
3.2. Accurate Profiling
An additional benefit of tiered compilation is more accurate profiling information. Before tiered compilation, the JVM collected profiling information only during interpretation.
With tiered compilation enabled, the JVM also collects profiling information on the C1 compiled code. Since the compiled code achieves better performance, it allows the JVM to collect more profiling samples.
3.3. Code Cache
The code cache is a memory area where the JVM stores all bytecode compiled into native code. Tiered compilation increases the amount of code that needs to be cached by up to four times.
Since Java 9, the JVM segments the code cache into three areas:
- The non-method segment – JVM-internal code (around 5 MB, configurable via -XX:NonNMethodCodeHeapSize)
- The profiled-code segment – C1 compiled code with potentially short lifetimes (around 122 MB by default, configurable via -XX:ProfiledCodeHeapSize)
- The non-profiled segment – C2 compiled code with potentially long lifetimes (similarly 122 MB by default, configurable via -XX:NonProfiledCodeHeapSize)
The segmented code cache improves code locality and reduces memory fragmentation. Thus, it improves overall performance.
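We can inspect the code cache from inside a running application through the standard java.lang.management API. The sketch below lists the non-heap memory pools whose names contain "Code". That name filter is an assumption based on how HotSpot usually names these pools, so other JVMs may report nothing here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCachePools {

    // Returns the non-heap pools whose names suggest that they hold compiled code.
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && pool.getName().contains("Code")) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + " KB";
            System.out.printf("%s: used=%d KB, max=%s%n",
                    pool.getName(), usage.getUsed() / 1024, max);
        }
    }
}
```

On a JVM with a segmented code cache, we'd expect one entry per segment; otherwise, a single code cache pool is more likely.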
3.4. Deoptimization
Even though C2 compiled code is highly optimized and long-lived, it can be deoptimized. As a result, the JVM would temporarily roll back to interpretation.
Deoptimization happens when the compiler’s optimistic assumptions are proven wrong, for example, when profile information does not match method behavior.
Once the hot path of a method changes, the JVM deoptimizes the compiled and inlined code.
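A changing hot path can be sketched with a call site that sees only one implementation for a long time and then suddenly sees another. The types below are hypothetical and exist only for this illustration. Whether the JVM actually deoptimizes depends on the JVM version and its inlining decisions, so we should treat this as a scenario that may trigger deoptimization, not one that is guaranteed to.

```java
public class HotPathSwitch {

    // Hypothetical types, used only for this illustration.
    interface Scorer {
        int score(int x);
    }

    static final class PlusOne implements Scorer {
        public int score(int x) {
            return x + 1;
        }
    }

    static final class PlusTwo implements Scorer {
        public int score(int x) {
            return x + 2;
        }
    }

    // Calls score() through the interface; the receiver type changes halfway through.
    static long run(int iterations) {
        Scorer scorer = new PlusOne();
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            if (i == iterations / 2) {
                scorer = new PlusTwo(); // the hot path changes here
            }
            total += scorer.score(i % 10);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

During the first half of the loop, the profile of the call site contains only PlusOne, so an optimizing compiler may inline that implementation. The switch to PlusTwo then invalidates that assumption.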
4. Compilation Levels
Even though the JVM works with only one interpreter and two JIT compilers, there are five possible levels of compilation. The reason behind this is that the C1 compiler can operate on three different levels. The difference between those three levels is in the amount of profiling done.
4.1. Level 0 – Interpreted Code
Initially, the JVM interprets all Java code. During this initial phase, performance is usually worse than that of compiled languages.
However, the JIT compiler kicks in after the warmup phase and compiles the hot code at runtime. The JIT compiler makes use of the profiling information collected on this level to perform optimizations.
4.2. Level 1 – Simple C1 Compiled Code
On this level, the JVM compiles the code using the C1 compiler, but without collecting any profiling information. The JVM uses level 1 for methods that are considered trivial.
Because these methods are so simple, C2 compilation wouldn’t make them any faster. Thus, the JVM concludes that there is no point in collecting profiling information for code that cannot be optimized further.
4.3. Level 2 – Limited C1 Compiled Code
On level 2, the JVM compiles the code using the C1 compiler with light profiling. The JVM uses this level when the C2 queue is full. The goal is to compile the code as soon as possible to improve performance.
Later, the JVM recompiles the code on level 3, using full profiling. Finally, once the C2 queue is less busy, the JVM recompiles it on level 4.
4.4. Level 3 – Full C1 Compiled Code
On level 3, the JVM compiles the code using the C1 compiler with full profiling. Level 3 is part of the default compilation path. Thus, the JVM uses it in all cases except for trivial methods or when compiler queues are full.
The most common scenario in JIT compilation is that the interpreted code jumps directly from level 0 to level 3.
4.5. Level 4 – C2 Compiled Code
On this level, the JVM compiles the code using the C2 compiler for maximum long-term performance. Level 4 is also a part of the default compilation path. The JVM uses this level to compile all methods except trivial ones.
Given that level 4 code is considered fully optimized, the JVM stops collecting profiling information. However, it may decide to deoptimize the code and send it back to level 0.
5. Compilation Parameters
Tiered compilation has been enabled by default since Java 8. It’s highly recommended to use it unless there’s a strong reason to disable it.
5.1. Disabling Tiered Compilation
We may disable tiered compilation by setting the -XX:-TieredCompilation flag. When we set this flag, the JVM will not transition between compilation levels. As a result, we’ll need to select which JIT compiler to use: C1 or C2.
Unless explicitly specified, the JVM decides which JIT compiler to use based on our CPU. For multi-core processors or 64-bit VMs, the JVM will select C2. In order to disable C2 and only use C1 with no profiling overhead, we can apply the -XX:TieredStopAtLevel=1 parameter.
To completely disable both JIT compilers and run everything using the interpreter, we can apply the -Xint flag. However, we should note that disabling JIT compilers will have a negative impact on performance.
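To double-check which of these settings a running JVM was started with, we can query the standard management beans. This is a small sketch with made-up class and method names. It only reports what the JVM exposes: the name of the JIT setup and the compilation-related flags from the command line.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JitModeCheck {

    // Returns the name of the JIT setup, or a fallback text when the JVM reports none.
    static String describeJit() {
        // Documented to return null if the JVM has no compilation system.
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        return (jit == null) ? "no compilation system reported" : jit.getName();
    }

    // Keeps only the arguments discussed in this section.
    static List<String> compilationFlags(List<String> jvmArgs) {
        List<String> flags = new ArrayList<>();
        for (String arg : jvmArgs) {
            if (arg.contains("TieredCompilation")
                    || arg.contains("TieredStopAtLevel")
                    || arg.equals("-Xint")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JIT: " + describeJit());
        System.out.println("Compilation flags: " + compilationFlags(jvmArgs));
    }
}
```

For instance, starting this class with -XX:TieredStopAtLevel=1 should list that flag in the second line of the output.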
5.2. Setting Thresholds for Levels
A compile threshold is the number of method invocations before the code gets compiled. In the case of tiered compilation, we can set these thresholds for compilation levels 2-4. For example, we can set the parameter -XX:Tier4CompileThreshold=10000.
In order to check the default thresholds used on a specific Java version, we can run Java using the -XX:+PrintFlagsFinal flag:
java -XX:+PrintFlagsFinal -version | grep CompileThreshold
intx CompileThreshold = 10000
intx Tier2CompileThreshold = 0
intx Tier3CompileThreshold = 2000
intx Tier4CompileThreshold = 15000
We should note that the JVM doesn’t use the generic CompileThreshold parameter when tiered compilation is enabled.
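The same values can also be read from within the application. The sketch below relies on com.sun.management.HotSpotDiagnosticMXBean, which is specific to HotSpot-based JVMs, so we assume such a JVM here. The helper method readFlag is our own and returns an empty result for flags the JVM doesn't know.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

public class TierThresholds {

    // Reads a VM option by name; empty if the bean or the option is unavailable.
    static Optional<String> readFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist on this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "TieredCompilation", "Tier2CompileThreshold",
            "Tier3CompileThreshold", "Tier4CompileThreshold"
        };
        for (String name : names) {
            System.out.println(name + " = " + readFlag(name).orElse("(not available)"));
        }
    }
}
```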
6. Method Compilation
Let’s now take a look at the life cycle of a method compilation.
The JVM initially interprets a method until its invocations reach the Tier3CompileThreshold. Then, it compiles the method using the C1 compiler while profiling information continues to be collected. Finally, the JVM compiles the method using the C2 compiler when its invocations reach the Tier4CompileThreshold. Eventually, the JVM may decide to deoptimize the C2 compiled code, in which case the complete process repeats.
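The life cycle described above can be expressed as a deliberately simplified toy model. This isn't HotSpot's actual policy, which also takes loop counters and compiler queue load into account. It only mirrors the summary, using the default thresholds from the listing in section 5.2 as assumed constants.

```java
public class ToyTieredPolicy {

    // Assumed values, taken from the PrintFlagsFinal listing in section 5.2.
    static final int TIER3_COMPILE_THRESHOLD = 2_000;
    static final int TIER4_COMPILE_THRESHOLD = 15_000;

    // Returns the level a non-trivial method would be on in this toy model.
    static int levelAfter(long invocations) {
        if (invocations >= TIER4_COMPILE_THRESHOLD) {
            return 4; // C2 compiled
        }
        if (invocations >= TIER3_COMPILE_THRESHOLD) {
            return 3; // C1 compiled with full profiling
        }
        return 0; // still interpreted
    }

    public static void main(String[] args) {
        long[] samples = {0, 1_999, 2_000, 14_999, 15_000};
        for (long invocations : samples) {
            System.out.println(invocations + " invocations -> level " + levelAfter(invocations));
        }
    }
}
```

In this model, a deoptimization would simply reset the invocation count to zero, so the method starts again on level 0.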
6.1. Compilation Logs
By default, JIT compilation logs are disabled. To enable them, we can set the -XX:+PrintCompilation flag. The compilation logs are formatted as:
- Timestamp – In milliseconds since application start-up
- Compile ID – Incremental ID for each compiled method
- Attributes – The state of the compilation with five possible values:
    - % – On-stack replacement occurred
    - s – The method is synchronized
    - ! – The method contains an exception handler
    - b – Compilation occurred in blocking mode
    - n – Compilation transformed a wrapper to a native method
- Compilation level – Between 0 and 4
- Method name
- Bytecode size
- Deoptimization indicator – With two possible values:
    - Made not entrant – Standard C1 deoptimization or the compiler’s optimistic assumptions proven wrong
    - Made zombie – A cleanup mechanism for the garbage collector to free space from the code cache
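To see how these fields line up in practice, here is a minimal parser sketch for a single log line. The regular expression is our own simplification of the format described above. It expects the attribute characters without gaps and doesn't handle entries for native methods, so it's an illustration rather than a complete parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrintCompilationParser {

    // Fields in order: timestamp, compile ID, optional attribute characters, level,
    // method name, optional "@ bci" (present for on-stack replacement),
    // bytecode size, and an optional trailing note such as "made not entrant".
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+)\\s+([%s!bn]*)\\s*(\\d)\\s+(\\S+)"
            + "(?:\\s+@\\s+(\\d+))?\\s+\\((\\d+) bytes\\)\\s*(.*)$");

    static final class Entry {
        final long timestampMs;
        final int compileId;
        final String attributes;
        final int level;
        final String method;
        final int bytecodeSize;
        final String note;

        Entry(Matcher m) {
            timestampMs = Long.parseLong(m.group(1));
            compileId = Integer.parseInt(m.group(2));
            attributes = m.group(3);
            level = Integer.parseInt(m.group(4));
            method = m.group(5);
            bytecodeSize = Integer.parseInt(m.group(7));
            note = m.group(8).trim();
        }
    }

    // Returns the parsed entry, or null if the line doesn't match the simplified format.
    static Entry parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new Entry(m) : null;
    }

    public static void main(String[] args) {
        Entry e = parse("687 832 % 3 com.baeldung.tieredcompilation.TieredCompilation::main"
                + " @ 2 (58 bytes) made not entrant");
        System.out.println(e.method + " level=" + e.level
                + " osr=" + e.attributes.contains("%") + " note=" + e.note);
    }
}
```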
6.2. An Example
Let’s demonstrate the method compilation life-cycle with a simple example. First, we’ll create a class that implements a JSON formatter:
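import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class JsonFormatter implements Formatter {

    // One shared mapper instance for all calls
    private static final JsonMapper mapper = new JsonMapper();

    @Override
    public <T> String format(T object) throws JsonProcessingException {
        return mapper.writeValueAsString(object);
    }
}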
Next, we’ll create an XML formatter that implements the same interface:
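import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;

public class XmlFormatter implements Formatter {

    // One shared mapper instance for all calls
    private static final XmlMapper mapper = new XmlMapper();

    @Override
    public <T> String format(T object) throws JsonProcessingException {
        return mapper.writeValueAsString(object);
    }
}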
Now, we’ll write a method that uses the two different formatter implementations. In the first half of the loop, we’ll use the JSON implementation and then switch to the XML one for the rest:
public class TieredCompilation {

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 1_000_000; i++) {
            Formatter formatter;
            // Use the JSON formatter for the first half, then switch to XML
            if (i < 500_000) {
                formatter = new JsonFormatter();
            } else {
                formatter = new XmlFormatter();
            }
            formatter.format(new Article("Tiered Compilation in JVM", "Baeldung"));
        }
    }
}
Finally, we’ll set the -XX:+PrintCompilation flag, run the main method, and observe the compilation logs.
6.3. Review Logs
Let’s focus on log output for our three custom classes and their methods.
The first two log entries show that the JVM compiled the main method and the JSON implementation of the format method on level 3. Therefore, both methods were compiled by the C1 compiler. The C1 compiled code replaced the initially interpreted version:
567 714 3 com.baeldung.tieredcompilation.JsonFormatter::format (8 bytes)
687 832 % 3 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes)
A few hundred milliseconds later, the JVM compiled both methods on level 4. Hence, the C2 compiled versions replaced the previous versions compiled with C1:
659 800 4 com.baeldung.tieredcompilation.JsonFormatter::format (8 bytes)
807 834 % 4 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes)
Just a few milliseconds later, we see our first example of deoptimization. Here, the JVM marked the C1 compiled versions as obsolete (not entrant):
812 714 3 com.baeldung.tieredcompilation.JsonFormatter::format (8 bytes) made not entrant
838 832 % 3 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes) made not entrant
After a while, we’ll notice another example of deoptimization. This log entry is interesting because the JVM marked the fully optimized C2 compiled versions as obsolete (not entrant). That means the JVM rolled back the fully optimized code when it detected that it was no longer valid:
1015 834 % 4 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes) made not entrant
1018 800 4 com.baeldung.tieredcompilation.JsonFormatter::format (8 bytes) made not entrant
Next, we’ll see the XML implementation of the format method for the first time. The JVM compiled it on level 3, together with the main method:
1160 1073 3 com.baeldung.tieredcompilation.XmlFormatter::format (8 bytes)
1202 1141 % 3 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes)
A few hundred milliseconds later, the JVM compiled both methods on level 4. However, this time, it’s the XML implementation that was used by the main method:
1341 1171 4 com.baeldung.tieredcompilation.XmlFormatter::format (8 bytes)
1505 1213 % 4 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes)
As before, a few milliseconds later, the JVM marked the C1 compiled versions as obsolete (not entrant):
1492 1073 3 com.baeldung.tieredcompilation.XmlFormatter::format (8 bytes) made not entrant
1508 1141 % 3 com.baeldung.tieredcompilation.TieredCompilation::main @ 2 (58 bytes) made not entrant
The JVM continued to use the level 4 compiled methods until the end of our program.
7. Conclusion
In this article, we explored the tiered compilation concept in the JVM. We reviewed the two types of JIT compilers and how tiered compilation uses both of them to achieve the best results. We saw five levels of compilation and learned how to control them using JVM parameters.
In the examples, we explored the complete method compilation life-cycle by observing the compilation logs.
As always, the source code is available over on GitHub.