Deep Dive Into the New Java JIT Compiler – Graal

Last modified: November 8, 2018

1. Overview

In this tutorial, we’ll take a deeper look at the new Java Just-In-Time (JIT) compiler, called Graal.

We’ll see what the Graal project is and describe one of its parts, a high-performance dynamic JIT compiler.

2. What Is a JIT Compiler?

Let’s first explain what a JIT compiler does.

When we compile our Java program (e.g., using the javac command), our source code is compiled into a binary representation of our code: JVM bytecode. This bytecode is simpler and more compact than our source code, but the conventional processors in our computers cannot execute it.

To be able to run a Java program, the JVM interprets the bytecode. Since interpreters are usually a lot slower than native code executing on a real processor, the JVM can run another compiler that turns our bytecode into machine code the processor can run directly. This so-called just-in-time compiler is much more sophisticated than the javac compiler, and it runs complex optimizations to generate high-quality machine code.

3. More Detailed Look into the JIT Compiler

The JDK implementation by Oracle is based on the open-source OpenJDK project. This includes the HotSpot virtual machine, available since Java version 1.3. It contains two conventional JIT compilers: the client compiler, also called C1, and the server compiler, called opto or C2.

C1 is designed to run faster and produce less optimized code, while C2 takes a little more time to run but produces better-optimized code. The client compiler is a better fit for desktop applications, since we don’t want long pauses for JIT compilation. The server compiler is better for long-running server applications that can spend more time on compilation.

3.1. Tiered Compilation

Today, a Java installation uses both JIT compilers during normal program execution.

As we mentioned in the previous section, our Java program, compiled by javac, starts its execution in interpreted mode. The JVM tracks each frequently called method and compiles it, using C1 for this first compilation. HotSpot still keeps an eye on future calls of those methods, though. If the number of calls keeps increasing, the JVM recompiles these methods once more, this time using C2.

This is the default strategy used by HotSpot, called tiered compilation.
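To check whether tiered compilation is actually switched on in the JVM we are running, we can ask HotSpot for the value of the corresponding VM option. The following is a minimal sketch, assuming a HotSpot-based JDK where the `com.sun.management.HotSpotDiagnosticMXBean` platform bean is available; the class name `TieredCompilationCheck` and the helper `vmOption` are made up for this illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class TieredCompilationCheck {

    // Reads the current value of a HotSpot VM option by name (illustrative helper).
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Reports "true" or "false" depending on how the JVM was started.
        System.out.println("TieredCompilation = " + vmOption("TieredCompilation"));
    }
}
```

Starting the program with -XX:-TieredCompilation should make the reported value flip to false, which is a quick way to confirm which strategy a given run is using.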

3.2. The Server Compiler

Let’s now focus for a bit on C2, since it is the more complex of the two. C2 has been heavily optimized and produces code that can compete with C++ or even be faster. The server compiler itself is written in a specific dialect of C++.

However, it comes with some issues. Because the compiler is written in C++, a bug such as a segmentation fault inside it can crash the whole VM. Also, no major improvements have been implemented in the compiler over the last several years. The code in C2 has become difficult to maintain, so we can’t expect major new enhancements with the current design. With that in mind, a new JIT compiler is being created in the project named GraalVM.

4. Project GraalVM

Project GraalVM is a research project created by Oracle. We can look at Graal as several connected projects: a new JIT compiler that builds on HotSpot and a new polyglot virtual machine. It offers a comprehensive ecosystem supporting a large set of languages (Java and other JVM-based languages; JavaScript, Ruby, Python, R, C/C++, and other LLVM-based languages).

We’ll focus on Java, of course.

4.1. Graal – a JIT Compiler Written in Java

Graal is a high-performance JIT compiler. It accepts the JVM bytecode and produces the machine code.

There are several key advantages of writing a compiler in Java. First of all, safety: compiler bugs surface as exceptions instead of crashes, and there are no real memory leaks. Furthermore, we get good IDE support and can use debuggers, profilers, and other convenient tools. Also, the compiler can be independent of HotSpot, and it can produce a faster JIT-compiled version of itself.

The Graal compiler was created with those advantages in mind. It uses the new JVM Compiler Interface (JVMCI) to communicate with the VM. To enable the use of the new JIT compiler, we need to set the following options when running Java from the command line:

-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler

This means that we can run a simple program in three different ways: with the regular tiered compilers, with the JVMCI version of Graal on Java 10, or with GraalVM itself.

4.2. JVM Compiler Interface

JVMCI has been part of OpenJDK since JDK 9, so we can use any standard OpenJDK or Oracle JDK to run Graal.

What JVMCI actually allows us to do is bypass the standard tiered compilation and plug in our brand new compiler (i.e. Graal) without changing anything in the JVM.

The interface is quite simple. When a method is compiled, its bytecode is passed through JVMCI as the input to the compiler. As the output, we get the compiled machine code. Both the input and the output are just byte arrays:

interface JVMCICompiler {
    byte[] compileMethod(byte[] bytecode);
}

In real-life scenarios, we’ll usually need some more information like the number of local variables, the stack size, and the information collected from profiling in the interpreter so that we know how the code is running in practice.

Essentially, when calling the compileMethod() of the JVMCICompiler interface, we’ll need to pass a CompilationRequest object. The request then gives us the Java method we want to compile, and in that method, we’ll find all the information we need.
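The request-based shape described above can be sketched in a few lines. This is only an illustration: `MethodInfo`, `Request`, and `Compiler` are hypothetical, simplified stand-ins invented for this example, not the real JVMCI types, and the "compiler" simply copies its input instead of emitting machine code.

```java
import java.util.Arrays;

public class JvmciSketch {

    // Hypothetical stand-in for the method description handed to a compiler.
    static final class MethodInfo {
        final String name;
        final byte[] bytecode;
        final int maxLocals;     // number of local variable slots
        final int maxStackSize;  // maximum operand stack depth

        MethodInfo(String name, byte[] bytecode, int maxLocals, int maxStackSize) {
            this.name = name;
            this.bytecode = bytecode;
            this.maxLocals = maxLocals;
            this.maxStackSize = maxStackSize;
        }
    }

    // Hypothetical stand-in for a compilation request wrapping the method.
    static final class Request {
        private final MethodInfo method;

        Request(MethodInfo method) {
            this.method = method;
        }

        MethodInfo getMethod() {
            return method;
        }
    }

    // Hypothetical compiler interface: a request goes in, code bytes come out.
    interface Compiler {
        byte[] compileMethod(Request request);
    }

    // Toy implementation that copies the bytecode; a real compiler would emit machine code.
    static final Compiler COPYING = request -> {
        byte[] code = request.getMethod().bytecode;
        return Arrays.copyOf(code, code.length);
    };

    // Request for a static int add(int, int): iload_0, iload_1, iadd, ireturn.
    static Request sampleRequest() {
        byte[] bytecode = {0x1a, 0x1b, 0x60, (byte) 0xac};
        return new Request(new MethodInfo("add", bytecode, 2, 2));
    }

    public static void main(String[] args) {
        Request request = sampleRequest();
        MethodInfo m = request.getMethod();
        byte[] out = COPYING.compileMethod(request);
        System.out.println(m.name + ": " + m.bytecode.length + " bytecode bytes, "
            + m.maxLocals + " locals, max stack " + m.maxStackSize
            + ", " + out.length + " output bytes");
    }
}
```

The point of wrapping the method in a request object is that the compiler can pull whatever it needs (bytecode, local variable count, stack size, and in a real VM also profiling data) from one place, rather than receiving a bare byte array.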

4.3. Graal in Action

Graal itself is executed by the VM, so it’ll first be interpreted and then JIT-compiled when it becomes hot. Let’s check out an example, which can also be found on GraalVM’s official site:

public class CountUppercase {
    static final int ITERATIONS = Math.max(Integer.getInteger("iterations", 1), 1);

    public static void main(String[] args) {
        String sentence = String.join(" ", args);
        for (int iter = 0; iter < ITERATIONS; iter++) {
            if (ITERATIONS != 1) {
                System.out.println("-- iteration " + (iter + 1) + " --");
            }
            long total = 0, start = System.currentTimeMillis(), last = start;
            for (int i = 1; i < 10_000_000; i++) {
                total += sentence
                  .chars()
                  .filter(Character::isUpperCase)
                  .count();
                if (i % 1_000_000 == 0) {
                    long now = System.currentTimeMillis();
                    System.out.printf("%d (%d ms)%n", i / 1_000_000, now - last);
                    last = now;
                }
            }
            System.out.printf("total: %d (%d ms)%n", total, System.currentTimeMillis() - start);
        }
    }
}

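Now, we’ll compile it and run it, passing the main class name and the sentence to scan (shown here as a placeholder in square brackets) as arguments:

javac CountUppercase.java
java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler CountUppercase [sentence to scan]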

This results in output similar to the following:

1 (1581 ms)
2 (480 ms)
3 (364 ms)
4 (231 ms)
5 (196 ms)
6 (121 ms)
7 (116 ms)
8 (116 ms)
9 (116 ms)
total: 59999994 (3436 ms)

We can see that it takes more time in the beginning. That warm-up time depends on various factors, such as the amount of multi-threaded code in the application or the number of threads the VM uses. If there are fewer cores, the warm-up time could be longer.

If we want to see the statistics of Graal compilations, we need to add the following flag when executing our program:

-Dgraal.PrintCompilation=true

This will show the data related to the compiled method, the time taken, the bytecodes processed (which includes inlined methods as well), the size of the machine code produced, and the amount of memory allocated during compilation. The output of the execution takes quite a lot of space, so we won’t show it here.

4.4. Comparing with the Top Tier Compiler

Let’s now compare the above results with the execution of the same program compiled with the top tier compiler instead. To do that, we need to tell the VM not to use the JVMCI compiler:

java -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:-UseJVMCICompiler CountUppercase [sentence to scan]

1 (510 ms)
2 (375 ms)
3 (365 ms)
4 (368 ms)
5 (348 ms)
6 (370 ms)
7 (353 ms)
8 (348 ms)
9 (369 ms)
total: 59999994 (4004 ms)

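We can see that there is a smaller difference between the individual times, and the first iteration is much shorter (510 ms versus 1581 ms with Graal). Once warmed up, however, each iteration takes roughly 350 to 370 ms here, compared with about 116 ms with Graal.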

4.5. The Data Structure Behind Graal

As we said earlier, Graal basically turns a byte array into another byte array. In this section, we’ll focus on what’s behind this process. The following examples rely on Chris Seaton’s talk at JokerConf 2017.

A compiler’s basic job is to operate on our program, which means it must represent the program with an appropriate data structure. Graal uses a graph for this purpose, the so-called program dependence graph.

In a simple scenario, where we want to add two local variables, i.e., x + y, we would have one node for loading each variable and another node for adding them. Besides that, we’d also have two edges representing the data flow:

[Figure: data-flow graph for x + y]

The data flow edges are displayed in blue. They show that once the local variables are loaded, the results flow into the addition operation.

Let’s now introduce another type of edge, the kind that describes the control flow. To do so, we’ll extend our example by calling methods to retrieve our variables instead of reading them directly. When we do that, we need to keep track of the order in which the methods are called. We’ll represent this order with the red arrows:

[Figure: the same graph with control-flow edges added for the method-call version of x + y]

Here, we can see that the nodes didn’t actually change, but control flow edges have been added.
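To make the two kinds of edges concrete, here is a minimal sketch of such a graph as plain Java objects. It is not Graal’s node hierarchy: the `Node` class, its fields, and the method names `getX` and `getY` are assumptions made for this illustration. Data-flow edges are stored as a list of inputs, and the control-flow edge as a single `next` reference; modelling the addition without a control edge is likewise a choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class GraphSketch {

    // Illustrative node: data-flow inputs plus an optional control-flow successor.
    static class Node {
        final String label;
        final List<Node> inputs = new ArrayList<>(); // data-flow edges (blue in the figures)
        Node next;                                   // control-flow edge (red in the figures)

        Node(String label, Node... inputs) {
            this.label = label;
            for (Node in : inputs) {
                this.inputs.add(in);
            }
        }
    }

    // Builds the graph for getX() + getY() and returns the first node in control order.
    static Node build() {
        Node getX = new Node("call getX");
        Node getY = new Node("call getY");
        Node add = new Node("+", getX, getY);   // consumes both call results
        Node ret = new Node("return", add);     // consumes the sum
        getX.next = getY;                       // getX must run before getY
        getY.next = ret;                        // then the method returns
        return getX;
    }

    // Follows control-flow edges from a start node and collects the labels in order.
    static List<String> controlOrder(Node start) {
        List<String> order = new ArrayList<>();
        for (Node n = start; n != null; n = n.next) {
            order.add(n.label);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(controlOrder(build()));
    }
}
```

In this sketch the addition node appears only on data-flow edges: its position is determined by what it consumes and who consumes it, while only the calls and the return are pinned in a fixed order.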

4.6. Actual Graphs

We can examine the real Graal graphs with the IdealGraphVisualiser. To run it, we use the mx igv command. We also need to configure the JVM by setting the -Dgraal.Dump flag.

Let’s check out a simple example:

int average(int a, int b) {
    return (a + b) / 2;
}

This has a very simple data flow:

[Figure: Graal graph for the average(int a, int b) method]

In the graph above, we can see a clear representation of our method. Parameters P(0) and P(1) flow into the add operation, whose result flows into the divide operation together with the constant C(2). Finally, the result is returned.
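The same shape can be written down as a tiny evaluable graph. This is a toy sketch, not how Graal represents or executes nodes: the `Node` interface and the factory methods `param`, `constant`, `add`, and `div` are invented for this example and simply mirror the P, C, add, and divide nodes from the figure.

```java
public class AverageGraph {

    // Illustrative node type: computes its value from the method parameters.
    interface Node {
        int eval(int[] params);
    }

    static Node param(int index) {
        return p -> p[index];
    }

    static Node constant(int value) {
        return p -> value;
    }

    static Node add(Node x, Node y) {
        return p -> x.eval(p) + y.eval(p);
    }

    static Node div(Node x, Node y) {
        return p -> x.eval(p) / y.eval(p);
    }

    // P(0) and P(1) feed the add node; the add node and C(2) feed the divide node.
    static final Node AVERAGE = div(add(param(0), param(1)), constant(2));

    // Evaluating the root node corresponds to the return at the end of the graph.
    static int average(int a, int b) {
        return AVERAGE.eval(new int[] {a, b});
    }

    public static void main(String[] args) {
        System.out.println(average(4, 8)); // prints 6
    }
}
```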

We’ll now change the previous example to be applicable to an array of numbers:

int average(int[] values) {
    int sum = 0;
    for (int n = 0; n < values.length; n++) {
        sum += values[n];
    }
    return sum / values.length;
}

We can see that adding a loop leads to a much more complex graph:

[Figure: Graal graph for the array-based average method, showing the loop in detail]

Here we can notice:

  • the begin and the end loop nodes
  • the nodes representing the array reading and the array length reading
  • data and control flow edges, just as before.

This data structure is sometimes called a sea-of-nodes, or a soup-of-nodes. We should mention that the C2 compiler uses a similar data structure, so it isn’t something new invented exclusively for Graal.

It’s worth remembering that Graal optimizes and compiles our program by modifying the above-mentioned data structure. We can see why writing the Graal JIT compiler in Java was actually a good choice: a graph is nothing more than a set of objects with references connecting them as the edges. That structure fits an object-oriented language, which in this case is Java, perfectly.
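As a rough illustration of "optimizing by modifying the graph", the sketch below performs constant folding on a small object graph: an add node whose two inputs are constants is replaced by a single constant node. The classes `Const`, `Param`, and `Add` and the methods `fold` and `show` are hypothetical names for this example; Graal’s real optimization phases are far more elaborate.

```java
public class FoldSketch {

    static abstract class Node {}

    static final class Const extends Node {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Param extends Node {
        final int index;
        Param(int index) { this.index = index; }
    }

    static final class Add extends Node {
        Node left;
        Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
    }

    // Rewrites the graph bottom-up: an Add of two constants becomes one constant.
    static Node fold(Node n) {
        if (n instanceof Add) {
            Add add = (Add) n;
            add.left = fold(add.left);    // re-point the edges at the simplified inputs
            add.right = fold(add.right);
            if (add.left instanceof Const && add.right instanceof Const) {
                return new Const(((Const) add.left).value + ((Const) add.right).value);
            }
        }
        return n;
    }

    // Renders a graph as text so the effect of the rewrite is visible.
    static String show(Node n) {
        if (n instanceof Const) {
            return String.valueOf(((Const) n).value);
        }
        if (n instanceof Param) {
            return "P(" + ((Param) n).index + ")";
        }
        Add add = (Add) n;
        return "(" + show(add.left) + " + " + show(add.right) + ")";
    }

    public static void main(String[] args) {
        Node graph = new Add(new Param(0), new Add(new Const(1), new Const(2)));
        System.out.println(show(graph));        // prints (P(0) + (1 + 2))
        System.out.println(show(fold(graph)));  // prints (P(0) + 3)
    }
}
```

The rewrite is nothing more than replacing one object reference with another, which is the sense in which a graph of objects is convenient to transform in an object-oriented language.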

4.7. Ahead-of-Time Compiler Mode

It’s also worth mentioning that we can use the Graal compiler in Ahead-of-Time compiler mode in Java 10. As we said already, the Graal compiler has been written from scratch. It conforms to a new clean interface, JVMCI, which enables us to integrate it with HotSpot. That doesn’t mean that the compiler is bound to it, though.

One way of using the compiler is a profile-driven approach that compiles only the hot methods, but we can also use Graal to compile all methods in an offline mode without executing the code. This is the so-called “Ahead-of-Time Compilation”, JEP 295, but we won’t go deep into the AOT compilation technology here.

The main reason why we would use Graal in this manner is to speed up startup time until the regular Tiered Compilation approach in the HotSpot can take over.

5. Conclusion

In this article, we explored the functionalities of the new Java JIT compiler as part of the Graal project.

We first described traditional JIT compilers and then discussed new features of Graal, especially the new JVM Compiler Interface. Then, we illustrated how both compilers work and compared their performance.

After that, we talked about the data structure that Graal uses to manipulate our program and, finally, about the AOT compiler mode as another way to use Graal.

As always, the source code can be found over on GitHub. Remember that the JVM needs to be configured with the specific flags described here.