1. Overview
Compilers and runtimes tend to optimize everything, even the smallest and seemingly least critical parts. When it comes to these sorts of optimizations, the JVM and Java have a lot to offer.
In this article, we’re going to evaluate one of these relatively new optimizations: string concatenation with invokedynamic.
2. Before Java 9
Before Java 9, non-trivial string concatenations were implemented using StringBuilder. For instance, let’s consider the following method:
String concat(String s, int i) {
    return s + i;
}
The bytecode for this simple code is as follows (with javap -c):
java.lang.String concat(java.lang.String, int);
Code:
0: new #2 // class StringBuilder
3: dup
4: invokespecial #3 // Method StringBuilder."<init>":()V
7: aload_0
8: invokevirtual #4 // Method StringBuilder.append:(LString;)LStringBuilder;
11: iload_1
12: invokevirtual #5 // Method StringBuilder.append:(I)LStringBuilder;
15: invokevirtual #6 // Method StringBuilder.toString:()LString;
18: areturn
Here, the Java 8 compiler is using StringBuilder to concatenate the method inputs, even though we didn’t use StringBuilder in our code.
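As a rough source-level picture of that bytecode, the method behaves as if we had written the StringBuilder calls ourselves. The class and method names below are illustrative only and don't come from the original example:

```java
// Illustrative sketch: a hand-written equivalent of what the Java 8
// compiler emits for `s + i`. Names are hypothetical.
class ConcatDesugared {

    // What we write in source code.
    static String concat(String s, int i) {
        return s + i;
    }

    // Roughly what the pre-Java 9 bytecode does: new StringBuilder,
    // append(String), append(int), toString().
    static String concatWithBuilder(String s, int i) {
        return new StringBuilder().append(s).append(i).toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("The answer is ", 42));
        System.out.println(concatWithBuilder("The answer is ", 42));
    }
}
```

Both methods produce the same result for the same inputs; only the way the compiler expresses the first one has changed between Java versions.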
To be fair, concatenating strings using StringBuilder is pretty efficient and well-engineered.
Let’s see how Java 9 changes this implementation and what motivates the change.
3. Invoke Dynamic
As of Java 9, and as part of JEP 280, string concatenation uses invokedynamic.
The primary motivation behind the change is to have a more dynamic implementation. That is, it’s possible to change the concatenation strategy without changing the bytecode. This way, clients can benefit from a new optimized strategy even without recompilation.
There are other advantages, too. For example, the bytecode for invokedynamic is more elegant, less brittle, and smaller.
3.1. Big Picture
Before diving into details of how this new approach works, let’s see it from a broader point of view.
As an example, suppose we’re going to create a new String by joining another String with an int. We can think of this as a function that accepts a String and an int and then returns the concatenated String.
Here’s how the new approach works for this example:
- Preparing the function signature describing the concatenation. For instance, (String, int) -> String
- Preparing the actual arguments for the concatenation. For instance, if we’re going to join “The answer is ” and 42, then these values will be the arguments
- Calling the bootstrap method and passing the function signature, the arguments, and a few other parameters to it
- Generating the actual implementation for that function signature and encapsulating it inside a MethodHandle
- Calling the generated function to create the final joined string
Put simply, the bytecode defines a specification at compile-time. Then the bootstrap method links an implementation to that specification at runtime. This, in turn, will make it possible to change the implementation without touching the bytecode.
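To make these steps concrete, here is a minimal sketch that performs them by hand on Java 9 or later, calling StringConcatFactory directly instead of letting the JVM do the linkage. The class name, the join method, and the exception wrapping are assumptions made for illustration; in real code the JVM runs the bootstrap step for us:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical helper that walks through the steps manually.
class BigPictureSketch {

    static String join(String s, int i) {
        try {
            // Step 1: the function signature, (String, int) -> String.
            MethodType signature =
                MethodType.methodType(String.class, String.class, int.class);

            // Step 3: call the bootstrap method with the signature and a
            // recipe holding one \u0001 placeholder per argument.
            CallSite callSite = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(), "makeConcatWithConstants",
                signature, "\u0001\u0001");

            // Step 4: the generated implementation, wrapped in a MethodHandle.
            MethodHandle concat = callSite.dynamicInvoker();

            // Steps 2 and 5: supply the actual arguments and invoke.
            return (String) concat.invokeExact(s, i);
        } catch (Throwable t) {
            throw new IllegalStateException("Linkage or invocation failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(join("The answer is ", 42)); // The answer is 42
    }
}
```

Unlike this sketch, the JVM links a given invokedynamic call site only once and reuses the resulting handle on later calls.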
Throughout this article, we’ll uncover the details associated with each of these steps.
First, let’s see how the linkage to the bootstrap method works.
4. The Linkage
Let’s see how the Java 9+ compiler generates the bytecode for the same method:
java.lang.String concat(java.lang.String, int);
Code:
0: aload_0
1: iload_1
2: invokedynamic #7, 0 // InvokeDynamic #0:makeConcatWithConstants:(LString;I)LString;
7: areturn
As opposed to the naive StringBuilder approach, this one uses significantly fewer instructions.
In this bytecode, the (LString;I)LString signature is quite interesting. It takes a String and an int (the I represents int) and returns the concatenated string. This is because the method joins one String and an int together.
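The same signature can be built with the MethodType API, which is a handy way to see the full descriptor that javap abbreviates here as (LString;I)LString. This is only a small sketch, and the class and method names are made up for the example:

```java
import java.lang.invoke.MethodType;

// Hypothetical demo class: builds the concatenation signature in code.
class DescriptorSketch {

    static String concatDescriptor() {
        // Return type first, then the parameter types: (String, int) -> String.
        MethodType type =
            MethodType.methodType(String.class, String.class, int.class);
        return type.toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Prints (Ljava/lang/String;I)Ljava/lang/String;
        System.out.println(concatDescriptor());
    }
}
```

Reference types appear as L followed by the fully qualified class name and a semicolon, while primitives use a single letter such as I for int.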
As with other uses of invokedynamic, much of the logic moves from compile-time to runtime.
To see that runtime logic, let’s inspect the bootstrap method table (with javap -c -v):
BootstrapMethods:
0: #25 REF_invokeStatic java/lang/invoke/StringConcatFactory.makeConcatWithConstants:
(Ljava/lang/invoke/MethodHandles$Lookup;
Ljava/lang/String;
Ljava/lang/invoke/MethodType;
Ljava/lang/String;
[Ljava/lang/Object;)Ljava/lang/invoke/CallSite;
Method arguments:
#31 \u0001\u0001
In this case, when the JVM sees the invokedynamic instruction for the first time, it calls the makeConcatWithConstants bootstrap method. The bootstrap method will, in turn, return a ConstantCallSite, which points to the concatenation logic.
Among the arguments passed to the bootstrap method, two stand out:
- Ljava/lang/invoke/MethodType represents the string concatenation signature. In this case, it’s (LString;I)LString since we’re combining an integer with a String
- \u0001\u0001 is the recipe for constructing the string (more on this later)
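The "link once, call many times" behavior can be imitated in plain Java by running the bootstrap method a single time and caching the resulting handle. This is a sketch under that assumption, not what the JVM literally executes, and the class and member names are hypothetical:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical class imitating a linked call site with a cached handle.
class LinkOnceSketch {

    // Linked once, when the class is initialized.
    private static final MethodHandle CONCAT = link();

    private static MethodHandle link() {
        try {
            MethodType type =
                MethodType.methodType(String.class, String.class, int.class);
            return StringConcatFactory
                .makeConcatWithConstants(MethodHandles.lookup(),
                    "makeConcatWithConstants", type, "\u0001\u0001")
                .dynamicInvoker();
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Every call reuses the already linked concatenation logic.
    static String concat(String s, int i) {
        try {
            return (String) CONCAT.invokeExact(s, i);
        } catch (Throwable t) {
            throw new IllegalStateException("Invocation failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("a", 1));
        System.out.println(concat("b", 2));
    }
}
```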
5. Recipes
To better understand the role of recipes, let’s consider a simple data class:
public class Person {
    private String firstName;
    private String lastName;

    // constructor

    @Override
    public String toString() {
        return "Person{" +
          "firstName='" + firstName + '\'' +
          ", lastName='" + lastName + '\'' +
          '}';
    }
}
To generate a String representation, the JVM passes firstName and lastName fields to the invokedynamic instruction as the arguments:
0: aload_0
1: getfield #7 // Field firstName:LString;
4: aload_0
5: getfield #13 // Field lastName:LString;
8: invokedynamic #16, 0 // InvokeDynamic #0:makeConcatWithConstants:(LString;LString;)LString;
13: areturn
This time, the bootstrap method table looks a bit different:
BootstrapMethods:
0: #28 REF_invokeStatic StringConcatFactory.makeConcatWithConstants // truncated
Method arguments:
#34 Person{firstName=\'\u0001\', lastName=\'\u0001\'} // The recipe
As shown above, the recipe represents the basic structure of the concatenated String. For instance, the preceding recipe consists of:
- Constant strings such as “Person”. These literal values will be present in the concatenated string as-is
- Two \u0001 tags to represent ordinary arguments. They will be replaced by the actual arguments such as firstName
We can think of the recipe as a templated String containing both static parts and variable placeholders.
Using recipes can dramatically reduce the number of arguments passed to the bootstrap method, as we only need to pass all dynamic arguments plus one recipe.
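To see the template idea at work, the sketch below passes the Person recipe to makeConcatWithConstants by hand and supplies only the two dynamic values. It assumes Java 9 or later; the class and method names are invented for the example:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical demo of a recipe with constant parts and two placeholders.
class RecipeSketch {

    static String describe(String firstName, String lastName) {
        try {
            // Only the dynamic parts show up in the signature.
            MethodType type =
                MethodType.methodType(String.class, String.class, String.class);

            // Constants live in the recipe; each \u0001 marks an argument slot.
            String recipe = "Person{firstName='\u0001', lastName='\u0001'}";

            MethodHandle concat = StringConcatFactory
                .makeConcatWithConstants(MethodHandles.lookup(),
                    "makeConcatWithConstants", type, recipe)
                .dynamicInvoker();

            return (String) concat.invokeExact(firstName, lastName);
        } catch (Throwable t) {
            throw new IllegalStateException("Linkage or invocation failed", t);
        }
    }

    public static void main(String[] args) {
        // Prints Person{firstName='John', lastName='Doe'}
        System.out.println(describe("John", "Doe"));
    }
}
```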
6. Bytecode Flavors
There are two bytecode flavors for the new concatenation approach. So far, we’ve seen one of them: calling the makeConcatWithConstants bootstrap method and passing a recipe. This flavor, known as indy with constants, is the default as of Java 9.
Instead of using a recipe, the second flavor passes everything as arguments. That is, it doesn’t differentiate between constant and dynamic parts and passes all of them as arguments.
To use the second flavor, we should pass the -XDstringConcat=indy option to the Java compiler. For instance, if we compile the same Person class with this flag, then the compiler generates the following bytecode:
public java.lang.String toString();
Code:
0: ldc #16 // String Person{firstName=\'
2: aload_0
3: getfield #7 // Field firstName:LString;
6: bipush 39
8: ldc #18 // String , lastName=\'
10: aload_0
11: getfield #13 // Field lastName:LString;
14: bipush 39
16: bipush 125
18: invokedynamic #20, 0 // InvokeDynamic #0:makeConcat:(LString;LString;CLString;LString;CC)LString;
23: areturn
This time around, the bootstrap method is makeConcat. Moreover, the concatenation signature takes seven arguments. Each argument represents one part from toString:
- The first argument represents the part before the firstName variable: the “Person{firstName=\'” literal
- The second argument is the value of the firstName field
- The third argument is a single quotation character
- The fourth argument is the part before the next variable: “, lastName=\'”
- The fifth argument is the lastName field
- The sixth argument is a single quotation character
- The last argument is the closing curly bracket
This way, the bootstrap method has enough information to link an appropriate concatenation logic.
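For comparison with the recipe-based flavor, here is a sketch that calls makeConcat directly with the same seven-part signature. It assumes Java 9 or later, and the class and method names are hypothetical:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

// Hypothetical demo of the recipe-free flavor: everything is an argument.
class MakeConcatSketch {

    static String describe(String firstName, String lastName) {
        try {
            // (String, String, char, String, String, char, char) -> String,
            // mirroring the signature in the bytecode above.
            MethodType type = MethodType.methodType(String.class,
                String.class, String.class, char.class,
                String.class, String.class, char.class, char.class);

            MethodHandle concat = StringConcatFactory
                .makeConcat(MethodHandles.lookup(), "makeConcat", type)
                .dynamicInvoker();

            // Constants and variables are passed alike, in source order.
            return (String) concat.invokeExact(
                "Person{firstName='", firstName, '\'',
                ", lastName='", lastName, '\'', '}');
        } catch (Throwable t) {
            throw new IllegalStateException("Linkage or invocation failed", t);
        }
    }

    public static void main(String[] args) {
        // Prints Person{firstName='John', lastName='Doe'}
        System.out.println(describe("John", "Doe"));
    }
}
```

The result matches the recipe-based version; the difference is that the constant parts now travel through the argument list instead of sitting in the bootstrap method's static arguments.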
Quite interestingly, it’s also possible to travel back to the pre-Java 9 world and use StringBuilder with the -XDstringConcat=inline compiler option.
7. Strategies
The bootstrap method eventually provides a MethodHandle that points to the actual concatenation logic. As of this writing, there are six different strategies to generate this logic:
- BC_SB or “bytecode StringBuilder” strategy generates the same StringBuilder bytecode at runtime. Then it loads the generated bytecode via the Unsafe.defineAnonymousClass method
- BC_SB_SIZED strategy will try to guess the necessary capacity for StringBuilder. Other than that, it’s identical to the previous approach. Guessing the capacity can potentially help the StringBuilder to perform the concatenation without resizing the underlying byte[]
- BC_SB_SIZED_EXACT is a bytecode generator based on StringBuilder that computes the required storage exactly. To calculate the exact size, first, it converts all arguments to String
- MH_SB_SIZED is based on MethodHandles and eventually calls the StringBuilder API for concatenation. This strategy also makes an educated guess about the required capacity
- MH_SB_SIZED_EXACT is similar to the previous one except it calculates the necessary capacity with complete accuracy
- MH_INLINE_SIZED_EXACT calculates the required storage upfront and directly maintains its byte[] to store the concatenation result. This strategy is inline because it replicates what StringBuilder does internally
The default strategy is MH_INLINE_SIZED_EXACT. However, we can change this strategy using the -Djava.lang.invoke.stringConcat=<strategyName> system property.
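The idea behind an inline, exactly sized strategy can be illustrated in plain Java: work out the final length first, allocate a single buffer of that size, and copy every part into it. The sketch below is a deliberate simplification and not the JDK's implementation; it uses a char[] and converts the int to a String for brevity, and all names are hypothetical:

```java
// Simplified illustration of "compute the exact size, then fill one buffer".
// This is NOT the JDK's code, only the general idea.
class SizedExactSketch {

    // Assumes s is non-null; the real concatenation would render null as "null".
    static String concat(String s, int i) {
        String digits = Integer.toString(i);

        // Exact size is known upfront, so the buffer never needs to grow.
        char[] buffer = new char[s.length() + digits.length()];

        // Copy each part to its final position.
        s.getChars(0, s.length(), buffer, 0);
        digits.getChars(0, digits.length(), buffer, s.length());

        return new String(buffer);
    }

    public static void main(String[] args) {
        System.out.println(concat("The answer is ", 42)); // The answer is 42
    }
}
```

A StringBuilder with a default capacity, by contrast, may have to reallocate and copy its internal array as parts are appended.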
8. Conclusion
In this detailed article, we looked at how the new String concatenation is implemented and the advantages of using such an approach.
For an even more detailed discussion, it’s a good idea to check out the experimental notes or even the source code.