1. Overview
Generally speaking, the Java documentation strongly discourages serializing lambda expressions. That’s because a lambda expression generates synthetic constructs, and these constructs have several potential problems: they have no corresponding construct in the source code, they vary among Java compiler implementations, and they may be incompatible with a different JRE implementation. However, sometimes serializing a lambda is necessary.
In this tutorial, we’re going to explain how to serialize a lambda expression and the mechanism behind it.
2. Lambda and Serialization
When we use Java Serialization to serialize or deserialize an object, its class and all of its non-static, non-transient fields must be serializable. Otherwise, a NotSerializableException is thrown. Likewise, when serializing a lambda expression, we must make sure its target type and captured arguments are serializable.
2.1. A Failed Lambda Serialization
In the source file, let’s use the Runnable interface to construct a lambda expression:
public class NotSerializableLambdaExpression {
    public static Object getLambdaExpressionObject() {
        Runnable r = () -> System.out.println("please serialize this message");
        return r;
    }
}
When we try to serialize this Runnable object, we get a NotSerializableException. Before going on, let’s look at why.
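To see the failure for ourselves, here is a minimal sketch that writes the lambda to an in-memory ObjectOutputStream. The class FailedLambdaSerializationDemo and its helper methods are hypothetical names introduced only for this illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the tutorial's source code.
class FailedLambdaSerializationDemo {

    // Same lambda as in NotSerializableLambdaExpression: no Serializable target type.
    static Runnable plainLambda() {
        return () -> System.out.println("please serialize this message");
    }

    // Returns true if Java Serialization rejects obj with a NotSerializableException.
    static boolean isRejected(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + isRejected(plainLambda()));
    }
}
```

The helper reports the outcome as a boolean instead of letting the exception propagate, which makes it easy to compare the lambda with an ordinary serializable object such as a String.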
When the JVM first evaluates a lambda expression at runtime, it uses the JDK’s built-in copy of ASM to generate an inner class. So, what does this inner class look like? We can dump the generated class by specifying the jdk.internal.lambda.dumpProxyClasses property on the command line:
-Djdk.internal.lambda.dumpProxyClasses=<dump directory>
Be careful here: when we replace <dump directory> with our target directory, it’s best to use an empty directory, because the JVM may dump quite a few unexpected generated classes if our project depends on third-party libraries.
After dumping, we can inspect the generated inner class with a Java decompiler.
The decompiled class shows that the generated inner class implements only the Runnable interface, which is the lambda expression’s target type. Also, its run method invokes the NotSerializableLambdaExpression.lambda$getLambdaExpressionObject$0 method, which is generated by the Java compiler and holds our lambda expression’s implementation.
Because this generated inner class is our lambda expression’s actual class, and it doesn’t implement the Serializable interface, the lambda expression can’t be serialized.
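We can also check this at runtime without dumping anything. The following is a small sketch, assuming a hypothetical LambdaClassInspection class, that prints the lambda’s runtime class and the interfaces it implements:

```java
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical demo class; not part of the tutorial's source code.
class LambdaClassInspection {

    static Runnable plainLambda() {
        return () -> System.out.println("please serialize this message");
    }

    // True only if the object's runtime class implements Serializable.
    static boolean implementsSerializable(Object obj) {
        return obj instanceof Serializable;
    }

    public static void main(String[] args) {
        Runnable r = plainLambda();
        // The class name is generated at runtime and differs between JVMs and runs.
        System.out.println("runtime class: " + r.getClass().getName());
        System.out.println("interfaces:    " + Arrays.toString(r.getClass().getInterfaces()));
        System.out.println("serializable:  " + implementsSerializable(r));
    }
}
```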
2.2. How to Serialize Lambda
At this point, the question becomes: how do we add the Serializable interface to the generated inner class? The answer is to cast the lambda expression to an intersection type that combines the functional interface and the Serializable interface.
For example, let’s combine the Runnable and Serializable into an intersection type:
Runnable r = (Runnable & Serializable) () -> System.out.println("please serialize this message");
Now, if we try to serialize the above Runnable object, it will succeed.
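As a rough illustration, the sketch below serializes such a lambda to a byte array and reads it back. The class SerializableLambdaRoundTrip and its helper methods are hypothetical names used only for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the tutorial's source code.
class SerializableLambdaRoundTrip {

    // The intersection cast makes the lambda's target type serializable.
    static Runnable serializableLambda() {
        return (Runnable & Serializable) () -> System.out.println("please serialize this message");
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Runnable copy = (Runnable) deserialize(serialize(serializableLambda()));
        copy.run(); // runs the deserialized lambda
    }
}
```

Note that deserialization only works when the class that declared the lambda is available on the reading side, since the lambda body lives in that class.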
However, if we do this often, it can introduce a lot of boilerplate. To keep the code clean, we can define a new interface that extends both Runnable and Serializable:
interface SerializableRunnable extends Runnable, Serializable {
}
Then we can use it:
SerializableRunnable obj = () -> System.out.println("please serialize this message");
But we should also be careful not to capture any non-serializable arguments. For example, let’s define another interface:
interface SerializableConsumer<T> extends Consumer<T>, Serializable {
}
Then we might choose System.out::println as its implementation:
SerializableConsumer<String> obj = System.out::println;
Serializing this object leads to a NotSerializableException. That’s because the method reference captures the value of System.out as an argument, and its class, PrintStream, isn’t serializable.
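To make the difference concrete, here is a sketch that compares the capturing method reference with a lambda that captures nothing. The class CapturedArgumentDemo and its methods are hypothetical, and SerializableConsumer is redeclared as a nested interface only to keep the example self-contained:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical demo class; not part of the tutorial's source code.
class CapturedArgumentDemo {

    interface SerializableConsumer<T> extends Consumer<T>, Serializable {
    }

    // Captures the current PrintStream instance held in System.out.
    static SerializableConsumer<String> capturingPrintStream() {
        return System.out::println;
    }

    // Captures nothing: System.out is looked up each time the lambda runs.
    static SerializableConsumer<String> nonCapturing() {
        return message -> System.out.println(message);
    }

    // Returns false if serialization fails with a NotSerializableException.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("method reference: " + canSerialize(capturingPrintStream()));
        System.out.println("plain lambda:     " + canSerialize(nonCapturing()));
    }
}
```

Rewriting the method reference as a lambda is one way to avoid the captured PrintStream, at the cost of slightly more verbose code.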
3. The Underlying Mechanism
At this point, we may be wondering: What happens underneath after we introduce an intersection type?
To have a basis for discussion, let’s prepare another piece of code:
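import java.io.Serializable;

public class SerializableLambdaExpression {
    public static Object getLambdaExpressionObject() {
        // The intersection cast makes the generated class implement Serializable as well
        Runnable r = (Runnable & Serializable) () -> System.out.println("please serialize this message");
        return r;
    }
}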
3.1. The Compiled Class File
After compiling, we can use javap to inspect the compiled class:
javap -v -p SerializableLambdaExpression.class
The -v option will print verbose messages, and the -p option will display private methods.
In the output, we can see that the Java compiler has added a $deserializeLambda$ method, which accepts a SerializedLambda parameter.
For readability, let’s decompile the above bytecode into Java code:
The main responsibility of the $deserializeLambda$ method is to construct an object. First, it compares the values returned by the SerializedLambda’s getter methods (getImplMethodName, getFunctionalInterfaceClass, and so on) with the details of the lambda expression. Then, if all conditions are met, it uses the SerializableLambdaExpression::lambda$getLambdaExpressionObject$36ab28bd$1 method reference to create an instance. Otherwise, it throws an IllegalArgumentException.
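Besides javap, reflection can confirm that the compiler added this method. The sketch below assumes a hypothetical DeserializeLambdaLookup class that contains one serializable lambda, and looks the method up by its name and parameter type:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical demo class; not part of the tutorial's source code.
class DeserializeLambdaLookup {

    // A serializable lambda in this class makes the compiler emit $deserializeLambda$ here.
    static Runnable serializableLambda() {
        return (Runnable & Serializable) () -> System.out.println("please serialize this message");
    }

    // Returns the compiler-generated method, or null if the class has none.
    static Method findDeserializeLambda(Class<?> capturingClass) {
        try {
            return capturingClass.getDeclaredMethod("$deserializeLambda$", SerializedLambda.class);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Method method = findDeserializeLambda(DeserializeLambdaLookup.class);
        System.out.println("found:     " + (method != null));
        if (method != null) {
            System.out.println("modifiers: " + Modifier.toString(method.getModifiers()));
            System.out.println("synthetic: " + method.isSynthetic());
        }
    }
}
```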
3.2. The Generated Inner Class
Besides inspecting the compiled class file, we also need to inspect the newly generated inner class. So, let’s use the jdk.internal.lambda.dumpProxyClasses property to dump the generated inner class:
The dumped class shows that the newly generated inner class implements both the Runnable and Serializable interfaces, which means it’s suitable for serialization. It also provides an extra writeReplace method, which returns a SerializedLambda instance describing the lambda expression’s implementation details.
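For a quick look at that SerializedLambda instance, we can call writeReplace reflectively. This is only a sketch: the class WriteReplaceInspection is hypothetical, and the code relies on the generated class exposing a private writeReplace method, which is an implementation detail rather than a public API:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

// Hypothetical demo class; not part of the tutorial's source code.
class WriteReplaceInspection {

    static Runnable serializableLambda() {
        return (Runnable & Serializable) () -> System.out.println("please serialize this message");
    }

    // Assumption: the generated lambda class declares a private writeReplace() method.
    static SerializedLambda describe(Object lambda) {
        try {
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            return (SerializedLambda) writeReplace.invoke(lambda);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Not a serializable lambda", e);
        }
    }

    public static void main(String[] args) {
        SerializedLambda form = describe(serializableLambda());
        System.out.println("capturing class:      " + form.getCapturingClass());
        System.out.println("functional interface: " + form.getFunctionalInterfaceClass());
        System.out.println("interface method:     " + form.getFunctionalInterfaceMethodName());
        System.out.println("impl method:          " + form.getImplMethodName());
        System.out.println("captured args:        " + form.getCapturedArgCount());
    }
}
```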
To form a closed loop, there is one more thing missing: the serialized lambda file.
3.3. The Serialized Lambda File
As the serialized lambda file is stored in binary format, we can use a hex tool to check its contents:
In the serialized stream, the hex “AC ED” (“rO0” in Base64) is the stream magic number, and the hex “00 05” is the stream version. The remaining data, however, isn’t human-readable.
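The header bytes can also be checked in code instead of a hex tool. Here is a minimal sketch, assuming a hypothetical StreamHeaderCheck class that serializes the lambda into a byte array rather than a file:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical demo class; not part of the tutorial's source code.
class StreamHeaderCheck {

    static Runnable serializableLambda() {
        return (Runnable & Serializable) () -> System.out.println("please serialize this message");
    }

    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Formats the first four bytes (magic number and version) as hex pairs.
    static String headerHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(serializableLambda());
        System.out.println("header: " + headerHex(bytes));
        System.out.println("base64: " + Base64.getEncoder().encodeToString(bytes).substring(0, 3));
    }
}
```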
According to the Object Serialization Stream Protocol, the remaining data can be interpreted:
Interpreted this way, the serialized lambda file actually contains the data of the SerializedLambda class. To be specific, it contains 10 fields and their corresponding values. These fields and values are the bridge between the writeReplace method in the generated inner class and the $deserializeLambda$ method in the compiled class file.
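The field list can be cross-checked against the class descriptor that Java Serialization itself uses. The following sketch, with a hypothetical SerializedLambdaFields class, asks ObjectStreamClass for the serializable fields of SerializedLambda:

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.lang.invoke.SerializedLambda;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the tutorial's source code.
class SerializedLambdaFields {

    // Names of the fields that are written to the stream for a SerializedLambda.
    static List<String> serializableFieldNames() {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(SerializedLambda.class);
        List<String> names = new ArrayList<>();
        for (ObjectStreamField field : descriptor.getFields()) {
            names.add(field.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = serializableFieldNames();
        System.out.println(names.size() + " fields: " + names);
    }
}
```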
3.4. Putting It All Together
Now, it’s time to put the different parts together:
When we use the ObjectOutputStream to serialize a lambda expression, the ObjectOutputStream will find the generated inner class contains a writeReplace method that returns a SerializedLambda instance. Then, the ObjectOutputStream will serialize this SerializedLambda instance instead of the original object.
Next, when we use the ObjectInputStream to deserialize the serialized lambda file, a SerializedLambda instance is created. Then, the ObjectInputStream will use this instance to invoke the readResolve defined in the SerializedLambda class. And, the readResolve method will invoke the $deserializeLambda$ method defined in the capturing class. Finally, we get the deserialized lambda expression.
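The two halves of this process can be imitated by hand, without any stream in between. The sketch below is only an approximation of what the streams do: the class ManualLambdaRoundTrip is hypothetical, the reflective call to writeReplace depends on an implementation detail of the generated class, and the direct call to $deserializeLambda$ stands in for the work that SerializedLambda’s readResolve method performs:

```java
import java.io.Serializable;
import java.lang.invoke.SerializedLambda;
import java.lang.reflect.Method;

// Hypothetical demo class; not part of the tutorial's source code.
class ManualLambdaRoundTrip {

    static Runnable serializableLambda() {
        return (Runnable & Serializable) () -> System.out.println("please serialize this message");
    }

    static Object rebuild(Object lambda) {
        try {
            // Serialization side: the generated class swaps itself for a SerializedLambda.
            Method writeReplace = lambda.getClass().getDeclaredMethod("writeReplace");
            writeReplace.setAccessible(true);
            SerializedLambda form = (SerializedLambda) writeReplace.invoke(lambda);

            // Deserialization side: the capturing class turns that description
            // back into a lambda object.
            Method deserialize = ManualLambdaRoundTrip.class
                    .getDeclaredMethod("$deserializeLambda$", SerializedLambda.class);
            deserialize.setAccessible(true);
            return deserialize.invoke(null, form);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Runnable rebuilt = (Runnable) rebuild(serializableLambda());
        rebuilt.run(); // runs the lambda rebuilt from its SerializedLambda form
    }
}
```

This is for exploration only; production code should go through ObjectOutputStream and ObjectInputStream rather than calling these generated methods directly.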
To summarize, the SerializedLambda class is the key to the lambda serialization process.
4. Conclusion
In this article, we first looked at a failed lambda serialization example and explained why it failed. Then, we introduced how to make a lambda expression serializable. Finally, we explored the underlying mechanism of lambda serialization.
As usual, the source code for this tutorial can be found over on GitHub.