1. Introduction
This quick article focuses on JMH (the Java Microbenchmark Harness). First, we'll get familiar with the API and learn its basics. Then we'll look at a few best practices to consider when writing microbenchmarks.
Simply put, JMH takes care of things like JVM warm-up and code-optimization paths, making benchmarking as simple as possible.
2. Getting Started
To get started, we can actually keep working with Java 8 and simply define the dependencies:
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-core</artifactId>
<version>1.35</version>
</dependency>
<dependency>
<groupId>org.openjdk.jmh</groupId>
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.35</version>
</dependency>
The latest versions of the JMH Core and JMH Annotation Processor can be found in Maven Central.
Next, create a simple benchmark by utilizing the @Benchmark annotation (in any public class):
@Benchmark
public void init() {
// Do nothing
}
Then we add the main class that starts the benchmarking process:
public class BenchmarkRunner {
public static void main(String[] args) throws Exception {
org.openjdk.jmh.Main.main(args);
}
}
Now running BenchmarkRunner will execute our arguably somewhat useless benchmark. Once the run is complete, a summary table is presented:
# Run complete. Total time: 00:06:45
Benchmark Mode Cnt Score Error Units
BenchMark.init thrpt 200 3099210741.962 ± 17510507.589 ops/s
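Instead of delegating to org.openjdk.jmh.Main, we can also drive the run programmatically through JMH's Runner API. The sketch below assumes our benchmarks live in a class named BenchMark, as in the output above; the include pattern and fork count are illustrative choices, not requirements:
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
public class ProgrammaticRunner {
    public static void main(String[] args) throws Exception {
        // Build the run configuration in code instead of passing command-line arguments
        Options options = new OptionsBuilder()
          .include(BenchMark.class.getSimpleName()) // regex matched against benchmark names
          .forks(1)                                 // number of forked JVMs to run
          .build();
        new Runner(options).run();
    }
}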
3. Types of Benchmarks
JMH supports several benchmark modes: Throughput, AverageTime, SampleTime, and SingleShotTime. These can be configured via the @BenchmarkMode annotation:
@Benchmark
@BenchmarkMode(Mode.AverageTime)
public void init() {
// Do nothing
}
The resulting table will have an average time metric (instead of throughput):
# Run complete. Total time: 00:00:40
Benchmark Mode Cnt Score Error Units
BenchMark.init avgt 20 ≈ 10⁻⁹ s/op
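It's also possible to request several modes in a single run by passing an array to @BenchmarkMode, and to choose the reporting unit with @OutputTimeUnit. A short sketch; the chosen modes and unit are just examples:
@Benchmark
@BenchmarkMode({ Mode.Throughput, Mode.AverageTime })
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public void init() {
    // Do nothing; JMH reports both throughput and average time, in microseconds
}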
4. Configuring Warmup and Execution
By using the @Fork annotation, we can set up how the benchmark execution happens: the value parameter controls how many times the benchmark will be executed, and the warmups parameter controls how many times the benchmark will dry run before results are collected, for example:
@Benchmark
@Fork(value = 1, warmups = 2)
@BenchmarkMode(Mode.Throughput)
public void init() {
// Do nothing
}
This instructs JMH to run two warm-up forks and discard their results before moving on to the real timed benchmarking.
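Besides the fork and warmup-fork counts, @Fork can also pass extra flags to the forked JVMs via jvmArgsAppend, which is handy for pinning heap size or GC settings. The flags below are purely illustrative:
@Benchmark
@Fork(value = 1, warmups = 2, jvmArgsAppend = { "-Xms2g", "-Xmx2g" })
@BenchmarkMode(Mode.Throughput)
public void init() {
    // Do nothing; each forked JVM starts with a fixed 2 GB heap
}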
Also, the @Warmup annotation can be used to control the number of warmup iterations. For example, @Warmup(iterations = 5) tells JMH that five warm-up iterations will suffice, as opposed to the default 20.
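Both @Warmup and the companion @Measurement annotation also accept the number and length of iterations. A sketch with illustrative values:
@Benchmark
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@BenchmarkMode(Mode.Throughput)
public void init() {
    // Do nothing; 5 one-second warmup iterations, then 10 one-second measured iterations
}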
5. State
Let's now examine how a less trivial and more indicative task, benchmarking a hashing algorithm, can be performed by utilizing State. Suppose we decide to add extra protection against dictionary attacks on a password database by hashing the password a few hundred times.
We can explore the performance impact by using a State object:
@State(Scope.Benchmark)
public class ExecutionPlan {
@Param({ "100", "200", "300", "500", "1000" })
public int iterations;
public Hasher murmur3;
public String password = "4v3rys3kur3p455w0rd";
@Setup(Level.Invocation)
public void setUp() {
murmur3 = Hashing.murmur3_128().newHasher();
}
}
Our benchmark method will then look like this:
@Fork(value = 1, warmups = 1)
@Benchmark
@BenchmarkMode(Mode.Throughput)
public void benchMurmur3_128(ExecutionPlan plan) {
for (int i = plan.iterations; i > 0; i--) {
plan.murmur3.putString(plan.password, Charset.defaultCharset());
}
plan.murmur3.hash();
}
Here, the field iterations will be populated with the appropriate values from the @Param annotation by JMH when the state object is passed to the benchmark method. The @Setup annotated method is invoked before each invocation of the benchmark and creates a new Hasher, ensuring isolation.
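Level.Invocation gives the strongest isolation but also adds per-call overhead; @Setup (and its counterpart @TearDown) also accept Level.Trial (once per fork) and Level.Iteration (once per iteration). A minimal sketch of the coarser lifecycle, using a hypothetical state class purely for illustration:
@State(Scope.Benchmark)
public class LifecyclePlan {
    public StringBuilder buffer;
    // Level.Trial -> once per fork; Level.Iteration -> once per iteration;
    // Level.Invocation -> before every single benchmark call (as used above)
    @Setup(Level.Iteration)
    public void setUp() {
        buffer = new StringBuilder();
    }
    @TearDown(Level.Iteration)
    public void tearDown() {
        buffer = null;
    }
}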
When the execution is finished, we’ll get a result similar to the one below:
# Run complete. Total time: 00:06:47
Benchmark (iterations) Mode Cnt Score Error Units
BenchMark.benchMurmur3_128 100 thrpt 20 92463.622 ± 1672.227 ops/s
BenchMark.benchMurmur3_128 200 thrpt 20 39737.532 ± 5294.200 ops/s
BenchMark.benchMurmur3_128 300 thrpt 20 30381.144 ± 614.500 ops/s
BenchMark.benchMurmur3_128 500 thrpt 20 18315.211 ± 222.534 ops/s
BenchMark.benchMurmur3_128 1000 thrpt 20 8960.008 ± 658.524 ops/s
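One more knob worth mentioning is the scope passed to @State. Scope.Benchmark shares a single state instance across all benchmark threads, while Scope.Thread gives each thread its own copy (there is also Scope.Group for thread groups). A sketch of the per-thread variant, with a hypothetical counter field for illustration:
// Each benchmark thread gets its own instance, so there is no contention on the field
@State(Scope.Thread)
public class PerThreadState {
    public long counter;
    @Setup(Level.Iteration)
    public void reset() {
        counter = 0;
    }
}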
6. Dead Code Elimination
When running microbenchmarks, it’s very important to be aware of optimizations. Otherwise, they may affect the benchmark results in a very misleading way.
To make matters a bit more concrete, let’s consider an example:
@Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public void doNothing() {
}
@Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public void objectCreation() {
new Object();
}
We expect object allocation to cost more than doing nothing at all. However, if we run the benchmarks:
Benchmark Mode Cnt Score Error Units
BenchMark.doNothing avgt 40 0.609 ± 0.006 ns/op
BenchMark.objectCreation avgt 40 0.613 ± 0.007 ns/op
Apparently, finding a place in the TLAB and then creating and initializing an object is almost free! Just by looking at these numbers, we should know that something does not quite add up here.
Here, we’re the victim of dead code elimination. Compilers are very good at optimizing away the redundant code. As a matter of fact, that’s exactly what the JIT compiler did here.
In order to prevent this optimization, we should somehow trick the compiler and make it think that the code is used by some other component. One way to achieve this is just to return the created object:
@Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public Object pillarsOfCreation() {
return new Object();
}
Also, we can let the Blackhole consume it:
@Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public void blackHole(Blackhole blackhole) {
blackhole.consume(new Object());
}
Having Blackhole consume the object is a way to convince the JIT compiler not to apply the dead code elimination optimization. Anyway, if we run these benchmarks again, the numbers will make more sense:
Benchmark Mode Cnt Score Error Units
BenchMark.blackHole avgt 20 4.126 ± 0.173 ns/op
BenchMark.doNothing avgt 20 0.639 ± 0.012 ns/op
BenchMark.objectCreation avgt 20 0.635 ± 0.011 ns/op
BenchMark.pillarsOfCreation avgt 20 4.061 ± 0.037 ns/op
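As a side note, Blackhole also exposes a static consumeCPU(tokens) helper that burns a roughly proportional amount of CPU, which is useful for simulating work of a known size. A small sketch with an arbitrary token count:
@Benchmark
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public void simulatedWork() {
    // Burns a deterministic amount of CPU; 64 tokens is an arbitrary choice
    Blackhole.consumeCPU(64);
}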
7. Constant Folding
Let’s consider yet another example:
@Benchmark
public double foldedLog() {
int x = 8;
return Math.log(x);
}
Calculations based on constants may return the exact same output, regardless of the number of executions. Therefore, there is a pretty good chance that the JIT compiler will replace the logarithm function call with its result:
@Benchmark
public double foldedLog() {
return 2.0794415416798357;
}
This form of partial evaluation is called constant folding. In this case, constant folding completely avoids the Math.log call, which was the whole point of the benchmark.
In order to prevent constant folding, we can encapsulate the constant state inside a state object:
@State(Scope.Benchmark)
public static class Log {
public int x = 8;
}
@Benchmark
public double log(Log input) {
return Math.log(input.x);
}
If we run these benchmarks against each other:
Benchmark Mode Cnt Score Error Units
BenchMark.foldedLog thrpt 20 449313097.433 ± 11850214.900 ops/s
BenchMark.log thrpt 20 35317997.064 ± 604370.461 ops/s
Apparently, the log benchmark is doing some serious work compared to the foldedLog, which is sensible.
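Beyond restructuring the code, JMH also offers the @CompilerControl annotation for steering the JIT directly, for example to forbid inlining while investigating such effects. This is a blunt instrument rather than a replacement for the state-object fix above; a sketch with a hypothetical method name:
@Benchmark
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
public double logNoInline(Log input) {
    // Same benchmark as above, but the method itself won't be inlined by the JIT
    return Math.log(input.x);
}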
8. Conclusion
This tutorial focused on and showcased JMH, Java's microbenchmarking harness.
As always, code examples can be found on GitHub.