A Guide to async-profiler

Last modified: August 5, 2020

1. Overview

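Java sampling profilers are usually built on the JVM Tool Interface (JVMTI) and collect stack traces at a safepoint. Therefore, these sampling profilers can suffer from the safepoint bias problem. A thread can only be sampled once it reaches a safepoint, so the collected stacks skew toward safepoint locations rather than toward the code that is actually consuming CPU time.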

For an accurate, holistic view of the application, we need a sampling profiler that doesn’t require threads to be at safepoints. Such a profiler can collect stack traces at any time and so avoids the safepoint bias problem.

In this tutorial, we’ll explore async-profiler along with various profiling techniques it offers.

2. async-profiler

async-profiler is a sampling profiler for any JDK based on the HotSpot JVM. It has low overhead and doesn’t rely on JVMTI’s safepoint-bound stack-walking functions.

It avoids the safepoint bias problem by using two sources of stack traces:

- the AsyncGetCallTrace API provided by the HotSpot JVM, to profile Java code paths
- Linux’s perf_events, to profile native code paths

In other words, the profiler combines the call stacks of Java and native code paths to produce accurate results.

3. Setup

3.1. Installation

First, we’ll download the latest release of async-profiler for our platform. At the time of writing, it supports only Linux and macOS.

Once it’s downloaded, we can check that it works on our platform:

$ ./profiler.sh --version
Async-profiler 1.7.1 built on May 14 2020
Copyright 2016-2020 Andrei Pangin

It’s always a good idea to check all the options available with async-profiler beforehand:

$ ./profiler.sh
Usage: ./profiler.sh [action] [options] <pid>
Actions:
  start             start profiling and return immediately
  resume            resume profiling without resetting collected data
  stop              stop profiling
  check             check if the specified profiling event is available
  status            print profiling status
  list              list profiling events supported by the target JVM
  collect           collect profile for the specified period of time
                    and then stop (default action)
Options:
  -e event          profiling event: cpu|alloc|lock|cache-misses etc.
  -d duration       run profiling for <duration> seconds
  -f filename       dump output to <filename>
  -i interval       sampling interval in nanoseconds
  -j jstackdepth    maximum Java stack depth
  -b bufsize        frame buffer size
  -t                profile different threads separately
  -s                simple class names instead of FQN
  -g                print method signatures
  -a                annotate Java method names
  -o fmt            output format: summary|traces|flat|collapsed|svg|tree|jfr
  -I include        output only stack traces containing the specified pattern
  -X exclude        exclude stack traces with the specified pattern
  -v, --version     display version string

  --title string    SVG title
  --width px        SVG width
  --height px       SVG frame height
  --minwidth px     skip frames smaller than px
  --reverse         generate stack-reversed FlameGraph / Call tree

  --all-kernel      only include kernel-mode events
  --all-user        only include user-mode events
  --cstack mode     how to traverse C stack: fp|lbr|no

<pid> is a numeric process ID of the target JVM
      or 'jps' keyword to find running JVM automatically

Many of these options will come in handy in later sections.

3.2. Kernel Configuration

When using async-profiler on Linux, we should configure the kernel so that all users can capture call stacks using perf_events.

First, we’ll set perf_event_paranoid to 1, which allows the profiler to collect performance information:

$ sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

Then, we’ll set kptr_restrict to 0. This removes the restriction on exposing kernel addresses, so kernel frames can be resolved to symbol names:

$ sudo sh -c 'echo 0 >/proc/sys/kernel/kptr_restrict'
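
Writes to /proc/sys only last until the next reboot, so it can help to verify both settings before a profiling session. The following Java sketch reads the two values and compares them with the ones recommended above. The class and method names are hypothetical. Treating any perf_event_paranoid value at or below 1 as sufficient is our own assumption, based on lower values being less restrictive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;

public class KernelCheck {

    // Reads a single integer from a sysctl file under /proc/sys.
    // Returns empty if the file is missing or unreadable (e.g. on macOS).
    public static OptionalInt readSetting(Path file) {
        try {
            String text = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
            return OptionalInt.of(Integer.parseInt(text));
        } catch (IOException | NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Lower perf_event_paranoid values are less restrictive; the command above sets 1.
    // (Assumption: anything at or below 1 is good enough.)
    public static boolean paranoidOk(int value) {
        return value <= 1;
    }

    // kptr_restrict must be 0 for kernel addresses to be visible to unprivileged users.
    public static boolean kptrOk(int value) {
        return value == 0;
    }

    public static void main(String[] args) {
        OptionalInt paranoid = readSetting(Paths.get("/proc/sys/kernel/perf_event_paranoid"));
        OptionalInt kptr = readSetting(Paths.get("/proc/sys/kernel/kptr_restrict"));
        if (!paranoid.isPresent() || !kptr.isPresent()) {
            System.out.println("Kernel settings not readable; probably not running on Linux");
            return;
        }
        report("perf_event_paranoid", paranoid.getAsInt(), paranoidOk(paranoid.getAsInt()), "1");
        report("kptr_restrict", kptr.getAsInt(), kptrOk(kptr.getAsInt()), "0");
    }

    private static void report(String name, int value, boolean ok, String recommended) {
        System.out.println(name + " = " + value + (ok ? " (OK)" : " (recommended: " + recommended + ")"));
    }
}
```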

On macOS, however, async-profiler works without any such kernel configuration.

Now that our platform is ready, we can build the application we want to profile and run it with the java command:

$ java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -jar path-to-jar-file

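Here, we’ve started the application with the -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints JVM flags, which are highly recommended for accurate results. DebugNonSafepoints is a diagnostic flag, so it has to be unlocked first. Without it, HotSpot records debug information only at safepoints, and simple inlined methods may not show up in the profile.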
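
We can double-check that a running application was actually launched with these flags by inspecting the JVM’s input arguments through the standard RuntimeMXBean. This is a minimal sketch with hypothetical names. It only matches the exact flag strings, so it’s a rough sanity check rather than a reliable reading of the JVM’s effective settings.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ProfilingFlagsCheck {

    // The two flags recommended above.
    private static final List<String> RECOMMENDED = Arrays.asList(
            "-XX:+UnlockDiagnosticVMOptions",
            "-XX:+DebugNonSafepoints");

    // Returns the recommended flags that don't appear verbatim in the given JVM arguments.
    public static List<String> missingFlags(List<String> jvmArgs) {
        return RECOMMENDED.stream()
                .filter(flag -> !jvmArgs.contains(flag))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // JVM options this process was started with (not the program's own arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missingFlags(jvmArgs);
        System.out.println(missing.isEmpty()
                ? "Recommended profiling flags are present"
                : "Missing profiling flags: " + missing);
    }
}
```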

Now that we’re ready to profile our application, let’s explore the various types of profiling that async-profiler supports.
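
Any long-running JVM can serve as the target. This sketch is a hypothetical stand-in for the application used in the following sections. It burns CPU with deliberately naive prime counting for a configurable number of seconds. It also prints its own PID via ProcessHandle (Java 9+), which we’d pass to profiler.sh in place of the example PIDs in the commands below.

```java
import java.util.concurrent.TimeUnit;

public class ProfilingTarget {

    // Deliberately naive trial division so that the CPU profile has an obvious hot spot.
    public static int countPrimes(int limit) {
        int count = 0;
        for (int n = 2; n < limit; n++) {
            boolean prime = true;
            for (int d = 2; (long) d * d <= n; d++) {
                if (n % d == 0) {
                    prime = false;
                    break;
                }
            }
            if (prime) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Pass this PID to profiler.sh instead of the example PIDs used in this article.
        System.out.println("PID: " + ProcessHandle.current().pid());

        // Hypothetical default: run for 120 seconds unless a duration is given as the first argument.
        long seconds = args.length > 0 ? Long.parseLong(args[0]) : 120;
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(seconds);
        while (System.nanoTime() - deadline < 0) {
            System.out.println("primes below 1,000,000: " + countPrimes(1_000_000));
        }
    }
}
```

For example, after compiling it, java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints ProfilingTarget 120 keeps the JVM busy for two minutes. That is long enough for a 30-second profiling run.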

4. CPU Profiling

When profiling CPU, async-profiler collects sampled stack traces that include not only Java methods but also JVM code, native functions, and kernel functions.

Let’s profile our application using its PID:

$ ./profiler.sh -e cpu -d 30 -o summary 66959
Started [cpu] profiling
--- Execution profile --- 
Total samples       : 28

Frame buffer usage  : 0.069%

Here, we’ve selected the cpu profiling event with the -e option and used the -d <duration> option to collect samples for 30 seconds. The last argument, 66959, is the PID of the target JVM.
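
We may want to trigger the same kind of collection from a Java-based tool or test harness. One option is to shell out to profiler.sh with ProcessBuilder. The sketch below uses only the -e, -d, and -f options from the usage listing. The script location and the class name are assumptions about our local setup.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProfilerCommand {

    // Builds "<script> -e <event> -d <seconds> -f <file> <pid>", mirroring the commands in this section.
    public static List<String> build(String script, String event, int seconds, String outputFile, long pid) {
        if (seconds <= 0) {
            throw new IllegalArgumentException("duration must be positive: " + seconds);
        }
        List<String> command = new ArrayList<>();
        command.add(script);
        command.add("-e");
        command.add(event);
        command.add("-d");
        command.add(String.valueOf(seconds));
        command.add("-f");
        command.add(outputFile);
        command.add(String.valueOf(pid)); // the target JVM goes last, as in the usage line
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            System.out.println("usage: java ProfilerCommand <pid>");
            return;
        }
        // Assumes profiler.sh sits in the current working directory.
        List<String> command = build("./profiler.sh", "cpu", 30, "cpu_profile.html", Long.parseLong(args[0]));
        System.out.println("Running: " + String.join(" ", command));
        // inheritIO() forwards the profiler's own output (e.g. "Started [cpu] profiling") to our console.
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```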

Finally, the -o option defines the output format: summary, traces, flat, collapsed, svg, tree, or jfr.
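
To get a feel for what the flat view reports, we can aggregate a profile in collapsed format ourselves. In that format, each line is one unique stack, followed by a space and the sample count. The frames run from root to leaf, separated by semicolons. This hypothetical sketch sums the samples per leaf frame, which approximates the self time that the flat format ranks. It isn’t how async-profiler computes the flat output internally.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlatProfile {

    // Sums samples per leaf frame (the last frame of each stack) and sorts hottest first;
    // ties are broken by frame name so the order is deterministic.
    public static List<Map.Entry<String, Long>> selfCounts(List<String> collapsedLines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : collapsedLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The count follows the last space; frame names themselves may contain spaces.
            int space = trimmed.lastIndexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("missing sample count: " + line);
            }
            String stack = trimmed.substring(0, space);
            long samples = Long.parseLong(trimmed.substring(space + 1));
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            counts.merge(leaf, samples, Long::sum);
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FlatProfile <collapsed-profile-file>");
            return;
        }
        // Print the ten hottest frames by self samples.
        selfCounts(Files.readAllLines(Paths.get(args[0]))).stream()
                .limit(10)
                .forEach(e -> System.out.printf("%8d  %s%n", e.getValue(), e.getKey()));
    }
}
```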

Let’s create the HTML output while CPU profiling our application:

$ ./profiler.sh -e cpu -d 30 -f cpu_profile.html 66959

The resulting HTML output lets us expand, collapse, and search the samples.

Additionally, async-profiler supports flame graphs out of the box.

Let’s generate a flame graph of our application’s CPU profile by giving the output file the .svg extension:

$ ./profiler.sh -e cpu -d 30 -f cpu_profile.svg 66959

Here, the resulting flame graph shows Java code paths in green, C++ in yellow, and system code paths in red.
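
Conceptually, a flame graph merges all sampled stacks into a tree. Stacks that share a prefix share the same boxes, and each box is as wide as the number of samples passing through it. The following sketch builds that tree from collapsed-format lines and prints it as an indented outline with percentages. The class names, the alphabetical ordering of callees, and the synthetic "all" root are our own choices for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class FlameTree {

    // One box of the flame graph: a frame, the samples passing through it, and its callees.
    public static final class Node {
        private final String name;
        private long total;
        // TreeMap keeps callees in a stable, alphabetical order.
        private final Map<String, Node> children = new TreeMap<>();

        Node(String name) {
            this.name = name;
        }

        public long total() {
            return total;
        }

        public Node child(String frame) {
            return children.get(frame);
        }
    }

    // Merges collapsed stacks ("root;...;leaf count") into one tree. Every node on a stack's
    // path gets that stack's samples, so a node's total corresponds to its flame graph width.
    public static Node build(List<String> collapsedLines) {
        Node root = new Node("all"); // synthetic root spanning all samples
        for (String line : collapsedLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int space = trimmed.lastIndexOf(' ');
            if (space < 0) {
                throw new IllegalArgumentException("missing sample count: " + line);
            }
            long samples = Long.parseLong(trimmed.substring(space + 1));
            root.total += samples;
            Node node = root;
            for (String frame : trimmed.substring(0, space).split(";")) {
                node = node.children.computeIfAbsent(frame, Node::new);
                node.total += samples;
            }
        }
        return root;
    }

    // Renders the tree as an indented outline with each frame's share of all samples.
    public static String render(Node node, long grandTotal, int depth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(String.format(Locale.ROOT, "%5.1f%% %s%n", 100.0 * node.total / grandTotal, node.name));
        for (Node child : node.children.values()) {
            out.append(render(child, grandTotal, depth + 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical stacks, e.g. lines from a profile written in collapsed format.
        Node root = build(Arrays.asList(
                "main;work;compute 5",
                "main;work;io 2",
                "main;idle 3"));
        System.out.print(render(root, root.total(), 0));
    }
}
```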

5. Allocation Profiling

Similarly, we can collect samples of memory allocation without using an intrusive technique like bytecode instrumentation.

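async-profiler uses TLAB-driven (Thread Local Allocation Buffer) sampling. Instead of recording every allocation, it records only objects allocated in a newly created TLAB or outside a TLAB. As a result, it takes roughly one sample per average TLAB size of allocated memory.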
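
The effect of this sampling scheme can be illustrated with a simplified single-thread model. In this hypothetical sketch, allocations that fit into the current TLAB are invisible. An allocation that needs a fresh TLAB is recorded with the TLAB size as its weight, and one that doesn’t fit into any TLAB is recorded with its own size. The fixed TLAB size and this weighting are assumptions of the model. They are meant only to show why sites that allocate more bytes tend to collect proportionally more samples.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TlabSamplingModel {

    // One allocation request: the code site making it and its size in bytes.
    public static final class Allocation {
        final String site;
        final long bytes;

        public Allocation(String site, long bytes) {
            this.site = site;
            this.bytes = bytes;
        }
    }

    // Returns estimated bytes per site as a TLAB-driven sampler would see them,
    // for a single thread with a fixed TLAB size (a simplifying assumption).
    public static Map<String, Long> sample(List<Allocation> allocations, long tlabSize) {
        Map<String, Long> estimated = new LinkedHashMap<>();
        long remaining = 0; // start without a TLAB
        for (Allocation a : allocations) {
            if (a.bytes > tlabSize) {
                // Too big for any TLAB: allocated outside it and recorded with its own size.
                estimated.merge(a.site, a.bytes, Long::sum);
            } else if (a.bytes > remaining) {
                // Current TLAB exhausted: a new one is handed out and this allocation is
                // recorded, standing in for the whole TLAB's worth of bytes.
                estimated.merge(a.site, tlabSize, Long::sum);
                remaining = tlabSize - a.bytes;
            } else {
                // Fast path: bump-pointer allocation inside the TLAB, not recorded at all.
                remaining -= a.bytes;
            }
        }
        return estimated;
    }

    public static void main(String[] args) {
        // Hypothetical workload: a hot site allocating 64-byte objects and a colder one.
        List<Allocation> allocations = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            allocations.add(new Allocation("clone", 64));
            if (i % 10 == 0) {
                allocations.add(new Allocation("toString", 32));
            }
        }
        // Compare with the actual totals: clone 640,000 bytes, toString 32,000 bytes.
        System.out.println(sample(allocations, 4096));
    }
}
```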

By using the alloc event, we can make the profiler collect heap allocation samples from our application:

$ ./profiler.sh -e alloc -d 30 -f alloc_profile.svg 66255

In the resulting flame graph, object cloning accounts for a large share of the allocated memory, which would be hard to spot just by reading the code.
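
We don’t have the code behind that profile. As a hypothetical illustration, though, a defensive-copy getter is a typical way for cloning to hide in code that looks cheap. The main method uses the HotSpot-specific com.sun.management.ThreadMXBean extension to measure how many bytes the loop allocates on the current thread.

```java
import java.lang.management.ManagementFactory;

public class HiddenCloneAllocation {

    // Hypothetical value holder with a defensive-copy getter.
    public static final class Readings {
        private final double[] values;

        public Readings(int size) {
            this.values = new double[size];
        }

        // Looks like a cheap accessor, but returns a fresh copy of the whole array each time.
        public double[] values() {
            return values.clone();
        }
    }

    // Reads one element per iteration; each read silently clones the full array.
    public static double sumFirst(Readings readings, int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += readings.values()[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        // HotSpot-specific extension of the standard ThreadMXBean that reports allocated bytes.
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long threadId = Thread.currentThread().getId();
        Readings readings = new Readings(1_000);

        long before = threads.getThreadAllocatedBytes(threadId);
        sumFirst(readings, 10_000);
        long after = threads.getThreadAllocatedBytes(threadId);

        // Each iteration copies 1,000 doubles, i.e. 8,000 bytes of array payload.
        System.out.println("Allocated about " + (after - before) / (1024 * 1024) + " MiB in the loop");
    }
}
```

In an alloc flame graph of such code, the samples would show up under the clone call inside the getter rather than anywhere in the loop body itself.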

6. Wall-Clock Profiling

Also, with wall-clock profiling, async-profiler can sample all threads regardless of their state, whether they are running, sleeping, or blocked.

This can prove handy when troubleshooting slow application start-up, where threads often spend much of their time waiting rather than running on a CPU.
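
CPU profiling misses such problems because a cpu sample only captures threads that are actually running. A thread that’s sleeping, waiting on I/O, or blocked on a lock still accumulates wall-clock time. This sketch makes the gap visible with the standard ThreadMXBean. It simulates a waiting start-up step with Thread.sleep, a hypothetical stand-in for real I/O or lock waits, and compares elapsed wall time with the thread’s CPU time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

public class WallVsCpuTime {

    // Runs the task on the current thread and returns {wall-clock nanos, CPU nanos}.
    // getCurrentThreadCpuTime() returns -1 if CPU time measurement is disabled,
    // in which case the CPU delta comes out as 0.
    public static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        task.run();
        long cpuEnd = threads.getCurrentThreadCpuTime();
        long wallEnd = System.nanoTime();
        return new long[] {wallEnd - wallStart, cpuEnd - cpuStart};
    }

    public static void main(String[] args) {
        // Hypothetical start-up step that mostly waits; sleep stands in for I/O or lock waits.
        long[] times = measure(() -> {
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("wall: %d ms, cpu: %d ms%n",
                TimeUnit.NANOSECONDS.toMillis(times[0]),
                TimeUnit.NANOSECONDS.toMillis(times[1]));
    }
}
```

We’d expect a cpu profile of such a step to show almost nothing. A wall profile, by contrast, keeps sampling the sleeping thread and attributes the wait to its stack.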

By specifying the wall event, we can configure the profiler to collect samples of all threads:

$ ./profiler.sh -e wall -t -d 30 -f wall_clock_profile.svg 66959

Here, we’ve used the wall-clock profiler in per-thread mode by using the -t option, which is highly recommended when profiling all threads.

Additionally, we can check all profiling events supported by our JVM by using the list action:

$ ./profiler.sh list 66959
Basic events:
  cpu
  alloc
  lock
  wall
  itimer
Java method calls:
  ClassName.methodName

7. async-profiler With IntelliJ IDEA

IntelliJ IDEA integrates async-profiler as one of its profiling tools for Java.

7.1. Profiler Configurations

We can configure async-profiler in IntelliJ IDEA by selecting the Java Profiler menu option under Settings/Preferences > Build, Execution, Deployment.

For quick use, we can also pick one of the predefined configurations that IntelliJ IDEA offers, such as the CPU Profiler or the Allocation Profiler.

Similarly, we can copy a profiler template and edit the Agent options for specific use cases.

7.2. Profiling an Application Using IntelliJ IDEA

There are a few ways to analyze our application with a profiler.

For instance, we can select the application and choose the Run <application name> with <profiler configuration name> option.

Or, we can click on the toolbar and choose the Run <application name> with <profiler configuration name> option there.

Or, we can choose the Run with Profiler option under the Run menu and then select the <profiler configuration name>.

Additionally, the Run menu offers an Attach Profiler to Process option, which opens a dialog that lets us choose the process to attach to.

Once our application is profiled, we can analyze the profiling result using the Profiler tool window bar at the bottom of the IDE.

The profiling results are shown per thread in several views, such as a flame graph, a call tree, and a method list.

Alternatively, we can open the results by choosing the Profiler option under the View > Tool Windows menu.

8. Conclusion

In this article, we explored async-profiler along with a few profiling techniques.

First, we saw how to configure the kernel on Linux and which JVM flags are recommended for accurate profiling results.

Then, we examined various types of profiling, namely CPU, allocation, and wall-clock profiling.

Last, we profiled an application with async-profiler using IntelliJ IDEA.
