Introduction to Netflix Servo

Last modified: June 28, 2017

1. Overview

Netflix Servo is a metrics tool for Java applications. Servo is similar to Dropwizard Metrics, yet much simpler. It relies solely on JMX to provide a simple interface for exposing and publishing application metrics.

In this article, we’ll introduce what Servo provides and how we can use it to collect and publish application metrics.

2. Maven Dependencies

Before we dive into the actual implementation, let’s add the Servo dependency to the pom.xml file:

<dependency>
    <groupId>com.netflix.servo</groupId>
    <artifactId>servo-core</artifactId>
    <version>0.12.16</version>
</dependency>

Many extensions are also available, such as Servo-Apache and Servo-AWS, which we may need later. The latest versions of these extensions can be found on Maven Central.

3. Collect Metrics

First, let’s see how to gather metrics from our application.

Servo provides four primary metric types: Counter, Gauge, Timer, and Informational.

3.1. Metric Types – Counter

Counters are used to record increments. Commonly used implementations are BasicCounter, StepCounter, and PeakRateCounter.

BasicCounter does what a counter should do, plain and straightforward:

Counter counter = new BasicCounter(MonitorConfig.builder("test").build());
assertEquals("counter should start with 0", 0, counter.getValue().intValue());

counter.increment();
 
assertEquals("counter should have increased by 1", 1, counter.getValue().intValue());

counter.increment(-1);
 
assertEquals("counter should have decreased by 1", 0, counter.getValue().intValue());

PeakRateCounter returns the maximum count for a given second during the polling interval:

Counter counter = new PeakRateCounter(MonitorConfig.builder("test").build());
assertEquals(
  "counter should start with 0", 
  0, counter.getValue().intValue());

counter.increment();
SECONDS.sleep(1);

counter.increment();
counter.increment();

assertEquals("peak rate should have been 2", 2, counter.getValue().intValue());
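
To make the "maximum count in any single second" idea concrete, here is a minimal plain-Java sketch. It is not Servo's implementation: the class name PeakRateSketch and the caller-supplied timestamps are assumptions, chosen so the example runs deterministically without the library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Servo code): counts increments per one-second
// slot and reports the busiest slot seen so far.
final class PeakRateSketch {
    private final Map<Long, Long> countsPerSecond = new HashMap<>();

    // The caller passes the timestamp, which keeps the example deterministic.
    synchronized void increment(long timestampMillis) {
        long second = timestampMillis / 1000;
        countsPerSecond.merge(second, 1L, Long::sum);
    }

    // Highest count recorded in any single second, or 0 if nothing was recorded.
    synchronized long peak() {
        long max = 0;
        for (long count : countsPerSecond.values()) {
            max = Math.max(max, count);
        }
        return max;
    }

    public static void main(String[] args) {
        PeakRateSketch counter = new PeakRateSketch();
        counter.increment(0);      // one event in second 0
        counter.increment(1_000);  // two events in second 1
        counter.increment(1_500);
        System.out.println(counter.peak()); // prints 2
    }
}
```

A real counter would also discard slots that fall outside the current polling interval; the sketch omits that to stay short.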

Unlike other counters, StepCounter records the rate per second of the previous polling interval:

System.setProperty("servo.pollers", "1000");
Counter counter = new StepCounter(MonitorConfig.builder("test").build());
 
assertEquals("counter should start with rate 0.0", 0.0, counter.getValue());

counter.increment();
SECONDS.sleep(1);

assertEquals(
  "counter rate should have increased to 1.0", 
  1.0, counter.getValue());

Notice that we set servo.pollers to 1000 in the code above. This sets the polling interval to 1 second instead of the default intervals of 60 seconds and 10 seconds. We’ll cover more on this later.
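
The "rate per second of the previous polling interval" semantics can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not Servo's StepCounter: the class StepCounterSketch, its constructor, and the explicit clock arguments are all hypothetical.

```java
// Hypothetical sketch (not Servo code): a counter that reports the
// per-second rate of the last completed interval, not the running total.
final class StepCounterSketch {
    private final long stepMillis;
    private long currentStep;    // index of the interval being filled
    private long currentCount;   // increments seen in the current interval
    private long previousCount;  // increments seen in the last completed interval

    StepCounterSketch(long stepMillis, long nowMillis) {
        this.stepMillis = stepMillis;
        this.currentStep = nowMillis / stepMillis;
    }

    // Moves to the interval containing nowMillis, if it has changed.
    private void rollTo(long nowMillis) {
        long step = nowMillis / stepMillis;
        if (step > currentStep) {
            // Only the immediately preceding interval counts; if more than
            // one interval passed, the previous one had no updates.
            previousCount = (step == currentStep + 1) ? currentCount : 0;
            currentCount = 0;
            currentStep = step;
        }
    }

    synchronized void increment(long nowMillis) {
        rollTo(nowMillis);
        currentCount++;
    }

    synchronized double ratePerSecond(long nowMillis) {
        rollTo(nowMillis);
        return previousCount / (stepMillis / 1000.0);
    }

    public static void main(String[] args) {
        StepCounterSketch counter = new StepCounterSketch(1_000, 0);
        counter.increment(100);
        System.out.println(counter.ratePerSecond(500));   // prints 0.0
        System.out.println(counter.ratePerSecond(1_000)); // prints 1.0
    }
}
```

The value stays at 0.0 until the interval that received the increment has finished, which is why the test above sleeps for one second before asserting.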

3.2. Metric Types – Gauge

Gauge is a simple monitor that returns the current value. BasicGauge, MinGauge, MaxGauge, and NumberGauge are provided.

BasicGauge invokes a Callable to get the current value. We can use it to report the size of a collection, the latest value of a BlockingQueue, or any value that requires only a small computation.

Gauge<Double> gauge = new BasicGauge<>(MonitorConfig.builder("test")
  .build(), () -> 2.32);
 
assertEquals(2.32, gauge.getValue(), 0.01);

MaxGauge and MinGauge are used to keep track of the maximum and minimum values respectively:

MaxGauge gauge = new MaxGauge(MonitorConfig.builder("test").build());
assertEquals(0, gauge.getValue().intValue());

gauge.update(4);
assertEquals(4, gauge.getCurrentValue(0));

gauge.update(1);
assertEquals(4, gauge.getCurrentValue(0));
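
As a rough illustration of how a maximum or minimum can be tracked without locks, here is a plain-Java sketch. The class MinMaxGaugeSketch is hypothetical and is not how Servo implements MaxGauge or MinGauge; in particular, it starts from sentinel values rather than 0.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not Servo code): keeps the largest and smallest
// value ever passed to update(), using atomic compare-and-set loops.
final class MinMaxGaugeSketch {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);

    void update(long value) {
        // accumulateAndGet retries until the new extreme is stored atomically.
        max.accumulateAndGet(value, Math::max);
        min.accumulateAndGet(value, Math::min);
    }

    long max() {
        return max.get();
    }

    long min() {
        return min.get();
    }

    public static void main(String[] args) {
        MinMaxGaugeSketch gauge = new MinMaxGaugeSketch();
        gauge.update(4);
        gauge.update(1);
        System.out.println(gauge.max()); // prints 4
        System.out.println(gauge.min()); // prints 1
    }
}
```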

NumberGauge (LongGauge, DoubleGauge) wraps a provided Number (Long, Double). To collect metrics with these gauges, we must ensure that the Number is thread-safe.
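
The thread-safety requirement is easier to see with a small plain-Java sketch. The class NumberGaugeSketch below is hypothetical and only mimics the idea of a gauge that reads a shared Number; it assumes an AtomicLong as the thread-safe backing value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not Servo code): a gauge that simply exposes a
// Number owned by the application. Correct readings depend on that
// Number being safe to update from several threads.
final class NumberGaugeSketch {
    private final Number number;

    NumberGaugeSketch(Number number) {
        this.number = number;
    }

    Number getValue() {
        return number;
    }

    // Updates the shared value from several threads, then reads the gauge.
    static long countWithThreads(int threads, int incrementsPerThread) {
        AtomicLong shared = new AtomicLong();
        NumberGaugeSketch gauge = new NumberGaugeSketch(shared);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    shared.incrementAndGet(); // atomic, so no updates are lost
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return gauge.getValue().longValue();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // prints 40000
    }
}
```

With a plain long field incremented through `value++`, concurrent updates could be lost and the gauge would under-report.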

3.3. Metric Types – Timer

Timers help measure the duration of a particular event. The default implementations are BasicTimer, StatsTimer, and BucketTimer.

BasicTimer records total time, count and other simple statistics:

BasicTimer timer = new BasicTimer(MonitorConfig.builder("test").build(), SECONDS);
Stopwatch stopwatch = timer.start();

SECONDS.sleep(1);
timer.record(2, SECONDS);
stopwatch.stop();

assertEquals("timer should count 1 second", 1, timer.getValue().intValue());
assertEquals("timer should count 3 seconds in total", 
  3.0, timer.getTotalTime(), 0.01);
assertEquals("timer should record 2 updates", 2, timer.getCount().intValue());
assertEquals("timer should have max 2", 2, timer.getMax(), 0.01);

StatsTimer provides much richer statistics by sampling between polling intervals:

System.setProperty("servo.pollers", "1000");
StatsTimer timer = new StatsTimer(MonitorConfig
  .builder("test")
  .build(), new StatsConfig.Builder()
  .withComputeFrequencyMillis(2000)
  .withPercentiles(new double[] { 99.0, 95.0, 90.0 })
  .withPublishMax(true)
  .withPublishMin(true)
  .withPublishCount(true)
  .withPublishMean(true)
  .withPublishStdDev(true)
  .withPublishVariance(true)
  .build(), SECONDS);
Stopwatch stopwatch = timer.start();

SECONDS.sleep(1);
timer.record(3, SECONDS);
stopwatch.stop();

stopwatch = timer.start();
timer.record(6, SECONDS);
SECONDS.sleep(2);
stopwatch.stop();

assertEquals("timer should count 12 seconds in total", 
  12, timer.getTotalTime());
assertEquals("timer should count 12 seconds in total", 
  12, timer.getTotalMeasurement());
assertEquals("timer should record 4 updates", 4, timer.getCount());
assertEquals("stats timer value time-cost/update should be 3", 
  3, timer.getValue().intValue());

final Map<String, Number> metricMap = timer.getMonitors().stream()
  .collect(toMap(monitor -> getMonitorTagValue(monitor, "statistic"),
    monitor -> (Number) monitor.getValue()));
 
assertThat(metricMap.keySet(), containsInAnyOrder(
  "count", "totalTime", "max", "min", "variance", "stdDev", "avg", 
  "percentile_99", "percentile_95", "percentile_90"));

BucketTimer provides a way to get the distribution of samples by bucketing value ranges:

BucketTimer timer = new BucketTimer(MonitorConfig
  .builder("test")
  .build(), new BucketConfig.Builder()
  .withBuckets(new long[] { 2L, 5L })
  .withTimeUnit(SECONDS)
  .build(), SECONDS);

timer.record(3);
timer.record(6);

assertEquals(
  "timer should count 9 seconds in total",
  9, timer.getTotalTime().intValue());
 
Map<String, Long> metricMap = timer.getMonitors().stream()
  .filter(monitor -> monitor.getConfig().getTags().containsKey("servo.bucket"))
  .collect(toMap(
    m -> getMonitorTagValue(m, "servo.bucket"),
    m -> (Long) m.getValue()));

assertThat(metricMap, allOf(hasEntry("bucket=2s", 0L), hasEntry("bucket=5s", 1L),
  hasEntry("bucket=overflow", 1L)));
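
The bucketing step itself can be sketched in a few lines of plain Java. This is an illustration only: BucketSketch is a hypothetical class, and the rule that a value equal to a bucket's upper bound lands in that bucket is an assumption of the sketch rather than a statement about Servo's internals. The bucket labels mirror the ones asserted in the test above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not Servo code): counts samples per bucket, where
// each bucket is identified by its upper bound and the last slot catches
// everything larger than the largest bound.
final class BucketSketch {
    private final long[] upperBounds; // expected in ascending order
    private final long[] counts;      // one extra slot for overflow

    BucketSketch(long... upperBounds) {
        this.upperBounds = upperBounds.clone();
        this.counts = new long[upperBounds.length + 1];
    }

    void record(long seconds) {
        int index = 0;
        // Skip every bucket whose upper bound is smaller than the sample.
        while (index < upperBounds.length && seconds > upperBounds[index]) {
            index++;
        }
        counts[index]++;
    }

    Map<String, Long> distribution() {
        Map<String, Long> result = new LinkedHashMap<>();
        for (int i = 0; i < upperBounds.length; i++) {
            result.put("bucket=" + upperBounds[i] + "s", counts[i]);
        }
        result.put("bucket=overflow", counts[upperBounds.length]);
        return result;
    }

    public static void main(String[] args) {
        BucketSketch timer = new BucketSketch(2, 5);
        timer.record(3);
        timer.record(6);
        // prints {bucket=2s=0, bucket=5s=1, bucket=overflow=1}
        System.out.println(timer.distribution());
    }
}
```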

To track long-running operations that might last for hours, we can use the composite monitor DurationTimer.

3.4. Metric Types – Informational

We can also use the Informational monitor to record descriptive information that helps with debugging and diagnostics. The only implementation is BasicInformational, and its usage could not be simpler:

BasicInformational informational = new BasicInformational(
  MonitorConfig.builder("test").build());
informational.setValue("information collected");

3.5. MonitorRegistry

The metric types are all of type Monitor, which is the very base of Servo. We now know several kinds of tools for collecting raw metrics, but to report the data, we need to register these monitors.

Note that each configured monitor should be registered once and only once to ensure the correctness of the metrics. We can therefore register the monitors using the Singleton pattern.
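
One way to picture the "register once and only once" rule is a singleton registry that refuses duplicate names. The sketch below is plain Java and purely illustrative: RegistrySketch and its register method are hypothetical and do not reflect the API of Servo's registries.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch (not Servo code): a process-wide registry in which
// each monitor name can be registered only once.
final class RegistrySketch {
    private static final RegistrySketch INSTANCE = new RegistrySketch();

    private final Map<String, Supplier<Number>> monitors = new ConcurrentHashMap<>();

    private RegistrySketch() {
        // Singleton: instances are obtained through getInstance() only.
    }

    static RegistrySketch getInstance() {
        return INSTANCE;
    }

    // Returns true for the first registration of a name, false for repeats.
    boolean register(String name, Supplier<Number> monitor) {
        return monitors.putIfAbsent(name, monitor) == null;
    }

    Number read(String name) {
        Supplier<Number> monitor = monitors.get(name);
        return monitor == null ? null : monitor.get();
    }

    public static void main(String[] args) {
        RegistrySketch registry = RegistrySketch.getInstance();
        System.out.println(registry.register("test", () -> 2.32)); // prints true
        System.out.println(registry.register("test", () -> 9.99)); // prints false
        System.out.println(registry.read("test"));                 // prints 2.32
    }
}
```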

Most of the time, we can use DefaultMonitorRegistry to register monitors:

Gauge<Double> gauge = new BasicGauge<>(MonitorConfig.builder("test")
  .build(), () -> 2.32);
DefaultMonitorRegistry.getInstance().register(gauge);

If we want to register a monitor dynamically, we can use DynamicTimer and DynamicCounter:

DynamicCounter.increment("monitor-name", "tag-key", "tag-value");

Note that dynamic registration causes an expensive lookup operation each time the value is updated.
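
To show where that lookup cost comes from, here is a plain-Java sketch of a dynamically registered counter. It is an assumption-laden illustration, not Servo's DynamicCounter: the class DynamicCounterSketch and its key format are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch (not Servo code): counters are created on first use
// and found again by name and tag on every single update.
final class DynamicCounterSketch {
    private static final Map<String, LongAdder> COUNTERS = new ConcurrentHashMap<>();

    private DynamicCounterSketch() {
    }

    private static String key(String name, String tagKey, String tagValue) {
        return name + "|" + tagKey + "=" + tagValue;
    }

    static void increment(String name, String tagKey, String tagValue) {
        // Each call builds a key and performs a map lookup before updating.
        COUNTERS.computeIfAbsent(key(name, tagKey, tagValue), k -> new LongAdder())
                .increment();
    }

    static long count(String name, String tagKey, String tagValue) {
        LongAdder adder = COUNTERS.get(key(name, tagKey, tagValue));
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        increment("monitor-name", "tag-key", "tag-value");
        increment("monitor-name", "tag-key", "tag-value");
        System.out.println(count("monitor-name", "tag-key", "tag-value")); // prints 2
    }
}
```

Holding a direct reference to a registered counter avoids the key construction and lookup, which is why static registration is usually preferable on hot paths.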

Servo also provides several helper methods to register monitors declared in objects:

Monitors.registerObject("testObject", this);
assertTrue(Monitors.isObjectRegistered("testObject", this));

The registerObject method uses reflection to add all monitors declared with the @Monitor annotation, along with the tags declared with @MonitorTags:

@Monitor(
  name = "integerCounter",
  type = DataSourceType.COUNTER,
  description = "Total number of update operations.")
private AtomicInteger updateCount = new AtomicInteger(0);

@MonitorTags
private TagList tags = new BasicTagList(
  newArrayList(new BasicTag("tag-key", "tag-value")));

@Test
public void givenAnnotatedMonitor_whenUpdated_thenDataCollected() throws Exception {
    System.setProperty("servo.pollers", "1000");
    Monitors.registerObject("testObject", this);
    assertTrue(Monitors.isObjectRegistered("testObject", this));

    updateCount.incrementAndGet();
    updateCount.incrementAndGet();
    SECONDS.sleep(1);

    List<List<Metric>> metrics = observer.getObservations();
 
    assertThat(metrics, hasSize(greaterThanOrEqualTo(1)));
 
    Iterator<List<Metric>> metricIterator = metrics.iterator();
    metricIterator.next(); //skip first empty observation
 
    while (metricIterator.hasNext()) {
        assertThat(metricIterator.next(), hasItem(
          hasProperty("config", 
          hasProperty("name", is("integerCounter")))));
    }
}
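
The reflection step behind annotation-based registration can be illustrated with plain Java. The sketch below is not Servo's implementation: the @Tracked annotation, the Service class, and the scan method are hypothetical stand-ins for @Monitor, an annotated object, and Monitors.registerObject.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not Servo code): finds annotated Number fields on
// an object and exposes them under the name given in the annotation.
final class AnnotationScanSketch {

    @Retention(RetentionPolicy.RUNTIME) // must be visible to reflection
    @Target(ElementType.FIELD)
    @interface Tracked {
        String name();
    }

    static final class Service {
        @Tracked(name = "integerCounter")
        private final AtomicInteger updateCount = new AtomicInteger(0);

        // Not annotated, so the scan ignores it.
        private final AtomicInteger internalState = new AtomicInteger(0);

        void update() {
            updateCount.incrementAndGet();
        }
    }

    static Map<String, Number> scan(Object target) {
        Map<String, Number> found = new LinkedHashMap<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            Tracked tracked = field.getAnnotation(Tracked.class);
            if (tracked == null || !Number.class.isAssignableFrom(field.getType())) {
                continue;
            }
            try {
                field.setAccessible(true);
                // Store the live object, so later updates stay visible.
                found.put(tracked.name(), (Number) field.get(target));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Service service = new Service();
        Map<String, Number> monitors = scan(service);
        service.update();
        service.update();
        System.out.println(monitors.get("integerCounter")); // prints 2
    }
}
```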

4. Publish Metrics

With the metrics collected, we can publish them in any format, for example to render time series graphs on various data visualization platforms. To publish the metrics, we need to poll the data periodically from the monitor observations.

4.1. MetricPoller

MetricPoller is used as a metrics fetcher. We can fetch metrics from MonitorRegistries, the JVM, and JMX. With the help of extensions, we can also poll metrics such as Apache server status and Tomcat metrics.

MemoryMetricObserver observer = new MemoryMetricObserver();
PollRunnable pollRunnable = new PollRunnable(new JvmMetricPoller(),
  new BasicMetricFilter(true), observer);
PollScheduler.getInstance().start();
PollScheduler.getInstance().addPoller(pollRunnable, 1, SECONDS);

SECONDS.sleep(1);
PollScheduler.getInstance().stop();
List<List<Metric>> metrics = observer.getObservations();

assertThat(metrics, hasSize(greaterThanOrEqualTo(1)));
List<String> keys = extractKeys(metrics);
 
assertThat(keys, hasItems("loadedClassCount", "initUsage", "maxUsage", "threadCount"));

Here we created a JvmMetricPoller to poll JVM metrics. When adding the poller to the scheduler, we let the poll task run every second. The system default poller configurations are defined in Pollers, but we can specify which pollers to use with the system property servo.pollers.
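
The poll, filter, observe cycle described above can be sketched with the JDK alone. This is an illustrative model, not Servo's PollRunnable: the class PollCycleSketch, its functional-interface parameters, and the pollJvm helper are assumptions; only the ManagementFactory and ScheduledExecutorService calls are standard JDK APIs.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch (not Servo code): one poll cycle fetches metrics,
// keeps those accepted by the filter, and hands them to an observer.
final class PollCycleSketch implements Runnable {
    private final Supplier<Map<String, Number>> poller;
    private final Predicate<String> filter;
    private final Consumer<Map<String, Number>> observer;

    PollCycleSketch(Supplier<Map<String, Number>> poller,
                    Predicate<String> filter,
                    Consumer<Map<String, Number>> observer) {
        this.poller = poller;
        this.filter = filter;
        this.observer = observer;
    }

    @Override
    public void run() {
        Map<String, Number> accepted = new LinkedHashMap<>();
        poller.get().forEach((name, value) -> {
            if (filter.test(name)) {
                accepted.put(name, value);
            }
        });
        observer.accept(accepted);
    }

    // A tiny stand-in for a JVM poller, using standard management beans.
    static Map<String, Number> pollJvm() {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("threadCount",
            ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("loadedClassCount",
            ManagementFactory.getClassLoadingMXBean().getLoadedClassCount());
        return metrics;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Map<String, Number>> observations = new CopyOnWriteArrayList<>();
        PollCycleSketch task = new PollCycleSketch(
            PollCycleSketch::pollJvm, name -> true, observations::add);

        // Run the cycle once per second, as the scheduler in the text does.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(task, 0, 1, TimeUnit.SECONDS);
        TimeUnit.SECONDS.sleep(2);
        scheduler.shutdownNow();

        System.out.println(observations);
    }
}
```

Passing `name -> true` plays the role that BasicMetricFilter(true) plays in the Servo example: every polled metric is accepted.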

4.2. MetricObserver

When polling metrics, observations of registered MetricObservers will be updated.

MetricObservers provided by default are MemoryMetricObserver, FileMetricObserver, and AsyncMetricObserver. We have already shown how to use MemoryMetricObserver in the previous code sample.

Several useful extensions are also available, such as the AtlasMetricObserver shipped in servo-atlas, which we use in section 4.3.

We can implement a customized MetricObserver to publish application metrics wherever we see fit. The only thing we need to take care of is handling the updated metrics:

public class CustomObserver extends BaseMetricObserver {

    //...

    @Override
    public void updateImpl(List<Metric> metrics) {
        //TODO
    }
}

4.3. Publish to Netflix Atlas

Atlas is another metrics-related tool from Netflix. It’s a tool for managing dimensional time series data, which is a perfect place to publish the metrics we collected.

Now, we’ll demonstrate how to publish our metrics to Netflix Atlas.

First, let’s append the servo-atlas dependency to the pom.xml:

<dependency>
      <groupId>com.netflix.servo</groupId>
      <artifactId>servo-atlas</artifactId>
      <version>${netflix.servo.ver}</version>
</dependency>

<properties>
    <netflix.servo.ver>0.12.17</netflix.servo.ver>
</properties>

This dependency includes an AtlasMetricObserver to help us publish metrics to Atlas.

Then, we set up an Atlas server:

$ curl -LO 'https://github.com/Netflix/atlas/releases/download/v1.4.4/atlas-1.4.4-standalone.jar'
$ curl -LO 'https://raw.githubusercontent.com/Netflix/atlas/v1.4.x/conf/memory.conf'
$ java -jar atlas-1.4.4-standalone.jar memory.conf

To save time in the test, let’s set the step size to 1 second in memory.conf, so that we can generate a time series graph with enough detail.

The AtlasMetricObserver requires a simple configuration and a list of tags. Metrics of the given tags will be pushed to Atlas:

System.setProperty("servo.pollers", "1000");
System.setProperty("servo.atlas.batchSize", "1");
System.setProperty("servo.atlas.uri", "http://localhost:7101/api/v1/publish");
AtlasMetricObserver observer = new AtlasMetricObserver(
  new BasicAtlasConfig(), BasicTagList.of("servo", "counter"));

PollRunnable task = new PollRunnable(
  new MonitorRegistryMetricPoller(), new BasicMetricFilter(true), observer);

After starting up a PollScheduler with the PollRunnable task, we can publish metrics to Atlas automatically:

Counter counter = new BasicCounter(MonitorConfig
  .builder("test")
  .withTag("servo", "counter")
  .build());
DefaultMonitorRegistry
  .getInstance()
  .register(counter);
assertThat(atlasValuesOfTag("servo"), not(containsString("counter")));

for (int i = 0; i < 3; i++) {
    counter.increment(RandomUtils.nextInt(10));
    SECONDS.sleep(1);
    counter.increment(-1 * RandomUtils.nextInt(10));
    SECONDS.sleep(1);
}

assertThat(atlasValuesOfTag("servo"), containsString("counter"));

Based on the metrics, we can generate a line graph using the graph API of Atlas:

[Figure: line graph of the published counter values, rendered by the Atlas graph API]

5. Summary

In this article, we have introduced how to use Netflix Servo to collect and publish application metrics.

In case you haven’t read our introduction to Dropwizard Metrics, check it out for a quick comparison with Servo.

As always, the full implementation code of this article can be found on GitHub.