1. Overview

Spring Cloud provides client-side load balancing through the use of Netflix Ribbon. Ribbon’s load balancing mechanism can be supplemented with retries.

In this tutorial, we’re going to explore this retry mechanism.

First, we’ll see why it’s important to build our applications with this feature in mind. Then, we’ll build and configure an application with Spring Cloud Netflix Ribbon to demonstrate the mechanism.

2. Motivation

In a cloud-based application, it’s a common practice for a service to make requests to other services. But in such a dynamic and volatile environment, networks could fail or services could be temporarily unavailable.

We want to handle failures in a graceful manner and recover quickly. In many cases, these issues are short-lived. If we repeated the same request shortly after the failure occurred, maybe it would succeed.

This practice helps us to improve the application’s resilience, which is one of the key aspects of a reliable cloud application.

Nevertheless, we need to keep an eye on retries since they can also lead to bad situations. For example, they can increase latency, because the caller waits through every failed attempt before it gets a response.

3. Setup

In order to experiment with the retry mechanism, we need two Spring Boot services. First, we’ll create a weather-service that will display today’s weather information through a REST endpoint.

Second, we’ll define a client service that will consume the weather endpoint.

3.1. The Weather Service

Let’s build a very simple weather service that fails intermittently with a 503 HTTP status code (service unavailable). We’ll simulate this by letting a call succeed only when the running number of calls is a multiple of a configurable successful.call.divisor property; every other call fails:

@Value("${successful.call.divisor}")
private int divisor;
private int nrOfCalls = 0;

@GetMapping("/weather")
public ResponseEntity<String> weather() {
    LOGGER.info("Providing today's weather information");
    if (isServiceUnavailable()) {
        return new ResponseEntity<>(HttpStatus.SERVICE_UNAVAILABLE);
    }
    LOGGER.info("Today's a sunny day");
    return new ResponseEntity<>("Today's a sunny day", HttpStatus.OK);
}

private boolean isServiceUnavailable() {
    return ++nrOfCalls % divisor != 0;
}

Also, to help us observe the number of retries made to the service, we have a message logger inside the handler.
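
To make the failure pattern concrete, here’s a minimal plain-Java sketch of the same counter logic, detached from Spring. The class name FailureSimulation and the statusCodes helper are hypothetical and exist only for this illustration; the divisor of 5 is just an example value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the handler's counter logic (no Spring involved).
class FailureSimulation {

    private final int divisor;
    private int nrOfCalls = 0;

    FailureSimulation(int divisor) {
        this.divisor = divisor;
    }

    // Same expression as in the handler: only every divisor-th call is available.
    boolean isServiceUnavailable() {
        return ++nrOfCalls % divisor != 0;
    }

    // Returns the simulated HTTP status codes for a number of consecutive calls.
    static List<Integer> statusCodes(int divisor, int calls) {
        FailureSimulation service = new FailureSimulation(divisor);
        List<Integer> codes = new ArrayList<>();
        for (int i = 0; i < calls; i++) {
            codes.add(service.isServiceUnavailable() ? 503 : 200);
        }
        return codes;
    }

    public static void main(String[] args) {
        // With a divisor of 5, calls 1 to 4 fail and call 5 succeeds.
        System.out.println(statusCodes(5, 5)); // prints [503, 503, 503, 503, 200]
    }
}
```

So a larger divisor means longer runs of failures between two successful calls.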

Later on, we’re going to configure the client service to trigger the retry mechanism when the weather service is temporarily unavailable.

3.2. The Client Service

Our second service will use Spring Cloud Netflix Ribbon.

First, let’s define the Ribbon client configuration:

@Configuration
@RibbonClient(name = "weather-service", configuration = RibbonConfiguration.class)
public class WeatherClientRibbonConfiguration {

    @LoadBalanced
    @Bean
    RestTemplate getRestTemplate() {
        return new RestTemplate();
    }

}

Our HTTP client is annotated with @LoadBalanced, which means we want Ribbon to load balance its requests.

We’ll now add a ping mechanism to determine the service’s availability, and also a round-robin load balancing strategy, by defining the RibbonConfiguration class included in the @RibbonClient annotation above:

public class RibbonConfiguration {
 
    @Bean
    public IPing ribbonPing() {
        return new PingUrl();
    }
 
    @Bean
    public IRule ribbonRule() {
        return new RoundRobinRule();
    }
}
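
To illustrate what a round-robin strategy does, here’s a minimal plain-Java sketch. It is not Ribbon’s RoundRobinRule: the class RoundRobinSketch and the two placeholder addresses are assumptions made for this example, and the sketch ignores server health entirely.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of round-robin selection; not Ribbon's own implementation.
class RoundRobinSketch {

    private final List<String> servers;
    private final AtomicInteger nextIndex = new AtomicInteger(0);

    RoundRobinSketch(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    // Picks servers in list order and wraps around at the end of the list.
    String choose() {
        int index = Math.floorMod(nextIndex.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch rule = new RoundRobinSketch(
            List.of("localhost:8021", "localhost:8022"));
        // Alternates between the two addresses: 8021, 8022, 8021, 8022.
        for (int i = 0; i < 4; i++) {
            System.out.println(rule.choose());
        }
    }
}
```

Each server receives every second request here, which is the even spread we expect from this strategy.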

Next, we need to disable Eureka for the Ribbon client since we’re not using service discovery. Instead, we’re using a manually defined list of weather-service instances available for load balancing.

So, let’s add all of this to the application.yml file:

weather-service:
    ribbon:
        eureka:
            enabled: false
        listOfServers: http://localhost:8021, http://localhost:8022

Finally, let’s build a controller and make it call the backend service:

@RestController
public class MyRestController {

    @Autowired
    private RestTemplate restTemplate;

    @RequestMapping("/client/weather")
    public String weather() {
        String result = this.restTemplate.getForObject("http://weather-service/weather", String.class);
        return "Weather Service Response: " + result;
    }
}

4. Enabling the Retry Mechanism

4.1. Configuring application.yml Properties

We need to put the retry-related properties for weather-service in our client application’s application.yml file:

weather-service:
  ribbon:
    MaxAutoRetries: 3
    MaxAutoRetriesNextServer: 1
    retryableStatusCodes: 503, 408
    OkToRetryOnAllOperations: true

The above configuration uses the standard Ribbon properties we need to define to enable retries:

MaxAutoRetries – the number of times a failed request is retried on the same server (default 0)
MaxAutoRetriesNextServer – the number of servers to try excluding the first one (default 0)
retryableStatusCodes – the list of HTTP status codes to retry
OkToRetryOnAllOperations – when this property is set to true, all types of HTTP requests are retried; by default, only GET requests are

We’re going to retry a failed request when the client service receives a 503 (service unavailable) or 408 (request timeout) response code.
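
As a quick sanity check on these settings, we can estimate the upper bound on attempts per client request. The sketch below assumes the usual reading of the two properties: each server gets one initial attempt plus MaxAutoRetries retries, and up to MaxAutoRetriesNextServer additional servers are tried. The class RetryBudget is a hypothetical helper, not part of Ribbon.

```java
// Hypothetical helper that estimates the worst-case number of attempts for one request.
class RetryBudget {

    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        int attemptsPerServer = maxAutoRetries + 1;      // first try + retries on the same server
        int serversTried = maxAutoRetriesNextServer + 1; // first server + next servers
        return attemptsPerServer * serversTried;
    }

    public static void main(String[] args) {
        // Values from the configuration above: MaxAutoRetries = 3, MaxAutoRetriesNextServer = 1.
        System.out.println(maxAttempts(3, 1)); // prints 8
    }
}
```

Under that assumption, a single call to the client can trigger up to eight calls to the weather service, which is worth keeping in mind when we think about latency.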

4.2. Required Dependencies

Spring Cloud Netflix Ribbon leverages Spring Retry to retry failed requests.

We have to make sure the dependency is on the classpath. Otherwise, the failed requests won’t be retried. We can omit the version since it’s managed by Spring Boot:

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>

4.3. Retry Logic in Practice

Finally, let’s see the retry logic in practice.

To do so, we need two instances of our weather service, which we’ll run on ports 8021 and 8022. Of course, these instances should match the listOfServers list defined in section 3.2.

Moreover, we need to configure the successful.call.divisor property on each instance to make sure our simulated services fail at different times:

# instance 1
successful.call.divisor = 5
# instance 2
successful.call.divisor = 2

Next, let’s also run the client service on port 8080 and call:

http://localhost:8080/client/weather

Let’s take a look at the consoles of the two weather-service instances:

weather service instance 1:
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information

weather service instance 2:
    Providing today's weather information
    Today's a sunny day

So, after several attempts (4 on instance 1 and 2 on instance 2) we’ve got a valid response.
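
We can reproduce these numbers with a small stand-alone simulation. This is only a sketch under simplifying assumptions: both instances are freshly started, the first request lands on instance 1, and the "next server" is simply the next entry in the list. The class RetryWalkthrough and its method are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical simulation of one client request against the two simulated instances.
class RetryWalkthrough {

    // Returns how many calls each instance received before the first success
    // (or before the retry budget ran out).
    static int[] callsPerInstance(int[] divisors, int maxAutoRetries, int maxAutoRetriesNextServer) {
        int[] calls = new int[divisors.length];
        int serversToTry = Math.min(divisors.length, maxAutoRetriesNextServer + 1);
        for (int server = 0; server < serversToTry; server++) {
            // One initial attempt plus maxAutoRetries retries on the same server.
            for (int attempt = 0; attempt <= maxAutoRetries; attempt++) {
                calls[server]++;
                boolean available = calls[server] % divisors[server] == 0;
                if (available) {
                    return calls;
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Divisors 5 and 2, MaxAutoRetries = 3, MaxAutoRetriesNextServer = 1.
        System.out.println(Arrays.toString(callsPerInstance(new int[] {5, 2}, 3, 1))); // prints [4, 2]
    }
}
```

Instance 1 uses up its four attempts without reaching its fifth call, so the client moves on to instance 2, whose second call succeeds.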

5. Backoff Policy Configuration

Congestion occurs when a network receives more data than it can handle. Retrying immediately only adds to that load, so to alleviate it, we can set up a backoff policy that waits between attempts.

By default, there is no delay between the retry attempts. Under the hood, Spring Cloud Ribbon uses Spring Retry’s NoBackOffPolicy object, which does nothing.

However, we can override the default behavior by extending the RibbonLoadBalancedRetryFactory class:

@Component
public class CustomRibbonLoadBalancedRetryFactory 
  extends RibbonLoadBalancedRetryFactory {

    public CustomRibbonLoadBalancedRetryFactory(
      SpringClientFactory clientFactory) {
        super(clientFactory);
    }

    @Override
    public BackOffPolicy createBackOffPolicy(String service) {
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(2000);
        return fixedBackOffPolicy;
    }
}

The FixedBackOffPolicy class provides a fixed delay between retry attempts. If we don’t set a backoff period, the default is 1 second.

Alternatively, we can set up an ExponentialBackOffPolicy or an ExponentialRandomBackOffPolicy:

@Override
public BackOffPolicy createBackOffPolicy(String service) {
    ExponentialBackOffPolicy exponentialBackOffPolicy = 
      new ExponentialBackOffPolicy();
    exponentialBackOffPolicy.setInitialInterval(1000);
    exponentialBackOffPolicy.setMultiplier(2); 
    exponentialBackOffPolicy.setMaxInterval(10000);
    return exponentialBackOffPolicy;
}

Here, the initial delay between the attempts is 1 second. Then, the delay is doubled for each subsequent attempt without exceeding 10 seconds: 1000 ms, 2000 ms, 4000 ms, 8000 ms, 10000 ms, 10000 ms…
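
The sequence above is easy to verify with a few lines of plain Java. The following sketch is not Spring Retry’s implementation; the class ExponentialDelays and its delays method are hypothetical and only mirror the three settings used above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that computes capped exponential delays in milliseconds.
class ExponentialDelays {

    static List<Long> delays(long initialInterval, double multiplier, long maxInterval, int attempts) {
        List<Long> result = new ArrayList<>();
        double current = initialInterval;
        for (int i = 0; i < attempts; i++) {
            result.add((long) Math.min(current, maxInterval));
            // Grow the delay for the next attempt, but never beyond the cap.
            current = Math.min(current * multiplier, maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same values as in the policy above: 1000 ms initial, multiplier 2, 10000 ms cap.
        System.out.println(delays(1000, 2.0, 10000, 6));
        // prints [1000, 2000, 4000, 8000, 10000, 10000]
    }
}
```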

Additionally, the ExponentialRandomBackOffPolicy adds a random value to each sleeping period without exceeding the next value. So, it may yield 1500 ms, 3400 ms, 6200 ms, 9800 ms, 10000 ms, 10000 ms…
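
One way to picture the random variant is to add jitter on top of the deterministic sequence. The sketch below is an illustration under our own assumptions, not the library’s code: the class RandomizedExponentialDelays is hypothetical, and the jitter is drawn uniformly between the current deterministic delay and the next one, then capped at the maximum interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper that adds random jitter to capped exponential delays (milliseconds).
class RandomizedExponentialDelays {

    static List<Long> delays(long initialInterval, double multiplier, long maxInterval,
                             int attempts, Random random) {
        List<Long> result = new ArrayList<>();
        double base = initialInterval;
        for (int i = 0; i < attempts; i++) {
            // Uniform jitter in [base, base * multiplier), so we never pass the next base value.
            double jittered = base * (1 + random.nextDouble() * (multiplier - 1));
            result.add((long) Math.min(jittered, maxInterval));
            base = Math.min(base * multiplier, maxInterval);
        }
        return result;
    }

    public static void main(String[] args) {
        // The output differs on every run because of the random factor.
        System.out.println(delays(1000, 2.0, 10000, 6, new Random()));
    }
}
```

Because every client draws its own random values, two clients that fail at the same moment are unlikely to retry at the same moment.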

Which policy to choose depends on how much traffic we have and how many different client services there are. Moving from fixed to exponential to random backoff, these strategies spread traffic spikes more evenly, which also means fewer retries. For example, with many clients, a random factor helps avoid several clients hitting the service at the same time while retrying.

6. Conclusion

In this article, we learned how to retry failed requests in our Spring Cloud applications using Spring Cloud Netflix Ribbon. We also discussed the benefits this mechanism provides.

Next, we demonstrated how the retry logic works through a REST application backed by two Spring Boot services. Spring Cloud Netflix Ribbon makes that possible by leveraging the Spring Retry library.

Finally, we saw how to configure different types of delays between the retry attempts.

As always, the source code for this tutorial is available over on GitHub.
