1. Overview
A typical distributed system consists of many collaborating services.
These services are prone to failure or delayed responses. If a service fails, it may affect other services, degrading performance and possibly making other parts of the application inaccessible or, in the worst case, bringing down the whole application.
Of course, there are solutions available that help make applications resilient and fault tolerant; one such framework is Hystrix.
The Hystrix library helps control the interaction between services by providing fault tolerance and latency tolerance. It improves the overall resilience of the system by isolating failing services and stopping failures from cascading.
In this series of posts, we will begin by looking at how Hystrix comes to the rescue when a service or system fails and what Hystrix can accomplish in these circumstances.
2. Simple Example
Hystrix provides fault and latency tolerance by isolating and wrapping calls to remote services.
In this simple example we wrap a call in the run() method of the HystrixCommand:
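import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

class CommandHelloWorld extends HystrixCommand<String> {

    private final String name;

    CommandHelloWorld(String name) {
        // Commands in the same group share reporting and, by default, a thread pool
        super(HystrixCommandGroupKey.Factory.asKey("ExampleGroup"));
        this.name = name;
    }

    // The wrapped call; Hystrix invokes run() when the command is executed
    @Override
    protected String run() {
        return "Hello " + name + "!";
    }
}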
We execute the call as follows:
@Test
public void givenInputBobAndDefaultSettings_whenCommandExecuted_thenReturnHelloBob() {
    assertThat(new CommandHelloWorld("Bob").execute(), equalTo("Hello Bob!"));
}
3. Maven Setup
To use Hystrix in a Maven project, we need the hystrix-core and rxjava-core dependencies from Netflix in the project's pom.xml:
<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.4</version>
</dependency>
The latest version can always be found here.
<dependency>
    <groupId>com.netflix.rxjava</groupId>
    <artifactId>rxjava-core</artifactId>
    <version>0.20.7</version>
</dependency>
The latest version of this library can always be found here.
4. Setting up Remote Service
Let’s start by simulating a real-world example.
In the example below, the class RemoteServiceTestSimulator represents a service on a remote server. It has a method that responds with a message after a given period of time. We can think of this wait as simulating a time-consuming process on the remote system, which results in a delayed response to the calling service:
class RemoteServiceTestSimulator {

    private long wait;

    RemoteServiceTestSimulator(long wait) throws InterruptedException {
        this.wait = wait;
    }

    String execute() throws InterruptedException {
        Thread.sleep(wait);
        return "Success";
    }
}
And here is our sample client that calls the RemoteServiceTestSimulator.
The call to the service is isolated and wrapped in the run() method of a HystrixCommand. It's this wrapping that provides the resilience we touched upon above:
class RemoteServiceTestCommand extends HystrixCommand<String> {

    private RemoteServiceTestSimulator remoteService;

    RemoteServiceTestCommand(Setter config, RemoteServiceTestSimulator remoteService) {
        super(config);
        this.remoteService = remoteService;
    }

    @Override
    protected String run() throws Exception {
        return remoteService.execute();
    }
}
The call is executed by calling the execute() method on a RemoteServiceTestCommand instance.
The following test demonstrates how this is done:
@Test
public void givenSvcTimeoutOf100AndDefaultSettings_whenRemoteSvcExecuted_thenReturnSuccess()
    throws InterruptedException {

    HystrixCommand.Setter config = HystrixCommand
        .Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroup2"));

    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(100)).execute(),
        equalTo("Success"));
}
So far, we have seen how to wrap remote service calls in a HystrixCommand object. In the section below, let’s look at how to deal with a remote service that starts to deteriorate.
5. Working With Remote Service and Defensive Programming
5.1. Defensive Programming With Timeout
It is common programming practice to set timeouts for calls to remote services.
Let’s begin by looking at how to set a timeout on a HystrixCommand and how it helps by abandoning calls that take too long:
@Test
public void givenSvcTimeoutOf500AndExecTimeoutOf10000_whenRemoteSvcExecuted_thenReturnSuccess()
    throws InterruptedException {

    HystrixCommand.Setter config = HystrixCommand
        .Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroupTest4"));

    // Allow the command up to 10 seconds before Hystrix times it out
    HystrixCommandProperties.Setter commandProperties = HystrixCommandProperties.Setter();
    commandProperties.withExecutionTimeoutInMilliseconds(10_000);
    config.andCommandPropertiesDefaults(commandProperties);

    // The simulated service responds after 500 ms, well within the timeout
    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(500)).execute(),
        equalTo("Success"));
}
In the above test, we are delaying the service’s response by setting its wait time to 500 ms. We are also setting the execution timeout on the HystrixCommand to 10,000 ms, thus allowing sufficient time for the remote service to respond.
Now let’s see what happens when the execution timeout is shorter than the time the service takes to respond:
@Test(expected = HystrixRuntimeException.class)
public void givenSvcTimeoutOf15000AndExecTimeoutOf5000_whenRemoteSvcExecuted_thenExpectHre()
    throws InterruptedException {

    HystrixCommand.Setter config = HystrixCommand
        .Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroupTest5"));

    HystrixCommandProperties.Setter commandProperties = HystrixCommandProperties.Setter();
    commandProperties.withExecutionTimeoutInMilliseconds(5_000);
    config.andCommandPropertiesDefaults(commandProperties);

    new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(15_000)).execute();
}
Notice how we’ve lowered the bar and set the execution timeout to 5,000 ms.
We are expecting the service to respond within 5,000 ms, whereas we have set the service to respond after 15,000 ms. When you execute the test, you will notice that it exits after 5,000 ms instead of waiting for 15,000 ms, and that it throws a HystrixRuntimeException.
This demonstrates that Hystrix does not wait longer than the configured timeout for a response, which helps keep the system protected by Hystrix responsive.
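To make the idea concrete, here is a minimal sketch of the same timeout behaviour using only java.util.concurrent. It is not how Hystrix is implemented; the class TimeoutSketch and the method callWithTimeout are hypothetical names introduced purely for illustration. The caller waits for the result for at most the given timeout and then gives up:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Hystrix: runs a call on a worker thread
// and stops waiting once the timeout has elapsed.
class TimeoutSketch {

    static String callWithTimeout(Callable<String> remoteCall, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(remoteCall);
        try {
            // Wait at most timeoutMillis; throws TimeoutException if the call is still running
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Interrupt the worker so a slow call does not keep its thread busy
            executor.shutdownNow();
        }
    }
}
```

A call that sleeps for 15,000 ms with a 5,000 ms timeout would fail with a TimeoutException after roughly 5,000 ms, which mirrors what the Hystrix test above observes as a HystrixRuntimeException.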
In the next section, we will look at setting the thread pool size, which prevents threads from being exhausted, and discuss its benefits.
5.2. Defensive Programming With Limited Thread Pool
Setting timeouts for service calls does not solve all the issues associated with remote services.
When a remote service starts to respond slowly, a typical application will continue to call that remote service.
The application doesn’t know whether the remote service is healthy, and new threads are spawned every time a request comes in. As a result, more and more threads are tied up waiting on a server that is already struggling.
We don’t want this to happen, as we need these threads for other remote calls or processes running on our server, and we also want to avoid spikes in CPU utilization.
Let’s see how to set the thread pool size in HystrixCommand:
@Test
public void givenSvcTimeoutOf500AndExecTimeoutOf10000AndThreadPool_whenRemoteSvcExecuted_thenReturnSuccess()
    throws InterruptedException {

    HystrixCommand.Setter config = HystrixCommand
        .Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroupThreadPool"));

    HystrixCommandProperties.Setter commandProperties = HystrixCommandProperties.Setter();
    commandProperties.withExecutionTimeoutInMilliseconds(10_000);
    config.andCommandPropertiesDefaults(commandProperties);

    // 3 worker threads, with up to 10 requests waiting in the queue
    config.andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
        .withMaxQueueSize(10)
        .withCoreSize(3)
        .withQueueSizeRejectionThreshold(10));

    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(500)).execute(),
        equalTo("Success"));
}
In the above test, we are setting the maximum queue size, the core pool size and the queue size rejection threshold. With these values, at most 3 commands run concurrently and up to 10 more wait in the queue; Hystrix will start rejecting requests once all 3 threads are busy and the task queue has reached a size of 10.
The core size is the number of threads that always stay alive in the thread pool.
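The effect of a fixed pool with a bounded queue can be sketched with a plain ThreadPoolExecutor. This is only an illustration under simplifying assumptions, not Hystrix's own thread pool code; BoundedPoolSketch and countRejected are hypothetical names. Every submitted task blocks until the end of the experiment, so the pool fills up and any further submissions are rejected:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not Hystrix code: a fixed number of threads plus a bounded queue.
class BoundedPoolSketch {

    // Submits `requests` blocked tasks and returns how many of them were rejected.
    static int countRejected(int coreSize, int queueSize, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize)); // default handler throws on rejection
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Each task simulates a remote call that does not return yet
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // all threads busy and the queue is full
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

With 3 threads and a queue of 10, the first 13 requests are accepted (3 running, 10 queued) and every request after that is turned away immediately instead of piling up more work for a slow dependency.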
5.3. Defensive Programming With Circuit Breaker Pattern
However, there is still an improvement that we can make to remote service calls.
Let’s consider the case where the remote service has started failing.
We don’t want to keep firing requests at it and wasting resources. Ideally, we would stop making requests for a certain amount of time to give the service time to recover, and only then resume requests. This is known as the Circuit Breaker pattern.
Let’s see how Hystrix implements this pattern:
@Test
public void givenCircuitBreakerSetup_whenRemoteSvcCmdExecuted_thenReturnSuccess()
    throws InterruptedException {

    HystrixCommand.Setter config = HystrixCommand
        .Setter
        .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RemoteServiceGroupCircuitBreaker"));

    HystrixCommandProperties.Setter properties = HystrixCommandProperties.Setter();
    properties.withExecutionTimeoutInMilliseconds(1000);
    properties.withCircuitBreakerSleepWindowInMilliseconds(4000);
    properties.withExecutionIsolationStrategy(
        HystrixCommandProperties.ExecutionIsolationStrategy.THREAD);
    properties.withCircuitBreakerEnabled(true);
    properties.withCircuitBreakerRequestVolumeThreshold(1);

    config.andCommandPropertiesDefaults(properties);
    config.andThreadPoolPropertiesDefaults(HystrixThreadPoolProperties.Setter()
        .withMaxQueueSize(1)
        .withCoreSize(1)
        .withQueueSizeRejectionThreshold(1));

    assertThat(this.invokeRemoteService(config, 10_000), equalTo(null));
    assertThat(this.invokeRemoteService(config, 10_000), equalTo(null));
    assertThat(this.invokeRemoteService(config, 10_000), equalTo(null));

    Thread.sleep(5000);

    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(500)).execute(),
        equalTo("Success"));

    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(500)).execute(),
        equalTo("Success"));

    assertThat(new RemoteServiceTestCommand(config, new RemoteServiceTestSimulator(500)).execute(),
        equalTo("Success"));
}

public String invokeRemoteService(HystrixCommand.Setter config, int timeout)
    throws InterruptedException {

    String response = null;

    try {
        response = new RemoteServiceTestCommand(config,
            new RemoteServiceTestSimulator(timeout)).execute();
    } catch (HystrixRuntimeException ex) {
        System.out.println("ex = " + ex);
    }

    return response;
}
In the above test we have set different circuit breaker properties. The most important ones are:
- The CircuitBreakerSleepWindow, which is set to 4,000 ms. This defines how long the circuit stays open before requests to the remote service are allowed through again
- The CircuitBreakerRequestVolumeThreshold, which is set to 1. This defines the minimum number of requests needed before the failure rate is considered
With the above settings in place, our HystrixCommand will now trip open after two failed requests. The third request will not even hit the remote service: Hystrix will short-circuit it, and our invokeRemoteService method will return null as the response.
We then call Thread.sleep(5000) in order to wait past the sleep window that we have set. Once the window has elapsed, Hystrix lets a trial request through; because that request succeeds, the circuit closes and the subsequent requests flow through successfully.
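The open, wait and retry cycle described above can be sketched as a small state machine. This is a deliberately simplified illustration, not Hystrix's algorithm: Hystrix decides on an error percentage over a rolling window, whereas this sketch simply counts consecutive failures. The class CircuitBreakerSketch and all of its members are hypothetical, and the clock is passed in as an assumption so that the behaviour can be exercised without real waiting:

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker; not thread-safe and not Hystrix's implementation.
class CircuitBreakerSketch {

    private final int failureThreshold;
    private final long sleepWindowMillis;
    private final LongSupplier clock; // current time in milliseconds
    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    // Returns the call's result, or null when the call fails or is short-circuited.
    String call(Supplier<String> remoteCall) {
        if (open && clock.getAsLong() - openedAt < sleepWindowMillis) {
            return null; // circuit open: the remote call is not attempted at all
        }
        // Circuit closed, or sleep window elapsed: let the request through as a trial
        try {
            String result = remoteCall.get();
            consecutiveFailures = 0;
            open = false; // a successful call closes the circuit
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (open || consecutiveFailures >= failureThreshold) {
                open = true;
                openedAt = clock.getAsLong(); // start (or restart) the sleep window
            }
            return null;
        }
    }

    boolean isOpen() {
        return open;
    }
}
```

With a threshold of 2 and a sleep window of 4,000 ms, two failing calls open the circuit, a third call returns null without reaching the service, and the first successful call made after the window has elapsed closes the circuit again.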
6. Conclusion
In summary, Hystrix is designed to:
- Provide protection and control over failures and latency from services typically accessed over the network
- Stop cascading of failures resulting from some of the services being down
- Fail fast and rapidly recover
- Degrade gracefully where possible
- Enable real-time monitoring and alerting on failures
In the next post we will see how to combine the benefits of Hystrix with the Spring framework.
The full project code and all examples can be found over on the GitHub project.