1. Introduction

In this tutorial, we’ll look at how to run distributed performance tests with Gatling. Along the way, we’ll create a simple application to test, examine the rationale for distributing a performance test, and see what support Gatling offers for it.

2. Performance Testing with Gatling

Performance testing is a testing practice that evaluates a system’s responsiveness and stability under a certain workload. There are several types of tests that generally come under performance testing. These include load testing, stress testing, soak testing, spike testing, and several others. All of these have their own specific objectives to attain.

However, one common aspect of any performance testing is to simulate workloads, and tools like Gatling, JMeter, and K6 help us do that. But, before we proceed further, we need an application that we can test for performance.

We’ll then develop a simple workload model for the performance testing of this application.

2.1. Creating an Application

For this tutorial, we’ll create a straightforward Spring Boot web application using Spring CLI:

spring init --dependencies=web my-application

Next, we’ll create a simple REST API that provides a random number on request:

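import java.util.Random;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    // Returns a random integer between 0 (inclusive) and 1000 (exclusive)
    @GetMapping("/api/random")
    public Integer getRandom() {
        Random random = new Random();
        return random.nextInt(1000);
    }
}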

There’s nothing special about this API — it simply returns a random integer in the range 0 to 999 on every call.

We can start this application from the project directory using the Maven wrapper:

./mvnw spring-boot:run
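
Before injecting any load, it can help to confirm that the endpoint behaves as described. The following is a minimal sketch, assuming curl is installed and the application listens on the default port 8080 (the same base URL the simulation uses later). The helper names is_valid_random and smoke_test are hypothetical and not part of the application or of Gatling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick smoke check of the random-number API.

# Succeeds only if the argument is an integer between 0 and 999.
is_valid_random() {
  local value="$1"
  [[ "$value" =~ ^[0-9]+$ ]] && (( 10#$value >= 0 && 10#$value <= 999 ))
}

# Calls the endpoint once and validates the response body.
# Assumption: curl is installed and the app runs on localhost:8080.
smoke_test() {
  local url="${1:-http://localhost:8080/api/random}"
  local body
  body="$(curl --silent --fail "$url")" || return 1
  is_valid_random "$body"
}
```

With the application running, `smoke_test && echo "API is up"` should print the message; a failure suggests fixing the application before starting a load test.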

2.2. Creating a Workload Model

If we need to deploy this simple API into production, we need to ensure that it can handle the anticipated load and still provide the desired quality of service. This is where we need to perform various performance tests. A workload model typically identifies one or more workload profiles to simulate real-life usage.

For a web application with a user interface, defining an appropriate workload model can be quite challenging. But for our simple API, we can make assumptions about the load distribution for the load testing.

Gatling provides a Scala DSL for describing the scenarios to run in a simulation. Let’s begin by creating a basic scenario for the API that we created earlier:

package randomapi

import io.gatling.core.Predef._
import io.gatling.core.structure.ScenarioBuilder
import io.gatling.http.Predef._
import io.gatling.http.protocol.HttpProtocolBuilder

class RandomAPILoadTest extends Simulation {
    val protocol: HttpProtocolBuilder = http.baseUrl("http://localhost:8080/")
    val scn: ScenarioBuilder = scenario("Load testing of Random Number API")
      .exec(
        http("Get Random Number")
          .get("api/random")
          .check(status.is(200))
      )

    val duringSeconds: Integer = Integer.getInteger("duringSeconds", 10)
    val constantUsers: Integer = Integer.getInteger("constantUsers", 10)
    setUp(scn.inject(constantConcurrentUsers(constantUsers) during (duringSeconds))
      .protocols(protocol))
      .maxDuration(1800)
      .assertions(global.responseTime.max.lt(20000), global.successfulRequests.percent.gt(95))
}

Let’s discuss the salient points in this basic simulation:

We begin by adding some necessary Gatling DSL imports
Next, we define the HTTP protocol configuration
Then, we define a scenario with a single request to our API
Finally, we create a simulation definition for the load we want to inject; here, we’re injecting load using 10 concurrent users for 10 seconds
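
The simulation reads constantUsers and duringSeconds from Java system properties and falls back to 10 for each. One way to override them is sketched below, on the assumption that the gatling.sh launcher of the installed bundle passes JAVA_OPTS through to the JVM; this is worth verifying for the Gatling version in use. The helper name build_java_opts and the placeholder path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the JAVA_OPTS string that overrides the
# simulation defaults (10 users for 10 seconds).
build_java_opts() {
  local users="$1" seconds="$2"
  printf -- '-DconstantUsers=%s -DduringSeconds=%s' "$users" "$seconds"
}

# Dry run: print the command instead of executing it.
# GATLING_HOME falls back to a placeholder path if it is not set.
echo "JAVA_OPTS=\"$(build_java_opts 50 60)\"" \
  "${GATLING_HOME:-/path/to/gatling}/bin/gatling.sh -s randomapi.RandomAPILoadTest"
```

Printing the command first makes it easy to check the property names against the ones the simulation expects before starting a real run.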

It can be quite complex to create this kind of scenario for more complex applications with a user interface. Thankfully, Gatling comes with another utility, called a recorder. Using this recorder, we can create scenarios by letting it proxy interactions between the browser and the server. It can also consume a HAR (HTTP archive) file to create scenarios.

2.3. Executing the Simulation

Now, we’re ready to execute our load test. For this, we can place our simulation file “RandomAPILoadTest.scala” in the directory “$GATLING_HOME/user-files/simulations/randomapi/”. This isn’t the only way to execute the simulation, but it’s certainly one of the easiest.

We can start Gatling by running the command:

$GATLING_HOME/bin/gatling.sh

This will prompt us to choose the simulation to run:

Choose a simulation number:
     [0] randomapi.RandomAPILoadTest

On selecting the simulation, Gatling runs it and prints a summary of the results to the console. Further, it generates a report in HTML format in the directory “$GATLING_HOME/results”. The report is quite detailed and easy to follow, and it includes a summary of the result.

3. Distributed Performance Testing

So far, so good. But recall that the purpose of performance testing is to simulate real-life workloads. For popular applications, these can be significantly higher than the load we’ve seen in our trivial case here. In the test summary, we managed to achieve a throughput of roughly 500 requests/sec. A real-life application handling real-life workloads may need to sustain many times that!

How do we simulate this kind of workload using any performance tool? Is it really possible to achieve these numbers by injecting load just from a single machine? Perhaps not. Even if the load injection tool can handle much higher loads, the underlying operating system and network have their own limitations.

This is where we have to distribute our load injection over multiple machines. Of course, like any other distributed computing model, this comes with its own share of challenges:

How do we distribute the workload amongst participating machines?
Who coordinates their completion and recovery from any errors that may happen?
How do we collect and summarize the results for consolidated reporting?

A typical architecture for distributed performance testing uses a master node that coordinates several slave nodes to address some of these concerns.

But, here again, what happens if the master breaks down? It’s not in the scope of this tutorial to address all the concerns of distributed computing, but we should keep their implications in mind when choosing a distributed model for performance testing.

4. Distributed Performance Testing with Gatling

Now that we’ve understood the need for distributed performance testing, we’ll see how we can achieve this using Gatling. Clustering mode is a built-in feature of Gatling Frontline (since renamed Gatling Enterprise). However, Frontline is the enterprise version of Gatling and isn’t available as open-source. Frontline supports deploying injectors on-premises or on any of the popular cloud vendors.

Nevertheless, it’s still possible to achieve this with Gatling open-source, although we’ll have to do most of the heavy lifting ourselves. We’ll cover the basic steps in this section. Here, we’ll use the same simulation that we defined earlier to generate load from multiple machines.

4.1. Setup

We’ll begin by creating a controller machine and several remote worker machines, either on-premises or on any of the cloud vendors. There are certain prerequisites that we have to complete on these machines. These include installing Gatling open-source on all worker machines and setting up some environment variables on the controller machine.

To achieve a consistent result, we should install the same version of Gatling on all worker machines, with the same configuration on each one. This includes the directory we install Gatling in and the user we create to install it.

Let’s first define the list of remote worker machines that we’ll use to inject the load from:

HOSTS=( 192.168.x.x 192.168.x.x 192.168.x.x)

And let’s also see the other important environment variables that we need to set on the controller machine:

GATLING_HOME=/gatling/gatling-charts-highcharts-1.5.6
GATLING_SIMULATIONS_DIR=$GATLING_HOME/user-files/simulations
SIMULATION_NAME='randomapi.RandomAPILoadTest'
GATLING_RUNNER=$GATLING_HOME/bin/gatling.sh
GATLING_REPORT_DIR=$GATLING_HOME/results/
GATHER_REPORTS_DIR=/gatling/reports/
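Some of these variables point to the Gatling installation directory and the script that we need to start the simulation. Others name the directories where the reports are generated and gathered. The scripts in the following sections also reference a USER_NAME variable for the account used to log in to the worker machines, so we need to set that as well. We’ll see where to use all of them later on.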
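
Since the later scripts depend on all of these variables, a small preflight check on the controller machine can catch a missing one early. This is only a sketch; the helper name require_vars is hypothetical and not part of Gatling.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: reports every named variable that is
# unset or empty. Prints the missing names to stderr and returns
# non-zero if at least one is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in "name" (first element for arrays).
    if [[ -z "${!name:-}" ]]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical call would be `require_vars HOSTS USER_NAME GATLING_HOME GATLING_SIMULATIONS_DIR SIMULATION_NAME GATLING_RUNNER GATLING_REPORT_DIR GATHER_REPORTS_DIR || exit 1` at the top of each script.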

It’s important to note that we’re assuming the machines have a Linux-like environment. But, we can easily adapt the procedure for other platforms like Windows.

4.2. Distributing Load

Here, we’ll copy the same scenario to each of the worker machines that we created earlier. There are several ways to copy the simulation to a remote host. The simplest is to use scp on hosts that support it. We can also automate this using a shell script:

for HOST in "${HOSTS[@]}"
do
  scp -r $GATLING_SIMULATIONS_DIR/* $USER_NAME@$HOST:$GATLING_SIMULATIONS_DIR
done

The above command copies the contents of a directory on the local host to a directory on the remote host. For Windows users, PuTTY is a good option, and it comes with PSCP (the PuTTY Secure Copy client). We can use PSCP to transfer files between Windows clients and Windows or Unix servers.

4.3. Executing Simulation

Once we’ve copied the simulations to the worker machines, we’re ready to trigger them. The key to achieving an aggregated number of concurrent users is to execute the simulation on all hosts, almost simultaneously.

We can again automate this step using a shell script:

for HOST in "${HOSTS[@]}"
do
  ssh -n -f $USER_NAME@$HOST \
    "sh -c 'nohup $GATLING_RUNNER -nr -s $SIMULATION_NAME \
    > /gatling/run.log 2>&1 &'"
done

We’re using ssh to trigger the simulation on remote worker machines. The key point to note here is that we’re using the “no reports” option (-nr). This is because we’re only interested in collecting the logs at this stage, and we’ll create the report by combining logs from all worker machines later.
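
Because every worker runs the same simulation, the aggregate number of concurrent users is the sum of what each worker injects. If we start from a target total, we have to split it across the workers. The arithmetic can be sketched as below; the helper name users_for_worker is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: share of users for worker number INDEX (0-based)
# when TOTAL users are split across COUNT workers. The first
# (TOTAL mod COUNT) workers each take one extra user.
users_for_worker() {
  local total="$1" count="$2" index="$3"
  local base=$(( total / count )) extra=$(( total % count ))
  if (( index < extra )); then
    echo $(( base + 1 ))
  else
    echo "$base"
  fi
}

# Example: 100 users over 3 workers.
for i in 0 1 2; do
  echo "worker $i: $(users_for_worker 100 3 "$i") users"
done
```

Inside the ssh loop, the remote command could then be prefixed with something like JAVA_OPTS=-DconstantUsers=N, assuming the gatling.sh on the workers passes JAVA_OPTS to the JVM.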

4.4. Gathering Results

Now, we need to collect the log files generated by simulations on all the worker machines. This is, again, something we can automate using a shell script and execute from the controller machine:

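for HOST in "${HOSTS[@]}"
do
  # Rename the most recent results directory on the worker to "report".
  # No -f here: ssh must finish before scp fetches the log.
  ssh -n $USER_NAME@$HOST \
    "sh -c 'ls -t $GATLING_REPORT_DIR | head -n 1 | xargs -I {} \
    mv ${GATLING_REPORT_DIR}{} ${GATLING_REPORT_DIR}report'"
  # Copy the log to the controller, adding the host name to the file name.
  scp $USER_NAME@$HOST:${GATLING_REPORT_DIR}report/simulation.log \
    ${GATHER_REPORTS_DIR}simulation-$HOST.log
done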

The commands may seem complex for those of us not well versed in shell scripting. But they’re not that complex when we break them into parts. First, we ssh into a remote host, list the entries in the Gatling report directory in reverse chronological order, take the first (most recent) one, and rename it to “report”.

Then, we copy the selected logfile from the remote host to the controller machine and rename it to append the hostname. This is important, as we’ll have multiple log files with the same name from different hosts.
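
A worker that failed or hadn’t finished leaves no log behind, and the combined report would silently miss its share of the load. The sketch below checks the gathered directory for the per-host file names used in the loop above; the helper name missing_logs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints each host whose log file is absent or
# empty in DIR. Expects files named simulation-<host>.log.
missing_logs() {
  local dir="$1" host
  shift
  for host in "$@"; do
    [[ -s "$dir/simulation-$host.log" ]] || echo "$host"
  done
}
```

On the controller, `missing_logs "$GATHER_REPORTS_DIR" "${HOSTS[@]}"` prints nothing when every worker delivered a log, so any output is a reason to investigate before generating the report.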

4.5. Generating a Report

Lastly, we have to generate a report from all the log files collected from simulations executed on different worker machines. Thankfully, Gatling does all the heavy lifting here:

mv $GATHER_REPORTS_DIR $GATLING_REPORT_DIR
$GATLING_RUNNER -ro reports

We move the directory containing all the collected log files into the standard Gatling report directory and execute the Gatling command with the “reports only” option (-ro) to generate the report. This assumes that we have Gatling installed on the controller machine as well. The final report is similar to what we’ve seen earlier.

From the report alone, we can’t even tell that the load was actually injected from multiple machines! We can clearly see that the number of requests almost tripled when we used three worker machines. In real-life scenarios, the scaling would not be this perfectly linear, though!

5. Considerations for Scaling Performance Testing

We’ve seen that distributed performance testing is a way to scale performance testing to simulate real-life workloads. But while it’s useful, it also brings the coordination and reporting challenges we discussed earlier. Hence, we should first try to scale the load injection capability vertically as much as possible. Only when we reach the vertical limit on a single machine should we consider using distributed testing.

Typically, the limiting factors for scaling load injection on a machine come from the underlying operating system or network. There are certain things we can optimize to make this better. In Linux-like environments, the number of concurrent users that a load injector can spawn is generally limited by the open files limit. We can consider increasing it using the ulimit command.
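
As a rough illustration, the sketch below checks the soft open-files limit of the current shell against a minimum. The helper name open_files_limit_ok is hypothetical, and any threshold we pass to it is an assumption to be tuned for the actual load.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the soft open-files limit of the
# current shell is at least MIN (or is unlimited).
open_files_limit_ok() {
  local min="$1" current
  current="$(ulimit -n)"
  [[ "$current" == "unlimited" ]] || (( current >= min ))
}

echo "current open files limit: $(ulimit -n)"
```

For example, `open_files_limit_ok 65536 || ulimit -n 65536` would try to raise the soft limit for the current shell session. The value 65536 is only illustrative, and the call fails if it exceeds the hard limit configured on the machine.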

Another important factor concerns the resources available on the machine. For instance, load injection typically consumes a lot of network bandwidth. If the network throughput of the machine is the limiting factor, we can consider upgrading it. Similarly, CPU or memory available on the machine can be other limiting factors. In cloud-based environments, it’s fairly easy to switch to a more powerful machine.

Finally, the scenarios that we include in our simulation should be resilient, as we should not assume that every response is successful under load. Hence, we should be careful and defensive in writing our assertions on the response. Also, we should keep the number of assertions to the bare minimum, so that the injector’s effort goes into increasing the throughput.

6. Conclusion

In this tutorial, we went through the basics of executing a distributed performance test with Gatling. We created a simple application to test, developed a simple simulation in Gatling, and then saw how we could execute it from multiple machines.

In the process, we also looked at the need for distributed performance testing and the best practices related to it.