Writing and deploying microservices is easy. Nowadays we have many tools, languages, and frameworks to help us write, build and deploy API services on the cloud in just a few minutes. However, it remains a bigger challenge to build services that actually perform well under heavy load of concurrent clients. In this sense, a straightforward question that we may ask is: How can I reasonably know that my new, up-and-running, microservice will perform well under stress?
17th-century British empiricists, like John Locke and David Hume, struggled with this very question. Well, not exactly the same question, but the underlying philosophical problem is the same: the relationship between experience and knowledge (Locke, 1710; Hume, 1748).
Remarkably, their insights remain relevant still today: It is not possible to achieve actual true knowledge about how our code will perform under stress. Nonetheless, we can test it in order to have some idea of how the service may perform under stress.
Once we have an up-and-running service, there are many tools available to simulate a heavy influx of users and connections in order to push the service to its limits. Perhaps JMeter¹ is the most well-known tool available for this task, but here we will explore Gatling² and its distinct features, along with a proposition of a Kubernetes cluster architecture to distribute the load test execution.
The main feature of Gatling can be found in its headline advertisement: “load test as code”. JMeter, for instance, has its own IDE, used to design and produce the test suites. The resulting Jmeter test file is an xml file — which is terrible for code versioning, reviewing and general maintenance. In this sense, Gatling is more “developer friendly”, as its test suites may be written in its specific Scala DSL³. A test suite (or simulation, following Gatling terminology) in Gatling DSL looks roughly like this:
The above excerpt can be found at the Gatling “quickstart” page⁴, and it is quite complete, albeit simple. This example contains all the elements required to execute a Gatling simulation. A brief description follows.
A Gatling Simulation must extend the type Simulation, as we see in the definition of the class BasicSimulation. Next, we define a variable http, which stores the main http settings that we want to use in our requests. The requests to be performed are defined as executions in a scenario object, as seen in the scn variable definition.
The requests can be built into the scenario in several different ways using Gatling DSL. In this example the request is chained by means of the exec() method, followed by a pause of five milliseconds, to avoid overloading the service. Finally, everything is put together by setting up the simulation with the scenario and the configuration of the protocols by implementing the setup() method of the Simulation type, as seen in the above example. In this case, just one user is injected into the simulation.
At iFood — a unicorn startup company of food delivery — our software platform process more than one million orders on a busy day. To achieve that, we have a plethora of microservices responsible for many features, like user account management, catalog geo-referenced searches, order and payment processing, and delivery tracking, among other stuff.
To anticipate software bottlenecks in our platform, we use Gatling to simulate a heavy influx of orders, projecting the volume of requests way above our current production peak. In such a scenario, certainly, the simulation is somewhat more complex than the simple BasicSimulation example provided above. Currently, our simulation encompasses more than 56 distinct HTTP endpoints, with an uneven load distribution among them. In our test, the final reports produced by Gatling looks somewhat like this: Picture 1: Gatling reports – endpoint list Gatling default reports also produce a few very useful charts, like the ones below: Picture 2: Gatling reports – overall requests/responses per second The above examples are “real-life” results from iFood platform load tests.
In picture 1 we have each endpoint listed, alongside success count, error count, and elapsed time percentiles, which are immensely useful for diagnosing service degradation. In picture 2 we have the overall number of requests and responses over time and the overall picture of the request simulation ramp that we set up.
By the way, the above examples were configured like this:
As shown in the code excerpt above, our simulation injects 5400 concurrent users in the span of 75 minutes and after that, keeps a steady count of 5400 concurrent users for 20 minutes. The chart in picture 2 shows that our platform could not keep up with this scenario — it is quite easy to see that after 5h58, the response throughput degraded heavily. So far, I have shown a basic Gatling setup and argued about the usefulness of load testing to identify throughput bottlenecks in microservices.
However, the main topic of this piece is how to setup Gatling in a distributed environment, to actually hit the test throughput and stress our platform beyond the production peak. In picture 2, the attentive reader will notice that the test report peaked at 50 thousand requests per minute and almost 10 thousand simultaneous users. Could Gatling achieve that kind of throughput, with a simulation of more than 56 HTTP endpoints, in a single computational instance? In our experience, most certainly, no… Unless we use some very specific (and expensive) hardware. Gatling has an excellent throughput by means of its software stack based on Akka and Netty frameworks, both well-known for their great thread poll performances.
However, the computational limits will still be bound by the operating system and hardware resources. The solution is to distribute the load tests, which is not supported by Gatling open-source software but can be done, with some preliminary setup, as shown below.
The overall strategy to execute our Gatling simulation in a multi-node Kubernetes cluster is to: i) build a runtime package containing the simulation and all its framework dependencies; ii) build a docker image of the simulation; iii) deploy n instances of Kubernetes jobs do execute each simulation; and iv) aggregate the results in a single report.
Building a runtime package – the uberjar
There are several ways to run Gatling, including most well-known dependency managers like Maven, Gradle and Sbt, as exemplified in Gatling documentation⁴. By default, Gatling’s Maven plugin will perform the tests by invoking a test goal such as mvn gatling:test, once the Maven plugin is specified in the pom.xml file of your project, as shown below:
However, to execute the test simulations as a standalone app and to avoid the hassle of installing Maven and other dependencies on the OS, we chose to package all library dependencies, Scala, Gatling and the simulation files into a single uber jar file (also called super jar or “fat” jar). To achieve that, a few changes were made to the pom.xml:
1. Build the project as a regular Scala-Maven project:
2. Use the maven-shade-plugin⁵ to package all the dependencies into the jar file:
The important section in the excerpt is the <transformer/>, in which we specify Gatling as the main class of the project, by means of the ManifestResourceTransformer of the Shade plugin. It is also a good idea to exclude a few META-INF artifacts as shown in the <filter/> section.
With these changes, executing mvn package will produce the uberjar, containing Gatling and all its dependencies and thus the simulation can be executed with a regular java -jar command.
Building the executable docker image
This step is very straightforward. Basically we will build a docker image based on OpenJDK 11 together with our jar file. Certainly, it will be convenient to build the jar file in the docker container as well, so our continuous integration platform can remain completely agnostic regarding the software stack. We chose to use a multi-staged docker file:
The first stage is used to build the uberjar file and the second stage is responsible for actually executing it. For the build step, we used a regular maven image, but for the runtime image we chose an Alpine JRE 11 image, in order to reduce the ECR image size and the deployment time.
Other tweaks that were necessary include the addition of curl and a couple Bash scripts to execute our commands. Later on we will discuss in more detail both scripts added at the end of the docker file. The first stage is used to build the uberjar file and the second is responsible for actually executing it. For the build step, we used a regular maven image, however for the runtime image we chose an Alpine JRE 11 image, in order to reduce the ECR image size and reduce the deploy time.
Other tweaks that were necessary include the addition of curl and a couple Bash scripts to execute our commands. Later on we will discuss in more detail both scripts seen at the end of the docker file.
Executing multiple Gatling instances simultaneously
Due to our throughput goal, we already knew that the load tests would have to run distributed. Since the recent trend at iFood to employ Kubernetes clusters for backend services deployment, this approach seemed quite natural and straightforward for the distributed load testing as well. As such, our overall deployment architecture follows to the picture below: Picture 3: Deployment architecture
A Jenkins instance is used to schedule the Gatling simulation execution and the Kubernetes deployment is orchestrated by a Jenkinsfile pipeline, written in Groovy.
The whole process occurs in two distinct phases: the first phase deploys the multiple loadtest-gatling-i pod instances of the kubernetes job deployment, which will run in parallel. Once all pod instances are finished, the second phase starts and the pipeline deploy the loadtest-summary kubernetes job, which retrieves the Gatling simulation raw files and summarize all the data in a single Gatling report, using Gatling’s “report only” execution mode⁷.
Finally, the Jenkins pipeline pulls all interesting artifacts from the summary job: reports, log files and the original raw files to publish them at interesting places and the Gatling reports are made available directly in Jenkins (by means of the Jenkins Gatling Plugin⁶) and Gatling raw files are uploaded to an Amazon S3 bucket for further processing, aggregation and statistical analysis in our data lake.
The same Gatling docker image with the uberjar can be used to execute Gatling in the “report only” mode, by issuing the command: J
There are a few caveats for distributing this kind of execution in Kubernetes. A Kubernetes cluster is composed of one or more nodes, which usually are the individual machines of a cluster.
Each node can execute several Kubernetes pods, so, if you are not careful, you may end up executing the parallel pods in the same node — which is not what we want in a distributed load test. To avoid that issue, in the Kubernetes job yaml descriptor may set the job resources in such a way that no more than one pod will “fit” a single node.
Also, the completions and parallelism of the job descriptor must have the same value, which is number of pods — and nodes — to run in parallel. Both issues are addressed in the highlighted sections of the yaml descriptor excerpt below:
Another issue to be aware of, is that each Kubernetes pod is completely independent of each other and, when the execution of the pod computation finishes, the pod is terminated along with its volume.
So we need to setup some sort of channel in order to summarize the results from all the pods. A shared volume would work, but we chose to deploy a service to act as a repository for all artifacts produced by the load test pods.
In summary, what we have outlined in this paper is that Gatling is very effective as a load testing tool. Its reports are also quite useful to identify performance bottlenecks, providing detailed response time statistical reports for each tested resource.
Besides that, it is easy and convenient for software developers to write test suites using its powerful scala-based DSL. However, Gatling does have one main shortcoming, at least in its open-source incarnation, by not supporting distributed load testing out-of-the-box.
As we also have shown in this paper, this drawback can be overcome by leveraging some kind of distributed processing and, in this paper, we suggested using Kubernetes jobs to accomplish that. Certainly, any distributed processing architecture can accomplish the same goal, by means of the summarizing “reports only” feature of Gatling open-source.
Admittedly, some level of boilerplate orchestration setup will have to be written, but the resulting solution will leverage Gatling’s outputs to new highs.
: “Apache JMeter™.” Apache JMeter — Apache JMeter™, jmeter.apache.org/index.html.
: “For DevOps and CI/CD.” Gatling Open-Source Load Testing, gatling.io/.
: “The Scala Programming Language.” News, http://www.scala-lang.org/.
: “Quickstart — Gatling Open-Source Load Testing Documentation.” Gatling Open-Source Load Testing, gatling.io/docs/3.2/quickstart/.
: Talevi, Mauro. Apache Maven Shade Plugin — Introduction, maven.apache.org/plugins/maven-shade-plugin/.
: Jenkinsci. “Jenkinsci/Gatling-Plugin.” GitHub, github.com/jenkinsci/gatling-plugin.
: “Scaling Out with Gatling Open-Source.” Gatling Open-Source Load Testing, gatling.io//docs/current/cookbook/scaling_out.