Performance Comparison of Service Mesh Frameworks: the MTLS Test Case

Yaniv Naor
Project
2024
Projects, thesis, and dissertations
Cloud

Abstract

In recent years, the service mesh has become a fundamental part of most modern cloud-native applications. A service mesh abstracts the way the different parts of an application communicate with each other away from the application itself. In most cases, the service mesh layer is developed and maintained by third parties, which lets application developers focus on business logic without worrying about network complexities. It also makes it much easier to adopt network capabilities such as network policies, retries, and circuit breaking. However, these benefits come at a cost. The extra layer that manages all network traffic has a considerable impact on system performance, because it increases application latency and resource consumption. Since performance plays a key role in almost every modern system, and especially in cloud-native applications, this overhead is a serious concern that might make developers think twice before integrating a service mesh into their system.

In this work, we ran a range of performance tests to evaluate and compare the performance overhead of three of today's leading service meshes: Istio, Linkerd, and Cilium. Our experiments measured the overhead that a service mesh adds to service-to-service communication inside a Kubernetes cluster. The CNCF survey shows that 79% of respondents adopt a service mesh for security reasons, such as enforcing mTLS authentication. We therefore focused on the impact of the mTLS protocol on performance.

We observed significant latency and resource-consumption overhead in all of the tested service meshes, although some performed better than others. Linkerd had the lowest overhead, with only a 33% increase in latency, which supports its claim to be a light and simple service mesh. Cilium outperformed Istio, with a 99% increase in latency compared with a 166% increase for Istio, which shows the performance benefits of Cilium’s sidecarless architecture and its use of eBPF. Finally, although Istio is one of the most popular service meshes and supports a large number of features and configurations, it had the highest performance overhead among the tested service meshes. In some tests, Istio’s latency increase was almost four times that of Linkerd. We investigated the root cause of Istio’s high latency and found that several steps in request processing, such as HTTP parsing, each contribute substantially to the overhead, and that their accumulation produces the significant impact on latency and resource consumption. We believe that this work improves the understanding of service mesh architecture and its impact on performance.
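
To make the reported percentages concrete, the following minimal Python sketch shows one common way to express latency overhead relative to a no-mesh baseline, and converts the increases quoted above into latency multipliers. The function names are illustrative assumptions, and the abstract does not state the exact formula or latency percentile used in the study; only the three percentages come from the text.

```python
def overhead_percent(baseline_ms: float, mesh_ms: float) -> float:
    """Relative latency increase of a mesh run over the baseline, in percent.

    Assumed definition: (mesh - baseline) / baseline * 100.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (mesh_ms - baseline_ms) / baseline_ms * 100.0


def latency_multiplier(increase_percent: float) -> float:
    """Convert a percentage increase into a multiple of the baseline latency."""
    return 1.0 + increase_percent / 100.0


# Latency increases quoted in the abstract (percent over the baseline).
REPORTED_INCREASE = {"Linkerd": 33.0, "Cilium": 99.0, "Istio": 166.0}

for mesh, pct in REPORTED_INCREASE.items():
    print(f"{mesh}: +{pct:.0f}% latency, about {latency_multiplier(pct):.2f}x the baseline")
```

Read this way, a 99% increase means that latency roughly doubles, and a 166% increase means that a request takes about 2.66 times as long as it does without a mesh.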