Prof. Anat Bremler-Barr

Blavatnik School of Computer Science, Tel-Aviv University


Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Michael Czeizler, Jhonatan Tavori

Today’s software development landscape has witnessed a shift towards microservices-based architectures. Using this approach, large software systems are implemented by combining loosely-coupled services, each responsible for a specific task and defined with separate scaling properties.
Auto-scaling is a primary capability of cloud computing which allows systems to adapt to fluctuating traffic loads by dynamically increasing (scale-up) and decreasing (scale-down) the number of resources used.

We observe that when microservices which utilize separate auto-scaling mechanisms operate in tandem to process traffic, they may perform ineffectively, especially under overload conditions caused by DDoS attacks. This can result in throttling (Denial of Service, DoS) and over-provisioning of resources (Economic Denial of Sustainability, EDoS).

This paper demonstrates how an attacker can exploit the tandem behavior of microservices with different auto-scaling mechanisms to create an attack we denote as the Tandem Attack. We demonstrate the attack on a typical serverless architecture and analyze its economic and performance damages. One intriguing finding is that some attacks may cause a cloud customer to pay for requests that were denied service.

We conclude that independent scaling of loosely coupled components might form an inherent difficulty and end-to-end controls might be needed.
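
The tandem effect described above can be illustrated with a toy model. The sketch below is not the model or the measurements from the paper: it assumes a hypothetical front tier that scales instantly and bills every request, followed by a back tier that adds a fixed amount of capacity per time step only after it has seen overload. All names and numbers are illustrative assumptions, and scale-down is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TandemResult:
    billed_front: int  # requests the front tier accepted (and charged for)
    served: int        # requests the back tier actually completed
    throttled: int     # requests billed at the front but denied at the back


def simulate_tandem(arrivals: List[int], back_capacity: int,
                    scale_step: int, max_capacity: int) -> TandemResult:
    """Toy two-tier model: instant-scaling front tier, slow-scaling back tier."""
    billed = served = throttled = 0
    capacity = back_capacity
    for load in arrivals:
        billed += load                    # front tier accepts and bills everything
        completed = min(load, capacity)   # back tier serves up to its capacity
        served += completed
        throttled += load - completed
        if load > capacity:               # scale up only after overload is observed
            capacity = min(capacity + scale_step, max_capacity)
    return TandemResult(billed, served, throttled)


# Hypothetical burst: three overloaded time steps between two quiet ones.
result = simulate_tandem([100, 1000, 1000, 1000, 100],
                         back_capacity=200, scale_step=200, max_capacity=600)
print(result)
```

In this toy run the front tier bills 3200 requests while the back tier completes only 1400, so 1800 requests are paid for and still denied, which mirrors the combined DoS and EDoS effect the abstract describes.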

Projects, thesis, and dissertations
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi

The proliferation of software vulnerabilities poses a significant challenge for security databases and analysts tasked with their timely identification, classification, and remediation. With the National Vulnerability Database (NVD) reporting an ever-increasing number of vulnerabilities, the traditional manual analysis becomes untenably time-consuming and prone to errors. This paper introduces VulnScopper, an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP), to automate and enhance the analysis of software vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined with a Large Language Model (LLM), VulnScopper effectively handles unseen entities, overcoming the limitations of previous KG approaches.

We evaluate VulnScopper on two major security datasets, the NVD and the Red Hat CVE database. Our method significantly improves the link prediction accuracy between Common Vulnerabilities and Exposures (CVEs), Common Weakness Enumerations (CWEs), and Common Platform Enumerations (CPEs). Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement over large language models in predicting CWE labels based on the Red Hat database.
According to the NVD, only 6.37% of the linked CPEs are published within the first 30 days; many of them are related to critical and high-risk vulnerabilities which, according to multiple compliance frameworks (such as CISA and PCI), should be remediated within 15-30 days. We provide an analysis of several CVEs published during 2023, showcasing the ability of our model to uncover new products previously unlinked to vulnerabilities. As such, our approach dramatically reduces the vulnerability remediation time and improves the vulnerability management process.
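
For readers unfamiliar with the Hits@10 figure quoted above, the following is a minimal sketch of how a Hits@k score is commonly computed in link prediction: the fraction of test links whose true target is ranked within the top k candidates. The function name and the example ranks are hypothetical, and the abstract does not state the exact ranking protocol used for VulnScopper.

```python
from typing import Sequence


def hits_at_k(true_ranks: Sequence[int], k: int) -> float:
    """Fraction of test links whose true entity is ranked at position k or better.

    true_ranks holds the 1-based rank of the correct entity for each test link.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not true_ranks:
        return 0.0
    return sum(1 for rank in true_ranks if rank <= k) / len(true_ranks)


# Hypothetical ranks of the correct CPE for five test CVEs.
example_ranks = [1, 3, 12, 7, 25]
print(hits_at_k(example_ranks, 10))  # three of the five ranks are within the top 10
```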

Projects, thesis, and dissertations
Anat Bremler-Barr, Bar Meyuhas, Tal Shapira

The IoT market is diverse and characterized by a multitude of vendors that support different device functions (e.g., speaker, camera, vacuum cleaner, etc.). Within this market, IoT security and observability systems use real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques that assume the IoT device, labeled by both its vendor and function, was observed during their training phase. We tackle a key challenge in IoT labeling: how can an AI solution label an IoT device that has never been seen before and whose label is unknown?

Our solution extracts textual features such as domain names and hostnames from network traffic, and then enriches these features using Google search data alongside a catalog of vendors and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types. Based on the information gathered, the device’s vendor is identified through string matching with the enriched features. The function is then deduced by LLMs and zero-shot classification from a predefined catalog of IoT functions. In an evaluation of our solution on 97 unique IoT devices, our function labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. As far as we know, this is the first research to tackle AI-automated IoT labeling.
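
The vendor identification step can be sketched as a simple substring match between a vendor catalog and the textual features. This is only an illustration under assumptions: the function name, the vendor names, and the hostnames are hypothetical, and the actual matching rules of the described solution are not given in the abstract.

```python
from typing import Iterable, Optional


def match_vendor(features: Iterable[str],
                 vendor_catalog: Iterable[str]) -> Optional[str]:
    """Return the catalog vendor found in the most features, or None if none match."""
    lowered = [feature.lower() for feature in features]
    best_vendor, best_count = None, 0
    for vendor in vendor_catalog:
        needle = vendor.lower()
        # Count how many features (domains, hostnames, search snippets) mention the vendor.
        count = sum(1 for feature in lowered if needle in feature)
        if count > best_count:
            best_vendor, best_count = vendor, count
    return best_vendor


# Hypothetical features extracted from one device's traffic.
features = ["cam01.acme-iot.example", "api.acme.example", "time.example.org"]
print(match_vendor(features, ["globex", "acme"]))
```

A plain substring match like this is easy to reason about but can misfire on short vendor names, so a real system would likely add normalization or token-level matching.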

Projects, thesis, and dissertations
Yaniv Naor
Project

In recent years, Service Mesh has become a fundamental aspect of most modern cloud-native applications. Service Mesh abstracts the way different parts of the application communicate with each other away from the application itself. In most cases, the service mesh layer is developed and maintained by third parties. This lets the application developers focus on the business logic without worrying about network complexities. In addition, it makes it much easier to adopt new network capabilities such as network policies, retries, circuit breaking, and more. However, the benefits of the service mesh come at a cost. The extra layer responsible for all the network traffic management has a considerable impact on the system performance, as it increases the application latency and resource consumption. Since performance has a key role in almost every modern system, especially in cloud-native applications, this becomes a serious concern that might make developers think twice before they integrate a service mesh into their system.

In this work, we executed various performance tests to evaluate and compare the performance overhead of three of the leading service meshes today: Istio, Linkerd, and Cilium. In our experiments, we tested the performance overhead of a service mesh in service-to-service communication inside a Kubernetes cluster.
The CNCF survey shows that 79% of the respondents adopt service mesh for security reasons such as enforcing mTLS authentication. Therefore, we decided to focus on the impact of the mTLS protocol on performance.

We observed a significant latency and resource consumption overhead in all of the tested service mesh providers. However, some providers performed better than others. Linkerd had the lowest performance overhead compared to Istio and Cilium, with just a 33% increase in latency, supporting its claim to be a light and simple service mesh. Cilium gave better results than Istio, with a 99% increase in latency for Cilium as opposed to a 166% increase for Istio. This shows the performance benefits of Cilium’s sidecarless architecture and usage of eBPF. Finally, despite Istio being one of the popular service mesh providers and supporting a large number of functionalities and configurations, it has the highest performance overhead among the tested service meshes. In some tests, Istio’s latency increase was almost four times the increase of Linkerd. We aimed to understand the root cause of Istio’s high latency and discovered that some of the steps in the request processing, such as HTTP parsing, contribute substantially to the performance overhead, and that their accumulation creates this significant impact on latency and resource consumption. We believe that this work improves the understanding of the service mesh architecture and its impact on performance.
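
To make the percentages above concrete, the sketch below shows the usual way a relative latency overhead is computed from a baseline run (no mesh) and a run with the mesh enabled. The function name and the example latencies are hypothetical and are not measurements from this project.

```python
def latency_overhead_pct(baseline_ms: float, mesh_ms: float) -> float:
    """Relative latency increase, in percent, of a mesh run over a no-mesh baseline."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (mesh_ms - baseline_ms) / baseline_ms * 100.0


# Hypothetical example: 2.0 ms without a mesh, 3.0 ms with a mesh enabled.
print(latency_overhead_pct(2.0, 3.0))  # 50.0
```

Read this way, a 166% increase means the latency with the mesh is 2.66 times the baseline, while a 33% increase means 1.33 times the baseline.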

Poster and brief announcement
Anat Bremler-Barr, Michael Czeizler

Auto-scaling is a fundamental capability of cloud computing which allows resources to be consumed dynamically according to the changing traffic that must be served.
Under the micro-services architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be individually scaled.
In this paper, we present a new attack, the Tandem Attack, that exploits the tandem behavior of micro-services with different scaling properties. Such behavior can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), either created by malicious attackers or self-inflicted due to wrong configurations set up by administrators. We demonstrate the Tandem Attack using a popular AWS serverless infrastructure modeling two services and show that removing servers’ management responsibility from the cloud users does not mitigate the different scaling properties challenge and can even make the problem harder to solve.