Publications by Year

Projects, thesis, and dissertations
Yaniv Naor
Project,
2024

In recent years, Service Mesh has become a fundamental aspect of most modern cloud-native applications. Service Mesh abstracts the way different parts of the application communicate with each other away from the application itself.  In most cases, the service mesh layer is developed and maintained by third parties. This lets the application developers focus on the business logic without worrying about network complexities. In addition, it makes it a lot easier to adopt new network capabilities such as network policies, retries, circuit breaking, and more. However, all the benefits of the service mesh do not come without a cost. The extra layer responsible for all the network traffic management has a considerable impact on the system performance, as it increases the application latency and resource consumption. Since performance has a key role in almost every modern system, especially in cloud-native applications, this becomes a serious concern that might make developers think twice before they integrate a service mesh into their system.

In this work, we executed various performance tests in order to evaluate and compare the performance overhead of three of the leading service meshes today: Istio, Linkerd, and Cilium. In our experiments, we tested the performance overhead of a service mesh in a service-to-service communication inside a Kubernetes cluster.
The CNCF survey shows that 79% of the respondents adopt service mesh for security reasons such as enforcing mTLS authentication. Therefore, we decided to focus on the impact of the mTLS protocol on performance.

We observed a significant latency and resource consumption overhead in all of the tested service mesh providers. However, some providers performed better than others. Linkerd had the lowest performance overhead compared to Istio and Cilium, with just a 33% increase in latency, proving it is a light and simple service mesh as it claims to be. Cilium gave better results than Istio, with a 99% increase in latency as opposed to a 166% increase for Istio. This shows the performance benefits of Cilium’s sidecarless architecture and use of eBPF. Finally, despite Istio being one of the most popular service mesh providers and supporting a large number of functionalities and configurations, it has the highest performance overhead among the tested service meshes. In some tests, Istio’s latency increase was almost four times that of Linkerd. We aimed to understand the root cause of Istio’s high latency and discovered that some of the steps in request processing, such as HTTP parsing, contribute heavily to the performance overhead, and that their accumulation creates this significant impact on latency and resource consumption. We believe that this work improves the understanding of the service mesh architecture and its impact on performance.

Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Michael Czeizler, Jhonatan Tavori
INFOCOM,
2024

Today’s software development landscape has witnessed a shift towards microservices-based architectures. Using this approach, large software systems are implemented by combining loosely-coupled services, each responsible for a specific task and defined with separate scaling properties.
Auto-scaling is a primary capability of cloud computing which allows systems to adapt to fluctuating traffic loads by dynamically increasing (scale-up) and decreasing (scale-down) the number of resources used.

We observe that when microservices which utilize separate auto-scaling mechanisms operate in tandem to process traffic, they may perform ineffectively, especially under overload conditions, due to DDoS attacks. This can result in throttling (Denial of service — DoS) and over-provisioning of resources (Economic Denial of Sustainability — EDoS).

This paper demonstrates how an attacker can exploit the tandem behavior of microservices with different auto-scaling mechanisms to create an attack we denote as the Tandem Attack. We demonstrate the attack on a typical serverless architecture and analyze its economic and performance damage. One intriguing finding is that some attacks may cause a cloud customer to pay for requests that were denied service.

We conclude that independent scaling of loosely coupled components might form an inherent difficulty and end-to-end controls might be needed.

Projects, thesis, and dissertations
Anat Bremler-Barr, Bar Meyuhas, Tal Shapira
arxiv,
2024

The IoT market is diverse and characterized by a multitude of vendors that support different device functions (e.g., speaker, camera, vacuum cleaner, etc.). Within this market, IoT security and observability systems use real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques that assume the IoT device, labeled by both its vendor and function, was observed during their training phase. We tackle a key challenge in IoT labeling: how can an AI solution label an IoT device that has never been seen before and whose label is unknown?

Our solution extracts textual features such as domain names and hostnames from network traffic, and then enriches these features using Google search data alongside a catalog of vendors and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types. Based on the information gathered, the device’s vendor is identified through string matching with the enriched features. The function is then deduced by LLMs and zero-shot classification from a predefined catalog of IoT functions. In an evaluation of our solution on 97 unique IoT devices, our function labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. As far as we know, this is the first research to tackle AI-automated IoT labeling.
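
For readers unfamiliar with the metric, a HIT-k score such as the HIT1 and HIT2 values above is commonly computed as the fraction of items whose true label appears among the top k ranked predictions. The sketch below assumes that standard definition, which the abstract does not spell out. The function name `hit_at_k` and the sample devices are illustrative and are not taken from the paper.

```python
def hit_at_k(ranked_predictions, true_labels, k):
    """Fraction of items whose true label is among the top-k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("inputs must have the same length")
    if not true_labels:
        return 0.0
    hits = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth in preds[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked function guesses for three devices (not real data).
predictions = [
    ["camera", "doorbell"],
    ["speaker", "hub"],
    ["plug", "bulb"],
]
truth = ["camera", "hub", "vacuum cleaner"]

hit1 = hit_at_k(predictions, truth, 1)  # only the first device is right at rank 1
hit2 = hit_at_k(predictions, truth, 2)  # the second device is recovered at rank 2
```

Under this definition, HIT2 can never be lower than HIT1, which matches the ordering of the reported scores.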

Projects, thesis, and dissertations
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi
arxiv,
2024

The proliferation of software vulnerabilities poses a significant challenge for security databases and analysts tasked with their timely identification, classification, and remediation. With the National Vulnerability Database (NVD) reporting an ever-increasing number of vulnerabilities, the traditional manual analysis becomes untenably time-consuming and prone to errors. This paper introduces VulnScopper, an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP), to automate and enhance the analysis of software vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined with a Large Language Model (LLM), VulnScopper effectively handles unseen entities, overcoming the limitations of previous KG approaches.

We evaluate VulnScopper on two major security datasets, the NVD and the Red Hat CVE database. Our method significantly improves the link prediction accuracy between Common Vulnerabilities and Exposures (CVEs), Common Weakness Enumeration (CWEs), and Common Platform Enumerations (CPEs). Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement over large language models in predicting CWE labels based on the Red Hat database.
Based on the NVD, only 6.37% of the linked CPEs are published during the first 30 days; many of them are related to critical and high-risk vulnerabilities which, according to multiple compliance frameworks (such as CISA and PCI), should be remediated within 15-30 days. We provide an analysis of several CVEs published during 2023, showcasing the ability of our model to uncover new products previously unlinked to vulnerabilities. As such, our approach dramatically reduces the vulnerability remediation time and improves the vulnerability management process.

Poster and brief announcement
Anat Bremler-Barr, Michael Czeizler
INFOCOM,
2023

Auto-scaling is a fundamental capability of cloud computing that allows resources to be consumed dynamically according to the changing traffic to be served.
Under the micro-services architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be individually scaled.
In this paper, we present a new attack, the Tandem Attack, that exploits the tandem behavior of micro-services with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), whether created by malicious attackers or self-inflicted through misconfiguration by administrators. We demonstrate the Tandem Attack using a popular AWS serverless infrastructure modeling two services, and show that removing server management responsibility from cloud users does not mitigate the challenge of different scaling properties and can even make the problem harder to solve.

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
Usenix Security,
2023

To fully understand the root cause of the NRDelegationAttack and to analyze its amplification factor, we developed a mini-lab setup, disconnected from the Internet, that contains all the components of the DNS system: a client, a resolver, and authoritative name servers. This setup is built to analyze and examine the behavior of a resolver (or any other component) under the microscope. On the other hand, it is not useful for performance analysis (stress analysis).
Here we provide the code and details of this setup, enabling others to reproduce our analysis. Moreover, researchers may find it useful for further behavioral analysis and examination of different components in the DNS system.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Daniel Bachar
IFIP Networking,
2023

With the advent of cloud and container technologies, enterprises develop applications using a microservices architecture, managed by orchestration systems (e.g. Kubernetes), that group the microservices into clusters. As the number of application setups across multiple clusters and different clouds is increasing, technologies that enable communication and service discovery between the clusters are emerging (mainly as part of the Cloud Native ecosystem).
In such a multi-cluster setting, copies of the same microservice may be deployed in different geo-locations, each with different cost and latency penalties. Yet, current service selection and load balancing mechanisms do not take into account these locations and corresponding penalties.
We present MCOSS, a novel solution for optimizing the service selection, given a certain microservice deployment among clouds and clusters in the system. Our solution is agnostic to the different multi-cluster networking layers, cloud vendors, and discovery mechanisms used by the operators. Our simulations show a reduction in outbound traffic cost by up to 72% and response time by up to 64%, compared to the currently-deployed service selection mechanisms.

Poster and brief announcement
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi
Systor,
2023

With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE [4] knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI [3]. Additionally, we extend this approach by initializing the embeddings for the relations.
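
As background on the model named above, RotatE represents each relation as an element-wise rotation in the complex plane, so a plausible triple (head, relation, tail) has a rotated head that lands close to the tail. The sketch below shows one common form of that distance using only the standard library. The function name `rotate_distance` and the toy two-dimensional vectors are illustrative assumptions, and the language-model initialization described in the abstract is not shown.

```python
import cmath
import math


def rotate_distance(head, phases, tail):
    """Sum over dimensions of |h_i * e^{i * theta_i} - t_i|.

    head and tail are lists of complex numbers; phases holds the rotation
    angle (in radians) of the relation in each dimension.
    """
    if not (len(head) == len(phases) == len(tail)):
        raise ValueError("head, phases and tail must have the same length")
    return sum(
        abs(h * cmath.exp(1j * theta) - t)
        for h, theta, t in zip(head, phases, tail)
    )


# Toy embeddings (hypothetical): rotating the head by the relation's phases
# reproduces matching_tail, so its distance is close to zero.
head = [1 + 0j, 0 + 1j]
relation_phases = [math.pi / 2, math.pi]
matching_tail = [0 + 1j, 0 - 1j]
unrelated_tail = [1 + 0j, 1 + 0j]

d_match = rotate_distance(head, relation_phases, matching_tail)
d_other = rotate_distance(head, relation_phases, unrelated_tail)
```

A lower distance indicates a more plausible link, so candidate tails can be ranked by sorting on this value.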

Technical reports
Yehuda Afek, Anat Bremler-Barr, Niv Focus,
2023

The objective of this study is to propose an efficient solution for Low-Rate Attacks (LRA), such as scraping attacks that aim to download all the Uniform Resource Identifiers (URIs) of a website. Attackers attempt to evade detection by behaving like regular users, browsing only a small set of distinct pages (URIs) at small time scales. However, at larger time scales, the attacker becomes a distinct heavy hitter that requests numerous distinct URIs. Although there are several space-efficient and time-efficient methods to detect distinct heavy hitters, they still require excessive memory to track all users over a large time scale. In this research, an innovative streaming algorithm is proposed to detect the attacker.
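
To make the notion of a distinct heavy hitter concrete, the sketch below is a naive exact baseline, not the streaming algorithm proposed in the report. It keeps the full set of URIs seen per user, which is exactly the memory cost that becomes excessive over large time scales. The function name, the threshold parameter, and the sample requests are hypothetical.

```python
from collections import defaultdict


def distinct_heavy_hitters(requests, threshold):
    """Return the users that requested at least `threshold` distinct URIs.

    Exact baseline: stores every distinct (user, URI) pair, so memory grows
    with the number of users and pages observed.
    """
    seen = defaultdict(set)
    for user, uri in requests:
        seen[user].add(uri)
    return {user for user, uris in seen.items() if len(uris) >= threshold}


# Hypothetical request log: user "a" reloads one page, user "b" crawls three.
requests = [
    ("a", "/home"),
    ("a", "/home"),
    ("b", "/home"),
    ("b", "/products"),
    ("b", "/contact"),
]
suspects = distinct_heavy_hitters(requests, threshold=3)
```

Repeated requests for the same page do not raise a user's count, which is what separates a distinct heavy hitter from an ordinary high-volume user.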

Conferences & Workshops
Anat Bremler-Barr, David Hay, Bar Meyuhas, Shoham Danino
ACM/IRTF Applied Networking Research Workshop (ANRW),
2023

We explore the impact of device location on the communication endpoints of IoT devices within the context of Manufacturer Usage Description (MUD), an IETF security framework for IoT devices.
Two types of device location are considered: IP-based location, which corresponds to the physical location of the device based on its IP address; and user-defined location, which is chosen during device registration.
Our findings show that IP-based location barely affects the domain set with which IoT devices interact. Conversely, user-defined location drastically changes this set, mainly through region-specific domains that embody location identifiers selected by the user at registration.
We examine these findings’ effects on MUD file creation tools and IoT device identification. As MUD files rely on domain allowlists, we show that security appliances supporting MUD need to manage a significantly larger number of MUD rules than initially anticipated.
To address this challenge, we leverage the EDNS Client Subnet (ECS) extension to differentiate user-defined locations without needing regional domains, consequently reducing the number of Access Control Entries (ACEs) required by security appliances.

Poster and brief announcement
Anat Bremler-Barr, Hanoch Levy, Jhonatan Tavori
ACM CoNEXT,
2023

Retry mechanisms are commonly used in microservices architectures to recover from transient errors, including network failures and service overloading. This research aims at studying the operation of cloud retry mechanisms under deliberate DDoS attacks, and their effect on application performance and operational costs. In this poster we focus on the economic aspect, and demonstrate that enabling such mechanisms improperly might be counter-productive and expose the system to substantial and quadratic economic damage in the presence of attacks.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
Usenix Security,
2023

Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.

The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.

We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9-based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding the NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Dor Israeli and Alon Noy
The International Symposium on Cyber Security, Cryptology and Machine Learning (CSCML),
2023

This paper presents a new localhost browser-based vulnerability and a corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. Following the attack presentation, we suggest three new protection mechanisms to mitigate this vulnerability.
This new attack bypasses recently suggested protection mechanisms designed to stop browser-based attacks on private devices and local applications.

Conferences & Workshops
Anat Bremler-Barr, Matan Sabag
IFIP Networking,
2022

Distributed denial of service (DDoS) attacks, especially distributed reflection denial of service attacks (DRDoS), have increased dramatically in frequency and volume in recent years. Such attacks are possible due to the attacker’s ability to spoof the source address of IP packets. Since the early days of the internet, authenticating the IP source address has remained unresolved in the real world. Although there are many methods available to eliminate source spoofing, they are not widely used, primarily due to a lack of economic incentives.
We propose a collaborative on-demand route-based defense technique (CORB) to offer efficient DDoS mitigation as a paid-for-service, and efficiently assuage reflector attacks before they reach the reflectors and flood the victim. The technique uses scrubbing facilities located across the internet at internet service providers (ISPs) and internet exchange points (IXPs).
By transmitting a small amount of data based on border gateway protocol (BGP) information from the victim to the scrubbing facilities, we can filter out the attack without any false-positive cases. For example, the data can be sent using DOTS, a new signaling DDoS protocol that was standardized by the IETF. CORB filters the attack before it is amplified by the reflector, thereby reducing the overall cost of the attack. This provides a win-win financial situation for the victim and the scrubbing facilities that provide the service.
We demonstrate the value of CORB by simulating a Memcached DRDoS attack using real-life data. Our evaluation found that deploying CORB on scrubbing facilities at approximately 40 autonomous systems blocks 90% of the attack and can reduce the mitigation cost by 85%.

Conferences & Workshops
Eli Brosh, Elad Wasserstein, Anat Bremler-Barr
IEEE/IFIP NOMS Manage-IoT workshop,
2022

Monitoring medical data, e.g., Electrocardiogram (ECG) signals, is a common application of Internet of Things (IoT) devices. Compression methods are often applied on the massive amounts of sensor data generated prior to sending it to the Cloud to reduce the storage and delivery costs. A lossy compression provides high compression gain (CG), but may reduce the performance of an ECG application (downstream task) due to information loss. Previous works on ECG monitoring focus either on optimizing the signal reconstruction or the task’s performance. Instead, we advocate a self-adapting lossy compression solution that allows configuring a desired performance level on the downstream tasks while maintaining an optimized CG that reduces Cloud costs.
We propose Dynamic-Deep, a task-aware compression method geared for IoT-Cloud architectures. Our compressor is trained to optimize the CG while maintaining the performance requirement of the downstream tasks, chosen out of a wide range. In deployment, the IoT edge device adapts the compression and sends an optimized representation for each data segment, accounting for the downstream task’s desired performance without relying on feedback from the Cloud. We conduct an extensive evaluation of our approach on common ECG datasets using two popular ECG applications, including heart rate (HR) arrhythmia classification. We demonstrate that Dynamic-Deep can be configured to improve the HR classification F1-score across a wide range of requirements. One such configuration is tuned to improve the F1-score by 3 and increases CG by up to 83% compared to the previous state-of-the-art (autoencoder-based) compressor. Analyzing Dynamic-Deep on the Google Cloud Platform, we observe a 97% reduction in cloud costs compared to a no-compression solution. To the best of our knowledge, Dynamic-Deep is the first end-to-end system architecture proposal to focus on balancing the need for high performance of cloud-based downstream tasks and the desire to achieve optimized compression in IoT ECG monitoring settings.
