Anat Bremler-Barr

Publications by Year

Projects, theses, and dissertations
Anat Bremler-Barr, Bar Meyuhas, Tal Shapira
arXiv,
2024

The IoT market is diverse and characterized by a multitude of vendors that support different device functions (e.g., speaker, camera, vacuum cleaner, etc.). Within this market, IoT security
and observability systems use real-time identification techniques to manage these devices effectively. Most existing IoT identification solutions employ machine learning techniques
that assume the IoT device, labeled by both its vendor and function, was observed during their training phase. We tackle a key challenge in IoT labeling: how can an AI solution
label an IoT device that has never been seen before and whose label is unknown?

Our solution extracts textual features such as domain names and hostnames from network traffic, and then enriches these features using Google search data alongside a catalog of vendors
and device functions. The solution also integrates an auto-update mechanism that uses Large Language Models (LLMs) to update these catalogs with emerging device types.
Based on the information gathered, the device’s vendor is identified through string matching with the enriched features.
The function is then deduced by LLMs and zero-shot classification from a predefined catalog of IoT functions. In an evaluation of our solution on 97 unique IoT devices,
our function labeling approach achieved HIT1 and HIT2 scores of 0.7 and 0.77, respectively. As far as we know, this is the first research to tackle AI-automated IoT labeling.
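The vendor-labeling step can be illustrated with a minimal sketch (our own simplification: the catalog entries and feature strings below are invented, and the real system additionally enriches features via Google search and LLM-maintained catalogs):

```python
# Hypothetical sketch of vendor labeling: string-match enriched textual
# features against a vendor catalog. Data and names are illustrative.
def label_vendor(features, vendor_catalog):
    """Return the catalog vendor mentioned most often across the features."""
    scores = {v: sum(v.lower() in f.lower() for f in features) for v in vendor_catalog}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: vendor not in catalog

features = ["clients.nest.com", "nest-device-01.local", "search: Nest thermostat"]
print(label_vendor(features, ["Nest", "Ring", "Roomba"]))  # Nest
```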

Projects, theses, and dissertations
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi
arXiv,
2024

The proliferation of software vulnerabilities poses a significant challenge for security databases and analysts tasked with their timely identification, classification, and remediation. With the National Vulnerability Database (NVD) reporting an ever-increasing number of vulnerabilities, the traditional manual analysis becomes untenably time-consuming and prone to errors. This paper introduces VulnScopper, an innovative approach that utilizes multi-modal representation learning, combining Knowledge Graphs (KG) and Natural Language Processing (NLP), to automate and enhance the analysis of software vulnerabilities. Leveraging ULTRA, a knowledge graph foundation model, combined with a Large Language Model (LLM), VulnScopper effectively handles unseen entities, overcoming the limitations of previous KG approaches.

We evaluate VulnScopper on two major security datasets, the NVD and the Red Hat CVE database. Our method significantly improves the link prediction accuracy between Common Vulnerabilities and Exposures (CVEs), Common Weakness Enumeration (CWEs), and Common Platform Enumerations (CPEs). Our results show that VulnScopper outperforms existing methods, achieving up to 78% Hits@10 accuracy in linking CVEs to CPEs and CWEs and presenting an 11.7% improvement over large language models in predicting CWE labels based on the Red Hat database.
Based on the NVD, only 6.37% of the linked CPEs are published within the first 30 days; many of them are related to critical and high-risk vulnerabilities which, according to multiple compliance frameworks (such as CISA and PCI), should be remediated within 15-30 days. We provide an analysis of several CVEs published during 2023, showcasing the ability of our model to uncover new products previously unlinked to vulnerabilities. As such, our approach dramatically reduces the vulnerability remediation time and improves the vulnerability management process.
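For readers unfamiliar with the Hits@k metric reported above, a minimal sketch on toy data (the entity names and rankings are invented, not from the paper's evaluation):

```python
# Hits@k: a link-prediction query scores 1 if the true entity ranks in the
# model's top-k candidates, and the metric is the average over all queries.
def hits_at_k(ranked_candidates, true_entity, k):
    return int(true_entity in ranked_candidates[:k])

# Toy queries: (model's ranking, ground-truth entity). Names are invented.
rankings = [
    (["CPE-A", "CPE-B", "CPE-C"], "CPE-B"),
    (["CPE-X", "CPE-Y", "CPE-Z"], "CPE-Z"),
]
score = sum(hits_at_k(r, t, k=2) for r, t in rankings) / len(rankings)
print(score)  # 0.5
```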

Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Michael Czeizler, Jhonatan Tavori
INFOCOM,
2024

Today’s software development landscape has witnessed a shift towards microservices-based architectures. Using this approach, large software systems are implemented by combining loosely-coupled services, each responsible for a specific task and defined with its own scaling properties.
Auto-scaling is a primary capability of cloud computing which allows systems to adapt to fluctuating traffic loads by dynamically increasing (scale-up) and decreasing (scale-down) the number of resources used.

We observe that when microservices that utilize separate auto-scaling mechanisms operate in tandem to process traffic, they may perform ineffectively under overload conditions, such as those caused by DDoS attacks. This can result in throttling (Denial of Service, DoS) and over-provisioning of resources (Economic Denial of Sustainability, EDoS).

This paper demonstrates how an attacker can exploit the tandem behavior of microservices with different auto-scaling mechanisms to create an attack we denote the Tandem Attack. We demonstrate the attack on a typical Serverless architecture and analyze its economic and performance damage. One intriguing finding is that some attacks may leave a cloud customer paying for requests that were denied service.

We conclude that independent scaling of loosely coupled components may pose an inherent difficulty, and that end-to-end controls may be needed.

Poster and brief announcement
Anat Bremler-Barr, Hanoch Levy, Jhonatan Tavori
ACM CoNEXT,
2023

Retry mechanisms are commonly used in microservices architectures to recover from transient errors, including network failures and service overloading. This research studies the operation of cloud retry mechanisms under deliberate DDoS attacks, and their effect on application performance and operational costs. In this poster we focus on the economic aspect, and demonstrate that enabling such mechanisms improperly might be counter-productive and expose the system to substantial, quadratic economic damage in the presence of attacks.
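As a back-of-the-envelope illustration of how retries inflate costs (our toy model, not the poster's analysis): if every attempt is billed and each failure triggers another attempt, the expected bill per request grows with the retry budget and with the failure rate an attacker can induce.

```python
# Toy cost model: every attempt is billed; a retry is made only after a failure.
def billed_attempts(requests, failure_rate, max_retries):
    """Expected number of billed attempts across all requests."""
    expected = 0.0
    for _ in range(requests):
        p_reach = 1.0                 # probability this attempt happens
        for _attempt in range(max_retries + 1):
            expected += p_reach       # the attempt is made and billed
            p_reach *= failure_rate   # next attempt happens only on failure
    return expected

# Under a 90% induced failure rate, 3 retries nearly 3.5x the bill.
print(billed_attempts(1000, failure_rate=0.9, max_retries=3))  # ~3439
```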

Conferences & Workshops
Anat Bremler-Barr, David Hay, Bar Meyuhas, Shoham Danino
ACM/IRTF Applied Networking Research Workshop (ANRW),
2023

We explore the impact of device location on the communication endpoints of IoT devices within the context of Manufacturer Usage Description (MUD), an IETF security framework for IoT devices.
Two types of device location are considered: IP-based location, which corresponds to the physical location of the device based on its IP address; and user-defined location, which is chosen during device registration.
Our findings show that IP-based location barely affects the domain set with which IoT devices interact. Conversely, user-defined location drastically changes this set, mainly through region-specific domains that embody location identifiers selected by the user at registration.
We examine the effects of these findings on MUD file creation tools and IoT device identification. As MUD files rely on domain allowlists, we show that security appliances supporting MUD need to manage a significantly larger number of MUD rules than initially anticipated.
To address this challenge, we leverage the EDNS Client Subnet (ECS) extension to differentiate user-defined locations without needing regional domains, consequently reducing the number of Access Control Entries (ACEs) required by security appliances.
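A minimal sketch of the wire format involved, an EDNS Client Subnet option per RFC 7871 (IPv4 only; this is our illustration, not the paper's tooling):

```python
import struct

# Encode an EDNS Client Subnet (ECS) option per RFC 7871: option-code 8,
# FAMILY 1 (IPv4), SOURCE PREFIX-LENGTH, SCOPE 0, then the truncated address.
def ecs_option(address: str, srclen: int) -> bytes:
    addr = bytes(int(part) for part in address.split("."))[: (srclen + 7) // 8]
    payload = struct.pack("!HBB", 1, srclen, 0) + addr
    return struct.pack("!HH", 8, len(payload)) + payload

print(ecs_option("203.0.113.0", 24).hex())  # 0008000700011800cb0071
```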

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
USENIX Security,
2023

To fully understand the root cause of the NRDelegationAttack and to analyze its amplification factor, we developed a mini-lab setup, disconnected from the Internet, that contains all the components of the DNS system: a client, a resolver, and authoritative name servers. This setup is built to analyze and examine the behavior of a resolver (or any other component) under the microscope. It is, however, not intended for performance analysis (stress testing).
Here we provide the code and details of this setup, enabling our analysis to be reproduced. Researchers may also find it useful for further behavioral analysis and examination of different components in the DNS system.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Daniel Bachar
IFIP Networking,
2023

With the advent of cloud and container technologies, enterprises develop applications using a microservices architecture, managed by orchestration systems (e.g. Kubernetes), that group the microservices into clusters. As the number of application setups across multiple clusters and different clouds is increasing, technologies that enable communication and service discovery between the clusters are emerging (mainly as part of the Cloud Native ecosystem).
In such a multi-cluster setting, copies of the same microservice may be deployed in different geo-locations, each with different cost and latency penalties. Yet, current service selection and load balancing mechanisms do not take into account these locations and corresponding penalties.
We present MCOSS, a novel solution for optimizing the service selection, given a certain microservice deployment among clouds and clusters in the system. Our solution is agnostic to the different multi-cluster networking layers, cloud vendors, and discovery mechanisms used by the operators. Our simulations show a reduction in outbound traffic cost by up to 72% and response time by up to 64%, compared to the currently-deployed service selection mechanisms.

Poster and brief announcement
Anat Bremler-Barr, Michael Czeizler
INFOCOM,
2023

Auto-scaling is a fundamental capability of cloud computing that allows resources to be consumed dynamically according to the changing traffic load that must be served.
Under the microservices architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be scaled individually.
In this paper, we present a new attack, the Tandem Attack, which exploits the tandem behavior of microservices with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), created by malicious attackers or self-inflicted through misconfiguration by administrators. We demonstrate the Tandem Attack using a popular AWS serverless infrastructure modeling two services, and show that removing server-management responsibility from cloud users does not mitigate the different-scaling-properties challenge and can even make the problem harder to solve.
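The tandem effect can be caricatured in a few lines (our simplification, not the paper's model): an instantly scaling, per-request-billed front service feeding a capacity-limited back service.

```python
# Toy one-step model: service A (serverless) bills every incoming request,
# while downstream service B serves at most `capacity_b` of them. The excess
# is throttled at B (DoS) yet still paid for at A (EDoS).
def tandem_step(incoming, capacity_b, price_a):
    billed = incoming * price_a         # A scales up and bills each request
    served = min(incoming, capacity_b)  # B throttles beyond its capacity
    return billed, served, incoming - served

billed, served, dropped = tandem_step(incoming=10_000, capacity_b=1_000, price_a=0.0001)
print(billed, served, dropped)  # 1.0 1000 9000
```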

Poster and brief announcement
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi
SYSTOR,
2023

With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE [4] knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI [3]. Additionally, we extend this approach by initializing the embeddings for the relations.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
USENIX Security,
2023

Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.

The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.

We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Dor Israeli and Alon Noy
The International Symposium on Cyber Security, Cryptology and Machine Learning (CSCML),
2023

This paper presents a new localhost browser based vulnerability and corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. Following the attack presentation, we suggest three new protection mechanisms to mitigate this vulnerability.
This new attack bypasses recently suggested protection mechanisms designed to stop browser-based attacks on private devices and local applications.

Poster and brief announcement
Anat Bremler-Barr, Bar Meyuhas, Ran Shister
IEEE/IFIP NOMS,
2022

The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.

This demo introduces MUDIS, a MUD Inspection System that inspects the network behavior of devices, based on their formal description in the MUD file. We present several use-cases in which MUDIS is useful, including examining the impact of device location, the impact of a firmware update, the correlation of network behavior between different devices from the same manufacturer, and more.

MUDIS inspects two MUD files, clustering together and graphically visualizing identical, similar, and dissimilar rules. It then calculates a score that measures the similarity between the two files. It also generalizes the two MUD files where possible, such that the resulting generalized MUD covers all the permitted (white-list) network behavior of both.

Our open-source MUDIS tool and proof-of-concept dataset are available for researchers and IoT manufacturers, allowing anyone to gain meaningful insights over the network behavior of IoT devices.

Conferences & Workshops
Eli Brosh, Elad Wasserstein, Anat Bremler-Barr
IEEE/IFIP NOMS Manage-IoT workshop ,
2022

Monitoring medical data, e.g., Electrocardiogram (ECG) signals, is a common application of Internet of Things (IoT) devices. Compression methods are often applied on the massive amounts of sensor data generated prior to sending it to the Cloud to reduce the storage and delivery costs. A lossy compression provides high compression gain (CG), but may reduce the performance of an ECG application (downstream task) due to information loss. Previous works on ECG monitoring focus either on optimizing the signal reconstruction or the task’s performance. Instead, we advocate a self-adapting lossy compression solution that allows configuring a desired performance level on the downstream tasks while maintaining an optimized CG that reduces Cloud costs.
We propose Dynamic-Deep, a task-aware compression scheme geared for IoT-Cloud architectures. Our compressor is trained to optimize the CG while maintaining the performance requirement of downstream tasks chosen from a wide range. In deployment, the IoT edge device adapts the compression and sends an optimized representation for each data segment, accounting for the downstream task’s desired performance without relying on feedback from the Cloud. We conduct an extensive evaluation of our approach on common ECG datasets using two popular ECG applications, including heart rate (HR) arrhythmia classification. We demonstrate that Dynamic-Deep can be configured to improve HR classification F1-score under a wide range of requirements. One such configuration improves the F1-score by 3 and increases CG by up to 83% compared to the previous state-of-the-art (autoencoder-based) compressor. Analyzing Dynamic-Deep on the Google Cloud Platform, we observe a 97% reduction in cloud costs compared to a no-compression solution. To the best of our knowledge, Dynamic-Deep is the first end-to-end system architecture proposal to focus on balancing the need for high performance of cloud-based downstream tasks and the desire to achieve optimized compression in IoT ECG monitoring settings.

Conferences & Workshops
Anat Bremler-Barr, Matan Sabag
IFIP Networking,
2022

Distributed denial of service (DDoS) attacks, especially distributed reflection denial of service attacks (DRDoS), have increased dramatically in frequency and volume in recent years. Such attacks are possible due to the attacker’s ability to spoof the source address of IP packets. Since the early days of the internet, authenticating the IP source address has remained unresolved in the real world. Although there are many methods available to eliminate source spoofing, they are not widely used, primarily due to a lack of economic incentives.
We propose a collaborative on-demand route-based defense technique (CORB) that offers efficient DDoS mitigation as a paid-for service, suppressing reflector attacks before they reach the reflectors and flood the victim. The technique uses scrubbing facilities located across the internet at internet service providers (ISPs) and internet exchange points (IXPs).
By transmitting a small amount of data based on border gateway protocol (BGP) information from the victim to the scrubbing facilities, we can filter out the attack without any false-positive cases. For example, the data can be sent using DOTS, a new signaling DDoS protocol that was standardized by the IETF. CORB filters the attack before it is amplified by the reflector, thereby reducing the overall cost of the attack. This provides a win-win financial situation for the victim and the scrubbing facilities that provide the service.
We demonstrate the value of CORB by simulating a Memcached DRDoS attack using real-life data. Our evaluation found that deploying CORB on scrubbing facilities at approximately 40 autonomous systems blocks 90% of the attack and can reduce the mitigation cost by 85%.

Conferences & Workshops
Anat Bremler-Barr, Bar Meyuhas, Ran Shister
IEEE/IFIP NOMS,
2022

Analyzing the network behavior of IoT devices, including which domains, protocols, and ports the device communicates with, is a fundamental challenge for IoT security and identification. Solutions that analyze and manage these areas must be able to learn what constitutes normal device behavior and then extract rules and features to permit only legitimate behavior or identify the device. The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.

We demonstrate that learning what is normal behavior for an IoT device is more challenging than expected. In many cases, the same IoT device, with the same firmware, can exhibit different behavior or connect to different domains with different protocols, depending on the device’s geographical location.

Then, we present a technique to generalize MUD files. By processing MUD files that originate in different locations, we can generalize and create a comprehensive MUD file that is applicable for all locations.
To conduct the research, we created MUDIS, a MUD Inspection System tool, that compares and generalizes MUD files. Our open-source MUDIS tool and dataset are available online to researchers and IoT manufacturers, allowing anyone to visualize, compare, and generalize MUD files.

Conferences & Workshops
Anat Bremler-Barr, Ronen Ben David
CLOSER 2021,
2021

In recent years, we have witnessed a new kind of DDoS attack, the burst attack (Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presents a new kind of burst attack, the YoYo attack (Bremler-Barr et al., 2017), which operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic load cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates the flat DDoS attack into an Economic Denial of Sustainability (EDoS) attack, where the victim suffers economic damage by paying for the extra resources required to process the traffic generated by the attacker. It was also shown that the YoYo attack causes significant performance degradation, since scaling up VMs takes time. In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks, as containerized cloud applications using Kubernetes have gained popularity and replaced VM-based architectures in recent years. We present experimental results on Google Cloud Platform showing that, even though the scale-up time of containers is much lower than that of VMs, Kubernetes is still vulnerable to the YoYo attack since VMs are still involved. Finally, we evaluate ML models that can accurately detect a YoYo attack on a Kubernetes cluster.

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Neta Peleg
IMC,
2021

We investigate the negative caching (caching of NXDomain responses) behavior of nine large open DNS resolvers. We measure the amount of time an NXDomain response is kept in the cache under various TTL configurations and compare it to the time an existent domain is kept in the cache.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, David Hay, Avraham Shalev
IEEE iThings,
2021

Manufacturer Usage Description (MUD) is a new, whitelist-based cybersecurity framework that was recently proposed by the IETF to cope with the huge attack surface and a constantly increasing number of IoT devices connected to the Internet.
MUD allows the IoT manufacturers themselves to publish the legitimate communication patterns of their devices, making it easier for security devices to enforce this policy, filter out non-complying traffic, and block a device in case it has been compromised.
Typically, MUD includes a set of legitimate endpoints, specified either by domain names or by IP addresses, along with the legitimate port numbers and protocols. While these descriptions are adequate when IoT devices connect (as clients) to servers (e.g., services in the cloud), they cannot adequately describe the cases where IoT devices act as servers to which endpoints connect [1]. These endpoints (e.g., users’ mobile devices) typically do not have fixed IP addresses, nor are they associated with a domain name. In this case, which accounts for 78% of the IoT devices we surveyed, MUD degrades to allowing all possible endpoints and cannot mitigate any attack. In this work, we evaluate this phenomenon and show that it is highly prevalent today, dramatically harming the security efficacy of the MUD framework. We then present a solution, MUDirect, which enhances the MUD framework to deal with these cases while preserving the current MUD specification. Finally, we implemented our solution (extending the existing osMUD implementation [2]) and showed that it enables protection of P2P IoT devices with minimal changes to the osMUD code.

Poster and brief announcement
Anat Bremler-Barr, Bar Meyuhas, Ran Shister
IMC,
2021

Analyzing the network behavior of IoT devices, including which domains, protocols, and ports the device communicates with, is a fundamental challenge for IoT security and identification. Solutions that analyze and manage these areas must be able to learn what constitutes normal device behavior and then extract rules and features to permit only legitimate behavior or to identify the device. The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.
We demonstrate that learning what is normal behavior for an IoT device is more challenging than expected. In many cases, the same IoT device, with the same firmware, can exhibit different behavior or connect to different domains with different protocols. This behavior can even change, depending on the device’s geographical location. Thus, MUD functioning and IoT identification methods may not be effective in different locations. The reasons for this vary from country requirements to weak encryption, privacy regulations, CDN-like solutions, and more.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Gafnit Abraham, David Hay, Ran Goldschmidt, Lior Shafir, Avraham Shalev
IEEE/IFIP NOMS,
2020

A new scalable ISP-level system architecture to secure and protect all IoT devices in a large number of homes is presented. The system is based on whitelisting, as in the Manufacturer Usage Description (MUD) framework, implemented as a VNF. Unlike common MUD suggestions that place the whitelist application at the home/enterprise network, our approach is to place the enforcement upstream at the provider network, combining NFV (Network Function Virtualization) with router/switch filtering capabilities, e.g., ACLs. The VNF monitors many home networks simultaneously and is therefore a highly scalable managed-service solution that provides both the end customers and the ISP with excellent visibility and security of the IoT devices at the customer premises.
The system includes a mechanism to distinguish between flows of different devices at the ISP level, despite the fact that most home networks (and their IoT devices) are behind a NAT and all the flows from the same home come out with the same source IP address. Moreover, the NFV system needs to receive only the first packet of each connection at the VNF, and the rule space is proportional to the number of unique types of IoT devices rather than the number of IoT devices. The monitoring part of the solution is off the critical path and can also uniquely protect against incoming DDoS attacks.
To cope with internal traffic, which is not visible outside the customer premises and often consists of P2P communication, we suggest a hybrid approach in which we deploy a lightweight component at the CPE whose sole purpose is to monitor P2P communication. As the current MUD framework does not provide a secure solution for P2P communication, we also extend the MUD protocol to cover peer-to-peer communicating devices. A PoC with a large national-level ISP proves that our technology works as expected.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Lior Shafir
USENIX Security,
2020

This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x in the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the ‘NS’ section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
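The MaxFetch(k) idea can be sketched as follows (our reading of the mitigation; the names and data structures are illustrative): when a referral lists many NS names without glue records, resolve at most k of them proactively instead of all.

```python
# Sketch of MaxFetch(k): cap the number of glue-less NS names the resolver
# resolves proactively per referral; the rest are resolved only on demand.
def ns_to_fetch(referral_ns_names, glue, k):
    missing = [name for name in referral_ns_names if name not in glue]
    return missing[:k]  # fetch at most k now

# A malicious referral with 1000 non-resolvable NS names and no glue.
referral = [f"fake{i}.attacker.example" for i in range(1000)]
print(len(ns_to_fetch(referral, glue={}, k=1)))  # 1 instead of 1000
```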

Conferences & Workshops
Anat Bremler-Barr, Haim Levy, Zohar Yakhini
IEEE/IFIP NOMS,
2020

In recent years the number of IoT devices in home networks has increased dramatically. Whenever a new device connects to the network, it must be quickly managed and secured using the relevant security mechanism or QoS policy. Thus, a key challenge is to distinguish between IoT and non-IoT (NoT) devices in a matter of minutes. Unfortunately, there is no clear indication of whether a device in a network is an IoT device. In this paper, we propose different classifiers that identify a device as IoT or non-IoT, in a short time scale, and with high accuracy.
Our classifiers were constructed using machine learning techniques on a seen (training) dataset and were tested on an unseen (test) dataset. They successfully classified devices that were not in the seen dataset with accuracy above 95%. The first classifier is a logistic regression classifier based on traffic features. The second classifier is based on features we retrieve from DHCP packets. Finally, we present a unified classifier that leverages the advantages of the other two classifiers.
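As a flavor of the first classifier, here is a hand-rolled logistic regression on invented traffic features (the paper's actual feature set, datasets, and training pipeline are not reproduced; everything below is a toy stand-in):

```python
import math

# Toy stand-in for a traffic-feature classifier: hand-rolled logistic
# regression trained by stochastic gradient descent on invented data.
def sigmoid(z):
    # numerically safe logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(samples, labels, lr=0.05, epochs=2000):
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0)

# Invented features: (distinct endpoints contacted, mean packet size in KB).
X = [(2, 0.1), (3, 0.2), (2, 0.15), (40, 1.2), (55, 1.0), (60, 1.5)]
y = [1, 1, 1, 0, 0, 0]  # 1 = IoT (few, small, regular flows), 0 = NoT
w, b = train_logreg(X, y)
print([predict(w, b, x) for x in X])  # recovers the labels on this toy data
```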

Technical reports
Yehuda Afek, Anat Bremler-Barr, Alon Noy
arXiv,
2020

In this paper we present three attacks on private internal networks behind a NAT and a corresponding new protection mechanism, Internal Network Policy, to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is tricked into visiting the attacker’s website, which contains a malicious script that lets the attacker access the victim’s internal network in different ways, including opening a port in the NAT or sending a sophisticated request to local devices. The first attack utilizes DNS Rebinding in a particular way, while the other two demonstrate different methods of attacking the network based on application security vulnerabilities. Following the attacks, we provide a new browser security policy, Internal Network Policy (INP), which protects against these types of vulnerabilities and attacks. This policy is implemented in the browser, just like the Same Origin Policy (SOP), and prevents malicious access to internal resources by external entities.

Journal
Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish
ACM/IEEE Transactions on Networking,
2019

We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P the messages captured in the network at peacetime (i.e., mostly legitimate traffic) and A the messages captured during attack time (i.e., contains many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not in P, thus allowing the identification of the attack packets. This is an important tool in protecting sites on the Internet from worm attacks and distributed denial of service attacks and may also be useful for other problems, including command and control identification and DNA-sequence analysis. The main contributions of this paper are the system we developed to extract the required signatures together with the string heavy-hitters problem definition and the algorithm for solving this problem. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. The algorithm runs in linear time, requiring one pass over the input. Our system makes use of this algorithm to extract the desired signatures. Furthermore, we provide an extended algorithm which is able to identify groups of signatures, often found together in the same packets, which further improves the quality of signatures generated by our system. Using our system, a yet unknown attack can be detected and stopped within minutes from attack start time.
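A drastically simplified stand-in for the signature-extraction idea (fixed-length k-grams instead of the paper's variable-length heavy-hitters algorithm; the messages below are invented):

```python
from collections import Counter

# Simplified sketch: count fixed-length substrings (k-grams) and keep those
# frequent in the attack capture A but absent from the peacetime capture P.
def extract_signatures(peace, attack, k=4, min_attack=3):
    def kgrams(msgs):
        counts = Counter()
        for m in msgs:
            for i in range(len(m) - k + 1):
                counts[m[i:i + k]] += 1
        return counts
    p, a = kgrams(peace), kgrams(attack)
    return {s for s, n in a.items() if n >= min_attack and p[s] == 0}

peace = ["GET /index.html", "GET /about.html"]  # mostly legitimate traffic
attack = ["GET /index.html?EVIL", "POST /x?EVIL=1", "HEAD /?EVILpayload"]
print("EVIL" in extract_signatures(peace, attack))  # True
```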

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Lior Shafir, Neta Peleg, Matan Sabag
SIGCOMM,
2019

In this work we measure what percentage of DNS recursive resolvers perform negative caching in the wild. We deploy our own authoritative name server and harness thousands of RIPE Atlas [3] sites spread over the globe to perform repeated DNS queries for non-existing sub-domains of our authoritative domain.
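The measurement logic can be mimicked with a toy model (purely illustrative; the actual study issues live queries from RIPE Atlas probes): repeated queries for the same non-existent name reach the authoritative server only once if the resolver caches NXDOMAIN answers.

```python
class Resolver:
    """Toy recursive resolver with optional negative caching of NXDOMAIN."""
    def __init__(self, negative_caching):
        self.negative_caching = negative_caching
        self.neg_cache = set()
        self.upstream_queries = 0   # queries that reach the authoritative server

    def query(self, name):
        if name in self.neg_cache:
            return "NXDOMAIN (cached)"
        self.upstream_queries += 1
        if self.negative_caching:
            self.neg_cache.add(name)
        return "NXDOMAIN"

def probes(resolver, name, repeats=5):
    """Repeat the same non-existent name; a caching resolver asks upstream once."""
    for _ in range(repeats):
        resolver.query(name)
    return resolver.upstream_queries

caching = probes(Resolver(negative_caching=True), "rnd123.example.com")
non_caching = probes(Resolver(negative_caching=False), "rnd123.example.com")
```

Observing how many repeated probes arrive at the authoritative server reveals whether the resolver in the middle performs negative caching.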

Journal
Yehuda Afek, Anat Bremler-Barr, Liron Schiff, Shir Landau Feibish
Computer Networks,
2018

Efficient algorithms and techniques to detect and identify large flows in a high-throughput traffic stream in the SDN match-and-action model are presented. This is in contrast to previous work that either deviated from the match-and-action model by requiring additional switch-level capabilities or did not exploit the SDN data plane. Our construction has two parts: (a) new methods to efficiently sample in an SDN match-and-action model, and (b) new algorithms to detect large flows efficiently and in a scalable way, in the SDN model.

Our large flow detection methods provide high accuracy and present a good and practical tradeoff between switch–controller traffic and the number of entries required in the switch flow table. Based on different parameters, we differentiate between heavy flows, elephant flows, and bulky flows, and present efficient algorithms to detect flows of the different types.

Additionally, as part of our heavy flow detection scheme, we present sampling methods to sample packets with arbitrary probability p per packet or per byte that traverses an SDN switch.

Finally, we show how our algorithms can be adapted to a distributed monitoring SDN setting with multiple switches, and easily scale with the number of monitoring switches.
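A rough software analogue of the sampling-based detection (the paper realizes this with match-and-action rules in the switch; the parameters here are assumptions) is sample-and-hold: each packet is sampled with probability p, and once a flow is sampled it is counted exactly, mimicking an installed flow-table entry.

```python
import random

def sample_and_hold(packets, p=0.01, threshold=1000):
    """Once a flow's packet is sampled (prob. p), count its bytes exactly
    thereafter -- a software stand-in for installing a flow-table entry."""
    counters = {}
    for flow_id, nbytes in packets:
        if flow_id in counters:
            counters[flow_id] += nbytes
        elif random.random() < p:
            counters[flow_id] = nbytes
    return {f for f, b in counters.items() if b >= threshold}

random.seed(1)
# one elephant flow among many mice
packets = [("elephant", 100)] * 5000 + [(f"mouse{i}", 100) for i in range(500)]
random.shuffle(packets)
large = sample_and_hold(packets, p=0.01, threshold=1000)
```

The elephant flow is sampled early with overwhelming probability and then counted exactly, while single-packet mice can never cross the threshold.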

Journal
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
ACM/IEEE Transactions on Networking,
2018

We present range encoding with no expansion (RENÉ), a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique could be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Idan Moyal , Liron Schiff
IFIP Networking,
2017

Memcached is an in-memory key-value distributed caching solution, commonly used by web servers for fast content delivery. Keys with their values are distributed between Memcached servers using a consistent hashing technique, resulting in an even distribution (of keys) among the servers. However, as a small number of keys are significantly more popular than others (a.k.a. hot keys), an even distribution of keys may cause a significantly different request load on the Memcached servers, which, in turn, causes substantial performance degradation.
Previous solutions to this problem require complex application-level solutions and extra servers. In this paper, we propose MBalancer, a simple L7 load-balancing scheme for Memcached that can be seamlessly integrated into Memcached architectures running over Software-Defined Networks (SDN). In a nutshell, MBalancer runs as an SDN application and duplicates the hot keys to many (or all) Memcached servers. The SDN controller updates the forwarding tables of the SDN switches and uses ready-made SDN load-balancing capabilities. Thus, no change is required to Memcached clients or servers.
Our analysis shows that with minimal overhead for storing a few extra keys, the number of requests per server is close to balanced (assuming requests for keys follow a Zipf distribution).
Moreover, we have implemented MBalancer on a hardware-based OpenFlow switch. As MBalancer offloads requests from bottlenecked Memcached servers, our experiments show that it achieves a significant throughput boost and latency reduction.
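The effect of duplicating hot keys can be sketched in a few lines (an illustrative simulation, not MBalancer's SDN implementation; the hash function and Zipf-like workload are assumptions):

```python
import hashlib
import random

def server_of(key, n):
    """Hash-based key placement, standing in for consistent hashing."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

def simulate(requests, n_servers, hot_keys=frozenset()):
    load = [0] * n_servers
    for key in requests:
        if key in hot_keys:
            # duplicated key: any server can answer; pick the least loaded
            s = min(range(n_servers), key=load.__getitem__)
        else:
            s = server_of(key, n_servers)
        load[s] += 1
    return load

random.seed(0)
# Zipf-like popularity: key i is requested roughly 1/i as often as key 1
requests = [f"key{i}" for i in range(1, 200) for _ in range(2000 // i)]
random.shuffle(requests)

before = simulate(requests, n_servers=8)
top10 = frozenset(f"key{i}" for i in range(1, 11))   # the 10 hottest keys
after = simulate(requests, n_servers=8, hot_keys=top10)
```

Without duplication, the server owning the hottest key absorbs its entire request volume; duplicating only the few hottest keys lets their traffic fill in the least-loaded servers, pulling the maximum per-server load down toward the average.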

Conferences & Workshops
Anat Bremler-Barr, Mor Sides, Eli Brosh
INFOCOM,
2017

Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends periodic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo-Yo attack is harder to detect and requires fewer resources from the attacker compared to traditional DDoS. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
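The oscillation the Yo-Yo attack induces can be illustrated with a toy discrete-time auto-scaler (the thresholds, scaling delay, and burst shape below are hypothetical, not taken from the paper):

```python
def autoscale(loads, per_machine_capacity=100, scale_delay=3):
    """Toy auto-scaler: target machine count is ceil(load / capacity),
    but scaling decisions take effect only after `scale_delay` steps."""
    machines, history, pending = 1, [], []
    for load in loads:
        target = max(1, -(-load // per_machine_capacity))  # ceil division
        pending.append(target)
        if len(pending) > scale_delay:
            machines = pending.pop(0)
        history.append(machines)
    return history

base = 80                                 # steady legitimate load
burst = [base + 900] * 5 + [base] * 15    # short overload burst, then quiet
loads = burst * 10                        # the attacker repeats the burst

hist = autoscale(loads)
oscillations = sum(1 for a, b in zip(hist, hist[1:]) if a != b)
```

Each cheap, short burst drags the fleet up to its maximum size and back down again, so the victim repeatedly pays for scale-up while suffering degraded service during the lagged transitions.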

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Lior Shafir
INFOCOM,
2017

Traditional DDoS anti-spoofing scrubbers require dedicated middleboxes, thus adding CAPEX, latency, and complexity to the network. This paper starts by showing that the current SDN match-and-action model is rich enough to implement a collection of anti-spoofing methods. Second, we develop and utilize advanced methods for dynamic resource sharing to distribute the required mitigation resources over a network of switches.
None of the earlier attempts to implement anti-spoofing in SDN actually directly exploited the match-and-action power of the switch data plane. They require additional functionalities on top of the match-and-action model, and are not implementable on an SDN switch as is. Our method builds on the premise that an SDN data path is a very fast and efficient engine for performing low-level primitive operations at wire speed. The solution requires a number of flow-table rules and switch-controller messages proportional to the legitimate traffic. To scale when protecting multiple large servers, the flow tables of multiple switches are harnessed in a distributed and dynamic network-based solution.
We have fully implemented all our methods, in either OpenFlow 1.5 in Open vSwitch or in P4. The system mitigates spoofed attacks on either the SDN infrastructure itself or on downstream servers.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Edith Cohen, Michal Sagam, Shir Landau Feibish
HotWeb ,
2017

Random Subdomain DDoS attacks on the Domain Name System (DNS) infrastructure are becoming a popular vector in recent attacks (e.g., the recent Mirai attack on Dyn). In these attacks, many queries are sent for a single or a few victim domains, yet they include highly varying non-existent subdomains generated randomly.
Motivated by these attacks we designed and implemented novel and efficient algorithms for distinct heavy hitters (dHH). A (classic) heavy hitter (HH) in a stream of elements is a key (e.g., the domain of a query) which appears in many elements (e.g., requests).
When stream elements consist of <key, subkey> pairs (<domain, subdomain>), a distinct heavy hitter (dHH) is a key that is paired with a large number of different subkeys. Our algorithms dominate previous designs in both the asymptotic (theoretical) sense and practicality. Specifically, the new fixed-size algorithms are simple to code and have asymptotically optimal space-accuracy tradeoffs.
Based on these algorithms, we build and implement a system for the detection and mitigation of Random Subdomain DDoS attacks. We perform experimental evaluation, demonstrating the effectiveness of our algorithms.
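A minimal sketch of the distinct-heavy-hitters idea (not the paper's algorithms): hash-based subkey sampling keeps per-key state small while still exposing a domain that receives many distinct random subdomains. The sampling rate and threshold are illustrative.

```python
import hashlib

def h(x):
    """Hash a string to a value in [0, 1)."""
    return int(hashlib.sha1(x.encode()).hexdigest()[:12], 16) / 16**12

def distinct_heavy_hitters(stream, p=0.1, threshold=500):
    """Estimate per-key distinct-subkey counts by keeping only subkeys whose
    hash falls below p, then scaling the sample size by 1/p."""
    samples = {}
    for key, subkey in stream:
        if h(subkey) < p:
            samples.setdefault(key, set()).add(subkey)
    return {k for k, s in samples.items() if len(s) / p >= threshold}

# random-subdomain attack on victim.com: many distinct subkeys for one key
stream = [("victim.com", f"rnd{i}") for i in range(2000)]
stream += [("popular.com", "www")] * 5000   # heavy key, but few subdomains
dhh = distinct_heavy_hitters(stream)
```

Note the contrast with a classic heavy hitter: popular.com receives far more queries, but only the domain with many *distinct* subdomains is flagged.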

Patents
Anat Bremler-Barr
Patent,
2017

A method for network protection includes monitoring data packets transmitted to and from an IoT device (24, 26, 28, 30) over a network (38, 46). Based on the monitored data packets, a set of one or more endpoints (42, 44) that are authorized to communicate with the IoT device via the network is identified. Among the monitored data packets, an attempt to communicate with the IoT device by an endpoint (52) that is not a member of the identified set is detected. Responsively to detecting the attempt, a protective action is performed at a guard location in the network, between the endpoint and the IoT device, so as to mitigate unauthorized communications with the IoT device.

Poster and brief announcement
Anat Bremler-Barr, Mor Sides, Elisha Rosensweig
SIGCOMM,
2016
Projects, thesis, and dissertations
Lior Barak, Yotam Harchol, Anat Bremler-Barr
Project,
2016

Today, most network traffic needs to traverse several middleboxes before it can reach its destination. A common operation among many of these middleboxes is DPI – Deep Packet Inspection – which allows different actions to be performed based on patterns in the packets' content.
DPI consumes much of a middlebox's resources during its operation. In addition, each packet usually traverses several middleboxes, which causes the same packet to be scanned by different DPI engines over and over again. As a result, the network becomes less efficient, which directly affects its total bandwidth.
One solution to these issues is a system that provides DPI as a service. That is, the different middleboxes in the network that need DPI can register to the service and expose their desired patterns. The system directs packets to designated DPI engine instances across the network and passes the pattern matches, if any exist, to the relevant middlebox.
Such a system has many advantages, among them: a single scan of every packet, the ability to upgrade to the latest DPI algorithms, better partitioning of packets between DPI engines, and increased innovation in middlebox development. Developing such a system is simpler today than ever with the emergence of SDN, which allows dynamic routing of network traffic using a centralized controller.
The goal of this work is to implement a prototype of the DPI-as-a-service system and to provide as realistic an environment as possible in which to evaluate it. This paper documents the design and implementation of the system, along with the other tools needed to deploy a functioning network that uses it.
Finally, the paper describes the experiments done to validate the system's correctness and effectiveness and discusses their results.

Projects, thesis, and dissertations
By Asher Gruber (Supervisor: Prof. Anat Bremler-Barr)
Project,
2016

Deep Packet Inspection (DPI) is a widespread functionality among middlebox applications. In many cases, DPI is performed by Intrusion Detection Systems (IDS) such as Snort. Traditionally, each packet is re-scanned by multiple middleboxes until reaching its final destination. Recent studies show that DPI is one of the most processing intensive tasks of modern middleboxes.
The DPI-as-a-Service paper presents a framework that extracts the time-consuming task of DPI out of the middleboxes and provides it as a service. Alongside the framework design, the authors introduce a reference implementation in a simulated environment, demonstrating promising results through a set of experiments.
In this work we have enhanced the reference implementation in order to demonstrate that the framework can operate in a more realistic environment. First and foremost, we have integrated the framework with the commonly used Snort IDS. Second, we have extended the DPI service to support the Network Service Header (NSH) protocol, which allows passing the pattern-match results along with the inspected packet. These significant enhancements transformed the reference implementation into a more robust system, which can take proactive measures in the event of malicious pattern detection.
Finally, once the work on the framework was completed, we were able to perform the basic experiments reported in the original paper. Our findings indicate that the original framework's results are reproducible in our version of the framework.

Conferences & Workshops
Anat Bremler-Barr, Alon Atari
INFOCOM,
2016

Monitoring Round-Trip Time provides important insights for network troubleshooting and traffic engineering. The common monitoring technique is to actively send probe packets from selected vantage points (hosts or middleboxes). In traditional networks, the control over the network routing is limited, making it impossible to monitor every selected path. The emerging concept of Software Defined Networking simplifies network control. However, OpenFlow, the common SDN protocol, does not support RTT monitoring as part of its specification. In this paper, we leverage the ability of OpenFlow to control the routing, and present GRAMI, the Granular RTT Monitoring Infrastructure. GRAMI uses active probing from selected vantage points for efficient RTT monitoring of all the links and any round-trip path between any two switches in the network.
GRAMI was designed to be resource efficient. It requires only four flow entries installed on every switch in order to enable RTT monitoring of all the links. For every round-trip path selected by the user, it requires a maximum of two additional flow entries installed on every switch along the measured path. Moreover, GRAMI uses a minimal number of probe packets and does not require the involvement of the controller during online RTT monitoring.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
SPAA,
2016

We present RENÉ — a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique could be used to efficiently solve other hard problems such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol
SIGCOMM,
2016

We present OpenBox — a software-defined framework for network-wide development, deployment, and management of network functions (NFs). OpenBox effectively decouples the control plane of NFs from their data plane, similarly to SDN solutions that only address the network’s forwarding plane.
OpenBox consists of three logic components. First, user-defined OpenBox applications provide NF specifications through the OpenBox north-bound API. Second, a logically-centralized OpenBox controller is able to merge logic of multiple NFs, possibly from multiple tenants, and to use a network-wide view to efficiently deploy and scale NFs across the network data plane. Finally, OpenBox instances constitute OpenBox’s data plane and are implemented either purely in software or contain specific hardware accelerators (e.g., a TCAM). In practice, different NFs carry out similar processing steps on the same packet, and our experiments indeed show a significant improvement of the network performance when using OpenBox. Moreover, OpenBox readily supports smart NF placement, NF scaling, and multi-tenancy through its controller.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Daniel Krauthgamer, Shimrit Tzur David
IFIP Networking,
2016

URL matching lies at the core of many networking applications and Information-Centric Networking architectures. For example, URL matching is extensively used by Layer 7 switches, ICN/NDN routers, load balancers, and security devices. Modern URL matching is done by maintaining a rich database that consists of tens of millions of URLs, which are classified into dozens of categories (or egress ports). In real time, any input URL has to be searched in this database to find the corresponding category.
In this paper, we introduce a generic framework for accurate URL matching (namely, no false positives or miscategorization) that aims to reduce the overall memory footprint while still having low matching latency. We introduce a dictionary-based compression method that compresses the database by 60%, while having only a slight overhead in time. Our framework is very flexible: it allows hot updates, cloud-based deployments, and can deal with strings that are not URLs.

Journal
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol
ACM/IEEE Transactions on Networking,
2016

This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial-of-service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend against these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low-bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst-case performance (under an attack). Still, under normal traffic its running time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 (Multi-Core Architecture to Mitigate Complexity Attacks), which dynamically combines the classical AC algorithm with our compressed implementation to provide a robust solution that mitigates this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Shimrit Tzur David
INFOCOM,
2015

Deep Packet Inspection (DPI) plays a major role in contemporary networks. Specifically, in datacenters of content providers, the scanned data may be highly repetitive. Most DPI engines are based on identifying signatures in the packet payload. This pattern matching process is expensive both in memory and CPU resources, and thus, often becomes the bottleneck of the entire application.
In this paper we show how DPI can be accelerated by leveraging repetitions in the inspected traffic. Our new mechanism makes use of these repetitions to allow the repeated data to be skipped rather than scanned again. The mechanism consists of a slow path, in which frequently repeated strings are identified and stored in a dictionary, along with some succinct information for accelerating the DPI process, and a data path, where the traffic is scanned byte by byte but strings from the dictionary, if encountered, are skipped. Upon skipping, the data path recovers to the state it would have been in had the scanning continued byte by byte.
Our solution achieves a significant performance boost, especially when data is from the same content source (e.g., the same website). Our experiments show that for such cases, our solution achieves a throughput gain of 1.25–2.5 times the original throughput, when implemented in software.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
ACM/IEEE ANCS,
2015

Configuring range based packet classification rules in network switches is crucial to all network core functionalities, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks the interface to configure range rules directly and only provides mask based rules, named flow entries.
In this work we present ORange, the first solution to multi-dimensional range classification in OpenFlow. Our solution is based on paradigms used in state-of-the-art non-OpenFlow classifiers and is designed in a modular fashion, allowing future extensions and improvements. We consider switch space utilization as well as atomic-update functionality, and in the network context we provide flow consistency even if flows change their entrance point to the network during policy updates, a property we name cross-entrance consistency. Our scheme achieves remarkable results and is easy to deploy.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
ACM DaMoN,
2015

Similarity search, and specifically the nearest-neighbor search (NN) problem, is widely used in many fields of computer science such as machine learning, computer vision, and databases. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with the dimension d. This causes severe performance degradation when working in high-dimensional spaces. Approximate techniques such as locality-sensitive hashing improve the performance of the search, but are still computationally intensive.
In this paper we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, which is a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This concept does not suffer from the curse of dimensionality and is shown to improve the best known approaches for NN by more than four orders of magnitude. Simulation results demonstrate dramatic improvement over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.

Poster and brief announcement
Anat Bremler-Barr, Mor Sides, Elisha Rosensweig
SIGCOMM,
2015
Conferences & Workshops
Anat Bremler-Barr, David Hay, Yaron Koral, Michela Becchi, Omer Kochba
INFOCOM,
2015

This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic is constantly increasing. Second, due to their superior expressiveness, current Deep Packet Inspection engines use regular expressions more and more frequently.
We present an algorithmic framework to accelerate such matching, taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed through the GZIP protocol, which uses back-references to repeated strings. Our algorithm is based on calculating (for every byte) the minimum number of (previous) bytes that can be part of a future regular expression matching. When inspecting a back-reference, only these bytes should be taken into account, thus enabling one to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and gains performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which usually are based either on NFA, DFA or combinations of the two. Finally, we discuss other applications in which calculating the number of relevant bytes becomes handy, even when the traffic is not compressed.

Poster and brief announcement
Netanel Cohen, Anat Bremler-Barr
ACM SoCC Poster Session,
2015
Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol
ACM SIGCOMM HotMiddleboxes,
2015
Journal
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
Computer Networks,
2014

A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, or exponential (in the key size) space, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence, constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k − 1 primitive priority queues of size n^(1/k) (k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three priority queues of size √n or five of size n^(1/3).

Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable, perfect line-rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.

As a byproduct of our technique, we present an O(n)-time sorting algorithm in a system equipped with a TCAM of O(w√n) entries, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
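To convey the flavor of building a large PQ from much smaller primitive queues (this is only an illustrative two-level structure; it does not achieve the paper's O(1) steps per operation), one can keep roughly √n-sized buckets in "RAM" and a small primitive heap holding each bucket's current minimum:

```python
import heapq
import random

class TwoLevelPQ:
    """Illustrative two-level priority queue: buckets of keys in 'RAM', plus a
    small 'primitive' heap holding each non-empty bucket's minimum. Conveys
    the flavor of the construction only; the paper's PPQ is more involved."""
    def __init__(self, bucket_width):
        self.width = bucket_width
        self.buckets = {}   # bucket index -> list of keys (the RAM part)
        self.top = []       # small heap of (bucket_min, bucket_index)

    def insert(self, key):
        b = key // self.width
        items = self.buckets.setdefault(b, [])
        if not items or key < min(items):
            # new bucket minimum: publish it to the small heap
            heapq.heappush(self.top, (key, b))
        items.append(key)

    def delete_min(self):
        while True:
            key, b = heapq.heappop(self.top)
            items = self.buckets.get(b, [])
            if key in items:            # skip stale representatives
                items.remove(key)
                if items:
                    heapq.heappush(self.top, (min(items), b))
                return key

random.seed(0)
keys = list(range(100))
random.shuffle(keys)
pq = TwoLevelPQ(bucket_width=10)        # ~sqrt(100) buckets of width 10
for k in keys:
    pq.insert(k)
drained = [pq.delete_min() for _ in range(100)]
```

The global minimum is always the minimum over bucket minima, so the small heap only ever needs to track one representative per bucket.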

Journal
Anat Bremler-Barr, Hanoch Levy, Udi Ben-Porat
Computer Networks,
2014

Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance.

The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yaron Koral
ACM CoNEXT,
2014

Middleboxes play a major role in contemporary networks, as forwarding packets is often not enough to meet operator demands, and other functionalities (such as security, QoS/QoE provisioning, and load balancing) are required. Traffic is usually routed through a sequence of such middleboxes, which either reside across the network or in a single, consolidated location. Although middleboxes provide a vast range of different capabilities, there are components that are shared among many of them.

A task common to almost all middleboxes that deal with L7 protocols is Deep Packet Inspection (DPI). Today, traffic is inspected from scratch by all the middleboxes on its route. In this paper, we propose to treat DPI as a service to the middleboxes, implying that traffic should be scanned only once, but against the data of all middleboxes that use the service. The DPI service then passes the scan results to the appropriate middleboxes. Having DPI as a service has significant advantages in performance, scalability, robustness, and as a catalyst for innovation in the middlebox domain. Moreover, technologies and solutions for current Software Defined Networks (SDN) (e.g., SIMPLE [42]) make it feasible to implement such a service and route traffic to and from its instances.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
SPAA,
2013

A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date either require O(log n) instructions per operation, or exponential (in the key size) space, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence, constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k − 1 primitive priority queues of size n^(1/k) (k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three priority queues of size √n or five of size n^(1/3).
Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable, perfect line-rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.
As a byproduct of our technique, we present an O(n)-time sorting algorithm in a system equipped with a TCAM of O(w√n) entries, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish
ACM/IEEE ANCS,
2013

We present a basic tool for zero-day attack signature extraction. Given two large sets of messages, P, captured in the network at peacetime (i.e., mostly legitimate traffic), and A, captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found in A but not in P. A packet containing one of the strings from S is therefore likely to be an attack packet.
This is an important tool for protecting sites on the Internet from worm attacks and Distributed Denial of Service (DDoS) attacks. It may also be useful for other problems, including command-and-control identification and DNA sequence analysis, which are beyond the scope of this work.
This paper makes two contributions: the problem definition, together with the system we developed to extract the required signatures, and the string heavy-hitters algorithm. This algorithm finds popular strings of variable length in a set of messages by using, in a tricky way, the classic heavy-hitters algorithm as a building block. The algorithm is then used by our system to extract the desired signatures. Using our system, a yet-unknown attack can be detected and stopped within minutes of the attack's start.
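The core idea, frequent strings in attack traffic that are absent from peacetime traffic, can be sketched as follows. This is a simplified stand-in, not the paper's algorithm: fixed-length q-grams replace the paper's variable-length strings, and the classic Misra-Gries heavy-hitters algorithm serves as the building block the abstract mentions; all function names and parameters are invented for the example.

```python
def misra_gries(stream, k):
    """Classic heavy hitters: keep at most k-1 counters. Every item whose
    frequency exceeds len(stream)/k is guaranteed to survive as a candidate."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:  # counters full: decrement all, dropping any that hit zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)

def attack_signatures(peace_msgs, attack_msgs, q=4, k=10):
    """Candidate signatures: q-grams frequent in attack-time traffic
    (via Misra-Gries) that never appear in peacetime traffic."""
    grams = lambda msgs: [m[i:i + q] for m in msgs for i in range(len(m) - q + 1)]
    peace = set(grams(peace_msgs))
    return {g for g in misra_gries(grams(attack_msgs), k) if g not in peace}
```

A string seen in 20 attack messages but never at peacetime survives both the frequency filter and the peacetime subtraction, and so becomes a candidate signature.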

Journal
Anat Bremler-Barr, David Hay, Yaron Koral
ACM/IEEE Transactions on Networking,
2013

A central component of all contemporary intrusion detection systems (IDSs) is their pattern matching algorithm, which is often based on constructing and traversing a deterministic finite automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring very large DFAs to be stored, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as on its power consumption and cost. We propose a novel method to compress DFAs by observing that the state names used by common DFA encodings are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom and encode states in such a way that all transitions to a specific state are represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. The problem of pattern matching is then reduced to the well-studied problem of Longest Prefix Match (LPM), which can be solved either in ternary content-addressable memory (TCAM), in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gb/s with low power consumption.
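The reduction to Longest Prefix Match can be illustrated on a toy automaton. Everything below (the state codes, the tiny transition tables, the function names) is a hypothetical example invented to show the mechanism, not the paper's encoding: states are coded so that all states moving to the same next state on a given symbol share a prefix, and a per-symbol prefix table is then resolved by longest-prefix match.

```python
def lpm(table, key):
    """Longest-prefix match over a dict mapping bit-string prefixes to values."""
    best = None
    for prefix, val in table.items():
        if key.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, val)
    return best[1] if best else None

# Hypothetical 3-state automaton. States are coded so that both states
# that move to s2 on input 'b' (namely s1 and s2) share the prefix '1'.
codes = {'s0': '00', 's1': '10', 's2': '11'}

# Per input symbol, a prefix table over the *current* state's code:
# one entry can stand for a whole set of source states.
delta = {
    'a': {'': codes['s1']},            # every state goes to s1 on 'a'
    'b': {'1': codes['s2'],            # s1 and s2 (prefix '1') go to s2
          '0': codes['s0']},           # s0 (prefix '0') stays put
}

def step(state_code, symbol):
    return lpm(delta[symbol], state_code)
```

In hardware, each `delta` entry becomes one TCAM line, which is exactly why collapsing many per-state transitions into one prefix shrinks the stored automaton.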

Journal
Yehuda Afek, Anat Bremler-Barr, Yaron Koral
Computer Communication,
2012

In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor in this process, imposed by the compression, is the high memory requirement of 32KB per connection. This leads to a requirement of hundreds of megabytes to gigabytes of main memory in a multi-connection setting. We introduce new algorithms and techniques that drastically reduce this space requirement for bump-in-the-wire devices such as security tools and other content-based networking tools. Our proposed scheme improves space and time performance by almost 80% and over 40%, respectively, thus making real-time compressed traffic inspection a viable option for networking devices.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yaron Koral, Shimrit Tzur David
INFOCOM,
2012

Deep Packet Inspection (DPI) is the most time and resource consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention System (NIDS/IPS), Web Application Firewall (WAF), or Content Filtering Proxy. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
The portion of overall Internet traffic that is compressed is constantly increasing. This paper focuses on traffic compressed using a shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of inter-response redundancy (e.g., almost the same data being sent over and over again), as is common in today's dynamic data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well alongside other compression algorithms (such as Gzip), making it even more appealing. Performing DPI on any compressed traffic is considered hard; therefore, today's security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works near the rate of the compressed traffic, implying a speed gain of SDCH’s compression ratio (which is around 40%). We also discuss how to deal with SDCH compression over Gzip compression, and show how to perform regular expression matching with about the same speed gain.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol, Yaron Koral
ACM/IEEE ANCS,
2012

This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts "heavy" messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system, at best, to a crawl. In our architecture, called MCA^2 (Multi-Core Architecture for Mitigating Complexity Attacks), cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handling all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack were taking place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
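The diversion idea can be sketched with ordinary worker threads: suspected-heavy messages go to a small dedicated pool so that light traffic keeps its quality of service. This is only an illustrative threading sketch (the paper targets real multi-core DPI engines, not Python threads), and all names and parameters here are invented for the example.

```python
import queue
import threading

def mca2_dispatch(messages, is_heavy, n_cores=4, heavy_cores=1):
    """Toy MCA^2-style dispatch: route suspected-heavy messages to a small
    dedicated worker pool, keeping the remaining workers free for light
    traffic. Illustrative sketch only."""
    light_q, heavy_q = queue.Queue(), queue.Queue()
    processed = {'light': [], 'heavy': []}
    lock = threading.Lock()

    def worker(q, label):
        while True:
            msg = q.get()
            if msg is None:          # sentinel: shut this worker down
                return
            with lock:
                processed[label].append(msg)

    pools = ([threading.Thread(target=worker, args=(light_q, 'light'))
              for _ in range(n_cores - heavy_cores)] +
             [threading.Thread(target=worker, args=(heavy_q, 'heavy'))
              for _ in range(heavy_cores)])
    for t in pools:
        t.start()
    for m in messages:               # classify-and-divert step
        (heavy_q if is_heavy(m) else light_q).put(m)
    for _ in range(n_cores - heavy_cores):
        light_q.put(None)
    for _ in range(heavy_cores):
        heavy_q.put(None)
    for t in pools:
        t.join()
    return processed
```

The key design point mirrored here is that the heavy pool is a fixed, small fraction of the cores, so a flood of heavy messages saturates only that fraction.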

Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Udi Ben-Porat
IFIP Networking,
2012
Journal
Anat Bremler-Barr, Danny Handler
IEEE Transactions on Computers,
2012

Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput unparalleled by software-based solutions. TCAMs are not well-suited, however, for representing rules that contain range fields. Such rules typically have to be represented (or encoded) by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization. A TCAM range-encoding algorithm A is database-independent if, for all ranges r, it encodes r independently of the database in which it appears; otherwise, we say that A is database-dependent. Typically, when storing a classification database in TCAM, a few dozens of so-called extra bits in each TCAM entry remain unused. These extra bits are used by some (both database-dependent and database-independent) prior algorithms to reduce range expansion. The majority of real-life database ranges are short. We present a novel database-independent algorithm called Short Range Gray Encoding (SRGE) for the efficient representation of short range rules. SRGE encodes range endpoints as binary-reflected Gray codes and then represents the resulting range by a minimal set of ternary strings. To the best of our knowledge, SRGE is the first algorithm that achieves a reduction in range expansion in general, and a significant expansion reduction for short ranges in particular, without resorting to the use of extra bits. The “traditional” database-independent technique for representing range entries in TCAM is prefix expansion. As we show, SRGE significantly reduces the expansion of short ranges in comparison with prefix expansion. We also prove that the SRGE algorithm’s range expansion is at least as good as that of prefix expansion for any range. Real-world classification databases contain a small number of unique long ranges, some of which appear in numerous rules. 
These long ranges cause high expansion, which is not significantly reduced by any database-independent range encoding scheme that we are aware of, including SRGE. We introduce hybrid SRGE, a database-dependent encoding scheme that uses SRGE for reducing the expansion of short ranges and uses extra bits for reducing the expansion caused by long ones. Our comparative analysis establishes that hybrid SRGE utilizes TCAM more efficiently than previously published range-encoding algorithms. This work also makes a more theoretical contribution. Prefix expansion for ranges defined by W-bit endpoints has a worst-case expansion ratio of 2W-2. It follows from the work of Schieber et al. [1] that the SRGE algorithm has a slightly better worst-case expansion ratio of 2W-4. We prove that any database-independent TCAM encoding scheme has a worst-case expansion ratio of at least W.
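The contrast between prefix expansion and Gray-coded short ranges can be sketched numerically. Below, `gray` is the binary-reflected Gray code SRGE builds on, and `prefix_cover` computes the minimal prefix expansion of a range; SRGE's own minimal-ternary-cover step is omitted, so this is a sketch of the underlying observation (consecutive integers differ in exactly one Gray-code bit), not of the full algorithm.

```python
def gray(x):
    """Binary-reflected Gray code."""
    return x ^ (x >> 1)

def prefix_cover(lo, hi):
    """Size of the minimal prefix expansion of [lo, hi]: greedily peel off
    the largest aligned power-of-two block starting at lo."""
    n = 0
    while lo <= hi:
        size = lo & -lo if lo else 1 << hi.bit_length()
        while lo + size - 1 > hi:
            size >>= 1
        lo += size
        n += 1
    return n

# Consecutive integers differ in exactly one bit under Gray coding, so a
# short range such as [1, 2] collapses to a single ternary TCAM entry
# ('*1' over the 2-bit Gray codes 01 and 11), while classic prefix
# expansion needs two entries ('01' and '10').
```

The same function also exhibits the 2W-2 worst case quoted above: for W-bit endpoints, the range [1, 2^W - 2] needs 2W-2 prefixes.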

Journal
Anat Bremler-Barr, David Hay, Danny Handler
Computer Networks,
2012

Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput.

TCAMs are not well-suited, however, for representing rules that contain range fields, and previously published algorithms typically represent each such rule by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization because it introduces a large number of redundant TCAM entries. This redundancy can be mitigated by making use of extra bits available in each TCAM entry.

We present a scheme for constructing efficient representations of range rules, based on the simple observation that sets of disjoint ranges may be encoded much more efficiently than sets of overlapping ranges. Since the ranges in real-world classification databases are, in general, non-disjoint, the algorithms we present split the ranges into multiple layers, each consisting of mutually disjoint ranges. Each layer is then coded and assigned its own set of extra bits.

Our layering algorithms are based on approximations for specific variants of interval-graph coloring. We evaluate these algorithms through extensive comparative analysis on real-life classification databases. Our analysis establishes that our algorithms reduce the number of redundant TCAM entries caused by range rules by more than 60% compared with the best prior range-encoding work.
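The layering step can be sketched with a greedy first-fit coloring: each range goes to the first layer whose existing ranges are all disjoint from it. This is a simple stand-in for the paper's interval-graph-coloring approximations, not the published algorithms themselves; the function name is invented for the example.

```python
def layer_ranges(ranges):
    """Greedy first-fit sketch of interval layering: assign each range
    (lo, hi) to the first layer in which it overlaps nothing. Within a
    layer, all ranges are mutually disjoint and so can be encoded
    compactly with that layer's extra bits."""
    layers = []
    for lo, hi in sorted(ranges):
        for layer in layers:
            if all(hi < a or b < lo for a, b in layer):
                layer.append((lo, hi))
                break
        else:  # overlaps something in every existing layer: open a new one
            layers.append([(lo, hi)])
    return layers
```

The number of layers produced bounds how many independent sets of extra bits the encoding needs.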

Patents
Yehuda Afek, Anat Bremler-Barr, Yaron Koral
Patent,
2012

A method for processing communication traffic includes receiving an incoming stream of compressed data conveyed by a sequence of data packets, each containing a respective portion of the compressed data. The respective portion of the compressed data contained in the first packet is stored in a buffer having a predefined buffer size. Upon receiving a subsequent packet, at least a part of the compressed data stored in the buffer and the respective portion of the compressed data contained in the subsequent packet are decompressed, thereby providing decompressed data. A most recent part of the decompressed data that is within the buffer size is recompressed and stored in the buffer.
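The bookkeeping described here, keep only the most recent window's worth of decompressed data, stored recompressed, can be modeled in a few lines. This is an illustrative model only: Python's zlib decompressor keeps its own internal window regardless, so the sketch demonstrates the data flow rather than the real memory saving, and the class and attribute names are invented.

```python
import zlib

WINDOW = 32 * 1024  # DEFLATE back-reference window size

class ConnectionState:
    """Sketch of the patented buffer scheme: after each packet, retain
    just the last 32KB of decompressed output, recompressed, instead of
    the raw 32KB window. Illustrative model, not the patented device."""
    def __init__(self):
        self.decomp = zlib.decompressobj()
        self.stored = b''  # the sliding window, kept recompressed

    def on_packet(self, payload):
        # restore the previous window, decompress the new packet's data
        tail = zlib.decompress(self.stored) if self.stored else b''
        out = self.decomp.decompress(payload)
        # keep only the most recent WINDOW bytes, recompressed, in RAM
        tail = (tail + out)[-WINDOW:]
        self.stored = zlib.compress(tail)
        return out
```

Since real web pages compress well, storing the window recompressed trades a little CPU per packet for a large per-connection memory reduction, which is exactly the trade-off the patent targets.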

Patents
Anat Bremler-Barr, David Hay, Yaron Koral
Patent,
2011

A method for processing data includes encoding a finite automaton, which includes states and transitions between the states that express a plurality of predefined patterns, by grouping the states of the automaton into sets according to a common property shared by the states in each set, and assigning codes to the states according to the grouping. The codes are stored in an electronic memory, along with rules that are associated with the patterns. The automaton is traversed in order to identify one or more of the patterns in an input sequence of data elements by iteratively reading out the codes from the memory responsively to the data elements and to the codes that have been previously read out. Upon identifying a given pattern in the input sequence, an associated action is performed.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Yaron Koral
IFIP Networking,
2011

Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signatures. The major limiting factor in this process is the high memory requirement of 32KB per connection, which leads to hundreds of megabytes to gigabytes of main memory consumption. This requirement inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities. In this paper we introduce new algorithms and techniques that drastically reduce this space requirement by over 80%, with only a slight increase in time overhead, thus making real-time compressed traffic inspection a viable option for network devices.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol
IEEE HPSR,
2011

Deep Packet Inspection (DPI) lies at the core of contemporary Network Intrusion Detection/Prevention Systems and Web Application Firewalls. DPI aims to identify various malware (including spam and viruses) by inspecting both the header and the payload of each packet and comparing it to a known set of patterns. DPI is often performed on the critical path of packet processing, so the overall performance of the security tools is dominated by the speed of DPI. The seminal algorithm of Aho-Corasick (AC) is the de-facto standard for pattern matching in network intrusion detection systems (NIDS). Basically, the AC algorithm constructs a Deterministic Finite Automaton (DFA) for detecting all occurrences of a given set of patterns by processing the input in a single pass. The input is inspected symbol by symbol (usually each symbol is a byte), such that each symbol results in a state transition. Thus, in principle, the AC algorithm has deterministic performance, which does not depend on the specific input and is therefore not vulnerable to algorithmic complexity attacks, making it very attractive to NIDS.
In this paper we show that, when implementing the AC algorithm in software, this property does not hold, due to the fact that contemporary pattern sets induce very large DFAs that cannot be stored entirely in cache. We then propose a novel technique to compress the representation of the Aho-Corasick automaton so that it can fit in modern cache. We compare both the performance and the memory footprint of our technique to previously proposed implementations, under various settings and pattern sets. Our results reveal the space-time tradeoffs of DPI. Specifically, we show that our compression technique reduces the memory footprint of the best prior-art algorithm by approximately 60%, while achieving comparable throughput.
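For readers unfamiliar with the baseline, the AC automaton described above can be built in a few dozen lines: a goto trie, failure links computed by BFS, and output sets merged along failure links, so that each input symbol causes one (amortized) transition. This is a textbook Aho-Corasick sketch, not the paper's compressed representation, and the function names are invented for the example.

```python
from collections import deque

def build_ac(patterns):
    """Textbook Aho-Corasick construction: goto trie + failure links +
    per-state output sets (merged along failure links)."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())   # depth-1 states: fail stays 0
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]    # inherit matches of the fail state
        # (BFS order guarantees out[fail[t]] is already complete)
    return goto, fail, out

def ac_search(text, automaton):
    """Single pass over the text; reports (start_index, pattern) pairs."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

The dict-of-dicts trie here is exactly the kind of pointer-heavy, cache-unfriendly layout whose compressed replacement the paper investigates.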

Conferences & Workshops
Anat Bremler-Barr, Yaron Koral, Victor Zigdon
IEEE HPSR,
2011

Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signatures. The overhead imposed by the decompression inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities.
The ACCH algorithm was the first to present a unified approach to pattern matching and decompression, taking advantage of information gathered in the decompression phase to accelerate the pattern matching. ACCH accelerated the DFA-based Aho-Corasick multi-pattern matching algorithm. In this paper, we present a novel algorithm, SPC (Shift-based Pattern matching for Compressed traffic), that accelerates the commonly used Wu-Manber pattern matching algorithm. SPC is simpler and has higher throughput and lower storage overhead than ACCH. Analysis of real web traffic and real security device signatures shows that we can skip scanning up to 87.5% of the data and gain a performance boost of more than 51% compared to ACCH. Moreover, the technique requires only 4KB of additional information per connection, compared to ACCH's 8KB.
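The baseline that SPC accelerates, Wu-Manber's shift-based multi-pattern search, can be sketched as follows. A bad-block shift table over B-character blocks lets the scan jump over most text positions; patterns are verified only at alignments whose last block has shift 0. This is a minimal textbook sketch of Wu-Manber itself, not of SPC, and the parameter choices are invented for the example.

```python
def wu_manber(patterns, text, B=2):
    """Minimal Wu-Manber-style multi-pattern search. m is the shortest
    pattern length; SHIFT[block] says how far the scan may safely jump
    when `block` is read at the end of the current window."""
    m = min(len(p) for p in patterns)
    assert m >= B
    shift = {}
    for p in patterns:
        for j in range(m - B + 1):
            blk = p[j:j + B]
            # a block ending at offset j+B-1 of the m-prefix allows a
            # shift of m - B - j (0 when it ends the prefix)
            shift[blk] = min(shift.get(blk, m - B + 1), m - B - j)
    hits, i = [], m - 1
    while i < len(text):
        s = shift.get(text[i - B + 1:i + 1], m - B + 1)
        if s:
            i += s                     # safe skip: no window can end here
        else:
            start = i - m + 1          # candidate alignment: verify all
            for p in patterns:
                if text.startswith(p, start):
                    hits.append((start, p))
            i += 1
    return hits
```

On typical inputs most blocks never appear in any pattern, so the scan advances by m-B+1 positions at a time; SPC's contribution is to combine such shift-based skipping with skips derived from the compression metadata.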

Journal
Anat Bremler-Barr, Yaron Koral
ACM/IEEE Transactions on Networking,
2011

Current security tools, using 'signature-based' detection, do not handle compressed traffic, whose market share is constantly increasing. This paper focuses on compressed HTTP traffic. HTTP uses GZIP compression and requires a decompression phase before string matching can be performed.
We present a novel algorithm, the Aho-Corasick-based algorithm for Compressed HTTP (ACCH), that takes advantage of information gathered by the decompression phase in order to accelerate the commonly used Aho-Corasick pattern matching algorithm. By analyzing real HTTP traffic and real web application firewall signatures, we show that up to 84% of the data can be skipped in the scan. Surprisingly, we show that it is faster to perform pattern matching on the compressed data, even with the penalty of decompression, than on regular traffic. As far as we know, this is the first paper to analyze the problem of 'on-the-fly' multi-pattern matching on compressed HTTP traffic and to suggest a solution.

Patents
Anat Bremler-Barr, Victor Zigdon, Yaron Koral
Patent,
2010

A method for processing data includes accepting a specification of a plurality of patterns, each pattern defining a respective uncompressed sequence of symbols. Multi-pattern matching is applied to an incoming stream of compressed communication traffic containing compression metadata so as to identify the patterns occurring in the stream while using the compression metadata to skip over parts of the stream.

Patents
Anat Bremler-Barr, David Hay, Ronny Roth
Patent,
2010

A method for error detection includes storing in an associative memory multiple data entries, each data entry including a data item together with one or more check symbols computed with respect to the data item. A predetermined sequence of search keys is applied to the memory, thereby causing the memory to generate, in parallel, match results with respect to the data entries. The match results are processed in order to identify an error in at least one of the data entries.

Journal
Anat Bremler-Barr, David Hay, Danny Handler, Ronny Roth
ACM/IEEE Transactions on Networking,
2010