Anat Bremler-Barr

Cloud

Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Michael Czeizler, Jhonatan Tavori
INFOCOM,
2024

Today’s software development landscape has witnessed a shift towards microservices-based architectures. In this approach, large software systems are implemented by combining loosely coupled services, each responsible for a specific task and defined with its own scaling properties.
Auto-scaling is a primary capability of cloud computing which allows systems to adapt to fluctuating traffic loads by dynamically increasing (scale-up) and decreasing (scale-down) the number of resources used.

We observe that when microservices that use separate auto-scaling mechanisms operate in tandem to process traffic, they may perform ineffectively, especially under overload conditions such as those caused by DDoS attacks. This can result in throttling (Denial of Service, DoS) and over-provisioning of resources (Economic Denial of Sustainability, EDoS).

This paper demonstrates how an attacker can exploit the tandem behavior of microservices with different auto-scaling mechanisms to create an attack we denote the \emph{Tandem Attack}. We demonstrate the attack on a typical \emph{Serverless} architecture and analyze its economic and performance damage. One intriguing finding is that some attacks may leave a cloud customer paying for requests that were denied service.

We conclude that independent scaling of loosely coupled components may constitute an inherent difficulty, and that end-to-end controls may be needed.
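The "paying for denied requests" finding can be illustrated with a toy model; all capacities, rates, and prices below are illustrative assumptions, not the paper's measurements. A per-invocation-billed front service forwards every request to a back service whose slower-scaling concurrency cap throttles the excess:

```python
# Toy model of the tandem effect: the front service has already been
# invoked (and billed) for every request, including the ones the back
# service later throttles. Numbers are illustrative, not measured.

def tandem_round(requests, back_capacity, price_per_invocation):
    """Return (served, throttled, bill) for one round of traffic."""
    served = min(requests, back_capacity)
    throttled = requests - served
    bill = requests * price_per_invocation  # billed for *all* invocations
    return served, throttled, bill

served, throttled, bill = tandem_round(
    requests=1000, back_capacity=400, price_per_invocation=0.0001)
```

Here the customer is billed for all 1000 front-service invocations even though 600 requests are throttled at the back service.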

Poster and brief announcement
Anat Bremler-Barr, Hanoch Levy, Jhonatan Tavori
ACM CoNEXT,
2023

Retry mechanisms are commonly used in microservices architectures to recover from transient errors, including network failures and service overloading. This research studies the operation of cloud retry mechanisms under deliberate DDoS attacks and their effect on application performance and operational costs. In this poster we focus on the economic aspect and demonstrate that enabling such mechanisms improperly can be counter-productive, exposing the system to substantial, quadratically growing economic damage in the presence of attacks.
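As a back-of-the-envelope illustration of how retries amplify load (the numbers and the chain model are illustrative assumptions, not the poster's analysis): if every hop in a chain of services naively retries a failing downstream call k times, one request that ultimately fails can drive (k+1)^depth attempts at the deepest service.

```python
# Naive-retry amplification in a chain of services, assuming every
# downstream call fails and every hop retries `retries_per_hop` times.

def attempts_at_depth(retries_per_hop, depth):
    """Attempts reaching the deepest service of a failing chain."""
    return (retries_per_hop + 1) ** depth

# 3 retries per hop across a 4-service chain: 4**4 attempts
amplification = attempts_at_depth(3, 4)
```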

Conferences & Workshops
Anat Bremler-Barr, David Hay, Daniel Bachar
IFIP Networking,
2023

With the advent of cloud and container technologies, enterprises develop applications using a microservices architecture, managed by orchestration systems (e.g. Kubernetes), that group the microservices into clusters. As the number of application setups across multiple clusters and different clouds is increasing, technologies that enable communication and service discovery between the clusters are emerging (mainly as part of the Cloud Native ecosystem).
In such a multi-cluster setting, copies of the same microservice may be deployed in different geo-locations, each with different cost and latency penalties. Yet, current service selection and load balancing mechanisms do not take into account these locations and corresponding penalties.
We present \emph{MCOSS}, a novel solution for optimizing the service selection, given a certain microservice deployment among clouds and clusters in the system. Our solution is agnostic to the different multi-cluster networking layers, cloud vendors, and discovery mechanisms used by the operators. Our simulations show a reduction in outbound traffic cost by up to 72% and response time by up to 64%, compared to the currently-deployed service selection mechanisms.

Talk
Anat Bremler-Barr, Michael Czeizler
Red Hat Research,
2023

It is a common belief that auto-scaling mechanisms serve as a mitigation for Distributed Denial of Service (DDoS) attacks on cloud computing infrastructures by dynamically adding machines to cope with the additional load. Intuitively, such attacks are mostly associated with Economic Denial of Sustainability (EDoS), derived from paying for the extra resources required to process the malicious incoming traffic.

Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism that can cause significant performance degradation in addition to economic damage. We demonstrate the attack on Amazon EC2, Kubernetes, and serverless architectures. We then present and analyze the Tandem Attack, a new attack on microservices architectures. In this attack, the attacker exploits the tandem behavior of services with different auto-scaling mechanisms, causing both economic and performance damage.

Poster and brief announcement
Anat Bremler-Barr, Michael Czeizler
INFOCOM,
2023

Auto-scaling is a fundamental capability of cloud computing that allows resources to be consumed dynamically according to changing traffic demands.
Under the microservices architecture paradigm, software systems are built as a set of loosely coupled applications and services that can be scaled individually.
In this paper, we present a new attack, the \emph{Tandem Attack}, which exploits the tandem behavior of microservices with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), whether created by malicious attackers or self-inflicted through administrator misconfiguration. We demonstrate the Tandem Attack on a popular AWS serverless infrastructure modeling two services and show that removing server-management responsibility from cloud users does not mitigate the different-scaling-properties challenge and can even make the problem harder to solve.

Conferences & Workshops
Anat Bremler-Barr, Ronen Ben David
CLOSER 2021,
2021

In recent years, we have witnessed a new kind of DDoS attack, the burst attack (Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presents a new kind of burst attack, the YoYo attack (Bremler-Barr et al., 2017), which operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic load cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates flat DDoS attacks into Economic Denial of Sustainability (EDoS) attacks, where the victim suffers economic damage accrued by paying for the extra resources required to process the traffic generated by the attacker. However, it was shown that the YoYo attack also causes significant performance degradation, since it takes time to scale up VMs. In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks, as containerized cloud applications using Kubernetes have gained popularity and replaced VM-based architectures in recent years. We present experimental results on Google Cloud Platform, showing that even though the scale-up time of containers is much lower than that of VMs, Kubernetes is still vulnerable to the YoYo attack, since VMs are still involved. Finally, we evaluate ML models that can accurately detect a YoYo attack on a Kubernetes cluster.

Conferences & Workshops
Anat Bremler-Barr, Mor Sides, Eli Brosh
INFOCOM,
2017

Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism, that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends periodic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo-Yo attack is harder to detect and requires fewer resources from the attacker compared to a traditional DDoS attack. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
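The oscillation can be sketched in a few lines. The toy model below assumes a single-metric auto-scaler whose new machines take `scale_up_lag` ticks to boot and, for simplicity, immediate scale-down; burst sizes, lag, and timing are illustrative, not taken from the paper.

```python
# Toy discrete-time model of an auto-scaler under periodic bursts.

def simulate(load_trace, scale_up_lag):
    """Return (overloaded_ticks, machine_ticks_paid) for a load trace."""
    capacity, boots = 1, {}            # boots: tick -> machines coming online
    overloaded = paid = 0
    for t, load in enumerate(load_trace):
        capacity += boots.pop(t, 0)    # booting machines come online
        pending = sum(boots.values())
        if load > capacity + pending:  # scale-up decision, delayed effect
            boots[t + scale_up_lag] = load - capacity - pending
        elif load < capacity:          # scale-down modeled as immediate
            capacity = max(1, load)
        overloaded += load > capacity
        paid += capacity
    return overloaded, paid

# Bursts timed to the boot lag: each burst ends just as the extra
# machines arrive, so the victim is overloaded for the whole burst.
burst_trace = [10, 10, 10, 1, 1, 1] * 4
overloaded, paid = simulate(burst_trace, scale_up_lag=3)
```

With immediate scale-down the model isolates the performance damage (half of all ticks overloaded here); adding a scale-down delay, as real auto-scalers have, also produces the economic side.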

Poster and brief announcement
Anat Bremler-Barr, Mor Sides and Elisha Rosensweig
SIGCOMM,
2015
Poster and brief announcement
Netanel Cohen, Anat Bremler-Barr
ACM SoCC Poster Session,
2015

Cybersecurity

Projects, thesis, and dissertations
Anat Bremler-Barr, Hanan Iserovich
,
2020

In this work, we argue for an urgent need to secure network infrastructure. Many organizations suffer from weak security of their network elements, placing them at elevated risk of availability or information-leakage cyber-attacks. While IT infrastructure has many cybersecurity defenses, network-infrastructure protection is neglected for several reasons. Most security planning and operation rely on the information-assurance methodology, which prioritizes protection according to the sensitivity of the network's information. Network elements, however, are not regular endpoints; they form a critical part of the backbone and have dedicated operational and security needs. This unique ecosystem makes the security lifecycle (management and updates) a significant challenge. Consequently, even simple vulnerabilities expose this critical infrastructure to harmful malware and attacks, although these elements are (mistakenly) thought to be hard to reach.
Consistent with the above, we have observed these elements being targeted by advanced cyber-attack groups such as "SHADOW BROKER" [1] [2] [3]. Analysis of those publications shows dedicated tools exploiting the described weaknesses for various attacks.
This work introduces our approach to reducing the described threat by improving the security of network-element management. We suggest a centralized approach in which all management traffic passes through a dedicated IDS-like mechanism that detects or prevents attacks on the critical infrastructure.

Technical reports
Anat Bremler-Barr, David Hay, Tal Shapira, Shilo Daum
arxiv,
2024

With 95% of Internet traffic now encrypted, an effective approach to classifying this traffic is crucial for network security and management. This paper introduces ECHO — a novel optimization process for ML/DL-based encrypted traffic classification. ECHO targets both classification time and memory utilization and incorporates two innovative techniques.
The first component, HO (Hyperparameter Optimization of binnings), aims at creating efficient traffic representations. While previous research often uses representations that map packet sizes and packet arrival times to fixed-sized bins, we show that non-uniform binnings are significantly more efficient. These non-uniform binnings are derived by employing a hyperparameter optimization algorithm in the training stage. HO significantly improves accuracy given a required representation size, or, equivalently, achieves comparable accuracy using smaller representations.
Then, we introduce EC (Early Classification of traffic), which enables faster classification using a cascade of classifiers adapted for different exit times, where classification is based on the level of confidence. EC reduces the average classification latency by up to 90%. Remarkably, this method not only maintains classification accuracy but also, in certain cases, improves it.
Using three publicly available datasets, we demonstrate that the combined method, Early Classification with Hyperparameter Optimization (ECHO), leads to a significant improvement in classification efficiency.
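A minimal sketch of the non-uniform-binning idea, using a plain quantile split as a stand-in for the paper's hyperparameter-optimization search; the packet sizes below are a toy sample, not dataset values:

```python
import bisect

# Derive bin edges from the empirical distribution of the training
# traffic, so each bin carries roughly equal probability mass, instead
# of fixed-width (uniform) bins.

def quantile_edges(samples, n_bins):
    """Bin edges splitting `samples` into `n_bins` equal-mass bins."""
    xs = sorted(samples)
    return [xs[len(xs) * i // n_bins] for i in range(1, n_bins)]

def to_bin(value, edges):
    """Map a value (e.g., a packet size) to its bin index."""
    return bisect.bisect_right(edges, value)

sizes = [40] * 25 + [60] * 25 + [576] * 25 + [1500] * 25  # toy sample
edges = quantile_edges(sizes, 4)
```

A uniform four-bin split of the 40..1500 range would waste most bins on sizes that never occur; the quantile-derived edges concentrate resolution where the traffic actually is.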

Poster and brief announcement
Anat Bremler-Barr, Tal Shapira, Daniel Alfasi
Systor,
2023

With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE [4] knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI [3]. Additionally, we extend this approach by initializing the embeddings for the relations.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Dor Israeli and Alon Noy
The International Symposium on Cyber Security, Cryptology and Machine Learning (CSCML),
2023

This paper presents a new localhost browser-based vulnerability and corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. Following the attack presentation, we suggest three new protection mechanisms to mitigate this vulnerability.
This new attack bypasses recently suggested protection mechanisms designed to stop browser-based attacks on private devices and local applications.

Talk
Anat Bremler-Barr
Tel-Aviv University : CS Colloquium,
2019

Computer networks have undergone and continue to experience a major transformation, whereby billions of low-cost devices, Internet of Things (IoTs), are being connected to the network. Unlike traditional network devices, these devices typically have very limited computational, memory, and power resources, and attackers often exploit them to launch large-scale attacks.

This talk will highlight the security concerns of IoT devices from a networking perspective and explore how to secure IoT devices using whitelists, in which communication between a device and an endpoint is prohibited unless that endpoint appears in the device whitelist.

We present a new scalable ISP level architecture to secure and protect IoT devices. We address several challenges in the system design: the whitelist enforcement architecture, ML techniques to identify IoT devices, and the automatic acquisition of whitelists in the wild.

Technical reports
Yehuda Afek, Anat Bremler-Barr, Alon Noy
arxiv,
2020

In this paper we present three attacks on private internal networks behind a NAT and a corresponding new protection mechanism, Internal Network Policy (INP), to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is tricked into visiting the attacker's website, which contains a malicious script that lets the attacker access the victim's internal network in different ways, including opening a port in the NAT or sending a sophisticated request to local devices. The first attack utilizes DNS Rebinding in a particular way, while the other two demonstrate different methods of attacking the network based on application-security vulnerabilities. Following the attacks, we provide a new browser security policy, Internal Network Policy (INP), which protects against these types of vulnerabilities and attacks. This policy is implemented in the browser, just like the Same Origin Policy (SOP), and prevents malicious access to internal resources by external entities.

Conferences & Workshops
Anat Bremler-Barr, Haim Levy, Zohar Yakhini
IEEE/IFIP NOMS,
2020

In recent years the number of IoT devices in home networks has increased dramatically. Whenever a new device connects to the network, it must be quickly managed and secured using the relevant security mechanism or QoS policy. Thus a key challenge is to distinguish between IoT and non-IoT (NoT) devices in a matter of minutes. Unfortunately, there is no clear indication of whether a device in a network is an IoT device. In this paper, we propose different classifiers that identify a device as IoT or non-IoT, on a short time scale and with high accuracy.
Our classifiers were constructed using machine learning techniques on a seen (training) dataset and were tested on an unseen (test) dataset. They successfully classified devices that were not in the seen dataset with accuracy above 95%. The first classifier is a logistic regression classifier based on traffic features. The second classifier is based on features we retrieve from DHCP packets. Finally, we present a unified classifier that leverages the advantages of the other two classifiers.

Journal
Anat Bremler-Barr, Hanoch Levy, Udi Ben-Porat
Computer Networks,
2014

Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance.

The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol, Yaron Koral
ACM/IEEE ANCS,
2012

This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system to a crawl at best. In our architecture, called MCA^2 – Multi-Core Architecture for Mitigating Complexity Attacks – cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handle all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack takes place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
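The dispatch idea at the heart of the architecture can be sketched as follows; the heaviness predicate, message set, and core counts are stand-ins (the paper detects, e.g., cache-miss-heavy packets against DPI engines):

```python
# Sketch of MCA^2-style dispatch: messages flagged as "heavy" are
# diverted to a small dedicated pool of cores, so the remaining cores
# keep serving light traffic with normal quality of service.

def dispatch(messages, n_cores, heavy_cores, is_heavy):
    """Assign messages to cores; heavy ones only to the dedicated pool."""
    heavy_pool = [m for m in messages if is_heavy(m)]
    light_pool = [m for m in messages if not is_heavy(m)]
    light_cores = n_cores - heavy_cores
    # round-robin assignment within each pool
    heavy = {c: heavy_pool[c::heavy_cores] for c in range(heavy_cores)}
    light = {c: light_pool[c::light_cores] for c in range(light_cores)}
    return heavy, light

msgs = list(range(20))
heavy, light = dispatch(msgs, n_cores=8, heavy_cores=2,
                        is_heavy=lambda m: m % 5 == 0)  # toy predicate
```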

Patents
Anat Bremler-Barr, Hanoch Levy, Omer Dekel
Patent,
2007

A method for communication management includes detecting addresses of peer nodes (34) belonging to a service layer (30) of a distributed application running on a computer network (22). Responsively to the detected addresses, filtering of communication traffic transmitted by client computers (26) is actuated so as to limit access by the client computers to the distributed application.

Conferences & Workshops
A. Bremler-Barr, R. Halacmi-Bekel and J. Kangasharju
NPSec workshop,
2006

In this paper we present the unregister attack, a new kind of denial-of-service attack on SIP servers. In this attack, the attacker sends a spoofed “unregister” message to a SIP server and cancels the registration of the victim at that server. This prevents the victim user from receiving any calls. We have tested common implementations of SIP servers and show that the unregister attack is easily performed on SIP servers which do not use authentication. Even on SIP servers with authentication, an attacker able to sniff the traffic between the client and server can still successfully attack common servers. We show that the root causes behind this vulnerability are either buggy implementations, or the SIP specification RFC, which does not require sufficient security from the implementations. We present a solution, the SIP one-way hash function algorithm (SOFIA), motivated by the one-time password mechanism [6]. SOFIA prevents the unregister attack in all situations. The algorithm is easy to deploy since it requires only a minor modification, namely adding one header field into the SIP messages. Furthermore, the algorithm is fully backwards compatible and requires no additional configuration from the user or the server.
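The flavor of the one-way hash-chain defense that motivates SOFIA can be sketched as follows; this is a generic one-time-password-style hash chain, not SOFIA's exact header encoding:

```python
import hashlib

# One-way hash chain: the client commits to H^n(seed); each later
# (un)register message reveals the previous link, which the server
# verifies with a single hash but an off-path attacker cannot forge.

H = lambda b: hashlib.sha256(b).digest()

def make_chain(seed, n):
    """Client-side chain [seed, H(seed), ..., H^n(seed)]."""
    links = [seed]
    for _ in range(n):
        links.append(H(links[-1]))
    return links           # server initially learns only links[-1]

def verify(server_state, revealed):
    """Accept `revealed` iff it hashes to the last accepted link."""
    if H(revealed) == server_state:
        return True, revealed      # new server state: the revealed link
    return False, server_state

chain = make_chain(b"secret-seed", 3)
server = chain[-1]                       # commitment at registration
ok, server = verify(server, chain[-2])   # genuine client message
bad, server = verify(server, b"forged")  # spoofed unregister is rejected
```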

DDoS attack

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
Usenix Security,
2023

To fully understand the root cause of the NRDelegationAttack and to analyze its amplification factor, we developed a mini-lab setup, disconnected from the Internet, that contains all the components of the DNS system: a client, a resolver, and authoritative name servers. This setup is built to analyze and examine the behavior of a resolver (or any other component) under the microscope; it is not, however, suitable for performance (stress) analysis.
Here we provide the code and details of this setup, enabling our analysis to be reproduced. Researchers may also find it useful for further behavioral analysis and examination of other components of the DNS system.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Shani Stajnrod
Usenix Security,
2023

Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.

The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.

We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.

Poster and brief announcement
Anat Bremler-Barr, Michael Czeizler
INFOCOM,
2023

Auto-scaling is a fundamental capability of cloud computing that allows resources to be consumed dynamically according to changing traffic demands.
Under the microservices architecture paradigm, software systems are built as a set of loosely coupled applications and services that can be scaled individually.
In this paper, we present a new attack, the \emph{Tandem Attack}, which exploits the tandem behavior of microservices with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), whether created by malicious attackers or self-inflicted through administrator misconfiguration. We demonstrate the Tandem Attack on a popular AWS serverless infrastructure modeling two services and show that removing server-management responsibility from cloud users does not mitigate the different-scaling-properties challenge and can even make the problem harder to solve.

Conferences & Workshops
Anat Bremler-Barr, Matan Sabag
IFIP Networking,
2022

Distributed denial of service (DDoS) attacks, especially distributed reflection denial of service attacks (DRDoS), have increased dramatically in frequency and volume in recent years. Such attacks are possible due to the attacker’s ability to spoof the source address of IP packets. Since the early days of the internet, authenticating the IP source address has remained unresolved in the real world. Although there are many methods available to eliminate source spoofing, they are not widely used, primarily due to a lack of economic incentives.
We propose a collaborative on-demand route-based defense technique (CORB) to offer efficient DDoS mitigation as a paid-for service, efficiently suppressing reflector attacks before they reach the reflectors and flood the victim. The technique uses scrubbing facilities located across the internet at internet service providers (ISPs) and internet exchange points (IXPs).
By transmitting a small amount of data based on border gateway protocol (BGP) information from the victim to the scrubbing facilities, we can filter out the attack without any false-positive cases. For example, the data can be sent using DOTS, a new signaling DDoS protocol that was standardized by the IETF. CORB filters the attack before it is amplified by the reflector, thereby reducing the overall cost of the attack. This provides a win-win financial situation for the victim and the scrubbing facilities that provide the service.
We demonstrate the value of CORB by simulating a Memcached DRDoS attack using real-life data. Our evaluation found that deploying CORB on scrubbing facilities at approximately 40 autonomous systems blocks 90% of the attack and can reduce the mitigation cost by 85%.

Conferences & Workshops
Anat Bremler-Barr, Ronen Ben David
CLOSER 2021,
2021

In recent years, we have witnessed a new kind of DDoS attack, the burst attack (Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presents a new kind of burst attack, the YoYo attack (Bremler-Barr et al., 2017), which operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic load cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates flat DDoS attacks into Economic Denial of Sustainability (EDoS) attacks, where the victim suffers economic damage accrued by paying for the extra resources required to process the traffic generated by the attacker. However, it was shown that the YoYo attack also causes significant performance degradation, since it takes time to scale up VMs. In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks, as containerized cloud applications using Kubernetes have gained popularity and replaced VM-based architectures in recent years. We present experimental results on Google Cloud Platform, showing that even though the scale-up time of containers is much lower than that of VMs, Kubernetes is still vulnerable to the YoYo attack, since VMs are still involved. Finally, we evaluate ML models that can accurately detect a YoYo attack on a Kubernetes cluster.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Lior Shafir
Usenix Security,
2020

This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x on the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the ‘NS’ section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
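At its core, the MaxFetch(k) mitigation caps how many glue-less NS names a resolver resolves proactively per referral. A toy count of eager fetches (the 1600 bogus names and the cap are illustrative; this sketch does not reproduce the paper's measured 1620x packet amplification):

```python
# Eager NS-name resolution with and without a MaxFetch-style cap.

def proactive_fetches(ns_names, k=None):
    """Number of NS names the resolver resolves eagerly on a referral.
    k=None models an unrestricted resolver; k caps the fetches."""
    return len(ns_names) if k is None else min(k, len(ns_names))

# A malicious referral listing many non-existent name-server names
# (hypothetical attacker domain, for illustration only):
bogus = [f"fake{i}.attacker.example" for i in range(1600)]

unbounded = proactive_fetches(bogus)       # NXNSAttack-style blow-up
bounded = proactive_fetches(bogus, k=1)    # MaxFetch(1)
```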

Journal
Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish
ACM/IEEE Transactions on Networking,
2019

We present a basic tool for zero-day attack signature extraction. Given two large sets of messages, P, the messages captured in the network at peacetime (i.e., mostly legitimate traffic), and A, the messages captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not in P, thus allowing the identification of the attack packets. This is an important tool in protecting sites on the Internet from worm attacks and distributed denial of service attacks, and may also be useful for other problems, including command-and-control identification and DNA-sequence analysis. The main contributions of this paper are the system we developed to extract the required signatures, together with the string-heavy-hitters problem definition and the algorithm for solving this problem. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. The algorithm runs in linear time, requiring one pass over the input. Our system makes use of this algorithm to extract the desired signatures. Furthermore, we provide an extended algorithm which is able to identify groups of signatures, often found together in the same packets, which further improves the quality of the signatures generated by our system. Using our system, a yet-unknown attack can be detected and stopped within minutes of attack start time.

Poster and brief announcement
Yehuda Afek, Anat Bremler-Barr, Lior Shafir, Neta Peleg, Matan Sabag
SIGCOMM,
2019

In this work we measure what percentage of DNS recursive resolvers perform negative caching in the wild. We deploy our own authoritative name server and harness thousands of RIPE Atlas [3] sites spread over the globe to perform repeated DNS queries for non-existing sub-domains of our authoritative domain.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Edith Cohen, Michal Sagam, Shir Landau Feibish
HotWeb ,
2017

Random Subdomain DDoS attacks on the Domain Name System (DNS) infrastructure are becoming a popular vector in recent attacks (e.g., the recent Mirai attack on Dyn). In these attacks, many queries are sent for a single or a few victim domains, yet they include highly varying non-existent subdomains generated randomly.
Motivated by these attacks we designed and implemented novel and efficient algorithms for distinct heavy hitters (dHH). A (classic) heavy hitter (HH) in a stream of elements is a key (e.g., the domain of a query) which appears in many elements (e.g., requests).
When stream elements consist of <key, subkey> pairs (<domain, subdomain>), a distinct heavy hitter (dHH) is a key that is paired with a large number of different subkeys. Our algorithms dominate previous designs in both the asymptotic (theoretical) sense and in practicality. Specifically, the new fixed-size algorithms are simple to code and offer asymptotically optimal space-accuracy tradeoffs.
Based on these algorithms, we build and implement a system for the detection and mitigation of Random Subdomain DDoS attacks. We perform experimental evaluation, demonstrating the effectiveness of our algorithms.
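The dHH notion can be made concrete with an exact (non-streaming) sketch; the paper's contribution is approximating this in fixed space, which the toy version below does not attempt.

```python
from collections import defaultdict

def distinct_heavy_hitters(stream, threshold):
    """Exact dHH for illustration: a key is a dHH if it is paired with
    at least `threshold` *distinct* subkeys.  (The paper's fixed-size
    algorithms approximate this; here we keep exact per-key sets.)"""
    distinct = defaultdict(set)
    for key, subkey in stream:
        distinct[key].add(subkey)
    return {k for k, subkeys in distinct.items() if len(subkeys) >= threshold}
```

In a random-subdomain attack, the victim domain is paired with many random subdomains and becomes a dHH, while a popular domain queried repeatedly with the same subdomain is a classic HH but not a dHH.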

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Lior Shafir
INFOCOM,
2017

Traditional DDoS anti-spoofing scrubbers require dedicated middleboxes, thus adding CAPEX, latency and complexity to the network. This paper starts by showing that the current SDN match-and-action model is rich enough to implement a collection of anti-spoofing methods. Secondly, we develop and utilize advanced methods for dynamic resource sharing to distribute the required mitigation resources over a network of switches.
None of the earlier attempts to implement anti-spoofing in SDN actually directly exploited the match and action power of the switch data plane. They required additional functionalities on top of the match-and-action model, and are not implementable on an SDN switch as is. Our method builds on the premise that an SDN data path is a very fast and efficient engine to perform low level primitive operations at wire speed. The solution requires a number of flow-table rules and switch-controller messages proportional to the legitimate traffic. To scale when protecting multiple large servers the flow tables of multiple switches are harnessed in a distributed and dynamic network based solution.
We have fully implemented all our methods, in both OpenFlow 1.5 (in Open vSwitch) and in P4. The system mitigates spoofed attacks on either the SDN infrastructure itself or on downstream servers.

Conferences & Workshops
Anat Bremler-Barr, Mor Sides, Eli Brosh
INFOCOM,
2017

Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism, that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends periodic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo-Yo attack is harder to detect and requires less resources from the attacker compared to traditional DDoS. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
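A rough discrete-time model (our simplification, not the paper's analysis) shows the oscillation: bursts timed against the scaler's reaction lag keep the system under-provisioned during each burst and over-provisioned after it.

```python
import math

def simulate(loads, capacity_per_vm=100, scale_lag=3):
    """Toy auto-scaler: targets ceil(load/capacity) VMs but a scaling
    decision takes effect only after `scale_lag` consecutive steps of
    pressure.  Returns (vm_count_trace, number_of_overloaded_steps)."""
    vms, pending, trace, overloaded = 1, 0, [], 0
    for load in loads:
        target = max(1, math.ceil(load / capacity_per_vm))
        if target != vms:
            pending += 1
            if pending >= scale_lag:       # scaling finally takes effect
                vms, pending = target, 0
        else:
            pending = 0
        trace.append(vms)
        if load > vms * capacity_per_vm:   # under-provisioned this step
            overloaded += 1
    return trace, overloaded
```

With bursts whose length matches the lag, each burst causes overload before scale-up completes (performance damage), and the extra VMs then run against near-idle load until scale-down (economic damage).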

Journal
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol
ACM/IEEE Transactions on Networking,
2016

This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low-bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst-case performance (under an attack). Still, under normal traffic its running time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 (Multi-Core Architecture to Mitigate Complexity Attacks), which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.

Poster and brief announcement
Anat Bremler-Barr, Mor Sides, Elisha Rosensweig
SIGCOMM,
2015
Journal
Anat Bremler-Barr, Hanoch Levy, Udi Ben-Porat
Computer Networks,
2014

Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance. The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol, Yaron Koral
ACM/IEEE ANCS,
2012

This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system to a crawl at best. In our architecture, called MCA^2 – Multi-Core Architecture for Mitigating Complexity Attacks – cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handle all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack takes place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
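The core dispatch idea can be sketched as follows; the heaviness predicate, core split, and round-robin policy here are illustrative stand-ins for MCA^2's actual identification and load-balancing machinery.

```python
def dispatch(messages, n_cores, is_heavy, heavy_frac=0.25):
    """Toy MCA^2-style dispatch: suspected-heavy messages go to a small
    dedicated set of cores so the remaining cores keep serving normal
    traffic at its usual service level.  Assumes n_cores is large enough
    to leave at least one core in each pool."""
    heavy_cores = max(1, int(n_cores * heavy_frac))
    normal_cores = n_cores - heavy_cores
    assignment = {c: [] for c in range(n_cores)}
    h = n = 0
    for m in messages:
        if is_heavy(m):
            assignment[h % heavy_cores].append(m)            # cores 0..heavy_cores-1
            h += 1
        else:
            assignment[heavy_cores + n % normal_cores].append(m)
            n += 1
    return assignment
```

Even if the heavy messages saturate their dedicated pool, the normal pool's throughput is unaffected, which is the quality-of-service isolation the architecture is after.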

Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy, Udi Ben-Porat
IFIP Networking,
2012
Conferences & Workshops
Anat Bremler-Barr, Hanoch Levy
INFOCOM,
2005

A new approach for filtering spoofed IP packets, called the spoofing prevention method (SPM), is proposed. The method enables routers closer to the destination of a packet to verify the authenticity of the source address of the packet. This stands in contrast to standard ingress filtering, which is effective mostly at routers next to the source and is ineffective otherwise. In the proposed method a unique temporal key is associated with each ordered pair of source and destination networks (ASes, autonomous systems). Each packet leaving a source network S is tagged with the key K(S, D) associated with (S, D), where D is the destination network. Upon arrival at the destination network the key is verified and removed. Thus the method verifies the authenticity of packets carrying the address s, which belongs to network S. An efficient implementation of the method, ensuring not to overload the routers, is presented. The major benefits of the method are the strong incentive it provides to network operators to implement it, and the fact that the method lends itself to stepwise deployment, since it benefits networks deploying the method even if it is implemented only on parts of the Internet. These two properties, not shared by alternative approaches, make it an attractive and viable solution to the packet spoofing problem.
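A minimal sketch of the tagging/verification flow, using an HMAC as a stand-in for the paper's temporal key K(S, D); key distribution and rotation are out of scope here, and all class and method names are ours.

```python
import hmac, hashlib, os

class SPMNode:
    """Toy SPM endpoint for one network (AS): egress tags packets with
    the key shared with the destination network, ingress verifies and
    strips the tag, so packets spoofing this source are rejected."""
    def __init__(self, name):
        self.name, self.keys = name, {}       # peer network name -> shared key

    def peer_with(self, other):
        k = os.urandom(16)                    # temporal key for this pair
        self.keys[other.name] = k
        other.keys[self.name] = k

    def tag(self, dst, payload):              # at egress of source network S
        mac = hmac.new(self.keys[dst], payload, hashlib.sha256).digest()[:8]
        return mac + payload

    def verify(self, src, packet):            # at ingress of destination network D
        mac, payload = packet[:8], packet[8:]
        want = hmac.new(self.keys[src], payload, hashlib.sha256).digest()[:8]
        return payload if hmac.compare_digest(mac, want) else None
```

A spoofer outside network S does not know K(S, D), so its packets fail verification at the destination network's edge, near the victim rather than near the source.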

Patents
Anat Bremler-Barr, Hank Nussbacher, Roi Hermoni, Dan Touitou
Patent,
2005

A method for communication includes coupling a first port of a Layer-3 packet router to receive communication traffic from a network, the traffic including packets destined for a target address, which is accessible via a second port of the router. At the router, the packets that are destined for the target address are diverted to a traffic processor via a third port of the router. The diverted packets are processed at the traffic processor and returned to the router via the third port. At the router, the processed packets are conveyed from the third port to the second port for delivery to the target address.

Patents
Yehuda Afek, Anat Bremler-Barr
Patent,
2001

A method for screening packet-based communication traffic. At least a first data packet, sent over a network (40) from a source address to a destination address, is received. A determination is made, by analyzing the first data packet, that the first data packet was generated by a worm. In response to the determination, a second data packet sent over the network from the source address is blocked.

Patents
Anat Bremler-Barr, Dan Touitou, Keren Horvitz, Rephael Tzadikario, Yehuda Afek
Patent,
2001

An improved network device that controls throughput of packets received thereby, e.g., to downstream devices or to downstream logic contained within the same network device. The network device comprises a scheduler that schedules one or more packets of a selected class for throughput as a function of a weight of that class and weights of one or more other classes. The weight of at least the selected class is dynamic and is a function of a history of volume of packets received by the network device in the selected class. An apparatus for protecting against overload conditions on a network, e.g., of the type caused by DDoS attacks, has a scheduler and a token bucket mechanism, e.g., as described above. Such apparatus can also include a plurality of queues into which packets of the respective classes are placed on receipt by the apparatus. Those packets are dequeued by the scheduler, e.g., in the manner described above, for transmittal to downstream devices (e.g., potential victim nodes) on the network.
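The patent's dynamic-weight idea can be sketched roughly as follows: per-class queues whose scheduling weight shrinks as the class's received volume grows, so a flooding class throttles itself. The weight formula and burst-sharing policy below are illustrative, not the patent's exact mechanism.

```python
from collections import deque

class AdaptiveScheduler:
    """Toy weighted scheduler: each class's weight is a decreasing
    function of its received-volume history, so a DDoS class that
    floods the device is automatically de-prioritized."""
    def __init__(self, classes):
        self.queues = {c: deque() for c in classes}
        self.volume = {c: 0 for c in classes}      # history of received volume

    def enqueue(self, cls, pkt):
        self.queues[cls].append(pkt)
        self.volume[cls] += 1

    def weight(self, cls):
        return 1.0 / (1 + self.volume[cls])        # heavier history -> lower weight

    def dequeue_burst(self, budget):
        """Dequeue roughly `budget` packets, shared in proportion to weights."""
        total = sum(self.weight(c) for c in self.queues if self.queues[c])
        out = []
        for c, q in self.queues.items():
            if not q or total == 0:
                continue
            share = max(1, round(budget * self.weight(c) / total))
            for _ in range(min(share, len(q))):
                out.append((c, q.popleft()))
        return out
```

A class that has sent 10x the volume gets roughly a tenth of the weight, so legitimate classes keep most of the downstream bandwidth even while the attack class's queue overflows.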

Conferences & Workshops
Anat Bremler-Barr
NANOG,
2001

Network engineers have been known to use diversion to blackhole DDoS attacks. This technique may divert and blackhole legitimate traffic. We present a method that provides availability under DDoS attacks by combining different diversion methods with a mechanism that sieves the “bad” packets and forwards the “good” packets to the intended victim. The method minimizes demand on router resources and does not introduce additional elements on the normal data path.

The diversion method allows a sieving mechanism to process only the victims’ traffic. The system is employable on a provider’s backbone, preferably at the peering points. Furthermore, since diversion is done on demand for different targets at different periods of time, the solution can be shared by a large number of potential victims and can protect any element in the provider’s backbone. This method can also be applied on egress traffic, thus enabling a service provider to clean attack traffic generated within its own network. Various alternative methods of transparently diverting a victim’s traffic and returning its legitimate traffic will be presented.

Deep Packet Inspection (DPI)

Journal
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
ACM/IEEE Transactions on Networking,
2018

We present range encoding with no expansion (RENÉ), a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion, and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique can be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that using TCAM, one can solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.
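For context, the classic scheme RENÉ improves on is prefix expansion, where one range may blow up into many TCAM rows (up to 2w−2 for width w); the sketch below shows that well-known baseline, not RENÉ's encoding.

```python
def range_to_prefixes(lo, hi, width):
    """Standard prefix expansion of the integer range [lo, hi] into
    ternary TCAM entries: bits plus '*' wildcards.  Each entry is one
    TCAM row, illustrating the row-expansion problem RENÉ avoids."""
    out = []
    def rec(prefix):
        free = width - len(prefix)
        base = (int(prefix, 2) << free) if prefix else 0
        top = base + (1 << free) - 1
        if base > hi or top < lo:
            return                              # disjoint from the range
        if lo <= base and top <= hi:
            out.append(prefix + "*" * free)     # fully covered: one row
            return
        rec(prefix + "0")
        rec(prefix + "1")
    rec("")
    return out
```

Even the tiny 3-bit range [1, 6] needs four rows here, whereas RENÉ's point of departure is encoding short ranges with no row expansion at all.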

Projects, thesis, and dissertations
Lior Barak, Yotam Harchol, Anat Bremler-Barr
Project,
2016

Today, most network traffic needs to traverse several middleboxes before it can reach its destination. A common operation among many of these middleboxes is DPI (Deep Packet Inspection), which allows them to perform different actions based on patterns in the packet content.
DPI consumes a large share of a middlebox's resources. In addition, each packet usually traverses several middleboxes, so the same packet is scanned by different DPI engines over and over again. As a result the network becomes less efficient, which directly affects its total bandwidth.
One solution to these issues is a system that provides DPI as a service: middleboxes in the network that need DPI can register with the service and expose their desired patterns. The system directs packets to designated DPI engine instances across the network and passes the pattern matches, if any, to the relevant middleboxes.
Such a system has many advantages, among others: a single scan of every packet, the ability to upgrade to the latest DPI algorithms, better partitioning of packets between DPI engines, and increased middlebox development innovation. Developing such a system is simpler today than ever with the emergence of SDN, which allows dynamic routing of network traffic using a centralized controller.
The goal of this work is to implement a prototype of the DPI as a service system and to provide an environment as realistic as possible in which to evaluate it. This paper documents the design and implementation of the system and of other tools needed to deploy a functioning network that uses it.
Finally, the paper describes the experiments done to demonstrate the system's correctness and effectiveness and discusses their results.

Journal
Yehuda Afek, Anat Bremler-Barr, David Hay, Yotam Harchol
ACM/IEEE Transactions on Networking,
2016

This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low-bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst-case performance (under an attack). Still, under normal traffic its running time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 (Multi-Core Architecture to Mitigate Complexity Attacks), which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Daniel Krauthgamer, Shimrit Tzur David
IFIP Networking,
2016

URL matching lies at the core of many networking applications and Information Centric Networking architectures. For example, URL matching is extensively used by Layer 7 switches, ICN/NDN routers, load balancers, and security devices. Modern URL matching is done by maintaining a rich database that consists of tens of millions of URLs, classified into dozens of categories (or egress ports). In real time, any input URL has to be searched in this database to find the corresponding category.
In this paper, we introduce a generic framework for accurate URL matching (namely, no false positives or miscategorization) that aims to reduce the overall memory footprint while still having low matching latency. We introduce a dictionary-based compression method that compresses the database by 60%, while incurring only a slight overhead in time. Our framework is very flexible: it allows hot updates and cloud-based deployments, and it can deal with strings that are not URLs.

Projects, thesis, and dissertations
By Asher Gruber (Supervisor: Prof. Anat Bremler-Barr)
Project,
2016

Deep Packet Inspection (DPI) is a widespread functionality among middlebox applications. In many cases, DPI is performed by Intrusion Detection Systems (IDS) such as Snort. Traditionally, each packet is re-scanned by multiple middleboxes until reaching its final destination. Recent studies show that DPI is one of the most processing-intensive tasks of modern middleboxes.
The DPI as a Service paper presents a framework that extracts the time-consuming task of DPI out of the middleboxes and provides it as a service. Alongside the framework design, the authors introduce a reference implementation in a simulated environment, demonstrating promising results through a set of experiments.
In this work we have enhanced the reference implementation in order to demonstrate that the framework can operate in a more realistic environment. First and foremost, we have integrated the framework with the commonly used Snort IDS. Second, we have extended the DPI service to support the Network Service Header (NSH) protocol, which allows passing the pattern-match results along with the inspected packet. These significant enhancements transformed the reference implementation into a more robust system, which can take proactive measures upon detection of a malicious pattern.
Finally, once the work on the framework was completed, we were able to perform the basic experiments reported in the original paper. Our findings indicate that the original results are reproducible in our version of the framework.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
SPAA,
2016

We present RENÉ, a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion, and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique can be used to efficiently solve other hard problems such as the nearest-neighbor search problem and its variants. We show that using TCAM, one can solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol
SIGCOMM,
2016

We present OpenBox — a software-defined framework for network-wide development, deployment, and management of network functions (NFs). OpenBox effectively decouples the control plane of NFs from their data plane, similarly to SDN solutions that only address the network’s forwarding plane.
OpenBox consists of three logic components. First, user-defined OpenBox applications provide NF specifications through the OpenBox north-bound API. Second, a logically-centralized OpenBox controller is able to merge logic of multiple NFs, possibly from multiple tenants, and to use a network-wide view to efficiently deploy and scale NFs across the network data plane. Finally, OpenBox instances constitute OpenBox’s data plane and are implemented either purely in software or contain specific hardware accelerators (e.g., a TCAM). In practice, different NFs carry out similar processing steps on the same packet, and our experiments indeed show a significant improvement of the network performance when using OpenBox. Moreover, OpenBox readily supports smart NF placement, NF scaling, and multi-tenancy through its controller.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yacov Hel-Or
ACM DaMoN,
2015

Similarity search, and specifically the nearest-neighbor (NN) search problem, is widely used in many fields of computer science such as machine learning, computer vision and databases. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with the dimension d. This causes severe performance degradation when working in high-dimensional spaces. Approximate techniques such as locality-sensitive hashing improve the performance of the search, but are still computationally intensive.
In this paper we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, which is a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This concept does not suffer from the curse of dimensionality and is shown to improve the best known approaches for NN by more than four orders of magnitude. Simulation results demonstrate dramatic improvement over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
ACM/IEEE ANCS,
2015

Configuring range-based packet classification rules in network switches is crucial to all network core functionalities, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks an interface to configure range rules directly and only provides mask-based rules, named flow entries.
In this work we present ORange, the first solution to multi-dimensional range classification in OpenFlow. Our solution is based on paradigms used in state-of-the-art non-OpenFlow classifiers and is designed in a modular fashion, allowing future extensions and improvements. We consider switch space utilization as well as atomic update functionality, and in the network context we provide flow consistency even if flows change their entrance point to the network during policy updates, a property we name cross-entrance consistency. Our scheme achieves remarkable results and is easy to deploy.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Shimrit Tzur David
INFOCOM,
2015

Deep Packet Inspection (DPI) plays a major role in contemporary networks. Specifically, in datacenters of content providers, the scanned data may be highly repetitive. Most DPI engines are based on identifying signatures in the packet payload. This pattern matching process is expensive both in memory and CPU resources, and thus, often becomes the bottleneck of the entire application.
In this paper we show how DPI can be accelerated by leveraging repetitions in the inspected traffic. Our new mechanism makes use of these repetitions to allow the repeated data to be skipped rather than scanned again. The mechanism consists of a slow path, in which frequently repeated strings are identified and stored in a dictionary, along with some succinct information for accelerating the DPI process, and a data path, where the traffic is scanned byte by byte but strings from the dictionary, if encountered, are skipped. Upon skipping, the data path recovers to the state it would have been in had the scanning continued byte by byte.
Our solution achieves a significant performance boost, especially when data is from the same content source (e.g., the same website). Our experiments show that for such cases, our solution achieves a throughput gain of 1.25–2.5 times the original throughput, when implemented in software.
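The slow-path/data-path split can be sketched as follows. Here a naive scanner plays the DPI engine, dictionary regions are marked simply by chunk equality, and a cross-boundary match can span at most max_len − 1 bytes, so only the seams around a skipped region need re-scanning; all names are ours.

```python
def scan(text, patterns):
    """Naive multi-pattern scan standing in for a real DPI engine."""
    return [(i, p) for p in patterns for i in range(len(text))
            if text.startswith(p, i)]

class SkippingScanner:
    def __init__(self, patterns):
        self.patterns = patterns
        self.max_len = max(len(p) for p in patterns)
        self.cache = {}                 # slow path: frequent string -> its matches

    def learn(self, s):                 # pre-scan a frequently repeated string once
        self.cache[s] = scan(s, self.patterns)

    def scan_chunks(self, chunks):
        """Data path: chunks equal to a learned string are skipped;
        only the seam at their boundary is re-scanned."""
        results, carry, pos = [], "", 0          # pos = offset of current chunk
        for s in chunks:
            if s in self.cache:                  # skip: reuse cached matches
                results += [(pos + i, p) for i, p in self.cache[s]]
                seam = carry + s[:self.max_len - 1]
                for i, p in scan(seam, self.patterns):
                    if i < len(carry) < i + len(p):   # crosses the boundary
                        results.append((pos - len(carry) + i, p))
            else:                                # scan byte by byte
                seam = carry + s
                for i, p in scan(seam, self.patterns):
                    if i + len(p) > len(carry):       # not already reported
                        results.append((pos - len(carry) + i, p))
            pos += len(s)
            tail = self.max_len - 1
            carry = (carry + s)[-tail:] if tail else ""
        return sorted(results)
```

The real mechanism skips inside an Aho-Corasick scan and must also restore the automaton state after a skip; this sketch only captures the "scan repeated data once, re-scan only boundaries" principle.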

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yaron Koral, Michela Becchi, Omer Kochba
INFOCOM,
2015

This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic is constantly increasing. Second, due to their superior expressibility, current Deep Packet Inspection engines use regular expressions more and more frequently.
We present an algorithmic framework to accelerate such matching, taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed through the GZIP protocol, which uses back-references to repeated strings. Our algorithm is based on calculating (for every byte) the minimum number of (previous) bytes that can be part of a future regular expression matching. When inspecting a back-reference, only these bytes should be taken into account, thus enabling one to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and gains performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which usually are based either on NFA, DFA or combinations of the two. Finally, we discuss other applications in which calculating the number of relevant bytes becomes handy, even when the traffic is not compressed.

Conferences & Workshops
Anat Bremler-Barr, David Hay, Yotam Harchol, Yaron Koral
ACM CoNEXT,
2014

Middleboxes play a major role in contemporary networks, as forwarding packets is often not enough to meet operator demands, and other functionalities (such as security, QoS/QoE provisioning, and load balancing) are required. Traffic is usually routed through a sequence of such middleboxes, which either reside across the network or in a single, consolidated location. Although middleboxes provide a vast range of different capabilities, there are components that are shared among many of them.

A task common to almost all middleboxes that deal with L7 protocols is Deep Packet Inspection (DPI). Today, traffic is inspected from scratch by all the middleboxes on its route. In this paper, we propose to treat DPI as a service to the middleboxes, implying that traffic should be scanned only once, but against the data of all middleboxes that use the service. The DPI service then passes the scan results to the appropriate middleboxes. Having DPI as a service has significant advantages in performance, scalability, robustness, and as a catalyst for innovation in the middlebox domain. Moreover, technologies and solutions for current Software Defined Networks (SDN) (e.g., SIMPLE [42]) make it feasible to implement such a service and route traffic to and from its instances.

Journal
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
Computer Networks,
2014

A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, or exponential (with key size) space, or expensive special hardware whose cost and latency dramatically increase with the priority queue size. Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k − 1 primitive priority queues of size n^(1/k) (k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single, size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three √n or five ∛n primitive H/W priority queues.
Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable perfect line-rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers and similar technologies.
As a byproduct of our technique we present an O(n)-time sorting algorithm in a system equipped with an O(w√n)-entry TCAM, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
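The two-level structure can be illustrated in software, with plain binary heaps standing in for the primitive hardware PQs; unlike the paper's PPQ, this sketch rebuilds the exit queue per extraction and makes no O(1) claim.

```python
import heapq

class TwoLevelPQ:
    """Illustration of the idea only: many small "primitive" PQs of
    bounded size, plus an exit PQ over their minima.  The paper's PPQ
    keeps the exit queue in a sqrt(n)-sized hardware PQ, maintained
    incrementally, to reach O(1) steps per operation."""
    def __init__(self, bucket_size):
        self.bucket_size, self.buckets = bucket_size, []

    def insert(self, x):
        for b in self.buckets:
            if len(b) < self.bucket_size:
                heapq.heappush(b, x)
                return
        self.buckets.append([x])        # open a new primitive PQ

    def extract_min(self):
        # rebuild the "exit" PQ of bucket minima (hardware would maintain it)
        exit_pq = [(b[0], i) for i, b in enumerate(self.buckets) if b]
        _, i = min(exit_pq)
        return heapq.heappop(self.buckets[i])
```

With bucket_size ≈ √n, no single primitive queue ever exceeds √n elements, which is exactly the resource the construction economizes on.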

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Shir Landau Feibish
ACM/IEEE ANCS,
2013

We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P of messages captured in the network at peacetime (i.e., mostly legitimate traffic) and A captured during attack time (i.e., contains many attack messages), we present a tool for extracting a set S of strings, that are frequently found in A and not in P . Therefore, a packet containing one of the strings from S is likely to be an attack packet.
This is an important tool in protecting sites on the Internet from Worm attacks, and Distributed Denial of Service (DDoS) attacks. It may also be useful for other problems, including command and control identification, DNA-sequences analysis, etc. which are beyond the scope of this work.
The two contributions of this paper are the system we developed to extract the required signatures, together with the problem definition, and the string heavy-hitters algorithm. This algorithm finds popular strings of variable length in a set of messages by using the classic heavy-hitters algorithm as a building block in a non-trivial way. The algorithm is then used by our system to extract the desired signatures. Using our system, a yet-unknown attack can be detected and stopped within minutes of its start.
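The double-set idea can be sketched with a naive k-gram frequency filter. This is a toy stand-in for the paper's string heavy-hitters algorithm, not the algorithm itself; the value of k and both thresholds are arbitrary assumptions of ours.

```python
from collections import Counter

def extract_signatures(peace_msgs, attack_msgs, k=8,
                       attack_freq=0.5, peace_freq=0.01):
    """Sketch: return k-grams frequent in attack-time traffic A
    but rare in peacetime traffic P (thresholds are assumptions)."""
    def kgram_freq(msgs):
        c = Counter()
        for m in msgs:
            # count each distinct k-gram once per message
            c.update({m[i:i + k] for i in range(len(m) - k + 1)})
        return {g: n / len(msgs) for g, n in c.items()}
    p, a = kgram_freq(peace_msgs), kgram_freq(attack_msgs)
    return {g for g, f in a.items()
            if f >= attack_freq and p.get(g, 0.0) <= peace_freq}
```

A real system must handle variable-length strings and streaming volumes, which is exactly what the heavy-hitters building block provides; this sketch only shows the differential-frequency criterion.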

Conferences & Workshops
Yehuda Afek, Anat Bremler-Barr, Liron Schiff
SPAA,
2013

A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a RAM of size n is presented. All priority queue implementations to date either require O(log n) instructions per operation, exponential (in key size) space, or expensive special hardware whose cost and latency increase dramatically with the priority queue size. Hence, constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster)
while maintaining O(1) steps per PQ operation is critical. Here we present such an acceleration technique, called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k − 1 primitive priority queues of size k√n (the k-th root of n; k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single, size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three √n-element or five ∛n-element primitive H/W priority queues.
Applying our technique to a TCAM-based priority queue results in the TCAM-PPQ, a scalable structure that provides perfect line-rate fair queuing for millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays,
shift registers, and similar technologies.
As a byproduct of our technique, we present an O(n)-time sorting algorithm for a system equipped with a TCAM of O(w√n) entries, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used a TCAM of Ω(n) entries. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.

Journal
Anat Bremler-Barr, David Hay, Yaron Koral
IEEE/ACM Transactions on Networking,
2013

A central component in all contemporary intrusion detection systems (IDSs) is their pattern matching algorithm, which is often based on constructing and traversing a deterministic finite automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring the storage of very large DFAs, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as on its power consumption and cost. We propose a novel method to compress DFAs by observing that the state identifiers used by common DFA encodings are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom and encode states in such a way that all transitions to a specific state are represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. The problem of pattern matching is thereby reduced to the well-studied problem of Longest Prefix Match (LPM), which can be solved either in ternary content-addressable memory (TCAM), in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gb/s with low power consumption.
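The reduction to LPM can be illustrated with a toy automaton. The encoding below is our own illustration, not the paper's actual construction: a DFA for the single pattern "ab" over {a, b}, with state codes chosen so that all sources of each transition share a code prefix, letting one prefix rule replace many per-state entries.

```python
def lpm_lookup(table, key):
    """Return the value of the longest prefix of `key` present in `table`."""
    for i in range(len(key), -1, -1):
        if key[:i] in table:
            return table[key[:i]]
    raise KeyError(key)

# State codes (our choice): q0 = '00', q2 (accepting, after "ab") = '01',
# q1 (after 'a') = '10'.  Transitions into each state collapse to one rule:
TRANS = {
    'a': {'': '10'},             # every state moves to q1 on 'a'
    'b': {'1': '01', '0': '00'}  # q1 -> q2; q0/q2 (prefix '0') -> q0
}

def matches(text):
    state, hits = '00', []
    for pos, ch in enumerate(text):
        state = lpm_lookup(TRANS[ch], state)
        if state.startswith('01'):   # accepting states share prefix '01'
            hits.append(pos)         # pattern "ab" ends at this position
    return hits
```

In hardware, each per-symbol table becomes a handful of ternary entries and the lookup is a single TCAM or IP-lookup operation per input byte.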

Journal
Anat Bremler-Barr, Danny Handler
IEEE Transactions on Computers,
2012

Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput unparalleled by software-based solutions. TCAMs are not well-suited, however, for representing rules that contain range fields. Such rules typically have to be represented (or encoded) by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization. A TCAM range-encoding algorithm A is database-independent if, for all ranges r, it encodes r independently of the database in which it appears; otherwise, we say that A is database-dependent. Typically, when storing a classification database in TCAM, a few dozen so-called extra bits in each TCAM entry remain unused. These extra bits are used by some (both database-dependent and database-independent) prior algorithms to reduce range expansion. The majority of real-life database ranges are short. We present a novel database-independent algorithm called Short Range Gray Encoding (SRGE) for the efficient representation of short-range rules. SRGE encodes range endpoints as binary-reflected Gray codes and then represents the resulting range by a minimal set of ternary strings. To the best of our knowledge, SRGE is the first algorithm that achieves a reduction in range expansion in general, and a significant expansion reduction for short ranges in particular, without resorting to the use of extra bits. The "traditional" database-independent technique for representing range entries in TCAM is prefix expansion. As we show, SRGE significantly reduces the expansion of short ranges in comparison with prefix expansion. We also prove that the SRGE algorithm's range expansion is at least as good as that of prefix expansion for any range. Real-world classification databases contain a small number of unique long ranges, some of which appear in numerous rules.
These long ranges cause high expansion, which is not significantly reduced by any database-independent range-encoding scheme that we are aware of, including SRGE. We introduce hybrid SRGE, a database-dependent encoding scheme that uses SRGE to reduce the expansion of short ranges and uses extra bits to reduce the expansion caused by long ones. Our comparative analysis establishes that hybrid SRGE utilizes TCAM more efficiently than previously published range-encoding algorithms. This work also makes a more theoretical contribution. Prefix expansion for ranges defined by W-bit endpoints has a worst-case expansion ratio of 2W − 2. It follows from the work of Schieber et al. [1] that the SRGE algorithm has a slightly better worst-case expansion ratio of 2W − 4. We prove that any database-independent TCAM encoding scheme has a worst-case expansion ratio of at least W.
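The core property SRGE exploits, that binary-reflected Gray codes of consecutive integers differ in exactly one bit, can be checked directly. The helper below is a simplified building block of ours, not the full SRGE encoder: it covers only a two-element range with a single ternary string.

```python
def gray(x):
    """Binary-reflected Gray code: adjacent integers differ in one bit."""
    return x ^ (x >> 1)

def ternary_cover_adjacent(a, w):
    """Encode the 2-element range [a, a+1] as one w-bit ternary string in
    Gray space: the single differing bit position becomes a don't-care."""
    ga, gb = gray(a), gray(a + 1)
    diff = ga ^ gb               # exactly one bit set
    bits = []
    for i in reversed(range(w)):
        if diff >> i & 1:
            bits.append('*')     # don't-care in the differing position
        else:
            bits.append(str(ga >> i & 1))
    return ''.join(bits)
```

For contrast, in plain binary 5 = 0101 and 6 = 0110 differ in two bits, so prefix expansion needs two entries for [5, 6]; in Gray space the pair collapses into the single ternary string 01*1.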

Journal
Yehuda Afek, Anat Bremler-Barr, Yaron Koral
Computer Communications,
2012

In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor in this process, imposed by the compression, is the high memory requirement of 32 KB per connection. This leads to a requirement of hundreds of megabytes to gigabytes of main memory in a multi-connection setting. We introduce new algorithms and techniques that drastically reduce this space requirement for bump-in-the-wire devices such as security and other content-based networking tools. Our proposed scheme improves both space and time performance, by almost 80% and over 40% respectively, thus making real-time compressed traffic inspection a viable option for networking devices.

    Patents
    Yehuda Afek, Anat Bremler-Barr, Yaron Koral
    Patent,
    2012

    A method for processing communication traffic includes receiving an incoming stream of compressed data conveyed by a sequence of data packets, each containing a respective portion of the compressed data. The respective portion of the compressed data contained in the first packet is stored in a buffer having a predefined buffer size. Upon receiving a subsequent packet, at least a part of the compressed data stored in the buffer and the respective portion of the compressed data contained in the subsequent packet are decompressed, thereby providing decompressed data. A most recent part of the decompressed data that is within the buffer size is recompressed and stored in the buffer.
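The buffer-recompression idea can be sketched with zlib. This is a deliberate simplification of ours: we feed plain data and only model the bounded, recompressed per-connection history, whereas the patented method operates on the compressed packet stream itself.

```python
import zlib

WINDOW = 32 * 1024  # DEFLATE back-references reach at most 32 KB back

class ConnectionState:
    """Sketch: instead of keeping 32 KB of plain decompressed history per
    connection, keep the history recompressed and unpack it only when the
    next packet for this connection arrives."""
    def __init__(self):
        self.packed_history = zlib.compress(b'')

    def on_packet(self, new_plain_data):
        # The real scheme decompresses the packet payload against the
        # history; here we accept plain data for brevity.
        history = zlib.decompress(self.packed_history)
        history = (history + new_plain_data)[-WINDOW:]  # keep last 32 KB
        self.packed_history = zlib.compress(history)    # re-store packed
        return new_plain_data
```

The per-connection footprint thus shrinks from the full 32 KB window to the (typically much smaller) recompressed form, at the cost of one decompress/recompress cycle per packet.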

    Conferences & Workshops
    Anat Bremler-Barr, David Hay, Yaron Koral, Shimrit Tzur David
    INFOCOM,
    2012

    Deep Packet Inspection (DPI) is the most time- and resource-consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention Systems (NIDS/IPS), Web Application Firewalls (WAF), and Content Filtering Proxies. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
    The portion of compressed traffic in overall Internet traffic is constantly increasing. This paper focuses on traffic compressed using a shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of inter-response redundancy (e.g., almost the same data being sent over and over again), which is typical of today's dynamic data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well with other compression algorithms (such as Gzip), making it even more appealing. Performing DPI on any compressed traffic is considered hard; therefore, today's security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
    We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and on marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works at nearly the rate of the compressed traffic, implying a speed gain equal to SDCH's compression ratio (around 40%). We also discuss how to deal with SDCH compression over Gzip compression, and show how to perform regular expression matching with about the same speed gain.

    Journal
    Anat Bremler-Barr, David Hay, Danny Handler
    Computer Networks,
    2012

    Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput.

    TCAMs are not well-suited, however, for representing rules that contain range fields, and previously published algorithms typically represent each such rule by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization because it introduces a large number of redundant TCAM entries. This redundancy can be mitigated by making use of extra bits available in each TCAM entry.

    We present a scheme for constructing efficient representations of range rules, based on the simple observation that sets of disjoint ranges may be encoded much more efficiently than sets of overlapping ranges. Since the ranges in real-world classification databases are, in general, non-disjoint, the algorithms we present split the ranges between multiple layers, each consisting of mutually disjoint ranges. Each layer is then coded and assigned its own set of extra bits.

    Our layering algorithms are based on approximations for specific variants of interval-graph coloring. We evaluate these algorithms by performing an extensive comparative analysis on real-life classification databases. Our analysis establishes that our algorithms reduce the number of redundant TCAM entries caused by range rules by more than 60% compared with the best prior range-encoding work.
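The layering step can be sketched with a first-fit greedy coloring of the interval graph. This is an illustrative simplification of ours; the paper uses approximations tailored to specific coloring variants rather than this plain greedy rule.

```python
def layer_ranges(ranges):
    """Split overlapping (lo, hi) ranges (hi inclusive) into layers of
    mutually disjoint ranges, first-fit greedy over sorted ranges."""
    layers = []
    for lo, hi in sorted(ranges):
        for layer in layers:
            # Place the range in the first layer it does not overlap.
            if all(hi < l or h < lo for l, h in layer):
                layer.append((lo, hi))
                break
        else:
            layers.append([(lo, hi)])
    return layers
```

Each resulting layer contains only disjoint ranges and can therefore be encoded compactly with its own set of extra bits, which is the premise of the scheme above.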

    Patents
    Anat Bremler-Barr, David Hay, Yaron Koral
    Patent,
    2011

    A method for processing data includes encoding a finite automaton, which includes states and transitions between the states that express a plurality of predefined patterns, by grouping the states of the automaton into sets according to a common property shared by the states in each set, and assigning codes to the states according to the grouping. The codes are stored in an electronic memory, along with rules that are associated with the patterns. The automaton is traversed in order to identify one or more of the patterns in an input sequence of data elements by iteratively reading out the codes from the memory responsively to the data elements and to the codes that have been previously read out. Upon identifying a given pattern in the input sequence, an associated action is performed.

    Conferences & Workshops
    Anat Bremler-Barr, Yaron Koral, Victor Zigdon
    IEEE HPSR,
    2011

    Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signatures. The overhead imposed by the decompression inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities.
    The ACCH algorithm was the first to present a unified approach to pattern matching and decompression, taking advantage of information gathered in the decompression phase to accelerate the pattern matching. ACCH accelerated the DFA-based Aho-Corasick multi-pattern matching algorithm. In this paper, we present a novel algorithm, SPC (Shift-based Pattern matching for Compressed traffic), that accelerates the commonly used Wu-Manber pattern matching algorithm. SPC is simpler and has higher throughput and lower storage overhead than ACCH. Analysis of real web traffic and real security-device signatures shows that we can skip scanning up to 87.5% of the data and gain a performance boost of more than 51% compared to ACCH. Moreover, the technique requires only 4 KB of additional information per connection, compared to ACCH's 8 KB.
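The shift-table principle that Wu-Manber (and hence SPC) builds on can be sketched as follows. This is a textbook-style simplification of ours with naive candidate verification, not the SPC algorithm itself.

```python
def build_shift(patterns, B=2):
    """Wu-Manber-style SHIFT table: for each B-character block, how far
    the search window may safely slide without missing a match."""
    m = min(len(p) for p in patterns)   # window = shortest pattern length
    shift, default = {}, m - B + 1
    for p in patterns:
        for i in range(m - B + 1):
            block = p[i:i + B]
            # distance from this block's end to the window's end
            shift[block] = min(shift.get(block, default), m - B - i)
    return m, default, shift

def wm_search(text, patterns, B=2):
    """Slide the window by table lookups; verify naively on shift 0."""
    m, default, shift = build_shift(patterns, B)
    hits, pos = [], m - B              # pos: start of window's last block
    while pos + B <= len(text):
        s = shift.get(text[pos:pos + B], default)
        if s == 0:
            start = pos + B - m
            hits += [(p, start) for p in patterns
                     if text.startswith(p, start)]
            s = 1
        pos += s
    return hits
```

Because most blocks yield a non-zero shift, large stretches of the input are never examined byte by byte; SPC extends this skipping to bytes reproduced by GZIP back-references.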

    Conferences & Workshops
    Anat Bremler-Barr, David Hay, Yotam Harchol
    IEEE HPSR,
    2011

    Deep Packet Inspection (DPI) lies at the core of contemporary Network Intrusion Detection/Prevention Systems and Web Application Firewalls. DPI aims to identify various malware (including spam and viruses) by inspecting both the header and the payload of each packet and comparing it to a known set of patterns. DPI is often performed on the critical path of packet processing; thus, the overall performance of the security tools is dominated by the speed of DPI. The seminal algorithm of Aho-Corasick (AC) is the de-facto standard for pattern matching in network intrusion detection systems (NIDS). Basically, the AC algorithm constructs a Deterministic Finite Automaton (DFA) for detecting all occurrences of a given set of patterns by processing the input in a single pass. The input is inspected symbol by symbol (usually each symbol is a byte), such that each symbol results in a state transition. Thus, in principle, the AC algorithm has deterministic performance, which does not depend on the specific input and is therefore not vulnerable to algorithmic complexity attacks, making it very attractive for NIDS.
    In this paper we show that, when implementing the AC algorithm in software, this property does not hold, due to the fact that contemporary pattern sets induce very large DFAs that cannot be stored entirely in cache. We then propose a novel technique to compress the representation of the Aho-Corasick automaton so that it fits in modern cache. We compare both the performance and the memory footprint of our technique to previously proposed implementations, under various settings and pattern sets. Our results reveal the space-time tradeoffs of DPI. Specifically, we show that our compression technique reduces the memory footprint of the best prior-art algorithm by approximately 60%, while achieving comparable throughput.
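For reference, the AC automaton itself can be sketched in a few lines. This is a textbook implementation; the paper's contribution is a cache-friendly compressed representation of such an automaton, which this sketch does not attempt.

```python
from collections import deque

def build_ac(patterns):
    """Build a trie with BFS-computed failure links and output sets."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:                       # 1. trie of all patterns
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({}); fail.append(0); out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    q = deque(goto[0].values())              # 2. failure links by BFS
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit suffix matches
    return goto, fail, out

def ac_scan(text, goto, fail, out):
    """Single pass: one state transition per input symbol."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return hits
```

The dict-of-dicts layout used here is exactly the kind of pointer-heavy representation that blows past cache for large pattern sets, which motivates the compression studied above.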

    Conferences & Workshops
    Yehuda Afek, Anat Bremler-Barr, Yaron Koral
    IFIP Networking,
    2011

    Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content based networking devices are required to decompress the traffic of tens of thousands concurrent connections in order to inspect the content for different signatures. The major limiting factor in this process is the high memory requirements of 32KB per connection that leads to hundreds of megabytes to gigabytes of main memory consumption. This requirement inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities. In this paper we introduce new algorithms and techniques that drastically reduce this space requirement by over 80%, with only a slight increase in the time overhead, thus making real-time compressed traffic inspection a viable option for network devices.

    Journal
    Anat Bremler-Barr, Yaron Koral
    IEEE/ACM Transactions on Networking,
    2011

    Current security tools using ‘signature-based’ detection do not handle compressed traffic, whose market share is constantly increasing. This paper focuses on compressed HTTP traffic. HTTP uses GZIP compression and requires a decompression phase before string matching can be performed.
    We present a novel algorithm, the Aho-Corasick-based algorithm for Compressed HTTP (ACCH), that takes advantage of information gathered by the decompression phase in order to accelerate the commonly used Aho-Corasick pattern matching algorithm. By analyzing real HTTP traffic and real web application firewall signatures, we show that up to 84% of the data can be skipped in the scan. Surprisingly, we show that it is faster to perform pattern matching on the compressed data, even with the penalty of decompression, than on uncompressed traffic. To the best of our knowledge, this is the first paper to analyze the problem of ‘on-the-fly’ multi-pattern matching on compressed HTTP traffic and to suggest a solution.
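The observation behind ACCH can be illustrated with a toy accounting over LZ77 tokens. This is our own simplified model with a fixed conservative rescan margin, not the actual ACCH algorithm: bytes copied by a back-reference repeat already-scanned data, so only the copy's margins, where a pattern can straddle the boundary, must be rescanned.

```python
def skippable_fraction(tokens, max_pat_len):
    """Tokens are ('lit', byte) or ('ref', distance, length).  Count the
    fraction of output bytes whose scan can be skipped, conservatively
    rescanning `max_pat_len - 1` bytes at each end of every copy."""
    scanned = skipped = 0
    margin = max_pat_len - 1
    for tok in tokens:
        if tok[0] == 'lit':                   # literal byte: must be scanned
            scanned += 1
        else:
            length = tok[2]                   # ('ref', distance, length)
            rescan = min(length, 2 * margin)  # boundary bytes only
            scanned += rescan
            skipped += length - rescan
    total = scanned + skipped
    return skipped / total if total else 0.0
```

Since real HTTP responses are dominated by long back-references, the skippable share is large, which is why matching on the compressed stream can beat matching on the plain stream despite the decompression cost.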