Cloud
Today’s software development landscape has witnessed a shift towards microservices architectures. Using this approach, large software systems are composed of multiple separate microservices, each responsible for specific tasks. The breakdown to microservices is also reflected in the infrastructure, where individual microservices can be executed with different hardware configurations and scaling properties. As systems grow larger, incoming traffic can trigger multiple calls between different microservices to handle each request.
Auto-scaling is a technique widely used to adapt systems to fluctuating traffic loads by automatically increasing (scale-up) and decreasing (scale-down) the number of resources used.
Our work shows that when microservices with separate auto-scaling mechanisms work in tandem to process ingress traffic, they can overload each other. This overload results in throttling (DoS)
or the over-provisioning of resources (EDoS).
In the lecture we will demonstrate how an attacker can exploit the tandem behavior of microservices with different auto-scaling mechanisms to create an attack we denote as the Tandem Attack. We demonstrate the attack on a typical and recommended serverless architecture, using AWS Lambda for code execution and DynamoDB as database. Part of the results will be presented as an IEEE INFOCOM’23 poster.
The performance of distributed and data-centric applications often critically depends on the interconnecting network. Emerging reconfigurable datacenter networks (RDCNs) are a particularly innovative approach to improve datacenter throughput. Relying on a dynamic optical topology which can be adjusted towards the workload in a demand-aware manner, RDCNs allow to exploit temporal and spatial locality in the communication pattern, and to provide topological shortcuts for frequently communicating racks. The key challenge, however, concerns how to realize demand-awareness in RDCNs in a scalable fashion.
With the advent of cloud and container technologies, enterprises develop applications using a microservices architecture, managed by orchestration systems (e.g. Kubernetes), that group the microservices into clusters. As the number of application setups across multiple clusters and different clouds is increasing, technologies that enable communication and service discovery between the clusters are emerging (mainly as part of the Cloud Native ecosystem).
In such a multi-cluster setting, copies of the same microservice may be deployed in different geo-locations, each with different cost and latency penalties. Yet, current service selection and load balancing mechanisms do not take into account these locations and corresponding penalties.
We present \emph{MCOSS}, a novel solution for optimizing the service selection, given a certain microservice deployment among clouds and clusters in the system. Our solution is agnostic to the different multi-cluster networking layers, cloud vendors, and discovery mechanisms used by the operators. Our simulations show a reduction in outbound traffic cost by up to 72% and response time by up to 64%, compared to the currently-deployed service selection mechanisms.
It is a common belief that Auto-scaling mechanisms serve as a mitigation for Distributed Denial of Service (DDoS) attacks on cloud computing infrastructures by dynamically adding machines to cope with the additional load. Intuitively, such attacks are mostly associated with Economic Denial of Sustainability (EDoS) derived from paying for the extra resources required to process the malicious incoming traffic.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism that can cause significant performance degradation in addition to economic damage. We demonstrate the attack on Amazon EC2, Kubernetes, and serverless architecture. We then present and analyze Tandem Attack, a new attack on Microservices architecture. In this attack, the attacker exploits the tandem behavior of services with different auto-scaling mechanisms, causing both economic and performance damage.
Auto-scaling is a fundamental capability of cloud computing which allows consuming resources dynamically according to changing traffic needed to be served.
By the micro-services architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be individually scaled.
In this paper, we present a new attack the \emph{Tandem Attack} that exploits the Tandem behavior of micro-services with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS) created by malicious attackers or self-inflicted due to wrong configurations set up by administrators. We demonstrate the Tandem attack using a popular AWS serverless infrastructure modeling two services and show that removing servers’ management responsibility from the cloud users does not mitigate the different scaling properties challenge and can even make the problem harder to solve.
In recent years, we have witnessed a new kind of DDoS attack, the burst attack(Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presents a new kind of Burst attack, the YoYo attack (Bremler-Barr et al., 2017) that operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic loads cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates the flat DDoS attacks into Economic Denial of Sustainability attacks (EDoS), where the victim suffers from economic damage accrued by paying for extra resources required to process the traffic generated by the attacker. However, it was shown that YoYo attack also causes significant performance degradation since it takes time to scale-up VMs.In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks. As containerized cloud applications using Kubernetes gain popularity and replace VM-based architecture in recent years. We present experimental results on Google Cloud Platform, showing that even though the scale-up time of containers is much lower than VM, Kubernetes is still vulnerable to the YoYo attack since VMs are still involved. Finally, we evaluate ML models that can accurately detect YoYo attack on a Kubernetes cluster
Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism, that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends peri- odic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo- Yo attack is harder to detect and requires less resources from the attacker compared to traditional DDoS. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
Cybersecurity
Network Function Virtualization (NFV) holds a great promise as it provides flexibility and scalability, reduces costs, and promotes innovation (by moving from hardware-based middleboxes to software-based virtual network functions). These benefits, however, expose network functions to security vulnerabilities. In this paper, we investigate two such attack vectors: algorithmic complexity Denial of Service (DoS) attacks and attacks due to co-residency, which include side-channel attacks and DoS attacks on a specific machine. We propose Moving Target Defense (MTD) mechanisms—which force an attacker to cope with frequent changes ongoing within the targeted network function to carry out a successful attack through the above-mentioned attack vectors. For algorithmic complexity DoS attacks, we show a mechanism that proactively and reactively switches between different implementations of the network function. Thus, eliminating the certainty of the attacker regarding the targeted implementation. For co-residency attacks, we show a framework to efficiently migrate the virtual network function state without migrating the entire virtual machine, which is prohibitive in such a challenging setting. Our experiments show that both mechanisms can counteract these attack vectors and provide significantly better performance than state-of-the-art solutions.
Wi-Fi (IEEE 802.11) is the most-used protocol for wireless internet access on customer premises. The MAC address of each connected device, which used to be static, is being recently randomized (by the device’s operating system) as frequently as daily to prevent tracking and fingerprinting of devices and users. While this feature might be useful in public areas, it disturbs some day-to-day functionalities, such as firewalls, parental control, and similar applications that require a static identifier per device. In this work, we present methods to ensure the functionalities of these applications, even when the MAC address is changed every time the device connects to the network. Our methods work even if the latest MAC randomization techniques are applied and provide these device identifications only to the gateway router. (Potentially malicious) devices that are connected to the same LAN, still see the randomized MAC
With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE \cite{RotatE} knowledge graph embedding model, initialized with embeddings from Ada language model developed by OpenAI \cite{embeddingada}. Additionally, we extend this approach by initializing the embeddings for the relations. \ignore{This method surpasses previous attempts and provides a valuable tool for security teams to efficiently identify and respond to cybersecurity threats.
Unlike previous works that only handled CVEs present in the training set, our approach can deal with unseen entities. Furthermore, we contribute a comprehensive dataset and our models for future benchmarking.
Computer networks have undergone and continue to experience a major transformation, whereby billions of low-cost devices, Internet of Things (IoTs), are being connected to the network. Unlike traditional network devices, these devices typically have very limited computational, memory, and power resources, and attackers often exploit them to launch large-scale attacks.
This talk will highlight the security concerns of IoT devices from a networking perspective and explore how to secure IoT devices using whitelists, in which communication between a device and an endpoint is prohibited unless that endpoint appears in the device whitelist.
We present a new scalable ISP level architecture to secure and protect IoT devices. We address several challenges in the system design: the whitelist enforcement architecture, ML techniques to identify IoT devices, and the automatic acquisition of whitelists in the wild.
In this paper we present three attacks on private internal networks behind a NAT and a corresponding new
protection mechanism, Internal Network Policy, to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is
tricked to visit the attacker’s website, which contains a
malicious script that lets the attacker access the victim’s
internal network in different ways, including opening a
port in the NAT or sending a sophisticated request to
local devices. The first attack utilizes DNS Rebinding
in a particular way, while the other two demonstrate different methods of attacking the network, based on application security vulnerabilities. Following the attacks,
we provide a new browser security policy, Internal Network Policy (INP), which protects against these types of vulnerabilities and attacks. This policy is implemented
in the browser just like Same Origin Policy (SOP) and
prevents malicious access to internal resources by external entities.
In recent years the number of IoT devices in home networks has increased dramatically. Whenever a new device connects to the network, it must be quickly managed and secured using the relevant security mechanism or QoS policy. Thus a key challenge is to distinguish between IoT and NoT devices in a matter of minutes. Unfortunately, there is no clear indication of whether a device in a network is an IoT. In this paper, we propose different classifiers that identify a device as IoT or non-IoT, in a short time scale, and with high accuracy.
Our classifiers were constructed using machine learning techniques on a seen (training) dataset and were tested on an unseen (test) dataset. They successfully classified devices that were not in the seen dataset with accuracy above 95%. The first classifier is a logistic regression classifier based on traffic features. The second classifier is based on features we retrieve from DHCP packets. Finally, we present a unified classifier that leverages the advantages of the other two classifiers.
We provide an overview of Layer 2 attacks in OpenFlow: ARP Poisoning and a new DDos attack on the Controller, both implemented by us. We will describe our approach to mitigate these attacks, called Switch Reactive ARP-query. The key idea is to shift responsibilities back from the control-plane to the data-plane in order to reduce the load on the Controller.
ARP Poisoning is the kind of attack in which an attacker is able to alter or change the victim’s ARP cache in order to leverage it to Man in the Middle (MitM) attack or a Denial of Service (DoS) attack. A Distributed Denial of Service (DDoS) is a form of attack in which the victim’s resources are being depleted by multiple adversaries.
Both of these attacks are relevant in an OpenFlow-managed SDN network, where the contradicting relationship between the whole view of the network and the centralized Controller may clash.
In this paper, we have successfully mitigated ARP Poisoning attacks and have decreased dramatically and bounded the number of packet-ins, the main cause for the DDoS on the Controller.
This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system to a crawl at best. In our architecture, called MCA^2 – Multi-Core Architecture for Mitigating Complexity Attacks – cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handle all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack takes place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
DDoS attack
Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.
The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.
We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.
Auto-scaling is a fundamental capability of cloud computing which allows consuming resources dynamically according to changing traffic needed to be served.
By the micro-services architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be individually scaled.
In this paper, we present a new attack the \emph{Tandem Attack} that exploits the Tandem behavior of micro-services with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS) created by malicious attackers or self-inflicted due to wrong configurations set up by administrators. We demonstrate the Tandem attack using a popular AWS serverless infrastructure modeling two services and show that removing servers’ management responsibility from the cloud users does not mitigate the different scaling properties challenge and can even make the problem harder to solve.
Distributed denial of service (DDoS) attacks, especially distributed reflection denial of service attacks (DRDoS), have increased dramatically in frequency and volume in recent years. Such attacks are possible due to the attacker’s ability to spoof the source address of IP packets. Since the early days of the internet, authenticating the IP source address has remained unresolved in the real world. Although there are many methods available to eliminate source spoofing, they are not widely used, primarily due to a lack of economic incentives.
We propose a collaborative on-demand route-based defense technique (CORB) to offer efficient DDoS mitigation as a paid-for-service, and efficiently assuage reflector attacks before they reach the reflectors and flood the victim. The technique uses scrubbing facilities located across the internet at internet service providers (ISPs) and internet exchange points (IXPs).
By transmitting a small amount of data based on border gateway protocol (BGP) information from the victim to the scrubbing facilities, we can filter out the attack without any false-positive cases. For example, the data can be sent using DOTS, a new signaling DDoS protocol that was standardized by the IETF. CORB filters the attack before it is amplified by the reflector, thereby reducing the overall cost of the attack. This provides a win-win financial situation for the victim and the scrubbing facilities that provide the service.
We demonstrate the value of CORB by simulating a Memcached DRDoS attack using real-life data. Our evaluation found that deploying CORB on scrubbing facilities at approximately 40 autonomous systems blocks 90% of the attack and can reduce the mitigation cost by 85%.
In recent years, we have witnessed a new kind of DDoS attack, the burst attack(Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presents a new kind of Burst attack, the YoYo attack (Bremler-Barr et al., 2017) that operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic loads cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates the flat DDoS attacks into Economic Denial of Sustainability attacks (EDoS), where the victim suffers from economic damage accrued by paying for extra resources required to process the traffic generated by the attacker. However, it was shown that YoYo attack also causes significant performance degradation since it takes time to scale-up VMs.In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks. As containerized cloud applications using Kubernetes gain popularity and replace VM-based architecture in recent years. We present experimental results on Google Cloud Platform, showing that even though the scale-up time of containers is much lower than VM, Kubernetes is still vulnerable to the YoYo attack since VMs are still involved. Finally, we evaluate ML models that can accurately detect YoYo attack on a Kubernetes cluster
This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x on the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the ‘NS’ section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P the messages captured in the network at peacetime (i.e., mostly legitimate traffic) and A the messages captured during attack time (i.e., contains many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not in P , thus allowing the identification of the attack packets. This is an important tool in protecting sites on the Internet from worm attacks and distributed denial of service attacks and may also be useful for other problems, including command and control identification and the DNA-sequences analysis. The main contributions of this paper are the system we developed to extract the required signatures together with the string-heavy hitters problem definition and the algorithm for solving this problem. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. The algorithm runs in linear time requiring one-pass over the input. Our system makes use of this algorithm to extract the desired signatures. Furthermore, we provide an extended algorithm which is able to identify groups of signatures, often found together in the same packets, which further improves the quality of signatures generated by our system. Using our system, a yet unknown attack can be detected and stopped within minutes from attack start time.
In this work we measure what percentage of DNS recursive
resolvers perform negative caching in the wild. We deploy
our own authoritative name server and harness thousands
of RIPE Atlas [3] sites spread over the globe to perform
repeated DNS queries for non-existing sub-domains of our
authoritative domain.
Random Subdomain DDoS attacks on the Domain Name System (DNS) infrastructure are becoming a popular vector in recent attacks (e.g., recent Mirai attack on Dyn). In these attacks, many queries are sent for a single or a few victim domains, yet they include highly varying non-existent subdomains generated randomly.
Motivated by these attacks we designed and implemented novel and efficient algorithms for distinct heavy hitters (dHH). A (classic) heavy hitter (HH) in a stream of elements is a key (e.g., the domain of a query) which appears in many elements (e.g., requests).
When stream elements consist of <key, subkey> pairs, (<domain, subdomain>) a distinct heavy hitter (dhh) is a key that is paired with a large number of different subkeys. Our algorithms dominate previous designs in both the asymptotic (theoretical) sense and practicality. Specifically, the new fixed-size algorithms are simple to code and with asymptotically optimal space accuracy tradeoffs.
Based on these algorithms, we build and implement a system for the detection and mitigation of Random Subdomain DDoS attacks. We perform experimental evaluation, demonstrating the effectiveness of our algorithms.
Traditional DDoS anti-spoofing scrubbers require dedicated middleboxes thus adding CAPEX, latency and complexity in the network. This paper starts by showing that the current SDN match-and-action model is rich enough to implement a collection of anti-spoofing methods. Secondly we develop and utilize advance methods for dynamic resource sharing to distribute the required mitigation resources over a network of switches.
None of the earlier attempts to implement anti-spoofing in SDN actually directly exploited the match and action power of the switch data plane. They required additional functionalities on top of the match-and-action model, and are not implementable on an SDN switch as is. Our method builds on the premise that an SDN data path is a very fast and efficient engine to perform low level primitive operations at wire speed. The solution requires a number of flow-table rules and switch-controller messages proportional to the legitimate traffic. To scale when protecting multiple large servers the flow tables of multiple switches are harnessed in a distributed and dynamic network based solution.
We have fully implemented all our methods in either OpenFlow1.5 in Open-vSwitch and in P4. The system mitigates spoofed attacks on either the SDN infrastructure itself or on downstream servers.
Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism, that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends peri- odic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo- Yo attack is harder to detect and requires less resources from the attacker compared to traditional DDoS. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and dis- cussed. The vulnerability is exposed by showing how a simple low bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst case performance (under an attack). Still, under normal traffic its running-time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 —Multi- Core Architecture to Mitigate Complexity Attacks, which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache- miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.
Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance. The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.
This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system to a crawl at best. In our architecture, called MCA^2 – Multi-Core Architecture for Mitigating Complexity Attacks – cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handle all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack takes place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
Deep Packet Inspection (DPI)
We present range encoding with no expansion (RENÉ)- a novel encoding scheme for short ranges on Ternary content addressable memory (TCAM), which, unlike previous solutions, does not impose row expansion, and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound of number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also, how the same technique could be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems in much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than previous best prior-art solutions.
Most modern networks nowadays contain a massive amount of appliances, each appliance typically executes one network function (NF) (eg. Firewall). Each such appliance is bought, configured and administered separately. Most NFs perform some kind of Deep Packet Inspection (DPI). OpenBox provides a framework for network-wide deployment and management of NFs which decouples the NFs control plane from NFs data plane. OpenBox consists of three logic components. First, user-defined OpenBox Applications that provide NF specifications. Second, a logically-centralized OpenBox Controller (OBC) which serves as the control plane. Finally, OpenBox Instances (OBI) constitute OpenBox’s data plane.
This work presents a design and implementation [2] for the user facing interface of the OpenBox Controller, which allows network administrators to efficiently create and manage their NFs. The implementation supplies users with a framework from which they can build and experiment with NFs, as well as a functioning OpenBox Controller which loads NFs and manages the OpenBox control plane. The design is extensible and allows OpenBox future developers to quickly add more functionality and retrieve more data from the control plane.
Today, most of the network traffic need to traverse through several middleboxes before it can reach its destination. Common operation between the many of these middlebox is DPI – Deep Packet Inspection, which allows to perform different actions based on patterns in the packets content.
DPI consumes many of the middlebox resources during its operation. In addition, each packet usually traverses several middleboxes which causes the same packet to be scanned by different DPI engines over and over again. As a result the network becomes less efficient, which affects directly its total bandwidth.
One solution for those issues is a system that provide DPI as service. Means, the different middleboxes in the network that need DPI, can register to the service and expose their desired patterns. The System will direct the packets to a designated DPI engine instances across the network and pass the pattern matches, if exists, to the relevant middlebox.
There are many advantages in such system, among others: a single scan of every packet, the ability to upgrade to latest DPI algorithms, better partition of packets between DPI engines and increasing middlebox development innovation. Developing such a system is more simple today than ever with the emerging of SDN, which allows dynamic routing of the network traffic using a centralized controller.
The goal of this work is to implement a prototype of the DPI as a service system and to provide a realistic as possible environment to evaluate it. This paper documents the design and implementation of the system and other tools which are needed to deploy functioning network that uses this system.
Finally, the paper describes the experiments done to prove the system correctness and effectiveness and discusses their results.
This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst case performance (under an attack). Still, under normal traffic its running-time is worse than classical AC implementations. To overcome this problem, we introduce MCA 2 -Multi-Core Architecture to Mitigate Complexity Attacks, which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.
This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and dis- cussed. The vulnerability is exposed by showing how a simple low bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst case performance (under an attack). Still, under normal traffic its running-time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 —Multi- Core Architecture to Mitigate Complexity Attacks, which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache- miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.
URL matching lies at the core of many networking applications and Information Centric Networking architectures. For example, URL matching is extensively used by Layer 7 switches, ICN/NDN routers, load balancers, and security devices. Modern URL matching is done by maintaining a rich database that consists of tens of millions of URL which are classified to dozens of categories (or egress ports). In real-time, any input URL has to be searched in this database to find the corresponding category.
In this paper, we introduce a generic framework for accurate URL matching (namely, no false positives or miscategorization) that aims to reduce the overall memory footprint, while still having low matching latency. We introduce a dictionary-based compression method that compresses the database by 60%, while having only a slight overhead in time. Our framework is very flexible and it allows hot-updates,
cloud-based deployments, and can deal with strings that are not URLs.
Deep Packet Inspection (DPI) is a widespread functionality among middlebox applications. In many cases, DPI is performed by Intrusion Detection Systems (IDS) such as Snort. Traditionally, each packet is re-scanned by multiple middleboxes until reaching its final destination. Recent studies show that DPI is one of the most processing intensive tasks of modern middleboxes.
The DPI as a Service paper, presents a framework which allows to extract the time-consuming task of DPI out of the middleboxes while providing it as a service. Alongside with the framework design, the authors introduce a reference implementation on a simulated environment, while demonstrating promising results through a set of experiments.
In this work we have enhanced the reference implementation in order to demonstrate that the framework can operate in a more realistic environment setup. First, and foremost, we have integrated the framework with the commonly used Snort IDS. Second, we have extended the DPI Service to support the Network Service Header (NSH) protocol which allows passing of the pattern match results with the inspected packet. These significant enhancements, transformed the reference implementation to a more robust system, which can take proactive measures in an event of malicious pattern detection.
Finally, once the work on the framework was completed we were able to perform the basic experiments which were reported in the original paper. Our findings indicate, that the original framework results are reproducible in a our version of the framework.
We present RENE ́ — a novel encoding scheme for short ranges on Ternary content addressable memory (TCAM), which, unlike previous solutions, does not impose row expansion, and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound of number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also, how the same technique could be used to efficiently solve other hard problems such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems in much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENE ́ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than previous best prior-art solutions.
We present OpenBox — a software-defined framework for network-wide development, deployment, and management of network functions (NFs). OpenBox effectively decouples the control plane of NFs from their data plane, similarly to SDN solutions that only address the network’s forwarding plane.
OpenBox consists of three logic components. First, user-defined OpenBox applications provide NF specifications through the OpenBox north-bound API. Second, a logically-centralized OpenBox controller is able to merge logic of multiple NFs, possibly from multiple tenants, and to use a network-wide view to efficiently deploy and scale NFs across the network data plane. Finally, OpenBox instances constitute OpenBox’s data plane and are implemented either purely in software or contain specific hardware accelerators (e.g., a TCAM). In practice, different NFs carry out similar processing steps on the same packet, and our experiments indeed show a significant improvement of the network performance when using OpenBox. Moreover, OpenBox readily supports smart NF placement, NF scaling, and multi-tenancy through its controller.
The OpenBox Framework is framework that effectively decouples the con- trol plane of NFs from their data plane. Similarly to SDN solutions that address only the networks forwarding plane (e.g., switching, routing), OpenBox provides a framework for network-wide deployment and management of NFs. The OpenBox framework is composed of three logic components: OpenBox Application, OpenBox Controller and OpenBox Instances used as the data plane.
This project presents a design of a general Open Box Instance that can be used as the data plane of the OpenBox Framework. The suggested architecture is modular in nature and allows the easy replacement of its packet processing engine. This feature allows a lot of improvement and innovation in the way packets are processed with an OBI and between them.
We also present a reference implementation of the suggested architecture which shows its usability as an OpenBox Instance and integrates it inside a working OpenBox Framework. Our reference implementation uses Click as its packet processing engine and explains how it can be easily replaced.
In this project we implemented the Aho-Corasick for Compressed HTTP (ACCH) as a pattern matching engine for Snort.
Similarity search, and specifically the nearest-neighbor search (NN) problem is widely used in many fields of computer science such as machine learning, computer vision and databases. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with d. This causes severe performance degradation when working in high-dimensional spaces. Approximate techniques such as locality-sensitive hashing improve the performance of the search, but are still computationally intensive.
In this paper we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, which is a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This concept does not suffer from the curse of dimensionality and is shown to improve the best known approaches for NN by more than four orders of magnitude. Simulation results demonstrate dramatic improvement over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.
Configuring range based packet classification rules in network switches is crucial to all network core functionalities, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks the interface to configure range rules directly and only provides mask based rules, named flow entries.
In this work we present, ORange, the first solution to multi dimensional range classification in OpenFlow. Our solution is based on paradigms used in state of the art non-OpenFlow classifiers and is designed in a modular fashion allowing future extensions and improvements. We consider switch space utilization as well as atomic updates functionality, and in the network context we provide flow consistency even if flows change their entrance point to the network during policy updates, a property we name cross-entrance consistency. Our scheme achieves remarkable results and is easy to deploy.
Deep Packet Inspection (DPI) plays a major role in contemporary networks. Specifically, in datacenters of content providers, the scanned data may be highly repetitive. Most DPI engines are based on identifying signatures in the packet payload. This pattern matching process is expensive both in memory and CPU resources, and thus, often becomes the bottleneck of the entire application.
In this paper we show how DPI can be accelerated by leveraging repetitions in the inspected traffic. Our new mechanism makes use of these repetitions to allow the repeated data to be skipped rather than scanned again. The mechanism consists of a slow path, in which frequently repeated strings are identified and stored in a dictionary, along with some succinct information for accelerating the DPI process, and a data path, where the traffic is scanned byte by byte but strings from the dictionary, if encountered, are skipped. Upon skipping, the data path recovers to the state it would have been in had the scanning continued byte by byte.
Our solution achieves a significant performance boost, especially when data is from the same content source (e.g., the same website). Our experiments show that for such cases, our solution achieves a throughput gain of 1.25 − 2.5 times the original throughput, when implemented in software.
This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic is constantly increasing. Second, due to their superior expressibility, current Deep Packet Inspection engines use regular expressions more and more frequently.
We present an algorithmic framework to accelerate such matching, taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed through the GZIP protocol, which uses back-references to repeated strings. Our algorithm is based on calculating (for every byte) the minimum number of (previous) bytes that can be part of a future regular expression matching. When inspecting a back-reference, only these bytes should be taken into account, thus enabling one to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and gains performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which usually are based either on NFA, DFA or combinations of the two. Finally, we discuss other applications in which calculating the number of relevant bytes becomes handy, even when the traffic is not compressed.
A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and size n RAM is presented. All priority queue implementations to date
require either O(log n) instructions per operation or, exponential (with key size) space or, expensive special hardware whose cost and latency dramatically increases with the priority queue size.
Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues
(which are also much faster) while maintaining the O(1) steps per PQ operation is critical. Here
we present such an acceleration technique called the Power Priority Queue (PPQ) technique.
Specifically, an n-element PPQ is constructed from 2k − 1 primitive priority queues of size k √n
(k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single,
size n primitive hardware priority queue. For example an n-element PQ can be constructed from
either three √n or five 3√n primitive H/W priority queues.
Applying our technique to a TCAM based priority queue, results in TCAM-PPQ, a scalable perfect line rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This
demonstrates the benefits of our scheme; when used with hardware TCAM. We expect similar results with systolic arrays, shift-registers and similar technologies.
As a byproduct of our technique we present an O(n) time sorting algorithm in a system
equipped with a O(w√n) entries TCAM, where here n is the number of items, and w is the maximum number of bits required to represent an item, improving on a previous result that used
an Ω(n) entries TCAM. Finally, we provide a lower bound on the time complexity of sorting n-element with TCAM of size O(n) that matches our TCAM based sorting algorithm.
TCAM is a fast ternary associative cache memory, which is common in today’s routers, mainly in order to perform high rate packet-classification. Recently we are witnessing many works that use this powerful memory type to solve other hard problems that require high-speed solutions for example, pattern matching, regular expression matching, and heavy hitters analysis. While TCAM is a very powerful off-the-shelf new type of memory, currently it still requires hardware specialties to use it. Thus the research works are evaluated and analyzed using synthetic model only.
We show that understanding the detailed design of TCAM is important in order to understand the limitations and power of TCAM, as many works ignore the fact that while TCAM has a high throughput, it also has, by design, a high latency. Thus, many of the previously-proposed works which assumed closed-loop lookups (namely, where the input of a TCAM lookup depends on the result of a previous TCAM lookup) cannot be efficiently implemented as is and require a modification to their algorithm.
We present TCAMimic, a TCAM simulator that addresses the need of an easy-to-use software simulator of TCAM hardware. Using the TCAMimic simulator, we run intra-flow interleaving, where queries from different parts of the same flow are interleaved, and show that this significantly reduces the latency with only a marginal reduction in the throughput.
We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P of messages captured in the network at peacetime (i.e., mostly legitimate traffic) and A captured during attack time (i.e., contains many attack messages), we present a tool for extracting a set S of strings, that are frequently found in A and not in P . Therefore, a packet containing one of the strings from S is likely to be an attack packet.
This is an important tool in protecting sites on the Internet from Worm attacks, and Distributed Denial of Service (DDoS) attacks. It may also be useful for other problems, including command and control identification, DNA-sequences analysis, etc. which are beyond the scope of this work.
Two contributions of this paper are the system we developed to extract the required signatures together with the problem definition and the string-heavy hitters algorithm. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. This algorithm is then used by our system to extract the desired signatures. Using our system a yet unknown attack can be detected and stopped within minutes from attack start time.
A recursive and fast construction of an n elements priority queue from exponentially smaller hardware priority queues and size n RAM is presented. All priority queue implementations to date either require O (log n ) instructions per operation or exponential (with key size) space or expensive special hardware whose cost and latency dramatically increases with the priority queue size. Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster)
while maintaining the O (1) steps per PQ operation is critical. Here we present such an acceleration technique called the Power Priority Queue (PPQ) technique. Specifically, an n elements PPQ is constructed from 2k −1 primitive priority queues of size k√n ( k = 2 , 3 , … ) and a RAM of size n , where the throughput of the construct beats that of a single, size n primitive hardware priority queue. For example an n elements PQ can be constructed from either three √n or five 3√n primitive H/W priority queues.
Applying our technique to a TCAM based priority queue, results in TCAM-PPQ, a scalable perfect line rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM, we expect similar results with systolic arrays,
shift-registers and similar technologies.
As a by product of our technique we present an O(n) time sorting algorithm i n a system equipped with a O(w√n) entries TCAM, where here n is the number of items, and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n) entries TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with TCAM of size O(n) that matches our TCAM based sorting algorithm.
In software-defined networks (SDNs), the network controller first formulates abstract network-wide policies, and then implements them in the forwarding tables of network switches. However, fast SDN tables often cannot scale beyond a few hundred entries. This is because they typically include wildcards, and therefore are implemented using either expensive and power-hungry TCAMs, or complex and slow data structures.
This paper presents the Palette distribution framework for decomposing large SDN tables into small ones and then distributing them across the network, while preserving the overall SDN policy semantics. Palette helps balance the sizes of the tables across the network, as well as reduce the total number of entries by sharing resources among different connections. It copes with two NP-hard optimization problems: Decomposing a large SDN table into equivalent subtables, and distributing the subtables such that each connection traverses each type of subtable at least once. To implement the Palette distribution framework, we introduce graph-theoretical formulations and algorithms, and show that they achieve close-to-optimal results in practice.
A central component in all contemporary intrusion detection systems (IDSs) is their pattern matching algorithms, which are often based on constructing and traversing a deterministic finite automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring to store very large DFAs, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as its power consumption and cost. We propose a novel method to compress DFAs by observing that the name used by common DFA encoding is meaningless. While regular DFAs store separately each transition between two states, we use this degree of freedom and encode states in such a way that all transitions to a specific state are represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. Then, the problem of pattern matching is reduced to the well-studied problem of Longest Prefix Match (LPM), which can be solved either in ternary content-addressable memory (TCAM), in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gb/s with low power consumption.
Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput unparalleled by software-based solutions. TCAMs are not well-suited, however, for representing rules that contain range fields. Such rules typically have to be represented (or encoded) by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization. A TCAM range-encoding algorithm A is database-independent if, for all ranges r, it encodes r independently of the database in which it appears; otherwise, we say that A is database-dependent. Typically, when storing a classification database in TCAM, a few dozens of so-called extra bits in each TCAM entry remain unused. These extra bits are used by some (both database-dependent and database-independent) prior algorithms to reduce range expansion. The majority of real-life database ranges are short. We present a novel database-independent algorithm called Short Range Gray Encoding (SRGE) for the efficient representation of short range rules. SRGE encodes range endpoints as binary-reflected Gray codes and then represents the resulting range by a minimal set of ternary strings. To the best of our knowledge, SRGE is the first algorithm that achieves a reduction in range expansion in general, and a significant expansion reduction for short ranges in particular, without resorting to the use of extra bits. The “traditional” database-independent technique for representing range entries in TCAM is prefix expansion. As we show, SRGE significantly reduces the expansion of short ranges in comparison with prefix expansion. We also prove that the SRGE algorithm’s range expansion is at least as good as that of prefix expansion for any range. Real-world classification databases contain a small number of unique long ranges, some of which appear in numerous rules. These long ranges cause high expansion which is not significantly reduced by any database-independent range encoding scheme that we are aware of, including SRGE. We introduce hybrid SRGE, a database-dependent encoding scheme that uses SRGE for reducing the expansion of short ranges and uses extra bits for reducing the expansion caused by long ones. Our comparative analysis establishes that hybrid SRGE utilizes TCAM more efficiently than previously published range-encoding algorithms. This work also makes a more theoretic contribution. Prefix expansion for ranges defined by W-bit endpoints has worst-case expansion ratio of 2W-2. It follows from the work of Schieber et al. [1] that the SRGE algorithm has a slightly better worst-case expansion ratio of 2W-4. We prove that any independent TCAM encoding scheme has worst-case expansion ratio of at least W.
In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor in this process imposed by the compression, is the high memory requirements of 32 KB per connection. This leads to the requirements of hundreds of megabytes to gigabytes of main memory on a multi-connection setting. We introduce new algorithms and techniques that drastically reduce this space requirement for such bump-in-the-wire devices like security and other content based networking tools. Our proposed scheme improves both space and time performance by almost 80% and over 40% respectively, thus making real-time compressed traffic inspection a viable option for networking devices.
A method for processing communication traffic includes receiving an incoming stream of compressed data conveyed by a sequence of data packets, each containing a respective portion of the compressed data. The respective portion of the compressed data contained in the first packet is stored in a buffer, having a predefined buffer size. Upon receiving a subsequent packet, at least a part of the compressed data stored in the buffer and the respective portion of the compressed data contained in the subsequent packet are decompressed, thereby providing decompressed data. A most recent part of the decompressed data that is within the buffer size is recompressed and stored in the buffer.
Bloom Filters should particularly suit network devices, because of their low theoretical memory-access rates. However, in practice, since memory is often divided into blocks and Bloom Filters hash elements into several arbitrary memory blocks, Bloom Filters actually need high memory-access rates. On the other hand, hashing all Bloom Filter elements into a single memory block to solve this problem also yields high false positive rates.
In this paper, we propose to implement load-balancing schemes for the choice of the memory block, along with an optional overflow list, resulting in improved false positive rates while keeping a high memory-access efficiency. To study this problem, we define, analyze and solve a fundamental access-constrained balancing problem, where incoming elements need to be optimally balanced across resources while satisfying average and instantaneous constraints on the number of memory accesses associated with checking the current load of the resources. We then build on this problem to suggest a new access-efficient Bloom Filter scheme, called the Balanced Bloom Filter. Finally, we show that this scheme can reduce the false positive rate by up to two orders of magnitude, with a worst-case cost of up to 3 memory accesses for each element and an overflow list size of 0 . 5% of the elements.
Deep Packet Inspection (DPI) is the most time and resource consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention System (NIDS/IPS), Web Application Firewall (WAF), or Content Filtering Proxy. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
The portion of compressed traffic of overall Internet traffic is constantly increasing. This paper focuses on traffic compressed using shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of the inter-response redundancy (e.g., almost the same data is sent over and over again) as in nowadays dynamic Data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well with other compression algorithm (as Gzip), making it even more appealing. Performing DPI on any compressed traffic is considered hard, therefore today’s security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works near the rate of the compressed traffic, implying a speed gain of SDCH’s compression ratio (which is around 40%). We also discuss how to deal with SDCH compression over Gzip compression, and show how to perform regular expression matching with about the same speed gain.
Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput.
TCAMs are not well-suited, however, for representing rules that contain range fields and previously published algorithms typically represent each such rule by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization because it introduces a large number of redundant TCAM entries. This redundancy can be mitigated by making use of extra bits, available in each TCAM entry.
We present a scheme for constructing efficient representations of range rules, based on the simple observation that sets of disjoint ranges may be encoded much more efficiently than sets of overlapping ranges. Since the ranges in real-world classification databases are, in general, non-disjoint, the algorithms we present split ranges between multiple layers, each of which consisting of mutually disjoint ranges. Each layer is then coded and assigned its own set of extra bits.
Our layering algorithms are based on approximations for specific variants of interval-graph coloring. We evaluate these algorithms by performing extensive comparative analysis on real-life classification databases. Our analysis establishes that our algorithms reduce the number of redundant TCAM entries caused by range rules by more than 60% as compared with best range-encoding prior work.
In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor in this process imposed by the compression, is the high memory requirements of 32KB per connection. This leads to the requirements of hundreds of megabytes to gigabytes of main memory on a multi-connection setting. We introduce new algorithms and techniques that drastically reduce this space requirement for such bump-in-the-wire devices like security and other content based networking tools. Our proposed scheme improves both space and time performance by almost 80% and over 40% respectively, thus making real-time compressed traffic inspection a viable option for networking devices.
A method for processing data includes encoding a finite automaton, which includes states and transitions between the states that express a plurality of predefined patterns, by grouping the states of the automaton into sets according to a common property shared by the states in each set, and assigning codes to the states according to the grouping. The codes are stored in an electronic memory, along with rules that are associated with the patterns. The automaton is traversed in order to identify one or more of the patterns in an input sequence of data elements by iteratively reading out the codes from the memory responsively to the data elements and to the codes that have been previously read out. Upon identifying a given pattern in the input sequence, an associated action is performed.
Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content based networking devices are required to decompress the traffic of tens of thousands concurrent connections in order to inspect the content for different signatures. The overhead imposed by the decompression inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other disfunctionalities.
The ACCH algorithm was the first to present a unified approach to pattern matching and decompression, by taking advantage of information gathered in the decompression phase to accelerate the pattern matching. ACCH accelerated the DFA-based Aho-Corasick multi-pattern matching algorithm. In this paper, we present a novel algorithm, SPC (Shift-based Pattern matching for Compressed traffic) that accelerates the commonly used Wu-Manber pattern matching algorithm. SPC is simpler and has higher throughput and lower storage overhead than ACCH. Analysis of real web traffic and real security devices signatures shows that we can skip scanning up to 87 . 5% of the data and gain performance boost of more than 51% as compared to ACCH. Moreover, the additional storage requirement of the technique requires only 4 KB additional information per connection as compared to 8 KB of ACCH.
Deep Packet Inspection (DPI) lies at the core of contemporary Network Intrusion Detection/Prevention Systems and Web Application Firewalls. DPI aims to identify various malware (including spam and viruses) by inspecting both the header and the payload of each packet and comparing it to a known set of patterns. DPI is often performed on the critical path of the packet processing, thus the overall performance of the security tools is dominated by the speed of DPI. The seminal algorithm of Aho-Corasick (AC) is the de-facto standard for pattern matching in network intrusion detection systems (NIDS). Basically, the AC algorithm constructs a Deterministic Finite Automaton (DFA) for detecting all occurrences of a given set of patterns by processing the input in a single pass. The input is inspected symbol by symbol (usually each symbol is a byte), such that each symbol results in a state transition. Thus, in principle, the AC algorithm has deterministic performance, which does not depend on specific input and therefore is not vulnerable to algorithmic complexity attacks, making it very attractive to NIDS systems.
In this paper we show that, when implementing the AC algorithm in software, this property does not hold, due to the fact that contemporary pattern sets induce very large DFAs that cannot be stored entirely in cache. We then propose a novel technique to compress the representation of the Aho-Corasick automaton, so it can fit in modern cache. We compare both the performance and the memory footprint of our technique to previously-proposed implementation, under various settings and pattern sets. Our results reveal the space-time tradeoffs of DPI. Specifically, we show that our compression technique reduces the memory footprint of the best prior-art algorithm by approximately 60% , while achieving comparable throughput.
Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content based networking devices are required to decompress the traffic of tens of thousands concurrent connections in order to insp ect the content for different signatures. The major limiting factor in this pro cess is the high memory requirements of 32KB p er connection that leads to hundreds of megabytes to gigabytes of main memory consumption. This requirement inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or intro duces security holes and other dysfunctionalities. In this paper we introduce new algorithms and techniques that drastically reduce this space requirement by over 80%, with only a slight increase in the time overhead, thus making real-time compressed traffic inspection a viable option for network devices.
Current security tools, using ‘signature-based’ detection, do not handle compressed traffic, whose market-share is constantly increasing. This paper focus on compressed HTTP traffic. HTTP uses GZIP compression and requires some kind of decompression phase before performing a string-matching.
We present a novel algorithm, Aho-Corasick-based algorithm for Compressed HTTP (ACCH), that takes advantage of information gathered by the decompression phase in order to accelerate the commonly used Aho-Corasick pattern matching algorithm. By analyzing real HTTP traffic and real web application firewall signatures, we show that up to 84% of the data can be skipped in its scan. Surprisingly, we show that it is faster to perform pattern matching on the compressed data, with the penalty of decompression, than on regular traffic. As far as we know, we are the first paper that analyzes the problem of ‘on-the-fly’ multi-pattern matching on compressed HTTP traffic and suggest a solution
A method for processing data includes accepting a specification of a plurality of patterns, each pattern defining a respective uncompressed sequence of symbols. Multi-pattern matching is applied to an incoming stream of compressed communication traffic containing compression metadata so as to identify the patterns occurring in the stream while using the compression metadata to skip over parts of the stream.
DNS security
Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.
The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.
We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9 based resolver. As a result of this work, three common vulnerabilities and exposures (CVEs) regarding NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.
We investigate the negative caching (caching of NXdomain
responses) behavior on nine large open DNS resolvers. We
measure the amount of time an NXDomain response is kept
in the cache in various TTL configurations and compare it
to the time an existent domain is kept in the cache.