Publications by Year
Auto-scaling is a fundamental capability of cloud computing that allows resources to be consumed dynamically according to the changing traffic that needs to be served.
Under the micro-services architecture paradigm, software systems are built as a set of loosely-coupled applications and services that can be scaled individually.
In this paper, we present a new attack, the Tandem Attack, that exploits the tandem behavior of micro-services with different scaling properties. Such issues can result in Denial of Service (DoS) and Economic Denial of Sustainability (EDoS), created by malicious attackers or self-inflicted by wrong configurations set up by administrators. We demonstrate the Tandem Attack on a popular AWS serverless infrastructure modeling two services and show that removing server-management responsibility from cloud users does not mitigate the challenge of different scaling properties and can even make the problem harder to solve.
This paper presents a new localhost browser-based vulnerability and a corresponding attack that opens the door to new attacks on private networks and local devices. We show that this new vulnerability may put hundreds of millions of internet users and their IoT devices at risk. Following the attack presentation, we suggest three new protection mechanisms to mitigate this vulnerability.
This new attack bypasses recently suggested protection mechanisms designed to stop browser-based attacks on private devices and local applications.
Malicious actors carrying out distributed denial-of-service (DDoS) attacks are interested in requests that consume a large amount of resources and provide them with ammunition. We present a severe complexity attack on DNS resolvers, where a single malicious query to a DNS resolver can significantly increase its CPU load. Even a few such concurrent queries can result in resource exhaustion and lead to a denial of its service to legitimate clients. This attack is unlike most recent DDoS attacks on DNS servers, which use communication amplification attacks where a single query generates a large number of message exchanges between DNS servers.
The attack described here involves a malicious client whose request to a target resolver is sent to a collaborating malicious authoritative server; this server, in turn, generates a carefully crafted referral response back to the (victim) resolver. The chain reaction of requests continues, leading to the delegation of queries. These ultimately direct the resolver to a server that does not respond to DNS queries. The exchange generates a long sequence of cache and memory accesses that dramatically increase the CPU load on the target resolver. Hence the name non-responsive delegation attack, or NRDelegationAttack.
We demonstrate that three major resolver implementations, BIND9, Unbound, and Knot, are affected by the NRDelegationAttack, and carry out a detailed analysis of the amplification factor on a BIND9-based resolver. As a result of this work, three Common Vulnerabilities and Exposures (CVEs) regarding the NRDelegationAttack were issued by these resolver implementations. We also carried out minimal testing on 16 open resolvers, confirming that the attack affects them as well.
With the continuous increase in reported Common Vulnerabilities and Exposures (CVEs), security teams are overwhelmed by vast amounts of data, which are often analyzed manually, leading to a slow and inefficient process. To address cybersecurity threats effectively, it is essential to establish connections across multiple security entity databases, including CVEs, Common Weakness Enumeration (CWEs), and Common Attack Pattern Enumeration and Classification (CAPECs). In this study, we introduce a new approach that leverages the RotatE knowledge graph embedding model, initialized with embeddings from the Ada language model developed by OpenAI. Additionally, we extend this approach by initializing the embeddings for the relations. This method surpasses previous attempts and provides a valuable tool for security teams to efficiently identify and respond to cybersecurity threats.
Unlike previous works that only handled CVEs present in the training set, our approach can deal with unseen entities. Furthermore, we contribute a comprehensive dataset and our models for future benchmarking.
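As a rough sketch of the embedding setup described above (not our released model), the snippet below scores triples with RotatE while initializing the entity embeddings from precomputed language-model vectors; the tensor text_init, its dimensions, and the real/imaginary split are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RotatEScorer(nn.Module):
    """Minimal RotatE: entities are complex vectors, relations are rotations."""
    def __init__(self, num_entities, num_relations, dim, text_init=None):
        super().__init__()
        self.ent_re = nn.Embedding(num_entities, dim)
        self.ent_im = nn.Embedding(num_entities, dim)
        self.rel_phase = nn.Embedding(num_relations, dim)
        if text_init is not None:
            # text_init: (num_entities, 2*dim) language-model embeddings
            # (hypothetical input), split into real and imaginary parts.
            self.ent_re.weight.data.copy_(text_init[:, :dim])
            self.ent_im.weight.data.copy_(text_init[:, dim:])

    def score(self, h, r, t):
        re_h, im_h = self.ent_re(h), self.ent_im(h)
        re_t, im_t = self.ent_re(t), self.ent_im(t)
        re_r, im_r = torch.cos(self.rel_phase(r)), torch.sin(self.rel_phase(r))
        re_hr = re_h * re_r - im_h * im_r      # complex product h * r
        im_hr = re_h * im_r + im_h * re_r
        return -torch.sqrt((re_hr - re_t) ** 2 + (im_hr - im_t) ** 2 + 1e-12).sum(-1)
```

A higher (less negative) score indicates a more plausible (entity, relation, entity) triple, e.g., linking a CVE to a CWE.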
With the advent of cloud and container technologies, enterprises develop applications using a microservices architecture, managed by orchestration systems (e.g. Kubernetes), that group the microservices into clusters. As the number of application setups across multiple clusters and different clouds is increasing, technologies that enable communication and service discovery between the clusters are emerging (mainly as part of the Cloud Native ecosystem).
In such a multi-cluster setting, copies of the same microservice may be deployed in different geo-locations, each with different cost and latency penalties. Yet, current service selection and load balancing mechanisms do not take into account these locations and corresponding penalties.
We present MCOSS, a novel solution for optimizing the service selection, given a certain microservice deployment among clouds and clusters in the system. Our solution is agnostic to the different multi-cluster networking layers, cloud vendors, and discovery mechanisms used by the operators. Our simulations show a reduction in outbound traffic cost by up to 72% and in response time by up to 64%, compared to the currently-deployed service selection mechanisms.
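As a toy illustration of location-aware service selection (a sketch under assumed inputs, not MCOSS's actual optimization), the following picks, for a client cluster, the replica minimizing a weighted sum of egress cost and latency; the field names and weights are placeholders.

```python
def select_replica(replicas, cost_weight=0.5, latency_weight=0.5):
    """replicas: list of {"cluster", "egress_cost_per_gb", "rtt_ms"} dicts,
    hypothetically provided by the multi-cluster discovery layer."""
    def penalty(r):
        return cost_weight * r["egress_cost_per_gb"] + latency_weight * r["rtt_ms"]
    return min(replicas, key=penalty)

replicas = [
    {"cluster": "local",   "egress_cost_per_gb": 0.00, "rtt_ms": 2.0},
    {"cluster": "us-east", "egress_cost_per_gb": 0.09, "rtt_ms": 12.0},
    {"cluster": "eu-west", "egress_cost_per_gb": 0.02, "rtt_ms": 85.0},
]
print(select_replica(replicas)["cluster"])   # -> "local"
```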
Distributed denial of service (DDoS) attacks, especially distributed reflection denial of service attacks (DRDoS), have increased dramatically in frequency and volume in recent years. Such attacks are possible due to the attacker’s ability to spoof the source address of IP packets. Since the early days of the internet, authenticating the IP source address has remained unresolved in the real world. Although there are many methods available to eliminate source spoofing, they are not widely used, primarily due to a lack of economic incentives.
We propose a collaborative on-demand route-based defense technique (CORB) to offer efficient DDoS mitigation as a paid-for-service, and efficiently assuage reflector attacks before they reach the reflectors and flood the victim. The technique uses scrubbing facilities located across the internet at internet service providers (ISPs) and internet exchange points (IXPs).
By transmitting a small amount of data, based on border gateway protocol (BGP) information, from the victim to the scrubbing facilities, we can filter out the attack without any false-positive cases. For example, the data can be sent using DOTS, a new DDoS signaling protocol standardized by the IETF. CORB filters the attack before it is amplified by the reflector, thereby reducing the overall cost of the attack. This provides a win-win financial situation for the victim and the scrubbing facilities that provide the service.
We demonstrate the value of CORB by simulating a Memcached DRDoS attack using real-life data. Our evaluation found that deploying CORB on scrubbing facilities at approximately 40 autonomous systems blocks 90% of the attack and can reduce the mitigation cost by 85%.
Analyzing the network behavior of IoT devices, including which domains, protocols, and ports the device communicates with, is a fundamental challenge for IoT security and identification. Solutions that analyze and manage these areas must be able to learn what constitutes normal device behavior and then extract rules and features to permit only legitimate behavior or identify the device. The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.
We demonstrate that learning what is normal behavior for an IoT device is more challenging than expected. In many cases, the same IoT device, with the same firmware, can exhibit different behavior or connect to different domains with different protocols, depending on the device’s geographical location.
Then, we present a technique to generalize MUD files. By processing MUD files that originate in different locations, we can generalize and create a comprehensive MUD file that is applicable for all locations.
To conduct the research, we created MUDIS, a MUD Inspection System tool that compares and generalizes MUD files. Our open-source MUDIS tool and dataset are available online to researchers and IoT manufacturers, allowing anyone to visualize, compare, and generalize MUD files.
The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.
This demo introduces MUDIS, a MUD Inspection System that inspects the network behavior of devices, based on their formal description in the MUD file. We present several use-cases in which MUDIS is useful, including examining the impact of device location, the impact of a firmware update, the correlation of network behavior between different devices of the same manufacture, and more.
MUDIS inspects two MUD files, clusters together and graphically visualizes identical, similar, and dissimilar rules, and then calculates a similarity score that measures how similar the two files are. It also generalizes the two MUD files where possible, such that the resulting generalized MUD covers all the permitted (white-list) network behavior for both MUDs.
Our open-source MUDIS tool and proof-of-concept dataset are available for researchers and IoT manufacturers, allowing anyone to gain meaningful insights over the network behavior of IoT devices.
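To make the generalization step concrete, here is a minimal sketch of the idea on simplified rule tuples rather than the full MUD JSON/ACL schema that MUDIS actually parses; the domains and ports are illustrative.

```python
def generalize(rules_a, rules_b):
    """Each rule is (domain, protocol, port). Rules from the two locations that
    differ only in the leftmost DNS label are merged into one wildcard rule."""
    merged = set(rules_a) | set(rules_b)
    out = set()
    for dom, proto, port in merged:
        parent = dom.split(".", 1)[-1]
        siblings = {r for r in merged
                    if r[1:] == (proto, port) and r[0].split(".", 1)[-1] == parent}
        out.add(("*." + parent, proto, port) if len(siblings) > 1
                else (dom, proto, port))
    return out

us = {("us-east.vendor.com", "tcp", 443), ("time.nist.gov", "udp", 123)}
eu = {("eu-west.vendor.com", "tcp", 443), ("time.nist.gov", "udp", 123)}
print(generalize(us, eu))
# {('*.vendor.com', 'tcp', 443), ('time.nist.gov', 'udp', 123)}
```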
Monitoring medical data, e.g., Electrocardiogram (ECG) signals, is a common application of Internet of Things (IoT) devices. Compression methods are often applied on the massive amounts of sensor data generated prior to sending it to the Cloud to reduce the storage and delivery costs. A lossy compression provides high compression gain (CG), but may reduce the performance of an ECG application (downstream task) due to information loss. Previous works on ECG monitoring focus either on optimizing the signal reconstruction or the task’s performance. Instead, we advocate a self-adapting lossy compression solution that allows configuring a desired performance level on the downstream tasks while maintaining an optimized CG that reduces Cloud costs.
We propose Dynamic-Deep, a task-aware compression geared for IoT-Cloud architectures. Our compressor is trained to optimize the CG while maintaining the performance requirement of the downstream tasks, chosen out of a wide range. In deployment, the IoT edge device adapts the compression and sends an optimized representation for each data segment, accounting for the downstream task's desired performance without relying on feedback from the Cloud. We conduct an extensive evaluation of our approach on common ECG datasets using two popular ECG applications, including heart rate (HR) arrhythmia classification. We demonstrate that Dynamic-Deep can be configured to improve HR classification F1-score across a wide range of requirements. One configuration is tuned to improve the F1-score by 3 and increases CG by up to 83% compared to the previous state-of-the-art (autoencoder-based) compressor. Analyzing Dynamic-Deep on the Google Cloud Platform, we observe a 97% reduction in cloud costs compared to a no-compression solution. To the best of our knowledge, Dynamic-Deep is the first end-to-end system architecture proposal to focus on balancing the need for high performance of cloud-based downstream tasks and the desire to achieve optimized compression in IoT ECG monitoring settings.
Analyzing the network behavior of IoT devices, including which domains, protocols, and ports the device communicates with, is a fundamental challenge for IoT security and identification. Solutions that analyze and manage these areas must be able to learn what constitutes normal device behavior and then extract rules and features to permit only legitimate behavior or to identify the device. The Manufacturer Usage Description (MUD) is an IETF white-list protection scheme that formalizes the authorized network behavior in a MUD file; this MUD file can then be used as a type of firewall mechanism.
We demonstrate that learning what is normal behavior for an IoT device is more challenging than expected. In many cases, the same IoT device, with the same firmware, can exhibit different behavior or connect to different domains with different protocols. This behavior can even change, depending on the device’s geographical location. Thus, MUD functioning and IoT identification methods may not be effective in different locations. The reasons for this vary from country requirements to weak encryption, privacy regulations, CDN-like solutions, and more.
Manufacturer Usage Description (MUD) is a new, whitelist-based cybersecurity framework that was recently proposed by the IETF to cope with the huge attack surface and a constantly increasing number of IoT devices connected to the Internet.
MUD allows the IoT manufacturers themselves to publish the legitimate communication patterns of their devices, making it easier for security devices to enforce this policy, filter out non-complying traffic, and block a device in case it has been compromised.
Typically, MUD includes a set of legitimate endpoints, specified either by domain names or by IP addresses, along with the legitimate port numbers and protocols. While these descriptions are adequate when IoT devices connect (as clients) to servers (e.g., services in the cloud), they cannot adequately describe the cases where IoT devices act as servers to which endpoints connect [1]. These endpoints (e.g., users' mobile devices) typically do not have fixed IP addresses, nor do they associate with a domain name. In this case, which accounts for 78% of the IoT devices we surveyed, MUD nowadays degrades to allowing all possible endpoints and cannot mitigate any attack. In this work, we evaluate this phenomenon and show that it has a high prevalence today, thus dramatically harming the security efficiency of the MUD framework. We then present a solution, MUDirect, which enhances the MUD framework to deal with these cases while preserving the current MUD specification. Finally, we have implemented our solution (extending the existing osMUD implementation [2]) and show that it enables the protection of P2P IoT devices while requiring minimal changes to the osMUD code.
We investigate the negative caching (caching of NXDomain responses) behavior on nine large open DNS resolvers. We measure the amount of time an NXDomain response is kept in the cache in various TTL configurations and compare it to the time an existent domain is kept in the cache.
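The measurement logic can be sketched as follows, assuming dnspython is installed and that we control the authoritative zone (the placeholder probe.example below), so every random label is guaranteed not to exist.

```python
import time, uuid
import dns.resolver

resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = ["8.8.8.8"]            # the open resolver under test

def nxdomain_latency(qname):
    start = time.perf_counter()
    try:
        resolver.resolve(qname, "A")
    except dns.resolver.NXDOMAIN:
        pass
    return time.perf_counter() - start

name = f"{uuid.uuid4().hex}.probe.example"
first = nxdomain_latency(name)                # cache miss: full recursion
again = nxdomain_latency(name)                # ideally answered from the negative cache
print(f"miss: {first*1000:.1f} ms, repeat: {again*1000:.1f} ms")
```

Re-issuing the same query at growing intervals and watching when the repeat latency jumps back to the miss latency gives an estimate of how long the NXDomain entry stayed cached.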
In recent years, we have witnessed a new kind of DDoS attack, the burst attack (Chai, 2013; Dahan, 2018), where the attacker launches periodic bursts of traffic overload on online targets. Recent work presented a new kind of burst attack, the YoYo attack (Bremler-Barr et al., 2017), that operates against the auto-scaling mechanism of VMs in the cloud. The periodic bursts of traffic load cause the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The auto-scaling mechanism translates flat DDoS attacks into Economic Denial of Sustainability (EDoS) attacks, where the victim suffers from economic damage accrued by paying for the extra resources required to process the traffic generated by the attacker. However, it was shown that the YoYo attack also causes significant performance degradation, since it takes time to scale up VMs. In this research, we analyze the resilience of Kubernetes auto-scaling against YoYo attacks, as containerized cloud applications using Kubernetes have gained popularity and replaced VM-based architectures in recent years. We present experimental results on Google Cloud Platform, showing that even though the scale-up time of containers is much lower than that of VMs, Kubernetes is still vulnerable to the YoYo attack since VMs are still involved. Finally, we evaluate ML models that can accurately detect a YoYo attack on a Kubernetes cluster.
In recent years, the number of IoT devices in home networks has increased dramatically. Whenever a new device connects to the network, it must be quickly managed and secured using the relevant security mechanism or QoS policy. Thus, a key challenge is to distinguish between IoT and NoT devices in a matter of minutes. Unfortunately, there is no clear indication of whether a device in a network is an IoT device. In this paper, we propose different classifiers that identify a device as IoT or non-IoT, on a short time scale and with high accuracy.
Our classifiers were constructed using machine learning techniques on a seen (training) dataset and were tested on an unseen (test) dataset. They successfully classified devices that were not in the seen dataset with accuracy above 95%. The first classifier is a logistic regression classifier based on traffic features. The second classifier is based on features we retrieve from DHCP packets. Finally, we present a unified classifier that leverages the advantages of the other two classifiers.
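For illustration only, a minimal version of the first (traffic-feature) classifier could look like the sketch below; the feature names and numbers are placeholders, not the paper's exact feature set or data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Each row: [mean packet size, #distinct remote ports, #distinct domains,
#            fraction of outbound traffic] observed in the device's first minutes.
X_train = np.array([[120, 3, 2, 0.90],    # IoT-like
                    [640, 45, 30, 0.50],  # NoT-like (laptop/phone)
                    [100, 2, 1, 0.95],
                    [800, 60, 55, 0.40]])
y_train = np.array([1, 0, 1, 0])          # 1 = IoT, 0 = NoT

clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
print(clf.predict([[150, 4, 3, 0.8]]))    # -> [1], classified as IoT
```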
A new scalable ISP level system architecture to secure and protect all IoT devices in a large number of homes is presented. The system is based on whitelisting, as in the Manufacturer Usage Description (MUD) framework, implemented as a VNF. Unlike common MUD suggestions that place the whitelist application at the home/enterprise network, our approach is to place the enforcement upstream at the provider network, combining an NFV (Network Function Virtualization) with router/switching filtering capabilities, e.g., ACLs. The VNF monitors many home networks simultaneously, and therefore, is a highly-scalable managed service solution that provides both the end customers and the ISP with excellent visibility and security of the IoT devices at the customer premises.
The system includes a mechanism to distinguish between flows of different devices at the ISP level despite the fact that most home networks (and their IoT devices) are behind a NAT and all the flows from the same home come out with the same source IP address. Moreover, the NFV system needs to receive only the first packet of each connection at the VNF, and rules space is proportional to the number of unique types of IoT devices rather than the number of IoT devices. The monitoring part of the solution is off the critical path and can also uniquely protect from incoming DDoS attacks.
To cope with internal traffic, which is not visible outside the customer premises and often consists of P2P communication, we suggest a hybrid approach, in which we deploy a lightweight component at the CPE whose sole purpose is to monitor P2P communication. As the current MUD solution does not provide a secure solution for P2P communication, we also extend the MUD protocol to deal with peer-to-peer communicating devices. A PoC with a large national-level ISP proves that our technology works as expected.
In this paper we present three attacks on private internal networks behind a NAT and a corresponding new protection mechanism, Internal Network Policy, to mitigate a wide range of attacks that penetrate internal networks behind a NAT. In the attack scenario, a victim is tricked into visiting the attacker's website, which contains a malicious script that lets the attacker access the victim's internal network in different ways, including opening a port in the NAT or sending a sophisticated request to local devices. The first attack utilizes DNS Rebinding in a particular way, while the other two demonstrate different methods of attacking the network, based on application security vulnerabilities. Following the attacks, we provide a new browser security policy, Internal Network Policy (INP), which protects against these types of vulnerabilities and attacks. This policy is implemented in the browser just like Same Origin Policy (SOP) and prevents malicious access to internal resources by external entities.
This paper exposes a new vulnerability and introduces a corresponding attack, the NoneXistent Name Server Attack (NXNSAttack), that disrupts and may paralyze the DNS system, making it difficult or impossible for Internet users to access websites, web e-mail, online video chats, or any other online resource. The NXNSAttack generates a storm of packets between DNS resolvers and DNS authoritative name servers. The storm is produced by the response of resolvers to unrestricted referral response messages of authoritative name servers. The attack is significantly more destructive than NXDomain attacks (e.g., the Mirai attack): i) It reaches an amplification factor of more than 1620x on the number of packets exchanged by the recursive resolver. ii) In addition to the negative cache, the attack also saturates the ‘NS’ section of the resolver caches. To mitigate the attack impact, we propose an enhancement to the recursive resolver algorithm, MaxFetch(k), that prevents unnecessary proactive fetches. We implemented the MaxFetch(1) mitigation enhancement on a BIND resolver and tested it on real-world DNS query datasets. Our results show that MaxFetch(1) degrades neither the recursive resolver throughput nor its latency. Following the discovery of the attack, a responsible disclosure procedure was carried out, and several DNS vendors and public providers have issued a CVE and patched their systems.
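The resolver-side idea behind MaxFetch(k) can be sketched schematically as follows (an illustration, not the actual BIND9 patch); resolve_address stands in for the resolver's normal recursive lookup.

```python
def follow_referral(ns_names, glue, resolve_address, k=1):
    """Resolve at most k glueless name-server names per referral (MaxFetch(k))."""
    addresses, fetches = [], 0
    for ns in ns_names:
        if ns in glue:                    # address supplied in the referral itself
            addresses.append(glue[ns])
        elif fetches < k:                 # bound the proactive fetches
            fetches += 1
            addr = resolve_address(ns)    # placeholder for a recursive lookup
            if addr:
                addresses.append(addr)
        # remaining glueless names are resolved lazily, only if actually needed
    return addresses
```

Under the NXNSAttack, a referral carrying hundreds of non-existent name-server names thus triggers at most k extra resolutions instead of hundreds.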
We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P, the messages captured in the network at peacetime (i.e., mostly legitimate traffic), and A, the messages captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not in P, thus allowing the identification of the attack packets. This is an important tool in protecting sites on the Internet from worm attacks and distributed denial of service attacks and may also be useful for other problems, including command and control identification and DNA-sequence analysis. The main contributions of this paper are the system we developed to extract the required signatures together with the string-heavy hitters problem definition and the algorithm for solving this problem. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. The algorithm runs in linear time, requiring one pass over the input. Our system makes use of this algorithm to extract the desired signatures. Furthermore, we provide an extended algorithm which is able to identify groups of signatures, often found together in the same packets, which further improves the quality of signatures generated by our system. Using our system, a yet unknown attack can be detected and stopped within minutes from attack start time.
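A much-simplified stand-in for the extraction step is sketched below: it counts fixed-length substrings exactly, whereas the actual system uses a one-pass, variable-length string-heavy-hitters algorithm; the thresholds and gram length are illustrative.

```python
from collections import Counter

def grams(msgs, n=8):
    """Count all n-byte substrings across a set of messages."""
    c = Counter()
    for m in msgs:
        for i in range(len(m) - n + 1):
            c[m[i:i + n]] += 1
    return c

def candidate_signatures(attack_msgs, peace_msgs, min_attack=100, max_peace=1):
    """Strings frequent in attack-time traffic (A) but rare in peacetime (P)."""
    a, p = grams(attack_msgs), grams(peace_msgs)
    return {g for g, cnt in a.items() if cnt >= min_attack and p[g] <= max_peace}
```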
In this work we measure what percentage of DNS recursive resolvers perform negative caching in the wild. We deploy our own authoritative name server and harness thousands of RIPE Atlas [3] sites spread over the globe to perform repeated DNS queries for non-existing sub-domains of our authoritative domain.
Efficient algorithms and techniques to detect and identify large flows in a high-throughput traffic stream in the SDN match-and-action model are presented. This is in contrast to previous work that either deviated from the match-and-action model by requiring additional switch-level capabilities or did not exploit the SDN data plane. Our construction has two parts: (a) new methods to sample efficiently in an SDN match-and-action model, and (b) new and efficient algorithms to detect large flows in a scalable way in the SDN model.
Our large-flow detection methods provide high accuracy and present a good, practical tradeoff between switch-controller traffic and the number of entries required in the switch flow table. Based on different parameters, we differentiate between heavy flows, elephant flows, and bulky flows, and present efficient algorithms to detect flows of the different types.
Additionally, as part of our heavy flow detection scheme, we present sampling methods to sample packets with arbitrary probability p per packet or per byte that traverses an SDN switch.
Finally, we show how our algorithms can be adapted to a distributed monitoring SDN setting with multiple switches, and easily scale with the number of monitoring switches.
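The two sampling semantics mentioned above can be illustrated as follows; here plain Python stands in for the switch data plane, which in the paper realizes the same logic with match-and-action rules.

```python
import random

def sample_per_packet(pkt_len, p):
    # every packet is sampled with the same probability p, regardless of size
    return random.random() < p

def sample_per_byte(pkt_len, p):
    # each byte independently triggers sampling, so a packet of pkt_len bytes
    # is sampled with probability 1 - (1 - p)^pkt_len (larger packets more likely)
    return random.random() < 1 - (1 - p) ** pkt_len
```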
We present range encoding with no expansion (RENÉ), a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion and uses a number of bits proportional to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique could be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.
Memcached is an in-memory key-value distributed caching solution, commonly used by web servers for fast content delivery. Keys with their values are distributed between Memcached servers using a consistent hashing technique, resulting in an even distribution of keys among the servers. However, as a small number of keys are significantly more popular than others (a.k.a. hot keys), the even distribution of keys may cause a significantly different request load on the Memcached servers, which, in turn, causes substantial performance degradation.
Previous solutions to this problem require complex application-level solutions and extra servers. In this paper, we propose MBalancer, a simple L7 load-balancing scheme for Memcached that can be seamlessly integrated into Memcached architectures running over Software-Defined Networks (SDN). In a nutshell, MBalancer runs as an SDN application and duplicates the hot keys to many (or all) Memcached servers. The SDN controller updates the forwarding tables of the SDN switches and uses SDN ready-made load-balancing capabilities. Thus, no change is required to Memcached clients or servers.
Our analysis shows that with minimal overhead for storing a few extra keys, the number of requests per server is close to balanced (assuming requests for keys follow a Zipf distribution).
Moreover, we have implemented MBalancer on a hardware based OpenFlow switch. As MBalancer offloads requests from bottleneck Memcached servers, our experiments show that it achieves significant throughput boost and latency reduction.
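A back-of-the-envelope simulation of the imbalance argument (illustrative parameters, not the paper's workload): requests follow a Zipf law, keys are hashed to servers, and replicating only a handful of the hottest keys to all servers brings the maximum per-server load close to the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
servers, hot = 10, 16
keys = rng.zipf(1.2, size=1_000_000) % 10_000            # request stream over 10k keys
hottest = {int(x) for x in np.bincount(keys).argsort()[-hot:]}  # most popular keys

def imbalance(duplicate_hot):
    load = np.zeros(servers)
    for k, cnt in zip(*np.unique(keys, return_counts=True)):
        if duplicate_hot and int(k) in hottest:
            load += cnt / servers                         # hot key served by every server
        else:
            load[int(k) % servers] += cnt                 # stand-in for consistent hashing
    return load.max() / load.mean()

print(imbalance(False), imbalance(True))                  # imbalance before vs. after
```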
Auto-scaling mechanisms are an important line of defense against Distributed Denial of Service (DDoS) in the cloud. Using auto-scaling, machines can be added and removed in an on-line manner to respond to fluctuating load. It is commonly believed that the auto-scaling mechanism casts DDoS attacks into Economic Denial of Sustainability (EDoS) attacks. Rather than suffering from performance degradation up to a total denial of service, the victim suffers only from the economic damage incurred by paying for the extra resources required to process the bogus traffic of the attack.
Contrary to this belief, we present and analyze the Yo-Yo attack, a new attack against the auto-scaling mechanism, that can cause significant performance degradation in addition to economic damage. In the Yo-Yo attack, the attacker sends periodic bursts of overload, thus causing the auto-scaling mechanism to oscillate between scale-up and scale-down phases. The Yo-Yo attack is harder to detect and requires fewer resources from the attacker compared to traditional DDoS. We demonstrate the attack on Amazon EC2 [4], and analyze protection measures the victim can take by reconfiguring the auto-scaling mechanism.
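As a rough illustration of why the oscillation is costly, the toy time-stepped model below charges the victim for machines while counting the minutes of degraded service during each scale-up lag; all parameters are made up for illustration, not measurements from the paper.

```python
def yoyo(period=20, burst=8, lag=3, base=2, peak=10, horizon=240):
    """Attacker bursts for `burst` minutes every `period`; the scaler reacts after `lag`."""
    machines, cost, degraded = base, 0, 0
    for t in range(horizon):
        attacking = (t % period) < burst
        if t % period == lag:                  # scale-up completes `lag` min into the burst
            machines = peak
        if t % period == burst + lag:          # scale-down completes `lag` min after it ends
            machines = base
        cost += machines                       # machine-minutes billed to the victim
        degraded += int(attacking and machines < peak)
    return cost, degraded

print(yoyo())   # (machine-minutes paid, minutes of degraded service)
```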
Traditional DDoS anti-spoofing scrubbers require dedicated middleboxes, thus adding CAPEX, latency, and complexity to the network. This paper starts by showing that the current SDN match-and-action model is rich enough to implement a collection of anti-spoofing methods. Second, we develop and utilize advanced methods for dynamic resource sharing to distribute the required mitigation resources over a network of switches.
None of the earlier attempts to implement anti-spoofing in SDN actually directly exploited the match and action power of the switch data plane. They required additional functionalities on top of the match-and-action model, and are not implementable on an SDN switch as is. Our method builds on the premise that an SDN data path is a very fast and efficient engine to perform low level primitive operations at wire speed. The solution requires a number of flow-table rules and switch-controller messages proportional to the legitimate traffic. To scale when protecting multiple large servers the flow tables of multiple switches are harnessed in a distributed and dynamic network based solution.
We have fully implemented all our methods, both in OpenFlow 1.5 on Open vSwitch and in P4. The system mitigates spoofed attacks on either the SDN infrastructure itself or downstream servers.
Random Subdomain DDoS attacks on the Domain Name System (DNS) infrastructure are becoming a popular vector in recent attacks (e.g., recent Mirai attack on Dyn). In these attacks, many queries are sent for a single or a few victim domains, yet they include highly varying non-existent subdomains generated randomly.
Motivated by these attacks we designed and implemented novel and efficient algorithms for distinct heavy hitters (dHH). A (classic) heavy hitter (HH) in a stream of elements is a key (e.g., the domain of a query) which appears in many elements (e.g., requests).
When stream elements consist of <key, subkey> pairs (e.g., <domain, subdomain>), a distinct heavy hitter (dHH) is a key that is paired with a large number of different subkeys. Our algorithms dominate previous designs in both the asymptotic (theoretical) sense and in practicality. Specifically, the new fixed-size algorithms are simple to code and have asymptotically optimal space-accuracy tradeoffs.
Based on these algorithms, we build and implement a system for the detection and mitigation of Random Subdomain DDoS attacks. We perform experimental evaluation, demonstrating the effectiveness of our algorithms.
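For intuition, a simplified stand-in (not the paper's fixed-size algorithm) for approximating the number of distinct subkeys per key is to sample subkeys by hash value, so memory stays proportional to the sample rather than to all subdomains seen.

```python
import hashlib
from collections import defaultdict

P = 1 / 64                                    # subkey sampling rate
THRESH = int(2**256 * P)                      # sha256 produces 256-bit values

def sampled(subkey: str) -> bool:
    return int(hashlib.sha256(subkey.encode()).hexdigest(), 16) < THRESH

seen = defaultdict(set)                       # key -> sampled distinct subkeys

def observe(key, subkey):
    if sampled(subkey):
        seen[key].add(subkey)

def estimated_distinct(key):
    return len(seen[key]) / P                 # keys above a cutoff are flagged as dHH

observe("victim.example", "xj3k9q.victim.example")
```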
A method for network protection includes monitoring data packets transmitted to and from an IoT device (24, 26, 28, 30) over a network (38, 46). Based on the monitored data packets, a set of one or more endpoints (42, 44) that are authorized to communicate with the IoT device via the network is identified. Among the monitored data packets, an attempt to communicate with the IoT device by an endpoint (52) that is not a member of the identified set is detected. Responsively to detecting the attempt, a protective action is performed at a guard location in the network, between the endpoint and the IoT device, so as to mitigate unauthorized communications with the IoT device.
We present OpenBox — a software-defined framework for network-wide development, deployment, and management of network functions (NFs). OpenBox effectively decouples the control plane of NFs from their data plane, similarly to SDN solutions that only address the network’s forwarding plane.
OpenBox consists of three logic components. First, user-defined OpenBox applications provide NF specifications through the OpenBox north-bound API. Second, a logically-centralized OpenBox controller is able to merge logic of multiple NFs, possibly from multiple tenants, and to use a network-wide view to efficiently deploy and scale NFs across the network data plane. Finally, OpenBox instances constitute OpenBox’s data plane and are implemented either purely in software or contain specific hardware accelerators (e.g., a TCAM). In practice, different NFs carry out similar processing steps on the same packet, and our experiments indeed show a significant improvement of the network performance when using OpenBox. Moreover, OpenBox readily supports smart NF placement, NF scaling, and multi-tenancy through its controller.
We present RENÉ, a novel encoding scheme for short ranges on Ternary Content Addressable Memory (TCAM), which, unlike previous solutions, does not impose row expansion and uses a number of bits proportional to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique could be used to efficiently solve other hard problems such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems at much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENÉ on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than the best prior-art solutions.
Deep Packet Inspection (DPI) is a widespread functionality among middlebox applications. In many cases, DPI is performed by Intrusion Detection Systems (IDS) such as Snort. Traditionally, each packet is re-scanned by multiple middleboxes until reaching its final destination. Recent studies show that DPI is one of the most processing intensive tasks of modern middleboxes.
The DPI as a Service paper presents a framework that allows extracting the time-consuming task of DPI out of the middleboxes and providing it as a service. Alongside the framework design, the authors introduce a reference implementation in a simulated environment, demonstrating promising results through a set of experiments.
In this work we have enhanced the reference implementation in order to demonstrate that the framework can operate in a more realistic environment setup. First and foremost, we have integrated the framework with the commonly used Snort IDS. Second, we have extended the DPI Service to support the Network Service Header (NSH) protocol, which allows passing the pattern-match results along with the inspected packet. These significant enhancements transformed the reference implementation into a more robust system, which can take proactive measures in the event of malicious pattern detection.
Finally, once the work on the framework was completed, we were able to perform the basic experiments reported in the original paper. Our findings indicate that the original framework results are reproducible in our version of the framework.
This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst case performance (under an attack). Still, under normal traffic its running-time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 (Multi-Core Architecture to Mitigate Complexity Attacks), which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.
This paper starts by demonstrating the vulnerability of Deep Packet Inspection (DPI) mechanisms, which are at the core of security devices, to algorithmic complexity denial of service attacks, thus exposing a weakness in the first line of defense of enterprise networks and clouds. A system and a multi-core architecture to defend from these algorithmic complexity attacks is presented in the second part of the paper. The integration of this system with two different DPI engines is demonstrated and discussed. The vulnerability is exposed by showing how a simple low bandwidth cache-miss attack takes down the Aho-Corasick (AC) pattern matching algorithm that lies at the heart of most DPI engines. As a first step in the mitigation of the attack, we have developed a compressed variant of the AC algorithm that improves the worst case performance (under an attack). Still, under normal traffic its running-time is worse than classical AC implementations. To overcome this problem, we introduce MCA^2 (Multi-Core Architecture to Mitigate Complexity Attacks), which dynamically combines the classical AC algorithm with our compressed implementation, to provide a robust solution to mitigate this cache-miss attack. We demonstrate the effectiveness of our architecture by examining cache-miss algorithmic complexity attacks against DPI engines and show a goodput boost of up to 73%. Finally, we show that our architecture may be generalized to provide a principal solution to a wide variety of algorithmic complexity attacks.
URL matching lies at the core of many networking applications and Information Centric Networking architectures. For example, URL matching is extensively used by Layer 7 switches, ICN/NDN routers, load balancers, and security devices. Modern URL matching is done by maintaining a rich database that consists of tens of millions of URLs, classified into dozens of categories (or egress ports). In real time, any input URL has to be searched in this database to find the corresponding category.
In this paper, we introduce a generic framework for accurate URL matching (namely, with no false positives or miscategorization) that aims to reduce the overall memory footprint while still having low matching latency. We introduce a dictionary-based compression method that compresses the database by 60%, while having only a slight overhead in time. Our framework is very flexible: it allows hot updates and cloud-based deployments, and it can deal with strings that are not URLs.
Today, most network traffic needs to traverse several middleboxes before it can reach its destination. A common operation among many of these middleboxes is DPI (Deep Packet Inspection), which allows different actions to be performed based on patterns in the packets' content.
DPI consumes many of the middlebox's resources during its operation. In addition, each packet usually traverses several middleboxes, which causes the same packet to be scanned by different DPI engines over and over again. As a result, the network becomes less efficient, which directly affects its total bandwidth.
One solution to these issues is a system that provides DPI as a service: the different middleboxes in the network that need DPI can register to the service and expose their desired patterns. The system then directs the packets to designated DPI engine instances across the network and passes the pattern matches, if any exist, to the relevant middlebox.
Such a system has many advantages, among others: a single scan of every packet, the ability to upgrade to the latest DPI algorithms, better partitioning of packets between DPI engines, and increased middlebox development innovation. Developing such a system is simpler today than ever with the emergence of SDN, which allows dynamic routing of the network traffic using a centralized controller.
The goal of this work is to implement a prototype of the DPI as a Service system and to provide as realistic an environment as possible to evaluate it. This paper documents the design and implementation of the system and of the other tools needed to deploy a functioning network that uses this system.
Finally, the paper describes the experiments conducted to demonstrate the system's correctness and effectiveness and discusses their results.
Monitoring Round-Trip Time provides important insights for network troubleshooting and traffic engineering. The common monitoring technique is to actively send probe packets from selected vantage points (hosts or middleboxes). In traditional networks, the control over the network routing is limited, making it impossible to monitor every selected path. The emerging concept of Software Defined Networking simplifies network control. However, OpenFlow, the common SDN protocol, does not support RTT monitoring as part of its specification. In this paper, we leverage the ability of OpenFlow to control the routing, and present GRAMI, the Granular RTT Monitoring Infrastructure. GRAMI uses active probing from selected vantage points for efficient RTT monitoring of all the links and any round-trip path between any two switches in the network.
GRAMI was designed to be resource efficient. It requires only four flow entries installed on every switch in order to enable RTT monitoring of all the links. For every round-trip path selected by the user, it requires a maximum of two additional flow entries installed on every switch along the measured path. Moreover, GRAMI uses a minimal number of probe packets and does not require the involvement of the controller during online RTT monitoring.
This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic is constantly increasing. Second, due to their superior expressibility, current Deep Packet Inspection engines use regular expressions more and more frequently.
We present an algorithmic framework to accelerate such matching, taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed through the GZIP protocol, which uses back-references to repeated strings. Our algorithm is based on calculating (for every byte) the minimum number of (previous) bytes that can be part of a future regular expression matching. When inspecting a back-reference, only these bytes should be taken into account, thus enabling one to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and gains performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which usually are based either on NFA, DFA or combinations of the two. Finally, we discuss other applications in which calculating the number of relevant bytes becomes handy, even when the traffic is not compressed.
Deep Packet Inspection (DPI) plays a major role in contemporary networks. Specifically, in datacenters of content providers, the scanned data may be highly repetitive. Most DPI engines are based on identifying signatures in the packet payload. This pattern matching process is expensive both in memory and CPU resources, and thus, often becomes the bottleneck of the entire application.
In this paper we show how DPI can be accelerated by leveraging repetitions in the inspected traffic. Our new mechanism makes use of these repetitions to allow the repeated data to be skipped rather than scanned again. The mechanism consists of a slow path, in which frequently repeated strings are identified and stored in a dictionary, along with some succinct information for accelerating the DPI process, and a data path, where the traffic is scanned byte by byte but strings from the dictionary, if encountered, are skipped. Upon skipping, the data path recovers to the state it would have been in had the scanning continued byte by byte.
Our solution achieves a significant performance boost, especially when data is from the same content source (e.g., the same website). Our experiments show that for such cases, our solution achieves a throughput gain of 1.25 − 2.5 times the original throughput, when implemented in software.
Configuring range-based packet classification rules in network switches is crucial to all network core functionalities, such as firewalls and routing. However, OpenFlow, the leading management protocol for SDN switches, lacks the interface to configure range rules directly and only provides mask-based rules, named flow entries.
In this work we present ORange, the first solution to multi-dimensional range classification in OpenFlow. Our solution is based on paradigms used in state-of-the-art non-OpenFlow classifiers and is designed in a modular fashion, allowing future extensions and improvements. We consider switch space utilization as well as atomic update functionality, and in the network context we provide flow consistency even if flows change their entrance point to the network during policy updates, a property we name cross-entrance consistency. Our scheme achieves remarkable results and is easy to deploy.
Similarity search, and specifically the nearest-neighbor (NN) search problem, is widely used in many fields of computer science such as machine learning, computer vision, and databases. However, in many settings such searches are known to suffer from the notorious curse of dimensionality, where running time grows exponentially with the dimension d. This causes severe performance degradation when working in high-dimensional spaces. Approximate techniques such as locality-sensitive hashing improve the performance of the search, but are still computationally intensive.
In this paper we propose a new way to solve this problem using a special hardware device called ternary content addressable memory (TCAM). TCAM is an associative memory, which is a special type of computer memory that is widely used in switches and routers for very high speed search applications. We show that the TCAM computational model can be leveraged and adjusted to solve NN search problems in a single TCAM lookup cycle, and with linear space. This concept does not suffer from the curse of dimensionality and is shown to improve the best known approaches for NN by more than four orders of magnitude. Simulation results demonstrate dramatic improvement over the best known approaches for NN, and suggest that TCAM devices may play a critical role in future large-scale databases and cloud applications.
Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance.
The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.
A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date require either O(log n) instructions per operation, or exponential (with key size) space, or expensive special hardware whose cost and latency dramatically increase with the priority queue size. Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining the O(1) steps per PQ operation is critical. Here we present such an acceleration technique called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k−1 primitive priority queues of size n^(1/k) (k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single, size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three √n-size or five ∛n-size primitive H/W priority queues.
Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable perfect line rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.
As a byproduct of our technique we present an O(n) time sorting algorithm in a system equipped with an O(w√n)-entry TCAM, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
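The flavor of composing a large queue from much smaller ones can be illustrated by the greatly simplified two-level structure below (RAM buckets plus one small priority queue tracking bucket minima); this illustrates only the general idea, not the PPQ construction or its O(1) guarantees.

```python
import heapq

class TwoLevelPQ:
    """Keys in [0, universe) live in unsorted RAM buckets; a small priority queue
    (standing in for the hardware PQ) holds lazy (bucket_min, bucket_id) entries."""
    def __init__(self, universe, num_buckets):
        self.width = (universe + num_buckets - 1) // num_buckets
        self.buckets = [[] for _ in range(num_buckets)]
        self.small_pq = []                                  # lazy entries

    def insert(self, key):
        b = key // self.width
        self.buckets[b].append(key)
        heapq.heappush(self.small_pq, (min(self.buckets[b]), b))

    def extract_min(self):
        while self.small_pq:
            bmin, b = heapq.heappop(self.small_pq)
            if self.buckets[b] and min(self.buckets[b]) == bmin:   # entry still valid
                self.buckets[b].remove(bmin)
                if self.buckets[b]:
                    heapq.heappush(self.small_pq, (min(self.buckets[b]), b))
                return bmin
        raise IndexError("empty priority queue")
```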
Performance analysis has been used for over half a century for the design and operations of computer systems and communications networks. This methodology, mostly stochastic in nature, has been based on the underlying paradigm that users are innocent and “well behaving”. Following the flourish of the internet and the subsequent rise of malicious incidents, research started investigating the effect of malicious attacks aimed at degrading system performance. The objective of this work is to understand how system performance is affected by malicious behavior and how performance evaluation should account for it. We do so by considering an array of “classical” systems taken from the literature and examining their degree of vulnerability. These can also serve in providing some initial insights into what makes a system design vulnerable or resilient.
Middleboxes play a major role in contemporary networks, as forwarding packets is often not enough to meet operator demands, and other functionalities (such as security, QoS/QoE provisioning, and load balancing) are required. Traffic is usually routed through a sequence of such middleboxes, which either reside across the network or in a single, consolidated location. Although middleboxes provide a vast range of different capabilities, there are components that are shared among many of them.
A task common to almost all middleboxes that deal with L7 protocols is Deep Packet Inspection (DPI). Today, traffic is inspected from scratch by all the middleboxes on its route. In this paper, we propose to treat DPI as a service to the middleboxes, implying that traffic should be scanned only once, but against the data of all middleboxes that use the service. The DPI service then passes the scan results to the appropriate middleboxes. Having DPI as a service has significant advantages in performance, scalability, robustness, and as a catalyst for innovation in the middlebox domain. Moreover, technologies and solutions for current Software Defined Networks (SDN) (e.g., SIMPLE [42]) make it feasible to implement such a service and route traffic to and from its instances.
A central component in all contemporary intrusion detection systems (IDSs) is their pattern matching algorithms, which are often based on constructing and traversing a deterministic finite automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring the storage of very large DFAs, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as on its power consumption and cost. We propose a novel method to compress DFAs by observing that the state names used by common DFA encodings are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom and encode states in such a way that all transitions to a specific state are represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. Then, the problem of pattern matching is reduced to the well-studied problem of Longest Prefix Match (LPM), which can be solved either in ternary content-addressable memory (TCAM), in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gb/s with low power consumption.
A recursive and fast construction of an n-element priority queue from exponentially smaller hardware priority queues and a size-n RAM is presented. All priority queue implementations to date either require O(log n) instructions per operation, or exponential (with key size) space, or expensive special hardware whose cost and latency dramatically increase with the priority queue size. Hence constructing a priority queue (PQ) from considerably smaller hardware priority queues (which are also much faster) while maintaining the O(1) steps per PQ operation is critical. Here we present such an acceleration technique called the Power Priority Queue (PPQ) technique. Specifically, an n-element PPQ is constructed from 2k−1 primitive priority queues of size n^(1/k) (k = 2, 3, …) and a RAM of size n, where the throughput of the construct beats that of a single, size-n primitive hardware priority queue. For example, an n-element PQ can be constructed from either three √n-size or five ∛n-size primitive H/W priority queues.
Applying our technique to a TCAM-based priority queue results in TCAM-PPQ, a scalable perfect line rate fair queuing of millions of concurrent connections at speeds of 100 Gbps. This demonstrates the benefits of our scheme when used with hardware TCAM; we expect similar results with systolic arrays, shift registers, and similar technologies.
As a byproduct of our technique we present an O(n) time sorting algorithm in a system equipped with an O(w√n)-entry TCAM, where n is the number of items and w is the maximum number of bits required to represent an item, improving on a previous result that used an Ω(n)-entry TCAM. Finally, we provide a lower bound on the time complexity of sorting n elements with a TCAM of size O(n) that matches our TCAM-based sorting algorithm.
We present a basic tool for zero day attack signature extraction. Given two large sets of messages, P, the messages captured in the network at peacetime (i.e., mostly legitimate traffic), and A, the messages captured during attack time (i.e., containing many attack messages), we present a tool for extracting a set S of strings that are frequently found in A and not in P. Therefore, a packet containing one of the strings from S is likely to be an attack packet.
This is an important tool in protecting sites on the Internet from Worm attacks, and Distributed Denial of Service (DDoS) attacks. It may also be useful for other problems, including command and control identification, DNA-sequences analysis, etc. which are beyond the scope of this work.
Two contributions of this paper are the system we developed to extract the required signatures together with the problem definition and the string-heavy hitters algorithm. This algorithm finds popular strings of variable length in a set of messages, using, in a tricky way, the classic heavy-hitter algorithm as a building block. This algorithm is then used by our system to extract the desired signatures. Using our system a yet unknown attack can be detected and stopped within minutes from attack start time.
In this paper we focus on the process of deep packet inspection of compressed web traffic. The major limiting factor in this process imposed by the compression, is the high memory requirements of 32KB per connection. This leads to the requirements of hundreds of megabytes to gigabytes of main memory on a multi-connection setting. We introduce new algorithms and techniques that drastically reduce this space requirement for such bump-in-the-wire devices like security and other content based networking tools. Our proposed scheme improves both space and time performance by almost 80% and over 40% respectively, thus making real-time compressed traffic inspection a viable option for networking devices.
Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput.
TCAMs are not well-suited, however, for representing rules that contain range fields, and previously published algorithms typically represent each such rule by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization because it introduces a large number of redundant TCAM entries. This redundancy can be mitigated by making use of extra bits available in each TCAM entry.
We present a scheme for constructing efficient representations of range rules, based on the simple observation that sets of disjoint ranges may be encoded much more efficiently than sets of overlapping ranges. Since the ranges in real-world classification databases are, in general, non-disjoint, the algorithms we present split the ranges into multiple layers, each consisting of mutually disjoint ranges. Each layer is then coded and assigned its own set of extra bits.
Our layering algorithms are based on approximations for specific variants of interval-graph coloring. We evaluate these algorithms through an extensive comparative analysis on real-life classification databases. Our analysis establishes that our algorithms reduce the number of redundant TCAM entries caused by range rules by more than 60% compared with the best prior range-encoding work.
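As an illustration of the layering step, the following sketch greedily assigns ranges, sorted by left endpoint, to the first layer in which they do not overlap anything already placed. The paper's algorithms are approximations for specific variants of interval-graph coloring, of which this first-fit greedy is only the simplest flavor; the function name and example rules are ours.

def layer_ranges(ranges):
    """Sketch of the layering step: partition (possibly overlapping) closed
    integer ranges into layers such that the ranges inside each layer are
    mutually disjoint. Greedy first-fit over ranges sorted by left endpoint."""
    layers = []        # each layer: list of ranges, kept mutually disjoint
    layer_end = []     # right endpoint of the last range placed in each layer
    for lo, hi in sorted(ranges):
        for i, end in enumerate(layer_end):
            if lo > end:                  # disjoint from everything in layer i
                layers[i].append((lo, hi))
                layer_end[i] = hi
                break
        else:                             # no existing layer fits: open a new one
            layers.append([(lo, hi)])
            layer_end.append(hi)
    return layers

# Example: port ranges taken from firewall-like rules.
rules = [(0, 1023), (80, 80), (1024, 65535), (443, 443), (1024, 49151)]
for i, layer in enumerate(layer_ranges(rules)):
    print(f"layer {i}: {layer}")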
Ternary content-addressable memories (TCAMs) are increasingly used for high-speed packet classification. TCAMs compare packet headers against all rules in a classification database in parallel and thus provide high throughput unparalleled by software-based solutions. TCAMs are not well-suited, however, for representing rules that contain range fields. Such rules typically have to be represented (or encoded) by multiple TCAM entries. The resulting range expansion can dramatically reduce TCAM utilization. A TCAM range-encoding algorithm A is database-independent if, for all ranges r, it encodes r independently of the database in which it appears; otherwise, we say that A is database-dependent. Typically, when storing a classification database in TCAM, a few dozen so-called extra bits in each TCAM entry remain unused. These extra bits are used by some (both database-dependent and database-independent) prior algorithms to reduce range expansion. The majority of real-life database ranges are short. We present a novel database-independent algorithm called Short Range Gray Encoding (SRGE) for the efficient representation of short-range rules. SRGE encodes range endpoints as binary-reflected Gray codes and then represents the resulting range by a minimal set of ternary strings. To the best of our knowledge, SRGE is the first algorithm that achieves a reduction in range expansion in general, and a significant expansion reduction for short ranges in particular, without resorting to the use of extra bits. The “traditional” database-independent technique for representing range entries in TCAM is prefix expansion. As we show, SRGE significantly reduces the expansion of short ranges in comparison with prefix expansion. We also prove that the SRGE algorithm’s range expansion is at least as good as that of prefix expansion for any range. Real-world classification databases contain a small number of unique long ranges, some of which appear in numerous rules. These long ranges cause high expansion, which is not significantly reduced by any database-independent range encoding scheme that we are aware of, including SRGE. We introduce hybrid SRGE, a database-dependent encoding scheme that uses SRGE for reducing the expansion of short ranges and uses extra bits for reducing the expansion caused by long ones. Our comparative analysis establishes that hybrid SRGE utilizes TCAM more efficiently than previously published range-encoding algorithms. This work also makes a more theoretical contribution. Prefix expansion for ranges defined by W-bit endpoints has a worst-case expansion ratio of 2W − 2. It follows from the work of Schieber et al. [1] that the SRGE algorithm has a slightly better worst-case expansion ratio of 2W − 4. We prove that any database-independent TCAM encoding scheme has a worst-case expansion ratio of at least W.
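The key building block of SRGE is the binary-reflected Gray code, under which consecutive integers differ in exactly one bit, so the two endpoints of a two-value range can be merged into a single ternary entry. The sketch below shows only that core observation; the helper names and the merge of exactly two codewords are our illustration, not the paper's full procedure for covering an arbitrary short range with a minimal set of ternary strings.

def gray(x: int) -> int:
    """Binary-reflected Gray code of x."""
    return x ^ (x >> 1)

def ternary_merge(a: int, b: int, width: int):
    """If the width-bit Gray codes of a and b differ in exactly one bit,
    return the single ternary string covering both (with '*' at that bit);
    otherwise return None. Only the core observation behind SRGE is shown."""
    ga, gb = gray(a), gray(b)
    diff = ga ^ gb
    if diff == 0 or (diff & (diff - 1)) != 0:   # zero or more than one differing bit
        return None
    bits = []
    for i in reversed(range(width)):
        bits.append('*' if (diff >> i) & 1 else str((ga >> i) & 1))
    return ''.join(bits)

# In standard binary, the range [7, 8] straddles a power-of-two boundary and
# prefix expansion needs two entries (0111 and 1000). In Gray code the two
# endpoints differ in one bit, so a single ternary entry suffices:
print(format(gray(7), '04b'), format(gray(8), '04b'))   # 0100 1100
print(ternary_merge(7, 8, width=4))                      # *100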
Deep Packet Inspection (DPI) is the most time- and resource-consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention Systems (NIDS/IPS), Web Application Firewalls (WAF), and Content Filtering Proxies. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
The portion of overall Internet traffic that is compressed is constantly increasing. This paper focuses on traffic compressed using a shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of inter-response redundancy (i.e., nearly identical data being sent over and over again), which is typical of today's dynamic data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well together with other compression algorithms (such as GZIP), making it even more appealing. Performing DPI on any compressed traffic is considered hard; therefore, today's security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and on marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works at nearly the rate of the compressed traffic, implying a speed gain equal to SDCH's compression ratio (around 40%). We also discuss how to deal with SDCH compression applied on top of GZIP compression, and show how to perform regular expression matching with about the same speed gain.
This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. The attacker then sends enough heavy messages to bring the system, at best, to a crawl. In our architecture, called MCA^2 (Multi-Core Architecture for Mitigating Complexity Attacks), cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handling all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack were taking place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for the Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by only 20% when no packets are dropped, or by 10% when dropping heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40%, and 23%.
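The following toy sketch illustrates the dispatching principle described above: a cheap classifier flags messages that are expected to be expensive and diverts them to a small, dedicated pool of workers, while the remaining workers keep serving normal traffic. The threshold, the looks_heavy heuristic, and the thread counts are our own illustrative assumptions; MCA^2 itself identifies heavy packets at run time (e.g., by their cache-miss behavior) and balances the cores dynamically.

import queue
import threading
from collections import Counter

HEAVY_THRESHOLD = 1000    # illustrative cost units (e.g., expected cache misses)
handled = Counter()
lock = threading.Lock()

def looks_heavy(packet) -> bool:
    """Hypothetical classifier flagging suspicious, resource-hungry packets."""
    return packet["estimated_cost"] > HEAVY_THRESHOLD

def worker(name, q):
    while True:
        packet = q.get()
        if packet is None:                 # shutdown signal
            q.task_done()
            break
        # ... full DPI / heavy processing of the packet would happen here ...
        with lock:
            handled[name.split("-")[0]] += 1
        q.task_done()

normal_q, heavy_q = queue.Queue(), queue.Queue()
# Most cores serve legitimate traffic; a small fraction absorbs heavy messages.
workers = (
    [threading.Thread(target=worker, args=(f"core-{i}", normal_q)) for i in range(6)]
    + [threading.Thread(target=worker, args=(f"heavy-{i}", heavy_q)) for i in range(2)]
)
for t in workers:
    t.start()

packets = [{"id": i, "estimated_cost": 5000 if i % 10 == 0 else 50} for i in range(100)]
for p in packets:
    (heavy_q if looks_heavy(p) else normal_q).put(p)   # divert heavy traffic

normal_q.join()
heavy_q.join()
for _ in range(6):
    normal_q.put(None)
for _ in range(2):
    heavy_q.put(None)
for t in workers:
    t.join()
print(handled)    # e.g. Counter({'core': 90, 'heavy': 10})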
A method for processing communication traffic includes receiving an incoming stream of compressed data conveyed by a sequence of data packets, each containing a respective portion of the compressed data. The respective portion of the compressed data contained in the first packet is stored in a buffer having a predefined buffer size. Upon receiving a subsequent packet, at least a part of the compressed data stored in the buffer and the respective portion of the compressed data contained in the subsequent packet are decompressed, thereby providing decompressed data. A most recent part of the decompressed data that is within the buffer size is recompressed and stored in the buffer.
Current security tools, using ‘signature-based’ detection, do not handle compressed traffic, whose market share is constantly increasing. This paper focuses on compressed HTTP traffic. HTTP uses GZIP compression, which requires some kind of decompression phase before string matching can be performed.
We present a novel algorithm, the Aho-Corasick-based algorithm for Compressed HTTP (ACCH), that takes advantage of information gathered by the decompression phase in order to accelerate the commonly used Aho-Corasick pattern matching algorithm. By analyzing real HTTP traffic and real web application firewall signatures, we show that up to 84% of the data can be skipped in the scan. Surprisingly, we show that it is faster to perform pattern matching on the compressed data, even with the penalty of decompression, than on regular traffic. As far as we know, this is the first paper to analyze the problem of ‘on-the-fly’ multi-pattern matching on compressed HTTP traffic and to suggest a solution.
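The gain comes from the structure of GZIP/DEFLATE traffic: back-references repeat text that was already scanned. The sketch below is not the ACCH algorithm itself (which works on the Aho-Corasick automaton and keeps per-byte status information to handle region boundaries); it only shows, for plain fixed patterns, why the interior of a long repeated region never has to be rescanned. The function, its inputs, and the example traffic are illustrative assumptions.

def scan_with_skips(text: bytes, copies, patterns):
    """Toy illustration of the skipping idea. `copies` lists (start, length)
    regions of `text` produced by LZ77 back-references whose source text has
    already been scanned and found match-free. A match lying entirely inside
    such a copy would also occur at its source, so only positions near the
    copy's boundaries need to be examined."""
    m = max(len(p) for p in patterns)
    skip = [False] * len(text)
    for start, length in copies:
        # Any match starting here (length <= m) ends inside the copied region.
        for i in range(start, start + length - m + 1):
            skip[i] = True
    matches = []
    for i in range(len(text)):
        if skip[i]:
            continue
        for p in patterns:
            if text.startswith(p, i):
                matches.append((i, p))
    return matches, sum(skip)

patterns = [b"attack", b"virus"]
base = b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n"
text = base + base                 # the second copy arrives as a back-reference
copies = [(len(base), len(base))]
matches, skipped = scan_with_skips(text, copies, patterns)
print(matches, f"{skipped}/{len(text)} bytes skipped")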
Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signatures. The major limiting factor in this process is the high memory requirement of 32KB per connection, which leads to hundreds of megabytes to gigabytes of main memory consumption. This requirement inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities. In this paper we introduce new algorithms and techniques that drastically reduce this space requirement by over 80%, with only a slight increase in time overhead, thus making real-time compressed traffic inspection a viable option for network devices.
Deep Packet Inspection (DPI) lies at the core of contemporary Network Intrusion Detection/Prevention Systems and Web Application Firewalls. DPI aims to identify various malware (including spam and viruses) by inspecting both the header and the payload of each packet and comparing it to a known set of patterns. DPI is often performed on the critical path of packet processing; thus, the overall performance of the security tools is dominated by the speed of DPI. The seminal algorithm of Aho-Corasick (AC) is the de-facto standard for pattern matching in network intrusion detection systems (NIDS). Basically, the AC algorithm constructs a Deterministic Finite Automaton (DFA) for detecting all occurrences of a given set of patterns by processing the input in a single pass. The input is inspected symbol by symbol (usually each symbol is a byte), such that each symbol results in a state transition. Thus, in principle, the AC algorithm has deterministic performance, which does not depend on the specific input and is therefore not vulnerable to algorithmic complexity attacks, making it very attractive for NIDS.
In this paper we show that, when implementing the AC algorithm in software, this property does not hold, due to the fact that contemporary pattern sets induce very large DFAs that cannot be stored entirely in cache. We then propose a novel technique to compress the representation of the Aho-Corasick automaton so that it fits in modern caches. We compare both the performance and the memory footprint of our technique to previously proposed implementations, under various settings and pattern sets. Our results reveal the space-time tradeoffs of DPI. Specifically, we show that our compression technique reduces the memory footprint of the best prior-art algorithm by approximately 60%, while achieving comparable throughput.
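For reference, a compact sketch of the classic Aho-Corasick construction that the paper builds on: a trie of the patterns with failure links computed by BFS, traversed one input byte per state transition. The paper's contribution (compressing this automaton so that it fits in the cache) is not shown here; the code below is only the textbook algorithm in Python.

from collections import deque

def build_ac(patterns):
    """Textbook Aho-Corasick: trie transitions (goto), failure links (fail),
    and per-state output sets of matched patterns."""
    goto, fail, output = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto[state][ch] = len(goto)
                goto.append({})
                fail.append(0)
                output.append(set())
            state = goto[state][ch]
        output[state].add(pat)
    q = deque(goto[0].values())          # BFS to compute failure links
    while q:
        s = q.popleft()
        for ch, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            output[t] |= output[fail[t]]
    return goto, fail, output

def ac_search(text, goto, fail, output):
    """Scan the input one byte at a time, one state transition per byte."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits

goto, fail, out = build_ac([b"he", b"she", b"his", b"hers"])
print(ac_search(b"ushers", goto, fail, out))   # finds 'she' at 1, 'he' and 'hers' at 2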
Compressing web traffic using standard GZIP is becoming both popular and challenging due to the huge increase in wireless web devices, where bandwidth is limited. Security and other content-based networking devices are required to decompress the traffic of tens of thousands of concurrent connections in order to inspect the content for different signatures. The overhead imposed by the decompression inhibits most devices from handling compressed traffic, which in turn either limits traffic compression or introduces security holes and other dysfunctionalities.
The ACCH algorithm was the first to present a unified approach to pattern matching and decompression, by taking advantage of information gathered in the decompression phase to accelerate the pattern matching. ACCH accelerated the DFA-based Aho-Corasick multi-pattern matching algorithm. In this paper, we present a novel algorithm, SPC (Shift-based Pattern matching for Compressed traffic), that accelerates the commonly used Wu-Manber pattern matching algorithm. SPC is simpler and has higher throughput and lower storage overhead than ACCH. Analysis of real web traffic and real security device signatures shows that we can skip scanning up to 87.5% of the data and gain a performance boost of more than 51% as compared to ACCH. Moreover, the technique requires only 4KB of additional information per connection, compared to 8KB for ACCH.
A method for processing data includes encoding a finite automaton, which includes states and transitions between the states that express a plurality of predefined patterns, by grouping the states of the automaton into sets according to a common property shared by the states in each set, and assigning codes to the states according to the grouping. The codes are stored in an electronic memory, along with rules that are associated with the patterns. The automaton is traversed in order to identify one or more of the patterns in an input sequence of data elements by iteratively reading out the codes from the memory responsively to the data elements and to the codes that have been previously read out. Upon identifying a given pattern in the input sequence, an associated action is performed.
A method for processing data includes accepting a specification of a plurality of patterns, each pattern defining a respective uncompressed sequence of symbols. Multi-pattern matching is applied to an incoming stream of compressed communication traffic containing compression metadata so as to identify the patterns occurring in the stream while using the compression metadata to skip over parts of the stream.
Ternary content-addressable memory (TCAM) devices are increasingly used for performing high-speed packet classification. A TCAM consists of an associative memory that compares a search key in parallel against all entries. TCAMs may suffer from error events that cause ternary cells to change their value to any symbol in the ternary alphabet {"0", "1", "*"}. Due to their parallel access feature, standard error detection schemes are not directly applicable to TCAMs; an additional difficulty is posed by the special semantics of the "*" symbol. This paper introduces PEDS, a novel parallel error detection scheme that locates the erroneous entries in a TCAM device. PEDS is based on applying an error-detecting code to each TCAM entry and utilizing the parallel capabilities of the TCAM by simultaneously checking the correctness of multiple TCAM entries. A key feature of PEDS is that the number of TCAM lookup operations required to locate all errors depends on the number of symbols per entry, which is typically orders of magnitude smaller than the number of TCAM entries. For large TCAM devices, a specific instance of PEDS requires only 200 lookups for 100-symbol entries, while a naive approach may need hundreds of thousands of lookups. PEDS allows flexible and dynamic selection of tradeoff points between robustness, space complexity, and number of lookups.
Ternary content-addressable memory (TCAM) devices are increasingly used for performing high-speed packet classification. A TCAM consists of an associative memory that compares a search key in parallel against all entries. TCAMs may suffer from error events that cause ternary cells to change their value to any symbol in the ternary alphabet {"0", "1", "*"}. Due to their parallel access feature, standard error detection schemes are not directly applicable to TCAMs; an additional difficulty is posed by the special semantics of the "*" symbol. This paper introduces PEDS, a novel parallel error detection scheme that locates the erroneous entries in a TCAM device. PEDS is based on applying an error-detecting code to each TCAM entry and utilizing the parallel capabilities of the TCAM by simultaneously checking the correctness of multiple TCAM entries. A key feature of PEDS is that the number of TCAM lookup operations required to locate all errors depends on the number of symbols per entry rather than the (orders-of-magnitude larger) number of TCAM entries. For large TCAM devices, a specific instance of PEDS requires only 200 lookups for 100-symbol entries, while a naive approach may need hundreds of thousands of lookups. PEDS allows flexible and dynamic selection of trade-off points between robustness, space complexity, and number of lookups.
Aggressive use of networks, in particular the Internet, either by malicious or innocent users, threatens the service availability and quality of polite applications. Common queueing mechanisms, which supposedly solve the problem, are shown in this work to be ineffective for bursty applications, including Web applications. This can be exploited by malicious users to conduct a new kind of Denial of Service attack. We propose a new traffic control mechanism called Aggressiveness Protective Queuing (APQ), which is based on attributing importance weights to the users and which solves this problem by dynamically decreasing the weight of aggressive users. The actual weight used for a flow is a dynamically varying parameter reflecting the flow's past bandwidth usage. We show that under heavy load (in a deterministic model), APQ significantly restricts the amount of traffic an aggressive user can send, bounding it to at most twice the amount of traffic sent by a polite (regular) user. Simulation results demonstrate the effectiveness of APQ in a stochastic environment.
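To illustrate the principle of dynamically shrinking the weight of flows with large past usage, the following sketch combines a crude virtual-finish-time scheduler with a weight that drops as a flow's recent byte count grows. The weight formula, the decay factor, and the scheduler details are illustrative assumptions, not the exact APQ mechanism or its analysis.

import heapq
from collections import defaultdict

class DynamicWeightScheduler:
    """Toy sketch of the APQ principle: every flow starts with the same
    weight, and the weight shrinks as the flow's recent usage grows, so
    bursty or aggressive flows cannot crowd out polite ones. The scheduling
    below is an approximate weighted-fair-queuing discipline based on
    per-flow virtual finish times."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.usage = defaultdict(float)    # smoothed bytes recently sent per flow
        self.finish = defaultdict(float)   # per-flow virtual finish time
        self.queue = []                    # (finish time, sequence number, flow, size)
        self.seq = 0

    def weight(self, flow):
        # Weight decreases as the flow's recent usage grows.
        return 1.0 / (1.0 + self.usage[flow])

    def enqueue(self, flow, size):
        start = max(self.finish[flow], self._virtual_now())
        self.finish[flow] = start + size / self.weight(flow)
        heapq.heappush(self.queue, (self.finish[flow], self.seq, flow, size))
        self.seq += 1

    def dequeue(self):
        _, _, flow, size = heapq.heappop(self.queue)
        for f in self.usage:               # age all usage counters
            self.usage[f] *= self.decay
        self.usage[flow] += size           # account the transmission
        return flow, size

    def _virtual_now(self):
        return self.queue[0][0] if self.queue else 0.0

sched = DynamicWeightScheduler()
sched.usage["aggressive"] = 50000.0    # this flow has recently sent a lot
sched.usage["polite"] = 1000.0
for _ in range(5):
    sched.enqueue("aggressive", 1500)
sched.enqueue("polite", 1500)
print([sched.dequeue()[0] for _ in range(6)])
# ['aggressive', 'polite', 'aggressive', ...]: despite arriving last, the
# polite packet overtakes most of the aggressive backlog, because the
# aggressive flow's large past usage has shrunk its weight.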
Path Layout is a fundamental graph problem in label switching protocols. The problem arises in various protocols, such as the traditional ATM protocol and MPLS, a label switching protocol recently standardized by the IETF. Path layout is essentially the problem of reducing the size of the label table in a router. This size equals the number of different paths that pass through, or start from, the router. A reduction in size can be achieved by choosing a relatively small number of paths, from which a larger set is composed using concatenation.
In this paper we deal with three variants of the Path Layout Problem, according to the special characteristics of paths in three label switching protocols: MPLS, ATM, and TRAINET. We focus on tree networks and present an algorithm that finds label tables of small size while permitting concatenation of at most k paths. We prove that this algorithm gives worst-case tight bounds (up to a constant factor) for all three models. The bounds are given as a function of the size of the tree and the maximum degree.
A method for communication management includes detecting addresses of peer nodes (34) belonging to a service layer (30) of a distributed application running on a computer network (22). Responsively to the detected addresses, filtering of communication traffic transmitted by client computers (26) is actuated so as to limit access by the client computers to the distributed application.
In this paper, a new mechanism called Aggressiveness Protective Queuing (APQ) is described for controlling traffic in network devices. The method is based on the principles of the WFQ scheme and uses dynamic weights for its operation. APQ uses a simple mechanism for tracking the resource usage of the flows and uses this accounting to dynamically control the WFQ weights. Analytic and simulation results are provided to demonstrate the effectiveness of APQ in controlling bursty traffic.