Tel-Aviv University
Advisors: Prof. Yehuda Afek and Prof. Anat Bremler-Barr
Graduation 2013
Post-doc at Princeton University with Prof. Jennifer Rexford
Current: AT&T Lead Inventive Scientist
Tel-Aviv University
Advisors: Prof. Yehuda Afek and Prof. Anat Bremler-Barr
Graduation 2013
Post-doc at Princeton University with Prof. Jennifer Rexford
Current: AT&T Lead Inventive Scientist
This paper focuses on regular expression matching over compressed traffic. The need for such matching arises from two independent trends. First, the volume and share of compressed HTTP traffic is constantly increasing. Second, due to their superior expressibility, current Deep Packet Inspection engines use regular expressions more and more frequently.
We present an algorithmic framework to accelerate such matching, taking advantage of information gathered when the traffic was initially compressed. HTTP compression is typically performed through the GZIP protocol, which uses back-references to repeated strings. Our algorithm is based on calculating (for every byte) the minimum number of (previous) bytes that can be part of a future regular expression matching. When inspecting a back-reference, only these bytes should be taken into account, thus enabling one to skip repeated strings almost entirely without missing a match. We show that our generic framework works with either NFA-based or DFA-based implementations and gains performance boosts of more than 70%. Moreover, it can be readily adapted to most existing regular expression matching algorithms, which usually are based either on NFA, DFA or combinations of the two. Finally, we discuss other applications in which calculating the number of relevant bytes becomes handy, even when the traffic is not compressed.
Middleboxes play a major role in contemporary networks, as for- warding packets is often not enough to meet operator demands, and other functionalities (such as security, QoS/QoE provisioning, and load balancing) are required. Traffic is usually routed through a sequence of such middleboxes, which either reside across the net- work or in a single, consolidated location. Although middleboxes provide a vast range of different capabilities, there are components that are shared among many of them.
A task common to almost all middleboxes that deal with L7 protocols is Deep Packet Inspection (DPI). Today, traffic is inspected from scratch by all the middleboxes on its route. In this paper, we propose to treat DPI as a service to the middleboxes, implying that traffic should be scanned only once, but against the data of all middleboxes that use the service. The DPI service then passes the scan results to the appropriate middleboxes. Having DPI as a service has significant advantages in performance, scalability, robustness, and as a catalyst for innovation in the middlebox domain. Moreover, technologies and solutions for current Software Defined Networks (SDN) (e.g., SIMPLE [42]) make it feasible to implement such a service and route traffic to and from its instances.
A central component in all contemporary intrusion detection systems (IDSs) is their pattern matching algorithms, which are often based on constructing and traversing a deterministic finite automaton (DFA) that represents the patterns. While this approach ensures deterministic time guarantees, modern IDSs need to deal with hundreds of patterns, thus requiring to store very large DFAs, which usually do not fit in fast memory. This results in a major bottleneck on the throughput of the IDS, as well as its power consumption and cost. We propose a novel method to compress DFAs by observing that the name used by common DFA encoding is meaningless. While regular DFAs store separately each transition between two states, we use this degree of freedom and encode states in such a way that all transitions to a specific state are represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be categorized by simple properties. Then, the problem of pattern matching is reduced to the well-studied problem of Longest Prefix Match (LPM), which can be solved either in ternary content-addressable memory (TCAM), in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gb/s with low power consumption.
Deep Packet Inspection (DPI) is the most time and resource consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention System (NIDS/IPS), Web Application Firewall (WAF), or Content Filtering Proxy. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
The portion of compressed traffic of overall Internet traffic is constantly increasing. This paper focuses on traffic compressed using shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of the inter-response redundancy (e.g., almost the same data is sent over and over again) as in nowadays dynamic Data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well with other compression algorithm (as Gzip), making it even more appealing. Performing DPI on any compressed traffic is considered hard, therefore today’s security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works near the rate of the compressed traffic, implying a speed gain of SDCH’s compression ratio (which is around 40%). We also discuss how to deal with SDCH compression over Gzip compression, and show how to perform regular expression matching with about the same speed gain.
This paper takes advantage of the emerging multi-core computer architecture to design a general framework for mitigating network-based complexity attacks. In complexity attacks, an attacker carefully crafts “heavy” messages (or packets) such that each heavy message consumes substantially more resources than a normal message. Then, it sends a sufficient number of heavy messages to bring the system to a crawl at best. In our architecture, called MCA^2 – Multi-Core Architecture for Mitigating Complexity Attacks – cores quickly identify such suspicious messages and divert them to a fraction of the cores that are dedicated to handle all the heavy messages. This keeps the rest of the cores relatively unaffected and free to provide the legitimate traffic the same quality of service as if no attack takes place.
We demonstrate the effectiveness of our architecture by examining cache-miss complexity attacks against Deep Packet Inspection (DPI) engines. For example, for Snort DPI engine, an attack in which 30% of the packets are malicious degrades the system throughput by over 50%, while with MCA^2 the throughput drops by either 20% when no packets are dropped or by 10% in case dropping of heavy packets is allowed. At 60% malicious packets, the corresponding numbers are 70%, 40% and 23%.
DEEPNESS Lab 2022 © all rights reserved