URL matching lies at the core of many networking applications and Information Centric Networking architectures. For example, URL matching is extensively used by Layer 7 switches, ICN/NDN routers, load balancers, and security devices. Modern URL matching is done by maintaining a rich database that consists of tens of millions of URL which are classified to dozens of categories (or egress ports). In real-time, any input URL has to be searched in this database to find the corresponding category.
In this paper, we introduce a generic framework for accurate URL matching (namely, no false positives or miscategorization) that aims to reduce the overall memory footprint, while still having low matching latency. We introduce a dictionary-based compression method that compresses the database by 60%, while having only a slight overhead in time. Our framework is very flexible and it allows hot-updates,
cloud-based deployments, and can deal with strings that are not URLs.
Deep Packet Inspection (DPI) plays a major role in contemporary networks. Specifically, in datacenters of content providers, the scanned data may be highly repetitive. Most DPI engines are based on identifying signatures in the packet payload. This pattern matching process is expensive both in memory and CPU resources, and thus, often becomes the bottleneck of the entire application.
In this paper we show how DPI can be accelerated by leveraging repetitions in the inspected traffic. Our new mechanism makes use of these repetitions to allow the repeated data to be skipped rather than scanned again. The mechanism consists of a slow path, in which frequently repeated strings are identified and stored in a dictionary, along with some succinct information for accelerating the DPI process, and a data path, where the traffic is scanned byte by byte but strings from the dictionary, if encountered, are skipped. Upon skipping, the data path recovers to the state it would have been in had the scanning continued byte by byte.
Our solution achieves a significant performance boost, especially when data is from the same content source (e.g., the same website). Our experiments show that for such cases, our solution achieves a throughput gain of 1.25 − 2.5 times the original throughput, when implemented in software.
Deep Packet Inspection (DPI) is the most time and resource consuming procedure in contemporary security tools such as Network Intrusion Detection/Prevention System (NIDS/IPS), Web Application Firewall (WAF), or Content Filtering Proxy. DPI consists of inspecting both the packet header and payload and alerting when signatures of malicious software appear in the traffic. These signatures are identified through pattern matching algorithms.
The portion of compressed traffic of overall Internet traffic is constantly increasing. This paper focuses on traffic compressed using shared dictionary. Unlike traditional compression algorithms, this compression method takes advantage of the inter-response redundancy (e.g., almost the same data is sent over and over again) as in nowadays dynamic Data. Shared Dictionary Compression over HTTP (SDCH), introduced by Google in 2008, is the first algorithm of this type. SDCH works well with other compression algorithm (as Gzip), making it even more appealing. Performing DPI on any compressed traffic is considered hard, therefore today’s security tools either do not inspect compressed data, alter HTTP headers to avoid compression, or decompress the traffic before inspecting it.
We present a novel pattern matching algorithm that inspects SDCH-compressed traffic without decompressing it first. Our algorithm relies on offline inspection of the shared dictionary, which is common to all compressed traffic, and marking auxiliary information on it to speed up the online DPI inspection. We show that our algorithm works near the rate of the compressed traffic, implying a speed gain of SDCH’s compression ratio (which is around 40%). We also discuss how to deal with SDCH compression over Gzip compression, and show how to perform regular expression matching with about the same speed gain.