Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman and Randy H. Katz

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2006-76
May 22, 2006

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-76.pdf

Packet content scanning at high speed has become extremely important due to its applications in network security, network monitoring, HTTP load balancing, etc. In content scanning, the packet payload is compared against a set of patterns specified as regular expressions. In this paper, we first show that memory requirements using traditional methods are prohibitively high for many patterns used in packet scanning applications. We then propose regular expression rewrite techniques that can effectively reduce memory usage. Further, we develop a grouping scheme that can strategically compile a set of regular expressions into several engines, resulting in remarkable improvement of regular expression matching speed without much increase in memory usage. We implement a new DFA-based packet scanner using the above techniques. Our experimental results using real-world traffic and patterns show that our implementation achieves a factor of 12 to 42 performance improvement over a commonly used DFA-based scanner. Compared to the state-of-art NFA-based implementation, our DFA-based packet scanner achieves 50 to 700 times speedup.


BibTeX citation:

@techreport{Yu:EECS-2006-76,
    Author = {Yu, Fang and Chen, Zhifeng and Diao, Yanlei and Lakshman, T.V. and Katz, Randy H.},
    Title = {Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2006},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-76.html},
    Number = {UCB/EECS-2006-76},
    Abstract = {Packet content scanning at high speed has become extremely important due to its applications in network security, network monitoring, HTTP load balancing, etc. In content scanning, the packet payload is compared against a set of patterns specified as regular expressions.  In this paper, we first show that memory requirements using traditional methods are prohibitively high for many patterns used in packet scanning applications.  We then propose regular expression rewrite techniques that can effectively reduce memory usage. Further, we develop a grouping scheme that can strategically compile a set of regular expressions into several engines, resulting in remarkable improvement of regular expression matching speed without much increase in memory usage. We implement a new DFA-based packet scanner using the above techniques. Our experimental results using real-world traffic and patterns show that our implementation achieves a factor of 12 to 42 performance improvement over a commonly used DFA-based scanner.  Compared to the state-of-art NFA-based implementation, our DFA-based packet scanner achieves 50 to 700 times speedup.}
}

EndNote citation:

%0 Report
%A Yu, Fang
%A Chen, Zhifeng
%A Diao, Yanlei
%A Lakshman, T.V.
%A Katz, Randy H.
%T Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection
%I EECS Department, University of California, Berkeley
%D 2006
%8 May 22
%@ UCB/EECS-2006-76
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-76.html
%F Yu:EECS-2006-76