Bidirectional Protocol Reverse Engineering: Message Format Extraction and Field Semantics Inference

Juan Caballero, Pongsin Poosankam, Christian Kreibich and Dawn Song

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2009-57
May 5, 2009

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-57.pdf

Automatic protocol reverse-engineering is important for many security applications, including the analysis of and defense against botnets. Understanding a botnet’s command-and-control (C&C) protocol is crucial for anticipating the botnet’s repertoire of nefarious activity and for enabling active botnet infiltration. Frequently, messages sent and received by a bot have to be rewritten in order to contain malicious activity and to provide the botmaster with an illusion of successful and unhampered operation. To enable such rewriting, we need detailed information about the intent and structure of the messages in both directions of the communication despite the fact that we generally only have access to the implementation of one endpoint, namely the bot binary. Current techniques cannot enable such rewriting. In this paper, we propose techniques to extract the format of the protocol messages sent by an application that implements a protocol specification, and to infer the field semantics for messages both sent and received by the application. Our techniques enable applications such as rewriting the C&C messages for active botnet infiltration. We implement our techniques in Dispatcher, a tool to extract the message format and field semantics of both received and sent messages. We use Dispatcher to analyze MegaD, a prevalent spam botnet employing a hitherto undocumented C&C protocol, and show that the protocol information extracted by Dispatcher can be used to rewrite the messages sent upstream to the botmaster.


BibTeX citation:

@techreport{Caballero:EECS-2009-57,
    Author = {Caballero, Juan and Poosankam, Pongsin and Kreibich, Christian and Song, Dawn},
    Title = {Bidirectional Protocol Reverse Engineering: Message Format Extraction and Field Semantics Inference},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2009},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-57.html},
    Number = {UCB/EECS-2009-57},
    Abstract = {Automatic protocol reverse-engineering is important for many security applications, including the analysis of and defense against botnets. Understanding a botnet's command-and-control (C\&C) protocol is crucial for anticipating the botnet's repertoire of nefarious activity and for enabling active botnet infiltration. Frequently, messages sent and received by a bot have to be rewritten in order to contain malicious activity and to provide the botmaster with an illusion of successful and unhampered operation. To enable such rewriting, we need detailed information about the intent and structure of the messages in both directions of the communication despite the fact that we generally only have access to the implementation of one endpoint, namely the bot binary. Current techniques cannot enable such rewriting. In this paper, we propose techniques to extract the format of the protocol messages sent by an application that implements a protocol specification, and to infer the field semantics for messages both sent and received by the application. Our techniques enable applications such as rewriting the C\&C messages for active botnet infiltration. We implement our techniques in Dispatcher, a tool to extract the message format and field semantics of both received and sent messages. We use Dispatcher to analyze MegaD, a prevalent spam botnet employing a hitherto undocumented C\&C protocol, and show that the protocol information extracted by Dispatcher can be used to rewrite the messages sent upstream to the botmaster.}
}

EndNote citation:

%0 Report
%A Caballero, Juan
%A Poosankam, Pongsin
%A Kreibich, Christian
%A Song, Dawn
%T Bidirectional Protocol Reverse Engineering: Message Format Extraction and Field Semantics Inference
%I EECS Department, University of California, Berkeley
%D 2009
%8 May 5
%@ UCB/EECS-2009-57
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-57.html
%F Caballero:EECS-2009-57