TreeRegex: An Extension to Regular Expressions for Matching and Manipulating Tree-Structured Text (Technical Report)
Benjamin Mehne
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2017-202
December 12, 2017
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-202.pdf
Tree-structured text is ubiquitous in software engineering and programming tasks. However, despite its prevalence, users frequently write custom, specialized routines to query and update such text. For example, a user might wish to rapidly prototype a compiler for a domain-specific language by issuing successive transformations, or they might wish to identify all the call sites of a particular function in a project (e.g., eval in JavaScript). We propose a natural and intuitive extension to regular expressions, called TreeRegex, which can specify patterns over tree-structured text. A key insight behind the design of TreeRegex is that if we annotate a string with special markers to expose information about the string’s tree structure, then a simple extension to regular expressions can be used to describe patterns over the annotated string. We develop an algorithm for matching TreeRegex expressions against annotated texts and report on five case studies where we find that using TreeRegex simplifies various tasks related to searching and modifying tree-structured texts.
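The annotation idea in the abstract can be illustrated with a minimal Python sketch. It is not the report's implementation: the `(%` and `%)` markers, the nested-list tree encoding, and the helper names `annotate` and `find_eval_leaf_args` are assumptions chosen for illustration. The sketch serializes a small tree into an annotated string and then uses an ordinary regular expression to find calls to eval whose argument is a single leaf.

```python
import re

# Hypothetical markers that expose subtree boundaries in the flat string.
OPEN, CLOSE = "(%", "%)"

def annotate(node):
    """Serialize a tree (nested lists of strings) into an annotated string."""
    if isinstance(node, str):
        return node  # leaf text is emitted unchanged
    # An internal node wraps the concatenation of its children in markers.
    return OPEN + "".join(annotate(child) for child in node) + CLOSE

# Ordinary regex: an 'eval' node whose argument subtree contains only leaf text.
EVAL_LEAF_CALL = re.compile(r"\(%eval\(%([^%()]*)%\)%\)")

def find_eval_leaf_args(annotated):
    """Return the leaf arguments of eval call nodes in the annotated string."""
    return EVAL_LEAF_CALL.findall(annotated)

# Illustrative tree for the expression: eval(x) + f(y)
tree = [["eval", ["x"]], " + ", ["f", ["y"]]]
annotated = annotate(tree)
args = find_eval_leaf_args(annotated)
```

An ordinary regular expression like the one above cannot match an argument subtree of arbitrary nesting depth, because balanced markers are beyond what plain regular expressions describe. That gap is the kind of limitation an extension to regular expressions over annotated strings would need to address.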
Advisor: Koushik Sen
BibTeX citation:
@mastersthesis{Mehne:EECS-2017-202,
    Author = {Mehne, Benjamin},
    Title = {TreeRegex: An Extension to Regular Expressions for Matching and Manipulating Tree-Structured Text (Technical Report)},
    School = {EECS Department, University of California, Berkeley},
    Year = {2017},
    Month = {Dec},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-202.html},
    Number = {UCB/EECS-2017-202},
    Abstract = {Tree-structured text is ubiquitous in software engineering and programming tasks. However, despite its prevalence, users frequently write custom, specialized routines to query and update such text. For example, a user might wish to rapidly prototype a compiler for a domain-specific language by issuing successive transformations, or they might wish to identify all the call sites of a particular function in a project (e.g., eval in JavaScript). We propose a natural and intuitive extension to regular expressions, called TreeRegex, which can specify patterns over tree-structured text. A key insight behind the design of TreeRegex is that if we annotate a string with special markers to expose information about the string’s tree structure, then a simple extension to regular expressions can be used to describe patterns over the annotated string. We develop an algorithm for matching TreeRegex expressions against annotated texts and report on five case studies where we find that using TreeRegex simplifies various tasks related to searching and modifying tree-structured texts.},
}
EndNote citation:
%0 Thesis
%A Mehne, Benjamin
%T TreeRegex: An Extension to Regular Expressions for Matching and Manipulating Tree-Structured Text (Technical Report)
%I EECS Department, University of California, Berkeley
%D 2017
%8 December 12
%@ UCB/EECS-2017-202
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-202.html
%F Mehne:EECS-2017-202