Adaptive Sentence Boundary Disambiguation
David D. Palmer and Marti A. Hearst
EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-94-797
February 1994
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-797.pdf
Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
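The abstract's central idea, describing the words around a punctuation mark by their part-of-speech probabilities and passing that description to a feed-forward network, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the report: the four-tag set, the toy lexicon, the one-token context window, the network size, and the helper names `lookup`, `context_features`, and `forward`. The weights are random and untrained, so the score only demonstrates the data flow, not the reported accuracy.

```python
import math
import random

# Hypothetical tag set and toy lexicon: word -> probabilities over TAGS.
TAGS = ("noun", "verb", "abbrev", "other")
LEXICON = {
    "dr.": (0.0, 0.0, 1.0, 0.0),
    "smith": (1.0, 0.0, 0.0, 0.0),
    "arrived": (0.0, 1.0, 0.0, 0.0),
    "he": (0.0, 0.0, 0.0, 1.0),
    "left": (0.1, 0.8, 0.0, 0.1),
}
UNKNOWN = (0.25, 0.25, 0.25, 0.25)  # uniform guess for out-of-lexicon words


def lookup(token):
    """Return tag probabilities, trying the token with and without a final period."""
    word = token.lower()
    if word in LEXICON:
        return LEXICON[word]
    return LEXICON.get(word.rstrip("."), UNKNOWN)


def context_features(tokens, index, window=1):
    """Concatenate tag-probability vectors for the tokens around tokens[index]."""
    features = []
    for pos in range(index - window, index + window + 1):
        if 0 <= pos < len(tokens):
            features.extend(lookup(tokens[pos]))
        else:
            features.extend((0.0,) * len(TAGS))  # padding past the text edges
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, output_weights):
    """One hidden layer of sigmoid units feeding a single sigmoid output unit."""
    hidden = [
        sigmoid(sum(w * f for w, f in zip(row, features)))
        for row in hidden_weights
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


tokens = ["Dr.", "Smith", "arrived.", "He", "left."]
features = context_features(tokens, 0)  # context of the period in "Dr."

# Untrained, randomly initialised weights (3 hidden units); illustration only.
rng = random.Random(0)
hidden_weights = [[rng.uniform(-1, 1) for _ in features] for _ in range(3)]
output_weights = [rng.uniform(-1, 1) for _ in range(3)]
score = forward(features, hidden_weights, output_weights)
```

In a trained system of this kind, the output score would be compared against a threshold to decide whether the period after "Dr." ends a sentence; here the score is meaningful only as a value between 0 and 1.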
BibTeX citation:
@techreport{Palmer:CSD-94-797,
    Author = {Palmer, David D. and Hearst, Marti A.},
    Title = {Adaptive Sentence Boundary Disambiguation},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Feb},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html},
    Number = {UCB/CSD-94-797},
    Abstract = {Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.}
}
EndNote citation:
%0 Report
%A Palmer, David D.
%A Hearst, Marti A.
%T Adaptive Sentence Boundary Disambiguation
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-797
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html
%F Palmer:CSD-94-797