Adaptive Sentence Boundary Disambiguation
David D. Palmer and Marti A. Hearst
EECS Department, University of California, Berkeley
Technical Report No. UCB/CSD-94-797
February 1994
http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-797.pdf
Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
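The abstract names two ingredients: a lexicon that gives part-of-speech probabilities for each word, and a feed-forward neural network that labels each punctuation mark from its surrounding context. The Python sketch below shows one way these pieces could fit together. It is not the report's implementation. The five-category tag set, the toy `LEXICON`, the two-token context `WINDOW`, the capitalization flag, the three-unit hidden layer, and the training settings are all assumptions made for illustration. The names `descriptor`, `features`, `FeedForward`, and `is_boundary` are hypothetical.

```python
import math
import random

# Illustrative part-of-speech categories (assumed; not the report's tag set).
CATEGORIES = ["noun", "verb", "det", "abbrev", "proper"]

# Hypothetical toy lexicon: lowercased word -> probability of each category.
LEXICON = {
    "dr": {"abbrev": 1.0},
    "mr": {"abbrev": 1.0},
    "smith": {"proper": 0.9, "noun": 0.1},
    "jones": {"proper": 1.0},
    "the": {"det": 1.0},
    "doctor": {"noun": 1.0},
    "arrived": {"verb": 1.0},
    "left": {"verb": 0.7, "noun": 0.3},
    "waited": {"verb": 1.0},
}

WINDOW = 2                       # assumed context: tokens on each side of the mark
SLOT_SIZE = len(CATEGORIES) + 1  # category probabilities + capitalization flag


def descriptor(token):
    """Map one token to its category-probability vector plus a capitalization flag."""
    if token is None:  # padding past the start or end of the text
        return [0.0] * SLOT_SIZE
    probs = LEXICON.get(token.lower())
    if probs is None:  # unknown word: spread probability uniformly
        vec = [1.0 / len(CATEGORIES)] * len(CATEGORIES)
    else:
        vec = [probs.get(c, 0.0) for c in CATEGORIES]
    return vec + [1.0 if token[:1].isupper() else 0.0]


def features(tokens, i):
    """Concatenate descriptors of the WINDOW tokens before and after tokens[i]."""
    x = []
    for j in range(i - WINDOW, i + WINDOW + 1):
        if j != i:
            x += descriptor(tokens[j] if 0 <= j < len(tokens) else None)
    return x


class FeedForward:
    """One tanh hidden layer feeding a sigmoid output that estimates P(boundary)."""

    def __init__(self, n_in, n_hidden=3, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.c = 0.0

    def forward(self, x):
        h = [math.tanh(sum(wk * xk for wk, xk in zip(row, x)) + bj)
             for row, bj in zip(self.w, self.b)]
        z = sum(vj * hj for vj, hj in zip(self.v, h)) + self.c
        return h, 1.0 / (1.0 + math.exp(-z))

    def train(self, data, lr=0.1, epochs=2000):
        """Stochastic gradient descent on cross-entropy loss."""
        for _ in range(epochs):
            for x, y in data:
                h, o = self.forward(x)
                d_out = o - y  # loss gradient w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    d_hid = d_out * self.v[j] * (1.0 - hj * hj)  # uses v[j] before update
                    self.v[j] -= lr * d_out * hj
                    self.b[j] -= lr * d_hid
                    for k, xk in enumerate(x):
                        self.w[j][k] -= lr * d_hid * xk
                self.c -= lr * d_out


def is_boundary(net, tokens, i, threshold=0.5):
    """Label the punctuation mark at tokens[i] as a sentence boundary or not."""
    return net.forward(features(tokens, i))[1] >= threshold


# Toy pre-tokenized training text; label 1 marks a true sentence boundary.
TOKENS = ["Dr", ".", "Smith", "arrived", ".", "The", "doctor", "left", ".",
          "Mr", ".", "Jones", "waited", "."]
LABELS = {1: 0, 4: 1, 8: 1, 10: 0, 13: 1}

net = FeedForward(n_in=2 * WINDOW * SLOT_SIZE)
net.train([(features(TOKENS, i), y) for i, y in LABELS.items()])

for i in LABELS:
    print(i, is_boundary(net, TOKENS, i))
```

The network receives probability vectors rather than single tags. An ambiguous word such as `left` can therefore contribute partial evidence for more than one category. Capitalization is only one input among many, so a variant without that flag could be retrained for single-case text. Five labeled marks are far too few to say anything about accuracy. The example only shows how the pieces connect.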
BibTeX citation:
@techreport{Palmer:CSD-94-797,
    Author= {Palmer, David D. and Hearst, Marti A.},
    Title= {Adaptive Sentence Boundary Disambiguation},
    Year= {1994},
    Month= {Feb},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html},
    Number= {UCB/CSD-94-797},
    Abstract= {Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.},
}
EndNote citation:
%0 Report
%A Palmer, David D.
%A Hearst, Marti A.
%T Adaptive Sentence Boundary Disambiguation
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-797
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html
%F Palmer:CSD-94-797