Adaptive Sentence Boundary Disambiguation

David D. Palmer and Marti A. Hearst

EECS Department, University of California, Berkeley

Technical Report No. UCB/CSD-94-797

February 1994

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-797.pdf

Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them, most systems use brittle, special-purpose regular-expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.
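
The abstract names two ingredients: a lexicon with part-of-speech probabilities and a feed-forward neural network. The following Python sketch shows one plausible way these could fit together. It is not the report's implementation. The tag set, the toy lexicon, the context width `k`, the hidden-layer size, and the names `descriptor`, `context_features`, and `FeedForwardNet` are all assumptions made for illustration, and the network weights are random and untrained, so the scores it prints carry no meaning.

```python
import math
import random

# Hypothetical coarse tag set and toy lexicon; the abstract does not
# specify the tag set or lexicon the report actually uses.
TAGS = ["noun", "verb", "abbreviation", "proper_noun", "other"]
LEXICON = {
    "dr": {"abbreviation": 1.0},
    "smith": {"proper_noun": 0.9, "noun": 0.1},
    "arrived": {"verb": 1.0},
    "he": {"other": 1.0},
    "left": {"verb": 0.7, "other": 0.3},
}


def descriptor(word):
    """Return the word's part-of-speech probabilities as a fixed-length vector."""
    probs = LEXICON.get(word.lower())
    if probs is None:
        # Unknown word: fall back to a uniform distribution (an assumption).
        return [1.0 / len(TAGS)] * len(TAGS)
    return [probs.get(tag, 0.0) for tag in TAGS]


def context_features(tokens, i, k=2):
    """Concatenate descriptors for the k tokens on each side of tokens[i]."""
    features = []
    for j in list(range(i - k, i)) + list(range(i + 1, i + k + 1)):
        if 0 <= j < len(tokens):
            features.extend(descriptor(tokens[j]))
        else:
            features.extend([0.0] * len(TAGS))  # padding past the text edges
    return features


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class FeedForwardNet:
    """One hidden layer and one sigmoid output unit; weights are untrained."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)))
                  for row in self.w_hidden]
        return sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)))


tokens = ["Dr", ".", "Smith", "arrived", ".", "He", "left", "."]
net = FeedForwardNet(n_inputs=2 * 2 * len(TAGS), n_hidden=2)
for i, tok in enumerate(tokens):
    if tok == ".":
        score = net.forward(context_features(tokens, i, k=2))
        # After training, a score above 0.5 would be read as "sentence boundary".
        print(i, round(score, 3))
```

In this sketch the input to the network depends only on part-of-speech probabilities of the surrounding tokens, not on capitalization or on a hand-written list of exceptions, which is one way a method of this kind could remain usable on single-case text.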


BibTeX citation:

@techreport{Palmer:CSD-94-797,
    Author= {Palmer, David D. and Hearst, Marti A.},
    Title= {Adaptive Sentence Boundary Disambiguation},
    Year= {1994},
    Month= {Feb},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html},
    Number= {UCB/CSD-94-797},
    Abstract= {Labeling of sentence boundaries is a necessary prerequisite for many natural language processing tasks, including part-of-speech tagging and sentence alignment. End-of-sentence punctuation marks are ambiguous; to disambiguate them most systems use brittle, special-purpose regular expression grammars and exception rules. As an alternative, we have developed an efficient, trainable algorithm that uses a lexicon with part-of-speech probabilities and a feed-forward neural network. After training for less than one minute, the method correctly labels over 98.5% of sentence boundaries in a corpus of over 27,000 sentence-boundary marks. We show the method to be efficient and easily adaptable to different text genres, including single-case texts.},
}

EndNote citation:

%0 Report
%A Palmer, David D. 
%A Hearst, Marti A. 
%T Adaptive Sentence Boundary Disambiguation
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-797
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/6317.html
%F Palmer:CSD-94-797