A Scalable, Empirical Approach to Anaphoric Reference

Caroline Tice

EECS Department
University of California, Berkeley
Technical Report No. UCB/CSD-94-827
August 1994

http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/CSD-94-827.pdf

Abstract:

Most of the suggested solutions for the anaphora problem have been concerned with getting the correct answer in all situations. This means they must involve large amounts of syntactic, semantic and world knowledge, and therefore can only be applied to a limited scope of documents. These solutions do not scale well. We have designed and implemented a scalable heuristic approach to the anaphora problem. Our main concern was not to get the correct answer in all situations, but to create an easily scalable solution that will find the correct answer most of the time. We designed our heuristics in two stages. First we created the simplest possible solution we could. We then used this simple solution as a baseline against which to measure our more advanced heuristics. Our heuristic solutions work for both definite noun phrase anaphora and pronoun anaphora. We tested our implementations on two moderate-sized pieces of text, containing a total of 670 definite noun phrases and 95 pronouns. Our baseline program achieved 50.9% accuracy for the definite noun phrases, and 30.5% accuracy for the pronouns. Our more advanced heuristics showed a dramatic improvement over the baseline, with 71.0% accuracy for the definite noun phrases, and 73.7% accuracy for the pronouns.
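
The accuracy figures in the abstract can be turned back into approximate counts of correctly resolved anaphors. The sketch below is only an illustration of that arithmetic, not part of the report. It assumes that accuracy means correct / total and that each percentage was rounded to one decimal place; neither assumption is stated in the abstract. The names TOTALS, REPORTED, implied_correct and pooled_accuracy are hypothetical, and the pooled figure is a derived micro-average that the report itself does not give.

```python
# Figures quoted in the abstract: evaluated mentions and reported accuracy (%).
TOTALS = {"definite_np": 670, "pronoun": 95}
REPORTED = {
    "baseline": {"definite_np": 50.9, "pronoun": 30.5},
    "advanced": {"definite_np": 71.0, "pronoun": 73.7},
}


def implied_correct(total, pct):
    """Return every count k for which k/total rounds to pct at one decimal.

    Assumption (not stated in the report): accuracy = correct / total.
    """
    return [k for k in range(total + 1) if round(100 * k / total, 1) == pct]


def pooled_accuracy(system):
    """Micro-averaged accuracy (%) over both anaphor types for one system.

    Hypothetical derived figure; the abstract reports only per-type accuracy.
    """
    correct = 0
    for kind, total in TOTALS.items():
        candidates = implied_correct(total, REPORTED[system][kind])
        if len(candidates) != 1:
            # Refuse to guess when the rounding does not pin down one count.
            raise ValueError(f"count for {system}/{kind} is not unique")
        correct += candidates[0]
    return 100 * correct / sum(TOTALS.values())


if __name__ == "__main__":
    for system, scores in REPORTED.items():
        for kind, pct in scores.items():
            print(system, kind, implied_correct(TOTALS[kind], pct))
        print(system, "pooled", round(pooled_accuracy(system), 1))
```

Under these assumptions each reported percentage corresponds to exactly one integer count, so the pooled accuracy over all 765 anaphors is well defined. It should still be read as a back-of-the-envelope figure rather than a result from the report.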


BibTeX citation:

@techreport{Tice:CSD-94-827,
    Author = {Tice, Caroline},
    Title = {A Scalable, Empirical Approach to Anaphoric Reference},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {1994},
    Month = {Aug},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/5287.html},
    Number = {UCB/CSD-94-827},
    Abstract = {Most of the suggested solutions for the anaphora problem have been concerned with getting the correct answer in all situations. This means they must involve large amounts of syntactic, semantic and world knowledge, and therefore can only be applied to a limited scope of documents. These solutions do not scale well. We have designed and implemented a scalable heuristic approach to the anaphora problem. Our main concern was not to get the correct answer in all situations, but to create an easily scalable solution that will find the correct answer most of the time. We designed our heuristics in two stages. First we created the simplest possible solution we could. We then used this simple solution as a baseline against which to measure our more advanced heuristics. Our heuristic solutions work for both definite noun phrase anaphora and pronoun anaphora. We tested our implementations on two moderate-sized pieces of text, containing a total of 670 definite noun phrases and 95 pronouns. Our baseline program achieved 50.9% accuracy for the definite noun phrases, and 30.5% accuracy for the pronouns. Our more advanced heuristics showed a dramatic improvement over the baseline, with 71.0% accuracy for the definite noun phrases, and 73.7% accuracy for the pronouns.}
}

EndNote citation:

%0 Report
%A Tice, Caroline
%T A Scalable, Empirical Approach to Anaphoric Reference
%I EECS Department, University of California, Berkeley
%D 1994
%@ UCB/CSD-94-827
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/1994/5287.html
%F Tice:CSD-94-827