Unsupervised Text Generation and its Application to News Interfaces

Philippe Laban

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2021-215
September 14, 2021

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-215.pdf

Recent progress in automated text generation relies predominantly on the use of large datasets, sometimes requiring millions of examples for each application setting. In the first part of this thesis, we advance the field by developing novel text generation methods that balance the goals of fluency, consistency, and relevance without requiring any training data. We achieve this objective on tasks such as text summarization and simplification by directly defining a multi-component reward and training text generators to optimize this objective. The novel approaches that we introduce perform better than all existing unsupervised approaches and in many cases outperform those that rely on large datasets. The second part of the thesis incorporates text generation into interfaces to help news readers navigate complex, unfolding news topics. We build a novel representation of news stories at scale and integrate new summarization, question generation, and question answering modules into a chatbot and an automated interactive podcast. Human evaluations confirm that even though imperfect systems introduce friction for the user, they can serve as powerful tools to stimulate reader curiosity and help readers dive deeper into unfolding topics.
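The multi-component reward described above can be illustrated with a minimal sketch. The component names (fluency, consistency, relevance) follow the abstract, but the scoring heuristics below are simple word-overlap placeholders invented for illustration; the thesis's actual components are learned models, and the product-style combination is only one plausible way to merge them into a single reward.

```python
# Hypothetical sketch of a multi-component reward for unsupervised
# summarization. Each component maps a (document, summary) pair to a
# score in [0, 1]; the product is the overall reward a generator would
# be trained to maximize. All scoring rules here are toy placeholders,
# not the thesis's actual models.

def fluency(summary: str) -> float:
    # Placeholder: real systems score fluency with a language model.
    # Here we simply penalize very short outputs.
    n = len(summary.split())
    return max(0.0, min(1.0, n / 10))

def consistency(document: str, summary: str) -> float:
    # Placeholder for factual consistency: fraction of summary words
    # that also appear in the source document.
    doc_words = set(document.lower().split())
    sum_words = summary.lower().split()
    if not sum_words:
        return 0.0
    return sum(w in doc_words for w in sum_words) / len(sum_words)

def relevance(document: str, summary: str) -> float:
    # Placeholder for relevance/coverage: fraction of unique document
    # words that the summary retains.
    doc_words = set(document.lower().split())
    sum_words = set(summary.lower().split())
    if not doc_words:
        return 0.0
    return len(doc_words & sum_words) / len(doc_words)

def reward(document: str, summary: str) -> float:
    # A multiplicative combination: a candidate must score well on
    # every component to earn a high overall reward.
    return (fluency(summary)
            * consistency(document, summary)
            * relevance(document, summary))
```

The multiplicative form means a summary that fails any one goal (e.g., fluent but unfaithful) receives a low reward, which is one common way such objectives discourage degenerate outputs.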

Advisors: John F. Canny and Marti Hearst


BibTeX citation:

@phdthesis{Laban:EECS-2021-215,
    Author = {Laban, Philippe},
    Title = {Unsupervised Text Generation and its Application to News Interfaces},
    School = {EECS Department, University of California, Berkeley},
    Year = {2021},
    Month = {Sep},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-215.html},
    Number = {UCB/EECS-2021-215},
    Abstract = {Recent progress in automated text generation relies predominantly on the use of large datasets,
sometimes requiring millions of examples for each application setting. In the first part of this thesis,
we advance the field by developing novel text generation methods that balance the goals of fluency,
consistency, and relevancy without requiring any training data. We achieve this objective on tasks
such as text summarization and simplification by directly defining a multi-component reward, and
training text generators to optimize this objective. The novel approaches that we introduce perform
better than all existing unsupervised approaches and in many cases outperform those that rely on
large datasets.
The second part of the thesis incorporates text generation into interfaces to help news readers
navigate complex, unfolding news topics. We build a novel representation of news stories at
scale and integrate new summarization, question generation and question answering modules into
a chatbot and an automated interactive podcast. Human evaluations confirm that even though
imperfect systems introduce friction for the user, they can serve as powerful tools to stimulate reader
curiosity and help readers dive deeper into unfolding topics.}
}

EndNote citation:

%0 Thesis
%A Laban, Philippe
%T Unsupervised Text Generation and its Application to News Interfaces
%I EECS Department, University of California, Berkeley
%D 2021
%8 September 14
%@ UCB/EECS-2021-215
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2021/EECS-2021-215.html
%F Laban:EECS-2021-215