Leveraging Speaker Context for Natural Language Processing

Samee Ibraheem

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-242

December 1, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-242.pdf

Neural networks have allowed for a host of advances in Natural Language Processing (NLP), from text classification to machine translation. These applications have demonstrated the ability to capture the effects of properties such as sentiment and politeness on language usage through computational means. However, using NLP to examine the effects of contextual information in relation to the intrinsic features of one's identity or the extrinsic features of one's conversational role is still an active area of research. This thesis focuses on modeling the effects of speaker attributes on language, looking at applications that are designed to help improve the safety of users in the digital world. Gender is a personal characteristic that people might not wish to share online but that can be determined by one's language use. We first examine how intrinsic speaker attributes affect language by attempting to obfuscate the gender of users on Reddit. Detecting deceptive actors in online interactions is also important for user security. We next explore the effect of extrinsic speaker attributes on language through the game of Mafia, in which participants may take on either an honest or a deceptive role. Through these analyses, we demonstrate that there are linguistic differences based on a person's role or identity, indicating that these aspects of an entity might be identified through their linguistic behavior. In addition to providing insight on how such entities use language in accordance with these features, these applications have implications for real-life communication paradigms, providing possible avenues for hiding aspects of one's identity or discovering aspects of another's.

Advisor: John DeNero

BibTeX citation:

@phdthesis{Ibraheem:EECS-2023-242,
    Author= {Ibraheem, Samee},
    Title= {Leveraging Speaker Context for Natural Language Processing},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-242.html},
    Number= {UCB/EECS-2023-242},
    Abstract= {Neural networks have allowed for a host of advances in Natural Language Processing (NLP), from text classification to machine translation. These applications have demonstrated the ability to capture the effects of properties such as sentiment and politeness on language usage through computational means. However, using NLP to examine the effects of contextual information in relation to the intrinsic features of one's identity or the extrinsic features of one's conversational role is still an active area of research. This thesis focuses on modeling the effects of speaker attributes on language, looking at applications that are designed to help improve the safety of users in the digital world. Gender is a personal characteristic that people might not wish to share online but that can be determined by one's language use. We first examine how intrinsic speaker attributes affect language by attempting to obfuscate the gender of users on Reddit. Detecting deceptive actors in online interactions is also important for user security. We next explore the effect of extrinsic speaker attributes on language through the game of Mafia, in which participants may take on either an honest or a deceptive role. Through these analyses, we demonstrate that there are linguistic differences based on a person's role or identity, indicating that these aspects of an entity might be identified through their linguistic behavior. In addition to providing insight on how such entities use language in accordance with these features, these applications have implications for real-life communication paradigms, providing possible avenues for hiding aspects of one's identity or discovering aspects of another's.},
}

EndNote citation:

%0 Thesis
%A Ibraheem, Samee
%T Leveraging Speaker Context for Natural Language Processing
%I EECS Department, University of California, Berkeley
%D 2023
%8 December 1
%@ UCB/EECS-2023-242
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-242.html
%F Ibraheem:EECS-2023-242