Modeling Social Interactions from Multimodal Signals
Evonne Ng
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2024-101
May 14, 2024
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-101.pdf
As social agents, humans possess a social intelligence that enables them to engage with others through complex signals such as facial expressions, body motion, and speech, among other communication modalities. Building systems that can perceive and model these fine-grained interaction dynamics is therefore essential for advancing human-machine interaction in everyday life. In this dissertation, I present three recent advances in this research area. The first models the nonverbal dyadic conversational dynamics between the facial expressions of a speaker and a listener. The second builds on this work, expanding to full-body dynamics. The final work explores how we can incorporate higher-level syntactic understanding by leveraging large language models.
Advisors: Trevor Darrell and Angjoo Kanazawa
BibTeX citation:
@phdthesis{Ng:EECS-2024-101,
    Author = {Ng, Evonne},
    Editor = {Darrell, Trevor and Kanazawa, Angjoo and Malik, Jitendra and Gopnik, Alison},
    Title = {Modeling Social Interactions from Multimodal Signals},
    School = {EECS Department, University of California, Berkeley},
    Year = {2024},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-101.html},
    Number = {UCB/EECS-2024-101},
    Abstract = {As social agents, humans possess a social intelligence that enables them to engage with others through complex signals such as facial expressions, body motion, and speech, among other communication modalities. Building systems that can perceive and model these fine-grained interaction dynamics is therefore essential for advancing human-machine interaction in everyday life. In this dissertation, I present three recent advances in this research area. The first models the nonverbal dyadic conversational dynamics between the facial expressions of a speaker and a listener. The second builds on this work, expanding to full-body dynamics. The final work explores how we can incorporate higher-level syntactic understanding by leveraging large language models.},
}
EndNote citation:
%0 Thesis
%A Ng, Evonne
%E Darrell, Trevor
%E Kanazawa, Angjoo
%E Malik, Jitendra
%E Gopnik, Alison
%T Modeling Social Interactions from Multimodal Signals
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 14
%@ UCB/EECS-2024-101
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-101.html
%F Ng:EECS-2024-101