Hallucination Is All You Need: Using Generative Models for Test Time Data Augmentation

Dhruv Jhamb

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-85

May 12, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-85.pdf

Multimodal learning, which consists of building models that can take information from different modalities as input, is growing in popularity due to its potential. Deep learning-based multimodal models can be applied to a variety of downstream tasks such as video description, sentiment analysis, event detection, cross-modal translation, and cross-modal retrieval. Inherently, we can expect multimodal models to outperform unimodal models because the additional modalities provide more information. The way humans experience and learn is multimodal, as we combine multiple senses to experience the world around us. In the ideal case, we assume completeness of data, meaning that all modalities are entirely present. However, this assumption is not always guaranteed at test time, meaning that it is necessary to create multimodal models robust to missing modalities in real-world applications. We choose to address this missing modality problem during test time by comparing several feature reconstruction methods on multimodal emotion recognition datasets.
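The report itself details the feature reconstruction methods that are compared; the sketch below is only a rough illustration of the general idea of filling in a missing modality at test time. The modality names, feature dimensions, and the simple MLP "hallucination" network are hypothetical placeholders chosen for this example and are not taken from the report.

    import torch
    import torch.nn as nn

    # Hypothetical feature sizes for two modalities (e.g., text and audio)
    # and a hypothetical number of emotion classes.
    TEXT_DIM, AUDIO_DIM, NUM_EMOTIONS = 768, 128, 6

    # Generative "hallucination" network: reconstructs audio features
    # from the text features that are actually present.
    hallucinator = nn.Sequential(
        nn.Linear(TEXT_DIM, 256), nn.ReLU(), nn.Linear(256, AUDIO_DIM)
    )

    # Fusion classifier that always expects both modalities.
    classifier = nn.Sequential(
        nn.Linear(TEXT_DIM + AUDIO_DIM, 256), nn.ReLU(), nn.Linear(256, NUM_EMOTIONS)
    )

    def predict(text_feat, audio_feat=None):
        """If the audio modality is missing at test time, replace it with
        reconstructed features before fusing and classifying."""
        if audio_feat is None:
            audio_feat = hallucinator(text_feat)  # hallucinate the absent modality
        fused = torch.cat([text_feat, audio_feat], dim=-1)
        return classifier(fused).argmax(dim=-1)

    # Example: a batch of 4 utterances whose audio stream is unavailable.
    text_only = torch.randn(4, TEXT_DIM)
    print(predict(text_only))

In practice the hallucination network and the classifier would be trained on complete data, so that at test time the reconstructed features stand in for the missing modality; the report compares several such reconstruction approaches on multimodal emotion recognition datasets.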

Advisor: John F. Canny


BibTeX citation:

@mastersthesis{Jhamb:EECS-2022-85,
    Author= {Jhamb, Dhruv},
    Editor= {Chan, David and Canny, John F. and Zakhor, Avideh},
    Title= {Hallucination Is All You Need: Using Generative Models for Test Time Data Augmentation},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-85.html},
    Number= {UCB/EECS-2022-85},
    Abstract= {Multimodal learning, which consists of building models that can take information from different modalities as input, is growing in popularity due to its potential. Deep learning-based multimodal models can be applied to a variety of downstream tasks such as video description, sentiment analysis, event detection, cross-modal translation, and cross-modal retrieval. Inherently, we can expect multimodal models to outperform unimodal models because the additional modalities provide more information. The way humans experience and learn is multimodal, as we combine multiple senses to experience the world around us. In the ideal case, we assume completeness of data, meaning that all modalities are entirely present. However, this assumption is not always guaranteed at test time, meaning that it is necessary to create multimodal models robust to missing modalities in real-world applications. We choose to address this missing modality problem during test time by comparing several feature reconstruction methods on multimodal emotion recognition datasets.},
}

EndNote citation:

%0 Thesis
%A Jhamb, Dhruv 
%E Chan, David 
%E Canny, John F. 
%E Zakhor, Avideh 
%T Hallucination Is All You Need: Using Generative Models for Test Time Data Augmentation
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 12
%@ UCB/EECS-2022-85
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-85.html
%F Jhamb:EECS-2022-85