Grace Luo

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2022-109

May 13, 2022

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-109.pdf

People frequently use the internet to transmit information about events. Whether we see a photo of a protest or a video of an airstrike, we often accept that the event occurred because of the visual evidence. However, what if this visual evidence were miscaptioned or taken out of context? For this reason, digital forensics, or the examination of the provenance and validity of online media, is a critical practice in fields such as human rights law, investigative journalism, and social media fact-checking. Organizations manually verify textual claims about visual media via reverse image search and geolocation, an incredibly time-consuming process. This report presents automated methods for verifying image-caption consistency, combining state-of-the-art vision-and-language neural models with real-world data relevant to digital forensics.
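For illustration, the snippet below is a minimal sketch of how image-caption consistency might be scored with an off-the-shelf CLIP model via the Hugging Face transformers library. The model name, helper function, file names, and threshold are illustrative assumptions, not the exact pipeline used in the report.

# A minimal sketch of image-caption consistency scoring with an off-the-shelf
# CLIP model (Hugging Face transformers). Illustrative baseline only; the
# threshold below is a made-up placeholder, not a value from the report.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def consistency_score(image_path: str, caption: str) -> float:
    """Return the cosine similarity between CLIP image and text embeddings."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        text_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                           attention_mask=inputs["attention_mask"])
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    return float((image_emb @ text_emb.T).item())

# Example usage: flag a pair as potentially out-of-context when similarity is low.
# The 0.25 cutoff is purely illustrative and would need calibration on real data.
score = consistency_score("protest_photo.jpg", "Protesters gather outside city hall.")
print("possible mismatch" if score < 0.25 else "plausibly consistent", score)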

Chapter 1 discusses NewsCLIPpings, an approach for producing challenging instances of out-of-context images. Because such media is often unlabeled (and if detected, taken down by platform content moderators), our method can be used to benchmark and augment training data for automated verification methods.
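As a hedged sketch of this idea, one simple retrieval strategy in the spirit of NewsCLIPpings pairs a caption with the most CLIP-similar image drawn from a different article. The function signature and corpus format below are assumptions for illustration, not the report's exact generation pipeline.

# Sketch: mine a convincing out-of-context pair by retrieving, for a caption,
# the most CLIP-similar image that originally belonged to a different article.
import torch

def retrieve_mismatch(caption_emb: torch.Tensor,
                      image_embs: torch.Tensor,
                      source_ids: list[str],
                      own_id: str) -> int:
    """caption_emb: (d,) normalized CLIP text embedding of the query caption.
    image_embs: (N, d) normalized CLIP image embeddings of a candidate corpus.
    source_ids: article id of each candidate image; own_id: the caption's article.
    Returns the index of the most similar image from a different article."""
    sims = image_embs @ caption_emb                    # cosine similarities, shape (N,)
    mask = torch.tensor([sid == own_id for sid in source_ids])
    sims = sims.masked_fill(mask, float("-inf"))       # never pair a caption with its own image
    return int(sims.argmax().item())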

Moving from news to social media, Chapter 2 produces out-of-context images in specific topical domains such as climate change and explores further techniques for automated verification, including methods for multimodal fusion and remedies for the domain shift between machine-made training data and human-made evaluation data. These chapters also give a glimpse into the outstanding challenges of multimodal digital forensics research, such as understanding the diverse set of text-image relationships present in social media or solving specific subtasks in the verification process such as geolocation.
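As one illustrative example of multimodal fusion, a simple baseline concatenates frozen CLIP image and text features and trains a small classification head on top. The architecture and sizes below are assumptions for illustration, not necessarily the fusion methods evaluated in the report.

# Sketch of simple multimodal fusion for pristine vs. out-of-context classification.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, embed_dim: int = 512, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, hidden_dim),  # concatenated image + text features
            nn.ReLU(),
            nn.Linear(hidden_dim, 2),              # pristine vs. out-of-context logits
        )

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([image_emb, text_emb], dim=-1))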

Advisor: Trevor Darrell


BibTeX citation:

@mastersthesis{Luo:EECS-2022-109,
    Author= {Luo, Grace},
    Title= {Vision and Language for Digital Forensics},
    School= {EECS Department, University of California, Berkeley},
    Year= {2022},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-109.html},
    Number= {UCB/EECS-2022-109},
    Abstract= {People frequently use the internet to transmit information about events. Whether we see a photo of a protest or a video of an airstrike, we often accept that the event occurred because of the visual evidence. However, what if this visual evidence were miscaptioned or taken out of context? For this reason, digital forensics, or the examination of the provenance and validity of online media, is a critical practice in fields such as human rights law, investigative journalism, and social media fact-checking. Organizations manually verify textual claims about visual media via reverse image search and geolocation, an incredibly time-consuming process. This report presents automated methods for verifying image-caption consistency, combining state-of-the-art vision-and-language neural models with real-world data relevant to digital forensics.

Chapter 1 discusses NewsCLIPpings, an approach for producing challenging instances of out-of-context images. Because such media is often unlabeled (and if detected, taken down by platform content moderators), our method can be used to benchmark and augment training data for automated verification methods.

Moving from news to social media, Chapter 2 produces out-of-context images in specific topical domains such as climate change and explores further techniques for automated verification, including methods for multimodal fusion and remedies for the domain shift between machine-made training data and human-made evaluation data. These chapters also give a glimpse into the outstanding challenges of multimodal digital forensics research, such as understanding the diverse set of text-image relationships present in social media or solving specific subtasks in the verification process such as geolocation.},
}

EndNote citation:

%0 Thesis
%A Luo, Grace 
%T Vision and Language for Digital Forensics
%I EECS Department, University of California, Berkeley
%D 2022
%8 May 13
%@ UCB/EECS-2022-109
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2022/EECS-2022-109.html
%F Luo:EECS-2022-109