Analyzing 18th-20th Century Art and Music with Contrastive Cross-Modal Learning

Vivien Nguyen

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-160

August 14, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-160.pdf

The relationship between art and music goes back at least as far as the depiction of instruments and musicians on ancient walls and vases. In more recent centuries, some artists, composers, and theorists have tried to define and explore the relationship between visual and musical concepts more abstractly.

Why should two seemingly completely separate creative domains be related to one another at all? It turns out that even the typical human is generally able to form relationships between different sensory inputs, even when it is not always clear how those relationships are formed or what they are based on.

In the world of artificial intelligence, this insight has led to a long line of work in exploring multimodal machine learning. These works are built on the idea that, for machines to more successfully reason about and navigate the human world, models need to be able to process and interpret multimodal signals.

In this work, we are interested in exploring the relationship between art and music, and more broadly, are motivated by questions of cross-modal perception. We apply techniques from multimodal machine learning to a novel domain, paintings and classical music, in order to learn a shared representation between two different creative modalities. Our results demonstrate that such a representation can be achieved even with limited supervision.

Our embedding space is one that is chronologically organized; works that were created close in time to one another lie close to one another in this embedding space, regardless of their modality (paintings or music).

We hypothesize that future work can improve upon and use such a representation to propose relationships between works from these two domains. Doing so could provide valuable insights about the shared culture two works come from, or about the basis of cross-modal perception.

Advisors: Ren Ng

BibTeX citation:

@mastersthesis{Nguyen:EECS-2020-160,
    Author= {Nguyen, Vivien},
    Editor= {Ng, Ren and Efros, Alexei (Alyosha)},
    Title= {Analyzing 18th-20th Century Art and Music with Contrastive Cross-Modal Learning},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {Aug},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-160.html},
    Number= {UCB/EECS-2020-160},
    Abstract= {The relationship between art and music goes back at least as far as the depiction of instruments and musicians on ancient walls and vases. In more recent centuries, some artists, composers, and theorists have tried to define and explore the relationship between visual and musical concepts more abstractly.
Why should two seemingly completely separate creative domains be related to one another at all? It turns out that even the typical human is generally able to form relationships between different sensory inputs, even when it is not always clear how those relationships are formed or what they are based on.
In the world of artificial intelligence, this insight has led to a long line of work in exploring multimodal machine learning. These works are built on the idea that, for machines to more successfully reason about and navigate the human world, models need to be able to process and interpret multimodal signals.
In this work, we are interested in exploring the relationship between art and music, and more broadly, are motivated by questions of cross-modal perception. We apply techniques from multimodal machine learning to a novel domain, paintings and classical music, in order to learn a shared representation between two different creative modalities. Our results demonstrate that such a representation can be achieved even with limited supervision.
Our embedding space is one that is chronologically organized; works that were created close in time to one another lie close to one another in this embedding space, regardless of their modality (paintings or music).
We hypothesize that future work can improve upon and use such a representation to propose relationships between works from these two domains. Doing so could provide valuable insights about the shared culture two works come from, or about the basis of cross-modal perception.},
}

EndNote citation:

%0 Thesis
%A Nguyen, Vivien 
%E Ng, Ren 
%E Efros, Alexei (Alyosha) 
%T Analyzing 18th-20th Century Art and Music with Contrastive Cross-Modal Learning
%I EECS Department, University of California, Berkeley
%D 2020
%8 August 14
%@ UCB/EECS-2020-160
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-160.html
%F Nguyen:EECS-2020-160