On Unsupervised Object-Centric Representation Learning: Advantages and Shortcomings

Yarden Goraly

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2025-112

May 16, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-112.pdf

Unsupervised object-centric representation learning is an active area of research with promising applications to robotics and computer vision. These models go beyond segmenting objects in a scene: the goal is for them to develop a disentangled internal representation of objects in latent space. Some models can even encode specific interpretable properties of these objects, such as position, size, and shape, in the latent space. In this work, we review the current literature and history of unsupervised object-centric learning, evaluating the impact of each model and how it compares to human perception. We then look at the current theory related to object-centric latent disentanglement and suggest avenues for future research. Finally, we present a few novel experiments that improve the segmentation performance of these methods and solve sim-to-real problems. We find that it is possible to improve the segmentation performance of unsupervised object-centric models using knowledge distillation while retaining the latent encoding of object properties. We also uncover unique ways in which the type of dataset can affect reconstruction quality for real and synthetic inputs.

Advisor: Claire Tomlin

BibTeX citation:

@mastersthesis{Goraly:EECS-2025-112,
    Author= {Goraly, Yarden},
    Editor= {Stocking, Kaylene and Tomlin, Claire},
    Title= {On Unsupervised Object-Centric Representation Learning: Advantages and Shortcomings},
    School= {EECS Department, University of California, Berkeley},
    Year= {2025},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-112.html},
    Number= {UCB/EECS-2025-112},
    Abstract= {Unsupervised object-centric representation learning is an active area of research with promising applications to robotics and computer vision. These models go beyond segmenting objects in a scene: the goal is for them to develop a disentangled internal representation of objects in latent space. Some models can even encode specific interpretable properties of these objects, such as position, size, and shape, in the latent space. In this work, we review the current literature and history of unsupervised object-centric learning, evaluating the impact of each model and how it compares to human perception. We then look at the current theory related to object-centric latent disentanglement and suggest avenues for future research. Finally, we present a few novel experiments that improve the segmentation performance of these methods and solve sim-to-real problems. We find that it is possible to improve the segmentation performance of unsupervised object-centric models using knowledge distillation while retaining the latent encoding of object properties. We also uncover unique ways in which the type of dataset can affect reconstruction quality for real and synthetic inputs.},
}

EndNote citation:

%0 Thesis
%A Goraly, Yarden 
%E Stocking, Kaylene 
%E Tomlin, Claire 
%T On Unsupervised Object-Centric Representation Learning: Advantages and Shortcomings
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 16
%@ UCB/EECS-2025-112
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-112.html
%F Goraly:EECS-2025-112