Scaling Properties of Diffusion Models for Perceptual Tasks

Zeeshan Patel and Rahul Ravishankar and Jathushan Rajasegaran and Jitendra Malik

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2025-38

May 1, 2025

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-38.pdf

In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods while using significantly less data and compute. We release code and models at https://scaling-diffusion-perception.github.io.

Advisor: Alexei (Alyosha) Efros

BibTeX citation:

@mastersthesis{Patel:EECS-2025-38,
    Author= {Patel, Zeeshan and Ravishankar, Rahul and Rajasegaran, Jathushan and Malik, Jitendra},
    Editor= {Efros, Alexei (Alyosha)},
    Title= {Scaling Properties of Diffusion Models for Perceptual Tasks},
    School= {EECS Department, University of California, Berkeley},
    Year= {2025},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-38.html},
    Number= {UCB/EECS-2025-38},
    Abstract= {In this paper, we argue that iterative computation with diffusion models offers a powerful paradigm not only for generation but also for visual perception tasks. We unify tasks such as depth estimation, optical flow, and amodal segmentation under the framework of image-to-image translation, and show how diffusion models benefit from scaling training and test-time compute for these perceptual tasks. Through a careful analysis of these scaling properties, we formulate compute-optimal training and inference recipes to scale diffusion models for visual perception tasks. Our models achieve performance competitive with state-of-the-art methods while using significantly less data and compute. We release code and models at https://scaling-diffusion-perception.github.io.},
}

EndNote citation:

%0 Thesis
%A Patel, Zeeshan
%A Ravishankar, Rahul
%A Rajasegaran, Jathushan
%A Malik, Jitendra
%E Efros, Alexei (Alyosha)
%T Scaling Properties of Diffusion Models for Perceptual Tasks
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 1
%@ UCB/EECS-2025-38
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-38.html
%F Patel:EECS-2025-38