Ajay Jain

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2023-161

May 12, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-161.pdf

We present progress in developing stable, scalable, and transferable generative models for visual data. We first learn expressive image priors using autoregressive models that generate high-quality and diverse images. We then explore transfer learning to generalize visual representation models to new data modalities with limited available data. We propose two methods to generate high-quality 3D graphics from sparse input images or natural language descriptions by distilling knowledge from pretrained discriminative vision models. We briefly summarize our work on improving generation quality with a Denoising Diffusion Probabilistic Model, and demonstrate how to transfer it to new modalities, including high-quality text-to-3D synthesis using Score Distillation Sampling. Finally, we generate 2D vector graphics from text by optimizing a vector graphics renderer with knowledge distilled from a pretrained text-to-image diffusion model, without vector graphics data. Our models enable high-quality generation across many modalities and continue to be broadly applied in subsequent work.
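As background on the Score Distillation Sampling mentioned above: SDS optimizes the parameters theta of a differentiable renderer by pushing each render x = g(theta) toward high probability under a frozen text-conditioned diffusion model, following the DreamFusion gradient E[w(t) (eps_hat_phi(x_t; y, t) - eps) dx/dtheta]. Below is a minimal illustrative sketch, not the thesis code; the helper names (denoiser, alphas, sigmas) are assumptions for the example:

import torch

def sds_loss(x, text_emb, denoiser, alphas, sigmas):
    # x: differentiable render x = g(theta), shape (B, C, H, W),
    #    e.g. the output of a NeRF or vector-graphics rasterizer
    # denoiser: frozen pretrained noise-prediction model eps_phi(x_t, y, t)
    # alphas, sigmas: 1-D noise-schedule tensors indexed by timestep
    B = x.shape[0]
    t = torch.randint(0, alphas.shape[0], (B,), device=x.device)
    eps = torch.randn_like(x)
    a = alphas[t].view(B, 1, 1, 1)
    s = sigmas[t].view(B, 1, 1, 1)
    x_t = a * x + s * eps            # forward diffusion: sample q(x_t | x)
    with torch.no_grad():            # never differentiate through the denoiser
        eps_hat = denoiser(x_t, text_emb, t)
    grad = eps_hat - eps             # SDS direction, with w(t) = 1 for simplicity
    # Surrogate loss whose gradient w.r.t. x is exactly `grad` (grad carries no graph)
    return (grad * x).sum()

Calling sds_loss(...).backward() after rendering propagates w(t) (eps_hat - eps) into the renderer's parameters; the same style of distilled gradient underlies both the text-to-3D and text-to-vector-graphics results summarized above.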

Advisor: Pieter Abbeel


BibTeX citation:

@phdthesis{Jain:EECS-2023-161,
    Author= {Jain, Ajay},
    Title= {Transferable Generative Models},
    School= {EECS Department, University of California, Berkeley},
    Year= {2023},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-161.html},
    Number= {UCB/EECS-2023-161},
    Abstract= {We present progress in developing stable, scalable, and transferable generative models for visual data. We first learn expressive image priors using autoregressive models that generate high-quality and diverse images. We then explore transfer learning to generalize visual representation models to new data modalities with limited available data. We propose two methods to generate high-quality 3D graphics from sparse input images or natural language descriptions by distilling knowledge from pretrained discriminative vision models. We briefly summarize our work on improving generation quality with a Denoising Diffusion Probabilistic Model, and demonstrate how to transfer it to new modalities, including high-quality text-to-3D synthesis using Score Distillation Sampling. Finally, we generate 2D vector graphics from text by optimizing a vector graphics renderer with knowledge distilled from a pretrained text-to-image diffusion model, without vector graphics data. Our models enable high-quality generation across many modalities and continue to be broadly applied in subsequent work.},
}

EndNote citation:

%0 Thesis
%A Jain, Ajay 
%T Transferable Generative Models
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 12
%@ UCB/EECS-2023-161
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-161.html
%F Jain:EECS-2023-161