Benjamin Mildenhall

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2020-223

December 18, 2020

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-223.pdf

View synthesis is the problem of using a given set of input images to render a scene from new points of view. Recent approaches have combined deep learning and volume rendering to achieve photorealistic image quality. However, these methods rely on a dense 3D grid representation that only allows for a small amount of local camera movement and scales poorly to higher resolutions.

In this dissertation, we present a new approach to view synthesis based on neural radiance fields, an efficient way to represent a scene as a continuous function parameterized by the weights of a neural network. In contrast to using a feed-forward neural network to predict scene properties from a small number of inputs, a neural radiance field can be directly optimized to globally reconstruct a scene from tens or hundreds of input images and thus achieve high-quality novel view synthesis over a large camera baseline.
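
As a concrete illustration of this idea, the sketch below (plain Python with NumPy; the function names, layer sizes, and activations are illustrative assumptions, not the exact architecture from the dissertation) queries a scene represented as a continuous function from a 3D position and viewing direction to an emitted color and volume density, with the entire scene encoded in the MLP weights rather than in a dense 3D grid:

    import numpy as np

    # Illustrative sketch: a scene as a continuous function
    # F(position, view direction) -> (RGB color, volume density),
    # parameterized entirely by MLP weights instead of a dense 3D grid.

    def init_mlp(sizes, rng):
        """Random weights; in practice these are optimized to reproduce the input images."""
        return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
                for m, n in zip(sizes[:-1], sizes[1:])]

    def scene_fn(params, xyz, view_dir):
        """Query the radiance field at one 3D point seen from one direction."""
        h = np.concatenate([xyz, view_dir])
        for W, b in params[:-1]:
            h = np.maximum(h @ W + b, 0.0)           # ReLU hidden layers
        W, b = params[-1]
        out = h @ W + b
        rgb = 1.0 / (1.0 + np.exp(-out[:3]))         # color constrained to [0, 1]
        sigma = np.log1p(np.exp(out[3]))             # non-negative volume density
        return rgb, sigma

    rng = np.random.default_rng(0)
    params = init_mlp([6, 256, 256, 4], rng)         # (x, y, z, dx, dy, dz) -> (r, g, b, sigma)
    rgb, sigma = scene_fn(params, np.array([0.1, 0.2, 0.3]), np.array([0.0, 0.0, 1.0]))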

The key to enabling high-fidelity reconstruction of a low-dimensional signal using a neural network is a high-frequency mapping of the input coordinates into a higher-dimensional space. We explain the connection between this mapping and the neural tangent kernel, and show how manipulating the frequency spectrum of the mapping provides control over the network's interpolation behavior between supervision points.
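
A minimal sketch of such a mapping is shown below (NumPy; the function name and the choice of ten octave-spaced frequencies are illustrative assumptions). Each input coordinate is lifted to sines and cosines at geometrically spaced frequencies, and widening or narrowing that frequency range is one way to shift the spectrum of the mapping and hence the network's interpolation bandwidth between supervision points:

    import numpy as np

    def fourier_features(x, num_freqs=10):
        """Map coordinates in R^d to [sin(2^k * pi * x), cos(2^k * pi * x)], k = 0..num_freqs-1."""
        x = np.atleast_1d(x)
        freqs = (2.0 ** np.arange(num_freqs)) * np.pi    # geometrically spaced frequencies
        angles = np.outer(freqs, x).ravel()              # num_freqs * d phase values
        return np.concatenate([np.sin(angles), np.cos(angles)])

    # A 3D point with 10 frequencies becomes a 60-dimensional network input;
    # fewer (or lower) frequencies give a smoother interpolant, more give a sharper one.
    gamma = fourier_features(np.array([0.1, 0.2, 0.3]))
    print(gamma.shape)                                   # (60,)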

Advisor: Ren Ng


BibTeX citation:

@phdthesis{Mildenhall:EECS-2020-223,
    Author= {Mildenhall, Benjamin},
    Title= {Neural Scene Representations for View Synthesis},
    School= {EECS Department, University of California, Berkeley},
    Year= {2020},
    Month= {Dec},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-223.html},
    Number= {UCB/EECS-2020-223},
    Abstract= {View synthesis is the problem of using a given set of input images to render a scene from new points of view. Recent approaches have combined deep learning and volume rendering to achieve photorealistic image quality. However, these methods rely on a dense 3D grid representation that only allows for a small amount of local camera movement and scales poorly to higher resolutions. 

In this dissertation, we present a new approach to view synthesis based on neural radiance fields, an efficient way to represent a scene as a continuous function parameterized by the weights of a neural network. In contrast to using a feed-forward neural network to predict scene properties from a small number of inputs, a neural radiance field can be directly optimized to globally reconstruct a scene from tens or hundreds of input images and thus achieve high-quality novel view synthesis over a large camera baseline. 

The key to enabling high-fidelity reconstruction of a low-dimensional signal using a neural network is a high-frequency mapping of the input coordinates into a higher-dimensional space. We explain the connection between this mapping and the neural tangent kernel, and show how manipulating the frequency spectrum of the mapping provides control over the network's interpolation behavior between supervision points.},
}

EndNote citation:

%0 Thesis
%A Mildenhall, Benjamin 
%T Neural Scene Representations for View Synthesis
%I EECS Department, University of California, Berkeley
%D 2020
%8 December 18
%@ UCB/EECS-2020-223
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-223.html
%F Mildenhall:EECS-2020-223