Efficient 3D Perception from Images
Ruilong Li
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2025-134
June 2, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-134.pdf
The ability to capture, reconstruct, and interact with the world in three dimensions is transforming how we experience, understand, and shape our environment. From virtual reality and digital heritage to robotics and scientific discovery, 3D perception is opening new frontiers across art, science, and technology. Yet, despite remarkable progress, creating high-fidelity 3D models from images remains a computationally demanding challenge, one that limits the accessibility and scalability of these powerful tools.

This thesis addresses the core bottleneck of efficiency in differentiable volume rendering, a foundational technique behind recent breakthroughs such as Neural Radiance Fields (NeRF) and Gaussian Splatting. While these methods have demonstrated that continuous volumetric representations and differentiable rendering pipelines can achieve photorealistic results from sparse or unconstrained inputs, their high computational and memory costs pose significant barriers to real-time and large-scale applications.

To overcome these challenges, I present a series of algorithmic and systems-level innovations aimed at making 3D perception faster, more scalable, and more practical. First, I introduce compact and expressive scene representations that reduce memory overhead without sacrificing quality. Second, I develop smarter sampling and visibility strategies that exploit the inherent sparsity of 3D space, focusing computation where it matters most. Third, I design parallelization techniques tailored for modern multi-GPU systems, enabling distributed training and rendering at unprecedented scales. Finally, I explore learning-based approaches that leverage multi-view geometry and attention mechanisms to further accelerate and generalize 3D perception.

Through a combination of theoretical insights, open-source tools, and empirical validation, this work charts a path toward real-time, high-resolution 3D reconstruction that is accessible beyond specialized labs and supercomputers. By making 3D perception more efficient, this thesis aims to unlock new possibilities and bring us closer to a future where anyone can capture, share, and explore the spaces they love in all their depth and richness.
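For readers unfamiliar with the technique the abstract centers on: in NeRF-style differentiable volume rendering, each pixel's color is an alpha-composited sum over many samples along its camera ray, so rendering cost grows with the number of samples per ray. The standard discrete formulation from the NeRF literature is shown here for context; it is not quoted from the report itself:

    % Discrete volume rendering along one camera ray with N samples
    % (standard NeRF formulation: sigma_i = density, c_i = color,
    %  delta_i = spacing between adjacent samples).
    \[
      \hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
      \qquad
      T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right)
    \]

Because every sample requires evaluating the underlying scene representation, reducing N by skipping samples in empty space, where the density sigma is near zero, is exactly the lever that sparsity-aware sampling strategies pull.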
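To make the sparsity idea concrete, below is a minimal illustrative sketch of occupancy-grid sample culling, a common empty-space-skipping technique in this literature. All names here (cull_samples, the toy grid) are hypothetical illustrations, not code from the thesis or its associated tools:

    import numpy as np

    def cull_samples(positions, occupancy, grid_min, grid_max):
        """Drop ray samples that fall in grid cells marked empty.

        positions: (S, 3) sample locations along rays, in world coordinates.
        occupancy: (R, R, R) boolean grid; True means the cell may contain matter.
        grid_min, grid_max: (3,) world-space bounds of the grid.
        Returns a boolean mask over samples; only True samples need the
        expensive radiance-field evaluation.
        """
        res = np.asarray(occupancy.shape)
        # Normalize positions into [0, 1) inside the grid volume.
        u = (positions - grid_min) / (grid_max - grid_min)
        inside = np.all((u >= 0.0) & (u < 1.0), axis=-1)
        # Map to integer cell indices, clamped so out-of-bounds samples
        # still index safely (they are masked out by `inside` anyway).
        idx = np.clip((u * res).astype(np.int64), 0, res - 1)
        keep = inside & occupancy[idx[:, 0], idx[:, 1], idx[:, 2]]
        return keep

    # Toy usage: one occupied cell; samples elsewhere are skipped entirely.
    occ = np.zeros((8, 8, 8), dtype=bool)
    occ[4, 4, 4] = True
    pts = np.array([[0.56, 0.56, 0.56],   # inside the occupied cell
                    [0.10, 0.10, 0.10]])  # empty space -> culled
    mask = cull_samples(pts, occ, np.zeros(3), np.ones(3))
    print(mask)  # [ True False]

In implementations of this idea, the grid is typically updated during training so that culling tracks the evolving scene geometry, and only the surviving samples are batched into the field evaluation.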
BibTeX citation:
@techreport{Li:EECS-2025-134,
    Author = {Li, Ruilong},
    Title = {Efficient 3D Perception from Images},
    Institution = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {Jun},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-134.html},
    Number = {UCB/EECS-2025-134}
}
EndNote citation:
%0 Report
%A Li, Ruilong
%T Efficient 3D Perception from Images
%I EECS Department, University of California, Berkeley
%D 2025
%8 June 2
%@ UCB/EECS-2025-134
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-134.html
%F Li:EECS-2025-134