CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians

Texas A&M University,   Meta Reality Labs,   LMU Munich


arXiv



Overview of the optimization pipeline. For every input image, we obtain a monocular depth estimate (Depth Anything) and dense flow correspondences between all image pairs (FlowFormer++). These inputs are used to initialize a good set of 3D Gaussians for the subsequent optimization stage. The initialized 3D Gaussians, along with depth-based segmentation masks, are then used in a regularized 3D Gaussian optimization to obtain a high-quality reconstruction.
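To make the initialization stage concrete, below is a minimal PyTorch sketch of unprojecting a monocular depth map into one 3D Gaussian per pixel. The function name, the pinhole-camera model, and the depth-proportional scale heuristic are our illustrative assumptions, not the authors' released code.

```python
import torch

def unproject_depth_to_gaussians(depth, image, K, c2w):
    """Unproject a monocular depth map into one 3D Gaussian per pixel.

    depth: (H, W) depth estimate (e.g., from Depth Anything)
    image: (H, W, 3) input RGB, used to initialize Gaussian colors
    K:     (3, 3) camera intrinsics (pinhole model)
    c2w:   (4, 4) camera-to-world transform
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    # Per-pixel ray directions in camera space.
    x = (u + 0.5 - K[0, 2]) / K[0, 0]
    y = (v + 0.5 - K[1, 2]) / K[1, 1]
    dirs = torch.stack([x, y, torch.ones_like(x)], dim=-1)   # (H, W, 3)
    pts_cam = dirs * depth[..., None]                        # scale rays by depth
    # Transform points to world space.
    pts_world = pts_cam @ c2w[:3, :3].T + c2w[:3, 3]
    means = pts_world.reshape(-1, 3)                         # Gaussian centers
    colors = image.reshape(-1, 3)                            # per-pixel color init
    # Isotropic scales proportional to depth, so each splat roughly covers
    # the footprint of one pixel (a heuristic, not the paper's exact choice).
    scales = (depth.reshape(-1, 1) / K[0, 0]).expand(-1, 3)
    return means, colors, scales
```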

Abstract

The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting (3DGS). The latter provides a significant edge over NeRF in terms of training and inference speed, as well as reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud-like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their positions, and prevent them from moving independently during optimization. Specifically, we introduce single-view and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.
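For concreteness, here is a minimal PyTorch sketch of a total variation regularizer of the kind the abstract mentions. Applying it to a rendered depth map is our assumption for illustration; the paper's exact formulation and weighting may differ.

```python
import torch

def total_variation_loss(depth):
    """Total variation on a rendered depth map of shape (H, W): penalizes
    differences between neighboring pixels so nearby Gaussians stay coherent."""
    dx = (depth[:, 1:] - depth[:, :-1]).abs().mean()   # horizontal neighbors
    dy = (depth[1:, :] - depth[:-1, :]).abs().mean()   # vertical neighbors
    return dx + dy
```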


Implicit Decoder


During regularized optimization, the implicit decoder predicts the residual depth ΔD that moves the Gaussians from their initial positions toward the true scene depth D. The input coordinate n to the decoder corresponds to the input view with camera cam_n. To preserve sharp discontinuities, we apply binary segmentation masks, obtained by thresholding the monocular depth, to the decoder output.
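A rough PyTorch sketch of this component is shown below. The network layout (a per-view latent code followed by a small transposed-convolution decoder that outputs one residual channel per segment) is a plausible stand-in chosen by us; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class ResidualDepthDecoder(nn.Module):
    """Decodes a per-view latent code into per-segment residual depth maps."""

    def __init__(self, num_views, num_segments, latent_dim=64, out_hw=(384, 512)):
        super().__init__()
        self.codes = nn.Embedding(num_views, latent_dim)   # one code per view n
        self.h, self.w = out_hw[0] // 8, out_hw[1] // 8
        self.fc = nn.Linear(latent_dim, 32 * self.h * self.w)
        self.up = nn.Sequential(                           # 8x upsampling total
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, num_segments, 4, stride=2, padding=1),
        )

    def forward(self, n, init_depth, seg_masks):
        """n: view index; init_depth: (H, W) initial depth; seg_masks:
        (S, H, W) binary masks from thresholding the monocular depth."""
        z = self.fc(self.codes(torch.as_tensor([n])))      # (1, 32*h*w)
        feat = z.view(1, 32, self.h, self.w)
        delta = self.up(feat)[0]                           # (S, H, W) residuals
        # Masking each segment's residual keeps the deformation smooth inside
        # a segment while preserving sharp discontinuities across segments.
        delta = (delta * seg_masks).sum(dim=0)             # (H, W) residual dD
        return init_depth + delta
```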



Optimization



The implicit decoder enables smooth deformation of the initialized Gaussians, resulting in coherent geometry and high-quality texture.
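Putting the pieces together, a single regularized optimization step might look like the hypothetical sketch below, which reuses total_variation_loss from above. The renderer, the view container, flow_loss_fn, and the loss weights are all placeholders for illustration, not the released training code.

```python
def training_step(decoder, gaussians, view, renderer, flow_loss_fn, optimizer,
                  lam_tv=0.1, lam_flow=0.1):
    """One regularized optimization step (loss weights here are guesses).

    Moving every Gaussian along its pixel ray to the decoder's predicted depth
    couples the Gaussians, so they deform together instead of drifting freely.
    """
    optimizer.zero_grad()
    depth = decoder(view.index, view.init_depth, view.seg_masks)   # (H, W)
    # Each Gaussian was initialized at one pixel of this view; slide it along
    # the corresponding camera ray to the newly predicted depth.
    gaussians.means = view.ray_origins + view.ray_dirs * depth.reshape(-1, 1)
    rgb, rendered_depth = renderer(gaussians, view.camera)
    loss = (rgb - view.image).abs().mean()                         # photometric
    loss = loss + lam_tv * total_variation_loss(rendered_depth)    # depth smoothness
    # flow_loss_fn penalizes Gaussians whose reprojection into other views
    # disagrees with the FlowFormer++ correspondences.
    loss = loss + lam_flow * flow_loss_fn(gaussians, view)
    loss.backward()
    optimizer.step()
    return loss.item()
```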


Comparisons with other few-view NeRF methods


Baseline method (left) vs. CoherentGS (right), shown as RGB and depth renderings; each scene is trained on 2 views.

Scenes: llff_orchids, llff_trex, llff_fortress, llff_horns, llff_fern, llff_flower, ip_s3, ip_s1, zed_s1, zed_s8


Inpainting



In contrast to other methods, our approach does not hallucinate occluded details. This provides a unique advantage: the user can apply any inpainting technique of their choice to fill in the missing regions. As a proof of concept, we apply a simple inpainting technique here to generate the missing texture and project it into the scene.
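A hypothetical sketch of this workflow: render a novel view, find the pixels no Gaussian covers from the accumulated opacity, fill them with an off-the-shelf inpainter (OpenCV's Telea method as a stand-in for the simple technique mentioned above), and hand the result back for unprojection. The renderer outputs, the 0.5 opacity threshold, and add_gaussians_fn are our assumptions.

```python
import cv2
import numpy as np

def inpaint_and_project(gaussians, renderer, camera, add_gaussians_fn):
    """Fill regions the sparse inputs never observed, then lift them to 3D.

    `renderer` is assumed to return RGB, depth, and accumulated opacity;
    `add_gaussians_fn` unprojects pixels into new Gaussians (e.g., with the
    depth-based initialization sketched earlier). Both are placeholders.
    """
    rgb, depth, alpha = renderer(gaussians, camera)
    hole_mask = (alpha < 0.5).cpu().numpy().astype(np.uint8)   # uncovered pixels
    img = (rgb.clamp(0, 1).cpu().numpy() * 255).astype(np.uint8)
    # Any off-the-shelf inpainting works here; Telea is a simple choice.
    filled = cv2.inpaint(img, hole_mask, 3, cv2.INPAINT_TELEA)
    # Project the inpainted pixels back into the scene as new Gaussians.
    add_gaussians_fn(filled, depth, hole_mask, camera)
```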

Citation

Acknowledgements

The website template was borrowed from ReconFusion.