Our input is just a single picture.
Niagara is the first model that can effectively reconstruct challenging outdoor scenes from a single view.
Recent advances in single-view 3D scene reconstruction have highlighted the challenges in capturing fine geometric details and ensuring structural consistency, particularly in high-fidelity outdoor scene modeling. This paper presents Niagara, a new single-view 3D scene reconstruction framework that, for the first time, can faithfully reconstruct challenging outdoor scenes from a single input image.
Our approach integrates monocular depth and normal estimation as input, which substantially improves its ability to capture fine details, mitigating common issues like geometric detail loss and deformation.
Additionally, we introduce a geometric affine field (GAF) and 3D self-attention as geometry constraints, which combine the structural properties of explicit geometry with the adaptability of implicit feature fields, striking a balance between efficient rendering and high-fidelity reconstruction.
Finally, our framework uses a specialized encoder-decoder architecture, in which a depth-based 3D Gaussian decoder predicts 3D Gaussian parameters that can be used for novel view synthesis. Extensive results and analyses suggest that Niagara surpasses prior SoTA approaches such as Flash3D in both single-view and dual-view settings, significantly enhancing geometric accuracy and visual fidelity, especially in outdoor scenes.
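To make the idea of a depth-based decoder more concrete, the sketch below shows one common way a per-pixel depth map can be lifted to 3D points that could serve as Gaussian centers. It is not Niagara's decoder: the pinhole camera model, the function name `unproject_depth`, and the intrinsics `fx`, `fy`, `cx`, `cy` are all assumptions made for illustration.

```python
import numpy as np

def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map to camera-space 3D points.

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), and depth measured along z.
    Returns an array of shape (H * W, 3), one candidate center per pixel.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Usage: a constant-depth 2x3 map with toy intrinsics.
depth = np.full((2, 3), 2.0)
centers = unproject_depth(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

In a pipeline of this kind, a network would typically predict the remaining per-Gaussian parameters (for example opacity, scale, rotation, and color) alongside such centers. How Niagara does so is not specified in this summary.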
The output is a 3D outdoor scene by Flash3D.
The output is the same 3D outdoor scene by Niagara.
Novel view synthesis comparison on the RealEstate10K dataset. Following Flash3D, we evaluate our method on the in-domain novel view synthesis task. As shown, our model consistently outperforms all compared methods across different frame counts (5 frames, 10 frames, u[-30,30] frames) in terms of PSNR, SSIM, and LPIPS.
(Best results are in bold, second best underlined.)
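For readers unfamiliar with the metrics, PSNR is the simplest of the three to state. The sketch below computes it under the standard definition for images scaled to [0, 1]. The function name `psnr` and the `max_val` parameter are illustrative choices, and this is not the evaluation code used in the paper.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    Assumes pixel values lie in [0, max_val]. Higher is better.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Usage: a uniform error of 0.1 gives an MSE of 0.01.
rendered = np.zeros((4, 4, 3))
reference = np.full((4, 4, 3), 0.1)
score = psnr(rendered, reference)
```

SSIM and LPIPS measure structural and perceptual similarity respectively and are usually taken from existing library implementations. Higher is better for PSNR and SSIM, and lower is better for LPIPS.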