MVS-Splatting

MVS-Splatting: Fast Multi-View Stereo Depth Fusion for 3D Gaussian Splatting Initialization

IDLab-Media
University of Ghent, Belgium

Abstract

Images and videos let us explore places and connect with people all around the world, in the present or the past. What if we could break through the glass screen in front of us and step into those camera captures? Although this is challenging, light field research has in recent years produced promising techniques such as 3D Gaussian Splatting, which can render high-quality views of a scene reconstructed from camera captures alone. However, creating the scene model takes a significant amount of time and compute power, which makes it unviable for the multimedia industry, which produces terabytes of new content daily. In this paper, we present a method that speeds up the modeling process, not by optimizing the training itself, but by initializing the pipeline with an already semi-finished reconstruction. We estimate depth maps for the camera images, fuse them, and convert the result into a dense set of Gaussian splats that already closely resembles the scene. Afterwards, the default training process is applied to fine-tune the model and quickly synthesize new high-quality views. We show that, on average after 1000 iterations, our method improves PSNR by 1.27 dB and SSIM by 0.065 and lowers LPIPS by 0.10 compared to the default initialization.
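To make the depth-to-geometry step concrete, here is a minimal sketch of standard pinhole back-projection, which lifts one depth map into world-space points that could seed Gaussian centers. It is not the fusion or splat conversion used in MVS-Splatting, which this page does not specify. The function name `backproject_depth`, the z-depth convention, the pixel-center offset of 0.5, and the camera-to-world pose layout are all assumptions made for illustration.

```python
import numpy as np

def backproject_depth(depth, K, cam_to_world):
    """Lift a depth map (H, W) to world-space 3D points (N, 3).

    Assumptions (not taken from the paper):
      - depth holds z-depth along the optical axis; values <= 0 are invalid
      - K is a 3x3 pinhole intrinsics matrix
      - cam_to_world is a 4x4 camera-to-world transform
    """
    h, w = depth.shape
    # Pixel-center coordinates for every pixel, each of shape (H, W).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    valid = depth > 0
    # Homogeneous pixel coordinates of the valid pixels, shape (3, N).
    pix = np.stack([u[valid], v[valid], np.ones(valid.sum())], axis=0)
    # Rays in camera space, normalized so that z = 1.
    rays = np.linalg.inv(K) @ pix
    # Scale each ray by its depth to get camera-space points.
    pts_cam = rays * depth[valid]
    # Apply rotation and translation to reach world space.
    pts_world = cam_to_world[:3, :3] @ pts_cam + cam_to_world[:3, 3:4]
    return pts_world.T
```

Running this per view and concatenating the results gives a naive dense point cloud. A real fusion step would additionally need to reject depths that disagree across views and merge duplicates, which this sketch leaves out.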

Results

We initialize the Gaussian splats using the proposed method, MVS-Splatting, and then run the default 3DGS training procedure for 1000 iterations. Below, we show renders of the reconstructed scenes. We compare our results against default 3DGS, which initializes its Gaussian splats from COLMAP's sparse SfM point cloud, and against MVSGaussian (ECCV'24).
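For readers who want to interpret the PSNR figures, the following is a minimal sketch of the standard PSNR definition for images with values in [0, 1]. It is not the evaluation code behind the reported numbers; the function name `psnr` and the `max_value` parameter are illustrative assumptions.

```python
import numpy as np

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Assumes pixel values lie in [0, max_value].
    """
    rendered = np.asarray(rendered, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Mean squared error over all pixels and channels.
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, a gain of 1.27 dB corresponds to multiplying the error by 10^(-0.127), or roughly a 25% reduction.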
