MVS-Splatting: Fast Multi-View Stereo Depth Fusion for 3D Gaussian Splatting Initialization

IDLab-Media
University of Ghent, Belgium
This is the project page for our MVS-Splatting paper, which is currently under review for IEEE Access. The images and videos below give a more visual explanation of our method and present our results. The code and datasets are open source and available on GitHub.

Our method

Abstract

Images and videos allow us to explore places and connect with people all around the world, in the present or the past. What if we could break through the glass screen in front of us and step into those camera captures? Although this is challenging, light field research has produced promising techniques in recent years, such as 3D Gaussian Splatting (3DGS), which can render high-quality views of a scene reconstructed from camera captures alone. However, creating the scene model takes a significant amount of time and compute, making it impractical for the multimedia industry, which produces terabytes of new content daily. In this paper, we present a method for speeding up the modeling process, not by optimizing the training, but by initializing the pipeline with an already semi-finished reconstruction. We estimate a depth map for each camera image, fuse these depth maps, and convert the result into a dense set of Gaussian splats that already closely resembles the scene. Afterwards, the default training process is applied to fine-tune the splats and quickly synthesize new high-quality views. We show that, after 1000 iterations, our method on average improves PSNR by +1.27 dB and SSIM by +0.065, and lowers LPIPS by 0.10, compared to the default initialization.
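The abstract describes the depth-to-splats pipeline only at a high level. As a rough illustration of the first two steps, the minimal sketch below back-projects each depth map into world space with a pinhole camera model and then merges all views by averaging the points that fall into the same voxel. This is not the MVS-Splatting implementation: the function names `backproject_depth` and `fuse_views`, the camera conventions (intrinsics `K`, a camera-to-world pose matrix) and the voxel-averaging fusion are all illustrative assumptions. A real MVS fusion step would typically also reject depths that are inconsistent across views.

```python
import numpy as np


def backproject_depth(depth, rgb, K, cam_to_world):
    """Lift one depth map to a colored point cloud in world coordinates.

    Hypothetical helper (assumed conventions, not the paper's code):
    depth:        (H, W) depth along the camera z-axis; values <= 0 are invalid.
    rgb:          (H, W, 3) colors in [0, 1].
    K:            (3, 3) pinhole intrinsics.
    cam_to_world: (4, 4) camera-to-world pose.
    """
    H, W = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    valid = depth > 0
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx (and likewise for v).
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z], axis=1)
    # Rotate and translate from camera space into world space.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    pts_world = pts_cam @ R.T + t
    return pts_world, rgb[valid]


def fuse_views(views, voxel_size=0.01):
    """Naive fusion (assumption): merge all views and average points per voxel.

    views: iterable of (depth, rgb, K, cam_to_world) tuples.
    """
    per_view = [backproject_depth(*view) for view in views]
    pts = np.concatenate([p for p, _ in per_view])
    cols = np.concatenate([c for _, c in per_view])
    # Integer voxel index of every point.
    keys = np.floor(pts / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(
        keys, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)  # keep the index array 1-D across NumPy versions
    fused_pts = np.zeros((counts.size, 3))
    fused_cols = np.zeros((counts.size, 3))
    # Accumulate positions and colors per voxel, then divide by the point count.
    np.add.at(fused_pts, inverse, pts)
    np.add.at(fused_cols, inverse, cols)
    return fused_pts / counts[:, None], fused_cols / counts[:, None]
```

The voxel size trades density against redundancy: overlapping views produce many near-duplicate points, and averaging them per voxel keeps the resulting splat count manageable.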



Results

We initialize the Gaussian splats using the proposed method, MVS-Splatting, and then run the default 3DGS training procedure for 1000 iterations. Below, we show some renders of the reconstructed scenes. We compare our results to default 3DGS, which initializes its Gaussian splats from COLMAP's sparse Structure-from-Motion (SfM) point cloud. We also compare against MVSGaussian (ECCV'24).
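Whether the input is a sparse SfM cloud or a dense fused cloud, 3DGS turns each point into one Gaussian. The minimal sketch below is modeled on the conventions of the reference 3DGS code as commonly described (base color as a DC spherical-harmonic coefficient, an isotropic scale from the mean squared distance to the 3 nearest neighbors, identity rotation, opacity 0.1 stored as a logit); treat these details, the function name `init_gaussians` and the dictionary keys as assumptions rather than a faithful port. The brute-force neighbor search is only suitable for small inputs; large clouds would need a spatial index.

```python
import numpy as np

# Zeroth-order real spherical-harmonic constant, 1 / (2 * sqrt(pi)).
SH_C0 = 0.28209479177387814


def init_gaussians(points, colors, k=3, init_opacity=0.1):
    """Hypothetical helper: (N, 3) points + (N, 3) RGB in [0, 1] -> Gaussian parameters."""
    n = points.shape[0]
    if n < 2:
        raise ValueError("need at least two points to estimate neighbor distances")
    # Brute-force squared distances between all pairs: O(N^2) memory, demo-sized inputs only.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    k_eff = min(k, n - 1)
    nearest = np.sort(d2, axis=1)[:, :k_eff]
    # Mean squared distance to the k nearest neighbors, clamped away from zero.
    mean_d2 = np.maximum(nearest.mean(axis=1), 1e-7)
    # Isotropic initial scale log(sqrt(mean_d2)), repeated on all three axes.
    log_scales = np.repeat(0.5 * np.log(mean_d2)[:, None], 3, axis=1)
    # Identity rotation as a (w, x, y, z) quaternion.
    rotations = np.zeros((n, 4))
    rotations[:, 0] = 1.0
    # Opacity stored as a logit so that sigmoid(logit) == init_opacity.
    opacity_logits = np.full((n, 1), np.log(init_opacity / (1.0 - init_opacity)))
    # Base color as the DC spherical-harmonic coefficient.
    sh_dc = (colors - 0.5) / SH_C0
    return {
        "means": points.copy(),
        "log_scales": log_scales,
        "rotations": rotations,
        "opacity_logits": opacity_logits,
        "sh_dc": sh_dc,
    }
```

Because the scale of each Gaussian follows the local point spacing, a dense cloud yields many small Gaussians that already cover the surfaces, whereas a sparse SfM cloud yields fewer, larger ones that training must first split and refine.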

We show how the quality of each dataset's reconstruction progresses as the number of 3DGS iterations increases.
The first and last iterations are shown for an extra second to facilitate comparison.

Below we show the comparison to MVSGaussian.

We rendered a camera path through each dataset's reconstruction after 1000 3DGS iterations.
We compare the default COLMAP SfM point cloud initialization (left) to our splat initialization method (right).

Below is the comparison to MVSGaussian (left) and our method (right).

BibTeX

Our paper is currently under review. Once published, the citation will become available here.

Paper and dataset by IDLab Media