
StreamGS

Introduction

In this post, I review StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams, a collaboration between Microsoft Research Asia and HKUST, accepted to CVPR 2026.

While 3D Gaussian Splatting (3DGS) has set new benchmarks for real-time rendering, the reconstruction phase remains a bottleneck. Most methods require a pre-processed set of images with known camera poses (typically estimated with COLMAP) and a heavy offline optimization loop. For real-world applications like robotics or live AR/VR, we need a system that can build a 3D scene online from an unposed video stream.

StreamGS is a feed-forward pipeline that transforms raw image streams into “Gaussian streams.” It leverages the geometric priors of DUSt3R but introduces Adaptive Refinement to handle Out-of-Domain (OOD) data and a Feed-Forward ADC (Adaptive Density Control) mechanism to eliminate redundancy.


Paper Info

  • Title: StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams
  • Authors: Yang Li, Jinglu Wang, Lei Chu, Xiao Li, Shiu-Hong Kao, Ying-Cong Chen, Yan Lu
  • Conference: CVPR 2026
  • Paper Link: ArXiv

Model Overview: The 3-Stage Pipeline

The core innovation of StreamGS is how it progressively updates the global Gaussian set \(\mathcal{G}_t\) without iterative optimization. The architecture is divided into three logical stages that handle geometry, refinement, and memory management.

StreamGS Architecture

1. Initial Two-view Reconstruction

The process begins by treating the image stream as a sequence of pairs \((I_{t-1}, I_t)\).

  • Coarse Prediction: A frozen DUSt3R-based predictor \(\phi_{3D}\) estimates the point maps \(X_t\) and \(X_{t-1}\) in a local coordinate system.
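  • Camera Estimation: Relative poses are derived by solving a confidence-weighted point registration problem: \([R, t] = \arg \min_{s,R,t} \sum_i C_t^i \|s(RX_{t-1}^i + t) - X_t^i\|^2\), where the sum runs over pixels \(i\), \(C_t\) weights each residual, and \(s\) is a scale factor. Note that the translation \(t\) inside the brackets is distinct from the frame index \(t\) in the subscripts.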
  • Constraint: Since the coarse predictor is frozen, it often suffers from Out-of-Domain (OOD) issues when applied to scenes unlike the training data.
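The registration objective above has a closed-form solution. The following is a minimal NumPy sketch, assuming a weighted Umeyama-style alignment; the post does not say which solver the authors use, and the function name and argument layout here are hypothetical. Point maps are flattened to `(N, 3)` arrays and the weights to `(N,)`.

```python
import numpy as np

def weighted_similarity_registration(X_prev, X_curr, conf):
    """Closed-form minimizer of sum_i conf_i * ||s (R x_i + t) - y_i||^2.

    X_prev, X_curr: (N, 3) corresponding points; conf: (N,) positive weights.
    Hypothetical helper for illustration, not the authors' code.
    """
    w = conf / conf.sum()                      # normalize weights
    mu_x, mu_y = w @ X_prev, w @ X_curr        # weighted centroids
    xc, yc = X_prev - mu_x, X_curr - mu_y      # centered points
    cov = (yc * w[:, None]).T @ xc             # 3x3 weighted cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ S @ Vt
    var_x = (w * (xc ** 2).sum(axis=1)).sum()  # weighted variance of source
    s = (D * np.diag(S)).sum() / var_x
    # The objective applies s to (R x + t), so divide the usual offset by s.
    t = (mu_y - s * R @ mu_x) / s
    return s, R, t
```

On noise-free correspondences this recovers the generating scale, rotation, and translation exactly; with real predictions the weights simply down-weight low-confidence pixels.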

2. Content-Adaptive Refinement

To fix OOD errors, the model self-corrects using cross-frame correspondences:

  • Feature Matching: A matching head \(\phi_{match}\) extracts local 3D features to find robust pixel-wise matches between \(I_{t-1}\) and \(I_t\) using Nearest Neighbor search.
  • Joint Refine: These matches act as “geometric anchors.” The system re-estimates a residual transform \(\Delta = [\Delta R, \Delta t]\) to refine the camera trajectory and “snap” the point maps into better alignment.
  • Gaussian Decoding: A lightweight decoder \(\phi_{GS}\) then takes these refined points combined with 2D image features to predict Gaussian parameters (rotation \(q\), scale \(s\), opacity \(\alpha\), and color \(c\)).
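To make the matching step concrete, here is a small NumPy sketch of nearest-neighbor matching between two sets of per-pixel feature vectors. The mutual (reciprocal) check and the use of cosine similarity are assumptions on my part; the post only says "Nearest Neighbor search", and `mutual_nearest_neighbors` is a hypothetical name.

```python
import numpy as np

def mutual_nearest_neighbors(feat_prev, feat_curr):
    """Return (K, 2) index pairs (i, j) that are each other's nearest neighbor.

    feat_prev: (N_prev, D), feat_curr: (N_curr, D) feature vectors.
    Illustrative only; dense all-pairs similarity does not scale to full images.
    """
    # L2-normalize so the dot product equals cosine similarity.
    a = feat_prev / np.linalg.norm(feat_prev, axis=1, keepdims=True)
    b = feat_curr / np.linalg.norm(feat_curr, axis=1, keepdims=True)
    sim = a @ b.T                      # (N_prev, N_curr) similarity matrix
    nn_ab = sim.argmax(axis=1)         # best current pixel for each previous pixel
    nn_ba = sim.argmax(axis=0)         # best previous pixel for each current pixel
    idx_prev = np.arange(len(a))
    keep = nn_ba[nn_ab] == idx_prev    # keep only reciprocal matches
    return np.stack([idx_prev[keep], nn_ab[keep]], axis=1)
```

The reciprocal check discards one-sided matches, which is one simple way to get the "robust" anchors the refinement step relies on.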

3. Feed-Forward ADC (Adaptive Density Control)

This stage prevents the “Gaussian Explosion.” If we simply added 50k Gaussians per frame, the Gaussian count would grow linearly with the length of the stream and eventually exhaust memory.

  • Warping MergeNet: Using the matches from Stage 2, the current frame’s Gaussian features are warped onto the previous frame.
  • Feature Aggregation: For pixels that correspond between frames, the MergeNet (\(\phi_{MG}\)) aggregates their features into a single Gaussian primitive.
  • Density Control: This reduces redundancy by ~40%, transforming a set of redundant per-frame predictions into a lean, unified “Gaussian Stream.”
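The merging idea can be sketched without the learned network. In the toy example below, an opacity-weighted average stands in for the MergeNet \(\phi_{MG}\), which in the paper is learned; the dictionary layout and the function name are hypothetical, and rotation and scale are left out to keep the sketch short.

```python
import numpy as np

def merge_gaussians(prev, curr, matches):
    """Fuse matched Gaussians and append unmatched ones from the current frame.

    prev, curr: dicts with 'mean' (N, 3), 'color' (N, 3), 'opacity' (N,), opacity > 0.
    matches: (K, 2) one-to-one index pairs (i in prev, j in curr).
    Opacity-weighted averaging is a stand-in for the learned MergeNet.
    """
    i, j = matches[:, 0], matches[:, 1]
    w_prev, w_curr = prev["opacity"][i], curr["opacity"][j]
    w_sum = w_prev + w_curr
    merged = {k: v.copy() for k, v in prev.items()}
    for key in ("mean", "color"):
        merged[key][i] = (
            w_prev[:, None] * prev[key][i] + w_curr[:, None] * curr[key][j]
        ) / w_sum[:, None]
    merged["opacity"][i] = np.maximum(w_prev, w_curr)
    # Only Gaussians with no match in the previous frame are newly added.
    unmatched = np.setdiff1d(np.arange(len(curr["opacity"])), j)
    return {k: np.concatenate([merged[k], curr[k][unmatched]]) for k in merged}
```

With this scheme the global set grows only by the number of unmatched pixels per frame, rather than by every per-frame prediction.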

Results

StreamGS Results

StreamGS was evaluated across diverse datasets (ScanNet, RE10K, DL3DV) and compared with both optimization-based (CF-3DGS) and pose-dependent (MVSplat) methods.

Quantitative comparison on DL3DV:

| Method | Pose-Free | Generalizable | PSNR \(\uparrow\) | Speed (FPS) |
|---|---|---|---|---|
| MVSplat | No | Yes | 17.84 | 27.78 |
| CF-3DGS | Yes | No | 19.93 | 0.06 |
| StreamGS | Yes | Yes | 20.54 | 9.09 |

Key Findings:

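  • Speed: At 9.09 FPS versus 0.06 FPS, StreamGS is roughly 150x faster than CF-3DGS, the optimization-based pose-free baseline. MVSplat is faster still (27.78 FPS) but requires known camera poses.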
  • Robustness: As seen in the qualitative results above, MVSplat struggles with view aggregation on OOD data, while StreamGS maintains high visual quality and structural integrity.
  • Memory Efficiency: The merging process effectively prunes Gaussians with only a negligible (~2-3%) impact on PSNR.
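As a quick sanity check on the speed claim, the ratio and per-frame latencies can be computed directly from the DL3DV numbers in the table. This is plain arithmetic on the reported figures, not a new measurement.

```python
# Reported DL3DV figures from the comparison table above.
fps = {"MVSplat": 27.78, "CF-3DGS": 0.06, "StreamGS": 9.09}
psnr = {"MVSplat": 17.84, "CF-3DGS": 19.93, "StreamGS": 20.54}

# Throughput ratio between StreamGS and the optimization-based baseline.
speedup_vs_cf3dgs = fps["StreamGS"] / fps["CF-3DGS"]

# Seconds spent per frame for each method.
seconds_per_frame = {name: 1.0 / value for name, value in fps.items()}

# PSNR margin of StreamGS over CF-3DGS, in dB.
psnr_gain_db = psnr["StreamGS"] - psnr["CF-3DGS"]

print(f"Speedup vs CF-3DGS: {speedup_vs_cf3dgs:.1f}x")
for name, sec in seconds_per_frame.items():
    print(f"{name}: {sec:.2f} s/frame")
print(f"PSNR gain over CF-3DGS: {psnr_gain_db:.2f} dB")
```

The ratio comes out at about 151.5, which is where the rounded "150x" comes from; in latency terms that is roughly 0.11 seconds per frame for StreamGS against nearly 17 seconds for CF-3DGS.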

Takeaways

StreamGS represents the first holistic, generalizable pipeline for online 3DGS from unposed streams.

  • Geometric Priors + Adaptation: Relying solely on a foundation model like DUSt3R isn’t enough for online use; the adaptive refinement step is what makes it robust to new environments.
  • 2D to 3D Aggregation: By turning 3D aggregation into a 2D pixel-wise warping task, the authors achieved massive speed gains without sacrificing the quality of the final Gaussian map.
  • The End of SfM? For real-time applications, this “pose-free” feed-forward approach is quickly becoming the most viable path forward for spatial computing.