Neural implicit representations have recently become popular in simultaneous localization and mapping (SLAM), especially in dense visual SLAM. However, existing works either rely on RGB-D sensors or require a separate monocular SLAM approach for camera tracking, and they fail to produce high-fidelity dense 3D reconstructions. To address these shortcomings, we present NICER-SLAM, a dense RGB SLAM system that simultaneously optimizes for camera poses and a hierarchical neural implicit map representation, which also allows for high-quality novel view synthesis. To facilitate the optimization process for mapping, we integrate additional supervision signals, including easy-to-obtain monocular geometric cues and optical flow, and introduce a simple warping loss to further enforce geometric consistency. Moreover, to boost performance in complex large-scale scenes, we propose a locally adaptive transformation from signed distance functions (SDFs) to densities in the volume rendering equation. On multiple challenging indoor and outdoor datasets, NICER-SLAM demonstrates strong performance in dense mapping, novel view synthesis, and tracking, and is even competitive with recent RGB-D SLAM systems.
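To make the SDF-to-density step concrete, below is a minimal sketch of one common form of such a transformation: a VolSDF-style Laplace CDF whose sharpness parameter beta is queried per 3D sample (e.g. interpolated from a coarse grid) rather than fixed globally, which is what makes it locally adaptive. The function name and the per-sample beta lookup are illustrative assumptions, not the paper's exact formulation.

import torch

def sdf_to_density(sdf: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """Map SDF samples to volume density via a Laplace-CDF transform.

    `beta` is per-sample (e.g. trilinearly interpolated from a coarse
    grid), making the transformation locally adaptive: a small beta
    yields a sharp surface, a large beta a softer one. This is a sketch
    of a VolSDF-style mapping, not NICER-SLAM's exact equation.
    """
    alpha = 1.0 / beta
    # Laplace CDF evaluated at -sdf: density saturates inside the surface
    # (sdf <= 0) and decays exponentially outside of it (sdf > 0).
    inside = 1.0 - 0.5 * torch.exp(sdf / beta)
    outside = 0.5 * torch.exp(-sdf / beta)
    return alpha * torch.where(sdf > 0, outside, inside)

With a per-region beta optimized jointly with the map, detailed areas can adopt a sharper transform while flat or distant regions stay smooth.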
NICER-SLAM takes only an RGB stream as input and outputs both the camera poses and a learned hierarchical scene representation for geometry and color. To enable end-to-end joint mapping and tracking, we render predicted colors, depths, and normals, and optimize them with respect to the input RGB frames and monocular cues. Moreover, we further enforce geometric consistency with an RGB warping loss and an optical flow loss.
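As a rough illustration of how these supervision signals could be combined, the sketch below assembles a joint mapping loss from rendered and monocular quantities. The dictionary keys, the least-squares scale/shift alignment of rendered depth to the monocular depth prior, and the weights lam are assumptions made for illustration; the warping and optical-flow terms are omitted here since they additionally require camera poses and pixel correspondences.

import torch
import torch.nn.functional as F

def mapping_loss(pred: dict, cues: dict, lam: dict) -> torch.Tensor:
    # Photometric term against the input RGB frame.
    l_rgb = F.l1_loss(pred["rgb"], cues["rgb"])

    # Monocular depth is only defined up to an affine ambiguity, so solve
    # a per-batch scale/shift (w, q) aligning rendered depth to the prior.
    A = torch.stack([pred["depth"], torch.ones_like(pred["depth"])], dim=-1)
    wq = torch.linalg.lstsq(A, cues["mono_depth"][:, None]).solution
    l_depth = F.mse_loss((A @ wq).squeeze(-1), cues["mono_depth"])

    # Monocular normals: L1 plus angular (cosine) consistency.
    cos = F.cosine_similarity(pred["normal"], cues["mono_normal"], dim=-1)
    l_normal = F.l1_loss(pred["normal"], cues["mono_normal"]) + (1.0 - cos).mean()

    return l_rgb + lam["depth"] * l_depth + lam["normal"] * l_normal

In a full system this objective would be minimized jointly over the scene representation and the camera poses, with the warping and flow terms added once correspondences between frames are available.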
@inproceedings{Zhu2023NICER,
  author    = {Zhu, Zihan and Peng, Songyou and Larsson, Viktor and Cui, Zhaopeng and Oswald, Martin R. and Geiger, Andreas and Pollefeys, Marc},
  title     = {NICER-SLAM: Neural Implicit Scene Encoding for RGB SLAM},
  booktitle = {International Conference on 3D Vision (3DV)},
  month     = {March},
  year      = {2024},
}