We present a novel approach that performs 3D semantic segmentation solely from 2D supervision by leveraging Neural Radiance Fields (NeRFs). By extracting features along a surface point cloud, we achieve a compact representation of the scene that is sample-efficient and conducive to 3D reasoning. Learning this feature space in an unsupervised manner via masked autoencoding enables few-shot segmentation. Our method is agnostic to the scene parameterization, working on scenes fit with any type of NeRF.
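As a rough illustration of the surface extraction step, the sketch below samples one surface point per camera ray at the expected ray-termination depth under the standard volume-rendering weights. The nerf.density call is a hypothetical API standing in for whichever NeRF parameterization the scene was fit with; feature queries at the returned points would follow the same pattern.

import torch

def surface_points(nerf, rays_o, rays_d, near=0.1, far=6.0, n_samples=128):
    """Return one surface point per ray at the expected termination depth."""
    # Evenly spaced depths along each ray.
    t = torch.linspace(near, far, n_samples)                            # (S,)
    pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]    # (R,S,3)

    # Query the fitted NeRF for density at each sample (hypothetical API).
    sigma = nerf.density(pts.reshape(-1, 3)).reshape(pts.shape[:2])     # (R,S)

    # Standard volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * dt)),
    # with T_i the transmittance accumulated over the preceding samples.
    dt = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * dt)
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = trans * alpha                                             # (R,S)

    # Expected termination depth gives one surface point per ray.
    depth = (weights * t[None, :]).sum(-1) / (weights.sum(-1) + 1e-10)
    return rays_o + depth[:, None] * rays_d                             # (R,3)

# Usage: pts = surface_points(nerf, rays_o, rays_d) with rays of shape (R, 3).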
We utilize pretraining in data-scarce scenarios to reduce the amount of training data required. During pretraining, an autoencoder is trained to recover the RGB values or surface normals of masked points. Pretraining on normals can bootstrap accuracy on the downstream task of semantic segmentation.
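The following is a minimal sketch of such masked-point pretraining, not the paper's exact architecture: per-point features (e.g., queried from the NeRF) are randomly replaced with a learned mask token, a small transformer fills them in from the visible context, and the reconstruction loss is applied only at the masked points against RGB (or normal) targets. All module sizes and names here are illustrative assumptions.

import torch
import torch.nn as nn

class MaskedPointAE(nn.Module):
    """Masked autoencoding over per-point features: masked points must be
    reconstructed from the visible context via self-attention."""
    def __init__(self, feat_dim=64, out_dim=3, d_model=128):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(feat_dim))
        self.embed = nn.Linear(feat_dim + 3, d_model)   # feature + xyz position
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, out_dim)         # RGB or normal targets

    def forward(self, xyz, feats, mask_ratio=0.5):
        # Replace a random subset of point features with a learned mask token.
        mask = torch.rand(feats.shape[:2], device=feats.device) < mask_ratio
        feats = torch.where(mask[..., None], self.mask_token.expand_as(feats), feats)
        tokens = self.embed(torch.cat([feats, xyz], dim=-1))
        pred = self.head(self.backbone(tokens))
        return pred, mask

# Pretraining step: reconstruct targets (here RGB) only at the masked points.
model = MaskedPointAE()
xyz, feats, rgb = torch.rand(2, 256, 3), torch.rand(2, 256, 64), torch.rand(2, 256, 3)
pred, mask = model(xyz, feats)
loss = ((pred - rgb) ** 2)[mask].mean()
loss.backward()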
In ablation studies we investigate the impact of our design decisions (ground removal, proximity loss, surface sampling) and show that pretraining on normals yields more accurate normal estimation.
@misc{hollidt2023geometry,
  title={Geometry Aware Field-to-field Transformations for 3D Semantic Segmentation},
  author={Dominik Hollidt and Clinton Wang and Polina Golland and Marc Pollefeys},
  year={2023},
  eprint={2310.05133},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}