Abstract
3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile platforms due to their high computational cost, especially for their tracking process.
This work introduces SPLATONIC, a sparse and efficient real-time 3DGS-SLAM algorithm-hardware co-design for resource-constrained devices. Inspired by classical SLAMs, we propose an adaptive sparse pixel sampling algorithm that reduces the number of rendered pixels by up to 256× while retaining accuracy. To unlock this performance potential on mobile GPUs, we design a novel pixel-based rendering pipeline that improves hardware utilization via Gaussian-parallel rendering and preemptive α-checking. Together, these optimizations yield up to 121.7× speedup on the bottleneck stages and 14.6× end-to-end speedup on off-the-shelf GPUs. To address the new bottlenecks that this rendering pipeline introduces in projection and aggregation, we propose a pipelined architecture that also simplifies the overall design. Evaluated across four 3DGS-SLAM algorithms, SPLATONIC achieves up to 274.9× speedup and 4738.5× energy savings over mobile GPUs and up to 25.2× speedup and 241.1× energy savings over state-of-the-art accelerators, all with comparable accuracy.
Contribution
We propose a pixel-based rendering pipeline to improve GPU thread utilization under our sparse pixel sampling algorithm. Specifically, we shift from pixel-wise parallelism to Gaussian-wise parallelism in both the rasterization and reverse rasterization stages. Additionally, we introduce an optimization, preemptive α-checking, that moves α-checking from the rasterization stage to the projection stage of the pipeline.
Our architecture builds upon MetaSapiens, a pipelined 3DGS accelerator. We augment this baseline architecture to support the backward pass. Specifically, we co-design a simplified rasterization engine (purple-colored) that mitigates PE under-utilization in rasterization and reverse rasterization. We also propose a caching technique between these two stages to avoid the across-thread reduction in reverse rasterization. Meanwhile, we augment the projection unit (pink-colored) to support preemptive α-checking and design a dedicated aggregation unit (yellow-colored) to accelerate reverse rasterization.
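As a rough illustration of the idea of preemptive α-checking, the following Python sketch performs the alpha test while pairing projected Gaussians with sparsely sampled pixels, so that the blending step only visits pairs that actually contribute. It is not the authors' implementation: the function names, the dictionary layout of a Gaussian, and the 1/255 threshold and 0.99 clamp (borrowed from the reference 3DGS rasterizer) are all assumptions made for this example, and the outer loop over Gaussians only mimics Gaussian-wise parallelism sequentially.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # assumed threshold, as in the reference 3DGS rasterizer


def gaussian_alpha(pixel, mean, conic, opacity):
    """Alpha of one projected 2D Gaussian at one pixel.

    conic = (a, b, c) encodes the inverse 2D covariance [[a, b], [b, c]].
    """
    dx = pixel[0] - mean[0]
    dy = pixel[1] - mean[1]
    power = -0.5 * (conic[0] * dx * dx + conic[2] * dy * dy) - conic[1] * dx * dy
    if power > 0.0:
        return 0.0
    return min(0.99, opacity * math.exp(power))


def preemptive_alpha_check(pixels, gaussians):
    """Hypothetical projection-stage filter: keep only contributing pairs.

    Returns, for each sampled pixel index, a depth-sorted list of
    (depth, gaussian_index, alpha) tuples that passed the alpha test.
    """
    survivors = {p_idx: [] for p_idx in range(len(pixels))}
    # Each Gaussian is processed independently (Gaussian-wise work units).
    for g_idx, g in enumerate(gaussians):
        for p_idx, pixel in enumerate(pixels):
            alpha = gaussian_alpha(pixel, g["mean"], g["conic"], g["opacity"])
            if alpha >= ALPHA_MIN:
                survivors[p_idx].append((g["depth"], g_idx, alpha))
    for pairs in survivors.values():
        pairs.sort()  # front-to-back order for blending
    return survivors


def blend(pairs, gaussians):
    """Front-to-back alpha blending over pairs that already passed the test."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, g_idx, alpha in pairs:
        c = gaussians[g_idx]["color"]
        for k in range(3):
            color[k] += transmittance * alpha * c[k]
        transmittance *= 1.0 - alpha
    return color, transmittance


gaussians = [
    {"mean": (10.0, 10.0), "conic": (1.0, 0.0, 1.0), "opacity": 0.5,
     "depth": 1.0, "color": (1.0, 0.0, 0.0)},
    {"mean": (100.0, 100.0), "conic": (1.0, 0.0, 1.0), "opacity": 0.5,
     "depth": 2.0, "color": (0.0, 1.0, 0.0)},
]
sampled_pixels = [(10.0, 10.0), (50.0, 50.0)]
pairs_per_pixel = preemptive_alpha_check(sampled_pixels, gaussians)
pixel0_color, pixel0_transmittance = blend(pairs_per_pixel[0], gaussians)
```

In this toy scene only the first Gaussian survives at the first sampled pixel, and no Gaussian survives at the second, so the blending loop does no wasted work on pairs that would have been rejected by the alpha test.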
Results
Accuracy
We report absolute trajectory error (ATE) and PSNR on the Replica and TUM datasets. The results support the effectiveness of our sparse sampling algorithm.
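For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed. It is a simplified illustration, not the evaluation code behind the reported numbers: the function names are hypothetical, images are flattened lists of intensities in [0, 1], and the ATE computation assumes the estimated and ground-truth trajectories are already time-associated and aligned, a step that standard evaluation tools perform first.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images,
    given as flat sequences of pixel intensities."""
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def ate_rmse(estimated, ground_truth):
    """Root-mean-square translational error between two trajectories.

    Assumes both are lists of (x, y, z) positions that are already
    associated in time and expressed in the same aligned frame.
    """
    if len(estimated) != len(ground_truth) or len(estimated) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(pose_e, pose_g))
        for pose_e, pose_g in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


example_psnr = psnr([0.0, 0.0], [0.1, 0.1])
example_ate = ate_rmse([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                       [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

Lower ATE indicates more accurate tracking, while higher PSNR indicates a rendered image closer to the reference.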
GPU Performance
We show that our sampling algorithm, combined with the proposed rendering pipeline, achieves significant speedup and energy savings on off-the-shelf GPUs without hardware support. Orig.+S refers to applying our sparse sampling algorithm to the original rendering pipeline.
Hardware Performance
We compare the tracking performance of SPLATONIC with that of other architectures. For a fair comparison, we include variants that incorporate our sparse sampling algorithm, denoted with the “+S” suffix. Numbers are normalized to the GPU baseline.
More videos
We provide more GPU performance comparisons between (left) conventional 3DGS-SLAM and (right) SPLATONIC to show the strengths of our method in different scenarios.
BibTeX
@inproceedings{huang2026splatonic,
  author = "Huang, Xiaotong and Zhu, He and Ma, Tianrui and Xiong, Yuxiang and Liu, Fangxin and He, Zhezhi and Gan, Yiming and Liu, Zihan and Leng, Jingwen and Feng, Yu and Guo, Minyi",
  title = "SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing",
  year = "2026",
  booktitle = "Proceedings of the IEEE International Symposium on High Performance Computer Architecture",
}
Acknowledgements
We are sincerely grateful to Weikai Lin for providing valuable advice and support throughout this research.