SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing

1Shanghai Jiao Tong University, 2Shanghai Qi Zhi Institute, 3Institute of Computing Technology, Chinese Academy of Sciences
*Equal contribution
HPCA 2026

Abstract

3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical for mobile platforms due to their high computational cost, especially for their tracking process.

This work introduces SPLATONIC, a sparse and efficient real-time 3DGS-SLAM algorithm-hardware co-design for resource-constrained devices. Inspired by classical SLAM systems, we propose an adaptive sparse pixel sampling algorithm that reduces the number of rendered pixels by up to 256× while retaining accuracy. To unlock this performance potential on mobile GPUs, we design a novel pixel-based rendering pipeline that improves hardware utilization via Gaussian-parallel rendering and preemptive α-checking. Together, these optimizations yield up to 121.7× speedup on the bottleneck stages and 14.6× end-to-end speedup on off-the-shelf GPUs. Because this rendering pipeline exposes new bottlenecks in projection and aggregation, we further propose a pipelined architecture that addresses them while simplifying the overall design. Evaluated across four 3DGS-SLAM algorithms, SPLATONIC achieves up to 274.9× speedup and 4738.5× energy savings over mobile GPUs, and up to 25.2× speedup and 241.1× energy savings over state-of-the-art accelerators, all with comparable accuracy.

Contribution

We propose a pixel-based rendering pipeline to improve GPU thread utilization under our sparse pixel sampling algorithm. Specifically, we shift from pixel-wise parallelism to Gaussian-wise parallelism in both the rasterization and reverse rasterization stages. Additionally, we introduce an optimization, preemptive α-checking, that moves α-checking from the rasterization stage to the projection stage of the pipeline.

Overview of our pixel-based rendering pipeline.
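
As a rough illustration of the two ideas above, the following pure-Python sketch loops over Gaussians (the axis that would be parallelized) and performs the α check at projection time, so that rasterization only blends pairs that already passed. It is a minimal sketch under stated assumptions, not the SPLATONIC implementation: the dictionary layout, the function names, and the 1/255 cutoff (borrowed from the reference 3DGS rasterizer) are all assumptions, and colors are scalars for brevity.

```python
import math

ALPHA_MIN = 1.0 / 255.0  # assumed cutoff, as in the reference 3DGS rasterizer


def gaussian_alpha(g, px, py):
    """Opacity contribution of a projected 2D Gaussian g at pixel (px, py)."""
    dx, dy = px - g["mean"][0], py - g["mean"][1]
    a, b, c = g["conic"]  # inverse 2D covariance [[a, b], [b, c]]
    power = -0.5 * (a * dx * dx + c * dy * dy) - b * dx * dy
    if power > 0.0:
        return 0.0
    return min(0.99, g["opacity"] * math.exp(power))


def project_with_alpha_check(gaussians, pixels):
    """Projection stage: iterate over Gaussians and keep only the
    (pixel, Gaussian) pairs whose alpha passes the check."""
    pairs = {i: [] for i in range(len(pixels))}
    for g in gaussians:  # hypothetically one GPU thread per Gaussian
        for i, (px, py) in enumerate(pixels):  # only the sparsely sampled pixels
            alpha = gaussian_alpha(g, px, py)
            if alpha >= ALPHA_MIN:
                pairs[i].append((g["depth"], alpha, g["color"]))
    return pairs


def rasterize(pairs):
    """Rasterization stage: front-to-back alpha blending over pre-filtered pairs."""
    out = []
    for i in sorted(pairs):
        color, transmittance = 0.0, 1.0
        for _, alpha, c in sorted(pairs[i], key=lambda t: t[0]):
            color += transmittance * alpha * c
            transmittance *= 1.0 - alpha
        out.append(color)
    return out
```

Because the filter runs before rasterization, the blending loop never visits a Gaussian that contributes nothing to a sampled pixel, which is the property the pipeline relies on when only a small fraction of pixels is rendered.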

Our architecture is built upon MetaSapiens, a pipelined 3DGS accelerator. We augment this baseline architecture to support the backward pass. Specifically, we co-design a simplified rasterization engine (purple) that mitigates PE under-utilization in rasterization and reverse rasterization. We also propose a caching technique between these two stages that avoids cross-thread reduction in reverse rasterization. Meanwhile, we augment the projection unit (pink) to support preemptive α-checking and design a dedicated aggregation unit (yellow) to accelerate reverse rasterization.

Overview of our pipelined architecture.

Results

Accuracy

We report absolute trajectory error (ATE) and PSNR on the Replica and TUM datasets. The results support the effectiveness of our sparse sampling algorithm.

Tracking accuracy and reconstruction quality evaluation on Replica
Evaluation on Replica dataset.
Tracking accuracy and reconstruction quality evaluation on TUM
Evaluation on TUM dataset.
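
For readers unfamiliar with the two metrics reported above, the sketch below shows how they are commonly computed. It is illustrative only and not the evaluation code behind these results: the function names are hypothetical, ATE is computed as the RMSE of translational error on trajectories assumed to be already aligned and time-associated (evaluation tools usually perform that alignment first), and images are passed as flat lists of intensities in [0, 1].

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational error between two pre-aligned trajectories,
    each given as a list of (x, y, z) positions."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    if len(rendered) != len(reference) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - t) ** 2 for r, t in zip(rendered, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Lower ATE indicates better tracking, while higher PSNR indicates better reconstruction quality.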

GPU Performance

We show that our sampling algorithm, combined with the proposed rendering pipeline, achieves significant speedup and energy savings on off-the-shelf GPUs without hardware support. Orig.+S refers to applying our sparse sampling algorithm to the original rendering pipeline.

Tracking and mapping performance
The speedup and energy savings comparison on tracking and mapping
e2e performance
The end-to-end speedup and energy savings on Orin.

Hardware Performance

We compare SPLATONIC with other architectures on tracking performance. For a fair comparison, we include variants of those architectures that incorporate our sparse sampling algorithm, denoted with the “+S” suffix. All numbers are normalized to the GPU baseline.

The performance and energy consumption comparison across different dedicated architectures during tracking.
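
The normalization used in this comparison can be read as simple ratios against the GPU baseline. The helper below is a hypothetical sketch of that arithmetic; its name and the `(latency, energy)` tuple layout are assumptions, and no measured values from the paper are embedded in it.

```python
def normalize_to_baseline(baseline, candidates):
    """Return {name: (speedup, energy_savings)} relative to a baseline.

    baseline: (latency, energy) of the GPU reference.
    candidates: {name: (latency, energy)} for each architecture,
    measured in the same units as the baseline.
    """
    base_latency, base_energy = baseline
    return {
        name: (base_latency / latency, base_energy / energy)
        for name, (latency, energy) in candidates.items()
    }
```

With this convention, a value above 1 means the architecture is faster or more energy-efficient than the GPU, and the GPU itself sits at exactly 1.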

More videos

We provide additional GPU performance comparisons between conventional 3DGS-SLAM (left) and SPLATONIC (right) to show the strengths of our method in different scenes.

Comparison on office0 scene.
Comparison on room0 scene.

BibTeX

@inproceedings{huang2026splatonic,
author = "Huang, Xiaotong and Zhu, He and Ma, Tianrui and Xiong, Yuxiang and Liu, Fangxin and He, Zhezhi and Gan, Yiming and Liu, Zihan and Leng, Jingwen and Feng, Yu and Guo, Minyi",
title = "SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing",
year = "2026",
booktitle = "Proceedings of the IEEE International Symposium on High Performance Computer Architecture",
}

Acknowledgements

We are sincerely grateful to Weikai Lin for providing valuable advice and support throughout this research.