SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing

Xiaotong Huang ¹*

He Zhu ¹*

Tianrui Ma ³

Yuxiang Xiong ¹

Fangxin Liu ¹

Zhezhi He ¹

Yiming Gan ²

Zihan Liu ¹,²

Jingwen Leng ¹,²

Yu Feng ¹,²

Minyi Guo ¹,²

¹Shanghai Jiao Tong University, ²Shanghai Qi Zhi Institute, ³Institute of Computing Technology, Chinese Academy of Sciences

*Equal contribution

HPCA 2026


Abstract

3D Gaussian splatting (3DGS) has emerged as a promising direction for SLAM due to its high-fidelity reconstruction and rapid convergence. However, 3DGS-SLAM algorithms remain impractical on mobile platforms because of their high computational cost, especially in tracking.

This work introduces SPLATONIC, a sparse and efficient algorithm-hardware co-design for real-time 3DGS-SLAM on resource-constrained devices. Inspired by classical SLAM, we propose an adaptive sparse pixel sampling algorithm that reduces the number of rendered pixels by up to 256× while retaining accuracy. To unlock this performance potential on mobile GPUs, we design a novel pixel-based rendering pipeline that improves hardware utilization via Gaussian-parallel rendering and preemptive α-checking. Together, these optimizations yield up to 121.7× speedup on the bottleneck stages and 14.6× end-to-end speedup on off-the-shelf GPUs. To address the new bottlenecks that this rendering pipeline exposes in projection and aggregation, we further propose a pipelined architecture that also simplifies the overall design. Evaluated across four 3DGS-SLAM algorithms, SPLATONIC achieves up to 274.9× speedup and 4738.5× energy savings over mobile GPUs, and up to 25.2× speedup and 241.1× energy savings over state-of-the-art accelerators, all with comparable accuracy.
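The page does not describe the sampler itself. As a hedged illustration only, the sketch below picks, in the spirit of classical direct SLAM front ends, the pixel with the largest image-gradient magnitude in each 16×16 tile. One sample per tile cuts the pixel count by exactly 16 × 16 = 256×, the same factor as the maximum reduction quoted above; whether SPLATONIC reaches that bound this way is not stated. The fixed tile size, the gradient criterion, and the name `sample_sparse_pixels` are assumptions, and the adaptive part of the algorithm is not modeled.

```python
import numpy as np


def sample_sparse_pixels(gray: np.ndarray, tile: int = 16) -> np.ndarray:
    """Hypothetical gradient-based sampler (not the paper's algorithm).

    Keeps the single pixel with the largest gradient magnitude in every
    tile x tile block, i.e. one pixel out of tile**2 (256 for tile=16).
    Returns an (M, 2) array of (row, col) coordinates.
    """
    gy, gx = np.gradient(gray.astype(np.float64))  # d/drow, d/dcol
    mag = np.hypot(gx, gy)
    h, w = gray.shape
    coords = []
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            block = mag[y0:y0 + tile, x0:x0 + tile]
            # argmax picks the first maximum in row-major order
            dy, dx = np.unravel_index(np.argmax(block), block.shape)
            coords.append((y0 + dy, x0 + dx))
    return np.array(coords, dtype=np.int64)
```

For a 480×640 frame this returns 30 × 40 = 1200 coordinates, so only 1200 of 307,200 pixels would be rendered in a tracking iteration.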

Contribution

We propose a pixel-based rendering pipeline that improves GPU thread utilization under our sparse pixel sampling algorithm. Specifically, we shift from pixel-wise parallelism to Gaussian-wise parallelism in both the rasterization and reverse rasterization stages. We also introduce preemptive α-checking, an optimization that moves α-checking from rasterization to projection.
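The page does not give kernel details, so the NumPy sketch below only illustrates the two ideas under stated assumptions: the 2D Gaussians covering one sampled pixel are already projected and depth-sorted, the 1/255 α threshold and 0.99 α clamp are borrowed from the reference 3DGS rasterizer, and early termination on low transmittance is omitted. All function names are hypothetical. The pixel-wise version walks the Gaussians in a per-thread loop; the Gaussian-wise version treats each (pixel, Gaussian) pair as independent work whose only cross-lane step is an exclusive prefix product for transmittance. Because α is checked at projection time, the Gaussian-wise path never schedules work for Gaussians that would be skipped.

```python
import numpy as np

ALPHA_MIN = 1.0 / 255.0  # skip threshold used by the reference 3DGS rasterizer


def gaussian_alphas(pixel, means, inv_covs, opacities):
    """alpha_i = min(0.99, o_i * exp(-0.5 d^T Sigma_i^-1 d)) for all Gaussians."""
    d = pixel[None, :] - means                                  # (N, 2)
    power = -0.5 * np.einsum("ni,nij,nj->n", d, inv_covs, d)
    return np.minimum(0.99, opacities * np.exp(power))


def preemptive_alpha_check(pixel, means, inv_covs, opacities):
    """Done during projection: keep only Gaussians whose alpha passes the
    threshold, so rasterization never launches work for the others."""
    alpha = gaussian_alphas(pixel, means, inv_covs, opacities)
    keep = np.nonzero(alpha >= ALPHA_MIN)[0]  # sorted, so depth order is kept
    return keep, alpha[keep]


def composite_pixel_parallel(alphas, colors):
    """Pixel-wise: one thread loops over its sorted Gaussians, checking alpha inline."""
    color, T = np.zeros(colors.shape[1]), 1.0
    for a, c in zip(alphas, colors):
        if a < ALPHA_MIN:
            continue
        color += T * a * c
        T *= 1.0 - a
    return color


def composite_gaussian_parallel(alphas, colors):
    """Gaussian-wise: every (pixel, Gaussian) term is computed independently;
    T_i = prod_{k<i} (1 - alpha_k) is an exclusive prefix product."""
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]
    return (T * alphas) @ colors
```

Passing the filtered output of `preemptive_alpha_check` to `composite_gaussian_parallel` gives the same pixel color as running `composite_pixel_parallel` over all Gaussians, which is the property that lets the check move out of the rasterization loop.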

Overview of our pixel-based rendering pipeline.

Our architecture builds on MetaSapiens, a pipelined 3DGS accelerator, which we augment to support the backward pass. Specifically, we co-design a simplified rasterization engine (purple) that mitigates PE under-utilization in rasterization and reverse rasterization. We also propose a caching technique between these two stages that avoids cross-thread reduction in reverse rasterization. Meanwhile, we augment the projection unit (pink) to support preemptive α-checking and design a dedicated aggregation unit (yellow) to accelerate reverse rasterization.
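The page does not say which values are cached. One plausible reading, sketched below under that assumption, is that the forward pass stores, for each Gaussian of a pixel, the transmittance T_i it sees and the running color P_i accumulated through it. The gradient of the pixel color C with respect to α_i then becomes a purely local expression, ∂C/∂α_i = T_i c_i − (C − P_i)/(1 − α_i), because the suffix sum over later Gaussians equals C − P_i. Each Gaussian's thread can therefore compute it without a prefix or suffix reduction across the other threads of that pixel. The function names are hypothetical and the background color is ignored.

```python
import numpy as np


def forward_with_cache(alphas, colors):
    """Forward alpha compositing for one pixel (Gaussians depth-sorted) that
    also returns the per-Gaussian values a backward pass could reuse."""
    T = np.cumprod(np.concatenate(([1.0], 1.0 - alphas)))[:-1]  # T_i, exclusive
    contrib = (T * alphas)[:, None] * colors                     # (N, C)
    P = np.cumsum(contrib, axis=0)                               # inclusive running color
    C = P[-1] if len(alphas) else np.zeros(colors.shape[1])
    return C, T, P


def backward_from_cache(dL_dC, alphas, colors, C, T, P):
    """Per-Gaussian gradients from cached T_i and P_i only:
        dC/dalpha_i = T_i * c_i - (C - P_i) / (1 - alpha_i)
        dC/dc_i     = T_i * alpha_i
    Summing these per-pixel gradients into each Gaussian's parameters is a
    separate aggregation step that this sketch does not model."""
    dC_dalpha = T[:, None] * colors - (C[None, :] - P) / (1.0 - alphas)[:, None]
    dL_dalpha = dC_dalpha @ dL_dC                   # (N,)
    dL_dcolor = (T * alphas)[:, None] * dL_dC[None, :]  # (N, C)
    return dL_dalpha, dL_dcolor
```

A central finite difference on α_i reproduces `dL_dalpha`, which is a quick way to check the cached-value formula.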

Results

Accuracy

We report absolute trajectory error (ATE) and PSNR on the Replica and TUM datasets. The results support the effectiveness of our sparse sampling algorithm.
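For reference, a minimal sketch of the two metrics follows. It assumes the estimated camera positions are already aligned to ground truth (benchmark tools typically do this with a rigid least-squares fit, omitted here) and that images are float arrays in [0, 1]. The function names are hypothetical and not taken from the paper's evaluation code.

```python
import numpy as np


def ate_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """RMSE of per-frame camera position error, (N, 3) arrays.
    Assumes est_xyz has already been aligned to gt_xyz."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))


def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_val^2 / MSE)."""
    mse = np.mean((rendered - reference) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Lower ATE and higher PSNR are better; a uniform per-pixel error of 0.1, for example, corresponds to 20 dB.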

Tracking accuracy and reconstruction quality on the TUM dataset.

GPU Performance

We show that our sampling algorithm, combined with the proposed rendering pipeline, achieves significant speedups and energy savings on off-the-shelf GPUs without any hardware support. Orig.+S denotes our sparse sampling algorithm applied to the original rendering pipeline.

Speedup and energy savings on tracking and mapping.

End-to-end speedup and energy savings on Orin.

Hardware Performance

We compare SPLATONIC with other architectures on tracking performance. For a fair comparison, we also include variants that incorporate our sparse sampling algorithm, denoted by the “+S” suffix. All numbers are normalized to the GPU.

BibTeX

@inproceedings{huang2026splatonic,
  author = "Huang, Xiaotong and Zhu, He and Ma, Tianrui and Xiong, Yuxiang and Liu, Fangxin and He, Zhezhi and Gan, Yiming and Liu, Zihan and Leng, Jingwen and Feng, Yu and Guo, Minyi",
  title = "SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing",
  year = "2026",
  booktitle = "Proceedings of the IEEE International Symposium on High Performance Computer Architecture",
}

Acknowledgements

We are sincerely grateful to Weikai Lin for providing valuable advice and support throughout this research.