GitPedia

LIO SAM GPU ScanToMapOpt

A CUDA reimplementation of the line/plane odometry of LIO-SAM. A point cloud hash map on the GPU (inspired by iVox of Faster-LIO) is used to accelerate 5-neighbour KNN search. Tested on a Jetson Orin NX 8GB.

From qdLMF · Updated June 11, 2026

This repository reimplements the line/plane odometry (based on LOAM) of LIO-SAM with CUDA. In place of pcl's kdtree, a point cloud hash map on the GPU (inspired by iVox of [Faster-LIO](https://github.com/gaoxiang12/faster-lio)) is used to accelerate local map building, 5-neighbour KNN search and non-linear optimization. The project is written primarily in CUDA, is distributed under the BSD 3-Clause "New" or "Revised" License, and was first published in 2023. Key topics include: 3d-mapping, cuda, faster-lio, gpu, ivox.

LIO-SAM-GPU-ScanToMapOpt


The modifications are as follows:

  • The CUDA code for the line/plane odometry is in src/cuda_plane_line_odometry.
  • To use this CUDA odometry, scan2MapOptimization() in mapOptimization.cpp is replaced with scan2MapOptimizationWithCUDA().

About

This repository reimplements the line/plane odometry in scan2MapOptimization() of mapOptimization.cpp with CUDA.

On my machine (Orin-NX-8GB, walking_dataset.bag, with OpenMP), the original CPU version performs as follows:

  • average cost of extracting surrounding key frames is about 34.65ms
  • average cost of building local map is about 20.03ms
  • average cost of KNN search and optimization is about 30ms
  • average cost of all operations in one frame is about 84.95ms

This repository replaces pcl's kdtree with a point cloud hash map (inspired by iVox of Faster-LIO) implemented with CUDA.
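To make the idea of a voxel hash map concrete, here is a minimal CPU-side sketch in C++. It is not the repository's CUDA implementation: the names `VoxelHashMap`, `VoxelKey`, `VoxelKeyHash` and `Knn`, the prime-based hash and the 3x3x3 search window are all assumptions chosen for illustration. Points are bucketed by integer voxel coordinates, and a query only inspects the voxels around it instead of traversing a tree.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point { float x, y, z; };

// Integer coordinates of the voxel that contains a point (hypothetical type).
struct VoxelKey {
    int x, y, z;
    bool operator==(const VoxelKey& o) const { return x == o.x && y == o.y && z == o.z; }
};

// Spatial hash that mixes the three coordinates with large primes (assumed choice).
struct VoxelKeyHash {
    std::size_t operator()(const VoxelKey& k) const {
        const std::uint64_t hx = static_cast<std::uint32_t>(k.x) * 73856093ull;
        const std::uint64_t hy = static_cast<std::uint32_t>(k.y) * 19349663ull;
        const std::uint64_t hz = static_cast<std::uint32_t>(k.z) * 83492791ull;
        return static_cast<std::size_t>(hx ^ hy ^ hz);
    }
};

class VoxelHashMap {
public:
    explicit VoxelHashMap(float voxel_size) : inv_size_(1.0f / voxel_size) {
        assert(voxel_size > 0.0f);
    }

    // Incremental update: append the point to the bucket of its voxel.
    void Insert(const Point& p) { cells_[KeyOf(p)].push_back(p); }

    // Returns up to k nearest points found in the 3x3x3 voxel block around q,
    // sorted by increasing distance. Points outside that block are never seen.
    std::vector<Point> Knn(const Point& q, std::size_t k) const {
        using Candidate = std::pair<float, Point>;  // (squared distance, point)
        std::vector<Candidate> candidates;
        const VoxelKey c = KeyOf(q);
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz) {
                    const auto it = cells_.find(VoxelKey{c.x + dx, c.y + dy, c.z + dz});
                    if (it == cells_.end()) continue;
                    for (const Point& p : it->second) {
                        const float ex = p.x - q.x, ey = p.y - q.y, ez = p.z - q.z;
                        candidates.emplace_back(ex * ex + ey * ey + ez * ez, p);
                    }
                }
        const std::size_t n = std::min(k, candidates.size());
        std::partial_sort(candidates.begin(),
                          candidates.begin() + static_cast<std::ptrdiff_t>(n),
                          candidates.end(),
                          [](const Candidate& a, const Candidate& b) { return a.first < b.first; });
        std::vector<Point> result;
        result.reserve(n);
        for (std::size_t i = 0; i < n; ++i) result.push_back(candidates[i].second);
        return result;
    }

private:
    VoxelKey KeyOf(const Point& p) const {
        return VoxelKey{static_cast<int>(std::floor(p.x * inv_size_)),
                        static_cast<int>(std::floor(p.y * inv_size_)),
                        static_cast<int>(std::floor(p.z * inv_size_))};
    }

    float inv_size_;
    std::unordered_map<VoxelKey, std::vector<Point>, VoxelKeyHash> cells_;
};
```

With `k = 5` this corresponds to the 5-neighbour search mentioned above. Because only neighbouring voxels are visited, a result is guaranteed to be exact only for neighbours closer than one voxel size, which is the usual trade-off of this kind of structure.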

Meanwhile, other parts of the line/plane odometry (jacobians & residuals etc) are also implemented with CUDA.
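As an illustration of what a plane residual looks like in LOAM-style odometry, the following C++ sketch fits a plane to five neighbours and evaluates the point-to-plane distance. It is a CPU-side sketch under stated assumptions, not the repository's CUDA kernels: `FitPlane`, `PointToPlaneResidual`, `Plane` and `Mat3` are hypothetical names, the least-squares problem is solved through 3x3 normal equations with Cramer's rule rather than a QR factorization, and the 0.2 m validity threshold is an assumed default.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Plane a*x + b*y + c*z + d = 0 with unit normal (a, b, c).
struct Plane { double a, b, c, d; bool valid; };

using Mat3 = std::array<std::array<double, 3>, 3>;

double Det3(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1]) -
           m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0]) +
           m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Signed distance from p to the plane (the residual used by the optimizer).
double PointToPlaneResidual(const Plane& pl, const Vec3& p) {
    return pl.a * p.x + pl.b * p.y + pl.c * p.z + pl.d;
}

// Fits n.p + 1 = 0 to five neighbours in the least-squares sense, then
// normalizes. Planes through the origin cannot be represented in this form.
Plane FitPlane(const std::array<Vec3, 5>& nbrs, double max_dist = 0.2) {
    assert(max_dist > 0.0);
    Mat3 M{};                    // A^T A, where the rows of A are the neighbours
    std::array<double, 3> r{};   // A^T b, with b = (-1, ..., -1)
    for (const Vec3& p : nbrs) {
        const double v[3] = {p.x, p.y, p.z};
        for (int i = 0; i < 3; ++i) {
            r[i] -= v[i];
            for (int j = 0; j < 3; ++j) M[i][j] += v[i] * v[j];
        }
    }
    const double det = Det3(M);
    if (std::fabs(det) < 1e-12) return Plane{0.0, 0.0, 0.0, 0.0, false};

    double n[3];
    for (int col = 0; col < 3; ++col) {  // Cramer's rule, one column at a time
        Mat3 Mc = M;
        for (int row = 0; row < 3; ++row) Mc[row][col] = r[row];
        n[col] = Det3(Mc) / det;
    }
    const double norm = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    if (norm < 1e-12) return Plane{0.0, 0.0, 0.0, 0.0, false};

    Plane pl{n[0] / norm, n[1] / norm, n[2] / norm, 1.0 / norm, true};
    // Reject the fit if any neighbour lies too far from the plane.
    for (const Vec3& p : nbrs)
        if (std::fabs(PointToPlaneResidual(pl, p)) > max_dist) pl.valid = false;
    return pl;
}
```

In a scan-to-map step, each surface feature point would be paired with its five nearest map points, and only valid fits would contribute a residual and a Jacobian row to the optimization.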

On my machine (Orin-NX-8GB, walking_dataset.bag), the GPU version implemented by this project performs as follows:

  • average cost of extracting surrounding key frames is down to about 2.74ms
  • average cost of incrementally updating local map is down to about 1.16ms
  • average cost of one 5-neighbour KNN search is down to about 1.40ms
  • average cost of all operations in one frame is down to about 21.56ms

Dependencies

The essential dependencies are the same as those of LIO-SAM.

My Orin-NX-8GB's specific environment:

  • Ubuntu 20.04, ROS Noetic, JetPack 5.10
  • C++14
  • CUDA 11.4
  • Eigen 3.3.7

How To Build

Before building this repo, some CMake variables in src/cuda_plane_line_odometry/CMakeLists.txt need to be modified to fit your environment:

set(CMAKE_CUDA_COMPILER /usr/local/cuda/bin/nvcc)   # change it to your path to nvcc
set(CUDA_TOOLKIT_ROOT_DIR /usr/local/cuda)          # change it to your CUDA toolkit root directory
set(CMAKE_CUDA_ARCHITECTURES 87)                    # your device's compute capability without the dot, e.g. 62 for 6.2;
                                                    # on my Orin-NX-8GB it is 87
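The mapping from compute capability to the architecture value can be written as a tiny C++ helper. This is only a sketch: `CudaArchFromCapability` is a hypothetical name, and the major and minor numbers are assumed to come from your device's documentation or from the `major` and `minor` fields of `cudaDeviceProp`.

```cpp
#include <cassert>
#include <string>

// Converts a compute capability such as 8.7 into the string "87",
// i.e. the value to place in CMAKE_CUDA_ARCHITECTURES.
std::string CudaArchFromCapability(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    return std::to_string(major * 10 + minor);
}
```

For example, `CudaArchFromCapability(6, 2)` yields `"62"` and `CudaArchFromCapability(8, 7)` yields `"87"`, matching the two cases mentioned in the comments above.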

The basic steps to compile and run this repo are the same as for LIO-SAM.

Speed-up

<table style="text-align:center;">
  <tr>
    <th rowspan="2">Sequence</th>
    <th colspan="3">Orin-NX-8GB CPU</th>
    <th colspan="5">Orin-NX-8GB GPU</th>
  </tr>
  <tr>
    <th>extract<br>surrounding<br>key frames</th>
    <th>build<br>kdtree</th>
    <th>one<br>frame</th>
    <th>extract<br>surrounding<br>key frames</th>
    <th>incrementally<br>update<br>hashmap</th>
    <th>one<br>KNN</th>
    <th>one<br>frame</th>
    <th>speed-up</th>
  </tr>
  <tr>
    <td><a href="https://drive.google.com/drive/folders/1gJHwfdHCRdjP7vuT556pv8atqrCJPbUq?usp=sharing">Walking</a></td>
    <td>34.65ms</td><td>20.03ms</td><td>84.95ms</td>
    <td>2.74ms</td><td>1.16ms</td><td>1.40ms</td><td>21.56ms</td><td>3.94x</td>
  </tr>
  <tr>
    <td><a href="https://drive.google.com/drive/folders/1gJHwfdHCRdjP7vuT556pv8atqrCJPbUq?usp=sharing">Campus (large)</a></td>
    <td>25.21ms</td><td>19.34ms</td><td>84.75ms</td>
    <td>1.71ms</td><td>1.13ms</td><td>1.49ms</td><td>23.58ms</td><td>3.59x</td>
  </tr>
  <tr>
    <td><a href="https://drive.google.com/drive/folders/1gJHwfdHCRdjP7vuT556pv8atqrCJPbUq?usp=sharing">2011_09_30_drive_0028</a></td>
    <td>68.17ms</td><td>22.04ms</td><td>166.67ms</td>
    <td>11.70ms</td><td>3.97ms</td><td>2.59ms</td><td>54.06ms</td><td>3.08x</td>
  </tr>
</table>
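The speed-up column is consistent with dividing the CPU one-frame time by the GPU one-frame time. A minimal C++ sketch of that arithmetic follows; `SpeedUp` is a hypothetical helper name, and reading the column as this ratio is an inference from the table's numbers rather than something the source states.

```cpp
#include <cassert>

// Ratio of the CPU per-frame cost to the GPU per-frame cost, both in ms.
double SpeedUp(double cpu_frame_ms, double gpu_frame_ms) {
    assert(gpu_frame_ms > 0.0);
    return cpu_frame_ms / gpu_frame_ms;
}
```

For the Walking sequence, `SpeedUp(84.95, 21.56)` is approximately 3.94, which agrees with the table once rounded to two decimals.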

Acknowledgements

This repository is a modified version of LIO-SAM, whose line/plane odometry is originally based upon LOAM.

The point cloud hash map on the GPU is inspired by the iVox data structure of Faster-LIO, and draws on kdtree_cuda_builder.h of FLANN.


This article is auto-generated from qdLMF/LIO-SAM-GPU-ScanToMapOpt via the GitHub API. Last fetched: 6/14/2026