Incorporating Learnt Local and Global Embeddings into Monocular Visual SLAM
                    
                    
                      Huaiyang Huang, Haoyang Ye, Yuxiang Sun, Lujia Wang, Ming Liu
                    
                    
                      Autonomous Robots (AURO), Springer, 2021
                    
                    
                    
                      Traditional approaches for Visual Simultaneous Localization and Mapping (VSLAM) rely on
                        low-level vision information for state estimation, such as handcrafted local features or the
                        image gradient. While significant progress has been made along this track, the performance of
                        state-of-the-art systems generally degrades under more challenging configurations for
                        monocular VSLAM, e.g., varying illumination. As a consequence, the robustness and accuracy of
                        monocular VSLAM remain open concerns. This paper presents a monocular VSLAM system that
                        fully exploits learnt features for better state estimation. The proposed system leverages both
                        learnt local features and global embeddings at different modules of the system: direct camera
                        pose estimation, inter-frame feature association, and loop closure detection. With a
                        probabilistic explanation of keypoint prediction, we formulate the camera pose tracking in a
                        direct manner and parameterize local features with uncertainty taken into account. To alleviate
                        the quantization effect, we adapt the mapping module to generate better 3D landmarks and so
                        guarantee the system's robustness. Detecting temporal loop closure via deep global embeddings
                        further improves the robustness and accuracy of the proposed system. The proposed system is
                        extensively evaluated on public datasets (Tsukuba, EuRoC, and KITTI) and compared against
                        state-of-the-art methods. The competitive performance of camera pose estimation confirms the
                        effectiveness of our method.
                     
                    
                      
                      
@article{huang2021incorporating,
  
  title={Incorporating learnt local and global embeddings into monocular visual {SLAM}},
  author={Huang, Huaiyang and Ye, Haoyang and Sun, Yuxiang and Wang, Lujia and Liu, Ming},
  journal={Autonomous Robots},
  volume={45},
  number={6},
  pages={789--803},
  year={2021},
  publisher={Springer}
  
}