Abstract

Incorporating prior structural information into visual state estimation can generally improve localization performance. In this letter, we aim to address the trade-off between accuracy and efficiency in coupling visual factors with structure constraints. To this end, we present a cross-modality method that tracks a camera in a prior map modelled by a Gaussian Mixture Model (GMM). Given an initial pose estimate from the front-end, local visual observations are efficiently associated with map components, and the visual structure from triangulation is refined simultaneously. By introducing hybrid structure factors into the joint optimization, the camera poses are bundle-adjusted together with the local visual structure. Evaluating our complete system, GMMLoc, on a public dataset, we show that it achieves centimeter-level localization accuracy with only negligible computational overhead. In addition, comparative studies with state-of-the-art vision-dominant state estimators demonstrate the competitive performance of our method.
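
To make the association step concrete, below is a minimal Python sketch of how a triangulated landmark might be gated against 3D Gaussian map components with a Mahalanobis-distance test, and how a point-to-component residual of the kind that could enter the joint optimization might look. All class and function names, and the chi-square threshold, are illustrative assumptions for this sketch, not the GMMLoc implementation.

import numpy as np

class GaussianComponent:
    """One 3D component of the prior GMM map (hypothetical structure)."""
    def __init__(self, mean, cov):
        self.mean = np.asarray(mean, dtype=float)   # (3,)
        self.cov = np.asarray(cov, dtype=float)     # (3, 3)
        self.cov_inv = np.linalg.inv(self.cov)

def mahalanobis_sq(point, comp):
    """Squared Mahalanobis distance of a 3D point to a component."""
    d = point - comp.mean
    return float(d @ comp.cov_inv @ d)

def associate(point, components, chi2_gate=7.81):
    """Pick the nearest component; accept only if it passes the gate
    (7.81 is roughly the 95% chi-square quantile for 3 DoF)."""
    best = min(components, key=lambda c: mahalanobis_sq(point, c))
    return best if mahalanobis_sq(point, best) < chi2_gate else None

def structure_residual(point, comp):
    """Scalar point-to-component residual along the direction of
    smallest variance (a near-planar component acts as a local plane)."""
    eigvals, eigvecs = np.linalg.eigh(comp.cov)  # eigenvalues ascending
    normal = eigvecs[:, 0]                       # smallest-variance axis
    return float(normal @ (point - comp.mean))

# Example: gate one landmark against a flat (near-planar) component.
comps = [GaussianComponent([0.0, 0.0, 1.0], np.diag([0.5, 0.5, 1e-3]))]
landmark = np.array([0.1, -0.2, 1.02])
c = associate(landmark, comps)
if c is not None:
    r = structure_residual(landmark, c)  # would feed the back-end

In a factor-graph back-end, a residual of this form would be stacked alongside the usual reprojection errors, so that bundle adjustment refines the camera poses and local visual structure subject to the prior-map constraint.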
@article{huang2020gmmloc,
title={GMMLoc: Structure Consistent Visual Localization with Gaussian Mixture Models},
author={Huang, Huaiyang and Ye, Haoyang and Sun, Yuxiang and Liu, Ming},
journal={IEEE Robotics and Automation Letters},
volume={5},
number={4},
pages={5043--5050},
year={2020},
publisher={IEEE}
}