Visual localization is a key technology for applications such as augmented, mixed, and virtual reality, as well as robotics and autonomous driving. It addresses the problem of estimating the 6-degree-of-freedom (DoF) camera pose from which a given image or image sequence was captured, relative to a reference scene representation, often a set of images with known poses. Although much research has been devoted to this area in recent years, large appearance variations caused by season, weather, illumination, and man-made changes, as well as large-scale environments, remain challenging for visual localization systems. To mitigate the effects of appearance changes, traditional hand-crafted local feature descriptors such as SIFT (Lowe, 2004) or SURF (Bay et al., 2008) are increasingly being replaced by learned descriptors such as SuperPoint (DeTone et al., 2018), R2D2 (Revaud et al., 2019), ASLFeat (Luo et al., 2020), DISK (Tyszkiewicz et al., 2020), or ALIKE (Zhao et al., 2022). To cope with large environments, hierarchical approaches that combine image retrieval with structure-based localization (Sarlin et al., 2019) have been developed, both to keep the required computational resources low and to ensure the uniqueness of the local features by restricting matching to a small set of retrieved reference images.
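To make the hierarchical scheme concrete, the following is a minimal sketch of such a pipeline, not the method of any particular paper: a query is first ranked against the database using global image descriptors, and the 6-DoF pose is then estimated by matching local descriptors against a retrieved reference image whose keypoints have known 3D coordinates (e.g. from a structure-from-motion model) and solving PnP with RANSAC. The function names, the data layout of the database entries, and the thresholds are illustrative assumptions.

```python
import numpy as np
import cv2


def retrieve_candidates(query_global, db_globals, top_k=5):
    """Rank database images by cosine similarity of global descriptors
    (e.g. NetVLAD-style vectors) and return the indices of the top-k."""
    sims = db_globals @ query_global / (
        np.linalg.norm(db_globals, axis=1) * np.linalg.norm(query_global) + 1e-8)
    return np.argsort(-sims)[:top_k]


def localize_against_reference(query_kpts, query_descs, db_entry, K):
    """Match local descriptors against one retrieved reference image whose
    keypoints are associated with 3D points, then estimate the 6-DoF pose
    with PnP + RANSAC. Returns (rvec, tvec) or None on failure.

    db_entry is assumed to hold:
      "descriptors": (M, D) float32 local descriptors of the reference image
      "points3d":    (M, 3) float32 3D coordinates of those keypoints
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(query_descs, db_entry["descriptors"])
    if len(matches) < 6:  # not enough 2D-3D correspondences for PnP
        return None
    pts2d = np.float32([query_kpts[m.queryIdx] for m in matches])
    pts3d = np.float32([db_entry["points3d"][m.trainIdx] for m in matches])
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d, pts2d, K, None, reprojectionError=8.0)
    return (rvec, tvec) if ok else None
```

In a full system the brute-force matcher would typically be replaced by a learned matcher and the retrieved candidates fused before pose estimation, but the two-stage structure, coarse retrieval followed by local 2D-3D matching and PnP, is the same.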