融合NetVLAD和全连接层的三元神经网络交叉视角场景图像定位
Cross-view scene image localization with Triplet Network integrating NetVLAD and Fully Connected Layers
2021, Vol. 25, No. 5: 1095-1107
Print publication date: 2021-05-07
DOI: 10.11834/jrs.20210188
Xue Z H,Zhou Y Y,Qiang Y G,Liu Y F and Lin H. 2021. Cross-view scene image localization with Triplet Network integrating NetVLAD and Fully Connected Layers. National Remote Sensing Bulletin, 25(5):1095-1107
Geolocating scene images is of great importance in outdoor positioning, target search, military reconnaissance, and related fields. To address cross-view scene image matching and localization between street-view and bird's-eye images, this paper proposes a localization method (Tri-NetVLAD) based on a Triplet Network that integrates the trainable NetVLAD (Net Vector of Locally Aggregated Descriptors) layer with fully connected layers. The triplet network consists of three convolutional neural networks (CNNs) that process three images simultaneously; by enlarging the distance between unmatched image pairs and reducing the distance between matched pairs, it performs image retrieval and matching. Fusing NetVLAD with fully connected layers strengthens the correlation among features: the local convolutional features extracted by the CNN are passed through the NetVLAD layer and the fully connected layer to obtain a global descriptor and a feature vector, respectively, and the two are fused, which effectively enhances the correlation among local features while preserving the differences between them, improving the localization accuracy of the model. The DBL loss (distance-based logistic loss) is also improved by introducing a parameter λ that strengthens the function's ability to discriminate hard samples, which accelerates convergence and improves stability as well as localization accuracy. Experiments on the public US Vo and Hays dataset show that Tri-NetVLAD achieves higher localization accuracy than existing methods such as MCVPlaces, Triplet eDBL-Net, and CVM-Net, exceeding 63% on the test set.
Cross-view scene image matching and localization have a wide range of applications in target search, crime fighting, and outdoor positioning. With the development of deep learning, neural networks have come to play an important role in this problem. For cross-view matching between street-view and bird's-eye images, however, existing neural network models converge slowly and capture only weak correlations among features. This paper proposes a triplet network model (Tri-NetVLAD) that combines NetVLAD with a fully connected layer, together with an improved DBL loss (ADBL loss). The proposed method improves not only the convergence speed and stability of the network but also the overall localization accuracy of the model.
The proposed Tri-NetVLAD model extracts local features from the three input images through a triplet network and feeds them to the fully connected and NetVLAD layers to obtain a feature vector and a global feature descriptor, respectively. The global descriptor captures the relative distribution among features; fusing the feature vector into it preserves the differences between features and improves the localization accuracy of the model. The ADBL loss introduces a parameter λ that improves the model's ability to discriminate hard samples and further raises localization accuracy.
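The two-branch fusion described above can be sketched in NumPy as follows. This is a minimal illustration under assumed shapes and parameter names (cluster count, FC width, and the mean-pooling feeding the FC branch are assumptions, not the authors' implementation): a NetVLAD-style soft-assignment aggregation produces the global descriptor, an FC projection produces the feature vector, and the two are concatenated.

```python
import numpy as np

def netvlad(local_feats, centers, w, b):
    """Soft-assign local descriptors to K clusters and aggregate residuals.

    local_feats: (N, D) local CNN features; centers: (K, D) cluster centers;
    w: (K, D), b: (K,) soft-assignment parameters. Returns a (K*D,) descriptor.
    """
    logits = local_feats @ w.T + b                     # (N, K) assignment scores
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                  # softmax soft-assignment
    resid = local_feats[:, None, :] - centers[None, :, :]  # (N, K, D) residuals
    V = (a[:, :, None] * resid).sum(axis=0)                # (K, D) aggregated
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12  # intra-normalization
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)             # final L2 normalization

def fuse(local_feats, centers, w, b, W_fc, b_fc):
    """Concatenate the NetVLAD global descriptor (feature distribution)
    with an FC-projected feature vector (feature differences)."""
    g = netvlad(local_feats, centers, w, b)
    fc = np.tanh(local_feats.mean(axis=0) @ W_fc + b_fc)  # (F,) FC branch
    return np.concatenate([g, fc])
```

In a triplet setup, the same `fuse` network (shared weights) would embed the anchor, positive, and negative images, and distances would be computed between the fused vectors.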
The proposed Tri-NetVLAD is compared with several existing methods, namely MCVPlaces, Triplet eDBL-Net, and CVM-Net, and with several loss functions, namely contrastive loss, triplet loss, and DBL loss. On the US Vo and Hays dataset, it achieves the highest localization accuracy of 63.5%, showing that a triplet network combining the NetVLAD and fully connected layers, trained with the ADBL loss, effectively improves localization accuracy.
Compared with existing methods, the proposed Tri-NetVLAD has the following advantages. (1) The triplet network increases the Euclidean distance between unmatched images while reducing the Euclidean distance between matched images. (2) NetVLAD aggregates the local features extracted by the CNN into a global feature descriptor that captures the distribution relationship among features. (3) Fusing in the fully connected layer adds its feature vector to the global descriptor, so that the final representation not only captures the distribution relationship among features but also retains the differences between them. (4) The improved loss function, ADBL loss, accelerates gradient convergence and improves overall localization accuracy.
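As a concrete illustration of point (4), a DBL-style loss over the positive/negative distance gap, with a sharpening parameter λ, might be written as follows. The exact ADBL formulation is not reproduced here; this sketch only shows the role the abstract attributes to λ — a larger λ penalizes hard triplets (where the positive is not clearly closer than the negative) more strongly.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between corresponding rows of a and b."""
    return np.linalg.norm(a - b, axis=1)

def adbl_loss(anchor, positive, negative, lam=2.0):
    """Distance-based logistic loss on the gap d(a, p) - d(a, n).

    lam (λ) is a hypothetical sharpening parameter: it scales the gap
    inside the logistic term, so hard samples (gap near or above zero)
    are penalized more strongly as lam grows, while easy samples
    (large negative gap) contribute even less.
    """
    gap = pairwise_dist(anchor, positive) - pairwise_dist(anchor, negative)
    return np.log1p(np.exp(lam * gap)).mean()
```

Minimizing this loss pushes matched pairs together and unmatched pairs apart, which is the retrieval objective the triplet network optimizes.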
Keywords: cross-view scene image matching and geolocation, Triplet Network, NetVLAD, CNN (Convolutional Neural Networks)
Altwaijry H, Trulls E, Hays J, Fua P and Belongie S. 2016. Learning to match aerial images with deep attentive architectures//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE: 3539-3547 [DOI: 10.1109/CVPR.2016.385]
Arandjelović R, Gronat P, Torii A, Pajdla T and Sivic J. 2018. NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6): 1437-1451 [DOI: 10.1109/TPAMI.2017.2711011]
Chen W H, Chen X T, Zhang J G and Huang K Q. 2017. Beyond triplet loss: a deep quadruplet network for person re-identification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE: 403-412 [DOI: 10.1109/CVPR.2017.145]
Hadsell R, Chopra S and LeCun Y. 2006. Dimensionality reduction by learning an invariant mapping//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, NY, USA: IEEE: 1735-1742 [DOI: 10.1109/CVPR.2006.100]
Hammoud R I, Kuzdeba S A, Berard B, Tom V, Ivey R, Bostwick R, HandUber J, Vinciguerra L, Shnidman N and Smiley B. 2013. Overhead-based image and video geo-localization framework//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Portland, OR, USA: IEEE: 320-327 [DOI: 10.1109/CVPRW.2013.55]
Hu S X, Feng M D, Nguyen R M H and Lee G H. 2018. CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 7258-7267 [DOI: 10.1109/CVPR.2018.0075]
Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P and Schmid C. 2012. Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9): 1704-1716 [DOI: 10.1109/TPAMI.2011.235]
Kingma D P and Ba J. 2014. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: ACM: 1097-1105
Lin T Y, Belongie S and Hays J. 2013. Cross-view image geolocalization//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE: 891-898 [DOI: 10.1109/CVPR.2013.120]
Lin T Y, Cui Y, Belongie S and Hays J. 2015. Learning deep representations for ground-to-aerial geolocalization//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE: 5007-5015 [DOI: 10.1109/CVPR.2015.7299135]
Schindler G, Brown M and Szeliski R. 2007. City-scale location recognition//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, MN, USA: IEEE: 1-7 [DOI: 10.1109/CVPR.2007.383150]
Tian Y C, Chen C and Shah M. 2017. Cross-view image matching for geo-localization in urban environments//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE: 1998-2006 [DOI: 10.1109/CVPR.2017.216]
Vo N N and Hays J. 2016. Localizing and orienting street views using overhead imagery//Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer: 494-509 [DOI: 10.1007/978-3-319-46448-0_30]
Workman S, Souvenir R and Jacobs N. 2015. Wide-area image geolocalization with aerial reference imagery//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 3961-3969 [DOI: 10.1109/ICCV.2015.451]
Zhang H Q, Liu X Y, Yang S and Li Y. 2017. Retrieval of remote sensing images based on semisupervised deep learning. Journal of Remote Sensing, 21(3): 406-414 (in Chinese) [DOI: 10.11834/jrs.20176105]
Zhang L and Liao M S. 2006. A context aware fuzzy clustering method for remote sensing images. Journal of Remote Sensing, 2006(1): 58-65 (in Chinese) [DOI: 10.11834/jrs.20060109]
Zhao L J and Tang P. 2016. Scalability analysis of typical remote sensing data classification methods: a case of remote sensing image scene. Journal of Remote Sensing, 20(2): 157-171 (in Chinese) [DOI: 10.11834/jrs.20164279]