融合NetVLAD和全连接层的三元神经网络交叉视角场景图像定位
Cross-view scene image localization with Triplet Network integrating NetVLAD and Fully Connected Layers
2021, Vol. 25, No. 5: 1095-1107
Print publication date: 2021-05-07
DOI: 10.11834/jrs.20210188
Xue Z H,Zhou Y Y,Qiang Y G,Liu Y F and Lin H. 2021. Cross-view scene image localization with Triplet Network integrating NetVLAD and Fully Connected Layers. National Remote Sensing Bulletin, 25(5):1095-1107
Geolocating scene images is of great importance in outdoor positioning, target search, military reconnaissance, and related fields. To address cross-view scene image matching and localization between street-view and bird's-eye images, this paper proposes a localization method (Tri-NetVLAD) based on a Triplet Network that integrates the trainable NetVLAD (Net Vector of Locally Aggregated Descriptors) layer with fully connected layers. The triplet network consists of three convolutional neural networks (CNNs) that process three images simultaneously; by enlarging the distance between unmatched image pairs and reducing the distance between matched pairs, it performs image retrieval and matching. Fusing NetVLAD with fully connected layers strengthens the correlation among features: the local convolutional features extracted by the CNN are passed through the NetVLAD layer and the fully connected layer to obtain a global descriptor and a feature vector, respectively, and the two are fused, which effectively enhances the correlation among local features while preserving the differences between them, improving the localization accuracy of the model. The DBL loss (distance-based logistic loss) is also improved by introducing a parameter λ that strengthens the function's ability to discriminate hard samples, which accelerates convergence and improves stability as well as localization accuracy. Experiments on the public US Vo and Hays dataset show that Tri-NetVLAD achieves higher localization accuracy than existing methods such as MCVPlaces, Triplet eDBL-Net, and CVM-Net, exceeding 63% on the test set.
Cross-view scene image matching and localization have a wide range of applications in target search, crime fighting, and outdoor positioning. With the development of deep learning, neural networks have come to play an important role in this problem. For cross-view matching between street-view and bird's-eye images, however, existing neural network models converge slowly and capture only weak correlations among features. This paper proposes a triplet network model (Tri-NetVLAD) that combines NetVLAD with a fully connected layer, together with an improved DBL loss (ADBL loss). The proposed method improves not only the convergence speed and stability of the network but also the overall localization accuracy of the model.
The proposed Tri-NetVLAD model extracts local features from the three input images through a triplet network and feeds them to the fully connected and NetVLAD layers to obtain a feature vector and a global feature descriptor, respectively. The global descriptor captures the relative distribution among features; fusing the feature vector into it preserves the differences between features and improves the localization accuracy of the model. The ADBL loss introduces a parameter λ that improves the model's ability to discriminate hard samples and further raises localization accuracy.
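The two-branch fusion described above can be sketched in NumPy as follows. This is a minimal illustration under assumed shapes and parameter names (cluster count, FC width, and the mean-pooling feeding the FC branch are assumptions, not the authors' implementation): a NetVLAD-style soft-assignment aggregation produces the global descriptor, an FC projection produces the feature vector, and the two are concatenated.

```python
import numpy as np

def netvlad(local_feats, centers, w, b):
    """Soft-assign local descriptors to K clusters and aggregate residuals.

    local_feats: (N, D) local CNN features; centers: (K, D) cluster centers;
    w: (K, D), b: (K,) soft-assignment parameters. Returns a (K*D,) descriptor.
    """
    logits = local_feats @ w.T + b                     # (N, K) assignment scores
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                  # softmax soft-assignment
    resid = local_feats[:, None, :] - centers[None, :, :]  # (N, K, D) residuals
    V = (a[:, :, None] * resid).sum(axis=0)                # (K, D) aggregated
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12  # intra-normalization
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)             # final L2 normalization

def fuse(local_feats, centers, w, b, W_fc, b_fc):
    """Concatenate the NetVLAD global descriptor (feature distribution)
    with an FC-projected feature vector (feature differences)."""
    g = netvlad(local_feats, centers, w, b)
    fc = np.tanh(local_feats.mean(axis=0) @ W_fc + b_fc)  # (F,) FC branch
    return np.concatenate([g, fc])
```

In a triplet setup, the same `fuse` network (shared weights) would embed the anchor, positive, and negative images, and distances would be computed between the fused vectors.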
The proposed Tri-NetVLAD is compared with several existing methods, namely MCVPlaces, Triplet eDBL-Net, and CVM-Net, and with several loss functions, namely contrastive loss, triplet loss, and DBL loss. On the US Vo and Hays dataset, it achieves the highest localization accuracy of 63.5%, showing that a triplet network combining the NetVLAD and fully connected layers, trained with the ADBL loss, effectively improves localization accuracy.
Compared with existing methods, the proposed Tri-NetVLAD has the following advantages. (1) The triplet network increases the Euclidean distance between unmatched images while reducing the Euclidean distance between matched images. (2) NetVLAD aggregates the local features extracted by the CNN into a global feature descriptor that captures the distribution relationship among features. (3) Fusing in the fully connected layer adds its feature vector to the global descriptor, so that the final representation not only captures the distribution relationship among features but also retains the differences between them. (4) The improved loss function, ADBL loss, accelerates gradient convergence and improves overall localization accuracy.
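As a concrete illustration of point (4), a DBL-style loss over the positive/negative distance gap, with a sharpening parameter λ, might be written as follows. The exact ADBL formulation is not reproduced here; this sketch only shows the role the abstract attributes to λ — a larger λ penalizes hard triplets (where the positive is not clearly closer than the negative) more strongly.

```python
import numpy as np

def pairwise_dist(a, b):
    """Euclidean distance between corresponding rows of a and b."""
    return np.linalg.norm(a - b, axis=1)

def adbl_loss(anchor, positive, negative, lam=2.0):
    """Distance-based logistic loss on the gap d(a, p) - d(a, n).

    lam (λ) is a hypothetical sharpening parameter: it scales the gap
    inside the logistic term, so hard samples (gap near or above zero)
    are penalized more strongly as lam grows, while easy samples
    (large negative gap) contribute even less.
    """
    gap = pairwise_dist(anchor, positive) - pairwise_dist(anchor, negative)
    return np.log1p(np.exp(lam * gap)).mean()
```

Minimizing this loss pushes matched pairs together and unmatched pairs apart, which is the retrieval objective the triplet network optimizes.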
Keywords: cross-view scene image matching and geolocation, Triplet Network, NetVLAD, CNN (Convolutional Neural Networks)
Altwaijry H, Trulls E, Hays J, Fua P and Belongie S. 2016. Learning to match aerial images with deep attentive architectures//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE: 3539-3547 [DOI: 10.1109/CVPR.2016.385]
Arandjelović R, Gronat P, Torii A, Pajdla T and Sivic J. 2018. NetVLAD: CNN architecture for weakly supervised place recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(6): 1437-1451 [DOI: 10.1109/TPAMI.2017.2711011]
Chen W H, Chen X T, Zhang J G and Huang K Q. 2017. Beyond triplet loss: a deep quadruplet network for person re-identification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE: 403-412 [DOI: 10.1109/CVPR.2017.145]
Hadsell R, Chopra S and LeCun Y. 2006. Dimensionality reduction by learning an invariant mapping//Proceedings of 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). New York, NY, USA: IEEE: 1735-1742 [DOI: 10.1109/CVPR.2006.100]
Hammoud R I, Kuzdeba S A, Berard B, Tom V, Ivey R, Bostwick R, HandUber J, Vinciguerra L, Shnidman N and Smiley B. 2013. Overhead-based image and video geo-localization framework//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Portland, OR, USA: IEEE: 320-327 [DOI: 10.1109/CVPRW.2013.55]
Hu S X, Feng M D, Nguyen R M H and Lee G H. 2018. CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, UT, USA: IEEE: 7258-7267 [DOI: 10.1109/CVPR.2018.0075]
Jégou H, Perronnin F, Douze M, Sánchez J, Pérez P and Schmid C. 2012. Aggregating local image descriptors into compact codes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9): 1704-1716 [DOI: 10.1109/TPAMI.2011.235]
Kingma D P and Ba J. 2014. Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada, USA: ACM: 1097-1105
Lin T Y, Belongie S and Hays J. 2013. Cross-view image geolocalization//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE: 891-898 [DOI: 10.1109/CVPR.2013.120]
Lin T Y, Cui Y, Belongie S and Hays J. 2015. Learning deep representations for ground-to-aerial geolocalization//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE: 5007-5015 [DOI: 10.1109/CVPR.2015.7299135]
Schindler G, Brown M and Szeliski R. 2007. City-scale location recognition//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Minneapolis, MN, USA: IEEE: 1-7 [DOI: 10.1109/CVPR.2007.383150]
Tian Y C, Chen C and Shah M. 2017. Cross-view image matching for geo-localization in urban environments//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE: 1998-2006 [DOI: 10.1109/CVPR.2017.216]
Vo N N and Hays J. 2016. Localizing and orienting street views using overhead imagery//Proceedings of the 14th European Conference on Computer Vision (ECCV). Amsterdam, The Netherlands: Springer: 494-509 [DOI: 10.1007/978-3-319-46448-0_30]
Workman S, Souvenir R and Jacobs N. 2015. Wide-area image geolocalization with aerial reference imagery//Proceedings of 2015 IEEE International Conference on Computer Vision (ICCV). Santiago, Chile: IEEE: 3961-3969 [DOI: 10.1109/ICCV.2015.451]
Zhang H Q, Liu X Y, Yang S and Li Y. 2017. Retrieval of remote sensing images based on semisupervised deep learning. Journal of Remote Sensing, 21(3): 406-414 (in Chinese) [DOI: 10.11834/jrs.20176105]
Zhang L and Liao M S. 2006. A context aware fuzzy clustering method for remote sensing images. Journal of Remote Sensing, 2006(1): 58-65 (in Chinese) [DOI: 10.11834/jrs.20060109]
Zhao L J and Tang P. 2016. Scalability analysis of typical remote sensing data classification methods: a case of remote sensing image scene. Journal of Remote Sensing, 20(2): 157-171 (in Chinese) [DOI: 10.11834/jrs.20164279]