High resolution remote sensing image segmentation based on dual-modal efficient feature learning
2024, Vol. 28, No. 2, pp. 481-493
Print publication date: 2024-02-07
DOI: 10.11834/jrs.20233162
Zhang Y S, Ji R, Tong J Y, Yang Y L, Hu Y X and Shan H L. 2024. High resolution remote sensing image segmentation based on dual-modal efficient feature learning. National Remote Sensing Bulletin, 28(2): 481-493
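Remote sensing images carry rich semantic and spatial information, which makes semantic segmentation more difficult. Existing segmentation methods that extract dual-modal features use the same backbone network for both modalities and do not account for the differences between complementary features. They suffer from insufficient feature extraction, insufficient feature fusion, and insufficient recovery of detail during upsampling, so they cannot learn high-resolution remote sensing image information accurately and efficiently. This paper therefore proposes a high-resolution remote sensing image segmentation algorithm based on dual-modal efficient feature learning. First, a suitable encoder is designed for each modality of remote sensing image to extract dual-modal features efficiently, and an interactive reinforcement module reduces the differences between features from different paths. Second, a dual-modal feature aggregation module and a deep feature extraction module are proposed to further fuse and extract the dual-modal features, so that the network can fully learn the complementary information. Finally, a multi-layer feature upsampling module is proposed, which uses high-level features rich in semantic information to weight low-level features rich in detail information, and upsamples step by step to recover features efficiently and improve segmentation performance. Experimental results show that the proposed algorithm reaches overall accuracies of 94.52% and 90.45% on the ISPRS Potsdam and Vaihingen datasets, respectively. It can efficiently extract and fuse the dual-modal features of high-resolution remote sensing images and improve the accuracy of remote sensing image segmentation.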
With the rapid development of space technology, the resolution of remote sensing images has steadily improved, and the detail and spatial information they contain has become richer. As a result, the differences between categories become smaller while the differences within the same category become larger, and the problems of different objects sharing the same spectrum and the same object showing different spectra become serious. However, existing dual-modal segmentation methods do not extract the feature information of each modality separately, and their feature fusion is insufficient. The recovery of detail during upsampling is also insufficient. These methods therefore cannot learn remote sensing image information accurately and efficiently, which leads to segmentation errors, blurred edges, and other problems.
This study proposes a high-resolution remote sensing image segmentation algorithm based on dual-modal efficient feature learning. The algorithm designs an appropriate encoder for each modality of remote sensing image, extracts dual-modal features efficiently, and reduces the differences between features from different paths through interactive reinforcement modules. Then, a dual-modal feature aggregation module and a deep feature extraction module are proposed to further fuse and extract the dual-modal features, so that the network can fully learn the complementary information of the two modalities. Finally, a multi-layer feature upsampling module is proposed, which uses high-level features with rich semantic information to weight low-level features with rich detail information. The features are then upsampled step by step to achieve efficient feature recovery and improve segmentation performance.
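The weighting step in the multi-layer feature upsampling module can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the function names (`guided_upsample`, `progressive_decode`), the use of global average pooling plus a sigmoid to turn high-level features into per-channel weights, nearest-neighbour upsampling, additive merging, and equal channel counts at every level. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def guided_upsample(high, low):
    """Weight low-level features using high-level features, then merge.

    high: (C, h, w) coarse, semantically rich feature map
    low:  (C, 2h, 2w) fine, detail-rich feature map
    """
    # Assumed weighting scheme: global average pooling over the spatial
    # dimensions, followed by a sigmoid, gives one weight per channel.
    weights = sigmoid(high.mean(axis=(1, 2)))        # shape (C,)
    weighted_low = low * weights[:, None, None]      # broadcast over H, W
    # Assumed merge: add the 2x upsampled high-level map.
    return weighted_low + upsample_nearest(high, 2)


def progressive_decode(features):
    """Upsample step by step; `features` runs from coarsest to finest."""
    out = features[0]
    for low in features[1:]:
        out = guided_upsample(out, low)
    return out


# Toy three-level feature pyramid: 4x4, 8x8, and 16x16 maps with 8 channels.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((8, 4 * 2**i, 4 * 2**i)) for i in range(3)]
decoded = progressive_decode(pyramid)
print(decoded.shape)  # (8, 16, 16)
```

Each pass through `guided_upsample` doubles the spatial resolution, so the coarse semantic signal reaches the finest level only after it has reweighted every intermediate level. A trained network would learn the weighting (for example with convolutions) instead of using fixed pooling.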
Experiments on the ISPRS Potsdam and Vaihingen datasets show that the proposed algorithm reaches overall accuracies of 94.52% and 90.45%, respectively. The results indicate that its segmentation quality is better than that of the existing algorithms used for comparison. The proposed algorithm can efficiently extract and fuse the dual-modal complementary features of high-resolution remote sensing images and improve the segmentation accuracy of remote sensing images.
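Overall accuracy is the fraction of evaluated pixels whose predicted class matches the ground truth. The sketch below computes it from a confusion matrix on a toy label map. The helper names and the toy data are hypothetical, and how the authors treat boundary or clutter pixels is not stated here, so this sketch simply counts every pixel.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns index the predicted class."""
    idx = y_true.ravel() * num_classes + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes**2)
    return counts.reshape(num_classes, num_classes)


def overall_accuracy(cm):
    """Correctly classified pixels (the diagonal) divided by all pixels."""
    return np.trace(cm) / cm.sum()


# Toy 2x3 label maps with three classes (values are illustrative only).
y_true = np.array([[0, 0, 1],
                   [1, 2, 2]])
y_pred = np.array([[0, 1, 1],
                   [1, 2, 0]])

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)
print(f"overall accuracy: {overall_accuracy(cm):.4f}")
```

In the toy example four of the six pixels are labelled correctly, so the overall accuracy is 4/6. Because the metric pools all pixels, large classes dominate it, which is why per-class scores are usually reported alongside it.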
This study proposes a high-resolution remote sensing image segmentation algorithm based on dual-modal efficient feature learning. Experiments on the ISPRS Potsdam and Vaihingen datasets show that the proposed model is better suited to segmenting low vegetation and trees, as well as roads, which have very similar spectral features. It can also segment small targets, such as cars, accurately. However, the complexity of the model needs to be further reduced, and much room for improvement in accuracy remains. In the future, a better segmentation network will be designed to fuse more than two modal features and thus obtain more feature information for more accurate remote sensing image segmentation.
Keywords: remote sensing image segmentation; efficient feature extraction; integration; dual-modal feature aggregation; deep feature extraction; multilayer feature upsampling