Remote sensing image semantic segmentation method combining cosine annealing with atrous convolution

Tang Zhenchao; Wei Wei; Luo Weiran; Hu Jie; Zhang Dongying

doi:10.11834/jrs.20211038

Remote sensing image semantic segmentation method combining cosine annealing with atrous convolution
2023, Vol. 27, No. 11, pp. 2579-2592
Received: 2021-02-09

Published in print: 2023-11-07

Tang Z C, Wei W, Luo W R, Hu J and Zhang D Y. 2023. Remote sensing image semantic segmentation method combining cosine annealing with atrous convolution. National Remote Sensing Bulletin, 27(11): 2579-2592. DOI: 10.11834/jrs.20211038.

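Abstract (Chinese version)

To capture the rich contextual information and multiscale ground-object information in remote sensing images, improve the ensemble strategy, and raise semantic segmentation accuracy, this paper proposes a semantic segmentation method for high-resolution remote sensing images that combines period-increasing cosine annealing with multiscale atrous convolution. The method introduces multiscale parallel atrous convolutions, which capture context over a larger range and improve the network's ability to recognize multiscale objects without adding parameters. It uses a fully connected conditional random field to introduce spatial and edge context, which improves the network's ability to segment fine details in remote sensing images. It also adjusts the learning rate with a period-increasing cosine annealing strategy to obtain a suitable number of local optima, and ensembles these local optima to further improve per-pixel classification. Experiments on the Gaofen Image Dataset show that multiscale parallel atrous convolution fully captures multiscale ground-object information in remote sensing images and effectively identifies complex objects; that introducing spatial and edge context makes the boundaries of segmented objects more precise; and that the period-increasing cosine annealing strategy markedly reduces the inference time of the ensemble model. The overall accuracy and Kappa coefficient of the model both exceed those of current mainstream semantic segmentation models.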
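The claim that atrous convolution enlarges the context window without adding parameters can be illustrated with a small one-dimensional sketch. This is not the paper's network: the function names and the dilation rates (1, 2, 4) are assumptions chosen for illustration. The same three weights are reused at every rate, while the span they cover grows as k + (k - 1)(d - 1).

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span of input covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, weights, dilation=1):
    """'Valid' 1-D atrous (dilated) convolution, written as cross-correlation."""
    span = effective_kernel_size(len(weights), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Taps are spaced `dilation` samples apart; the weight count is unchanged.
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(weights)))
    return out


def parallel_atrous(signal, weights, dilations=(1, 2, 4)):
    """Apply one kernel size at several dilation rates in parallel.

    The rates here are hypothetical; the paper's actual rates are not given
    in the abstract.
    """
    return {d: atrous_conv1d(signal, weights, d) for d in dilations}


if __name__ == "__main__":
    x = list(range(1, 12))          # toy input of 11 samples
    k = [1, 1, 1]                   # three weights at every rate
    for d, y in parallel_atrous(x, k).items():
        print(d, effective_kernel_size(len(k), d), y)
```

Each branch sees a different neighbourhood size (3, 5 and 9 samples here) with the same number of weights per branch, which is the property the abstract relies on.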

Abstract

This study aims to capture the rich context information and multiscale feature information in remote sensing images, improve the ensemble model strategy, and enhance the accuracy of semantic segmentation. To this end, it proposes a semantic segmentation method for high-resolution remote sensing images that combines cosine annealing with increasing period and multiscale atrous convolution.

The multiscale parallel atrous convolution helps the network capture context information over a larger range and improves its ability to recognize multiscale objects without increasing the number of parameters. The method uses atrous convolution and discards the pooling operation to maintain spatial resolution. Meanwhile, it adopts a fully connected conditional random field to add spatial and edge context information, compensating for part of the position information lost by the atrous convolution. As a result, the outlines of the objects extracted by semantic segmentation fit the ground truth better. Moreover, a cosine annealing strategy with increasing period is introduced to adjust the learning rate and obtain a suitable number of local optimal solutions. These local optimal solutions are then ensembled to further improve the pixel classification ability of the network.
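A learning-rate schedule of this kind can be sketched as follows. The sketch assumes the warm-restart form of SGDR (Loshchilov and Hutter), in which the rate follows a half cosine from `lr_max` down to `lr_min` within each cycle and the cycle length is multiplied by a constant after every restart. Whether the paper uses exactly this rule is an assumption, and `base_period`, `period_mult`, `lr_max` and `lr_min` are hypothetical values.

```python
import math


def cosine_annealing_lr(epoch, base_period=10, period_mult=2,
                        lr_max=0.1, lr_min=0.0):
    """Learning rate at `epoch` under cosine annealing with growing cycles.

    All default values are illustrative assumptions, not the paper's settings.
    """
    period = base_period
    t = epoch
    # Walk forward through completed cycles; each one is period_mult times
    # longer than the previous one.
    while t >= period:
        t -= period
        period *= period_mult
    # Half-cosine decay from lr_max (t = 0) towards lr_min (t -> period).
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / period))


if __name__ == "__main__":
    for e in (0, 5, 9, 10, 20, 29, 30):
        print(e, round(cosine_annealing_lr(e), 4))
```

With these defaults the rate restarts at epochs 10 and 30, so the low-rate phase at the end of each cycle, where a local optimum would be saved, becomes progressively longer.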

The overall accuracy and kappa coefficient of the proposed model, which are 86.6% and 81.8%, respectively, are better than those of the current advanced semantic segmentation models.
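For reference, the two reported metrics can be computed from a pixel-level confusion matrix as in the sketch below. It uses the standard definitions of overall accuracy and Cohen's kappa; the 2x2 matrix in the usage example is made-up toy data, not a result from the paper.

```python
def overall_accuracy(cm):
    """Fraction of pixels on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_coefficient(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (overall accuracy) and p_e is the agreement
    expected by chance from the row and column marginals.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)
    p_e = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    toy_cm = [[40, 10],   # rows: reference class, columns: predicted class
              [10, 40]]
    print(overall_accuracy(toy_cm), kappa_coefficient(toy_cm))
```

Kappa is lower than overall accuracy whenever chance agreement is non-zero, which is consistent with the gap between the two figures quoted above.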

The experimental results on the Gaofen Image Dataset show that fusing image context information with multiscale feature information can effectively identify objects with complex structures. Moreover, the model coupled with the period-increasing cosine annealing strategy obtains better semantic segmentation accuracy and requires less inference time than the model coupled with the equal-period cosine annealing strategy.
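One plausible reading of the inference-time result is that, for a fixed training budget, growing cycles end less often than equal cycles, so fewer snapshots need to be evaluated at inference. The sketch below illustrates that reading only. The 70-epoch budget, the base period of 10, the multiplier of 2 and the use of probability averaging for the ensemble are all assumptions, not details taken from the paper.

```python
def snapshot_epochs(total_epochs, base_period, period_mult=1):
    """Epochs at which a cosine cycle ends, i.e. where a snapshot is saved."""
    ends, end, period = [], base_period, base_period
    while end <= total_epochs:
        ends.append(end)
        period *= period_mult      # period_mult = 1 gives equal-period cycles
        end += period
    return ends


def ensemble_predict(prob_maps):
    """Average per-pixel class probabilities over snapshots, then take argmax.

    prob_maps[m][p][c] is the probability that snapshot m assigns to class c
    at pixel p. Averaging is an assumed ensembling rule.
    """
    n_models = len(prob_maps)
    labels = []
    for p in range(len(prob_maps[0])):
        n_classes = len(prob_maps[0][p])
        mean = [sum(m[p][c] for m in prob_maps) / n_models
                for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


if __name__ == "__main__":
    print(snapshot_epochs(70, 10, period_mult=1))  # equal period
    print(snapshot_epochs(70, 10, period_mult=2))  # increasing period
    toy = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.6, 0.4], [0.4, 0.6]],
        [[0.4, 0.6], [0.1, 0.9]],
    ]
    print(ensemble_predict(toy))
```

Under these assumed settings the equal-period schedule yields seven snapshots and the increasing-period schedule yields three, so the ensemble performs fewer forward passes per image.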

