Remote sensing image semantic segmentation method combining cosine annealing with atrous convolution
2023, Vol. 27, No. 11, pp. 2579-2592
Print publication date: 2023-11-07
DOI: 10.11834/jrs.20211038
Tang Z C, Wei W, Luo W R, Hu J and Zhang D Y. 2023. Remote sensing image semantic segmentation method combining cosine annealing with atrous convolution. National Remote Sensing Bulletin, 27(11): 2579-2592
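To capture the rich contextual information and multiscale ground-object information in remote sensing images, improve the model ensembling strategy, and raise semantic segmentation accuracy, this paper proposes a semantic segmentation method for high-resolution remote sensing images that combines cosine annealing with increasing period and multiscale atrous convolution. The method introduces multiscale parallel atrous convolution, which helps capture context over a larger range and improves the network's ability to recognize multiscale objects without increasing the number of parameters. A fully connected conditional random field introduces spatial and edge context, improving the network's ability to segment fine details in remote sensing images. A cosine annealing strategy with increasing period adjusts the learning rate to obtain a suitable number of local optima, and these local optima are ensembled to further improve the network's pixel-level classification ability. Experimental results on the Gaofen Image Dataset show that multiscale parallel atrous convolution fully captures multiscale ground-object information in remote sensing images and effectively identifies complex objects; that introducing spatial and edge context makes the boundaries of segmented objects more precise; and that the cosine annealing strategy with increasing period markedly reduces the inference time of the ensemble model. The overall accuracy and Kappa coefficient of the model both exceed those of current mainstream semantic segmentation models.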
This study aims to capture the rich context information and multiscale feature information in remote sensing images, improve the ensemble model strategy, and enhance the accuracy of semantic segmentation. To this end, it proposes a semantic segmentation method for high-resolution remote sensing images that uses cosine annealing with increasing period and multiscale atrous convolution.
The multiscale parallel atrous convolution helps the network capture context information over a larger range and improves its ability to recognize multiscale objects without increasing the number of parameters. The method uses atrous convolution and discards the pooling operation to maintain spatial resolution. Meanwhile, it adopts a fully connected conditional random field to add spatial and edge context information, which compensates for part of the position information missed by the atrous convolution. As a result, the outlines of objects extracted by semantic segmentation fit the ground truth better. Moreover, a cosine annealing strategy with increasing period is introduced to adjust the learning rate and obtain a suitable number of local optimal solutions. These local optimal solutions are integrated to further improve the pixel classification ability of the network.
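The multiscale parallel atrous convolution described above can be illustrated with a minimal one-dimensional sketch. It is not the authors' network: the dilation rates (1, 2, 4), the 3-tap kernels, and the fusion of branches by summation are all assumptions chosen for illustration, and the function names are hypothetical. The sketch only shows two properties the text relies on: a dilated kernel covers a wider span with the same number of weights, and the output keeps the input's length because no pooling is applied.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def atrous_conv1d(signal, weights, dilation=1):
    """Zero-padded, same-length 1-D atrous convolution (cross-correlation form)."""
    center = len(weights) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            # Neighbouring taps are `dilation` samples apart.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def parallel_atrous(signal, weights_per_branch, dilations):
    """Run one branch per dilation rate and sum the branch outputs (assumed fusion)."""
    branches = [
        atrous_conv1d(signal, w, d)
        for w, d in zip(weights_per_branch, dilations)
    ]
    return [sum(values) for values in zip(*branches)]


# Hypothetical setup: three branches, each with a 3-tap kernel.
dilations = (1, 2, 4)
kernels = [[1.0, 1.0, 1.0] for _ in dilations]
impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]

spans = [effective_kernel_size(3, d) for d in dilations]
fused = parallel_atrous(impulse, kernels, dilations)
print(spans, len(fused) == len(impulse))
```

Every branch here holds three weights, yet the branches span 3, 5, and 9 samples respectively, which is the sense in which the receptive field grows without extra parameters.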
The overall accuracy and kappa coefficient of the proposed model, which are 86.6% and 81.8%, respectively, are better than those of the current advanced semantic segmentation models.
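For readers unfamiliar with the two reported metrics, the sketch below computes overall accuracy and Cohen's kappa from a pixel-level confusion matrix using their standard definitions. The 2x2 matrix is made-up illustrative data, not the paper's results, and the function name is hypothetical. Values are returned as fractions, so 0.866 would correspond to the 86.6% quoted above.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of pixels on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Illustrative two-class matrix (hypothetical counts).
example = [[40, 10],
           [5, 45]]
oa, kappa = overall_accuracy_and_kappa(example)
print(oa, kappa)
```

Kappa discounts the agreement expected by chance, which is why it is lower than overall accuracy whenever the classifier is imperfect.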
The experimental results on the Gaofen Image Dataset show that fusing image context information with multiscale feature information enables effective identification of objects with complex structures. Moreover, the model coupled with the period-increasing cosine annealing strategy achieves better semantic segmentation accuracy and shorter inference time than the model coupled with the equal-period cosine annealing strategy.
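One plausible reading of why the period-increasing schedule shortens inference is that it yields fewer cycles, and therefore fewer ensemble members, within the same training budget. The sketch below illustrates that reading only. The abstract does not give the growth rule, so the doubling multiplier, the 70-epoch budget, the initial period of 10, the learning-rate bounds, and the assumption of one saved model per completed cycle are all hypothetical.

```python
import math


def cycle_lengths(first_period, multiplier, total_epochs):
    """Lengths of the complete cosine cycles that fit into total_epochs."""
    if first_period <= 0 or multiplier < 1:
        raise ValueError("first_period must be > 0 and multiplier >= 1")
    lengths, period, used = [], first_period, 0
    while used + period <= total_epochs:
        lengths.append(period)
        used += period
        period *= multiplier
    return lengths


def cosine_annealing_lr(epoch, lengths, lr_max=0.1, lr_min=0.0):
    """Learning rate at a 0-based epoch; it restarts at lr_max in every cycle."""
    start = 0
    for length in lengths:
        if epoch < start + length:
            t = epoch - start  # position inside the current cycle
            cosine = math.cos(math.pi * t / length)
            return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)
        start += length
    raise ValueError("epoch lies beyond the end of the schedule")


# Same 70-epoch budget, two hypothetical schedules.
equal_period = cycle_lengths(10, 1, 70)       # seven cycles of 10 epochs
increasing_period = cycle_lengths(10, 2, 70)  # cycles of 10, 20 and 40 epochs

print(len(equal_period), len(increasing_period))
print(cosine_annealing_lr(10, increasing_period))  # first epoch after a restart
```

Under these assumptions the equal-period schedule produces seven candidate models and the increasing-period schedule three, so an ensemble built from the latter needs fewer forward passes per image.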
Keywords: high-resolution remote sensing image; semantic segmentation; cosine annealing with increasing period; multi-scale parallel atrous convolution; target extraction; in-context learning; conditional random field; multi-scale learning