Building segmentation in high-resolution remote sensing image through deep neural network and conditional random fields

Wang Yu; Yang Yi; Wang Baoshan; Wang Tian; Bu Xuhui; Wang Chuanyun

doi:10.11834/jrs.20198141


2019, Vol. 23, No. 6, pp. 1194–1208
Received: 2018-03-27
Accepted: 2018-06-15
Published in print: 2019-11

Wang Y, Yang Y, Wang B S, Wang T, Bu X H and Wang C Y. 2019. Building segmentation in high-resolution remote sensing image through deep neural network and conditional random fields. Journal of Remote Sensing, 23(6): 1194–1208. DOI: 10.11834/jrs.20198141

Summary

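Building segmentation in high-resolution remote sensing images essentially amounts to constructing a high-dimensional, strongly nonlinear mapping from the input image to the segmentation result. However, buildings may be spread across the whole image, so during semantic segmentation the current pixel may be directly related to pixels outside its neighborhood. To approximate the true mapping of building segmentation more precisely, overcome the influence of roads, staggered building floors, and shadows, and improve segmentation accuracy, this paper builds an encoder-decoder deep learning architecture on a deep residual neural network, which automatically extracts building features and learns a high-dimensional, strongly nonlinear segmentation model. In addition, the pairwise potential function of a conditional random field regulates the association between the current pixel and the other pixels, forming a fully connected conditional random field that adjusts the encoder-decoder segmentation result and improves segmentation accuracy. In computing the fully connected conditional random field, the mean-field computation is carried out through the operating mechanism of a recurrent neural network, which integrates the conditional random field with the deep neural network and allows the parameters of the encoder-decoder and the fully connected conditional random field to be trained simultaneously. Experimental results show that the proposed deep neural network conditional random field method effectively overcomes the influence of roads, staggered building floors, and shadows and improves building segmentation accuracy in high-resolution remote sensing images; it also generalizes well to multi-resolution remote sensing images within a certain range.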
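The mean-field adjustment described above can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes a Potts compatibility function, a single Gaussian kernel over position and intensity, and hand-picked values for the hypothetical parameters `w`, `theta_pos`, and `theta_feat`. It evaluates the pairwise term naively over all pixel pairs instead of using high-dimensional Gaussian filtering, and nothing is learned. Each loop iteration plays the role of one recurrent step.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_field(unary, feats, pos, n_iters=5, w=1.0, theta_pos=3.0, theta_feat=0.2):
    """Naive mean-field inference for a fully connected CRF.

    unary[i][l] is the cost of label l at pixel i (lower is more likely).
    feats[i] and pos[i] are the intensity and position of pixel i.
    w, theta_pos, theta_feat are illustrative values, not from the paper.
    """
    n, n_labels = len(unary), len(unary[0])

    def kernel(i, j):
        # Gaussian affinity: close, similar-looking pixels interact strongly.
        d_pos = (pos[i] - pos[j]) ** 2 / (2 * theta_pos ** 2)
        d_feat = (feats[i] - feats[j]) ** 2 / (2 * theta_feat ** 2)
        return math.exp(-d_pos - d_feat)

    k = [[kernel(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]
    q = [softmax([-u for u in row]) for row in unary]  # initialise from unaries
    for _ in range(n_iters):  # one iteration corresponds to one recurrent step
        new_q = []
        for i in range(n):
            # Message passing: kernel-weighted sum of the other pixels' beliefs.
            msg = [sum(k[i][j] * q[j][l] for j in range(n)) for l in range(n_labels)]
            # Potts compatibility: label l is penalised by mass on other labels.
            pair = [w * (sum(msg) - msg[l]) for l in range(n_labels)]
            new_q.append(softmax([-unary[i][l] - pair[l] for l in range(n_labels)]))
        q = new_q
    return q

# Toy 1-D "image": three bright building pixels, three dark background pixels.
# Pixel 1 has a weak raw score, as if a shadow had confused the decoder.
probs_building = [0.9, 0.4, 0.9, 0.1, 0.1, 0.1]
unary = [[-math.log(1 - p), -math.log(p)] for p in probs_building]
feats = [0.8, 0.8, 0.8, 0.2, 0.2, 0.2]
pos = [0, 1, 2, 3, 4, 5]

q = mean_field(unary, feats, pos)
labels = [row.index(max(row)) for row in q]
print(labels)  # [1, 1, 1, 0, 0, 0]
```

In this toy case the raw scores alone would label pixel 1 as background; the pairwise term pulls it back to the building label because its neighbors with similar intensity are confidently labeled as building.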

Abstract

The core of building segmentation in high-resolution remote sensing images is to establish a high-dimensional, strongly nonlinear mapping from the image feature space to the segmentation result. In a high-resolution remote sensing image, a building may appear at any location in the entire image, so pixels outside the neighborhood may be related to the pixel currently being segmented. Adopting a Deep Neural Network (DNN) to extract features and learn the nonlinear mapping significantly improves segmentation precision and generalization. However, a DNN cannot directly extract non-neighborhood features. This study presents an encoder–decoder deep learning architecture with ResNet and a Conditional Random Field (CRF) for building semantic segmentation in high-resolution remote sensing images, with the aim of obtaining high segmentation precision and reducing the interference from roads, staggered floors, and shadows.

In the DNN, ResNet is used to establish the encoder that automatically extracts building features; ResNet avoids the vanishing and exploding gradient problems and accelerates the convergence of the DNN weights. Before each convolution operation, batch normalization is adopted to normalize the sampled data and reduce the training difficulty of the DNN. Transposed convolution is then applied to establish the decoder, which reconstructs the image while segmenting the buildings. At the end of the DNN, the CRF adjusts the raw segmentation produced by the decoder. The unary potential function of the CRF takes its value from the raw decoder output, and the pairwise potential function describes the features of pixel pairs across the entire image, which makes the model a fully connected CRF (FCCRF). Because the computational cost of the FCCRF is considerable, a mean field algorithm is used to approximate the pairwise potential function value. The pairwise potential function value is thus obtained by convolution, and a high-dimensional Gaussian filter implements the convolution operation. The mean field algorithm is implemented through a recurrent neural network (RNN) mechanism, so the FCCRF becomes part of the DNN and the parameters of the CRF are trained simultaneously with those of the encoder and decoder.

Experiments are conducted to validate the effectiveness of the proposed methodology. The remote sensing image dataset is the Inria Aerial Image Labeling Dataset. There are 4500 samples in total, each with 1000×1000×3 pixels at a resolution of 0.3 m. Typical kinds of buildings, namely regularly arranged buildings, single buildings with complicated roofs, and irregularly arranged buildings, are segmented with VGG, ResNet, and the proposed methodology (denoted as ResNetCRF). The results show that ResNetCRF overcomes the interference of roads whose color features are similar to those of buildings and effectively reduces the disturbance of staggered floors and shadows. ResNetCRF thus obtains the best segmentation precision of the three methods. The multi-resolution experiment demonstrates that ResNetCRF generalizes well within a limited range of resolution change.

By introducing CRFs into the ResNet-based encoder–decoder, an accurate mapping for building segmentation is established that reduces the disturbance of roads, shadows, and staggered floors in high-resolution remote sensing images. In future work, we will investigate reducing the computational cost of the FCCRF, overcoming the missed segmentation of small buildings, and reducing segmentation errors for buildings whose color features are similar to the background and that lack a noticeable edge.
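The encoder–decoder described in the abstract reduces spatial resolution with convolutions and restores it with transposed convolutions. The sketch below only traces feature-map sizes through such a pipeline using the standard output-size formulas for convolution and transposed convolution (dilation 1). The number of stages, kernel sizes, strides, and padding are assumptions chosen for illustration; the abstract does not state the layer configuration actually used.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution (dilation 1)."""
    return (size + 2 * pad - kernel) // stride + 1

def tconv_out(size, kernel, stride, pad, output_pad=0):
    """Spatial output size of a transposed convolution (dilation 1)."""
    return (size - 1) * stride - 2 * pad + kernel + output_pad

def trace(size, n_stages=3):
    """Track one spatial dimension through a hypothetical encoder and decoder."""
    sizes = [size]
    for _ in range(n_stages):  # encoder: 3x3 convolutions with stride 2
        sizes.append(conv_out(sizes[-1], kernel=3, stride=2, pad=1))
    for _ in range(n_stages):  # decoder: 4x4 transposed convolutions with stride 2
        sizes.append(tconv_out(sizes[-1], kernel=4, stride=2, pad=1))
    return sizes

print(trace(1000))  # [1000, 500, 250, 125, 250, 500, 1000]
```

With these assumed settings a 1000×1000 sample is restored to its original size, which is what allows the decoder output to supply one unary potential per input pixel to the CRF stage.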


