Hybrid feature selection for cropland identification using GF-5 satellite image
2022, Vol. 26, No. 7, pp. 1383-1394
Print publication date: 2022-07-07
DOI: 10.11834/jrs.20220458
Chen Z L, Jia K, Li Q Z, Xiao C C, Wei D D, Zhao X, Wei X Q, Yao Y J and Li J. 2022. Hybrid feature selection for cropland identification using GF-5 satellite image. National Remote Sensing Bulletin, 26(7): 1383-1394
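Accurate cropland identification is the basis of crop yield estimation and food security assessment. Remote sensing data are an important data source for cropland identification and can provide dynamic, rapid monitoring results. Hyperspectral data have great potential for cropland classification, but their redundant bands reduce both classification efficiency and accuracy. This study therefore proposes a hybrid feature selection algorithm for cropland classification with hyperspectral data. First, dimensionality is reduced step by step with a fixed step length, based on the importance ranking of the variables or their degree of constraint. Second, the turning point at which classification accuracy drops sharply is located, and the corresponding variables are taken as the feature subset. Finally, Sequential Backward Selection (SBS) is used to search for the optimal classification feature subset. Using GF-5 hyperspectral data, the study evaluates combinations of three dimensionality reduction methods (Random Forest (RF), Mutual Information (MI), and L1 regularization) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) for cropland classification. The results show that the feature subset obtained with L1 regularization has low autocorrelation, and that the red-edge and near-infrared bands it contains effectively improve the separability of cropland, forest, and bare soil. Among the classification models, SVM shows very good robustness to noise in high-dimensional space and achieves higher accuracy than RF and KNN, whereas RF generalizes better than SVM and KNN in low-dimensional space. Compared with the feature subsets obtained in the first dimensionality reduction step, the optimal subsets found by SBS improve classification accuracy in every case. The L1-SVM-SBS model with a 23-dimensional input achieves the highest overall classification accuracy (94.64%) and cropland recall (95.83%). This study offers a new approach to feature selection for hyperspectral data, identifies more representative feature bands, improves cropland classification accuracy, and provides a reference for hyperspectral remote sensing classification research.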
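The first two steps described above (stepwise reduction along an importance ranking, then locating the point where accuracy drops sharply) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `evaluate` is a hypothetical callback that would train a classifier on the given band indices and return overall accuracy, the `drop_tol` threshold that defines a "sharp drop" is an assumption, and `toy_accuracy` is a stand-in scorer used only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple


def stepwise_reduce(
    ranked: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
    step: int = 10,
    min_dim: int = 5,
) -> List[Tuple[int, float]]:
    """Score the top-k ranked features for k = len(ranked), len(ranked) - step, ... >= min_dim."""
    curve = []
    k = len(ranked)
    while k >= min_dim:
        # ranked is ordered from most to least important, so ranked[:k] keeps the top k.
        curve.append((k, evaluate(ranked[:k])))
        k -= step
    return curve


def turning_point(curve: List[Tuple[int, float]], drop_tol: float = 0.02) -> int:
    """Return the last dimension before accuracy falls by more than drop_tol in one step."""
    for (k_prev, acc_prev), (_, acc_next) in zip(curve, curve[1:]):
        if acc_prev - acc_next > drop_tol:
            return k_prev
    # No sharp drop was found: fall back to the smallest dimension evaluated.
    return curve[-1][0]


def toy_accuracy(subset: Sequence[int]) -> float:
    """Hypothetical scorer: accuracy is flat down to 25 features, then degrades."""
    n = len(subset)
    return 0.9 if n >= 25 else 0.9 - 0.02 * (25 - n)


# 295 band indices, assumed to be already sorted by importance.
curve = stepwise_reduce(list(range(295)), toy_accuracy)
print(turning_point(curve))
```

In practice the ranking would come from RF feature importances, mutual information scores, or the magnitude of L1-regularized coefficients, and `evaluate` would wrap a cross-validated classifier.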
Accurate farmland area identification is the basis of crop yield estimation and an important indicator in food security assessment. As an important data source for farmland identification, remote sensing data can provide dynamic and fast observation results for classification. GF-5, which is the only hyperspectral satellite in the China High-resolution Earth Observation System, has great research and application potential in farmland identification. However, the curse of dimensionality caused by the redundant bands in hyperspectral data seriously affects the calculation speed and classification accuracy of models. To solve this problem, this research proposes a hybrid feature selection algorithm for farmland identification. First, on the basis of the feature importance provided by the feature selection algorithm, the feature dimension is gradually reduced from 295 to 5 with a step length of 10, and the overall accuracy of the classification results at each feature dimension is recorded. Second, the turning point (the dimension below which overall accuracy drops sharply as further input variables are removed) is determined from the overall accuracy, and the corresponding variables are adopted as the feature subset. Lastly, the Sequential Backward Selection (SBS) method is used to search for the best subset. Three feature selection algorithms (i.e., Random Forest (RF), Mutual Information (MI), and L1 regularization (L1)) and three classification algorithms (RF, Support Vector Machine (SVM), and K-Nearest Neighbor (KNN)) are examined. Results indicate that the autocorrelations of the three subsets differ significantly. Most of the bands selected by the MI method are contiguous and concentrated in the blue and shortwave infrared range. Therefore, the extremely high autocorrelation in this subset has a negative effect on classification accuracy. By contrast, the correlation between bands in the RF and L1 feature subsets is relatively weak. However, the two feature sets still result in different classification accuracy. According to the variable distribution, the L1 feature subset contains many red-edge and near-infrared bands. These bands distinguish farmland, forest, and soil better than the blue and red bands selected by the RF algorithm. The classification algorithms also have different capacities. In the high-dimensional space, the SVM algorithm exhibits high robustness to noise, resulting in high accuracy. However, when the dimension decreases to a critical value, the accuracy of SVM decreases sharply. By contrast, although RF is not as robust as SVM in the high-dimensional space, it has excellent generalization ability in the low-dimensional space. Compared with the subsets obtained after the first dimensionality reduction process, the optimal feature subsets obtained by the SBS search improve the classification accuracy of each model. The L1-SVM-SBS model with a 23-dimensional input achieves the highest overall classification accuracy (94.64%) and cropland recall rate (95.83%). This study provides a new method of farmland identification using hyperspectral data. By selecting more representative and informative bands, this method not only improves farmland classification accuracy but can also be used as a reference for other classification problems involving hyperspectral remote sensing.
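The final SBS stage can be sketched as a greedy loop that removes one feature at a time. This is an illustrative sketch rather than the procedure used in the study: the abstract does not state the stopping rule, so the version below simply searches down to `min_features` and returns the best-scoring subset seen along the way. `evaluate` is a hypothetical callback returning a classification score for a list of band indices, and `toy_score` with its `INFORMATIVE` set is invented purely to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple


def sequential_backward_selection(
    features: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
    min_features: int = 1,
) -> Tuple[List[int], float]:
    """Greedy SBS: repeatedly drop the feature whose removal scores best."""
    current = list(features)
    best_subset, best_score = list(current), evaluate(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one remaining feature.
        candidates = [
            (evaluate(current[:i] + current[i + 1:]), i)
            for i in range(len(current))
        ]
        score, drop_idx = max(candidates, key=lambda c: c[0])
        current.pop(drop_idx)
        # Prefer the smaller subset when the score is at least as good.
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical scorer: three bands carry signal, every other band costs a little.
INFORMATIVE = {2, 5, 7}


def toy_score(subset: Sequence[int]) -> float:
    hits = sum(f in INFORMATIVE for f in subset)
    noise = len(subset) - hits
    return hits / len(INFORMATIVE) - 0.01 * noise


subset, score = sequential_backward_selection(range(10), toy_score)
print(subset, score)
```

Each pass costs one model evaluation per remaining feature, which is why running SBS only on the reduced subset from the turning-point step, rather than on all 295 bands, keeps the search tractable.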
Keywords: cropland identification; GF-5; feature selection; hyperspectral remote sensing; L1 regularization; sequential backward selection