Improvement in recognition accuracy of minority crops by resampling of imbalanced training datasets of remote sensing
2019, Vol. 23, No. 4, pp. 730–742
Print publication date: 2019-07
Accepted: 2018-04-10
DOI: 10.11834/jrs.20197478
Fan D D, Li Q Z, Wang H Y, Zhang Y, Du X and Shen Y. 2019. Improvement in recognition accuracy of minority crops by resampling of imbalanced training datasets of remote sensing. Journal of Remote Sensing, 23(4): 730–742
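The quality of training samples is a key factor in the accuracy of crop recognition from remote sensing. High-spatial-resolution satellites have effectively solved the mixed-pixel problem in crop recognition. However, when the planted areas of different crops in a region differ greatly, the numbers of samples per class in the training set also tend to differ greatly. Such imbalanced datasets affect classifier training and lead to poor recognition accuracy for the minority classes. To study the problem of imbalanced samples in crop recognition, this study used GF-2 satellite data. We first extracted the spectral and texture information of land-cover objects and selected the optimal features with Recursive Feature Elimination (RFE). We then processed the imbalanced training set with five resampling algorithms and finally trained classifiers on the resampled, balanced datasets. Comparing the recognition results of the decision tree and AdaBoost (Adaptive Boosting) classifiers before and after resampling, we found that: (1) after resampling, both classification algorithms clearly improved the classification accuracy of minority crops; (2) resampling with ADASYN (Adaptive Synthetic Sampling) improved classifier performance the most, raising the Kappa coefficient of the decision tree by 14.32% and that of AdaBoost by 10.23%, with AdaBoost reaching the highest value of 0.9336; (3) oversampling outperformed undersampling and improved classifier performance more. In summary, choosing an appropriate resampling method and classification method is an effective way to improve the accuracy of remote sensing classification with imbalanced datasets.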
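ADASYN, which the abstract reports as the most effective of the five resampling methods, creates more synthetic samples for minority samples that are surrounded by other classes. The sketch below illustrates that weighting idea in plain Python for a single crop class treated one-vs-rest. The function `adasyn`, the toy data, the Euclidean distance, the largest-remainder rounding of the per-sample counts and the uniform-weight fallback are all assumptions for illustration; this is not the authors' implementation, which the abstract does not describe.

```python
import math
import random
from collections import Counter


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _nearest(i, X, pool, k):
    """Indices of the k samples in `pool` closest to X[i], skipping i itself."""
    ranked = sorted((j for j in pool if j != i), key=lambda j: _sq_dist(X[i], X[j]))
    return ranked[:k]


def adasyn(X, y, minority, k=5, beta=1.0, seed=0):
    """Hypothetical one-vs-rest ADASYN sketch: synthetic vectors for class `minority`."""
    rng = random.Random(seed)
    minor_idx = [i for i, label in enumerate(y) if label == minority]
    n_major = max(Counter(y).values())  # size of the largest class
    G = int(round((n_major - len(minor_idx)) * beta))  # total samples to synthesize
    if G <= 0 or len(minor_idx) < 2:
        return []

    # r_i: share of other-class samples among the k nearest neighbours of each
    # minority sample in the whole training set (how "hard" the sample is).
    r = []
    for i in minor_idx:
        nn = _nearest(i, X, range(len(X)), k)
        r.append(sum(1 for j in nn if y[j] != minority) / len(nn))
    total = sum(r)
    # Assumption: if no minority sample touches another class, weight uniformly.
    weights = [ri / total for ri in r] if total > 0 else [1 / len(r)] * len(r)

    # g_i = weight_i * G, rounded by largest remainder so that sum(g) == G.
    raw = [w * G for w in weights]
    g = [math.floor(v) for v in raw]
    by_remainder = sorted(range(len(raw)), key=lambda m: raw[m] - g[m], reverse=True)
    for m in by_remainder[: G - sum(g)]:
        g[m] += 1

    # Interpolate between each minority sample and a random minority neighbour.
    k_min = min(k, len(minor_idx) - 1)
    synthetic = []
    for pos, i in enumerate(minor_idx):
        neighbours = _nearest(i, X, minor_idx, k_min)
        for _ in range(g[pos]):
            z = rng.choice(neighbours)
            lam = rng.random()  # lambda drawn from [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(X[i], X[z])])
    return synthetic


# Hypothetical toy data: two features per sample,
# 3 samples of a minority crop and 8 of a majority crop.
X = [[0.30, 0.40], [0.32, 0.41], [0.35, 0.38],
     [0.31, 0.45], [0.34, 0.44], [0.60, 0.70], [0.62, 0.72],
     [0.65, 0.68], [0.61, 0.75], [0.58, 0.71], [0.63, 0.69]]
y = ["minor"] * 3 + ["major"] * 8

synthetic = adasyn(X, y, "minor", k=5, seed=42)
X_balanced = X + synthetic
y_balanced = y + ["minor"] * len(synthetic)
print(Counter(y_balanced))  # both classes now have 8 samples
```

Because each synthetic vector is a convex combination of two minority samples, it lies on the segment between them. In practice a maintained implementation such as `imblearn.over_sampling.ADASYN` is the more likely choice; its multi-class handling and rounding may differ from this sketch.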
The rapid development of high-spatial-resolution satellites has effectively alleviated the mixed-pixel problem in satellite images, making it possible to extract the detailed distribution of crops. Classifying remote sensing images is a quick way to obtain accurate agricultural information. However, the accuracy of supervised classification of remote sensing images depends on several factors, such as the classifier algorithm and the input datasets. Imbalanced training samples, in which some categories have considerably fewer or more training samples than others, often result in poor classification accuracy for the minority classes. To improve this situation and the generalization performance of the classifiers, this research focused on the proper use of resampling techniques and classification methods to improve the performance of remote sensing image classification. We first extracted spectral and texture features from the images and selected an optimized feature subset using recursive feature elimination. Then, five resampling methods, namely three oversampling methods and two undersampling methods, were separately used to balance the initial training datasets. Finally, we tested the resampled datasets with two classifiers (decision tree and AdaBoost) and evaluated the performance of each in terms of the kappa coefficient, overall accuracy, producer's accuracy, and user's accuracy. After resampling, overall classification accuracy and the kappa coefficient improved considerably: the kappa coefficient rose by 14.32% for the decision tree and by 10.23% for AdaBoost. AdaBoost obtained the highest kappa coefficient (0.9336) with the training dataset resampled by ADASYN. The classification accuracy of the minority crops also increased after the training datasets were resampled. Meanwhile, the feature selection results showed that vegetation indices and texture features contributed more to classification than the original band reflectance features. Oversampling methods were more effective at reducing the influence of imbalanced training samples on the classifiers. Resampling the training datasets markedly improves classifier performance when the training datasets are severely imbalanced. The detailed accuracy assessment shows that oversampling outperforms undersampling, because some significant samples are lost during undersampling, whereas useful information is added during oversampling. The AdaBoost classifier handles imbalanced training datasets better than the decision tree. Combining a proper resampling approach with a compatible classifier can significantly improve the accuracy of minority classes when classifying imbalanced datasets.
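The four metrics named in the abstract (kappa coefficient, overall accuracy, producer's accuracy and user's accuracy) are all derived from a confusion matrix, with kappa defined as (p_o - p_e) / (1 - p_e), where p_o is the overall accuracy and p_e the agreement expected by chance from the row and column totals. The minimal sketch below assumes that rows hold reference (ground-truth) classes and columns hold classified classes, a convention the abstract does not state. The function name `accuracy_report` and both matrices are hypothetical and are not data from the paper.

```python
def accuracy_report(cm):
    """Overall accuracy, kappa, producer's and user's accuracy from a confusion matrix.

    Assumed convention: cm[i][j] = number of reference samples of class i
    that the classifier labelled as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    ref_tot = [sum(cm[i]) for i in range(k)]                       # row totals
    map_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column totals
    oa = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement expected from the marginal totals.
    pe = sum(ref_tot[i] * map_tot[i] for i in range(k)) / (n * n)
    kappa = (oa - pe) / (1 - pe) if pe < 1 else 1.0
    # Producer's accuracy: share of each reference class that was mapped correctly.
    pa = [cm[i][i] / ref_tot[i] if ref_tot[i] else float("nan") for i in range(k)]
    # User's accuracy: share of each mapped class that is actually correct.
    ua = [cm[j][j] / map_tot[j] if map_tot[j] else float("nan") for j in range(k)]
    return oa, kappa, pa, ua


# Hypothetical matrices: row/column 0 is a majority crop, 1 a minority crop.
even = [[50, 5], [10, 35]]
skewed = [[95, 0], [4, 1]]
for name, cm in [("even", even), ("skewed", skewed)]:
    oa, kappa, pa, ua = accuracy_report(cm)
    print(name, round(oa, 3), round(kappa, 3),
          [round(v, 3) for v in pa], [round(v, 3) for v in ua])
```

In the skewed matrix the overall accuracy is 0.96 even though only one of the five minority-crop samples is recognized (producer's accuracy 0.2) and kappa is only about 0.32. This is why per-class accuracies and kappa are more informative than overall accuracy when the training and test data are imbalanced.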
Keywords: crop recognition; imbalanced datasets; resampling; remote sensing; minority crops; GF-2