Dynamic ensemble algorithm of SMOTE and rotation forest for imbalanced hyperspectral remote sensing classification
2022, Vol. 26, No. 11, pp. 2369-2381
Print publication date: 2022-11-07
DOI: 10.11834/jrs.20210216
Tong Y P, Feng W, Song Y J, Quan Y H, Huang W J, Gao L R, Zhu W T and Xing M D. 2022. Dynamic ensemble algorithm of SMOTE and rotation forest for imbalanced hyperspectral remote sensing classification. National Remote Sensing Bulletin, 26(11): 2369-2381
Rotation Forest (RoF) is a powerful ensemble classifier that has been applied successfully to hyperspectral image classification. However, real-world data are often class-imbalanced, so the traditional RoF algorithm concentrates on recognizing majority-class samples and neglects the classification accuracy of minority-class samples. The Synthetic Minority Oversampling Technique (SMOTE) increases the number of minority-class samples by generating synthetic samples, thereby balancing the class distribution of the data set. SMOTE, however, is currently used mainly as a data preprocessing step, and it risks introducing artificial noise in multi-class problems. To address multi-class imbalance in hyperspectral data learning, this paper proposes a novel dynamic ensemble algorithm based on SMOTE and RoF. The algorithm uses a dynamic sampling factor technique to integrate class distribution optimization with the training of the base classifiers. It thereby generates class-balanced data sets adaptively and reduces the influence of noise on the base classifiers. Three public hyperspectral data sets, Indian Pines, Salinas, and Pavia University, are used to test the algorithm. Four comparison algorithms are selected: random forest, traditional RoF, and RoF preceded by either random oversampling or SMOTE preprocessing. The evaluation criteria are overall accuracy, average accuracy, F-measure, G-mean, minimum recall, ensemble classifier diversity, model training time, and the McNemar test. The experimental results show that the proposed method has a clear classification advantage: it improves the recognition accuracy of minority-class samples while also raising the overall classification accuracy.
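For readers unfamiliar with SMOTE, the following is a minimal sketch of its basic interpolation step only: a synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. It does not reproduce the dynamic sampling factor or the RoF ensemble proposed in the paper, whose details are not given in the abstract. The function name `smote`, its parameters, and the toy three-band "spectra" are assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples built from minority-class feature vectors.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate: x_new = x + lam * (x_nn - x), lam uniform in [0, 1).
        lam = rng.random()
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with three spectral bands; the values are made up.
minority = [
    [0.10, 0.40, 0.80],
    [0.12, 0.38, 0.83],
    [0.09, 0.45, 0.79],
    [0.11, 0.41, 0.85],
]
new_samples = smote(minority, n_new=6, k=3, seed=42)
```

Because every synthetic sample is a convex combination of two real minority samples, it always lies inside the per-band range of the minority class. This is also why interpolating between noisy or overlapping samples can inject artificial noise, the risk the abstract attributes to SMOTE in multi-class problems. For practical work, an established implementation such as the `SMOTE` class in the imbalanced-learn package is likely a better starting point than this sketch.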
Keywords: ensemble learning; imbalanced classification; rotation forest; SMOTE; dynamic sampling