A PM2.5 prediction model based on deep learning and random forest
2023, Vol. 27, No. 2, pp. 430-440
Print publication date: 2023-02-07
DOI: 10.11834/jrs.20210504
Peng H J, Zhou Y, Hu X F, Zhang L, Peng Y Z and Cai X X. 2023. A PM2.5 prediction model based on deep learning and random forest. National Remote Sensing Bulletin, 27(2): 430-440
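In PM2.5 concentration prediction, traditional machine learning algorithms cannot mine the hidden features inside the data in depth, while deep learning algorithms perform poorly when data are scarce. To address both problems, and drawing on the respective strengths of deep learning and random forest, this paper proposes a combined PM2.5 concentration prediction model based on deep learning and random forest (PMCOM). The model builds a training dataset from aerosol optical depth (AOD) remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. A deep learning method extracts the deep hidden features in the training data. These features are then used to train a random forest model, and random forest regression produces the predicted PM2.5 concentration. To verify the effectiveness of the method, PM2.5 concentration estimation over Henan Province for 2018-2019 is used as a case study. New feature sets combine the original features with features extracted by CNN, LSTM, and CNN_LSTM, respectively. Each set is used to train and test three traditional machine learning methods: random forest regression (RF), support vector regression, and K-nearest neighbor regression. The experimental results show that, with limited data, PMCOM achieves good prediction accuracy in both the overall and the seasonal prediction scenarios. The combination with LSTM as the feature selector and RF as the regressor is the best model in these experiments. Even when only 35% of the data are used as training samples, R² reaches 0.89 in the overall prediction experiment and exceeds 0.75 in every seasonal prediction experiment.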
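The abstract reports that the best combination uses an LSTM as the feature selector. The new feature set joins the original features with the extracted hidden features before regression. The sketch below shows how a single-layer LSTM turns a short sequence of daily input vectors into a hidden-feature vector and appends it to the latest day's original features. It uses only standard LSTM gate equations and rests on several assumptions. The weights are random and untrained. The hidden size, the sequence length, and the four scaled inputs (AOD, temperature, humidity, wind speed) are hypothetical. `LSTMCell` and `combined_features` are illustrative names, not the authors' code. In a real model the LSTM would be trained first (the abstract does not say how), usually with a deep learning framework.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(W, v):
    # Dense matrix-vector product on plain lists
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMCell:
    """One LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    GATES = ("i", "f", "o", "g")

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        n = input_size + hidden_size  # every gate sees [x_t, h_{t-1}]
        # Untrained random weights: placeholders for weights a real model would learn
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
                  for g in self.GATES}
        self.b = {g: [0.0] * hidden_size for g in self.GATES}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        pre = {gate: [s + bias for s, bias in zip(_matvec(self.W[gate], z), self.b[gate])]
               for gate in self.GATES}
        i = [_sigmoid(v) for v in pre["i"]]
        f = [_sigmoid(v) for v in pre["f"]]
        o = [_sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]  # new cell state
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]           # new hidden state
        return h_new, c_new


def lstm_hidden_features(cell, sequence):
    """Run the cell over a time-ordered sequence and return the final hidden state."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in sequence:
        h, c = cell.step(x, h, c)
    return h


def combined_features(cell, sequence):
    """Latest day's original features followed by the LSTM hidden features."""
    return list(sequence[-1]) + lstm_hidden_features(cell, sequence)


# Example: three days of min-max scaled [AOD, temperature, humidity, wind speed] (made-up values)
days = [[0.62, 0.35, 0.71, 0.20], [0.58, 0.40, 0.65, 0.25], [0.70, 0.38, 0.80, 0.10]]
features = combined_features(LSTMCell(input_size=4, hidden_size=8, seed=42), days)
print(len(features))  # 12: 4 original features + 8 hidden features
```

Taking the last hidden state as the summary of the sequence is only one common choice. Pooling over all time steps would work the same way, and the abstract does not specify which the authors used.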
At present, environmental pollution in China remains severe, and regional compound air pollution dominated by PM2.5 is the most prominent problem. Aerosol optical depth (AOD) is a key physical quantity for characterizing atmospheric turbidity. It represents the degree to which aerosols attenuate light. Many studies have shown a strong correlation between AOD and PM2.5. Using satellite-derived AOD together with other influencing factors to analyze the mechanisms of PM2.5 variation is therefore of great significance for air pollution prevention and the protection of human health.
The diffusion of PM2.5 is an extremely complicated process. PM2.5 prediction models based on statistical regression can describe only relatively simple nonlinear relationships, whereas PM2.5 estimation is a more complex multivariable nonlinear problem. PM2.5 prediction models based on traditional machine learning algorithms can handle more complex nonlinear problems than statistical regression models can. However, their ability to process historical data is still limited, so they struggle to mine the variation patterns of pollutant concentrations from a big-data perspective. Deep learning models, by contrast, can extract deep features hidden in historical data. However, AOD remote sensing data are limited by the temporal resolution of the imagery and by cloud contamination of pixels, which greatly reduces the amount of valid data. Because building a deep learning model depends on a large amount of training data, scarce training data seriously degrade model accuracy.
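The data-scarcity argument can be made concrete. A training sample exists only where a valid AOD retrieval coincides with a ground PM2.5 observation, so every cloud-contaminated or missing pixel removes one sample. The sketch below is minimal and rests on stated assumptions. It uses daily data at a single station and marks a cloud-masked retrieval with `None`. The function names and data layout are hypothetical. Real collocation would also match pixels to station locations and attach the meteorological reanalysis variables.

```python
from datetime import date


def match_samples(aod_by_day, pm25_by_day):
    """Pair each ground PM2.5 observation with the same day's AOD retrieval.

    aod_by_day:  {date: AOD value, or None where the pixel was cloud-contaminated}
    pm25_by_day: {date: PM2.5 concentration in μg/m³}
    Days without a valid AOD value (cloud, or no satellite overpass) are dropped.
    """
    samples = []
    for day in sorted(pm25_by_day):
        aod = aod_by_day.get(day)  # None if the day is missing or cloud-masked
        if aod is None:
            continue
        samples.append((day, aod, pm25_by_day[day]))
    return samples


def valid_fraction(samples, pm25_by_day):
    """Share of ground observations that survive matching."""
    return len(samples) / len(pm25_by_day) if pm25_by_day else 0.0


# Made-up example: 5 ground observations, 2 days without usable AOD
pm25 = {date(2018, 1, d): v for d, v in zip(range(1, 6), [120.0, 95.0, 80.0, 60.0, 45.0])}
aod = {date(2018, 1, 1): 0.81, date(2018, 1, 2): None, date(2018, 1, 3): 0.64, date(2018, 1, 5): 0.42}
samples = match_samples(aod, pm25)
print(valid_fraction(samples, pm25))  # 0.6
```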
Traditional machine learning algorithms cannot deeply mine the hidden associations in data, and deep learning algorithms perform poorly when data are scarce. To address both problems, a combined PM2.5 prediction model based on deep learning and random forest (PMCOM) is proposed. The model builds a training dataset from AOD remote sensing data, meteorological reanalysis data, and ground-based PM2.5 observations. First, the strong feature-extraction ability of a deep learning model is used to extract the deep hidden features in the training data. Then, the extracted hidden features are used to train a random forest model, and random forest regression produces the predicted PM2.5 concentration.
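The final stage is random forest regression. It combines bagging, where each tree is trained on a bootstrap resample, with random feature subsets at each split, and it averages the trees' predictions. The sketch below illustrates the technique in plain Python and is not the authors' configuration. The tree depth, the minimum leaf size, and the p/3 features-per-split default are assumptions (p/3 is a common convention for regression forests). The names `RandomForestRegressor` and `_build_tree` are illustrative. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would be used.

```python
import random
import statistics


def _sse(values):
    """Sum of squared errors around the mean (regression impurity)."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def _build_tree(X, y, depth, min_leaf, n_sub, rng):
    # Leaf = mean target, when the node is too deep, too small, or already pure
    if depth == 0 or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return statistics.fmean(y)
    best = None  # (sse, feature, threshold, left indices, right indices)
    # Random subspace: only a random subset of features is tried at this split
    for j in rng.sample(range(len(X[0])), n_sub):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:
        return statistics.fmean(y)
    _, j, t, left, right = best
    return (j, t,
            _build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, min_leaf, n_sub, rng),
            _build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, min_leaf, n_sub, rng))


def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node  # leaf value


class RandomForestRegressor:
    def __init__(self, n_trees=50, max_depth=6, min_leaf=2, max_features=None, seed=0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.max_features, self.seed = max_features, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        n_sub = min(p, self.max_features or max(1, p // 3))  # features tried per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          self.max_depth, self.min_leaf, n_sub, rng))
        return self

    def predict(self, X):
        # Forest prediction = average of the individual tree predictions
        return [statistics.fmean([_predict_tree(t, x) for t in self.trees]) for x in X]


# Made-up data: the target grows linearly with the first feature; the second is noise
X = [[i, (i * 7) % 10] for i in range(40)]
y = [30.0 + 10.0 * i for i in range(40)]
forest = RandomForestRegressor(n_trees=30, max_depth=5, max_features=2, seed=1).fit(X, y)
print(forest.predict([[0, 5], [39, 5]]))
```

In PMCOM, the rows of `X` would be the combined original and deep-learning features described above. The forest's averaging over bootstrap trees is what the abstract relies on to reduce the dependence on large training sets.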
To verify the effectiveness of this method, a series of experiments were carried out. The results show that PMCOM achieves good prediction accuracy in both the overall and the seasonal prediction scenarios. The combination of random forest and a long short-term memory (LSTM) neural network performs best in these experiments. Even when only 35% of the data are used for training, R² reaches 0.89 in the overall prediction experiment and exceeds 0.75 in every seasonal prediction experiment.
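The accuracy figures above are coefficients of determination, R² = 1 - SS_res/SS_tot. SS_res is the sum of squared differences between observed and predicted PM2.5, and SS_tot is the sum of squared deviations of the observations from their mean. The small helper below computes the metric itself. The abstract does not say whether R² was computed on held-out data or by cross-validation, so the sketch makes no assumption about the evaluation split. The function name is illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Made-up observed vs. predicted PM2.5 (μg/m³)
print(r_squared([40, 60, 80, 100], [45, 55, 85, 95]))  # 1 - 100/2000 = 0.95
```

An R² of 0.89 therefore means the model's squared errors total 11% of the observations' variance around their mean.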
Combining deep learning with random forest lets the random forest reduce the deep learning model's dependence on large amounts of data, while making full use of the high-level hidden features in existing historical data. This compensates for the random forest model's weakness in mining the internal associations in the data and improves the prediction accuracy of PM2.5 concentration.
Keywords: remote sensing; PM2.5; deep learning; random forest; long short-term memory (LSTM) neural network; PM2.5 combined model (PMCOM)