GeoCube: A spatio-temporal cube toward massive and multi-source EO data analysis
- 2022, Vol. 26, No. 6, pages 1051-1066
Print publication date: 2022-06-07
DOI: 10.11834/jrs.20210566
Gao F, Yue P, Jiang L C, Cao Z P, Liang Z H, Shangguan B Y, Hu L and Zhao S F. 2022. GeoCube: A spatio-temporal cube toward massive and multi-source EO data analysis. National Remote Sensing Bulletin, 26(6): 1051-1066
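With the establishment of a multi-platform Earth observation (EO) system, remote sensing big data keeps accumulating. Traditional file-based, scene-based image organization lacks a sufficiently unified spatio-temporal reference, and centralized storage hinders large-scale parallel analysis. EO big data analysis still lacks a unified data model and infrastructure theory. In recent years, research on data cubes has offered a promising basis for big data analysis infrastructure in the EO domain. Building on a unified, analysis-ready multidimensional data model and integrated EO data analysis functions, an EO big data analysis infrastructure based on data cubes can be constructed. This paper therefore proposes a multisource EO spatio-temporal cube for large-scale analysis. Compared with existing data cube approaches, it emphasizes the unified organization of multisource data, a cloud computing-based cube processing mode, and cube computation optimized by artificial intelligence. The work contributes to a new framework for spatio-temporal big data analysis, links to data cubes in the business intelligence (BI) domain, establishes a unified spatio-temporal organization model for spatio-temporal big data, and supports fast, large-scale analysis of EO data over large areas and long time series. Performance was compared with an open-source data cube, and the results show that the proposed multisource EO spatio-temporal cube has a clear advantage in processing performance.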
The volume of Earth Observation (EO) data has increased tremendously since the establishment of the EO system. Managing such big EO data and turning them into valuable information is a major challenge in the EO domain. This study proposes a multisource EO cube toward large-scale analysis.
The infrastructure accommodates multisource geospatial data, including raster and vector data. A cube model is designed, and four dimensions (product, space, time, and band) are formalized. Several cube exploration examples are presented. The infrastructure enables large-scale analysis based on cloud computing technology, and a set of distributed cube objects that extend the Spark Resilient Distributed Dataset (RDD) is designed for cube tiles. The distributed cube objects are compatible with multiple data sources, including raster and vector data. Multi-threaded computing is combined with cloud computing to form a hybrid parallelism that further improves data access and processing efficiency. Batch computation addresses the problem that a massive number of tiles cannot be loaded into memory at one time. Moreover, a machine learning-based approach is integrated into the cube to enhance parallel geoprocessing. The computational intensity of tiles can be predicted and saved in databases in advance, which eliminates the extra time cost of on-the-fly prediction for commonly used products. The design and implementation of the cube infrastructure, named GeoCube, are provided. It covers the ingestion and management of multisource geospatial data in the cube, the processing of geospatial/EO queries against different cube dimensions, and high-performance cube computing of large-scale geospatial datasets. The creation of such a geospatial data cube helps advance the EO data cube approach while keeping connections to the data cube in the business intelligence (BI) domain.
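To make the four-dimension tile model and the batch plus multi-thread idea more concrete, the following is a minimal local sketch in Python. It is not the GeoCube implementation: the abstract describes distributed cube objects built on Spark RDDs, whereas this stand-in uses only the standard library. The names `TileKey`, `batches`, `process_in_batches`, and `tile_mean`, the product name `product_a`, and the pixel values are all hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class TileKey:
    """Hypothetical key addressing one tile along the four cube dimensions."""
    product: str           # product dimension (illustrative product name)
    cell: Tuple[int, int]  # space dimension: (column, row) of a grid cell
    time: str              # time dimension: ISO date string
    band: str              # band dimension, e.g. "red" or "nir"


def batches(items: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_in_batches(
    tiles: Dict[TileKey, List[float]],
    func: Callable[[List[float]], float],
    batch_size: int = 2,
    threads: int = 2,
) -> Dict[TileKey, float]:
    """Apply `func` to every tile, one batch at a time, with a thread pool.

    Only one batch is handed to the pool at a time, which mimics the idea
    of not loading all tiles into memory at once.
    """
    results: Dict[TileKey, float] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for batch in batches(tiles.items(), batch_size):
            keys = [key for key, _ in batch]
            values = [pixels for _, pixels in batch]
            # pool.map preserves input order, so keys and results line up
            results.update(zip(keys, pool.map(func, values)))
    return results


def tile_mean(pixels: List[float]) -> float:
    """Toy per-tile operation: mean pixel value."""
    return sum(pixels) / len(pixels)


# Illustrative tiles; real tiles would hold raster arrays or vector features.
tiles = {
    TileKey("product_a", (0, 0), "2020-01-01", "red"): [1.0, 2.0, 3.0],
    TileKey("product_a", (0, 0), "2020-01-01", "nir"): [4.0, 5.0, 6.0],
    TileKey("product_a", (0, 0), "2020-02-01", "red"): [2.0, 4.0],
}
means = process_in_batches(tiles, tile_mean)
```

In a distributed setting the outer loop would run per partition on each worker, with the thread pool providing the second level of the hybrid parallelism; that mapping is an assumption of this sketch rather than something the abstract specifies.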
The performance of data query and access, data processing, and load balancing is presented. The results demonstrate the advantages of the GeoCube infrastructure. Several applications are presented, including cube OLAP operations, large-scale time-series analysis, and multisource data cube analysis.
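One way predicted computational intensity could feed into load balancing is sketched below. The abstract does not state which partitioning strategy GeoCube uses, so this example substitutes a plain greedy heuristic (heaviest tile first, always into the currently lightest partition). The function names, tile identifiers, and intensity values are hypothetical; in practice the intensities would come from the stored machine learning predictions.

```python
import heapq
from typing import Dict, List, Tuple


def balance_tiles(intensity: Dict[str, float], n_partitions: int) -> List[List[str]]:
    """Greedily assign tiles to partitions using predicted intensity.

    Tiles are visited from heaviest to lightest and each one goes to the
    partition with the smallest accumulated load so far.
    """
    # Heap entries are (accumulated load, partition index).
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(n_partitions)]
    heapq.heapify(heap)
    partitions: List[List[str]] = [[] for _ in range(n_partitions)]
    # Sort by descending intensity; tie-break on tile id for determinism.
    ordered = sorted(intensity.items(), key=lambda kv: (-kv[1], kv[0]))
    for tile_id, cost in ordered:
        load, idx = heapq.heappop(heap)
        partitions[idx].append(tile_id)
        heapq.heappush(heap, (load + cost, idx))
    return partitions


def partition_loads(intensity: Dict[str, float], partitions: List[List[str]]) -> List[float]:
    """Total predicted intensity per partition."""
    return [sum(intensity[t] for t in part) for part in partitions]


# Hypothetical predicted intensities for five tiles.
predicted = {"t1": 8.0, "t2": 7.0, "t3": 6.0, "t4": 5.0, "t5": 4.0}
parts = balance_tiles(predicted, 2)
loads = partition_loads(predicted, parts)
```

The greedy rule does not guarantee an optimal split, but it is cheap and keeps the heaviest tiles from piling up on one worker, which is the effect a load-balance measurement is meant to capture.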
In conclusion, compared with existing cube approaches, the proposed infrastructure emphasizes the accommodation of multisource geospatial data, including raster and vector data, in the cube; cube tile processing with cloud computing; and artificial intelligence (machine learning)-enabled cube computation. Such a cube can inherit not only the large-scale processing capabilities of EO data cubes but also the data management capabilities of BI data cubes.
Keywords: remote sensing; EO data; spatio-temporal cube; artificial intelligence; large-scale analysis