Spatial clustering method considering spatial distribution feature in the attribute domain
2017, Vol. 21, No. 6, pp. 917-927
Print publication date: 2017-9-15
Accepted: 2017-5-27
DOI: 10.11834/jrs.20176493
Zhu J, Sun Y Z and Li J L. 2017. Spatial clustering method considering spatial distribution feature in the attribute domain. Journal of Remote Sensing, 21(6): 917–927
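Spatial clustering should satisfy both proximity in spatial location and similarity in attributes. Against this background, to meet the need for attribute clustering that accounts for tendency and heterogeneity among spatially adjacent entities, a spatial clustering algorithm based on graph theory and information entropy is proposed. Building on spatial-location clustering with a Delaunay triangulation, the algorithm introduces information entropy and adopts a multivariate similarity measure to overcome the defects of binary (pairwise) relations in attribute clustering. A local parameter measurement method based on the principle of "equal probability maximum entropy" is also proposed to express the local variation of the attribute distribution among neighboring objects. The proposed method was compared with the multi-constraint clustering method and the DDBSC clustering method. The results show that: (1) when attributes are unevenly distributed in space, the clustering accuracy of the proposed method is higher than that of the multi-constraint and DDBSC methods; in particular, as the unevenness of the attribute distribution increases, the DDBSC and multi-constraint algorithms misclassify entities inside spatial clusters as noise; (2) regarding sensitivity to outliers, all three methods can identify the locations of outliers, but the DDBSC and multi-constraint algorithms are somewhat sensitive to them and their clustering results obscure the tendency of the attribute distribution, whereas the proposed method is barely affected by outliers. Simulated experiments and a real-world case show that, while preserving spatial proximity, the proposed method has the following advantages: first, it reflects the tendency of entity attributes in their spatial distribution; second, it accommodates attributes that are unevenly distributed in space; third, it is robust to outliers.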
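The spatial-location stage mentioned above (clustering on a Delaunay triangulation, with long edges removed) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the edge list is hand-written to stand in for a triangulation, and the cutoff rule (mean plus `k` standard deviations of edge length) as well as the names `prune_long_edges` and `connected_components` are assumptions, since the abstract does not state the actual edge length constraint.

```python
import math
from statistics import mean, pstdev


def prune_long_edges(points, edges, k=1.0):
    """Keep edges no longer than mean + k * std of all edge lengths.

    The cutoff rule is an assumed, generic global constraint.
    """
    lengths = {e: math.dist(points[e[0]], points[e[1]]) for e in edges}
    cutoff = mean(lengths.values()) + k * pstdev(lengths.values())
    return [e for e in edges if lengths[e] <= cutoff]


def connected_components(n, edges):
    """Group point indices 0..n-1 into components using union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two tight groups of points joined by two long "bridge" edges.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
# Hand-written edges standing in for a triangulation (hypothetical input).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (1, 3), (2, 3)]

short_edges = prune_long_edges(points, edges)
clusters = connected_components(len(points), short_edges)
print(clusters)
```

In this toy input the two bridge edges are far longer than the rest, so they fall above the cutoff and the remaining graph splits into the two point groups. Attribute similarity would then be assessed only among entities that remain spatially connected.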
Spatial clustering is important for spatial data mining and spatial analysis. Spatial objects in the same cluster should be similar in both the spatial and attribute domains. Tendency and heterogeneity are important characteristics of geographic phenomena. Currently, most spatial clustering algorithms consider only one of tendency and heterogeneity, and therefore fail to obtain satisfactory clustering results. To overcome these limitations, a spatial clustering method based on graph theory and information entropy is developed in this work. The proposed algorithm involves two main steps: constructing spatial proximity relationships and clustering spatial objects with similar attributes. Delaunay triangulation with edge length constraints is first employed to construct spatial proximity relationships among objects. To obtain satisfactory results in spatial clustering with attribute similarity, information entropy is introduced to overcome the defects of similarity measures based on binary relations, so that the clustering tendency of geographical phenomena can be reflected. Furthermore, a local parameter measurement method based on the principle of "equal probability maximum entropy" is designed to adapt to local changes in the attribute distribution. The performance of the proposed algorithm was evaluated experimentally by comparing it with two state-of-the-art alternatives: the DDBSC and multi-constraint algorithms. Results showed that our method outperformed the two other algorithms when attributes are unevenly distributed in space. The sensitivity analysis showed that our method was the least sensitive to outliers. The effectiveness and practicability of the proposed algorithm were validated using simulated and real spatial datasets. The experiments illustrate three advantages of our algorithm: (1) It can reflect the tendency of the entity attribute in the spatial distribution. (2) It can meet the requirement that attributes are unevenly distributed in space. (3) It can discover clusters with arbitrary shape and is robust to outliers.
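The "equal probability maximum entropy" principle rests on a standard property of Shannon entropy: for n outcomes, entropy is largest (log2 n) when all probabilities are equal. The sketch below only illustrates that property on attribute values of a hypothetical neighborhood. It is not the paper's local parameter; the function `normalized_entropy` and the conversion of attribute values into shares are assumptions made for this example.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def normalized_entropy(values):
    """Entropy of the shares values / sum(values), divided by log2(n).

    log2(n) is the entropy of the equal-probability case, so a result
    near 1 means the values are close to evenly distributed.
    Assumes at least two strictly positive values (illustrative only).
    """
    if len(values) < 2 or any(v <= 0 for v in values):
        raise ValueError("need at least two strictly positive values")
    total = sum(values)
    shares = [v / total for v in values]
    return shannon_entropy(shares) / math.log2(len(values))


# Attribute values of four neighboring entities (made-up numbers).
even = [10, 10, 10, 10]      # identical attributes: equal shares
uneven = [10, 10, 10, 100]   # one value dominates the neighborhood

h_even = normalized_entropy(even)
h_uneven = normalized_entropy(uneven)
print(h_even, h_uneven)
```

With identical values the shares are equal and the normalized entropy reaches its maximum of 1; a dominant value pulls it below 1. A measure of this kind depends on the whole neighborhood at once rather than on pairwise differences, which is the general idea behind replacing binary relations with an entropy-based measure.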
Keywords: spatial clustering; Delaunay triangulation; information entropy; tendency; heterogeneity