Scene classification of remote sensing images by optimizing visual vocabulary concerning scene label information
2017, Vol. 21, No. 2, Pages 280-290
Print publication date: 2017-03
Accepted: 2016-10-03
DOI: 10.11834/jrs.201761971
Yan L, Zhu R X, Liu Y, et al. Scene classification of remote sensing images by optimizing visual vocabulary concerning scene label information[J]. Journal of Remote Sensing, 2017, 21(2): 280-290.
The visual vocabulary of the traditional bag-of-words model ignores the category information carried by the scenes themselves, which makes it difficult to distinguish scenes that belong to different categories but look alike. To address this problem, this paper proposes a visual word optimization method that takes scene category information into account: Boiman's assignment strategy and principal component analysis are used, respectively, to reduce the ambiguity and the redundancy of the visual words of different scene categories, strengthening the discriminative power of the visual vocabulary. The algorithm computes the image frequency of each visual word and removes words with low image frequency from the vocabulary, yielding a class-specific vocabulary for each scene category. It then computes class-specific histograms and fuses them with the original visual histograms to obtain fused histograms for the different scene categories, which serve as the input vectors for SVM training and classification. The algorithm was validated on a standard remote sensing scene dataset. The experimental results show that it adapts to visual vocabularies of different sizes; by adding scene category information to the model, it strengthens the discriminative power of the bag-of-words model and effectively reduces the probability of scene misclassification, reaching an overall classification accuracy of 89.5%, better than the traditional remote sensing scene classification algorithm based on the spatial-pyramid-matching bag-of-words model.
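The image-frequency pruning step described above can be sketched as follows (a minimal illustration, not the authors' implementation: it assumes each training image of one category is already encoded as a visual-word histogram, and the threshold and all names are hypothetical):

```python
import numpy as np

def class_specific_codebook(word_histograms, min_image_frequency=0.05):
    """Keep only visual words with sufficient image frequency, i.e. words
    that occur in at least the given fraction of a category's training images.

    word_histograms: (n_images, n_words) array of visual-word counts for
    one scene category. Returns the indices of the retained visual words,
    which define the class-specific vocabulary for that category.
    """
    # Image frequency of a word = fraction of images containing it at least once.
    image_frequency = (word_histograms > 0).mean(axis=0)
    # Discard words whose image frequency falls below the threshold.
    return np.where(image_frequency >= min_image_frequency)[0]

# Toy example: 4 training images of one category, vocabulary of 5 visual words.
hists = np.array([
    [3, 0, 1, 0, 2],
    [1, 0, 2, 0, 0],
    [0, 0, 1, 0, 1],
    [2, 0, 3, 1, 0],
])
kept = class_specific_codebook(hists, min_image_frequency=0.5)
print(kept)  # words 0, 2 and 4 occur in at least half of the images
```

Words that rarely appear in a category's images carry little evidence for that category, so removing them sharpens the class-specific histogram computed next.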
The traditional Bag of Words (BOW) model disregards the scene label information of remote sensing images as well as the ambiguity and redundancy of its visual vocabulary, so it is poorly suited to classifying categories with similar appearance. To handle this problem, we propose an image scene classification algorithm based on the optimization of visual words with respect to scene label information. The procedure is as follows. First, images are divided into patches by Spatial Pyramid Matching, and Scale-Invariant Feature Transform (SIFT) features are extracted from each local image patch. These features are clustered with K-means, and a histogram is formed for each patch at the different pyramid levels using the Boiman assignment strategy. Image frequency is then adopted as the feature selection criterion on the visual words of each category, eliminating words irrelevant to a specific category and yielding a class-specific codebook; Principal Component Analysis (PCA) is applied afterwards to eliminate redundant visual words. Finally, the class-specific histograms of each image patch at the different pyramid levels are fused with the traditional histogram using an adaptive weight, and the fused histogram is fed to a Support Vector Machine (SVM).
We conducted experiments on standard scene classification datasets. Five experiments demonstrate the performance of the proposed algorithm. The first shows that it outperforms methods that ignore scene label information, improving accuracy by approximately 6 percentage points. The second shows that it performs well on categories with similar appearance, with classification errors decreasing in most categories. The third demonstrates that its accuracy is higher at each individual pyramid level, and that combining pyramid levels yields even higher accuracy. The fourth shows that the adaptive weighted fusion method is more accurate than fusion without adaptive weighting. The final experiment demonstrates that the proposed algorithm outperforms other representative methods under the same conditions.
In summary, this study proposes a method based on the optimization of visual words with respect to scene label information. The algorithm extracts SIFT features at different pyramid levels and applies the Boiman strategy to generate universal histograms; image frequency is used as the feature selection criterion to remove visual words irrelevant to a specific category, and PCA is then applied to remove redundancy and obtain class-specific codebooks and histograms. Finally, a practical adaptive weighted fusion method that combines the traditional histograms of different levels with the class-specific histogram is proposed, and the fused histograms are passed to an SVM for training and classification. The experimental results show that the proposed algorithm performs well on categories with similar appearance and displays higher stability. However, it maps each SIFT descriptor to only one visual word; future research could experiment with assigning one SIFT descriptor to several visual words and with other feature selection procedures.
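The adaptive weighted fusion step can be illustrated with the following minimal sketch. It is not the paper's implementation: the paper derives the weight adaptively from the data, whereas here the weight is simply passed in as a parameter, all names are hypothetical, and the SVM training stage is omitted.

```python
import numpy as np

def fuse_histograms(traditional_hist, class_hist, weight=0.5):
    """Fuse a traditional BOW histogram with a class-specific histogram.

    Each histogram is L1-normalized, scaled by its share of the fusion
    weight, and the two are concatenated. The resulting fused vector is
    what would be fed to an SVM classifier.
    """
    t = traditional_hist / max(traditional_hist.sum(), 1e-12)
    c = class_hist / max(class_hist.sum(), 1e-12)
    return np.concatenate([(1.0 - weight) * t, weight * c])

trad = np.array([4.0, 2.0, 2.0])   # histogram over the full vocabulary
spec = np.array([3.0, 1.0])        # histogram over the class-specific words
fused = fuse_histograms(trad, spec, weight=0.4)
print(fused)  # 5-dimensional fused feature vector
```

Concatenation (rather than summation) keeps the two representations in separate dimensions, so the SVM can weight the class-specific evidence independently of the universal BOW evidence.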
Keywords: scene classification; class-specific histogram; optimization of visual words; principal component analysis; image frequency; adaptive weighted fusion
Agarwal A and Triggs B. 2008. Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1): 15–27
Bo L F and Sminchisescu C. 2009. Efficient match kernel between sets of features for visual recognition//Advances in Neural Information Processing Systems. Vancouver, British Columbia, Canada: Curran Associates, Inc.: 135–143
Boiman O, Shechtman E and Irani M. 2008. In defense of nearest-neighbor based image classification//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK: IEEE: 1–8
Cao L L and Li F F. 2007. Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes//Proceedings of the IEEE 11th International Conference on Computer Vision. Rio de Janeiro: IEEE: 1–8
Chen J N, Huang H K, Tian S F and Qu Y L. 2009. Feature selection for text classification with Naïve Bayes. Expert Systems with Applications, 36(3): 5432–5435
Csurka G, Dance C R, Fan L Z, Willamowski J and Bray C. 2004. Visual categorization with bags of keypoints//Workshop on Statistical Learning in Computer Vision. Prague: ECCV: 1–22
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego, CA, USA: IEEE: 886–893
Datta R, Joshi D, Li J and Wang J Z. 2008. Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2): Article No. 5
Hao J and Jie X. 2010. Improved bags-of-words algorithm for scene recognition//Proceedings of the 2010 2nd International Conference on Signal Processing Systems (ICSPS). Dalian: IEEE: V2-279–V2-282
Jiang Y, Wang R S and Wang C. 2010. Scene classification with context pyramid features. Journal of Computer-Aided Design and Computer Graphics, 22(8): 1366–1373 (in Chinese)
Jiang Y G, Ngo C W and Yang J. 2007. Towards optimal bag-of-features for object categorization and semantic video retrieval//Proceedings of the 6th ACM International Conference on Image and Video Retrieval. Amsterdam, The Netherlands: ACM: 494–501
Juneja M, Vedaldi A, Jawahar C V and Zisserman A. 2013. Blocks that shout: distinctive parts for scene classification//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR: IEEE: 923–930
Jurie F and Triggs B. 2005. Creating efficient codebooks for visual recognition//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE: 604–610
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R and Li F F. 2014. Large-scale video classification with convolutional neural networks//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE: 1725–1732
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Advances in Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates, Inc.: 1097–1105
Larlus D and Jurie F. 2006. Latent mixture vocabularies for object categorization//Proceedings of the 17th British Machine Vision Conference. BMVA Press: 959–968
Li Q, Zhang H G, Guo J, Bhanu B and An L. 2012. Improving bag-of-words scheme for scene categorization. The Journal of China Universities of Posts and Telecommunications, 19: 166–171
Moosmann F, Nowak E and Jurie F. 2008. Randomized clustering forests for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9): 1632–1646
Oliva A and Torralba A. 2001. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3): 145–175
Perronnin F. 2008. Universal and adapted vocabularies for generic visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7): 1243–1256
Sadeghi F and Tappen M F. 2012. Latent pyramidal regions for recognizing scenes//Proceedings of the 12th European Conference on Computer Vision. Berlin, Heidelberg: Springer: 228–241
Singh S, Gupta A and Efros A A. 2012. Unsupervised discovery of mid-level discriminative patches//Proceedings of the 12th European Conference on Computer Vision. Berlin, Heidelberg: Springer: 73–86
Sivic J and Zisserman A. 2003. Video Google: a text retrieval approach to object matching in videos//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE: 1470–1477
Van Gemert J C, Veenman C J, Smeulders A W M and Geusebroek J M. 2010. Visual word ambiguity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7): 1271–1283
Wu J X and Rehg J M. 2011. CENTRIST: a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8): 1489–1501
Xie W J. 2011. Research on Middle Semantic Representation Based Image Scene Classification. Beijing: Beijing Jiaotong University (in Chinese)
Yang J C, Yu K, Gong Y H and Huang T. 2009. Linear spatial pyramid matching using sparse coding for image classification//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE: 1794–1801
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, California: ACM: 270–279
Yang Y M and Pedersen J O. 1997. A comparative study on feature selection in text categorization//Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.: 412–420
Zhao L J, Tang P, Huo L Z and Zheng K. 2014. Review of the bag-of-visual-words models in image scene classification. Journal of Image and Graphics, 19(3): 333–343 (in Chinese)