Scene classification of remote sensing images by optimizing visual vocabulary concerning scene label information
2017, Vol. 21, No. 2, Pages 280-290
Print publication date: 2017-03
Accepted: 2016-10-03
DOI: 10.11834/jrs.201761971
Yan L, Zhu R X, Liu Y, et al. Scene classification of remote sensing images by optimizing visual vocabulary concerning scene label information[J]. Journal of Remote Sensing, 2017, 21(2): 280-290.
The visual vocabulary of the traditional bag-of-words model ignores the category information carried by the scenes themselves, which makes it difficult to distinguish scenes that belong to different categories but look alike. To address this problem, this paper proposes a visual word optimization method that takes scene category information into account: Boiman's assignment strategy and principal component analysis are used, respectively, to reduce the ambiguity and the redundancy of the visual words of different scene categories, strengthening the discriminative power of the visual vocabulary. The algorithm computes the image frequency of each visual word and removes words with low image frequency from the vocabulary, yielding a class-specific vocabulary for each scene category. It then computes class-specific histograms and fuses them with the original visual histograms to obtain fused histograms for the different scene categories, which serve as the input vectors for SVM training and classification. The algorithm was validated on a standard remote sensing scene dataset. The experimental results show that it adapts to visual vocabularies of different sizes; by adding scene category information to the model, it strengthens the discriminative power of the bag-of-words model and effectively reduces the probability of scene misclassification, reaching an overall classification accuracy of 89.5%, better than the traditional remote sensing scene classification algorithm based on the spatial-pyramid-matching bag-of-words model.
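The image-frequency pruning step described above can be sketched as follows (a minimal illustration, not the authors' implementation: it assumes each training image of one category is already encoded as a visual-word histogram, and the threshold and all names are hypothetical):

```python
import numpy as np

def class_specific_codebook(word_histograms, min_image_frequency=0.05):
    """Keep only visual words with sufficient image frequency, i.e. words
    that occur in at least the given fraction of a category's training images.

    word_histograms: (n_images, n_words) array of visual-word counts for
    one scene category. Returns the indices of the retained visual words,
    which define the class-specific vocabulary for that category.
    """
    # Image frequency of a word = fraction of images containing it at least once.
    image_frequency = (word_histograms > 0).mean(axis=0)
    # Discard words whose image frequency falls below the threshold.
    return np.where(image_frequency >= min_image_frequency)[0]

# Toy example: 4 training images of one category, vocabulary of 5 visual words.
hists = np.array([
    [3, 0, 1, 0, 2],
    [1, 0, 2, 0, 0],
    [0, 0, 1, 0, 1],
    [2, 0, 3, 1, 0],
])
kept = class_specific_codebook(hists, min_image_frequency=0.5)
print(kept)  # words 0, 2 and 4 occur in at least half of the images
```

Words that rarely appear in a category's images carry little evidence for that category, so removing them sharpens the class-specific histogram computed next.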
The traditional Bag of Words (BOW) model disregards the scene label information of remote sensing images as well as the ambiguity and redundancy of its visual vocabulary, so it is poorly suited to classifying categories with similar appearance. To handle this problem, we propose an image scene classification algorithm based on the optimization of visual words with respect to scene label information. The procedure is as follows. First, images are divided into patches by Spatial Pyramid Matching, and Scale-Invariant Feature Transform (SIFT) features are extracted from each local image patch. These features are clustered with K-means, and a histogram is formed for each patch at the different pyramid levels using the Boiman assignment strategy. Image frequency is then adopted as the feature selection criterion on the visual words of each category, eliminating words irrelevant to a specific category and yielding a class-specific codebook; Principal Component Analysis (PCA) is applied afterwards to eliminate redundant visual words. Finally, the class-specific histograms of each image patch at the different pyramid levels are fused with the traditional histogram using an adaptive weight, and the fused histogram is fed to a Support Vector Machine (SVM).
We conducted experiments on standard scene classification datasets. Five experiments demonstrate the performance of the proposed algorithm. The first shows that it outperforms methods that ignore scene label information, improving accuracy by approximately 6 percentage points. The second shows that it performs well on categories with similar appearance, with classification errors decreasing in most categories. The third demonstrates that its accuracy is higher at each individual pyramid level, and that combining pyramid levels yields even higher accuracy. The fourth shows that the adaptive weighted fusion method is more accurate than fusion without adaptive weighting. The final experiment demonstrates that the proposed algorithm outperforms other representative methods under the same conditions.
In summary, this study proposes a method based on the optimization of visual words with respect to scene label information. The algorithm extracts SIFT features at different pyramid levels and applies the Boiman strategy to generate universal histograms; image frequency is used as the feature selection criterion to remove visual words irrelevant to a specific category, and PCA is then applied to remove redundancy and obtain class-specific codebooks and histograms. Finally, a practical adaptive weighted fusion method that combines the traditional histograms of different levels with the class-specific histogram is proposed, and the fused histograms are passed to an SVM for training and classification. The experimental results show that the proposed algorithm performs well on categories with similar appearance and displays higher stability. However, it maps each SIFT descriptor to only one visual word; future research could experiment with assigning one SIFT descriptor to several visual words and with other feature selection procedures.
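The adaptive weighted fusion step can be illustrated with the following minimal sketch. It is not the paper's implementation: the paper derives the weight adaptively from the data, whereas here the weight is simply passed in as a parameter, all names are hypothetical, and the SVM training stage is omitted.

```python
import numpy as np

def fuse_histograms(traditional_hist, class_hist, weight=0.5):
    """Fuse a traditional BOW histogram with a class-specific histogram.

    Each histogram is L1-normalized, scaled by its share of the fusion
    weight, and the two are concatenated. The resulting fused vector is
    what would be fed to an SVM classifier.
    """
    t = traditional_hist / max(traditional_hist.sum(), 1e-12)
    c = class_hist / max(class_hist.sum(), 1e-12)
    return np.concatenate([(1.0 - weight) * t, weight * c])

trad = np.array([4.0, 2.0, 2.0])   # histogram over the full vocabulary
spec = np.array([3.0, 1.0])        # histogram over the class-specific words
fused = fuse_histograms(trad, spec, weight=0.4)
print(fused)  # 5-dimensional fused feature vector
```

Concatenation (rather than summation) keeps the two representations in separate dimensions, so the SVM can weight the class-specific evidence independently of the universal BOW evidence.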
Keywords: scene classification; class-specific histogram; optimization of visual words; principal component analysis; image frequency; adaptive weighted fusion
Agarwal A and Triggs B. 2008. Multilevel image coding with hyperfeatures. International Journal of Computer Vision, 78(1): 15–27
Bo L F and Sminchisescu C. 2009. Efficient match kernel between sets of features for visual recognition//Advances in Neural Information Processing Systems. Vancouver, British Columbia, Canada: Curran Associates, Inc.: 135–143
Boiman O, Shechtman E and Irani M. 2008. In defense of nearest-neighbor based image classification//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Anchorage, AK: IEEE: 1–8
Cao L L and Li F F. 2007. Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes//Proceedings of the IEEE 11th International Conference on Computer Vision. Rio de Janeiro: IEEE: 1–8
Chen J N, Huang H K, Tian S F and Qu Y L. 2009. Feature selection for text classification with Naïve Bayes. Expert Systems with Applications, 36(3): 5432–5435
Csurka G, Dance C R, Fan L Z, Willamowski J and Bray C. 2004. Visual categorization with bags of keypoints//Workshop on Statistical Learning in Computer Vision. Prague: ECCV: 1–22
Dalal N and Triggs B. 2005. Histograms of oriented gradients for human detection//Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05). San Diego, CA, USA: IEEE: 886–893
Datta R, Joshi D, Li J and Wang J Z. 2008. Image retrieval: ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2): Article No. 5
Hao J and Jie X. 2010. Improved bags-of-words algorithm for scene recognition//Proceedings of the 2010 2nd International Conference on Signal Processing Systems (ICSPS). Dalian: IEEE: V2-279–V2-282
Jiang Y, Wang R S and Wang C. 2010. Scene classification with context pyramid features. Journal of Computer-Aided Design and Computer Graphics, 22(8): 1366–1373 (in Chinese)
Jiang Y G, Ngo C W and Yang J. 2007. Towards optimal bag-of-features for object categorization and semantic video retrieval//Proceedings of the 6th ACM International Conference on Image and Video Retrieval. Amsterdam, The Netherlands: ACM: 494–501
Juneja M, Vedaldi A, Jawahar C V and Zisserman A. 2013. Blocks that shout: distinctive parts for scene classification//Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR: IEEE: 923–930
Jurie F and Triggs B. 2005. Creating efficient codebooks for visual recognition//Proceedings of the 10th IEEE International Conference on Computer Vision. Beijing, China: IEEE: 604–610
Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R and Li F F. 2014. Large-scale video classification with convolutional neural networks//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Columbus, OH: IEEE: 1725–1732
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Advances in Neural Information Processing Systems. Lake Tahoe, Nevada: Curran Associates, Inc.: 1097–1105
Larlus D and Jurie F. 2006. Latent mixture vocabularies for object categorization//Proceedings of the 17th British Machine Vision Conference. BMVA Press: 959–968
Li Q, Zhang H G, Guo J, Bhanu B and An L. 2012. Improving bag-of-words scheme for scene categorization. The Journal of China Universities of Posts and Telecommunications, 19: 166–171
Moosmann F, Nowak E and Jurie F. 2008. Randomized clustering forests for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(9): 1632–1646
Oliva A and Torralba A. 2001. Modeling the shape of the scene: a holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3): 145–175
Perronnin F. 2008. Universal and adapted vocabularies for generic visual categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7): 1243–1256
Sadeghi F and Tappen M F. 2012. Latent pyramidal regions for recognizing scenes//Proceedings of the 12th European Conference on Computer Vision. Berlin, Heidelberg: Springer: 228–241
Singh S, Gupta A and Efros A A. 2012. Unsupervised discovery of mid-level discriminative patches//Proceedings of the 12th European Conference on Computer Vision. Berlin, Heidelberg: Springer: 73–86
Sivic J and Zisserman A. 2003. Video Google: a text retrieval approach to object matching in videos//Proceedings of the 9th IEEE International Conference on Computer Vision. Nice, France: IEEE: 1470–1477
Van Gemert J C, Veenman C J, Smeulders A W M and Geusebroek J M. 2010. Visual word ambiguity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(7): 1271–1283
Wu J X and Rehg J M. 2011. CENTRIST: a visual descriptor for scene categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8): 1489–1501
Xie W J. 2011. Research on Middle Semantic Representation Based Image Scene Classification. Beijing: Beijing Jiaotong University (in Chinese)
Yang J C, Yu K, Gong Y H and Huang T. 2009. Linear spatial pyramid matching using sparse coding for image classification//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami, FL: IEEE: 1794–1801
Yang Y and Newsam S. 2010. Bag-of-visual-words and spatial extensions for land-use classification//Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems. San Jose, California: ACM: 270–279
Yang Y M and Pedersen J O. 1997. A comparative study on feature selection in text categorization//Proceedings of the Fourteenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.: 412–420
Zhao L J, Tang P, Huo L Z and Zheng K. 2014. Review of the bag-of-visual-words models in image scene classification. Journal of Image and Graphics, 19(3): 333–343 (in Chinese)