Abstract
A variety of methods have been proposed for Chinese document classification, but the high dimensionality of the feature vector remains one of their most significant limitations. This paper points out an important difference between Chinese and English document classification, then proposes an efficient approach that uses a Genetic Algorithm to reduce the dimension of the feature vector in Chinese document classification. By selecting only the set of more “important” features, the proposed method significantly reduces the number of Chinese feature words. Experiments, together with several related studies, show that the proposed method achieves substantial dimension reduction with little loss in the rate of correctly classified documents.
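The abstract does not specify the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of generic Genetic Algorithm feature selection, not the authors' method. It assumes a bitmask encoding (1 keeps a feature word, 0 drops it), tournament selection, one-point crossover, bit-flip mutation, and elitism. The names `ga_select_features`, `toy_fitness`, `weights`, and `penalty` are hypothetical, and the toy fitness merely stands in for a real measure such as classifier accuracy on a validation set.

```python
import random

def ga_select_features(n_features, fitness, pop_size=30, generations=40,
                       crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Generic GA over feature bitmasks (assumed design, not the paper's).

    Returns the best bitmask found and its fitness value.
    Requires n_features >= 2 so a crossover cut point exists.
    """
    rng = random.Random(seed)
    # Each chromosome is a bitmask: 1 keeps the feature word, 0 drops it.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    best_fit = fitness(best)

    for _ in range(generations):
        scored = [(fitness(ch), ch) for ch in population]

        def tournament():
            # Pick two distinct individuals, keep the fitter one.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [best[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        population = next_pop

        cand = max(population, key=fitness)
        cand_fit = fitness(cand)
        if cand_fit > best_fit:
            best, best_fit = cand[:], cand_fit
    return best, best_fit

# Hypothetical per-feature importance scores (made up for illustration).
weights = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02, 0.6, 0.01]
penalty = 0.3  # assumed cost per kept feature, rewarding smaller sets

def toy_fitness(mask):
    # Reward important features, penalise the size of the selected set.
    return sum(w * g for w, g in zip(weights, mask)) - penalty * sum(mask)

mask, score = ga_select_features(len(weights), toy_fitness)
print(mask, round(score, 3))
```

In a real setting the fitness would typically combine the accuracy of a classifier trained on the selected feature words with a term that favours fewer features, which mirrors the trade-off the abstract describes between dimension reduction and the correctly classified rate.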
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, Z., Lu, L., Xi, S., Sun, F. (2009). An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm. In: Yu, W., He, H., Zhang, N. (eds) Advances in Neural Networks – ISNN 2009. ISNN 2009. Lecture Notes in Computer Science, vol 5552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01510-6_55
DOI: https://doi.org/10.1007/978-3-642-01510-6_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01509-0
Online ISBN: 978-3-642-01510-6
eBook Packages: Computer Science, Computer Science (R0)