An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm

Guo, Zhishan; Lu, Li; Xi, Shijia; Sun, Fuchun

doi:10.1007/978-3-642-01510-6_55

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5552)

Included in the conference series: International Symposium on Neural Networks

Abstract

Many methods have been proposed for Chinese document classification, but the high dimensionality of the feature vector remains one of their most significant limitations. This paper points out an important difference between Chinese and English document classification, then proposes an efficient approach that uses a genetic algorithm to reduce the dimension of the feature vector in Chinese document classification. By selecting only the most “important” features, the proposed method significantly reduces the number of Chinese feature words. Experiments, combined with several related studies, show that the proposed method substantially reduces dimensionality with little loss in classification accuracy.
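The abstract does not give the chromosome encoding, fitness function, or genetic operators, so the following is only a minimal sketch of genetic-algorithm feature selection in general, not the authors' method. It assumes a binary mask over feature words, a nearest-centroid classifier as a stand-in evaluator, and a fitness that trades validation accuracy against the number of kept features; the names `centroid_accuracy`, `fitness`, `select_features`, the `penalty` weight, and the toy data are all hypothetical.

```python
import random

def centroid_accuracy(mask, train, valid):
    """Accuracy of a nearest-centroid classifier using only the masked features."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # no features kept: treat as useless
    sums, counts = {}, {}
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(idx))
        for k, i in enumerate(idx):
            acc[k] += vec[i]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for vec, label in valid:
        pred = min(
            centroids,
            key=lambda c: sum((vec[i] - m) ** 2 for i, m in zip(idx, centroids[c])),
        )
        correct += pred == label
    return correct / len(valid)

def fitness(mask, train, valid, penalty=0.01):
    # Assumed trade-off: reward accuracy, charge a small cost per kept feature.
    return centroid_accuracy(mask, train, valid) - penalty * sum(mask)

def select_features(train, valid, n_features, pop_size=20,
                    generations=30, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    pop[0] = [1] * n_features  # keep the full feature set as a baseline individual
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, train, valid), reverse=True)
        elite = ranked[: pop_size // 2]  # elitism: the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(m, train, valid))

# Toy term-count vectors (hypothetical): four feature words, two classes.
train = [([3, 0, 1, 1], "sports"), ([4, 0, 0, 1], "sports"),
         ([0, 3, 1, 0], "finance"), ([0, 4, 0, 1], "finance")]
valid = [([2, 0, 1, 0], "sports"), ([0, 2, 0, 1], "finance")]
best = select_features(train, valid, n_features=4)
print(best, fitness(best, train, valid))
```

Because the full-feature mask is seeded into the initial population and the best half always survives, the returned mask never scores below the full feature set under this fitness. A real experiment would evaluate the selected features on data held out from the search.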

Author information

Authors and Affiliations

State Key Laboratory on Intelligent Technology and System, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Zhishan Guo, Li Lu, Shijia Xi & Fuchun Sun

Editor information

Editors and Affiliations

Departamento de Control Automático, CINVESTAV-IPN, A.P. 14-740, Av. IPN 2508, México, D.F., 07360, México
Wen Yu
Department of Electrical and Computer Engineering, Stevens Institute of Technology, Hoboken, NJ 07030, USA
Haibo He
Dept. of Electrical and Computer Engineering, South Dakota School of Mines & Technology, 501 E. St. Joseph Street, Rapid City, SD 57701, USA
Nian Zhang

About this paper

Cite this paper

Guo, Z., Lu, L., Xi, S., Sun, F. (2009). An Effective Dimension Reduction Approach to Chinese Document Classification Using Genetic Algorithm. In: Yu, W., He, H., Zhang, N. (eds) Advances in Neural Networks – ISNN 2009. ISNN 2009. Lecture Notes in Computer Science, vol 5552. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01510-6_55

DOI: https://doi.org/10.1007/978-3-642-01510-6_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01509-0
Online ISBN: 978-3-642-01510-6
eBook Packages: Computer Science, Computer Science (R0)
