AUTOMATIC TEXT CLASSIFICATION- EFFICACIOUSLY EMPLOYING THE CONVOLUTED NEURAL NETWORKS TO HARNESS THE NATURAL LANGUAGE PROCESSING TOOLS & TECHNIQUES
Pushkar Garg
Abstract
Automatic text classification is a fundamental task in natural language processing that helps users select vital information from massive text resources. To better represent the semantic meaning of a text, and to avoid the manual feature extraction that traditional methods require, we use the TF-IDF algorithm to calculate the weight of each word in a text and then weight the word vectors by their TF-IDF values. This produces text vectors with clearer semantic meaning. We then input the text vector matrix into a Convolutional Neural Network (CNN), which extracts text features automatically. Extensive experiments on two data sets demonstrate that our approach effectively improves classification accuracy, reaching 96.28% and 96.97% on the two data sets, respectively.
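The weighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or word-vector model is used, so the formula here (relative term frequency times log(N/df)), the toy corpus, and the 3-dimensional `embeddings` are all assumptions made for illustration. The CNN itself is left out; the sketch only builds the weighted text matrix that such a network would take as input.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document.

    Assumed variant: tf = count / doc length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


def weighted_text_matrix(doc, weights, embeddings):
    """Scale each token's word vector by its TF-IDF weight (one row per token)."""
    return [[weights[word] * x for x in embeddings[word]] for word in doc]


# Toy corpus and hypothetical word vectors (not taken from the paper).
docs = [
    ["stock", "market", "falls"],
    ["team", "wins", "match"],
    ["stock", "prices", "rise"],
]
embeddings = {w: [float(len(w)), 1.0, -1.0] for doc in docs for w in doc}

weights = tfidf_weights(docs)
matrix = weighted_text_matrix(docs[0], weights[0], embeddings)
```

In this toy corpus "stock" appears in two of the three documents, so its rows are scaled down relative to "market", which appears in only one. The resulting `matrix` has one row per token and one column per embedding dimension, which is the shape a convolutional layer sliding over token positions would expect.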