SINGULAR VALUE DECOMPOSITION: EMPLOYABILITY OF INDEPENDENT COMPONENT ANALYSIS – TOPIC DETECTION, CLUSTERS, LATENT SEMANTIC INDEXING IN ENHANCING DATA INTELLIGENCE USABILITY
Karan Gupta
Abstract
Concept detection plays an important role when large amounts of data are available. Cluster analysis, topic detection, and opinion mining play a major role in product marketing, online shopping, and e-commerce. In this paper, we conduct topic detection and clustering experiments on news samples sourced from online newspapers. Our aim is to find the topics present in the text documents, each expressed as a group of words, and to apply a clustering technique based on singular value decomposition. Opinions are then extracted from comments collected on a particular subject of interest, such as comments about a smartphone. Finally, the clustering technique is applied to these sentiments to determine people's opinions of different features of the smartphone. The results obtained are competitive with existing techniques.
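The abstract does not spell out the preprocessing, term weighting, or cluster-assignment rule used in the experiments, so the following is only a minimal sketch of the general idea: latent semantic indexing through a truncated SVD of a term-document matrix. The toy corpus, the function names, the raw-count weighting, and the "dominant component" assignment rule are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

# Hypothetical toy corpus standing in for news samples (not the paper's data).
docs = [
    "election vote government minister",
    "government minister election policy",
    "election government parliament vote",
    "match goal team player",
    "team player goal league",
]

def term_document_matrix(docs):
    """Build a raw-count term-document matrix (rows: terms, columns: documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return vocab, A

def lsi_topics(A, vocab, k=2, top_n=3):
    """Keep the k largest singular triplets and read each one as a topic."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    topics = []
    for c in range(k):
        # Terms with the largest absolute loading on component c.
        order = np.argsort(-np.abs(U[:, c]))[:top_n]
        topics.append([vocab[i] for i in order])
    # Document coordinates in the k-dimensional latent space.
    doc_coords = (np.diag(s[:k]) @ Vt[:k]).T
    return topics, doc_coords

def cluster_by_dominant_component(doc_coords):
    """Assumed rule: assign each document to its strongest latent component."""
    return np.argmax(np.abs(doc_coords), axis=1)

vocab, A = term_document_matrix(docs)
topics, doc_coords = lsi_topics(A, vocab, k=2, top_n=3)
labels = cluster_by_dominant_component(doc_coords)

for c, words in enumerate(topics):
    print(f"topic {c}: {words}")
print("cluster labels:", labels.tolist())
```

On this toy input the two word groups share no vocabulary, so the leading components separate them cleanly. Real news text would normally call for stop-word removal, a weighting such as tf-idf, and a proper clustering step (for example k-means) on the latent coordinates.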