Document Clustering : Before and After the Singular Value Decomposition
Abstract
Document clustering is the problem of measuring similarity between documents and grouping similar documents together. Information retrieval (IR) is the problem of comparing a query with a collection of documents to locate the set of documents relevant to that query. In the vector space IR model, a query is treated as a document consisting of a few terms. Therefore, both clustering and retrieval necessarily involve the representation of documents and the computation of similarities among a set of documents. In the vector space IR model, a term-document matrix is computed from a collection of documents using a certain weighting scheme. Latent Semantic Indexing (LSI), an efficient vector space retrieval approach, uses the Singular Value Decomposition (SVD) to reduce the rank of the original term-document matrix. Theoretically, SVD, a dimensionality reduction technique, performs a term-to-concept mapping, which makes conceptual indexing and retrieval possible. In this paper, we discuss clustering of documents by calculating pairwise similarity between documents using the original term-document matrix and the decomposed term-document matrix. We also report an evaluation method based on the clustering hypothesis and analyze the clustering results.
- Paper published by the Information Processing Society of Japan (IPSJ)
- 1999-11-25
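The comparison described in the abstract (pairwise document similarity computed from the original term-document matrix and again after rank reduction by SVD) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy matrix, the raw-count weighting, the choice of cosine similarity, the rank `k=2`, and the function names `cosine_similarity_matrix` and `lsi_document_vectors` are all assumptions made for the example.

```python
import numpy as np


def cosine_similarity_matrix(doc_vectors):
    """Pairwise cosine similarity between the rows of doc_vectors."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against empty documents
    unit = doc_vectors / norms
    return unit @ unit.T


def lsi_document_vectors(term_doc, k):
    """Rank-k document coordinates from the SVD of a term-document matrix.

    term_doc has one row per term and one column per document.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of V_k * S_k is one document in the k-dimensional reduced space.
    return Vt[:k].T * s[:k]


# Hypothetical toy term-document matrix (raw counts, assumed for illustration):
# rows are terms, columns are documents.
term_doc = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, 1],
], dtype=float)

# "Before": similarity from the original matrix (documents are its columns).
sim_before = cosine_similarity_matrix(term_doc.T)

# "After": similarity from the rank-2 representation (k=2 is an assumption).
sim_after = cosine_similarity_matrix(lsi_document_vectors(term_doc, k=2))

print(np.round(sim_before, 3))
print(np.round(sim_after, 3))
```

Either similarity matrix could then be passed to a clustering procedure of one's choice; the abstract does not say which clustering algorithm or weighting scheme the paper uses, so none is assumed here.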
Authors
-
Matsumoto Yuji
Nara Institute of Science and Technology
-
Matsumoto Y
Nara Inst. Sci. And Technol.
-
HASAN Maruf
Nara Institute of Science and Technology
-
Matsumoto Yuji
Nara Inst. Sci. And Technol.
-
Hasan Md
Nara Institute of Science and Technology
Related papers
- Paraphrasing Training Data for Statistical Machine Translation
- Opinion mining from web documents: extraction and structurization (Special Issue: Data Mining and Statistical Mathematics)
- Document Clustering : Before and After the Singular Value Decomposition
- A Method for Syntactic Behavior Analysis
- Effects of Structural Matching and Paraphrasing in Question Answering(Special Issue on Text Processing for Information Access)
- Information Extraction from MEDLINE abstracts of clinical trials(Medical Data Mining)
- Information Extraction from MEDLINE abstracts of clinical trials(Medical Data Mining)(Joint Workshop of Vietnamese Society of AI, SIGKBS-JSAI, ICS-IPSJ, and IEICE-SIGAI on Active Mining)
- A Generalization of Forward-backward Algorithm
- Opinion Mining from Web Documents: Extraction and Structurization