Japanese Stopwords for Keyword Extraction Suited to Content Inference
Abstract
Extracting keywords from target text data is essential for any analysis that describes the substantive characteristics of message content. Among the available alternatives, we chose a stopword filter because it is simple yet effective. The filter we present consists of non-content words and low-content words. Non-content words are mainly function words and were removed using part-of-speech (POS) tag information. Among the remaining words, those with a high occurrence rate are keyword candidates, but they usually include some low-content words such as delexical verbs. This article presents a stopword list of low-content words, compiled through judgment-based manual procedures using 40 text files from the CASTEL/J database, and establishes it with general versatility in mind.
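The two-stage filter described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the POS tag names, the stopword entries, the function name `extract_keywords`, and the pre-tagged sample input are all assumptions made for the example, and the actual list in the paper was compiled manually from CASTEL/J texts.

```python
from collections import Counter

# Hypothetical POS tags treated as non-content (function words); not from the paper.
FUNCTION_POS = {"particle", "auxiliary", "conjunction", "symbol"}

# Hypothetical low-content stopwords (e.g. delexical verbs); illustrative only.
LOW_CONTENT_STOPWORDS = {"する", "ある", "なる", "こと"}


def extract_keywords(tagged_tokens, top_n=3):
    """Return the top_n most frequent tokens that survive both filters."""
    # Stage 1: drop non-content words using their POS tag.
    content = [word for word, pos in tagged_tokens if pos not in FUNCTION_POS]
    # Stage 2: drop low-content words that appear in the stopword list.
    kept = [word for word in content if word not in LOW_CONTENT_STOPWORDS]
    # Rank the remaining words by occurrence count.
    return [word for word, _ in Counter(kept).most_common(top_n)]


# Assumed input: tokens already tagged by some morphological analyzer.
sample = [
    ("研究", "noun"), ("を", "particle"), ("する", "verb"),
    ("研究", "noun"), ("が", "particle"), ("ある", "verb"),
    ("言語", "noun"), ("の", "particle"), ("研究", "noun"),
    ("言語", "noun"), ("を", "particle"), ("する", "verb"),
]
keywords = extract_keywords(sample, top_n=2)
```

Separating the POS stage from the list stage mirrors the abstract's distinction: function words can be removed mechanically, while low-content words such as delexical verbs carry content-word tags and need an explicit list.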
Authors
Related papers
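- An Attempt to Measure Kinetic Typography Using Kansei Rating Scales for Music (Acoustic Design)
- Message Analysis of Japanese Texts Focusing on Collocations (Language Resources and Document Analysis)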
- Japanese Convenience Stores' B2C E-Commerce Websites : Usefulness and Usability for Online Consumers
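- An Analysis of the Click and Mortar Strategy in the Japanese Convenience Store Industry
- Non-Interference of the Number of Extracted Vocabulary Items Used for Meaning Inference
- Japanese Stopwords for Keyword Extraction Suited to Content Inference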