Topic dependent language model based on on-line voting (Language Understanding and Communication)

Abstract

In this paper, we propose an alternative approach to a topic-dependent language model (LM), in which the topic is decided by voting in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among nouns in the context word sequence. To decide the topic of an event, a fixed-size word history sequence (window) is observed, and voting is then carried out based on noun class occurrences weighted by a confidence measure. Experiments on the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus show that the proposed method achieves lower (better) perplexity than the comparative baselines, which include a word-based n-gram LM, a class-based n-gram LM, their interpolated LM, a cache-based LM, and a Latent Dirichlet Allocation (LDA)-based topic-dependent LM.
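
The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `vote_topic`, the `noun_to_topic` mapping (standing in for noun classes that would be derived with LSA), the per-noun `confidence` values, and the default window size are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def vote_topic(history, noun_to_topic, confidence, window=4):
    """Pick a topic by confidence-weighted voting over the recent word history.

    history       -- list of words seen so far (most recent last)
    noun_to_topic -- hypothetical mapping from a noun to its class/topic
    confidence    -- hypothetical per-noun vote weight (defaults to 1.0)
    window        -- number of most recent words that may vote
    Returns the winning topic, or None if no known noun is in the window.
    """
    recent = history[-window:] if window > 0 else []
    votes = defaultdict(float)
    for word in recent:
        topic = noun_to_topic.get(word)
        if topic is None:
            continue  # words without a noun class cast no vote
        votes[topic] += confidence.get(word, 1.0)
    if not votes:
        return None
    # The topic with the largest accumulated weight wins the vote.
    return max(votes, key=votes.get)


# Toy data, invented for this example only.
noun_to_topic = {"stock": "finance", "bank": "finance",
                 "game": "sports", "team": "sports"}
confidence = {"stock": 0.9, "bank": 0.4, "game": 0.8, "team": 0.7}
history = ["the", "bank", "team", "won", "the", "game"]

topic = vote_topic(history, noun_to_topic, confidence, window=4)
print(topic)
```

In a full topic-dependent LM, the selected topic would then choose which topic-specific word distribution is used to predict the next word. That step is outside the scope of this sketch.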
2009-12-14