Automatically Deducing Probabilistic Dialogue Models from an IFT-Annotated Dialogue Corpus
Abstract
One of the most interesting issues in corpus-based studies is deriving linguistic knowledge via automated procedures. Most work, however, has focused on deriving lexico-syntactic knowledge. In the work described here, we automatically deduce dialogue models from a corpus using probabilistic methods. The corpus is a subset of the ATR Dialogue Database and consists of simulated dialogues between a secretary and a questioner at international conferences. Each utterance is annotated with a speaker label and an utterance type called an IFT (Illocutionary Force Type), which abstracts the speaker's intention as the type of action the speaker intends to perform with the utterance. We use two probabilistic methods to model the speaker-IFT sequences of the corpus: (1) an ergodic HMM (Hidden Markov Model) and (2) the ALERGIA algorithm, which learns probabilistic automata by state merging. Analysis of the derived dialogue models shows that both methods capture the basic characteristics of local discourse structure, such as turn-taking and speech act sequencing. We also describe how we measure the quality of the dialogue models from an information-theoretic viewpoint.
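To make the modeling idea concrete, here is a minimal sketch of how a discrete ergodic HMM can score a speaker-IFT sequence. It is not the authors' implementation. The label names (`Q:request`, `S:inform`, and so on), the two-state topology, and all probabilities are hypothetical. Using perplexity as the quality measure is also an assumption, since the abstract says only that the measurement is information-theoretic. The likelihood is computed with the standard scaled forward algorithm.

```python
import math


def forward_log2_likelihood(obs, start, trans, emit):
    """Return log2 P(obs) under a discrete HMM, using the scaled forward algorithm.

    obs   : non-empty list of observation symbols (here: speaker-IFT labels)
    start : start[i] = P(first state is i)
    trans : trans[i][j] = P(next state is j | current state is i)
    emit  : emit[i] = dict mapping symbol -> P(symbol | state i)
    """
    if not obs:
        raise ValueError("observation sequence must not be empty")
    n = len(start)
    # alpha[i] is proportional to P(o_1..o_t, state_t = i)
    alpha = [start[i] * emit[i].get(obs[0], 0.0) for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n))
                * emit[j].get(obs[t], 0.0)
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model assigns zero probability
        log_prob += math.log2(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob


def perplexity(obs, start, trans, emit):
    """Per-symbol perplexity: 2 ** (-log2 P(obs) / len(obs))."""
    ll = forward_log2_likelihood(obs, start, trans, emit)
    if ll == float("-inf"):
        return float("inf")
    return 2.0 ** (-ll / len(obs))


# Hypothetical two-state ergodic model over made-up speaker-IFT labels.
START = [0.6, 0.4]
TRANS = [[0.3, 0.7],
         [0.8, 0.2]]
EMIT = [
    {"Q:request": 0.7, "Q:acknowledge": 0.3},  # state 0: questioner-like
    {"S:inform": 0.8, "S:confirm": 0.2},       # state 1: secretary-like
]

if __name__ == "__main__":
    dialogue = ["Q:request", "S:inform", "Q:acknowledge", "S:confirm"]
    print(perplexity(dialogue, START, TRANS, EMIT))
```

Lower perplexity on held-out dialogues would indicate that a model predicts the next speaker-IFT label better. In this toy setup each state emits labels for only one speaker, so the high cross-state transition probabilities play the role of turn-taking.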