Reconstructing the Language Family Tree from Multilingual Corpus Based on Probabilistic Language Modeling
Abstract
This paper proposes a new method for automatically clustering languages. The basic idea is to build a probabilistic model of each language from the given linguistic data, compute the distances between languages using a distance measure defined on these language models, and then cluster the languages based on that measure. The paper implements this idea for N-gram language models. The effectiveness of the proposed method was confirmed by evaluation experiments on multilingual texts in nineteen different languages from the ECI Corpus (European Corpus Initiative Multilingual Corpus). The results were very encouraging: the resulting tree was very close to the family tree of languages established in linguistics.
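The three steps in the abstract (one model per language, a distance between models, clustering on that distance) can be illustrated with a minimal sketch. The abstract does not specify the smoothing, the value of N, the exact distance measure, or the clustering algorithm, so everything below is an assumption for illustration only: character N-grams with add-k smoothing, a symmetrised "excess cross-entropy" distance, and average-linkage (UPGMA) agglomerative clustering. All names (`CharNgramModel`, `model_distance`, `average_linkage`, `build_tree`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


class CharNgramModel:
    """Character N-gram model with add-k smoothing (assumed, not from the paper)."""

    def __init__(self, text, n, vocab, k=0.1):
        self.n = n
        self.k = k
        self.vocab_size = len(vocab)
        # Pad the start so the first characters also have an (n-1)-length context.
        padded = " " * (n - 1) + text
        self.ngrams = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
        # Context counts are derived from the N-gram counts.
        self.contexts = Counter()
        for gram, count in self.ngrams.items():
            self.contexts[gram[:-1]] += count

    def prob(self, context, char):
        """Smoothed P(char | context); sums to 1 over `vocab` for any context."""
        num = self.ngrams[context + char] + self.k
        den = self.contexts[context] + self.k * self.vocab_size
        return num / den

    def cross_entropy(self, text):
        """Average number of bits per character needed to encode `text`."""
        padded = " " * (self.n - 1) + text
        bits = 0.0
        for i in range(self.n - 1, len(padded)):
            bits -= math.log2(self.prob(padded[i - self.n + 1:i], padded[i]))
        return bits / len(text)


def model_distance(text_a, text_b, model_a, model_b):
    """Assumed distance: extra bits per character each text costs under the
    other language's model, averaged over both directions."""
    excess_ab = model_a.cross_entropy(text_b) - model_b.cross_entropy(text_b)
    excess_ba = model_b.cross_entropy(text_a) - model_a.cross_entropy(text_a)
    return (excess_ab + excess_ba) / 2


def average_linkage(names, dist):
    """Agglomerative clustering with average linkage (UPGMA).
    Returns a nested tuple: leaves are names, inner nodes are merged pairs."""
    clusters = {(name,): name for name in names}  # member tuple -> subtree

    def linkage(pair):
        a, b = pair
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters[a + b] = (clusters.pop(a), clusters.pop(b))
    return next(iter(clusters.values()))


def build_tree(corpus, n=3):
    """corpus maps a language name to a text sample in that language."""
    # A shared vocabulary (plus the padding space) keeps smoothing comparable.
    vocab = set(" ").union(*corpus.values())
    models = {lang: CharNgramModel(text, n, vocab) for lang, text in corpus.items()}
    dist = {
        frozenset((a, b)): model_distance(corpus[a], corpus[b], models[a], models[b])
        for a, b in combinations(corpus, 2)
    }
    return average_linkage(list(corpus), dist)


if __name__ == "__main__":
    # Toy strings standing in for per-language text samples.
    toy = {
        "A": "abab abab abba abab",
        "B": "abab baba abab abab",
        "C": "xyzxy zyx xyzzy xyz",
    }
    print(build_tree(toy, n=2))
```

On this toy input, "A" and "B" share most of their bigrams and are merged before "C" is attached. Average linkage is used here only because it produces a rooted binary tree that can be compared directly with a family tree; with real data, each value in `corpus` would be a sizeable text sample per language, and other linkage rules or distance measures could be swapped in.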
- Papers of the Association for Natural Language Processing
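- An Efficient Method for Determining Field Association Terms of Compound Words
- Building a Paraphrase Corpus Using a Class-Oriented Example Collection Method
- Adding Large-Scale Usage Examples to a Verb Argument Structure Dictionary
- Research Trends in Paraphrasing Technology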
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation