Clustering for Disambiguating Author Names in Bibliographic Data

Abstract

In this paper, we propose a clustering method for disambiguating abbreviated author names that appear in bibliographic data, so that each instance of an abbreviated name is mapped to the correct full name. For clustering, we use the standard naive Bayes mixture model and a newly proposed two-variable mixture model, which has two hidden variables. In the experiment, we used the DBLP data set and evaluated on 47 abbreviated author names, each corresponding to 50 or more full names. The results show that the two-variable mixture model achieves a good balance between precision and recall.
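As an illustration of the baseline mentioned in the abstract, the following is a minimal sketch of clustering with a standard naive Bayes (multinomial) mixture fitted by EM. It is not the two-variable mixture model, and it is not taken from the paper: the function name `nb_mixture_em`, the integer feature ids (which could stand for coauthor names or title words), the smoothing constant, and the random initialization are all assumptions made for this example.

```python
import math
import random


def nb_mixture_em(docs, vocab_size, k, iters=50, seed=0, smooth=0.1):
    """Cluster bag-of-features records with a naive Bayes mixture via EM.

    docs: list of lists of integer feature ids in [0, vocab_size)
          (hypothetical encoding, e.g. coauthor or title-word ids).
    Returns one cluster index per record.
    """
    rng = random.Random(seed)
    n = len(docs)

    # Random soft initialization of responsibilities resp[d][c].
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(iters):
        # M-step: smoothed mixing weights and per-cluster feature distributions.
        prior = [
            (sum(resp[d][c] for d in range(n)) + smooth) / (n + k * smooth)
            for c in range(k)
        ]
        counts = [[smooth] * vocab_size for _ in range(k)]
        for d, doc in enumerate(docs):
            for w in doc:
                for c in range(k):
                    counts[c][w] += resp[d][c]
        log_phi = []
        for c in range(k):
            total = sum(counts[c])
            log_phi.append([math.log(x / total) for x in counts[c]])

        # E-step: posterior over clusters for each record, computed in log space.
        for d, doc in enumerate(docs):
            logp = [
                math.log(prior[c]) + sum(log_phi[c][w] for w in doc)
                for c in range(k)
            ]
            m = max(logp)
            unnorm = [math.exp(lp - m) for lp in logp]
            z = sum(unnorm)
            resp[d] = [u / z for u in unnorm]

    # Hard assignment: most probable cluster for each record.
    return [max(range(k), key=lambda c: resp[d][c]) for d in range(n)]
```

In a setting like the one the abstract describes, each record would be one citation containing the abbreviated name, and each cluster would be read as one candidate full name. The number of clusters `k` is passed in directly here, which is a simplification of this sketch.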
Paper of the Database Society of Japan
2007-06-29