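An Addressee Estimation Mechanism Using Head Direction and Speech Information in Multiparty Conversations Between Humans and a Conversational Agent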

Abstract

In multiparty human-agent interaction, the agent should be able to respond properly to a user by determining whether an utterance is addressed to the agent or to another user. This study proposes a mechanism for identifying the addressee from nonverbal cues, namely the acoustic information in the user's speech and the user's head orientation. First, we conduct a Wizard-of-Oz (WOZ) experiment to collect human-human-agent triadic conversations in which the agent plays the role of an information provider. Then, we analyze whether acoustic features and head orientation are correlated with addressee-hood. We find that speech features differ depending on whom the speaker is talking to. When people talked to the agent, they spoke in a higher-pitched voice, and also more loudly and more slowly. In addition, the subjects looked at the agent 93.2% of the time while talking to the agent. By contrast, speakers looked at their partner only 33.5% of the time while talking to each other. These results suggest that people frequently look at the addressee, but that it is difficult to estimate the addressee from head direction alone. Based on these analyses, we propose addressee estimation models that integrate speech and head direction information using a support vector machine (SVM); the best-performing model achieves an accuracy of over 80%. We then implement an addressee identification mechanism by integrating speech processing and face tracking. Finally, we conduct an evaluation experiment for this mechanism and report that its accuracy remains over 80% if invalid speech input can be eliminated.
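
To make the point about head direction concrete, the short sketch below estimates how well a head-direction-only rule ("the addressee is whoever the speaker's head is facing") might do, using only the two gaze rates quoted in the abstract. It is an illustration, not the authors' model: it assumes the reported time fractions can be read as per-moment hit rates of that rule, and the function name `head_only_accuracy` and the parameter `agent_share` (the fraction of speech addressed to the agent, which the abstract does not report) are hypothetical.

```python
# Gaze rates reported in the abstract (fraction of speaking time).
LOOK_AT_AGENT_WHEN_ADDRESSING_AGENT = 0.932
LOOK_AT_PARTNER_WHEN_ADDRESSING_PARTNER = 0.335


def head_only_accuracy(agent_share: float) -> float:
    """Expected accuracy of the rule 'addressee = whoever the head faces'.

    agent_share is a hypothetical fraction of speech addressed to the agent;
    the abstract does not report this value.
    """
    if not 0.0 <= agent_share <= 1.0:
        raise ValueError("agent_share must be between 0 and 1")
    # The rule is right 93.2% of the time on agent-directed speech and
    # 33.5% of the time on partner-directed speech (assumed reading).
    return (agent_share * LOOK_AT_AGENT_WHEN_ADDRESSING_AGENT
            + (1.0 - agent_share) * LOOK_AT_PARTNER_WHEN_ADDRESSING_PARTNER)


if __name__ == "__main__":
    for share in (0.25, 0.5, 0.75):
        print(f"agent share {share:.2f}: accuracy {head_only_accuracy(share):.3f}")
```

Under these assumptions, an even split between agent-directed and partner-directed speech gives roughly 63% accuracy for the head-only rule, which is consistent with the abstract's motivation for adding speech features.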
