Alignment of lecture speech data and presentation documents based on discourse markers and text length

Abstract

Keyword search for specific scenes in video data appears to be in as much demand as text search. For video search, the conventional approach is to apply speech recognition to the audio track and use the results as a text index with time information. However, speech recognition suffers from recognition errors and unknown words, so the recognition results alone do not serve as a precise index. If a detailed script or transcript of a video is available, a precise index synchronized with the video can be built by aligning the script with the speech recognition results, but not every video comes with a detailed script. We propose a new approach that builds a text index without a detailed script, using presentation slides instead. Focusing on lecture videos, we explain how to build a text index by aligning two different materials: speech recognition results and presentation slides. We align them slide by slide, so that keyword search over lecture videos can be performed at the slide level.
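The abstract does not give the alignment procedure itself, so the following is only a minimal sketch of one way an alignment "based on discourse markers and text length" might look, not the authors' method. It assumes that speaking time is roughly proportional to the amount of text on each slide, and that a slide change tends to coincide with a recognized segment that opens with a discourse marker. The marker list `MARKERS`, the `window` parameter, and the function names `align_by_length` and `search` are all hypothetical and chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical discourse markers that may open a new topic (illustrative only).
MARKERS = {"next", "now", "then", "finally"}


def opens_with_marker(text):
    """True if the first word of a recognized segment is a discourse marker."""
    words = text.lower().split()
    return bool(words) and words[0].strip(",.") in MARKERS


def align_by_length(slides, segments, window=2):
    """Assign a slide index to every speech recognition segment.

    slides:   slide texts in presentation order
    segments: (start_seconds, recognized_text) pairs in time order
    window:   how many segments around an estimated boundary to inspect
    """
    total_chars = sum(len(s) for s in slides) or 1
    n = len(segments)

    # Step 1 (assumption): estimate slide boundaries in proportion to text length.
    estimates, acc = [], 0
    for text in slides[:-1]:
        acc += len(text)
        estimates.append(round(n * acc / total_chars))

    # Step 2 (assumption): snap each estimate to the nearest segment that
    # opens with a discourse marker, keeping boundaries in order.
    boundaries, prev = [], 0
    for b in estimates:
        lo, hi = max(prev, b - window), min(n, b + window + 1)
        candidates = [i for i in range(lo, hi) if opens_with_marker(segments[i][1])]
        if candidates:
            b = min(candidates, key=lambda i: abs(i - b))
        b = max(b, prev)
        boundaries.append(b)
        prev = b

    # Step 3: segment i belongs to the slide whose boundary range contains it.
    return [bisect_right(boundaries, i) for i in range(n)]


def search(keyword, slides, segments, labels):
    """Return (slide_index, start_seconds) for slides whose text has the keyword."""
    starts = {}
    for (start, _), k in zip(segments, labels):
        starts.setdefault(k, start)  # first segment assigned to slide k
    return [(k, starts[k]) for k, text in enumerate(slides)
            if keyword.lower() in text.lower() and k in starts]
```

With such a mapping, a keyword is matched against the slide text, which is free of recognition errors, and the hit is converted into a playback position through the aligned segments. A real system would need a more robust boundary model than this proportional estimate.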
Papers of the Association for Natural Language Processing