A Speech Recognition System for Japanese Sentences

Abstract

We have constructed a continuous speech recognition system for various kinds of Japanese sentences. The procedure follows the flow diagram shown in Fig. 1.
(1) The parameters shown in Fig. 2 are extracted from the input speech wave, and the input speech wave is transformed into a phoneme string.
(2) The input phoneme string is transformed into a condensed phoneme string (Fig. 3).
(3) The characteristic phoneme string, which contains vowels and /s/ continuing for over 50 ms and silence, is extracted from the input phoneme string (Fig. 3).
(4) Candidate words are predicted from syntactic and semantic information.
(5) The candidate words are further restricted by the first few phonemes of the condensed phoneme string.
(6) The input characteristic phoneme string is compared with the characteristic phoneme strings of the candidate words, and some words are selected.
(7) The input condensed phoneme string is compared with the word dictionary entries of the candidate words, and some words are selected. Table 2 shows some entries in the two dictionaries.
(8) In this way, words that can follow the preceding word are selected in turn, so that several word strings are formed (Fig. 4).
(9), (10) The above procedures are repeated until all input phonemes have been processed.
(11) Each candidate word string is compared with the input condensed phoneme string.
(12) The word string with the highest reliability is taken as the final output.
(13) Pragmatic analysis is carried out on the output word string, and the subject currently under discussion is determined.
(14) The words unrelated to the subject are then removed from the dictionaries.

The vocabulary contains 99 words, and the system can deal with sentences concerning both statistics and landscape. Japanese sentences such as those shown in Fig. 8 were spoken by four adult males. The results are shown in Fig. 9 and Table 3. The system recognized 28 of 36 sentences and 76 of 86 blocks. (A block is a part of a sentence uttered in one breath.)

In this paper, we discuss some problems concerning the system performance. The results are as follows.
(1) Some learning process is necessary in order to identify vowels and nasals satisfactorily.
(2) Acoustic information is very effective in restricting the number of candidate words. For example, at the beginning of a sentence, 91 words are usually reduced to 5 (5.5%).
(3) The use of syntactic and semantic information reduces the number of candidate words to 20-30% of the number that appears when only acoustic information is used (Fig. 11).
(4) Restricting the word dictionary by pragmatic information is also very effective (Table 5).
(5) Misrecognition is mostly due to the appearance of undesirable silence caused by a decrease in the amplitude of the speech wave.

The advantages of this system are as follows.
(1) Phoneme identification is fairly reliable, and the recognition score for short blocks is very good.
(2) Even when the input phoneme string contains some errors, the recognition score is much better than that of the matching method using dynamic programming.
(3) Many kinds of Japanese sentences can be dealt with in this system, because the syntactic restriction is loose.
(4) The semantic information used so far is simple, but semantically unreasonable sentences seldom appear.
(5) Candidate words are sufficiently restricted by acoustic and linguistic information.
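
The acoustic restriction in steps (2), (3), (5) and (6) can be illustrated with a minimal sketch. It is not the authors' implementation, and most details are assumptions: a 10 ms frame period, "_" as the silence label, condensation as merging runs of identical frame labels, silence kept regardless of duration, exact prefix matching (the abstract's system evidently tolerates phoneme errors, which this sketch does not), and a hypothetical four-word dictionary that is not taken from the paper's vocabulary.

```python
from itertools import groupby

FRAME_MS = 10      # assumed frame period (not stated in the abstract)
SILENCE = "_"      # assumed label for a silent frame
LONG_CLASSES = {"a", "i", "u", "e", "o", "s"}  # vowels and /s/


def runs(frames):
    """Merge consecutive identical frame labels into (phoneme, duration_ms) pairs."""
    return [(p, len(list(g)) * FRAME_MS) for p, g in groupby(frames)]


def condensed(frames):
    """Step (2), as assumed here: one symbol per run of identical labels."""
    return [p for p, _ in runs(frames)]


def characteristic(frames, min_ms=50):
    """Step (3): keep silence, plus vowels and /s/ lasting longer than min_ms."""
    return [p for p, ms in runs(frames)
            if p == SILENCE or (p in LONG_CLASSES and ms > min_ms)]


def restrict(dictionary, frames, n_initial=2):
    """Steps (5) and (6): filter candidates by initial phonemes, then by
    whether the word's characteristic string is a prefix of the input's."""
    input_condensed = condensed(frames)
    input_characteristic = characteristic(frames)
    selected = []
    for word, (word_condensed, word_characteristic) in dictionary.items():
        if word_condensed[:n_initial] != input_condensed[:n_initial]:
            continue  # step (5): initial phonemes disagree
        if input_characteristic[:len(word_characteristic)] == word_characteristic:
            selected.append(word)  # step (6): characteristic strings agree
    return selected


# Hypothetical dictionary: word -> (condensed string, characteristic string).
DICTIONARY = {
    "sakura": (list("sakura"), list("saua")),
    "sakana": (list("sakana"), list("saaa")),
    "sora": (list("sora"), list("soa")),
    "yama": (list("yama"), list("aa")),
}

# Hypothetical frame-level labels for an utterance of "sakura".
FRAMES = ["s"] * 6 + ["a"] * 7 + ["k"] * 3 + ["u"] * 6 + ["r"] * 3 + ["a"] * 8

print(condensed(FRAMES))             # ['s', 'a', 'k', 'u', 'r', 'a']
print(characteristic(FRAMES))        # ['s', 'a', 'u', 'a']
print(restrict(DICTIONARY, FRAMES))  # ['sakura']
```

In this toy case the initial phonemes /s a/ leave two of the four words, and the characteristic string removes one more. This shows the kind of cheap filtering that lets acoustic information cut the candidate set before the fuller dictionary comparison in step (7).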
Paper of the Acoustical Society of Japan
1978-03-01
