KWIC System on WEB Documents
Abstract
A KWIC (KeyWord In Context) system is a useful tool for investigating language usage. We developed a KWIC system for a very large body of Web text. The text data was extracted from about 350 gigabytes of Web pages and contains more than 10 billion characters. The pages were collected by a crawler over a period of about two months. The text data exceeds 4 gigabytes, the largest size that can be addressed with 32-bit offsets. We therefore developed a suffix array indexer that can handle 40-bit offsets, and the system uses it to search for sentences containing the desired keywords. To show the usefulness of the system for learners of Japanese as a second language, we collected KWIC data for "TO-ITAMU (painful like)" and analyzed the onomatopoeia that appear before the expression.
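The abstract does not describe how the indexer is built, so the following is only a minimal sketch of the general idea: a suffix array whose entries are stored as 40-bit (5-byte) offsets and searched by binary search to produce KWIC lines. The names `build_suffix_array`, `pack40`, `unpack40`, `find_range`, and `kwic` are hypothetical, the naive sort-based construction would not scale to a multi-gigabyte corpus, and offsets here are character indices in a Python string rather than byte positions in a file.

```python
def build_suffix_array(text):
    # Naive construction (assumption: small input). Sort the start
    # positions of all suffixes by the suffix text itself.
    return sorted(range(len(text)), key=lambda i: text[i:])


def pack40(offsets):
    # Store each offset in 5 bytes (40 bits), so positions beyond the
    # 32-bit limit of 4 GiB can still be represented.
    return b"".join(o.to_bytes(5, "little") for o in offsets)


def unpack40(buf, k):
    # Read the k-th 40-bit offset back out of the packed array.
    return int.from_bytes(buf[5 * k:5 * k + 5], "little")


def find_range(text, packed, n, keyword):
    # Two binary searches over the sorted suffixes: the first finds the
    # first suffix whose prefix is >= keyword, the second finds the
    # first suffix whose prefix is > keyword.
    m = len(keyword)
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        p = unpack40(packed, mid)
        if text[p:p + m] < keyword:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        p = unpack40(packed, mid)
        if text[p:p + m] <= keyword:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def kwic(text, keyword, width=10):
    # Return (left context, keyword, right context) for every hit,
    # ordered by position in the text.
    sa = build_suffix_array(text)
    packed = pack40(sa)
    start, end = find_range(text, packed, len(sa), keyword)
    hits = sorted(unpack40(packed, k) for k in range(start, end))
    m = len(keyword)
    return [
        (text[max(0, p - width):p], keyword, text[p + m:p + m + width])
        for p in hits
    ]


if __name__ == "__main__":
    for left, key, right in kwic("the cat sat on the mat", "the", width=4):
        print(f"{left:>4} [{key}] {right}")
```

Packing offsets into 5 bytes rather than 8 keeps the index at five bytes per position, which matters when the array has one entry for every position in a corpus of this size.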
- Papers from the Association for Natural Language Processing (言語処理学会)
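- An Efficient Method for Determining Field-Associated Compound Words
- Construction of a Paraphrase Corpus Using a Class-Oriented Example Collection Method
- Large-Scale Assignment of Usage Examples to a Verb Argument Structure Dictionary
- Research Trends in Paraphrasing Technology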
- Morpho-Syntactic Rules for Detecting Japanese Term Variation: Establishment and Evaluation