圧縮テキスト上での<i>q</i>-gram非重複頻度の効率的な計算とその応用
スポンサーリンク
概要
- 論文の詳細を見る
In many problems concerning text data, length-q substrings, or q-grams, can represent important characteristics of the data. Determining the frequencies of all q-grams contained in the data is an important problem with many applications. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text represented as a straight line program (SLP). We show that the problem can be solved in O(q2n) time, where n is the size of the SLP. We also show an interesting application of the algorithm, which converts an arbitrary SLP to an SLP that is constructed based on frequency information.
- 一般社団法人情報処理学会の論文
- 2011-02-28
著者
-
Hideo Bannai
Department Of Informatics Kyushu University
-
Masayuki Takeda
Department Of Informatics Kyushu University
-
Keisuke Goto
Department of Informatics
-
Nami Fukui
Department of Electrical Engineering and Computer Science
-
Shunsuke Ikenaga
Graduate School of Information Science and Electrical Engineering Kyushu University
-
Keisuke Goto
Department of Electrical and Electronic Engineering, Faculty of Engineering, Muroran Institute of Technology, 27-1 Mizumoto, Muroran, Hokkaido 050-8585, Japan
関連論文
- Variable Length Don't-Care Pattern Matching Problems on Compressed Texts
- Online Prediction over Permutahedron
- 圧縮テキスト上でのq-gram非重複頻度の効率的な計算とその応用
- 高速かつ省領域な線形時間LZ分解アルゴリズム