Physical Database Design for Efficient Time-Series Similarity Search
Abstract
Similarity search in time-series databases finds the data sequences whose patterns of change are similar to that of a query sequence. For efficient processing, it normally employs a multi-dimensional index. To alleviate the well-known curse of dimensionality, previous similarity-search methods apply the Discrete Fourier Transform (DFT) to the data sequences and take only the first two or three DFT coefficients as organizing attributes. Beyond this ad hoc approach, no research has attempted to devise a systematic guideline for choosing the best organizing attributes. This paper first points out the problems that arise in the previous methods and then proposes a novel solution for constructing optimal multi-dimensional indexes. The proposed method analyzes the characteristics of a target time-series database and identifies the organizing attributes that have the best discrimination power. It also uses a cost model to determine the optimal number of organizing attributes for efficient similarity search. A series of experiments shows that the proposed method significantly outperforms the previous ones.
- Paper of the Institute of Electronics, Information and Communication Engineers (IEICE)
- 2008-04-01
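The idea in the abstract of choosing organizing attributes by analyzing the database, rather than always taking the first few DFT coefficients, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not define "discrimination power", so the sketch assumes variance across the database as a stand-in measure, and the function names (`dft`, `features`, `rank_attributes`, `index_distance`) are hypothetical. The cost model for choosing the number of attributes is not sketched.

```python
import cmath
import math
from statistics import pvariance


def dft(seq):
    """Naive O(n^2) DFT with 1/sqrt(n) scaling, so Euclidean distance is preserved."""
    n = len(seq)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(seq))
        for k in range(n)
    ]


def features(seq):
    """Flatten the DFT coefficients into real-valued candidate attributes.

    Attribute 2k is the real part and attribute 2k+1 the imaginary part
    of coefficient k.
    """
    out = []
    for c in dft(seq):
        out.extend((c.real, c.imag))
    return out


def rank_attributes(database):
    """Rank candidate attributes, most discriminating first.

    ASSUMPTION: variance over the database is used as a proxy for
    discrimination power; the paper's actual measure may differ.
    """
    rows = [features(s) for s in database]
    variances = [pvariance(col) for col in zip(*rows)]
    return sorted(range(len(variances)), key=lambda i: variances[i], reverse=True)


def index_distance(a, b, attrs):
    """Euclidean distance restricted to the chosen attributes.

    Because the scaled DFT preserves distance, this never exceeds the true
    Euclidean distance between a and b, so filtering on it causes no
    false dismissals.
    """
    fa, fb = features(a), features(b)
    return math.sqrt(sum((fa[i] - fb[i]) ** 2 for i in attrs))


# Hypothetical database: alternating sequences that differ only in amplitude,
# so all of their energy sits in the highest-frequency coefficient.
database = [[a * (-1) ** t for t in range(8)] for a in (1, 2, 3)]
ranked = rank_attributes(database)
first_two_coeffs = [0, 1, 2, 3]   # the fixed choice of the first two coefficients
data_driven = ranked[:2]          # the two highest-variance attributes
```

On this toy database the first two coefficients carry almost no information, so `index_distance(database[0], database[2], first_two_coeffs)` is close to zero and cannot separate the sequences, while the data-driven attributes recover nearly the full distance. Real data will rarely be this extreme; the example only shows why a fixed choice of leading coefficients can be a poor fit for a particular database.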
Authors
-
Kim Sang-wook
Division of Information and Communications, Hanyang University
-
KIM Jinho
Department of Computer Science, Kangwon National University
-
PARK Sanghyun
Department of Computer Science, Yonsei University
Related papers
- Efficient Storage and Querying of Horizontal Tables Using a PIVOT Operation in Commercial Relational DBMSs
- Physical Database Design for Efficient Time-Series Similarity Search
- ACE-INPUTS: A Cost-Effective Intelligent Public Transportation System(Distributed Cooperation and Agents)
- Hybrid Lower-Dimensional Transformation for Similar Sequence Matching
- Fast Normalization-Transformed Subsequence Matching in Time-Series Databases(Data Mining)
- Extraction of Informative Genes from Multiple Microarray Data Integrated by Rank-Based Approach