マルチコア超並列環境におけるFFTの自動チューニング(<特集>数値計算のための自動チューニング(続))

概要

論文の詳細を見る
This paper presents an automatic performance tuning for parallel fast Fourier transforms (FFTs) on massively parallel platforms with multi-core processors. A blocking algorithm for parallel FFTs utilizes cache memory effectively. Since the optimal block size may depend on the problem size, we propose a method to determine the optimal block size that minimizes the number of cache misses. In addition, parallel FFTs require intensive all-to-all communication, which affects the performance of FFTs. An automatic tuning of all-to-all communication is also implemented. The performance results demonstrate that the proposed implementation of parallel FFTs with automatic performance tuning is efficient for improving the performance.
2010-12-24