Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
Abstract
Low utilization of SIMD units and memory bandwidth is the main performance bottleneck for sparse matrix-vector multiplication (SpMV) on SIMD processors. SpMV is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break this bottleneck. The method combines a new compressed sparse matrix format, a block SpMV algorithm, and a vector write buffer. Experimental results show that the hybrid optimization method achieves an average speedup of 2.09 over the CSR vector kernel across all the matrices, with a maximum speedup of 3.24.
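For orientation, the following is a minimal scalar sketch of SpMV over the standard CSR (compressed sparse row) layout, which is the storage scheme behind the baseline named in the abstract. It is not the paper's new compressed format, block algorithm, or write buffer, none of which are described here; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions. The sketch shows the two properties that make SIMD utilization difficult: rows have different lengths, and the reads from `x` are indirect through `col_idx`.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values holding row i.
    Names and layout are illustrative assumptions, not the paper's format.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Row lengths vary, so a fixed-width SIMD lane group is often underfilled.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect (gather-style) read of x: irregular memory access.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# 3x3 example matrix:
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 1.0, 1.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

With a vector of ones, each entry of `y` is simply the sum of the corresponding row's nonzeros, which makes the result easy to check by hand.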
Authors
- Chen Shuming, School of Computer, National University of Defense Technology
- Zhang Kai, School of Computer, National University of Defense Technology
- Wang Yaohua, School of Computer, National University of Defense Technology
- Wan Jianghua, School of Computer, National University of Defense Technology
Related papers
- CMRF: a Configurable Matrix Register File for accelerating matrix operations on SIMD processors
- Deterministic Message Passing for Distributed Parallel Computing