Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access
Abstract
The effective bandwidth of dynamic random-access memory (DRAM) is very low in the alternate row-wise/column-wise matrix access (AR/CMA) mode, an access pattern that is fundamental to scientific and engineering applications. We therefore propose the window memory layout scheme (WMLS), a matrix layout scheme for AR/CMA applications that does not require transposition. The scheme maps one row of a logical matrix into a rectangular memory window of the DRAM to balance the bandwidth of row-wise and column-wise matrix access and to increase the DRAM I/O bandwidth. The optimal window configuration is analyzed theoretically to minimize the total number of no-data-visit operations of the DRAM. Different WMLS implementations are presented according to the memory structures of field-programmable gate array (FPGA), CPU, and GPU platforms. Experimental results show that the proposed WMLS significantly improves DRAM bandwidth for AR/CMA applications. Speedup factors of 1.6× and 2.0× are achieved on the general-purpose CPU and GPU platforms, respectively. For the FPGA platform, the WMLS DRAM controller is custom-designed, and the maximum bandwidth in the AR/CMA mode reaches 5.94 GB/s, a 73.6% improvement over the traditional row-wise access mode. Finally, we apply WMLS to a chirp scaling SAR application; compared with the traditional access approach, maximum speedup factors of 4.73×, 1.33×, and 1.56× are achieved on the FPGA, CPU, and GPU platforms, respectively.
- Paper published by The Institute of Electronics, Information and Communication Engineers
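
The idea of mapping one logical row into a rectangular DRAM window can be illustrated with a toy address model. The sketch below is not the paper's mapping or controller: the function names, the window width `W`, the page size `PAGE` (in matrix elements), and the single-bank "count the page changes" cost model are all assumptions chosen for illustration. It compares how many DRAM pages must be opened to read one logical row and one logical column under a conventional row-major layout and under an assumed window layout.

```python
def row_major_addr(i, j, n):
    """Conventional layout: linear address of element (i, j) in an n x n matrix."""
    return i * n + j


def window_addr(i, j, n, w, page):
    """Assumed window layout (illustrative, not taken from the paper).

    Each logical row is cut into n // w segments of w elements. Every segment
    of a row goes to a different DRAM page, so one logical row covers a
    window of n // w pages by w columns. A page stores the same segment of
    page // w consecutive logical rows.
    Requires: w divides n and page, and page // w divides n.
    """
    rows_per_page = page // w
    seg, off = divmod(j, w)                # which segment, offset inside it
    band, slot = divmod(i, rows_per_page)  # group of rows sharing pages, position in group
    page_index = band * (n // w) + seg
    return page_index * page + slot * w + off


def page_activations(addrs, page):
    """Count page openings for an access sequence (single bank, one open page)."""
    pages = [a // page for a in addrs]
    return 1 + sum(1 for p, q in zip(pages, pages[1:]) if p != q)


# Hypothetical sizes: 64 x 64 matrix, 64-element pages, 8-element window width.
N, W, PAGE = 64, 8, 64

row_major = {
    "row": page_activations([row_major_addr(0, j, N) for j in range(N)], PAGE),
    "col": page_activations([row_major_addr(i, 0, N) for i in range(N)], PAGE),
}
window = {
    "row": page_activations([window_addr(0, j, N, W, PAGE) for j in range(N)], PAGE),
    "col": page_activations([window_addr(i, 0, N, W, PAGE) for i in range(N)], PAGE),
}

print("row-major:", row_major)
print("window:   ", window)
```

With these assumed sizes, the row-major layout opens 1 page for a row but 64 pages for a column, whereas the window layout opens 8 pages in either direction. In this toy model the two directions balance when the window width equals the square root of the page size; real DRAM devices add banks, bursts, and timing constraints that the model ignores.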
Authors
-
Lei Yuanwu
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
-
Dou Yong
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
-
Zhou Jie
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
-
TANG Yuhua
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
-
GUO Lei
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
-
MA Meng
National Laboratory for Parallel and Distributed Processing, National University of Defense Technology
Related papers
- FPGA-Specific Custom VLIW Architecture for Arbitrary Precision Floating-Point Arithmetic
- High performance sparse matrix-vector multiplication on FPGA
- Parallel Sparse Cholesky Factorization on a Heterogeneous Platform
- Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access
- Transpose-free Variable-Size FFT Accelerator Based On-chip SRAM