The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?
Abstract
In a perfect world, code would only be written once and would run on different devices with high efficiency. A programmer's time would primarily be spent on thinking about the algorithms and data structures, not on implementing them. To a degree, that used to be the case in the era of frequency scaling on a single core. However, due to power limitations, parallel programming has become necessary to obtain performance gains. But parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine-tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old code rather than writing new code. The goal of our research is to find new programming techniques that increase productivity, maintain high performance, and provide abstraction to free the programmer from these unnecessary and time-consuming tasks. However, such techniques usually come at the cost of substantial performance degradation. This paper investigates current approaches to portable accelerator programming, seeking to answer whether they make it possible to combine high efficiency with sufficient algorithm abstraction. It discusses OpenCL as a potential solution and presents three approaches to writing portable code: GPU-centric, CPU-centric, and combined. By applying the three approaches to a real-world program, we show that it is at least sometimes possible to run exactly the same code on many different devices with minimal performance degradation using parameterization. The main contributions of this paper are an extensive review of the current state of the art regarding the stated problem and our original approach of addressing this problem with a generalized excessive-parallelism approach.
- 2014-06-10
Authors
-
Reiji Suda
The University of Tokyo (presently with CREST, JST)
-
Kamil Rocki
IBM Research
-
Martin Burtscher
Texas State University
Related Papers
- An execution time prediction analytical model for GPU with instruction-level and thread-level parallelism awareness
- A precise measurement tool for power dissipation of CUDA kernels
- A Three-Step Performance Automatic Tuning Strategy using Statistical Model for OpenCL Implementation of Krylov Subspace Methods
- Efficient Monte Carlo Optimization with ATMathCoreLib
- Evaluation of Impact of Noise on Collective Algorithms in Repeated Computation Cycle