An execution time prediction analytical model for GPU with instruction-level and thread-level parallelism awareness
Abstract
Even with powerful hardware for parallel execution, it is still difficult to improve application performance without understanding the performance bottlenecks of parallel programs on GPU architectures. To help programmers gain better insight into these bottlenecks, we propose an analytical model that estimates the execution time of massively parallel programs while taking instruction-level and thread-level parallelism into consideration. Our model consists of two components: a memory sub-model and a computation sub-model. The memory sub-model estimates the cost of memory instructions by considering the number of active threads and the GPU memory bandwidth. Correspondingly, the computation sub-model estimates the cost of computation instructions by considering the number of active threads and the application's arithmetic intensity. We use Ocelot 1) to analyze PTX code and obtain several input parameters for the two sub-models, such as the number of memory transactions and the data size. Based on the two sub-models, the analytical model estimates the cost of each instruction while accounting for instruction-level and thread-level parallelism, and thereby estimates the overall execution time of an application. We compare the model's predictions with actual executions on a GTX 260; the results show that the model achieves 90 percent accuracy on average for the benchmarks we used.
- 2011-07-20
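The abstract above describes the structure of the model but not its equations. The following Python sketch shows one way such a two-sub-model estimate could be put together; the function names, the thread-utilization heuristic, the max-based combination of the two costs, and the hardware numbers (e.g. roughly 110 GB/s of memory bandwidth for a GTX 260) are illustrative assumptions, not the paper's actual formulas.

```python
from dataclasses import dataclass

@dataclass
class KernelProfile:
    """Per-kernel inputs; per the abstract, quantities such as the memory
    transaction count and data size are obtained by analysing PTX with Ocelot."""
    mem_transactions: int        # number of memory transactions
    bytes_per_transaction: int   # data size moved per transaction
    arithmetic_intensity: float  # computation instructions per byte accessed
    active_threads: int          # resident threads (thread-level parallelism)

def _utilization(p: KernelProfile, max_active_threads: int = 1024) -> float:
    # Assumed saturation heuristic: throughput scales with active threads
    # until the SM is fully occupied.
    return min(1.0, p.active_threads / max_active_threads)

def memory_cost_s(p: KernelProfile, bandwidth_bytes_per_s: float) -> float:
    """Memory sub-model: total transaction traffic divided by the bandwidth
    achieved with the given number of active threads."""
    total_bytes = p.mem_transactions * p.bytes_per_transaction
    return total_bytes / (bandwidth_bytes_per_s * _utilization(p))

def compute_cost_s(p: KernelProfile, peak_instr_per_s: float) -> float:
    """Computation sub-model: instruction count (derived here from the
    arithmetic intensity) divided by the achieved issue rate."""
    total_bytes = p.mem_transactions * p.bytes_per_transaction
    instructions = p.arithmetic_intensity * total_bytes
    return instructions / (peak_instr_per_s * _utilization(p))

def predicted_time_s(p: KernelProfile,
                     bandwidth_bytes_per_s: float,
                     peak_instr_per_s: float) -> float:
    """Overall estimate: assume memory and computation overlap under ILP/TLP,
    so the more expensive sub-model dominates (an assumed combination rule
    for this sketch)."""
    return max(memory_cost_s(p, bandwidth_bytes_per_s),
               compute_cost_s(p, peak_instr_per_s))

# Illustrative values only; not taken from the paper.
profile = KernelProfile(mem_transactions=2_000_000,
                        bytes_per_transaction=128,
                        arithmetic_intensity=0.5,
                        active_threads=768)
print(predicted_time_s(profile, bandwidth_bytes_per_s=110e9,
                       peak_instr_per_s=300e9))
```

The max-based combination encodes the intuition that, with enough instruction-level and thread-level parallelism, memory traffic and computation overlap, so the slower of the two sub-models bounds the kernel's execution time; the paper's actual per-instruction cost composition may differ.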
Authors
-
Reiji Suda
Presently with CREST, JST
-
Luo Cheng
Presently with The University of Tokyo / CREST, JST
Related papers
- A precise measurement tool for power dissipation of CUDA kernels
- A Three-Step Performance Automatic Tuning Strategy using Statistical Model for OpenCL Implementation of Krylov Subspace Methods
- Efficient Monte Carlo Optimization with ATMathCoreLib
- Evaluation of Impact of Noise on Collective Algorithms in Repeated Computation Cycle
- The Future of Accelerator Programming: Abstraction, Performance or Can We Have Both?