Using Cacheline Reuse Characteristics for Prefetcher Throttling
Abstract
Overcoming memory latency is one of the most significant problems in processor architecture. Prefetching can greatly improve cache performance, but unless its aggressiveness is set properly, it can pollute the cache. Several previously proposed prefetcher-throttling techniques use prefetch accuracy as their metric, but they are not sufficiently robust because programs' working-set sizes and cache capacities vary. In this study, we revisit prefetcher throttling from the viewpoint of data lifetime. Exploiting the characteristics of cache-line reuse, we propose Cache-Convection-Control-based Prefetch Optimization Plus (CCCPO+), which enhances the feedback algorithm of our previous CCCPO. Evaluation on the SPEC CPU 2006 benchmark suite with a 256 KB LLC showed that this approach achieved a geometric-mean improvement of 30% over no prefetching, 1.8% over the latest prefetcher-throttling technique, and 0.5% over our previous CCCPO. Moreover, it proved more stable than related work while lowering hardware cost.
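For readers unfamiliar with accuracy-based throttling, the following is a minimal sketch of the general kind of feedback controller the abstract contrasts against: once per sampling interval, it computes prefetch accuracy (useful prefetches / issued prefetches) and raises or lowers prefetcher aggressiveness. It is not CCCPO or CCCPO+. The class name `AccuracyThrottler`, the aggressiveness levels, and both accuracy thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical aggressiveness levels: (prefetch distance, prefetch degree).
LEVELS = [(4, 1), (8, 1), (16, 2), (32, 4), (64, 4)]

# Hypothetical accuracy thresholds; real designs tune these empirically.
HIGH_ACCURACY = 0.75
LOW_ACCURACY = 0.40


@dataclass
class AccuracyThrottler:
    """Illustrative accuracy-only throttler (not the paper's CCCPO+)."""
    level: int = 2   # index into LEVELS; start in the middle
    issued: int = 0  # prefetches issued in the current interval
    useful: int = 0  # prefetched lines later hit by a demand access

    def on_prefetch_issued(self) -> None:
        self.issued += 1

    def on_prefetch_hit(self) -> None:
        self.useful += 1

    def end_interval(self) -> tuple[int, int]:
        """Adjust the level from this interval's accuracy, then reset counters."""
        if self.issued > 0:
            accuracy = self.useful / self.issued
            if accuracy >= HIGH_ACCURACY:
                # Prefetches are mostly useful: be more aggressive.
                self.level = min(self.level + 1, len(LEVELS) - 1)
            elif accuracy < LOW_ACCURACY:
                # Prefetches are mostly wasted: back off to limit pollution.
                self.level = max(self.level - 1, 0)
        # With no prefetches issued, there is no signal, so the level is kept.
        self.issued = 0
        self.useful = 0
        return LEVELS[self.level]


if __name__ == "__main__":
    t = AccuracyThrottler()
    for _ in range(100):
        t.on_prefetch_issued()
    for _ in range(90):
        t.on_prefetch_hit()
    print(t.end_interval())  # accuracy 0.9 -> one level up: (32, 4)
```

Note that this controller sees only accuracy; it has no notion of how long prefetched lines stay in the cache or how capacity interacts with the working set, which is the gap the abstract attributes to accuracy-based approaches.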
Authors
- YOSHINAGA Tsutomu (University of Electro-Communications)
- HIRAKI Kei (The University of Tokyo)
- IRIE Hidetsugu (University of Electro-Communications)
- MIYOSHI Takefumi (University of Electro-Communications)
- HONJO Goki (The University of Tokyo)
Related Papers
- Pipelined round-robin broadcast algorithm in homogeneous clusters of SMP (Computer Architecture / High Performance Computing: Hokkaido Workshop on "Evaluation of High Performance Computing and Architecture" (HOKKE-2008))
- Computer aided detection system implementation for recognize cancer in mammograms over an FPGA (VLSI Design Technologies)
- Computer aided detection system implementation for recognize cancer in mammograms over an FPGA (Computer Systems)
- Computer aided detection system implementation for recognize cancer in mammograms over an FPGA (Reconfigurable Systems)
- Pipelined Round-Robin Broadcast Algorithm in Homogeneous Clusters of SMP
- D-6-8 Hybrid Compiler-Controlled Self-Adjustable Parallelism-Independent Scheduling Algorithm for Cluster of Workstations
- Using Cacheline Reuse Characteristics for Prefetcher Throttling
- FPGA-based Implementation of Sliding-Window Aggregates over Disordered Data Streams