Toward Automated Cache Partitioning for the K Computer
スポンサーリンク
概要
- 論文の詳細を見る
The processor architecture available on the K computer (SPARC64VIIIfx) features an hardware cache partitioning mechanism called sector cache. This facility enables software to split the memory cache in two independent sectors and to select which one will receive a line when it is retrieved from memory. Such control over the cache by an application enables significant performance optimization opportunities for memory intensive programs, that several studies on software-controlled cache partitioning environments already demonstrated. Most of these previous studies share the same implementation idea: the use of page coloring and an overriding of the operating system virtual memory manager. However, the sector cache differs in several key points over these implementations, making the use of the existing analysis or optimization strategies impractical, while enabling new ones. For example, while most studies overlooked the issue of phase changes in an application because of the prohibitive cost of a repartitioning, the sector cache provides this feature without any costs, allowing multiple cache distribution strategies to alternate during an execution, providing the best allocation of the current locality pattern. Unfortunately, for most application programmers, this hardware cache partitioning facility is not easy to take advantage of. It requires intricate knowledge of the memory access patterns of a code and the ability to identify data structures that would benefit from being isolated in cache. This optimization process can be tedious, with multiple code modifications and countless application runs necessary to achieve a good optimization scheme. To address these issues and to study new optimization strategies, we discuss in this paper our effort to design a new cache analysis and optimization framework for the K computer. This framework is composed of a binary instrumentation tool to measure the locality of a program data structures over a code region, heuristics to determine the best optimization strategy for the sector cache and a code transformation tool to automatically apply this strategy to the target application. While our framework is still a work in progress, we demonstrate the usefulness of our locality analysis and optimization strategy on a handcrafted application derived from classical stencil programs.
- 2012-09-26
著者
-
Mitsuhisa Sato
Riken Advanced Institute For Computational Science|university Of Tsukuba
-
Swann Perarnau
Riken Advanced Institute for Computational Science