Exact, Fast and Flexible L1 Cache Configuration Simulation for Embedded Systems
スポンサーリンク
概要
- 論文の詳細を見る
Since target applications running on an embedded processor are much limited in embedded systems, we can optimize its cache configuration based on the number of sets, block size, and associativities. An extremely fast cache configuration simulation method, CRCB (Configuration Reduction approach by the Cache Behavior), has been recently proposed which can calculate cache hit/miss counts accurately for possible cache configurations when the three parameters above are changed. The CRCB method assumes LRU-based (Least Recently Used-based) cache but many recent processors use FIFO-based (First In First Out-based) cache or PLRU-based (Pseudo LRU-based) cache due to its hardware cost. In this paper, we propose exact and fast L1 cache configuration simulation algorithms for embedded applications that use PLRU or FIFO as a cache replacement policy. Firstly, we prove that the CRCB method can be applied not only to LRU but also to other cache replacement policies including FIFO and PLRU. Secondly, we prove several properties for FIFO- and PLRU-based caches and we propose associated cache simulation algorithms which can simulate simultaneously more than one cache configurations with different cache associativities accurately for FIFO or PLRU. Finally, many experimental results demonstrate that our cache configuration simulation algorithms obtain accurate cache hit/miss counts and run up to 249 times faster than a conventional cache simulator.
- 一般社団法人情報処理学会の論文
- 2011-08-10
著者
-
Yanagisawa Masao
Department Of Computer Science Waseda University
-
Nozomu Togawa
Department of Computer Science and Engineering, Waseda University
-
Tatsuo Ohtsuki
Department of Computer Science and Engineering, Waseda University
-
Masao Yanagisawa
Department of Electronic and Photonic Systems, Waseda University
-
Masashi Tawada
Department of Computer Science and Engineering, Waseda University
関連論文
- FPGA-Based Reconfigurable Adaptive FEC(System Level Design)(VLSI Design and CAD Algorithms)
- Floorplan-Aware High-Level Synthesis for Generalized Distributed-Register Architectures
- Selective Low-Care Coding : A Means for Test Data Compression in Circuits with Multiple Scan Chains(Selected Papers from the 18th Workshop on Circuits and Systems in Karuizawa)
- A Fast Elliptic Curve Cryptosystem LSI Embedding Word-Based Montgomery Multiplier (System LSIs and Microprocessors, VLSI Design Technology in the Sub-100nm Era)
- A SIMD Instruction Set and Functional Unit Synthesis Algorithm with SIMD Operation Decomposition(Programmable Logic, VLSI, CAD and Layout, Recent Advances in Circuits and Systems-Part 1)
- Sub-operation Parallelism Optimization in SIMD Processor Core Synthesis(Selected Papers from the 17th Workshop on Circuits and Systems in Karuizawa)
- High-Level Power Optimization Based on Thread Partitioning(System Level Design)(VLSI Design and CAD Algorithms)
- A Hardware/Software Cosynthesis Algorithm for Processors with Heterogeneous Datapaths(Selected Papers from the 16th Workshop on Circuits and Systems in Karuizawa)
- A Hardware/Software Partitioning Algorithm for Processor Cores with Packed SIMD-Type Instructions(Design Methodology)(VLSI Design and CAD Algorithms)
- A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions(Simulation Acceletor)(VLSI Design and CAD Algorithms)
- A Retargetable Simulator Generator for DSP Processor Cores with Packed SIMD-type Instructions
- A Hardware/Software Cosynthesis System for Processor Cores with Content Addressable Memories
- A High-Level Energy-Optimizing Algorithm for System VLSIs Based on Area/Time/Power Estimation(Special Section on VLSI Design and CAD Algorithms)
- An Algorithm and a Flexible Architecture for Fast Block-Matching Motion Estimation(Special Section on VLSI Design and CAD Algorithms)
- C-5 A Software/Hardware Codesign for MPEG Encoder
- High-Level Area/Delay/Power Estimation for Low Power System VLSIs with Gated Clocks(Special Section of Selected Papers from the 14th Workshop on Circuits and Systems in Karuizawa)
- A New Hardware/Software Partitioning Algorithm for DSP Processor Cores with Two Types of Register Files(Special Section on VLSI Design and CAD Algorithms)
- Area and Delay Estimation in Hardware/Software Cosynthesis for Digital Signal Processor Cores(Special Section on VLSI Design and CAD Algorithms)
- An Area/Time Optimizing Algorithm in High-Level Synthesis of Control-Based Hardwares (Special Section on Discrete Mathematics and Its Applications)
- CAM Processor Synthesis Based on Behavioral Descriptions (Special Section on VLSI Design and CAD Algorithms)
- A Hardware / Software Cosynthesis System for Digital Signal Processor Cores with Two Types of Register Files (Special Section of Selected Papers from the 12th Workshop on Circuit and Systems in Karuizawa)
- A Two-Level Cache Design Space Exploration System for Embedded Applications
- An L1 Cache Design Space Exploration System for Embedded Applications
- A Built-in Reseeding Technique for LFSR-Based Test Pattern Generation(Timing Verification and Test Generation)(VLSI Design and CAD Algorithms)
- A Selective Scan Chain Reconfiguration through Run-Length Coding for Test Data Compression and Scan Power Reduction(Test)(VLSI Design and CAD Algorithms)
- A Hybrid Dictionary Test Data Compression for Multiscan-Based Designs(Test)(VLSI Design and CAD Algorithms)
- A Scan-Based Attack Based on Discriminators for AES Cryptosystems
- X-Handling for Current X-Tolerant Compactors with More Unknowns and Maximal Compaction
- Unified Dual-Radix Architecture for Scalable Montgomery Multiplications in GF(P) and GF(2^n)
- A Unified Test Compression Technique for Scan Stimulus and Unknown Masking Data with No Test Loss
- A Secure Test Technique for Pipelined Advanced Encryption Standard
- Scan-Based Side-Channel Attack against RSA Cryptosystems Using Scan Signatures
- A Hardware/Software Cosynthesis System for Digital Signal Processor Cores (Special Section on VLSI Design and CAD Algorithms)
- A Depth-Constrained Technology Mapping Algorithm for Logic-Blocks Composed of Tree-Structured LUTs (Special Section on Selected Papers from the 11th Workshop on Circuits and Systems in Karuizawa)
- A Fast Scheduling Algorithm Based on Gradual Time-Frame Reduction for Datapath Synthesis
- An FPGA Layout Reconfiguration Algorithm Based on Global Routes for Engineering Changes in System Design Specifications(Special Section on Discrete Mathematics and Its Applications)
- Greedy Optimization Algorithm for the Power/Ground Network Design to Satisfy the Voltage Drop Constraint
- Integrating Wearable Sensor Technology into Project-management Process
- Greedy Algorithm for the On-Chip Decoupling Capacitance Optimization to Satisfy the Voltage Drop Constraint
- Scan-based Attack against DES and Triple DES Cryptosystems Using Scan Signatures (Preprint)
- Energy-efficient High-level Synthesis for HDR Architectures
- Scan Vulnerability in Elliptic Curve Cryptosystems
- A Fault-Secure High-Level Synthesis Algorithm for RDR Architectures
- A Fast Selector-Based Subtract-Multiplication Unit and Its Application to Butterfly Unit
- Floorplan-Driven High-Level Synthesis for Distributed/Shared-Register Architectures
- A Fast Weighted Adder by Reducing Partial Product for Reconstruction in Super-Resolution
- Exact, Fast and Flexible L1 Cache Configuration Simulation for Embedded Systems