Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC
Abstract
One of the largest challenges for coarse-grained reconfigurable arrays (CGRAs) is how to map applications efficiently. The key issues for mapping are (1) how to reduce the memory bandwidth, (2) how to exploit parallelism in algorithms, and (3) how to achieve load balancing and take full advantage of the hardware potential. In this paper, we propose a novel parallelism scheme, called 'Hybrid partitioning', for mapping an H.264 high definition (HD) decoder onto REMUS-II, a CGRA system-on-chip (SoC). Combining the good features of data partitioning and task partitioning, our methodology consists mainly of three levels, from top to bottom: (1) a hybrid task pipeline based on the slice and macroblock (MB) levels; (2) MB row-level data parallelism; (3) sub-MB level parallelism. Further, on the sub-MB level, we propose several mapping strategies, such as hybrid variable block size motion compensation (Hybrid VBSMC) for MC, 2D-wave for intra 4×4, and a parallel processing order for deblocking. With our mapping strategies, we improve the algorithm's performance on REMUS-II. For example, for a luma 16×16 MB, Hybrid VBSMC achieves 4 times greater performance than VBSMC and 2.2 times greater performance than the fixed 4×4 partition approach. Finally, we achieve 1080p@33fps H.264 high-profile (HiP)@level 4.1 decoding when the working frequency of REMUS-II is 200MHz. Compared with typical hardware platforms, we achieve better performance, area, and flexibility. For example, our performance shows an improvement of approximately 175% over that of the commercial CGRA processor XPP-III while using only 70% of its area.
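To put the reported 1080p@33fps at 200MHz figure in perspective, the following minimal sketch works out the average clock-cycle budget per macroblock. The function name `mb_cycle_budget` and the padding of the frame height to whole 16×16 macroblocks are assumptions made for illustration and do not come from the paper. The result is an average throughput bound only; it says nothing about how REMUS-II distributes the work across its parallel levels.

```python
import math


def mb_cycle_budget(width, height, fps, clock_hz, mb_size=16):
    """Return (macroblocks per frame, average clock cycles available per macroblock).

    Hypothetical helper: assumes the frame is padded up to a whole number
    of mb_size x mb_size macroblocks, as is usual for 1080p (1920x1088).
    """
    mbs_per_row = math.ceil(width / mb_size)
    mb_rows = math.ceil(height / mb_size)
    mbs_per_frame = mbs_per_row * mb_rows
    mbs_per_second = mbs_per_frame * fps
    return mbs_per_frame, clock_hz / mbs_per_second


mbs, cycles = mb_cycle_budget(1920, 1080, 33, 200_000_000)
print(mbs, round(cycles))  # 8160 743
```

Under these assumptions a 1080p frame holds 8160 macroblocks, so sustaining 33 frames per second at 200MHz leaves roughly 743 cycles per macroblock on average.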
Authors
-
Shi Longxing
National ASIC Center, Southeast University
-
Yang Jun
National ASIC System Engineering Research Center, Southeast University
-
Cao Peng
National ASIC system and research engineering center, Southeast University
-
GAO Gugang
National ASIC System Engineering Technology Research Center, Southeast University
-
YANG Jun
National ASIC System Engineering Technology Research Center, Southeast University
Related papers
- 2P2c-10 Optimal time-reversal focusing method for random arrays using sequential least-squares prefiltering (Poster Session)
- Current reused Colpitts VCO and frequency divider with quadrature outputs
- Compositionally Bi-layered Formation of Interfacial Voids in a Porous Anodic Alumina Template Directly Formed on Si
- 2P2b-17 Development of an ultrasonic face identification system (Poster Session)
- A Novel Fast-Lock-in Digitally Controlled Phase-Locked Loop
- Discrimination of Type 2 diabetic patients from healthy controls by using metabonomics method based on their serum fatty acid profiles
- Memory-Efficient and High-Performance Two-Dimensional Discrete Wavelet Transform Architecture Based on Decomposed Lifting Algorithm
- Diagnosis of liver cancer using HPLC-based metabonomics avoiding false-positive result from hepatitis and hepatocirrhosis diseases
- A Harmonic-Free All Digital Delay-Locked Loop Using an Improved Fast-Locking Successive Approximation Register-Controlled Scheme
- Study on vibration effects of decked charge in bench blasting
- A GC-based metabonomics investigation of type 2 diabetes by organic acids metabolic profile
- Determination of urinary nucleosides by direct injection and coupled-column high-performance liquid chromatography
- Integrated Current Sensing Technique Suitable for Step-Down Switch-Mode Power Converters
- Date Flow Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications
- An optimized QFP structure for use in radio frequency multi-chip module applications
- Fast AdaBoost-Based Face Detection System on a Dynamically Coarse Grain Reconfigurable Architecture
- Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications
- Handling Deafness Problem of Scheduled Multi-Channel Polling MACs
- Parallelism Analysis of H.264 Decoder and Realization on a Coarse-Grained Reconfigurable SoC
- Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip
- A Data Prefetch and Reuse Strategy for Coarse-Grained Reconfigurable Architectures
- A novel DC-12GHz variable gain amplifier in InGaP/GaAs HBT technology
- An improved timing monitor for deep dynamic voltage scaling system
- A wide-range and ultra fast-locking all-digital SAR DLL without harmonic-locking
- The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture
- VLSI Design of a Reconfigurable S-box Based on Memory Sharing Method
- On-chip long-term jitter measurement for PLL based on undersampling technique