Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU
Abstract
The Discrete Memory Machine (DMM) is a theoretical parallel computing model that captures the essence of shared memory access on GPUs. To maximize shared memory bandwidth, bank conflicts should be avoided. Offline permutation of an array is the task of copying all elements of an array a into an array b according to a permutation given in advance. The main contribution of this paper is a GPU implementation of a conflict-free permutation algorithm for the DMM. We have also implemented straightforward permutation algorithms on the GPU. Experimental results for 1024 double (64-bit) numbers on an NVIDIA GeForce GTX-680 show that the straightforward permutation algorithm takes 247.8 ns for a random permutation and 1684 ns for the worst permutation, which involves the maximum number of bank conflicts. Our conflict-free permutation algorithm runs in 167 ns for any permutation, including the random and worst permutations, although it performs more memory accesses. It follows that our conflict-free permutation algorithm is 1.48 times faster for the random permutation and 10.0 times faster for the worst permutation.
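To illustrate why the straightforward algorithm is so sensitive to the permutation, the following Python sketch counts serialized write rounds under a simplified bank model. It is not the authors' implementation. It assumes 32 banks, that address i lives in bank i mod 32, and that threads are grouped into warps of 32 consecutive indices. The names `W`, `bank`, `permute`, `write_stages`, and `transpose_permutation` are hypothetical, and the transpose permutation is used only as an assumed example of a conflict-heavy input, not necessarily the worst permutation measured in the paper.

```python
import random

W = 32  # assumed number of memory banks and threads per warp


def bank(addr, w=W):
    """Bank holding address addr, assuming address i maps to bank i mod w."""
    return addr % w


def permute(a, p):
    """Offline permutation: copy a[i] into b[p[i]] for every index i."""
    b = [None] * len(a)
    for i, x in enumerate(a):
        b[p[i]] = x
    return b


def write_stages(p, w=W):
    """Count serialized write rounds for the straightforward algorithm.

    Warp k handles indices k*w .. k*w + w - 1. Writes from one warp to
    distinct addresses in the same bank are serialized, so a warp needs
    as many rounds as the load of its most heavily used bank.
    """
    total = 0
    for start in range(0, len(p), w):
        load = [0] * w
        for i in range(start, start + w):
            load[bank(p[i], w)] += 1
        total += max(load)
    return total


def transpose_permutation(w=W):
    """Permutation of w*w indices mapping row-major order to column-major order."""
    return [(i % w) * w + (i // w) for i in range(w * w)]


if __name__ == "__main__":
    n = W * W  # 1024 elements, matching the array size in the abstract
    identity = list(range(n))
    shuffled = identity[:]
    random.shuffle(shuffled)
    print("identity :", write_stages(identity))                  # 32 rounds
    print("random   :", write_stages(shuffled))
    print("transpose:", write_stages(transpose_permutation()))   # 1024 rounds
```

In this model the identity permutation needs one round per warp, while the transpose sends every write of a warp to the same bank and needs 32 rounds per warp. A conflict-free algorithm aims to keep every warp at one round per access regardless of the permutation, at the cost of extra memory accesses. The model ignores memory pipeline latency, so the round counts are not expected to reproduce the measured timings.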
Authors
- Nakano Koji, Department of Applied Chemistry, Faculty of Engineering, Kyushu University
- Ito Yasuaki, Department of Applied Chemistry, Nagoya Institute of Technology
- Kasagi Akihiko, Department of Information Engineering, Hiroshima University
Related papers
- PJ-293 Reduced Progressive Common Carotid Artery Intimal-medial Thickness (IMT) Correlates with Restoration of Insulin Resistance in Coronary Heart Disease (CHD) Patients (Atherosclerosis, Clinical 4 (IHD) : PJ49) (Poster Session (Japanese))
- Energy-Efficient Initialization Protocols for Ad-Hoc Radio Networks
- Evaluation of Vulnerable Coronary Plaques and Non-Alcoholic Fatty Liver Disease (NAFLD) by 64-Detector Multislice Computed Tomography (MSCT)
- Characterization of Transendothelial Migratory Lymphokine-activated Killer Cells
- Functional and T Cell Receptor Gene Usage Analysis of Cytotoxic T Lymphocytes in Fresh Tumor-infiltrating Lymphocytes from Human Head and Neck Cancer
- Reversed-Phase High-Performance Liquid-Chromatographic Determination Systems Specific to Ultratrace Hard Metal Ions with Tridentate Schiff Bases and Pyridylhydrazones
- Highly Selective Copper(II) Extraction with Oligoethylene Glycol Bis(hydrazone)s as Several Compositional Complexes
- An Atomic Force Microscopy Assay of Intercalation Binding, Unwinding, and Elongation of DNA, Using a Water-Soluble Psoralen Derivative as a Covalent Binding Probe Molecule
- Flow Immunoassay for Nonionic Surfactants Based on Surface Plasmon Resonance Sensors
- Synthesis of Circular Double-Stranded DNA Having Single-Stranded Recognition Sequence as Molecular-Physical Probe for Nucleic Acid Hybridization Detection Based on Atomic Force Microscopy Imaging
- Phospholipid-linked Coumarin : A Fluorescent Probe for Sensing Hydroxyl Radicals in Lipid Membranes
- Surface Plasmon Resonance Immunosensor for IgE Analysis Using Two Types of Anti-IgE Antibodies with Different Active Recognition Sites
- Electrochemical Immunoassay for Vitellogenin Based on Sequential Injection Using Antigen-immobilized Magnetic Microbeads
- The Complex Branch Characteristics of Coupled Flutter
- Deafness Resilient MAC Protocol for Directional Communications
- Bioaffinity Sensor to Anti-DNA Antibodies Using DNA Modified Au Electrode
- Clipping-Free Halftoning and Multitoning Using the Direct Binary Search
- New Class of Catalysts for Alternating Copolymerization of Alkylene Oxide and Carbon Dioxide
- Effect of pioglitazone on various parameters of insulin resistance including lipoprotein subclass according to particle size by a gel-permeation high-performance liquid chromatography in newly diagnosed patients with type 2 diabetes
- Sequential Injection Immunoassay for Environmental Measurements
- Calix[n]arenes Provided with Thiols for Modified Electrode Applications; Ring-size Dependent Voltammetric Behavior toward Ferrocene Derivatives
- Eosinophil count is positively correlated with coronary artery calcification
- A Graph Rewriting Approach for Converting Asynchronous ROMs into Synchronous Ones
- AFM-Imaging Diagnosis Method for Single Nucleotide Polymorphism Using Molecular Beacon DNA as an Intramolecular Ligation Template of Target DNA and a Viewable Indicator
- Optimal Parallel Algorithms for Computing the Sum, the Prefix-Sums, and the Summed Area Table on the Memory Machine Models
- Offline Permutation Algorithms on the Discrete Memory Machine with Performance Evaluation on the GPU
- A Pivot-Hinge-Style DNA Immobilization Method with Adaptable Surface Concentration Based on Oligodeoxynucleotide-Phosphorothioate Chemisorption on Gold Surfaces
- Low insulin level is associated with aortic stiffness