Title | Synthesis and Design of Parameter Extractors for Low-Power Pre-computation-Based Content-Addressable Memory Using Gate-Block Selection Algorithm |
Author | *Jui-Yuan Hsieh, Shanq-Jang Ruan (National Taiwan University of Science and Technology, Taiwan) |
Page | pp. 316 - 321 |
Keyword | CAM, low-power, pre-computation, gate-block selection algorithm, synthesis |
Abstract | Content-addressable memory (CAM) is frequently used in applications that require high-speed searches, such as lookup tables, databases, associative computing, and networking, because its parallel comparison reduces search time. Although parallel comparison yields fast searches, it also significantly increases power consumption. In this paper, we propose a gate-block selection algorithm that synthesizes a suitable parameter extractor for the pre-computation-based CAM (PB-CAM), improving its efficiency for specific applications such as embedded systems. Our experimental results show that, compared with the 1's count approach, our approach reduces the number of comparison operations by 19.24% to 27.42% for specific data types. We used Synopsys Nanosim to estimate power consumption in a TSMC 0.35 um CMOS process. Compared to the 1's count PB-CAM, our proposed PB-CAM achieves a power reduction of 17.72% to 21.09% for specific data types. |
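The 1's count baseline mentioned in the abstract above can be illustrated with a minimal software sketch. It assumes the usual PB-CAM idea: a parameter (here, the number of 1 bits) is stored alongside each word, and a full-word comparison is performed only for entries whose stored parameter equals the parameter of the search key. The names `ones_count`, `PBCam`, `write`, and `search` are hypothetical and chosen for illustration; the paper's gate-block selection algorithm is not reproduced here, only the pluggable extractor slot it would fill.

```python
def ones_count(word: int) -> int:
    """Baseline parameter extractor: the number of 1 bits in the word."""
    return bin(word).count("1")


class PBCam:
    """Toy software model of a pre-computation-based CAM (illustrative only)."""

    def __init__(self, extractor=ones_count):
        # The extractor is pluggable; a synthesized extractor could replace it.
        self.extractor = extractor
        self.entries = []  # list of (parameter, word) pairs

    def write(self, word: int) -> None:
        # The parameter is pre-computed once, when the word is stored.
        self.entries.append((self.extractor(word), word))

    def search(self, key: int):
        """Return (matching addresses, number of full-word comparisons)."""
        key_param = self.extractor(key)
        matches, comparisons = [], 0
        for addr, (param, word) in enumerate(self.entries):
            if param != key_param:
                continue  # filtered by the parameter: no full comparison
            comparisons += 1
            if word == key:
                matches.append(addr)
        return matches, comparisons
```

In this model, fewer entries sharing the key's parameter means fewer full comparisons, which is the quantity the abstract reports as reduced for specific data types.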
Title | A Compiler-in-the-Loop Framework to Explore Horizontally Partitioned Cache Architectures |
Author | *Aviral Shrivastava (Arizona State University, United States), Ilya Issenin, Nikil Dutt (University of California, Irvine, United States) |
Page | pp. 328 - 333 |
Keyword | embedded, compiler, processor, cache, energy |
Abstract | Horizontally Partitioned Caches (HPCs) are a promising architectural feature for reducing the energy consumption of the memory subsystem. However, the energy reduction obtained using HPC architectures is very sensitive to the HPC parameters. It is therefore important to explore the HPC design space and carefully choose the HPC parameters that result in minimum energy consumption for the application. Moreover, since the compiler has a significant impact on the energy consumption of the memory subsystem in HPC architectures, it is extremely important to include the compiler when deciding the HPC design parameters. While there have been no previous approaches to HPC design exploration, existing cache design space exploration methodologies do not include compiler effects during DSE. In this paper, we present a Compiler-in-the-Loop (CIL) Design Space Exploration (DSE) methodology to explore and decide the HPC design parameters. Our experimental results on an HP iPAQ h4300-like memory subsystem running benchmarks from the MiBench suite demonstrate that CIL DSE can discover HPC configurations with up to 80% lower energy consumption than the HPC configuration in the iPAQ. In contrast, traditional simulation-only exploration can discover HPC design parameters that result in only a 57% memory subsystem energy reduction. Finally, our hybrid CIL DSE heuristic saves 67% of the exploration time compared to exhaustive exploration, while providing the maximum possible energy savings on our set of benchmarks. |
Title | Fast, Quasi-Optimal, and Pipelined Instruction-Set Extensions |
Author | *Ajay K. Verma, Philip Brisk, Paolo Ienne (EPFL, Switzerland) |
Page | pp. 334 - 339 |
Keyword | Instruction Set Extension, Integer Linear Programming |
Abstract | Nowadays many customised embedded processors offer the possibility of speeding up an application by implementing it using Application-Specific Functional Units (AFUs). However, the AFUs must satisfy certain constraints on the number of read and write ports between the AFU and the processor register file. Because of these restrictions, the size and complexity of AFUs remain small. Recently, however, some work has been done on relaxing the register file port constraints by serialising register file access (i.e., by allowing multi-cycle reads and writes). This makes the problem of selecting the best AFU significantly more complex. Most previous approaches use a two-stage process to solve this problem, i.e., first selecting AFUs under some higher I/O constraints and then serialising them under the actual register file port constraints. Not only are these methods complex, but they also lead to suboptimal solutions. In this paper we formulate the AFU selection problem as an Integer Linear Programming problem and solve it optimally. We show experimentally that our methodology produces significantly better results than state-of-the-art techniques. |