Title | Retiming for Synchronous Data Flow Graphs |
Author | Nikolaos Liveris, Chuan Lin, Jia Wang, *Hai Zhou (Northwestern University, United States), Prithviraj Banerjee (University of Illinois, Chicago, United States) |
Page | pp. 480 - 485 |
Keyword | SDF, retiming, high-level synthesis |
Abstract | In this paper we present a new algorithm for retiming Synchronous Dataflow (SDF) graphs. The retiming aims at minimizing the cycle length of an SDF. The algorithm is provably optimal and its execution time is improved compared to previous approaches. |
PDF file |
Title | Signal-to-Memory Mapping Analysis for Multimedia Signal Processing |
Author | Ilie I. Luican, Hongwei Zhu, *Florin Balasa (University of Illinois at Chicago, United States) |
Page | pp. 486 - 491 |
Keyword | memory management, signal-to-memory mapping, intra-array mapping |
Abstract | The storage requirements in data-dominant signal processing systems,
whose behavior is described by array-based, loop-organized algorithmic
specifications, have an important impact on the overall energy
consumption, data access latency, and chip area. Finding the optimal
storage of the usually large arrays from these behavioral specifications
is an important step during memory allocation.
This paper proposes more efficient algorithms for two intra-array
mapping-to-memory models (of De Greef and Troncon), resulting in
implementations several times faster than the original ones. |
PDF file |
Title | MODLEX: A Multi Objective Data Layout EXploration Framework for Embedded Systems-on-Chip |
Author | *Rajesh Kumar T. S. (Texas Instruments India, India), Ravikumar C. P. (Texas Instruments, India), Govindarajan R. (Indian Institute of Science, India) |
Page | pp. 492 - 497 |
Keyword | Memory Architecture, Data Layout, Power-performance Trade-off, Genetic Algorithm |
Abstract | The memory subsystem is a major contributor to the performance,
power, and area of complex SoCs used in feature-rich multimedia
products. Hence, the memory architecture of an embedded DSP is complex
and usually custom designed, with multiple banks of single-ported or
dual-ported on-chip scratch-pad memory and multiple banks of off-chip
memory. Building software for such large, complex memories, with many of
the software components delivered as individually optimized software IPs,
is a big challenge. To obtain good performance and reduce memory stalls,
the data buffers of the application need to be placed carefully in the
different types of memory. In this paper we present a unified framework
(MODLEX) that combines different data layout optimizations to address
complex DSP memory architectures. Our method models the data layout
problem as a multi-objective Genetic Algorithm (GA) problem, with
performance and power as the objectives, and presents a set of solution
points, which is attractive from a platform design viewpoint. While most
of the work in the literature assumes that performance and power are
non-conflicting objectives, our work demonstrates that a significant
trade-off (up to 70%) is possible between power and performance. |
PDF file |
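As a rough illustration of what "a set of solution points" means when two objectives conflict, the sketch below filters candidate layouts down to the non-dominated (Pareto-optimal) ones. It is not the MODLEX algorithm: the function name `pareto_front`, the `(cycles, power)` tuple encoding, and the sample numbers are all hypothetical, and lower is assumed to be better on both axes.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is a hypothetical (cycles, power) pair; lower is better
    on both axes. A point q dominates p if q is no worse on both axes
    and q differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


# Made-up candidate data layouts: (execution cycles, power)
layouts = [(100, 9.0), (120, 6.5), (150, 5.0), (130, 7.0), (160, 5.5)]
front = pareto_front(layouts)
print(front)
```

In this made-up data, the surviving points trade cycles against power, which is the kind of choice a platform designer would then make among the reported solutions.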
Title | A Run-Time Memory Protection Methodology |
Author | *Udaya Seshua (Philips Semiconductors, India), Nagaraju Bussa (Philips Research, India), Bart Vermeulen (Philips Research, Netherlands) |
Page | pp. 498 - 503 |
Keyword | memory protection, software debug, Hardware/Software co-design |
Abstract | In this paper we present a novel methodology that aids in debugging memory corruption errors during application development. This methodology is based on the analysis of the memory access behavior of a set of benchmark applications. The analysis result is used to strike an optimal balance between hardware and software instrumentation, making our approach low-cost from both a performance-penalty and a hardware-area point of view. Experimental results show that our innovative approach typically requires less than 2% of CPU silicon area for less than 1% run-time performance overhead, making it applicable in time-constrained embedded systems. |
PDF file |
Title | Short-Circuit Compiler Transformation: Optimizing Conditional Blocks |
Author | *Mohammad Ali Ghodrat, Tony Givargis, Alex Nicolau (University of California, Irvine, United States) |
Page | pp. 504 - 510 |
Keyword | Short circuit evaluation, lazy evaluation, compiler transformation, domain space partitioning |
Abstract | We present the short-circuit code transformation technique, intended for embedded compilers. The transformation technique optimizes conditional blocks in high-level programs. Specifically, the transformation takes advantage of the fact that the Boolean value of the conditional expression, which determines the true/false paths, can be statically analyzed to identify cases in which one or the other of the true/false paths is guaranteed to execute.
In such cases, code is generated to bypass the evaluation of the conditional expression. In instances when the bypass code is faster to evaluate than the conditional expression, a net performance gain is obtained. Our experiments with the Mediabench applications show that the short-circuit transformation yields an average of 35.1% improvement in execution time for SPARC and an average of 36.3% improvement in execution time for ARM. We also measured an average of 36.4% reduction in power consumption for ARM. |
PDF file |
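The bypass idea in the abstract above can be sketched with a small hand-written example. This is only an illustration under stated assumptions, not the authors' compiler transformation: the condition `x*x + y*y < 100`, the function names, and the bounds 10 and 7 are hypothetical choices picked so that the guaranteed-true and guaranteed-false regions are easy to verify by hand.

```python
def expensive_condition(x, y):
    # Stand-in for a costly conditional expression (hypothetical).
    return x * x + y * y < 100


def branch_original(x, y):
    # Untransformed code: always evaluates the full condition.
    return "true-path" if expensive_condition(x, y) else "false-path"


def branch_short_circuit(x, y):
    # Cheap bypass 1: if |x| >= 10 or |y| >= 10, then x*x + y*y >= 100,
    # so the condition is guaranteed false.
    if x >= 10 or x <= -10 or y >= 10 or y <= -10:
        return "false-path"
    # Cheap bypass 2: if |x| <= 7 and |y| <= 7, then x*x + y*y <= 98,
    # so the condition is guaranteed true.
    if -7 <= x <= 7 and -7 <= y <= 7:
        return "true-path"
    # Remaining inputs: fall back to the original condition.
    return "true-path" if expensive_condition(x, y) else "false-path"
```

The transformed branch pays off only when the comparisons in the bypass checks are cheaper than the original expression and the inputs fall into the bypass regions often enough, which matches the abstract's caveat about when a net gain is obtained.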