DAC 2001 Abstracts
Session 1 (Panel)
Chair and Organizer: Rita Glover
Panel Members: Marc Halpern, Rich Becks, Richard Kubin, Henry Jurgens,
Rick Cassidy, Ted Vucurevich
-
-
The makeup and relationships within the design supply chain are
changing rapidly. One-stop shopping, whereby a system house
procures most of its design and fabrication services from a single
source, is now being supplanted by a host of outsourcing
suppliers. These players are developing online integration platforms
for outsourced design engineering, business transactions,
and data management.
This evolution is being felt by the entire design automation
industry, whether it be focused on the electronic design of chips,
boards, or systems, or even the mechanical design of their
packaging and housing. What are the implications of this trend on
cost, demand for services, and time to market? How can all
parties ride this wave of the future to achieve the best outcome in
terms of collaborative design capabilities and quality of results?
Session 2
Chair: Andrew B. Kahng
Organizers: Andrew B. Kahng, Kurt Keutzer
-
2.1 Future Performance Challenges in Nanometer Design [p. 3]
-
Dennis Sylvester, Himanshu Kaul
We highlight several fundamental challenges to designing high performance
integrated circuits in nanometer-scale technologies (i.e.
drawn feature sizes < 100 nm). Dynamic power scaling trends lead to
major packaging problems. To alleviate these concerns, thermal monitoring
and feedback mechanisms can limit worst-case dissipation and
reduce costs. Furthermore, a flexible multi-Vdd +
multi-Vth + re-sizing
approach is advocated to leverage the inherent properties of ultra-small
MOSFETs and limit both dynamic and static power. Alternative
global signaling strategies such as differential and low-swing drivers
are recommended in order to curb the power requirements of cross-chip
communication. Finally, potential power delivery challenges are
addressed with respect to ITRS packaging predictions.
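As a rough illustration of why a combined multi-Vdd + multi-Vth + re-sizing
approach attacks both power components (a standard first-order model, not a
formula from the paper):

    $$ P_{total} = \underbrace{\alpha C V_{dd}^{2} f}_{dynamic}
       + \underbrace{V_{dd} I_{leak}}_{static}, \qquad
       I_{leak} \propto e^{-V_{th}/(n v_T)} $$

Lowering Vdd on non-critical paths reduces dynamic power quadratically, while
raising Vth there suppresses subthreshold leakage exponentially; re-sizing then
recovers timing where the slower supply and threshold choices would otherwise
violate constraints.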
-
2.2 IC Design in High-Cost Nanometer-Technologies Era [p. 9]
-
Wojciech Maly
Nanometer IC technologies are on the horizon. They promise a
lot. They will cost a lot as well. Therefore, we need to ask
today: how may the billions of dollars that we will need to
spend on nanometer fablines affect the IC design domain? This
paper attempts to address this question by analyzing the
design-manufacturing interface. A partial answer is derived
from a simple transistor cost model proposed in the body of
the paper.
Keywords
Design of ICs, cost modeling of IC design and manufacturing,
IC technology trends, nanometer-technologies, and IC layout
regularity.
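For intuition about how fab cost reaches the design domain, a first-order
cost-per-transistor estimate (an illustrative identity, not the model proposed
in the paper) is:

    $$ C_{transistor} \approx
       \frac{C_{fab}/N_{wafers} + C_{processing}}
            {N_{dies/wafer} \cdot Y \cdot N_{transistors/die}} $$

With multi-billion-dollar fab lines amortized in the numerator, yield Y and
transistor density, both of which layout regularity influences, become
first-order design variables rather than manufacturing afterthoughts.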
Session 3
Chair: Pieter van der Wolf
Organizers: Kees Vissers, Kurt Keutzer
-
3.1 LOTTERYBUS: A New High-Performance Communication Architecture for System-on-Chip Designs [p. 15]
-
Kanishka Lahiri, Anand Raghunathan, Ganesh Lakshminarayana
This paper presents LOTTERYBUS, a novel high-performance
communication architecture for system-on-chip (SoC) designs. The
LOTTERYBUS architecture was designed to address the following
limitations of current communication architectures: (i) lack of control
over the allocation of communication bandwidth to different system
components or data flows (e.g., in static priority based shared
buses), leading to starvation of lower priority components in some
situations, and (ii) significant latencies resulting from variations in
the time-profile of the communication requests (e.g., in time division
multiplexed access (TDMA) based architectures), sometimes leading
to larger latencies for high-priority communications.
We present two variations of LOTTERYBUS: the first is a low
overhead architecture with statically configured parameters, while
the second variant is a more sophisticated architecture, in which values
of the architectural parameters are allowed to vary dynamically.
Our experiments investigate the performance of the LOTTERYBUS
architecture across a wide range of communication traffic characteristics.
In addition, we also analyze its performance in a 4x4
ATM switch sub-system design. The results demonstrate that the
LOTTERYBUS architecture is (i) capable of providing the designer
with fine grained control over the bandwidth allocated to each SoC
component or data flow, and (ii) well suited to provide high priority
communication traffic with low latencies (we observed up to
85.4% reduction in communication latencies over conventional on-chip
communication architectures).
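For readers unfamiliar with lottery-based arbitration, the idea can be
sketched in a few lines: each master is assigned a number of lottery tickets,
and in every arbitration cycle a winner is drawn among the currently pending
requests with probability proportional to its ticket count, giving
proportional bandwidth sharing without starvation. The sketch below is
illustrative only; names and structure are ours, not the authors'
implementation:

    import random

    def lottery_arbitrate(pending, tickets):
        """Grant the bus to one pending master; the probability of
        winning is proportional to the master's ticket count.
        Assumes pending is non-empty and tickets are positive."""
        pool = [(m, tickets[m]) for m in pending]
        total = sum(t for _, t in pool)
        draw = random.randrange(total)       # uniform in [0, total)
        running = 0
        for master, t in pool:
            running += t
            if draw < running:
                return master

    # Example: with tickets 4:2:2, master 0 wins about half of the
    # rounds in which all three masters are requesting.
    grants = [lottery_arbitrate([0, 1, 2], {0: 4, 1: 2, 2: 2})
              for _ in range(10000)]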
-
3.2 Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols [p. 21]
-
Tiberiu Chelcea, Steven M. Nowick
This paper presents several low-latency mixed-timing FIFO designs
that interface systems on a chip working at different speeds.
The connected systems can be either synchronous or asynchronous.
The designs are then adapted to work between systems with very
long interconnection delays, by migrating a single-clock solution
by Carloni et al. (for "latency-insensitive" protocols) to mixed-timing
domains. The new designs can be made arbitrarily robust
with regard to metastability and interface operating speeds. Initial
simulations for both latency and throughput are promising.
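A standard building block for such mixed-timing interfaces (generic practice,
not necessarily the exact circuits in the paper) is to Gray-code the FIFO
read/write pointers before synchronizing them across clock domains, so that
only one bit changes per increment and a pointer sampled mid-transition still
resolves to either the old or the new valid value:

    def bin_to_gray(x: int) -> int:
        # Adjacent binary values map to Gray codes differing in one bit.
        return x ^ (x >> 1)

    def gray_to_bin(g: int) -> int:
        # Inverse mapping: fold the XOR prefix back down.
        b = 0
        while g:
            b ^= g
            g >>= 1
        return b

    # Sanity check: consecutive pointers differ in exactly one bit.
    assert all(bin(bin_to_gray(i) ^ bin_to_gray(i + 1)).count("1") == 1
               for i in range(255))

Full/empty comparisons are then done on synchronized Gray pointers, with
multi-flop synchronizers bounding (but never eliminating) metastability risk,
which is why the abstract speaks of designs that can be made arbitrarily
robust rather than absolutely safe.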
-
3.3 Latency-Driven Design of Multi-Purpose Systems-On-Chip [p. 27]
-
Seapahn Meguerdichian, Milenko Drinic, Darko Kirovski
Deep submicron technology has two major ramifications for
the design process: (i) critical paths are being dominated by global
interconnect rather than gate delays and (ii) ultra high levels of
integration mandate designs that encompass numerous intrasynchronous
blocks with decreased functional granularity and
increased communication demands. These factors emphasize the
importance of the on-chip bus network as the crucial high performance
enabler for future systems-on-chip. By using independent
functional blocks with programmable connectivity, designers
are able to build systems-on-chip capable of supporting
different applications with exceptional levels of resource sharing.
To address challenges in this design paradigm, we have developed
a methodology that enables efficient bus network design with approximate
timing verification and floorplanning of multi-purpose
systems-on-chip in early design stages. The design platform iterates
system synthesis and floorplanning to build min-area floorplans
that satisfy statistical time constraints of applications. We
demonstrate the effectiveness of our bus network design approach
using examples from a multimedia benchmark suite.
-
3.4 Estimation of Speed, Area, and Power of Parameterizable, Soft IP [p. 31]
-
Jagesh Sanghavi, Albert Wang
We present a new approach to estimate speed, area, and power of a
parameterizable, soft IP. By running the ASIC implementation flow
only on selected configurations, we predict the performance for any
arbitrary configuration. We exploit performance function decomposability
to address the combinatorial explosion challenge. The estimator
has been used successfully to configure Xtensa processor cores
for numerous embedded SOC designs.
Session 4
Chair: Masahiro Fujita
Organizers: Limor Fix, Timothy Kam
-
4.1 Formal Property Verification by Abstraction Refinement with Formal, Simulation and Hybrid Engines [p. 35]
-
Dong Wang, Pei-Hsin Ho, Jiang Long, James Kukula, Yunshan Zhu, Tony Ma, Robert
Damiano
We present RFN, a formal property verification tool based on
abstraction refinement. Abstraction refinement is a strategy for
property verification. It iteratively refines an abstract model to
better approximate the behavior of the original design in the
hope that the abstract model alone will provide enough evidence
to prove or disprove the property.
However, previous work on abstraction refinement was only
demonstrated on designs with up to 500 registers. We developed
RFN to verify real-world designs that may contain thousands of
registers. RFN differs from the previous work in several ways.
First, instead of relying on a single engine, RFN employs
multiple formal verification engines, including a BDD-ATPG
hybrid engine and a conventional BDD-based fixpoint engine,
for finding error traces or proving properties on the abstract
model. Second, RFN uses a novel two-phase process involving
3-valued simulation and sequential ATPG to determine how to
refine the abstract model. Third, RFN avoids a weakness of
other abstraction-refinement algorithms, namely finding error traces
on the original design, by utilizing the error trace of the abstract
model to guide sequential ATPG to find an error trace on the
original design.
We implemented and applied a prototype of RFN to verify
various properties of real-world RTL designs containing
approximately 5,000 registers, which represents an order of
magnitude improvement over previous results. On these designs,
we successfully proved a few properties and discovered a design
violation.
-
4.2 Scalable Hybrid Verification of Complex Microprocessors [p. 41]
-
Maher Mneimneh, Fadi Aloul, Chris Weaver, Saugata Chatterjee, Karem Sakallah,
Todd Austin
We introduce a new verification methodology for modern microprocessors
that uses a simple checker processor to validate the execution
of a companion high-performance processor. The checker
can be viewed as an at-speed emulator that is formally verified to be
compliant with an ISA specification. This verification approach enables
the practical deployment of formal methods without impacting
overall performance.
-
4.3 Symbolic RTL Simulation [p. 47]
-
Alfred Kölbl, James Kukula, Robert Damiano
Symbolic simulation is a promising formal verification technique
combining the flexibility of conventional simulation with powerful
symbolic methods. Unfortunately, existing symbolic simulators
are restricted to gate level simulation or handle just a synthesizable
subset of an HDL. Simulation of systems composed of design,
testbench and correctness checkers, however, requires the complete
set of HDL constructs. We present an approach that enables
symbolic simulation of the complete set of RT-level Verilog constructs
with full delay support. Additionally, we propose a flexible
scheme for introducing symbolic variables and demonstrate how
error traces can be simulated with this new scheme. Finally, we
present some experimental results on an 8051 micro-controller design
which prove the effectiveness of our approach.
Session 5
Chair: Yervant Zorian
Organizers: Anand Raghunathan, Irith Pomeranz
-
5.1 A Unified DFT Architecture for Use with IEEE 1149.1 and VSIA/IEEE P1500 Compliant Test Access
Controllers [p. 53]
-
Bulent I. Dervisoglu
This paper discusses some of the critical issues that may prevent
IEEE P1500 from becoming an acceptable standard and offers
some suggestions for their solution. In particular, the inadequacy
of the proposed P1500 and the VSIA solutions in handling
hierarchical implementations is addressed. Support for
hierarchical implementations is seen as an essential feature in a
test access methodology that is intended for use in System on a
Chip (SoC) designs. The author is actively pursuing some of these
solutions through the working groups.
-
5.2 Instruction-Level DFT for Testing Processor and IP Cores in System-on-a-Chip [p. 59]
-
Wei-Cheng Lai, Kwang-Ting (Tim) Cheng
Self-testing manufacturing defects in a system-on-a-chip
(SOC) by running test programs on a programmable
core has several potential benefits, including at-speed testing,
low DfT overhead due to the elimination of dedicated test
circuitry, and better power and thermal management during
testing. However, such a self-test strategy might require a
lengthy test program and might not achieve a high enough
fault coverage. We propose a DfT methodology to improve
the fault coverage and reduce the test program length, by
adding test instructions to an on-chip programmable core
such as a microprocessor core. This paper discusses a
method of identifying effective test instructions that yield the
highest benefit at low area/performance overhead.
The experimental results show that with the added
test instructions, a complete fault coverage for testable path
delay faults can be achieved with a greater than 20%
reduction in the program size and the program runtime, as
compared to the case without instruction-level DfT.
-
5.3 Test Strategies for BIST at the Algorithmic and Register-Transfer Levels [p. 65]
-
Kelly A. Ockunzzi, Chris Papachristou
The proposed BIST-based DFT method targets testability
problems caused by three constructs. The first construct is reconvergent
fanout in a circuit behavior, which causes correlation. The
second construct, control statements, also causes correlation, and
relational operations degrade observability. The third construct is
random-pattern-resistant RTL modules, which cannot be tested
effectively with random patterns. Test strategies are presented that
overcome the testability problems by modifying the circuit behavior.
An analysis and insertion scheme that systematically identifies
the problems and applies the strategies is described. Experimental
results from seven examples show that this scheme improves fault
coverage while minimizing the impact on area and critical delay.
Categories and Subject Descriptors: B.8.1 [Performance and
Reliability]: Reliability, Testing, and Fault-Tolerance.
General Terms: Design, Theory
Keywords: design-for-test, built-in self-test, test synthesis
Session 6 (Panel)
Chair: Rajesh K. Gupta
Organizers: Shishpal Rawat, Ingrid Verbauwhede
Panel Members: Gerard Berry, Ramesh Chandra, Daniel Gajski, Kris Konigsfeld,
Partick Schaumont, Ingrid Verbauwhede
-
-
The focus of this panel is on issues surrounding the use of C++ in
modeling, integration of silicon IP and system-on-chip designs. In
the last two years there have been several announcements promoting
C++ based solutions and of multiple consortia (SystemC,
Cynapps, Accellera, SpecC) that represent increasing commercial
interest both from tool vendors as well as perhaps expression of
genuine needs from the design houses. There are, however, serious
questions about what value proposition a C++-based design
methodology brings to the IC or system designer. What has
changed in the modeling technology (and/or available tools) that
gives a new capability? Is synthesis the right target, or validation?
Tester modeling or testbench generation? This panel brings
together advocates and opponents from the user community to
highlight the achievements and the challenges that remain in the
use of C++ for microelectronic circuits and systems.
Session 7
Chair: Robert C. Clark
Organizers: Andrew B. Kahng, Lars Liebmann
-
7.1 Reticle Enhancement Technology: Implications and Challenges for Physical
Design [p. 73]
-
W. Grobman, M. Thompson, R. Wang, C. Yuan, R. Tian, E. Demircan
In this paper, we review phase shift lithography, rule vs. model
based methods for OPC and model-based tiling, and discuss their
implications for layout and verification. We will discuss novel
approaches, using polarizing films on reticles, which change the
game for phase-shift coloring, and could lead to a new direction in
c:PSM constraints on physical design. We emphasize the need to
do tiling that is model-driven and uses optimization techniques to
achieve planarity for better manufacturing tolerance in the
subwavelength dimensions era. Electromagnetic solver results will
be presented which estimate the effect of tiling on circuit timing.
Keywords
RET, Reticle Enhancement Technology, subwavelength
lithography, mask data preparation, OPC, PSM, optical proximity
correction, tiling.
-
7.2 Enabling Alternating Phase Shifted Mask Designs for a Full Logic Gate Level:
Design Rules and Design Rule Checking [p. 79]
-
Lars Liebmann, Jennifer Lund, Fook-Luen Heng, Ioana Graur
The International Technology Roadmap for Semiconductors lists
F2 (λ = 157 nm) optical lithography and extreme ultraviolet next
generation lithography as the two most feasible lithography
solutions for the 70 nm technology node. It is likely that both of
these solutions will be late, forcing ArF (λ = 193 nm) lithography
to operate at unprecedented resolution levels. Theoretically,
alternating phase shifted masks ("altPSM") can achieve the
resolution required to manufacture 70 nm logic products with ArF
lithography equipment, but technical and logistical challenges
associated with the broad implementation of altPSM require novel
and invasive EDA solutions which have caused the industry to shy
away from altPSM in the past. One of the biggest such challenges
is the creation of robust design rule checking (DRC) tools which
can predict whether a given layout has a valid, manufacturable
altPSM solution. This paper takes a detailed look at the technical
and practical issues associated with altPSM design rules and DRC.
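One core ingredient of altPSM DRC is a phase-assignment feasibility check:
critical features spaced closer than the minimum pitch must receive opposite
phases, which makes phase assignment a 2-coloring of a conflict graph, and a
layout is colorable exactly when that graph has no odd cycle. A minimal
sketch of that check (the conflict-graph construction and the design rules in
the paper are far richer):

    from collections import deque

    def phase_assignment(features, conflicts):
        """2-color the phase-conflict graph by BFS.
        conflicts: feature -> iterable of features that need the
        opposite phase (adjacency assumed symmetric). Returns a
        {feature: 0 or 1} map, or None on an odd cycle (an
        unresolvable phase conflict)."""
        color = {}
        for seed in features:
            if seed in color:
                continue
            color[seed] = 0
            queue = deque([seed])
            while queue:
                u = queue.popleft()
                for v in conflicts.get(u, ()):
                    if v not in color:
                        color[v] = color[u] ^ 1
                        queue.append(v)
                    elif color[v] == color[u]:
                        return None   # odd cycle: no valid coloring
        return color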
-
7.3 Layout Design Methodologies for Sub-Wavelength Manufacturing [p. 85]
-
Michael L. Rieger, Jeffrey P. Mayhew, Sridhar Panchapakesan
In this paper, we describe new types of layout design constraints
needed to effectively leverage advanced optical wafer lithography
techniques. Most of these constraints are dictated by the physics
of advanced lithography processes, while other constraints are
imposed by new photomask techniques. Among the methods
discussed are 1) phase shift mask (PSM) lithography in which
phase information is placed on the photomask in combination
with conventional clear and dark information; 2) optical proximity
correction (OPC) where predictable distortions in feature geometry
are corrected by putting an inverse distortion on the mask; 3)
off-axis illumination optics that improve resolution of some configurations
at the expense of others; and 4) the use of non-resolving
assist features that improve the printing of neighboring structures.
Keywords
optical proximity correction, OPC, phase shift mask, PSM, lithography.
-
7.4 Adoption of OPC and the Impact on Design and Layout [p. 89]
-
F. M. Schellenberg, Olivier Toublan, Luigi Capodieci, Bob Socha
With the adoption of various combinations of resolution
enhancement techniques (RET) for IC lithography, different
process constraints are placed on the IC layout. The final layout
used for mask production is dramatically different from the
original designer's intent. To ensure that EDA tools developed for
applying RET techniques can have optimal performance, layout
methodology must change to create a true "target" layer that
represents the actual design intent. Verification of the final layout
is then expanded from LVS and DRC to also include lithography
process simulation, which compares results to this desired "target"
and governs the application of RET.
General Terms
Algorithms, Design, Reliability, Standardization, Verification.
Keywords
RET, OPC, PSM, SRAF, scattering bars, lithography, phase-shifting,
OAI, off-axis illumination, Quasar, quadrupole.
-
7.5 A Practical Application of Full-Feature Alternating Phase-Shifting Technology
for a Phase-Aware Standard-Cell Design Flow [p. 93]
-
Michel Côté, Philippe Hurat, Vinod Malhotra, Michael Sanie
As the semiconductor industry enters the subwavelength era
where silicon features are much smaller than the wavelength of
the light used to create them, a number of "subwavelength"
technologies such as Optical Proximity Correction (OPC) and
Phase-Shifting Masks (PSM) have been introduced to produce
integrated circuits (ICs) with acceptable yields. An effective
approach to subwavelength IC production includes a combination
of these techniques, including OPC and PSM. Nevertheless, as
we approach silicon features of 0.10 µm and below, Alternating
PSM (AltPSM) becomes a critical part of the technology portfolio
needed to achieve IC requirements. An effective EDA
methodology that generates AltPSM ICs must guarantee correct
generation of AltPSM layouts, maintain or improve today's design
productivity, and leverage existing tools and flows. The
implementation of such a methodology becomes more complex as
phase shifting is applied to all critical features, including those
outside of transistor gates. In this paper, we present a
methodology targeted for standard-cell or structured-custom
design styles. We also present examples of designs that start from
standard-cells created in a manner in which all issues regarding
generation of AltPSM are effectively considered, and are used in a
typical cell-based (synthesis-Automatic Place & Route) flow to
produce design layouts that are ready for cost-effective silicon
manufacturing.
Session 8
Chair: Yusuke Matsunaga
Organizers: Timothy Kam, Masahiro Fujita
-
8.1 Layout-Driven Hot-Carrier Degradation Minimization Using Logic Restructuring Techniques [p. 97]
-
Chih-Wei (Jim) Chang, Kai Wang, Malgorzata Marek-Sadowska
The rapid advances in semiconductor manufacturing technology
have created tough reliability problems. Failure mechanisms such
as hot-carrier effect, dielectric breakdown, electrostatic discharge
and electromigration have posed tremendous threats to the long-term
reliability of VLSI circuits. As a result, designers not only
need analysis tools to locate the problem, but also design-for-reliability
tools to correct it. However, these problems often surface
when the physical layout is done and relatively few logic changes
can be made. In this paper, we target the performance optimization
issues in the context of hot-carrier induced degradation. A layout
driven approach combining rewiring, discrete gate resizing, and pin
reordering is proposed. Experimental results show that rewiring-based
incremental logic restructuring is a very powerful technique
in post-layout design for reliability.
-
8.2 An Algorithm for Bi-Decomposition of Logic Functions [p. 103]
-
Alan Mishchenko, Bernd Steinbach, Marek Perkowski
We propose a new BDD-based method for decomposition of
multi-output incompletely specified logic functions into netlists of
two-input logic gates. The algorithm uses the internal don't-cares
during the decomposition to produce compact well-balanced
netlists with short delay. The resulting netlists are provably nonredundant
and facilitate test pattern generation. Experimental
results over MCNC benchmarks show that our approach
outperforms SIS and other BDD-based decomposition methods in
terms of area and delay of the resulting circuits with comparable
CPU time.
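For reference, a bi-decomposition step rewrites a function over two variable
subsets joined by one two-input gate; e.g., an OR-bi-decomposition has the
form

    $$ f(X_A, X_B) = f_A(X_A) \lor f_B(X_B), $$

as in $f = ab \lor cd$ with $X_A = \{a, b\}$ and $X_B = \{c, d\}$. Applying
AND-, OR-, and XOR-decompositions of this shape recursively, with don't-cares
enlarging the set of functions that decompose, yields the two-input-gate
netlists the abstract describes. (This is the standard definition; the
algorithmic details are the paper's contribution.)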
-
8.3 Factoring and Recognition of Read-Once Functions using Cographs and Normality [p. 109]
-
Martin C. Golumbic, Aviad Mintz, Udi Rotics
An approach for factoring general Boolean functions, based on
graph partitioning algorithms, was described in [5].
In this paper, we present a very fast algorithm
for recognizing and factoring read-once functions which is
needed as a dedicated factoring subroutine to handle the
lower levels of that factoring process. The algorithm is based
on algorithms for cograph recognition and on checking normality.
Our method has been implemented in the SIS environment,
and an empirical evaluation is given.
Categories and Subject Descriptors
J.6 [Computer Application]: Computer Aided Engineering;
I.1.2 [Computing Methodologies]: Symbolic and Algebraic Manipulation
General Terms Factoring Algorithms
Keywords Read-Once Functions, Factoring, Normality, Cograph
-
8.4 Logic Minimization using Exclusive OR Gates [p. 115]
-
Valentina Ciriani
Recently introduced pseudoproducts and Sum of Pseudoproducts (SPP) forms have
made it possible to represent Boolean functions with much shorter expressions than
standard Sum of Products (SP) forms [5]. A pseudoproduct is a product
(AND) of Exclusive OR (EXOR) factors, and an SPP form is a sum (OR) of
pseudoproducts. The synthesis of minimal SPP forms requires greater effort
than SP minimization. In this paper we present a new data structure for this
problem, leading to an efficient minimization method for SPP forms implemented
with an exact algorithm and a heuristic. Experimental results on a
classical set of benchmarks show that the new algorithms are fast, and can be
applied to "complex" functions with a reasonable running time.
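A small worked example shows the kind of compaction SPP forms allow (a
standard illustration, not taken from the paper): the function
$f = (a \oplus b)(c \oplus d)$ is a single pseudoproduct of two EXOR factors,
whereas its minimal SP form needs four products:

    $$ (a \oplus b)(c \oplus d)
       = \bar{a}b\bar{c}d + \bar{a}bc\bar{d}
       + a\bar{b}\bar{c}d + a\bar{b}c\bar{d}. $$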
Session 9
Chair: Alper Demir
Organizers: Joel Phillips, Noel Menezes
-
9.1 Design of Half-Rate Clock and Data Recovery Circuits for Optical Communication Systems [p. 121]
-
Jafar Savoj, Behzad Razavi
This paper describes the design of two half-rate clock and
data recovery circuits for optical receivers. Targeting a
data rate of 10 Gb/s, the first implementation incorporates
a ring oscillator and a linear phase detector, whereas the
second implementation uses a multiphase LC oscillator and
a bang-bang phase/frequency detector. Fabricated in 0.18-µm
CMOS technology, the power consumption of each of the
circuits is less than 100 mW. The rms jitter of the output
clock for the two prototypes is 1 ps and 0.8 ps, respectively,
while the latter achieves a capture range of more than 14%.
-
9.2 A Novel Method for Stochastic Nonlinearity Analysis of a CMOS Pipeline ADC [p. 127]
-
David Goren, Eliyahu Shamsaev, Israel A. Wagner
An analytic approach is presented for estimating the nonlinearity of an
analog to digital converter (ADC) as a function of the variations in the
circuit devices. The approach is demonstrated for the case of a pipeline
ADC with digital error correction. Under some mild assumptions on the
expected variations, the error probability is expressed as a simple
explicit function of the standard deviations in the components' parameters:
gain errors, comparator offset errors and resistor errors. The analytical
expression is verified for Integral Non Linearity (INL), and its limits are
studied using Monte-Carlo simulations of a 10 bit pipeline ADC structure.
-
9.3 Behavioral Partitioning in the Synthesis of Mixed Analog-Digital Systems [p. 133]
-
Sree Ganesan, Ranga Vemuri
Synthesis of mixed-signal designs from behavioral specifications
must address analog-digital partitioning. In this paper, we investigate
the issues in mixed-signal behavioral partitioning and design
space exploration for signal-processing systems. We begin with
the system behavior specified in an intermediate format called the
Mixed Signal Flow Graph, based on the time-amplitude characterization
of signals. We present techniques for analog-digital behavioral
partitioning of the MSFG, and performance estimation of the
technology-mapped analog and digital circuits. The partitioned solution
must satisfy constraints imposed by the target field-programmable
mixed-signal architecture on available configurable resources,
available data converters (their resolution and speed), and
I/O pins. The quality of the solution is evaluated based on two metrics,
namely feasibility and performance. The former is a measure
of the validity of the solution with respect to the architectural constraints.
The latter measures the performance of the system based
on bandwidth/speed and noise.
-
9.4 Efficient DDD-based Symbolic Analysis of Large Linear Analog Circuits [p. 139]
-
Wim Verhaegen, Georges Gielen
A new technique for generating approximate symbolic expressions
for network functions in linear(ized) analog circuits is presented. It
is based on the compact determinant decision diagram (DDD) representation
of the circuit. An implementation of a term generation
algorithm is given and its performance is compared to a matroid-based
algorithm. Experimental results indicate that our approach is
the fastest reported algorithm so far for this application.
Session 10
Chair: T. M. Mak
Organizers: Anand Raghunathan, Tim Cheng
-
10.1 Random Limited-Scan to Improve Random Pattern Testing of Scan Circuits [p. 145]
-
Irith Pomeranz
We propose a method of random pattern generation for at-speed
testing of circuits with scan. The proposed method uses limited
scan operations to achieve complete fault coverage. Under a limited
scan operation, the circuit state is shifted by a number of
positions which may be smaller than the number of state variables.
Limited scan operations are inserted randomly to ensure
that the complete test set can be generated by a random pattern
generator with simple control logic.
-
10.2 Test Volume and Application Time Reduction Through Scan Chain Concealment [p. 151]
-
Ismet Bayraktaroglu, Alex Orailoglu
A test pattern compression scheme is proposed in order to reduce
test data volume and application time. The number of scan chains
that can be supported by an ATE is significantly increased by utilizing
an on-chip decompressor. The functionality of the ATE is
kept intact by moving the decompression task to the circuit under
test. While the number of virtual scan chains visible to the ATE
is kept small, the number of internal scan chains driven by the decompressed
pattern sequence can be significantly increased.
-
10.3 An Approach to Test Compaction for Scan Circuits that Enhances At-Speed Testing [p. 156]
-
Irith Pomeranz, Sudhakar M. Reddy
We propose a new approach to the generation of compact test
sets for scan circuits. Compaction refers here to a reduction in
the test application time. The proposed procedure generates an
initial test set that is likely to have a low test application time. It
then applies an existing static compaction procedure to this initial
test set to further compact it. As a by-product, the proposed
procedure also results in long primary input sequences, which
are applied at-speed. This contributes to the detection of delay
defects. We demonstrate through experimental results the advantages
of this approach over earlier ones as a method for generating
test sets with minimal test application time and long primary
input sequences.
-
10.4 Generating Efficient Tests for Continuous Scan [p. 162]
-
Sying-Jyan Wang, Sheng-Nan Chiou
Conventional scan-based designs spend a lot of testing time in
shifting test patterns and output responses, which greatly increases
the testing cost. In this paper, we propose a modified approach for
scan-based design in which a test is conducted in every clock
cycle. This approach may significantly reduce the test application
time when appropriate test vectors are applied. We develop
algorithms to generate efficient test input for the test environment,
and experimental results show that we can achieve high fault
coverage with only about 10%-30% of the clock cycles required
in conventional scan-based design.
Keywords
Scan, DFT, Test Generation, Compression.
-
10.5 Combining Low-Power Scan Testing and Test Data Compression for System-on-a-Chip [p. 166]
-
Anshuman Chandra, Krishnendu Chakrabarty
We present a novel technique to reduce both test data volume and scan
power dissipation using test data compression for system-on-a-chip testing.
Power dissipation during test mode using ATPG-compacted test patterns is
much higher than during functional mode. We show that Golomb coding of precomputed
test sets leads to significant savings in peak and average power, without
requiring either a slower scan clock or blocking logic in the scan cells.
We also improve upon prior work on Golomb coding by showing that a separate
cyclical scan register is not necessary for pattern decompression. Experimental
results for the larger ISCAS 89 benchmarks show that reduced test data
volume and low power scan testing can indeed be achieved in all cases.
Keywords Embedded core testing, Golomb codes, precomputed test sets,
scan testing, switching activity, test set encoding.
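For background, Golomb coding compresses the long runs of 0s typical of
don't-care-filled precomputed scan data: each run terminated by a 1 is split,
using a group size m, into a unary quotient and a fixed-width remainder. A
sketch, assuming m is a power of two and every run ends in a 1:

    def golomb_encode(bits, m=4):
        """Encode each 0-run (terminated by a 1) of a serialized test
        set as a Golomb codeword: unary quotient, '0' separator, then
        a log2(m)-bit binary remainder."""
        tail = m.bit_length() - 1            # remainder width in bits
        words, run = [], 0
        for b in bits:
            if b == 0:
                run += 1
                continue
            q, r = divmod(run, m)            # unary part, binary part
            words.append("1" * q + "0" + format(r, f"0{tail}b"))
            run = 0
        return words

    # Runs of length 3 and 7 with m = 4 encode as '011' and '1011'.
    assert golomb_encode([0,0,0,1, 0,0,0,0,0,0,0,1]) == ["011", "1011"]

Longer runs get proportionally shorter codewords relative to their length,
which is where both the data-volume and the scan-transition savings come from.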
Session 11 (Panel)
Chairs: Gabe Moretti, Tim Hopes, Ramesh Narayanaswamy
Organizers: Nanette Collins, Dave Kelf, Gabe Moretti,
Tim Hopes, Ramesh Narayanaswamy
Panel Members: Tom Anderson, Janick Bergeron, Ashish Dixit, Peter Flake,
Tim Hopes, Ramesh Narayanaswamy
-
-
As the popularity of reusing existing designs - or Intellectual
Property (IP) - continues to grow, design challenges escalate. The
most time-consuming and critical part of IP design and reuse is
verifying that it will work as it was designed to and as the user
intends. Designers are pushing the limits of IP for new, distinctive
and innovative applications. With this innovation come problems
that need creative solutions. Product verification, for example, will
become more and more important in ensuring the correctness of the
design. Over the years, various solutions have come on the market,
all seemingly useful, but none reducing the time or manpower it
takes to verify the design. With designs becoming increasingly more
complex with each new project and verification consuming up to 70
percent of a design cycle, something must be done to alleviate the
bottleneck.
Most of today's verification techniques rely on old simulation and
emulation technologies, combined with add-on products designed to
target specific functional items, driven by the increased
importance of the functionality they provide. These environments
have led to an overall degradation in productivity, with a decrease in
tool speed and a sharp rise in learning-curve and installation issues.
In addition, interactions between add-on products created in isolation
lead to further complications, usually discovered as the products are
incorporated into design flows.
An improved verification flow is required to provide high-level
productivity improvement over the entire design. The larger and
more complex the design, the higher the probability of errors slipping
through the verification process, making System-On-Chip (SOC)
devices the most vulnerable. With the integration of entire systems
within single chips, the need to test hardware and software before
the circuitry is produced, within as natural an environment as
possible, is critical to ensuring design success.
The most important aspect in the selection and verification of IP is
the collaboration of the vendor and the foundry. Designers need to
be able to evaluate the core before a final selection is made. The
evaluation should not just use the testbench provided by the vendor,
but should provide an indication of the behavior in the intended use
environment. In parallel, the designer must look at fabrication
options by obtaining the core's fabrication profile. How many foundries
have certified the core? How many times has the core been used in
previous designs from each foundry? And, more importantly, is the
core certified by the foundry chosen for the ASIC under
development? With microprocessor or microcontroller IP, the
designer might have projected the use of an off-the-shelf RTOS. In
this case, it is imperative to make sure that either the IP or the RTOS
has been used successfully already, or that the software vendor and
the hardware vendor are committed to ensuring proper integration in a
timely manner.
Most of this work is tedious and costly because it must take place
before final selection and business negotiations can take place.
Once the IP has been chosen, user and vendor must work as a team,
since bugs can be found when a new set of tests and a new
use methodology become available. Those who assume that IP commerce
is similar to standard parts commerce are mistaken and are apt to
encounter serious obstacles to IP integration.
An SOC must be verified at various levels of abstraction: functional
to check the correctness of the system design, RTL to ensure that what is
going to be synthesized is correct, and gate level for timing accuracy and
possibly to check signal integrity and power consumption. Many
verification improvements have been proposed which target specific
parts of the design flow. The panel will look at various alternatives
for IP verification today, including universal simulation, hardware
acceleration, formal verification and semi-formal verification.
Functional verification, used to verify that the implementation meets
the specification, requires a significant amount of time because of
the large amount of test vectors required to meaningfully test all of
the functionality of a complex system, including the interfaces
among the various logical blocks that make up the system.
Traditional HDL simulation falls short of the required capabilities
when Verilog is used because the language lacks appropriate
behavioral constructs. Although VHDL provides the required
constructs, its popularity is more limited, so C based dialects or new
languages are required. Being able to develop a testbench is only
the first of many obstacles. The amount of time required to execute
a full suite of functional simulations is significant, even as processor
speeds continue to increase. Simulation farms are becoming a
popular tool to decrease functional verification time.
Moving from behavioral to RTL code is the step in the methodology
that has the least amount of tool support. Verifying the RTL
implementation of a system is crucial. Given the size of today's
designs, traditional simulation methods are running out of steam.
Formal verification and hardware acceleration methods are being
adopted and improved, respectively. EDA tools implementing
formal methods have seen a resurgence in the last year, due to
improved user interfaces as well as algorithms implemented to prove
the equivalence between an RTL and a gate-level representation of a
system. Emulation is also becoming a more popular method to
verify gate-level implementations, although the initial cost of the
equipment is still quite high.
Panelists, experienced designers and representatives of EDA tools
providers, as well as IP providers, will explore ways to beat the
verification bottleneck and to identify the methodology best suited
for IP design. They will attempt to answer the question, "What
methodology works best for IP Design?"
Session 12
Chair: Diederik Verkest
Organizers: Diederik Verkest, Kurt Keutzer
-
12.1 A Quick Safari Through the Reconfiguration Jungle [p. 172]
-
Patrick Schaumont, Ingrid Verbauwhede, Kurt Keutzer, Majid Sarrafzadeh
Cost effective systems use specialization to optimize factors such as
power consumption, processing throughput, flexibility or
combinations thereof. Reconfigurable systems obtain this
specialization at run-time. System reconfiguration has a vertical, a
horizontal and a time dimension. We organize this design space as
the reconfiguration hierarchy, and discuss the design methods that
deal with it. Finally, we survey existing commercial platforms that
support reconfiguration and situate them in the reconfiguration
jungle.
-
12.2 Re-Configurable Computing in Wireless [p. 178]
-
Bill Salefski, Levent Caglar
Wireless communications requires a new approach to implement
the algorithms for new standards. The computational demands of
these standards are outstripping the ability of traditional signal
processors, and standards are changing too quickly for
traditional hardware implementation. In this paper we outline
how reconfigurable processing can meet the needs for wireless
base station design while providing the programmability to allow
not just field upgrades as standards evolve, but also to adapt to
much more dynamic factors in the environment such as traffic mix
and the weather. We outline how a designer works with the
Chameleon reconfigurable processor from algorithm design to
prototyping on a development module.
-
12.3 Hardware/Software Instruction Set Configurability for System-on-Chip Processors [p. 184]
-
Albert Wang, Earl Killian, Dror Maydan, Chris Rowen
New application-focused system-on-chip platforms motivate
new application-specific processors. Configurable and
extensible processor architectures offer the efficiency of tuned
logic solutions with the flexibility of standard high-level
programming methodology. Automated extension of processor
function units and the associated software environment ö
compilers, debuggers, simulators and real-time operating
systems ö satisfies these needs. At the same time, designing at
the level of software and instruction set architecture
significantly shortens the design cycle and reduces verification
effort and risk. This paper describes the key dimensions of
extensibility within the processor architecture, the instruction
set extension description language and the means of
automatically extending the software environment from that
description. It also describes two groups of benchmarks,
EEMBC's Consumer and Telecommunications suites, that
show 20 to 40 times acceleration of a broad set of algorithms
through application-specific instruction set extension, relative
to high performance RISC processors.
Session 13
Chair: Martin D. F. Wong
Organizers: Jason Cong, Louis Scheffer, Patrick Groeneveld
-
13.1 A Practical Methodology for Early Buffer and Wire Resource Allocation [p. 189]
-
Charles J. Alpert, Jiang Hu, Sachin S. Sapatnekar, Paul G. Villarrubia
The dominating contribution of interconnect to system performance
has made it critical to plan for buffer and wiring
resources in the layout. Both buffers and wires must be considered,
since wire routes determine buffer requirements and
buffer locations constrain wire routes. In contrast to recent
buffer block planning approaches, our design methodology
distributes buffer sites throughout the layout. A tile graph is
used to abstract the buffer planning problem while also
addressing wire planning. We present a four-stage heuristic
called RABID for resource allocation and experimentally
verify its effectiveness.
-
13.2 Creating and Exploiting Flexibility in Steiner Trees [p. 194]
-
Elaheh Bozorgzadeh, Ryan Kastner, Majid Sarrafzadeh
This paper presents the concept of flexibility, a geometric property
associated with Steiner trees. Flexibility is related to the routability
of the Steiner tree. We present an optimal algorithm which takes
a Steiner tree and outputs a more flexible Steiner tree. Our experiments
show that a net with a flexible Steiner tree increases its
routability. Experiments with a global router show that congestion is
improved by approximately 20%.
-
13.3 Simultaneous Shield Insertion and Net Ordering under Explicit RLC Noise Constraint [p. 199]
-
Kevin M. Lepak, Irwan Luwandi, Lei He
For multiple coupled RLC nets, we formulate the min-area
simultaneous shield insertion and net ordering (SINO/NB-v)
problem to satisfy the given noise bound. We develop
an efficient and conservative model to compute the peak
noise, and apply the noise model to a simulated-annealing
(SA) based algorithm for the SINO/NB-v problem. Extensive
and accurate experiments show that the SA-based algorithm
is efficient, and always achieves solutions satisfying
the given noise bound. It uses up to 71% and 30% fewer
shields when compared to a greedy based shield insertion
algorithm and a separated shield insertion and net ordering
algorithm, respectively. To the best of our knowledge,
it is the first work that presents an in-depth study on the
min-area SINO problem under an explicit noise constraint.
-
13.4 On Optimum Switch Box Designs for 2-D FPGAs [p. 203]
-
Hongbing Fan, Jiping Liu, Yu-Liang Wu, Chak-Chung Cheung
An FPGA switch box is said to be universal (hyper-universal) if
it can detail-route all possible surrounding 2-pin (multi-pin) net
topologies satisfying the global routing density constraints. A switch
box is optimum if it is hyper-universal and the number of switches inside is
minimum. It has been shown that if the net topology is restricted to
2-pin nets, then a 2-D (4-way) switch box can be built to be universal
with only 6W switches, where W is the global routing channel
density. As the routing resource is relatively expensive in FPGA
chips, study of the optimum switch box designs is clearly a topic
with theoretical and commercial value of reducing silicon cost. A
previous work has constructed a formal mathematical model of this
optimum design problem for switch boxes with arbitrary dimensions,
and gave a scheme to produce hyper-universal designs with
less than 6.7W switches for 4-way FPGA switch boxes. In this paper,
we will further investigate this most common 4-way switch box
case, and will give new theoretical results followed by extensive experimental
justification. The results seem to be quite attractive. We
show that such an optimum switch box can be built with a very
low number of additional switches beyond 6W for today's practical
range of low W's (e.g. just 6W plus 1 or 2 additional switches
for W's up to 7). Even for arbitrarily large W's, the bound can be
shown to be under 6.34W. To make experimental comparison, we
run today's published best FPGA router VPR on large benchmarks
for the popular Disjoint structure and our proposed designs. The
results are quite encouraging.
Session 14
Chair: Massoud Pedram
Organizers: Donatella Sciuto, Kazutoshi Wakabayashi
-
14.1 Dependency Preserving Probabilistic Modeling of Switching Activity using Bayesian Networks [p. 209]
-
Sanjukta Bhanja, N. Ranganathan
We propose a new switching probability model for combinational
circuits using a Logic-Induced-Directed-Acyclic-Graph(LIDAG)
and prove that such a graph corresponds to a Bayesian Network
guaranteed to map all the dependencies inherent in the circuit. This
switching activity can be estimated by capturing complex dependencies
(spatio-temporal and conditional) among signals efficiently
by local message-passing based on the Bayesian networks. Switching
activity estimation of ISCAS and MCNC circuits with random
input streams yield high accuracy (average mean error=0.002) and
low computational time (average time=3.93 seconds).
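As background for why dependency modeling matters: for a signal whose
successive samples are independent and hold the value 1 with probability p,
the switching activity is

    $$ \alpha = 2p(1 - p), $$

but reconvergent fanout and temporal correlation invalidate the independence
assumption at internal nodes, so naively propagating signal probabilities
misestimates alpha; the LIDAG-to-Bayesian-network construction exists
precisely to carry those spatio-temporal dependencies exactly. (Standard
background, not a formula from the paper.)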
-
14.2 A Static Estimation Technique of Power Sensitivity in Logic Circuits [p. 215]
-
Taewhan Kim, Ki-Seok Chung, C. L. Liu
In this paper, we study a new problem of statically estimating the power
sensitivity of a given logic circuit with respect to the primary inputs.
The power sensitivity defines the characteristics of power dissipation due
to changes in state of primary inputs. Consequently, estimating the power
sensitivity among the inputs is essential not only to measure the power
consumption of the circuit efficiently but also to provide potential
opportunities of redesigning the circuit for low power. In this context,
we propose a fast and reliable static estimation technique for power
sensitivity based on a new concept called power equations, which are then
collectively transformed into a table called power table. Experimental
data on MCNC benchmark examples show that the proposed technique is useful
and effective in estimating power consumption. In summary, the relative
error for the estimation of maximum power consumption is 9.4%, with a
large speed-up over simulation.
-
14.3 JouleTrack - A Web Based Tool for Software Energy Profiling [p. 220]
-
Amit Sinha, Anantha P. Chandrakasan
A software energy estimation methodology is presented that avoids
explicit characterization of instruction energy consumption and predicts
energy consumption to within 3% accuracy for a set of benchmark
programs evaluated on the StrongARM SA-1100 and Hitachi
SH-4 microprocessors. The tool, JouleTrack, is available as an online
resource and has various estimation levels. It also isolates the switching
and leakage components of the energy consumption.
Keywords
Software energy, leakage estimation, instruction energy
Session 15
Chair: Limor Fix
Organizers: Masahiro Fujita, Timothy Kam
-
15.1 Effective Use of Boolean Satisfiability Procedures in the Formal Verification of Superscalar and VLIW Microprocessors [p. 226]
-
Miroslav N. Velev, Randal E. Bryant
We compare SAT-checkers and decision diagrams on the evaluation
of Boolean formulas produced in the formal verification of
both correct and buggy versions of superscalar and VLIW microprocessors.
We identify one SAT-checker that significantly outperforms
the rest. We evaluate ways to enhance its performance
by variations in the generation of the Boolean correctness formulas.
We reassess optimizations previously used to speed up the
formal verification and probe future challenges.
-
15.2 Circuit-based Boolean Reasoning [p. 232]
-
Andreas Kuehlmann, Malay K. Ganai, Viresh Paruthi
Many tasks in CAD, such as equivalence checking, property
checking, logic synthesis, and false paths analysis require efficient
Boolean reasoning for problems derived from circuit structures.
Traditionally, canonical representations, e.g., BDDs, or SAT-based
search methods are used to solve a particular class of problems.
In this paper we present a combination of techniques for Boolean
reasoning based on BDDs, structural transformations, and a SAT
procedure natively working on a shared graph representation of the
problem. The described intertwined integration of the three techniques
results in a robust summation of their orthogonal strengths.
Our experiments demonstrate the effectiveness of the approach.
-
15.3 Checking Equivalence for Partial Implementations [p. 238]
-
Christoph Scholl, Bernd Becker
We consider the problem of checking whether a partial implementation
can (still) be extended to a complete design which is equivalent
to a given full specification.
Several algorithms trading off accuracy and computational resources
are presented: Starting with a simple 0,1,X-based simulation,
which allows approximate solutions, but is not able to find all
errors in the partial implementation, we consider more and more
exact methods finally covering all errors detectable in the partial
implementation. The exact algorithm reports no error if and only
if the current partial implementation conforms to the specification,
i.e. it can be extended to a full implementation which is equivalent
to the specification.
We give a series of experimental results demonstrating the effectiveness
and feasibility of the methods presented.
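The 0,1,X simulation mentioned above can be pictured as three-valued logic in
which every output of a not-yet-implemented block carries the unknown X;
whenever the implemented portion alone forces an output value that
contradicts the specification, no completion can repair it. A minimal sketch
of the value algebra (ours, for illustration):

    X = "X"   # unknown: outputs of the unimplemented black boxes

    def and3(a, b):
        # A controlling 0 decides the output regardless of any X input.
        if a == 0 or b == 0:
            return 0
        return X if X in (a, b) else 1

    def not3(a):
        return X if a == X else 1 - a

    # 0 AND X = 0: the partial design alone fixes this output, so a
    # mismatch against the specification here is a definite error.
    assert and3(0, X) == 0

Because X propagates pessimistically, this check reports no false errors but
can miss real ones, which is exactly the approximation the more exact methods
in the paper tighten.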
Session 16
Chair and Organizer: Carl Pixley
-
16.1 Validating the Intel® Pentium® 4 Microprocessor [p. 244]
-
Bob Bentley
Developing a new leading edge IA-32 microprocessor is an
immensely complicated undertaking. In the case of the Pentium® 4
processor, the microarchitecture is significantly more complex
than any previous IA-32 microprocessor and the implementation
borrows almost nothing from any previous implementation. This
paper describes how we went about the task of finding bugs in the
Pentium® 4 processor design prior to initial silicon, and what we
found along the way.
General Terms
Management, Verification.
-
16.2 Nuts and Bolts of Core and SoC Verification [p. 249]
-
Ken Albin
Digital design at Motorola is performed at design centers
throughout the world, on projects with different design objectives,
executed on different time scales, by different sized teams with
different skill sets. This paper attempts to categorize these diverse
efforts and identify common threads: what works, what the
challenges are, and where we need to go.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids
- simulation, verification. B.6.3 [Logic Design]: Design Aids -
simulation, verification. B.7.2 [Integrated Circuits]: Design Aids
- simulation, verification.
General Terms
Management, Documentation, Design, Economics, Verification.
Keywords
Verification, monitors, biased-random simulation, testbenches.
-
16.3 Teaching Future Verification Engineers: The Forgotten Side of Logic Design [p. 253]
-
Fusun Ozguner, Duane Marhefka, Joanne DeGroat, Bruce Wile, Jennifer Stofer,
Lyle Hanrahan
This paper describes a senior/graduate level course in hardware
logic verification being offered by The Ohio State University in
cooperation with IBM. The need for the course is established
through the growing importance of logic verification to users of
custom logic designs. We discuss the short-term and long-term
goals for the course, and describe the course content and format.
The course relies heavily on lab projects to illustrate the main
concepts. Three projects and a final project review are described.
Session 17
Chair: Jan Rabaey
Organizers: Jan Rabaey, Anantha Chandrakasan
-
17.1 SoC Integration of Reusable Baseband Bluetooth IP [p. 256]
-
Torbjörn Grahm
This presentation will give a list of design criteria an ASIC
design house needs to consider when deciding whether to take the
complex Bluetooth specification and implement everything from
scratch or to integrate reusable Intellectual Property
into its SoC.
The presentation also includes experience from a typical embedded
development project where reusable Bluetooth Baseband
Intellectual Property, both for HW and SW, is used, with the
Bluetooth technology from Ericsson as an example. This paper is a
compressed summary.
-
17.2 One-chip Bluetooth ASIC Challenges [p. 262]
-
Paul T. M. van Zeijl
This paper will describe the constraints, trade-offs and challenges of
designing a one-chip solution (RF/radio plus baseband integrated on the same
Si die) compared to designing a separate RF/radio ASIC.
Session 18
Chair: Hiroto Yasuura
Organizers: Kazutoshi Wakabayashi, Rajesh K. Gupta
-
18.1 Transformations for the Synthesis and Optimization of Asynchronous Distributed Control [p. 263]
-
Michael Theobald, Steven M. Nowick
Asynchronous design has been the focus of renewed interest. However,
a key bottleneck is the lack of high-quality CAD tools for the
synthesis of large-scale systems which also allow design-space exploration.
This paper proposes a new synthesis method to address
this issue, based on transformations.
The method starts with a scheduled and resource-bounded
Control-Data Flow Graph (CDFG). Global transformations are first
applied to the entire CDFG, unoptimized controllers are then extracted,
and, finally, local transforms are applied to the individual
controllers. The result is a highly-optimized set of interacting distributed
controllers. The new transforms include aggressive timing and
area-oriented optimizations, several of which have not been previously
supported by existing asynchronous CAD tools.
As a case study, the method is applied to the well-known differential
equation solver synthesis benchmark. Results comparable to a
highly-optimized manual design by Yun et al. [26] can be obtained
by applying the new automated transformations. Such an implementation
cannot be obtained using existing asynchronous CAD tools.
-
18.2 Speculation Techniques for High Level Synthesis of Control Intensive Designs [p. 269]
-
Sumit Gupta, Nick Savoiu, Sunwoo Kim, Nikil Dutt, Rajesh K. Gupta, Alex
Nicolau
The quality of synthesis results for most high level synthesis
approaches is strongly affected by the choice of control flow
(through conditions and loops) in the input description. In
this paper, we explore the effectiveness of various types of
code motions, such as moving operations across conditionals,
out of conditionals (speculation) and into conditionals
(reverse speculation), and how they can be effectively directed
by heuristics so as to lead to improved synthesis results in
terms of fewer execution cycles and fewer
states in the finite state machine controller. We also study
the effects of the code motions on the area and latency of
the final synthesized netlist. Based on speculative code motions,
we present a novel way to perform early condition
execution that leads to significant improvements in highly
control-intensive designs. Overall, reductions of up to 38%
in execution cycles are obtained with all the code motions
enabled.
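In code-motion terms, speculation hoists an operation above the conditional
that guards it, trading a possibly discarded computation for a shorter
schedule, while reverse speculation pushes an operation down into the
branches. A schematic illustration (ours, not from the paper):

    # Before: the multiply cannot be scheduled until c is known.
    def guarded(a, b, c, x):
        if c:
            x = a * b
        return x

    # After speculation: the multiply issues early, e.g. on an idle
    # functional unit; only the cheap commit remains conditional.
    def speculated(a, b, c, x):
        t = a * b        # speculated; result discarded when c is false
        if c:
            x = t
        return x

The heuristics the abstract refers to decide when such motions shorten the
schedule without inflating area or the controller's state count.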
-
18.3 Parallelizing DSP Nested Loops on Reconfigurable Architectures using Data Context Switching [p. 273]
-
Kiran Bondalapati
Reconfigurable architectures promise significant performance
and flexibility advantages over conventional architectures.
Automatic mapping techniques that exploit the features of
the hardware are needed to leverage the power of these architectures.
In this paper, we develop techniques for parallelizing nested loop
computations from digital signal processing (DSP) applications
onto high performance pipelined
configurations. We propose a novel data context switching
technique that exploits the embedded distributed memory
available in reconfigurable architectures to parallelize
such loops. Our technique is demonstrated on two diverse
state-of-the-art reconfigurable architectures, namely, Virtex
and the Chameleon Systems Reconfigurable Communications
Processor. Our techniques show significant performance
improvements on both architectures and also perform
better than state-of-the-art DSP and microprocessor architectures.
Keywords
Configurable Computing, Mapping Techniques, Loops
-
18.4 Using Symbolic Algebra in Algorithmic Level DSP Synthesis [p. 277]
-
Armita Peymandoust, Giovanni De Micheli
Current multimedia applications require the design of datapath
intensive circuits. Unfortunately, current design tools and
methods support design abstraction at a level that is inferior to the
expectation of designers. Namely, most arithmetic-level
optimizations are not supported and they are left to the designers'
ingenuity. In this paper, we show how symbolic algebra can be
used to construct an arithmetic-level decomposition algorithm.
We also introduce our tool, SymSyn, that performs arithmetic
library mapping and optimization of data-flow descriptions into
data paths using arithmetic components.
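To give a flavor of arithmetic-level decomposition (our illustration, not
SymSyn's output): symbolic algebra can rewrite a data-flow expression so that
it maps onto a given component library; e.g., Horner's rule

    $$ a x^{2} + b x + c = (a x + b)\,x + c $$

turns a degree-2 polynomial into a chain of two multiply-add components, a
rewriting that gate-level logic synthesis is unlikely to discover on its own.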
Session 19
Chair: Satya Pullela
Organizers: Jaijeet Roychowdhury, Joel Phillips
-
19.1 Computing Logic-Stage Delays Using Circuit Simulation and Symbolic Elmore Analysis [p. 283]
-
Clayton B. McDonald, Randal E. Bryant
The computation of logic-stage delays is a fundamental sub-problem
for many EDA tasks. Although accurate delays can be obtained via
circuit simulation, we must estimate the input assignments that will
maximize the delay. With conventional methods, it is not feasible to
estimate the delay for all input assignments on large sub-networks,
so previous approaches have relied on heuristics. We present a
symbolic algorithm that enables efficient computation of the Elmore
delay under all input assignments and delay refinement using
circuit-simulation. We analyze the Elmore estimate with three
metrics using data extracted from symbolic timing simulations of
industrial circuits.
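For orientation, the following minimal Python sketch computes the classical Elmore metric the paper builds on; the tree layout, node names, and values are invented for the example, and the paper's symbolic, all-input-assignment analysis goes well beyond this closed-form estimate.

# Classical Elmore delay of an RC tree: delay(target) = sum over nodes k
# of C_k times the resistance of the path shared between root->target and
# root->k. Data layout (invented): node -> (parent, R to parent, C).
def elmore_delay(tree, target):
    def path_edges(n):                    # nodes whose parent edge lies on root->n
        edges = set()
        while tree[n][0] is not None:
            edges.add(n)
            n = tree[n][0]
        return edges

    tpath = path_edges(target)
    delay = 0.0
    for k, (_, _, cap) in tree.items():
        shared = tpath & path_edges(k)    # common portion of the two root paths
        delay += cap * sum(tree[e][1] for e in shared)
    return delay

rc = {"root": (None, 0.0, 1e-15),         # driver node
      "n1": ("root", 100.0, 2e-15),       # 100 ohm, 2 fF
      "n2": ("n1", 50.0, 1e-15)}          # 50 ohm, 1 fF
print(elmore_delay(rc, "n2"))             # 3.5e-13 s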
-
19.2 A New Gate Delay Model for Simultaneous Switching and Its Applications [p. 289]
-
Liang-Chi Chen, Sandeep K. Gupta, Melvin A. Breuer
We present a new model to capture the delay phenomena associated
with simultaneous to-controlling transitions. The proposed
delay model accurately captures the effect of the targeted delay phenomena
over a wide range of transition times and skews. It also captures
the effects of more variables than table lookup methods can
handle. The model helps improve the accuracy of static timing analysis,
incremental timing refinement, and timing-based ATPG.
-
19.3 Static Timing Analysis Including Power Supply Noise Effect on Propagation Delay in VLSI Circuits [p. 295]
-
Geng Bai, Sudhakar Bobba, Ibrahim N. Hajj
This paper presents techniques to include the effect of supply voltage
noise on the circuit propagation delay of a digital VLSI circuit.
The proposed methods rely on an input-independent approach to
calculate the logic gate's worst-case power supply noise. A quasi-static
timing analysis is then applied to derive a tight upper-bound
on the delay for a selected path with power supply noise effects.
This upper-bound can be further reduced by considering the logic
constraints and dependencies in the circuit. Experimental results
for ISCAS-85 benchmark circuits are presented using the techniques
described in the paper. HSPICE simulation results are also
used to validate our work.
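For first-order intuition only (this is not the paper's bounding method), the Sakurai-Newton alpha-power law gives a quick estimate of how a supply droop stretches gate delay; the Vth and alpha values below are assumed.

# Back-of-envelope sketch: t_d ~ Vdd / (Vdd - Vth)**alpha estimates the
# relative delay increase when supply noise lowers the effective Vdd.
def delay_ratio(vdd_nom, droop, vth=0.4, alpha=1.3):
    d = lambda v: v / (v - vth) ** alpha   # delay up to a constant factor
    return d(vdd_nom - droop) / d(vdd_nom)

print(delay_ratio(1.8, 0.18))              # ~8% slower at a 10% droop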
Chair: Magdy S. Abadir
Organizers: Irith Pomeranz, Tim Cheng
-
20.1 Simulation-Based Test Algorithm Generation and Port Scheduling for Multi-Port Memories [p. 301]
-
Chi-Feng Wu, Chih-Tsun Huang, Kuo-Liang Cheng, Chih-Wea Wang, Cheng-Wen Wu
The paper presents a simulation-based test algorithm generation
and test scheduling methodology for multi-port memories. The
purpose is to minimize the testing time while keeping the test algorithm
in a simple and regular format for easy test generation,
fault diagnosis, and built-in self-test (BIST) circuit implementation.
Conventional functional fault models are used to generate
tests covering most defects. In addition, multi-port specific defects
are covered using structural fault models. Port-scheduling is introduced
to take advantage of the inherent parallelism among different
ports. Experimental results for commonly used multi-port
memories, including dual-port, four-port, and n-read-1-write memories,
have been obtained, showing that efficient test algorithms can
be generated and scheduled to meet different test bandwidth constraints.
Moreover, memories with more ports benefit more with
respect to testing time.
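As a baseline for readers, here is the classical single-port March C- algorithm run on a simulated fault-free memory; the paper's contribution, generating and port-scheduling multi-port test algorithms, is not captured by this sketch.

# Single-port March C- on a simulated memory (illustrative only).
def march_c_minus(mem):
    def element(addrs, ops):
        for a in addrs:
            for kind, val in ops:
                if kind == "w":
                    mem[a] = val
                elif mem[a] != val:          # read expected 'val'
                    return False             # fault detected
        return True

    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    steps = [(up,   [("w", 0)]),
             (up,   [("r", 0), ("w", 1)]),
             (up,   [("r", 1), ("w", 0)]),
             (down, [("r", 0), ("w", 1)]),
             (down, [("r", 1), ("w", 0)]),
             (down, [("r", 0)])]
    return all(element(addrs, ops) for addrs, ops in steps)

print(march_c_minus([0] * 16))               # True: no fault found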
-
20.2 Improving Bus Test Via IDDT and Boundary Scan [p. 307]
-
Shih-yu Yang, Christos A. Papachristou, Massood Tabib-Azar
This paper presents a systematic test methodology targeting bus
line interconnect defects using IDDT testing and Boundary Scan.
Traditional test is unable to detect all possible defects, especially
timing-related faults. Open and short defects on interconnects
between embedded modules can be detected by IDDT testing.
Boundary Scan can provide accessibility to internal buses. A
statistical analysis is presented discussing the uncertain factors
due to process variations and power fluctuation. The
effectiveness of the proposed technique on shorts, opens, and
other non-stuck-at fault types is also illustrated.
Keywords. IDDT, Current Test, Boundary Scan, Interconnect.
-
20.3 Fault Characterizations and Design-for-Testability Technique for Detecting IDDQ Faults in CMOS/BiCMOS Circuits [p. 313]
-
Kaamran Raahemifar, Majid Ahmadi
This paper provides the results of a simulation-based fault characterization
study of CMOS/BiCMOS logic families. We show that most of the shorts
cause IDDQ faults, while open defects result in delay or
stuck-open faults. We propose a design-for-testability technique for
detecting short and bridging faults in CMOS/BiCMOS logic circuits. The
impact of this circuit modification on the behavior of the circuit in
normal mode is investigated.
-
20.4 Testing for Interconnect Crosstalk Defects Using On-Chip Embedded Processor Cores [p. 317]
-
Li Chen, Xiaoliang Bai, Sujit Dey
Crosstalk effects degrade the integrity of signals traveling on long
interconnects and must be addressed during manufacturing
testing. External testing for crosstalk is expensive due to the need
for high-speed testers. Built-in self-test, while eliminating the
need for a high-speed tester, may lead to excessive test overhead
as well as overly aggressive testing. To address this problem, we
propose a new software-based self-test methodology for system-on-
chips (SoC) based on embedded processors. It enables an on-chip
embedded processor core to test for crosstalk in system-level
interconnects by executing a self-test program in the normal
operational mode of the SoC. We have demonstrated the
feasibility of this method by applying it to test the interconnects of
a processor-memory system. The defect coverage was evaluated
using a system-level crosstalk defect simulation method.
Chair and Organizer: Rob A. Rutenbar
Panel Members: Max Baron, Thomas Daniel, Rajeev Jayaraman, Zvi Or-Bach,
Jonathan Rose, Carl Sechen
-
-
There was a time - in the dim historical past - when foundries
actually made ASICs with only 5000 to 50,000 logic gates. But
FPGAs and CPLDs conquered those markets and pushed ASIC
silicon toward opportunities with more logic, volume, and speed.
Today's largest FPGAs approach the few-million-gate size of a
typical ASIC design, and continue to sprout embedded cores, such
as CPUs, memories, and interfaces. And given the risks of
nonworking nanometer silicon, FPGA costs and time-to-market
are looking awfully attractive. So, will FPGAs kill ASICs? ASIC
technologists certainly think not. ASICs are themselves sprouting
patches of programmable FPGA fabric, and pushing new realms
of size and especially speed. New tools claim to have tamed the
convergence problems of older ASIC flows. Is the future to be
found in a market full of FPGAs with ASIC-like cores? ASICs
with FPGA cores? Other exotic hybrids? Our panelists will share
their disagreements on these prognostications.
Chair: Phillip Restle
Organizers: Joel Phillips, Noel Menezes
-
22.1 Inductance 101: Modeling and Extraction [p. 323]
-
Michael W. Beattie, Lawrence T. Pileggi
Modeling magnetic interactions for on-chip
interconnect has become an issue of great interest for integrated
circuit design in recent years. This tutorial paper describes
the basic concepts of magnetic interaction, loop and
partial inductance, along with some of the high frequency effects
such as skin and proximity effect.
Index Terms: Inductance, Magnetic Interaction, Interconnect
Modeling.
-
22.2 Inductance 101: Analysis and Design Issues [p. 329]
-
Kaushik Gala, David Blaauw, Junfeng Wang, Vladimir Zolotov, Min Zhao
With operating frequencies approaching the gigahertz range,
inductance is becoming an increasingly important consideration in
the design and analysis of on-chip interconnect. In this paper, we
give a tutorial overview of the analysis and design issues related to
on-chip inductance effects. We explain the complexity of the current
flow in VLSI circuits. We discuss the applicability of the PEEC
approach in a detailed circuit model of the signal and power grid
interconnect, switching devices, power pads and the package. Further,
we explain techniques that can be used to speed-up simulation
of the large PEEC model. We then discuss a simplified model
that uses the so-called loop inductance approach, and compare it
with the detailed model. We present experimental results, obtained
from simulations of industrial circuits, for both the PEEC and loop
models. We also cover design techniques that can help tackle the
on-chip inductance issues.
-
22.3 Modeling Magnetic Coupling for On-Chip Interconnect [p. 335]
-
Michael W. Beattie, Lawrence T. Pileggi
As advances in IC technologies and operating
frequencies make the modeling of on-chip magnetic interactions
a necessity, it is apparent that extension of traditional
inductance extraction approaches to full-chip scale problems
is impractical. There are primarily two obstacles to performing
inductance extraction with the same efficacy as full-chip
capacitance extraction: 1) neglecting far-away coupling
terms can generate an unstable inductance matrix approximation;
and 2) the penetrating nature of inductance makes localized
extraction via windowing extremely difficult. In this
paper we propose and contrast three new options for stable
and accurate window-based extraction of large-scale magnetic
coupling. We analyze the required window sizes to
consider the possibilities for pattern-matching style solutions,
and propose three schemes for determining coupling
values and window sizing for extraction via on-the-fly field
solution.
Index Terms: Inductance, Susceptance, Magnetic Interaction,
Interconnect Modeling.
-
22.4 Min/max On-Chip Inductance Models and Delay Metrics [p. 341]
-
Yi-Chang Lu, Mustafa Celik, Tak Young, Lawrence T. Pileggi
This paper proposes an analytical inductance extraction model for
characterizing min/max values of typical on-chip global interconnect
structures, and a corresponding delay metric that can be used
to provide RLC delay prediction from physical geometries. The
model extraction and analysis is efficient enough to be used within
optimization and physical design exploration loops. The analytical
min/max inductance approximations also provide insight into the
effects caused by inductances.
Keywords
Inductance, RLC tree, delay, on-chip interconnect modeling, physical
design.
Chair: Daniel Connors
Organizers: Dirk Grunwald, Marco Di Natale
-
23.1 Utilizing Memory Bandwidth in DSP Embedded Processors [p. 347]
-
Catherine H. Gebotys
This paper presents a network flow approach to solving the
register binding and allocation problem for multiword memory
access DSP processors. In recently announced DSP processors,
such as Star*Core, sixteen-bit instructions which simultaneously
access four words from memory are supported. A polynomial-time
network flow methodology is used to allocate multiword accesses
while minimizing code size. Results show that improvements of
up to 87% in terms of memory bandwidth (and up to 30%
reduction in energy dissipation) are obtained compared to
compiler-generated DSP code. This research is important for
industry since this value-added technique can increase memory
bandwidths and minimize code size without increasing cost.
-
23.2 Address Code Generation for Digital Signal Processors [p. 353]
-
Sathishkumar Udayanarayanan, Chaitali Chakrabarti
In this paper we propose a procedure to generate code with a minimum number
of addressing instructions. We analyze different methods of generating
addressing code for scalar variables and quantify the improvements due to
optimizations such as offset assignment, modify register optimization and
address register assignment. We propose an offset assignment heuristic that
uses k address registers, an optimal dynamic programming algorithm for
modify register optimization, and an optimal formulation and a heuristic
algorithm for the address register assignment problem.
-
23.3 Reducing Memory Requirements of Nested Loops for Embedded Systems [p. 359]
-
J. Ramanujam, Jinpyo Hong, Mahmut Kandemir, A. Narayan
Most embedded systems have limited amount of memory. In contrast, the
memory requirements of code (in particular loops) running on embedded
systems are significant. This paper addresses the problem of estimating the
amount of memory needed for transfers of data in embedded systems. The
problem of estimating the region associated with a statement or the set of
elements referenced by a statement during the execution of the entire set
of nested loops is analyzed. A quantitative analysis of the number of elements
referenced is presented; exact expressions for uniformly generated
references and a close upper and lower bound for non-uniformly generated
references are derived. In addition to presenting an algorithm that computes
the total memory required, we discuss the effect of transformations on the
lifetimes of array variables, i.e., the time between the first and last accesses
to a given array location. A detailed analysis on the effect of unimodular
transformations on data locality including the calculation of the maximum
window size is discussed. The term maximum window size is introduced
and quantitative expressions are derived to compute the window size. The
smaller the value of the maximum window size, the higher the amount of
data locality in the loop.
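As a concrete companion to the analysis, the sketch below brute-forces the quantity the paper bounds analytically: the number of distinct array elements a loop nest references. Exhaustive enumeration only scales to tiny bounds, which is exactly why closed-form estimates matter; the access function here is made up.

# Count distinct elements referenced by a loop nest (brute force).
from itertools import product

def referenced_elements(bounds, access):
    return len({access(*iv) for iv in product(*(range(b) for b in bounds))})

# A[i + j] inside "for i in 0..15: for j in 0..15" touches 31 elements
print(referenced_elements((16, 16), lambda i, j: i + j))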
-
23.4 Detection of Partially Simultaneously Alive Signals in Storage Requirement Estimation for Data
Intensive Applications [p. 365]
-
Per Gunnar Kjeldsberg, Francky Catthoor, Einar J. Aas
In this paper, we propose a novel storage requirement estimation
methodology for use in the early system design phases when the
data transfer ordering is only partially fixed. At that stage, none of
the existing estimation tools are adequate, as they either assume a
fully specified execution order or ignore it completely. Using representative
application demonstrators, we show how our technique
can effectively guide the designer to achieve a transformed specification
with low storage requirement.
Chair: Peichen Pan
Organizers: Jason Cong, Malgorzata Marek-Sadowska
-
24.1 A New Structural Pattern Matching Algorithm for Technology Mapping [p. 371]
-
Min Zhao, Sachin S. Sapatnekar
In this paper, a new structural matching algorithm for technology
mapping is proposed. The algorithm is based on a
key observation that the matches for a node in a subject
Boolean network are related to the matches for its children.
The structural relationships between the library cells are
modeled using a lookup table. The proposed method is fast,
has low memory usage, and is easy to implement. Experimental
results show speedups of 20x over Matsunaga's fast
mapping approach, and orders of magnitude over SIS, with
the same or slightly better results, and much lower memory
utilization.
-
24.2 Technology Mapping for SOI Domino Logic Incorporating Solutions for the Parasitic Bipolar Effect [p. 377]
-
Srirang K. Karandikar, Sachin S. Sapatnekar
We present a technology mapping algorithm for implementing a
random logic gate network in domino logic. The target technology
of implementation is Silicon on Insulator (SOI). SOI devices exhibit
an effect known as Parasitic Bipolar Effect (PBE), which can
lead to incorrect logic values in the circuit. Our algorithm solves
the technology mapping problem by permitting several transformations
during the mapping process in order to avoid the PBE, such as
transistor reordering, altering the way transistors are organized into
gates, and adding pmos discharge transistors. We minimize the total
cost of implementation, which includes the discharge transistors
required for correct functioning. Our algorithm generates solutions
that reduce the number of discharge transistors needed by 44.23%,
and reduce the size of the final solution by 11.66% on average.
-
24.3 Latency and Latch Count Minimization in Wave Steered Circuits [p. 383]
-
Amit Singh, Arindam Mukherjee, Malgorzata Marek-Sadowska
Wave Steering is a new design methodology that realizes high
throughput circuits by embedding layout friendly synthesized
structures in silicon. Wave Steered circuits inherently utilize
latches in order to guarantee the correct signal arrival times at the
inputs of these synthesized structures and maintain the high
throughput of operation. In this paper, we show a method of reordering
signals to achieve minimum circuit latency for Wave
Steered circuits and propose an Integer Linear Programming (ILP)
formulation for scheduling and retiming these circuits to minimize
the number of latches for minimum latency. Experimental results
show that in 0.25µm CMOS technology, as much as 33.2% reduction
in latch count, at minimum latency, can be achieved over
unoptimized Wave Steered circuits operating at 500 MHz.
-
24.4 Performance-Driven Multi-Level Clustering with Application to Hierarchical FPGA Mapping [p. 389]
-
Jason Cong, Michail Romesis
In this paper, we study the problem of performance-driven
multi-level circuit clustering with application to hierarchical
FPGA designs. We first show that the performance-driven
multi-level clustering problem is NP-hard (in contrast to the
fact that single-level performance-driven clustering can be
solved in polynomial time optimally). Then, we present an
efficient heuristic for two-level clustering for delay
minimization. It can also provide area-delay trade-off by
controlling the amount of node duplication. The algorithm is
applied to Altera's latest APEX FPGA architecture which has
a two-level hierarchy. Experimental results with
combinational circuits show that with our performance-driven
two-level clustering solution we can improve the circuit
performance produced by the Quartus Design System from
Altera by an average of 15% for APEX devices measured in
terms of delay after final layout. To our knowledge this is the
first in-depth study for the performance-driven multi-level
circuit clustering problem.
Chair: Satish Venkatesan
Organizers: Noel Menezes, Vivek Tiwari
-
25.1 Application of Constraint-Based Heuristics in Collaborative Design [p. 395]
-
Juan A. Carballo, Stephen W. Director
Significant acceleration of today's complex collaborative design
processes can be achieved if team members are able to apply search
heuristics that consider the simultaneous effect of all design
constraints. We present the Active approach to Design Process
Management (ADPM), whereby designers receive constraint-based
feedback that enables them to apply these search heuristics
effectively. To evaluate ADPM, we developed a design process
evaluation environment called TeamSim. Evaluation results suggest
that ADPM can reduce costly design iterations at the expense of
extra, less costly, verification tool executions.
-
25.2 A Universal Client for Distributed Networked Design and Computing [p. 401]
-
Franc Brglez, Hemang Lavana
We introduce a universal client (OmniFlow) whose GUI can
be readily configured by the user to invoke any number of
applications, concurrently or sequentially, anywhere on the
network. The design and the implementation of the client is
based on the principles of taskflow-oriented programming,
whereby we merge concepts from structured programming,
hardware description, and mark-up languages. A mark-up
language such as XML supports a well-defined schema that
captures the decomposition of a program into a hierarchy of
tasks, each representing an instance of a blackbox or a whitebox
software component. The HDL-like input/output port
definitions capture data-task-data dependencies. A highly
interactive hierarchical GUI, rendered from the hierarchical
taskflow descriptions in extended XML, supports structured
programming language constructs to control sequences of
task synchronization, execution, repetition, and abort.
Experimental evaluations of the prototype, up to 9150
tasks and the longest path of 1600 tasks, demonstrate the
scalability of the environment and the overall effectiveness of
the proposed architecture for a number of networked design
and computing projects.
-
25.3 Hypermedia-Aided Design [p. 407]
-
Darko Kirovski, Milenko Drinic, Miodrag Potkonjak
Recently, the Internet revolutionized many activities from entertainment
to marketing and business. Two key underlying Internet
technologies, efficient data delivery and hypertext, demonstrated
exceptional potential as new application enablers. In this
paper, we present a novel Hypermedia-Aided Design (HAD)
collaboration framework that facilitates new communication
and data presentation paradigms to improve the effectiveness of
typical EDA applications. The framework leverages the advantages
of using semantic multicast as a communication backbone
and quantized hypermedia presentations as an efficient
data organization, retrieval, and presentation model. Semantic
multicast is a global communication tool that relies on an internetwork
of proxies to provide content discovery and semantics-based
profile-driven data dissemination services. We introduce
the notion of a quant, an atomic interactive multimedia information
primitive with embedded hyperlinks. We demonstrate how
interest-specific quant retrieval and concatenation can enable
more focused collaboration.
The HAD framework consists of a set of applications for
student (designer)-centered CAD education (consulting), collaborative
design and debugging, I-commerce, and technical support.
In all applications, quant-based presentations enable theoretical
and practical components to be tailored to each user's interests
and performance. The conceptualized and implemented
applications act in synergy with existing software,
hardware, and system design tools.
-
25.4 A Framework for Object Oriented Hardware Specification, Verification, and Synthesis [p. 413]
-
T. Kuhn, T. Oppold, M. Winterholer, W. Rosenstiel, Mark Edwards, Yaron
Kashai
We describe two things. First, we present a uniform framework
for object oriented specification and verification of hardware. For
this purpose the object oriented language 'e' is introduced along
with a powerful run-time environment that enables the designer to
perform the verification task. Second, we present an object
oriented synthesis that enhances 'e' and its dedicated run-time
environment into a framework for specification, verification, and
synthesis. The usability of our approach is demonstrated by real-world
examples.
Keywords: Object oriented hardware modeling, verification, high-level
synthesis.
Chair: Georges Gielen
Organizers: Mike Sottak, Mike Murray, Linda Kaye
Panel Members: Mar Hershenson, Ken Kundert, Philippe Magarshack,
Akria Matsuzawa, Ronald A. Rohrer, Ping Yang
-
-
Despite the fact that more and more electronic design is
composed of analog and mixed-signal content, the design flows
and methodologies in this area are lagging behind the pace of
innovation in digital design. For sure, analog designers are in
shorter supply, but this only makes the need for improvements
and efficiency that much greater. Only of late have we seen
production-worthy attempts at functions such as analog synthesis
and optimization reach the market. What is needed in today's
analog design flow? What are the key technologies that are
missing? How does the existing 'food chain' need to work together
to drive greater efficiencies?
A distinguished panel of suppliers and users assembles to discuss,
analyze, and predict.
Chair: Bryan Ackland
Organizer: Kurt Keutzer
-
27.1 Achieving 550 MHz in an ASIC Methodology [p. 420]
-
D. G. Chinnery, B. Nikolic, Kurt Keutzer
Typically, good automated ASIC designs may be two to five times
slower than handcrafted custom designs. At last year's DAC this
was examined and causes of the speed gap between custom
circuits and ASICs were identified. In particular, faster custom
speeds are achieved by a combination of factors: good
architecture with well-balanced pipelines; compact logic design;
timing overhead minimization; careful floorplanning, partitioning
and placement; dynamic logic; post-layout transistor and wire
sizing; and speed binning of chips. Closing the speed gap requires
improving these same factors in ASICs, as far as possible. In this
paper we examine a practical example of how these factors may
be improved in ASICs. In particular we show how techniques
commonly found in custom design were applied to design a high speed
550 MHz disk drive read channel in an ASIC design flow.
General Terms
Performance, design.
Keywords
ASIC, clock, frequency, speed, throughput, comparison, custom.
-
27.2 A Semi-Custom Design Flow in High-Performance Microprocessor Design [p. 426]
-
Gregory A. Northrop, Pong-Fei Lu
In this paper we present techniques shown to significantly
enhance the custom circuit design process typical of high performance
microprocessors. This methodology combines
flexible custom circuit design with automated tuning and
physical design tools to provide new opportunities to optimize
designs throughout the development cycle.
Keywords
Standard cell, circuit tuning, custom design, methodology.
-
27.3 Reducing the Frequency Gap Between ASIC and Custom Designs: A Custom Perspective [p. 432]
-
Stephen E. Rich, Matthew J. Parker, Jim Schwartz
This paper proposes that the ability to control the difference
between the simulated and actual frequencies of a design is a key
strategy to achieving high frequency in both ASIC and custom
designs. We will examine this principle and the methodologies
that can be deployed to manage this gap.
Chair: Marco Di Natale
Organizers: Donatella Sciuto, Luciano Lavagno
-
28.1 Low-Energy Intra-Task Voltage Scheduling Using Static Timing Analysis [p. 438]
-
Dongkun Shin, Jihong Kim, Seongsoo Lee
We propose an intra-task voltage scheduling algorithm for low energy
hard real-time applications. Based on a static timing analysis
technique, the proposed algorithm controls the supply voltage
within an individual task boundary. By fully exploiting all
the slack times, a program scheduled by the proposed algorithm
always completes its execution near the deadline, thus achieving a
high energy reduction ratio. In order to validate the effectiveness of
the proposed algorithm, we built a software tool that automatically
converts a DVS-unaware program into an equivalent low-energy
program. Experimental results show that the low-energy version
of an MPEG-4 encoder/decoder (converted by the software tool)
consumes less than 7-25% of the energy of the original program running on a
fixed-voltage system with a power-down mode.
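The generic intra-task DVS idea can be stated in a few lines (a sketch under assumed numbers, not the paper's exact algorithm): at each checkpoint, run just fast enough that the remaining worst-case execution cycles still meet the deadline.

# Pick the lowest clock at which remaining worst-case work meets the deadline.
def speed_at_checkpoint(wcec_remaining, time_to_deadline, f_max):
    return min(f_max, wcec_remaining / time_to_deadline)   # cycles/second

# 40M worst-case cycles left and 100 ms until the deadline:
print(speed_at_checkpoint(40e6, 0.100, 800e6) / 1e6, "MHz")  # 400.0 MHz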
-
28.2 Battery-Aware Static Scheduling for Distributed Real-Time Embedded Systems [p. 444]
-
Jiong Luo, Niraj K. Jha
This paper addresses battery-aware static scheduling in battery powered
distributed real-time embedded systems. As suggested by
previous work, reducing the discharge current level and shaping
its distribution are essential for extending the battery lifespan. We
propose two battery-aware static scheduling schemes. The first
one optimizes the discharge power profile in order to maximize
the utilization of the battery capacity. The second one targets
distributed systems composed of voltage-scalable processing
elements (PEs). It performs variable-voltage scheduling via
efficient slack time re-allocation, which helps reduce the average
discharge power consumption as well as flatten the discharge
power profile. Both schemes guarantee the hard real-time
constraints and precedence relationships in the real-time
distributed embedded system specification. Based on previous
work, we develop a battery lifespan evaluation metric which is
aware of the shape of the discharge power profile. Our
experimental results show that the battery lifespan can be
increased by up to 29% by optimizing the discharge power profile
alone. Our variable-voltage scheme increases the battery lifespan
by up to 76% over the non-voltage-scalable scheme and by up to
56% over the variable-voltage scheme without slack-time reallocation.
-
28.3 An Approach to Incremental Design of Distributed Embedded Systems [p. 450]
-
Paul Pop, Petru Eles, Traian Pop, Zebo Peng
In this paper we present an approach to incremental design of distributed
embedded systems for hard real-time applications. We start
from an already existing system running a set of applications and the
design problem is to implement new functionality so that the already
running applications are not disturbed and there is a good chance
that, later, new functionality can easily be added to the resulted system.
The mapping and scheduling problem are considered in the context
of a realistic communication model based on a TDMA protocol.
Chair: Miodrag Potkonjak
Organizers: Kazutoshi Wakabayashi, Rajesh K. Gupta
-
29.1 Signal Representation Guided Synthesis Using Carry-Save Adders For Synchronous Data-path Circuits [p. 456]
-
Zhan Yu, Meng-Lin Yu, Alan N. Willson, Jr
Arithmetic transformations using carry-save adders have been exploited
recently in design automation but existing transformation approaches
only optimize combinatorial functions. Most applications
need synchronous circuits and it is known that techniques that move
the positions of the registers, such as retiming, can significantly reduce
the cycle time of a synchronous circuit. However, retiming disregards
arithmetic transformations and its power is limited by the circuit
topology. This work is the first to exploit carry-save arithmetic
transformations together with the moving of the register positions.
To enable such transformations, we first propose the use of a new
multiple-vector signal representation. Next, we use multiple-vector
signal representation as a common guide for all of our simultaneous
carry-save arithmetic transformations with the moving of the register
positions. Specifically, we propose operation forward and operation
backward carry-save transformations, which are transformations
across register boundaries. We also propose operation duplicate and
operation merge transformations to exploit the resource sharing and
timing trade-offs in the implementation of a multiple-fanout network.
Finally, we propose an efficient and effective heuristic that selectively
applies a sequence of transformations to optimize the timing and the
area of a synchronous circuit. Experimental results show that the
proposed techniques significantly out-perform previous approaches.
-
29.2 Improved Merging of Datapath Operators using Information Content and Required Precision Analysis [p. 462]
-
Anmol Mathur, Sanjeev Saluja
We introduce the notions of required precision and
information content of datapath signals and use them to define
functionally safe transformations on data flow graphs. These
transformations reduce widths of datapath operators and enhance
their mergeability. Using efficient algorithms to compute required
precision and information content of signals, we define a new algorithm
for partitioning a data flow graph consisting of datapath
operators into mergeable clusters. Experimental results indicate
that use of our clustering algorithm for operator merging based
synthesis of datapath intensive designs can lead to significant
improvement in the delay and area of the implementation.
-
29.3 Digital Filter Synthesis Based on Minimal Signed Digit Representation [p. 468]
-
In-Cheol Park, Hyeong-Ju Kang
As the complexity of digital filters is dominated by the number of
multiplications, many works have focused on minimizing the complexity
of multiplier blocks that compute the constant coefficient multiplications
required in filters. The complexity of multiplier blocks can be significantly
reduced by using an efficient number system. Although the canonical signed
digit representation is commonly used as it guarantees the minimal number
of additions for a constant multiplication, we propose in this paper a
digital filter synthesis algorithm that is based on the minimal signed
digit (MSD) representation. The MSD representation is attractive because it
provides a number of forms that have the minimal number of non-zero
digits for a constant. This redundancy can lead to efficient filters if
a proper MSD representation is selected for each constant. In experimental
results, the proposed algorithm resulted in filters superior to those
generated from the CSD representation.
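For reference, the sketch below recodes a constant into canonical signed digit (CSD) form via the standard non-adjacent-form loop; the paper instead searches among the several minimal signed digit (MSD) forms, of which CSD is one member.

# CSD recoding of a positive constant (least significant digit first).
def csd_digits(n):
    digits = []                  # each digit in {-1, 0, +1}
    while n:
        if n & 1:
            d = 2 - (n & 3)      # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

print(csd_digits(23))            # [-1, 0, 0, -1, 0, 1]: 23 = 32 - 8 - 1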
Chair: Ian R. Mackintosh
Organizers: Kenji Yoshida, Majid Sarrafzadeh
-
30.1 Publicly Detectable Techniques for the Protection of Virtual Components [p. 474]
-
Gang Qu
Highlighted with the newly released intellectual property (IP) protection
white paper by VSI Alliance, the protection of virtual components
(VCs) has received a large amount of attention recently.
Digital signature is one of the most promising solutions among
the known protection mechanisms. However, the trade-off between
hard-to-attack and easy-to-detect and the lack of efficient detection
schemes are the major obstacles for digital signatures to thrive. In
this paper, we propose a new watermarking method which (i) allows
the watermark to be publicly detected without forensic experts,
(ii) gives little advantage to attackers for forgery, and (iii) does
not lose the strength of protection provided by other watermarking
techniques. The basic idea is to make part of the watermark
public. We explain the concept of this public-private watermark
and discuss the generation and embedding of such marks. We use
popular VLSI CAD problems, namely technology mapping, partitioning,
graph coloring, FPGA design, and Boolean satisfiability, to
demonstrate its easy detectability, high credibility, low design overhead,
and robustness. Finally, this technique is compatible with all
the known watermarking and fingerprinting techniques.
-
30.2 Watermarking of SAT using Combinatorial Isolation Lemmas [p. 480]
-
Rupak Majumdar, Jennifer L. Wong
Watermarking of hardware and software designs is an effective
mechanism for intellectual property protection (IPP). Two important
criteria for watermarking schemes are credibility and fairness.
In this paper, we present the unique solution-based watermarking
technique which provides, in a sense, the ultimate answer to both
credibility and fairness requirements. Leveraging a combinatorial
theorem of Valiant and Vazirani, we demonstrate how ultimate
credibility and complete fairness can almost always be achieved
with high probability during the watermarking of the solution of the
satisfiability (SAT) problem. The effectiveness of the technique is
demonstrated on both specially created examples where the number
of solutions is known, as well as on common CAD and operations
research SAT instances.
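A toy-scale sketch of the combinatorial tool named above: by the Valiant-Vazirani lemma, conjoining suitably many random parity (XOR) constraints isolates a unique satisfying assignment with non-negligible probability. Real instances need CNF encodings of the XORs; here we simply enumerate.

# Brute-force illustration of solution isolation via random parities.
import itertools, random

def solutions(clauses, n):
    for bits in itertools.product((False, True), repeat=n):
        if all(any((l > 0) == bits[abs(l) - 1] for l in c) for c in clauses):
            yield bits

def isolate(clauses, n, k):
    xors = [([v for v in range(n) if random.random() < 0.5],
             random.random() < 0.5) for _ in range(k)]
    return [s for s in solutions(clauses, n)
            if all(sum(s[v] for v in sub) % 2 == rhs for sub, rhs in xors)]

cnf = [[1, 2, 3]]                     # 7 satisfying assignments over 3 vars
print(len(isolate(cnf, 3, k=2)))      # survivors after 2 parity cuts (often 1)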
-
30.3 Watermarking Graph Partitioning Solutions [p. 486]
-
Greg Wolfe, Jennifer L. Wong, Miodrag Potkonjak
Trends in the semiconductor industry towards extensive design
and code reuse motivate a need for adequate Intellectual Property
Protection (IPP) schemes. We offer a new general IPP scheme
called constraint-based watermarking and analyze it in the context
of the graph partitioning problem. Graph partitioning is a critical
optimization problem that has many applications, particularly in the
semiconductor design process. Our IPP technique for graph partitioning
watermarks solutions to graph partitioning problems so that
they carry an author's signature. Our technique is transparent to the
actual CAD tool which does the partitioning. Our technique produces
solutions that have very low quality degradation levels, yet
carry signatures that are convincingly unambiguous, extremely unlikely
to be present by coincidence, and difficult to detect or remove
without completely resolving the partitioning problem.
-
30.4 Hardware Metering [p. 490]
-
Farinaz Koushanfar, Gang Qu
We introduce the first hardware metering scheme that enables reliable
low overhead proofs for the number of manufactured parts.
The key idea is to make each design slightly different. Therefore, if
two identical hardware designs, or a design that is not reported by
the foundry, are detected, the design house has proof of misconduct.
We start by establishing the connection between the requirements
for hardware metering and the synthesis process. Furthermore, we
present mathematical analysis of statistical accuracy of the proposed
hardware metering scheme. The effectiveness of the metering
scheme is demonstrated on a number of designs.
Chair: Chandu Visweswariah
Organizers: Chandu Visweswariah, Majid Sarrafzadeh
-
31.1 Technical Visualizations in VLSI Design [p. 494]
-
Phillip J. Restle
Visualization techniques were applied to several different types
of VLSI design and simulation data. A number of different
visualizations have been tried, with varying results. Examples
include 3D visualization of voltage and currents from fullwave
interconnect analysis, on-chip clock distribution networks,
chip/package power supply noise analysis, wire congestion, chip
layout imaging, and static circuit tuning. The goals, successes,
and failures of these examples will be discussed, along with
some unexpected benefits from our ability to easily see patterns
in complex visualizations.
General Terms
Design, Human Factors.
Keywords
Visualization, VLSI Design, Interconnect, Inductance, Clock
Distribution, Circuit Optimization.
-
31.2 Using Texture Mapping with Mipmapping to Render a VLSI Layout [p. 500]
-
Jeff Solomon, Mark Horowitz
This paper presents a method of using texture mapping with mipmapping
to render a VLSI layout. Texture mapping is used to save
already rasterized areas of the layout from frame to frame, and to
take advantage of any hardware accelerated capabilities of the host
platform. Mipmapping is used to select which textures to display
so that the amount of information sent to the display is bounded,
and the image rendered on the display is filtered correctly. Additionally,
two caching schemes are employed. The first, used to
bound memory consumption, is a general purpose cache that holds
textures spatially close to the userâs current viewpoint. The second,
used to speed up the rendering process, is a cache of heavily used
sub-designs that are precomputed so rasterization on the fly is not
necessary.
An experimental implementation shows that real-time navigation
can be achieved on arbitrarily large designs. Results also show how
this technique ensures that image quality does not degrade as the
number of polygons drawn increases, avoiding the aliasing artifacts
common in other layout systems.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: [Computer-Aided Design];
I.3.3 [Computer Graphics]: Picture/Image Generation÷display
algorithms
Keywords
texture mapping, mipmapping, VLSI layout editor
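The level-of-detail rule underlying mipmapped rendering can be stated compactly; the sketch below uses invented names and a scalar footprint, whereas real renderers derive the footprint per pixel from screen-space derivatives.

# Pick the mipmap level whose texels map closest to one screen pixel:
# level = log2(texel footprint), clamped to the available pyramid.
import math

def mip_level(texels_per_pixel, num_levels):
    lod = max(0.0, math.log2(max(texels_per_pixel, 1e-9)))
    return min(num_levels - 1, int(round(lod)))

# Zoomed out so an 8x8 block of layout texels lands in one pixel:
print(mip_level(8.0, num_levels=10))   # level 3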
-
31.3 Web-based Algorithm Animation [p. 506]
-
Marc Najork
This paper describes JCAT, a web-based algorithm animation
system. JCAT combines the expressive power of web
pages for publishing passive multimedia content with a full-fledged
interactive algorithm animation system that includes
a rich set of libraries for creating 2D and 3D animations.
The paper describes in detail how an algorithm animation
is authored, and it presents a sample of existing animations.
Chair: Kees Vissers
Organizers: Kurt Keutzer, Dirk Grunwald
-
32.1 Speeding Up Control-Dominated Applications through Microarchitectural Customizations
in Embedded Processors [p. 512]
-
Peter Petrov, Alex Orailoglu
We present a methodology for microarchitectural customization of
embedded processors by exploiting application information, thus attaining
the twin benefits of processor standardization and application specific
customization. Such powerful techniques enable more application
fragments to be placed on the processor, with no sacrifice
in system requirements, thus reducing the custom hardware and
the concomitant area requirements in SOCs. We illustrate these ideas
through the branch resolution problem, known to impose severe performance
degradation on control-dominated embedded applications.
A low-cost, late-customizable hardware mechanism that uses application information
to fold out a set of frequently executed branches is described.
Experimental results show that for a representative set of control dominated
applications a reduction in the range of 7%-22% in processor
cycles can be achieved, thus extending the scope of low-cost embedded
processors in complex co-designs for control intensive systems.
-
32.2 Automatic Generation of Application-Specific Architectures for Heterogeneous Multiprocessor
System-on-Chip [p. 518]
-
Damien Lyonnard, Sungjoo Yoo, Amer Baghdadi, Ahmed A. Jerraya
We present a design flow for the generation of application-specific
multiprocessor architectures. In the flow, architectural parameters
are first extracted from a high-level system specification. Parameters
are used to instantiate architectural components, such as
processors, memory modules and communication networks. The
flow includes the automatic generation of a communication coprocessor
that adapts the processor to the communication network in
an application-specific way. Experiments with two system examples
show the effectiveness of the presented design flow.
-
32.3 Dynamic Voltage Scaling and Power Management for Portable Systems [p. 524]
-
Tajana Simunic, Luca Benini, Andrea Acquaviva, Peter Glynn, Giovanni De Micheli
Portable systems require long battery lifetime while still delivering
high performance. Dynamic voltage scaling (DVS) algorithms
reduce energy consumption by changing processor speed and voltage
at run-time depending on the needs of the applications running.
Dynamic power management (DPM) policies trade off the performance
for the power consumption by selectively placing components
into low-power states. In this work we extend the DPM model
presented in [2, 3] with a DVS algorithm, thus enabling larger
power savings. We test our approach on MPEG video and MP3
audio algorithms running on the SmartBadge portable device [1].
Our results show savings of a factor of three in energy consumption
for combined DVS and DPM approaches.
Chair: Karem A. Sakallah
Organizers: Andrew B. Kahng, Kurt Keutzer, Limor Fix
-
33.1 Chaff: Engineering an Efficient SAT Solver [p. 530]
-
Matthew W. Moskewicz, Conor F. Madigan, Ying Zhao, Lintao Zhang, Sharad Malik
Boolean Satisfiability is probably the most studied of
combinatorial optimization/search problems. Significant effort
has been devoted to trying to provide practical solutions to this
problem for problem instances encountered in a range of
applications in Electronic Design Automation (EDA), as well as
in Artificial Intelligence (AI). This study has culminated in the
development of several SAT packages, both proprietary and in
the public domain (e.g. GRASP, SATO) which find significant
use in both research and industry. Most existing complete solvers
are variants of the Davis-Putnam (DP) search algorithm. In this
paper we describe the development of a new complete solver,
Chaff, which achieves significant performance gains through
careful engineering of all aspects of the search, especially a
particularly efficient implementation of Boolean constraint
propagation (BCP) and a novel low overhead decision strategy.
Chaff has been able to obtain one to two orders of magnitude
performance improvement on difficult SAT benchmarks in
comparison with other solvers (DP or otherwise), including
GRASP and SATO.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided Design.
General Terms
Algorithms, Verification.
Keywords
Boolean satisfiability, design verification.
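A compact re-creation of the central BCP idea (not Chaff's code): each clause is inspected only when one of its two watched literals becomes false. Clauses are lists of DIMACS-style literals with at least two literals each; decision heuristics, conflict analysis, and learning are omitted.

# Unit propagation with two watched literals; assign maps var -> bool.
# Returns (ok, assign); ok is False on conflict.
def bcp(clauses, assign):
    watch = {}                                  # literal -> clauses watching it
    for ci, cl in enumerate(clauses):
        for lit in cl[:2]:
            watch.setdefault(lit, []).append(ci)

    def val(lit):
        v = assign.get(abs(lit))
        return None if v is None else v == (lit > 0)

    pending = [v if b else -v for v, b in assign.items()]   # literals made true
    while pending:
        flit = -pending.pop()                   # this literal just became false
        for ci in list(watch.get(flit, [])):
            cl = clauses[ci]
            w = 0 if cl[0] == flit else 1
            other = cl[1 - w]
            if val(other) is True:
                continue                        # clause satisfied elsewhere
            for j in range(2, len(cl)):         # try to move the watch
                if val(cl[j]) is not False:
                    cl[w], cl[j] = cl[j], cl[w]
                    watch[flit].remove(ci)
                    watch.setdefault(cl[w], []).append(ci)
                    break
            else:
                if val(other) is False:
                    return False, assign        # all literals false: conflict
                assign[abs(other)] = other > 0  # unit clause: imply 'other'
                pending.append(other)
    return True, assign

print(bcp([[1, 2], [-1, 3], [-3, -2, 4]], {1: False}))  # implies x2 = True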
-
33.2 Dynamic Detection and Removal of Inactive Clauses in SAT with Application in Image Computation [p. 536]
-
Aarti Gupta, Anubhav Gupta, Zijiang Yang, Pranav Ashar
In this paper, we present a new technique for the efficient dynamic
detection and removal of inactive clauses, i.e. clauses
that do not affect the solutions of interest of a Boolean
Satisfiability (SAT) problem. The algorithm is based on the extraction
of gate connectivity information during generation
of the Boolean formula from the circuit, and its use in the
inner loop of a branch-and-bound SAT algorithm. The motivation
for this optimization is to exploit the circuit structure
information, which can be used to find unobservable
gates at circuit outputs under dynamic conditions. It has
the potential to speed up all applications of SAT in which
the SAT formula is derived from a logic circuit. In particular,
we find that it has considerable impact on an image
computation algorithm based on SAT. We present practical
results for benchmark circuits which show that the use of
this optimization consistently improves the performance for
reachability analysis, in some cases enabling the prototype
tool to reach more states than otherwise possible.
-
33.3 SATIRE: A New Incremental Satisfiability Engine [p. 542]
-
Jesse Whittemore, Joonyoung Kim, Karem Sakallah
We introduce SATIRE, a new satisfiability solver that is particularly
suited to verification and optimization problems in electronic design
automation. SATIRE builds on the most recent advances in
satisfiability research, and includes two new features to achieve
even higher performance: a facility for incrementally solving sets of
related problems, and the ability to handle non-CNF constraints.
We provide experimental evidence showing the effectiveness of
these additions to classical satisfiability solvers.
-
33.4 A Framework for Low Complexity Static Learning [p. 546]
-
Emil Gizdarski, Hideo Fujiwara
In this paper, we present a new data structure for a complete
implication graph and two techniques for low complexity static
learning. We show that using static indirect implications and
super gate extraction some hard-to-detect static and dynamic
indirect implications are easily derived during static and dynamic
learning as well as branch and bound search. Experimental
results demonstrated the effectiveness of the proposed data
structure and learning techniques.
Chair: L. Miguel Silveira
Organizers: Joel Phillips, Mustafa Celik
-
34.1 Fast Power/Ground Network Optimization Based on Equivalent Circuit Modeling [p. 550]
-
X.-D. Sheldon Tan, C.-J. Richard Shi
This paper presents an efficient algorithm for optimizing the area of power
or ground networks in integrated circuits subject to reliability constraints.
Instead of solving the original power/ground networks extracted from circuit
layouts as previous methods did, the new method first builds the equivalent
models for many series resistors in the original networks, then the
sequence of linear programming method [0] is used to solve the simplified
networks. The solutions of the original networks then are back solved from
the optimized, simplified networks. The new algorithm simply exploits the
regularities in the power/ground networks. Experimental results show that
the complexities of simplified networks are typically significantly smaller
than that of the original circuits, which renders the new algorithm extremely
fast. For instance, power/ground networks with more than one million branches
can be sized in a few minutes on modern SUN workstations.
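The simplest equivalent-model step the approach exploits can be sketched directly: an internal node joining exactly two resistors and carrying no load can be spliced out, summing the resistances. The adjacency-dict representation is invented, and the paper's model building and back-solving go much further.

# Collapse series resistor chains in a small grid graph.
def collapse_series(edges, keep):
    """edges: node -> list of (neighbor, R), kept symmetric;
    keep: nodes (pads, loads) that must survive."""
    changed = True
    while changed:
        changed = False
        for n in list(edges):
            if n not in edges or n in keep or len(edges[n]) != 2:
                continue
            (a, r1), (b, r2) = edges[n]
            if a == b:
                continue
            req = r1 + r2                       # series equivalent
            edges[a] = [(b, req) if m == n else (m, r) for m, r in edges[a]]
            edges[b] = [(a, req) if m == n else (m, r) for m, r in edges[b]]
            del edges[n]
            changed = True
    return edges

grid = {"pad": [("n1", 1.0)], "n1": [("pad", 1.0), ("n2", 2.0)],
        "n2": [("n1", 2.0), ("load", 3.0)], "load": [("n2", 3.0)]}
print(collapse_series(grid, keep={"pad", "load"}))  # pad --6.0-- load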
-
34.2 An Interconnect Energy Model Considering Coupling Effects [p. 555]
-
Taku Uchino, Jason Cong
This paper first presents an analytical interconnect energy model
with consideration of event coupling, which is not considered by
the conventional (1/2)CV^2 model. Our energy calculation
algorithm has the same time complexity as the (1/2)CV^2 model,
and is several orders of magnitude faster than HSPICE with less than
5% error. In comparison, the error of the (1/2)CV^2 model can
be as high as 100%.
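In first-order terms, the reason the plain (1/2)CV^2 view can be badly off is that the coupling capacitance a switching wire sees depends on what its neighbor does; the usual Miller-factor rule is sketched below with illustrative values, and the paper's analytical model is finer-grained than this.

# Effective switched capacitance with a Miller factor of 0, 1, or 2.
def switch_energy(c_ground, c_couple, vdd, neighbor):
    miller = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}[neighbor]
    return 0.5 * (c_ground + miller * c_couple) * vdd ** 2

for nb in ("same", "quiet", "opposite"):          # values are illustrative
    print(nb, switch_energy(100e-15, 80e-15, 1.8, nb))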
-
34.3 Efficient Large-Scale Power Grid Analysis Based on Preconditioned Krylov-Subspace Iterative Methods [p. 559]
-
Tsung-Hao Chen, Charlie Chung-Ping Chen
In this paper, we propose preconditioned Krylov-subspace iterative
methods to perform efficient DC and transient simulations
for large-scale linear circuits with an emphasis on power delivery
circuits. We also prove that a circuit with inductors can be simplified
from MNA to NA format, and the matrix becomes a symmetric positive
definite (s.p.d.) matrix. This property makes it suitable for the conjugate gradient
with incomplete Cholesky decomposition as the preconditioner,
which is faster than other direct and iterative methods. Extensive
experimental results on large-scale industrial power grid circuits
show that our method is over 200 times faster for DC analysis
and around 10 times faster for transient simulation compared to
SPICE3. Furthermore, our algorithm reduces memory usage by over
75% compared to SPICE3, while accuracy is not compromised.
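The numerical core can be tried at small scale with SciPy. SciPy ships an incomplete LU factorization (spilu) rather than the incomplete Cholesky used in the paper, so ILU stands in for it here; the 1-D grid stands in for a power mesh, and the MNA-to-NA reduction that yields an s.p.d. matrix is assumed to have been done.

# Preconditioned conjugate gradient on an s.p.d. grid-like system.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)                                  # current injections

ilu = spla.spilu(A, drop_tol=1e-4)              # incomplete factorization
M = spla.LinearOperator((n, n), ilu.solve)      # preconditioner ~ A^-1

x, info = spla.cg(A, b, M=M)
print(info, np.linalg.norm(A @ x - b))          # info == 0 means converged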
-
34.4 Using Conduction Modes Basis Functions for Efficient Electromagnetic Analysis of On-Chip and
Off-Chip Interconnect [p. 563]
-
Luca Daniel, Alberto Sangiovanni-Vincentelli, Jacob White
In this paper, we present an efficient method to model the interior
of the conductors in a quasi-static or full-wave integral equation
solver. We show how interconnect cross-sectional current distributions
can be modeled using a small number of conduction modes as
basis functions for the discretization of the Mixed Potential Integral
Equation (MPIE). Two examples are presented to demonstrate the
computational attractiveness of our method. In particular, we show
how our new approach can successfully and efficiently capture skin
effects, proximity effects and transmission line resonances.
-
34.5 Analysis of Non-Uniform Temperature-Dependent Interconnect Performance in High Performance ICs [p. 567]
-
Amir H. Ajami, Kaustav Banerjee, Massoud Pedram, Lukas P.P.P. van Ginneken
Non-uniform temperature profiles along global interconnect lines in
high-performance ICs can significantly impact the performance of
these lines. This paper presents a detailed analysis and modeling of
the interconnect performance degradation due to non-uniform
temperature profiles that exist along their lengths, which in turn arise
due to the thermal gradients in the underlying substrate. A non-uniform
temperature-dependent distributed RC interconnect delay model is proposed
for the first time. The model has been applied to a wide variety of
interconnect layouts and temperature distributions to quantify the impact
on signal integrity issues including clock skew fluctuations.
Chair: Yosinori Watanabe
Organizers: Anand Raghunathan, Shin-ichi Minato
-
35.1 VHDL-Based Design and Design Methodology for Reusable High Performance Direct Digital Frequency
Synthesizers [p. 573]
-
Ireneusz Janiszewski, Bernhard Hoppe, Hermann Meuth
Design methodologies for high performance Direct Digital Frequency
Synthesizers (DDFS) are described. Traditional look-up
tables (LUT) for sine and cosine are merged with CORDIC-interpolation
into a hybrid architecture. This implements DDFS-systems
with high resolution without being specific to a particular
target technology. Amplitude constants were obtained from mathematical
trigonometric functions of the IEEE math_real package.
These constants were then written via simulation of a VHDL
model into a fully synthesizable package. Systematic and detailed
studies varying the synthesizer's inherent parameters lead to a
design optimum of the LUT/CORDIC-ratio, which minimizes
power and silicon area for a given clock frequency.
Categories and Subject Descriptors
M1.5: Logic and high-level synthesis and optimization
General Terms
Algorithms, Design, Languages, Theory.
Keywords
HDL-based Design, Direct Frequency Synthesis, Design Optimization
and Reuse, CORDIC Algorithm.
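A floating-point rotation-mode CORDIC sketch is given below for orientation; the paper's designs are fixed-point VHDL and pair CORDIC with look-up tables, which this toy does not reproduce.

# Rotation-mode CORDIC for sine and cosine, |angle| <= pi/2 (radians).
import math

def cordic_sin_cos(angle, iters=24):
    gain = 1.0
    for i in range(iters):                     # accumulated rotation gain K
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle                 # pre-scale by 1/K
    for i in range(iters):
        d = 1.0 if z >= 0 else -1.0            # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return y, x                                # (sin, cos)

print(cordic_sin_cos(math.pi / 6))             # ~(0.5, 0.866)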
-
35.2 Concurrent Error Detection of Fault-Based Side-Channel Cryptanalysis of 128-Bit Symmetric
Block Ciphers [p. 579]
-
Ramesh Karri, Kaijie Wu, Piyush Mishra, Yongkook Kim
Fault-based side channel cryptanalysis is very
effective against symmetric and asymmetric encryption
algorithms. Although straightforward hardware and time
redundancy based concurrent error detection (CED)
architectures can be used to thwart such attacks, they entail
significant overhead (either area or performance). In this
paper we investigate systematic approaches to low-cost,
low-latency CED for symmetric encryption algorithms
based on the inverse relationship that exists between
encryption and decryption at algorithm level, round level
and operation level and develop CED architectures that
explore the trade-off between area overhead, performance
penalty and error detection latency. The proposed
techniques have been validated on FPGA implementations
of AES finalist 128-bit symmetric encryption algorithms.
-
35.3 MetaCores: Design and Optimization Techniques [p. 585]
-
Seapahn Meguerdichian, Farinaz Koushanfar, Advait Mogre, Dusan Petranovic, Miodrag Potkonjak
Currently, hardware intellectual property (IP) is delivered at
three levels of abstraction: hard, firm, and soft. In order to further
enhance performance, efficiency, and flexibility of IP design, we
have developed a new approach for designing hardware and software
IP called MetaCores. The new design approach starts at the
algorithm level and leverages the algorithm's intrinsic optimization
degrees of freedom. The approach has four main components:
(i) problem formulation and identification of optimization degrees
of freedom, (ii) objective functions and constraints, (iii) cost
evaluation engine, and (iv) multiresolution design space search.
From the algorithmic viewpoint, the main contribution is the introduction
of multiresolution search in algorithm optimization and
synthesis process. We have applied the approach to the development
of Viterbi and IIR MetaCores. Experimental results demonstrate
the effectiveness of the new approach.
Chair: Andrew B. Kahng
Organizer: Bing Sheu and Andrew B. Kahng
Panel Members: Nancy Nettleton, John Cohn, Shekhar Borkar, Louis Scheffer, Ed Cheng, Sang Wang
-
-
As fabrication technology moves to 100 nm and below, profound nanometer effects
become critical in developing silicon chips with hundreds of millions of
transistors. Both EDA suppliers and system houses have been re-tooling, and
new methodologies have been emerging. Will these efforts meet the challenges of
nanometer silicon such as performance closure, power, reliability,
manufacturability, and cost? Which aspects of nanometer design are, or are not,
under control? This session will consist of a debate between two teams of
distinguished representatives from EDA suppliers and system design houses.
Which side has the right answers and roadmap? You and a panel of judges will
decide!
Chair: Grant Martin
Organizers: Dirk Grunwald, Kurt Keutzer, Luciano Lavagno
-
37.1 A Hardware/Software Co-design Flow and IP Library Based on Simulink [p. 593]
-
L. M. Reyneri, F. Cucinotta, A. Serra, L. Lavagno
This paper describes a design flow for data-dominated embedded
systems. We use The Mathworks' Simulink TM
environment for functional specification and algorithmic analysis. We
developed a library of Simulink blocks, each parameterized by design
choices such as implementation (software, analog or digital hardware,
...) and numerical accuracy (resolution, S/N ratio). Each block is equipped
with empirical models for cost (code size, chip area) and performance
(timing, energy), based on surface fitting from actual measurements.
We also developed an analysis toolbox that quickly evaluates
algorithm and parameter choices performed by the designer,
and presents the results for fast feedback. The chosen block netlist
is then ready for implementation, by using a customization of The
Mathworks' Real Time Workshop TM to generate a VHDL netlist for
FPGA implementation, as well as embedded software for DSP implementation.
-
37.2 System-Level Power/Performance Analysis for Embedded Systems Design [p. 599]
-
Amit Nandi, Radu Marculescu
This paper presents a formal technique for system level
power/performance analysis that can help the designer to
select the right platform starting from a set of target applications.
By platform we mean a family of heterogeneous architectures that
satisfy a set of architectural constraints imposed to allow re-use of
hardware and software components. More precisely, we introduce
the Stochastic Automata Networks (SANs) as an effective formalism
for average-case analysis that can be used early in the design cycle
to identify the best power/performance figure among several
application-architecture combinations. This information not only helps
avoid lengthy profiling simulations, but also enables efficient mappings
of the applications onto the chosen platform. We illustrate the
features of our technique through the design of an MPEG-2 video
decoder application.
Keywords: platform-based design, system-level analysis, stochastic
automata networks, multimedia systems.
-
37.3 High-level Software Energy Macro-modeling [p. 605]
-
T. K. Tan, A. Raghunathan, G. Lakshminarayana, N. K. Jha
This paper presents an efficient and accurate high-level software
energy estimation methodology using the concept of characterization-based
macro-modeling. In characterization-based macro-modeling, a function or
sub-routine is characterized using an accurate lower-level energy model
of the target processor, to construct a macro-model that relates the energy
consumed in the function under consideration to various parameters that can be
easily observed or calculated from a high-level programming language description. The constructed macro-models eliminate the need for significantly slower
instruction-level interpretation or hardware simulation that is required
in conventional approaches to software energy estimation.
We present two different approaches to macro-modeling for embedded
software that offer distinct efficiency-accuracy characteristics:
(i) complexity-based macro-modeling, where the variables that determine
the algorithmic complexity of the function under consideration are used
as macro-modeling parameters, and (ii) profiling-based macro-modeling, where
internal profiling statistics for the functions are used as parameters in
the energy macro-models. We have experimentally validated our software
energy macro-modeling techniques on a wide range of embedded software
routines and two different target processor architectures. Our experiments
demonstrate that high-level macro-models constructed using the proposed
techniques are able to estimate the energy consumption to within 95% accuracy
on the average, while commanding speedups of one to five orders of magnitude
over current instruction-level and architectural energy estimation techniques.
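To make the macro-modeling idea concrete, here is a minimal Python sketch of the complexity-based flavor: a least-squares fit relating a routine's measured energy to its complexity parameters. This is not the authors' tool; the quadratic form and all numbers are invented for illustration.

# Hypothetical sketch of complexity-based energy macro-modeling: fit
# E ~ c0 + c1*n + c2*n^2 from characterization data gathered with an
# accurate lower-level energy model of the target processor (not shown).
import numpy as np

# Characterization data: input sizes and measured energies (invented).
n = np.array([16.0, 32.0, 64.0, 128.0, 256.0])
energy_nj = np.array([1.9, 4.1, 9.0, 21.0, 52.0])

# Design matrix for the macro-model parameters [1, n, n^2].
X = np.column_stack([np.ones_like(n), n, n * n])
coeffs, *_ = np.linalg.lstsq(X, energy_nj, rcond=None)

def estimate_energy(size):
    """High-level estimate: evaluate the fitted macro-model directly,
    with no instruction-level interpretation or hardware simulation."""
    return float(coeffs @ np.array([1.0, size, size * size]))

print(estimate_energy(100.0))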
Chair: Carl Pixley
Organizers: Anand Raghunathan, Carl Pixley
-
38.1 Model Checking of S3C2400X Industrial Embedded SOC Product [p. 611]
-
Hoon Choi, Byeongwhee Yun, Yuntae Lee, Hyunglae Roh
This paper describes our experience and the methodology used in
the model checking of the S3C2400X, an industrial embedded SOC product.
We employed model checking to verify the RTL implementation.
We describe how to model multiple clocks, gated clocks,
unsynchronized clocks, and synchronization logic in model
checking. Detailed case studies of real designs show the application
of the proposed modeling techniques, environment modeling,
and the properties we checked. The verification results validate
the proposed techniques by finding real bugs.
-
38.2 Semi-Formal Test Generation with Genevieve [p. 617]
-
Julia Dushina, Mike Benjamin, Daniel Geist
This paper describes the first application of the Genevieve test generation
methodology. The Genevieve approach uses semi-formal
techniques derived from "model-checking" to generate test suites
for specific behaviours of the design under test. An "interesting"
behaviour is claimed to be unreachable; if a path from an initial
state to the state of interest does exist, a counter-example is generated.
The sequence of states specifies a test for the desired behaviour.
To highlight real problems that could appear during test generation,
we chose the Store Data Unit (SDU) of the ST100, a new high
performance digital signal processor (DSP) developed by STMicroelectronics.
This unit is specifically selected because of the following
key issues:
1. big data structures that cannot be directly modelled without
state explosion,
2. complex control logic that would require an excessive number
of tests to exercise exhaustively,
3. a design where it is difficult to determine how to drive the complete
system to ensure a given behaviour in the unit under test.
The Genevieve methodology allowed us to define a coverage model
specifically devoted to covering corner cases of the design. Hence
the generated test suite achieved very efficient coverage of corner
cases, and checked not only functional correctness but also
whether the implementation matched design intent. As a result the
Genevieve tests discovered some subtle performance bugs which
would otherwise be very difficult to find.
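A toy illustration of the counter-example mechanism (not the Genevieve flow itself; the FSM and its state and input names are invented): a state of interest is claimed unreachable, and a reachability search either confirms the claim or returns the refuting path, which doubles as a directed test.

# Toy illustration of counter-example-driven test generation.
from collections import deque

transitions = {  # state -> {input stimulus: next state}
    "IDLE":     {"req": "DECODE", "nop": "IDLE"},
    "DECODE":   {"store": "SDU_BUSY", "nop": "IDLE"},
    "SDU_BUSY": {"ack": "IDLE", "stall": "SDU_FULL"},
    "SDU_FULL": {"drain": "SDU_BUSY"},
}

def path_to(target, start="IDLE"):
    """Breadth-first reachability search from the initial state."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, inputs = queue.popleft()
        if state == target:
            return inputs  # the "counter-example": a test sequence
        for stimulus, nxt in transitions[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, inputs + [stimulus]))
    return None  # the state really is unreachable: the claim holds

print(path_to("SDU_FULL"))  # -> ['req', 'store', 'stall']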
-
38.3 A Transaction-Based Unified Simulation/Emulation Architecture for Functional Verification [p. 623]
-
Murali Kudlugi, Soha Hassoun, Charles Selvidge, Duaine Pryor
A transaction-based layered architecture providing for 100%
portability of a C-based testbench between simulation and emulation
is proposed. Transaction-based communication results in performance
which is commensurate with emulation without a hardware target. Testbench
portability eliminates duplicated effort when combining system level
simulation and emulation. An implementation based on the IKOS VStation
emulator validates these architectural claims on real designs.
Chair: Kazutoshi Wakabayashi
Organizers: Kazutoshi Wakabayashi, Rajesh K. Gupta
-
39.1 Integrated High-Level Synthesis and Power-Net Routing for Digital Design under Switching Noise
Constraints [p. 629]
-
Alex Doboli, Ranga Vemuri
This paper presents a CAD methodology and a tool for high-level
synthesis (HLS) of digital hardware for mixed analog-digital chips.
In contrast to HLS for digital applications,
HLS for mixed-signal systems is mainly challenged by constraints,
such as digital switching noise (DSN), that are due
to the analog circuits. This paper discusses an integrated
approach to HLS and power net routing for effectively reducing
DSN. The motivation for this research is that HLS has
a high impact on DSN reduction; however, DSN evaluation
is very difficult at a high level. The integrated approach also
employs an original method for fast evaluation of DSN and
an algorithm for power net routing and sizing. Experiments
showed that our combined binding and scheduling method
produces better results than traditional HLS techniques. Finally,
DSN evaluation using the proposed algorithm can be
significantly faster than SPICE simulation.
-
39.2 Integrating Scheduling and Physical Design into a Coherent Compilation Cycle for Reconfigurable
Computing Architectures [p. 635]
-
Kia Bazargan, Seda Ogrenci, Majid Sarrafzadeh
Advances in the FPGA technology, both in terms of device capacity and
architecture, have resulted in introduction of reconfigurable computing
machines, where the hardware adapts itself to the running application to
gain speedup. To keep up with the ever-growing performance expectations
of such systems, designers need new methodologies and tools for developing
reconfigurable computing systems (RCS). This paper addresses the need for
a fast compilation and physical design phase to be used in the application
development/debugging/testing cycle for RCS. We present a high-level
synthesis approach that is integrated with placement, making the compilation
cycle much faster. On the average, our tool generates the VHDL code
(and the corresponding placement information) from the data flow graph
of a program in less than a minute. By compromising 30% in the clock
frequency of the circuit, we can achieve about 10 times speedup in the
Xilinx placement phase, and 2.5 times overall speedup in the Xilinx
place-and-route phase, a reasonable trade-off when developing RCS
applications.
-
39.3 Statistical Design Space Exploration for Application-Specific Unit Synthesis [p. 641]
-
Davide Bruni, Alessandro Bogliolo, Luca Benini
The capability of performing semi-automated design space exploration
is the main advantage of high-level synthesis with respect to
RTL design. However, design space exploration performed during
high-level synthesis is limited in scope, since it provides promising
solutions that represent good starting points for subsequent optimizations,
but it provides no insight into the overall structure of
the design space. In this work we propose unsupervised Monte
Carlo design exploration and statistical characterization to capture
the key features of the design space. Our analysis provides insight
into how various solutions are distributed over the entire design
space. In addition, we apply extreme value theory [11] to extrapolate
achievable bounds from the sampling points.
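The extrapolation step can be sketched generically (this is not the authors' formulation; the design space and cost function below are invented): sample the space at random, take block minima, and fit a Gumbel extreme-value model by the method of moments to estimate a near-optimal achievable cost.

# Generic sketch of Monte Carlo sampling plus extreme-value extrapolation.
import random
import numpy as np

random.seed(0)

def cost(unrolling, pipelining, resources):
    # Invented stand-in for the area/delay cost of one solution.
    return 100.0 / (1 + unrolling * pipelining) + 3.0 * resources

samples = np.array([cost(random.randint(1, 8), random.randint(1, 4),
                         random.randint(1, 10)) for _ in range(2000)])

# Best solution in each block of 20 trials, negated so the classical
# max-domain Gumbel model applies to minima.
minima = -samples.reshape(100, 20).min(axis=1)

scale = minima.std() * np.sqrt(6.0) / np.pi   # method-of-moments fit
loc = minima.mean() - 0.5772 * scale          # Euler-Mascheroni constant

q = 0.999  # extrapolate an extreme quantile, then negate back
print(-(loc - scale * np.log(-np.log(q))))    # est. lowest reachable cost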
Chair: Ashok Vittal
Organizer: Narendra Shenoy
-
40.1 Static Scheduling of Multiple Asynchronous Domains For Functional
Verification [p. 647]
-
Murali Kudlugi, Charles Selvidge, Russell Tessier
While ASIC devices of a decade ago primarily contained synchronous
circuitry triggered with a single clock, many contemporary
architectures require multiple clocks that operate asynchronously to
each other. This multi-clock domain behavior presents significant
functional verification challenges for large parallel verification systems
such as distributed parallel simulators and logic emulators. In
particular, multiple asynchronous design clocks make it difficult to
verify that design hold times are met during logic evaluation and
causality along reconvergent fanout paths is preserved during signal
communication. In this paper, we describe scheduling and synchronization
techniques to maintain modeling fidelity for designs with
multiple asynchronous clock domains that are mapped to parallel
verification systems. It is shown that when our approach is applied
to an FPGA-based logic emulator, evaluation fidelity is maintained
and increased design evaluation performance can be achieved for
large benchmark designs with multiple asynchronous clock
domains.
-
40.2 Functional Correlation Analysis in Crosstalk Induced Critical Paths
Identification [p. 653]
-
Tong Xiao, Malgorzata Marek-Sadowska
In deep submicron digital circuits capacitive couplings make delay
of a switching signal highly dependent on its neighbors' switching
times and switching directions. A long path may have a large number
of coupling neighbors with difficult-to-determine interdependencies.
Ignoring the mutual relationship among the signals may
result in a very pessimistic estimation of circuit delay. In this
paper, we apply efficient functional correlation analysis techniques
to identify critical paths caused by crosstalk delay effects. We also
discuss applications to static timing optimization. Experiments
demonstrate efficacy of the proposed technique.
-
40.3 An Advanced Timing Characterization Method Using Mode Dependency [p. 657]
-
Hakan Yalcin, Robert Palermo, Mohammad Mortazavi, Cyrus Bamji, Karem Sakallah,
John Hayes
To address the problem of accurate timing characterization, this
paper proposes a method that fully exploits mode dependency. It is
based on the premise that circuit delays are determined largely by a
set of control inputs for which the number of useful combinations,
i.e., modes, is small for most practical circuits. We take the mode-dependent
characterization approach further and enhance it so that
the delays of the I/O paths between the control inputs and outputs
are calculated more accurately. We prove that, with a careful choice
of propagation conditions, our method can generate timing models
with very tight path delays that are guaranteed to give correct
results. Experimental results using real-life circuits show that circuit
delays can vary significantly among different modes for both
control and data input delays, and capturing this variation can
have a significant impact on the overall system timing.
-
40.4 Fast Statistical Timing Analysis By Probabilistic Event Propagation [p. 661]
-
Jing-Jia Liou, Kwang-Ting Cheng, Sandip Kundu, Angela Krstic
We propose a new statistical timing analysis algorithm, which
produces arrival-time random variables for all internal signals and
primary outputs for cell-based designs with all cell delays modeled
as random variables. Our algorithm propagates probabilistic timing
events through the circuit and obtains final probabilistic events
(distributions) at all nodes. The new algorithm is deterministic and
flexible in controlling run time and accuracy. However, the algorithm
has exponential time complexity for circuits with reconvergent
fanouts. In order to solve this problem, we further propose a
fast approximate algorithm. Experiments show that this approximate
algorithm speeds up the statistical timing analysis by at least
an order of magnitude and produces results with small errors when
compared with Monte Carlo methods.
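A minimal sketch of the event-propagation idea, assuming discrete delay distributions and independent inputs; reconvergent fanout, where this independence assumption fails, is exactly what forces the approximate algorithm. All distributions below are invented.

# Probabilistic event propagation with discrete {time: probability}
# distributions, assuming independent inputs.

def add(dist_a, dist_b):
    """Distribution of A + B, e.g. arrival time plus cell delay."""
    out = {}
    for ta, pa in dist_a.items():
        for tb, pb in dist_b.items():
            out[ta + tb] = out.get(ta + tb, 0.0) + pa * pb
    return out

def maximum(dist_a, dist_b):
    """Distribution of max(A, B): the latest-arriving input wins."""
    out = {}
    for ta, pa in dist_a.items():
        for tb, pb in dist_b.items():
            t = max(ta, tb)
            out[t] = out.get(t, 0.0) + pa * pb
    return out

arr1 = {1.0: 0.5, 2.0: 0.5}   # arrival time at input 1 (invented)
arr2 = {1.5: 0.7, 3.0: 0.3}   # arrival time at input 2 (invented)
cell = {0.5: 0.8, 0.8: 0.2}   # cell delay, itself a random variable

print(add(maximum(arr1, arr2), cell))  # output arrival-time distribution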
Chair: Anand Raghunathan
Organizers: Anand Raghunathan, Sharad Malik
-
41.1 Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design [p. 667]
-
M. Sgroi, M. Sheets, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, A. Sangiovanni-Vincentelli
Communication-based design represents a formal approach to
system-on-a-chip design that considers communication between components
as important as the computations they perform. Our "network-on-chip"
approach partitions the communication into layers to maximize reuse
and provide a programmer with an abstraction of the underlying
communication framework. This layered approach is cast in the
structure advocated by the OSI Reference Model and is demonstrated
with a reconfigurable DSP example. The Metropolis methodology of
deriving layers through a sequence of adaptation steps between
incompatible behaviors is illustrated through the Intercom design
example. In another approach, MESCAL provides a designer with
tools for a correct-by-construction protocol stack.
General Terms: Design
Keywords: Network-on-chip; platform-based design;
communication-based design; protocol stack.
-
41.2 MicroNetwork-Based Integration for SOCs [p. 673]
-
Drew Wingard
In this paper, we describe the concept of using an on-chip network
as the fundamental communication architecture for a complex
SOC design. We describe some of the salient features of a
MicroNetwork that we have implemented, and introduce an
automated development environment that makes MicroNetwork-based
design productive. Finally, we show how one would use our
MicroNetwork to integrate an SOC, and provide a general
description of several completed MicroNetwork-based designs.
-
41.3 On-Chip Communication Architecture for OC-768 Network Processors [p. 678]
-
Faraydon Karim, Anh Nguyen, Sujit Dey, Ramesh Rao
The need for network processors capable of forwarding IP
packets at OC-192 and higher data rates has been well
established. At the same time, there is a growing need for
complex tasks, like packet classification and differentiated
services, to be performed by network processors. At the OC-768
data rate, a network processor has 9 nanoseconds to
process a minimum-size IP packet. Such ultra high-speed
processing, involving complex memory-intensive tasks, can
only be achieved by multi-CPU distributed memory systems,
using very high performance on-chip communication
architectures. In this paper, we propose a novel
communication network architecture for 8-CPU distributed memory
systems that has the potential to deliver the
throughput required in next generation routers. We then
show that our communication architecture can easily scale
to accommodate a much greater number of network nodes.
Our network architecture yields higher performance than
the traditional bus and crossbar yet has low implementation
cost. It is quite flexible and can be implemented in either
packet or circuit switched mode. We will compare and
contrast our proposed architecture with busses and
crossbars using metrics such as throughput and physical
layout cost.
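The per-packet budget quoted above follows from simple arithmetic, sketched here with round numbers (a 40-byte minimum IP packet and a nominal 40 Gb/s OC-768 line rate; the exact figure depends on framing overhead and packet assumptions).

# Back-of-envelope check of the per-packet time budget.
line_rate_bps = 40e9       # nominal OC-768 rate, ~40 Gb/s
min_packet_bits = 40 * 8   # 40-byte minimum IP packet

print(min_packet_bits / line_rate_bps * 1e9, "ns")  # ~8 ns per packet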
-
41.4 Route Packets, Not Wires: On-Chip Interconnection Networks [p. 684]
-
William J. Dally, Brian Towles
Using on-chip interconnection networks in place of ad-hoc global
wiring structures the top-level wires on a chip and facilitates
modular design. With this approach, system modules (processors,
memories, peripherals, etc.) communicate by sending packets to
one another over the network. The structured network wiring
gives well-controlled electrical parameters that eliminate timing
iterations and enable the use of high-performance circuits to
reduce latency and increase bandwidth. The area overhead
required to implement an on-chip network is modest; we estimate
6.6%. This paper introduces the concept of on-chip networks,
sketches a simple network, and discusses some challenges in the
architecture and design of these networks.
Chair: Stephen Neuendorffer
Organizers: Donatella Sciuto, Marco Di Natale
-
42.1 Dynamic Management of Scratch-Pad Memory Space [p. 690]
-
M. Kandemir, J. Ramanujam, M. J. Irwin, N. Vijaykrishnan, I. Kadayif, A. Parikh
Optimizations aimed at improving the efficiency of on-chip memories
are extremely important. We propose a compiler-controlled dynamic
on-chip scratch-pad memory (SPM) management framework that uses
both loop and data transformations. Experimental results obtained using
a generic cost model indicate significant reductions in data transfer
activity between SPM and off-chip memory.
-
42.2 Clustered VLIW Architectures with Predicated Switching [p. 696]
-
Margarida F. Jacome, Gustavo de Veciana, Satish Pillai
In order to meet the high throughput requirements of applications
exhibiting high ILP, VLIW ASIPs may increasingly
include large numbers of functional units (FUs). Unfortunately,
'switching' data through register files shared by large
numbers of FUs quickly becomes a dominant cost/performance
factor, suggesting that clustering smaller numbers of
FUs around local register files may be beneficial even if data
transfers are required among clusters. With such machines
in mind, we propose a compiler transformation, predicated
switching, which enables aggressive speculation while leveraging
the penalties associated with inter-cluster communication
to achieve gains in performance. Based on representative
benchmarks, we demonstrate that this novel technique
is particularly suitable for application specific clustered machines
aimed at supporting high ILP as compared to state-of-the-art approaches.
-
42.3 High-Quality Operation Binding for Clustered VLIW Datapaths [p. 702]
-
Viktor S. Lapinskii, Margarida F. Jacome, Gustavo A. de Veciana
Clustering is an effective method to increase the available parallelism
in VLIW datapaths without incurring severe penalties associated
with a large number of register file ports. Efficient utilization
of a clustered datapath requires careful binding of operations to
clusters. The paper proposes a binding algorithm that effectively
explores tradeoffs between in-cluster operation serialization and
delays associated with data transfers between clusters. Extensive
experimental evidence is provided showing that the algorithm generates
high quality solutions for basic blocks, with up to 29% improvement
over a state-of-the-art advanced binding algorithm.
-
42.4 Fast Bit-True Simulation [p. 708]
-
Holger Keding, Martin Coors, Olaf Lüthje, Heinrich Meyr
This paper presents a design environment which enables fast simulation
of fixed-point signal processing algorithms. In contrast to existing
approaches which use C/C++ libraries for the emulation of generic fixed-point
data types, this novel approach additionally permits a code transformation
to integral data types for fast simulation of the bit-true behavior. A speedup
by a factor of 20 to 400 can be achieved compared to library based simulation.
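The transformation can be pictured with the hedged Python sketch below (the tool itself targets C/C++, not Python; the word width and Q format are illustrative choices): a Q-format value is carried in a native integer, so a fixed-point multiply becomes an integer multiply, a shift, and a wrap.

# Sketch of the bit-true idea: emulate a fixed-point multiply with
# native integer arithmetic instead of a generic fixed-point class.
FRAC_BITS = 8    # illustrative Q-format fractional bits
WORD_BITS = 16   # illustrative word width

def to_fixed(x):
    return int(round(x * (1 << FRAC_BITS)))

def wrap(v):
    """Keep WORD_BITS bits, two's-complement style."""
    v &= (1 << WORD_BITS) - 1
    return v - (1 << WORD_BITS) if v >= 1 << (WORD_BITS - 1) else v

def fixed_mul(a, b):
    # Integer multiply, then shift back into the Q-format; this single
    # expression replaces a slow generic fixed-point library call.
    return wrap((a * b) >> FRAC_BITS)

a, b = to_fixed(1.5), to_fixed(-0.75)
print(fixed_mul(a, b) / (1 << FRAC_BITS))  # -1.125, bit-true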
Chair: Sachin S. Sapatnekar
Organizers: Jaijeet Roychowdhury, Mustafa Celik
-
43.1 Timing Analysis with Crosstalk as Fixpoints on Complete Lattice [p. 714]
-
Hai Zhou, Narendra Shenoy, William Nicholls
Increasing delay variation due to crosstalk has a dramatic
impact on deep sub-micron technologies. It is now necessary
to include crosstalk in timing analysis. But timing analysis
with crosstalk is a chicken-and-egg problem, since the crosstalk
effect in turn depends on the timing behavior of the circuit. In
this paper, we establish a theoretical foundation for timing
analysis with crosstalk. We show that solutions to the problem
are fixpoints on a complete lattice. Based on that, we
prove in general the convergence of any iterative approach.
We also show that, starting from different initial solutions,
an iterative approach will reach different fixpoints. The
current prevailing practice, which starts from the worst case
solution, will always reach the greatest fixpoint (which is
the loosest solution). In order to reach the least fixpoint, we
need to start from the best case solution. Based on chaotic
iteration and heterogeneous structures of coupled circuits,
we also design techniques to speed up iterations.
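Schematically, the iterative approach looks like the sketch below: delays determine switching windows, window overlaps determine coupling penalties, and the loop repeats until a fixpoint. This is an invented two-net toy, not the paper's algorithm; per the result above, seeding with the worst case converges to the greatest (loosest) fixpoint.

# Schematic fixpoint iteration for timing with crosstalk (toy example).
base_delay = {"a": 1.0, "b": 1.2}
penalty = {"a": 0.5, "b": 0.4}      # extra delay if windows overlap
aggressor_of = {"a": "b", "b": "a"}

def step(delays):
    # Switching window of each net, simplified to (0, arrival time).
    win = {n: (0.0, delays[n]) for n in delays}
    new = {}
    for n in delays:
        lo1, hi1 = win[n]
        lo2, hi2 = win[aggressor_of[n]]
        overlap = lo1 < hi2 and lo2 < hi1
        new[n] = base_delay[n] + (penalty[n] if overlap else 0.0)
    return new

# Start from the worst case: every aggressor is assumed to couple.
delays = {n: base_delay[n] + penalty[n] for n in base_delay}
while (nxt := step(delays)) != delays:
    delays = nxt
print(delays)  # the greatest fixpoint for this toy circuit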
-
43.2 Driver Modeling and Alignment for Worst-Case Delay Noise [p. 720]
-
Supamas Sirichotiyakul, David Blaauw, Chanhee Oh, Rafi Levy, Vladimir Zolotov,
Jingyan Zuo
In this paper, we present a new approach to model the impact of
cross-coupling noise on interconnect delay. We introduce a new
linear driver model that accurately models the noise pulse induced
on a switching signal net due to cross coupling capacitance. The
proposed model effectively captures the non-linear behavior of the
victim driver gate during the transition and has an average error
below 8% whereas the traditional approach using a Thevenin
model incurs an average error of 48%. We also discuss the worst
case alignment of the aggressor net transitions with respect to the
victim net transition, emphasizing the need to maximize not
merely the delay of the interconnect alone but the combined delay
of the interconnect and receiver gate. We show that the worst case
alignment of an aggressor net transition is a function of the
receiver gate output loading, victim transition edge rate, and the
noise pulse width and height and hence propose a pre-characterization
approach to efficiently predict the worst-case alignment. The
proposed methods were implemented in an industrial noise analysis
tool called ClariNet. Results on industrial designs are presented
to demonstrate the effectiveness of our approach.
-
43.3 False Coupling Interactions in Static Timing Analysis [p. 726]
-
Ravishankar Arunachalam, Ronald D. Blanton, Lawrence T. Pileggi
Neighboring line switching can contribute to a large portion of the
delay of a line for today's deep submicron designs. In order to
avoid excessive conservatism in static timing analysis, it is important
to determine if aggressor lines can potentially switch simultaneously
with the victim. In this paper, we present a comprehensive
ATPG-based approach that uses functional information to identify
valid interactions between coupled lines. Our algorithm accounts
for glitches on aggressors that can be caused by static and dynamic
hazards in the circuit. We present results on several benchmark circuits
that show the value of considering functional information to
reduce the conservatism associated with worst-case coupled line
switching assumptions during static timing analysis.
-
43.4 Coupling Delay Optimization by Temporal Decorrelation using Dual Threshold Voltage Technique [p. 732]
-
Ki-Wook Kim, Seong-Ook Jung, Prashant Saxena, C. L. Liu, Sung-Mo Kang
Coupling effect due to line-to-line capacitance is of serious concern
in timing analysis of circuits in ultra deep submicron CMOS
technology. Often coupling delay is strongly dependent on temporal
correlation of signal switching in relevant wires. Temporal
decorrelation by shifting timing window can alleviate performance
degradation induced by tight coupling. This paper presents an algorithm
for minimizing circuit delay through timing window modulation
in dual Vt technology. Experimental results on the ISCAS85
benchmark circuits indicate that the critical delay will be reduced
significantly when low Vt is applied properly.
Keywords
Coupling, Performance, Leakage power
Chair: Vivek Tiwari
Organizers: Ingrid Verbauwhede, Tadahiro Kuroda
-
44.1 Input Space Adaptive Design: A High-level Methodology for Energy and Performance Optimization [p. 738]
-
W. Wang, A. Raghunathan, G. Lakshminarayana, N. K. Jha
This paper presents a high-level design methodology, called
input space adaptive design, and new design automation
algorithms for optimizing energy consumption and performance.
An input space adaptive design exploits the well-known fact
that the quality of hardware circuits and software programs can
be significantly optimized by employing
algorithms and implementation architectures that adapt to
the input statistics. We propose a methodology for such
designs which includes identifying parts of the behavior to
be optimized, selecting appropriate input sub-spaces, transforming
the behavior, and verifying the equivalence of the
original and optimized designs. Experimental results indicate
that such designs can reduce energy consumption by
up to 70.6% (average of 55.4%), and simultaneously improve
performance by up to 85.1% (average of 58.1%), leading to
a reduction in the energy-delay product by up to 95.6% (average
of 80.7%), compared to well-optimized designs that do
not employ such techniques.
-
44.2 A2BC: Adaptive Address Bus Coding for Low Power Deep Sub-Micron Designs [p. 744]
-
Jörg Henkel, Haris Lekatsas
Due to larger buses (length, width) and deep sub-micron effects
where coupling capacitances between bus lines are of
the same order of magnitude as base capacitances, power
consumption of interconnects starts to have a significant impact
on a system's total power consumption. We present
novel address bus encoding schemes that take coupling effects
into consideration. The basis is a physical bus model
that quantifies coupling capacitances. As a result, we report
power/energy savings on the address buses of up to 56%
compared to the best known ordinary power/energy-efficient
encoding schemes. In doing so, we exceed the only approach to date
that also takes coupling effects into consideration.
Moreover, our encoding schemes do not assume any a priori
knowledge that is particular to a specific application.
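The flavor of such a physical bus cost model can be sketched as follows (the weights and the specific cost function are invented; in the deep sub-micron regime the coupling weight rivals the base weight). An encoder would pick, per transfer, the code word with the lower cost.

# Toy coupling-aware bus transition cost model.
C_BASE, C_COUPLE = 1.0, 1.0

def transition_cost(prev, cur, width=8):
    bp = [(prev >> i) & 1 for i in range(width)]
    bc = [(cur >> i) & 1 for i in range(width)]
    self_tr = sum(p != c for p, c in zip(bp, bc))
    # Adjacent lines toggling in opposite directions see a doubled
    # voltage swing across the inter-wire (coupling) capacitance.
    couple = sum(
        1 for i in range(width - 1)
        if bp[i] != bc[i] and bp[i + 1] != bc[i + 1] and bc[i] != bc[i + 1]
    )
    return C_BASE * self_tr + C_COUPLE * couple

print(transition_cost(0b01010101, 0b10101010))  # worst case: 8 + 7 = 15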
-
44.3 Coupling-Driven Bus Design for Low-Power Application-Specific Systems [p. 750]
-
Youngsoo Shin, Takayasu Sakurai
In modern embedded systems including communication and multimedia
applications, a large fraction of power is consumed during
memory access and data transfer. Thus, buses should be designed
and optimized to consume reasonable power while delivering sufficient
performance. In this paper, we address a bus ordering problem
for low-power application-specific systems. A heuristic algorithm
is proposed to determine the order in a way that effective
lateral component of capacitance is reduced, thereby reducing the
power consumed by buses. Experimental results for various examples
indicate that average power savings of 30% to 46.7%,
depending on the capacitance components, can be obtained without any
circuit overhead.
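One plausible reading of such an ordering heuristic, sketched with invented data (this is not the authors' algorithm): estimate, from an application trace, how often each pair of lines switches in opposite directions, then greedily chain lines so that cheap pairs become physical neighbors.

# Schematic greedy bus line-ordering heuristic.
from itertools import combinations

trace = [0b0011, 0b0101, 0b1110, 0b0001]  # invented 4-bit trace
WIDTH = 4

def opposite_toggles(i, j):
    cost = 0
    for prev, cur in zip(trace, trace[1:]):
        di = ((cur >> i) & 1) - ((prev >> i) & 1)
        dj = ((cur >> j) & 1) - ((prev >> j) & 1)
        cost += di * dj < 0  # the two lines moved in opposite directions
    return cost

pair_cost = {(i, j): opposite_toggles(i, j)
             for i, j in combinations(range(WIDTH), 2)}

order, rest = [0], set(range(1, WIDTH))
while rest:  # append the line that couples least with the current edge
    nxt = min(rest, key=lambda j: pair_cost[tuple(sorted((order[-1], j)))])
    order.append(nxt)
    rest.remove(nxt)
print(order)  # physical line order, e.g. [0, 2, 3, 1]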
-
44.4 Modeling and Minimization of Interconnect Energy Dissipation in Nanometer Technologies [p. 754]
-
Clark N. Taylor, Sujit Dey, Yi Zhao
As the technology sizes of semiconductor devices continue to decrease,
the effects of nanometer technologies on interconnects, such
as crosstalk glitches and timing variations, become more significant.
In this paper, we study the effect of nanometer technologies
on energy dissipation in interconnects. We propose a new
power estimation technique which considers DSM effects, resulting
in significantly more accurate energy dissipation estimates than
transition-count based methods for on-chip interconnects. We also
introduce an energy minimization technique which attempts to minimize
large voltage swings across the cross-coupling capacitances
between interconnects. Even though the number of transitions may
increase, our method yields a decrease in power consumption of up
to 50%.
-
44.5 A True Single-Phase 8-bit Adiabatic Multiplier [p. 758]
-
Suhwan Kim, Conrad H. Ziesler, Marios C. Papaefthymiou
This paper presents the design and evaluation of an 8-bit adiabatic
multiplier. Both the multiplier core and its built-in self-test logic
have been designed using a true single-phase adiabatic logic family.
Energy is supplied to the adiabatic circuitry via a sinusoidal
power-clock waveform that is generated on-chip. In HSPICE simulations
with post-layout extracted parasitics, our design functions
correctly at clock frequencies exceeding 200 MHz. The total dissipation
of the multiplier core and self-test circuitry approaches
130 pJ per operation at 200 MHz. Our 11,854-transistor chip has
been fabricated in a 0.5 µm standard CMOS process with an active
area of 0.470 mm². Correct chip operation has been validated
for operating frequencies up to 130 MHz, the limit of our experimental
setup. Measured dissipation correlates well with HSPICE simulations.
Categories and Subject Descriptors
B.7.1 [Hardware]: Integrated Circuits - Types and Design Styles;
B.8.1 [Hardware]: Performance and Reliability - Reliability, Testing,
and Fault-Tolerance; B.8.m [Hardware]: Performance and
Reliability - Miscellaneous
General Terms
Design, Measurement, Verification
Keywords
Adiabatic logic, Clock generator, CMOS, Dynamic logic, Low power,
Low energy, SCAL, SCAL-D, Single phase, Multiplier, VLSI
Chair: Ralph H. J. M. Otten
Organizers: Louis Scheffer, Patrick Groeneveld
-
45.1 TCG: A Transitive Closure Graph-Based Representation for Non-Slicing
Floorplans [p. 764]
-
Jai-Ming Lin, Yao-Wen Chang
In this paper, we propose a transitive closure graph-based representation
for general floorplans, called TCG, and show its superior
properties. TCG combines the advantages of popular representations
such as sequence pair, BSG, and B*-tree. Like sequence pair and
BSG, but unlike O-tree, B*-tree, and CBL, TCG is P-admissible.
Like B*-tree, but unlike sequence pair, BSG, O-tree, and CBL, TCG
does not need to construct additional constraint graphs for the cost
evaluation during packing, implying faster runtime. Further, TCG
supports incremental update during operations and keeps the information
of boundary modules as well as the shapes and the relative
positions of modules in the representation. More importantly, the geometric
relation among modules is transparent not only to the TCG
representation but also to its operations, facilitating the convergence
to a desired solution. All these properties make TCG an effective and
flexible representation for handling the general floorplan/placement
design problems with various constraints. Experimental results show
the promise of TCG.
-
45.2 Floorplanning with Abutment Constraints and L-Shaped/T-Shaped Blocks based on Corner Block List [p. 770]
-
Yuchun Ma, Xianlong Hong, Sheqin Dong, Yici Cai, Chung-Kuan Cheng, Jun Gu
The abutment constraint is one of the common constraints in practice,
used to favor the transmission of data between blocks. Based on the Corner Block
List (CBL), a new algorithm to deal with abutment constraints is developed
in this paper. We obtain the abutment information by scanning the intermediate
solutions represented by CBL in linear time during the simulated annealing
process, and fix the CBL in case the constraints are violated. Based on this
algorithm, a new method to deal with L-shaped/T-shaped blocks is proposed.
The shape flexibility of the soft blocks and the rotation and reflection
of L-shaped/T-shaped blocks are exploited to obtain a tight packing.
Experimental results on benchmark data demonstrate the
effectiveness of the proposed method.
-
45.3 Improved Cut Sequences for Partitioning Based Placement [p. 776]
-
Mehmet Can Yildiz, Patrick H. Madden
Recursive partitioning based placement has a long history, but there
has been little consensus on how cut sequences should be chosen.
In this paper, we present a dynamic programming approach to cut
sequence generation. If certain assumptions hold, these sequences
are optimal. After studying these optimal sequences, we observe
that an extremely simple method can be used to construct sequences
that are near optimal.
Using this method, our bisection based placement tool Feng Shui
outperforms the previously presented Capo tool by 11% on a large
benchmark. By integrating our cut sequence method into Capo,
we are able to improve performance by 5%, bringing the results of
Feng Shui and Capo closer together.
-
45.4 Timing Driven Placement using Physical Net Constraints [p. 780]
-
Bill Halpin, C. Y. Roger Chen, Naresh Sehgal
This paper presents a new timing driven placement algorithm that
explicitly meets physical net length constraints. It is the first recursive
bi-section placement (RBP) algorithm that meets precise half perimeter
bounding box constraints on critical nets. At each level of the recursive
bi-section, we use linear programming to ensure that all net constraints
are met. Our method can easily be incorporated with existing RBP
methods. We use the net constraint based placer to improve timing
results by setting and meeting constraints on timing critical nets. We
report significantly better timing results on each of the MCNC
benchmarks and achieve an average optimization exploitation of 69%
versus a previously reported 53%.
-
45.5 From Architecture to Layout: Partitioned Memory Synthesis for Embedded Systems-on-Chip [p. 784]
-
L. Benini, L. Macchiarulo, A. Macii, E. Macii, M. Poncino
We propose an integrated front-end/back-end flow for the automatic
generation of a multi-bank memory architecture for embedded
systems. The flow is based on an algorithm for the automatic
partitioning of on-chip SRAM. Starting from the dynamic
execution profile of an embedded application running on a given
processor core, we synthesize a multi-banked SRAM architecture
optimally fitted to the execution profile.
The partitioning algorithm is integrated with the physical design
phase into a complete flow that allows the back-annotation of
layout information to drive the partitioning process. Results,
collected on a set of embedded applications for the ARM processor,
have shown average energy savings around 34%.
Chair: Steven Schulz
Organizer: Georgia Marszalek
Panel Members: Greg Hinckley, Greg Spirakis, Karen Vahtra, John Darringer,
J. George Janac, Handel H. Jones
-
-
If EDA technology innovation drives return on investment, what
drives innovation? Is it tied to the semiconductor retooling cycle?
To productivity requirements? To Moore's Law? Or is there
something more fundamental that we are missing? What driving
forces could result in a $30 billion EDA industry, and what role
will innovation play? Will the industry provide the needed
breakthroughs for their customers, or seek instead the lowest
common denominator user? Will designers "roll their own" tools
or seek common solutions? How does EDA innovation track with
the semiconductor business cycle? Does a slowdown accelerate or
depress creation of new tools and new designs? Panelists will
discuss the forces at work in the EDA industry of tomorrow.
Chair: Anirudh Devgan
Organizers: Chandu Visweswariah, Kenji Yoshida
-
47.1 Built-In Self-Test for Signal Integrity [p. 792]
-
Mehrdad Nourani, Amir Attarha
Unacceptable loss of signal integrity may harm the functionality of
SoCs permanently or intermittently. We propose a systematic approach
to model and test signal integrity in deep-submicron high-speed interconnects.
Various signal integrity problems occurring on such interconnects
(e.g. crosstalk, overshoot, noise, skew, etc.) are considered in
a unified model. We also present a test methodology that uses noise
detection circuitry to detect low-integrity signals and an inexpensive
test architecture to measure and read the statistics for final observation
and analysis.
-
47.2 Analysis of On-Chip Inductance Effects using a Novel Performance Optimization Methodology for Distributed RLC Interconnects [p. 798]
-
Kaustav Banerjee, Amit Mehrotra
This work presents a new and computationally efficient performance
optimization technique for distributed RLC interconnects based on a rigorous
delay computation scheme. The new optimization technique has been
employed to analyze the impact of line inductance on the circuit behaviour
and to illustrate the implications of technology scaling on wire inductance.
It is shown that reduction in the driver capacitance and output resistance
with scaling makes deep submicron (DSM) designs increasingly susceptible
to inductance effects. Also, the impact of inductance variations on
performance has been quantified. Additionally, the impact of the wire inductance
on catastrophic logic failures and IC reliability issues has been
analyzed.
-
47.3 Modeling and Analysis of Differential Signaling for Minimizing Inductive Cross-Talk [p. 804]
-
Yehia Massoud, Jamil Kawa, Don MacMillen, Jacob White
Many physical synthesis tools interdigitate signal and power
lines to reduce cross-talk and thus improve signal integrity
and timing predictability. Such approaches are extremely
effective at reducing cross-talk at circuit speeds where inductive
effects are inconsequential. In this paper, we use
a detailed distributed RLC model to show that inductive
cross-talk effects are substantial in long busses associated
with 0.18 micron technology. Simulation experiments are
then used to demonstrate that cross-talk in such high speed
technologies is much better controlled by re-deploying inter-digitated
power lines to perform differential signaling.
Chair: Derek Beatty
Organizers: Carl Pixley, Shin-ichi Minato
-
48.1 Automated Pipeline Design [p. 810]
-
Daniel Kroening, Wolfgang J. Paul
The interlock and forwarding logic is considered the tricky part
of a fully-featured pipelined microprocessor and especially debugging
these parts delays the hardware design process considerably.
It is therefore desirable to automate the design of both interlock
and forwarding logic. The hardware design engineer begins with
a sequential implementation without any interlock and forwarding
logic. A tool then adds the forwarding and interlock logic required
for pipelining. This paper describes the algorithm for such a tool,
and its correctness is formally verified. We use a standard DLX
RISC processor as an example.
Keywords
Pipeline, Forwarding, Interlock
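The decisions such a tool must synthesize can be illustrated with a toy hazard-detection routine for a DLX-like five-stage pipeline; this is a hand-written illustration, not the paper's formally verified construction, and the register names are invented.

# Toy forwarding/interlock decision for an instruction in decode.

def control(decode_srcs, ex_dest, mem_dest, ex_is_load):
    """Return (stall, forwards) for the instruction in decode."""
    forwards = {}
    for src in decode_srcs:
        if src == ex_dest:
            if ex_is_load:
                return True, {}       # load-use hazard: interlock (stall)
            forwards[src] = "EX/MEM"  # forward the ALU result
        elif src == mem_dest:
            forwards[src] = "MEM/WB"  # forward the memory-stage result
    return False, forwards

# Consumer in decode reads r3 (being computed in EX) and r5 (in MEM):
print(control(("r3", "r5"), ex_dest="r3", mem_dest="r5", ex_is_load=False))
# -> (False, {'r3': 'EX/MEM', 'r5': 'MEM/WB'})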
-
48.2 A New Verification Methodology for Complex Pipeline Behavior [p. 816]
-
Kazuyoshi Kohno, Nobu Matsumoto
A new test program generation tool, mVpGen, is developed for
verifying pipeline design of microprocessors. The only inputs
mVpGen requires are pipeline-behavior specifications; it first
generates test cases automatically from these specifications
and then generates test programs corresponding
to the test cases.
Test programs for verifying complex pipeline behavior, such
as hazard and branch or hazard and exception, are generated.
mVpGen has been integrated into a verification system for verifying
RTL descriptions of a real microprocessor design, and complex
bugs that remained hidden in the RTL descriptions were detected.
-
48.3 Pre-silicon Verification of the Alpha 21364 Microprocessor Error Handling System [p. 822]
-
Richard Lee, Benjamin Tsien
This paper presents the strategy used to verify the error logic in
the Alpha 21364 microprocessor. Traditional pre-silicon
strategies of focused testing or unit-level random testing yield
limited results in finding complex bugs in the error handling
logic of a microprocessor. This paper introduces a technique to
simulate error conditions and their recovery in a global
environment using random test stimulus closely approximating
traffic found in a real system. A significant number of bugs were
found using this technique. A majority of these bugs could not be
uncovered using a simple random environment, or were counterintuitive
to focused test design.
Chair: Donatella Sciuto
Organizers: Diederik Verkest, Luciano Lavagno
-
49.1 Energy Efficient Fixed-Priority Scheduling for Real-Time Systems on Variable Voltage Processors [p. 828]
-
Gang Quan, Xiaobo (Sharon) Hu
Energy consumption has become an increasingly important consideration
in designing many real-time embedded systems. Variable
voltage processors, if used properly, can dramatically reduce
such system energy consumption. In this paper, we present a technique
to determine voltage settings for a variable voltage processor
that utilizes a fixed priority assignment to schedule jobs. Our approach
also produces the minimum constant voltage needed to feasibly
schedule the entire job set. Our algorithms lead to significant
energy saving compared with previously presented approaches.
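The paper derives the minimum constant voltage analytically; the sketch below instead finds the minimum constant speed by bisection over a simulated fixed-priority schedule, which conveys the same notion on an invented job set (speed stands in for voltage, since frequency scales roughly with supply voltage).

# Hedged sketch: minimum constant speed for a fixed-priority job set.
jobs = [(0.0, 4.0, 2.0),    # (release, deadline, cycles), priority order
        (0.0, 8.0, 2.0),
        (2.0, 10.0, 3.0)]

def feasible(speed):
    remaining = [c / speed for _, _, c in jobs]  # execution times
    t, done = 0.0, [None] * len(jobs)
    while any(d is None for d in done):
        ready = [i for i, (r, _, _) in enumerate(jobs)
                 if r <= t and done[i] is None]
        if not ready:  # idle until the next release
            t = min(r for i, (r, _, _) in enumerate(jobs) if done[i] is None)
            continue
        i = ready[0]                   # highest-priority ready job
        run = min(remaining[i], 0.01)  # small quantum allows preemption
        t += run
        remaining[i] -= run
        if remaining[i] <= 1e-9:
            done[i] = t
    return all(d <= dl for d, (_, dl, _) in zip(done, jobs))

lo, hi = 0.1, 10.0
for _ in range(40):  # bisect down to the minimum feasible speed
    mid = (lo + hi) / 2
    lo, hi = (lo, mid) if feasible(mid) else (mid, hi)
print(hi)  # ~0.7 for this job set; dynamic energy falls superlinearly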
-
49.2 Dynamic Power Management in a Mobile Multimedia System with Guaranteed Quality-of-Service [p. 834]
-
Qinru Qiu, Qing Wu, Massoud Pedram
In this paper we address the problem of dynamic power
management in a distributed multimedia system with a required
quality of service (QoS). Using a generalized stochastic Petri net
model where the non-exponential inter-arrival time distribution of
the incoming requests is captured by the "stage method", we
provide a detailed model of the power-managed multimedia
system under general QoS constraints. Based on this
mathematical model, the power-optimal policy is obtained by
solving a linear programming problem. We compare the new
problem formulation and solution technique to previous dynamic
power management techniques that can only optimize power
under delay constraints. We then demonstrate that these other
techniques yield policies with higher power dissipation by overconstraining
the delay target in an attempt to indirectly satisfy the
QoS constraints. In contrast, our new method correctly
formulates the power management problem under QoS constraints
and obtains the optimal solution.
-
49.3 Power-Aware Scheduling under Timing Constraints for Mission-Critical Embedded Systems [p. 840]
-
Jinfeng Liu, Pai H. Chou, Nader Bagherzadeh, Fadi Kurdahi
Power-aware systems are those that must make the best use of available
power. They subsume traditional low-power systems in that
they must not only minimize power when the budget is low, but
also deliver high performance when required. This paper presents a
new scheduling technique for supporting the design and evaluation
of a class of power-aware systems in mission-critical applications.
It satisfies stringent min/max timing and max power constraints. It
also makes the best effort to satisfy the min power constraint in an
attempt to fully utilize free power or to control power jitter. Experimental
results show that our scheduler can improve performance
and reduce energy cost simultaneously compared to hand-crafted
designs for previous missions. This tool forms the basis of the IMPACCT
system-level framework that will enable designers to explore
many power-performance trade-offs with confidence.
Chair: Chandu Visweswariah
Organizers: Chandu Visweswariah, Ralph H. J. M. Otten
-
50.1 Exploring SOI Device Structures and Interconnect Architectures for
3-Dimensional Integration [p. 846]
-
Rongtian Zhang, Kaushik Roy, Cheng-Kok Koh, David B. Janes
3-Dimensional (3-D) integration offers numerous advantages over
conventional structures. Double-gate (DG) transistors can be fabricated
for better device characteristics, and multiple device layers
can be vertically stacked for better interconnect performance.
In this paper, we explore the suitable device structures and interconnect
architectures for multi-device-layer integrated circuits and
study how 3-D SOI circuits can better meet the performance and
power dissipation requirements projected by ITRS for future technology
generations. Results demonstrate that DGSOI circuits can
achieve as much as 20% performance gain and 80% power-delay
product reduction compared to SGSOI (single-gate SOI). More importantly,
for interconnect-dominated circuits, multi-device-layer integration
offers significant performance improvement. Compared to 2-D
integration, most 3-D circuits can be clocked at much higher frequencies
(double or even triple). Multi-device-layer circuits, with
suitable SOI device structures, can be a viable solution for future
low power high performance applications.
-
50.2 Two-Dimensional Position Detection System with MEMS Accelerometer for
MOUSE Applications [p. 852]
-
Seungbae Lee, Gi-Joon Nam, Junseok Chae, Hanseup Kim, Alan J. Drake
A hybrid two-dimensional position sensing system is designed for
mouse applications. The system measures the acceleration of hand movements,
which is converted into two-dimensional location
coordinates. The system consists of four major components: 1)
MEMS accelerometers, 2) CMOS analog read-out circuitry, 3) an
acceleration magnitude extraction module, and 4) a 16-bit RISC
microprocessor. Mechanical and analog circuit simulation shows
that the designed padless mouse system can detect accelerations as
small as 5.3 mg and operate at up to 18 MHz.
-
50.3 Mismatch Analysis and Direct Yield Optimization by Spec-Wise Linearization
and Feasibility-Guided Search [p. 858]
-
Frank Schenkel, Michael Pronath, Stephan Zizala, Robert Schwencker, Helmut
Graeb, Kurt Antreich
We present a new method for mismatch analysis and automatic
yield optimization of analog integrated circuits with respect to
global, local and operational tolerances. Effectiveness and efficiency
of yield estimation and optimization are guaranteed by consideration
of feasibility regions and by performance linearization
at worst-case points. The proposed methods were successfully applied
to two example circuits for an industrial fabrication process.
|