ABSTRACTS DAC 96
-
SESSION 1
PANEL: THE EDA YEAR IN REVIEW: CEO'S, THE PRESS, AND USERS
Chair: John Cooley
Organizer: John Cooley
Panelists:
Joe Costello, Aart de Geus, Richard Goering, Alain Hanover, Wally Rhines,
Gary Smith
A wide variety of issues will be addressed from a customer perspective by
EDA CEO's and industry pundits. Panelists will address questions and concerns
submitted by attendees.
Chair: Gary Smith
Organizer: S. Boose
High-speed PCB design is now influenced as much by interconnect issues as by
logic design. This session opens with the presentation "Effective Interconnect
and Package Modeling Techniques", which uses a 2.4 GHz RF to IF converter as an
example. Using several model examples, the paper determines which parasitic
quantities are the important ones to capture. The panel discussion that follows
will explore the tools and technology available to address high-speed
interconnect design requirements from both the vendor and customer points of
view.
-
2.1 Package and Interconnect Modeling of the HFA3624, a 2.4 GHz RF to
IF Converter
- Mattan Kamon, Steve S. Majors
This paper demonstrates the use of three-dimensional electromagnetic field
solvers to accurately model the interconnect for a 2.4 GHz RF to IF Converter
integrated circuit under development at Harris Semiconductor. Lumped element
RLC models for the package leads, bondwires, die attach plane, and on-chip
interconnect are shown to provide accurate simulation versus measurement and
prove the importance of modeling the entire packaged IC.
-
Panel: PCB Synthesis - Is the Technology Ready for High Speed Design?
- Panelists:
Robert Gonzales, Mark Leonard, Fred Saal, John Schoenfeld, Jonathan Weis,
Mike White
Today’s technology imperative demands faster products with greater data
bandwidths and an increasing number of complex devices. Responding to these
demands while meeting today’s time to market pressures limits engineers’
tolerance for traditional design processes that depend on an unmanageable
number of iterations between design and layout. A panel of industry experts
led by Gary Smith, Senior EDA Analyst from DataQuest, will focus on the
changing issues and challenges facing engineers in the design of high speed
circuits.
Chair: Bob Frye
Organizers: S. Nance, N. Weste
Power management is a topic of growing concern in circuit and system design.
Papers in this session deal with methods of power analysis and optimization.
-
3.1 HEAT: Hierarchical Energy Analysis Tool
- Janardhan H. Satyanarayana, Keshab K. Parhi
This paper presents an algorithm for the estimation of power in static CMOS
digital circuits using a stochastic approach. The salient feature of this
approach is that it can be used to estimate the power of reasonably large
digital circuits in a very short time, due to its hierarchical nature. Here,
the given circuit is first partitioned into smaller sub-circuits. Then, the
sub-circuits are modeled using state transition diagrams (stds), and the
steady-state probabilities associated with the various states are computed by
treating them as irreducible Markov chains. Finally, the energy associated
with each sub-circuit is computed, and the total energy of the circuit is
obtained by summing up the energies of its constituent sub-circuits. In the
proposed hierarchical approach, the energies associated with various edges in
a sub-circuit are calculated only once using SPICE and these values are used
several times; this results in large savings in computation time. Another
advantage of the proposed approach is that we can accommodate switching
activities at the transistor level and not necessarily at gate or higher levels.
Experimental results show that the estimated power is in close agreement with
the actual power obtained from exhaustive SPICE simulations, but the
computation time required by the proposed approach is orders of magnitude less
than that of SPICE.
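To make the Markov-chain step concrete, here is a minimal, hypothetical sketch
(not the HEAT tool itself) of the computation the abstract describes: solve for
the steady-state distribution of each sub-circuit's state transition diagram,
weight the SPICE-characterized edge energies by how often each transition is
taken, and sum over sub-circuits. All numbers below are made up.

    import numpy as np

    def steady_state(P):
        """Stationary distribution pi of an irreducible Markov chain with
        transition matrix P (rows sum to 1): solve pi P = pi, sum(pi) = 1."""
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones(n)])
        b = np.zeros(n + 1)
        b[-1] = 1.0
        return np.linalg.lstsq(A, b, rcond=None)[0]

    def subcircuit_energy(P, edge_energy):
        """Average energy per step: sum over transitions of
        (steady-state probability of state i) * P[i, j] * (edge energy i -> j)."""
        pi = steady_state(P)
        return sum(pi[i] * P[i, j] * e for (i, j), e in edge_energy.items())

    # toy two-state sub-circuit; edge energies would come from one-time SPICE runs
    P = np.array([[0.6, 0.4],
                  [0.5, 0.5]])
    edge_energy = {(0, 1): 1.2e-12, (1, 0): 0.9e-12}  # joules, hypothetical
    total = subcircuit_energy(P, edge_energy)         # add other sub-circuits likewise
    print(total)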
-
3.2 Opportunities and Obstacles in Low-power System-Level CAD
- Andrew Wolfe
A case study in low-power system-level design is presented. We detail
the design of a low-power embedded system, a touchscreen interface device for a
personal computer. This device is designed to operate on excess power provided
by unused RS232 communication lines. We focus on the design and measurement
procedures used to reduce the power requirements of this system to less than
50mW. Additionally, we highlight opportunities to use system-level design and
analysis tools for low-power design and the obstacles that prevented using such
tools in this design.
-
3.3 POSE: Power Optimization and Synthesis Environment
- Sasan Iman, Massoud Pedram
Recent trends in the semiconductor industry have resulted in an increasing
demand for low power circuits. POSE is a step in providing the EDA community
and academia with an environment and tool suite for automatic synthesis and
optimization of low power circuits. POSE provides a unified framework for
specifying and maintaining power relevant circuit information and means of
estimating power consumption of a circuit using different load models. POSE
also gives a set of options for making area-power trade-offs during logic
optimization.
-
3.4 Early Power Exploration - A World Wide Web Application
- David B. Lidsky, Jan M. Rabaey
Exploration at the earliest stages of the design process is an integral
component of effective low-power design. Nevertheless, superficial high-level
analyses with insufficient accuracy are routinely performed. Critical
drawbacks of current high-level design aids include a limited scope of
application, inaccuracy of estimation, inaccessibility, and steep learning
curves. This paper introduces an approach to alleviate these limitations,
thus enabling more effective high-level design exploration. A World Wide
Web (WWW) based prototype called PowerPlay, which encapsulates and enhances
these techniques, is presented.
Chair: Hiroto Yasuura
Organizers: R. Walker, K. Wakabayashi
As high-level synthesis matures, a number of exciting new directions are
beginning to emerge. This session, the first of three sessions on high-level
synthesis, begins with a brief tutorial on current issues in the field. It then
presents current work on synthesizing iteration-intensive computations,
on generating code for DSP architectures, on analyzing timing tradeoffs,
and on optimizing high-level HDL descriptions using don't-care information.
-
4.1 Tutorial: Behavioral Synthesis
- Raul Camposano
Since the 1988 tutorial on behavioral (high-level) synthesis which appeared in
the DAC [1] proceedings (and later in the proceedings of the IEEE [2]) much has
happened in the field. Behavioral synthesis is now considered mainstream
EDA as evidenced by the number of articles at conferences, journals and books.
Maybe even more significant, several products are being offered commercially,
and the number of designers using behavioral synthesis is probably in the
hundreds. The initial promise of behavioral synthesis, to dramatically increase
productivity by elevating the level of design, has been fulfilled. The increase
in productivity of behavioral design versus RTL design is typically quoted at 5
times. What came as a surprise to most researchers in the field is that this is
achieved without impacting the quality of results (area, timing)! If anything,
behavioral designs are slightly smaller and faster than their RTL counterparts,
mainly because much more architectural exploration can be afforded at the
behavioral level, thus providing a better starting point for synthesis.
-
4.2 A Register File and Scheduling Model for Application Specific Processor
Synthesis
- Ehat Ercanli, C. Papachristou
In this paper, we outline the general design steps of our synthesis tool for
realizing application-specific co-processors: for a given scientific application
with intensive iterative computations, especially those involving recurrences, a
VLIW-type co-processor is synthesized and realized, and an accompanying parallel
code is generated. We introduce a novel register file model, the Shifting Register
File (SRF), based on cyclic regularity of register file accesses; and a simple
method, Expansion Scheduling, for scheduling iterative computations, which is
based on cyclic regularity of loops. We also present a variable-register file
allocation method and show how simple logic units can be used to activate
proper registers at run time through an example.
-
4.3 Optimized Code Generation of Multiplication-Free Linear Transforms
- Mahesh Mehendale, G. Venkatesh, S.D. Sherlekar
We present code generation of multiplication-free linear transforms targeted to
single-register DSP architectures such as TMS320C2x/C5x. We first present an
algorithm to generate optimized code from a DAG representation. We then present
techniques that transform a DAG so as to minimize the number of nodes and the
accumulator-spills. We then introduce a concept of spill-free DAGs and present
an algorithm for synthesizing such DAGs. The results for Walsh-Hadamard, Haar
and Slant transforms show 25% to 40% reduction in the cycle count using our
techniques.
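For readers unfamiliar with the benchmark transforms, "multiplication-free" means
every output is a signed sum of inputs, so the whole computation maps to
accumulator add/subtract instructions. A small illustrative sketch (a hypothetical
helper, not the authors' code generator) of a 4-point Walsh-Hadamard transform in
butterfly form:

    def wht4(x):
        """4-point Walsh-Hadamard transform using only additions and
        subtractions (two butterfly stages); illustrative only."""
        a0, a1 = x[0] + x[1], x[0] - x[1]
        a2, a3 = x[2] + x[3], x[2] - x[3]
        return [a0 + a2, a1 + a3, a0 - a2, a1 - a3]

    print(wht4([1, 2, 3, 4]))  # [10, -2, -4, 0]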
-
4.4 Concurrent Analysis Techniques for Data Path Timing Optimization
- Chuck Monahan, Forrest Brewer
Conventional high-level synthesis techniques create an interconnection
structure before physical design. Following physical design, connection delays
and special requirements may cause the structure to fail timing or performance
constraints. Alterations to this structure are often limited since it was
created either after or during the binding and scheduling tasks. In this paper
we present a set of techniques which analyze the timing trade-offs associated
with the position-specific interconnection network given the freedom of
high-level binding and rescheduling changes.
-
4.5 HDL Optimization Using Timed Decision Tables
- Jian Li, Rajesh K. Gupta
System-level pre-synthesis refers to the optimization of an input HDL description
to produce an optimized description suitable for subsequent synthesis
tasks. In this paper, we present optimization of control flow in behavioral HDL
descriptions using external Don't Care conditions. The optimizations are
carried out using a tabular model of system functionality, called Timed
Decision Tables or TDTs. The TDT-based optimizations presented here have been
implemented in a program called PUMPKIN. Optimization results from several
examples show a reduction of 3-88% in the size of synthesized hardware circuits
depending upon the external Don't Care information supplied by the user.
Chair: Peter Beerel
Organizers: L. Lavagno, B. Lin
This session presents efficient techniques for the design of asynchronous
circuits. The first two papers present efficient algorithms for analyzing
the timing behavior of timed circuits. The third paper presents a new heuristic
state encoding technique. The fourth paper addresses the problem of synthesizing
distributed control circuits from high-level descriptions. The fifth paper
presents a heuristic algorithm for the two-level minimization of hazard-free
circuits. The final paper is devoted to synthesizing asynchronous designs
at the transistor level.
-
5.1 Efficient Partial Enumeration for Timing Analysis of Asynchronous
Systems
- Eric Verlind, Gjalt de Jong, Bill Lin
This paper presents an efficient method for the timing verification of
concurrent systems, modeled as labeled Timed Petri nets. The verification
problems we consider require us to analyze the system's reachable behaviors
under the specified time delays. Our geometric timing analysis algorithm
improves over existing ones by enumerating the state space only partially. The
algorithm relies on a concept called premature firing and a new, extended notion
of clocks with a negative age. We have tested the fully automated procedure on
a number of examples. Experimental results obtained on highly concurrent Petri
nets with more than 6000 nodes and 10^210 reachable states show that the
proposed method can drastically reduce computational cost.
-
5.2 Verification of Asynchronous Circuits Using Time Petri-Net Unfolding
- Alexei Semenov, Alex Yakovlev
This paper describes a novel approach to timing analysis and verification of
asynchronous circuits with bounded delays. The method is based on the
time-driven unfolding of a time Petri net model of a circuit. Each reachable
state, together with its timing constraints is represented implicitly. Our
method is used to verify freedom from hazards in asynchronous circuits
consisting of micropipeline components and logic gates.
-
5.3 Methodology and Tools for State Encoding in Asynchronous Circuit
Synthesis
- Jordi Cortadella, Michael Kishinevsky, Alex Kondratyev, Luciano Lavagno,
Alex Yakovlev
This paper proposes a state encoding method for asynchronous circuits based on
the theory of regions. A region in a Transition System is a set of states that
behave uniformly with respect to a given transition (a value change of
an observable signal), and is analogous to a place in a Petri net. Regions are
tightly connected with a set of properties that must be preserved across the
state encoding process, namely: (1) trace equivalence between the original and
the encoded specification, and (2) implementability as a speed-independent
circuit. We build on a theoretical body of work that has shown the
significance of regions for such property-preserving transformations, and
describe a set of algorithms aimed at efficiently solving the encoding problem.
The algorithms have been implemented in a software tool called petrify.
Unlike many existing tools, petrify represents the encoded specification as an
STG, and thus allows the designer to be more closely involved in the synthesis
process. The efficiency of the method is demonstrated on a number of "difficult"
examples.
-
5.4 A Technique for Synthesizing Distributed Burst-Mode Circuits
- Prabhakar Kudva, Ganesh Gopalakrishnan, Hans Jacobson
We offer a technique to partition a centralized control-flow graph to obtain
distributed control in the context of asynchronous high-level synthesis. The
technique targets Huffman-style asynchronous controllers that are customized
to the problem. It solves the key problem of handling signals that are shared
between the partitions—a problem due to the incompletely specified nature of
asynchronous controllers. We report encouraging experimental results on
realistic examples.
-
5.5 Espresso-HF: A Heuristic Hazard-Free Minimizer for Two-Level Logic
- Michael Theobald, Steven M. Nowick, Tao Wu
We present a new heuristic algorithm for hazard-free minimization of two-level
logic. On nearly all examples, the algorithm finds an exactly minimum-cost
cover. It also solves several problems which have not been previously solved
using existing exact minimizers. We believe this is the first heuristic method
based on Espresso to solve the general hazard-free two-level minimization
problem, for multiple-input change transitions.
-
5.6 Synthesis of Hazard-Free Customized CMOS Complex-Gate Networks Under
Multiple-Input Changes
- Prabhakar Kudva, Ganesh Gopalakrishnan, Hans Jacobson, Steven M. Nowick
This paper addresses the problem of realizing hazard-free single-output Boolean
functions through a network of customized complex CMOS gates tailored to a
given asynchronous controller specification. A customized CMOS gate network can
either be a single CMOS gate or a multilevel network of CMOS gates. It is shown
that hazard-free requirements for such networks are less restrictive than for
simple gate networks. Analysis and efficient synthesis methods to generate such
networks under a multiple-input change assumption (MIC) will be presented.
Chair: D.F. Wong
Organizers: A. Domic, A.B. Kahng
This session opens with an invited review of the latest challenges and directions
for circuit partitioning tools. The first paper refines the classic spectral
approach via search over the objective and use of clustering information.
The second paper proposes a new methodology for generating benchmark inputs
for layout tools. The last paper gives an effective tie-breaking enhancement
for the FM algorithm.
-
6.1 Tutorial: Partitioning of VLSI Circuits and Systems
- Frank M. Johannes
Partitioning plays an increasingly important role in the design process of
VLSI circuits and systems. There are partitioning problems to be solved on all
levels of abstraction. The rapidly increasing size of the designs will make good
partitioning tools even more essential in the future. As an introduction to the
other papers in this session, this tutorial presentation discusses the numerous
facets of partitioning.
-
6.2 New Spectral Linear Placement and Clustering Approach
- Jianmin Li, John Lillis, Lung-Tien Liu, Chung-Kuan Cheng
This paper addresses the linear placement problem by using a spectral approach.
It has been demonstrated that, by giving a more accurate representation of the
linear placement problem, a linear objective function yields better placement
quality in terms of wire length than a quadratic objective function as in the
eigenvector approach [4][11][6]. On the other hand, the quadratic objective
function has an advantage in that it tends to place components more sparsely
than the linear objective function, resulting in a continuous solution closer
to a physically feasible discrete solution. In this paper, we propose an
alpha-order objective function to capture the strengths of both the linear and
quadratic objective functions. We demonstrate that our approach yields improved
spectral placements. We also present a bottom-up clustering algorithm which
iteratively collapses pairs of nodes in a graph using local and global
connectivity information, where the global connectivity information is derived
from the clustering property of the eigenvector approach. The effect of our
new spectral linear placement and clustering approach is demonstrated
on benchmark circuits from MCNC.
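For context, the linear and quadratic wire-length objectives contrasted above can
be written as weighted sums over netlist edges, and the alpha-order objective can
be read as interpolating between the two (a plausible form for illustration; the
exact formulation is given in the paper):

\[
\Phi_1(x) = \sum_{(i,j)} w_{ij}\,\lvert x_i - x_j\rvert ,\qquad
\Phi_2(x) = \sum_{(i,j)} w_{ij}\,(x_i - x_j)^2 ,\qquad
\Phi_\alpha(x) = \sum_{(i,j)} w_{ij}\,\lvert x_i - x_j\rvert^{\alpha},\quad 1 \le \alpha \le 2 .
\]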
-
6.3 Characterization and Parameterized Random Generation of Digital Circuits
- Michael Hutton, J.P. Grossman, Jonathan Rose, Derek Corneil
The development of new Field-Programmed, Mask-Programmed and Laser-Programmed
Gate Array architectures is hampered by the lack of realistic test circuits
that exercise both the architectures and their automatic placement and routing
algorithms. In this paper, we present a method and a tool for generating
parameterized and realistic random circuits. To obtain the realism, we propose
a set of graph-theoretic characteristics that describe a physical netlist, and
have built a tool that can measure these characteristics on existing circuits.
The generation tool uses the characteristics as constraints in the random
circuit generation. To validate the quality of the generated netlists,
parameters that are not specified in the generation are compared with those of
real circuits, and with those of "random" graphs.
-
6.4 A Probability-Based Approach to VLSI Circuit Partitioning
- Shantanu Dutt, Wenyong Deng
Iterative-improvement 2-way min-cut partitioning is an important phase in most
circuit partitioning tools. Most iterative improvement techniques for circuit
netlists, such as the Fiduccia-Mattheyses (FM) method, compute the gains of nodes
using local netlist information that is only concerned with the immediate
improvement in the cut-set. This can lead to misleading gain calculations.
Krishnamurthy suggested a lookahead (LA) gain calculation method to
ameliorate this situation; however, as we show, it leaves considerable room
for improvement. We present here a probabilistic gain computation approach
called PROP that is capable of capturing the global and future implications
of moving a node at the current time. Experimental results show that for the
same number of runs, PROP performs much better than FM (by about 30%) and LA
(by about 27%), and is also better than many recent state-of-the-art
clustering-based partitioners like EIG1, WINDOW, MELO and PARABOLI by 15% to
57%. We also show that the space and time complexities of PROP are very
reasonable. Our empirical timing results reveal that it is appreciably faster
than the above clustering-based techniques, and only a little slower than FM
and LA, both of which are very fast.
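For readers unfamiliar with the baseline, the FM gain of a node counts only nets
whose cut status changes immediately if that node moves, which is the "local
netlist information" the abstract refers to. A minimal sketch of that classic
gain computation (illustrative names, not PROP itself):

    def fm_gains(nets, side):
        """Classic Fiduccia-Mattheyses gain (illustrative).
        gain(n) = #nets where n is the only node on its side  (cut edge removed if n moves)
                - #nets lying entirely on n's side             (cut edge created if n moves)
        nets: list of lists of node names; side: dict node -> 0 or 1."""
        gain = {n: 0 for net in nets for n in net}
        for net in nets:
            count = [0, 0]
            for n in net:
                count[side[n]] += 1
            for n in net:
                if count[side[n]] == 1:        # n is alone on its side of this net
                    gain[n] += 1
                if count[1 - side[n]] == 0:    # net lies entirely on n's side
                    gain[n] -= 1
        return gain

    # tiny example: node a on one side, b and c on the other, two nets
    print(fm_gains([["a", "b"], ["a", "b", "c"]], {"a": 0, "b": 1, "c": 1}))
    # -> {'a': 2, 'b': 1, 'c': 0}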
Chair: James Rowson
Organizers: N. Collins, R. Goering
Verification is one of the most important design tasks. New techniques are
constantly emerging to address different portions of the broad verification
area. Following a short tutorial, panelists will discuss how some new
techniques contrast and how they fit with existing tools.
-
7.1 Tutorial: Verification of Electronic Systems
- Alberto L. Sangiovanni-Vincentelli, Patrick C. McGeer, Alexander Saldanha
The complexity of electronic systems is rapidly reaching a point where it will
be impossible to verify correctness of the design without introducing a
verification-aware discipline in the design process. Even though computers and
design tools have made important advances, the use of these tools in the
commonly practiced design methodology is not enough to address the design
correctness problem since verification is almost always an after-thought in the
mind of the designer. A design methodology should on one hand put to good use
all techniques and methods developed thus far for verification, from formal
verification to simulation, from visualization to timing analysis, but should
also have specific conceptual devices for dealing with correctness in the face
of complexity such as:
- Formalization, which consists of capturing the design
and its specification in an unambiguous, formal language with precise semantics.
- Abstraction, which eliminates details that are of no importance when checking
whether a design satisfies a particular property.
- Decomposition, which consists of breaking the design at a given level of
the hierarchy into components that can be designed and verified almost
independently.
-
Panel: Hot New Trends in Verification
- Panelists:
Anant Agarwal, Willis Hendley, Isadore Katz, Don McInnis, Patrick Scaglia,
Alex Silbey
Verification is an increasingly important design activity. As gate counts
grow larger and integrated circuit complexity increases, design challenges
are greater than ever. Traditional verification approaches - logic simulation,
specifically - are no longer effective or particularly accurate. For example, the
time and effort required to perform functional and timing simulation are making
it impractical. In addition, designers are making increasing use of existing
intellectual property, which enables design reuse and helps reduce
implementation time. Verifying existing intellectual property creates another
problem - the large number of simulation vectors required.
Chair: Scott Nance
Organizers: D. Stark, B. Frye
Power and speed considerations are critical in the design process. This
session offers new techniques and actual design examples for post-layout
timing simulation, verifying critical path delays, and designing low-voltage/
low-power circuits.
-
8.1 Design Considerations and Tools for Low-Voltage Digital System Design
- A.P. Chandrakasan, Isabel Yang, Carlin Vieri, Dimitri Antoniadis
Aggressive voltage scaling to 1V and below through technology, circuit, and
architecture optimization has been proven to be the key to ultra low-power
design. The key technology trends for low-voltage operation are presented
including low-threshold devices, multiple-threshold devices, and SOI and bulk
CMOS based variable threshold devices. The requirements on CAD tools that
allow designers to choose and optimize various technology, circuit, and system
parameters are also discussed.
-
8.2 VAMP: A VHDL Based Concept for Accurate Modeling and Post Layout
Timing Simulation of Electronic Systems
- Bernhard Wunder, Gunther Lehmann, K.D. Müller-Glaser
This paper presents a new concept for accurate modeling and timing simulation
of electronic systems integrated in a typical VHDL design environment, taking
into account the requirements of deep submicron technology. Contrary to
conventional concepts, autonomous models for gates and interconnections are
used. A piecewise-linear signal representation allows waveform-dependent effects
to be modeled. Furthermore, the gate models capture pattern dependencies, and the
interconnection models take post-layout information into account, including
branched structures, different layers, and even contact holes.
-
8.3 A Systematic Technique for Verifying Critical Path Delays in A 300
MHz Alpha CPU Design Using Circuit Simulation
- Madhav P. Desai, Y.T. Yen
A static timing verifier is an important tool in the design of a complex high
performance VLSI chip such as an Alpha CPU. A timing verifier uses a simple and
pessimistic delay model to identify critical failing paths in the design, which
then need to be fixed. However, the pessimistic delay model results in a large
number of correct paths being identified as failing paths, possibly leading to
wasted design resources. Therefore, each critical path identified by the timing
verifier needs to be analyzed using a circuit simulator such as SPICE in order
to confirm that it is a real failure. Setting up such a simulation is complex,
especially when the critical path consists of structures appearing in a
datapath of the CPU. In this paper, we present algorithms for the construction
of a model for simulating the maximum delay through a critical path. This
technique has been used to analyze several critical paths during the design of
a 300MHz Alpha CPU.
Chair: C. Papachristou
Organizers: K. Wakabayashi, R. Camposano
As synthesis migrates toward higher levels, test and fault-tolerance issues
are being considered earlier in the design process. This session presents
novel approaches to self-diagnosis, reconfiguration and test resource
estimation at a high level.
-
9.1 Tutorial: High-Level Synthesis for Testability: A Survey and Perspective
- Kenneth D. Wagner, Sujit Dey
We review behavioral and RTL test synthesis and synthesis for testability
approaches that generate easily testable implementations. We also include an
overview of high-level synthesis techniques to assist high-level ATPG.
-
9.2 Introspection: A Low Overhead Binding Technique During Self-Diagnosing
Microarchitecture Synthesis
- Balakrishnan Iyer, Ramesh Karri
Introspection, a zero-overhead binding technique used during self-diagnosing
microarchitecture synthesis, is presented. Given a scheduled control data flow
graph (CDFG), introspective binding exploits the spare computation and data
transfer capacity in a synergistic fashion to achieve low latency fault
diagnostics with near zero area overheads without compromising the performance.
The resulting on-chip fault latencies are one ten-thousandth (10^-4) of
previously reported system level diagnostic techniques. A novel feature of
the proposed technique is the use of spare data transfer capacity in the
interconnect network for diagnostics.
-
9.3 Lower Bounds on Test Resources for Scheduled Data Flow Graphs
- Ishwar Parulkar, Sandeep K. Gupta, Melvin A. Breuer
Lower bound estimations of resources at various stages of high-level synthesis
are essential to guide synthesis algorithms towards optimal solutions. In this
paper we present lower bounds on the number of test resources (i.e. test
pattern generators, signature analyzers and CBILBO registers) required to test
a synthesized data path using built-in self-test (BIST). The estimations are
performed on scheduled data flow graphs and provide a practical way of selecting
or modifying module assignments and schedules such that the resulting synthesized
data path requires a small number of test resources to test itself.
Chair: Jay Lawrence
Organizers: R. McGeer, K. Sakallah
This session covers a full range of issues in Discrete Simulation. The first
paper examines a number of issues in mixed-mode co-simulation through a
software backplane. The second paper looks at an interesting issue in 4-valued
simulation: the distinction between the unstable and unknown states. The
final two papers introduce methods to compact vector sets for power simulation.
-
10.1 Symphony: A Simulation Backplane for Parallel Mixed-Mode Co-Simulation
of VLSI Systems
- Antonio R.W. Todesco, Teresa H-Y. Meng
In this paper we present an integrated simulation paradigm in which parallel
mixed-mode co-simulation is accomplished by integrating sequential simulators
in a software simulation backplane. Distributed conservative event-driven
scheduling is used in combination with an efficient deadlock-free mechanism
for handling synchronous feedback circuits. Simulation concurrency can be
further increased by exploiting circuit pipelining. The resulting parallel
simulation backplane is capable of concurrently simulating systems at circuit,
switch, gate, RTL and behavioral levels. We implemented this parallel
mixed-mode simulator on both the iPSC/860 message-passing machine and the DASH
shared-memory multi-processor. Experimental results are presented.
-
10.2 Oscillation Control in Logic Simulation Using Dynamic Dominance
Graphs
- Peter Dahlgren
Logic-level modeling of asynchronous circuits in the presence of races
frequently gives rise to oscillation. A new method for solving oscillation
occurring in feedback loops (FLs) is presented. First, a set of graph
traversal algorithms is used to locate the FLs and order them with respect to
a dominance relation. Next, a sequence of resimulations with the feedback
vertices forced into stable states is performed. The proposed method can handle
noncritical races occurring in asynchronous circuits and has applications in
feedback bridging fault simulation.
-
10.3 Compact Vector Generation for Accurate Power Simulation
- Shi-Yu Huang, Kuang-Chien Chen, Kwang-Ting Cheng, Mike Tien-Chien Lee
Transistor-level power simulators have been popularly used to estimate the
power dissipation of a CMOS circuit. These tools strike a good balance between
the conventional transistor-level simulators, such as SPICE, and the logic-level
power estimators with regard to accuracy and speed. However, it is still too
time-consuming to run these tools for large designs. To simulate one-million
functional vectors for a 50K-gate circuit, these power simulators may take
months to complete. In this paper, we propose an approach to generate a compact
set of vectors that can mimic the transition behavior of a much larger set of
functional vectors, which is given by the designer or extracted from
application programs. This compact set of vectors can then replace the
functional vectors for power simulation to reduce the simulation time while
still retaining a high degree of accuracy. We present experimental results to
show the efficiency and accuracy of this approach.
-
10.4 Improving The Efficiency of Power Simulators by Input Vector Compaction
- Chi-ying Tsui, Radu Marculescu, Diana Marculescu, Massoud Pedram
Accurate power estimation is essential for low power digital CMOS circuit
design. Power dissipation is input pattern dependent. To obtain an accurate
power estimate, a large input vector set must be used which leads to very long
simulation time. One solution is to generate a compact vector set that is
representative of the original input vector set and can be simulated in a
reasonable time. In this paper, we propose an input vector compaction technique
that preserves the statistical properties of the original sequence. Experimental
results show that a compaction ratio of 100X is achieved with less than 2%
average error in the power estimates.
Chair: Michaela Guiney
Organizers: D. Ku, R.A. Rutenbar
Design environments are central to the success of any complex design project.
This session focuses on efficient inter-tool communication, strategies
to manage design disciplines and workflows, and quantitative evaluation
of the design process.
-
11.1 Efficient Communication in a Design Environment
- Idalina Videira, Paulo J.E. Verissimo, M. Helena Sarmento
This paper presents a new communication service. The novelty of the work
resides in the distributed architecture adopted which is based on communication
agents in every tool and in every host of the design environment. The
importance of the work is demonstrated by the results achieved: improved
performance, reduced network traffic and fault-tolerance to host and network
failures.
-
11.2 A Description Language for Design Process Management
- Peter R. Sutton, Stephen W. Director
A language for defining design discipline characteristics is proposed. Design
discipline characteristics such as abstraction levels, design object
classifications and decompositions, design objectives, and design methodologies
can be defined in a simple machine and human readable form. The language, DDDL,
has led to the development of a user interface for Minerva II, a design
process management tool, which, when configured with a design
discipline description, can be used to aid teams of designers in the solution
of complex multi-disciplinary design problems. DDDL and user interface examples
are presented.
-
11.3 Improved Tool and Data Selection in Task Management
- John W. Hagerman, Stephen W. Director
Task management involves task creation and execution. These are facilitated
using a task schema as exemplified in the Hercules Task Manager. Experience
with Hercules has shown the task schema to be very useful for task creation,
but less than ideal for task resolution, i.e., the selection of tool and data
resources to be used in execution. Tool/data interactions often lead to
resource selection constraints that cannot be captured using dependency
relationships in the schema. We have addressed this by adding conditions to the
task schema which use task-level meta-data to constrain resource selection.
With examples we show that conditions are useful for handling a wide variety
of real constraints.
-
11.4 Application of a Markov Model to the Measurement, Simulation, and
Diagnosis of an Iterative Design Process
- Eric W. Johnson, Luis Castillo, Jay B. Brockman
This paper presents the use of a Markov-based model for analyzing iterative
design processes. Techniques are developed for collecting process metadata and
calibrating the model. An experiment is described that demonstrates the
utility and accuracy of the model for simulating design processes and
identifying design process bottlenecks.
-
SESSION 12
PANEL: GEARING UP FOR THE TECHNOLOGY EXPLOSION
Chair: Gary Smith
Organizer: M. Kenefick
Panelists:
Walt Davis, Glenn House, Kurt Keutzer, Jim Pena, Craig Peterson,
Lawrence Rubin, Jim Solomon
Over the next decade, it is predicted that demand for advanced electronics
equipment will create a deficit in the fab capacity, engineering talent, and
CAD tools needed for high-abstraction, large-gate-count, deep-submicron
designs. The use of compact electronics in consumer products is becoming
pervasive and to meet the demand, fabrication capacity and human resources
must be increased to enable the design and manufacture of new semiconductor
chips. A key component to this technology explosion is understanding
technology trends, where to invest, how to gear up new designs quickly and
how to re-use key design blocks to cut design time. This panel will explore
technology trends with key technologists from the user and design automation
industries.
-
SESSION 13
TUTORIAL: THE SPICE FET MODELS: PITFALLS AND PROSPECTS
(Are You An Educated Model Consumer ?)
Chair: Daniel Foty
Organizer: J. Cooley
Presenter: Daniel Foty
If you're a digital design engineer who needs to get reacquainted with SPICE
in order to do sub-micron design, then you should attend this tutorial!
The tutorial will discuss FET modeling using several popular flavors of
SPICE. Benefits and pitfalls of different SPICE flavors will be highlighted.
Chair: Gary D. Hachtel
Organizers: S. Malik, R. McGeer
This session starts off with a one hour embedded tutorial describing what
it takes to design a practical logic synthesis system. This will provide
the background needed for the subsequent papers in logic synthesis. The
second paper presents exciting new results on the binate covering problem
which has several applications in logic synthesis.
-
14.1 Tutorial: Design of a Logic Synthesis System
- Richard Rudell
Logic synthesis systems are complex systems and algorithmic research in
synthesis has become highly specialized. This creates a gap where it is often
not clear how an advance in a particular algorithm translates into a better
synthesis system. This tutorial starts by describing a set of constraints which
synthesis algorithms must satisfy to be useful. A small set of established
techniques are reviewed relative to these criteria to understand their
applicability and the potential for further research in these areas.
-
14.2 On Solving Covering Problems
- Olivier Coudert
The set covering problem and the minimum cost assignment problem (respectively
known as the unate and binate covering problems) arise throughout the logic synthesis
flow. This paper investigates the complexity and approximation ratio of two
lower bound computation algorithms from both a theoretical and practical point
of view. It also presents a new pruning technique that takes advantage of the
partitioning.
Chair: Janusz Rajski
Organizers: S. Kundu, Y. Zorian
This session presents new, more effective algorithms to generate tests that
guarantee complete diagnosis of multiple interconnect faults. Boolean satisfiability
is the basis of a new formulation and more efficient algorithms to generate
tests for path delay faults. New compaction techniques for test sequences
of sequential circuits significantly reduce test application time
and memory requirements.
-
15.1 New Complete Diagnosis Patterns for Wiring Interconnects
- Sungju Park
It is important to test the various kinds of interconnect faults between chips
on a card/module. When boundary scan design techniques are adopted, chip-to-chip
interconnect test generation and the application of test patterns are
greatly simplified. Various test generation algorithms have been developed for
interconnect faults. A new interconnect test generation algorithm is introduced.
It reduces the number of test patterns by half over present techniques. It also
guarantees the complete diagnosis of multiple interconnect faults.
-
15.2 A Satisfiability-Based Test Generation for Path Delay Faults in
Combinational Circuits
- Chih-Ang Chen, Sandeep K. Gupta
This paper describes a new formulation to generate robust tests for path delay
faults in combinational circuits based on Boolean satisfiability. Conditions
to detect a target path delay fault are represented by a Boolean formula.
Unlike the technique described in [16], which extracts the formula for each
path delay fault, the proposed formulation needs to extract the formula only
once for each circuit cone. Experimental results show tremendous time savings
in formula extraction compared to other satisfiability-based ATPG algorithms.
This also leads to low test generation time, especially for circuits that have
many paths but few outputs. The proposed formulation has also been modified to
generate other types of tests for path delay faults.
-
15.3 On Static Compaction of Test Sequences for Synchronous Sequential
Circuits
- Irith Pomeranz, Sudhakar M. Reddy
We propose three static compaction techniques for test sequences of synchronous
sequential circuits. We apply the proposed techniques to test sequences
generated for benchmark circuits by various test generation procedures. The
results show that the test sequences generated by all the test generation
procedures considered can be significantly compacted. The compacted sequences
thus have shorter test application times and smaller memory requirements. As a
by-product, the fault coverage is sometimes increased as well. More importantly,
the ability to significantly reduce the length of the test sequences indicates
that it may be possible to reduce test generation time if superfluous input
vectors are not generated.
Chair: James Spoto
Organizers: R.A. Rutenbar, J. White
Analog and mixed-signal designs continue to present serious challenges to
existing tools. This session offers new techniques for high-level analog
synthesis, behavioral modeling, and performance-oriented layout.
-
16.1 An O(n) Algorithm for Transistor Stacking with Performance
Constraints
- Bulent Basaran, Rob A. Rutenbar
We describe a new constraint-driven stacking algorithm for diffusion area
minimization of CMOS circuits. It employs an Eulerian trail finding algorithm
that can satisfy analog-specific performance constraints. Our technique is
superior to other published approaches both in terms of its time complexity
and in the optimality of the stacks it produces. For a circuit with n
transistors, the time complexity is O(n). All performance constraints are
satisfied and, for a certain class of circuits, optimum stacking is
guaranteed.
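As background, diffusion-area minimization by stacking exploits the fact that
transistors sharing a diffusion node can abut when the transistors (edges of the
circuit's diffusion graph) are visited along an Eulerian trail. A minimal sketch
of plain Eulerian-trail finding (Hierholzer's method), without the paper's
analog-specific performance constraints, is shown below; names are illustrative.

    from collections import defaultdict

    def eulerian_trail(edges):
        """Hierholzer's algorithm: return a vertex sequence using every edge
        exactly once, assuming a trail exists (0 or 2 odd-degree vertices)."""
        adj = defaultdict(list)
        for i, (u, v) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))
        # start at an odd-degree vertex if there is one, else anywhere
        start = next((v for v in adj if len(adj[v]) % 2), next(iter(adj)))
        used, stack, trail = set(), [start], []
        while stack:
            v = stack[-1]
            while adj[v] and adj[v][-1][1] in used:   # skip edges consumed from the other end
                adj[v].pop()
            if adj[v]:
                w, i = adj[v].pop()
                used.add(i)
                stack.append(w)
            else:
                trail.append(stack.pop())
        return trail[::-1]

    # vertices are diffusion nodes, edges are transistors; consecutive trail
    # vertices share diffusion and can therefore abut in the stack
    print(eulerian_trail([("vdd", "a"), ("a", "out"), ("out", "vdd"), ("out", "gnd")]))
    # -> ['out', 'vdd', 'a', 'out', 'gnd']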
-
16.2 Use of Sensitivities and Generalized Substrate Models in
Mixed-Signal IC Design
- Paolo Miliozzi, Iasson Vassiliou, Edoardo Charbon, Enrico
Malavasi, Alberto Sangiovanni-Vincentelli
A novel methodology for circuit design and automatic layout generation is
proposed for a class of mixed-signal circuits in the presence of layout parasitics
and substrate induced noise. Accurate and efficient evaluation of the circuit
during design is possible by taking into account such non-idealities.
Techniques are presented to derive and use a set of constraints on substrate
noise and on the geometric instances of the layout. Verification is performed
using substrate extraction in combination with parasitic estimation techniques.
To show the suitability of the approach, a VCO for a PLL has been designed and
implemented in a 1um CMOS technology. The circuit has been optimized both at the
schematic and at the layout level for power and performance, while its
sensitivity to layout parasitics and substrate noise has been minimized.
-
16.3 RTL Emulation: The Next Leap in System Verification
- Sanjay Sawant, Paul Giordano
The worldwide electronics market is booming, fueled by the customers’
insatiable appetite for low-cost computers, connectivity and appliances packed
with high-technology features. While the sheer number of new design starts may
not be noteworthy, the new chips and systems being developed are growing more
complex at a phenomenal rate.
-
16.4 Equation-Based Behavioral Model Generation for Nonlinear Analog
Circuits
- Carsten Borchers, Lars Hedrich and Erich Barke
A fully automatic method for generating behavioral models for nonlinear analog
circuits is presented. This method is based on simplifications of the system
of nonlinear differential equations which is derived from a transistor level
netlist. Generated models include nonlinear dynamic behavior. They are composed
of symbolic equations comprising circuit parameters. Accuracy and simulation
speed-up are shown by several examples.
-
SESSION 17
PANEL: CORE-BASED DESIGN FOR SYSTEM-LEVEL ASICs - WHOSE JOB IS IT?
Chair: Lynn Watson
Organizer: R. Goldman
Panelists:
Kim Asal, Andreas Danuser, Chris King, Susan Mason, Wayne Miller,
Jim Pena, Scott Runner
Constrained both by the complexity of system-level, core-based design, and
by limited engineering resources of the ASIC/semiconductor vendors, today's
burgeoning core-based design industry must find new avenues to fulfill its
market/revenue potential. This panel explores new roles, relationships and
technologies that bridge the gap between the market/system-level expertise
of the system vendor, the process/core-based design expertise of the ASIC
vendor, and the design tool and methodology expertise of the EDA vendor.
-
SESSION 18
PANEL: A COMMON STANDARDS ROADMAP
Chair: Alain Hanover
Organizer: J. Smith
Panelists:
Rich Goldman, Andy Graham, Randolph E. Harr, Gregory W. Ledenbach,
A. Richard Newton, Robert Rozeboom, Tabuchi Kinya
The complexity of designing high-performance electronic systems demands
highly interoperable EDA design environments. Today we are living with Moore's
law, which states "the complexity of technology continues to double
every two years". And we are also diving into the world of deep sub-micron
design where traditional assumptions about gate and interconnect delays
and circuit modeling are constantly challenged.
How are the various consortia working together to define a roadmap that
will serve well into the 21st century? This panel is composed of individuals
representing organizations that share the goals of facilitating interoperability,
reducing implementation costs, and supporting standards. They will discuss
technology trends that are driving the standards processes and evaluate
the technology roadmap needed for the 21st century.
Chair: Iris Bahar
Organizers: R. McGeer, S. Malik
Logic synthesis has stabilized somewhat over the classic domains of area
and delay minimization of random logic. This leaves the problem of synthesis
of specialized circuits or use of complex subcircuits. The four papers in
this session examine separate aspects of this problem.
-
19.1 Multilevel Logic Synthesis for Arithmetic Functions
- Chien-Chung Tsai, Malgorzata Marek-Sadowska
The arithmetic functions, as a subclass of Boolean functions, have very compact
descriptions in the AND and XOR operators. Any n-bit adder is a prime example.
This paper presents a multilevel logic synthesis method which is particularly
suited for arithmetic functions and utilizes their natural representations in
the field GF(2). Algebraic factorization is performed to reduce the literal
count. A direct translation of the AND/XOR representations of arithmetic
functions into multilevel networks often results in excessive area, mainly due
to the large area cost of XOR gates. We present a process of redundancy removal
which reduces many XOR gates to single AND or OR gates without altering the
functional behavior of the network. The redundancy removal process requires
only the simulation of a small and decidable set of primary input patterns.
Preliminary results show that our method produces circuits, before and after
technology mapping, with area improvement averaging 17% when compared to
Berkeley SIS 1.2. The run time is reduced by at least 50%. The resulting
circuits also have good testability and power consumption properties.
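As a concrete illustration of the compact AND/XOR (GF(2)) representations
mentioned above, the familiar full adder already has this form:

\[
s = a \oplus b \oplus c_{in}, \qquad
c_{out} = a\,b \;\oplus\; c_{in}\,(a \oplus b),
\]

where \(\oplus\) is XOR and juxtaposition is AND; an \(n\)-bit ripple adder
chains \(n\) such cells.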
-
19.2 Synthesis by Spectral Translation Using Boolean Decision Diagrams
- Jeffery P. Hansen, Masatoshi Sekine
Many logic synthesis systems are strongly influenced by the size of the SOP
(Sum-of-Products) representation of the function being synthesized. Two-level
PLA (Programmable Logic Array) synthesis and many multi-level synthesis systems
perform poorly without a good SOP representation of the target function. In
this paper, we propose a new spectral-based algorithm using BDDs (Boolean
Decision Diagram) to transform the target function into a form that is easier
to synthesize by using a linear filter on the inputs. Using the methods
described in this paper, we were able to perform spectral translation on
circuits with many more inputs and much larger cube sets than previously
possible. This can result in a substantial decrease in delay and area for some
classes of circuits.
-
19.3 Delay Minimal Decomposition of Multiplexers in Technology Mapping
- Shashidhar Thakur, D.F. Wong, S. Krishnamoorthy
Technology mapping requires the unmapped logic network to be represented in
terms of base functions, usually two-input NORs and inverters. Technology
decomposition is the step that transforms arbitrary networks to this form.
Typically, such decomposition schemes ignore the fact that certain circuit
elements can be mapped more efficiently by treating them separately during
decomposition. Multiplexers are one such category of circuit elements. They
appear very naturally in circuits, in the form of datapath elements and as a
result of synthesis of CASE statements in HDL specifications of control logic.
Mapping them using multiplexers in technology libraries has many advantages.
In this paper, we give an algorithm for optimally decomposing multiplexers,
so as to minimize the delay of the network, and demonstrate its effectiveness
in improving the quality of mapped circuits.
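The decomposition being optimized can be pictured as building a tree of 2:1
multiplexers. A naive balanced decomposition, a simplified sketch that assumes a
power-of-two width and simultaneously arriving inputs (unlike the paper's
delay-driven algorithm), looks like this; all names are illustrative:

    def decompose_mux(data, sel):
        """Decompose an n:1 mux into a tree of 2:1 muxes (n a power of two).
        sel[0] is the most significant select bit; a balanced tree is
        delay-optimal only when all inputs arrive at the same time."""
        if len(data) == 1:
            return data[0]
        half = len(data) // 2
        lo = decompose_mux(data[:half], sel[1:])   # selected when sel[0] == 0
        hi = decompose_mux(data[half:], sel[1:])   # selected when sel[0] == 1
        return ("MUX2", sel[0], lo, hi)

    print(decompose_mux(["d0", "d1", "d2", "d3"], ["s1", "s0"]))
    # -> ('MUX2', 's1', ('MUX2', 's0', 'd0', 'd1'), ('MUX2', 's0', 'd2', 'd3'))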
-
19.4 Error Correction Based on Verification Techniques
- Shi-Yu Huang, Kuang-Chien Chen, Kwang-Ting Cheng
In this paper, we address the problem of correcting a combinational circuit
that is an incorrect implementation of a given specification. Most existing
error-correction approaches can only handle circuits with certain types of
errors. Here, we propose a general approach that can correct a circuit with
multiple errors without assuming any error model. We identify internal
equivalent pairs to narrow down the possible error locations using local BDD’s
with dynamic support. We also employ a technique called back-substitution to
correct the circuit incrementally. This approach can also be used to verify
circuit equivalence. The experimental results of correcting fully SIS-optimized
benchmark circuits with a number of injected errors will be presented.
Chair: Yervant Zorian
Organizers: S. Kundu, J. Rajski
In an era of deep-submicron technology, this session discusses area efficient
and timing driven techniques to design testable circuits.
-
20.1 Layout Driven Selecting and Chaining of Partial Scan Flip-Flops
- Chau-Shen Chen, Kuang-Hui Lin and TingTing Hwang
In an era of sub-micron technology, routing is becoming a dominant factor in
area, timing, and power consumption. In this paper, we study the problem
of selecting and chaining scan flip-flops with the objective of achieving
minimum routing area overhead. Most previous work on partial scan has put
emphasis on selecting as few scan flip-flops as possible to break all cycles in
the S-graph. However, the flip-flops that break more cycles are often the ones
that have more fanins and fanouts. The area adjacent to these nodes is often
congested in the layout. Such selections cause layout congestion and increase
the number of tracks needed to chain the scan flip-flops. We propose a
matching-based algorithm that performs the selecting and chaining of scan
flip-flops simultaneously, taking layout information into account. Experimental
results show that our approach outperforms the traditional one in final layout area.
-
20.2 Test Point Insertion: Scan Paths through Combinational Logic
- Chih-chang Lin, M. Marek-Sadowska, Kwang-Ting Cheng, Mike Tien-Chien Lee
We propose a low-overhead scan design methodology which employs a new test
point insertion technique to establish scan paths through the functional logic.
The technique re-uses the existing functional logic; as a result, the
design-for-testability (DFT) overhead on area or timing can be minimized. In
this paper we show an algorithm which considers the test point insertion for
reducing the area overhead for the full scan design. We also discuss its
application to timing-driven partial scan design.
-
20.3 Area Efficient Pipelined Pseudo-Exhaustive Testing with Retiming
- Huoy-Yu Liou, Ting-Ting Y. Lin, Chung-Kuan Cheng
Pseudo-exhaustive testing (PET) offers a simple solution to testing complex
circuits and systems [12]. However, PET suffers from long test generation times
and high area overhead for test hardware. The pipelined
pseudo-exhaustive testing (PPET) achieves fast testing time with high fault
coverage by pipelining test vectors and test responses among partitioned
circuit segments [15]. To reduce hardware overhead in PPET, a novel approach
for implementing area-efficient PPET is presented. Circuit partitioning with
retiming is used to convert designs for PPET. Experimental results show that
this approach exhibits an average of 20% area reduction over non-retimed
testable circuits. Our algorithm offers high utilization of existing flip-flops
(FFs) and provides a framework for further performance optimization.
Chair: Peter Feldmann
Organizers: J. White, A.T. Yang
The first paper introduces pole analysis via congruence transformation (PACT)
as a technique for multiport RC network reduction with the properties of
absolute stability and controllable accuracy. The second paper presents
a two-phase homotopy technique for obtaining DC operating point of large,
hard-to-solve MOS circuits. The last paper presents a preconditioned recycled
Krylov-subspace method to accelerate AC and noise simulation of linear periodically
time-varying communication circuits.
-
21.1 Stable and Efficient Reduction of Large, Multiport RC Networks by
Pole Analysis via Congruence Transformations
- Kevin J. Kerns, Andrew T. Yang
A novel technique is presented which employs Pole Analysis via Congruence
Transformations (PACT) to reduce RC networks in a well-conditioned manner.
Pole analysis is shown to be more efficient than Padé approximations
when the number of network ports is large, and congruence transformations
preserve the passivity (and thus absolute stability) of the networks. Networks
are represented by admittance matrices throughout the analysis, and this
representation simplifies interfacing the reduced networks with circuit
simulators and facilitates realization of the reduced networks using RC
elements. A prototype SPICE-in, SPICE-out network reduction CAD tool called
RCFIT is detailed, and examples are presented which demonstrate the accuracy
and efficiency of the PACT algorithm.
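For reference, the key property behind the stability claim is that a congruence
transformation preserves definiteness. Writing the multiport admittance of an RC
network as \(Y(s) = G + sC\) with \(G, C\) symmetric positive semidefinite, a
reduction by a matrix \(V\) of the form below keeps the reduced matrices
symmetric positive semidefinite, so the reduced network remains passive (a
textbook statement of the property, not the paper's full algorithm):

\[
\tilde{Y}(s) = V^{T} Y(s)\, V = V^{T} G V + s\, V^{T} C V ,
\qquad
x^{T}\!\left(V^{T} G V\right) x = (V x)^{T} G\,(V x) \ge 0 .
\]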
-
21.2 Homotopy Techniques for Obtaining a DC Solution of Large-Scale MOS
Circuits
- J.S. Roychowdhury, Robert C. Melville
A new technique for obtaining a DC operating point of large, hard-to-solve MOS
circuits is reported in this paper. Based on homotopy, the technique relies on
the provable global convergence of arc length continuation and uses a novel
method for embedding the continuation parameter into MOS devices. The new
embedding circumvents inefficiencies and numerical failures that limit the
practical applicability of previous simpler embeddings. Use of the technique
in a production environment has led to the routine solution of large, previously
hard-to-solve circuits.
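For context, continuation methods trace the solution curve of a homotopy from an
easy problem to the circuit's true DC equations. A standard textbook embedding is
shown below (the paper uses a more specialized embedding built into the MOS
device models, which this sketch does not capture):

\[
H(x, \lambda) = \lambda\, F(x) + (1 - \lambda)\,(x - x_{0}) = 0 ,
\qquad \lambda : 0 \rightarrow 1 ,
\]

where \(F(x) = 0\) are the circuit's DC equations and \(x_{0}\) is an arbitrary
starting point; the curve \(\{(x, \lambda) : H = 0\}\) is followed by steps in
arc length rather than in \(\lambda\), which is what gives the method its
provable global convergence.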
-
21.3 Efficient AC and Noise Analysis of Two-Tone RF Circuits
- Ricardo Telichevesky, Kenneth S. Kundert, Jacob White
In this paper we present a preconditioned recycled Krylov-subspace method to
accelerate a recently developed approach for ac and noise analysis of linear
periodically-varying communication circuits. Examples are given to show that
the combined method can be used to analyze switching filter frequency response,
mixer 1/f noise frequency translation, and amplifier intermodulation distortion.
In addition, it is shown that for large circuits the pre-conditioned recycled
Krylov-subspace method is up to forty times faster than the standard optimized
direct methods.
Chair: Stephan Ohr
Organizers: S. Napper, R.A. Rutenbar
Mixed signal design is extremely difficult and has not benefited much from
tool advances. After a short tutorial on analog synthesis techniques, a
panel will address the state of mixed-signal design and what is necessary
for real design problems.
-
22.1 Tutorial: Synthesis Tools for Mixed Signal ICs: Progress on Frontend
and Backend Strategies
- L. Richard Carley, Georges G.E. Gielen, Rob A. Rutenbar, Willy M.C. Sansen
Digital synthesis tools such as logic synthesis and semi-custom layout have
dramatically changed both the frontend (specification to netlist) and backend
(netlist to mask) steps of the digital IC design process. In this tutorial, we
look at the last decade’s worth of progress on analog circuit synthesis and
layout tools. We focus on the frontend and backend of analog and mixed-signal
IC design flows. The tutorial summarizes the problems for which viable
solutions are emerging, and those which are still unsolved.
-
Panel: Mixed Signal Designs: Are There Solutions Today?
- Panelists:
Ariel Cao, Georges Gielen, Felicia James, Rob A. Rutenbar, Baker P. Scott,
David Squires
While it is now possible to obtain real productivity enhancements using EDA
tools to synthesize digital logic, the same productivity is hardly available
for analog circuit designers. Under the best of circumstances, mixed
analog-digital tools can provide a rough outline for the construction of new
devices and circuits. More likely, the user of commercial tools will need
to perform a considerable amount of "tuning-and-tweaking" just to get
commercial tools to perform properly. Yet, the prospect for tool development
remains hopeful. In this combination tutorial and panel discussion, a mixture
of university researchers, tool developers, and users provides insight into the
current state of the art in EDA tools for mixed analog-and-digital design.
Chair: Rajesh Raina
Organizers: N. Weste, P. Duncan
With microprocessors reaching multi-million transistor levels of integration,
verifying that a design is functionally correct has become a critical problem.
Papers in this session explore different avenues of functional verification.
-
23.1 Code Generation and Analysis for the Functional Verification of
Microprocessors
- Anoosh Hosseini, Dimitrios Mavroidis, Pavlos Konas
A collection of code generation tools which assist designers in the functional
verification of high performance microprocessors is presented. These tools
produce interesting test cases by using a variety of code generation methods
including heuristic algorithms, constraint-solving systems, user-provided
templates, and pseudo-random selection. Run-time analysis and characterization
of the generated programs provide an evaluation of their effectiveness in
verifying a microprocessor design, and suggest improvements to the code
generation process. An environment combining the code generation tools with the
analysis tools has been developed, and it has provided excellent functional
coverage for several generations of high-performance microprocessors.
-
23.2 Innovative Verification Strategy Reduces Design Cycle Time for
High-End Sparc Processor
- Val Popescu, Bill McNamara
Superscalar processor developers are creatively leveraging best-in-class design
verification tools to meet narrow market windows. Accelerated simulation is
especially useful owing to its flexibility for verifying at many points during
the design cycle. A unique "verification backplane" makes continuous
verification at any level(s) of abstraction available to each design team
member throughout the design cycle.
-
23.3 Hardware Emulation for Functional Verification of K5
- Gopi Ganapathy, Ram Narayan, Glenn Jorden, Denzil Fernandez, Ming Wang, Jim Nishimura
The K5 (TM) microprocessor is a 4 million transistor, superscalar, X86
microprocessor. Because the K5 microprocessor is an AMD original design,
verifying compatibility with the existing X86 architecture and software is
crucial to its success in the marketplace. The X86 architecture has been
constantly evolving over several years without any published specification.
The primary mechanism for functional design verification of an X86 processor
is simulation. The ability to execute a good sample set of the X86 software
based on a model of the processor architecture before tapeout is key to
achieving very high confidence in first silicon. The Quickturn Hardware
Emulation system allows us to
map a model of the design onto hardware resources and execute it at high speeds.
In this paper we present the emulation methodology that was jointly developed
for K5 and applied successfully to meet our functional verification goals.
-
23.4 Functional Verification Methodology for the PowerPC 604(TM) Microprocessor
- James Monaco, David Holloway and Rajesh Raina
Functional (i.e., logic) verification of the current generation of complex,
super-scalar microprocessors such as the PowerPC 604(TM) microprocessor presents
significant challenges to a project’s verification participants. Simple
architectural level tests are insufficient to gain confidence in the quality of
the design. Detailed planning must be combined with a broad collection of
methods and tools to ensure that design defects are detected as early as
possible in a project's life-cycle. This paper discusses the methodology
applied to the functional verification of the PowerPC 604 microprocessor.
-
23.5 I'm Done Simulating; Now What? Verification Coverage Analysis and Correctness Checking of the DECchip 21164 Alpha Microprocessor
- Michael Kantrowitz, Lisa M. Noack
Digital's Alpha-based DECchip 21164 processor was verified extensively prior
to fabrication of silicon. This simulation-based verification effort used
implementation-directed, pseudorandom exercisers which were supplemented with
implementation-specific, hand-generated tests. Special emphasis was placed on
the tasks of checking for correct operation and functional coverage analysis.
Coverage analysis shows where testing is incomplete, under the assumption that
untested logic often contains bugs. Correctness checkers are various mechanisms
(both during and after simulation) that monitor a test to determine if it was
successful. This paper details the coverage analysis and correctness checking
techniques that were used. We show how our methodology and its implementation
were successful, and we discuss the reasons why this methodology allowed several
minor bugs to escape detection until the first prototype systems were available.
These bugs were corrected before any chips were shipped to customers.
Chair: David Knapp
Organizers: R. Camposano, R. Walker
Power optimization can be addressed in multiple ways at the high level.
This session presents several interesting approaches: glitch minimization
reports a power reduction of 20%, clocking schemes allow power savings up
to 50%, transformations in linear systems reduce power 3 to 15 times and
scheduling techniques can save up to 40% in power. Furthermore, reducing
switching activity not only minimizes energy dissipation, but maximizes
the MTF due to electromigration as well.
-
24.1 Glitch Analysis and Reduction in Register Transfer Level Power Optimization
- Anand Raghunathan, Sujit Dey, Niraj K. Jha
We present design-for-low-power techniques based on glitch reduction for
register-transfer level circuits. We analyze the generation and propagation of
glitches in both the control and data path parts of the circuit. Based on the
analysis, we develop techniques that attempt to reduce glitching power
consumption by minimizing generation and propagation of glitches in the RTL
circuit. Our techniques include restructuring multiplexer networks (to enhance
data correlations, eliminate glitchy control signals, and reduce glitches on
data signals), clocking control signals, and inserting selective rising/falling
delays. Our techniques are suited to control-flow intensive designs, where
glitches generated at control signals have a significant impact on the
circuit's power consumption, and multiplexers and registers often account for a major
portion of the total power. Application of the proposed techniques to
several examples shows significant power savings, with negligible area and
delay overheads.
-
24.2 An Effective Power Management Scheme for RTL Design Based on Multiple Clocks
- C. Papachristou, M. Spinning, M. Nourani
This paper presents an effective technique of low power design for RTL circuits
and microarchitectures. The basis of this technique is: a) to use a multiple
clocking scheme of n non-overlapping clocks, by dividing the frequency f of a
single clock into n cycles; b) to partition the circuit into disjoint modules
and assign each module to a distinct clock with frequency f/n. However, the
overall effective frequency of the circuit remains f, the single-clock frequency.
The results show that our multiple clocking scheme provides more effective power
management (power savings up to 50%) at the RTL in comparison to conventional
power management techniques based on gated clocks.
-
24.3 Power Optimization in Programmable Processors and ASIC Implementations of Linear Systems: Transformation-based Approach
- Mani Srivastava, Miodrag Potkonjak
Linear computations form an important type of computation that is widely used
in DSP and communications. We introduce two approaches for power minimization
in linear computations using transformations. First we show how unfolding
combined with the procedure for maximally fast implementation of linear
computations reduces power in single processor and multiprocessor
implementations by factors of 2.2 and 8, respectively. To accomplish this we
exploit a newly identified property of unfolding whereby as a linear system is
unfolded, the number of operations per sample at first decreases to reach a
minimum and then begins to rise. For the custom ASIC implementation even higher
improvements are achievable using the second transformational approach, which
builds upon the unfolding based strategy of the first approach. We developed a
method that combines the multiple constant multiplication (MCM) technique with
the generalized Horner’s scheme and unfolding in such a way that power is
minimized.
-
24.4 Scheduling Techniques to Enable Power Management
- José Monteiro, Srinivas Devadas, Pranav Ashar, Ashutosh Mauskar
"Shut-down" techniques are effective in reducing the power dissipation of logic
circuits. Recently, methods have been developed that identify conditions under
which the output of a module in a logic circuit is not used for a given clock
cycle. When these conditions are met, input latches for that module are
disabled, thus eliminating any switching activity and power dissipation. In
this paper, we introduce these power management techniques in behavioral
synthesis. We present a scheduling algorithm which maximizes the "shut-down"
period of execution units in a system. Given a throughput constraint and
the number of execution units available, the algorithm first schedules
operations that generate controlling signals and activates only those modules
whose result is eventually used. We present results which show that this
scheduling technique can save up to 40% in power dissipation.
-
24.5 Electromigration Reliability Enhancement Via Bus Activity Distribution
- Aurobindo Dasgupta, Ramesh Karri
Electromigration induced degradation in integrated circuits has been accelerated
by continuous scaling of device dimensions. We present a methodology for
synthesizing high-reliability and low-energy microarchitectures at the RT
level by judiciously binding and scheduling the data transfers of a control
data flow graph (CDFG) representation of the application onto the buses in the
microarchitecture. The proposed method accounts for correlations between data
transfers and the constraints on the number of buses, area and delay.
Chair: Andrew T. Yang
Organizers: J. White, A. Yang
In deep submicron designs, three-dimensional parasitics and electromagnetic
coupling are critical effects to be modeled and managed. This session offers
a cross-section of the latest numerical techniques for efficient, accurate
extraction of these effects.
-
25.1 A Sparse Image Method for BEM Capacitance Extraction
- Byron Krauter, Yu Xia, Aykut Dengi, Lawrence T. Pileggi
Boundary element methods (BEM) are often used for complex 3-D capacitance
extraction because of their efficiency, ease of data preparation, and automatic
handling of open regions. BEM capacitance extraction, however, yields a dense
set of linear equations that makes solving via direct matrix methods such as
Gaussian elimination prohibitive for large problem sizes. Although iterative,
multipole-accelerated techniques have produced dramatic improvements
in BEM capacitance extraction, accurate sparse approximations of the
electrostatic potential matrix are still desirable for the following reasons.
First, the corresponding capacitance models are sufficient for a large number
of analysis and design applications. Moreover, even when the utmost accuracy is
required, sparse approximations can be used to precondition iterative solution
methods. In this paper, we propose a definition of electrostatic potential that
can be used to formulate sparse approximations of the electrostatic potential
matrix in both uniform and multilayered planar dielectrics. Any degree of
sparsity can be obtained, and unlike conventional techniques which discard the
smallest matrix terms, these approximations are provably positive definite for
the troublesome cases with a uniform dielectric and without a groundplane.
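To make the dense system that the abstract refers to concrete, the following
toy Python sketch assembles and solves a first-order collocation (BEM-style)
system for the self-capacitance of a single square plate in a uniform
dielectric. The panel count, dimensions, and disk-equivalent self-term are
illustrative simplifications; the paper's sparse image approximations and
multilayer dielectric handling are not reproduced.

# Toy collocation (BEM-style) capacitance sketch: one square plate held at 1 V
# in a uniform dielectric.  Off-diagonal potential coefficients use the
# point-charge kernel 1/(4*pi*eps0*r); the self term approximates each panel
# as an equal-area uniformly charged disk.  Illustration only.
import numpy as np

eps0 = 8.854e-12
n, side = 10, 1e-3                     # 10 x 10 panels over a 1 mm x 1 mm plate
h = side / n
R_eq = np.sqrt(h * h / np.pi)          # equal-area disk radius for the self term

xs = (np.arange(n) + 0.5) * h          # panel centers
cx, cy = np.meshgrid(xs, xs)
pts = np.column_stack([cx.ravel(), cy.ravel()])
m = len(pts)

P = np.empty((m, m))                   # dense potential-coefficient matrix
for i in range(m):
    r = np.linalg.norm(pts - pts[i], axis=1)
    r[i] = np.inf                      # placeholder; self term set below
    P[i] = 1.0 / (4.0 * np.pi * eps0 * r)
    P[i, i] = 1.0 / (2.0 * np.pi * eps0 * R_eq)   # disk self term: V = q/(2*pi*eps0*R)

q = np.linalg.solve(P, np.ones(m))     # panel charges that put the plate at 1 V
print("estimated self-capacitance: %.3g F" % q.sum())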
-
25.2 A Parallel Precorrected FFT Based Capacitance Extraction Program
for Signal Integrity Analysis
- N.R. Aluru, V.B. Nadkarni, J. White
In order to optimize interconnect to avoid signal integrity problems, very fast
and accurate 3-D capacitance extraction is essential. Fast algorithms, such as
the multipole or precorrected Fast Fourier Transform (FFT) accelerated methods
in programs like FASTCAP, must be combined with techniques to exploit the
emerging cluster-of-workstation based parallel computers like the IBM SP2. In
this paper, we examine parallelizing the precorrected FFT algorithm for 3-D
capacitance extraction and present several algorithms for balancing workload
and reducing communication time. Results from a prototype implementation on an
eight processor IBM SP2 are presented for several test examples, and the
largest of these examples achieves nearly linear parallel speed-up.
-
25.3 Multipole Accelerated Capacitance Calculation for Structures with
Multiple Dielectrics with High Permittivity Ratios
- Johannes Tausch, Jacob White
This paper describes a new boundary integral formulation for the three-dimensional
capacitance calculation of structures with multiple dielectrics. Unlike the
existing equivalent-charge formulation, the new approach allows accurate
numerical approximations when the permittivity ratios of the dielectrics are
large. A multipole-accelerated algorithm based on this approach is described,
and numerical results of its implementation are presented.
-
25.4 Fast Parameters Extraction of General Three-Dimension Interconnects
Using Geometry Independent Measured Equation of Invariance
- Weikai Sun, Wayne Wei-Ming Dai and Wei Hong
The Measured Equation of Invariance (MEI) is a new concept in computational
electromagnetics. It has been demonstrated that the MEI technique can be used
to terminate meshes very close to the object boundary and still strictly
preserve the sparsity of the FD equations. Therefore, the final system matrix
encountered by MEI is a sparse matrix with size similar to that of integral
equation methods. However, complicated Green's functions and disagreeable
Sommerfeld integrals make the traditional MEI very difficult, if not impossible,
to apply to multilayer and multiconductor interconnects. In this
paper, we propose the Geometry Independent MEI (GIMEI), which substantially
improves the original MEI method. We use GIMEI for capacitance extraction of
general three-dimensional VLSI interconnect. Numerical results are in good
agreement with published data and with those obtained using FASTCAP [1], while
GIMEI is generally an order of magnitude faster than FASTCAP and uses
significantly less memory.
-
25.5 Efficient Full-Wave Electromagnetic Analysis Via Model-Order Reduction
of Fast Integral Transforms
- Joel R. Phillips, Eli Chiprout, David D. Ling
An efficient full-wave electromagnetic analysis tool would be useful in many
aspects of engineering design. Development of integral-equation based tools
has been hampered by the high computational complexity of dense matrix
representations and difficulty in obtaining and utilizing the frequency-domain
response. In this paper we demonstrate that an algorithm based on application
of a novel model-order reduction scheme directly to the sparse model generated
by a fast integral transform has significant advantages for frequency- and
time-domain simulation.
Chair: M. Marek-Sadowska
Organizers: A.B. Kahng, Y.-L. Lin
This session presents techniques for topological design, buffer sizing and
wire sizing targeted towards performance, power, area and skew optimization
of clock and signal routing.
-
26.1 Useful-Skew Clock Routing with Gate Sizing for Low Power Designs
- Joe G. Xi, Wayne Wei-Ming Dai
Instead of zero-skew or assuming a fixed skew bound, we seek to produce useful
skews in clock routing. This is motivated by the fact that negative skew may
allow a larger timing budget for gate sizing. We construct a useful-skew tree
(UST) such that the total clock and logic power (measured as a cost function) is
minimized. Given a required clock period and feasible gate sizes, a set of
negative and positive skew bounds are generated. The allowable skews within
these bounds and feasible gate sizes form the feasible solution space of our
problem. We use a merging segment perturbation procedure and a simulated
annealing approach to explore various tree configurations. This is complemented
by a bi-partitioning heuristic to generate appropriate connection topology and
take advantage of useful skews. Experimental results have shown 11% to 22%
total power reduction over previous methods of clock routing with zero-skew or
single fixed skew bound and separately sizing logic gates.
-
26.2 Sizing of Clock Distribution Networks for High Performance CPU Chips
- Madhav P. Desai, Radenko Cvijetic, James Jensen
In a high performance microprocessor such as Digital's 300MHz Alpha 21164, the
distribution of a high quality clock signal to all regions of the device is
achieved using a complex grid with multiple drivers. The large capacitance of
this distribution grid together with the high clock frequency results in
substantial power dissipation in the chip. In this paper, we describe
techniques to size the interconnect segments (thus reducing their capacitance)
of the distribution network while meeting certain design goals. These techniques
place no restrictions on the topology of the network being sized, and have been
successfully used on very large examples.
-
26.3 New Performance Driven Routing Techniques With Explicit Area/Delay Tradeoff
and Simultaneous Wire Sizing
- John Lillis, Chung-Kuan Cheng, Ting-Ting Y. Lin, Ching-Yen Ho
We present new algorithms for construction of performance driven Rectilinear
Steiner Trees under the Elmore delay model. Our algorithms represent a
departure from previous approaches in that we derive an explicit area/delay
trade-off curve. We achieve this goal by limiting the solution space to the
set of topologies induced by a permutation on the sinks of the net. This
constraint allows efficient identification of optimal solutions while still
providing a rich solution space. We also incorporate simultaneous wire sizing.
Our technique consistently produces topologies equaling the performance of
previous approaches with substantially less area overhead.
-
26.4 Constructing Lower and Upper Bounded Delay Routing Trees Using
Linear Programming
- Jaewon Oh, Iksoo Pyo, Massoud Pedram
This paper presents a new approach for solving the Lower and Upper Bounded
delay routing Tree (LUBT) problem using linear programming. LUBT is a Steiner
tree rooted at the source node such that delays from the source to sink nodes
lie between the given lower and upper bounds. We show that our proposed method
produces minimum cost LUBT for a given topology under a linear delay model.
Unlike recent works which control only the difference between the maximum and
the minimum source-sink delay, we construct routing trees which satisfy
distinct lower and upper bound constraints on the source-sink delays. This
formulation exploits all the flexibility that is present in low power and high
performance clock routing tree design.
-
26.5 Fast Performance-Driven Optimization for Buffered Clock Trees Based
on Lagrangian Relaxation
- Chung-Ping Chen, Yao-Wen Chang, D.F. Wong
Delay, power, skew, area, and sensitivity are the most important concerns in
current clock-tree design. We present in this paper an algorithm for
simultaneously optimizing the above objectives by sizing wires and buffers in
clock trees. Our algorithm, based on the Lagrangian relaxation method, can
optimally minimize delay, power, and area simultaneously with very low skew and
sensitivity. With linear storage overall and linear runtime per iteration, our
algorithm is extremely economical, fast, and accurate; for example, our
algorithm can solve a 6201-wire-segment clock-tree problem using about
1-minute runtime and 1.3-MB memory and still achieve pico-second precision on an
IBM RS/6000 workstation.
-
SESSION 27
TUTORIAL: HOW TO WRITE AWK AND PERL SCRIPTS TO
ENABLE YOUR EDA TOOLS TO WORK TOGETHER
Chair: Shankar Hemmady
Organizer: J. Cooley
Presenters: Robert C. Hutchins, Shankar Hemmady
This tutorial introduces the Awk, Nawk, and Perl scripting languages,
and shows how they can be used to massage a Synopsys Verilog netlist to
allow simulation using Cadence's Verilog-XL.
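As a flavor of the kind of line-oriented netlist rewriting such scripts
perform, here is a minimal sketch, written in Python purely for illustration
(the tutorial itself covers Awk, Nawk, and Perl). The two rewrite rules are
hypothetical stand-ins, not the tutorial's actual transformations.

# Hypothetical line filter illustrating netlist "massaging" between tools.
# Both rewrite rules below are invented examples, not the tutorial's scripts.
import re
import sys

def massage(line):
    # Rule 1 (hypothetical): rename a vendor-specific cell to a local library name.
    line = re.sub(r'\bVENDOR_BUF\b', 'MYLIB_BUF', line)
    # Rule 2 (hypothetical): comment out a directive the target simulator rejects.
    if line.lstrip().startswith('`unsupported_directive'):
        line = '// ' + line
    return line

if __name__ == '__main__':
    for raw in sys.stdin:              # usage: python massage.py < in.v > out.v
        sys.stdout.write(massage(raw))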
Chair: Neil Weste
Organizers: B. Frye, D. Stark
With the increasing complexity of systems, determining that a given chip
or collection of chips performs the required function has become an
increasingly acute problem. Thus, functional verification has become a critical design
activity. This session contains three papers that describe a variety of
verification techniques designed to prove that the functionality of a given
system is indeed that indicated in the original specification.
-
28.1 The Automatic Generation of Functional Test Vectors for Rambus Design
- K.D. Jones, J.P. Privitera
We present a method for the automatic generation of test vectors for functional
verification, giving the advantages of random and directed testing. We show the
use of a formal specification as input to a test generator. We present
techniques for the efficient implementation of the generator. We discuss our
experience with this method applied to commercial designs. We show how our
approach is a stepping stone towards practical formal verification.
-
28.2 Functional Verification Methodology of Chameleon Processor
- F. Casaubleih, A. Mclsaac, F. Rocheteau, M. Bartley, M. Benjamin,
M. Belhadj, J. Eggleton, F. Pogodalla, G. Barrett, G. Mas, C.Berthet
Functional verification of the new generation microprocessor developed by
SGS-THOMSON Microelectronics makes extensive use of advanced technologies. This
paper presents a global overview of the methodology and focuses on three main
aspects:
- Use of acceleration and emulation technologies for the verification
of the VHDL specification in the early stages of the design.
- Development and use of sequential verification methods built upon a commercially available formal proof tool.
- Extensive use of combinational proof for circuit-level verification, in conjunction with transistor abstraction.
-
28.3 Experience in Designing A Large-scale Multiprocessor Using
Field-Programmable Devices and Advanced CAD Tools
- S. Brown, N. Manjikian, Z. Vranesic, S. Caranci, S. Grbic, R. Grindley, M. Gusat, K. Loveless, Z. Zilic, and S. Srbljic
This paper provides a case study that shows how a demanding application
stresses the capabilities of today's CAD tools, especially in the integration
of products from multiple vendors. We relate our experiences in the design of a
large, high-speed multi-processor computer, using state of the art CAD tools.
All logic circuitry is targeted to field-programmable devices (FPDs).
This choice amplifies the difficulties associated with achieving a high-speed
design, and places extra requirements on the CAD tools. Two main CAD systems are
discussed in the paper: Cadence Logic Workbench (LWB) is employed for
board-level design, and Altera MAX+plusII is used for implementation of logic
circuits in FPDs. Each of these products is of great value for our project, but
the integration of the two is less than satisfactory. The paper describes a
custom procedure that we developed for integrating sub-designs realized in
FPDs (via MAX+plusII) into our board-level designs in LWB. We also discuss
experiences with Logic Modelling Smart Models, for simulation of FPDs and other
types of chips.
Chair: Lawrence T. Pileggi
Organizers: K. Sakallah, P. McGeer
Low power design has become a major concern for portable and mobile electronic
devices. The papers in this session propose a variety of approaches for
power estimation that aim at satisfying the twin goals of accuracy and efficiency.
-
29.1 Power Estimation of Cell-Based CMOS Circuits
- Alessandro Bogliolo, Luca Benini, Bruno Riccó
PPP is a Web-based simulation and synthesis environment for low-power design.
In this paper we describe the gate-level simulation engine of PPP, which
achieves accuracy always within 6% of SPICE while keeping performance
competitive with traditional gate-level simulation. This is done by using
advanced symbolic models of the basic library cells that exploit physical
understanding of the main power-consuming phenomena. VERILOG-XL is used as
the simulation platform to maintain compatibility with design environments. The
Web-based interface allows the user to remotely access and execute the
simulator using his/her own Web-browser (without the need of any software
installation).
-
29.2 A New Hybrid Methodology for Power Estimation
- David Ihsin Cheng, Kwang-Ting Cheng, Deborah C. Wang,
Malgorzata Marek-Sadowska
In this paper, we propose a hybrid approach for estimating the switching
activities of the internal nodes in logic circuits. The new approach combines
the advantages of the simulation-based techniques and the probability-based
techniques. We use the user-specified control sequence for simulation and
treat the weakly correlated data inputs using the probabilistic model. The new
approach, on one hand, is more accurate than the probabilistic approaches
because the strong temporal and spatial correlations among control inputs are
well taken into consideration. On the other hand, the new approach is much more
efficient than the simulation-based approaches because the weakly correlated
data inputs are not explicitly simulated. In addition, we also propose a
heuristic that builds BDDs in terms of internal nodes such that large circuits
can be handled. Extensive experimental results are presented to show the
effectiveness and efficiency of our algorithms.
-
29.3 A Statistical Approach to the Estimation of Delay-Dependent Switching Activities In CMOS Combinational Circuits
- Yong Je Lim, Kyung-Im Son, Heung-Joon Park, Mani Soma
This paper describes a new procedure for estimating the delay-dependent
switching activities in CMOS combinational circuits. The procedure is based on
analytic and statistical techniques to take advantage of their time-efficiency
over conventional logic simulators. Combinational circuits driven by
synchronized logic signals are considered as application targets and the
statistical properties of logic signals and circuit parameters are defined and
evaluated. Experimental results on benchmark circuits show the significant
time efficiency of the proposed procedure.
Chair: Gary D. Hachtel
Organizers: F. Somenzi, B. Lin
Optimization of sequential circuits is performed at both the structural
and the state transition graph level. This session presents new approaches
to several optimization tasks: state minimization, redundancy removal, and
rectification.
-
30.1 Engineering Change in a Non-Deterministic FSM Setting
- Sunil P. Khatri, Amit Narayan, Sriram C. Krishnan, Kenneth L. McMillan,
Robert K. Brayton, A. Sangiovanni-Vincentelli
We propose a new formalism for the Engineering Change (EC) problem in a finite
state machine (FSM) setting. Given an implementation that violates the
specification, the problem is to alter the behavior of the implementation so
that it meets the specification. The implementation can be a pseudo-nondeterministic
FSM while the specification may be a nondeterministic FSM. The EC problem is cast
as the existence of an appropriate simulation relation from the
implementation into the specification. We derive the necessary and sufficient
conditions for the existence of a solution to the problem. We synthesize all
possible solutions, if the EC is feasible. Our algorithm works in space which
is linear, and time which is quadratic, in the product of the sizes of
implementation and specification. Previous formulations of the problem which
admit nondeterministic specifications, although more general, lead to an
algorithm which is exponential. We have implemented our procedure using Reduced
Ordered Binary Decision Diagrams.
-
30.2 Identifying Sequential Redundancies Without Search
- Mahesh A. Iyer, David E. Long, Miron Abramovici
Previous solutions to the difficult problem of identifying sequential
redundancy are either based on incorrect theoretical results, or rely on
unrealistic simplifying assumptions, or are applicable only to small circuits.
In this paper, we show the limitations of the existing definitions of
sequential redundancy and introduce a new concept of cycle redundancy as a
generalization of the conventional notion of sequential redundancy. We present an
efficient algorithm, FIRES, to identify cycle redundancies without search.
FIRES does not assume the existence of a global reset nor does it require any
state transition information. FIRES has provably polynomial-time complexity and
is practical for large circuits. Experimental results on benchmark circuits
indicate that FIRES identifies a large number of redundancies. We show that,
in general, the redundant faults identified by FIRES are not easy targets
for state-of-the-art sequential test generators.
-
30.3 A Fast State Reduction Algorithm for Incompletely Specified Finite
State Machines
- Hiroyuki Higuchi, Yusuke Matsunaga
This paper proposes a state reduction algorithm for incompletely specified
FSMs. The algorithm is based on iterative improvements. When the number of
compatibles is likely to be too large to handle explicitly, they are
represented by a BDD. Experimental results are given to demonstrate that the
algorithm described here is faster and obtains better solutions than conventional
methods.
-
30.4 Symbolic Optimization of FSM Networks Based on Sequential ATPG
Techniques
- Fabrizio Ferrandi, Franco Fummi, Enrico Macii, Massimo Poncino,
Donatella Sciuto
This paper presents a novel optimization algorithm for FSM networks that relies
on sequential test generation and redundancy removal. The implementation of the
proposed approach, which is based on the exploitation of input don't care
sequences through regular language intersection, is fully symbolic. Experimental
results, obtained on a large set of standard benchmarks, improve over the ones
of state-of-the-art methods.
Chair: Lou Scheffer
Organizers: A.B. Kahng, A. Domic
Among the new topics introduced in this session: a new partitioning objective
for hierarchical designs, datapath layout into FPGA devices, and two theoretical
results on floorplan area minimization and performance-driven wiresizing.
-
31.1 Module Compaction in FPGA-based Regular Datapaths
- Andreas Koch
When relying on module generators to implement regular datapaths on FPGAs, the
coarse granularity of FPGA cells can lead to area and delay inefficiencies. We
present a method to alleviate these problems by compacting adjacent modules
using structure extraction, local logic synthesis, and cell replacement. The
regular datapath structure is exploited and preserved, achieving faster layouts
after shorter tool run-times.
-
31.2 Network Partitioning into Tree Hierarchies
- Ming-Ter Kuo, Lung-Tien Liu, Chung-Kuan Cheng
This paper addresses the problem of partitioning a circuit into a tree hierarchy
with an objective of minimizing a global interconnection cost. An efficient and
effective algorithm is necessary when the circuit is huge and the tree has many
levels of hierarchy. We propose a heuristic algorithm for improving a partition
with respect to a given tree structure. The algorithm utilizes the tree hierarchy
as an efficient mechanism for iterative improvement. We also extend the tree
hierarchy to apply a multi-phase partitioning approach. Experimental results
show that the algorithm significantly improves the initial partitions produced
by multiway partitioning and by recursive partitioning.
-
31.3 Efficient Approximation Algorithms for Floorplan Area Minimization
- Danny Z. Chen, Xiaobo (Sharon) Hu
Approximation has been shown to be an effective method for reducing the time and
space costs of solving various floorplan area minimization problems. In this
paper, we present several approximation techniques for solving floorplan area
minimization problems. These new techniques enable us to reduce both the time
and space complexities of the previously best known approximation algorithms by
more than a factor of n and n^2 for rectangular and L-shaped sub-floorplans,
respectively (where n is the number of given implementations). The efficiency in
the time and space complexities is critical to the applicability of such
approximation techniques in floorplan area minimization algorithms. We also give
a technique for enhancing the quality of approximation results.
-
31.4 Optimal Wire-Sizing Formula Under Elmore Delay Model
- Chung-Ping Chen, Yao-Ping Chen, D.F. Wong
In this paper, we consider non-uniform wire-sizing. Given a wire segment of
length L, let f(x) be the width of the wire at position x, 0 <= x <= L.
We show that the optimal wire-sizing function that minimizes the Elmore delay
through the wire is f(x) = a e^(-bx), where a > 0 and b > 0 are constants
that can be computed in O(1) time. In the case where a lower bound (L > 0)
and an upper bound (U > 0) on the wire widths are given, we show that the
optimal wire-sizing function f(x) is a truncated version of a e^(-bx) that
can also be determined in O(1) time. Our wire-sizing formula can be
iteratively applied to optimally size the wire segments in a routing tree.
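The closed-form taper above is easy to evaluate once a and b are known. The
sketch below simply evaluates a truncated exponential width function along a
wire; the constants, bounds, and length are illustrative placeholders, and the
paper's O(1) formulas for a and b are not reproduced here.

# Evaluate a truncated exponential wire-sizing function f(x) = a*exp(-b*x),
# clipped to given lower/upper width bounds.  Constants are placeholders.
import numpy as np

def wire_width(x, a, b, w_lo, w_hi):
    """Width at position x along the wire (x measured from the driver end)."""
    return np.clip(a * np.exp(-b * x), w_lo, w_hi)

L_wire = 1000.0                        # wire length (arbitrary units)
a, b = 3.0, 2.5e-3                     # assumed already computed
x = np.linspace(0.0, L_wire, 5)
print(wire_width(x, a, b, w_lo=0.5, w_hi=2.0))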
Chair: Takayasu Sakurai
Organizers: T. Sakurai, S. Trimberger
Recently, LSI devices have found more and more applications in high volume
consumer products. The papers in this session describe the design methods of
several consumer products, including a mini disc player, an HDTV decoder, a
digital VCR, a TV tuner and a satellite broadcasting receiver. Constraints, issues
and solutions for digital and analog design are presented using real industry
design examples.
-
32.1 VLSI Design and System Level Verification for the Mini-Disc
- Tetsuya Fujimoto, Takashi Kambe
In this paper, a new method for the design of complex multimedia ASICs is
introduced. Using this design method, VLSIs with embedded software and
high-speed emulators can be developed concurrently. The method has proven to be
effective through the actual design of a VLSI for audio compression and
decompression in a Mini-Disc system.
-
32.2 Design Methodologies for Consumer-Use Video Signal Processing LSIs
- Hisakazu Edamatsu, Satoshi Ikawa, Katsuya Hasegawa
This paper describes the methodologies used to design a Hi-Vision MUSE decoder
for Japanese HDTV and codec LSIs for digital VCRs. Since a large amount of
input video data is needed to verify the algorithms and logic designs, reducing
the verification time is a key issue in designing these LSIs. We describe
the methodology used to verify the video signal processing algorithm and that
of the physical design.
-
32.3 Design Methodology for Analog High Frequency ICs
- Yasunori Miyahara, Toshitomo Oumi, Seijiro Moriyama
This paper presents a methodology suited for high frequency analog IC design.
The use of a top-down method with AHDL for circuit designers is proposed. In
order to accelerate the re-use of circuits that were previously designed and
validated in other ICs, the authors developed a system that eases the re-use
in the top-down design environment. Moreover, a model parameter generation
technique for bipolar transistors has been developed and its usefulness has
been shown for accurate simulation of high frequency analog ICs.
-
SESSION 33
TUTORIAL: ISSUES AND ANSWERS IN CAD TOOL INTEROPERABILITY
Chair: Mike Murray
Organizer: M. Murray
Presenters:
Mike Murray, Uwe B. Meding, Bill Berg, Yatin Trivedi, Bill McCaffrey,
Ted Vucurevich
CAD tool interoperability issues are a recurring impediment to constructing a
design methodology, especially if the methodology incorporates point tools from
several vendors. Failures in data transfer between tools often arise
unexpectedly, and can delay design work until a workaround is developed. This
paper examines interoperability issues for four classes of tools: schematic
tools, simulation and synthesis tools, IC physical design tools, and workflow
management tools. This paper also describes current research in modeling
interoperability problems. Using this paper, the reader can develop a checklist
of potential interoperability issues in his CAD environment, and address these
issues before they cause a design schedule slip.
Chair: Frank Vahid
Organizers: R. Gupta, L. Lavagno
Co-design refers to joint design and optimization of hardware and software
in system design. We begin with a tutorial on tools and methodologies for
various co-design sub-problems. This is followed by a presentation of the
architectural strategies for system co-design. System partitioning, or the
determination of the boundary between hardware and software, is the subject
of the last paper in this session.
-
34.1 Tutorial: The Design of Mixed Hardware/Software Systems
- Jay K. Adams, Donald E. Thomas
Over the past several years there has been a great deal of interest in the
design of mixed hardware/software systems, sometimes referred to as
hardware/software co-design or hardware/software co-synthesis. However,
although many new design methodologies have taken the name hardware/software
co-design, they often do not seem to share much in common with one another. This
is partly due to the fact that the problem itself has so many dimensions. This
tutorial describes a set of criteria that can be used to compare differing
approaches to hardware/software co-design. These criteria are used in the
discussion of a number of published hardware/software co-design techniques to
illustrate how a wide range of approaches can be viewed within a single
framework.
-
34.2 Constructing Application-Specific Heterogeneous Embedded Architectures
from Custom HW/SW Applications
- Steven Vercauteren, Bill Lin, Hugo De Man
Deep sub-micron processing technologies have enabled the implementation of new
application-specific embedded architectures that integrate multiple software
programmable processors (e.g. DSPs, microcontrollers) and dedicated hardware
components together onto a single cost-efficient IC. These application-specific
architectures are emerging as a key design solution to today's microelectronics
design problems, which are being driven by emerging applications in the
areas of wireless communication, broadband networking, and multimedia
computing. However, the construction of these customized heterogeneous
multiprocessor architectures, while ensuring that the hardware and software
parts communicate correctly, is a tremendously difficult and highly
error-prone task with little or no tool support. In this paper, we present a
solution to this embedded architecture co-synthesis problem based on an
orchestrated combination of architectural strategies, parameterized libraries,
and software tool support.
-
34.3 A Hardware/Software Partitioning Algorithm for Designing Pipelined
ASIPs with Least Gate Counts
- Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, Nobuyuki Hikichi
This paper introduces a new HW/SW partitioning algorithm used in automating
the instruction set processor design for pipelined ASIP (Application Specific
Integrated Processor). The partitioning problem is formalized as a combinatorial
optimization problem that partitions the operations into hardware and software
so that the HW cost (gate count) of the designed pipelined ASIP is minimized
under given execution cycle and power consumption constraints. A branch-and-bound
algorithm with proposed lower bound functions is used to solve the presented
formalization in the PEAS-I system. The experimental results show that the
proposed method is effective and efficient.
Chair: Andrew T. Yang
Organizers: J. White, A.T. Yang
This session presents some of the recent work on timing/power analysis and
simulation. The first paper gives a method for calculating the time-domain
response for a finite-length distributed RC line stimulated by a ramp
input. Moment-matching approximation methods are employed for PCB delay
estimation in the second paper, and for transistor-level timing simulation
in the third paper. The last paper presents an electrothermal system to
find steady-state on-chip temperature distribution.
-
35.1 Analysis of RC Interconnections Under Ramp Input
- Andrew B. Kahng, Sudhakar Muddu
We present a general and, in the limit, exact approach to compute the
time-domain response for finite-length RC lines under ramp input, by summing
distinct diffusions starting at either end of the line. We also obtain
analytical expressions for the finite time-domain voltage response for an
open-ended finite RC line and for a finite RC line with capacitive load. Delay
estimates using our new method are very close to SPICE-computed delays. Finally,
we present a general recursive equation for computing the higher-order diffusion
components due to reflections at the source and load ends. Future work extends
our method to response computations in general interconnection trees by modeling
both reflection and transmission coefficients at discontinuities.
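For readers unfamiliar with the diffusion picture invoked above, the standard
background relations are as follows (notation ours; these are the classical
results, not the paper's specific expressions). A uniform RC line with
resistance r and capacitance c per unit length obeys the diffusion equation

    \frac{\partial v(x,t)}{\partial t} = \frac{1}{rc}\,\frac{\partial^2 v(x,t)}{\partial x^2},

and the step response of a semi-infinite line driven at x = 0 by an ideal step
of amplitude V_0 is the classical diffusion term

    v(x,t) = V_0\,\operatorname{erfc}\!\left(\frac{x}{2}\sqrt{\frac{rc}{t}}\right).

A ramp response then follows by superposition (convolution with the ramp), and
the finite-line response is built, as the abstract states, by summing such
diffusion components launched from both ends of the line.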
-
35.2 An AWE Technique for Fast Printed Circuit Board Delays
- Bernie Sheehan
We present a technique for rapidly calculating printed circuit board (PCB)
interconnect delays. Because of ringing, plateaux, and other signal integrity
effects, delay estimation with low order macromodels is more difficult for PCB
nets than for VLSI interconnect. A moment matching method for interconnect
trees is described in which moments are computed by a tree traversal algorithm
using ABCD matrices. Moment contributions from distributed transmission lines
are calculated directly. This is a simplification compared to methods which must
first approximate the line as a lumped RLC ladder; it also makes time of flight
extraction trivial. Order descent is used to ensure stability. When applied to
a real PCB design, the method was accurate and impressively fast--588 nets, or
4496 delays, in 12 seconds on a PC!
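The tree-traversal bookkeeping the abstract relies on uses chain (ABCD)
matrices for two-port segments. The Python sketch below cascades ABCD matrices
for a lumped RC ladder and evaluates its transfer into a capacitive load at one
complex frequency; the element values are illustrative, and the paper's
distributed-line moment expansions and order-descent step are not reproduced.

# Cascade ABCD (chain) matrices for a lumped RC ladder and evaluate the
# voltage transfer into a capacitive load at one complex frequency s.
# Illustration only: the paper works with distributed lines and moments.
import numpy as np

def series_Z(Z):                       # series-impedance two-port
    return np.array([[1.0, Z], [0.0, 1.0]], dtype=complex)

def shunt_Y(Y):                        # shunt-admittance two-port
    return np.array([[1.0, 0.0], [Y, 1.0]], dtype=complex)

s = 1j * 2 * np.pi * 100e6             # evaluate at 100 MHz
R, C = 5.0, 2e-12                      # per-segment values (illustrative)
seg = series_Z(R) @ shunt_Y(s * C)

abcd = np.eye(2, dtype=complex)
for _ in range(10):                    # ten identical segments in cascade
    abcd = abcd @ seg

CL = 1e-12                             # load capacitance at the far end
A, B = abcd[0, 0], abcd[0, 1]
H = 1.0 / (A + B * s * CL)             # Vout/Vin, using I2 = s*CL*Vout
print("|H| at 100 MHz =", abs(H))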
-
35.3 RC-Interconnect Macromodels for Timing Simulation
- Florentin Dartu, Bogdan Tutuianu, Lawrence T. Pileggi
Most timing simulators obtain their efficiency over circuit simulation from
explicit integration algorithms that have difficulty handling the stiff RC
circuit models which characterize interconnect-dominated paths. In this paper
we describe a reduced-order N-port interconnect macromodel for timing
simulation. This macromodel is shown to improve the timing simulation
efficiency dramatically since it alleviates the stiff circuit problem.
Moreover, through its compatibility with the simple timing simulation
transistor models, it is shown that this macromodel does not suffer from the
dramatic increase in complexity with an increase in the number of ports like
circuit simulation.
-
35.4 iCET: A Complete Chip-Level Thermal Reliability Diagnosis Tool
for CMOS VLSI Chips
- Yi-Kan Cheng, Chin-Chi Teng, Abhijit Dharchoudhury, Elyse Rosenbaum,
Sung-Mo Kang
In this paper, we present the first chip-level electrothermal simulator, iCET.
For a given chip layout, packaging material, user-specified input signal
patterns, and thermal boundary conditions, it automatically finds the CMOS on-chip
steady-state temperature profile and the resulting circuit performance. iCET
has been tested on several circuits and it can efficiently analyze layouts
containing tens of thousands of transistors on a desktop workstation.
Chair: Randal E. Bryant
Organizers: F. Somenzi, A. Kuehlmann
This session presents various approaches to verify sequential systems by
exploiting structural properties. Two papers address the verification of
pipelines; one deals with multiprocessors; and the last one is concerned
with embedded systems.
-
36.1 Techniques for Verifying Superscalar Microprocessors
- Jerry R. Burch
Burch and Dill [3] described an automatic method for verifying a pipelined
processor against its instruction set architecture (ISA). We describe three
techniques for improving this method. We show how the combination of these
techniques allows for the automatic verification of the control logic of a
pipelined, superscalar implementation of a subset of the DLX architecture.
-
36.2 A Scalable Formal Verification Methodology for Pipelined Microprocessors
- Jeremy Levitt, Kunle Olukotun
We describe a novel, formal verification technique for proving the correctness
of a pipelined microprocessor that focuses specifically on pipeline control
logic. We iteratively deconstruct a pipeline by merging adjacent pipeline
stages, allowing for the verification to be done in several easier steps. We
present an inductive proof methodology that verifies that pipeline behaviour
is preserved as the pipeline depth is reduced via deconstruction; this
inductive approach is less sensitive to pipeline depth and complexity than
previous approaches. Invariants are used to simplify the proof, and datapath
components are abstracted using validity checking with uninterpreted functions.
We present experimental results from the formal verification of a DLX
five-stage pipeline using our technique.
-
36.3 State Reduction Using Reversible Rules
- C. Norris Ip, David L. Dill
We reduce the state explosion problem in automatic verification of finite-state
systems by automatically collapsing subgraphs of the state graph into abstract
states. The key idea of the method is to identify state generation rules that
can be inverted. It can be used for verification of deadlock-freedom, error and
invariant checking and stuttering-invariant CTL model checking.
-
36.4 Formal Verification of Embedded Systems based on CFSM Networks
- Felice Balarin, Harry Hsieh, Attila Jurecska, Luciano Lavagno,
Alberto Sangiovanni-Vincentelli
Both timing and functional properties are essential to characterize the correct
behavior of an embedded system. Verification is in general performed either by
simulation, or by bread-boarding. Given the safety requirements of such systems,
a formal proof that the properties are indeed satisfied is highly desirable. In
this paper, we present a formal verification methodology for embedded systems.
The formal model for the behavior of the system used in POLIS is a network of
Codesign Finite State Machines. This model is translated into automata, and
verified using automata-theoretic techniques. An industrial embedded system is
verified using the methodology. We demonstrate that abstractions and separation
of timing and functionality are crucial for the successful use of formal
verification for this example. We also show that in POLIS abstractions and
separation of timing and functionality can be done by simple syntactic modification
of the representation of the system.
-
SESSION 37
PANEL: ELECTRONIC CONNECTIVITY + EDA DATA = ELECTRONIC COMMERCE
Chair: Sean Murphy
Organizer: S. Murphy
Panelists:
Jeff Allison, Jake Karrfalt, Michael McClure, Preston Roper,
Dennis Wilson
EDA tools and EDA users are increasingly looking to the Internet for library
information, bug fixes, and sending final design data electronically to
ASIC foundries and board houses. This panel explores the impact the Internet
has made on current electronic design practices and how you as a designer
can benefit.
Chair: Rajiv Jain
Organizers: S. Trimberger, P. Duncan
The scope of design domains targeted by high level design techniques has
grown rapidly in recent years. These papers present real-life experiences
with high level design tools in varied application areas. Find out which
techniques match your high level design problems.
-
38.1 Combined Control Flow Dominated and Data Flow Dominated High-Level
Synthesis
- E. Berrebi, P. Kission, S. Vernalde, S. De Troch, J.C. Herluison, J. Fréhel, A.A. Jerraya, I. Bolsens
This paper presents the design of a Videophone Coder-Decoder Motion Estimator
using two High-Level Synthesis tools. Indeed, the combination of a Control Flow
Dominated part (complex communication protocol) with a Data Flow Dominated part
(high throughput computations) makes this circuit difficult to synthesize with
a single HLS tool. The combination of two HLS tools for the high-level design
of this operator required the definition of a sophisticated design flow allowing
mixed-level and multi-language simulations. When compared to a design starting
from RTL specifications, HLS induces only a negligible area overhead of 5% and
provides gains in description length (divided by 5), design time, and flexibility.
-
38.2 FADIC: Architectural Synthesis Applied in IC Design
- J. Huisken, F. Welten
This paper discusses the design of a chip using architectural synthesis. The
chip, FADIC, is applied in Digital Audio Broadcasting (DAB) receivers. It shows
that architectural synthesis tools are used for the design of new complex
applications and that they support the evolutionary development of challenging
applications like DAB. It was found that the success of such tools in the
design community depends on the way user interaction is supported and
stimulated. Fast and accurate feedback from the synthesis tools, in combination
with a rich set of hints for the compiler to guide the architecture exploration, are
the key issues. It is shown that short time to market is possible for
implementations which are an order of magnitude more efficient than alternative
implementations on commercially available DSP processors.
-
38.3 Domain-Specific High-Level Modeling and Synthesis for ATM Switch
Design Using VHDL
- Mike Tien-Chien Lee, Yu-Chin Hsu, Ben Chen, Masahiro Fujita
This paper presents our experience on domain-specific high-level modeling and
synthesis for Fujitsu ATM switch design. We propose a high-level design
methodology using VHDL, where ATM switch architectural features are considered
during behavior modeling, and a high-level synthesis compiler, MEBS, is
prototyped to synthesize the behavior model down to a gate-level implementation.
Since the specific ATM switch architecture is incorporated into both modeling
and synthesis phases, a high-quality design is efficiently derived. The
synthesis results show that given the design constraints, the proposed
high-level design methodology can produce a gate-level implementation by MEBS
with about 15% area reduction in shorter design cycle when compared with manual
design.
Chair: Steve Tjiang
Organizers: L. Lavagno, R. Gupta
Efficient code generation and analysis are essential to improve the design
cycle of embedded software. The papers in this session aim at extending
the applicability of high-level languages to cost and performance-critical
systems. The first two papers deal with expression code generation and with
address register usage optimization for DSP applications. The third paper
addresses the problem of checking timing constraint satisfiability. The
last paper describes cost estimation techniques for synthesized software.
-
39.1 Using Register-Transfer Paths in Code Generation for Heterogeneous
Memory-Register Architectures
- Guido Araujo, Sharad Malik, Mike Tien-Chien Lee
In this paper we address the problem of code generation for basic blocks in
heterogeneous memory-register DSP processors. We propose a new technique,
based on register-transfer paths, that can be used for efficiently dismantling
basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach
builds on recent results which report an optimal code generation algorithm for
expression trees for these architectures. This technique has been implemented
and experimentally validated for the TMS320C25, a popular fixed point DSP
processor. The results show that good code quality can be obtained using the
proposed technique. An analysis of the type of DAGs found in the DSPstone
benchmark programs reveals that the majority of basic blocks in this benchmark
set are expression trees and leaf DAGs. This leads to our claim that tree
based algorithms, like the one described in this paper, should be the technique
of choice for basic block code generation with heterogeneous memory-register
architectures.
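One common way to dismantle a basic-block DAG into expression trees is to cut
it at multi-fanout (shared) nodes, materializing each shared value as a
temporary that becomes a leaf of the trees that use it. The sketch below shows
only this generic cutting step on a hypothetical DAG; the paper's
register-transfer-path criterion for choosing cuts, and the actual code
generation, are not reproduced.

# Cut a basic-block DAG into expression trees at multi-fanout nodes.
# Generic illustration on a hypothetical DAG; not the paper's
# register-transfer-path heuristic.
from collections import defaultdict

# node -> operand list (nodes absent from the dict are leaves such as variables)
dag = {
    't1': ['a', 'b'],                  # t1 = a op b
    't2': ['t1', 'c'],                 # t2 = t1 op c
    't3': ['t1', 'd'],                 # t3 = t1 op d  (t1 has fanout 2)
}
roots = ['t2', 't3']

fanout = defaultdict(int)
for ops in dag.values():
    for o in ops:
        fanout[o] += 1
shared = {n for n in dag if fanout[n] > 1}      # cut points become temporaries

def subtree(node):
    """Expression tree below node; shared internal nodes stay as leaf temporaries."""
    if node not in dag or node in shared:
        return node
    return (node, tuple(subtree(o) for o in dag[node]))

def tree_for(node):
    """Full tree computed for a shared temporary or a DAG root."""
    return (node, tuple(subtree(o) for o in dag[node]))

for n in sorted(shared) + roots:       # emit temporaries first, then roots
    print(n, '<-', tree_for(n))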
-
39.2 Address Calculation for Retargetable Compilation and Exploration
of Instruction-Set Architectures
- Clifford Liem, Pierre Paulin, Ahmed Jerraya
The advent of parallel executing Address Calculation Units (ACUs) in Digital
Signal Processor (DSP) and Application Specific Instruction-Set Processor (ASIP)
architectures has made a strong impact on an application’s ability to efficiently
access memories. Unfortunately, successful compiler techniques which map
high-level language data constructs to the addressing units of the architecture
have lagged far behind. Since access to data is often the most demanding task
in DSP, this mapping can be the most crucial function of the compiler. This
paper introduces a new retargetable approach and prototype tool for the
analysis of array references and traversals for efficient use of ACUs. The
ArrSyn utility is designed to be used either as an enhancement to an existing
dedicated compiler or as an aid for architecture exploration.
-
39.3 Analysis of Operation Delay and Execution Rate Constraints for
Embedded Systems
- Rajesh K. Gupta
Constraints on the delay and execution rate of operations in an embedded
application are needed to ensure its timely interaction with a reactive
environment. In this paper, we present a static analysis of timing
constraint satisfiability for a given system design consisting of interacting
hardware and software components. We use this analysis to evaluate the effect
of individual timing constraints on system design issues, such as the choice
of the software runtime system, bounds on loop invocations, and the
hardware-software synchronization operations. We show, by example, the use of
static analysis techniques in the design of embedded systems.
-
39.4 Efficient Software Performance Estimation Methods for Hardware/Software Codesign
- Kei Suzuki, Alberto Sangiovanni-Vincentelli
The performance estimation of a target system at a higher level of abstraction
is very important in hardware/software codesign. In this paper, we focus on
software performance estimation, including both the execution time and the code
size. We present two estimation methods at different levels of abstraction for
use in the POLIS hardware/software codesign system. The experimental results
show that the accuracy of our methods is usually within ±20%.
Chair: Andrzej J. Strojwas
Organizers: K. Sakallah, S. Malik
This session focuses on techniques used in high performance design. It
examines accurate and efficient techniques for gate and interconnect modeling
for high performance circuits as well as a process-tolerant clock skew timing
optimization technique.
-
40.1 An Explicit RC-Circuit Delay Approximation Based on the First Three
Moments of the Impulse Response
- Bogdan Tutuianu, Florentin Dartu, Lawrence T. Pileggi
Due to its simplicity, the ubiquitous Elmore delay, or first moment of the
impulse response, has been an extremely popular delay metric for analyzing RC
trees and meshes. Its inaccuracy has been noted, however, and it has been
demonstrated that higher order moments can be mapped to dominant pole
approximations (e.g. AWE) in the general case. The first three moments can be
mapped to a two-pole approximation, but stability is an issue, and even a
stable model results in a transcendental equation that must be iteratively
evaluated to determine the delay. In this paper we describe an explicit delay
approximation based on the first three moments of the impulse response. We
begin with the development of a provably stable two-pole transfer
function/impedance model based on the first three moments (about s=0) of the
impulse response. Then, since the model form is known, we evaluate the delay
(any waveform percentage point) in terms of an analytical approximation that is
consistently within a fraction of 1 percent of the “exact” solution for this
model. The result is an accurate, explicit delay expression that will be an
effective metric for high-speed interconnect circuit models.
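For context, the generic moment-matching step that produces a two-pole model
from low-order moments can be sketched as a Pade [1/2] fit. The Python fragment
below solves the standard 2x2 system for the denominator and checks pole
stability; the moment values are illustrative, and the paper's provably stable
construction and its explicit delay expression are different and not reproduced
here.

# Generic Pade [1/2] (two-pole) fit to the moments of an impulse response
# H(s) = m0 + m1*s + m2*s^2 + m3*s^3 + ...   Illustration only.
import numpy as np

m0, m1, m2, m3 = 1.0, -1.0e-9, 1.4e-18, -2.1e-27    # illustrative moments

# Match H(s) ~ (a0 + a1*s) / (1 + b1*s + b2*s^2) to m0..m3:
#   m2 + b1*m1 + b2*m0 = 0,   m3 + b1*m2 + b2*m1 = 0
b1, b2 = np.linalg.solve(np.array([[m1, m0], [m2, m1]]),
                         np.array([-m2, -m3]))
a0, a1 = m0, m1 + b1 * m0

poles = np.roots([b2, b1, 1.0])        # roots of b2*s^2 + b1*s + 1 = 0
print("numerator:", a0, a1)
print("poles:", poles, "stable:", bool(np.all(poles.real < 0)))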
-
40.2 Modeling the Effects of Temporal Proximity of Input Transitions
on Gate Propagation Delay and Transition Time
- V. Chandramouli, Karem A. Sakallah
While delay modeling of gates with a single switching input has received
considerable attention, the case of multiple inputs switching in close
temporal proximity is just beginning to be addressed in the literature. The
effect of proximity of input transitions can be significant on the delay and
output transition time. The few attempts that have addressed this issue are
based on a series-parallel transistor collapsing method that reduces the
multi-input gate to an inverter. This limits the technique to CMOS technology.
Moreover, none of them discuss the appropriate choice of voltage thresholds to
measure delay for a multi-input gate. In this paper, we first present a method
for the choice of voltage thresholds for a multi-input gate that ensures a
positive value of delay under all input conditions. We next introduce a
dual-input proximity model for the case when only two inputs of the gate are
switching. We then propose a simple approximate algorithm for calculating the
delay and output transition time that makes repeated use of the dual-input
proximity model without collapsing the gate into an equivalent inverter.
Comparison with simulation results shows that our method performs quite well
in practice.
-
40.3 Optimal Clock Skew Scheduling Tolerant to Process Variations
- José Luis Neves, Eby G. Friedman
A methodology is presented in this paper for determining an optimal set of
clock path delays for designing high performance VLSI/ULSI-based clock
distribution networks. This methodology emphasizes the use of non-zero clock
skew to reduce the system-wide minimum clock period. Although choosing (or
scheduling) clock skew values has been previously recognized as an optimization
technique for reducing the minimum clock period, difficulty in controlling the
delays of the clock paths due to process parameter variations has limited its
effectiveness. In this paper the minimum clock period is reduced using
intentional clock skew by calculating a permissible clock skew range for each
local data path while incorporating process dependent delay values of the
clock signal paths. Graph-based algorithms are presented for determining the
minimum clock period and for selecting a range of process-tolerant clock skews
for each local data path in the circuit, respectively. These algorithms have
been demonstrated on the ISCAS-89 suite of circuits. Furthermore, examples of
clock distribution networks with intentional clock skew are shown to tolerate
worst case clock skew variations of up to 30% without causing circuit failure
while increasing the system-wide maximum clock frequency by up to 20% over zero
skew-based systems.
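
For background, the permissible skew range for a single local data path follows
from the usual setup and hold constraints; the delay numbers below are invented,
and this is not the paper's graph-based, process-tolerant selection, which picks
a safe value inside such ranges for every path simultaneously.

    /* Permissible clock skew range for one register-to-register path.
     * skew = launch clock delay - capture clock delay. */
    #include <stdio.h>

    int main(void)
    {
        double T_cp      = 5.0;                   /* clock period (ns)     */
        double t_cq_max  = 0.5, t_cq_min = 0.3;   /* clock-to-Q (ns)       */
        double t_log_max = 3.8, t_log_min = 1.0;  /* logic delays (ns)     */
        double t_setup   = 0.4, t_hold = 0.2;     /* capture register (ns) */

        /* setup (zero-clocking) bound: skew <= T_cp - (max path + setup) */
        double skew_max = T_cp - (t_cq_max + t_log_max + t_setup);
        /* hold (double-clocking) bound: skew >= hold - min path          */
        double skew_min = t_hold - (t_cq_min + t_log_min);

        printf("permissible skew range: [%.2f, %.2f] ns\n", skew_min, skew_max);
        if (skew_min > skew_max)
            printf("no feasible skew: the clock period is too small\n");
        return 0;
    }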
Chair: Rick Rudell
Organizers: A. Kuehlmann, F. Somenzi
Decision diagrams are used in many contemporary verification techniques.
This session presents advances in their basic implementation and two
applications to combinational and sequential verification.
-
41.1 An Efficient Equivalence Checker for Combinational Circuits
- Yusuke Matsunaga
This paper describes a novel equivalence checking method for combinational
circuits, which utilizes relations among internal signals represented by
binary decision diagrams. To verify circuits efficiently, a proper set of
internal signals that are independent of each other should be chosen. A
heuristic based on analysis of circuit structure is proposed to select such a
set of internal signals. The proposed verifier requires only a minute for
equivalence checking of all the ISCAS'85 benchmarks on a SUN-4/10.
-
41.2 High Performance BDD Package By Exploiting Memory Hierarchy
- Jagesh V. Sanghavi, Rajeev K. Ranjan, Robert K. Brayton, Alberto
Sangiovanni-Vincentelli
The success of binary decision diagram (BDD) based algorithms for verification
depends on the availability of a high performance package to manipulate very
large BDDs. State-of-the-art BDD packages, based on the conventional
depth-first technique, limit the size of the BDDs due to disorderly memory
access patterns that result in unacceptably high elapsed time when the BDD
size exceeds the main memory capacity. We present a high performance BDD
package that enables
manipulation of very large BDDs by using an iterative breadth-first technique
directed towards localizing the memory accesses to exploit the memory system
hierarchy. The new memory-oriented performance features of this package are 1)
an architecture independent customized memory management scheme, 2) the ability
to issue multiple independent BDD operations (super-scalarity), and 3) the
ability to perform multiple BDD operations even when the operands of some BDD
operations are the result of some other operations yet to be completed
(pipelining). A comprehensive set of BDD manipulation algorithms is implemented
using the above techniques. Unlike the breadth-first algorithms presented in
the literature, the new package is faster than the state-of-the-art BDD package
by a factor of up to 1.5, even for the BDD sizes that fit within the main memory.
For BDD sizes that do not fit within the main memory, a performance improvement
of up to a factor of 100 can be achieved.
-
41.3 Implementation of an Efficient Parallel BDD Package
- Tony Stornetta, Forrest Brewer
Large BDD applications push computing resources to their limits. One solution
to overcoming resource limitations is to distribute the BDD data structure
across multiple networked workstations. This paper presents an efficient
parallel BDD package for a distributed environment such as a network of
workstations (NOW) or a distributed memory parallel computer. The
implementation exploits a number of different forms of parallelism that can
be found in depth-first algorithms. Significant effort is made to limit the
communication overhead, including a two-level distributed hash table and an
uncomputed cache. The package simultaneously executes multiple threads of
computation on a distributed BDD.
-
41.4 Word Level Symbolic Model Checking - Avoiding The Pentium FDIV Error
- E.M. Clarke, Manpreet Khaira, Xudong Zhao
The highly-publicized division error in the Pentium has emphasized the importance
of formal verification of arithmetic circuits. Symbolic model checking techniques
based on binary decision diagrams (BDDs) have been successful in verifying
control logic. However, lack of proper representation for functions that map
boolean vectors into the integers has prevented this technique from being used
for verifying arithmetic operations. We have developed a new technique for
verifying arithmetic circuits. The new technique, called word level model
checking, has been used successfully to verify circuits for division and
square root computation that are based on the SRT algorithm used by the Pentium.
The technique makes it possible to handle both the control logic and the data
paths in the circuit. The total number of state variables exceeds 600 (which
is much larger than any circuit previously handled by other symbolic model
checkers).
Chair: Carl Pixley
Organizers: B. Frye, N. Weste
Adequate design verification using traditional simulation-based techniques
becomes increasingly difficult as design complexity advances. Formal
verification techniques promise to prove that a design is correct without
exhaustive simulation. Papers in this session describe designer experiences
with formal verification techniques.
-
42.1 Formal Verification of PowerPC(TM) Arrays Using Symbolic Trajectory
Evaluation
- Manish Pandey, Richard Raimi, Derek L. Beatty, Randal E. Bryant
Verifying memory arrays such as on-chip caches and register files is a
difficult part of designing a microprocessor. Current tools cannot verify the
equivalence of the arrays to their behavioral or RTL models, nor their correct
functioning at the transistor level. It is infeasible to run the number of
simulation cycles required, and most formal verification tools break down due to
the enormous number of state-holding elements in the arrays. The formal method
of symbolic trajectory evaluation (STE) appears to offer a solution, however.
STE verifies that a circuit satisfies a formula in a carefully restricted
temporal logic. For arrays, it requires only a number of variables approximately
logarithmic in the number of memory locations. The circuit is modeled at the
switch level, so the verification is done on the actual design. We have used
STE to verify two arrays from PowerPC microprocessors: a register file, and a
data cache tag unit. The tag unit contains over 12,000 latches. We believe it
is the largest circuit to have been formally verified, without abstracting away
significant detail, in the industry. We also describe an automated technique
for identifying state-holding elements in the arrays, a technique which should
greatly assist the widespread application of STE.
-
42.2 RuleBase: an Industry-Oriented Formal Verification Tool
- Ilan Beer, Shoham Ben-David, Cindy Eisner, Avner Landver
RuleBase is a formal verification tool, developed by the IBM Haifa Research
Laboratory. It is the result of three years of experience in practical formal
verification of hardware which, we believe, has been a key factor in bringing
the tool to its current level of maturity. We present the tool, including
several unique features, and summarize our usage experience.
-
42.3 Bit-Level Analysis of an SRT Divider Circuit
- Randal E. Bryant
It is impractical to verify multiplier or divider circuits entirely at the
bit-level using ordered Binary Decision Diagrams (BDDs), because the BDD
representations for these functions grow exponentially with the word size. It
is possible, however, to analyze individual stages of these circuits using
BDDs. Such analysis can be helpful when implementing complex arithmetic
algorithms. As a demonstration, we show that Intel could have used BDDs to
detect erroneous lookup table entries in the Pentium(TM) floating point
divider. Going beyond verification, we show that bit-level analysis can be
used to generate a correct version of the table.
-
42.4 Integrating Formal Verification Methods with A Conventional Project
Design Flow
- Asgeir Th. Eiriksson
We present a formal verification methodology that we have used on a computer
system design project. The methodology integrates a temporal logic model
checker with a conventional project design flow. The methodology has been used
successfully to verify the protocols within a distributed shared memory machine.
We consider the following to be the four main benefits of using the model
checker. First, it ensures that there exists an accurate high-level machine
readable system specification. Second, it allows high-level system verification
early in the design phase. Third, it facilitates equivalence and refinement
checking between the high-level specification, and the RTL implementation.
Finally, and most importantly, it uncovered many protocol specification and RTL
implementation problems.
Chair: Wayne Wolf
Organizers: D. Stark, N. Weste
The realization of complex systems sometimes calls for methods of simultaneous
hardware and software design that go beyond the traditional boundaries separating
these two areas. The papers in this session explore efficient codesign and
examine its application to real design examples.
-
43.1 A System Design Methodology for Software/Hardware Co-Development
of Telecommunication Network Applications
- Bill Lin
In this paper, we describe a system design methodology for the concurrent
development of hybrid software/hardware systems for telecom network
applications. This methodology is based on the results of an investigation
and evaluation of an actual industrial system design application for ATM-based
broad-band networks. The aim of this methodology is to provide an integrated
design flow from system specification to implementation.
-
43.2 A Strategy for Real-Time Kernel Support in Application-Specific
HW/SW Embedded Architectures
- Steven Vercauteren, Bill Lin and Hugo De Man
Heterogeneous embedded multiprocessor architectures are becoming more prominent
as a key design solution to today's microelectronics design problems. These
application-specific architectures integrate multiple software programmable
processors and dedicated hardware components together onto a single
cost-efficient IC. In contrast to general-purpose computer systems, embedded
systems are designed and optimized to provide specific functionality, using
possibly a combination of different classes of processors (e.g. DSPs,
microcontrollers) from different vendors. While these customized heterogeneous
multiprocessor architectures offer designers new possibilities to trade off
programmability, there is currently a lack of tools to support the programming
of these architectures. In this paper, we consider the problem of providing
real-time kernel support for managing the concurrent software tasks that are
distributed over a set of processors in an application-specific
multiprocessor architecture. This is complementary to current research
activities that aim to provide efficient retargetable code generation [8,11,10]
for a broad range of embedded processors.
-
43.3 Software Development in a Hardware Simulation Environment
- Benny Schnaider, Einat Yogev
Concurrent verification of hardware and software as part of the development
process can shorten the time to market of complex systems. The objective of
the Virtual CPU (VCPU) approach is to provide a solution for code development
in a simulation environment before the system prototype is ready. The VCPU is an
ideal solution for processor-based systems that include multiple new ASICs and
boards, in which the hardware, diagnostics and software drivers must be
developed concurrently.
-
43.4 Compiled HW/SW Co-Simulation
- Vojin Zivojnovic, Heinrich Meyr
This paper presents a technique for simulating processors and attached hardware
using the principle of compiled simulation. Unlike existing, in-house and
off-the-shelf hardware/software co-simulators, which use interpretive
processor simulation, the proposed technique performs instruction decoding and
simulation scheduling at compile time. The technique offers up to three orders
of magnitude faster simulation. The high speed allows the user to explore
algorithms and hardware/software trade-offs before any hardware implementation.
In this paper, the sources of the speedup and the limitations of the technique
are analyzed and the realization of the simulation compiler is presented.
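
A minimal sketch of the idea on a two-instruction toy machine (not the authors'
simulation compiler): instruction decoding is hoisted out of the run loop,
leaving only a pre-bound handler call per simulated instruction.

    /* Contrast with an interpretive simulator, which would decode each
     * instruction word on every execution of the inner loop. */
    #include <stdio.h>

    typedef struct { int pc; int acc; } cpu_t;
    typedef void (*handler_t)(cpu_t *);

    static void op_inc(cpu_t *c) { c->acc++; c->pc++; }
    static void op_dec(cpu_t *c) { c->acc--; c->pc++; }

    enum { INC = 0, DEC = 1 };
    static const int program[] = { INC, INC, DEC, INC };
    #define PROG_LEN (sizeof(program) / sizeof(program[0]))

    int main(void)
    {
        cpu_t cpu = {0, 0};
        handler_t decoded[PROG_LEN];
        size_t i;

        /* "compile time": decode every instruction exactly once */
        for (i = 0; i < PROG_LEN; i++)
            decoded[i] = (program[i] == INC) ? op_inc : op_dec;

        /* run: no decoding left in the simulation loop */
        while ((size_t)cpu.pc < PROG_LEN)
            decoded[cpu.pc](&cpu);

        printf("acc = %d\n", cpu.acc);   /* prints 2 */
        return 0;
    }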
Chair: Bill Lin
Organizers: F. Somenzi, B. Lin
This session covers two topic areas. The first two papers are devoted to power
estimation: the first presents a novel technique for power estimation based on
stochastic machine synthesis, and the second presents an effective
macro-modeling approach to datapath power estimation. The last two papers are
devoted to retiming techniques at the architecture level.
-
44.1 Stochastic Sequential Machine Synthesis Targeting Constrained Sequence Generation
- Diana Marculescu, Radu Marculescu, Massoud Pedram
The problem of stochastic sequential machines (SSM) synthesis is addressed and
its relationship with the constrained sequence generation problem which arises
during power estimation is discussed. In power estimation, one has to generate
input vector sequences that satisfy a given statistical behavior (in terms of
transition probabilities and correlations among bits) and/or to make these
sequences as short as possible so as to improve the efficiency of power
simulators. SSMs can be used to solve both problems. Based on Moore-type
machines, a general procedure for SSM synthesis is presented and a new framework
for sequence characterization is built to match the designer's needs for sequence
generation or compaction. As results demonstrate, compaction ratios of 1-2
orders of magnitude can be obtained without much loss in accuracy of total
power estimates.
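
As a scaled-down illustration of constrained sequence generation, the sketch
below drives a single input bit from a two-state Markov chain with prescribed
transition probabilities; the probabilities are invented, and the paper's SSMs
additionally capture correlations among bits and support compaction.

    /* Toy constrained sequence generator: one input bit whose transition
     * probabilities are prescribed, modeled as a two-state Markov chain. */
    #include <stdio.h>
    #include <stdlib.h>

    static int coin(double p)          /* returns 1 with probability p */
    {
        return ((double)rand() / RAND_MAX) < p;
    }

    int main(void)
    {
        double p01 = 0.3;   /* Prob(next = 1 | current = 0), assumed */
        double p10 = 0.1;   /* Prob(next = 0 | current = 1), assumed */
        int bit = 0, transitions = 0, ones = 0, t;
        const int len = 100000;

        for (t = 0; t < len; t++) {
            int next = bit ? !coin(p10) : coin(p01);
            transitions += (next != bit);
            ones += next;
            bit = next;
        }
        /* stationary Prob(1) should approach p01 / (p01 + p10) = 0.75 */
        printf("signal probability : %.3f\n", (double)ones / len);
        printf("transition density : %.3f\n", (double)transitions / len);
        return 0;
    }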
-
44.2 Energy Characterization based on Clustering
- Huzefa Mehta, Robert M. Owens, Mary Jane Irwin
We illustrate a new method to characterize the energy dissipation of circuits
by collapsing closely related input transition vectors and energy patterns
into capacitive coefficients. Energy characterization needs to be done only
once for each module (ALU, multiplier, etc.) in order to build a library of
these capacitive coefficients. A direct high-level energy simulator or profiler
can then use the library of pre-characterized modules and a sequence of input
vectors to compute the total energy dissipation. A heuristic algorithm which
performs energy clustering under objective constraints has been devised. The
worst case running time of this algorithm is O(m^3 n), where m is the
number of simulation points and n is the number of inputs of the circuit. The
designer can experiment with the criterion function by setting the appropriate
relative error norms to control the `goodness' of the clustering algorithm and
the sampling error and confidence level to maintain the sufficiency of
representation of each cluster. Experiments on circuits show a significant
reduction of the energy table size under a specified criterion function,
cluster sampling error and confidence level.
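
To indicate how a high-level profiler might consume such a library, the sketch
below charges each observed input transition to the capacitive coefficient of
its nearest cluster; the representatives, the nearest-cluster rule, and all
values are invented, since the paper derives the coefficients by constrained
clustering of simulated energy patterns.

    /* Loose sketch of energy profiling with pre-characterized capacitive
     * coefficients: each input transition is mapped to the closest cluster
     * representative (by Hamming distance here) and charged at C * Vdd^2. */
    #include <stdio.h>

    #define N_CLUSTERS 2

    static const unsigned rep_old[N_CLUSTERS] = { 0x0, 0xF };
    static const unsigned rep_new[N_CLUSTERS] = { 0xF, 0x0 };
    static const double   c_coef[N_CLUSTERS]  = { 2.1e-12, 1.7e-12 }; /* F */

    static int popcount(unsigned x)
    {
        int n = 0;
        while (x) { n += x & 1; x >>= 1; }
        return n;
    }

    static int nearest_cluster(unsigned v_old, unsigned v_new)
    {
        int best = 0, bestd = 99, k;
        for (k = 0; k < N_CLUSTERS; k++) {
            int d = popcount(v_old ^ rep_old[k]) + popcount(v_new ^ rep_new[k]);
            if (d < bestd) { bestd = d; best = k; }
        }
        return best;
    }

    int main(void)
    {
        const double vdd = 3.3;
        unsigned trace[] = { 0x0, 0xE, 0x1, 0xF, 0x0 };  /* 4-bit vectors */
        double energy = 0.0;
        size_t i;

        for (i = 1; i < sizeof(trace) / sizeof(trace[0]); i++) {
            int k = nearest_cluster(trace[i - 1], trace[i]);
            energy += c_coef[k] * vdd * vdd;
        }
        printf("estimated energy: %g J\n", energy);
        return 0;
    }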
-
44.3 Architectural Retiming: Pipelining Latency-Constrained Circuits
- Soha Hassoun, Carl Ebeling
This paper presents a new optimization technique called architectural retiming
which is able to improve the performance of many latency-constrained circuits.
Architectural retiming achieves this by increasing the number of registers on
the latency-constrained path while preserving the functionality and latency
of the circuit. This is done using the concept of a negative register, which
can be implemented using precomputation and prediction. We use the name
architectural retiming since it both reschedules operations in time and
modifies the structure of the circuit to preserve its functionality. We
illustrate the use of architectural retiming on two realistic examples and
present performance improvement results for a number of sample circuits.
-
44.4 Optimizing Systems for Effective Block-Processing: The k-Delay Problem
- Kumar N. Lalgudi, Marios C. Papaefthymiou, Miodrag Potkonjak
Block-processing is a powerful and popular technique for increasing computation
speed by simultaneously processing several samples of data. The effectiveness of
block-processing is often reduced, however, due to suboptimal placement of
delays in the data flow graph of a computation. In this paper we investigate an application
of the retiming transformation for improving the effectiveness of
block-processing in computation structures. Specifically, we consider the
k-delay problem in which we wish to retime any given computation so that given
an integer k the resulting computation can process k data samples simultaneously
in a fully regular manner. Our main contribution is an O(V^3 E + V^4 log V)-time
algorithm for the k-delay problem, where V is the number of computation
blocks and E is the number of interconnections in the computation.
Chair: Gabrièle Saucier
Co-Chair: Hamid Savoj
Organizers: M. Pedram, G. Saucier
This session focuses on performance-oriented synthesis accounting for the
target technology. The first three papers deal with LUT-based FPGA mapping both
in sequential and combinational circuits. The fourth paper proposes a new
algorithm for performing directed gate selection from an ASIC library. The
last paper describes a practical and effective method for post layout buffer
insertion using an accurate wire delay model.
-
45.1 Optimal Clock Period FPGA Technology Mapping for Sequential Circuits
- Peichen Pan, C.L. Liu
In this paper, we study the technology mapping problem for sequential circuits
for LUT-based FPGAs. Existing approaches map the combinational logic between
flip-flops (FFs) while assuming the positions of the FFs are fixed. We study in
this paper a new approach to the problem, in which retiming is integrated into
the technology mapping process. We present a polynomial time technology mapping
algorithm that can produce a mapping solution with the minimum clock period
while assuming FFs can be arbitrarily repositioned by retiming. The algorithm
has been implemented. Experimental results on benchmark circuits clearly
demonstrate the advantage of our approach. For many benchmark circuits, our
algorithm produced mapping solutions with clock periods not attainable by a
mapping algorithm based on existing approaches, even when it employs an
optimal delay mapping algorithm for combinational circuits.
-
45.2 Structural Gate Decomposition for Depth-Optimal Technology Mapping
in LUT-Based FPGA Design
- Jason Cong, Yean-Yow Hwang
In this paper, we study the problem of decomposing gates in fanin-unbounded or
K-bounded networks such that the K-input LUT mapping solutions computed by a
depth-optimal mapper have minimum depth. We show that (1) any decomposition
leads to a smaller or equal mapping depth regardless of the decomposition
algorithm used, and (2) the problem is NP-hard for unbounded networks when
K >= 3 and remains
NP-hard for K-bounded networks when K >= 5. We propose a gate decomposition
algorithm, named DOGMA, which combines level-driven node packing technique
(Chortle-d) and the network flow based optimal labeling technique (FlowMap).
Experimental results show that networks decomposed by DOGMA allow depth-optimal
technology mappers to improve the mapping solutions by up to 11% in depth and
up to 35% in area compared to the mapping results of networks decomposed by
other existing decomposition algorithms.
-
45.3 A Boolean Approach to Performance-Directed Technology Mapping for
LUT-Based FPGA Designs
- Christian Legl, Bernd Wurth, Klaus Eckl
This paper presents a novel, Boolean approach to LUT-based FPGA technology
mapping targeting high performance. As the core of the approach, we have
developed a powerful functional decomposition algorithm. The impact of
decomposition is enhanced by a preceding collapsing step. To decompose
functions for small depth and area, we present an iterative, BDD-based
variable partitioning procedure. The procedure optimizes the variable partition
for each bound set size by iteratively exchanging variables between bound set
and free set, and finally selects a good bound set size. Our decomposition
algorithm extracts common subfunctions of multiple-output functions, and thus
further reduces area and the maximum interconnect lengths. Experimental results
show that our new algorithm produces circuits with significantly smaller depths
than other performance-oriented mappers. This advantage also holds for the
actual delays after placement and routing.
-
45.4 New Algorithms for Gate Resizing: A Comparative Study
- Olivier Coudert, Ramsey Haddad and Srilatha Manne
Gate sizing consists of choosing for each node of a mapped network a gate
implementation in the library so that some cost function is optimized under
some constraints. It has a significant impact on the delay, power dissipation,
and area of the final circuit. This paper compares five gate sizing algorithms
targeting discrete, non-linear, non-unimodal, constrained optimization. The
goal is to overcome the non-linearity and non-unimodality of the delay and the
power to achieve good quality results within a reasonable CPU time, e.g.,
handling a 10,000-node network in 2 hours. We compare the five algorithms on
constraint-free delay optimization and delay-constrained power optimization,
and show that one method is superior to the others.
-
45.5 Post-Layout Optimization for Deep Submicron Design
- Koichi Sato, M. Kawarabayashi, Hideyuki Emura, Naotaka Maeda
To reduce the number of synthesis and layout iterations, we present a new delay
optimization technique, which inserts buffers based on back-annotated detailed
routing information. During optimization, inserted buffers are assumed to be
placed at appropriate locations on the original wires so as to calculate
accurate wire RC delay. With forward-annotated location information of inserted
buffers, the layout system attempts to preserve patterns of original wires
using the ECO technique. Our experimental results show that this technique
combined with the conventional gate sizing technique achieves up to 41.2% delay
reduction after the initial layout.
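
As a first-order illustration of why inserting a buffer on a long wire helps
(assumed per-unit RC values and buffer delay, not the paper's back-annotated
delay model), the Elmore delay of a distributed wire grows quadratically with
length, so splitting the wire trades the quadratic term for one extra gate
delay.

    /* Unbuffered vs. one-buffer wire delay, first-order Elmore estimates. */
    #include <stdio.h>

    int main(void)
    {
        double r = 0.08;        /* wire resistance per um (ohm/um), assumed */
        double c = 0.2e-15;     /* wire capacitance per um (F/um), assumed  */
        double len = 10000.0;   /* wire length (um)                         */
        double t_buf = 120e-12; /* intrinsic delay of inserted buffer (s)   */

        double t_unbuffered = 0.5 * r * c * len * len;
        double t_buffered   = 2.0 * (0.5 * r * c * (len / 2) * (len / 2))
                              + t_buf;

        printf("unbuffered: %.1f ps\n", t_unbuffered * 1e12);
        printf("1 buffer  : %.1f ps\n", t_buffered * 1e12);
        return 0;
    }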
Chair: Alan Cave
Organizers: Y.-L. Lin, A. Domic
These papers present methods for fast extraction of accurate parasitic parameters,
analysis of layout for electromigration reliability, and improvement of
manufacturing yield by means of layout spacing.
-
46.1 Enhanced Network Flow Algorithm for Yield Optimization
- Cyrus S. Bamji, Enrico Malavasi
A novel constraint-graph algorithm for the optimization of yield is presented.
This algorithm improves the yield of a layout by carefully spacing objects to
reduce the probability of faults due to spot defects. White space between objects
is removed and spacing in tightly packed areas of the layout is increased. The
computationally expensive problem of optimizing yield is transformed into a
network flow problem, which can be solved via known efficient algorithms. Yield
can be improved either without changing the layout area, or if necessary by
increasing the layout area to maximize the number of good chips per wafer. Our
method can in theory provide the best possible yield achievable without
modifying the layout topology. The method is able to handle a general class of
convex objective functions, and can therefore optimize not only yield, but
other circuit performance functions such as wire-length, cross-talk and power.
-
46.2 Hierarchical Electromigration Reliability Diagnosis for VLSI Interconnects
- Chin-Chi Teng, Yi-Kan Cheng, Elyse Rosenbaum, Sung-Mo Kang
In this paper, we present a hierarchical reliability-driven CAD system for the
design of electromigration resistant circuits. The top of the hierarchy aims
at quickly identifying those critical interconnects with potential electromigration
reliability problems. Then the detailed electromigration analysis of critical
interconnects is carried out by an accurate and computationally efficient
simulation tool (iTEM). This top-down approach provides a feasible solution to
the complicated electromigration diagnosis problem.
-
46.3 Using Articulation Nodes to Improve the Efficiency of Finite-Element
based Resistance Extraction
- A.J. van Genderen, N.P. van der Meijs
In this paper, we describe how we have improved the efficiency of a
finite-element method for interconnect resistance extraction by
introducing articulation nodes in the finite-element mesh. The articulation
nodes are found by detecting equipotential regions and lines in the
interconnects. Without generating inaccuracies, these articulation nodes
split the finite-element mesh into small pieces that can be solved
independently. The method has been implemented in the layout-to-circuit
extractor Space. All interconnect resistances of a circuit containing 63,000
transistors are extracted on an HP 9000/735 workstation in approximately 70
minutes.
-
46.4 Extracting Circuit Models for Large RC Interconnections That Are
Accurate up to a Predefined Signal Frequency
- P.J.H. Elias, N.P. van der Meijs
This paper presents a technique that transforms large and complex RC networks
into much smaller, physically realizable RC networks. These models reflect the
transmission behavior of the initial network accurately for frequencies up to
a user-defined maximal signal frequency. This technique has been incorporated
in a layout-to-circuit extractor, using a scan-line approach. The method
guarantees numerical stability and performs excellently in modeling RC
interconnects.
-
SESSION 47
Panel: System Synthesis: Can We Meet The Challenges to Come?
Chair: Robert A. Walker
Organizer: R. Walker
Panelists:
Laurent Bergher, Raul Campasano, Daniel D. Gajski, Pierre Paulin,
Barry Shackleford, Randy Steck
Over the past few years, the field of system synthesis has begun to emerge,
combining high-level synthesis with hardware / software codesign. But as
we move toward the next millennium, will system synthesis be enough? Can
we develop the necessary tools to design the hottest processor chip in the
year 2000, or the next great cellular phone / PDA, or those home multimedia
systems that we see looming on the horizon? CAN WE MEET THE CHALLENGES TO
COME???
Chair: Hilary J. Kahn
Organizers: D. Stark, B. Frye
The key to containing the complexity of modern electronic system design
lies largely at the door of the popular HDLs. We begin this session with
a tutorial that compares and contrasts the two popular HDLs, Verilog and
VHDL, and in addition shows the examples in the C language. This is followed
by a paper that promotes a VHDL coding standard and highlights a VHDL
development system that has been used with this coding standard.
-
48.1 Tutorial: VHDL & Verilog Compared & Contrasted - Plus Modeled
Example Written in VHDL, Verilog and C
- Douglas J. Smith
This tutorial is in two parts. The first part takes an unbiased view of VHDL
and Verilog by comparing their similarities and contrasting their differences.
The second part contains a worked example of a model that computes the Greatest
Common Divisor (GCD) of two numbers. The GCD is modeled at the algorithmic level
in VHDL, Verilog and for comparison purposes, C. It is then shown modeled at the
RTL in VHDL and Verilog.
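
The tutorial's own models are not reproduced here; as a flavor of the
algorithmic level of description, a plain C rendering of the subtractive GCD
(inputs assumed nonzero) might read as follows.

    /* Subtractive GCD of the kind modeled at the algorithmic level in
     * VHDL, Verilog and C, and then at the RTL in VHDL and Verilog. */
    #include <stdio.h>

    static unsigned gcd(unsigned a, unsigned b)   /* a, b must be nonzero */
    {
        while (a != b) {       /* repeated subtraction maps naturally to RTL */
            if (a > b)
                a -= b;
            else
                b -= a;
        }
        return a;
    }

    int main(void)
    {
        printf("gcd(36, 24) = %u\n", gcd(36, 24));   /* prints 12 */
        return 0;
    }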
-
48.2 VHDL Development System and Coding Standard
- Hans Sahm, Claus Mayer, Jörg Pleickhardt, Johannes Schuck, Stefan Späth
With the growing complexity of today's ASICs and the number of designers involved
in one VHDL ASIC project, the need for a VHDL development system together with
coding rules for simulation and synthesis has emerged. This paper describes the
VHDL Coding Standard which has been established and the VHDL development system
including code entry, code formatting, code compliance checkers, data
management and multi-user project set-up.
Chair: Massoud Pedram
Organizers: M. Pedram, F. Somenzi
Low power optimization has emerged as an important goal in IC design. This
session presents algorithms for power reduction by discrete gate resizing
and post-map resynthesis in combinational circuits and by selective disabling
of registers in sequential circuits.
-
49.1 An Exact Algorithm for Low Power Library-Specific Gate Re-Sizing
- De-Shen Chen, Majid Sarrafzadeh
In this paper we examine the problem of reducing the power consumption of a
technology mapped circuit under timing constraints. Consider a cell library
that contains multiple implementations (cells) of the same Boolean function.
We first present an exact algorithm for the problem when a complete library is
given; in a complete library, "all" implementations of each cell are present.
We then propose an efficient algorithm for the problem if the provided library
is not complete.
-
49.2 Reducing Power Dissipation after Technology Mapping by Structural
Transformations
- Bernhard Rohfleisch, Alfred Kölbl, Bernd Wurth
Due to the increasing demand for low power circuits, low power dissipation has
emerged as an important optimization goal in logic synthesis. In this paper, we
show that the power dissipation of technology mapped circuits can be
significantly reduced by ATPG-based structural transformations. Our approach
performs a sequence of permissible signal substitutions, where each
substitution reduces the power consumption of the circuit. Since timing
constraints can be considered, we achieve a trade-off between power and delay.
The effectiveness of the proposed method is based on two facts. First, the
power models for library gates are reasonably accurate. Thus, the power savings
achieved by transformations of mapped circuits are also well modeled. Second,
ATPG-based structural transformations effectively exploit don't care conditions
after technology mapping, even for large circuits. Experimental results show power
reductions of 26% on average with no area penalty. Substantial power reductions
are also achieved if timing constraints are considered.
-
49.3 Desensitization for Power Reduction in Sequential Circuits
- Xiangfeng Chen, Peichen Pan, C.L. Liu
In this paper, we describe a technique for power reduction in sequential circuits.
Existing signals in the circuit are used to selectively disable some of the
registers so that a portion of the circuit will be deactivated. Consequently,
average power consumption in the circuit is reduced at a cost of small increases
in area and delay. We present an algorithm for determining the desensitizing
signal for each register. A significant amount of power reduction is achieved
in a number of benchmark circuits according to our experimental results.
Chair: Sandip Kundu
Organizers: J. Rajski, Y. Zorian
The papers in this session address fault emulation on hardware, the effect of
encoding density on partial scan selection, and an in-depth testability analysis
technique for large macro-based designs.
-
50.1 Serial Fault Emulation
- Luc Burgun, Frédéric Reblewski, Gérard Fenelon, Jean Barbier, Olivier Lepape
A hardware emulator based approach has been developed to perform test evaluation
on large sequential circuits (at least tens of thousands of gates). This
approach relies both on the flexibility and on the reconfigurability of hardware
emulators based on dedicated reprogrammable circuits. A Serial Fault Emulation
(SFE) method in which each faulty circuit is emulated separately has been
applied to gate-level circuits for Single Stuck Faults (SSFs). This approach has
been implemented on the Meta Systems’s hardware emulator which is capable of
emulating circuits of 1,000,000 gates at rates varying from 500KHz to several
MHz. Experimental results are provided to demonstrate the efficiency of SFE.
They indicate that SFE should be two orders of magnitude faster than software
approaches for designs containing more than 100,000 gates.
-
50.2 Partial Scan Design Based on Circuit State Information
- Dong Xiang, Srikanth Venkataraman, W. Kent Fuchs and Janak H. Patel
State information of a sequential circuit can be used to evaluate the
complexity of test generation. The ratio of valid states to all the states of
the circuit is an important indicator of test generation complexity. Using valid
states obtained via logic simulation, a testability measure based on the density
of encoding is proposed for scan flip flop selection. A second testability
measure based on the test generation state information is also presented and
used to select scan flip flops. Cycles are broken selectively on the basis of
the circuit state information. Good fault coverage and test efficiency are
obtained when fewer scan flip flops than the minimum cut set are selected.
Experimental results are presented to demonstrate the effectiveness of the
method.
-
50.3 Pseudorandom-Pattern Test Resistance in High-performance DSP Datapaths
- Laurence Goodby, Alex Orailoglu
The testability of basic DSP datapath structures using pseudorandom built-in
self-test techniques is examined. The addition of variance mismatched signals
is identified as a testing problem, and the associated fault detection
probabilities are derived in terms of signal probability distributions. A
method of calculating these distributions is described, and it is shown how
these distributions can be used to predict testing problems that arise from the
correlation properties of test sequences generated using linear-feedback shift
registers. Finally, it is shown empirically that variance matching using
associativity transformations can reduce the number of untested faults by a
factor of eight over variance mismatched designs.
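
For context, the sketch below is a generic maximal-length 16-bit Fibonacci LFSR
of the kind used to supply pseudorandom BIST patterns; the tap set is a standard
choice and is not taken from the paper, whose contribution is analyzing and
compensating for the signal statistics such generators impose on datapath
inputs.

    /* Generic 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, 11. */
    #include <stdio.h>

    int main(void)
    {
        unsigned lfsr = 0xACE1u;   /* any nonzero seed */
        int i;

        for (i = 0; i < 8; i++) {
            unsigned bit = ((lfsr >> 0) ^ (lfsr >> 2) ^
                            (lfsr >> 3) ^ (lfsr >> 5)) & 1u;
            lfsr = (lfsr >> 1) | (bit << 15);
            printf("pattern %d: 0x%04X\n", i, lfsr & 0xFFFFu);
        }
        return 0;
    }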
Chair: D.M.H. Walker
Organizers: J. White, R.A. Rutenbar
High-performance designs necessarily push the envelope of what is achievable
in current fabrication technology. This session looks at the interplay between
design optimization and fab technology, for cells, libraries and full chips,
for reliability, power, performance and yield.
-
51.1 Hot-Carrier Reliability Enhancement via Input Reordering and Transistor Sizing
- Aurobindo Dasgupta, Ramesh Karri
Hot-carrier effects and electromigration are the two important failure mechanisms
that significantly impact the long-term reliability of high-density VLSI ICs.
In this paper, we present a probabilistic switch-level method for identifying
the most susceptible hot-carrier MOSFETs and improving their hot-carrier
reliability using two techniques: (i) reordering of inputs to logic gates and
(ii) selective MOSFET sizing. We also show that for a given circuit, the best
design in terms of hot-carrier reliability does not necessarily coincide with
the best design in terms of power consumption.
-
51.2 A Methodology for Concurrent Fabrication Process/Cell Library Optimization
- Arun N. Lokanathan, Jay B. Brockman, John E. Renaud
This paper presents a methodology for concurrently optimizing an IC fabrication
process and a standard cell library in order to maximize overall yield. The
approach uses the Concurrent Subspace Optimization (CSSO) algorithm, which has
been developed for general coupled, multidisciplinary optimization problems.
An example is provided showing the application of the algorithm to optimizing
a mixed analog-digital library on a CMOS process.
-
51.3 Computing Parametric Yield Adaptively Using Local Linear Models
- Mien Li, Linda Milor
A divide-and-conquer algorithm for computing the parametric yield of large
analog circuits is presented. The algorithm targets applications whose
performance spreads could be highly nonlinear functions of a large number of
stochastic process disturbances, and therefore cannot easily be modeled by
traditional response surface methods. This work addresses difficulties with
modeling by adaptively constructing the model piece by piece, namely, by
efficiently and recursively partitioning the disturbance space into several
regions, each of which is then modeled by a local linear model. Local linear
models are used because they are less sensitive to dimension than polynomial
models. Moreover, the resulting model can be made more accurate in some
regions than in others. The number of simulations required in statistical
modeling can therefore be reduced since only critical regions, which define the
boundary of the feasible region in the space of process disturbances, are
modeled highly accurately. The resulting models are then used as cheap
surrogates for circuit simulation in Monte Carlo estimation of the parametric
yield. Examples indicate the efficiency and accuracy of this approach.
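
A toy version of the final estimation step is sketched below with a single
global linear surrogate and invented coefficients, whereas the paper adaptively
builds many local linear models and refines them only near the feasibility
boundary.

    /* Monte Carlo parametric yield with a cheap linear surrogate standing
     * in for circuit simulation; all numbers are made up. */
    #include <stdio.h>
    #include <stdlib.h>

    /* surrogate: performance as a linear function of two disturbances */
    static double perf(double d1, double d2)
    {
        return 1.00 + 0.15 * d1 - 0.08 * d2;   /* assumed coefficients */
    }

    static double unit_gauss(void)   /* crude N(0,1) via central limit */
    {
        double s = 0.0;
        int i;
        for (i = 0; i < 12; i++)
            s += (double)rand() / RAND_MAX;
        return s - 6.0;
    }

    int main(void)
    {
        const int n = 100000;
        const double spec = 1.20;    /* performance must stay below this */
        int pass = 0, i;

        for (i = 0; i < n; i++)
            if (perf(unit_gauss(), unit_gauss()) <= spec)
                pass++;

        printf("estimated parametric yield: %.1f%%\n", 100.0 * pass / n);
        return 0;
    }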