ABSTRACTS DAC 96


SESSION 1 PANEL: THE EDA YEAR IN REVIEW: CEO'S, THE PRESS, AND USERS

Chair: John Cooley
Organizer: John Cooley
Panelists: Joe Costello, Aart de Geus, Richard Goering, Alain Hanover, Wally Rhines, Gary Smith

A wide variety of issues will be addressed from a customer perspective by EDA CEOs and industry pundits. Panelists will address questions and concerns submitted by attendees.


SESSION 2 HIGH SPEED INTERCONNECT

Chair: Gary Smith
Organizer: S. Boose

High speed PCB design implementation is increasingly influenced as much by interconnect issues as by logic design. This session opens with the presentation "Effective Interconnect and Package Modeling Techniques," which uses a 2.4 GHz RF to IF converter as an example to determine which parasitic quantities are the important modeling requirements. The panel discussion that follows will explore the tools and technology available to address high speed design requirements from both the vendor and customer points of view.

2.1 Package and Interconnect Modeling of the HFA3624, a 2.4 GHz RF to IF Converter
Mattan Kamon, Steve S. Majors

This paper demonstrates the use of three-dimensional electromagnetic field solvers to accurately model the interconnect for a 2.4 GHz RF to IF Converter integrated circuit under development at Harris Semiconductor. Lumped element RLC models for the package leads, bondwires, die attach plane, and on-chip interconnect are shown to provide accurate simulation versus measurement and prove the importance of modeling the entire packaged IC.

Panel: PCB Synthesis - Is the Technology Ready for High Speed Design?
Panelists: Robert Gonzales, Mark Leonard, Fred Saal, John Schoenfeld, Jonathan Weis, Mike White

Today’s technology imperative demands faster products with greater data bandwidths and an increasing number of complex devices. Responding to these demands while meeting today’s time to market pressures limits engineers’ tolerance for traditional design processes that depend on an unmanageable number of iterations between design and layout. A panel of industry experts led by Gary Smith, Senior EDA Analyst from DataQuest, will focus on the changing issues and challenges facing engineers in the design of high speed circuits.


SESSION 3 POWER ANALYSIS

Chair: Bob Frye
Organizers: S. Nance, N. Weste

Power management is a topic of growing concern in circuit and system design. Papers in this session deal with methods of power analysis and optimization.

3.1 HEAT: Hierarchical Energy Analysis Tool
Janardhan H. Satyanarayana, Keshab K. Parhi

This paper presents an algorithm for the estimation of power in static CMOS digital circuits using a stochastic approach. The salient feature of this approach is that it can be used to estimate the power of reasonably large digital circuits in a very short time, due to its hierarchical nature. Here, the given circuit is first partitioned into smaller sub-circuits. Then, the sub-circuits are modeled using state transition diagrams (stds), and the steady-state probabilities associated with the various states are computed by treating them as irreducible Markov chains. Finally, the energy associated with each sub-circuit is computed, and the total energy of the circuit is obtained by summing up the energies of its constituent sub-circuits. In the proposed hierarchical approach, the energies associated with various edges in a sub-circuit are calculated only once using SPICE and these values are used several times; this results in large savings in computation time. Another advantage of the proposed approach is that we can accommodate switching activities at the transistor level and not necessarily at gate or higher levels. Experimental results show that the estimated power is in close agreement with the actual power obtained from exhaustive SPICE simulations, but the computation time required by the proposed approach is orders of magnitude less than that of SPICE.
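
To make the Markov-chain step above concrete, the following is a minimal sketch, not the authors' HEAT implementation: each sub-circuit's state transition diagram is reduced to a transition-probability matrix, its steady-state distribution is solved for, and the sub-circuit's average energy is the probability-weighted sum of its edge energies. The matrices and energy values are hypothetical stand-ins for numbers that, in the paper, come from SPICE characterization.

    import numpy as np

    def steady_state(P):
        """Steady-state distribution of an irreducible Markov chain with
        transition matrix P (rows sum to 1): solve pi P = pi, sum(pi) = 1."""
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones(n)])
        b = np.zeros(n + 1)
        b[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, b, rcond=None)
        return pi

    def subcircuit_energy(P, E):
        """Average per-cycle energy of one sub-circuit: sum over edges of
        (prob. of state i) * (prob. of taking edge i->j) * (edge energy)."""
        pi = steady_state(P)
        return float(np.sum(pi[:, None] * P * E))

    # Hypothetical two-sub-circuit example; edge energies would come from SPICE.
    P1 = np.array([[0.6, 0.4], [0.3, 0.7]])
    E1 = np.array([[0.0, 1.2], [0.8, 0.1]])   # pJ per transition (made-up numbers)
    P2 = np.array([[0.5, 0.5], [0.5, 0.5]])
    E2 = np.array([[0.2, 0.9], [0.9, 0.2]])

    total = subcircuit_energy(P1, E1) + subcircuit_energy(P2, E2)
    print(f"estimated average energy per cycle: {total:.3f} pJ")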

3.2 Opportunities and Obstacles in Low-power System-Level CAD
Andrew Wolfe

A case study in low-power system-level design is presented. We detail the design of a low-power embedded system, a touchscreen interface device for a personal computer. This device is designed to operate on excess power provided by unused RS232 communication lines. We focus on the design and measurement procedures used to reduce the power requirements of this system to less than 50mW. Additionally, we highlight opportunities to use system-level design and analysis tools for low-power design and the obstacles that prevented using such tools in this design.

3.3 POSE: Power Optimization and Synthesis Environment
Sasan Iman, Massoud Pedram

Recent trends in the semiconductor industry have resulted in an increasing demand for low power circuits. POSE is a step toward providing the EDA community and academia with an environment and tool suite for automatic synthesis and optimization of low power circuits. POSE provides a unified framework for specifying and maintaining power-relevant circuit information and means of estimating the power consumption of a circuit using different load models. POSE also gives a set of options for making area-power trade-offs during logic optimization.

3.4 Early Power Exploration - A World Wide Web Application
David B. Lidsky, Jan M. Rabaey

Exploration at the earliest stages of the design process is an integral component of effective low-power design. Nevertheless, superficial high-level analyses with insufficient accuracy are routinely performed. Critical drawbacks of current high-level design aids include a limited scope of application, inaccuracy of estimation, inaccessibility, and steep learning curves. This paper introduces an approach to alleviate these limitations, thus enabling more effective high-level design exploration. A World Wide Web (WWW) based prototype called PowerPlay, which encapsulates and enhances these techniques, is presented.


SESSION 4 CURRENT DIRECTIONS IN HIGH LEVEL SYNTHESIS

Chair: Hiroto Yasuura
Organizers: R. Walker, K. Wakabayashi

As high-level synthesis matures, a number of exciting new directions are beginning to emerge. This session, the first of three sessions on high-level synthesis, begins with a brief tutorial on current issues in high-level synthesis. It then presents current work on synthesizing intensive iterative computations, on generating code for DSP architectures, on analyzing timing tradeoffs, and on optimizing high-level HDL descriptions using don't-care information.

4.1 Tutorial: Behavioral Synthesis
Raul Camposano

Since the 1988 tutorial on behavioral (high-level) synthesis which appeared in the DAC proceedings [1] (and later in the Proceedings of the IEEE [2]), much has happened in the field. Behavioral synthesis is now considered mainstream EDA, as evidenced by the number of articles at conferences, in journals, and in books. Perhaps even more significant, several products are being offered commercially, and the number of designers using behavioral synthesis is probably in the hundreds. The initial promise of behavioral synthesis, to dramatically increase productivity by elevating the level of design, has been fulfilled. The increase in productivity of behavioral design versus RTL design is typically quoted at 5 times. What came as a surprise to most researchers in the field is that this is achieved without impacting the quality of results (area, timing)! If anything, behavioral designs are slightly smaller and faster than their RTL counterparts, mainly because much more architectural exploration can be afforded at the behavioral level, thus providing a better starting point for synthesis.

4.2 A Register File and Scheduling Model for Application Specific Processor Synthesis
Ehat Ercanli, C. Papachristou

In this paper, we outline the general design steps of our synthesis tool for realizing application specific co-processors: for a given scientific application with intensive iterative computations, especially those containing recurrences, a VLIW-style co-processor is synthesized and realized, and an accompanying parallel code is generated. We introduce a novel register file model, the Shifting Register File (SRF), based on the cyclic regularity of register file accesses, and a simple method, Expansion Scheduling, for scheduling iterative computations based on the cyclic regularity of loops. We also present a variable-register file allocation method and show, through an example, how simple logic units can be used to activate the proper registers at run time.

4.3 Optimized Code Generation of Multiplication-Free Linear Transforms
Mahesh Mehendale, G. Venkatesh, S.D. Sherlekar

We present code generation of multiplication-free linear transforms targeted to single-register DSP architectures such as TMS320C2x/C5x. We first present an algorithm to generate optimized code from a DAG representation. We then present techniques that transform a DAG so as to minimize the number of nodes and the accumulator-spills. We then introduce a concept of spill-free DAGs and present an algorithm for synthesizing such DAGs. The results for Walsh-Hadamard, Haar and Slant transforms show 25% to 40% reduction in the cycle count using our techniques.

4.4 Concurrent Analysis Techniques for Data Path Timing Optimization
Chuck Monahan, Forrest Brewer

Conventional High-level Synthesis techniques create an inter-connection structure before physical design. Following physical design, connection delays and special requirements may cause the structure to fail timing or performance constraints. Alterations to this structure are often limited since it was created either after or during the binding and scheduling tasks. In this paper we present a set of techniques which analyze the timing trade-offs associated with the position-specific interconnection network given the freedom of high-level binding and rescheduling changes.

4.5 HDL Optimization Using Timed Decision Tables
Jian Li, Rajesh K. Gupta

System-level pre-synthesis refers to the optimization of an input HDL description to produce an optimized HDL description suitable for subsequent synthesis tasks. In this paper, we present optimization of control flow in behavioral HDL descriptions using external Don't Care conditions. The optimizations are carried out using a tabular model of system functionality, called Timed Decision Tables or TDTs. The TDT-based optimizations presented here have been implemented in a program called PUMPKIN. Optimization results from several examples show a reduction of 3-88% in the size of synthesized hardware circuits, depending upon the external Don't Care information supplied by the user.


SESSION 5 ANALYSIS AND SYNTHESIS OF ASYNCHRONOUS CIRCUITS

Chair: Peter Beerel
Organizers: L. Lavagno, B. Lin

This session presents efficient techniques for the design of asynchronous circuits. The first two papers present efficient algorithms for analyzing the timing behavior of timed circuits. The third paper presents a new heuristic state encoding technique. The fourth paper addresses the problem of synthesizing distributed control circuits from high-level descriptions. The fifth paper presents a heuristic algorithm for the two-level minimization of hazard-free circuits. The final paper is devoted to synthesizing asynchronous designs at the transistor level.

5.1 Efficient Partial Enumeration for Timing Analysis of Asynchronous Systems
Eric Verlind, Gjalt de Jong, Bill Lin

This paper presents an efficient method for the timing verification of concurrent systems, modeled as labeled Timed Petri nets. The verification problems we consider require us to analyze the system's reachable behaviors under the specified time delays. Our geometric timing analysis algorithm improves over existing ones by enumerating the state space only partially. The algorithm relies on a concept called premature firing and a new, extended notion of clocks with a negative age. We have tested the fully automated procedure on a number of examples. Experimental results obtained on highly concurrent Petri nets with more than 6000 nodes and 10^210 reachable states show that the proposed method can drastically reduce computational cost.

5.2 Verification of Asynchronous Circuits Using Time Petri-Net Unfolding
Alexei Semenov, Alex Yakovlev

This paper describes a novel approach to timing analysis and verification of asynchronous circuits with bounded delays. The method is based on the time-driven unfolding of a time Petri net model of a circuit. Each reachable state, together with its timing constraints is represented implicitly. Our method is used to verify freedom from hazards in asynchronous circuits consisting of micropipeline components and logic gates.

5.3 Methodology and Tools for State Encoding in Asynchronous Circuit Synthesis
Jordi Cortadella, Michael Kishinevsky, Alex Kondratyev, Luciano Lavagno, Alex Yakovlev

This paper proposes a state encoding method for asynchronous circuits based on the theory of regions. A region in a Transition System is a set of states that behave uniformly with respect to a given transition (a value change of an observable signal), and is analogous to a place in a Petri net. Regions are tightly connected with a set of properties that must be preserved across the state encoding process, namely: (1) trace equivalence between the original and the encoded specification, and (2) implementability as a speed-independent circuit. We build on a theoretical body of work that has shown the significance of regions for such property-preserving transformations, and describe a set of algorithms aimed at efficiently solving the encoding problem. The algorithms have been implemented in a software tool called petrify. Unlike many existing tools, petrify represents the encoded specification as an STG, and thus allows the designer to be more closely involved in the synthesis process. The efficiency of the method is demonstrated on a number of "difficult" examples.

5.4 A Technique for Synthesizing Distributed Burst-Mode Circuits
Prabhakar Kudva, Ganesh Gopalakrishnan, Hans Jacobson

We offer a technique to partition a centralized control-flow graph to obtain distributed control in the context of asynchronous high-level synthesis. The technique targets Huffman-style asynchronous controllers that are customized to the problem. It solves the key problem of handling signals that are shared between the partitions—a problem due to the incompletely specified nature of asynchronous controllers. We report encouraging experimental results on realistic examples.

5.5 Espresso-HF: A Heuristic Hazard-Free Minimizer for Two-Level Logic
Michael Theobald, Steven M. Nowick, Tao Wu

We present a new heuristic algorithm for hazard-free minimization of two-level logic. On nearly all examples, the algorithm finds an exactly minimum-cost cover. It also solves several problems which have not been previously solved using existing exact minimizers. We believe this is the first heuristic method based on Espresso to solve the general hazard-free two-level minimization problem, for multiple-input change transitions.

5.6 Synthesis of Hazard-Free Customized CMOS Complex-Gate Networks Under Multiple-Input Changes
Prabhakar Kudva, Ganesh Gopalakrishnan, Hans Jacobson, Steven M. Nowick

This paper addresses the problem of realizing hazard-free single-output Boolean functions through a network of customized complex CMOS gates tailored to a given asynchronous controller specification. A customized CMOS gate network can either be a single CMOS gate or a multilevel network of CMOS gates. It is shown that hazard-free requirements for such networks are less restrictive than for simple gate networks. Analysis and efficient synthesis methods to generate such networks under a multiple-input change assumption (MIC) will be presented.


SESSION 6 NEW FRONTIERS IN PARTITIONING

Chair: D.F. Wong
Organizers: A. Domic, A.B. Kahng

This session opens with an invited review of the latest challenges and directions for circuit partitioning tools. The first paper refines the classic spectral approach through a search over the objective function and the use of clustering information. The second paper proposes a new methodology for generating benchmark inputs for layout tools. The last paper gives an effective tie-breaking enhancement for the FM algorithm.

6.1 Tutorial: Partitioning of VLSI Circuits and Systems
Frank M. Johannes

Partitioning plays an increasingly important role in the design process of VLSI circuits and systems. There are partitioning problems to be solved on all levels of abstraction. The rapidly increasing size of the designs will make good partitioning tools even more essential in the future. As an introduction to the other papers in this session, this tutorial presentation discusses the numerous facets of partitioning.

6.2 New Spectral Linear Placement and Clustering Approach
Jianmin Li, John Lillis, Lung-Tien Liu, Chung-Kuan Cheng

This paper addresses the linear placement problem by using a spectral approach. It has been demonstrated that, by giving a more accurate representation of the linear placement problem, a linear objective function yields better placement quality in terms of wire length than a quadratic objective function as in the eigenvector approach [4][11][6]. On the other hand, the quadratic objective function has an advantage in that it tends to place components more sparsely than the linear objective function, resulting in a continuous solution closer to a physically feasible discrete solution. In this paper, we propose an alpha-order objective function to capture the strengths of both the linear and quadratic objective functions. We demonstrate that our approach yields improved spectral placements. We also present a bottom-up clustering algorithm which iteratively collapses pairs of nodes in a graph using local and global connectivity information, where the global connectivity information is derived from the clustering property of the eigenvector approach. The effect of our new spectral linear placement and clustering approach is demonstrated on benchmark circuits from MCNC.
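
For readers who want the three objectives side by side, one way to write them in our own notation (the exact formulation and weighting used in the paper may differ) is, for a one-dimensional placement x_1, ..., x_n with net weights w_ij:

    \Phi_1(x)      = \sum_{i<j} w_{ij}\,|x_i - x_j|                      % linear
    \Phi_2(x)      = \sum_{i<j} w_{ij}\,(x_i - x_j)^2                    % quadratic
    \Phi_\alpha(x) = \sum_{i<j} w_{ij}\,|x_i - x_j|^{\alpha},  1 \le \alpha \le 2   % alpha-order

Choosing alpha between 1 and 2 trades the wirelength fidelity of the linear form against the spreading (sparse-placement) tendency of the quadratic form.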

6.3 Characterization and Parameterized Random Generation of Digital Circuits
Michael Hutton, J.P. Grossman, Jonathan Rose, Derek Corneil

The development of new Field-Programmed, Mask-Programmed and Laser-Programmed Gate Array architectures is hampered by the lack of realistic test circuits that exercise both the architectures and their automatic placement and routing algorithms. In this paper, we present a method and a tool for generating parameterized and realistic random circuits. To obtain the realism, we propose a set of graph-theoretic characteristics that describe a physical netlist, and have built a tool that can measure these characteristics on existing circuits. The generation tool uses the characteristics as constraints in the random circuit generation. To validate the quality of the generated netlists, parameters that are not specified in the generation are compared with those of real circuits, and with those of "random" graphs.

6.4 A Probability-Based Approach to VLSI Circuit Partitioning
Shantanu Dutt, Wenyong Deng

Iterative-improvement 2-way min-cut partitioning is an important phase in most circuit partitioning tools. Most iterative improvement techniques for circuit netlists, like the Fiduccia-Mattheyses (FM) method, compute the gains of nodes using local netlist information that is only concerned with the immediate improvement in the cut-set. This can lead to misleading gain calculations. Krishnamurthy suggested a lookahead (LA) gain calculation method to ameliorate this situation; however, as we show, it leaves considerable room for improvement. We present here a probabilistic gain computation approach called PROP that is capable of capturing the global and future implications of moving a node at the current time. Experimental results show that for the same number of runs, PROP performs much better than FM (by about 30%) and LA (by about 27%), and is also better than many recent state-of-the-art clustering-based partitioners like EIG1, WINDOW, MELO and PARABOLI by 15% to 57%. We also show that the space and time complexities of PROP are very reasonable. Our empirical timing results reveal that it is appreciably faster than the above clustering-based techniques, and only a little slower than FM and LA, both of which are very fast.


SESSION 7 TRENDS IN VERIFICATION

Chair: James Rowson
Organizers: N. Collins, R. Goering

Verification is one of the most important design tasks. New techniques are constantly emerging to address different portions of the broad verification area. Following a short tutorial, panelists will discuss how some new techniques contrast and how they fit with existing tools.

7.1 Tutorial: Verification of Electronic Systems
Alberto L. Sangiovanni-Vincentelli, Patrick C. McGeer, Alexander Saldanha

The complexity of electronic systems is rapidly reaching a point where it will be impossible to verify correctness of the design without introducing a verification-aware discipline in the design process. Even though computers and design tools have made important advances, the use of these tools in the commonly practiced design methodology is not enough to address the design correctness problem since verification is almost always an after-thought in the mind of the designer. A design methodology should on one hand put to good use all techniques and methods developed thus far for verification, from formal verification to simulation, from visualization to timing analysis, but should also have specific conceptual devices for dealing with correctness in the face of complexity such as:

  • Formalization, which consists of capturing the design and its specification in an unambiguous formal language with precise semantics.
  • Abstraction, which eliminates details that are of no importance when checking whether a design satisfies a particular property.
  • Decomposition, which consists of breaking the design at a given level of the hierarchy into components that can be designed and verified almost independently.

Panel: Hot New Trends in Verification
Panelists: Anant Agarwal, Willis Hendley, Isadore Katz, Don McInnis, Patrick Scaglia, Alex Silbey

Verification is an increasingly important design activity. As gate counts grow larger and integrated circuit complexity increases, design challenges are greater than ever. Traditional verification approaches - logic simulation, specifically - aren't effective or particularly accurate. For example, the time and effort required to perform functional and timing simulation is making it impractical. In addition, designers are utilizing more and more existing intellectual property, which enables design reuse and helps reduce implementation time. Verifying existing intellectual property creates another problem - the large number of simulation vectors.


SESSION 8 SPECIALIZED DESIGN TECHNIQUES FOR SPEED AND POWER

Chair: Scott Nance
Organizers: D. Stark, B. Frye

Power and speed considerations are critical in the design process. This session offers new techniques and actual design examples for post-layout timing simulation, verifying critical path delays, and designing low-voltage/low-power circuits.

8.1 Design Considerations and Tools for Low-Voltage Digital System Design
A.P. Chandrakasan, Isabel Yang, Carlin Vieri, Dimitri Antoniadis

Aggressive voltage scaling to 1V and below through technology, circuit, and architecture optimization has been proven to be the key to ultra low-power design. The key technology trends for low-voltage operation are presented including low-threshold devices, multiple-threshold devices, and SOI and bulk CMOS based variable threshold devices. The requirements on CAD tools that allow designers to choose and optimize various technology, circuit, and system parameters are also discussed.

8.2 VAMP: A VHDL Based Concept for Accurate Modeling and Post Layout Timing Simulation of Electronic Systems
Bernhard Wunder, Gunther Lehmann, K.D. Müller-Glaser

This paper presents a new concept for accurate modeling and timing simulation of electronic systems integrated in a typical VHDL design environment, taking into account the requirements of deep submicron technology. Contrary to conventional concepts, autonomous models for gates and interconnections are used. A piece-wise-linear signal representation makes it possible to model waveform-dependent effects. Furthermore, the gate models capture pattern dependencies, and the interconnection models take into account post-layout information, including branching structures, different layers and even contact holes.

8.3 A Systematic Technique for Verifying Critical Path Delays in A 300 MHz Alpha CPU Design Using Circuit Simulation
Madhav P. Desai, Y.T. Yen

A static timing verifier is an important tool in the design of a complex high performance VLSI chip such as an Alpha CPU. A timing verifier uses a simple and pessimistic delay model to identify critical failing paths in the design, which then need to be fixed. However, the pessimistic delay model results in a large number of correct paths being identified as failing paths, possibly leading to wasted design resources. Therefore, each critical path identified by the timing verifier needs to be analyzed using a circuit simulator such as SPICE in order to confirm that it is a real failure. Setting up such a simulation is complex, especially when the critical path consists of structures appearing in a datapath of the CPU. In this paper, we present algorithms for the construction of a model for simulating the maximum delay through a critical path. This technique has been used to analyze several critical paths during the design of a 300MHz Alpha CPU.


SESSION 9 TEST AND FAULT TOLERANCE IN HIGH LEVEL SYNTHESIS

Chair: C. Papachristou
Organizers: K. Wakabayashi, R. Camposano

As synthesis migrates toward higher levels, test and fault tolerance issues are being considered earlier in the design process. This session presents novel approaches to self-diagnosis, reconfiguration and test resource estimation at a high level.

9.1 Tutorial: High-Level Synthesis for Testability: A Survey and Perspective
Kenneth D. Wagner, Sujit Dey

We review behavioral and RTL test synthesis and synthesis for testability approaches that generate easily testable implementations. We also include an overview of high-level synthesis techniques to assist high-level ATPG.

9.2 Introspection: A Low Overhead Binding Technique During Self-Diagnosing Microarchitecture Synthesis
Balakrishnan Iyer, Ramesh Karri

Introspection, a zero-overhead binding technique for self-diagnosing microarchitecture synthesis, is presented. Given a scheduled control data flow graph (CDFG), introspective binding exploits the spare computation and data transfer capacity in a synergistic fashion to achieve low latency fault diagnostics with near zero area overhead without compromising performance. The resulting on-chip fault latencies are one ten-thousandth (10^-4) of previously reported system level diagnostic techniques. A novel feature of the proposed technique is the use of spare data transfer capacity in the interconnect network for diagnostics.

9.3 Lower Bounds on Test Resources for Scheduled Data Flow Graphs
Ishwar Parulkar, Sandeep K. Gupta, Melvin A. Breuer

Lower bound estimations of resources at various stages of high-level synthesis are essential to guide synthesis algorithms towards optimal solutions. In this paper we present lower bounds on the number of test resources (i.e. test pattern generators, signature analyzers and CBILBO registers) required to test a synthesized data path using built-in self-test (BIST). The estimations are performed on scheduled data flow graphs and provide a practical way of selecting or modifying module assignments and schedules such that the resulting synthesized data path requires a small number of test resources to test itself.


SESSION 10 ISSUES IN DISCRETE SIMULATION

Chair: Jay Lawrence
Organizers: R. McGeer, K. Sakallah

This session covers a full range of issues in Discrete Simulation. The first paper examines a number of issues in mixed-mode co-simulation through a software backplane. The second paper looks at an interesting issue in 4-valued simulation: the distinction between the unstable and unknown states. The final two papers introduce methods to compact vector sets for power simulation.

10.1 Symphony: A Simulation Backplane for Parallel Mixed-Mode Co-Simulation of VLSI Systems
Antonio R.W. Todesco, Teresa H-Y. Meng

In this paper we present an integrated simulation paradigm in which parallel mixed-mode co-simulation is accomplished by integrating sequential simulators in a software simulation backplane. Distributed conservative event-driven scheduling is used in combination with an efficient deadlock-free mechanism for handling synchronous feedback circuits. Simulation concurrency can be further increased by exploiting circuit pipelining. The resulting parallel simulation backplane is capable of concurrently simulating systems at circuit, switch, gate, RTL and behavioral levels. We implemented this parallel mixed-mode simulator on both the iPSC/860 message-passing machine and the DASH shared-memory multiprocessor. Experimental results are presented.

10.2 Oscillation Control in Logic Simulation Using Dynamic Dominance Graphs
Peter Dahlgren

Logic-level modeling of asynchronous circuits in the presence of races frequently gives rise to oscillation. A new method for solving oscillation occurring in feedback loops (FLs) is presented. First, a set of graph traversal algorithms is used to locate the FLs and order them with respect to a dominance relation. Next, a sequence of resimulations with the feedback vertices forced into stable states is performed. The proposed method can handle noncritical races occurring in asynchronous circuits and has applications in feedback bridging fault simulation.
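
As an illustration of the first step only (locating the feedback loops), the sketch below treats the circuit as a directed gate graph and reports its non-trivial strongly connected components, one standard way of finding feedback structure. It is not the dominance-ordering algorithm of the paper, and the netlist shown is hypothetical; every gate is assumed to appear as a key in the graph.

    from collections import defaultdict

    def feedback_loops(gates):
        """Locate feedback loops as non-trivial strongly connected components
        of the gate connection graph (Kosaraju's algorithm). `gates` maps each
        gate name to the list of gates it drives."""
        order, seen = [], set()

        def dfs(g, graph, out):
            seen.add(g)
            for nxt in graph[g]:
                if nxt not in seen:
                    dfs(nxt, graph, out)
            out.append(g)

        for g in list(gates):           # first pass: finish-time ordering
            if g not in seen:
                dfs(g, gates, order)

        rev = defaultdict(list)         # reversed graph
        for g, outs in gates.items():
            for nxt in outs:
                rev[nxt].append(g)

        seen.clear()
        sccs = []
        for g in reversed(order):       # second pass on the reversed graph
            if g not in seen:
                comp = []
                dfs(g, rev, comp)
                sccs.append(comp)
        # Only components containing a cycle (size > 1, or a self-loop) are feedback loops.
        return [c for c in sccs if len(c) > 1 or c[0] in gates[c[0]]]

    # Hypothetical cross-coupled pair plus acyclic logic.
    netlist = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": []}
    print(feedback_loops(netlist))      # -> [['g2', 'g1']] (order may vary)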

10.3 Compact Vector Generation for Accurate Power Simulation
Shi-Yu Huang, Kuang-Chien Chen, Kwang-Ting Cheng, Mike Tien-Chien Lee

Transistor-level power simulators are widely used to estimate the power dissipation of a CMOS circuit. These tools strike a good balance between conventional transistor-level simulators, such as SPICE, and logic-level power estimators with regard to accuracy and speed. However, it is still too time-consuming to run these tools for large designs: to simulate one million functional vectors for a 50K-gate circuit, these power simulators may take months to complete. In this paper, we propose an approach to generate a compact set of vectors that can mimic the transition behavior of a much larger set of functional vectors, which is given by the designer or extracted from application programs. This compact set of vectors can then replace the functional vectors for power simulation to reduce the simulation time while still retaining a high degree of accuracy. We present experimental results to show the efficiency and accuracy of this approach.

10.4 Improving The Efficiency of Power Simulators by Input Vector Compaction
Chi-ying Tsui, Radu Marculescu, Diana Marculescu, Massoud Pedram

Accurate power estimation is essential for low power digital CMOS circuit design. Power dissipation is input pattern dependent. To obtain an accurate power estimate, a large input vector set must be used which leads to very long simulation time. One solution is to generate a compact vector set that is representative of the original input vector set and can be simulated in a reasonable time. In this paper, we propose an input vector compaction technique that preserves the statistical properties of the original sequence. Experimental results show that a compaction ratio of 100X is achieved with less than 2% average error in the power estimates.
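
A toy sketch of the general idea follows; it is illustrative only and is not the published technique. It greedily picks a short subsequence whose per-bit signal and toggle statistics stay close to those of the full functional sequence; the sizes, candidate-sampling heuristic, and random data are made up for illustration.

    import numpy as np

    def transition_stats(vectors):
        """Per-bit signal probability and toggle (transition) probability
        of a 0/1 vector sequence with shape (T, n_bits)."""
        v = np.asarray(vectors)
        signal = v.mean(axis=0)
        toggles = np.abs(np.diff(v, axis=0)).mean(axis=0)
        return signal, toggles

    def greedy_compact(vectors, target_len, rng=None):
        """Greedily build a short sequence whose statistics stay close to
        the full sequence's (illustrative only, not the published method)."""
        rng = np.random.default_rng(rng)
        v = np.asarray(vectors)
        ref_sig, ref_tog = transition_stats(v)
        chosen = [v[0]]
        while len(chosen) < target_len:
            # Try a few random candidates and keep the one that best preserves stats.
            best, best_err = None, np.inf
            for idx in rng.integers(1, len(v), size=16):
                trial = np.array(chosen + [v[idx]])
                sig, tog = transition_stats(trial)
                err = np.abs(sig - ref_sig).sum() + np.abs(tog - ref_tog).sum()
                if err < best_err:
                    best, best_err = v[idx], err
            chosen.append(best)
        return np.array(chosen)

    # Hypothetical usage: 10,000 random functional vectors compacted to 100.
    full = (np.random.default_rng(0).random((10_000, 8)) < 0.3).astype(int)
    compact = greedy_compact(full, target_len=100, rng=1)
    print(transition_stats(full)[1].round(2), transition_stats(compact)[1].round(2))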


SESSION 11 ISSUES IN DESIGN ENVIRONMENTS

Chair: Michaela Guiney
Organizers: D. Ku, R.A. Rutenbar

Design environments are central to the success of any complex design project. This session focuses on efficient inter-tool communication, strategies to manage design disciplines and workflows, and quantitative evaluation of the design process.

11.1 Efficient Communication in a Design Environment
Idalina Videira, Paulo J.E. Verissimo, M. Helena Sarmento

This paper presents a new communication service. The novelty of the work resides in the distributed architecture adopted which is based on communication agents in every tool and in every host of the design environment. The importance of the work is demonstrated by the results achieved: improved performance, reduced network traffic and fault-tolerance to host and network failures.

11.2 A Description Language for Design Process Management
Peter R. Sutton, Stephen W. Director

A language for defining design discipline characteristics is proposed. Design discipline characteristics such as abstraction levels, design object classifications and decompositions, design objectives, and design methodologies can be defined in a simple machine and human readable form. The language, DDDL, has led to the development of a user interface for Minerva II, a design process management tool, which, when configured with a design discipline description, can be used to aid teams of designers in the solution of complex multi-disciplinary design problems. DDDL and user interface examples are presented.

11.3 Improved Tool and Data Selection in Task Management
John W. Hagerman, Stephen W. Director

Task management involves task creation and execution. These are facilitated using a task schema as exemplified in the Hercules Task Manager. Experience with Hercules has shown the task schema to be very useful for task creation, but less than ideal for task resolution, i.e., the selection of tool and data resources to be used in execution. Tool/data interactions often lead to resource selection constraints that cannot be captured using dependency relationships in the schema. We have addressed this by adding conditions to the task schema which use task-level meta-data to constrain resource selection. With examples we show that conditions are useful for handling a wide variety of real constraints.

11.4 Application of a Markov Model to the Measurement, Simulation, and Diagnosis of an Iterative Design Process
Eric W. Johnson, Luis Castillo, Jay B. Brockman

This paper presents the use of a Markov-based model for analyzing iterative design processes. Techniques are developed for collecting process metadata and calibrating the model. An experiment is described that demonstrates the utility and accuracy of the model for simulating design processes and identifying design process bottlenecks.


SESSION 12 PANEL: GEARING UP FOR THE TECHNOLOGY EXPLOSION

Chair: Gary Smith
Organizer: M. Kenefick
Panelists: Walt Davis, Glenn House, Kurt Keutzer, Jim Pena, Craig Peterson, Lawrence Rubin, Jim Solomon

Over the next decade it is predicted that demand for advanced electronics equipment will create a deficit in the fab capacity, engineering talent, and CAD tools needed for high-abstraction, large-gate-count, deep-submicron designs. The use of compact electronics in consumer products is becoming pervasive, and to meet the demand, fabrication capacity and human resources must be increased to enable the design and manufacture of new semiconductor chips. A key component of this technology explosion is understanding technology trends, where to invest, how to gear up new designs quickly, and how to re-use key design blocks to cut design time. This panel will explore technology trends with key technologists from the user and design automation industries.


SESSION 13 TUTORIAL: THE SPICE FET MODELS: PITFALLS AND PROSPECTS (Are You An Educated Model Consumer ?)

Chair: Daniel Foty
Organizer: J. Cooley
Presenter: Daniel Foty

If you're a digital design engineer who needs to get reacquainted with SPICE in order to do sub-micron design, then you should attend this tutorial! The tutorial will discuss FET modeling using several popular flavors of SPICE. Benefits and pitfalls of different SPICE flavors will be highlighted.


SESSION 14 COMBINATIONAL LOGIC SYNTHESIS I

Chair: Gary D. Hachtel
Organizers: S. Malik, R. McGeer

This session starts off with a one hour embedded tutorial describing what it takes to design a practical logic synthesis system. This will provide the background needed for the subsequent papers in logic synthesis. The second paper presents exciting new results on the binate covering problem which has several applications in logic synthesis.

14.1 Tutorial: Design of a Logic Synthesis System
Richard Rudell

Logic synthesis systems are complex systems and algorithmic research in synthesis has become highly specialized. This creates a gap where it is often not clear how an advance in a particular algorithm translates into a better synthesis system. This tutorial starts by describing a set of constraints which synthesis algorithms must satisfy to be useful. A small set of established techniques are reviewed relative to these criteria to understand their applicability and the potential for further research in these areas.

14.2 On Solving Covering Problems
Olivier Coudert

The set covering problem and the minimum cost assignment problem (respectively known as unate and binate covering problem) arise throughout the logic synthesis flow. This paper investigates the complexity and approximation ratio of two lower bound computation algorithms from both a theoretical and practical point of view. It also presents a new pruning technique that takes advantage of the partitioning.


SESSION 15 PATTERN GENERATION FOR TEST AND DIAGNOSIS

Chair: Janusz Rajski
Organizers: S. Kundu, Y. Zorian

This session presents new, more effective algorithms to generate tests that guarantee complete diagnosis of multiple interconnect faults. Boolean satisfiability forms the basis of a new formulation and more efficient algorithms for generating tests for path delay faults. Also presented are new compaction techniques for test sequences of sequential circuits, which significantly reduce test application time and memory requirements.

15.1 A New Complete Diagnosis Patterns for Wiring Interconnects
Sungju Park

It is important to test the various kinds of interconnect faults between chips on a card/module. When boundary scan design techniques are adopted, the chip to chip interconnection test generation and application of test patterns is greatly simplified. Various test generation algorithms have been developed for interconnect faults. A new interconnect test generation algorithm is introduced. It reduces the number of test patterns by half over present techniques. It also guarantees the complete diagnosis of multiple interconnect faults.

15.2 A Satisfiability-Based Test Generation for Path Delay Faults in Combinational Circuits
Chih-Ang Chen, Sandeep K. Gupta

This paper describes a new formulation to generate robust tests for path delay faults in combinational circuits based on Boolean satisfiability. Conditions to detect a target path delay fault are represented by a Boolean formula. Unlike the technique described in [16], which extracts the formula for each path delay fault, the proposed formulation needs to extract the formula only once for each circuit cone. Experimental results show tremendous time saving on formula extraction compared to other satisfiability-based ATPG algorithms. This also leads to low test generation time, especially for circuits that have many paths but few outputs. The proposed formulation has also been modified to generate other types of tests for path delay faults.

15.3 On Static Compaction of Test Sequences for Synchronous Sequential Circuits
Irith Pomeranz, Sudhakar M. Reddy

We propose three static compaction techniques for test sequences of synchronous sequential circuits. We apply the proposed techniques to test sequences generated for benchmark circuits by various test generation procedures. The results show that the test sequences generated by all the test generation procedures considered can be significantly compacted. The compacted sequences thus have shorter test application times and smaller memory requirements. As a by-product, the fault coverage is sometimes increased as well. More importantly, the ability to significantly reduce the length of the test sequences indicates that it may be possible to reduce test generation time if superfluous input vectors are not generated.


SESSION 16 CAD FOR ANALOG AND MIXED SIGNAL ICs

Chair: James Spoto
Organizers: R.A. Rutenbar, J. White

Analog and mixed-signal designs continue to present serious challenges to existing tools. This session offers new techniques for high-level analog synthesis, behavioral modeling, and performance-oriented layout.

16.1 An O(n) Algorithm for Transistor Stacking with Performance Constraints
Bulent Basaran, Rob A. Rutenbar

We describe a new constraint-driven stacking algorithm for diffusion area minimization of CMOS circuits. It employs an Eulerian trail finding algorithm that can satisfy analog-specific performance constraints. Our technique is superior to other published approaches both in terms of its time complexity and in the optimality of the stacks it produces. For a circuit with n transistors, the time complexity is O(n). All performance constraints are satisfied and, for a certain class of circuits, optimum stacking is guaranteed.
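
For readers unfamiliar with the graph-theoretic core, the sketch below finds an Euler trail over a diffusion graph using Hierholzer's algorithm, where each edge is a transistor and consecutive edges share diffusion. It ignores the paper's analog performance constraints, makes no claim about matching its O(n) construction, and the example netlist is hypothetical.

    from collections import defaultdict

    def euler_trail(edges):
        """Hierholzer's algorithm: return a trail using every edge exactly once,
        or None if no Euler trail exists. Edges are (u, v) pairs of diffusion
        nets; each edge corresponds to one transistor in the stack."""
        adj = defaultdict(list)
        for i, (u, v) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))
        odd = [n for n in adj if len(adj[n]) % 2 == 1]
        if len(odd) not in (0, 2):
            return None                  # no single stack can cover all transistors
        start = odd[0] if odd else next(iter(adj))
        used, stack, trail = [False] * len(edges), [start], []
        while stack:
            u = stack[-1]
            while adj[u] and used[adj[u][-1][1]]:
                adj[u].pop()              # discard edges already traversed
            if adj[u]:
                v, i = adj[u].pop()
                used[i] = True
                stack.append(v)
            else:
                trail.append(stack.pop())
        return trail[::-1]

    # Hypothetical diffusion graph: nodes are source/drain nets, edges are transistors.
    print(euler_trail([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))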

16.2 Use of Sensitivities and Generalized Substrate Models in Mixed-Signal IC Design
Paolo Miliozzi, Iasson Vassiliou, Edoardo Charbon, Enrico Malavási, Alberto Sangiovanni-Vincentelli

A novel methodology for circuit design and automatic layout generation is proposed for a class of mixed-signal circuits in the presence of layout parasitics and substrate induced noise. Accurate and efficient evaluation of the circuit during design is possible by taking into account such non-idealities. Techniques are presented to derive and use a set of constraints on substrate noise and on the geometric instances of the layout. Verification is performed using substrate extraction in combination with parasitic estimation techniques. To show the suitability of the approach, a VCO for a PLL has been designed and implemented in a 1um CMOS technology. The circuit has been optimized both at the schematic and at the layout level for power and performance, while its sensitivity to layout parasitics and substrate noise has been minimized.

16.3 RTL Emulation: The Next Leap in System Verification
Sanjay Sawant, Paul Giordano

The worldwide electronics market is booming, fueled by customers' insatiable appetite for low-cost computers, connectivity and appliances packed with high-technology features. While the sheer number of new design starts may not be noteworthy, new chips and systems are becoming more complex at a phenomenal rate.

16.4 Equation-Based Behavioral Model Generation for Nonlinear Analog Circuits
Carsten Borchers, Lars Hedrich and Erich Barke

A fully automatic method for generating behavioral models for nonlinear analog circuits is presented. This method is based on simplifications of the system of nonlinear differential equations which is derived from a transistor level netlist. Generated models include nonlinear dynamic behavior. They are composed of symbolic equations comprising circuit parameters. Accuracy and simulation speed-up are shown by several examples.


SESSION 17 PANEL: CORE-BASED DESIGN FOR SYSTEM-LEVEL ASICs - WHOSE JOB IS IT?

Chair: Lynn Watson
Organizer: R. Goldman
Panelists: Kim Asal, Andreas Danuser, Chris King, Susan Mason, Wayne Miller, Jim Pena, Scott Runner

Constrained both by the complexity of system-level, core-based design, and by limited engineering resources of the ASIC/semiconductor vendors, today's burgeoning core-based design industry must find new avenues to fulfill its market/revenue potential. This panel explores new roles, relationships and technologies that bridge the gap between the market/system-level expertise of the system vendor, the process/core-based design expertise of the ASIC vendor, and the design tool and methodology expertise of the EDA vendor.


SESSION 18 PANEL: A COMMON STANDARDS ROADMAP

Chair: Alain Hanover
Organizer: J. Smith
Panelists: Rich Goldman, Andy Graham, Randolph E. Harr, Gregory W. Ledenbach, A. Richard Newton, Robert Rozeboom, Tabuchi Kinya

The complexity and design of high performance electronic systems demands highly interoperable EDA design environments. Today we are living with Moore's law, which states "the complexity of technology continues to double every two years". And we are also diving into the world of deep sub-micron design where traditional assumptions about gate and interconnect delays and circuit modeling are constantly challenged.
How are the various consortia working together to define a roadmap that will serve well into the 21st century? This panel is composed of individuals representing organizations that share the goals of facilitating interoperability, reducing implementation costs, and supporting standards. They will discuss technology trends that are driving the standards processes and evaluate the technology roadmap needed for the 21st century.


SESSION 19 COMBINATIONAL LOGIC SYNTHESIS II

Chair: Iris Bahar
Organizers: R. McGeer, S. Malik

Logic synthesis has stabilized somewhat over the classic domains of area and delay minimization of random logic. This leaves open the synthesis of specialized circuits and the use of complex subcircuits. The four papers in this session examine separate aspects of this problem.

19.1 Multilevel Logic Synthesis for Arithmetic Functions
Chien-Chung Tsai, Malgorzata Marek-Sadowska

The arithmetic functions, as a subclass of Boolean functions, have very compact descriptions in terms of the AND and XOR operators; any n-bit adder is a prime example. This paper presents a multilevel logic synthesis method which is particularly suited for arithmetic functions and utilizes their natural representations in the field GF(2). Algebraic factorization is performed to reduce the literal count. A direct translation of the AND/XOR representations of arithmetic functions into multilevel networks often results in excessive area, mainly due to the large area cost of XOR gates. We present a redundancy removal process which reduces many XOR gates to single AND or OR gates without altering the functional behavior of the network. The redundancy removal process requires the simulation of only a small, easily determined set of primary input patterns. Preliminary results show that our method produces circuits, before and after technology mapping, with area improvement averaging 17% when compared to Berkeley SIS 1.2. The run time is reduced by at least 50%. The resulting circuits also have good testability and power consumption properties.

19.2 Synthesis by Spectral Translation Using Boolean Decision Diagrams
Jeffery P. Hansen, Masatoshi Sekine

Many logic synthesis systems are strongly influenced by the size of the SOP (Sum-of-Products) representation of the function being synthesized. Two-level PLA (Programmable Logic Array) synthesis and many multi-level synthesis systems perform poorly without a good SOP representation of the target function. In this paper, we propose a new spectral-based algorithm using BDDs (Boolean Decision Diagrams) to transform the target function into a form that is easier to synthesize by applying a linear filter on the inputs. Using the methods described in this paper, we were able to perform spectral translation on circuits with many more inputs and much larger cube sets than previously possible. This can result in a substantial decrease in delay and area for some classes of circuits.

19.3 Delay Minimal Decomposition of Multiplexers in Technology Mapping
Shashidhar Thakur, D.F. Wong, S. Krishnamoorthy

Technology mapping requires the unmapped logic network to be represented in terms of base functions, usually two-input NORs and inverters. Technology decomposition is the step that transforms arbitrary networks to this form. Typically, such decomposition schemes ignore the fact that certain circuit elements can be mapped more efficiently by treating them separately during decomposition. Multiplexers are one such category of circuit elements. They appear very naturally in circuits, in the form of datapath elements and as a result of synthesis of CASE statements in HDL specifications of control logic. Mapping them using multiplexers in technology libraries has many advantages. In this paper, we give an algorithm for optimally decomposing multiplexers, so as to minimize the delay of the network, and demonstrate its effectiveness in improving the quality of mapped circuits.

19.4 Error Correction Based on Verification Techniques
Shi-Yu Huang, Kuang-Chien Chen, Kwang-Ting Cheng

In this paper, we address the problem of correcting a combinational circuit that is an incorrect implementation of a given specification. Most existing error-correction approaches can only handle circuits with certain types of errors. Here, we propose a general approach that can correct a circuit with multiple errors without assuming any error model. We identify internal equivalent pairs to narrow down the possible error locations using local BDD’s with dynamic support. We also employ a technique called back-substitution to correct the circuit incrementally. This approach can also be used to verify circuit equivalence. The experimental results of correcting fully SIS-optimized benchmark circuits with a number of injected errors will be presented.


SESSION 20 DESIGN FOR TESTABILITY

Chair: Yervant Zorian
Organizers: S. Kundu, J. Rajski

In an era of deep-submicron technology, this session discusses area efficient and timing driven techniques to design testable circuits.

20.1 Layout Driven Selecting and Chaining of Partial Scan Flip-Flops
Chau-Shen Chen, Kuang-Hui Lin and TingTing Hwang

In an era of sub-micron technology, routing is becoming a dominant factor in area, timing, and power consumption. In this paper, we study the problem of selecting and chaining scan flip-flops with the objective of achieving minimum routing area overhead. Most previous work on partial scan has put emphasis on selecting as few scan flip-flops as possible to break all cycles in the S-graph. However, the flip-flops that break more cycles are often the ones that have more fanins and fanouts, and the area adjacent to these nodes is often congested in the layout. Such selections cause layout congestion and increase the number of tracks needed to chain the scan flip-flops. We propose a matching-based algorithm that performs the selecting and chaining of scan flip-flops simultaneously, taking layout information into account. Experimental results show that our approach outperforms the traditional one in final layout area.

20.2 Test Point Insertion: Scan Paths through Combinational Logic
Chih-chang Lin, M. Marek-Sadowska, Kwang-Ting Cheng, Mike Tien-Chien Lee

We propose a low-overhead scan design methodology which employs a new test point insertion technique to establish scan paths through the functional logic. The technique re-uses the existing functional logic; as a result, the design-for-testability (DFT) overhead on area or timing can be minimized. In this paper we show an algorithm which considers the test point insertion for reducing the area overhead for the full scan design. We also discuss its application to timing-driven partial scan design.

20.3 Area Efficient Pipelined Pseudo-Exhaustive Testing with Retiming
Huoy-Yu Liou, Ting-Ting Y. Lin, Chung-Kuan Cheng

Pseudo-exhaustive testing (PET) offers a simple solution to testing complex circuits and systems [12]. However, PET suffers from long test generation times and high test-hardware area overhead. Pipelined pseudo-exhaustive testing (PPET) achieves fast testing with high fault coverage by pipelining test vectors and test responses among partitioned circuit segments [15]. To reduce the hardware overhead of PPET, a novel approach for implementing area-efficient PPET is presented. Circuit partitioning with retiming is used to convert designs for PPET. Experimental results show that this approach exhibits an average of 20% area reduction over non-retimed testable circuits. Our algorithm offers high utilization of existing flip-flops (FFs) and provides a framework for further performance optimization.


SESSION 21 ADVANCES IN ELECTRICAL SIMULATION

Chair: Peter Feldmann
Organizers: J. White, A.T. Yang

The first paper introduces pole analysis via congruence transformation (PACT) as a technique for multiport RC network reduction with the properties of absolute stability and controllable accuracy. The second paper presents a two-phase homotopy technique for obtaining the DC operating point of large, hard-to-solve MOS circuits. The last paper presents a preconditioned recycled Krylov-subspace method to accelerate AC and noise simulation of linear periodically time-varying communication circuits.

21.1 Stable and Efficient Reduction of Large, Multiport RC Networks by Pole Analysis via Congruence Transformations
Kevin J. Kerns, Andrew T. Yang

A novel technique is presented which employs Pole Analysis via Congruence Transformations (PACT) to reduce RC networks in a well-conditioned manner. Pole analysis is shown to be more efficient than Padé approximations when the number of network ports is large, and congruence transformations preserve the passivity (and thus absolute stability) of the networks. Networks are represented by admittance matrices throughout the analysis, and this representation simplifies interfacing the reduced networks with circuit simulators as well as facilitates realization of the reduced networks using RC elements. A prototype SPICE-in, SPICE-out, network reduction CAD tool called RCFIT is detailed, and examples are presented which demonstrate the accuracy and efficiency of the PACT algorithm.

21.2 Homotopy Techniques for Obtaining a DC Solution of Large-Scale MOS Circuits
J.S. Roychowdhury, Robert C. Melville

A new technique for obtaining a DC operating point of large, hard-to-solve MOS circuits is reported in this paper. Based on homotopy, the technique relies on the provable global convergence of arc length continuation and uses a novel method for embedding the continuation parameter into MOS devices. The new embedding circumvents inefficiencies and numerical failures that limit the practical applicability of previous simpler embeddings. Use of the technique in a production environment has led to the routine solution of large, previously hard-to-solve circuits.

21.3 Efficient AC and Noise Analysis of Two-Tone RF Circuits
Ricardo Telichevesky, Kenneth S. Kundert, Jacob White

In this paper we present a preconditioned recycled Krylov-subspace method to accelerate a recently developed approach for ac and noise analysis of linear periodically-varying communication circuits. Examples are given to show that the combined method can be used to analyze switching filter frequency response, mixer 1/f noise frequency translation, and amplifier intermodulation distortion. In addition, it is shown that for large circuits the pre-conditioned recycled Krylov-subspace method is up to forty times faster than the standard optimized direct methods.


SESSION 22 MIXED SIGNAL DESIGN

Chair: Stephan Ohr
Organizers: S. Napper, R.A. Rutenbar

Mixed signal design is extremely difficult and has not benefited much from tool advances. After a short tutorial on analog synthesis techniques, a panel will address the state of mixed signal design and what is necessary for real design problems.

22.1 Tutorial: Synthesis Tools for Mixed Signal ICS: Progress on Frontend and Backend Strategies
L. Richard Carley, Georges G.E. Gielen, Rob A. Rutenbar, Willy M.C. Sansen

Digital synthesis tools such as logic synthesis and semi-custom layout have dramatically changed both the frontend (specification to netlist) and backend (netlist to mask) steps of the digital IC design process. In this tutorial, we look at the last decade’s worth of progress on analog circuit synthesis and layout tools. We focus on the frontend and backend of analog and mixed-signal IC design flows. The tutorial summarizes the problems for which viable solutions are emerging, and those which are still unsolved.

Panel: Mixed Signal Designs: Are There Solutions Today?
Panelists: Ariel Cao, Georges Gielen, Felicia James, Rob A. Rutenbar, Baker P. Scott, David Squires

While it is now possible to obtain real productivity enhancements using EDA tools to synthesize digital logic, the same productivity is hardly available for analog circuit designers. Under the best of circumstances, mixed analog-digital tools can provide a rough outline for the construction of new devices and circuits. More likely, the user of commercial tools will need to perform a considerable amount of "tuning-and-tweaking" just to get them to perform properly. Yet, the prospect for tool development remains hopeful. In this combination tutorial and panel discussion, a mixture of university researchers, tool developers, and users provide insight into the current state of the art in EDA tools for mixed analog-and-digital design.


SESSION 23 FUNCTIONAL VERIFICATION OF MICROPROCESSORS

Chair: Rajesh Raina
Organizers: N. Weste, P. Duncan

With microprocessors reaching multi-million transistor levels of integration, verifying that a design is functionally correct has become a critical problem. Papers in this session explore different avenues of functional verification.

23.1 Code Generation and Analysis for the Functional Verification of Microprocessors
Anoosh Hosseini, Dimitrios Mavroidis, Pavlos Konas

A collection of code generation tools which assist designers in the functional verification of high performance microprocessors is presented. These tools produce interesting test cases by using a variety of code generation methods including heuristic algorithms, constraint-solving systems, user-provided templates, and pseudo-random selection. Run-time analysis and characterization of the generated programs provide an evaluation of their effectiveness in verifying a microprocessor design, and suggest improvements to the code generation process. An environment combining the code generation tools with the analysis tools has been developed, and it has provided excellent functional coverage for several generations of high-performance microprocessors.
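
As a rough flavor of template-driven pseudo-random generation (two of the methods mentioned above), the sketch below expands tiny hand-written instruction templates with randomly chosen registers; the templates, register names, and assembly syntax are invented for illustration only.

    # Minimal sketch of template-plus-random test generation for processor
    # verification.  Everything here (templates, registers, syntax) is hypothetical.
    import random

    REGS = [f"r{i}" for i in range(8)]

    TEMPLATES = [                          # templates a verifier might bias generation toward
        ["load  {rd}, 0({ra})",
         "add   {rd}, {rd}, {rb}"],        # load-use dependency
        ["mul   {rd}, {ra}, {rb}",
         "beq   {rd}, {ra}, +8"],          # result feeding a branch
    ]

    def expand(template, rng):
        binding = {k: rng.choice(REGS) for k in ("rd", "ra", "rb")}
        return [line.format(**binding) for line in template]

    rng = random.Random(1996)
    program = []
    for _ in range(4):
        program += expand(rng.choice(TEMPLATES), rng)
    print("\n".join(program))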

23.2 Innovative Verification Strategy Reduces Design Cycle Time for High-End Sparc Processor
Val Popescu, Bill McNamara

Superscalar processor developers are creatively leveraging best-in-class design verification tools to meet narrow market windows. Accelerated simulation is especially useful owing to its flexibility for verifying at many points during the design cycle. A unique "verification backplane" makes continuous verification at any level(s) of abstraction available to each design team member throughout the design cycle.

23.3 Hardware Emulation for Functional Verification of K5
Gopi Ganapathy, Ram Narayan, Glenn Jorden, Denzil Fernandez, Ming Wang, Jim Nishimura

The K5 (TM) microprocessor is a 4 million transistor, superscalar, X86 microprocessor. Because the K5 (TM) microprocessor is an AMD original design, verifying compatibility with the existing X86 architecture and software is crucial to its success in the marketplace. The X86 architecture has been constantly evolving over several years without any published specification. The primary mechanism for functional design verification of an X86 processor is simulation. The ability to execute a good sample set of X86 software on a model of the processor architecture before tapeout is key to achieving very high confidence in first silicon. The Quickturn Hardware Emulation system allows us to map a model of the design onto hardware resources and execute it at high speeds. In this paper we present the emulation methodology that was jointly developed for the K5 and applied successfully to meet our functional verification goals.

23.4 Functional Verification Methodology for the PowerPC 604(TM) Microprocessor
James Monaco, David Holloway and Rajesh Raina

Functional (i.e., logic) verification of the current generation of complex, super-scalar microprocessors such as the PowerPC 604(TM) microprocessor presents significant challenges to a project's verification participants. Simple architectural level tests are insufficient to gain confidence in the quality of the design. Detailed planning must be combined with a broad collection of methods and tools to ensure that design defects are detected as early as possible in a project's life-cycle. This paper discusses the methodology applied to the functional verification of the PowerPC 604 microprocessor.

23.5 I'm Done Simulating; Now What? Verification Coverage Analysis and Correctness Checking of the DECchip 21164 Alpha Microprocessor
Michael Kantrowitz, Lisa M. Noack

Digital's Alpha-based DECchip 21164 processor was verified extensively prior to fabrication of silicon. This simulation-based verification effort used implementation-directed, pseudorandom exercisers, which were supplemented with implementation-specific, hand-generated tests. Special emphasis was placed on the tasks of checking for correct operation and functional coverage analysis. Coverage analysis shows where testing is incomplete, under the assumption that untested logic often contains bugs. Correctness checkers are various mechanisms (applied both during and after simulation) that monitor a test to determine whether it was successful. This paper details the coverage analysis and correctness checking techniques that were used. We show how our methodology and its implementation were successful, and we discuss the reasons why this methodology allowed several minor bugs to escape detection until the first prototype systems were available. These bugs were corrected before any chips were shipped to customers.


SESSION 24 HIGH LEVEL POWER OPTIMIZATION

Chair: David Knapp
Organizers: R. Camposano, R. Walker

Power optimization can be addressed in multiple ways at the high level. This session presents several interesting approaches: glitch minimization reports a power reduction of 20%, clocking schemes allow power savings up to 50%, transformations in linear systems reduce power 3 to 15 times and scheduling techniques can save up to 40% in power. Furthermore, reducing switching activity not only minimizes energy dissipation, but maximizes the MTF due to electromigration as well.

24.1 Glitch Analysis and Reduction in Register Transfer Level Power Optimization
Anand Raghunathan, Sujit Dey, Niraj K. Jha

We present design-for-low-power techniques based on glitch reduction for register-transfer level circuits. We analyze the generation and propagation of glitches in both the control and data path parts of the circuit. Based on the analysis, we develop techniques that attempt to reduce glitching power consumption by minimizing generation and propagation of glitches in the RTL circuit. Our techniques include restructuring multiplexer networks (to enhance data correlations, eliminate glitchy control signals, and reduce glitches on data signals), clocking control signals, and inserting selective rising/falling delays. Our techniques are suited to control-flow intensive designs, where glitches generated at control signals have a significant impact on the circuit's power consumption, and multiplexers and registers often account for a major portion of the total power. Application of the proposed techniques to several examples shows significant power savings, with negligible area and delay overheads.

24.2 An Effective Power Management Scheme for RTL Design Based on Multiple Clocks
C. Papachristou, M. Spinning, M. Nourani

This paper presents an effective technique of low power design for RTL circuits and microarchitectures. The basis of this technique is: a) to use a multiple clocking scheme of n non-overlapping clocks, by dividing the frequency f of a single clock into n cycles; b) to partition the circuit into disjoint modules and assign each module to a distinct clock with frequency f/n. However, the overall effective frequency of the circuit remains f, the single-clock frequency. The results show that our multiple clocking scheme provides more effective power management (power savings up to 50%) at the RTL in comparison to conventional power management techniques based on gated clocks.
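
For a back-of-the-envelope sense of where such savings come from, the sketch below applies the usual dynamic-power relation P ~ alpha*C*V^2*f to the clock inputs of n modules clocked at f versus f/n. All numbers are invented and this is not the paper's power model; it only illustrates the arithmetic.

    # Rough comparison of clock-input switching power: every module clocked at f
    # versus each module clocked at f/n by one of n non-overlapping phases.
    alpha, V = 0.5, 3.3                  # activity factor, supply voltage (invented)
    f = 100e6                            # single-clock frequency (Hz)
    n = 4                                # number of non-overlapping clock phases
    C_module = 20e-12                    # clocked capacitance per module (F, invented)

    p_single = n * alpha * C_module * V**2 * f          # all modules see clock f
    p_multi  = n * alpha * C_module * V**2 * (f / n)    # each module sees f/n

    print(f"single clock: {p_single*1e3:.2f} mW, multi-clock: {p_multi*1e3:.2f} mW")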

24.3 Power Optimization in Programmable Processors and ASIC Implementations of Linear Systems: Transformation-based Approach
Mani Srivastava, Miodrag Potkonjak

Linear computations form an important class of computation that is widely used in DSP and communications. We introduce two approaches for power minimization in linear computations using transformations. First we show how unfolding combined with the procedure for maximally fast implementation of linear computations reduces power in single-processor and multiprocessor implementations by factors of 2.2 and 8, respectively. To accomplish this we exploit a newly identified property of unfolding whereby, as a linear system is unfolded, the number of operations per sample at first decreases to reach a minimum and then begins to rise. For custom ASIC implementations even higher improvements are achievable using the second transformational approach, which builds upon the unfolding-based strategy of the first approach. We developed a method that combines the multiple constant multiplication (MCM) technique with the generalized Horner's scheme and unfolding in such a way that power is minimized.
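
As a rough illustration of the kind of arithmetic restructuring involved (not the paper's power-directed combination of MCM, the generalized Horner scheme, and unfolding), the sketch below counts multiplications for naive versus Horner-style polynomial evaluation.

    # Horner's scheme evaluates a degree-n polynomial with ~n multiplies instead of
    # the ~n^2/2 multiplies of recomputing each power from scratch.
    def poly_naive(coeffs, x):
        mults, total = 0, 0.0
        for i, c in enumerate(coeffs):
            term = c
            for _ in range(i):           # build x^i term by term
                term *= x
                mults += 1
            total += term
        return total, mults

    def poly_horner(coeffs, x):
        mults, acc = 0, 0.0
        for c in reversed(coeffs):       # (((c_n x + c_{n-1}) x + ...) x + c_0)
            acc = acc * x + c
            mults += 1
        return acc, mults

    c = [3.0, -1.0, 2.0, 0.5, 4.0]
    print(poly_naive(c, 1.1), poly_horner(c, 1.1))   # same value, fewer multiplies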

24.4 Scheduling Techniques to Enable Power Management
José Monteiro, Srinivas Devadas, Pranav Ashar, Ashutosh Mauskar

"Shut-down" techniques are effective in reducing the power dissipation of logic circuits. Recently, methods have been developed that identify conditions under which the output of a module in a logic circuit is not used for a given clock cycle. When these conditions are met, input latches for that module are disabled, thus eliminating any switching activity and power dissipation. In this paper, we introduce these power management techniques in behavioral synthesis. We present a scheduling algorithm which maximizes the “shut-down period of execution units in a system. Given a throughput constraint and the number of execution units available, the algorithm first schedules operations that generate controlling signals and activates only those modules whose result is eventually used. We present results which show that this scheduling technique can save up to 40% in power dissipation.

24.5 Electromigration Reliability Enhancement Via Bus Activity Distribution
Aurobindo Dasgupta, Ramesh Karri

Electromigration induced degradation in integrated circuits has been accelerated by continuous scaling of device dimensions. We present a methodology for synthesizing high-reliability and low-energy microarchitectures at the RT level by judiciously binding and scheduling the data transfers of a control data flow graph (CDFG) representation of the application onto the buses in the microarchitecture. The proposed method accounts for correlations between data transfers and the constraints on the number of buses, area and delay.


SESSION 25 3-D PARASITIC EXTRACTION

Chair: Andrew T. Yang
Organizers: J. White, A. Yang

In deep submicron designs, three-dimensional parasitics and electromagnetic coupling are critical effects to be modeled and managed. This session offers a cross-section of the latest numerical techniques for efficient, accurate extraction of these effects.

25.1 A Sparse Image Method for BEM Capacitance Extraction
Byron Krauter, Yu Xia, Aykut Dengi, Lawrence T. Pileggi

Boundary element methods (BEM) are often used for complex 3-D capacitance extraction because of their efficiency, ease of data preparation, and automatic handling of open regions. BEM capacitance extraction, however, yields a dense set of linear equations that makes solving via direct matrix methods such as Gaussian elimination prohibitive for large problem sizes. Although iterative, multipole-accelerated techniques have produced dramatic improvements in BEM capacitance extraction, accurate sparse approximations of the electrostatic potential matrix are still desirable for the following reasons. First, the corresponding capacitance models are sufficient for a large number of analysis and design applications. Moreover, even when the utmost accuracy is required, sparse approximations can be used to precondition iterative solution methods. In this paper, we propose a definition of electrostatic potential that can be used to formulate sparse approximations of the electrostatic potential matrix in both uniform and multilayered planar dielectrics. Any degree of sparsity can be obtained, and unlike conventional techniques which discard the smallest matrix terms, these approximations are provably positive definite for the troublesome cases with a uniform dielectric and without a groundplane.
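
For context, the sketch below sets up the dense potential-matrix formulation that such sparse approximations target: a single square plate is cut into panels, the free-space kernel fills P, and P q = v is solved for the panel charges. The point-collocation kernel and square-panel self term are standard textbook approximations, not the paper's construction.

    # Minimal dense BEM capacitance sketch for one square conductor in free space.
    import numpy as np

    eps0 = 8.854e-12                          # permittivity of free space (F/m)
    m = 8                                     # m x m panels on a 1 m x 1 m plate
    a = 1.0 / m                               # panel side length
    xs = (np.arange(m) + 0.5) * a
    cx, cy = np.meshgrid(xs, xs)
    centers = np.column_stack([cx.ravel(), cy.ravel()])
    n = m * m

    # Potential matrix: P[i, j] = potential at panel i per unit charge on panel j.
    diff = centers[:, None, :] - centers[None, :, :]
    r = np.sqrt((diff ** 2).sum(-1))
    with np.errstate(divide="ignore"):
        P = 1.0 / (4.0 * np.pi * eps0 * r)
    np.fill_diagonal(P, np.log(1.0 + np.sqrt(2.0)) / (np.pi * eps0 * a))  # square-panel self term

    q = np.linalg.solve(P, np.ones(n))        # all panels held at 1 V
    print(f"self-capacitance ~ {q.sum() * 1e12:.1f} pF")   # roughly 40 pF for a 1 m square plate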

25.2 A Parallel Precorrected FFT Based Capacitance Extraction Program for Signal Integrity Analysis
N.R. Aluru, V.B. Nadkarni, J. White

In order to optimize interconnect to avoid signal integrity problems, very fast and accurate 3-D capacitance extraction is essential. Fast algorithms, such as the multipole or precorrected Fast Fourier Transform (FFT) accelerated methods in programs like FASTCAP, must be combined with techniques to exploit the emerging cluster-of-workstation based parallel computers like the IBM SP2. In this paper, we examine parallelizing the precorrected FFT algorithm for 3-D capacitance extraction and present several algorithms for balancing workload and reducing communication time. Results from a prototype implementation on an eight processor IBM SP2 are presented for several test examples, and the largest of these examples achieves nearly linear parallel speed-up.

25.3 Multipole Accelerated Capacitance Calculation for Structures with Multiple Dielectrics with High Permittivity Ratios
Johannes Tausch, Jacob White

This paper describes a new boundary integral formulation for the three-dimensional capacitance calculation of structures with multiple dielectrics. Unlike the existing equivalent-charge formulation, the new approach allows accurate numerical approximations when the permittivity ratios of the dielectrics are large. A multipole-accelerated algorithm based on this approach is described, and numerical results of its implementation are presented.

25.4 Fast Parameters Extraction of General Three-Dimension Interconnects Using Geometry Independent Measured Equation of Invariance
Weikai Sun, Wayne Wei-Ming Dai and Wei Hong

The Measured Equation of Invariance (MEI) is a new concept in computational electromagnetics. It has been demonstrated that the MEI technique can be used to terminate the meshes very close to the object boundary and still strictly preserve the sparsity of the FD equations. Therefore, the final system matrix encountered by MEI is a sparse matrix with size similar to that of integral equation methods. However, complicated Green's functions and disagreeable Sommerfeld integrals make the traditional MEI very difficult, if not impossible, to apply to the analysis of multilayer and multiconductor interconnects. In this paper, we propose the Geometry Independent MEI (GIMEI), which substantially improves the original MEI method. We use GIMEI for capacitance extraction of general three-dimensional VLSI interconnect. Numerical results are in good agreement with published data and those obtained by using FASTCAP [1], while GIMEI is generally an order of magnitude faster than FASTCAP and uses significantly less memory.

25.5 Efficient Full-Wave Electromagnetic Analysis Via Model-Order Reduction of Fast Integral Transforms
Joel R. Phillips, Eli Chiprout, David D. Ling

An efficient full-wave electromagnetic analysis tool would be useful in many aspects of engineering design. Development of integral-equation based tools has been hampered by the high computational complexity of dense matrix representations and difficulty in obtaining and utilizing the frequency-domain response. In this paper we demonstrate that an algorithm based on application of a novel model-order reduction scheme directly to the sparse model generated by a fast integral transform has significant advantages for frequency- and time-domain simulation.


SESSION 26 ROUTING OPTIMIZATION FOR PERFORMANCE

Chair: M. Marek-Sadowska
Organizers: A.B. Kahng, Y.-L. Lin

This session presents techniques for topological design, buffer sizing and wire sizing targeted towards performance, power, area and skew optimization of clock and signal routing.

26.1 Useful-Skew Clock Routing with Gate Sizing for Low Power Designs
Joe G. Xi, Wayne Wei-Ming Dai

Instead of zero-skew or assuming a fixed skew bound, we seek to produce useful skews in clock routing. This is motivated by the fact that negative skew may allow a larger timing budget for gate sizing. We construct a useful-skew tree (UST) such that the total clock and logic power (measured as a cost function) is minimized. Given a required clock period and feasible gate sizes, a set of negative and positive skew bounds is generated. The allowable skews within these bounds and the feasible gate sizes form the feasible solution space of our problem. We use a merging segment perturbation procedure and a simulated annealing approach to explore various tree configurations. This is complemented by a bi-partitioning heuristic to generate an appropriate connection topology and take advantage of useful skews. Experimental results have shown 11% to 22% total power reduction over previous methods of clock routing with zero skew or a single fixed skew bound and separately sized logic gates.

26.2 Sizing of Clock Distribution Networks for High Performance CPU Chips
Madhav P. Desai, Radenko Cvijetic, James Jensen

In a high performance microprocessor such as Digital's 300MHz Alpha 21164, the distribution of a high quality clock signal to all regions of the device is achieved using a complex grid with multiple drivers. The large capacitance of this distribution grid together with the high clock frequency results in substantial power dissipation in the chip. In this paper, we describe techniques to size the interconnect segments (thus reducing their capacitance) of the distribution network while meeting certain design goals. These techniques place no restrictions on the topology of the network being sized, and have been successfully used on very large examples.

26.3 New Performance Driven Routing Techniques With Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing
John Lillis, Chung-Kuan Cheng, Ting-Ting Y. Lin, Ching-Yen Ho

We present new algorithms for construction of performance driven Rectilinear Steiner Trees under the Elmore delay model. Our algorithms represent a departure from previous approaches in that we derive an explicit area/delay trade-off curve. We achieve this goal by limiting the solution space to the set of topologies induced by a permutation on the sinks of the net. This constraint allows efficient identification of optimal solutions while still providing a rich solution space. We also incorporate simultaneous wire sizing. Our technique consistently produces topologies equaling the performance of previous approaches with substantially less area overhead.
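
For reference, the sketch below evaluates the Elmore delay of a small RC routing tree, the delay model assumed by the algorithms above; the topology and parasitic values are invented.

    # Elmore delay of an RC tree: for each sink, sum over the source-to-sink path
    # of (branch resistance) * (total capacitance downstream of that branch).
    # Each node maps to (parent, branch resistance to parent in ohms, node cap in F).
    tree = {
        "src":    (None,  0.0,  0.0),
        "n1":     ("src", 50.0, 20e-15),
        "sink_a": ("n1",  80.0, 40e-15),   # load capacitance folded into node cap
        "sink_b": ("n1",  60.0, 30e-15),
    }

    def downstream_cap(node):
        """Capacitance at node plus everything below it."""
        total = tree[node][2]
        for child, (parent, _, _) in tree.items():
            if parent == node:
                total += downstream_cap(child)
        return total

    def elmore_delay(sink):
        delay, node = 0.0, sink
        while tree[node][0] is not None:
            parent, r_edge, _ = tree[node]
            delay += r_edge * downstream_cap(node)
            node = parent
        return delay

    for sink in ("sink_a", "sink_b"):
        print(sink, f"{elmore_delay(sink) * 1e12:.1f} ps")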

26.4 Constructing Lower and Upper Bounded Delay Routing Trees Using Linear Programming
Jaewon Oh, Iksoo Pyo, Massoud Pedram

This paper presents a new approach for solving the Lower and Upper Bounded delay routing Tree (LUBT) problem using linear programming. LUBT is a Steiner tree rooted at the source node such that delays from the source to sink nodes lie between the given lower and upper bounds. We show that our proposed method produces minimum cost LUBT for a given topology under a linear delay model. Unlike recent works which control only the difference between the maximum and the minimum source-sink delay, we construct routing trees which satisfy distinct lower and upper bound constraints on the source-sink delays. This formulation exploits all the flexibility that is present in low power and high performance clock routing tree design.

26.5 Fast Performance-Driven Optimization for Buffered Clock Trees Based on Lagrangian Relaxation
Chung-Ping Chen, Yao-Wen Chang, D.F. Wong

Delay, power, skew, area, and sensitivity are the most important concerns in current clock-tree design. We present in this paper an algorithm for simultaneously optimizing the above objectives by sizing wires and buffers in clock trees. Our algorithm, based on the Lagrangian relaxation method, can optimally minimize delay, power, and area simultaneously with very low skew and sensitivity. With overall linear storage and linear runtime per iteration, our algorithm is extremely economical, fast, and accurate; for example, our algorithm can solve a 6201-wire-segment clock-tree problem using about one minute of runtime and 1.3 MB of memory, and still achieve picosecond precision, on an IBM RS/6000 workstation.


SESSION 27 TUTORIAL: HOW TO WRITE AWK AND PERL SCRIPTS TO ENABLE YOUR EDA TOOLS TO WORK TOGETHER

Chair: Shankar Hemmady
Organizer: J. Cooley
Presenters: Robert C. Hutchins, Shankar Hemmady

This tutorial introduces the Awk, Nawk, and Perl scripting languages, and shows how they can be used to massage a Synopsys Verilog netlist to allow simulation using Cadence's Verilog-XL.


SESSION 28 FUNCTIONAL VERIFICATION TECHNIQUES

Chair: Neil Weste
Organizers: B. Frye, D. Stark

With the increasing complexity of systems, determining that a given chip or collection of chips performs the required function has become an increasingly acute problem. Thus, functional verification has become a critical design activity. This session contains three papers that describe a variety of verification techniques designed to prove that the functionality of a given system is indeed that indicated in the original specification.

28.1 The Automatic Generation of Functional Test Vectors for Rambus Design
K.D. Jones, J.P. Privitera

We present a method for the automatic generation of test vectors for functional verification, giving the advantages of random and directed testing. We show the use of a formal specification as input to a test generator. We present techniques for the efficient implementation of the generator. We discuss our experience with this method applied to commercial designs. We show how our approach is a stepping stone towards practical formal verification.

28.2 Functional Verification Methodology of Chameleon Processor
F. Casaubleih, A. McIsaac, F. Rocheteau, M. Bartley, M. Benjamin, M. Belhadj, J. Eggleton, F. Pogodalla, G. Barrett, G. Mas, C. Berthet

Functional verification of the new generation microprocessor developed by SGS-THOMSON Microelectronics makes extensive use of advanced technologies. This paper presents a global overview of the methodology and focuses on three main aspects:

  • Use of acceleration and emulation technologies for the verification of the VHDL specification in the early stages of the design.
  • Development and use of sequential verification methods built upon a commercially available formal proof tool.
  • Extensive use of combinational proof for circuit-level verification, in conjunction with transistor abstraction.

28.3 Experience in Designing A Large-scale Multiprocessor Using Field-Programmable Devices and Advanced CAD Tools
S. Brown, N. Manjikian, Z. Vranesic, S. Caranci, S. Grbic, R. Grindley, M. Gusat, K. Loveless, Z. Zilic, and S. Srbljic

This paper provides a case study that shows how a demanding application stresses the capabilities of today's CAD tools, especially in the integration of products from multiple vendors. We relate our experiences in the design of a large, high-speed multiprocessor computer using state-of-the-art CAD tools. All logic circuitry is targeted to field-programmable devices (FPDs). This choice amplifies the difficulties associated with achieving a high-speed design, and places extra requirements on the CAD tools. Two main CAD systems are discussed in the paper: Cadence Logic Workbench (LWB) is employed for board-level design, and Altera MAX+plusII is used for implementation of logic circuits in FPDs. Each of these products is of great value to our project, but the integration of the two is less than satisfactory. The paper describes a custom procedure that we developed for integrating sub-designs realized in FPDs (via MAX+plusII) into our board-level designs in LWB. We also discuss experiences with Logic Modelling Smart Models for simulation of FPDs and other types of chips.


SESSION 29 POWER ESTIMATION

Chair: Lawrence T. Pileggi
Organizers: K. Sakallah, P. McGeer

Low power design has become a major concern for portable and mobile electronic devices. The papers in this session propose a variety of approaches for power estimation that aim at satisfying the twin goals of accuracy and efficiency.

29.1 Power Estimation of Cell-Based CMOS Circuits
Alessandro Bogliolo, Luca Benini, Bruno Riccó

PPP is a Web-based simulation and synthesis environment for low-power design. In this paper we describe the gate-level simulation engine of PPP, which achieves accuracy always within 6% of SPICE while keeping performance competitive with traditional gate-level simulation. This is done by using advanced symbolic models of the basic library cells that exploit physical understanding of the main power-consuming phenomena. VERILOG-XL is used as the simulation platform to maintain compatibility with design environments. The Web-based interface allows the user to remotely access and execute the simulator using his or her own Web browser, without the need for any software installation.

29.2 A New Hybrid Methodology for Power Estimation
David Ihsin Cheng, Kwang-Ting Cheng, Deborah C. Wang, Malgorzata Marek-Sadowska

In this paper, we propose a hybrid approach for estimating the switching activities of the internal nodes in logic circuits. The new approach combines the advantages of the simulation-based techniques and the probability-based techniques. We use the user-specified control sequence for simulation and treat the weakly correlated data inputs using the probabilistic model. The new approach, on one hand, is more accurate than the probabilistic approaches because the strong temporal and spatial correlations among control inputs are well taken into consideration. On the other hand, the new approach is much more efficient than the simulation-based approaches because the weakly correlated data inputs are not explicitly simulated. In addition, we also propose a heuristic that builds BDDs in terms of internal nodes such that large circuits can be handled. Extensive experimental results are presented to show the effectiveness and efficiency of our algorithms.
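
As a rough illustration of the purely probabilistic side that the hybrid method improves upon, the sketch below propagates signal probabilities through a tiny gate netlist under independence assumptions and derives switching activity as 2p(1-p); the netlist and probabilities are invented.

    # Propagate static "probability of logic 1" through gates assuming spatial and
    # temporal independence, then estimate switching activity per signal.
    GATES = [                      # (output, gate type, inputs), in topological order
        ("n1", "AND", ("a", "b")),
        ("n2", "OR",  ("n1", "c")),
    ]
    prob = {"a": 0.5, "b": 0.3, "c": 0.2}      # invented primary-input probabilities

    for out, kind, ins in GATES:
        p = [prob[i] for i in ins]
        if kind == "AND":
            prob[out] = p[0] * p[1]
        elif kind == "OR":
            prob[out] = 1.0 - (1.0 - p[0]) * (1.0 - p[1])

    for sig, p in prob.items():
        activity = 2.0 * p * (1.0 - p)         # expected transitions per cycle (independence assumed)
        print(f"{sig}: P(1)={p:.3f}, switching activity={activity:.3f}")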

29.3 A Statistical Approach to the Estimation of Delay-Dependent Switching Activities In CMOS Combinational Circuits
Yong Je Lim, Kyung-Im Son, Heung-Joon Park, Mani Soma

This paper describes a new procedure for estimating the delay-dependent switching activities in CMOS combinational circuits. The procedure is based on analytic and statistical techniques to take advantage of their time-efficiency over conventional logic simulators. Combinational circuits driven by synchronized logic signals are considered as application targets and the statistical properties of logic signals and circuit parameters are defined and evaluated. The experimental result on benchmark circuits shows the significant time efficiency of the proposed procedure.


SESSION 30 OPTIMIZATION OF SEQUENTIAL CIRCUITS

Chair: Gary D. Hachtel
Organizers: F. Somenzi, B. Lin

Optimization of sequential circuits is performed at both the structural and the state transition graph level. This session presents new approaches to several optimization tasks: state minimization, redundancy removal, and rectification.

30.1 Engineering Change in a Non-Deterministic FSM Setting
Sunil P. Khatri, Amit Narayan, Sriram C. Krishnan, Kenneth L. McMillan, Robert K. Brayton, A. Sangiovanni-Vincentelli

We propose a new formalism for the Engineering Change (EC) problem in a finite state machine (FSM) setting. Given an implementation that violates the specification, the problem is to alter the behavior of the implementation so that it meets the specification. The implementation can be a pseudo-nondeterministic FSM while the specification may be a nondeterministic FSM. The EC problem is cast as the existence of an appropriate simulation relation from the implementation into the specification. We derive the necessary and sufficient conditions for the existence of a solution to the problem. We synthesize all possible solutions, if the EC is feasible. Our algorithm works in space which is linear, and time which is quadratic, in the product of the sizes of the implementation and specification. Previous formulations of the problem which admit nondeterministic specifications, although more general, lead to an algorithm which is exponential. We have implemented our procedure using Reduced Ordered Binary Decision Diagrams.

30.2 Identifying Sequential Redundancies Without Search
Mahesh A. Iyer, David E. Long, Miron Abramovici

Previous solutions to the difficult problem of identifying sequential redundancy are either based on incorrect theoretical results, or rely on unrealistic simplifying assumptions, or are applicable only to small circuits. In this paper, we show the limitations of the existing definitions of sequential redundancy and introduce a new concept of cycle redundancy as a generalization of the conventional notion of sequential redundancy. We present an efficient algorithm, FIRES, to identify cycle redundancies without search. FIRES does not assume the existence of a global reset nor does it require any state transition information. FIRES has provably polynomial-time complexity and is practical for large circuits. Experimental results on benchmark circuits indicate that FIRES identifies a large number of redundancies. We show that, in general, the redundant faults identified by FIRES are not easy targets for state-of-the-art sequential test generators.

30.3 A Fast State Reduction Algorithm for Incompletely Specified Finite State Machines
Hiroyuki Higuchi, Yusuke Matsunaga

This paper proposes a state reduction algorithm for incompletely specified FSMs. The algorithm is based on iterative improvements. When the number of compatibles is likely to be too large to handle explicitly, they are represented by a BDD. Experimental results are given to demonstrate that the algorithm described here is faster and obtains better solutions than conventional methods.

30.4 Symbolic Optimization of FSM Networks Based on Sequential ATPG Techniques
Fabrizio Ferrandi, Franco Fummi, Enrico Macii, Massimo Poncino, Donatella Sciuto

This paper presents a novel optimization algorithm for FSM networks that relies on sequential test generation and redundancy removal. The implementation of the proposed approach, which is based on the exploitation of input don't care sequences through regular language intersection, is fully symbolic. Experimental results, obtained on a large set of standard benchmarks, improve over the ones of state-of-the-art methods.


SESSION 31 TOPICS IN PHYSICAL DESIGN

Chair: Lou Scheffer
Organizers: A.B. Kahng, A. Domic

Among the new topics introduced in this session: a new partitioning objective for hierarchical designs, datapath layout into FPGA devices, and two theoretical results on floorplan area minimization and performance-driven wiresizing.

31.1 Module Compaction in FPGA-based Regular Datapaths
Andreas Koch

When relying on module generators to implement regular datapaths on FPGAs, the coarse granularity of FPGA cells can lead to area and delay inefficiencies. We present a method to alleviate these problems by compacting adjacent modules using structure extraction, local logic synthesis, and cell replacement. The regular datapath structure is exploited and preserved, achieving faster layouts after shorter tool run-times.

31.2 Network Partitioning into Tree Hierarchies
Ming-Ter Kuo, Lung-Tien Liu, Chung-Kuan Cheng

This paper addresses the problem of partitioning a circuit into a tree hierarchy with an objective of minimizing a global interconnection cost. An efficient and effective algorithm is necessary when the circuit is huge and the tree has many levels of hierarchy. We propose a heuristic algorithm for improving a partition with respect to a given tree structure. The algorithm utilizes the tree hierarchy as an efficient mechanism for iterative improvement. We also extend the tree hierarchy to apply a multi-phase partitioning approach. Experimental results show that the algorithm significantly improves the initial partitions produced by multiway partitioning and by recursive partitioning.

31.3 Efficient Approximation Algorithms for Floorplan Area Minimization
Danny Z. Chen, Xiaobo (Sharon) Hu

Approximation has been shown to be an effective method for reducing the time and space costs of solving various floorplan area minimization problems. In this paper, we present several approximation techniques for solving floorplan area minimization problems. These new techniques enable us to reduce both the time and space complexities of the previously best known approximation algorithms by more than a factor of n and n^2 for rectangular and L-shaped sub-floorplans, respectively (where n is the number of given implementations). The efficiency in the time and space complexities is critical to the applicability of such approximation techniques in floorplan area minimization algorithms. We also give a technique for enhancing the quality of approximation results.

31.4 Optimal Wire-Sizing Formula Under Elmore Delay Model
Chung-Ping Chen, Yao-Ping Chen, D.F. Wong

In this paper, we consider non-uniform wire-sizing. Given a wire segment of length L, let f(x) be the width of the wire at position x, 0 <= x <= L. We show that the optimal wire-sizing function that minimizes the Elmore delay through the wire is f(x) = ae^(-bx), where a > 0 and b > 0 are constants that can be computed in O(1) time. In the case where a lower bound (L > 0) and an upper bound (U > 0) on the wire widths are given, we show that the optimal wire-sizing function f(x) is a truncated version of ae^(-bx) that can also be determined in O(1) time. Our wire-sizing formula can be iteratively applied to optimally size the wire segments in a routing tree.
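
A minimal sketch of the stated sizing function, with placeholder constants (the paper gives closed-form, O(1)-time expressions for a and b in terms of the driver, load, and wire parasitics):

    # Exponentially tapered wire width, clipped to the allowed range [L_w, U_w].
    import math

    a, b = 2.0, 0.8            # placeholder sizing constants (computed in O(1) time in the paper)
    L_w, U_w = 0.6, 1.8        # lower/upper bounds on wire width (same units as f)
    length = 5.0               # wire segment length

    def width(x):
        """Width at position x along the wire, truncated to [L_w, U_w]."""
        return min(U_w, max(L_w, a * math.exp(-b * x)))

    for i in range(6):
        x = length * i / 5
        print(f"x={x:.1f}  width={width(x):.3f}")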


SESSION 32 CONSUMER PRODUCT DESIGN

Chair: Takayasu Sakurai
Organizers: T. Sakurai, S. Trimberger

Recently, LSI devices have been finding more applications in high-volume consumer products. The papers in this session describe the design methods of several consumer products, including a mini disc player, an HDTV decoder, a digital VCR, a TV tuner and a satellite broadcasting receiver. Constraints, issues and solutions for digital and analog design are presented using real industry design examples.

32.1 VLSI Design and System Level Verification for the Mini-Disc
Tetsuya Fujimoto, Takashi Kambe

In this paper, a new method for the design of complex multimedia ASICs is introduced. Using this design method, VLSI with embedded software and high-speed emulators can be developed concurrently. The method has proven to be effective through the actual design of a VLSI chip for audio compression and decompression in a Mini-Disc system.

32.2 Design Methodologies for Consumer-Use Video Signal Processing LSIS
Hisakazu Edamatsu, Satoshi Ikawa, Katsuya Hasegawa

This paper describes the methodologies used to design a Hi-Vision MUSE decoder for Japanese HDTV and codec LSIs for digital VCRs. Since a large amount of input video data is needed to verify the algorithms and logic designs, reducing the verification time is a key issue in designing these LSIs. We describe the methodology used to verify the video signal processing algorithm and that of the physical design.

32.3 Design Methodology for Analog High Frequency ICs
Yasunori Miyahara, Toshitomo Oumi, Seijiro Moriyama

This paper presents a methodology suited for high frequency analog IC design. The use of a top-down method with AHDL for circuit designers is proposed. In order to accelerate the re-use of circuits that were previously designed and validated in other ICs, the authors developed a system that eases the re-use in the top-down design environment. Moreover, a model parameter generation technique for bipolar transistors has been developed and its usefulness has been shown for accurate simulation of high frequency analog ICs.


SESSION 33 TUTORIAL: ISSUES AND ANSWERS IN CAD TOOL INTEROPERABILITY

Chair: Mike Murray
Organizer: M. Murray
Presenters: Mike Murray, Uwe B. Meding, Bill Berg, Yatin Trivedi, Bill McCaffrey, Ted Vucurevich

CAD tool interoperability issues are a recurring impediment to constructing a design methodology, especially if the methodology incorporates point tools from several vendors. Failures in data transfer between tools often arise unexpectedly, and can delay design work until a workaround is developed. This paper examines interoperability issues for four classes of tools: schematic tools, simulation and synthesis tools, IC physical design tools, and workflow management tools. This paper also describes current research in modeling interoperability problems. Using this paper, the reader can develop a checklist of potential interoperability issues in his CAD environment, and address these issues before they cause a design schedule slip.


SESSION 34 HARDWARE-SOFTWARE CO-DESIGN

Chair: Frank Vahid
Organizers: R. Gupta, L. Lavagno

Co-design refers to joint design and optimization of hardware and software in system design. We begin with a tutorial on tools and methodologies for various co-design sub-problems. This is followed by a presentation of the architectural strategies for system co-design. System partitioning, or the determination of the boundary between hardware and software, is the subject of the last paper in this session.

34.1 Tutorial: The Design of Mixed Hardware/Software Systems
Jay K. Adams, Donald E. Thomas

Over the past several years there has been a great deal of interest in the design of mixed hardware/software systems, sometimes referred to as hardware/software co-design or hardware/software co-synthesis. However, although many new design methodologies have taken the name hardware/software co-design, they often do not seem to share much in common with one another. This is partly due to the fact that the problem itself has so many dimensions. This tutorial describes a set of criteria that can be used to compare differing approaches to hardware/software co-design. These criteria are used in the discussion of a number of published hardware/software co-design techniques to illustrate how a wide range of approaches can be viewed within a single framework.

34.2 Constructing Application-Specific Heterogeneous Embedded Architectures from Custom HW/SW Applications
Steven Vercauteren, Bill Lin, Hugo De Man

Deep sub-micron processing technologies have enabled the implementation of new application-specific embedded architectures that integrate multiple software programmable processors (e.g. DSPs, microcontrollers) and dedicated hardware components together onto a single cost-efficient IC. These application-specific architectures are emerging as a key design solution to today's microelectronics design problems, which are being driven by emerging applications in the areas of wireless communication, broadband networking, and multimedia computing. However, the construction of these customized heterogeneous multiprocessor architectures, while ensuring that the hardware and software parts communicate correctly, is a tremendously difficult and highly error-prone task with little or no tool support. In this paper, we present a solution to this embedded architecture co-synthesis problem based on an orchestrated combination of architectural strategies, parameterized libraries, and software tool support.

34.3 A Hardware/Software Partitioning Algorithm for Designing Pipelined ASIPs with Least Gate Counts
Nguyen Ngoc Binh, Masaharu Imai, Akichika Shiomi, Nobuyuki Hikichi

This paper introduces a new HW/SW partitioning algorithm used in automating instruction set processor design for pipelined ASIPs (Application Specific Instruction-set Processors). The partitioning problem is formalized as a combinatorial optimization problem that partitions the operations into hardware and software so that the HW cost (gate count) of the designed pipelined ASIP is minimized under given execution cycle and power consumption constraints. A branch-and-bound algorithm with proposed lower bound functions is used to solve the presented formalization in the PEAS-I system. The experimental results show that the proposed method is effective and efficient.


SESSION 35 TIMING AND POWER

Chair: Andrew T. Yang
Organizers: J. White, A.T. Yang

This session presents some of the recent work on timing/power analysis and simulation. The first paper gives a method for calculating the time-domain response of a finite-length distributed RC line stimulated by a ramp input. Moment-matching approximation methods are employed for PCB delay estimation in the second paper, and for transistor-level timing simulation in the third paper. The last paper presents an electrothermal system to find the steady-state on-chip temperature distribution.

35.1 Analysis of RC Interconnections Under Ramp Input
Andrew B. Kahng, Sudhakar Muddu

We present a general and, in the limit, exact approach to compute the time-domain response for finite-length RC lines under ramp input, by summing distinct diffusions starting at either end of the line. We also obtain analytical expressions for the finite time-domain voltage response for an open-ended finite RC line and for a finite RC line with capacitive load. Delay estimates using our new method are very close to SPICE-computed delays. Finally, we present a general recursive equation for computing the higher-order diffusion components due to reflections at the source and load ends. Future work extends our method to response computations in general interconnection trees by modeling both reflection and transmission coefficients at discontinuities.

35.2 An AWE Technique for Fast Printed Circuit Board Delays
Bernie Sheehan

We present a technique for rapidly calculating printed circuit board (PCB) interconnect delays. Because of ringing, plateaux, and other signal integrity effects, delay estimation with low order macromodels is more difficult for PCB nets than for VLSI interconnect. A moment matching method for interconnect trees is described in which moments are computed by a tree traversal algorithm using ABCD matrices. Moment contributions from distributed transmission lines are calculated directly. This is a simplification compared to methods which must first approximate the line as a lumped RLC ladder; it also makes time of flight extraction trivial. Order descent is used to ensure stability. When applied to a real PCB design, the method was accurate and impressively fast--588 nets, or 4496 delays, in 12 seconds on a PC!

35.3 RC-Interconnect Macromodels for Timing Simulation
Florentin Dartu, Bogdan Tutuianu, Lawrence T. Pileggi

Most timing simulators obtain their efficiency over circuit simulation in terms of explicit integration algorithms that have difficulty handling the stiff RC circuit models which characterize interconnect-dominated paths. In this paper we describe a reduced-order N-port interconnect macromodel for timing simulation. This macromodel is shown to improve the timing simulation efficiency dramatically since it alleviates the stiff circuit problem. Moreover, through its compatibility with the simple timing simulation transistor models, it is shown that this macromodel does not suffer from the dramatic increase in complexity with an increase in the number of ports like circuit simulation.

35.4 iCET: A Complete Chip-Level Thermal Reliability Diagnosis Tool for CMOS VLSI Chips
Yi-Kan Cheng, Chin-Chi Teng, Abhijit Dharchoudhury, Elyse Rosenbaum, Sung-Mo Kang

In this paper, we present the first chip-level electrothermal simulator, iCET. For a given chip layout, packaging material, user-specified input signal patterns, and thermal boundary conditions, it automatically finds the CMOS on-chip steady-state temperature profile and the resulting circuit performance. iCET has been tested on several circuits and it can efficiently analyze layouts containing tens of thousands of transistors on a desktop workstation.


SESSION 36 VERIFICATION OF SEQUENTIAL SYSTEMS

Chair: Randal E. Bryant
Organizers: F. Somenzi, A. Kuehlmann

This session presents various approaches to verify sequential systems by exploiting structural properties. Two papers address the verification of pipelines; one deals with multiprocessors; and the last one is concerned with embedded systems.

36.1 Techniques for Verifying Superscalar Microprocessors
Jerry R. Burch

Burch and Dill [3] described an automatic method for verifying a pipelined processor against its instruction set architecture (ISA). We describe three techniques for improving this method. We show how the combination of these techniques allows for the automatic verification of the control logic of a pipelined, superscalar implementation of a subset of the DLX architecture.

36.2 A Scalable Formal Verification Methodology for Pipelined Microprocessors
Jeremy Levitt, Kunle Olukotun

We describe a novel, formal verification technique for proving the correctness of a pipelined microprocessor that focuses specifically on pipeline control logic. We iteratively deconstruct a pipeline by merging adjacent pipeline stages, allowing for the verification to be done in several easier steps. We present an inductive proof methodology that verifies that pipeline behaviour is preserved as the pipeline depth is reduced via deconstruction; this inductive approach is less sensitive to pipeline depth and complexity than previous approaches. Invariants are used to simplify the proof, and datapath components are abstracted using validity checking with uninterpreted functions. We present experimental results from the formal verification of a DLX five-stage pipeline using our technique.

36.3 State Reduction Using Reversible Rules
C. Norris Ip, David L. Dill

We reduce the state explosion problem in automatic verification of finite-state systems by automatically collapsing subgraphs of the state graph into abstract states. The key idea of the method is to identify state generation rules that can be inverted. It can be used for verification of deadlock-freedom, error and invariant checking and stuttering-invariant CTL model checking.

36.4 Formal Verification of Embedded Systems based on CFSM Networks
Felice Balarin, Harry Hsieh, Attila Jurecska, Luciano Lavagno, Alberto Sangiovanni-Vincentelli

Both timing and functional properties are essential to characterize the correct behavior of an embedded system. Verification is in general performed either by simulation or by bread-boarding. Given the safety requirements of such systems, a formal proof that the properties are indeed satisfied is highly desirable. In this paper, we present a formal verification methodology for embedded systems. The formal model for the behavior of the system used in POLIS is a network of Codesign Finite State Machines. This model is translated into automata and verified using automata-theoretic techniques. An industrial embedded system is verified using the methodology. We demonstrate that abstractions and separation of timing and functionality are crucial for the successful use of formal verification for this example. We also show that in POLIS abstractions and separation of timing and functionality can be done by simple syntactic modification of the representation of the system.


SESSION 37 PANEL: ELECTRONIC CONNECTIVITY + EDA DATA = ELECTRONIC COMMERCE

Chair: Sean Murphy
Organizers: S. Murphy
Panelists: Jeff Allison, Jake Karrfalt, Michael McClure, Preston Roper, Dennis Wilson

EDA tools and EDA users are increasingly looking to the Internet for library information, bug fixes, and sending final design data electronically to ASIC foundries and board houses. This panel explores the impact the Internet has made on current electronic design practices and how you as a designer can benefit.


SESSION 38 EXPERIENCE WITH HIGH LEVEL SYNTHESIS

Chair: Rajiv Jain

Organizers: S. Trimberger, P. Duncan

The scope of design domains targeted by high level design techniques has grown rapidly in recent years. These papers present real-life experiences with high level design tools in varied application areas. Find out which techniques match your high level design problems.

38.1 Combined Control Flow Dominated and Data Flow Dominated High-Level Synthesis
E. Berrebi, P. Kission, S. Vernalde, S. De Troch, J.C. Herluison, J. Fréhel, A.A. Jerraya, I. Bolsens

This paper presents the design of a Videophone Coder-Decoder Motion Estimator using two High-Level Synthesis tools. Indeed, the combination of a Control Flow Dominated part (a complex communication protocol) with a Data Flow Dominated part (high throughput computations) makes this circuit difficult to synthesize with a single HLS tool. The combination of two HLS tools for the high-level design of this operator required the definition of a sophisticated design flow allowing mixed-level and multi-language simulations. When compared to design starting from RTL specifications, HLS induces only a negligible area overhead of 5%, and provides gains in description length (divided by 5), design time and flexibility.

38.2 FADIC: Architectural Synthesis Applied in IC Design
J. Huisken, F. Welten

This paper discusses the design of a chip using architectural synthesis. The chip, FADIC, is applied in Digital Audio Broadcasting (DAB) receivers. It shows that architectural synthesis tools are used for the design of new complex applications and that they support the evolutionary development of challenging applications like DAB. It was found that the success of such tools in the design community depends on the way user interaction is supported and stimulated. Fast and accurate feedback from the synthesis tools, in combination with a rich set of hints for the compiler to guide the architecture exploration, are the key issues. It is shown that short time to market is possible for implementations which are an order of magnitude more efficient than alternative implementations on commercially available DSP processors.

38.3 Domain-Specific High-Level Modeling and Synthesis for ATM Switch Design Using VHDL
Mike Tien-Chien Lee, Yu-Chin Hsu, Ben Chen, Masahiro Fujita

This paper presents our experience with domain-specific high-level modeling and synthesis for Fujitsu ATM switch design. We propose a high-level design methodology using VHDL, where ATM switch architectural features are considered during behavior modeling, and a high-level synthesis compiler, MEBS, is prototyped to synthesize the behavior model down to a gate-level implementation. Since the specific ATM switch architecture is incorporated into both modeling and synthesis phases, a high-quality design is efficiently derived. The synthesis results show that, given the design constraints, the proposed high-level design methodology can produce a gate-level implementation by MEBS with about 15% area reduction in a shorter design cycle when compared with manual design.


SESSION 39 ANALYSIS AND COMPILATION FOR EMBEDDED SOFTWARE

Chair: Steve Tjiang
Organizers: L. Lavagno, R. Gupta

Efficient code generation and analysis are essential to improve the design cycle of embedded software. The papers in this session aim at extending the applicability of high-level languages to cost and performance-critical systems. The first two papers deal with expression code generation and with address register usage optimization for DSP applications. The third paper addresses the problem of checking timing constraint satisfiability. The last paper describes cost estimation techniques for synthesized software.

39.1 Using Register-Transfer Paths in Code Generation for Heterogeneous Memory-Register Architectures
Guido Araujo, Sharad Malik, Mike Tien-Chien Lee

In this paper we address the problem of code generation for basic blocks in heterogeneous memory-register DSP processors. We propose a new technique, based on register-transfer paths, that can be used for efficiently dismantling basic block DAGs (Directed Acyclic Graphs) into expression trees. This approach builds on recent results which report an optimal code generation algorithm for expression trees for these architectures. This technique has been implemented and experimentally validated for the TMS320C25, a popular fixed point DSP processor. The results show that good code quality can be obtained using the proposed technique. An analysis of the type of DAGs found in the DSPstone benchmark programs reveals that the majority of basic blocks in this benchmark set are expression trees and leaf DAGs. This leads to our claim that tree based algorithms, like the one described in this paper, should be the technique of choice for basic block code generation with heterogeneous memory-register architectures.

39.2 Address Calculation for Retargetable Compilation and Exploration of Instruction-Set Architectures
Clifford Liem, Pierre Paulin, Ahmed Jerraya

The advent of parallel executing Address Calculation Units (ACUs) in Digital Signal Processor (DSP) and Application Specific Instruction-Set Processor (ASIP) architectures has made a strong impact on an application’s ability to efficiently access memories. Unfortunately, successful compiler techniques which map high-level language data constructs to the addressing units of the architecture have lagged far behind. Since access to data is often the most demanding task in DSP, this mapping can be the most crucial function of the compiler. This paper introduces a new retargetable approach and prototype tool for the analysis of array references and traversals for efficient use of ACUs. The ArrSyn utility is designed to be used either as an enhancement to an existing dedicated compiler or as an aid for architecture exploration.

39.3 Analysis of Operation Delay and Execution Rate Constraints for Embedded Systems
Rajesh K. Gupta

Constraints on the delay and execution rate of operations in an embedded application are needed to ensure its timely interaction with a reactive environment. In this paper, we present a static analysis of the timing constraints satisfiability by a given system design consisting of interacting hardware and software components. We use this analysis to evaluate the effect of individual timing constraints on system design issues, such as the choice of the software runtime system, bounds on loop invocations, and the hardware-software synchronization operations. We show, by example, the use of static analysis techniques in the design of embedded systems.

39.4 Efficient Software Performance Estimation Methods for Hardware/Software Codesign
Kei Suzuki, Alberto Sangiovanni-Vincentelli

The performance estimation of a target system at a higher level of abstraction is very important in hardware/software codesign. In this paper, we focus on software performance estimation, including both the execution time and the code size. We present two estimation methods at different levels of abstraction for use in the POLIS hardware/software codesign system. The experimental results show that the accuracy of our methods is usually within +-20%.


SESSION 40 TIMING MODELING AND OPTIMIZATION

Chair: Andrzej J. Strojwas
Organizers: K. Sakallah, S. Malik

This session focuses on techniques used in high performance design. It examines accurate and efficient techniques for gate and interconnect modeling for high performance circuits as well as a process-tolerant clock skew timing optimization technique.

40.1 An Explicit RC-Circuit Delay Approximation Based on the First Three Moments of the Impulse Response
Bogdan Tutuianu, Florentin Dartu, Lawrence T. Pileggi

Due to its simplicity, the ubiquitous Elmore delay, or first moment of the impulse response, has been an extremely popular delay metric for analyzing RC trees and meshes. Its inaccuracy has been noted, however, and it has been demonstrated that higher order moments can be mapped to dominant pole approximations (e.g. AWE) in the general case. The first three moments can be mapped to a two-pole approximation, but stability is an issue, and even a stable model results in a transcendental equation that must be iteratively evaluated to determine the delay. In this paper we describe an explicit delay approximation based on the first three moments of the impulse response. We begin with the development of a provably stable two-pole transfer function/impedance model based on the first three moments (about s=0) of the impulse response. Then, since the model form is known, we evaluate the delay (any waveform percentage point) in terms of an analytical approximation that is consistently within a fraction of 1 percent of the “exact” solution for this model. The result is an accurate, explicit delay expression that will be an effective metric for high-speed interconnect circuit models.
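
For context, the sketch below performs the generic (not guaranteed-stable) mapping of the first three moments to a two-pole model via moment matching; the example moments are those of H(s) = 1/((1+s)(1+0.5s)), so the recovered poles are exactly -1 and -2. The paper's provably stable model and explicit delay expression build on this kind of construction rather than reproducing it.

    # Map the first three moments of a transfer function to a two-pole model:
    # H(s) ~ (a0 + a1*s) / (1 + b1*s + b2*s^2), matching the series
    # H(s) = m0 + m1*s + m2*s^2 + m3*s^3 + ...
    import numpy as np

    m0, m1, m2, m3 = 1.0, -1.5, 1.75, -1.875      # moments of 1/((1+s)(1+0.5s))

    # Moment-matching conditions on the denominator:
    #   m2 + b1*m1 + b2*m0 = 0
    #   m3 + b1*m2 + b2*m1 = 0
    b1, b2 = np.linalg.solve([[m1, m0], [m2, m1]], [-m2, -m3])
    a0, a1 = m0, m1 + b1 * m0                     # numerator coefficients, if the full model is needed

    poles = np.roots([b2, b1, 1.0])               # roots of b2*s^2 + b1*s + 1
    print("poles:", np.sort(poles))               # [-2., -1.]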

40.2 Modeling the Effects of Temporal Proximity of Input Transitions on Gate Propagation Delay and Transition Time
V. Chandramouli, Karem A. Sakallah

While delay modeling of gates with a single switching input has received considerable attention, the case of multiple inputs switching in close temporal proximity is just beginning to be addressed in the literature. The proximity of input transitions can have a significant effect on the delay and output transition time. The few attempts that have addressed this issue are based on a series-parallel transistor collapsing method that reduces the multi-input gate to an inverter. This limits the technique to CMOS technology. Moreover, none of them discuss the appropriate choice of voltage thresholds for measuring delay in a multi-input gate. In this paper, we first present a method for choosing voltage thresholds for a multi-input gate that ensures a positive value of delay under all input conditions. We next introduce a dual-input proximity model for the case when only two inputs of the gate are switching. We then propose a simple approximate algorithm for calculating the delay and output transition time that makes repeated use of the dual-input proximity model without collapsing the gate into an equivalent inverter. Comparison with simulation results shows that our method performs quite well in practice.

40.3 Optimal Clock Skew Scheduling Tolerant to Process Variations
Jóse Luis Neves, Eby G. Friedman

A methodology is presented in this paper for determining an optimal set of clock path delays for designing high-performance VLSI/ULSI-based clock distribution networks. This methodology emphasizes the use of non-zero clock skew to reduce the system-wide minimum clock period. Although choosing (or scheduling) clock skew values has been previously recognized as an optimization technique for reducing the minimum clock period, difficulty in controlling the delays of the clock paths due to process parameter variations has limited its effectiveness. In this paper the minimum clock period is reduced using intentional clock skew by calculating a permissible clock skew range for each local data path while incorporating process-dependent delay values of the clock signal paths. Graph-based algorithms are presented for determining the minimum clock period and for selecting a range of process-tolerant clock skews for each local data path in the circuit. These algorithms have been demonstrated on the ISCAS-89 suite of circuits. Furthermore, examples of clock distribution networks with intentional clock skew are shown to tolerate worst-case clock skew variations of up to 30% without causing circuit failure while increasing the system-wide maximum clock frequency by up to 20% over zero-skew-based systems.
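
For reference, a standard statement of the permissible skew range (generic notation, not necessarily the paper's): for a local data path from register R_i to register R_f, with skew s_{if} = t_i - t_f defined as the clock arrival time at the launching register minus that at the capturing register,

    T_{\mathrm{hold}} - \bigl( T_{C\text{-}Q}^{\min} + D_{\min}(i,f) \bigr)
    \;\le\; s_{if} \;\le\;
    T_{CP} - \bigl( T_{C\text{-}Q}^{\max} + D_{\max}(i,f) + T_{\mathrm{setup}} \bigr) ,

where T_{CP} is the clock period and D(i,f) is the combinational path delay. The lower bound prevents race conditions (double clocking), the upper bound prevents setup violations (zero clocking), and a process-tolerant schedule keeps each s_{if} away from the boundaries of its permissible range.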


SESSION 41 DECISION DIAGRAMS AND THEIR APPLICATIONS

Chair: Rick Rudell
Organizers: A. Kuehlmann, F. Somenzi

Decision diagrams are used in many contemporary verification techniques. This session presents advances in their basic implementation and two applications to combinational and sequential verification.

41.1 An Efficient Equivalence Checker for Combinational Circuits
Yusuke Matsunaga

This paper describes a novel equivalence checking method for combinational circuits, which utilizes relations among internal signals represented by binary decision diagrams. To verify circuits efficiently, a proper set of mutually independent internal signals must be chosen. A heuristic based on analysis of circuit structure is proposed to select such a set of internal signals. The proposed verifier requires only a minute for equivalence checking of all the ISCAS'85 benchmarks on a SUN-4/10.
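
For background, the core BDD fact such methods build on is canonicity: two combinational functions are equivalent exactly when they reduce to the same BDD node. The sketch below, written against the CUDD package, illustrates only this plain output-to-output check on a made-up example; it is not the paper's internal-signal heuristic, which avoids building monolithic output BDDs.

    /* Minimal BDD equivalence check with CUDD: two structurally different
     * implementations of the same function reduce to the same canonical node.
     * Illustrative sketch only. */
    #include <stdio.h>
    #include "cudd.h"

    int main(void)
    {
        DdManager *mgr = Cudd_Init(0, 0, CUDD_UNIQUE_SLOTS, CUDD_CACHE_SLOTS, 0);
        DdNode *a = Cudd_bddNewVar(mgr);
        DdNode *b = Cudd_bddNewVar(mgr);
        DdNode *c = Cudd_bddNewVar(mgr);

        /* f = a*b + a*c   versus   g = a*(b + c) */
        DdNode *ab = Cudd_bddAnd(mgr, a, b);   Cudd_Ref(ab);
        DdNode *ac = Cudd_bddAnd(mgr, a, c);   Cudd_Ref(ac);
        DdNode *f  = Cudd_bddOr(mgr, ab, ac);  Cudd_Ref(f);

        DdNode *bc = Cudd_bddOr(mgr, b, c);    Cudd_Ref(bc);
        DdNode *g  = Cudd_bddAnd(mgr, a, bc);  Cudd_Ref(g);

        /* Canonicity: equivalence is a pointer comparison. */
        printf("f %s g\n", (f == g) ? "==" : "!=");

        Cudd_RecursiveDeref(mgr, ab);
        Cudd_RecursiveDeref(mgr, ac);
        Cudd_RecursiveDeref(mgr, f);
        Cudd_RecursiveDeref(mgr, bc);
        Cudd_RecursiveDeref(mgr, g);
        Cudd_Quit(mgr);
        return 0;
    }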

41.2 High Performance BDD Package By Exploiting Memory Hierarchy
Jagesh V. Sanghavi, Rajeev K. Ranjan, Robert K. Brayton, Alberto Sangiovanni-Vincentelli

The success of binary decision diagram (BDD) based algorithms for verification depends on the availability of a high performance package to manipulate very large BDDs. State-of-the-art BDD packages, based on the conventional depth-first technique, limit the size of the BDDs because disorderly memory access patterns result in unacceptably high elapsed times when the BDD size exceeds the main memory capacity. We present a high performance BDD package that enables manipulation of very large BDDs by using an iterative breadth-first technique directed towards localizing the memory accesses to exploit the memory system hierarchy. The new memory-oriented performance features of this package are 1) an architecture-independent customized memory management scheme, 2) the ability to issue multiple independent BDD operations (super-scalarity), and 3) the ability to perform multiple BDD operations even when the operands of some BDD operations are the results of other operations yet to be completed (pipelining). A comprehensive set of BDD manipulation algorithms is implemented using the above techniques. Unlike the breadth-first algorithms presented in the literature, the new package is faster than the state-of-the-art BDD package by a factor of up to 1.5, even for BDD sizes that fit within the main memory. For BDD sizes that do not fit within the main memory, a performance improvement of up to a factor of 100 can be achieved.

41.3 Implementation of an Efficient Parallel BDD Package
Tony Stornetta, Forrest Brewer

Large BDD applications push computing resources to their limits. One solution to overcoming resource limitations is to distribute the BDD data structure across multiple networked workstations. This paper presents an efficient parallel BDD package for a distributed environment such as a network of workstations (NOW) or a distributed memory parallel computer. The implementation exploits a number of different forms of parallelism that can be found in depth-first algorithms. Significant effort is made to limit the communication overhead, including a two-level distributed hash table and an uncomputed cache. The package simultaneously executes multiple threads of computation on a distributed BDD.

41.4 Word Level Symbolic Model Checking - Avoiding The Pentium FDIV Error
E.M. Clarke, Manpreet Khaira, Xudong Zhao

The highly publicized division error in the Pentium has emphasized the importance of formal verification of arithmetic circuits. Symbolic model checking techniques based on binary decision diagrams (BDDs) have been successful in verifying control logic. However, the lack of a proper representation for functions that map Boolean vectors into the integers has prevented this technique from being used for verifying arithmetic operations. We have developed a new technique for verifying arithmetic circuits. The new technique, called word-level model checking, has been used successfully to verify circuits for division and square root computation that are based on the SRT algorithm used by the Pentium. The technique makes it possible to handle both the control logic and the data paths in the circuit. The total number of state variables exceeds 600 (which is much larger than any circuit previously handled by other symbolic model checkers).


SESSION 42 FORMAL METHODS

Chair: Carl Pixley
Organizers: B. Frye, N. Weste

Adequate design verification using traditional simulation-based techniques becomes increasingly difficult as design complexity advances. Formal verification techniques promise to prove that a design is correct without exhaustive simulation. Papers in this session describe designer experiences with formal verification techniques.

42.1 Formal Verification of PowerPC(TM) Arrays Using Symbolic Trajectory Evaluation
Manish Pandey, Richard Raimi, Derek L. Beatty, Randal E. Bryant

Verifying memory arrays such as on-chip caches and register files is a difficult part of designing a microprocessor. Current tools cannot verify the equivalence of the arrays to their behavioral or RTL models, nor their correct functioning at the transistor level. It is infeasible to run the number of simulation cycles required, and most formal verification tools break down due to the enormous number of state-holding elements in the arrays. The formal method of symbolic trajectory evaluation (STE) appears to offer a solution, however. STE verifies that a circuit satisfies a formula in a carefully restricted temporal logic. For arrays, it requires only a number of variables approximately logarithmic in the number of memory locations. The circuit is modeled at the switch level, so the verification is done on the actual design. We have used STE to verify two arrays from PowerPC microprocessors: a register file, and a data cache tag unit. The tag unit contains over 12,000 latches. We believe it is the largest circuit in the industry to have been formally verified without abstracting away significant detail. We also describe an automated technique for identifying state-holding elements in the arrays, a technique which should greatly assist the widespread application of STE.

42.2 RuleBase: an Industry-Oriented Formal Verification Tool
Ilan Beer, Shoham Ben-David, Cindy Eisner, Avner Landver

RuleBase is a formal verification tool, developed by the IBM Haifa Research Laboratory. It is the result of three years of experience in practical formal verification of hardware which, we believe, has been a key factor in bringing the tool to its current level of maturity. We present the tool, including several unique features, and summarize our usage experience.

42.3 Bit-Level Analysis of an SRT Divider Circuit
Randal E. Bryant

It is impractical to verify multiplier or divider circuits entirely at the bit-level using ordered Binary Decision Diagrams (BDDs), because the BDD representations for these functions grow exponentially with the word size. It is possible, however, to analyze individual stages of these circuits using BDDs. Such analysis can be helpful when implementing complex arithmetic algorithms. As a demonstration, we show that Intel could have used BDDs to detect erroneous lookup table entries in the Pentium(TM) floating point divider. Going beyond verification, we show that bit-level analysis can be used to generate a correct version of the table.

42.4 Integrating Formal Verification Methods with A Conventional Project Design Flow
Asgeir Th. Eiriksson

We present a formal verification methodology that we have used on a computer system design project. The methodology integrates a temporal logic model checker with a conventional project design flow. The methodology has been used successfully to verify the protocols within a distributed shared memory machine. We consider the following to be the four main benefits of using the model checker. First, it ensures that there exists an accurate high-level machine-readable system specification. Second, it allows high-level system verification early in the design phase. Third, it facilitates equivalence and refinement checking between the high-level specification and the RTL implementation. Finally, and most importantly, it uncovered many protocol specification and RTL implementation problems.


SESSION 43 APPLICATIONS OF HARDWARE/SOFTWARE CODESIGN

Chair: Wayne Wolf
Organizers: D. Stark, N. Weste

The realization of complex systems sometimes calls for methods of simultaneous hardware and software design that go beyond the traditional boundaries separating these two areas. The papers in this session explore efficient codesign and examine its application to real design examples.

43.1 A System Design Methodology for Software/Hardware Co-Development of Telecommunication Network Applications
Bill Lin

In this paper, we describe a system design methodology for the concurrent development of hybrid software/hardware systems for telecom network applications. This methodology is based on the results of an investigation and evaluation of an actual industrial system design application for ATM-based broad-band networks. The aim of this methodology is to provide an integrated design flow from system specification to implementation.

43.2 A Strategy for Real-Time Kernel Support in Application-Specific HW/SW Embedded Architectures
Steven Vercauteren, Bill Lin and Hugo De Man

Heterogeneous embedded multiprocessor architectures are becoming more prominent as a key design solution to today's microelectronics design problems. These application-specific architectures integrate multiple software-programmable processors and dedicated hardware components on a single cost-efficient IC. In contrast to general-purpose computer systems, embedded systems are designed and optimized to provide specific functionality, possibly using a combination of different classes of processors (e.g. DSPs, microcontrollers) from different vendors. While these customized heterogeneous multiprocessor architectures offer designers new possibilities to trade off programmability, there is currently a lack of tools to support the programming of these architectures. In this paper, we consider the problem of providing real-time kernel support for managing the concurrent software tasks that are distributed over a set of processors in an application-specific multiprocessor architecture. This is complementary to current research activities that aim to provide efficient retargetable code generation [8,11,10] for a broad range of embedded processors.

43.3 Software Development in a Hardware Simulation Environment
Benny Schnaider, Einat Yogev

Concurrent verification of hardware and software as part of the development process can shorten the time to market of complex systems. The objective of the Virtual CPU (VCPU) approach is to provide a solution for code development in a simulation environment before the system prototype is ready. The VCPU is an ideal solution for processor-based systems that include multiple new ASICs and boards, in which the hardware, diagnostics and software drivers must be developed concurrently.

43.4 Compiled HW/SW Co-Simulation
Vojin Zivojnovic, Heinrich Meyr

This paper presents a technique for simulating processors and attached hardware using the principle of compiled simulation. Unlike existing in-house and off-the-shelf hardware/software co-simulators, which use interpretive processor simulation, the proposed technique performs instruction decoding and simulation scheduling at compile time. The technique offers up to three orders of magnitude faster simulation. The high speed allows the user to explore algorithms and hardware/software trade-offs before any hardware implementation. In this paper, the sources of the speedup and the limitations of the technique are analyzed and the realization of the simulation compiler is presented.
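
As a rough sketch of the compiled-simulation idea (generic, not the authors' implementation; the toy instruction set and handler names are invented): instead of decoding each target instruction inside the simulation loop, the program is decoded once, before simulation starts, into a table of host-code handlers, so the inner loop only dispatches.

    /* Toy contrast between interpretive and compiled instruction simulation. */
    #include <stdio.h>

    enum { OP_ADD, OP_SUB, OP_NOP };

    typedef struct { int op, rd, rs; } Insn;
    typedef struct { int reg[4]; } Cpu;

    /* Interpretive style: decode on every executed instruction. */
    static void step_interpretive(Cpu *c, const Insn *i)
    {
        switch (i->op) {
        case OP_ADD: c->reg[i->rd] += c->reg[i->rs]; break;
        case OP_SUB: c->reg[i->rd] -= c->reg[i->rs]; break;
        default: break;
        }
    }

    /* Compiled style: decode once, then the run loop just calls handlers. */
    typedef void (*Handler)(Cpu *, const Insn *);
    static void do_add(Cpu *c, const Insn *i) { c->reg[i->rd] += c->reg[i->rs]; }
    static void do_sub(Cpu *c, const Insn *i) { c->reg[i->rd] -= c->reg[i->rs]; }
    static void do_nop(Cpu *c, const Insn *i) { (void)c; (void)i; }

    int main(void)
    {
        Insn prog[] = { {OP_ADD, 0, 1}, {OP_SUB, 0, 2}, {OP_NOP, 0, 0} };
        Handler code[3];
        Cpu cpu  = { {10, 5, 3, 0} };
        Cpu cpu2 = { {10, 5, 3, 0} };

        /* "Compile time": decoding and scheduling done before simulation. */
        for (int k = 0; k < 3; k++)
            code[k] = (prog[k].op == OP_ADD) ? do_add :
                      (prog[k].op == OP_SUB) ? do_sub : do_nop;

        for (int k = 0; k < 3; k++)      /* fast inner loop: dispatch only */
            code[k](&cpu, &prog[k]);

        for (int k = 0; k < 3; k++)      /* interpretive reference run */
            step_interpretive(&cpu2, &prog[k]);

        printf("compiled r0=%d, interpretive r0=%d\n", cpu.reg[0], cpu2.reg[0]);
        return 0;
    }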


SESSION 44 POWER ESTIMATION AND RETIMING

Chair: Bill Lin
Organizers: F. Somenzi, B. Lin

This session covers two topic areas. The first two papers are devoted to power estimation: the first presents a novel technique for power estimation based on stochastic machine synthesis, and the second presents an effective macro-modeling approach to datapath power estimation. The next two papers are devoted to retiming techniques at the architecture level.

44.1 Stochastic Sequential Machine Synthesis Targeting Constrained Sequence Generation
Diana Marculescu, Radu Marculescu, Massoud Pedram

The problem of stochastic sequential machine (SSM) synthesis is addressed and its relationship with the constrained sequence generation problem which arises during power estimation is discussed. In power estimation, one has to generate input vector sequences that satisfy a given statistical behavior (in terms of transition probabilities and correlations among bits) and/or to make these sequences as short as possible so as to improve the efficiency of power simulators. SSMs can be used to solve both problems. Based on Moore-type machines, a general procedure for SSM synthesis is presented, and a new framework for sequence characterization is built to match the designer's needs for sequence generation or compaction. As the results demonstrate, compaction ratios of 1-2 orders of magnitude can be obtained without much loss in accuracy of the total power estimates.

44.2 Energy Characterization based on Clustering
Huzefa Mehta, Robert M. Owens, Mary Jane Irwin

We illustrate a new method to characterize the energy dissipation of circuits by collapsing closely related input transition vectors and energy patterns into capacitive coefficients. Energy characterization needs to be done only once for each module (ALU, multiplier, etc.) in order to build a library of these capacitive coefficients. A direct high-level energy simulator or profiler can then use the library of pre-characterized modules and a sequence of input vectors to compute the total energy dissipation. A heuristic algorithm which performs energy clustering under objective constraints has been devised. The worst-case running time of this algorithm is O(m^3 n), where m is the number of simulation points and n is the number of inputs of the circuit. The designer can experiment with the criterion function by setting the appropriate relative error norms to control the 'goodness' of the clustering algorithm, and the sampling error and confidence level to maintain the sufficiency of representation of each cluster. Experiments on circuits show a significant reduction of the energy table size under a specified criterion function, cluster sampling error and confidence level.
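
One hedged way to read the capacitive-coefficient abstraction (our interpretation, not the paper's exact model): if each cluster k of similar input transitions is assigned an effective coefficient C_k, a high-level profiler that counts N_k occurrences of cluster k in the input trace can charge

    E_{\mathrm{total}} \;\approx\; V_{dd}^{2} \sum_{k} C_k \, N_k ,

with any constant factors (such as the usual 1/2) absorbed into C_k; the clustering criterion then trades table size against the error norms and sampling confidence described above.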

44.3 Architectural Retiming: Pipelining Latency-Constrained Circuits
Soha Hassoun, Carl Ebeling

This paper presents a new optimization technique called architectural retiming which is able to improve the performance of many latency-constrained circuits. Architectural retiming achieves this by increasing the number of registers on the latency-constrained path while preserving the functionality and latency of the circuit. This is done using the concept of a negative register, which can be implemented using precomputation and prediction. We use the name architectural retiming since it both reschedules operations in time and modifies the structure of the circuit to preserve its functionality. We illustrate the use of architectural retiming on two realistic examples and present performance improvement results for a number of sample circuits.

44.4 Optimizing Systems for Effective Block-Processing: The k-Delay Problem
Kumar N. Lalgudi, Marios C. Papaefthymiou, Miodrag Potkonjak

Block-processing is a powerful and popular technique for increasing computation speed by simultaneously processing several samples of data. The effectiveness of block-processing is often reduced, however, due to suboptimal placement of delays in the data flow graph of a computation. In this paper we investigate an application of the retiming transformation for improving the effectiveness of block-processing in computation structures. Specifically, we consider the k-delay problem, in which we wish to retime any given computation so that, for a given integer k, the resulting computation can process k data samples simultaneously in a fully regular manner. Our main contribution is an O(V^3 E + V^4 log V)-time algorithm for the k-delay problem, where V is the number of computation blocks and E is the number of interconnections in the computation.


SESSION 45 TECHNOLOGY DEPENDENT PERFORMANCE DRIVEN SYNTHESIS

Chair: Gabrièle Saucier
Co-Chair: Hamid Savoj
Organizers: M. Pedram, G. Saucier

This session focuses on performance-oriented synthesis that accounts for the target technology. The first three papers deal with LUT-based FPGA mapping for both sequential and combinational circuits. The fourth paper proposes a new algorithm for performance-directed gate selection from an ASIC library. The last paper describes a practical and effective method for post-layout buffer insertion using an accurate wire delay model.

45.1 Optimal Clock Period FPGA Technology Mapping for Sequential Circuits
Peichen Pan, C.L. Liu

In this paper, we study the technology mapping problem for sequential circuits on LUT-based FPGAs. Existing approaches map the combinational logic between flip-flops (FFs) while assuming the positions of the FFs are fixed. We study in this paper a new approach to the problem, in which retiming is integrated into the technology mapping process. We present a polynomial time technology mapping algorithm that can produce a mapping solution with the minimum clock period while assuming FFs can be arbitrarily repositioned by retiming. The algorithm has been implemented. Experimental results on benchmark circuits clearly demonstrate the advantage of our approach. For many benchmark circuits, our algorithm produced mapping solutions with clock periods not attainable by a mapping algorithm based on existing approaches, even when it employs an optimal delay mapping algorithm for combinational circuits.

45.2 Structural Gate Decomposition for Depth-Optimal Technology Mapping in LUT-Based FPGA Design
Jason Cong, Yean-Yow Hwang

In this paper, we study the problem of decomposing gates in fanin-unbounded or K-bounded networks such that the K-input LUT mapping solutions computed by a depth-optimal mapper have minimum depth. We show (1) any decomposition leads to a smaller or equal mapping depth regardless of the decomposition algorithm used, and (2) the problem is NP-hard for unbounded networks when K >= 3 and remains NP-hard for K-bounded networks when K >= 5. We propose a gate decomposition algorithm, named DOGMA, which combines the level-driven node packing technique of Chortle-d and the network-flow-based optimal labeling technique of FlowMap. Experimental results show that networks decomposed by DOGMA allow depth-optimal technology mappers to improve the mapping solutions by up to 11% in depth and up to 35% in area compared to the mapping results of networks decomposed by other existing decomposition algorithms.

45.3 A Boolean Approach to Performance-Directed Technology Mapping for LUT-Based FPGA Designs
Christian Legl, Bernd Wurth, Klaus Eckl

This paper presents a novel, Boolean approach to LUT-based FPGA technology mapping targeting high performance. As the core of the approach, we have developed a powerful functional decomposition algorithm. The impact of decomposition is enhanced by a preceding collapsing step. To decompose functions for small depth and area, we present an iterative, BDD-based variable partitioning procedure. The procedure optimizes the variable partition for each bound set size by iteratively exchanging variables between bound set and free set, and finally selects a good bound set size. Our decomposition algorithm extracts common subfunctions of multiple-output functions, and thus further reduces area and the maximum interconnect lengths. Experimental results show that our new algorithm produces circuits with significantly smaller depths than other performance-oriented mappers. This advantage also holds for the actual delays after placement and routing.

45.4 New Algorithms for Gate Resizing: A Comparative Study
Olivier Coudert, Ramsey Haddad and Srilatha Manne

Gate sizing consists of choosing, for each node of a mapped network, a gate implementation in the library so that some cost function is optimized under some constraints. It has a significant impact on the delay, power dissipation, and area of the final circuit. This paper compares five gate sizing algorithms targeting discrete, non-linear, non-unimodal, constrained optimization. The goal is to overcome the non-linearity and non-unimodality of the delay and the power to achieve good quality results within a reasonable CPU time, e.g., handling a 10,000-node network in 2 hours. We compare the five algorithms on constraint-free delay optimization and delay-constrained power optimization, and show that one method is superior to the others.

45.5 Post-Layout Optimization for Deep Submicron Design
Koichi Sato, M. Kawarabayashi, Hideyuki Emura, Naotaka Maeda

To reduce the number of synthesis and layout iterations, we present a new delay optimization technique, which inserts buffers based on back-annotated detailed routing information. During optimization, inserted buffers are assumed to be placed at appropriate locations on the original wires so as to calculate accurate wire RC delay. With forward-annotated location information for the inserted buffers, the layout system attempts to preserve the patterns of the original wires using the ECO technique. Our experimental results show that this technique, combined with the conventional gate sizing technique, achieves up to 41.2% delay reduction after the initial layout.


SESSION 46 LAYOUT ANALYSIS AND OPTIMIZATION

Chair: Alan Cave
Organizers: Y.-L. Lin, A. Domic

These papers present methods for fast extraction of accurate parasitic parameters, analysis of layout for electromigration reliability, and improvement of manufacturing yield by means of layout spacing.

46.1 Enhanced Network Flow Algorithm for Yield Optimization
Cyrus S. Bamji, Enrico Malavasi

A novel constraint-graph algorithm for the optimization of yield is presented. This algorithm improves the yield of a layout by carefully spacing objects to reduce the probability of faults due to spot defects. White space between objects is removed and spacing in tightly packed areas of the layout is increased. The computationally expensive problem of optimizing yield is transformed into a network flow problem, which can be solved via known efficient algorithms. Yield can be improved either without changing the layout area, or if necessary by increasing the layout area to maximize the number of good chips per wafer. Our method can in theory provide the best possible yield achievable without modifying the layout topology. The method is able to handle a general class of convex objective functions, and can therefore optimize not only yield, but also other circuit performance functions such as wire length, crosstalk and power.

46.2 Hierarchical Electromigration Reliability Diagnosis for VLSI Interconnects
Chin-Chi Teng, Yi-Kan Cheng, Elyse Rosenbaum, Sung-Mo Kang

In this paper, we present a hierarchical reliability-driven CAD system for the design of electromigration resistant circuits. The top of the hierarchy aims at quickly identifying those critical interconnects with potential electromigration reliability problems. Then the detailed electromigration analysis of critical interconnects is carried out by an accurate and computationally efficient simulation tool (iTEM). This top-down approach provides a feasible solution to the complicated electromigration diagnosis problem.

46.3 Using Articulation Nodes to Improve the Efficiency of Finite-Element based Resistance Extraction
A.J. van Genderen, N.P. van der Meijs

In this paper, we describe how we have improved the efficiency of a finite-element method for interconnect resistance extraction by introducing articulation nodes in the finite-element mesh. The articulation nodes are found by detecting equipotential regions and lines in the interconnects. Without generating inaccuracies, these articulation nodes split the finite-element mesh into small pieces that can be solved independently. The method has been implemented in the layout-to-circuit extractor Space. All interconnect resistances of a circuit containing 63,000 transistors are extracted on an HP 9000/735 workstation in approximately 70 minutes.

46.4 Extracting Circuit Models for Large RC Interconnections That Are Accurate up to a Predefined Signal Frequency
P.J.H. Elias, N.P. van der Meijs

This paper presents a technique that transforms large and complex RC networks into much smaller, physically realizable RC networks. These models reflect the transmission behavior of the initial network accurately for frequencies up to a user-defined maximal signal frequency. This technique has been incorporated in a layout-to-circuit extractor, using a scan-line approach. The method guarantees numerical stability and performs excellently in modeling RC interconnects.


SESSION 47 Panel: System Synthesis: Can We Meet The Challenges to Come?

Chair: Robert A. Walker
Organizer: R. Walker
Panelists: Laurent Bergher, Raul Campasano, Daniel D. Gajski, Pierre Paulin, Barry Shackleford, Randy Steck

Over the past few years, the field of system synthesis has begun to emerge, combining high-level synthesis with hardware / software codesign. But as we move toward the next millennium, will system synthesis be enough? Can we develop the necessary tools to design the hottest processor chip in the year 2000, or the next great cellular phone / PDA, or those home multimedia systems that we see looming on the horizon? CAN WE MEET THE CHALLENGES TO COME???


SESSION 48 HARDWARE DESCRIPTION LANGUAGE TECHNIQUES

Chair: Hilary J. Kahn
Organizers: D. Stark, B. Frye

The key to containing the complexity of modern electronic system design lies largely at the door of the popular HDLs. We begin this session with a tutorial that compares and contrasts the two popular HDLs, Verilog and VHDL, and in addition presents examples in the C language. This is followed by a paper that promotes a VHDL coding standard and highlights a VHDL development system that has been used with this coding standard.

48.1 Tutorial: VHDL & Verilog Compared & Contrasted - Plus Modeled Example Written in VHDL, Verilog and C
Douglas J. Smith

This tutorial is in two parts. The first part takes an unbiased view of VHDL and Verilog by comparing their similarities and contrasting their differences. The second part contains a worked example of a model that computes the Greatest Common Divisor (GCD) of two numbers. The GCD is modeled at the algorithmic level in VHDL, Verilog and for comparison purposes, C. It is then shown modeled at the RTL in VHDL and Verilog.
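
For reference, the algorithmic-level behavior being modeled is simply Euclid's algorithm; a minimal C version (our own sketch, not the tutorial's listing) is:

    /* Greatest Common Divisor -- the algorithmic-level behavior the tutorial
     * models in VHDL, Verilog and C.  Generic sketch, not the tutorial's code. */
    #include <stdio.h>

    static unsigned gcd(unsigned a, unsigned b)
    {
        while (b != 0) {          /* Euclid: replace (a, b) with (b, a mod b) */
            unsigned r = a % b;
            a = b;
            b = r;
        }
        return a;
    }

    int main(void)
    {
        printf("gcd(24, 36) = %u\n", gcd(24, 36));   /* prints 12 */
        return 0;
    }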

48.2 VHDL Development System and Coding Standard
Hans Sahm, Claus Mayer, Jörg Pleickhardt, Johannes Schuck, Stefan Späth

With the growing complexity of today's ASICs and the number of designers involved in one VHDL ASIC project, the need for a VHDL development system together with coding rules for simulation and synthesis has emerged. This paper describes the VHDL Coding Standard which has been established and the VHDL development system, including code entry, code formatting, code compliance checkers, data management and multi-user project set-up.


SESSION 49 POWER MINIMIZATION IN IC DESIGN

Chair: Massoud Pedram
Organizers: M. Pedram, F. Somenzi

Low power optimization has emerged as an important goal in IC design. This session presents algorithms for power reduction by discrete gate resizing and post-map resynthesis in combinational circuits and by selective disabling of registers in sequential circuits.

49.1 An Exact Algorithm for Low Power Library-Specific Gate Re-Sizing
De-Shen Chen, Majid Sarrafzadeh

In this paper we examine the problem of reducing the power consumption of a technology mapped circuit under timing constraints. Consider a cell library that contains multiple implementations (cells) of the same Boolean function. We first present an exact algorithm for the problem when a complete library is given; in a complete library, "all" implementations of each cell are present. We then propose an efficient algorithm for the problem when the provided library is not complete.

49.2 Reducing Power Dissipation after Technology Mapping by Structural Transformations
Bernhard Rohfleisch, Alfred Kölbl, Bernd Wurth

Due to the increasing demand for low power circuits, low power dissipation has emerged as an important optimization goal in logic synthesis. In this paper, we show that the power dissipation of technology mapped circuits can be significantly reduced by ATPG-based structural transformations. Our approach performs a sequence of permissible signal substitutions, where each substitution reduces the power consumption of the circuit. Since timing constraints can be considered, we achieve a trade-off between power and delay. The effectiveness of the proposed method is based on two facts. First, the power models for library gates are reasonably accurate; thus, the power savings achieved by transformations of mapped circuits are also well modeled. Second, ATPG-based structural transformations effectively exploit don't-care conditions after technology mapping, even for large circuits. Experimental results show power reductions of 26% on average with no area penalty. Substantial power reductions are also achieved when timing constraints are considered.

49.3 Desensitization for Power Reduction in Sequential Circuits
Xiangfeng Chen, Peichen Pan, C.L. Liu

In this paper, we describe a technique for power reduction in sequential circuits. Existing signals in the circuit are used to selectively disable some of the registers so that a portion of the circuit will be deactivated. Consequently, average power consumption in the circuit is reduced at a cost of small increases in area and delay. We present an algorithm for determining the desensitizing signal for each register. A significant amount of power reduction is achieved in a number of benchmark circuits according to our experimental results.


SESSION 50 ADVANCED TEST SOLUTIONS

Chair: Sandip Kundu
Organizers: J. Rajski, Y. Zorian

The papers in this session address fault emulation on hardware, the effect of coding density on partial scan selection, and an in-depth testability analysis technique for large macro-based designs.

50.1 Serial Fault Emulation
Luc Burgun, Frédéric Reblewski, Gérard Fenelon, Jean Barbier, Olivier Lepape

A hardware emulator based approach has been developed to perform test evaluation on large sequential circuits (at least tens of thousands of gates). This approach relies both on the flexibility and on the reconfigurability of hardware emulators based on dedicated reprogrammable circuits. A Serial Fault Emulation (SFE) method, in which each faulty circuit is emulated separately, has been applied to gate-level circuits for Single Stuck Faults (SSFs). This approach has been implemented on the Meta Systems hardware emulator, which is capable of emulating circuits of 1,000,000 gates at rates varying from 500 kHz to several MHz. Experimental results are provided to demonstrate the efficiency of SFE. They indicate that SFE should be two orders of magnitude faster than software approaches for designs containing more than 100,000 gates.

50.2 Partial Scan Design Based on Circuit State Information
Dong Xiang, Srikanth Venkataraman, W. Kent Fuchs and Janak H. Patel

State information of a sequential circuit can be used to evaluate the complexity of test generation. The ratio of valid states to all the states of the circuit is an important indicator of test generation complexity. Using valid states obtained via logic simulation, a testability measure based on the density of encoding is proposed for scan flip flop selection. A second testability measure based on the test generation state information is also presented and used to select scan flip flops. Cycles are broken selectively on the basis of the circuit state information. Good fault coverage and test efficiency are obtained when fewer scan flip flops than the minimum cut set are selected. Experimental results are presented to demonstrate the effectiveness of the method.

50.3 Pseudorandom-Pattern Test Resistance in High-performance DSP Datapaths
Laurence Goodby, Alex Orailoglu

The testability of basic DSP datapath structures using pseudorandom built-in self-test techniques is examined. The addition of variance-mismatched signals is identified as a testing problem, and the associated fault detection probabilities are derived in terms of signal probability distributions. A method of calculating these distributions is described, and it is shown how these distributions can be used to predict testing problems that arise from the correlation properties of test sequences generated using linear-feedback shift registers. Finally, it is shown empirically that variance matching using associativity transformations can reduce the number of untested faults by a factor of eight over variance-mismatched designs.


SESSION 51 TECHNOLOGY OPTIMIZATION FOR CELLS AND SYSTEMS

Chair: D.M.H. Walker
Organizers: J. White, R.A. Rutenbar

High-performance designs necessarily push the envelope of what is achievable in current fabrication technology. This session looks at the interplay between design optimization and fab technology, for cells, libraries and full chips, for reliability, power, performance and yield.

51.1 Hot-Carrier Reliability Enhancement via Input Reordering and Transistor Sizing
Aurobindo Dasgupta, Ramesh Karri

Hot-carrier effects and electromigration are the two important failure mechanisms that significantly impact the long-term reliability of high-density VLSI ICs. In this paper, we present a probabilistic switch-level method for identifying the most susceptible hot-carrier MOSFETs and improving their hot-carrier reliability using two techniques (i) reordering of inputs to logic gates and (ii) selective MOSFET sizing. We also show that for a given circuit, the best design in terms of hot-carrier reliability does not necessarily coincide with the best design in terms of power consumption.

51.2 A Methodology for Concurrent Fabrication Process/Cell Library Optimization
Arun N. Lokanathan, Jay B. Brockman, John E. Renaud

This paper presents a methodology for concurrently optimizing an IC fabrication process and a standard cell library in order to maximize overall yield. The approach uses the Concurrent Subspace Optimization (CSSO) algorithm, which has been developed for general coupled, multidisciplinary optimization problems. An example is provided showing the application of the algorithm to optimizing a mixed analog-digital library on a CMOS process.

51.3 Computing Parametric Yield Adaptively Using Local Linear Models
Mien Li, Linda Milor

A divide-and-conquer algorithm for computing the parametric yield of large analog circuits is presented. The algorithm targets applications whose performance spreads can be highly nonlinear functions of a large number of stochastic process disturbances, and therefore cannot easily be modeled by traditional response surface methods. This work addresses the modeling difficulties by adaptively constructing the model piece by piece, namely, by efficiently and recursively partitioning the disturbance space into several regions, each of which is then modeled by a local linear model. Local linear models are used because they are less sensitive to dimension than polynomial models. Moreover, the resulting model can be made more accurate in some regions than in others. The number of simulations required for statistical modeling can therefore be reduced, since only critical regions, which define the boundary of the feasible region in the space of process disturbances, are modeled highly accurately. The resulting models are then used as cheap surrogates for circuit simulation in Monte Carlo estimation of the parametric yield. Examples indicate the efficiency and accuracy of this approach.