SIGDA, Super Compendium, GLSVLSI 1998, Abstracts

GLSVLSI 1998 ABSTRACTS

Low Power Circuits and Architectures

Low Power Memory Architectures for Video Applications [p. 2]

Bhanu Kapoor

We provide data and insight into how the choice of cache parameters affects memory power consumption of video algorithms. We make use of memory traces generated as a result of running typical MPEG-2 motion estimation algorithms to simulate a large number of cache configurations. The cache simulation data is then combined with on-chip and off-chip memory power models to compute memory power consumption. In the area of analysis of video algorithms, this paper focuses on the following issues: We provide a detailed study of how varying cache size, block size, and associativity affects memory power consumption. The configurations of particular interest are the ones that optimize power under certain constraints. We also study the role of process technology in these experiments. In particular, we look at how moving to a more advanced process technology for the on-chip cache affects optimal points of operation with respect to memory power consumption.

Reducing Power Consumption of Dedicated Processors through Instruction Set Encoding [p. 8]

Luca Benini, Giovanni De Micheli, Alberto Macii, Enrico Macii, Massimo Poncino

With the increased clock frequency of modern, high-performance processors (over 500 MHz, in some cases), limiting the power dissipation has become the most stringent design target. It is thus mandatory for processor engineers to resort to a large variety of optimization techniques to reduce the power requirements in the hot zones of the chip. In this paper, we focus on the power dissipated by the instruction fetch and decode logic, a portion of the processor architecture where a lot of capacitance switching normally takes place. We propose a methodology for determining an encoding of the instruction set that guarantees the minimization of the number of bit transitions occurring inside the registers of the pipeline stages involved in instruction fetching and decoding. The assignment of the binary patterns to the op-codes is driven by the statistics concerning instruction adjacency collected through instruction-level simulation of typical software applications; therefore, the technique is best exploited when applied to encode the instruction set of core processors and microcontrollers, since components of these types are commonly used to execute fixed portions of machine code within embedded systems. We illustrate the effectiveness of the methodology through the experimental data we have obtained on an existing microprocessor.
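
As a rough illustration of the cost that such an encoding tries to minimize, the sketch below weights the Hamming distance between two op-codes by how often they are adjacent, then searches for the cheapest assignment. The op-code names, adjacency counts, and the exhaustive search are assumptions made for illustration; the abstract does not say which search procedure the authors use.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two code words differ."""
    return bin(a ^ b).count("1")


def switching_cost(encoding: dict, adjacency: dict) -> int:
    """Total bit transitions: adjacency count times Hamming distance, summed over pairs."""
    return sum(count * hamming(encoding[a], encoding[b])
               for (a, b), count in adjacency.items())


def best_encoding(opcodes: list, adjacency: dict, width: int):
    """Exhaustive search over all code assignments; feasible only for tiny sets."""
    best_cost, best_enc = None, None
    for codes in permutations(range(2 ** width), len(opcodes)):
        enc = dict(zip(opcodes, codes))
        cost = switching_cost(enc, adjacency)
        if best_cost is None or cost < best_cost:
            best_cost, best_enc = cost, enc
    return best_cost, best_enc


# Hypothetical adjacency statistics: (previous op, next op) -> occurrences.
ADJACENCY = {("LD", "ADD"): 50, ("ADD", "ST"): 40, ("ST", "BR"): 5, ("BR", "LD"): 5}
OPCODES = ["ADD", "LD", "ST", "BR"]
NAIVE = {"ADD": 0b00, "LD": 0b11, "ST": 0b01, "BR": 0b10}

naive_cost = switching_cost(NAIVE, ADJACENCY)
optimal_cost, optimal_enc = best_encoding(OPCODES, ADJACENCY, width=2)
```

With these made-up counts the naive assignment costs 155 transitions and the best assignment costs 100, because the frequent LD/ADD and ADD/ST pairs end up one bit apart.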

A Low-Power High-Performance Embedded SRAM Macrocell [p. 13]

A.M. Fahim, M. Khellah, and M.I. Elmasry

A new approach to modeling the decoding hierarchy in a hierarchical word line (HWL) SRAM architecture using integer-linear programming (ILP) is introduced. Using this approach, the HWL architecture is shown to be inadequate for very large SRAM sizes. Alternatively, a new low-power high-speed SRAM architecture is described. This architecture is shown to have fairly constant speed and power dissipation for sizes ranging from 32kb to 4Mb. Low power is achieved by a voltage boosting technique not requiring a two-step voltage [7], and by a new method of tristating memory cells during a write operation. The SRAM was implemented in a 0.35μm CMOS technology and operated at 150MHz while dissipating only 10mW.

Low-Power Design of Finite Field Multipliers for Wireless Applications [p. 19]

A.G. Wassal, M.A. Hassan, and M.I. Elmasry

Unlike most research involving finite field multipliers, this work targets a low-power multiplier through the application of various power reduction techniques to different types of multipliers and comparing their power consumption among other factors, rather than comparing complexity measures such as gate count or area. Gate count is used as a starting point to choose potential architectures, namely, polynomial and normal basis architectures. Power reduction techniques employed are mainly concerned with Architecture- and Logic-Level low-power techniques. They include supply voltage reduction, power cost estimations, using low-power logic families and pipelining.

Guidelines for Use of Registers and Multiplexers in Low Power Low Voltage DSP Systems [p. 26]

Dusan Suvakovic, C. Andre T. Salama

Registers and datapath multiplexers exist in most DSP datapaths. Although not performing computations, they are necessary for the dataflow control and they consume energy. This paper describes the nature of register and multiplexer energy consumption in modern low power CMOS processes, shows its strong dependence on architectural and layout design and provides practical design guidelines for micropower implementation.

A Bootstrapped NMOS Charge Recovery Logic [p. 30]

Seung-Moon Yoo and Sung-Mo (Steve) Kang

This paper describes a new Bootstrapped NMOS Charge Recovery Logic (BNCRL) which realizes low energy computation. Power comparison with a state-of-the-art adiabatic charge recovery circuit is shown for an inverter chain and an 8-bit adder. The new logic circuits exhibit full rail-to-rail logic swing, less dependency of energy consumption on output load capacitance variations, and significant energy saving. Benchmark circuits were designed for comparison using 0.6μm CMOS technology.

Power Reducing Techniques for Clocked CMOS PLAs [p. 34]

R.F. Hobson

Power saving techniques for CMOS Programmable Logic Arrays (PLAs) are discussed. Two new techniques are introduced, an AND-plane pulse generator, and Wired-OR CMOS. Power reduction in excess of 75% over Pseudo-NMOS techniques and 50% over some clocked PLA techniques is possible.
Keywords: Pulse generator, Wired-OR CMOS, Single-phase clock, Self-timed logic.

Dynamic and Short-Circuit Power of CMOS Gates Driving Lossless Transmission Lines [p. 39]

Yehea I. Ismail, Eby G. Friedman, and Jose L. Neves

The dynamic and short-circuit power consumption of a CMOS gate driving an LC transmission line as a limiting case of an RLC transmission line is investigated in this paper. Closed form solutions for the output voltage and short-circuit power of a CMOS gate driving an LC transmission line are presented. These solutions agree with AS/X circuit simulations within 11% error for a wide range of transistor widths and line impedances. The ratio of the short-circuit to dynamic power is shown to be less than 7% for CMOS gates driving LC transmission lines where the line is matched or underdriven. The total power consumption is expected to decrease as inductance effects become more significant as compared to an RC dominated interconnect.

A New Full Adder Cell for Low-Power Applications [p. 45]

Ahmed M. Shams and Magdy A. Bayoumi

A new low power CMOS 1-bit full adder cell is presented. It is based on a recent design of XOR and XNOR gates [6] and on pass transistors; it has 17 transistors. This cell has been compared to two widely used efficient adder cells: the transmission function full adder cell (16 transistors) [2], and the low power adder cell (14 transistors) [3]. The new cell has no short circuit power and lower dynamic power than the other adder cells, because its circuit capacitances are fewer and smaller. It consumes 10% to 15% less power than the other two cells. A comparative analysis (using Magic and Hspice) for 8-bit ripple carry and carry select adders shows that the adders based on the new cell can save up to 25% of power consumption.

VLSI Circuits

Beta-Driven Threshold Elements [p. 52]

Victor I. Varshavsky

Circuits on threshold elements have aroused considerable interest in recent years. One of the possible approaches to their implementation is using output wired CMOS invertors [3,4,5]. The model of such an element is a CMOS pair with variable β of fully open p- and n-transistors. This model is specified by the ratio form of threshold function. It has been proved that any threshold function can be rewritten in ratio form. This gives us an evident way of β-driven implementation of threshold functions. It has the following differences from implementation on output wired CMOS invertors:
- βDTE requires one transistor per weight unit rather than two;
- the implementability of βDTE depends only on threshold value, not on the input weights sum.
The analysis of βDTE implementability, examples of circuits and results of their SPICE simulation are given.

A VLSI High-Performance Encoder with Priority Lookahead [p. 59]

José G. Delgado-Frias and Jabulani Nyathi

In this paper we introduce a VLSI priority encoder that uses a novel priority lookahead scheme to reduce the delay for the worst case operation of the circuit, while maintaining a very low transistor count. The encoder's topmost input request has the highest priority; this priority descends linearly. Two design approaches for the priority encoder are presented: one without a priority lookahead scheme and one with a priority lookahead scheme. For an N-bit encoder, the circuit with the priority lookahead scheme requires only 1.094 times the number of transistors of the circuit without the priority lookahead scheme. Taking a 32-bit encoder as an example, the circuit with the priority lookahead scheme is 2.59 times faster than the circuit without the priority lookahead. The worst case operation delay is 4.4 us for this lookahead encoder, using a 1-μm scalable CMOS technology. The proposed lookahead scheme can be extended to larger encoders.
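
The behaviour of a linear priority encoder, and the general idea of skipping whole groups of inputs with a lookahead signal, can be sketched in software. This is a behavioural model only: the block size of 4 and the function names are assumptions, and the abstract does not describe the circuit-level lookahead used in the paper.

```python
def priority_encode(requests):
    """Index of the first asserted request (index 0 is the topmost, highest priority)."""
    for index, request in enumerate(requests):
        if request:
            return index
    return None  # no request asserted


def priority_encode_lookahead(requests, block=4):
    """Same result, but each block is skipped at once when it holds no request."""
    for start in range(0, len(requests), block):
        chunk = requests[start:start + block]
        if any(chunk):  # block-level lookahead: is any request pending in this block?
            return start + priority_encode(chunk)
    return None


requests = [0] * 32
requests[13] = 1
requests[20] = 1
winner = priority_encode_lookahead(requests)
```

Both functions return 13 for this input, since input 13 sits above input 20 in the priority order. In hardware the point of the lookahead is that the "no request in this block" decision does not have to ripple through every bit position.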

Noise Margins of Threshold Logic Gates Containing Resonant Tunneling Diodes [p. 65]

M. Bhattacharya and P. Mazumder

Threshold gates consisting of RTDs in conjunction with HBTs or CHFETs or MOS transistors can form extremely compact, ultrafast, digital logic alternatives. The resonant tunneling phenomenon causes these circuits to exhibit super-high-speed switching capabilities. Additionally, by virtue of being threshold logic gates, they are guaranteed to be more compact than traditional digital logic circuits while achieving the same functionality. However, reliable logic design with these gates will need a thorough understanding of their noise performance and power dissipation among other things. In this paper, we present an analytical study of the noise performance of these threshold gates supplemented by computer simulation results, with the objective of obtaining reliable circuit design guidelines.

600 MHz Digitally Controlled BiCMOS Oscillator (DCO) for VLSI Signal Processing & Communication Applications [p. 71]

Azman M. Yusof, Lim Chu Aun & S.M. Rezaul Hasan

A 16-bit digitally controlled BiCMOS ring oscillator (DCO) is described. This BiCMOS DCO design provides improved frequency stability under thermal fluctuations compared to a CMOS DCO design presented in [1]. Simulations of a 5-stage DCO using 1μm BiCMOS process parameters achieved a controllable frequency range of 90 - 640 MHz with a linear/quasi-linear range of around 300MHz. Monotone frequency gain (frequency vs control-word transfer function) with fine stepping (tuning) in several kHz was verified. This augurs the prospect of accurate frequency lock in a BiCMOS all digital PLL (ADPLL) application in digital VLSI communication systems. Worst-case jitter due to digital control transitions at pathological control-word boundaries for the BiCMOS DCO was observed to be less than 50ps, which is lower than that for the CMOS DCO.

Stability of a Continuous-Time State Variable Filter with Op-amp and OTA-C Integrators [p. 77]

Tim Bakken, John Choma, Jr.

The stability of a continuous-time state variable filter is analyzed using the Routh-Hurwitz criterion. This criterion assesses stability by indicating the number of poles that lie in the right-half plane. The filter is examined separately with integrators implemented with an op-amp and an OTA-C. Both amplifier types are characterized by a dominant-pole frequency response, and the stability of each implementation is compared. HSPICE simulations confirm the theoretical analyses, which indicate that the gain-bandwidth product of the op amps and the bandwidth of the OTAs must be much larger than the desired frequency of operation to ensure stability. Since the analyses assume a dominant-pole response, all higher-order poles of the actual amplifier must also be much greater than the unity-gain frequency to minimize excess phase.
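
For readers unfamiliar with the criterion, the following sketch builds a Routh array from the coefficients of a characteristic polynomial and counts sign changes in the first column, which equals the number of right-half-plane roots. It is a generic textbook version with example polynomials chosen for illustration, not the filter transfer functions analyzed in the paper, and it deliberately does not handle the special case of a zero in the first column.

```python
def routh_rhp_count(coeffs):
    """Count right-half-plane roots of a polynomial (coefficients, highest power first).

    Assumes no zero appears in the first column before the last row.
    """
    n = len(coeffs)
    row_even = [float(c) for c in coeffs[0::2]]
    row_odd = [float(c) for c in coeffs[1::2]]
    width = len(row_even)
    row_odd += [0.0] * (width - len(row_odd))  # pad so both rows have equal length
    rows = [row_even, row_odd]
    for _ in range(n - 2):
        above, pivot_row = rows[-2], rows[-1]
        pivot = pivot_row[0]
        if pivot == 0:
            raise ValueError("zero in first column: special case not handled here")
        new_row = [(pivot * above[j + 1] - above[0] * pivot_row[j + 1]) / pivot
                   for j in range(width - 1)]
        new_row.append(0.0)
        rows.append(new_row)
    first_column = [row[0] for row in rows]
    return sum(1 for a, b in zip(first_column, first_column[1:]) if a * b < 0)


stable = routh_rhp_count([1, 6, 11, 6])     # (s+1)(s+2)(s+3)
unstable = routh_rhp_count([1, 1, 2, 24])   # (s+3)(s^2 - 2s + 8)
```

The first polynomial has all roots in the left half-plane, so the count is 0; the second has the complex pair 1 ± j√7 in the right half-plane, so the count is 2.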

Multiple-Valued Logic Voltage-Mode Storage Circuits Based on True-Single-Phase Clocked Logic [p. 83]

I. Thoidis, D. Soudris, I. Karafyllidis, A. Thanailakis, and T. Stouraitis

A number of novel voltage-mode multiple-valued logic circuits are introduced. Adopting the main features of the true single-phase clocked logic, efficient quaternary logic dynamic and pseudo-static latches, dynamic and static master-slave storage units, and uni-signal controlled pass gates are proposed. These circuits use two kinds of MOS transistors, i.e., enhancement and depletion mode, each of which has two threshold voltages. The proposed circuits exhibit regular, modular, and iterative structure, which means that the MVL circuits are VLSI implementable and can be easily re-designed for any radix of an arithmetic system. Since we use only clock signal, the derived circuits have low power dissipation. Comparisons with existing circuits prove substantial improvements in terms of speed, power consumption, and transistor count.

CMOS Tapered Buffer Design for Small Width Clock/Data Signal Propagation [p. 89]

J. Navarro S. Jr. and Wilhelmus A. M. Van Noije

A new optimization criterion, the propagation of the minimum width pulse through the buffer, is studied for design of tapered buffers driving capacitive loads. Contrary to the classic minimum delay criterion, this one produces buffers which support maximum speed signal propagation. Simulation results for 0.8μm and 0.35μm CMOS processes are analyzed. Semi-empirical relations are proposed to relate the minimum width pulse with the inverter gain ratio, the number of inverters, and the capacitive load. Additionally, a brief study of the delay skew of tapered buffers due to mismatching as a function of the gain ratio is done, showing that no severe degradation appears with small gain ratios. Finally, this work points out that buffers with small gain ratios should reach higher speeds, nearly 30% over the speed of buffers with gain ratio larger than a factor of 3.

Design of Clock Distribution Networks in Presence of Process Variations [p. 95]

M. Nekili, Y. Savaria, and G. Bois

Tolerance to process-induced skew remains one of the major concerns in the design of large-area and high-speed clock distribution networks. Indeed, despite the availability of some efficient exact-zero skew algorithms that can be applied during circuit design, the clock skew remains an important performance limiting factor after chip manufacturing, and is of increasing concern for sub-micron technologies. This tutorial reviews the importance of the problem, its sources, as well as typical examples of existing solutions. Solutions range from design rules strategies to built-in self-compensation methods.

Design of an 8:1 MUX at 1.7Gbit/s in 0.8μm CMOS Technology [p. 103]

J. Navarro, Jr. and W.A.M. Van Noije

The design of an 8:1 multiplexer circuit, for SDH/SONET data transmission systems, is presented. In order to achieve maximum transmission rates, new circuits, high speed input/output converters for ECL/CMOS levels and modified true single phase clocked (TSPC) cells, as well as new techniques for clock buffer optimization, were applied. The multiplexer implemented in a 0.8μm CMOS process (0.7μm effective length) achieved 1.7Gbit/s rate and 42.6μW/MHz power consumption at 5V. These results were compared to a previous implementation (in the same process), and to other recently published works, showing superior performances.

Issues in the Design of Domino Logic Circuits [p. 108]

Pranjal Srivastava, Andrew Pua, and Larry Welch

Domino logic circuits have become extremely popular in the design of today's high performance processors because they offer fast switching speeds and reduced areas. However, the use of domino logic introduces many design risks because it is very sensitive to noise, circuit and layout topologies. This paper identifies issues that might cause domino logic circuits to fail, and discusses some possible solutions to alleviate these problems.

A Novel 1.5-V CMOS Mixer [p. 113]

G. Giustolisi, G. Palmisano, G. Palumbo, and C. Strano

A new and simple CMOS mixer powered with 1.5 V is presented. It works with a 200MHz clock, and has a -7-dB IP3. Moreover, it handles signals up to 150 mV at the 1-dB compression point. The particular topology makes it useful for integration in fully digital ICs.

Analysis of Adaptive CMOS Down Conversion Mixers [p. 118]

Can K. Sandalci and S. Kiaei

Analysis of CMOS direct conversion architecture with adaptive DC offset compensation is presented. Due to process mismatches and local oscillator (LO) crosstalk, DC offsets up to 30mV are observed at the mixer output. For a practical direct conversion or zero-IF down-conversion system, the incoming RF signal can be as low as -100dBm or a few microvolts at this stage and any LO coupling will cause a DC offset orders of magnitude larger than the received signal. The DC offset needs to be effectively reduced to prevent the consecutive gain stages from entering saturation and destroying the RF signal. To achieve this, an adaptive DC shifting circuit is presented. Adding a tunable DC offset on the LO signal can effectively counteract the output DC offset by exploiting the quadratic LO dependence of the process mismatch induced offsets. In addition to that, DSP approaches for adaptively generating the control signals for the DC shifting circuitry are investigated.
Keywords: Zero-IF, Direct Conversion, Mixer, DC offset

Artificial Neural Network Electronic Nose for Volatile Organic Compounds [p. 122]

Hoda S. Abdel-Aty-Zohdy

Advanced microsystems that include sensors, interface circuits, and pattern recognition integrated monolithically or in a hybrid module are needed for civilian, military, and space applications. These include automotive, medical applications, environmental engineering, and manufacturing automation. ASICs with Artificial Neural Networks (ANN) are considered in this paper, with the objective of recognizing air-borne volatile organic compounds, especially alcohols, ethers, esters, halocarbons, NH3, NO2, and other warfare agent simulants. The ASIC inputs are connected to the outputs from array-distributed sensors which measure three features for identifying each of four chemicals. A Specialized Reinforcement Neural Network (RNN) learning approach is chosen for the chemicals classification problem. Hardware implementation of the RNN is presented for a 2 μm CMOS process, MOSIS chip. Design implementation and evaluation are also presented.

VLSI Architectures

A VLSI Self-Compacting Buffer for DAMQ Communication Switches [p. 128]

José G. Delgado-Frias and Richard Diaz

This paper describes a novel VLSI CMOS implementation of a self-compacting buffer (SCB) for the dynamically allocated multi-queue (DAMQ) switch architecture. The SCB is a scheme that dynamically allocates data regions within the input buffer for each output channel. The proposed implementation provides a high-performance solution to buffered communication switches that are required in interconnection networks. This performance comes from not only the DAMQ approach but also the pipelined implementation and novel circuitry. The major components of the SCB are described in detail in this paper. The system has the capability of performing a read, a write, or a simultaneous read/write operation per cycle due to its pipelined architecture.

A Dictionary Machine Emulation on a VLSI Computing Tree System [p. 134]

A.E. Harvin III and J.G. Delgado-Frias

In this paper, we propose a dictionary machine emulation using a novel VLSI tree structure that operates on the dictionary using a blocking technique. We show that dictionary machine operations can be performed through the implementation of a number of processing and communication tasks overlapped on a simple structure. By manipulating the key-records bit serially, and storing them in an external memory rather than within the layers of the structure, we show that the size of the dictionary is limited only by the capacity of the external memory. This structure, which consists of multiple units, can be implemented in VLSI on a single chip. The key advantage of our structure is that it provides a means of implementing a high speed and low cost dictionary machine with virtually unlimited capacity, thus eliminating the need for multiple chips should the dictionary expand. We have shown that an exhaustive search on a 2048 key-record dictionary can be performed in 29.78 μs.

Modeling and Analysis of the Difference-Bit Cache [p. 140]

Ashutosh Kulkarni, Navin Chander, Soumya Pillai and Lizy John

Advances in VLSI technology and processor architectures have resulted in a tremendous increase in processor speeds and memory capacities. However, memory latencies have failed to improve as rapidly, making memory systems the performance bottlenecks in most high performance processor architectures. Caching is a time-tested mechanism to solve this speed disparity. Among the different cache mapping strategies, direct mapping is the only configuration where the critical path is merely the time required to access a RAM. Although direct mapped caches are preferable considering hit-access times, they have poor hit ratios compared to associative caches. The difference-bit cache proposed by Juan, Lang and Navarro [1] is functionally equivalent to a two-way set-associative cache but tries to achieve an access time smaller than that of a conventional two-way set-associative cache and close to that of a direct-mapped cache. We modeled and analyzed the difference-bit cache to prove the hypothesis of its small access time. We have also tried to prove that the access time advantage of the difference-bit cache improves over the conventional two-way set-associative cache with an increase in the cache size. Finally we have tried to analyze the trade-off involved in applying these techniques to a higher associativity cache.
Keywords: Cache memory, critical path, hit access time, cache mapping strategies

Modeling of Shift Register-Based ATM Switch [p. 146]

Sandeep Agarwal and Fayez El-Guibaly

In this paper, we present the modeling of a shift register-based ATM switch to find the cell loss probability, throughput and delay. The results are compared with other switch architectures based on input queueing, input smoothing, output queueing and completely shared buffering. It is observed that although our switch is an input-buffered switch, its performance is better than other switches based on traditional queueing approaches.

An Architecture of Full-Search Block Matching for Minimum Memory Bandwidth Requirement [p. 152]

Jen-Chien Tuan, Chein-Wei Jen

In this paper an architecture of full-search block matching motion estimation suitable for high quality video is proposed. Minimum memory bandwidth is an important requirement in motion estimation architecture, especially when dealing with high quality video such as large frame size video. Without careful consideration, memory bandwidth will increase to an unrealistically high value that no cost-efficient solution can afford. This architecture is designed to overcome the frame memory bandwidth bottleneck by exploiting the maximum data reuse property. This is done by setting up local memory for storing frame data. The size of the local memory is also optimized to a near-minimum value, so only a little overhead is introduced. Due to the reduction of memory bandwidth, the costs of frame memory modules, I/O pin count and power consumption can be reduced while 100% hardware efficiency is still achieved. Simple and regular interconnections are featured to ensure high speed operation by an efficient and distributed local memory organization.

MPEG-2 Video Decoder for DVD [p. 157]

Nien-Tsu Wang, Chen-Wei Shih, Duan Juat Wong-Ho, Nam Ling

A video decoder with an efficient controller scheme and a sub-picture decoder for DVD application is presented in this paper. Most reported architectures for MPEG2 video decoding use a 64-bit bus and a complex bus arbitration scheme. Our design uses synchronous DRAMs instead of standard EDO DRAMs and involves a novel controller scheme that allocates bus space for DRAM access efficiently. This efficient allocation allows us to reduce bus width from 64 bits to 32 bits, without significantly increasing embedded buffer sizes, while still meeting the requirements for MPEG2 MP@ML decoding. The bus arbitration algorithm is also simple, allowing for a less complex controller design. Our main strategy is to impose a certain order on DRAM access by the various processes instead of allowing any process to request bus access arbitrarily. We also take advantage of the restricted GOP (group of pictures) sequence in the DVD format to allow a longer decoding time for B frames. The sub-picture pixel data are run-length compressed bitmaps that are overlaid on top of the MPEG reconstructed video. The architecture for sub-picture decoding is simple and easy to implement.

A Self Timed Asynchronous Router for an Heterogeneous Parallel Machine [p. 161]

Eric SENN, Bertrand ZAVIDOVIQUE

This paper describes the implementation of a self-timed asynchronous router in a parallel machine. The heterogeneous architecture of the machine is outlined, then the need for asynchronous operation and the benefit of asynchronous network control are explained. The specification and VLSI design of the router are presented together with its measured performance.

Non-Refreshing Analog Neural Storage Tailored for On-Chip Learning [p. 168]

Bassem A. Alhalabi, Qutaibah Malluhi, Rafic Ayoubi

In this research, we devised a new simple technique for statically holding analog weights, which does not require periodic refreshing. It further contains a mechanism to locally update the weights from the analog back-propagation signals for fast on-chip learning. In this circuit, the weight is stored as a 5-bit digital number, which controls the gates of five pass transistors allowing five binary-weighted (1,2,4,8,16) voltage references to integrate at a voltage adder. The output of the voltage adder is the analog weight. The 5-bit register is designed as an up/down counter so that every pulse on the up/down input will increase/decrease the weight by one level out of 32 possible levels. The learning circuit takes the analog graded error signal and generates two pulse streams for up/down counting depending on the sign of the error signal. The duration of the pulse stream is proportional to the magnitude of the error signal. This complete modular synaptic body (storage and learning technique) is appropriate for large scalable analog VLSI neural networks because it handles recall and learning operations at the same speed with full parallelism.
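
The storage scheme can be modelled in a few lines. The sketch below is a behavioural approximation under several assumptions that the abstract does not settle: the counter saturates at 0 and 31 rather than wrapping, a positive error counts up, and the pulses_per_unit scale factor is a hypothetical parameter standing in for the proportionality between error magnitude and pulse-stream duration.

```python
REFERENCES = (1, 2, 4, 8, 16)  # binary-weighted references, arbitrary units
MAX_CODE = 31                   # 5-bit register: 32 levels, 0..31


class SynapseWeight:
    def __init__(self, code: int = 0):
        self.code = code  # contents of the 5-bit up/down counter

    def pulse(self, up: bool) -> None:
        """One counter pulse moves the weight by one level (saturating: an assumption)."""
        step = 1 if up else -1
        self.code = min(MAX_CODE, max(0, self.code + step))

    def analog(self) -> int:
        """Sum the references whose pass transistor is switched on by a set bit."""
        return sum(ref for bit, ref in enumerate(REFERENCES) if (self.code >> bit) & 1)

    def learn(self, error: float, pulses_per_unit: float = 1.0) -> None:
        """Pulse count proportional to |error|; direction taken from the sign of the error."""
        for _ in range(round(abs(error) * pulses_per_unit)):
            self.pulse(up=error > 0)


weight = SynapseWeight(code=10)
weight.learn(3.0)
```

After the call the register holds 13, and the adder output is 1 + 4 + 8 = 13 units, one reference per set bit.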

VLSI Arithmetic

Residue to Binary Number Converters for (2ⁿ - 1,2ⁿ,2ⁿ + 1) [p. 174]

Yuke Wang, Xiaoyu Song, Mostapha Aboulhamid

This paper proposes three new residue-to-binary converters using 2n-bit or n-bit adders for the three moduli residue number system of the form (2ⁿ - 1, 2ⁿ, 2ⁿ + 1). The 2n-bit adder based converter is faster and requires about half of the hardware required by previous methods. For n-bit adder based implementations, one new converter is twice as fast as the previous method using a similar amount of hardware, while another new converter achieves improvement in both speed and area.
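
What such a converter computes can be checked against a plain Chinese Remainder Theorem reconstruction. The sketch below is that textbook reference model, not the adder-based converters proposed in the paper; the function names are illustrative, and the three-argument pow with exponent -1 assumes Python 3.8 or later.

```python
def moduli_set(n: int):
    """The three pairwise coprime moduli (2^n - 1, 2^n, 2^n + 1)."""
    return (2 ** n - 1, 2 ** n, 2 ** n + 1)


def to_residues(x: int, n: int):
    """Binary-to-residue conversion: reduce x modulo each modulus."""
    return tuple(x % m for m in moduli_set(n))


def from_residues(residues, n: int) -> int:
    """Residue-to-binary conversion by the Chinese Remainder Theorem."""
    moduli = moduli_set(n)
    dynamic_range = moduli[0] * moduli[1] * moduli[2]
    total = 0
    for residue, modulus in zip(residues, moduli):
        partial = dynamic_range // modulus       # product of the other two moduli
        inverse = pow(partial, -1, modulus)      # multiplicative inverse mod this modulus
        total += residue * partial * inverse
    return total % dynamic_range


residues = to_residues(1998, n=4)   # moduli are 15, 16, 17
recovered = from_residues(residues, n=4)
```

For n = 4 the dynamic range is 15 × 16 × 17 = 4080, and 1998 maps to the residues (3, 14, 9) and back again. A hardware converter exploits the special form of the moduli to replace the multiplications and the final reduction with shifts and end-around-carry additions.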

The Design of Residue Number System Arithmetic Units for a VLSI Adaptive Equalizer [p. 179]

Inseop Lee and W. Kenneth Jenkins

This paper presents the design details of an experimental ASIC for an all-digital adaptive equalizer. In this design, the LMS algorithm is chosen because of its simplicity. The adaptive equalizer design, which is based on an RNS architecture, consists of an RNS multiplier, an RNS adder, an RNS filter, a binary-to-residue converter, a residue-to-binary converter, and an update algorithm. The design is verified by a high level hardware simulation tool. The designs of all these units are discussed in this paper.

An Efficient Residue to Weighted Converter for a New Residue Number System [p. 185]

Alexander Skavantzos

The Residue Number System (RNS) is an integer system appropriate for implementing fast digital signal processors since it can support parallel, carry-free, high-speed arithmetic. In this paper a new RNS system and an efficient implementation of its residue-to-weighted converter are presented. The new RNS is a balanced 5-moduli system appropriate for large dynamic ranges. The new residue-to-binary converter is very fast and hardware-efficient and is based on a 1's complement multioperand adder adding operands of size only 80% of the size of the system's dynamic range.

The Chinese Abacus Method: Can We Use It for Digital Arithmetic? [p. 192]

Franco Maloberti, Chen Gang

This paper discusses how to apply the approach used in the Chinese Abacus to implement digital arithmetic. Firstly, we examine the representations and the basic techniques used in the Chinese Abacus; then, we propose a MOS realization of the basic functions required; finally, we discuss a novel 12 bit full adder based on the Chinese Abacus method. Simulations of 0.5 μm CMOS realizations showed that a parallel solution can run at 200 MHz while a pipeline realization can achieve 1 GHz of clock frequency. The complexity of the circuit is quite limited; thus, the use of the Chinese Abacus approach results in a competitive technique with respect to conventional methodologies.

Merged Arithmetic for Computing Wavelet Transforms [p. 196]

Gwangwoo Choe and Earl E. Swartzlander, Jr.

A variation of merged arithmetic is applied to the implementation of the wavelet transform. This approach offers a simple design trade-off between the computational accuracy and the complexity. Our analysis shows that the trade-off is a function of the input data resolution, the number of filter taps, the arithmetic precision, and the level of the wavelet transform. The design parameter can also be fixed for a given number of taps and used to determine the minimum word size for the wavelet coefficients of the transform. The key element of this approach is to introduce a "truncation" within the merged arithmetic reduction process which provides equivalent throughput with substantially less complexity. An experiment has been conducted to verify the analysis, which suggests that 24-bit merged arithmetic is required for the EZW algorithm to handle up to a level-6 wavelet transform.

Digital Arithmetic Using Analog Arrays [p. 202]

S. Sadeghi-Emamchaie, G.A. Jullien, V. Dimitrov, and W.C. Miller

This paper describes techniques for using locally connected analog Cellular Neural Networks (CNNs) to implement digital arithmetic arrays; the arithmetic is implemented using a recently disclosed Double-Base Number System (DBNS). The CNN arrays are targeted for low power low-noise DSP applications where lower slew rate during transitions is a potential advantage. Specifically, we demonstrate that a CNN array, using a simple nonlinear feedback template, with hysteresis, can perform arbitrary length arithmetic with good performance in terms of stability and robustness. The principles presented in this paper can also be used to implement arithmetic in other number systems such as the binary number system.

A Combined Interval and Floating Point Multiplier [p. 208]

James E. Stine and Michael J. Schulte

Interval arithmetic provides an efficient method for monitoring and controlling errors in numerical calculations. However, existing software packages for interval arithmetic are often too slow for numerically intensive computations. This paper presents the design of a multiplier that performs either interval or floating point multiplication. This multiplier requires only slightly more area and delay than a conventional floating point multiplier, and is one to two orders of magnitude faster than software implementations of interval multiplication.
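
As a reminder of the operation being accelerated, here is a minimal software sketch of interval multiplication. It assumes closed intervals given as (lower, upper) pairs and, for brevity, ignores the outward rounding (lower bound rounded toward minus infinity, upper bound toward plus infinity) that a rigorous implementation needs; the name `interval_mul` is hypothetical and not taken from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval_mul(x: Interval, y: Interval) -> Interval:
    """Multiply closed intervals [a, b] and [c, d] (no directed rounding)."""
    a, b = x
    c, d = y
    if a > b or c > d:
        raise ValueError("interval endpoints must satisfy lower <= upper")
    # The product range is bounded by the four endpoint products.
    products = (a * c, a * d, b * c, b * d)
    return (min(products), max(products))


print(interval_mul((-2.0, 3.0), (4.0, 5.0)))  # (-10.0, 15.0)
```

The four multiplications plus the min/max selection suggest why a software interval multiply costs noticeably more than a single floating point multiply.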

Testing

Test Compaction for Synchronous Sequential Circuits by Test Sequence Recycling [p. 216]

Irith Pomeranz and Sudhakar M. Reddy

We introduce a new concept for test sequence compaction referred to as recycling. Recycling is based on the observation that easy-to-detect faults tend to be detected several times by a deterministic test sequence, whereas hard-to-detect faults are detected once towards the end of the test sequence. Thus, the suffix of a test sequence detects a large number of faults, including hard-to-detect faults. The recycling operation keeps a suffix S1 of a test sequence T1 and discards the rest of the sequence. The suffix S1 is then used as a prefix of a new test sequence T2. In this process, S1 is expected to detect the more difficult to detect faults as well as many of the easy-to-detect faults, resulting in a new sequence T2 which is shorter than T1. Recycling is enhanced by a scheme where several faults are targeted simultaneously to generate the shortest possible test sequence that detects all of them.

Random Self-Test Method Applications on PowerPC™ Microprocessor Caches [p. 222]

Rajesh Raina, Robert Molyneaux

This paper describes a novel method for generating test stimuli for digital systems. By taking advantage of certain properties of the Design Under Validation, the method can be used to generate test stimuli that are random as well as self-testing. We discuss the requirements and limitations of this method on practical designs. The use of this method for High-Level Design Validation of caches in PowerPC™ microprocessors is also described. The paper concludes by identifying areas where further work is needed.
Topic Areas: High-Level Design Validation, Silicon Validation, Pseudo-Random Testing, Microprocessor Testing.

A Unified Approach for a Time-Domain Built-In Self-Test Technique and Fault Detection [p. 230]

B. Provost, A.M. Brosa, and E. Sánchez-Sinencio

Being able to fully test a circuit is an important issue for quality manufacturing. Unlike fault analysis for digital circuits, analog fault analysis has been comparatively slow to evolve. The purpose of this paper is to study the feasibility of time-domain response analysis as a test method for analog circuits. The approach was to first study the fault coverage obtained by testing the main parameters of the new NGCC amplifier, which shows the feasibility of built-in self-test in the time domain. A circuit macromodel to implement a time-domain built-in self-test circuit was then proposed.

VHDL Testability Analysis Based on Fault Clustering and Implicit Fault Injection [p. 237]

F.S. Bietti, F. Ferrandi, F. Fummi, and D. Sciuto

Testability analysis of VHDL sequential models is the main topic of this paper. We investigate the possibility of obtaining information about the testability of a sequential VHDL description before its actual synthesis. The analysis is based on an implicit fault model that injects faults into a BDD-based description extracted from the VHDL representation. Such an injection is related to the original VHDL representation, thus allowing the identification of potential testability problems before RTL and logic synthesis. Fault injection is performed efficiently by exploiting the concept of fault clustering, that is, the possibility of grouping faults and analyzing them concurrently. The proposed methodology is applied to benchmarks for efficiency evaluation and to a real VHDL description.

IDD Waveforms Analysis for Testing of Domino and Low Voltage Static CMOS Circuits [p. 243]

Hendrawan Soeleman, Dinesh Somasekhar, and Kaushik Roy

This paper describes a test method which relies on the actual observation of supply current (I_DD) waveforms. The method can be used to supplement the standard IDDQ test method and it can be easily applied to dynamic and low V_DD, low V_T CMOS circuits. The method allows us to detect faults which may not be detected by IDDQ test methods, and is sensitive enough to detect potential faults which do not manifest themselves as functional errors. A simple built-in current sensor, which proves to be adequate in verifying the feasibility of using the I_DD waveform analysis, is proposed to safely observe the current waveforms without significantly changing them.

A Design-for-Testability Technique for Detecting Delay Faults in Logic Circuits [p. 249]

K. Raahemifar and M. Ahmadi

This paper provides a simulation-based study of the delay fault testing in logic circuits. It is shown that delay testing is necessary in order to achieve a high defect coverage. By detecting delayed time response in a transistor circuit, three types of faults are detected: 1) faults which cause delayed transitions at the output node due to some open defects, 2) faults which cause an intermediate voltage level at the output node, and 3) most stuck-at faults which halt the circuit at '1' or '0'. An on-line checker is presented which enables the concurrent detection of delay faults. Since one checker is used for each output signal, the area overhead is minimal. This technique does not degrade the speed of the circuit under test (CUT). We show that the test circuit is independent of the size of the CUT. Simulation results show that this technique can be adjusted to fit to any design style.

VLSI Communication Circuits and Systems

Development of a CMOS Cell Library for RF Wireless and Telecommunications Applications [p. 258]

Robert H. Caverly

There is increasing interest in the use of CMOS circuits for highly integrated high frequency wireless telecommunications systems. This paper presents the results of on-going work into the development of a cell library that includes many of the circuit elements required for the high frequency sub-system of a communications integrated circuit. The cells were fabricated using standard MOSIS processes and measurement results are presented. The full design files, testing results and circuit tutorials describing the cells and how they interface with baseband circuits are available from the author.

Design Issues of LC Tuned Oscillators for Integrated Transceivers [p. 264]

C. Samori, A.L. Lacaita, A. Zanchi, and P. Vita

VCOs for wireless receivers must fulfill tight phase noise requirements, and their complete integration in silicon VLSI technologies is still an open issue due to the low quality factor of the inductors. In this paper we address some of the constraints met in the design of low noise oscillator stages: the tank topology and its quality factor, the dynamics of the transconductor stage and its loading effects, and the limitation resulting from AM-to-PM conversion.

Novel Simple Models of CML Propagation Delay [p. 270]

M. Alioto and G. Palumbo

Accurate and simple models of CML propagation delay are given. The approach used is new. The propagation delay is represented with a few terms, providing better insight into the relationship between delay and its electrical parameters, which in turn are related to process parameters. The most accurate model has typical and worst-case errors as low as 2% and 5%, respectively.

Next Generation Narrowband RF Front-Ends in Silicon IC Technology [p. 275]

John R. Long

It is anticipated that the next generation of wireless systems will deliver voice and data services at carrier frequencies extending up to 6 GHz. The front-end circuits for these radios must be aggressively designed in order to deal with issues such as analog and digital compatibility, higher linearity imposed by broadband signal processing at IF, low supply voltage to minimize size, weight and power consumption, as well as operation in multiple frequency bands. The challenges and opportunities facing the designer of these radio frequency (RF) front-end ICs in silicon will be addressed in this paper from both the technological and circuit perspectives.

Low Voltage Low Power CMOS AGC Circuit for Wireless Communication [p. 281]

Hassan O. Elwan, Mohammed Ismail

This paper describes a new technique for realizing a CMOS digitally controlled, dB-linear variable gain amplifier (VGA) circuit. The circuit is developed taking into account system level issues for a direct conversion receiver. Besides being effective and simple to use from a system point of view, the developed VGA offers precise gain control, high linearity and low power consumption. The circuit can operate in a current domain or a voltage domain mode with single-ended or fully differential signal handling capability. The proposed VGA circuit is implemented using a novel class AB operational transconductance amplifier and current division networks. Simulation results are included.

A Continuous-Time Switched-Current ΣΔ Modulator with Reduced Loop Delay [p. 286]

Louis Luh, John Choma, Jr., Jeffrey Draper

A novel architecture for a second-order continuous-time switched-current ΣΔ modulator is presented. The loop delay is reduced by predicting the states of the second integrator and feeding the predicted states to the comparator. The predicted states are generated by summing three scaled current-mode signals. A Gain-Manager is used to accurately control the integrator gain to generate the predicted states and stabilize the system. A newly designed high-speed current-mode comparator is capable of summing the three scaled current inputs and comparing them. With a 50 MHz sampling rate, it has achieved 60 dB dynamic range (10-bit) at 1 MHz. The modulator has been fabricated in a 2 μm CMOS process with an active area of 0.37 mm². The power dissipation is 16.6 mW from a single 5 V power supply.

Design Methodologies and CAD Tools Algorithms

An Exact Input Encoding Algorithm for BDDs Representing FSMs [p. 294]

Wilsin Gosti, Tiziano Villa, Alexander Saldanha, Alberto L. Sangiovanni-Vincentelli

We address the problem of encoding the state variables of a finite state machine such that the BDD representing its characteristic function has the minimum number of nodes. We present an exact formulation of the problem. Our formulation characterizes the two BDD reduction rules by deriving conditions under which these reduction rules can be applied. We then provide an algorithm that finds these conditions and solves the problem by formulating it as a 2-CNF formula and extracting all its prime implicants. In addition, we implemented a simulated annealing algorithm for this problem and provide a thorough experimental study of the impact of encoding on a BDD representing an FSM with different orderings.

Maximum Current Estimation in Programmable Logic Arrays [p. 301]

S. Bobba and I.N. Hajj

A programmable logic array (PLA) is a circuit realization of the two-level sum-of-products representation of a multi-output Boolean function. The current drawn by a PLA is input dependent, which makes the problem of estimating the maximum current intractable. Integrated circuit reliability and signal integrity are related to the maximum current drawn by the circuit. Hence, an estimate of the maximum current is required for the design of a reliable VLSI circuit. In this paper, we present an input pattern-independent algorithm to obtain estimates of the maximum and minimum currents drawn by a PLA over all possible input vectors. Experimental results on several benchmark circuits and comparisons with exhaustive simulations are also included in this paper.

Mutually Disjoint Signals and Probability Calculation in Digital Circuits [p. 307]

Vishwani D. Agrawal, Sharad Seth

Signal probability calculation in circuits where signals are not independent is generally expensive. We show that some correlated signals may be mutually disjoint. In such cases, the probability calculation can be as simple as it is for independent signals. For example, two signals that cannot be simultaneously true are defined as OR-disjoint. If these signals feed an OR gate, the probability of the output being true is simply the sum of the probabilities of inputs being true. We give an implication-based algorithm for identifying disjoint signals. Examples of large adders illustrate how the identification of disjoint signals simplifies the probability calculation.
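
The OR-disjoint rule can be checked on a small example. The sketch below is an illustration chosen here, not taken from the paper: it uses the carry generate (x AND y) and carry propagate (x XOR y) signals of a one-bit adder stage with uniformly random inputs, and compares the simple sum against exhaustive enumeration and against the formula that would apply if the signals were independent.

```python
from fractions import Fraction
from itertools import product


def probability(signal, n_inputs):
    """Exact probability that signal(*bits) is true for uniformly random input bits."""
    outcomes = list(product((0, 1), repeat=n_inputs))
    hits = sum(1 for bits in outcomes if signal(*bits))
    return Fraction(hits, len(outcomes))


def generate(x, y):
    return x & y  # carry generate


def propagate(x, y):
    return x ^ y  # carry propagate


def either(x, y):
    return generate(x, y) | propagate(x, y)  # OR gate fed by both signals


# OR-disjoint: the two signals are never true for the same input vector.
disjoint = all(not (generate(x, y) and propagate(x, y))
               for x, y in product((0, 1), repeat=2))

p_g = probability(generate, 2)
p_p = probability(propagate, 2)
p_exact = probability(either, 2)           # exhaustive reference value
p_sum = p_g + p_p                          # valid because the signals are OR-disjoint
p_independent = 1 - (1 - p_g) * (1 - p_p)  # would be correct only for independent signals

print(disjoint, p_exact, p_sum, p_independent)  # True 3/4 3/4 5/8
```

The sum matches the exhaustive result, while the independence formula does not, because the two signals share inputs and are therefore correlated.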

Identifying High-Level Components in Combinational Circuits [p. 313]

Travis Doom, Jennifer White, Anthony Wojcik, Greg Chisholm

The problem of finding meaningful subcircuits in a logic layout appears in many contexts in computer-aided design. Existing techniques rely upon finding exact matchings of subcircuit structure within the layout. These syntactic techniques fail to identify functionally equivalent subcircuits which are differently implemented, optimized, or otherwise obfuscated. We present a mechanism for identifying functionally equivalent subcircuits which is capable of overcoming many of these limitations. Such semantic matching is particularly useful in the field of design recovery.

Local Optimality Theory in VLSI Channel Routing: Composite Cyclic Vertical Constraints [p. 319]

Anthony D. Johnson

The Local Optimality paradigm is applicable to all combinatorial optimization problems. Its direct field of application is constructive solution algorithms; its main advantage is the low computational cost of producing multiple high quality initial solutions for iterative improvement algorithms. The application of the paradigm to VLSI channel routing has necessitated the creation of new knowledge, represented by the theory of locally optimal breaking (LOB) of directed circuits (DCs) in the vertical constraint graph. Existing theory has supported deterministic polynomial time algorithms for LOB of two classes of directed circuits: vertex-disjoint DCs and couples of connected DCs. The new LOB theory supports algorithms for more complex classes: any number of DCs sharing a single vertex, and uniform lattices of DCs. It is significant that the new theory relies on the theory for couples of connected DCs for breaking more complex structures of connected DCs.

Linear Transformations and Exact Minimization of BDDs [p. 325]

Wolfgang Günther, Rolf Drechsler

We present an exact algorithm to find an optimal linear transformation for the variables of a Boolean function to minimize its corresponding ordered Binary Decision Diagram (BDD). To prune the huge search space, techniques known from algorithms for finding the optimal variable ordering are used. This BDD minimization finds direct application in FPGA design. We give experimental results for a large variety of circuits to show the efficiency of our approach.

Timed Supersetting and the Synthesis of Large Telescopic Units [p. 331]

L. Benini, G. De Micheli, A. Lioy, E. Macii, G. Odasso, and M. Poncino

In high-performance systems, variable-latency units are often employed to improve the average throughput when the worst-case delay exceeds the cycle time. Although such units have traditionally been hand-designed, recent results have shown that variable-latency units can be automatically generated. Unfortunately, the existing synthesis procedure has limited applicability due to its computational complexity. In this work, we define and study an optimization problem, timed supersetting, whose solution is at the kernel of the procedure for automatic generation of variable-latency units. We contribute a new algorithm for solving timed supersetting in the most difficult case, that is, when the timing behavior of the circuits is expressed through an accurate delay model. The proposed solution overcomes the complexity limitation of previous approaches, and its robustness is experimentally demonstrated by obtaining high-throughput, variable-latency implementations for all the largest circuits in the ISCAS'85 and ISCAS'89 benchmark suites.

Tabu Search Based Circuit Optimization [p. 338]

Sadiq M. Sait, Habib Youssef, and Munir M. Zahra

In this paper we address the problem of optimizing mixed CMOS/BiCMOS circuits. The problem is formulated as a constrained combinatorial optimization problem and solved using a tabu search algorithm. Only gates on the critical sensitizable paths are considered for optimization. Such a strategy leads to sizable circuit speed improvement with minimum increase in the overall circuit capacitance. Compared to earlier approaches, the presented technique produces circuits with a remarkable increase in speed (greater than 20%) for a very small increase in overall circuit capacitance (less than 3%).
Keywords: Tabu Search, Circuit Optimization, Search Algorithms, CMOS/BiCMOS, Mixed Technologies, Critical Path, False Path.

On the Characterization of Multi-Point Nets in Electronic Designs [p. 344]

Dirk Stroobandt, Fadi I. Kurdahi

Important layout properties of electronic designs include interconnection length values, clock speed, area requirements, and power dissipation. A reliable estimation of those properties is essential for improving placement and routing techniques for digital circuits. Previous work on estimating design properties failed to take multi-point nets into account. All nets were assumed to be 2-point nets (especially for estimating the number of nets). In this paper we aim at characterizing multi-point nets in electronic designs. We develop a model for the behaviour of multi-point nets during the partitioning process. The resulting distribution of nets over their net degree is validated through comparison with benchmark data.

Formal Verification

HOOVER: Hardware Object-Oriented Verification [p. 351]

Mostafa M. Aref and Khaled M. Elleithy

In this paper a new formal hardware verification approach based on object-oriented techniques is presented. The HOOVER system (Hardware Object Oriented VERification) is described. A cell library of different hardware components has been implemented as classes. Components in the cell library are described at the transistor level, gate level, logical level, and functional level. The verification of a CMOS inverter and a 1-bit CMOS adder using HOOVER is given in the paper.

MDG-Based Verification by Retiming and Combinational Transformations [p. 356]

O.A. Mohamed, E. Cerny, and X. Song

Multiway Decision Graphs (MDGs) have been recently proposed as an efficient verification tool for RTL designs based on an efficient representation mechanism. In MDGs, a data value is represented by a single variable of abstract sort, and a data operation is represented by an uninterpreted function symbol. In this work we investigate the non-termination problem of MDG-based verification. We present a novel approach to dealing with the problem based on retiming and circuit transformations that preserve the behavior of the circuit. We demonstrate the effectiveness of our method on the example of the Island Tunnel Controller (ITC).
Keywords: Formal Verification, Multiway Decision Graphs, Retiming, Circuit Transformations, Non-termination.

Practical Considerations in Formal Equivalence Checking of PowerPC™ Microprocessors [p. 362]

A. Chandra, L.-C. Wang, and M. Abadir

Formal verification is increasingly becoming a part of the VLSI design methodology. Formally verifying a design guarantees 100% coverage and negates the need to do simulation. Theoretically, 100% coverage is very appealing and formal verification looks to be the panacea to solve the coverage problem. However, there are many practical considerations in deploying formal verification in real design environments. These considerations, if not evaluated, can lead to ineffective and even erroneous formal verification methodologies. In this paper we show how to make formal verification a successful part of a design methodology by paying attention to practical considerations and knowing the limitations of formal verification. We show the errors that can result from making overgeneralized assumptions and how they can be avoided. We do this in the context of the design of PowerPC microprocessors. We limit ourselves to a formal verification technique commonly used in our design methodology: Boolean equivalence checking.

Practical Approaches to the Automatic Verification of an ATM Switch Fabric Using VIS [p. 368]

J. Lu and S. Tahar

In this paper we present several practical methods for formally verifying an Asynchronous Transfer Mode (ATM) network switching fabric using the Verification Interacting with Synthesis (VIS) tool. We produced Verilog RTL behavioral and netlist structural descriptions of the switch fabric at different levels of hierarchy and established several abstracted models of the fabric. Using various techniques presented in the paper, we provided a number of relevant liveness and safety properties expressible in CTL, and accomplished their verification in reasonable CPU time. Moreover, we performed equivalence checking between the structural and behavioral descriptions of each submodule of the implementation hierarchy.

Design Methods

Performance Optimization of Self-Timed Circuits [p. 374]

M.A. Franklin and P. Prabhu

In this paper, we present methods for improving the performance of self-timed computation blocks. The Hybrid Completion method permits the design of a spectrum of completion circuits ranging from those based on pure bounded delays to those based on full complementary circuit development. This is achieved by using a subset of the outputs of the computation block to generate the overall completion signal. Thus, the extra circuitry for the completion signals of the other outputs is eliminated. The computation block's delay might also be reduced since fewer signals are required to generate the overall completion signal. The approach seeks to incorporate the area efficiency of the bounded delay approach and the operand-based delay sensitivity of the full complementary approach.

Stochastic Evolution Algorithm for Technology Mapping [p. 380]

A.S. Al-Mulhem, A. Amin, and H. Youssef

A new technology mapper (SELF-Map) for Look-Up Table (LUT) based Field Programmable Gate Arrays (FPGAs) is described. SELF-Map is based on the Stochastic Evolution (SE) algorithm. The state space model of the problem is defined, and a suitable cost function which allows optimization for area, delay, or area-delay combinations is proposed. Experimental results show that SELF-Map has better overall performance compared to other algorithms reported in the literature.

RCRS: A Framework for Loop Scheduling with Limited Number of Registers [p. 386]

Kaisheng Wang, Ted Zhihong Yu, Edwin H.-M. Sha

Many real-time applications such as multimedia and DSP systems require high throughput, so it is necessary to have special purpose designs for them. Loop pipelining is an effective approach to reduce the total execution time of loops. While most previous research concentrates on the scheduling of computation, experiments show that data access may give significant overhead if the register resource is limited. This paper studies the register constraint problem and presents Register Constrained Rotation Scheduling (RCRS), including the algorithm analyzing the number of required registers for loops and two classes of algorithms based on different assumptions. The first class is for loop scheduling with a given number of registers. If the number of registers is too stringent, the second class of algorithms is applied by inserting necessary LOAD/STORE operations into the loop schedule. Through a series of experiments, the RCRS algorithms are shown to achieve near optimal schedule length while satisfying register constraints.

A Quantitative Study of the Benefits of Area-I/O in FPGAs [p. 392]

Herwig Van Marck, Jo Depreitere, Dirk Stroobandt and Jan Van Campenhout

Designs targeted for FPGAs are becoming increasingly large and complex. The need for I/O often surpasses the number of I/O pads that can be provided at the perimeter of the FPGA chip. As a result, these designs have to be implemented in larger FPGAs, the size of which is fixed by the number of I/O pads and not by the logic needed, reducing the performance of the implementation. Providing FPGA chips with I/O pads that are spread out across the whole chip area drastically reduces this problem. In this paper we present a quantitative analysis of the impact of area-I/O in FPGAs.

Top-Down Design Using Cycle Based Simulation: An MPEG A/V Decoder Example [p. 400]

Dale E. Hocevar, Ching-Yu Hung, Dan Pickens and Sundararajan Sriram

This paper presents a discussion of a top-down VLSI design approach which involves system level performance modeling, block level cycle based simulation, RTL/VHDL simulation and gate level emulation. An MPEG-2 Audio/Video decoder design example illustrates the use of this top-down approach. Most of the discussion concentrates on the concept of block level cycle based (BLCB) simulation. HW/SW co-design also played an important role in this work and our approach towards such co-design is discussed as well.

Low Power

Low-Power Driven Scheduling and Binding [p. 406]

Jim Crenshaw and Majid Sarrafzadeh

We investigate the problem of exploiting signal correlation between operations to find a schedule and binding which minimizes switching. We propose several heuristics to solve the problem. Experimentally, we give an algorithm for scheduling communications on a bus, which reduces bus switching by up to 60% without increasing the number of cycles required for the schedule. Low-power scheduling efforts in the literature have focused on decreasing the number of cycles in the schedule so that the voltage required to run the resulting circuit can be lowered. However, the number of voltages supplied to a chip is likely to be limited, so among the processes to be implemented, typically only a few will determine the minimum voltages, and the rest will have slack in their schedules. Therefore it is interesting to inquire about the impact of scheduling which does not reduce the number of time steps in order to decrease switching. In this paper, we show that power-aware scheduling can lead to significant decreases in switching, often without an increase in the number of time steps required. The technique is general, and can be used to schedule operations on any kind of resource.
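
To make the idea of bus switching concrete, the following minimal sketch counts bit transitions between consecutive bus words and reorders a set of transfers with a nearest-neighbour heuristic. It assumes the transfers are mutually independent, so any order is legal, and that the bus initially holds zero; the heuristic and all names are illustrative assumptions, not the algorithm from the paper.

```python
def hamming(a: int, b: int) -> int:
    """Number of bus lines that toggle when the bus value changes from a to b."""
    return bin(a ^ b).count("1")


def total_switching(words, start=0):
    """Total bit transitions when `words` are driven onto the bus in order."""
    total, prev = 0, start
    for w in words:
        total += hamming(prev, w)
        prev = w
    return total


def greedy_order(words, start=0):
    """Reorder independent transfers: always send the word closest to the current bus value."""
    remaining = list(words)
    order, prev = [], start
    while remaining:
        nxt = min(remaining, key=lambda w: hamming(prev, w))
        remaining.remove(nxt)
        order.append(nxt)
        prev = nxt
    return order


transfers = [0b1111, 0b0001, 0b1110, 0b0011]
reordered = greedy_order(transfers)

print(total_switching(transfers))  # 14
print(total_switching(reordered))  # 5
```

Both schedules use four bus cycles; only the order changes, which is the kind of slack-free saving the abstract refers to.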

Effective Capacitance Macro-Modelling for Architectural-Level Power Estimation [p. 414]

Muhammad M. Khellah and M. I. Elmasry

This paper presents a simple, yet efficient method to characterize the effective capacitance in data-path macros for architectural-level power estimation. Given a library of hard-macros, a capacitance model based on linear regression is derived for each macro. A transistor-level tool is employed for capacitance extraction. The capacitance models can be used during architectural-level power estimation. Unlike previous approaches, our characterization methodology assumes no specific word-level statistics of the input data, requires little knowledge about the structure of the modules, allows the user to trade off accuracy and characterization time, and propagates effective capacitance directly from transistor-level (real) implementations. Simulation experiments on a set of data-path components with various sizes are performed. Compared to a previously published approach [1], our scheme significantly improves the accuracy of RTL power estimation and produces results within 15% of a transistor-level tool on average.

A Methodology for High Level Power Estimation and Exploration [p. 420]

V. Krishna and N. Ranganathan

Effective power reduction can be achieved at higher levels of design abstraction. A number of such techniques have been proposed for power optimization in the literature. These techniques use RT level templates which characterize the area, delay and power of the design. The templates are based on some knowledge of the logic block such as the number of nodes, levels and their interconnections. Methods which model the power consumption of a logic block whose internal details are not known are desirable to explore trade-offs early on in the design cycle. Recently, lower bounds for switching activity at the gate level based on decision theory have been proposed by the authors. This has been extended to derive the average switching activity of a module based solely on its functionality. The experimental results on ISCAS '85 benchmark circuits indicate that the approach gives reasonably accurate estimates at low computational cost. In this paper, we use the RT level estimates for power exploration at the behavioral level for various high level synthesis benchmarks. The experimental results show that appropriate design decisions can be taken at the high level to reduce the cost of redesigning which would be incurred if committed to a particular circuit structure.
Keywords: High Level Designs, Power Estimation, Low Power Designs, Switching Activity

How to Transform an Architectural Synthesis Tool for Low Power VLSI Designs [p. 426]

S. Gailhard, N. Julien, J.-Ph. Diguet, and E. Martin

High Level Synthesis (HLS) for Low Power VLSI design is a complex optimization problem due to the Area/Time/Power interdependence. As few low power design tools are available, a new approach providing a modular low power synthesis method is proposed. Although it is based for the moment on the generic architectural synthesis tool GAUT, the use of different "commercial" tools is possible. The GAUT_w HLS tool consists of low power modules: High level power dissipation estimation, Assignment, Module selection (operators and supply voltage), Optimization criteria, and Operators library. As an illustration, power saving factors on DWT algorithms are presented.

Database for CAD

Sharing Electronic Design Data Via Semantic Spaces [p. 432]

K.C. Davis, S. Venkatesan, and L.M.L. Delcambre

Electronic Design Automation (EDA) tools, such as layout generators and simulators, have generally focused on algorithms and techniques for hardware design. Data management aspects have not been emphasized, but the volume of data, heterogeneity of data formats, and the evolution/proliferation of tools have made data modeling and data interchange increasingly important research issues. The data sharing problem stems from the fact that related EDA tools are often used in various sequences to manipulate and annotate a single design. In a typical design environment, tools use file-based data storage, with limited data modeling capabilities, and primitive or non-existent query facilities. In order to support current tools, we wish to preserve the semantics of existing hardware description languages in rigorous data models; we propose to capture each existing language in a semantic space model. We view data interchange as a query against one semantic space that produces objects, i.e., query answers, in another semantic space. We define an integrating meta-model, the meta-space, and also define general query operators for transforming objects between semantic spaces. These query operators define both the intension and extension of a query result; the transformed data is described in the type system of the meta-space, thus providing explicit semantics for the shared data. Our modeling approach supports advanced and evolving applications, such as hardware/software codesign, through the ability to retrieve data resident in individual semantic spaces, as well as to share data in semantic spaces from different EDA sources.

VHDL-Based EDA Tool Implementation with Java [p. 440]

R. Miller

As part of ARPA's RASSP Technology Program, we developed a Hardware/Software CoSynthesis Algorithm. The initial prototype was developed in C++ using PCCTS, Tcl, Tcl/Tk, and Tcl-DP. When this environment became unwieldy due to changing hardware development platforms and software packages, a second prototype was built in Java. This paper describes architectural features of the prototype and how they were addressed in Java.

Standard Data Representations for VLSI Algorithm Development [p. 446]

D. Hertweck, M. Nica, S. Park, and C. Purdy

Because so many important problems arising in VLSI design are NP-hard, VLSI algorithms must employ randomization techniques or heuristics. Thus the process of analyzing a new algorithm or of comparing two algorithms is at present an experimental one. Consequently, progress in VLSI algorithm development must be based on references to standard benchmarks. Yet examination of the literature on specific problems, such as graph partitioning, shows that such standardization is not yet a reality. Here we describe a system, Circuitbase, which we are developing to address the standardization problem. Circuitbase will combine the extensive graph manipulation routines of Knuth's Stanford GraphBase package with actual circuit examples from the Benchmark Archives at CBL, standard routines for generating random examples of circuits, and standard methods for algorithm analysis. We describe Circuitbase versions of example behavioral, structural, and physical views of a VLSI circuit and discuss how Circuitbase can support modern VLSI design environments.

A Storage Structure for Graph-Oriented Databases Using an Array of Element Types [p. 452]

T. Hochin and T. Tsuji

This paper proposes a storage structure for graph-oriented databases called the flattened separable directory method. In this method, a data representing graph, which is a unit of representing graph, is primarily represented with an array of edge or node types. As every node or edge can be accessed without navigation, the values of nodes and/or edges can be quickly evaluated. Experimental evaluations support this characteristic, and show that data insertion performance is high and that less storage overhead is needed for graphs consisting of many node and edge types.