SIGDA, Super Compendium, GLSVLSI 2002, Abstracts

GLSVLSI 2002 ABSTRACTS

Sessions: [1] [2] [3] [4] [5] [6] [7]

(F) Full Paper
(S) Short Paper

Keynote Address

A VLSI System Perspective for Microprocessors Beyond 100 nm

Shekhar Borkar (Circuits Research Lab, Intel Corp.)

Microprocessor performance increased by five orders of magnitude in the last three decades. This was made possible by continued technology scaling, improving transistor performance to increase frequency, increasing integration capacity to realize complex architectures, and reducing energy consumed per logic operation to keep power dissipation within limit. The technology treadmill will continue to fulfill the microprocessor performance demand; however, with some adverse effects posing barriers. Therefore, performance at any cost will not be an option; significant improvements in efficiency of transistor utilization will be necessary. This paper will discuss potential solutions in all disciplines, such as microarchitecture, circuits, design technologies and methodologies, thermals, and power delivery to overcome these barriers in technologies beyond 100 nm.

Session 1: Low Power Design

Chair: Mircea Stan (University of Virginia)

(F) Power and CAD Considerations for the 1.75MByte, 1.2Ghz L2 Cache on the Alpha 21364 CPU [p. 1]

J. Grodstein (Compaq Computer, US), R. Rayess (Intel Corporation, USA), T. Truex, L. Shattuck (Compaq Computer, USA), S. Lowell, D. Bailey (Intel Corporation, USA), D. Bertucci (Compaq Computer, USA), G. Bischoff (Intel Corporation, USA), D. Dever (Compaq Computer, USA), M. Gowan (Intel Corporation, USA), R. Lane, B. Lilly (Compaq Computer, USA), K. Nagalla, R. Shah, E. Shriver (Intel Corporation, USA), S.-H. Yin, S. Morton (Compaq Computer, USA)

A 1.75 MByte L2 cache has been designed and fabricated as part of the Alpha 21364 microprocessor[1] (Figure 1), in a .18� bulk CMOS process. The cache was designed to run at 1.2 GHz, and pass-1 samples confirm this. While Alpha CPUs are known primarily for high speed, the combination of package constraints and a tight schedule forced careful attention to the integrated whole of power expenditure and the interaction of CAD with design. The cache consumes only 7% of total die power.
Categories and Subject Descriptors
B.3.1 [Memory Structures]: Semiconductor Memories - SRAM
B.2.2 [Memory Structures]: Design Styles - cache memory
B.7.1 [Integrated Circuits]: Types and Design Styles - Microprocessors.
General Terms
Design, Performance, Verification.
Keywords
Cache memory, CPU, low-power, timing verification, logic verification.

(F) Multi-Voltage Low Power Convolvers Using the Polynomial Residue Number System [p. 7]

V. Paliouras (University of Patras, Greece), A. Skavantzos (Louisiana State University, USA), T. Stouraitis (University of Patras, Greece)

A novel approach for the reduction of the power dissipated in a signal processing application is introduced in this paper. By exploiting the properties of the Polynomial Residue Number System (PRNS) and of the arithmetic modulo (2ⁿ +1), the power dissipation of implementing cyclic convolution is reduced up to four times. Furthermore, the corresponding powerxdelay product is reduced up to 2.4 times, while a simultaneous reduction of area cost is achieved. The particular performance improvement becomes possible by introducing a way to minimize the forward and inverse conversion overhead associated with PRNS. The introduced minimization exploits the fact that for the conversions for particular lengths of data sequences and particular moduli, only multiplications with powers of two and additions are required, thus leading to low implementation complexity. In addition multiple supply voltages are utilized to further reduce power dissipation by more than 30% for particular cases. Formulas that return the applicable supply voltage values per PRNS channel are derived in this paper.
Categories and Subject Descriptors
B.7.1 [Hardware]: Types and Design Styles; B.5.1 [Hardware]: Design; C.3 [Computer System Organization] Special-purpose and application-based systems - signal processing systems
General Terms
Design
Keywords
Computer arithmetic, Polynomial Residue Number System (PRNS), Low power design, signal processing

(F) Properties of On-chip Inductive Current Loops [p. 12]

Andrey V. Mezhiba, Eby G. Friedman (University of Rochester, USA)

The variation of inductance with circuit length is investigated in this paper. The nonlinear variation of inductance with length is shown to be a result of inductive coupling among circuit segments. If the distance between the forward and return current paths of a current loop is much smaller than the loop length, the inductive coupling to the forward current is similar to the coupling to the return current, resulting in negligible coupling. The inductance of these circuits therefore varies approximately linearly with length. Similarly, the effective inductive coupling between two parallel current loops is reduced through cancellation and has a negligible effect on the net inductance of a circuit. As a general rule, the inductance of circuits where the distance between the forward and return current is much smaller than the characteristic dimensions of the circuit scales linearly with circuit dimensions. This linear behavior can be used to simplify the inductance extraction and circuit analysis process. Categories and Subject Descriptors
B.7.m [Integrated Circuits]: Miscellaneous - on-chip inductance, partial inductance, inductive coupling
General Terms
Design
Keywords
inductance, inductive coupling, loop inductance

(S) Enhanced Clustered Voltage Scaling for Low Power [p. 18]

Monica Donno, Luca Macchiarulo, Alberto Macii, Enrico Macii, Massimo Poncino (Politecnico Di Torino, Italy)

This paper presents a voltage scaling approach that is based on an enhanced variant of clustered voltage scaling originally proposed by Usami and Horowitz ([1]). The results show that substituting the original depth first strategy with a breadth first one results in improved speed and quality of results. Data are validated through power and timing analysis performed with a commercial tool.
Categories and Subjects Descriptors
B. Hardware, B.7 Integrated Circuits, B.7.0 General.
General Terms
Design.

Session 2: Energy and Delay Considerations

Chair: Albert Macii (Politecnico di Torino)

(F) Unified Architecture Level Energy-Efficiency Metric [p. 24]

Victor Zyuban (T.J. Watson Research Center, USA)

The development of power-efficient microprocessors presents the need to consider power consumption at early stages of design, particularly at the ISA and microarchitecture definition stages, where the potential for power savings is more significant than at lower level stages, and the opportunity for making power-performance tradeoffs is the largest. Design modifications to the ISA and microarchitecture, however, affect most (if not all) parameters of the design, including architectural speed, code density, clocking rate and power. A reliable metric is required to make knowledgeable power-performance tradeoffs in this multi-dimensional space. This paper derives a unified energy-efficiency metric for evaluating ISA and microarchitecture features, which subsumes other commonly used power-performance metrics as special cases of a more general equation. This new metric is derived based on an analysis of a multi-dimensional power optimization problem, and the resulting formula involves only relative changes in the characteristics of a processor, enabling its application at the early stages of the design.
Categories and Subject Descriptors
C.1.0 [Processor Architectures]: General; C.5.3 [Microcomputers]: Microprocessors; B.7.1 [Types and Design Styles]: Microprocessors and microcomputers,VLSI; C.0 [General]: Modeling of computer architecture; C.1.3 [Other Architecture Styles]: Pipeline processors
General Terms
Design, Performance
Keywords
Energy, power, performance, energy-efficiency, metric, architecture, microarchitecture

(F) Fast and Accurate Wire Delay Estimation for Physical Synthesis of Large ASICs [p. 30]

Ruchir Puri, David S. Kung (IBM Thomas J. Watson Research Center, USA), Anthony D. Drumm (IBM Corporation, USA)

Interconnect delays represent an increasingly dominant portion of overall circuit delays. During timing-driven physical synthesis process, timing analysis is repeatedly performed over several hundred thousand components. Thus, fast and accurate estimation of interconnect delays is crucial. Traditionally, lumped and elmore delay models have been widely used for computing interconnect delays in physical synthesis due to their computational efficiency. However, these delay models are known to be inaccurate since they ignore slew and resistive shielding effects. In this paper, we propose a new iterative refinement based delay estimation approach that considers resistive shielding along with driver slew. Experiments results show that the proposed approach gives not only highly accurate results for far end RC-line delays but also compares very favorably to more difficult to match source end delays and source slews. In addition, use of the proposed delay model in physical synthesis yields significant performance improvement on several large industrial ASICs.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits
General Terms
Algorithms, Design
Keywords
Integrated Circuit Design, Wire Delay, Estimation, Placement Driven Synthesis

(F) A Compact Delay Model for Series-Connected MOSFETs [p. 37]

Kaveh Shakeri, James D. Meindl (Georgia Institute of Technology, USA)

A compact delay model for series connected MOSFETs has been derived. This model enables accurate prediction of worst-case delay of different logic families such as dynamic logic. It also provides insight into delay change as the device parameters change. Key results show that the relative delay of series connected MOSFETs is almost invariant for different generations of technology.

(S) A Decoupling Method for Analysis of Coupled RLC Interconnects [p. 41]

Jun Chen, Lei He (University of Wisconsin, Madison, USA)

In this paper we present an efficient decoupling model for on-chip interconnect analysis. This model decouples multiple RLC transmission lines into independent lines with separate drivers and receivers. Based on this model we propose an efficient algorithm to solve the far end responses of multiple RLC lines. Experiments show good matching between our decoupling model and SPICE simulation. Based on the model, we further develop an N_max algorithm to quickly determine the noise amplitudes of far end responses. Experiments show that N_max algorithm gives conservative but reasonably accurate results compared to SPICE simulation.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Verification
General Terms
Design, Performance, Theory, Verification

(S) Low Swing Dual Threshold Voltage Domino Logic [p. 47]

Volkan Kursun, Eby G. Friedman (University of Rochester, USA)

A low swing domino logic technique is proposed to decrease power consumption without sacrificing noise immunity. With the proposed low swing domino logic circuit technique, active power consumption is reduced by up to 9.4% while improving the noise immunity by 2.6% as compared to standard domino logic circuits. It is also shown that by applying a low swing contention reduction technique, the power savings can be further increased by 6.7% while the delay can be improved by 8.6%. A simple and efficient dual threshold voltage (dual-V_t) circuit technique that incorporates low swing signals is also proposed. It is shown that the proposed dual-V_t technique reduces the standby leakage current by approximately 235 times while offering enhanced delay characteristics as compared to a standard low threshold voltage implementation.
Keywords
Domino logic, low voltage swing, dual-V_t, low power, dynamic circuits.

Session 3: Testing and Fault-Tolerance

Chair: Zhigang (David) Pan (IBM)

(F) An Error Simulation Based Approach to Measure Error Coverage of Formal Properties [p. 53]

P. Azzoni (Strada le Grazie, 15, Italy), A. Fedeli (STMicroelectronics, Italy), F. Fummi (Strada le Grazie, 15, Italy), G. Pravadelli (Strada le Grazie, 15, Italy), U. Rossi, F. Toto (STMicroelectronics, Italy)

Many approaches have been proposed for digital system verification, either based on simulation strategies or on formal verification techniques. Both of them show advantages and drawbacks and new mixed approaches have been presented in order to improve the verification process. Specifically, the adoption of formal methods still lacks a coverage metrics to let the verification engineer get a measure of which portion of the circuit is already covered by the written properties that far and which parts still need to be addressed. The present paper describes a new simulation based methodology aimed at measuring the error coverage achieved by temporal assertions proved by model checking. The approach has been applied to the description of a protocol converter block, and some preliminary results are presented in the paper.

(F) Protected IP-Core Test Generation [p. 59]

Alessandro Fin, Franco Fummi (Strada le Grazie, 15, Italy)

Design simplification is becoming necessary to respect the target time-to-market of SoCs, and this goal can be obtained by using predesigned IP-cores. However, their correct integration in a design implies more complex verification problems. IPs are usually provided with their own test patterns that can be used only by applying design for testability techniques onto the chip. Whenever physical faults must be detected, this approach is reasonable, even if it implies circuit performance degradation. However, it is completely useless at the design level, when the correct integration of the IPs into the global design must be investigated. At this level, proprietary test sequences must be generated in relation to the actual use of the IPs into the design. In this paper, the SystemC language is exploited to define a design verification framework for integration test of IP-cores. Intellectual properties of cores are guaranteed by adopting a client/server simulation architecture and by allowing functional test generation on faulty IP-core models without disclosing their internal structure. The methodology can be applied to mixed descriptions based on VHDL and SystemC, since an abstraction layer has been defined allowing clients and/or servers to be indifferently described in VHDL or SystemC. Finally, remote simulation can be also performed locally to avoid bandwidth bottleneck and the test generation process can be indifferently applied at lower abstraction levels such as RT and gate.

(F) Test Generation for Resistive Opens in CMOS [p. 65]

Arun Krishnamachary, Jacob. A. Abraham (The University of Texas at Austin, USA)

This paper develops new techniques for detecting both stuck-open faults and resistive open faults, which result in increased delays along some paths. The improved detection of CMOS open defects is made possible by a new delay fault model which combines the advantages of the gate delay fault model and the path delay fault model. We develop a test generation methodology for this fault model which enables generation of test vectors that test a percentage of the longest sensitizable paths in the design and also test each net for spot defects through their longest sensitizable paths. Real delay values are used to determine the true critical paths in the circuit. The high degree of effectiveness of this fault model under realistic assumptions for process characteristics is first enumerated, and experimental results demonstrate the improved coverage possible with the proposed approach.
Categories and Subject Descriptors
B.8.1 [Hardware]: Reliability, Testing and Fault-Tolerance
Keywords
delay testing, resistive opens, defect detection

(S) Self-Checking Sequential Circuits with Self-Healing Ability [p. 71]

Ilya Levin, Vladimir Ostrovsky, Sergey Ostanin (Tel-Aviv University, Israel), Mark Karpovsky (Boston University, USA)

In this paper we deal with totally self-checking (TSC) synchronous sequential circuits (SSCs), that are able to recover after an occurrence of a fault. We call SSC owing this property as a self-healing SSC. A concept of a partially monotonic SSC is used in the paper. It is shown that the partially monotonic SSCs satisfy the self-healing property. A novel reduced m-out-of-n code is developed. It is proposed applying this code to the synthesis of a TSC checker for the state monotonic SSCs. The proposed method of synthesis is based on a LUT implementation of monotonic functions.
For most circuits in a standard benchmark set, the proposed approach leads to a reduction of about 10-20% of the overhead as compared with the traditional methods.
Keywords: Self-checking, synchronous sequential circuit, self-healing unordered coding, checker.

(S) Minimizing Concurrent Test Time in SoC's by Balancing Resource Usage [p. 77]

Dan Zhao, Shambhu Upadhyaya (SUNY at Buffalo, USA), M. Margala (University of Rochester, USA)

We present a novel test scheduling algorithm for embedded core-based SoC's. Given a system integrated with a set of cores and a set of test resources, we select a test for each core from a set of alternative test sets, and schedule it in a way that evenly balances the resource usage, and ultimately reduce the test application time. Furthermore, we propose a novel approach that groups the cores and assigns higher priority to those with smaller number of alternate test sets. In addition, we also extend the algorithm to allow multiple test sets selection from a set of alternatives to facilitate testing for various fault models.
Categories and Subject Descriptors
B.7.1 [Hardware]: Types and Design Styles; B.8.1 [Hardware]: Reliability, Testing, and Fault-Tolerance
General Terms
Algorithm, Management and Reliability
Keywords
Resource balancing, system-on-a-chip test scheduling, test sets selection

Session 4: VLSI Design

Chair: Martin Margala (University of Rochester)

(F) Efficient Implementation of a Complex +/- 1 Multiplier [p. 83]

Boris D. Andreev, Eby G. Friedman, Edward L. Titlebaum (University of Rochester, USA)

A complex �1 multiplier is an integral element in modern CDMA communication systems, specifically as a pseudonoise code scrambler/descrambler. Therefore, an efficient implementation is essential to reduce the critical path delay, power, and area of wireless receivers. A new architecture is proposed to achieve this complex multiplier function. Tradeoffs and design solutions as well as the interface with subsequent arithmetic circuits are discussed. Simulations exhibit a significant speed improvement as compared to alternative architectures. These results are also applicable to other arithmetic circuits.
Categories and Subject Descriptors
B.2.4 [Arithmetic and Logic Structures] High-Speed Arithmetic: Algorithms
B.7.1 [Integrated circuits] Types and Design Styles: Algorithms implemented in hardware, VLSI
I.1.1 [Symbolic and Algebraic Manipulation] Expressions and Their Representation: Representations, Simplification of expressions
General Terms
Design, Algorithms
Keywords
VLSI, CDMA, PN code, scrambler, redundant arithmetic

(F) On the High-Speed VLSI Implementation of Errors-and-Erasures Correcting Reed-Solomon Decoders [p. 89]

Tong Zhang, Keshab K. Parhi (University of Minnesota, USA)

Recently a novel algorithm transformation was proposed to reduce the critical path of Berlekamp-Massey algorithm implementation for errors-alone Reed-Solomon decoding. In this paper, we apply the same methodology to transform the Berlekamp-Massey algorithm for errors-and-erasures RS decoding. We present a regular hardware architecture to implement the reformulated Berlekamp-Massey algorithm, which can achieve high throughput. Moreover, an operation scheduling scheme is proposed to further reduce the hardware complexity without loss of throughput.
Categories and Subject Descriptors
B.2.4 [ARITHMETIC AND LOGIC STRUCTURES]: High-Speed Arithmetic
General Terms
Design
Keywords
Reed-Solomon codes, Berlekamp-Massey algorithm, erasure, VLSI architectures

(F) Design Limitations in Deep Sub-0.1um CMOS SRAM [p. 94]

Robert K. Grube, Qi Wang, Sung-Mo Kang (University of California, Santa Cruz, USA)

Leakage effects in deep sub-0.1�m CMOS technologies are of critical concern to designers of high-performance integrated circuits. Recent estimates [1] of a 7.5x increase in leakage current per chip generation; along with several proposals for energy-efficient cache architectures that unfortunately do not address static leakage-energy issues [2], [3], [4], have heightened concerns over the functionality and stability of future high-performance SRAM cache designs. Each of these future technology generations, in addition to having increased shortchannel effects (SCEs) will now also suffer from gate leakage currents across physically and electrically thin (~1.5nm) SiO2 gate dielectrics. In this paper we demonstrate the limits of this scaling on the operational behavior of on-chip SRAM cache designs, and briefly discuss the impact of these results on high-performance memory architectures.
Categories and Subject Descriptors
Hardware - Integrated Circuits - Types and Design Styles
(B.7.1): Advanced Technologies; Hardware - Memory Structures � Semiconductor Memories (B.3.1): Static Memory (SRAM)
General Terms
Performance, Design
Keywords
Gate leakage, tunneling currents, GIDL, On-Chip Cache

(S) Reconfigurable Repetitive Padding Unit [p. 98]

Georgi Kuzmanov, Stamatis Vassiliadis (Delft University of Technology, The Netherlands)

This paper proposes a reconfigurable processing unit, which performs the MPEG-4 repetitive padding algorithm in real time. The padding unit has been implemented as a scalable systolic structure of processing elements. A generic array of PE has been described in VHDL, and the functionality of the unit has been validated by simulations. In order to determine the chip area and speed of the padding structure, we have synthesized the structure for two FPGA families - Xilinx and Altera. The simulation results indicate that the proposed padding unit can operate in a wide frequency range, depending on the implemented configuration. It is shown that it can process from tens up to hundreds of thousands MPEG-4 macroblocks per second. This allows the real-time requirements of all MPEG-4 profiles and levels to be met efficiently at trivial hardware costs. Finally, the trade-off between chip-area and operating speed is discussed and possible configuration alternatives are proposed. Categories and Subject Descriptors
B.6.1 [Logic Design]: Design Styles - logic arrays; parallel circuits; B.5.1 [RTL Implementation]: Design - arithmetic and logic units; data-path design; styles
General Terms
Design, Performance

Session 5: VLSI Circuits

Chair: Eby Friedman (University of Rochester)

(F) Energy-Delay Efficiency of VLSI Computations [p. 104]

Paul I. Pénzes, Alain J. Martin (California Institute of Technology, Pasadena, USA)

In this paper we introduce an energy-delay efficiency metric that captures any trade-off between the energy and the delay of the computation. We apply this new concept to the parallel and sequential composition of circuits in general and in particular to circuits optimized through transistor sizing. We bound the delay and energy of the optimized circuit and we give necessary and sufficient conditions under which these bounds are reached. We also give necessary and sufficient conditions under which subcomponents of a design can be optimized independently so as to yield global optimum when recomposed. We demonstrate the utility of a minimum-energy function to capture high level compositional properties of circuits. The use of this minimum-energy function yields practical insight into ways of improving overall energy-delay efficiency of circuits.
Categories and Subject Descriptors
B.6 [Hardware]: Logic Design; B.6.3 [Logic Design]: Design Aids - Optimization
General Terms
Theory
Keywords
Energy-delay optimization, transistor sizing

(F) Active Shields: A New Approach to Shielding Global Wires [p. 112]

Himanshu Kaul, Dennis Sylvester, David Blaauw (University of Michigan, Ann Arbor, USA)

A new shielding scheme, active shielding, is proposed for reducing delays on interconnects. As opposed to conventional (passive) shielding, the active shielding approach helps to speed up signal propagation on a wire by ensuring in-phase switching adjacent nets. Results show that the active shielding scheme improves performance by up to 16% compared to passive shields and up to 29% compared to unshielded wires. When signal slopes at the end of the line are compared, savings of up to 38% and 27% can be achieved when compared to passive shields and unshielded wires, respectively.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Placement and Routing.
General Terms
Design

(F) Variable-Segment and Variable-Driver Parallel Regeneration Techniques for RLC VLSI Interconnects [p. 118]

Falah R. Awwad, Mohamed Nekili (Concordia University, Montréal, Canada)

Repeaters are now widely used to enhance the performance of long On-Chip interconnects in CMOS VLSI. For RC-modeled interconnects, parallel repeaters have proved to be superior to serial ones. In this paper, a Variable-Segment Regeneration Technique is introduced and compared with a Variable-driver Parallel Technique, a recently proposed transparent repeater and with three conventional techniques. HSpice Simulations using a 0.25 mm TSMC technology show that both the variable-segment and variable-driver techniques feature 62% time delay saving and 354% Area-Delay product saving over the transparent repeater, and are superior to all conventional techniques. However, our new variable-segment technique is characterized by a 116% Area-Delay product saving over the variable-driver technique. Thus, making it the most performant in the field of high-performance RLC interconnect regeneration. The simulation results confirm the superiority of the parallel regeneration technique over the serial ones.
Keywords
VLSI, RLC Interconnect, Parallel Regeneration, Repeater.

(S) Selective-Run Built-In Self-Test Using an Embedded Processor [p. 124]

Sungbae Hwang, Jacob A. Abraham (The University of Texas at Austin, USA)

Many systems-on-a-chip (SOCs) include processors as central units to implement diverse algorithms and control peripheral units such as embedded cores. The computing power of the embedded processor can be used to self-test its own functions as well as to test the other cores within the chip boundary. In BIST methodology, pseudo-random pattern testing can reduce the memory requirements. In addition to general pseudo-random pattern testing, this paper proposes and evaluates a novel selective-random pattern test technique. This technique increases the fault coverage while significantly reducing test application time. This also greatly decreases the memory requirement compared to traditional BIST schemes. The cost for extra hardware is low and the technique is easily integrated with parallel scan and boundary scan designs.
Categories and Subject Descriptors
B.7.3 [Integrated Circuits]: Reliability and Testing - Built-in tests, Test generation
Keywords
SOC testing, built-in self-test, design for testability, processor-based testing, pseudo-random number generator.

Session 6: Design Automation

Chair: Ken Shepard (Columbia University)

(F) Board-Level Multiterminal Net Assignment [p. 130]

Xiaoyu Song (Portland State University), William N. N. Hung (Intel Corporation, USA), Alan Mishchenko, Malgorzata Chrzanowska-Jeske (Portland State University), Alan Coppola (Cypress Semiconductor, USA), Andrew Kennings (University of Waterloo, Canada)

The paper presents a satisfiability-based method for solving the board-level multiterminal net routing problem in Clos-Folded FPGA based logic emulation systems. The approach transforms the FPGA board-level routing task into a single, large Boolean equation with the property that any assignment of input variables that satisfies the equation specifies a valid routing. The approach considers all nets simultaneously and the absence of a satisfying assignment implies that the layout is unroutable.We use two of the fastest SAT solvers: Chaff and DLM to perform our experiments. Empirical results show that the method is time-efficient and applicable to large layout problem instances.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - placement and routing
General Terms
Algorithms, Design, Experimentation, Measurement, Verification

(F) Minimizing Resources in a Repeating Schedule for a Split-Node Data-Flow Graph [p. 136]

Timothy W. O'Neil, Edwin H.-M. Sha (University of Texas at Dallas, USA)

Many computation-intensive or recursive applications commonly found in digital signal processing and image processing applications can be represented by data-flow graphs (DFGs). In our previous work, we proposed a new technique, extended retiming, which can be combined with minimal unfolding to transform a DFG into one which is rate-optimal. The result, however, is a DFG with split nodes, a concise representation for pipelined schedules. This model and the extraction of the pipelined schedule it represents have heretofore not been explored. In this paper, we demonstrate one scheduling algorithm for such graphs, and then discuss a way to reduce the hardware requirements of the resulting schedule. In the process, we state and prove a tight upper bound on the minimum number of processors required to execute the static schedule produced by our algorithms. Finally, we demonstrate our methods on a specific example.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided Design (CAD)
General Terms
Algorithms, Design, Performance, Theory

(F) A New Look at Hardware Maze Routing [p. 142]

John A. Nestor (Lafayette College, USA)

This paper describes a new design for a hardware accelerator to support grid-based Maze Routing. Based on the direct mapped approach of Breuer and Shamsa [3], this work refines their design to substantially reduce the hardware requirements of each processing element while at the same time adding support for multilayer routing and fast iterative routing. An RTL implementation has been developed for this design in VHDL, and initial results show promise for its realization using ASIC, custom, or FPGA technology.

(S) Novel Interconnect Modeling by Using High-Order Compact Finite Difference Methods [p. 148]

Qinwei Xu and Pinaki Mazumder (University of Michigan, Ann Arbor, USA)

The high-order compact finite difference (HCFD) method is adapted for interconnect modeling. Based on the compact finite difference method, the HCFD method employs the Chebyshev polynomials to construct the approximation framework for interconnect discretization, and leads to improved equivalent-circuit models. The HCFD-based modeling requires far fewer intervening grid points for building an accurate discrete model of the transmission line than other numerical methods like traditional Finite Difference (FD) method. It is believed that given the number of state variables, the presented method gives more accurate results than other known passive discrete modeling methods. The theoretical proof shows that HCFD-based modeling preserves the passivity.

(S) AQUASUN: Adaptive Window Query Processing in CAD Applications for Physical Design and Verification [p. 153]

Michiel De Wilde, Dirk Stroobandt, J. Van Campenhout, Peter Verplaetse (Ghent University, Belgium)

CAD applications for physical design and verification very often require enumerating all layout objects whose bounding box intersects an axis-aligned rectangular area. A number of multidimensional access methods exist to process such window queries. The performance of some important design and verification algorithms heavily depends on the processing speed of the used access method. For complex layouts, these methods require huge amounts of resident memory to attain this speed. In this paper, we present a new access method called aquasun, which brings a significant query processing performance improvement over other adaptive methods-methods which can cope with a continuously changing layout. These methods generally descend from the database world and are designed to perform the equivalent query in n-dimensional space. Our method is specifically tailored to two dimensions, exploiting 2D optimisations that significantly accelerate window queries within oblong objects like PCB tracks. Furthermore, aquasun makes use of an efficient compression technique which greatly cuts down on memory usage.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids|Graphics;
H.2.2 [Database Management]: Physical Design|Access methods; E.1 [Data Structures]: Trees
General Terms
Algorithms, Performance
Keywords
Multidimensional access methods, Rectangle indexes

Session 7: Potpourri

Chair: Igor Markov (University of Michigan)

(F) Term Ordering Problem on MDG [p. 160]

Yi Feng, Eduard Cerny (Université de Montréal, Canada)

As an efficient representation of Extended Finite State Machines, Multiway Decision Graphs (MDG) are suitable for automatic hardware verification of Register Transfer Level (RTL) designs. However, in some cases, MDG-based verification suffers from the state explosion problem. Some of cases are caused by the standard order used by MDG to order cross-terms that have the same top-level function symbol. These terms usually label decision nodes and must be ordered. We call this kind of state explosion the standard term ordering problem. A solution based on function renaming and cross-term rewriting is proposed in this paper. Experimental results show that this solution can solve the problem completely and thus increase the range of circuits that can be verified by MDG.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - verification.
General Terms
Verification.
Keywords
Multiway Decision Graphs, Standard term ordering, Variable Ordering, First-order terms, Function symbols, Function renaming, Term rewriting.

(S) A Low Power Direct Digital Frequency Synthesizer with 60 dBc Spectral Purity [p. 166]

J.M.P. Langlois, Dhamin Al-Khalili (Royal Military College of Canada, Canada)

We present a low-power sine-output Direct Digital Frequency Synthesizer (DDFS) realized in 0.18 mm CMOS that achieves 60 dBc spectral purity from DC to the Nyquist frequency. No ROM or multipliers are used, but an external DAC is required if an analog output is desired. Power consumption is 10 mW for a 100 MHz clock, which is significantly less than figures reported previously. System complexity is greatly reduced by using an efficient linear interpolation scheme to approximate a sinusoid function. This has resulted in silicon area utilization of 0.011 mm². The design would be suitable as an IP core in a low power digital RF transceiver ASIC.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - algorithms implemented in hardware, VLSI.
General Terms
Algorithms, Design, Performance.
Keywords
DDFS, DDS, phase to sine amplitude conversion, low power

(S) Novel Design and Verification of a 16 x 16-b Self-Repairable Reconfigurable Inner Product Processor [p. 172]

Rong Lin (SUNY-Geneseo, USA), Martin Margala (University of Rochester, USA)

A novel self-repairable and reconfigurable inner-product processor with low-power, fast CMOS circuits and DFT techniques is presented. It takes the advantage of recently proposed decomposition based arithmetic circuit design approach for simple implementation of the reconfigurations, component replacements, and high-quality tests. The processor can be dynamically reconfigured for two types operations: 4 x 8 x 8-b inner product computation and 16 x 16-b multiplication. The self-repair is provided by choosing a fault-free one from 17 possible architectures during the test, which covers more than 52% transistors for the specified faults. Only one extra bit is needed for all reconfigurations, repairs, and tests. The proposed exhaustive DFT technique greatly reduces the test vector length, from 17*2³² to 1.5*2¹³, which is as short as that required by the pseudo-exhaustive DFT method recently reported in literature.
Keywords
Reconfigurable, Decomposition Algorithms, Self-Repair, VLSI, Arithmetic Circuits, Image Processing, Fault Tolerance

(S) Computing Walsh, Arithmetic and Reed-Muller Spectral Decision Diagrams Using Graph Transformations [p. 178]

W.J. Townsend, M. A. Thornton (Mississippi State University), R. Drechsler (University of Bremen, Germany), D. M. Miller (University of Victoria, Canada)

Spectral techniques have found many applications in computer-aided design, including synthesis, verification, and testing. Decision diagram representations permit spectral coefficients to be calculated via graph-based algorithms. In this paper, algorithms are described for transforming multi-output functions to produce Walsh, arithmetic, and Reed-Muller spectral decision diagrams and the experimental results of those implementations are presented.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - automatic synthesis, optimization, switching theory, verification.
J.6. [Computer-Aided Engineering]: Computer-aided design.
General Terms
Algorithm, Design, Verification.
Keywords
Data Structures, Decision Diagrams, Discrete Functions, Spectral Methods.