|
GLSVLSI 2002 ABSTRACTS
Sessions:
[1]
[2]
[3]
[4]
[5]
[6]
[7]
(F) Full Paper
(S) Short Paper
-
A VLSI System Perspective for Microprocessors Beyond 100 nm
-
Shekhar Borkar (Circuits Research Lab, Intel Corp.)
Microprocessor performance increased by five orders of magnitude in the last three decades. This was made possible
by continued technology scaling, improving transistor performance to increase frequency, increasing integration
capacity to realize complex architectures, and reducing energy consumed per logic operation to keep power
dissipation within limit. The technology treadmill will continue to fulfill the microprocessor performance demand;
however, with some adverse effects posing barriers. Therefore, performance at any cost will not be an option;
significant improvements in efficiency of transistor utilization will be necessary. This paper will discuss potential
solutions in all disciplines, such as microarchitecture, circuits, design technologies and methodologies, thermals, and
power delivery to overcome these barriers in technologies beyond 100 nm.
Chair: Mircea Stan (University of Virginia)
-
(F) Power and CAD Considerations for the 1.75MByte, 1.2Ghz L2 Cache on the Alpha
21364 CPU [p. 1]
-
J. Grodstein (Compaq Computer, US), R. Rayess (Intel Corporation, USA),
T. Truex, L. Shattuck (Compaq Computer, USA), S. Lowell, D. Bailey (Intel Corporation, USA), D. Bertucci (Compaq Computer, USA), G. Bischoff (Intel Corporation, USA), D. Dever (Compaq Computer, USA), M. Gowan (Intel Corporation, USA),
R. Lane, B. Lilly (Compaq Computer, USA), K. Nagalla, R. Shah, E. Shriver
(Intel Corporation, USA), S.-H. Yin, S. Morton (Compaq Computer, USA)
A 1.75 MByte L2 cache has been designed and fabricated as part
of the Alpha 21364 microprocessor[1] (Figure 1), in a .18µ bulk
CMOS process. The cache was designed to run at 1.2 GHz, and
pass-1 samples confirm this. While Alpha CPUs are known primarily
for high speed, the combination of package constraints and
a tight schedule forced careful attention to the integrated whole of
power expenditure and the interaction of CAD with design. The
cache consumes only 7% of total die power.
Categories and Subject Descriptors
B.3.1 [Memory Structures]: Semiconductor Memories - SRAM
B.2.2 [Memory Structures]: Design Styles - cache memory
B.7.1 [Integrated Circuits]: Types and Design Styles - Microprocessors.
General Terms
Design, Performance, Verification.
Keywords
Cache memory, CPU, low-power, timing verification, logic verification.
-
(F) Multi-Voltage Low Power Convolvers Using the Polynomial Residue Number System [p. 7]
-
V. Paliouras (University of Patras, Greece), A. Skavantzos (Louisiana
State University, USA), T. Stouraitis (University of Patras, Greece)
A novel approach for the reduction of the power dissipated in
a signal processing application is introduced in this paper. By
exploiting the properties of the Polynomial Residue Number
System (PRNS) and of the arithmetic modulo (2n +1), the power
dissipation of implementing cyclic convolution is reduced up to
four times. Furthermore, the corresponding powerxdelay product
is reduced up to 2.4 times, while a simultaneous reduction of
area cost is achieved. The particular performance improvement
becomes possible by introducing a way to minimize the forward
and inverse conversion overhead associated with PRNS. The
introduced minimization exploits the fact that for the conversions
for particular lengths of data sequences and particular moduli, only
multiplications with powers of two and additions are required, thus
leading to low implementation complexity. In addition multiple
supply voltages are utilized to further reduce power dissipation
by more than 30% for particular cases. Formulas that return the
applicable supply voltage values per PRNS channel are derived in
this paper.
Categories and Subject Descriptors
B.7.1 [Hardware]: Types and Design Styles; B.5.1 [Hardware]:
Design; C.3 [Computer System Organization] Special-purpose
and application-based systems - signal processing systems
General Terms
Design
Keywords
Computer arithmetic, Polynomial Residue Number System
(PRNS), Low power design, signal processing
-
(F) Properties of On-chip Inductive Current Loops [p. 12]
-
Andrey V. Mezhiba, Eby G. Friedman (University of Rochester, USA)
The variation of inductance with circuit length is investigated in this
paper. The nonlinear variation of inductance with length is shown
to be a result of inductive coupling among circuit segments. If the
distance between the forward and return current paths of a current
loop is much smaller than the loop length, the inductive coupling to
the forward current is similar to the coupling to the return current,
resulting in negligible coupling. The inductance of these circuits
therefore varies approximately linearly with length. Similarly, the
effective inductive coupling between two parallel current loops is
reduced through cancellation and has a negligible effect on the net
inductance of a circuit. As a general rule, the inductance of circuits
where the distance between the forward and return current is much
smaller than the characteristic dimensions of the circuit scales linearly
with circuit dimensions. This linear behavior can be used to
simplify the inductance extraction and circuit analysis process.
Categories and Subject Descriptors
B.7.m [Integrated Circuits]: Miscellaneous - on-chip inductance,
partial inductance, inductive coupling
General Terms
Design
Keywords
inductance, inductive coupling, loop inductance
-
(S) Enhanced Clustered Voltage Scaling for Low Power [p. 18]
-
Monica Donno, Luca Macchiarulo, Alberto Macii, Enrico Macii, Massimo Poncino
(Politecnico Di Torino, Italy)
This paper presents a voltage scaling approach that is based on an enhanced
variant of clustered voltage scaling originally proposed by Usami and
Horowitz ([1]). The results show that substituting the original depth first
strategy with a breadth first one results in improved speed and quality of
results. Data are validated through power and timing analysis performed
with a commercial tool.
Categories and Subjects Descriptors
B. Hardware, B.7 Integrated Circuits, B.7.0 General.
General Terms
Design.
Chair: Albert Macii (Politecnico di Torino)
-
(F) Unified Architecture Level Energy-Efficiency Metric [p. 24]
-
Victor Zyuban (T.J. Watson Research Center, USA)
The development of power-efficient microprocessors presents the
need to consider power consumption at early stages of design, particularly
at the ISA and microarchitecture definition stages, where
the potential for power savings is more significant than at lower level
stages, and the opportunity for making power-performance
tradeoffs is the largest. Design modifications to the ISA and microarchitecture,
however, affect most (if not all) parameters of the
design, including architectural speed, code density, clocking rate
and power. A reliable metric is required to make knowledgeable
power-performance tradeoffs in this multi-dimensional space. This
paper derives a unified energy-efficiency metric for evaluating ISA
and microarchitecture features, which subsumes other commonly
used power-performance metrics as special cases of a more general
equation. This new metric is derived based on an analysis of
a multi-dimensional power optimization problem, and the resulting
formula involves only relative changes in the characteristics of a
processor, enabling its application at the early stages of the design.
Categories and Subject Descriptors
C.1.0 [Processor Architectures]: General; C.5.3 [Microcomputers]:
Microprocessors; B.7.1 [Types and Design Styles]: Microprocessors
and microcomputers,VLSI; C.0 [General]: Modeling of computer
architecture; C.1.3 [Other Architecture Styles]: Pipeline
processors
General Terms
Design, Performance
Keywords
Energy, power, performance, energy-efficiency, metric, architecture,
microarchitecture
-
(F) Fast and Accurate Wire Delay Estimation for Physical Synthesis of Large ASICs [p. 30]
-
Ruchir Puri, David S. Kung (IBM Thomas J. Watson Research Center, USA), Anthony D. Drumm (IBM Corporation, USA)
Interconnect delays represent an increasingly dominant portion of overall
circuit delays. During timing-driven physical synthesis process, timing
analysis is repeatedly performed over several hundred thousand components.
Thus, fast and accurate estimation of interconnect delays is crucial.
Traditionally, lumped and elmore delay models have been widely used for
computing interconnect delays in physical synthesis due to their computational
efficiency. However, these delay models are known to be inaccurate since they
ignore slew and resistive shielding effects. In this paper, we propose
a new iterative refinement based delay estimation approach that considers
resistive shielding along with driver slew. Experiments results show that
the proposed approach gives not only highly accurate results for far end
RC-line delays but also compares very favorably to more difficult to match
source end delays and source slews. In addition, use of the proposed delay
model in physical synthesis yields significant performance improvement on
several large industrial ASICs.
Categories and Subject Descriptors
B.7 [Hardware]: Integrated Circuits
General Terms
Algorithms, Design
Keywords
Integrated Circuit Design, Wire Delay, Estimation, Placement Driven
Synthesis
-
(F) A Compact Delay Model for Series-Connected MOSFETs [p. 37]
-
Kaveh Shakeri, James D. Meindl (Georgia Institute of Technology, USA)
A compact delay model for series connected MOSFETs has been
derived. This model enables accurate prediction of worst-case
delay of different logic families such as dynamic logic. It also
provides insight into delay change as the device parameters
change. Key results show that the relative delay of series
connected MOSFETs is almost invariant for different generations
of technology.
-
(S) A Decoupling Method for Analysis of Coupled RLC Interconnects [p. 41]
-
Jun Chen, Lei He (University of Wisconsin, Madison, USA)
In this paper we present an efficient decoupling model for on-chip
interconnect analysis. This model decouples multiple RLC transmission
lines into independent lines with separate drivers and receivers. Based on
this model we propose an efficient algorithm to solve the far end responses
of multiple RLC lines. Experiments show good matching between our decoupling
model and SPICE simulation. Based on the model, we further develop an
Nmax algorithm to quickly determine the noise amplitudes of
far end responses. Experiments show that Nmax algorithm gives
conservative but reasonably accurate results compared to SPICE simulation.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Verification
General Terms
Design, Performance, Theory, Verification
-
(S) Low Swing Dual Threshold Voltage Domino Logic [p. 47]
-
Volkan Kursun, Eby G. Friedman (University of Rochester, USA)
A low swing domino logic technique is proposed to decrease
power consumption without sacrificing noise immunity. With the
proposed low swing domino logic circuit technique, active power
consumption is reduced by up to 9.4% while improving the noise
immunity by 2.6% as compared to standard domino logic circuits.
It is also shown that by applying a low swing contention reduction
technique, the power savings can be further increased by 6.7%
while the delay can be improved by 8.6%. A simple and efficient
dual threshold voltage (dual-Vt) circuit technique that incorporates
low swing signals is also proposed. It is shown that the proposed
dual-Vt technique reduces the standby leakage current by approximately
235 times while offering enhanced delay characteristics as compared to a
standard low threshold voltage implementation.
Keywords
Domino logic, low voltage swing, dual-Vt, low power, dynamic
circuits.
Chair: Zhigang (David) Pan (IBM)
-
(F) An Error Simulation Based Approach to Measure Error Coverage of Formal Properties [p. 53]
-
P. Azzoni (Strada le Grazie, 15, Italy), A. Fedeli (STMicroelectronics, Italy), F. Fummi (Strada le Grazie, 15, Italy), G. Pravadelli (Strada le Grazie, 15, Italy), U. Rossi, F. Toto (STMicroelectronics, Italy)
Many approaches have been proposed for digital system verification,
either based on simulation strategies or on formal
verification techniques. Both of them show advantages and
drawbacks and new mixed approaches have been presented
in order to improve the verification process. Specifically, the
adoption of formal methods still lacks a coverage metrics to
let the verification engineer get a measure of which portion of
the circuit is already covered by the written properties that
far and which parts still need to be addressed. The present
paper describes a new simulation based methodology aimed
at measuring the error coverage achieved by temporal assertions
proved by model checking. The approach has been
applied to the description of a protocol converter block, and
some preliminary results are presented in the paper.
-
(F) Protected IP-Core Test Generation [p. 59]
-
Alessandro Fin, Franco Fummi (Strada le Grazie, 15, Italy)
Design simplification is becoming necessary to respect the
target time-to-market of SoCs, and this goal can be obtained
by using predesigned IP-cores. However, their correct integration
in a design implies more complex verification problems. IPs are
usually provided with their own test patterns
that can be used only by applying design for testability techniques
onto the chip. Whenever physical faults must be detected,
this approach is reasonable, even if it implies circuit
performance degradation. However, it is completely useless
at the design level, when the correct integration of the IPs
into the global design must be investigated. At this level,
proprietary test sequences must be generated in relation to
the actual use of the IPs into the design. In this paper, the
SystemC language is exploited to define a design verification
framework for integration test of IP-cores. Intellectual
properties of cores are guaranteed by adopting a client/server
simulation architecture and by allowing functional test generation
on faulty IP-core models without disclosing their internal
structure. The methodology can be applied to mixed
descriptions based on VHDL and SystemC, since an abstraction
layer has been defined allowing clients and/or servers to
be indifferently described in VHDL or SystemC. Finally, remote
simulation can be also performed locally to avoid bandwidth
bottleneck and the test generation process can be indifferently
applied at lower abstraction levels such as RT and gate.
-
(F) Test Generation for Resistive Opens in CMOS [p. 65]
-
Arun Krishnamachary, Jacob. A. Abraham (The University of Texas at Austin,
USA)
This paper develops new techniques for detecting both stuck-open
faults and resistive open faults, which result in increased delays
along some paths. The improved detection of CMOS open defects
is made possible by a new delay fault model which combines the
advantages of the gate delay fault model and the path delay fault
model. We develop a test generation methodology for this fault
model which enables generation of test vectors that test a percentage
of the longest sensitizable paths in the design and also test each
net for spot defects through their longest sensitizable paths. Real
delay values are used to determine the true critical paths in the circuit.
The high degree of effectiveness of this fault model under realistic
assumptions for process characteristics is first enumerated,
and experimental results demonstrate the improved coverage possible
with the proposed approach.
Categories and Subject Descriptors
B.8.1 [Hardware]: Reliability, Testing and Fault-Tolerance
Keywords
delay testing, resistive opens, defect detection
-
(S) Self-Checking Sequential Circuits with Self-Healing Ability [p. 71]
-
Ilya Levin, Vladimir Ostrovsky, Sergey Ostanin (Tel-Aviv University, Israel),
Mark Karpovsky (Boston University, USA)
In this paper we deal with totally self-checking (TSC) synchronous
sequential circuits (SSCs), that are able to recover after an occurrence of
a fault. We call SSC owing this property as a self-healing SSC. A concept
of a partially monotonic SSC is used in the paper. It is shown that the
partially monotonic SSCs satisfy the self-healing property. A novel
reduced m-out-of-n code is developed. It is proposed applying this
code to the synthesis of a TSC checker for the state monotonic SSCs. The
proposed method of synthesis is based on a LUT implementation of monotonic
functions.
For most circuits in a standard benchmark set, the proposed approach leads
to a reduction of about 10-20% of the overhead as compared with the
traditional methods.
Keywords:
Self-checking, synchronous sequential circuit, self-healing unordered
coding, checker.
-
(S) Minimizing Concurrent Test Time in SoC's by Balancing Resource Usage [p. 77]
-
Dan Zhao, Shambhu Upadhyaya (SUNY at Buffalo, USA), M. Margala (University
of Rochester, USA)
We present a novel test scheduling algorithm for embedded core-based
SoC's. Given a system integrated with a set of cores and a
set of test resources, we select a test for each core from a set of
alternative test sets, and schedule it in a way that evenly balances
the resource usage, and ultimately reduce the test application time.
Furthermore, we propose a novel approach that groups the cores
and assigns higher priority to those with smaller number of alternate
test sets. In addition, we also extend the algorithm to allow
multiple test sets selection from a set of alternatives to facilitate
testing for various fault models.
Categories and Subject Descriptors
B.7.1 [Hardware]: Types and Design Styles; B.8.1 [Hardware]:
Reliability, Testing, and Fault-Tolerance
General Terms
Algorithm, Management and Reliability
Keywords
Resource balancing, system-on-a-chip test scheduling, test sets selection
Chair: Martin Margala (University of Rochester)
-
(F) Efficient Implementation of a Complex +/- 1 Multiplier [p. 83]
-
Boris D. Andreev, Eby G. Friedman, Edward L. Titlebaum (University of
Rochester, USA)
A complex ±1 multiplier is an integral element in modern
CDMA communication systems, specifically as a pseudonoise code
scrambler/descrambler. Therefore, an efficient implementation is
essential to reduce the critical path delay, power, and area of
wireless receivers. A new architecture is proposed to achieve this
complex multiplier function. Tradeoffs and design solutions as well
as the interface with subsequent arithmetic circuits are discussed.
Simulations exhibit a significant speed improvement as compared to
alternative architectures. These results are also applicable to other
arithmetic circuits.
Categories and Subject Descriptors
B.2.4 [Arithmetic and Logic Structures] High-Speed Arithmetic:
Algorithms
B.7.1 [Integrated circuits] Types and Design Styles:
Algorithms implemented in hardware, VLSI
I.1.1 [Symbolic and Algebraic Manipulation] Expressions and
Their Representation: Representations, Simplification of expressions
General Terms
Design, Algorithms
Keywords
VLSI, CDMA, PN code, scrambler, redundant arithmetic
-
(F) On the High-Speed VLSI Implementation of Errors-and-Erasures Correcting
Reed-Solomon Decoders [p. 89]
-
Tong Zhang, Keshab K. Parhi (University of Minnesota, USA)
Recently a novel algorithm transformation was proposed to
reduce the critical path of Berlekamp-Massey algorithm implementation
for errors-alone Reed-Solomon decoding. In
this paper, we apply the same methodology to transform
the Berlekamp-Massey algorithm for errors-and-erasures RS
decoding. We present a regular hardware architecture to
implement the reformulated Berlekamp-Massey algorithm,
which can achieve high throughput. Moreover, an operation
scheduling scheme is proposed to further reduce the hardware
complexity without loss of throughput.
Categories and Subject Descriptors
B.2.4 [ARITHMETIC AND LOGIC STRUCTURES]:
High-Speed Arithmetic
General Terms
Design
Keywords
Reed-Solomon codes, Berlekamp-Massey algorithm, erasure,
VLSI architectures
-
(F) Design Limitations in Deep Sub-0.1um CMOS SRAM [p. 94]
-
Robert K. Grube, Qi Wang, Sung-Mo Kang (University of California, Santa Cruz,
USA)
Leakage effects in deep sub-0.1µm CMOS technologies are of
critical concern to designers of high-performance integrated
circuits. Recent estimates [1] of a 7.5x increase in leakage
current per chip generation; along with several proposals for
energy-efficient cache architectures that unfortunately do not
address static leakage-energy issues [2], [3], [4], have
heightened concerns over the functionality and stability of future
high-performance SRAM cache designs. Each of these future
technology generations, in addition to having increased shortchannel
effects (SCEs) will now also suffer from gate leakage
currents across physically and electrically thin (~1.5nm) SiO2
gate dielectrics. In this paper we demonstrate the limits of this
scaling on the operational behavior of on-chip SRAM cache
designs, and briefly discuss the impact of these results on high-performance
memory architectures.
Categories and Subject Descriptors
Hardware - Integrated Circuits - Types and Design Styles
(B.7.1): Advanced Technologies; Hardware - Memory
Structures š Semiconductor Memories (B.3.1): Static Memory
(SRAM)
General Terms
Performance, Design
Keywords
Gate leakage, tunneling currents, GIDL, On-Chip Cache
-
(S) Reconfigurable Repetitive Padding Unit [p. 98]
-
Georgi Kuzmanov, Stamatis Vassiliadis (Delft University of Technology,
The Netherlands)
This paper proposes a reconfigurable processing unit, which
performs the MPEG-4 repetitive padding algorithm in real
time. The padding unit has been implemented as a scalable
systolic structure of processing elements. A generic
array of PE has been described in VHDL, and the functionality
of the unit has been validated by simulations. In
order to determine the chip area and speed of the padding
structure, we have synthesized the structure for two FPGA
families - Xilinx and Altera. The simulation results indicate
that the proposed padding unit can operate in a wide frequency
range, depending on the implemented configuration.
It is shown that it can process from tens up to hundreds
of thousands MPEG-4 macroblocks per second. This allows
the real-time requirements of all MPEG-4 profiles and levels
to be met efficiently at trivial hardware costs. Finally, the
trade-off between chip-area and operating speed is discussed
and possible configuration alternatives are proposed.
Categories and Subject Descriptors
B.6.1 [Logic Design]: Design Styles - logic arrays; parallel circuits;
B.5.1 [RTL Implementation]: Design - arithmetic and
logic units; data-path design; styles
General Terms
Design, Performance
Chair: Eby Friedman (University of Rochester)
-
(F) Energy-Delay Efficiency of VLSI Computations [p. 104]
-
Paul I. Pénzes, Alain J. Martin (California Institute of Technology, Pasadena,
USA)
In this paper we introduce an energy-delay efficiency metric that captures
any trade-off between the energy and the delay of the computation.
We apply this new concept to the parallel and sequential composition of
circuits in general and in particular to circuits optimized through
transistor sizing. We bound the delay and energy of the optimized
circuit and we give necessary and sufficient conditions under which these
bounds are reached. We also give necessary and sufficient conditions
under which subcomponents of a design can be optimized independently
so as to yield global optimum when recomposed.
We demonstrate the utility of a minimum-energy function to capture
high level compositional properties of circuits. The use of this
minimum-energy function yields practical insight into ways of improving
overall energy-delay efficiency of circuits.
Categories and Subject Descriptors
B.6 [Hardware]: Logic Design; B.6.3 [Logic Design]: Design Aids -
Optimization
General Terms
Theory
Keywords
Energy-delay optimization, transistor sizing
-
(F) Active Shields: A New Approach to Shielding Global Wires [p. 112]
-
Himanshu Kaul, Dennis Sylvester, David Blaauw (University of Michigan,
Ann Arbor, USA)
A new shielding scheme, active shielding, is proposed for
reducing delays on interconnects. As opposed to conventional
(passive) shielding, the active shielding approach helps to speed
up signal propagation on a wire by ensuring in-phase switching
adjacent nets. Results show that the active shielding scheme
improves performance by up to 16% compared to passive shields
and up to 29% compared to unshielded wires. When signal slopes
at the end of the line are compared, savings of up to 38% and 27%
can be achieved when compared to passive shields and unshielded
wires, respectively.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Placement
and Routing.
General Terms
Design
-
(F) Variable-Segment and Variable-Driver Parallel Regeneration Techniques for
RLC VLSI Interconnects [p. 118]
-
Falah R. Awwad, Mohamed Nekili (Concordia University, Montréal, Canada)
Repeaters are now widely used to enhance the performance of
long On-Chip interconnects in CMOS VLSI. For RC-modeled interconnects,
parallel repeaters have proved to be superior to serial
ones. In this paper, a Variable-Segment Regeneration Technique is
introduced and compared with a Variable-driver Parallel Technique,
a recently proposed transparent repeater and with three conventional
techniques. HSpice Simulations using a 0.25 mm TSMC
technology show that both the variable-segment and variable-driver
techniques feature 62% time delay saving and 354% Area-Delay
product saving over the transparent repeater, and are superior to all
conventional techniques. However, our new variable-segment
technique is characterized by a 116% Area-Delay product saving
over the variable-driver technique. Thus, making it the most performant
in the field of high-performance RLC interconnect regeneration.
The simulation results confirm the superiority of the parallel
regeneration technique over the serial ones.
Keywords
VLSI, RLC Interconnect, Parallel Regeneration, Repeater.
-
(S) Selective-Run Built-In Self-Test Using an Embedded Processor [p. 124]
-
Sungbae Hwang, Jacob A. Abraham (The University of Texas at Austin, USA)
Many systems-on-a-chip (SOCs) include processors as central units to
implement diverse algorithms and control peripheral units such as embedded
cores. The computing power of the embedded processor can be used to self-test
its own functions as well as to test the other cores within the chip boundary.
In BIST methodology, pseudo-random pattern testing can reduce the memory
requirements. In addition to general pseudo-random pattern testing, this
paper proposes and evaluates a novel selective-random pattern test technique.
This technique increases the fault coverage while significantly reducing
test application time. This also greatly decreases the memory requirement
compared to traditional BIST schemes. The cost for extra hardware is low and
the technique is easily integrated with parallel scan and boundary scan
designs.
Categories and Subject Descriptors
B.7.3 [Integrated Circuits]: Reliability and Testing - Built-in tests,
Test generation
Keywords
SOC testing, built-in self-test, design for testability, processor-based
testing, pseudo-random number generator.
Chair: Ken Shepard (Columbia University)
-
(F) Board-Level Multiterminal Net Assignment [p. 130]
-
Xiaoyu Song (Portland State University), William N. N. Hung (Intel Corporation, USA), Alan Mishchenko, Malgorzata Chrzanowska-Jeske (Portland State University),
Alan Coppola (Cypress Semiconductor, USA), Andrew Kennings (University of
Waterloo, Canada)
The paper presents a satisfiability-based method for solving the
board-level multiterminal net routing problem in Clos-Folded
FPGA based logic emulation systems. The approach transforms the
FPGA board-level routing task into a single, large Boolean equation
with the property that any assignment of input variables that satisfies
the equation specifies a valid routing. The approach considers
all nets simultaneously and the absence of a satisfying assignment
implies that the layout is unroutable.We use two of the fastest SAT
solvers: Chaff and DLM to perform our experiments. Empirical results
show that the method is time-efficient and applicable to large
layout problem instances.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - placement and routing
General Terms
Algorithms, Design, Experimentation, Measurement, Verification
-
(F) Minimizing Resources in a Repeating Schedule for a Split-Node Data-Flow Graph [p. 136]
-
Timothy W. O'Neil, Edwin H.-M. Sha (University of Texas at Dallas, USA)
Many computation-intensive
or recursive applications commonly found in
digital signal processing and image processing applications can be represented
by data-flow graphs (DFGs). In our previous work, we proposed a
new technique, extended retiming, which can be combined with minimal
unfolding to transform a DFG into one which is rate-optimal.
The result,
however, is a DFG with split nodes, a concise representation for pipelined
schedules. This model and the extraction of the pipelined schedule it represents
have heretofore not been explored. In this paper, we demonstrate
one scheduling algorithm for such graphs, and then discuss a way to reduce
the hardware requirements of the resulting schedule. In the process, we
state and prove a tight upper bound on the minimum number of processors
required to execute the static schedule produced by our algorithms. Finally,
we demonstrate our methods on a specific example.
Categories and Subject Descriptors
J.6 [Computer-Aided Engineering]: Computer-Aided Design (CAD)
General Terms
Algorithms, Design, Performance, Theory
-
(F) A New Look at Hardware Maze Routing [p. 142]
-
John A. Nestor (Lafayette College, USA)
This paper describes a new design for a hardware accelerator to
support grid-based Maze Routing. Based on the direct mapped
approach of Breuer and Shamsa [3], this work refines their
design to substantially reduce the hardware requirements of
each processing element while at the same time adding support
for multilayer routing and fast iterative routing. An RTL
implementation has been developed for this design in VHDL,
and initial results show promise for its realization using ASIC,
custom, or FPGA technology.
-
(S) Novel Interconnect Modeling by Using High-Order Compact Finite Difference Methods [p. 148]
-
Qinwei Xu and Pinaki Mazumder (University of Michigan, Ann Arbor, USA)
The high-order compact finite difference (HCFD)
method is adapted for interconnect modeling. Based on the compact
finite difference method, the HCFD method employs the Chebyshev
polynomials to construct the approximation framework for interconnect
discretization, and leads to improved equivalent-circuit
models. The HCFD-based modeling requires far fewer intervening
grid points for building an accurate discrete model of the transmission
line than other numerical methods like traditional Finite Difference
(FD) method. It is believed that given the number of state
variables, the presented method gives more accurate results than
other known passive discrete modeling methods. The theoretical
proof shows that HCFD-based modeling preserves the passivity.
-
(S) AQUASUN: Adaptive Window Query Processing in CAD Applications for Physical
Design and Verification [p. 153]
-
Michiel De Wilde, Dirk Stroobandt, J. Van Campenhout, Peter Verplaetse
(Ghent University, Belgium)
CAD applications for physical design and verification very
often require enumerating all layout objects whose bounding
box intersects an axis-aligned rectangular area. A number
of multidimensional access methods exist to process such
window queries. The performance of some important design
and verification algorithms heavily depends on the processing
speed of the used access method. For complex layouts,
these methods require huge amounts of resident memory to
attain this speed.
In this paper, we present a new access method called
aquasun, which brings a significant query processing performance
improvement over other adaptive methods-methods
which can cope with a continuously changing layout. These
methods generally descend from the database world and are
designed to perform the equivalent query in n-dimensional
space. Our method is specifically tailored to two dimensions,
exploiting 2D optimisations that significantly accelerate
window queries within oblong objects like PCB tracks.
Furthermore, aquasun makes use of an efficient compression
technique which greatly cuts down on memory usage.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids|Graphics;
H.2.2 [Database Management]: Physical Design|Access
methods; E.1 [Data Structures]: Trees
General Terms
Algorithms, Performance
Keywords
Multidimensional access methods, Rectangle indexes
Chair: Igor Markov (University of Michigan)
-
(F) Term Ordering Problem on MDG [p. 160]
-
Yi Feng, Eduard Cerny (Université de Montréal, Canada)
As an efficient representation of Extended Finite State Machines,
Multiway Decision Graphs (MDG) are suitable for automatic
hardware verification of Register Transfer Level (RTL) designs.
However, in some cases, MDG-based verification suffers from the
state explosion problem. Some of cases are caused by the standard
order used by MDG to order cross-terms that have the same top-level
function symbol. These terms usually label decision nodes
and must be ordered. We call this kind of state explosion the
standard term ordering problem. A solution based on function
renaming and cross-term rewriting is proposed in this paper.
Experimental results show that this solution can solve the problem
completely and thus increase the range of circuits that can be
verified by MDG.
Categories and Subject Descriptors
B.5.2 [Register-Transfer-Level Implementation]: Design Aids - verification.
General Terms
Verification.
Keywords
Multiway Decision Graphs, Standard term ordering, Variable
Ordering, First-order terms, Function symbols, Function
renaming, Term rewriting.
-
(S) A Low Power Direct Digital Frequency Synthesizer with 60 dBc
Spectral Purity [p. 166]
-
J.M.P. Langlois, Dhamin Al-Khalili (Royal Military College of Canada, Canada)
We present a low-power sine-output Direct Digital Frequency
Synthesizer (DDFS) realized in 0.18 mm CMOS that achieves 60
dBc spectral purity from DC to the Nyquist frequency. No ROM
or multipliers are used, but an external DAC is required if an
analog output is desired. Power consumption is 10 mW for a 100
MHz clock, which is significantly less than figures reported
previously. System complexity is greatly reduced by using an
efficient linear interpolation scheme to approximate a sinusoid
function. This has resulted in silicon area utilization of 0.011
mm2. The design would be suitable as an IP core in a low power
digital RF transceiver ASIC.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles -
algorithms implemented in hardware, VLSI.
General Terms
Algorithms, Design, Performance.
Keywords
DDFS, DDS, phase to sine amplitude conversion, low power
-
(S) Novel Design and Verification of a 16 x 16-b Self-Repairable Reconfigurable
Inner Product Processor [p. 172]
-
Rong Lin (SUNY-Geneseo, USA), Martin Margala (University of Rochester, USA)
A novel self-repairable and reconfigurable inner-product
processor with low-power, fast CMOS circuits and DFT
techniques is presented. It takes the advantage of recently
proposed decomposition based arithmetic circuit design
approach for simple implementation of the reconfigurations,
component replacements, and high-quality tests.
The processor can be dynamically reconfigured for two types
operations: 4 x 8 x 8-b inner product computation and 16 x 16-b
multiplication. The self-repair is provided by choosing a fault-free
one from 17 possible architectures during the test, which
covers more than 52% transistors for the specified faults. Only
one extra bit is needed for all reconfigurations, repairs, and tests.
The proposed exhaustive DFT technique greatly reduces the test
vector length, from 17*232 to 1.5*213, which is as
short as that
required by the pseudo-exhaustive DFT method recently
reported in literature.
Keywords
Reconfigurable, Decomposition Algorithms, Self-Repair, VLSI,
Arithmetic Circuits, Image Processing, Fault Tolerance
-
(S) Computing Walsh, Arithmetic and Reed-Muller Spectral Decision Diagrams Using
Graph Transformations [p. 178]
-
W.J. Townsend, M. A. Thornton (Mississippi State University), R. Drechsler
(University of Bremen, Germany), D. M. Miller (University of Victoria, Canada)
Spectral techniques have found many applications in computer-aided
design, including synthesis, verification, and testing.
Decision diagram representations permit spectral coefficients to
be calculated via graph-based algorithms. In this paper,
algorithms are described for transforming multi-output functions
to produce Walsh, arithmetic, and Reed-Muller spectral decision
diagrams and the experimental results of those implementations
are presented.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - automatic synthesis,
optimization, switching theory, verification.
J.6. [Computer-Aided Engineering]: Computer-aided design.
General Terms
Algorithm, Design, Verification.
Keywords
Data Structures, Decision Diagrams, Discrete Functions, Spectral
Methods.
|