|
ISPD 2002 ABSTRACTS
Sessions:
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
Host: S. Sapatnekar (University of Minnesota)
-
Challenges and Principles of Physical Design (Invited Paper) [p. 3]
-
S.T. Teig (Simplex Solutions, Inc.)
Addressing the physical design of integrated circuits, MCMs, and
printed-circuit boards requires solving some of the largest and
most difficult combinatorial optimization problems ever
attempted. Deep submicron process technologies and the rapid and
continuing increase in design size significantly add to the
difficulties. Fortunately, computing power, memory, and disk also
continue to increase rapidly, providing hope that our mostly subquadratic
heuristics will keep up with problem sizes. The
increasing popularity of assorted hierarchical methodologies
provides additional reason for optimism.
Unfortunately, the vast majority of the fundamental techniques
used in physical design today are ten to twenty years old1. These
techniques were developed when computers had hundreds of times
less CPU and storage, and they incorporate artificial and needless
constraints on both designs themselves and the algorithms used to
lay them out.
The physical design community includes some of the most skilled
algorithmists in the world. We have an opportunity to develop new
algorithms and, even more importantly, new methodologies that
permit chips and systems to be smaller, faster, lower power, more
reliable, and easier to design. To fulfill our potential, we must
choose carefully the research directions that will have the most
leverage to designers. We must also articulate principles to follow
that will maximize the effectiveness of future algorithms and
heuristics in our discipline. I give a personal view on these two
questions below.
Chair: F. Johannes (TU Munich)
-
An Effective Congestion Driven Placement Framework [p. 6]
-
U. Brenner, A. Rohe (University of Bonn)
We present a fast but reliable way to detect routing criticalities in
VLSI chips. In addition, we show how this congestion estimation
can be incorporated into a partitioning based placement algorithm.
Different to previous approaches, we do not rerun parts of
the placement algorithm or apply a post-placement optimization,
but we use our congestion estimator for a dynamic avoidance of
routability problems in one single run of the placement algorithm.
Computational experiments on chips with up to 1,300,000 cells are
presented: The framework reduces the usage of the most critical
routing edges by 9.0% on average, the running time increase for
the placement is about 8.7%. However, due to the smaller congestion,
the running time of routing tools can be decreased drastically,
so the total time for placement and (global) routing is decreased by
47% on average.
-
Consistent Placement of Macro-Blocks Using Floorplanning and
Standard-Cell Placement [p. 12]
-
S.N. Adya, I.L. Markov (University of Michigan)
While a number of recent works address large-scale standard-cell
placement, they typically assume that all macros are fixed. Floorplanning
techniques are very good at handling macros, but do not
scale to hundreds of thousands of placeable objects. Therefore we
combine floorplanning techniques with placement techniques in a
design flow that solves the more general placement problem. Our
work shows how to place macros consistently with large numbers
of small standard cells. Our techniques can also be used to guide
circuit designers who prefer to place macros by hand.
The proposed flow relies on an arbitrary black-box standardcell
placer to obtain an initial placement and then removes possible
overlaps using a fixed-outline floorplanner. This results in
valid placements for macros, which are considered fixed. Remaining
standard cells are then placed by another call to the standardcell
placer. Empirical evaluation on ibm benchmarks shows, in
most cases, wirelength improvements of 10%-50% compared to
Cadence QPlace, as well as runtime improvements.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids š Layout, Placement and
Routing
General Terms
Algorithms, Design
Organizer and Moderator: D. Hill (Synopsys)
Panelists:
R. Camposano (Synopsys);
W.-J. Dai (Silicon Perspective);
P. Hersher (Simplex);
J.G. Xi (Plato);
S. Young (Nasada)
Chair: P. Groeneveld (Magma)
-
Leakage Current in Low Standby Power and High Performance Devices: Trends and
Challenges (Invited Paper) [p. 22]
-
G.C.-F. Yeap (Motorola Inc.)
IC technology is continuing to scale according to Moore's
Law, with the overall chip circuit requirements driving the
MOSFET device and process integration requirements and
optimal choices. In the 2001 International Technology
Roadmap for Semiconductors (ITRS) [1] the driver for the
high performance logic is maximizing MOSFET intrinsic
speed, while the driver for low standby power logic is
minimizing MOSFET leakage current. Total leakage
current of a MOSFET consists of three components: offstate
sub-threshold leakage current, gate direct tunneling
leakage current and source/drain junction leakage current.
In this paper, trends and challenges for each leakage current
component in low standby power and high performance
devices are discussed from the perspective of the 2001
ITRS and recently reported literatures.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Stylesš
advanced technologies, VLSI (very large scale integration).
General Terms: Performance, Design and Reliability.
Keywords: Leakage current, off-state sub-threshold leakage,
gate tunneling leakage, low standby power, high performance,
CMOS technology, system-on-a-ship (SoC).
-
Leakage-Tolerant Design Techniques for High Performance Processors (Invited
Paper) [p. 28]
-
V. De (Intel Labs)
In sub-100nm technology generation, transistor subthreshold
leakage is 100-1000nA/µm for high performance microprocessor
logic technology. As gate oxide thickness approaches sub-10ª
regime, gate oxide leakage escalates to 10-100A/cm2. Junction
leakages also become significant as doping levels around the
junction approach 5X1018 cm-3. These excessive leakage currents
contribute to large leakage power dissipation during (1) active
operation, (2) standby or idle mode and (3) burn-in. In addition,
excessive subthreshold leakage degrades noise margin or
robustness of performance-critical circuits such as wide-OR
domino gates, register files and cache. Large gate oxide leakage
also limits circuit fanout. Therefore, high performance and low
power processor designs must employ leakage power control
techniques to alleviate active power dissipation and delivery
challenges, extend battery life and prevent thermal runaway during
burn-in. In addition, leakage-tolerant high performance circuits
must be used to provide adequate circuit robustness.
The most effective design technique for reducing leakage power is
dual-VT design. Performance-critical transistors are made
low-VT
to provide the required performance and high-VT transistors are
used everywhere else to minimize leakage power without
impacting processor frequency. Optimal dual-VT designs can
provide the same frequency as a design with single low-VT, while
limiting low-VT usage to 30%. As a result, leakage power during
active, standby and burn-in is reduced by 3X without any
performance impact. Of course, process complexity is slightly
higher since extra critical masking steps are needed to provide
additional transistor VT's. EDA tools for optimal VT allocation
during all phases of the design flow are critical for successful dual-VT designs.
VT allocations can be performed at logic gate level or
transistor level. While transistor level allocation is most effective
for leakage power reduction, it is also the most complex. 75-100mV
difference between the high-VT and low-VT values is
found to be most optimal for typical microprocessor designs.
Chair: K. Bazargan (Minnesota)
-
Design Hierarchy Guided Multilevel Circuit Partitioning [p. 30]
-
Y. Cheon, D.F. Wong (The University of Texas at Austin)
In this paper, we present a new multilevel circuit partitioning algorithm (dhml) which is guided by design hierarchy.
In addition to flat netlist hypergraph, we use user design hierarchy as a hint for partitioning because it already has
some implications on connectivity information between logical blocks in the design. Using design hierarchy in
partitioning is nontrivial since hierarchical elements in design hierarchy does not necessarily have strong internal
connectivity, hence we need to determine whether it is preferable to break up or preserve the hierarchical elements.
In order to identify and select the hierarchical elements with strong connectivity, Rent exponent is used. Then, the
selected hierarchical elements are used as effective clustering scopes during multilevel coasening phase. The
scopes are dynamically updated (enlarged) while building up a clustering tree so that the clustering tree resembles
the densely connected portions of the design hierarchy.
We tested our algorithm on a set of large industrial designs in which the largest one has 1.8 million cells, 2.8 millions
nets, and 11 levels of hierarchy. By exploitng design hierarchy, our algorithm produces higher quality partitioning results
than the state-of-the-art multilevel partitioner hMetis[7]. Furthermore, experimentsal results show that dhml yields
significantly more stable solutions, which is helpful in practice to reduce the number of runs to obtain the best result.
Categories and Subject Descriptors
B.7.2. [Integrated Circuits]: Design Aids; J.6[Computer Applications]: Computer-Aided Engineering - computer-aided
design (CAD)
General Terms
Algorithms, Design, Experimentation, Performance
Keywords
Circuit partititioning, design hierarchy, clustering, Rent's rule
-
Physical Hierarchy Generation with Routing Congestion Control [p. 36]
-
C.-C. Chang, J. Cong, X. Yuan (UCLA); Z. Pan (IBM T.J. Watson Research
Center)
In this paper, we develop a multi-level physical hierarchy generation
(mPG) algorithm integrated with fast incremental global routing
for directly updating and optimizing congestion cost during
placement. The fast global routing is achieved by using a fast twobend
routing and incremental A-tree algorithm. The routing congestion
is modeled by the wire usage estimated by the fast global
router. A hierarchical area density control is also developed for
placing objects with significant size variations. Experimental results
show that, compared to GORDIAN-L , the wire length driven
mPG is 3-6.5 times faster and generates slightly better wire length
for test circuits larger than 100K cells. Moreover, the congestion
driven mPG improves 50% wiring overflow with 5% larger bounding
box wire length but 3 - 6% shorter routing wire length measured
by graph based A-tree.
Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS-Design Aids
General Terms
Algorithms, Design, Experimentation, Performance
Keywords
Placement, routing, congestion, interconnect, physical hierarchy,
deep sub-micron
-
Routability Driven White Space Allocation for Fixed-Die
Standard-Cell Placement [p. 42]
-
X. Yang, B.-K. Choi, M. Sarrafzadeh (University of California at Los Angeles)
The use of white space in fixed-die standard-cell placement is an effective
way to improve routability. In this paper, we present a white
space allocation approach that dynamically assigns white space according
to the congestion distribution of the placement. In the topdown
placement flow, white space is assigned to congested regions
using a smooth allocating function. A post allocation optimization
step is taken to further improve placement quality. Experimental
results show that the proposed allocation approach, combined
with a multilevel placement flow, significantly improves placement
routability and layout quality.
In our experiments, we compared our placement tool with two
other fixed-die placers using an industrial place and route flow.
Placements created by all three tools have been routed with an industrial
router (Warp Route of Cadence). Compared with a leadingedge
industrial tool, our placer produces placements with similar or
better routability and on average 8.8% shorter routed wirelength.
Furthermore, our tool produces placement that runs faster through
the Warp Route compared with the industrial tool. Compared with
a state-of-the-art academic placement tool (Capo/MetaPlacer), our
placer shows ability to produce more routable placements: for 15
out of all 16 benchmarks our placer‰s outputs are routable while
Capo/MetaPlacer only creates 4 routable placements.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids
General Terms
Algorithms, Experimentation
Keywords
Physical Design, placement, routability
Chair: H. Zhou (Northwestern)
-
Routability Driven Floorplanner with Buffer Block Planning [p. 50]
-
C.W. Sham, E.F.Y. Young (The Chinese University of Hong Kong)
In traditional floorplanners, area minimization is an important
issue. However, due to the recent advances in VLSI
technology, the number of transistors in a design are increasing
rapidly and so are their switching speeds. This has
increased the importance of interconnect delay and routability
in the overall performance of a circuit. We should consider
interconnect planning, buffer planning and routability
as early as possible. In this paper, we study and implement
a routability-driven floorplanner with congestion estimation
and buffer planning. Our method is based on a simulated
annealing approach that is divided into two phases: the area
optimization phase and the congestion optimization phase.
In the area optimization phase, modules are roughly placed
according to the total area and wirelength. In the congestion
optimization phase, a floorplan will be evaluated by its area,
wirelength, congestion and routability. We assume that every
buffer should be inserted at a .exible interval from each
other for long enough wires and probabilistic analysis is performed
to compute the congestion information taken into
accounts the constraints in buffer locations. Our approach
is able to reduce the average number of wires at the congested
areas and allow more feasible insertions of buffers to
satisfy the delay constraints without having much penalty
in increasing the area of the floorplan.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design AidsÖPlacement and
Routing
General Terms
Algorithms, Design, Performance
-
Integrated Floorplanning with Buffer/Channel Insertion for Bus-Based
Microprocessor Designs [p. 56]
-
F. Rafiq, N. Sherwani (Intel Microelectronics Services);
M. Chrzanowska-Jeske (ECE Portland State University); H.H. Yang (Intel
Corporation)
A new approach to the interconnect-driven floorplanning problem that
integrates bus planning with floorplanning is presented. The integrated
floorplanner is intended for bus-based designs. Each bus consists of a
large number of wires. The floorplanner ensures routability by
generating the exact location and shape of interconnects (above and
between the circuit blocks) and optimizes the timing. Experiments
with MCNC benchmarks clearly show the superiority of integrated
floorplanning over the classical floorplan-analyze-and-then-refloorplan
approach. Our floorplans are routable, meet all timing
constraints, and are on average 12-13% smaller in area as compared to
the traditional floorplanning algorithms.
Categories and Subject Descriptors
B.7.2 [Hardware}: Integrated Circuits-Placement and Routing; J.6
[Computer Applications]: Computer-aided Engineering š CAD.
General Terms: Algorithms, Performance.
Keywords: Floorplanning, Routability, Interconnect Estimation.
-
TEG: A New Post-Layout Optimization Method [p. 62]
-
S. Zhang, W.W.-M. Dai (University of California)
Post-layout is an important stage in the modern VLSI design.
With the completed detail routing, it is the only stage
where extraction and veri¿cation tools can get accurate results
for further optimization. But the problem is that design optimization
or modification are very hard to perform
in the post-layout stage, because most layout elements are
under tight geometry constraints due to the routing. In this
paper we propose a new method to resolve this problem,
named TEG. Based on an improved topological layout representation
and a set of layout operation algorithms, TEG
provides an incremental layout modification environment for
the post-layout applications. Experimental results showed
that TEG was efficient and effective in processing industry
VLSI designs.
-
An Algorithm for Optimal Decoupling Capacitor Sizing and
Placement for Standard Cell Layouts [p. 68]
-
H. Su, S.R. Nassif (IBM ARL); S.S. Sapatnekar (University of Minnesota)
With technology scaling, the trend for high performance integrated circuits is
towards ever high operating frequency, lower power supply voltages and high
power dissipation. This causes a dramatic increase in the currents being
delivered through the on-chip power grid and is recognized in the International
Technology Roadmap for Semiconductors as one of the difficult challenges.
The addition of decoupling capacitances (decaps) is arguably the most powerful
degree of freedom that a designer has for power-grid noise abatement and is
becoming more important as technology scales. In this paper, we propose and
demonstrate an algorithm for the automated placement and sizing of decaps
in ASIC-like circuits. The adjoint sensitivity of the power grid nosie with
respect to every decap. We propose a fast convoluation technique based on
piecewise linear (PWL) compressions of the original and adjoint waveforms.
Experimental results show that power grid nosie can be significantly reduced
after a judicious optimization of decap placement, with little change of the total
chip area.
Categories and Subject Descriptors
B.8.2 [Performances and Reliability]: Performance Analysis and Design Aids
General Terms
Algorithms
Keywords
decoupling capacitor, placement, optimization, power grid noise, adjoint sensitivity,
ASICs
Chair: M. Pedram (University of Southern California)
-
On-chip Thermal Engineering for Peta-scale Integration (Invited Paper) [p. 76]
-
S.-M. Kang (University of California, Santa Cruz)
Heat removal is already a significant manufacturing issue in the
current product engineering of VLSI systems. The problem will
get much more serious in peta-scale integrated systems with
much higher energy consumption and complex interconnect
structures, potentially including optical interchip and even
intrachip communications. New thermal engineering is much
needed for both performance and reliability. In the current
product development-quality CAD practice, few tools can
provide accurate electrothermal analysis of integrated circuits,
let alone optical interconnects. The conventional circuit level
and timing analyses use uniform temperatures estimated from
the ambient temperature, average power consumption of the
chip and the thermal conductivity of the package. It has been
shown that under poor thermal design seemingly perfect circuit
can fail due to rather significant temperature differential. This
was a case for a 10bit adder surrounded by a high power module
on one side and a moderate power module on the other side. Yet
the conventional method using an average power dependent
uniform temperature for simulation, even in the worst case, fail
to capture the malfunction of the circuit. In this case the uneven
temperature profile of the circuit caused skew in signal
propagation delays enough to cause what should be a clean
digital voltage switching to be a voltage glitch. Product
development managers in semiconductor industry have reported
such problems.
Electromigration (EM) reliability also critically
depends on local temperature. Thus the estimation of
interconnect temperatures and their impact on EM reliability
besides propagation delays becomes important. Both accuracy
and computational efficiency should be achieved in the
development of new CAD capability. Optical interconnects (OI)
on chip or chip-to-chip has been discussed over the last few
decades. And its commercial applications have been slow in
coming. However, the need for OI has been slowly increasing as
evidenced by the recent creation of optoelectronics business
sector in Intel Corporation. In a peta-scale system, optical
interconnects are likely to be essential, especially for 3D
interconnects with high signal fidelity. Computer-aided design
of OI systems using novel compact models for laser diodes and
photodetectors should become as routine as the current VLSI
design practice that relies on SPICE-like simulation tools and
high-level simulation tools, such as Rsoft‰s LinkSIM.
Equally important issue is how to solve the local hot
spot problems in energy efficient manner. The recent
development of thermionic cooling by A. Shakouri of UC Santa
Cruz provides an inroad for electronically controllable local
cooling of hot spots. However, such hot spots should be
predicted with realistic assessment based on specific
environmental conditions such as the substrate system, thermal
conductivity of package, and on-chip power consumption. P.
Mazumder of UC Berkeley introduced a novel method for
cooling using nanotubes. Based on the thermal and electrical
measurements of individual or small bundles of nanotubes, a
composite can be designed for the desired electromagnetic and
thermal properties. Microfluidic devices have been used to align
nanotubes in a certain direction. Carbon nanotubes can be mixed
with monomer solutions and then injected into such microfluidic
devices to align carbon nanotubes in the desired spatial pattern.
These are some examples of potential methods for thermal
engineering.
In this talk, we will review recent efforts for thermal
engineering and consider what design methods and technologies
may become available options for peta-scale integration with
high-performance and reliability.
Chair: M. Marek-Sadowska (University of California, Santa Barbara)
-
Shield Count Minimization in Congested Regions [p. 78]
-
P. Saxena, S. Gupta (Intel Corporation)
With worsening crosstalk in nanometer designs, it is becoming
increasingly important to control the switching cross-coupling
experienced by critical wires. This is commonly done by adding
shields adjacent to these wires. However, the number of wires
requiring shields in high frequency designs becomes extremely
large, resulting in a large area impact. We address this problem at
both the methodological and algorithmic levels in this paper,
integrating the traditionally separate steps of power and signal
routing in a safe manner to minimize the number of shields
required to satisfy all shielding constraints. We postpone the
power routing in middle metal layers to after critical signal nets
and their shields have been laid out (with maximal shield sharing),
and then try to construct a fine-grained power grid out of the
already routed shields. Given a routing on a metal layer, our
adaptive power routing algorithm adds provably fewest new
power lines to complete the power grid on that layer. Our
approach has proven highly effective while designing some high
frequency blocks of a commercial gigahertz range microprocessor.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids š layout, placement
and routing.
General Terms
Algorithms, Performance, Design.
Keywords
Layout, Routing, Power routing, Noise, Crosstalk, Shielding,
High performance design, Domino circuits.
-
On Convergence of Switching Windows Computation in Presence of
Crosstalk Noise [p. 84]
-
P. Chen (University of California, Berkeley and Silicon Perspective);
Y. Kukimoto, C.-C. Teng (University of California, Berkeley);
K. Keutzer (Silicon Perspective)
Detecting overlapping of switching windows between coupling nets
is a major static technique to accurately locate crosstalk noise. However,
due to the mutual dependency between switching windows,
the computation requires iterations to converge. In this paper, we
discuss the issues and provide answers to the important questions
involved in convergence and numerical properties, including the effect
of coupling models, multiple convergence points, convergence
rate, computational complexity, non-monotonicity, continuity and
the effectiveness of bounds. Numerical fixed point computation results
are used to explain these properties. Our contribution here
builds a theoretical foundation for static crosstalk noise analysis.
Categories and Subject Descriptors
B.7.2 [Design Aids]: VerificationÖTiming analysis, noise analysis
General Terms
Theory, Algorithms, Verification
Chair: J. Lou (Synopsys)
-
Buffer Insertion with Adaptive Blockage Avoidance [p. 92]
-
J. Hu, S.T. Quay, G. Gandham (IBM Microelectronics);
C.J. Alpert (IBM ARL)
Buffer insertion is a fundamental technology for VLSI interconnect optimization. Several existing
buffer insertion algorithms have evolved from van Ginneken's classic algorithm. In this work, we
extend van Ginneken's algorithm to handle blockages in the layout. Given a Steiner tree containing
a Steiner point that overlaps a blockage, a local adjustment is made to the tree topology that enables
additional buffer insertion candidates to be considered. This adjustment is adaptive to the demand on
buffer insertion and is incurred only when it facilitates the maximal slack solution. This approach
can be combined with any performance-driven Steiner tree construction. The overall time complexity
has linear dependence on the number of blockages and quadratic dependence on the number of potential
buffer locations. Experiments on several large nets confirm that high-quality solutions can be obtained
through this technique with little CPU cost.
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits|Design Aids
General Terms
Algorithms, Performance
-
Buffer Tree Synthesis with Consideration of Temporal Locality,
Sink Polarity Requirements, Solution Cost and Blockages [p. 98]
-
M. Hrkic, J. Lillis (University of Illinois at Chicago)
We give an overview of a buffer tree synthesis package which pays particular attention to the following issues:
routing and buffer blockages, minimization of interconnect and buffer costs, exploitation of termporal locality
among the sinks and addressing sink polarity requirements. Experimental results demonstrate the effectiveness
of the tool in comparison with previously proposed techniques.
Categories and Subject Descriptors
B.7.2. [Hardware]: Integrated Circuits - Design Aids
General Terms
Algorithms
-
Simultaneous Driver Sizing and Buffer Insertion Using a Delay
Penalty Estimation Technique [p. 104]
-
C.J. Alpert, J. Hu, C. Kashyap, S.T. Quay R. G. Gandham (IBM);
C.-N. Chu (Iowa State University);
M. Hrkic (University of Illinois at Chicago)
To achieve timing closure in a placed design, buffer insertion and driver sizing are two
of the most effective transforms that can be applied. Since the driver sizing solution
and the buffer insertion solution affect each other, sub-optimal solutions may result
if these techniques are applied sequentially instead of simultaneously. We show how
to simply extend van Vinneken's buffer insertion algorithm to simultaneously incorporate
driver sizing and introduce the idea of a delay penalty to encapsulate the effect of
driver sizing on the previous stage. The delay penalty can be pre-computed efficiently
via dynamic programming. Experimental results show that using driving sizing with a
delay penalty function obtains designs with superior timing and area characteristics.
Categories and Subject Descriptors
B.7.2. [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]:
Computer-Aided Engineering - Computer-aided design
General Terms
Algorithms, Design, Performance
Chair: C.J. Alpert (IBM)
-
A Roadmap and Vision for Physical Design (Invited Paper) [p. 112]
-
A.B. Kahng (University of California, San Diego)
This invited paper offers "roadmap and vision" for physical design.
The main messages are as follows. (1) The high-level roadmap for
physical design is static and well-known. (2) Basic problems remain
untouched by fundamental research. (3) Academia should not overemphasize
back-filling and formulation over innovation and optimization.
(4) The physical design field must become more mature
and efficient in how it prioritizes research directions and uses its human
resources. (5) The scope of physical design must expand (up to
package and system, down to manufacturing interfaces, out to novel
implementation technologies, etc.), even as renewed focus is placed
on basic optimization technology.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids š Layout, Place and Route;
F.2.2 [Analysis of Algorithms and Problem Complexity]: Non-Numerical
Algorithms and Problems š Routing and Layout; J.6
[Computer-Aided Engineering]: CAD
General Termss
Algorithms, Design
-
Physical Design with Multiple On-Chip Voltages (Invited Paper) [p. 118]
-
C. Chen (University of Windsor)
Supply voltage is one of the dominant factors that determine the
timing performance and power consumption of VLSI chips. For
digital systems with a single supply voltage, some logic devices
typically operate faster or more frequently than necessary,
consuming extra power. This motivates the need for multiplevoltage
design for an aggressive power optimization as slow
operation allows a reduced voltage level. While the past few years
have seen some successful examples from behavioral- and logiclevel
synthesis, physical designers are facing the challenges under
such a new design environment. This talk discusses the physical
aspects of multiple-voltage design, including the latest approaches
to emerging problems such as voltage interface, clock distribution,
power/ground network, and place-and-route (for more info, visit
the web site at: http://www.vlsi.uwindsor.ca/~cchen). The goal is
to enable designers to deal with key issues in implementation of
multiple-voltage chips.
We first introduce the concept of hierarchy with multiple voltages
which can be applied to various design levels. In general, the
voltages could be either dynamically or statically variable. The
dynamic-voltage scheme requires high-efficiency DC-DC
converters and a voltage scheduling mechanism which treats the
voltage as a function of workload and timing constraints. The
static-voltage strategy calls for considerations about the number of
different voltages and the selection of their values. We discuss
these issues with emphasis on dual-voltage solutions which are a
typical case.
We then study the interface between different voltages on a same
chip. At high levels of abstraction, the chip needs to be partitioned
into several regions so that each individual region uses the same
supply voltage. This can be done with careful floorplanning and
power/ground network design. At low levels of abstraction, the
interface design involves investigating specific layout structures.
Particularly, we discuss level converters which are inserted
between different voltages to prevent the static current.
Techniques to minimize the effect of level converters are also
explored.
Since clock distribution network is a power-hungry component on
today's chips, applying multiple voltages to it can save significant
power. The basic idea is to use a lower voltage swing through the
clock tree, and the clock signal can be converted to a higher
voltage at the sinks. Buffers in the clock tree and/or flip-flops at
the sinks can be combined with level converters to reduce their
delay and power overheads.
We also look at the power/ground network which is another
concern with multiple on-chip voltages. Designers can simply
distribute different power lines to everywhere by using standard
cells with more than one power rails. Alternatively, different
power supplies can be separately delivered to individual regions,
depending upon the layout structures. We will compare these
approaches, and analyze their effect on place-and-route. In
addition, a new "fabric" layout for P/G and signal nets will be
reviewed towards tackling the cross-coupling and signal integrity
which are exacerbated for signal nets with different voltages.
Multiple-voltage brings additional physical constraints into both
placement and routing, resulting in a more difficult layout design
for signal nets. To improve routing, new placer and router should
be able to assign highly-connected logic devices to the same
voltage, and to locate signals with same voltage in the
neighboring areas. Recent methodology from the literature will be
described. Furthermore, shielding techniques can be used locally
to minimize the capacitive crosstalk between signal nets.
Finally, we will present the future trends for multiple-voltage
design, together with tool aspects from an industrial perspective.
Chair: J. Lillis (University of Illinois at Chicago)
-
Incremental Delay Change Due to Crosstalk Noise [p. 120]
-
L.H. Chen (Avant! Corporation); M. Marek-Sadowska (University of California,
Santa Barbara)
In this paper we present efficient closed-form formulas to
estimate the incremental delay change induced by capacitive
interconnect coupling. We also analyze temporal
correlations among switching signals and develop criteria
for timing window alignment. Our approximations
are conservative and yet achieve acceptable accuracy.
The formulas are simple enough to be used in the inner
loops of static timing analysis.
-
Crosstalk Noise Optimization by Post-Layout Transistor Sizing [p. 126]
-
M. Hashimoto, M. Takahashi, H. Onodera (Kyoto University)
This paper proposes a post-layout transistor sizing method for
crosstalk noise reduction. The proposed method downsizes the
drivers of the aggressor wires for noise reduction, utilizing the precise
interconnect information extracted from the detail-routed layouts.
We develop a transistor sizing algorithm for crosstalk noise
reduction under delay constraints, and construct a crosstalk noise
optimization method utilizing a crosstalk noise estimation method
and a transistor sizing framework which are previously developed.
Our method exploits the transistor sizing framework that can vary
the transistor widths inside cells with interconnects unchanged.
Our optimization method therefore never cause a new crosstalk
noise problem, and does not need iterative layout optimization. The
effectiveness of the proposed method is experimentally examined
using 2 circuits. The maximum noise voltage is reduced by more
than 50% without delay increase. These results show that the risk of
crosstalk noise problems can be considerably reduced after detailrouting.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids
General Terms
Design
Keywords
crosstalk noise, capacitive coupling noise, transistor sizing, gate
sizing, post-layout optimization
-
Understanding and Addressing the Impact of Wiring Congestion
During Technology Mapping [p. 131]
-
D. Pandini (ST Microelectronics); L.T. Pileggi, A.J. Strojwas (Carnegie
Mellon University)
Traditionally, interconnect effects are taken into account during
logic synthesis via wireload models, but their ineffectiveness for
DSM technologies has been demonstrated and various physical
synthesis approaches have been spawned to address the problem.
Of particular interest is that logic block size is no longer dictated exclusively
by total cell area, yet synthesis optimization objectives are
aimed specifically at minimizing the number and size of cells.
Methodologies that incorporate congestion within the logic synthesis
objective function have been proposed in [9][10][11] and [15];
however, as we will demonstrate, predicting the true congestion prior
to layout is not possible, and the efficacy of any approach can
only be evaluated after routing is completed within the fixed die
size. In this paper we propose a practical, complete methodology
which first performs congestion-aware technology mapping using a
global weighting factor for the cost function [15], and then applies
incremental localized unmapping and remapping on congested areas.
This complete approach addresses the problem that one global
factor is not ideally suited for all regions of the designs. Most importantly,
through the application of this methodology to industrial
examples we will show that any attempt at a purely top-down single-pass
congestion-aware technology mapping is merely wishful
thinking.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic synthesis.
B.7.2 [Integrated Circuits]: Design Aids - Layout, placement and
routing.
General Terms
Algorithms, Performance, Design.
-
Closing the Smoothness and Uniformity Gap in Area Fill Synthesis [p. 137]
-
Y. Chen (University of California, Los Angeles);
A.B. Kahng (University of California, San Diego);
G. Robins (University of Virginia); A. Zelikovsky (George State University)
Control of variability in the back end of the line, and hence in
interconnect performance as well, has become extremely difficult
with the introduction of new materials such as copper and low-k dielectrics.
Uniformity of chemical-mechanical planarization (CMP)
requires the addition of area fill geometries into the layout, in order
to smoothen the variation of feature densities across the die. Our
work addresses the following smoothness gap in the recent literature
on area fill synthesis. (1) The very first paper on the filling
problem (Kahng et al., ISPD98 [7]) noted that there is potentially
a large difference between the optimum window densities in fixed
dissections vs. when all possible windows in the layout are considered.
(2) Despite this observation, all filling methods since 1998
minimize and evaluate density variation only with respect to a fixed
dissection. This paper gives the first evaluation of existing filling algorithms
with respect to "gridless" ("floating-window") mode, according
to both the effective and spatial density models. Our experiments
indicate surprising advantages of Monte-Carlo and greedy
strategies over ‹optimalŠ linear programming (LP) based methods.
Second, we suggest new, more relevant methods of measuring a
local uniformity based on Lipschitz conditions, and empirically
demonstrate that Monte-Carlo methods are inherently better than
LP with respect to the new criteria. Finally, we propose new LPbased
filling methods that are directly driven by the new criteria,
and show that these methods indeed help close the ‹smoothness
gapŠ.
Categories and Subject Descriptors
B.7.2 [Hardware]: ICÖ Manufacturability; J.6 [Computer Applications]:
CAD; F.2.2 [Analysis of Algorithms]: Problem Complexity
General Terms
Algorithms, Design, Reliability, Theory
Keywords
VLSI Manufacturability, Chemical-Mechanical Polishing, Dummy Fill Problem,
Density Analysis, Monte-Carlo
-
Min-Max Placement For Large Scale Timing Optimization [p. 143]
-
Andrew B. Kahng (University of California, San Diego); S. Mantik (University of
California, Los Angeles); I.L. Markov (University of Michigan)
At the 250nm technology node, interconnect delays account for over
40% of worst delays [12]. Transition to 130nm and below increases
this figure, and hence the relative importance of timing-driven placement
for VLSI. Our work introduces a novel minimization of maximal
path delay that improves upon previously known algorithms for
timing-driven placement. Our placement algorithms have provable
properties and are fast in practice. Empirical validation is based on
extending a scalable min-cut placer with proven quality in wirelengthand
congestion-driven placement [4]. The CPU overhead of the timing-driven
capability is within 50%. We placed industrial circuits and
evaluated the resulting layouts with a commercial static timing analyzer.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Place and Route
General Terms
Algorithms, Design
-
Global Clustering-Based Performance-Driven Circuit Partitioning [p. 149]
-
J. Cong (University of California, Los Angeles); C. Wu (Aplus Design
Technologies, Inc.)
In this paper, we propose a new global clustering based multi-level
partitioning algorithm for performance optimization. Our algorithm
computes a delay minimal K-way partition first, then gradually
reduces the cutsize while keeping the circuit delay by de-clustering
and refinement. Our test results on a set of MCNC sequential
examples show that we can reduce the delay by 30%, while increasing
the cutsize by 28% on average, when compared with hMetis [5]. Our
algorithm also consistently outperforms other state-of-the-art partitioning
algorithms [2,5,3] on circuit delay with reasonable cost on the cutsize.
-
Net Criticality Revisited: An Effective Method to Improve Timing
in Physical Design [p. 155]
-
H.-L. Chang, E. Shragowitz, J. Liu (University of Minnesota);
H. Youseff (Universite du Centre); B. Lu (Cadence Design Systems Inc);
S. Sutanthavibul (Intel Corporation)
Criticality metrics is a type of predictive models used in VLSI
design. This work demonstrates that timing in physical design
could be substantially improved if circuits were subjected to timing
criticality analysis prior to layout and new criticality metrics were
used to drive layout system. These new metrics are computed as
ratios of net physical characteristics to the net delay bounds
determined by an optimal bounds computation algorithm. Attempts
to develop criticality metrics prior to layout were made before, but
these metrics were not based on the bound ideology. The paper
provides probabilistic interpretation of new criticality metrics and
derivation of some important properties of these metrics.
Evaluation of net criticality by new metrics can be easily merged
with any layout system that allows weights to be assigned to nets
on placement and/or routing steps. The methodology has been
tested with a commercial layout system from a leading CAD
provider. When the new criticality information was supplied to a
basic commercial standard cell placer and router, timing was
improved by 29.5% for the set of testcases. The achieved result is
12% better than timing generated by a targeted timing-driven
layout system from the same provider. All additional computations
related to the new criticality metrics require only negligible
increase in run time of the basic layout system.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - very large
scale integration.
General Terms
Algorithms, Performance, Theory.
Keywords
Net Delay Bound, Criticality Metrics, Placement, Routing.
-
FAR: Fixed-Points Addition and Relaxation Based Placement [p. 161]
-
B. Hu, M. Marek-Sadowska (University of California, Santa Barbara)
In this paper we describe the Fixed-points Addition and Relaxation
(FAR) based placement technique. Fixed point is a pseudo cell connected
to a movable cell. By introducing fixed points, the placement can
be maintained in a force equilibrium state and further transformed into
another equilibrium state. By relaxing some of the previously introduced
fixed points, we can partially or completely collapse the current placement
in order to reposition the cells, or incrementally perturb the existing
good solution to fulfill additional requirements. We apply the FARbased
approach to global placement for total wire length minimization,
and to incremental placement for Buffer Site Generation (BSG). For global
placement, our experimental results show that the FAR method
achieves 53.8% CPU speedup and total wire length comparable to that
achieved by the constant force based approach [1]. Experimental results
indicate that to accommodate buffers in specific regions, FAR is able to
perturb incrementally a given solution in a well-controlled way.
Organizer and Moderato: R.A. Rutenbar (Carnegie Mellon University)
Speakers: B. McCredie (IBM), K. Rinebold (Avant!), D. Wood (Intel)
Chair: M. Pedram (University of Southern California)
-
Timing Closure Based on Physical Hierarchy (Invited Paper) [p. 170]
-
J. Cong (University of California, Los Angeles)
In this paper, we shall present the progress and results of the
ongoing project at UCLA on synthesis and optimization under
physical hierarchy. First, we shall motivate our approach by
pointing out the limitations of the existing approach to
interconnect planning based on early RTL floorplanning
following logic hierarchy. Then, we shall discuss the technical
challenges for synthesis under the physical hierarchy, including
handling high computational complexity from the flattened logic
hierarchy, needs of retiming and pipelining over global
interconnects, and extension of existing synthesis operations.
Finally, we shall outline our approaches to overcome these
technical challenges.
Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS - Design
Aids.
General Terms
Algorithms, Performance, Design, Experimentation.
Keywords
Physical hierarchy, logic hierarchy, timing closure,
interconnect planning, interconnect optimization,
multilevel optimization, retiming and pipelining,
sequential arrival time.
Chair: I. Markov (University of Michigan)
-
Timing-Driven Routing for FPGAs Based on Lagrangian
Relaxation [p. 176]
-
S. Lee, D.F. Wong (University of Texas at Austin)
As interconnection delay plays an important role in determining circuit
performance in FPGAs, timing-driven FPGA routing has received much
attention recently. In this paper, we present a new timing-driven routing
algorithm for FPGAs. The algorithm finds a routing with minimum critical
path delay for a given placed circuit using the Langrangian relaxation technique.
Lagrangian multiplies used to relax timing constraints are updated by
subgradient method over iterations. Inforporated into the cost function, these
multipliers guide the router to construct routing tree for each net. During
routing, the exclusivity constraints on each routing resources are also taken care
of to route circuits successfully. Experimental results on benchmark circuits
show that our approach outperforms the state-of-the-art VPR router.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and routing; J.6
[Computer Applications]: Computer-Aided Engineering - Computer-aided
design (CAD)
General Terms
Algorithms, Experimentation
Keywords
FPGA, timing-driven routing, Lagrangian relaxation.
-
sub-SAT: A Formulation for Relaxed Boolean Satisfiability with
Applications in Routing [p. 182]
-
H. Xu, R.A. Rutenbar (Carnegie Mellon University); K. Sakallah (University
of Michigan)
Advances in methods for solving Boolean satisfiability (SAT) for
large problems have motivated recent attempts to recast physical design
problems as Boolean SAT problems. One persistent criticism of
these approaches is their inability to supply partial solutions, i.e, to
satisfy most but not all of the constraints cast in the SAT style. In this
paper we present a formulation for ‹subset satisfiableŠ Boolean SAT:
we transform a "strict" SAT problem with N constraints into a new,
‹relaxedŠ SAT problem which is satisfiable just if not more than
k<<N of these constraints cannot be satisfied in the original problem.
We describe a transformation based on explicit thresholding and
counting for the necessary SAT relaxation. Examples from FPGA
routing show how we can determine efficiently when we can satisfy
"almost all" of our geometric constraints.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design aidsÖlayout
General Terms
Algorithms
Chair: L. Scheffer (Cadence)
-
Temporal Logic Replication for Dynamically Reconfigurable
FPGA Partitioning [p. 190]
-
W.-K. Mak (University of South Florida), E.F.Y. Young (The Chinese
University of Hong Kong)
In this paper, we propose the idea of termporal logic replication in
dynamically reconfigurable field-programmable gate array partitioning
to reduce communication cost. Temporal logic replication has never been
explored before. We define the min-area min-cut replication problem given
a k-stage temporal partition satisfying all temporal constraints and
devise an optimal algorithm to solve this problem. We have also devised
a flow-based replication heuristic in case there is a tight area bound
that limtis the amount of replication. In addition, we will present a
correct network flow model for partitioning sequential circuits temporally.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]:
Computer-Aided Engineering - Computer-aided design
General Terms
Algorithms, Design, Performance
-
Twin Binary Sequences: A Non-Redundant Representation for
General Non-slicing Floorplan [p. 196]
-
E.F.Y. Young (The Chinese University of Hong Kong); C.C.N. Chu, Z.C. Shen
(Iowa State University)
The efficiency and effectiveness of many floorplanning methods depend
very much on the representation of the geometrical relationship between
the modules. A good representation can shorten the searching process
so that more accurate estimations on area and interconnect costs
can be performed. Non-slicing floorplan is the most general kind of
floorplan that is commonly used. Unfortunately, there is not yet any
complete and non-redundant topological representation for non-slicing
structure. In this paper, we will propose the first representation of this
kind. Like some previous work [9], we have also made used of mosaic floorplan
as an intermediate step. However, instead of including a more than sufficient
number of extra dummy blocks in the set of modules, our representation allows
us to insert an exact number of irreducible empty rooms to a mosaic floorplan
in such a way that every non-slicing floorplan can be obtained by this method
uniquely from one and only one mosaic floorplan. The size of the solution space
is only O(n!23n/n1.5) but every non-slicing floorplan
can be generated uniquely and efficiently in linear time without nay redundant
representation.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]:
Computer-Aided Engrineering - Computer-aided design
General Terms
Algorithms, Design, Performance
-
Geometrically Parameterized Interconnect Performance Models for
Interconnect Synthesis [p. 202]
-
L. Daniel (University of California, Berkeley); C.S. Ong, S.C. Low,
K.H. Lee (National University of Singapore); J. White (Massachusetts Institute
of Technology)
In this paper we describe an approach for generating geometrically parameterized
integrated-circuit interconnect models that are efficient enough for use in
interconnect synthesis. The model generation
approach presented is automatic, and is based on a multiparameter
model-reduction algorithm. The effectiveness of the technique
is tested using a multi-line bus example, where both wire
spacing and wire width are considered as geometric parameters.
Experimental results demonstrate that the generated models accurately
predict both delay and cross-talk effects over a wide range of
spacing and width variation.
|