ISPD 2002 ABSTRACTS



Keynote Address

Host: S. Sapatnekar (University of Minnesota)

Challenges and Principles of Physical Design (Invited Paper) [p. 3]
S.T. Teig (Simplex Solutions, Inc.)

Addressing the physical design of integrated circuits, MCMs, and printed-circuit boards requires solving some of the largest and most difficult combinatorial optimization problems ever attempted. Deep submicron process technologies and the rapid and continuing increase in design size significantly add to the difficulties. Fortunately, computing power, memory, and disk also continue to increase rapidly, providing hope that our mostly subquadratic heuristics will keep up with problem sizes. The increasing popularity of assorted hierarchical methodologies provides additional reason for optimism. Unfortunately, the vast majority of the fundamental techniques used in physical design today are ten to twenty years old. These techniques were developed when computers had hundreds of times less CPU and storage, and they incorporate artificial and needless constraints on both the designs themselves and the algorithms used to lay them out. The physical design community includes some of the most skilled algorithmists in the world. We have an opportunity to develop new algorithms and, even more importantly, new methodologies that permit chips and systems to be smaller, faster, lower power, more reliable, and easier to design. To fulfill our potential, we must carefully choose the research directions that will have the most leverage for designers. We must also articulate principles to follow that will maximize the effectiveness of future algorithms and heuristics in our discipline. I give a personal view on these two questions below.


Session 1: Placement

Chair: F. Johannes (TU Munich)
An Effective Congestion Driven Placement Framework [p. 6]
U. Brenner, A. Rohe (University of Bonn)

We present a fast yet reliable way to detect routing criticalities in VLSI chips. In addition, we show how this congestion estimation can be incorporated into a partitioning-based placement algorithm. Unlike previous approaches, we do not rerun parts of the placement algorithm or apply a post-placement optimization; instead, we use our congestion estimator for dynamic avoidance of routability problems in a single run of the placement algorithm. Computational experiments on chips with up to 1,300,000 cells are presented: the framework reduces the usage of the most critical routing edges by 9.0% on average, while the running time of the placement increases by about 8.7%. However, due to the smaller congestion, the running time of routing tools can be decreased drastically, so the total time for placement and (global) routing is decreased by 47% on average.
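
As a generic illustration of the kind of grid-based congestion estimate such a framework relies on, the short Python sketch below spreads each net's routing demand uniformly over the grid edges inside its bounding box and flags edges whose demand exceeds a nominal capacity. The grid coordinates, the capacity value, and the uniform-spreading model are assumptions for illustration only; they are not the authors' estimator or their dynamic avoidance scheme.

from collections import defaultdict

def estimate_congestion(nets, capacity=10.0):
    # nets: list of pin lists [(x, y), ...] in integer grid coordinates (assumed).
    h_demand = defaultdict(float)   # horizontal edge (x, y) -> (x+1, y)
    v_demand = defaultdict(float)   # vertical edge (x, y) -> (x, y+1)
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        rows, cols = y1 - y0 + 1, x1 - x0 + 1
        for y in range(y0, y1 + 1):
            for x in range(x0, x1):             # horizontal edges inside the box
                h_demand[(x, y)] += 1.0 / rows  # one crossing, spread over the rows
        for x in range(x0, x1 + 1):
            for y in range(y0, y1):             # vertical edges inside the box
                v_demand[(x, y)] += 1.0 / cols
    overflow = {("H",) + e: d - capacity for e, d in h_demand.items() if d > capacity}
    overflow.update({("V",) + e: d - capacity for e, d in v_demand.items() if d > capacity})
    return h_demand, v_demand, overflow

# Example with two small nets; a real flow would query the estimate during partitioning.
print(estimate_congestion([[(0, 0), (3, 2)], [(1, 1), (2, 4)]]))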

Consistent Placement of Macro-Blocks Using Floorplanning and Standard-Cell Placement [p. 12]
S.N. Adya, I.L. Markov (University of Michigan)

While a number of recent works address large-scale standard-cell placement, they typically assume that all macros are fixed. Floorplanning techniques are very good at handling macros, but do not scale to hundreds of thousands of placeable objects. Therefore, we combine floorplanning techniques with placement techniques in a design flow that solves the more general placement problem. Our work shows how to place macros consistently with large numbers of small standard cells. Our techniques can also be used to guide circuit designers who prefer to place macros by hand. The proposed flow relies on an arbitrary black-box standard-cell placer to obtain an initial placement and then removes possible overlaps using a fixed-outline floorplanner. This results in valid placements for macros, which are then considered fixed. Remaining standard cells are then placed by another call to the standard-cell placer. Empirical evaluation on the IBM benchmarks shows, in most cases, wirelength improvements of 10%-50% compared to Cadence QPlace, as well as runtime improvements.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Placement and Routing
General Terms
Algorithms, Design


Session 2: Panel - Do you have to get your whole physical solution from one company? [p. 19]

Organizer and Moderator: D. Hill (Synopsys)
Panelists: R. Camposano (Synopsys); W.-J. Dai (Silicon Perspective); P. Hersher (Simplex); J.G. Xi (Plato); S. Young (Nasada)

Session 3: Leakage Current

Chair: P. Groeneveld (Magma)
Leakage Current in Low Standby Power and High Performance Devices: Trends and Challenges (Invited Paper) [p. 22]
G.C.-F. Yeap (Motorola Inc.)

IC technology is continuing to scale according to Moore's Law, with the overall chip circuit requirements driving the MOSFET device and process integration requirements and optimal choices. In the 2001 International Technology Roadmap for Semiconductors (ITRS) [1], the driver for high performance logic is maximizing MOSFET intrinsic speed, while the driver for low standby power logic is minimizing MOSFET leakage current. The total leakage current of a MOSFET consists of three components: off-state sub-threshold leakage current, gate direct tunneling leakage current, and source/drain junction leakage current. In this paper, trends and challenges for each leakage current component in low standby power and high performance devices are discussed from the perspective of the 2001 ITRS and the recently reported literature.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - advanced technologies, VLSI (very large scale integration).
General Terms: Performance, Design and Reliability.
Keywords: Leakage current, off-state sub-threshold leakage, gate tunneling leakage, low standby power, high performance, CMOS technology, system-on-a-chip (SoC).

Leakage-Tolerant Design Techniques for High Performance Processors (Invited Paper) [p. 28]
V. De (Intel Labs)

In the sub-100nm technology generation, transistor subthreshold leakage is 100-1000 nA/µm for high performance microprocessor logic technology. As gate oxide thickness approaches the sub-10 Å regime, gate oxide leakage escalates to 10-100 A/cm^2. Junction leakages also become significant as doping levels around the junction approach 5x10^18 cm^-3. These excessive leakage currents contribute to large leakage power dissipation during (1) active operation, (2) standby or idle mode and (3) burn-in. In addition, excessive subthreshold leakage degrades the noise margin or robustness of performance-critical circuits such as wide-OR domino gates, register files and caches. Large gate oxide leakage also limits circuit fanout. Therefore, high performance and low power processor designs must employ leakage power control techniques to alleviate active power dissipation and delivery challenges, extend battery life and prevent thermal runaway during burn-in. In addition, leakage-tolerant high performance circuits must be used to provide adequate circuit robustness. The most effective design technique for reducing leakage power is dual-VT design. Performance-critical transistors are made low-VT to provide the required performance, and high-VT transistors are used everywhere else to minimize leakage power without impacting processor frequency. Optimal dual-VT designs can provide the same frequency as a design with a single low-VT, while limiting low-VT usage to 30%. As a result, leakage power during active, standby and burn-in modes is reduced by 3X without any performance impact. Of course, process complexity is slightly higher since extra critical masking steps are needed to provide the additional transistor VTs. EDA tools for optimal VT allocation during all phases of the design flow are critical for successful dual-VT designs. VT allocation can be performed at the logic gate level or at the transistor level. While transistor-level allocation is the most effective for leakage power reduction, it is also the most complex. A difference of 75-100 mV between the high-VT and low-VT values is found to be optimal for typical microprocessor designs.


Session 4: Physical Hierarchy

Chair: K. Bazargan (Minnesota)
Design Hierarchy Guided Multilevel Circuit Partitioning [p. 30]
Y. Cheon, D.F. Wong (The University of Texas at Austin)

In this paper, we present a new multilevel circuit partitioning algorithm (dhml) which is guided by design hierarchy. In addition to the flat netlist hypergraph, we use the user design hierarchy as a hint for partitioning because it already carries connectivity information about the logical blocks in the design. Using design hierarchy in partitioning is nontrivial since hierarchical elements do not necessarily have strong internal connectivity; hence we need to determine whether it is preferable to break up or preserve each hierarchical element. In order to identify and select the hierarchical elements with strong connectivity, the Rent exponent is used. The selected hierarchical elements are then used as effective clustering scopes during the multilevel coarsening phase. The scopes are dynamically updated (enlarged) while building up a clustering tree so that the clustering tree resembles the densely connected portions of the design hierarchy. We tested our algorithm on a set of large industrial designs in which the largest one has 1.8 million cells, 2.8 million nets, and 11 levels of hierarchy. By exploiting design hierarchy, our algorithm produces higher quality partitioning results than the state-of-the-art multilevel partitioner hMetis [7]. Furthermore, experimental results show that dhml yields significantly more stable solutions, which is helpful in practice to reduce the number of runs needed to obtain the best result.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids; J.6 [Computer Applications]: Computer-Aided Engineering - computer-aided design (CAD)
General Terms
Algorithms, Design, Experimentation, Performance
Keywords
Circuit partitioning, design hierarchy, clustering, Rent's rule
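
As an illustration of how a Rent exponent can flag hierarchy blocks with strong internal connectivity, the Python sketch below fits the exponent p of Rent's rule T = t * G^p from (cell count, external pin count) samples of each block and keeps blocks whose exponent falls below a threshold. The sample data, the threshold p_max, and the selection rule are assumptions for illustration; they are not the paper's exact procedure.

import math

def rent_exponent(samples):
    # samples: list of (cell_count, external_pin_count); returns the slope p of
    # log(pins) = log(t) + p * log(cells) via least squares.
    xs = [math.log(c) for c, _ in samples]
    ys = [math.log(p) for _, p in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def select_clustering_scopes(blocks, p_max=0.65):
    # blocks: {name: [(cells, pins), ...]}; keep blocks whose fitted exponent is
    # at most p_max, i.e. blocks that look strongly connected internally.
    return [name for name, s in blocks.items() if rent_exponent(s) <= p_max]

blocks = {"alu":         [(16, 20), (64, 40), (256, 80)],    # p = 0.5
          "random_glue": [(16, 30), (64, 95), (256, 300)]}   # p = 0.83
print(select_clustering_scopes(blocks))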

Physical Hierarchy Generation with Routing Congestion Control [p. 36]
C.-C. Chang, J. Cong, X. Yuan (UCLA); Z. Pan (IBM T.J. Watson Research Center)

In this paper, we develop a multi-level physical hierarchy generation (mPG) algorithm integrated with fast incremental global routing for directly updating and optimizing congestion cost during placement. The fast global routing is achieved by using a fast two-bend routing and incremental A-tree algorithm. The routing congestion is modeled by the wire usage estimated by the fast global router. A hierarchical area density control is also developed for placing objects with significant size variations. Experimental results show that, compared to GORDIAN-L, the wire length driven mPG is 3-6.5 times faster and generates slightly better wire length for test circuits larger than 100K cells. Moreover, the congestion driven mPG reduces wiring overflow by 50% with 5% larger bounding-box wire length but 3-6% shorter routing wire length as measured by the graph-based A-tree.
Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS-Design Aids
General Terms
Algorithms, Design, Experimentation, Performance
Keywords
Placement, routing, congestion, interconnect, physical hierarchy, deep sub-micron

Routability Driven White Space Allocation for Fixed-Die Standard-Cell Placement [p. 42]
X. Yang, B.-K. Choi, M. Sarrafzadeh (University of California at Los Angeles)

The use of white space in fixed-die standard-cell placement is an effective way to improve routability. In this paper, we present a white space allocation approach that dynamically assigns white space according to the congestion distribution of the placement. In the top-down placement flow, white space is assigned to congested regions using a smooth allocating function. A post-allocation optimization step is taken to further improve placement quality. Experimental results show that the proposed allocation approach, combined with a multilevel placement flow, significantly improves placement routability and layout quality. In our experiments, we compared our placement tool with two other fixed-die placers using an industrial place and route flow. Placements created by all three tools have been routed with an industrial router (Warp Route of Cadence). Compared with a leading-edge industrial tool, our placer produces placements with similar or better routability and on average 8.8% shorter routed wirelength. Furthermore, our tool produces placements that run faster through Warp Route compared with the industrial tool. Compared with a state-of-the-art academic placement tool (Capo/MetaPlacer), our placer shows the ability to produce more routable placements: for 15 out of 16 benchmarks our placer's outputs are routable, while Capo/MetaPlacer creates only 4 routable placements.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids
General Terms
Algorithms, Experimentation
Keywords
Physical Design, placement, routability


Session 5: Floorplanning and Postlayout Optimization

Chair: H. Zhou (Northwestern)
Routability Driven Floorplanner with Buffer Block Planning [p. 50]
C.W. Sham, E.F.Y. Young (The Chinese University of Hong Kong)

In traditional floorplanners, area minimization is an important issue. However, due to recent advances in VLSI technology, the number of transistors in a design is increasing rapidly, and so are their switching speeds. This has increased the importance of interconnect delay and routability in the overall performance of a circuit. We should consider interconnect planning, buffer planning and routability as early as possible. In this paper, we study and implement a routability-driven floorplanner with congestion estimation and buffer planning. Our method is based on a simulated annealing approach that is divided into two phases: the area optimization phase and the congestion optimization phase. In the area optimization phase, modules are roughly placed according to the total area and wirelength. In the congestion optimization phase, a floorplan is evaluated by its area, wirelength, congestion and routability. We assume that buffers on sufficiently long wires can be inserted at flexible intervals from one another, and probabilistic analysis is performed to compute congestion information that takes the constraints on buffer locations into account. Our approach is able to reduce the average number of wires in the congested areas and allow more feasible insertions of buffers to satisfy the delay constraints without much penalty in the area of the floorplan.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and Routing
General Terms
Algorithms, Design, Performance

Integrated Floorplanning with Buffer/Channel Insertion for Bus-Based Microprocessor Designs [p. 56]
F. Rafiq, N. Sherwani (Intel Microelectronics Services); M. Chrzanowska-Jeske (ECE Portland State University); H.H. Yang (Intel Corporation)

A new approach to the interconnect-driven floorplanning problem that integrates bus planning with floorplanning is presented. The integrated floorplanner is intended for bus-based designs. Each bus consists of a large number of wires. The floorplanner ensures routability by generating the exact location and shape of interconnects (above and between the circuit blocks) and optimizes the timing. Experiments with MCNC benchmarks clearly show the superiority of integrated floorplanning over the classical floorplan-analyze-and-then-refloorplan approach. Our floorplans are routable, meet all timing constraints, and are on average 12-13% smaller in area as compared to the traditional floorplanning algorithms.
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits - Placement and Routing; J.6 [Computer Applications]: Computer-aided Engineering - CAD.
General Terms: Algorithms, Performance.
Keywords: Floorplanning, Routability, Interconnect Estimation.

TEG: A New Post-Layout Optimization Method [p. 62]
S. Zhang, W.W.-M. Dai (University of California)

Post-layout is an important stage in modern VLSI design. With detailed routing completed, it is the only stage where extraction and verification tools can obtain accurate results for further optimization. The problem is that design optimization and modification are very hard to perform in the post-layout stage, because most layout elements are under tight geometry constraints due to the routing. In this paper we propose a new method, named TEG, to resolve this problem. Based on an improved topological layout representation and a set of layout operation algorithms, TEG provides an incremental layout modification environment for post-layout applications. Experimental results show that TEG is efficient and effective in processing industrial VLSI designs.

An Algorithm for Optimal Decoupling Capacitor Sizing and Placement for Standard Cell Layouts [p. 68]
H. Su, S.R. Nassif (IBM ARL); S.S. Sapatnekar (University of Minnesota)

With technology scaling, the trend for high performance integrated circuits is towards ever higher operating frequencies, lower power supply voltages and higher power dissipation. This causes a dramatic increase in the currents being delivered through the on-chip power grid and is recognized in the International Technology Roadmap for Semiconductors as one of the difficult challenges. The addition of decoupling capacitances (decaps) is arguably the most powerful degree of freedom that a designer has for power-grid noise abatement and is becoming more important as technology scales. In this paper, we propose and demonstrate an algorithm for the automated placement and sizing of decaps in ASIC-like circuits. The optimization is guided by the adjoint sensitivity of the power grid noise with respect to every decap. We propose a fast convolution technique based on piecewise linear (PWL) compressions of the original and adjoint waveforms. Experimental results show that power grid noise can be significantly reduced after a judicious optimization of decap placement, with little change in the total chip area.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance Analysis and Design Aids
General Terms
Algorithms
Keywords
decoupling capacitor, placement, optimization, power grid noise, adjoint sensitivity, ASICs
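
The computational kernel that a PWL-based sensitivity evaluation of this kind needs is the exact integration of products of piecewise-linear waveforms; the Python sketch below shows one way to compute it, exploiting the fact that on each merged segment both waveforms are linear, so their product is quadratic and Simpson's rule is exact. The waveforms are hypothetical, and the mapping of this integral to a particular decap sensitivity (the paper's convolution of original and adjoint waveforms) is not reproduced here.

def pwl_value(wave, t):
    # wave: sorted list of (time, value) breakpoints; linear interpolation, flat tails.
    if t <= wave[0][0]:
        return wave[0][1]
    if t >= wave[-1][0]:
        return wave[-1][1]
    for (t0, v0), (t1, v1) in zip(wave, wave[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def integrate_product(f, g):
    # Integral of f(t) * g(t) over the time overlap of the two PWL waveforms.
    ts = sorted({t for t, _ in f} | {t for t, _ in g})
    lo, hi = max(f[0][0], g[0][0]), min(f[-1][0], g[-1][0])
    ts = [t for t in ts if lo <= t <= hi]
    total = 0.0
    for t0, t1 in zip(ts, ts[1:]):
        tm = 0.5 * (t0 + t1)
        p0 = pwl_value(f, t0) * pwl_value(g, t0)
        pm = pwl_value(f, tm) * pwl_value(g, tm)
        p1 = pwl_value(f, t1) * pwl_value(g, t1)
        total += (t1 - t0) / 6.0 * (p0 + 4.0 * pm + p1)   # Simpson: exact for quadratics
    return total

v_node = [(0.0, 0.0), (1.0, 0.8), (3.0, 0.0)]   # hypothetical noise waveform
v_adj  = [(0.0, 1.0), (2.0, 0.2), (3.0, 0.0)]   # hypothetical adjoint waveform
print(integrate_product(v_node, v_adj))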


Session 6: Thermal Issues

Chair: M. Pedram (University of Southern California)
On-chip Thermal Engineering for Peta-scale Integration (Invited Paper) [p. 76]
S.-M. Kang (University of California, Santa Cruz)

Heat removal is already a significant manufacturing issue in the current product engineering of VLSI systems. The problem will get much more serious in peta-scale integrated systems with much higher energy consumption and complex interconnect structures, potentially including optical interchip and even intrachip communications. New thermal engineering is much needed for both performance and reliability. In current product development-quality CAD practice, few tools can provide accurate electrothermal analysis of integrated circuits, let alone optical interconnects. Conventional circuit level and timing analyses use uniform temperatures estimated from the ambient temperature, the average power consumption of the chip and the thermal conductivity of the package. It has been shown that, under poor thermal design, a seemingly perfect circuit can fail due to a rather significant temperature differential. This was the case for a 10-bit adder surrounded by a high power module on one side and a moderate power module on the other side. Yet the conventional method of using an average-power-dependent uniform temperature for simulation, even in the worst case, fails to capture the malfunction of the circuit. In this case the uneven temperature profile of the circuit caused enough skew in signal propagation delays to turn what should have been a clean digital voltage transition into a voltage glitch. Product development managers in the semiconductor industry have reported such problems.
Electromigration (EM) reliability also critically depends on local temperature. Thus the estimation of interconnect temperatures and their impact on EM reliability, besides propagation delays, becomes important. Both accuracy and computational efficiency should be achieved in the development of new CAD capability. Optical interconnects (OI), on chip or chip-to-chip, have been discussed over the last few decades, yet their commercial applications have been slow in coming. However, the need for OI has been slowly increasing, as evidenced by the recent creation of an optoelectronics business sector in Intel Corporation. In a peta-scale system, optical interconnects are likely to be essential, especially for 3D interconnects with high signal fidelity. Computer-aided design of OI systems using novel compact models for laser diodes and photodetectors should become as routine as the current VLSI design practice that relies on SPICE-like simulation tools and high-level simulation tools, such as RSoft's LinkSIM. An equally important issue is how to solve local hot spot problems in an energy-efficient manner. The recent development of thermionic cooling by A. Shakouri of UC Santa Cruz provides an inroad for electronically controllable local cooling of hot spots. However, such hot spots should be predicted with realistic assessment based on specific environmental conditions such as the substrate system, the thermal conductivity of the package, and on-chip power consumption. P. Mazumder of UC Berkeley introduced a novel method for cooling using nanotubes. Based on the thermal and electrical measurements of individual or small bundles of nanotubes, a composite can be designed for the desired electromagnetic and thermal properties. Microfluidic devices have been used to align nanotubes in a certain direction. Carbon nanotubes can be mixed with monomer solutions and then injected into such microfluidic devices to align the carbon nanotubes in the desired spatial pattern. These are some examples of potential methods for thermal engineering.
In this talk, we will review recent efforts for thermal engineering and consider what design methods and technologies may become available options for peta-scale integration with high-performance and reliability.


Session 7: Crosstalk Noise

Chair: M. Marek-Sadowska (University of California, Santa Barbara)
Shield Count Minimization in Congested Regions [p. 78]
P. Saxena, S. Gupta (Intel Corporation)

With worsening crosstalk in nanometer designs, it is becoming increasingly important to control the switching cross-coupling experienced by critical wires. This is commonly done by adding shields adjacent to these wires. However, the number of wires requiring shields in high frequency designs becomes extremely large, resulting in a large area impact. We address this problem at both the methodological and algorithmic levels in this paper, integrating the traditionally separate steps of power and signal routing in a safe manner to minimize the number of shields required to satisfy all shielding constraints. We postpone the power routing in middle metal layers to after critical signal nets and their shields have been laid out (with maximal shield sharing), and then try to construct a fine-grained power grid out of the already routed shields. Given a routing on a metal layer, our adaptive power routing algorithm adds provably fewest new power lines to complete the power grid on that layer. Our approach has proven highly effective while designing some high frequency blocks of a commercial gigahertz range microprocessor.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - layout, placement and routing.
General Terms
Algorithms, Performance, Design.
Keywords
Layout, Routing, Power routing, Noise, Crosstalk, Shielding, High performance design, Domino circuits.

On Convergence of Switching Windows Computation in Presence of Crosstalk Noise [p. 84]
P. Chen (University of California, Berkeley and Silicon Perspective); Y. Kukimoto, C.-C. Teng (University of California, Berkeley); K. Keutzer (Silicon Perspective)

Detecting the overlap of switching windows between coupled nets is a major static technique for accurately locating crosstalk noise. However, due to the mutual dependency between switching windows, the computation requires iterations to converge. In this paper, we discuss the issues and provide answers to the important questions involved in convergence and numerical properties, including the effect of coupling models, multiple convergence points, convergence rate, computational complexity, non-monotonicity, continuity and the effectiveness of bounds. Numerical fixed point computation results are used to explain these properties. Our contribution builds a theoretical foundation for static crosstalk noise analysis.
Categories and Subject Descriptors
B.7.2 [Design Aids]: Verification - Timing analysis, noise analysis
General Terms
Theory, Algorithms, Verification
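
The flavor of this fixed-point computation can be shown with a deliberately simplified Python model: each net has a nominal switching window, every overlapping aggressor widens the victim's late edge by a fixed bump, and the windows are re-evaluated until nothing changes. The window data, the bump model, and the Jacobi-style update below are assumptions for illustration, not the coupling models whose convergence properties the paper actually studies.

def overlaps(w1, w2):
    return w1[0] <= w2[1] and w2[0] <= w1[1]

def converge_windows(nominal, aggressors, bump, max_iter=100):
    # nominal: {net: (early, late)}; aggressors: {victim: [aggressor, ...]};
    # bump: extra delay added to the victim's late edge per overlapping aggressor.
    windows = dict(nominal)
    for _ in range(max_iter):
        new = {}
        for net, (early, late) in nominal.items():
            extra = sum(bump for agg in aggressors.get(net, ())
                        if overlaps(windows[net], windows[agg]))
            new[net] = (early, late + extra)
        if new == windows:
            return new, True        # reached a fixed point
        windows = new
    return windows, False           # no convergence within max_iter

nominal = {"a": (0.0, 2.0), "b": (1.5, 3.0), "c": (5.0, 6.0)}
aggressors = {"a": ["b"], "b": ["a", "c"]}
print(converge_windows(nominal, aggressors, bump=0.5))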


Session 8: Buffer Insertion

Chair: J. Lou (Synopsys)
Buffer Insertion with Adaptive Blockage Avoidance [p. 92]
J. Hu, S.T. Quay, G. Gandham (IBM Microelectronics); C.J. Alpert (IBM ARL)

Buffer insertion is a fundamental technology for VLSI interconnect optimization. Several existing buffer insertion algorithms have evolved from van Ginneken's classic algorithm. In this work, we extend van Ginneken's algorithm to handle blockages in the layout. Given a Steiner tree containing a Steiner point that overlaps a blockage, a local adjustment is made to the tree topology that enables additional buffer insertion candidates to be considered. This adjustment is adaptive to the demand on buffer insertion and is incurred only when it facilitates the maximal slack solution. This approach can be combined with any performance-driven Steiner tree construction. The overall time complexity has linear dependence on the number of blockages and quadratic dependence on the number of potential buffer locations. Experiments on several large nets confirm that high-quality solutions can be obtained through this technique with little CPU cost.
Categories and Subject Descriptors
B.7.2 [Hardware]: Integrated Circuits|Design Aids
General Terms
Algorithms, Performance
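
For readers unfamiliar with the baseline being extended, the Python sketch below shows classic van Ginneken-style candidate propagation reduced to a single sink-to-source path: candidates are (load capacitance, required arrival time) pairs, a buffer may be inserted where the layout allows, and dominated candidates are pruned. The wire, buffer, and driver parameters are made-up values, and the paper's adaptive blockage-avoidance topology adjustment is not shown.

def prune(cands):
    # Keep only the non-dominated (capacitance, required-time) frontier.
    cands.sort(key=lambda cr: (cr[0], -cr[1]))
    best, best_rat = [], float("-inf")
    for c, rat in cands:
        if rat > best_rat:
            best.append((c, rat))
            best_rat = rat
    return best

def van_ginneken_path(segments, sink_cap, sink_rat,
                      r_buf=100.0, c_buf=2e-15, d_buf=30e-12, r_drv=150.0):
    # segments: list of (R_wire, C_wire, buffer_allowed), ordered sink to source.
    cands = [(sink_cap, sink_rat)]
    for r, c, buffer_allowed in segments:
        # propagate candidates across the wire segment (Elmore delay, pi model)
        cands = [(cl + c, rat - r * (c / 2.0 + cl)) for cl, rat in cands]
        if buffer_allowed:
            # option of inserting a buffer at the upstream end of this segment
            buffered = [(c_buf, rat - d_buf - r_buf * cl) for cl, rat in cands]
            cands = prune(cands + buffered)
    # charge the driver and report the best achievable required time at the source
    return max(rat - r_drv * cl for cl, rat in cands)

segs = [(50.0, 1e-13, True), (50.0, 1e-13, True), (50.0, 1e-13, False)]
print(van_ginneken_path(segs, sink_cap=5e-15, sink_rat=1e-9))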

Buffer Tree Synthesis with Consideration of Temporal Locality, Sink Polarity Requirements, Solution Cost and Blockages [p. 98]
M. Hrkic, J. Lillis (University of Illinois at Chicago)

We give an overview of a buffer tree synthesis package which pays particular attention to the following issues: routing and buffer blockages, minimization of interconnect and buffer costs, exploitation of temporal locality among the sinks, and sink polarity requirements. Experimental results demonstrate the effectiveness of the tool in comparison with previously proposed techniques.
Categories and Subject Descriptors
B.7.2. [Hardware]: Integrated Circuits - Design Aids
General Terms
Algorithms

Simultaneous Driver Sizing and Buffer Insertion Using a Delay Penalty Estimation Technique [p. 104]
C.J. Alpert, J. Hu, C. Kashyap, S.T. Quay, R.G. Gandham (IBM); C.-N. Chu (Iowa State University); M. Hrkic (University of Illinois at Chicago)

To achieve timing closure in a placed design, buffer insertion and driver sizing are two of the most effective transforms that can be applied. Since the driver sizing solution and the buffer insertion solution affect each other, sub-optimal solutions may result if these techniques are applied sequentially instead of simultaneously. We show how to simply extend van Ginneken's buffer insertion algorithm to simultaneously incorporate driver sizing, and we introduce the idea of a delay penalty to encapsulate the effect of driver sizing on the previous stage. The delay penalty can be pre-computed efficiently via dynamic programming. Experimental results show that using driver sizing with a delay penalty function obtains designs with superior timing and area characteristics.
Categories and Subject Descriptors
B.7.2. [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]: Computer-Aided Engineering - Computer-aided design
General Terms
Algorithms, Design, Performance
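
A toy Python sketch of the delay-penalty idea follows: under a one-resistor model of the previous stage, the penalty of choosing a larger driver is the extra delay its input capacitance imposes upstream, and such a table can be pre-computed once per driver family. The driver data, the single-resistor model, and the choice of the smallest input capacitance as reference are assumptions for illustration; the paper's dynamic-programming pre-computation across stages is not reproduced.

def delay_penalty_table(driver_input_caps, r_prev):
    # driver_input_caps: {size_name: input capacitance in farads} (hypothetical values).
    # Penalty is measured relative to the choice with the smallest input capacitance.
    min_cin = min(driver_input_caps.values())
    return {s: r_prev * (cin - min_cin) for s, cin in driver_input_caps.items()}

drivers = {"x1": 2e-15, "x4": 8e-15, "x8": 16e-15}
print(delay_penalty_table(drivers, r_prev=500.0))   # penalties in seconds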


Session 9: Future Design Trends

Chair: C.J. Alpert (IBM)
A Roadmap and Vision for Physical Design (Invited Paper) [p. 112]
A.B. Kahng (University of California, San Diego)

This invited paper offers "roadmap and vision" for physical design. The main messages are as follows. (1) The high-level roadmap for physical design is static and well-known. (2) Basic problems remain untouched by fundamental research. (3) Academia should not overemphasize back-filling and formulation over innovation and optimization. (4) The physical design field must become more mature and efficient in how it prioritizes research directions and uses its human resources. (5) The scope of physical design must expand (up to package and system, down to manufacturing interfaces, out to novel implementation technologies, etc.), even as renewed focus is placed on basic optimization technology.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Place and Route; F.2.2 [Analysis of Algorithms and Problem Complexity]: Non-Numerical Algorithms and Problems - Routing and Layout; J.6 [Computer-Aided Engineering]: CAD
General Terms
Algorithms, Design

Physical Design with Multiple On-Chip Voltages (Invited Paper) [p. 118]
C. Chen (University of Windsor)

Supply voltage is one of the dominant factors that determine the timing performance and power consumption of VLSI chips. For digital systems with a single supply voltage, some logic devices typically operate faster or more frequently than necessary, consuming extra power. This motivates multiple-voltage design for aggressive power optimization, since slower operation allows a reduced voltage level. While the past few years have seen some successful examples from behavioral- and logic-level synthesis, physical designers face new challenges in such a design environment. This talk discusses the physical aspects of multiple-voltage design, including the latest approaches to emerging problems such as voltage interfaces, clock distribution, power/ground networks, and place-and-route (for more information, visit the web site at: http://www.vlsi.uwindsor.ca/~cchen). The goal is to enable designers to deal with the key issues in the implementation of multiple-voltage chips.
We first introduce the concept of hierarchy with multiple voltages which can be applied to various design levels. In general, the voltages could be either dynamically or statically variable. The dynamic-voltage scheme requires high-efficiency DC-DC converters and a voltage scheduling mechanism which treats the voltage as a function of workload and timing constraints. The static-voltage strategy calls for considerations about the number of different voltages and the selection of their values. We discuss these issues with emphasis on dual-voltage solutions which are a typical case.
We then study the interface between different voltages on the same chip. At high levels of abstraction, the chip needs to be partitioned into several regions so that each individual region uses the same supply voltage. This can be done with careful floorplanning and power/ground network design. At low levels of abstraction, the interface design involves investigating specific layout structures. In particular, we discuss level converters, which are inserted between different voltages to prevent static current. Techniques to minimize the effect of level converters are also explored. Since the clock distribution network is a power-hungry component on today's chips, applying multiple voltages to it can save significant power. The basic idea is to use a lower voltage swing through the clock tree, and the clock signal can be converted to a higher voltage at the sinks. Buffers in the clock tree and/or flip-flops at the sinks can be combined with level converters to reduce their delay and power overheads. We also look at the power/ground network, which is another concern with multiple on-chip voltages. Designers can simply distribute different power lines everywhere by using standard cells with more than one power rail. Alternatively, different power supplies can be delivered separately to individual regions, depending upon the layout structures. We will compare these approaches and analyze their effect on place-and-route. In addition, a new "fabric" layout for P/G and signal nets will be reviewed towards tackling the cross-coupling and signal integrity problems, which are exacerbated for signal nets with different voltages. Multiple voltages bring additional physical constraints into both placement and routing, resulting in a more difficult layout design for signal nets. To improve routing, a new placer and router should be able to assign highly-connected logic devices to the same voltage, and to locate signals with the same voltage in neighboring areas. Recent methodology from the literature will be described. Furthermore, shielding techniques can be used locally to minimize the capacitive crosstalk between signal nets. Finally, we will present future trends for multiple-voltage design, together with tool aspects from an industrial perspective.


Session 10: Poster Paper Introductions

Chair: J. Lillis (University of Illinois at Chicago)
Incremental Delay Change Due to Crosstalk Noise [p. 120]
L.H. Chen (Avant! Corporation); M. Marek-Sadowska (University of California, Santa Barbara)

In this paper we present efficient closed-form formulas to estimate the incremental delay change induced by capacitive interconnect coupling. We also analyze temporal correlations among switching signals and develop criteria for timing window alignment. Our approximations are conservative and yet achieve acceptable accuracy. The formulas are simple enough to be used in the inner loops of static timing analysis.

Crosstalk Noise Optimization by Post-Layout Transistor Sizing [p. 126]
M. Hashimoto, M. Takahashi, H. Onodera (Kyoto University)

This paper proposes a post-layout transistor sizing method for crosstalk noise reduction. The proposed method downsizes the drivers of the aggressor wires for noise reduction, utilizing the precise interconnect information extracted from the detail-routed layouts. We develop a transistor sizing algorithm for crosstalk noise reduction under delay constraints, and construct a crosstalk noise optimization method utilizing a crosstalk noise estimation method and a transistor sizing framework which were developed previously. Our method exploits the transistor sizing framework that can vary the transistor widths inside cells while leaving the interconnects unchanged. Our optimization method therefore never causes a new crosstalk noise problem, and does not need iterative layout optimization. The effectiveness of the proposed method is experimentally examined using two circuits. The maximum noise voltage is reduced by more than 50% without delay increase. These results show that the risk of crosstalk noise problems can be considerably reduced after detail routing.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids
General Terms
Design
Keywords
crosstalk noise, capacitive coupling noise, transistor sizing, gate sizing, post-layout optimization

Understanding and Addressing the Impact of Wiring Congestion During Technology Mapping [p. 131]
D. Pandini (ST Microelectronics); L.T. Pileggi, A.J. Strojwas (Carnegie Mellon University)

Traditionally, interconnect effects are taken into account during logic synthesis via wireload models, but their ineffectiveness for DSM technologies has been demonstrated and various physical synthesis approaches have been spawned to address the problem. Of particular interest is that logic block size is no longer dictated exclusively by total cell area, yet synthesis optimization objectives are aimed specifically at minimizing the number and size of cells. Methodologies that incorporate congestion within the logic synthesis objective function have been proposed in [9][10][11] and [15]; however, as we will demonstrate, predicting the true congestion prior to layout is not possible, and the efficacy of any approach can only be evaluated after routing is completed within the fixed die size. In this paper we propose a practical, complete methodology which first performs congestion-aware technology mapping using a global weighting factor for the cost function [15], and then applies incremental localized unmapping and remapping on congested areas. This complete approach addresses the problem that one global factor is not ideally suited for all regions of the designs. Most importantly, through the application of this methodology to industrial examples we will show that any attempt at a purely top-down single-pass congestion-aware technology mapping is merely wishful thinking.
Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids - Automatic synthesis. B.7.2 [Integrated Circuits]: Design Aids - Layout, placement and routing.
General Terms
Algorithms, Performance, Design.

Closing the Smoothness and Uniformity Gap in Area Fill Synthesis [p. 137]
Y. Chen (University of California, Los Angeles); A.B. Kahng (University of California, San Diego); G. Robins (University of Virginia); A. Zelikovsky (Georgia State University)

Control of variability in the back end of the line, and hence in interconnect performance as well, has become extremely difficult with the introduction of new materials such as copper and low-k dielectrics. Uniformity of chemical-mechanical planarization (CMP) requires the addition of area fill geometries into the layout, in order to smooth the variation of feature densities across the die. Our work addresses the following smoothness gap in the recent literature on area fill synthesis. (1) The very first paper on the filling problem (Kahng et al., ISPD98 [7]) noted that there is potentially a large difference between the optimum window densities in fixed dissections vs. when all possible windows in the layout are considered. (2) Despite this observation, all filling methods since 1998 minimize and evaluate density variation only with respect to a fixed dissection. This paper gives the first evaluation of existing filling algorithms with respect to "gridless" ("floating-window") mode, according to both the effective and spatial density models. Our experiments indicate surprising advantages of Monte-Carlo and greedy strategies over "optimal" linear programming (LP) based methods. Second, we suggest new, more relevant methods of measuring local uniformity based on Lipschitz conditions, and empirically demonstrate that Monte-Carlo methods are inherently better than LP with respect to the new criteria. Finally, we propose new LP-based filling methods that are directly driven by the new criteria, and show that these methods indeed help close the "smoothness gap".
Categories and Subject Descriptors
B.7.2 [Hardware]: IC - Manufacturability; J.6 [Computer Applications]: CAD; F.2.2 [Analysis of Algorithms]: Problem Complexity
General Terms
Algorithms, Design, Reliability, Theory
Keywords
VLSI Manufacturability, Chemical-Mechanical Polishing, Dummy Fill Problem, Density Analysis, Monte-Carlo
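
To make the Monte-Carlo flavor concrete, the Python sketch below runs a simplified fill loop on a fixed dissection: at each step a window is chosen with probability proportional to its remaining density slack and one unit of fill is inserted there. The window densities, the per-insertion density increment, and the stopping rule are assumptions for illustration, not the algorithms evaluated in the paper.

import random

def mc_fill(densities, max_density, fill_units):
    # densities: {window: feature density in [0, 1]}; returns updated densities.
    dens = dict(densities)
    unit = 0.01                                   # density added per fill insertion
    for _ in range(fill_units):
        slack = {w: max_density - d for w, d in dens.items() if d < max_density}
        if not slack:
            break
        r, acc = random.uniform(0.0, sum(slack.values())), 0.0
        for w, s in slack.items():                # roulette-wheel selection
            acc += s
            if r <= acc:
                dens[w] += unit
                break
    return dens

random.seed(0)
windows = {(0, 0): 0.20, (0, 1): 0.45, (1, 0): 0.30, (1, 1): 0.55}
print(mc_fill(windows, max_density=0.6, fill_units=50))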

Min-Max Placement For Large Scale Timing Optimization [p. 143]
Andrew B. Kahng (University of California, San Diego); S. Mantik (University of California, Los Angeles); I.L. Markov (University of Michigan)

At the 250nm technology node, interconnect delays account for over 40% of worst-case delays [12]. The transition to 130nm and below increases this figure, and hence the relative importance of timing-driven placement for VLSI. Our work introduces a novel minimization of maximal path delay that improves upon previously known algorithms for timing-driven placement. Our placement algorithms have provable properties and are fast in practice. Empirical validation is based on extending a scalable min-cut placer with proven quality in wirelength- and congestion-driven placement [4]. The CPU overhead of the timing-driven capability is within 50%. We placed industrial circuits and evaluated the resulting layouts with a commercial static timing analyzer.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout, Place and Route
General Terms
Algorithms, Design

Global Clustering-Based Performance-Driven Circuit Partitioning [p. 149]
J. Cong (University of California, Los Angeles); C. Wu (Aplus Design Technologies, Inc.)

In this paper, we propose a new global clustering based multi-level partitioning algorithm for performance optimization. Our algorithm first computes a delay-minimal K-way partition, then gradually reduces the cutsize while preserving the circuit delay through de-clustering and refinement. Our test results on a set of MCNC sequential examples show that we can reduce the delay by 30%, while increasing the cutsize by 28% on average, when compared with hMetis [5]. Our algorithm also consistently outperforms other state-of-the-art partitioning algorithms [2,5,3] on circuit delay with reasonable cost in cutsize.

Net Criticality Revisited: An Effective Method to Improve Timing in Physical Design [p. 155]
H.-L. Chang, E. Shragowitz, J. Liu (University of Minnesota); H. Youseff (Universite du Centre); B. Lu (Cadence Design Systems Inc); S. Sutanthavibul (Intel Corporation)

Criticality metrics are a type of predictive model used in VLSI design. This work demonstrates that timing in physical design could be substantially improved if circuits were subjected to timing criticality analysis prior to layout and new criticality metrics were used to drive the layout system. These new metrics are computed as ratios of net physical characteristics to the net delay bounds determined by an optimal bounds computation algorithm. Attempts to develop criticality metrics prior to layout have been made before, but those metrics were not based on the bound ideology. The paper provides a probabilistic interpretation of the new criticality metrics and derives some of their important properties. Evaluation of net criticality by the new metrics can easily be merged with any layout system that allows weights to be assigned to nets during the placement and/or routing steps. The methodology has been tested with a commercial layout system from a leading CAD provider. When the new criticality information was supplied to a basic commercial standard cell placer and router, timing was improved by 29.5% for the set of testcases. The achieved result is 12% better than the timing generated by a targeted timing-driven layout system from the same provider. All additional computations related to the new criticality metrics require only a negligible increase in the run time of the basic layout system.
Categories and Subject Descriptors
B.7.1 [Integrated Circuits]: Types and Design Styles - very large scale integration.
General Terms
Algorithms, Performance, Theory.
Keywords
Net Delay Bound, Criticality Metrics, Placement, Routing.
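
As a minimal illustration of how bound-based criticality ratios plug into a weight-driven layout flow, the Python sketch below divides a pre-layout estimate of each net's physical characteristic (here, a net-delay estimate) by its delay bound and maps the result onto a net-weight range. The estimates, bounds, and weight range are placeholders; the optimal bound computation and the exact metric definitions are the paper's.

def net_weights(estimates, delay_bounds, w_min=1.0, w_max=10.0):
    # Criticality = estimated characteristic / delay bound, mapped to [w_min, w_max].
    crit = {n: estimates[n] / delay_bounds[n] for n in estimates}
    lo, hi = min(crit.values()), max(crit.values())
    span = (hi - lo) or 1.0
    return {n: w_min + (w_max - w_min) * (c - lo) / span for n, c in crit.items()}

print(net_weights({"n1": 0.8, "n2": 0.2, "n3": 0.5},
                  {"n1": 1.0, "n2": 1.0, "n3": 0.6}))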

FAR: Fixed-Points Addition and Relaxation Based Placement [p. 161]
B. Hu, M. Marek-Sadowska (University of California, Santa Barbara)

In this paper we describe the Fixed-points Addition and Relaxation (FAR) based placement technique. A fixed point is a pseudo cell connected to a movable cell. By introducing fixed points, the placement can be maintained in a force equilibrium state and further transformed into another equilibrium state. By relaxing some of the previously introduced fixed points, we can partially or completely collapse the current placement in order to reposition the cells, or incrementally perturb the existing good solution to fulfill additional requirements. We apply the FAR-based approach to global placement for total wire length minimization, and to incremental placement for Buffer Site Generation (BSG). For global placement, our experimental results show that the FAR method achieves a 53.8% CPU speedup and total wire length comparable to that achieved by the constant-force-based approach [1]. Experimental results also indicate that, to accommodate buffers in specific regions, FAR is able to perturb a given solution incrementally in a well-controlled way.
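
A toy one-dimensional Python sketch of the fixed-point mechanism follows: the force-equilibrium equations of a quadratic (spring) net model are solved to place the movable cells, and attaching a pseudo cell (a fixed point) to a cell shifts the equilibrium toward a desired location, while relaxing it simply removes that spring. The connection weights, positions, and Gauss-Seidel solver are assumptions for illustration, not the paper's formulation or its global placer.

def solve_equilibrium(cells, nets, iters=200):
    # cells: {name: (x, movable)}; nets: list of (a, b, weight) two-pin springs.
    # Gauss-Seidel relaxation of the force-equilibrium equations.
    pos = {n: x for n, (x, _) in cells.items()}
    movable = [n for n, (_, m) in cells.items() if m]
    for _ in range(iters):
        for n in movable:
            num = den = 0.0
            for a, b, w in nets:
                if n in (a, b):
                    other = b if n == a else a
                    num += w * pos[other]
                    den += w
            if den:
                pos[n] = num / den
    return pos

cells = {"padL": (0.0, False), "padR": (10.0, False),
         "u1": (5.0, True), "u2": (5.0, True)}
nets = [("padL", "u1", 1.0), ("u1", "u2", 1.0), ("u2", "padR", 1.0)]
print(solve_equilibrium(cells, nets))         # cells spread evenly between the pads

cells["fp_u1"] = (2.0, False)                 # fixed point pulling u1 toward x = 2.0
nets.append(("u1", "fp_u1", 4.0))             # relaxing it = dropping this spring
print(solve_equilibrium(cells, nets))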


Session 11: Design Above the Silicon Surface: Package, Board and Beyond

Organizer and Moderator: R.A. Rutenbar (Carnegie Mellon University)
Speakers: B. McCredie (IBM), K. Rinebold (Avant!), D. Wood (Intel)

Session 12: Timing Closure

Chair: M. Pedram (University of Southern California)
Timing Closure Based on Physical Hierarchy (Invited Paper) [p. 170]
J. Cong (University of California, Los Angeles)

In this paper, we shall present the progress and results of the ongoing project at UCLA on synthesis and optimization under physical hierarchy. First, we shall motivate our approach by pointing out the limitations of the existing approach to interconnect planning based on early RTL floorplanning following the logic hierarchy. Then, we shall discuss the technical challenges of synthesis under physical hierarchy, including handling the high computational complexity of the flattened logic hierarchy, the need for retiming and pipelining over global interconnects, and the extension of existing synthesis operations. Finally, we shall outline our approaches to overcoming these technical challenges.
Categories and Subject Descriptors
B.7.2 [Hardware]: INTEGRATED CIRCUITS - Design Aids.
General Terms
Algorithms, Performance, Design, Experimentation.
Keywords
Physical hierarchy, logic hierarchy, timing closure, interconnect planning, interconnect optimization, multilevel optimization, retiming and pipelining, sequential arrival time.


Session 13: Routing

Chair: I. Markov (University of Michigan)
Timing-Driven Routing for FPGAs Based on Lagrangian Relaxation [p. 176]
S. Lee, D.F. Wong (University of Texas at Austin)

As interconnection delay plays an important role in determining circuit performance in FPGAs, timing-driven FPGA routing has received much attention recently. In this paper, we present a new timing-driven routing algorithm for FPGAs. The algorithm finds a routing with minimum critical path delay for a given placed circuit using the Lagrangian relaxation technique. Lagrangian multipliers used to relax the timing constraints are updated by the subgradient method over the iterations. Incorporated into the cost function, these multipliers guide the router in constructing the routing tree for each net. During routing, the exclusivity constraints on each routing resource are also handled so that circuits are routed successfully. Experimental results on benchmark circuits show that our approach outperforms the state-of-the-art VPR router.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Placement and routing; J.6 [Computer Applications]: Computer-Aided Engineering - Computer-aided design (CAD)
General Terms
Algorithms, Experimentation
Keywords
FPGA, timing-driven routing, Lagrangian relaxation.
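
The multiplier update at the heart of such a scheme can be sketched in a few lines of Python: each sink timing constraint delay(sink) <= T carries a nonnegative multiplier that is pushed up by its violation and damped by a diminishing step size, and the multipliers then weight delay inside the routing cost. The cost model, step-size schedule, and data below are assumptions for illustration, not the paper's router or delay model.

def subgradient_step(lams, delays, T, k, step0=0.5):
    # lams, delays: {sink: value}; T: target delay; k: iteration index (1-based).
    step = step0 / k                              # diminishing step size
    return {s: max(0.0, lams[s] + step * (delays[s] - T)) for s in lams}

def edge_cost(base_cost, edge_delay, lam_sum):
    # Routing cost of an edge: base (congestion) cost plus multiplier-weighted delay.
    return base_cost + lam_sum * edge_delay

lams = {"s1": 0.0, "s2": 0.0}
for k in range(1, 6):
    delays = {"s1": 1.2, "s2": 0.8}               # pretend these came from the router
    lams = subgradient_step(lams, delays, T=1.0, k=k)
    print(k, lams, edge_cost(1.0, 0.1, sum(lams.values())))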

sub-SAT: A Formulation for Relaxed Boolean Satisfiability with Applications in Routing [p. 182]
H. Xu, R.A. Rutenbar (Carnegie Mellon University); K. Sakallah (University of Michigan)

Advances in methods for solving Boolean satisfiability (SAT) for large problems have motivated recent attempts to recast physical design problems as Boolean SAT problems. One persistent criticism of these approaches is their inability to supply partial solutions, i.e., to satisfy most but not all of the constraints cast in the SAT style. In this paper we present a formulation for "subset satisfiable" Boolean SAT: we transform a "strict" SAT problem with N constraints into a new, "relaxed" SAT problem which is satisfiable precisely when no more than k << N of the constraints of the original problem must be left unsatisfied. We describe a transformation based on explicit thresholding and counting for the necessary SAT relaxation. Examples from FPGA routing show how we can determine efficiently when we can satisfy "almost all" of our geometric constraints.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - layout
General Terms
Algorithms
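
To make the relaxation transform concrete, the Python sketch below adds a fresh relaxation literal to every clause of a DIMACS-style CNF and then forbids more than k relaxation variables from being true. Here the at-most-k constraint is encoded naively by excluding every (k+1)-subset, which is practical only for tiny k and clause counts; the paper instead builds an explicit thresholding/counting construction that scales.

from itertools import combinations

def relax_cnf(clauses, num_vars, k):
    # clauses: list of lists of ints (DIMACS-style literals).
    # Returns the relaxed clause list and the new variable count.
    relax_vars = list(range(num_vars + 1, num_vars + 1 + len(clauses)))
    relaxed = [cl + [r] for cl, r in zip(clauses, relax_vars)]
    # at-most-k: in every subset of k+1 relaxation variables, at least one is false
    for subset in combinations(relax_vars, k + 1):
        relaxed.append([-r for r in subset])
    return relaxed, num_vars + len(relax_vars)

# tiny example: contradictory unit clauses on x1; with k = 1 the relaxed CNF is satisfiable
cnf = [[1], [-1], [2]]
relaxed, n = relax_cnf(cnf, num_vars=2, k=1)
print(n, relaxed)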


Session 14: Topics in Physical Design

Chair: L. Scheffer (Cadence)
Temporal Logic Replication for Dynamically Reconfigurable FPGA Partitioning [p. 190]
W.-K. Mak (University of South Florida), E.F.Y. Young (The Chinese University of Hong Kong)

In this paper, we propose the idea of temporal logic replication in dynamically reconfigurable field-programmable gate array partitioning to reduce communication cost. Temporal logic replication has never been explored before. We define the min-area min-cut replication problem given a k-stage temporal partition satisfying all temporal constraints and devise an optimal algorithm to solve this problem. We have also devised a flow-based replication heuristic for the case in which a tight area bound limits the amount of replication. In addition, we present a correct network flow model for partitioning sequential circuits temporally.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]: Computer-Aided Engineering - Computer-aided design
General Terms
Algorithms, Design, Performance

Twin Binary Sequences: A Non-Redundant Representation for General Non-slicing Floorplan [p. 196]
E.F.Y. Young (The Chinese University of Hong Kong); C.C.N. Chu, Z.C. Shen (Iowa State University)

The efficiency and effectiveness of many floorplanning methods depend very much on the representation of the geometrical relationship between the modules. A good representation can shorten the searching process so that more accurate estimations of area and interconnect costs can be performed. The non-slicing floorplan is the most general kind of floorplan in common use. Unfortunately, there is not yet any complete and non-redundant topological representation for non-slicing structures. In this paper, we propose the first representation of this kind. Like some previous work [9], we also make use of mosaic floorplans as an intermediate step. However, instead of including a more than sufficient number of extra dummy blocks in the set of modules, our representation allows us to insert an exact number of irreducible empty rooms into a mosaic floorplan in such a way that every non-slicing floorplan can be obtained by this method uniquely from one and only one mosaic floorplan. The size of the solution space is only O(n! 2^(3n) / n^1.5), yet every non-slicing floorplan can be generated uniquely and efficiently in linear time without any redundant representation.
Categories and Subject Descriptors
B.7.2 [Integrated Circuits]: Design Aids - Layout; J.6 [Computer Applications]: Computer-Aided Engineering - Computer-aided design
General Terms
Algorithms, Design, Performance

Geometrically Parameterized Interconnect Performance Models for Interconnect Synthesis [p. 202]
L. Daniel (University of California, Berkeley); C.S. Ong, S.C. Low, K.H. Lee (National University of Singapore); J. White (Massachusetts Institute of Technology)

In this paper we describe an approach for generating geometrically parameterized integrated-circuit interconnect models that are efficient enough for use in interconnect synthesis. The model generation approach presented is automatic, and is based on a multiparameter model-reduction algorithm. The effectiveness of the technique is tested using a multi-line bus example, where both wire spacing and wire width are considered as geometric parameters. Experimental results demonstrate that the generated models accurately predict both delay and cross-talk effects over a wide range of spacing and width variation.