# A Timing Dependent Power Estimation Framework Considering Coupling

Debjit Sinha<sup>\*</sup>, DiaaEldin Khalil, Yehea Ismail and Hai Zhou Electrical Engineering and Computer Science Northwestern University, Evanston, IL 60208, USA {debjit, dsk839, ismail, haizhou}@ece.northwestern.edu

# Abstract

In this paper, we propose a timing dependent dynamic power estimation framework that considers the impact of coupling and glitches. We show that relative switching activities and times of coupled nets significantly affect dynamic power consumption, and neither should be ignored during power estimation. To capture the timing dependence, an approach to efficient representation and propagation of switching-window distributions through a circuit, considering coupling induced delay variations, is developed. Based on the propagated switching-window distributions, power consumption in charging or discharging coupling capacitances is calculated, and accounted for in the total power. Experimental results for the ISCAS'85 benchmarks demonstrate that ignoring the impact of timing dependent coupling on power can cause up to 59% error in coupling power estimation (up to 25% error in total power estimation).

# 1 Introduction

Accurate power estimation is a critical problem in modern IC design. Power dissipation is expected to be a limiting factor for future technologies. Currently, more than 60% of a circuit's power consumption is attributed to charging (or discharging) interconnect capacitances [1-6]. This is due to relatively decreasing gate load capacitances in comparison to increasing parasitic interconnect capacitances. With the progress of deep sub-micron technologies, shrinking geometries have led to a reduction in the self capacitance of wires. However, coupling capacitances have increased as wires have a larger aspect ratio and are brought closer together. For 90nm technologies, the ratio of an interconnect's parasitic coupling capacitance to its parasitic ground capacitance is nearly 5.5 (85% of the total parasitic capacitance) [7–9]. This signifies the increased dominance of coupling capacitances with technology scaling. It is therefore evident that the component of power dissipation corresponding to parasitic coupling capacitances is significant.

Power consumption estimation for coupling capacitances is more complicated than for ground capacitances. In the latter case, parasitic ground capacitances on the fanout net of a gate are added in parallel to the gate's load capacitance.

Copyright 2006 ACM 1-59593-389-1/06/0011 ...\$5.00.



Figure 1: Effect of timing on coupling power

The ground capacitances are charged and discharged depending on the net's signal transitions in exactly the same way as the load capacitance. They simply introduce additional power consumption components similar to that for the load capacitance. The power dissipation per clock cycle is thus dependent only on the net's switching activity [10]. However, the power consumption for a parasitic coupling capacitance (termed coupling power) between two interconnects is dependent on the voltage difference across that capacitance. This, in turn, is dependent on the relative switching activities of these interconnects. The voltage across a coupling capacitance can vary in the range $[-V_{dd}, V_{dd}]$, while the range of voltage variation for a parasitic ground capacitance is $[0, V_{dd}]$. The worst case voltage variation across a coupling capacitor is therefore $2V_{dd}$, in contrast to $V_{dd}$ for a parasitic ground capacitor. When two coupled wires **a** and **b** with coupling capacitance $C_c$ simultaneously switch in the same direction, there is no charging or discharging of $C_c$, and no coupling power is consumed ($P_1 = 0$, Figure 1). When only one of the wires switches, $C_c$ charges or discharges with a voltage variation of $V_{dd}$, and its coupling power consumption is given by $P_2 = 0.5C_c V_{dd}^2$. In the case when the wires simultaneously switch in opposite directions, $C_c$ may charge or discharge with a voltage variation of $2V_{dd}$, consuming $P_3 = 0.5C_c(2V_{dd})^2 = 2C_c V_{dd}^2$ units of power.

In addition to the dependence of coupling power on the relative switching activities of the coupled interconnects, the power consumed is dependent on the nets' relative switching times [10]. The coupling power $P_4$ (in Figure 1) is dependent on some function $\psi(d)$ of the difference d in their switching times. As d increases, the case of simultaneous switchings on the interconnects changes to two independent cases of only a single interconnect switching. For switchings in the same direction, the coupling power can therefore vary from 0 to $C_c V_{dd}^2$, depending on the relative delay between the switchings. If the wires switch in opposite directions, the coupling power can vary from $C_c V_{dd}^2$ to $2C_c V_{dd}^2$. Relative delays, timing information, and switching activities are therefore critical to accurate coupling power estimation.
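The bounding cases discussed above reduce to simple arithmetic. The following is a minimal sketch for illustration only: the helper name `coupling_energy`, the 1 fF capacitance, and the 1 V supply are assumptions, not values from our experiments.

```python
def coupling_energy(c_c, v_dd, swing_factor):
    """Energy for a change of swing_factor * V_dd in the voltage across C_c."""
    return 0.5 * c_c * (swing_factor * v_dd) ** 2

C_C = 1e-15   # hypothetical coupling capacitance (1 fF)
V_DD = 1.0    # hypothetical supply voltage (1 V)

p1 = coupling_energy(C_C, V_DD, 0)  # simultaneous, same direction: no change across C_c
p2 = coupling_energy(C_C, V_DD, 1)  # only one wire switches
p3 = coupling_energy(C_C, V_DD, 2)  # simultaneous, opposite directions

# With a large relative delay, either case degenerates into two
# independent single-wire switchings.
large_delay = 2 * p2

print("same direction range:    ", (p1, large_delay))
print("opposite direction range:", (large_delay, p3))
```

The two printed ranges correspond to the intervals $[0, C_c V_{dd}^2]$ and $[C_c V_{dd}^2, 2C_c V_{dd}^2]$ stated in the text.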

Furthermore, the dependence on relative switching activities translates to a dependence on the functional information of the circuit. For example, the outputs of an AND gate and an OR gate have different switching probabilities, even for identical input switching probabilities. This implies that coupling power is dependent on a circuit's functionality (logic implementation).

It is commonly accepted that circuit simulation based approaches to power estimation [11] are strongly input pattern dependent and too slow for large circuits. Probabilistic approaches to estimate power were first proposed in [12]. A zero-delay model was considered, which ignored the impact of glitches. Probabilistic simulation approaches [13–15] that did not assume a zero-delay model were later proposed to handle the impact of glitches. To consider temporal correlations, such approaches require the user to specify typical signal behavior at the circuit inputs, often using a sequence of values indicating the probability of switchings at specific time points. The propagation of these signals is similar to event driven logic simulation. An event driven energy-consumption estimation model is proposed in [16]. Power estimation approaches that employ BDDs [17] are too slow for modern complex circuits. Statistical methods for power estimation [18] simulate the circuit repeatedly using some simulator (e.g. HSPICE, Powermill) while monitoring the power consumption. Prior work in power estimation does not couple the problem of power estimation to detailed timing analysis, and the effects of relative switching delays are ignored. In most power estimation tools, the coupling power is either ignored or accounted for as a component from coupling capacitances that are assumed to be grounded. In addition, it is accepted that considering all spatial correlations is computationally very expensive.

<sup>\*</sup>Dr. Sinha is currently with IBM Microelectronics, USA.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

ICCAD'06, November 5–9, 2006, San Jose, California, USA.

In this paper, we propose a timing dependent dynamic power<sup>1</sup> estimation framework for combinational circuits that considers the impact of coupling and glitches. Our contributions are summarized as follows.

- 1. We highlight the timing dependence of coupling power and show that dynamic power estimation should not be decoupled from detailed timing analysis. We present a timing dependent framework for dynamic power estimation that considers the impact of coupling and glitches.
- 2. A representation to approximate the switching probability distribution on any wire (denoted as switching-window distribution) is developed. We next propose an efficient single-pass approach to propagating switching-window distributions through a given circuit while considering the impact of coupling induced delay variations.
- 3. Finally, we present an approach to coupling and total power estimation based on the obtained switching-window distributions. We also describe how the proposed framework is amenable to incorporating the impact of glitches on power.

We develop the proposed framework and compare results obtained when (i) coupling power is ignored; (ii) coupling capacitances are assumed to be grounded during power estimation; and (iii) the timing dependence of coupling power is considered (our approach). Experimental results for the ISCAS'85 benchmarks demonstrate accuracy improvements of up to 59% in coupling power estimation (up to 25% in total power estimation). Since ignoring the timing dependence of coupling power is found to result in both underestimation and overestimation of power, using a *guard-band* during power estimation while ignoring timing is meaningless. We therefore conclude that it is critical to consider the timing dependence of coupling power in total power estimation. The rest of this paper is organized as follows. We present a motivational example in Section 2 to illustrate the significance of the timing dependence of power consumption in coupling capacitances. The proposed approach to timing dependent power-consumption estimation is described in Section 3. Experimental results are reported in Section 4, and we draw conclusions in Section 5.

# 2 A motivational example

We performed HSPICE simulations for two coupled nets with typical local interconnect dimensions [9], and having driving and loading gates implemented in 90nm technology. The energy consumption per switching in the circuit was evaluated for the cases of single net switching (SNS), simultaneous similar switching (SSS), simultaneous opposite switching (SOS), similar switching with large relative delay (SSWD), and opposite switching with large relative delay (OSWD). Results obtained are presented in Table 1. For each of the above cases, we present the energy consumption when (i) the coupling capacitance was ignored (No  $C_c$ ); (ii) the coupling capacitance was considered as ground capacitances on the nets (Grounded  $C_c$ ); and (iii) the coupling capacitance was considered between the coupled nets (Exact  $C_c$ ).

From Table 1, it is observed that ignoring coupling leads to a large underestimation of power in most cases (up to 53% in the case of simultaneous opposite switching). In addition, considering coupling as capacitance to ground also leads to large errors, ranging from underestimating power by 26% (for simultaneous opposite switching) to overestimating power by 56% (for simultaneous similar switching). Furthermore, we observe that relative delays between switchings cause significant differences in power consumption. It is thus evident that coupling power should be estimated accurately, considering both timing and relative switching activities.

Table 1: Simulated energy consumption (nJ) per switching

| Switching case | No $C_c$ | Grounded $C_c$ | Exact $C_c$ |
|----------------|----------|----------------|-------------|
| SNS            | 1.442    | 2.249          | 2.249       |
| SSS            | 2.884    | 4.498          | 2.884       |
| SOS            | 2.888    | 4.498          | 6.090       |
| SSWD           | 2.884    | 4.498          | 4.498       |
| OSWD           | 2.888    | 4.498          | 4.498       |
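The error percentages quoted for this example follow directly from Table 1. The sketch below recomputes them; the dictionary layout and the helper name `percent_error` are illustrative assumptions, while the numbers are copied from the table.

```python
# Energy (nJ) per switching, copied from Table 1.
TABLE1 = {
    "SNS":  {"none": 1.442, "grounded": 2.249, "exact": 2.249},
    "SSS":  {"none": 2.884, "grounded": 4.498, "exact": 2.884},
    "SOS":  {"none": 2.888, "grounded": 4.498, "exact": 6.090},
    "SSWD": {"none": 2.884, "grounded": 4.498, "exact": 4.498},
    "OSWD": {"none": 2.888, "grounded": 4.498, "exact": 4.498},
}

def percent_error(estimate, exact):
    """Signed error of an estimate relative to the exact value, in percent."""
    return 100.0 * (estimate - exact) / exact

for case, row in TABLE1.items():
    err_none = percent_error(row["none"], row["exact"])
    err_gnd = percent_error(row["grounded"], row["exact"])
    print(f"{case:5s} no Cc: {err_none:+6.1f}%   grounded Cc: {err_gnd:+6.1f}%")
```

Negative values indicate underestimation and positive values indicate overestimation relative to the exact treatment of $C_c$.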

# 3 Timing dependent power estimation

In this section, we describe our approach to dynamic power estimation. It is established that coupling power is dependent on the relative switching times of coupled interconnects. However, accounting for all possible switchings in large circuits is impractical and necessitates assumptions of some switching probability distribution on each interconnect. We first describe our approach to efficiently approximating the distribution of a switching on any wire, which we denote as a switching-window distribution. We next present an approach to window propagation while considering coupling induced delay variations. Finally, we describe our power estimation technique based on the obtained switching-window distributions.

## 3.1 Switching-window distribution representation

Considering complete functional information during timing analysis of a given circuit involves enumeration of all possible input vectors. This approach is exponential in complexity and therefore computationally prohibitive. On the other hand, ignoring functional information introduces uncertainties that make the timing information on each wire of the circuit a set of possible signal switchings. Each set is represented as a switching-window and a set of slew rates such that any signal switching falling in the window and having a slew in the range is in the set. Symbolically, we denote the switching-windows for the *rise* and *fall* transitions on any wire $\mathbf{x}$ as $x^r$ and $x^f$, respectively.

<sup>1</sup>In this paper, we denote the switching power consumption as the dynamic power, and do not explicitly consider short-circuit power. Note that this is not a limitation; traditional approaches to short-circuit power estimation can still be employed.

Each switching-window x ($x^r$ or $x^f$) on a wire is often defined as an interval $x \triangleq [l_x, h_x]$, such that the time t of any possible signal switching on the wire lies in this interval. An assumption of a uniform probability density for a switching in this interval is unrealistic. We formally denote the probability density function (PDF) of a switching on a given window x as $\phi_x(t)$. The following is immediate.

$$\phi_x(t) = 0, \quad \forall \ t < l_x \lor t > h_x \tag{1}$$

Efficient representation and propagation of an arbitrary window distribution  $\phi_x(t)$  is challenging and can be computationally very expensive. We propose to represent a given switching-window x using a set of M sub-switching-windows  $x_i$  (i = 1, 2, ..., M), each having a constant probability density  $\phi_{x_i}$  in their respective intervals  $[l_{x_i}, h_{x_i}]$ . Such an approach captures the PDF of x more accurately than the assumption of a uniform density in the interval  $[l_x, h_x]$  of x.

We now present one approach to representing a switching-window x with a given PDF $\phi_x(t)$ as a set of M uniform density sub-switching-windows. For simplicity, we initially segment the interval $[l_x, h_x]$ into M equal length segments. The interval of a sub-switching-window $x_i \triangleq [l_{x_i}, h_{x_i}]$ is therefore given by the following.

$$l_{x_i} = l_x + \frac{(i-1)(h_x - l_x)}{M} \tag{2}$$

$$h_{x_i} = l_x + \frac{i(h_x - l_x)}{M} \tag{3}$$

To evaluate the constant probability density  $\phi_{x_i}$  for any sub-switching-window  $x_i$ , we match the probability of switchings in  $x_i$ 's interval to that of x in the same interval. Formally,  $\phi_{x_i}$  is computed as the following.

$$\phi_{x_i} = \frac{1}{(h_{x_i} - l_{x_i})} \int_{l_{x_i}}^{h_{x_i}} \phi_x(t)\, dt \tag{4}$$

It is immediate that our approach preserves the probability of switching in a window x. This procedure is employed separately to represent the rise and fall switching-windows on a given wire, each as M sub-switching-windows. As a result, the probabilities that the signal on a wire switches from low-to-high (rise transition) and from high-to-low (fall transition) in a single clock cycle are given by the following.

$$Pr_{x}[rise] = \sum_{i=1}^{M} (h_{x_{i}^{r}} - l_{x_{i}^{r}})\phi_{x_{i}^{r}} \tag{5}$$

$$Pr_{x}[fall] = \sum_{i=1}^{M} (h_{x_{i}^{f}} - l_{x_{i}^{f}})\phi_{x_{i}^{f}} \tag{6}$$
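The representation in (2)–(6) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the tuple layout `(l_i, h_i, phi_i)`, the midpoint-rule integration, and the triangular example PDF are all hypothetical choices, not part of the framework itself.

```python
def discretize(pdf, lo, hi, m, steps=1000):
    """Approximate pdf on [lo, hi] by m equal-width uniform sub-windows.

    Returns a list of (l_i, h_i, phi_i) tuples.
    """
    width = (hi - lo) / m
    subs = []
    for i in range(1, m + 1):
        l_i = lo + (i - 1) * width   # equation (2)
        h_i = lo + i * width         # equation (3)
        # Midpoint-rule estimate of the probability inside [l_i, h_i].
        dt = (h_i - l_i) / steps
        prob = sum(pdf(l_i + (k + 0.5) * dt) for k in range(steps)) * dt
        subs.append((l_i, h_i, prob / (h_i - l_i)))   # equation (4)
    return subs

def switching_probability(subs):
    """Total probability of a transition, as in equations (5) and (6)."""
    return sum((h - l) * phi for l, h, phi in subs)

def triangular_rise_pdf(t):
    """Hypothetical rise PDF on [0, 2] with total probability 0.25."""
    return 0.25 * max(0.0, 1.0 - abs(t - 1.0))

rise_subs = discretize(triangular_rise_pdf, 0.0, 2.0, m=4)
for l, h, phi in rise_subs:
    print(f"[{l:.2f}, {h:.2f}]  density = {phi:.4f}")
print("Pr[rise] =", switching_probability(rise_subs))
```

Because each density is obtained by matching the probability in its own interval, the sum in `switching_probability` recovers the total probability of the original PDF regardless of M.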

## 3.2 Switching-window distribution propagation

We next describe our approach to propagating the set of sub-switching-windows through logic blocks in a circuit. We discuss the window propagation for a two-input AND gate for illustration. The extension to other common logic gates, including those with more than two inputs, is similar.



Figure 2: A two-input AND gate

We consider a logic AND gate with inputs **a** and **b**, and output **c**, as shown in Figure 2. The switching-window on input **a** for the rise (fall) transition is denoted by M uniform density sub-switching-windows $a_i^r$ ($a_i^f$), $i = 1, 2, \ldots, M$. The interval of any sub-switching-window $a_i^r$ is denoted by $[l_{a_i^r}, h_{a_i^r}]$, and the density within this interval is denoted by a constant $\phi_{a_i^r}$. The probabilities of the possible transitions on input **a**, namely low (steady at logic 0), rise, fall, and high (steady at logic 1) for a clock cycle are denoted by $Pr_a[l]$, $Pr_a[r]$, $Pr_a[f]$, and $Pr_a[h]$, respectively. We denote the delay of the timing arc from input $\mathbf{a}$ to output $\mathbf{c}$ for a rise transition as $d_a^r$. A similar representation is used for the input **b**. Given this information, our goal is to approximate the rise and fall switching-windows at $\mathbf{c}$, each with M sub-switching-windows, for further propagation. Note that for M = 1, our approach falls back to the traditional single switching-window approach with $Pr_a[r]$ as the traditional switching activity on **a**.

If we ignore logical correlation between the inputs  $\mathbf{a}$  and  $\mathbf{b}$ , the probability that  $\mathbf{c}$  has a fall transition is given by:

$$Pr_c[f] = Pr_a[h]Pr_b[f] + Pr_a[f]Pr_b[f] + Pr_a[f]Pr_b[h].$$

The PDF $\phi_{c^f}(t)$ of the fall transition switching-window $c^f$ at any time t is thus given by the following.

$$\begin{split} \phi_{c^f}(t) = {} & Pr_{a}[h]\,\phi_{b^f}(t-d_{b}^{f}) + \phi_{a^f}(t-d_{a}^{f}) \int_{-\infty}^{t} \phi_{b^f}(x-d_{b}^{f})\,dx \\ & + \phi_{b^f}(t-d_{b}^{f}) \int_{-\infty}^{t} \phi_{a^f}(x-d_{a}^{f})\,dx + Pr_{b}[h]\,\phi_{a^f}(t-d_{a}^{f}) \end{split} \tag{7}$$

Given that the input switching-windows are represented as a set of sub-switching-windows, $\phi_{c^f}(t)$ is expressed as:

$$\begin{split} \phi_{c^f}(t) = {} & Pr_{a}[h] \sum_{i=1}^{M} \phi_{\mathbf{b}_{\mathbf{i}}^{\mathbf{f}}}(t - d_{b}^{f}) + \sum_{i=1}^{M} \sum_{j=1}^{M} \phi_{\mathbf{a}_{\mathbf{i}}^{\mathbf{f}}}(t - d_{a}^{f}) \int_{-\infty}^{t} \phi_{\mathbf{b}_{\mathbf{j}}^{\mathbf{f}}}(x - d_{b}^{f})\, dx \\ & + \sum_{i=1}^{M} \sum_{j=1}^{M} \phi_{\mathbf{b}_{\mathbf{i}}^{\mathbf{f}}}(t - d_{b}^{f}) \int_{-\infty}^{t} \phi_{\mathbf{a}_{\mathbf{j}}^{\mathbf{f}}}(x - d_{a}^{f})\, dx + Pr_{b}[h] \sum_{i=1}^{M} \phi_{\mathbf{a}_{\mathbf{i}}^{\mathbf{f}}}(t - d_{a}^{f}) \end{split} \tag{8}$$

where (expressions for $\phi_{\mathbf{a}_{\mathbf{i}}^{\mathbf{f}}}(t)$, $\phi_{\mathbf{a}_{\mathbf{i}}^{\mathbf{r}}}(t)$, and $\phi_{\mathbf{b}_{\mathbf{i}}^{\mathbf{r}}}(t)$ are similar)

$$\phi_{\mathbf{b}_{\mathbf{i}}^{\mathbf{f}}}(t) \triangleq \begin{cases} 0 & \forall t < l_{b_{i}^{f}} \\ \phi_{b_{i}^{f}} & \forall t \in [l_{b_{i}^{f}}, h_{b_{i}^{f}}] \\ 0 & \forall t > h_{b_{i}^{f}} \end{cases} \tag{9}$$

Each inner integral in (8) denotes the area of that sub-switching-window lying to the left of t, that is, the probability of the sub-switching-window having a transition before time t. The earliest possible switching time for a fall transition at **c** is given by the minimum of $(l_a^f + d_a^f)$ and $(l_b^f + d_b^f)$. Similarly, the latest possible switching time for a fall transition at **c** is given by the maximum of $(h_a^f + d_a^f)$ and $(h_b^f + d_b^f)$. We partition the interval formed by these earliest and latest switching times into M equal segments. The probability of a fall transition within any segment is given by the area under $\phi_{c^f}(t)$ in that segment. The uniform density $\phi_{c_i^f}$ of this segment is computed by dividing the obtained probability by the segment width (similar to (4)). In practice, this computation is done faster by evaluating the probability of switching in any segment as the difference of the cumulative distribution function $\Phi_{c^f}(t)$ of a fall transition on **c**, evaluated at the segment's upper and lower bounds, respectively. $\Phi_{c^f}(t)$ can be evaluated analytically without numerical integration. This procedure is employed to compute $\phi_{c_i^f}$ for $i = 1, 2, \dots, M$.

Figure 3: Fall switching-window propagation example

Figure 3 illustrates the proposed approach with an example. Here, M = 2, and the switching probabilities and times are as shown in the figure. $\phi_{c^f}(t)$ is obtained as a sum of three PDFs: the third corresponds to the sum of the two middle terms in (8), while the other two correspond to the first and last terms in (8). The PDF obtained using our proposed approximation approach is shown using a dashed line in the figure. It is observed that the probability of a switching in a sub-interval is preserved. Increasing the number of sub-intervals (M) improves the accuracy of our approach.

The procedure for computing the sub-switching-windows for a rise transition is similar. The proposed approach is extensible to other logic blocks, and is not limited to two-input gates. The sub-switching-window computation for a *NAND* gate is equivalent to the computation for an *AND* gate, except that, in this case, the computed rise and fall windows should be swapped. Non-inverting buffers and wires time-shift the sub-switching-windows by their delay values, and do not affect their probability densities.
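The cumulative-distribution shortcut for the fall window of the AND gate can be sketched as follows. This is an illustrative sketch that follows (7) as written: integrating (7) term by term, the two middle terms are the derivative of the product of the two shifted input CDFs. The function names, the tuple layout `(l, h, phi)`, and the example numbers are assumptions.

```python
def cdf(subs, t, delay=0.0):
    """Probability that the transition, shifted by delay, occurs before t."""
    return sum(phi * min(max(t - delay - l, 0.0), h - l) for l, h, phi in subs)

def and_fall_window(a_f, b_f, pr_a_high, pr_b_high, d_a, d_b, m):
    """Approximate the output fall window of a 2-input AND gate with m sub-windows."""
    lo = min(min(l for l, _, _ in a_f) + d_a, min(l for l, _, _ in b_f) + d_b)
    hi = max(max(h for _, h, _ in a_f) + d_a, max(h for _, h, _ in b_f) + d_b)

    def big_phi(t):
        fa, fb = cdf(a_f, t, d_a), cdf(b_f, t, d_b)
        # Integral of (7): the two middle terms integrate to fa * fb.
        return pr_a_high * fb + fa * fb + pr_b_high * fa

    width = (hi - lo) / m
    out = []
    for i in range(m):
        l_i, h_i = lo + i * width, lo + (i + 1) * width
        # Segment probability as a CDF difference, divided by the width.
        out.append((l_i, h_i, (big_phi(h_i) - big_phi(l_i)) / width))
    return out

# Hypothetical inputs: one uniform fall sub-window per input, Pr[f] = 0.25 each.
a_fall = [(0.0, 1.0, 0.25)]
b_fall = [(0.5, 1.5, 0.25)]
c_fall = and_fall_window(a_fall, b_fall, pr_a_high=0.25, pr_b_high=0.25,
                         d_a=1.0, d_b=1.0, m=2)
pr_c_fall = sum((h - l) * phi for l, h, phi in c_fall)
print(c_fall, pr_c_fall)
```

The total probability recovered from the output sub-windows equals $Pr_a[h]Pr_b[f] + Pr_a[f]Pr_b[f] + Pr_a[f]Pr_b[h]$, so the approximation preserves $Pr_c[f]$ for any M.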

## 3.3 Considering coupling induced delay variations

Simultaneous switchings on coupled wires induce delay variations (often termed as delay pushouts) in each of them.

The relative switching times and direction (rise or fall) of the switchings impact these delay variations. For timing estimation, a coupling induced delay pushout model is considered that evaluates the possible delay variation as a function of the overlap between the switching-windows on the coupled wires. A coupling between two wires lying in the fanin and fanout cone of a logic gate, respectively, causes timing dependencies that necessitate iterations during timing analysis. Starting with some assumption of the delay pushout, multiple iterations of timing analysis are performed. Zhou [19] formally proved that iterations starting with an assumption of zero delay pushouts on all wires would converge and yield timing analysis results with minimal pessimism. Simplistic delay pushout models only consider the worst case overlap length between the switching-windows on coupled nets. Such models result in a monotonically increasing sequence [19, 20] of timing information on all wires, and thus guarantee convergence. However, ignoring correlations between subsequent iterations when using accurate delay pushout models that consider the probabilities of switchings may not guarantee convergence. On the other hand, considering all possible correlations is computationally too expensive. Consequently, a majority of timing analysis tools that consider coupling start iterations with an initial assumption of worst case delay pushout for all coupled nets. A finite number of iterations are performed to reduce pessimism using accurate pushout models, but not until convergence is achieved.

In this work, we adopt the former approach for simplicity. Timing analysis of a given circuit is initially performed with two switching-windows on each wire that denote the wire's rise and fall switching-windows. Using an overlap based delay pushout model [21], the converged switching-windows on all wires in the circuit are obtained. We term these switching-windows the *simple switching-windows*.

Next, we propagate our sub-switching-windows through the circuit as explained earlier. To evaluate the delay variation for any sub-switching-window on a coupled wire, we consider a delay pushout model that yields the variation as a function of that sub-switching-window's PDF and the intervals of the simple switching-windows on its coupled neighbors. Each of the sub-switching-windows may now broaden, and may have regions of overlap with other sub-switching-windows. The impact of coupling does not affect the total probability of switching in any wire's window; it may only affect the PDF in some intervals. We therefore scale the uniform density in any sub-switching-window such that the original probability in its interval is retained. We do not impose a restriction that all sub-switching-windows on a wire for a given transition should have mutually exclusive intervals. Since the *simple switching-windows* are pre-determined and do not change, this approach does not require multiple iterations. Our approach to propagating the switching-window distributions is therefore a single pass procedure, and is illustrated diagrammatically in Figure 4.
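The density scaling step can be illustrated with a small sketch. The assumptions here are ours for illustration: the pushout value is taken as a given non-negative number (the pushout model itself is not shown), it is applied only to the late edge of the sub-switching-window, and the function name `apply_pushout` is hypothetical.

```python
def apply_pushout(sub, pushout):
    """Broaden one sub-window (l, h, phi) and rescale its uniform density.

    Assumption: the pushout (>= 0) only delays the late edge h.
    """
    l, h, phi = sub
    prob = (h - l) * phi                  # probability before broadening
    new_h = h + pushout
    return (l, new_h, prob / (new_h - l))  # same probability, wider interval

original = (1.0, 2.0, 0.2)                # hypothetical sub-switching-window
broadened = apply_pushout(original, pushout=0.5)
print(broadened)
```

After broadening, neighboring sub-switching-windows on the same wire may overlap, which is permitted since mutually exclusive intervals are not required.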

Alternatively, we could start with assumptions of worst case delay pushout values, and perform iterations with the sub-switching-window distributions. We choose the former approach as it is faster, and does not involve a heuristic for picking a suitable time to stop further iterations when the pushout model does not guarantee convergence.

Figure 4: Estimation of coupling induced delay pushouts

## 3.4 Timing dependent power estimation

Once switching-window distributions and transition probabilities have been evaluated on the fanout nets of all gates in a circuit, the dynamic power consumption is computed by a summation of the switching power corresponding to each gate in the circuit. For each gate, the average dynamic power $P_d$ consumed per clock cycle is given by:

$$\begin{split} P_{d} &= P_{g} + P_{c} \\ &= 0.5V_{dd}^{2}C_{g}(Pr[r] + Pr[f]) + \sum_{i=1}^{k} P_{c}^{i} \end{split} \tag{10}$$

where,  $P_g$  denotes the power consumption corresponding to the gate's and its fanout net's total ground capacitance  $C_g$ , and Pr[r] (Pr[f]) denotes the probability of a rise (fall) transition at the gate's output.  $P_c$  denotes the power consumption due to coupling on its fanout net, and is given by the sum of coupling power  $P_c^i$  consumed due to each coupled neighbor  $i = 1, 2, \ldots, k$  of the gate's fanout net. For a single coupled neighbor i with coupling capacitance  $C_c$ ,

$$P_c = P_{quiet} + P_{opp} + P_{sim} \tag{11}$$

$$P_{quiet} = 0.5V_{dd}^2 C_c (Pr[r] + Pr[f]) (Pr_i[l] + Pr_i[h]) \tag{12}$$

where,  $P_{quiet}$ ,  $P_{opp}$  and  $P_{sim}$  denote the coupling power when the coupled neighbor is not switching, switching in the opposite direction and switching in the same direction, respectively.  $(Pr_i[l]+Pr_i[h])$  denotes the probability that net *i* does not switch.  $P_{opp}$  and  $P_{sim}$  are **timing dependent**, that is, they depend on the relative switching times on the two nets. Analytically, we express

$$P_{opp} = 0.5 V_{dd}^2 C_c \int_{-\infty}^{\infty} \psi_{opp}(x)\, p_{opp}(x)\, dx \tag{13}$$

where $p_{opp}(x)$ denotes the probability density that the nets switch in opposite directions with a time skew of x, and $\psi_{opp}(x)$ denotes some function that gives an effective power factor as a function of the time skew x. In general, $\psi_{opp}(x)$ depends on the slews of the switching signals and is symmetric in nature. We illustrate a typical linear and an exponential $\psi_{opp}(x)$ model in Figure 5. It is intuitive that when $x \approx 0$, the coupled nets switch simultaneously in opposite directions and $C_c$ charges (or discharges) with $\approx 2V_{dd}$ across itself. Attributing half of the power consumed to each driving gate of the coupled nets is equivalent to having $\psi_{opp}(x) \approx 2$. Similarly, when the nets switch with a large skew ($\geq s$ in Figure 5), the power consumption attributed to each driving gate is exactly the same as in the case of a switching with the coupled net quiet, and hence $\psi_{opp}(x) \approx 1$. In the figure, s is dependent on the slews of the signals on the coupled wires.

Figure 5: A typical linear and exponential model for $\psi_{opp}(x)$

Figure 6: PDF of sub-switching-windows switching with skew x

Figure 7: A typical linear and exponential model for $\psi_{sim}(x)$

For two coupled wires **a** and **b**,  $p_{opp}(x)$  is evaluated as:

$$p_{opp}(x) = \int_{-\infty}^{\infty} \phi_{a^r}(t)\phi_{b^f}(t+x)\,dt + \int_{-\infty}^{\infty} \phi_{b^r}(t)\phi_{a^f}(t+x)\,dt. \tag{14}$$

Given switching-window distributions for the rise and fall transitions on  $\mathbf{a}$  and  $\mathbf{b}$  each as a set of M sub-switching-windows, the computation translates to the following.

$$p_{opp}(x) = \sum_{i=1}^{M} \sum_{j=1}^{M} \int_{-\infty}^{\infty} \phi_{\mathbf{a}_{\mathbf{i}}^{\mathbf{r}}}(t)\, \phi_{\mathbf{b}_{\mathbf{j}}^{\mathbf{f}}}(t + x)\, dt + \sum_{i=1}^{M} \sum_{j=1}^{M} \int_{-\infty}^{\infty} \phi_{\mathbf{b}_{\mathbf{i}}^{\mathbf{r}}}(t)\, \phi_{\mathbf{a}_{\mathbf{j}}^{\mathbf{f}}}(t + x)\, dt \tag{15}$$

We illustrate the shape of a PDF obtained from one inner integral of (15) in Figure 6.

Given an analytical expression for $\psi_{opp}(x)$, the product of $p_{opp}(x)$ and $\psi_{opp}(x)$ is expressed analytically. $P_{opp}$ can now be computed from analytical expressions obtained from the integration of the above product in appropriate intervals. Numerical integration is not required. A similar approach is employed for the computation of $P_{sim}$. Figure 7 presents a typical linear and an exponential model for $\psi_{sim}(x)$.
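The terms of (10)–(15) can be put together in a short sketch. Several things here are assumptions made for illustration: the function names, the tuple layout `(l, h, phi)` for sub-switching-windows, and the specific linear $\psi_{opp}$ shape (2 at zero skew, falling linearly to 1 at $|x| \geq s$). For brevity the sketch integrates (13) with a midpoint rule, whereas the approach described above uses closed-form expressions.

```python
def ground_power(v_dd, c_g, pr_rise, pr_fall):
    """P_g term of equation (10)."""
    return 0.5 * v_dd ** 2 * c_g * (pr_rise + pr_fall)

def quiet_coupling_power(v_dd, c_c, pr_rise, pr_fall, pr_i_low, pr_i_high):
    """Equation (12): the coupled neighbor i does not switch."""
    return 0.5 * v_dd ** 2 * c_c * (pr_rise + pr_fall) * (pr_i_low + pr_i_high)

def overlap(l1, h1, l2, h2):
    """Length of the intersection of two intervals."""
    return max(0.0, min(h1, h2) - max(l1, l2))

def cross_density(subs_a, subs_b, x):
    """One double sum of (15): integral over t of phi_a(t) * phi_b(t + x)."""
    return sum(pa * pb * overlap(la, ha, lb - x, hb - x)
               for la, ha, pa in subs_a for lb, hb, pb in subs_b)

def psi_opp_linear(x, s):
    """Assumed linear model: 2 at zero skew, 1 for |x| >= s."""
    return 1.0 + max(0.0, 1.0 - abs(x) / s)

def opp_power(v_dd, c_c, a_r, a_f, b_r, b_f, s, steps=4000):
    """Equation (13), integrated numerically over the skew x."""
    every = a_r + a_f + b_r + b_f
    span = max(h for _, h, _ in every) - min(l for l, _, _ in every)
    dx = 2.0 * span / steps
    total = 0.0
    for k in range(steps):
        x = -span + (k + 0.5) * dx
        p = cross_density(a_r, b_f, x) + cross_density(b_r, a_f, x)  # (14)
        total += psi_opp_linear(x, s) * p * dx
    return 0.5 * v_dd ** 2 * c_c * total

# Hypothetical example: every window is uniform on [0, 1] with probability 0.25.
w = [(0.0, 1.0, 0.25)]
print(ground_power(1.0, 2.0, 0.25, 0.25))
print(quiet_coupling_power(1.0, 1.0, 0.25, 0.25, 0.25, 0.25))
print(opp_power(1.0, 1.0, w, w, w, w, s=0.5))
```

As s approaches zero the result tends to the value obtained with the neighbor treated as quiet for those events, and as s grows it tends to twice that value, matching the two limits of $\psi_{opp}(x)$ described in the text.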

## 3.5 Power estimation with timing dependent glitches

In this section, we describe how the proposed framework is amenable to incorporating power dissipation in glitches, often termed as *toggle power*. We illustrate our approach with an example of a two-input AND gate as shown in Figure 2. In this case, a glitch may be formed at output **c** only when both input signals switch with some time skew, in the opposite direction. The power consumption due to a glitch is some function  $\varphi(x)$  of the difference x in their switching times, given their slews. Note that  $\varphi(x)$  is not symmetric. If x denotes the time by which the rising signal switches before the falling signal, we have the following.

$$\varphi(x) \geq \varphi(-x) \tag{16}$$

From (15), we can compute the probability density $p_{opp}(x)$ that the inputs switch in opposite directions with time skew x. The toggle power consumption per cycle is estimated as follows.

$$P_{toggle} = 0.5 V_{dd}^2 C_g \int_{-\infty}^{\infty} \varphi(x)\, p_{opp}(x)\, dx \tag{17}$$

The computation of $p_{opp}(x)$ should also consider the difference in the timing arc delays $d_a$ and $d_b$ for the corresponding cases. The proposed approach can be extended to estimate toggle power consumption in other logic gates similarly. Here we assume that glitches do not propagate through multiple logic levels, although theoretically our framework does not restrict that either. However, propagating glitches may cause glitch power estimation to have exponential complexity unless we place an upper bound on the number of logic levels that a glitch can propagate through.

# 4 Experimental results

We next present obtained power consumption results for the ISCAS'85 benchmarks [22], mapped to 90nm technology library parameters. We use a simple delay model [23] and a linear  $\psi(x)$  model (Figure 5) in our experiments. Pr[l], Pr[r], Pr[f], and Pr[h] on all primary inputs are set to 0.25 each.

We denote the total power obtained from our approach (considering timing dependent coupling power) with M sub-switching-windows, for any benchmark b, as $P_M^b$. Simulation based (or Monte Carlo) approaches to accurate power estimation require a very large number of input vectors (exponential in the number of inputs), and are thus prohibitive. HSPICE simulations for small test cases show less than 1% error in $P_{1000}^{b}$. Although the accuracy of power estimation improves for larger M, the run-time increase is not commensurate with the accuracy gain. As an example, for the benchmark C5315, the run-time to obtain $P_{1000}^{C5315}$ is $\approx 28$ hours. We therefore choose $P_{1000}^b$ to be the true power value and use it as the reference for error estimation. The absolute error in the result $P_M^b$ obtained using our approach (with M sub-switching-windows), for a benchmark b, is defined as follows.

$$\text{Absolute error}(b, M) \triangleq \frac{|P_M^b - P_{1000}^b|}{P_{1000}^b} \times 100.0 \tag{18}$$

We plot absolute error values as a function of M for the benchmark C5315 in Figure 8. In the same graph, we also show the ratio of the run-time of our approach to that of an approach where all coupling capacitors are assumed to be grounded. Similar plots are observed for other benchmarks. From these plots, we choose M = 6 as the default for all benchmarks. In this case, our estimation error is  $\leq 1.6\%$ , and the run-time increases by a factor of at most 5. For comparison, we build the following power estimation engines.

- 1. *NC* (No Coupling) denotes an implementation of a power estimation engine that completely ignores the presence of coupling capacitances.
- 2. *FC* (Fixed Coupling) denotes an implementation of a power estimation engine that assumes all coupling capacitances are grounded. Consequently, it ignores the timing dependence of coupling power.
- 3. *TDC* (Timing Dependent Coupling) denotes our approach to power estimation considering timing dependent coupling power, with M = 6.

Figure 8: % Error and run-time ratios with varying M

For any approach A (NC, FC or TDC), we define the error in coupling power estimation ( $\% \Xi_C^A$ ) and total power estimation ( $\% \Xi_T^A$ ) of a circuit as follows.

$$\% \Xi_C^A \stackrel{\Delta}{=} \frac{P_{coupling}^A - P_{coupling}^{TDC}}{P_{coupling}^{TDC}} \times 100.0 \tag{19}$$

$$\% \Xi_T^A \stackrel{\Delta}{=} \frac{P_{total}^A - P_{total}^{TDC}}{P_{total}^{TDC}} \times 100.0 \tag{20}$$
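The signed error metrics in (19) and (20) can be sketched as below. The power values are hypothetical numbers in arbitrary units, and the sketch assumes for simplicity that the non-coupling component of power is identical across the three engines; neither is taken from our experiments.

```python
def percent_error(p_approach, p_reference):
    """Signed percentage error relative to the reference, as in (19) and (20).

    Negative values indicate underestimation, positive values overestimation.
    """
    if p_reference == 0:
        raise ValueError("reference power must be non-zero")
    return (p_approach - p_reference) / p_reference * 100.0


# Hypothetical coupling power per engine (arbitrary units).
p_coupling = {"NC": 0.0, "FC": 1.3, "TDC": 1.0}
# Assumed non-coupling power, taken to be the same for every engine.
p_non_coupling = 1.5

xi_c = {a: percent_error(p, p_coupling["TDC"]) for a, p in p_coupling.items()}
xi_t = {
    a: percent_error(p + p_non_coupling, p_coupling["TDC"] + p_non_coupling)
    for a, p in p_coupling.items()
}

for approach in ("NC", "FC", "TDC"):
    print(f"{approach}: coupling {xi_c[approach]:+.1f}%, "
          f"total {xi_t[approach]:+.1f}%")
```

Because NC reports zero coupling power, its coupling error is always -100% regardless of the reference value, and the TDC error is zero by construction since TDC is the reference.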

We present error values and run-time overheads  $(t^{TDC}/t^{FC})$ for the benchmarks in Table 2. Since NC ignores coupling power, it is immediate that  $\% \Xi_C^{NC} = -100\%$ . Furthermore, NC underestimates total power by 48% on average. We observe that treating coupling capacitances as grounded capacitances (FC) results in both overestimation and underestimation of coupling power, by up to 59% (for benchmark C432) and 21% (for benchmark C1908), respectively. These numbers translate to overestimation by 25% and underestimation by 12%, respectively, of the total power consumption. Taking the arithmetic mean of absolute error values, we observe that the coupling and total power values predicted by FC deviate from the reference (TDC) values by 28% and 13%, respectively. For M = 6, power estimation using our approach (TDC) takes between 0.03 and 1.95 seconds per benchmark.

The power estimation engines are developed in the C programming language. All experiments are performed on a 2.4GHz Pentium machine with 1GB of RAM, running Red Hat Linux 9.0.

Table 2: % Errors in coupling and total power estimation

| Circuit | Nodes | $\% \Xi_C$ (NC) | $\% \Xi_C$ (FC) | $\% \Xi_T$ (NC) | $\% \Xi_T$ (FC) | $t^{TDC}/t^{FC}$ |
|---------|-------|------|-----|------|-----|-----|
| C432    | 198   | -100 | 59  | -42  | 25  | 3.5 |
| C499    | 245   | -100 | 51  | -37  | 19  | 5.0 |
| C880    | 445   | -100 | 37  | -40  | 15  | 4.6 |
| C1355   | 589   | -100 | -19 | -56  | -11 | 3.6 |
| C1908   | 915   | -100 | -21 | -58  | -12 | 4.1 |
| C2670   | 1428  | -100 | 35  | -46  | 16  | 4.6 |
| C3540   | 1721  | -100 | -6  | -53  | -3  | 3.8 |
| C5315   | 2487  | -100 | 28  | -43  | 12  | 4.2 |
| C6288   | 2450  | -100 | 5   | -49  | 3   | 3.0 |
| C7552   | 3721  | -100 | -17 | -57  | -10 | 4.4 |
| Mean (absolute) |  | 100 | 28  | 48   | 13  | 4.1 |
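The Mean row of Table 2 is the arithmetic mean of the absolute per-benchmark values. The short sketch below recomputes it from the tabulated entries as a consistency check; the variable and function names are ours for illustration only, and the rounded table entries are used as inputs.

```python
# Rows of Table 2: (circuit, %Xi_C for FC, %Xi_T for NC, %Xi_T for FC, t_TDC/t_FC).
# %Xi_C for NC is -100 for every circuit and is omitted here.
TABLE_2 = [
    ("C432", 59, -42, 25, 3.5),
    ("C499", 51, -37, 19, 5.0),
    ("C880", 37, -40, 15, 4.6),
    ("C1355", -19, -56, -11, 3.6),
    ("C1908", -21, -58, -12, 4.1),
    ("C2670", 35, -46, 16, 4.6),
    ("C3540", -6, -53, -3, 3.8),
    ("C5315", 28, -43, 12, 4.2),
    ("C6288", 5, -49, 3, 3.0),
    ("C7552", -17, -57, -10, 4.4),
]


def mean_abs(values):
    """Arithmetic mean of the absolute values of a sequence."""
    values = list(values)
    return sum(abs(v) for v in values) / len(values)


mean_xi_c_fc = mean_abs(row[1] for row in TABLE_2)
mean_xi_t_nc = mean_abs(row[2] for row in TABLE_2)
mean_xi_t_fc = mean_abs(row[3] for row in TABLE_2)
mean_runtime_ratio = mean_abs(row[4] for row in TABLE_2)

# Benchmarks with the largest FC overestimation and underestimation
# of coupling power.
worst_over = max(TABLE_2, key=lambda row: row[1])
worst_under = min(TABLE_2, key=lambda row: row[1])

print(f"mean |Xi_C| (FC): {mean_xi_c_fc:.1f}")
print(f"mean |Xi_T| (NC): {mean_xi_t_nc:.1f}")
print(f"mean |Xi_T| (FC): {mean_xi_t_fc:.1f}")
print(f"mean run-time ratio: {mean_runtime_ratio:.2f}")
print(f"largest overestimation: {worst_over[0]}, "
      f"largest underestimation: {worst_under[0]}")
```

The recomputed means agree with the tabulated values of 28, 48, 13 and 4.1 after rounding.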

# 5 Conclusions

In this paper, we present a timing dependent power estimation framework. Efficient approaches to representing and propagating switching-window distributions considering coupling induced delay variations are developed. Based on obtained switching-window distributions, we consider timing dependent coupling power in total power estimation. Experimental results for the ISCAS'85 benchmarks demonstrate that ignoring the timing dependence of coupling power can cause up to 59% error in coupling power estimation (up to 25% error in total power estimation).

The proposed framework is amenable to considering the effects of glitches and crosstalk-noise power [6] as well. In this paper, we ignore the effects of incomplete voltage swings since it has been shown that their effects are negligible for designs with typical activity factors [6]. The proposed approaches to switching-window distribution representation do not impose any restriction on how the switching-window interval must be segmented; alternative approaches can readily be accommodated. In addition, time slots [24] may be used in the estimation of the *simple switching-windows* for pessimism reduction during timing analysis.

We have demonstrated the significance of the timing dependence of coupling power. Since this timing dependence may lead to either underestimation or overestimation of power, using a guard-band during power estimation while ignoring timing is meaningless. This is true even if multiplication factors are employed when treating coupling capacitances as grounded capacitances. The best such factor would be 1 under the reasonable assumption that signals on coupled wires switch in similar and opposite directions with equal probability, on average. Given that our experiments reveal both underestimation and overestimation of power using such an approach (FC), it is evident that other multiplication factors cannot provide bounds on the power consumption either while timing is ignored. We therefore conclude that it is critical to consider the timing dependence of coupling power in total power estimation.

Our assumption of logical independence between signals on any two wires is not completely true in reality, and may result in estimation errors. We will consider handling partial correlations for improved accuracy, and timing dependent power estimation approaches for sequential circuits, in the future.

# Acknowledgments

This research is partly supported by the National Science Foundation under grant CCR-0238484 and by a grant from Intel Inc.

# References

- [1] D. W. Dobberpuhl et al., "A 200MHz, 64bit dual-issue CMOS microprocessor," in *IEEE Journal of Solid State Circuits*, Vol 27(11), 1992, pp. 1555–1564.
- [2] D. Liu and C. Svensson, "Power consumption estimation in CMOS VLSI chips," in *IEEE Journal of Solid State Circuits*, Vol 29, 1994, pp. 663–670.
- [3] R. Mehra, L. M. Guerra, and J. M. Rabaey, "A partitioning scheme for optimizing interconnect power," in *IEEE Journal of Solid State Circuits, Vol 32*, 1997, pp. 433–443.
- [4] E. D. Man and M. Schobinger, "Power dissipation in the clock system of highly pipelined ULSI CMOS circuits," in Proc. International Workshop on Low Power Design, 1994, pp. 133–138.

- [5] J. Pangjun and S. Sapatnekar, "Low power clock distribution using multiple voltages and reduced swings," in *IEEE Transactions on Very Large Scale Integration Systems*, Vol 10, 2002, pp. 309–318.
- [6] P. Gupta and A. B. Kahng, "Quantifying error in dynamic power estimation of CMOS circuits," in Proc. Intl. Symposium on Quality Electronic Design, 2003, pp. 273–278.
- [7] S. C. Wong, G. Y. Lee, and D. J. Ma, "Modeling of interconnect capacitance, delay and crosstalk in VLSI," in *IEEE Transactions on Semiconductor Manufacturing*, Vol 13, 2000, pp. 108–111.
- [8] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in *Proc. Custom Integrated Circuits Conference*, 2000, pp. 201–204.
- [9] Nanoscale Integration and Modeling Group at ASU, Predictive technology models. http://www.eas.asu.edu/~ptm/.
- [10] M. Ghoneima and Y. Ismail, "Effect of relative delay on the dissipated energy in coupled interconnects," in *Proc. Intl. Symposium on Circuits and Systems*, 2004, pp. 525–528.
- [11] S. M. Kang, "Accurate simulation of power dissipation in VLSI circuits," in *IEEE Journal of Solid State Circuits*, Vol 21(5), 1986, pp. 889–891.
- [12] M. A. Cirit, "Estimating dynamic power consumption of CMOS circuits," in Proc. Intl. Conf. on Computer-Aided Design, 1987, pp. 534–537.
- [13] F. Najm, R. Burch, P. Yang, and I. Hajj, "Probabilistic simulation for reliability analysis of CMOS VLSI circuits," in *IEEE Transactions on Computer Aided Design*, 1990, pp. 439–450.
- [14] G. I. Stamoulis and I. N. Hajj, "Improved techniques for probabilistic simulation including signal correlation effects," in *Proc. of the Design Automation Conf.*, 1993, pp. 379–383.
- [15] C. Y. Tsui, M. Pedram, and A. M. Despain, "Efficient estimation of dynamic power consumption under a real delay model," in *Proc. Intl. Conf. on Computer-Aided Design*, 1993, pp. 224–228.
- [16] T. Uchino and J. Cong, "An interconnect energy model considering coupling effects," in *Proc. of the Design Automation Conf.*, 2001, pp. 555–558.
- [17] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in *Proc. of the Design Automation Conf.*, 1992, pp. 253–259.
- [18] M. Xakellis and F. Najm, "Statistical estimation of the switching activity in digital circuits," in *Proc. of the Design Automation Conf.*, 1994, pp. 728–733.
- [19] H. Zhou, "Timing analysis with crosstalk is a fixpoint on a complete lattice," in *IEEE Transactions on Computer-Aided Design*, September 2003, pp. 1261–1269.
- [20] P. Chen, Y. Kukimoto, C. C. Teng, and K. Keutzer, "On convergence of switching window computation in presence of crosstalk noise," in *International Symposium on Physical Design*, 2002, pp. 84–89.
- [21] S. S. Sapatnekar, "A timing model incorporating the effect of crosstalk on delay and its application to optimal channel routing," *IEEE Transactions on Computer Aided Design*, 2000.
- [22] F. Brglez and H. Fujiwara, "A neutral netlist of 10 combinatorial benchmark circuits," in Proc. Intl. Symposium on Circuits and Systems, 1985, pp. 695–698.
- [23] A. Agarwal, K. Chopra, and D. Blaauw, "Statistical timing based optimization using gate sizing," in *Proc. DATE: Design Automation and Test in Europe*, 2005, pp. 400–405.
- [24] P. Chen, Y. Kukimoto, and K. Keutzer, "Refining switching window by time slots for crosstalk noise calculation," in *Proc. Intl. Conf. on Computer-Aided Design*, 2002, pp. 583–586.