### Hierarchical Electromigration Reliability Diagnosis for VLSI Interconnects \*

Chin-Chi Teng, Yi-Kan Cheng, Elyse Rosenbaum, and Sung-Mo Kang Coordinated Science Lab. and Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign, Urbana, IL 61801.

Abstract: In this paper, we present a hierarchical reliability-driven CAD system for the design of *electromigration*-resistant circuits. The top of the hierarchy aims at quickly identifying those critical interconnects with potential electromigration reliability problems. Then the detailed electromigration analysis of critical interconnects is carried out by an accurate and computationally efficient simulation tool (iTEM). This top-down approach provides a feasible solution to the complicated electromigration diagnosis problem.

#### 1. INTRODUCTION

*Electromigration* (EM) is the phenomenon of metal ion mass transport along the grain boundaries when a metallic interconnect is stressed at high current density. In recent years, the line width of thin-film interconnects has shrunk into the submicron regime. This gives rise to serious concerns about electromigration-induced failures. Electromigration-induced voids can grow and lead to a resistance increase or even a catastrophic open circuit in an interconnect. Electromigration-induced hillocks can cause both intra-level and inter-level metal shorting. The time and the location of a void-open or extrusion-short are statistical in nature, depending on the spatial distribution of current density and temperature.

Previous efforts on building EM-induced failure estimators mostly focused on developing EM failure models and embedding them into general-purpose circuit simulators, such as SPICE, for predicting the EM effect over time [1, 2]. This approach is computationally expensive because of the SPICE run time. Also, the diagnosis results are strongly input-pattern dependent and not sufficient to determine whether the target interconnect system is reliable. An alternative is to use probabilistic simulation [3] to get the average current stress in the interconnects. This approach is fast and the results are input-pattern independent. However, in the design-for-reliability paradigm, worst-case reliability analysis is more important than the average case for guaranteeing the long-term reliability of VLSI chips. Moreover, information regarding the reliability impact of individual input vectors, which can be useful in locating the unreliable design, is lost.

Figure 1: A hierarchical environment for interconnect electromigration reliability diagnosis.

Computing a worst-case EM reliability indicator such as the mean time-to-failure (MTF) for a single interconnect in a digital circuit is an NP-complete problem [4]. The problem is even harder when all interconnects are considered, since a whole circuit contains so many of them. In this paper, we propose a hierarchical EM reliability diagnosis method, which provides a feasible solution to this complicated problem. Our EM diagnosis method uses two levels of diagnosis (Figure 1). The top of the hierarchy is an input-pattern independent EM diagnosis procedure. It can quickly identify those critical interconnects with potential electromigration reliability problems and the corresponding input patterns which cause worst-case current stress in each critical interconnect; thus, the problem size of the worst-case EM diagnosis is significantly reduced and becomes tractable. Thereafter, designers can focus on the critical interconnects and feed those critical input patterns into an electromigration-reliability simulation tool, iTEM, to compute the accurate EM-induced failure of the interconnect systems. One important feature of iTEM is that, in addition to the current density and geometry, it takes into account the steady-state temperature of every interconnect under the given operating conditions. In a state-of-the-art chip, the temperature of the interconnect may rise tens of degrees above the ambient due to joule heating and heat conduction from the substrate. Neglecting the temperature effect on EM-induced failure can lead to intolerable prediction errors. Our EM diagnosis method combines the characteristics of the input-pattern dependent and independent reliability diagnoses. This top-down approach not only handles large circuit layouts containing tens of thousands of transistors and interconnects even on a desktop computer, but it also gives an accurate worst-case EM reliability estimation.

33rd Design Automation Conference ®

<sup>\*</sup>This research was supported by JSEP (N00014-96-1-0129), Rome Laboratory (F30602-94-1-0006), and Semiconductor Research Corp. (SRC95DP109).

Permission to make digital/hard copy of all or part of this work for personal or class-room use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 96 - 06/96 Las Vegas, NV, USA ©1996 ACM, Inc. 0-89791-833-9/96/0006.\$3.50

Figure 2: A latch-controlled synchronous digital circuit.

The paper is organized as follows: First, we introduce the employed EM failure model in Section 2. Section 3 and Section 4 show our input-pattern independent and dependent diagnoses, respectively. The conclusions are drawn in Section 5.

#### 2. ELECTROMIGRATION-INDUCED FAILURE MODEL

EM-induced mean time-to-failure under DC current stress has been well established by Black's equation [5]:

$$MTF = A \cdot J^{-n} \cdot exp(E_a/kT). \tag{1}$$

Here $E_a$ and $k$ are the activation energy and Boltzmann's constant, respectively, $T$ is the absolute temperature, $J$ is the current density, and $n$ is the current density exponent, whose value is usually around 2. $A$ is a proportionality constant that depends on the physical dimensions of the interconnect. In a digital circuit environment, the interconnects experience unidirectional pulsed or bidirectional current stress. Many models have been proposed to estimate the EM-induced failure lifetime under bidirectional current stress. In this paper, we use the Average Current Recovery (ACR) model [6]:

$$MTF_{ac} = A \cdot J_{eff}^{-n} \cdot exp(E_a/kT). \tag{2}$$

The effective current density  $J_{eff}$  is defined as

$$J_{eff} = |\bar{J}^+| - \gamma |\bar{J}^-|, \tag{3}$$

where $\bar{J}^+$ and $\bar{J}^-$ are the time-averaged current densities including only the positive current or only the negative current, respectively, and $|\bar{J}^+| \geq |\bar{J}^-|$ is assumed. $\gamma$, which is in the range [0,1], represents the degree of damage recovery due to the opposite-polarity current. Experimental results show that the value of $\gamma$ is around 0.9 for most interconnect materials [6]. $\gamma$, $A$, $E_a$, and $n$ in Eq. (2) can be obtained from measurement. The unknown parameters for the interconnects in a digital circuit are the effective current density $J_{eff}$ and the temperature $T$.
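The ACR model of Eqs. (2) and (3) can be illustrated with a short Python sketch. The function names, the unit prefactor `a`, and the default activation energy `e_a = 0.7` eV are illustrative assumptions and not values taken from this paper; only $\gamma \approx 0.9$ and $n \approx 2$ come from the text above.

```python
import math

BOLTZMANN_EV = 8.617333262e-5  # Boltzmann's constant in eV/K


def effective_current_density(j_pos, j_neg, gamma=0.9):
    """Eq. (3): the larger average polarity minus gamma times the smaller one."""
    hi = max(abs(j_pos), abs(j_neg))
    lo = min(abs(j_pos), abs(j_neg))
    return hi - gamma * lo


def mtf(j, temp_k, a=1.0, n=2.0, e_a=0.7):
    """Eq. (2): MTF = A * J^-n * exp(Ea / kT).

    a and e_a are placeholder values (assumptions); real values must be
    obtained from measurement.
    """
    return a * j ** (-n) * math.exp(e_a / (BOLTZMANN_EV * temp_k))


# A symmetric AC signal line versus a unidirectional power line,
# both carrying the same average magnitude (arbitrary units).
j_signal = effective_current_density(1.0, 1.0)  # about 0.1
j_power = effective_current_density(1.0, 0.0)   # exactly 1.0
ratio = mtf(j_signal, 343.15) / mtf(j_power, 343.15)
print(f"signal line outlives power line by a factor of about {ratio:.0f}")
```

Under these assumptions the healing effect alone gives the symmetric line roughly a hundredfold longer projected lifetime, which is consistent with neglecting signal lines in the diagnosis.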

According to the stress conditions, we can divide all interconnects into two categories. The first category is the signal lines between gates, which are under symmetrical AC current stress (*i.e.*, $|\bar{J}^+| \approx |\bar{J}^-|$). Due to the healing effect, they have very large MTFs, and thus are neglected in our diagnosis. The second category is the power and ground buses. They experience unsymmetrical AC or unidirectional current stress, which may lead to serious EM-reliability problems.

#### 3. INPUT PATTERN-INDEPENDENT EM DIAGNOSIS

For the pattern-independent EM diagnosis, we only consider latch-controlled synchronous CMOS digital circuits. In general, this is not a severe limitation, since significant portions of VLSI circuits operate synchronously. As Figure 2 indicates, synchronous circuits consist of combinational logic blocks separated by latches. In this design style, we may assume that all primary inputs of combinational logic blocks will switch simultaneously at most once under latch clocking. To elucidate the following discussion, we begin by defining two relevant terms:

Figure 3: The block diagram of pattern independent diagnosis.

- An *input pattern* to a synchronous circuit is a sequence of two input vectors, since at most two different input vectors will appear at the primary inputs during one clock period.
- The problem of finding the *worst-case current stress* of an interconnect is defined as that of finding its maximum effective current density over one clock period.

The main goals of the input-pattern independent EM diagnosis are to (1) identify those critical interconnects with potential electromigration reliability problems and (2) find the corresponding input patterns which cause worst-case current stress in each critical interconnect. A systematic procedure for input-pattern independent diagnosis has been developed and implemented in a logic level simulator with timing models. Its block diagram is shown in Fig. 3.

#### 3.1 Initialization

First, we describe the initialization, i.e., the procedures within the dashed box in Fig. 3. Electromigration reliability is strongly related to the circuit layout. The input to our tool is the circuit layout description in CIF or GDSII format. The layout extractor is executed to obtain the transistor netlist from the given layout and the on-chip x-y coordinates of each transistor (which will be used in iTEM later). The extracted transistor netlist is then mapped into a logic gate netlist for gate-level simulation. Next, the power and ground buses are extracted from the entire layout. The transistors connected to the power/ground bus are identified and their contacts are located. Consequently, the power and ground buses can be transformed into networks like that shown in Fig. 4, where the current sources represent the current coming from metal-diffusion contacts. Then we extract the resistive networks from the power and ground bus lines and build an admittance matrix for the resistive networks. The models of HPEX [7] are used for the resistance extraction.

Figure 4: Power and ground bus networks.

After obtaining the admittance matrix of the power/ground bus, we compute the *effective current density* equation for each interconnect. The current passing through a transistor may induce positive or negative current stress in a specific interconnect (denoted by m), depending on the relative geometric position of the transistor and interconnect m. Suppose that $\bar{I}_i$ is the time-averaged current of transistor i. The positive and negative time-averaged currents of interconnect m can be expressed as follows:

$$\bar{I}_{m}^{+} = \sum_{i=1}^{K} \alpha_{m,i} \bar{I}_{i}, \text{ and } \bar{I}_{m}^{-} = \sum_{i=1}^{K} \beta_{m,i} |\bar{I}_{i}|, \tag{4}$$

where K is the number of transistors connected to the power/ground bus. $\alpha_{m,i}$ and $\beta_{m,i}$ are non-negative and represent the fraction of $\bar{I}_i$ flowing into interconnect m. $\alpha_{m,i} = 0$ if transistor *i* induces negative current stress in interconnect m, whereas $\beta_{m,i} = 0$ if transistor *i* induces positive current stress. Assume that the admittance matrix of the power/ground bus is $A_{n \times n}$, and B is an $n \times 1$ current source vector. The procedure to compute $\alpha_{m,i}$ and $\beta_{m,i}$ is to solve the nodal equation $A \cdot V = B$ K times, once per transistor. The computational complexity is $O(n^3 + K \cdot n^2)$, where n is the number of nodes in the resistive network.
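One way to read the $O(n^3 + K \cdot n^2)$ cost is that the admittance matrix is factored once and each of the K unit-current solves reuses the factors. The following pure-Python sketch illustrates that idea on a toy bus; the function names, the pivot-free LU routine, and the three-segment example network are assumptions made for illustration and are not the authors' implementation.

```python
def lu_factor(a):
    """Doolittle LU factorization without pivoting, O(n^3).

    Assumes a well-conditioned, diagonally dominant admittance matrix.
    """
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, b):
    """Forward and back substitution with precomputed factors, O(n^2)."""
    n = len(lu)
    x = b[:]
    for i in range(n):                 # forward: L has a unit diagonal
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):       # backward: divide by U's diagonal
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def stress_coefficients(admittance, segment, contact_nodes):
    """Return (alpha, beta) lists for one bus segment.

    segment = (node_a, node_b, conductance); current from a to b is positive.
    """
    lu = lu_factor(admittance)         # factor once
    node_a, node_b, g = segment
    alpha, beta = [], []
    for node in contact_nodes:         # one O(n^2) solve per transistor
        rhs = [0.0] * len(admittance)
        rhs[node] = -1.0               # unit current drawn at this contact
        v = lu_solve(lu, rhs)
        i_seg = g * (v[node_a] - v[node_b])
        alpha.append(max(i_seg, 0.0))
        beta.append(max(-i_seg, 0.0))
    return alpha, beta


# Toy bus: pad -- node0 -- node1 -- pad, every segment with conductance 1.
G = [[2.0, -1.0], [-1.0, 2.0]]
alpha, beta = stress_coefficients(G, (0, 1, 1.0), [0, 1])
print(alpha, beta)
```

In this toy network the transistor at node 0 pushes one third of its current through the middle segment in the negative direction, and the transistor at node 1 pushes one third in the positive direction, so each transistor contributes to exactly one of $\alpha$ or $\beta$.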

Without loss of generality, we assume that $\bar{I}_m^+ \geq \bar{I}_m^-$. According to the ACR model (Eq. (2)), we define the *effective current density* equation of interconnect m as follows:

$$J_{eff} = \frac{\bar{I}_m^+ - \gamma \cdot \bar{I}_m^-}{W_m \cdot T_m},\tag{5}$$

where $W_m$ and $T_m$ are the width and thickness of interconnect m, respectively. Eq. (5) will be used in the following input-independent diagnosis procedures. Note that the real effective current density $J_{eff,real}$ can be smaller than $J_{eff}$, since some portion of $\bar{I}_m^+$ and $\bar{I}_m^-$ may occur simultaneously and cancel. The reason for using Eq. (5) is to save computational time in the high-level diagnosis. In general,

$$J_{eff} \ge J_{eff,real} \ge \frac{\bar{I}_m^+ - \bar{I}_m^-}{W_m \cdot T_m}. \tag{6}$$

From the above equations, it can be shown that

$$J_{eff} - J_{eff,real} \le (1 - \gamma) \frac{\bar{I}_m^-}{W_m \cdot T_m}. \tag{7}$$

If $\bar{I}_m^-$ is small, then the error of $J_{eff}$ is negligible. If $\bar{I}_m^-$ is large, then this interconnect is not a critical interconnect due to the healing effect. Therefore, Eq. (5) is a reasonable estimator of the effective current density. If the interconnect experiences unidirectional current stress, then $J_{eff} = J_{eff,real}$.
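The step from Eq. (6) to Eq. (7) can be spelled out as follows. This is a sketch that uses only the definition in Eq. (5) and the lower bound on $J_{eff,real}$ in Eq. (6):

$$
\begin{aligned}
J_{eff} - J_{eff,real} &\le J_{eff} - \frac{\bar{I}_m^+ - \bar{I}_m^-}{W_m \cdot T_m} \\
&= \frac{(\bar{I}_m^+ - \gamma \cdot \bar{I}_m^-) - (\bar{I}_m^+ - \bar{I}_m^-)}{W_m \cdot T_m} \\
&= (1 - \gamma) \frac{\bar{I}_m^-}{W_m \cdot T_m}.
\end{aligned}
$$

With $\gamma \approx 0.9$, as quoted in Section 2, the overestimate is therefore at most about 10% of $\bar{I}_m^- / (W_m \cdot T_m)$.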

#### 3.2 Identify critical interconnects with potential EM-reliability problems

Many studies have shown that the electromigration phenomenon will not appear if the effective current density $J_{eff}$ of an interconnect does not exceed a threshold current density [8, 9]. Our method to decide whether an interconnect has a potential EM-reliability problem is to find an upper bound on its $J_{eff}$. If the upper bound on $J_{eff}$ is smaller than the threshold current density, then this interconnect is free of EM-reliability problems. In CMOS integrated circuits, both the power consumption and the current stress of the interconnects are strongly related to the circuit switching activity. Our procedure to identify the critical interconnects is as follows.

#### Procedure 1: Identify critical interconnects with potential EM-reliability problems

- 1. Estimate the maximum number of switching events of every logic gate during one clock period.
- 2. Based on the maximum switching activity, compute the maximum time-averaged current passing each transistor during one clock period.
- 3. For each interconnect, set healing factor  $\gamma = 0$  and use the maximum average current of every transistor and Eq. (5) to compute the upper-bound of the effective current density.
- 4. Check the upper-bound effective current density of every interconnect. If the upper-bound of an interconnect is smaller than threshold current density, it is EM-reliable; otherwise, put it in the list of the critical interconnects.
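Steps 3 and 4 of Procedure 1 can be sketched in Python as below. The function and argument names are hypothetical, the maximum transistor currents are assumed to come from steps 1 and 2, and taking the larger of the two directional sums is our reading of the "without loss of generality" assumption behind Eq. (5). All quantities are in arbitrary but consistent units.

```python
def critical_interconnects(alpha, beta, i_max, width, thickness, j_threshold):
    """Upper-bound screen with gamma = 0 (no healing credit).

    alpha[m][i], beta[m][i]: coefficients of Eq. (4) for interconnect m.
    i_max[i]: maximum time-averaged current of transistor i (steps 1 and 2).
    Returns a list of (interconnect index, upper-bound J_eff) pairs.
    """
    critical = []
    for m in range(len(alpha)):                      # M interconnects
        i_pos = sum(a * i for a, i in zip(alpha[m], i_max))   # K terms each
        i_neg = sum(b * i for b, i in zip(beta[m], i_max))
        # With gamma = 0, Eq. (5) reduces to the dominant direction only.
        j_upper = max(i_pos, i_neg) / (width[m] * thickness[m])
        if j_upper >= j_threshold:                   # step 4
            critical.append((m, j_upper))
    return critical


# Two interconnects, two transistors (made-up numbers).
result = critical_interconnects(
    alpha=[[0.5, 0.0], [0.1, 0.1]],
    beta=[[0.0, 0.25], [0.0, 0.0]],
    i_max=[2.0, 4.0],
    width=[1.0, 1.0],
    thickness=[1.0, 1.0],
    j_threshold=0.8,
)
print(result)
```

The double loop over interconnects and transistors makes the $O(K \cdot M)$ cost of the screening step visible.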

If the target circuit is a dynamic CMOS circuit, it is simple to estimate the maximum number of switching events, since a dynamic logic gate will switch at most once during one clock period. However, in a static CMOS circuit, due to uneven circuit delay paths, even a single simultaneous switching event on the primary inputs can give rise to multiple switching events at an internal node. Some algorithms [10, 11] have been developed to estimate the maximum transition density for static CMOS circuits. We apply the single transition interval (STI) algorithm [10] here. This algorithm is based on the technique of propagating uncertainty signal waveforms throughout the circuit, and then counting the maximum switching activity in the uncertainty waveform obtained at every node. As the experimental results in [10] indicate, the computational complexity of this algorithm is linear in the number of gates in the circuit. The accuracy of this approach is also very high. Using the STI algorithm, the computational complexity of Procedure 1 is $O(K \cdot M)$ (see Eq. (4)), where K is the number of transistors and M is the number of interconnects.

Note that Procedure 1 is guaranteed to yield upper bounds on $J_{eff}$ for all interconnects. However, the upper bound may be pessimistic because it does not consider the signal correlation between the logic gates.

#### 3.3 Estimating "typical" worst-case value of the effective current density using Monte Carlo simulation



Figure 5: 99.9<sup>th</sup> percentile point.



Figure 6: One iteration of logic simulation.

As mentioned before, computing the exact worst-case  $J_{eff}$  for an interconnect is an NP-complete problem. Fortunately, in most situations it is sufficient to obtain an estimate of the "typical" worst-case  $J_{eff}$ . In this sub-section, we will define the "typical" worst-case  $J_{eff}$  and how to use Monte Carlo simulation to find it.

$J_{eff}$ of an interconnect over one clock period is a function of the input pattern. If we assume that any input pattern *i* occurs with probability g(i), then $J_{eff}$ can be treated as a random variable X with probability distribution function f(x) and cumulative distribution function F(x) on the interval $x \in [0, J_{eff,max}]$, where $J_{eff,max}$ is the real maximum effective current density. The 99.9<sup>th</sup> percentile (denoted by $\xi_{99.9}$) of the distribution f(x) is the point where $F(\xi_{99.9}) = 0.999$ (see Fig. 5), which means that 99.9% of input patterns induce a $J_{eff}$ smaller than $\xi_{99.9}$. In most situations, if an interconnect can tolerate the current stress of $\xi_{99.9}$, it can safely be deemed EM-reliable. Thus, we define the value of $\xi_{99.9}$ as the "typical" worst-case $J_{eff}$. The objective of our Monte Carlo simulation is to obtain, for every critical interconnect, an input pattern which induces a $J_{eff}$ equal to or larger than $\xi_{99.9}$. Note that (1) different interconnects have different values of $\xi_{99.9}$, (2) $\xi_{99.9}$ is not necessarily equal to $0.999\,J_{eff,max}$, and (3) the "typical" worst-case $J_{eff}$ can be defined by users. Here we use $\xi_{99.9}$ only as an example.

Considering interconnect m in a circuit (see Fig. 6), a sample  $J_{eff,i}$  can be acquired through logic-level simulation when given an input pattern i. By the definition of  $\xi_{99.9}$ ,

$$Prob(J_{eff,i} \le \xi_{99.9}) = 0.999. \tag{8}$$

Suppose that a random sample $[J_{eff,1}, J_{eff,2}, ..., J_{eff,n}]$ is obtained by executing n iterations of logic simulation with different input patterns and observing the effective current density at each iteration. Let $P_n = \max_i (J_{eff,i})$ be the largest sample. Then

$$Prob(P_n \ge \xi_{99.9}) = 1 - Prob(P_n < \xi_{99.9}) = 1 - (0.999)^n. \tag{9}$$



Figure 7: The lists of the critical interconnects and input patterns.

The number of iterations *n* needed to make $Prob(P_n \ge \xi_{99.9}) \ge 1 - \delta$ must satisfy

$$n \geq \frac{\log(\delta)}{\log(0.999)}. \tag{10}$$

Eq. (10) means that, after *n* iterations of Monte Carlo simulation, an input pattern causing  $J_{eff,i}$  larger than  $\xi_{99.9}$  can be found with confidence  $1 - \delta$ . The value of *n* is independent of the number of primary inputs and the size of the circuit.
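As a quick check of Eq. (10), the small Python sketch below computes the minimum iteration count for a given $\delta$. The function name and the generalization to other percentiles are our own additions for illustration.

```python
import math


def min_iterations(delta, percentile=0.999):
    """Smallest integer n with 1 - percentile**n >= 1 - delta (Eq. (10))."""
    return math.ceil(math.log(delta) / math.log(percentile))


# 99.9% confidence of reaching the 99.9th percentile.
print(min_iterations(0.001))  # 6905
```

The count depends only on the chosen percentile and confidence, not on the circuit, which is why the same number of iterations applies to every benchmark.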

Based on the above non-parametric inference method, the procedure to get the "typical" worst-case  $J_{eff}$  is as follows:

#### Procedure 2: Estimating "typical" worst-case $J_{eff}$ using Monte Carlo simulation

- 1. Use Eq. (10) to determine the minimum n for the user-specified confidence $1 - \delta$.
- 2. Execute n iterations of the logic-level simulation and obtain n effective current density samples for each critical interconnect. The input patterns are generated randomly.
- 3. Check the maximum  $J_{eff}$  sample for every critical interconnect. If the maximum sample is smaller than threshold current density, remove this interconnect from the critical-interconnect list; otherwise, record the input pattern causing the maximum  $J_{eff}$  sample.
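The three steps of Procedure 2 can be sketched as a short Python loop. Here `simulate` stands in for the logic-level simulator and is a hypothetical callback that maps an input pattern (a pair of input vectors) to a dictionary of $J_{eff}$ samples; the toy simulator, in which stress grows with the number of toggling inputs, is made up purely to exercise the loop.

```python
import math
import random


def procedure2(simulate, num_inputs, critical, j_threshold, delta=0.001, seed=0):
    """Monte Carlo search for 'typical' worst-case input patterns."""
    rng = random.Random(seed)
    n = math.ceil(math.log(delta) / math.log(0.999))         # step 1, Eq. (10)
    worst = {m: (float("-inf"), None) for m in critical}
    for _ in range(n):                                        # step 2
        # One input pattern = two independent random input vectors.
        pattern = (
            tuple(rng.getrandbits(1) for _ in range(num_inputs)),
            tuple(rng.getrandbits(1) for _ in range(num_inputs)),
        )
        samples = simulate(pattern)
        for m in critical:
            if samples[m] > worst[m][0]:
                worst[m] = (samples[m], pattern)
    # Step 3: drop interconnects whose largest sample stays below threshold.
    return {m: w for m, w in worst.items() if w[0] >= j_threshold}


def toy_simulate(pattern):
    """Hypothetical stand-in: stress proportional to toggling inputs."""
    first, second = pattern
    toggles = sum(a != b for a, b in zip(first, second))
    return {0: 1.0 * toggles, 1: 0.1 * toggles}


result = procedure2(toy_simulate, num_inputs=4, critical=[0, 1], j_threshold=2.0)
for m, (j_eff, pattern) in result.items():
    print(m, j_eff, pattern)
```

Each iteration draws a fresh pair of vectors rather than continuing a long vector sequence, which keeps the samples independent as the analysis behind Eq. (9) requires.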

The computational complexity of Procedure 2 is $O(n + K \cdot M_{crit})$, where *n* is the number of iterations of logic simulation, K is the number of transistors, and $M_{crit}$ is the number of critical interconnects obtained from Procedure 1. After Procedure 2, lists of critical interconnects and input patterns are generated as shown in Fig. 7. Every critical interconnect has a corresponding input pattern which can induce its "typical" worst-case $J_{eff}$. Different critical interconnects may share the same critical input pattern when the correlation of $J_{eff}$ between the interconnects is significant. Usually the number of critical input patterns is much smaller than the number of critical interconnects. These critical lists can be fed to the input-dependent electromigration diagnosis tool, iTEM, for detailed simulation. As we can see, Procedure 2 further reduces the worst-case electromigration diagnosis problem. Note that, in order to keep the $J_{eff}$ samples independent, and unlike other Monte Carlo simulation methods [12], the circuit is given a sequence of two input vectors (*i.e.*, an input pattern) for one iteration of logic simulation (see Fig. 6) rather than a long sequence of input vectors.

| Circuit Name | No. of transistors | No. of interconnects | Layout extraction CPU Time (Sec.) | Compute $J_{eff}$ equation CPU Time (Sec.) |
|---------|--------------|---------------|-----------------|-------------------|
| C432    | 1152         | 3721          | 5.69            | 22.28             |
| C499    | 2266         | 7401          | 14.12           | 65.05             |
| C880    | 1768         | 5513          | 8.90            | 42.38             |
| C1355   | 2442         | 7640          | 12.68           | 66.58             |
| C3540   | 5842         | 17348         | 34.96           | 260.47            |
| C6288   | 10706        | 33544         | 74.79           | 900.99            |
| C7552   | 13541        | 41326         | 81.05           | 943.83            |

Table 1: Circuit sizes and CPU times for initialization.

| Circuit Name | CPU Time (Sec.) | % of critical interconnects | % of non-critical interconnects |
|-----------------|--------------------|-----------------------------|------------------------------------|
| C432            | 0.27               | 1.67%                       | 98.33%                             |
| C499            | 0.29               | 0.81%                       | 99.19%                             |
| C880            | 0.25               | 1.56%                       | 98.44%                             |
| C1355           | 0.70               | 3.35%                       | 96.65%                             |
| C3540           | 2.38               | 4.16%                       | 95.84%                             |
| C6288           | 3.29               | 20.26%                      | 79.74%                             |
| C7552           | 5.08               | 11.32%                      | 88.68%                             |

Table 2: The results of Procedure 1.

#### 3.4 Experimental results of pattern independent diagnosis

To test the pattern-independent diagnosis procedures, we chose a set of circuits from the ISCAS 85 benchmark circuits. The circuits are synthesized for minimum delay using SIS and their layouts are generated by iCGEN [13]. Table 1 shows the sizes of the test circuits and the computational time for initialization. The machine used is a SUN SPARCstation 10 with 96 MBytes of physical memory and 256 MBytes of virtual memory.

We then show the results of identifying critical interconnects using Procedure 1 in Sec. 3.2. The employed threshold current density is the current density which leads to an MTF of 10 years at 70°C. The constants in Eq. (2) were obtained from [6]. Table 2 shows that Procedure 1 is fast: it takes only 5.08 seconds of CPU time to handle the largest circuit. It also efficiently filters out non-critical interconnects, which significantly reduces the CPU time needed for the following procedures (see Table 4). The results of Procedure 1 depend on the given circuits. If the design of the interconnect system is very conservative, Procedure 1 may find that all interconnects are non-critical, and the diagnosis process can stop here.

Next, we verify the validity of the second procedure proposed in Sec. 3.3. The experiments were run on a logic simulator with timing models. For each circuit, we monitored only one interconnect, the one with the largest $J_{eff}$ among all interconnects in the circuit. To estimate the value of $\xi_{99.9}$ of the monitored interconnect, we first ran Monte Carlo logic simulation for 2 million iterations. Among the 2 million samples of $J_{eff}$, we chose the 2000th largest one as an estimate of the value of $\xi_{99.9}$. To demonstrate the robustness of Procedure 2, we then ran it 1000 times with different random seeds. For each run, we executed 6905 iterations of logic simulation to achieve 99.9% confidence. The minimum and average values of the "typical" worst-case $J_{eff}$, obtained from the 1000 runs of Procedure 2, are listed in the "Min." and "Avg." columns of Table 3, respectively. The last column shows the violation rate, which is defined as the fraction of runs in which Procedure 2 obtained a "typical" worst-case $J_{eff}$ less than the estimate of $\xi_{99.9}$. It was observed that (1) the violation rate is close to the expected value of 0.1%, and (2) in the few cases where Procedure 2 obtained a "typical" worst-case $J_{eff}$ less than the estimate of $\xi_{99.9}$, the results are at most 1% below the estimate of $\xi_{99.9}$. Finally, Table 4 shows the CPU times of Procedure 2 when considering all interconnects and only critical interconnects. As expected, using Procedure 1 to pre-process circuits can save up to 83% of CPU time.

| Circuit Name | Estimate of $\xi_{99.9}$ ($MA/cm^2$) | Min. over 1000 runs of Procedure 2 ($MA/cm^2$) | Avg. over 1000 runs of Procedure 2 ($MA/cm^2$) | Violation rate |
|---------|---------------------------------------|--------------------------|-------|----------------|
| C432    | 0.947                                 | 0.945                    | 1.032 | 0.3%           |
| C499    | 0.954                                 | 0.955                    | 1.011 | 0.0%           |
| C880    | 0.592                                 | 0.592                    | 0.654 | 0.0%           |
| C1355   | 0.646                                 | 0.644                    | 0.703 | 0.1%           |
| C3540   | 2.694                                 | 2.693                    | 2.911 | 0.2%           |
| C6288   | 3.150                                 | 3.172                    | 3.452 | 0.0%           |
| C7552   | 3.315                                 | 3.315                    | 3.570 | 0.1%           |

Table 3: The results of Procedure 2.

| Circuit Name | CPU time, all interconnects | CPU time, only critical interconnects | CPU time saved |
|---------|---------------|---------------|-------|
| C432    | 959.80        | 249.47        | 74%   |
| C499    | 1229.45       | 258.18        | 79%   |
| C880    | 1132.75       | 334.16        | 70%   |
| C1355   | 3445.20       | 723.47        | 79%   |
| C3540   | 11631.39      | 3082.32       | 83%   |
| C6288   | 15815.65      | 9647.55       | 39%   |
| C7552   | 17312.83      | 5972.93       | 66%   |

Table 4: The CPU time of Procedure 2.
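The reference estimate of $\xi_{99.9}$ and the violation rate used in this validation can be sketched in a few lines of Python. The function names are hypothetical, and the tiny inputs in the example are made-up placeholders rather than simulation data.

```python
import heapq


def empirical_upper_percentile(samples, tail_fraction=0.001):
    """Return the k-th largest sample, with k = tail_fraction * len(samples).

    For 2 million samples and a 0.1% tail this is the 2000th largest value.
    """
    k = max(1, round(tail_fraction * len(samples)))
    return heapq.nlargest(k, samples)[-1]


def violation_rate(run_maxima, xi_estimate):
    """Fraction of Procedure 2 runs whose result falls below the estimate."""
    return sum(1 for p in run_maxima if p < xi_estimate) / len(run_maxima)


# Placeholder data: 10,000 distinct values, so k = 10.
xi = empirical_upper_percentile(list(range(1, 10001)))
print(xi)  # 9991
print(violation_rate([1.0, 0.9, 1.2, 1.1], 1.0))  # 0.25
```

Since each run of Procedure 2 misses $\xi_{99.9}$ with probability at most $\delta = 0.001$, a violation rate near 0.1% over 1000 runs is what Eq. (9) predicts.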

#### 4. iTEM : AN ACCURATE INPUT PATTERN-DEPENDENT EM DIAGNOSIS

In this section, an accurate input-pattern dependent reliability diagnosis tool, iTEM, is briefly presented. In our hierarchical environment, this tool accurately computes the electromigration MTF for the critical interconnects using the previously determined critical input patterns. With iTEM as a guide, the designer can appropriately re-design the layout to meet the reliability criterion. Of course, iTEM can also be used independently of our hierarchical environment. The unique feature of iTEM is that it takes into account the steady-state temperature of every interconnect under the given operating conditions. By including the temperature effect, iTEM can provide more accurate predictions of EM-induced failure than other EM diagnosis tools.

iTEM is an integrated system which consists of a 2-D geometry layout extractor, a sparse matrix solver, a timing simulator, a 3-D thermal simulator and an interconnect temperature estimator. The initialization of iTEM extracts layout information, in a way very similar to the initialization of the pattern-independent diagnosis. The extracted information is then fed to the timing simulator. Note that if iTEM is used in our hierarchical diagnosis system, it does not need to run the initialization procedure, since the layout extraction has already been done in the pattern-independent diagnosis stage.

When given an input pattern, the timing simulator (ILLIADS) computes the current waveform of each transistor, and the power dissipation of each logic gate. Every logic gate is treated as a heat source. A 3-D thermal simulator [14] is called to calculate the temperature profile on the surface of the substrate by solving heat diffusion equations, taking into account the location of the heat sources, chip dimensions, and packaging material parameters. Our thermal simulator calculates the steady-state temperature rather than the transient one. The reason is that the time to reach thermal steady-state in a silicon chip is in the range of a millisecond, which is much larger than the switching period. The sparse matrix solver is used to solve the power/ground resistive network, providing accurate current waveforms for every metal rectangle, via, and metal-diffusion contact.

Figure 8: The block diagram of iTEM.

Three phenomena may raise interconnect temperature above the ambient: joule heating, heat conduction from the substrate and heat conduction from nearby wires. Up to this stage, iTEM has obtained the substrate surface temperature profile and interconnect current waveforms. Based on these data, iTEM can estimate the temperature of each interconnect using an interconnect lumped thermal model [15]. We presently ignore the heat flow from nearby wires since it is usually small compared to the heat flow from the substrate. After collecting all of the necessary information, EM lifetime is projected by the ACR model.

iTEM simulation results for three different circuits are shown in Table 5. The predicted MTF may decrease by as much as a factor of 17 when heating effects are considered. The details of the iTEM program can be found in [15].
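The size of the thermal correction can be read directly from the MTF rows of Table 5. The short Python sketch below only redoes that arithmetic; the dictionary layout and variable names are our own.

```python
# MTF in hours, copied from Table 5: (without thermal, with thermal).
mtf_hours = {
    "10-bit adder": (6.8e6, 3.91e5),
    "C3540": (5.84e6, 1.1e6),
    "C6288": (5.29e5, 6.27e4),
}

# Factor by which the predicted MTF shrinks once heating is included.
reduction = {
    name: without / with_thermal
    for name, (without, with_thermal) in mtf_hours.items()
}

for name, factor in reduction.items():
    print(f"{name}: predicted MTF shrinks by a factor of {factor:.1f}")
```

The largest reduction, roughly 17, occurs for the 10-bit adder, which is also the circuit with the highest operating frequency and the highest average temperatures in Table 5.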

#### 5. CONCLUSIONS

Our hierarchical EM diagnosis method provides a feasible solution to the complicated electromigration diagnosis problem. The top level is an input-pattern independent diagnosis procedure. It first identifies the critical interconnects based on the estimation of the upper-bound effective current density. Then the Monte Carlo simulation is executed to find the input pattern causing "typical" worst-case current stress for each critical interconnect. After the input-pattern independent diagnosis procedure, designers can focus on the critical interconnects and feed those critical input patterns into the electromigration-reliability simulation tool, iTEM, to compute the accurate EM-induced failure of the interconnect systems. Unlike traditional pattern-dependent EM simulation tools, iTEM takes into account the thermal effect of the interconnects to provide more accurate EM failure estimation. Both pattern-independent and pattern-dependent tools can be used together or separately. They can also be easily ported to in-house electrical or logic simulators to adapt to a particular design environment.

| Circuit | 10-bit Adder | C3540 | C6288 |
|--------------------|----------------------|----------------------|----------------------|
| No. of Transistors | 868 | 5842 | 10706 |
| Operation Freq. | $300\mathrm{MHz}$ | $100\mathrm{MHz}$ | $100\mathrm{MHz}$ |
| Avg. Substrate Temperature | $65.38^{\circ}C$ | $41.23^{\circ}C$ | $39.21^{\circ}C$ |
| Avg. Metal Temperature | $65.40^{\circ}C$ | $41.25^{\circ}C$ | $39.44^{\circ}C$ |
| Peak Metal Temperature | $67.57^{\circ}C$ | $44.19^{\circ}C$ | $69.55^{\circ}C$ |
| CPU Time (Sec.) | 76 | 848 | 1177 |
| MTF (hours) w/o thermal | $6.8 \times 10^{6}$ | $5.84 \times 10^{6}$ | $5.29 \times 10^{5}$ |
| MTF (hours) with thermal | $3.91 \times 10^{5}$ | $1.1 \times 10^{6}$ | $6.27 \times 10^{4}$ |

Table 5: Simulation results of iTEM.

#### REFERENCES

- [1] J. E. Hall, D. E. Hocevar, P. Yang, and M. J. McGraw, "SPIDER - a CAD system for modeling VLSI metallization patterns," *IEEE Trans. Computer-Aided Design*, vol. CAD-6, pp. 1023-1031, Nov. 1987.
- [2] R. H. Tu, E. Rosenbaum, W. Y. Chan, C. C. Li, E. Minami, K. Quader, P. K. Ko, and C. Hu, "Berkeley reliability tools - BERT," *IEEE Trans. Computer-Aided Design*, vol. 12, pp. 1524-1534, Oct. 1993.
- [3] F. N. Najm, R. Burch, P. Yang, and I. N. Hajj, "Probabilistic simulation for reliability analysis of CMOS VLSI circuits," *IEEE Trans. Computer-Aided Design*, vol. 9, pp. 439-450, Apr. 1990.
- [4] S. Devadas, K. Keutzer, and J. White, "Estimation of power dissipation in CMOS combinational circuits using boolean function manipulation," *IEEE Trans. Computer-Aided Design*, vol. 11, pp. 373-383, Mar. 1992.
- [5] J. R. Black, "Electromigration failure modes in aluminum metallization for semiconductor devices," *Proc. IEEE*, vol. 57, pp. 1587-1594, Sept. 1969.
- [6] L. M. Ting, J. S. May, W. R. Hunter, and J. W. McPherson, "AC electromigration characterization and modeling of multi-layered interconnects," in *Proc. IEEE Int. Reliability Physics Symposium*, pp. 311-316, 1993.
- [7] S. L. Su, V. B. Rao, and T. N. Trick, "HPEX: A hierarchical parasitic circuit extractor," in *Proc. ACM/IEEE Design Automation Conf.*, pp. 566-569, 1987.
- [8] H.-U. Schreiber, "Electromigration threshold in aluminum films," *Solid State Electron.*, vol. 28, no. 6, p. 617, 1985.
- [9] J. J. Clement, "Vacancy supersaturation model for electromigration failure under DC and pulsed DC stress," *Journal of Applied Physics*, vol. 71, pp. 4264-4268, May 1992.
- [10] C.-C. Teng, A. M. Hill, and S. M. Kang, "Estimation of maximum transition counts at internal nodes in CMOS VLSI circuits," in *Proc. ACM/IEEE Int. Conf. Computer-Aided Design*, pp. 366-370, Nov. 1995.
- [11] F. Najm and M. Y. Zhang, "Extreme delay sensitivity and the worst-case switching activity in VLSI circuits," in *Proc. ACM/IEEE Design Automation Conf.*, pp. 623-627, 1995.
- [12] R. Burch, F. N. Najm, P. Yang, and T. N. Trick, "A Monte Carlo approach for power estimation," *IEEE Trans. VLSI Systems*, vol. 1, pp. 63-71, Mar. 1993.
- [13] J. Kim, S. M. Kang, and S. Sapatnekar, "High performance CMOS macromodule layout synthesis," in *Proc. IEEE Int. Symposium on Circuits and Systems*, pp. 179-182, May 1994.
- [14] Y.-K. Cheng and S. M. Kang, "Chip-level thermal simulator to predict VLSI chip temperature," in *Proc. IEEE Int. Symposium on Circuits and Systems*, pp. 1392-1395, 1995.
- [15] C.-C. Teng, Y.-K. Cheng, E. Rosenbaum, and S. M. Kang, "iTEM: A chip-level electromigration reliability diagnosis tool using electrothermal timing simulation," in *Proc. IEEE Int. Reliability Physics Symposium*, Apr. 1996.