# ELECTROMIGRATION RELIABILITY ENHANCEMENT VIA BUS ACTIVITY DISTRIBUTION

Aurobindo Dasgupta and Ramesh Karri Department of Electrical and Computer Engineering University of Massachusetts, Amherst, MA 01003 {dasgupta,karri}@ecs.umass.edu

Abstract: Electromigration-induced degradation in integrated circuits has been accelerated by continuous scaling of device dimensions. We present a methodology for synthesizing high-reliability and low-energy microarchitectures at the RT level by judiciously binding and scheduling the data transfers of a control data flow graph (CDFG) representation of the application onto the buses in the microarchitecture. The proposed method accounts for correlations between data transfers and the constraints on the number of buses, area and delay.

## 1 Introduction

With VLSI fabrication technology reaching sub-micron device dimensions, the role of the interconnect has become dominant in determining the reliability of integrated circuits [1]. As a result of sub-micron scaling, current densities of $10^6\ A/cm^2$ or more exist in metal interconnects [3]. These current densities far exceed the threshold current density for electromigration ($\approx 5 \times 10^5\ A/cm^2$). Research on the DEC ALPHA CPU has shown that the electromigration median-time-to-failure **MTF** (defined as the time for 50% of the metal lines to fail) is of the order of $10^4 - 10^5$ hours (1-10 years) [1, 4]. Electromigration is predominantly due to the transport of conductor metal atoms caused by the momentum transferred by the electron current. If the electron current density is sufficiently high, the metal atoms get depleted from one region of the conductor and pile up at other regions. This accumulation and depletion process continues until it becomes severe enough to cause circuit failure. Equation 1 relates the electromigration MTF of a conductor to the current density, J, and temperature, T [9, 10].

$$MTF = \frac{A}{J^n} \cdot \exp\left(\frac{E_a}{kT}\right) \tag{1}$$

In equation 1, $E_a$ is the activation energy, k is Boltzmann's constant and the constant A depends on the physical dimensions of the metal conductor. Experiments reveal that the empirical parameter n is approximately 2.

The current density J on a line i depends on the probability $p_i$ that the line toggles in a clock cycle, as shown in equation 2.

$$J = \frac{C \cdot V_{dd}}{W \cdot H} \cdot f \cdot p_i \tag{2}$$

C, W and H are the capacitance, width and thickness, respectively, of the line, and f is the clock frequency. From equation 2, it can be seen that minimizing  $p_i$  reduces the current density in line i, which in turn maximizes the MTF. If  $SA_{max}$  is the maximum among the total switching activities calculated during one CDFG computation for all lines and  $T_{max}$  is the total time required to implement the CDFG, then  $p_i$  for the most electromigration-susceptible line is given by  $SA_{max}/T_{max}$ . Assuming that all lines in a bus have the same geometry,  $MTF \propto \frac{1}{(SA_{max}/T_{max})^2}$ .
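The proportionality above follows from substituting equation 2 into equation 1 with n = 2. The step is sketched here under the assumption that the line geometry, supply voltage, clock frequency and temperature are the same for all lines being compared:

$$MTF = \frac{A}{J^2} \cdot \exp\left(\frac{E_a}{kT}\right) = A \cdot \left(\frac{W \cdot H}{C \cdot V_{dd} \cdot f}\right)^2 \cdot \exp\left(\frac{E_a}{kT}\right) \cdot \frac{1}{p_i^2}, \qquad p_i = \frac{SA_{max}}{T_{max}} \;\Rightarrow\; MTF \propto \left(\frac{T_{max}}{SA_{max}}\right)^2$$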

On the other hand, the energy dissipated, E, is proportional to $SA_{tot}$, the total switching activity on all lines in one CDFG computation [12]. Consequently, while minimizing $SA_{max}/T_{max}$ increases the MTF, minimizing $SA_{tot}$ reduces the energy. In this paper, we present RT-level techniques to synthesize high-reliability, low-energy designs by suitably multiplexing signals onto bus lines. The objectives of the proposed algorithm are to (a) maximize $MTF \propto 1/(SA_{max}/T_{max})^2$, (b) minimize $E \propto SA_{tot}$, or (c) a combination thereof.

Previous research in the area of CAD for reliability has resulted in the development of tools to compute the current waveforms. Tools such as RELIC [8], SPIDER [7], CREST [11] and BERT-EM [2] perform transient analyses and determine the current density and electromigration failure for each element. CAD techniques for hot-carrier reliability have also been developed [6, 13].

## 2 EM-Reliability Enhancement

Owing to their length and width characteristics, buses are highly susceptible to electromigration [5]. On the other hand, metal lines in functional units and controllers, being short and narrow, are less susceptible to electromigration. Finally, while power and ground lines are susceptible, their large widths increase their MTF. The MTF of bus lines can be increased by increasing the line width or by minimizing the maximum switching activity. Based on these observations, switching activity and EM effects on bus lines can be reduced by judiciously sequencing data transfers in a computation onto the available buses during RTL synthesis.

33rd Design Automation Conference ®

Permission to make digital/hard copy of all or part of this work for personal or class-room use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. DAC 96 - 06/96 Las Vegas, NV, USA ©1996 ACM, Inc. 0-89791-833-9/96/0006.\$3.50

Consider an example Control Data Flow Graph (CDFG) with three add operations (1, 2 and 3) and 6 data transfers (a, b, c, d, e and f) in figure 1. Every add operation is assumed to require one clock cycle. Two possible schedules of the CDFG are shown in figure 2. Two adders and two buses are used to implement these schedules. The total execution time of these schedules is three clock cycles.



Figure 1: An example CDFG.



Figure 2: Two schedules for the CDFG in figure 1.

Let the expected switching activity of the $l^{th}$ line on a bus when data transfer j is followed by data transfer i ($i \Rightarrow j$) be denoted by $p_{l(i\Rightarrow j)}$. For the CDFG in figure 1, the values of $p_{l(i \Rightarrow j)}$ for all data transfers i, j are shown in table 1. The $p_{l(i \Rightarrow j)}$ are computed assuming that (i) the primary inputs are spatially and temporally independent [12], (ii) each input is a logical 1 with a given probability, and (iii) the bus width is 8. The method of computing these values is explained in section 3.1. For simplicity we assume that the $p_{l(i \Rightarrow j)}$ are identical for all lines l. From table 1, it can be seen that when data transfer b follows data transfer e on a bus, the switching activity on line l ($p_{l(b \Rightarrow e)}$) is 0.38. The sequence of data transfers mapped onto a bus k in turn determines the total expected switching activity $SA_l^k$ on each line l of that bus. The maximum of all these switching activities is obtained as $SA_{max} = \max_{\forall l,k} SA_l^k$.

If the maximum number of clock cycles required for one computation of the CDFG is $T_{max}$, then the maximum switching activity of any line in the circuit is $SA_{max}/T_{max}$. For a fixed line geometry and clock frequency, $SA_{max}/T_{max} \propto J$.

|   | a    | b    | c    | d    | e    | f    |
|---|------|------|------|------|------|------|
| a | 0.49 | 0.46 | 0.42 | 0.41 | 0.40 | 0.49 |
| b | 0.46 | 0.49 | 0.50 | 0.49 | 0.38 | 0.50 |
| c | 0.42 | 0.50 | 0.49 | 0.49 | 0.48 | 0.50 |
| d | 0.41 | 0.49 | 0.49 | 0.50 | 0.51 | 0.49 |
| e | 0.40 | 0.38 | 0.48 | 0.51 | 0.51 | 0.51 |
| f | 0.49 | 0.50 | 0.50 | 0.49 | 0.51 | 0.49 |

Table 1: $p_{l(i \Rightarrow j)}$, $\forall i, j$ for the CDFG in figure 1.

To clarify these ideas, consider the schedule in figure 2(a). The switching activity on bus 1 is caused by the transitions $a \Rightarrow c$, $c \Rightarrow d$ and $d \Rightarrow a$. The average switching activity on bus 1 is $\frac{0.42+0.49+0.41}{3} = 0.44$ and on bus 2 is $\frac{0.38+0.38}{3} = 0.25$. Thus, bus 1 is more susceptible to electromigration than bus 2. Similarly, the average switching activities on buses 1 and 2 in figure 2(b) are 0.41 and 0.32, respectively. Assuming a length of $4.5 \cdot 10^4\ \mu m$, a width of $1\ \mu m$ and a thickness of $0.1\ \mu m$, the microarchitecture corresponding to figure 2(b) ($MTF = 4.6 \cdot 10^5$ hours) is more tolerant to electromigration-induced failures than the microarchitecture corresponding to figure 2(a) ($MTF = 2.3 \cdot 10^5$ hours).

The total energy dissipated on the buses in a microarchitecture during one CDFG computation is proportional to the total switching activity:

$$SA_{tot} = \sum_{\forall l,k} SA_l^k \tag{3}$$

Consequently, the buses in the microarchitecture corresponding to figure 2(a) ($SA_{tot} = 16.64$) dissipate less energy than those in the microarchitecture corresponding to figure 2(b) ($SA_{tot} = 17.76$). When the buses are of different lengths, $SA_{tot}$ can be computed by weighting the switching activities with the lengths (or capacitances) of the buses.
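The figures quoted for the schedule of figure 2(a) can be reproduced from table 1. The sketch below is illustrative only: the bus 1 transitions are those named in the text, while the bus 2 transitions ($b \Rightarrow e$ and $e \Rightarrow b$) are an assumption inferred from the two 0.38 terms, since figure 2 itself is not reproduced here. The helper names are hypothetical.

```python
# Per-line switching probabilities from table 1 (the matrix is symmetric).
TRANSFERS = "abcdef"
P = [
    [0.49, 0.46, 0.42, 0.41, 0.40, 0.49],
    [0.46, 0.49, 0.50, 0.49, 0.38, 0.50],
    [0.42, 0.50, 0.49, 0.49, 0.48, 0.50],
    [0.41, 0.49, 0.49, 0.50, 0.51, 0.49],
    [0.40, 0.38, 0.48, 0.51, 0.51, 0.51],
    [0.49, 0.50, 0.50, 0.49, 0.51, 0.49],
]

BUS_WIDTH = 8  # lines per bus, as assumed in the text
T_MAX = 3      # clock cycles for one computation of the example CDFG


def p(i, j):
    """Look up the table 1 entry for the pair of data transfers (i, j)."""
    return P[TRANSFERS.index(i)][TRANSFERS.index(j)]


def line_activity(transitions):
    """Expected number of toggles on one bus line over one CDFG computation."""
    return sum(p(i, j) for i, j in transitions)


# Figure 2(a): bus 1 as stated in the text; bus 2 is an inferred assumption.
bus1 = [("a", "c"), ("c", "d"), ("d", "a")]
bus2 = [("b", "e"), ("e", "b")]

sa1 = line_activity(bus1)
sa2 = line_activity(bus2)
avg1 = sa1 / T_MAX                  # average activity per cycle on bus 1
avg2 = sa2 / T_MAX                  # average activity per cycle on bus 2
sa_max = max(sa1, sa2)              # activity of the most stressed line
sa_tot = BUS_WIDTH * (sa1 + sa2)    # equation 3, identical lines per bus

print(f"bus 1: {avg1:.2f}, bus 2: {avg2:.2f}, SA_tot: {sa_tot:.2f}")
```

Because all lines of a bus are assumed identical, summing over lines reduces to multiplying the per-line activity by the bus width.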

## 3 The Algorithm

The proposed algorithm consists of two stages, namely (a) activity determination and (b) simultaneous scheduling and binding.

## 3.1 Activity Determination

The activity determination stage accepts as its inputs (i) the CDFG, (ii) the bus width and (iii) the primary input signal probabilities. The values $p_{l(i \Rightarrow j)}$ are computed by repeatedly simulating the CDFG.

The primary input signals are generated assuming that they are spatially and temporally independent [12]. For each set of primary inputs, the signal values for all data transfers in the CDFG are computed by simulating the CDFG. For every possible pair of data transfers i and j in one CDFG simulation, a bit flip on line l during the transition $i \Rightarrow j$ is determined by XORing ($\oplus$) bit l of the signal values of i and j. The CDFG simulations are terminated when the average number of bit flips for all data transfer pairs converges.

For DSP applications, such simulations can be done very quickly because only simple operations such as additions and multiplications are required to generate the values for the data transfers. Suppose n is the number of operations (or data transfers) in the CDFG; then $n^2$ is the number of possible data transfer pairs. If S is the number of simulations required for convergence, then the total time required to compute $p_{l(i\Rightarrow j)}$ for all possible data transfer pairs i and j is $O(S \cdot n^2)$. Simulating the CDFG makes it feasible to compute $p_{l(i\Rightarrow j)}$ while taking into account the correlations between the signals i and j. Therefore, $p_{l(i\Rightarrow j)}$ can also be used for a quick estimation of the MTF or the energy dissipated on the buses for a given schedule.
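The simulation-based estimate can be sketched as follows. This is a minimal illustration and not the authors' implementation: the two-operation CDFG (`toy_cdfg`), the batch size and the convergence tolerance are all assumptions, since the text does not specify the convergence criterion. Primary inputs are drawn uniformly, which gives each input line a signal probability of 0.5.

```python
import random
from itertools import combinations


def toy_cdfg(x, y, width):
    """Hypothetical CDFG with one add and one multiply (not figure 1)."""
    mask = (1 << width) - 1
    return {"x": x, "y": y, "s": (x + y) & mask, "m": (x * y) & mask}


def estimate_activity(cdfg, width=8, batch=2000, tol=0.01,
                      max_batches=50, seed=0):
    """Estimate the per-line flip probability for every pair of transfers."""
    rng = random.Random(seed)
    flips = {}   # (i, j) -> flip count per line
    runs = 0
    prev = None
    for _ in range(max_batches):
        for _ in range(batch):
            # Independent, uniformly distributed primary inputs.
            values = cdfg(rng.getrandbits(width), rng.getrandbits(width), width)
            for i, j in combinations(sorted(values), 2):
                diff = values[i] ^ values[j]   # set bits mark lines that flip
                counts = flips.setdefault((i, j), [0] * width)
                for line in range(width):
                    counts[line] += (diff >> line) & 1
        runs += batch
        est = {pair: [c / runs for c in counts]
               for pair, counts in flips.items()}
        # Assumed stopping rule: no estimate moved by tol or more in a batch.
        if prev is not None and all(
            abs(a - b) < tol
            for pair in est
            for a, b in zip(est[pair], prev[pair])
        ):
            return est
        prev = est
    return prev


est = estimate_activity(toy_cdfg)
print(est[("m", "x")][0], est[("x", "y")][0])
```

In this toy example the least significant line flips between `m` and `x` only when bit 0 of `x` is 1 and bit 0 of `y` is 0, so its estimate settles near 0.25 rather than 0.5, which illustrates how simulation captures correlations between data transfers.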

## 3.2 Scheduling and Binding

The inputs to this step are (i) the CDFG, (ii) $p_{l(i \Rightarrow j)}$, (iii) the number of buses and functional units available for binding, and (iv) the execution times of the operations on the available functional units. The algorithm synthesizes a high-reliability, low-energy microarchitecture subject to the constraints on the area (number of functional units and buses) and delay. The overall structure of the algorithm is given in figure 3.



Figure 3: Simultaneous scheduling and binding

Let AREA and DELAY be the constraints on the area and delay, respectively, for the microarchitecture to be synthesized. The delay is determined as the total number of clock cycles required to execute the CDFG on the synthesized microarchitecture, and is denoted by $T_{max}$. If reducing EM-related failures is the objective, then $OBJ_{curr} = MTF$ for the microarchitecture investigated. Similarly, if energy dissipation is the objective, then $OBJ_{curr} = SA_{tot}$. Initially, a microarchitecture is synthesized ensuring that the area and performance constraints are met. Subsequently, this microarchitecture is incrementally refined without violating the area or performance constraints, to optimize $OBJ_{curr}$.
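One way to picture constraint-preserving refinement is a local search that only swaps the buses carrying two data transfers within the same clock cycle, which leaves $T_{max}$ and the number of buses unchanged. The sketch below is an assumption-laden illustration and not the algorithm of figure 3: the starting binding is hypothetical, the wrap-around transition from the last cycle back to the first is assumed (as suggested by the $d \Rightarrow a$ transition in section 2), and the function names are invented for this example.

```python
from itertools import combinations

# Per-line switching probabilities from table 1.
TRANSFERS = "abcdef"
P = [
    [0.49, 0.46, 0.42, 0.41, 0.40, 0.49],
    [0.46, 0.49, 0.50, 0.49, 0.38, 0.50],
    [0.42, 0.50, 0.49, 0.49, 0.48, 0.50],
    [0.41, 0.49, 0.49, 0.50, 0.51, 0.49],
    [0.40, 0.38, 0.48, 0.51, 0.51, 0.51],
    [0.49, 0.50, 0.50, 0.49, 0.51, 0.49],
]


def p(i, j):
    return P[TRANSFERS.index(i)][TRANSFERS.index(j)]


def bus_activities(binding):
    """binding[c][k] is the transfer on bus k in cycle c (wrap-around assumed)."""
    n_cycles, n_buses = len(binding), len(binding[0])
    return [
        sum(p(binding[c][k], binding[(c + 1) % n_cycles][k])
            for c in range(n_cycles))
        for k in range(n_buses)
    ]


def objective(binding, kind="mtf"):
    """Per-line SA_max (smaller means larger MTF) or per-line sum (energy)."""
    acts = bus_activities(binding)
    return max(acts) if kind == "mtf" else sum(acts)


def refine(binding, kind="mtf"):
    """Swap bus assignments within a cycle while the objective improves."""
    binding = [list(row) for row in binding]
    best = objective(binding, kind)
    improved = True
    while improved:
        improved = False
        for row in binding:
            for k1, k2 in combinations(range(len(row)), 2):
                row[k1], row[k2] = row[k2], row[k1]
                cand = objective(binding, kind)
                if cand < best - 1e-12:
                    best, improved = cand, True
                else:
                    row[k1], row[k2] = row[k2], row[k1]  # undo the swap
    return binding, best


start = [["b", "a"], ["c", "e"], ["d", "f"]]  # hypothetical initial binding
refined, best = refine(start)
print(objective(start), best, refined)
```

Since $T_{max}$ is fixed by the schedule, lowering the per-line $SA_{max}$ raises the MTF estimate; passing `kind="energy"` minimizes the summed activity instead. A greedy search of this kind can stop in a local optimum.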

## 4 Results

The results of the algorithm on three high-level synthesis benchmarks (FIR, AR and Elliptic filter) are discussed [5]. The addition and multiplication operations are assumed to require one clock cycle. Each bus in the synthesized microarchitectures contains 8 lines. Each line of the primary input is spatially and temporally independent and has a signal probability of 0.5. The length, width and thickness of the metal lines are assumed to be $4.5 \cdot 10^4\ \mu m$, $1\ \mu m$ and $0.1\ \mu m$, respectively [3]. The temperature was assumed to be $85^{\circ}$C.

| Filter   | # of buses | Perf $T_{max}$ (cycles) | Energy $SA_{tot}$ (activity) | BEM MTF ($10^5$ h) | WEM MTF ($10^5$ h) | BERT-EM MTF ($10^5$ h) |
|----------|------------|-------------------------|------------------------------|--------------------|--------------------|------------------------|
| FIR      | 1          | 25                      | 58.3                         | 0.90               | 0.55               | 1.2                    |
| FIR      | 2          | 20                      | 62.1                         | 1.51               | 0.54               | 1.4                    |
| FIR      | 3          | 22                      | 63.5                         | 1.73               | 0.68               | 1.7                    |
| FIR      | 4          | 22                      | 80.1                         | 1.62               | 0.79               | 2.2                    |
| FIR      | 5          | 20                      | 78.2                         | 1.84               | 0.60               | 2.3                    |
| FIR      | 6          | 19                      | 72.3                         | 1.24               | 0.57               | 1.4                    |
| AR       | 1          | 35                      | 73.1                         | 2.9                | 0.69               | 3.1                    |
| AR       | 2          | 27                      | 80.4                         | 3.27               | 0.91               | 4.2                    |
| AR       | 3          | 21                      | 78.6                         | 2.80               | 0.86               | 3.3                    |
| AR       | 4          | 19                      | 88.4                         | 2.31               | 1.24               | 3.2                    |
| AR       | 5          | 18                      | 77.3                         | 2.66               | 1.06               | 3.4                    |
| AR       | 6          | 17                      | 77.0                         | 2.79               | 1.16               | 3.5                    |
| Elliptic | 1          | 36                      | 133.3                        | 1.9                | 0.55               | 2.5                    |
| Elliptic | 2          | 36                      | 134.7                        | 3.27               | 1.24               | 3.1                    |
| Elliptic | 3          | 35                      | 126.3                        | 4.11               | 1.06               | 3.9                    |
| Elliptic | 4          | 34                      | 145.7                        | 4.11               | 1.24               | 4.0                    |
| Elliptic | 5          | 38                      | 140.7                        | 5.71               | 1.53               | 4.5                    |
| Elliptic | 6          | 33                      | 132.6                        | 7.16               | 1.54               | 6.5                    |
| Avg.     |            | 26.5                    | 94.58                        | 2.83               | 0.93               | 3.0                    |

Table 2: Results on benchmarks after optimizing for EM-reliability

The results of optimizing for EM-reliability are summarized in table 2. "Perf" denotes the performance of the reliability-optimized microarchitecture, and "Energy" denotes the energy dissipated on the buses in the microarchitecture. "BEM" and "WEM" stand for the best and worst MTF values of the microarchitectures investigated by our algorithm. The MTF values have units of $10^5$ hours. The MTF values for the best-reliability microarchitectures are on average 2.9 times the MTF values of the worst-reliability microarchitectures. This indicates the large range of reliability values possible for microarchitectures synthesized using the proposed method of suitably altering the sequence of data transfers on buses. Thus, the proposed method offers considerable scope for reliability optimization during microarchitecture synthesis. An increase in the number of buses reduces both the switching activity accumulated on the lines in the circuit during one CDFG computation and $T_{max}$. Because the MTF depends on the ratio $SA_{max}/T_{max}$, the variation of the MTF as a function of the number of buses is uncertain.

The MTFs of the reliability-optimized layouts of our circuits, obtained using the circuit-level reliability simulator BERT [2], are shown in column 7 of table 2. The layouts of the circuits are simulated for 500 clock cycles. The primary inputs, which are spatially and temporally independent, are generated with a signal probability of 0.5.

| Filter   | # of buses | Perf $T_{max}$ (cycles) | EM MTF ($10^5$ hours) | BEnergy $SA_{tot}$ (activity) | WEnergy $SA_{tot}$ (activity) |
|----------|------------|-------------------------|-----------------------|-------------------------------|-------------------------------|
| FIR      | 1          | 19                      | 0.80                  | 58.3                          | 88.1                          |
| FIR      | 2          | 20                      | 1.12                  | 56.7                          | 89.8                          |
| FIR      | 3          | 18                      | 1.06                  | 55.1                          | 88.9                          |
| FIR      | 4          | 17                      | 1.09                  | 54.4                          | 89.3                          |
| FIR      | 5          | 18                      | 1.24                  | 54.5                          | 89.6                          |
| FIR      | 6          | 16                      | 1.20                  | 52.1                          | 90.1                          |
| AR       | 1          | 35                      | 2.79                  | 73.1                          | 119.3                         |
| AR       | 2          | 19                      | 1.42                  | 75.3                          | 119.5                         |
| AR       | 3          | 17                      | 1.33                  | 73.7                          | 119.3                         |
| AR       | 4          | 21                      | 1.59                  | 74.0                          | 119.2                         |
| AR       | 5          | 19                      | 1.94                  | 72.1                          | 121.0                         |
| AR       | 6          | 17                      | 2.21                  | 68.2                          | 120.0                         |
| Elliptic | 1          | 36                      | 1.24                  | 133.3                         | 164.7                         |
| Elliptic | 2          | 36                      | 2.42                  | 125.6                         | 168.8                         |
| Elliptic | 3          | 31                      | 2.11                  | 122.6                         | 171.4                         |
| Elliptic | 4          | 34                      | 2.21                  | 124.9                         | 174.3                         |
| Elliptic | 5          | 38                      | 3.87                  | 122.0                         | 167.7                         |
| Elliptic | 6          | 31                      | 4.66                  | 120.7                         | 181.9                         |
| Avg.     |            | 24.6                    | 1.91                  | 84.22                         | 126.8                         |

Table 3: Results on benchmarks when optimized for energy.

The MTF of the best-reliability microarchitecture is on average 48% more than the MTF of the best-energy microarchitecture (table 3). This indicates that minimizing energy does not necessarily maximize MTF. The MTF values in table 2 indicate a strong correlation between the estimated and simulated values. The high-level metric proposed is therefore a dependable indicator of the electromigration-induced failure in the circuit. The best-reliability schedule obtained for the FIR filter and its corresponding microarchitecture are shown in figure 4. The MTF and the energy values of the best-energy microarchitectures synthesized using the proposed algorithm are shown in columns 4 and 5 of table 3.
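The 48% figure is consistent with the average MTF values in the "Avg." rows of tables 2 and 3. This reading is inferred from the tabulated averages; the text does not spell out the calculation:

$$\frac{MTF_{avg}\ (\text{best-reliability, table 2})}{MTF_{avg}\ (\text{best-energy, table 3})} = \frac{2.83 \cdot 10^5\ h}{1.91 \cdot 10^5\ h} \approx 1.48$$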

## 5 Conclusion

We proposed a technique to synthesize high-reliability, low-energy microarchitectures at the RT level. Comparisons on benchmarks indicate that an average decrease of 66% in electromigration failures can be achieved between the best-reliability and worst-reliability designs. Moreover, on average, the MTF of the best-reliability microarchitecture was 48% higher than the MTF of the best-energy microarchitecture. Thus, microarchitectures optimized for electromigration and microarchitectures optimized for power dissipation do not coincide. Finally, circuit-level reliability simulations using BERT [2] validate the proposed metric for electromigration-induced reliability.



Figure 4: (a) The Gantt chart and (b) the data path for the best-reliability solution for the FIR filter.

### References

- [1] W. Bowhill et al, "Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU", Digital Technical Journal, Vol 7, No. 1, 1995.
- [2] C. Hu, "The Berkeley Reliability Simulator BERT: an IC Reliability Simulator", Microelectronics Journal, 23, pp. 97-102, 1992.
- [3] A. Christou, "Electromigration and electronic device degradation", John Wiley and Sons, 1994
- [4] J. Clement, E. Atakov and J. Lloyd, "Electromigration Reliability of VLSI Interconnect", Digital Technical Journal, Vol 4, No. 2, 1992.
- [5] A. Dasgupta and R. Karri, "High Reliability Low Energy Microarchitecture Synthesis", Technical Report, Dept of Elec. and Comp. Engg., Univ of Mass., 1995.
- [6] A. Dasgupta and R. Karri, "Hot-Carrier Reliability Enhancement via Input Reordering and Transistor Sizing", Proceedings of DAC, 1996.
- [7] J. E. Hall et al, "SPIDER A CAD system for modeling VLSI metallization patterns", Proceedings of ICCAD, Nov, 1986.
- [8] T. S. Hohol and L. A. Glasser, "RELIC: A reliability simulator for integrated circuits", in Proceedings of IEEE ICCAD, 1986.
- [9] B. Liew, N. Cheung, and C. Hu, "Projecting interconnect electromigration lifetime for arbitrary current waveforms", IEEE Transactions on Electron Devices, vol. 37, no 5, May 1990
- [10] J. A. Maiz, "Characterization of electromigration under bidirectional (BC) and pulsed unidirectional (PDC) currents", in Proceedings of IRPS, 1989.
- [11] F. N. Najm, R. Burch, P. Yang, I. Hajj, "CREST A current estimator for CMOS circuits", Proceedings of ICCAD, 1988.
- [12] F. N. Najm, "Transition Density, A Stochastic Measure of Activity in Digital Circuits", 28th ACM/IEEE Design and Automation Conference, pp 644-649, 1991.
- [13] K. Roy and S. Prasad, "Logic Synthesis for Reliability - An Early Start to Controlling Electromigration and Hot Carrier Effects", Proceedings of EDAC, 1994.