# Efficient Early Stage Resonance Estimation Techniques for C4 Package<sup>\*</sup>

Jin Shi<sup>1</sup>, Yici Cai<sup>1</sup>, Shelton X-D Tan<sup>2</sup>, Xianlong Hong<sup>1</sup>

<sup>1</sup> Department of Computer Science and Technology, Tsinghua University, Beijing, P.R.China, 100084 Tel: +86-10-62785564
e-mail: shi-j03@mails.tsinghua.edu.cn caiyc@mail.tsinghua.edu.cn hxl-dcs@mail.tsinghua.edu.cn

<sup>2</sup> Department of Electrical Engineering, University of California at Riverside, CA 92521, USA e-mail: stan@ee.ucr.edu

Abstract - In this paper, we study the relationship between C4 package resonance effects and logical switching timing correlations, which has not been thoroughly investigated in the past. We show that improper logic designs with certain timing correlations can lead to large adverse voltage drops, which are due to resonance effects in the widely used C4 package. We first present numerical analysis results on industry C4 package circuits to demonstrate the resonance phenomenon. Then we propose a simple algorithm to compute the worst-case logical timing correlations among cells leading to resonance. Finally, we develop an efficient technique for the early logic design stage to estimate the resonance risk. Experimental results demonstrate the effectiveness of the proposed method for the accurate prediction of the resonance effect in the C4 package.

# I. Introduction

As the power consumption of modern high-performance VLSI chips increases rapidly, reliable on-chip power delivery and the associated verification methods have become major design challenges. In addition, wire-bond packages find it increasingly difficult to accommodate large numbers of I/O pins and the associated power dissipation, and they introduce significant package resistance and inductance. Flip-chip packages such as C4 (Controlled Collapse Chip Connection) are therefore becoming widely used in high-performance, power-hungry designs because of their superior electrical performance [1][2]. However, a C4 package may suffer large voltage drops due to package resonance caused by improperly designed logical switching timing correlations. Traditionally, the design and optimization of packages are independent of the design and optimization of power delivery networks and of the logic (and its timing). This separation in the design flow may cause problems such as resonance-induced drops, because of the strong interplay among the package, the power delivery network, and logic timing. To the best of the authors' knowledge, only a few works consider package and power/ground network co-design [3][4], and research on logic-package co-design is seldom reported. The relationship between package electrical characteristics and logic timing has been little investigated.

In this paper, we first perform numerical analysis on real industry C4 package circuits in the frequency domain. We show that if the dynamic current flowing through a C4 bump happens to contain certain frequency harmonics, the voltage drop on the package becomes significant due to adverse resonance effects. Furthermore, if resonance happens in a local area, then by the 'locality' property of the C4 package [5], the power supply in this area will suffer large drops.

Based on our resonance analysis of the C4 package, we further investigate the conditions for generating the resonance-inducing harmonics in the dynamic currents. Our study shows that certain logic switching timing correlations among different gates or cells can lead to such harmonics. The significance of this study is that the resonance risk needs to be estimated, and thereby avoided, in the early (logic) design stage. To this end, we develop an efficient technique to estimate resonance in the early logic design stage. Our experimental results validate the proposed method.

This paper is organized as follows: Section II introduces an industrial C4 package model and its frequency response. Section III analyzes the 'locality' property of C4 and provides a power supply failure example caused by the resonance effects. Section IV discusses the conditions for resonance in the package. Section V presents a fast estimation method. Experimental results are given in Section VI, while Section VII summarizes the paper and presents forthcoming studies.

# II. Frequency Response of C4 Package Model

In this paper, we use a C4 package model provided by our industry partner to discuss the resonance problem. Figure 1 shows a lumped model for a two-metal-layer C4 package. In this model, each bump on the die connects to a package pin through two metal layers and one via. In Figure 1, $L_{top}$, $R_{top}$, $L_{bot}$ and $R_{bot}$ represent the inductance and resistance of the metal traces in the package, while $L_{via}$ and $R_{via}$ represent the inductance and resistance of the via. The package decoupling capacitors are represented by $C_{pkg}$, $R_{cpkg}$ and $L_{cpkg}$. We treat the package model as a four-port RLC network. Because ports 1 and 2 connect to the voltage regulator module (VRM), which has very low internal resistance, we can ground ports 1 and 2 and obtain the package frequency response from ports 3 and 4. The response is shown in Figure 2. It has a single peak around 550 MHz, which indicates that if the current flowing from port 3 to port 4 (the bump current) contains large-amplitude harmonics around this frequency, even at only 100 mA in amplitude, the voltage drop on the package can be as large as 0.6 V. Figure 3 shows a transient analysis result when the bump current resembles a sine wave at the resonance frequency. In this situation, the voltage drop is pronounced.

Fig. 3. Transient analysis of resonance

<sup>\*</sup> This work is supported by National Natural Science Foundation of China (NSFC) 60476014, National Hi-tech R&D Program of China 2005AA1Z1230, and Foundation of Intel Corporation
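To illustrate how a single impedance peak turns a modest harmonic current into a large drop, the following sketch sweeps the impedance magnitude of a one-branch parallel RLC tank. It is not the package model of Figure 1: the topology and the values `L_PKG`, `C_DECAP` and `R_SERIES` are hypothetical, picked only so that the peak falls near 550 MHz with a magnitude of a few ohms.

```python
import cmath
import math

# Hypothetical values (not taken from the package model of Figure 1), chosen
# so that 1 / (2*pi*sqrt(L*C)) lands near 550 MHz.
L_PKG = 0.1e-9      # loop inductance, henries
C_DECAP = 0.837e-9  # capacitance seen from the bump, farads
R_SERIES = 0.02     # series loss of the inductive branch, ohms


def impedance(freq_hz):
    """Impedance of (R + jwL) in parallel with 1/(jwC), seen from the bump."""
    w = 2.0 * math.pi * freq_hz
    z_l = complex(R_SERIES, w * L_PKG)
    z_c = 1.0 / complex(0.0, w * C_DECAP)
    return z_l * z_c / (z_l + z_c)


def find_peak(f_start=100e6, f_stop=1500e6, step=1e6):
    """Linear sweep; returns (frequency, |Z|) at the largest magnitude."""
    best_f, best_z = f_start, abs(impedance(f_start))
    f = f_start
    while f <= f_stop:
        z = abs(impedance(f))
        if z > best_z:
            best_f, best_z = f, z
        f += step
    return best_f, best_z


peak_f, peak_z = find_peak()
# Voltage amplitude produced by a 100 mA harmonic sitting on the peak.
drop_100ma = 0.1 * peak_z
```

With these assumed values the peak impedance is a few ohms, so a 100 mA harmonic at the peak produces a drop of several hundred millivolts, while the same current far from the peak sees a much smaller impedance.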

# III. Locality and Power Supply Failure

In Section II, we observed that if the bump current contains significant harmonics close to the resonance frequency of the package model, the voltage drop on the package becomes significant. But what happens if the package sees such a current in a local area? We use a 7-layer power/ground grid with vias as a test. The circuit model is shown in Figure 4. The test analyzes a 300 um x 300 um local area with a 1.5 V supply delivered through a C4 package and dissipating 20 W. The result is local resonance in the C4 package, as shown in Figure 5. We observe that when the drop on the package is large in area A on M7, the voltage drop in area B, right below A on M2, is large as well. We find that circular area B is larger than circular area A, with the ratio of the radius of B to that of A no greater than 1.5.



Fig. 4. Resistance model of 7 layer grid with vias



Fig. 5. Resonance in local area



Fig. 6. Voltage Distribution on Different Layer

This phenomenon is due to locality [5]. Usually, a C4 package can provide a relatively clean voltage supply because of the abundance of power/ground connections around any small block, so that each block can be supplied sufficiently and separately. However, if one area happens to suffer a large droop, it is less likely to get power supply from an adjacent area. This explains why the package may suffer a power failure in a local area. Further, if we draw the voltage distribution on the different metal layers, as in Figure 6, we can see that the distribution of the lowest-voltage points is almost the same in each layer.

Another interesting point in Figure 6 is that as the layer goes down from M7 to M2, the minimal voltage value in a given layer increases. However, this improvement is not obvious above M6, which indicates that the 'locality' depends strongly on the density of vias in each layer. In M7 and M6, because vias are sparse, the potential difference between two adjacent points can be relatively large. Therefore, points that suffer low voltage are more likely to get compensation from their neighbors. However, as vias become dense in M5-M2, the potential difference between any two local points becomes smaller, so current flows almost vertically through the vias, which suggests that the current flow direction is the main reason for the 'locality'.

# IV. Resonance Caused by Logical Correlation

In Section II, we showed that a bump current with particular harmonic components can cause package resonance. However, we did not analyze the conditions for generating such currents. In this section, we show that such conditions can be linked to certain logical correlations among different gates or cells.



Fig. 7. Basic TW current waveform



Fig. 8. Geometry relationships of discussing elements

Usually, the working frequency of logic cells is much higher than the resonance frequency of the package, so high-frequency current typically does not cause resonance problems. This is especially the case when sufficient decoupling capacitors are placed in the package. On the other hand, the logical timing correlation among cells may generate much lower frequency components that can excite resonance. Let us examine how the timing correlation can cause resonance.

First, we assume that the current generated by gates has a trapezoidal waveform (TW) with a duty ratio of 50%, as shown in Figure 7. This situation is common in digital units, especially in synchronous logic units. Second, we define a local area containing some logic cells, as shown in Figure 8. We also assume that each cell contains many gates, and that gates in the same cell share the same current waveform. Third, we assume all gates are controlled by a synchronous clock, while gates in different cells can have a phase deviation of either 0 or 180 degrees. Lastly, the clock frequency $f_{clk}$ should be higher than, or at least equal to, the package resonance frequency $f_{res}$.
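The waveform assumptions above can be sketched as a small sampler. The edge fraction `edge` (rise and fall time as a fraction of the clock period) is a hypothetical parameter, and the 50% duty ratio is interpreted here as the time spent above half amplitude; neither detail is fixed by the text.

```python
def trapezoid_current(phase, peak, edge=0.1):
    """Current at normalized time `phase` in [0, 1) of one clock period.

    The pulse rises over `edge`, stays at `peak` until 0.5, then falls over
    `edge`, so it is above half amplitude for half of the period.
    """
    if phase < edge:                 # rising edge
        return peak * phase / edge
    if phase < 0.5:                  # flat top
        return peak
    if phase < 0.5 + edge:           # falling edge
        return peak * (0.5 + edge - phase) / edge
    return 0.0                       # off for the rest of the period


def sample_cell(peak, k, edge=0.1, phase_shift=0.0):
    """k evenly spaced samples of one period.

    phase_shift=0.5 models a cell with 180-degree phase deviation.
    """
    return [trapezoid_current((j / k + phase_shift) % 1.0, peak, edge)
            for j in range(k)]


in_phase = sample_cell(2.0, 8)                      # 0-degree cell
anti_phase = sample_cell(2.0, 8, phase_shift=0.5)   # 180-degree cell
```

Adding `in_phase` and `anti_phase` element by element shows how two cells of opposite phase fill each other's idle half period.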

We can then obtain the algorithm in Figure 9, which produces a worst-case logical correlation for the resonance problem. The main idea of this algorithm is to arrange the on/off timing of the cells so that the resulting total current resembles a sinusoidal wave at the resonance frequency.

Specifically, we first divide the cells into two classes, one with 0-degree phase deviation and the other with 180-degree phase deviation. Then we set a certain number of cells active according to an estimate obtained from the minimal current consumption among all the cells at a given time step. If Eq. (1) is satisfied, there are sufficient cells to perform the allocation, which means that after adding the currents of all cells together, a sinusoidal-like current waveform is generated, as shown in Figure 10.

$$k \le \left[\frac{A}{\min\left\{IC_{i}\right\}}\right] \le \frac{N}{2} \tag{1}$$

The notations in the algorithm are explained below:

$T_{res}$: the reciprocal of the package resonance frequency $f_{res}$

$T_{clk}$: the clock cycle time

$\sigma = \left[\frac{T_{res}}{2T_{clk}}\right]$

N: the number of cells

A: the tolerance of harmonic amplitude around $f_{res}$

$IC_i$: the current when all gates in cell i are turned on by the clock

k: sample ratio factor, an integer bigger than 1 and less than $\sigma$

Here, variable k controls the similarity between an ideal sinusoidal wave and the actual current waveform we can obtain. Because the total current only has a positive part, it contains dominant harmonics around its second harmonic frequency, which is equal to the package's resonance frequency. Therefore, according to the 'locality' property, resonance will occur on the package in this area. As an example, Figure 11 illustrates a simple timing correlation among 4 cells. The rising edge of cell A causes one active cycle of cells B and C, then of cell D. We can observe that the total current on the bump is very similar to a sine wave.

In theory, the timing correlation generated by this algorithm is the worst one, because the amplitude of harmonics around the resonance frequency can be made as large as possible. In practice, situations are more complicated than the worst case discussed here: Eq. (1) may not be satisfied, and the decoupling effect, which can alter the resonance frequency, should also be considered. Nevertheless, we believe that the risk is real in logic design and that methods should be developed to estimate it.
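The steps of the resonance trigger algorithm (Figure 9) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample points are taken as $x_j = j\pi/k$, the bracket in Eq. (1) is read as a floor, and the function and argument names are hypothetical.

```python
import math


def satisfies_eq1(amplitude, k, cell_currents):
    """Check Eq. (1): k <= floor(A / min IC_i) <= N / 2 (floor is assumed)."""
    peak_cells = math.floor(amplitude / min(cell_currents))
    return k <= peak_cells <= len(cell_currents) / 2


def resonance_trigger(amplitude, k, cell_currents, phases):
    """Return an N x k on/off matrix T following the steps of Figure 9.

    amplitude     -- A, the harmonic amplitude parameter
    k             -- sample ratio factor (number of time steps)
    cell_currents -- IC_i for each of the N cells
    phases        -- phase deviation of each cell in degrees (0 or 180)
    """
    n = len(cell_currents)
    ic_min = min(cell_currents)
    # Step 1: split the cell indices by phase deviation.
    group0 = [i for i in range(n) if phases[i] == 0]
    group180 = [i for i in range(n) if phases[i] == 180]
    # Step 5: N x k zero matrix.
    t = [[0] * k for _ in range(n)]
    for j in range(k):
        # Steps 2-3: sample point in [0, pi); x_j = j*pi/k is an assumption.
        x_j = j * math.pi / k
        # Step 4: number of cells to activate at time step j.
        f_j = math.floor(amplitude * math.sin(x_j) / ic_min)
        # Step 6: activate half of f_j cells in each group.
        half = f_j // 2
        for group in (group0, group180):
            for i in group[:half]:
                t[i][j] = 1
    return t


# Eight identical cells of 10 mA each, target amplitude 40 mA, k = 4.
currents_ma = [10] * 8
phases_deg = [0] * 4 + [180] * 4
T = resonance_trigger(40, 4, currents_ma, phases_deg)
active_per_step = [sum(row[j] for row in T) for j in range(4)]
```

In this toy case the number of active cells per step rises and falls like a sampled half sine, which is the shape the algorithm aims for.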

# V. Fast Resonance Estimation in Early Logic Design

Complex logical subsystems, such as 'clock gating' controller [6], bus controller, memory controller and DMA controller, usually contain many function cells; therefore, they are more likely to have resonance problems. However, in today's design flows, designers typically ignore the resonance risk due to lack of efficient estimation tools for the resonance verification. In this section, we present a novel technique to help logic designers estimate the resonance risk. Before introducing our algorithm, let's define some parameters:

R: logical correlations described as regular expressions

N: number of cells in a local area

D: number of cycles in a time window

k: a sample ratio factor, the same as defined in Section IV

T: an N x k zero matrix

H(f, a): package frequency response function, where the input parameter *f* is the harmonic frequency and *a* is the harmonic amplitude; the output of this function is the voltage droop on the package under *f* and *a*

$\{\alpha_1 \cdots \alpha_N\}$: a vector used to compute the current on vias

$\{\beta_1 \cdots \beta_N\}$: a vector used to compensate the blank area effect

$f_{res}$: the resonance frequency

$f_{clk}$: the clock frequency

Algorithm Name: resonance trigger

Input: time window, A, k, N, *IC*

Output: on/off matrix T

- 1) divide all the cells into two groups according to their phase deviation
- 2) map the time variable t within the time window of interest onto a standard interval ranging from 0 to $\pi$
- 3) use the sample ratio factor k to get a vector of k elements, the j-th of which is $x_j$

$$x_j = \frac{j\pi}{k}, \quad 0 \le j \le k - 1$$

- 4) estimate the maximal number of cells that should be activated at time step j according to the formula below

$$f_j = \left\lfloor \frac{A\sin(x_j)}{\min\{IC_i\}} \right\rfloor$$

- 5) generate a zero matrix T with N rows and k columns
- 6) for each time step j, activate half of $f_j$ cells in each cell group, that is, set T(i,j)=1 in matrix T
- 7) output matrix T

Fig. 9. Resonance trigger algorithm

Fig. 10. Total current of all cells

Fig. 11. Logical correlation to cause resonance

Our algorithm accepts a given logic design as input but requires a regular-expression description of the timing correlation, as defined in Eq. (2). This form can represent any logic timing correlation among different cells. For instance, Eq. (3) below states that cell 13 is turned on 5 clock cycles after cell 12's rising edge and 6 clock cycles after cell 11's rising edge, and that it is always turned on at cycles 5 and 6.

$$C[0-9]^* \; \{[0-9]^*+\} \; \{C[0-9]^*+[0-9]^*\} \tag{2}$$

$$\text{C13 5+6+ C12+5 C11+6} \tag{3}$$
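As an illustration of how an expression of the form (2) might be broken into the two kinds of tuples used later by the estimator, the sketch below parses the example (3). The whitespace-separated token layout and the function name `parse_correlation` are assumptions made for this example; the paper does not specify a concrete syntax beyond Eq. (2).

```python
import re


def parse_correlation(expr):
    """Split one expression such as 'C13 5+6+ C12+5 C11+6' into its parts.

    Returns (cell, absolute_cycles, dependencies), where dependencies is a
    list of (source_cell, delay_in_cycles) pairs.
    """
    tokens = expr.split()
    head = re.fullmatch(r"C(\d+)", tokens[0])
    if head is None:
        raise ValueError("expression must start with a cell name like C13")
    cell = int(head.group(1))
    absolute, dependencies = [], []
    for token in tokens[1:]:
        dep = re.fullmatch(r"C(\d+)\+(\d+)", token)
        if dep:
            # "C12+5": turned on 5 cycles after cell 12's rising edge.
            dependencies.append((int(dep.group(1)), int(dep.group(2))))
        elif re.fullmatch(r"(\d+\+)+", token):
            # "5+6+": always turned on at cycles 5 and 6.
            absolute.extend(int(c) for c in token.split("+") if c)
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return cell, absolute, dependencies


parsed = parse_correlation("C13 5+6+ C12+5 C11+6")
```

Here `parsed` holds the target cell 13, the fixed cycles 5 and 6, and the two dependencies on cells 12 and 11.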

We then have the estimation algorithm, named the FFT (Fast Fourier Transform) estimator, in Figure 12. The algorithm consists mainly of three parts. First, it constructs the actual current waveform of every cell according to the logic timing constraints in a given time window. Second, it computes the frequency spectrum of each cell's current waveform via FFT and compensates for the decoupling effect and the blank area effect. Finally, all the modified spectra are added together, and the package drop is estimated from the package's frequency response.

Usually, decoupling capacitors are placed in each cell, and they make the current waveform at the supply nodes differ from the waveform of the current sources connected to them at a given time point. Here we assume that all current sources in a cell share the same on/off waveform, because all gates in a cell are usually controlled by the same clock signal. We then obtain the decoupling model shown in Figure 13(1). Recall that because of the 'locality', current flows almost directly from via to via without diverging. Therefore, Figure 13(1) can be simplified to Figure 13(2). Now we can estimate the equivalent resistance of the power grid and ground grid to get Figure 13(3) (according to 'locality', the sum of the via resistances from M7 to M3 can be roughly taken as $R_P$ and $R_G$). Finally, a linear transform can be performed to obtain the coefficient $\alpha$, which gives the current flowing through via resistor $R_{\rm u}$.

Another effect that should be compensated is the blank area effect. A blank area is an area that contains decoupling capacitors but is not within any cell boundary. In our algorithm, we only consider the decoupling capacitors in each cell and add all currents together to make the estimate. However, decoupling capacitors are also placed in blank areas. Therefore, when actual current flows through these blank areas, additional current is supplied by their decoupling capacitors. This is why we use a coefficient $\beta$ to compensate for this effect in our algorithm. When decoupling capacitors are evenly distributed, the simplest way to estimate $\beta$ is to use the ratio of the blank area to the sum of all cell areas.

One advantage of this algorithm is that in the early design stage, logic correlations, the total decoupling value, and via resistance are easy to obtain from the product specification and from experience with existing products. Also, the compensation parameters in the algorithm are relatively easy to compute for the C4 package model. Further, because the number of cells in a local area is usually small and the timing window is unlikely to be very wide, the computation cost of this algorithm is very low.
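The spectral part of the FFT estimator (waveform construction through the final maximum in Figure 12) can be sketched as below. Several choices here are assumptions rather than details from the paper: a naive DFT stands in for an FFT so that the example needs no external library, spectra are scaled by 2/p so that magnitudes read as harmonic amplitudes, each $\alpha_m$ is modeled as a callable of angular frequency, and `h` is a hypothetical stand-in for the package response H(f, a). The regular-expression partition and logical assignment steps are omitted, so the on/off matrix is supplied directly.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform, O(p^2); stands in for an FFT."""
    p = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * n * t / p)
                for t in range(p))
            for n in range(p)]


def estimate_droop(on_off, waveforms, f_clk, f_res, alpha, beta, n_bumps, h):
    """Estimate the worst package droop for an already-filled on/off matrix.

    on_off    -- N x D matrix of 0/1, one column per clock cycle
    waveforms -- per cell, k samples of its current over one clock cycle
    alpha     -- per cell, a callable alpha_m(omega) (assumed form)
    beta      -- per cell, blank-area compensation factor
    h         -- h(f, a): droop for harmonic frequency f and amplitude a
    """
    n_cells, d = len(on_off), len(on_off[0])
    k = len(waveforms[0])
    p = d * k                      # total number of sample points
    f_s = f_clk / d                # frequency resolution of the spectrum
    total = [0j] * p
    for m in range(n_cells):
        # Expand each cycle into k samples (zeros when the cell is off).
        row = []
        for j in range(d):
            row.extend(waveforms[m] if on_off[m][j] else [0.0] * k)
        # Spectrum, scaled so |z[n]| is the amplitude of harmonic n.
        z = [c * 2.0 / p for c in dft(row)]
        # Compensate decoupling and blank-area effects, then accumulate.
        for n in range(p):
            total[n] += z[n] * alpha[m](2.0 * math.pi * n * f_s) * beta[m]
    s = [abs(c) / n_bumps for c in total]
    # Worst droop over harmonics within [4/5, 5/4] of f_res.
    band = [n for n in range(1, p // 2)
            if 0.8 * f_res <= n * f_s <= 1.25 * f_res]
    return max((h(n * f_s, s[n]) for n in band), default=0.0)


# Toy case: one cell, on for one cycle and off for the next, 1 GHz clock.
droop = estimate_droop(
    on_off=[[1, 0]],
    waveforms=[[1.0, 1.0, 1.0, 1.0]],
    f_clk=1e9, f_res=550e6,
    alpha=[lambda w: 1.0], beta=[1.0], n_bumps=1,
    h=lambda f, a: 6.0 * a,        # hypothetical flat 6-ohm response
)
```

In the toy case the on/off pattern repeats every two clock cycles, so its fundamental sits at 500 MHz, inside the band around the assumed 550 MHz resonance, and dominates the estimate.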

# VI. Experimental Results

We implemented our algorithm in C++ and tested it on a 1 GHz Linux workstation with 512 MB of memory. Table I gives the performance of our algorithm when estimating the package drop for a 7-layer P/G grid with decoupling capacitors in a local area. For comparison, we pass the timing information constructed by our algorithm to Hspice. From the table, we can see that our algorithm gives a valid estimate of the droop on the package when resonance is likely to happen, while it is less accurate when the current contains harmonics far from the resonance frequency. However, compared with [7]-[10], which try to obtain more accurate solutions through dynamic simulation, accuracy is not a serious problem here because we are concerned with the resonance risk. Our method therefore remains suitable for early estimation, with low computational complexity.

Algorithm Name: FFT estimator

Input: R, N, D, T, k, $H(f, a)$, $\{\alpha_1 \cdots \alpha_N\}$, $\{\beta_1 \cdots \beta_N\}$

Output: estimation of droop on the package

- 1) partition all regular expressions into two-tuples (Ci, Cj+m) or (Ci, t)
- 2) repeat the logical assignment until no changes in T happen: if a check finds that column j of matrix T contains nonzero elements, it checks whether a tuple (Ci, Cj+m) exists, and if so, it sets element T(i,j+m) to 1. Also, if a tuple (Ci, t) exists, it sets T(i,j) to 1 directly.
- 3) transform each row of matrix T into a time-domain vector: replace every nonzero element T(i,j) with a nonzero vector of length k, obtained by sampling the trapezoidal waveform of cell i at k time points, and replace every zero element of T with a zero vector of length k
- 4) after 3) we get a new N x p matrix T1, where p is the total number of sample points. Apply the Fast Fourier Transform (FFT) to T1 to get the spectrum matrix Z.
- 5) after 4), |Z(i, j)| represents the harmonic amplitude of cell i at frequency $j \cdot f_s$, where $f_s = \frac{f_{clk}}{D}$
- 6) for each element of Z, i.e., Z(m, n), multiply by the factor $\alpha_m(2\pi n \cdot f_s)$ to compensate the decoupling effect
- 7) for each row m of matrix Z, multiply by the factor $\beta_m$ to compensate the blank area effect
- 8) sum all the row vectors of Z to get a 1 x p vector S, then divide S by the number of bumps in the area
- 9) compute

$$\max \left\{ H(j \cdot f_s, |S(j)|) \right\} \quad j \cdot f_s \in \left[\frac{4f_{res}}{5}, \frac{5f_{res}}{4}\right] \tag{4}$$

- 10) output (4) as the result

Fig. 12. FFT estimator algorithm

# VII. Summary and Future Work

In this paper, we analyzed the resonance effect in the C4 package and its relation to the timing correlations of logic cells. We showed that resonance effects do exist in local areas of the C4 package. We then analyzed the conditions for generating the resonance effects from the perspective of logic cell timing correlations. We consequently proposed an efficient algorithm to perform early estimation of resonance effects. The experimental results show that the proposed algorithm can predict the resonance risk well with a reasonable run time.

The resonance problem we studied is quite different from the usual P/G droop problems; it cannot be overcome by simply allocating more metal resources. To remove or reduce the resonance effects, further investigations are needed. For instance, one method is to optimize the placement of decoupling capacitors to minimize the harmonic amplitudes around the resonance frequency at the bump nodes.

Also, the simple resonance trigger algorithm can only give an ideal correlation among cells that causes resonance. Future research can include further investigation of how to provide more realistic constraints to help logic designers.

# Acknowledgements

The authors would like to thank Dr. Eli Chiprout from Intel Strategy CAD lab for his great help with this research work. He gave many insightful suggestions, and we have learned a lot from the weekly discussions.

# References

- [1]. "Performance characteristics of IC packages", Intel Corp., 2000 http://www.intel.com/design/packtech/ch 04.pdf
- [2]. D. Tönnies, "A review and trends in flip-chip technology", Chip scale review, April 2004.
- [3]. A. Dubey, "P/G pad placement optimization: problem formulation for best IR droop", ISQED 2005 Proceeding, pp 340-345
- [4]. N. Srivastava, X. Qi, K. Banerjee, "Impact of on-chip inductance on power distribution network design for nanometer scale integrated circuits", ISQED 2005 Proceeding, pp 346-351
- [5]. E. Chiprout, "Fast flip-chip power grid analysis via locality and grid shells", ICCAD 2004 Proceeding, pp 485-488
- [6]. H. Li, S. Bhunia, Y. Chen, T.N. Vijaykumar, K. Roy, "Deterministic clock gating for microprocessor power reduction," Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), pp 113.
- [7]. T. Chen and C. C. Chen: "Efficient large-scale power grid analysis based on preconditioned Krylov-subspace iterative methods", DAC2001 Proceedings, pp. 559-562
- [8]. W. Guo, S. X. D. Tan: "Circuit level alternation-direction-implicit approach to transient analysis of power distribution networks", International Conference on ASIC Proceedings, 2003, Beijing, pp. 246-249
- [9]. H. Qian, S. R. Nassif, S. S. Sapatnekar: "Random walks in a supply network", DAC2003 Proceedings, pp 93-98
- [10]. J. N. Kozhaya, S. R. Nassif and F. N. Najm: "A multigrid-like technique for power grid analysis", IEEE Trans. Computer-Aided Design, vol.21, no.10, Oct. 2002, pp 1148-1160

| Cell Num | Area Size (um x um) | Area Decap (pF/um) | Time Window | Total Power | Worst Droop on Package, Hspice | Hspice Run Time | Worst Droop on Package, FFT Estimator | FFT Estimator Run Time | Relative Error of Estimation |
|----------|---------------------|--------------------|-------------|-------------|--------------------------------|-----------------|---------------------------------------|------------------------|------------------------------|
| Resonance is less likely to happen; current contains harmonics away from resonance frequency | | | | | | | | | |
| 20       | 100x100             | 400                | 30 ns       | 10 W        | 8 mV                           | 11 s            | 5 mV                                  | <2 s                   | 54%                          |
| 45       | 300x300             | 400                | 30 ns       | 20 W        | 13 mV                          | 26 min          | 7 mV                                  | <2 s                   | 46%                          |
| Resonance is likely to happen; current contains harmonics near resonance frequency | | | | | | | | | |
| 20       | 100x100             | 100                | 30 ns       | 10 W        | 0.43 V                         | 11 s            | 0.38 V                                | <2 s                   | 11.6%                        |
| 45       | 300x300             | 100                | 30 ns       | 20 W        | 0.46 V                         | 26 min          | 0.49 V                                | <2 s                   | 6.5%                         |

TABLE I Performance of FFT Estimator Algorithm (Obtained on a 1 GHz Linux Workstation with 512 MB Memory)