# Transition Skew Coding: A Power and Area Efficient Encoding Technique for Global On-Chip Interconnects

Charbel J. Akl and Magdy A. Bayoumi

The Center for Advanced Computer Studies (CACS) University of Louisiana at Lafayette Lafayette, LA 70504 {cja3455, mab}@cacs.louisiana.edu

Abstract - Global signaling is becoming increasingly challenging as technology scales into the deep submicron regime. We propose a new bus encoding technique, transition skew coding, that targets many global interconnect challenges, including crosstalk, peak energy and current, switching and leakage power, repeater area, wiring area, signal integrity, and noise. Simulations are performed on different bus lengths using a 90nm library. Repeater sizing and spacing are optimized, and the proposed encoded bus is compared against a standard bus and a bus with shields inserted between every two wires. The encoding and decoding latencies are also analyzed. Simulations show that transition skew coding is efficient in terms of energy and area, with low encoding and decoding latency overhead.

## I. Introduction

The continuous scaling of CMOS technology has led to reduced gate delay, increased integration density, and lower energy per transition [1]. On the other hand, several new design issues related to global signaling, leakage power, and power management and distribution have arisen [2]. As global wire dimensions decrease, the RC product, which determines the delay per unit length, increases. Therefore, global interconnects become a limiting factor on the operating frequency. Also, as interconnects become denser and their aspect ratio increases, the coupling capacitance between neighboring wires increases and becomes a significant portion of the total capacitance. As a result, the delay and power of a wire become strongly dependent on the switching behavior of its neighbors, and wires become more susceptible to noise due to the charge injected into quiet wires by neighboring switching wires. Shielding and spacing are commonly used approaches to avoid capacitive coupling [3]. Shield insertion is also an effective technique to reduce inductive effects [4]. However, these techniques reduce the integration density, leading to an increase in the number of metal layers and in manufacturing cost.

Power is another critical issue for global interconnects. Global buses account for a large portion of the total chip power due to the large capacitances that are charged and discharged every time a transition occurs. Besides the average power, the peak energy caused by the worst switching behavior, which occurs when all the bus wires switch simultaneously, is important for reliability and thermal management [5]. Moreover, the interconnect peak current, which affects power line integrity, is becoming significant as chips grow more complex and the number of global wires increases.

Another source of power consumption that is growing in importance is repeater leakage current. Repeaters and buffers account for around 50% of the total device width on chip [6]. Therefore, the leakage currents flowing through these repeaters are significant and contribute most of the chip leakage current. Also, as the number of metal layers increases from generation to generation, wiring and device area become important issues. It was projected that the number of repeaters in a high performance microprocessor will be in the range of 1-2 million for the 70nm technology node [7], and it was estimated that 70% of the total cell count at the 32nm node will be due to repeaters [8]. Reducing the number of repeaters without affecting signal integrity, and reducing the number of wires, are therefore further challenges.

A great deal of work has been done to overcome interconnect problems. Differential current sensing, which is based on current-mode signaling, was proposed in [9]. This technique suffers from static power, electromigration, and self-heating, and it has a large wiring area overhead. The authors in [10] propose low-swing signaling for low power. Reducing the swing incurs a significant performance penalty and a reduced noise margin, and it requires an extra power grid. A transition-encoded dynamic bus technique was proposed in [11]. This technique requires two global transitions for every input transition, leading to increased power dissipation; it is also more sensitive to noise than a static bus. Wave pipelining [12] increases the bus throughput, but it requires more repeaters than the optimum number, and it is very sensitive to capacitive coupling; therefore shields are required, which increases the wiring area. Moreover, the peak current may increase due to the overlap of many transitions on the same wire. Boosters [13] and transition-aware global signaling [7] drive long wires without any repeater insertion. However, it was shown that signal integrity over long wires is a limiting factor when driving these wires without buffer insertion [14], [15]. Bus-invert coding [16] reduces the number of transitions on a bus, but it does not target crosstalk or leakage reduction, and the number of global wires and repeaters increases as more invert lines are used. Spatial encoding circuits for peak power reduction based on bus-invert coding were proposed in [17]. Interconnect leakage power and crosstalk were addressed in [18] through encoding and leakage-aware buffers. This technique has a significant wiring overhead, since every 3 bits are encoded into 4 bits and shields are inserted between every 4 wires. It also has a significant device area overhead due to the repeaters inserted on the extra wires and the upsizing of the high threshold voltage transistors needed to achieve the same delay as the low threshold voltage transistors; moreover, the technique reduces only subthreshold leakage.

In this paper, we propose a new bus encoding technique called transition skew coding. This technique achieves good noise immunity and eliminates crosstalk by placing shields between every two wires without increasing the total number of wires. The total number of repeaters is reduced considerably, leading to a reduction in device area and leakage power. Peak energy and peak current are reduced because fewer transitions occur simultaneously on the bus. The average power reduction increases as the input switching activity increases. The information of two signals is sent over one wire through a single transition, and it is recovered based on the time at which the transition occurs. Pulse width modulation, pulse coding, and phase coding are other time-based signaling techniques. However, these techniques send a pulse on the wire, which takes two global transitions, and they require complicated encoding and decoding circuits, leading to large overheads.

This paper is organized as follows. The simulation model is presented in Section II. The transition skew coding approach is proposed in Section III. Section IV discusses the repeater sizing and spacing optimization method. The design of the encoding and decoding circuits is presented and analyzed in Section V. The experimental results are presented in Section VI. Section VII concludes the paper.

## II. Simulation Model

All circuit simulations in this work are done using HSPICE. An industrial 90nm library (CMOS090) from STMicroelectronics is used for the device and interconnect models. Low threshold voltage devices were used in order to achieve high performance. The supply voltage is 1.1V and all simulations were done at 110 °C. All repeaters and buffers have a P/N ratio of 2.74 to achieve equal rise and fall times. The FO4 delay equals 27ps. The process has 7 metal layers. Global wires are routed on metal-6, since the top layer is usually used for clock and power routing. In order to achieve the high integration density required by scaled technologies, all wires were routed at minimum pitch. Table I shows the wire dimensions and the RC parasitics for minimum pitch routing. A $3\pi$ RC model is used for modeling each segment of a wire. As can be seen from Table I, the coupling capacitances contribute a large portion of the total capacitance. Increasing the spacing between the wires leads to a considerable reduction in coupling capacitance and a slight increase in ground capacitance. However, it has a negative effect on integration density. Therefore, we only consider minimum pitch routing in our work.
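As a rough illustration of why coupling dominates at minimum pitch, the sketch below uses the Table I per-millimetre values to compute the effective capacitance of a middle wire for different neighbor activity, and the Elmore delay of a $3\pi$ segment model. The Miller coupling factors (0, 1, 2) and the Elmore approximation are standard first-order assumptions, not the HSPICE setup used in this work, and the function names are hypothetical.

```python
"""First-order wire model from the Table I parasitics (illustrative only)."""

R_PER_MM = 58.201        # ohm/mm, Table I
CG_PER_MM = 91e-15       # F/mm, total ground capacitance, Table I
CC_PER_MM = 53.5e-15     # F/mm, coupling capacitance to one neighbor, Table I


def effective_cap(length_mm: float, miller_factor: float) -> float:
    """Ground cap plus coupling to two neighbors, scaled by a Miller factor.

    miller_factor = 0: both neighbors switch in the same direction
    miller_factor = 1: neighbors quiet (e.g. grounded shields)
    miller_factor = 2: both neighbors switch in the opposite direction
    """
    return length_mm * (CG_PER_MM + 2 * miller_factor * CC_PER_MM)


def three_pi_model(r_total: float, c_total: float):
    """Return series resistances and node capacitances of a 3-pi ladder.

    Each of the three pi sections has R/3 in series and C/6 at each end,
    so the node caps are C/6, C/3, C/3, C/6 (node 0 is the driven end).
    """
    resistors = [r_total / 3] * 3
    node_caps = [c_total / 6, c_total / 3, c_total / 3, c_total / 6]
    return resistors, node_caps


def elmore_delay(resistors, node_caps) -> float:
    """Elmore delay from node 0 to the far end of a resistor ladder.

    Resistor k (between node k and node k+1) is weighted by all the
    capacitance downstream of it, i.e. nodes k+1 .. end.
    """
    delay = 0.0
    for k, r in enumerate(resistors):
        delay += r * sum(node_caps[k + 1:])
    return delay


if __name__ == "__main__":
    length = 1.0  # mm
    quiet = effective_cap(length, 1.0)
    coupling_share = 2 * CC_PER_MM / (CG_PER_MM + 2 * CC_PER_MM)
    print(f"C_eff (quiet neighbors): {quiet * 1e15:.0f} fF")
    print(f"coupling share of total: {coupling_share:.0%}")
    for m in (0.0, 1.0, 2.0):
        res, caps = three_pi_model(R_PER_MM * length, effective_cap(length, m))
        print(f"Miller {m}: Elmore delay {elmore_delay(res, caps) * 1e12:.2f} ps")
```

For a 1 mm wire with quiet neighbors this gives about 198 fF, of which about 54% is coupling, and a wire-only Elmore delay of about 5.8 ps. The $3\pi$ ladder reproduces the distributed-line Elmore value $RC/2$ exactly, which is one reason it is a convenient segment model.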

## III. Transition Skew Coding

Fig. 1. The proposed n-bit bus structure with n/2 active wires and n/2 shields.

TABLE I

WIRE DIMENSIONS AND RC PARASITICS FOR MINIMUM PITCH ROUTING

| Parameter                        | Value       |
|----------------------------------|-------------|
| Width                            | 0.42 µm     |
| Spacing                          | 0.42 µm     |
| Thickness                        | 0.9 µm      |
| Resistance                       | 58.201 Ω/mm |
| Total ground capacitance         | 91 fF/mm    |
| Single-wire coupling capacitance | 53.5 fF/mm  |

Transition skew coding reduces the number of active wires used for communication, and therefore the number of repeaters, and it reduces the peak current and energy during the worst case switching behavior of the bus. The bus structure is shown in Fig. 1. Signals travel from the output of the encoder flop and arrive at the input of a decoder flop. A buffering stage of one or more inverters is placed between the encoder flop and the first repeater. Each pair of signals is encoded and decoded separately. The encoder and decoder logic are placed before and after the flops, respectively, since global bus latency is critical for the operating frequency. This configuration does not affect the bus throughput, and the encoding and decoding latencies are added to the logic latencies at the transmitting and receiving sides. At each of the transmitting and receiving sides, two delay lines generate two skewed clocks, CLK1 and CLK2. The skew between CLK2 and CLK1 equals the skew between CLK1 and the main system clock. The skew is discussed further in Section V. Initially, all flops need to be reset to a known state at start-up, similar to the transition-encoded dynamic bus [11].

Whenever a transition happens at the input, a transition occurs on the bus line. If both inputs switch, the wire switches at the rising edge of the main system clock. If only the first input switches, the wire switches at the rising edge of CLK1. If only the second input switches, the wire switches at the rising edge of CLK2. At the receiving side, the decoder flop has three outputs, which are asserted at the rising edge of CLK2. If a transition was sent at the rising edge of the main system clock, all three outputs capture it. If it was sent at the rising edge of CLK1, only two outputs capture it. If it was sent at the rising edge of CLK2, only one output captures it. Based on this information, the decoder produces its outputs. The shields inserted between every two wires eliminate the delay variations caused by the coupling capacitances. However, there are other sources of delay variation (process variations, clock skew, temperature, etc.). To make the circuit robust against these variations, the skew between the clocks needs to be increased according to the worst case delay variation.
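The encoding and decoding rules above can be captured in a small behavioral model. This is an idealized abstraction, not the transistor-level circuit: it assumes ideal clocks, models the three decoder-flop outputs as three sampling instants spaced one skew apart (offset by half a skew, an assumption made here for a symmetric margin), and lumps all delay variation into a single offset `variation`. All function names and the default 291ps wire delay (the 9mm shielded-bus delay from Table II) are illustrative choices.

```python
"""Behavioral sketch of transition skew coding for one wire (two input bits)."""
import random
from typing import Optional, Tuple

Pair = Tuple[int, int]


def encode(prev: Pair, new: Pair) -> Optional[int]:
    """Return the launch slot for the wire transition, or None if no change.

    slot 0 -> main clock CLK (both inputs toggled)
    slot 1 -> CLK1 (only the first input toggled)
    slot 2 -> CLK2 (only the second input toggled)
    """
    t1, t2 = prev[0] != new[0], prev[1] != new[1]
    if t1 and t2:
        return 0
    if t1:
        return 1
    if t2:
        return 2
    return None


def samples_seeing(slot: Optional[int], skew: float, delay: float,
                   variation: float = 0.0) -> int:
    """Count how many of three receiver samples observe the new wire level.

    Assumption: samples are taken at delay + (j + 0.5) * skew for j = 0, 1, 2.
    """
    if slot is None:
        return 0
    arrival = delay + slot * skew + variation
    sample_times = [delay + (j + 0.5) * skew for j in range(3)]
    return sum(arrival <= t for t in sample_times)


def decode(prev_out: Pair, count: int) -> Pair:
    """Recover the two bits from the previous outputs and the sample count."""
    a, b = prev_out
    if count == 3:
        return (1 - a, 1 - b)
    if count == 2:
        return (1 - a, b)
    if count == 1:
        return (a, 1 - b)
    return (a, b)


def simulate(n_cycles: int, skew: float = 35e-12, delay: float = 291e-12,
             variation: float = 0.0, seed: int = 0) -> int:
    """Send random bit pairs over one wire and return the decoding errors."""
    rng = random.Random(seed)
    tx_prev: Pair = (0, 0)   # all flops reset to a known state at start-up
    rx_prev: Pair = (0, 0)
    errors = 0
    for _ in range(n_cycles):
        new = (rng.randint(0, 1), rng.randint(0, 1))
        count = samples_seeing(encode(tx_prev, new), skew, delay, variation)
        rx_prev = decode(rx_prev, count)
        errors += rx_prev != new
        tx_prev = new
    return errors
```

Under these assumptions, `simulate(1000)` decodes every cycle correctly as long as `abs(variation)` stays below half the skew; a 20ps offset with a 35ps skew already produces errors. This mirrors the trade-off between skew and robustness discussed in Section V.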

## IV. Repeater Sizing and Spacing Optimization

Repeater sizing and spacing optimization was performed for a standard bus and for a bus with shields inserted between every two wires; we will refer to the latter as the shielded bus. An 8-bit bus is used throughout the simulations, since transition skew coding is independent of the bus width. The buses are optimized to achieve a delay near the optimum delay while taking repeater area and power into consideration. It was observed that, for a given number of repeaters per line, the performance improvement obtained through sizing decreases as the repeater size increases. Therefore, sizing the repeaters to achieve a delay near (rather than at) the optimum gives a considerable area and power improvement. For signal integrity purposes, we impose the constraint that the 10%-90% transition time at any point on the bus must not exceed 3× the transition time at the output of an inverter driving an FO4 load [17], which is 152ps in our case. For every bus length, the optimization was done by searching through a range of segment counts and repeater sizes for the best configuration. The number of segments is searched between 1 and 10, and the repeater NMOS width between 2.4 µm and 28.5 µm (PMOS = 2.74 × NMOS). A receiver of 4× the minimum inverter size is used at the endpoint of the wire in order to produce a fast transition at the input of the receiving flop. When the number of segments is even, the number of inverters in the buffering stage must be odd, and when the number of segments is odd, the number of inverters in the buffering stage must be zero or even. This constraint ensures that the bus line remains non-inverting for any number of segments. A standard master-slave transmission gate flop is used as the storage element. The buffering stage is sized such that all inverters in this stage have an electrical effort [19] equal to the effort of the inverter at the output of the flop. The number of buffering-stage inverters that achieves the smallest delay while satisfying the constraint above is chosen.

Fig. 2 shows the sizing optimization for a 6mm shielded bus divided into three segments. As can be seen, increasing the repeater NMOS width above 16.8 µm gives a very small improvement in bus delay; therefore, 16.8 µm is chosen as the optimal width. After finding the optimal repeater size for every number of segments, a similar approach was followed to choose the optimal number of segments. The standard bus was evaluated under the worst case switching behavior, which results in the worst case capacitive coupling. Table II shows the optimization results for the standard and shielded buses. The shielded bus has less delay, and its repeaters are fewer and smaller than those of the standard bus, due to the elimination of the worst case capacitive coupling. This shows the benefit of using shields when minimum pitch routing is employed. The clock frequencies of the standard and shielded buses for different wire lengths are shown in Table III. The operating frequency for each bus length was obtained by adding the bus delay to the setup time and CLK-to-Q delay of the flops. The flops take between 3 and 3.5 FO4 delays, depending on the size of the repeaters and the number of inverters in the buffering stage. A delay slack of 6% to 15% of the bus delay is added to the clock cycle in order to account for variations and to make the clock period a multiple of 10ps.
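The search described above (sweep segment counts and repeater widths, then back off from the minimum delay to the smallest width within a small tolerance of it, as done for Fig. 2) can be sketched as follows. The delay model is a textbook Bakoglu-style repeater-stage estimate that treats every segment identically, and the device parameters (`R_UNIT`, `C_IN_UNIT`, `C_PAR_UNIT`), the 1.2 µm width step, and the 2% tolerance are hypothetical placeholders rather than CMOS090 values; the slew constraint and the buffering stage are omitted. Only the parity helper encodes a rule taken directly from the text.

```python
"""Sketch of an exhaustive repeater search with a 'knee' width rule.

All device parameters below are hypothetical placeholders, not CMOS090 data.
"""

# Wire parasitics from Table I (shielded bus: quiet neighbors, Miller factor 1).
R_WIRE = 58.201                      # ohm/mm
C_WIRE = (91.0 + 2 * 53.5) * 1e-15   # F/mm

# Hypothetical repeater parameters, per micrometre of NMOS width.
R_UNIT = 10e3        # ohm*um: driver resistance = R_UNIT / width
C_IN_UNIT = 3e-15    # F/um: input capacitance (NMOS + PMOS gates)
C_PAR_UNIT = 2e-15   # F/um: output (drain) parasitic capacitance

WIDTHS = [2.4 + 1.2 * i for i in range(22)]   # 2.4 um .. 27.6 um
SEGMENTS = range(1, 11)                       # 1 .. 10 segments


def bus_delay(length_mm: float, n_seg: int, width: float) -> float:
    """Delay of n_seg identical segments, each driven by one repeater."""
    seg = length_mm / n_seg
    rd, cin, cp = R_UNIT / width, C_IN_UNIT * width, C_PAR_UNIT * width
    rw, cw = R_WIRE * seg, C_WIRE * seg
    # The driver charges its own parasitic, the wire and the next input;
    # the distributed wire adds 0.38*Rw*Cw plus 0.69*Rw*Cin for the load.
    stage = 0.69 * rd * (cp + cw + cin) + rw * (0.38 * cw + 0.69 * cin)
    return n_seg * stage


def knee_width(length_mm: float, n_seg: int, tol: float = 0.02) -> float:
    """Smallest width whose delay is within tol of the best delay."""
    delays = {w: bus_delay(length_mm, n_seg, w) for w in WIDTHS}
    best = min(delays.values())
    return min(w for w, d in delays.items() if d <= (1 + tol) * best)


def optimize(length_mm: float, tol: float = 0.02):
    """Knee width per segment count, then the same rule over segment counts."""
    per_seg = {n: knee_width(length_mm, n, tol) for n in SEGMENTS}
    delays = {n: bus_delay(length_mm, n, w) for n, w in per_seg.items()}
    best = min(delays.values())
    n_best = min(n for n, d in delays.items() if d <= (1 + tol) * best)
    return n_best, per_seg[n_best], delays[n_best]


def buffer_inverters_ok(n_seg: int, n_buf: int) -> bool:
    """Parity rule from the text: even segments -> odd buffer inverters,
    odd segments -> zero or an even number of buffer inverters."""
    return (n_seg + n_buf) % 2 == 1
```

Every configuration in Table II satisfies `buffer_inverters_ok`, for example 4 segments with 1 inverter and 3 segments with 2 inverters. The knee rule trades a small, bounded delay penalty for a smaller repeater, which is the area and power argument made above.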
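The clock-period bookkeeping described above (bus delay, plus 3 to 3.5 FO4 of flop overhead, plus a 6% to 15% slack, with the period a multiple of 10ps) can be checked against Tables II and III. The sketch uses only numbers stated in this paper; the consistency test itself is our own reading of the text.

```python
"""Check Table III periods against the clock-period bookkeeping in the text."""

FO4 = 27.0  # ps

# (bus, length_mm): (bus delay from Table II, period from Table III), in ps
ROWS = {
    ("standard", 6): (250, 360), ("standard", 9): (359, 480),
    ("standard", 12): (468, 600), ("shielded", 6): (203, 310),
    ("shielded", 9): (291, 400), ("shielded", 12): (373, 500),
}


def implied_slack_range(delay: float, period: float):
    """Slack fraction left after a flop overhead of 3.5 FO4 (low) to 3 FO4 (high)."""
    low = (period - delay - 3.5 * FO4) / delay
    high = (period - delay - 3.0 * FO4) / delay
    return low, high


def consistent(delay: float, period: float) -> bool:
    """True if some flop overhead in [3, 3.5] FO4 gives a 6-15% slack."""
    low, high = implied_slack_range(delay, period)
    return high >= 0.06 and low <= 0.15


def frequency_ghz(period_ps: float) -> float:
    return 1000.0 / period_ps


for key, (delay, period) in ROWS.items():
    low, high = implied_slack_range(delay, period)
    print(key, f"slack {low:.1%}..{high:.1%}", f"{frequency_ghz(period):.3f} GHz",
          "ok" if consistent(delay, period) else "MISMATCH")
```

All six rows pass. As a side observation, 1/360ps is about 2.78 GHz and 1/310ps is about 3.23 GHz, so the frequencies in Table III appear to be truncated rather than rounded.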

## V. Encoder and Decoder Circuits

Fig. 2. Repeater size vs. delay for a 6mm bus divided into 3 segments.

TABLE II

REPEATER OPTIMIZATION RESULTS FOR THE STANDARD AND SHIELDED BUSES

| Bus      | Wire length | # of segments | Repeater NMOS width | # of inverters in buffering stage | Delay  |
|----------|-------------|---------------|---------------------|-----------------------------------|--------|
| standard | 6 mm        | 4             | 19.2 µm             | 1                                 | 250 ps |
| standard | 9 mm        | 6             | 19.2 µm             | 1                                 | 359 ps |
| standard | 12 mm       | 8             | 19.2 µm             | 1                                 | 468 ps |
| shielded | 6 mm        | 3             | 16.8 µm             | 2                                 | 203 ps |
| shielded | 9 mm        | 5             | 15.6 µm             | 2                                 | 291 ps |
| shielded | 12 mm       | 6             | 16.8 µm             | 1                                 | 373 ps |

TABLE III

STANDARD BUS AND SHIELDED BUS FREQUENCY OF OPERATION

| Bus length | Standard bus period | Standard bus frequency | Shielded bus period | Shielded bus frequency |
|------------|---------------------|------------------------|---------------------|------------------------|
| 6 mm       | 360 ps              | 2.77 GHz               | 310 ps              | 3.22 GHz               |
| 9 mm       | 480 ps              | 2.08 GHz               | 400 ps              | 2.5 GHz                |
| 12 mm      | 600 ps              | 1.66 GHz               | 500 ps              | 2 GHz                  |

Fig. 3. Transition skew encoder flop.

Fig. 5. Transition skew decoder.

The encoder consists of the encoder logic and the encoder flop circuit shown in Fig. 3. IN1 and IN2 are generated by previous circuits, and they must be available before the rising clock edge by a time equal to the encoder latency. The EN1, EN2, and EN3 signals are mutually exclusive, i.e., when one signal is high, all the others are low. These signals determine the rising edge (of CLK, CLK1, or CLK2) at which the wire input should change its state. When there is a change in the inputs, the EN signal goes high in order to cut off the feedback that stores the state on the wire, and EN1, EN2, or EN3 goes high based on the new state of the inputs. After the rising edge of CLK2, all enable signals go back low and the wire stores its new state. The critical path of the encoder circuit is from the inputs to the low-to-high transitions of the EN and EN1 signals, since these signals have to be generated before the rising edge of CLK, whereas the EN2 and EN3 signals only have to be generated before the rising edges of CLK1 and CLK2, respectively. Therefore, the gates that generate EN2 and EN3 can be sized smaller than the gates that generate EN and EN1 in order to reduce the load on the XORs in the previous stage. Moreover, the gate that generates the EN1 signal can be skewed to favor the low-to-high transition, since the other transition is not critical. Transistor reordering in the XOR gates also enhances the performance of the circuit. After delay optimization of the encoder circuit, assuming that the inputs are generated by minimum sized buffers, the delay of the circuit was found to be 52ps. However, unlike a standard flop, the encoder flop has zero setup time, since the state at the master stage of the flop is set up after the falling edge of the clock. Therefore, the setup time of the standard flop, which is 1 FO4 when the flop is driven by a minimum sized buffer, is subtracted from the encoder latency to determine the latency overhead of the encoder, which was found to be less than 1 FO4. Because of the long time that the encoder flop can use as setup time, the inverter in the master stage and the transmission gates were sized so that the encoder flop has the same CLK-to-Q delay as the standard flop.
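The enable generation described above can be checked exhaustively at the logic level. The mapping below (XOR of old and new inputs, EN1 for CLK, EN2 for CLK1, EN3 for CLK2) is inferred from the text and is an assumption about the circuit, not its actual netlist; the latency arithmetic uses the 52ps encoder delay, the 1 FO4 setup time, and the 27ps FO4 stated in this paper.

```python
"""Exhaustive check of an inferred enable-generation logic for the encoder."""
from itertools import product

FO4_PS = 27.0


def enables(prev1: int, prev2: int, in1: int, in2: int):
    """Return (EN, EN1, EN2, EN3) for one input pair (inferred mapping)."""
    x1, x2 = prev1 ^ in1, prev2 ^ in2   # the XORs feeding the enable gates
    en = x1 | x2            # any change: cut off the wire's state feedback
    en1 = x1 & x2           # both toggled -> launch at CLK
    en2 = x1 & (1 - x2)     # only IN1 toggled -> launch at CLK1
    en3 = (1 - x1) & x2     # only IN2 toggled -> launch at CLK2
    return en, en1, en2, en3


def check_mutual_exclusion() -> bool:
    """EN1..EN3 are one-hot when EN is high and all low otherwise."""
    for bits in product((0, 1), repeat=4):
        en, en1, en2, en3 = enables(*bits)
        if en1 + en2 + en3 != en:
            return False
    return True


def encoder_overhead_fo4(logic_delay_ps: float = 52.0,
                         setup_saved_fo4: float = 1.0) -> float:
    """Encoder latency minus the standard-flop setup time it replaces, in FO4."""
    return (logic_delay_ps - setup_saved_fo4 * FO4_PS) / FO4_PS


print(check_mutual_exclusion())
print(f"encoder overhead: {encoder_overhead_fo4():.2f} FO4")
```

The overhead works out to $(52 - 27)/27 \approx 0.93$ FO4, consistent with the "less than 1 FO4" figure above.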

Fig. 4 shows the decoder flop used to generate the N1 and N2 signals in the decoder circuit shown in Fig. 5. The decoder generates its outputs based on the previous outputs and the signals received from the flops. The delay lines at the outputs of the flops that hold the previous decoder output state are used to eliminate glitches at the decoder outputs. Fig. 6 shows the simulation waveforms for a 9mm encoded bus operating at 2.5 GHz with a 35ps skew between consecutive clocks. The decoder latency equals the decoding logic latency plus two skew delays, since all the decoding input signals are asserted at the rising edge of CLK2. The critical path in the decoding circuit is between the inputs and Out2. The decoding flops are sized to have the same CLK-to-Q delay as the standard flop under the same loading conditions; therefore, the flop delay is not counted when measuring the decoding latency. The transistors driven by node N4 are sized smaller than the transistors driven by N1, N2, and N3 due to the large load present on node N4. The decoding circuit was optimized for delay assuming a load of 4× the minimum inverter size at each output. The decoding circuit latency was 103ps.

Fig. 4. Decoder flop (Dec. Flop).

Fig. 6. Simulation waveforms for a 9mm bus using transition skew coding. The frequency is 2.5 GHz, and the skew between consecutive clocks is 35ps.

The skew between the clocks is critical in determining the robustness of the communication and the decoding latency. Using the minimum skew that achieves correct functionality yields the smallest decoding latency; however, the bus is then very sensitive to delay variations. Increasing the skew increases robustness at the price of a larger decoding latency. The minimum skew is determined by the delay slack in the clock cycle. The maximum skew is determined by the encoder circuit: the interval between the rising edge of CLK2 and the falling edge of CLK must not become so small that the feedback in the encoder flop can disturb the bus input state after this state has changed. Table IV presents the minimum and maximum skews for different bus lengths. Compared to the standard bus, the encoded bus has a total latency overhead of 4.7 FO4 for the 6mm bus and 3.6 FO4 for the 9mm and 12mm buses when the minimum skew is used. On the other hand, the frequency of the encoded bus equals the shielded bus frequency, which is higher than the standard bus frequency. Therefore, the encoding and decoding latencies are reasonable and add no overhead to the bus frequency.
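Table IV and the latency expression given for the decoder (logic latency plus two skews) can be combined into a small helper that shows the latency cost of moving along the allowed skew window. It also checks that the skews used in Section VI (35ps for the 6mm and 9mm buses, 45ps for the 12mm bus) lie inside their windows. All skews are in ps, and the helper names are illustrative.

```python
"""Skew window (Table IV) versus decoder latency (Section V)."""

FO4_PS = 27.0
DECODER_LOGIC_PS = 103.0   # decoding circuit latency reported in Section V

# Table IV: bus length (mm) -> (minimum skew, maximum skew) in ps
SKEW_WINDOW = {6: (25.0, 70.0), 9: (25.0, 90.0), 12: (35.0, 115.0)}


def decoder_latency_ps(skew_ps: float) -> float:
    """Decoder latency = logic latency + two skews (inputs settle at CLK2)."""
    return DECODER_LOGIC_PS + 2.0 * skew_ps


def skew_is_valid(length_mm: int, skew_ps: float) -> bool:
    """True if the skew lies inside the Table IV window for this length."""
    lo, hi = SKEW_WINDOW[length_mm]
    return lo <= skew_ps <= hi


for length, (lo, hi) in SKEW_WINDOW.items():
    print(f"{length} mm: decoder latency {decoder_latency_ps(lo):.0f} ps at "
          f"min skew, {decoder_latency_ps(hi):.0f} ps at max skew")
```

For the 35ps skew of Fig. 6, the decoder latency is $103 + 2 \times 35 = 173$ps, about 6.4 FO4; at the 25ps minimum skew it drops to 153ps.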

## VI. Experimental Results

The peak energy and peak current were measured by applying the worst case switching vector to the bus, which results in the maximum energy and current during a clock period. For the encoded bus, the energy and current consumed by the encoder, the decoder, and the clocking delay lines are included in the measurements. The peak energies versus bus length are plotted in Fig. 7. The shielded bus has less peak energy than the standard bus, since it does not suffer from worst case capacitive coupling and its repeaters are fewer and smaller. The encoded bus retains all the advantages of the shielded bus while also halving the number of active wires. The peak energy reduction of the encoded bus increases as the bus length increases. For the 12mm bus, the encoded bus reduces peak energy by 45% compared to the shielded bus, and by 62% compared to the standard bus.

The peak current of the standard bus is the same for all bus lengths, since the repeater sizes and spacing are the same. For the shielded and encoded buses, the difference in peak current between bus lengths is very small. Table V shows the normalized peak currents. The encoded bus peak current is about half that of the standard bus. Therefore, the encoded bus is efficient in terms of both peak energy, which is critical for thermal management, and peak current, which is important for supply network reliability.

In order to compare the average energy per cycle of the different buses, random inputs are generated over 1000 cycles. Simulations are carried out at different switching activities, such that all inputs have the same switching activity and the switching of each input is independent of the others. The delay lines for the encoded bus are designed to achieve a skew of 35ps for the 6mm and 9mm buses, and 45ps for the 12mm bus. Figs. 8, 9, and 10 compare the average energy per cycle of the standard, shielded, and encoded buses for different wire lengths. All numbers are normalized to the average energy of the standard bus. The shielded bus reduces average energy compared to the standard bus due to the reduction in the number and size of repeaters. The energy saving of the shielded bus is almost constant over all switching activities; however, it is slightly higher at low switching activities, where leakage energy is a significant contributor to the overall energy. The encoded bus has more clocking energy than the other buses due to the delay lines used to generate the skewed clocks and the extra clocked transistors and flops. This clocking energy is constant over all input switching activities. Other sources of energy overhead are the encoding and decoding circuits. On the other hand, the encoded bus has considerable leakage savings, since the total transistor width is reduced considerably. The energy savings due to leakage are apparent at low switching activities. At switching activities of 0.1 and 0.2, the switching energy overhead of the encoder and decoder becomes apparent, since the probability that the two inputs sharing one encoded wire switch simultaneously is low. As the switching activity increases and simultaneous switching becomes more likely, the energy savings of the encoded bus increase due to the reduction in the number of switching global wires. The energy saving of the encoded bus also increases as the wire length increases.
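The dependence on switching activity can be made concrete with a transition count. Assuming, as in the simulations above, that every input toggles independently with probability $\alpha$ per cycle, an uncoded pair of wires makes $2\alpha$ transitions per cycle on average, while the encoded wire toggles whenever at least one input toggles, with probability $1-(1-\alpha)^2 = 2\alpha-\alpha^2$. The ratio of global transitions is therefore $1-\alpha/2$, so the fraction of wire transitions saved, $\alpha/2$, grows with activity. The sketch below checks this with a Monte Carlo run on an 8-bit bus (4 pairs); it counts wire transitions only and ignores the clocking, encoder/decoder, and leakage energy discussed above.

```python
"""Global-wire transitions per input pair: uncoded vs. transition skew coded."""
import random


def analytic_ratio(alpha: float) -> float:
    """E[encoded wire toggles] / E[uncoded wire toggles] = 1 - alpha/2."""
    return (2 * alpha - alpha ** 2) / (2 * alpha)


def monte_carlo_ratio(alpha: float, cycles: int = 1000, pairs: int = 4,
                      seed: int = 1) -> float:
    """Simulate independent toggles with probability alpha for every input."""
    rng = random.Random(seed)
    uncoded = encoded = 0
    for _ in range(cycles):
        for _ in range(pairs):
            t1 = rng.random() < alpha
            t2 = rng.random() < alpha
            uncoded += t1 + t2          # two wires, one per input
            encoded += t1 or t2         # one wire, at most one transition
    return encoded / uncoded


for alpha in (0.1, 0.2, 0.5, 0.8):
    print(f"alpha={alpha}: analytic {analytic_ratio(alpha):.3f}, "
          f"simulated {monte_carlo_ratio(alpha):.3f}")
```

At $\alpha = 0.1$ only about 5% of wire transitions are saved, which is too little to offset the fixed clocking and encoder/decoder overhead; at $\alpha = 0.5$ the saving reaches 25%.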

Fig. 7. Peak energy for different bus lengths.

TABLE IV

MINIMUM AND MAXIMUM SKEW FOR DIFFERENT BUS LENGTHS

| Bus length | Minimum skew | Maximum skew |
|------------|--------------|--------------|
| 6mm        | 25ps         | 70ps         |
| 9mm        | 25ps         | 90ps         |
| 12mm       | 35ps         | 115ps        |

TABLE V

NORMALIZED PEAK CURRENT

| Standard bus | Shielded bus | Encoded bus |
|--------------|--------------|-------------|
| 1            | 0.80         | 0.52        |

Leakage energy is also measured independently from the overall energy. This gives the exact leakage power of each bus, as well as the average power consumed in standby mode, when the circuits are idle and the clock is gated. All possible input vectors were applied, and the flop outputs are assumed to be the same as their inputs. Leakage power is measured for each input vector, and the average over all combinations is taken as the average leakage power. The number of combinations for the encoded bus is considerably larger than for the other buses, because the state of the wire is independent of the state of the inputs. Therefore, the measurement is done on a 2-bit portion of the bus, and the result is multiplied by 4 to obtain the average leakage power of the 8-bit encoded bus. As shown in Table VI, the encoded bus reduces leakage significantly compared to the standard and shielded buses.
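The averaging procedure can be sketched as an enumeration over one 2-bit slice (two input bits plus the wire state, which is independent of the inputs), followed by scaling by 4. The per-vector leakage function is a hypothetical stand-in for one circuit measurement, and the slice state is simplified to three bits; the reduction percentages are computed directly from Table VI.

```python
"""Average leakage by exhaustive enumeration, plus reductions from Table VI."""
from itertools import product
from typing import Callable

# Table VI: average leakage power in microwatts
TABLE_VI = {
    "standard": {6: 456.72, 9: 667.48, 12: 878.24},
    "shielded": {6: 333.76, 9: 483.12, 12: 586.96},
    "encoded": {6: 235.68, 9: 311.2, 12: 363.8},
}


def average_leakage_8bit(leak_of_state: Callable[[int, int, int], float]) -> float:
    """Average a 2-bit slice over all (in1, in2, wire) states, then scale by 4.

    leak_of_state is a hypothetical stand-in for one measurement per vector.
    """
    states = list(product((0, 1), repeat=3))
    slice_avg = sum(leak_of_state(*s) for s in states) / len(states)
    return 4 * slice_avg


def reduction(bus: str, baseline: str, length_mm: int) -> float:
    """Fractional leakage reduction of `bus` relative to `baseline`."""
    return 1 - TABLE_VI[bus][length_mm] / TABLE_VI[baseline][length_mm]


for length in (6, 9, 12):
    print(f"{length} mm: encoded saves {reduction('encoded', 'standard', length):.1%} "
          f"vs standard, {reduction('encoded', 'shielded', length):.1%} vs shielded")
```

From Table VI, the encoded bus saves about 48% to 59% of the leakage power of the standard bus and about 29% to 38% of that of the shielded bus, with the saving growing with wire length.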

The total transistor widths of the different buses are tabulated in Table VII. As the wire length increases, bus encoding reduces the total device area more and more, since the reduction in repeater area grows whereas the area of the encoder, decoder, and delay lines is almost constant. The wiring area of an encoded bus is the same as that of a standard bus, which is half the wiring area of a shielded bus. Moreover, the encoded bus has all the advantages of a shielded bus in terms of coupling noise reduction.
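The wiring and device-area comparison above can be restated numerically. The track count assumes one shield per signal wire and ignores edge shields, so it is approximate; the width savings are computed directly from Table VII.

```python
"""Routing tracks and device-width reductions (Table VII)."""

# Table VII: total transistor width in micrometres
TABLE_VII = {
    "standard": {6: 2508.36, 9: 3657.29, 12: 4806.22},
    "shielded": {6: 1836.96, 9: 2653.44, 12: 3219.88},
    "encoded": {6: 1638.5, 9: 2046.68, 12: 2363.6},
}


def tracks(bus: str, n_bits: int) -> int:
    """Routing tracks for an n-bit bus, counting one shield per signal wire.

    Edge shields are ignored, so the counts are approximate.
    """
    if bus == "standard":
        return n_bits                     # n signal wires
    if bus == "shielded":
        return 2 * n_bits                 # n signals + n shields
    if bus == "encoded":
        return n_bits // 2 + n_bits // 2  # n/2 active wires + n/2 shields
    raise ValueError(f"unknown bus type: {bus}")


def width_saving(bus: str, baseline: str, length_mm: int) -> float:
    """Fractional reduction in total transistor width relative to baseline."""
    return 1 - TABLE_VII[bus][length_mm] / TABLE_VII[baseline][length_mm]


for length in (6, 9, 12):
    print(f"{length} mm: encoded vs standard {width_saving('encoded', 'standard', length):.1%}, "
          f"vs shielded {width_saving('encoded', 'shielded', length):.1%}")
```

The encoded bus uses about 35% to 51% less device width than the standard bus and about 11% to 27% less than the shielded bus, while occupying the same number of tracks as the standard bus.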

## VII. Conclusion

Transition skew coding for global on-chip interconnects has been proposed. Simulations show that this technique is efficient in terms of peak energy and current, average energy, leakage, and total device area. Transition skew coding has all the advantages of shielding without any wiring area penalty. The encoding and decoding circuits have been discussed and analyzed. The encoder has a very low latency. However, a tradeoff exists between the decoder latency and the robustness against delay variations, which is the main limitation of the technique. Increasing the decoder latency does not affect the encoded bus frequency, which is higher than the standard bus frequency due to the elimination of crosstalk.

Fig. 8. Average energy per cycle for 6mm length buses.

Fig. 9. Average energy per cycle for 9mm length buses.

Fig. 10. Average energy per cycle for 12mm length buses.

TABLE VI

AVERAGE LEAKAGE POWER

|          | 6mm       | 9mm       | 12mm      |
|----------|-----------|-----------|-----------|
| Standard | 456.72 µW | 667.48 µW | 878.24 µW |
| Shielded | 333.76 µW | 483.12 µW | 586.96 µW |
| Encoded  | 235.68 µW | 311.2 µW  | 363.8 µW  |

TABLE VII

TOTAL TRANSISTOR WIDTH

|          | 6mm        | 9mm        | 12mm       |
|----------|------------|------------|------------|
| Standard | 2508.36 µm | 3657.29 µm | 4806.22 µm |
| Shielded | 1836.96 µm | 2653.44 µm | 3219.88 µm |
| Encoded  | 1638.5 µm  | 2046.68 µm | 2363.6 µm  |

## References

- [1] S. Borkar, "Design challenges of technology scaling", *IEEE Micro*, vol. 19, no. 4, pp. 23-29, July-Aug. 1999.
- [2] D. Sylvester and H. Kaul, "Future performance challenges in nanometer design", *DAC* 2001.
- [3] R. Arunachalam, E. Acar, and S. Nassif, "Optimal shielding/spacing metrics for low power design", *ISVLSI* 2003.
- [4] M. Elgamel, A. Kumar, and M. Bayoumi, "Efficient shield insertion for inductive noise reduction in nanometer technologies", *IEEE Trans. VLSI Syst.*, vol. 13, no. 3, pp. 401-405, March 2005.

- [5] K. Skadron *et al.*, "Temperature-aware microarchitecture", *ISCA* 2003, pp. 2-13.
- [6] K. Bernstein, C. Chuang, R. Joshi, and R. Puri, "Design and CAD challenges in sub-90 nm CMOS technologies", *ICCAD*, Nov. 2003, pp. 129-136.
- [7] H. Kaul and D. Sylvester, "Low-power on-chip communication based on transition-aware global signaling (TAGS)", *IEEE Trans. VLSI Syst.*, vol. 12, no. 5, May 2004.
- [8] P. Saxena, N. Menezes, P. Cocchini, and D. Kirkpatrick, "Repeater scaling and its impact on CAD", *IEEE Trans. CAD Integr. Circuits Syst.*, vol. 23, no. 4, pp. 451-463, April 2004.
- [9] A. Maheshwari and W. Burleson, "Differential current-sensing for on-chip interconnects", *IEEE Trans. VLSI Syst.*, vol. 12, no. 12, Dec. 2004.
- [10] H. Zhang, V. George, and J. Rabaey, "Low-swing on-chip signaling techniques: effectiveness and robustness", *IEEE Trans. VLSI Syst.*, vol. 8, pp. 264-272, June 2000.
- [11] M. Anders, N. Rai, R. Krishnamurthy, and S. Borkar, "A transition-encoded dynamic bus technique for high-performance interconnects", *IEEE J. Solid-State Circuits*, vol. 38, pp. 709-714, May 2003.
- [12] V. Deodhar and J. Davis, "Optimization of throughput performance for low-power VLSI interconnects", *IEEE Trans. VLSI Syst.*, vol. 13, pp. 308-318, March 2005.
- [13] A. Nalamalpu, S. Srinivasan, and W. Burleson, "Boosters for driving long onchip interconnects-design issues, interconnect synthesis, and comparison with repeaters", *IEEE Trans. CAD of Circuits and Syst.*, vol. 21, pp. 50-62, Jan. 2002.
- [14] A. Deutsch et al., "When are transmission-line effects important for on-chip interconnections?", *IEEE Trans. Microwave Theory and Tech.*, vol. 45, no. 10, pp. 1836-1846, Oct. 1997.
- [15] C. Alpert, A. Devgan, and S. Quay, "Buffer insertion for noise and delay optimization", *IEEE Trans. CAD of Circuits and Syst.*, vol. 18, pp. 1633-1645, Nov. 1999.
- [16] M. Stan and W. Burleson, "Bus-invert coding for low-power I/O", *IEEE Trans. VLSI Syst.*, vol. 3, no. 2, pp. 49-58, March 1995.
- [17] H. Kaul, D. Sylvester, M. Anders, and R. Krishnamurthy, "Design and analysis of spatial encoding circuits for peak power reduction in on-chip buses", *IEEE Trans. VLSI Syst.*, vol. 13, no. 11, Nov. 2005.
- [18] R. Rao, H. Deogun, D. Blaauw, and D. Sylvester, "Bus Encoding for Total Power Reduction Using a Leakage-Aware Buffer Configuration", *IEEE Trans. VLSI Syst.*, vol. 13, no. 12, pp. 1376-1383, Dec. 2005.
- [19] N. H. E. Weste and D. Harris, "CMOS VLSI design: a circuit and systems perspective", Addison Wesley, 3<sup>rd</sup> ed., ch. 4, pp. 157-271.