# Design of an 8:1 MUX at 1.7Gbit/s in 0.8µm CMOS Technology

J. Navarro S.Jr. and W.A.M. Van Noije

Laboratório de Sistemas Integrados\* - University of São Paulo, Av. Prof. Luciano Gualberto, 158, trav.3, CEP: 08805-900, São Paulo, SP, Brazil

email: navarro@lsi.usp.br, Phone: +55-11-818.5668, Fax: +55-11-818.5664

## Abstract

The design of an 8:1 multiplexer circuit for SDH/SONET data transmission systems is presented. To achieve maximum transmission rates, new circuits (high speed input/output converters for ECL-CMOS levels and modified true single phase clock (TSPC) cells) and new techniques for clock buffer optimization were applied. The multiplexer, implemented in a 0.8µm CMOS process (0.7µm effective length), achieved a 1.7Gbit/s rate and 42.6µW/MHz power consumption at 5V. These results were compared to a previous implementation in the same process and to other recently published works, showing superior performance.

## 1. Introduction

The well-known advantages of CMOS (design ease, high integration capability, low power consumption, and low price) have stimulated its use in all sorts of applications. During the last few years, the aggressive reduction of transistor dimensions and the consequent increase in speed have pushed CMOS into the field of high speed circuits, where technologies like bipolar and GaAs were the only options. As a result, problems and difficulties with clock distribution, external signal handling, etc. have arisen in CMOS. To overcome such problems, new techniques and circuits have been proposed [1], [2].

This paper presents the design of an 8:1 multiplexer in which new high speed CMOS techniques and circuits were applied. The circuit is part of a CMOS 1.2Gbit/s multiplexer/demultiplexer SONET/SDH chip set implemented in a standard 0.8µm CMOS process (0.7µm effective channel length) [3], [4].
The current implementation of the multiplexer attained 1.7Gbit/s, which compares favorably to other works [4]-[6] if the technological differences are taken into account.

This paper is organized as follows. Section 2 describes the principal characteristics of the multiplexer architecture. Section 3 discusses the input/output CMOS-ECL converters, the true single phase clock cells, and the buffer optimizations. Experimental results are reported in section 4, and the main conclusions are drawn in the last section.

## 2. Multiplexer Architecture

The main characteristics of the multiplexer architecture are described below:

i. the clock input and the multiplexed signal output are compatible with common high speed environments: ECL logic levels and $50\Omega$ lines with resistor termination are usually adopted in those environments. ECL helps to keep the noise low and the speed high since its level excursion is small (≅0.9V); additionally, the $50\Omega$ line and termination are necessary to avoid signal reflections. To be compatible with such environments, the clock input circuit receives external ECL levels and converts them to CMOS levels [7]. The multiplexed signal output circuit, by contrast, converts the internal CMOS levels to external ECL levels and drives a $50\Omega$ load [8]. The multiplexer circuit handles the high/low values of the ECL 10KH family shown in Table I. To work with these levels the power supply is 0V/-5V instead of the typical CMOS 5V/0V.

ii. true single phase clock (TSPC) cells are used [2]: TSPC circuits are suitable for high speed designs. Their main advantage is the simplification of the clock distribution, with lower wiring costs and no phase overlapping problems.
The cells were designed to exploit both clock levels, resulting in two types of flip-flops: flip-flops whose output data is evaluated after the rising edge, and flip-flops whose output is evaluated after the falling edge. Additionally, some N-like blocks were introduced to speed up the circuit operation [1], [9]. The basic multiplexer cells are shown in Fig. 1. They were all drawn with small transistor dimensions, a feature which keeps the power consumption and the area low. The transistor sizes of the cell in Fig. 1(d) are given in Table II.

<sup>\*</sup>This work was partially supported by FAPESP and CNPq/Protem.

TABLE I
ECL high/low values which are supported by the multiplexer

| level | maximum (V) | minimum (V) | mean (V) |
|-------|-------------|-------------|----------|
| high ($V_{H(ECL)}$) | -0.81 | -0.98 | -0.895 |
| low ($V_{L(ECL)}$) | -1.63 | -1.95 | -1.79 |

Fig. 1. Transistor schematics and symbols of the main TSPC cells used in the multiplexer. The crosshatched blocks evaluate when clock ($C_i$) is low; the others evaluate when clock is high.

TABLE II
Transistor dimensions of the Fig. 1(d) cell

| transistor | width (µm) | length (µm) |
|------------|------------|-------------|
| $M_1$, ..., $M_9$ | 2.4 | 0.8 |
| $M_{10}$, $M_{12}$, $M_{14}$, $M_{16}$ | 3.0 | 0.8 |
| $M_{11}$, $M_{13}$, $M_{15}$, $M_{17}$ | 4.0 | 0.8 |

Fig. 2. Schematic of the 8:1 multiplexer.

iii. the multiplexer architecture is based on 2:1 multiplexer cells (Fig. 2): in this structure the input clock is divided by 2 and 4 successively. The first column of 2:1 multiplexers, controlled by the divided-by-4 clock, takes the circuit inputs, the eight signals $in_0$, ..., $in_7$, and converts them to the four signals $m_1$, ..., $m_4$. These signals have twice the rate of $in_0$-$in_7$.
The 2:1 multiplexers controlled by the divided-by-2 clock subsequently receive $m_1$, ..., $m_4$ and generate $m_5$ and $m_6$. Finally, these signals, whose rates are equal to the clock frequency, are merged by the last 2:1 stage to form the output signal. The output data signal rate is twice the clock frequency. This feature is advantageous since it simplifies the clock handling and reduces the power dissipation. Details of the 2:1 multiplexers will be presented in the next section.

## 3. Circuits and Techniques

The circuit which converts the ECL levels of the clock input to CMOS levels is depicted in Fig. 3(a) [7]. It is composed of one source follower, transistors $M_{i1}$ and $M_{i2}$, and inverters. The function of the source follower is to ensure that when the input signal, in, is at -1.22V, the ECL average level, the input of the inverter gain stage, $o_{SF}$, is at nearly -2.5V. In this case, the excellent gain characteristics of the inverter can be used to amplify the ECL input signals. The capacitor $C_1$ is added to improve the circuit speed.

Fig. 3. High speed ECL-CMOS converter input buffer (a), and its biasing circuit (b) [7].

Fig. 4. High speed CMOS-ECL converter output buffer [8].

To keep the correct operation for any process parameters, a reliable biasing circuit is necessary. Fig. 3(b) shows the implemented biasing circuit. The circuit is an exact copy of the first three stages of the input ECL-CMOS converter circuit; in addition, the output of the last stage is connected to the gate of $M'_{i2}$. This transistor regulates the current in the source follower and, in consequence, its level shift. When -1.22V is supplied at $V_{REF}$, the biasing circuit adjusts itself, through the feedback mechanism, providing a $V_{BIASI}$ voltage which assures the desired level shift regardless of the process parameters. Circuit instability problems were avoided through the addition of the capacitor $C_X$.
This bias circuit is slightly different from that of [7], where a differential gain stage is used in the feedback. The new bias circuit is more robust to process variations, but consumes more power.

The output buffer, which performs the CMOS to ECL level conversion and drives a $50\Omega$ load, is presented in Fig. 4. The buffer is based on switched current sources. Its operation is as follows: when the input signal is in=-5V, the low level for CMOS logic, the transistor $M_{O1}$ is off. In this case, the output node is pulled up to the $V_{H(ECL)}$ value by the termination resistor ($50\Omega$ external resistor). For the complementary input, in=0V, $M_{O1}$ conducts and a fixed current is forced through the $50\Omega$ resistor by transistor $M_{O2}$. $V_{BIASO}$ is adjusted to assure the correct level for $V_{L(ECL)}$. Similar to the input circuit, a feedback configuration is designed to provide a correct low level independent of the process [8].

Both circuits have been shown to work with signals as fast as 2.4Gbit/s (see results).

The circuit schematics of Fig. 1 show the evaluation phase of each block. Crosshatched blocks evaluate when clock is low; the others, when clock is high. Thus, the cells can be classified into two groups according to the input-output delay:

- **a.** circuits (a) and (c) (see Fig. 1) are flip-flops where the input signal flows to the output immediately after the clock rising edge. Furthermore, when the clock is at the low level, the circuit output is in high impedance;
- **b.** circuits (b), (d), and (e) are flip-flops where the input signal flows to the output half a clock period after the clock rising edge. Here, the output is in high impedance when the clock level is high.

Connecting the outputs of two cells, one from the first group and the other from the second group, builds a 2:1 multiplexer. The output data rate of this multiplexer is twice its clock input frequency. In Fig. 
2, different varieties of 2:1 multiplexers are found.

Note that the complete multiplexer could be implemented with only (a) and (b) cells. In fact, a previous 8:1 multiplexer was designed with them [4]. The other cells were introduced later to achieve a higher speed. They were developed starting from the (a) and (b) cells. The most important modification is the suppression of stacked p-transistors [1], [9], which are replaced with ratioed logic. This improves the low-to-high transition delay and, in consequence, the global performance. A detailed description of the cells is presented below:

- cell a: this cell is the well established TSPC D-flip-flop [2];
- cell b: this cell is formed from cell (a) plus a stage which evaluates on the low clock. This stage, duplicated to support a large load, is responsible for the half clock delay of the output, an essential feature to build the 2:1 multiplexer;
- cell c: two modifications of cell (a) produce this faster cell: the first block is replaced with a ratioed block, and the last two blocks are duplicated;
- cell d: the use of ratioed blocks in the last stages of cell (b) produces this new cell, which is faster than (b);
- cell e: to further increase the speed of cell (d), the first block is also replaced with a ratioed block.

The modified cells (c), (d), and (e) allow faster 2:1 multiplexers, but an increased power consumption should be expected. Consequently, the new cells were only applied to the most critical parts, namely the second and third columns (see Fig. 2).

Several tapered buffers are necessary in the multiplexer (Tbuffer1, ..., Tbuffer5 in Fig. 2). Tbuffer1, Tbuffer2, and Tbuffer5 require more attention since they must support frequencies as high as the input clock frequency. These buffers were not optimized with the classic criterion of minimizing the input-output delay; instead, they were designed to allow the propagation of minimum width pulses [10].
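
The 2:1 multiplexing principle described in this section (one cell drives the shared output while the clock is high, the other while it is low) and the three-column tree of Fig. 2 can be illustrated with a purely behavioral sketch. It ignores all timing and circuit-level details. The function names `mux2` and `mux8` are hypothetical, and the pairing of inputs in the first column is an assumption chosen so that the bits leave in the order $in_0$, ..., $in_7$; the actual wiring of Fig. 2 may differ.

```python
def mux2(first, second):
    """Behavioral 2:1 multiplexer.

    `first` models the cell that drives the output during the high clock
    level, `second` the cell that drives it half a period later. Each clock
    period therefore yields two output bits (output rate = 2 x clock).
    """
    out = []
    for a, b in zip(first, second):
        out.extend((a, b))
    return out


def mux8(words):
    """Serialize a list of 8-bit parallel words through three 2:1 columns.

    The input pairing below is an ASSUMPTION (not taken from Fig. 2); it is
    picked so that the serial order is in0, in1, ..., in7.
    """
    # One bit stream per parallel input in0..in7.
    lanes = [[word[i] for word in words] for i in range(8)]

    # First column, controlled by clock/4: eight inputs -> m1..m4.
    m1 = mux2(lanes[0], lanes[4])
    m2 = mux2(lanes[2], lanes[6])
    m3 = mux2(lanes[1], lanes[5])
    m4 = mux2(lanes[3], lanes[7])

    # Second column, controlled by clock/2: m1..m4 -> m5, m6.
    m5 = mux2(m1, m2)
    m6 = mux2(m3, m4)

    # Last 2:1 stage, controlled by the input clock: m5, m6 -> output.
    return mux2(m5, m6)


if __name__ == "__main__":
    word = [0, 1, 1, 0, 0, 1, 0, 1]  # the test sequence "01100101"
    print(mux8([word]))
```

Under the assumed pairing, applying the word "01100101" to $in_0$, ..., $in_7$ reproduces the same sequence serially at the output, and each column doubles the length (that is, the rate) of the stream it handles.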
An external synchronization signal, select, is present in the multiplexer (Fig. 2). It can be used to align the data and the clock/4 when the clock/4 rising edge is too close to the data transition edge.

## 4. Experimental Results

The 8:1 multiplexer was fabricated in a 0.8µm CMOS process (ATMEL-ES2, 0.7µm effective length). Together with the circuit, a test structure to evaluate the input/output buffers was placed. This structure consists of the ECL-CMOS input buffer connected to the CMOS-ECL output buffer. Tests were done on bonded ICs, in a 40-pin QFP package, with a power supply of 0V/5V. Input/output values are consequently shifted to 4.1V and 3.2V for the high and low levels, respectively.

The input/output buffers test consisted of sending an ECL signal to the IC input buffer and recovering it at the IC output buffer. The performance of the circuits can be estimated by comparing both signals. An applied 2.4Gbit/s input signal and the resulting output are shown in Fig. 5. Here, the input signal was somewhat delayed to ease the comparison.

The *output* of the multiplexer working at a 1.7Gbit/s rate is presented in Fig. 6. The input signal in this test was the sequence "01100101". The input *clock* signal is also shown; at each clock edge, a new bit is sent to the output (in the oscilloscope, the two signals were displayed in phase for comparison purposes). Table III summarizes the main test results of the implemented 8:1 multiplexer, along with the measurements of a previous version.

## 5. Conclusions

New CMOS circuits, such as input/output buffers and modified TSPC flip-flop cells, and better CMOS techniques were applied in the design of the high speed 8:1 multiplexer. A maximum speed of 1.7Gbit/s was reached, nearly 60% improvement over a previous implementation done with the same process (Table III). The proposed circuit can be compared with recently published implementations using the data shown in Table IV.
To take into account the technological differences, our results are rewritten using the rules for constant-field scaling [11]. Since the minimum length and the power supply of the multiplexers in [5] and [6] are less than half of those of our circuit, the scale factor is set to two. At the bottom of Table IV, the estimated scaled values illustrate the advantages of the architecture, CMOS circuits, and techniques used.

TABLE III
The experimental results for the current multiplexer implementation and the previous one

| circuit | area (only the multiplexer core) (µm²) | power consumption\* for $V_{DD}$=5V (µW/Mbit/s) | technology | maximum frequency (Gbit/s) | differences between the two implementations |
|---------|------|------|------|------|------|
| previous | 60,000 | 41.85 | 0.8µm CMOS (ATMEL-ES2) | 1.05 | <ul><li>ECL-CMOS input circuit of [7] is used;</li><li>only the latches of Fig. 1(a)/(b) are used;</li></ul> |
| current | 72,000 | 51.6 | 0.8µm CMOS (ATMEL-ES2) | 1.7 | <ul><li>ECL-CMOS input circuit of [7] and the improved bias circuit (Fig.3a) are used;</li><li>all latches of Fig. 1 are used;</li><li>the Tbuffers 1, 2, and 5 were optimized to pass the minimum width pulses;</li></ul> |

<sup>\*</sup> the power consumption of the multiplexer core, the input circuit, and the input/output bias circuits (not including the termination resistors).
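
The arithmetic behind the scaled row of Table IV, and the speed gain quoted in the conclusions, can be reproduced with a short sketch. The scale factor of two comes from the text. The exponents applied to each quantity (length and supply divided by $s$, power per MHz divided by $s^2$, speed multiplied by $s$) are assumptions inferred from the values listed in Table IV; the paper does not state them explicitly. The function names are hypothetical.

```python
def constant_field_scale(length_um, supply_v, power_uw_per_mhz, speed_gbps, s=2.0):
    """Scale measured figures by a factor s.

    ASSUMPTION: the exponents below are inferred from the scaled row of
    Table IV, not quoted from the paper or from [11].
    """
    return {
        "length_um": length_um / s,
        "supply_v": supply_v / s,
        "power_uw_per_mhz": power_uw_per_mhz / s**2,
        "speed_gbps": speed_gbps * s,
    }


def speed_gain_percent(previous_gbps, current_gbps):
    """Relative speed improvement of the current version, in percent."""
    return (current_gbps / previous_gbps - 1.0) * 100.0


if __name__ == "__main__":
    # "This work" row of Table IV: 0.8 um, 5.0 V, 42.6 uW/MHz, 1.7 Gbit/s.
    print(constant_field_scale(0.8, 5.0, 42.6, 1.7))
    # Maximum rates from Table III: 1.05 Gbit/s (previous), 1.7 Gbit/s (current).
    print(speed_gain_percent(1.05, 1.7))
```

The unrounded scaled power figure is 10.65 µW/MHz, which Table IV lists as 10.7, and the speed gain over the previous version evaluates to about 62%, in line with the "nearly 60%" quoted in the conclusions.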
TABLE IV
Technology, power supply, power consumption, and speed for three different 8:1 multiplexers

| multiplexer | technology (µm) | power supply (V) | power consumption (µW/MHz) | speed (Gbit/s) |
|-------------|-----------------|------------------|----------------------------|----------------|
| [5] (8:1) | 0.15 | 2.0 | 39.3 | 3.0 |
| [6] (16:2) | 0.25 (CMOS/SIMOX) | 2.0 | 68.0 (34.0 for 8:1) | 2.5 |
| this work (8:1) | 0.8 | 5.0 | 42.6 | 1.7 |
| this work scaled (constant-field scaling) | 0.4 | 2.5 | 10.7 | 3.4 |

Fig. 5. Input/output buffers test results. The input signal has 2.4Gbit/s rate. The output signal is inverted.

Fig. 6. Multiplexer output test results. The input signal is "01100101" and the output rate is 1.7Gbit/s.

## References

- [1] B. Chang et al., "A 1.2 GHz CMOS dual-modulus prescaler using new dynamic D-type flip-flops," *IEEE J. Solid-State Circuits*, vol. 31, pp. 749-752, May 1996.
- [2] J. Yuan and C. Svensson, "High Speed CMOS Circuit Technique," *IEEE J. Solid-State Circuits*, vol. 24, pp. 62-70, Feb. 1989.
- [3] F.L. Romão et al., "1.2Gb/s SONET/SDH demux in CMOS technology," in *Proc. SBMO/IEEE MTT-S International Microwave and Optoelectronics Conference*, Rio de Janeiro, July 1995, vol. 1, pp. 52-57.
- [4] F.L. Romão et al., "Design of SONET/SDH 8:1MUX circuit at 1.25Gb/s rates in 0.7µm CMOS technology," in *Proc. IX Simp. Brasileiro de Concepção de Circuitos Integrados*, Recife, Brazil, Mar. 1996, pp. 201-211 (in Portuguese).
- [5] M. Kurisu et al., "2.8-Gb/s 176-mW byte-interleaved and 3.0-Gb/s 118-mW bit-interleaved 8:1 multiplexers with a 0.15-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 31, pp. 2024-2029, Dec. 1996.
- [6] Y. Ohtomo et al., "A 40Gb/s 8x8 ATM switch LSI using 0.25µm CMOS/SIMOX," in *ISSCC Dig. Tech. Papers*, Feb. 1997, pp. 154-155.
- [7] J. Navarro et al., "A High Speed CMOS ECL Compatible Input Circuit," in *Proc. X Congress of the Brazilian Microelectronics Society and Ibero American Microelectronics Conference*, Brazil, July 1995, pp. 197-204.
- [8] J. Navarro et al., "A 1.4Gbit/s CMOS driver for 50Ω ECL systems," in *Proc. 7th Great Lakes Symposium on VLSI*, Illinois, Mar. 1997, pp. 14-18.
- [9] J. Navarro and W. Van Noije, "E-TSPC: Extended True Single Phase Clock CMOS circuit technique," in *VLSI: Integrated Systems on Silicon*, ed. R. Reis and L. Claesen, London: Chapman & Hall, 1997, pp. 165-176.
- [10] J. Navarro and W. Van Noije, "CMOS Tapered Buffer Design for Small Width Clock/Data Signals Propagation," to be presented in the *8th Great Lakes Symposium on VLSI*, Louisiana, Feb. 1998.
- [11] R.S. Muller and T.I. Kamins, *Device electronics for integrated circuits*, 2<sup>nd</sup> ed., New York: John Wiley & Sons, 1986.