# A True Single-Phase 8-bit Adiabatic Multiplier

Suhwan Kim, T. J. Watson Research Center, IBM Research Division, Yorktown Heights, NY 10598, suhwan@us.ibm.com

Conrad H. Ziesler, EECS Department, University of Michigan, Ann Arbor, MI 48109, cziesler@eecs.umich.edu

Marios C. Papaefthymiou, EECS Department, University of Michigan, Ann Arbor, MI 48109, marios@eecs.umich.edu

# ABSTRACT

This paper presents the design and evaluation of an 8-bit adiabatic multiplier. Both the multiplier core and its built-in self-test logic have been designed using a true single-phase adiabatic logic family. Energy is supplied to the adiabatic circuitry via a sinusoidal power-clock waveform that is generated on-chip. In HSPICE simulations with post-layout extracted parasitics, our design functions correctly at clock frequencies exceeding 200 MHz. The total dissipation of the multiplier core and self-test circuitry approaches 130pJ per operation at 200MHz. Our 11,854-transistor chip has been fabricated in a  $0.5\mu$ m standard CMOS process with an active area of 0.470mm<sup>2</sup>. Correct chip operation has been validated for operating frequencies up to 130MHz, the limit of our experimental setup. Measured dissipation correlates well with HSPICE simulations.

# Categories and Subject Descriptors

B.7.1 [Hardware]: Integrated Circuits—*Types and Design Styles*; B.8.1 [Hardware]: Performance and Reliability—*Reliability, Testing, and Fault-Tolerance*; B.8.m [Hardware]: Performance and Reliability—*Miscellaneous* 

# General Terms

Design, Measurement, Verification

# Keywords

Adiabatic logic, Clock generator, CMOS, Dynamic logic, Low power, Low energy, SCAL, SCAL-D, Single phase, Multiplier, VLSI

# 1. INTRODUCTION

In conventional CMOS design, charge is transferred during the course of computation between circuit capacitances and fixed power supply voltages. Consequently, the energy consumption of CMOS circuitry per cycle is proportional to the product $CV^2$, where $C$ is the total switched capacitance and $V$ is the difference between the power and ground voltages. Adiabatic circuitry presents a promising alternative to this approach. The main idea behind adiabatic design is to transfer charge between circuit capacitances and a time-varying power-clock node. This scheme enables the charge transfers to occur in a controlled manner, limiting the currents and thus the dissipation across the active devices. Any undissipated energy stored in circuit capacitance is recycled through an inductor or a network of switched capacitors [1, 3, 17]. Thus, adiabatic circuitry can potentially achieve sub-$CV^2$ energy dissipation per cycle.

Copyright 2001 ACM 1-58113-297-2/01/0006 ...\$5.00.
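As a rough illustration of why controlled charge transfer can beat $CV^2$, the sketch below uses the standard first-order textbook approximation for charging a capacitance $C$ through a switch resistance $R$: an abrupt voltage step dissipates $\tfrac{1}{2}CV^2$ per transition, whereas a linear ramp of duration $T \gg RC$ dissipates roughly $(RC/T)\,CV^2$. The component values are hypothetical and not taken from our chip, and SCAL-D uses a sinusoidal rather than a linear ramp, so this is only an order-of-magnitude guide.

```python
def conventional_energy(c_farads: float, v_volts: float) -> float:
    """Energy dissipated in the switch resistance when C is charged by a voltage step."""
    return 0.5 * c_farads * v_volts ** 2


def adiabatic_energy(c_farads: float, v_volts: float, r_ohms: float, t_ramp: float) -> float:
    """First-order dissipation for a linear supply ramp of duration t_ramp >> R*C."""
    if t_ramp <= r_ohms * c_farads:
        # Reject ramps that are clearly not slow; the approximation would be meaningless.
        raise ValueError("approximation assumes t_ramp >> R*C")
    return (r_ohms * c_farads / t_ramp) * c_farads * v_volts ** 2


if __name__ == "__main__":
    # Hypothetical values: 100 fF node, 3 V swing, 1 kOhm on-resistance, 2.5 ns ramp.
    C, V, R, T = 100e-15, 3.0, 1e3, 2.5e-9
    e_step = conventional_energy(C, V)
    e_ramp = adiabatic_energy(C, V, R, T)
    print(f"step: {e_step * 1e15:.1f} fJ, ramp: {e_ramp * 1e15:.1f} fJ")
```

With these assumed values the ramp dissipates about 12.5 times less than the step, and the gap widens linearly as $T$ grows relative to $RC$.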

Over the past decade, several adiabatic circuit topologies have been proposed with very promising energetics at relatively low clock rates [2, 9, 11, 12, 13, 17]. Their use at high clock frequencies poses several practical challenges, however, including the implementation of complex control schemes, the distribution of multiple clock phases, and the management of data-dependent clock capacitance fluctuations [15].

We recently presented an adiabatic logic family with simple clocking requirements, specifically geared towards high-speed design [6]. Our logic family relies on a single phase of a sinusoidal power-clock to provide both control and energy to the circuitry. Simulation results with our single-phase adiabatic logic have been very encouraging, indicating correct and low-energy operation at high frequencies.

To demonstrate the robustness, efficiency, and practicality of our single-phase adiabatic family, we used one of its members, called SCAL-D, to design an 8-bit unsigned multiplier. Our chip included built-in self-test logic, an integrated resonant clock generator, and circuits for converting between adiabatic and CMOS signaling conventions. With its 11,854 transistors, our design was sufficiently large and complex to enable a thorough exploration of several issues that are central to SCAL-D design in particular and adiabatic chip design in general [7]. This paper describes our chip and its empirical evaluation, including results for both the multiplier core and the clock generator.

HSPICE simulations of our multiplier with post-layout extracted parasitics demonstrate its correct operation across a broad range of frequencies. Our design dissipates less energy than a voltage-scaled, pipelined, static CMOS multiplier that we designed for comparison. While operating in self-test mode at a clock rate of 100MHz, our adiabatic multiplier dissipates approximately 91pJ per operation with a 2.2V peak supply. At 200MHz, it is roughly 4 times more energy efficient than its CMOS counterpart, dissipating only 130pJ per operation with a 2.7V peak supply. These efficiencies were obtained without any optimization tools and despite a conservative design approach that was aimed primarily at obtaining a working chip and thus left substantial energy optimization potential unexploited.
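The per-operation energies above translate into average power via $P = E_{op} \cdot f$, assuming that the fully pipelined multiplier completes one operation every clock cycle (as it does in self-test mode, where a new pattern enters each cycle). A minimal sketch of the unit conversion:

```python
def average_power_mw(energy_pj: float, freq_mhz: float) -> float:
    """Average power in mW, assuming one operation completes every clock cycle."""
    # pJ * MHz = 1e-12 J * 1e6 1/s = 1e-6 W = 1e-3 mW
    return energy_pj * freq_mhz * 1e-3


for e_pj, f_mhz in [(91.0, 100.0), (130.0, 200.0)]:
    print(f"{e_pj:.0f} pJ/op at {f_mhz:.0f} MHz -> {average_power_mw(e_pj, f_mhz):.1f} mW")
```

These derived figures (about 9.1 mW and 26 mW) follow directly from the reported energies and are not separate measurements.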

Our multiplier was fabricated in a standard 3-metal, 1-poly, $0.5\mu$m CMOS process through MOSIS. We have experimentally validated the correct operation of our chip at frequencies up to 130MHz, limited by the bandwidth of the off-chip interface. Moreover, we have obtained measurements of its power dissipation which correlate well with simulation results under identical operating conditions. The correct operation of the integrated clock generator at frequencies over 140MHz has also been experimentally validated, although a similar measurement bandwidth problem prevented accurate combined power measurements at these frequencies.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DAC 2001, June 18-22, 2001, Las Vegas, Nevada, USA.

The remainder of this paper has five sections. The architecture of the 8-bit multiplier, associated self-test circuitry, and internal power-clock generator is described in Section 2. Section 3 provides an overview of our design process. Section 4 presents a simulation-based comparison of our adiabatic multiplier with corresponding static CMOS designs. Section 5 provides results from the testing and experimental evaluation of our fabricated chip. Our contributions and ongoing research are summarized in Section 6.

# 2. MULTIPLIER



Figure 1: Block diagram of test chip.

A block diagram of our design is given in Figure 1. Our chip includes two 8-bit unsigned multiplier cores with built-in self-test logic, a single-phase power-clock generator, and adiabatic-to-digital converters to enable the observation of critical signals. The multiplier core and self-test circuitry were implemented entirely in SCAL-D [7]. Our design was conservative, with the objective of achieving correct operation and competitive energy efficiency at operating frequencies exceeding 200 MHz at 3.0V. Approximately 75% of its 11,854 transistors make up the multiplier core, with the remaining 25% devoted primarily to the self-test circuitry. In a  $0.5\mu$ m standard CMOS process, total design area is approximately 0.710mm<sup>2</sup> (= 0.829mm × 0.857mm), including the multiplier core of 0.470mm<sup>2</sup> (= 0.781mm × 0.607mm). The multiplier and testing logic have latencies of 15 and 4 cycles, respectively.

## 2.1 Partial Product Summation Cell

Two basic operations are performed during multiplication: evaluation of the partial products and accumulation of the shifted partial products. The scheme used to compute and schedule these two operations directly affects the complexity, performance, and power dissipation of the resulting multiplier structure [10].

Since each SCAL-D gate combines both logic and state-holding functions, we chose to implement a fully pipelined carry-save multiplier architecture. The most critical and complex of the cells that comprise this multiplier is the parallel multiplication cell [16].

Figure 2: (a) Schematic and (b) layout of multiplier cell in SCAL-D.

The schematic in Figure 2(a) shows how the SCAL-D gates in this cell are supplied by a sinusoidal power-clock PC, DC supplies $V_{dd}$ and $V_{ss}$, and DC bias voltages $V_{bn}$ and $V_{bp}$. For clarity, the figure omits the evaluation trees that compute an AND function and a 1-bit full adder function. This partial product cell contains the equivalent of 3 bits' worth of state distributed among 7 SCAL-D gates that use 85 transistors in total. An equivalent static CMOS implementation would require 28 transistors for the logic and 3 flip-flops at roughly 24 transistors each, for a total of 100 transistors. Even if latches are used instead of flip-flops, static CMOS would still require 80 transistors (assuming that 6 static latches replace the 3 flip-flops).
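To make the cell's function concrete, the following behavioral sketch models carry-save accumulation of partial products using the two functions named above: an AND gate forming each partial-product bit and a 1-bit full adder. It is a functional model only, assuming unsigned operands; it ignores pipelining, SCAL-D's dual-rail signaling, and the exact cell placement of our array, and the function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def carry_save_multiply(a: int, b: int, n: int = 8) -> int:
    """Unsigned n x n multiply by carry-save accumulation of partial products."""
    width = 2 * n
    sums = [0] * width      # sum bit held at each column weight 2**k
    carries = [0] * width   # carry bit held at each column weight 2**k
    for j in range(n):                  # one row of cells per multiplier bit b_j
        b_j = (b >> j) & 1
        new_sums = [0] * width
        new_carries = [0] * width
        for k in range(width):
            i = k - j                   # multiplicand bit that lands in column k
            pp = ((a >> i) & 1) & b_j if 0 <= i < n else 0   # AND: partial-product bit
            s, c = full_adder(pp, sums[k], carries[k])
            new_sums[k] = s
            if k + 1 < width:           # carry moves one column left; for n-bit operands
                new_carries[k + 1] = c  # a carry out of the top column cannot occur
        sums, carries = new_sums, new_carries
    # Final carry-propagate (vector-merging) addition of the sum and carry vectors.
    result, carry = 0, 0
    for k in range(width):
        s, carry = full_adder(sums[k], carries[k], carry)
        result |= s << k
    return result


print(carry_save_multiply(200, 173) == 200 * 173)
```

Keeping sums and carries in separate vectors until the final merge is what lets each row avoid a long carry chain, which is why the architecture pipelines naturally with one level of logic per stage.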

## 2.2 Self-Test Logic



Figure 3: 16-stage BILBO in SCAL-D.

Our self-test logic is based on Koenemann's built-in logic block observer (BILBO) [4, 8], which uses a linear feedback shift register that can function either as a pseudorandom pattern generator or as a signature analyzer. We implemented our self-test logic entirely in SCAL-D, thus enabling full-speed testing without any synchronization issues. A logic diagram for the SCAL-D BILBO is shown in Figure 3. Since each SCAL-D gate implicitly contains a state element, no flip-flops are used. To generate maximum-length sequences, the primitive polynomial $1 + x + x^7 + x^{10} + x^{16}$ is used for the linear feedback shift register, which is active when the BILBO is in self-test mode. In normal operation mode, the BILBO is transparent, acting merely as a set of latches. In self-test mode, BILBO-1 and BILBO-2 in Figure 1 are configured as a pseudorandom pattern generator and a multiple input signature analyzer, respectively, whose output sequences ix and ox can be used to infer the correct operation of the entire multiplier.
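The two BILBO roles can be sketched in software with a 16-bit Galois-form LFSR. In this sketch, the pattern generator multiplies its state by $x$ modulo $p(x) = x^{16} + x^{10} + x^{7} + x + 1$ (the polynomial quoted above, written in descending powers), and the signature analyzer additionally XORs each 16-bit response word into the state. The Galois form, the bit ordering, and the helper names are assumptions for illustration; the SCAL-D BILBO of Figure 3 may realize the same polynomial with a different register arrangement, and Python's `*` stands in for the multiplier under test. Because $p(0) = 1$, multiplication by $x$ is invertible modulo $p$, so any single corrupted response word is guaranteed to change the final signature.

```python
POLY = 0x10483   # x^16 + x^10 + x^7 + x + 1 (bit k = coefficient of x^k)
MASK = 0xFFFF


def lfsr_step(state: int) -> int:
    """Pattern-generator step: state * x mod p(x) in a 16-bit Galois LFSR."""
    state <<= 1
    if state & 0x10000:          # x^16 term appeared: reduce modulo p(x)
        state ^= POLY
    return state & MASK


def misr_step(state: int, word: int) -> int:
    """Signature-analyzer step: shift as above, then fold in a 16-bit response word."""
    return lfsr_step(state) ^ (word & MASK)


def run_bist(seed: int, cycles: int, corrupt_at: int = -1) -> int:
    """Drive a stand-in 8x8 multiplier from the PRPG and compress its products."""
    prpg, signature = seed, 0
    for t in range(cycles):
        prpg = lfsr_step(prpg)
        a, b = prpg & 0xFF, prpg >> 8      # split the 16-bit pattern into two operands
        product = a * b
        if t == corrupt_at:
            product ^= 1 << 5              # inject a single-bit fault for illustration
        signature = misr_step(signature, product)
    return signature


good = run_bist(seed=0xACE1, cycles=1000)
print(hex(good), run_bist(seed=0xACE1, cycles=1000, corrupt_at=500) != good)
```

If $p(x)$ is primitive, as stated above, the pattern generator visits all $2^{16} - 1$ nonzero states before repeating; a loop over `lfsr_step` from any nonzero seed can confirm the period.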

## 2.3 Power Clock

The voltages $V_{dd}$, $V_{ss}$, and $V_{PC}$ are supplied to each SCAL-D gate. Although distributing the power-clock can be a difficult problem for some adiabatic logic families, our SCAL-D circuits are naturally resistant to most of these distribution problems. Two major issues are associated with the distribution of a sinusoidal power-clock signal. The first issue is the $i(t) \cdot R$ voltage drop on the power-clock distribution network. SCAL-D circuits are relatively immune to this drop, because it occurs mostly during the rising and falling edges, when the circuits are actively driving their outputs. The $i(t) \cdot R$ voltage drop can be reduced even further by using wider clock distribution wires. Because only a single power-clock needs to be distributed, widening its wires costs less wire area than it would in multiphase systems.

The second issue with sinusoidal clock distribution is the unknown or variable (data-dependent) capacitance attached to the power-clock. Since the power-clock is part of a resonant network, the operating frequency varies with load capacitance. The load capacitance attached to the power-clock node has components that are data-dependent (such as load differences between the true and false data rails) and data-independent (such as the clock distribution wires). For example, the capacitance of our power-clock distribution tree contributes about 57% of the total capacitance loading of the power-clock node. As a result, in our SCAL-D circuits, the relative magnitudes of the data-independent and data-dependent capacitances limit the variation in power-clock current to an insignificant 5%. The resulting slight shifts in operating frequency can be readily tolerated, as there are no phase relationships to maintain. In addition, this data-dependent capacitance effect can be further minimized by routing dual-rail signals along similar paths, a layout strategy that also improves noise immunity.
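A first-order estimate shows why the fixed wiring capacitance dilutes data-dependent frequency shifts. For an ideal $LC$ tank with total power-clock load $C_{PC} = C_{\text{fixed}} + C_{\text{data}}$,

$$f_0 = \frac{1}{2\pi\sqrt{L\,C_{PC}}}, \qquad \frac{\Delta f_0}{f_0} \approx -\frac{1}{2}\,\frac{\Delta C_{PC}}{C_{PC}} = -\frac{1}{2}\,\frac{C_{\text{data}}}{C_{PC}}\,\frac{\Delta C_{\text{data}}}{C_{\text{data}}}$$

With $C_{\text{fixed}} \approx 0.57\,C_{PC}$, the data-dependent share is at most about $0.43$, so a hypothetical 10% swing in $C_{\text{data}}$ would move $C_{PC}$ by about 4.3% and $f_0$ by only about 2.2%. Because the 57% figure counts only the distribution tree, treating all of the remaining capacitance as data-dependent makes this a pessimistic estimate.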

An internal single-phase clock generator was designed using the topology shown in Figure 4(a). The resonant *LC* system is composed of an off-chip inductor and the total on-chip capacitive load. This simple harmonic oscillator is pumped using a zero-voltage switching scheme with MOSFET switches S1 and S2. The PMOS switch (S1) is turned on at the peak of the sinusoid, when the voltage difference between $V_{dd}$ and $V_{PC}$ is nearly zero. The NMOS switch (S2) is turned on at the negative peak of the sinusoid, when the voltage difference between $V_{PC}$ and $V_{ss}$ is nearly zero. This scheme minimizes both conduction losses and switching losses in the devices. Conduction losses are minimized because the current each device has to conduct is bounded by $I_{switch} < (2E_{loss,half-cycle}/L_{resonant})^{1/2}$. Switching losses are minimized since the energy stored in the parasitic source-to-drain capacitance is nearly zero when the voltage difference between $V_{dd}$ and $V_{PC}$ or between $V_{PC}$ and $V_{ss}$ is zero.
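The tank relations behind this scheme can be sketched numerically. The resonant frequency of the inductor and on-chip load follows $f_0 = 1/(2\pi\sqrt{LC})$, and one way to read the switch-current bound is as an energy balance: each switch only has to restore the energy lost in a half cycle, and if all of that energy were delivered as inductor current, $\tfrac{1}{2}L I^2 = E_{loss,half-cycle}$ would give the stated limit. The 50 pF load and 50 pJ loss below are hypothetical placeholders, not the chip's actual values.

```python
import math


def resonant_frequency(l_henry: float, c_farad: float) -> float:
    """Resonant frequency of an ideal LC tank, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))


def inductance_for(f_hz: float, c_farad: float) -> float:
    """Inductance needed to resonate c_farad at f_hz (ideal tank, no parasitics)."""
    return 1.0 / ((2.0 * math.pi * f_hz) ** 2 * c_farad)


def max_switch_current(e_loss_half_cycle: float, l_henry: float) -> float:
    """Current bound from 0.5 * L * I^2 = E_loss per half cycle."""
    return math.sqrt(2.0 * e_loss_half_cycle / l_henry)


if __name__ == "__main__":
    c_load = 50e-12                              # hypothetical total power-clock load
    l_tank = inductance_for(140e6, c_load)
    print(f"L = {l_tank * 1e9:.1f} nH, f0 = {resonant_frequency(l_tank, c_load) / 1e6:.1f} MHz")
    print(f"I_switch < {max_switch_current(50e-12, l_tank) * 1e3:.1f} mA")  # hypothetical 50 pJ
```

In a real board, bond-wire and package inductance add to the off-chip inductor, so the discrete part would be chosen smaller than this ideal total.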



Figure 4: (a) Power conversion topology. (b) Block diagram of control logic.

Switches S1 and S2 are driven by a small control circuit as outlined in Figure 4(b). A 3-element differential ring oscillator generates pulses shown as signal *i*. The pulse width and frequency are tuned by adjusting the two bias voltages labeled $V_{bn}$ and $V_{bp}$ in the figure. An asynchronous state machine steers alternate pulses on *i* to *a* and *b* as inverted pulses, preserving the pulse width and halving the frequency. The pulses on *a* and *b* are amplified to drive the gates of the PMOS and NMOS MOSFETs represented by switches S1 and S2, respectively. This control circuit is very small and efficient, and it allows frequency and duty-cycle adjustments large enough to match the resonant LC system. In addition, the chosen resonant topology is well matched to the largely capacitive loads presented by the SCAL-D adiabatic logic, and it incorporates the bond-wire and package parasitic inductances and capacitances into the tank circuit.
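The steering performed by the asynchronous state machine can be modeled at the behavioral level: each pulse on *i* is routed alternately to *a* and *b*, so each output carries pulses of the original width at half the rate. This sketch treats signals as sampled 0/1 lists, omits the pulse inversion and the amplifying drivers, and uses hypothetical function names; it is not a description of the transistor-level circuit.

```python
def steer_pulses(i_samples: list[int]) -> tuple[list[int], list[int]]:
    """Route successive pulses on i alternately to outputs a and b."""
    a, b = [], []
    to_a = True
    prev = 0
    for x in i_samples:
        if prev and not x:      # falling edge: the current pulse has ended
            to_a = not to_a     # send the next pulse to the other output
        a.append(x if to_a else 0)
        b.append(0 if to_a else x)
        prev = x
    return a, b


def count_pulses(samples: list[int]) -> int:
    """Number of 0 -> 1 transitions (a pulse starting at index 0 also counts)."""
    prev, n = 0, 0
    for x in samples:
        if x and not prev:
            n += 1
        prev = x
    return n


i = [0, 1, 1, 0] * 6            # six pulses, each two samples wide
a, b = steer_pulses(i)
print(count_pulses(i), count_pulses(a), count_pulses(b))
```

Because a pulse is never routed to both outputs, the PMOS and NMOS switches in this model are never commanded on at the same time.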

# 3. DESIGNING IN SCAL-D

Our methodology differs from conventional design practices in several respects. First, each gate is inherently a combinational circuit plus a state element, so each pipeline stage includes only one level of logic. Second, since every signal is in phase with the power-clock signal, timing analysis is replaced with analog signal integrity analysis; phase delays are thus interpreted as a reduction in signal amplitude. Third, since each gate often needs only minimum-sized evaluation transistors, fanout load is determined by the logic being implemented and by wire length.

Our design commenced with a Verilog behavioral multiplier as a reference model. A structural model was developed in Verilog using a library of modules modeling the logic functions of our adiabatic gates. This structural model was simulated thoroughly in the digital domain. Next, an unsized transistor-level subcircuit of each adiabatic gate was drawn. These subcircuits were connected together into a hierarchical transistor netlist describing the adiabatic multiplier.

As the design progressed into layout, HSPICE simulations were run on the adiabatic subcircuits. The trace data from these simulations were post-processed by a custom tool which verified that each gate's input and output voltage waveforms corresponded to correct logical evaluation. The output of this tool was a list of failing gates along with layout coordinates, thus enabling us to rapidly diagnose and correct failures resulting from subtle analog considerations.

Verification was primarily done using a custom tool that processed extracted netlists and HSPICE simulation data, verifying both the basic topology of each gate and the noise margins and correct operation of each gate in the simulation. Thus we were able to rapidly verify correct operation with reasonable simulation times. The clock generator was simulated separately, using an estimated total capacitance and a series resistance modeling the total losses. Package and pin parasitics were taken from data provided by MOSIS for their 40-pin ceramic DIP package.
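The kind of waveform check such a tool performs can be illustrated with a simplified sketch. Here we assume, hypothetically, that each dual-rail gate output is judged at the peaks of the sampled power-clock, where the rail that should be high must exceed a threshold `v_high` and the complementary rail must stay below `v_low`. The thresholds, the sampling point, and the function names are illustrative assumptions, not the rules our tool actually applied.

```python
import math


def peak_indices(v_pc: list[float]) -> list[int]:
    """Sample indices where the power-clock waveform reaches a local maximum."""
    return [k for k in range(1, len(v_pc) - 1)
            if v_pc[k - 1] < v_pc[k] >= v_pc[k + 1]]


def check_dual_rail(v_pc, v_true, v_false, expected, v_high, v_low):
    """Return (cycle, high_rail_v, low_rail_v) for every cycle that fails."""
    failures = []
    for cycle, k in enumerate(peak_indices(v_pc)[:len(expected)]):
        if expected[cycle]:
            hi, lo = v_true[k], v_false[k]
        else:
            hi, lo = v_false[k], v_true[k]
        if hi < v_high or lo > v_low:    # insufficient swing, or noise on the idle rail
            failures.append((cycle, hi, lo))
    return failures


# Synthetic trace: 4 power-clock cycles, 40 samples each, 0 to 3 V swing.
SPC = 40
v_pc = [1.5 - 1.5 * math.cos(2 * math.pi * k / SPC) for k in range(4 * SPC)]
bits = [1, 0, 1, 1]
v_true = [v if bits[k // SPC] else 0.0 for k, v in enumerate(v_pc)]
v_false = [0.0 if bits[k // SPC] else v for k, v in enumerate(v_pc)]
v_true[2 * SPC + SPC // 2] = 2.0        # weaken the true rail at cycle 2's peak
print(check_dual_rail(v_pc, v_true, v_false, bits, v_high=2.4, v_low=0.6))
```

Reporting the failing cycle together with both rail voltages mirrors the goal described above: pointing directly at the gate and the analog condition that caused the failure.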

# 4. SIMULATION RESULTS



Figure 5: Energy consumption per cycle vs. frequency for multipliers including self-test logic.

In this section we present a simulation-based comparative evaluation of our adiabatic multiplier and corresponding static CMOS designs. Three pipelined CMOS multipliers with latency 2, 4, and 8 cycles, respectively, were synthesized using a library of standard cells for the same  $0.5\mu$ m process in which we fabricated our multiplier. Flip-flops were used as the state elements. The EPOCH design automation tool was used to generate the layouts with the OPTIMIZE = low power flag set. The static CMOS designs used 5,146, 6,518, and 9,926 transistors for the 2-stage, 4-stage, and 8-stage pipelines, respectively.

Figure 5 gives the energy consumption per cycle of our multipliers with associated self-test logic when operating at 50MHz, 100MHz, and 200MHz. For each operating frequency, the energy dissipation of each multiplier was obtained using the lowest supply voltage that ensured its correct operation at that frequency. The value of the supply voltage is shown next to each data point.

SCAL-D is more energy efficient than the pipelined static CMOS designs across the entire frequency range of our simulations, despite the fact that we did not optimize the transistor sizes in our adiabatic design. At 50MHz, the energy consumption of SCAL-D is comparable to that of a pipelined, voltage-scaled static CMOS design. At 100MHz and 200MHz, SCAL-D becomes substantially more efficient than the pipelined static CMOS designs. Furthermore, anecdotal evidence indicates that SCAL-D has room for further improvement in both energy consumption and performance. Thus, SCAL-D presents a promising approach to reducing dissipation below that of static CMOS designs that have reached their voltage scaling limits.

It is difficult to compare our results directly with published designs as the technology, bit width, target frequency, and simulation methodologies vary widely. For completeness, however, we provide the following information. The serial-parallel locally-clocked dynamic-logic multiplier of [5], which was designed in 1.0  $\mu$ m CMOS for a target throughput of 82.5MHz (at a bitrate of 660MHz), uses 1,086 transistors and dissipates 5,520pJ per cycle for 8 × 8 multiplication. The low-power 8-bit pipelined multiplier using pulse-triggered flip-flops in [14] contains 3,849 transistors and dissipates 195 pJ per multiply at 300MHz in 0.6 $\mu$ m CMOS.

# 5. TEST AND MEASUREMENT



Figure 6: Microphotograph of test chip.

To demonstrate the viability of single-phase adiabatic design, we set out to determine experimentally if SCAL-D was robust enough to function correctly in the presence of factors that are typically poorly accounted for in simulations. These factors include noise, dielectric loss, process variation, latch-up, package parasitics, and thermal effects. A die photo of our working multiplier chip is shown in Figure 6.

Testing and measuring the 8-bit SCAL-D multiplier and its self-test logic had three objectives:

- Verify the complete functionality of the multiplier.
- Compare the power consumption of the chip with HSPICE simulation results for identical operating conditions.
- Validate the operation of the integrated clock generator under the influence of real parasitics.

## 5.1 Functional Testing

For testing purposes, we connected switches to the configuration inputs of the SCAL-D self-test logic to initialize the system in self-test mode. In this mode, BILBO-1 and BILBO-2 in Figure 1 are configured as a pseudorandom pattern generator and a multiple input signature analyzer, respectively. The output sequences of BILBO-1 and BILBO-2 are then converted to standard static CMOS logic levels and observed on the output pins to infer the correct operation of the entire multiplier.

Figure 7 shows the measured waveforms of the SCAL-D multiplier in self-test mode. The logic was supplied by the sinusoidal power-clock $V_{PC}$ with a peak-to-peak amplitude of 3V and by the 3V DC supply voltage $V_{dd}$. The operating frequency of the power-clock was 130 MHz. We verified the functional correctness of the stored waveforms over $50\,\mu$s, the limit of the digital oscilloscope memory. The output sequences of BILBO-1 and BILBO-2 are fed through the adiabatic-to-digital converters before being buffered and output to the pads. The 40-pin ceramic DIP package and loading by the oscilloscope probes limited our measurement bandwidth, as can be observed in the waveforms of Figure 7.



Figure 7: Measured waveforms of multiplier operating at 130 MHz in self-test mode: Ch1 through Ch4 show the power-clock  $V_{PC}$ , the BILBO control signal S2, the pattern generator sequence ix, and the signature analyzer sequence ox, respectively.



Figure 8: Setup for dissipation measurements.

## 5.2 Power Measurement

Figure 8 gives a schematic diagram of the experimental setup we used for measuring the power dissipation of the IC chips. Measurements were taken on a 4-channel Tektronix digitizing oscilloscope (TDS754D) with high-speed active probes. The AC current measurements were made using a 1.0 GHz Tektronix differential active probe. To obtain accurate power measurements, we monitored the waveforms of the DC currents IVDD, IPC, IBN and IBP, the AC current  $i_{PC}$ , the DC voltages  $V_A$ ,  $V_{BN}$ , and  $V_{BP}$ , the AC voltage with DC bias  $V_{PC}$ , and the AC voltage without DC bias  $v_{PC}$ . We then calculated the per-cycle energy consumption of our multiplier chip,  $E_{cycle}$ , using the equation:

$$E_{\text{cycle}} = \int_{0}^{N \cdot T} \left( \begin{array}{c} (I_{\text{VDD}} \cdot V_{\text{dd}}/2 + (I_{\text{VDD}} + I_{\text{PC}}) \cdot V_{\text{dd}}/2) \\ + (v_{\text{PC}} \cdot i_{\text{PC}}) + (V_{\text{BP}} \cdot I_{\text{BP}}) + (V_{\text{BN}} \cdot I_{\text{BN}}) \end{array} \right) dt/N$$

where N is the number of measured cycles, and T is cycle time.
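Numerically, the integral can be evaluated from the digitized oscilloscope records with the trapezoidal rule. The sketch below mirrors the terms of the equation as written, assuming all channels are sampled on a common uniform time grid spanning exactly $N$ cycles; the dictionary keys and the synthetic test waveforms are hypothetical, not our measured data or the oscilloscope's file format.

```python
import math


def trapezoid(y: list[float], dt: float) -> float:
    """Trapezoidal integral of uniformly spaced samples y with spacing dt."""
    return dt * (sum(y) - 0.5 * (y[0] + y[-1]))


def energy_per_cycle(ch: dict[str, list[float]], v_dd: float,
                     period: float, n_cycles: int) -> float:
    """E_cycle from waveforms sampled uniformly over 0..N*T, term by term as in the equation."""
    n_samples = len(ch["I_VDD"])
    dt = n_cycles * period / (n_samples - 1)
    power = [
        ch["I_VDD"][k] * v_dd / 2                        # I_VDD * V_dd/2
        + (ch["I_VDD"][k] + ch["I_PC"][k]) * v_dd / 2    # (I_VDD + I_PC) * V_dd/2
        + ch["v_pc"][k] * ch["i_pc"][k]                  # AC power-clock term
        + ch["V_BP"][k] * ch["I_BP"][k]                  # PMOS bias term
        + ch["V_BN"][k] * ch["I_BN"][k]                  # NMOS bias term
        for k in range(n_samples)
    ]
    return trapezoid(power, dt) / n_cycles


# Synthetic check: 10 cycles at 100 MHz, 1000 samples per cycle (hypothetical values).
T, N, SPC = 1 / 100e6, 10, 1000
M = N * SPC + 1
theta = [2 * math.pi * k / SPC for k in range(M)]
ch = {
    "I_VDD": [1e-3] * M, "I_PC": [2e-3] * M,
    "v_pc": [1.5 * math.sin(t) for t in theta],
    "i_pc": [0.01 * math.sin(t) for t in theta],
    "V_BP": [0.0] * M, "I_BP": [0.0] * M, "V_BN": [0.0] * M, "I_BN": [0.0] * M,
}
print(f"{energy_per_cycle(ch, v_dd=3.0, period=T, n_cycles=N) * 1e12:.1f} pJ")
```

For these synthetic inputs the DC terms contribute 60 pJ per cycle and the in-phase sinusoids contribute $\tfrac{1}{2}(1.5\,\text{V})(10\,\text{mA})T = 75$ pJ, so the sketch reports 135 pJ; integrating over whole cycles keeps the trapezoidal rule accurate for the periodic AC term.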

We measured the energy consumption per cycle of the multiplier and its self-test logic up to 130 MHz. Figure 9(a) shows the energy dissipation of the multiplier and the self-test logic in the frequency range of 40-130MHz with various PMOS and NMOS biasing voltages, for a fixed supply voltage and power-clock amplitude of 3V.



Figure 9: (a) Measured dissipation and (b) relative error with respect to simulation-based estimates. Horizontal axis: operating frequency (MHz), 40 to 130 MHz.

We compared these measurements with the HSPICE simulation results for the same operating frequencies, amplitude of sinusoidal power-clock, constant supply voltage, and PMOS/NMOS biasing voltages. The relative error between our simulations and experimental measurements is shown in Figure 9(b).

Measured dissipation correlated well with the HSPICE results. Figure 9(b) shows that below 100 MHz, the relative difference in per-cycle energy consumption between the TDS754D measurements and the HSPICE simulations is less than 20%. Above 100MHz, the bandwidth limitations and combined parasitics of our test setup dramatically reduce power measurement accuracy.
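For reference, a relative error of the kind plotted in Figure 9(b) can be computed per operating point as $|E_{meas} - E_{sim}| / E_{sim}$, taking the HSPICE estimate as the reference; the use of the magnitude is our assumption. The triples in this sketch are placeholders, not our measured or simulated data, and the function names are hypothetical.

```python
def relative_error_pct(e_meas: float, e_sim: float) -> float:
    """Percent deviation of a measurement from its simulation-based estimate."""
    return 100.0 * abs(e_meas - e_sim) / e_sim


def within_tolerance(points, limit_pct=20.0, max_freq_mhz=100.0):
    """True if every point below max_freq_mhz deviates by less than limit_pct."""
    return all(relative_error_pct(m, s) < limit_pct
               for f, m, s in points if f < max_freq_mhz)


# Placeholder (frequency in MHz, measured pJ, simulated pJ) triples, not real data.
points = [(40, 50.0, 45.0), (70, 60.0, 55.0), (120, 90.0, 60.0)]
print(within_tolerance(points))
```

Restricting the check to frequencies below 100 MHz reflects the measurement-bandwidth limit described above rather than any property of the circuit.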

## 5.3 Clock Generator Testing


A separate test board was made for testing the clock generator, so that the parasitics associated with the power-clock node could be varied. We tried several different types of inductors, including a tunable choke, a surface mount inductor, a wire-wound resistor, and a short solder lead. Figure 10 shows the waveforms of the clock generator operating correctly at 140MHz, using a surface mount inductor. The large sinusoid is the power-clock signal with an amplitude of 3.1V. The two other waveforms are buffered versions of the signals driving the PMOS and NMOS power switches. Because of test equipment and bandwidth limitations, we were unable to check the operation of the clock generator and the multiplier simultaneously.

Figure 10: Measured operation of clock generator, showing *a*, *b*, and *PC* signals.

# 6. CONCLUSION

We presented the design and experimental evaluation of an 8-bit adiabatic multiplier with an internal single-phase sinusoidal power-clock generator, fabricated in a $0.5\mu$m standard CMOS process. To provide design-for-test capability, our chip included built-in self-test circuitry based on built-in logic block observation. Both the multiplier and the self-test circuitry have been designed in SCAL-D, an adiabatic logic family that operates with a single-phase sinusoidal power-clock.

Several CAD tools were developed to aid with the design verification of our adiabatic circuits. Our tools were primarily focused on performing analysis tasks such as validation of signal levels and translations of circuit descriptions at different abstraction levels. We had no automated support for tuning our designs by transistor sizing or optimal selection of biasing voltages.

The correct operation of our design was validated experimentally for operating frequencies up to 130MHz, limited primarily by our test environment. Moreover, dissipation measurements correlated well with HSPICE simulation results. Our results suggest that for throughput-intensive applications, the adiabatic family SCAL-D presents a viable and attractive alternative to static CMOS for low-energy, high-speed electronic design.

Future research directions include faster single-phase low-energy logic families, automated circuit optimizations, and low-energy bus protocols. In particular, algorithms for optimizing the performance of single-phase adiabatic families such as SCAL-D are an important and promising area of investigation.

# 7. ACKNOWLEDGMENTS

This research was supported in part by the US Army Research Office under Grant No. DAAD19-99-1-0304 and an AASERT Grant No. DAAG55-97-1-0250.

# 8. REFERENCES

- [1] W. C. Athas, L. J. Svensson, J. G. Koller, N. Tzartzanis, and Y. Chou. Low-power digital systems based on adiabatic-switching principles. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2(4):398–406, Dec. 1994.
- [2] W. C. Athas, N. Tzartzanis, L. Svensson, L. Peterson, H. Li, X. Jiang, P. Wang, and W.-C. Liu. AC-1: A clock-powered microprocessor. In *Proceedings of International Symposium on Low-Power Electronics and Design*, pages 18–20, 1997.
- [3] J. S. Denker. A review of adiabatic computing. In *Proceedings of the 1994 Symposium on Low Power Electronics/Digest of Technical Papers*, pages 94–97, Oct. 1994.
- [4] H. Fujiwara. *Logic Testing and Design for Testability*. MIT Press, 1985.
- [5] G. N. Hoyer and C. Sechen. Locally-clocked dynamic logic serial/parallel multiplier. In *Proceedings of the Custom Integrated Circuits Conference*, pages 481–484, 2000.
- [6] S. Kim and M. C. Papaefthymiou. True single-phase adiabatic circuitry. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 9(1), Feb. 2001.
- [7] S. Kim, C. H. Ziesler, and M. C. Papaefthymiou. Design, verification, and test of a true single-phase 8-bit adiabatic multiplier. In *Proceedings of 19th Conference on Advanced Research in VLSI*, Mar. 2001.
- [8] B. Koenemann, J. Mucha, and G. Zwiehoff. Built-in logic block observation techniques. In *Proceedings of the 1979 Test Conference*, pages 37–41, 1979.
- [9] A. Kramer, J. S. Denker, B. Flower, and J. Moroney. 2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits. In *1995 International Symposium on Low Power Design*, pages 191–196, 1995.
- [10] C. F. Law, S. S. Rofail, and K. S. Yeo. A low-power 16 × 16-b parallel multiplier utilizing pass-transistor logic. *IEEE Journal of Solid-State Circuits*, SC-34(10):1395–1399, Oct. 1999.
- [11] D. Maksimovic, V. G. Oklobdzija, B. Nikolic, and K. W. Current. Clocked CMOS adiabatic logic with integrated single-phase power-clock supply. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 8(4):460–463, Aug. 2000.
- [12] Y. Moon and D. Jeong. An efficient charge recovery logic circuit. *IEEE Journal of Solid-State Circuits*, SC-31(4):514–522, Apr. 1996.
- [13] V. G. Oklobdzija and D. Maksimovic. Pass-transistor adiabatic logic using single power-clock supply. *IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing*, 44(10):842–846, Oct. 1997.
- [14] J. Wang, P. Yang, and D. Sheng. Design of a 3-V 300-MHz low-power 8-b multiplied by 8-b pipelined multiplier using pulse-triggered TSPC flip-flops. *IEEE Journal of Solid-State Circuits*, SC-35(4):583–592, Apr. 2000.
- [15] P. Wayner. Silicon in reverse. *BYTE*, 19:67–71, Aug. 1994.
- [16] N. H. E. Weste and K. E. Eshraghian. *Principles of CMOS VLSI Design: A Systems Perspective*. Addison-Wesley Publishing Company, 1993.
- [17] S. G. Younis and T. F. Knight. Practical implementation of charge recovering asymptotically zero power CMOS. In *Proceedings of 1993 Symposium on Integrated Systems*, pages 234–250, 1993.