## 12.5 A 7Gb/s 9.3mW 2-Tap Current-Integrating DFE Receiver

Matt Park<sup>1</sup>, John Bulzacchelli<sup>2</sup>, Michael Beakes<sup>2</sup>, Daniel Friedman<sup>2</sup>

<sup>1</sup>Massachusetts Institute of Technology, Cambridge, MA <sup>2</sup>IBM T.J. Watson, Yorktown Heights, NY

Serial chip-to-chip communications at multi-Gb/s line rates can be hindered by limited channel bandwidth. A powerful technique for equalizing such a channel is decision-feedback equalization (DFE). The specific focus of this work is the development of a DFE with power efficiency high enough for server applications requiring thousands of I/Os per chip.

A DFE often consumes a large portion of the receiver power budget due to the fast settling time needed at the DFE summing node where ISI cancellation occurs. In a typical analog current summer (Fig. 12.5.1, top), a dominant RC time-constant is formed by the load resistance and wiring and parasitic capacitances from the input stage, feedback taps, and slicer. The time constant for settling can be decreased by reducing the load resistance. However, to meet amplifier gain and voltage-swing requirements, the current must increase to compensate, resulting in a power penalty.
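The trade-off described above can be sketched with a first-order model. The snippet below is illustrative only: it assumes a single-pole summing node that must settle to within a chosen error inside one UI, and that the bias current must develop a fixed swing across the load resistor. The function name, the 5% settling error, and the 50fF and 200mV example values are assumptions, not figures from this work.

```python
import math

def summer_current(c_node, ui, v_swing, settle_error=0.05):
    """Bias current of a resistively loaded summer (first-order sketch).

    Assumes the node settles as exp(-t / (R * C)) and must reach
    `settle_error` of its final value within one UI.
    """
    n_tau = math.log(1.0 / settle_error)   # time constants needed to settle
    r_load = ui / (n_tau * c_node)         # largest load resistance allowed
    return v_swing / r_load                # current needed to keep the swing

# Hypothetical example values: 50 fF node capacitance, 200 mV swing.
i_7g = summer_current(50e-15, 1 / 7e9, 0.2)
i_14g = summer_current(50e-15, 1 / 14e9, 0.2)
```

In this model, halving the UI (or doubling the node capacitance) halves the allowed load resistance and therefore doubles the required current, which is the power penalty the text refers to.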

This paper describes a DFE circuit that achieves low power by replacing the standard summing amplifier with a resettable integrator (Fig. 12.5.1, bottom), which has no settling time requirement. At the beginning of the integration period, the capacitor voltages have been reset to the power supply. During the integration period (one UI long), charge is integrated on the capacitors and represents the sum of the currents from the input stage and the current switches that implement the DFE feedback (H1, H2, ..., HN). In contrast to the integrating DFE described in [1], the DFE feedback is implemented with switched currents instead of a fixed switched capacitor. Since the tap weights can be set by adjusting the tail currents with current DACs, this approach makes it easy to add tunability, which was not present in the design in [1]. The resistively degenerated input stage is designed to achieve a gain of roughly one with less than 1dB of compression for a $600\,\text{mV}_{\text{pp}}$ differential input. At the end of the integration period, the capacitor voltages are sampled by a slicer and then reset by PMOS switches. Note that either CMOS- or CML-level clock signals can be used, although CML clocks (as used in this prototype) require larger PMOS devices to achieve low on-resistance and fast reset time. Since charge is integrated onto parasitic capacitances only, large signals can be generated with small integrator bias currents due to the large I/C ratio. The technique is readily expandable to DFEs with more taps. While the extra capacitance incurred by adding more taps and wiring results in an I/C gain error, this can be compensated by slightly increasing the integrator currents, with a small cost in power dissipation.
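The relation between bias current, capacitance, and integrated voltage follows from $\Delta V = I \cdot T / C$ for a constant current integrated over one UI. A minimal sketch, assuming an ideal capacitor and a constant current; the 50fF parasitic capacitance and 200mV drop used in the example are hypothetical values chosen for illustration:

```python
def integrated_drop(i_bias, t_int, c_load):
    """Voltage change on an ideal capacitor after integrating a constant current."""
    return i_bias * t_int / c_load

def required_current(v_drop, t_int, c_load):
    """Constant current needed to produce `v_drop` in one integration period."""
    return v_drop * c_load / t_int

UI_7G = 1 / 7e9        # one unit interval at 7 Gb/s, about 143 ps
C_PARASITIC = 50e-15   # assumed parasitic capacitance (not from the paper)

i_needed = required_current(0.2, UI_7G, C_PARASITIC)   # roughly 70 uA
```

Because the current scales linearly with capacitance in this model, extra capacitance from additional taps is offset by a proportional increase in integrator current.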

Because the I/C ratio can vary due to process variations, and the integration time varies with data rate, the output common-mode voltage at the end of the integration period can be poorly controlled, potentially causing integrator nonlinearity. To compensate for this variability, the integrator bias currents are tuned using a calibration circuit (Fig. 12.5.2). The calibration is performed with a duplicate integrator with loads that are matched to the integrators in the signal path. Since only a common-mode signal is desired, the gates of the input stage are tied together, the degeneration resistor is eliminated, and the tap weight currents are set to zero. The common-mode drop is compared to a reference (200mV below the supply) with a clocked comparator. The comparator output is monitored, and the bias currents adjusted until the comparator operates close to metastability.
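One way to picture this calibration is as an up/down search on a bias-current code. The sketch below is a behavioral illustration only: the text does not specify the search algorithm, and the step size, code range, and capacitance are assumed values. The replica integrator is modeled as an ideal $I \cdot T / C$ drop, and the comparator as an ideal threshold against the 200mV reference.

```python
def calibrate_bias(c_load, t_int, v_ref_drop=0.2, i_lsb=2e-6,
                   code_max=63, steps=200):
    """Bang-bang search for the bias code whose common-mode drop is near the reference.

    Hypothetical model: one code step changes the bias current by `i_lsb`.
    """
    code = 0
    history = []
    for _ in range(steps):
        drop = code * i_lsb * t_int / c_load   # replica integrator output drop
        too_large = drop > v_ref_drop          # clocked comparator decision
        if too_large:
            code = max(0, code - 1)
        else:
            code = min(code_max, code + 1)
        history.append(code)
    return code, history

# Assumed 50 fF load at 7 Gb/s; the ideal current in this model is about 70 uA.
final_code, trace = calibrate_bias(50e-15, 1 / 7e9)
```

Once the search converges, the code toggles between two adjacent values, the behavioral counterpart of the comparator operating close to metastability.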

A block diagram of the complete current-integrating DFE is shown in Fig. 12.5.3. Since light to moderate equalization is needed for the intended application, a 2-tap equalizer is deemed sufficient for this prototype. As in prior work [2], a half-rate architecture employing speculation (loop unrolling) is adopted to relax the timing requirements of the logic and current switches in the feedback loop. The half-rate architecture has an added bonus in that it allows for the integrator bias currents to be shared by the odd and even halves: while one half is resetting, the other half is integrating. This reduces the power dissipation of the integrators by a factor of 2.
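Speculation can be illustrated with a behavioral model: two decisions are made for each bit, one for each possible value of the previous bit, and the correct one is selected once that bit is known. The sketch below is not the circuit of Fig. 12.5.3; it assumes a noiseless channel with unit main cursor and two post-cursors, speculation on H1 only, direct feedback for H2, and ignores the odd/even interleaving, which does not change the decisions. All names and the tap values 0.7 and 0.4 are hypothetical.

```python
def slicer(v):
    """Binary decision: +1 for a positive input, -1 otherwise."""
    return 1 if v > 0 else -1

def channel(bits, h1, h2, init=(1, 1)):
    """Received samples with two post-cursor ISI terms."""
    prev1, prev2 = init
    out = []
    for b in bits:
        out.append(b + h1 * prev1 + h2 * prev2)
        prev1, prev2 = b, prev1
    return out

def dfe_speculative(samples, h1, h2, init=(1, 1)):
    """2-tap DFE with a speculated first tap and a directly fed-back second tap."""
    d1, d2 = init
    decisions = []
    for y in samples:
        base = y - h2 * d2                 # H2 feedback uses a known decision
        if_prev_pos = slicer(base - h1)    # speculate previous bit = +1
        if_prev_neg = slicer(base + h1)    # speculate previous bit = -1
        d = if_prev_pos if d1 > 0 else if_prev_neg   # select once d1 is known
        decisions.append(d)
        d1, d2 = d, d1
    return decisions

bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1]
rx = channel(bits, 0.7, 0.4)
raw = [slicer(y) for y in rx]              # no equalization: eye is closed
eq = dfe_speculative(rx, 0.7, 0.4)         # equalized decisions
```

With these tap values the unequalized slicer makes errors on this pattern, while the speculative DFE recovers the transmitted bits. The H1 selection is only a multiplexer decision, which is what relaxes the timing of the feedback loop.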

The H1 and H2 tap coefficients as well as an offset-cancellation current at the integrator outputs are tuned via on-chip thermometer-coded DACs with 4b resolution (programmed through a serial interface). The half-rate clocks are derived from an external full-rate clock via an on-chip divide-by-2 circuit. The prototype DFE is implemented in a 90nm bulk CMOS technology; a die micrograph with layouts of all major blocks superimposed can be seen in Fig. 12.5.7. The area of the DFE core (integrators, latches, clock buffers) is $85 \times 65\,\mu\text{m}^2$, while the total DFE area is $290 \times 65\,\mu\text{m}^2$ (dominated by DC biasing and DAC circuitry with non-optimized layouts).
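As a brief illustration of thermometer coding, the sketch below maps a 4b code onto 15 equal unit cells, so that each code step turns on exactly one additional cell. The one-cell-per-step arrangement and the unit current are assumptions made for illustration; the actual DAC implementation is not described here.

```python
def thermometer(code, bits=4):
    """Convert a binary code to a thermometer code of 2**bits - 1 unit cells."""
    n_cells = (1 << bits) - 1
    if not 0 <= code <= n_cells:
        raise ValueError("code out of range")
    return [1 if i < code else 0 for i in range(n_cells)]

def dac_current(code, i_unit, bits=4):
    """Output current: number of enabled unit cells times the unit current."""
    return sum(thermometer(code, bits)) * i_unit

cells = thermometer(3)   # three cells on, twelve off
```

Since consecutive codes differ by a single enabled cell, the output current in this model is monotonic in the code.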

To evaluate the inherent speed and equalization capability of the DFE, the prototype is tested with data transferred over a low-loss SMA cable (easy channel) and over a 16-inch Tyco channel with high levels of reflections and attenuation (hard channel). The DFE has an input sensitivity of $63\,\text{mV}_{\text{pp}}$ differential at 7Gb/s and achieves a peak speed of 8Gb/s over the easy channel (SMA cable). The speed is limited by degraded clock amplitudes, which are most likely due to parasitic capacitances that are higher than had been assumed in pre-extracted simulations. Evidence for this limitation is that increasing the clock driver bias resulted in higher-speed operation.

An S-parameter plot of the Tyco channel is shown in Fig. 12.5.4; the channel loss is 12dB at 3GHz and 15dB at 3.5GHz. With a 7Gb/s PRBS-7 pattern, the input-data eye is closed, but the DFE can equalize the channel with a 45% horizontal eye opening while achieving error-free operation (BER $< 10^{-13}$) at eye center. The regenerated data and clock output waveforms at 7Gb/s as well as a bathtub curve detailing the eye opening as a function of BER are shown in Fig. 12.5.5. With a more complex PRBS-31 pattern (producing more severe combinations of ISI), the DFE can equalize the Tyco channel up to 6Gb/s, with 18% horizontal eye opening. A summary of measured results is shown in Fig. 12.5.6. The weak dependence of power dissipation on data rate indicates that the receiver power is dominated by the fixed-current CML stages and not the variable-current integration stages. When equalizing the Tyco channel at 7Gb/s, the DFE consumes 9.3mW from a 1V supply. Together, the 4 integrators draw <700µA, which illustrates the power savings achieved with current integration on parasitics. For comparison, these integrators consume over an order of magnitude less power than the resistively loaded summers used in [2]. Even if the number of taps were increased to 5, to match that in [2], most of the power savings would remain.
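The figures quoted above can be restated as energy per bit and as the integrators' share of total power. The short calculation below uses only the reported 9.3mW, 7Gb/s, 1V, and 700µA values; it assumes the integrators run from the same 1V supply, and since 700µA is an upper bound, the resulting share is an upper bound as well. The function names are illustrative.

```python
def energy_per_bit_pj(power_w, rate_bps):
    """Energy per bit in picojoules."""
    return power_w / rate_bps * 1e12

def power_fraction(current_a, supply_v, total_power_w):
    """Fraction of total power drawn by a block with the given supply current."""
    return current_a * supply_v / total_power_w

e_bit = energy_per_bit_pj(9.3e-3, 7e9)            # about 1.33 pJ/bit
int_share = power_fraction(700e-6, 1.0, 9.3e-3)   # below about 7.5%
```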

## Acknowledgments:

This work was supported by MPO contract H98230-04-C-0920.

## References:

[1] S. Bae, H. Chi, Y. Sohn, et al., "A 2 Gb/s 2-Tap DFE Receiver for Multi-Drop Single-Ended Signaling Systems with Reduced Noise," *ISSCC Dig. Tech. Papers*, pp. 244-245, Feb., 2004.

[2] T. Beukema, M. Sorna, K. Selander, et al., "A 6.4 Gb/s CMOS SerDes Core with Feedforward and Decision-Feedback Equalization," *IEEE J. Solid-State Circuits*, pp. 2633-2645, Dec., 2005.



