## 26.5 An 8-to-16Gb/s 0.65-to-1.05pJ/b 2-Tap Impedance-Modulated Voltage-Mode Transmitter with Fast Power-State Transitioning in 65nm CMOS

Young-Hoon Song<sup>1</sup>, Hae-Woong Yang<sup>1</sup>, Hao Li<sup>2</sup>, Patrick Yin Chiang<sup>2,3</sup>, Samuel Palermo<sup>1</sup>

<sup>1</sup>Texas A&M University, College Station, TX, <sup>2</sup>Oregon State University, Corvallis, OR, <sup>3</sup>Fudan University, Shanghai, China

Future processor I/Os must aggressively improve per-channel data-rates and energy efficiency to meet projected system bandwidth demands. These constraints necessitate the design of ultra-low-power serial-link transmitters that can efficiently incorporate equalization to compensate for channel losses, while enabling fast power-state transitioning to leverage dynamic power scaling. In this work, a scalable-data-rate voltage-mode transmitter is presented that introduces two main innovations. First, an impedance-modulated 2-tap equalizer is adopted that employs analog control of the equalizer taps, thereby obviating output driver segmentation. Second, fast power-state transitioning is achieved using a replica-biased voltage regulator to power the output stages of multiple channels and per-channel injection-locked oscillators (ILO) that can be rapidly disabled. Furthermore, capacitively driven low-swing global clock distribution and automatic phase calibration of the local ILO-generated quarter-rate clocks enables improved energy efficiency with aggressive supply scaling.

Figure 26.5.1 shows a conceptual diagram of our system, with 10 transmitter channels spanning across a 2mm distance. All transmitters share both a global regulator to set the nominal output swing, and two analog loops to set the driver output impedance during the maximum and de-emphasized levels of the 2-tap equalizer. In order to reduce dynamic clocking power, low-swing clocks are maintained throughout the global distribution and local generation of the four clocks used by the guarter-rate transmitters. A differential guarter-rate clock is distributed globally in a repeater-less manner via capacitively driven low-swing wires, while ac-coupled inverters with resistive feedback locally buffer these distributed clocks. Each ac-coupled inverter locally drives a two-stage ILO, producing four quadrature clocks that are shared by a two-channel bundle. ILO quadrature output phase spacing is improved by AC-coupling the injection clocks, adding dummy injection buffers, and optimizing the locking range via digital control of the injection-buffer drive strength. The ILO operating frequency is coarsely controlled via a dedicated power supply and finely set using the analog control voltage, EN\_VCTL. This analog control voltage can also be rapidly switched between GND and its nominal value, enabling fast power-up/shut-down of the clock signals.

For each transmitter channel (Fig. 26.5.2), eight parallel input bits are first serialized by an initial 8:4 mux, followed by two sets of 4:1 muxes that drive the main- and post-cursor paths of the 2-tap impedance-modulated voltage-mode driver. The serialized data passes through a level-shifting pre-driver [1] that boosts the voltage swing by a fully scalable supply value, DVDD, above the nominal NMOS threshold voltage, enabling reduced transistor sizing for a given impedance value. Closed-loop phase calibration is implemented to correct for deterministic jitter that arises due to the quadrature clocks' static phase errors and duty-cycle distortion. In calibration mode, the transmitter output for two complementary fixed patterns is sampled with a comparator clocked by an asynchronous 100MHz signal. First, the duty cycle is corrected by comparing the count value obtained for a "1100" output pattern and its complement, followed by an FSM that adjusts the P/N strength of the local clock buffers. Second, quadrature correction is realized by using a "1010" pattern and its complement, with the FSM then adjusting the relative delay of the buffers through capacitive tuning. At 16Gb/s operation, enabling this entire phase calibration loop improves the eye width variation from an uncorrected 13.1% to 5.4%, limited by nonlinearities in the duty-cycle tuning range.

The transmitter exhibits two operating modes to provide transmitter equalization at higher data rates, while dramatically scaling energy efficiency at lower data rates when equalization is not required. In equalization mode, a new impedance-modulation technique [2] is introduced in the all-NMOS output stage (Fig. 26.5.3). During a transition bit, the maximum output swing is achieved with nearly a 50 $\Omega$  output impedance, when both the higher-impedance single-transistor and 50 $\Omega$  two-transistor paths are activated in parallel. Analog control

of the impedance values allows for a non-segmented output stage, dramatically reducing pre-driver complexity and resulting in significant power savings. A global impedance loop sets the control voltages, VzceqUP and VzceqDN, of the top/bottom transistors of the two-transistor paths, realizing a  $50\Omega$  output impedance when combined in series with the two transistors that are switched by the main and post-cursor data bits. The area overhead of this effective three-transistor stack is minimized because the switch transistors see a large level-shifted overdrive voltage, VLS=DVDD+V<sub>thn</sub>, when turned on. Only the single-transistor pull-up/pull-down path is activated for run-lengths greater than one, with the de-emphasis level set by the control voltages, VzmeqUP and VzmeqDN, provided by the global de-emphasis impedance modulation loop. In non-equalization mode, the output stage is placed in a standard configuration with a single series impedance-control transistor in the pull-up/pull-down paths, where the control voltages, VzcUP and VzcDN, are provided by the global impedance control voltages.

A global voltage regulator with a replica output stage load (Figure 26.5.3) sets the output swing value and the transmitter output supply, VREG, enables amortization of the regulator power while providing a stable bias signal for fast power-state transitioning. A dual-supply topology is employed for the global regulator to improve accuracy and reduce power. The nominal 1V supply allows for a higher error amplifier gain, while a low 0.5V source-follower output stage provides a tunable output swing from 100 to  $300mV_{ppd}$ . 2.9ns power-state transitioning is achieved on a per-channel basis through staggered switching of the output-stage decoupling capacitance. Delayed enabling of the output-stage decoupling capacitance by ~550ps allows for rapid charging of VREG and minimal charge sharing when the decoupling capacitance is reapplied.

Figure 26.5.7 shows a die micrograph of the transmitter, fabricated in a GP 65nm CMOS process. While chip area constraints prevent a full 10-channel prototype, the concept is accurately emulated by placing a two-transmitter bundle at the end of a snaked on-chip 2mm clock distribution. Each transmitter channel occupies 0.006mm<sup>2</sup>, and the combined area of the injection-locked oscillator, global impedance control and modulation loop, bias circuitry, and voltage regulator is 0.014mm<sup>2</sup>. A channel consisting of a 5.8" FR4 trace and a 0.6m SMA cable, with 15.5dB loss at 8GHz, is used to characterize the transmitter (Fig. 26.5.4). Low-frequency output patterns with a peak  $300 \text{mV}_{\text{ood}}$ output swing verify the equalizer functionality up to the maximum 12dB setting. The transmitter transient performance at a maximum 16Gb/s data rate is verified for 27-1 PRBS eye diagrams, where a previously near-closed eye is opened to a 55mV height and 33.4ps width when the impedance-modulation equalization is enabled. As shown in Fig. 26.5.5, the transmitter achieves 8-to-16Gb/s operation at 0.65 to 1.05pJ/b energy efficiency by optimizing the transmitter's scalable supply and output swing for a minimum  $50mV_{\text{ppd}}$  eye height and 0.5UI eye width at the channel output. By routing the internal power-state-enable signal off-chip using a delay-matched channel, rapid power-state transitioning is verified with 0.5ns disable and 2.9ns enable times. Figure 26.5.6 shows the measured transmitter power breakdown and compares this work with other voltage-mode transmitters that incorporate either 2-tap equalization [2-3] or fast power-state transitioning [4-6].

## Acknowledgment:

This work was supported by the SRC-TxACE under grant 1836.060, Intel Labs Wireline Signaling Program, and a Department of Energy Early CAREER grant.

## References:

Y.-H. Song, *et al.*, "A 0.47-0.66 pJ/bit, 4.8-8 Gb/s I/O Transceiver in 65nm CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 5, pp. 1276-1289, May 2013.
R. Sredojevic, *et al.*, "Fully Digital Transmit Equalizer with Dynamic Impedance Modulation," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 1857-1869, Aug. 2011.

[3] Yue Lu, *et al.*, "Design and Analysis of Energy-Efficient Reconfigurable Pre-Emphasis Voltage-Mode Transmitter," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1898-1909, Aug. 2013.

[4] B. Leibowitz, *et al.*, "A 4.3 GB/s Mobile Memory Interface With Power-Efficient Bandwidth Scaling," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 889-898, April 2010.

[5] J. Żerbe, *et al.*, "A 5.6 Gb/s 2.4 mW/Gb/s Bidirectional Link With 8ns Power-On," *in Symp. VLSI Circuits Dig. Tech. Papers*, pp. 82-83, June 2011.

[6] F. O'Mahony, *et al.*, "A 47×10Gb/s 1.4 mW/Gb/s Parallel Interface in 45nm CMOS," in *ISSCC Dig. Tech. Papers*, pp. 156-157, Feb. 2010.



Figure 26.5.5: Transmitter under fast power-down and start-up; energy efficiency versus data-rate with eye diagrams after 5.8" FR4+ 0.6m SMA cable.

Figure 26.5.6: Measured transmitter power breakdown at 16Gb/s and comparison to other voltage mode transmitters with 2-tap equalization and fast power-state transitioning.



