# A 22-Gb/s PAM-4 Receiver in 90-nm CMOS SOI Technology

Thomas Toifl, Member, IEEE, Christian Menolfi, Member, IEEE, Michael Ruegg, Member, IEEE, Robert Reutemann, Member, IEEE, Peter Buchmann, Marcel Kossel, Member, IEEE, Thomas Morf, Member, IEEE, Jonas Weiss, Member, IEEE, and Martin L. Schmatz, Member, IEEE

Abstract—We report a receiver for four-level pulse-amplitude modulated (PAM-4) encoded data signals, which was measured to receive data at 22 Gb/s with a bit error rate (BER)  $< 10^{-12}$  at a maximum frequency deviation of 350 ppm and a  $2^7 - 1$  PRBS pattern. We propose a bit-sliced architecture for the data path, and a novel voltage shifting amplifier to introduce a programmable offset to the differential data signal. We present a novel method to characterize sampling latches and include them in the data path. A current-mode logic (CML) biasing scheme using programmable matched resistors limits the effect of process variations. The receiver also features a programmable signal termination, an analog equalizer and offset compensation for each sampling latch. The measured current consumption is 207 mA from a 1.1-V supply, and the active chip area is 0.12 mm<sup>2</sup>.

*Index Terms*—CML, CMOS, digital communication, latch, PAM-4, receiver, serial links, SOI.

## I. INTRODUCTION

Signaling over short distances, e.g., several centimeters on a multi-chip module (MCM) or between closely spaced single-chip modules (SCM) on a PCB, is limited by the number of available IO pins. For a given band-limited channel, PAM-4 signaling [1]–[3] can be used to achieve higher data rates per pin at acceptable cost in chip area and power consumption.

The first PAM-4 receiver implementation for multigigabit communication in CMOS [1] achieved a data rate of 8 Gb/s in a 0.3-$\mu$m CMOS technology using a 3-V supply. That design incorporates an analog fractionally spaced single-tap feed-forward equalizer in the receiver. The circuit in [2] targeted the use of PAM-4 signaling for backplane communication at a data rate of 5 Gb/s. In order to open the data eye, the authors propose the use of a feed-forward equalizer (FFE) in the transmitter together with a coding scheme that avoids the largest transitions in the PAM-4 eye. The receiver circuit in [3] achieves a data rate of 10 Gb/s with a 0.13-$\mu$m CMOS technology. A combination of transmit equalization and

Manuscript received September 5, 2005; revised December 19, 2005.

M. Ruegg and R. Reutemann are with Miromico AG, Zürich, Switzerland (e-mail: Michael.Ruegg@miromico.com; Robert.Reutemann@miromico.com).

Digital Object Identifier 10.1109/JSSC.2006.870898

[Figure 1 block diagram. Labels recovered from the figure: programmable termination + ESD; 22-Gb/s PAM-4 data in; V-shifting amplifier with offsets $+\Delta V$ and $-\Delta V$; three sampling + 1:8 demux blocks with data outputs $dh_{0-7}$, $dm_{0-7}$, $dl_{0-7}$ and edge outputs $eh$, $em_{0-7}$, $el$; PRBS checker (error output); CDR logic (1.38-GHz clock); phase generator ($V_{cont}$, $t_{ref}$); phase rotator (11 GHz); divide-by-2 I/Q generator (5.5 GHz).]

Fig. 1. Architecture of the PAM-4 receiver.

decision feedback equalization (DFE) in the receiver is used to cope with channel attenuation and reflections.

A PAM-4 transmitter at even higher data rates was demonstrated using a 90-nm CMOS technology [4]. The design of the analog receiver circuits with these technologies is becoming challenging, mostly due to the small supply voltages and available voltage margins. This receiver design benefits from several key features. First, we propose a novel voltage-shifting amplifier, which allows programmable equalization in the data path. The biasing concept applied in all current-mode logic (CML) stages uses adjustable unit resistor cells, which allows reducing the effect of process variations. A fast and supply-insensitive differential-to-CMOS clock converter allows a rapid transition from the CML to the full-swing CMOS clock domain. Also, since timing margins are very small, a fully differential CML-style clock path is implemented to achieve low jitter. A single phase rotator running at the baud rate, followed by an I/Q divider, provides low-jitter clocks with high accuracy.

## II. RECEIVER ARCHITECTURE

For the implementation of the proposed PAM-4 receiver, a bit-slice approach was taken as shown in Fig. 1. The differential data inputs *din* and *dinb* are terminated to the supply via programmable termination resistors. Three identical demux slices sample the input data, one for each vertical offset voltage, as shown in Fig. 2. Each demux slice consists of a voltage shifting amplifier, two 1:4 demuxes (one for the data and one for the edge samples), and two 4:8 demuxes. Due to this architecture

T. Toifl, C. Menolfi, P. Buchmann, M. Kossel, T. Morf, J. Weiss, and M. Schmatz are with the IBM Zürich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland (e-mail: tto@zurich.ibm.com; cme@zurich.ibm.com; pbu@zurich.ibm.com; mko@zurich.ibm.com; tmr@zurich.ibm.com; jwe@zurich.ibm.com; mrt@zurich.ibm.com).



Fig. 2. PAM-4 eye diagram and the required sampling points.



Fig. 3. Existing solutions for voltage shifting amplifiers.

the voltage shifting operation can be shared by both the data and edge bit samplers. The three demux slices receive two half-rate (5.5 GHz) differential clock phases I and Q, where the I and Q phases are used for the edge and data samples, respectively. Synchronization signals originating in the middle slice are shared by the blocks to avoid phase ambiguity in the locally divided clocks of the three slices. A phase rotator running at the symbol rate (11 GHz) provides a programmable phase shift for the clock signal, which is fed to a divide-by-2 prescaler to generate the quadrature clock phases. The phase rotator requires six clock phases, which are generated in a voltage-controlled delay line.

Each demux slice delivers eight data bits and eight edge bits at its output. Hence, a total number of  $3 \times 16$  bits is fed to a digital logic block, which is running at 1/8 of the symbol rate. The data is first decoded according to the bit assignment of Fig. 2, where a binary enumeration of the levels is used for simplicity. The decoded data is fed to a pseudo-random bit sequence (PRBS)



Fig. 4. Schematic of the proposed voltage shifting amplifier.



Fig. 5. Transfer function of the voltage shifting amplifier without (solid) and with (dashed) capacitive source degeneration.

checker, which outputs a combined error signal for eight symbols. The sampled edge and data bits enter the clock and data recovery (CDR) unit which controls the phase rotator.
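The decoding step can be illustrated with a minimal sketch. It assumes that the three slices deliver a thermometer code (one decision per threshold) and that level $k$ is enumerated as the binary value $k$; the function names and argument order are hypothetical, and the actual bit assignment is the one defined in Fig. 2.

```python
def decode_pam4(dh, dm, dl):
    """Map three slicer decisions (1 = input above threshold) to (msb, lsb).

    Assumes dl, dm, dh come from the lower, middle and upper thresholds,
    so a valid input is a thermometer code: 000, 001, 011 or 111.
    """
    level = dl + dm + dh          # number of thresholds exceeded: 0..3
    return level >> 1, level & 1  # binary enumeration of the level


def decode_word(dh_bits, dm_bits, dl_bits):
    """Decode one demultiplexed word (eight symbols per slice) to 16 bits."""
    out = []
    for dh, dm, dl in zip(dh_bits, dm_bits, dl_bits):
        out.extend(decode_pam4(dh, dm, dl))
    return out
```

Summing the three decisions rather than using a lookup table keeps the sketch short; a real decoder may treat invalid thermometer codes differently.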

## III. CIRCUIT IMPLEMENTATION

#### A. Receiver Front-End

The differential data signal passes two differential amplifier stages before it is sampled in the latches of the receiver front-end. The first stage is a voltage shifting amplifier, which is used to add a programmable offset voltage. It also provides equalization by capacitive source degeneration [5], which can be switched either on or off to adapt for different channel transfer functions. In order to achieve acceptable common-mode and supply-noise rejection and to cope with the small supply voltage and high bandwidth requirements we propose a novel topology. The second amplification stage also features programmable equalization and in addition provides offset correction for the sampling latches. Each of the  $3 \times 4$ latches is fed by a separate amplification stage.

The requirements on the voltage shifting amplifier are the following. Besides providing the necessary voltage shift, gain and bandwidth, it should also offer programmable equalization with a transfer function independent of the setting of the offset voltage, as well as acceptable supply and common-mode voltage rejection. Fig. 3 displays previously used topologies.



Fig. 6. Combined effect of voltage shift and equalization. Eye diagram at the input (left) and output (right) of the amplifier.

The circuit in Fig. 3(a) is a double differential stage, which sums the input voltage and a constant offset voltage generated in a voltage DAC. The circuit in Fig. 3(b) is a double-mismatched differential pair [6], [7]. Here, the voltage offset is controlled by changing the ratio of the bias currents.

While the structure in Fig. 3(a) has excellent common-mode and supply-noise rejection, it displays lower DC gain since only half the current is used in the input transistors. Also, its bandwidth is lower since the second differential pair adds capacitance to the output node. Additionally, the voltage DACs require a large chip area. Fig. 4 depicts the proposed voltage shifting amplifier. It is based on the principle that the effective width of the transistor in the differential stage can be programmed by a digital value. While transistors  $M_{\rm 0b}$  to  $M_{\rm 7b}$  are connected to the input voltage, transistors  $M_{0a}$  to  $M_{7a}$  act as switches, controlled by binary weights w<0:7>. Hence, the circuit consisting of  $M_{0a,b}$  to  $M_{7a,b}$  can be regarded as a single transistor where the effective width can be adjusted. In the opposite branch on the right side of Fig. 4 all transistors are always switched on. The proposed circuit has similar gain, bandwidth and common-mode rejection as the topology in Fig. 3(b). In contrast to this topology, however, the proposed structure can be easily combined with capacitive source degeneration to extend the bandwidth of the amplifier or to provide equalization. This is due to the fact that in the proposed circuit the zero generated by the source degeneration is independent of the applied voltage shift.

The difference in the effective transistor size leads to a programmable voltage offset  $\Delta V$ , which—neglecting to first order the influence of the source degeneration resistors—is approximately given by

$$\Delta V = \left(V_{gs} - V_t\right) \left[\sqrt{\frac{w_{\max}}{w_x}} - 1\right] \tag{1}$$

where  $V_{gs} - V_t$  is the gate overdrive of the maximum width transistor,  $w_{\text{max}}$  is the maximum gate width (all sub-transistors switched on), and  $w_x$  is the programmed gate width.
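As a quick numerical illustration of (1), the following sketch evaluates the offset for a given programmed width. The function name and the 0.2-V gate overdrive in the usage note are assumptions chosen for illustration, not values from the design; like (1), the sketch neglects the source degeneration resistors.

```python
import math


def offset_voltage(v_ov, w_max, w_x):
    """First-order offset from (1): v_ov * (sqrt(w_max / w_x) - 1).

    v_ov  : gate overdrive V_gs - V_t of the maximum-width transistor [V]
    w_max : gate width with all sub-transistors switched on
    w_x   : programmed gate width (same unit as w_max)
    """
    if not 0 < w_x <= w_max:
        raise ValueError("programmed width must satisfy 0 < w_x <= w_max")
    return v_ov * (math.sqrt(w_max / w_x) - 1.0)
```

For example, with an assumed overdrive of 0.2 V, programming one quarter of the maximum width (`offset_voltage(0.2, 1.0, 0.25)`) yields an offset equal to the overdrive itself, since $\sqrt{4} - 1 = 1$.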

Fig. 5 displays the transfer function of the voltage shifting amplifier. The solid curve corresponds to the case without capacitive source degeneration, while the dashed curve represents



Fig. 7. Schematic used to derive small-signal gains.

the case when the amplifier is used in equalization mode. Fig. 6 displays simulated waveforms at the input and the output of the amplifier for the case of 200-mV voltage offset. For the simulation, the channel was described by measured S-parameter data of an 8-inch FR-4 board, together with 500 fF of load capacitance at the termination on both the transmit and receive side. Also, light transmit equalization with FFE coefficients $[-0.02,\ 0.9,\ -0.05,\ -0.03]$ was applied on the transmit side.

Since the amplifier is an asymmetric circuit, its response to changes in the supply and common-mode voltages has to be carefully analyzed. Referencing Fig. 7, the common-mode gain, differential gain, gain with respect to vdd, and ground, defined as

$$A_{cc} = \frac{vout}{(vin + vinb)/2} \tag{2}$$

$$A_{dd} = \frac{vout}{(vin - vinb)} \tag{3}$$

$$A_{vdd} = \frac{vout}{\Delta vdd} \tag{4}$$

$$A_{vss} = \frac{vout}{\Delta vss} \tag{5}$$



Fig. 8. Voltage shifting amplifier with improved common-mode rejection.

respectively, are given by

$$A_{cc} = \frac{R_L}{R_D} \left[ 2(\gamma_1 - \gamma_2) + g_{ds} \left[ R_L(\gamma_1 - \gamma_2) + (\gamma_1 R_{d2} - \gamma_2 R_{d1}) \right] + 2 \left[ \gamma_1 (1 + \gamma_2) \xi_2 - \gamma_2 (1 + \gamma_1) \xi_1 \right] \right] \tag{6}$$

$$A_{dd} = \frac{R_L}{R_D} \left[ \gamma_1 (1+\gamma_2)(1+\xi_2) + \gamma_2 (1+\gamma_1)(1+\xi_1) + \left(\frac{g_{ds}}{2}\right) \left[ \gamma_1 (R_L + R_{d2}) + \gamma_2 (R_L + R_{d1}) \right] \right] \tag{7}$$

$$A_{vdd} = \frac{R_L}{R_D} \left[ 2(\gamma_1 - \gamma_2) + g_{ds} \left[ \xi_2 (1 + \gamma_2) (R_L + R_{d1}) - \xi_1 (1 + \gamma_1) (R_L + R_{d2}) \right] + 2 \left[ \gamma_1 (1 + \gamma_2) \xi_2 - \gamma_2 (1 + \gamma_1) \xi_1 \right] \right] \tag{8}$$

$$A_{vss} = \frac{R_L}{R_D} \left[ g_{ds} \left[ (1 + \gamma_1 + \xi_1 + \gamma_1 \xi_1) (R_L + R_{d2}) - (1 + \gamma_2 + \xi_2 + \gamma_2 \xi_2) (R_L + R_{d1}) \right] \right] \tag{9}$$

where  $\gamma_1 = g_{m1}/g_{ds1}$ ,  $\gamma_2 = g_{m2}/g_{ds2}$ ,  $\xi_1 = g_{mx1}/g_{dsx1}$ ,  $\xi_2 = g_{mx2}/g_{dsx2}$ , and

$$R_D = (R_L + R_{d1})(1 + \gamma_2 + \xi_2 + \gamma_2\xi_2) + (R_L + R_{d2})(1 + \gamma_1 + \xi_1 + \gamma_1\xi_1) + g_{ds}(R_L + R_{d1})(R_L + R_{d2}) \tag{10}$$

$$\frac{1}{R_{d1}} = \frac{g_{ds1}}{1 + (g_{m1} + g_{ds1}) \left[ \left( R_0 + \frac{1}{g_{dsx1}} \right) + \frac{R_0 g_{mx1}}{g_{dsx1}} \right]} \tag{11}$$

$$\frac{1}{R_{d2}} = \frac{g_{ds2}}{1 + (g_{m2} + g_{ds2}) \left[ \left( R_0 + \frac{1}{g_{dsx2}} \right) + \frac{R_0 g_{mx2}}{g_{dsx2}} \right]}. \tag{12}$$

Since the input signals are referenced to $V_{dd}$, it is most important to guarantee sufficient immunity to supply noise on the ground node. The power supply rejection with respect to ground, $A_{dd}/A_{vss}$, is given by (13) and amounts to 20 dB in the current design for 200 mV of voltage shift. The common-mode rejection $A_{dd}/A_{cc}$ is given by (14). In the present implementation, the common-mode rejection is 14 dB, which might be too low for a link where the common-mode level is modulated for back-channel communication. The common-mode rejection can, however, be improved by 10 dB by matching the on-resistances of the switches in the variable-width transistor, as shown in Fig. 8. Here, the combined series resistance of the switch transistors $M_{0a}$-$M_{7a}$ in the left branch matches the combined series resistance of $M'_{0a}$-$M'_{7a}$ in the right branch. Since this results in a more balanced configuration ($\xi_1 = \xi_2$), the contribution of the third term in the denominator of (14) is minimized.

There exists one voltage shifting amplifier per slice, which feeds its output voltage into four second amplifiers, one for each of the  $2\times 2$  half-rate edge and data sampling latches per slice. The second amplification stage, displayed in Fig. 9, provides additional amplification and equalization, and also allows compensating the offsets of the individual sampling latches. This is

$$SNRG = \frac{\gamma_1(1+\gamma_2)(1+\xi_2) + \gamma_2(1+\gamma_1)(1+\xi_1) + \left(\frac{g_{ds}}{2}\right) \left[\gamma_1(R_L + R_{d2}) + \gamma_2(R_L + R_{d1})\right]}{g_{ds} \left[(1+\gamma_1 + \xi_1 + \gamma_1\xi_1)(R_L + R_{d2}) - (1+\gamma_2 + \xi_2 + \gamma_2\xi_2)(R_L + R_{d1})\right]} \tag{13}$$

$$CMRR = \frac{\gamma_1(1+\gamma_2)(1+\xi_2) + \gamma_2(1+\gamma_1)(1+\xi_1) + \left(\frac{g_{ds}}{2}\right) \left[\gamma_1(R_L+R_{d2}) + \gamma_2(R_L+R_{d1})\right]}{2(\gamma_1-\gamma_2) + g_{ds} \left[R_L(\gamma_1-\gamma_2) + (\gamma_1R_{d2} - \gamma_2R_{d1})\right] + 2\left[\gamma_1(1+\gamma_2)\xi_2 - \gamma_2(1+\gamma_1)\xi_1\right]} \tag{14}$$



Fig. 9. Schematic of the second amplification stage providing per-latch offset compensation.



Fig. 10. Proposed sampling latch model.

achieved by adding small currents into one of the differential branches. The offset compensation has a resolution of 3 bits plus 1 sign bit and covers a range of $\pm 25$ mV.

#### B. Sampling Latch Characterization

The sampling latch marks the interface between the analog and digital domains. It functions as a regenerative amplifier, which samples the input signal at a certain time instant and then decides if the voltage at its input is below or above a threshold voltage. For digital applications, latches are usually described by their setup and hold times together with the latch delay. For the case of a sampling latch for analog applications, such as serial link receivers, latches are characterized by their sensitivity and bandwidth.

In order to be able to include the latch characteristic in the transfer function of the data path and to accurately compare different latch types we propose to characterize latches by the model shown in Fig. 10. In the proposed model, the latch is divided into a linear front-end, which is described by a latch sensitivity function  $h_s(t)$ , followed by an ideal sampler and binary slicer. The symbol after the slicer  $s_n$ , multiplied by a voltage factor  $V_s$ , is fed back and shifts the threshold of the slicer. The voltage  $V_s$  is equivalent to the latch sensitivity for DC input signals. The sensitivity function  $h_s(t)$ , which is normalized such that

$$\int_{T_1}^{T_2} h_s(\tau) d\tau = 1 \tag{15}$$

defines the time resolution of the latch.  $T_1$  and  $T_2$  mark the limits of the sensitivity window. The normalized latch transfer function  $H_{s,n}(\omega)$  can then be derived by taking the Fourier transform of  $h_s(t)$ .



Fig. 11. Comparison of (a) CML-sampling latch and (b) SenseAmp-style latch for analog sampling.

The input signal $v(t)$ is convolved with $h_s(t)$, i.e., weighted by the sensitivity function and integrated over the sensitivity window, before being sampled at $t = kT$. The latch flips if

$$\int_{T_1}^{T_2} v(\tau) h_s(\tau) d\tau > V_s \tag{16}$$
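The decision rule of (15) and (16) can be sketched in discrete time as follows. This is an illustration under stated assumptions, not the extraction code used for the results: the sensitivity function is given as uniformly spaced samples, the integrals are approximated by Riemann sums, and the threshold feedback of Fig. 10 is interpreted as a symmetric hysteresis of $\pm V_s$ with states $-1$ and $+1$. All names are hypothetical.

```python
def normalize(h, dt):
    """Scale samples of h_s(t) so that their integral equals 1, as in (15)."""
    area = sum(h) * dt
    return [x / area for x in h]


def latch_decision(v, h_norm, dt, v_s, state=-1):
    """Return the new latch state (+1 or -1) after one sampling event.

    v      : input voltage samples inside the sensitivity window [V]
    h_norm : normalized sensitivity function samples [1/s]
    v_s    : DC sensitivity of the latch [V]
    state  : previous decision; it shifts the slicer threshold by +/- v_s
    """
    weighted = sum(vi * hi for vi, hi in zip(v, h_norm)) * dt
    if state < 0 and weighted > v_s:
        return 1       # input exceeded the threshold: latch flips up, cf. (16)
    if state > 0 and weighted < -v_s:
        return -1      # mirrored condition for the opposite state (assumed)
    return state       # input too weak: latch keeps its previous state
```

With a triangular placeholder window `normalize([0, 1, 2, 1, 0], 10e-12)` and $V_s = 2.6$ mV, a constant 3-mV input flips a latch that starts in state $-1$, whereas a constant 2-mV input leaves it unchanged.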

A measurement procedure to derive $h_s(t)$ from simulations is given in the Appendix. Using this procedure, the latch sensitivity function $h_s(t)$ and its DC sensitivity $V_s$ were extracted for the two choices of sampling latches shown in Fig. 11, a CML-type latch and a SenseAmp latch [8]. For a fair comparison, the entire CML data path was included in the simulation, which consists of two CML latches followed by a SenseAmp latch, as shown in Fig. 12. A clock rise time of 30 ps was assumed in both cases.

The latch sensitivity function for the two cases is displayed in Fig. 13(a). Interestingly, the sensitivity window of the SenseAmp latch is smaller than in the CML case, indicating superior time resolution capability. The resulting DC sensitivities, however, are 2.6 mV for the CML case and 8.2 mV for the SenseAmp case, which is equivalent to $20 \log_{10}(8.2/2.6) \approx 10$ dB more gain in the signal path for the CML case. Defining a target sensitivity value $V_{s,\text{target}}$ allows deriving a latch transfer function $H_s(\omega) = H_{s,n}(\omega)(V_{s,\text{target}}/V_s)$, which combines gain and






Fig. 12. Data paths for latch comparison.



Fig. 13. (a) Latch sensitivity function for CML latch (solid) and SenseAmp latch (dotted). (b) Latch transfer function for CML latch (solid) and SenseAmp latch (dotted) at a target sensitivity of 5 mV.

frequency dependence of the latch in a single function, and can thus be included in simulations of the entire signal path. Fig. 13(b) compares  $H_s(\omega)$  for a target sensitivity of 5 mV. It can be seen that the bandwidth of the SenseAmp latch is 14 GHz, while the bandwidth of the CML latch amounts to 10 GHz, albeit at higher equivalent gain.
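To make the scaling concrete, the sketch below evaluates $H_s(\omega) = H_{s,n}(\omega)(V_{s,\text{target}}/V_s)$ from samples of a normalized sensitivity function. The rectangular 20-ps window is a placeholder, not the extracted $h_s(t)$ of either latch, and the function names are hypothetical; only the DC sensitivities (2.6 mV and 8.2 mV) and the 5-mV target are taken from the text.

```python
import cmath
import math


def latch_transfer(h_norm, dt, freq_hz, v_s, v_target):
    """Evaluate H_s at one frequency from samples of the normalized h_s(t)."""
    omega = 2.0 * math.pi * freq_hz
    # Riemann-sum approximation of the Fourier integral of h_s(t).
    h_sn = sum(h * cmath.exp(-1j * omega * k * dt)
               for k, h in enumerate(h_norm)) * dt
    return h_sn * (v_target / v_s)


def gain_db(x):
    """Magnitude of a (complex) gain in decibels."""
    return 20.0 * math.log10(abs(x))


# Placeholder sensitivity function: 20-ps rectangular window with unit area.
dt = 1e-12
h_rect = [1.0 / (20 * dt)] * 20

dc_cml = latch_transfer(h_rect, dt, 0.0, v_s=2.6e-3, v_target=5e-3)
dc_sa = latch_transfer(h_rect, dt, 0.0, v_s=8.2e-3, v_target=5e-3)
gain_difference_db = gain_db(dc_cml) - gain_db(dc_sa)
```

Because the normalized function has unit area, the DC value of $H_s$ reduces to $V_{s,\text{target}}/V_s$, so `gain_difference_db` reproduces the roughly 10-dB gain advantage of the CML path independently of the window shape.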

Although SenseAmp latches provide high input bandwidth and consume far less power (200  $\mu$ A) than the CML configuration (~2 mA), CML latches are used in the data path of the PAM-4 receiver because of their superior sensitivity and higher immunity to power supply variations.

#### C. Demux

Fig. 14 displays the sampling and 1:4 demultiplexing unit for one bit slice. The signal from the voltage shifting amplifier is distributed to the second amplification stage, which also provides the offset correction for the subsequent latches. The CML latches are clocked by the half-rate (5.5 GHz) differential clocks $\phi_{2i}/\overline{\phi_{2i}}$ and $\phi_{2q}/\overline{\phi_{2q}}$ for the edge and data samples, respectively. The output data is then converted to full-swing CMOS levels by the subsequent SenseAmp latches (denoted SAL). By running the SenseAmp latches also at half the baud rate, the 2:4 demultiplexing step can be implemented with standard latches from the digital library (denoted FF in Fig. 14), thus saving power and reducing latency when compared to a solution with another stage of CML latches. This requires, however, that the clock generator derives the full-swing CMOS clocks $\phi_{2ix}$ and $\phi_{2qx}$ from the differential CML clocks $\phi_{2i}$ and $\phi_{2q}$ at 5.5 GHz with accurately defined timing, even in the presence of power-supply variations. The differential to full-swing converter, shown in Fig. 15, consists of a differential amplifier, which boosts the signal swing, followed by capacitively coupled inverters [9]. Using this structure, the simulated timing deviation in the worst-case process corner for a 100-mV, 100-MHz square-wave power supply variation is $\pm 12$ ps. The four edge samples $e_0-e_3$ and data samples $d_0-d_3$ are further demultiplexed to octal rate in a subsequent stage not shown in the figure.

#### D. CML Biasing Concept With Variable Resistor

The bias generator for the CML stages in the demux (clock buffers, amplifiers, latches) is shown in Fig. 16. The common-mode voltage drop over the load resistor is regulated via an OpAmp to $0.3 \times V_{dd}$. Since bandwidth and power consumption of all CML stages linearly depend on the value of the load resistor, all resistors in the CML stages (and also the bias generator) were replaced by adjustable unit load resistors. This offers the possibility to adjust the resistor value in two steps (either high or low resistance) after fabrication, making the design less dependent on process variations of the resistor.

The value of the resistance R is specified to deviate from its nominal value  $R_0$  due to chip-wide variations of manufacturing tolerance parameters  $T_m$  on the one hand, and nonmatching variations and changes in temperature  $T_{nmat}$  on the other hand. Hence, its actual value is given by

$$R = R_0(1+\alpha)(1+\beta) \tag{17}$$

with  $|\alpha| \leq T_m, |\beta| \leq T_{nmat}$ .

The resistance in the CML stages can be switched by shunting R with an auxiliary resistance  $R_{\rm p}$ . Hence, when the shunt resistor is applied, the load resistance becomes

$$R = k_r R_0 (1 + \alpha) (1 + \beta) \tag{18}$$

with resistance reduction factor  $k_r = (R_0||R_p)/R_0$ . Since the receiver should not be disturbed by switching values during operation, the shunt path is either switched on or off only at startup.



Fig. 14. Architecture of the 1:4 demux slice.



Fig. 15. Differential to single-ended converter.



Fig. 16. Biasing concept with variable resistor.

Hence, the circuit has to be designed to take full account of the nonmatching and temperature variations. On the other hand, by choosing one of the two ranges, the influence of the manufacturing tolerance $T_m$ can be approximately halved. The actual value of the manufacturing tolerance $\alpha$ can be determined after production, and compared to a threshold $\alpha_T$, which defines if the upper or lower resistive range should be used. This defines two intervals $I_{p,\text{off}}$ and $I_{p,\text{on}}$ of resistance values for the two cases:

$$I_{p,\text{off}} = [R_0(1 - T_m)(1 - T_{nmat}) < R < R_0(1 + \alpha_T)(1 + T_{nmat})] \tag{19}$$

$$I_{p,\text{on}} = [k_r R_0 (1 + \alpha_T) (1 - T_{nmat}) < R < k_r R_0 (1 + T_m) (1 + T_{nmat})] \tag{20}$$

In order to derive the optimum values of $\alpha_T$ and $k_r$ we note that in this case the lower and upper boundaries of the two intervals should be identical. Solving for $\alpha_T$ and $k_r$ results in

$$k_{r,\text{opt}} = \sqrt{\frac{1 - T_m}{1 + T_m}} \tag{21}$$

$$\alpha_{T,\text{opt}} = \frac{1 - T_m - k_{r,\text{opt}}}{k_{r,\text{opt}}}. \tag{22}$$

For  $T_{\rm m} = 20\%$  and  $T_{nmat} = 10\%$ , this results in  $k_{r,{\rm opt}} = 0.82$  and  $\alpha_{T,{\rm opt}} = -0.02$ . In this case, the overall tolerance is reduced from 30% to 19.9%.
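The numbers above can be checked with a short sketch of (19), (21), and (22). One assumption is made explicit: the overall tolerance is taken as the half-width of the resulting resistance interval relative to its midpoint, which reproduces the 19.9% figure; the function name is hypothetical.

```python
import math


def optimal_two_range(t_m, t_nmat):
    """Return (k_r_opt, alpha_T_opt, overall tolerance) for a two-range resistor.

    t_m    : chip-wide manufacturing tolerance (e.g. 0.20 for 20%)
    t_nmat : nonmatching and temperature tolerance (e.g. 0.10 for 10%)
    """
    k_r = math.sqrt((1.0 - t_m) / (1.0 + t_m))       # eq. (21)
    alpha_t = (1.0 - t_m - k_r) / k_r                # eq. (22)
    # With the optimum values both intervals (19) and (20) coincide,
    # so the bounds of (19), normalized to R_0, describe the full range.
    r_min = (1.0 - t_m) * (1.0 - t_nmat)
    r_max = (1.0 + alpha_t) * (1.0 + t_nmat)
    # Assumed definition: half-width relative to the interval midpoint.
    tolerance = (r_max - r_min) / (r_max + r_min)
    return k_r, alpha_t, tolerance


k_r_opt, alpha_t_opt, tol = optimal_two_range(0.20, 0.10)
```

For $T_m = 20\%$ and $T_{nmat} = 10\%$ this gives $k_{r,\text{opt}} \approx 0.816$, $\alpha_{T,\text{opt}} \approx -0.020$, and a tolerance of about 0.199. Applying the same metric to the single-range case, $[0.72 R_0, 1.32 R_0]$, gives about 0.29, close to the 30% quoted above.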



Fig. 17. Delay line with capacitive source degeneration.



Fig. 18. Phase rotator implementation.



Fig. 19. Data flow in the digital CDR logic.

#### E. Delay Line and Phase Rotator

In order to guarantee a precise timing relationship between the quadrature clocks delivered to the sampling front-end, a single phase rotator is used, running at the baud rate (11 GHz). Precise half-rate clocks are then generated in the I/Q generator/divider after the phase rotator. Six clock phases, which are generated by the voltage-controlled delay line shown in Fig. 17, are fed to the phase rotator. A resistor is used in parallel to the tunable pMOS devices in order to speed up the circuit. The phase rotator, shown in Fig. 18, consists of a phase selection stage followed by a phase interpolation stage [10]. All stages use a fully differential CML-style circuit topology. The first stage selects two clock phases from two adjacent phase sextants. Using six clock phases provides a good compromise between complexity and phase linearity. The phase interpolator, which blends the two selected phases, is controlled by an 8-bit thermometer-coded value. Hence, a total number of 48 phase steps are provided, resulting in a nominal timing resolution of 1.9 ps.
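The quoted resolution follows from simple arithmetic, sketched below. The only assumption is that the 8-bit thermometer code yields eight interpolation steps per sextant, which is consistent with the stated total of 48 steps; the function name is hypothetical.

```python
def phase_rotator_resolution(clock_hz, n_phases, steps_per_sector):
    """Return (total phase steps, nominal step size in ps) of the rotator."""
    period_ps = 1e12 / clock_hz              # one clock period in picoseconds
    n_steps = n_phases * steps_per_sector    # steps per full 360-degree turn
    return n_steps, period_ps / n_steps


steps, step_ps = phase_rotator_resolution(11e9, n_phases=6, steps_per_sector=8)
```

At 11 GHz the clock period is about 90.9 ps, so the 48 steps correspond to a nominal step of about 1.89 ps.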

#### F. CDR Logic

The CDR logic is displayed in Fig. 19. In order to derive the edge information, all minor transitions (0-1, 1-2, 2-3) and the major transition (0-3) are used. It can be verified from Fig. 2 that the crossing points of these transitions coincide with the midpoint between the data samples.

The CDR logic is running at 1/8 of the baud rate (1.375 GHz). It receives $3 \times 16$ data and edge bits from the three analog slices. In a first step, the data and edge bits are transformed into an early and a late vector, each of 8-bit length. Secondly, majority voting is applied in groups of four bits. The resulting early and late vectors, each two bits long, are then combined in a second-level majority voting step to arrive at a single-bit early or late information. This information is the input to a digital loop filter, which then decides if the value of the phase should be increased, decreased or left constant. A phase rotator signal generator encodes the 14 control signals used in the phase rotator.
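The two-level voting can be sketched as follows. The text does not specify how a single vote is formed or how ties are resolved, so this sketch assumes that a vote compares the number of early and late flags in a group and produces no indication on a tie; the function names and the $+1/0/-1$ return convention are likewise hypothetical. The generation of the early and late vectors from the data and edge bits is not modeled.

```python
def vote(early, late):
    """Compare early and late flag counts; return (early_bit, late_bit)."""
    n_early, n_late = sum(early), sum(late)
    return int(n_early > n_late), int(n_late > n_early)


def cdr_decision(early8, late8):
    """Reduce two 8-bit early/late vectors to +1 (early), -1 (late) or 0."""
    # First level: majority voting in two groups of four bits each.
    groups = [vote(early8[i:i + 4], late8[i:i + 4]) for i in (0, 4)]
    early2 = [g[0] for g in groups]
    late2 = [g[1] for g in groups]
    # Second level: combine the two-bit vectors into a single indication.
    early_bit, late_bit = vote(early2, late2)
    return early_bit - late_bit
```

For instance, `cdr_decision([1, 1, 0, 0, 1, 0, 0, 0], [0, 0, 1, 0, 0, 0, 0, 0])` returns `1`, because both groups vote early, while two groups that disagree cancel to `0`.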

## IV. MEASUREMENT RESULTS

The circuit has been fabricated in a 90-nm partially depleted digital CMOS SOI technology [11]. The test chip also contains a shift register to provide digital settings, and two inverter-based output buffers. An internal multiplexer allows the recovered clock, the undecoded latch outputs, the demultiplexed data signals, or derived signals, such as the bit error indicator, to be sent to the two output drivers.

The circuit was tested on-wafer with power-ground-signal-ground-signal-ground-power (P-G-S-G-S-G-P) probes. Fig. 20 displays the resulting offset versus the programmed value of the voltage-shifting amplifier. For a voltage offset of 200 mV, one bit corresponds to approximately 4 mV. The measured offset voltage for latch offset compensation versus programmed value is shown in Fig. 21.

For jitter tolerance tests, the input data was generated by an Anritsu 1775A four-channel bit pattern generator, where four channels were resistively combined in order to generate a differential PAM-4 signal. The input eye as measured at the generator output for the $2^7-1$ PRBS is depicted in Fig. 22. Although the overall differential eye amplitude is 650 mV p-p, the effective differential eye opening is about 95 mV vertically and 35 ps horizontally.

No special precoding was applied, so transitions between all PAM-4 levels are present. The output of the bit error signal was monitored in order to count the detected bit errors. For this test, the offset value for the voltage-shifting amplifier was set to 215 mV. The corresponding binary value w<0:7> was derived



Fig. 20. Measured offset voltage versus programmed value of the V-shifting amplifier. Global characteristic (left) and detail (right).



Fig. 21. Measured latch offset compensation capability.



Fig. 22. PAM-4 eye diagram of the input data.

from the previous measurement of the offset voltage characteristic.

For the per-latch offset adjustment, the pattern generator was switched off. Then, for each of the 12 sampling latches, an automatic adaptation procedure was performed. The on-chip output multiplexer was programmed in order to monitor the undecoded output of a specific sampling latch. This output was fed to an oscilloscope, which was read out via a GPIB bus connected to a controlling PC. The digital latch offset value was then increased or decreased until the latch flipped.

Fig. 23 displays the measured jitter tolerance curve for a bit error rate (BER) of  $10^{-12}$ . At the same error rate, the maximum acceptable frequency offset was measured to be 350 ppm.



Fig. 23. Jitter tolerance at BER =  $10^{-12}$ .



Fig. 24. Recovered clock from input data with 350 ppm frequency offset between data and receiver clock.

TABLE I. MEASURED JITTER UNDER DIFFERENT CONDITIONS

| Condition            | RMS Jitter [ps] | P-P Jitter [ps] |
|----------------------|-----------------|-----------------|
| Fixed clock          | 1.44            | 12.0            |
| CDR on               | 1.64            | 13.3            |
| $\Delta f = 350$ ppm | 1.97            | 15.6            |
| 1 MHz, 1 UI jitter   | 3.95            | 24.0            |

Fig. 24 depicts the recovered 1/8 rate clock at a 350 ppm frequency offset. Jitter of the recovered clock under different stress conditions was measured with an Agilent 86100 scope; the results are summarized in Table I. Fig. 25 depicts the demultiplexed output data.

The measured current consumption is 207 mA from a 1.1-V supply, which corresponds to a power consumption of 10.4 mW/Gb/s. Of the total current, 190 mA are consumed in the analog part, and 17 mA in the digital sections (4:8 demux, CDR logic, PRBS checker). The high-speed data input is protected with ESD diodes. Fig. 26 displays the layout of the test chip. The overall size of the chip is 1 mm<sup>2</sup>, of which the active



Fig. 25. Demultiplexed output data.



Fig. 26. Layout of the PAM-4 receiver.

components (including  $0.03 \text{ mm}^2$  of debugging logic) occupy an area of  $0.12 \text{ mm}^2$ .

## V. CONCLUSION

In this paper, we presented the architecture and the implementation of a PAM-4 receiver in 90-nm CMOS-SOI technology, capable of performing clock and data recovery at a data rate of 22 Gb/s.

We proposed a bit-slice architecture for the data path, where the voltage-shifter is shared by both the data and the edge sampling latches. A novel topology for a voltage shifting amplifier achieves the necessary high bandwidth and gain with acceptable power supply noise rejection.

The entire clock path, from the input clock to the sampling latches, is implemented with CML stages to achieve good power supply noise rejection. A single phase rotator followed by an I/Q divider assures accurate timing in the sampling front-end. To



Fig. 27. Signal diagram for latch transfer function extraction procedure.

save power, the use of CML sampling latches is limited to the first sampling stage, where accurate timing is crucial. A capacitively coupled differential to CMOS converter allows a rapid transition from the differential CML clock domain to the CMOS logic swing domain.

The proposed CML biasing scheme is based on a unit resistor cell, which allows adjusting the resistance in all CML stages after fabrication, and hence reduces the effective parameter tolerance.

We have proposed a new method for the characterization of sampling latches which captures the properties of the latch in a latch sensitivity function $h_s(t)$ and a DC sensitivity voltage $V_s$. Using this model, latches of different topologies or different options for the data path can be easily compared. By dividing the latch into a linear front-end and a nonlinear decision-directed back-end, the dynamic properties of the latch (i.e., its sensitivity function) can be accurately included in the data path.

## APPENDIX: EXTRACTION PROCEDURE FOR LATCH SENSITIVITY FUNCTION

In order to extract the latch sensitivity function  $h_s(t)$  from simulations, we used the following procedure, which can also be applied to the entire data path. Referring to Fig. 27, the device under test is provided with a clock signal *lclk* and an input voltage signal  $v_i(t)$ , both of which can be either differential or single-ended. The voltage at the latch output,  $v_o(t)$ , is evaluated.

The simulation is organized in three cycles. In the first cycle, denoted RESET, the latch is put in state zero by applying a reset voltage  $-V_{\text{reset}}$ . In the next cycle, MEASURE in Fig. 27, the input signal to the latch is zero except for a short probe voltage pulse of amplitude A, width  $t_w$ , and time offset  $\Delta t$  with respect to the latch clock *lclk*. The latch detects the signal correctly if its output signal  $v_o(t)$  crosses the midrail voltage  $V_{\rm DD}/2$  before the required time  $t_{\rm eval}$  in the third cycle, EVAL. For a given time offset  $\Delta t$ , the amplitude  $A(\Delta t)$  that is just sufficient to flip the latch can now be found. This is done by repeating the measurement cycles with constant  $\Delta t$  and adjusting the probe amplitude with a binary search algorithm. Stepping the values of  $\Delta t$  across the sensitivity window then yields the function  $A(\Delta t)$ . The described algorithm was implemented in a Spectre/VerilogA module, which allows automatic extraction of  $A(\Delta t)$  in a few minutes.
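The search loop described above can be sketched in Python as follows. This is only an illustration and not the Spectre/VerilogA module used in this work: the transistor-level RESET/MEASURE/EVAL simulation is replaced by a hypothetical toy latch (`latch_flips`) with a unit-area Gaussian sensitivity function, and the names `V_S`, `SIGMA`, `find_threshold`, and `sweep`, together with all numeric values, are assumptions chosen for the example.

```python
import math

# Assumed toy-model parameters (not taken from the measured circuit).
V_S = 0.010      # DC sensitivity voltage of the toy latch, in volts
SIGMA = 5e-12    # width of the assumed Gaussian sensitivity window, in seconds


def gaussian_cdf(t):
    """Cumulative integral of the unit-area Gaussian sensitivity function."""
    return 0.5 * (1.0 + math.erf(t / (SIGMA * math.sqrt(2.0))))


def latch_flips(amplitude, dt, t_w):
    """Stand-in for one RESET/MEASURE/EVAL simulation run.

    Returns True if a probe pulse of the given amplitude and width t_w,
    centered at offset dt, is detected by the toy latch.
    """
    weight = gaussian_cdf(dt + t_w / 2) - gaussian_cdf(dt - t_w / 2)
    return amplitude * weight >= V_S


def find_threshold(dt, t_w, a_max=10.0, rel_tol=1e-6):
    """Binary search for the smallest amplitude that flips the latch at offset dt."""
    if not latch_flips(a_max, dt, t_w):
        return math.inf  # dt lies outside the sensitivity window for this range
    lo, hi = 0.0, a_max  # invariant: lo does not flip the latch, hi does
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if latch_flips(mid, dt, t_w):
            hi = mid
        else:
            lo = mid
    return hi


def sweep(t_w=1e-12, n=21, span=10e-12):
    """Step dt across [-span, +span] and collect (dt, A(dt)) pairs."""
    offsets = [-span + 2 * span * k / (n - 1) for k in range(n)]
    return [(dt, find_threshold(dt, t_w)) for dt in offsets]
```

In a real flow, `latch_flips` would launch one transient simulation per call, so the cost of the extraction scales with the number of time offsets multiplied by the number of bisection steps.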

With  $A(\Delta t)$  known, the sensitivity function can be derived. The measurement procedure above finds  $A(\Delta t)$  such that

$$A(\Delta t) \int_{T_1}^{T_2} h_s(\tau)\, rect(\Delta t-\tau, t_w)\, d\tau = V_s$$
(23)

where  $h_s(t)$  is the latch sensitivity function,  $T_1$  and  $T_2$  are the boundaries of the integration window, and  $rect(t, t_w)$  denotes a rectangular pulse of width  $t_w$  centered at t and unity height. Rewriting (23) results in

$$h_s(\tau) * rect(\Delta t - \tau, t_w) = \frac{V_s}{A(\Delta t)}$$
(24)

from which  $h_s(t)$  can be extracted by a deconvolution. For sufficiently small values of  $t_w$ 

$$h_s(\tau) * rect(\Delta t - \tau, t_w) \cong t_w h_s(\Delta t)$$
(25)

and hence

$$h_s(\Delta t) \cong \frac{1}{t_w} \frac{V_s}{A(\Delta t)}.$$
(26)

The value of  $V_s$  follows from the normalization condition

$$\int_{T_1}^{T_2} h_s(\tau) d\tau = 1.$$
 (27)

With

$$\int_{T_1}^{T_2} h_s(\tau) d\tau = \sum_n \int h_s(\tau) rect(nt_w - \tau, t_w) d\tau \qquad (28)$$

and using (4), we obtain

$$V_s = \frac{1}{\sum\limits_n \frac{1}{A(nt_w)}}.$$
(29)
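The last step of the derivation can be illustrated numerically. The sketch below assumes that  $A(\Delta t)$  has been sampled on a uniform grid whose pitch equals the probe width  $t_w$ , so that the probe pulses tile the integration window as in (28). It computes  $V_s$  from (29) and recovers the sensitivity function under the small- $t_w$  approximation  $h_s(nt_w) \approx V_s / (t_w A(nt_w))$ , which satisfies the normalization (27) by construction. The function name `extract_sensitivity` and the sample values are hypothetical.

```python
def extract_sensitivity(amplitudes, t_w):
    """Derive V_s and samples of h_s from threshold amplitudes A(n * t_w).

    amplitudes: threshold amplitudes on a grid of pitch t_w (assumption),
                all finite and positive.
    t_w:        probe pulse width in seconds.
    Returns (v_s, h_s), where h_s is a list of samples in 1/seconds.
    """
    # Eq. (29): V_s is the reciprocal of the sum of reciprocal amplitudes.
    v_s = 1.0 / sum(1.0 / a for a in amplitudes)
    # Small-t_w approximation: t_w * h_s(n * t_w) ~= V_s / A(n * t_w).
    h_s = [v_s / (t_w * a) for a in amplitudes]
    return v_s, h_s


# Made-up example: a latch that is most sensitive in the middle of the window.
example_amplitudes = [0.4, 0.2, 0.1, 0.2, 0.4]  # volts
example_t_w = 2e-12                             # seconds
v_s, h_s = extract_sensitivity(example_amplitudes, example_t_w)
area = sum(h * example_t_w for h in h_s)        # rectangle-rule integral of h_s
```

Offsets at which no finite threshold amplitude was found lie outside the sensitivity window and would be omitted from `amplitudes` before calling the function.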

#### ACKNOWLEDGMENT

The authors would like to thank Dr. L. Wagner of IBM Hopewell Junction, NY, for device-modeling support, Dr. N. Zamdmer of IBM Fishkill, NY, for valuable discussions concerning SOI device questions, and the IBM Fishkill foundry team, Hopewell Junction, NY.

#### REFERENCES

- [1] R. Farjad-Rad, C. Yang, M. A. Horowitz, and T. H. Lee, "A 0.3-μm CMOS 8-Gb/s 4-PAM serial link transceiver," *IEEE J. Solid-State Circuits*, vol. 35, no. 5, pp. 757–764, May 2000.
- [2] J. Sonntag, J. Stonick, J. Gorecki, B. Beale, B. Check, X. Gong, J. Guiliano, K. Lee, B. Lefferts, D. Martin, U. Moon, A. Sengir, S. Titus, G. Wei, D. Weinlader, and Y. Yang, "An adaptive PAM-4 5 Gb/s backplane transceiver in 0.25 μm CMOS," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, May 2002, pp. 363–366.
- [3] J. L. Zerbe, C. W. Werner, V. Stojanovic, F. Chen, J. Wei, G. Tsang, D. Kim, W. F. Stonecypher, A. Ho, T. P. Thrush, R. T. Kollipara, M. A. Horowitz, and K. S. Donnelly, "Equalization and clock recovery for a 2.5–10-Gb/s 2-PAM/4-PAM backplane transceiver cell," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2121–2130, Dec. 2003.
- [4] C. Menolfi, T. Toifl, R. Reutemann, M. Ruegg, P. Buchmann, M. Kossel, T. Morf, and M. Schmatz, "A 25 Gb/s PAM4 transmitter in 90 nm CMOS SOI technology," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2005, vol. 48.
- [5] R. Farjad-Rad, H. Ng, M. E. Lee, R. Senthinathan, W. J. Dally, A. Nguyen, R. Rathi, J. Poulton, J. Edmondson, J. Tran, and H. Yazdanmehr, "0.622–8.0 Gb/s 150 mW serial IO macrocell with fully flexible preemphasis and equalization," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2003, vol. 17, pp. 63–66.
- [6] A. Martin, B. Casper, J. Kennedy, J. Jaussi, and R. Mooney, "8 Gb/s differential simultaneous bidirectional link with 4 mV 9 ps waveform capture diagnostic capability," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, San Francisco, CA, Feb. 2003, vol. 46, pp. 478–479.
- [7] B. Garlepp, A. Ho, V. Stojanović, F. Chen, C. Werner, G. Tsang, T. Thrush, A. Agarwal, and J. Zerbe, "A 1–10 Gb/s PAM2, PAM4, PAM2 partial response receiver analog front-end with dynamic sampler swapping capability for backplane serial communications," in *Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2005, vol. 19, pp. 376–379.
- [8] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, Apr. 1993.
- [9] J. Savoj and B. Razavi, "A CMOS interface circuit for detection of 1.2 Gb/s RZ data," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 1999, vol. 42, pp. 278–279.
- [10] S. Sidiropoulos and M. Horowitz, "A semi-digital dual delay-locked loop," *IEEE J. Solid-State Circuits*, vol. 32, no. 11, pp. 1683–1692, Nov. 1997.
- [11] M. Khare *et al.*, "A high performance 90 nm SOI technology with 0.992 μm<sup>2</sup> 6T-SRAM cell," in *IEDM Tech. Dig.*, 2002, pp. 407–410.



**Thomas Toifl** (S'97–M'99) received the Dipl.-Ing. (M.S.) and Ph.D. (*sub auspiciis praesidentis rei publicae*) degrees from Vienna University of Technology, Austria, in 1995 and 1999, respectively.

In 1996, he joined the Microelectronics Group of the European Research Center for Particle Physics (CERN), Geneva, Switzerland, where he was working on radiation-hard integrated circuits for particle physics detectors. There, he developed circuits for detector synchronization and transmission of detector data, which were integrated in the four particle detector systems of the new Large Hadron Collider (LHC). In 2001, he joined the IBM Research Laboratory in Rüschlikon, Switzerland, where since then he has been working on multigigabit, low-power communication circuits in advanced CMOS technologies.

Dr. Toifl received the Beatrice Winner Award for Editorial Excellence at the 2005 IEEE International Solid-State Circuits Conference (ISSCC).



**Christian Menolfi** (S'97–M'99) was born in St. Gallen, Switzerland, in 1967. He received the Dipl. Ing. degree and the Ph.D. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1993 and 2000, respectively.

From 1993 to 2000, he was with the Integrated Systems Laboratory, ETH Zürich, as a Research Assistant, where he worked on highly sensitive CMOS VLSI data acquisition circuits for silicon based microsensors. Since September 2000, he has been with the IBM Zürich Research Laboratory, Rüschlikon, Switzerland, where he has been involved with multigigabit low-power communication circuits in advanced CMOS technologies.



**Michael Ruegg** received the Dipl. Ing. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1997.

From 1997 to 1999, he was with Siemens Semiconductors working as an Analog IC Design Engineer on analog front-ends for video applications and GSM phones. In 1999, he joined Infineon Technologies, CA, where he designed low-jitter phase-locked loops for DVD and hard disk read/write-channels. He is the inventor or co-inventor of several patents in this area. In 2002, he co-founded Miromico AG, Zürich, Switzerland, a spin-off of ETH Zürich, focusing on analog and mixed-signal IC design services.



**Thomas Morf** was born on April 4, 1961, in Zürich, Switzerland. He received the B.S. degree from Winterthur Polytechnic Switzerland in 1987, and the M.S. degree in electrical engineering from the University of California at Santa Barbara (UCSB) in 1991. From 1989 through 1991, he worked as a Research Assistant at UCSB, performing research in the field of active microwave inductors and digital GaAs circuits. In 1991, he joined the Swiss Federal Institute of Technology (ETH) in Zürich, Switzerland, where he received the Ph.D. degree in 1996. His Ph.D. work was on circuit design and processing for high-speed optical links on GaAs using epitaxial lift-off techniques.

In 1996, he joined the Electronics Laboratory, also at the ETH, where he led a research group in the area of InP-HBT circuit design and technology. Since Fall 1999, he has been with the IBM Research Laboratory, Rüschlikon, Switzerland. His present research interests include all aspects of electrical and optical high-speed high-density interconnects and high-speed and microwave circuit design.



**Jonas Weiss** (S'04) received the Dipl. Ing. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1997.

From 1997 to 1998, he was with Philips Semiconductors, working on analog low-power CMOS circuits. From 2000 to 2002, he worked on mixed-signal front-ends for medical ultrasound applications. He joined the IBM Zürich Research Laboratory in 2003 to pursue his Ph.D. studies in the field of electro-optical interconnections. His research interests include packaging, ESD protection schemes and analog front-ends for high-speed serial links.



**Martin L. Schmatz** received the degree in electrical engineering in 1993 and the Ph.D. degree in 1998, both from the Swiss Federal Institute of Technology (ETH), Zürich, for his work on low-power wireless receiver designs and on noise-parameter measurement systems.

In 1999, he joined the IBM Zürich Research Laboratory, where he established a research group focusing on high-speed and high-density CMOS serial-link systems. Since 2001, he has managed the I/O Link Technology group at IBM Research. He is also the IBM manager responsible for the joint IBM-ETH Competence Center for Advanced Silicon Electronics (CASE), which allows researchers from ETH to access IBM's most advanced SiGe and CMOS technologies.



**Robert Reutemann** received the Dipl. Ing. degree in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1997.

From 1998 to 2003, he was with the Integrated Systems Laboratory of the Swiss Federal Institute of Technology, working on low-power VLSI digital signal processing implementations for communications and mixed-signal integrated circuits for sensor/control applications. In 2002, he co-founded Miromico AG, Zürich, Switzerland, a spin-off of ETH Zürich, focusing on analog and mixed-signal IC design services.



**Peter Buchmann** was born in Zürich, Switzerland, in 1953. He received the diploma in experimental physics and the Ph.D. degree in physics from the Federal Institute of Technology, Zürich, Switzerland, in 1978 and 1987, respectively.

From 1978 to 1981, he was involved in surface physics studies. From 1981 to 1985, he was working in the field of integrated optics in the group of Applied Research at the Federal Institute of Technology. He was engaged in the technology, design and characterization of III-V semiconductor waveguide devices, electro-optic modulators and switches. In 1985, he joined the IBM Zürich Research Laboratory, Rüschlikon, Switzerland, where he has been engaged in MESFET technology and in the process technology of III-V semiconductor lasers. In particular, he was involved in research on dry-etching techniques and opto-electronic integration. Since 1994, he has been involved in the design and implementation of VLSI chips for communication applications in the field of ATM, SONET/SDH, and network processors. His most recent work includes circuit design for high-speed IO and link technology.

Dr. Buchmann is a member of the Swiss Physical Society.



**Marcel Kossel** (S'99–M'02) received the Dipl. Ing. and Ph.D. degrees in electrical engineering from the Swiss Federal Institute of Technology (ETH), Zürich, in 1997 and 2000, respectively.

He joined the IBM Zürich Research Laboratory in 2001, where he is involved in analog circuit design for high-speed serial links. His research interests include circuit design and RF measurement techniques. He also has done research in the field of microwave tagging systems and radio-frequency identification systems.