## 17.7 A Double-Tail Latch-Type Voltage Sense Amplifier with 18ps Setup+Hold Time

Daniël Schinkel, Eisse Mensink, Eric Klumperink, Ed van Tuijl, Bram Nauta

University of Twente, Enschede, The Netherlands

Latch-type sense amplifiers, or sense-amplifier-based flip-flops, are very effective comparators. They achieve fast decisions due to strong positive feedback, and their differential input enables a low offset. Sense amplifiers (SAs) are hence widely applied in, for example, memories, A/D converters, data receivers and lately also in on-chip transceivers [1,2]. Voltage-mode SAs, as shown in Fig. 17.7.1, have become especially popular [3-5] because of their high input impedance, full-swing output and absence of static power consumption.

However, the stack of transistors in Fig. 17.7.1 requires a large voltage headroom, which is problematic in low-voltage deep-submicron CMOS technologies. Furthermore, the speed and offset of this circuit are very dependent on the common-mode voltage of the input  $V_{cm}$  [4], which is a problem in applications with wide common-mode ranges, for example A/D converters.

As an alternative, a double-tail sense amplifier is presented here, which uses one tail for the input stage and another for the latching stage, as shown in Fig. 17.7.2. This topology has less stacking and can therefore operate at lower supply voltages. The double tail enables both a large current in the latching stage (wide M12), for fast latching independent of the  $V_{cm}$ , and a small current in the input stage (small M9), for low offset.

The signal behavior of the double-tail SA is also shown in Fig. 17.7.2. During the reset phase (Clk = 0V), transistors M7 and M8 pre-charge the Di nodes to $V_{DD}$, which in turn causes M10 and M11 to discharge the output nodes to ground (so there is no need for dedicated reset transistors at the output nodes). After the reset phase, the tail transistors M9 and M12 turn on (Clk = $V_{DD}$). At the Di nodes, the common-mode voltage then drops monotonically with a rate defined by $I_{M9}/C_{Di}$ and, on top of this, an input-dependent differential voltage $\Delta V_{Di}$ builds up. The intermediate stage formed by M10 and M11 passes $\Delta V_{Di}$ to the cross-coupled inverters and also provides additional shielding between the input and output, with less kickback noise as a result. The inverters start to regenerate the voltage difference as soon as the common-mode voltage at the Di nodes is no longer high enough for M10 and M11 to clamp the outputs to ground. The ideal operating point ($V_{cm}$) and the timing of the various phases can be tuned with the transistor sizes.
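The common-mode discharge described above can be illustrated with a minimal sketch. It assumes an ideal constant tail current and a fixed node capacitance, so the Di common-mode level is a linear ramp with slope $I_{M9}/C_{Di}$. The function names, the release level and all numeric values are hypothetical and are not taken from the design.

```python
def di_common_mode(t, v_dd, i_tail, c_di):
    """Common-mode Di voltage t seconds after the clock edge.

    Assumes a constant tail current, so the voltage is a linear ramp
    starting at v_dd with slope -i_tail / c_di, clipped at 0 V.
    """
    return max(v_dd - (i_tail / c_di) * t, 0.0)


def time_to_release(v_dd, v_release, i_tail, c_di):
    """Time for the Di common mode to fall from v_dd to v_release.

    v_release stands in for the (hypothetical) level below which M10/M11
    no longer clamp the outputs to ground.
    """
    return (v_dd - v_release) * c_di / i_tail


# Hypothetical values, chosen only to make the arithmetic easy to follow.
V_DD = 1.2        # supply voltage in volts
I_M9 = 100e-6     # assumed input-stage tail current in amperes
C_DI = 10e-15     # assumed Di node capacitance in farads
V_RELEASE = 0.6   # assumed release level in volts

t_release = time_to_release(V_DD, V_RELEASE, I_M9, C_DI)
print(f"slope: {I_M9 / C_DI * 1e-9:.1f} mV/ps, release after {t_release * 1e12:.0f} ps")
```

In this simplified picture, a smaller tail current (narrower M9) lengthens the time before regeneration starts, which is the handle the text refers to when it says the timing of the phases can be tuned with the transistor sizes.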

To compare the conventional and double-tail SAs, both circuits are simulated, with transistor dimensions scaled to get an offset standard deviation of $\sigma_{os}$ = 13mV. The operating conditions are $V_{DD}$ = 1.2V and $f_{clk}$ = 3GHz, and the input has $V_{cm}$ = 1.1V. At this high $V_{cm}$ (found, for example, in memories), the conventional topology needs reset transistors at the Di nodes [1,5] to ensure that M5 and M6 at least start in saturation. The power consumed by both circuits is similar, about 40fJ/bit. Figure 17.7.3 shows the delays of both circuits versus the differential input voltage. The positive feedback gives a logarithmic relation between the delay and $\Delta V_{in}$: 37ps per decade for the double-tail SA and 37 to 43ps/dec for the conventional SA. The double-tail SA is faster in general, and its delay increases by only 7ps when $V_{cm}$ drops to 0.7V, instead of the 25 to 60ps increase for the conventional topology. When simulated at $V_{DD} = 1$V, the delay of the double-tail SA is 15ps larger versus 29ps for the conventional SA.
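The logarithmic delay relation can be made concrete with a short sketch. It assumes a first-order regeneration model in which the voltage difference grows as $\Delta V(t) = \Delta V_0 \, e^{t/\tau}$, so a delay slope of $S$ per decade of input corresponds to $\tau = S / \ln 10$. The model and the function names are illustrative assumptions; only the 37ps/decade figure comes from the text.

```python
import math


def regeneration_tau(slope_per_decade):
    """Regeneration time constant implied by a delay slope per decade.

    Under exponential regeneration, a 10x smaller input needs tau * ln(10)
    more time to reach the same output level.
    """
    return slope_per_decade / math.log(10)


def extra_delay(slope_per_decade, dv_ref, dv_in):
    """Additional delay at input dv_in relative to a reference input dv_ref."""
    return slope_per_decade * math.log10(dv_ref / dv_in)


SLOPE = 37e-12  # simulated slope of the double-tail SA, seconds per decade

tau = regeneration_tau(SLOPE)
penalty = extra_delay(SLOPE, dv_ref=100e-3, dv_in=10e-3)
print(f"tau = {tau * 1e12:.2f} ps, 100 mV -> 10 mV costs {penalty * 1e12:.1f} ps")
```

With these assumptions, the simulated 37ps/decade slope corresponds to a regeneration time constant of roughly 16ps.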

The double-tail SA was implemented in a 1.2V 90nm CMOS technology, as part of a low-swing on-chip data transceiver that operates around $V_{cm} = 1.1$V. The $V_{cm}$ can have large variations due to, for example, crosstalk effects. A double-tail SA with dedicated input and output pads (for probe station measurement) was placed on the same die. The layout of the double-tail SA is shown in the inset of the chip micrograph in Fig. 17.7.7. An SR-latch is connected to the output of the SA to create static output signals without loss of timing information from the core of the SA. When required, more advanced 'slave' stages could be used in applications [3].

Figure 17.7.4 shows the measured relative delay under different conditions (the absolute delay is not measurable due to additional delay from the output drivers). As intended, the minimal delay is found at  $V_{cm} = 1.1$ V. At a  $V_{cm}$  of 0.6V, there is still only a 20ps increase in delay. The delay versus  $\Delta V_{in}$  is 44ps/decade under nominal conditions. In comparison, measurements in [4] on a conventional topology in 0.13 $\mu$ m CMOS with  $V_{DD} = 1.5$ V show a delay versus  $\Delta V_{in}$  of 100 to 170ps/dec and a 250ps increase in delay when  $V_{cm}$  is lowered to 0.6V.

The offset in [4] is also very dependent on the $V_{cm}$ and rises from 8.5 to 19mV when the $V_{cm}$ changes from 1.05 to 1.5V. For our design, measurements on 20 samples gave an offset of $\sigma_{os} = 8$mV at both $V_{cm} = 1.1$V and $V_{cm} = 0.75$V. If desired, area upscaling could further reduce the offset at the expense of power ($P \propto 1/\sigma_{os}^2$). Offset compensation schemes [5] are a good alternative if the application allows for the added complexity. The power consumed by the SA is 113fJ/decision when $\Delta V_{in}$ is 50mV ($f_{clk} = 1$GHz, $V_{DD} = 1.2$V, $P = 113\mu$W @ 1GHz, or 225$\mu$W @ 2GHz), which drops to 92fJ/decision for full-swing inputs.
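The power figures and the offset trade-off above follow from simple arithmetic, sketched here. The helper names are hypothetical, and the 4mV target offset is an illustrative value that does not appear in the measurements.

```python
def power_from_energy(energy_per_decision, f_clk):
    """Average power when one decision is made per clock cycle."""
    return energy_per_decision * f_clk


def scaled_power(power, sigma_old, sigma_new):
    """Power after area upscaling, using the stated relation P ~ 1/sigma^2."""
    return power * (sigma_old / sigma_new) ** 2


E_DECISION = 113e-15  # joules per decision at a 50 mV input (from the text)

p_1ghz = power_from_energy(E_DECISION, 1e9)
p_2ghz = power_from_energy(E_DECISION, 2e9)
# Hypothetical target: halve the measured 8 mV offset to 4 mV.
p_low_offset = scaled_power(p_1ghz, sigma_old=8e-3, sigma_new=4e-3)

print(f"{p_1ghz * 1e6:.0f} uW @ 1 GHz, {p_2ghz * 1e6:.0f} uW @ 2 GHz")
print(f"halving the offset: about {p_low_offset * 1e6:.0f} uW @ 1 GHz")
```

The product gives 226µW at 2GHz, in line with the roughly 225µW quoted above. Under the $1/\sigma_{os}^2$ relation, halving the offset costs about four times the power.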

The SA's equivalent input noise was extracted by measuring the average number of positive decisions versus $\Delta V_{in}$, as shown in Fig. 17.7.5. Fitting the measurements to a Gaussian cumulative distribution gives an RMS noise voltage of $V_{rms} = 1.5$mV.
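One possible way to perform such a fit is sketched below. It uses a probit transform followed by a least-squares line, which is an assumed technique and not necessarily the fitting procedure used for the measurements. The data points are synthetic, generated from an ideal Gaussian with $\sigma = 1.5$mV purely to exercise the code.

```python
from statistics import NormalDist, mean


def fit_gaussian_cdf(dv_mv, p_high):
    """Estimate (offset, sigma) in mV from P(out=high) versus input voltage.

    For Gaussian noise, P = Phi((dv - offset) / sigma), so the inverse normal
    CDF of P is a straight line in dv with slope 1/sigma. Points with P equal
    to 0 or 1 carry no slope information and are skipped.
    """
    std_normal = NormalDist()
    points = [(v, std_normal.inv_cdf(p))
              for v, p in zip(dv_mv, p_high) if 0.0 < p < 1.0]
    if len(points) < 2:
        raise ValueError("need at least two points with 0 < P < 1")
    xs = [v for v, _ in points]
    zs = [z for _, z in points]
    x_bar, z_bar = mean(xs), mean(zs)
    slope = (sum((x - x_bar) * (z - z_bar) for x, z in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = z_bar - slope * x_bar
    return -intercept / slope, 1.0 / slope


# Synthetic stand-in for measured data (not the values behind Fig. 17.7.5).
true_noise = NormalDist(mu=0.0, sigma=1.5)
dv_points = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
p_points = [true_noise.cdf(v) for v in dv_points]

offset_mv, sigma_mv = fit_gaussian_cdf(dv_points, p_points)
print(f"offset = {offset_mv:.3f} mV, sigma = {sigma_mv:.3f} mV")
```

With real measurements, each probability would be the fraction of positive decisions over many clock cycles, and points close to 0 or 1 would be noisy after the probit transform, so a weighted or maximum-likelihood fit may be preferable.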

Setup and hold times are extracted from BER measurements around the zero crossings of the full-swing input patterns, as shown in Fig. 17.7.6. No bit errors are measured outside an interval of 18ps, so the required setup+hold time is smaller than 18ps (as input jitter is part of the 18ps). A conventional circuit in  $0.18\mu$ m CMOS [3] achieves 80ps, which would still be 40ps in 90nm CMOS according to scaling theory. In the double-tail topology, the setup+hold time could be further reduced with a wider tail transistor M9, but at the expense of increased offset and noise due to a shortening of the time that M5/M6 operate in saturation. Simulations predict that the current aperture time is already fast enough to sample data patterns of 40Gb/s, provided that interleaving is used to enable a suitably long regeneration phase.
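The scaling comparison and the 40Gb/s claim can be checked with a few lines of arithmetic. The sketch assumes that timing scales linearly with feature size, which is the simple scaling rule implied by the 80ps to 40ps projection above, and it compares the measured 18ps bound with the bit period as a rough plausibility check only. The function names are hypothetical.

```python
def scale_time(t, l_old, l_new):
    """Project a timing figure to another node, assuming delay ~ feature size."""
    return t * l_new / l_old


def bit_period(bit_rate):
    """Duration of one bit in seconds for a given bit rate in bits per second."""
    return 1.0 / bit_rate


# Conventional SA from [3]: 80 ps in 0.18 um, projected to 90 nm.
t_conventional_90nm = scale_time(80e-12, l_old=180e-9, l_new=90e-9)
t_double_tail = 18e-12            # measured upper bound on setup+hold time
t_bit_40g = bit_period(40e9)      # one bit at 40 Gb/s

print(f"conventional, scaled to 90 nm: {t_conventional_90nm * 1e12:.0f} ps")
print(f"bit period at 40 Gb/s: {t_bit_40g * 1e12:.0f} ps, "
      f"fits 18 ps window: {t_double_tail < t_bit_40g}")
```

The 18ps window is shorter than the 25ps bit period, which is consistent with the simulated prediction, although regeneration still needs the longer time that interleaving provides.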

In conclusion, the double-tail topology has an added degree of freedom that enables better optimization of the balance between speed, offset, power and common-mode voltage. The circuit also has better isolation between input and output and is well suited to operate at low supply voltages.

## Acknowledgements:

The authors thank Philips Research for chip fabrication, the Dutch Technology Foundation (STW, project TCS.5791) for funding and Gerard Wienk for assistance.

## References:

[1] H. Zhang, V. George and J. Rabaey, "Low-Swing On-Chip Signaling Techniques: Effectiveness and Robustness," *IEEE T. VLSI Systems*, pp. 264-272, June, 2000.

[2] D. Schinkel et al., "A 3Gb/s/ch Transceiver for 10-mm Uninterrupted RC-Limited Global On-Chip Interconnects," *IEEE J. Solid-State Circuits*, pp. 297-306, Jan., 2006.

[3] B. Nikolic et al., "Improved Sense-Amplifier-Based Flip-Flop: Design and Measurements," *IEEE J. Solid-State Circuits*, pp. 876-884, June, 2000.

[4] B. Wicht et al., "Yield and Speed Optimization of a Latch-Type Voltage Sense Amplifier," *IEEE J. Solid-State Circuits*, pp. 1148-1158, July, 2004.

[5] K.-L.J. Wong and C.-K.K. Yang, "Offset Compensation in Comparators with Minimum Input-Referred Supply Noise," *IEEE J. Solid-State Circuits*, pp. 837-840, May, 2004.

1-4244-0852-0/07/\$25.00 ©2007 IEEE.




Figure 17.7.1: Conventional latch-type voltage sense amplifier. The dotted transistors are examples of common variations.



Figure 17.7.3: Simulated sense amplifier delays versus differential input voltage. The delay is the time between the clock edge and the instant when  $\Delta Out$  crosses 1/2  $V_{DD}$ .

Figure 17.7.5: Measured P(out=high) versus $\Delta V_{in} - V_{offset}$ (mV), with a Gaussian fit ($\sigma = 1.5$mV).

Figure 17.7.4: Measured relative delay versus differential and common-mode input voltages.



