#### ECEN 620: Network Theory - Broadband Circuit Design, Fall 2020

#### Lecture 16: High-Speed I/O Overview



#### Sam Palermo, Analog & Mixed-Signal Center, Texas A&M University

## Announcements

- Exam 2 is on Nov. 19
  - Covers through Lecture 15
  - Take-home format, assigned and turned in via Google Classroom
  - Posted at ~8AM and due 7PM
  - Open notes/references
- Project
  - Final report due Nov 24
  - Presentation on Dec 4 (2PM-4:30PM)

## Outline

- Introduction
- Electrical I/O Overview
  - Channel characteristics
  - Transmitter & receiver circuits
  - Clocking techniques & circuits
- Conclusion

#### ECEN 720: High-Speed Links Circuits & Systems

- Spring 2021
- https://people.engr.tamu.edu/spalermo/ecen720.html
- Covers system level and circuit design issues relevant to high-speed electrical and optical links
- Channel Properties
  - Modeling, measurements, communication techniques
- Circuits
  - Drivers, receivers, equalizers, clocking
- Project
  - Link system design with statistical BER analysis tool
  - Circuit design of key interface circuits
- Prerequisite: ECEN 474/704 or my approval

# High-Speed Serial I/O

- Found in applications ranging from high-end computing systems to smart mobile devices
- Typical processor platform
  - Processor-to-memory: DDR4
  - Processor-to-peripheral: PCIe & USB
  - Storage: SATA
  - Network: LAN
- Mobile systems
  - DSI: Display Serial Interface
  - CSI: Camera Serial Interface
  - UniPro: MIPI Unified Protocol





# Chip-to-Chip Signaling Trends



Slide Courtesy of Frank O'Mahony & Brian Casper, Intel

# Increasing I/O Bandwidth Demand



[Zhou Opt. Fiber Tech. 2017]

- Aggressive scaling of I/O data rates is required for data centers and HPC systems
- PAM4 modulation offers higher spectral efficiency and is commonly used in electrical I/Os operating above 50Gb/s
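To make the spectral-efficiency point concrete, the sketch below compares the symbol rate and Nyquist frequency of NRZ (PAM2) and PAM4 at the same bit rate. The 56 Gb/s rate is an illustrative assumption, and the eye-height penalty assumes both formats use the same peak-to-peak swing with equally spaced levels.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol (baud) rate of a PAM-M stream: each symbol carries log2(M) bits."""
    return bit_rate_gbps / math.log2(levels)


def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency = half the symbol rate."""
    return symbol_rate_gbaud(bit_rate_gbps, levels) / 2


def eye_height_penalty_db(levels: int) -> float:
    """For a fixed peak-to-peak swing, a PAM-M eye is 1/(M-1) the height of an NRZ eye."""
    return 20 * math.log10(levels - 1)


# 56 Gb/s is an illustrative data rate, not a value from the slide.
for name, m in (("NRZ", 2), ("PAM4", 4)):
    print(f"{name}: 56 Gb/s -> {symbol_rate_gbaud(56, m):.0f} GBd, "
          f"Nyquist {nyquist_ghz(56, m):.0f} GHz, "
          f"eye-height penalty {eye_height_penalty_db(m):.1f} dB")
```

Under these assumptions PAM4 halves the Nyquist frequency the channel must support, at the cost of roughly 9.5 dB of eye height.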


#### High-Speed Electrical Link System



#### **Electrical Backplane Channel**



- Frequency dependent loss
  - Dispersion & reflections
- Co-channel interference
  - Far-end (FEXT) & near-end (NEXT) crosstalk

#### Loss Mechanisms



B. Dally et al, "Digital Systems Engineering,"
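A common first-order way to capture frequency-dependent loss is to add a conductor (skin-effect) term that grows as $\sqrt{f}$ and a dielectric term that grows linearly with $f$. The sketch below evaluates such a model; the coefficients `K_SKIN_DB` and `K_DIEL_DB` are hypothetical values chosen for illustration, not data from the referenced figure.

```python
import math

# Hypothetical coefficients for an illustrative backplane trace (not measured data).
K_SKIN_DB = 1.0   # dB per sqrt(GHz): conductor (skin-effect) loss grows as sqrt(f)
K_DIEL_DB = 1.5   # dB per GHz: dielectric loss grows linearly with f


def channel_loss_db(f_ghz: float) -> float:
    """Insertion loss (dB) at f_ghz for the two-term loss model."""
    return K_SKIN_DB * math.sqrt(f_ghz) + K_DIEL_DB * f_ghz


def channel_gain(f_ghz: float) -> float:
    """Linear voltage gain |H(f)| corresponding to the loss in dB."""
    return 10 ** (-channel_loss_db(f_ghz) / 20)


for f in (0.1, 1.0, 3.0):
    print(f"{f:4.1f} GHz: {channel_loss_db(f):5.2f} dB loss, |H| = {channel_gain(f):.3f}")
```

For a 6Gb/s NRZ link the Nyquist frequency is 3 GHz, so the gap between the loss at 3 GHz and at low frequency indicates how much ISI an equalizer must undo.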

#### Reflections
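Reflections arise wherever a termination differs from the line's characteristic impedance. The sketch below computes the reflection coefficient $\Gamma = (Z_T - Z_0)/(Z_T + Z_0)$ and the voltage steps arriving at the load of an ideal lossless line driven by a step source. The function names and the 50Ω default are illustrative assumptions, and the lossless-line model ignores channel loss.

```python
def reflection_coefficient(z_term: float, z0: float = 50.0) -> float:
    """Voltage reflection coefficient at a termination z_term on a line of impedance z0."""
    return (z_term - z0) / (z_term + z0)


def load_step_increments(v_src, z_src, z_load, z0=50.0, n_trips=4):
    """Voltage steps arriving at the load of a lossless line driven by a step.

    The source divider launches v_src*z0/(z_src+z0); each arrival at the load is
    scaled by (1 + gamma_load), and each further round trip by gamma_load*gamma_src.
    """
    g_s = reflection_coefficient(z_src, z0)
    g_l = reflection_coefficient(z_load, z0)
    v_launch = v_src * z0 / (z_src + z0)
    return [v_launch * (1 + g_l) * (g_l * g_s) ** k for k in range(n_trips)]


# Mismatched source and load: the load voltage rings toward its DC value.
steps = load_step_increments(1.0, z_src=25.0, z_load=100.0)
print([round(v, 4) for v in steps])
```

With a matched 50Ω source (`z_src=50.0`) the source reflection coefficient is zero, so every arrival after the first vanishes: reflections from the far end are absorbed at the driver.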



#### Crosstalk

- Occurs mostly in packages and board-to-board connectors
- FEXT is attenuated by the channel response and has a band-pass characteristic
- NEXT couples directly into the victim and has a high-pass characteristic



#### **Channel Performance Impact**



[Figure; vertical axis: Voltage (V)]



## Link Speed Limitations

- High-speed links can be limited by both the internal electronics and the channel
- Clock generation and distribution is a key circuit bandwidth bottleneck
  - Requires data mux/demux to use multiple clock phases
  - Passives and/or CML techniques can extend circuit bandwidth at the expense of area and/or power
- Limited channel bandwidth is typically compensated with equalization circuits



\*C.-K. Yang, "Design of High-Speed Serial Links in CMOS," 1998.

# **Multiplexing Techniques**

- Data mux/demux operation typically employs multiple clock phases
- Half-rate architecture (DDR) is the most common
  - Sends a bit on both the rising and falling edges of one differential clock
  - 50% duty cycle is critical
- Higher multiplexing factors with multiple clock phases further increase the output data rate relative to the on-chip clock frequency
  - Phase spacing/calibration is critical
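The N:1 multiplexing idea can be sketched under idealized assumptions: N evenly spaced clock phases, one bit launched per phase, and no skew or duty-cycle error. The helper names are hypothetical. `clock_plan` returns the required clock frequency and the one-UI spacing between adjacent phases, and `deserialize` shows the matching 1:N receive-side demux.

```python
from typing import List, Sequence, Tuple


def clock_plan(data_rate_gbps: float, mux_factor: int) -> Tuple[float, float]:
    """Clock frequency (GHz) and bit period = adjacent-phase spacing (ps)
    for an N:1 mux that launches one bit per evenly spaced clock phase."""
    ui_ps = 1000.0 / data_rate_gbps
    return data_rate_gbps / mux_factor, ui_ps


def serialize(lanes: Sequence[Sequence[int]]) -> List[int]:
    """N:1 mux: lane k drives the output during clock phase k of every clock period."""
    n, depth = len(lanes), len(lanes[0])
    return [lanes[i % n][i // n] for i in range(n * depth)]


def deserialize(stream: Sequence[int], n: int) -> List[List[int]]:
    """1:N demux: the sampler on clock phase k captures every n-th bit starting at k."""
    return [list(stream[k::n]) for k in range(n)]


lanes = [[1, 0], [0, 0], [1, 1], [0, 1]]   # four parallel lanes, two bits each
serial = serialize(lanes)
print(serial, deserialize(serial, 4) == lanes)
print(clock_plan(10, 2), clock_plan(10, 8))
```

For a 10 Gb/s link, `clock_plan(10, 2)` gives a 5 GHz half-rate clock whose two edges must be exactly 100 ps apart (hence the 50% duty-cycle requirement), while `clock_plan(10, 8)` lowers the clock to 1.25 GHz but needs eight phases spaced by 100 ps.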



8:1 Multiplexing TX\*



\*C.-K. Yang, "Design of High-Speed Serial Links in CMOS," 1998.

#### Current vs Voltage-Mode Driver

- Signal integrity considerations (minimizing reflections) require a 50Ω driver output impedance
- To produce an output drive voltage
  - Current-mode drivers use Norton-equivalent parallel termination
    - Easier to control output impedance
  - Voltage-mode drivers use Thevenin-equivalent series termination
    - Potentially  $\frac{1}{2}$  to  $\frac{1}{4}$  the current for a given output swing
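The current advantage of the voltage-mode driver can be checked with a simple DC model. This sketch assumes an ideal differential driver with 50Ω single-ended impedance (`z0`), a current-mode driver whose 50Ω loads appear in parallel with the receiver's 50Ω terminations, and a series-terminated voltage-mode driver into a 100Ω differential receiver termination. The function names are hypothetical.

```python
def cml_tail_current(v_diff_pp: float, z0: float = 50.0) -> float:
    """Current-mode driver: the tail current I is steered into one output, which sees
    the driver's z0 load in parallel with the receiver's z0 termination (z0/2).
    Differential amplitude = I*z0/2, so peak-to-peak differential swing = I*z0."""
    return v_diff_pp / z0


def vm_supply_current(v_diff_pp: float, z0: float = 50.0) -> float:
    """Voltage-mode (series-terminated) driver: current flows through two z0 driver
    legs and the 2*z0 differential receiver termination (4*z0 in total). The receiver
    sees half the driver supply, so peak-to-peak differential swing = supply voltage."""
    return v_diff_pp / (4 * z0)


swing = 0.8  # V, peak-to-peak differential (illustrative value)
print(f"CML: {cml_tail_current(swing) * 1e3:.1f} mA, "
      f"voltage-mode: {vm_supply_current(swing) * 1e3:.1f} mA")
```

For an 800 mV peak-to-peak differential swing this model gives 16 mA for the current-mode driver and 4 mA for the voltage-mode driver, the 1/4 end of the range above; other termination arrangements give different ratios.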



#### **TX FIR Equalization**

A TX FIR filter pre-distorts the transmitted pulse to invert the channel distortion, at the cost of an attenuated transmit signal (de-emphasis)
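A minimal sketch of TX de-emphasis, assuming a 2-tap FIR (main cursor plus one post-cursor tap) acting on ±1 NRZ symbols. Scaling the taps so that the sum of their magnitudes equals 1 models the driver's peak-swing limit. The tap values and the function names `tx_fir` and `deemphasis_db` are illustrative assumptions.

```python
import numpy as np


def tx_fir(bits, taps):
    """Apply a TX FIR (taps[0] = main cursor, later taps = post-cursors) to NRZ bits.

    Taps are scaled so that sum(|c|) = 1, modeling the driver's peak-swing limit.
    Symbols before the start of the sequence are taken as 0 (no prior bit).
    """
    symbols = 2.0 * np.asarray(bits, dtype=float) - 1.0   # map 0/1 -> -1/+1
    c = np.asarray(taps, dtype=float)
    c = c / np.sum(np.abs(c))
    return np.convolve(symbols, c)[: len(symbols)]


def deemphasis_db(c0, c1):
    """Level of a repeated bit relative to a transition bit for a 2-tap FIR."""
    return 20 * np.log10(abs(c0 + c1) / (abs(c0) + abs(c1)))


print(tx_fir([0, 1, 1, 1, 0], [1.0, -0.25]))
print(deemphasis_db(1.0, -0.25))
```

With taps `[1.0, -0.25]`, bits `0,1,1,1,0` produce levels `-0.8, 1.0, 0.6, 0.6, -1.0`: transition bits keep the full swing while repeated bits are de-emphasized by about 4.4 dB (the first output is reduced only because no earlier bit exists).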



# 6Gb/s TX FIR Equalization Example





#### 6Gb/s Pulse Responses



- Pros
  - Simple to implement
  - Can cancel ISI in the precursor and beyond the filter span
  - Doesn't amplify noise
  - Can achieve 5-6 bit resolution
- Cons
  - Attenuates low-frequency content due to the peak-power limitation
  - Need a "back-channel" to tune filter taps





6Gb/s Eye - Refined BP Channel w/ TX FIR Eq



#### Demultiplexing RX

- Input pre-amp followed by comparator segments
  - Pre-amp may implement peaking filtering
  - Comparator typically includes linear-amp & regenerative (positive feedback) latch
- Demultiplexing allows for lower clock frequency relative to data rate and extra regeneration and pre-charge time in comparators



#### **RX** Sensitivity

RX sensitivity is a function of the input-referred noise, offset, and minimum latch resolution voltage:

$$v_{S}^{pp} = 2v_{n}^{rms}\sqrt{SNR} + v_{min} + v_{offset}^{*}$$

Typical values: $v_{n}^{rms} = 1\,\text{mV}_{rms}$ and $v_{min} + v_{offset}^{*} < 2\,\text{mV}$. For BER $= 10^{-12}$ ($\sqrt{SNR} = 7$): $v_{S}^{pp} < 2(1\,\text{mV})(7) + 2\,\text{mV} = 16\,\text{mV}_{pp}$

Offset-correction circuitry is required to reduce the input offset from a potentially large uncorrected value (>50mV) to near 1mV
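The sensitivity expression can be evaluated numerically. This sketch assumes Gaussian noise, for which BER $= \tfrac{1}{2}\,\mathrm{erfc}(\sqrt{SNR}/\sqrt{2})$ with $\sqrt{SNR}$ as used in the sensitivity formula, and inverts that relation by bisection. The function names are hypothetical. With the typical values quoted (1 mV rms noise, 2 mV for $v_{min}$ plus residual offset) it gives $\sqrt{SNR} \approx 7.03$ and a required swing of about 16 mV peak-to-peak.

```python
import math


def q_from_ber(ber: float) -> float:
    """Invert BER = 0.5*erfc(Q/sqrt(2)) for Q (the slide's sqrt(SNR)) by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 0.5 * math.erfc(mid / math.sqrt(2)) > ber:
            lo = mid          # BER still too high: need a larger Q
        else:
            hi = mid
    return 0.5 * (lo + hi)


def rx_sensitivity_pp(vn_rms: float, ber: float, vmin_plus_offset: float) -> float:
    """Required peak-to-peak input swing: 2*vn_rms*Q + v_min + residual offset."""
    return 2 * vn_rms * q_from_ber(ber) + vmin_plus_offset


q = q_from_ber(1e-12)
v_pp = rx_sensitivity_pp(vn_rms=1e-3, ber=1e-12, vmin_plus_offset=2e-3)
print(f"Q = {q:.2f}, required swing = {v_pp * 1e3:.1f} mVpp")
```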



#### RX Equalization #1: RX FIR



- Pros
  - With sufficient dynamic range, can amplify high frequency content (rather than attenuate low frequencies)
  - Can cancel ISI in pre-cursor and beyond filter span
  - Filter tap coefficients can be adaptively tuned without any back-channel
- Cons
  - Amplifies noise/crosstalk
  - Implementation of analog delays
  - Tap precision

Eye-Pattern Diagrams at 1Gb/s on CAT5e\*



Before Equalizer: 23 meters

After Equalizer: 23 meters

\*D. Hernandez-Garduno and J. Silva-Martinez, "A CMOS 1Gb/s 5-Tap Transversal Equalizer based on 3<sup>rd</sup>-Order Delay Cells," ISSCC, 2007.
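One common way to adapt RX FIR taps without a back-channel is decision-directed LMS, which uses the receiver's own slicer decisions as the reference. The sketch below is an illustrative assumption, not the cited design: a 7-tap FFE, a hypothetical noiseless channel with one pre-cursor and two post-cursors, and step size `mu = 0.01`. It also assumes the unequalized eye is open enough that the initial decisions are correct.

```python
import numpy as np


def dd_lms_ffe(x, n_taps=7, mu=0.01):
    """Decision-directed LMS adaptation of an RX FIR (feed-forward equalizer).

    Starts as a pass-through (center tap = 1) and uses the slicer's own
    decisions as the reference, so no back-channel or training pattern is used.
    Assumes the unequalized eye is open enough for the initial decisions to be correct.
    """
    w = np.zeros(n_taps)
    w[n_taps // 2] = 1.0
    errors = []
    for n in range(n_taps - 1, len(x)):
        xv = x[n - n_taps + 1 : n + 1][::-1]   # [x[n], x[n-1], ..., x[n-n_taps+1]]
        y = w @ xv                             # equalizer output
        d = 1.0 if y >= 0 else -1.0            # slicer decision
        e = d - y                              # error against the decision
        w += mu * e * xv                       # LMS tap update
        errors.append(e)
    return w, np.array(errors)


rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=4000)
h = [0.15, 1.0, 0.45, 0.2]   # hypothetical pulse response: pre-cursor, main cursor, 2 post-cursors
x = np.convolve(symbols, h)[: len(symbols)]

w, err = dd_lms_ffe(x)
print("adapted taps:", np.round(w, 3))
print(f"MSE first 200: {np.mean(err[:200] ** 2):.4f}, last 1000: {np.mean(err[-1000:] ** 2):.4f}")
```

As the taps converge, the mean-squared error at the slicer drops well below its initial value, which equals the ISI power the unequalized channel leaves at the sampling instant.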

#### RX Equalization #2: RX CTLE



[Figure; vertical axis: Voltage (V)]



- Pros
  - Provides gain and equalization with low power and area overhead
  - Can cancel both precursor and long-tail ISI
- Cons
  - Generally limited to 1<sup>st</sup> order compensation
  - Amplifies noise/crosstalk
  - PVT sensitivity
  - Can be hard to tune

6Gb/s Eye - Refined BP Channel w/ No Eq; 6Gb/s Eye - Refined BP Channel w/ RX CTLE Eq
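A CTLE is often modeled to first order with one zero and two poles, $H(s) = A_{DC}\frac{1 + s/\omega_z}{(1 + s/\omega_{p1})(1 + s/\omega_{p2})}$, which gives roughly $\omega_{p1}/\omega_z$ of high-frequency peaking relative to DC. The sketch evaluates that model; the zero and pole frequencies are hypothetical values chosen for a 6Gb/s link (3 GHz Nyquist), not values from the slide.

```python
import math


def ctle_gain_db(f_hz, a_dc=1.0, fz=0.5e9, fp1=3e9, fp2=10e9):
    """|H(j*2*pi*f)| in dB for a one-zero, two-pole CTLE model (hypothetical values)."""
    s = 1j * 2 * math.pi * f_hz
    wz, wp1, wp2 = (2 * math.pi * fx for fx in (fz, fp1, fp2))
    h = a_dc * (1 + s / wz) / ((1 + s / wp1) * (1 + s / wp2))
    return 20 * math.log10(abs(h))


for f in (0.0, 0.5e9, 1.5e9, 3e9):
    print(f"{f / 1e9:4.1f} GHz: {ctle_gain_db(f):5.1f} dB")
```

With these values the gain at 3 GHz is about 12 dB above DC, below the ideal 15.6 dB ($\omega_{p1}/\omega_z = 6$) because the first pole has already started to roll the response off.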



#### RX Equalization #3: RX DFE



- Pros
  - No noise and crosstalk amplification
  - Filter tap coefficients can be adaptively tuned without any back-channel
- Cons
  - Cannot cancel precursor ISI
  - Critical feedback timing path
  - Timing of ISI subtraction complicates CDR phase detection



6Gb/s Eye - Refined BP Channel w/ No Eq

[Figure: eye diagram; axes: Time (ps), Voltage]
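A minimal DFE sketch: each decision is made after subtracting the post-cursor ISI predicted from earlier decisions. It assumes noiseless data, a hypothetical pulse response with a main cursor of 1.0 and post-cursors of 0.7 and 0.5 (no pre-cursor, which a DFE cannot cancel), and feedback taps already set to those post-cursor values.

```python
import numpy as np


def dfe(received, post_taps):
    """Direct-feedback DFE: subtract post-cursor ISI predicted from past decisions.

    post_taps[k] is the (assumed known) channel post-cursor k+1 UI after the main
    cursor. In hardware the first tap must be resolved and subtracted within one
    UI: the critical feedback timing path.
    """
    decisions = []
    for n, r in enumerate(received):
        isi = sum(c * decisions[n - 1 - k]
                  for k, c in enumerate(post_taps) if n - 1 - k >= 0)
        decisions.append(1.0 if r - isi >= 0 else -1.0)
    return np.array(decisions)


rng = np.random.default_rng(0)
symbols = rng.choice([-1.0, 1.0], size=2000)
h = [1.0, 0.7, 0.5]  # hypothetical pulse response: main cursor + 2 post-cursors
received = np.convolve(symbols, h)[: len(symbols)]

no_eq_errors = int(np.sum(np.where(received >= 0, 1.0, -1.0) != symbols))
dfe_errors = int(np.sum(dfe(received, post_taps=[0.7, 0.5]) != symbols))
print(f"slicer only: {no_eq_errors} errors, DFE: {dfe_errors} errors")
```

Because the post-cursors sum to more than the main cursor, a plain slicer makes errors on this channel, while the DFE recovers every bit from its own past decisions. Since it subtracts clean decision values rather than filtering the received signal, it does not amplify noise.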




## Clocking Architecture #1: Source Synchronous Clocking





\*S. Sidiropoulos, "High Performance Inter-Chip Signalling," 1998.

- Common high-speed reference clock is forwarded from TX chip to RX chip
- "Coherent" clocking allows high frequency jitter tracking
  - Jitter whose frequency is low compared with the inverse of the clock-to-data delay difference (typically less than 10 bits) can be tracked
  - Allows power-down of phase detection circuitry
    - Only periodic acquisition vs. continuous tracking
- Requires one extra clock channel
- A good clock receive amplifier is needed, as the forwarded clock can be attenuated by the low-pass channel
- The low-pass channel causes jitter amplification
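The tracking limit can be seen by treating the forwarded clock and the data as carrying the same transmit jitter, offset by a clock-to-data delay difference $\tau$. For sinusoidal jitter at frequency $f$, the residual clock-to-data jitter is $|1 - e^{-j2\pi f\tau}| = 2|\sin(\pi f\tau)|$ times the jitter amplitude. The sketch below evaluates this; it ignores the channel's low-pass filtering, and the 10 Gb/s rate and 10-bit (1 ns) skew are illustrative assumptions.

```python
import math


def residual_jitter_gain(f_jitter_hz: float, skew_s: float) -> float:
    """Relative clock-to-data jitter per unit of common TX jitter.

    Data sees jitter J(t); the forwarded clock sees J(t - skew). For sinusoidal
    jitter the difference has magnitude |1 - exp(-j*2*pi*f*skew)| = 2|sin(pi*f*skew)|.
    """
    return 2 * abs(math.sin(math.pi * f_jitter_hz * skew_s))


skew = 10 * 100e-12   # assumed 10 UI of skew at 10 Gb/s (UI = 100 ps): 1 ns
for f in (1e6, 10e6, 100e6, 500e6):
    print(f"{f / 1e6:6.0f} MHz jitter -> residual {residual_jitter_gain(f, skew):.3f}x")
```

With 1 ns of skew, jitter at 10 MHz is almost entirely cancelled (residual about 6% of its amplitude), whereas jitter near 500 MHz is doubled rather than tracked.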

## Clocking Architecture #2: Embedded Clocking (CDR)



- Clock frequency and optimum phase position are extracted from incoming data stream
- Phase detection continuously running
- Jitter tracking limited by CDR bandwidth
  - With technology scaling, CDRs can achieve higher bandwidths, diminishing the jitter-tracking advantage of source-synchronous systems
- CDR can be implemented as a stand-alone PLL or as a "dual-loop" architecture with a PLL or DLL and phase interpolators (PI)
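The slide does not name a phase detector; one common choice for continuous phase detection in a CDR is the bang-bang (Alexander) detector, which compares two consecutive data samples with the edge sample taken between them. The sketch below assumes that detector and binary samples; the function names and the vote-counting helper are hypothetical.

```python
from typing import Optional, Sequence


def alexander_pd(d_prev: int, edge: int, d_curr: int) -> Optional[str]:
    """Bang-bang (Alexander) phase detector decision for one bit boundary.

    d_prev, d_curr: data samples at the centers of consecutive bits.
    edge: sample taken between them, nominally on the data transition.
    """
    if d_prev == d_curr:
        return None            # no transition: no phase information
    if edge == d_prev:
        return "early"         # edge sample taken before the data switched
    return "late"              # edge sample taken after the data switched


def vote(data: Sequence[int], edges: Sequence[int]) -> int:
    """Net phase vote (#late - #early); edges[i] lies between data[i] and data[i+1]."""
    net = 0
    for i, e in enumerate(edges):
        decision = alexander_pd(data[i], e, data[i + 1])
        net += {"late": 1, "early": -1, None: 0}[decision]
    return net


print(vote([0, 1, 0, 0, 1], [0, 1, 0, 0]))   # every transition votes "early"
```

An "early" vote tells the loop to delay the sampling clock and a "late" vote to advance it; the net vote over a block of transitions drives the phase update.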

#### Phase-Locked Loop (PLL)



\*J. Bulzacchelli *et al*, "A 10Gb/s 5Tap DFE/4Tap FFE Transceiver in 90nm CMOS Technology," JSSC, 2006.

- Used for frequency synthesis at TX and embedded-clocked RX
- Second/third order loop
  - Charge pump & integrating loop filter produce a voltage to control the VCO frequency
  - Output phase is the integral of the VCO frequency
  - A zero is required in the loop filter for stability
- Low-noise VCO (or high BW PLL) required to minimize jitter accumulation
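For a second-order charge-pump PLL (series R-C loop filter, no extra ripple capacitor), the standard linear model gives $\omega_n = \sqrt{I_{cp}K_{vco}/(2\pi N C)}$ and $\zeta = RC\omega_n/2$, with $K_{vco}$ in rad/s/V. The sketch evaluates these with hypothetical component values. Setting R = 0 removes the loop-filter zero and drives $\zeta$ to zero, which illustrates why the zero is needed.

```python
import math


def cp_pll_params(i_cp, k_vco_hz_per_v, r, c, n_div):
    """Natural frequency (rad/s) and damping factor of a 2nd-order charge-pump PLL.

    Loop filter: series R-C. PFD/CP gain = i_cp/(2*pi) A/rad; VCO gain in rad/s/V.
    """
    k_vco = 2 * math.pi * k_vco_hz_per_v
    wn = math.sqrt(i_cp * k_vco / (2 * math.pi * n_div * c))
    zeta = r * c * wn / 2          # set by the loop-filter zero at 1/(R*C)
    return wn, zeta


# Hypothetical values: 100 uA pump, 1 GHz/V VCO, R = 8 kOhm, C = 20 pF, divide-by-40.
wn, zeta = cp_pll_params(i_cp=100e-6, k_vco_hz_per_v=1e9, r=8e3, c=20e-12, n_div=40)
print(f"f_n = {wn / (2 * math.pi) / 1e6:.2f} MHz, zeta = {zeta:.2f}")
```

With these values the natural frequency is about 1.8 MHz and $\zeta \approx 0.89$.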

#### Delay-Locked Loop (DLL)



- Typically used to generate multiple clock phases in RX
- First order loop guarantees stability
- Delay line doesn't accumulate jitter like a VCO
- Difficult to use for frequency synthesis
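The difference in jitter accumulation can be illustrated with a simple Monte Carlo model, assuming independent Gaussian timing noise in every cycle (σ = 1 ps, a hypothetical value). In a free-running VCO each cycle starts from the previous, already noisy edge, so the errors add up; in a delay line each output edge is the clean reference edge plus one delay's worth of noise. This open-loop sketch ignores the PLL loop, which limits VCO accumulation beyond the loop bandwidth.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_ps = 1.0               # hypothetical per-cycle timing noise (ps rms)
trials, n_cycles = 2000, 1000

# Independent timing noise added in every clock cycle of every trial.
per_cycle = rng.normal(0.0, sigma_ps, size=(trials, n_cycles))

# Free-running VCO: each edge is timed from the previous (already noisy) edge,
# so the edge-position error is a running sum of the per-cycle noise.
vco_err = np.cumsum(per_cycle, axis=1)

# Delay line: each output edge is the clean reference edge plus one delay's noise,
# so the error does not build up from cycle to cycle.
dll_err = per_cycle

print(f"after {n_cycles} cycles: VCO {np.std(vco_err[:, -1]):.1f} ps rms, "
      f"delay line {np.std(dll_err[:, -1]):.2f} ps rms")
```

After 1000 cycles the VCO's timing error grows to roughly $\sqrt{1000} \approx 32$ ps rms, while the delay-line output stays near 1 ps rms.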

#### Phase Interpolator (PI)



\*J. Bulzacchelli et al, "A 10Gb/s 5Tap DFE/4Tap FFE Transceiver in 90nm CMOS Technology," JSSC, 2006.

- Interpolators mix between two clock phases to produce the fine resolution clock phases used by the RX samplers
- Critical to limit bandwidth of PI mixing node for good linearity
  - Hard to design over wide frequency range without bandwidth adjustment and/or input slew-rate control
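Phase interpolation can be sketched by assuming ideal sinusoidal inputs of equal amplitude separated by a phase spacing $d$, mixed with weights $w$ and $1 - w$. The output phase then follows an arctangent curve rather than the ideal linear $(1 - w)d$, and the nonlinearity grows with the input spacing. The sinusoidal model, the 16-step weight sweep, and the function names are illustrative assumptions; the model does not capture the mixing-node bandwidth or slew-rate effects listed above.

```python
import math


def pi_phase(weight: float, spacing_rad: float) -> float:
    """Output phase lag (rad) of weight*sin(x) + (1-weight)*sin(x - spacing)."""
    # Expanding: [w + (1-w)cos d]*sin(x) - (1-w)*sin(d)*cos(x) = R*sin(x - phi)
    return math.atan2((1 - weight) * math.sin(spacing_rad),
                      weight + (1 - weight) * math.cos(spacing_rad))


def max_inl_deg(spacing_deg: float, steps: int = 16) -> float:
    """Worst deviation (deg) from the ideal linear phase (1-w)*spacing over a weight sweep."""
    d = math.radians(spacing_deg)
    return max(abs(math.degrees(pi_phase(k / steps, d) - (1 - k / steps) * d))
               for k in range(steps + 1))


for spacing in (45, 90):
    print(f"{spacing} deg input spacing: max INL = {max_inl_deg(spacing):.2f} deg")
```

For 90° input spacing the worst-case error is about 4°, versus under half a degree for 45° spacing.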

#### **Clock Distribution**

- Careful clock distribution is required in multichannel I/O systems
- Different distribution architectures trade off jitter, power, area, and complexity



\*J. Poulton et al, "A 14mW 6.25Gb/s Transceiver in 90nm CMOS," JSSC, 2007.

| Architecture       | Jitter    | Power    | Area     | Complexity |
|--------------------|-----------|----------|----------|------------|
| Inverter           | Moderate  | Moderate | Low      | Low        |
| CML                | Good      | High     | Moderate | Moderate   |
| T-line             | Good      | Low      | Low      | Moderate   |
| Resonant T-line    | Excellent | Low      | High     | High       |

# Conclusion

- High-speed I/O systems offer challenges in both circuit and communication system design
- Controlled-impedance drivers are necessary for high-speed signaling
- Achieving good RX sensitivity requires careful front-end noise analysis and sampler offset correction
- Low-noise clock generation, distribution, and recovery is necessary in high-performance serial links