## ECEN720: High-Speed Links Circuits and Systems Spring 2025

Lecture 13: Forwarded Clock Deskew Circuits



Sam Palermo Analog & Mixed-Signal Center Texas A&M University

# Announcements

- Project Preliminary Report due Apr 15
- Project Final Report due Apr 29

# Agenda

- Forwarded Clock I/O Overview
- Data & Clock Skew Performance Impact
- Jitter Impulse Response and Jitter Transfer Function
- Forwarded Clock Deskew Architectures
  - PLL/PI
  - DLL/PI
  - ILO
    - Fundamental, Super-Harmonic, Sub-Harmonic

# Forwarded Clock I/O Architecture



- Common high-speed reference clock is forwarded from TX chip to RX chip
  - Mesochronous system
- Used in processor-memory interfaces and multi-processor communication
  - Intel QPI
  - Hypertransport
- Requires one extra clock channel
- "Coherent" clocking allows low-to-high frequency jitter tracking
- Need a good clock receive amplifier, as the forwarded clock is attenuated by the channel

# Forwarded Clock I/O Limitations



- Clock skew can limit forwarded-clock I/O performance
  - Driver strength and loading mismatches
  - Interconnect length mismatches
- Low pass channel causes jitter amplification
- Duty-Cycle variations of forwarded clock

# Forwarded Clock I/O De-Skew



- Per-channel de-skew allows for significant data rate increases
  - Sample clock adjusted to center clock on the incoming data eye

- Implementations
  - Delay-Locked Loop and Phase Interpolators
  - Injection-Locked Oscillators
- Phase Acquisition can be
  - BER based: no additional input phase samplers
  - Phase detector based: implemented with additional input phase samplers that are periodically powered on

# Forwarded Clock I/O Circuits



- TX PLL
- TX Clock Distribution
- Replica TX Clock Driver
- Channel
- Forward Clock Amplifier
- RX Clock Distribution
- De-Skew Circuit
  - PLL/PI
  - DLL/PI
  - Injection-Locked Oscillator

## Data & Clock Skew Performance Impact

- High-speed forwarded clock allows jitter tracking between clock and data
- Clock-to-data skew causes high-frequency clock and data jitter to become out of phase at the receiver



Data jitter:

$$J_D = J_P \sin\left(2\pi f_j UI \cdot n\right)$$

Clock jitter with a skew of $m$ UI between data and clock:

$$J_C = J_P \sin\left(2\pi f_j UI \cdot n + 2m\pi f_j UI\right)$$

$$J_{diff} = J_D - J_C$$

UI = 100ps, assuming 5UI skew in this example

[Figure panels: Jitter Frequency = 100MHz, Jitter Frequency = 200MHz, Jitter Frequency = 400MHz]

For a fixed skew, the clock jitter phase is shifted further as the jitter frequency increases, which results in higher differential jitter

- Assuming infinite jitter tracking bandwidth (JTB)
- For a given skew, the differential jitter increases with jitter frequency and reaches a maximum of 2X the input jitter amplitude
- For a given jitter frequency, a skew larger than 1/(6f<sub>j</sub>) gives a differential jitter greater than 1X the input jitter amplitude
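
The trend above can be checked numerically. This is a minimal sketch, assuming sinusoidal jitter and infinite jitter tracking bandwidth; the function names are illustrative and not from the lecture. The difference of the two sinusoids $J_D$ and $J_C$ has a peak of $2J_P\left|\sin(\pi m f_j UI)\right|$.

```python
import math

def diff_jitter_amplitude(f_j, skew_ui, ui=100e-12, j_p=1.0):
    """Peak of J_diff = J_D - J_C for sinusoidal jitter at f_j (Hz)."""
    # J_D - J_C = -2*J_P*sin(phi/2)*cos(2*pi*f_j*UI*n + phi/2),
    # where phi = 2*pi*m*f_j*UI is the skew-induced phase offset.
    phi = 2 * math.pi * skew_ui * f_j * ui
    return 2 * j_p * abs(math.sin(phi / 2))

def diff_jitter_sequence(f_j, skew_ui, n_bits, ui=100e-12, j_p=1.0):
    """Sample-by-sample J_diff[n], written directly from J_D and J_C."""
    out = []
    for n in range(n_bits):
        j_d = j_p * math.sin(2 * math.pi * f_j * ui * n)
        j_c = j_p * math.sin(2 * math.pi * f_j * ui * n
                             + 2 * skew_ui * math.pi * f_j * ui)
        out.append(j_d - j_c)
    return out

# UI = 100ps and 5UI skew, as in the example slides.
for f_j in (100e6, 200e6, 400e6):
    amp = diff_jitter_amplitude(f_j, skew_ui=5)
    print(f"f_j = {f_j / 1e6:.0f} MHz -> peak J_diff = {amp:.3f} x J_P")
```

With a skew of exactly 1/(6f<sub>j</sub>) the peak equals $J_P$, and it reaches $2J_P$ when the skew is half a jitter period.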



### Optimum Jitter Tracking for 200MHz jitter

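- Limit the JTB by attenuating the clock jitter using the amplitude response of a low-pass function with pole frequency = JTB
- $J_C = J'_P \sin\left(2\pi f_j UI \cdot n + 2m\pi f_j UI\right)$
- $J'_P = J_P \left| \frac{1}{1 + \frac{jf_j}{JTB}} \right|$

[Figure: plot with JTB (Hz) on the horizontal axis, scale x 10<sup>8</sup>]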

#### Jitter Impulse Response (JIR) and Jitter Transfer Function (JTF) Analysis Method

- JIR: the system's response to a jitter impulse at its input
- JTF: ratio of output to input jitter as a function of frequency, the DTFT of the JIR

Extraction of JIR in a 1/2-rate system where both clock edges are used



Ideal clock waveform superimposed with clock incorporating jitter impulse stimulus

Output clock waveforms using ideal clock versus jitter impulse clock

Jitter impulse response

- A clock system's effect on an input jitter sequence can be evaluated by convolving the jitter sequence with the jitter impulse response

B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links: A tutorial," IEEE Trans. Circuits Syst. I, vol. 56, no. 1, pp. 683–688, Jan. 2009.
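
The JIR/JTF relationship can be sketched in a few lines. The exponentially decaying JIR below is a hypothetical example chosen for illustration, not a response extracted from any circuit in the lecture. The frequency axis is normalized to the clock-edge rate.

```python
import cmath
import math

def convolve(x, h):
    """Full discrete convolution of jitter sequence x with the JIR h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y

def jtf(h, f_norm):
    """DTFT of the JIR at normalized frequency f_norm (cycles per clock edge)."""
    return sum(hk * cmath.exp(-2j * math.pi * f_norm * k)
               for k, hk in enumerate(h))

# Hypothetical JIR: decaying exponential that sums to ~1 (low-pass jitter filter).
a = 0.8
jir = [(1 - a) * a**k for k in range(64)]

# A unit jitter impulse at the input reproduces the JIR at the output.
impulse = [1.0] + [0.0] * 7
out = convolve(impulse, jir)

for f_norm in (0.0, 0.05, 0.25, 0.5):
    print(f"f = {f_norm:.2f} /edge -> |JTF| = {abs(jtf(jir, f_norm)):.3f}")
```

Any measured or simulated JIR can be dropped in place of `jir`; the same `convolve` call then predicts the output jitter for an arbitrary input jitter sequence.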

# Filter/Amplifier Frequency Response & Jitter Transfer Response



- Low-pass frequency response (buffer, distribution interconnect) is similar to a high-pass jitter filter
  - High frequency jitter is amplified
- High-pass frequency response (AC coupling cap) is similar to an all-pass jitter filter, except for Nyquist-rate jitter (duty cycle error)
- Band-pass frequency response (band-pass filter) is similar to a low-pass jitter filter with the center frequency aligned with the fundamental clock frequency

# Jitter Amplification

- Low-pass frequency response (buffer, distribution interconnect) is similar to a high-pass jitter filter
  - High frequency jitter is amplified as it propagates across the channel



# PLL or DLL/PI Forwarded Clock Deskew



- TX clock is forwarded along an independent channel to the RX chip where it is distributed to the RX channels
  - The PLL or DLL locks onto the forwarded clock and serves as a multi-phase generator and a jitter filter
  - The PI mixes the phases to produce sampling clocks at the optimal phase for maximum timing margin or BER

B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links: A tutorial," IEEE Trans. Circuits Syst. I, vol. 56, no. 1, pp. 683–688, Jan. 2009.

## PLL/PI Forwarded Clock Deskew Example



[Prete ISSCC 2006]

- Fully buffered DIMM transceiver

# PLL/PI Forwarded Clock Deskew Example

- PLL low-pass jitter transfer characteristic filters high frequency jitter
  - Desired for uncorrelated jitter
  - Not desired for correlated high frequency jitter



- PLL disadvantages
  - Jitter accumulation in VCO
  - Stability
  - More area and complexity than DLL implementations

[Prete ISSCC 2006]

# **DLL/PI Forwarded Clock Deskew Example**

- DLL displays an all-pass jitter transfer function
  - Desired for correlated jitter
  - Not desired for uncorrelated jitter



- DLL advantages
  - No jitter accumulation
  - Inherently stable
  - Simpler and less area than PLL
- Finite bandwidth of DLL delay line can result in jitter amplification

[Balamurugan JSSC 2008]

# Injection Locking in LC Tanks

a) a free-running oscillator consisting of an ideal positive feedback amplifier and an LC tank;

b) we insert a phase shift in the loop. We know this will cause the oscillation frequency to shift since the loop gain has to have exactly  $2\pi$  phase shift (or multiples).



## Phase Shift for Injected Signal

- Assume the oscillator "locks" onto the injected current and oscillates at the same frequency.
- Since the locking signal is not in general at the resonant center frequency, the tank introduces a phase shift
- In order for the oscillator loop gain to be equal to unity with zero phase shift, the sum of the current of the transistor and the injected currents must have the proper phase shift to compensate for the tank phase shift.





# **Injection Locked Oscillator Phasors**

 $\phi_0$  = Tank impedance phase shift

 $\theta$  = Phase shift between injected clock and output signal



Note that the frequency of the injection signal determines the extra phase shift $\phi_0$ of the tank. This is fixed by the frequency offset.

- The current from the transistor is driven by the tank voltage, which is by definition the tank current times the tank impedance; this introduces $\phi_0$ between the tank current and voltage.
- The angle $\theta$ between the injected current and the oscillator current must be such that their sum aligns with the tank current.

### **Injection Geometry**

$$\sin \phi_0 = \frac{B}{I_{tank}}$$

$$\cos(\pi/2 - \theta) = \sin(\theta) = \frac{B}{I_{inj}}$$

$$\sin \phi_0 = \frac{I_{inj}}{I_{tank}} \sin(\theta)$$

$$\sin \phi_0 = \frac{I_{inj} \sin(\theta)}{|I_{osc}e^{j\theta} + I_{inj}|} = \frac{I_{inj} \sin(\theta)}{\sqrt{I_{osc}^2 + I_{inj}^2 + 2 I_{osc} I_{inj} \cos\theta}}$$

The geometry of the problem implies the following constraints on the injected current amplitude relative to the oscillation amplitude.

# Locking Range

$$\sin \phi_0 = \frac{I_{inj}}{I_T} \sin \theta = \frac{I_{inj} \sin \theta}{\sqrt{I_{osc}^2 + I_{inj}^2 + 2I_{osc} I_{inj} \cos \theta}}$$

$$\Rightarrow \sin \phi_{0,\max} = \frac{I_{inj}}{I_{osc}} \quad \text{if} \quad \cos \theta = -\frac{I_{inj}}{I_{osc}}$$

A second-order parallel tank consisting of $L$, $C$, and $R_p$ exhibits a phase shift of:

$$\phi_0 = \frac{\pi}{2} - \tan^{-1}\left(\frac{L \cdot \omega}{R_p} \cdot \frac{\omega_0^2}{\omega_0^2 - \omega^2}\right)$$
  
$$\because \omega_0^2 - \omega^2 \approx 2\omega_0(\omega_0 - \omega), \frac{L \cdot \omega}{R_p} = \frac{1}{Q}, \frac{\pi}{2} - \tan^{-1}(x) = \tan^{-1}(x^{-1})$$
  
$$\therefore \tan \phi_0 \approx \frac{2Q}{\omega_0}(\omega_0 - \omega)$$
  
$$\because \tan \phi_0 = \frac{I_{inj}}{I_T}, I_T = \sqrt{I_{osc}^2 - I_{inj}^2}$$



At the edge of the lock range, the injected current is orthogonal to the tank current.

The phase angle between the injected current and the oscillator current is $90^{\circ} + \phi_{0,\max}$
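
The lock-edge condition can be verified with a brute-force sweep over $\theta$. This is a minimal sketch with an assumed injection ratio $I_{inj}/I_{osc} = 0.1$; the helper name `sin_phi0` is illustrative.

```python
import math

def sin_phi0(theta, k):
    """sin(phi_0) for injection ratio k = I_inj / I_osc at angle theta (rad)."""
    return k * math.sin(theta) / math.sqrt(1 + k * k + 2 * k * math.cos(theta))

k = 0.1  # assumed injection ratio I_inj / I_osc
thetas = [i * math.pi / 100000 for i in range(1, 100000)]  # 0 < theta < pi
theta_best = max(thetas, key=lambda t: sin_phi0(t, k))

print(f"numerical max sin(phi_0) = {sin_phi0(theta_best, k):.5f} (expected {k})")
print(f"cos(theta) at max        = {math.cos(theta_best):.5f} (expected {-k})")
print(f"theta at max             = {math.degrees(theta_best):.2f} deg")
print(f"90 deg + phi_0,max       = {90 + math.degrees(math.asin(k)):.2f} deg")
```

The sweep should land on $\cos\theta = -I_{inj}/I_{osc}$, which is the same statement as $\theta = 90^{\circ} + \phi_{0,\max}$.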

## Locking Range



For $I_{inj} \ll I_{osc}$:

$$\omega_{\Delta,L} = \omega_0 - \omega_{inj} \approx \frac{\omega_0}{2Q} \cdot \frac{I_{inj}}{I_{osc}}$$

When $\omega_0 = 10\,GHz$, $Q = 5$, and $K = \frac{I_{inj}}{I_{osc}} = 0.1$:

$$\omega_{\Delta,L} \approx 100\,MHz$$

Locking range is inversely proportional to oscillator Q
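
The numbers above can be reproduced with a short calculation. This sketch combines $\tan\phi_0 \approx \frac{2Q}{\omega_0}(\omega_0 - \omega)$ with the lock-edge condition $\tan\phi_0 = I_{inj}/\sqrt{I_{osc}^2 - I_{inj}^2}$, so it still relies on the narrowband tank approximation; the function names are illustrative.

```python
import math

def lock_range(f0, q, k):
    """One-sided lock range (same units as f0) from tan(phi_0) at the lock edge."""
    # tan(phi_0,max) = I_inj / sqrt(I_osc^2 - I_inj^2) = (2Q / f0) * delta_f
    return f0 / (2 * q) * k / math.sqrt(1 - k * k)

def lock_range_weak(f0, q, k):
    """Weak-injection approximation, valid for I_inj << I_osc."""
    return f0 / (2 * q) * k

f0, q, k = 10e9, 5, 0.1  # example values from the slide
print(f"weak-injection: {lock_range_weak(f0, q, k) / 1e6:.1f} MHz")
print(f"general:        {lock_range(f0, q, k) / 1e6:.1f} MHz")
```

At $K = 0.1$ the two expressions differ by about half a percent, so the weak-injection form is adequate here; the gap grows as $K$ approaches 1.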

#### Digitally Controlled Oscillator (DCO) with Injection Locking



Shekhar, Sudip et al, "Strong Injection Locking in Low-Q LC Oscillators: Modeling and Application in a Forwarded-Clocked I/O Receiver", IEEE JSSC, 2009.

The digitally controlled switched-capacitor bank tunes the free-running frequency of the DCO to adjust the phase of the forwarded clock and also to compensate for PVT variations.

# **Ring Oscillator ILO Example**



# **Ring Oscillator ILO Example**



- ILOs have a first-order low-pass filter function to input (injection clock) jitter

$$JTF_{input} = \frac{1}{1 + \frac{s}{\omega_P}}$$

- ILOs have a first-order high-pass filter function to VCO jitter

$$JTF_{VCO} = \frac{\frac{s}{\omega_P}}{1 + \frac{s}{\omega_P}}$$

where $\omega_P$ is the jitter tracking bandwidth: $\omega_P = \sqrt{\frac{K^2}{A^2} - \Delta \omega^2}$

For a parallel RLC resonant tank:

$$A = \frac{2Q}{\omega_0}$$

$\Delta \omega$ is a function of the desired de-skew phase: $\theta_{ss} \approx \sin^{-1} \left( \frac{A}{K} \Delta \omega \right)$



- ILO jitter transfer bandwidth decreases as the oscillator is locked further from the free-running frequency, $\omega_0$, to obtain a larger phase shift $\theta_{ss}$
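
A minimal sketch of how the jitter tracking bandwidth moves with de-skew phase and injection strength, using $\omega_P = \frac{K}{A}\cos\theta_{ss}$ with $A = 2Q/\omega_0$ for an RLC tank. The oscillator values (5GHz, Q = 5) and function names are assumptions picked for illustration.

```python
import math

def ilo_jtb(k, q, f0, theta_ss_deg):
    """Jitter tracking bandwidth, same units as f0: (K/A)*cos(theta_ss), A = 2Q/f0."""
    a = 2 * q / f0
    return (k / a) * math.cos(math.radians(theta_ss_deg))

def deskew_offset(k, q, f0, theta_ss_deg):
    """Frequency offset that gives de-skew phase theta_ss: (K/A)*sin(theta_ss)."""
    a = 2 * q / f0
    return (k / a) * math.sin(math.radians(theta_ss_deg))

f0, q = 5e9, 5  # hypothetical oscillator: 5GHz free-running frequency, Q = 5
for k in (0.1, 0.2):
    for theta in (0, 30, 60):
        print(f"K = {k}, theta_ss = {theta:2d} deg -> "
              f"JTB = {ilo_jtb(k, q, f0, theta) / 1e6:.1f} MHz, "
              f"offset = {deskew_offset(k, q, f0, theta) / 1e6:.1f} MHz")
```

The bandwidth shrinks with $\cos\theta_{ss}$ as the de-skew phase grows and scales linearly with the injection strength $K$.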



- ILO jitter transfer bandwidth increases with injection strength




# **ILO Phase Noise Filtering**

 $S_{out} = \left| JTF_{input} \right|^2 S_{inj} + \left| JTF_{VCO} \right|^2 S_{VCO}$ 

$$S_{out}(\omega_{jitter}) = \frac{\omega_P^2 S_{inj} + \omega^2 S_{VCO}}{\omega_P^2 + \omega^2} = \frac{\left(\frac{K}{A}\right)^2 \cos^2 \theta_{ss} S_{inj} + \omega^2 S_{VCO}}{\left(\frac{K}{A}\right)^2 \cos^2 \theta_{ss} + \omega^2}$$



- Up to jitter tracking bandwidth, ILO output phase noise is dominated by injection clock
  - Can be better than VCO
  - JTB depends on desired de-skew phase
- At high frequencies, VCO phase noise dominates
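
The two regimes can be seen by evaluating $S_{out}$ at offsets well below and well above $\omega_P$. This sketch treats $S_{inj}$ and $S_{VCO}$ as fixed numbers at each offset, which is a simplifying assumption; the noise levels and the 100MHz tracking bandwidth are hypothetical.

```python
import math

def ilo_output_noise(f, f_p, s_inj, s_vco):
    """S_out at offset f: low-passed injection noise plus high-passed VCO noise."""
    return (f_p**2 * s_inj + f**2 * s_vco) / (f_p**2 + f**2)

def to_dbc(s):
    """Convert a linear phase noise value to dBc/Hz."""
    return 10 * math.log10(s)

f_p = 100e6    # hypothetical jitter tracking bandwidth
s_inj = 1e-13  # hypothetical injection clock noise (-130 dBc/Hz)
s_vco = 1e-11  # hypothetical free-running VCO noise (-110 dBc/Hz)

for f in (1e6, 100e6, 10e9):
    s_out = ilo_output_noise(f, f_p, s_inj, s_vco)
    print(f"offset = {f / 1e6:8.0f} MHz -> S_out = {to_dbc(s_out):.1f} dBc/Hz")
```

In a real oscillator both spectra vary with offset frequency, so measured or simulated profiles would replace the constants.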

#### Ring Oscillator Super-Harmonic ILO Example

- Potential system application
  - 1/2 rate TX forwards clock to 1/4 rate RX



## Super-Harmonic ILO Phase Noise Filtering



- Low-frequency phase noise is actually better than that of the injection oscillator by m<sup>2</sup>

$$S_{out}(\omega_{jitter}) = \frac{m^2 \left(\frac{K}{A}\right)^2 \cos^2 m \theta_{ss} S_{inj} + \omega^2 S_{VCO}}{\left(\frac{K}{A}\right)^2 \cos^2 m \theta_{ss} + \omega^2}$$



### Ring Oscillator Sub-Harmonic ILO Example w/ Clock Signal Injection

- Forwarding a lower speed clock avoids jitter amplification over low-pass channel
- Sub-Harmonic injection with clock signal can cause significant ILO amplitude variations and sub-harmonic spurious tones



[Hossain ISSCC 2010]

### Ring Oscillator Sub-Harmonic ILO Example w/ Pulse Train Signal Injection

- Forwarding a pulse train signal reduces amplitude variations and ILO spurious tones
- Adjusting pulse width, d, changes effective injection strength and can be used to adjust jitter tracking bandwidth



[Hossain ISSCC 2010]

## Effective Injection Strength of Pulse Train



### Adjusting Jitter Tracking Bandwidth w/ Pulse Train Signal



[Hossain ISSCC 2010]

- Wider pulse separation (lower frequency sub-harmonics) reduces effective injection strength and results in lower jitter tracking bandwidth
- Reducing pulse width, d, for a given spacing reduces effective injection strength and results in lower jitter tracking bandwidth
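
One way to build intuition for both trends is to look at the Fourier component of the pulse train at the oscillator frequency. Treating that component as a proxy for effective injection strength is an assumption made here for illustration, not the model from the cited paper. The pulses are taken as ideal rectangles of width d repeating every N oscillator periods.

```python
import math

def pulse_train_harmonic(n, d, t_osc, amplitude=1.0):
    """Amplitude of the n-th harmonic of a rectangular pulse train.

    Pulses of width d repeat every n * t_osc, so harmonic n falls on the
    oscillator frequency 1 / t_osc.
    """
    period = n * t_osc
    return (2 * amplitude / (n * math.pi)) * abs(math.sin(math.pi * n * d / period))

t_osc = 200e-12  # hypothetical 5GHz oscillator
for n in (1, 2, 4):
    for d in (25e-12, 50e-12):
        h = pulse_train_harmonic(n, d, t_osc)
        print(f"N = {n}, d = {d * 1e12:.0f} ps -> component at f_osc = {h:.4f}")
```

Under this proxy the component falls as 1/N with wider pulse separation, and for widths below half an oscillator period it falls as the pulse narrows.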

# Phase Drifts with ILO-Based Clocking



- Voltage and temperature variations can change the free-running frequencies of the TX/RX ILOs, so the phase relationship can drift with time

#### Low-Overhead CDR w/ILO-Based De-Skew



- Introducing a low-overhead CDR into a forwarded-clock system allows tracking of low-frequency phase drifts, while maintaining correlated jitter tracking

# Multi-Phase Errors at Low VDD



# Edge-Rotating 5/4X Sub-Rate CDR



H. Li, S. Chen, L. Yang, R. Bai, W. Hu, F. Zhong, S. Palermo, and P. Chiang, "A 0.8V, 560fJ/bit, 14Gb/s Injection-Locked Receiver with Input Duty-Cycle Distortion Tolerable Edge-Rotating 5/4X Sub-Rate CDR in 65nm CMOS," VLSI Symp., June 2014.

# 14Gb/s GP 65nm CMOS Prototype



|                         | [2]        | [3]         | [4]        | This Work  |
|-------------------------|------------|-------------|------------|------------|
| V<sub>DD</sub>, Process | 1.0V/65nm  | 1.25V/90nm  | 1.08V/32nm | 0.8V/65nm  |
| Data Rate               | 6.4Gb/s    | 8Gb/s       | 16Gb/s     | 14Gb/s     |
| Clock Rate              | 3.2GHz     | 2GHz        | 4GHz       | 3.5GHz     |
| Clock Arch.             | FC         | Embedded    | FC         | FC         |
| Multi-phase Gen.        | PLL/PI     | DLL         | ILO/PI     | ILO/PI     |
| **RX FOM**              | 3.9pJ/bit* | 1.59pJ/bit* | 1.02pJ/bit | 0.56pJ/bit |

\*(Excludes PLL)

#### **Tracking Non-Uniform Eyes**



#### **Correlated Jitter Tolerance**



#### **Uncorrelated Jitter Tolerance**

[Figure: jitter tolerance plot versus **Jitter Frequency (MHz)** with curves for 14Gbps w/ CDR, 14Gbps w/o CDR, 12Gbps w/ CDR, and 12Gbps w/o CDR]

H. Li, S. Chen, L. Yang, R. Bai, W. Hu, F. Zhong, S. Palermo, and P. Chiang, "A 0.8V, 560fJ/bit, 14Gb/s Injection-Locked Receiver with Input Duty-Cycle Distortion Tolerable Edge-Rotating 5/4X Sub-Rate CDR in 65nm CMOS," VLSI Symp., June 2014.

### Optimum Jitter Tracking for 200MHz jitter

- Limit the JTB by attenuating the clock jitter using amplitude response of low pass function with pole frequency = JTB
- $J_C = J'_P \sin\left(2\pi f_j UI \cdot n + 2m\pi f_j UI\right)$
- $J'_P = J_P \left| \frac{1}{1 + \frac{jf_j}{JTB}} \right|$

In a 10Gb/s system, UI = 100ps

Objective: Implement the optimal JTB that yields minimum differential jitter

A controllable JTB over 70 to 800MHz is desired

[Figure panel: Jitter Frequency = 200MHz]
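
The optimal JTB can be located with a brute-force sweep. This is a sketch under the slide's stated model, where only the magnitude of the low-pass response scales the clock jitter and its phase shift is ignored; the function names and the search grid are illustrative choices.

```python
import math

def diff_jitter(jtb, f_j, skew_ui, ui=100e-12, j_p=1.0):
    """Peak differential jitter when the clock jitter amplitude is low-pass scaled."""
    # J'_P = J_P * |1 / (1 + j f_j / JTB)|
    j_p_clk = j_p / math.sqrt(1 + (f_j / jtb) ** 2)
    # Skew-induced phase of the clock jitter: 2*pi*m*f_j*UI
    phi = 2 * math.pi * skew_ui * f_j * ui
    # |J_P - J'_P * exp(j*phi)|: peak of the difference of two sinusoids
    return math.sqrt(j_p**2 + j_p_clk**2 - 2 * j_p * j_p_clk * math.cos(phi))

def best_jtb(f_j, skew_ui, lo=10e6, hi=2e9, steps=20000):
    """Search a log-spaced JTB grid for the minimum differential jitter."""
    grid = [lo * (hi / lo) ** (i / steps) for i in range(steps + 1)]
    return min(grid, key=lambda jtb: diff_jitter(jtb, f_j, skew_ui))

for skew in (2, 5, 10):
    jtb = best_jtb(200e6, skew)
    print(f"skew = {skew:2d} UI -> best JTB ~ {jtb / 1e6:.0f} MHz, "
          f"peak J_diff = {diff_jitter(jtb, 200e6, skew):.3f} x J_P")
```

Under this model the minimum sits at JTB = $f_j / \tan(2\pi m f_j UI)$. For 200MHz jitter that is roughly 780MHz at 2UI skew, 275MHz at 5UI, and 65MHz at 10UI, which lines up with the 70 to 800MHz range quoted above.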



#### Understanding of Jitter Reduction using Bandpass Filtering

- Time-domain jittery clock expression: $c(t) = A\cos(2\pi f_c t + \beta \sin 2\pi f_m t)$
  - $\beta$: phase noise amplitude; $f_m$: jitter frequency
- Frequency-domain jittery clock expression, for small $\beta$ and with $f_L = f_c - f_m$, $f_H = f_c + f_m$:

$$C(f) \approx \frac{A}{2} \delta(f - f_c) - \frac{\beta A}{4} [\delta(f - f_L) - \delta(f - f_H)]$$



#### Understanding of Jitter Reduction using Bandpass Filtering

- The spectrum of the received clock after filtering

$$S(f) = \frac{\alpha_c A}{2} \delta(f - f_c) - \frac{\beta A}{4} [\alpha_L \delta(f - f_L) - \alpha_H \delta(f - f_H)]$$

$\alpha_c$, $\alpha_L$ and $\alpha_H$ are the gains of the bandpass function at $f_c$, $f_L$ and $f_H$

Received clock expression in time domain: $s(t) = A_r \cos(2\pi f_c t + \beta_r \sin 2\pi f_m t)$

- Phase noise amplitude of the received clock

$$\beta_r \approx \left(\frac{\alpha_L + \alpha_H}{2\alpha_c}\right)\beta$$

- For typical bandpass filtering, $\alpha_c \ge 1$ and $\alpha_L = \alpha_H < \alpha_c$. Thus, $\beta_r < \beta$ and the jitter of the transmitted clock is reduced by bandpass filtering

#### Understanding of Jitter Reduction using Bandpass Filtering

- When the bandpass function is symmetric and centered at f<sub>c</sub>, the transfer function can be expressed as a low-pass function of the frequency offset from f<sub>c</sub>:

$$H(j2\pi(f_c - f)) = H(j2\pi(f_c + f)) = \frac{|H(j2\pi f_c)|}{1 + \frac{jf}{f_p}}$$

$f$: the offset frequency from $f_c$; $BW_{3dB}$: bandwidth of the bandpass filter; $f_p = (1/2)BW_{3dB}$

- JTF of bandpass filtering:

$$JTF_{BP}(j2\pi f) = \frac{\beta_r}{\beta}(j2\pi f) = \frac{H(j2\pi(f_c - f)) + H(j2\pi(f_c + f))}{2H(j2\pi f_c)} = \frac{1}{1 + \frac{jf}{f_p}}$$
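
The first-order approximation can be compared against the sideband gains of an actual bandpass response. This sketch assumes a second-order RLC-type bandpass filter with $BW_{3dB} = f_c/Q$; the example values (5GHz clock, Q = 10, 200MHz jitter) are hypothetical.

```python
import math

def bpf_gain(f, f_c, q):
    """|H(f)| of a second-order bandpass filter normalized to unity gain at f_c."""
    return 1 / math.sqrt(1 + (q * (f / f_c - f_c / f)) ** 2)

def jitter_ratio_sidebands(f_m, f_c, q):
    """beta_r / beta = (alpha_L + alpha_H) / (2 * alpha_c) from the sideband gains."""
    alpha_c = bpf_gain(f_c, f_c, q)
    alpha_l = bpf_gain(f_c - f_m, f_c, q)
    alpha_h = bpf_gain(f_c + f_m, f_c, q)
    return (alpha_l + alpha_h) / (2 * alpha_c)

def jitter_ratio_first_order(f_m, f_c, q):
    """|JTF_BP| = |1 / (1 + j f_m / f_p)| with f_p = BW_3dB / 2 = f_c / (2Q)."""
    f_p = f_c / (2 * q)
    return 1 / math.sqrt(1 + (f_m / f_p) ** 2)

f_c, q, f_m = 5e9, 10, 200e6
print(f"sideband gains: {jitter_ratio_sidebands(f_m, f_c, q):.4f}")
print(f"first-order:    {jitter_ratio_first_order(f_m, f_c, q):.4f}")
```

For jitter frequencies that are small compared with $f_c$, the two estimates stay close; the real filter is not perfectly symmetric about $f_c$, so they drift apart at large offsets.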

# Analysis of Bandpass Jitter Filtering Based on JIR and JTF



- Transmitted jitter exhibits low-pass transfer characteristic through band-pass channel
- Received high frequency uncorrelated jitter can be reduced by a bandpass filter

#### Optimum Jitter Tracking with Bandpass Filtering

A higher bandpass filter Q gives a smaller bandwidth and stronger jitter filtering

$$JTF_{BP}(j2\pi f) = \frac{1}{1 + \frac{jf}{f_p}}$$

$$f_p = (1/2)BW_{3dB}$$

[Figure: bandpass responses for $Q_1 < Q_2 < Q_3$, with $f_L$, $f_c$, and $f_H$ marked on the frequency axis]

#### Optimum Jitter Tracking with Bandpass Filtering

- Apply JIR and JTF analysis to quantify the impact of Q on the JTB of a 5GHz clock, UI = 100ps



- A Q tuning range of 3 to 30 provides a JTB range of 97 to 790MHz
- To achieve a JTB of 70MHz, which optimizes jitter tracking with 10UI clock skew, a higher Q is required
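
A quick estimate of the Q needed for a target JTB follows from $f_p = (1/2)BW_{3dB}$. This sketch assumes a second-order bandpass filter with $BW_{3dB} = f_c/Q$; the function names are illustrative. The slide's 97 to 790MHz range comes from the JIR/JTF analysis, so the first-order numbers differ somewhat.

```python
def bpf_jtb_estimate(f_c, q):
    """First-order JTB estimate f_p = BW_3dB / 2, assuming BW_3dB = f_c / Q."""
    return f_c / (2 * q)

def q_for_jtb(f_c, jtb):
    """Q needed for a target JTB under the same first-order estimate."""
    return f_c / (2 * jtb)

f_c = 5e9  # 5GHz forwarded clock
for q in (3, 30):
    print(f"Q = {q:2d} -> JTB estimate = {bpf_jtb_estimate(f_c, q) / 1e6:.0f} MHz")
print(f"Q for 70MHz JTB = {q_for_jtb(f_c, 70e6):.1f}")
```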

# **Bandpass Filter for Forwarded-Clocks**



# **Bandpass Filter Jitter Filtering**







Fig. 20 Simulated impact of the proposed bandpass filter circuit on random jitter





- Bandpass filter is effective in filtering high-frequency jitter
- The low maximum Q of the filter (Q=2.6) limits how low the jitter tracking bandwidth can be tuned
  - Limited by the passive inductor Q

# Next Time

- Clock Distribution Techniques