## ECEN720: High-Speed Links Circuits and Systems Spring 2025

### Lecture 15: Die-to-Die Transceivers



Sam Palermo Analog & Mixed-Signal Group Texas A&M University

## Announcements

- Project Final Report due Apr 29

- Exam 2 May 2
  - 1PM-3PM for in-person sections
  - Focuses on material from Lectures 7-15
  - Previous years' Exam 2s are posted on the website for reference

## Outline

- Die-to-Die Transceiver Motivation
- Packaging Options
- Standard Package Links
- Advanced Package Links
- 3D Package Links
- Conclusion

# **Chiplet Evolution**

### [Naffziger ISSCC 2020]



- Architectures are evolving from a massive single monolithic die in the most advanced process to smaller specialized chiplets
- Allows use of the most advanced technology only where it is needed most (xPUs)

## Die-to-Die Interconnects

- Yield concerns make it more expensive to build large chips in advanced CMOS nodes
- Domain-specific accelerators offer significant performance benefits
- Motivates chiplet system-in-package architectures
- Dense energy-efficient die-to-die interconnects are required
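The yield argument can be made concrete with a rough sketch. It assumes a simple Poisson defect model, a hypothetical defect density `D0`, dies tested before assembly, and no packaging-yield or die-to-die interface overhead. None of these numbers come from the lecture.

```python
import math

def poisson_yield(area_mm2, defects_per_cm2):
    """Fraction of good dies under a simple Poisson defect model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defects_per_cm2 * area_cm2)

def silicon_per_good_system(die_area_mm2, dies_per_system, defects_per_cm2):
    """Silicon area fabricated per working system.

    Assumes known-good-die testing before assembly and ignores
    packaging yield and die-to-die interface area overhead.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_system * die_area_mm2 / good_fraction

D0 = 0.1  # defects/cm^2, hypothetical value for illustration only

monolithic = silicon_per_good_system(800.0, 1, D0)  # one 800mm^2 die
chiplets = silicon_per_good_system(200.0, 4, D0)    # four 200mm^2 chiplets

print(f"monolithic: {monolithic:.0f} mm^2 of silicon per good system")
print(f"chiplets:   {chiplets:.0f} mm^2 of silicon per good system")
```

Under these assumptions the chiplet partition needs less fabricated silicon per working system. The gap widens as die area or defect density grows.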





# Die-to-Die Interconnect Signaling Challenges

- Single-ended PAM2 transceivers are the most common choice for improving bandwidth density, but they impose circuit and signal-integrity challenges
- Simultaneous bidirectional (SBD) transceivers can double the bandwidth density, but require efficient front-ends with hybrid structures that separate inbound and outbound signals
- Simultaneous switching noise degrades link performance
- Scaled bump and interposer trace pitch increases crosstalk
- Low-power clocking architectures required to meet aggressive power efficiency targets






# 2D, 2.5D, and 3D Packaging

### [Das Sharma Nature Elec 2024]



- Packaging technology is evolving to support higher-density and finer-pitch interconnects
- 3D packaging with hybrid bonding offers the ultimate in areal density

# Standard 2D Packaging



- Chiplets connected via standard package substrate
- Typical 80um diameter bumps with 110-130um pitch
- ~25um interconnect pitch
- Distances can be >50mm and loss >10dB
- Cost-effective, but limited bandwidth density

# 2.5D with Silicon Interposer

### [Das Sharma Nature Elec 2024]



### [Pantano ISSCC 2025]

Si Interposer



- Large silicon substrate used to interconnect chiplets
- Silicon BEOL metal routing technology with up to 5 metal layers and ~0.4um line width/spacing
- Able to achieve large substrates with reticle stitching
  - Future designs are projected at >120mm X 120mm

# 2.5D with Silicon Bridge

- Large silicon interposer can be expensive
- Silicon bridge technology allows the use of high-density Si only where you need it

#### [Das Sharma Nature Elec 2024]





### [Pantano ISSCC 2025]

# 2.5D with Fanout RDL Interposer

#### https://ase.aseglobal.com/focos/ [Pantano ISSCC 2025]

Figure: 1µm line/space dual-damascene M1/V1/M2 Cu RDL cross-section (layers D1-D3, M1, V1, M2 on substrate)

FOCoS-CL (Chip Last)

FOCoS-Bridge

- RDL-based interposers offer a lower-cost option without reticle size constraint
- Possible to also insert silicon bridges
- Advanced processing techniques are being explored to enable 0.5um line/space

# **3D** Packaging

- Microbumps are currently offered at ~50um pitch, with scaling possible to ~5um
- Higher density (<1um pitch) is possible with hybrid bonding
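To see why pitch scaling matters so much, the sketch below counts connections per mm² at the pitches quoted above. It assumes a square grid (a staggered or hexagonal layout would pack somewhat more). The function name is hypothetical.

```python
def connections_per_mm2(pitch_um):
    """Connections per mm^2 for a square grid at the given pitch (assumed layout)."""
    per_mm = 1000.0 / pitch_um
    return per_mm ** 2

for label, pitch_um in [
    ("microbump, ~50um pitch", 50.0),
    ("microbump, ~5um pitch", 5.0),
    ("hybrid bonding, 1um pitch", 1.0),
]:
    print(f"{label}: {connections_per_mm2(pitch_um):,.0f} per mm^2")
```

Density scales with the inverse square of the pitch, so a 10X pitch reduction gives 100X more connections per unit area.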



Source: TechInsights



#### [Wu ISSCC 2025]

Figure: hybrid-bonded stack with an oxide-to-oxide bond; layer labels are silicon, oxide, and metal




## Die-to-Die Transceiver Standards

|           | UCIe                | BoW                 | OpenHBI             | XSR                  |
|-----------|---------------------|---------------------|---------------------|----------------------|
| Data Rate | 8/16/32Gb/s         | 8/16/32Gb/s         | 8/16Gb/s            | 112/224Gb/s          |
| Signaling | Single-Ended<br>NRZ | Single-Ended<br>NRZ | Single-Ended<br>NRZ | Differential<br>PAM4 |
| Channel   | 2D/2.5D             | 2D/2.5D             | 2D/2.5D             | 2D                   |
| Clocking  | Clock<br>Forwarding | Clock<br>Forwarding | Clock<br>Forwarding | Recovered<br>Clock   |
| Reach     | 2mm-25mm            | 5mm-50mm            | 4mm                 | Up to 50mm           |
| Loss      | 3dB                 | 4dB                 | 3dB                 | 10dB                 |


# 113Gb/s PAM4 XSR XCVR (Std. Package)

[Gangasani ISSCC 2022]

- 8-port I/O core
- 1/4-rate clocks from global PLL
- TX
  - Tailless 4:1 mux and CML output driver with  $550mV_{ppd}$  swing
  - 4-tap analog FFE with <1mV/LSB for -1-to-2 ISI terms
  - 2 independent roaming taps between 3-22 ISI terms
- RX
  - Gm-TIA CTLE provides 5dB peaking with 0.2dB steps for the majority of ISI cancellation
  - 5 samplers per 1/4-rate segment (3 data, 1 edge, 1 error)
  - PI-based BB-CDR
- RX-driven dynamic adaptation of TX & RX equalization settings with token-based backchannel
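The TX FFE described above, with taps covering cursor positions -1 to 2, can be sketched behaviorally as a 4-tap FIR filter on PAM4 symbols. The Gray mapping and tap weights below are hypothetical, and the roaming taps and analog implementation details are not modeled.

```python
# Gray-coded PAM4 mapping (an assumed, common convention): bit pair -> level
PAM4_LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}

def pam4_symbols(bits):
    """Group bits in pairs (first bit is the MSB) and map each pair to a PAM4 level."""
    return [PAM4_LEVELS[pair] for pair in zip(bits[0::2], bits[1::2])]

def tx_ffe(symbols, taps, main_index=1):
    """FIR equalizer: tap k weights the symbol (k - main_index) UIs in the past.

    With main_index=1 the taps are [pre-cursor -1, main 0, post 1, post 2].
    Symbols outside the sequence are treated as zero.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for k, tap in enumerate(taps):
            idx = n - (k - main_index)
            if 0 <= idx < len(symbols):
                acc += tap * symbols[idx]
        out.append(acc)
    return out

TAPS = [-0.05, 0.80, -0.10, -0.05]  # hypothetical weights, sum of |taps| = 1

bits = [0, 0, 0, 1, 1, 1, 1, 0]
symbols = pam4_symbols(bits)
print(symbols)
print(tx_ffe(symbols, TAPS))
```

Keeping the sum of tap magnitudes at 1 models a fixed peak output swing: equalization trades main-cursor amplitude for reduced ISI.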






### Samsung 48Gb/s/wire Die-to-Die Link in 4nm (Std. Package)

**4 Die-to-Die Slices** 





#### RX w/ 1-tap DFE



- Single-ended PAM2 links
- Differential forwarded-clock shared among 10 lanes
- TX 4:1 mux employs source-follower-based feedback equalizer
- TX low-swing NMOS main driver and parallel capacitive equalizer
- RX utilizes 1-tap DFE implemented in slicer second stage
- 10mm package substrate with -3dB loss
- 0.67pJ/b and 1.85Tb/s/mm edge bandwidth density
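Multiplying the two headline figures gives the link power dissipated per mm of die edge. This is a back-of-the-envelope sketch that assumes both numbers apply at the same operating point. The function name is hypothetical.

```python
def edge_power_w_per_mm(energy_pj_per_bit, edge_density_tbps_per_mm):
    """Link power per mm of die edge.

    pJ/b times Tb/s/mm gives W/mm directly, since 1e-12 * 1e12 = 1.
    """
    return energy_pj_per_bit * edge_density_tbps_per_mm

power = edge_power_w_per_mm(0.67, 1.85)
print(f"{power:.2f} W/mm")
```

The same function applies to any link quoted with an energy efficiency and an edge bandwidth density.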



### [Seong ISSCC 2024]


# 32Gb/s/wire Die-to-Die Link (Adv. Package)



- Single-ended PAM2 links
- 1/2-rate forwarded-clock shared among 39 lanes
- Nominal deskew set with global deskew block in RX DQS channel
- RX clock path delay provides per-channel deskew

### [Seong ISSCC 2023]





# 32Gb/s/wire Die-to-Die Link (Adv. Package)

### **Reflection-Cancellation Driver**





- Reflection-cancellation driver allows for unterminated RX
- Receiver utilizes 1-tap DFE embedded in 2-stage latch
- Implementation uses 50um bump pitch
- 3mm silicon interposer channel has -3.9dB loss and -29.3dB crosstalk at 16GHz
- 0.44pJ/b
- 8Tb/s/mm edge density
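A behavioral sketch of what a 1-tap DFE does: subtract the first post-cursor ISI, weighted by the previous decision, before slicing. The channel coefficients are hypothetical and chosen so that a plain slicer fails. The latch-embedded circuit implementation is not modeled.

```python
def channel(bits, h0, h1):
    """Toy channel with a main cursor h0 and one post-cursor h1 (assumed values)."""
    out, prev = [], 0
    for b in bits:
        out.append(h0 * b + h1 * prev)
        prev = b
    return out

def plain_slicer(samples):
    """Decide each bit from the sign of the sample, with no equalization."""
    return [1 if y >= 0 else -1 for y in samples]

def dfe_1tap(samples, h1):
    """Slice NRZ samples after removing post-cursor ISI from the previous decision."""
    decisions, prev = [], 0  # no prior decision before the first bit
    for y in samples:
        corrected = y - h1 * prev
        d = 1 if corrected >= 0 else -1
        decisions.append(d)
        prev = d
    return decisions

bits = [1, -1, -1, 1, 1, -1, 1]
rx = channel(bits, h0=0.4, h1=0.5)  # post-cursor larger than main: closed eye

print(plain_slicer(rx) == bits)   # plain slicer makes errors on transitions
print(dfe_1tap(rx, h1=0.5) == bits)
```

The feedback loop must resolve within one UI. Embedding the tap in the latch, as noted above, is one way to address that timing constraint.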

### RX 1-tap DFE





|                                    | [1] VLSI19       | [2] ISSCC21 | [3] ISSCC22   | [4] VLSI21     | [5] VLSI22         | This work      |
|------------------------------------|------------------|-------------|---------------|----------------|--------------------|----------------|
| Technology                         | 7nm              | 7nm         | 5nm           | 7nm            | 5nm                | 4nm            |
| Channel length (mm)                | 0.5 (Interposer) | 20 (MCM)    | 5-to-80 (MCM) | 1 (Interposer) | 1.2 (Interposer**) | 3 (Interposer) |
| Bump pitch (um)                    | 40               | 130         | -             | 40             | 55                 | 50             |
| Data rate (Gbps/pin)               | 8                | 40          | 113           | 20             | 50.4               | 32             |
| Bandwidth of beach front (Tbps/mm) | 0.625            | 0.45        | 0.46          | 5.31           | 2.68               | 8              |
| Power efficiency (pJ/bit)          | 0.56             | 1.17        | 1.55          | 0.46           | 0.297              | 0.44           |
| FoM ((Tbps/mm)/(pJ/bit))           | 1.11             | 0.38        | 0.296         | 11.5           | 9                  | 18.2           |

** An on-chip channel that simulates the characteristics of the interposer was used.
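The FoM row is just beachfront bandwidth divided by energy per bit. The sketch below recomputes it from the table entries. The dictionary layout and function name are illustrative.

```python
# (beachfront bandwidth in Tbps/mm, power efficiency in pJ/bit), from the table
LINKS = {
    "[1] VLSI19": (0.625, 0.56),
    "[2] ISSCC21": (0.45, 1.17),
    "[3] ISSCC22": (0.46, 1.55),
    "[4] VLSI21": (5.31, 0.46),
    "[5] VLSI22": (2.68, 0.297),
    "This work": (8.0, 0.44),
}

def fom(tbps_per_mm, pj_per_bit):
    """Figure of merit: edge bandwidth density per unit energy per bit."""
    return tbps_per_mm / pj_per_bit

for name, (bw, energy) in LINKS.items():
    print(f"{name}: FoM = {fom(bw, energy):.3g}")
```

The recomputed values agree with the table to within rounding or truncation in the last digit.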

[Seong ISSCC 2023]


### Marvell 32Gb/s/wire Die-to-Die Link in 3nm (Adv. Package)



- Single-ended PAM2 links
- Differential forwarded-clock shared among 18 lanes
- TX SST drivers
- RX clock channel distributes data (min delay), edge (0.5UI delay), and EOM (0.5-1.5UI delay) clocks
- Per-lane CDR sets delay codes for RX data paths
- 1mm & 2mm 2.5D CoWoS package channels
  - -2.4dB loss and -18.1dB crosstalk
- 0.36pJ/bit and 3.84Tb/s/mm edge bandwidth density



Figure labels: PreDRV (pre-driver), DRV (driver)









# Correlated NRZ (5b6w)





#### **Multi-Input Combiner**



- 5 bits over 6 wires
  - Maintains common-mode and crosstalk noise resilience
  - Has same ISI ratio=1 as NRZ
  - Sensitive to skew between wires
- Nyquist frequency reduced to 3/5X that of differential NRZ carrying the same data over the same six wires
- 16nm implementation achieved 20.8Gb/s/wire over a 6dB channel at 1pJ/b
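The idea of sending 5 bits over 6 wires and recovering each bit with a multi-input combiner can be sketched as follows. Each bit modulates one of five mutually orthogonal, zero-sum wire patterns, and each combiner takes the matching weighted sum of the wires. The particular matrix `ROWS` is an illustrative assumption. The published code may use a different matrix and different wire levels.

```python
from fractions import Fraction as F
from itertools import product

# Illustrative (assumed) sub-channel patterns: mutually orthogonal, each sums to zero.
ROWS = [
    [F(1), F(-1), F(0), F(0), F(0), F(0)],
    [F(1, 2), F(1, 2), F(-1), F(0), F(0), F(0)],
    [F(0), F(0), F(0), F(1), F(-1), F(0)],
    [F(0), F(0), F(0), F(1, 2), F(1, 2), F(-1)],
    [F(1, 3), F(1, 3), F(1, 3), F(-1, 3), F(-1, 3), F(-1, 3)],
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def encode(bits):
    """Map 5 bits (0/1) to 6 wire levels by superposing the signed patterns."""
    wires = [F(0)] * 6
    for bit, row in zip(bits, ROWS):
        sign = 1 if bit else -1
        scale = F(sign) / dot(row, row)  # normalize so each combiner sees +/-1
        wires = [w + scale * r for w, r in zip(wires, row)]
    return wires

def decode(wires):
    """Multi-input combiners: one weighted sum per bit, then a sign decision."""
    return [1 if dot(row, wires) > 0 else 0 for row in ROWS]

common_mode = F(7, 10)  # arbitrary offset added to every wire
for bits in product([0, 1], repeat=5):
    wires = encode(bits)
    assert sum(wires) == 0                       # no net common-mode content
    assert decode(wires) == list(bits)           # every bit recovered
    assert decode([w + common_mode for w in wires]) == list(bits)
print("all 32 codewords decode correctly, with and without common-mode offset")
```

Because every pattern sums to zero, a common-mode offset on all six wires drops out of each combiner. Because the patterns are orthogonal, each combiner output is binary, which is the ISI ratio of 1 property noted above. Skew between wires, which the slide flags as a weakness, is not modeled here.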

## Simultaneous Bidirectional Signaling



- Requires efficient in/outbound signal separation
- CTLE compensates for channel loss, but doesn't help echoes
- Echo cancellation is necessary

# 32Gb/s PAM2 Simultaneous Bidirectional XCVR



- 32Gb/s SBD source-synchronous NRZ transceiver supporting channel loss of ~10dB
- VM TX driver combined with R-gm hybrid for signal separation
- Echo cancellation adaptation
- Achieved 1.83pJ/b in 28nm CMOS

### 50.4Gb/s/wire Simultaneous Bidirectional Die-to-Die Link

[Nishi VLSI 2022]



- Inverter-based voltage-mode driver and replica driver with complementary data
- Replica driver and pad signals connected to analog voltage adder (TIA)
- Resistor values set to cancel outbound signal and only amplify inbound signal
- 2 uni-directional forwarded clock channels shared among 14 (or 18) lanes
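The replica-based cancellation can be illustrated with an idealized model. It assumes perfectly matched driver, channel, and far-end impedances, so the pad sees half of each direction's signal. It also assumes an ideal inverting summing amplifier and signals referenced to their common mode. The resistor values are hypothetical and are not taken from the paper.

```python
R_PAD = 1000.0      # ohms, adder input resistor from the pad (hypothetical)
R_REPLICA = 2000.0  # ohms, adder input resistor from the replica driver (hypothetical)
R_FB = 4000.0       # ohms, adder feedback resistor (hypothetical)

def pad_voltage(v_out, v_in):
    """Pad signal under ideal matching: superposition of half of each direction."""
    return 0.5 * v_out + 0.5 * v_in

def adder_output(v_pad, v_replica):
    """Ideal inverting summing amplifier with two resistor-weighted inputs."""
    return -R_FB * (v_pad / R_PAD + v_replica / R_REPLICA)

def received(v_out, v_in):
    """Replica driver sends complementary data, so it contributes -v_out."""
    return adder_output(pad_voltage(v_out, v_in), -v_out)

for v_out in (-0.2, 0.2):
    for v_in in (-0.2, 0.2):
        print(f"outbound {v_out:+.1f} V, inbound {v_in:+.1f} V -> adder {received(v_out, v_in):+.2f} V")
```

With `R_REPLICA = 2 * R_PAD` the outbound term cancels and the adder output depends only on the inbound signal. Any mismatch or channel reflection leaves a residual echo, which is why echo cancellation is also needed.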



### 50.4Gb/s/wire Simultaneous Bidirectional Die-to-Die Link

[Nishi VLSI 2022]



- Current implementation uses 55um bump pitch that limits density
- Planned interposer has 12.14um signal-to-signal pitch across 4 routing layers
- 12.6GHz IL=-4.0dB, FEXT=-49.1dB, NEXT=-40.7dB
- Projected 18 DQ lanes and 4-rank
  - 0.281pJ/b
  - 5.73Tb/s/mm<sup>2</sup> areal density
  - 11.0Tb/s/mm edge density








|                                       | Our work   | Y-Y Hsu<br>VLSI21 | M-S Lin<br>VLSI19 | B. Dehlaghi<br>JSSC16 |
|---------------------------------------|------------|-------------------|-------------------|-----------------------|
| Technology                            | 5nm        | 7nm               | 7nm               | 28nm                  |
| µbump pitch                           | 55µm*      | 40µm 🗸            | 40µm 🗸            | 100µm                 |
| Interposer Channel                    | 1.2mm**    | 1.0mm             | 0.5mm             | 2.5mm 🗸               |
| Supply [V]                            | 0.75 🗸     | 0.8               | 0.8, 0.3          | NA                    |
| Data Rate/wire [Gb/s]                 | 50.4 (SBD) | 20 (NRZ)          | 8 (NRZ)           | 20 (NRZ)              |
| Energy Efficiency [pJ/bit]            | 0.281 🗸    | 0.46              | 0.56              | 0.3***                |
| Areal Density [Tb/s/mm<sup>2</sup>]   | 5.73 🗸     | 2.25              | 0.8               | NA                    |
| Edge Density [Tb/s/mm]                | 11.0 🗸     | 5.31              | 0.67              | NA                    |


## UCIe-3D

| Characteristics / KPIs             | UCIe-S (2D)              | UCIe-A (2.5D) | UCIe-3D                                | Comments for UCIe-3D                                                                                                          |
|------------------------------------|--------------------------|---------------|----------------------------------------|-------------------------------------------------------------------------------------------------------------------------------|
| **Characteristics**                |                          |               |                                        |                                                                                                                               |
| Data Rate (GT/s)                   | 4, 8, 12, 16, 24, 32     |               | Up to 4                                | Equal to SoC logic frequency; power efficiency is critical                                                                    |
| Width (each cluster)               | 16                       | 64            | 80                                     | Options of reduced width to 70, 60                                                                                            |
| Bump Pitch (µm)                    | 100 - 130                | 25 - 55       | ≤10 (optimized); >10 - 25 (functional) | Must scale so that UCIe-3D fits within the bump area, must support hybrid bonding                                             |
| Channel Reach (mm)                 | ≤25                      | ≤2            | 3D vertical                            | FtF bonding initially;<br>FtB, BtB, multi-stack possible                                                                      |
| **Target for Key Metrics**         |                          |               |                                        |                                                                                                                               |
| BW Shoreline (GB/s/mm)             | 28 - 224                 | 165 - 1317    | N/A (vertical)                         |                                                                                                                               |
| BW Density (GB/s/mm<sup>2</sup>)   | 22 - 125                 | 188 - 1350    | 4,000 - 300,000                        | 4TB/s/mm<sup>2</sup> @ 9µm, ~12TB/s/mm<sup>2</sup> @ 5µm,<br>~35TB/s/mm<sup>2</sup> @ 3µm, ~300TB/s/mm<sup>2</sup> @ 1µm      |
| Power Efficiency Target<br>(pJ/b)  | 0.5                      | 0.25          | <0.05 at 9µm → 0.01 at 1µm             | Conservatively estimated at 9µm pitch; <0.02 for 3µm pitch                                                                    |
| Low-Power Entry/Exit               | 0.5ns ≤16G, 0.5-1ns ≥24G |               | 0ns                                    | No preamble or post-amble                                                                                                     |
| Reliability (FIT)                  | 0 < FIT (Failure in Time) << 1 |         | 0 < FIT << 1                           | BER < 1E-27                                                                                                                   |
| ESD                                | 30V CDM                  |               | 5V CDM → ≤3V                           | 5V CDM at introduction, no ESD for W2W hybrid bonding possible                                                                |

[Wu ISSCC 2025]

- Extremely low-power (0.01pJ/b) and high-density (<10um pitch) interface for 3D-stacked dies
- Orders of magnitude improvement in density over 2.5D UCIe
- Reduced ESD requirements, with potential for no ESD with hybrid bonding

#### [Das Sharma Nature Elec 2024]



## Proposed UCIe-3D PHY



- Simple inverter-based flip-flop transceiver architecture
- Forwarded clock without any skew adjustment
- Nominal 4Gb/s operation with no SERDES
- Projecting $>10^5$ Gb/s/mm<sup>2</sup> and 0.01pJ/b as bump pitch scales to 1um
  - Potential for lower-power operation with fractional operation frequency (2X wires at 2Gb/s)
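As a rough cross-check of the projected areal density, the sketch below computes an upper bound for a square bump grid in which every bump carries data at the nominal 4Gb/s. Both the square grid and the all-bumps-carry-data condition are assumptions. The function name is hypothetical.

```python
def raw_areal_bandwidth_gbps_per_mm2(pitch_um, gbps_per_bump):
    """Upper bound on areal bandwidth: square grid, every bump carrying data."""
    bumps_per_mm2 = (1000.0 / pitch_um) ** 2
    return bumps_per_mm2 * gbps_per_bump

for pitch_um in (9.0, 5.0, 3.0, 1.0):
    bound = raw_areal_bandwidth_gbps_per_mm2(pitch_um, 4.0)
    print(f"{pitch_um:g}um pitch: up to {bound:.3g} Gb/s/mm^2")
```

At 1um pitch the bound is 4x10^6 Gb/s/mm<sup>2</sup>, comfortably above the $>10^5$ projection, while at 9um it is about 4.9x10^4 Gb/s/mm<sup>2</sup>. The 4TB/s/mm<sup>2</sup> (32,000Gb/s/mm<sup>2</sup>) quoted at 9µm in the UCIe-3D table sits below this bound, as expected if some bumps carry power, ground, or clock (an inference, not stated in the source).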




## Conclusion

- Splitting large monolithic chips into chiplets provides yield advantages and flexibility in process choice
- Dense energy-efficient interconnects are required to support die-to-die communication
- Differential XSR links are an option for longer 2D package links
- Single-ended links offer bandwidth density and energy efficiency advantages for shorter 2D and advanced 2.5D packaging
- 3D packaging allows for simple inverter-buffer transceivers that are projected to achieve extremely high bandwidth density and excellent energy efficiency