# Power Efficiency Modeling and Optimization of High-Speed Equalized-Electrical I/O Architectures

Arun Palaniappan and Samuel Palermo, *Member, IEEE*

Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77840

arunp ece@neo.tamu.edu, spalermo@ece.tamu.edu

*Abstract*—An I/O design framework is presented which combines statistical link analysis with circuit power models to predict the power-optimum equalization architecture, circuit style, and transmit swing at a given data rate, channel, and process node.

Keywords- Electrical Interconnects, High-Speed I/O Link, Power Efficiency Modeling and Optimization

## I. INTRODUCTION

I/O communication bandwidth has increased rapidly to keep pace with on-chip processing performance. While nanometer CMOS technologies provide adequate bandwidth for data rates in excess of 10 Gb/s, limited electrical I/O channel bandwidth impedes high-speed I/O data rate scaling, necessitating equalization schemes that compensate for the channel losses. However, excessive equalization complexity can increase I/O power dissipation to unacceptable levels for future processors [1]. This creates the need for low-power architectural techniques that can significantly improve I/O power efficiency to comply with system power budgets.

This paper presents a design methodology that minimizes high-speed link power dissipation by selecting the optimum equalization architecture, circuit style (CMOS vs. CML), and transmit output swing for a given data rate, channel type, and process technology. The work leverages previous optimization methods for electrical links [2], [3] and builds upon them by combining statistical link analysis techniques with comprehensive equalization and serialization circuit power models. Because of the complex trade-offs involved in the design of high-speed links, statistical link analysis methods [4], [5], which co-optimize circuit and channel characteristics, are used to model and estimate the link margin at a given bit-error rate (BER). Based on the link margin results, the transmitter output swing is scaled to satisfy the minimum receiver eye opening requirement and operate at optimal power efficiency. Comprehensive transmitter and receiver circuit models, which use normalized transistor parameters extracted from preliminary SPICE simulations of the circuit topologies, provide accurate power estimates over a wide link architecture design search space.

A brief overview of electrical links and their circuits is given in Section II. Section III presents the optimization methodology and a performance comparison of electrical links over different channels and data rates in 90 nm and 45 nm CMOS technologies, followed by the conclusion in Section IV.

## II. ELECTRICAL I/O ARCHITECTURE

The backplane channel responses used in this work are shown in Fig. 1 [6]. All of the channels have 5-6" linecard traces and varying backplane trace lengths: a short 1" channel (B1) with bottom traces, a 20" channel (C4) on the bottom stripline layer, and a 20" channel (T20) with top traces. The low-pass channel characteristic spreads short data pulses in time, resulting in inter-symbol interference (ISI) that degrades voltage and timing margins and limits the maximum achievable data rate. To mitigate the effects of ISI and operate reliably at higher data rates over band-limited channels, equalization circuitry is used.

Fig. 1. Channel Frequency Response

The block diagram of a typical high-speed electrical link system is shown in Fig. 2(a), where a combination of transmitter (TX) feed-forward equalization (FFE) [7], receiver (RX) continuous-time linear equalization (CTLE), and decision feedback equalization (DFE) [8] is implemented to mitigate the effects of ISI. In this work the TX FFE, implemented as an FIR filter that pre-distorts the transmitted signal to equalize the channel distortion, is limited to a maximum of 4 taps, as shown in Fig. 2(b). Both CMOS and current-mode logic (CML) circuits are considered. Fig. 2(c) shows an RX CTLE, which provides high-frequency peaking to compensate for the low-pass channel response. In this work the RX DFE (Fig. 2(d)), a non-linear equalizer that cancels the post-cursor ISI by subtracting the effect of previously sampled data weighted by the filter taps, is limited to a maximum of 5 taps.

Fig. 2. (a) Block diagram of High-Speed Electrical Link System (b) 4-tap Transmitter (TX) Feed-Forward Equalizer (FFE) [7] (c) Receiver (RX) Continuous Time Linear Equalizer (CTLE) (d) 5-tap Receiver (RX) Decision Feedback Equalizer (DFE) [8]
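As a rough illustration of how the two discrete-time equalizers act on ISI, the following sketch applies an FFE and an idealized DFE to a baud-spaced pulse response and compares the worst-case (peak-distortion) eye height. The pulse response, the tap weights, and all function names are hypothetical and are not taken from the B1, C4, or T20 channels; the DFE is assumed to have ideal tap weights and error-free past decisions.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def apply_ffe(pulse, taps):
    """Pulse response seen at the receiver after a TX FIR filter."""
    return convolve(pulse, taps)


def apply_dfe(pulse, cursor, n_taps):
    """Cancel the first n_taps post-cursors (ideal weights, correct decisions)."""
    out = list(pulse)
    for k in range(cursor + 1, min(cursor + 1 + n_taps, len(out))):
        out[k] = 0.0
    return out


def worst_case_eye(pulse, cursor):
    """Cursor sample minus the sum of all |ISI| samples, per unit of swing."""
    isi = sum(abs(v) for i, v in enumerate(pulse) if i != cursor)
    return pulse[cursor] - isi


# Hypothetical pulse response: one pre-cursor, main cursor at index 1.
channel = [0.05, 0.40, 0.25, 0.12, 0.06, 0.03]
cursor = 1

# Hypothetical 2-tap FFE (main + one post-cursor tap). The tap magnitudes
# sum to 1 so that the peak transmit swing is unchanged.
ffe_taps = [0.8, -0.2]

eye_raw = worst_case_eye(channel, cursor)
eye_ffe = worst_case_eye(apply_ffe(channel, ffe_taps), cursor)
eye_dfe = worst_case_eye(apply_dfe(channel, cursor, 5), cursor)

print(f"no equalization: {eye_raw:+.3f}")
print(f"2-tap FFE:       {eye_ffe:+.3f}")
print(f"5-tap DFE:       {eye_dfe:+.3f}")
```

In this toy case the unequalized eye is closed (negative), the FFE opens it at the cost of a smaller cursor, and the DFE removes post-cursor ISI without attenuating the cursor but leaves the pre-cursor term untouched.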

## III. OPTIMIZATION AND PERFORMANCE COMPARISON

### A. Optimization Methodology

The flowchart of the optimization methodology used in this work is shown in Fig. 3. StatEye [4], an open-source statistical link analysis tool, uses statistical methods to integrate deterministic and random noise sources, constrained by I/O specifications such as bit-error rate (BER), minimum eye opening, and jitter compliance, which are given in Table I. StatEye is used to estimate the link voltage and timing margins and the equalization coefficients for a channel response at a given data rate, and to generate a database of the channels' equalization requirements. Transmitter output swing is optimized such that the link voltage margin meets the minimum eye opening compliance requirement, resulting in considerable power savings. To achieve accurate circuit modeling results, normalized transistor parameters (transconductance, capacitances, output conductances, etc.) are used. These are extracted from preliminary SPICE simulations of the circuit topologies. The circuit parameters are scaled in a constant-current-density manner, by scaling the transistor finger number under fixed biasing conditions and finger size.

| Parameter                    | Value                                           |
|------------------------------|-------------------------------------------------|
| Bit-Error Rate (BER)         | $10^{-12}$                                      |
| TX deterministic jitter (dj) | 0.01 UI                                         |
| TX random jitter (rj)        | 0.01 UI                                         |
| Min. Eye Opening compliance  | 20 mV<sub>pp</sub>                              |
| RX jitter compliance dj      | 0.45 UI                                         |
| RX jitter compliance tj      | 0.675 UI                                        |
| $V_{dd}$                     | 1.2 V (90 nm), 1.1 V (45 nm)                    |
| Max. TX Swing                | $1.2\,V_{ppd}$ (90 nm), $1.1\,V_{ppd}$ (45 nm)  |

TABLE I. ELECTRICAL LINK I/O SPECIFICATIONS
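The constant-current-density scaling described above can be sketched as follows: per-finger parameters are extracted once at a fixed bias point and then multiplied by the finger count, so that ratios such as $g_m/I_D$ and the transit frequency stay constant while absolute drive strength and loading grow. The class, the function names, and the per-finger values below are illustrative placeholders, not data extracted from the 90 nm or 45 nm processes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceParams:
    """Small-signal values at one fixed bias point (placeholder numbers)."""
    gm_s: float    # transconductance, S
    gds_s: float   # output conductance, S
    cgg_f: float   # total gate capacitance, F
    cdd_f: float   # total drain capacitance, F
    id_a: float    # drain current, A


def scale_fingers(unit: DeviceParams, fingers: int) -> DeviceParams:
    """Scale a one-finger device to `fingers` fingers at constant current density."""
    if fingers < 1:
        raise ValueError("fingers must be >= 1")
    return DeviceParams(
        gm_s=unit.gm_s * fingers,
        gds_s=unit.gds_s * fingers,
        cgg_f=unit.cgg_f * fingers,
        cdd_f=unit.cdd_f * fingers,
        id_a=unit.id_a * fingers,
    )


def transit_frequency_hz(dev: DeviceParams) -> float:
    """First-order f_T estimate, gm / (2*pi*Cgg)."""
    return dev.gm_s / (2.0 * math.pi * dev.cgg_f)


# Hypothetical single-finger device.
unit = DeviceParams(gm_s=1.0e-3, gds_s=1.0e-4, cgg_f=1.5e-15,
                    cdd_f=1.0e-15, id_a=1.0e-4)
wide = scale_fingers(unit, 8)

print(f"gm/Id  unit: {unit.gm_s / unit.id_a:.2f}  8 fingers: {wide.gm_s / wide.id_a:.2f}")
print(f"f_T    unit: {transit_frequency_hz(unit) / 1e9:.1f} GHz  "
      f"8 fingers: {transit_frequency_hz(wide) / 1e9:.1f} GHz")
```

Because every parameter scales by the same factor, a single table of normalized values is enough to size any block in the search space.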

The main objective of the optimization methodology is to minimize the total link power dissipation, which includes all of the serialization, deserialization, and equalization circuits of the transmitter and receiver. Note that while local clock buffering power is modeled, clock generation, distribution, and recovery power, which can vary with application, is left for future work in order to display the electrical channel performance impact more clearly. The link margin and equalization coefficient results from StatEye, along with the scaled transmitter output swing, are coupled with the normalized transistor parameters and circuit design constraints of the different transmitter and receiver blocks to design each equalization architecture and compute its power dissipation. Computing the power of the various equalization architectures that satisfy the I/O specifications yields a wide design search space, from which the architecture with minimum power is selected for a given data rate. The following section compares the impact of channel loss, circuit style, and process node on this power-optimal design point.

Fig. 3. Optimization Methodology Flowchart

Fig. 4. CTLE Power Efficiency vs. Data rate as a function of peak gain
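The selection step can be sketched as a small exhaustive search. For each candidate architecture, the transmit swing is scaled down until the eye just meets the Table I minimum, and the lowest-power feasible candidate is kept. This is only a sketch under strong simplifying assumptions: the eye opening is taken to scale linearly with swing, the output driver power is taken to scale linearly with swing, and every candidate number below is hypothetical rather than a StatEye or circuit-model result.

```python
MIN_EYE_MV = 20.0    # minimum eye opening from Table I
MAX_SWING_V = 1.2    # maximum TX swing for 90 nm from Table I


def scaled_swing(eye_at_max_mv, max_swing_v=MAX_SWING_V, min_eye_mv=MIN_EYE_MV):
    """Lowest swing that still meets the minimum eye, or None if infeasible.

    Assumes the eye opening is proportional to transmit swing.
    """
    if eye_at_max_mv < min_eye_mv:
        return None
    return max_swing_v * min_eye_mv / eye_at_max_mv


def link_power_mw(candidate, swing_v):
    """Swing-independent power plus driver power assumed proportional to swing."""
    driver = candidate["driver_mw_at_max"] * swing_v / MAX_SWING_V
    return candidate["fixed_mw"] + driver


def pick_optimum(candidates):
    """Return (name, power_mw, swing_v) of the lowest-power feasible candidate."""
    best = None
    for c in candidates:
        swing = scaled_swing(c["eye_at_max_mv"])
        if swing is None:
            continue  # cannot meet the eye spec even at maximum swing
        power = link_power_mw(c, swing)
        if best is None or power < best[1]:
            best = (c["name"], power, swing)
    return best


# Hypothetical design points at a single data rate.
candidates = [
    {"name": "1-tap FFE", "eye_at_max_mv": 15.0,
     "fixed_mw": 4.0, "driver_mw_at_max": 24.0},
    {"name": "1-tap FFE + CTLE", "eye_at_max_mv": 120.0,
     "fixed_mw": 6.0, "driver_mw_at_max": 24.0},
    {"name": "3-tap FFE + 4-tap DFE", "eye_at_max_mv": 240.0,
     "fixed_mw": 14.0, "driver_mw_at_max": 30.0},
]

best = pick_optimum(candidates)
print(best)
```

With these placeholder numbers the unequalized link is infeasible, and the CTLE option wins because its added fixed power is smaller than the driver power saved by the reduced swing.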

### B. Link Optimization Results

Transmitter and receiver circuits are modeled by iterating the design variables over circuit design criteria to satisfy a given data rate specification. The transmitter is designed such that the serialization and pre-driver circuits have transition times of one third of the bit period to avoid excessive inter-symbol interference. Both CML-based and CMOS-based circuit designs are analyzed over the data rates of interest. As will be shown later in the full link power efficiency results, the static power dissipation of a CML-based design degrades its power efficiency at lower data rates relative to a CMOS-based design, whose dynamic power dissipation scales down at lower frequencies. However, a CMOS-based design supports lower fan-outs at higher data rates because self-loading makes up a higher percentage of the total capacitance, necessitating large transistor sizes and increased power to satisfy the transition time specification. Thus, a valuable aspect of this design methodology is predicting when it is optimum from a power perspective to transition from a CMOS-based to a CML-based design.
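The two first-order effects in this paragraph can be put into numbers with a short sketch: the one-third-bit-period transition time budget, and the way a fixed static (CML) power translates into mW/Gb/s compared with a switched-capacitance (CMOS) stage. The function names, the 25% activity factor, and the example static power and capacitance are assumptions for illustration. The sketch deliberately keeps the CMOS switched capacitance fixed, so it does not capture the fan-out and self-loading penalty that makes CMOS expensive at high data rates.

```python
def transition_time_budget_ps(rate_gbps):
    """Allowed transition time: one third of the bit period (UI)."""
    ui_ps = 1000.0 / rate_gbps
    return ui_ps / 3.0


def cml_mw_per_gbps(static_mw, rate_gbps):
    """A fixed static power is amortized over more bits as the rate rises."""
    return static_mw / rate_gbps


def cmos_mw_per_gbps(c_switched_ff, vdd_v, activity=0.25):
    """Dynamic power a*C*Vdd^2*f divided by the bit rate f.

    fF * V^2 gives fJ; 1 mW/Gb/s equals 1 pJ/bit, hence the 1e-3 factor.
    activity=0.25 assumes random NRZ data (a rising edge every fourth bit
    on average); it is an assumed value, not one from the paper.
    """
    return activity * c_switched_ff * vdd_v ** 2 * 1e-3


for rate in (5.0, 10.0, 20.0):
    print(f"{rate:4.0f} Gb/s: t_r budget {transition_time_budget_ps(rate):5.1f} ps, "
          f"CML {cml_mw_per_gbps(20.0, rate):.2f} mW/Gb/s, "
          f"CMOS {cmos_mw_per_gbps(5000.0, 1.2):.2f} mW/Gb/s")
```

Under these assumptions the CML figure improves in proportion to data rate while the CMOS figure stays flat, which is why the crossover point depends on how quickly the CMOS capacitance must grow to meet the shrinking transition time budget.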

Examples of receiver equalization circuitry power efficiency modeling versus data rate are shown in Fig. 4 and Fig. 5. CTLE power efficiency is a strong function of the peak gain requirement. As shown in Fig. 4, the 90 nm technology realizes 12 dB peak gain up to slightly above 14 Gb/s, but can achieve 6 dB peak gain past 20 Gb/s. Scaling technology to a higher-$f_T$ 45 nm process allows realization of 12 dB peak gain out to 18 Gb/s. The maximum data rate at which the DFE can reliably operate is determined by the 1 unit interval (UI) first-tap critical timing path of the direct feedback architecture, which includes the comparator clock-to-Q delay, the feedback tap propagation delay, and the time for amplifier A2 to achieve 95% settling. As shown in Fig. 5, increasing the DFE tap number adds loading on the critical tap-current summation node, resulting in a reduced maximum operational data rate.

Fig. 5. DFE Power Efficiency vs. Data rate as a function of number of taps
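The first-tap timing constraint can be written as $t_{clk \to Q} + t_{tap} + t_{settle} \le 1\,\mathrm{UI}$. The sketch below evaluates it under the assumption that the summation node behaves as a single-pole RC network, so that 95% settling takes $\tau \ln 20 \approx 3\tau$. The delay values, load resistance, and per-tap capacitance are hypothetical and serve only to show how extra taps lower the maximum rate.

```python
import math


def settling_time_ps(tau_ps, accuracy=0.95):
    """Time for a single-pole step response to reach `accuracy` of final value."""
    return -tau_ps * math.log(1.0 - accuracy)


def summer_tau_ps(r_ohm, c_base_ff, c_tap_ff, n_taps):
    """RC time constant of the summation node; ohm * fF = 1e-3 ps."""
    return r_ohm * (c_base_ff + n_taps * c_tap_ff) * 1e-3


def max_dfe_rate_gbps(t_clk_q_ps, t_tap_ps, tau_ps):
    """Highest data rate whose UI still covers the first-tap feedback loop."""
    t_critical_ps = t_clk_q_ps + t_tap_ps + settling_time_ps(tau_ps)
    return 1000.0 / t_critical_ps


# Hypothetical delays and loading.
T_CLK_Q_PS = 25.0
T_TAP_PS = 10.0
R_LOAD_OHM = 200.0
C_BASE_FF = 20.0
C_PER_TAP_FF = 5.0

for n in range(1, 6):
    tau = summer_tau_ps(R_LOAD_OHM, C_BASE_FF, C_PER_TAP_FF, n)
    rate = max_dfe_rate_gbps(T_CLK_Q_PS, T_TAP_PS, tau)
    print(f"{n} tap(s): tau = {tau:.1f} ps, max rate = {rate:.1f} Gb/s")
```

Each added tap increases the node capacitance, lengthens the settling term, and therefore lowers the achievable data rate, in line with the trend described for Fig. 5.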

Using the optimization methodology discussed above, link power efficiency is computed for the three channels from Section II, and the impact of optimizing transmitter output swing and circuit style is illustrated in Fig. 6 and Fig. 7, respectively. Optimizing transmit swing can dramatically reduce power. As shown in Fig. 6, at 12 Gb/s the power is roughly cut in half on the high-loss T20 channel and reduced to 20% of the non-scaled value on the low-loss B1 channel. The choice of CML vs. CMOS circuit style is a function of data rate and technology node. As shown in the 90 nm modeling results of Fig. 7, at low data rates the CMOS-based link has better power efficiency than the CML-based link, which has significant static power dissipation. However, beyond 14 Gb/s the CMOS-based link requires high power due to reduced fan-out, and the CML-based link becomes the lower-power option. For example, at 16 Gb/s on the low-loss B1 channel, the CMOS-based link achieves 5.95 mW/Gb/s, while the CML-based link requires only 1.62 mW/Gb/s.

Fig. 6. Total Power Efficiency of Electrical Link vs. Data Rate with and without TX Swing Optimization using CML circuits in 90 nm process

Fig. 7. Total Power Efficiency of Electrical Link vs. Data Rate using CML vs. CMOS circuits in 90 nm process

Fig. 8. Optimal Solution with Minimum Power Efficiency vs. Data rate
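To put the 16 Gb/s B1 comparison in absolute terms, a power efficiency in mW/Gb/s (numerically equal to pJ/bit) can be multiplied by the data rate to obtain total link power. The short calculation below uses only the two efficiency figures quoted in the text; the helper name is illustrative.

```python
def total_power_mw(efficiency_mw_per_gbps, rate_gbps):
    """Total power implied by an efficiency figure at a given data rate."""
    return efficiency_mw_per_gbps * rate_gbps


RATE_GBPS = 16.0
cmos_mw = total_power_mw(5.95, RATE_GBPS)  # CMOS-based link, B1 channel, 90 nm
cml_mw = total_power_mw(1.62, RATE_GBPS)   # CML-based link, B1 channel, 90 nm

print(f"CMOS: {cmos_mw:.1f} mW, CML: {cml_mw:.1f} mW, ratio: {cmos_mw / cml_mw:.2f}x")
```

At this data rate the CMOS-based link therefore dissipates roughly 3.7 times the power of the CML-based link.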

The impact of electrical channel and process node is evident in the modeling results of Fig. 8, which combines the CMOS-based and CML-based results to select the optimum design at a given data rate, and Fig. 9, which shows the optimum equalization architecture. The high-loss T20 channel is strongly channel-limited, as there is no difference in the optimum equalization architecture or CMOS circuit style between the 90 nm and 45 nm processes. A 3-tap FFE transmitter and a 4-tap DFE receiver are required at the maximum data rate of 12 Gb/s, resulting in a power efficiency of 3.0 mW/Gb/s in the 90 nm process and 1.8 mW/Gb/s in the 45 nm process.

The C4 channel has improved loss characteristics due to signaling on the bottom backplane layer, which avoids the detrimental impact of the long via stubs on T20. For this channel, the process node has an impact on the optimum equalization architecture and circuit style. In the 90 nm technology, a CMOS design is more power efficient up to 14 Gb/s, while above this data rate a CML design is chosen. A CMOS design is chosen for all data rates in the 45 nm technology. Also, the 90 nm design cannot efficiently leverage CTLE equalization above 12 Gb/s, while the 45 nm design utilizes a CTLE up to 16 Gb/s. The 90 nm design is limited to 16 Gb/s due to the inability to implement a high-speed direct feedback DFE, while in the 45 nm process the DFE is feasible, allowing operation at 18 Gb/s.

The low-loss B1 channel doesn't require significant equalization complexity until about 18 Gb/s. Interestingly, the optimal equalization architecture selected is a 1-tap TX FFE with CTLE up to 16 Gb/s in 90 nm and 18 Gb/s in 45 nm. Including the CTLE actually consumes less power than using only a 1-tap TX FFE (i.e., no equalization), as the CTLE peak gain allows the transmit output swing to be scaled down significantly. The 90 nm design switches to a 3-tap TX FFE at 18 Gb/s due to the inefficiency of the CTLE at this high data rate, while the 45 nm design can still leverage a high-peak-gain CTLE at this data rate and doesn't require the 3-tap TX FFE until 20 Gb/s. Excellent power efficiency is achieved with this low-loss channel, as sub-mW/Gb/s operation is possible for the transmitter and receiver circuitry (again neglecting clock generation, distribution, and recovery) in the 45 nm technology up to 18 Gb/s. Above 20 Gb/s, the channel could potentially achieve higher data rates with a DFE. However, even the 45 nm technology cannot efficiently implement the direct feedback architecture modeled in this work. Thus, this link is technology-limited and could potentially benefit from scaling to a more advanced process node.

Fig. 9. Optimal Equalization Architecture vs. Data Rate

## IV. CONCLUSION

In conclusion, this work presented a design flow for optimization of high-speed electrical I/O link power utilizing statistical link analysis methods and circuit power estimates. The design methodology predicts the optimum equalization architecture, circuit style (CMOS vs CML), and transmit output swing for minimum I/O power.

## REFERENCES

- [1] F. O'Mahony *et al.*, "The future of electrical I/O for microprocessors," *International Symposium on VLSI Design, Automation and Test*, pp. 31–34, April 2009.
- [2] V. Stojanovic and M. Horowitz, "Modeling and analysis of high speed links," *IEEE Custom Integrated Circuits Conf. Dig. Tech. Papers*, pp. 589–594, 2003.
- [3] H. Hatamkhani, F. Lambrecht, V. Stojanovic, and C.-K. Ken Yang, "Power-centric design of high-speed I/Os," *Proc. 43rd ACM/IEEE Design Automation Conf.*, pp. 867–872, 2006.
- [4] A. Sanders, M. Resso, and J. D'Ambrosia, "Channel Compliance Testing Utilizing Novel Statistical Eye Methodology," *DesignCon 2004*, Feb. 2004.
- [5] G. Balamurugan *et al.*, "Modeling and Analysis of High-Speed I/O Links," *IEEE Transactions on Advanced Packaging*, vol. 32, no. 2, pp. 237-247, May 2009.
- [6] http://grouper.ieee.org/groups/802/3/ap/public/channel_model/index.html
- [7] J. F. Bulzacchelli *et al.*, "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
- [8] R. Payne et al., "A 6.25-Gb/s binary transceiver in 0.13-μm CMOS for serial data transmission across high loss legacy backplane channels," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2646–2657, Dec. 2005.