# An Accurate and Efficient Analysis Method for Multi-Gb/s Chip-to-chip Signaling Schemes

Bryan K. Casper, Matthew Haycock, Randy Mooney

Circuit Research, Intel Labs, Hillsboro, OR (bryan.k.casper@intel.com)

## Abstract

This paper introduces an accurate method of modeling the performance of high-speed chip-to-chip signaling systems. Implemented in a simulation tool, it precisely accounts for intersymbol interference, cross-talk and echoes as well as circuit-related effects such as thermal noise, power supply noise and receiver jitter. We correlated the simulation tool to actual measurements of a high-speed signaling system and then used this tool to make tradeoffs between different methods of chip-to-chip signaling with and without equalization.

# Introduction

Using well-known principles of communication theory, we have developed a simulation tool that can predict and analyze the performance of any linear time-invariant high-speed signaling system. The tool can analyze different methods of signaling (e.g., multi-level and simultaneous bidirectional (SBD)) and various forms of equalization (e.g., pre-emphasis and linear equalization). This tool accurately accounts for driver bandwidths, receiver bandwidths and sensitivities, receiver jitter, noise, intersymbol interference (ISI) and cochannel interference such as far-end cross-talk (FEXT), near-end cross-talk (NEXT), and echoes (return loss).

A traditional method of signaling simulation involves performing a time domain simulation using random data vectors as the input. This can be done using commonly available simulation tools with the desired result being an eye diagram. Unfortunately, the worst-case eye diagram for many high-speed chip-to-chip communication systems is not accurately characterized by a small set of random input data. When a very large set of random data is used as the input stimulus, simulation time becomes prohibitive. Additionally, eye diagram simulations give little insight into the bit-error rate (BER) performance of such signaling systems. The methods presented here solve these difficult issues analytically, rather than depending on a time domain simulation of random data.

The first method we developed calculates the peak distortion of all interference sources to extract a worst-case eye diagram[1]. From the worst-case eye representation and the peak sampling boundary (assuming all noise sources can be bounded within a certain probability), it is possible to determine the associated timing and voltage margins in order to send data error free. Using this method, the maximum data rate is easily determined. This method also has the capability of determining the data patterns that produce the worst-case ISI. We also developed a second method of maximum data rate calculation that produces the BER as a function of the data rate. This statistical analysis is more computationally intensive, but it provides a more accurate prediction of the maximum data rate.

These two methods are very useful when determining tradeoffs between different signaling techniques, equalization methods, interconnect topologies and circuit implementations.

# **Peak Distortion Analysis**

To determine the worst-case voltage or timing margin, the worst-case received eye shape is extracted along with the peak sampling boundary. Since sources such as intersymbol and cochannel interference have truncated distributions, the associated worst-case magnitudes can be directly calculated from the unit pulse responses of the system. The unit pulse response y(t) of a system is given by

$$y(t) = c(t) \otimes p(t) \tag{1}$$

where c(t) is the transmitter symbol response, p(t) is the impulse response of the channel and receiver and  $\otimes$  denotes convolution. The eye edge due to the worst-case 1 is given by

$$s_{1}(t) = y(t) + \sum_{\substack{k=-\infty\\k\neq 0}}^{\infty} y(t-kT) \Big|_{y(t-kT)<0} \tag{2}$$

where T is the symbol period. If n cochannel interference sources exist and  $y^{i}$  is the pulse response of the i-th cochannel source, the worst-case 1 eye edge becomes

$$s_{1}(t) = y(t) + \sum_{\substack{k=-\infty\\k\neq 0}}^{\infty} y(t-kT) \Big|_{y(t-kT)<0} + \sum_{i=1}^{n} \sum_{k=-\infty}^{\infty} y^{i} (t-kT-t_{i}) \Big|_{y^{i}(t-kT-t_{i})<0} \tag{3}$$

where  $t_i$  is the relative sampling point of each cochannel pulse response. The eye edge due to the worst-case 0 is given by

$$s_{0}(t) = \sum_{\substack{k=-\infty\\k\neq 0}}^{\infty} y(t-kT) \Big|_{y(t-kT)>0} + \sum_{i=1}^{n} \sum_{k=-\infty}^{\infty} y^{i} (t-kT-t_{i}) \Big|_{y^{i}(t-kT-t_{i})>0}. \tag{4}$$

Therefore, the worst-case eye opening, e(t), is bounded by

$$s_1(t) > e(t) > s_0(t). \tag{5}$$

To determine if data can be received error free, the peak sampling boundary must lie within the worst-case data eye. The peak sampling boundary is determined from the union of the receiver referenced noise, sensitivity, offset, skew, jitter and any other timing or voltage term that prevents the receiver from sampling in the middle of the data eye. In most cases, this boundary forms a rectangle inside the worst-case data eye. These noise and jitter sources are described by both bounded and unbounded distribution functions. Unbounded noise sources such as thermal noise have an infinite peak magnitude. To deal with these Gaussian distributed sources (including jitter), it is necessary to bound them within a certain probability that will effectively provide error-free data transmission. In this case, we assume that the bounding probability is  $10^{-21}$ . Thus the peak noise amplitude for a Gaussian source is  $\pm 10\sigma$ .
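The worst-case eye edges of (2) through (4) can be sketched numerically from a sampled pulse response. The following is a minimal illustration, not the tool described in this paper: the function names, the assumption that the pulse response is sampled at an integer number of points per symbol, and the toy sample values are all hypothetical. Each cochannel response is assumed to have been shifted by its own  $t_i$  before being passed in.

```python
import numpy as np

def fold_by_symbol(y, samples_per_ui):
    """Reshape a sampled response so that row m holds y(t + mT), 0 <= t < T."""
    y = np.asarray(y, dtype=float)
    pad = (-len(y)) % samples_per_ui          # zero-pad to a whole number of symbols
    return np.pad(y, (0, pad)).reshape(-1, samples_per_ui)

def peak_distortion_edges(y, samples_per_ui, cochannel=()):
    """Return (s1, s0), the worst-case 1 and 0 eye edges over one symbol period."""
    rows = fold_by_symbol(y, samples_per_ui)
    cursor = np.argmax(rows.max(axis=1))      # symbol slot holding the main pulse
    isi = np.delete(rows, cursor, axis=0)     # every k != 0 term
    s1 = rows[cursor] + np.minimum(isi, 0.0).sum(axis=0)   # only negative ISI hurts a 1
    s0 = np.maximum(isi, 0.0).sum(axis=0)                  # only positive ISI hurts a 0
    for yi in cochannel:                      # cochannel terms include k = 0
        xt = fold_by_symbol(yi, samples_per_ui)
        s1 += np.minimum(xt, 0.0).sum(axis=0)
        s0 += np.maximum(xt, 0.0).sum(axis=0)
    return s1, s0

# Toy symbol-spaced responses (hypothetical values, one sample per symbol).
y = [0.1, 0.8, 0.2, -0.1]
xtalk = [0.05, -0.05]
s1, s0 = peak_distortion_edges(y, 1, cochannel=[xtalk])
eye_height = s1 - s0      # s1 is about 0.65 and s0 about 0.35 for these values
```

Data can be received error free only if the peak sampling boundary fits between `s0` and `s1` at the chosen sampling point.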

# **Statistical Analysis**

Statistical analysis produces the BER as a function of the data rate. To determine the BER, a distribution plot is constructed that relates the BER to the sampling (voltage and timing) point. This distribution is multiplied by the sampling distribution and then summed to give the overall BER. The BER distribution is derived by calculating the probability density function (pdf) of all interference sources[2]. Assuming equal probability of a 1 or 0, the pdf of the ISI is recursively calculated by convolving the individual ISI samples. This is given by

$$z_{k+1}(\tau,t) = \begin{cases} \frac{\delta(\tau) + \delta(\tau - y(t - kT))}{2} \otimes z_k(\tau,t), & k \neq 0\\ z_k(\tau,t), & k = 0 \end{cases} \tag{6}$$

where z is recursively calculated from  $k = -\infty$  to  $k = \infty$  while the initial condition is  $z_{-\infty}(\tau, t) = \delta(\tau)$ . The pdf of n cochannel interference sources can be determined by the following

$$z_{k+1}^{i}(\tau,t) = \frac{\delta(\tau) + \delta(\tau - y^{i}(t - kT - t_{i}))}{2} \otimes z_{k}^{i}(\tau,t) \tag{7}$$

where i=1 to n. The resulting pdf of all deterministic interference sources is given by

$$\zeta(\tau,t) = z(\tau,t) \otimes z^{1}(\tau,t) \otimes \ldots \otimes z^{n-1}(\tau,t) \otimes z^{n}(\tau,t). \tag{8}$$
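The recursion in (6) through (8) can be illustrated on a discrete voltage grid. This is a sketch under stated assumptions rather than the implementation used in this work: the function name `isi_pmf`, the grid step `dv`, and the rounding of each interference sample to the nearest grid bin are all choices made here for illustration. Because (8) is a chain of convolutions, the ISI samples and the cochannel samples at a given sampling point can be passed to the same recursion together.

```python
import numpy as np

def isi_pmf(samples, dv):
    """Probability mass function of the summed interference on a grid of step dv.

    samples: interference samples y(t - kT) (k != 0) and cochannel samples,
             all evaluated at one sampling point t.
    """
    shifts = np.rint(np.asarray(samples, dtype=float) / dv).astype(int)
    lo = int(shifts[shifts < 0].sum())    # most negative reachable bin
    hi = int(shifts[shifts > 0].sum())    # most positive reachable bin
    pmf = np.zeros(hi - lo + 1)
    pmf[-lo] = 1.0                        # initial condition: delta at zero volts
    for s in shifts:
        # Convolve with (delta(tau) + delta(tau - sample)) / 2.
        # np.roll cannot wrap probability mass because the grid spans
        # every reachable partial sum.
        pmf = 0.5 * pmf + 0.5 * np.roll(pmf, s)
    grid = np.arange(lo, hi + 1) * dv
    return grid, pmf

# Toy interference samples (hypothetical values) on a 0.1 V grid.
grid, pmf = isi_pmf([0.1, -0.1, 0.2], dv=0.1)
```

The extreme nonzero bins of `pmf` coincide with the peak distortion bounds, while the interior bins show how rarely those extremes occur.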

The received voltage difference pdf as a function of the reference voltage, v, and the sampling point, t, is given by

$$RD(\tau,t,v) = \frac{\zeta(\tau - y(t) + v, t)}{2} + \frac{\zeta(v - \tau, t)}{2}. \tag{9}$$

Fig. 1 BER distribution plot of a unidirectional binary link

The received voltage difference pdf is converted to a BER distribution function through integration

$$\beta(t,v) = \int_{-\infty}^{j} \left( \frac{\zeta(\tau - y(t) + v, t)}{2} + \frac{\zeta(v - \tau, t)}{2} \right) d\tau \tag{10}$$

where j is the sensitivity of the receiver. Fig. 1 shows a BER distribution plot of a unidirectional binary link running at a data rate of 4Gb/s. The BER of a signaling system is determined by multiplying the BER distribution function,  $\beta(t, v)$ , by the sampling distribution function,  $\phi(t, v)$ . The resulting BER is

$$BER = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \beta(t, v) \cdot \phi(t, v) \, dt \, dv. \tag{11}$$
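As a rough illustration of (9) through (11), the sketch below evaluates the BER at a single sampling phase. It makes several simplifying assumptions that are not part of the method above: the sampling distribution  $\phi$  is taken to be Gaussian in voltage only (timing jitter is omitted), the integral over v is replaced by a sum on a finite grid truncated at  $\pm 10\sigma$ , and the interference pdf is the small hypothetical example given in the code. All function and parameter names are illustrative.

```python
import numpy as np

def beta(v, y0, grid, pmf, j):
    """BER distribution at reference voltage v for cursor amplitude y0.

    A transmitted 1 fails when y0 + interference - v < j.
    A transmitted 0 fails when v - interference < j.
    """
    one_err = pmf[grid < j + v - y0].sum()
    zero_err = pmf[grid > v - j].sum()
    return 0.5 * one_err + 0.5 * zero_err

def ber_at_phase(y0, grid, pmf, j, v_ref, sigma, n_sigma=10.0, n_points=2001):
    """Weight beta(v) by a Gaussian sampling distribution centred on v_ref."""
    v = np.linspace(v_ref - n_sigma * sigma, v_ref + n_sigma * sigma, n_points)
    phi = np.exp(-0.5 * ((v - v_ref) / sigma) ** 2)
    phi /= phi.sum()                      # normalise to unit mass on the grid
    b = np.array([beta(vi, y0, grid, pmf, j) for vi in v])
    return float((b * phi).sum())

# Hypothetical interference pdf, cursor amplitude and receiver parameters.
grid = np.array([-0.1, 0.0, 0.1, 0.2, 0.3])
pmf = np.array([0.125, 0.25, 0.25, 0.25, 0.125])
ber_quiet = ber_at_phase(y0=0.8, grid=grid, pmf=pmf, j=0.03, v_ref=0.4, sigma=0.005)
ber_noisy = ber_at_phase(y0=0.8, grid=grid, pmf=pmf, j=0.03, v_ref=0.4, sigma=0.02)
```

With the quieter receiver the bounded noise never reaches the worst-case eye edge, so the summed BER is zero; with the noisier receiver the rare worst-case interference bin starts to contribute errors.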

# **Simulated Versus Measured Data Correlation**

A digital router chip with high-speed SBD signaling ports was implemented and tested[3]. Partial testing results for this signaling system have been detailed in [4] and [5]. Table 1 lists the relevant measured parameters used for the signaling simulations. The SBD port was characterized using various interconnect topologies including a short chip-to-chip link of 15cm along with three backplane links of 34cm, 51cm and 107cm. Each backplane link consisted of two router boards connected to a backplane through AMP HS3 connectors. Each OLGA-packaged component was soldered directly to the FR4 router boards. The 51cm interconnect topology is shown graphically in Fig. 2.

Each of the four interconnect topologies was simulated to derive a 6-port S-parameter matrix. The insertion loss parameter for the 51cm interconnect is given in Fig. 3. An inverse FFT was performed on each of the S-parameters to extract the individual impulse responses for insertion loss, return loss, NEXT and FEXT. These impulse responses are used to calculate the unit pulse response as specified in (1).

Table 1 Simulation parameters based on measurements

| Parameter                  | Value                        |
|----------------------------|------------------------------|
| Receiver Jitter            | 2.5ps rms (50ps p-p @ ±10σ)  |
| Receiver Referenced Noise  | 5mV rms (100mV p-p @ ±10σ)   |
| Receiver Latch Sensitivity | 30mV                         |
| Transmitter Bandwidth      | 3.5GHz (1 pole)              |
| Receiver Bandwidth         | 3.2GHz (1 pole)              |
| Supply Voltage             | 1.6V                         |
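The step from a frequency-domain response to the unit pulse response of (1) can be sketched as follows. This is an illustrative example only: a single-pole low-pass response stands in for a simulated S-parameter, the symbol c(t) is taken to be an ideal rectangular pulse, and the function names, frequency grid and 4Gb/s symbol period are assumptions made for the sketch.

```python
import numpy as np

def impulse_response(h_f, f_max):
    """Real impulse response from one-sided samples H(0) ... H(f_max)."""
    n = 2 * (len(h_f) - 1)            # number of time samples
    dt = 1.0 / (2.0 * f_max)          # time step set by the highest frequency
    h_t = np.fft.irfft(h_f, n=n)      # discrete response whose sum equals H(0)
    return h_t, dt

def unit_pulse_response(h_t, dt, symbol_period):
    """y = c (x) p with c(t) an ideal rectangular symbol of one period."""
    samples_per_ui = int(round(symbol_period / dt))
    c = np.ones(samples_per_ui)
    return np.convolve(c, h_t)

# Stand-in channel: one pole at 3.2 GHz, sampled from DC to 50 GHz.
f = np.linspace(0.0, 50e9, 2049)
h_f = 1.0 / (1.0 + 1j * f / 3.2e9)
h_t, dt = impulse_response(h_f, f_max=50e9)
y = unit_pulse_response(h_t, dt, symbol_period=250e-12)   # 4 Gb/s binary
```

In practice the same two functions would be applied to each S-parameter of interest (insertion loss, return loss, NEXT and FEXT) to obtain y(t) and the cochannel responses  $y^{i}(t)$ .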

We used our simulation tool to calculate the maximum bandwidth using the peak distortion and statistical analysis. These results were then compared to the measured results as shown in Fig. 4. Notice that the statistical simulation correlated closely with the measured bandwidth for all but the 15cm link. This discrepancy is due to the inherent frequency limitations of the router chip. Were it not for the maximum chip clock frequency of 3.2GHz, the 15cm link bandwidth could have been higher, as indicated by the excess timing and voltage margin.

The peak distortion simulations were pessimistic by about 20% when compared with the statistical analysis for the SBD system. This is because the peak distortion analysis determines the maximum bandwidth at BER=0 assuming all noise and jitter sources are bounded. The peak distortion analysis is very useful because it uses a computationally efficient algorithm. On the other hand, the statistical analysis is extremely accurate at the expense of being computationally intensive. In our simulations, the run-time for the statistical simulation was 2 to 3 orders of magnitude greater than that for the peak distortion simulation.

# **Comparison of Potential Backplane Signaling Methods**

#### A. Unidirectional Versus Simultaneous Bidirectional

The value of a general signaling simulation tool is in its ability to make design tradeoffs in areas such as signaling technique, equalization method, interconnect topologies, and circuit parameters. A particularly interesting analysis to those designing chip-to-chip signaling systems is evaluating unidirectional (UD) binary versus SBD links.

Fig. 4 Measured vs simulated data rates for SBD

Fig. 5 Statistical vs peak distortion simulation for UD binary

Voltage-mode SBD links benefit from being able to swing across the full range of the power supply voltage and from having a symbol rate that is half of the aggregate data rate[4,5,6,7]. Unfortunately, SBD links are hindered by echoes that can be prohibitive for topologies having large impedance discontinuities. In contrast, UD links that send data in the same direction are not affected by echoes (return reflections) or NEXT.

Statistical and peak distortion simulations were done for a UD binary system given the same interconnect topologies as the SBD link and circuit parameters shown in Table 1. The results are shown in Fig. 5.

Interestingly, the aggregate bandwidth for SBD is 50-60% higher than that of UD binary for a given length. This is not too surprising since the SBD link utilizes twice the voltage swing of the UD binary link. For current-mode links where the voltage swings for SBD and UD binary are similar, the maximum bandwidth difference may not be as dramatic.

#### B. Multi-level Signaling

Another method of chip-to-chip communication that has been attempted is multi-level signaling. A popular form of multi-level signaling is to have four different symbol shapes represent 2 binary data bits. These four different symbols are linearly scaled replicas of the same full-scale symbol. As with SBD signaling, this pulse amplitude modulation (PAM) method (a.k.a. 4 PAM) has data rates that are twice the symbol rate.

The peak distortion and statistical algorithms were modified to accommodate analysis of 4 PAM with its 4 different symbol shapes and 3 reference values and then integrated into our simulation tool. The results across the four interconnect topologies are shown in Fig. 6. The comparison between SBD, UD binary and 4 PAM for a BER of  $10^{-17}$  is shown in Fig. 7.

Fig. 6 Statistical vs peak distortion simulation for 4 PAM

Fig. 7 SBD, UD binary and 4 PAM data rates

2002 Symposium On VLSI Circuits Digest of Technical Papers

The 4 PAM data rates are much lower than either SBD or UD binary data rates at all interconnect lengths. The increase in signal to noise plus distortion ratio due to reducing the symbol rate is not enough to make up for the reduction of the signal spacing to 1/3 of the binary spacing.
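The size of the 4 PAM spacing penalty can be checked with a short calculation. This is a back-of-the-envelope sketch that assumes equally spaced levels across the same full-scale swing; the function name and the normalized swing are illustrative.

```python
import math

def pam_level_spacing(swing, levels):
    """Spacing between adjacent levels for equally spaced PAM."""
    return swing / (levels - 1)

swing = 1.0                                   # normalized full-scale swing
binary_spacing = pam_level_spacing(swing, 2)  # one eye spanning the full swing
pam4_spacing = pam_level_spacing(swing, 4)    # three stacked eyes
penalty_db = 20.0 * math.log10(binary_spacing / pam4_spacing)
```

Under these assumptions the penalty is roughly 9.5 dB, so halving the symbol rate has to recover more than that in reduced channel loss and interference before 4 PAM can outperform binary signaling.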

#### **Incorporating Equalization**

One enhancement that can be made to the previous signaling systems is to add transmitter pre-emphasis. While implementing pre-emphasis introduces greater circuit complexity, the maximum achievable data rate can be significantly increased. Pre-emphasis works to compensate for high-frequency channel attenuation by de-emphasizing the low-frequency signal components. This effectively reduces ISI at the expense of limiting the available transmitter power. This mechanism requires a multi-level driver along with a digital filter.

For the following analysis a  $2^{nd}$  order pre-emphasis filter (1 precursor tap, 1 postcursor tap) was used along with a 16 level driver. The tap coefficients were determined by optimizing the maximum data rate for each interconnect length and signaling style. The results of this pre-emphasis analysis are shown in Fig. 8.
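The effect of a 3-tap pre-emphasis filter on the worst-case eye can be sketched with symbol-spaced samples of a pulse response. This is an illustration only: the pulse response and tap values are hypothetical and hand-picked rather than optimized as in the analysis, the taps are normalized so that their absolute values sum to one as a stand-in for the limited driver swing, and quantization to the 16 driver levels is not modeled.

```python
import numpy as np

def apply_preemphasis(y_ui, taps):
    """Filter a symbol-spaced pulse response with [precursor, main, postcursor] taps."""
    taps = np.asarray(taps, dtype=float)
    taps = taps / np.abs(taps).sum()      # keep the peak driver output at full swing
    return np.convolve(y_ui, taps)

def worst_case_eye(y_ui):
    """Peak distortion eye height at the cursor: main sample minus summed |ISI|."""
    y_ui = np.asarray(y_ui, dtype=float)
    cursor = np.argmax(np.abs(y_ui))
    isi = np.delete(y_ui, cursor)
    return abs(y_ui[cursor]) - np.abs(isi).sum()

# Hypothetical symbol-spaced pulse response of a lossy channel.
y_ui = [0.05, 0.5, 0.3, 0.1]
eye_raw = worst_case_eye(y_ui)
eye_eq = worst_case_eye(apply_preemphasis(y_ui, taps=[-0.05, 1.0, -0.4]))
```

In this toy case the cursor amplitude falls because of the swing constraint, yet the worst-case eye opens because the interference falls faster, which is the tradeoff described above.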

SBD signaling again led in maximum data rate, although the gap between it and UD binary decreased. The 4 PAM signaling benefited from the pre-emphasis but remained well behind the SBD and UD binary data rates. Even though distortion was reduced, the large amount of noise (mostly supply noise) was a severe handicap to 4 PAM signaling. Using more advanced techniques such as differential and current-mode signaling is one way to reduce the noise floor. This would effectively reduce the receiver referenced noise and allow a much smaller eye opening for a given error rate.

Fig. 8 SBD, UD binary and 4 PAM with transmitter pre-emphasis

Fig. 9 SBD, UD binary, and 4 PAM with transmitter pre-emphasis, low noise and jitter

To emulate a signaling system with improved circuits and reduced jitter characteristics, the analysis was modified such that jitter = 20ps p-p, receiver noise = 10mV and receiver sensitivity = 5mV. Fig. 9 shows the signaling rate improvement for this lower noise floor condition. With this analysis, 4 PAM data rates are improved and the three signaling techniques are more closely matched.

# Conclusion

This paper has described an accurate and efficient method of modeling the performance of high-speed chip-to-chip signaling systems. The simulation tool based on these methods is capable of analyzing different signaling techniques, equalization methods, interconnect topologies and circuits. We correlated our simulation results with actual measurements of a high-speed signaling system, compared UD binary, SBD and 4 PAM signaling schemes, and evaluated them with transmitter pre-emphasis and under various noise conditions.

#### Acknowledgements

We thank K. Mallory and J. Mix for assistance in generating interconnect simulation models.

# References

- [1] J. G. Proakis, *Digital Communications*, 3<sup>rd</sup> ed., Singapore: McGraw-Hill, 1995, pp. 602-603.
- [2] C. W. Helstrom, "Calculating error probabilities for intersymbol and cochannel interference," *IEEE Trans. Commun.*, vol. COM-34, pp. 430-435, May 1986.
- [3] R. Nair, et al., "A 28.5GB/s CMOS Non-Blocking Router for Terabits/s Connectivity between Multiple Processors and Peripheral I/O Nodes," ISSCC 2001 Digest of Technical Papers, Pap 14.7, February 2001.
- [4] M. Haycock, R. Mooney, "3.2GHz, 6.4Gb/s per wire signaling in 0.18µm CMOS," ISSCC 2001 Digest of Technical Papers, Pap 4.3, February 2001.
- [5] H. Wilson, M. Haycock, "A Six-Port 30-GB/s Nonblocking Router Component Using Point-to-Point Simultaneous Bidirectional Signaling for High-Bandwidth Interconnects," *IEEE Journal of Solid-State Circuits*, volume 36, issue 12, pp.1954-1963, December 2001.
- [6] M. Haycock et al., "A 2.5 Gb/s Bidirectional Signaling Technology," *Hot Interconnects V Symposium Record*, pp. 149-156, August 1997.
- [7] R. Mooney, C. Dike, S. Borkar, "A 900 Mb/s Bidirectional Signaling Scheme," *IEEE Journal of Solid-State Circuits*, volume 30, issue 12, pp.1538-1543, December 1995.