# Energy Efficient and High Speed On-Chip Ternary Bus

Chunjie Duan and Sunil P. Khatri

Abstract— We propose two crosstalk reducing coding schemes using ternary busses. In addition to low power consumption and reduced delay, our schemes offer other advantages over binary coding schemes such as zero area overhead and simple, regular and fast CODEC design.

# I. INTRODUCTION

Interconnect has become a bottleneck for both speed and power consumption in Deep Sub-Micron (DSM) designs. Crosstalk, once negligible, has become a dominant factor in overall system performance. Figure 1 shows a simplified bus model with crosstalk. $C_L$ denotes the load capacitance seen by the driver, which includes the receiver gate capacitance and the wire-to-substrate parasitic capacitance. $C_I$ is the coupling (inter-wire) capacitance between adjacent signal lines of the n-bit bus. For DSM processes, $C_I \gg C_L$ [1].



Fig. 1. Simplified Bus Model

Many different schemes have been proposed to address crosstalk-induced performance degradation. In [2], the bus data is inverted when more than 50% of the lines have transitions, and an additional line signals whether the bus is inverted. Gray codes and T0 codes can be used for address busses [5]. Crosstalk avoidance codes (CACs) have been proposed to reduce the worst-case delay of the bus [3][4][6]. These approaches remove certain patterns and consequently limit the worst-case crosstalk. Such codes can be constructed systematically, and it has been shown that their asymptotic overhead is $\sim 44\%$. The CODEC complexity for such a CAC grows rapidly with the bus width. When the bus reaches a certain size, the delay introduced by the CODEC logic may eventually cancel out the speed gain due to crosstalk avoidance. The complexity of the CODEC can be managed by partitioning the bus into small groups (lanes). All these techniques introduce additional area overhead.

Chunjie Duan is with Mitsubishi Electric Research Labs, 201 Broadway, Cambridge, MA USA; S. Khatri is with Department of Electrical Engineering, Texas A&M University, College Station, TX 77843 USA. Email: duan@merl.com, sunilkhatri@tamu.edu

A widely used technique in high-speed interconnect design is to reduce voltage swing, since delay is proportional to it. The related idea of multi-level busses has been studied for high throughput interconnect as it offers higher bit density than binary busses [7][8][10][11]. 4-level pulse amplitude modulation (PAM) is popular in multi-level logic data busses [7][11]. It offers sufficient noise margin and the number of logic levels is a power of two. However, crosstalk was not addressed in the above papers.

In this paper, we introduce a low-power, high-speed bus design that combines crosstalk avoidance coding with the reduced swing offered by ternary busses. A direct bit-to-line binary-ternary mapping with flexible bit polarity selection allows us to design codes that avoid high-crosstalk transitions (thereby reducing overall power consumption) while simultaneously increasing bus speed. It also simplifies the CODEC design. The proposed ternary CACs have zero or low area overhead.

The rest of the paper is organized as follows: Section II gives the mathematical model used for computing bus speed and power consumption in the presence of crosstalk. It also defines several terms that are used in the later sections. Section III describes our approach for reducing the crosstalk on the ternary bus. Section IV discusses the implementation of the ternary bus and shows the experimental results. Section V draws some conclusions.

# II. BACKGROUND

For busses with crosstalk, the delay on the  $j^{th}$  line is given as:

$$\tau_j = k \cdot abs(C_L \cdot \Delta V_j + C_I \cdot \Delta V_{j,j-1} + C_I \cdot \Delta V_{j,j+1}) \quad (1)$$

where k is a constant determined by the driver strength or line resistance in the case when the delay is RC limited,  $\Delta V_j$  is the voltage change on the  $j^{th}$  line and  $\Delta V_{j,k} = \Delta V_j - \Delta V_k$  is the relative voltage change between the  $j^{th}$  and  $k^{th}$  line. For a binary-valued bus, assuming the output voltage levels are  $V_{dd}$  and 0V, we have  $\Delta V_j \in \{0, \pm V_{dd}\}$  and  $\Delta V_{j,k} \in \{0, \pm V_{dd}, \pm 2 \cdot V_{dd}\}$ .

Let  $V_{step}$  be the voltage step between logic levels ( $V_{step} = V_{dd}$  for a binary bus). Eq. 1 can be rewritten as

$$\tau_{j} = k \cdot C_{L} \cdot V_{step} \cdot abs(\delta_{j} + \lambda \cdot \delta_{j,j-1} + \lambda \cdot \delta_{j,j+1}) \tag{2}$$

where $\lambda = \frac{C_I}{C_L}$, $\delta_j$ is the normalized transition and $\delta_{j,k}$ is the normalized relative transition. For the binary bus, $\delta_j = 1$ when there is a transition on the $j^{th}$ line, otherwise $\delta_j = 0$; $\delta_{j,k} = 1$ when both the $j^{th}$ and $k^{th}$ lines transition in the same direction, -1 when they transition in opposite directions and 0 otherwise.

If we define the *normalized total crosstalk* of the  $j^{th}$  line  $(X_{eff,j})$  as

$$X_{eff,j} = abs(2\delta_j - \delta_{j,j-1} - \delta_{j,j+1}) \tag{3}$$

we can see that for a binary bus, $\min\{X_{eff,j}\}=0$ when $\delta_{j,j-1}=\delta_{j,j+1}=\delta_j$, and $\max\{X_{eff,j}\}=4$ when $\delta_{j,j-1}=\delta_{j,j+1}=-\delta_j$. The delay can then be written in terms of $X_{eff,j}$ as

$$\tau_j = k \cdot C_L \cdot V_{step} \cdot abs(\delta_j + \lambda \cdot X_{eff,j}) \tag{4}$$

For  $\lambda\gg 1$ , the delay  $\tau_j$  is linearly proportional to  $X_{eff,j}$ . The maximum bus speed is limited by the value of  $\max\{X_{eff,j}\}$ . We define the vector sequences as 0X, 1X, 2X, 3X and 4X sequences, corresponding to  $X_{eff,j}=0,1,2,3,4$  respectively.
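As a minimal sketch of how the crosstalk class of a transition can be evaluated, the Python below computes $X_{eff,j}$ for every line of a bus. It assumes one particular reading of Eq. 3, namely that the neighbor terms are the signed normalized transitions of the two adjacent lines, and it treats the wires beyond the bus edges as static (for example, grounded shields). The function name `x_eff` and the integer level encoding are illustrative choices, not part of the original formulation.

```python
def x_eff(prev, curr):
    """Per-line crosstalk class |2*d_j - d_(j-1) - d_(j+1)| for one transition.

    prev and curr are equal-length sequences of integer logic levels
    (0/1 for a binary bus, -1/0/+1 for a ternary bus), so each difference
    is the transition normalized by V_step.
    """
    if len(prev) != len(curr):
        raise ValueError("both vectors must have the same width")
    delta = [c - p for p, c in zip(prev, curr)]
    # Assumption: the wires outside the bus do not switch.
    padded = [0] + delta + [0]
    return [abs(2 * padded[j] - padded[j - 1] - padded[j + 1])
            for j in range(1, len(padded) - 1)]


if __name__ == "__main__":
    print(x_eff([0, 1, 0], [1, 0, 1]))   # opposing neighbors: middle line is 4X
    print(x_eff([0, 0, 0], [1, 1, 1]))   # all lines move together: middle line is 0X
```

Under this reading, the speed-limiting quantity for a given transition is simply `max(x_eff(prev, curr))`.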

The energy consumption can also be derived based on the same model [11]. The total energy

$$\begin{aligned}
E_{total} &= \sum_{j=1}^{n} E_j^L + \lambda \sum_{j=1}^{n} E_j^I \\
&= \sum_{j=1}^{n} (1 + X_{eff,j} \cdot \lambda) \, C_L \cdot \Delta V_j^2
\end{aligned} \tag{5}$$

includes the contribution of the energy to charge/discharge the load capacitance,  $E_j^L$ , and also the energy to charge/discharge the inter-wire capacitance,  $E_j^I$ . Once again, the first term is negligible for DSM processes.

The above equations serve as the basis for most of the low-power and crosstalk avoidance coding schemes for binary busses. For ternary busses, we assume a voltage-mode implementation that has three output voltages: $V_{dd}$, $V_{dd}/2$ and 0 V. Equations 1-4 remain valid. However, the voltage step $V_{step}$ is reduced to $V_{dd}/2$, and $\delta_j$, $\delta_{j,j+1}$ and $\delta_{j,j-1}$ vary within [-2, 2] instead of [-1, 1]. The maximum value of $X_{eff}$ can therefore be as high as 8. Equation 5 also remains valid for the ternary bus energy computation.
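As a worked illustration of the reduced swing, consider the per-line term of Eq. 5 for a line that moves by a single logic level. This is only a sketch: it holds $X_{eff,j}$ and $\lambda$ fixed in order to isolate the effect of $\Delta V_j$, which is $V_{dd}/2$ on the ternary bus and $V_{dd}$ on a full-swing binary bus.

$$\frac{E_j^{ternary}}{E_j^{binary}} = \frac{(1 + X_{eff,j} \cdot \lambda) \, C_L \cdot (V_{dd}/2)^2}{(1 + X_{eff,j} \cdot \lambda) \, C_L \cdot V_{dd}^2} = \frac{1}{4}$$

A two-level ternary transition (between $V_{dd}$ and 0 V) has $\Delta V_j = V_{dd}$, the same as a full-swing binary transition, so it does not benefit from this factor.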

# III. LOW POWER AND CROSSTALK AVOIDING CODE ON A TERNARY BUS

**Notation**: In the following discussion, the three logic values on the ternary bus are denoted in bold as +1, 0, -1, representing the high, middle and low values respectively. For clarity, they are sometimes simplified as +, 0, -. Binary values are denoted in italics. For example, 101 indicates a 3-bit binary value and +0+ is a 3-bit ternary value. Also, ' $\Rightarrow$ ' denotes a mapping operation and ' $\rightarrow$ ' indicates a transition.

### A. Direct Binary-Ternary Mapping

A ternary bus has higher "bit density" if it is used to represent a true ternary vector, as each ternary digit can represent 1.585 binary bits. Hence, an n-bit ternary bus can be used to replace an m-bit binary bus where $m = \lfloor \log_2 3^n \rfloor \approx 1.585n$. However, the mapping between binary and ternary values is complex and expensive to implement in VLSI. Thus a true ternary bus is rarely used for interconnect. In this work, we propose a ternary bus interconnect for binary logic. The mapping between the binary logic and the ternary-valued bus is as follows:

- 1) Each binary bit is mapped directly to a line on the ternary bus, hence $n=m$;
- 2) A binary *0* is always mapped to the middle value on the ternary bus, i.e., $0 \Rightarrow \mathbf{0}$;
- 3) A binary *1* is mapped to either the high or the low value on the ternary bus, i.e., $1 \Rightarrow +$ or $1 \Rightarrow -$.

Such a direct mapping scheme offers two advantages: it makes the encoding/decoding logic simple, and it offers flexibility in the polarity chosen for a binary *1*. Table I gives several examples of ternary transition sequences and the corresponding $X_{eff}$ on the middle line. Notice that in this table, $+0+\rightarrow0+0$ produces 4X crosstalk on the middle line, whereas for $+0+\rightarrow0-0$, $X_{eff}=0$. In our mapping scheme, both of these ternary sequences represent the same binary sequence $101\rightarrow010$. Similarly, the $000\rightarrow0++$ transition has much less crosstalk (1X) than $000\rightarrow0+-$, which produces 4X crosstalk. In the extreme case, $+-+\rightarrow-+-$ is an 8X sequence but $+++\rightarrow---$ is a 0X sequence.

TABLE I
EXAMPLES OF TOTAL CROSSTALK

| $\mathbf{V}_{t-1}$ | $\mathbf{V}_{t+1}$ | $\mathbf{X}_{eff}$     |
|--------------------|--------------------|------------------------|
| 000                | +++                | 0                      |
| 000                | 0++                | 1                      |
| 000                | 0+-                | 4                      |
| +0+                | 0+0                | 4                      |
| +0+                | 0-0                | 0                      |
| -+0                | +-0                | 6                      |
| +-+                | -+-                | 8                      |
| +++                | ---                | 0                      |

If we define the *bit polarity* as the sign of the ternary representation for a binary '1', clearly the bit polarity can affect crosstalk significantly. The following two coding schemes exploit the flexibility in our mapping scheme to reduce the crosstalk, yielding energy savings and an increase in speed.
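The direct mapping and the role of bit polarity can be sketched in a few lines of Python. The representation of ternary levels as the integers +1, 0 and -1, and the helper names `encode_direct`, `decode_direct` and `show`, are assumptions made for illustration only.

```python
SYMBOLS = {+1: "+", 0: "0", -1: "-"}


def encode_direct(bits, polarities):
    """Map each binary bit to one ternary line: 0 -> 0, 1 -> +1 or -1.

    polarities[j] is the bit polarity (+1 or -1) used if bits[j] is 1;
    it is ignored when bits[j] is 0.
    """
    if len(bits) != len(polarities):
        raise ValueError("bits and polarities must have the same length")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    if any(p not in (+1, -1) for p in polarities):
        raise ValueError("polarities must be +1 or -1")
    return [b * p for b, p in zip(bits, polarities)]


def decode_direct(levels):
    """Recover the binary word: any non-middle level decodes to 1."""
    return [abs(t) for t in levels]


def show(levels):
    """Render a ternary vector with the +, 0, - shorthand."""
    return "".join(SYMBOLS[t] for t in levels)


if __name__ == "__main__":
    # The same binary word 010 can be driven as 0+0 or as 0-0.
    for pol in ([+1, +1, +1], [+1, -1, +1]):
        t = encode_direct([0, 1, 0], pol)
        print(show(t), "decodes to", decode_direct(t))
```

Because decoding only looks at the magnitude, the encoder is free to pick whichever polarity yields the lower crosstalk, which is the freedom the two codes below rely on.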

### B. 4X Ternary Code

We first show the construction of a ternary sequence that eliminates 5X and higher crosstalk. We call this a "4X ternary code" since it satisfies

$$\max\{X_{eff,j}\} \le 4, \forall j \in [1, n]$$

Let  $b_j$  be the  $j^{th}$  bit of the input vector,  $\mathbf{P}_j$  the polarity and  $\mathbf{D}_j$  the magnitude of the corresponding ternary representation. The following are the rules for constructing the 4X ternary sequence

- 1) Direct $+ \rightarrow -$ or $- \rightarrow +$ transitions are prohibited.
- 2) $1 \rightarrow 0$ is mapped as $+ \rightarrow 0$ or $- \rightarrow 0$.
- 3) For a $0 \rightarrow 1$ transition on $b_j$, if $b_{j-1}$ is transitioning, $\mathbf{P}_j$ is coded so that both lines transition in the same direction.
- 4) For a $0 \rightarrow 1$ transition on $b_j$, if $b_{j-1}$ is not transitioning and $b_{j+1}$ is transitioning from 1 to 0, $\mathbf{P}_j$ is coded so that the $j^{th}$ and $(j+1)^{th}$ lines transition in the same direction.
- 5) For a $0 \rightarrow 1$ transition on $b_j$, if no transition occurs on either neighbor, $\mathbf{P}_j$ is coded so that $\{\mathbf{P}_j = \mathbf{P}_{j-1} \text{ or } \mathbf{P}_j = \mathbf{P}_{j+1}\}$, with $\mathbf{P}_j = \mathbf{P}_{j-1}$ having the higher priority.
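The five rules can be sketched as a software model of one encoding step. This is one possible reading, not the authors' circuit: it processes lines from left to right so that rule 3 can see the new value of line $j-1$, and it falls back to a hypothetical `default` polarity in cases the rules leave open (no neighbor carries a polarity, or only $b_{j+1}$ is rising). Ternary levels are modeled as +1, 0 and -1.

```python
def _rising_polarity(j, prev_t, prev_b, bits, new_t, default):
    """Choose the polarity for a 0 -> 1 transition on line j (rules 3-5)."""
    n = len(bits)
    left_moves = j > 0 and bits[j - 1] != prev_b[j - 1]
    if left_moves:
        # Rule 3: move in the same direction as line j-1 (a step of +1 or -1).
        return new_t[j - 1] - prev_t[j - 1]
    right_falls = j + 1 < n and prev_b[j + 1] == 1 and bits[j + 1] == 0
    if right_falls:
        # Rule 4: line j+1 returns to 0, so follow its direction.
        return -prev_t[j + 1]
    # Rule 5: copy a steady neighbor's polarity, left neighbor first.
    if j > 0 and new_t[j - 1] != 0:
        return new_t[j - 1]
    if j + 1 < n and prev_b[j + 1] == 1 and bits[j + 1] == 1:
        return prev_t[j + 1]
    # Assumption: cases not covered by the rules use a fixed default.
    return default


def encode_4x(prev_t, bits, default=+1):
    """Return the next ternary vector for binary input `bits`."""
    if len(prev_t) != len(bits):
        raise ValueError("state and input must have the same width")
    prev_b = [abs(t) for t in prev_t]
    new_t = [0] * len(bits)
    for j in range(len(bits)):
        if bits[j] == 0:
            new_t[j] = 0                 # rule 2, or a steady 0
        elif prev_b[j] == 1:
            new_t[j] = prev_t[j]         # rule 1: a steady 1 keeps its polarity
        else:
            new_t[j] = _rising_polarity(j, prev_t, prev_b, bits, new_t, default)
    return new_t


if __name__ == "__main__":
    print(encode_4x([1, 0, 1], [0, 1, 0]))   # +0+ followed by binary 010
```

In this model every line changes by at most one logic level per cycle, which is the property that keeps two-level swings, and hence the highest crosstalk classes, off the bus.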


The coding scheme guarantees a maximum of 4X crosstalk on any line of the ternary bus. Compared to a ternary bus without crosstalk avoidance coding, where $\max\{X_{eff,j}\}=8$, our coding boosts the bus speed by close to a factor of 2. In addition, the encoder chooses the bit polarity to minimize local crosstalk whenever possible. This reduces the total crosstalk and therefore lowers the bus power consumption. Table III gives an example of the output sequence produced by the 4X ternary code for a given binary sequence.

A CODEC that satisfies all the rules above is illustrated in Figure 2. The input of the encoder is the binary vector $b_1b_2...b_n$ and the output is the ternary vector $\mathbf{t}_1\mathbf{t}_2...\mathbf{t}_n$. Each line driver circuit can be considered as having two parts: the polarity encoder and the ternary driver. The ternary driver outputs the 3-level signal $\mathbf{t}_j$ based on the polarity ($\mathbf{P}_j$) and magnitude ($\mathbf{D}_j$), with the truth table given in Table II. The $j^{th}$ polarity encoder has inputs $b_{j-1}b_jb_{j+1}$ and $\mathbf{P}_{j-1}\mathbf{P}_j\mathbf{P}_{j+1}$. Since the encoders for all lines are identical except for those at the boundary bits $b_1$ and $b_n$, a very efficient implementation can be realized due to the regularity.

TABLE II
TERNARY DRIVER TRUTH TABLE

| Magnitude $D_{j}$ | Polarity $P_{j}$ | Logic Value $t_{j}$ | Voltage |
|-------------------|------------------|---------------------|---------|
| 0                 | X                | 0                   | $V_0$   |
| 1                 | 0                | -                   | $V_{-}$ |
| 1                 | 1                | +                   | $V_{+}$ |
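The truth table in Table II translates directly into a small behavioral model. The sketch below is illustrative only: the function name `ternary_driver` is hypothetical, and the output voltages use the voltage-mode assignment described in Section IV ($V_+=V_{dd}$, $V_0=V_{dd}/2$, $V_-=0$ V).

```python
def ternary_driver(d, p, vdd=1.0):
    """Return (logic symbol, output voltage) for magnitude d and polarity p.

    d and p are single bits. When d is 0 the polarity is a don't-care and
    the line is driven to the middle level.
    """
    if d not in (0, 1) or p not in (0, 1):
        raise ValueError("d and p must each be 0 or 1")
    if d == 0:
        return "0", vdd / 2
    return ("+", vdd) if p == 1 else ("-", 0.0)


if __name__ == "__main__":
    for d in (0, 1):
        for p in (0, 1):
            print(d, p, ternary_driver(d, p))
```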



Fig. 2. 4X encoder and driver circuit

TABLE III
4X TERNARY SEQUENCE EXAMPLE

| Binary   | Ternary  | $X_{eff}$ |
|----------|----------|-----------|
| 11110111 | ++-000-+ |           |
| 00110101 | 000+0+   | 01100121  |
| 11100011 | ++-000-+ | 01220111  |
| 01010100 | 0+0+0+00 | 10112122  |
| 10101110 | -0-0-+-0 | 00001021  |
| 01110001 | 0+-+000- | 01212200  |
| 00000011 | 000000   | 13431121  |
| 00011110 | 000+++-0 | 00110121  |

### C. 3X Ternary Code

The 4X-code lowers the probability of 4X crosstalk significantly but does not eliminate it completely as we can see in Table III. The maximum speed of a 4X-code bus is still limited by  $X_{eff}$ =4. To further improve the bus speed, we propose a modified coding scheme that eliminates the 4X crosstalk completely. The coding scheme is called "3X ternary code" as it satisfies

$$\max\{X_{eff,j}\} \le 3, \forall j \in [1, n]$$

The 3X code is constructed by partitioning the bus into 5-bit lanes and inserting a grounded wire between lanes as a shield. Applying the same five rules given in Section III-B within each lane limits the maximum crosstalk to 3X. Owing to space limits, the proof of correctness is not given in this paper. Compared to the 4X code, such a design has up to 20% area overhead. Since the 3X code can speed up the bus by $\sim$33% compared to a 4X bus, this coding scheme offers an additional 11% gain in throughput per unit area.
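The 11% figure can be checked with a short calculation, assuming (as in Section II for $\lambda \gg 1$) that the achievable bus speed is inversely proportional to the maximum crosstalk class, and taking the worst-case 20% area overhead:

$$\frac{\text{speed}_{3X}}{\text{speed}_{4X}} = \frac{4}{3} \approx 1.33, \qquad \frac{\text{speed}_{3X}/\text{area}_{3X}}{\text{speed}_{4X}/\text{area}_{4X}} = \frac{4/3}{1.2} \approx 1.11$$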

The 3X CODEC is less regular than the 4X CODEC. In each 5-bit lane, the polarity encoder for the center bit has higher complexity than encoders for other bits. Each 5-bit lane is independent from others and therefore there is no ripple delay throughout the entire bus.

# IV. IMPLEMENTATION AND EXPERIMENTAL RESULTS

The ternary busses can be implemented in either voltage mode or current mode. A current mode bus implementation is illustrated in Figure 3(a). The current mode drivers and receivers are less sensitive to supply voltage variations. However, the power consumption is generally higher than a voltage mode circuit since a static current is required to drive the low impedance load.

A voltage-mode implementation is shown in Figure 3(b). The driver consists of two PMOS devices and one NMOS device. The three output voltages are set as:  $V_+=V_{dd}$ ,  $V_0=V_{dd}/2$  and  $V_-=0V$ . The receiver consists of two voltage comparators and the outputs from the comparators are decoded by the simple decoder.
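The two-comparator receiver can be sketched behaviorally as follows. The comparator thresholds ($V_{dd}/4$ and $3V_{dd}/4$, midway between adjacent output levels) and the function name `receive` are assumptions made for illustration; the description above does not specify them.

```python
def receive(v_line, vdd=1.0):
    """Decode one line voltage into (ternary level, binary bit).

    Two comparators slice the line voltage; the decoder then maps the
    middle level to binary 0 and both outer levels to binary 1.
    """
    above_low = v_line > 0.25 * vdd     # comparator 1 (assumed threshold)
    above_high = v_line > 0.75 * vdd    # comparator 2 (assumed threshold)
    if above_high:
        level = +1
    elif above_low:
        level = 0
    else:
        level = -1
    # Equivalent gate-level view: the bit is 1 when both comparators agree.
    bit = 1 if above_high == above_low else 0
    return level, bit


if __name__ == "__main__":
    for v in (0.0, 0.5, 1.0):
        print(v, receive(v))
```

With these thresholds the decoder reduces to an XNOR of the two comparator outputs, which is consistent with the description of the decoder as simple.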

We first compare our 4X ternary bus with a half-swing binary bus. Both busses have a maximum crosstalk of 4X. Our simulations show that for bus sizes of 5, 8, 16 and 32 bits, the 4X ternary code offers energy savings of 34.52%, 28.21%, 27.25% and 27.56% respectively. The CODEC circuit power consumption is less than 2% of the total power consumption for a 5mm bus.

Table IV compares the performance of the 4X and 3X ternary busses against several other busses<sup>1</sup>. All busses in the table are 16 bits wide except the true ternary bus, which is 11 bits wide. Also included in the table is a full-swing binary bus with passive shielding, since it is widely used in crosstalk-free bus designs (note that such a bus needs 31 wires for 16 bits). All busses used in the simulation are 5mm long. The total power consumption includes the drivers, receivers and CODEC circuit (if applicable), which are implemented using a 65nm process. The sizing of the ternary drivers is done so that the 0 to $0.25V_{dd}$ delay is equal to the 0 to $0.5V_{dd}$ delay of the full-swing drivers. The bus areas in the table are all normalized to the area of a 5mm, 16-wire bus. The same randomly generated vector sequences are used for all simulations. EF is defined as the normalized total energy, that is, $E_{total}$ computed with $C_I=1$ and $V_{step}=1$.

Table IV shows that the 4X ternary bus has the minimum energy consumption. It saves on average 68% over a full-swing bus with passive shielding and 27% over a half-swing binary bus. It is interesting to see that even though the true ternary bus has only 11 lines, it consumes $\sim 19\%$ more power than the 4X ternary bus. The 3X ternary code consumes about 8% more energy than the 4X ternary bus due to the coupling crosstalk to the added shielding lines. We use the Power-maxDelay Product (PDP), the product of EF and the normalized maximum delay, as the figure of merit for the overall bus performance. Table IV shows that the 3X ternary code has the minimal PDP while the full-swing binary bus with passive shielding has the worst PDP.

<sup>1</sup>Bus types: 4XT - 4X ternary; 3XT - 3X ternary; SB - full-swing binary bus with passive shielding; HB - half-swing binary; RP - random polarity ternary; TT - true ternary

(a) Current-mode bus driver and receiver

(b) Voltage-mode bus driver and receiver

Fig. 3. Current-mode and Voltage-mode busses

TABLE IV
BUS PERFORMANCE COMPARISON

| Bus type              | 4XT  | 3XT  | SB   | HB   | RP    | TT   |
|-----------------------|------|------|------|------|-------|------|
| EF $(10^4)$           | 6.13 | 6.67 | 19.7 | 8.38 | 12.1  | 7.55 |
| MaxDelay              | 4X   | 3X   | 4X   | 4X   | 8X    | 8X   |
| PDP $(10^5)$          | 2.45 | 2.00 | 7.88 | 3.35 | 9.68  | 6.04 |
| Pwr Saving (%)        | 68.9 | 66.1 | 0    | 57.5 | 38.6  | 61.7 |
| PDP Gain (%)          | 68.9 | 74.6 | 0    | 57.5 | -22.8 | 23.4 |
| Bus Area              | 1    | 1    | 1.97 | 1    | 1     | 0.68 |
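The derived rows of Table IV can be reproduced from its EF and MaxDelay rows. The sketch below assumes that PDP is EF multiplied by the crosstalk class of the maximum delay, and that both percentage rows are measured against the shielded binary bus (SB), which is consistent with the zeros in that column. The dictionary and function names are illustrative.

```python
# EF in units of 1e4 and maximum-delay class, copied from Table IV.
EF = {"4XT": 6.13, "3XT": 6.67, "SB": 19.7, "HB": 8.38, "RP": 12.1, "TT": 7.55}
MAX_DELAY = {"4XT": 4, "3XT": 3, "SB": 4, "HB": 4, "RP": 8, "TT": 8}


def derived_rows(baseline="SB"):
    """Return (PDP in units of 1e5, power saving %, PDP gain %) per bus type."""
    pdp = {bus: EF[bus] * MAX_DELAY[bus] / 10 for bus in EF}
    saving = {bus: 100 * (1 - EF[bus] / EF[baseline]) for bus in EF}
    gain = {bus: 100 * (1 - pdp[bus] / pdp[baseline]) for bus in EF}
    return pdp, saving, gain


if __name__ == "__main__":
    pdp, saving, gain = derived_rows()
    for bus in EF:
        print(f"{bus:>3}  PDP={pdp[bus]:.2f}  saving={saving[bus]:.1f}%  gain={gain[bus]:.1f}%")
```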



Fig. 4. Crosstalk distributions

Figure 4 shows the distribution of the crosstalk for several different types of busses. We can see that by applying our coding scheme, the distribution of the transitions shifts towards the lower $X_{eff}$ values. The true ternary bus has fewer transitions in total than the other busses since it has fewer lines (11). However, a significant portion of its transitions have 4X and higher crosstalk, so its total energy consumption is higher than that of the coded ternary busses.
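The 11-line width of the true ternary bus follows from the bit-density relation $m = \lfloor \log_2 3^n \rfloor$ in Section III-A. The short Python check below uses exact integer arithmetic rather than floating-point logarithms; the function names are illustrative.

```python
def binary_capacity(n_lines):
    """Largest m with 2**m <= 3**n_lines, i.e. floor(log2(3**n_lines))."""
    return (3 ** n_lines).bit_length() - 1


def lines_needed(m_bits):
    """Smallest number of true-ternary lines that can carry m_bits."""
    n = 1
    while binary_capacity(n) < m_bits:
        n += 1
    return n


if __name__ == "__main__":
    print(binary_capacity(10), binary_capacity(11))
    print(lines_needed(16))
```

Ten ternary lines can carry only 15 bits, so a 16-bit word needs 11 lines.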

# V. CONCLUSIONS

In this paper, we proposed two ternary bus coding schemes that reduce the maximum and total crosstalk on a bus and therefore simultaneously lower the power consumption and boost the speed of on-chip bus interconnects. With no area overhead, the proposed 4X bus improves the bus speed by $\sim 100\%$ compared to a true ternary bus or a full-swing binary bus. The 3X bus offers an additional 33% speed increase with a 20% area overhead. Simulation results show that both proposed ternary busses have significantly lower average power consumption than other coded and uncoded busses. Both coding schemes are simple and regular, allowing efficient circuit implementation of low-power and high-speed CODECs.

# REFERENCES

- [1] S. Khatri et al., "A Novel VLSI Layout Fabric for Deep Sub-Micron Applications", Proc. of Design Automation Conference, 1999.
- [2] K. Kim et al., "Coupling driven signal encoding scheme for low-power interface design", Proc. of ICCAD, 2000.
- [3] B. Victor and K. Keutzer, "Bus encoding to prevent crosstalk delay", Proc. of ICCAD, 2001.
- [4] C. Duan, A. Tirumala, S. P. Khatri, "Analysis and avoidance of cross-talk in on-chip bus", Proc. of Hot Interconnects, Aug 2001.
- [5] Saraswat, Haghani, Bernard, "A low power design of Gray and T0 codecs for the address bus encoding for system level power optimization". www.studentimaster.usilu.net/saraswap/prabhat/projects
- [6] S. R. Sridhara, A. Ahmed, and N. R. Shanbhag, "Area and Energy-Efficient Crosstalk Avoidance Codes for On-Chip Buses", Proc. of ICCD'04.
- [7] G. E. Beers and L. K. John, "A Novel Memory Bus Driver/Receiver Architecture for Higher Throughput", Proc. of Int'l Conf. on VLSI Design, pp. 259-264, 1998.
- [8] J. Madsen and S. Long, "A High-speed Interconnect Network using Ternary Logic", Proceedings of 25th International Symposium on MVL-S, May 1995.
- [9] P. Sotiriadis and A. Chandrakasan, "Low power bus coding techniques considering inter-wire capacitance", Proc. of IEEE CICC 2000, pp. 507-510.
- [10] V. Venkatraman and W. Burleson, "Robust Multi-Level Current-Mode On-Chip Interconnect Signaling in the Presence of Process Variations", Proc. of ISQED'05.
- [11] J. Zerbe et al., "1.6 Gb/s/pin 4-PAM Signaling and Circuits for a Multidrop Bus", IEEE J. of Solid-State Circuits, Vol. 36, No. 5, 2001.