# A Robust, Fast Pulsed Flip-Flop Design

Arunprasad Venkatraman, Rajesh Garg and Sunil P. Khatri

Department of ECE, Texas A&M University, College Station, TX 77843

arunprasadnitt@tamu.edu, rajeshgarg@tamu.edu and sunilkhatri@tamu.edu

## ABSTRACT

High-speed VLSI design utilizes heavy pipelining, resulting in a large number of flip-flops in the circuit. Hence there is a strong motivation to design fast, low-power and area-efficient flip-flops. In this paper, we present a pulsed flip-flop design based on a novel pulse generator circuit. Our design achieves significantly improved speed when compared to recent pulsed flip-flop designs, as well as a traditional master-slave D flip-flop. Monte Carlo simulations demonstrate that our design is significantly more robust to variations than the other flip-flops. Our design consumes low power as well. We have also performed the layout of our design and shown that its area is smaller than that of a traditional D flip-flop.

**Categories and Subject Descriptors:** B.7.1 [Types and Design Styles]: VLSI

**General Terms:** Design

**Keywords:** Flip-Flop, Latch

## 1. INTRODUCTION

Along with the relentless advance of technology and shrinking of device sizes, there is a constant need to increase the speed of operation and decrease power consumption. Latches and flip-flops are fundamental building blocks of sequential digital circuits [1]. A common technique for obtaining high system throughput is to increase the number of pipeline stages [2]. This has led to an increase in the number of latches and flip-flops in the circuit. As a consequence, both the dynamic and static components of power are increased. Dynamic power consumption is mainly caused by the charging and discharging of parasitic and load capacitances, while static power consumption occurs due to device leakage. The dynamic power consumed by the design increases linearly with clock frequency. The clock generation and distribution networks consume 20% to 40% of the total power [3, 4] of the design. A large fraction of this power is due to clocking of the latches and flip-flops of the design. In systems with stringent power constraints, the latches and flip-flops must therefore consume low power. For applications such as mobile phones and PDAs, flip-flops need to consume very little power and operate at high speed [5]. Therefore, lowering the power consumption of the flip-flop and increasing its speed of operation are among the most important concerns of circuit designers [6]. Also, with technology scaling, process variations are becoming very important. There is therefore a need for the circuit to operate flawlessly and robustly under the influence of process variations.

Copyright 2008 ACM 978-1-59593-999-9/08/05 ...\$5.00.

Traditionally, flip-flops consist of a master latch and a slave latch, connected back-to-back. The input data is captured at the latching edge of the master clock and delivered to the output at the releasing edge of the slave clock [7], which is typically the complement of the master clock. Thus a master-slave latch pair behaves as an edge-triggered flip-flop element. For a flip-flop implemented in a master-slave latch configuration, data needs to arrive before the clock edge, and so it has a positive setup time. Because the setup time is positive, the sum of the clock-to-Q delay ($T_{cq}$) and the setup time ($T_{su}$) is higher. This delay $T_{su} + T_{cq}$ is classically the figure of merit for timing, and a lower value is desired. Since this delay is higher for a flip-flop configured as a master-slave latch, this type of flip-flop cannot be used for high-speed applications. There is a strong motivation to design flip-flops using alternative circuit configurations, so that $T_{su} + T_{cq}$ is reduced.

Recently, pulsed flip-flops have been used. Such a flip-flop consists of a pulse generator circuit and a latch [8]. The latch is transparent for the time the pulse is high. The pulse is derived from the clock edge and hence is generated after the clock edge. As a result, data can also arrive after the clock edge, and so pulsed flip-flops can have a negative setup time. This enables pulsed flip-flop circuits to operate at high speeds, since $T_{su} + T_{cq}$ is reduced. In a pulsed flip-flop, the pulse generator circuit can be shared across multiple flip-flops, thereby reducing circuit area. In this paper we propose a novel pulse generator design and use it to make a pulsed flip-flop. The main contributions of this paper are:

- A novel pulse generator circuit is presented.
- The proposed pulsed flip-flop operates at very low power, compared to existing pulsed flip-flop designs.
- The proposed design needs a smaller clock period and hence can allow the synchronous design to operate at high speed.
- The design is shown to be robust by performing Monte Carlo Analysis.

The remainder of this paper is organized as follows. Section 2 discusses some important previous work in this area. In Section 3 we describe our novel pulse generator circuit and the proposed pulsed flip-flop. In Section 4, we present experimental results which demonstrate that our design is superior to the existing flip-flops. Finally, conclusions are presented in Section 5.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

GLSVLSI'08, May 4-6, 2008, Orlando, Florida, USA.

## 2. PREVIOUS WORK

Several kinds of flip-flops have been proposed over the years to minimize power consumption and increase the speed of operation. The transmission gate based master-slave flip-flop [9] is one of the simplest implementations. However, it has a positive setup time, which limits its speed of operation. To overcome the positive setup time issue and to reduce the area and power of flip-flops, pulsed flip-flops were introduced. In most of these approaches, separate pulse generator and latch circuits are used. In [10], the authors present a dual pulse generator circuit and a NAND keeper latch. However, their design occupies a larger area and also has a higher power consumption. In [11], the authors proposed an explicit pulsed flip-flop. Their latch circuit is clocked using a single transistor. Their pulse generator circuit simply delays the clock and inverts it before ANDing the inverted delayed clock with the clock to generate the pulse. Their pulse generator circuit is very simple, but their $T_{cq}$ is high and hence the $T_{cq} + T_{su}$ figure of merit is also high. As a result, their design cannot be used for high-speed applications.

The pulsed flip-flop proposed in [12] has a dynamic pulse generator circuit and a static latch. By using a dynamic pulse generator, the authors have achieved a good setup time. Also, $T_{cq} + T_{su}$ is low, which implies that their circuit can operate at higher speed. However, their layout area is large and their power consumption is high, as we will show in the sequel.

In [13], the pulsed flip-flop has a dynamic master stage and a static slave stage. In the dynamic stage, precharge and discharge occur alternately every clock cycle. This happens regardless of the output transition, resulting in unnecessary power consumption. In [14], the authors proposed an improved hybrid latch flip-flop with reduced power consumption. By modifying the dynamic master stage of the hybrid latch flip-flop, they have reduced the power consumption significantly. However, due to their higher $T_{cq}$, their speed of operation is lower. In contrast to all these designs, our proposed circuit has a significantly lower $T_{cq} + T_{su}$ and still consumes very low power. Our design also occupies less layout area and is very robust, as evidenced by Monte Carlo simulations. Among the recent prior approaches, [12] and [14] claim superior results. We therefore re-implemented [12] and [14], and compared our approach with these designs as well as a master-slave flip-flop, all implemented in a 100nm process.

## 3. OUR APPROACH

Our pulsed flip-flop consists of a pulse generator circuit and a latch. Most of the important characteristics of a pulsed flip-flop, such as $T_{su}$, hold time ($T_h$) and $T_{cq}$, are determined by the pulse generator circuit [12]. Hence the pulse generator circuit must be carefully implemented. As stated earlier, the figure of merit of a flip-flop is $T_{su} + T_{cq}$. We first explain why $T_{su} + T_{cq}$ is a useful figure of merit. Consider the circuit shown in Figure 1. Let $D$ be the delay of the combinational logic between the two flip-flops, $T_{su}$ the setup time of the flip-flop and $T_{cq}$ its clock-to-Q delay. If $T$ is the clock period, then $T > T_{su} + T_{cq} + D$ is required [7] for the data to be sampled correctly. So the figure of merit that determines the speed of operation is $T_{su} + T_{cq}$. If this quantity is lower, the speed of operation will be higher, and vice versa.
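The constraint $T > T_{su} + T_{cq} + D$ can be illustrated with a small calculation. The sketch below is only an illustration: the function names and all numeric values in the example (the logic delay and both flip-flop timings) are hypothetical and are not measurements from this paper.

```python
def min_clock_period_ps(t_su_ps, t_cq_ps, logic_delay_ps):
    """Lower bound on the clock period from T > T_su + T_cq + D (all in ps)."""
    return t_su_ps + t_cq_ps + logic_delay_ps


def max_clock_frequency_ghz(t_su_ps, t_cq_ps, logic_delay_ps):
    """Upper bound on the clock frequency implied by the same constraint."""
    return 1000.0 / min_clock_period_ps(t_su_ps, t_cq_ps, logic_delay_ps)


if __name__ == "__main__":
    logic_delay_ps = 400.0  # hypothetical combinational delay D
    # Hypothetical flip-flop with a positive setup time.
    print(min_clock_period_ps(20.0, 70.0, logic_delay_ps))
    # Hypothetical flip-flop with a negative setup time: the bound is smaller
    # even though its clock-to-Q delay is larger.
    print(min_clock_period_ps(-70.0, 95.0, logic_delay_ps))
```

For a fixed logic delay, only the sum $T_{su} + T_{cq}$ changes the bound, which is why it serves as the figure of merit.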

Figure 1: Timing constraint in a flip-flop

Keeping this in mind, we propose the novel pulse generator circuit shown in Figure 2. All device sizes in all our figures are in microns, and all devices have minimum channel length. A 1X inverter has a PMOS transistor width of 0.2 microns and an NMOS width of 0.1 microns. A 1X NAND gate has a width of 0.2 microns for both PMOS and NMOS transistors. The goal of the pulse generator circuit is to deliver a pulse at the rising edge of the clock, as shown in the timing diagram of Figure 3. Since the pulse is generated after the rising edge of the clock, data can arrive even after the clock edge. This ensures that the flip-flop has a negative setup time.

The proposed pulse generator consists of two PMOS transistors, P1 and P2, and two NMOS transistors, N1 and N2. The clock signal is fed to the gate of PMOS transistor P1. P2 helps to pass the signal at node Z when the clock is low, and the transistors N1 and N2 help to pull it down. The pulse generator works as follows. When the clock falls, node Z is pulled up to VDD by P1. The size of this PMOS device determines the delay between the falling edge of CLK and the rise of Z. When CLK rises, the PMOS device P2 acts as a passgate, allowing NMOS device N2 to discharge internal node W. Until W is fully discharged, it helps discharge Z by means of the NMOS device N1. Hence Z falls with an extremely fast slew rate. The signal at node Z is NANDed with the clock, and the output of the NAND gate is inverted to obtain the desired pulse. Note that the fast falling slew rate on Z allows us to achieve a sharp pulse with short rise and fall times.
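The logical behaviour described above (Z is high while the clock is low, Z falls shortly after the clock rises, and the pulse is the inverted NAND of CLK and Z) can be mimicked with an idealized discrete-time model. This is a purely behavioural sketch, not a transistor-level model: gate delays are ignored, and `fall_delay` and `rise_delay` are hypothetical parameters standing in for the discharge and pull-up times of node Z.

```python
def simulate_pulse_generator(clk, fall_delay, rise_delay):
    """Idealized model: returns (z_trace, pulse_trace) for a sampled clock.

    clk        -- list of 0/1 samples, one per time step
    fall_delay -- steps Z stays high after CLK rises (hypothetical)
    rise_delay -- steps Z stays low after CLK falls (hypothetical)
    """
    z_trace, pulse_trace = [], []
    z = 1                # assume Z starts pulled up
    held = 0             # steps CLK has held its current level
    prev = clk[0]
    for level in clk:
        held = held + 1 if level == prev else 1
        prev = level
        if level == 1 and held > fall_delay:
            z = 0        # Z has been discharged after the rising edge
        elif level == 0 and held > rise_delay:
            z = 1        # Z has been pulled back up after the falling edge
        z_trace.append(z)
        nand_out = 0 if (level and z) else 1
        pulse_trace.append(1 - nand_out)  # inverter after the NAND gate
    return z_trace, pulse_trace


if __name__ == "__main__":
    clk = [0] * 4 + [1] * 6 + [0] * 6 + [1] * 6
    z, pulse = simulate_pulse_generator(clk, fall_delay=2, rise_delay=1)
    print("CLK  ", clk)
    print("Z    ", z)
    print("PULSE", pulse)
```

In this model the pulse starts at each rising clock edge and its width equals `fall_delay`, which reflects why a fast, well-controlled fall of Z matters for the pulse shape.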



Figure 2: The Proposed pulse generator



Figure 3: Waveform obtained at various nodes in pulse generator circuit

The other component in a pulsed flip-flop is a latch. We have used the latch shown in Figure 4. This latch is transparent when its clock input (the pulse) is high. The latch circuit is a tristate inverter with a static keeper. The pulse signal is fed to NMOS transistor N2, while its complement is fed to the PMOS transistor P1. The input D is fed into transistors P2 and N1 as shown in Figure 4. The keeper circuit consists of two back-to-back inverters. The feedback inverter uses long-channel devices and is used to hold the output. Since we have a negative setup time and a smaller delay from D to Q, we also have a better $T_{su} + T_{cq}$ and hence a higher speed of operation.

Compared to master-slave flip-flops, pulsed flip-flops require only one latch per flip-flop. They also allow logic to borrow time across cycle boundaries [12]. On the other hand, the pulsed flip-flop structure uses a pulse generator, which consumes a lot of power. By careful design we can share this pulse generator between two or more flip-flops. In this way we can bring about a significant reduction in the power and area of the circuit. This makes our pulsed flip-flop well suited for low-power and high-speed applications. To guarantee robust operation over process, voltage and temperature variations, our devices were carefully sized. A Monte Carlo simulation was performed to test the robustness of the design. We ensured that all the flip-flops that are reported in Section 4 were sized to work correctly for all the Monte Carlo simulations.
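A level-sensitive latch driven by a short pulse can be sketched behaviourally as follows, to show why data arriving after the clock edge is still captured. The model is an assumption-laden simplification: it ignores the inversion of the tristate stage, all delays, and the keeper's analog behaviour, and the example waveforms are hypothetical.

```python
def latch_output(d_trace, pulse_trace, q_init=0):
    """Behavioural transparent latch: follows D while the pulse is high."""
    q = q_init
    out = []
    for d, pulse in zip(d_trace, pulse_trace):
        if pulse:
            q = d       # transparent: the input is passed through
        out.append(q)   # opaque: the keeper holds the last value
    return out


if __name__ == "__main__":
    # The clock edge is at step 2; the pulse is high for steps 2..4.
    pulse = [0, 0, 1, 1, 1, 0, 0, 0]
    # D changes one step AFTER the clock edge but inside the pulse window.
    d_inside = [0, 0, 0, 1, 1, 1, 1, 1]
    # D changes only after the pulse has closed.
    d_late = [0, 0, 0, 0, 0, 1, 1, 1]
    print(latch_output(d_inside, pulse))  # new value is captured
    print(latch_output(d_late, pulse))    # new value is missed
```

The first case corresponds to a negative setup time: the data transition lags the clock edge yet is still latched because the pulse window is open.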



Figure 4: The latch structure

## 4. EXPERIMENTAL RESULTS

We simulated the proposed pulsed flip-flop in SPICE using a 100nm BSIM [15] model card. To evaluate the performance of our design, we also implemented the pulsed flip-flops of [12] and [14], and compared the results with a traditional master-slave D flip-flop. All designs were implemented using the same 100nm BSIM models. The circuit diagrams of [12] and [14] are shown in Figure 6 and Figure 7 respectively. All device sizes are in microns. To verify robustness, we sized the devices of all four flip-flops such that they work correctly across variations in supply voltage, threshold voltage and channel length. This was done by performing 500 Monte Carlo simulations for each flip-flop. The mean value of each varied parameter (VDD, $V_T$ and channel length) was taken to be equal to its nominal value, and the standard deviation was taken to be 3.34% of the nominal value (so that three standard deviations correspond to roughly 10% of the nominal value). The Monte Carlo simulations were done at room temperature. $T_{su}$, $T_{cq}$ and $T_h$ were measured with respect to the clock edge. Since the D flip-flop and the design from [14] have no separate pulse, they have no entry for pulse width.

The results of the simulations are shown in Table 1. From Table 1, we can clearly see that our design has a significantly lower $T_{su} + T_{cq}$ delay, and hence the fastest operating speed among the compared designs. Our design has a 60% (68%) lower $T_{su} + T_{cq}$ delay when compared to [12] ([14]) and a 71% lower $T_{su} + T_{cq}$ when compared to the D flip-flop. We can also see that our design has 40% lower power dissipation when compared to [12]. Our proposed pulsed flip-flop has a power consumption which is comparable to that of [14] and the D flip-flop. The leakage currents for our proposed flip-flop, [12], [14] and a traditional master-slave D flip-flop are 21nA, 25nA, 5nA and 9nA respectively. Our proposed design also has a standard deviation of $T_{su} + T_{cq}$ which is significantly lower than that of the other three designs, resulting in a highly robust flip-flop. Note that when we performed the 500 Monte Carlo simulations, we found the value of $T_{su} + T_{cq}$ for each simulation and then computed its $\mu$ and $\sigma$. The major portion of the power in our design is consumed by the pulse generation circuit. However, as stated earlier, we can share our pulse generator among more than one latch, which is true for any pulsed flip-flop that has an explicit pulse generator. In this way we can consume less power when our pulsed flip-flop is used in a design which contains a large number of flip-flops. Also, as pointed out earlier, since our design has only one latch per half-cycle, time-borrowing is feasible. This is true for [12] and [14] as well.
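The Monte Carlo procedure described above (Gaussian sampling of each parameter with $\sigma$ equal to 3.34% of nominal, 500 runs, then computing $\mu$ and $\sigma$ of the measured quantity) can be sketched as follows. This is a minimal illustration under stated assumptions: the nominal values in `NOMINAL` are hypothetical, `toy_delay_ps` is a placeholder for a circuit simulation rather than a physical model, and the use of the sample standard deviation is our choice, since the text does not specify which estimator was used.

```python
import random
import statistics

SIGMA_FRACTION = 0.0334  # per the text: sigma is 3.34% of each nominal value

# Hypothetical nominal values, for illustration only (not from the paper).
NOMINAL = {"vdd_v": 1.2, "vt_v": 0.3, "length_nm": 100.0}


def sample_parameters(nominal, rng):
    """Draw one Gaussian sample per parameter around its nominal value."""
    return {name: rng.gauss(value, SIGMA_FRACTION * value)
            for name, value in nominal.items()}


def monte_carlo(measure, nominal=NOMINAL, runs=500, seed=0):
    """Return (mean, sample standard deviation) of measure() over all runs."""
    rng = random.Random(seed)
    results = [measure(sample_parameters(nominal, rng)) for _ in range(runs)]
    return statistics.mean(results), statistics.stdev(results)


def toy_delay_ps(params):
    """Placeholder for a circuit simulation; NOT a physical delay model."""
    return 100.0 * (params["length_nm"] / 100.0) * (1.2 / params["vdd_v"])


if __name__ == "__main__":
    mu, sigma = monte_carlo(toy_delay_ps)
    print(f"mean = {mu:.2f} ps, sigma = {sigma:.2f} ps")
```

In an actual flow, `measure` would launch a circuit simulation with the sampled parameters and return the quantity of interest, such as $T_{su} + T_{cq}$.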

The minimum delay constraint for a flip-flop is $D_{min} > T_h - T_{cq}$ [5]. $D_{min}$, $T_h$ and $T_{cq}$ are assumed to be normally distributed random variables; therefore, for 99% yield we require $D_{min}^{\mu-3\sigma} > T_h^{\mu+3\sigma} - T_{cq}^{\mu-3\sigma}$. Using the $\mu$ and $\sigma$ values of $T_h$ and $T_{cq}$ from Table 1, $D_{min}$ for our proposed flip-flop, [14], [12] and a traditional master-slave D flip-flop are 50ps, -16ps, 82ps and 3ps respectively. The $D_{min}$ required for our design is almost a single gate delay, and so we will not have any hold time violations even in the worst case. We have also measured the clock load for all the designs. The clock load is calculated as the sum of the gate area driven by the clock signal. We find that [12] has the lowest clock load, but it has the highest power consumption among the compared designs. Our proposed pulsed flip-flop and the other two designs have similar clock load and power consumption, as shown in Column 8 of Table 1.
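As a worked example of the worst-case hold constraint, the sketch below takes the hold time at its slow corner ($\mu + 3\sigma$) and the clock-to-Q delay at its fast corner ($\mu - 3\sigma$), which is the pessimistic combination for $D_{min} > T_h - T_{cq}$. The function name is ours; the numbers are the Table 1 entries for the proposed flip-flop, and the result is about 51ps, in line with the roughly 50ps quoted in the text.

```python
def worst_case_min_delay_ps(th_mu, th_sigma, tcq_mu, tcq_sigma, k=3.0):
    """Worst-case bound on D_min: slowest hold time minus fastest clock-to-Q."""
    slow_hold = th_mu + k * th_sigma
    fast_tcq = tcq_mu - k * tcq_sigma
    return slow_hold - fast_tcq


if __name__ == "__main__":
    # Table 1, proposed flip-flop: T_h = 87.47 +/- 11.21 ps,
    # T_cq = 95.17 +/- 8.47 ps.
    d_min = worst_case_min_delay_ps(87.47, 11.21, 95.17, 8.47)
    print(f"D_min bound for the proposed design: {d_min:.2f} ps")
```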

We have also compared our design with the D flip-flop (shown in Figure 5) in terms of layout area. Our proposed pulsed flip-flop occupies an area of 17.2 $\mu$m<sup>2</sup>. In contrast, a master-slave D flip-flop occupies an area of 23.61 $\mu$m<sup>2</sup>. Thus our design occupies 27% less area than a traditional master-slave flip-flop. By sharing the pulse generator across more than one latch, we can reduce the area further. The layouts of the D flip-flop and our proposed design are shown in Figure 8 and Figure 9 respectively.
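The relative reductions discussed in this section can be recomputed directly from the mean values in Table 1 and the layout areas above. The helper below is a simple sketch; the function name and the dictionary layout are ours, while the numbers are taken from the paper.

```python
def percent_reduction(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Mean T_su + T_cq in ps, from Table 1.
TSU_PLUS_TCQ_PS = {"proposed": 26.3, "[12]": 65.8, "[14]": 82.6,
                   "master-slave": 91.2}
# Mean power in microwatts, from Table 1.
POWER_UW = {"proposed": 8.68, "[12]": 14.57}
# Layout area in square microns, from the text.
AREA_UM2 = {"proposed": 17.2, "master-slave": 23.61}

if __name__ == "__main__":
    for name in ("[12]", "[14]", "master-slave"):
        delay_gain = percent_reduction(TSU_PLUS_TCQ_PS[name],
                                       TSU_PLUS_TCQ_PS["proposed"])
        print(f"T_su + T_cq reduction vs {name}: {delay_gain:.1f}%")
    power_gain = percent_reduction(POWER_UW["[12]"], POWER_UW["proposed"])
    area_gain = percent_reduction(AREA_UM2["master-slave"],
                                  AREA_UM2["proposed"])
    print(f"power reduction vs [12]: {power_gain:.1f}%")
    print(f"area reduction vs master-slave: {area_gain:.1f}%")
```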



Figure 5: Transmission gate based D flip-flop

| Flip-flop | $T_{cq}$ (ps) $\mu$ | $T_{cq}$ (ps) $\sigma$ | Pulse width (ps) $\mu$ | Pulse width (ps) $\sigma$ | Power ($\mu$W) $\mu$ | Power ($\mu$W) $\sigma$ | Setup time $T_{su}$ (ps) $\mu$ | Setup time $T_{su}$ (ps) $\sigma$ | Hold time $T_h$ (ps) $\mu$ | Hold time $T_h$ (ps) $\sigma$ | $T_{su} + T_{cq}$ (ps) $\mu$ | $T_{su} + T_{cq}$ (ps) $\sigma$ | Clock load ($\mu m^2$) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Proposed pulsed flip-flop | 95.17 | 8.47 | 78.17 | 14.12 | 8.68 | 1.12 | -68.87 | 11.19 | 87.47 | 11.21 | 26.3 | 2.68 | 0.11 |
| Hybrid pulsed flip-flop [14] | 117 | 14.72 | - | - | 8.43 | 0.96 | -34.4 | 1.98 | 42 | 4.84 | 82.6 | 11.84 | 0.13 |
| Pulsed flip-flop [12] | 120.4 | 29.3 | 52.18 | 3.7 | 14.57 | 1.8 | -54.2 | 4.14 | 108.24 | 11.56 | 65.8 | 7.26 | 0.05 |
| Master-slave D flip-flop | 69.87 | 11.48 | - | - | 7.472 | 1.32 | 21.36 | 2.48 | 29.88 | 3.12 | 91.2 | 8.72 | 0.09 |

Table 1: Comparison of different Flip-Flop designs

Figure 6: Explicit Pulsed flip-flop [12]

Figure 7: Improved Hybrid latch flip-flop [14]

## 5. CONCLUSIONS

In this paper, we have presented a novel pulsed flip-flop design, and demonstrated that it consumes low power and area while achieving very high performance. Our pulsed flip-flop uses a novel pulse generator circuit. We compared our design against two pulsed flip-flops from the literature, along with a traditional master-slave D flip-flop. The robustness of the design was verified by performing Monte Carlo analysis. When compared to the other designs, our design has a significantly lower $T_{cq} + T_{su}$ delay and also consumes low power. For any pulsed flip-flop, we can use a single pulse generator for more than one latch and thus reduce the power consumption further. Our design outperformed the other pulsed flip-flops by 60% to 70% in terms of delay, and also has a lower standard deviation of $T_{su} + T_{cq}$ in comparison with the other flip-flops. Our design also has 27% less layout area than a conventional D flip-flop. This area can be further reduced by sharing the pulse generator between more than one latch.

Figure 8: D flip-flop layout

Figure 9: Our pulsed flip-flop layout

## 6. REFERENCES

- [1] R. S. Yeo, K.S. and W. Goh, CMOS/BiCMOS ULSI: low voltage and low power. Prentice Hall PTR, Upper Saddle River, NJ, 2002.
- [2] B. W. Chandrakasan, A. and F. Fox, Design of high performance microprocessor circuits. IEEE Press, Piscataway, New Jersey, 2001.
- [3] L. D. and C. Svensson, "Power consumption estimation in CMOS VLSI chips," *IEEE J. Solid-State Circuits*, pp. 663–670, 1994, 29(6).
- [4] H. Kawaguchi and T. Sakurai, "A reduced clock swing flip-flop (RCSFF) for 63% power reduction," *IEEE J. Solid-State Circuits*, pp. 807–811, 1998, 33(5).
- [5] A. Chandrakasan and Brodersen, Low power CMOS design. IEEE Press, New York, 1998.
- [6] P. M. and J. Rabaey, Low power design methodologies. Kluwer, Norwell, MA, USA, 1996.
- [7] N. Weste and E. K., Principles of CMOS VLSI design: A systems perspective. Prentice Hall, New Jersey, 2003, 2nd edition.
- [8] J. Rabaey, Digital Integrated Circuits: A Design Perspective. Prentice Hall Electronics and VLSI Series, Prentice Hall, 1996.
- [9] S. V and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high performance and low power systems," *IEEE J. Solid-State Circuits*, pp. 536–548, April 1999.
- [10] W. Kim, D. Shin, H. Yun, J. Kim, and S. Min, "Performance comparison of dynamic voltage scaling algorithms for hard real-time systems," in *Proc. of IEEE Real-Time and Embedded Technology and Applications Symposium*, pp. 219–228, 2002.
- [11] D. T. Zhao, P. and M. Bayoumi, "Low power and high speed explicit pulsed flip-flops," *The 45th Midwest Symp. on Circuits and Systems*, Tulsa, OK, USA, Aug. 2002.
- [12] W. M. W. Phyu and K. S. Yeo, "Low-power/high performance explicit-pulsed flip-flop using static latch and dynamic pulse generator," *IEE Proc. Circuits Devices and Syst.*, vol. 153, June 2006.
- [13] H. Partovi et al., "Flow through latch and edge triggered flip-flop hybrid elements," *International Solid State Circuits Conference Digest of Technical Papers*, pp. 138–139, February 1996.
- [14] S. Goel and M. Bayoumi, "Improved Hybrid Latch flip-flop for low-power VLSI systems," tech. rep., Electronics Research Lab, VLSI Research Lab, Centre for Advanced Computer Studies, University of Louisiana at Lafayette.
- [15] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit design," in *Proc. of IEEE Custom Integrated Circuit Conference*, pp. 201–204, Jun 2000. http://www-device.eecs.berkeley.edu/~ptm.