# Power and Area-Efficient Static-RAM Using Spinner Cells

B. RAJASEKHAR REDDY<sup>1</sup>, G. MOUNIKA SAI<sup>2</sup>, J. VINAY<sup>3</sup>, A. KAVYA SREE<sup>4</sup>, G. VAMSI<sup>5</sup>

1. Associate Professor, Dept. of ECE, Sri Mittapalli College of Engineering, Tummlapalem, Guntur, A.P., E-mail: rajasekhar.reddy61@gmail.com

2,3,4,5. Student, Dept. of ECE, Sri Mittapalli College of Engineering, Tummlapalem, Guntur, A.P., E-mail: gmouni98@gmail.com

Abstract: This paper proposes a power- and area-efficient SRAM using pulsed latches. Area and power consumption are reduced by replacing flip-flops with pulsed latches. The timing problem between pulsed latches is solved by using multiple non-overlapping delayed pulsed clock signals instead of a conventional single pulsed clock signal. The shift register needs only a small number of pulsed clock signals because the latches are grouped into several sub shift registers, each with an additional temporary storage latch. We extend this work by designing an SRAM, as an application, based on the proposed shift register. A 256-bit SRAM using pulsed latches is designed in Verilog HDL using the Xilinx tool. The SRAM built with the conventional register is compared with the SRAM built with the proposed shift register, and the latter is shown to be more power- and area-efficient.

*Index Terms*: area-efficient, flip-flop, pulsed clock, pulsed latch, shift register, RAM.

## **I. INTRODUCTION**

Flip-flops (FFs) are the basic storage elements used extensively in all kinds of digital designs. In particular, digital designs nowadays often adopt intensive pipelining techniques and employ many FF-rich modules such as register file, shift register, and first in first out. It is also estimated that the power consumption of the clock system, which consists of clock distribution networks and storage elements, is as high as 50% of the total system power. FFs thus contribute a significant portion of the chip area and power consumption to the overall system design. Pulse-triggered FF (P-FF), because of its single-latch structure, is more popular than the conventional transmission gate (TG) and master-slave based FFs in high-speed applications. Besides the speed advantage, its circuit simplicity lowers the power consumption of the clock tree system. A P-FF consists of a pulse generator for strobe signals and a latch for data storage. If the triggering pulses are sufficiently narrow, the latch acts like an edge-triggered FF.

Since only one latch is needed, as opposed to two in the conventional master-slave configuration, a P-FF has lower circuit complexity. This leads to a higher toggle rate for high-speed operation. P-FFs also allow time borrowing across clock cycle boundaries and feature a zero or even negative setup time. Here we present a low-power pulse-triggered flip-flop based on a signal feed-through scheme. The design shortens the longer delay by feeding the input signal directly to an internal node of the latch to speed up the data transition. This mechanism is implemented by introducing a simple pass transistor for extra signal driving. When combined with the pulse generation circuitry, it forms a new P-FF design with enhanced speed and power-delay product (PDP) performance.

The clock system, which consists of the clock distribution network and timing elements (flip-flops and latches), is one of the most power-consuming components in a VLSI system. It accounts for approximately 30% to 60% of the total power dissipation in a system. As a result, reducing the power consumed by flip-flops has a strong impact on the total power consumption. In common digital VLSI circuits, the sources of power dissipation are switching power ($P_{switching}$), short-circuit power ($P_{shortcircuit}$), static power ($P_{static}$) and leakage power ($P_{leakage}$).

The following equation gives the total power consumption ($P_{tot}$) in terms of these four power components.

$$P_{tot} = P_{switching} + P_{shortcircuit} + P_{static} + P_{leakage} \qquad (1)$$

The important ways to reduce this power consumption are voltage scaling and double-edge triggering. Voltage scaling is the most effective way to decrease power consumption, since power is proportional to the square of the voltage. The well-known equation for the dynamic power consumption of VLSI circuits is $P = C_L V_{dd}^2 f_{clk}$, where $C_L$ is the load capacitance, $V_{dd}$ the supply voltage and $f_{clk}$ the clock frequency. However, voltage scaling is associated with threshold voltage scaling, which can cause the leakage to increase exponentially. On the other hand, double-edge triggered clocking can save half of the power on the clock distribution network, which lowers the total power consumption. Double-edge triggering means that a flip-flop responds to both the positive (0 to 1) and negative (1 to 0) clock edges, so the clock frequency can be cut in half for the same data rate. In this paper, the second method, double-edge triggering, is proposed to implement a clock branch sharing implicit pulse (CBS\_ip) flip-flop, and a comparison is made with the existing double-edge triggered flip-flops.
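The effect of the two techniques on dynamic power can be checked numerically. The following is a minimal sketch of $P = C_L V_{dd}^2 f_{clk}$; the function name and the capacitance, voltage and frequency values are illustrative assumptions, not figures from this work.

```python
def dynamic_power(c_load, v_dd, f_clk):
    """Dynamic (switching) power in watts: P = C_L * Vdd^2 * f_clk."""
    return c_load * v_dd ** 2 * f_clk


# Illustrative operating point (assumed values, not measured data).
C_LOAD = 10e-12   # 10 pF load capacitance
V_DD = 1.8        # supply voltage in volts
F_CLK = 100e6     # 100 MHz clock

baseline = dynamic_power(C_LOAD, V_DD, F_CLK)
half_voltage = dynamic_power(C_LOAD, V_DD / 2, F_CLK)   # voltage scaling
half_clock = dynamic_power(C_LOAD, V_DD, F_CLK / 2)     # double-edge triggering

print(f"baseline     : {baseline * 1e3:.3f} mW")
print(f"half voltage : {half_voltage / baseline:.2f} x baseline")
print(f"half clock   : {half_clock / baseline:.2f} x baseline")
```

Halving the supply voltage gives a fourfold reduction because of the quadratic term, while halving the clock frequency gives a twofold reduction. The sketch ignores the leakage increase that accompanies threshold voltage scaling.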

Testing of any chip is mandatory to guarantee its functionality after the manufacturing process. Different testing techniques have been proposed for different types of circuit. Scan-based testing is one of the popular testing techniques for digital circuits. During the logic synthesis phase of ASIC design, the classical D flip-flop is replaced by the scan flip-flop as a design for testability. The logic diagram of the scan flip-flop is shown in Figure 1. Usually the scan flip-flop is the combination of a multiplexer and a D flip-flop. These scan flip-flops are connected as a shift register to pass the test vectors into the circuit.
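The multiplexer-plus-flip-flop structure and the serial shift-in can be illustrated with a small behavioral model. This is only a sketch: the function names `scan_ff_next` and `shift_in` are hypothetical, and the functional input `d` is tied to 0 because only test mode is exercised.

```python
def scan_ff_next(d, scan_in, scan_enable):
    """Multiplexer in front of the D flip-flop: scan_in in test mode, d otherwise."""
    return scan_in if scan_enable else d


def shift_in(chain_state, test_vector):
    """Serially shift a test vector into a scan chain, one bit per clock."""
    state = list(chain_state)
    for bit in test_vector:
        # Each cell captures the output of the previous cell; cell 0 takes the new bit.
        inputs = [bit] + state[:-1]
        state = [scan_ff_next(d=0, scan_in=s, scan_enable=1) for s in inputs]
    return state


print(shift_in([0, 0, 0, 0], [1, 0, 1, 1]))  # [1, 1, 0, 1]
```

The first bit shifted in ends up in the last cell of the chain, so the final chain contents are the test vector in reverse order.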



Figure 1: Block diagram of Scan flip flop

The testing cycle is a sequence of three cycles: shift-in, capture and shift-out. During the shift-in and shift-out cycles the circuit remains in test mode, and during the capture cycle it remains in normal mode. Because test vectors are shifted serially, the power consumption during a shift cycle is directly proportional to the switching activity of the components in the circuit. Zorian showed that power dissipation during the test mode of an IC is significantly higher than during normal mode.

Different techniques have been proposed to reduce the test power during both the shift cycle and the capture cycle. A software-based method of reducing test power during the shift cycle was proposed by Dobholkar, in which test vectors are reordered to reduce the number of transitions in the circuit by 10% to 14%. Kajihara proposed a software-based method that reduces the switching activity in the circuit by filling each don't-care value with the adjacent value on its left. This method reduces the switching activity by 36% to 47%. Preferred fill is a software-based power reduction method proposed by Ramersaro to reduce the switching activity of the circuit during the capture cycle. There are also a few hardware-based methods of reducing test power. Gerstendorfer proposed adding a NOR gate to the scan cell to hold a constant output value in the combinational circuit during scanning. Swarup Bhunia proposed inserting an extra supply gating transistor in the supply-to-ground path of the first-level gates at the outputs of the scan flip-flops. This method showed improvements of 62% in area overhead, 101% in power overhead and 94% in delay overhead. Amit Mishra proposed a modified scan flip-flop for low-power testing in which the flip-flop disables the slave latch during scan and uses an alternate low-cost dynamic latch. Many other latches and flip-flops using different techniques have been proposed to reduce power and delay during testing.
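The adjacent-fill idea described above (each don't-care takes the value on its left) can be sketched as follows. The string encoding with `'X'` for don't-care, the function names, and the choice of `'0'` for a don't-care with no specified bit to its left are assumptions made for illustration.

```python
def adjacent_fill(vector, default="0"):
    """Fill each don't-care 'X' with the nearest specified bit on its left."""
    filled = []
    last = default  # assumption: leading don't-cares take this default value
    for bit in vector:
        if bit == "X":
            filled.append(last)
        else:
            filled.append(bit)
            last = bit
    return "".join(filled)


def transitions(vector):
    """Count adjacent bit positions that differ (a proxy for shift switching activity)."""
    return sum(a != b for a, b in zip(vector, vector[1:]))


cube = "1XX0X1XX"
filled = adjacent_fill(cube)
print(filled, transitions(filled))                               # 11100111 2
print(cube.replace("X", "0"), transitions(cube.replace("X", "0")))  # 10000100 3
```

For this test cube, adjacent fill produces two transitions while filling every don't-care with 0 produces three, which is the kind of reduction the method targets.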

#### **II. Existing System**

The existing method is the design of a shift register using pulsed latches. The architecture of the shift register includes a pulsed clock generator, which generates the clock pulses for the latches. It also includes sub shift register blocks and temporary storage latches that introduce a time delay.



Fig. 2: Typical master slave flip-flop.

Fig. 2 shows the schematic of a typical master-slave flip-flop. The maximum clock frequency in the conventional shift register is limited only by the delay of the flip-flops. Therefore, area and power consumption are more important than speed when selecting the flip-flops. The proposed shift register uses latches instead of flip-flops to reduce the area and power consumption.

In the conventional delayed pulsed clock circuits, the clock pulse width must be larger than the summation of the rising and falling times in all inverters in the delay circuits. In the delayed pulsed clock generator, however, the clock pulse width can be shorter than the summation of the rising and falling times because each sharp pulsed clock signal is generated from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator is suitable for short pulsed clock signals.

#### **III. Proposed System**

#### **Proposed Shift Register:**

A master-slave flip-flop using two latches in Fig.3(a) can be replaced by a pulsed latch consisting of a latch and a pulsed clock signal in Fig.3(b). All pulsed latches share the pulse generation circuit for the pulsed clock signal. As a result, the area and power consumption of the pulsed latch become almost half of those of the master-slave flip-flop.



Fig. 3: (a) Master-slave flip-flop. (b) Pulsed latch.

The pulsed latch is an attractive solution for small area and low power consumption. However, the pulsed latch cannot be used directly in shift registers due to the timing problem shown in Fig. 4. The shift register in Fig. 4(a) consists of several latches and a pulsed clock signal (CLK\_pulse). The operation waveforms in Fig. 4(b) show the timing problem in the shift register. The output signal of the first latch (Q1) changes correctly because the input signal of the first latch (IN) is constant during the clock pulse width. But the second latch has an uncertain output signal (Q2) because its input signal (Q1) changes during the clock pulse width.
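The timing problem can be made concrete with a coarse behavioral model. The sketch below assumes that every latch is transparent for the whole pulse and that data advances one latch per "latch delay" time step; the function name and the unit-delay abstraction are simplifying assumptions, not a circuit-level simulation.

```python
def shared_pulse_shift(latches, data_in, pulse_width_in_latch_delays):
    """Apply one shared clock pulse to a chain of transparent latches.

    Each loop iteration models one latch propagation delay while the
    pulse is still high, so data can travel through several stages.
    """
    q = list(latches)
    for _ in range(pulse_width_in_latch_delays):
        q = [data_in] + q[:-1]
    return q


# Pulse exactly one latch delay wide: behaves like a proper 1-bit shift.
print(shared_pulse_shift([0, 0, 0, 0], 1, 1))  # [1, 0, 0, 0]

# Pulse three latch delays wide: the input races through three stages.
print(shared_pulse_shift([0, 0, 0, 0], 1, 3))  # [1, 1, 1, 0]
```

Because the result depends on how the pulse width compares with the latch delay, the contents of the second and later latches are not well defined with a single shared pulse.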



Fig 4: Shift register with latches and a pulsed clock signal. (a) Schematic. (b)Waveforms

One solution to the timing problem is to add delay circuits between the latches, as shown in Fig. 5(a). The output signal of each latch is delayed by $T_{DELAY}$ and reaches the next latch after the clock pulse. As shown in Fig. 5(b), the output signals of the first and second latches (Q1 and Q2) change during the clock pulse width ($T_{PULSE}$), but the input signals of the second and third latches (D2 and D3) become the same as the output signals of the first and second latches (Q1 and Q2) only after the clock pulse. As a result, all latches have constant input signals during the clock pulse and no timing problem occurs between the latches. However, the delay circuits cause large area and power overheads.



Fig 5: Shift register with latches, delay circuits, and a pulsed clock signal. (a) Schematic. (b) Waveforms.

Another solution is to use multiple non-overlapping delayed pulsed clock signals, as shown in Fig. 6(a). The delayed pulsed clock signals are generated when a pulsed clock signal goes through delay circuits. Each latch uses a pulsed clock signal that is delayed from the pulsed clock signal used in its next latch. Therefore, each latch updates its data after its next latch has updated its data. As a result, each latch has a constant input during its clock pulse and no timing problem occurs between latches.

# ISSN: 2278-4632 Vol-10 Issue-7 No. 11 July 2020



Fig 6: Shift register with latches and delayed pulsed clock signals.(a) Schematic. (b) Waveforms.

However, this solution also requires many delay circuits. Fig. 7(a) shows an example of the proposed shift register. The proposed shift register is divided into sub shift registers to reduce the number of delayed pulsed clock signals. A 4-bit sub shift register consists of five latches and performs shift operations with five non-overlapping delayed pulsed clock signals. In the 4-bit sub shift register #1, four latches store 4-bit data (Q1-Q4) and the last latch stores 1-bit temporary data (T1), which will be stored in the first latch (Q5) of the 4-bit sub shift register #2. Fig. 7(b) shows the operation waveforms of the proposed shift register. The five non-overlapping delayed pulsed clock signals are generated by the delayed pulsed clock generator in Fig. 8. The pulsed clock signals fire in the opposite order to the five latches. First, the pulsed clock signal CLK\_pulse(T) updates the latch data T1 from Q4. Then the remaining pulsed clock signals update the four latches from Q4 to Q1 sequentially. The latches Q2-Q4 receive data from their previous latches Q1-Q3, but the first latch Q1 receives data from the input of the shift register (IN). The other sub shift registers operate in the same way as sub shift register #1, except that the first latch receives data from the temporary storage latch of the previous sub shift register.
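The update order described above can be checked with a behavioral model. The sketch below is an illustration only: the function names are hypothetical, each list element stands for one latch, and each pulse is assumed to update the same latch position in every sub shift register at once.

```python
def make_register(n_bits, k):
    """Create N/K sub shift registers of K latches plus one temporary latch each."""
    assert n_bits % k == 0, "N is assumed to be a multiple of K"
    m = n_bits // k
    return [[0] * k for _ in range(m)], [0] * m


def shift_cycle(subregs, temps, data_in):
    """One clock cycle: pulses fire for T first, then Q_K down to Q_1."""
    k = len(subregs[0])
    # Pulse T: every temporary latch captures the last data latch of its sub register.
    for m, reg in enumerate(subregs):
        temps[m] = reg[-1]
    # Pulses for Q_K ... Q_2: each latch captures its previous latch, which has not yet changed.
    for i in range(k - 1, 0, -1):
        for reg in subregs:
            reg[i] = reg[i - 1]
    # Pulse for Q_1: take IN, or the temporary latch of the previous sub register.
    for m, reg in enumerate(subregs):
        reg[0] = data_in if m == 0 else temps[m - 1]


def run(stream, n_bits=8, k=4):
    """Shift a bit stream in and return the flattened register contents."""
    subregs, temps = make_register(n_bits, k)
    for bit in stream:
        shift_cycle(subregs, temps, bit)
    return [bit for reg in subregs for bit in reg]


stream = [1, 0, 1, 1, 0, 0, 1, 0, 1]
print(run(stream))  # [1, 0, 1, 0, 0, 1, 1, 0]
```

The result equals the last eight input bits in reverse order, which is what an ordinary 8-bit shift register would hold, so the reverse-order pulses and temporary latches preserve the shift behavior in this model.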



Fig 7: Proposed shift register. (a) Schematic.

The proposed shift register reduces the number of delayed pulsed clock signals significantly, but it increases the number of latches because of the additional temporary storage latches. As shown in Fig. 8, each pulsed clock signal is generated in a clock-pulse circuit consisting of a delay circuit and an AND gate. When an $N$-bit shift register is divided into $K$-bit sub shift registers, the number of clock-pulse circuits is $K+1$ and the number of latches is $N+N/K$. A $K$-bit sub shift register consisting of $K+1$ latches requires $K+1$ pulsed clock signals. The number of sub shift registers ($M$) is $N/K$, and each sub shift register has one temporary storage latch. Therefore, $N/K$ latches are added as temporary storage latches.
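These counts are easy to tabulate for a given word length. The sketch below is a small helper under the assumption that $N$ is a multiple of $K$; the function name and the example values $N = 256$, $K = 4$ are chosen for illustration.

```python
def resource_counts(n_bits, k):
    """Latch and clock-pulse circuit counts for an N-bit register split into K-bit sub registers."""
    if n_bits % k != 0:
        raise ValueError("N is assumed to be a multiple of K")
    num_sub = n_bits // k                      # M = N / K sub shift registers
    return {
        "sub_shift_registers": num_sub,
        "latches": n_bits + num_sub,           # N data latches + N/K temporary latches
        "clock_pulse_circuits": k + 1,         # one per pulsed clock signal
    }


print(resource_counts(256, 4))
# {'sub_shift_registers': 64, 'latches': 320, 'clock_pulse_circuits': 5}
```

A larger $K$ reduces the number of temporary latches but increases the number of clock-pulse circuits, which is the trade-off behind the choice of $K$.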

The conventional delayed pulsed clock circuits in Fig. 6 can be used to save the AND gates in the delayed pulsed clock generator in Fig. 8. In the conventional delayed pulsed clock circuits, the clock pulse width must be larger than the summation of the rising and falling times in all inverters in the delay circuits to keep the shape of the pulsed clock. However, in the delayed pulsed clock generator in Fig. 8, the clock pulse width can be shorter than the summation of the rising and falling times because each sharp pulsed clock signal is generated from an AND gate and two delayed signals. Therefore, the delayed pulsed clock generator is suitable for short pulsed clock signals.
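One way to picture a pulse built from an AND gate and two delayed signals is a discrete-time model. The sketch below assumes that pulse $i$ is the AND of the clock delayed by $i$ steps of the delay line and the inverse of the clock delayed by $i+1$ steps; this gating, the function names, and the period and delay values are assumptions for illustration and are not taken from Fig. 8.

```python
def clock(t, period=40):
    """Ideal square-wave clock, high for the first half of each period."""
    return 1 if (t % period) < period // 2 else 0


def delayed_pulse(t, index, delay=3, period=40):
    """Pulse number `index`: delayed clock AND NOT the next delayed clock."""
    early = clock(t - index * delay, period)
    late = clock(t - (index + 1) * delay, period)
    return early & (1 - late)


def pulse_trains(num_pulses=5, delay=3, period=40, steps=40):
    """Sample every pulsed clock signal over one clock period."""
    return [[delayed_pulse(t, i, delay, period) for t in range(steps)]
            for i in range(num_pulses)]


trains = pulse_trains()
for i, train in enumerate(trains):
    print(i, "".join(str(b) for b in train[:18]))

# At every time step at most one pulse is high, so the pulses do not overlap.
print(all(sum(column) <= 1 for column in zip(*trains)))  # True
```

In this model each pulse is exactly one delay step wide and starts when the previous one ends. In the proposed shift register the earliest pulse would drive the temporary latch and the latest pulse the first latch.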



Fig 8: Delayed pulsed clock generator

The numbers of latches and clock-pulse circuits change according to the word length of the sub shift register ($K$). $K$ is selected by considering the area, power consumption and speed. The area optimization can be performed as follows. When the circuit areas are normalized to the area of a latch, the areas of a latch and a clock-pulse circuit are 1 and $\alpha_A$, respectively. The total area becomes $\alpha_A\times(K+1)+N(1+1/K)$. The optimal $K$ ($=\sqrt{N/\alpha_A}$) for the minimum area is obtained by setting the first-order derivative of the total area with respect to $K$ to zero ($0=\alpha_A-N/K^2$).
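The optimization step can be written out explicitly. The derivation below uses $\alpha_A$ for the normalized area of one clock-pulse circuit and treats $K$ as a continuous variable, which is an approximation since $K$ must be an integer that divides $N$.

```latex
\begin{aligned}
A(K) &= \alpha_A (K+1) + N\left(1 + \frac{1}{K}\right) \\
\frac{dA}{dK} &= \alpha_A - \frac{N}{K^{2}} = 0
  \;\Longrightarrow\; K_{opt} = \sqrt{\frac{N}{\alpha_A}} \\
\frac{d^{2}A}{dK^{2}} &= \frac{2N}{K^{3}} > 0 \quad (K > 0)
\end{aligned}
```

The positive second derivative confirms that $K_{opt}$ is a minimum. As a purely illustrative example with assumed values $N = 256$ and $\alpha_A = 4$, $K_{opt} = 8$ and $A(8) = 36 + 288 = 324$ latch areas, compared with $A(4) = A(16) = 340$.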

## **IV. Extension–SRAM**

SRAM (static RAM) is random access memory (RAM) that retains the data bits in its memory as long as power is being supplied. SRAM is a type of volatile semiconductor memory that stores binary logic '1' and '0' bits. SRAM uses bistable latching circuitry made of transistors (MOSFETs).

SRAM holds its data without the memory module needing to be refreshed periodically. Consequently, SRAM modules provide faster data access than DRAM modules. SRAM is typically an on-chip memory with a short access time and faster execution than DRAM. SRAM is widely used as cache between the processor and the main memory of the computer.

SRAM (static RAM) stores each bit in a bistable latching circuit, similar to a flip-flop, composed of four or six transistors. The SRAM is an application of, and an extension to, our project (a power- and area-efficient shift register using pulsed latches): the output obtained from that project is taken as the input to the SRAM. Because of its speed, SRAM is used for applications that require relatively fast access to data, such as video cards and cache memory.

The word static indicates that the memory retains its content as long as power is being supplied. However, because the memory is volatile, data is lost when power is removed.

SRAM chips use a matrix of six-transistor cells and no capacitors. Because the data is held by the transistor latch rather than as charge on a capacitor, there is no stored charge to leak away, so SRAM does not need to be refreshed on a regular basis.

SRAM is used for a computer's cache memory and as part of the random access memory digital-to-analog converter (RAMDAC) on a video card. SRAM needs more transistors per bit than DRAM, so it uses more chips than DRAM for the same amount of storage space, making the manufacturing costs higher. SRAM is therefore used mainly where very fast access is needed, such as cache memory.
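The static and volatile behavior described in this section can be summarized in a small behavioral model. This is a sketch only: the class name `Sram`, the 32-word by 8-bit organization (256 bits in total) and the use of `None` for unknown contents after power loss are assumptions, not details of the implemented design.

```python
class Sram:
    """Behavioral model of a small static RAM (no refresh, volatile)."""

    def __init__(self, words=32, width=8):
        self.words = words
        self.width = width
        self.cells = [0] * words

    def write(self, addr, value):
        """Store a word, keeping only the low `width` bits."""
        self.cells[addr] = value & ((1 << self.width) - 1)

    def read(self, addr):
        """Return the stored word; reading does not disturb it."""
        return self.cells[addr]

    def power_cycle(self):
        """Model loss of power: all contents become unknown."""
        self.cells = [None] * self.words


ram = Sram()
ram.write(5, 0xA7)
# Repeated reads return the same data with no refresh in between.
print(ram.read(5), ram.read(5))   # 167 167
ram.power_cycle()
print(ram.read(5))                # None
```

The model captures only the interface-level behavior; it says nothing about cell-level timing, area or power.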

### CHARACTERISTICS OF SRAM:

- Long life
- No need to refresh
- Faster execution
- Used as cache memory
- Large size
- Expensive
- High power consumption.



Fig.9: RTL Schematic for extension system

The SRAM module is required to have both high operating performance, to deal with multimedia applications, and low power consumption, to prolong battery life. Active power consumption of CMOS logic circuits increases quadratically with supply voltage, so lowering the supply voltage is one of the most effective ways to reduce energy usage, but unfortunately this comes at the expense of lower speed. To get the best trade-off, supply voltage and threshold voltage must be scaled along with the process. However, the increase of leakage current limits the threshold voltage reduction as well as the supply voltage scaling. Moreover, in 45 nm technology and below, voltage scaling becomes very complex due to the difficulty of SRAM operation. To achieve very high density, the SRAM cell is implemented with the smallest MOS transistors, which in turn are increasingly affected by process fluctuations. As a result, many obstacles must be overcome to achieve low-voltage operation.

## **Copyright © 2020 Authors**

#### V. RESULTS

#### I. DESIGN SUMMARY:

| System    | Logic utilization  | Used | Available | Utilization |
|-----------|--------------------|------|-----------|-------------|
| Existing  | No. of slices      | 1850 | 4656      | 39%         |
| Existing  | No. of bonded IOBs | 264  | 232       | 113%        |
| Extension | No. of slices      | 0    | 4656      | 0%          |
| Extension | No. of bonded IOBs | 256  | 232       | 110%        |

Table 1: Device utilization summary.
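The utilization column follows directly from the used and available counts. The sketch below reproduces it under the assumption that the percentage is truncated to a whole number, which is inferred from the reported values rather than stated by the tool; the function name is hypothetical.

```python
def utilization_percent(used, available):
    """Utilization as a whole-number percentage, truncated (assumed rounding rule)."""
    return used * 100 // available


rows = [
    ("Existing", "slices", 1850, 4656),
    ("Existing", "bonded IOBs", 264, 232),
    ("Extension", "slices", 0, 4656),
    ("Extension", "bonded IOBs", 256, 232),
]

for system, resource, used, available in rows:
    pct = utilization_percent(used, available)
    print(f"{system:9s} {resource:12s} {used:5d}/{available:<5d} {pct}%")
```

A value above 100% means the design asks for more of that resource than the selected device provides, as is the case for the bonded IOBs in both designs.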

This project describes the design of a power- and area-efficient SRAM. The design consists of the Verilog code and its data, which are downloaded onto an FPGA using the Xilinx software.

Using the Xilinx software, the area can be measured in terms of slices. A slice is a basic logic resource of the FPGA; slices are grouped into configurable logic blocks (CLBs). The number of available slices depends on the FPGA device used.

The FPGA used here has a total of 4656 slices. The number of slices used by the existing design and by the extension design is given below.

#### **A. Existing data:**

- The number of slices used by the existing design is approximately 1850, so the utilization is 39%.
- This shows that the existing design consumes more area and power.

#### **B. Extension data:**

- The number of slices used by the extension design is 0, so the utilization is also 0%.
- This is because latches are used here rather than flip-flops.
- This shows that the area and power consumption are reduced compared to the existing system.

## **II. SIMULATION RESULTS:**

The developed design is simulated and its functionality is verified. Once the functional verification is done, the RTL model is taken through the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is converted to a gate-level netlist mapped to a specific technology library. Many different devices of the Spartan-3E family are available in the Xilinx ISE tool. To synthesize this design, the device "XC3S500E" was chosen, with the package "FG320" and the speed grade "-4".



Fig. 10: Simulation waveforms.

The design was synthesized and its results were analyzed, as shown in Fig. 10.

#### VI. CONCLUSION

This paper proposes a power- and area-efficient static RAM using spinner cells. A 256-bit SRAM is implemented using pulsed latches (spinner cells) in Verilog HDL with the Xilinx software. The design also solves the timing problem between the pulsed latches by using multiple non-overlapping delayed pulsed clock signals.

Compared to the existing system, the proposed SRAM reduces power consumption and area.

www.junikhyat.com