Mohammed Abdul RaheemAssistant Professor, Dept. of Electronics and Communication Engineering.Muffakham Jah college of Engineering and TechnologyHyderabad, India.Nausheen Banu, M. Sarala Nikita, M. Lokitha ReddyB.E student, Dept. of Electronics and Communication EngineeringMuffakham Jah College of Engineering and TechnologyHyderabad, India. Abstract” FFT is one of the most employed blocks in many communication and signal processing systems and has been widely used in the Digital Signal Processing industry as a rudimentary operation to select the specific frequency components of a signal, which has been involved with other time domain signals.

There are many algorithms for FFT implementation like split radix, Radix-4, Radix-8 FFT algorithm etc… Out of all these Radix-2 FFT algorithm remains the most popular one due to its simple yet powerful structure, which makes it advantageous for signal processing applications.In this paper, Radix-2 FFT algorithm, using Decimation-In-Time (DIT), is implemented in Verilog HDL. The design is implemented using pipelined CSLA and is evaluated for different parameters like gate count, area, power consumption and speed on 180nm TSMC CMOS process technology and results for the same are presented.

Keywords-DSP; DFT; FFT;CSLA; Common Boolean Logic Introduction The Fast Fourier Transform (FFT) has become almost pervasive and most important in high speed signal processing. Using this transform, signals can be moved to the frequency domain where filtering and correlation can be performed with fewer operations. It has been widely used in communications radar applications and hearing aid systems.FFT involves the mathematics such as multiplier, adders and subtractions. In an N-point DFT, to evaluate each DFT sample value, we have to perform N multiplications and N-1 additions using complex numbers. N such computations are required in all, therefore there will be N^2 complex multiplications and N(N-1) complex additions. The FFT procedure for synthesizing and analyzing the Fourier series was given by Cooley and Turkey. These methods provide a divide and conquer approach to the computation of DFT. This is based on the decomposition of N point DFT into successively smaller DFTs. In an N-point sequence if N can be expressed as N =rm, then the sequence can be designated into r point sequences where m denotes the number of stages of computation and r denotes radix of the FFT algorithm. Here the decimation can be performed m times where m =log2N. The total number of complex additions are reduced to NlogrN and total number of complex multiplications are reduced to (N/2)logrN.There are various FFT algorithms such as radix-2, radix-4, radix22, split-radix, winograd and many more.Radix-2 algorithm is the simplest one, but its calculation of addition and multiplication is more than radix-4’s.Though being more efficient than radix-2, radix-4 only can process 4n-point FFT. In this paper hardware oriented Radix-2 FFT using Decimation in Time (DIT) algorithm is presented..radix-2 dit fft algorithmTo derive the algorithm we start with the basic DFT equation. The Discrete Fourier Transform (DFT) X (k), k=0, 1, , N-1 of a sequence x(n), n=0, 1,, N-1 is defined as X(k)=€‘_(n=0)^(N-1)–’–x(n) W^nK — (1)N is the transform size,W^nK=e^(-j2/N), j = €1We begin by splitting the DFT formula into two summations one of which involves the sum over the first N/2 data points and the second sum involves the last N/2 data points. Radix 2 DIT first computes DFT’s of even indexed inputs (x_2m=x_(0,) x_(2,) –,x—_(N-2)) and of the odd indexed inputs (x_(2m+1)=x_(1,) x_(2,)x_(N-1)) then combines those two results to produce DFT of whole sequence.X(k)=€‘_(n=0)^(N/2-1)–’–x(n) W_N^nK — + W_N^(nK/2) €‘_(n=0)^(N/2-1)–’–x(n+N/2)— W_N^nK (2)Since W_N^nK= –(-1)—^kX(k)=€‘_(n=0)^(N/2-1)–’–x(n)+(-1)^k x(n+N/2—) W_N^nK (3)Decimating X (k) into even and odd- numbered samplesX (2k) = €‘_(n=0)^(N/2-1)–’–[x(n)+x(n+N/2—)] (4)k= 0, 1, …, (N/2-1)X (2k+1) = €‘_(n=0)^(N/2-1)–’–[x(n)-x(n+N/2—)] (5)k= 0, 1, .., (N/2-1) X(k)= E_k+ e^(-j 2k/N ) O_kX(k)= E_k- e^(-j 2k/N ) O_kWhere, E_kand O_k represents the DFT of even and odd indexed inputs respectivelyThe two important properties exhibited by twiddle factors are periodicity and symmetry property due to which the twiddle factor can be calculated as follows.w_N^nk=e^(-j 2k/N)=cos(2k/N)+jsin(2k/N) (6)FFT Butterfly unitIn radix 2 FFT, butterfly unit is the heart of the algorithm. The basic signal flow diagram of butterfly unit is shown in the figure 1. A butterfly unit implementation requires a complex add-subtract unit and a complex multiplier. Each complex adder requires two normal additions. Each complex subtractor requires two normal subtractions. Each complex multiplier requires four normal multiplications, one normal addition and one normal subtraction. The block diagram of butterfly unit is shown in figure 2. The twiddle factors required for the calculation can be calculated using the formula in equation (6). In this paper complex adder and subtractor are fused together as single add/subtract unit. Detailed explanation of add/subtract unit is given in section c. Implementation of Radix-2 DIT FFT using verilog HDL Radix-2 FFT has been implemented and synthesized on Xilinx 14.2 using Verilog HDL. The signal flow diagram of 8 point radix-2 DIT FFT is shown in the figure below. In case of DIT FFT algorithm the inputs are the bit reversal of outputs. The twiddle factors are computed manually and are represented in 2’s complement form. The values of twiddle factors are defined in the program as constants and are recalled as and when required by the algorithm. Carry Select Adder as ADD/SUBTRACT unitFFT is commonly used essential tool in digital signal processing applications. The adder plays a very important role in it. To obtain optimized adder design in terms of delay and area, several works have been proposed earlier. In general, ripple-carry adder (RCA) provides a compact design but suffers from a long delay time. Carry look ahead adder (CLA) gives a fast design but has a large area. Carry-select adder (CSA) is intermediate in regard to speed and area. Therefore, CSA is suitable in many applications that consider both speed and area. In this paper modified carry select adder using Common Boolean logic (CBL) is used [4]. To remove the duplicate adder cells in the conventional CSLA, an area efficient SQRT CSLA is proposed by sharing Common Boolean Logic (CBL) term. The block diagram of K bit SQRT CSLA using CBL is shown in the figure. Instead of using two adders the adder for cin=1 is replaced with Common Boolean Logic (CBL). The implementation of adder using CBL requires only one xor gate and one inverter for the calculation of possible summation in advance and for carry propagation one and & one or gate is required. The figure shows the architechture of area efficient CSA using CBL. MultiplierMultipliers are key components of many high performance systems such as FIR filters, Microprocessor, digital signal processors, etc. Many Digital signal processing (DSP) systems include multipliers as one of core hardware blocks. A systems performance is generally determined by the performance of the multiplier because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area consuming. Hence, optimizing the speed and area of the multiplier is a major design issue however; area and speed are usually conflicting constraints so that improving speed results mostly in larger areas. In this paper Radix-2 booths multiplier is used for multiplication. Booth Multiplier reduces number of iteration steps to perform multiplication as compare to Conventional steps. This algorithm can reduce the number of additions required to produce the result compared to Conventional Multiplication algorithm, where each bit of the multiplier is multiplied with the multiplicand and the partial products are aligned and added together. SIMULATION & RESULTSThe work has been developed on Xilinx version 14.5 using Verilog HDL on Spartan 3E starter board. The architecture is proposed for 18 bits 9 bits for real part and 9 bits for the imaginary part. The proposed architecture has achieved outstanding performance in terms of area and gate count. Power consumption has also been reduced when compared to FFT implementation with Binary to Excess one converter. We simulated the FFT using conventional CSA and proposed FFT using CSA with CBL in 0.18јm CMOS technology on cadence RTL. The comparative results show that the area, timing and no of cells have been reduced in the proposed architecture.Results on xilinx ASIC implementation results results for area, delay, power adnd No of cellsParameter Point 8 FFT Using Regular CSA Using CSA with BEC Using CSA with CBL Area(mm^2) 6427 13166 745No of Cells 265 529 28Power (nw) 18885.0 702489.71 39738.025Timing (ps) 706 559 407 CONCLUSIONIn this paper, an area efficient architecture of Radix-2 FFT algorithm is proposed. This work presents a simple approach to reduce the area and to increase the speed. The proposed 18 bit FFT architechture simulated for 180nm CMOS technologies using cadence RTL, has lesser transistor count and reduced power and delay which makes it efficient for VLSI hardware implementations. In the proposed design the area has been reduced by 11.5%, gate count is reduced by 10% while the power utilisation is slightly increased compared to implementation with regular CSLA.AcknowledgmentFirst and foremost, I thank almighty for blessing me with enthusiasm, courage and knowledge for finishing this project. I express my gratitude to my project guide, Mr. Mohammed. Abdul Raheem, Assistant Professor, MJCET, for his valuable and persistent guidance given by him through discussions, support and timely inspiration. His valuable suggestions have improved the quality of this project. References[1] Lakshmi Santhosh, Anoop Thomas: Implementation of Radix 2 and Radix 22 FFT Algorithms on Spartan6 FPGA, 4th International Conference on Computing, Communication and Networking Technologies (ICCNT), July 4-6, 2013.[2] Mayura Patrikar, Prof.Vaishali Tehre: Design and Power Measurement of 2 And 8 Point FFT Using Radix-2 Algorithm for FPGA Implementation, IOSR Journal of VLSI and Signal Processing (IOSR-JVSP) Volume 7, Issue 1, Ver. I (Jan. ” Feb) 2017.[3] Aruna Arya, Prof. Augusta Sophy: FPGA Implementation of 32 point Radix-2 Pipelined FFT, International Journal of Research in Electronics & Communication Technology Volume 1, Issue 1, July-September, 2013.[4] Ahmed Saeed, M. Elbably, G. Abdelfadeel, and M. I. Eladawy: Efficient FPGA implementation of FFT/IFFT Processor, International Journal of Circuits, systems and signal processing, Issue 3, Volume 3, 2009.[5] ] Cooley J W & Tukey J W. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19:297-301, 1965, [IBM Watson Research Center, Yorktown Heights, NY; Bell Telephone Laboratories, Murray Hill; and Princeton University, NJ], December 20-27, 1993.[6] Tarek Belabed, Sabeur Jemmali, Chokri Souani: FFT implementation and optimization on FPGA, IEEE 4th International Conference on advanced technologies for signal and image processing (ATSIP), 2018.[7] Sandeep Shrivastava, Jaikaran Singh and Mukesh Tiwari: Implementation of Radix-2 Booth Multiplier and Comparison with Radix-4 Encoder Booth Multiplier, International Journal on Emerging Technologies 2(1), 2011.[8] C.H. Pallavi, V. Swathi: An Efficient 64-bit Carry Select Adder with reduced area application, International Journal of VLSI and Embedded systems (IJVES), Vol 4, article 03044, June 2013. [9] Gurpeet Kaur, Lovleen Kaur, Navdeep Kaur: Reduced Area Carry Select Adder with Low Power Consumptions, International Journal of Emerging Engineering Research and Technology Volume 3, Issue 3, March 2015, PP 90-95.[10] Ansuman DiptiSankar Das: Efficient Multiplier-Less VlLSI Architechtures for folded pipelined complex FFT core, A thesis submitted to Department of Electronics and Communication Engineering, NIT Rourkela, 2013