Design and Tradeoff Analysis of a Multi-Frequency Clocking Circuit Utilizing High-Speed Carry Chains on an FPGAs

Jeong-Gun Lee*, Deok-Young Lee**

This work was supported by the Hallym University Research Fund (grant number H20180095)

Abstract

In this paper, we propose a multi-frequency clocking (MFC) circuit that utilizes high-speed carry chains on an FPGA in order to suppress the electromagnetic interference (EMI) of modern high-speed digital circuits. The proposed MFC circuit uses pre-fabricated carry modules (CARRY4), which convey a carry signal with a short propagation delay, to finely adjust the clock cycle time by assigning different numbers of CARRY4 resource modules to the clock delay line. The proposed architecture has been implemented on a Spartan 6 FPGA. The working system with the proposed MFC design shows up to 10.41 dB of EMI reduction when 32 clock frequencies are used at 31.4 MHz, compared with a single-clock reference design. However, the performance overhead increases with the number of frequencies in an MFC design, so system designers need to perform a tradeoff analysis to find the optimal design for the target system.

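Summary

In this paper, to reduce the electromagnetic interference (EMI) generated during the operation of digital circuits, we propose a strategy for generating a multi-frequency clock (multi-frequency clocking, MFC) using the high-speed carry chain circuits provided as circuit resources within the FPGA fabric, and we analyze the need for EMI reduction through a performance evaluation. The proposed MFC circuit uses dedicated high-speed carry chains and adaptively changes the clock frequency by assigning different numbers of carry chain modules (CARRY4) within the delay line. The proposed architecture was implemented on a Spartan 6 FPGA, and the experimental results confirm that a 31.4 MHz MFC circuit built with it, using up to 32 clock frequencies, reduces electromagnetic noise by 10.41 dB. However, the performance overhead increases as the number of frequencies increases, which calls for careful consideration and analysis.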

Keywords
multi-frequency clocking, high-speed carry logic, FPGA, EMI-aware performance

* Professor, School of Software, Hallym University  
- ORCID: http://orcid.org/0000-0001-6218-4580
** Assistant professor, College of General Education, Hallym University  
- ORCID: http://orcid.org/0000-0003-0960-0347

· Received: Oct. 22, 2020, Revised: Nov. 24, 2020, Accepted: Nov. 27, 2020
· Corresponding Author: Deok-Young Lee
  College of Ilsong Liberal Art, Hallym University, Chuncheon, Gangwon, Korea,  
  Tel.: +82-33-248-2772, Email: riverlike@hallym.ac.kr

## I. Introduction

In a synchronous digital circuit based on a global clock, a large number of logic gates switch concurrently at the edges of the global clock signal [1][2]. As a result, they draw a large periodic current through the power distribution networks (PDNs) [3]-[6]. These current flows in the PDNs cause strong radiation from unintentional on-board antennas. This phenomenon is commonly known as the EMI problem and may cause failures in digital circuits. Reducing EMI in digital systems is therefore an important design issue.

Many efforts have been made to reduce EMI noise. The work in [5] used clock skew optimization to shape the switching current by controlling the arrival times of the clock at the flip-flops (FFs). However, it must meet tight timing constraints to avoid functional failures. An asynchronous system, on the other hand, does not use a global clock signal to synchronize the circuits [2][6]. Within a clockless asynchronous circuit, local circuit modules can operate at their optimal speed without regard to the fabrication process stability and operating supply voltage of other modules. This approach can therefore achieve good EMI reduction because the local clocks tend to trigger at random points in time [6]. However, its weak points are the lack of professional CAD tools, the difficulty of verification on a field-programmable gate array (FPGA), and a larger circuit area due to the more complex control blocks.

Spread spectrum clocking (SSC) is a technique that modulates the system clock to reduce EMI [7]-[9]. A spread spectrum clock generator based on an "all-analog" design requires considerable effort in implementation and verification. It has been proven to work effectively, but it is complex. In general, it requires a large loop-filter capacitor to pass the modulated signal in the phase-locked loop (PLL), which increases the chip area [9].

Our previous work [6] proposed a multi-frequency clocking generator that modulates the system clock to reduce EMI. In [6], we focused on generalizing MFC generators constructed from look-up tables (LUTs) to distribute the spectral power over eight different clock frequency components. However, the delay resolution of a LUT-based design is limited by the LUT delay and the interconnect delay between LUTs. As a result, both the EMI reduction and the timing performance are degraded.

This paper proposes a multi-frequency clocking design that utilizes the CARRY4 primitive, which is available in Xilinx 7 Series FPGAs [10]. Multi-frequency clocking is a kind of discrete, digital version of SSC [3][4]. We then evaluate how much the implemented FPGA-based MFC circuit can reduce EM noise on PDNs. The advantages of our architecture are its simple design and low cost compared with conventional SSC. In addition, the intrinsic delay of each CARRY4 element is relatively short because it uses dedicated high-speed routes [6]. Therefore, the proposed MFC strategy can achieve high EMI attenuation.

The remainder of the paper is organized as follows. In Section II, preliminaries on an FPGA device and the dedicated high-speed carry path will be presented. In Section III, our proposed MFC is demonstrated. Section IV summarizes the results of the experiments, and Section V conducts an analysis on design tradeoff considering system performance. Finally, Section VI concludes this paper.

## II. Preliminaries

### 2.1 Structure of an FPGA

An FPGA is a reconfigurable hardware circuit whose functionality can be reprogrammed many times. Its performance is better than that of software but lower than that of an ASIC-style implementation of the same function. It has typically been used as a prototyping platform, but it is now also used as a core part of a System-on-Chip (SoC) for better performance and energy efficiency than pure hardware or software-only implementations. Fig. 1 shows a typical internal structure of an FPGA, which has four main components: configurable logic blocks (CLBs), memory, digital signal processors (DSPs), and configurable switches and wires [7][11][12]. By programming the CLBs and the configurable switches and wires, any functional logic circuit can be implemented on the FPGA device.

![Internal structure of an FPGA](image)

**Fig. 1. Internal structure of an FPGA**

### 2.2 High-Speed Carry Logic in an FPGA

Since an FPGA is a reprogrammable platform with redundant reconfigurable logic blocks, in general it cannot be optimized as well as a full-custom ASIC-style circuit implementation. To overcome this timing and area performance issue, modern commercial FPGAs adopt well-optimized dedicated circuits and functional modules such as digital signal processing (DSP) logic, block memory, and high-speed carry logic (CARRY4 in a Xilinx FPGA) [10]. By using such dedicated high-speed IP modules, high-performance circuits can be implemented on an FPGA. Because addition is an essential logic component in modern digital systems, FPGA vendors provide architectures in which high-speed carry propagation paths are implemented as a dedicated logic block, named 'CARRY4' in Xilinx FPGA devices. Through this dedicated carry propagation path, an adder circuit with a wide bit width can be accelerated.

## III. MFC Architecture

### 3.1 MFC operation and CARRY4 in FPGAs

In a clocking circuit design, the clock frequency can be modulated by adding or subtracting an intentional jitter, a deviation of $\Delta t$, to or from the clock source. The modulated clock then takes one of a small pre-defined set of discrete frequencies, and its frequency changes slightly over time with the intentional jitter. As a result, the peak of the power spectrum of the modulated clock is reduced [6].

The proposed MFC circuit is implemented on an FPGA and is mainly based on the CARRY4 primitive. Carry chains are available in most FPGA families. The CARRY4 is designed for implementing adders, accumulators and counters in digital processing applications. The carry chains, which are dedicated routes between FPGA logic elements, have minimal propagation delay because they are implemented as hard macros. Timing simulation shows that the average delay of a CARRY4 block in a Spartan 6 FPGA is about 100 ps.

In order to exploit the high-speed feature of the CARRY4 block, we use the dedicated carry chain structure to generate a finely controllable variable delay line. This differs from the MFC architecture originally proposed in [6], where the delay line is constructed by cascading LUT resources.

On the other hand, obtaining the minimal frequency difference (delay difference) between discrete clock frequencies (1/"clock cycle time") requires considerable effort in carefully designing the relative locations of the delay elements as well as the wire delays of their interconnections in the FPGA. Such low-level optimization in an FPGA device is inefficient and impractical.

### 3.2 Internal Architecture of an MFC

Fig. 2 illustrates the placement and routing of the CARRY4 chain for an adaptive MFC using eight CARRY4 blocks (CARRY4_0 to CARRY4_7). Each CARRY4 block provides four built-in multiplexers (MUXCY) and four XOR gates (XORCY) [10].

In a single CARRY4 block, we can generate four different delays according to the 4-bit SIN signal. In Fig. 2, DIN is a single signal that branches to the inputs of all four multiplexers. By properly setting the SIN and DIN signal values, we can create four different carry propagation patterns, which lead to the following four delays.

1) Shortest Carry Propagation Delay
   SIN[3:0] = 1111
   The incoming DIN signal takes one-MUX delay
2) Second Shortest Carry Propagation Delay
   SIN[3:0] = 0111
   The incoming DIN signal takes two-MUX delay
3) Third Shortest Carry Propagation Delay
   SIN[3:0] = 0011
   The incoming DIN signal takes three-MUX delay
4) Longest Carry Propagation Delay
   SIN[3:0] = 0001
   The incoming DIN signal takes four-MUX delay

In the CARRY4 block shown on the right side of Fig. 2, the incoming signal on DIN travels to COUT through MUXCY_2 and MUXCY_3. The propagation path is marked by a dotted line (DIN → MUXCY_2 → MUXCY_3 → COUT). In this case, SIN[3:0] is set to "0111".
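As a minimal sketch, the four settings listed above can be written as a lookup table from the SIN pattern to the number of MUX stages the signal traverses. The names `SIN_TO_MUX_STAGES` and `carry4_delay_ps` are illustrative and do not come from the paper; the default of 50 ps per MUX is the average \(D_{MUX}\) value quoted in the paper, and actual delays depend on the device and placement.

```python
# Number of MUXCY stages traversed for each SIN[3:0] setting,
# copied from the four cases listed in the text.
SIN_TO_MUX_STAGES = {
    "1111": 1,  # shortest carry propagation delay
    "0111": 2,  # second shortest
    "0011": 3,  # third shortest
    "0001": 4,  # longest
}


def carry4_delay_ps(sin: str, d_mux_ps: float = 50.0) -> float:
    """Modeled delay of one CARRY4 block in picoseconds.

    d_mux_ps is an assumed average per-MUX delay (50 ps by default).
    """
    if sin not in SIN_TO_MUX_STAGES:
        raise ValueError(f"unsupported SIN pattern: {sin!r}")
    return SIN_TO_MUX_STAGES[sin] * d_mux_ps


for pattern in SIN_TO_MUX_STAGES:
    print(pattern, carry4_delay_ps(pattern), "ps")
```

The example in Fig. 2 (SIN[3:0] = "0111") corresponds to two MUX stages in this table.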

Fig. 3 presents the schematic of the MFC, which produces a modulated clock signal (CLK) for a synchronous digital circuit. The eight CARRY4 blocks are connected serially as a chain to form a variable delay line.

Fig. 2. Internal structure of CARRY4 and CARRY4 chain architecture

Fig. 3. Block diagram of the MFC

Together with the CARRY4 chain, an AND gate, a cascaded inverter chain and a global clock tree buffer (BUFG) are connected in a loop to form a closed-loop ring oscillator structure. The proposed ring oscillator is easy to design and implement in a commercial FPGA by using a chain of an odd number of inverters (marked Inverter-Chain in Fig. 3) connected in series. The Inverter-Chain can be implemented with an LUT element on the FPGA if the delay of the chain needs to be large.

Finally, the \(i\)-th smallest clock cycle time \((CCT_i)\) can be described by Eq. 1:

\[
CCT_i = BCCT + i \times (2 \times D_{MUX}) \tag{1}
\]

where \(0 \leq i \leq n - 1\).

Here, \(D_{MUX}\) is the delay of a single MUX element in a CARRY4 block, and BCCT is the base clock cycle time, which is obtained by configuring the shortest carry propagation length. Note that \(D_{MUX}\) is 50 ps on average, so the clock cycle time is adjusted at a granularity of \(2 \times D_{MUX}\) = 100 ps.
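The relation in Eq. 1 can be evaluated with a short sketch. The function names are illustrative; the BCCT value of 31.84 ns is the figure the paper uses later in its tradeoff analysis, and 0.05 ns is the average \(D_{MUX}\) quoted above. Both are taken here as assumptions rather than measured values.

```python
def clock_cycle_times_ns(n, bcct_ns, d_mux_ns=0.05):
    """Return CCT_i = BCCT + i * (2 * D_MUX) for i = 0 .. n-1, in ns."""
    return [bcct_ns + i * (2.0 * d_mux_ns) for i in range(n)]


def to_mhz(cct_ns):
    """Convert a clock cycle time in ns to a frequency in MHz."""
    return 1e3 / cct_ns


# Assumed parameters: 32 frequencies, BCCT = 31.84 ns, D_MUX = 50 ps.
ccts = clock_cycle_times_ns(n=32, bcct_ns=31.84)
for i in (0, 1, 31):
    print(f"i={i:2d}  CCT={ccts[i]:.2f} ns  f={to_mhz(ccts[i]):.3f} MHz")
```

Under these assumptions, adjacent clock cycle times differ by 100 ps, and the frequency decreases as \(i\) grows.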

In the proposed MFC architecture, the serially cascaded CARRY4 chain serves as a finely controllable delay element. Based on the 32-bit SIN signal, a different number of delay elements is selected, as explained above. Each CARRY4 block has four delay elements. Thus, eight CARRY4 blocks generate a total of thirty-two \((4 \times 8 = 32)\) different delays, which correspond to thirty-two different clock frequencies.

In addition, to guarantee that the feedback CLK (marked "Data in" in Fig. 3) arrives at the eight CARRY4 blocks at the same time, we use the global clock tree buffer (BUFG) in the design. The BUFG is a dedicated FPGA resource that amplifies the signal strength so that the "Data in" signal propagates to the inputs of the CARRY4 blocks at high speed. As a result, the arrival times of the "Data in" signal at the eight CARRY4 blocks are nearly identical.

The counter and comparator drive the MUXs that select one of the candidate values for the 32-bit SIN signal. The output of the MUX becomes the input to the eight CARRY4 blocks. The switching threshold is defined as a number of clock cycles; during that number of clock cycles, the selected clock frequency drives the synchronous digital system. Once those clock cycles have elapsed, a new clock frequency is selected. In effect, the threshold is the parameter that sets the modulation frequency of the proposed MFC design.
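The counter and comparator behavior described above can be modeled in a few lines. This is a behavioral sketch only: the function name `frequency_schedule` is hypothetical, and the round-robin order in which the next frequency is chosen is an assumption, since the text does not state the selection order.

```python
def frequency_schedule(n_freqs, threshold, total_cycles):
    """Return the index of the active clock frequency for each clock cycle.

    A counter increments every cycle. When the comparator sees the counter
    reach `threshold`, the counter resets and the next frequency index is
    selected (round-robin order is an assumption of this sketch).
    """
    if n_freqs < 1 or threshold < 1:
        raise ValueError("n_freqs and threshold must be positive")
    index = 0
    counter = 0
    schedule = []
    for _ in range(total_cycles):
        schedule.append(index)
        counter += 1
        if counter == threshold:  # comparator fires
            counter = 0
            index = (index + 1) % n_freqs
    return schedule


# Four frequencies, switching every three clock cycles.
print(frequency_schedule(n_freqs=4, threshold=3, total_cycles=14))
```

A larger threshold keeps each frequency active for more cycles, which lowers the modulation frequency.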

In our design, the switching frequency defines the rate at which the operating frequency is changed. The bit width of the counter is calculated based on the switching frequency and the operating frequency:

\[
m = \left\lceil \log_2 \frac{f_o}{f_{SW}} \right\rceil \tag{2}
\]

In Eq. 2, \(m\) is the number of bits in the counter, \(f_o\) is the operating frequency and \(f_{SW}\) is the switching frequency. To ensure that the SIN input values of each CARRY4 block become stable before "Data in" arrives, the following timing constraint must be satisfied:

\[
T_{counter} + T_{comp} + T_{MUX} < T_{AND} + T_{inverter-chain} + T_{BUFG} \tag{3}
\]

In Eq. 3, \(T_{counter}\), \(T_{comp}\) and \(T_{MUX}\) are the delays of the counter, the comparator and the multiplexer, respectively. \(T_{AND}\), \(T_{inverter-chain}\) and \(T_{BUFG}\) are the delays of the AND gate, the inverter chain and the BUFG, respectively. By controlling \(T_{inverter-chain}\), we can ensure that the constraint is satisfied for correct operation.
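The two design checks above can be sketched as follows, reading Eq. 2 as \(m = \lceil \log_2 (f_o / f_{SW}) \rceil\), that is, the number of bits needed to count \(f_o / f_{SW}\) clock cycles. The function names, the 100 kHz switching frequency and all delay values are hypothetical placeholders chosen for illustration, not figures from the paper.

```python
import math


def counter_bits(f_o_hz, f_sw_hz):
    """Counter width m = ceil(log2(f_o / f_SW)) from Eq. 2."""
    if f_sw_hz <= 0 or f_o_hz < f_sw_hz:
        raise ValueError("require 0 < f_sw_hz <= f_o_hz")
    return math.ceil(math.log2(f_o_hz / f_sw_hz))


def sin_stable_before_data_in(t_counter, t_comp, t_mux,
                              t_and, t_inverter_chain, t_bufg):
    """Check the timing constraint of Eq. 3 (all delays in the same unit)."""
    return t_counter + t_comp + t_mux < t_and + t_inverter_chain + t_bufg


# Operating frequency from the paper; the switching frequency is assumed.
print(counter_bits(f_o_hz=31.4e6, f_sw_hz=100e3))

# Placeholder delays in ns: a longer inverter chain satisfies the constraint.
print(sin_stable_before_data_in(1.2, 0.8, 0.5, 0.3, 2.0, 1.5))
print(sin_stable_before_data_in(1.2, 0.8, 0.5, 0.3, 0.5, 1.5))
```

The second pair of calls illustrates why \(T_{inverter-chain}\) is the tuning knob: lengthening the inverter chain increases the right-hand side of Eq. 3 without touching the control path.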

## IV. Experimental Results

Our proposed architecture has been implemented on a development board with a Xilinx Spartan-6 FPGA. The EMI measurement results are shown in Fig. 3 and Fig. 4.

## V. Tradeoff Analysis

As we increase the number of clock frequencies used for modulation, we obtain more EMI reduction, as shown in Table 1, but we lose some performance. To perform the tradeoff analysis, we approximate the peak power reduction trend with a logarithmic function, as shown in Fig. 6. The approximated function for the peak spectrum power is fitted with $R^2 = 0.98$ as Eq. 4.

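\[
EMI_{\text{approx}} = -6.266 \times \ln(x) + 70.656 \tag{4}
\]

The average clock cycle time of an MFC with \(n\) clock frequencies, \(CCT_{n}^{\text{avg}}\), is given by Eq. 5.

\[
CCT_{n}^{\text{avg}} = BCCT + \frac{1}{n} \cdot \left( \frac{n(n+1)}{2} \times 2 \times D_{\text{MUXCY}} \right) \tag{5}
\]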

Finally, the performance overhead (Overhead) caused by employing multiple clock frequencies is given by Eq. 6.

\[
\text{Overhead} = \frac{CCT_{n}^{\text{avg}} - BCCT}{BCCT} = \frac{(n+1) \times D_{\text{MUXCY}}}{BCCT} \tag{6}
\]
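The simplification in the overhead expression follows directly from the average cycle time expression above: the factor \(n(n+1)/2\) is the sum \(1 + 2 + \dots + n\), so dividing by \(n\) and multiplying by \(2 \times D_{\text{MUXCY}}\) leaves \((n+1) \times D_{\text{MUXCY}}\). Written out step by step:

\[
\begin{aligned}
CCT_{n}^{\text{avg}} - BCCT &= \frac{1}{n} \cdot \frac{n(n+1)}{2} \cdot 2 \, D_{\text{MUXCY}} \\
&= (n+1) \, D_{\text{MUXCY}}, \\
\text{Overhead} &= \frac{CCT_{n}^{\text{avg}} - BCCT}{BCCT} = \frac{(n+1) \, D_{\text{MUXCY}}}{BCCT}.
\end{aligned}
\]

The overhead therefore grows linearly with the number of clock frequencies \(n\) for a fixed BCCT.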

Table 2 shows the estimated performance overhead according to Eq. 6. For the estimation, 31.84 ns is used for BCCT, since we used a clock frequency of 31.4 MHz as the minimum clock frequency of the MFC in the experiments, and 50 ps is used for \(D_{\text{MUXCY}}\).

As shown in Table 2, the performance overhead is less than 5%. However, in some systems, such a performance loss can be a critical problem. It is the designer's choice to select the most suitable number of frequencies in the MFC for the target application system, depending on the EMI regulation.

Table 2. Performance overhead of MFC

<table>
<thead>
<tr>
<th>Number of frequencies in the MFC (n)</th>
<th>Overhead</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>-</td>
</tr>
<tr>
<td>8</td>
<td>0.0138 (1.38%)</td>
</tr>
<tr>
<td>16</td>
<td>0.0248 (2.48%)</td>
</tr>
<tr>
<td>24</td>
<td>0.0341 (3.41%)</td>
</tr>
<tr>
<td>32</td>
<td>0.0414 (4.14%)</td>
</tr>
</tbody>
</table>

Finally, a new metric for tradeoff analysis, EMI-aware performance, can be formulated as the product of \(1/EMI_{\text{approx}}\) and \(1/CCT_{n}^{\text{avg}}\), as in Eq. 7.

\[
\text{Perf}_{\text{EMI}} = \frac{1}{EMI_{\text{approx}} \times CCT_{n}^{\text{avg}}} \tag{7}
\]

Fig. 7 shows the EMI-aware performance of an MFC system. When both peak EM noise and performance are considered, using eight clock frequencies is the best choice. Nevertheless, the number of clock frequencies in the MFC has to be chosen together with the EMI regulation, since satisfying the regulation has higher priority than performance in some safety-critical system designs.

## VI. Conclusion

This paper proposed a new MFC architecture based on a dedicated high-speed FPGA resource called the CARRY4 block. We then evaluated the EMI reduction of the MFC circuit utilizing the carry chains in an FPGA. The proposed architecture was implemented on a Spartan-6 FPGA-based board and offers low cost and easy implementation on an FPGA thanks to its fully digital structure, compared with the design of a conventional spread spectrum clock generator. However, the performance overhead increases with the number of frequencies in an MFC design, so system designers need to perform a tradeoff analysis to find the optimal design for the target system.

References


Authors

Jeong-Gun Lee

Feb. 1996: Dept. of Computer Science, Hallym University (BS)
Feb. 1998: Dept. of Info. and Comm., Gwangju Institute of Science and Technology (ME)
Feb. 2005: Dept. of Info. & Comm., Gwangju Institute of Science & Technology (Ph.D)

Feb. 2008 ~ present: Professor, School of Software, Hallym University

Deok-Young Lee

Feb. 1993 : Dept. of Computer Science, Kangwon National University (BS)
Feb. 1999 : Dept. of Computer Science, Kangwon National University (MS)
Aug. 2006 : Dept. of Computer Science, Kangwon National University (Ph.D)

Sep. 2003 ~ present : Assistant professor, College of General Education, Hallym University
Research topic: Arithmetic Circuit, Computer Architecture, Computer Programming Education