Spur Minimization Techniques for Ultra-Low-Power Injection-Locked Transmitters

Chung-Ching Lin, Student Member, IEEE, Huan Hu, Student Member, IEEE, and Subhanshu Gupta, Senior Member, IEEE

Abstract—Frequency multiplying wireless transmitters (TX) employing harmonic injection-locked technique benefit from high energy efficiency and less hardware complexity but largely suffer from reference spurs that violate the TX spectral specifications. This work proposes a self-aligned phase-locked loop (PLL) in conjunction with harmonic injection locking technique to achieve significantly improved spur performance under ultra-low-power (ULP) operations for sub-GHz IoT TXs. The on-chip type-I PLL calibrates the phase error in the ring oscillator (RO) in real-time and avoids large spurs induced from the frequency deviation in the harmonic injection locking. A compact zero power consumption twin-T notch filter for spur suppression is implemented within the PLL loop to tackle the rail-to-rail voltage jump that comes from the output of the phase detector (PD). Designed and fabricated in the TSMC 180 nm CMOS process, the proposed frequency multiplying TX occupies an active area of only 0.0413 mm². The lowest power consumption with >3X improved energy-efficiency is observed while consistently achieving >62 dB spur suppression with −14 dBm output at 915 MHz. The TX supports OOK modulation with an average power consumption of 200.9 µW only at 3 Mbps data rate achieving a normalized 66.97 pJ/bit energy efficiency.

Index Terms—Ultra-low-power transmitter, spur suppression, self-aligned PLL, injection-locked TX, twin-T notch filter.

I. INTRODUCTION

The deployment of low-cost wireless sensor nodes with ultra-low-power (ULP) consumption has gained significant interest in both industry and academia. These sensor nodes mostly use license-free bands whose frequency allocations differ only slightly across regions. Fig. 1 shows the commonly used frequency bands for low-power radios. Except for the 868 MHz band, which is dedicated to the European Union, the 315/433/915/2450/5800 MHz bands have been allocated for use in North America [1]. The emphasis on wirelessly connected devices for Internet-of-Things (IoT) applications has furthered the need for long-lasting sub-GHz wireless transmitters (TX). However, constrained battery capacities put stringent design requirements on the wireless TX, which requires not only high energy efficiency (pJ/bit) but also low spurious emissions with a minimum number of external off-chip components. The choice of band involves a trade-off: lower frequencies (e.g., 315/433 MHz) require larger antennas, whereas higher frequencies (>2.4 GHz) allow smaller antennas but incur higher power consumption, higher propagation loss, and interference with other technologies such as WiFi and Bluetooth. Therefore, the 915 MHz band is chosen for this design as a compromise among power consumption, path loss, and antenna size.

In general, ULP TX architectures can be categorized into four topologies, as shown in Fig. 2. Fig. 2(a) shows a direct-conversion architecture that either uses quadrature digital-to-analog converters (DAC) and a phase-locked loop (PLL) for up-conversion [2], [3] or directly applies a thin-film bulk acoustic resonator (FBAR) oscillating at the carrier frequency to eliminate the PLL [4]. However, this requires high-frequency generation components in the transmit path, which increase the power consumption and degrade energy efficiency. Besides, the need for a precise quadrature phase relationship increases power consumption and design complexity [5]. A polar TX architecture, shown in Fig. 2(b), has been used to demonstrate both high linearity and efficiency simultaneously by decoupling the phase and amplitude paths [6]. The phase information is applied to a fractional-N PLL while the amplitude is handled by the supply regulator that feeds the supply of the nonlinear power amplifier. However, high circuit complexity due to bandwidth expansion and a larger number of circuits operating at RF frequency has limited the lowest achievable power consumption. Fig. 2(c) shows the power-oscillator-based TX, which comprises a voltage-controlled oscillator (VCO) whose source is modulated to realize different modulations such as on-off keying (OOK) [7]–[10], frequency-shift keying (FSK) [10], [11], or pulse-position modulation (PPM) [12]. This architecture is attractive owing to its simplicity but requires large inductors as the transmit frequency decreases. Hence, this topology is mostly applicable to higher-frequency TXs (for example, 2.4 GHz or > 6 GHz) and is rarely applied to sub-GHz TX (i.e., 400/900 MHz) implementations. Also, the inductor in the LC tank, while serving as the resonant load, acts as an antenna that is sensitive to external interference, leading to frequency pulling. Rather than performing frequency synthesis and data processing at the carrier frequency, the architecture in Fig. 2(d) performs these operations at lower frequencies before combining multiple output phases from the local oscillator using an edge-combining power amplifier (ECPA) to generate a high-frequency carrier. Given that most circuits operate at low power, the frequency-translating architecture is the most suitable candidate for ULP sub-GHz TXs, as evident in recent works [12]–[21]. However, the use of the injection-locking technique, while area- and energy-efficient, results in close-in spurs next to the carrier frequency at the TX output owing to the frequency deviation between the injected signal and the carrier signal [13]–[15]. These unwanted spurs violate spectral mask requirements and also interfere with nearby devices at similar frequencies. Earlier works [16]–[21] utilize conventional delay-locked loops (DLL), PLLs, or an FPGA to calibrate the oscillator frequency. However, these approaches suffer from large area or low power efficiency.

To address these issues, this work proposes spur minimization techniques without significant area and power overhead. The proposed architecture corrects phase errors due to injection locking in real time using a self-aligning type-I PLL operating in conjunction with a harmonic injection-locked ring oscillator (HILRO). The type-I PLL phase error correction is sequenced through coarse- and fine-acquisition processes. During the coarse process, the type-I PLL reduces the large phase difference between the phase detector (PD) inputs, which shifts the RO frequency closer to the injection-locked frequency. Following this, the injection-locking technique self-starts the fine-tuning process by drawing current from or injecting current into the RO capacitive load. This process ends as the phase difference between the RO and the injected signal approaches zero. The two paths (PLL and HILRO) are aligned once the coarse- and fine-acquisition processes are complete. After the phase error correction, the TX performs frequency translation using an ECPA at 915 MHz. The proposed TX, with a single-ended implementation in 180 nm CMOS, realizes minimal power consumption and cost as well as high energy efficiency compared to the state of the art.

In this manuscript, we expand our recent work in [22] where the PLL-calibrated harmonic injection-locked TX was first introduced with the following key additions:

i) Detailed analysis of fundamental issues in injection-locked technique for ULP sub-GHz TXs. (Section II)
ii) Design considerations for ULP sub-GHz TXs using an injection-locking technique with self-aligned PLLs highlighting trade-offs in spur performance, power consumption, and PLL settling time. (Section III)
iii) Detailed design description of circuit components. (Section IV)
iv) Measurement results to demonstrate robustness over multiple parts and time-domain results. (Section V)

Finally, Section VI concludes this article.

II. LIMITATIONS OF ULP INJECTION-LOCKED TXS

![Fig. 2. Conventional TX architectures: (a) direct-conversion TX, (b) polar TX, (c) power oscillator TX, and (d) frequency multiplying TX.](image-url)

![Fig. 3. The spectrum of (a) an ideal harmonic injection-locked, and (b) harmonic injection locking generating close-in spurs in practical implementations [22].](image-url)

The harmonic injection locking technique refers to a phenomenon that locks a free-running oscillator to the $N^{th}$ harmonic of the injected reference signal. Fig. 3(a) shows the ideal harmonic injection spectrum. The ideal case assumes that the frequency error ($f_{ERR}$) is close to zero and that there is no PVT variation. When injection-locked, the oscillator tracks the reference clock every $N$ cycles and runs freely for $N-1$ cycles. In practice, however, any mismatch between the oscillator free-running frequency ($f_0$) and the injection frequency ($f_{INJ}$) creates a large frequency error ($f_{ERR}$), resulting in unwanted spurs around the TX output that significantly degrade the TX spurious performance, as shown in Fig. 3(b). The spur caused by the frequency mismatch can be quantified as [23]:

\[
\text{Spur}_\text{INJ} = 20 \log \left( N \times \frac{|N \times f_{\text{REF}} - f_{\text{free-running}}|}{f_{\text{free-running}}} \right) \quad (1)
\]

where \(f_{\text{REF}}\), \(f_{\text{free-running}}\), and \(N\) denote the reference frequency, the oscillator free-running frequency, and the harmonic number, respectively.

Fig. 4 shows the estimated spur level versus the frequency error. It can be observed that higher-order injection with a large frequency error (\(f_{\text{ERR}}\)) results in non-negligible spurs near the TX output. These are especially hard to filter in harmonic injection-locked TXs. In the absence of any real-time error correction, higher-order harmonic injection results in weaker frequency pulling, and the oscillator drift can easily exceed the lock range. A real-time calibration loop is thus needed to correct these phase errors [26].
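As a rough numerical illustration of (1), the following Python sketch evaluates the spur level for a few relative frequency errors. The function name `spur_inj_db` and the swept error values are illustrative assumptions; only the formula itself and the 20.33 MHz reference with fifth-harmonic injection come from the text.

```python
import math


def spur_inj_db(n, f_ref, f_free):
    """Spur level from (1): 20*log10(N * |N*f_REF - f_free| / f_free)."""
    f_err = abs(n * f_ref - f_free)
    if f_err == 0:
        return float("-inf")  # ideal case: no frequency error, no spur
    return 20 * math.log10(n * f_err / f_free)


# Fifth-harmonic injection of a 20.33 MHz reference; the relative errors
# below are hypothetical values chosen only to show the trend.
F_REF = 20.33e6
N = 5
for rel_err in (1e-4, 1e-3, 1e-2):
    f_free = N * F_REF * (1 + rel_err)
    print(f"relative error {rel_err:.0e}: {spur_inj_db(N, F_REF, f_free):.1f} dB")
```

Each tenfold reduction in the frequency error lowers the estimated spur by about 20 dB, which is why the calibration loop targets the term \(|N \times f_{\text{REF}} - f_{\text{free-running}}|\).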

Recent oscillator calibration methods, shown in Fig. 5, have demonstrated either reduced power consumption at the cost of poor spur suppression [13] or good spur suppression performance (<50 dB) but with increased hardware complexity and power consumption [19], [24], [25]. In [13], a low-power injection-locked RO for a wireless TX was demonstrated that combines multiple low-frequency clock phases to generate a 401 MHz carrier. However, no spur suppression technique was considered, and the architecture is thus limited to low combining ratios. In [19], an external FPGA was used in feedback to calibrate the RO, avoiding the need for injection locking, but the large power and integration cost of the off-chip components make it infeasible for area-limited IoT applications. In [24], harmonic injection locking was applied with DAC-based digital frequency calibration. The calibration is achieved by feeding one of the RO clocks to a counter built from cascaded flip-flops. The counter counts the number of output clock cycles as the injected signal goes from high to low and generates two pulse signals that store the information as a time difference. This information is used in the digital calibration engine to generate control codes for adjusting the oscillator frequency. However, the DAC quantization noise, precision, and mismatch largely limit the oscillator calibration performance. A harmonic suppression technique using pulse generators was also proposed in [25]. The suppression concept in [25] generates an injection signal by strengthening the 15\textsuperscript{th} harmonic of the desired signal and suppressing the adjacent undesired odd-order harmonics (i.e., 13\textsuperscript{th} and 17\textsuperscript{th}). Pseudo-differential delay cells and current-mode logical-AND gates are utilized to generate pulses with different pulse widths. These pulses are combined by a differential current-steering DAC with inductive loads connected to the VCO, where subharmonic injection locking is applied. However, the need for a differential configuration and stringent matching requirements limits its adoption in sub-GHz IoT TXs. Thus, a low-power, low-complexity method is needed to calibrate the RO.

III. DESIGN CONSIDERATIONS OF HILRO TX WITH SELF-ALIGNED TYPE-I PLL

Fig. 6 shows the block diagram of the proposed TX including pulse generator (PG), RO, type-I PLL, buffers, and ECPA. Design considerations of the proposed architecture are explained further.

A. Phase Error Correction in RO With Multiple Injection Points

As mentioned in Section I, harmonic injection locking is an energy- and area-efficient method for implementing ULP sub-GHz TXs. However, it suffers from poor spur suppression because of process, voltage, and temperature (PVT) variations. Instead of viewing the spurs in the frequency domain as in Fig. 3(b), the spur performance can be studied in the time domain, as shown in Fig. 7. Mismatches and PVT variations result in slight timing differences that lead to phase error. For example, with fifth-harmonic injection locking, the oscillator adjusts its timing every 5 cycles while free-running for the remaining 4 cycles. The spur is proportional to the timing difference between the injected signal and the oscillator signal, so larger timing differences result in larger spurs. As the harmonic factor increases, the number of free-running cycles also increases, so the phase error accumulation becomes more severe and leads to higher spur levels. In [27], a gated pulse technique was applied to an LC-VCO. The technique gates the injection pulse periodically to avoid race conditions and to capture the accumulated phase error, which enables the feedback loop to adjust the frequency. The race condition between the two loops exists for both the LC VCO and the RO. Specifically, the LC VCO can avoid the race condition in two ways. First, the injection-locking and frequency-calibration loops can be designed to work on opposite phases of the clock signal, at the rising and falling edges respectively. Second, a pulse gating method can be applied to disable the injection-locked signal so that the PLL does not compete with the injection locking [27]. In our proposed design, the injection locking and the PLL operation are interleaved, as shown in Fig. 7. Therefore, the race condition is avoided in the RO-based design.
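The accumulation of phase error over the free-running cycles can be illustrated with a simple timing model. The sketch below is not the authors' analysis: it assumes the oscillator free-runs at a constant frequency \(f_0 = N f_{REF}(1+\epsilon)\) and is re-aligned ideally once per reference period. The function name and the value of \(\epsilon\) are hypothetical.

```python
def accumulated_error(n, f_ref, rel_err):
    """Timing drift built up over one reference period, just before re-alignment.

    Assumes the oscillator free-runs at f_0 = N * f_REF * (1 + rel_err).
    Returns (drift in seconds, drift as a fraction of one oscillator period).
    """
    f_0 = n * f_ref * (1 + rel_err)
    t_ref = 1 / f_ref
    drift = n / f_0 - t_ref  # N oscillator periods versus one reference period
    return drift, drift * f_0


F_REF = 20.33e6   # reference frequency used in this design
REL_ERR = 1e-3    # hypothetical relative frequency error
for n in (3, 5, 9):
    drift, frac = accumulated_error(n, F_REF, REL_ERR)
    print(f"N={n}: drift = {abs(drift) * 1e12:.2f} ps, "
          f"{abs(frac) * 100:.2f}% of an oscillator period")
```

Under this model the absolute drift per reference period is independent of \(N\), but expressed as a fraction of the oscillator period it grows linearly with \(N\), which is consistent with the factor \(N\) in (1).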

B. Design Considerations of Self-Aligned Type-I PLL Loop

While integer-N type-II PLL is the most common architecture owing to its robust performance, noise, and spur performance [26], the need for a charge-pump with large loop filter and bandwidth limitations makes the type-II PLL not very attractive for ULP TXs. In contrast, the type-I PLL has only one integrator contributed by the VCO and does not require additional nulling resistors and capacitors besides the charge-pump resulting in significant area savings and reduced loop noise contribution [28], [29]. Further, the PD in the type-I PLL can be as simple as a digital XOR which also avoids the dead-zone issue common in the phase-frequency detector (PFD) in type-II PLLs.

Despite several benefits of the type-I PLL, including low hardware complexity and relaxed bandwidth, it suffers from sub-optimal spurious performance. In the type-I PLL, the spur comes from the control line ripple and appears at the TX output at an offset of $f_{REF}$. In a type-II PLL, to maintain a constant control voltage in the locked condition, the net charge over the charging and discharging phases should ideally be zero. Therefore, the spur is mainly contributed by the charge pump current source mismatch. The spur amplitude ranges from a tenth of a millivolt to a few millivolts. In contrast to the type-II PLL, a non-zero phase error exists in the type-I PLL due to the lack of a signal-path integrator [29]. This leads to a non-zero PD output, which implies that the PD output is always bouncing between the supply and ground voltages. The constant phase error is further interpreted from a control voltage point of view later in this section.

Fig. 8(a) shows the detailed design approach for the type-I PLL with an XOR gate as PD. The control voltage and phase difference can be expressed as:

\[ V_{ctrl} = \frac{2\pi (f_{ref} - f_{out})}{K_{VCO}} \quad (2) \]

\[ \Delta \Phi = \frac{V_{ctrl}}{K_{PD}} = \frac{2\pi (f_{ref} - f_{out})}{K_{PD} \times K_{VCO}} \quad (3) \]

where \( K_{PD} \) and \( K_{VCO} \) are the gains of the phase detector and the voltage-controlled oscillator, respectively. As shown in (2) and (3), the output DC voltage is the average voltage from the XOR output.
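To make the link between the XOR output and its DC average concrete, the sketch below samples the XOR of two ideal 50%-duty-cycle square waves and averages the result. It is an idealized illustration, not the circuit in Fig. 8: the supply value `vdd` and the sample count are hypothetical parameters. For a phase difference between 0 and \(\pi\), the average equals \(V_{DD} \cdot \Delta\Phi / \pi\), which corresponds to an ideal XOR gain of \(K_{PD} = V_{DD}/\pi\).

```python
import math


def xor_pd_average(phase_diff_rad, vdd=1.0, samples_per_period=10000):
    """Average XOR output for two 50%-duty square waves offset by phase_diff_rad."""
    high = 0
    for k in range(samples_per_period):
        theta = 2 * math.pi * (k + 0.5) / samples_per_period
        a = (theta % (2 * math.pi)) < math.pi                      # reference clock
        b = ((theta - phase_diff_rad) % (2 * math.pi)) < math.pi   # delayed clock
        high += a != b                                             # XOR of the two clocks
    return vdd * high / samples_per_period


for frac in (0.25, 0.5, 0.75):
    dphi = frac * math.pi
    print(f"phase difference {frac:.2f}*pi -> average {xor_pd_average(dphi):.3f} * VDD")
```

The XOR output itself still swings rail to rail; only its average is constant in lock, which is why the ripple on the control line must be filtered.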

To maintain a constant DC value, the frequency and the duty cycle of the XOR output are kept the same, and a constant phase difference is sustained between the reference clock and the divided RO signal while the type-I PLL is in lock. In the absence of a constant DC voltage, a rail-to-rail voltage jump is inevitable. This conclusion is the same as that in [29], which discusses the constant phase error using the dynamic analysis of the PLL loop. If one considers the benefits of the digital-intensive architecture and the wide bandwidth of the type-I PLL, the rail-to-rail XOR voltage jump must be taken into consideration during design. Admittedly, if the RC time constant in Fig. 8(a) is increased, the spur levels can be suppressed. However, the loop bandwidth also shrinks, which negatively affects the loop step response. Besides, the size of the passive components also increases. Fig. 9 plots the control voltage amplitude against the RC time constant. In [28], a master-slave sampling filter and harmonic traps are proposed to suppress the voltage ripple. The harmonic traps are implemented using an active inductor in series with a capacitor, and a delta-modulator is used to calibrate the harmonic traps digitally. However, additional power consumption is required for higher quality factors in the active inductor as well as to reduce noise. The harmonic traps are thus not suitable for applications with limited power and noise budgets. A multi-modulus divider with duty-cycled bootstrapping clocks was proposed in [30] to suppress the spur sampled onto the capacitors, with $-33$ dB and $-47$ dB carrier-to-spur ratio (CSR) observed from a 50% and a 5% duty cycle clock respectively. Nevertheless, additional power and area are required to implement all the required circuits.

Fig. 7. Timing diagram comparing ideal, conventional, and proposed TX outputs [22].

Fig. 8. XOR-based type-I PLL without frequency divider: (a) circuit realization, (b) closed-loop s-domain model, and (c) timing diagram.

To avoid a large area penalty and additional power consumption, a passive notch filter is placed in series with the RC low-pass filter to form the overall loop filter in our proposed work. The notch filter combines a low-pass and a high-pass filter in parallel, as shown in Fig. 10(a). The notch frequency ($f_C$) is determined by setting the corner frequencies of the low-pass filter ($f_L$) and the high-pass filter ($f_H$). Fig. 10(b) shows the circuit implementation of the notch filter from Fig. 10(a). $R_1, C_1,$ and $R_2$ form the low-pass path while $C_2, C_3,$ and $R_3$ form the high-pass path. This topology is also known as the twin-T notch filter. Because the twin-T notch filter only has resistors and capacitors with reasonable values, high power consumption and large area are avoided. The transfer function of Fig. 10(b) can be derived as (4) [31]. A detailed derivation can be found in [32]. Compared with the LC series nulling network, which adds a zero at an undesired frequency, the twin-T notch filter introduces poles in the loop to null the undesired frequency, which can pose a stability concern, especially as the bandwidth of the type-I PLL can reach half the reference frequency. System modeling of the type-I PLL including the modified loop filter is performed using MATLAB, as shown in Fig. 11. The oscillator frequency ($f_o$) and reference frequency ($f_{REF}$) are set to 100 MHz and 20 MHz respectively. The magnitude and phase responses versus frequency are plotted in Fig. 12. A phase margin of 40$^\circ$ was observed at the band edge (i.e., 500 kHz). As the time constant of the RC low-pass filter increases, the spurs are attenuated more at the cost of slower settling time (and lower phase margin). Therefore, one can choose the appropriate RC values for the notch filter depending on the required system specifications.
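The notch behavior of the twin-T network can be checked numerically. The sketch below does not use the printed form of (4); instead it solves the nodal equations of an ideal, unloaded twin-T driven by an ideal voltage source, with $R_1, C_1, R_2$ as the low-pass T and $C_2, C_3, R_3$ as the high-pass T. The 1 pF unit capacitance is a hypothetical value, and the component ratios ($R_1 = R_2 = 2R$, $R_3 = R$, $C_1 = 2C$, $C_2 = C_3 = C$) are the common symmetric choice, which places the null at $1/(4\pi RC)$.

```python
import math


def twin_t_gain(f, r1, r2, r3, c1, c2, c3):
    """Complex gain Vout/Vin of an unloaded twin-T driven by an ideal source."""
    s = 2j * math.pi * f
    g1, g2, g3 = 1 / r1, 1 / r2, 1 / r3
    da = g1 + g2 + s * c1        # admittance sum at the low-pass T midpoint
    db = g3 + s * (c2 + c3)      # admittance sum at the high-pass T midpoint
    num = g1 * g2 / da + s * s * c2 * c3 / db
    den = g2 + s * c3 - g2 * g2 / da - (s * c3) ** 2 / db
    return num / den


def to_db(h):
    """Magnitude in dB, floored to avoid log10(0) at the ideal null."""
    return 20 * math.log10(max(abs(h), 1e-15))


F_REF = 20.33e6                                # notch target (reference frequency)
C_UNIT = 1e-12                                 # hypothetical unit capacitance
R_UNIT = 1 / (4 * math.pi * F_REF * C_UNIT)    # from f_notch = 1/(4*pi*R*C)
PARTS = (2 * R_UNIT, 2 * R_UNIT, R_UNIT, 2 * C_UNIT, C_UNIT, C_UNIT)

for scale in (0.01, 0.1, 0.5, 1.0, 2.0, 10.0, 100.0):
    h = twin_t_gain(scale * F_REF, *PARTS)
    print(f"f = {scale:6.2f} * f_REF: {to_db(h):8.1f} dB")
```

The gain approaches unity far below and far above the reference frequency and collapses at the reference frequency itself. A real filter has a finite null depth because of component mismatch and loading.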

$$
\frac{V_{out}}{V_{in}} = \frac{s^3 + s^2 \left[ \frac{1}{C_1} \times \left( \frac{1}{R_1} + \frac{1}{R_2} \right) \right] + s \left[ \frac{1}{C_1R_1R_2} \times \left( \frac{1}{C_2} + \frac{1}{C_3} \right) \right] + \frac{1}{C_1C_2R_1R_2R_3} \left( \frac{1}{C_1R_1} + \frac{1}{C_1R_2} + \frac{1}{C_2R_2} + \frac{1}{C_3R_3} \right)}{s^3 + s^2 \left[ \frac{1}{C_1R_1} + \frac{1}{C_1R_2} + \frac{1}{C_2R_2} + \frac{1}{C_3R_3} \right] + s \left[ \frac{1}{C_1C_2R_1R_2} \times \left( \frac{1}{C_3} + \frac{1}{C_3} \right) \right] + \frac{1}{C_1C_2C_3R_1R_2R_3}} \quad (4)
$$

A simple first-order model has been developed for studying the phase margin (PM) necessary for loop stability as well as determining the parameters of the first RC low-pass filter. The pole locations of the notch filter, shown in Fig. 10, can be predicted as:

\[
\begin{aligned}
\omega_{\text{LP1}} &= \frac{1}{R_1 C_1} \\
\omega_{\text{LP2}} &= \frac{1}{[R_1 + (R_2 \parallel R_3)]\, C_3} \\
\omega_{\text{HP1}} &= \frac{1}{R_3 C_2} \\
\omega_{\text{HP2}} &= \frac{1}{(R_2 \parallel R_3)\, [C_2 + (C_3 \parallel C_1)]}
\end{aligned}
\quad (5)
\]

Assuming \( R_1 = R_2 = 2R, R_3 = R, C_1 = 2C, \) and \( C_2 = C_3 = C \) as the general choices for the twin-T notch filter implementation [33], the notch center frequency and the RC time constant can be expressed as:

\[
f_{\text{notch}} = f_{\text{REF}} = \frac{1}{4\pi RC} \quad (6)
\]

\[
RC = \frac{1}{4\pi f_{\text{REF}}} \quad (7)
\]

Substituting (7) in (5) and expressing it in frequency yields:

\[
\begin{aligned}
f_{\text{LP1}} &= 0.5 \times f_{\text{REF}}, & f_{\text{LP2}} &= 0.75 \times f_{\text{REF}}, \\
f_{\text{HP1}} &= 2 \times f_{\text{REF}}, & f_{\text{HP2}} &= 4.04 \times f_{\text{REF}}
\end{aligned}
\quad (8)
\]
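The substitution can be checked numerically for the first three pole estimates. The sketch below reads the $(R_2/R_3)$ term in (5) as the parallel combination of $R_2$ and $R_3$, which is an interpretation rather than something stated explicitly in the text, and normalizes all frequencies to $f_{\text{REF}}$. Under that reading the results reproduce the first three values in (8).

```python
import math


def parallel(a, b):
    """Parallel combination of two resistances."""
    return a * b / (a + b)


F_REF = 1.0                                 # normalize frequencies to f_REF
C = 1.0                                     # arbitrary unit capacitance
R = 1 / (4 * math.pi * F_REF * C)           # RC = 1/(4*pi*f_REF)
R1, R2, R3 = 2 * R, 2 * R, R                # assumed twin-T component ratios
C1, C2, C3 = 2 * C, C, C

f_lp1 = 1 / (2 * math.pi * R1 * C1)
f_lp2 = 1 / (2 * math.pi * (R1 + parallel(R2, R3)) * C3)
f_hp1 = 1 / (2 * math.pi * R3 * C2)

print(f"f_LP1 = {f_lp1:.3f} * f_REF")
print(f"f_LP2 = {f_lp2:.3f} * f_REF")
print(f"f_HP1 = {f_hp1:.3f} * f_REF")
```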

The loop phase shift (\( \text{PS}_{\text{total}} \)) can thus be expressed as:

\[
\begin{aligned}
\text{PS}_{\text{total}} = {} & \pi/2 + \tan^{-1}(f/f_{\text{RC}}) + \tan^{-1}(f/f_{\text{LP1}}) + \tan^{-1}(f/f_{\text{LP2}}) \\
& + \tan^{-1}(f/f_{\text{HP1}}) + \tan^{-1}(f/f_{\text{HP2}}) - \tan^{-1}(f/f_{\text{HI}}) - \tan^{-1}(f/f_{\text{LD}})
\end{aligned}
\quad (9)
\]

where the first term is a constant contributed by the VCO pole, the second term is the phase shift from the RC passive low-pass filter, the third to sixth terms are the phase shifts from the poles of the notch filter, the seventh term is the cumulative phase shift due to the zeros of the notch filter, and the last term is the phase shift due to the loop delay (discussed further). Three assumptions have been made to simplify (9). First, the zeros of the notch filter, \( f_{z1,\ldots,z4} \), are assumed to be outside the band of interest since the pole-zero cancellation is at a much higher frequency than both \( f_{\text{REF}} \) and \( f_{\text{RC}} \) (i.e., \( f_{z1,\ldots,z4} > f_{\text{REF}} > f_{\text{RC}} \)). Second, the additional half-clock delay due to the discrete-time to continuous-time conversion within the PLL [28] is neglected, as the proposed type-I PLL is more resilient to loop delay owing to its continuous-time loop filter. Third, the delay from non-zero interconnect parasitic resistances and capacitances is on the order of a few nanoseconds while the PLL bandwidth is only a few MHz [34], and thus this delay is neglected. To first order, it can be assumed that the PLL loop dynamics are determined only by the VCO and the loop filter. Further transistor-level simulations are needed to capture the effect of higher-order poles and zeros. The reader can also refer to [35] for a more rigorous derivation and analysis of the loop delay in PLLs.

To investigate the maximum \( f_{\text{UGB}} \) that can be achieved in the type-I PLL when the twin-T notch filter is included, one can neglect the passive RC low-pass filter since it does not affect the functionality. From (9) and referring to Fig. 13, the upper bound of \( f_{\text{UGB}} \) is observed to be \( 0.43f_{\text{REF}} \). For a phase margin of 45°, an \( f_{\text{UGB}} \) of 0.2\( f_{\text{REF}} \) is chosen. Fig. 13 plots the maximum phase margin obtainable versus \( f_{\text{UGB}} \) for a network quality factor of 0.25, i.e., a bandwidth of 4\( f_{\text{REF}} \), using (8). The notch filter cut-off frequencies, \( f_L \) and \( f_H \), can be further estimated from the individual pole frequencies \( f_{\text{LP1}}, f_{\text{LP2}}, f_{\text{HP1}}, \) and \( f_{\text{HP2}} \) as follows:

\[
\begin{aligned}
f_L &= \frac{1}{\sqrt{1/f_{\text{LP1}}^2 + 1/f_{\text{LP2}}^2}} = 0.4167 \times f_{\text{REF}}, \\
f_H &= \sqrt{f_{\text{HP1}}^2 + f_{\text{HP2}}^2} = 4.5 \times f_{\text{REF}}
\end{aligned}
\quad (10)
\]
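The two approximations in (10) are easy to evaluate from the pole frequencies in (8). The short sketch below does so with frequencies normalized to \( f_{\text{REF}} \); the variable names are illustrative. The results should land close to the rounded values quoted in (10).

```python
import math

# Pole frequencies from (8), in units of f_REF
F_LP1, F_LP2, F_HP1, F_HP2 = 0.5, 0.75, 2.0, 4.04

# Two cascaded low-pass poles: inverse squares add
f_l = 1 / math.sqrt(1 / F_LP1**2 + 1 / F_LP2**2)
# Two cascaded high-pass poles: squares add
f_h = math.sqrt(F_HP1**2 + F_HP2**2)

print(f"f_L = {f_l:.4f} * f_REF")
print(f"f_H = {f_h:.4f} * f_REF")
```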

For \( f_{\text{UGB}} = 0.1f_{\text{REF}} \), the maximum PM observed is 68° while the achievable PM is only 8° when \( f_{\text{UGB}} \) approaches 0.4\( f_{\text{REF}} \).
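The trend of PM versus \( f_{\text{UGB}} \) can be reproduced approximately from the simplified form of (9). The sketch below keeps only the VCO term and the four notch-filter pole terms from (8), dropping the RC filter, the notch zeros, and the loop delay, in line with the assumptions stated above. Because Fig. 13 may include effects not captured here, the printed values are expected to be close to, but not identical with, the numbers quoted in the text.

```python
import math

# f_LP1, f_LP2, f_HP1, f_HP2 from (8), in units of f_REF
POLES = (0.5, 0.75, 2.0, 4.04)


def phase_margin_deg(f_ugb):
    """PM = 180 deg minus the loop phase shift at f_UGB (simplified form of (9))."""
    shift = 90.0 + sum(math.degrees(math.atan(f_ugb / p)) for p in POLES)
    return 180.0 - shift


for f_ugb in (0.1, 0.2, 0.3, 0.4):
    print(f"f_UGB = {f_ugb:.1f} * f_REF -> PM = {phase_margin_deg(f_ugb):.1f} deg")
```

The margin falls monotonically as \( f_{\text{UGB}} \) approaches the lowest notch-filter pole, which is the reason for choosing \( f_{\text{UGB}} \) near 0.2\( f_{\text{REF}} \) when about 45° of margin is required.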

To further dampen the spur, an additional low-pass filter can be inserted by introducing another pole below the reference frequency. As shown in Fig. 14, although all the poles contributed by the notch filter are outside the loop bandwidth, these pole frequencies are not high enough (>10\( f_{\text{UGB}} \)) to be classified as high-frequency poles. The corner frequency can be determined by starting with the choice of the desired \( f_{\text{UGB}} \) and PM. Fig. 15(a) shows the relationship between \( f_{\text{RC}}, f_{\text{UGB}} \), and PM. Depending on the requirements of the specific application, if the spur outside \( f_{\text{UGB}} \) is the primary concern, a moderate PM yields a sharper cut-off and better spurious attenuation [33]. However, a certain PM still needs to be maintained to prevent the loop from becoming unstable. Fig. 15(b) shows a flow chart that summarizes the design procedure to follow during circuit implementation. Note that since \( f_{\text{UGB}} \) cannot be held fixed as \( f_{\text{RC}} \) is varied, both the PM and \( f_{\text{UGB}} \) deviate from (9), as observed in Fig. 15, especially when \( f_{\text{RC}} \leq f_{\text{UGB}} \). Hence, \( f_{\text{RC}} \) should be chosen above \( f_{\text{UGB}} \) to have sufficient PM. The objective of Fig. 15(a) is to provide initial values of the passive components to seed the closed-loop design flow in Fig. 15(b). A more accurate estimation of \( f_{\text{UGB}} \) and PM can be performed using either behavioral simulation tools such as CppSim or transistor-level simulation tools such as Cadence.

Concerning the phase noise, the added twin-T notch filter has only a minor influence on the in-band phase noise performance of the proposed TX because the loop bandwidth is usually kept relatively small compared with the reference frequency (i.e., \( < \frac{1}{2} f_{\text{REF}} \) for a type-I PLL as compared to \( < \frac{1}{10} f_{\text{REF}} \) for a type-II PLL) [33].

C. Overall Spur Allocations, Contribution, and Minimization

A frequency-multiplying TX benefits from low power consumption; however, it suffers from mixing spurs. The overall spur contribution at the proposed TX output can be separated into the spur due to the injection-locking technique and the spur generated by the PLL loop. The spur due to the injection-locking technique is expressed by (1). The term \( |N \times f_{\text{REF}} - f_{\text{free-running}}| \) in (1) is minimized by the proposed type-I PLL, and thus \( \text{Spur}_{\text{INJ}} \) is suppressed. However, the PLL itself also contributes a spur from the control line ripple, which appears at the TX output at an offset of \( f_{\text{REF}} \). The PLL spur can be quantified as follows [9]:

\[
\text{Spur}_{\text{PLL}} = 20 \log\left( \frac{K_{\text{VCO}} \times \Delta V_{\text{ctrl}}}{2\pi \times f_{\text{REF}}} \right) \quad (11)
\]

where \( K_{\text{VCO}} \) is the VCO gain and \( \Delta V_{\text{ctrl}} \) is the voltage variation at the output of the loop filter. In the proposed design, \( f_{\text{REF}} \) and \( K_{\text{VCO}} \) are kept constant, and the spur magnitude mainly depends on \( \Delta V_{\text{ctrl}} \). In addition to the first-order low-pass RC filter, the twin-T notch filter helps to minimize the ripple on \( V_{\text{ctrl}} \). The null frequency of the proposed twin-T notch filter is set to 20.33 MHz (equal to \( f_{\text{REF}} \)) by the combination of a high-pass and a low-pass filter so that the spur from the VCO control line is further suppressed. Fig. 16 shows the simulated FFT results of \( V_{\text{ctrl}} \). Use of the notch filter results in 14.9 dB spur suppression at \( V_{\text{ctrl}} \). It is worth noting that Fig. 16 only captures the reduction of the ripple amplitude of \( V_{\text{ctrl}} \) within the PLL loop. Other factors further impact the measured performance, including: 1) unmatched PCB traces, 2) coupling capacitance, 3) supply noise and imperfect grounding on the prototype boards, and 4) process variations. The simulation test bench in Fig. 16 does not capture all these factors comprehensively, so the simulated spur suppression deviates from the measured data.
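Because (11) is linear in \( \Delta V_{\text{ctrl}} \) inside the logarithm, any reduction of the control-line ripple in dB translates directly into the same reduction of the PLL spur. The sketch below illustrates this with the 14.9 dB figure from the text. The VCO gain and ripple amplitude are hypothetical values chosen only for illustration, and \( K_{\text{VCO}} \) is assumed to be expressed in rad/s/V.

```python
import math


def spur_pll_db(k_vco, dv_ctrl, f_ref):
    """PLL spur from (11): 20*log10(K_VCO * dV_ctrl / (2*pi*f_REF))."""
    return 20 * math.log10(k_vco * dv_ctrl / (2 * math.pi * f_ref))


F_REF = 20.33e6                    # reference frequency from the text
K_VCO = 2 * math.pi * 50e6         # hypothetical gain: 50 MHz/V in rad/s/V
DV_CTRL = 1e-3                     # hypothetical 1 mV ripple without the notch

base = spur_pll_db(K_VCO, DV_CTRL, F_REF)
with_notch = spur_pll_db(K_VCO, DV_CTRL * 10 ** (-14.9 / 20), F_REF)

print(f"spur without notch: {base:.1f} dB")
print(f"spur with notch:    {with_notch:.1f} dB")
print(f"improvement:        {base - with_notch:.1f} dB")
```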

Because the notch filter is composed of passive components, its PVT variations should also be considered. Simulations are performed by varying the value of each component by \( \pm 30\% \) to emulate the worst-case PVT variations. As shown in Fig. 17, at the lower extreme (i.e., with all component values decreased by 30%), all the poles are shifted to higher frequencies, with minimal impact on the spur. At the opposite extreme (i.e., with all values increased by 30% from nominal), the larger component values shift the poles below the reference frequency. As the center frequency of the notch filter is then not located exactly at the reference frequency, the spur rejection is reduced by a few dB. In the current version of this prototype, we addressed the PVT issues by placing multiple resistors (with larger dimensions) in parallel to ensure better matching. Also, dummies were added to further improve matching among all the resistors in the twin-T notch filter. A similar approach has been adopted for the MIM capacitors to retain component value accuracy. To further mitigate the PVT effect, both the resistor and capacitor elements should be made tunable, similar to the bit tuning used for the RO coarse frequency adjustment.
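The shift of the notch center under a uniform component variation can be estimated with a filter-only sketch. It assumes an ideal, unloaded symmetric twin-T, whose normalized response is \( H = (1 - x^2)/(1 - x^2 + j4x) \) with \( x = f/f_{\text{notch}} \), and it scales every resistor and capacitor by the same factor. It does not reproduce the closed-loop simulation of Fig. 17, so the printed attenuation describes the isolated filter only.

```python
import math


def shifted_notch(f_nominal, delta):
    """Notch frequency when every R and C is scaled by (1 + delta)."""
    return f_nominal / (1 + delta) ** 2


def notch_attenuation_db(f, f_notch):
    """Gain in dB of an ideal symmetric twin-T at frequency f."""
    x = f / f_notch
    num = abs(1 - x * x)
    if num == 0:
        return float("-inf")  # ideal, infinitely deep null
    return 20 * math.log10(num / math.hypot(1 - x * x, 4 * x))


F_REF = 20.33e6
for delta in (-0.3, 0.0, 0.3):
    f_notch = shifted_notch(F_REF, delta)
    att = notch_attenuation_db(F_REF, f_notch)
    print(f"delta = {delta:+.0%}: notch at {f_notch / 1e6:6.2f} MHz, "
          f"gain at f_REF = {att:.1f} dB")
```

Since the notch frequency depends on the product RC, a uniform ±30% change in both component types moves the null by roughly a factor of two in either direction, which motivates the matching and tuning measures described above.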

As shown in Fig. 18, the spurious tones in the overall TX are located at offsets of $\pm n \cdot f_{\text{REF}}$ and $\pm m \cdot f_{\text{RO}}$ away from the carrier frequency, where $n$ and $m$ are integer harmonic numbers. The spurs that appear at $\pm 1 \cdot f_{\text{RO}}$ ($m = 1$) can be contributed by the RO combined with the fifth harmonic of the reference frequency. The harmonic spurs due to the reference frequency can be considered negligible as they are suppressed by the ECPA matching network. However, the harmonics from the RO contribute most of the mixing spurs. Admittedly, the RO harmonics are filtered only by the LC matching network, which is not sufficient. However, the RO harmonics appear at an offset of $f_{\text{RO}}$ ($m = 1$) away from the carrier frequency and hence can be classified as out-of-band spurs (considering that the ISM band spans 902 MHz to 928 MHz). Additionally, the $f_{\text{carrier}} \pm 1 \cdot f_{\text{RO}}$ spurs can be easily removed by including additional off-chip notch filters, as these spurs are far away from the carrier signal. This is in contrast to the reference spurs at $f_{\text{carrier}} \pm 1 \cdot f_{\text{REF}}$, which are close to the carrier signal and require on-chip suppression methods.
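The spur locations described above can be listed with a few lines of Python. The sketch assumes that the RO runs at one ninth of the 915 MHz carrier, an inference from the nine edge-combined RO phases rather than a value stated explicitly here, and uses the 20.33 MHz reference. The helper name `spur_frequencies` is hypothetical.

```python
def spur_frequencies(f_carrier, f_step, max_order):
    """Frequencies at f_carrier +/- k*f_step for k = 1..max_order, sorted."""
    return sorted(
        f_carrier + sign * k * f_step
        for k in range(1, max_order + 1)
        for sign in (-1, 1)
    )


F_CARRIER = 915e6
F_REF = 20.33e6
F_RO = F_CARRIER / 9   # assumption: nine RO phases combined by the ECPA

ref_spurs = spur_frequencies(F_CARRIER, F_REF, 2)
ro_spurs = spur_frequencies(F_CARRIER, F_RO, 1)

print("reference spurs (MHz):", [round(f / 1e6, 2) for f in ref_spurs])
print("RO spurs (MHz):       ", [round(f / 1e6, 2) for f in ro_spurs])
```

The first-order RO spurs sit roughly 100 MHz from the carrier, about five times farther out than the first-order reference spurs, which is why the former can be left to off-chip filtering while the latter need on-chip suppression.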

### IV. Proposed HILRO TX Architecture

The transistor-level schematic for the proposed TX is shown in Fig. 19 [21]. It consists of an RO for frequency generation, a PG for the sub-harmonic injection, and a type-I PLL for error correction. Each RO delay cell has 7-bit coarse tuning together with forward body-bias fine tuning ($V_{\text{ctrl}}$), which set the RO frequency and keep the tuning range within the ISM band. Body-bias tuning is chosen for its low power consumption and small area penalty. Although body-bias tuning offers only a limited tuning range, this limitation is overcome by the coarse tuning, which is realized by changing the load capacitance of each delay cell. The number of required bits is determined during the design stage by observing the frequency variation across PVT. Among the nine phases of the RO output, $\Phi_0$ and $\Phi_3$ see a different capacitive load compared with the remaining seven phases, which leads to a mismatch in the proposed RO. This mismatch is remedied by adding additional routing metal to the connections between the delay cells.

In the injection path, since the proposed system includes two loops, the locked condition of the two loops might not be met in every scenario. Hence, a tunable delay ($\Delta t$) is placed before the pulse generator (PG) to ensure correct injection locking and to accommodate a finite phase error in the injection-locked path. The delay block can be designed by adopting a delay-locked loop (DLL). However, the extra power consumption and increased silicon area make the DLL less attractive for the proposed application given the limited power budget. Besides, it has been shown in [26] that a fixed delay before the pulse generator is sufficient for most ULP applications.

The loop filter comprises an RC filter and the twin-T notch filter described earlier. Rather than employing a programmable clock divider, a fixed-ratio clock divider [26] is implemented with flip-flops and NAND gates for this proof-of-concept prototype. After the phase-error correction, each RO output feeds into a buffer chain. The buffer chain consists of series-connected inverters that not only help to isolate the ECPA from the RO (which makes the RO frequency less susceptible to the ECPA switching) but also act as a driver stage for the ECPA, sharpening the rising and falling edges at the ECPA input and thus increasing the ECPA power efficiency.

Fig. 20 shows the simulated behavior of two consecutive output phases of the RO. A weak driving capability results in a large short-circuit current, which must be minimized to reduce the dynamic power consumption in the ECPA. The short-circuit current between the two phases is minimized by buffer insertion, leading to high power efficiency in the ECPA. Note that the ripples in Fig. 20(a) are due to the bondwire and board parasitics. These ripples also transfer to the output but are filtered by the high quality factor of the off-chip output matching network and are therefore not a significant concern. The bondwire model is shown in Fig. 20(b), which includes pad, QFN package, and PCB trace modeling. The reader can refer to [37] for details. Although the model is not as accurate as one obtained from EM simulation tools, it provides a first-order estimate for modeling and circuit design.

Also, OOK modulation is applied in this buffer stage. Except for the first inverter in the buffer stage whose PMOS transistor is directly connected to the supply, the supply for the other three inverters is connected through a PMOS transistor to the supply as shown in Fig. 19. The data (i.e., Din) is first fed into an inverter and then applied to the gate of the PMOS transistor. By controlling the mode (i.e., conduction/cut-off) of the PMOS transistor, the output of the buffers can be either 0 (low) or 1 (high) based on the data input.

The edges of the multi-phase buffered output are combined in the ECPA, which enables the frequency translation. Each ECPA unit consists of two transistors in series that perform a digital-AND operation. By combining 9 sets of ECPA units, a translation to 9X of the RO frequency is achieved, generating the desired carrier signal at 915 MHz. While a higher frequency translation ratio is possible by including more ECPA units in parallel, a larger number of units increases the routing complexity and, consequently, the mismatch between cells, inducing additional spurious tones in the output. The off-chip matching network is realized using a tapped-C structure that transforms the output impedance of the ECPA to 50 Ω. The matching network also serves as a band-pass filter that rejects the reference spurs and their harmonics. On-chip matching with two elements is also possible. However, due to the limited network quality factor that can be achieved on chip, more out-of-band signal would pass through, degrading the spectral purity.
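The 9X translation by edge combining can be illustrated with an idealized logic-level sketch. It assumes a nine-stage inverter ring in which each stage delays by one eighteenth of the RO period and inverts, and it assumes that each AND unit is driven by two adjacent RO nodes with the nine unit outputs summed (OR-ed) at the common output. The actual phase pairing used in the prototype is not stated here, so this pairing is an assumption.

```python
STAGES = 9                 # RO delay cells and ECPA units
SAMPLES_PER_DELAY = 4      # time resolution per stage delay (arbitrary)
PERIOD = 2 * STAGES * SAMPLES_PER_DELAY  # samples in one RO period

def ro_node(k, n):
    """Ideal logic level of RO node k at sample n.

    Node k is node 0 shifted by k stage delays, with odd-numbered
    nodes inverted (each stage is an inverter).
    """
    shifted = (n - k * SAMPLES_PER_DELAY) % PERIOD
    level = 1 if shifted < PERIOD // 2 else 0
    return level ^ (k % 2)

def ecpa_output(n):
    """OR of nine AND units, each driven by two adjacent RO nodes (assumed pairing)."""
    return int(any(ro_node(k, n) and ro_node((k + 1) % STAGES, n)
                   for k in range(STAGES)))

wave = [ecpa_output(n) for n in range(PERIOD)]
# wave[n - 1] wraps to the last sample at n = 0, so the period is treated as cyclic.
rising_edges = sum(1 for n in range(PERIOD) if wave[n] == 1 and wave[n - 1] == 0)
duty_cycle = sum(wave) / PERIOD
print(f"pulses per RO period: {rising_edges}, duty cycle: {duty_cycle:.2f}")
```

Each AND unit contributes one pulse that is one stage delay wide, and the nine pulses are evenly spaced within the RO period. Any mismatch in stage delay would disturb this spacing, which is consistent with the concern about mismatch-induced spurious tones.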

Fig. 21. Chip micrograph of the proposed HILRO TX.

Fig. 22. Measured RO tuning range.
155 MHz which is sufficient coverage for the targeted ISM band. Fig. 23 presents the measured output return loss at the output of the TX after the matching network. We realize $<-10$ dB return loss over the band of interest.

Fig. 24(a) shows the measured spectrum of the proposed TX in both free-running (pink curve) and locked (blue curve) modes, selected by disabling or enabling the reference clock signal, respectively. A 4 MHz lock range is captured during measurement while delivering an output power of $-14$ dBm. The phase noise of the free-running and locked TX output is shown in Fig. 24(b). The close-in phase noise is suppressed within the 80 kHz bandwidth. The $-82.3$ dBc/Hz phase noise shown in Fig. 24(b) is measured at the TX output (not at the RO output). Due to the frequency translation, an additional $19.085$ dB ($\approx 20\log_{10}(9)$) degradation is expected. The estimated phase noise of the RO is thus $-101.39$ dBc/Hz at 1 MHz offset. The injection-locked system behaves almost the same as a first-order PLL [36]; therefore, the close-in phase noise follows the phase noise of the reference clock and the far-out phase noise follows the phase noise of the free-running VCO. In a PLL, the 3 dB frequency is determined by the loop bandwidth, whereas in the injection-locked system it is determined by the injection bandwidth, which is related to the injection strength. The injection strength is mainly set by the amount of charge transferred to the oscillator during the injection period. As a result, the stronger the injection, the wider the injection bandwidth. However, there is a direct trade-off between the injection strength and the reference spur: the reference spur becomes larger as the injection strength increases. Therefore, because the OOK modulation scheme does not have a stringent phase noise requirement, our design targets a lower injection bandwidth (injection strength) to minimize the reference spur. Fig. 25 shows the improved spur suppression in the proposed TX with the phase-error correction over a 200 MHz span centered at 915 MHz. Based on the harmonic allocation, the nearest spurs are 20.33 MHz away from the main tone (i.e., at 894.67 MHz and 935.33 MHz).
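The RO phase-noise estimate quoted above follows from the ideal frequency-multiplication penalty of $20\log_{10}(N)$. The following minimal sketch reproduces that arithmetic, assuming ideal multiplication by 9 with no added noise from the buffers or the ECPA. The function name is illustrative.

```python
import math

def multiplier_penalty_db(n):
    """Phase-noise increase from ideal frequency multiplication by n: 20*log10(n)."""
    return 20.0 * math.log10(n)

PN_TX_DBC_HZ = -82.3   # measured at the TX output (value quoted in the text)
MULT = 9               # edge-combining multiplication ratio

penalty_db = multiplier_penalty_db(MULT)
pn_ro_dbc_hz = PN_TX_DBC_HZ - penalty_db
print(f"penalty: {penalty_db:.3f} dB, estimated RO phase noise: {pn_ro_dbc_hz:.3f} dBc/Hz")
```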
The proposed notch filter improves the spur suppression by 7 dB with the PLL locked. The spur suppression without the PLL could not be measured independently in the current setup. The reference spur with and without the notch filter was measured on 4 samples. As shown in Fig. 26, the worst-case spur suppression is 62.1 dB and the best case is 63.7 dB, showing the robustness of the proposed architecture. Fig. 27(a) shows the output time-domain waveform with 126 mV pk–pk swing (equivalent to $-14$ dBm output power) recorded after
Table I: Performance Summary and Comparison With State-of-the-Art Sub-GHz TXs

<table>
<thead>
<tr>
<th>Parameter</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th>This work</th>
</tr>
</thead>
<tbody>
<tr>
<td>Freq (MHz)</td>
<td>401</td>
<td>402</td>
<td>401-428</td>
<td>915</td>
<td>918</td>
<td>915</td>
<td>480</td>
<td>915</td>
</tr>
<tr>
<td>Spur suppression method</td>
<td>Open-loop</td>
<td>Mismatch-free IL freq. multiplier</td>
<td>Digital Freq. Calibration</td>
<td>None</td>
<td>Off-chip FPGA</td>
<td>Passive-Freq. Tripler</td>
<td>None</td>
<td>PLL</td>
</tr>
<tr>
<td>Key components</td>
<td>RO + ECPA</td>
<td>IL freq. multiplier + PA</td>
<td>Frac-N PLL w/ RO + ECPA</td>
<td>Power LC oscillator</td>
<td>FPGA + RO + ECPA</td>
<td>Frac-N PLL + Tripler + Class D PA</td>
<td>PLL+RO+EC+PA</td>
<td>Type-I PLL w/ RO + ECPA</td>
</tr>
<tr>
<td>\(P_{\text{out}}\) (dBm)</td>
<td>-17</td>
<td>-16</td>
<td>-13</td>
<td>-1.64\(^{+}\)</td>
<td>-10/-15</td>
<td>5.5</td>
<td>-20</td>
<td>-14</td>
</tr>
<tr>
<td>Modulation</td>
<td>BFSK</td>
<td>OOK</td>
<td>BFSK/QPSK</td>
<td>Sparse PPM</td>
<td>BFSK</td>
<td>BFSK</td>
<td>BFSK</td>
<td>OOK</td>
</tr>
<tr>
<td>Carrier-to-spur ratio (dB)</td>
<td>44.4</td>
<td>52.2</td>
<td>Not reported</td>
<td>Not reported</td>
<td>Not reported</td>
<td>49.6</td>
<td>40\(^{+}\)</td>
<td>62.1</td>
</tr>
<tr>
<td>\(P_{\text{dc}}\) (µW)</td>
<td>90</td>
<td>215</td>
<td>4060/4080</td>
<td>2000</td>
<td>935/620</td>
<td>11100</td>
<td>170/180</td>
<td>258.6 (Peak)\(^{+}\)</td>
</tr>
<tr>
<td>Data rate (kb/s)</td>
<td>200</td>
<td>250</td>
<td>550/10000</td>
<td>30.3</td>
<td>3000</td>
<td>100</td>
<td>1000/10000</td>
<td>3000</td>
</tr>
<tr>
<td>Energy efficiency (pJ/bit)</td>
<td>450</td>
<td>860</td>
<td>7420/370</td>
<td>66010</td>
<td>311/200</td>
<td>111000</td>
<td>170/18</td>
<td>66.97</td>
</tr>
<tr>
<td>FoM\(^{2}\) (nJ/(bit·mW))</td>
<td>22.5</td>
<td>34.238</td>
<td>148.4/7.4</td>
<td>96.28</td>
<td>3.1/6.325</td>
<td>31.284</td>
<td>17/1.8</td>
<td>1.674</td>
</tr>
<tr>
<td>Area (mm\(^{2}\))</td>
<td>Not reported</td>
<td>0.643\(^{+}\)</td>
<td>2.676\(^{+}\)</td>
<td>0.4528\(^{+}\)</td>
<td>0.33</td>
<td>0.2168</td>
<td>0.6413\(^{+}\)</td>
<td></td>
</tr>
<tr>
<td>Tech. (nm)</td>
<td>130 (CMOS)</td>
<td>65 (CMOS)</td>
<td>180 (CMOS)</td>
<td>180 (CMOS)</td>
<td>180 (CMOS)</td>
<td>55 (CMOS)</td>
<td>130 (CMOS)</td>
<td>180 (CMOS)</td>
</tr>
</tbody>
</table>

\(^{1}\)Estimated from the data provided in [9]; \(^{2}\)FoM \(= P_{\text{dc}}/(\text{Data Rate} \times P_{\text{out}})\); \(^{+}\)Area includes system interfaces, RX, baseband controller and timer; \(^{*}\)Out-of-band spur; \(^{\circ}\)Clock is provided by off-chip instrument.

Fig. 27. (a) Measured transient output waveform, and (b) measured transient output applying 3Mb/s OOK modulation.
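The 126 mV peak-to-peak swing reported for the waveform in Fig. 27(a) can be cross-checked against the stated $-14$ dBm output power. The sketch below assumes a sinusoidal output across a 50 Ω load, which is an idealization of the measured waveform. The function name is illustrative.

```python
import math

def vpp_to_dbm(vpp_volts, load_ohm=50.0):
    """Power in dBm of a sinusoid with the given peak-to-peak voltage across load_ohm."""
    v_rms = vpp_volts / (2.0 * math.sqrt(2.0))   # peak-to-peak -> RMS for a sinusoid
    p_watt = v_rms ** 2 / load_ohm
    return 10.0 * math.log10(p_watt / 1e-3)      # reference level: 1 mW

print(f"126 mV pk-pk into 50 ohm: {vpp_to_dbm(0.126):.2f} dBm")
```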

The measured average power consumption with OOK modulation is 200.9 µW at a carrier frequency of 915 MHz. Because the modulation is applied at the buffer stages located after the injection-locked type-I PLL, the proposed technique might not help in reducing the spurious tone that is induced at the OOK modulation data-rate frequency (i.e., 1.5 MHz). A possible way to mitigate this issue is to digitally shape the input OOK data as shown in [4] and [39]. In [4] and [39], raised-cosine pulse shaping with an upsampling technique was utilized, which pushes the modulation spur away from the carrier frequency by an amount that depends on the upsampling ratio, thereby improving the spectral purity.

Table I compares the proposed work with state-of-the-art standard-compliant sub-GHz TXs. The entire TX consumes only 258.6 µW, with 114.46 µW for the ECPA, 60.3 µW for the RO, 5 µW for the PG, 67.5 µW for the buffers, and 10.86 µW for the PLL comprising the clock divider and the PD. We report the lowest power consumption with >3X improved energy efficiency while consistently achieving >62 dB spur suppression. The proposed digital-intensive TX architecture achieves a compact area and avoids the threshold-voltage and overdrive-voltage limitations of conventional TX architectures. Even though the proposed TX is implemented in a 180 nm CMOS process, it is easily scalable to nanometer-scale CMOS technologies. Besides, the power consumption and the area will also greatly benefit from process scaling in sub-GHz TXs.
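The energy-efficiency and FoM entries for this work can be reproduced with a short calculation. The sketch uses the 200.9 µW average power, 3 Mb/s data rate, and $-14$ dBm output power stated in the paper, and it expresses the FoM in nJ/(bit·mW), a unit assumed here because it matches the magnitude of the tabulated values. The function names are illustrative.

```python
def energy_per_bit_pj(p_dc_uw, data_rate_mbps):
    """Energy per bit in pJ/bit; microwatts divided by Mb/s gives pJ/bit."""
    return p_dc_uw / data_rate_mbps

def dbm_to_mw(p_dbm):
    """Convert a power level in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def fom_nj_per_bit_mw(p_dc_uw, data_rate_mbps, p_out_dbm):
    """P_dc / (data rate * P_out), expressed in nJ/(bit*mW) (assumed unit)."""
    pj_per_bit_mw = energy_per_bit_pj(p_dc_uw, data_rate_mbps) / dbm_to_mw(p_out_dbm)
    return pj_per_bit_mw / 1e3

energy = energy_per_bit_pj(200.9, 3.0)
fom = fom_nj_per_bit_mw(200.9, 3.0, -14.0)
print(f"energy efficiency: {energy:.2f} pJ/bit, FoM: {fom:.3f} nJ/(bit*mW)")
```

The computed FoM agrees with the tabulated 1.674 to within about 1%; the small difference is consistent with rounding the output power to 40 µW.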

### VI. Conclusion

A digital-intensive ULP 915 MHz TX with improved spur suppression using a self-aligned type-I PLL in a harmonic injection-locked RO-based TX is demonstrated for sub-GHz IoT applications. The type-I PLL corrects the phase error due to the frequency deviation of the injection-locked technique while simultaneously achieving high energy efficiency in the frequency-translating architecture. The control voltage ripple due to the PLL loop is suppressed by more than 14 dB using a twin-T notch filter, which consequently suppresses the PLL-induced reference spur. The active power consumption without any modulation is only 258.6 μW with −14 dBm output power. Under OOK modulation, a 3 Mb/s data rate is demonstrated with an energy efficiency of 66.97 pJ/bit. The average power consumption can be further minimized with aggressive duty-cycling as well as technology scaling. Benefiting from the digital-intensive architecture, the proposed TX is thus suitable for sub-GHz low-power IoT wireless sensor network applications.


Chung-Ching Lin (Student Member, IEEE) received the M.S. degree in communication engineering from Yuan Ze University, Taoyuan, Taiwan, in 2014. He is currently pursuing the Ph.D. degree with Washington State University, Pullman, WA, USA. His current research interests include low-power and wideband multi-antenna transceiver design. He was a recipient of the IEEE CICC Educational Grants Award in 2020, the IEEE CAS Travel Award in 2019, the Southern Methodist University Graduate Student Travel Grant in 2018, and the Yu-Ziang Academic Scholarship in 2013. He was also an IEEE RFIC Symposium Best Student Paper Award nominee (out of 12 finalists) in 2020.

Huan Hu (Student Member, IEEE) received the B.S. degree in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2013, and the M.S. degree from Oregon State University, Corvallis, OR, USA, in 2015. He is currently pursuing the Ph.D. degree in electrical engineering with Washington State University, Pullman, WA, USA. His research interests include ultra-low-power sensor interface designs, clock generation, and subthreshold circuit designs. He was an IEEE RFIC Symposium Best Student Paper Award nominee (out of 12 finalists) in 2020.

Subhanshu Gupta (Senior Member, IEEE) received the B.E. degree from the National Institute of Technology (NIT), Tiruchirappalli, India, in 2002, and the M.S. and Ph.D. degrees from the University of Washington in 2006 and 2010, respectively. He is currently an Assistant Professor with the School of Electrical Engineering and Computer Science, Washington State University. From 2011 to 2014, he was with the RFIC Group, Maxlinear Inc., where he worked on silicon transceivers and data converters for wireless communication radios. His research interests include time-based circuits and systems, ultra-low-power/wideband transceivers, and stochastic hardware optimization techniques. He was a recipient of the National Science Foundation CAREER Award in 2020, the Analog Devices Outstanding Student Designer Award in 2008, and the IEEE RFIC Symposium Best Student Paper Award (third place) in 2011. He served as a Guest Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS and the IEEE Design and Test Magazine in 2019. He is serving as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS from 2020 to 2021.