# Improved Performance Tradeoffs in Harmonic Injection-Locked ULP TX for Sub-GHz Radios

Chung-Ching Lin, Student Member, IEEE, Huan Hu, Student Member, IEEE, and Subhanshu Gupta, Senior Member, IEEE

Abstract-This article presents a digital-intensive transmitter (TX) architecture for ultralow-power (ULP) wireless communication telemetry applications. A Type-I correction loop is proposed to adjust the frequency of the voltage-controlled oscillator (VCO) and enable the use of the harmonically injection-locked (IL) technique without significant spurious tones. Two TX implementations with different loop filters in the Type-I phase-locked loop (PLL) are demonstrated to suppress the voltage ripple at the VCO input. The proposed implementation not only minimizes the spurs from the correction loop but also overcomes the performance tradeoff between spur suppression, phase margin, and maximum achievable bandwidth. Prototyped in a standard 180-nm CMOS process, the two proposed TXs achieve energy efficiencies of 66.97 and 12.5 pJ/bit and occupy active silicon areas of 0.0413 and 0.0435 mm<sup>2</sup>, respectively, while delivering the same output power of -14 dBm with >60-dB in-band spur suppression.

Index Terms-Harmonically injection-locked (IL) technique, spur suppression, sub-GHz radios, Type-I phase-locked loop (PLL), ultralow-power TX.

## I. INTRODUCTION

**THE** dramatic increase in the number of wireless devices and data volumes expected in the next decade for Internet-of-Everything (IoE) applications has created countless opportunities in multiple regimes, such as personal health care monitoring, outdoor long-term sensing, and holistic smart factory solutions. This has necessitated the implementation of various types of ultralow-power (ULP) transmitters (TXs) since they usually dominate the power consumption and energy efficiency of the overall device. ULP TXs generally operate in the industrial, scientific, and medical (ISM) bands at 433/915/2400 MHz, which involve several tradeoffs between power consumption, coverage, and passive component size. The 2400-MHz devices benefit from compact passive components but suffer from higher power consumption and path loss. Also, multiple standards (i.e.,

Manuscript received November 2, 2020; revised January 18, 2021; accepted February 4, 2021. This work was supported in part by the Washington Research Foundation (WRF) and in part by the Center for Design of Analog-Digital Integrated Circuits (CDADIC). This article is an expanded version from the 2020 IEEE RFIC Symposium, Los Angeles, CA, USA, August 4-6, 2020. (Corresponding author: Subhanshu Gupta.)

The authors are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163 USA (e-mail: chung-ching.lin@wsu.edu; huan.hu@wsu.edu; subhanshu.gupta@ wsu.edu).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TMTT.2021.3064056.

Digital Object Identifier 10.1109/TMTT.2021.3064056

[Fig. 1 plot: power consumption (µW) versus data rate (kb/s).]

Fig. 1. Summary chart illustrating power consumption and data rate for sub-GHz ULP TX in recent years (2011-2020).

Wi-Fi, Bluetooth (BLE), and Zigbee) with countless devices already operate at the 2400-MHz band, which makes this band congested and interference-limited. In contrast, a 433-MHz device consumes less power and has lower path loss but suffers from bulky passive components. Among the three bands, the 915-MHz band provides a good balance between antenna size, power consumption, and path loss.

Fig. 1 shows the trend summary for sub-GHz ULP TX in recent years, including this work. The implemented TXs can be roughly classified into two main categories: low data rate with low absolute power consumption [1]-[16] and high data rate with high energy efficiency (i.e., low pJ/bit) [17]-[25]. Both categories are highly application-specific and, hence, have adopted different design methodologies [12]. To minimize power consumption in the first category, many of the necessary blocks must be optimized individually, which hinders the selection of a universal architecture and operating frequency. Also, low-complexity modulation schemes, such as ON-OFF keying (OOK) and binary frequency shift keying (BFSK), are adopted for further power savings. On the other hand, devices in the latter category target high energy efficiency by adopting higher order modulation schemes, such as quadrature phase shift keying (QPSK) and quadrature amplitude modulation (QAM), and typically consume a few mW, which poses additional challenges to the power management unit [12].

The objective of this proposed work is to design a TX that sits at the intersection of the aforementioned categories and can support low-to-medium data rates (kb/s to a few Mb/s) while operating at low power consumption. We consider both architecture-level selections and circuit-level approaches that apply to various types of IoE applications. This work will


expand on the authors' recently published works (Chip1 in Fig. 1) demonstrated in [15] and [16], where the phase-locked loop (PLL)-calibrated harmonic injection-locked (IL) TX architecture was first introduced.

Benefiting from the frequency multiplying architecture and digital-intensive correction loop, the undesired spur is removed with relaxed hardware complexity. A fundamental bottleneck in [15] and [16] lies in the tradeoff between the maximum achievable spur suppression and the phase margin (PM) of the phase correction loop. Although >60-dB spur suppression is achieved with minimum power dissipation, it comes at the expense of a low PM, which limits the applicability of this loop and leads to TXs with slower settling behavior. We overcome this bottleneck through an innovative loop filter in the phase correction loop (Chip2, demonstrated as the new contribution in this work). In this process, we demonstrate a new TX implementation and compare it with the implementation in [16]. The two TXs consume only 200.9 and 223.12  $\mu$ W at 3- and 20-Mb/s data rates, respectively.

The remainder of this article is organized as follows, with contributions highlighted relative to [15], [16], and the recent state of the art. Section II presents a detailed analysis of the frequency multiplying harmonic injection-locked ULP TX, focusing on the tradeoffs between spur suppression, bandwidth, and PM. Section III briefly reviews and discusses system design considerations for the proposed frequency multiplying TXs operating at sub-GHz using an injection-locking technique with self-aligned PLLs. A new loop filter design is proposed and implemented to overcome the tradeoff between stability and bandwidth in [15] and [16]. Section IV describes the detailed design of circuit components for both TXs. Measurement results, validated over multiple parts to demonstrate robust performance and compared with the state of the art, are provided in Section V. Finally, Section VI concludes this article.

## II. REVIEW OF ULP FREQUENCY MULTIPLYING TXS AND LIMITATIONS OF PHASE CORRECTION LOOP

This section briefly reviews different categories of ULP TXs followed by the review of spur generation mechanism in harmonic injection-locked TXs, its mitigation using Type-I PLL-based phase-correction loop, and the analysis of tradeoffs between spur suppression with the closed-loop bandwidth and PM.

#### A. Review of Sub-GHz ULP TXs

The ULP TX architectures can be generalized into four categories, as shown in Fig. 2(a)-(d). A summary of these architectures is provided here for the benefit of the readers. Interested readers can find a more detailed comparison of different sub-GHz ULP TXs in [15].

As shown in Fig. 2(a) and (b), direct-conversion and polar architectures are widely used in wireless communications at the expense of fairly large power consumption to maintain low mismatch or phase error. Their adoption in ULP TXs is hindered because all the blocks are operated at the carrier frequency. In contrast, inductive tank-based voltage-controlled oscillator (VCO) TX [see Fig. 2(c)] is attractive due to its



Fig. 2. Classification of ULP TX topologies (adapted from [15]). (a) Direct conversion TX. (b) Polar TX. (c) VCO-Based TX. (d) Frequency multiplying TX.

simplicity. However, the passive components are large at sub-GHz frequencies. Furthermore, the TX operates at the carrier frequency, thereby leading to high power consumption. Frequency multiplying TX [see Fig. 2(d)] is a digital-intensive architecture, which minimizes the necessary blocks operating at the carrier frequency, thus achieving lower power consumption. If the frequency generation in injection-locked TXs can be realized with precision and low spurious emissions, this technique provides the most energy- and area-efficient method. Recent advances, such as multipoint injection-locked [26] and subharmonic injection-locked techniques [27], have stemmed from the injection-locked technique to either increase the lock range or reduce the power consumption required for the reference clock generation, respectively. However, open-loop frequency adjustment of the VCO [26] or the use of analog-intensive blocks [27] limits their application in ULP TXs considering both area and power consumption.

#### B. Injection-Locked Sub-GHz ULP TXs

In the ideal case, as illustrated in Fig. 3(a), the injection-locking technique does not generate any additional spurious components as there is no frequency difference between the oscillator (VCO) and the injected signal. In practice, however, the VCO free-running frequency is always drifting, as shown in Fig. 3(b), especially with the process, voltage, and temperature (PVT) variations. This effect is further magnified with the subharmonic injection-locked technique.

Fig. 3 further illustrates the ideal and nonideal phenomena in both frequency and time domains for the case of a subharmonic injection-locked technique with a multiplication factor of 5. In the ideal case [as shown in Fig. 3(c)], the VCO free-running frequency ( $f_O$ ) is equal (or extremely close) to N times the reference frequency ( $f_{REF}$ ). As a result, the spurious tone is minimized at the offset of the carrier frequency,  $\Delta f_{REF}$ , after frequency multiplication. Similarly, the time-domain analysis shown in Fig. 3(e) has no timing error as the injected signal (INJ) and the VCO signal (OUT) are in sync. Consequently, the spurs are also minimized when reflected in the frequency domain, achieving a high carrier-to-spur ratio (CSR).

LIN et al.: IMPROVED PERFORMANCE TRADEOFFS IN HARMONIC IL ULP TX FOR SUB-GHz RADIOS



Fig. 3. Spurious problem in IL frequency multiplying TX in frequency and time domains (adapted from [16]). (a) and (b) Frequency domain. (c) and (d) Frequency domain (after frequency multiplying). (e) and (f) Time domain.



Fig. 4. Architectures of (a) Type-I PLL and (b) Type-II CP PLL (integer-N).

Fig. 3(b) illustrates the scenario under real-world operation. Due to undesired PVT variations, the VCO phase always drifts away, creating a frequency error,  $f_{ERR}$ , that leads to spurious tones after the VCO is locked. After frequency multiplication, these spurs appear around  $f_{OUT}$ , degrading the output spectral purity and creating interference to nearby devices. The time-domain plot in Fig. 3(f) shows the jitter effect (dark gray) that leads to spurious tones as the injected signal tries to "correct" the free-running VCO.

#### C. Type-I PLL-Based Frequency Error Mitigation in ULP TX

Recently, Type-I PLL-based frequency error mitigation has gained traction due to its simplicity, compact area, and digital-friendly architecture [29], [30]. As shown in Fig. 4(a), the simplest Type-I PLL comprises a VCO, a static CMOS logic-based clock divider, a phase detector (PD), and a passive loop filter. The PD can be as simple as an XOR gate. If the VCO is implemented with a ring oscillator (RO), an energy-efficient, digital-intensive architecture can be realized. Though the Type-I PLL benefits from low hardware complexity and relaxed bandwidth due to the lack of the charge pump (CP) used in a Type-II PLL, it suffers from poor spur performance due to the rail-to-rail PD output.



Fig. 5. Proposed TXs in the system-level diagram (adapted from [16]).



Fig. 6. Operating principles for the proposed ULP TXs (adapted from [16]).

In our proposed work, a Type-I phase correction loop is implemented along with the harmonic injection-locked technique to realize the ULP TX. Two implementations of a loop filter inside the Type-I correction loop are proposed to tackle the rail-to-rail PD output. The first design is the conventional Type-I loop that includes an *RC* low-pass filter (LPF) with a Twin-T notch filter in [15] and [16], achieving high efficiency with in-band spur suppression and occupying a compact area. The second prototype replaces the *RC* LPF with a discrete master–slave sampling filter (MSSF) to overcome the tradeoff between spurious suppression, loop bandwidth, and PM as in the first implementation. To further improve spectral purity, the same Twin-T notch filter is included. Detailed design analysis and comparison will be brought out in Section III.

## III. SYSTEM DESIGN CONSIDERATIONS FOR PROPOSED SUBHARMONIC INJECTION-LOCKED ULP TX

Fig. 5 shows the diagram for the proposed TX, which includes a pulse generator (PG), an RO, a Type-I phase correction loop, buffer stages, an edge-combining power amplifier (ECPA), and an off-chip matching network. The ECPA also serves as a frequency multiplier to translate the frequency by a factor of M (= 9). This allows the frequency generation unit (i.e., the RO) to run at a low frequency (=  $f_O/M$ ). In addition, the phase correction loop operates at an even lower frequency  $(= f_O/(M \times N))$, minimizing power consumption. The working principle of the proposed TXs is illustrated in the time domain in Fig. 6 (assuming N = 5). For the Nth harmonic IL system, the injected signal arrives every N cycles to correct the phase error, and the oscillator remains free running for the rest of the cycles (i.e., N - 1 cycles), implying that the phase error accumulates during these N - 1 cycles. When the next injected signal arrives, since the accumulated phase error makes the VCO



Fig. 7. Power spectral density of the PD (XOR gate) output.

deviate from its ideal case [Out (w/o PLL)] in Fig. 6, the spur is induced as the injected signal tries to "correct" the VCO phase. In our proposed work, the main concept is to reset the phase error once between two injected signals, as shown in the timing diagram in Fig. 6. With the correction loop, the phase is aligned before the next injected signal, reducing the accumulated phase error compared with the injection-only structure. Therefore, the induced spur is reduced (by approximately 6 dB ideally) since the control voltage of the VCO is calibrated, and the spectral purity is also maintained. However, the rail-to-rail PD output contains high-frequency components. As analyzed in [15] and [30], these high-frequency components mainly comprise the reference frequency and its harmonics, as plotted in Fig. 7, using the reference frequency of 20.33 MHz. As illustrated in Fig. 7, the amplitude of the undesired signal (i.e., the reference frequency and its harmonics) is almost the same as or even larger than that of the desired signal (i.e., dc). Because this undesired signal appears at the PLL output and affects the spectral purity, it needs to be filtered out. It is also worth mentioning that, in Fig. 7, the second harmonic is higher than the first harmonic. The main reason is that the PD (i.e., XOR gate) generates the phase error twice per cycle [30].
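The observation that an XOR PD concentrates its ripple at twice the reference frequency can be illustrated with a minimal numerical sketch. It assumes ideal 50% duty-cycle clocks, a static 90° phase offset, and an arbitrary sampling grid; none of these values come from the measured design, and all function names are hypothetical.

```python
import cmath
import math

SAMPLES_PER_PERIOD = 1000   # samples per reference period (assumed)
PERIODS = 4                 # number of reference periods simulated (assumed)


def square(n, shift=0):
    """Ideal 50% duty-cycle square wave at sample n, delayed by `shift` samples."""
    return 1 if ((n - shift) % SAMPLES_PER_PERIOD) < SAMPLES_PER_PERIOD // 2 else 0


def xor_pd_output(phase_offset_deg=90.0):
    """XOR of the reference and the divided VCO clock for a static phase offset."""
    shift = round(phase_offset_deg / 360.0 * SAMPLES_PER_PERIOD)
    total = SAMPLES_PER_PERIOD * PERIODS
    return [square(n) ^ square(n, shift) for n in range(total)]


def harmonic_amplitude(signal, k):
    """Single-sided amplitude of the k-th harmonic of f_REF (k = 0 gives the dc level)."""
    total = len(signal)
    bin_index = k * PERIODS  # DFT bin that corresponds to k * f_REF
    acc = sum(x * cmath.exp(-2j * math.pi * bin_index * n / total)
              for n, x in enumerate(signal))
    scale = 1.0 if k == 0 else 2.0
    return scale * abs(acc) / total


pd_out = xor_pd_output(90.0)
dc_level = harmonic_amplitude(pd_out, 0)
first_harmonic = harmonic_amplitude(pd_out, 1)
second_harmonic = harmonic_amplitude(pd_out, 2)
print(f"dc = {dc_level:.3f}, f_REF = {first_harmonic:.3f}, 2*f_REF = {second_harmonic:.3f}")
```

In this idealized model the XOR output repeats twice per reference cycle, so the component at 2 f_REF (about 2/π of the logic swing) exceeds the dc term of 0.5, while the component at f_REF vanishes. A nonzero f_REF component, as in Fig. 7, would presumably arise from nonidealities such as duty-cycle error, which this sketch does not model.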

Conventional Type-I PLL uses only a passive RC LPF. For filtering the high-frequency components, one can keep increasing the time constants of R and C. However, this shrinks the loop bandwidth affecting the loop step response.

As discussed in Section II, the reference frequency and its harmonics dominate the overall spurious tone contribution. An ideal way to suppress the aforementioned frequency components is to insert a circuit that only dampens the undesired signal while preserving the desired one, which can be captured as the control voltage of the oscillator. More importantly, the anticipated power consumption and the occupied silicon area also need to be considered.

Based on the architecture in Fig. 5, we first proposed a loop filter design using a conventional passive filter along with a Twin-T notch filter to suppress the spurious tones [15], [16]. However, this causes a performance tradeoff, as will be discussed in Section III-A. The second implementation follows the first work by introducing switched-capacitor filters that decouple the above tradeoffs and achieve an energy-efficient implementation with low hardware design complexity.

In the following, we focus mostly on the analysis of the second implementation. Interested readers can refer to [16] for an in-depth analysis of spur suppression in the first implementation.



Fig. 8. Schematic of the Twin-T notch filter.



Fig. 9. Tradeoff between spur and PM in Chip1 regarding LPF pole placement.

#### A. Chip1: RC LPF With Twin-T Notch Filter [16]

In the first approach, first demonstrated by the authors in [16], an additional band-stop filter is applied in series with the *RC* LPF in the Type-I PLL. Fig. 8 shows the dual-path band-stop filter (BSF), known as the Twin-T notch filter [32], adopted to suppress the undesired signal. The Twin-T filter comprises two paths. The two  $2R_{BSF}$  resistors and  $2C_{BSF}$  capacitors form the low-pass path, while the rest of the components form the high-pass path. By selecting the values of resistors and capacitors, the null center frequency ( $f_{null}$ ) can be determined as [32]

$$f_{\text{null}} = f_{\text{REF}} = \frac{1}{4\pi \times R_{\text{BSF}} \times C_{\text{BSF}}} \tag{1}$$

where  $R_{BSF}$  and  $C_{BSF}$  denote the resistor and capacitor used for the design. Because the Twin-T notch filter is formed by resistors and capacitors only, it integrates readily with the Type-I loop given the limited budget for both area and power. The Twin-T filter contributes poles, in contrast to the zeros placed by the *LC* notches in [30]. The pole locations therefore need to be considered for stability. They can be normalized to the reference frequency ( $f_{REF}$ ) as [15]

$$\begin{aligned} f_{\rm LP1} &= 0.5 \times f_{\rm REF}, & f_{\rm LP2} &= 0.75 \times f_{\rm REF} \\ f_{\rm HP1} &= 2 \times f_{\rm REF}, & f_{\rm HP2} &= 4.04 \times f_{\rm REF} \end{aligned} \tag{2}$$

where  $f_{\text{LP1}}$  and  $f_{\text{LP2}}$  denote the low-pass poles, while  $f_{\text{HP1}}$  and  $f_{\text{HP2}}$  denote the high-pass poles. As shown in Fig. 9, the location of the *RC* LPF pole plays a critical role in both spur suppression and PM. When the LPF pole is close to the origin, the spurious tone is suppressed at the expense of a lower PM. Conversely, the PM improves as the LPF pole moves away from the origin, but the spur suppression provided by the LPF becomes negligible once the pole is beyond  $f_{\text{REF}}$ . Therefore, a distinct tradeoff exists in the Chip1 implementation.
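As a rough illustration of how (1) and (2) translate into component values, the following sketch sizes  $R_{BSF}$  for a chosen  $C_{BSF}$  and lists the resulting pole frequencies. The 12.8-MHz null frequency is taken from Table I, whereas the 1-pF capacitor is a hypothetical value chosen only for illustration and is not taken from the article.

```python
import math

F_REF = 12.8e6   # notch null frequency from Table I (Hz)
C_BSF = 1e-12    # hypothetical unit capacitor (F), not a value from the article


def null_frequency(r_bsf, c_bsf):
    """Null frequency of the Twin-T notch filter according to (1)."""
    return 1.0 / (4.0 * math.pi * r_bsf * c_bsf)


def resistor_for_null(f_null, c_bsf):
    """Solve (1) for R_BSF given the target null frequency and C_BSF."""
    return 1.0 / (4.0 * math.pi * f_null * c_bsf)


# Normalized pole locations from (2), expressed as multiples of f_REF.
POLE_FACTORS = {"f_LP1": 0.5, "f_LP2": 0.75, "f_HP1": 2.0, "f_HP2": 4.04}

r_bsf = resistor_for_null(F_REF, C_BSF)
poles_hz = {name: factor * F_REF for name, factor in POLE_FACTORS.items()}

print(f"R_BSF = {r_bsf / 1e3:.2f} kOhm for C_BSF = {C_BSF * 1e12:.1f} pF")
for name, freq in poles_hz.items():
    print(f"{name} = {freq / 1e6:.2f} MHz")
```

Because (1) fixes only the product  $R_{BSF} \times C_{BSF}$ , a smaller capacitor can be traded for a larger resistor, which is one way to reason about the area budget mentioned above.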






Fig. 11. Magnitude response of the MSSF and the RC LPF.

#### B. Chip2: MSSF With Twin-T Notch Filter


The second approach replaces the passive RC LPF with a discrete-time filter. Discrete-time filters have proven more useful than continuous-time filters for spur reduction in Type-II CP PLLs [33]. Recently, Kong and Razavi [30] proposed a Type-I PLL using a modified discrete-time filter called an MSSF. Fig. 10 shows the schematic of the MSSF. Because  $\varphi_1$ and  $\varphi_2$  are implemented as nonoverlapping clocks, there is no "direct" connection between the PD output and the VCO input, and hence, isolation is achieved. Furthermore, it also features a larger frequency capture range, between  $2N f_{\text{REF}}/3$  and  $2N f_{\text{REF}}$ , where N denotes the division ratio [30]. Here, a brief comparison of the LPF and the MSSF is provided. Both implementations have low complexity, except that the MSSF requires a nonoverlapping clock generator. The most significant difference can be observed in Fig. 11: the MSSF offers wider bandwidth, as the equivalent pole is pushed away from the band-of-interest, and better spur rejection because of its inherent sinc behavior. It is also noted here that the overall bandwidth improvement will be limited by the Twin-T notch filter, while the filtering benefit is still preserved. The details will be discussed in the following.

Based on the results reported in [29] and [30], which employ an MSSF in a Type-I PLL, approximately 40-dB suppression can be achieved. However, this suppression might not be enough to maintain the spectral purity of the sub-GHz TX. To solve this problem, the same Twin-T notch filter used in Chip1 is included after the MSSF for additional reference spur suppression within the band-of-interest.

In [15], a complete derivation of the transfer function and the first-order PM prediction are presented and discussed. However, a simple first-order observation is lacking especially after including the Twin-T notch filter along with the MSSF. Considering the charge sharing effect between two capacitors (i.e.,  $C_{MSSF1}$  and  $C_{MSSF2}$ ) and neglecting the notch effect



Fig. 12. PM against  $f_{\text{UGB}}$  estimation using (4) and first- and second-order Padé approximations for Fig. 13(d).

(due to the sinc function), the transfer function can be derived as [30]

$$H_{\text{MSSF}} = \frac{1}{1 + \left(\frac{C_{\text{MSSF2}}}{C_{\text{MSSF1}}} \times \frac{1}{f_{\text{CK}}}\right) \times s} \times e^{-j2\pi f \times \frac{T_{\text{CK}}}{2}} \tag{3}$$

where  $H_{MSSF}$  represents the MSSF transfer function, and  $f_{CK}$ and  $T_{CK}$  represent the frequency and period of  $\varphi_1$ ( $\varphi_2$ ), respectively. Two observations follow: 1) the pole location is determined by the ratio of the two sampling capacitors and the reference frequency and 2) an additional time delay equal to half the reference clock cycle is incurred due to the sampling action of the MSSF. The phase response of a system with a time delay T (Laplace equivalent  $e^{-sT}$ ) is obtained from the original phase response (with no time delay) by shifting the phase back by  $\omega T$. If the ratio of  $C_{MSSF1}$  and  $C_{MSSF2}$  is chosen large, the phase contribution of the MSSF pole can be neglected. Combining the effects from (2) and (3), the loop phase shift  $(PS_{total})$  can be expressed as

$$\begin{aligned} PS_{\text{total}} ={}& \pi/2 + \pi \times f/f_{\text{REF}} + \tan^{-1}(f/f_{\text{LP1}}) + \tan^{-1}(f/f_{\text{LP2}}) \\ &+ \tan^{-1}(f/f_{\text{HP1}}) + \tan^{-1}(f/f_{\text{HP2}}) - \tan^{-1}(f/f_{z1,...,z4}) \end{aligned} \tag{4}$$

where the first term is a constant contributed by the VCO pole, the second term is due to the phase shift from the exponential term, the third to sixth terms are due to the phase shift from the poles of the Twin-T notch filter, and the last term is the cumulative phase shift due to the four zeros of the notch filter. As the zeros of the notch filter occur at much higher frequencies, their contribution will be neglected in estimating the total phase shift. Fig. 12 shows the estimated PM versus frequency to investigate the maximum achievable unity-gain bandwidth ( $f_{\text{UGB}}$ ). The maximum  $f_{\text{UGB}}$  is 0.23  $f_{\text{REF}}$  (where the PM falls to zero), while an  $f_{\text{UGB}}$  of 0.11  $f_{\text{REF}}$  is required for a PM of 45°. The PM estimation obtained by modeling the loop delay using the Padé approximation is also presented. As observed, the estimation error increases significantly as the frequency-of-interest increases, which implies that a higher order approximation is required. However, a higher order approximation makes it difficult to obtain an intuitive estimation. Admittedly, the loop delay equivalently "shifts" the pole locations toward the low-frequency region. However, the effective locations (after incorporating the shift in poles due to loop delay) are still far away from the possible pole locations with the RC LPF, relaxing



Fig. 13. Tradeoff between spur and PM for different types of loop filter implementations. (a) RC LPF only. (b) RC LPF with a Twin-T notch filter. (c) MSSF only. (d) MSSF with a Twin-T notch filter.

the tradeoff mentioned in the first approach. Note that this equivalent-pole estimation for the MSSF applies only to the phase response and not to the amplitude response. This phenomenon can be explained by modeling the time delay  $(e^{-sT})$  using the first- and second-order Padé approximations [34]

$$\left. e^{-sT} \right|_{1,1} \approx \frac{1 - \frac{1}{2}sT}{1 + \frac{1}{2}sT} \tag{5}$$

$$\left. e^{-sT} \right|_{2,2} \approx \frac{1 - \frac{1}{2}sT + \frac{1}{12}(sT)^2}{1 + \frac{1}{2}sT + \frac{1}{12}(sT)^2}. \tag{6}$$

Equation (5) indicates that there is one left-half-plane pole and one right-half-plane zero at the same distance from the origin. This implies that the amplitude response remains unity (i.e., 0 dB) since the magnitude contributions of the pole and zero cancel each other. However, the right-half-plane zero contributes the same phase lag as the left-half-plane pole, leading to excessive phase degradation. It is also worth mentioning that the purpose of the previous estimation is to provide a fast and simple way to predict stability. Behavioral simulation tools, such as CppSim, or transistor-level simulation tools, such as Cadence, are, thus, highly recommended for accurate estimations.
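The first-order PM estimate from (4) and the effect of the Padé orders can be reproduced with a short numerical sketch. It assumes the normalized Twin-T pole locations from (2), a loop delay of half a reference period, and neglected notch zeros and MSSF pole, as in the discussion above; the function names are hypothetical and the bisection search is only one convenient way to locate the frequency where the PM reaches zero.

```python
import math

# Twin-T pole locations from (2), normalized to f_REF.
POLES = (0.5, 0.75, 2.0, 4.04)


def delay_lag(x, model="exact"):
    """Phase lag (rad) of the half-period MSSF delay at x = f / f_REF."""
    wt = math.pi * x  # omega * T with T = T_REF / 2
    if model == "exact":
        return wt
    if model == "pade11":   # first-order Pade approximation of the delay
        return 2.0 * math.atan(wt / 2.0)
    if model == "pade22":   # second-order Pade approximation of the delay
        return 2.0 * math.atan2(wt / 2.0, 1.0 - wt ** 2 / 12.0)
    raise ValueError(f"unknown delay model: {model}")


def phase_margin_deg(x, model="exact"):
    """PM = 180 deg minus the total phase shift of (4), notch zeros neglected."""
    lag = math.pi / 2.0 + delay_lag(x, model) + sum(math.atan(x / p) for p in POLES)
    return 180.0 - math.degrees(lag)


def max_ugb(model="exact", lo=0.01, hi=1.0):
    """Bisection for the normalized frequency at which the PM drops to zero."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phase_margin_deg(mid, model) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for model in ("exact", "pade11", "pade22"):
    print(f"{model}: PM at 0.11 f_REF = {phase_margin_deg(0.11, model):.1f} deg, "
          f"max f_UGB = {max_ugb(model):.3f} f_REF")
```

Under these assumptions, the exact-delay model gives a PM close to 45° at 0.11  $f_{\text{REF}}$  and a zero-PM crossing slightly above 0.22  $f_{\text{REF}}$ , in line with the values quoted for Fig. 12. The first-order Padé model underestimates the delay lag, so it is optimistic about the PM as the frequency increases.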

The maximum  $f_{\text{UGB}}$  that can be obtained for both approaches (Chip1/Chip2) using (4) is also of interest to investigate and compare. Fig. 13 shows the pole locations for the two approaches. A conventional Type-I PLL comprises only an RC LPF, as shown in Fig. 13(a). Though the maximum achievable bandwidth is large, this type of architecture suffers from the spurious issue mentioned earlier. To mitigate it, one can keep increasing the time constant of the LPF at the cost of a large silicon area and limited bandwidth [30]. A Twin-T notch filter inserted after the RC LPF overcomes this issue by suppressing the spurious components while achieving a maximum  $f_{\text{UGB-max}}$  of 0.18  $f_{\text{REF}}$ , as shown in Fig. 13(b). With only the MSSF, the maximum  $f_{\text{UGB-max}}$  is increased to 0.5  $f_{\text{REF}}$ , as shown in Fig. 13(c). The same Twin-T notch is inserted after the MSSF for spur suppression, resulting in a maximum  $f_{\text{UGB-max}}$  of 0.23  $f_{\text{REF}}$ . As observed, the pole of the MSSF is placed far away from the reference frequency, thus having an insignificant influence on the loop response. It can also be observed, comparing Fig. 13(b) and (d), that the

maximum achievable bandwidth is improved by replacing the *RC* LPF with the MSSF while preserving the same Twin-T notch filter. Next, spur suppression is discussed for the four cases in Fig. 13.

To evaluate the effectiveness of the two approaches, the power spectral density of the VCO voltage control line is presented in Fig. 14, corresponding to the four cases in Fig. 13. Fig. 14(a) shows only 39.54- and 24.7-dBc suppressions for the first and second harmonics (relative to the desired signal at dc) with the RC LPF only [see Fig. 13(a)]. After applying the Twin-T notch filter, additional 20.2- and 7.0-dB suppressions are observed for the first and second harmonics, respectively, as shown in Fig. 14(b). Fig. 14(c) and (d) shows the comparison between MSSF only and MSSF in series with a Twin-T notch filter, respectively. As observed, 18.5- and 6-dB improvements are obtained on the first and second harmonics, respectively. It is also worth comparing Fig. 14(a) and (c): the MSSF notches the harmonics of the reference frequency due to the fundamental nature of the sampling itself, so its suppression is better than that of the RC LPF only. Similarly, comparing Fig. 14(b) and (d) shows that the slightly larger achievable bandwidth alleviates the tradeoff compared with our first implementation. To conclude this analysis, these results show that the two proposed approaches are suitable for compact, low-power, and robust sub-GHz ULP TXs. Furthermore, the second approach provides a stable phase correction loop that can suppress spurs over the targeted band-of-interest.

Fig. 15 summarizes and compares the tradeoff for the four cases, combining the spur rejection from Fig. 14. The MSSF with a Twin-T notch filter [see Fig. 13(d)] improves the PM by nearly 10° and the spur suppression by nearly 24 dB for the same unity-gain bandwidth. Moreover, the tradeoff is also alleviated with improved bandwidth and the use of a digital-intensive architecture compared with the Type-II architecture. If one requires a wideband PLL to suppress the VCO noise, the Type-I PLL serves as a better candidate than the Type-II architecture (whose bandwidth is usually less than 1/10  $f_{REF}$ ) due to its inherently wideband nature. However, if the spur is of concern, the Twin-T notch filter can suppress the reference spur significantly, though with a slight degradation in available bandwidth. One should pick the architecture (i.e., type, order,




Fig. 14. Comparison of voltage control line power spectral densities corresponding to the loop filters in Fig. 13(a)–(d). (a) RC LPF only. (b) RC LPF with a Twin-T notch filter. (c) MSSF only. (d) MSSF with a Twin-T notch filter.



Fig. 15. Phase-margin, bandwidth, and spur suppression comparison for four cases in Fig. 14.

and loop filter) based on the targeted applications for optimal design.

#### C. Stability and Noise Analysis for Both Implementations

The phase-domain model of the proposed system is shown in Fig. 16. Unlike [35] and [36], where the injection-locked technique acts as the proportional path while the PLL acts as the integral path, the lack of an integrator (i.e., charge pump) in our proposed system implies that both loops function as proportional loops. The effective system bandwidth (i.e.,  $BW_{TX}$ ) can be approximated by combining the bandwidths of the two loops, as described in [37]. The PLL sets the initial bandwidth, which can be further estimated using the analysis in Section III. Then, as the injection strength increases, the bandwidth of the injection-locked path increases, leading to an improvement of the  $BW_{TX}$  [37].

To validate the analysis, the loop gain is examined, as it determines both bandwidth and stability. Different from the integer-N PLL analysis, two transfer functions representing phase realignment  $[H_{rl}(s)]$  and the upconversion of reference noise  $[H_{up}(s)]$  have been included in the linearized model, as shown in Fig. 16, for the injection lock mechanism.  $H_{rl}(s)$  and  $H_{up}(s)$  are expressed as [38]

$$H_{\rm rl}(s) = 1 - \frac{\beta}{1 + (\beta - 1) \times e^{-sT_{\rm REF}}} e^{-sT_{\rm REF}/2} \frac{\sin(\omega T_{\rm REF}/2)}{\omega T_{\rm REF}/2} \tag{7}$$

$$H_{\rm up}(s) = \frac{N \times \beta}{1 + (\beta - 1) \times e^{-sT_{\rm REF}}} e^{-sT_{\rm REF}/2} \frac{\sin(\omega T_{\rm REF}/2)}{\omega T_{\rm REF}/2} \tag{8}$$

where  $\beta$  indicates phase realignment ratio, N represents the divided (injection) ratio,  $T_{\text{REF}}$  is the reference clock period,



Fig. 16. Linearized phase noise model for harmonically injection-locked PLL [38].



Fig. 17. Loop gain and phase response of (a) *RC* LPF w/notch and (b) MSSF w/notch.

and  $\omega$  is the angular frequency.  $\beta$  ranges from 0 to 1, where  $\beta = 0$  represents pure PLL architecture (the injection-locked path does not exist), while  $\beta = 1$  indicates complete phase alignment.

In the following analysis, we have selected  $\beta = 0.2, 0.5$ , or 0.8. With  $H_{\rm LF}(s)$  denoting the loop-filter transfer function, the loop gain can be expressed as

$$\text{LG}(s) = K_{\rm PD} \times H_{\rm LF}(s) \times \frac{K_{\rm VCO}}{s} \times H_{\rm rl}(s) \times \frac{1}{N}.$$
 (9)

TABLE I Modeling Parameters

| Block             | Parameter              | Value                |
|-------------------|------------------------|----------------------|
| Reference         | Frequency              | 12.8 MHz             |
| VCO               | Gain, K<sub>VCO</sub>  | 300M rad/s           |
| Divider/injection | Ratio, N               | 8                    |
| Phase detector    | Gain, K<sub>PD</sub>   | 2/π                  |
| RC LPF            | Time constant          | 7×10<sup>-8</sup> s  |
| Notch filter      | Null frequency         | 12.8 MHz             |
| MSSF              | Capacitor ratio        | 16                   |

Table I lists the critical parameters for modeling. Fig. 17 shows the magnitude and phase response of the loop gain, including the notch filters. As  $\beta$  increases, the bandwidth and PM increase, while the loop gain decreases, which is consistent with the conclusion drawn in [38]. Interestingly, the loop gain is flat at low frequencies rather than falling at -20 dB/decade from the origin. This is because  $H_{rl}(s)$  has a high-pass characteristic, which cancels the effect of the VCO pole (in a Type-II PLL, the charge pump contributes another pole located at the origin, leading to -20 dB/decade). In summary, as indicated in [38] and from our observations, stronger injection increases the PM and lowers the loop gain, which enhances the effective system bandwidth and stability.
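The flat low-frequency loop gain can be checked with a first-order expansion of (7). The following is a sketch under stated assumptions rather than a result taken from [38]: it assumes $|sT_{\rm REF}| \ll \beta$, a loop filter with unity dc gain, and $K_{\rm VCO}$ from Table I interpreted in rad/s/V.

$$\frac{\beta}{1 + (\beta - 1) e^{-sT_{\rm REF}}} \approx 1 - \frac{1-\beta}{\beta}\, sT_{\rm REF}, \qquad e^{-sT_{\rm REF}/2} \approx 1 - \frac{sT_{\rm REF}}{2}, \qquad \frac{\sin(\omega T_{\rm REF}/2)}{\omega T_{\rm REF}/2} \approx 1$$

$$H_{\rm rl}(s) \approx sT_{\rm REF}\,\frac{2-\beta}{2\beta} \quad\Rightarrow\quad \text{LG}(s)\Big|_{s \to 0} \approx \frac{K_{\rm PD}\, K_{\rm VCO}\, T_{\rm REF}}{N} \times \frac{2-\beta}{2\beta}$$

The factor $s$ in $H_{\rm rl}(s)$ cancels the $1/s$ of the VCO, so the loop gain tends to a constant. With the Table I values, this approximation evaluates to roughly 8.4, 2.8, and 1.4 (about 18.5, 8.9, and 2.9 dB) for $\beta = 0.2$, 0.5, and 0.8, respectively, which reproduces the trend that a larger $\beta$ lowers the loop gain.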

For the noise analysis, we again refer to the linearized phase model in Fig. 16. The reference (REF) noise is denoted as  $\Theta_{\text{REF}}$ , and the divider (DIV) noise and PD noise are combined and denoted as  $\Theta_{\text{PD/DIV}}$ ; all of them are treated as input noise. Note that, in a conventional integer-N PLL, the reference noise and the PD/divider noise share the same transfer function. However, in the injection-locked system, since an additional phase alignment function is introduced, the reference noise transfer function is different and needs to be considered separately. The essential noise-related transfer functions can be derived as

$$\frac{\Theta_{\text{Out}}}{\Theta_{\text{REF}}}(s) = \frac{K_{\text{PD}} \times H_{\text{LF}}(s) \times \frac{K_{\text{VCO}}}{s} \times H_{\text{rl}}(s) + H_{\text{up}}(s)}{1 + \text{LG}(s)} \quad (10)$$

$$\frac{\Theta_{\text{Out}}}{\Theta_{\text{PD/DIV}}}(s) = \frac{K_{\text{PD}} \times H_{\text{LF}}(s) \times \frac{K_{\text{VCO}}}{s} \times H_{\text{rl}}(s)}{1 + \text{LG}(s)}$$
(11)

$$\frac{\Theta_{\text{Out}}}{\Theta_{\text{VCO}}}(s) = \frac{H_{\text{rl}}(s)}{1 + \text{LG}(s)}.$$
(12)
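As a numerical illustration of (7), (9), and (12), the following Python sketch evaluates the loop gain and the VCO noise transfer function with the Table I values. It is a minimal sketch under assumptions that are not from the original: the loop filter is reduced to the first-order *RC* LPF (the Twin-T notch is omitted), $K_{\rm VCO}$ is read as rad/s/V, and $K_{\rm PD}$ as V/rad. The function names are hypothetical, and the results are not expected to match Fig. 17 or Fig. 18 exactly.

```python
import numpy as np

# Table I values. Units of K_VCO (rad/s/V) and K_PD (V/rad) are assumptions.
F_REF = 12.8e6
T_REF = 1.0 / F_REF
K_VCO = 300e6
K_PD = 2.0 / np.pi
N = 8
TAU_RC = 7e-8


def h_rl(f, beta):
    """Eq. (7) evaluated at s = j*2*pi*f."""
    s = 1j * 2 * np.pi * f
    realign = beta / (1 + (beta - 1) * np.exp(-s * T_REF))
    # np.sinc(x) = sin(pi*x)/(pi*x), so np.sinc(f*T_REF) = sin(wT/2)/(wT/2).
    return 1 - realign * np.exp(-s * T_REF / 2) * np.sinc(f * T_REF)


def h_lf(f):
    """Assumed loop filter: first-order RC low-pass only (notch omitted)."""
    return 1 / (1 + 1j * 2 * np.pi * f * TAU_RC)


def loop_gain(f, beta):
    """Eq. (9)."""
    s = 1j * 2 * np.pi * f
    return K_PD * h_lf(f) * (K_VCO / s) * h_rl(f, beta) / N


def vco_noise_tf(f, beta):
    """Eq. (12)."""
    return h_rl(f, beta) / (1 + loop_gain(f, beta))


for beta in (0.2, 0.5, 0.8):
    for f in (1e3, 1e4, 1e5):
        lg_db = 20 * np.log10(abs(loop_gain(f, beta)))
        vco_db = 20 * np.log10(abs(vco_noise_tf(f, beta)))
        print(f"beta={beta:.1f} f={f:8.0f} Hz  |LG|={lg_db:6.2f} dB  |VCO TF|={vco_db:7.2f} dB")
```

In this simplified model, the loop-gain magnitude stays nearly constant at low offsets and drops as $\beta$ increases, while the VCO noise transfer function is high-pass.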

Fig. 18 shows the simulated phase noise for the MSSF with a notch filter included. (Note that the phase noise of the *RC* LPF with a notch filter is almost the same as that in Fig. 18; therefore, we present only one of the cases.) The phase noise of the free-running VCO is extracted directly from Cadence and then imported into MATLAB, where it is shaped by the noise transfer functions to obtain the estimate shown in Fig. 18. Because the noise transfer function model is shared between the injection-locked technique and the Type-I PLL [39], there is ideally no significant transition point compared with the Type-II injection-locked PLL topology, given that the bandwidth of the PLL is greater than the injection-locked bandwidth. However, the dependence of the transition frequency on  $\omega_{PLL}$  and  $\omega_{IL}$  stated in [27] is still valid.

Fig. 18. Simulated phase noise (MSSF w/notch).

We now incorporate the noise behavior of the MSSF [30] to develop the complete design procedure. Because of the sampling nature of the MSSF, the kT/C noise must be considered. As suggested in [30], a larger value of  $C_{MSSF2}$  reduces the noise contribution. Therefore, combined with the stability consideration, the overall process for designing the MSSF is suggested as follows [30].

- 1) Choose  $C_{MSSF2}$  to satisfy the noise requirement. As an initial pick,  $C_{MSSF1}$  is chosen to be at least  $10 \times$  larger to push the effective pole away from the origin while maintaining a reasonable area budget.
- 2) Based on the target application, select a Type-I or Type-II PLL. The Type-I PLL is the better candidate when a wide bandwidth is needed, since the bandwidth of a Type-II PLL is usually less than  $1/10 f_{REF}$ .
- 3) For spur suppression, the Twin-T notch filter can be selected to effectively suppress the reference spur at the cost of the available bandwidth.
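Step 1) can be illustrated with the textbook kT/C expression for the rms noise voltage sampled onto a capacitor. The sketch below is only a first-order aid: the capacitor values are hypothetical (the source does not give $C_{MSSF2}$), the temperature of 300 K is assumed, and the mapping from this voltage noise to output phase noise is not modeled.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ktc_noise_vrms(cap_farads, temp_kelvin=300.0):
    """RMS sampled noise voltage sqrt(kT/C) on a capacitor."""
    return math.sqrt(K_B * temp_kelvin / cap_farads)


# Hypothetical candidate values for C_MSSF2 (not taken from the paper).
for c_pf in (0.5, 1.0, 2.0, 4.0):
    v_uv = ktc_noise_vrms(c_pf * 1e-12) * 1e6
    print(f"C = {c_pf:3.1f} pF -> kT/C noise = {v_uv:5.1f} uV rms")
```

Because the noise voltage scales as $1/\sqrt{C}$, quadrupling the capacitor only halves the rms noise, which is why the noise target has to be traded against the area budget.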

## IV. IMPLEMENTATION OF SUBHARMONIC INJECTION-LOCKED TX WITH PHASE CORRECTION LOOP

This section describes the implementation details of the proposed ULP TX. As shown in Fig. 19(a) and (b), both designs share similar architectures described in [16] except the loop filter, the nonoverlapping clock generator, and the static CMOS clock divider (ratios of 5 and 8 for Chip1 and Chip2, respectively).

The schematic of the RO, including the delay cell, is shown in Fig. 20. The nine-stage single-ended structure is chosen for low power and compact area. Fig. 20(b) shows the schematic of the delay cell that is used in Fig. 20(a). The frequency tuning mechanism is achieved by using both coarse and fine-tuning.

The coarse tuning is realized using a 7-bit capacitor array to meet the required tuning range as well as provide overlapping areas to avoid any dead-zone after fine-tuning (implemented by body bias control). Admittedly, the limited tuning range of the body-bias method can be overcome with the coarse tuning method by increasing the resolution of the coarse capacitor.

Fig. 19. Schematic of the proposed TXs. (a) Chip1 with RC LPF and Twin-T notch filter. (b) Chip2 with MSSF and Twin-T notch filter.

Fig. 20. (a) RO and (b) its delay cell implementation.

Fig. 21. Schematic of (a) divide by 5 clock divider, (b) divide by 8 clock divider, (c) nonoverlapping clock generator, and (d) PG.

Fig. 21(a) and (b) shows the schematic of the static clock divider with a division ratio of 5 [27] and 8, respectively. Fig. 21(c) shows the schematic of the nonoverlapping clock generator using NAND gates. It is also critical to keep the mismatch among all the RO phases low. Hence, the delay cell output load is compensated by adding extra routing metal on each connection and carefully verified using parasitic extraction tools, ensuring that the nine phases are matched to less than a femtofarad of capacitance. Fig. 21(d) shows the design of the PG. A fixed time delay is placed before the PG to ensure that the injection happens within the tolerable range, as explained in [27]. All the circuits in Fig. 19 are implemented using standard cells, which shows the digital-friendly nature and process scalability of the proposed architecture.

Fig. 22. Schematic of (a) buffer stages (buffer amplifier w/ data input) and (b) edge-combining PA.

Fig. 22 shows the schematic of the driver stages and the ECPA. The data path is also shown in Fig. 19(a). The input OOK data are first applied to the data buffer, driven by a power gating transistor,  $M_1$, which controls the supply of the last three buffers depending on the streaming OOK data. In addition, the buffer amplifier brings the signal from each of the RO outputs to the ECPA, which not only relaxes the required drive strength of the RO but also saves power.

After the buffer amplifier, each input with a different phase is fed into the ECPA, which is composed of nine NAND gates realized in pass-transistor logic (PTL) and connected in parallel. The ECPA serves a dual role as both the PA and the mixer that translates multiple low-frequency inputs into a single high-frequency carrier. In addition, the ECPA is the only block operating at the carrier frequency, while the rest of the blocks operate at 1/5th (Chip1) or 1/8th (Chip2) of the carrier frequency, thus minimizing the power consumption. Higher frequency multiplication ratios increase the layout complexity, power consumption, and mismatches. Hence, the ECPA ratio is chosen as 9 in this work. Following the ECPA, an off-chip matching network achieves a 50-$\Omega$ match with the antenna. Care has been taken to model the parasitics of the bond wire, pad, and PCB traces during simulations [15], yielding a first-order design methodology to emulate the real test environment.

Fig. 23. Die micrograph of (a) Chip1 [15] and (b) Chip2.

Fig. 24. Measured free-running RO tuning range (shared for both Chip1 and Chip2).

## V. MEASUREMENT RESULTS

The proposed ULP TXs have been designed and fabricated in 180-nm CMOS technology. Fig. 23 shows the chip micrographs of Chip1 and Chip2, which occupy 0.493 and 0.54 mm<sup>2</sup> (including pads), respectively. The active areas of Chip1 and Chip2 are 0.0413 and 0.0461 mm<sup>2</sup>, respectively.

Fig. 24 shows the measured tuning range of the RO as the body bias is varied to fine-tune the delay cell. Combined with the capacitive coarse tuning mechanism, the measured tuning range is from 75 to 155 MHz, which, after the 9× multiplication in the ECPA, provides sufficient coverage of the ISM band from 902 to 928 MHz. As both chips use the same RO, Fig. 24 represents the measured tuning range for both Chip1 and Chip2.
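The coverage claim can be checked with simple arithmetic. The sketch below assumes that the carrier equals nine times the RO frequency (the ECPA ratio chosen in Section IV). The helper names are hypothetical.

```python
ECPA_RATIO = 9                   # edge-combining ratio from Section IV
RO_RANGE_MHZ = (75.0, 155.0)     # measured RO tuning range
ISM_BAND_MHZ = (902.0, 928.0)    # target ISM band


def carrier_range(ro_range, ratio=ECPA_RATIO):
    """Carrier range reachable when the RO output is multiplied by `ratio`."""
    return (ro_range[0] * ratio, ro_range[1] * ratio)


def required_ro_range(band, ratio=ECPA_RATIO):
    """RO range needed to cover `band` after multiplication by `ratio`."""
    return (band[0] / ratio, band[1] / ratio)


def covers(outer, inner):
    """True if the interval `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print("Reachable carrier range (MHz):", carrier_range(RO_RANGE_MHZ))
print("Required RO range (MHz):", required_ro_range(ISM_BAND_MHZ))
print("ISM band covered:", covers(carrier_range(RO_RANGE_MHZ), ISM_BAND_MHZ))
```

Under this assumption, the RO only needs to span roughly 100.2 to 103.1 MHz, so the measured range leaves a wide margin on both sides.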

Fig. 25 shows the measured output return loss of less than -10 dB for both prototypes across the band of interest. The two curves have slightly different minimum values due to mismatches in the off-chip components and different PCB layouts.

Fig. 25. Measured output return loss for (a) Chip1 and (b) Chip2.

Fig. 26. Measured spectrum for Chip1 with (a) Twin-T notch filter disabled and (b) Twin-T notch filter enabled.

Fig. 27. Measured spectrum for Chip2 with (a) Twin-T notch filter disabled and (b) Twin-T notch filter enabled.

Figs. 26 and 27 show the measured spectra for Chip1 and Chip2; enabling the Twin-T notch filter improves the in-band spur suppression by 6.98 and 12.48 dB, respectively. Note that the reference clock frequency is 20.33 MHz for Chip1 and 12.8 MHz for Chip2. Hence, Chip2 originally has slightly worse in-band spur suppression (51 dBc) with no Twin-T notch filter but can achieve similar performance to Chip1 once the Twin-T notch filter is enabled. Because the two implementations have different design parameters, it is hard to compare the results directly without some normalization. First, the higher injection ratio, N, of Chip2 raises the spur by 4.08 dB [16, eq. (1)], assuming that the frequency error term is the same between the two implementations. Second, a similar spur difference of 4 dB is observed due to the higher  $f_{\text{REF}}$  [16, eq. (2)] (assuming the ripple amplitude is the same for simplicity). Third, because Chip1 has a higher  $f_{\text{REF}}$  than Chip2, its spurs are located slightly farther from the carrier frequency and are therefore attenuated more by the matching network. The out-of-band spurs are discussed below. These two figures clearly show the effectiveness of the proposed loop filter in suppressing the control-line ripple, thus minimizing the output spurious tones and maintaining the in-band spectral purity.
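The two normalization terms can be reproduced numerically. The equations of [16] are not restated here, so the sketch below assumes that the spur amplitude scales linearly with the injection ratio N and with the ratio of reference frequencies, which is consistent with the quoted 4.08 and 4 dB. It reproduces only the magnitudes, not the direction of each shift.

```python
import math


def ratio_to_db(ratio):
    """Amplitude ratio expressed in dB (20*log10)."""
    return 20 * math.log10(ratio)


N_CHIP1, N_CHIP2 = 5, 8                    # divider/injection ratios
FREF_CHIP1_MHZ, FREF_CHIP2_MHZ = 20.33, 12.8

n_term_db = ratio_to_db(N_CHIP2 / N_CHIP1)
fref_term_db = ratio_to_db(FREF_CHIP1_MHZ / FREF_CHIP2_MHZ)

print(f"Injection-ratio term: {n_term_db:.2f} dB")
print(f"Reference-frequency term: {fref_term_db:.2f} dB")
```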




Fig. 28. Characterization of (a) off-the-shelf BPF and (b) measured S-parameter.



Fig. 29. Measured spectrum for large span (Chip2) disabling the Twin-T notch filter (a) without off-chip BPF and (b) with off-chip BPF.

We now analyze the out-of-band spurious emissions observed in Figs. 26 and 27. Unlike other TX architectures in Fig. 2, the frequency multiplying architecture suffers from multiple spurious tones around the carrier frequency contributed by the reference frequency, the RO frequency, and their mixing products. In general, the first and second harmonics have the highest influence within the band, while the rest can be attenuated by the matching network. Among the out-of-band spurs, which are rarely investigated in previous works, the spurs due to the RO frequency contribute the most and cannot be suppressed using circuit techniques alone, since the carrier itself is generated by combining the RO phases. Hence, an additional matching network or filtering is required, as demonstrated here by using an off-chip BPF.

Fig. 28(a) shows the off-chip BPF, which is characterized using the S-parameter measurement shown in Fig. 28(b). The passband ranges from 895 to 935 MHz with around 3 dB of insertion loss, while the input/output return loss is maintained below -15 dB across the band. The results show that the input/output return loss and passband gain (loss) meet our test requirements.

Fig. 30. Measured spectrum for large span (Chip2) enabling the Twin-T notch filter (a) without off-chip BPF and (b) with off-chip BPF.

Fig. 31. Measured phase noise of (a) Chip1 and (b) Chip2.

Figs. 29 and 30 compare the spurious performance of Chip2 with the Twin-T notch filter disabled and enabled, respectively, over a large 500-MHz span with and without the BPF. As shown in Fig. 29(a), when the Twin-T notch filter is disabled, the RO-related harmonics dominate the out-of-band spurious performance. The off-chip BPF suppresses the out-of-band RO-related harmonics [see Fig. 29(b)]; however, the in-band spurs remain the same. Fig. 30(a) and (b) shows the measured spectrum over the same 500-MHz span with both the Twin-T notch filter and MSSF enabled, without and with the off-chip BPF in series with the output, respectively. Both the in-band and out-of-band spurious tones are greatly suppressed, demonstrating the importance of the proposed spurious tone reduction technique. We also observe that the low-side spur at 896 MHz (second harmonic) is not suppressed. The reason is that this spur still lies in the passband of the off-chip BPF, while the high-side spur at 947.2 MHz falls outside it. This can be remedied by customizing the BPF in future work.
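The asymmetry between the two second-harmonic spurs follows directly from the frequency plan. The sketch below assumes that the reference spurs sit at integer multiples of $f_{\text{REF}}$ on either side of the Chip2 carrier and treats the BPF passband (895 to 935 MHz) as an ideal brick wall. Both are simplifications, and the helper names are hypothetical.

```python
CARRIER_MHZ = 921.6            # Chip2 carrier
F_REF_MHZ = 12.8               # Chip2 reference clock
PASSBAND_MHZ = (895.0, 935.0)  # measured BPF passband, idealized as brick-wall


def reference_spurs(carrier, f_ref, max_order=3):
    """Map spur order k to the (low-side, high-side) spur frequencies."""
    return {k: (carrier - k * f_ref, carrier + k * f_ref)
            for k in range(1, max_order + 1)}


def in_passband(freq, band=PASSBAND_MHZ):
    """True if `freq` lies inside the idealized passband."""
    return band[0] <= freq <= band[1]


for order, (low, high) in reference_spurs(CARRIER_MHZ, F_REF_MHZ).items():
    print(f"order {order}: low {low:.1f} MHz (in band: {in_passband(low)}), "
          f"high {high:.1f} MHz (in band: {in_passband(high)})")
```

For the second order, the low-side spur lands at 896 MHz, just inside the passband, while the high-side spur at 947.2 MHz lies outside it, matching the measured behavior.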

One might argue that a BPF with a narrower passband could suppress all the harmonics while preserving the desired signal, thus relaxing the need for any circuit techniques to maintain spectral purity. This is true if the harmonics are far away from the center frequency, but the filter design becomes challenging when the frequency component to be filtered is extremely close to the carrier frequency.

Fig. 31 shows the measured TX output phase noise for the two prototypes. The measured phase noise at a 1-MHz offset is -82.1 and -96.2 dBc/Hz, respectively. The RO phase noise can be estimated by subtracting  $20\log(9)$  from the TX phase noise (where 9 is the number of RO phases), yielding -101.2 and -115 dBc/Hz at a 1-MHz frequency offset, respectively. The rms jitter performance is estimated from the phase noise plot as 92.67 and 85.25 ps for Chip1 and Chip2, respectively, integrated from 1- to 1-MHz offsets.
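The RO phase noise estimate is a one-line calculation. The sketch below assumes ideal frequency multiplication, so that the phase noise scales by $20\log_{10}(9)$ between the RO and the TX output. The function name is hypothetical.

```python
import math


def ro_phase_noise_dbc_hz(tx_phase_noise_dbc_hz, multiplication=9):
    """Refer TX output phase noise back to the RO, assuming ideal multiplication."""
    return tx_phase_noise_dbc_hz - 20 * math.log10(multiplication)


for chip, tx_pn in (("Chip1", -82.1), ("Chip2", -96.2)):
    print(f"{chip}: TX {tx_pn:.1f} dBc/Hz -> RO {ro_phase_noise_dbc_hz(tx_pn):.1f} dBc/Hz")
```

The results agree in magnitude with the quoted values of about 101.2 and 115 dBc/Hz.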

Fig. 32. Measured OOK-modulated signal. (a) Chip1. (b) Chip2.

TABLE II

Performance Summary and Comparison With State-of-the-Art Sub-GHz TX

|  | [1]<br>JSSC<br>11' | [18]<br>TMTT<br>12' | [20]<br>JSSC<br>14' | [41]<br>TMTT<br>14' | [12]<br>TMTT<br>18' | [25]<br>MWCL<br>19' | [13]<br>TCAS-1<br>19' | [14]<br>TCAS-1<br>20' | This work<br>Chip1 [16]<br>RFIC 20' | This work<br>Chip2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency<br>(MHz) | 400 | 915 | 900 | 403 | 915 | 915 | 433 | 484 | 915 | 921.6 |
| Architecture | Freq.<br>Multi. | VCO<br>Based | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. | Freq.<br>Multi. |
| Freq. multi.<br>ratio | 9 | N/A | 4 | 8 | 3 | 3 | 9 | 4 | 9 | 9 |
| Carrier generation | Inj<br>locked | Inj<br>locked | Inj<br>locked<br>w/ FLL | Inj<br>locked | FLL | PLL | Inj<br>locked<br>w/ PLL | PLL | Inj<br>locked<br>w/ PLL | Inj<br>locked<br>w/ PLL |
| Frequency<br>calibration | Open<br>loop | Closed<br>loop | Closed<br>loop | Open<br>loop | Closed<br>loop | Closed<br>loop | Closed<br>loop | Closed<br>loop | Closed<br>loop | Closed<br>loop |
| Spur<br>suppression<br>(dB) | 44.4 | N/A | 56.0 | N/A | N/A | 49.6 | 50.0 | 40.0 | 62.1 | 63.9 |
| Pout<br>(dBm) | -17 | -3.3 | -9/<br>-15 | -17 | -10/<br>-15 | 5.5 | -24 | -20 | -14 | -14 |
| Pdc<br>(μW) | 90 | 5880 | 3300/<br>2600 | 3320 | 935/<br>620 | 11100 | 248 | 170 | 200.9<sup>3</sup><br>(OOK)<br>258.6<sup>3</sup><br>(Peak) | 223.12<sup>3</sup><br>(OOK)<br>287.35<sup>3</sup><br>(Peak) |
| Power<br>efficiency<br>(mW/GHz) | 0.225 | 6.426 | 2.88 | 8.23 | 0.67 | 12.13 | 0.572 | 0.351 | 0.282 | 0.311 |
| Data rate<br>(kb/s) | 200 | 50000 | 100000 | 1000 | 3000 | 100 | 1000 | 1000 | 3000 | 20000 |
| Data<br>modulation | BFSK | QPSK | QPSK/<br>16QAM | OOK/O-<br>QPSK | BFSK | BFSK | OOK | BFSK | OOK | OOK |
| TX phase<br>noise @1MHz<br>(dBc/Hz) | -103 | -125<sup>4</sup> | -100.8 | -110.19 | -94.5 | -94<sup>4</sup> | -118.58<sup>4</sup> | -63.6<sup>5</sup> | -82.1 | -96.2 |
| Matching<br>(number of<br>off-chip<br>components) | Off-chip<br>(3) | On-chip<br>(0) | Off-chip<br>(2) | Off-chip<br>(2) | Off-chip<br>(3) | Off-chip<br>(N/A) | Off-chip<br>(4)<sup>2</sup> | Off-chip<br>(2) | Off-chip<br>(3) | Off-chip<br>(3) |
| Energy Eff.<br>(pJ/bit) | 450 | 117.6 | 33/<br>26 | 3320 | 311.66/<br>206.66 | 111000 | 248 | 170 | 66.966 | 12.5 |
| FoM<sup>1</sup> | 22.53 | 0.2514 | 0.26/ | 166.39 | 3.11/ | 31.28 | 62.29 | 17 | 1.68 | 0.313 |

<sup>1</sup>FoM = P<sub>DC</sub>/(Data Rate × P<sub>OUT</sub>); <sup>2</sup>Estimated from the die photo and schematic; <sup>3</sup>Excluding clock (Fig. 14) power consumption (CLK provided by signal generator); <sup>4</sup>PLL phase noise; <sup>5</sup>Measured under BFSK tone at 10 Mbps.

Fig. 32 shows the measured OOK-modulated TX transient output. Fig. 32(a) shows the results for Chip1 when applying an input data stream of 101010. Fig. 32(b) shows the results for Chip2 with 20-Mb/s streaming PRBS OOK data. This shows the ability to support low-to-medium data-rate transmission. Table II compares the proposed work with state-of-the-art sub-GHz TXs. A >3× improved energy efficiency under ULP operation is obtained while consistently achieving >60-dB spur suppression.
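The energy-efficiency and FoM entries of Table II can be recomputed from the power, data rate, and output power columns. The sketch below assumes that the FoM is expressed as energy per bit in pJ divided by output power in μW. This unit choice is an inference that reproduces the tabulated values, not something stated in the table footnote. The function names are hypothetical.

```python
def dbm_to_uw(p_dbm):
    """Convert power in dBm to microwatts."""
    return 1e3 * 10 ** (p_dbm / 10)


def energy_pj_per_bit(pdc_uw, rate_kbps):
    """Energy per bit in pJ from dc power (uW) and data rate (kb/s)."""
    return pdc_uw / rate_kbps * 1e3


def fom(pdc_uw, rate_kbps, pout_dbm):
    """FoM = P_DC / (data rate x P_OUT), in pJ/bit per uW (assumed units)."""
    return energy_pj_per_bit(pdc_uw, rate_kbps) / dbm_to_uw(pout_dbm)


# Rows taken from Table II: (label, P_DC in uW, data rate in kb/s, P_OUT in dBm)
for label, pdc, rate, pout in (("[1]", 90.0, 200.0, -17.0),
                               ("Chip1", 200.9, 3000.0, -14.0)):
    print(f"{label}: {energy_pj_per_bit(pdc, rate):.2f} pJ/bit, "
          f"FoM = {fom(pdc, rate, pout):.2f}")
```

For these two rows, the computed values agree with the tabulated ones to within rounding.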

## VI. CONCLUSION

This work demonstrated digital-intensive ULP ISM-band TXs with improved spur suppression using a Type-I phase correction loop with the injection-locked technique for sub-GHz applications. The Type-I loop successfully corrects the phase error, which effectively minimizes the spur induced by the injection-locked technique. The control voltage ripple due to the correction loop is suppressed using two methods. The first method uses a conventional RC low-pass filter with an area-efficient Twin-T notch filter in the Type-I loop, while the second method comprises a discrete-time filter with the same Twin-T notch filter. A detailed analysis is presented, which shows that the discrete-time filter with a Twin-T notch filter breaks the performance tradeoff between spur suppression, PM, and loop bandwidth that exists in the first method. Measurements show that the two prototypes can output -14 dBm with energy efficiencies of 66.97 and 12.5 pJ/bit while streaming 3- and 20-Mb/s OOK data, respectively. The average power consumption can be further minimized with aggressive duty cycling and technology scaling. Benefitting from the digital-intensive architecture, the proposed TX will, thus, be suitable for sub-GHz low-power IoE wireless sensor network applications.

#### REFERENCES

- [1] J. Pandey and B. P. Otis, "A sub-100 μW MICS/ISM band transmitter based on injection-locking and frequency multiplication," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1049–1058, May 2011.
- [2] J. Bae, L. Yan, and H.-J. Yoo, "A low energy injection-locked FSK transceiver with frequency-to-amplitude conversion for body sensor applications," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 928–937, Apr. 2011.
- [3] A. Saito *et al.*, "An all 0.5V, 1 Mbps, 315 MHz OOK transceiver with 38-µW career-frequency-free intermittent sampling receiver and 52-µW class-F transmitter in 40-nm CMOS," in *IEEE Symp. VLSI Circuits (VLSIC)*, Jun. 2012, pp. 38–39.
- [4] X. Huang, A. Ba, P. Harpe, G. Dolmans, H. De Groot, and J. Long, "A 915 MHz 120 μW-RX/900 μW-TX envelope-detection transceiver with 20 dB in-band interference tolerance," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2012, pp. 454–456.
- [5] C. Ma, C. Hu, J. Cheng, L. Xia, and P. Y. Chiang, "A near-threshold, 0.16 nJ/b OOK-transmitter with 0.18 nJ/b noise-cancelling super-regenerative receiver for the medical implant communications service," *IEEE Trans. Biomed. Circuits Syst.*, vol. 7, no. 6, pp. 841–850, Dec. 2013.
- [6] K. Natarajan, D. Gangopadhyay, and D. Allstot, "A PLL-based BFSK transmitter with reconfigurable and PVT-tolerant class-C PA for medradio & ISM (433 MHz) standards," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2013, pp. 67–70.
- [7] L. Xia, J. Cheng, N. E. Glover, and P. Chiang, "0.56 V,-20 dBm RF-powered, multi-node wireless body area network system-on-a-chip with harvesting-efficiency tracking loop," *IEEE J. Solid-State Circuits*, vol. 49, no. 6, pp. 1345–1355, Jun. 2014.
- [8] M. S. Jahan, J. Langford, and J. Holleman, "A low-power FSK/OOK transmitter for 915 MHz ISM band," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, May 2015, pp. 163–166.
- [9] Y.-L. Tsai, C.-Y. Lin, B.-C. Wang, and T.-H. Lin, "A 330-μW 400-MHz BPSK transmitter in 0.18 μm CMOS for biomedical applications," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 5, pp. 448–452, May 2016.
- [10] L.-X. Chuo et al., "7.4 A 915 MHz asymmetric radio using Q-enhanced amplifier for a fully integrated 3×3×3mm<sup>3</sup> wireless sensor node with 20m non-line-of-sight communication," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2017, pp. 132–133.
- [11] A. Srivastava, D. Das, P. Mathur, D. K. Sharma, and M. S. Baghini, "0.43-nJ/bit OOK transmitter for wearable and implantable devices in 400-MHz MedRadio band," *IEEE Microw. Wireless Compon. Lett.*, vol. 28, no. 3, pp. 263–265, Mar. 2018.
- [12] J. Zarate-Roldan *et al.*, "0.2-nJ/b fast start-up ultralow power wireless transmitter for IoT applications," *IEEE Trans. Microw. Theory Techn.*, vol. 66, no. 1, pp. 259–272, Jan. 2018.
- [13] H.-C. Cheng, Y.-T. Chen, P.-H. Chen, and Y.-T. Liao, "An optically-powered 432 MHz wireless tag for batteryless Internet-of-Things applications," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 9, pp. 3288–3295, Sep. 2019.
- [14] M. A. A. Ibrahim and M. Onabajo, "A low-power BFSK transmitter architecture for biomedical applications," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 5, pp. 1527–1540, May 2020.
- [15] C.-C. Lin, H. Hu, and S. Gupta, "Spur minimization techniques for ultralow-power injection-locked transmitters," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 67, no. 11, pp. 3643–3655, Nov. 2020.
- [16] C.-C. Lin, H. Hu, and S. Gupta, "A 66.97 pJ/bit, 0.0413 mm<sup>2</sup> self-aligned PLL-calibrated harmonic-injection-locked TX with >62 dBc spur suppression for IoT applications," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 323–326.
- [17] M. Vidojkovic et al., "9.7 A 0.33 nJ/b IEEE802.15.6/proprietary-MICS/ISM-band transceiver with scalable data-rate from 11 kb/s to 4.5 Mb/s for medical applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2014, pp. 170–171.
- [18] S. Diao et al., "A 50-Mb/s CMOS QPSK/O-QPSK transmitter employing injection locking for direct modulation," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 1, pp. 120–130, Jan. 2012.
- [19] A. C. W. Wong *et al.*, "A 1 V 5 mA multimode IEEE 802.15.6/Bluetooth low-energy WBAN transceiver for biotelemetry applications," *IEEE J. Solid-State Circuits*, vol. 48, no. 1, pp. 186–198, Jan. 2013.

- [20] X. Liu, M. M. Izad, L. Yao, and C.-H. Heng, "A 13 pJ/bit 900 MHz QPSK/16-QAM band shaped transmitter based on injection locking and digital PA for biomedical applications," *IEEE J. Solid-State Circuits*, vol. 49, no. 11, pp. 2408–2421, Nov. 2014.
- [21] Y.-H. Liu, L.-G. Chen, C.-Y. Lin, and T.-H. Lin, "A 650-pJ/bit MedRadio transmitter with an FIR-embedded phase modulator for medical micropower networks (MMNs)," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 12, pp. 3279–3288, Dec. 2013.
- [22] S.-J. Cheng, Y. Gao, W.-D. Toh, Y. Zheng, M. Je, and C.-H. Heng, "A 110 pJ/b multichannel FSK/GMSK/QPSK/π/4-DQPSK transmitter with phase-interpolated dual-injection DLL-based synthesizer employing hybrid FIR," in *Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, Feb. 2013, pp. 450–451.
- [23] K.-H. Teng and C.-H. Heng, "A 370-pJ/b multichannel BFSK/QPSK transmitter using injection-locked fractional-N synthesizer for wireless biotelemetry devices," *IEEE J. Solid-State Circuits*, vol. 52, no. 3, pp. 867–880, Mar. 2017.
- [24] K.-H. Teng *et al.*, "A 400 MHz wireless neural signal processing IC with 625 × on-chip data reduction and reconfigurable BFSK/QPSK transmitter based on sequential injection locking," *IEEE Trans. Biomed. Circuits Syst.*, vol. 11, no. 3, pp. 547–557, Jun. 2017.
- [25] K.-S. Choi *et al.*, "A 5.5-dBm, 31.9% efficiency 915-MHz transmitter employing frequency tripler and 207-μW synthesizer," *IEEE Microw. Wireless Compon. Lett.*, vol. 30, no. 1, pp. 90–93, Jan. 2020.
- [26] J.-C. Chien and L.-H. Lu, "Analysis and design of wideband injection-locked ring oscillators with multiple-input injection," *IEEE J. Solid-State Circuits*, vol. 42, no. 9, pp. 1906–1915, Sep. 2007.
- [27] J. Lee and H. Wang, "Study of subharmonically injection-locked PLLs," *IEEE J. Solid-State Circuits*, vol. 44, no. 5, pp. 1539–1553, May 2009.
- [28] W. Li, Y. Duan, and J. Rabaey, "A 200-Mb/s energy efficient transcranial transmitter using inductive coupling," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 2, pp. 435–443, Apr. 2019.
- [29] S. Yang, J. Yin, H. Yi, W.-H. Yu, P.-I. Mak, and R. P. Martins, "A 0.2-V energy-harvesting BLE transmitter with a micropower manager achieving 25% system efficiency at 0-dBm output and 5.2-nW sleep power in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1351–1362, May 2019.
- [30] L. Kong and B. Razavi, "A 2.4 GHz 4 mW integer-N inductorless RF synthesizer," *IEEE J. Solid-State Circuits*, vol. 51, no. 3, pp. 626–635, Mar. 2016.
- [31] S.-H. Wang and C.-C. Hung, "A 0.35-V 240-μW fast-lock and low-phase-noise frequency synthesizer for implantable biomedical applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 13, no. 6, pp. 1759–1770, Dec. 2019.
- [32] Twin-T Notch Filter Design Tool. Accessed: Mar. 9, 2020. [Online]. Available: http://sim.okawa-denshi.jp/en/TwinTCRkeisan.htm
- [33] K. J. Wang and I. Galton, "A discrete-time model for the design of type-II PLLs with passive sampled loop filters," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 58, no. 2, pp. 264–275, Feb. 2011.
- [34] J. R. Partington, "Some frequency-domain approaches to the model reduction of delay systems," Annu. Rev. Control, vol. 28, no. 1, pp. 65–73, Jan. 2004.
- [35] S. Yoo, S. Choi, J. Kim, H. Yoon, Y. Lee, and J. Choi, "A low-integrated-phase-noise 27–30-GHz injection-locked frequency multiplier with an ultra-low-power frequency-tracking loop for mm-wave-band 5G transceivers," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 375–388, Feb. 2018.
- [36] A. Musa, W. Deng, T. Siriburanon, M. Miyahara, K. Okada, and A. Matsuzawa, "A compact, low-power and low-jitter dual-loop injection locked PLL using all-digital PVT calibration," *IEEE J. Solid-State Circuits*, vol. 49, no. 1, pp. 50–60, Jan. 2014.
- [37] T. Yoshimura, "Study of injection pulling of oscillators in phase-locked loops," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 29, no. 2, pp. 321–332, Feb. 2021.
- [38] S. Ye, L. Jansson, and I. Galton, "A multiple-crystal interface PLL with VCO realignment to reduce phase noise," *IEEE J. Solid-State Circuits*, vol. 37, no. 12, pp. 1795–1803, Dec. 2002.
- [39] S. Kalia, M. Elbadry, B. Sadhu, S. Patnaik, J. Qiu, and R. Harjani, "A simple, unified phase noise model for injection-locked oscillators," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Baltimore, MD, USA, Jun. 2011, pp. 1–4.
- [40] H.-C. Chen, M.-Y. Yen, Q.-X. Wu, K.-J. Chang, and L.-M. Wang, "Batteryless transceiver prototype for medical implant in 0.18-μm CMOS technology," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 1, pp. 137–147, Jan. 2014.



Chung-Ching Lin (Student Member, IEEE) received the M.S. degree in communication engineering from Yuan Ze University, Taoyuan, Taiwan, in 2014. He is currently pursuing the Ph.D. degree at Washington State University, Pullman, WA, USA.

His current research interests include low-power and wideband multiantenna transceiver designs.

Mr. Lin was a recipient of the IEEE CICC Educational Grants Award in 2020, the IEEE CAS Travel Award in 2019, the Southern Methodist University Graduate Student Travel Grant in 2018, and the Yu-Ziang Academic Scholarship in 2013. He was also an IEEE RFIC Symposium Best Student Paper Award nominee (one of 12 finalists) in 2020.



Subhanshu Gupta (Senior Member, IEEE) received the B.E. degree from the National Institute of Technology (NIT) at Tiruchirappalli, Tiruchirappalli, India, in 2002, and the M.S. and Ph.D. degrees from the University of Washington, Seattle, WA, USA, in 2006 and 2010, respectively.

He has held industrial positions at Maxlinear, Irvine, CA, USA, where he worked on wideband transceivers for SATCOM and infrastructure applications. He is currently an Assistant Professor of electrical engineering and computer science with Washington State University, Pullman, WA, USA. His research interests include large-scale phased arrays and wideband transceivers, energy-efficient circuits and systems, and statistical hardware optimization for next-generation wireless communications, the Internet of Things, and quantum applications.

Dr. Gupta was a recipient of the National Science Foundation CAREER Award in 2019, the Department of Defense DURIP Award in 2021, and the Cisco Faculty Research Award in 2017. He was awarded the Analog Devices Outstanding Student Designer Award in 2008 and the IEEE RFIC Symposium Best Student Paper Award (third place) in 2011. He has served as a Guest Editor for the *IEEE Design & Test of Computers* in 2019. He serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I for the term 2020–2021.



Huan Hu (Student Member, IEEE) received the B.S. degree in electrical engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2013, and the M.S. degree from Oregon State University, Corvallis, OR, USA, in 2015. He is currently pursuing the Ph.D. degree in electrical engineering at Washington State University, Pullman, WA, USA.

His research interests include ultralow-power sensor interface designs, clock generation, and subthreshold circuit designs.

Mr. Hu was an IEEE RFIC Symposium Best Student Paper Award nominee (one of 12 finalists) in 2020.