# Fast Beam Training With True-Time-Delay Arrays in Wideband Millimeter-Wave Systems

Veljko Boljanovic, Student Member, IEEE, Han Yan, Member, IEEE,

Chung-Ching Lin, Student Member, IEEE, Soumen Mohapatra, Student Member, IEEE,

Deukhyoun Heo, Senior Member, IEEE, Subhanshu Gupta, Senior Member, IEEE,

and Danijela Cabric, Fellow, IEEE

Abstract-The best beam steering directions are estimated through beam training, which is one of the most important and challenging tasks in millimeter-wave and sub-terahertz communications. Novel array architectures and signal processing techniques are required to avoid prohibitive beam training overhead associated with large antenna arrays and narrow beams. In this work, we leverage recent developments in true-time-delay (TTD) arrays with large delay-bandwidth products to accelerate beam training using frequency-dependent probing beams. We propose and study two TTD architecture candidates, including analog and hybrid analog-digital arrays, that can facilitate beam training with only one wideband pilot. We also propose a suitable algorithm that requires a single pilot to achieve high-accuracy estimation of angle of arrival. The proposed array architectures are compared in terms of beam training requirements and performance, robustness to practical hardware impairments, and power consumption. The findings suggest that the analog and hybrid TTD arrays achieve a sub-degree beam alignment precision with 66% and 25% lower power consumption than a fully digital array, respectively. Our results yield important design trade-offs among the basic system parameters, power consumption, and accuracy of angle of arrival estimation in fast TTD beam training.

*Index Terms*— True-time-delay array, array architecture, beam training, millimeter-wave communication, wideband systems.

## I. INTRODUCTION

Abundant spectrum at millimeter-wave (mmW) frequencies is seen as the key resource for providing high data rates in the fifth generation of cellular systems [1].

Manuscript received November 9, 2020; revised January 5, 2021; accepted January 13, 2021. This work was supported in part by NSF under Grant 1718742, Grant 1955672, Grant 1955306, and Grant 1944688; and in part by the ComSenTer and CONIX Research Centers, two of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). This article was recommended by Associate Editor H. Sjoland. (*Corresponding author: Veljko Boljanovic.*)

Veljko Boljanovic, Han Yan, and Danijela Cabric are with the Department of Electrical and Computer Engineering, University of California at Los Angeles, Los Angeles, CA 90095 USA (e-mail: vboljanovic@ucla.edu; yhaddint@ucla.edu; danijela@ee.ucla.edu).

Chung-Ching Lin, Soumen Mohapatra, Deukhyoun Heo, and Subhanshu Gupta are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164 USA (e-mail: chung-ching.lin@wsu.edu; soumen.mohapatra@wsu.edu; dheo@wsu.edu; sgupta@eecs.wsu.edu).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSI.2021.3054428.

Digital Object Identifier 10.1109/TCSI.2021.3054428

However, the use of mmW communication bands comes at the cost of less favorable propagation conditions [2]. Both the base station (BS) and user equipment (UE) are required to use large antenna arrays to achieve high beamforming (BF) gain and compensate for severe propagation loss. Beam pointing directions are estimated through *beam training*, a procedure that identifies the angle of arrival (AoA) and angle of departure (AoD) of the dominant propagation path in the wireless channel. Apart from aligning the beams for data communication, knowledge of the AoA and AoD is of utmost importance for other applications in practical mmW systems, including interference nulling and localization [3].

The existing mmW systems utilize an analog array architecture with a single transceiver radio frequency (RF)-chain at both the BS and UE due to its power efficiency. Such arrays are referred to as phased arrays since they use adjustable phase shifters to allow coherent signal steering/combining in a desired direction. The existing beam training schemes with phased arrays include various types of extensive beam sweeping, where beams with different pointing directions are synthesised to probe the channel sequentially in order to find the AoD and AoA [4]–[7]. The required number of probing beams scales linearly with the number of antenna elements in the array, which directly translates into beam training overhead and latency. Hence, conventional beam sweeping faces a scalability challenge in higher mmW frequency bands, where more antenna elements will be used to achieve the required BF gain.

Previous work that addresses the beam training problem can be divided into two categories. The first category intends to reduce the required number of probing beams. Specifically, the number scales logarithmically with the array size when advanced signal processing techniques that exploit the sparsity of the mmW channel are used [8]–[10]. Further, various side-information, e.g., location information, out-of-band measurements [11], and dedicated short-range communication [12], can also be used to reduce the required number of probing beams. The second category aims to enhance the simultaneous channel probing capability by using advanced hardware design [13]–[20]. These approaches are more robust when the channel sparsity and side information are not available. Fully digital array architectures, with a dedicated RF-chain per antenna element, offer the highest flexibility and capability of channel probing. From the signal processing perspective, signals from all antenna branches can be steered/combined to simultaneously probe all angular directions for fast AoD/AoA estimation [13]–[15]. Fully-connected or sub-array based hybrid arrays are another way to enhance simultaneous probing of the channel [16]–[18]. They can probe multiple directions simultaneously and the flexibility increases linearly with the number of RF-chains that control the phase shifter based analog front-end [13]. The probing capability of hybrid arrays can be further enhanced by associating probing beams with different frequencies using spatio-spectral BF [16]. A leaky wave antenna (LWA) can scan all angular directions simultaneously by using different frequency resources since the pointing directions of the beams are frequency-dependent [19], [20]. However, the existing LWA technique requires access to THz spectrum for adequate frequency dispersive beam steering.

1549-8328 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

TTD arrays are another appealing, yet insufficiently investigated alternative for fast mmW beam training. Due to time delaying of the signal in each antenna branch, TTD arrays have frequency-dependent probing beams, which can be exploited to enhance the channel probing capability. Further, the frequency-dependent beams can be fully controlled by adjusting the delay introduced in TTD circuits [21]. Early implementations relied on delay lines in all antenna branches [22], but this approach suffered from low scalability in terms of required area and power efficiency when the array size becomes large. Further, the limited delay range at RF is insufficient to achieve frequency dispersive beam training as proposed in this work. Recent advancements in TTD arrays with baseband delay elements and large delay range-to-resolution ratios [23], [24] improved the scalability and thus enabled the realization of fast beam training schemes with large arrays.

In this paper, we extend our previous work [25] and present the design of baseband TTD array architectures for mmW beam training. To the best of our knowledge, this is the first work that comprehensively studies the system aspects of TTD-based mmW beam training with dispersive channel probing. Compared to the previous work in [25], this paper explains the digital signal processing (DSP) beam training algorithm in more detail and introduces the following major novelties:

- We introduce a hybrid TTD architecture for mmW beam training, which uses signal delaying in both analog and digital domains to overcome the maximum delay compensation problem observed in analog TTD array in our previous work.
- We propose a benchmark emulation of frequency-dependent TTD-based beam training using a fully digital array and time-domain DSP, to analyze the advantages and disadvantages of TTD array architectures for mmW beam training.
- We perform a thorough comparison of analog TTD, hybrid TTD, and benchmark fully digital arrays in terms of beam training hardware requirements, dependency of beam training on the basic system parameters and TTD hardware constraints, and robustness to hardware impairments and quantization errors in analog-to-digital converters (ADC).
- Based on the TTD hardware prototype from our previous work [23], we model and estimate the power consumption of the proposed TTD array architectures in the beam training framework. We investigate how power consumption scales with the key system parameters, including the bandwidth and array size, which provides an insight into the beam training design in future mmW/sub-THz systems. Power consumption of the fully digital array is included as the benchmark.

Fig. 1. Architecture of analog TTD array with uniform delay spacing $\Delta \tau$ and phase spacing $\Delta \phi$ between antennas. The design of combiners and DSP algorithm is explained in Section III-B.

The rest of the paper is organized as follows. In Section II, we introduce the two TTD architectures and benchmark fully digital array. Section III introduces a wideband system model and it describes the beam training codebook and DSP algorithm design. In Section IV, we thoroughly compare the considered array architectures. Power consumption of all three architectures is modeled and evaluated in Section V. Section VI concludes the paper.

## II. TTD ARRAY ARCHITECTURES FOR BEAM TRAINING

The realization and performance of TTD beam training schemes heavily depend on the underlying TTD hardware. The design of a fast, high-performance beam training scheme imposes a challenging delay range requirement on TTD circuits, which raises the question of a *beam-training-efficient* TTD array architecture. In this work, the efficiency depends on the number of pilots used in beam training, angle estimation accuracy, and array power consumption. To address this question, we propose and extensively compare two uniform linear array architectures with baseband TTD elements, including analog and hybrid analog-digital arrays. We include a fully digital array architecture in the comparison as the benchmark. In particular, we use it to emulate TTD-based beam training and thus highlight the advantages and disadvantages of TTD arrays. All three considered array architectures are described in the remainder of this section.

An analog uniform linear TTD array with a single RF-chain and $N_{\rm R}$ antennas is presented in Fig. 1. The *n*-th antenna branch has an analog phase shifter with the phase tap $\phi_{{\rm A},n} = (n-1)\Delta\phi$ and an analog baseband TTD element with the delay tap $\tau_{{\rm A},n} = (n-1)\Delta\tau$, where $\Delta\phi$ and $\Delta\tau$ represent the phase and delay spacing between neighboring branches, respectively. Note that the phase shifters in the analog array can be implemented in the RF path, local oscillator (LO) path, or baseband domain [26]. From a mathematical perspective, these different implementations introduce the same phase taps in the beam training algorithm design. In this work, we assume that the phase shifters are implemented in the LO path, as depicted in Fig. 1. In practice, the phase taps $\phi_{{\rm A},n}$, $n = 1, \ldots, N_{\rm R}$, can be distorted due to errors in phase shifters, LOs, imbalance between in-phase and quadrature samples, or other hardware imperfections. Similarly, errors in TTD elements can distort the delay taps $\tau_{{\rm A},n}$, $n = 1, \ldots, N_{\rm R}$. In all antenna branches, we model the time-invariant distorted taps as independent Gaussian random variables centered at their nominal values, $\tilde{\phi}_{{\rm A},n} \sim \mathcal{N}(\phi_{{\rm A},n}, \sigma_{\rm P}^{2})$ and $\tilde{\tau}_{{\rm A},n} \sim \mathcal{N}(\tau_{{\rm A},n}, \sigma_{\rm T}^{2})$, respectively, i.e., the tap errors are zero-mean. For a specific delay spacing $\Delta \tau$, the TTD frequency-dependent antenna weight vector (AWV) results in a fixed beam training codebook of pencil beams, where different frequency components of the signal are hard-coded in different angular directions. The frequency-flat phase shifters increase the flexibility by enabling codebook rotations and different frequency-to-angle mappings. The maximum delay in the $N_{\rm R}$-th antenna branch is $\tau_{{\rm A},N_{\rm R}} = (N_{\rm R}-1)\Delta\tau$, which becomes an implementation bottleneck for large antenna arrays. The state-of-the-art TTD delay range is on the order of 15 ns [23], which can be insufficient for wideband beam training with a moderate number of antenna elements $N_{\rm R}$, e.g., $N_{\rm R} = 32$, as we previously discussed in [25].

Fig. 2. Architecture of hybrid analog-digital TTD array with uniform delay spacing $\Delta \tau$ and phase spacing $\Delta \phi$ between antennas. The design of combiners and DSP algorithm is explained in Section III-B.

Fig. 3. Architecture of the benchmark fully digital array that is used to emulate TTD-based beam training by introducing digital delays. The design of combiners and DSP algorithm is explained in Section III-B.

To alleviate the delay range requirement and improve the scalability of analog TTD arrays, we introduce a hybrid analog-digital architecture with $N_{\rm H}$ sub-arrays, each controlled by one distinct RF-chain, as illustrated in Fig. 2. The hybrid array uses a combination of analog and digital signal delaying. First, all the sub-arrays of $N_{\rm r}$ antennas introduce the same delays $\tau_{{\rm A},n'} = (n'-1)\Delta\tau$, $n' = 1, \ldots, N_{\rm r}$, in the analog domain. The relative delay difference among antennas is then compensated in the digital domain by introducing the fixed digital taps $\tau_{{\rm D},h} = (h-1)N_{\rm r}\Delta\tau$, $h = 1, \ldots, N_{\rm H}$, i.e., digital delays $f_{\rm s}\tau_{{\rm D},h}$, where $f_{\rm s}$ is the sampling frequency. As in the analog TTD array, the distorted phase taps $\tilde{\phi}_{{\rm A},n}$, $n = 1, \ldots, N_{\rm R}$, are modeled as independent Gaussian random variables.
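The delay-range argument can be made concrete with a small calculation. The sketch below is illustrative only: the function names, the 32-element array split into four sub-arrays, the 0.5 ns delay spacing, and the 4 GHz sampling frequency are assumptions chosen for the example, not values taken from the paper. Only the 15 ns delay range and the tap expressions $(n-1)\Delta\tau$ and $f_{\rm s}\tau_{{\rm D},h}$ come from the text.

```python
def max_analog_delay(n_branches, delta_tau):
    """Largest analog delay tap, (n_branches - 1) * delta_tau, in seconds."""
    return (n_branches - 1) * delta_tau


def digital_delay_samples(h, n_sub, delta_tau, f_s):
    """Digital delay f_s * tau_{D,h} of the h-th sub-array (h is 1-based)."""
    return f_s * (h - 1) * n_sub * delta_tau


# Hypothetical example values (not from the paper).
N_R, N_H = 32, 4          # antennas and sub-arrays
N_SUB = N_R // N_H        # antennas per sub-array, N_r
DELTA_TAU = 0.5e-9        # assumed delay spacing in seconds
F_S = 4e9                 # assumed sampling frequency in Hz
DELAY_RANGE = 15e-9       # TTD delay range quoted in the text

# Analog array: every branch delay is analog, so the last branch needs (N_R - 1) * delta_tau.
analog_need = max_analog_delay(N_R, DELTA_TAU)
# Hybrid array: analog delays restart in every sub-array, so only (N_r - 1) * delta_tau is needed.
hybrid_need = max_analog_delay(N_SUB, DELTA_TAU)

print(f"analog: {analog_need * 1e9:.1f} ns, fits: {analog_need <= DELAY_RANGE}")
print(f"hybrid: {hybrid_need * 1e9:.1f} ns, fits: {hybrid_need <= DELAY_RANGE}")
print([round(digital_delay_samples(h, N_SUB, DELTA_TAU, F_S)) for h in range(1, N_H + 1)])
```

Under these assumed numbers the analog array exceeds the quoted delay range while each sub-array of the hybrid array stays well within it, which is the motivation for moving the coarse delays into the digital domain.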

A fully digital array, used as the benchmark, is illustrated in Fig. 3. The digital array can emulate a TTD array through DSP by using the fixed digital taps $\tau_{{\rm D},n} = (n-1)\Delta\tau$, $n = 1, \ldots, N_{\rm R}$, i.e., digital delays $f_{\rm s}\tau_{{\rm D},n}$ in the corresponding antenna branches. We assume phase-only BF without magnitude control in order to create a codebook of pencil beams as with both analog and hybrid TTD arrays. The ability to control the digital phases $\phi_{{\rm D},n}$, $n = 1, \ldots, N_{\rm R}$, in DSP allows the signal frequency components to be independently steered/combined in any angular direction, which provides high flexibility in the beam training design. The digital array does not have the analog phase shifters and TTD elements before the ADCs, and it is assumed to be insensitive to hardware errors. However, each antenna element has a dedicated RF-chain, which significantly affects the array power efficiency, as discussed later in Section V.

Fig. 4. Beam training in clustered frequency-selective multipath channel: (a) An example of frequency-selective channel with two multipath clusters. Frequency selectivity comes from intra- and inter-cluster delay spreads. The first cluster is dominant and its AoA needs to be estimated. (b) Channel observation of a phased array when only one pilot is used. Beam sweeping is necessary to cover all angles in the range $(-\pi/2, \pi/2)$. (c) Channel observation of a TTD array when only one pilot is used. Frequency components (subcarriers) are mapped into different angles to simultaneously probe the range $(-\pi/2, \pi/2)$. The angle estimation may fail in frequency-selective channels. (d) Enhanced TTD codebook with frequency diversity order R = 2.

In the next section, we explain how $\Delta \tau$ and $\Delta \phi$ are set up in all three architectures to obtain a beam training codebook robust to frequency-selective channels. We also introduce a DSP algorithm that exploits this codebook. Based on the designed $\Delta \tau$, Section IV discusses the requirements in TTD hardware implementation and the impact of hardware impairments on the beam training performance. Accounting for the designed $\Delta \tau$ and the proposed baseband TTD implementation, we compare the three architectures in terms of power consumption in Section V.

## III. TTD BEAM TRAINING ALGORITHM DESIGN

In this section, we describe a DSP algorithm which achieves a high angle estimation accuracy using only one pilot symbol in a clustered frequency-selective multipath channel.

We consider downlink beam training between the BS and UE, where the cyclic prefix (CP) based orthogonal frequency-division multiplexing (OFDM) waveform is used as a training pilot. The carrier frequency, bandwidth, and number of subcarriers are denoted as $f_c$, BW, and $M_{tot}$, respectively. The power-normalized training pilot uses M subcarriers from the predefined set $\mathcal{M}$, all loaded with binary phase shift keying modulated symbols. Both the BS and UE have half-wavelength spaced uniform linear arrays with $N_T$ and $N_R$ antennas, respectively.

### A. Channel and Received Signal Models

We consider a frequency-selective channel with *L* multipath clusters. An example of the channel with two clusters, as seen by the UE, is illustrated in Fig. 4(a). With coherence bandwidth $\mathrm{BW}_{\rm c}$, the bandwidth BW can be segmented into $K_c = \lceil \mathrm{BW}/\mathrm{BW}_{\rm c} \rceil$ distinct sub-bands that have different channels, where $\lceil x \rceil$ rounds *x* up to the nearest integer. We assume that all OFDM subcarriers within the *k*-th sub-band experience the same channel $\mathbf{H}[k] \in \mathbb{C}^{N_{\mathrm{R}} \times N_{\mathrm{T}}}$, which can be expressed as

$$\mathbf{H}[k] = \sum_{l=1}^{L} G_l[k] \mathbf{a}_{\mathrm{R}}(\theta_l^{(\mathrm{R})}) \mathbf{a}_{\mathrm{T}}^{\mathrm{H}}(\theta_l^{(\mathrm{T})}), \qquad (1)$$

where $\theta_l^{(\mathrm{R})}$ and $\theta_l^{(\mathrm{T})}$ are the AoA and AoD of the *l*-th cluster, defined with respect to the local coordinate systems at the UE and BS, respectively. The relationship between the sub-band index *k* and subcarrier index *m* is given as $k = \lceil (mK_c)/M_{tot} \rceil$. We assume the array responses are frequency flat, i.e., $[\mathbf{a}_{\mathrm{R}}(\theta)]_n = N_{\mathrm{R}}^{-1/2} \exp(-j(n-1)\pi \sin(\theta))$, $n = 1, \ldots, N_{\mathrm{R}}$, and $[\mathbf{a}_{\mathrm{T}}(\theta)]_n = N_{\mathrm{T}}^{-1/2} \exp(-j(n-1)\pi \sin(\theta))$, $n = 1, \ldots, N_{\mathrm{T}}$. The complex gains $G_l[k] \sim \mathcal{CN}(0, \sigma_l^2)$, $\forall l, k$, come from the multipath rays within the *l*-th cluster, and they are assumed to be independent across different clusters and frequency sub-bands. The frequency-domain channel model in (1) can be approximated as [27]

$$\mathbf{H}[k] \approx \mathbf{A}_{\mathrm{R}} \Lambda[k] \mathbf{A}_{\mathrm{T}}^{\mathrm{H}}, \tag{2}$$

where $\mathbf{A}_{\mathrm{R}} \in \mathbb{C}^{N_{\mathrm{R}} \times Q}$ and $\mathbf{A}_{\mathrm{T}} \in \mathbb{C}^{N_{\mathrm{T}} \times Q}$ contain Q array responses $\mathbf{a}_{\mathrm{R}}(\xi_{q})$ and $\mathbf{a}_{\mathrm{T}}(\xi_{q})$ that correspond to Q uniformly spaced angles $\xi_{q}$, q = 1, ..., Q, in the range $(-\pi/2, \pi/2)$. The square matrix $\Lambda[k] \in \mathbb{C}^{Q \times Q}$ has only L non-zero elements that correspond to the gains $G_{l}[k]$, $\forall l$. Commonly, $Q \gg L$ and the approximation error in (2) can be neglected.
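The channel model in (1) and the sub-band mapping $k = \lceil (mK_c)/M_{tot} \rceil$ can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' simulator: the function names, the two-cluster angles, the cluster powers, and the array sizes below are hypothetical example values.

```python
import numpy as np


def array_response(theta, n_ant):
    """Frequency-flat response with entries N^(-1/2) * exp(-j (n-1) pi sin(theta))."""
    n = np.arange(n_ant)  # equals (n - 1) for the 1-based antenna index of the text
    return np.exp(-1j * np.pi * n * np.sin(theta)) / np.sqrt(n_ant)


def subband_index(m, k_c, m_tot):
    """k = ceil(m * K_c / M_tot) for a 1-based subcarrier index m (integer arithmetic)."""
    return -(-(m * k_c) // m_tot)


def clustered_channel(aoa, aod, sigma2, n_r, n_t, k_c, rng):
    """Return H with shape (K_c, N_R, N_T): one channel matrix per sub-band, as in (1)."""
    H = np.zeros((k_c, n_r, n_t), dtype=complex)
    for th_r, th_t, s2 in zip(aoa, aod, sigma2):
        # G_l[k] ~ CN(0, sigma_l^2), independent across clusters and sub-bands
        g = np.sqrt(s2 / 2) * (rng.standard_normal(k_c) + 1j * rng.standard_normal(k_c))
        steer = np.outer(array_response(th_r, n_r), array_response(th_t, n_t).conj())
        H += g[:, None, None] * steer
    return H


# Hypothetical two-cluster example: a dominant cluster and a weaker one.
rng = np.random.default_rng(0)
H = clustered_channel(aoa=[0.3, -0.8], aod=[0.1, 0.5], sigma2=[1.0, 0.1],
                      n_r=16, n_t=64, k_c=4, rng=rng)
print(H.shape, subband_index(17, 4, 64))
```

Each cluster contributes a rank-one term, so a channel with *L* clusters has rank at most *L* in every sub-band, which is the sparsity that the approximation in (2) expresses on an angular grid.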

In general, AoDs evolve slower than AoAs over time in mmW channels. Since BSs have fixed orientation of antenna arrays, the evolution of AoDs is determined by the gradual birth and death of channel clusters [28], [29]. On the other hand, UEs are prone to swift rotations in antenna orientations, which can lead to significant changes of AoAs, even in low mobility environments [28], [29]. Additionally, mmW BSs are likely to be equipped with fully digital antenna arrays [30], which enable the dominant AoD to be estimated using a single pilot by probing all angular directions at once [15]. Thus, for the remainder of this paper, we assume that the slowly changing AoD $\theta^{(\mathrm{T})}$ at the BS has already been estimated and used to design a fixed frequency-flat beam defined by the precoder vector $\mathbf{v} \in \mathbb{C}^{N_{\mathrm{T}}}$. The real challenge arises at the UE side, where the dynamic AoA changes require frequent beam training to be performed in a fast and power-efficient manner. In this work, we propose the UE to be equipped with a TTD array and exploit its frequency-dependent beamforming to achieve a single-shot estimation of the AoA $\theta^{(\mathrm{R})}$. Therefore, the received signal Y[m] at the *m*-th subcarrier of the used OFDM pilot is

$$Y[m] = \mathbf{w}^{\mathrm{H}}[m]\mathbf{H}[k]\mathbf{v} + \mathbf{w}^{\mathrm{H}}[m]\mathbf{n}[m], \quad m \in \mathcal{M}, \tag{3}$$

where $\mathbf{n}[m] \sim \mathcal{CN}(\mathbf{0}, \sigma_{\rm N}^2 \mathbf{I}_{N_{\rm R}})$ is white Gaussian noise. The UE TTD combiner $\mathbf{w}[m] \in \mathbb{C}^{N_{\rm R}}$ of the *m*-th subcarrier can be decomposed as an element-wise (Hadamard) product of the analog combiner $\mathbf{w}_{\rm A}[m] \in \mathbb{C}^{N_{\rm R}}$ and digital combiner $\mathbf{w}_{\rm D}[m] \in \mathbb{C}^{N_{\rm R}}$, i.e., $\mathbf{w}[m] = \mathbf{w}_{\rm A}[m] \odot \mathbf{w}_{\rm D}[m] = [[\mathbf{w}_{\rm A}[m]]_1 [\mathbf{w}_{\rm D}[m]]_1, \dots, [\mathbf{w}_{\rm A}[m]]_{N_{\rm R}} [\mathbf{w}_{\rm D}[m]]_{N_{\rm R}}]^{\rm T}$. Both the analog and digital combiners depend on the underlying array architecture. In an analog TTD array, $\mathbf{w}_{\rm D}[m] = \mathbf{1}_{N_{\rm R}}$, i.e., $\mathbf{w}[m] = \mathbf{w}_{\rm A}[m]$, since there is no digital combining and both the phases $\phi_{{\rm A},n}$, $\forall n$, and delays $\tau_{{\rm A},n}$, $\forall n$, are introduced in the analog domain. On the other hand, with a fully digital array, $\mathbf{w}_{\rm A}[m] = \mathbf{1}_{N_{\rm R}}$, i.e., $\mathbf{w}[m] = \mathbf{w}_{\rm D}[m]$, as the array is insensitive to hardware impairments and the signal is combined in the digital domain after applying the phases $\phi_{{\rm D},n}$, $\forall n$, and delays $\tau_{{\rm D},n}$, $\forall n$. In general, the *n*-th elements of $\mathbf{w}_{\rm A}[m]$ and $\mathbf{w}_{\rm D}[m]$ are given as

$$[\mathbf{w}_{\mathrm{A}}[m]]_{n} = \exp\left[-j\left(2\pi\left(f_{m} - f_{\mathrm{c}}\right)\tilde{\tau}_{\mathrm{A},n} + \tilde{\phi}_{\mathrm{A},n}\right)\right], \tag{4}$$

$$[\mathbf{w}_{\mathrm{D}}[m]]_{n} = \exp\left[-j\left(2\pi\left(f_{m} - f_{\mathrm{c}}\right)\tau_{\mathrm{D},n} + \phi_{\mathrm{D},n}\right)\right], \tag{5}$$

where  $f_m = f_c - BW/2 + (m-1)BW/(M_{tot} - 1)$ . Note that there is no magnitude, but only phase and delay control in (5), since the digital array is used to emulate TTD-based beam training with pencil beams in this work.

The expressions (4) and (5) indicate that the beam pointing direction depends on the subcarrier frequency, phases, and delays. With a proper configuration of the phase and delay taps in the analog and/or digital domain, it is possible to set up a codebook of combiners that covers all angular directions, as we discuss in the next subsection.
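The frequency-to-angle mapping implied by (4) can be illustrated numerically. For uniform tap spacings, the inner product $\mathbf{w}^{\mathrm{H}}[m]\mathbf{a}_{\mathrm{R}}(\theta)$ peaks where $\pi\sin\theta$ equals $2\pi(f_m - f_c)\Delta\tau + \Delta\phi$ modulo $2\pi$; this follows directly from (4) and the array response of Section III-A. The sketch below assumes hypothetical values for the array size, bandwidth, tap spacings, and error standard deviations, and the helper names are not from the paper.

```python
import numpy as np


def array_response(theta, n_ant):
    n = np.arange(n_ant)
    return np.exp(-1j * np.pi * n * np.sin(theta)) / np.sqrt(n_ant)


def ttd_combiner(df, tau, phi):
    """Weights exp(-j(2*pi*df*tau_n + phi_n)) as in (4)/(5), with df = f_m - f_c."""
    return np.exp(-1j * (2 * np.pi * df * tau + phi))


def pointing_sine(df, delta_tau, delta_phi):
    """sin(theta) at which the ideal combiner with uniform tap spacing peaks."""
    x = 2 * np.pi * df * delta_tau + delta_phi
    return (np.mod(x + np.pi, 2 * np.pi) - np.pi) / np.pi  # wrap phase to [-pi, pi)


def distorted_taps(nominal, sigma, rng):
    """Gaussian tap errors around the nominal values, as modeled in Section II."""
    return nominal + sigma * rng.standard_normal(nominal.shape)


# Hypothetical example values (not from the paper).
N_R, BW = 16, 2e9
DELTA_TAU, DELTA_PHI = 1 / BW, 0.2
n = np.arange(N_R)
tau, phi = n * DELTA_TAU, n * DELTA_PHI

rng = np.random.default_rng(0)
for df in (-0.4e9, 0.0, 0.3e9):
    theta = np.arcsin(pointing_sine(df, DELTA_TAU, DELTA_PHI))
    a = array_response(theta, N_R)
    ideal = abs(np.vdot(ttd_combiner(df, tau, phi), a)) ** 2  # vdot conjugates w
    w_err = ttd_combiner(df, distorted_taps(tau, 5e-12, rng), distorted_taps(phi, 0.1, rng))
    print(f"df={df / 1e9:+.1f} GHz  theta={np.degrees(theta):6.1f} deg  "
          f"ideal gain={ideal:.2f}  distorted gain={abs(np.vdot(w_err, a)) ** 2:.2f}")
```

With ideal taps the gain at the predicted angle equals $N_{\rm R}$, because all $N_{\rm R}$ unit-magnitude weights add coherently against a response normalized by $N_{\rm R}^{-1/2}$; distorted taps reduce it.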

### B. DSP Algorithm for Beam Training

In this subsection, we first present the design of a robust codebook and then describe a DSP algorithm for TTD arrays [25] that achieves a high resolution in AoA estimation.

As illustrated in Fig. 4(b), conventional phased arrays cannot estimate the AoA of the dominant cluster with one training pilot, and thus they require exhaustive beam sweeping. On the other hand, we have demonstrated in [21] that D spatial directions in the angular range $(-\pi/2, \pi/2)$ can be simultaneously probed using an analog TTD array and a single OFDM symbol by mapping one subcarrier per direction, as illustrated in Fig. 4(c). We have shown that this can be achieved by setting the delay spacing to be $\Delta \tau = 1/BW$. The resulting codebook is, however, sensitive to frequency-selective channels since certain subcarriers can experience deep fades and thus fail to detect the incoming signal. The codebook can be enhanced by increasing its frequency diversity order R, i.e., by mapping R distinct subcarriers in each probed direction [25]. Note that this enhancement requires M = DR ($M \le M_{tot}$) subcarriers to be used in beam training. The benefit of the enhanced codebook is illustrated in Fig. 4(d) for R = 2, where two subcarriers detect the dominant cluster.

Fig. 5. An example of robust TTD codebook for $N_{\rm R} = 16$, D = 16, and R = 4. All D = 16 directions are probed simultaneously. Direction d, $1 \le d \le D$, is associated with set of subcarriers $\mathcal{M}_d$ and combiner $\mathbf{f}_d$.

To increase the diversity, we define D distinct sets $\mathcal{M}_d$, $1 \le d \le D$, of *R* subcarriers, where each set is associated with a different direction *d*, $1 \le d \le D$. Mathematically, the *R* subcarriers from the set $\mathcal{M}_d$ have the same combiner $\mathbf{f}_d$, i.e., $\mathbf{w}[m] = \mathbf{f}_d$, $\forall m \in \mathcal{M}_d$, where the *n*-th element of $\mathbf{f}_d$ is defined as

$$[\mathbf{f}_d]_n = \exp[-j2\pi (n-1)(d-1-D/2)/D], \quad 1 \le d \le D. \tag{6}$$

The subcarriers in  $\mathcal{M}_d$ , however, should experience different channels, and thus we choose them uniformly across the bandwidth. So long as  $R \leq K_c$ , the subcarriers in  $\mathcal{M}_d$  see different channels. This codebook can be created for an analog TTD array by setting the *n*-th phase and delay taps as follows

$$\phi_{\mathrm{A},n} = (n-1)[\pi \sin(\theta_{\mathrm{s}}) - \psi], \tag{7}$$

$$\tau_{\mathrm{A},n} = (n-1)R/\mathrm{BW},\tag{8}$$

where $\psi = \text{mod}(2\pi R(f_1 - f_c)/\text{BW} + \pi, 2\pi) - \pi$, and mod() is the modulo operator. To ensure that $\mathbf{w}[m] = \mathbf{f}_d, \forall m \in \mathcal{M}_d$, for d = 1, ..., D, we set the steering angle $\theta_s$ to $\theta_s = -\pi/2$. An example of the resulting codebook with $N_{\rm R} = 16$, D = 16, and R = 4 is provided in Fig. 5. Different values of $\theta_s$ in (7) result in different codebook rotations, while changes in (8) enable the adjustment of the range of probed angles. Note that the same enhanced codebook can be created for the hybrid TTD or fully digital array without the need to implement fractional ADC sampling, since $\Delta \tau = R/\mathrm{BW}$ is an integer multiple of the Nyquist sampling period. The analog and digital delay taps of the hybrid array introduced in Section II can be expressed with respect to the indices of all antenna elements in the array, $n = 1, \ldots, N_{\rm R}$, as $\tau_{{\rm A},n} = (n-1-\lfloor (n-1)/N_{\rm r} \rfloor N_{\rm r}) \Delta \tau$ and $\tau_{{\rm D},n} = \lfloor (n-1)/N_{\rm r} \rfloor N_{\rm r} \Delta \tau$, respectively, where the operator $\lfloor x \rfloor$ rounds $x$ down to the nearest integer. Thus, the hybrid TTD array can create the enhanced codebook by setting the *n*-th taps of its analog and digital combiners in the following way

$$\phi_{\mathrm{A},n} = (n-1)[\pi \sin(\theta_{\mathrm{s}}) - \psi], \qquad (9)$$

$$\tau_{\mathrm{A},n} = (n-1 - \lfloor (n-1)/N_{\mathrm{r}} \rfloor N_{\mathrm{r}}) R/\mathrm{BW}, \qquad (10)$$

$$\tau_{\mathrm{D},n} = \lfloor (n-1)/N_{\mathrm{r}} \rfloor N_{\mathrm{r}} R/\mathrm{BW}, \qquad (11)$$

where $\theta_s = -\pi/2$ and $\psi$ is defined as earlier. The result in (11) suggests that the *h*-th sub-array needs to introduce a digital delay of $2(h - 1)N_{\rm r}R$ time samples, assuming the Nyquist sampling frequency $f_{\rm s} = 2\mathrm{BW}$. The considered hybrid array in Fig. 2 does not apply the phase changes in the digital domain.

TABLE I. Phase and Delay Tap Settings for Robust Codebook Design

| Array arch. | $\mathbf{w}[m]$ | $\phi_{\mathrm{A},n}$ | $\tau_{\mathrm{A},n}$ | $\phi_{\mathrm{D},n}$ | $\tau_{\mathrm{D},n}$ |
|-------------|-----------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Analog TTD  | $\mathbf{w}_{\mathrm{A}}[m]$ | (7) | (8) | N/A | N/A |
| Hybrid TTD  | $\mathbf{w}_{\mathrm{A}}[m]\odot\mathbf{w}_{\mathrm{D}}[m]$ | (9) | (10) | N/A | (11) |
| Digital     | $\mathbf{w}_{\mathrm{D}}[m]$ | N/A | N/A | (12) | (13) |

The digital array can create the enhanced codebook by using the following digital taps

$$\phi_{\mathrm{D},n} = (n-1)\Delta\phi, \quad \Delta\phi \in \mathbb{R}, \tag{12}$$

$$\tau_{\mathrm{D},n} = (n-1)R/\mathrm{BW}.\tag{13}$$

The phase tap in (12) implies that the digital array can leverage the DSP to introduce any phase spacing  $\Delta\phi$ . With  $f_s = 2BW$ , the *n*-th antenna branch will introduce the digital delay of 2(n-1)R time samples according to (13).

The phase and delay taps required for the design of a robust codebook are summarized in Table I for all three arrays.
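As a sanity check of the tap settings summarized in Table I, the following sketch builds the combiners of all three architectures from (4), (5), and (7)-(13) and compares them with the target combiners $\mathbf{f}_d$ in (6). Several choices here are assumptions made for illustration and are not stated in the paper: the subcarrier sets are taken as $\mathcal{M}_d = \{d, d+D, \ldots, d+(R-1)D\}$, the number of subcarriers is set to $M_{tot} = DR + 1$ so that the frequency grid $f_m$ lines up exactly with the codebook, the digital array uses the same phase spacing as the analog one for $\Delta\phi$ in (12), and the bandwidth and sub-array size are arbitrary example values.

```python
import numpy as np


def codebook_taps(arch, n_ant, R, bw, n_sub=1, theta_s=-np.pi / 2):
    """Return (phi_A, tau_A, phi_D, tau_D) following (7)-(13)."""
    n = np.arange(n_ant)  # equals (n - 1) for the 1-based antenna index of the text
    # psi as defined after (8), using f_1 - f_c = -BW/2
    psi = np.mod(2 * np.pi * R * (-bw / 2) / bw + np.pi, 2 * np.pi) - np.pi
    phi = n * (np.pi * np.sin(theta_s) - psi)
    zeros = np.zeros(n_ant)
    if arch == "analog":
        return phi, n * R / bw, zeros, zeros
    if arch == "hybrid":
        block = (n // n_sub) * n_sub  # floor((n-1)/N_r) * N_r
        return phi, (n - block) * R / bw, zeros, block * R / bw
    if arch == "digital":
        # Assumption: Delta phi in (12) is set to the analog phase spacing.
        return zeros, zeros, phi, n * R / bw
    raise ValueError(f"unknown architecture: {arch}")


def combiner(df, phi_a, tau_a, phi_d, tau_d):
    """w[m] as the element-wise product of (4) and (5), with df = f_m - f_c."""
    w_a = np.exp(-1j * (2 * np.pi * df * tau_a + phi_a))
    w_d = np.exp(-1j * (2 * np.pi * df * tau_d + phi_d))
    return w_a * w_d


def target_combiner(d, n_ant, D):
    """f_d from (6) for a 1-based direction index d."""
    n = np.arange(n_ant)
    return np.exp(-2j * np.pi * n * (d - 1 - D / 2) / D)


def max_codebook_error(arch, n_ant=16, D=16, R=4, bw=2e9, n_sub=4):
    """Largest element-wise deviation of w[m] from f_d over all m in all sets M_d."""
    m_tot = D * R + 1  # assumption: exact alignment of the subcarrier grid
    taps = codebook_taps(arch, n_ant, R, bw, n_sub)
    err = 0.0
    for d in range(1, D + 1):
        for r in range(R):
            m = d + r * D  # assumed members of M_d, spread uniformly over the band
            df = -bw / 2 + (m - 1) * bw / (m_tot - 1)
            err = max(err, np.max(np.abs(combiner(df, *taps) - target_combiner(d, n_ant, D))))
    return err


for arch in ("analog", "hybrid", "digital"):
    print(arch, max_codebook_error(arch))
```

Under these assumptions the three architectures produce the same combiners up to floating-point error, which reflects that only the sum of the analog and digital delay in each branch matters for the resulting beam.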

We note that the analog and hybrid TTD architectures have the same limited flexibility of receive combining in beam training. Namely, once their corresponding analog combiners $\mathbf{w}_{\rm A}[m]$, $m \in \mathcal{M}$, and digital combiners $\mathbf{w}_{\rm D}[m]$, $m \in \mathcal{M}$, are set up, they cannot be further changed or manipulated in DSP. In both architectures, this happens because the signals from different antenna branches are completely or partially combined before passing through ADCs. Thus, the inability to rotate the combiners limits the number of sounded directions to D in both arrays. The diversity order R is also limited, but not necessarily the same in both arrays, as discussed later in the paper. On the other hand, the digital array can exploit digitized signals in all antenna branches and combine them from many different directions in DSP by changing the phases $\phi_{{\rm D},n}$, $\forall n$. Different phases $\phi_{{\rm D},n}$ introduce angular shifts of the entire codebook and enable scanning more angles and/or higher diversity.

We use the designed beam training codebook to develop a non-coherent power-based DSP angle estimation algorithm. Non-coherent algorithms are preferred in mmW beam training as they do not require measurements in (3) to include the phase information, and thus they can avoid complex joint synchronization and beam training receiver processing.

Since the subcarriers from  $\mathcal{M}_d$ ,  $\forall d$ , experience different channels, we can consider the received signal in all D probed directions as random. In a clustered multipath channel, the vector of expected powers in D directions  $\mathbf{p} = [p_1, p_2, \dots, p_D]^T$  can be expressed as

$$\mathbf{p} = \mathbf{B}\mathbf{g} + N_{\mathrm{R}}\sigma_{\mathrm{N}}^{2}\mathbf{1},\tag{14}$$

where $\mathbf{B} \in \mathbb{R}^{D \times Q}$ is a known dictionary obtained by generalizing the UE BF gains in Q angles $\xi_q$, q = 1, ..., Q, for all D combiners. The (d, q)-th element of $\mathbf{B}$ is defined as $[\mathbf{B}]_{d,q} = |\mathbf{f}_d^{\mathrm{H}} \mathbf{a}_{\mathrm{R}}(\xi_q)|^2$, where $\mathbf{a}_{\mathrm{R}}(\xi_q)$ is the receive spatial response introduced in Section III-A. The vector $\mathbf{g} \in \mathbb{R}^Q$ has only one non-zero element. For a detailed derivation of (14), please refer to Appendix A. During beam training, the estimates of $p_d$, $\forall d$, are obtained by averaging the powers of all subcarriers from the corresponding set $\mathcal{M}_d$, $\forall d$, as follows

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS

$$\hat{p}_d = \frac{1}{R} \sum_{m \in \mathcal{M}_d} |Y[m]|^2. \tag{15}$$

In fact, it can be shown that the sample mean in (15) is the maximum likelihood (ML) estimator of $p_d$, $\forall d$. The vector of all power estimates is denoted as $\hat{\mathbf{p}}$, which approximates $\mathbf{p}$ in (14). Note that $\hat{\mathbf{p}}$ is estimated using M = DR frequency-domain measurements $Y[m], \forall m$, in (3) of only one OFDM pilot. Based on the power measurement model in (14), AoA estimation can be solved based on the ML criterion using simple linear algebra operations. The estimate of the AoA $\theta^{(R)}$ is obtained by finding the index of the column in **B** that has the highest correlation with $\hat{\mathbf{p}}$, which is mathematically expressed as

$$\hat{\theta}^{(\mathsf{R})} = \xi_{q^{\star}}, \text{ where } q^{\star} = \underset{q}{\operatorname{argmax}} \frac{\hat{\mathbf{p}}^{\mathsf{T}}[\mathbf{B}]_{:,q}}{||[\mathbf{B}]_{:,q}||}. \tag{16}$$

The proposed algorithm can achieve high AoA estimation accuracy by increasing Q, i.e., the number of the columns in the dictionary matrix **B**. Although this increases the DSP complexity, the proposed beam training scheme can still be performed with a single OFDM symbol. Note that the accuracy can be negatively affected by hardware impairments, which distort the combiners and thus the elements  $[\mathbf{B}]_{d,q}, \forall d, q$ , when analog or hybrid TTD arrays are used. For the rest of this paper, we use root mean square error (RMSE) of AoA estimation and power consumption as main metrics for the comparison of the proposed TTD architectures. The AoA RMSE closely describes the beam training performance and it can be directly converted to an alternative metric in other applications, including the spectral efficiency in mmW data communication and position error in localization.
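As a rough illustration of (14)-(16), the following Python sketch builds a dictionary $\mathbf{B}$, forms the power estimates $\hat{\mathbf{p}}$, and selects the best-matching dictionary column. It is a minimal sketch under assumptions that are not taken from this paper: a half-wavelength uniform linear array, steering-vector combiners uniformly spaced in $\sin\theta$ standing in for the TTD codebook, contiguous subcarrier sets, and a noise-free single-path channel with unit-modulus gains. All function names are hypothetical.

```python
import numpy as np

def steering(n_ant, angle):
    """Spatial response of a half-wavelength ULA (assumed array model)."""
    n = np.arange(n_ant)
    return np.exp(-1j * np.pi * n * np.sin(angle))

def build_dictionary(combiners, grid):
    """B[d, q] = |f_d^H a_R(xi_q)|^2 for D combiners (columns) and Q angles."""
    n_ant = combiners.shape[0]
    A = np.stack([steering(n_ant, xi) for xi in grid], axis=1)  # N_R x Q
    return np.abs(combiners.conj().T @ A) ** 2                  # D x Q

def estimate_power(Y, subcarrier_sets):
    """Sample-mean power per probed direction, as in (15)."""
    return np.array([np.mean(np.abs(Y[idx]) ** 2) for idx in subcarrier_sets])

def estimate_aoa(p_hat, B, grid):
    """Grid angle whose normalized dictionary column best matches p_hat, as in (16)."""
    scores = (p_hat @ B) / np.linalg.norm(B, axis=0)
    return grid[int(np.argmax(scores))]

# Toy setup (illustrative values only).
N_R, D, R, Q = 16, 32, 4, 1024
grid = np.linspace(-np.pi / 2, np.pi / 2, Q, endpoint=False)
beam_angles = np.arcsin(np.linspace(-1, 1, D, endpoint=False))
F = np.stack([steering(N_R, a) for a in beam_angles], axis=1) / np.sqrt(N_R)
B = build_dictionary(F, grid)

# Noise-free single path arriving from a grid angle, with random-phase gains.
rng = np.random.default_rng(0)
theta = grid[700]
sets = [np.arange(d * R, (d + 1) * R) for d in range(D)]  # assumed contiguous sets
gains = np.exp(1j * rng.uniform(0, 2 * np.pi, D * R))
Y = np.concatenate(
    [(F[:, d].conj() @ steering(N_R, theta)) * gains[s] for d, s in enumerate(sets)]
)

p_hat = estimate_power(Y, sets)
theta_hat = estimate_aoa(p_hat, B, grid)
print(np.degrees(theta), np.degrees(theta_hat))
```

In this idealized case the power vector is proportional to one dictionary column, so the normalized correlation peaks at the true angle. With fading and noise, the averaging over the R subcarriers in each set determines how closely $\hat{\mathbf{p}}$ follows the model in (14).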

#### **IV. ARCHITECTURE PERFORMANCE ANALYSIS**

In this section, we introduce and compare the baseband implementation of analog TTD elements in analog and hybrid TTD architectures. Then we study the impact of limited TTD delay range in both architectures on beam training performance and we explain the interplay between the number of antenna elements  $N_{\rm R}$ , bandwidth BW, and diversity order *R*. We also numerically evaluate the impact of hardware impairments and ADC quantization error on the AoA estimation accuracy.

## A. Baseband Implementation of Analog TTD Front-End

While TTD array operation is conceptually simple, its physical implementation is non-trivial when targeting a large delay range. In general, implementing delays with large range-to-resolution ratios is difficult without severe penalties in linearity, noise, power, and area, in addition to increased design complexity. In an array with baseband TTD elements, instead of delaying the down-converted and phase-shifted signals from the antennas before sampling and digitization, the signals




Fig. 6. (a) Multiply-and-accumulate in discrete-time for TTD BF [23], [31] (inset: switched-capacitor adder without the opamp). (b) Time-interleaved clock generation unit (inset: example timing diagram). (c) Prototype of a die micrograph in 65nm CMOS with chip-on-board bond wires [23].

are sampled at different time instants through the switched-capacitor array (SCA) circuit, resulting in the same digitized value. Thus, the complexity of delaying signals is shifted to the clock path, where precise and calibrated delays can be applied in advanced semiconductor technology nodes. More importantly, a large delay range-to-resolution ratio can be realized easily. The SCA-based implementation requires multiple time-interleaved and delay-compensated phases to form the beam, as shown in Fig. 6(a) and discussed in detail in [23]. In the sampling phase, the input signal from each channel is first sampled (with delayed clocks) on a sampling capacitor ($C_S$). After the last sampling phase, the stored charges on each capacitor corresponding to each channel (and each time-interleaved phase) are summed to form the beam.

The proposed beam-training algorithm requires wider delay ranges with delay offsets that are integer multiples of $\Delta \tau$. This significantly relaxes the design requirements of the SCA and the clock path for TTD-based beam training. Larger delay-bandwidth products can thus be realized using a passive SCA whose performance is not limited by the opamp feedback factor or by time-based circuits, as demonstrated in our recent work in [24]. Ongoing research is also investigating the use of high-linearity and high-speed ring amplifiers [32] in the SCA.

Fig. 6(b) shows the clock generation circuit. The proposed beam-training just requires a time-interleaver applied to the input clock. The output of the time-interleaver is applied to interleaved multiply-and-accumulate units (MAC) in the

TABLE II

Analog TTD Array Complexity With Increased Diversity R

| $R$ | $\Delta \tau$ | $\tau_{\mathrm{A},N_{\mathrm{R}}}$ (analog) | $N_{\rm I}$ (analog) | $\tau_{\mathrm{A},N_{\mathrm{r}}}$ (hybrid) | $N_{\rm I}$ (hybrid) |
|-----|---------------|---------------------------------------------|----------------------|---------------------------------------------|----------------------|
| 1   | 0.5 ns        | 7.5 ns                                      | 31                   | 1.5 ns                                      | 7                    |
| 2   | 1 ns          | 15 ns                                       | 61                   | 3 ns                                        | 13                   |
| 4   | 2 ns          | 30 ns                                       | 121                  | 6 ns                                        | 25                   |

Assumed parameters are $N_{\rm R} = 16$, $f_{\rm CLK} = 4\,{\rm GHz}$, ${\rm BW} = 2\,{\rm GHz}$. The hybrid TTD array has four 4-element sub-arrays ($N_{\rm r} = 4$).

SCA ($=N_{I}$) and enables the SCA to span the required delay range while meeting the Nyquist BW. The same circuit can be extended for data communication with the only addition being a multi-bit phase interpolator (PI) as described in [31]. In Fig. 6(b), the external single-phase clock (CLK) is first fed to a quadrature phase generator circuit. The quadrature outputs (I-, I+, Q-, Q+) of each phase generator are further fed to the S-bit PI. The quadrature output is then applied to a multiplexer (MUX), which helps in spanning the angular range $(-\pi/2, \pi/2)$. An example timing diagram is also shown in Fig. 6(b) with $N_{\rm R} = 4$ and $N_{\rm I} = 7$ for R = 1 in a hybrid array. A total of 36 phases are shown at the time-interleaver output with a 12.5% pulse width. A hardware prototype in 65 nm complementary metal-oxide-semiconductor (CMOS) technology for Fig. 6(a) and Fig. 6(b) is presented in Fig. 6(c). It was initially proposed in our previous work in [23], where a 100 MHz modulated bandwidth was demonstrated as a proof-of-concept. The demonstrated architecture is scalable and can be further expanded to meet the hardware requirements for higher bandwidth support.

We further analyze the number of interleaving levels that are required in analog and hybrid TTD arrays. Considering  $N_{\rm I}$  as the interleaving factor in the analog TTD array (Fig. 1), the maximum achievable delay compensation  $T_{\rm C-max}$  is

$$T_{\text{C-max}} = (N_{\text{I}} - 1)T_{\text{s}} = (N_{\text{I}} - 1)/f_{\text{s}}, \tag{17}$$

where  $T_{\rm s}$  and  $f_{\rm s}$  are the reference clock period and sampling frequency respectively. To cover the entire angular range in beam training,  $T_{\rm C-max}$  should be equal to  $\tau_{\rm A, N_R}$ . Substituting (8) in this equality and solving for  $T_{\rm s}$  yields

$$T_{\rm s} = (N_{\rm R} - 1)R/((N_{\rm I} - 1){\rm BW}). \tag{18}$$

Considering a heterodyne receiver architecture and perfect sampled signal reconstruction satisfying the Nyquist condition (i.e.,  $T_s \le 1/(2BW)$ ),  $N_I$  can be derived to be

$$N_{\rm I} \ge 1 + 2R(N_{\rm R} - 1). \tag{19}$$

Equation (19) can also be applied to hybrid arrays by substituting $N_{\rm R}$ with $N_{\rm R}/N_{\rm H}$, i.e., with the sub-array size $N_{\rm r}$.
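For completeness, the step from (17) and (18) to (19) can be written out explicitly. The chain below uses only the Nyquist condition stated above and the delay range implied by (18), namely $\tau_{\mathrm{A},N_{\mathrm{R}}} = (N_{\rm R} - 1)R/{\rm BW}$:

$$
\begin{aligned}
(N_{\rm I} - 1)T_{\rm s} = \frac{(N_{\rm R} - 1)R}{\rm BW}
&\;\Rightarrow\; T_{\rm s} = \frac{(N_{\rm R} - 1)R}{(N_{\rm I} - 1){\rm BW}}, \\
T_{\rm s} \le \frac{1}{2{\rm BW}}
&\;\Rightarrow\; \frac{(N_{\rm R} - 1)R}{(N_{\rm I} - 1){\rm BW}} \le \frac{1}{2{\rm BW}}
\;\Rightarrow\; N_{\rm I} \ge 1 + 2R(N_{\rm R} - 1).
\end{aligned}
$$

The bandwidth cancels, so the minimum interleaving factor depends only on the array (or sub-array) size and the diversity order.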

Table II shows an example case study of the required number of interleaving stages in the analog/hybrid TTD array as a function of diversity order and the delay range. This table uses (19) with a specific case of 2 GHz bandwidth, 4 GHz sampling frequency, and 16 antenna elements for both the analog and hybrid array presented in Fig. 1 and Fig. 2 respectively.
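The entries of Table II can be reproduced with a few lines of Python. This is a minimal sketch: the function name is hypothetical, and the delay step $\Delta\tau = R/{\rm BW}$ is inferred from (18) and the values in Table II rather than quoted from an equation in this section.

```python
def interleaving_requirements(n_ant, diversity, bw_hz):
    """Return (delay step, delay range, minimum N_I) for one TTD (sub-)array."""
    delta_tau = diversity / bw_hz                    # assumed delay step R / BW
    delay_range = (n_ant - 1) * delta_tau            # largest tap delay
    n_interleave = 1 + 2 * diversity * (n_ant - 1)   # lower bound from (19)
    return delta_tau, delay_range, n_interleave

BW = 2e9  # 2 GHz, as in Table II
for R in (1, 2, 4):
    dt, tau_a, ni_a = interleaving_requirements(16, R, BW)  # analog, N_R = 16
    _, tau_h, ni_h = interleaving_requirements(4, R, BW)    # hybrid, N_r = 4
    print(f"R={R}: dtau={dt * 1e9:.1f} ns | analog {tau_a * 1e9:.1f} ns, N_I={ni_a}"
          f" | hybrid {tau_h * 1e9:.1f} ns, N_I={ni_h}")
```

With $f_{\rm s} = 4$ GHz, the analog entry for R = 1 is also consistent with (17): $(31 - 1)/4\,\text{GHz} = 7.5$ ns.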



Fig. 7. Beam training performance comparison of the three considered architectures and the interplay of R,  $N_{\rm R}$ , and BW.

## B. Impact of Limited TTD Delay Range on Beam Training

In this subsection, we assume that the analog and hybrid architectures have TTD elements with the same state-of-the-art maximum delay compensation of  $T_{\text{C-max}} = 15$  ns, or equivalently the same interleaving factor  $N_{\text{I}}$ .

To realize the proposed beam training algorithm,  $\tau_{A,N_R} \leq T_{C-max}$  needs to be satisfied for the analog, and  $\tau_{A,N_r} \leq T_{C-max}$  for the hybrid TTD array. Based on these conditions, it is straightforward to show that the achievable diversity order *R* is limited as

$$1 \le R \le \frac{T_{\text{C-max}}}{N_{\text{R}} - 1} \text{BW} \text{ and } 1 \le R \le \frac{T_{\text{C-max}}}{N_{\text{r}} - 1} \text{BW}, \tag{20}$$

for the analog and hybrid array, respectively. Note that with R < 1, the beam training algorithm cannot be realized with a single OFDM symbol. On the other hand, a large *R* provides more precise ML estimates in (15) due to better averaging. The expressions in (20) describe the dependency of *R* on the basic system parameters  $N_{\rm R}$ ,  $N_{\rm r}$ , and BW. In the remainder of this subsection, we numerically evaluate the interplay among them.

We study the beam training performance of different architectures in terms of AoA estimation accuracy, assuming that R is constrained to be the largest achievable power of 2. We consider a system with carrier frequency $f_c = 60 \text{ GHz}$, bandwidth values in the range 0.5 GHz $\leq$ BW $\leq$ 4.5 GHz, and $M_{\text{tot}} = 4096$ subcarriers for any bandwidth. The transmitter array size is $N_{\rm T} = 128$, while the receive array size can take values $N_{\rm R} \in \{16, 32\}$. There are $N_{\rm r} = 4$ antennas in each sub-array in the hybrid TTD architecture, regardless of the total number of antennas. The number of probed directions in beam training is assumed to be $D = 2N_{\rm R}$ and the dictionary size is Q = 1024. The channel consists of L = 3 clusters, where one is 10 dB stronger than the other two. Fading is simulated by 20 rays within each cluster with up to 10 ns spread. There is no intra-cluster angular spread. Pre-beamforming signal-to-noise ratio (SNR) is defined as SNR $\triangleq \sum_{l=1}^{L} \sigma_l^2 / \sigma_N^2$, and it is assumed to be SNR = $-20\,\text{dB}$.
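The bound in (20), combined with the power-of-2 constraint on R, can be evaluated as in the following sketch. The function name is hypothetical, and exact rational arithmetic is used only to avoid floating-point rounding at the boundary cases.

```python
import math
from fractions import Fraction

def max_diversity_order(t_c_max_ns, bw_ghz, n_ant):
    """Largest power-of-2 R allowed by (20), or None if R < 1 (not realizable).

    n_ant is N_R for the analog array and N_r for the hybrid array.
    """
    # ns * GHz is dimensionless, so the bound is T_C-max * BW / (N - 1).
    bound = Fraction(str(t_c_max_ns)) * Fraction(str(bw_ghz)) / (n_ant - 1)
    if bound < 1:
        return None
    return 1 << (math.floor(bound).bit_length() - 1)

T_C_MAX_NS = 15  # maximum delay compensation assumed in this subsection
for bw in (0.5, 2, 4.5):
    print(f"BW={bw} GHz: analog N_R=16 -> {max_diversity_order(T_C_MAX_NS, bw, 16)},"
          f" hybrid N_r=4 -> {max_diversity_order(T_C_MAX_NS, bw, 4)}")
```

For BW = 2 GHz this gives R = 2 for the 16-element analog array and R = 8 for a hybrid array with 4-element sub-arrays, while a 32-element analog array at the same bandwidth falls below R = 1.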

In Fig. 7, we present the results for the beam training performance and the interplay of the considered parameters. In both cases  $N_{\rm R} = 16$  and  $N_{\rm R} = 32$ , the analog TTD array architecture has the highest RMSE of AoA estimation due to



Fig. 8. Beam training performance comparison of the three considered architectures under the distorted delay taps  $\tilde{\tau}_n \sim \mathcal{N}\left(\tau_n, \sigma_T^2\right)$ ,  $\forall n$ , and phase taps  $\tilde{\phi}_n \sim \mathcal{N}\left(\phi_n, \sigma_P^2\right)$ ,  $\forall n$ . The curves with the delay error (dashed with stars) and phase error (dashed with diamonds) are associated with the upper and lower x-axis, respectively.

low achievable diversity order R. As discussed earlier, analog arrays have large delay range requirements, and thus better estimation accuracy (equivalently, higher R) requires larger BW. Similarly, increasing the array size $N_{\rm R}$ can have a positive effect on the performance. However, if BW is not large enough and there is no diversity (R = 1), larger arrays do not improve the estimation accuracy in frequency-selective channels. No results are shown for the analog arrays at the values of BW for which the proposed single-shot beam training cannot be realized (R < 1). In hybrid TTD arrays, higher diversity orders can be utilized since $N_r < N_R$, which leads to better estimation accuracy compared to analog arrays. An increase in the number of antenna elements does not change the achievable R in hybrid arrays since we assume that $N_r = 4$ remains constant. It does, however, improve the estimation accuracy of hybrid arrays, which approaches the sub-degree performance of fully digital arrays. Since R can be maximized through DSP in digital arrays, their performance is independent of BW. The floor of the AoA RMSE is determined by the dictionary size Q = 1024. Based on the results in Fig. 7, one can predict the diversity order R and beam training performance for any considered array architecture, given the system parameters BW, $N_{\rm R}$, and $T_{\rm C-max}$.

## C. Impact of TTD Hardware Impairments on Beam Training

Next, we study the impact of practical TTD hardware impairments and ADC quantization errors on beam training in all considered architectures. Here we keep AoA RMSE as the performance metric and use the same system parameters as in the previous subsection. We consider a specific case with  $N_{\rm R} = 16$  and BW = 2 GHz.

In Fig. 8, we study the beam training performance under the phase and delay errors. Unlike the analog and hybrid TTD arrays, the fully digital array is not sensitive to these hardware impairments, and we include its performance with the maximum R = 32 as the benchmark. With the considered system parameters, the analog TTD array has the diversity order R = 2, which limits its angle estimation accuracy and robustness to



Fig. 9. Beam training performance comparison of the three considered architectures under different ADC resolutions.

hardware errors. We can see that the beam training algorithm can tolerate phase errors with the standard deviation of up to $\sigma_P = 10^\circ$ and delay errors with the standard deviation of up to $\sigma_T = 50$ ps. The hybrid TTD array achieves a lower estimation error and greater robustness to delay and phase errors than the analog TTD array since it leverages the diversity order R = 8 in beam training. It can tolerate large phase errors and delay errors with the standard deviation of around $\sigma_T = 200$ ps. It is worth noting that the delay errors in hybrid arrays are independent of the reduced delay taps in the corresponding TTD elements.

In Fig. 9, we present how finite ADC resolution affects the beam training performance with different array architectures. For fair comparison, we assume that the automatic gain control (AGC) outputs a unit-variance signal in all architectures. We can observe that the AoA estimation accuracy of the analog TTD array with a single RF-chain is marginally affected by low ADC resolution. On the other hand, low resolution ADCs have a noticeable impact on beam training with the hybrid TTD and fully digital arrays, as combined quantization errors from different RF-chains deteriorate the estimation accuracy. We note, however, that the deteriorated accuracy is still within the sub-degree range and lower than that of the analog array. Our results indicate that practical mmW and sub-THz transceivers may require ADCs with only a few bits of resolution for effective beam training. For example, with only 3-bit resolution, the performance loss is negligible in any array. Low-resolution ADCs have a positive impact on the overall power efficiency of the considered TTD architectures, as discussed in the next section.
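The effect of finite ADC resolution can be emulated with a simple quantizer model. The sketch below is an assumption and not the model used to produce Fig. 9: it applies a uniform mid-rise quantizer separately to the in-phase and quadrature parts of a unit-variance signal, with a hypothetical full-scale amplitude `clip`.

```python
import numpy as np

def quantize(x, bits, clip=3.0):
    """Uniform mid-rise quantizer with 2**bits levels per real dimension.

    clip is the assumed full-scale amplitude; inputs beyond it saturate.
    """
    step = 2.0 * clip / 2 ** bits
    half = 2 ** (bits - 1)

    def q(v):
        idx = np.clip(np.floor(v / step), -half, half - 1)
        return (idx + 0.5) * step

    return q(np.real(x)) + 1j * q(np.imag(x))

rng = np.random.default_rng(1)
# Unit-variance complex Gaussian samples standing in for the AGC output.
x = (rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000)) / np.sqrt(2)
mse = {b: float(np.mean(np.abs(x - quantize(x, b)) ** 2)) for b in (1, 2, 3, 4, 5)}
for b, e in mse.items():
    print(f"{b}-bit ADC: quantization MSE = {e:.4f}")
```

Passing the received samples through such a function before the power estimation in (15) is one way to study how resolution affects the AoA estimate. The clipping level and quantizer type are design choices that would need to match the actual ADC.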

#### V. POWER ANALYSIS OF TTD ARCHITECTURES

This section presents a power analysis of the analog and hybrid TTD arrays and compares them with a digital array for the proposed mmW beam training algorithm in Section III-B. We estimate the power consumption of the baseband components in the signal chain in Fig. 1, Fig. 2, and Fig. 3 for the analog, hybrid, and digital arrays, assuming the mmW front-end consumes the same power in all three array architectures. The only exception to this assumption in the front-ends of the three array architectures is the phase shifter. In the analog/hybrid TTD array, the phase shifter precedes the downconverting mixer whereas for the digital array it

TABLE III STATE-OF-THE-ART LOW-RESOLUTION GHZ ADCS

| Parameters              | [36] | [37]  | [38]  |
|-------------------------|------|-------|-------|
| Sampling Rate (fs)(MHz) | 2500 | 2000  | 5000  |
| ENOB (bit)              | 6    | 7.93  | 4.06  |
| Power (µW)              | 7500 | 21000 | 78000 |
| FoM (fJ/c-s)            | 74.7 | 119   | 94.6  |
| Technology (nm)         | 65   | 65    | 65    |

can be implemented after the ADC. To minimize the power discrepancy in the mmW analog/hybrid and digital arrays, we consider an LO phase shifter for the analog and hybrid arrays as shown in Fig. 1 and Fig. 2, respectively. With the LO phase shifter, the mixer for the three arrays can be designed to have a flat conversion gain (loss) across a wide range of the LO driving power as shown in [33]. To simplify the analysis, we assume the power consumption of the LO phase shifter for the fully digital array is similar to that of the analog and hybrid TTD arrays. The estimation methodology for the remaining components of the hybrid and digital arrays follows that of the analog TTD array as described in the following subsections. For each component, we also provide an example based on Table II.

## A. Power Consumption of Analog/Hybrid TTD Array

This subsection estimates the power consumption of the ADC, AGC, SCA, and the time-interleaving blocks in the analog/hybrid TTD arrays.

1) Analog-to-Digital Converter (ADC): We estimate the ADC power consumption using the figure-of-merit (FoM) derived from recent works on low-resolution high-speed ADCs (different ADC configurations can be selected when considering efficiency). Using the FoM from Table III as well as several publications from the last three years in the survey data in [34], we estimate the average FoM as 96.5 fJ/c-s. For a 3-bit ENOB, $f_s = 4$ GHz, and a FoM of 96.5 fJ/c-s, the estimated power is thus 3.09 mW. Several factors, including the bandwidth, sampling frequency, and resolution, influence the ADC power consumption. Thus, for fair comparison and to avoid architectural changes due to technology scaling and process variations, we have adopted the FoM-based estimation using the survey data in [34]. The same survey results are used to choose the Walden FoM for our proposed beam training approach. By adopting the FoM-based analysis, we can provide a higher-level comparison that is independent of ADC architecture.
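The 3.09 mW figure follows from the standard Walden FoM definition, $\text{FoM} = P / (2^{\text{ENOB}} f_s)$, solved for the power. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def adc_power_w(fom_j_per_step, enob_bits, fs_hz):
    """ADC power from the Walden figure of merit: P = FoM * 2**ENOB * f_s."""
    return fom_j_per_step * (2 ** enob_bits) * fs_hz

# Values quoted in the text: 96.5 fJ/conversion-step, 3-bit ENOB, 4 GHz sampling.
p_adc = adc_power_w(96.5e-15, 3, 4e9)
print(f"ADC power: {p_adc * 1e3:.2f} mW")
```

Since the power grows as $2^{\text{ENOB}}$, each additional bit of resolution doubles the estimate at a fixed FoM and sampling rate.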

In addition to the ADC power consumption, we also estimate the deserializer power that is needed to interface the high-speed ADCs with the backend DSP. Though insignificant for analog and hybrid arrays, it is an important contributor for digital arrays. We consider here the DSP operating at 1 GHz and estimate the deserializer power consumption. From [35], excluding the power of the clock generator, the scaled deserializer power for one unit ($P_{\text{DESo}}$) is found to be 0.512 mW ($= 3.2 \times 4/25$), which yields 1.5 mW and 6 mW of power consumption in the analog and hybrid arrays, respectively.
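The deserializer figures can be checked by scaling the unit power with the number of ADCs and the ADC resolution. The sketch below assumes one ADC in the analog array, four in the hybrid array, and sixteen in the digital array, each with 3-bit resolution; the function name is hypothetical.

```python
P_DES_UNIT_MW = 3.2 * 4 / 25  # scaled unit power from the text, 0.512 mW

def deserializer_power_mw(n_adc, bits, unit_mw=P_DES_UNIT_MW):
    """Deserializer power: unit power times number of ADCs times bits per ADC."""
    return unit_mw * n_adc * bits

for name, n_adc in (("analog", 1), ("hybrid", 4), ("digital", 16)):
    print(f"{name}: {deserializer_power_mw(n_adc, 3):.2f} mW")
```

The results, about 1.5 mW, 6.1 mW, and 24.6 mW, show why this term matters mainly for the digital array.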

2) Switched-Capacitor Array (SCA): The SCA power consumption is dominated mostly by the feedback operational


transconductance amplifier (OTA). We estimate the OTA power consumption for an analog array similar to the method in [23]. The DC gain (A0) and the unity-gain bandwidth ( $\omega_u$ ) requirements of the OTA used in the SCA are found to be:

$$\omega_{\rm u} = 2\ln(2)N_{\rm R}(x+1)f_{\rm s}$$

where x is the ADC resolution and  $f_s$  is the ADC sampling frequency.

The normalized unity-gain bandwidth ($\omega_{u0}$) per unit sampling frequency can be written as $\omega_{u0} = 2 \ln(2)(x + 1)$. For a 3-bit ADC (referring to Fig. 9), the normalized unity-gain frequency is $\omega_{u0} = 2 \ln(2)(3 + 1) = 5.54$ Hz. Neglecting parasitics and second-order effects, and considering a two-stage internally compensated OTA, the transconductance of this OTA can be designed to be linearly dependent on the DC current. As a result, the DC gain of the OTA is independent of its DC current and power consumption $P_{\text{OTA}}$. At the same time, $\omega_u$ is a linear function of the OTA transconductance, and thus varies proportionally to $P_{\text{OTA}}$. Given these assumptions, the minimum requirement on the OTA $\omega_u$ results in a linear dependency of $P_{\text{OTA}}$ on the product of the number of antennas and the sampling frequency, as shown below [23]:

$$P_{\rm OTA} \approx P_{\rm OTAo} N_{\rm R} f_{\rm s}$$

where $P_{\text{OTAo}}$ is the power consumption of an OTA designed for a single-element array with unit sampling frequency (1 Hz). Solving for a 60° phase margin (PM) requirement puts $C_c$ close to 0.22 pF, yielding $g_{mn} = 0.22 \text{ pF} \times 5.54 = 1.2188$ pS. Assuming $g_{\rm m}/I_{\rm D}=15$, the unit current can be obtained as $I_{\text{Dn}} = 8.1253 \times 10^{-14}$ A. For a 60° PM, the $g_{\text{mn}}$ for the second stage is around 10 times that of the first stage, and we further assume the same $g_{\rm m}/I_{\rm D}$ ratio. The total current is thus $(2 + 10)I_{\text{Dn}} = 9.7504 \times 10^{-13}$ A. Assuming a 1 V supply, $P_{\text{OTAo}}$ can be estimated as $9.7504 \times 10^{-13}$ W. For the 16-antenna array and $f_s = 4$ GHz (Table II), the estimated power consumption is thus 62.403 mW. Note that the hybrid TTD array relaxes the OTA power consumption per sub-array, where $P_{\text{OTA}}$ is scaled by $N_{\text{R}}/N_{\text{H}}$. The same power consumption estimate, however, applies to a digital array without any relaxation.

The power consumption of the AGC, $P_{AGC}$, can be estimated in a similar way. Assuming that each AGC consumes the same power as an OTA designed for a single-element array, i.e., $P_{\text{OTAo}} f_{\text{s}}$ (see Table IV), the estimated total $P_{AGC}$ is 3.9 mW for the analog array and 15.6 mW for the hybrid array, following the design specifications in Table II.
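The OTA and AGC numbers above can be traced step by step. The sketch below follows the quoted values, including the rounded $\omega_{u0} = 5.54$ used in the text (the unrounded $2\ln(2)\cdot 4 \approx 5.545$ gives results that differ in the third significant digit). Function and variable names are hypothetical.

```python
def ota_unit_power_w(omega_u0=5.54, c_c=0.22e-12, gm_over_id=15.0, vdd=1.0):
    """Power of a single-element OTA at unit (1 Hz) sampling frequency."""
    gm_n = c_c * omega_u0        # first-stage unit transconductance
    i_dn = gm_n / gm_over_id     # unit drain current from the gm/ID ratio
    i_total = (2 + 10) * i_dn    # total current factor (2 + 10) as in the text
    return vdd * i_total

F_S = 4e9                 # ADC sampling frequency
N_R, N_H = 16, 4          # antennas and hybrid sub-arrays (Table II setup)

p_unit = ota_unit_power_w()
p_sca_analog = p_unit * N_R * F_S    # P_OTA ~ P_OTAo * N_R * f_s
p_agc_analog = p_unit * F_S          # one AGC
p_agc_hybrid = p_unit * N_H * F_S    # one AGC per sub-array

print(f"P_OTAo       = {p_unit:.4e} W")
print(f"SCA (analog) = {p_sca_analog * 1e3:.3f} mW")
print(f"AGC (analog) = {p_agc_analog * 1e3:.1f} mW, AGC (hybrid) = {p_agc_hybrid * 1e3:.1f} mW")
```

Because every term is linear in $f_{\rm s}$, doubling the sampling frequency doubles each of these estimates under the stated assumptions.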

3) *Time-Interleaver*: The power consumption for the time interleaver can be estimated as [39]:

$$P_{\text{TINW}} = f_{\text{s}}/N_{\text{I}} \times N_{\text{I}} \times (C_{\text{sw}}/N_{\text{I}} + C_{\text{int}}) \times \text{VDD}^2$$

where $C_{sw}$ is the switch capacitance and $C_{int}$ is the interconnect parasitic capacitance. For a sampling frequency of 4 GHz and a 1 V supply, $C_{sw} = 2.5\,\text{pF}$, $C_{int} = 0.6\,\text{pF}$ [39], 31 levels of time interleaving in the analog array, and 7 levels of interleaving in the hybrid array, the estimated power consumption of the time interleaver is 2.7 mW and 3.8 mW for the analog and hybrid arrays, respectively.
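The two time-interleaver figures follow directly from the expression above. A minimal sketch of the evaluation, with a hypothetical function name and the parameter values quoted in the text:

```python
def time_interleaver_power_w(fs_hz, n_interleave, c_sw, c_int, vdd):
    """P = (f_s / N_I) * N_I * (C_sw / N_I + C_int) * VDD^2."""
    return (fs_hz / n_interleave) * n_interleave * (c_sw / n_interleave + c_int) * vdd ** 2

p_ti_analog = time_interleaver_power_w(4e9, 31, 2.5e-12, 0.6e-12, 1.0)
p_ti_hybrid = time_interleaver_power_w(4e9, 7, 2.5e-12, 0.6e-12, 1.0)
print(f"analog (N_I=31): {p_ti_analog * 1e3:.2f} mW")
print(f"hybrid (N_I=7):  {p_ti_hybrid * 1e3:.2f} mW")
```

The hybrid value is higher because the switch capacitance is shared among fewer interleaving levels, while the interconnect term is the same in both cases.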

TABLE IV Power Estimation Methodology for TTD Arrays

| # Components       | Analog                                    | Hybrid                                                 | Digital                                         |
|--------------------|-------------------------------------------|--------------------------------------------------------|-------------------------------------------------|
| ADC                | 1                                         | $N_{\rm H}$                                            | $N_{\rm R}$                                     |
| SCA/AGC            | 1                                         | $N_{\rm H}$                                            | $N_{\rm R}$                                     |
| P <sub>SCA</sub>   | $P_{\text{OTAo}}N_{\text{R}}f_{\text{s}}$ | $P_{\text{OTAo}}N_{\text{R}}/N_{\text{H}}f_{\text{s}}$ | $P_{\text{OTAo}}N_{\text{R}}f_{\text{s}}$       |
| $P_{ADC_x-bit}$    | FoM based estimation                      |                                                        |                                                 |
| $P_{AGC}$          | $P_{\text{OTAo}}f_{\text{s}}$             | $P_{\mathrm{OTAo}}N_{\mathrm{H}}f_{\mathrm{s}}$        | $P_{\mathrm{OTAo}}N_{\mathrm{R}}f_{\mathrm{s}}$ |
| $P_{\text{DeSer}}$ | $P_{\text{DESo}}x$                        | $P_{\text{DESo}}N_{\text{H}}x$                         | $P_{\text{DESo}}N_{\text{R}}x$                  |



Fig. 10. Comparison of analog (A), hybrid (H), and digital (D) architectures in terms of power consumption for  $N_R = \{16, 32\}$  and  $BW = \{2, 4\}GHz$ .

## B. Power Consumption of Digital Array

The estimated power consumption of the digital array can be derived following a similar approach to the analog arrays, with the important consideration that the proposed beam training algorithm requires only integer delays at the ADC sampling frequency. For operation in communication mode, fractional-rate samplers will be needed as detailed in [40]. In addition to the same number of ADCs, AGCs, and filters as in an analog TTD array, the digital array consumes higher power at the ADC-DSP interface, primarily due to the need for deserializing the high-speed ADC output. For example, with 16 elements and 3 bits per ADC, the estimated power consumption of the deserializer is 24.6 mW.

## C. Comparison of Estimated Array Power Consumption

Table IV summarizes the required number of components and power consumption in the analog, hybrid, and digital arrays based on the architectures in Fig. 1, Fig. 2, and Fig. 3, respectively. Fig. 10 illustrates the introduced power estimation methodology with a breakdown of individual components for the analog and hybrid TTD arrays and also the benchmark digital array. The estimated power consumption for each component block is described in the previous subsections for each array architecture. The analog array provides high energy efficiency as compared to the hybrid TTD and digital arrays. However, the increasing bandwidth and the number of elements require larger unity-gain bandwidth OTAs, which increases design complexity for higher diversity orders. The need for higher unity-gain bandwidths is further constrained with the increasing number of feedback to the OTA's virtual ground, routing losses, and crosstalk. Future work will

investigate the design of analog and hybrid arrays with a higher number of antenna elements per sub-array using passive SCA that leverages reasonably lower resolutions required by the designed beam training algorithm. Interested readers can also refer to the state-of-the-art beamformers based on analog [41], [42], and hybrid [43]–[45] arrays to estimate the mmW frontend power consumption.

## VI. CONCLUSIONS AND FUTURE WORK

This work introduced and analyzed two TTD architectures with large delay-bandwidth product baseband delay elements as potential candidates for mmW beam training. We demonstrated that a high AoA estimation accuracy can be achieved with both proposed TTD architectures using a power measurement based beam training scheme, which requires only one wideband training pilot. The dependency of the codebook design and beam training performance on system parameters, including the bandwidth, number of antenna elements, and maximum TTD delay compensation, was analyzed and numerically evaluated in a practical multipath fading channel. Detailed analysis of the angle estimation accuracy, robustness to hardware impairments, and power consumption, revealed the trade-offs between the proposed TTD architectures when benchmarked against the digital array. The analog TTD array consumes 66% less power than the digital array, but it achieves a higher angle estimation error. The hybrid TTD array has a comparable beam training performance and 25% lower power consumption than the digital array. The results on how power consumption scales with the key system parameters, including the bandwidth and array size, provided an insight into the beam training design for future mmW and sub-THz systems. Future work will include array implementations supporting larger delay-bandwidth products for arrays with higher number of antenna elements, as well as channel estimation and identification of multiple AoAs in interference-limited networks.

#### APPENDIX A

#### DERIVATION OF EXPECTED POWERS IN D DIRECTIONS

As assumed in Section III-A, the channel gains  $G_l[k]$ ,  $\forall l, k$ , are independent across different clusters and frequency subbands. Thus, with  $Q \gg L$  and a negligible approximation error, the channel in (2) can be considered as one frequency domain realization of the following channel matrix

$$\mathbf{H} = \mathbf{A}_{\mathrm{R}} \Lambda \mathbf{A}_{\mathrm{T}}^{\mathrm{H}}. \tag{21}$$

The square matrix  $\Lambda \in \mathbb{C}^{Q \times Q}$  has only *L* non-zero elements that correspond to the cluster gains  $G_l$ ,  $\forall l$ .

With the codebook design described in Section III-B, the received signal in any probed direction d can be considered as a zero-mean complex Gaussian random variable and expressed as

$$Y_d = \mathbf{f}_d^{\mathsf{H}} \mathbf{H} \mathbf{v} + \mathbf{f}_d^{\mathsf{H}} \mathbf{n}, \qquad (22)$$

where $\mathbf{n} \sim \mathcal{CN}(0, \sigma_{N}^{2}\mathbf{I}_{R})$ is white Gaussian noise. The realizations of (22) are received symbols $Y[m], m \in \mathcal{M}_{d}$. The expected received signal power in direction *d* is $p_{d} = \mathbb{E}[|Y_{d}|^{2}] = \mathbb{E}[(\mathbf{f}_{d}^{\mathrm{H}}\mathbf{H}\mathbf{v}M^{-1/2} + \mathbf{f}_{d}^{\mathrm{H}}\mathbf{n})^{\mathrm{H}}(\mathbf{f}_{d}^{\mathrm{H}}\mathbf{H}\mathbf{v}M^{-1/2} + \mathbf{f}_{d}^{\mathrm{H}}\mathbf{n})]$. Based on the channel model in (21), it can be shown that

$$p_d = M^{-1} \mathbb{E} \left[ \mathbf{v}^{\mathrm{H}} \mathbf{A}_{\mathrm{T}} \Lambda^* \mathbf{A}_{\mathrm{R}}^{\mathrm{H}} \mathbf{f}_d \mathbf{f}_d^{\mathrm{H}} \mathbf{A}_{\mathrm{R}} \Lambda \mathbf{A}_{\mathrm{T}}^{\mathrm{H}} \mathbf{v} \right] + \mathbb{E} \left[ \mathbf{n}^{\mathrm{H}} \mathbf{f}_d \mathbf{f}_d^{\mathrm{H}} \mathbf{n} \right]. \tag{23}$$

We apply the trace operator Tr() to (23) and exploit its linearity and cyclic property to obtain

$$\begin{aligned}
p_{d} &= M^{-1} \mathbb{E} \left[ \operatorname{Tr} \left( \Lambda \mathbf{A}_{\mathrm{T}}^{\mathrm{H}} \mathbf{v} \mathbf{v}^{\mathrm{H}} \mathbf{A}_{\mathrm{T}} \Lambda^{*} \mathbf{A}_{\mathrm{R}}^{\mathrm{H}} \mathbf{f}_{d} \mathbf{f}_{d}^{\mathrm{H}} \mathbf{A}_{\mathrm{R}} \right) \right] + N_{\mathrm{R}} \sigma_{\mathrm{N}}^{2} \\
&= \operatorname{Tr} \left( \mathbf{G} \mathbf{A}_{\mathrm{R}}^{\mathrm{H}} \mathbf{f}_{d} \mathbf{f}_{d}^{\mathrm{H}} \mathbf{A}_{\mathrm{R}} \right) + N_{\mathrm{R}} \sigma_{\mathrm{N}}^{2},
\end{aligned} \tag{24}$$

where  $\mathbf{G} = \mathbb{E} \left[ \Lambda \mathbf{A}_{\mathrm{T}}^{\mathrm{H}} \mathbf{v} \mathbf{v}^{\mathrm{H}} \mathbf{A}_{\mathrm{T}} \Lambda^{*} \right]$ . Since  $\Lambda$  and  $\Lambda^{*}$  are sparse matrices, the product  $\Lambda \mathbf{A}_{\mathrm{T}}^{\mathrm{H}} \mathbf{v} \mathbf{v}^{\mathrm{H}} \mathbf{A}_{\mathrm{T}} \Lambda^{*}$  is another sparse  $Q \times Q$  matrix with  $L^{2}$  non-zero elements. Its main diagonal holds L non-zero elements of the form  $|G_{l}|^{2} |\mathbf{a}_{\mathrm{T}}^{\mathrm{H}}(\theta_{l}^{(\mathrm{T})})\mathbf{v}|^{2}$ ,  $\forall l$ , and the remaining L(L-1) off-diagonal elements are cross terms  $G_{l_{1}}G_{l_{2}}^{*}\mathbf{a}_{\mathrm{T}}^{\mathrm{H}}(\theta_{l_{1}}^{(\mathrm{T})})\mathbf{v}\mathbf{v}^{\mathrm{H}}\mathbf{a}_{\mathrm{T}}(\theta_{l_{2}}^{(\mathrm{T})})$ ,  $l_{1} \neq l_{2}$ . Since  $\mathbb{E} \left[ G_{l_{1}}G_{l_{2}}^{*} \right] = 0$ ,  $\forall l_{1} \neq l_{2}$ , and  $\mathbb{E} \left[ |G_{l}|^{2} \right] = \sigma_{l}^{2}$ ,  $\forall l$ , the expectation removes the cross terms, so  $\mathbf{G}$  is a diagonal matrix with L non-zero elements  $\sigma_{l}^{2} |\mathbf{a}_{\mathrm{T}}^{\mathrm{H}}(\theta_{l}^{(\mathrm{T})})\mathbf{v}|^{2}$ ,  $\forall l$ . The product of  $\mathbf{G}$  and the matrix of the UE BF gains  $\mathbf{A}_{\mathrm{R}}^{\mathrm{H}}\mathbf{f}_{d}\mathbf{f}_{d}^{\mathrm{H}}\mathbf{A}_{\mathrm{R}}$  is a  $Q \times Q$  matrix whose diagonal elements are equal to  $|\mathbf{f}_{d}^{\mathrm{H}}\mathbf{a}_{\mathrm{R}}(\xi_{q})|^{2}[\mathbf{G}]_{q,q}$ , so (24) becomes

$$p_d = \mathbf{b}_d^T \mathbf{g} + N_{\mathrm{R}} \sigma_{\mathrm{N}}^2 \tag{25}$$

where  $\mathbf{b}_d = [|\mathbf{f}_d^{\mathrm{H}} \mathbf{a}_{\mathrm{R}}(\xi_1)|^2, |\mathbf{f}_d^{\mathrm{H}} \mathbf{a}_{\mathrm{R}}(\xi_2)|^2, \dots, |\mathbf{f}_d^{\mathrm{H}} \mathbf{a}_{\mathrm{R}}(\xi_Q)|^2]^T$ and  $\mathbf{g} = \operatorname{diag}(\mathbf{G})$ . By vectorizing the result in (25), we obtain

$$\mathbf{p} = \mathbf{B}\mathbf{g} + N_{\mathrm{R}}\sigma_{\mathrm{N}}^{2}\mathbf{1},\tag{26}$$

where  $\mathbf{p} = [p_1, p_2, \dots, p_D]^T$  and  $\mathbf{B} = [\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_D]^T$ . Since the BS provides a large BF gain with the fixed precoder  $\mathbf{v}$ , we can assume that the receiver array sees only one spatially filtered dominant cluster, e.g., the first one. Consequently, there is only one non-zero element in  $\mathbf{g}$ , equal to  $|\mathbf{a}_{\mathrm{T}}^{\mathrm{H}}(\theta_1^{(\mathrm{T})})\mathbf{v}|^2\sigma_1^2$ .
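The linear model above can be sanity-checked numerically. The following is a minimal Monte Carlo sketch, not the simulation setup used in this work: it assumes half-wavelength uniform linear arrays, a uniform AoA dictionary, steering-vector combiners with unit-modulus entries, and arbitrary illustrative values for all parameters. The helper `steer` and every variable name are hypothetical. The sketch keeps the $M^{-1}$ pilot scaling from (23) explicit when evaluating the closed form.

```python
import numpy as np

def steer(n_ant, angles):
    """Half-wavelength ULA steering vectors, one column per angle (assumed array model)."""
    n = np.arange(n_ant)[:, None]
    return np.exp(1j * np.pi * n * np.sin(angles)[None, :])

rng = np.random.default_rng(0)
N_R, N_T = 8, 16                       # UE / BS antennas (illustrative)
Q, D, L, M = 32, 8, 3, 4               # dictionary size, probed directions, clusters, pilot scaling
sigma_n2 = 0.5                         # noise variance sigma_N^2 (illustrative)
sigma_l2 = np.array([1.0, 0.5, 0.25])  # cluster powers sigma_l^2 (illustrative)

xi = np.linspace(-np.pi / 3, np.pi / 3, Q)       # assumed AoA dictionary xi_q
A_R = steer(N_R, xi)                             # N_R x Q receive dictionary
q_idx = np.array([5, 14, 27])                    # dictionary indices of the L AoAs
theta = np.array([0.2, -0.5, 0.7])               # AoDs theta_l in radians
a_T = steer(N_T, theta)                          # N_T x L transmit steering vectors
v = steer(N_T, theta[:1])[:, 0] / np.sqrt(N_T)   # fixed precoder aimed at cluster 1
F = steer(N_R, np.linspace(-np.pi / 3, np.pi / 3, D))  # combiners f_d, unit-modulus entries

# Closed form: rows of B are b_d^T, g = diag(G), noise floor N_R * sigma_N^2.
B = np.abs(F.conj().T @ A_R) ** 2
g = np.zeros(Q)
g[q_idx] = sigma_l2 * np.abs(a_T.conj().T @ v) ** 2
p_closed = B @ g / M + N_R * sigma_n2

# Monte Carlo: draw independent complex Gaussian gains G_l and noise n, then form Y_d.
T = 100_000
gains = np.sqrt(sigma_l2 / 2) * (rng.standard_normal((T, L)) + 1j * rng.standard_normal((T, L)))
noise = np.sqrt(sigma_n2 / 2) * (rng.standard_normal((T, N_R)) + 1j * rng.standard_normal((T, N_R)))
tx_gain = a_T.conj().T @ v                 # a_T^H(theta_l) v, length L
rx_gain = F.conj().T @ A_R[:, q_idx]       # f_d^H a_R(xi_{q_l}), D x L
Y = (gains * tx_gain) @ rx_gain.T / np.sqrt(M) + noise @ F.conj()
p_mc = np.mean(np.abs(Y) ** 2, axis=0)     # empirical E[|Y_d|^2] per direction

print("max relative deviation:", np.max(np.abs(p_mc - p_closed) / p_closed))
```

Under these assumptions the empirical powers should approach the closed-form values as the number of trials grows. Because the precoder is aimed at the first cluster, the entry of `g` for that cluster dominates the other two, which is in line with the single-dominant-cluster approximation.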

#### REFERENCES

- [1] J. G. Andrews *et al.*, "What will 5G be?" *IEEE J. Sel. Areas Commun.*, vol. 32, no. 6, pp. 1065–1082, Jun. 2014.
- [2] T. S. Rappaport *et al.*, "Overview of millimeter wave communications for fifth-generation (5G) wireless networks—With a focus on propagation models," *IEEE Trans. Antennas Propag.*, vol. 65, no. 12, pp. 6213–6230, Dec. 2017.
- [3] K. Witrisal *et al.*, "High-accuracy localization for assisted living: 5G systems will turn multipath channels from foe to friend," *IEEE Signal Process. Mag.*, vol. 33, no. 2, pp. 59–70, Mar. 2016.
- [4] K. Hosoya *et al.*, "Multiple sector ID capture (MIDC): A novel beamforming technique for 60-GHz band multi-gbps WLAN/PAN systems," *IEEE Trans. Antennas Propag.*, vol. 63, no. 1, pp. 81–96, Jan. 2015.
- [5] C. Jeong, J. Park, and H. Yu, "Random access in millimeter-wave beamforming cellular networks: Issues and approaches," *IEEE Commun. Mag.*, vol. 53, no. 1, pp. 180–185, Jan. 2015.
- [6] J. Kim and A. F. Molisch, "Fast millimeter-wave beam training with receive beamforming," *J. Commun. Netw.*, vol. 16, no. 5, pp. 512–522, Oct. 2014.
- [7] L. Zhou and Y. Ohashi, "Efficient codebook-based MIMO beamforming for millimeter-wave WLANs," in *Proc. IEEE 23rd Int. Symp. Pers.*, *Indoor Mobile Radio Commun.-(PIMRC)*, Sep. 2012, pp. 1885–1889.
- [8] D. Zhang *et al.*, "Beam allocation for millimeter-wave MIMO tracking systems," *IEEE Trans. Veh. Technol.*, vol. 69, no. 2, pp. 1595–1611, Feb. 2020.
- [9] H. Yan and D. Cabric, "Compressive initial access and beamforming training for millimeter-wave cellular systems," *IEEE J. Sel. Topics Signal Process.*, vol. 13, no. 5, pp. 1151–1166, Sep. 2019.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS

- [10] M. Bajor *et al.*, "A flexible phased-array architecture for reception and rapid direction-of-arrival finding utilizing pseudo-random antenna weight modulation and compressive sampling," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1315–1328, May 2019.
- [11] A. Ali, N. Gonzalez-Prelcic, and R. W. Heath, "Millimeter wave beam-selection using out-of-band spatial information," *IEEE Trans. Wireless Commun.*, vol. 17, no. 2, pp. 1038–1052, Feb. 2018.
- [12] J. Choi, V. Va, N. Gonzalez-Prelcic, R. Daniels, C. R. Bhat, and R. W. Heath, "Millimeter-wave vehicular communication to support massive automotive sensing," *IEEE Commun. Mag.*, vol. 54, no. 12, pp. 160–167, Dec. 2016.
- [13] V. Desai *et al.*, "Initial beamforming for mmwave communications," in *Proc. 48th Asilomar Conf. Signals, Syst. Comput.*, Nov. 2014, pp. 1926–1930.
- [14] C. N. Barati, S. A. Hosseini, S. Rangan, P. Liu, T. Korakis, and S. S. Panwar, "Directional cell search for millimeter wave cellular systems," in *Proc. IEEE 15th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC)*, Jun. 2014, pp. 120–124.
- [15] C. N. Barati *et al.*, "Initial access in millimeter wave cellular systems," *IEEE Trans. Wireless Commun.*, vol. 15, no. 12, pp. 7926–7940, Dec. 2016.
- [16] S. Kalia, S. A. Patnaik, B. Sadhu, M. Sturm, M. Elbadry, and R. Harjani, "Multi-beam spatio-spectral beamforming receiver for wideband phased arrays," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 8, pp. 2018–2029, Aug. 2013.
- [17] C.-Y. Yeh, T.-C. Chu, C.-E. Chen, and C.-H. Yang, "A hardware-scalable DSP architecture for beam selection in mm-wave MU-MIMO systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 11, pp. 3918–3928, Nov. 2018.
- [18] S. Blandino *et al.*, "Multi-user hybrid MIMO at 60 GHz using 16-antenna transmitters," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 2, pp. 848–858, Feb. 2019.
- [19] N. J. Karl *et al.*, "Frequency-division multiplexing in the terahertz range using a leaky-wave antenna," *Nature Photon.*, vol. 9, no. 11, p. 717, 2015.
- [20] Y. Ghasempour, C.-Y. Yeh, R. Shrestha, D. Mittleman, and E. Knightly, "Single shot single antenna path discovery in THz networks," in *Proc. 26th Annu. Int. Conf. Mobile Comput. Netw.*, New York, NY, USA, Apr. 2020, pp. 1–13, doi: 10.1145/3372224.3380895.
- [21] H. Yan, V. Boljanovic, and D. Cabric, "Wideband millimeter-wave beam training with true-time-delay array architecture," in *Proc. 53rd Asilomar Conf. Signals, Syst., Comput.*, Nov. 2019, pp. 1447–1452.
- [22] T.-S. Chu and H. Hashemi, "A true-time-delay-based bandpass multibeam array at mm-waves supporting instantaneously wide bandwidths," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2010, pp. 38–39.
- [23] E. Ghaderi, A. Sivadhasan Ramani, A. A. Rahimi, D. Heo, S. Shekhar, and S. Gupta, "An integrated discrete-time delay-compensating technique for large-array beamformers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 9, pp. 3296–3306, Sep. 2019.
- [24] E. Ghaderi, C. Puglisi, S. Bansal, and S. Gupta, "10.8 A 4-element 500 MHz-modulated-BW 40 mW 6b 1GS/s analog-time-to-digital-converter-enabled spatial signal processor in 65nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 186–188.
- [25] V. Boljanovic, H. Yan, E. Ghaderi, D. Heo, S. Gupta, and D. Cabric, "Design of millimeter-wave single-shot beam training for true-time-delay array," in *Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC)*, May 2020, pp. 1–5.
- [26] A. S. Y. Poon and M. Taghivand, "Supporting and enabling circuits for antenna arrays in wireless communications," *Proc. IEEE*, vol. 100, no. 7, pp. 2207–2218, Jul. 2012.
- [27] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, "An overview of signal processing techniques for millimeter wave MIMO systems," *IEEE J. Sel. Topics Signal Process.*, vol. 10, no. 3, pp. 436–453, Apr. 2016.
- [28] S. Jaeckel, L. Raschkowski, K. Borner, and L. Thiele, "QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials," *IEEE Trans. Antennas Propag.*, vol. 62, no. 6, pp. 3242–3256, Jun. 2014.
- [29] S. Jaeckel, L. Raschkowski, K. Borner, L. Thiele, F. Burkhardt, and E. Eberlein, "QuaDRiGa–Quasi Deterministic Radio Channel Generator, User Manual and Documentation," Fraunhofer Heinrich Hertz Inst., Berlin, Germany, Tech. Rep. v2.2.0, 2019.

- [30] H. Yan, S. Ramesh, T. Gallagher, C. Ling, and D. Cabric, "Performance, power, and area design trade-offs in millimeter-wave transmitter beamforming architectures," *IEEE Circuits Syst. Mag.*, vol. 19, no. 2, pp. 33–58, 2nd Quart., 2019.
- [31] E. Ghaderi, A. S. Ramani, A. A. Rahimi, D. Heo, S. Shekhar, and S. Gupta, "Four-element wide modulated bandwidth MIMO receiver with >35-dB interference cancellation," *IEEE Trans. Microw. Theory Techn.*, vol. 68, no. 9, pp. 3930–3941, Sep. 2020.
- [32] B. Hershberg, S. Weaver, K. Sobue, S. Takeuchi, K. Hamashita, and U.-K. Moon, "Ring amplifiers for switched capacitor circuits," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 2928–2942, Dec. 2012.
- [33] J. Tsai *et al.*, "Ultra-low-LO-power X-band down-conversion ring mixer using weak-inversion biasing technique," *Electron. Lett.*, vol. 54, no. 3, pp. 130–132, Feb. 2018.
- [34] B. Murmann. (2020). ADC Performance Survey 1997-2020. [Online]. Available: http://web.stanford.edu/~murmann/adcsurvey.html
- [35] J. W. Jung and B. Razavi, "A 25-Gb/s 5-mW CMOS CDR/deserializer," *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 684–697, Mar. 2013.
- [36] D. Oh *et al.*, "A 65-nm CMOS 6-bit 2.5-GS/s 7.5-mW 8× time-domain interpolating flash ADC with sequential slope-matching offset calibration," *IEEE J. Solid-State Circuits*, vol. 54, no. 1, pp. 288–297, Jan. 2019.
- [37] S. Zhu, B. Wu, Y. Cai, and Y. Chiu, "A 2-GS/s 8-bit non-interleaved time-domain flash ADC based on remainder number system in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 4, pp. 1172–1183, Apr. 2018.
- [38] C.-H. Chan, Y. Zhu, S.-W. Sin, U. Seng-Pan, R. P. Martins, and F. Maloberti, "A 7.8-mW 5-b 5-GS/s dual-edges-triggered time-based flash ADC," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 8, pp. 1966–1976, Aug. 2017.
- [39] B. Razavi, "Design considerations for interleaved ADCs," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1806–1817, Aug. 2013.
- [40] S. Jang, R. Lu, J. Jeong, and M. P. Flynn, "A 1-GHz 16-element four-beam true-time-delay digital beamformer," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1304–1314, May 2019.
- [41] A. Chakrabarti *et al.*, "A 64 Gb/s 1.4 pJ/b/element 60 GHz 2×2-element phased-array receiver with 8b/symbol polarization MIMO and spatial interference tolerance," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 84–86.
- [42] R. Garg *et al.*, "4.3 A 28 GHz 4-element MIMO beam-space array in 65 nm CMOS with simultaneous spatial filtering and single-wire frequency-domain multiplexing," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 80–82.
- [43] J. D. Dunworth *et al.*, "A 28 GHz bulk-CMOS dual-polarization phased-array transceiver with 24 channels for 5G user and basestation equipment," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 70–72.
- [44] H.-C. Park *et al.*, "4.1 A 39 GHz-band CMOS 16-channel phased-array transceiver IC with a companion dual-stream IF transceiver IC for 5G NR base-station applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 76–78.
- [45] M.-Y. Huang, T. Chi, F. Wang, T.-W. Li, and H. Wang, "A 23-to-30 GHz hybrid beamforming MIMO receiver array with closed-loop multistage front-end beamformers for full-FoV dynamic and autonomous unknown signal tracking and blocker rejection," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 68–70.



Veljko Boljanovic (Student Member, IEEE) received the B.S. and M.S. degrees in electrical and computer engineering from the University of Novi Sad, Novi Sad, Serbia, in 2015 and 2016, respectively. He is currently pursuing the Ph.D. degree with the University of California at Los Angeles, Los Angeles, CA, USA. His research interests include system design, network performance optimization, and digital signal processing in wireless millimeter-wave communications. He was a recipient of the Electrical and Computer Engineering Department Fellowship at the University of California at Los Angeles in 2017.



Han Yan (Member, IEEE) received the B.E. degree from Zhejiang University, Hangzhou, China, in 2013, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California at Los Angeles (UCLA) in 2015 and 2020, respectively. He has broad research interests in signal processing and communication system design for millimeter-wave mobile networks, cooperative unmanned aerial vehicles networks and dynamic spectrum sharing radios. He was a recipient of the UCLA Dissertation Year Fellowship in 2018, the Qualcomm Innovation Fellowship in 2019, the UCLA ECE Distinguished Ph.D. Dissertation Award in 2020, and the Best Paper Award at the 2020 ACM mmNets workshop.



Deukhyoun Heo (Senior Member, IEEE) received the B.S. degree in electrical engineering from Kyoungpuk National University, South Korea, in 1989, the M.S. degree in electrical engineering from the Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 1997, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, in 2000. In 2000, he joined the National Semiconductor Corporation, where he was a Senior Design Engineer. In the Fall of 2003, he joined the faculty of the Electrical Engineering and Computer Science Department, Washington State University, Pullman, where he is currently the Frank Brands Analog Distinguished Professor of electrical engineering. He has authored or coauthored approximately 165 publications, including 80 peer-reviewed journal articles and 85 international conference papers. He has primarily been interested in mm-wave/sub-THz transceiver for wireless and wireline data communications, wireless sensors and power management systems, beamformers for phased-array communications, and low-power wireless links for biomedical applications. He has been a member of the Technical Committee of the IEEE Microwave Theory and Techniques Society (IEEE MTT-S). He was a recipient of the 2000 Best Student Paper Award presented at the IEEE MTT-S IMS and the 2009 National Science Foundation (NSF) CAREER Award. He has served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: EXPRESS BRIEFS and the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES. He is also serving as an Associate Editor for the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS and the IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS.



Chung-Ching Lin (Student Member, IEEE) received the M.S. degree in communication engineering from Yun Ze University, Taoyuan, Taiwan, in 2014. He is currently pursuing the Ph.D. degree with Washington State University, Pullman, WA, USA. His current research interests include low-power and wideband multi-antenna transceivers design. He was a recipient of the IEEE CICC Educational Grants Award in 2020, the IEEE CAS Travel Award in 2019, the Southern Methodist University Graduate Student Travel Grant in 2018, and Yu-Ziang Academic Scholarship in 2013. He is also the IEEE RFIC Symposium Best Student Paper Award Nominee (out of 12 finalists) in 2020.



Subhanshu Gupta (Senior Member, IEEE) received the B.E. degree from the National Institute of Technology (NIT), Trichy, India, in 2002, and the M.S. and Ph.D. degrees from the University of Washington in 2006 and 2010, respectively.

He has held industrial positions at Maxlinear, Irvine, CA, USA, where he worked on wideband transceivers for SATCOM and infrastructure applications. He is currently an Assistant Professor of electrical engineering and computer science with Washington State University. His research interests include large-scale phased arrays and wideband transceivers, low-power time-domain circuits and systems, and statistical hardware optimization for next-generation wireless communications, Internet of Things, and quantum applications. He was a recipient of the National Science Foundation CAREER Award in 2020, the Department of Defense DURIP Award in 2021, and the Cisco Faculty Research Award in 2017. He received the Analog Devices Outstanding Student Designer Award in 2008 and the IEEE RFIC Symposium Best Student Paper Award (Third Place) in 2011. He served as a Guest Editor for IEEE Design & Test of Computers in 2019. He serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I: REGULAR PAPERS for the term 2020-2021.



Soumen Mohapatra (Student Member, IEEE) received the B.S. degree in electrical engineering from the National Institute of Technology (NIT), Rourkela, India, in 2015. He is currently pursuing the Ph.D. degree with Washington State University, Pullman, WA, USA. He was working with the CMOS Image Sensor Group, ON Semiconductor, India, for two years, and the Power Management IC Team, Samsung Research and Development, India, for two years. His research interests include design of frequency synthesizers, switched inductor capacitor voltage regulator, and mixed-signal circuit design.



Danijela Cabric (Fellow, IEEE) received the M.S. degree in electrical engineering from the University of California at Los Angeles (UCLA) in 2001 and the Ph.D. degree in electrical engineering from UC Berkeley in 2007. She is currently a Professor in electrical and computer engineering with UCLA. She received the Samueli Fellowship in 2008, the Okawa Foundation Research Grant in 2009, the Hellman Fellowship in 2012, the National Science Foundation Faculty Early Career Development (CAREER) Award in 2012, and the Qualcomm Faculty Award in 2020. She served as an Associate Editor for IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE TRANSACTIONS ON MOBILE COMPUTING, and IEEE Signal Processing Magazine and as an IEEE ComSoc Distinguished Lecturer. Her research interests are millimeter-wave communications, distributed communications and sensing for Internet of Things, and machine learning for wireless networks co-existence and security.