Scalable RF Receivers for Large Antenna Arrays

Konstantin Trotskovsky
Elad Alon, Ed.
Ali Niknejad, Ed.
Ilan Adler, Ed.

Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2020-188
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-188.html

December 1, 2020
Copyright © 2020, by the author(s).
All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.
Scalable RF Receivers for Large Antenna Arrays

by

Konstantin Trotskovsky

A dissertation submitted in partial satisfaction
of the requirements for the degree of

Doctor of Philosophy

in

Engineering - Electrical Engineering and Computer Sciences

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Elad Alon, Chair
Professor Ali M. Niknejad
Professor Ilan Adler

Fall 2018
Scalable RF Receivers for Large Antenna Arrays

Copyright © 2018

by

Konstantin Trotskovsky
Abstract

Scalable RF Receivers for Large Antenna Arrays

by

Konstantin Trotskovsky

Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Elad Alon, Chair

The ongoing exponential mobile traffic increase is continuing to push the requirements of wireless communication systems. Currently in dense urban areas the major limitation to the capacity of wireless links is interference between different users. Massive MIMO is a promising technology to address this challenge, where directional spatial beams are formed by an antenna array on a base station, each serving a different user. This concept is currently verified using commercial general-purpose analog hardware, with power consumption in the kW range for large arrays.

This thesis focuses on energy-efficient RF receiver design for Massive MIMO systems. In order to be energy-efficient, the system requirements for a large antenna array receiver are different from a conventional single-antenna receiver. We use the fact that the noise of each receiver is uncorrelated across the array, so we can save power by using receivers with larger noise, while still meeting the link budget requirements. However, receiver linearity cannot be relaxed due to the digital beamforming used in the system.

To address these challenges a novel RF receiver architecture is proposed, using a mixer-first approach with mixer switch resistance larger than 50Ω. Configurable mixer and baseband Gm sizes allow the receiver to trade noise figure for power consumption and use the same receiver for various array sizes and array-level noise specifications. Harmonic recombination is performed early in the signal path, enabling rejection of harmonic blockers up to -10dBm of power.

This architecture was implemented in a first chip prototype. Scalable element noise figure results in sub-2.5 dB array-level noise figure with 16 to 64 antennas and <368 mW total power consumption, for a frequency range of 0.25-1.7GHz.

In the second chip we perform an optimized design using the Berkeley Analog Generator (BAG). The proposed generator design methodology produces instances with optimum power consumption for a given noise figure specification. An instance of this generator is implemented in the chip, showing power consumption improvement of 50% and wider frequency range of 1-6GHz compared to the first chip.
Acknowledgements

I am very grateful to the many people at Berkeley who helped me through my PhD here. First I would like to thank my advisors, professors Elad Alon and Ali Niknejad. I feel very fortunate to have worked with them. The weekly meetings with Elad were something that I looked forward to all week. I was always impressed by Elad’s ability to switch to a new solution and throw away the old one, no matter how much we had invested in it. His levels of energy and optimism are something I aspire to. I am very grateful to Ali for his offer to co-advise me. Ali provided new insight and a different perspective on the problems that I was facing. I also had a great time GSI-ing 242A and 105 with him. Teaching was a hard experience for me in the beginning, and Ali was very helpful and supportive.

I would also like to thank professor Bora Nikolić for taking part in my quals committee and helping out during the project meetings that we had over the years. Thanks to professor Paul Wright for serving in my quals committee and providing feedback. Special thanks to professor Ilan Adler for being a member in my dissertation committee, even though I asked him so late.

The students in BWRC make this place very special. I feel that a large part of what I learned in the past 5 years was from interacting with them. In the first eWallpaper project we started as a team of inexperienced students, and this chip would not have been possible to build without the effort of all of them. I feel lucky to have worked with Amy Whitcombe on this project, especially during the measurement and paper-writing stage. I thought that I was a hard-working person who pays attention to details, but working with Amy set a new bar for me. Her work ethic, willingness to help and positive attitude made our work a great experience. During the chip and board design I was fortunate to work with Pengpeng Lu. Pengpeng’s patience and sense of humor made the many hours that we spent together both funny and productive. Many thanks to Greg LaCaille who helped me with various circuit problems and had so many great ideas. Antonio Puglielli was a brilliant guy who made every problem look easy. Greg’s and Antonio’s help with top-level integration made this tapeout possible. Thanks to Nathan Narevsky, Eric Chang and Zhongkai Wang for their help and advice. Thanks to Greg Wright from Nokia who was very enthusiastic about the project and provided many helpful inputs.

In my second project I had to learn how to use BAG, and I would like to thank Eric Chang for being patient and helpful answering all my questions. In the early stages I had many questions and Eric was always patient and quick to respond. I am very happy that I got to know Marko Kosunen and worked with him on the chip flow and integration. His continuous efforts to make the chip work and meet the deadline made this chip possible. I learned a lot from Marko, both on the technical side and on the problem solving methodology. Thanks to Nathan, Zhongkai and Pengpeng for their help with the chip design.

I was lucky to be involved in two research groups of my two advisors and get to know many smart people. Thanks to Seobin Jung, Yida Duan, Jaeduk Han, Bonjern Yang, Nick Sutardja, Emily Naviasky, Ali Moin, Steven Callender, Andrew Townley, Nai-Chung Kuo, Lorenzo Iotti, Nima Baniasadi, Bo Zhao, Sashank Krishnamurthy, Yi-An Li, Luya Zhang, Ali Ameri and Matthew Anderson. Outside of my research groups, I would like to thank Luke Calderin, Sameet Ramakrishnan, Steve Bailey, Mirjana Videnović-Mišić, Rachel Hochman, Amanda Pratt, Nandish Mehta, Sharon Xiao, Sidney Buchbinder, Filip Maksimovic, Krishna Settaluri and John Wright.

The staff members make BWRC a great place to work. Special thanks to Candy Corpus who was extremely friendly and helpful in solving any administrative problem. Candy’s optimism and hard work make BWRC a great place to be in. Thanks a lot to Brian Richards who was very helpful and optimistic during the hard tapeout times. James Dunn was very quick to respond to any problem, and made sure that it was resolved no matter how busy he was. Fred Burghardt made the lab a great place to work and was always happy to help. Many thanks to Anita Flynn for her help with PCB design. Thanks to Ajith Amerasekera, Melissa Trevizo, Amber Sanchez, Olivia Nolan and Sarah Jordan for their help.

I would like to thank my family, my mother Elena and sister Maya. Being far away for a long time was a challenge for all of us, and I am very grateful to them for their support during all these years. My last and great thanks is to my fiancée Anna, who made my life far better since we met. I am looking forward to our future life together.
Chapter 1

Introduction

We are living in a time of exponential mobile traffic increase. In the past 5 years mobile traffic grew by 55-70% per year [?]. This trend is expected to continue for at least the next 5 years, as projected in Figure 1.1. The total mobile traffic is expected to grow by a factor of 5 in this period.

![Figure 1.1. Global mobile data traffic (exabytes per month). Exponential global traffic increase over time is projected.](image)

There are two contributors to the total mobile traffic increase. First, the number of mobile devices is growing. Second, and more importantly, the mobile traffic per device is growing as well. The projection is that from 2017 to 2023 the number of worldwide smartphones will increase by 8% per year, while the mobile traffic per smartphone will increase by 31% per year. This trend is seen all over the world, as illustrated in Figure 1.2.

Figure 1.2. Mobile data traffic per active smartphone (gigabytes per month). Data traffic increase per user is exponential all over the world.

To address this increase in mobile traffic, the 3rd Generation Partnership Project (3GPP) created the specifications for Long-Term Evolution (LTE) in mobile communication. Approximately every year a new release is made, enabling higher data rates and denser deployments. We now see a transition from the 4G LTE standard to 5G, with the current Release 15 being the first to include 5G specifications. However, as shown in Figure 1.2, the transition between 4G and 5G will take many years, and in 2023 only about 20% of the data traffic will be through the 5G standard.

In the past decades many different techniques were used to address the exploding demand for data traffic. Two main challenges should be addressed: increasing the data rate per user, and supporting a larger number of users simultaneously. We will briefly discuss them and see that continuing to improve them is becoming more and more difficult.

Ignoring other users, the main techniques to improve the data rate per user are increased bandwidth, improved channel coding and using higher constellations.

- **Bandwidth** is the most straightforward way to increase the possible data rate. However, the spectrum at RF frequencies is very densely allocated and extremely expensive. One possible direction that is under consideration for 5G is starting to use mm-wave frequencies where more bandwidth is available. This creates new challenges that we will discuss later.

- **Channel coding** can enable the usage of smaller SNR for the same received data rate. Modern coding schemes are already very close to Shannon’s limit, leaving very little room for improvement in the required SNR.

- **Higher constellation** enables larger channel capacity for the same bandwidth. However, since the channel capacity is logarithmic with the SNR, there are diminishing returns in the channel capacity when improving SNR (Figure ??). In addition, the hardware implementation becomes extremely challenging for higher constellations.
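
The diminishing return mentioned in the last item can be illustrated with Shannon's capacity formula, $C = B \log_2(1 + SNR)$. The following is a minimal sketch only: the 20 MHz bandwidth and the SNR values are assumed for the illustration and are not taken from this work.

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon capacity C = B * log2(1 + SNR) of an AWGN channel, in bit/s."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed example values (not from the text): 20 MHz channel, SNR from 10 to 40 dB.
BANDWIDTH_HZ = 20e6
for snr_db in (10, 20, 30, 40):
    capacity = shannon_capacity_bps(BANDWIDTH_HZ, snr_db)
    print(f"SNR = {snr_db:2d} dB -> capacity = {capacity / 1e6:6.1f} Mbit/s")
```

At high SNR every additional 10 dB (a tenfold increase in signal power) adds roughly the same increment of about $B \log_2 10$, whereas doubling the bandwidth doubles the capacity.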

In less dense rural areas these are the main challenges. However, in dense urban areas large numbers of users need to be served simultaneously. Here two main questions arise: how to share the wireless channel between different users, and how to prevent inter-user interference. Several techniques have been used to address these issues: Frequency-Division Multiple Access or FDMA (allocating different frequencies to different users), Time-Division Multiple Access or TDMA (allocating different time slots to different users), and Code-Division Multiple Access or CDMA (allocating different coding schemes to different users). The main limitation of these techniques is that they share the entire network capacity between the users, so adding more users will result in reduced user capacity.

To address these issues, spatial multiplexing is a promising direction. Multiple-input multiple-output (MIMO) technology already uses multiple antennas on both the receive and transmit sides, taking advantage of multi-path propagation. Multi-user MIMO (MU-MIMO) offers even higher capacities due to simultaneous communication with different users.

This approach is part of the Berkeley Wireless Research Center (BWRC) vision of a universal next-generation (xG) network [?], summarized in Figure ??. A large antenna array access point (xG Hub) provides connectivity to many devices using highly directional beams. The large array can support various communication standards, devices and ranges. Beamforming implements spatial selectivity and enables spectrum re-use and simultaneous communication with many users.

This vision has a lot in common with the recently popular Massive MIMO concept. The key idea of Massive MIMO is the use of a large number of base station antennas and a much smaller number of users. Using beamforming, the large array can then form a beam to every user, greatly reducing the interference between the users and enabling re-use of the spectrum.

Several massive MIMO systems have been recently demonstrated in academia and industry. Their main goal was to explore the system performance rather than to implement efficient hardware. For the RF transceivers off-the-shelf components were used. The Ngara system [?] operated at 806MHz for the uplink and 638MHz for the downlink and used discrete components. The Argos [?] and ArgosV2 [?] systems operated at the 2.4GHz ISM band and used the Maxim 2829 WiFi transceiver, while ArgosV3 [?] used a wideband Lime Microsystems LMS7002M transceiver operating at 50MHz-3.8GHz. Lund University [?], National Instruments [?] and Samsung [?] have demonstrated systems using NI USRP RIO Software-Defined-Radio transceivers, operating at 3.7GHz, 50MHz-6GHz and 3.4-3.6GHz respectively.

These systems have 32-160 antennas and serve 10-16 users. The off-the-shelf RF transceivers used consume 100s of mW, making the array RF power consumption lie in the kW range. This power consumption is already huge, and will be even larger if more antennas are used to serve more users.

So far no custom-designed RF IC receivers have been demonstrated in Massive MIMO systems. The goal of this work is to implement an integrated energy-efficient RF receiver tailored for Massive MIMO applications. In Chapter ?? we discuss the RF receiver design considerations, with an emphasis on the differences between a receiver for a large antenna array and a more common single-antenna receiver. In Chapter ?? we describe the circuit architecture that addresses the array requirements and show a first chip implementation for Massive MIMO applications. Chapter ?? describes the second chip implementation, fully designed using the Berkeley Analog Generator (BAG) and optimized for energy efficiency. Finally, Chapter ?? summarizes the thesis.
Chapter 2

System Design Considerations

In this chapter we will discuss the RF receiver design considerations for a large antenna array. Designing an RF receiver for a single-antenna system is a well-known procedure, but the fact that the receiver is intended to be used in a large array changes the specifications for the receiver. In other words, we will show what we can change in the RF receiver design in order to achieve the overall array performance in an energy-efficient manner.

The detailed design considerations for the entire Massive MIMO system can be found in [?] and [?]. Here we will focus on the implications for the RF receiver part of the system. A detailed description of the ADC design considerations is presented in [?].

2.1 Noise and energy-efficiency

To understand the impact of the receiver noise performance on the overall array energy efficiency, we will compare the transmitter (TX) and the receiver (RX) behavior in large arrays. This is summarized in Figure ??.

We will assume an array of $M$ antennas, each with an output power of $P_{out}$ on the TX side and a noise figure of $NF$ on the RX side. Then for both TX and RX the total array power consumption is proportional to $M$:

$$P_{array} \propto M \quad (2.1)$$

On the TX side with beamforming the main beam points to the user direction, and we have spatial summation of the electromagnetic waves. So the electric fields are summed in the air, which is equivalent to voltage summation of the transmitter outputs. Thus when we have $M$ transmitters with output power of $P_{out}$, the array equivalent isotropic radiated power (EIRP) is proportional to $M^2P_{out}$:

$$EIRP_{array} \propto M^2P_{out} \quad (2.2)$$

It means that the ratio of the EIRP to the array power consumption is proportional to the array size:

$$\frac{EIRP_{array}}{P_{array}} \propto M \quad (2.3)$$

The implication is that the overall TX energy efficiency can be improved when using the array. If we need a certain array EIRP to fulfill our link budget, we can reduce the overall array TX power consumption by using more array elements, each having lower output power. The array power consumption cannot be reduced indefinitely due to overhead power that does not scale with the array size, resulting in optimum array size for given array EIRP, transmitter efficiency and overhead power \[?\]. Generally speaking, for a large array we need efficient low output power transmitters with low overhead power.
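
The trade-off described above can be sketched numerically. The model below is an assumption made for illustration and is not taken from the cited analysis: isotropic elements so that $EIRP = M^2 P_{out}$, a constant transmitter efficiency, and a fixed overhead power per element. The function names and all numeric values are hypothetical.

```python
import math

def tx_array_dc_power_w(eirp_w, m, efficiency, overhead_w):
    """DC power of an M-element TX array that meets a target EIRP.

    Assumed model (illustrative only): EIRP = M^2 * P_out, constant
    transmitter efficiency, fixed overhead power per element.
    """
    p_out_w = eirp_w / m ** 2            # per-element output power, from Eq. (2.2)
    return m * (p_out_w / efficiency + overhead_w)

def optimum_array_size(eirp_w, efficiency, overhead_w):
    """Continuous M that minimises the DC power of the assumed model above."""
    return math.sqrt(eirp_w / (efficiency * overhead_w))

# Hypothetical numbers: 100 W EIRP, 25% efficient transmitters, 10 mW overhead each.
EIRP_W, EFFICIENCY, OVERHEAD_W = 100.0, 0.25, 0.01
for m in (16, 64, 200, 1024):
    power_w = tx_array_dc_power_w(EIRP_W, m, EFFICIENCY, OVERHEAD_W)
    print(f"M = {m:4d}: array DC power = {power_w:6.2f} W")
print("optimum M ~", round(optimum_array_size(EIRP_W, EFFICIENCY, OVERHEAD_W)))
```

Under these assumptions the radiated part of the power falls as $1/M$ while the overhead grows as $M$, so the total has a minimum at a finite array size.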

However, on the RX side the analysis is different. The noise added by each receiver in the array is uncorrelated across the antenna elements. Thus while the output signal power is proportional to \(M^2\) (same as in the TX case), the output noise power is proportional to \(M\). This is illustrated in Figure ?? for \(M = 4\) elements.

Consequently the array SNR is proportional to \(M\), or equivalently:

$$NF_{array} = \frac{NF}{M} \quad (2.4)$$

We should note that this analysis is simplified as it ignores the uncorrelated noise from the environment. However, the result is unchanged, as shown in [?]. Hence for the array we have:

\[
\frac{1}{P_{array} \, NF_{array}} = \text{const.} \tag{2.5}
\]

The implication here is that the overall RX energy efficiency cannot be improved with the array. Even in an ideal case where the element power consumption is inversely proportional to its noise figure, when we add more elements to the array the overall power consumption stays constant. This is equivalent to a large receiver being split into \( M \) smaller receivers, each with smaller power consumption and higher noise figure, so the overall performance stays the same. Hence on the RX side we need to be able to build energy-efficient receivers that can have high noise. In other words, we would like to implement a receiver with power consumption inversely proportional to its noise figure across a wide range of noise figures.
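
The array noise behaviour used above (signal power growing as $M^2$, uncorrelated noise power growing as $M$) can be checked with a small Monte Carlo sketch. The real-valued signal model, the sample count and the seed are assumptions chosen for the illustration.

```python
import math
import random

def array_snr_gain_db(m, n_samples=5000, noise_sigma=1.0, seed=0):
    """Estimate the SNR gain of coherently summing M receiver outputs.

    Each element sees the same unit-amplitude signal plus independent
    Gaussian noise (a simplified real-valued model assumed here).
    """
    rng = random.Random(seed)
    signal_sum = float(m)                        # voltages add: amplitude M
    noise_power = 0.0
    for _ in range(n_samples):
        noise_sum = sum(rng.gauss(0.0, noise_sigma) for _ in range(m))
        noise_power += noise_sum ** 2
    noise_power /= n_samples                     # approaches M * sigma^2
    snr_array = signal_sum ** 2 / noise_power
    snr_element = 1.0 / noise_sigma ** 2
    return 10 * math.log10(snr_array / snr_element)

for m in (4, 16, 64):
    print(f"M = {m:2d}: simulated gain = {array_snr_gain_db(m):5.2f} dB, "
          f"10*log10(M) = {10 * math.log10(m):5.2f} dB")
```

The simulated gain should track $10 \log_{10} M$, which is the improvement that lets each element operate with an $M$ times higher noise figure for the same array noise figure.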

From the RX standpoint we should distinguish between two link-budget regimes:

1. Link-budget-limited regime, where a single RX element is not sufficient to meet the desired link budget. In order to meet the link budget, we need to use an antenna array to improve the SNR by the RX array gain. In this regime we should spend extra power consumption on the antenna elements to fulfill the system requirements.

2. Non-noise-limited regime, where a single RX element is sufficient to meet the desired link budget. An antenna array can be used in order to implement directional beamforming to different users rather than to meet the link budget requirements. Then, in order to avoid extra power consumption, we can use receivers with higher noise figure and lower power consumption, to keep the overall array power consumption constant.

We can see that the two regimes call for different implementations of the RF receiver. In the link-budget-limited regime, a low-noise receiver should be designed, and its noise is effectively further improved by the array gain. This is the case in many mm-wave systems, where lower output powers, higher noise figures and larger path losses may require an array to meet the link budget. In the non-noise-limited regime, higher-noise and lower-power receivers should be designed in order to save overall array power. This is the case in many sub-6GHz systems.

In our work we will focus on sub-6GHz receivers in the non-noise-limited regime. To the best of our knowledge, this category of receivers has not been studied in the past, and it is important for future Massive MIMO systems, especially when the array size is very large.

To summarize, in the non-noise-limited regime, if the link-budget requirement for the receiver array is \( NF_{array} \), then from equation (2.4) the spec for the element noise figure is \( M \) times higher, or in dB:

\[
NF \, [dB] = NF_{array} \, [dB] + 10 \log_{10} M \tag{2.6}
\]

### 2.2 Linearity

Several linearity metrics are used in RF receivers. RF chain compression by the desired signal (and specifically the 1dB compression point \( P_{-1dB} \)) is used to characterize the SNR degradation when the desired signals at the input of the receiver become large. The weak-nonlinearity intermodulations (and specifically second and third order intercept points IP2 and IP3) characterize the impact of nearby blockers with moderate input powers on the SNR of the desired signal. Finally, large nearby blockers cause the receiver gain of the desired signal to compress and the thermal noise floor to increase, both degrading the desired SNR.

When analyzing the impact of the array on the overall linearity performance, we observe that the in-band linearity (the path of the signal) is unaffected by the fact that we are using an array. This is due to the fact that nonlinearity is a systematic feature of the array elements, so to first order each element has the same nonlinear characteristics. However, the blocker impact is very different. The main idea of Massive MIMO systems is to cancel the blockers and create directional beams that will enable larger signal-to-interferer ratios. Hence from the analog design standpoint, an important question is how exactly the blockers can be cancelled.

The first approach is digital beamforming, as shown in Figure 2.3. Each antenna element has an ADC and the digital signals are beamformed to cancel the blocker. Note that $a_k$ are complex multipliers, acting on both I and Q outputs. This architecture is easy to build since we don’t need to implement analog multiplications (phase shifters and VGAs). In addition, this picture shows a simple case of a single user, when only one beam is used. As the number of users grows we can keep the analog receivers unchanged and add more columns of digital multipliers, which simplifies the overall system design. However, this architecture is challenging from a linearity standpoint. Each analog receiver has to handle the full blocker strength, since the cancellation occurs only in the digital domain after the ADCs.
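
A minimal sketch of digital blocker cancellation with complex multipliers $a_k$ is given below. Everything in it is an assumption made for illustration and does not describe the beamforming algorithm of this system: a half-wavelength-spaced uniform linear array, plane-wave arrivals, weights obtained by projecting the blocker direction out of the user direction, and hypothetical angles and amplitudes.

```python
import cmath
import math

def steering_vector(m, angle_deg):
    """Response of an M-element, half-wavelength-spaced uniform linear array."""
    phase_step = math.pi * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase_step * k) for k in range(m)]

def inner(u, v):
    """Hermitian inner product u^H v."""
    return sum(uk.conjugate() * vk for uk, vk in zip(u, v))

def blocker_nulling_weights(user, blocker):
    """Weights a_k matched to the user with the blocker direction projected out."""
    scale = inner(blocker, user) / inner(blocker, blocker)
    return [s - scale * b for s, b in zip(user, blocker)]

def beamform(weights, samples):
    """Digital beamformer output: sum over elements of conj(a_k) * x_k."""
    return inner(weights, samples)

# Hypothetical scenario: 16 elements, user at 10 degrees, blocker at -35 degrees
# arriving with 100x the user's amplitude (40 dB stronger).
M = 16
user = steering_vector(M, 10.0)
blocker = steering_vector(M, -35.0)
weights = blocker_nulling_weights(user, blocker)

adc_samples = [u + 100.0 * b for u, b in zip(user, blocker)]  # one snapshot per ADC
print("amplitude at first ADC:", round(abs(adc_samples[0]), 1))
print("user gain             :", round(abs(beamform(weights, user)), 3))
print("blocker gain          :", round(abs(beamform(weights, blocker)), 9))
```

In this sketch the blocker disappears only at the beamformer output. Each element's sample still carries the blocker at full amplitude, which is the linearity burden placed on every analog receiver and ADC.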

Another approach is analog beamforming, as shown in Figure 2.4. Here the analog signals are beamformed early in the RF chain before the ADC. The analog multiplication is implemented as a combination of phase shifters and VGAs, and can be performed in RF, LO, baseband or a combination of these. This approach relaxes the linearity requirements of the analog stages located after the beamforming, since the blocker levels there are substantially lower. Hence from a linearity standpoint, it is advantageous to perform the beamforming earlier in the RF chain (for example, in RF/LO rather than in baseband). However, scaling this architecture to support many users results in duplicating the beamforming analog hardware, which makes the full system design much more complicated.

Finally, the last approach is hybrid beamforming, shown in Figure 2.5. Here we have both analog and digital blocker cancellation. The $M$ antennas are split into groups of $P$ antennas.


Figure 2.3. Digital beamforming, where the blockers are cancelled in the digital domain (pink) while the analog domain (light blue) must handle the full blocker power.

Figure 2.4. Analog beamforming, where the blockers are cancelled in the analog domain (light blue) while the digital domain operates with reduced blocker levels.

The idea here is that each group forms a partial beam that cancels some portion of the blockers, relaxing the linearity requirements from the following analog stages.

Figure 2.5. Hybrid beamforming, where the blockers are cancelled in part in the analog domain (light blue) and in part in the digital (pink) domain.

The choice between the beamforming architectures depends mainly on the speed of the digital data and its power consumption. At RF frequencies the channel bandwidth is relatively low (usually tens of MHz), so we can build a digital beamforming system with moderate digital power consumption. However, in mm-wave systems the channel bandwidths are several GHz and the digital power consumption becomes very large, so some amount of analog beamforming is necessary. Hence mm-wave massive MIMO systems typically use hybrid beamforming. Analog beamforming is too complicated for multi-user MIMO systems, and is mainly used in traditional phased array applications where a single beam is formed.

In our system we operate in the sub-6GHz frequency range with channel bandwidths of tens of MHz. Hence the digital power consumption is moderate, and we can use digital beamforming to simplify the overall system and enable easier support of many users. However, the implication is that the analog receiver part has to be very linear to support large blockers before being digitized by the ADCs.
2.3 Chip specifications

After the general discussion of the noise and linearity implications for receiver design in a large array system, we will summarize the specifications for our first receiver prototype. We are not targeting any specific application, like WiFi or LTE, but rather want to build a system that can be used for various applications, as shown in the xG vision in Figure ??. Hence we can briefly look at the current communication standards like WiFi and LTE, and try to derive the initial specifications for our system.

2.3.1 Frequency range and bandwidth

The cellular LTE standard supports 1.4, 3, 5, 10, 15, and 20 MHz channel bandwidths [?]. It also supports carrier aggregation, in which up to 5 bands can be used simultaneously to enhance bandwidth. Smaller bandwidths are supported for legacy compatibility with existing standards like GSM and CDMA, while larger bandwidths are used to support higher data rates for modern wireless systems. LTE operates over many frequency bands, from 700MHz up to 3.8GHz.

The wireless local area network (WLAN) protocol or Wi-Fi is the most common protocol providing wireless internet access to laptops and smartphones. Since its introduction in 1997 as the IEEE 802.11 standard, many updates to the standard have been implemented to support higher data rates and more frequency bands to address the increasing demand. WiFi operates in industrial, scientific and medical (ISM) and Unlicensed National Information Infrastructure (U-NII) bands. Today most WiFi devices are operating in the 2.4GHz band of 2.412-2.484GHz (802.11b, 802.11g, and 802.11n) and the 5GHz band of 5.15-5.875GHz (802.11a, 802.11n, and 802.11ac) [? ]. In addition, 802.11ad is using 60GHz bands. All 2.4GHz and 5GHz standards support 20MHz channel bandwidths, while 802.11n and 802.11ac also support 40MHz channels. The 802.11ac standard can also support 80 and 160 MHz bandwidths.

For our chip we will support a broad 700MHz to 6GHz range to cover all possible LTE and WiFi bands, as well as other possible bands in between. In terms of channel bandwidth we will use 20MHz channels (10MHz base-band bandwidth) for our chip. Supporting several channel bandwidths is possible, and was demonstrated in [? ], [? ] and [? ], but the overhead required to do so is large and not required to illustrate our research goals.

2.3.2 Blocker tolerance

As we have seen in section 2.2, when we use digital beamforming, the RF receiver needs to be able to handle large blocker powers, since the blockers are only cancelled after the conversion by the ADCs. LTE standards specify the required resilience to adjacent-channel and out-of-band interferers. An example of the power levels that an LTE receiver should tolerate is shown in Figure ??. This example has relatively relaxed blocker specs: only blockers further than 85MHz from the signal can be as high as -15dBm.

However, WiFi operates in an unlicensed band and has to co-exist with other devices that use the ISM band, like Bluetooth devices, cordless phones and microwave ovens. The received power in the WiFi band can be estimated by the free-space path loss (FSPL) in the Friis transmission equation:

\[
FSPL = \left( \frac{4 \pi d}{\lambda} \right)^2
\]

where $d$ is the distance between the transmitter and the receiver and $\lambda$ is the wavelength. At 2.4GHz and a distance of 0.5m (a reasonable distance for an interferer) this results in a loss of 34dB. So, for example, a cordless phone with an output power of 20dBm will result in a blocker power of -14dBm. Bluetooth can be an even stronger interferer, since mobile devices have both WiFi and Bluetooth support and their antennas can be much closer, or they can even share the same antenna. These Bluetooth blockers can be as high as -6dBm [?].
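
The path-loss numbers quoted above can be reproduced with a short calculation. This is a sketch that assumes isotropic (0 dBi) antennas on both sides, so that the loss is $(4\pi d/\lambda)^2$. The function names are chosen for the illustration.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0

def fspl_db(freq_hz, distance_m):
    """Free-space path loss (4*pi*d/lambda)^2 between isotropic antennas, in dB."""
    wavelength_m = SPEED_OF_LIGHT_M_S / freq_hz
    return 20 * math.log10(4 * math.pi * distance_m / wavelength_m)

def blocker_power_dbm(tx_power_dbm, freq_hz, distance_m):
    """Received blocker power, assuming 0 dBi antennas on both sides."""
    return tx_power_dbm - fspl_db(freq_hz, distance_m)

# Values from the text: 2.4 GHz, 0.5 m separation, 20 dBm cordless phone.
print(f"FSPL = {fspl_db(2.4e9, 0.5):.1f} dB")
print(f"Blocker power = {blocker_power_dbm(20.0, 2.4e9, 0.5):.1f} dBm")
```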

Tolerance to large nearby blockers has also been explored in academia. Recent works have demonstrated resilience to 0dBm blockers tens of MHz away from the signal [?], [?], at the price of high power consumption.

For our prototype we will target -10dBm blocker power at 40MHz offset from the carrier frequency. This performance can provide resilience to the majority of possible scenarios, and enable the digital beamforming architecture that we are targeting. In addition, we should also tolerate harmonic blockers of -10dBm, since for the lower part of our band the harmonic blockers are still in-band. We will describe this problem in detail in section ??.
2.3.3 Specifications summary

The specifications for the chip prototype that we discussed in this section are summarized in Table 2.1. The element noise figure range of 12-24dB can support several system scenarios:

- Array noise figure of 6dB, with 4 - 64 array elements
- 16 array elements, with array noise figure of 0 - 12dB
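
The two scenarios listed above follow directly from Eq. (2.6). The sketch below checks them; the function names are chosen for the illustration.

```python
import math

def element_nf_from_array(array_nf_db, num_elements):
    """Element noise figure allowed by Eq. (2.6): NF = NF_array + 10*log10(M)."""
    return array_nf_db + 10 * math.log10(num_elements)

def array_nf_from_element(element_nf_db, num_elements):
    """Array noise figure obtained from M elements with the given element NF."""
    return element_nf_db - 10 * math.log10(num_elements)

# Scenario 1: fixed 6 dB array noise figure, array size swept from 4 to 64.
for m in (4, 16, 64):
    print(f"M = {m:2d}: element NF = {element_nf_from_array(6.0, m):.1f} dB")

# Scenario 2: fixed 16-element array, element NF at both ends of the 12-24 dB range.
for nf_db in (12.0, 24.0):
    print(f"element NF = {nf_db:.0f} dB: array NF = "
          f"{array_nf_from_element(nf_db, 16):.1f} dB")
```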

<table>
<thead>
<tr>
<th>Specification</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Frequency range</td>
<td>0.7 - 6 GHz</td>
</tr>
<tr>
<td>Element noise figure</td>
<td>12 - 24 dB</td>
</tr>
<tr>
<td>Baseband bandwidth</td>
<td>10 MHz</td>
</tr>
<tr>
<td>Gain</td>
<td>50 dB</td>
</tr>
<tr>
<td>Nearby blocker power</td>
<td>-10 dBm</td>
</tr>
<tr>
<td>Nearby blocker offset</td>
<td>40 MHz</td>
</tr>
<tr>
<td>Harmonic blocker power</td>
<td>-10 dBm</td>
</tr>
</tbody>
</table>

Table 2.1. Specifications summary for the array element receiver prototype.
Chapter 3

A 0.25-1.7GHz, 3.9-13.7mW Power-Scalable, -10dBm Harmonic Blocker-Tolerant Mixer-First RF-to-Digital Receiver

### 3.1 Chip architecture

The chip specifications derived in the previous chapter differ from those of conventional RF receivers. We can generally categorize RF receivers into two main categories: high-performance and low-power. Their main specifications are summarized in Table ??.

<table>
<thead>
<tr>
<th></th>
<th>High-performance</th>
<th>Low-power</th>
<th>Our goal</th>
</tr>
</thead>
<tbody>
<tr>
<td>Applications</td>
<td>Cellular, WiFi, SDR(1)</td>
<td>Bluetooth, IoT(2)</td>
<td>Massive MIMO array</td>
</tr>
<tr>
<td>Power consumption</td>
<td>High</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Noise</td>
<td>Low</td>
<td>High</td>
<td>High</td>
</tr>
<tr>
<td>Linearity</td>
<td>High</td>
<td>Low</td>
<td>High</td>
</tr>
<tr>
<td>RF frequency</td>
<td>Narrowband/Wideband</td>
<td>Narrowband</td>
<td>Wideband</td>
</tr>
</tbody>
</table>

(1) Software-Defined Radio. (2) Internet of Things.

Table 3.1. Main specifications comparison for RF receiver categories. Our massive MIMO system doesn’t fit into the two common receiver categories.

High-performance receivers [? ], [? ], [? ], [? ], [? ], [? ] aim for excellent noise and linearity, and consume tens to hundreds of mW to achieve them. The noise and linearity performance of a high-end single-antenna receiver is crucial to achieve the desired range and to handle intense interference scenarios, so these DC power budgets are acceptable. However, these high-performance receivers are not intended for use in variable-size, low-power multi-user MIMO radio arrays. The work in [? ] uses spatial filtering in a 4-element array to tolerate large in-band blockers and to relax the dynamic range of the analog-to-digital converter (ADC), but consumes >110 mW to support 4 receiver elements without integrated ADCs. In [? ], a multi-antenna system with analog beamforming is shown, but it does not include baseband circuitry and supports only a single user. Neither [? ] nor [? ] supports harmonic rejection, which is needed in wideband RF systems that address multiple frequency bands and communication standards. The work in [? ] shows a high-performance single receiver with resilience to harmonic blockers, but consumes a large amount of power in noise-cancelling circuitry that is not needed in a massive MIMO array. RF receivers for software-defined radio (SDR) as in [? ] provide only bandwidth and RF frequency tuning, and cannot be configured to trade power for NF in larger MIMO arrays.

Low-power receivers [? ], [? ], [? ] are constrained by strict DC power budgets of a few mW. To meet the power budget, these receivers compromise on noise performance (which is acceptable since the data rates and ranges are smaller) and on linearity (though interference is becoming more important as the number of IoT devices sharply increases). In addition, low-power receivers are narrowband, which allows low-power LO generation and distribution schemes.

However, our system does not fit into these categories. We would like to have a receiver with good linearity (since we use digital beamforming) and low power (due to the very large array size), but we can compromise on the noise since it is averaged across the array elements. We would like to design a receiver with a programmable noise-power tradeoff that can serve as a common element for arrays of varying sizes; as the array size is increased, the relaxed per-element noise requirements are leveraged to reduce power consumption. In addition, we would like to support a wideband RF range (requiring harmonic rejection) and a variable array size/noise specification (a scalable solution). Hence a new receiver architecture is needed.

### 3.1.1 Mixer-first receivers

The first question to address is the RF front-end. High-performance receivers target very low noise, and usually use a Low Noise Amplifier (LNA) as the first stage of the receiver. The LNA contributes low noise while providing large gain, so the noise contribution of the later stages is very small. To achieve low noise, the power consumption of the LNA should be relatively large (typically 5-10mW). Moreover, the LNA experiences large blocker swings, and its power consumption should be large enough to sustain large blockers.

In recent years, a mixer-first topology was introduced [? ], which enables noise figures of a few dB and excellent out-of-band linearity. The main idea is shown in Figure ??.

In Figure ??, a switching mixer is connected directly to the RF port, driven by non-overlapping LO phases. Each switch is shown as an ideal switch and a series resistor $R_{\text{sw}}$ representing the switch series resistance. On the base-band side, each mixer is connected to a shunt capacitor $C_B$, and the equivalent resistance from the base-band side is represented by $R_B$. Charge-conservation analysis [? ] shows that the input impedance of this linear time-varying (LTV) system can be accurately represented using a linear time-invariant (LTI) model, as shown in Figure ??.

In this model, $\gamma$ represents the fundamental harmonic conversion gain:
\[ \gamma = \frac{1}{N} \text{sinc}^2 \left( \frac{1}{N} \right) \]
and \( R_{sh} \) represents the loss due to up-conversion of the baseband voltage via the LO harmonics:
\[ R_{sh} = \frac{N\gamma}{1 - N\gamma} (R_a + R_{sw}) \]
where \( N \) is the number of LO phases (\( N = 4 \) in Figure ??) and \( \text{sinc}(x) = \sin(\pi x)/(\pi x) \).

![Mixer-first receiver conceptual diagram](image1)

Figure 3.1. Mixer-first receiver conceptual diagram (left) and the corresponding LO waveforms (right). \( R_a \) is the antenna resistance, \( R_{sw} \) is the mixer switch resistance, and the switches are ideal.

![Mixer-first equivalent Linear Time-Invariant model](image2)

Figure 3.2. Mixer-first equivalent Linear Time-Invariant model. Impedance matching is achieved for \( R_{sw} + \gamma R_B || R_{sh} = R_a \)

The mixer-first architecture has two important properties:

- **Impedance matching** can be achieved by using small switch resistance \( R_{sw} \) and small equivalent baseband resistance \( R_B \) so that \( R_{sw} + \gamma R_B || R_{sh} = R_a = 50\Omega \). For \( N = 4 \) we get \( \gamma = 0.2 \) and \( R_{sh} = 4.3 \) \((R_a + R_{sw})\). Switch resistances of less than 10\(\Omega\) can easily be implemented using large mixer switch devices in modern processes. Low base-band resistance can be achieved by using Trans-impedance Amplifiers (TIAs).
- **Band-pass filter-like input impedance** is achieved at RF due to up-conversion of the base-band impedance. At RF frequencies close to the LO frequency (in-band) we can achieve an impedance match, while at RF frequencies further away from the LO frequency (out-of-band) we see a low impedance (limited by the switch resistance). This impedance profile results in the excellent out-of-band linearity of mixer-first receivers.
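
The matching condition can be checked numerically. The sketch below evaluates $\gamma$ and $R_{sh}$ from the expressions above (taking sinc as the normalized sinc, which reproduces the quoted $\gamma = 0.2$ for $N = 4$) and solves the matching condition for $R_B$. The function names and the 10Ω switch resistance are illustrative assumptions.

```python
import math


def gamma(n_phases):
    """Fundamental conversion gain: (1/N) * sinc^2(1/N), normalized sinc."""
    x = math.pi / n_phases
    return (math.sin(x) / x) ** 2 / n_phases


def r_sh(n_phases, r_a, r_sw):
    """Shunt resistance modelling re-radiation via the LO harmonics."""
    n_gamma = n_phases * gamma(n_phases)
    return n_gamma / (1.0 - n_gamma) * (r_a + r_sw)


def parallel(a, b):
    return a * b / (a + b)


def r_b_for_match(n_phases, r_a, r_sw):
    """Solve R_sw + (gamma*R_B || R_sh) = R_a for R_B."""
    target = r_a - r_sw  # required value of gamma*R_B || R_sh
    shunt = r_sh(n_phases, r_a, r_sw)
    if target <= 0.0 or target >= shunt:
        raise ValueError("no positive R_B satisfies the matching condition")
    gamma_r_b = target * shunt / (shunt - target)
    return gamma_r_b / gamma(n_phases)


print(f"gamma(4)    = {gamma(4):.3f}")
print(f"R_sh factor = {r_sh(4, 50.0, 0.0) / 50.0:.2f}")
print(f"R_B (match) = {r_b_for_match(4, 50.0, 10.0):.0f} ohm")
```

For $N = 4$ this reproduces $\gamma \approx 0.2$ and $R_{sh} \approx 4.3\,(R_a + R_{sw})$. The function raises an error once $R_{sw}$ reaches $R_a$, since no base-band resistance can then provide a match.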

When the base-band resistance $R_B$ is implemented using a TIA [? ] (Figure ??), a feedback resistor $R_f$ satisfying $R_B = R_f / (A + 1)$ can be used, where $A$ is the TIA amplifier gain. Noise analysis of the LTI model in Figure ?? [? ] shows that this topology can achieve a noise figure of a few dB. A mixer-first topology with noise cancellation was introduced in [? ], showing a state-of-the-art sub-2dB noise figure at the cost of increased complexity and power consumption.

![Figure 3.3. Mixer-first with TIA as the first base-band stage. Low mixer switch resistance and TIA provide impedance matching and large voltage gain.](image)

We will do a simple noise analysis of the model in Figure ?? here, taking into account only the mixer noise contribution ($R_B$ is assumed to be noiseless). Then the output noise voltage is:

$$v_{n,\text{out}}^2 = \left( \frac{R_{sh}||\gamma R_B}{R'_a + R_{sh}||\gamma R_B} \right)^2 \left( v_{n,R_a}^2 + v_{n,R_{sw}}^2 \right) + \left( \frac{R'_a||\gamma R_B}{R_{sh} + R'_a||\gamma R_B} \right)^2 v_{n,R_{sh}}^2$$  (3.3)

where $R'_a = R_a + R_{sw}$, and $v_{n,R_a}$, $v_{n,R_{sw}}$ and $v_{n,R_{sh}}$ are the noise voltages of $R_a$, $R_{sw}$ and $R_{sh}$ respectively.

Then the noise figure due to the mixer is given by:

$$F = \frac{1}{\text{sinc}^2 \left( \frac{1}{N} \right)} \left( 1 + \frac{R_{sw}}{R_a} \right)$$  (3.4)
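
The step from (3.3) to (3.4) can be made explicit. The following is a sketch of one way to carry it out using only the definitions above; the shorthands $X = \gamma R_B$ and $D$ are introduced here for convenience. The two voltage dividers in (3.3) share a common denominator:

$$\frac{R_{sh}||X}{R'_a + R_{sh}||X} = \frac{R_{sh} X}{D}, \qquad \frac{R'_a||X}{R_{sh} + R'_a||X} = \frac{R'_a X}{D}, \qquad D = R'_a R_{sh} + R'_a X + R_{sh} X$$

Each thermal noise power is proportional to its resistance, so dividing the total output noise by the part that originates from $R_a$ gives:

$$F = \frac{R_{sh}^2 \left( R_a + R_{sw} \right) + R'^2_a R_{sh}}{R_{sh}^2 R_a} = \frac{R'_a}{R_a} \left( 1 + \frac{R'_a}{R_{sh}} \right) = \frac{R'_a}{R_a} \cdot \frac{1}{N\gamma} = \frac{1}{\text{sinc}^2 \left( \frac{1}{N} \right)} \left( 1 + \frac{R_{sw}}{R_a} \right)$$

where the last two steps use $R'_a / R_{sh} = (1 - N\gamma)/(N\gamma)$ and $N\gamma = \text{sinc}^2(1/N)$. The result does not depend on $R_B$, which is consistent with $R_B$ being treated as noiseless.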

The noise figure is plotted in Figure ?? for 4 phases and 8 phases. First, we can see that increasing the number of phases from 4 to 8 improves the noise figure, since there is less re-radiation to higher harmonics (more on that in the next section). From equation ?? we can see that this improvement is \( \text{sinc}^2\left(\frac{1}{8}\right) / \text{sinc}^2\left(\frac{1}{4}\right) = 0.7 \text{dB} \), independent of the switch resistance. This is a substantial difference when targeting noise figures of a few dB, but less important for our spec. More importantly, even when the switch resistances are as large as several hundred \( \Omega \) (corresponding to minimum switch sizes in modern processes), the mixer noise contribution corresponds to noise figures of less than 10dB.
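
These observations can be checked by evaluating (3.4) directly. The sketch below assumes $R_a = 50\Omega$ and a few illustrative switch resistances; the function names are not from the original work.

```python
import math


def sinc_sq(n_phases):
    """sinc^2(1/N) with the normalized sinc, sin(pi*x)/(pi*x)."""
    x = math.pi / n_phases
    return (math.sin(x) / x) ** 2


def mixer_nf_db(n_phases, r_sw, r_a=50.0):
    """Mixer-only noise figure of equation (3.4), in dB."""
    noise_factor = (1.0 + r_sw / r_a) / sinc_sq(n_phases)
    return 10.0 * math.log10(noise_factor)


for r_sw in (10.0, 100.0, 300.0):
    nf4 = mixer_nf_db(4, r_sw)
    nf8 = mixer_nf_db(8, r_sw)
    print(f"R_sw = {r_sw:5.0f} ohm: NF(4) = {nf4:.2f} dB, "
          f"NF(8) = {nf8:.2f} dB, difference = {nf4 - nf8:.2f} dB")
```

The difference between the 4-phase and 8-phase cases stays at roughly 0.7 dB for every switch resistance, and the 300Ω case remains below 10 dB.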

The power cost of the small switch resistance in high-performance mixer-first receivers is quite large, as summarized in Table ??. Since the LO power is consumed by digital gates, it is proportional to the frequency: \( P_{LO} = CV^2f \), where \( C \) is the total capacitance of the LO distribution, \( V \) is the LO supply voltage and \( f \) is the LO frequency. So it is convenient to consider the LO power per unit frequency (mW/GHz). We can see from the table that the LO power is a substantial part of the total (LO+BB) power. While the base-band power is constant, the LO power increases with frequency, so at an LO frequency of a few GHz the LO dominates the total power.

Since the LO power consumption is significant, and we don’t need to use large switches for our noise specifications, we can use switches with large resistance and save substantial LO power. Moreover, since switches are easily scalable, we can build a bank of parallel switches and pick the size that we need for a desired spec. This architecture will enable us to achieve the noise tuning range that we need. However, we won’t be able to achieve impedance matching using base-band TIAs, since the switch resistance will already be larger than 50\( \Omega \). We will address this issue in section ??.

![Figure 3.4. Mixer-first noise figure, mixer noise contribution only. Note that even resistances of several 100s of \( \Omega \)s provide noise figures of less than 10dB.](image-url)

Table 3.2. LO power consumption of high-performance mixer-first receivers (NR - not reported). The LO power is a substantial part of the receiver power: for low switch resistance (< 20Ω), approximately 10-30mW/GHz is consumed by the LO.

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Technology (nm)</th>
<th>f (GHz)</th>
<th>Rsw (Ω)</th>
<th>LO phases</th>
<th>NF (dB)</th>
<th>BB power (mW)</th>
<th>LO power (mW/GHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>180</td>
<td>0.1-1.8</td>
<td>NR</td>
<td>4</td>
<td>3.5</td>
<td>24</td>
<td>11</td>
</tr>
<tr>
<td></td>
<td>28</td>
<td>0.4-3.5</td>
<td>NR</td>
<td>8</td>
<td>2.6</td>
<td>36</td>
<td>16</td>
</tr>
<tr>
<td></td>
<td>65</td>
<td>0.1-1.5</td>
<td>10</td>
<td>8</td>
<td>6.5</td>
<td>35</td>
<td>13</td>
</tr>
<tr>
<td></td>
<td>40</td>
<td>0.1-2.7</td>
<td>20</td>
<td>8</td>
<td>1.6</td>
<td>32</td>
<td>17</td>
</tr>
<tr>
<td></td>
<td>65</td>
<td>0.1-1.2</td>
<td>20</td>
<td>8</td>
<td>4.4</td>
<td>30</td>
<td>28</td>
</tr>
<tr>
<td></td>
<td>65</td>
<td>0.2-1.2</td>
<td>5</td>
<td>8</td>
<td>3.5</td>
<td>36</td>
<td>33</td>
</tr>
</tbody>
</table>

### 3.1.2 Harmonic rejection

Since we are targeting a wideband operating frequency range of 1-6GHz, a 4-phase mixer-first architecture would suffer from harmonic blockers. Because the LO is rectangular, RF signals around the odd harmonics of the LO are also down-converted to base-band, while the even harmonics can be suppressed by using a differential configuration. The 3rd and 5th harmonics are the most important, since for the lower part of the band (1-2GHz) these harmonics are still in-band (below 6GHz). Thus a harmonic rejection mechanism is required to make sure that the receiver is robust against harmonic blockers.

Harmonic rejection has been the subject of intense research over the past two decades. The most popular approach was introduced in [1], and its concept is shown in Figure ??. The main idea is to use 8 LO phases for the mixer switches, and to recombine the outputs with the correct coefficients so that a sine-like equivalent waveform is implemented. Generally speaking, the more LO phases are used, the more harmonics can be cancelled, since the effective LO signal is closer to sinusoidal. For 6 phases [2] only the 3rd harmonic is cancelled, for 8 phases [2, 3, 4, 5, 6, 7, 8, 9, 10] the 3rd and 5th are cancelled, and for 16 phases [9] the odd harmonics up to 13th order can be cancelled. The design complexity and power consumption grow with the number of LO phases, so using more phases has a large cost.
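
The relation between the number of phases and the cancelled harmonics can be illustrated numerically. The sketch below models the effective LO as an idealized $N$-level sample-and-hold approximation of a cosine (an assumption made here for illustration, with perfectly matched recombination weights) and lists which harmonics of that waveform are non-zero. All names are illustrative.

```python
import cmath
import math


def staircase_lo(n_phases, samples_per_phase=32):
    """One period of an N-phase sample-and-hold approximation of a cosine."""
    levels = [math.cos(2.0 * math.pi * (k + 0.5) / n_phases)
              for k in range(n_phases)]
    return [level for level in levels for _ in range(samples_per_phase)]


def harmonic_magnitude(waveform, harmonic):
    """Magnitude of the Fourier-series coefficient at the given harmonic."""
    length = len(waveform)
    total = sum(x * cmath.exp(-2j * math.pi * harmonic * n / length)
                for n, x in enumerate(waveform))
    return abs(total) / length


def surviving_harmonics(n_phases, max_harmonic=20, rel_tol=1e-9):
    """Harmonics (up to max_harmonic) that the effective LO still contains."""
    waveform = staircase_lo(n_phases)
    fundamental = harmonic_magnitude(waveform, 1)
    return [h for h in range(1, max_harmonic + 1)
            if harmonic_magnitude(waveform, h) > rel_tol * fundamental]


for n in (4, 6, 8, 16):
    print(f"{n:2d} phases: surviving harmonics {surviving_harmonics(n)}")
```

In this idealized model the first uncancelled harmonic is the 3rd for 4 phases, the 5th for 6 phases, the 7th for 8 phases and the 15th for 16 phases, in line with the statements above. Real implementations are limited by gain and phase mismatch, which this sketch ignores.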

Additional approaches to achieve harmonic rejection were introduced, all trying to emulate a sine-like LO using rectangular (digital) pulses. Multi-level LO DAC [2, 3], LC tank at 4fLO [2, 4], 2-stage recombination with 8 LO phases [3, 4], using 3rd and 5th LO harmonics with feedforward cancellation [5] and using LO pulse-width modulation [6] are just a few examples. These approaches target high-performance applications, and all require substantial power for the extra hardware that implements the harmonic rejection.

Since low-power receivers are narrowband, they do not use harmonic rejection. We therefore need a new harmonic rejection scheme that is both low-power and highly linear: we would like to keep a rectangular LO with a passive mixer for good linearity, and perform the harmonic rejection at low power.

In addition, we would like to be robust against large harmonic blockers of at least -10dBm. The majority of mixer-first receivers [2, 3, 4, 5, 6, 7, 8, 9, 10] do not mention the harmonic blocker power and concentrate on the level of harmonic rejection. The published results that report the harmonic blocker power are shown in Table ??. Most of them report harmonic blocker powers of -25dBm to -37dBm. The work in [?] is dedicated to improving large-signal harmonic rejection. Its architecture (called “harmonic rejection TIA”) has extra circuitry for resilience to large harmonic blockers, with a large power consumption of 40-70mW. Rather than quoting the harmonic rejection, this paper shows the NF degradation with large harmonic blockers. We will compare our harmonic rejection linearity to this state-of-the-art paper.

Table 3.3. Published harmonic rejection linearity results. Only one work dedicated to large signal harmonic rejection is able to cancel -10dBm harmonics.

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Harmonics power (dBm)</th>
<th>LO phases</th>
<th>Harmonics</th>
<th>Rejection (dB)</th>
<th>Method</th>
</tr>
</thead>
<tbody>
<tr>
<td>[?]</td>
<td>-30</td>
<td>8</td>
<td>3,5</td>
<td>60,64</td>
<td>RF+BB Gm (2 stages)</td>
</tr>
<tr>
<td>[?]</td>
<td>-37</td>
<td>8</td>
<td>3,5</td>
<td>56,56</td>
<td>RF Gm</td>
</tr>
<tr>
<td>[?]</td>
<td>-25</td>
<td>8</td>
<td>3,5</td>
<td>75,45</td>
<td>BB Gm</td>
</tr>
<tr>
<td>[?]</td>
<td>-30</td>
<td>4 (1)</td>
<td>3,5,7,9</td>
<td>&gt;70</td>
<td>Digital MMSE equalizer</td>
</tr>
<tr>
<td>[?]</td>
<td>-10</td>
<td>8</td>
<td>3,5</td>
<td>not reported (2)</td>
<td>Harmonic rejection TIA</td>
</tr>
</tbody>
</table>

(1) Two paths are used. (2) Gain and NF degradation due to large harmonic blockers is reported.

### 3.1.3 Proposed architecture

Our proposed architecture is shown in Figure ??. To achieve low-power harmonic rejection, we can use small mixer switches with large resistance (as we described in section ??) and use 8 phases to cancel the 3rd and 5th harmonics. As mentioned in section ??, we won’t be able to achieve impedance matching using base-band TIAs, since the switch resistance will already be larger than 50Ω. Hence we eliminate the base-band TIAs and use the harmonic recombination G_m stage directly at the base-band input.

To achieve impedance matching we use a shunt resistor R_p at the RF input. This matching strategy is a simple passive solution with no linearity or power consumption penalty. Active negative impedance matching can be further explored, analyzing its noise, linearity and power consumption consequences.

Figure 3.6. Conventional mixer-first [?] and the proposed architecture. Harmonic recombination earlier in the receiver chain enables better linearity of the harmonic cancellation. Impedance matching is achieved by a shunt \( R_p \) resistor.

This architecture has a substantial advantage in the **harmonic rejection linearity**. In the conventional architecture the harmonic blocker is amplified by the TIAs before getting cancelled by the \( G_m \) stage recombination. In the proposed architecture, the harmonic blocker is cancelled right at the base-band input before any amplification. Hence this architecture is more linear in terms of harmonic rejection - it is capable of cancelling larger blockers without affecting the fundamental path.

In order to calculate the required shunt resistor \( R_p \) for impedance matching, we can look at the mixer-first input impedance without the base-band TIAs. In this case the low-frequency base-band impedance is infinite, and the effective base-band impedance is only \( R_{sh} \) (see Figure ??). From equations ?? and ?? for \( N = 8 \) we get \( \gamma = 0.12 \) and \( R_{sh} = 18.9 (R_a + R_{sw}) \). Hence the input impedance is:

\[
R_{in} = R_{sw} + R_{sh} = R_{sw} + 18.9 (R_a + R_{sw})
\] (3.5)

From equation ?? the input impedance is in the k\( \Omega \) range, so we need \( R_p \approx R_a \) for matching. To analyze the noise impact of this parallel matching we can use the simple LTI model in Figure ??.
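
To see how close $R_p$ has to be to $R_a$, the sketch below evaluates $R_{in} = R_{sw} + R_{sh}$ for $N = 8$ and solves $R_p || R_{in} = R_a$. The switch resistances are illustrative assumptions, and the calculation is simplified in that $R_{sh}$ is evaluated with $R_a$ alone as the source resistance.

```python
import math


def r_sh_factor(n_phases):
    """N*gamma / (1 - N*gamma), with N*gamma = sinc^2(1/N)."""
    x = math.pi / n_phases
    n_gamma = (math.sin(x) / x) ** 2
    return n_gamma / (1.0 - n_gamma)


def mixer_input_resistance(r_sw, r_a=50.0, n_phases=8):
    """R_in = R_sw + R_sh for a mixer with an open-circuit base-band."""
    return r_sw + r_sh_factor(n_phases) * (r_a + r_sw)


def shunt_resistor_for_match(r_sw, r_a=50.0, n_phases=8):
    """R_p such that R_p || R_in equals R_a."""
    r_in = mixer_input_resistance(r_sw, r_a, n_phases)
    return 1.0 / (1.0 / r_a - 1.0 / r_in)


for r_sw in (30.0, 240.0):  # illustrative switch resistances
    print(f"R_sw = {r_sw:5.0f} ohm: R_in = {mixer_input_resistance(r_sw):6.0f} ohm, "
          f"R_p = {shunt_resistor_for_match(r_sw):.1f} ohm")
```

For both values the required $R_p$ is within a few percent of 50Ω, which supports using $R_p \approx R_a$.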

Since \( R_{sh} \gg R_{sw} \) we can ignore \( R_{sh} \) for a simpler analysis (which has only 0.2dB of error in the noise figure). For \( R_p = R_a \) the noise figure is:

\[
F = 2 + \frac{4R_{sw}}{R_a}
\] (3.6)

The factor of 2 is due to the noise of $R_p = R_a$. The factor of 4 arises because $R_{sw}$ is not in series with $R_a$: the noise voltage of $R_a$ is halved by the divider formed with $R_p$ before propagating to the base-band output, while the noise of $R_{sw}$ is not.

We can compare the mixer-only noise figure of the original mixer-first architecture from equation ?? to the proposed architecture. The result is shown in Figure ?? . The noise figure degradation is between 3dB for small switch resistances and 6dB for large switch resistances. Based on our desired noise figure range of about 12-24dB, the mixer noise contribution of this architecture looks reasonable.
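
The comparison can be reproduced from (3.4) and (3.6). The sketch below assumes $R_a = 50\Omega$ and $N = 8$ for the original architecture; the function names and the swept resistances are illustrative.

```python
import math


def nf_original_db(r_sw, r_a=50.0, n_phases=8):
    """Mixer-only NF of the TIA-matched mixer-first receiver, equation (3.4)."""
    x = math.pi / n_phases
    sinc_sq = (math.sin(x) / x) ** 2
    return 10.0 * math.log10((1.0 + r_sw / r_a) / sinc_sq)


def nf_shunt_db(r_sw, r_a=50.0):
    """Mixer-only NF with shunt matching R_p = R_a, equation (3.6)."""
    return 10.0 * math.log10(2.0 + 4.0 * r_sw / r_a)


for r_sw in (1.0, 30.0, 240.0, 10_000.0):
    original = nf_original_db(r_sw)
    shunt = nf_shunt_db(r_sw)
    print(f"R_sw = {r_sw:7.0f} ohm: original {original:5.2f} dB, "
          f"shunt {shunt:5.2f} dB, degradation {shunt - original:.2f} dB")
```

The degradation tends towards $10\log_{10}2 \approx 3$ dB for small $R_{sw}$ and $10\log_{10}4 \approx 6$ dB for large $R_{sw}$, less the roughly 0.2 dB sinc term that (3.6) neglects.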

![Figure 3.7](image1.png)

Figure 3.7. Mixer-first with shunt $R_p$ matching equivalent Linear Time-Invariant model (without the base-band noise contribution).

![Figure 3.8](image2.png)

Figure 3.8. Mixer-first noise figure (mixer noise contribution only), shunt $R_p$ noise impact. Degradation of 3-6dB due to $R_p$ is still reasonable for our low-power high-noise application.

### 3.1.4 AC-coupling

In the proposed architecture in Figure ??, we need to address the issue of different common-mode voltages of the base-band input and the RF. The base-band input is a $G_m$ stage, whose optimum implementation in terms of noise efficiency is a complementary inverter structure. Thus the optimum base-band input common-mode voltage is around mid-rail ($0.5V_{dd}$). The three possible options for AC/DC coupling of the mixer are shown in Figure ??. DC coupling is the simplest solution, but its main drawback is a large mixer switch resistance, since the device $V_{gs}$ is only $0.5V_{dd}$. LO AC coupling improves the mixer switch resistance (though it is not optimal due to losses to the mixer bottom-plate cap) at a cost of larger power consumption and area. BB AC coupling has the best switch resistance, but at the cost of loss in the signal path (due to bottom-plate capacitance), area, and extra noise from the inverter feedback resistor $R_{inv}$.

![DC coupling](image1)

![LO AC coupling](image2)

![BB AC coupling](image3)

Figure 3.9. Different options for mixer AC/DC coupling. The LO swing is shown in blue, and the common-mode voltages are shown in green. High-performance receivers usually use LO AC-coupling, but for our low-power receiver BB AC-coupling has a better overall performance.

To analyze the noise impact of the inverter feedback resistor $R_{inv}$ in the case of BB AC coupling, we observe that this resistor is connected to a low-impedance node (TIA input) at the $G_m$ drain. In Figure ??, we show a simplified LTI schematic of the impedances that impact the propagation of the $R_{inv}$ noise to the $G_m$ input.

![Noise impact](image4)

Figure 3.10. Noise impact of the inverter feedback resistor $R_{inv}$ in the case of BB AC coupling. A simplified LTI schematic (left) and $R_{inv}$ noise source (right).

Since we need the high-pass corner frequency to be much lower than the signal pole, we can treat $R_{low}$ and $C_{BB}$ as shorts, so the $R_{inv}$ noise transfer function is a low-pass with a corner frequency at the AC-coupling high-pass corner frequency:

$$v_i = \frac{v_{n,R_{inv}}}{1 + sR_{inv}C_{ac}}$$  (3.7)

This behavior is similar to $kT/C$ noise in a sample-and-hold circuit. To reduce the resistor noise impact we need to increase the capacitor $C_{ac}$ as much as possible.
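
The $kT/C$ analogy can be made concrete by integrating the thermal noise of $R_{inv}$ through the first-order low-pass of (3.7). The sketch below uses the closed-form integral; the component values are hypothetical and chosen only for illustration.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_vrms(c_ac, temp_k=300.0):
    """Total integrated noise of an RC low-pass: sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temp_k / c_ac)


def rinv_noise_vrms(r_inv, c_ac, f_max, temp_k=300.0):
    """Noise of R_inv through 1/(1 + s*R_inv*C_ac), integrated from 0 to f_max.

    Uses the closed form of the integral of 4kTR / (1 + (f/f_c)^2).
    """
    f_corner = 1.0 / (2.0 * math.pi * r_inv * c_ac)
    power = 4.0 * K_BOLTZMANN * temp_k * r_inv * f_corner * math.atan(f_max / f_corner)
    return math.sqrt(power)


# Hypothetical values: the total noise depends on C_ac, not on R_inv.
for r_inv in (100e3, 1e6):
    for c_ac in (1e-12, 10e-12):
        total = rinv_noise_vrms(r_inv, c_ac, f_max=1e15)
        print(f"R_inv = {r_inv:8.0f} ohm, C_ac = {c_ac * 1e12:4.0f} pF: "
              f"{total * 1e6:.1f} uVrms (kT/C: {ktc_noise_vrms(c_ac) * 1e6:.1f} uVrms)")
```

Changing $R_{inv}$ only moves the corner frequency, while increasing $C_{ac}$ lowers the total integrated noise.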

In most high-performance mixer-first receivers, LO AC coupling is used [? ], [? ], [? ]. When high performance is targeted, the mixer size is large, so we only lose the LO swing due to the AC-coupling cap bottom-plate. The extra DC power is also tolerated in these applications.

In our case we are using small mixer sizes (down to minimum-size devices). The AC-coupling cap should then be small, and its bottom-plate portion becomes relatively larger, so we would have large losses and extra power consumption to drive this bottom-plate cap. Thus we choose to use BB AC-coupling: we don’t spend extra power on the LO distribution, and instead pay with extra area and with the performance loss due to the bottom-plate cap and the added noise in the signal path.

### 3.1.5 DC-offset

The base-band AC coupling has an important impact on the DC offset of the receiver. A comparison between the opamp DC offset contribution of DC-coupling the base-band and AC-coupling the base-band is shown in Figure ???. The opamp offset gain is a non-inverting amplifier gain:

\[
\frac{V_{os,\text{out}}}{V_{os}} = 1 + \frac{R_5}{R_{out,G_m}}
\]

where \(V_{os}\) is the opamp offset, \(V_{os,\text{out}}\) is the output offset due to the opamp and \(R_{out,G_m}\) is the impedance looking into the \(G_m\) output.

![Figure 3.11. Opamp DC-offset impact in DC and AC coupling of the base-band. AC coupling creates low impedance looking into the \(G_m\) output and large gain of the opamp DC offset to the receiver output.](image)

When the base-band is DC-coupled, the impedance seen into the mixer from the \(G_m\) is low, so the impedance looking into the \(G_m\) output is \(r_o||R_{inv}\), which is high. Thus the offset gain is small, and the opamp offset contribution is relatively low.

However, when the base-band is AC-coupled, the impedance seen into the mixer from the \(G_m\) is infinite at low frequency, so looking into the \(G_m\) output we see a low \(1/g_m\) impedance. Thus the offset gain is large, and the opamp offset becomes the major contribution to the overall receiver DC offset. Thus a mechanism to cancel this DC offset is needed, which we will describe in section ??.

### 3.2 Design Implementation

The implemented receiver architecture is shown in Figure ???. We will describe the details of the analog components here, while the details of the ADC implementation can be found in [? ].

The LO contains a divide-by-4 circuit, and a size-programmable LO distribution circuit that creates 8 non-overlapping phases and drives the mixer switches. The LO distribution is partitioned into 8 unit cells (Figure ??), each with a series resistance of roughly 240Ω. The baseband capacitors
are programmable with 7 bits of resolution to keep the low-pass corner constant for the mixer tuning range and for process variations.

The parallel $G_m$ unit cells are shown in Figure ?? and in more detail in Figure ??. The $G_m$ cells were built as cascode transconductors to increase the output impedance and avoid degrading the subsequent filter response. The gate voltage of the cascode devices is also used to turn the cells off. Each $G_m$ unit cell has a transconductance of 0.9 mS. Tunable resistive degeneration (with 5 bits of resolution) of the $G_m$ unit cells allows fine-tuning of the harmonic recombination weights. This tuning method is not in the signal path, so it does not affect the fundamental transfer function. The AC-coupling caps are shared between different phases of the $G_m$ elements, and there are no series switches for the AC-coupling caps (to prevent additional noise).

Several aspects of the $G_m$ stage limit the scalability range of the receiver. First, the maximum size is limited by the series capacitor area. The minimum size is limited by the resistor area. In addition, series switches at the output of the $G_m$ cells were added to reduce the parasitic cap of the $G_m$ when tuned to a low $G_m$ setting. These parasitic caps prevent the biquad filter from keeping a constant transfer function at low capacitor settings.

A Rauch biquad TIA [? ], [? ] converts the total current obtained after harmonic recombination into a voltage, providing third order filtering together with the pole after the mixer switches. The Rauch topology was chosen for good linearity, low power, and input capacitance that can embed the parasitics of the $G_m$ output. The passive components of the filter are tunable with 7 bits of resolution to maintain a constant gain and filtering profile across the different $G_m$ sizes. For small $G_m$ size, large feedback resistor for the filter is used (to keep a constant gain), and the filter caps are tuned to maintain the second-order biquad transfer function. A two-stage Miller-compensated opamp was used in the filter. The prototype of this receiver supports a single baseband bandwidth of 10 MHz, but the architecture can be changed to support various bandwidths for massive MIMO applications by using higher resolution on the filter capacitors and a higher bandwidth op-amp.

Figure 3.13. Scalable RF front-end schematic diagram.

Figure 3.14. $G_m$ stage schematic diagram. All inputs and outputs are shorted.

The DC-offset cancellation DAC is shown in Figure 3.15. The current is injected into the first stage (folded cascode) of the opamp. The DAC has 9 bits of resolution with an LSB of 40nA to provide an output voltage resolution of 10mV. Due to the low LSB current, the device size is small (w=600nm, l=2.4µm). The current is steered to the positive side, the negative side or ground, to prevent the bias point from changing when a branch is off. On-chip automatic calibration for zero input was implemented by scanning the DAC codes and selecting the code with minimum output offset.
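
The calibration scan described above amounts to an exhaustive search over the DAC codes. The sketch below shows the idea in Python; the `measure_offset` callback and the linear behavioural model (mid-code of 256, 10 mV per LSB at the output) are hypothetical stand-ins for the on-chip measurement and are not taken from the actual implementation.

```python
def calibrate_offset_dac(measure_offset, n_bits=9):
    """Scan every DAC code and return the one with the smallest |output offset|.

    measure_offset(code) is a hypothetical callback returning the receiver
    output offset (in volts) with zero input and the given DAC code applied.
    """
    return min(range(2 ** n_bits), key=lambda code: abs(measure_offset(code)))


def make_linear_model(initial_offset_v, lsb_v=0.010, mid_code=256):
    """Hypothetical behavioural model: each LSB moves the output by lsb_v."""
    def measure_offset(code):
        return initial_offset_v + (code - mid_code) * lsb_v
    return measure_offset


model = make_linear_model(initial_offset_v=0.123)
best_code = calibrate_offset_dac(model)
print(f"best code = {best_code}, residual offset = {model(best_code) * 1e3:.1f} mV")
```

With a linear model the residual offset after calibration is bounded by half an LSB, i.e. 5 mV for a 10 mV output step.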

Figure 3.15. DC-offset cancellation DAC location in the opamp (left) and DAC structure (right).

### 3.3 Measurement Results

The RF-to-digital receiver was implemented in a 65 nm CMOS process (Figure 3.16). Figure 3.17 shows the NF and power scaling range of the receiver (RX) at three different LO frequencies, obtained by tuning the sizes of the mixer switches and the $G_m$ stages. For the min RX size, a single mixer unit cell and a single $G_m$ cell were used, while for the max RX size, 8 parallel mixer unit cells and 4 parallel $G_m$ cells were used.

The passives are tuned to maintain constant gain and third order filtering profile, as shown in Figure 3.18. The receiver power scales by 4x from the maximum to minimum size, saving power in larger arrays. In this prototype, the maximum LO frequency was limited to 1.7 GHz by the LO frequency divider (which receives an externally generated 4x clock) rather than the LO distribution architecture. Harmonic recombination may not be required for higher bands in the sub-6 GHz range, so a simpler divide-by-2 circuit can be used with the same input clock.

The linearity performance of the receiver is shown in Figure 3.19 with an LO frequency of 800 MHz, a signal at 801 MHz, and a large harmonic blocker at 2400.7 MHz. The early harmonic recombination enables a large harmonic blocker P1dB of $>-6.2$ dBm and NF degradation of $<10$ dB for input harmonic powers up to -5 dBm. In addition, resilience to large nearby blockers at an offset of 40 MHz was measured. The low-pass filtering after the mixer downconversion and additional biquad filtering enable gain and NF degradation of less than 2 dB for input blocker powers up to -10 dBm. The results in Figs. 3.19, 3.20, and 3.21 were measured at the analog test output before the ADC input.

The input matching measurements for an LO frequency of 1GHz are shown in Figure 3.20. The impedance seen from the RF port is the intentional 50Ω resistor in parallel with the impedance seen into the mixer. When the mixer size decreases as we approach the min RX size, the out-of-band impedance seen into the mixer becomes larger, so the out-of-band impedance seen from the RF port becomes closer to 50Ω. The center frequency is lower than 1GHz due to the parasitic capacitance of the pad and the switches on the RF port. Since the input impedance to the mixer has a band-pass response, it can be seen as a parallel RLC circuit. Thus adding a parasitic capacitance on the RF port decreases the center frequency. Changing the mixer size affects the Q of the band-pass, so the same parasitic capacitance results in a different frequency shift. This frequency shift can be eliminated using complex feedback between the base-band I and Q paths [9], [10].

Figure 3.16. Die micrograph.

Figure 3.17. Receiver scalability.

Figure 3.18. Receiver gain for min and max RX size.

Figure 3.20. Measured input matching of the receiver at 1 GHz.

The power consumption breakdown of the receiver and the ADC is shown in Figure ??. The opamp is the only non-scalable part in the current design, since its power consumption is relatively low. The LO power is the major contributor, since we are limited by minimum-size devices when implementing the min RX size settings.

Figure 3.21. Measured power consumption breakdown of the receiver and the ADC at 1 GHz.

### 3.4 Conclusion

When used in an array, this receiver can be configured with higher NF and lower power as the number of array elements grows to maintain constant array-level NF and power consumption while improving spatial selectivity. In this work, the single-element NF range is ~13-19 dB. For an array-level NF of 1.5 dB, 16 elements are required for the max RX size and 64 elements are required for the min RX size. To support >64 elements without linear increase in array power, a smaller (lower power) RX unit cell is required. Similarly, to maintain an array-level NF of 1.5 dB for <16 elements, more unit cells are required in each receiver to lower the per-element NF.
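
The element counts quoted above follow from inverting the array-averaging relation. In the sketch below, the element noise figures of 13.5 dB and 19.5 dB are assumptions chosen only because they are consistent with the "~13-19 dB" range and the quoted element counts; they are not measured values taken from this work.

```python
import math


def elements_required(element_nf_db, array_nf_db):
    """Smallest array size whose averaged NF meets the array-level target.

    Assumes uncorrelated noise across elements, so that
    array NF = element NF - 10*log10(number of elements).
    """
    return math.ceil(10.0 ** ((element_nf_db - array_nf_db) / 10.0))


# Assumed element noise figures for the max and min RX size (see lead-in).
for label, element_nf in (("max RX size", 13.5), ("min RX size", 19.5)):
    n = elements_required(element_nf, array_nf_db=1.5)
    print(f"{label}: element NF {element_nf} dB -> {n} elements")
```

Every 6 dB increase in per-element NF requires 4x more elements to hold the array-level NF constant.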

Figure ?? shows the calculated equivalent array-level NF (bottom) and power consumption (top) for three different LO frequencies. With the proposed scalable architecture, this design can maintain sub-2.5 dB array-level noise figure with up to 64 antennas and <368 mW total receiver+ADC power consumption, much lower than any prior art shown in Table ?? when referenced to a 64-element array.

While [?] has relatively low power, it does not include baseband circuitry and uses analog beamforming that supports only a single user. The harmonic blocker resilience of this scalable low-power design is comparable to the state-of-the-art [? ], allowing multiple users to be supported through digital beamforming. Overall, this scalable design can support an array size increase of up to 4x while maintaining excellent linearity and nearly constant array-level NF and power consumption.
Figure 3.22. Calculated array-level NF and total receiver + ADC power consumption.

<table>
<thead>
<tr>
<th>RF Freq. (GHz)</th>
<th>0.1-3.1</th>
<th>1.0-2.5</th>
<th>0.1-3.3</th>
<th>0.4-3</th>
<th>0.25-1.7</th>
</tr>
</thead>
<tbody>
<tr>
<td>BW (MHz)</td>
<td>NR</td>
<td>NR</td>
<td>NR</td>
<td>0.5-50</td>
<td></td>
</tr>
<tr>
<td>Array elements</td>
<td>4</td>
<td>4</td>
<td>1</td>
<td>1</td>
<td>10</td>
</tr>
<tr>
<td>NF (dB)(1)</td>
<td>3.4-5.8(2)</td>
<td>6</td>
<td>1.7</td>
<td>1.8-2.4</td>
<td>13.2-13.8(3)</td>
</tr>
<tr>
<td>Gain (dB)</td>
<td>41</td>
<td>12</td>
<td>NR</td>
<td>70</td>
<td>46</td>
</tr>
<tr>
<td>3rd harm. blocker P1dB (dBm)</td>
<td>N/A</td>
<td>N/A</td>
<td>-6.5</td>
<td>N/A</td>
<td>-4.3(3)</td>
</tr>
<tr>
<td>Harm. blocker NF @-5dBm (dB)</td>
<td>N/A</td>
<td>N/A</td>
<td>9</td>
<td>N/A</td>
<td>22.9(3)</td>
</tr>
<tr>
<td>Out-of-band IIP3 (dBm)</td>
<td>-5/12(5)</td>
<td>5</td>
<td>11.5</td>
<td>8</td>
<td>14.6(3)</td>
</tr>
<tr>
<td>Supply (V)</td>
<td>1.2</td>
<td>1.0</td>
<td>1.0</td>
<td>0.9</td>
<td>1.2 Analog, 1.0 Digital</td>
</tr>
<tr>
<td>CMOS technology</td>
<td>65 nm</td>
<td>65 nm</td>
<td>28 nm</td>
<td>28 nm</td>
<td>65 nm</td>
</tr>
<tr>
<td>Total power (mW)</td>
<td>116-147</td>
<td>26-36</td>
<td>36.8-62.4</td>
<td>&lt;40</td>
<td>7.6-19(3)</td>
</tr>
<tr>
<td>Total area (mm$^2$)</td>
<td>0.8</td>
<td>0.2</td>
<td>5.2</td>
<td>0.6</td>
<td>0.7</td>
</tr>
</tbody>
</table>

NR: Not reported. N/A: Not applicable.

(1) For multi-element arrays, equivalent array-level noise figure calculated as single-element NF - 10 log$_{10}$(num. of array elements). (2) With spatial filtering enabled (without: 1.7-4.5 dB).
(3) Max size configuration. (4) Min size configuration. (5) Depends on receiving angle.

Table 3.4. Summary and comparison with state-of-the-art.
4.1 Motivation

The receiver design that we described in the previous chapter has a few issues that can be improved.

First, optimization of the power consumption was not rigorously performed. We came up with the architecture that enables scalable noise-power consumption tradeoff, but did not design it to have optimum power consumption for each noise setting.

Second, the design was performed for a particular technology (65nm). Design decisions (like LO/BB AC coupling, LO chain fanout, $G_m$ stage structure and so on) were made for this particular technology. If we’d like to design the next version of this receiver in a different technology node, we should repeat the same manual process of creating schematics, drawing layouts, running simulations, updating schematic and layout parameters all over again. The optimum design point will obviously depend on the technology, so we need to have a very long design cycle to get the final optimized design.

This problem is of course nothing new; analog designers have faced it for many decades. Recently, version 2 of the Berkeley Analog Generator (BAG2) framework was introduced [? ]. This framework enables design automation by creating process-portable circuit “generators”. The generators capture the design methodology, the schematic and layout creation, and the execution of testbenches. Using this framework we can write a single circuit generator for the entire system, and produce different implementations (“instances”) for different specs and different technologies.

In section ?? we will give a short introduction to the BAG framework. Then in section ?? we will introduce a design methodology within BAG that provides the optimum receiver for the given specs and technology. In section ?? we will give a detailed description of the chip that implements this design methodology in a 16nm FinFET process, and in section ?? we will show the measurement results.

4.2 Berkeley Analog Generator

The Berkeley Analog Generator (BAG) was first introduced in [? ]. The main idea of this framework is that instead of designing a circuit for a specific spec and technology, the designer should capture the design methodology in a circuit “generator”. The generator takes the specs and the technology as inputs and consists of methods to produce schematics, layouts and testbenches. Using these generators the designer can implement automated design procedures (including loops) that result in a verified, post-layout simulated design that meets the desired specs (an “instance”). This procedure is illustrated in Figure ?? . For different specs and/or different technologies, the generator can quickly produce different instances, which reduces the overall design time. Recently a second version of BAG was released [? ], which makes it easier to create process-independent generators for deeply scaled technologies and has new layout generation engines.

![Figure 4.1](image.png)

Figure 4.1. The general idea of a generator: a single procedure (circuit generator) produces verified design instance for given specifications and technology.

A simplified flowchart of a circuit generator design is shown in Figure ?? . The blocks shown in blue are implemented in the generator framework (Python) and the blocks shown in brown are generated into the circuit design and simulation software (Cadence Virtuoso). We can summarize the steps as follows:

- The design script gets the specifications and the technology as its inputs, and provides the parameters (device sizes, threshold flavors, number of stages etc.) for the schematic, layout and testbench generators.

- The schematic, layout and testbench generators create an instance according to the parameters specified by the design script. The generated layout is DRC clean, and passes LVS with the generated schematic.

- The generated testbench is executed for the generated circuit (after post-layout extraction).

- The simulated results are fed back into the design script and compared with the desired specifications. If some of the specs are not met and a change of parameters is required, new schematic, layout and testbench instances are generated and the procedure is repeated. Also if the specs are met but a more optimized result is required, the procedure can be repeated.
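The steps above can be sketched as a plain Python loop. This is only an illustration of the control flow, not BAG's actual API: every name here (`run_design_loop`, the callables, and the toy stand-ins) is hypothetical, and the "simulation" is a made-up closed-form function so that the example runs on its own.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]
Results = Dict[str, float]

def run_design_loop(
    specs: Results,
    initial_params: Params,
    generate_and_simulate: Callable[[Params], Results],
    update_params: Callable[[Params, Results, Results], Params],
    meets_specs: Callable[[Results, Results], bool],
    max_iterations: int = 20,
) -> Tuple[Params, Results, int]:
    """Iterate generate -> simulate -> compare until the specs are met."""
    params = dict(initial_params)
    for iteration in range(1, max_iterations + 1):
        # Stands in for schematic/layout/testbench generation plus
        # post-layout simulation of the generated instance.
        results = generate_and_simulate(params)
        if meets_specs(results, specs):
            return params, results, iteration
        # The design script picks new parameters from the simulated results.
        params = update_params(params, results, specs)
    raise RuntimeError("specs not met within max_iterations")

# Toy stand-ins (purely hypothetical): noise factor falls as 1/size.
def toy_simulate(p: Params) -> Results:
    return {"nf": 2.0 + 100.0 / p["size"]}

def toy_meets(r: Results, s: Results) -> bool:
    return r["nf"] <= s["nf"]

def toy_update(p: Params, r: Results, s: Results) -> Params:
    return {"size": p["size"] * 2.0}

final_params, final_results, iterations = run_design_loop(
    {"nf": 10.0}, {"size": 1.0}, toy_simulate, toy_update, toy_meets
)
print(final_params, final_results, iterations)
```

Each pass through the loop corresponds to one generation and simulation cycle, so a better `update_params` rule directly reduces the total design time.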

From Figure ?? we can see that four scripts should be written: the three generators (schematic, layout and testbench) and the design script. The design script incorporates the design methodology of the circuit. Based on the specs it can select the desired architecture, run preliminary device-level simulations (or get them from a previously generated database), run circuit-level simulations for sub-circuits and extract their important parameters in the desired design space and technology, run optimization algorithms on these parameters (without running additional simulations) and so on.

The flow in Figure ?? is simplified; in practice every generator has a different version of this flow. Our goal is to create a flow that minimizes the execution time until the specs are met, so that the overall design time is minimized. Thus we should come up with a design methodology that minimizes the number of generation and simulation cycles.

Another important feature of a BAG generator is that it can actually provide instances with better performance than manual designs. If many design iterations are needed, automation saves a considerable amount of time, since the designer doesn’t need to repeat the manual steps of changing the schematic and the layout and re-running simulations. So with an automated BAG generator, a better optimization result can be achieved in a shorter overall design time for new specifications and/or technology.

In addition, generators enable easy design re-use for different projects. Many building blocks are used for different applications, with different specs or technologies. Once a generator is built, it can be used for different projects or for different blocks in the same system, without manually re-designing it for each particular application.

\footnote{In addition to the generated schematic, layout and testbench, a behavioral model and various other files (lef/lib/verilog/spice/...) need to be generated to enable integration into a larger SoC.}
4.3 Receiver Design Methodology

4.3.1 DC power optimization

The first question to address is how to find the optimum DC power consumption for a given noise figure spec. The main contributors to the noise of the receiver are the front-end components: the $R_p$ matching resistor, the mixer switches and the baseband $G_m$. They are shown in the schematic in Figure 4.3. $R_p$ is fixed to be equal to the antenna resistance for impedance matching. The mixer size (and consequently the LO distribution driving it) and the $G_m$ size are unknown for now.

![Figure 4.3. The front-end components that contribute to the noise figure of the receiver.](image)

While $R_p$ is fixed, many combinations of the mixer and the $G_m$ size result in the same noise figure.

We can see the optimization process intuitively in the following way. A larger LO size results in a lower noise figure and larger power consumption. Similarly, a larger $G_m$ size also results in a lower noise figure and larger power consumption. So we could achieve the desired noise figure by using a large LO and a small $G_m$, or by using a small LO and a large $G_m$. From all the combinations that give the same overall noise figure, we would like to pick the mixer size and the $G_m$ size that minimize the overall power consumption.

To formulate the optimization process we will write the power consumption and the noise contribution of the mixer and the $G_m$ as functions of their size. The LO power consumption is proportional to the mixer size $M_{mixer}$ (number of fingers for given finger width and length):

$$P_{LO} = K_{p,LO} M_{mixer} \quad (4.1)$$

where $K_{p,LO}$ is a constant. Similarly the $G_m$ stage power consumption is proportional to its transconductance $G_m$:

$$P_{Gm} = K_{p,Gm} G_m \quad (4.2)$$

where $K_{p,Gm}$ is a constant. Thus the total power consumption is:

$$P_{total} = K_{p,LO} M_{mixer} + K_{p,Gm} G_m \quad (4.3)$$

The noise voltage generated by the LO is:

$$v_{n,LO}^2 = \frac{K_{n,LO}}{M_{mixer}} v_{n,s}^2 \quad (4.4)$$
where \( v_{n,s} \) is the source (antenna) noise voltage and \( K_{n,LO} \) is a constant. Similarly the noise voltage generated by the \( G_m \) is:

$$v_{n,Gm}^2 = \frac{K_{n,Gm}}{G_m} v_{n,s}^2 \quad (4.5)$$

where \( K_{n,Gm} \) is a constant. The noise voltages in equations ?? and ?? can be referred to any point in the circuit (input/output/other). It is just important that all three of \( v_{n,LO} \), \( v_{n,Gm} \) and \( v_{n,s} \) will be referred to the same point in the circuit. Then the receiver noise figure is:

$$NF = \frac{v_{n,s}^2 + v_{n,Rp}^2 + v_{n,LO}^2 + v_{n,Gm}^2}{v_{n,s}^2} = 2 + \frac{v_{n,LO}^2 + v_{n,Gm}^2}{v_{n,s}^2} \quad (4.6)$$

where \( v_{n,Rp} \) is the \( R_p \) resistor noise voltage which is equal to \( v_{n,s} \) since \( R_p \) is equal to the antenna resistance. Substituting the expressions from equations ?? and ?? we get:

$$NF = 2 + \frac{K_{n,LO}}{M_{mixer}} + \frac{K_{n,Gm}}{G_m} \quad (4.7)$$

From equation ?? we can write the required \( G_m \) for a given noise figure and mixer size:

$$G_m (M_{mixer}) = \frac{K_{n,Gm}}{NF - 2 - \frac{K_{n,LO}}{M_{mixer}}} \quad (4.8)$$

And from equations ?? and ?? we can derive the expression for the total power consumption as a function of the noise figure and the mixer size:

$$P_{total} (M_{mixer}) = K_{p,LO} M_{mixer} + K_{p,Gm} G_m (M_{mixer}) = K_{p,LO} M_{mixer} + \frac{K_{p,Gm} K_{n,Gm}}{NF - 2 - \frac{K_{n,LO}}{M_{mixer}}} \quad (4.9)$$

This result is illustrated in Figure ?? . The mixer power consumption increases linearly with the mixer size, while the \( G_m \) power consumption decreases with the mixer size (since the required \( G_m \) size decreases). For small mixer sizes the \( G_m \) required to meet the noise spec increases sharply, and becomes infinite at the minimum mixer size that can meet the noise spec, which from equation ?? satisfies \( NF = 2 + \frac{K_{n,LO}}{M_{mixer}} \); the power consumption therefore increases sharply as well. For large mixer sizes virtually all noise comes from the \( G_m \) and the total power consumption is dominated by the mixer. So an optimum mixer size exists that balances the mixer and \( G_m \) contributions and minimizes the power consumption.
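The shape of this curve can be checked with a brute-force sweep of the total power expression over the mixer size. The sketch below is only illustrative: it uses the rounded constants reported later in Table 4.1 (converted to mW and mS), an assumed 15 dB noise figure, and a hypothetical function name and sweep grid.

```python
import math

# Rounded constants as reported later in Table 4.1 (powers in mW, G_m in mS).
K_P_LO = 0.146   # mW per mixer finger (146 uW)
K_N_LO = 266.0   # unitless
K_P_GM = 0.7     # V^2, i.e. mW per mS
K_N_GM = 37.0    # mS

def total_power_mw(m_mixer: float, nf_lin: float) -> float:
    """Total LO + G_m power for a given mixer size and linear noise factor."""
    margin = nf_lin - 2.0 - K_N_LO / m_mixer
    if margin <= 0.0:
        return math.inf  # mixer too small: no finite G_m meets the spec
    return K_P_LO * m_mixer + K_P_GM * K_N_GM / margin

nf_lin = 10.0 ** (15.0 / 10.0)        # assumed 15 dB noise figure
m_min = K_N_LO / (nf_lin - 2.0)       # size at which the required G_m diverges

# Sweep from just above the minimum size up to about 6x the minimum size.
sizes = [m_min * (1.0 + 0.001 * k) for k in range(1, 5000)]
best_m = min(sizes, key=lambda m: total_power_mw(m, nf_lin))
print(round(m_min, 2), round(best_m, 2), round(total_power_mw(best_m, nf_lin), 2))
```

The sweep finds a single interior minimum between the asymptote at the minimum mixer size and the linearly growing LO power at large sizes.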

From equation ?? we can find the optimum power consumption by finding the minimum at \( \frac{dP_{total}}{dM_{mixer}} = 0 \). The result is:

$$P_{total, opt} = \frac{\Psi}{NF - 2} \quad (4.10)$$

where \( \Psi = \left( \sqrt{K_{p,LO} K_{n,LO}} + \sqrt{K_{p,Gm} K_{n,Gm}} \right)^2 \).
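The intermediate steps of this minimization are not spelled out; one way to fill them in is sketched below, writing $F = NF - 2$ as a shorthand (a symbol introduced here only for brevity). Setting the derivative of the total power with respect to the mixer size to zero gives:

$$\frac{dP_{total}}{dM_{mixer}} = K_{p,LO} - \frac{K_{p,Gm} K_{n,Gm} K_{n,LO}}{\left(F M_{mixer} - K_{n,LO}\right)^2} = 0$$

Taking the positive root (a finite $G_m$ requires $F M_{mixer} > K_{n,LO}$):

$$F M_{mixer} - K_{n,LO} = \sqrt{\frac{K_{n,LO}}{K_{p,LO}}} \sqrt{K_{p,Gm} K_{n,Gm}}$$

and, since $K_{n,LO} = \sqrt{\frac{K_{n,LO}}{K_{p,LO}}} \sqrt{K_{p,LO} K_{n,LO}}$,

$$M_{mixer} = \frac{1}{F} \sqrt{\frac{K_{n,LO}}{K_{p,LO}}} \left( \sqrt{K_{p,LO} K_{n,LO}} + \sqrt{K_{p,Gm} K_{n,Gm}} \right) = \sqrt{\frac{K_{n,LO}}{K_{p,LO}}} \frac{\sqrt{\Psi}}{F}$$

Substituting back, the LO term becomes $\sqrt{K_{p,LO} K_{n,LO}} \frac{\sqrt{\Psi}}{F}$ and the $G_m$ term becomes $\sqrt{K_{p,Gm} K_{n,Gm}} \frac{\sqrt{\Psi}}{F}$, so their sum is $\frac{\Psi}{F}$.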

We can also derive the optimum mixer and \( G_m \) sizes:

$$M_{mixer, opt} = \sqrt{\frac{K_{n,LO}}{K_{p,LO}}} \frac{\sqrt{\Psi}}{NF - 2} \quad (4.11)$$

$$G_{m, opt} = \sqrt{\frac{K_{n,Gm}}{K_{p,Gm}}} \frac{\sqrt{\Psi}}{NF - 2} \quad (4.12)$$
And the optimum mixer and $G_m$ power consumptions:

$$P_{\text{mixer, opt}} = \sqrt{K_{p,LO} K_{n,LO}} \frac{\sqrt{\Psi}}{NF - 2} \quad (4.13)$$

$$P_{G_m, \text{opt}} = \sqrt{K_{p,Gm} K_{n,Gm}} \frac{\sqrt{\Psi}}{NF - 2} \quad (4.14)$$

Our first observation is that the optimum total power consumption is determined by a single constant $\Psi$, which incorporates all circuit parameters: architecture, technology, device choices, supply voltages, etc. For a given technology we should find a topology that minimizes $\Psi$.

A plot of the optimum power consumption vs the desired noise figure is shown in Figure ???. For large noise figure values, the plot is close to an asymptotic $\frac{\Psi}{N F}$ curve, which makes intuitive sense - when the noise can be larger by 3dB, we can spend 2x less power. However, due to the shunt resistor, when the noise figure is smaller than 10dB, the implementation becomes less efficient. When going down from noise figure of 10dB to 4dB the optimum power increases by 16x instead of 4x. This is an expected result, since this architecture was chosen when we had large noise figures in mind.
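The 16x versus 4x comparison can be reproduced directly from the $\frac{\Psi}{NF - 2}$ dependence, since $\Psi$ cancels in any power ratio. The helper names below are hypothetical; the sketch only evaluates that ratio for a 6 dB step close to the $R_p$ floor and for a 6 dB step at large noise figures (18 dB and 24 dB are assumed example points).

```python
def nf_db_to_lin(nf_db: float) -> float:
    """Convert a noise figure in dB to a linear noise factor."""
    return 10.0 ** (nf_db / 10.0)

def optimum_power_ratio(nf_low_db: float, nf_high_db: float) -> float:
    """P_opt(nf_low) / P_opt(nf_high) for P_opt = Psi / (NF - 2); Psi cancels."""
    return (nf_db_to_lin(nf_high_db) - 2.0) / (nf_db_to_lin(nf_low_db) - 2.0)

ratio_near_floor = optimum_power_ratio(4.0, 10.0)   # 6 dB step close to NF = 2
ratio_large_nf = optimum_power_ratio(18.0, 24.0)    # 6 dB step at large NF
print(round(ratio_near_floor, 1), round(ratio_large_nf, 1))
```

The first ratio is close to 16 and the second close to 4, which is the loss of efficiency described above for small noise figures.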

Lastly, the breakdown of $\Psi$ can tell us if the mixer or the $G_m$ is the dominant source of power consumption. From equations ?? and ?? we can see that:

$$\frac{P_{\text{mixer, opt}}}{P_{G_m, \text{opt}}} = \frac{\sqrt{K_{p,LO} K_{n,LO}}}{\sqrt{K_{p,Gm} K_{n,Gm}}} \quad (4.15)$$

Recall that $\Psi = \left( \sqrt{K_{p,LO} K_{n,LO}} + \sqrt{K_{p,Gm} K_{n,Gm}} \right)^2$ and $P_{\text{total, opt}} = P_{\text{mixer, opt}} + P_{G_m, \text{opt}}$. So if $\sqrt{K_{p,LO} K_{n,LO}}$ is much larger than $\sqrt{K_{p,Gm} K_{n,Gm}}$, the mixer will dominate the total power consumption, and our effort should be to try to reduce the mixer power consumption for the same mixer noise contribution.
4.3.2 Methodology implementation

As we have seen in section ??, in order to design a receiver with an optimum power consumption for a given noise figure spec, we need to calculate 4 constants: $K_{p,LO}, K_{n,LO}, K_{p,Gm}, K_{n,Gm}$. The general procedure is shown in Figure ???. Two simulations should be performed: one for the LO and one for the $G_m$. The 4 constants are calculated from these simulations, and then used to calculate the optimum mixer and $G_m$ sizes.

A few notes about the specifications in the methodology:

- The LO frequency is an input spec for the LO distribution script, since it is directly related to the power consumption (the $K_{p,LO}$ constant). It also affects the noise contribution ($K_{n,LO}$), since at higher LO frequencies the pulses are effectively narrower (the edge slope is constant if we keep the same fanout) and the switch resistance is higher. The overlap between the LO pulses also increases with frequency, resulting in additional noise.

- Linearity is only an input to the $G_m$ script, since the $V^*$ of the $G_m$ stage directly affects it. The mixer linearity is determined by the switch threshold (which is selected to be the lowest available in the technology for speed reasons) and the supply voltage (which is selected to be the highest for speed reasons).

- The flicker corner is an input to the $G_m$ script only, since we are using differential RF and the flicker noise from the LO is rejected as common-mode after down-conversion [? ].

The LO distribution simulation testbench is shown in Figure ?? . The matching resistor and the full 8-phase LO distribution network are included in the testbench. Instead of the actual baseband $G_m$ recombination, an ideal noiseless Verilog-A recombination module is used to produce the equivalent base-band outputs.

![Figure 4.7. The LO testbench for extracting the parameters for the receiver optimization.](image)

From this simulation we get the power consumption and the noise figure for a specific mixer size. Then from equation ??:

$$K_{p,LO} = \frac{P_{LO}}{M_{mixer}} \quad (4.16)$$

and from equation ?? (without the $G_m$ noise contribution):

$$K_{n,LO} = (NF - 2) M_{mixer} \quad (4.17)$$

The $G_m$ simulation testbench is shown in Figure ?? . A single transconductor cell is simulated.

![Figure 4.8. The $G_m$ testbench for extracting the parameters for the receiver optimization. Large caps are used for transconductance simulation.](image)

From the simulation we get the DC current, transconductance $G_m$, and input-referred voltage noise. We can write the power consumption of all $G_m$ cells in the following form:

$$P_{Gm} = V_{dd} (I_{Gm,I} + I_{Gm,Q}) = V_{dd} \times 2 \times \left(2 + \sqrt{2}\right) \times \frac{V^* G_m}{2} = \left(2 + \sqrt{2}\right) V_{dd} V^* G_m \quad (4.18)$$
And from equation ?? we get:

\[ K_{p,Gm} = \left(2 + \sqrt{2}\right) V_{dd} V^* \quad (4.19) \]

The dependence on \( V^* \) makes intuitive sense. Larger \( V^* \) means larger power consumption for the same \( G_m \), which is exactly how we defined \( K_{p,Gm} \).

To calculate the \( G_m \) noise coefficient \( K_{n,Gm} \), we will look at the total \( G_m \) output current noise \( i_{n,out,Gm} \):

\[ i_{n,out,Gm} = v_{n,in,Gm} \times G_m \times \sqrt{1^2 + 1^2 + \left(\sqrt{2}\right)^2} \quad (4.20) \]

where \( v_{n,in,Gm} \) is the input voltage noise of a single \( G_m \) transconductor from the testbench of Figure ???. The output current noise due to the input (antenna) resistance \( i_{n,out,s} \) is:

\[ i_{n,out,s} = \sqrt{4KTR_a} \times A_{mixer} \times G_m \quad (4.21) \]

where \( A_{mixer} \) is the mixer gain (actually the loss) simulated from the LO testbench of Figure ???.

Then from equation ?? we can calculate the \( G_m \) noise coefficient (the noises are output-referred currents):

\[ K_{n,Gm} = \frac{i^2_{n,out,Gm}}{i^2_{n,out,s}} G_m = \frac{v^2_{n,in,Gm}}{KTR_a A^2_{mixer}} G_m \quad (4.22) \]

To summarize, we should perform only two simulations (shown in Figures ?? and ??) for arbitrary LO and \( G_m \) sizes. The 4 constants needed to calculate the optimum receiver size for the desired noise figure are given by equations ??, ??, ?? and ??:

| \( K_{p,LO} \) | \( \frac{P_{LO}}{M_{mixer}} \) |
| \( K_{n,LO} \) | \( (NF - 2) M_{mixer} \) |
| \( K_{p,Gm} \) | \( \left(2 + \sqrt{2}\right) V_{dd} V^* \) |
| \( K_{n,Gm} \) | \( \frac{v^2_{n,in,Gm}}{KTR_a A^2_{mixer}} G_m \) |

(4.23)
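The extraction summarized in (4.23) can be sketched in a few lines of Python. This is a minimal illustration, not the actual design script: the class and function names are hypothetical, $K$ in the noise expression is taken to be Boltzmann's constant, a temperature of 300 K is assumed, and the simulation numbers passed in at the bottom are made-up placeholders rather than results from this work.

```python
import math
from dataclasses import dataclass

BOLTZMANN_J_PER_K = 1.380649e-23

@dataclass
class LoSimResult:            # outputs of the LO testbench (hypothetical container)
    p_lo_w: float             # LO power consumption [W]
    noise_factor: float       # linear NF with the noiseless recombination
    m_mixer: float            # simulated mixer size [fingers]
    a_mixer: float            # mixer voltage gain (a loss, so < 1)

@dataclass
class GmSimResult:            # outputs of the G_m testbench (hypothetical container)
    gm_s: float               # transconductance of one cell [S]
    vn_in_v_per_rthz: float   # input-referred noise [V/sqrt(Hz)]
    v_star_v: float           # V* of the cell [V]

def extract_constants(lo, gm, vdd_v, r_ant_ohm=50.0, temp_k=300.0):
    """Compute the four constants of (4.23) and Psi from two simulations."""
    kt_r = BOLTZMANN_J_PER_K * temp_k * r_ant_ohm
    k_p_lo = lo.p_lo_w / lo.m_mixer                          # W per finger
    k_n_lo = (lo.noise_factor - 2.0) * lo.m_mixer            # unitless
    k_p_gm = (2.0 + math.sqrt(2.0)) * vdd_v * gm.v_star_v    # V^2
    k_n_gm = gm.vn_in_v_per_rthz ** 2 * gm.gm_s / (kt_r * lo.a_mixer ** 2)  # S
    psi = (math.sqrt(k_p_lo * k_n_lo) + math.sqrt(k_p_gm * k_n_gm)) ** 2    # W
    return {"k_p_lo": k_p_lo, "k_n_lo": k_n_lo,
            "k_p_gm": k_p_gm, "k_n_gm": k_n_gm, "psi_w": psi}

# Placeholder simulation outputs, for illustration only.
lo_sim = LoSimResult(p_lo_w=1.5e-3, noise_factor=28.0, m_mixer=10.0, a_mixer=0.5)
gm_sim = GmSimResult(gm_s=1.0e-3, vn_in_v_per_rthz=1.5e-9, v_star_v=0.2)
constants = extract_constants(lo_sim, gm_sim, vdd_v=0.9)
print({name: f"{value:.4g}" for name, value in constants.items()})
```

Because the input-referred noise power of a transconductor scales inversely with its $G_m$, the extracted $K_{n,Gm}$ should not depend on the size of the simulated cell, which is why an arbitrary size can be used.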
4.4 Chip Implementation

The LO generation diagram is shown in Figure ???. An off-chip LO signal is amplified and fed into two separate circuits. One is dividing the LO by 4 and creating 8 non-overlapping phases, and the other is dividing by 2 and creating 4 non-overlapping phases. Each one of these circuits has a reset signal that initiates a timing circuit that starts the divider flip-flops at the right state. The switches of the 8 phases and 4 phases mixers are connected in parallel. The parasitic cap of the switches is not limiting our bandwidth since the switches that we are using are small.

The 8-phase mixer is used for the lower band of the input range (1-2GHz), where harmonic rejection is needed. The 4-phase mixer is used for the higher band of the input range (2-6GHz), where harmonic rejection is not required and we can save power by using fewer phases. Also, the highest frequency at which 8 phases can be implemented is limited by the finite rise and fall times of the LO, so it’s not practical to implement 8 phases at 6GHz (where the input frequency would be 24GHz).

An AC-coupled inverter amplifier is used for an external LO input of 4-12GHz which drives the two timing circuits. This architecture can be used with an on-chip LO PLL instead of the external LO. Since the input LO is off-chip, the LO receiver is of fixed size, and not optimized to achieve the optimum power consumption for the required noise figure. The LO part included in the optimization is the timing circuit, the divider, and the logic driving the mixer switches.

A detailed diagram of the timing circuit and the divider-by-4 is shown in Figure ???. The differential divider consists of two divider state machines, one for each phase of the input LO. Each divider state machine starts at a fixed state 1000. The timing diagram of the negative clock phase LON outputs is shown at the bottom (the positive clock phase LOP diagram is similar). The reset signal starts a LORN clock that drives the dividing registers. The register outputs are NANDed with the input to create the output pulses that drive the mixers. Since the register output pulses are 2x wider than the input clock, the NAND output edges are defined by the input clock edges and not by the register output edges. This way the registers are not contributing noise to the output pulses. An inverter chain drives the mixer switches. The divider-by-2 has the same timing circuit and two registers instead of four in each divider state machine.
Figure 4.10. LO generation timing circuit and divider-by-4.
In section ?? we have analyzed the different options for AC-coupling the mixer. In the first version of the chip we chose to use base-band AC-coupling, since the LO overhead power for driving the bottom-plate of the AC-coupling caps was large. However, the area of the base-band AC-coupling caps was very large. Now with BAG generation, it is easier to generate different layouts and explore the different LO configurations. Also now we are using technology with smaller parasitics (16nm vs 65nm) which can lead to different optimum design.

The simplest way to implement an LO AC-coupling is shown in Figure ???. The LO passes through a high-pass filter formed by a series capacitor and a shunt resistor. The resistor has to be large enough to prevent the cap from discharging when the mixer gate voltage is high. For a small switch the mixer cap is a few fF, so the AC-coupling cap is on the order of tens of fF. Our lowest frequency of operation is 1GHz, so the RC time constant should be at least 5ns. For $C = 20 fF$ this leads to $R = 250k\Omega$, which is not practical from an area standpoint.
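The resistor value quoted above is simply the required time constant divided by the coupling capacitance. A small sketch of that arithmetic follows; the function name is hypothetical and the inputs are the 5 ns and 20 fF values from the text.

```python
def min_bias_resistance_ohm(tau_s: float, c_ac_f: float) -> float:
    """Smallest shunt resistance giving an RC time constant of at least tau_s."""
    return tau_s / c_ac_f

# 5 ns time constant with a 20 fF AC-coupling cap.
r_ohm = min_bias_resistance_ohm(tau_s=5e-9, c_ac_f=20e-15)
print(f"R >= {r_ohm / 1e3:.0f} kOhm")
```

Since the required resistance scales inversely with the coupling cap, the problem gets worse as the mixer switches (and therefore the caps) get smaller.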

![Figure 4.11](image1)

**Figure 4.11.** Simplest implementation of LO AC-coupling. Large resistor size makes this solution impractical for small mixer sizes.

A better LO AC-coupling implementation is shown in Figure ???. When the inverter output goes high to $V_{dd}$, the bias switch is turned off and the mixer gate is charged to $V_{dd} + V_{bias}$. When the inverter output goes low to 0, the bias switch is turned on and the mixer gate is shorted to $V_{bias}$.

![Figure 4.12](image2)

**Figure 4.12.** LO AC-coupling implementation with nmos switches (left) and pmos switches (right).

This technique can be implemented using nmos and pmos switches. For nmos we might have a reliability problem for the bias switches, which see a $V_{dd} + V_{bias}$ voltage between drain and gate. For pmos switches in the off state, we have $|V_{gs}| = V_{bias}$, so the device is not turned completely off and the leakage current might discharge the cap. In addition, we’d like to be able to turn the mixer switches on with a DC $V_{dd}$ voltage, to test the chip with a base-band input. The complete schematic, including the additional switches, is shown in Figure 4.13. In LO and OFF modes the switches are set in the same state, so the bias transistors are connected to the common-mode voltage, in a similar way to LO. However, in ON mode, we would like to connect one of the mixer switch gates to $V_{dd}$. In that case pmos switches have to be used, since with nmos we lose the threshold voltage when pulling up.

![Figure 4.13. LO AC-coupling implementation for LO (normal operation), OFF and ON (base-band input) modes.](image)

To minimize the leakage of the pmos devices when turned off, high-threshold devices are used. It was verified that even in the FF corner (lowest threshold voltage) the leakage is small enough to have a negligible impact on the waveform at the lowest frequency of 1GHz.

The $G_m$ stage transconductor unit cell is implemented as an inverter with resistive feedback for biasing. A large device length of 1µm was used to reduce the flicker noise corner frequency to around 400kHz. Common-mode rejection for the $G_m$ was implemented using the conceptual circuit shown in Figure ???. The output common-mode voltage is used as the gate bias voltage of the degeneration nmos and pmos devices. In order to simplify the biasing, we would like to have the same feedback voltage bias both the nmos and the pmos degeneration devices. Thus we would like the degeneration devices to be in saturation for a $|V_{gs}|$ voltage of approximately 0.5$V_{dd}$. High-threshold degeneration devices are used to allow this operation, while the inverter devices have low threshold to maximize their $V^*$ for better linearity. A headroom of 100mV on both the nmos and the pmos side was used for the degeneration, providing common-mode rejection of 30dB. The downside of this headroom is a reduced $V^*$ of the inverters, from 240mV down to 150mV. Thus for each unit cell, an option to turn the common-mode rejection circuit off is implemented, so that for the entire $G_m$ stage linearity can be traded off for common-mode rejection.
We will now show the simulation results of the methodology to achieve the optimum LO and Gm size described in section ?? . The following inputs were given to the optimization script:

- Supply voltage of 0.9V for the LO and the $G_m$.
- LO frequency of 1GHz, 8 phases mixer.
- Antenna impedance of 50Ω.
- Flicker corner of 500kHz. $G_m$ unit cell pre-characterization was performed, resulting in channel length of 1µm.

The simulation results of the 4 constants in the methodology diagram (Figure ??) are shown in Table ??:

<table>
<thead>
<tr>
<th>Constant</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>$K_{p,LO}$ ($\mu$W)</td>
<td>146</td>
</tr>
<tr>
<td>$K_{n,LO}$</td>
<td>266</td>
</tr>
<tr>
<td>$K_{p,LO}K_{n,LO}$ (mW)</td>
<td>39</td>
</tr>
<tr>
<td>$K_{p,Gm}$ ($V^2$)</td>
<td>0.7</td>
</tr>
<tr>
<td>$K_{n,Gm}$ (mS)</td>
<td>37</td>
</tr>
<tr>
<td>$K_{p,Gm}K_{n,Gm}$ (mW)</td>
<td>25</td>
</tr>
<tr>
<td>$\Psi$ (mW)</td>
<td>126</td>
</tr>
</tbody>
</table>

Table 4.1. Simulation results for minimum power consumption optimization.

Recall that from equation ?? $\Psi = (\sqrt{K_{p,LO}K_{n,LO}} + \sqrt{K_{p,Gm}K_{n,Gm}})^2$ and $P_{total, opt} = \frac{\Psi}{NF - 2}$. So after calculating $\Psi$ from the LO and $G_m$ characterization we are able to calculate the optimum sizes for a given noise figure spec. These sizes are shown in Table ?? . $M_{mixer}$ is the mixer number of fingers (nmos device of L=16nm and 2 fins) and $M_{Gm}$ is the $G_m$ number of fingers (inverter with L=1µm, 4 fins for nmos and 5 fins for pmos). We can see that when the noise figure is increased by 3dB, the LO size, $G_m$ size and total power consumption are decreased by approximately 2x. The ratio is close to 2x for large noise figures and gets larger as the noise figure decreases, as we have seen in Figure ??.

<table>
<thead>
<tr>
<th>NF (dB)</th>
<th>12</th>
<th>15</th>
<th>18</th>
<th>21</th>
<th>24</th>
</tr>
</thead>
<tbody>
<tr>
<td>$M_{mixer}$</td>
<td>34.6</td>
<td>16.2</td>
<td>7.8</td>
<td>3.9</td>
<td>1.9</td>
</tr>
<tr>
<td>$G_m$ (mS)</td>
<td>6.0</td>
<td>2.8</td>
<td>1.4</td>
<td>0.7</td>
<td>0.3</td>
</tr>
<tr>
<td>$M_{Gm}$</td>
<td>161.7</td>
<td>75.6</td>
<td>36.7</td>
<td>18.1</td>
<td>9.0</td>
</tr>
<tr>
<td>$P_{total}$ (mW)</td>
<td>9.1</td>
<td>4.2</td>
<td>2.1</td>
<td>1.0</td>
<td>0.5</td>
</tr>
</tbody>
</table>

Table 4.2. Optimum LO and $G_m$ sizes vs noise figure spec.
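The entries of Table 4.2 can be approximately reproduced from the constants of Table 4.1 using the closed-form optimum expressions of section 4.3.1. The sketch below uses hypothetical function and variable names and the rounded constants as published, so its outputs agree with the table only to within a few percent; the conversion from $G_m$ to the finger count $M_{Gm}$ is not included, since it depends on a per-finger transconductance that is not listed in the tables.

```python
import math

# Rounded constants from Table 4.1, converted to mW and mS.
K_P_LO_MW = 0.146    # 146 uW per mixer finger
K_N_LO = 266.0       # unitless
K_P_GM_V2 = 0.7      # V^2, i.e. mW per mS
K_N_GM_MS = 37.0     # mS

def optimum_sizing(nf_db: float) -> dict:
    """Closed-form optimum mixer size, G_m and total power for an NF in dB."""
    f = 10.0 ** (nf_db / 10.0) - 2.0              # NF - 2 in linear units
    if f <= 0.0:
        raise ValueError("NF must exceed 2 (about 3 dB) because of R_p")
    root_lo = math.sqrt(K_P_LO_MW * K_N_LO)
    root_gm = math.sqrt(K_P_GM_V2 * K_N_GM_MS)
    root_psi = root_lo + root_gm                  # sqrt(Psi), in sqrt(mW)
    return {
        "m_mixer": math.sqrt(K_N_LO / K_P_LO_MW) * root_psi / f,
        "gm_ms": math.sqrt(K_N_GM_MS / K_P_GM_V2) * root_psi / f,
        "p_total_mw": root_psi ** 2 / f,
    }

for nf_db in (12, 15, 18, 21, 24):
    s = optimum_sizing(nf_db)
    print(nf_db, round(s["m_mixer"], 1), round(s["gm_ms"], 2), round(s["p_total_mw"], 2))
```

Each returned pair of sizes satisfies the noise constraint exactly for the constants used, so any small mismatch with the table comes from the rounding of the published constants rather than from the formulas.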

A noise figure of 15dB was chosen for this chip. The choice was practical from the total area restriction, since we have 4 transceivers in our chip. The optimum power consumption breakdown for NF=15dB is shown in Figure ???. From the results in Table ?? we see that $K_{p,LO}K_{n,LO}$ and $K_{p,Gm}K_{n,Gm}$ are of the same order of magnitude so the overall power contributions of the LO and the $G_m$ are similar. The LO power is approximately equally divided between the timing circuit, the divider and the mixers logic and drivers.

It is insightful to compare the high-level parameters of the current chip and the previous chip. The comparison is shown in Table ???. In the first generation FADER1 chip (described in chapter ??) we used base-band AC-coupling, which made the $G_m$ noise constant $K_{n,Gm}$ significantly larger. The $G_m$ power constant $K_{p,Gm}$ is smaller for the new chip since the supply voltage is smaller (0.9V vs 1.2V). The LO noise and power constants cannot be easily compared since they are defined with respect to the mixer number of fingers, and the performance of a single-finger switch is very different in different technologies. For the LO part we can compare the overall LO constant $K_{p,LO}K_{n,LO}$. We can see that despite the fact that the new chip includes more functionality, as it includes the LO (timing circuit and divider) and uses LO AC-coupling, the overall LO constant is still smaller. There are two main reasons for this improvement. First, we now use a more advanced technology with smaller devices, smaller parasitics, and a lower supply voltage. Second, the design process with BAG generation enabled us to generate designs with optimized sizing (such as number of stages and fanout) due to layout generation automation.

<table>
<thead>
<tr>
<th></th>
<th>FADER1</th>
<th>FADER2</th>
</tr>
</thead>
<tbody>
<tr>
<td>Technology</td>
<td>65nm</td>
<td>16nm</td>
</tr>
<tr>
<td>AC-coupling</td>
<td>BB</td>
<td>LO</td>
</tr>
<tr>
<td>LO divider included</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>$K_{p,LO}$ ($\mu$W)</td>
<td>438</td>
<td>146</td>
</tr>
<tr>
<td>$K_{n,LO}$</td>
<td>101</td>
<td>266</td>
</tr>
<tr>
<td>$K_{p,LO}K_{n,LO}$ (mW)</td>
<td>44</td>
<td>39</td>
</tr>
<tr>
<td>$K_{p,Gm}$ ($V^2$)</td>
<td>1.0</td>
<td>0.7</td>
</tr>
<tr>
<td>$K_{n,Gm}$ (mS)</td>
<td>65</td>
<td>37</td>
</tr>
<tr>
<td>$K_{p,Gm}K_{n,Gm}$ (mW)</td>
<td>64</td>
<td>25</td>
</tr>
<tr>
<td>$\Psi$ (mW)</td>
<td>214</td>
<td>126</td>
</tr>
</tbody>
</table>

Table 4.3. Comparison of the first generation FADER1 chip (chapter ??) and the current FADER2 chip.

The layout snapshot of the RF part of the chip is shown in Figure ???. The RF part of the chip contains four transceivers that share the same LO receiver circuit. Each transceiver consists of the LO, base-band caps, base-band TX, base-band RX and ADC. The receiver and the transmitter share the LO and the base-band caps.

Figure 4.16. Layout snapshot of the RF part of the chip. Four transceivers share the same LO receiver circuit. Each receiver and the transmitter are sharing the LO and the base-band caps.
4.5 Measurement Results

The die micrograph is shown in Figure ???. The 4 transceivers are located in the center of the chip. Due to chip assembly constraints, the IO pads can only be located at the chip boundary. Transmission lines connect the 4 RF signals and the LO signals to the central part of the chip.

![Die micrograph](image)

Figure 4.17. Die micrograph.

In this chip we have an analog test output (before the ADC) only for a single receiver (TRX3 in Figure ???). So the analog measurements shown here are for this receiver only.

The receiver gain is shown in Figure 4.18. For the lower band (1-2GHz), 8 LO phases were used to provide harmonic rejection. For the upper band (3-6GHz), 4 LO phases were used since harmonic rejection is not needed.

![Receiver gain](image)

Figure 4.18. Receiver gain.
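The choice between 8 and 4 LO phases can be illustrated with a small sketch of which LO harmonics survive recombination. It assumes ideal sinusoidal recombination weights $\cos(2\pi k/N)$ across $N$ equally spaced phases, which is a textbook model and not necessarily the exact weighting used on this chip; the pulse-shape envelope is omitted and the function names are illustrative.

```python
import cmath
import math


def harmonic_weight(n_phases: int, harmonic: int) -> float:
    """Magnitude of sum_k cos(2*pi*k/N) * exp(-j*2*pi*m*k/N) for harmonic m.

    ASSUMPTION: ideal sinusoidal weights over N equally spaced LO phases.
    """
    total = sum(
        math.cos(2 * math.pi * k / n_phases)
        * cmath.exp(-2j * math.pi * harmonic * k / n_phases)
        for k in range(n_phases)
    )
    return abs(total)


def unrejected_harmonics(n_phases: int, max_harmonic: int = 9, tol: float = 1e-9):
    """LO harmonics (up to max_harmonic) that are not nulled by the weighting."""
    return [m for m in range(1, max_harmonic + 1)
            if harmonic_weight(n_phases, m) > tol]


if __name__ == "__main__":
    print("8 phases:", unrejected_harmonics(8))
    print("4 phases:", unrejected_harmonics(4))
```

In this model, 8 phases null the 3rd and 5th harmonics and leave the 7th and 9th as the first unrejected ones, while 4 phases null only the even harmonics.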
The receiver noise figure and power consumption are shown in Figure 4.19. Note that the power drops when the LO frequency is increased from 2GHz to 3GHz, since only 4 LO phases are used in the upper band. The noise figure is close to the 15dB design target. It increases with the LO frequency because the rise time is fixed, so the LO pulses become narrower and the switch resistance increases. The measured DC power consumption of 4mW at 1GHz is close to the simulated 4.2mW (Table ??).

Comparing this chip with the chip in Chapter ??, for an LO frequency of 1GHz and a noise figure of 15dB the power consumption drops from 7.2mW (Figure ??, excluding the ADC power consumption) to 4mW. This is consistent with the optimization result in Table 4.3, where the constant $\Psi$ drops from 214mW to 126mW (recall that $P_{total,\text{opt}} = \frac{\Psi}{NF-2}$).
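The relation $P_{total,\text{opt}} = \frac{\Psi}{NF-2}$ can be evaluated directly for both chips. The sketch below assumes that $NF$ in this expression is the linear noise factor (15dB corresponds to about 31.6), since that is the reading under which the quoted powers are reproduced; the function names are illustrative.

```python
def db_to_linear(value_db: float) -> float:
    """Convert a power ratio from dB to linear units."""
    return 10.0 ** (value_db / 10.0)


def p_total_opt_mw(psi_mw: float, nf_db: float) -> float:
    """Optimum total power Psi / (NF - 2).

    ASSUMPTION: NF is used as a linear noise factor, not in dB.
    """
    return psi_mw / (db_to_linear(nf_db) - 2.0)


if __name__ == "__main__":
    nf_db = 15.0
    for chip, psi_mw in (("FADER1", 214.0), ("FADER2", 126.0)):
        print(f"{chip}: P_opt = {p_total_opt_mw(psi_mw, nf_db):.2f} mW at NF = {nf_db} dB")
```

With these inputs the expression gives about 7.2mW for FADER1 and about 4.25mW for FADER2, in line with the 7.2mW measured for the previous chip and the 4.2mW simulated for the current one.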

![Receiver noise figure and power consumption](image)

Figure 4.19. Receiver noise figure and power consumption.

In the previous chip we used a biquad TIA, so together with the pole after the mixer switches a 3rd-order filtering response was achieved. The current chip has a simple TIA, so it has only a first-order filtering response. Both receivers have the same gain of 50dB, so the current chip is less resilient to nearby blockers: the front-end linearity is similar, but the gain seen by nearby blockers is higher, so we are limited by the linearity of the output stage.

The receiver resilience to inband harmonic blockers is shown in Figure 4.20. An LO frequency of 1GHz was used, with a signal at 1MHz offset, at 999MHz. The 3rd harmonic blocker was at 0.9MHz offset, at 2999.1MHz. We can see that the harmonic tolerance is similar to that of the previous chip. The harmonic blocker $P1dB$ is -5.1dBm and the NF degradation is <6dB for input harmonic powers up to -5dBm.
Figure 4.20. Receiver tolerance to inband 3rd harmonic blocker. LO frequency of 1GHz and signal at 1.001GHz.
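To see why this blocker is called inband, the sketch below computes where the signal and the 3rd harmonic blocker land at baseband for a 1GHz LO. It assumes simple down-conversion by the m-th LO harmonic, with the signal 1MHz and the blocker 0.9MHz away from the fundamental and the 3rd harmonic respectively; the function name is illustrative.

```python
def baseband_offset_mhz(f_rf_mhz: float, f_lo_mhz: float, harmonic: int = 1) -> float:
    """Baseband frequency of an RF tone down-converted by the m-th LO harmonic."""
    return abs(f_rf_mhz - harmonic * f_lo_mhz)


if __name__ == "__main__":
    f_lo_mhz = 1000.0
    signal_mhz = f_lo_mhz - 1.0         # wanted signal, 1 MHz from the LO
    blocker_mhz = 3 * f_lo_mhz - 0.9    # blocker, 0.9 MHz from the 3rd LO harmonic

    sig_bb = baseband_offset_mhz(signal_mhz, f_lo_mhz, harmonic=1)
    blk_bb = baseband_offset_mhz(blocker_mhz, f_lo_mhz, harmonic=3)
    print(f"signal lands at {sig_bb:.1f} MHz, blocker lands at {blk_bb:.1f} MHz")
```

Both tones fall within about 1MHz of DC, so the baseband filter cannot separate them and the blocker has to be handled by harmonic rejection.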
Chapter 5

Conclusion

5.1 Thesis Summary

This thesis explores RF receiver design for large antenna array systems, with Massive MIMO as the primary application. First, a system analysis is performed to derive the receiver specifications, with an emphasis on the differences between a large antenna array receiver and a single-antenna receiver. The resulting specifications of low power, high (relaxed) noise figure and good linearity do not fall into the existing categories of “high performance” and “low power” single-antenna receivers.

A new receiver architecture is proposed, in which a mixer-first receiver with a linear baseband chain and early harmonic recombination achieves nearby and harmonic blocker tolerance of up to −10dBm. The small size of the mixer switches and the baseband Gm allows us to save power while maintaining good linearity. The sizes of the LO chain and the baseband Gm are tunable, so the same receiver can be used in applications with different array sizes and array-level noise specifications.

In the first chip prototype, a tunable element noise figure enables a sub-2.5 dB array-level noise figure with 16 to 64 antennas and <368 mW total power consumption, over a frequency range of 0.25-1.7GHz. Nearby and harmonic blockers of up to -10dBm are tolerated.

In the second chip, the Berkeley Analog Generator (BAG) is used to build a circuit generator for the receiver. The proposed generator design methodology produces instances with optimum power consumption for a given noise figure specification. An instance of this generator is implemented on the chip, showing a power consumption improvement of 50% and a wider frequency range of 1-6GHz compared to the first chip.
5.2 Future Work

The design methodology used in the BAG generator can be improved in several ways.

- The current generator implements a design instance for a single noise specification. Embedding scalability into the design, using the approach from the first chip, would yield a more complete receiver generator that covers a range of noise figure specifications. The extra power consumption needed to implement the scalability can be explored and minimized.

- The LO distribution is currently optimized for a single receiver. The chip has 4 identical receivers, and the extra power needed to drive all 4 is non-negligible. A more compact layout, with all 4 mixers closer to each other, can help minimize this extra power. Including this extra power in the joint power optimization with the baseband Gm would result in a better overall power consumption.

- Our power consumption optimization for a desired noise figure does not take the TIA power consumption into account. The TIA power is set by the settling requirements of the ADC sampling capacitors. A more complete system-level design could balance the ADC resolution/speed against the receiver noise/linearity requirements to achieve a lower power consumption for the receiver+ADC.

In addition to the generator methodology improvements, the beamforming type can be explored further at the system level. In our system we assumed that the digital power consumption would not be very large, due to the relatively low data rates. If this assumption does not hold, hybrid beamforming can be used to reduce the digital power, and possibly the RF receiver power as well due to more relaxed linearity requirements.


