Millimeter-Wave Receiver and Package Design Close to the Device Activity Limits

Nima Baniasadi

Electrical Engineering and Computer Sciences
University of California, Berkeley

Technical Report No. UCB/EECS-2023-40
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-40.html

May 1, 2023
I want to express my profound gratitude to my research advisor, Prof. Ali M. Niknejad. He provided me with excellent academic guidance and supported me personally, and taught me to be patient when facing various problems. I also thank Prof. Mehrdad Sharif Bakhtiar and Prof. Ali Fotowat Ahmady, my undergraduate professors at Sharif University of Technology, from whom I learned a lot.

This journey would not have been possible without the support of my family and friends. I am grateful to my parents and brother; they have always helped me wherever and whenever I needed them the most. I would also like to thank my friends who provided me with a warm and supportive environment during the years of the pandemic and political crisis.
Millimeter-Wave Receiver and Package Design Close to the Device Activity Limits

by

Nima Baniasadi

A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy

in
Engineering - Electrical Engineering and Computer Sciences

in the
Graduate Division

of the
University of California, Berkeley

Committee in charge:
Professor Ali M. Niknejad, Chair
Professor Elad Alon
Professor William L. Holzapfel

Fall 2021
Millimeter-Wave Receiver and Package Design Close to the Device Activity Limits

Copyright 2021
by
Nima Baniasadi
Abstract

Millimeter-Wave Receiver and Package Design Close to the Device Activity Limits

by

Nima Baniasadi

Doctor of Philosophy in Engineering - Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Ali M. Niknejad, Chair

For several decades, rapid improvements in the semiconductor industry, particularly the scaling of CMOS processes, have enabled high-speed wireless communications. However, the scaling of CMOS processes seems to be paying off less and less. Moreover, as the carrier frequency increases, the limited power of the CMOS chip can be quickly dissipated by passive elements or at the edges of the chip. The next generations of high-speed radios will require co-design and co-optimization of the chip and package to ensure that the highest data rates are achieved.

This work addresses the design of a packaged wideband millimeter-wave radio. The fundamental limitations of the CMOS process for millimeter-wave applications are examined. Noise measure theory is used to design low-noise amplifiers near the device activity limits. New techniques for minimizing the insertion loss of passive matching networks are proposed. The challenges of a package design are investigated, and an optimized transition structure is proposed. Finally, a 140GHz wideband receiver operating at half the transit frequency of the technology is implemented.
To my parents
for their endless love and support

and to my brother
who made me laugh countless times.
Contents

1 Introduction
  1.1 Connectivity
  1.2 Capacity
  1.3 Silicon limits
    Transit Frequency \( f_t \)
    Analog Efficiency \( f_t \frac{g_m}{I_d} \)
    Speed-Power Trade-off
    Termination Levels vs. Frequency
    Large Signal Power Gain vs. Frequency
    Detailed Model with Extrinsic Parasitics
  1.4 Challenges

2 Millimeter-wave LNA Design
  2.1 Introduction
  2.2 Derivation of the Noise Measure
  2.3 Examples
    CMOS Noise Measure
    Multiple Active Devices
  2.4 Design of Low-Noise CS Amplifiers with Single Feedback Component
  2.5 Design of Low-Noise CS Amplifiers with General Peripheral Network
  2.6 Optimal Bias Condition
  2.7 Simulation Flow of Minimum Noise Measure

3 140GHz Receiver Design
  3.1 Low-Loss LC Matching Networks
  3.2 Transformers
  3.3 High Quality-Factor Inductors
  3.4 Low Noise Active Balun
  3.5 Interstage Amplifiers
  3.6 I/Q Splitter
  3.7 Mixer Design
    Current Mode Mixer
    Voltage Mode Mixer
  3.8 Baseband Amplifier
  3.9 Full Receiver Performance

4 Chip-to-Package Transition
  4.1 Packaging Challenges at High Frequencies
  4.2 Transition Structures
  4.3 Limitation of the Stripline Structure
  4.4 Final Pad Structure

5 Package-to-Package Transition
  5.1 Introduction
  5.2 Design Principles
  5.3 Design Considerations
  5.4 Prototype Design and Measurement Results
    Interposer Technology
    Channel Design Trade-offs
    Antenna Design with Distributed Matching Network
    Prototype Performance
  5.5 Conclusion

6 Conclusion
  6.1 Thesis Summary
  6.2 Future Directions

Bibliography
### List of Figures

1.1 Maslow's hierarchy of human needs with an additional new layer [1]

1.2 Mobile subscriptions by technology (billions)

1.3 Backhaul capacity per distributed site

1.4 Mobile backhaul technology trade-Offs

1.5 Global backhaul media distribution

1.6 Cost of spectrum vs. cost of equipment over time

1.7 Capacity vs. carrier frequency

1.8 Safe radiation levels for persons in unrestricted environments [2]

1.9 An 8 × 8 2-D transmitter phased array with 1-D steering capability.

1.10 An 8 × 8 2-D receiver phased array

1.11 1-D Phased Array

1.12 Channel capacity vs. carrier frequency for a link based on Table 1.2

1.13 Spectral efficiency vs. carrier frequency for a link based on Table 1.2

1.14 The output power of published power amplifiers as a function of carrier frequency [3]

1.15 A simple model of planar CMOS transistor.

1.16 Estimating the mobility of the device from simulations for different current densities (A/μm)

1.17 Parasitic elements of a single-finger transistor.

1.18 A simple planar transistor in layout view and its 3D representation

1.19 Parasitic elements of a transistor.

1.20 Simplified transistor model

1.21 Packaged millimeter-wave radios

2.1 Chain of identical noisy amplifiers

2.2 Two scenarios for cascading non-identical amplifiers

2.3 Noise figure vs. power gain

2.4 Y-parameter model of the circuit

2.5 Thevenin equivalent circuit

2.6 CMOS transistor parasitic model.

2.7 Noise measure vs. frequency

2.8 Using a feedback component to improve the input reflection with minimum noise measure

2.9 Power gain vs. feedback admittance at 190GHz.
3.37 Performance of the amplifier driving the splitter with the insertion loss of the splitter

3.38 Bias generation circuit for mixers

3.39 Current mode mixer

3.40 Current efficiency

3.41 Comparison of peak current conversion efficiency and corresponding input quality factor

3.42 Voltage mode mixer

3.43 Comparison of active and passive mixers in voltage mode with different peak-to-peak differential LO swings

3.44 Equivalent Thevenin source used in the mixer model

3.45 Decomposition of the impedance seen by the equivalent source into an all-pass and a low-pass section

3.46 Comparison of the input resistance of the passive mixers for in-band and out-of-band tones. The dashed portion of each line shows the region where the gain falls below $\frac{2}{\pi}$.

3.47 Performance of the mixer and its preceding gain stage

3.48 Mixer implementation

3.49 Wideband Cherry-Hooper amplifier [35]

3.50 Simplified model of the Cherry-Hooper topology

3.51 Comparison between the Cherry-Hooper topology and first-order amplifiers

3.52 Simplified model of an amplifier with active inductor

3.53 Comparison of the voltage gain in an amplifier with active inductor with its first-order and Butterworth counterparts

3.54 PMOS and NMOS implementation of the active inductor

3.55 Final implementation of the amplifier with active inductor

3.56 Performance of cascaded active inductor stages

3.57 Baseband chain

3.58 The layout of the baseband amplifier

3.59 Using an artificial T-line to increase the bandwidth

3.60 Baseband Chain Performance

3.61 The layout of the baseband amplifier

3.62 140GHz receiver taped out in 28nm CMOS technology

3.63 Power consumption of the receiver

3.64 Performance of the receiver chain

3.65 Translation gain vs. input power

4.1 Conventional microstrip GSG pads

4.2 Modeling the microstrip transition with transmission lines

4.3 $G_{\text{max}}$ versus frequency for different distances

4.4 Notch frequency of $G_{\text{max}}$ in the simulation versus the loop antenna model

4.5 Microstrip with front shield
List of Tables

1.1 Microwave and fiber consideration [4]
1.2 A numerical example of an optimal over-the-air communication link
1.3 Model values.

2.1 Parameter values used for calculations

3.1 Summary of different matching network design methodologies
3.2 Comparison of the baseband amplifier with earlier work
3.3 Comparison of the receiver with the state-of-the-art

4.1 Performance of the final design
4.2 Summary of performance and comparison with the state-of-the-art
Chapter 1

Introduction

1.1 Connectivity

The internet has become an essential part of daily life over the last three decades. The need for a stable Internet connection is so high that it can be considered an additional layer to Maslow’s hierarchy of human needs (Fig. 1.1). For example, during the COVID-19 crisis, the internet played a crucial role in keeping people connected despite physical isolation.

Although the speed of the internet has increased dramatically, user demand has also increased exponentially, and it continues to grow. For example, 5G subscriptions will reach 4.4 billion by 2027 (Fig. 1.2). While Internet Service Providers (ISPs) are upgrading their infrastructure to meet users’ needs, they need to keep costs low in such a competitive environment where operators demand 99.999% availability (about 5 minutes of downtime per year) [5].

Subsequent generations of mobile communications have smaller cells for higher spectral efficiency and lower path loss in free space. Fig. 1.3 shows the required backhaul capacity, which is in the tens of Gbps.

Operators can choose among different technologies. For a detailed comparison of fiber and wireless technologies and their trade-offs, see Table 1.1 and Fig. 1.4.

Fiber optic cables are often prohibitively expensive to deploy, and they are still prone to breaks and lengthy disruptions. However, they will be essential for core and inner-city aggregation sites with extremely high capacity requirements. Microwave and millimeter-wave bands are suitable for heterogeneous network backhaul because they allow outdoor cell sites and network aggregation of traffic from multiple base stations, which can then be handed off to mobile switching centers and ultimately to the core network [6]. Note that tower placement is not always required in urban areas (antennas can be mounted on rooftops, for example). Therefore, wireless will be used mainly in urban and densely populated areas for last-mile access. It is predicted that between 2021 and 2027, more than 60% of cellular base stations will be connected via microwave and millimeter-wave links [7].

Wireless communication in licensed frequency bands increases ISP costs.

Figure 1.3: Backhaul capacity per distributed site

<table>
<thead>
<tr>
<th></th>
<th>Wireless</th>
<th>Fiber</th>
</tr>
</thead>
<tbody>
<tr>
<td>Capacity</td>
<td>Up to several Gbps</td>
<td>Unlimited</td>
</tr>
<tr>
<td>Regulation</td>
<td>Requires spectrum</td>
<td>Requires rights of way</td>
</tr>
<tr>
<td>Deployment Time</td>
<td>Fast deployment time</td>
<td>Increases linearly with distance</td>
</tr>
<tr>
<td>Deployment Cost</td>
<td>Increases partially with distance</td>
<td>Increases linearly with distance</td>
</tr>
<tr>
<td>Terrain</td>
<td>Requires line-of-sight between two end-points</td>
<td>Costly when trenching in difficult terrain (if accessible)</td>
</tr>
<tr>
<td>Climate</td>
<td>Prone to weather conditions</td>
<td>Normally, not affected</td>
</tr>
</tbody>
</table>

Table 1.1: Microwave and fiber consideration [4]

As the carrier frequency increases, spectrum costs decrease (Fig. 1.6). On the other hand, the cost of equipment increases as the carrier frequency increases. However, new semiconductor technologies and novel circuit designs reduce these costs. Therefore, using higher carrier frequencies is cost-beneficial to the ISP and subsequently to the end-user.

1.2 Capacity

In this section, the relationship between the link’s capacity and the carrier frequency is investigated. Based on Shannon’s theorem, the channel capacity $C$ is given by

$$C = \frac{1}{\ln 2} B \ln (1 + SNR)$$ (1.1)
### Mobile Backhaul Technology Trade-Offs

Wireless vs Fixed vs Satellite

<table>
<thead>
<tr>
<th>Segment</th>
<th>Microwave (7–40 GHz)</th>
<th>V-Band (60 GHz)</th>
<th>E-Band (70/80 GHz)</th>
<th>Fiber-optic</th>
<th>Copper (Bonded)</th>
<th>Satellite</th>
</tr>
</thead>
<tbody>
<tr>
<td>Future-Proof Available Bandwidth</td>
<td>Medium</td>
<td>High</td>
<td>High</td>
<td>High</td>
<td>Very Low</td>
<td>Low</td>
</tr>
<tr>
<td>Deployment Cost</td>
<td>Low</td>
<td>Low</td>
<td>Low</td>
<td>Medium</td>
<td>Medium/High</td>
<td>High</td>
</tr>
<tr>
<td>Suitability for Heterogeneous Networks</td>
<td>Outdoor Cell-Site/Access Network</td>
<td>Outdoor Cell-Site/Access Network</td>
<td>Outdoor Cell-Site/Access Network</td>
<td>Outdoor Cell-Site/Access Network</td>
<td>Indoor Access Network</td>
<td>Rural only</td>
</tr>
<tr>
<td>Support for Mesh/Ring Topology</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes where available</td>
<td>Indoors</td>
<td>Yes</td>
</tr>
<tr>
<td>Interference Immunity</td>
<td>Medium</td>
<td>High</td>
<td>High</td>
<td>Very High</td>
<td>Very High</td>
<td>Medium</td>
</tr>
<tr>
<td>Range (Km)</td>
<td>5–30, ++</td>
<td>1–3</td>
<td>&lt;80</td>
<td>&lt;15</td>
<td>Unlimited</td>
<td></td>
</tr>
<tr>
<td>Time to Deploy</td>
<td>Weeks</td>
<td>Days</td>
<td>Days</td>
<td>Months</td>
<td>Months</td>
<td>Months</td>
</tr>
<tr>
<td>License Required</td>
<td>Yes</td>
<td>Light License/Unlicensed</td>
<td>Licensed/Light License</td>
<td>No</td>
<td>No</td>
<td>No</td>
</tr>
</tbody>
</table>

Note: Shading indicates preferred choice for 5G mobile backhaul.

Source: ABI Research

---

Figure 1.4: Mobile backhaul technology trade-Offs

Figure 1.5: Global backhaul media distribution

Figure 1.6: Cost of spectrum vs. cost of equipment over time

Assuming a white profile for thermal noise, \( SNR = \frac{P_r}{\kappa B} \) and the capacity is

\[
C = \frac{1}{\ln 2} B \ln \left( 1 + \frac{P_r}{\kappa B} \right) \tag{1.2}
\]

where \( P_r \) is the received power and \( \kappa \) is the background noise level. It is not immediately clear whether increasing the total bandwidth increases the channel capacity, since a wider bandwidth also admits more thermal noise. Further investigation of this relationship,

\[
\frac{\partial C}{\partial B} = \frac{1}{\ln 2} \left[ \ln \left( 1 + \frac{P_r}{\kappa B} \right) - \frac{1}{1 + \frac{\kappa B}{P_r}} \right] \tag{1.3}
\]

shows that increasing the absolute bandwidth always increases the capacity, since \( \ln(1 + x) > \frac{x}{1 + x} \) for every \( x > 0 \) and therefore \( \frac{\partial C}{\partial B} > 0 \). Since \( B \) represents the absolute bandwidth, the same capacity can be obtained for different carrier frequencies \( (f_c) \). Defining the fractional bandwidth as

\[
B_F = \frac{B}{f_c}
\]  

(1.4)

most radio systems support a limited fractional bandwidth. There are several reasons for this, to name a few:

- Despite the existence of ultra-wideband antennas, most high-efficiency antennas have a relatively limited fractional bandwidth.

- High-frequency circuits tend to use resonators to compensate for the parasitic capacitance of the various elements. The Bode-Fano criterion [8] places an upper limit on the achievable bandwidth when parasitic reactive elements are present.
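The monotonic growth of capacity with absolute bandwidth (Eqs. 1.2 and 1.3) can be checked numerically. The following is a minimal sketch; the received power and noise density are illustrative assumptions and are not values taken from this work. It also prints the well-known wideband limit \( \frac{P_r}{\kappa \ln 2} \) that the capacity approaches but never exceeds.

```python
import math


def capacity_bps(bandwidth_hz, p_r_watt, kappa_w_per_hz):
    """Shannon capacity of Eq. 1.2 in bit/s for a white noise floor."""
    snr = p_r_watt / (kappa_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


# Assumed, illustrative values (hypothetical, not from the text).
P_R = 1e-9        # received power in W
KAPPA = 4.0e-21   # noise density in W/Hz, roughly kT near room temperature

bandwidths = [1e9, 1e10, 1e11, 1e12, 1e13]
capacities = [capacity_bps(b, P_R, KAPPA) for b in bandwidths]

# Limit of Eq. 1.2 as B grows without bound.
limit = P_R / (KAPPA * math.log(2))

for b, c in zip(bandwidths, capacities):
    print(f"B = {b:8.1e} Hz  ->  C = {c / 1e9:8.2f} Gbit/s")
print(f"wideband limit: {limit / 1e9:.2f} Gbit/s")
```

Each tenfold increase in bandwidth raises the capacity, but with diminishing returns once the SNR drops below unity.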

Therefore, it seems reasonable to use higher carrier frequencies to achieve higher capacity at a given fractional bandwidth (Fig. 1.7). However, as explained in the next section, it should be kept in mind that power generation at higher frequencies is less efficient, and the generated power is attenuated when propagating through the air. Therefore, THz radio systems use phased arrays to generate higher power. Note that phased arrays can increase the directivity of the radiation compared to other power combining techniques, resulting in a higher EIRP\(^1\).

In the non-ionizing frequency range of the electromagnetic spectrum, safety protocols [2] limit the output power of each radiator to avoid electrostimulation of nerve and muscle cells (mainly below 1MHz) or excessive tissue heating. Based on Fig. 1.8, the power density \( P_D \), defined as

\[
P_D = \frac{P_t G_t}{4\pi d^2} \tag{1.5}
\]

should be less than 10W m\(^{-2}\), where \( P_t \) is the transmit power, \( G_t \) is the antenna gain, and \( d \) is the minimum distance in any direction from any part of the radiating structure to the user's body. The FCC\(^2\) currently regulates the maximum EIRP level, which must be below 55dBm/MHz [9, 10] for a wide range of frequencies to ensure user safety. While service providers should adhere to this limit, power generation in the millimeter-wave band becomes extremely difficult, and most of these systems have limited total output power.

Figure 1.8: Safe radiation levels for persons in unrestricted environments [2]

\(^1\)Equivalent Isotropically Radiated Power

\(^2\)Federal Communications Commission

Based on Friis's formula, the received power is

\[ P_r = P_D \frac{\lambda^2 G_r}{4\pi} \]  

(1.6)

where \( \lambda \) is the wavelength and \( G_r \) is the antenna gain of the receiver. The effective area \( A_e \) of the receiver is defined as

\[ A_e = \frac{\lambda^2 G_r}{4\pi} \]  

(1.7)
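Equations 1.5 to 1.7 can be combined into a short link-budget calculation. The sketch below is illustrative only: it assumes \( \lambda = v / f_c \) with \( v \) the speed of light, an EIRP of 30dBm, a 10m link, a 140GHz carrier, and an isotropic receive antenna (\( G_r = 1 \)). The function names are hypothetical.

```python
import math

V_LIGHT = 3.0e8  # propagation speed in m/s (free space assumed)


def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to W."""
    return 10 ** (p_dbm / 10) / 1000


def power_density(eirp_watt, distance_m):
    """Eq. 1.5: power density in W/m^2 at a given distance."""
    return eirp_watt / (4 * math.pi * distance_m ** 2)


def received_power(eirp_watt, distance_m, freq_hz, g_r):
    """Eq. 1.6: power density times the effective area of Eq. 1.7."""
    wavelength = V_LIGHT / freq_hz
    effective_area = wavelength ** 2 * g_r / (4 * math.pi)
    return power_density(eirp_watt, distance_m) * effective_area


eirp = dbm_to_watt(30.0)                        # 1 W
p_d = power_density(eirp, 10.0)                 # W/m^2 at 10 m
p_r = received_power(eirp, 10.0, 140e9, 1.0)    # W at the receiver

# Distance at which the density falls to the 10 W/m^2 safety level.
d_min = math.sqrt(eirp / (4 * math.pi * 10.0))

print(f"P_D = {p_d:.3e} W/m^2, P_r = {10 * math.log10(p_r * 1000):.1f} dBm")
print(f"10 W/m^2 is reached at d = {d_min * 100:.1f} cm")
```

Under these assumptions, the quadratic dependence of the effective area on wavelength is what makes the received power fall with carrier frequency for a fixed receive gain.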

With a continuous wave approximation of \( \lambda \approx \frac{v}{f_c} \) and a maximum output EIRP of \( P_{\text{max}} = P_t G_t |_{\text{max}} \), the channel capacity can be written as

\[
C = \frac{1}{\ln 2} B_F f_c \ln \left( 1 + \frac{\frac{P_{\text{max}}}{4\pi d^2} \frac{\lambda^2 G_r}{4\pi}}{\kappa B_F f_c} \right) \tag{1.8}
\]

\[
= \frac{1}{\ln 2} B_F f_c \ln \left( 1 + \frac{v^2}{(4\pi)^2 d^2} \frac{P_{\text{max}} G_r}{\kappa B_F} \frac{1}{f_c^3} \right) \tag{1.9}
\]

Assuming a user device with a single antenna and a fixed fractional bandwidth, the optimal carrier frequency for the maximum channel capacity can be found by solving the following equation.

\[
\frac{\partial C}{\partial f_c} = 0 \\
\]  

(1.10)

With a change in the variables \( \mathcal{X} = 1 + \frac{\mathcal{G}}{f_c^3} \) and \( \mathcal{G} = \frac{v^2}{(4\pi)^2 d^2} \frac{P_{\text{max}} G_r}{\kappa B_F} \),

\[
C = \frac{1}{\ln 2} B_F \sqrt[3]{\frac{\mathcal{G}}{\mathcal{X} - 1}} \ln(\mathcal{X}) \\
\]  

(1.11)

and

\[
\frac{\partial C}{\partial \mathcal{X}} = -\frac{1}{\ln 2} B_F \left( \frac{\mathcal{G}}{3 \left( \frac{\mathcal{G}}{\mathcal{X} - 1} \right)^{2/3} (\mathcal{X} - 1)^{2}} \right) \left( \ln(\mathcal{X}) - 3 + \frac{3}{\mathcal{X}} \right) \tag{1.12}
\]

The maximum capacity can be reached when

\[
\mathcal{X} = e^{W_0\left(-\frac{3}{e^3}\right) + 3} \approx 16.8 \tag{1.13}
\]

where \( e \) is Euler's number and \( W_0(\cdot) \) is the principal branch of the Lambert W function. The important observation here is that, for a maximum allowable transmitter EIRP and a fixed fractional bandwidth, the carrier frequency should be increased until the total SNR at the end of the receive chain is \( \mathcal{X} - 1 \approx 15.8 \), or approximately 12dB. This suggests that for high-speed over-the-air communication (with a target bit error rate of \( 10^{-3} \)), low-order digital modulations such as QPSK or 16-QAM should be used\(^3\). Higher-order modulations increase the spectral efficiency, but the absolute bandwidth must be reduced to achieve the same bit error rate, which ultimately lowers the data rate.

\(^3\)QPSK also has the advantage of allowing power amplifiers to operate at their saturated power.

The maximum channel capacity of

$$C_{\text{max}} \approx \sqrt[3]{\frac{4.2 v^2}{(4\pi)^2 d^2} \frac{P_{\text{max}} G_r B_F^2}{\kappa}}$$

is obtained when the carrier frequency is chosen as

$$f_{c,\text{opt}} \approx \sqrt[3]{\frac{1}{15.8} \frac{v^2}{(4\pi)^2 d^2} \frac{P_{\text{max}} G_r}{\kappa B_F}}$$
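The optimum of Eq. 1.13 can be reproduced without the Lambert function by solving the stationarity condition \( \ln(\mathcal{X}) - 3 + \frac{3}{\mathcal{X}} = 0 \) from Eq. 1.12 directly. The following is a minimal sketch using plain bisection; the bracket \([2, 100]\) is an assumption chosen to exclude the trivial root at \( \mathcal{X} = 1 \).

```python
import math


def stationarity(x):
    """Bracketed factor of Eq. 1.12; it vanishes at the capacity optimum."""
    return math.log(x) - 3.0 + 3.0 / x


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_opt = bisect(stationarity, 2.0, 100.0)
snr_db = 10 * math.log10(x_opt - 1.0)     # optimum SNR in dB
bits_per_hz = math.log2(x_opt)            # spectral efficiency at the optimum

print(f"X_opt = {x_opt:.2f}, SNR = {snr_db:.1f} dB, "
      f"C/B = {bits_per_hz:.2f} bit/s/Hz")
```

The resulting spectral efficiency of roughly 4 bit/s/Hz is the Shannon bound at this SNR, which is why low-order modulations are sufficient at the optimum carrier frequency.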

In practice, the routing loss and the antenna’s efficiency should also be considered. As for the
directivity of the array, an $N \times N$ array of antennas with $\frac{\lambda}{2}$ spacing provides a directivity of
$N^2$. Moreover, patch antennas provide an additional advantage since these antennas radiate
from the front side and ideally have no backside radiation. Hence,

$$G_t = 2N^2$$

which sets the maximum EIRP as

$$P_{\text{max}} = 2N^3 P_e$$

Figure 1.9: An $8 \times 8$ 2-D transmitter phased array with 1-D steering capability.
For example, if each PA has 0dBm of output power, EIRP of 30dBm can be achieved.
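The 30dBm figure follows from \( P_{\text{max}} = 2N^3 P_e \). A minimal sketch of the arithmetic is given below; it assumes, based on the 1-D steering arrangement of Fig. 1.9, that the array contains \( N \) PAs, each driving a group of \( N \) antennas, so the total radiated power is \( N P_e \). Routing loss is ignored here.

```python
import math


def watt_to_dbm(p_watt):
    """Convert a power level from W to dBm."""
    return 10 * math.log10(p_watt * 1000)


def array_eirp_watt(n, p_e_watt):
    """EIRP of an N x N patch array fed by N PAs (assumed arrangement)."""
    g_t = 2 * n ** 2               # array directivity with no backside radiation
    total_power = n * p_e_watt     # N PAs, each delivering P_e
    return g_t * total_power       # equals 2 * N^3 * P_e


eirp_dbm = watt_to_dbm(array_eirp_watt(8, 1e-3))  # 0 dBm per PA
print(f"EIRP = {eirp_dbm:.1f} dBm")
```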

The receiver side is a bit more challenging. First, let us consider a fully passive power combining for the phased array as depicted in Fig. 1.10a. Assuming the same patch antenna,

\[ G_r = 2N^2 \tag{1.19} \]

Recall that the available noise power per unit bandwidth of a passive device\(^4\) in thermal equilibrium is equal to \(kT\) [8], where \(k\) is the Boltzmann constant and \(T\) is the absolute temperature. In other words, the use of multiple antennas does not affect the thermal noise power picked up from the ambient blackbody radiation. However, it does increase the directivity of the antennas. It may be difficult to understand why the power level of the radiation noise remains constant despite the combined noise of multiple antennas. The reason is that a passive lossless power combiner with more than two matched ports does not exist [8]. Therefore, the passive combiner will either partially dissipate or reflect the power to the antennas. Another view is that the thermal noise of the individual elements is generally considered uncorrelated. In contrast, the radiation noise picked up by the different antennas is correlated because it has the same origin, namely the environment. Therefore, different sources can add constructively or destructively after the combiner. The important observation is that the SNR increases by a factor \(G_r\) when the antenna array is used. After the LNA, the input-referred noise of the LNA (\(\mathcal{N}_{LNA}\)) is added directly to the output.

\(^4\)Including passive antennas and passive power combiners.

Let us now consider the active array from Fig. 1.10b. In this case, the \(N\) uncorrelated noise powers of the LNAs are visible at the output. However, since the signal adds coherently in the voltage domain, its power is amplified by \(N^2\). The ambient thermal noise at the input of each LNA is preserved as \(kT\) because the passive structures are in thermal equilibrium. However, since the noise contributions at the different inputs are partially correlated, the correct method to determine the noise level at the output due to the ambient thermal noise (Fig. 1.11) is to use

$$V_{\text{out}} = \sum_{m=1}^{N} \left[ \int \int v_{\text{N}}(\theta', \phi') e^{jm\pi \sin(\theta')} g_m(\theta', \phi') \sin(\theta') d\theta' d\phi' e^{-jm\pi \sin(\theta)} \right] \tag{1.20}$$

where $v_{\text{N}}(\theta', \phi')$ is the thermal noise source at different spherical locations, $g_m(\theta', \phi')$ is the effective gain of each group of passive antennas for each noise source in spherical coordinates, $\frac{\lambda}{2} \sin(\theta') \frac{2\pi}{\lambda} = \pi \sin(\theta')$ is the phase delay of each noise source to each set of antennas, and $-\pi \sin(\theta)$ is the correction phase that the phased array processor must apply to steer its beam toward the angle of incidence $\theta$. If we assume that each group of antennas is isolated from others (which is not necessarily true [11]), the gain of all groups is approximately equal to $g(\theta', \phi')$ and

$$\overline{V_{\text{out}}^2} = \int \int (v_{\text{N}}(\theta', \phi') g(\theta', \phi'))^2 \left| \sum_{m=1}^{N} e^{jm\pi (\sin(\theta') - \sin(\theta))} \right|^2 \sin(\theta') d\theta' d\phi' \tag{1.21}$$

Note that if each group of antennas has a radiation resistance of $R$,

$$\int \int (v_{\text{N}}(\theta', \phi') g(\theta', \phi'))^2 \sin(\theta') d\theta' d\phi' = 4kTR \tag{1.22}$$

since each group of antennas is in thermal equilibrium. If we assume that ambient thermal noise power coming from different directions is equal ($v_{\text{N}}(\theta', \phi')^2 = v_{\text{N}}^2$) and each group of series-fed antennas has a uniform radiation distribution in the $\theta'$ axis ($g(\theta', \phi') = g(\phi')$), Eq. 1.21 can be simplified to

$$\overline{V_{\text{out}}^2} = \int (v_{\text{N}} g(\phi'))^2 \int \left| \sum_{m=1}^{N} e^{jm\pi (\sin(\theta') - \sin(\theta))} \right|^2 \sin(\theta') d\theta' d\phi' \tag{1.23}$$
Note that the term in the second integral is nothing but the total radiated power (in all directions) of \( N \) number of radiators normalized to the total radiated power of a single antenna, equal to \( N \). It follows,

\[
\overline{V_{\text{out}}^2} = 4kTR \times N \tag{1.24}
\]
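The factor of \( N \) in Eq. 1.24 can be illustrated numerically. The sketch below averages the squared magnitude of the array sum over all directions for \( \frac{\lambda}{2} \) spacing. As a simplifying assumption that differs from the \( \sin(\theta') d\theta' \) weighting written in Eq. 1.23, the average is taken uniformly in \( u = \sin(\theta') \) over \([-1, 1]\); the function names are hypothetical.

```python
import cmath
import math


def array_factor_power(n, u, u0):
    """|sum_m exp(j*m*pi*(u - u0))|^2 with u = sin(theta'), u0 = sin(theta)."""
    total = sum(cmath.exp(1j * m * math.pi * (u - u0)) for m in range(1, n + 1))
    return abs(total) ** 2


def mean_array_factor_power(n, u0, samples=4000):
    """Average of the array factor power over u in [-1, 1] (midpoint rule)."""
    du = 2.0 / samples
    acc = 0.0
    for k in range(samples):
        u = -1.0 + (k + 0.5) * du
        acc += array_factor_power(n, u, u0)
    return acc * du / 2.0


N = 8
U0 = 0.3  # arbitrary steering direction, sin(theta)

peak = array_factor_power(N, U0, U0)       # on-beam value, N^2
average = mean_array_factor_power(N, U0)   # direction-averaged value, N

print(f"peak = {peak:.1f}, average over directions = {average:.3f}")
```

Under this assumption, the signal arriving from the steered direction is weighted by \( N^2 \) while isotropic ambient noise is weighted by \( N \) on average, independent of the steering angle, which is consistent with the SNR improvement attributed to the array.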

In other words, there is no difference between passive and active combiners when it comes to the SNR of the received signal due to ambient thermal noise. The only difference is that when multiple LNAs are used in an active combiner, the noise of the LNAs is averaged. For the rest of this section, for simplicity, we consider active combiners with LNA amplification that provides the same signal power as the passive combiner (\( G_{P,LNA} = \frac{1}{N} \)). Therefore, the noise level in Shannon's capacity equation is

\[
\kappa = kT \frac{1}{N} (N + N (F_{LNA} - 1)) \tag{1.25}
\]

\[
= kT F_{LNA} \tag{1.26}
\]

where \( F_{LNA} \) is the linear noise figure of the LNA. Now, maximum channel capacity and optimum carrier frequency can be calculated as

\[
C_{\text{max}} \approx \sqrt[3]{\frac{4.2 v^2}{(4\pi)^2 d^2 kT F_{LNA}} B_F^2 \cdot 4N^5 L^2 P_e} \tag{1.27}
\]

\[
f_{c,\text{opt}} \approx \sqrt[3]{\frac{1}{15.8} \frac{v^2}{(4\pi)^2 d^2 kT F_{LNA} B_F} \cdot 4N^5 L^2 P_e} \tag{1.28}
\]

where \( L \) models the routing loss and radiation efficiency of the antennas. As a numerical example (Table 1.2), in an \( 8 \times 8 \) 1-D steerable phased array where each PA generates 3dBm of output power followed by 3dB of routing loss, an EIRP of 30dBm is generated at the transmitter side. Assuming a noise figure of 12dB for the LNAs, a carrier frequency of 135GHz is suitable for 15% fractional bandwidth (20GHz absolute bandwidth) at a distance of 10m.
Figure 1.13: Spectral efficiency vs. carrier frequency for a link based on Table 1.2

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>\(N\)</td>
<td>8</td>
<td>Number of antennas in each axis of the TRX phased arrays</td>
</tr>
<tr>
<td>\(P_e\)</td>
<td>3dBm</td>
<td>Output power of each PA element</td>
</tr>
<tr>
<td>\(B_F\)</td>
<td>15%</td>
<td>Fractional bandwidth of radio</td>
</tr>
<tr>
<td>\(d\)</td>
<td>10m</td>
<td>Distance between receiver and transmitter</td>
</tr>
<tr>
<td>\(L\)</td>
<td>-3dB</td>
<td>Routing loss on PCB</td>
</tr>
<tr>
<td>\(F_{LNA}\)</td>
<td>12dB</td>
<td>Noise figure of millimeter-wave LNAs</td>
</tr>
<tr>
<td>EIRP</td>
<td>30dBm</td>
<td>Equivalent Isotropic Radiated Power</td>
</tr>
<tr>
<td>\(f_{c,\text{opt}}\)</td>
<td>135GHz</td>
<td>Optimum carrier frequency for the maximum capacity</td>
</tr>
<tr>
<td>\(C_{\text{max}}\)</td>
<td>81Gbps</td>
<td>Maximum channel capacity</td>
</tr>
</tbody>
</table>

Table 1.2: A numerical example of an optimal over-the-air communication link

This system can deliver a data rate of 81Gbps at maximum capacity (Fig. 1.12). A simple QPSK modulation is clearly sufficient to achieve the maximum data rate in this system, since SNR has been traded off for a higher data rate (Fig. 1.13). This is easy to understand since

\[ C_{\text{max}} \approx 4B_F f_{c,\text{opt}} \]  

(1.29)
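The numbers in Table 1.2 can be checked directly against Eq. 1.29. The sketch below also computes the SNR that Shannon's formula $C = B \log_2(1 + \text{SNR})$ implies at this operating point; the helper names are hypothetical and the bandwidth is taken as $B_F f_{c,\text{opt}}$.

```python
import math

def capacity_at_optimum(fractional_bw, f_c_opt_hz):
    """Eq. 1.29: C_max is approximately 4 * B_F * f_c_opt."""
    return 4.0 * fractional_bw * f_c_opt_hz

def implied_snr_db(capacity_bps, bandwidth_hz):
    """SNR for which B * log2(1 + SNR) equals capacity_bps."""
    snr = 2 ** (capacity_bps / bandwidth_hz) - 1
    return 10 * math.log10(snr)

b_f, f_c = 0.15, 135e9                        # values from Table 1.2
c_max = capacity_at_optimum(b_f, f_c)
print(round(c_max / 1e9, 1))                  # about 81 Gbps
print(round(implied_snr_db(c_max, b_f * f_c), 2))  # about 11.76 dB
```

A spectral efficiency of 4 bit/s/Hz at the optimum is what makes a low-order modulation adequate here.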

The strength of phased arrays is easy to see here because \(C_{\text{max}} \propto \sqrt{N^5}\). However, the physical dimension of the array ultimately limits the number of antenna elements. Let \(W = N\frac{\lambda}{2}\) be the width of the antenna array (half-wavelength element spacing),

\[ f_{c,\text{opt}} \approx \pi dv \sqrt{\frac{2.0 kT F_{LNA} B_F}{W^5 L^2 P_e v}} \]  

(1.30)

which shows that an optimal carrier frequency should be used for a fixed-size antenna array. The reader should note that the optimal carrier frequency increases with the number of antennas as \(f_{c,\text{opt}} \propto \sqrt{N^5}\), so that the total width of the antenna array scales as \(W_{\text{opt}} \propto \frac{1}{\sqrt{N^3}}\). Although increasing the number of elements without changing the carrier frequency increases the data rate, the channel capacity is suboptimal because a higher carrier frequency would increase the data rate further. For example, if the antenna array cannot occupy more than \(1\text{cm} \times 1\text{cm}\), a carrier frequency of 100GHz with 6 antennas is optimal if the other parameters are taken from Table 1.2. For a fixed array dimension, the optimal number of elements is

\[
N_{\text{opt}} \approx \pi d \sqrt{\frac{7.9}{W^3} \frac{1}{kT} \frac{F_{\text{LNA}} B_F}{L^2 P_c}} v
\]  

So far, increasing the number of elements has increased the channel capacity in the optimal case and made the array dimension smaller. It is easy to see that with this trend, the power density, defined as \(\frac{N \times P_e}{W_{\text{opt}}} \propto \sqrt{N^5}\), increases with the number of elements used in the array. The high power density makes the packaging of such arrays quite difficult, as they have to cope with higher heat dissipation.
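The fixed-size example above (6 antennas at 100GHz within 1cm) can be reproduced with a short sketch. It assumes half-wavelength element pitch, so that the array width is $N \lambda / 2$, and free-space propagation velocity; both function names are hypothetical.

```python
C_LIGHT = 299_792_458.0  # propagation velocity in free space, m/s

def array_width_m(n_elements, f_c_hz, v=C_LIGHT):
    """Width of N elements at half-wavelength pitch: W = N * lambda / 2."""
    return n_elements * v / (2.0 * f_c_hz)

def max_elements(width_m, f_c_hz, v=C_LIGHT):
    """Largest N whose half-wavelength-pitch array fits within width_m."""
    return int(2.0 * width_m * f_c_hz / v)

print(round(array_width_m(6, 100e9) * 1e3, 2))  # about 8.99 mm
print(max_elements(0.01, 100e9))                # 6 elements fit in 1 cm
```

Because the width shrinks with carrier frequency for a fixed \( N \), the same total PA power is concentrated in a smaller footprint, which is the power-density concern raised above.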

Before concluding this section, let us consider device constraints and their impact on the link capacity. For a CMOS device, the minimum noise figure increases linearly with frequency.

\[
F_{\text{min}} = \gamma f_c
\]  

where \(\gamma\) is a technology-dependent proportionality factor. Moreover, the PA survey of

![Saturated Output Power vs. Frequency (CMOS)](image)

Figure 1.14: The output power of published power amplifiers as a function of carrier frequency[3]

Fig. 1.14 shows that the output power decreases with increasing carrier frequency as

\[
P_{e,\text{max}} \propto \frac{\mathcal{P}}{f_c} \tag{1.33}
\]
where $\mathcal{P}$ is a technology-dependent factor. Now, considering Eq. 1.34

$$W_{opt} \approx \sqrt[5]{\frac{2.0\pi^2 kT\gamma B_F}{L^2 \mathcal{P}} v^3 \sqrt{d^2}}$$  \hspace{1cm} (1.34)

This means that for a given technology and a fixed distance between two transceivers, an optimal array dimension can achieve the maximum data rate of communication.

### 1.3 Silicon limits

High carrier frequencies require fast transistors. While compound semiconductors can achieve higher $f_t$ and $f_{\text{max}}$, silicon remains the dominant semiconductor because of its unique capabilities in digitally intensive designs. In this section, we explore some of the limitations of the bulk CMOS process, as shown in Fig. 1.15.

![Figure 1.15: A simple model of planar CMOS transistor.](image)

Note that only the semiconductor device is considered here, and the parasitic impact of the back end of the line metallization is not considered. In practice, the performance of deep sub-micron devices is deteriorated by extrinsic parasitic elements.

### Transit Frequency $f_t$

For a CMOS device, the transit frequency $f_t$ is defined as

$$\begin{aligned}
f_t &= \frac{1}{2\pi} \frac{g_m}{C_{gs}} \\
&= \frac{1}{2\pi} \frac{\partial I_{ds} / \partial V_{gs}}{\partial Q_{gs} / \partial V_{gs}} \\
&= \frac{1}{2\pi} \frac{\partial I_{ds}}{\partial Q_{gs}}
\end{aligned} \tag{1.35}$$

Assuming that the device operates under velocity saturation $^5$, the maximum drain-source current is reached when all new charges on the source side ($\partial Q_{gs}$) traverse the effective channel length ($L_{eff}$) at the maximum saturation velocity ($v_{sat}$). This means

$$\partial I_{ds,max} = \partial Q_{gs} \frac{v_{sat}}{L_{eff}}$$

(1.36)

Consequently, for a CMOS process, there is an upper limit to the $f_t$ of the device

$$f_{t,max} = \frac{1}{2\pi} \frac{v_{sat}}{L_{eff}}$$

(1.37)

To achieve higher current gain and higher transit frequency, either the channel length must be reduced, or the saturation velocity of carriers must be increased. While the latter can be achieved by channel engineering, scaling the channel length remains the main strategy to increase the operating speed of transistors. For example, for a 28nm CMOS node with a saturation velocity of $10^7$ cm s$^{-1}$, one can expect a maximum $f_t$ of 570GHz. In practice, the effective channel length is about one-third of the drawn channel length for the smallest channel length of each node, which can potentially increase the transit frequency. On the other hand, the transit frequency is reduced by fringe capacitors and the gate-source and gate-drain overlap capacitors $^6$, negating any potential improvement from the smaller effective channel length. Once the parasitic capacitance of the back-end metallization is added, the transit frequency drops again.
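The 570GHz figure follows directly from Eq. 1.37. A minimal sketch, with a hypothetical helper name and the 28nm drawn length used as the effective length, is:

```python
import math

def ft_max_hz(v_sat_m_per_s, l_eff_m):
    """Eq. 1.37: f_t,max = v_sat / (2 * pi * L_eff)."""
    return v_sat_m_per_s / (2.0 * math.pi * l_eff_m)

v_sat = 1e7 * 1e-2  # 1e7 cm/s converted to m/s

# Drawn length of 28 nm taken as the effective length: about 568 GHz
print(round(ft_max_hz(v_sat, 28e-9) / 1e9))

# Effective length of one-third of the drawn length, ignoring the fringe
# and overlap capacitance that cancel this gain in practice
print(round(ft_max_hz(v_sat, 28e-9 / 3) / 1e9))
```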

### Analog Efficiency $f_t \frac{g_m}{I_d}$

Another commonly used metric for CMOS transistors is $f_t \frac{g_m}{I_d}$. It gives the analog efficiency of a transistor at a fixed current consumption. Although this metric is not useful for millimeter-wave circuit design, it is still instructive to examine the limits of this metric for a square-law

$^5$Diffusion currents are neglected.

$^6$For 28nm devices, the fringe capacitors of gate-drain and gate-source are about the same size as the channel capacitance, reducing the transit frequency by a factor of about 3.
device. Analog efficiency can be described as

\[
 f_t \frac{g_m}{I_d} = \frac{1}{2\pi} \frac{g_m}{C_{gs}} \frac{2I_{ds}}{V_{od}} \frac{1}{I_{ds}} = \frac{1}{\pi} \frac{g_m}{C_{gs}} \frac{1}{V_{od}}
\]

(1.38)

where \( V_{od} \) is the overdrive voltage. Substituting \( g_m = \mu C_{ox} \frac{W}{L} V_{od} \) and \( C_{gs} = \frac{2}{3} C_{ox} W L \) into the previous equation, the analog efficiency can be calculated as

\[
 f_t \frac{g_m}{I_d} = \frac{1}{\pi} \frac{\mu C_{ox} \frac{W}{L} V_{od}}{\frac{2}{3} C_{ox} W L} \frac{1}{V_{od}} = \frac{3}{2\pi} \frac{\mu}{L^2} \tag{1.39}
\]

It should now be clear that for a square-law device with a fixed channel length, the maximum analog efficiency is achieved at a gate-source voltage that provides the highest mobility for the charges in the channel. Fig. 1.16 shows simulation results for NMOS and PMOS devices,

![Simulation Results](image)

Figure 1.16: Estimating the mobility of the device from simulations for different current densities (A/\( \mu m \))

where the mobility of the device is given by

\[
\begin{aligned}
\mu &= \frac{2}{3} \frac{\pi f_t g_m L^2}{I_d} \\
&= \frac{2}{3} \pi \frac{g_m}{2\pi (C_{GS} - C_{GD})} \frac{g_m L^2}{I_d}
\end{aligned} \tag{1.40}
\]
Note that $C_{gs}$, the channel capacitance, is replaced by $C_{GS} - C_{GD}$ to remove the fringe capacitors. As expected [12], the NMOS device still performs better than its PMOS counterpart. However, as silicon doping increases, NMOS and PMOS devices become more similar.
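The mobility extraction of Eq. 1.40 can be sketched as a pair of helper functions. The names and the sample operating point (300GHz, 5 V$^{-1}$, 28nm) are hypothetical and serve only to show the unit handling; they are not simulation results from this work.

```python
import math

def mobility_from_sim(f_t_hz, gm_over_id, length_m):
    """Eq. 1.40: mu = (2/3) * pi * f_t * (g_m / I_d) * L^2, in m^2/(V s)."""
    return (2.0 / 3.0) * math.pi * f_t_hz * gm_over_id * length_m ** 2

def analog_efficiency(mu, length_m):
    """Inverse of Eq. 1.40: f_t * g_m / I_d = 3 * mu / (2 * pi * L^2)."""
    return 3.0 * mu / (2.0 * math.pi * length_m ** 2)

# Hypothetical operating point, for illustration only
mu = mobility_from_sim(300e9, 5.0, 28e-9)
print(round(mu * 1e4, 1), "cm^2/(V s)")   # m^2 to cm^2 conversion
```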

### Speed-Power Trade-off

Johnson has shown [13] that for bipolar transistors, there is a relationship between the maximum current, the input impedance of the device, and the cutoff frequency of the device. The cutoff frequency is defined by $f_T = \frac{1}{2\pi \tau}$, where $\tau$ is the average time required for a carrier to traverse the base at the saturated drift velocity. This definition agrees well with the maximum transit frequency defined earlier

$$f_T = f_{t,\text{max}}$$

Considering a CMOS process with a dielectric breakdown field of $E_{Si}$, the maximum drain-source voltage can be described as

$$V_{ds,\text{max}} = E_{Si}L_{eff}$$

Thus, there is a relationship between the maximum drain-source voltage and the device cutoff frequency where

$$V_{ds,\text{max}} f_{t,\text{max}} = \frac{1}{2\pi} E_{Si} v_{sat}$$

A transistor with a transit frequency of 400GHz cannot generate more than 2 volts peak-to-peak drain-source voltage, assuming $E_{Si} \approx 5 \times 10^5 \text{V cm}^{-1}$ and $v_{sat} \approx 10^7 \text{cm s}^{-1}$.
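The 2V figure can be reproduced from the voltage-frequency product above. The sketch below uses the breakdown field and saturation velocity quoted in the text, kept in cm-based units so that their product is in V/s; the function name is hypothetical.

```python
import math

def vds_max_volts(f_t_hz, e_si_v_per_cm=5e5, v_sat_cm_per_s=1e7):
    """V_ds,max = E_Si * v_sat / (2 * pi * f_t,max)."""
    return e_si_v_per_cm * v_sat_cm_per_s / (2.0 * math.pi * f_t_hz)

for f_t in (100e9, 200e9, 400e9):
    # The allowed drain-source swing halves each time f_t doubles.
    print(f_t / 1e9, round(vds_max_volts(f_t), 2))
```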

While the dielectric breakdown field sets a maximum drain-source voltage, any drain-source current can be achieved at the cost of increased input capacitance by connecting multiple devices in parallel. The load current through a charge control device is defined by

$$I_{ds} = \frac{Q_{ch}}{\tau}$$

where $Q_{ch}$ is the total mobile charge in the channel, and $\tau$ is the average charge transit time. To calculate the maximum current, we assume the highest drift velocity, which sets $\tau_{\text{min}} = \frac{L_{eff}}{v_{sat}}$. Also, we assume a dielectric breakdown field of $E_{ox}$ for the dielectric barrier between the gate and the channel,

$$Q_{\text{max}} = C_{gs} E_{ox} t_{ox}$$

where $t_{ox}$ is the oxide thickness. It follows that,

$$I_{ds,\text{max}} = C_{gs} E_{ox} t_{ox} \frac{v_{sat}}{L_{eff}}$$
If we define $X_{f_T} = \frac{1}{2\pi f_T C_{gs}}$ as the reactive input impedance between gate and source, it is clear that

$$I_{ds,max} X_{f_T} = E_{ox} t_{ox} \tag{1.47}$$

In other words, for a fixed impedance at the device cutoff frequency, the maximum drain-source current is fixed by the properties of the gate oxide. For most CMOS process nodes, the drain-source and gate-source breakdown fields are close to each other because the drain of the preceding transistors directly controls the gate nodes of digital circuits. For example, with $E_{ox} = 14\text{MV cm}^{-1}$ for silicon dioxide [14] and an oxide thickness of 1nm,

$$E_{ox} t_{ox} \approx E_{Si} L_{eff} \approx 1.5V \tag{1.48}$$

For a CMOS process optimized for digital circuits,

$$I_{ds,max} = C_{gs} E_{ox} t_{ox} \frac{v_{sat}}{L_{eff}} \approx C_{gs} E_{Si} L_{eff} \frac{v_{sat}}{L_{eff}} \approx C_{gs} E_{Si} v_{sat} \tag{1.49}$$

Therefore, the relationship between the volt-ampere product (as an approximation for the maximum output power), the input impedance level ($X_f = \frac{1}{2\pi f C_{gs}}$), and the cutoff frequency of the transistor can be found as

$$f \times I_{ds,max} \times V_{ds,max} = f \frac{1}{2\pi} \frac{E_{Si} v_{sat}}{f_T} C_{gs} E_{Si} v_{sat} = 2\pi f C_{gs} \left( \frac{E_{Si} v_{sat}}{2\pi} \right)^2 \frac{1}{f_T} \tag{1.50}$$

and thus,

$$f \times I_{ds,max} \times V_{ds,max} \times X_f = \left( \frac{E_{Si} v_{sat}}{2\pi} \right)^2 \frac{1}{f_T} \tag{1.51}$$

Note that

- increasing the cutoff frequency decreases the output power for a fixed technology and a fixed input impedance, and
- for a fixed input impedance, faster process nodes provide lower output power at a fixed operating frequency.
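Both observations can be seen numerically by rearranging Eq. 1.51 for the volt-ampere product. The sketch below is illustrative only: the 140GHz operating frequency and the 50Ω input reactance are assumed values, and the function name is hypothetical.

```python
import math

def volt_ampere_limit(f_hz, f_t_hz, x_f_ohm,
                      e_si_v_per_cm=5e5, v_sat_cm_per_s=1e7):
    """Eq. 1.51 solved for I_ds,max * V_ds,max, in volt-amperes."""
    k = e_si_v_per_cm * v_sat_cm_per_s / (2.0 * math.pi)  # units of V/s
    return k ** 2 / (f_t_hz * f_hz * x_f_ohm)

# Assumed operating point: 140 GHz, 50-ohm input reactance
for f_t in (200e9, 400e9, 800e9):
    print(f_t / 1e9, round(volt_ampere_limit(140e9, f_t, 50.0), 3))
```

Doubling the cutoff frequency at a fixed operating frequency and input impedance halves the available volt-ampere product.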

### Termination Levels vs. Frequency

The optimal large-signal termination load for maximum power transfer can be studied by finding the proper ratio of $\frac{V_{ds,max}}{I_{ds,max}}$

$$R_{opt} = \frac{V_{ds,max}}{I_{ds,max}} = \frac{\frac{1}{2\pi} \frac{E_{Si}v_{sat}}{f_T}}{C_{gs}E_{Si}v_{sat}}$$

$$= \frac{1}{2\pi f_T C_{gs}}$$  \hspace{1cm} (1.52)

which shows that for a fixed input impedance level, output terminations should be smaller for faster process nodes

$$R_{opt} = X_f \frac{f}{f_T}$$  \hspace{1cm} (1.53)

### Large Signal Power Gain vs. Frequency

In the simple model presented above, gate impedance is considered purely imaginary. It is generally not the case as the operating frequency of the transistor increases since one must consider the non-quasistatic model for the device. First, consider a resistor $R_g$ in series with the gate capacitance. We will discuss the origins of this resistance later. Let us assume a small resistor,

$$I_{gs} = 2\pi f Q_{ch}$$  \hspace{1cm} (1.54)

where $Q_{ch}$ is the total mobile charge in the channel. Thus, the peak power dissipation at the series resistor is

$$P_{in} = \frac{1}{2} (2\pi f Q_{ch})^2 R_g$$  \hspace{1cm} (1.55)

The maximum output current is

$$I_{ds} = \frac{Q_{ch}}{\tau_{min}}$$  \hspace{1cm} (1.56)

and therefore the peak output power is

$$P_{out} = \frac{1}{2} \left( \frac{Q_{ch}}{\tau_{min}} \right)^2 R_L$$  \hspace{1cm} (1.57)

where $R_L$ is the terminating resistor. Assume that the terminating resistor $R_L = R_{opt}$ is chosen,

$$P_{out} = \left( \frac{Q_{ch}}{\tau_{min}} \right)^2 \frac{1}{4\pi f_T C_{gs}}$$  \hspace{1cm} (1.58)
and the power gain can be calculated as

$$ G = \frac{P_{\text{out}}}{P_{\text{in}}} = \frac{\left( \frac{Q_{\text{ch}}}{\tau_{\text{min}}} \right)^2 \frac{1}{2\pi f_T C'_{gs}}}{(2\pi f Q_{\text{ch}})^2 R_g} $$

$$ = \frac{2\pi f_T}{(2\pi f)^2 R_g C'_{gs}} \quad (1.59) $$

Considering only the intrinsic device, $R_g C'_{gs}$ is the time constant for the redistribution of the channel charge in response to the gate excitation and can be calculated to be about $\frac{1}{5} \sim \frac{1}{8} \times \frac{1}{2\pi f_T}$ for a square-law device [15]. The exact coefficient depends on the exact charge distribution in the channel. Therefore, we assume that $\alpha$ represents this coefficient, which ranges between $5 \sim 8$ for square-law devices and drops to smaller values ($\approx 2$) for a uniform charge distribution. It follows,

$$ G = \alpha \left( \frac{f_T}{f} \right)^2 \quad (1.60) $$

Interestingly, although the output power of faster process nodes tends to decrease for a fixed input impedance, the large-signal power gain improves when faster transistors are used. Defining $f_{\text{max}}$ as the frequency at which the power gain drops to 0dB, we find that

$$ f_{\text{max}} \propto f_T \quad (1.61) $$
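Eq. 1.60 and the 0dB definition of $f_{\text{max}}$ can be sketched as follows. Setting $G = 1$ in Eq. 1.60 gives $f_{\text{max}} = \sqrt{\alpha} f_T$; the sample values ($f_T$ of 400GHz, $\alpha = 5$, operation at 140GHz) are assumptions for illustration, and the function names are hypothetical.

```python
import math

def large_signal_gain_db(f_hz, f_t_hz, alpha):
    """Eq. 1.60: G = alpha * (f_T / f)^2, returned in dB."""
    return 10 * math.log10(alpha * (f_t_hz / f_hz) ** 2)

def f_max_hz(f_t_hz, alpha):
    """Frequency where Eq. 1.60 reaches 0 dB: f_max = sqrt(alpha) * f_T."""
    return math.sqrt(alpha) * f_t_hz

f_t, alpha = 400e9, 5.0  # assumed values
print(round(large_signal_gain_db(140e9, f_t, alpha), 1))  # about 16.1 dB
print(round(f_max_hz(f_t, alpha) / 1e9))                  # about 894 GHz
```

This is the intrinsic limit only; the extrinsic parasitics discussed next reduce it considerably.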

**Detailed Model with Extrinsic Parasitics**

Fig. 1.17 shows a complex model of a transistor with the first layer of back-end metallization, and the component values are listed in Table 1.3. The red components are calculated within the BSIM model, while the blue components are parasitic elements extracted from the layout. These parasitic elements include:

- $R_{\text{tip}}$ is the resistance from the edge of the via to the edge of the OD definition. In HKMG processes, the work function of the metal is used to set the threshold voltage of the device. Therefore, devices with the same layout but different thresholds may have different series resistance and high-frequency response. While other parasitic components scale with the width of the device, the minimum $R_{\text{tip}}$, which corresponds to the shortest distance between the transistor and the poly contact, is fixed by the process capabilities.

---

$^7$HKMG: High-K Metal Gate

- $R_{POC}$ is the poly contact resistance from the first metal layer (M1) to the gate metal (PO). It should be mentioned that some process nodes allow negative enclosure for the contact. This allows devices with a shorter channel length (and consequently higher $f_t$), but the current crowding at the contact tip increases the contact resistance. Instead of using a single contact, the designer can use multiple contacts in parallel. This seems compelling, but since each additional poly contact is connected in series with the added resistance of the gate extension, the advantage is quickly exhausted. For example, for a single poly contact

$$R_{POC} + R_{tip} = 100 + 117 \Omega = 217 \Omega$$  \hspace{1cm} (1.62)

while two poly contacts

$$R_{POC} + R_{tip} = (100 + 117) \parallel (100 + 117) \Omega = 185 \Omega$$  \hspace{1cm} (1.63)

provide only a 15% improvement. In contrast, using a double-sided contact, as shown in Fig. 1.18, halves the effective resistance $R_{POC} + R_{tip}$ in series with $R_g$ and improves the overall gate resistance by more than 50%.\footnote{Since the gate metal is driven from both sides, its effective resistance also decreases.} Although the double-sided contact strategy is promising, it requires a complex layout and additional parasitic capacitance. Therefore, the double-sided contact should be used only when necessary.

- \( R_g \) is the series resistance of the metal gate on the active area (OD). For a planar transistor,
  \[
  R_g = \alpha \rho_{MG} \frac{W}{t_{MG} L}
  \]
  where \( t_{MG} \) is the thickness of the metal gate, \( \rho_{MG} \) is the resistivity of the metal used for the gate, and \( \alpha = \frac{1}{2} \) assumes a simple T-model \(^9\). It should be mentioned that when a double-sided contact is used, the effective resistance of the metal gate decreases by a factor of 4.

- \( C_{ch} \) is the channel capacitance. It is a nonlinear capacitance that depends strongly on the bias of the gate voltage and the response to large signals.

- \( C_{gdo} \) and \( C_{gso} \) are the gate-drain and gate-source overlap capacitance modeled in the BSIM model. These are relatively linear, bias-independent capacitors generally symmetric on both the drain and source sides. While these capacitors are negligible for devices with long channels, they become comparable with the highest channel capacitance of the short channel device.

- Similarly, \( C_{ov} \) is the stray capacitance from the gate to the source and drain, as reported by the layout extractor. \( C_{GC} \) represents the coupling capacitance between the gate and M1.

- \( C_{sb} \) and \( C_{db} \) represent the source-bulk and drain-bulk junction capacitance, respectively, and \( C_{gb} \) represents the gate-bulk capacitance. This capacitor comes from the gate metal extensions running away from the active area. All these capacitors are weakly dependent on the gate bias voltage.

- \( R_S \) models the resistance of the shallow drain-source junction extensions. It is one of the most critical limiting factors for the minimum switch resistance in digital circuits \([16, 17]\). Therefore, it is essential to model this resistance in a passive mixer properly. In analog and high-frequency applications, this physical resistor also generates thermal noise. It acts like a degeneration resistor in series with the device source and ultimately limits the device’s transconductance.

As mentioned earlier, the use of more advanced nodes improves the intrinsic cutoff frequency of the device. However, scaling is detrimental to the effect of back-end metallization. As shown in Fig. 1.19, as the technology scales, not only do the lateral dimensions (e.g., channel length and channel width of the transistors) scale but so do the vertical dimensions (e.g., the thickness of the metals and the thickness of the interlayer dielectrics). If the designer keeps the device’s width constant from one technology node to another, driving the same transistor will result in higher power dissipation because the series resistance of the gate has increased. Therefore, despite improving the device’s cutoff frequency, the device’s power gain may not follow the same improvement. To illustrate this point, let us calculate

\(^9\)Most parasitic extraction programs use a T-model for the gate resistance.

Figure 1.18: A simple planar transistor in layout view and its 3D representation

Figure 1.19: Parasitic elements of a transistor.

$f_{\text{max}}$ using a simplified model shown in Fig. 1.20. For this unilateral device, the maximum available power gain is

\[ G_P = \frac{1}{4} G_I^2 \frac{R_{\text{out}}}{R_g} \]  

(1.65)

where \( G_I \approx \frac{f_t}{f} \) is the current gain, \( R_{\text{out}} \) is the device output resistance, and \( R_g \) is the total series resistance before channel capacitance. It follows that,

\[ f_{\text{max}} \approx \frac{f_t}{2} \sqrt{\frac{R_{\text{out}}}{R_g}} \approx \frac{f_t}{2} \sqrt{\frac{g_m R_{\text{out}}}{g_m R_g}} \]  

(1.66)
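Eqs. 1.65 and 1.66 can be checked against each other with a short sketch. The gate resistance below is the metal-gate value from Table 1.3; the transit frequency and output resistance are hypothetical values chosen only to illustrate the trend, and the function names are not from the text.

```python
import math

def unilateral_power_gain(f_hz, f_t_hz, r_out_ohm, r_g_ohm):
    """Eq. 1.65 with the current gain G_I approximated by f_t / f."""
    g_i = f_t_hz / f_hz
    return 0.25 * g_i ** 2 * r_out_ohm / r_g_ohm

def f_max_estimate(f_t_hz, r_out_ohm, r_g_ohm):
    """Eq. 1.66: f_max = (f_t / 2) * sqrt(R_out / R_g)."""
    return 0.5 * f_t_hz * math.sqrt(r_out_ohm / r_g_ohm)

f_t, r_out = 300e9, 2000.0   # assumed device values
r_g = 241.0                  # metal-gate resistance from Table 1.3
f_max = f_max_estimate(f_t, r_out, r_g)
print(round(f_max / 1e9))                                   # about 432 GHz
print(round(unilateral_power_gain(f_max, f_t, r_out, r_g), 6))
```

The power gain evaluates to unity at the estimated $f_{\text{max}}$, and doubling the gate resistance lowers $f_{\text{max}}$ by a factor of $\sqrt{2}$ even if $f_t$ is unchanged.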

If \( R_g \) is dominated by the metal gate, assuming a square-law device where \( g_m = \mu \frac{\varepsilon_{ox}}{t_{ox}} \frac{W}{L} V_{od} \),

\[ g_m R_g = \alpha \mu \rho_{MG} \varepsilon_{ox} \frac{1}{t_{MG} t_{ox}} \left( \frac{W}{L} \right)^2 V_{od} \tag{1.67} \]

<table>
<thead>
<tr>
<th>Length</th>
<th>Width</th>
<th>Nominal Current</th>
</tr>
</thead>
<tbody>
<tr>
<td>40nm</td>
<td>370nm</td>
<td>200µA/µm</td>
</tr>
</tbody>
</table>

| Parameter | Value | Note |
|-----------|-------|------|
| $R_{POC}$ | 100Ω | Per “Poly Contact” |
| $R_{tip}$ | 117Ω | Minimum size allowed by DRC |
| $R_g$     | 241Ω | For 370nm width, modeled as T |
| $R_{ODC}$ | 100Ω | Per “OD Contact” |
| $R_S$     | 200Ω | Weakly depends on bias condition (200 ~ 350Ω) |
| $R_{NQS}$ | 400Ω | Strongly depends on bias condition ($\approx \frac{1}{g_m}$) |
| $C_{SD}$  | 23.3aF | For 370nm width |
| $C_{ov}$  | 6.6aF | For 370nm width |
| $C_{GC}$  | 9.2aF | For 370nm width |
| $C_{gs}$  | 80aF  | For 370nm width |
| $C_{gdo}$ | 80aF  | For 370nm width |
| $C_{gb}$  | 30aF  | Weakly depends on bias condition (15 ~ 35aF) |
| $C_{sb}$  | 85aF  | Weakly depends on bias condition (80 ~ 120aF) |
| $C_{db}$  | 85aF  | Weakly depends on bias condition (80 ~ 120aF) |
| $C_{ch}$  | 90aF  | Strongly depends on bias condition (0 ~ 100aF) |

Table 1.3: Model values.

which is inversely proportional to the scaling trend $^{10}$. Moreover, $g_mR_{out}$ is the intrinsic gain of the device, which decreases proportionally to the scaling factor, assuming a first-order approximation for the channel length modulation. Therefore, despite the improvement of $f_t$, $f_{max}$ will not follow the same trend and will remain nearly constant unless the mobility of the majority carriers is increased by channel engineering [18] or the conductivity of the metal gate is increased $^{11}$. The significance of this result is that once the extrinsic parasitic elements limit the performance of the device, scaling offers the designer little to no improvement with respect to the device $f_{max}$.

### 1.4 Challenges

The previous sections have explained the need for high frequency, high data-rate communication links. While previous works have achieved high-speed links above 100GHz (Fig. 1.21), challenges still exist.

$^{10}$Remember that $g_mR_{NQS}$ would have remained constant if $R_g$ were dominated by the intrinsic gate resistance $R_{NQS}$.

$^{11}$While reducing $\varepsilon_{ox}$ seems to be equally effective, it is not desirable because the gate loses its control over the channel.
The first problem is to increase the link distance from tens of centimeters to several meters. As described in the previous section, using a phased array is effective. However, using a large number of elements increases the system’s power consumption. In addition, the spacing between elements becomes smaller at high carrier frequencies, which increases the power dissipation density. Therefore, a good packaging approach capable of cooling the various elements of the system should be considered. Also, the cost of the package and the silicon area should be considered together. For example, in Fig. 1.21a, on-chip antennas were used to implement an array of $2 \times 4$ elements. Keep in mind that if high-directivity antennas were used, a larger portion of the silicon would be occupied by passive antennas. Therefore, it makes sense to place the antennas outside the chip. Unfortunately, the signal transition from the chip to the package becomes a challenge at millimeter-wave frequencies, with potentially high insertion loss if not properly designed.

The rest of this dissertation is organized as follows. The next chapter addresses the tradeoff between noise and gain in any amplifier and introduces the noise measure. Chapter 3 looks in detail at the design of a 140GHz wideband receiver. Chapter 4 and Chapter 5 deal with the chip-to-package and inter-package transition of millimeter-wave signals. Chapter 6 concludes this dissertation.

![Measured constellations at different data rates, scan angles, and modulations](image)

(a) 9Gbps link at 140GHz [19]

(b) 80Gbps link at 115GHz [20]

Figure 1.21: Packaged millimeter-wave radios
Chapter 2

Millimeter-wave LNA Design

### 2.1 Introduction

The concept of “noise measure” was introduced in [21] by Haus and Adler. It becomes crucial when the operating frequency approaches $f_{\text{max}}$ of the active device, where the available gain of the device is severely limited. Suppose that a chain of $Q$ identical amplifiers, each with a limited power gain of $G$, is cascaded as shown in Fig. 2.1 to achieve a high power gain. Due to the noise of the amplifiers, the SNR at the output of the chain is degraded by $D$:

$$D = \frac{S_{\text{in}} / N_{\text{in}}}{S_{\text{out}} / N_{\text{out}}}$$  \hspace{1cm} (2.1)

where $S_{\text{out}}$ and $N_{\text{out}}$ are the powers of the output signal and output noise \(^1\), and $S_{\text{in}}$ and $N_{\text{in}}$ are the powers of the input signal and noise, respectively.

$$S_{\text{out}} = S_{\text{in}} \times G^Q$$  \hspace{1cm} (2.2)

$$N_{\text{out}} = P_{\text{noise}} \left( 1 + G + ... + G^{Q-1} \right)$$

$$= P_{\text{noise}} \frac{G^Q - 1}{G - 1}$$  \hspace{1cm} (2.3)

Note that $P_{\text{noise}}$ is the noise power that the amplifier itself contributes to the output. Assuming that the input signal comes from a passive device in thermal equilibrium, $N_{\text{in}} = kT\Delta f$,

\(^1\)Excluding the contribution of the source noise to the output.
where $k$ is the Boltzmann constant, $T$ is the absolute temperature, and $\Delta f$ is the unit bandwidth, $D$ can be calculated as

$$D = \frac{S_{in} / (kT\Delta f)}{S_{in} \times G^Q / N_{out}}$$

$$= \frac{P_{\text{noise}}}{kT\Delta f} \frac{G^Q - 1}{G^Q (G - 1)}$$

Assuming that $G^Q \gg 1$, by cascading a large number of amplifiers or using a few amplifiers with high gain, we obtain a special case where

$$M = \frac{P_{\text{noise}}}{kT\Delta f (G - 1)}$$

where $M$ represents the noise measure. Since the noise figure is $NF = 1 + \frac{P_{\text{noise}}}{kT\Delta f G}$, $M$ can be written as a function of the noise figure as

$$M = \frac{P_{\text{noise}}}{kT\Delta f G} \frac{1}{1 - \frac{1}{G}}$$

$$= \frac{NF - 1}{1 - \frac{1}{G}}$$

which is more common in the literature. It should be noted that:

- If the total power gain of the cascaded amplifiers is high enough ($G^Q \gg 1$), the noise measure is an indicator of how much the SNR is degraded.
- When the power gain of a single stage is high enough ($G \gg 1$), the noise measure is $M \approx NF - 1$. Thus, as the frequency of the input signal approaches $f_{max}$ of the active device and the available gain drops, the noise measure becomes more critical.
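The two remarks above can be illustrated numerically. The sketch below applies the Friis formula to a chain of identical stages and compares the result with $1 + M$; noise figures are treated as linear factors, the sample values are arbitrary, and the function names are hypothetical.

```python
def noise_measure(noise_factor, gain):
    """M = (NF - 1) / (1 - 1/G), with NF and G as linear quantities."""
    return (noise_factor - 1.0) / (1.0 - 1.0 / gain)

def cascade_noise_factor(noise_factor, gain, stages):
    """Friis formula for a chain of identical stages."""
    total = 1.0
    gain_before = 1.0  # available gain preceding the current stage
    for _ in range(stages):
        total += (noise_factor - 1.0) / gain_before
        gain_before *= gain
    return total

nf, g = 2.0, 2.0  # a 3 dB noise figure with only 3 dB of gain per stage
for q in (1, 2, 5, 20):
    print(q, round(cascade_noise_factor(nf, g, q), 4))
print("1 + M =", 1.0 + noise_measure(nf, g))
```

With little gain per stage, the cascade noise factor converges to $1 + M$ rather than to the single-stage noise figure, which is why $M$ is the relevant metric near $f_{max}$.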

The definition of power gain must be clarified here. Since $kT\Delta f$ is the available noise power of the source, $S_{in}$ must be considered as the (maximum available) power of the source. Therefore, it is reasonable to use the power gain and write the power of the output signal as:

$$S_{out} = S_{in,max} \times \left[ \frac{P_{in,2}}{S_{in,max}} \times \frac{P_{in,3}}{P_{in,2}} \times \cdots \times \frac{P_L}{P_{in,Q}} \right] \tag{2.9}$$

Since each $P_{in,j}$ appears once in the numerator and once in the denominator, they cancel each other. However, instead of canceling them, you can replace $P_{in,j}$ with $P_{out,max,j-1}$ and rewrite the previous equation as:

$$S_{out} = S_{in,max} \times \frac{P_{out,max,1}}{S_{in,max}} \times \cdots \times \frac{P_{out,max,Q}}{P_{out,max,Q-1}} \times \frac{P_L}{P_{out,max,Q}}$$

Now, it is clear that the available power gain ($G_{AP}$) is the better choice when dealing with cascaded identical amplifiers because

$$S_{out} = S_{in} \times G_{AP}^Q \times (1 - |\Gamma|^2) \tag{2.10}$$

$$\begin{aligned}
N_{out} &= P_{noise,max} \left( 1 + G_{AP} + \cdots + G_{AP}^{Q-1} \right) \times (1 - |\Gamma|^2) \\
&= P_{noise,max} \frac{G_{AP}^Q - 1}{G_{AP} - 1} \times (1 - |\Gamma|^2)
\end{aligned} \tag{2.11}$$

where $\Gamma$ is the output reflection coefficient of the last amplifier. Since there is $\frac{S_{out}}{N_{out}}$ in the definition of SNR degradation (Eq. 2.1), $(1 - |\Gamma|^2)$ cancels out. Therefore, in the rest of this chapter, the power gain of an amplifier is defined as its available power gain.

Now suppose that the cascaded amplifiers are not identical, as in Fig. 2.2. In scenario 1, amplifier “A” precedes amplifier “B”, while in scenario 2, amplifier “B” is the first stage. Using the Friis formulas, the noise figure for each scenario can be calculated as

\[
\begin{aligned}
NF_1 &= NF_A + \frac{NF_B - 1}{G_A} \\
NF_2 &= NF_B + \frac{NF_A - 1}{G_B}
\end{aligned}
\]

Comparing the two scenarios for the best noise figure (\(NF_1\) and \(NF_2\)) gives the following:

\[
\begin{aligned}
NF_1 &\leq NF_2 \\
NF_A + \frac{NF_B - 1}{G_A} &\leq NF_B + \frac{NF_A - 1}{G_B} \\
NF_A - \frac{NF_A - 1}{G_B} &\leq NF_B - \frac{NF_B - 1}{G_A} \\
NF_A - 1 - \frac{NF_A - 1}{G_B} &\leq NF_B - 1 - \frac{NF_B - 1}{G_A} \\
(NF_A - 1) \left(1 - \frac{1}{G_B}\right) &\leq (NF_B - 1) \left(1 - \frac{1}{G_A}\right)
\end{aligned}
\]

Assuming that the amplifiers have gain (\(G_A > 1, G_B > 1\)), the previous comparison in terms of noise measure can be written as

\[
\begin{aligned}
NF_1 &\leq NF_2 \\
(NF_A - 1) \left(1 - \frac{1}{G_B}\right) &\leq (NF_B - 1) \left(1 - \frac{1}{G_A}\right) \\
\frac{NF_A - 1}{1 - \frac{1}{G_A}} &\leq \frac{NF_B - 1}{1 - \frac{1}{G_B}} \\
M_A &\leq M_B
\end{aligned}
\]

The key observation is that, to achieve the minimum noise figure, it is better to start the amplification chain with the stage that has the lowest noise measure. The following section proves that the minimum noise measure is an invariant property of the technology. It means that for any amplifier, if the gain of the amplifier increases, the noise figure must increase as a consequence, as shown in Fig. 2.3.
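The ordering rule can be illustrated with two hypothetical stages in which the stage with the lower noise figure is not the one that should come first. The values below are arbitrary linear quantities chosen for illustration, and the function names are not from the text.

```python
def noise_measure(noise_factor, gain):
    """M = (NF - 1) / (1 - 1/G), with NF and G as linear quantities."""
    return (noise_factor - 1.0) / (1.0 - 1.0 / gain)

def friis_two_stage(nf_first, gain_first, nf_second):
    """Noise factor of two cascaded stages (Friis formula)."""
    return nf_first + (nf_second - 1.0) / gain_first

# Hypothetical stages: A has the lower noise figure but very little gain.
nf_a, g_a = 1.5, 2.0
nf_b, g_b = 1.8, 10.0

print("M_A =", noise_measure(nf_a, g_a), " M_B =", noise_measure(nf_b, g_b))
print("A first:", friis_two_stage(nf_a, g_a, nf_b))
print("B first:", friis_two_stage(nf_b, g_b, nf_a))
```

Here stage B has the higher noise figure but the lower noise measure, and placing it first yields the lower overall noise figure, in agreement with the inequality $M_A \leq M_B$ derived above.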

### 2.2 Derivation of the Noise Measure

The minimum noise measure is calculated in [21], where the choice of circuit representation has led to unnecessarily complicated mathematical equations that are difficult to grasp intuitively. Here we provide a different circuit model (Fig. 2.4) in which all blocks are represented by their Y-parameter matrices ($Y_{W \times W}$), which are connected to $W$ voltage nodes (represented by $V_{W \times 1}$) shared among them. The subscripts $S$, $A$, $P$, and $L$ represent the source, core amplifier, peripheral embedding network, and load, respectively. Without loss of generality, the source and load ports are assumed to be connected between one of the $W$ voltage nodes and the ground. The selection of the node for each port can be made using $u_{W \times 1}$ vectors. For example, if the source port is connected to the first node

$$u_S = \begin{bmatrix} 1 \\ 0 \\ 0 \\ ... \\ 0 \end{bmatrix}_{W \times 1} \quad (2.16)$$

In this case, the internal voltage source can be represented as

$$V_S = v_S u_S \quad (2.17)$$
where \( v_S \) is the physical internal voltage source. Similarly, \( Y_S \) can be defined as

\[
Y_S = y_S u_S u_S^H
\]  

(2.18)

where \( y_S \) is the physical source admittance, and \((.)^H\) represents the Hermitian transpose. Similarly, \( Y_L \) can be defined as

\[
Y_L = y_L u_L u_L^H
\]  

(2.19)

Finally, the internal noise sources of the core amplifier are represented by \( W \) number of series noise voltages \((V_N)\) at each port of the amplifier. Writing the KCL equation for Fig. 2.4, we obtain

\[
Y_S (V - V_S) + Y_L V + Y_P V + Y_A (V - V_N) = 0
\]  

(2.20)

where \( V_{W \times 1} \) represents the voltage at each of the \( W \) nodes and can be calculated as

\[
V = [Y_S + Y_L + Y_P + Y_A]^{-1} [Y_S V_S + Y_A V_N]
\]  

(2.21)

To further simplify these equations, the effective \( Y_E \) matrix and the effective \( I_E \) matrix are defined as

\[
Y_E = Y_S + Y_L + Y_P + Y_A
\]  

(2.22)

\[
I_E = Y_S V_S + Y_A V_N
\]  

(2.23)

and thus \( V = Y_E^{-1} I_E \).

Before calculating the noise measure, we should clarify how to calculate the available power from the matrices defined earlier. If Fig. 2.5 represents the Thevenin equivalent circuit, then the available power of the load is

\[
P_{O,max} = \frac{1}{4} \frac{|v_O|^2}{\text{Re}\{z_O\}}
\]

\[
= \frac{1}{4} \frac{v_O v_O^*}{\frac{1}{2} (z_O + z_O^*)}
\]

\[
= \frac{1}{2} \frac{v_O v_O^H}{z_O + z_O^H}
\]  

(2.24)
where \((.)^*\) stands for the complex conjugate, corresponding to the Hermitian transpose operator when applied to a scalar number. The same equation can be used to find the available power of the source:

\[
P_{S,\text{max}} = \frac{1}{2} \frac{v_S v_S^H}{z_S + z_S^H} = \frac{1}{2} \frac{v_S v_S^H}{\frac{1}{y_S} + \frac{1}{y_S^H}} = \frac{1}{2} \frac{y_S v_S \times v_S^H y_S^H}{y_S + y_S^H}
\]

(2.25)

To apply the above equations, the Thevenin open-circuit voltage \(v_O\) and the output impedance \(z_O\) must be determined. To calculate \(v_O\), the output port is left open and the output voltage is calculated\(^2\). It can be written as:

\[
v_O = u_L^H Y_{E,OC}^{-1} I_E
\]

(2.26)

where \(Y_{E,OC}\) is defined as the effective y-parameter of the network when the output is open:

\[
Y_{E,OC} = Y_S + Y_P + Y_A
\]

(2.27)

To calculate the output impedance of the amplifier \(z_O\), you should calculate the output voltage response to a current test source at the output while all other independent sources are off. In matrix form, the following equation should be solved to find the voltage vector \(V\):

\[
Y_{E,OC} V = i_L u_L
\]

(2.28)

where \(i_L\) is the test current at the output port. Once \(V\) is calculated, the output impedance can be calculated by dividing the output voltage by the test current source. In matrix form:

\[
V = Y_{E,OC}^{-1} i_L u_L
\]

(2.29)

and therefore

\[
z_O = \frac{1}{i_L} u_L^H Y_{E,OC}^{-1} i_L u_L = u_L^H Y_{E,OC}^{-1} u_L
\]

(2.30)
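As a sanity check of Eq. 2.26 and Eq. 2.30, the following minimal sketch evaluates them for a hypothetical two-node network: a source with admittance `y_s` at node 0, a single series admittance `y_f` between the nodes acting as the peripheral network, and no active device (\(Y_A = 0\)). The function names and values are illustrative assumptions. For this network the expected results are \(v_O = v_S\) and \(z_O = 1/y_S + 1/y_f\).

```python
from typing import List, Tuple

Matrix2 = List[List[complex]]


def inv2(m: Matrix2) -> Matrix2:
    """Inverse of a 2x2 complex matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def thevenin_at_load(y_s: complex, y_f: complex, v_s: complex) -> Tuple[complex, complex]:
    """Return (v_O, z_O) for the hypothetical two-node network described above."""
    # Node 0 is the source node (u_S = [1, 0]^T); node 1 is the load node (u_L = [0, 1]^T).
    y_eoc = [[y_s + y_f, -y_f],
             [-y_f, y_f]]                       # Y_S + Y_P with the output left open
    i_e = [y_s * v_s, 0j]                       # I_E = Y_S V_S (no noise sources here)
    z = inv2(y_eoc)
    v_o = z[1][0] * i_e[0] + z[1][1] * i_e[1]   # u_L^H Y_EOC^{-1} I_E  (Eq. 2.26)
    z_o = z[1][1]                               # u_L^H Y_EOC^{-1} u_L  (Eq. 2.30)
    return v_o, z_o


def available_power(v: complex, z: complex) -> float:
    """Available power of a Thevenin source, Eq. 2.24: (1/2) |v|^2 / (z + z^*)."""
    return 0.5 * abs(v) ** 2 / (2.0 * z.real)
```

With a purely reactive `y_f`, the available power at the output equals the available power of the source, as expected for a loss-less network.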

\(^2\)After calculating the voltage vector \(V\), it should be multiplied by \(u_L^H\) to extract the (scalar) output voltage from it.

Using Eq. 2.24, the available output power can be written as

\[
P_{O,max} = \frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} I_E \times \left( u_L^H Y_{E,OC}^{-1} I_E \right)^H}{u_L^H Y_{E,OC}^{-1} u_L + \left( u_L^H Y_{E,OC}^{-1} u_L \right)^H}
\]

\[
= \frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} I_E \times I_E^H Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} u_L + u_L^H Y_{E,OC}^{-1H} u_L}
\]

\[
= \frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} \left( I_E I_E^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H \left( Y_{E,OC}^{-1} + Y_{E,OC}^{-1H} \right) u_L}
\]

\[
= \frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} \left( I_E I_E^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}
\]

(2.31)

To calculate the noise measure, noise and gain must be calculated from Eq. 2.6. The available power gain can be calculated by substituting \(I_E = Y_S V_S\) into Eq. 2.31 and normalizing with the available input power from Eq. 2.25.

\[
G = \frac{\frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} \left( Y_S V_S V_S^H Y_S^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}}{\frac{1}{2} \frac{y_S v_S \times v_S^H y_S^H}{y_S + y_S^H}}
\]

\[
= \frac{u_L^H Y_{E,OC}^{-1} \left( Y_S V_S V_S^H Y_S^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L} \times \frac{y_S + y_S^H}{y_S v_S \times v_S^H y_S^H}
\]

(2.32)

Note that all components in the numerator and denominator of the second fraction are scalar and can be freely shifted in the multiplication chain. Using Eq. 2.17 and Eq. 2.18, the available gain can be written as in Eq. 2.33.

\[
G = \frac{u_L^H Y_{E,OC}^{-1} \left( \frac{y_S + y_S^H}{y_S v_S \times v_S^H y_S^H} Y_S V_S V_S^H Y_S^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}
\]

\[
= \frac{u_L^H Y_{E,OC}^{-1} \left( \left( y_S + y_S^H \right) u_S u_S^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}
\]

\[
= \frac{u_L^H Y_{E,OC}^{-1} \left( Y_S + Y_S^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}
\]

(2.33)

Similarly, the noise power can be calculated by substituting \(I_E = Y_A V_N\) into Eq. 2.31.

\[
P_{\text{noise}} = \frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} \left( Y_A V_N V_N^H Y_A^H \right) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} \left( Y_{E,OC} + Y_{E,OC}^H \right) Y_{E,OC}^{-1H} u_L}
\]

(2.34)
Using Eq. 2.6, the noise measure can be written as in Eq. 2.35.

\[
M = \frac{1}{kT\Delta f} \frac{\frac{1}{2} \frac{u_L^H Y_{E,OC}^{-1} (Y_A V_N V_N^H Y_A^H) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} (Y_{E,OC} + Y_{E,OC}^H) Y_{E,OC}^{-1H} u_L}}{\frac{u_L^H Y_{E,OC}^{-1} (Y_S + Y_S^H) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} (Y_{E,OC} + Y_{E,OC}^H) Y_{E,OC}^{-1H} u_L} - 1}
\]

\[
= \frac{1}{2kT\Delta f} \frac{u_L^H Y_{E,OC}^{-1} (Y_A V_N V_N^H Y_A^H) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} (Y_S + Y_S^H) Y_{E,OC}^{-1H} u_L - u_L^H Y_{E,OC}^{-1} (Y_{E,OC} + Y_{E,OC}^H) Y_{E,OC}^{-1H} u_L}
\]

\[
= -\frac{1}{2kT\Delta f} \frac{u_L^H Y_{E,OC}^{-1} (Y_A V_N V_N^H Y_A^H) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} (Y_P + Y_P^H + Y_A + Y_A^H) Y_{E,OC}^{-1H} u_L}
\]

(2.35)

To gain intuition into the previous equation, we assume that the peripheral network neither absorbs nor generates energy. For example, the peripheral network may consist of passive, loss-less components\(^3\). In this case, the total active power absorbed by this block should be zero:

\[
P_{\text{loss-less}} = 0
\]

\[
= \text{Re}\{V^H \times Y_P V\}
\]

\[
= \frac{1}{2} \left[ V^H \times Y_P V + (V^H \times Y_P V)^H \right]
\]

\[
= \frac{1}{2} \left[ V^H \times Y_P V + V^H Y_P^H \times V \right]
\]

\[
= \frac{1}{2} V^H \left[ Y_P + Y_P^H \right] V
\]

Since this equation must hold for all possible \(V\) vectors, it can be concluded that

\[
Y_P + Y_P^H \bigg|_{Y_P: \text{loss-less}} = 0
\]

(2.37)

Therefore, under the assumption of a passive loss-less peripheral network, Eq. 2.35 can be simplified to

\[
M = -\frac{1}{2kT\Delta f} \frac{u_L^H Y_{E,OC}^{-1} (Y_A V_N V_N^H Y_A^H) Y_{E,OC}^{-1H} u_L}{u_L^H Y_{E,OC}^{-1} (Y_A + Y_A^H) Y_{E,OC}^{-1H} u_L}
\]

(2.38)

The minimum noise measure is desired here since it sets the lower bound of the SNR in a low-noise amplification chain. To simplify the calculations, new symbols are defined as follows:

\[ x_{W \times 1} = Y_{E,OC}^{-1H} u_L \]

(2.39)

\[ A_{W \times W} = Y_A V_N V_N^H Y_A^H \]

(2.40)

\[ B_{W \times W} = -2kT\Delta f (Y_A + Y_A^H) \]

(2.41)

\(^3\)Note that network reciprocity is not used here and the only assumption is that the power flow is zero.

Note that while \( A \) and \( B \) are fixed by the available active device, the vector \( x \) can be modified by the proper choice of load port and peripheral network. Therefore, the noise measure \( M_x \) is only a function of the vector \( x \)

\[ M_x = \frac{x^H A x}{x^H B x} \]  
(2.42)

Therefore, \( M_x \) should be minimized under the constraint \( g(x) = x^H A x - M_x x^H B x = 0 \) or

\[ g(x) = x^H (A - M_x B) x = 0 \]  
(2.43)

Since \( g(x) = 0 \) is constant, its derivative with respect to the real and imaginary parts of each component \( x_j \) of the vector \( x \) should be zero. With respect to the real parts of each component \( x_{j, re} \)

\[ \frac{\partial g(x)}{\partial x_{j, re}} = 0 \]  
\[ = \left( \frac{\partial x}{\partial x_{j, re}} \right)^H (A - M_x B) x \]  
\[ + x^H \left( -\frac{\partial M_x}{\partial x_{j, re}} B \right) x \]  
\[ + x^H (A - M_x B) \frac{\partial x}{\partial x_{j, re}} \]  
(2.44)

and with respect to the imaginary parts of each component \( x_{j, im} \)

\[ \frac{\partial g(x)}{\partial x_{j, im}} = 0 \]  
\[ = \left( \frac{\partial x}{\partial x_{j, im}} \right)^H (A - M_x B) x \]  
\[ + x^H \left( -\frac{\partial M_x}{\partial x_{j, im}} B \right) x \]  
\[ + x^H (A - M_x B) \frac{\partial x}{\partial x_{j, im}} \]  
(2.45)

\(^4\)\( A \) and \( B \) are both Hermitian. Moreover, \( A \) is also positive (semi)-definite.

\(^5\)\( x \) is used to emphasize that \( M \) is a function of \( x \).

For the optimal noise measure \( \lambda \), \( \frac{\partial M_x}{\partial x_{j, re}} = \frac{\partial M_x}{\partial x_{j, im}} = 0 \). Moreover, \( \frac{\partial x}{\partial x_{j, im}} = i \frac{\partial x}{\partial x_{j, re}} \). To satisfy the previous equations,

\[(A - \lambda B)x = B(B^{-1}A - \lambda I)x = 0 \quad (2.46)\]

In other words, all local optima of the noise measure \( \lambda \) are eigenvalues of the characteristic noise matrix \( N \), defined as

\[N = B^{-1}A = \frac{-1}{2kT\Delta f} (Y_A + Y_A^H)^{-1} Y_A V_N V_N^H Y_A^H \quad (2.47)\]

and the minimum noise measure is equal to the minimum eigenvalue \( \lambda_{\text{min}} \) of this matrix. Since the noise is stochastic, \( V_N V_N^H \) should be replaced by the noise correlation matrix

\[N = B^{-1}A = \frac{-1}{2kT\Delta f} (Y_A + Y_A^H)^{-1} Y_A \overline{V_N V_N^H} Y_A^H \quad (2.48)\]

\[= \frac{-1}{2kT\Delta f} (Y_A + Y_A^H)^{-1} \overline{I_N I_N^H} \quad (2.49)\]

where \( I_N = Y_A V_N \). Similar results are obtained in [22]. The important conclusion is that using a passive loss-less embedding network does not change the minimum achievable noise measure. To achieve the minimum noise measure, circuits must be designed such that \( x \) is the (\( \alpha \) scaled) eigenvector \( e_{\lambda_{\text{min}}} \) of \( N \) corresponding to the minimum eigenvalue \( \lambda_{\text{min}} \), which implies

\[Y_{E,OC}^{-1H} u_L = \alpha e_{\lambda_{\text{min}}} \quad (2.50)\]

Assuming that \( u_L \) and \( u_S \) are real vectors, the previous equation can be simplified as

\[\beta u_L = Y_{E,OC}^T e_{\lambda_{\text{min}}}^* = (Y_S + Y_P + Y_A)^T e_{\lambda_{\text{min}}}^* \quad (2.51)\]

where \( \beta = \frac{1}{\alpha} \) is used. Therefore, the peripheral network and the source impedance must be designed such that

\[y_S u_S u_S^H e_{\lambda_{\text{min}}}^* = \beta u_L - (Y_P + Y_A)^T e_{\lambda_{\text{min}}}^* \quad (2.52)\]

Since \( y_S \) and \( \beta \) are two independent complex variables, the above equation can always be satisfied in a two-port network, resulting in the minimum noise measure. Thus, the minimum noise measure is just a function of the inherent characteristics of the active device.

\(^6\)Alternatively, the Cauchy-Riemann equations can be used.

### 2.3 Examples

**CMOS Noise Measure**

To calculate the minimum noise measure, a CMOS transistor is modeled as shown in Fig. 2.6. \( Y_A \) can be written as

\[
Y_A = \begin{bmatrix}
\frac{(C_\pi + C_\mu)S}{R_g(C_\pi + C_\mu)S + 1} & -\frac{C_\mu S}{R_g(C_\pi + C_\mu)S + 1} \\
\frac{g_m - C_\mu S}{R_g(C_\pi + C_\mu)S + 1} & g_{ds} + C_\mu S + \frac{R_g C_\mu S (g_m - C_\mu S)}{R_g(C_\pi + C_\mu)S + 1}
\end{bmatrix}
\]

(2.53)

which satisfies the KCL equation

\[
\begin{bmatrix}
i_G \\
i_D
\end{bmatrix} = Y_A 
\begin{bmatrix}
v_G \\
v_D
\end{bmatrix}
\]

(2.54)

The effective current noise source can be written as

\[
I_N = \begin{bmatrix}
0 \\
i_{n,d}
\end{bmatrix} + Y_A \begin{bmatrix}
v_{n,g} \\
0
\end{bmatrix}
\]

(2.55)

Note that two noise sources shown in Fig. 2.6 are independent and have a power spectral density of

\[
\overline{i_{n,d}^2} = 4kT\gamma g_m \Delta f
\]

(2.56)

\[
\overline{v_{n,g}^2} = 4kTR_g \Delta f
\]

(2.57)

where the channel-induced gate current noise is ignored [23]. Assuming no correlation between the noise sources, \( \langle i_{n,d}, v_{n,g} \rangle = 0 \),

\[
I_N I_N^H = \begin{bmatrix}
0 & 0 \\
0 & \overline{i_{n,d}^2}
\end{bmatrix} + Y_A \begin{bmatrix}
\overline{v_{n,g}^2} & 0 \\
0 & 0
\end{bmatrix} Y_A^H
\]

(2.58)

\(^7\)It should be noted that, without loss of generality, all purely imaginary parasitic elements at the gate/source/drain nodes can be considered as part of the loss-less peripheral network.
Consequently, the characteristic noise matrix can be calculated using Eq. 2.47. This matrix has two eigenvalues:

\[
\lambda = \frac{1}{1 - \frac{1}{U}} \left[ 2\kappa + \frac{1}{2U} \pm \sqrt{4\kappa + \left( 2\kappa - \frac{1}{2U} \right)^2} \right]
\]  

(2.59)

where \( U \) is Mason’s unilateral power gain, which can be calculated as [24][25]

\[
U = \frac{1}{4R_g \left( g_{ds} + g_m \frac{C_\mu}{C_\mu + C_\pi} \right)} \left( \frac{\omega_T}{\omega} \right)^2
\]  

(2.60)

and

\[
\kappa = \gamma g_m R_g \left( \frac{\omega}{\omega_T} \right)^2
\]  

(2.61)

\[
\omega_T = \frac{g_m}{C_\mu + C_\pi}
\]  

(2.62)

Since a positive power gain corresponds to a positive noise measure, only the positive eigenvalue is acceptable. Therefore, for a CMOS amplifier, the minimum achievable noise measure is

\[
M_{\text{min}} = \frac{1}{1 - \frac{1}{U}} \left[ 2\kappa + \frac{1}{2U} + \sqrt{4\kappa + \left( 2\kappa - \frac{1}{2U} \right)^2} \right]
\]  

(2.63)

To gain insight into the behavior of the noise measure as a function of frequency, the above equation should be simplified by making reasonable assumptions. First, note that while both \( U \) and \( \kappa \) are frequency-dependent, when

\[
\frac{g_{ds}}{g_m} + \frac{C_\mu}{C_\mu + C_\pi} \ll \gamma
\]  

(2.64)

then it can be concluded that

\[
\frac{1}{2U} \ll 2\kappa
\]  

(2.65)

which is independent of the operating frequency and results in

\[
M_{\text{min}} \approx \frac{1}{1 - \frac{1}{U}} \left[ 2\kappa + \sqrt{4\kappa + 4\kappa^2} \right]
\]  

(2.66)

At low frequencies where \( \kappa < 1 \),

\[
M_{\text{min}} \approx \frac{2\sqrt{\kappa}}{1 - \frac{1}{U}} \left( 1 + \sqrt{\kappa} + \frac{\kappa}{2} \right)
\]  

(2.67)
and it can be observed that the minimum noise measure increases as a linear function of frequency

$$M_{\text{min}}|_{\omega \ll \omega_T} \approx \sqrt{4\gamma g_m R_g} \, \frac{\omega}{\omega_T}$$

(2.68)

which shows a similar trend as the minimum noise figure of the transistor. However, as the operating frequency approaches $f_{\text{max}}$ of the device, where $U = 1$, the noise measure approaches infinity. Fig. 2.7 compares the simulation of a commercial 28nm CMOS PDK (post-layout extraction)\(^8\) with the results calculated from Eq. 2.63 using the parameters from Table 2.1\(^9\). Even at the frequency of $f = \frac{f_{\text{max}}}{2}$, the error of Eq. 2.68 is less than 50%. As a rule of thumb, this frequency should be used to evaluate the feasibility of implementing low-noise amplifiers for any technology.
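To illustrate the trend numerically, the following minimal sketch evaluates Eq. 2.60 to Eq. 2.63 and the low-frequency limit $2\sqrt{\kappa}$ of Eq. 2.68. The device values in `HYPOTHETICAL_DEVICE` are assumptions chosen only for illustration; they are neither the values of Table 2.1 nor extracted from any PDK.

```python
import math


def cmos_min_noise_measure(freq_hz: float, gm: float, rg: float, gds: float,
                           c_pi: float, c_mu: float, gamma: float) -> float:
    """Minimum noise measure of the CMOS model, Eq. 2.63 (SI units)."""
    omega = 2.0 * math.pi * freq_hz
    omega_t = gm / (c_mu + c_pi)                                   # Eq. 2.62
    ratio_sq = (omega / omega_t) ** 2
    kappa = gamma * gm * rg * ratio_sq                             # Eq. 2.61
    u = 1.0 / (4.0 * rg * (gds + gm * c_mu / (c_mu + c_pi)) * ratio_sq)  # Eq. 2.60
    if u <= 1.0:
        raise ValueError("U <= 1: at or above f_max the noise measure is not finite")
    bracket = (2.0 * kappa + 1.0 / (2.0 * u)
               + math.sqrt(4.0 * kappa + (2.0 * kappa - 1.0 / (2.0 * u)) ** 2))
    return bracket / (1.0 - 1.0 / u)


def cmos_min_noise_measure_low_freq(freq_hz: float, gm: float, rg: float,
                                    c_pi: float, c_mu: float, gamma: float) -> float:
    """Low-frequency approximation 2*sqrt(kappa), linear in frequency (Eq. 2.68)."""
    omega_t = gm / (c_mu + c_pi)
    return math.sqrt(4.0 * gamma * gm * rg) * 2.0 * math.pi * freq_hz / omega_t


# Hypothetical device parameters (illustrative assumptions only).
HYPOTHETICAL_DEVICE = dict(gm=50e-3, rg=10.0, gds=5e-3,
                           c_pi=30e-15, c_mu=10e-15, gamma=1.0)

if __name__ == "__main__":
    for f in (10e9, 50e9, 100e9):
        print(f, cmos_min_noise_measure(f, **HYPOTHETICAL_DEVICE))
```

For these assumed values the two expressions agree closely well below $f_T$, and the exact expression grows faster as $U$ approaches 1.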

![Figure 2.7: Noise measure vs. frequency](image)

---

\(^8\) The simulation method is explained in the appendix.

\(^9\) Note that the parameters from Table 2.1 are only used to show the trends based on the calculations and are not extracted from the PDK.

Table 2.1: Parameter values used for calculations

**Multiple Active Devices**

Since a single device has a minimum noise measure, one might think that using multiple active devices would improve performance. For example, in a noise-canceling LNA, an auxiliary amplifier helps reduce the main amplifier’s noise. In this section, we present a simple case with two amplifiers to show that the minimum noise measure of a single device also dictates the minimum noise measure of any combination of multiple devices. Assuming that the noise (\(I_{N1}, I_{N2}\)) of two amplifiers (\(Y_{A1}, Y_{A2}\)) is uncorrelated, the characteristic noise matrix can be written as in Eq. 2.69

\[
\begin{aligned}
N &= \frac{-1}{2kT\Delta f} \left( Y_A + Y_A^H \right)^{-1} I_N I_N^H \\
&= \frac{-1}{2kT\Delta f} \left[ \begin{array}{cc} Y_{A1} + Y_{A1}^H & 0 \\ 0 & Y_{A2} + Y_{A2}^H \end{array} \right]^{-1} \left[ \begin{array}{cc} I_{N1}I_{N1}^H & 0 \\ 0 & I_{N2}I_{N2}^H \end{array} \right] \\
&= \frac{-1}{2kT\Delta f} \left[ \begin{array}{cc} \left( Y_{A1} + Y_{A1}^H \right)^{-1} & 0 \\ 0 & \left( Y_{A2} + Y_{A2}^H \right)^{-1} \end{array} \right] \left[ \begin{array}{cc} I_{N1}I_{N1}^H & 0 \\ 0 & I_{N2}I_{N2}^H \end{array} \right] \\
&= \frac{-1}{2kT\Delta f} \left[ \begin{array}{cc} \left( Y_{A1} + Y_{A1}^H \right)^{-1} I_{N1}I_{N1}^H & 0 \\ 0 & \left( Y_{A2} + Y_{A2}^H \right)^{-1} I_{N2}I_{N2}^H \end{array} \right]
\end{aligned}
\tag{2.69}
\]

Since the eigenvalues of a block-diagonal matrix are the union of the eigenvalues of its diagonal sub-matrices, the minimum noise measure of the combination is equal to the smaller of the minimum noise measures of the two amplifiers. Therefore, noise-canceling topologies cannot improve the minimum noise measure.

### 2.4 Design of Low-Noise CS Amplifiers with Single Feedback Component

In this part, we design a simple common-source stage for the minimum noise measure. As mentioned earlier, it is always possible to achieve the minimum noise measure in a two-port network. However, the correct value of the source impedance may be far from the matching condition. Here, a passive, reactive feedback component from the drain to the gate of the transistor is used to improve the input reflection of the LNA, as shown in Fig. 2.8. The following procedure is used in the simulation:
1. \( Y_A \) and \( \mathbf{e}_{\lambda_{\min}} \) are extracted to be used in Eq. 2.52.

2. Given the admittance \( y_f \) of the feedback component,

\[
Y_P = \begin{bmatrix} y_f & -y_f \\ -y_f & y_f \end{bmatrix}
\]

(2.70)

the optimal source impedance can be calculated from Eq. 2.52 as

\[
y_S e_1^* = -(y_f + y_{A_{11}}) e_1^* - (-y_f + y_{A_{21}}) e_2^*
\]

(2.71)

where \( \mathbf{e}_{\lambda_{\min}} = \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} \). Therefore, the minimum noise measure is guaranteed when

\[
y_{S_{Opt}} = -(y_f + y_{A_{11}}) + (y_f - y_{A_{21}}) \frac{e_2^*}{e_1^*}
\]

(2.72)

3. While sweeping the feedback admittance, the maximum available power gain (\( G_{\text{max}} \)) is measured and compared to the available power gain (\( G_a \)) for the source impedance calculated for the minimum noise measure (\( y_{S_{Opt}} \)). The admittance which corresponds to the smallest difference between two gain metrics (\( G_{\text{max}} \) and \( G_a \)) is the optimal feedback admittance for a low-noise CS stage to satisfy the minimum noise measure of the technology while minimizing input reflection.

4. When multiple stages are cascaded, the output matching network should be designed so that the effective source impedance seen by each stage is equal to the optimal source impedance (\( y_{S_{Opt}} \)). Otherwise, the output matching network should be designed to achieve a conjugate match at the output when a single stage is used. The key observation here is that the minimum noise figure is guaranteed as long as the source impedance of \( y_{S_{Opt}} \) is provided to each stage.
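Step 2 of the procedure above can be sketched in a few lines. This is a minimal illustration of Eq. 2.72, assuming that the device Y-parameters `y_a` and the eigenvector `e` for the minimum eigenvalue are already available; the function names and the numbers used to exercise them are hypothetical. The helper `first_row_residual` evaluates the first row of \((Y_S + Y_P + Y_A)^T e^*\), which Eq. 2.52 requires to vanish.

```python
from typing import Sequence


def optimal_source_admittance(y_a: Sequence[Sequence[complex]],
                              e: Sequence[complex],
                              y_f: complex) -> complex:
    """Source admittance of Eq. 2.72 for a feedback admittance y_f (drain to gate)."""
    ya11, ya21 = y_a[0][0], y_a[1][0]
    ratio = complex(e[1]).conjugate() / complex(e[0]).conjugate()   # e_2^* / e_1^*
    return -(y_f + ya11) + (y_f - ya21) * ratio


def first_row_residual(y_s: complex,
                       y_a: Sequence[Sequence[complex]],
                       e: Sequence[complex],
                       y_f: complex) -> complex:
    """First row of (Y_S + Y_P + Y_A)^T e^*, which should be zero at y_s = y_S,Opt."""
    e1c = complex(e[0]).conjugate()
    e2c = complex(e[1]).conjugate()
    return (y_s + y_f + y_a[0][0]) * e1c + (-y_f + y_a[1][0]) * e2c


if __name__ == "__main__":
    # Hypothetical numbers, only to exercise the functions.
    y_a = [[0.002 + 0.010j, -0.003j], [0.040 - 0.003j, 0.005 + 0.004j]]
    e = [0.3 + 0.2j, 1.0 + 0.0j]
    y_f = 0.002j
    y_opt = optimal_source_admittance(y_a, e, y_f)
    print(y_opt, abs(first_row_residual(y_opt, y_a, e, y_f)))
```

Note that the sketch does not check that the resulting admittance has a positive real part; in a sweep over `y_f`, such non-physical points would have to be discarded.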

Fig. 2.9 shows the optimum feedback admittance and various power gain factors at 190GHz for a 28nm CMOS transistor after parasitic extraction. It should be noted that a neutralized device does not necessarily have the minimum noise measure. Moreover, designers can meet the minimum noise measure of technology while achieving a power gain much higher than the Mason’s unilateral power gain [26].

### 2.5 Design of Low-Noise CS Amplifiers with General Peripheral Network

As we have already shown, a feedback component at the gate-drain ports can only improve the input reflection to a limited extent while achieving the minimum noise measure. In this section, we consider general passive, reactive, and reciprocal peripheral networks, as shown in Fig. 2.10, to investigate whether the minimum noise measure can be achieved with a simultaneous conjugate match at the input and output. Starting from a general double-side tuning (Fig. 2.10b)

\[ y_S u_S u_S^H e'^{*}_{\lambda_{\text{min}}} = \beta u_L - (Y'_P + Y'_A)^T e'^{*}_{\lambda_{\text{min}}} \]  

(2.73)

where

\[ Y'_A = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & y_{A_{11}} & y_{A_{12}} & 0 \\ 0 & y_{A_{21}} & y_{A_{22}} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]  

(2.74)

\[ e'_{\lambda_{\text{min}}} = \begin{bmatrix} 0 \\ e_1 \\ e_2 \\ 0 \end{bmatrix} \] (2.75)

\[ Y'_{P} = \begin{bmatrix} y_{P_{11}} & y_{P_{12}} & y_{P_{13}} & y_{P_{14}} \\ y_{P_{12}} & y_{P_{22}} & y_{P_{23}} & y_{P_{24}} \\ y_{P_{13}} & y_{P_{23}} & y_{P_{33}} & y_{P_{34}} \\ y_{P_{14}} & y_{P_{24}} & y_{P_{34}} & y_{P_{44}} \end{bmatrix} \] (2.76)

where an order of source, gate, drain, and load is used for the port indices. Note that under the assumption that all components of \( Y'_P \) are finite (\( y \neq \infty \)), the minimum noise measure can be obtained when

\[ \begin{bmatrix} 0 \\ 0 \\ 0 \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 & y_{P_{12}} & y_{P_{13}} & 0 \\ 0 & y_{A_{11}} + y_{P_{22}} & y_{A_{21}} + y_{P_{23}} & 0 \\ 0 & y_{A_{12}} + y_{P_{23}} & y_{A_{22}} + y_{P_{33}} & 0 \\ 0 & y_{P_{24}} & y_{P_{34}} & 0 \end{bmatrix} \begin{bmatrix} 0 \\ e_1^* \\ e_2^* \\ 0 \end{bmatrix} \] (2.77)

is satisfied. Unfortunately, with three purely imaginary variables \( y_{P_{22}}, y_{P_{23}}, \) and \( y_{P_{33}}, \) the four equations resulting from the real and imaginary parts of the second and third lines cannot be satisfied.

For a source-side tuning (Fig. 2.10a), the following equations are defined:

\[ Y'_{A} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & y_{A_{11}} & y_{A_{12}} \\ 0 & y_{A_{21}} & y_{A_{22}} \end{bmatrix} \] (2.78)

\[ e'_{\lambda_{\text{min}}} = \begin{bmatrix} 0 \\ e_1 \\ e_2 \end{bmatrix} \] (2.79)

\[ Y'_{P} = \begin{bmatrix} y_{P_{11}} & y_{P_{12}} & y_{P_{13}} \\ y_{P_{12}} & y_{P_{22}} & y_{P_{23}} \\ y_{P_{13}} & y_{P_{23}} & y_{P_{33}} \end{bmatrix} \] (2.80)

where an order of source, gate, and load is used for the port indices. The minimum noise measure is reached when

\[ \begin{bmatrix} 0 \\ 0 \\ \beta \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} - \begin{bmatrix} 0 & y_{P_{12}} & y_{P_{13}} \\ 0 & y_{A_{11}} + y_{P_{22}} & y_{A_{21}} + y_{P_{23}} \\ 0 & y_{A_{12}} + y_{P_{23}} & y_{A_{22}} + y_{P_{33}} \end{bmatrix} \begin{bmatrix} 0 \\ e_1^* \\ e_2^* \end{bmatrix} \] (2.81)
is satisfied. While the previous limitation no longer exists, the first line, with two purely imaginary variables $y_{P_{12}}$ and $y_{P_{13}}$, can be satisfied only if $\frac{e_1}{e_2}$ is purely real, a condition that is not usually satisfied.

For load-side tuning (Fig. 2.10c), the following equations are defined:

$$ Y'_A = \begin{bmatrix} y_{A11} & y_{A12} & 0 \\ y_{A21} & y_{A22} & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad (2.82) $$

$$ e'_{\lambda_{\text{min}}} = \begin{bmatrix} e_1 \\ e_2 \\ 0 \end{bmatrix} \quad (2.83) $$

$$ Y'_P = \begin{bmatrix} y_{P11} & y_{P12} & y_{P13} \\ y_{P21} & y_{P22} & y_{P23} \\ y_{P31} & y_{P32} & y_{P33} \end{bmatrix} \quad (2.84) $$

where an order of source, drain, and load is used for the port indices. The minimum noise measure is obtained when

$$ \begin{bmatrix} y_S e_1^* \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \beta \end{bmatrix} - \begin{bmatrix} y_{A11} + y_{P11} & y_{A21} + y_{P12} & 0 \\ y_{A12} + y_{P12} & y_{A22} + y_{P22} & 0 \\ y_{P13} & y_{P23} & y_{P33} \end{bmatrix} \begin{bmatrix} e_1^* \\ e_2^* \\ 0 \end{bmatrix} \quad (2.85) $$

The second line of this equation forces $y_{P12}$ and $y_{P22}$ such that

$$(y_{A12} + y_{P12}) e_1^* + (y_{A22} + y_{P22}) e_2^* = 0 \quad (2.86)$$

Once solved, the optimal source impedance is

$$ y_{S,\text{Opt}} = -(y_{A11} + y_{P11}) - (y_{A21} + y_{P12}) \frac{e_2^*}{e_1^*} \quad (2.87) $$

Note that for each value of $y_{P13}$ and $y_{P33}$, there is a $\beta$ that satisfies the minimum noise measure. Therefore, these two $y$-parameters can be optimized such that the calculated optimal source impedance also provides the correct input impedance for power matching. Unfortunately, the required $y_{P12}$ and $y_{P22}$ may result in an unstable amplifier, limiting the use of this technique.

### 2.6 Optimal Bias Condition

Considering that the minimum noise figure and the maximum available power gain together play a crucial role in the performance of an LNA, neither of them should be considered alone
to find the optimal bias condition. Moreover, the optimal source impedance for the minimum noise figure and maximum available gain may be different due to the noise correlation at different ports. For this reason, any noise measure proxy such as

$$M_{\text{proxy}} = \frac{NF_{\text{min}} - 1}{1 - \frac{1}{G_{\text{max}}}}$$ (2.88)

will not be correct. Fig. 2.11 shows that the minimum noise measure is obtained at about 100μA/μm current density. Note that in some cases, the BSIM noise models show a wrong trend [27], which is not physically possible. Therefore, the optimal current density should be chosen based on the measured data.

Note that the optimal current density calculated here is lower than most typical LNAs. This is because it assumes that a loss-less matching network can be implemented. However, in typical millimeter-wave amplifiers, matching networks contribute 1 to 3dB to the insertion loss. Note that the noise measure of cascaded amplifiers can be derived as

$$M_C = M_1 + (M_2 - M_1) \frac{G_2 - 1}{G_1 G_2 - 1}$$ (2.89)

By definition, the noise measure of a passive lossy device is equal to $-1$. Therefore, the noise measure of the amplifier with its matching network can be calculated as

$$M_C = M_1 + (M_1 + 1) \frac{1 - IL}{G_1 IL - 1}$$ (2.90)

where $IL$ is the insertion loss of the matching network. Note that as the gain of the amplifier decreases, the denominator $G_1 IL - 1$ shrinks, so the noise measure of the cascade increases even if the noise measure of the active device is the same. Therefore, the optimal current density should be chosen in an iterative process. For most of the low-noise amplifiers implemented in this work, a current density of $\approx 200\mu\text{A}/\mu\text{m}$ is used.
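The effect of the matching-network loss can be explored with a minimal sketch of Eq. 2.89 and Eq. 2.90. The function names and numerical values are illustrative assumptions; $IL$ is entered as a linear power ratio between 0 and 1, and $G_1 IL > 1$ is assumed.

```python
def cascade_noise_measure(m1: float, g1: float, m2: float, g2: float) -> float:
    """Noise measure of two cascaded stages, Eq. 2.89 (gains are linear ratios)."""
    return m1 + (m2 - m1) * (g2 - 1.0) / (g1 * g2 - 1.0)


def noise_measure_with_lossy_match(m1: float, g1: float, il: float) -> float:
    """Amplifier followed by a passive lossy network, Eq. 2.90 (il = linear ratio <= 1)."""
    return m1 + (m1 + 1.0) * (1.0 - il) / (g1 * il - 1.0)


if __name__ == "__main__":
    il_2db = 10.0 ** (-2.0 / 10.0)      # hypothetical 2 dB insertion loss
    for g1_db in (6.0, 8.0, 10.0):      # hypothetical stage gains
        g1 = 10.0 ** (g1_db / 10.0)
        print(g1_db, noise_measure_with_lossy_match(0.5, g1, il_2db))
```

In this sketch, Eq. 2.90 is recovered from Eq. 2.89 by setting $M_2 = -1$ and $G_2 = IL$, and lowering $G_1$ at a fixed $M_1$ raises the cascaded noise measure.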

### 2.A Simulation Flow of Minimum Noise Measure

Since most simulation tools do not directly calculate the noise measure, the following method is used with Spectre.

1. The bias circuit of the device under test is present.

2. Ports with 50Ω internal impedance are added.

3. Noise generation of all ports is enabled, and the noise temperature is explicitly set to the simulation temperature.

4. The S-parameter simulation is set to the correct frequency range.

5. Two sets of output files are generated:

   - Y-parameters: With the data format set to “touchstone”, the parameter type set to “y”, and the noise data set to “no”, the normalized y-parameters of the device under test are calculated and extracted. This file can be read immediately by the CAD tools.

   - Noise Cross-Correlation: With the data format set to “Spectre”, the parameter type set to “y”, and the noise data set to “cy”, the normalized noise cross-correlation matrix of the device under test is calculated and extracted. Since the noise cross-correlation matrix is a Hermitian matrix, only half of the entries are exported: diagonal values with a single real number and off-diagonal values with a pair of real and imaginary numbers. Since this format is not necessarily compatible with CAD tools, the extracted values must be put into a suitable format (e.g., CSV) and then imported.

6. Y-parameter and noise cross-correlation files are imported. Since each of these files is normalized, the normalization factors should be considered.

   - The Y-parameters are normalized to $(50\Omega)^{-1}$, and therefore all entries of the Y-matrix should be multiplied by $(50\Omega)^{-1}$.

   - The noise cross-correlation matrix is normalized to $4kT\Delta f$, where $T$ is the port temperature and not the simulation temperature. Note that the characteristic noise matrix must also be normalized by a factor of $2kT\Delta f$.

7. The following equation can be used to derive the characteristic matrix:

$$\mathbf{N} = -2 \times (\mathbf{Y}_A + \mathbf{Y}_A^H)^{-1} \left( \frac{\mathbf{I}_N\mathbf{I}_N^H}{4kT\Delta f} \right)$$

(2.91)

8. The eigenvalues of the characteristic noise matrix are calculated, and the smallest positive value is taken as the minimum noise measure. If there is no positive eigenvalue, it can be concluded that the power gain is less than 0dB.

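Steps 7 and 8 of the flow above can be sketched for a two-port as follows. This is a minimal illustration, assuming that the Y-parameters have already been de-normalized (in siemens) and that `c_norm` holds the noise correlation matrix divided by $4kT\Delta f$, as in Eq. 2.91; reading the exported files is not shown, and all function names are hypothetical.

```python
import cmath
from typing import List, Optional, Tuple

Matrix2 = List[List[complex]]


def characteristic_noise_matrix(y_a: Matrix2, c_norm: Matrix2) -> Matrix2:
    """Eq. 2.91 for a two-port: N = -2 (Y_A + Y_A^H)^{-1} C, C = I_N I_N^H / (4kT*df)."""
    h = [[y_a[i][j] + complex(y_a[j][i]).conjugate() for j in range(2)]
         for i in range(2)]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    h_inv = [[h[1][1] / det, -h[0][1] / det],
             [-h[1][0] / det, h[0][0] / det]]
    return [[-2.0 * (h_inv[i][0] * c_norm[0][j] + h_inv[i][1] * c_norm[1][j])
             for j in range(2)] for i in range(2)]


def eigenvalues_2x2(m: Matrix2) -> Tuple[complex, complex]:
    """Closed-form eigenvalues of a 2x2 matrix."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0


def minimum_noise_measure(y_a: Matrix2, c_norm: Matrix2,
                          tol: float = 1e-6) -> Optional[float]:
    """Smallest positive eigenvalue of N, or None if no eigenvalue is positive."""
    eigs = eigenvalues_2x2(characteristic_noise_matrix(y_a, c_norm))
    positive = [ev.real for ev in eigs
                if abs(ev.imag) <= tol * max(1.0, abs(ev)) and ev.real > tol]
    return min(positive) if positive else None
```

As a consistency check, a passive resistive network at the reference temperature has a normalized correlation matrix equal to the real part of its Y-matrix, so both eigenvalues equal $-1$ and the function returns `None`, in line with the statement that a passive lossy device has a noise measure of $-1$.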
Chapter 3

140GHz Receiver Design

In this chapter, a wideband receiver at 140GHz is described. Note that the carrier frequency is close to the $f_t$\(^1\), so the receiver chain should be carefully optimized to get the most out of the available technology.

Fig. 3.1 shows the block diagram of the receiver. Each section is carefully examined in the remainder of this chapter.

### 3.1 Low-Loss LC Matching Networks

Before implementing the receiver, it is instructive to study the behavior of the matching network since the insertion loss of the matching network is not negligible at millimeter-wave frequencies. The insertion loss of the matching network in Fig. 3.2 is

$$ IL = \frac{P_L}{P_L + P_M} \tag{3.1} $$

Under the assumption of series matching\(^2\),

$$ IL = \frac{I^2 R_L}{I^2 (R_L + R_M)} = \frac{R_L}{R_L + R_M} \tag{3.2} $$

where $R_L$, $X_L$, $R_M$, and $X_M$ are the resistance and reactance of the load and matching component impedances. Therefore,

$$IL = \frac{R_L}{R_L + \frac{X_M}{Q_M}} \tag{3.3}$$

$$= \frac{R_L}{R_L + \frac{(X_L + X_M) - X_L}{Q_M}} \tag{3.4}$$

$$= \frac{1}{1 + \frac{\frac{X_L + X_M}{R_L} - \frac{X_L}{R_L}}{Q_M}} \tag{3.5}$$

$$= \frac{1}{1 + \frac{\frac{X_L + X_M}{R_L} - Q_L}{Q_M}} \tag{3.6}$$

---

\(^1\) $f_t$ is the unity current-gain frequency.

\(^2\) Without loss of generality, parallel elements exhibit the same behavior.

If the insertion loss were small, it could be easily simplified at this point. However, most on-chip networks have a high loss. The above equation can be written as follows to get the exact formula

\[ IL = \frac{1}{1 + \frac{\frac{X_L + X_M}{R_L + R_M} \frac{R_L + R_M}{R_L} - Q_L}{Q_M}} \]

\[ = \frac{1}{1 + \frac{Q_E \frac{R_L + R_M}{R_L} - Q_L}{Q_M}} \]

\[ = \frac{1}{1 + \frac{\frac{Q_E}{IL} - Q_L}{Q_M}} \]

(3.7)

where \( Q_E \) is the equivalent quality factor of the impedance seen at the end of the matching network (Fig. 3.2). The exact insertion loss can be derived by solving the previous equation as

\[ IL = \frac{Q_M - Q_E}{Q_M - Q_L} \]

(3.8)

\[ = \frac{1 - \frac{Q_E}{Q_M}}{1 - \frac{Q_L}{Q_M}} \]

(3.9)

Note that in a low-loss network, where \( Q_M \gg Q_E \) and \( Q_M \gg Q_L \), the insertion loss can be calculated approximately as

\[ IL \approx 1 - \frac{Q_E - Q_L}{Q_M} \]

(3.10)

\[ \approx \frac{1}{1 + \frac{Q_E - Q_L}{Q_M}} \]

(3.11)
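The exact expression of Eq. 3.8 can be cross-checked against the direct definition of Eq. 3.2. The following minimal sketch does this for a single series matching element; the function name and the component values are illustrative assumptions, with the sign convention $Q = X/R$ used in this section.

```python
from typing import Tuple


def series_match_insertion_loss(r_l: float, x_l: float,
                                x_m: float, q_m: float) -> Tuple[float, float]:
    """Return (IL from Eq. 3.2, IL from Eq. 3.8) for one series matching element.

    q_m = x_m / r_m carries the sign of the reactance (negative for a capacitor).
    """
    r_m = x_m / q_m                          # series loss resistance of the element
    q_l = x_l / r_l                          # quality factor of the load
    q_e = (x_l + x_m) / (r_l + r_m)          # quality factor seen after the element
    il_direct = r_l / (r_l + r_m)            # Eq. 3.2
    il_from_q = (q_m - q_e) / (q_m - q_l)    # Eq. 3.8
    return il_direct, il_from_q


if __name__ == "__main__":
    # Hypothetical capacitive load resonated out by an inductor with Q = 10.
    print(series_match_insertion_loss(50.0, -100.0, 100.0, 10.0))
```

Both values coincide for any series element, which reflects that Eq. 3.8 is an exact rearrangement and not a low-loss approximation.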

Note that in the above equations, the quality factors are defined as \( Q_M = \frac{X_M}{R_M} \), which means that the quality factor of a capacitor is negative; and the quality factor of an inductor is positive, as in Fig. 3.3a. From this, we conclude that

\[ Q_L < Q_M \Rightarrow Q_E < Q_M \]

(3.12)

\[ Q_M < Q_L \Rightarrow Q_M < Q_E \]

(3.13)

When using multiple matching components as in Fig. 3.4,

\[ IL = \frac{P_L}{P_L + P_{M1} + P_{M2}} \]

\[ = \frac{P_L}{P_L + P_{M1}} \frac{P_L + P_{M1}}{P_L + P_{M1} + P_{M2}} \]

\[ = IL_1 \times IL_2 \]

\[ = \frac{Q_{M1} - Q_{E1}}{Q_{M1} - Q_L} \frac{Q_{M2} - Q_{E2}}{Q_{M2} - Q_{E1}} \]

(3.14)
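The product form for several elements can be verified in the same way. The sketch below is a minimal illustration restricted to series elements (a simplifying assumption), with hypothetical component values; it accumulates the per-element factor $(Q_M - Q_{\text{after}})/(Q_M - Q_{\text{before}})$ and compares the product with the direct power ratio.

```python
from typing import Iterable, Tuple


def cascade_insertion_loss(r_l: float, x_l: float,
                           elements: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (direct IL, product-of-Q-ratios IL) for series elements.

    elements: (x_m, q_m) pairs, ordered from the load outwards; q_m = x_m / r_m.
    """
    r, x = r_l, x_l
    il_product = 1.0
    for x_m, q_m in elements:
        q_before = x / r              # quality factor seen before adding the element
        r += x_m / q_m                # add the element's series loss resistance
        x += x_m                      # add the element's reactance
        q_after = x / r               # quality factor seen after the element
        il_product *= (q_m - q_after) / (q_m - q_before)
    il_direct = r_l / r               # power in the load over total dissipated power
    return il_direct, il_product


if __name__ == "__main__":
    print(cascade_insertion_loss(50.0, -100.0, [(60.0, 12.0), (-30.0, -40.0)]))
```

When all elements share the same quality factor, the intermediate terms cancel and only the first and last quality factors remain in the product.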

When both matching components have the same quality factor ($Q_{M1} = Q_{M2} = Q_M$), the intermediate term $Q_{E1}$ cancels and

\[ IL = \frac{Q_M - Q_{E2}}{Q_M - Q_L} \]

(3.15)

Figure 3.3: Definition of Q and moving between different Q-contours. (a) The contours of the constant Q. (b) Same insertion loss for the two transitions is expected.

Figure 3.4: Cascade of several elements

Figure 3.5: Circuit model used to obtain the maximum transmission

Let us now design a matching network for an amplifier with the circuit model shown in Fig. 3.5. Assuming that the amplifier is unilateral, the total gain and the transmission factor $T$ of the network are

$$G_{\text{tot}} = G_{\text{max}} \times (1 - |\Gamma|^2) \times IL$$  \hspace{1cm} (3.16)

$$T = (1 - |\Gamma|^2) \times IL$$  \hspace{1cm} (3.17)

A suitable matching network should maximize $T$. In a loss-less system ($IL = 1 = 0\text{dB}$), maximum transmission is achieved when reflection is minimized. However, if we consider a lossy network, the optimum looks different. Note that $1 - |\Gamma|^2$ represents the accepted power normalized to the available power

$$1 - |\Gamma|^2 = \frac{P_S}{P_{S,\text{max}}}$$ \hspace{1cm} (3.18)

$$P_{S,\text{max}} = \frac{V_S^2}{(2R_S)^2} R_S$$ \hspace{1cm} (3.19)

$$P_S = \frac{V_S^2}{|R_S(1 + jQ_S) + R_E(1 + jQ_E)|^2} R_E$$ \hspace{1cm} (3.20)

$$1 - |\Gamma|^2 = \frac{V_S^2 R_E}{|R_S(1 + jQ_S) + R_E(1 + jQ_E)|^2} \cdot \frac{(2R_S)^2}{V_S^2 R_S}$$ \hspace{1cm} (3.21)

$$= \frac{4R_SR_E}{|R_S(1 + jQ_S) + R_E(1 + jQ_E)|^2}$$ \hspace{1cm} (3.22)

$$= \frac{4R_SR_E}{(R_S + R_E)^2 + (R_SQ_S + R_EQ_E)^2}$$ \hspace{1cm} (3.23)

$$T = \frac{4R_SR_E}{(R_S + R_E)^2 + (R_SQ_S + R_EQ_E)^2} \cdot \frac{Q_M - Q_E}{Q_M - Q_L}$$ \hspace{1cm} (3.24)

Let us first consider the case where the loss of the matching network is negligible. In this case, the conjugate matching condition yields maximum transmission when $Q_E = -Q_S$ and $R_S = R_E$. The assumption of a low-loss matching network holds as long as

$$\left| \frac{Q_S}{Q_M} \right| \ll 1, \left| \frac{Q_L}{Q_M} \right| \ll 1 \Rightarrow IL \approx \left( 1 + \frac{Q_S}{Q_M} \right) \left( 1 + \frac{Q_L}{Q_M} \right)$$ \hspace{1cm} (3.25)

Suppose that the quality factor of the source or load is comparable to the magnitude of the quality factor of the components of the matching network. In this case, the conjugate matching does not provide the maximum transmission. To achieve the maximum transmission

$$\frac{\partial T}{\partial R_E} = 0 \Rightarrow R_E = R_S \sqrt{\frac{1 + Q_S^2}{1 + Q_E^2}}$$ \hspace{1cm} (3.26)

\[
\frac{\partial T}{\partial Q_E} \bigg|_{R_E = R_S \sqrt{\frac{1 + Q_S^2}{1 + Q_E^2}}} = 0 \Rightarrow Q_{E,\text{opt}} = -Q_S + \frac{2Q_M(Q_S^2 + 1)}{1 + 2Q_MQ_S - Q_M^2} \tag{3.27}
\]

Figure 3.6: The optimal input quality factor for the network with \(Q_M = 20\).

As shown in Fig. 3.6, the optimal quality factor differs from the conjugate matching condition. Assuming a reasonable passive component with \(|Q_M| > 1\), the optimum transmission and the corresponding input impedance are

\[
T_{\text{opt}} = \frac{Q_M^2 + 1}{(Q_M - Q_S)(Q_M - Q_L)} \tag{3.28}
\]

\[
R_{E,\text{opt}} = R_S - \frac{2(Q_MQ_S + 1)}{Q_M^2 + 1} R_S \tag{3.29}
\]

\[
X_{E,\text{opt}} = -X_S - \frac{2(Q_M - Q_S)}{Q_M^2 + 1} R_S \tag{3.30}
\]
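The closed-form optimum can be cross-checked numerically. The sketch below evaluates the transmission of Eq. (3.24) at the conjugate match and at the optimum of Eqs. (3.27), (3.29) and (3.30), then compares the result with Eq. (3.28). The quality factors (\(Q_S = -2.5\), \(Q_L = -5\), \(Q_M = 10\)) and \(R_S = 2\,\Omega\) are assumed example values matching the impedances of Fig. 3.8; the function names are illustrative.

```python
def transmission(r_s, q_s, r_e, q_e, q_m, q_l):
    """T = (1 - |Gamma|^2) * IL, following Eq. (3.24)."""
    mismatch = 4 * r_s * r_e / ((r_s + r_e) ** 2 + (r_s * q_s + r_e * q_e) ** 2)
    return mismatch * (q_m - q_e) / (q_m - q_l)


def optimum_input(r_s, q_s, q_m):
    """Closed-form optimum (R_E, Q_E) from Eqs. (3.29) and (3.27)."""
    q_e = -q_s + 2 * q_m * (q_s ** 2 + 1) / (1 + 2 * q_m * q_s - q_m ** 2)
    r_e = r_s * (1 - 2 * (q_m * q_s + 1) / (q_m ** 2 + 1))
    return r_e, q_e


# Assumed example values (source 2 - j5 ohm, load Q of -5, inductor Q of 10).
r_s, q_s, q_l, q_m = 2.0, -2.5, -5.0, 10.0

t_conj = transmission(r_s, q_s, r_s, -q_s, q_m, q_l)      # conjugate match
r_opt, q_opt = optimum_input(r_s, q_s, q_m)
t_opt = transmission(r_s, q_s, r_opt, q_opt, q_m, q_l)    # transmission optimum
t_closed = (q_m ** 2 + 1) / ((q_m - q_s) * (q_m - q_l))   # Eq. (3.28)
x_opt = -r_s * q_s - 2 * (q_m - q_s) / (q_m ** 2 + 1) * r_s  # Eq. (3.30)

print(f"conjugate match: T = {t_conj:.4f}")
print(f"optimum: R_E = {r_opt:.4f}, Q_E = {q_opt:.4f}, T = {t_opt:.4f}")
print(f"closed-form T_opt = {t_closed:.4f}")
```

For these assumed values the optimized transmission is a few percent above the conjugate-matched one, and small perturbations of either \(R_E\) or \(Q_E\) around the closed-form optimum reduce \(T\).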

Fig. 3.7a shows the transmission loss in a lossy matching network. Note that the insertion loss for source and load quality factors is not symmetric. This may seem unreasonable and counterintuitive. To understand this problem, consider Fig. 3.8, in which \(Z_L = 1 - j5\,\Omega\) is matched to \(Z_S = 2 - j5\,\Omega\) using a lossy inductor with \(Z_M = 1 + j10\,\Omega\). In this simple schematic, the input impedance on the source side is \(Z_{E,\text{Source}} = 1 + j10 + 1 - j5 = 2 + j5\,\Omega\), which provides a perfect conjugate match on the source side and eliminates any reflections (\(\Gamma_S = 0\)). On the other hand, the output impedance on the load side is \(Z_{E,\text{Load}} = 1 + j10 + 2 - j5 = 3 + j5\,\Omega\). Although the termination on the source side is matched, the load impedance sees an unmatched termination with a reflection of \(|\Gamma_L| = \frac{1}{2} \approx -6\,\text{dB}\). Therefore, the asymmetry of Fig. 3.7a is due to the choice of which port is matched and which port has nonzero reflection.
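The numbers in this example can be reproduced with the power-wave reflection coefficient \(\Gamma = (Z_{in} - Z_{ref}^*)/(Z_{in} + Z_{ref})\). The sketch below is a minimal illustration using the impedances quoted above; the helper name is hypothetical.

```python
import math


def power_wave_gamma(z_in: complex, z_ref: complex) -> complex:
    """Power-wave reflection coefficient of z_in seen from a port of impedance z_ref."""
    return (z_in - z_ref.conjugate()) / (z_in + z_ref)


z_s, z_l, z_m = 2 - 5j, 1 - 5j, 1 + 10j  # source, load, lossy series inductor (ohms)

gamma_s = power_wave_gamma(z_m + z_l, z_s)  # looking into the network from the source
gamma_l = power_wave_gamma(z_m + z_s, z_l)  # looking into the network from the load
gamma_l_db = 20 * math.log10(abs(gamma_l))

# Transmission in each direction: accepted power fraction times insertion loss.
t_forward = (1 - abs(gamma_s) ** 2) * z_l.real / (z_l.real + z_m.real)
t_reverse = (1 - abs(gamma_l) ** 2) * z_s.real / (z_s.real + z_m.real)

print(f"|Gamma_S| = {abs(gamma_s):.3f}, |Gamma_L| = {abs(gamma_l):.3f} ({gamma_l_db:.2f} dB)")
print(f"T source->load = {t_forward:.3f}, T load->source = {t_reverse:.3f}")
```

The two transmissions coincide even though the reflection and the insertion loss are split differently between the two directions, as one would expect for a reciprocal passive network.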
### 3.2 Transformers

Transformers are popular at millimeter-wave frequencies. Let us study their performance and compare them with LC ladder networks. For a lossy transformer,

\[ Z = \begin{bmatrix} R_p + j \omega L_p & j \omega M \\ j \omega M & R_s + j \omega L_s \end{bmatrix} \]

(3.31)
### Table 3.1: Summary of different matching network design methodologies

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Conjugate Matched</th>
<th>Transmission Optimized</th>
</tr>
</thead>
<tbody>
<tr>
<td>Resistance</td>
<td>$R_S$</td>
<td>$R_S - \frac{2(Q_MQ_S+1)}{Q_M^2+1} R_S$</td>
</tr>
<tr>
<td>Reactance</td>
<td>$-X_S$</td>
<td>$-X_S - \frac{2(Q_M-Q_S)}{Q_M^2+1} R_S$</td>
</tr>
<tr>
<td>Transmission Loss</td>
<td>$\frac{Q_M+Q_S}{Q_M-Q_L}$</td>
<td>$\frac{Q_M^2+1}{(Q_M-Q_S)(Q_M-Q_L)}$</td>
</tr>
<tr>
<td>Optimum $Q_E$</td>
<td>$-Q_S$</td>
<td>$-Q_S + \frac{2Q_M(Q_S^2+1)}{1+2Q_MQ_S-Q_M^2}$</td>
</tr>
<tr>
<td>Comments</td>
<td>Impractical when $</td>
<td>Q_M</td>
</tr>
</tbody>
</table>

Assuming $Z_{jk} = m_{jk} + in_{jk}$, the stability K-factor can be calculated as

\[
K = \frac{2m_{11}m_{22} - P}{L}
\]

\[
= \frac{2R_p R_s + \omega^2 M^2}{\omega^2 M^2}
\]

\[
= 1 + \frac{2R_p R_s}{\omega^2 M^2}
\]

\[
= 1 + \frac{2}{k^2 Q_p Q_s}
\]
where \( Z_{12}Z_{21} = P + jB = |L|e^{j\theta} \), \( M = k\sqrt{L_pL_s} \), \( Q_p = \frac{\omega L_p}{R_p} \), and \( Q_s = \frac{\omega L_s}{R_s} \). The maximum gain then follows as

\[
G_{max} = \frac{1}{K + \sqrt{K^2 - 1}} = K - \sqrt{K^2 - 1}
\]

\[
= 1 + \frac{2}{k^2Q_pQ_s} - 2\sqrt{\frac{1}{k^2Q_pQ_s} \left( 1 + \frac{1}{k^2Q_pQ_s} \right)}
\]

\[
= \frac{1}{1 + \frac{2}{k^2Q_pQ_s} + 2\sqrt{\frac{1}{k^2Q_pQ_s} \left( 1 + \frac{1}{k^2Q_pQ_s} \right)}}
\]

\[
= \frac{k^2Q_pQ_s}{k^2Q_pQ_s + 2 + 2\sqrt{k^2Q_pQ_s + 1}}
\]

\[
= \frac{\sqrt{k^2Q_pQ_s + 1} - 1}{\sqrt{k^2Q_pQ_s + 1} + 1}
\]
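The chain of equalities above can be checked numerically. The following sketch computes the stability factor of the lossy transformer and compares \(K - \sqrt{K^2 - 1}\) with the final closed form; the coupling and quality factors are assumed example values, and the function names are illustrative.

```python
import math


def transformer_stability_factor(coupling, q_p, q_s):
    """K = 1 + 2 / (k^2 Q_p Q_s) for the lossy transformer Z-matrix."""
    return 1 + 2 / (coupling ** 2 * q_p * q_s)


def gmax_from_stability_factor(stability_k):
    """G_max = K - sqrt(K^2 - 1), valid for K >= 1."""
    return stability_k - math.sqrt(stability_k ** 2 - 1)


def gmax_closed_form(coupling, q_p, q_s):
    """G_max = (sqrt(k^2 Q_p Q_s + 1) - 1) / (sqrt(k^2 Q_p Q_s + 1) + 1)."""
    root = math.sqrt(coupling ** 2 * q_p * q_s + 1)
    return (root - 1) / (root + 1)


coupling, q_p, q_s = 0.3, 20.0, 20.0  # assumed example values

stability_k = transformer_stability_factor(coupling, q_p, q_s)
g_from_k = gmax_from_stability_factor(stability_k)
g_closed = gmax_closed_form(coupling, q_p, q_s)

print(f"K = {stability_k:.4f}")
print(f"G_max = {g_from_k:.4f} ({10 * math.log10(g_from_k):.2f} dB)")
print(f"closed form = {g_closed:.4f}")
```

Since \(G_{max}\) depends only on the product \(k^2 Q_p Q_s\), a lower coupling factor can in principle be traded against higher coil quality factors.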

Based on the equations of [28], the optimal terminations on each side of the transformer can be calculated as

\[
Z_{1, opt} = m_{11}\Delta + j \left[ \frac{B}{2m_{22}} - n_{11} \right]
\]

where

\[
\Delta = \sqrt{1 - \frac{P}{m_{11}m_{22}} - \left( \frac{B}{2m_{11}m_{22}} \right)^2}
\]

Using the above equations, the optimal impedance can be calculated as

\[
Z_{p, opt} = \sqrt{k^2Q_pQ_s + 1}R_p - j\omega L_p
\]

\[
= \frac{\sqrt{k^2Q_pQ_s + 1}}{Q_p} \omega L_p - j\omega L_p
\]

Similarly,

\[
Z_{s, opt} = \sqrt{k^2Q_pQ_s + 1}R_s - j\omega L_s
\]

\[
= \frac{\sqrt{k^2Q_pQ_s + 1}}{Q_s} \omega L_s - j\omega L_s
\]

These optimal impedances are shown in Fig. 3.9. It is instructive to see whether or not the transformer outperforms the LC ladder networks in terms of transmission losses. Note that for a conjugate matching condition, the source and load quality factors (\( Q_S \) and \( Q_L \)) can be calculated based on the coupling factor (\( k \)) and the primary and secondary quality factors:

\[ Q_S = -\frac{Q_p}{\sqrt{k^2 Q_p Q_s + 1}} \]

(3.47)

\[ Q_L = -\frac{Q_s}{\sqrt{k^2 Q_p Q_s + 1}} \]

(3.48)

Figure 3.9: Optimal loading condition to achieve the minimum insertion loss of the transformer

If LC ladder networks were used,

\[ IL_{LC} = \frac{Q_M + Q_S}{Q_M - Q_L} \]

(3.49)

\[ = \frac{Q - \frac{Q}{\sqrt{k^2 Q^2 + 1}}}{Q + \frac{Q}{\sqrt{k^2 Q^2 + 1}}} \]

(3.50)

\[ = \frac{\sqrt{k^2 Q^2 + 1} - 1}{\sqrt{k^2 Q^2 + 1} + 1} \]

(3.51)

where the same quality factors \( Q_M = Q_p = Q_s = Q \) are considered for all inductors for a fair comparison. Note that the insertion loss of an LC ladder network is the same as that of its transformer counterpart. There are mainly two factors that determine which matching strategy is better. First, in a conjugate matched circuit with an LC ladder network, comparable source and load quality factors are not required for the minimum insertion loss. On the other hand, if impedance transformation is the goal, an optimally matched transformer provides an impedance transformation of

\[ \left. \frac{R_S}{R_L} \right|_{\text{Transformer}} = \frac{L_s}{L_p} \] (3.52)

However, in an LC ladder network without additional capacitors, there is a minimum and a maximum impedance that can be achieved with step-up or step-down networks (Fig. 3.10) which is

\[ \frac{1}{1 + Q_S^2} < \left. \frac{R_S}{R_L} \right|_{\text{LC}} < 1 + Q_L^2 \] (3.53)
Beyond this range, additional capacitors are required in the matching network, and the additional inductive energy resonating with the new capacitive energy increases the power dissipation. Therefore, when the source and load have a relatively low quality factor but a high impedance transformation is required, transformers are superior to their LC ladder counterparts.

In a CMOS process, neglecting gate-drain capacitance,

\[
Q_{\text{Gate}} = \frac{-1}{R_g \omega C_g} \quad (3.54)
\]

\[
Q_{\text{Drain}} = R_d \omega C_d \quad (3.55)
\]

where \( R_g, C_g, R_d \) and \( C_d \) are the gate series resistance, gate capacitance, drain output resistance and drain capacitance, respectively. As the frequency increases, the magnitude of the drain quality factor increases while that of the gate quality factor decreases (Fig. 3.11). While they are completely different at RF frequencies, these two quality factors become comparable in the millimeter-wave range. Therefore, transformers can be used for matching between stages.

Note that assuming high quality factor transformers \( k^2 Q_p Q_s \gg 1 \), the optimal termination quality factor approaches \( |\frac{1}{k}| \) when assuming similar quality factors for primary and secondary coils. Therefore, for a conjugate matched network,

\[- Q_S = \frac{1}{|k|} \quad (3.56)\]

For example, if the drain and gate quality factors are \( Q_S = Q_g = Q_d \approx 10 \), a transformer with a coupling factor of \( |k| = 0.1 \) is required. However, as mentioned earlier, the optimal
transmission does not occur under conjugate matched conditions. Using the equations for optimal transmission

\[-Q_S + \frac{2Q_M(Q_S^2 + 1)}{1 + 2Q_MQ_S - Q_M^2} = \frac{1}{|k|}\]  

(3.57)

If transformers with quality factors of \(Q_M = Q_p = Q_s = 30\) were used in the previous example, a transformer with a coupling factor of \(|k| = 0.16\) provides optimal transmission. Given the low coupling factor of the transformer, its physical shape can be optimized to achieve the highest quality factor possible with the technology. Fig. 3.12a shows how high-k transformers are typically implemented. Note that two thick metal layers are required if no bridges are used. The coupling factor can be reduced by moving the two loops away from each other (Fig. 3.12b).
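The coupling factors quoted in this example can be reproduced with a few lines of arithmetic. The sketch below is one possible reading of Eqs. (3.56) and (3.57): it assumes capacitive terminations with \(Q_S = -10\) under the sign convention of this chapter, equal coil quality factors, and that the required termination quality factor is mapped onto the transformer through the magnitude relation \(|Q| = Q_M / \sqrt{k^2 Q_M^2 + 1}\) from Eqs. (3.47) and (3.48). The function names are illustrative.

```python
import math


def q_e_optimum(q_s, q_m):
    """Transmission-optimal input quality factor, Eq. (3.27)."""
    return -q_s + 2 * q_m * (q_s ** 2 + 1) / (1 + 2 * q_m * q_s - q_m ** 2)


def coupling_for_termination_q(q_term, q_coil):
    """Solve |Q_term| = Q / sqrt(k^2 Q^2 + 1) for |k|, assuming equal coil Q."""
    return math.sqrt((q_coil / q_term) ** 2 - 1) / q_coil


q_s = -10.0    # assumed capacitive gate/drain quality factor
q_coil = 30.0  # coil quality factor Q_M = Q_p = Q_s

# Conjugate match: high-Q limit of Eq. (3.56) and finite-Q solution.
k_conj_limit = 1 / abs(q_s)
k_conj_exact = coupling_for_termination_q(abs(q_s), q_coil)

# Transmission-optimal termination: same two estimates.
q_opt = q_e_optimum(q_s, q_coil)
k_opt_limit = 1 / abs(q_opt)
k_opt_exact = coupling_for_termination_q(abs(q_opt), q_coil)

print(f"conjugate match: |k| = {k_conj_limit:.3f} (high-Q limit), {k_conj_exact:.3f} (finite Q)")
print(f"optimal Q_E = {q_opt:.3f}")
print(f"optimal transmission: |k| = {k_opt_limit:.3f} (high-Q limit), {k_opt_exact:.3f} (finite Q)")
```

Under these assumptions, both estimates for the transmission-optimal case land close to the coupling factor of 0.16 quoted above, noticeably higher than the value required for a conjugate match.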

If a low coupling is desired, the transformer could be implemented with a single thick metal layer, as in Fig. 3.13a, where two single inductors are broadside-coupled. With octagonal loops, a maximum coupling of \(|k| = \frac{1}{8} \approx 0.12\) can be achieved since only one of eight edges is coupled. When triangular loops are used, as in Fig. 3.13b, coupling factors as high as \( k = 0.3 \) can be achieved when \( L_s = L_p \).

![Figure 3.13: The symbolic structure of a transformer with broadside coupling](image)

(a) Low-k  
(b) High-k

In some situations, primary and secondary coils must differ for impedance transformation while maintaining moderate to high coupling factors. At RF frequencies, this can be easily accomplished by using an inductor with multiple turns stacked over a single-turn inductor. At millimeter-wave frequencies, the self-resonance-frequency of the transformer prohibits the use of multi-turn inductors. In this case, transformer equivalents can be used, as in Fig. 3.14.

In the series equivalent of Fig. 3.14c, where

\[
L''_s + L_{ss} = L_s
\]  
(3.58)

the new transformer has the same Z-matrix as the reference transformer if

\[
\frac{M}{(L_p - M) + M} = \frac{M''}{(L_p - M'') + M''}
\]  
(3.59)

\[
k\sqrt{L_pL_s} = k''\sqrt{L_pL''_s}
\]  
(3.60)

\[
k'' = k\sqrt{\frac{L_s}{L''_s}} = k\sqrt{\frac{L_s}{L_s - L_{ss}}}
\]  
(3.61)

Similarly, in the parallel equivalent of Fig. 3.14a, where

\[
L'_p || L_{pp} = L_p
\]  
(3.62)
the new transformer has the same Z-matrix as the reference transformer if

\[
\frac{M}{(L_p - M) + M} = \frac{M'}{(L'_p - M') + M'}
\]  

(3.63)

\[
\frac{k\sqrt{L_p L_s}}{L_p} = \frac{k'\sqrt{L'_p L_s}}{L'_p}
\]  

(3.64)

\[
k' = k\sqrt{\frac{L'_p}{L_p}} = k\sqrt{\frac{L_{pp}}{L_{pp} - L_p}}
\]  

(3.65)
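The two equivalence conditions can be illustrated numerically. The sketch below computes the coupling factor required of the inner transformer for the series and the parallel equivalents and checks the conditions of Eqs. (3.59) and (3.63). The inductance values (a 50 pH primary, a 100 pH secondary and a target \(k = 0.5\)) are hypothetical, as are the function names.

```python
import math


def series_equivalent_coupling(k, l_s, l_ss):
    """k'' of the inner transformer when L_ss is added in series with the secondary."""
    return k * math.sqrt(l_s / (l_s - l_ss))


def parallel_equivalent_coupling(k, l_p, l_pp):
    """k' of the inner transformer when L_pp is added in parallel with the primary."""
    return k * math.sqrt(l_pp / (l_pp - l_p))


# Reference transformer (hypothetical values).
k, l_p, l_s = 0.5, 50e-12, 100e-12
m_ref = k * math.sqrt(l_p * l_s)

# Series equivalent: part of the secondary is realized as an uncoupled inductor.
l_ss = 50e-12
l_s_inner = l_s - l_ss
k_series = series_equivalent_coupling(k, l_s, l_ss)
m_series = k_series * math.sqrt(l_p * l_s_inner)

# Parallel equivalent: the inner primary is larger and shunted by L_pp.
l_pp = 100e-12
l_p_inner = l_p * l_pp / (l_pp - l_p)  # chosen so that L'_p || L_pp = L_p
k_parallel = parallel_equivalent_coupling(k, l_p, l_pp)
m_parallel = k_parallel * math.sqrt(l_p_inner * l_s)

print(f"series equivalent:   k'' = {k_series:.3f}, M'' / M = {m_series / m_ref:.3f}")
print(f"parallel equivalent: k'  = {k_parallel:.3f}, "
      f"(M'/L'_p) / (M/L_p) = {(m_parallel / l_p_inner) / (m_ref / l_p):.3f}")
```

In both cases the inner transformer needs a higher coupling factor than the reference, but its two coils have equal inductance in this example.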

Figure 3.15: Example of equivalent transformer topologies

To show the effectiveness of this method, consider the example in Fig. 3.15. It is difficult to achieve a coupling factor of 0.5 when the secondary inductance is twice the primary inductance. As suggested by [29], the inductance of spiral inductors is directly related to the length of the inductor, typically around 1 pH/µm as a rule of thumb.

On the other hand, the coupling factor is usually determined by the mutual inductance of the parallel legs of each inductor. As you can see in Fig. 3.16b, the mutual inductance decreases as the two inductors move apart because the length of the overlapping side metals decreases. Note that the coupling through the lateral runs opposes the end legs. Fig. 3.16a
shows the exact offset at which they cancel each other. Beyond this point, the lateral coupling is negligible, and the coupled current flows in the opposite direction. Intuitively, higher mutual inductance can be achieved by using larger loops for the primary or secondary. However, this is accompanied by higher inductance for the corresponding loop, which can be compensated by adding series or parallel inductance. Fig. 3.17 shows simulation results demonstrating the effectiveness of this technique to increase the effective coupling factor.

### 3.3 High Quality-Factor Inductors

As described in the previous section, high-quality inductors are required to minimize the insertion loss of the amplifier. Most inductors are designed as a single-turn loop in the millimeter-wave range to achieve a high self-resonance frequency (SRF). Assuming that a single-turn inductor can be modeled as a lossy transmission line, its impedance can be described as

\[
Z_L = Z_0 \frac{1 - e^{-2\gamma d}}{1 + e^{-2\gamma d}} \tag{3.66}
\]

where \(\gamma\) is the propagation constant and \(d\) is the length of the transmission line. The propagation constant can be written as

\[
\gamma = \sqrt{(i\omega L' + R')(i\omega C' + G')} \tag{3.67}
\]

\[
\approx i\omega \sqrt{L'C'} \left(1 + \frac{R'}{2i\omega L'} + \frac{G'}{2i\omega C'}\right) \tag{3.68}
\]
where $L'$, $R'$, $C'$ and $G'$ are respectively the inductance, series resistance, capacitance and shunt conductance per unit length. Note that the second approximation applies only to low-loss structures. The quality factor of the inductance is

$$Q_L = \frac{\text{Im}\{Z_L\}}{\text{Re}\{Z_L\}}$$

(3.69)

$$\approx \frac{2e^{-\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)}}{1 - e^{-2\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)}} \sin(2d\omega \sqrt{L'C'})$$

(3.70)

Note that the maximum quality factor is reached when $2d\omega \sqrt{L'C'} = \frac{\pi}{2}$, and for a low-loss line the peak quality factor is

$$Q_{L,max} \approx \frac{1}{\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)}$$

(3.71)

Note that the peak quality factor is only a function of the length of the inductor. The reactance of the loop can be calculated as follows

$$\text{Im}\{Z_L\} \approx Z_0 \frac{2e^{-\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)} \sin(2d\omega \sqrt{L'C'})}{1 + 2e^{-\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)} \cos(2d\omega \sqrt{L'C'}) + e^{-2\sqrt{L'C'}\,d\left(\frac{R'}{L'} + \frac{G'}{C'}\right)}}$$

(3.72)

For the peak quality factor, the reactance of the loop is approximately independent of the frequency and equal to

$$\text{Im}\{Z_L\}|_{2d\omega \sqrt{L'C'} = \frac{\pi}{2}} \approx Z_0$$

(3.73)
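The closed-form quality factor can be compared against a direct evaluation of the shorted-line impedance. The sketch below uses the low-loss propagation constant of Eq. (3.68) and treats \(Z_0\) as real, as the closed-form expressions do. All per-unit-length values are assumptions chosen for illustration (1 pH/µm inductance, 0.1 fF/µm capacitance, 0.02 Ω/µm series resistance, no shunt conductance), not extracted data.

```python
import cmath
import math


def q_closed_form(omega, d, l_per, c_per, r_per, g_per):
    """Eq. (3.70): Q of a shorted low-loss line with Z_0 taken as real."""
    tau = math.sqrt(l_per * c_per) * d
    a = math.exp(-tau * (r_per / l_per + g_per / c_per))
    return 2 * a * math.sin(2 * omega * tau) / (1 - a * a)


def q_from_line(omega, d, l_per, c_per, r_per, g_per):
    """Im/Re of tanh(gamma * d), using the low-loss gamma of Eq. (3.68)."""
    gamma = 1j * omega * math.sqrt(l_per * c_per) * (
        1 + r_per / (2j * omega * l_per) + g_per / (2j * omega * c_per)
    )
    z_norm = cmath.tanh(gamma * d)  # Z_L / Z_0, identical to Eq. (3.66)
    return z_norm.imag / z_norm.real


def peak_length_and_q(omega, l_per, c_per, r_per, g_per):
    """Length satisfying 2*d*omega*sqrt(L'C') = pi/2 and the estimate of Eq. (3.71)."""
    d = math.pi / (4 * omega * math.sqrt(l_per * c_per))
    q_peak = 1 / (math.sqrt(l_per * c_per) * d * (r_per / l_per + g_per / c_per))
    return d, q_peak


# Assumed per-unit-length parameters in SI units (per meter).
l_per, c_per, r_per, g_per = 1e-6, 1e-10, 2e4, 0.0
omega = 2 * math.pi * 140e9

d_peak, q_peak = peak_length_and_q(omega, l_per, c_per, r_per, g_per)
q_closed = q_closed_form(omega, d_peak, l_per, c_per, r_per, g_per)
q_line = q_from_line(omega, d_peak, l_per, c_per, r_per, g_per)

print(f"length at peak Q: {d_peak * 1e6:.1f} um")
print(f"Q from Eq. (3.70): {q_closed:.2f}, from tanh: {q_line:.2f}, Eq. (3.71): {q_peak:.2f}")
```

The three values agree closely for these parameters because the loss exponent is small. A complex \(Z_0\), which this sketch does not model, would lower the quality factor.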

The definition of characteristic impedance is not clear here. Note that as the loop diameter increases, the characteristic impedance also increases. The consequence of this trend is that a higher optimum inductance can be expected in a lower frequency range. However, as the frequency decreases, the conductivity of the substrate ($\sigma$) dominates over its permittivity, as

$$\epsilon_c(\omega) = \epsilon_r \epsilon_0 - i\frac{\sigma}{\omega}$$

(3.74)

and therefore, the transmission line model in this section resembles a differential microstrip line. For a low-doped silicon with a conductivity of $10\Omega^{-1}m^{-1}$, this transition occurs around 15GHz. Since the operating frequency of this work is much higher than 15GHz, a quasi-TEM wave is considered for the twinstrip line ([30]). The characteristic impedance of a homogeneous twinstrip line can be approximately calculated as follows

$$Z_{Twin} \approx \sqrt{\frac{\mu}{\epsilon_0 \epsilon_r}} \frac{1}{\pi} \cosh^{-1} \left(1 + \frac{S}{W}\right)$$

(3.75)
Here $S$ is the distance between the strips and $W$ is the width of each strip. Note that the above equation can be approximated as follows when the distance between the strips is much larger than the width of each strip:

\[ Z_{\text{Twin}} \approx \sqrt{\frac{\mu}{\epsilon_0 \epsilon_r}} \frac{1}{\pi} \ln \left( 1 + \frac{S}{W} \right) \]  

(3.76)

This means that the characteristic impedance becomes a weak function of the spacing. Calculating the relative dielectric constant requires conformal mapping, which is beyond the scope of this chapter. Instead, an average dielectric constant of the silicon and interlayer dielectric can be considered. Simulation results show that the optimal reactance depends to some extent on the width of the inductor but is relatively independent of the loop diameter. However, decreasing the width of the inductor may increase the optimal impedance at the expense of a lower quality factor. Fig. 3.18 summarizes the simulation results for different inductor widths.
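As a quick check of the roughly 15 GHz transition quoted earlier in this section, the frequency at which the conductive and dielectric terms of Eq. (3.74) are equal can be computed directly from \(\sigma / \omega = \epsilon_r \epsilon_0\). The relative permittivity of 11.7 used below is an assumed textbook value for silicon, and the function name is illustrative.

```python
import math

EPS_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def transition_frequency(sigma, eps_r):
    """Frequency at which sigma / omega equals eps_r * eps_0 in Eq. (3.74)."""
    return sigma / (2 * math.pi * eps_r * EPS_0)


sigma_si = 10.0  # substrate conductivity in S/m, as quoted in the text
eps_r_si = 11.7  # assumed relative permittivity of silicon

f_transition = transition_frequency(sigma_si, eps_r_si)
print(f"transition frequency: {f_transition / 1e9:.1f} GHz")
```

Above this frequency the permittivity term dominates, which is the regime assumed for the quasi-TEM treatment of the twinstrip line.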

![Graph showing inductor reactance and quality factor](image)

Figure 3.18: Optimal inductors at different frequencies

The conclusion is that the designer should know the range of optimum reactances when high-quality inductors are required. Since transformers consist of coupled inductors, the same argument applies to them. Transformer equivalents should be used if primary and secondary inductors deviate from the optimum reactances.

### 3.4 Low Noise Active Balun

Conventionally, passive baluns (Fig. 3.19a) are used to convert single-ended signals coming from the antenna into differential signals before passing them to low-noise differential amplifiers. These passive baluns are lossy and contribute to a noise figure of about 2dB. As an alternative, single-ended LNAs can be used that do not require conversion of single-ended to differential signals, saving about 2dB of noise degradation. However, electromagnetic modeling of single-ended amplifiers is complicated, and designers tend to worry about the
possibility of oscillations due to unpredictable instabilities. Therefore, despite the merits of single-ended LNAs, most mmWave LNAs are preceded by a passive balun. As mentioned earlier, the minimum achievable noise measure does not change when more active stages are added. Therefore, it can be assumed that using a common-source stage in parallel with a common-gate stage (as in Fig. 3.19b) will still achieve the minimum noise measure for each stage. First, the minimum noise figure of Fig. 3.20a should be examined.

As you can see in Fig. 3.20b, the minimum achievable noise figure changes for different inductive terminations. More importantly, the minimum noise figure peaks for the inductive loads that resonate with the output capacitance of the active balun. This dilemma can be investigated using noise measure theory. The simulated 3-port Y-parameters for a post-extraction core are

\[
Y_A = \begin{bmatrix}
0.0144 + 0.0173i & -0.0002 - 0.0027i & -0.0012 - 0.0032i \\
0.0118 - 0.0043i & 0.0014 + 0.0066i & -0.0000 + 0.0000i \\
-0.0132 - 0.0016i & 0.0000 + 0.0000i & 0.0014 + 0.0060i \\
\end{bmatrix}
\]  \hspace{1em} (3.77)

and the correlation matrix is

\[
N_C = \begin{bmatrix}
0.0106 + 0.0000i & -0.0003 + 0.0010i & -0.0094 - 0.0010i \\
-0.0003 - 0.0010i & 0.0097 + 0.0000i & -0.0000 - 0.0000i \\
-0.0094 + 0.0010i & -0.0000 + 0.0000i & 0.0097 + 0.0000i \\
\end{bmatrix}
\]  \hspace{1em} (3.78)

The characteristic noise matrix can be calculated as

\[
N = \begin{bmatrix}
-0.8017 + 0.0887i & -0.8490 - 0.0971i & 0.8490 + 0.0971i \\
3.4938 - 0.0987i & -3.3895 - 0.0929i & -3.6183 + 0.0929i \\
2.7123 + 0.1871i & -4.4671 - 0.0042i & -2.5406 + 0.0041i \\
\end{bmatrix}
\]  \hspace{1em} (3.79)

which has three eigenvalues

\[
\lambda_{1,2,3} = \{-7.01, -0.29, 0.566\}
\]  \hspace{1em} (3.80)

The smallest positive eigenvalue (\(\lambda_3\)) determines the minimum noise measure of this architecture. As expected, the minimum noise measure remains the same as a single transistor. The eigenvectors can be calculated as

\[
V_{\lambda_1} = \begin{bmatrix}
0.00 - 0.00i \\
0.71 + 0.00i \\
0.71 + 0.00i \\
\end{bmatrix}
\]  \hspace{1em} (3.81)

\[
V_{\lambda_2} = \begin{bmatrix}
0.77 + 0.00i \\
0.16 + 0.07i \\
0.61 - 0.07i \\
\end{bmatrix}
\]  \hspace{1em} (3.82)

\[
V_{\lambda_3} = \begin{bmatrix}
0.60 + 0.07i \\
-0.18 + 0.07i \\
0.78 + 0.00i \\
\end{bmatrix}
\]  \hspace{1em} (3.83)

Note that to achieve the minimum noise measure

\[
y_s \begin{bmatrix}
1 \\
0 \\
0 \\
\end{bmatrix}^H \begin{bmatrix}
0.60 + 0.07i \\
-0.18 + 0.07i \\
0.78 + 0.00i \\
\end{bmatrix}^* = \beta \begin{bmatrix}
0 \\
1 \\
-1 \\
\end{bmatrix} - (Y_P + Y_A)^T \begin{bmatrix}
0.60 + 0.07i \\
-0.18 + 0.07i \\
0.78 + 0.00i \\
\end{bmatrix}^*
\]

which can be simplified as

\[
y_s \begin{bmatrix}
0.60 - 0.07i \\
0 \\
0 \\
\end{bmatrix} = \beta \begin{bmatrix}
0 \\
1 \\
-1 \\
\end{bmatrix} - \begin{bmatrix}
-0.0029 + 0.0081i \\
-0.0002 - 0.0028i \\
0.0002 + 0.0028i \\
\end{bmatrix} - Y_P \begin{bmatrix}
0.60 - 0.07i \\
-0.18 - 0.07i \\
0.78 + 0.00i \\
\end{bmatrix}
\]  \hspace{1em} (3.85)
which assumes a reciprocal and symmetric peripheral network of the form

\[
Y_P = \begin{bmatrix}
  y_{11} & y_{12} & y_{12} \\
  y_{12} & y_{22} & y_{23} \\
  y_{12} & y_{23} & y_{22}
\end{bmatrix}
\]  

(3.86)

In the absence of a direct path from the input to any of the outputs (\( y_{12} = 0 \)), obtaining the minimum noise measure requires that

\[ y_{22} = -y_{23} \]  

(3.87)

In other words, the passive network should be purely differential at the output. If the symmetry is broken (e.g., by asymmetric passive components or CS and CG stages with different transconductance), the above condition is no longer valid. To prove the hypothesis, we examine the topology of Fig. 3.21a. As you can see in Fig. 3.21b, the common-mode termination indeed changes the differential performance. The critical observation is that the common-mode output impedance of the peripheral network itself should be high to achieve the minimum noise measure. As you can see in Fig. 3.21b, the minimum noise figure increases as the inductance values resonate with the common-mode capacitance of the core transistors.

Note that the optimum source impedance is achieved when

\[ y_s + y_{11} = 0.0064 - 0.0130i \]  

(3.88)

which has a low quality factor, allowing a low-loss and wideband input matching.

It can be observed that the optimal source is approximately \( \frac{1}{2g_m} \). Recall that in the transistor model of Fig. 2.6

\[
\begin{bmatrix}
  i_G \\
  i_D
\end{bmatrix} = Y_{CS} \begin{bmatrix}
  v_G - v_S \\
  v_D - v_S
\end{bmatrix}
\]  

(3.89)

\[ Y_{CS} = \frac{1}{R_g(C_\pi + C_\mu)S + 1} \begin{bmatrix} (C_\pi + C_\mu)S & -C_\mu S \\ g_m - C_\mu S & (R_g C_\pi S + g_m R_g + 1) C_\mu S \end{bmatrix} \]  \hspace{1cm} (3.90)

where \( Y_{CS} \) is the Y-parameter of a common-source topology. The Y-parameters of a common-gate topology can be easily computed if we note that

\[ \begin{bmatrix} i_S \\ i_D \end{bmatrix} = \begin{bmatrix} -1 & -1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} i_G \\ i_D \end{bmatrix} \]  \hspace{1cm} (3.91)

and

\[ \begin{bmatrix} v_S - v_G \\ v_D - v_G \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} v_G - v_S \\ v_D - v_S \end{bmatrix} \]  \hspace{1cm} (3.92)

Therefore,

\[ \begin{bmatrix} i_S \\ i_D \end{bmatrix} = \begin{bmatrix} -1 & -1 \\ 0 & 1 \end{bmatrix} Y_{CS} \begin{bmatrix} -1 & 0 \\ -1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} v_S - v_G \\ v_D - v_G \end{bmatrix} \]  \hspace{1cm} (3.93)

which means that the Y-parameter of a common-gate topology can be computed as

\[ Y_{CG} = \begin{bmatrix} -1 & -1 \\ 0 & 1 \end{bmatrix} Y_{CS} \begin{bmatrix} -1 & 0 \\ -1 & 1 \end{bmatrix}^{-1} \]  \hspace{1cm} (3.94)
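The transformation of Eq. (3.94) is easy to evaluate numerically. The sketch below applies it to an arbitrary common-source Y-matrix using plain 2x2 complex arithmetic; the admittance values are assumed placeholders rather than device data, and the helper names are illustrative.

```python
def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(m):
    """Inverse of a 2x2 matrix stored as a nested list."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def common_gate_from_common_source(y_cs):
    """Y_CG = T_i * Y_CS * T_v^-1, following Eq. (3.94)."""
    t_i = [[-1, -1], [0, 1]]  # maps (i_G, i_D) to (i_S, i_D)
    t_v = [[-1, 0], [-1, 1]]  # maps (v_GS, v_DS) to (v_SG, v_DG)
    return matmul2(matmul2(t_i, y_cs), inv2(t_v))


# Assumed common-source Y-parameters in siemens (placeholders, not device data).
y_cs = [[1e-3 + 10e-3j, -2e-3j],
        [40e-3 - 2e-3j, 2e-3 + 5e-3j]]

y_cg = common_gate_from_common_source(y_cs)
for row in y_cg:
    print([f"{value:.4f}" for value in row])
```

The result matches the familiar pattern in which the common-gate input admittance is the sum of all four common-source parameters while \(y_{22}\) is unchanged.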

The Y-parameters of the active balun with a CS and a CG stage can be calculated as

\[ Y_{CSCG} = \begin{bmatrix} y_{CG_{11}} + y_{CS_{11}} & y_{CG_{12}} & y_{CS_{12}} \\ y_{CG_{21}} & y_{CG_{22}} & 0 \\ y_{CS_{21}} & 0 & y_{CS_{22}} \end{bmatrix} \]  \hspace{1cm} (3.95)

which corresponds to the following formula

\[ \begin{bmatrix} i_{in} \\ i_P \\ i_M \end{bmatrix} = Y_{CSCG} \begin{bmatrix} v_{in} \\ v_P \\ v_M \end{bmatrix} \]  \hspace{1cm} (3.96)

Since the optimum noise measure requires the correct choice of common-mode and differential-mode impedance, the above equation can be modified as follows

\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} i_{in} \\ i_{DM} \\ i_{CM} \end{bmatrix} = Y_{CSCG} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} v_{in} \\ v_{DM} \\ v_{CM} \end{bmatrix} \]  \hspace{1cm} (3.97)

which means

\[ Y_{CMDM} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix}^{-1} Y_{CSCG} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & 1 \end{bmatrix} \]  \hspace{1cm} (3.98)
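To see how this change of variables connects to the condition \( y_{22} = -y_{23} \) of Eq. (3.87), the same transformation can be applied to a symmetric peripheral network. The sketch below is an illustration with assumed admittance values: it shows that the common-mode admittance \( y_{22} + y_{23} \) vanishes when the condition holds, leaving a purely differential output termination.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


# Mode-conversion matrix of Eq. (3.97) and its inverse.
M = [[1, 0, 0], [0, 1, 1], [0, -1, 1]]
M_INV = [[1, 0, 0], [0, 0.5, -0.5], [0, 0.5, 0.5]]


def to_mode_coordinates(y):
    """Y in (input, differential, common-mode) coordinates, as in Eq. (3.98)."""
    return matmul3(matmul3(M_INV, y), M)


# Assumed symmetric peripheral network with no input-output path (y12 = 0)
# and y23 = -y22, i.e. the condition of Eq. (3.87).
y11 = 5e-3 - 8e-3j
y22 = 1e-3 - 4e-3j
y_p = [[y11, 0, 0],
       [0, y22, -y22],
       [0, -y22, y22]]

y_mode = to_mode_coordinates(y_p)
print(f"differential-mode admittance: {y_mode[1][1]:.4f}")
print(f"common-mode admittance:       {y_mode[2][2]:.4f}")
```

A zero common-mode admittance corresponds to the high common-mode output impedance that the text identifies as necessary for the minimum noise measure.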

Assuming that the common-mode termination is high, the Y-parameters of the two-port can be calculated as

\[ Y_{DM} = \left( (Y_{CMDM}^{-1})_{[1;1,2]} \right)^{-1} \]  

(3.99)

Using a symbolic math package,

\[ Y_{DM} = \frac{1}{R_g(C_\pi + C_\mu)s + 1} \times \begin{bmatrix} g_m + 2C_\pi s + \frac{C_\mu s(C_\pi R_g s + R_g g_m - 1)}{2} \\ g_m + \frac{C_\mu s(C_\pi R_g s + R_g g_m + 1)}{2} \end{bmatrix} \]  

(3.100)

Since the output resistance of the devices is neglected, the output impedance at DC approaches infinity. This means that a high passive impedance transformation is required for simultaneous conjugate input-output matching, which increases the noise figure due to the loss of the matching network. However, as the frequency increases, a lower impedance transformation ratio is required. Fig. 3.22 shows the optimum source impedance for the minimum noise figure and maximum available power gain.

![Figure 3.22: Optimum source reflection in different cases](image)

Compared to the input impedance for a matched load, Fig. 3.23 shows that the noise of the optimal source impedance approaches the matched condition as the frequency increases. The critical observation here is that, unlike in the low-frequency case, \( R_S = \frac{1}{g_m} \) does not give the correct choice of transistor transconductance. Simulation is required to find the optimal transconductance for a given frequency.
Once the bias circuit is included, the symmetry of the circuit is partially broken due to the body effect of the CG stage and the voltage division caused by the capacitive coupling of the CS stage. Therefore, it is important to make the layout as symmetrical as possible. Once the DC circuit is implemented, the Y parameters are extracted to find the optimal common-mode impedance. As shown in Fig. 3.25, the matching network connected to the output of the active balun should have a common mode inductance of about 400pH.

So far, the limited quality factor of the matching network has not been considered. A Python script has been written to establish a connection between the EM simulator and the circuit simulator. The performance of the LNA is calculated for several different matching networks. Fig. 3.26 shows the final implemented matching networks. Fig. 3.27 shows the input impedance of the matching network. Note that the differential inductance is relatively constant, while the common-mode inductance increases at about 155GHz and then goes to negative values. Note that based on Fig. 3.25, even negative values of inductance can potentially degrade the performance of the LNA.

Fig. 3.28 shows the performance of the active balun developed here. It has a power gain of 2.25dB and an insertion loss of 3.4dB due to the matching network.

$^3$Which means that the self-resonance frequency of the differential mode is high.

### 3.5 Interstage Amplifiers

Once the active balun generates the differential signal, it is passed through dummy-neutralized pseudo-differential CS amplifiers shown in Fig. 3.29a. The dummy device uses high threshold voltage transistors to ensure that the channel has the highest resistance. In the capacitively neutralized amplifier of Fig. 3.29b, the reverse conductance can be calculated as

\[
y_{12,C} = \frac{C_{gd}s}{1 + R_g(C_{gs} + C_{gd})s} - C_n s
\]

\[
\approx C_{gd}s \left(1 - \frac{C_n}{C_{gd}} - R_g(C_{gs} + C_{gd})s\right)
\]
While it is effective at low frequencies when $C_n = C_{gd}$, the reverse conductance is limited by $-C_{gd}R_g(C_{gs} + C_{gd})s^2$ as the operating frequency increases. In practice, the neutralization capacitor is made from the back-end metallization layers, while the gate-drain capacitor comes from the front-end metallization layers and the transistors. Since they have completely different origins, they do not track each other across process variations. This means that while this neutralization technique is effective in simulation, it is limited in practice.

Figure 3.27: Differential and common-mode impedance of the matching network

Figure 3.28: Performance of the active balun with matching network

Let us now consider a dummy-neutralized topology from Fig. 3.29a with

\[ y_{12,D} = \frac{C_{gd}s}{1 + R_g(C_{gs} + C_{gd})s} - \frac{C_{gd}s}{1 + R_g(C'_{gs} + C_{gd})s} \]

\[ = \frac{R_g C_{gd}(C'_{gs} - C_{gs})s^2}{\left(1 + R_g(C_{gs} + C_{gd})s\right)\left(1 + R_g(C'_{gs} + C_{gd})s\right)} \]

\[ \approx C_{gd}s \left( R_g(C'_{gs} - C_{gs})s \right) \left( 1 - R_g(C_{gs} + C'_{gs} + 2C_{gd})s \right) \]

Note that \( C'_{gs} \) is the gate-source capacitance of the dummy device, while \( C_{gs} \) is the counterpart of the active device. Since the channel is not formed in the off device, the channel capacitance has a high series resistance that effectively blocks its action. Therefore, \( C'_{gs} \) is mainly the overlap and fringe capacitance of the front-end metallization. In the current
process, the simulation results show that $\frac{C_{gs}'}{C_{gs}} \approx 2$. Despite the constraint of $C_{gs}' - C_{gs}$, the dummy neutralization outperforms the capacitive neutralization topology as long as $R_g(C_{gs}' - C_{gs})s \ll 1$.

Note that a lossy matching network inevitably increases the noise measure [21]. However, it can potentially increase Mason’s unilateral gain [31]. As you can see in Fig. 3.32, the dummy-neutralized amplifier has a higher available gain and also a higher input quality factor $^4$. The higher quality factor means that the matching network can expect a higher insertion loss.

$^4$The quality factor is calculated based on the quality factor of the simultaneous input-output conjugate matching impedances.
Since dummy-neutralized amplifiers are less sensitive to process variations and to the accuracy of the transistor models, they were preferred over their capacitive counterparts. Fig. 3.32 shows the performance of the core transistor. Note that \( f_{\text{max}} = 400\text{GHz} \) can be achieved with RC extraction. However, as soon as EMX is used, it decreases to \( f_{\text{max}} = 300\text{GHz} \). Fig. 3.31 shows the implementation of the core transistors.

Given the high input and output quality factors, the insertion loss of the matching network must be considered. It can be estimated assuming a quality factor of 20 for the inductors of the matching network. Fig. 3.32a contains this estimate. Fig. 3.33 shows the implementation of the interstage matching network. Using the top-most thick metal, a low-k transformer with quality factors greater than 20 is implemented.

Figure 3.32: Performance of the amplifier core with RC-extraction and EMX

Figure 3.33: Interstage transformer

The performance of the interstage amplifier is shown in Fig. 3.34. Each amplifier consumes 4mA of DC current. Note that by using low-k transformers [32], a relatively wide
bandwidth is achieved. Reducing the coupling factor could potentially result in higher bandwidth at the expense of lower gain. However, more complicated matching networks can increase the bandwidth of the amplifier without sacrificing gain.

Figure 3.34: Performance of interstage amplifiers

### 3.6 I/Q Splitter

Once the input signal is amplified, it should be split into I and Q paths before corresponding mixers. The splitter is laid out as shown in Fig. 3.35b. Note that the effective coupling factor is the same as the coupling factor of the interstage amplifier. However, an impedance transformation from 1 to 2 is realized by using transformer equivalents.

An important technique used here is the inductive termination of the transmission line. As you can see in Fig. 3.36, the optimal termination impedance for long transmission lines is $Z_0$ for minimum insertion loss, while short transmission lines (shorter than $\frac{\lambda}{2}$) have optimal source and load impedances, similar to the optimal source and load impedances for an amplifier. Note that the layout of the chip determines the length of the transmission line here. The exact terminations on each side of the splitter are optimized in ADS to give a wideband Chebyshev response together with the effective shunt capacitance of the transmission line. Fig. 3.37 shows the performance of the I/Q splitter. Note that, as shown in Fig. 3.1, another stage of the dummy-neutralized amplifier is used after the splitter to provide isolation between the I/Q mixers.

### 3.7 Mixer Design

In the earlier work [20], active mixers were used. The reason for this was the hypothesis that using an active mixer instead of a passive mixer may be beneficial when the operating frequency is less than half of $f_{\text{max}}$. It is also assumed that the conversion gain of a passive mixer is more sensitive to the LO swing [33]. However, these assumptions can be refuted as follows.

First, as the transistors in Fig. 3.39 switch between on and off states, their effective $g_m$ falls below the peak value at which $f_{\text{max}}$ can be reached. In other words, assuming a sharp switching behavior between on and off states in Fig. 3.39, at any time, two transistors are connected to the input RF while only one of them is active.

Second, the odd harmonic currents of the transistors do not produce a noticeable voltage swing when the impedance of the even harmonics at the common node of the mixer is low. Therefore, the switching behavior of active and passive mixers is similar. Quantitatively, in each transistor

$$I_{ds} \approx g_{m1}V_{gs} + g_{m2}V_{gs}^2 + g_{m3}V_{gs}^3 + \ldots$$  \hspace{1cm} (3.106)

Assuming that \( V_{gs} = V_{LO} \cos(\omega_{LO} t) + V_{in} \cos(\omega_{in} t + \phi) \), the current harmonics are generated at \( m\omega_{LO} + n\omega_{in} \). These harmonics are passed through \( Z_s \) and change the source voltage. When \( Z_s \approx 0 \Omega \) for all these harmonics, the source voltage remains constant. This is the case for most millimeter-wave mixers beyond \( \frac{f_t}{2} \). Assume that the source is tuned to the fundamental frequency of $\omega_{LO}$,

$$\frac{V_s(\omega_{LO})}{V_s(2\omega_{LO})} \approx \frac{|I_{ds}(\omega_{LO})|}{|I_{ds}(2\omega_{LO})|} \frac{|Z_s(\omega_{LO})|}{|Z_s(2\omega_{LO})|}$$ \hspace{1cm} (3.107)

$$\approx \frac{|I_{ds}(\omega_{LO})|}{|I_{ds}(2\omega_{LO})|} \frac{Q_M \frac{1}{\omega_{LO}C_s}}{\frac{1}{2\omega_{LO}C_s}}$$ \hspace{1cm} (3.108)

$$\approx \frac{|I_{ds}(\omega_{LO})|}{|I_{ds}(2\omega_{LO})|} 2Q_M$$ \hspace{1cm} (3.109)

where $Q_M$ is the quality factor of the matching network and transistors at the fundamental frequency. Note that for devices with weak nonlinearity, the first term in the above equation is larger than 1. Therefore, despite the existence of harmonic currents, the harmonic voltages on the source side are negligible. This suggests that mixers should be operated with voltage sources rather than current sources when performing simulations to gain insight into the design space.

Figure 3.38: Bias generation circuit for mixers

In the rest of this section, the mixer bias is illustrated by Fig. 3.38. In the active mixer, the drain nodes are connected to the supply through ideal current sources, while in the passive mixer, the drain nodes are disconnected from the supply to be biased in the triode region. This ensures that the only difference between the two mixers in the simulation environment is the DC voltage of the drain nodes.

Figure 3.39: Current mode mixer

Current Mode Mixer

The current conversion efficiency can be defined as the current delivered to the load normalized by the real part of the current generated by the LNA

\[ \eta = \frac{|I_{\text{out}}|}{\Re(I_{\text{in}})} \quad (3.110) \]

Note that while the passive and active mixers have relatively similar performance, as shown in Fig. 3.41a and Fig. 3.41b, high transconductance is required for the TIA to avoid voltage division in the passive mixer. On the other hand, if active mixers are used, the flicker noise of the mixer adds directly to the output, which can degrade the noise figure.

Voltage Mode Mixer

Assume a square-law device in the triode region,

\[ I_{ds} = k' \left( (V_{gs} - V_{th}) - \frac{V_{ds}}{2} \right) V_{ds} \quad (3.111) \]
and therefore, assuming a small-signal variation in the drain-source voltage,

\[ R_{ds} \approx \frac{1}{k' (V_{gs} - V_{th})} \]  \hspace{1cm} (3.112)

As for the voltage division, assuming that the LO swing is smaller than the overdrive voltage, so that \(V_{gs} > V_{th}\) holds for both switches,

\[
G_{on} = \frac{R_{ds,off}}{R_{ds,on} + R_{ds,off}} = \frac{V_{od} + V_{LO}}{(V_{od} + V_{LO}) + (V_{od} - V_{LO})} = \frac{1}{2} \left( 1 + \frac{V_{LO}}{V_{od}} \right)
\]

Similarly, the voltage division in the off-state is

\[
G_{off} = \frac{1}{2} \left( 1 - \frac{V_{LO}}{V_{od}} \right)
\]

Assuming a sharp LO swing, when the LO signal is high

\[
V_{out} = V_{in} \times (G_{on} - G_{off}) = V_{in} \times \frac{V_{LO}}{V_{od}}
\]
and when the LO signal is low

\[ V_{\text{out}} = V_{\text{in}} \times (G_{\text{off}} - G_{\text{on}}) \] (3.119)

\[ = -V_{\text{in}} \times \frac{V_{\text{LO}}}{V_{\text{od}}} \] (3.120)

which means that the input signal \( V_{\text{in}}(t) \) is multiplied by \( \frac{V_{\text{LO}}}{V_{\text{od}}} \sin(\omega_{\text{LO}} t) \). Note that increasing the overdrive voltage decreases the conversion gain. The maximum gain is reached when the LO swing magnitude reaches the overdrive voltage. At higher LO swings, the off switch has a very low conductance, and the gain remains constant.
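
The divider argument above can be checked numerically. The following is a minimal sketch, assuming ideal triode switches whose resistance follows Eq. 3.112 with overdrive voltages $V_{od} \pm V_{LO}$; the function names and the normalized constant `k` are illustrative and not part of the original design flow.

```python
def divider_gains(v_od, v_lo, k=1.0):
    """Return (G_on, G_off) for two triode switches with overdrive v_od +/- v_lo.

    Assumes R_ds = 1 / (k * overdrive), i.e. Eq. 3.112, and 0 <= v_lo < v_od
    so that both switches still conduct.
    """
    if not 0 <= v_lo < v_od:
        raise ValueError("model assumes 0 <= v_lo < v_od")
    r_on = 1.0 / (k * (v_od + v_lo))   # switch whose gate is driven high
    r_off = 1.0 / (k * (v_od - v_lo))  # switch whose gate is driven low
    g_on = r_off / (r_on + r_off)
    g_off = r_on / (r_on + r_off)
    return g_on, g_off


def conversion_factor(v_od, v_lo):
    """Net multiplier G_on - G_off applied to the input while the LO is high."""
    g_on, g_off = divider_gains(v_od, v_lo)
    return g_on - g_off


if __name__ == "__main__":
    v_od = 0.25  # hypothetical overdrive voltage in volts
    for v_lo in (0.05, 0.10, 0.20):
        # The two printed numbers should agree: G_on - G_off = V_LO / V_od.
        print(v_lo, conversion_factor(v_od, v_lo), v_lo / v_od)
```

The constant `k` cancels in the ratio, which is why the multiplier depends only on $V_{LO}/V_{od}$ in this simplified model.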

Note that when \( V_{\text{od}} \) is decreased, the bandwidth decreases despite the improvement in conversion gain. This is because the on-resistance of the switch increases, reducing the ability of the switch to drive load capacitors. So there is a tradeoff between the gain and bandwidth of mixers.

It should be noted that the maximum conversion gain of the passive mixer can be higher than \( \frac{2}{\pi} \) because as the LO swing increases beyond the overdrive voltage, the conduction angle decreases, and the mixer becomes more similar to a sample-and-hold circuit. If \( V_{\text{LO}} \leq V_{\text{od}} \), the equivalent Thevenin voltage source is

\[ V_{\text{out}}(t) = V_{\text{in}}(t) \times \frac{V_{\text{LO}}}{V_{\text{od}}} \sin(\omega_{\text{LO}} t) \] (3.121)

If \( V_{\text{LO}} \gg V_{\text{od}} \), the Thevenin equivalent voltage source can be approximated as

\[ V_{\text{out}}(t) \approx V_{\text{in}}(t) \Pi(\omega_{\text{LO}} t) \] (3.122)

However, in the presence of the sampling capacitor, the resistance of the Thevenin equivalent source charging the sampling capacitor should also be considered. As shown in Fig. 3.44, if the \( V_{\text{od}} \to 0 \), the conduction time of the Thevenin equivalent resistor decreases. Therefore, the modulated input signal is first sampled and then held. This additional sampling mechanism downconverts the upconverted spectral content of the signal and increases the theoretical conversion gain of the passive mixer to 0dB.

As you can see in Fig. 3.43a, the gain-bandwidth product of the active and passive mixers in the voltage mode is quite similar. Assuming that the mixer has less than 1dB attenuation at the edge of the desired bandwidth, the 3dB bandwidth should be twice the desired baseband bandwidth since

\[ \frac{1}{1 + \left( \frac{f}{f_{3\text{dB}}} \right)^2} = 10^{-\frac{1}{10}} \Rightarrow f \approx \frac{f_{3\text{dB}}}{2} \] (3.123)
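
As a quick numerical check of the factor of two, the sketch below assumes a single-pole response and reads the 1dB budget as a power ratio of $10^{-1/10}$; the helper name is illustrative.

```python
import math


def bandwidth_fraction(attenuation_db):
    """Return f / f_3dB at which a single-pole response is down by attenuation_db.

    Solves 1 / (1 + (f / f_3dB)^2) = 10^(-attenuation_db / 10) for f / f_3dB.
    """
    power_ratio = 10 ** (-attenuation_db / 10)
    return math.sqrt(1 / power_ratio - 1)


if __name__ == "__main__":
    # A 1 dB budget lands close to one half of the 3 dB bandwidth.
    print(bandwidth_fraction(1.0))
    # Sanity check: an attenuation of 10*log10(2) dB lands at the 3 dB point.
    print(bandwidth_fraction(10 * math.log10(2)))
```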

A passive mixer is used to reduce the power consumption of the array elements. It also has lower flicker noise compared to its active counterpart.
Figure 3.43: Comparison of active and passive mixers in voltage mode with different peak-to-peak differential LO swings

Since passive mixers are reciprocal, the input impedance of the mixer should be investigated. [34] provides an excellent mathematical framework for calculating the input impedance. The current can be calculated as

\[ I_{in}(t) = \Pi(\omega_{LO}t)S(2\omega_{LO}t) [V_{in}(t)\Pi(\omega_{LO}t) * \{Y_L(t), S(2\omega_{LO}\tau)\}] \]  

(3.124)

where \( \{Y_L(t), S(2\omega_{LO}\tau)\} \) is the current response of the system at time \( t \) to an impulse voltage at time \( \tau \).

Figure 3.44: Equivalent Thevenin source used in the mixer model

\[
\{ Y_L(t), S(2\omega_{LO} \tau) \} = \begin{cases} 
0, & \text{if } S(2\omega_{LO} \tau) = 0 \\
\frac{\delta(t)}{R_{on}} - \frac{1}{R_{on}^2 C_S} e^{-\frac{\int_0^t S(2\omega_{LO} x)\,dx}{R_{on} C_S}}, & \text{if } S(2\omega_{LO} \tau) = 1 
\end{cases} 
\] (3.125)

By algebraic manipulation, the input current can be written as

\[
I_{in}(t) = \Pi(\omega_{LO} t) S(2\omega_{LO} t) [V_{in}(t) \Pi(\omega_{LO} t) S(2\omega_{LO} t) \ast Y'_L(t)] 
\] (3.126)

where

\[
Y'_L(t) = \frac{\delta(t)}{R_{on}} - \frac{1}{R_{on}^2 C_S} e^{-\frac{\int_0^t S(2\omega_{LO} x)\,dx}{R_{on} C_S}} 
\] (3.127)

Approximating the integral in the exponential decay with its continuous-time equivalent simplifies the above equation to

\[
Y'_L(t) \approx \frac{\delta(t)}{R_{on}} - \frac{1}{R_{on}^2 C_S} e^{-\frac{\alpha t}{R_{on} C_S}} 
\] (3.128)

where \( \alpha \) is the mean term of \( S(2\omega_{LO} t) \). In the Laplace domain, \( Y'_L(s) \) can be written as

\[
Y'_L(s) = \left[ R_{on} || (-\alpha R_{on} - R_{on}^2 C_S s) \right]^{-1} 
\] (3.129)
which can be divided into an all-pass and a low-pass section, as in Fig. 3.45. Since Eq. 3.126 is a linear equation, $I_{in}(t)$ can be calculated as the response to each section. Note that the response to the all-pass section can be calculated simply as

$$I_{in, all-pass}(t) = \Pi(\omega_{LO}t)S(2\omega_{LO}t) \left[ \frac{V_{in}(t)\Pi(\omega_{LO}t)S(2\omega_{LO}t) \ast \delta(t)}{R_{on}} \right]$$

(3.130)

$$= \frac{V_{in}(t)}{R_{on}}S(2\omega_{LO}t)^2$$

(3.131)

$$\approx \frac{\alpha^2 V_{in}(t)}{R_{on}}$$

(3.132)

where the last approximation ignores the harmonics of $S(2\omega_{LO}t)$. Computing the response to the low-pass section is more complicated. In the Laplace domain

$$I_{in, low-pass}(s) = \Pi(s) \ast S(s) \ast \left[ (V_{in}(s) \ast \Pi(s) \ast S(s)) \times Y'_{L, low-pass}(s) \right]$$

(3.133)

Assuming that the high-frequency current of the low-pass is negligible

$$I_{in, low-pass}(s) \approx \left( \frac{2}{\pi} \right)^2 \alpha^2 V_{in}(s) Y'_{L, low-pass}(s')$$

(3.134)

where $s' = j|\omega - \omega_{LO}|$. Note that for the frequency range outside the baseband bandwidth,

$$R_{in, out-of-band} \approx \alpha^2 R_{on}$$

(3.135)

and for the frequency range within the baseband bandwidth,

$$R_{in, in-band} \approx \alpha^2 \frac{R_{on}}{1 - \left( \frac{2}{\pi} \right)^2 \frac{1}{\alpha}}$$

(3.136)

Fig. 3.46 shows the ratio between the in-band input resistance and the out-of-band input resistance. Note that at low LO swings, the assumption of hard switching of the mixer does not hold, and the above model breaks. Both the on and off switches are somewhat conductive in this case, resulting in dissipative behavior with no conversion gain. As the LO swing continues to increase, the hard switching becomes more realistic, and with $\alpha \approx 1$ exactly at the conversion gain of $\frac{2}{\pi}$,

$$\frac{R_{in, in-band}}{R_{in, out-of-band}} \approx \frac{1}{1 - \left( \frac{2}{\pi} \right)^2}$$

(3.137)
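
The dependence of this ratio on $\alpha$ can be tabulated directly from Eq. 3.135 and Eq. 3.136. The sketch below is only an evaluation of those two expressions; the function name is illustrative, and the validity range is an assumption that follows from requiring a positive in-band resistance.

```python
import math


def resistance_ratio(alpha):
    """Return R_in,in-band / R_in,out-of-band from Eq. 3.135 and Eq. 3.136.

    The common alpha^2 * R_on factor cancels, leaving 1 / (1 - (2/pi)^2 / alpha).
    The expression is only positive for (2/pi)^2 < alpha <= 1.
    """
    k = (2 / math.pi) ** 2
    if not k < alpha <= 1:
        raise ValueError("expects (2/pi)^2 < alpha <= 1")
    return 1 / (1 - k / alpha)


if __name__ == "__main__":
    for alpha in (1.0, 0.9, 0.8, 0.7):
        print(alpha, resistance_ratio(alpha))
```

At $\alpha = 1$ the function reproduces Eq. 3.137, and the ratio grows as $\alpha$ falls below 1.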

As the LO swing continues to increase, $R_{on}$ decreases. However, this lower value of $R_{on}$ is reached for a shorter time, i.e., $\alpha < 1$. Thus, the ratio of the two resistances increases. Therefore, despite its existence, the baseband capacitor is not visible in the RF domain, and the passive mixer exhibits an input impedance with a relatively low quality factor. Note that the gate capacitance and the parasitic elements of the layout increase the quality factor.

Figure 3.46: Comparison of the input resistance of the passive mixers for in-band and out-of-band tones. The dashed portion of each line shows the region where the gain falls below $\frac{1}{\sqrt{2}}$.

Figure 3.47: Performance of the mixer and its preceding gain stage

Fig. 3.47b shows the performance of the mixer when driven with a 660mV LO swing. The DC bias of the switches is generated by a current mirror biased at a current density of $100\mu A/\mu m$. While the conversion gain itself has a high bandwidth, the overall conversion gain has a limited bandwidth of $2 \times 7\text{GHz}$ due to an error in the matching network of the previous buffer stage. The problem with the matching network is illustrated in Fig. 3.47a. While the AC voltage gain at the output of the amplifier is broadband, the low coupling factor of the transformer reduces the overall bandwidth at the input of the mixer. Note that a 3dB attenuation budget cannot be used exclusively in the mixer since it is cascaded with the rest of the chain. Fortunately, my colleague Ethan Chou caught this error on the second tapeout and corrected it.

\footnote{This requires the baseband bandwidth to be much smaller than the carrier frequency, a condition that does not hold for wideband communication links.}

Figure 3.48: Mixer implementation

### 3.8 Baseband Amplifier

The previous millimeter-wave transceiver used a Cherry-Hooper amplifier [35] (Fig. 3.49). Despite its broadband performance and high gain, it had some problems:
• The amplifiers were designed as pseudo-differential stages. This topology is prone to common-mode noise since any unwanted coupling (from surrounding circuits or the supply network) goes through a high-gain amplification chain. Although the output is differential, the common-mode noise may saturate the intermediate blocks, resulting in a low differential gain.

• Since the amplifiers themselves are self-biased, the current consumption of each stage is highly process-dependent.

It should be noted that the implemented Cherry-Hooper amplifier still consists of multiple cascaded amplifiers. Therefore, it is instructive to investigate why and how a Cherry-Hooper outperforms cascaded amplifiers. Note that if each stage has a simple first-order frequency response ([36])

\[ A(s) = \frac{A_0}{1 + \frac{s}{\omega_0}} \]  

the bandwidth for \( \alpha \) attenuation is

\[ 1 + \left( \frac{\omega}{\omega_0} \right)^2 = \alpha^{-1} \Rightarrow BW = \omega_0\sqrt{\alpha^{-1} - 1} \]  

and therefore the gain-bandwidth product \( GBW_0 = A_0\omega_0\sqrt{\alpha^{-1} - 1} \). By cascading \( N \) of these stages one obtains

\[ A_N(s) = \frac{A_0^N}{\left(1 + \frac{s}{\omega_0}\right)^N} \]

which corresponds to a bandwidth of

\[ \left( 1 + \left( \frac{\omega}{\omega_0} \right)^2 \right)^N = \alpha^{-1} \Rightarrow BW_N = \omega_0\sqrt{\alpha^{-\frac{1}{N}} - 1} \]
and thus $GBW_N = A_0^N \omega_0 \sqrt{\alpha^{-\frac{1}{N}} - 1}$. With $A_{tot} = A_0^N$ denoting the total gain, the gain-bandwidth expansion can be calculated as

$$\frac{GBW_N}{GBW_0} = \frac{A_0^N \omega_0 \sqrt{\alpha^{-\frac{1}{N}} - 1}}{A_0 \omega_0 \sqrt{\alpha^{-1} - 1}} = (A_0^N)^{1-\frac{1}{N}} \sqrt{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}} = A_{tot}^{1-\frac{1}{N}} \sqrt{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}}$$

Note that the maximum gain-bandwidth expansion occurs at

$$\frac{\partial GBW_N}{\partial N} = 0 \Rightarrow N_{opt} = \frac{-\ln(\alpha)}{\ln\left(\frac{2 \ln(A_{tot})}{2 \ln(A_{tot}) + \ln(\alpha)}\right)} \approx 2 \ln(A_{tot})$$

where the last approximation works at a high total gain and less than 3dB attenuation. Given the optimal number of stages, the optimal gain per stage can be easily calculated as $A_{opt} = \sqrt{e}$.
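
The optimum can be cross-checked by brute force. The sketch below assumes identical first-order stages and a fixed total gain; the closed form in `n_opt_exact` is obtained by setting the derivative of the expansion ratio to zero and should be read as an assumption of this example, as should all function names.

```python
import math


def gbw_expansion(n, a_tot, alpha=0.5):
    """GBW_N / GBW_0 for n identical first-order stages with total gain a_tot."""
    bw_term = math.sqrt((alpha ** (-1 / n) - 1) / (1 / alpha - 1))
    return a_tot ** (1 - 1 / n) * bw_term


def n_opt_exact(a_tot, alpha=0.5):
    """Stationary point of gbw_expansion with n treated as a real number."""
    x = 2 * math.log(a_tot)
    return -math.log(alpha) / math.log(x / (x + math.log(alpha)))


if __name__ == "__main__":
    a_tot = 100.0  # hypothetical total voltage gain (40 dB)
    best_n = max(range(1, 40), key=lambda n: gbw_expansion(n, a_tot))
    # Integer optimum, real-valued optimum, and the 2*ln(A_tot) approximation.
    print(best_n, n_opt_exact(a_tot), 2 * math.log(a_tot))
    # Gain per stage at the approximate optimum is sqrt(e).
    print(a_tot ** (1 / (2 * math.log(a_tot))), math.sqrt(math.e))
```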

Now assume that each stage has a maximally flat $M$-th order Butterworth frequency response,

$$|A(\omega)| = \frac{A_0}{\sqrt{1 + \left(\frac{\omega}{\omega_0}\right)^{2M}}}$$

and its bandwidth for $\alpha$ attenuation is defined as

$$1 + \left(\frac{\omega}{\omega_0}\right)^{2M} = \alpha^{-1} \Rightarrow BW = \omega_0 \sqrt[2M]{\alpha^{-1} - 1}$$

and thus $GBW_0 = A_0 \omega_0 \sqrt[2M]{\alpha^{-1} - 1}$. Cascading $N$ such amplifiers results in a new amplifier with a frequency response of

$$|A_N(\omega)| = \frac{A_0^N}{\left(\sqrt{1 + \left(\frac{\omega}{\omega_0}\right)^{2M}}\right)^N}$$

The bandwidth of the new amplifier can be defined as

$$\left(1 + \left(\frac{\omega}{\omega_0}\right)^{2M}\right)^N = \alpha^{-1} \Rightarrow BW_N = \omega_0 \sqrt[2M]{\alpha^{-\frac{1}{N}} - 1}$$

As before, the gain-bandwidth expansion can be defined as

$$\frac{GBW_N}{GBW_0} = A_{tot}^{1-\frac{1}{N}} \sqrt[2M]{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}}$$
which peaks at
\[
\frac{\partial \text{GBW}_{N}}{\partial N} = 0 \Rightarrow N_{\text{opt}} = \frac{- \ln(\alpha)}{\ln\left(\frac{2M \ln(A_{\text{tot}})}{2M \ln(A_{\text{tot}}) + \ln(\alpha)}\right)} \approx 2M \ln(A_{\text{tot}}) \tag{3.151}
\]
and the optimal gain per stage is \( A_{M,\text{opt}} = \sqrt[2M]{e}. \)

When a total fan out of \( F_{\text{tot}} \) from input to output is required, the gain bandwidth expansion can be calculated as
\[
\frac{\text{GBW}_{N,F_{\text{tot}}}}{\text{GBW}_0} = A_{\text{tot}}^{1 - \frac{1}{N}} \sqrt[2M]{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}} \frac{1}{F_{\text{tot}}^{\frac{1}{N}}} \tag{3.152}
\]
where it is assumed that the natural frequency of each stage scales with \( \frac{1}{F_{\text{tot}}^{\frac{1}{N}}} \). To find the optimal number of stages
\[
\frac{\partial \text{GBW}_{N,F_{\text{tot}}}}{\partial N} = 0 \Rightarrow N_{\text{opt}} = \frac{- \ln(\alpha)}{\ln\left(\frac{2M \ln(A_{\text{tot}}) + 2M \ln(F_{\text{tot}})}{2M \ln(A_{\text{tot}}) + 2M \ln(F_{\text{tot}}) + \ln(\alpha)}\right)} \approx 2M \ln(A_{\text{tot}}) + 2M \ln(F_{\text{tot}}) \tag{3.153}
\]
The optimal gain per stage and fan-out per stage can be calculated as
\[
A_{\text{opt},F_{\text{tot}}} = \left( \sqrt[2M]{e} \right)^{\frac{1}{1+\frac{\ln(F_{\text{tot}})}{\ln(A_{\text{tot}})}}} \tag{3.154}
\]
\[
F_{\text{opt}} = \left( \sqrt[2M]{e} \right)^{\frac{1}{1+\frac{\ln(A_{\text{tot}})}{\ln(F_{\text{tot}})}}} \tag{3.155}
\]

Note that the low gain of each stage requires multiple stages in the optimal case. Consider the power of \( N \) stages for the gain-bandwidth expansion,
\[
\text{Power}_{\text{DC},N} \propto 1 + \left( F_{\text{tot}}^{\frac{1}{N}} \right) + \left( F_{\text{tot}}^{\frac{1}{N}} \right)^2 + \ldots + \left( F_{\text{tot}}^{\frac{1}{N}} \right)^{N-1} \tag{3.156}
\]
\[
\propto \frac{F_{\text{tot}} - 1}{F_{\text{tot}}^{\frac{1}{N}} - 1} \tag{3.157}
\]
Therefore, we can define the efficiency of the expansion as
\[
\eta = \frac{\text{GBW}_{N,F_{\text{tot}}}}{\text{GBW}_0} \frac{\text{Power}_{\text{DC},1}}{\text{Power}_{\text{DC},N}} = A_{\text{tot}}^{1 - \frac{1}{N}} \sqrt[2M]{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}} \frac{1}{F_{\text{tot}}^{\frac{1}{N}}} \frac{F_{\text{tot}}^{\frac{1}{N}} - 1}{F_{\text{tot}} - 1} \tag{3.158}
\]
Assuming a low fan-out per stage and a high number of stages
\[
\text{Power}_{\text{DC}} \propto \frac{F_{\text{tot}} - 1}{\ln(F_{\text{tot}})} N \tag{3.159}
\]
which means that

\[ \eta \approx A_{tot}^{1 - \frac{1}{N}} \sqrt[2M]{\frac{\alpha^{-\frac{1}{N}} - 1}{\alpha^{-1} - 1}} \frac{1}{F_{\text{tot}}^{\frac{1}{N}}} \frac{\ln(F_{\text{tot}})}{F_{\text{tot}} - 1} \frac{1}{N} \]  

(3.160)

To find the optimal efficiency,

\[ \frac{\partial \eta}{\partial N} = 0 \Rightarrow 2 \ln(A_{tot})M + 2M \ln(F_{\text{tot}}) = 2MN_{\text{opt}} + \frac{\ln(\alpha)}{\alpha^{\frac{1}{N_{\text{opt}}}} - 1} \]  

(3.161)

To simplify the answer, note that

\[ \lim_{\alpha \to 1} \frac{\ln(\alpha)}{\alpha^{\frac{1}{N_{\text{opt}}}} - 1} = N_{\text{opt}} \]  

(3.162)

and therefore,

\[ N_{\text{opt,Power}} \approx \frac{2 \ln(A_{tot})M + 2M \ln(F_{\text{tot}})}{2M + 1} \]  

(3.163)

\[ A_{\text{opt,Power},F_{\text{tot}}} = e^{\frac{M + \frac{1}{2}}{M} \frac{\ln(A_{\text{tot}})}{\ln(A_{\text{tot}}) + \ln(F_{\text{tot}})}} \]  

(3.164)

\[ F_{\text{opt,Power}} = e^{\frac{M + \frac{1}{2}}{M} \frac{\ln(F_{\text{tot}})}{\ln(A_{\text{tot}}) + \ln(F_{\text{tot}})}} \]  

(3.165)
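
The power-aware optimum of Eq. 3.163 can be compared against a direct search. The sketch below assumes that the efficiency is the gain-bandwidth expansion with fan-out (an $M$-th order Butterworth stage whose natural frequency scales with $F_{tot}^{-1/N}$) divided by the geometric power sum of Eq. 3.157; this reading of the efficiency, the function names, and the example values are assumptions of the example.

```python
import math


def efficiency(n, a_tot, f_tot, m=1, alpha=0.5):
    """Gain-bandwidth expansion per unit DC power for n stages (f_tot > 1)."""
    bw = ((alpha ** (-1 / n) - 1) / (1 / alpha - 1)) ** (1 / (2 * m))
    gbw = a_tot ** (1 - 1 / n) * bw / f_tot ** (1 / n)
    power = (f_tot - 1) / (f_tot ** (1 / n) - 1)  # geometric sum, Eq. 3.157
    return gbw / power


def n_opt_power(a_tot, f_tot, m=1):
    """Approximate power-efficient stage count, Eq. 3.163."""
    return 2 * m * (math.log(a_tot) + math.log(f_tot)) / (2 * m + 1)


if __name__ == "__main__":
    a_tot, f_tot = 31.6, 10.0  # hypothetical: 30 dB total gain, fan-out of 10
    best_n = max(range(1, 30), key=lambda n: efficiency(n, a_tot, f_tot))
    print(best_n, n_opt_power(a_tot, f_tot))
    # Per-stage gain and fan-out implied by the approximate optimum.
    n = n_opt_power(a_tot, f_tot)
    print(a_tot ** (1 / n), f_tot ** (1 / n))
```

Because Eq. 3.163 relies on the limits $\alpha \to 1$ and a small fan-out per stage, the integer optimum from the search is only expected to lie near the closed-form value, not to coincide with it.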

Figure 3.50: Simplified model of the Cherry-Hooper topology

Let us now analyze a simple Cherry-Hooper design from Fig. 3.50

\[ V_{out}C_2 s + g_{m2}V_X = \frac{V_X - V_{out}}{R_f} \]  

(3.166)

\[ -g_{m1}V_{in} = V_X C_1 s + \frac{V_X - V_{out}}{R_f} \]  

(3.167)
and the gain can be calculated as

\[ \frac{V_{out}}{V_{in}}(s) = \frac{g_{m1}}{g_{m2}} \cdot \frac{g_{m2}R_f - 1}{1 + \frac{C_1 + C_2}{g_{m2}} s + \frac{C_1 C_2 R_f}{g_{m2}} s^2} \tag{3.168} \]

For a maximally flat response

\[ g_{m2} R_f = \frac{1}{2} \frac{(C_1 + C_2)^2}{C_1 C_2} \tag{3.169} \]

should be satisfied. Note that the natural frequency of this Cherry-Hooper chain is

\[ \omega_{C-H} = \frac{\sqrt{2}\, g_{m2}}{C_1 + C_2} \tag{3.170} \]

The DC gain can be calculated as

\[ A_{C-H} = \frac{g_{m1}}{g_{m2}} \left( \frac{1}{2} \frac{(C_1 + C_2)^2}{C_1 C_2} - 1 \right) \tag{3.171} \]

Assume that \( \frac{g_{m1}}{g_{m2}} = \beta \) satisfies the optimal condition for the maximum gain bandwidth. In this case

\[ C_1 = C_g + \beta C_d \tag{3.172} \]
\[ C_2 = \beta C_g + C_d \tag{3.173} \]

where \( C_g \) and \( C_d \) are the gate capacitance and drain capacitance of a transconductance stage, respectively. Therefore,

\[ A_{C-H} = \beta \left( \frac{1}{2} \frac{\left( (C_g + \beta C_d) + (\beta C_g + C_d) \right)^2}{(C_g + \beta C_d)(\beta C_g + C_d)} - 1 \right) \tag{3.174} \]

Since the optimal gain of \( A_{C-H} = \sqrt{e} \) is very close to 1, we will first solve this equation for \( A_{C-H} = 1 \) and then adjust \( \beta \) to reach the optimal value.

\[ A_{C-H}(\beta) = 1 \Rightarrow \beta = 1 \tag{3.175} \]
\[ A_{C-H}(1 + \Delta \beta) \approx \left. \frac{\partial A_{C-H}(\beta)}{\partial \beta} \right|_{\beta=1} \Delta \beta \tag{3.176} \]

To avoid tedious derivations, the gain equation can be reformulated as follows.

\[ A_{C-H} = \beta \left( \frac{1}{2} \frac{\left( (C_g + \beta C_d) + (\beta C_g + C_d) \right)^2}{(C_g + \beta C_d)(\beta C_g + C_d)} - 1 \right) \tag{3.177} \]
\[ = \beta \left( \frac{(C_g + \beta C_d)^2 + (\beta C_g + C_d)^2}{2(C_g + \beta C_d)(\beta C_g + C_d)} - 1 + 1 \right) \tag{3.178} \]
\[ = \beta \left( \frac{\left( (C_g + \beta C_d) - (\beta C_g + C_d) \right)^2}{2(C_g + \beta C_d)(\beta C_g + C_d)} + 1 \right) \tag{3.179} \]
\[ = \beta \left( \frac{(C_g - C_d)^2(1 - \beta)^2}{2(C_g + \beta C_d)(\beta C_g + C_d)} + 1 \right) \tag{3.180} \]
Note that the first term in the parenthesis has two zeros at $\beta = 1$. It follows,

$$ \frac{\partial A_{C-H}(\beta)}{\partial \beta} \bigg|_{\beta=1} = 0 \Rightarrow A_{C-H} \approx \beta = \frac{g_{m1}}{g_{m2}} $$

(3.182)

which requires that the successive stages have $\frac{g_{m1}}{g_{m2}} = e$. Compared to a simple first-order amplifier,

$$ GBW_{C-H,-3dB} = \sqrt{2} \frac{g_{m2}}{(C_g + \beta C_d) + (\beta C_g + C_d)} \beta $$

(3.183)

$$ = \sqrt{2} \frac{g_{m2}}{(1 + \beta) (C_g + C_d)} \beta $$

(3.184)

$$ = \frac{\sqrt{2} \beta}{1 + \beta} \cdot GBW_{0,-3dB} $$

(3.185)

where $GBW_{0,-3dB} = \frac{g_{m2}}{C_g + C_d}$ is the product of gain and $-3dB$ attenuation bandwidth. Note that for the optimal gain $GBW_{C-H,-3dB} = 0.62 GBW_{0,-3dB}$, which shows that the Cherry-Hooper amplifier actually performs worse compared to a single stage. However, when $\beta > \sqrt{2} + 1 \approx 2.4$, the Cherry-Hooper wins over its first-order single-stage counterpart.

Since the Cherry-Hooper amplifier consists of two active components, it is also instructive to compare it to a 2-stage first-order amplifier. The Cherry-Hooper topology wins when $GBW_{C-H,-3dB} > GBW_{N=2,-3dB}$, which means that

$$ \frac{\sqrt{2} \beta}{1 + \beta} > \left. \frac{GBW_N}{GBW_0} \right|_{N=2, \alpha=\frac{1}{2}, A_{tot}=\beta} $$

(3.186)

$$ \frac{\sqrt{2} \beta}{1 + \beta} > \sqrt{\beta} \sqrt{\sqrt{2} - 1} $$

(3.187)

This condition is satisfied as long as

$$ \sqrt{2} - 1 < \beta < \sqrt{2} + 1 $$

(3.188)

which means that an optimally designed 2-stage first-order amplifier still outperforms the Cherry-Hooper topology, albeit only slightly. However, given the sharper out-of-band roll-off, it is better suited in a cascaded chain. Fig. 3.51 shows a comparison of the different designs.
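
The range in Eq. 3.188 can be reproduced numerically. The sketch below assumes that both topologies are normalized to the same single-stage $GBW_{0,-3dB}$ and that the 2-stage reference uses $N = 2$, $\alpha = \frac{1}{2}$ and $A_{tot} = \beta$; the function names are illustrative.

```python
import math


def cherry_hooper_ratio(beta):
    """GBW of the Cherry-Hooper stage relative to GBW_0, Eq. 3.185."""
    return math.sqrt(2) * beta / (1 + beta)


def two_stage_ratio(beta, alpha=0.5):
    """GBW of two cascaded first-order stages relative to GBW_0 (A_tot = beta)."""
    bw_term = math.sqrt((alpha ** -0.5 - 1) / (1 / alpha - 1))
    return math.sqrt(beta) * bw_term


if __name__ == "__main__":
    low, high = math.sqrt(2) - 1, math.sqrt(2) + 1
    for beta in (0.3, low, 1.0, 2.0, high, 3.0):
        # The Cherry-Hooper ratio is the larger one only between low and high.
        print(round(beta, 3), cherry_hooper_ratio(beta), two_stage_ratio(beta))
```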

So far, it has been shown that the advantage of the Cherry-Hooper topology is the sharper slope for out-of-band signal suppression. Shunt peaking with active inductors should be investigated as a means of improving the bandwidth of a single-stage amplifier. Using the KVL-KCL equations, the frequency response of the circuit of Fig. 3.52 can be calculated as

$$ \frac{V_{out}(s)}{V_{in}(s)} = - \frac{g_{m1}}{g_{m2}} \cdot \frac{R_f C_2 s + 1}{1 + \frac{C_1+C_2}{g_{m2}} s + \frac{C_1 C_2 R_f}{g_{m2}} s^2} $$

(3.189)
For a maximally flat design,

\[ \frac{\partial}{\partial \omega} \left| \frac{V_{\text{out}}(\omega)}{V_{\text{in}}(\omega)} \right| \bigg|_{\omega=0} = 0 \]  
(3.190)

\[ \frac{\partial^2}{\partial \omega^2} \left| \frac{V_{\text{out}}(\omega)}{V_{\text{in}}(\omega)} \right| \bigg|_{\omega=0} = 0 \]  
(3.191)

\[ \frac{\partial^3}{\partial \omega^3} \left| \frac{V_{\text{out}}(\omega)}{V_{\text{in}}(\omega)} \right| \bigg|_{\omega=0} = 0 \]  
(3.192)
The first and third derivatives are always satisfied. The second derivative is satisfied when

$$(R_f C_2)^2 = \frac{(C_1 + C_2)^2}{g_{m2}^2} - \frac{2C_1C_2R_f}{g_{m2}} \tag{3.193}$$

and

$$g_{m2}R_f = \frac{\sqrt{C_1^2 + (C_1 + C_2)^2} - C_1}{C_2} \tag{3.194}$$

Note that the DC gain is given by $g_{m1}/g_{m2} = \beta$. So let us assume that the chain has a per-stage fan out of $f$,

$$C_1 = \beta C_d + C_d + f C_g \tag{3.195}$$
$$C_2 = C_g \tag{3.196}$$

For most practical cases, $C_2 \ll C_1$ and

$$g_{m2}R_f \approx \frac{\sqrt{2}}{2} + (\sqrt{2} - 1) \frac{C_1}{C_2} \tag{3.197}$$
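
A short numerical check of Eq. 3.194 and Eq. 3.197 is sketched below. It assumes the transfer function of Eq. 3.189 and takes the flatness condition to be that the $\omega^2$ terms of the squared magnitude cancel between numerator and denominator; the function names and the normalized component values are illustrative.

```python
import math


def gm2_rf_flat(c1, c2):
    """Exact g_m2 * R_f for a maximally flat response, Eq. 3.194."""
    return (math.sqrt(c1 ** 2 + (c1 + c2) ** 2) - c1) / c2


def gm2_rf_approx(c1, c2):
    """Approximation of g_m2 * R_f for c2 << c1, Eq. 3.197."""
    return math.sqrt(2) / 2 + (math.sqrt(2) - 1) * c1 / c2


def flatness_residual(c1, c2, gm2=1.0):
    """Difference between the omega^2 coefficients of |numerator|^2 and |denominator|^2."""
    rf = gm2_rf_flat(c1, c2) / gm2
    a = (c1 + c2) / gm2      # s coefficient of the denominator
    b = c1 * c2 * rf / gm2   # s^2 coefficient of the denominator
    return (rf * c2) ** 2 - (a ** 2 - 2 * b)


if __name__ == "__main__":
    c1, c2 = 10.0, 1.0  # hypothetical normalized capacitances
    print(gm2_rf_flat(c1, c2), gm2_rf_approx(c1, c2))
    print(flatness_residual(c1, c2))  # should be numerically zero
```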

The definition of a natural frequency is more complicated here since this system has one zero and two poles. Instead, we use the natural frequency of the similar Butterworth response, which has the same 4-th derivative

$$\omega_n : \left. \frac{\partial^4 V_{out}(\omega)}{\partial \omega^4} \right|_{\omega=0} = \left. \frac{\partial^4 \left( \frac{1}{1+\left(\frac{\omega}{\omega_n}\right)^4} \right)}{\partial \omega^4} \right|_{\omega=0} \tag{3.198}$$

Substitute the required $R_f$ into the gain equation and take the derivatives,

$$\frac{12C_1^2 \left( -3C_1^2 + 2\sqrt{2C_1^2 + 2C_1C_2 + C_2^2}\, C_1 - 2C_1C_2 - C_2^2 \right)}{g_{m2}^4} = \frac{-24}{\omega_n^4} \tag{3.199}$$

Assuming $C_2 \ll C_1$, the natural frequency can be approximated as

$$\omega_n \approx g_{m2} \frac{\sqrt{2}}{C_1 \sqrt[4]{6 - 4\sqrt{2}}} \left( 1 - \frac{1}{4 - 2\sqrt{2}} \frac{C_2}{C_1} \right) \tag{3.200}$$

and the $-3$dB gain-bandwidth product is

$$GBW_{ActiveInd} \approx g_{m1} \frac{\sqrt{2}}{C_1 \sqrt[4]{6 - 4\sqrt{2}}} \left( 1 - \frac{1}{4 - 2\sqrt{2}} \frac{C_2}{C_1} \right) \tag{3.201}$$

$$\approx GBW_0 \frac{\sqrt{2}}{\sqrt[4]{6 - 4\sqrt{2}}} \left( 1 - \frac{1}{4 - 2\sqrt{2}} \frac{C_2}{C_1} \right) \tag{3.202}$$
Note that the gain-bandwidth product increases by about 85% compared to a simple first-order stage. Fig. 3.53 shows that the Butterworth model used here agrees well with the actual transfer function.
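
The 85% figure can be reproduced with a few lines of arithmetic. The sketch below assumes that the equivalent Butterworth natural frequency satisfies $\omega_n^4 = 2/b^2$, where $b = C_1 C_2 R_f / g_{m2}$ is the $s^2$ coefficient of Eq. 3.189 and $R_f$ follows Eq. 3.194, and that the radical in Eq. 3.200 is a fourth root (the reading that yields a factor of about 1.85). Function names and component values are illustrative.

```python
import math


def omega_n_direct(c1, c2, gm2=1.0):
    """Equivalent Butterworth natural frequency from the s^2 coefficient."""
    rf = (math.sqrt(c1 ** 2 + (c1 + c2) ** 2) - c1) / (c2 * gm2)  # Eq. 3.194
    b = c1 * c2 * rf / gm2
    return (2 / b ** 2) ** 0.25


def omega_n_closed(c1, c2, gm2=1.0):
    """First-order expansion in c2 / c1 (fourth-root reading of Eq. 3.200)."""
    lead = math.sqrt(2) / (6 - 4 * math.sqrt(2)) ** 0.25
    return gm2 / c1 * lead * (1 - c2 / ((4 - 2 * math.sqrt(2)) * c1))


if __name__ == "__main__":
    c1, c2 = 1.0, 0.01  # hypothetical normalized capacitances with c2 << c1
    print(omega_n_direct(c1, c2), omega_n_closed(c1, c2))
    # Leading factor relative to the first-order stage bandwidth g_m2 / c1.
    print(math.sqrt(2) / (6 - 4 * math.sqrt(2)) ** 0.25)
```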

Having demonstrated the effectiveness of an active inductor to increase the bandwidth of each stage, PMOS and NMOS active inductors are investigated, as in Fig. 3.54. The problem with these topologies is that the current density of the active load must be higher than that of the differential pair to achieve gains greater than 1. This means that for a fixed $GBW_0$ of the differential pairs, a higher parasitic $C_2$ can be expected, which lowers the $GBW_{ActiveInd}$. To obtain the near-optimal current density for all devices, the topology of Fig. 3.55 is chosen. In this design, the active devices have the near-optimal current density for maximum speed. Moreover, both bandwidth and gain are controllable by triode devices (purple and green transistors, respectively). At the maximum gain setting, with the same mobility and intrinsic gain for PMOS and NMOS devices, the DC gain can be calculated as

\[ I_{m2} = \frac{2}{3} I_{m1} \Rightarrow g_{m2} \approx \sqrt{\frac{1}{1} \times \frac{2}{3}} g_{m1}, g_{o2} \approx \sqrt{\frac{1}{1} \times \frac{2}{3}} g_{o1} \]  
\[ I_{m3} = \frac{1}{3} I_{m1} \Rightarrow g_{m3} \approx \sqrt{\frac{1}{2} \times \frac{1}{3}} g_{m1}, g_{o3} \approx \sqrt{\frac{1}{2} \times \frac{1}{3}} g_{o1} \]  

where the transconductance of the device is \( g_m = \sqrt{2\mu C_{ox} \frac{W}{L} I_{DC}} \). Hence,

\[ G = g_{m1} r_{o1} \left( 1 + \sqrt{\frac{2}{3}} \right) \left( 1 + \sqrt{\frac{2}{3}} + \sqrt{\frac{1}{2} \times \frac{1}{3}} + \sqrt{\frac{1}{2} \times \frac{1}{3}}\, g_{m1} r_{o1} \right)^{-1} \]  

which corresponds to about 2.8 for an intrinsic gain of 10. This is very close to the optimal gain in a power-efficient cascaded chain for a total gain of 30dB and a fanout factor of 10. Fig. 3.56 shows the simulated performance of the cascaded chain. Table 3.2 compares this work with previous work. Note that both DC gain and fan-out factor should be considered for a fair comparison between different results.
Figure 3.56: Performance of cascaded active inductor stages

<table>
<thead>
<tr>
<th>Work</th>
<th>CMOS Tech.</th>
<th>Bandwidth</th>
<th>Gain</th>
<th>DC Power</th>
<th>Fan-out</th>
</tr>
</thead>
<tbody>
<tr>
<td>[35]</td>
<td>28nm</td>
<td>19.2GHz</td>
<td>28.3dB</td>
<td>10.3mW</td>
<td>1</td>
</tr>
<tr>
<td>This work</td>
<td>28nm</td>
<td>9.64GHz</td>
<td>27.2dB</td>
<td>10.6mW</td>
<td>7</td>
</tr>
</tbody>
</table>

Table 3.2: Comparison of the baseband amplifier with earlier work

Figure 3.57: Baseband chain

The disadvantage of this design is that the output swing is limited. Therefore, an additional stage is inserted that does not contain an active inductor, as in Fig. 3.57. The differential feedback provides the DC bias for the chain. Remember that common-mode feedback is necessary for an amplifier with a differential input and output. However, for this amplifier, the common-mode gain can be approximated as

$$G_{cm} \approx g_{m2} (g_{o2} + g_{o3} + g_{m3})^{-1}$$  \hspace{1cm} (3.207)

$$\approx \sqrt{\frac{2}{3}} g_{m1} r_{o1} \left( \sqrt{\frac{2}{3}} + \sqrt{\frac{1}{2} \times \frac{1}{3}} + \sqrt{\frac{1}{2} \times \frac{1}{3}}\, g_{m1} r_{o1} \right)^{-1}$$  \hspace{1cm} (3.208)

which is approximately 1.8 for an intrinsic gain of 10. This common-mode gain is achieved by using an odd number of stages to ensure that the feedback polarity is negative once the loop is closed. To avoid loop compensation, the feedback resistor is implemented using triode devices. This results in the dominant pole of the feedback loop being at the input of the chain. Fig. 3.58 shows the layout of the baseband amplifier. Note that to avoid latch-up and ESD failures, the last stage is implemented with individual guard rings for each set of PMOS and NMOS.

To cope with the ESD and pad capacitance, series inductors are used to form an artificial transmission line, as shown in Fig. 3.59. Due to the limited area available for the tape out and the congestion of the phased array units, the final performance is suboptimal. Fig. 3.60 shows the performance of the entire baseband chain, including the pads and ESD units. The entire baseband chain consumes 20mW.

Figure 3.59: Using an artificial T-line to increase the bandwidth

Figure 3.60: Baseband Chain Performance

Figure 3.61: The layout of the baseband amplifier

3.9 Full Receiver Performance

Figure 3.62: 140GHz receiver taped out in 28nm CMOS technology.

This receiver is implemented in a 28nm bulk CMOS process. The die photo and layout of the chip are shown in Fig. 3.62. The receiver consumes 60mW; the breakdown is given in Fig. 3.63.

Figure 3.63: Power consumption of the receiver

Fig. 3.64 shows the performance of the receiver chain. While the 3dB bandwidth of the output is 11GHz, a bandwidth of 18GHz with a noise figure of 3dB is achievable when equalization is applied. Details of the performance can be found in Table 3.3, where this work is compared with other published work. SOI processes have better performance due to superior devices and RF-optimized back-end metallization.

<table>
<thead>
<tr>
<th>Metric</th>
<th>This Work</th>
<th>[37]</th>
<th>[19]</th>
<th>[38]</th>
<th>[39]</th>
<th>[40]</th>
<th>[20]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Carrier Frequency (GHz)</td>
<td>140</td>
<td>140</td>
<td>147</td>
<td>144</td>
<td>140</td>
<td>135</td>
<td>113</td>
</tr>
<tr>
<td>RF Bandwidth (GHz)</td>
<td>11</td>
<td>20</td>
<td>16</td>
<td>14</td>
<td>12</td>
<td>20</td>
<td>10</td>
</tr>
<tr>
<td>RX Gain (dB)</td>
<td>48</td>
<td>43</td>
<td>27.5</td>
<td>26.5</td>
<td>18</td>
<td>27</td>
<td>43.8</td>
</tr>
<tr>
<td>RX NF (dB)</td>
<td>10</td>
<td>11</td>
<td>6.4</td>
<td>6.4</td>
<td>5.5</td>
<td>8.5</td>
<td>11.2</td>
</tr>
<tr>
<td>Power Consumption (W)</td>
<td>0.060</td>
<td>NA</td>
<td>0.145</td>
<td>0.133</td>
<td>0.125</td>
<td>0.198</td>
<td>0.500</td>
</tr>
</tbody>
</table>

Table 3.3: Comparison of the receiver with the state-of-the-art

Fig. 3.65 shows the gain of the chain as a function of the input power. Note that the linearity of the circuit is mainly limited by the output swing of the baseband amplifier in the high and low gain modes.

Figure 3.65: Translation gain vs. input power

Chapter 4

Chip-to-Package Transition

4.1 Packaging Challenges at High Frequencies

The transition of the signal from the chip to the printed circuit board (PCB) becomes increasingly difficult as the carrier frequency increases. Several factors play a role in this:

- Although wire-bond is still the primary packaging solution for the frequency range above 100GHz, it is not reliable for massive array deployment. Since PCB fabrication capabilities dictate wire-bond length [41], the parasitic inductance of wire-bonds reduces the achievable bandwidth even with tuning techniques [42]. On the other hand, the horizontal alignment and vertical dimensions of flip-chip technology can be controlled with an accuracy of ten microns or less [43].

- Most PCBs have limited resolution in trace spacing and trace width. As a rule of thumb, for a transmission line, the return path should be closer than $\frac{\lambda_{\text{min}}}{10}$ to the signal path, where $\lambda_{\text{min}}$ is the wavelength at the maximum operating frequency. With a trace spacing of 6mil $\approx 150\mu$m on an FR-4 dielectric (with a relative dielectric constant of 4), transmission lines are limited to a maximum frequency of 100GHz.

- The diameter of the bumps or studs used for the transition determines the minimum distance between signal and return current. AuSn microbumps with a diameter of 10$\mu$m, for example, have shown return loss of better than 10dB up to 250GHz [44]. Unfortunately, these small bumps are costly and require much higher accuracy in PCB fabrication and chip assembly. As the spacing and diameter of the balls increase, unintended resonant modes can create notches near the band of interest.

- If the transition is not properly shielded, it can become a radiating element. For example, at a pitch of 150\(\mu\)m, two pads become a radiating dipole at

\[
f_{rad} = \frac{v_0}{2\sqrt{\epsilon_{si}} \times 150\mu m} \approx 290\text{GHz}
\]  (4.1)

- Metal planes on the PCB adjacent to the metal planes of the chip support the parallel-plate propagation mode. Once the new wave is excited, it is reflected from nearby bumps, or partially radiated and partially reflected from the chip boundaries. The reflected wave changes the effective impedance at the excitation point. In addition, if the reflected wave adds destructively with the original wave, notch behavior occurs in the transfer characteristic.
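
As a quick numerical check of two figures quoted in the list above, the sketch below evaluates the $\lambda_{\text{min}}/10$ rule for a 150$\mu$m trace spacing on a dielectric with a relative permittivity of 4, and the half-wavelength dipole frequency for a 150$\mu$m pad pitch in silicon. The silicon permittivity of 11.9 is an assumed textbook value, and the function names are illustrative only.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def max_line_frequency(spacing_m, eps_r, fraction=10.0):
    """Frequency at which `spacing_m` equals lambda/fraction in the dielectric."""
    return C0 / (fraction * spacing_m * math.sqrt(eps_r))


def dipole_radiation_frequency(pitch_m, eps_r):
    """Frequency at which `pitch_m` equals half a wavelength in the dielectric."""
    return C0 / (2.0 * pitch_m * math.sqrt(eps_r))


if __name__ == "__main__":
    # 6 mil (about 150 um) spacing on FR-4 with eps_r = 4, as in the text
    f_line = max_line_frequency(150e-6, eps_r=4.0)
    # 150 um pad pitch; eps_r = 11.9 for silicon is an ASSUMED value
    f_rad = dipole_radiation_frequency(150e-6, eps_r=11.9)
    print(f"lambda/10 limit:  {f_line / 1e9:.0f} GHz")
    print(f"dipole radiation: {f_rad / 1e9:.0f} GHz")
```

Under these assumptions the two results land close to the 100GHz and 290GHz values quoted in the list.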

The suitability of flip-chip packages for millimeter-wave applications is well explained in [43]. Furthermore, various non-idealities occurring in flip-chip packages are described in [45]:

• Detuning: the presence of a semiconductor dielectric on the PCB changes the effective dielectric constant on the transmission lines [45]. Therefore, it is necessary to keep high-frequency I/Os at the periphery of the chip and minimize the distance between the signal pad and the chip edge. In addition, it is often essential to use an underfill between the chip and the board to increase the mechanical reliability of the assembled chip. While the volume of the added underfill is controlled, its exact shape and extension beyond the chip edge are unknown, making it difficult to model its loss and detuning effect properly.

• Excitation of parasitic modes: Considering only the semiconductor and its metal plane, this structure supports the propagation of TE and TM waves, commonly referred to as surface modes. While a lower thickness of the semiconductor shifts the cut-off frequency of these parasitic modes to a higher frequency range, the mechanical strength of the chip is reduced, leading to a higher susceptibility to mechanical stress. Unfortunately, these higher-order modes are always excited at the boundary of the chip where the signal transition occurs. When resonance occurs, the transition can have a very high loss.

• Reflections and insertion loss at the transition site.

4.2 Transition Structures

The transition structure of most works is still a simple ground-signal-ground (GSG) structure. Therefore, the only method to improve the transition performance is to use smaller bumps. Here, different structures are analyzed to improve the performance with a fixed bump diameter of 75\(\mu\)m.

Let us start the analysis with a simple structure shown in Fig. 4.1a. The cross-section of this transition is shown in Fig. 4.2a. Intuitively, we can see that the times taken by the signal current and the return current to travel from the PCB to the chip are not the same. The extra length of metal and the corresponding delays can be modeled with transmission lines. In this model, the bumps that carry the current in the vertical direction are intentional transmission lines (red in Fig. 4.2a). In contrast, the horizontal paths that the return currents must follow on the PCB and chip are parasitic transmission lines (green in Fig. 4.2a). In the simple transmission line model of Fig. 4.2b, the input current into the intended transmission line (labeled 1) must equal the input current into the parasitic line (labeled 2). Therefore,

\[ I_{in} = \frac{v_{2f} - v_{2r}}{Z_2} = \frac{v_{2f}e^{-j\theta_2} - v_{2r}e^{j\theta_2}}{Z_2} \]  

(4.2)

where \( v_{xf} \) and \( v_{xr} \) are the voltages of the waves propagating in the forward and reverse directions on each transmission line. To satisfy this equation,

\[ e^{j\theta_2} = -\frac{v_{2f}}{v_{2r}} \]  

(4.3)
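
The step from Eq. 4.2 to Eq. 4.3 can be made explicit. The short derivation below is added for clarity and uses only the symbols already defined above. Collecting the forward and reverse terms of Eq. 4.2 and factoring the right-hand side gives

\[
v_{2f}\left(1 - e^{-j\theta_2}\right) = v_{2r}\left(1 - e^{j\theta_2}\right), \qquad 1 - e^{j\theta_2} = -e^{j\theta_2}\left(1 - e^{-j\theta_2}\right)
\]

so that, whenever \( e^{-j\theta_2} \neq 1 \), \( v_{2f} = -v_{2r} e^{j\theta_2} \), which is Eq. 4.3.
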

Interestingly, the standing wave ratio on the second transmission line does not depend on the load impedance. Moreover, at the frequency where \( \theta_2 = \pi \),

\[
I_{in} = \frac{v_{2f} - v_{2r}}{Z_2} = 0
\]  (4.4)

which suggests that at the frequency of

\[
f_{\text{notch}} = \frac{1}{2\tau_2}
\]  (4.5)

or an odd integer multiple of that, a notch in the transmission characteristic is expected. In other words, the timing mismatch between the signal current and the return current results in deep notches in the transition. Note that under the assumption of loss-less transmission lines, even near the notch frequency, \( G_{\text{max}} \) remains high because any reactive energy can be tuned out with ideal components, at least in theory. However, the tuning comes at the cost of extremely low bandwidth and high insertion loss due to the matching elements. The other transmission line may also exhibit similar notch behavior; however, for most practical transitions \( \tau_1 < \tau_2 \). Note that this deep notch is easily seen when the length of the horizontal line is much greater than that of the vertical line, which is usually the case when small bumps are used on PCBs with low manufacturing resolution.

Since the green transmission line is a 2-D parallel-plate transmission line, the signal escapes by coupling to the parallel-plate propagation mode at the metal-dielectric-metal stack in the transition region. Fig. 4.1b shows the direction of the Poynting vector. If we assume an optimal situation, the length of the two transmission lines should be similar. Moreover, a parasitic loop antenna is excited at the transition in this situation. The loop antenna is in resonance when the circumferential length of the loop is equal to the wavelength, assuming a short circuit on the chip. In terms of delays in the transmission line model

\[
f_{\text{rad}} = \frac{1}{2(\tau_1 + \tau_2)}
\]  (4.6)

The incoming signal near the radiation frequency is dissipated by coupling with parasitic surface wave modes and parallel plate modes. This factor is clearly seen when \( G_{\text{max}} \) is considered. Fig. 4.3 shows \( G_{\text{max}} \) for different distances as a function of frequency. Here, a bump height of 75\( \mu \)m is considered, which is the minimum bump height offered by the technology used. It can be observed that as the bump height increases, the first notch moves closer to the origin. In the simulation structure, the total distance between two footprints (\( H \) in Fig. 4.2a) is 125\( \mu \)m. With a dielectric constant of 3.1 for the underfill material, Eq. 4.6 estimates the first notch to be

\[
f_{\text{notch}} \approx \frac{1}{2} \times \frac{3 \times 10^8 \text{ms}^{-1}}{\sqrt{3.1}} \times \frac{1}{125\mu \text{m} + \text{Pitch}}
\]
where *Pitch* is defined in Fig. 4.2a. As shown in Fig. 4.4, the radiation frequency of the loop antenna agrees well with the simulation results.
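
The delay-based estimates above can be sketched numerically. The snippet below is a minimal illustration, assuming that each path delay is simply its physical length times $\sqrt{\epsilon_r}/c$. The 125$\mu$m height and the underfill permittivity of 3.1 come from the text, while the 150$\mu$m pitch is an assumed value used only for illustration. The function names are hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def path_delay(length_m, eps_r):
    """One-way delay of a path of `length_m` in a medium with permittivity `eps_r`."""
    return length_m * math.sqrt(eps_r) / C0


def timing_notch_frequency(tau2_s):
    """First notch caused by the return-path delay alone: f = 1 / (2 * tau2)."""
    return 1.0 / (2.0 * tau2_s)


def loop_radiation_frequency(tau1_s, tau2_s):
    """Loop resonance in terms of the two delays: f = 1 / (2 * (tau1 + tau2))."""
    return 1.0 / (2.0 * (tau1_s + tau2_s))


if __name__ == "__main__":
    eps_underfill = 3.1   # from the text
    height_m = 125e-6     # H in Fig. 4.2a, from the text
    pitch_m = 150e-6      # ASSUMED pitch, for illustration only
    tau1 = path_delay(height_m, eps_underfill)
    tau2 = path_delay(pitch_m, eps_underfill)
    f_rad = loop_radiation_frequency(tau1, tau2)
    print(f"estimated loop radiation frequency: {f_rad / 1e9:.0f} GHz")
```

Because the estimate scales inversely with the total path length, a larger pitch or bump height lowers the predicted notch frequency.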

Let us now discuss some other structures.

- By adding additional ground bumps as in Fig. 4.5a, one can partially reflect surface waves, which should reduce the transition loss. However, the reflected wave will still reach the other side of the chip and will be dissipated either by radiation or excitation of surface waves across the chip boundary. As you can see in Fig. 4.10b, this method is quite effective in reducing the transition loss at the previous radiation frequency and shifting the notch to a higher frequency.

- By adding two sets of ground bumps with positive and negative offsets in a rectangular shape, one can make the transition as shown in Fig. 4.5a. This is a much better approach in the lower frequency range because it can effectively reject forward and backward surface waves. However, as the distance between two ground bumps increases, higher leakage is expected, as shown in Fig. 4.10b. Moreover, as the length of the PCB microstrip line increases over the chip region, this structure suffers from a higher degree of detuning and coupling with the silicon substrate. This indicates that the least leakage is expected when a full bump cage is formed with minimal spacing.

- As mentioned earlier, the best performance is expected when the smallest spacing between all bumps is used. To achieve this goal, the ground bumps must be on a hexagon around the signal, as shown in Fig. 4.7a. The simulation results shown in Fig. 4.10b indicate that this structure achieves the best performance in terms of transition loss and notch frequency. Unfortunately, depending on the capabilities of the PCB manufacturer, this design may be impractical since the microstrip signal must be squeezed out between two ground bumps and their associated pads.

- If the previous structure with a full shield is not practical, a reverse microstrip can be used, as in Fig. 4.8a. In this case, the ground metal layer of the microstrip is on the topmost layer, while the signal metal is buried underneath. This strategy allows us to place the feedline in the middle of the chip. This additional degree of freedom will enable us to use the periphery of the chip for other purposes. However, it requires interruptions on the transmission line’s ground plane to accommodate more I/Os. To avoid this interruption, the ground plane of the inverted microstrip can be implemented on the second top layer while the signal is on the third layer. This transition topology minimizes leakage at the chip interface. However, signal loss occurs at the inner via. The simulation results (Fig. 4.10b) show that this structure has higher losses compared to the other topologies. Moreover, it requires a large keep-out region above the signal line to reduce the parasitic coupling, making it less attractive.

- To solve the previous problem, the microstrip line can be replaced by a stripline, as shown in Fig. 4.9a. The simulation results (Fig. 4.10b) show that this structure has superior performance compared to other practical options up to 220GHz. After that, the transition’s loss increases with increasing frequency, and at 325GHz, there is a notch in the transmission characteristic.

Figure 4.9: Stripline with full shield

Figure 4.10: \( G_{max} \) of the different transition scenarios

### 4.3 Limitation of the Stripline Structure

So far, the stripline design of Fig. 4.9a is the most promising solution for high frequencies. Another advantage of this topology is that the millimeter-wave signal is completely shielded from the environment. This means that the performance is less susceptible to variations in the shape of the underfill or the expansion of the silicon. Therefore, it is desirable to explore this structure and investigate its possible limitations. First, the PCB stripline itself should be investigated. The cross section of the stripline is shown in Fig. 4.11. The first propagation mode of this structure (Fig. 4.12a) is the intended TEM mode, which has no cut-off frequency. However, as the frequency increases, the metal cage around the line forms an effective waveguide, commonly called a substrate-integrated waveguide [46, 47]. Note that the discrete nature of microvias allows only TE propagation modes in the waveguide. The effective width of the waveguide can be approximated by

$$W_{\text{eff}} = W - \frac{D^2}{0.95P}$$

where $W$ is the center-to-center spacing of the microvias on two sides, $D$ is the diameter of the vias, and $P$ is the spacing of the vias on the same side. The E-field of the first TE mode of this effective waveguide is shown in Fig. 4.12b. Intuitively, above the cut-off frequency of the TE wave, the upper and lower ground planes may propagate different signals, indicating that the ground planes above the cut-off frequency are undefined. Fig. 4.13 shows that a cut-off frequency of 300GHz is expected for the TE wave. This means that while an ideal straight stripline will perform smoothly in a simulation platform, any other structure may exhibit unpredictable performance if the exact length of the transmission lines is not known at the design stage. Therefore, the designer should ensure that the cut-off frequency of the TE wave is well above the highest frequency range of interest.
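
To make the design guideline above concrete, the sketch below evaluates the effective width formula and then the cut-off frequency of the first TE mode using the standard rectangular-waveguide relation $f_c = c / (2 W_{\text{eff}} \sqrt{\epsilon_r})$. That relation is not stated in the text and is assumed here. The dimensions and the permittivity in the example are hypothetical placeholders, not the values of the fabricated structure.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def siw_effective_width(w_m, d_m, p_m):
    """W_eff = W - D^2 / (0.95 * P), with W, D and P as defined in the text."""
    return w_m - d_m ** 2 / (0.95 * p_m)


def te10_cutoff_frequency(w_eff_m, eps_r):
    """ASSUMED standard relation for the first TE mode: f_c = c / (2 * W_eff * sqrt(eps_r))."""
    return C0 / (2.0 * w_eff_m * math.sqrt(eps_r))


if __name__ == "__main__":
    # Hypothetical geometry and dielectric, chosen only to exercise the formulas
    w = 300e-6      # center-to-center spacing of the two via rows
    d = 50e-6       # via diameter
    p = 100e-6      # via spacing along one row
    eps_r = 3.3     # ASSUMED relative permittivity of the build-up dielectric
    w_eff = siw_effective_width(w, d, p)
    f_c = te10_cutoff_frequency(w_eff, eps_r)
    print(f"W_eff = {w_eff * 1e6:.1f} um, TE cut-off = {f_c / 1e9:.0f} GHz")
```

A sweep over `w` with such a helper shows how quickly the cut-off falls as the via rows are moved apart, which is the margin the designer has to protect.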

Considering Fig. 4.10b, the notch frequency of the stripline structure is above 300GHz. It is still important to understand the formation mechanism of this notch since process variations can change its frequency. When it is shifted to the lower frequency range, the insertion loss of the transition can increase rapidly. Let us first understand how a notch in $G_{\text{max}}$ occurs and why it is different from a notch in transmission ($S_{21}$). Consider a
simple circuit shown in Fig. 4.14a. Note that at the resonant frequency of the tank, \( S_{21} = 0 \). However, there is always an ideal matching network near the resonant frequency that cancels the effect of the tank. This means that \( G_{\text{max}} = 0 \)dB over the entire frequency range (Fig. 4.14b). Intuitively, such a matching network must translate the input impedance of each port to a much lower impedance so that the equivalent parallel impedance of the tank looks much smaller than the port impedance. Translating the port impedance to a lower impedance requires passive current gain. The impedance translation ratio increases as the frequency gets closer to the notch frequency, requiring more current gain. Once the series loss in accessing the tank is considered, as in Fig. 4.14c, a higher current gain increases the power loss. Therefore, as the frequency approaches the resonant frequency of the tank, the insertion loss approaches infinity, leading to a notch in \( G_{\text{max}} \) (Fig. 4.14d). The same considerations can be applied to a series tank, as in Fig. 4.14. The critical point here is that in the presence of any resonant structure, the series and parallel losses of the access lines may force \( G_{\text{max}} \) toward zero in linear terms, producing a notch.
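
The argument above can be illustrated with a small numerical sketch. It assumes a symmetric T-network in which the tank is a lossless series-LC branch connected in shunt and each access arm has a series resistance $R$; this topology is an assumption made for illustration and may differ in detail from Fig. 4.14. For a reciprocal two-port, $G_{\text{max}} = K - \sqrt{K^2 - 1}$, where $K$ is the Rollett factor computed from the Z-parameters. The component values are hypothetical.

```python
import math


def gmax_shunt_tank(freq_hz, l_h, c_f, r_series_ohm):
    """Linear G_max of a T-network: series R, shunt series-LC tank, series R.

    Z-parameters: Z11 = Z22 = R + jX and Z12 = Z21 = jX, with X the tank reactance.
    Rollett factor: K = (2 Re(Z11) Re(Z22) - Re(Z12 Z21)) / |Z12 Z21|.
    """
    w = 2.0 * math.pi * freq_hz
    x = w * l_h - 1.0 / (w * c_f)  # tank reactance, zero at resonance
    if x == 0.0:
        # Exactly at resonance: unity for the lossless case, zero otherwise
        return 1.0 if r_series_ohm == 0.0 else 0.0
    k = (2.0 * r_series_ohm ** 2 + x * x) / (x * x)
    # K - sqrt(K^2 - 1), written in a numerically stable form
    return 1.0 / (k + math.sqrt(max(k * k - 1.0, 0.0)))


def to_db(gain_linear):
    """Convert a linear power gain to decibels."""
    return 10.0 * math.log10(gain_linear)


if __name__ == "__main__":
    l_tank, c_tank = 50e-12, 5.6e-15  # hypothetical tank, resonant near 300 GHz
    for f in (200e9, 280e9, 295e9):
        lossless = gmax_shunt_tank(f, l_tank, c_tank, 0.0)
        lossy = gmax_shunt_tank(f, l_tank, c_tank, 1.0)
        print(f"{f / 1e9:.0f} GHz: lossless {to_db(lossless):.2f} dB, "
              f"with 1 ohm access loss {to_db(lossy):.2f} dB")
```

With $R = 0$ the factor $K$ equals one at every frequency, so $G_{\text{max}}$ stays at 0dB; with any finite $R$, $K$ grows without bound as the tank reactance vanishes, and $G_{\text{max}}$ collapses near resonance.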

To avoid such a notch, one must intentionally change the resonant frequency or ensure that the resonant structure is not excited. An eigenmode solver of Ansys HFSS was used to study the resonant modes, and the structure was modified to remove the access transmission lines. Among the numerous resonant modes, one of the modes corresponds to the cavity where the signal goes down through microvias in the shielded cage. The stripline is connected to the body of the cage, and based on the field vectors, the resonant mode of the cavity couples to the TE mode of the parasitic stripline waveguide. Therefore, depending on the reflection phase of the coupled wave, the resonant frequency of the loaded structure changes slightly. The actual reflection phase is unknown because this parasitic mode is not necessarily terminated with an actual load. Therefore, the waveguide is short-circuited at the end of the stripline, and several different lengths of the stripline are simulated (Fig. 4.15). Note that the phase constant of the TE mode approaches 0 near the cut-off frequency of the waveguide. Once the cavity’s resonant frequency is shifted down towards the cut-off frequency of the waveguide, the phase shift of the reflected wave becomes independent of the length.

Although the notch frequency of \( G_{\text{max}} \) may shift to lower frequency bands, it will not cross the cut-off frequency of the waveguide. For this reason, this transition structure should not be used beyond the TE cut-off frequency of the waveguide. To verify this, several different stripline lengths are simulated (Fig. 4.16). The simulation results (Fig. 4.17b) confirm that the frequency of the notch varies with the length of the line. Moreover, multiple resonant modes can cause multiple notches. However, all of these notches remain above 300GHz and have almost no effect on the performance of the transition below 200GHz (Fig. 4.17a).

Figure 4.15: Eigenmode simulation of resonant modes with different stripline length

(a) Short Line

(b) E-field magnitude of the short line Eigen mode

(c) Long Line

(d) E-field magnitude of the long line Eigen mode

(e) E-field vector of the long line Eigen mode

Figure 4.16: Long striplines are studied for the effects of cavity resonance

(a) Sub 200GHz  
(b) DC to 400GHz

Figure 4.17: $G_{\text{max}}$ of the stripline transition when the length of the stripline extension is varied

4.4 Final Pad Structure

Given the advantages of the stripline transition over its counterparts, it was chosen for millimeter-wave I/Os. Below 300GHz, the transition can be modeled with two capacitors and a series transmission line representing the pad capacitance, the effective delay, and the characteristic impedance of the microvias from the stripline opening to the chip, as shown in Fig. 4.18. Note that the area inside the ground cage on the silicon is wasted if the matching network is implemented outside the pad area. Moreover, the access line can degrade the bandwidth and loss of the network (Fig. 4.19). Therefore, the matching network is implemented inside the ground cage. It consists of two symmetrical transmission lines (Fig. 4.20a), whose characteristic impedance and length are calculated to obtain a matched impedance (Fig. 4.20b).

The performance of the final design is simulated and shown in Fig. 4.21 and summarized in Table 4.1. Table 4.2 compares the performance obtained here with several other published papers.

Figure 4.18: Lumped model of the transition below the stripline cut-off frequency

Figure 4.19: Wasted silicon area and additional losses due to the access line

Figure 4.20: The final design of the transition with a suitable matching network

Figure 4.21: Performance of the final design

Table 4.1: Performance of the final design

Table 4.2: Summary of performance and comparison with the state-of-the-art

<table>
<thead>
<tr>
<th>Ref.</th>
<th>Package</th>
<th>Interconnect</th>
<th>Interconnect Size</th>
<th>Pad Pitch</th>
<th>Frequency</th>
<th>Transition Loss</th>
</tr>
</thead>
<tbody>
<tr>
<td>[49]</td>
<td>RO4350</td>
<td>Copper Pillar</td>
<td>-</td>
<td>-</td>
<td>130GHz</td>
<td>3dB</td>
</tr>
<tr>
<td>[50]</td>
<td>Astra MT77</td>
<td>-</td>
<td>-</td>
<td>-</td>
<td>145GHz</td>
<td>2.5dB</td>
</tr>
<tr>
<td>[51]</td>
<td>LTCC GL771</td>
<td>Copper Pillar</td>
<td>30μm</td>
<td>175μm</td>
<td>135GHz</td>
<td>1.1dB</td>
</tr>
<tr>
<td>[33]</td>
<td>Megtron 6</td>
<td>Solder Bump</td>
<td>-</td>
<td>250μm</td>
<td>115GHz</td>
<td>-</td>
</tr>
<tr>
<td>[52]</td>
<td>IPD carrier</td>
<td>Gold Bump</td>
<td>65μm</td>
<td>170μm</td>
<td>163GHz</td>
<td>2.8dB</td>
</tr>
<tr>
<td>This work</td>
<td>ABF GL102</td>
<td>Solder Bump</td>
<td>75μm</td>
<td>150μm</td>
<td>140GHz</td>
<td>1dB</td>
</tr>
</tbody>
</table>
Chapter 5

Package-to-Package Transition

5.1 Introduction

Millimeter-wave and sub-THz systems offer unique applications for communication systems because higher total bandwidth and higher data rates can be achieved with a fixed fractional bandwidth [20]. However, due to excessive path loss in this frequency range, such systems must generate higher output power at the transmitter and achieve higher gain at the receiver. A phased array architecture (with \(N\) elements) can increase performance by relaxing the requirements on each element in the array and using the effective array gain. Given the enormous number of elements required to achieve high array gain [53, 54], the use of on-chip antennas is not feasible due to the cost of antenna area on semiconductors. Therefore, it makes sense to leave the antennas on the package and optimize the package materials and technology for higher radiation efficiency.

The decades of innovation and scaling of CMOS technology [55] make it the first choice for designers when it comes to array processing. Although digital circuits have benefited dramatically from technological improvements, the analog and RF performance of CMOS has remained relatively similar over the past decade at \(f_{\text{max}} \approx 300 – 400\)GHz [56]. This makes CMOS extremely inefficient for sub-THz applications and encourages the coexistence of III/V compound semiconductors such as GaN or InP to boost performance. A package capable of carrying millimeter-wave and sub-THz signals with minimal insertion and radiation losses is needed. Also, high resolution and fine pitch are required to connect as many signals as possible to the chip with minimal reflection losses.

Thermal considerations are another aspect of sub-THz package design. The elements of a phased array are typically spaced \(\lambda_0/2\) apart to minimize side lobes and mutual antenna
coupling. This means that as frequency increases, so does the heat flux \(^1\), and the package should have excellent heat dissipation for reliable performance.

A single package that meets all the above requirements is expensive. This means that a modular package (as shown in Fig. 5.1) can optimize the cost and performance of millimeter-wave and sub-THz systems. However, a major practical problem is the transition of high-frequency signals between packages. Current low-cost solutions such as wire bonds and C4 bumps have low reliability for massive array implementation, or their resolution is insufficient to realize a low-loss transition due to reflections. This chapter proposes a new inter-package interconnect architecture based on guided inter-package radiation using mature and low-cost Ball Grid Arrays (BGA). Compared to other low-cost solutions, the proposed solution achieves higher bandwidth with lower insertion loss, while the lithography and alignment requirements are much more relaxed.

Figure 5.1: Proposed millimeter-wave phased array packaging solution with integrated III/V semiconductor

### 5.2 Design Principles

Proximity interconnects based on capacitive or inductive coupling have been explored for various applications, such as when isolation (thermal or electrical) is required [57] or when transceivers cannot be physically connected. In such systems, the receiver is located in the reactive near-field region of the transmitter. While this method works very well at lower frequencies, it is not readily possible to place transceivers in each other’s reactive near-field in the millimeter-wave and sub-THz frequency range. For example, to transmit a signal with a frequency of 150GHz between two packages, their distance should be less than \(200 \mu m\), which requires good alignment during fabrication.

On the other hand, transmitting signals between two antennas in the far-field (Fraunhofer zone) is more common. This is how most conventional radio receivers operate. Far-field transmission, while simple, involves significant path loss, making it impractical for interconnects.

\(^1\)Neglecting the lower device efficiency at higher frequencies.

When the distance between antennas is comparable to the wavelength (\( d \approx 0.2\lambda \ldots 2\lambda \)), the transceivers are in each other's Fresnel zone (radiative near-field). This region offers a compromise between the two zones, as lower insertion loss can be achieved without stringent manufacturing requirements. However, electromagnetic fields tend to change rapidly with distance in this region. This effect can be modeled as the superposition of multiple propagation modes with different phase velocities. Perfect transmission occurs when all transmitter modes (having propagated through the channel at their velocities) match the corresponding receiver modes, or when the reflections of the different modes cancel each other out. This approach, while theoretically possible, requires strict alignment and precision in fabrication and usually has a narrow bandwidth.

The channel can be designed to have a preferred propagation mode to reduce sensitivity to distance. In this case, the channel rejects unwanted modes and allows only a single mode. Since the channel enforces modal purity, variations in the distance between antennas during fabrication only change the phase delay through the channel.

This behavior can be compared to that of single-mode waveguides [58]. Waveguides are usually designed to have modal purity for an infinitely long channel. However, the transition distance between packages shown in Fig. 5.1 is generally about \( d \approx 1\text{mm} \) or less. Therefore, instead of a standard waveguide, a pseudo-waveguide can be designed to operate with only one dominant mode over a given channel length, which is far more relaxed than a conventional waveguide design.
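
To place the distances mentioned in this section on a common scale, the sketch below evaluates the approximate Fresnel range \( 0.2\lambda \ldots 2\lambda \) quoted earlier and checks whether a given spacing falls inside it. It assumes the free-space wavelength and treats the two bounds as hard limits, which is a simplification; the function names are illustrative.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_wavelength(freq_hz):
    """Free-space wavelength in meters."""
    return C0 / freq_hz


def fresnel_range(freq_hz, lower=0.2, upper=2.0):
    """Approximate radiative near-field range (d_min, d_max) as multiples of lambda."""
    lam = free_space_wavelength(freq_hz)
    return lower * lam, upper * lam


def in_radiative_near_field(distance_m, freq_hz):
    """True if `distance_m` lies inside the approximate Fresnel range."""
    d_min, d_max = fresnel_range(freq_hz)
    return d_min <= distance_m <= d_max


if __name__ == "__main__":
    freq = 140e9
    d_min, d_max = fresnel_range(freq)
    print(f"Fresnel range at 140 GHz: {d_min * 1e6:.0f} um to {d_max * 1e3:.2f} mm")
    print("1 mm spacing inside range:", in_radiative_near_field(1e-3, freq))
```

Under these assumptions, a package spacing of about 1mm at 140GHz sits inside the radiative near-field range, which is the regime the pseudo-waveguide is meant to address.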

Fig. 5.2 shows the principles of the proposed idea. For an inter-package interconnect, antennas on each package face each other, surrounded by BGA balls. These balls will shield the radiation to minimize leakage and insertion loss while rejecting undesired modes.

### 5.3 Design Considerations

Fig. 5.3 shows an example cross-section of two packages mounted on top of each other with a Ball Grid Array (BGA). There are several things to note here:
• With inexpensive BGAs, relatively good alignment can be achieved. Also, almost the same pattern repeats at the interface, indicating that reliable, predictable, and reproducible performance can be expected.

• The exact shape and curvature of the solder balls may vary. Therefore, the design should be such that the performance is less sensitive to the precise diameter of the balls, for example, by increasing the distance between the balls. In this case, the solder balls can be modeled as cylinders.

• Glass-weave may adversely impact the performance when the structure is much smaller than the periodicity of the woven structure. However, if the dimensions are chosen large enough, the radiators will see an average dielectric constant.

As mentioned earlier, the larger the structure is, the less sensitive it is to manufacturing variations and process nonidealities. However, the further apart the solder balls are, the less shielding can be expected. It is therefore not obvious how much shielding the BGA can provide. To answer this question, two different scenarios for an incident wave are considered (Fig. 5.5):

1. E-field parallel to the solder balls (Fig. 5.4a): In this case, the shielding is achieved by the induced current in the solder balls (which are modeled as cylinders), leading to an inductive reflection of the incident wave. Intuitively, as the ball pitch increases (for a fixed ball diameter), higher leakage should be expected since the incident wave is less coupled to the BGA balls. Moreover, the shielding performance is independent of the height of the balls (which determines the length of the pseudo-waveguide channel).
2. E-field perpendicular to the solder balls (Fig. 5.4b): In this case, individual BGA balls cannot provide sufficient shielding because the induced current is immediately interrupted by the discontinuity of the ball grid array. In this case, there is a redistribution of electric charge on the ball, indicating a (tiny) capacitive reflection of the incident wave. However, suppose that the solder balls are short-circuited by the interposer or the metal planes of the PCB. In this case, the induced charges cause an electric current to flow through the planes, and consequently, inductive reflection is expected. Increasing the ball pitch decreases the shielding performance for a fixed ball diameter since a smaller electric charge is initially induced on the balls. As the height of the ball increases (assuming its diameter can be kept constant), the shielding decreases as the same induced charges on the balls experience a higher series inductance before reaching the metal planes.

Since the reflection depends on the LC loop formed by the BGA and metal planes, there is a resonant frequency at which no shielding is expected. Assuming a simple LC model

\[ \omega_0^{-2} = (C_Z)(2L_S + L_Z) \]  

(5.1)
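
A minimal sketch of Eq. 5.1 follows, solving for the resonant frequency \( f_0 = \omega_0 / 2\pi \). The element values in the example are hypothetical placeholders chosen only to exercise the formula; they are not extracted from the simulated structure.

```python
import math


def shield_resonance_frequency(c_z_f, l_s_h, l_z_h):
    """Resonant frequency in Hz from omega_0^-2 = C_Z * (2 * L_S + L_Z)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(c_z_f * (2.0 * l_s_h + l_z_h)))


if __name__ == "__main__":
    # Hypothetical element values, for illustration only
    c_z = 10e-15   # farads
    l_s = 50e-12   # henries
    l_z = 100e-12  # henries
    f0 = shield_resonance_frequency(c_z, l_s, l_z)
    print(f"resonance at {f0 / 1e9:.1f} GHz")
```

Since \( f_0 \) scales as the inverse square root of both the capacitance and the total inductance, a design can move this resonance away from the band of interest by changing either one.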

The effectiveness of the BGAs for shielding was verified using the full-wave simulation software ANSYS HFSS. In this simulation (Fig. 5.5a), unit cells with slave/master boundary conditions are used. A ball diameter of 350µm is chosen, and the ball height is assumed to be equal to the ball diameter. Fig. 5.5b shows the simulation results at 140GHz. They show that the BGA can effectively reflect incident waves and mimic a solid metallic plane for the frequency range of interest.

Figure 5.4: The lumped circuit model seen by an incoming wave with specific E-polarization
5.4 Prototype Design and Measurement Results

A contactless interconnect was developed as a proof of concept. In this section, various aspects of the design methodology are explained.

Interposer Technology

The interposer used here is made of organic materials. It consists of six build-up layers (ABF GL102) with a total thickness of 300\(\mu m\) symmetrically attached to a 400\(\mu m\)-thick core layer (MCL-E-705G) for mechanical support, as shown in Fig. 5.6. The fabrication capabilities allow the microvias of the build-up layers to have a spacing of only 100\(\mu m\), while the plated through holes in the core layer have a minimum spacing of 300\(\mu m\).

Channel Design Trade-offs

Assuming that solder balls can provide sufficient shielding, cylindrical balls are connected from the outer sides to form a pseudo-waveguide, as shown in Fig. 5.7. Then a modal simulation is performed to find the propagation modes. Different dimensions of the pseudo-waveguide (by changing the ball pitch, the ball diameter, and the number of balls in each row) are investigated. As a compromise between modal purity, the characteristic impedance of the desired mode, frequency dispersion, and fabrication capabilities, the structure of Fig. 5.7 with a ball diameter of \(350\mu m\) and a ball pitch of \(600\mu m\) is chosen. Once the dimensions of the structure are determined, the shielding performance is simulated and verified using the technique described in the previous section.
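To put the chosen dimensions in perspective, the short sketch below compares the ball pitch and the gap between adjacent balls with the free-space wavelength at 140GHz, the frequency used for the shielding simulation. Using the free-space wavelength is an assumption made here for simplicity; the wavelength in the material surrounding the balls would be shorter.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def free_space_wavelength_um(freq_hz: float) -> float:
    """Free-space wavelength in micrometers."""
    return C0 / freq_hz * 1e6


BALL_DIAMETER_UM = 350.0  # from the chosen structure
BALL_PITCH_UM = 600.0     # from the chosen structure
FREQ_HZ = 140e9           # assumed evaluation frequency

wavelength_um = free_space_wavelength_um(FREQ_HZ)
gap_um = BALL_PITCH_UM - BALL_DIAMETER_UM  # clear opening between two balls

print(f"wavelength = {wavelength_um:.0f} um")
print(f"pitch / wavelength = {BALL_PITCH_UM / wavelength_um:.2f}")
print(f"gap / wavelength   = {gap_um / wavelength_um:.2f}")
```

Both ratios come out well below one half, which is in line with the expectation that closely spaced balls behave like a continuous wall.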
Antenna Design with Distributed Matching Network

As shown in Fig. 5.1, the millimeter-wave contactless interconnect is fed from one side of the interposer (slot antenna fed from a CPW line), while the receiver (on the PCB) is located on the other side of the interposer. Since the electrical length of the core and build-up layers is comparable to the wavelength, a lumped matching network at the excitation site leads to low bandwidth and high insertion loss. Therefore, a distributed matching network is used.

To facilitate the design of the matching network and avoid exhaustive electromagnetic simulations, the matching network is first divided into two parts, as shown in Fig. 5.8. The first part matches the impedance of the pseudo-waveguide to the impedance of the wave in the core layer\(^2\). The second part matches the slot antenna to the impedance of the waves in the core layer. The reason for this split is that the core layer is thick (with an electrical length of \(\approx 150^\circ\)) and has a higher dielectric loss than the build-up layers. Therefore, any reflection within the core results in higher insertion loss and lower bandwidth. Once the matching network is roughly calculated, the corresponding values (for the size of the inductive and capacitive posts) are entered into the simulator. After running the optimization engine to fine-tune the entire structure, we found that the initial calculated values were close to optimal. The final design and exploded view are shown in Fig. 5.9 and Fig. 5.10, respectively.
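The relation between physical thickness and electrical length can be sketched as follows. The effective index `n_eff` is a hypothetical lumped parameter introduced here: it folds the core permittivity and the dispersion of the pseudo-waveguide into one number, and the assumption that the \(\approx 150^\circ\) figure is quoted at 140GHz is also ours.

```python
C0 = 299_792_458.0  # speed of light in vacuum, m/s


def electrical_length_deg(thickness_m: float, freq_hz: float, n_eff: float) -> float:
    """Phase accumulated across a layer, in degrees, for a given effective index."""
    guided_wavelength_m = C0 / (freq_hz * n_eff)
    return 360.0 * thickness_m / guided_wavelength_m


def implied_effective_index(theta_deg: float, thickness_m: float, freq_hz: float) -> float:
    """Effective index that reproduces a given electrical length (inverse of the above)."""
    return theta_deg / 360.0 * C0 / (freq_hz * thickness_m)


CORE_THICKNESS_M = 400e-6  # core thickness stated for the interposer
FREQ_HZ = 140e9            # assumed frequency for the ~150 degree figure

n_eff = implied_effective_index(150.0, CORE_THICKNESS_M, FREQ_HZ)
print(f"implied effective index = {n_eff:.2f}")
print(f"round trip = {electrical_length_deg(CORE_THICKNESS_M, FREQ_HZ, n_eff):.1f} deg")
```

Because the core spans a large fraction of a wavelength, even a modest reflection at either face produces a strongly frequency-dependent standing wave, which is why the matching is split at the core boundaries.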

Prototype Performance

A prototype is simulated and fabricated (Fig. 5.11) to verify the proposed solution and design methodology. It consists of a back-to-back connection of two millimeter-wave contactless interconnects (Fig. 5.12). The simulation and measurement results are shown in Fig. 5.13. It is observed that 20GHz of \(-10\)dB reflection bandwidth is achievable with 4dB insertion loss for a back-to-back structure (2dB loss for each leg). The additional insertion loss in the measurement compared to the simulation results is likely due to the surface roughness of the copper.
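The quoted figures can be translated into linear power fractions with a few lines of arithmetic. The sketch below assumes that the back-to-back loss splits evenly between the two legs (as stated above) and that the 20GHz band is centered at 140GHz; the center frequency is an assumption made here to express a fractional bandwidth.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (db / 10.0)


BACK_TO_BACK_LOSS_DB = 4.0  # measured insertion loss of the two-leg structure
RETURN_LOSS_DB = 10.0       # -10 dB reflection criterion
BANDWIDTH_GHZ = 20.0        # reflection bandwidth
CENTER_GHZ = 140.0          # assumed band center

per_leg_loss_db = BACK_TO_BACK_LOSS_DB / 2.0
per_leg_transmitted = db_to_power_ratio(-per_leg_loss_db)
reflected_fraction = db_to_power_ratio(-RETURN_LOSS_DB)
mismatch_loss_db = -10.0 * math.log10(1.0 - reflected_fraction)
fractional_bandwidth = BANDWIDTH_GHZ / CENTER_GHZ

print(f"per-leg loss          = {per_leg_loss_db:.1f} dB")
print(f"per-leg power through = {per_leg_transmitted:.1%}")
print(f"reflected at band edge = {reflected_fraction:.1%}")
print(f"mismatch loss at edge  = {mismatch_loss_db:.2f} dB")
print(f"fractional bandwidth   = {fractional_bandwidth:.1%}")
```

At the \(-10\)dB band edge, reflection alone accounts for less than half a decibel, so most of the per-leg loss must come from dissipation and leakage rather than mismatch.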

\(^2\)The core dielectric and the plated through holes together form another pseudo-waveguide.
Figure 5.9: Interposer antenna

Figure 5.10: Exploded view of the antenna (microvias not shown)
CHAPTER 5. PACKAGE-TO-PACKAGE TRANSITION

Figure 5.11: Fabricated millimeter-wave contactless interconnect

(a) Top view
(b) Bottom view

Figure 5.12: Back-to-back millimeter-wave contactless interconnect

(a) Flipping for attaching
(b) Back-to-back attachment
(c) Measurement

Figure 5.13: Simulation and measurement results for a back-to-back millimeter-wave contactless interconnect

5.5 Conclusion

The millimeter-wave contactless interconnect based on guided radiation is proposed as a new method for inter-package routing. Design methods and guidelines are explained to obtain an estimated performance before performing detailed electromagnetic simulations. A distributed matching network is also proposed to achieve high bandwidth and low insertion loss. Full-wave electromagnetic simulations verify all proposed ideas and methods. Finally, the prototype is fabricated and measured. The measurements agree well with the simulation results.
Chapter 6

Conclusion

6.1 Thesis Summary

The next generation of mobile communications requires cost-effective solutions to increase the capacity of the cellular network. Millimeter-wave carrier frequencies enable high-speed links in a lightly licensed portion of the spectrum. However, CMOS process scaling is no longer as effective as it was in the past to enable high-speed applications. CMOS scaling can degrade device output power at high frequencies.

Noise measure is considered as a performance metric that combines the power gain with the minimum noise figure given the limited power gain of devices operating close to their activity limit. Enlightening examples allow the reader to grasp the mathematical framework intuitively. The use case of active baluns is explored using noise measure theory, and optimal working conditions are investigated.

The design of a wideband receiver at 140GHz is discussed. Several different techniques are proposed to improve the performance of the receiver chain compared to the state-of-the-art. All of these techniques are mathematically proven, and tradeoffs are explored. These techniques, such as transformer equivalents, active baluns, and optimal matching networks, can be readily implemented in commercial ICs to improve performance.

Finally, cost-effective packaging solutions for millimeter-wave applications have been explored. Note that much of the published work was either measured with probes or packaged with on-chip antennas, possibly with integrated silicon lenses. In this work, the transitions from chip to package and between packages have been investigated, as commercial applications require. It has been shown that currently available low-cost package options can meet millimeter-wave requirements.
6.2 Future Directions

As shown in this work, conjugate matching does not provide optimal performance. Therefore, transmission is optimized, and matching networks are designed to achieve optimal transmission. Although low-k transformers have been used extensively in this work, they were not intentionally designed for high bandwidth. In other words, the high bandwidth is just a byproduct of using low-k transformers. The simulation results show that a combination of LC ladder networks with transformers can deliver the maximum transmission while intentionally maximizing the system’s bandwidth.

Another avenue of research is to investigate the performance of common-gate amplifiers. Note that there is no difference between common-gate and common-source amplifiers from the noise measure point of view. However, the power gain of common-gate amplifiers is lower than that of common-source amplifiers. On the other hand, the insertion loss of the input matching network is lower, as expected from the lower input quality factor. Thus, if the insertion loss of the matching network is significant, a common-gate stage may be superior to its common-source counterpart. Also, since the input signal is not connected to the gate, it is easy to use a double-sided gate contact with minimal parasitic capacitance.

Figure 6.1: Estimating the noise measure of an amplifier including the insertion loss of the matching networks

Finally, given the equations for the minimum noise measure and the insertion loss of the matching network, the two can be combined to derive the minimum noise measure of an amplifier together with its matching networks (Fig. 6.1).
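One simple way to illustrate such a combination is a Friis cascade of a lossy input network, the amplifier, and a lossy output network. This is a minimal sketch, not the derivation used in this work: it assumes the standard definition of noise measure, \(M = (F-1)/(1-1/G)\), and treats each matching network as a matched passive attenuator at the reference temperature (noise factor equal to its loss). All numerical values are hypothetical.

```python
def db_to_ratio(db: float) -> float:
    """Convert a decibel value to a linear power ratio."""
    return 10.0 ** (db / 10.0)


def noise_measure(noise_factor: float, gain: float) -> float:
    """Noise measure M = (F - 1) / (1 - 1/G), with linear F and available gain G > 1."""
    if gain <= 1.0:
        raise ValueError("noise measure is only meaningful here for gain > 1")
    return (noise_factor - 1.0) / (1.0 - 1.0 / gain)


def cascaded_noise_measure(f_amp: float, g_amp: float,
                           loss_in_db: float = 0.0, loss_out_db: float = 0.0) -> float:
    """Noise measure of [input network] -> [amplifier] -> [output network]."""
    l_in = db_to_ratio(loss_in_db)
    l_out = db_to_ratio(loss_out_db)
    # Friis formula with F = L and G = 1/L for each passive network (assumption).
    f_total = l_in + (f_amp - 1.0) * l_in + (l_out - 1.0) * l_in / g_amp
    g_total = g_amp / (l_in * l_out)
    return noise_measure(f_total, g_total)


# Hypothetical device: 5 dB noise figure and 6 dB gain, with 1 dB loss per network.
F_AMP = db_to_ratio(5.0)
G_AMP = db_to_ratio(6.0)

m_device = noise_measure(F_AMP, G_AMP)
m_with_networks = cascaded_noise_measure(F_AMP, G_AMP, loss_in_db=1.0, loss_out_db=1.0)
print(f"device alone:        M = {m_device:.2f}")
print(f"with lossy networks: M = {m_with_networks:.2f}")
```

Because the available gain is small near the activity limit, the same network loss raises the noise measure far more than it would for a high-gain stage, which is the trade-off behind the common-gate discussion above.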
Bibliography


