## Power Efficient System and A/D Converter Design for Ultra-Wideband Radio



Shuo-Wei Chen

#### Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2006-71 http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-71.html

May 19, 2006

Copyright © 2006, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

#### Power Efficient System and A/D Converter Design for Ultra-Wideband Radio

by

Shuo-Wei Michael Chen

B.S. (National Taiwan University) 1998
M.S. (University of California, Berkeley) 2002

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy

in

Electrical Engineering and Computer Sciences

in the

GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge: Professor Robert W. Brodersen, Chair; Professor Borivoje Nikolic; Professor Paul Wright

Spring 2006

The dissertation of Shuo-Wei Michael Chen is approved:


University of California, Berkeley

Spring 2006

## Power Efficient System and A/D Converter Design for Ultra-Wideband Radio

Copyright 2006

by

Shuo-Wei Michael Chen

#### Abstract

Power Efficient System and A/D Converter Design for Ultra-Wideband Radio

by

Shuo-Wei Michael Chen

Doctor of Philosophy in Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Robert W. Brodersen, Chair

Ultra-Wideband (UWB) technology was approved by the FCC in 2002 and has since drawn considerable attention for a variety of applications, including communications, imaging, surveillance, and locationing. One of the most attractive applications is indoor communication, which is permitted to operate in the frequency band from 3.1 to 10.6 GHz. Interest in these indoor systems extends from high-speed, short-range systems to low data rate communications and precision ranging, as seen in the standardization efforts of IEEE 802.15.3a/4a, with data rates of interest ranging from tens of kb/s to hundreds of Mb/s. Regardless of the application, low cost and low power are crucial, especially because many of these applications intend to deploy large volumes of inexpensive mobile UWB devices that must operate with the longest possible battery life. This thesis proposes both system and circuit solutions aimed at minimizing UWB implementation cost. At the system level, an impulse radio architecture utilizing a sub-sampling analog front-end along with digital complex signal processing is proposed to allow a low-complexity implementation of a 3.1-10.6 GHz Ultra-Wideband radio. The proposed system modulates information onto passband pulses using a pulser and antenna, and the receiver front-end down-converts the signal frequency via sub-sampling, thus requiring substantially less hardware than the existing direct-conversion approach. After the analog-to-digital converter (ADC), the signal is projected into the complex signal domain to perform matched filtering, which not only mitigates the timing sensitivity induced by analog circuit impairments but also extracts the fine time resolution provided by the wideband nature of a UWB signal.

Based on this transceiver architecture, the most challenging circuit block was found to be the high-speed (GHz) ADC, which must sub-sample RF frequencies. As a result, an asynchronous ADC based on successive approximation is introduced to provide high-speed (600-MS/sec) and medium-resolution (6-bit) conversion. A high input bandwidth (>4 GHz) was achieved, which allows its use in RF sub-sampling applications. By using asynchronous processing techniques, the ADC avoids clocks faster than the sample rate and speeds up a non-binary successive approximation algorithm that utilizes a series, non-binary capacitive ladder with digital radix calibration. The sample rate of 600 MS/sec was achieved by time-interleaving two single-channel ADCs, which were fabricated in a 0.13-$\mu$m standard digital CMOS process. The ADC achieves a peak SNDR of 34 dB while occupying an active area of only $0.12 \text{ mm}^2$ and consuming 5.3 mW.

Professor Robert W. Brodersen Dissertation Committee Chair

# Contents

| Section | Title |
|---------|-------|
| | List of Figures |
| | List of Tables |
| 1 | Introduction |
| 1.1 | Trends in Wireless Communication Systems |
| 1.2 | Overview of Ultra-Wideband Communication |
| 1.3 | Research Contributions |
| 1.4 | Thesis Organization |
| 2 | Sub-sampling UWB system architecture |
| 2.1 | Introduction |
| 2.2 | Comparison of Transceiver Architectures |
| 2.3 | Analog: Subsampling Front End |
| 2.3.1 | Theory Background and Challenges |
| 2.3.2 | Subsampling for UWB |
| 2.3.3 | Timing Sensitivity Issue |
| 2.4 | Digital: Complex Signal Processing |
| 2.4.1 | Complex representations of UWB signals |
| 2.4.2 | Proposed digital baseband using complex signaling |
| 2.4.3 | Timing extraction from the proposed baseband |
| 2.5 | Implementation Specifications and Issues |
| 2.5.1 | Received SNR |
| 2.5.2 | Bandpass Filter Response |
| 2.5.3 | Gain |
| 2.5.4 | Sampling Clock |
| 2.5.5 | Subsampling Mixer and ADC |
| 2.5.6 | Implementation Cost of the Digital Baseband |
| 2.6 | System Simulations |
| 2.7 | System Prototype |
| 2.8 | Conclusion |
| 3 | High-speed Low-power Asynchronous ADC |
| 3.1 | Introduction |
| 3.2 | ADC Architecture |
| 3.2.1 | Power Efficiency of Conventional ADC Architectures |
| 3.2.2 | Asynchronous Processing |
| 3.2.3 | Architecture |
| 3.3 | Circuit Implementation Details |
| 3.3.1 | Dynamic Comparator and Ready Signal |
| 3.3.2 | Non-Binary Successive Approximation Review |
| 3.3.3 | Series Non-Binary Capacitive Ladder |
| 3.3.4 | Digital Calibration Scheme |
| 3.3.5 | Variable Duty-Cycled Clock |
| 3.3.6 | High-Speed Digital Logic |
| 3.4 | Measured Results |
| 3.5 | Applicability |
| 3.5.1 | Technology Scaling |
| 3.5.2 | Future Role of the Proposed ADC Topology |
| 3.6 | Conclusion |
| 4 | Conclusion and Future Work |
| | Bibliography |

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Time and frequency domain comparisons of ultra-wideband and narrowband signals. |
| 1.2 | FCC regulations of indoor communications. |
| 2.1 | Radio architectures. |
| 2.2 | Signal and noise spectrum (left) before and (right) after sub-sampling. The black color is the wanted signal, and the gray one represents noise, assuming sufficient out-of-band blocking. |
| 2.3 | Simplified receiver path for direct-conversion (top) and subsampling (bottom) architectures. |
| 2.4 | Bandpass pulse (top left) and subsampled waveform with $0\,T_s$ (top right), $0.05\,T_s$ (bottom left) and $0.1\,T_s$ (bottom right) sampling offset. |
| 2.5 | (a) Magnitude response over $[0,\pi]$ of the FIR Hilbert transformer and the backward, central and FIR differentiators; (b)–(e) 21-tap FIR Hilbert and differentiator operators on Gaussian and modulated Gaussian pulses: (b) Hilbert on Gaussian pulse; (c) differentiator on Gaussian pulse; (d) Hilbert on modulated Gaussian pulse; (e) differentiator on modulated Gaussian pulse. |
| 2.6 | |
| 2.7 | (a)–(b) Analytic signal pair of signal and matched filter response; (c) graphic view of matched filtering ($\langle Y, H_0 \rangle$ and $\langle Y, H_1 \rangle$). |
| 2.8 | Detection performance of the analytic matched filter. |
| 2.9 | (a) Analytic matched filter outputs corresponding to $\{0, 5, 10, 15\}\%$ of $T_s$ timing offset; (b)–(c) SNR of the real, imaginary and magnitude parts of the matched filter output for 0 to $1\,T_s$ timing offset at high and low SNR. |
| 2.10 | |
| 2.11 | (a) Measured noise and interference using a TEM horn antenna; (b) spectrum after an 8th-order Butterworth bandpass filter. |
| 2.12 | Block diagrams of a digital baseband for a 0-1 GHz impulse radio. |
| 2.13 | Flow charts of the digital design flow. |
| 2.14 | Graphic view of the digital design flow. |
| 2.15 | Layout view of the digital baseband. |
| 2.16 | Power efficiency comparisons. |
| 2.17 | (a) Power and (b) area pie charts of the digital baseband. |
| 2.18 | Implementation loss versus input SNR and ADC bits. |
| 2.19 | Implementation loss versus input SNR and jitter. |
| 2.20 | Implementation loss versus in-band interference level and ADC bits. |
| 2.21 | (a) Experiment setup; (b)–(c) local oscillator frequency mismatch effect; (d)–(e) measurements at various distances. |
| 3.1 | Conventional architectures for Nyquist ADCs. |
| 3.2 | Synchronous conversion for SAR ADCs. |
| 3.3 | Asynchronous processing concept. |
| 3.4 | Best (solid line) and worst (dashed line) case of the $V_{res}$ profile. |
| 3.5 | Simplified block diagrams of the ADC architecture. |
| 3.6 | Dynamic comparator schematic. |
| 3.7 | Conventional implementation of radix creation. |
| 3.8 | Series non-binary capacitive ladder. |
| 3.9 | LMS calibration loop. |
| 3.10 | Variable duty-cycled clock generation. |
| 3.11 | DNL and INL before and after combining weights calibration. |
| 3.12 | PCB for testing Nyquist frequencies. |
| 3.13 | DNL and INL before and after combining weights calibration. |
| 3.14 | Measured SNDR versus $f_s$ and $f_{in}$ for a single ADC (a) below and (b) above the Nyquist frequency. |
| 3.15 | (a) Measured SNDR versus $f_s$ and $f_{in}$ for the time-interleaved ADC and (b) its FFT spectrum measured at 159 MHz input. |
| 3.16 | FFT spectrum before and after clock skew calibration. |
| 3.17 | Future role of the SA architecture. |
| 3.18 | FOM comparisons with recent >10 MHz, 6-8 bit ADCs from ISSCC 2000–2005. |

# List of Tables

| Table | Caption |
|-------|---------|
| 3.1 | Performance Summary (25 °C) |
| 3.2 | Technology Scaling of the Proposed ADC Architecture |

#### Acknowledgments

First of all, I am truly grateful to my advisor, Prof. Bob Brodersen, for his support, encouragement and advice over the past years. I have benefited not only from his vision but also from his unique philosophy of handling things, which will remain an invaluable treasure in my life. I am indeed fortunate to have pursued my MS and PhD degrees with him. Additionally, BWRC, created by him and Prof. Jan Rabaey, has been a great place to grow as a graduate student.

I'd like to thank Prof. Borivoje Nikolic for his informative digital classes, his feedback on my thesis and for chairing my qualifying exam committee. I'd also like to thank Prof. Paul Wright for spending his time serving on my dissertation committee. Thanks to Prof. David Tse and Prof. Kannan Ramchandran, I learned fundamental communication and DSP background in my early graduate school years. I'd like to thank Prof. Paul Gray for discussing my PhD project with me.

There are, of course, other graduate students who added more flavor to my life in BWRC. First of all, my special thanks go to Jeff Ou, who discovered that my application package had been lost before I could even come to Berkeley. Ian O'Donnell led the first project when I joined the group. We had countless technical and non-technical discussions, from which I truly learned. His sense of humor and travel advice made my life in Berkeley enjoyable. Thanks, Ian. Thanks to Stanley Wang, with whom I survived many class projects. His technical comments and shared experience have been very helpful ever since we joined the group. My thanks also go to Luns Tee, who offered me many useful technical discussions regarding my PhD project. With his company, we were able to use the car-pool lanes whenever we drove down to the South Bay to fix our board and packaging problems. Engling Yeo kindly helped me with digital tool issues in the early days, and joined me on many bike rides and other outdoor activities. Tufan Karalar, Axel Berny and Tim Wongkomet started graduate school in the same year as I did. It has been great to take classes and exams with them and to exchange our experiences at different stages of graduate school. Tufan and I ended up sitting next to each other. I'd also like to thank Bill Tsang for useful discussions and for being a great listener when I had to complain about testing and layout issues. I'd like to thank Liang-Teck Pang for sharing the laboratory test bench with me while he was away for an internship.

There are other graduate students whose years overlapped with mine, and I am thankful for their advice and for being great neighbors in BWRC. They are Yun Chiu, Ada Poon, Johan Vanderhaegen, Changchun Shi, Yuen-Hui Chee, Chinh Doan, Brian Limketkai, Henry Jen, Dejan Markovic, En-Yi Lin and Sayf Alalusi.

Without the great BWRC staff, there is no way I could have gotten any project going. They are Tom Boot, Gary Kelson, Elise Mills, Brian Richards, Kevin Zimmerman, and Sue Mellers.

My thanks go to Shiang-Lung Koo, in whose company I have traveled to most places ever since I came to Berkeley; he is always there whenever I need help, including with any car problem. I also thank him, along with Yuan-Shih Chen and Luns Tee, for our regular Friday night dinners.

Last but not least, I am greatly indebted to my family. My parents, Chia-Shun Chen and Shu-Mei Huang, have given me their wholehearted support and care through every stage of my life. My brother, Wei-Ming Chen, has always been my best buddy. I am also grateful for the support and blessings of my parents-in-law, Chia-Ming Lee and Huei-Huei Hsueh. Undoubtedly, without the unreserved love, support and care of my wife, Wan-Hsuan Lee, there is no way I could have reached this point. I am truly fortunate to have you in my life, Wan!

# Chapter 1

# Introduction

## 1.1 Trends in Wireless Communication Systems

The rapid development of wireless systems has become closely linked with our daily life. From outdoor cellular systems to indoor office/home networks, wireless technologies are ubiquitous. Among these, the regulatory allocation of unlicensed bandwidth has triggered a particularly wide range of applications, from personal area to metropolitan area networks. WLAN systems such as IEEE 802.11a/b/g are successful examples, providing data networks in office and home environments. Their data capacity has been increasing through more efficient modulation schemes or multiple-antenna arrays, as proposed in IEEE 802.11n. In the WMAN area, 802.16 has been standardized to provide a high-rate wireless link over a wider area. For the personal area (WPAN), high-speed links are also of interest to replace cable connections, as seen in the 802.15 efforts. Regardless of the application, these emerging wireless systems demand more bandwidth, higher data rates, and more configurability to accommodate multiple standards. At the same time, power consumption is tightly constrained to enable mobile deployment with longer battery life. In particular, systems without high data rate requirements, such as sensor networks and RFIDs, will most likely require extremely low cost and low power consumption.

This trend has driven radio implementation into a new era. Previously, a radio was normally optimized for a particular signal band, which is no longer adequate for future wideband radios. From the radio architecture perspective, there is always a tradeoff between the amount of processing done in the analog and digital domains, and the most critical factor that often determines this tradeoff is the analog-to-digital converter (ADC). A wide signal bandwidth implies a high ADC sampling rate, and more dynamic range (DR) may be required if less analog processing is done before the ADC. From the circuit implementation perspective, digital baseband processing is less of an issue, since technology scaling keeps favoring digital circuits. However, the RF power amplifier needs improved linearity and dynamic range to maintain the flexibility required for multiple standards. The antenna requires a wider bandwidth and must be optimized for the communication bands of interest. Similarly, the receiver front-end components up to the frequency translation, i.e. through the mixers, have to cover the wide RF bandwidth and are sometimes tuned to several communication modes. Finally, the implementation cost of a high-speed, high-DR ADC can be so exorbitant that the radio architecture must be modified to relax the ADC specifications for an overall lower cost. On the other hand, a new wave of efforts to push ADC performance is underway, which will give more freedom in the choice of radio architecture.

Figure 1.1: Time and frequency domain comparisons of ultra-wideband and narrowband signals.
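The cost of ADC dynamic range and speed can be made concrete with the ideal-quantizer relation $SNDR = 6.02N + 1.76$ dB and one commonly used figure of merit, the energy per conversion step $P/(2^{ENOB} f_s)$. The sketch below is illustrative only: it applies that figure of merit to the numbers quoted in the abstract (34 dB peak SNDR, 5.3 mW, 600 MS/s), and because it uses the peak SNDR rather than SNDR across the full input band, the result is an optimistic estimate. The FOM definition used later in the thesis for comparisons may differ.

```python
import math

def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR (ideal quantizer, full-scale sine input)."""
    return (sndr_db - 1.76) / 6.02

def bits_for_dynamic_range(dr_db):
    """Smallest integer resolution whose ideal quantization SNR reaches dr_db."""
    return math.ceil(enob_from_sndr(dr_db))

def walden_fom(power_w, sndr_db, fs_hz):
    """Energy per conversion step, P / (2**ENOB * fs), in joules."""
    return power_w / (2 ** enob_from_sndr(sndr_db) * fs_hz)

# Figures quoted in the abstract: 34 dB peak SNDR, 5.3 mW, 600 MS/s.
print(f"ENOB = {enob_from_sndr(34.0):.2f} bits")
print(f"FOM  = {walden_fom(5.3e-3, 34.0, 600e6) * 1e12:.3f} pJ/conversion-step")

# Each extra ~6 dB of dynamic range costs one more bit; at a fixed FOM,
# one more bit doubles the power at the same sample rate.
for dr in (34.0, 60.0):
    print(f"{dr:.0f} dB needs at least {bits_for_dynamic_range(dr)} bits")
```

Moving from about 34 dB to 60 dB of dynamic range raises the requirement from 6 to 10 bits, i.e. roughly 16 times the power at a constant figure of merit, which is why relaxing the ADC specification at the architecture level pays off.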

These challenges have opened a new playground for engineers in both system and component design. Co-design at the system and circuit levels becomes crucial, especially since a system-on-chip (SOC) solution is key to dramatically cutting cost. This thesis is one example of these new opportunities: we explore both the system and the circuit design spaces for ultra-wideband communication.

## 1.2 Overview of Ultra-Wideband Communication

Ultra-wideband signals have long been used in radar applications. More recently, in 2002, the FCC approved their use for unlicensed applications [1]. According to the FCC definition, any signal with an absolute bandwidth of at least 500 MHz, or a fractional bandwidth (the ratio of signal bandwidth to center frequency, measured at the -10 dB points) of at least 20%, is considered an ultra-wideband signal. Compared to a narrowband signal, a UWB signal occupies only a short duration in time, while its frequency spectrum is widely spread, as shown in Fig. 1.1. Because of these different signal characteristics, UWB raises different design issues from both the communication and the circuit implementation perspectives. For example, the channel diversity of a wideband signal increases with its bandwidth, i.e. more multipath components become resolvable. Moreover, the narrow pulse duration also provides fine timing information. Based on these unique features, UWB has been applied to both high-end and low-end applications, as seen in the 802.15 UWB standardization efforts. High-speed applications include multimedia communications, home networking, wireless USB and wireless data networks, with data rates on the order of hundreds of Mb/s and power consumption of hundreds of milliwatts. For lower-speed applications, precision ranging, sensor networks and tracking systems are of high interest; these typically need data rates of tens of kb/s with power consumption on the order of only milliwatts.
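The classification rule can be written as a small predicate. This is a sketch based on the FCC Part 15 rule text (47 CFR 15.503), which defines fractional bandwidth as $2(f_H - f_L)/(f_H + f_L)$ and sets the thresholds at 0.20 or 500 MHz, with $f_L$ and $f_H$ taken at the -10 dB emission points. The function names and the example bands are hypothetical illustrations.

```python
def fractional_bandwidth(f_low_hz, f_high_hz):
    """Bandwidth divided by center frequency: 2*(fH - fL)/(fH + fL)."""
    return 2 * (f_high_hz - f_low_hz) / (f_high_hz + f_low_hz)

def is_uwb(f_low_hz, f_high_hz):
    """FCC test applied to the -10 dB band edges fL and fH:
    at least 500 MHz of bandwidth, or a fractional bandwidth of at least 0.20."""
    return (f_high_hz - f_low_hz) >= 500e6 or fractional_bandwidth(f_low_hz, f_high_hz) >= 0.20

print(is_uwb(3.1e9, 10.6e9))     # full indoor band
print(is_uwb(2.401e9, 2.423e9))  # a 22 MHz 802.11b channel
print(is_uwb(250e6, 350e6))      # only 100 MHz wide, but about 33% fractional bandwidth
```

The last case shows why both criteria exist: a low-frequency signal can qualify on fractional bandwidth alone even though its absolute bandwidth is well below 500 MHz.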

There has been great concern about harmful interference to existing wireless systems once UWB devices are deployed. To ensure the peaceful coexistence of UWB and other systems, the transmission of UWB signals is regulated through an emission power mask. The maximum power spectral density is limited to the Part 15 level of -41.3 dBm/MHz EIRP. Currently, the approved application fields include imaging systems, vehicular radar systems, and communication and measurement systems. Among these, indoor communication, which is allocated the 3.1-10.6 GHz band as shown in Fig. 1.2, has drawn the most commercial interest, driven particularly by the emerging demand for high-speed home/office networks for either WLAN or WPAN.

Figure 1.2: FCC regulations of indoor communications
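A quick calculation shows how little total power the emission mask permits. The sketch assumes, optimistically, that the transmitted power spectral density sits exactly at the -41.3 dBm/MHz EIRP limit and is flat across the whole band; a real pulse spectrum rolls off, so actual transmit power is lower. The 500 MHz sub-band is a hypothetical comparison case.

```python
import math

MASK_DBM_PER_MHZ = -41.3   # FCC indoor UWB EIRP limit, 3.1-10.6 GHz
band_mhz = 10600 - 3100    # 7500 MHz of allocated bandwidth

# Integrating a flat PSD over the band adds 10*log10(bandwidth in MHz).
total_dbm = MASK_DBM_PER_MHZ + 10 * math.log10(band_mhz)
total_mw = 10 ** (total_dbm / 10)
print(f"full band: {total_dbm:.2f} dBm = {total_mw:.3f} mW")

# The same PSD limit over a hypothetical 500 MHz-wide pulse spectrum.
sub_dbm = MASK_DBM_PER_MHZ + 10 * math.log10(500)
print(f"500 MHz:   {sub_dbm:.2f} dBm")
```

Even occupying the entire 7.5 GHz band yields only about half a milliwatt of radiated power, which is consistent with the short-range focus of indoor UWB links.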

## 1.3 Research Contributions

This dissertation explores power- and cost-efficient solutions for UWB communication at both the radio system architecture and the circuit implementation levels. At both levels, the partitioning between the analog and digital domains was chosen judiciously, exploiting digital signal processing techniques. In summary, the research contributions include:

• An impulse radio architecture using a sub-sampling front end to reduce overall radio complexity and cost;

• An analytic signal processing technique to compensate for analog front-end impairments and to extract the fine timing information that is a unique feature of UWB signals;

• A full analysis and simulation of the system specifications of the proposed transceiver architecture;

• An asynchronous ADC architecture that achieves outstanding power efficiency for a high-speed, medium-resolution ADC;

• A series, non-binary capacitive ladder that enables high-speed operation, low power consumption and high input bandwidth while creating an arbitrary radix for the search algorithm;

• A digital calibration scheme to compensate for manufacturing process imperfections, which can be extended to higher-resolution ADCs.
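Why a non-binary radix helps can be seen in a behavioral model of successive approximation. The sketch below is a hypothetical illustration, not the thesis circuit: it assumes a bipolar search over an input range of roughly ±1, ten steps, and weights that are known exactly. In silicon the weights deviate from their nominal values, which is what the digital radix calibration addresses. With radix 2 there is no redundancy, so a single wrong comparator decision (modeled here as a made-up offset of -0.05 at step 1) leaves an unrecoverable error. With radix 1.8, each weight is smaller than the sum of the remaining weights, so later steps absorb the mistake.

```python
def radix_weights(radix, steps, first=0.5):
    """Weights first, first/radix, first/radix**2, ... (input range about +/-1)."""
    return [first / radix ** k for k in range(steps)]

def sar_convert(vin, weights, comparator_offset=None):
    """Bipolar successive approximation with arbitrary (e.g. non-binary) weights.

    At step k the comparator checks the sign of the residue plus an optional
    offset (used to model a decision error), then subtracts or adds weights[k].
    The digital output is reconstructed as sum(d_k * w_k) using known weights.
    """
    comparator_offset = comparator_offset or {}
    residue = vin
    decisions = []
    for k, w in enumerate(weights):
        d = 1 if residue + comparator_offset.get(k, 0.0) >= 0 else -1
        residue -= d * w
        decisions.append(d)
    estimate = sum(d * w for d, w in zip(decisions, weights))
    return decisions, estimate

vin = 0.53
offset = {1: -0.05}  # hypothetical comparator error, applied at step 1 only

for radix in (2.0, 1.8):
    w = radix_weights(radix, steps=10)
    _, clean = sar_convert(vin, w)
    _, faulty = sar_convert(vin, w, offset)
    print(f"radix {radix}: clean error {abs(vin - clean):.4f}, "
          f"with decision error {abs(vin - faulty):.4f}, last weight {w[-1]:.4f}")
```

In the radix-1.8 case both conversions stay within the last weight of the input, while the binary search ends about 0.03 away after the single error. The price of redundancy is extra conversion steps for the same resolution.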

## 1.4 Thesis Organization

Chapter 2 focuses on the system-level work, including radio architecture comparisons, the analog front end, and the digital signal processing and baseband blocks. The system specifications and prototype are also discussed. In Chapter 3, the asynchronous ADC architecture is proposed and described in detail. The prototype design in 0.13-$\mu$m CMOS and its measured results are provided, and the future extension and potential uses of this ADC architecture are discussed at the end. Finally, the conclusions and future work are summarized in Chapter 4.

# Chapter 2

# Sub-sampling UWB system architecture

In this chapter, an impulse radio architecture utilizing a simple analog front-end along with digital complex signal processing is proposed to allow a low-complexity implementation of a 3.1-10.6 GHz Ultra-Wideband radio. The proposed system transmits passband pulses using a pulser and antenna, and the receiver front-end down-converts the signal frequency via sub-sampling, thus requiring substantially less hardware than the existing direct-conversion approach. After the analog-to-digital converter (ADC), the signal is projected into the complex signal domain to perform matched filtering, which not only mitigates the timing sensitivity induced by analog circuit impairments but also extracts the fine time resolution provided by the wideband nature of a UWB signal. The performance and potential uses of these complex signal processing blocks are analyzed and compared for different complex signal transformations. Based on the proposed architecture, the system specifications and implementation issues are further analyzed and verified through system-level simulations using measured signal and noise. Finally, a prototype is built with discrete components to demonstrate the feasibility of the proposed impulse radio architecture.
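Sub-sampling works because a real bandpass signal can be sampled below twice its highest frequency without overlap, provided the rate falls in one of the classic alias-free intervals $2f_H/n \le f_s \le 2f_L/(n-1)$. The sketch below lists those intervals and the frequency to which a tone folds. The 3.5-4.0 GHz band is a hypothetical 500 MHz pulse spectrum, not a band taken from this chapter, and the model ignores the folding of out-of-band noise, which in practice must be blocked by filtering before the sampler.

```python
import math

def valid_bandpass_rates(f_low, f_high):
    """Alias-free uniform sampling rates for a real signal confined to
    [f_low, f_high]: 2*f_high/n <= fs <= 2*f_low/(n - 1), for integer
    n = 1 .. floor(f_high / (f_high - f_low))."""
    bandwidth = f_high - f_low
    ranges = []
    for n in range(1, math.floor(f_high / bandwidth) + 1):
        fs_min = 2 * f_high / n
        fs_max = math.inf if n == 1 else 2 * f_low / (n - 1)
        if fs_min <= fs_max:
            ranges.append((n, fs_min, fs_max))
    return ranges

def alias_frequency(f, fs):
    """Frequency in [0, fs/2] where a tone at f lands after sampling at fs."""
    r = math.fmod(f, fs)
    return min(r, fs - r)

# Hypothetical 500 MHz-wide pulse spectrum between 3.5 and 4.0 GHz.
for n, lo, hi in valid_bandpass_rates(3.5e9, 4.0e9):
    print(f"n={n}: {lo / 1e9:.4f} to {hi / 1e9:.4f} GS/s")
print(alias_frequency(3.75e9, 1e9) / 1e9, "GHz")  # where the band center folds
```

The lowest alias-free rate here is 1 GS/s, compared with 8 GS/s for Nyquist sampling of the 4 GHz upper edge, but it leaves no guard band, so a practical design would choose a rate inside a wider interval such as the n = 7 range.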

## 2.1 Introduction

Ultra-wideband (UWB) transmission was approved by the FCC in 2002 [1] for several frequency bands (0-960 MHz, 3.1-10.6 GHz and 22-29 GHz), and has since drawn considerable attention for a variety of applications, including imaging, surveillance, high-speed data communication and high-resolution locationing [2][3]. One of the most discussed applications is indoor communication systems operating in the frequency band from 3.1 to 10.6 GHz. The main interest lies in high-speed (hundreds of Mb/s) and short-range (less than 10 meters) systems for wireless personal area networks (WPANs). At the other extreme, low data rate (tens of kb/s) communications, such as sensor networks and ranging systems, are also of interest.

The new challenge for UWB radio implementation is to fully exploit its wideband nature to achieve a lower-power and less costly solution, rather than to improve the efficiency of narrowband techniques such as those in the 802.11n standard. While most of the published UWB implementations [4][5][6][7][8] support the MultiBand OFDM approach, i.e. WiMedia Alliance [9], these radios are essentially scaled-up versions of the current 802.11a/g systems, which results in considerable implementation complexity.

In this thesis, we have explored the signaling opportunity offered by non-sinusoidal carriers, so-called impulse radios [10][11], which allowed us to take a fundamentally new approach to radio architectures, signal processing techniques, and analog circuits. Our goal is to take advantage of the wideband characteristics of UWB pulses and seek a system solution that dramatically cuts down the implementation cost while exploiting the inherent capability of fine timing resolution.

For the radio architecture, we proposed to utilize a sub-sampling analog front-end for reduced component count and to perform as much processing as possible in the digital domain to realize a substantial reduction of power and area [12]. The challenges of the simplified front-end are identified, and its suitability for UWB is justified from both system performance and circuit implementation perspectives. One key issue of sub-sampling an RF signal is timing sensitivity, and this problem is mitigated by exploring several signal transformations of the sampled UWB signal in the digital domain. Among them, the use of analytic signaling provides significant performance improvements in wideband signal processing, as is the case in UWB. The performance and properties of the proposed analytic signal processing blocks were analyzed and shown to be useful for synchronization, data detection and fine timing extraction for ranging applications. Given the proposed system architecture, the implementation specifications and issues were further explored with the assistance of system simulations. In addition, a recently developed low-power and high-speed ADC that is able to sub-sample signal bands above 4 GHz [13] makes the proposed sub-sampling front-end practical. Finally, the proposed sub-sampling impulse radio was validated through a real wireless link using an experimental transceiver composed of a UWB antenna, filters, a pulse generator and a high-speed sampling oscilloscope.

This chapter is organized as follows. Section 2.2 compares the implementation complexity of the proposed transceiver architecture with that of the conventional direct-conversion one. Section 2.3 provides an overview of sub-sampling, explains its implementation challenges, and justifies its applicability to UWB. Section 2.4 focuses on the digital complex signal processing of a UWB signal and analyzes the properties and uses of several critical complex signal processing blocks. Section 2.5 provides a link budget analysis to determine the system specifications and identify implementation issues. Next, system-level simulations using measured pulse shapes and noise samples are performed in Section 2.6 to verify the system specifications and tradeoffs. Finally, a prototype is built and measured in Section 2.7 to prove the concept.

## 2.2 Comparison of Transceiver Architectures

The proposed system is based on the impulse radio approach [10][11]. Many recently published UWB system solutions [4][5][14][6][7][8], including both OFDM and DSSS approaches, have adopted a direct-conversion architecture, as shown in Fig. 2.1(a). A major challenge of this approach is that the overall complexity remains high, which means a dramatic cost and power reduction relative to current wireless solutions is unlikely. For example, the transmitter of a wideband OFDM radio requires a high-speed digital-to-analog converter, up-conversion mixers, oscillators and a power amplifier with linearity and peak-to-average ratio (PAR) constraints because of the multicarrier transmission [15]. On the other hand, an impulse radio simply uses a pulser to drive the antenna, and radiates a passband pulse shaped by the response of the wideband antenna and any bandpass filter, as shown in Fig. 2.1(b). Eliminating the mixers and local oscillators and relaxing the linearity requirement imply a lower-complexity transmitter implementation.

(a) Direct-conversion radio architecture

(b) Sub-sampling radio architecture

Figure 2.1: Radio architectures.

On the receiver side, the direct-conversion architecture utilizes two paths (I and Q) of local oscillator (LO), frequency synthesizer and mixer to down-convert the passband signal to baseband prior to the ADCs. According to the published literature, these extra analog circuit blocks for frequency translation can contribute a significant portion of the total power consumption. Alternatively, the frequency translation can be done in the sampling process, using sub-sampling, as part of the ADC. This results in only one receive path and dramatically reduces the component count compared to a direct-conversion architecture. The remaining analog blocks prior to the ADC are amplifiers and bandpass filters. The sampled data are processed by a digital matched filter in order to reach the matched filter bound for optimal detection [16]. The proposed system avoids wideband analog processing by moving more of the processing to the digital back end, which results in a more efficient solution. Moreover, the single analog receive path eliminates the analog I/Q mismatch problem caused by the variability of IC fabrication.

## 2.3 Analog: Subsampling Front End

In this section we will first review the fundamentals of subsampling theory and then discuss what makes it a promising solution for ultra-wideband systems.

#### 2.3.1 Theoretical Background and Challenges

Figure 2.2: Signal and noise spectrum (left) before and (right) after sub-sampling. Black denotes the desired signal and gray represents noise, assuming sufficient out-of-band rejection.

A subsampling front end directly samples the passband signal at twice the signal bandwidth instead of twice the maximum signal frequency. The signal is bandlimited from $F_l$ to $F_h$ (Hz), and the sampling frequency is $F_s$ (Hz). By carefully choosing $F_s$, $F_l$ and $F_h$, a non-aliased sampled spectrum can be obtained [17]. For example, if the band edges $F_l$ and $F_h$ are integer multiples of the signal bandwidth, $B$ (it suffices that $F_l$ is a non-negative integer multiple, since $F_h = F_l + B$), aliasing can be avoided at the minimal uniform sampling rate, $2B$,

$$F_l = n \cdot (F_h - F_l) = n \cdot B, \tag{2.1}$$

where $n \in \{0, 1, 2, \dots\}$.

The undersampling ratio, $K$, is defined as $\lfloor F_h/2B \rfloor$, the largest integer not exceeding $F_h/2B$. This ratio is a good indicator of the amount of aliasing, as will be seen later.
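The alias-free condition can be checked numerically. The sketch below is a minimal illustration, assuming the textbook uniform bandpass-sampling criterion (valid rates satisfy $2F_h/n \le F_s \le 2F_l/(n-1)$ for some integer $n$), of which Eq. (2.1) is the minimum-rate case; the function names are hypothetical. It reproduces the 3-4 GHz, 2 GSa/s example used in Section 2.3.2.

```python
import math

def undersampling_ratio(f_h, bandwidth):
    """K = floor(F_h / 2B): the largest integer not exceeding F_h / 2B."""
    return math.floor(f_h / (2.0 * bandwidth))

def alias_free(f_l, f_h, f_s):
    """Textbook bandpass-sampling test for a real band [f_l, f_h].

    Uniform sampling at f_s avoids aliasing iff, for some integer n with
    1 <= n <= floor(f_h / B), 2*f_h/n <= f_s <= 2*f_l/(n - 1).
    (n = 1 is ordinary Nyquist sampling, with no upper limit.)
    """
    bandwidth = f_h - f_l
    for n in range(1, math.floor(f_h / bandwidth) + 1):
        lower = 2.0 * f_h / n
        upper = math.inf if n == 1 else 2.0 * f_l / (n - 1)
        if lower <= f_s <= upper:
            return True
    return False

# 3-4 GHz band (B = 1 GHz) from Section 2.3.2; frequencies in GHz.
print(undersampling_ratio(4.0, 1.0))  # K = 2
print(alias_free(3.0, 4.0, 2.0))      # F_l = 3B, so the minimum rate 2B works
print(alias_free(3.0, 4.0, 2.2))      # a slightly higher rate folds the band onto itself
```

Note that raising the rate from 2 to 2.2 GSa/s breaks the alias-free condition, which is why the rate, $F_l$ and $F_h$ have to be chosen jointly rather than simply "above $2B$".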

Several key challenges have kept subsampling from becoming popular in existing narrowband systems. First of all, the noise spectrum from $-F_l$ to $+F_l$ aliases into the signal band and thus deteriorates the passband SNR, assuming the receiver bandwidth is $F_h$. Even if the receiver can afford a perfect anti-aliasing filter, the circuits after the filter, such as the sample-and-hold or buffer stages, still contribute thermal noise. Therefore, if the receiver noise is dominated by these circuit noise sources, the passband SNR degradation is proportional to the undersampling ratio, $K$, which can be on the order of hundreds for a narrowband system. The higher the undersampling ratio, the harder it is to implement the anti-aliasing filter, as elaborated in the system specification section. Secondly, it is more difficult to design a good bandpass (anti-aliasing) filter in the RF band than at IF or baseband. As mentioned earlier, insufficient attenuation of out-of-band noise causes more degradation of the signal SNR. Traditionally, these high-Q bandpass filters are implemented off-chip, which unavoidably increases cost and power consumption. Finally, the incoming signal frequency is actually higher than the sampling frequency; therefore, the receiver's tracking bandwidth must cover the maximum signal frequency, and sampling jitter degrades the SNR even further. If one models the jitter-induced error as another noise source, the noise power increases with input signal frequency [18]. Therefore, directly sampling at RF introduces more noise power than sampling at IF or baseband.
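The frequency dependence of jitter noise can be illustrated with the standard aperture-jitter bound for a full-scale sinusoid, $SNR = -20\log_{10}(2\pi f_{in}\sigma_t)$. This is a sketch under assumed values (1 ps rms jitter, an otherwise ideal sampler), not a model of any particular circuit; the function names are hypothetical.

```python
import math

def jitter_limited_snr_db(f_in_hz, sigma_t_s):
    """Aperture-jitter-limited SNR of a full-scale sine: -20*log10(2*pi*f_in*sigma_t)."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * sigma_t_s)

def snr_to_enob(snr_db):
    """Effective number of bits, from SNR = 6.02*ENOB + 1.76 dB."""
    return (snr_db - 1.76) / 6.02

SIGMA_T = 1e-12  # assumed 1 ps rms sampling jitter
for f_in in (0.5e9, 1e9, 4e9):
    snr = jitter_limited_snr_db(f_in, SIGMA_T)
    print(f"{f_in / 1e9:.1f} GHz input: {snr:5.1f} dB, about {snr_to_enob(snr):.1f} bits")
```

Under these assumptions, a 4 GHz input is limited to roughly 32 dB (about 5 effective bits), 12 dB worse than a 1 GHz input, since the jitter noise power grows with the square of the input frequency.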

#### 2.3.2 Subsampling for UWB

Interestingly, the challenges of subsampling are relaxed in the UWB case due to its wide signal bandwidth. For example, if a 1 GHz wide pulse is transmitted between 3 and 4 GHz and sampled at 2 GSa/s, the undersampling ratio is only two, much smaller than in any narrowband system. A simplified receiver path of the direct-conversion and subsampling architectures is shown in Fig. 2.3 to illustrate the impact of the undersampling ratio, $K$, on noise folding. The out-of-band noise prior to the anti-aliasing (AA) filter should be blocked by the filter response, which imposes the same out-of-band attenuation constraint on both radio architectures. We will discuss the specification of the AA filter response for the subsampling front end in Section 2.5. Here, we assume an ideal anti-aliasing filter and focus on the noise from the following circuits, composed of a buffer stage and a track-and-hold. The main difference between the two architectures is that the subsampling front end needs to maintain a high bandwidth up to the sampling circuit; therefore, the wider thermal noise spectrum after the AA filter folds into the signal band after sampling. In this case, a simple buffer stage is modelled as a transconductance ($g_m$) cell loaded by $R_L$. Considering the thermal noise from active devices ($4kT\gamma g_m$ A$^2$/Hz) and resistors ($4kTR$ V$^2$/Hz), the total noise sampled onto the capacitor can be derived as $(1 + \gamma g_m R_L) \frac{kT}{C_L}$ for the direct-conversion architecture and as $(1+\gamma g_m R_L) \frac{kT}{C_L} \cdot K$ for the subsampling one, indicating that a smaller undersampling ratio reduces the amount of extra noise folding. Moreover, UWB has a relatively low received SNR, due to the limited transmission power allowed by the FCC and the larger in-band noise power, as the large signal bandwidth takes in more ambient noise, early-stage circuit noise and possible interference. This means the UWB receiver requires only a moderate-resolution ADC (4-6 bits), as will be further analyzed in later sections. For such ADCs, the quantization noise dominates over the sampled thermal noise on the sampling capacitor. In other words, a small sampling capacitor does not noticeably degrade the overall system performance, which makes the subsampling architecture very promising for UWB.

Figure 2.3: Simplified receiver path for direct-conversion (top) and subsampling (bottom) architecture.
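The noise comparison above can be put into numbers. The sketch below evaluates $(1+\gamma g_m R_L)kT/C_L$, scaled by $K$ for subsampling, against the $\Delta^2/12$ quantization noise of a 5-bit ADC. The 50 fF capacitor, $\gamma = 2/3$, $g_m R_L = 3$, 300 K and the 1 V full scale are illustrative assumptions, not values from this design, and the function names are hypothetical.

```python
K_BOLTZMANN = 1.380649e-23  # J/K

def sampled_thermal_noise(c_load, gamma, gm_rl, undersampling_ratio=1, temp_k=300.0):
    """Mean-square noise (V^2) sampled on C_L by the g_m / R_L buffer model:
    (1 + gamma * g_m * R_L) * kT / C_L, multiplied by K when subsampling."""
    return (1.0 + gamma * gm_rl) * K_BOLTZMANN * temp_k / c_load * undersampling_ratio

def quantization_noise(bits, full_scale_v):
    """Uniform quantization noise power (V^2): LSB^2 / 12."""
    lsb = full_scale_v / 2 ** bits
    return lsb ** 2 / 12.0

# Assumed, illustrative circuit values (not taken from the text).
direct = sampled_thermal_noise(50e-15, 2.0 / 3.0, 3.0)
subsampled = sampled_thermal_noise(50e-15, 2.0 / 3.0, 3.0, undersampling_ratio=2)
quant = quantization_noise(bits=5, full_scale_v=1.0)
print(f"direct-conversion kT/C noise: {direct ** 0.5 * 1e3:.2f} mV rms")
print(f"subsampling (K = 2) noise:    {subsampled ** 0.5 * 1e3:.2f} mV rms")
print(f"5-bit quantization noise:     {quant ** 0.5 * 1e3:.2f} mV rms")
```

With these assumed values the folded thermal noise is roughly 0.7 mV rms, more than an order of magnitude below the roughly 9 mV rms quantization noise, which is the sense in which quantization dominates for a 4-6 bit converter.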

Next, the difficulty of implementing an RF bandpass filter is also relaxed by the lower filter Q, which is less than ten in UWB, enabling integration in a CMOS implementation. Note that a high-Q notch filter may still be required in the case of strong interference close to the signal band. A more detailed discussion of bandpass filtering is provided in Section 2.5. Lastly, the lower ADC dynamic range, which results from the large in-band noise and interference, relaxes the sampling jitter constraint. Our system-level simulations in Section 2.6 also verify the relatively low dynamic range requirements.

Figure 2.4: Bandpass pulse (top left) and subsampled waveform with 0 $T_s$ (top right), 0.05 $T_s$ (bottom left), and 0.1 $T_s$ (bottom right) sampling offset.

#### 2.3.3 Timing Sensitivity Issue

While subsampling is feasible for UWB from the noise standpoint, the architecture also suffers from sampling offset, which is introduced by frequency and phase mismatch between the TX and RX oscillators or by changes in pulse arrival times. The sampled waveform changes dramatically with this sampling offset, which can cause serious performance degradation in a receiver using a matched filtering approach. It is known that a matched filter [16] yields the optimal SNR when its response perfectly matches the incoming signal. However, any mismatch between the two deteriorates the system performance, as derived in [19]. Here, the mismatch is caused by the variation of the sampled waveform with timing offset, which will be referred to as the timing sensitivity issue in the following. In fact, timing sensitivity is more significant in subsampling than in Nyquist sampling at the same sampling frequency, because a given offset, expressed as a fraction of $T_s$, corresponds to a larger carrier phase shift when the signal frequency exceeds the sampling rate; the resulting waveform variation significantly complicates the design of the digital matched filter. Using measured UWB pulses bandlimited to 3-4 GHz, the impact of sampling offset on the sampled waveform is shown in Fig. 2.4. An analytic signaling approach, described in the next section, is proposed to alleviate this problem.
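The effect shown in Fig. 2.4 can be reproduced with a synthetic pulse. The sketch below is a minimal illustration, assuming a modulated Gaussian pulse centered at 3.5 GHz (a stand-in for the measured pulse) subsampled at 2 GSa/s. It uses the zero-offset samples as a fixed real-valued matched-filter template and reports the normalized correlation, and the corresponding SNR loss, when the same pulse is sampled with an offset. All constants and function names are assumptions for illustration.

```python
import math

F_C = 3.5e9       # assumed pulse center frequency (Hz), middle of the 3-4 GHz band
SIGMA_T = 0.4e-9  # assumed Gaussian envelope width (s), roughly a 1 GHz-wide pulse
T_S = 0.5e-9      # 2 GSa/s sampling period (s)

def pulse(t):
    """Modulated Gaussian pulse standing in for the measured 3-4 GHz pulse."""
    return math.exp(-t * t / (2.0 * SIGMA_T ** 2)) * math.cos(2.0 * math.pi * F_C * t)

def subsample(offset, n_side=8):
    """Samples taken at k*T_s + offset for k = -n_side..n_side."""
    return [pulse(k * T_S + offset) for k in range(-n_side, n_side + 1)]

def template_match(offset_fraction):
    """Normalized correlation between the zero-offset samples (a fixed real-valued
    matched-filter template) and samples taken with an offset of offset_fraction * T_s."""
    template = subsample(0.0)
    received = subsample(offset_fraction * T_S)
    dot = sum(a * b for a, b in zip(template, received))
    norm = math.sqrt(sum(a * a for a in template) * sum(b * b for b in received))
    return dot / norm

for frac in (0.0, 0.05, 0.1):
    rho = template_match(frac)
    loss_db = 10 * math.log10(1 / rho ** 2)  # matched-filter SNR scales with rho^2
    print(f"offset {frac:.2f} T_s: correlation {rho:.3f}, mismatch loss {loss_db:.1f} dB")
```

Under these assumptions the correlation already drops well below one at an offset of only $0.1T_s$ (50 ps), which is the timing sensitivity that the analytic signaling of Section 2.4 addresses.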

## 2.4 Digital: Complex Signal Processing

In order to fully exploit the wideband nature of the UWB signal, we need to analyze the signal in an appropriate space. Owing to their wideband frequency content, UWB pulses carry fine timing information, which, however, increases synchronization difficulty as described previously, especially for a subsampling analog front-end. From the ranging/locationing perspective, this fine timing resolution is very helpful.

#### 2.4.1 Complex representations of UWB signals

To understand how timing information is extracted, we deliberately inject a timing offset into the incoming signal and observe the change in the frequency domain. Equation (2.2) shows a bandlimited signal, $s(t)$, sampled at the rate $1/T_s$. Any sampling offset, $T_o$, of the sampling sequence translates into a phase shift, $e^{-jk\frac{2\pi}{T_s}T_o}$, of the $k$-th spectral replica in the frequency domain.

$$s(t) \cdot \sum_k \delta(t - k T_s - T_o) \stackrel{\text{F.T.}}{\Longleftrightarrow} S(\omega) * \sum_k \delta\left(\omega - k \frac{2\pi}{T_s}\right) \cdot e^{-jk\frac{2\pi}{T_s}T_o}. \tag{2.2}$$

The key to studying phase information is to project the signal onto an orthogonal space, obtained here by a 90-degree phase shift. In a narrowband system, sine and cosine carriers are used to form the I and Q channels. However, for a UWB signal, orthogonality at one particular frequency is not sufficient, so a wideband phase shifter is required. Two possibilities are the Hilbert transformer ($\mathcal{H}[\cdot]$) and the differentiator ($\mathcal{D}[\cdot]$). Both provide a wideband 90-degree phase shift, as seen in their ideal frequency responses, (2.3) and (2.4).

$$\mathcal{H}[s(t)] = \frac{1}{\pi t} * s(t) \stackrel{\text{F.T.}}{\Longleftrightarrow} -j \cdot \mathrm{sign}(\omega) \cdot S(\omega) \tag{2.3}$$

where $\mathrm{sign}(\cdot)$ is the signum function.

$$\mathcal{D}[s(t)] = \frac{d}{dt} s(t) \stackrel{\text{F.T.}}{\Longleftrightarrow} j \cdot \omega \cdot S(\omega) \tag{2.4}$$

Note that, in both cases, the 90-degree phase shift is independent of frequency. For a narrowband signal, such as $\sin(\omega_o t)$, the two operators are almost identical except for a constant gain (and sign) difference, because $\mathcal{H}(\sin(\omega_o t)) = -\cos(\omega_o t)$ while $\mathcal{D}(\sin(\omega_o t)) = \omega_o \cdot \cos(\omega_o t)$. However, for a wideband signal, the differentiator suffers from a non-flat gain response (proportional to frequency), as seen in (2.4), which not only introduces signal distortion but also amplifies unwanted high-frequency noise. In contrast, an ideal Hilbert transformer has unity gain over the entire spectrum, i.e. it introduces no signal distortion.

To implement these operations in the digital baseband, we take a closer look at the discrete Hilbert transformer and differentiator. As an example, we designed an equiripple 21-tap FIR filter using the Parks-McClellan algorithm to obtain an exact 90-degree phase shift for the Hilbert transformer [20]. For the differentiator, we used the backward difference approximation ($y[n] = x[n] - x[n-1]$) and the central difference approximation ($y[n] = (x[n+1] - x[n-1])/2$) for their computational simplicity. We also designed another 21-tap FIR differentiator using the Parks-McClellan algorithm. The phase responses of these discrete-time implementations all remain at 90 degrees, apart from a linear-phase term corresponding to each filter's delay, and Figure 2.5(a) shows their magnitude responses.
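The magnitude behavior in Figure 2.5(a) can be approximated with a short script. As a deliberate substitution, the sketch designs the 21-tap Hilbert transformer by Hamming-windowing the ideal response $h[n] = 2/(\pi n)$ for odd $n$, rather than with the Parks-McClellan algorithm used in the text, so that it needs only the standard library (with SciPy, `scipy.signal.remez` with `type='hilbert'` would be the equiripple route). The backward and central differences are exactly those defined above; the helper names are hypothetical.

```python
import cmath
import math

def freq_response(taps, center, omega):
    """DTFT at omega of an FIR whose i-th tap sits at time index n = i - center."""
    return sum(h * cmath.exp(-1j * omega * (i - center)) for i, h in enumerate(taps))

def windowed_hilbert(num_taps=21):
    """Ideal Hilbert taps 2/(pi*n) for odd n (0 for even n), Hamming-windowed."""
    m = (num_taps - 1) // 2
    taps = []
    for n in range(-m, m + 1):
        ideal = 2.0 / (math.pi * n) if n % 2 else 0.0
        taps.append(ideal * (0.54 + 0.46 * math.cos(math.pi * n / m)))
    return taps, m

HILBERT = windowed_hilbert(21)
BACKWARD = ([1.0, -1.0], 0)       # y[n] = x[n] - x[n-1]
CENTRAL = ([0.5, 0.0, -0.5], 1)   # y[n] = (x[n+1] - x[n-1]) / 2

for omega in (math.pi / 8, math.pi / 4, math.pi / 2, 3 * math.pi / 4):
    row = [abs(freq_response(*f, omega)) for f in (HILBERT, BACKWARD, CENTRAL)]
    print(f"w = {omega / math.pi:.3f} pi: Hilbert {row[0]:.3f}, "
          f"backward {row[1]:.3f}, central {row[2]:.3f}, ideal differentiator {omega:.3f}")
```

Under this windowed design, the Hilbert magnitude stays close to one in mid-band with a phase of $-90^\circ$, while both differences track the ideal differentiator gain $\omega$ only at low frequencies.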

With these phase shifters in hand, the orthogonal components can be combined into a complex signal. Combining a signal with its Hilbert transform yields a single-sideband (SSB) spectrum, i.e. the analytic signal,

$$y(t) = s(t) + j \cdot \mathcal{H}(s(t)). \tag{2.5}$$

Similarly, for a narrowband signal centered at  $\omega_o$ , a complex signal, which is an approximation of the analytic signal, can be formulated as,

$$y(t) = s(t) - j \cdot \frac{1}{\omega_o} \mathcal{D}(s(t)). \tag{2.6}$$

To better understand what these complex signals represent for UWB pulses, we apply these transformations to both baseband and passband wideband signals, since the FCC allows the use of both the 0-960 MHz and 3.1-10.6 GHz frequency bands. Figure 2.5(b)–(e) shows the real and imaginary parts of the transformed complex signal, together with its magnitude and phase, using a 21-tap FIR filter. Because of its non-flat passband magnitude response, the differentiator approach causes more pulse distortion. To preserve the shape and energy of a wideband signal, the discrete Hilbert transformer is preferred.
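A discrete-time version of Eq. (2.5) can be sketched with the DFT: keep the DC and Nyquist bins, double the positive-frequency bins and zero the negative-frequency ones, which is the same weighting `scipy.signal.hilbert` applies. The sketch below is a minimal O(N²) illustration with assumed normalized pulse parameters; it applies the transform to a modulated Gaussian pulse and checks that the real part reproduces the pulse and that the magnitude recovers its envelope.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via the DFT (len(x) assumed even): keep DC and Nyquist,
    double positive-frequency bins, zero negative-frequency bins."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return [sum(w * v * cmath.exp(2j * math.pi * k * t / n)
                for k, (w, v) in enumerate(zip(weights, spectrum))) / n
            for t in range(n)]

# Modulated Gaussian pulse in normalized units (assumed parameters).
N, CENTER, SIGMA, F0 = 64, 32, 4.0, 0.25
envelope = [math.exp(-((t - CENTER) ** 2) / (2 * SIGMA ** 2)) for t in range(N)]
pulse = [e * math.cos(2 * math.pi * F0 * (t - CENTER)) for t, e in enumerate(envelope)]

y = analytic_signal(pulse)
print("max envelope error:", max(abs(abs(v) - e) for v, e in zip(y, envelope)))
```

Because this passband pulse has negligible spectral content near DC, the magnitude of its analytic signal follows the Gaussian envelope closely and does not depend on where the carrier cycles fall relative to the samples.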

#### 2.4.2 Proposed digital baseband using complex signaling

Figure 2.6 shows the datapath of the proposed digital baseband. The main components are an analytic signal transformer, a pulse shape estimator, correlators, an analytic matched filter, and a detection block. For optimal detection, a sufficient statistic is obtained by projecting onto the signal dimension through pulse matched filtering, whose filter response must be provided by an estimation block. The subsequent correlators provide additional processing gain if necessary, or despread any modulating code. Finally, the detection block uses the analytic matched filter outputs to decode the data or extract timing information for ranging. Several critical processing blocks are discussed in this section.

Figure 2.5: (a) Magnitude response over $[0,\pi]$ of the FIR Hilbert transformer and the backward, central, and FIR differentiators; (b)–(e) 21-tap FIR Hilbert and differentiator operators applied to Gaussian and modulated Gaussian pulses: (b) Hilbert on Gaussian pulse, (c) differentiator on Gaussian pulse, (d) Hilbert on modulated Gaussian pulse, (e) differentiator on modulated Gaussian pulse.

Figure 2.6: Datapath of the proposed digital baseband.

#### Analytic signal transformer

The analytic signal transformation can be implemented using an FIR Hilbert transformer or an FFT processor. With the FFT approach, one can potentially eliminate in-band tones before the detection block, at the cost of more implementation complexity. In the system simulations, using a 21-tap FIR Hilbert transformer degrades SNR by less than 1 dB compared to an ideal analytic signal transformation.

#### Pulse shape estimator

For optimal detection, the pulse matched filter response is required as a priori information. However, due to the varying wireless channel response and antenna pattern, the received pulse shape must be estimated by the receiver before decoding data. In the proposed baseband processing, a maximum likelihood (ML) estimator is used for shape estimation without any prior information. It is assumed that a training sequence spread with a pseudo-random (PN) code is transmitted within the coherence time of the channel, and that the modulated pulses are spaced far enough apart to avoid inter-symbol interference during the training phase. The estimator can be derived as follows.

The $i^{th}$ symbol of the incoming signal to the ML estimator is expressed as a $1 \times n$ vector, $\vec{\mathbf{y}}_i$,

$$\vec{\mathbf{y}}_i = [y_i^{(1)}\, y_i^{(2)} \dots y_i^{(n)}] = c_i \cdot \vec{\mathbf{S}} + \vec{\mathbf{I}}_i + \vec{\mathbf{N}}_i, \tag{2.7}$$

where $\vec{\mathbf{S}} = [s^{(1)}\, s^{(2)} \dots s^{(n)}]$ is the received pulse shape (assumed constant within the channel coherence time), $\vec{\mathbf{I}}_i$ is the sum of the interferers, $\vec{\mathbf{N}}_i$ is the sum of all the additive Gaussian noise sources, and $c_i$ is the $i^{th}$ chip of the PN code.

Therefore, the received samples over the entire PN code are

$$\mathbf{Y} = \begin{bmatrix} \vec{\mathbf{y}}_1 \\ \vec{\mathbf{y}}_2 \\ \vdots \\ \vec{\mathbf{y}}_m \end{bmatrix},$$

and the ML estimate of $\vec{\mathbf{S}}$ is

$$\begin{aligned} \vec{\mathbf{S}}_{ML} &= \operatorname{argmax}_{\widehat{\mathbf{S}}} P\left(\mathbf{Y} \mid \vec{\mathbf{S}} = \widehat{\mathbf{S}}\right) \\ \Rightarrow \hat{s}_{ML}^{(j)} &= \operatorname{argmax}_{\hat{s}^{(j)}} P\left(\begin{bmatrix} y_1^{(j)} \\ y_2^{(j)} \\ \vdots \\ y_m^{(j)} \end{bmatrix} \,\middle|\, s^{(j)} = \hat{s}^{(j)}\right), \quad \forall j \in [1 \dots n]. \end{aligned} \tag{2.8}$$

Since the sum of interferers is despread by the PN code, the aggregate noise samples are modelled as white Gaussian noise. We also consider an LO frequency offset, $\Delta f$, between Tx and Rx, which causes the sampling error term $-iks'^{(j)}$ shown in Eq. (2.9), where $k \propto \Delta f$ is assumed small enough that a first-order approximation of the sampling error is sufficient.

$$c_i \cdot I_i^{(j)} + N_i^{(j)} \sim \mathcal{N}(0, \sigma_I^2 + \sigma_N^2), \quad \forall i, j,$$

$$\Rightarrow P\left(\begin{bmatrix} y_1^{(j)} \\ y_2^{(j)} \\ \vdots \\ y_m^{(j)} \end{bmatrix} \,\middle|\, s^{(j)}\right) = \prod_{i=1}^{m} \frac{1}{\sqrt{2\pi(\sigma_I^2 + \sigma_N^2)}} \exp\left[-\frac{\left(c_i y_i^{(j)} - s^{(j)} - iks'^{(j)}\right)^2}{2(\sigma_I^2 + \sigma_N^2)}\right] \tag{2.9}$$

Applying Eq. (2.9) to Eq. (2.8), the estimator can be derived as,

$$\frac{\partial}{\partial s^{(j)}} \left( \sum_{i=1}^{m} \left(c_i y_i^{(j)} - s^{(j)} - iks'^{(j)}\right)^2 \right) = 0$$

$$\Rightarrow m \cdot \widehat{s}_{ML}^{(j)} + \frac{m(m+1)}{2} k \, \widehat{s'}_{ML}^{(j)} = \sum_{i=1}^{m} c_i y_i^{(j)}$$

$$\Rightarrow \widehat{s}_{ML}^{(j)} = \frac{1}{m} \sum_{i=1}^{m} c_i y_i^{(j)} + d \cdot e^{\frac{-2}{(m+1)k}t_j}$$

$$\text{if } (m+1)k \to 0: \quad \widehat{\mathbf{S}}_{ML} = \frac{1}{m} \sum_{i=1}^{m} c_i \vec{\mathbf{y}}_i \tag{2.10}$$

Here $d$ is an integration constant and $t_j$ is the time of the $j$-th sample. This shows that the sample drift over the entire training sequence due to frequency mismatch, i.e. $(m + 1)k$, must be small to keep the estimation error term small. The above analysis assumes that the estimator precedes the analytic signal transformation. Eq. (2.10) still holds if the estimator is placed after the complex signal formation, since the noise is then modelled as a complex Gaussian variable and the rest of the derivation stays the same.
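In the negligible-drift limit, the estimator of Eq. (2.10) is simply a chip-weighted average of the received training symbols. The sketch below is a minimal simulation under assumed conditions: a hypothetical 16-sample pulse, random ±1 chips standing in for the PN code, unit-variance white Gaussian noise lumping interference and noise together, and no LO drift ($k = 0$). It shows the mean-square estimation error falling roughly as $1/m$.

```python
import math
import random

def ml_pulse_estimate(rows, code):
    """Eq. (2.10) with negligible drift: S_ML = (1/m) * sum_i c_i * y_i."""
    m = len(code)
    return [sum(c * row[j] for c, row in zip(code, rows)) / m for j in range(len(rows[0]))]

def estimation_mse(m, noise_std=1.0, n=16, seed=1):
    """MSE of the estimate for a hypothetical pulse shape and a random +/-1 chip sequence."""
    rng = random.Random(seed)
    true_shape = [math.exp(-((j - n / 2) / 3.0) ** 2) * math.cos(math.pi * j / 2)
                  for j in range(n)]
    code = [rng.choice((-1, 1)) for _ in range(m)]  # random chips stand in for a PN code
    # y_i = c_i * S + residual, the residual (interference + noise) modelled as white Gaussian
    rows = [[c * s + rng.gauss(0.0, noise_std) for s in true_shape] for c in code]
    estimate = ml_pulse_estimate(rows, code)
    return sum((e - s) ** 2 for e, s in zip(estimate, true_shape)) / n

for m in (7, 31, 127, 255):
    print(f"m = {m:3d}: MSE = {estimation_mse(m):.4f} (about {1.0 / m:.4f} expected)")
```

Adding a nonzero drift term $-iks'^{(j)}$ to the simulated rows would show the bias discussed above growing with $(m+1)k$; that extension is left out to keep the sketch short.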

#### Analytic matched filter

An analytic matched filter is used to achieve optimal performance, assuming the incoming noise is additive and white. This assumption is valid if circuit and ambient thermal noise dominate. For the interference-dominated case, a whitening filter, which can be incorporated into the analytic signal transformer as mentioned earlier, is required for optimal performance. To gain more insight into the analytic matched filter, we derive expressions for its outputs in continuous time for both noiseless and noisy cases. Assuming the UWB signal, $s(t)$, is transformed into the analytic signal representation, $\tilde{s}(t) = s_r(t) + j \cdot s_i(t)$, the matched filter response is implemented as $\tilde{h}(t) = \tilde{s}^*(T-t) = s_r^*(T-t) - j \cdot s_i^*(T-t)$. We define the impulse response of the filter as $\tilde{h}(t) = h_r(t) - j \cdot h_i(t)$, and thus $h_r(t) = s_r^*(T-t)$ and $h_i(t) = s_i^*(T-t)$. As a result, their Fourier transforms satisfy $H_r(\omega) = S_r^*(\omega)$ and $H_i(\omega) = S_i^*(\omega)$, up to the linear-phase factor $e^{-j\omega T}$ of the delay $T$, as shown in Fig. 2.7. Note that $s_r(t), s_i(t), h_r(t), h_i(t)$ are real-valued functions.

(a) Signal frequency response, $S_r$ and $S_i$

(b) Matched filter frequency response, $H_r$ and $H_i$

Figure 2.7: (a)–(b) Analytic signal pair of signal and matched filter response; (c) graphic view of matched filtering ($\langle Y_r, H_r \rangle$ and $\langle Y_r, H_i \rangle$).

Before proceeding with the analysis, we define the following operator,

$$\langle x, y \rangle = \int_{a}^{b} x(t) \cdot y(T-t)\,dt, \quad \text{where } T = b - a. \tag{2.11}$$

If the inputs of the analytic matched filter are  $y_r(t) + j \cdot y_i(t)$ , its outputs are expressed as,

$$\begin{aligned} m_r + j \cdot m_i &= \langle y_r + j \cdot y_i, h_r - j \cdot h_i \rangle \\ &= \langle y_r, h_r \rangle + \langle y_i, h_i \rangle + j \cdot (-\langle y_r, h_i \rangle + \langle y_i, h_r \rangle). \end{aligned} \tag{2.12}$$

The UWB pulse is assumed to lie between times $a$ and $b$, without inter-symbol interference.

To understand the properties of the analytic matched filter, three cases are considered for Eq. (2.12). We start from an ideal communication channel and gradually add more non-idealities.

Case I. Noiseless input and perfectly matched timing

This is the ideal case, in which the incoming signal perfectly matches the matched filter response and there is no noise, i.e. $y_r = s_r$, $y_i = s_i$. The sampling timing offset between the input signal and the impulse response is assumed to be zero. The output is expressed as,

$$\begin{aligned} m_r + j \cdot m_i &= \langle s_r + j \cdot s_i, h_r - j \cdot h_i \rangle \\ &= \langle s_r, h_r \rangle + \langle s_i, h_i \rangle + j \cdot (-\langle s_r, h_i \rangle + \langle s_i, h_r \rangle) \\ &= E_r + E_i = 2 \cdot E_s \end{aligned} \tag{2.13}$$

where $E_r$ and $E_i$ are the energies of the real and imaginary parts of the signal, defined as $\int_a^b s_r^2(t)\,dt$ and $\int_a^b s_i^2(t)\,dt$, respectively. We assume an ideal analytic transformer, and thus $E_r = E_i = E_s$. Note that $s_r(t)$ and $s_i(t)$ form a Hilbert pair and are orthogonal, so $\langle s_r, h_i \rangle = \int_a^b s_r(t) s_i(t)\,dt = 0$, and likewise $\langle s_i, h_r \rangle = 0$. This shows that all the energy concentrates in the real part of the analytic matched filter output, and thus the complex signal transformation is redundant in the ideal case.

Case II. Noiseless input with timing offset

In this case, we add a timing offset of $T_o$ between the input signal and the matched filter response while the input is still noiseless. Recall from Eq. (2.2) that the timing offset introduces a phase shift term of $e^{\pm jk\frac{2\pi}{T_s}T_o}$. The frequency response of the real part of the received signal is shown in Fig. 2.7(c); $Y_{r\pm}(\omega)$ is rotated by $\pm k\frac{2\pi}{T_s}T_o$ from $H_{r\pm}(\omega)$. Since matched filtering is equivalent to a projection in the frequency domain, the projected area represents the absolute value of the matched filter output. In Fig. 2.7(c), the values of $\langle y_r, h_r \rangle$ and $\langle y_r, h_i \rangle$ are shown as the darker and lighter shaded areas, respectively. These graphs show that the timing offset redistributes the signal energy between the two terms. Therefore, a complex signal transformation is necessary in this case; otherwise, signal energy will be lost under any timing uncertainty. The matched filter output is re-derived as,

$$\begin{aligned} m_r + j \cdot m_i &= \langle y_r, h_r \rangle + \langle y_i, h_i \rangle + j \cdot (-\langle y_r, h_i \rangle + \langle y_i, h_r \rangle) \\ &= 2E_s \cdot \cos\theta + j \cdot (-2E_s \cdot \sin\theta), \quad \text{where } \theta = k\frac{2\pi}{T_s}T_o. \end{aligned} \tag{2.14}$$

This equation implies that taking the magnitude of the analytic matched filter output conserves the entire signal energy.

$$\sqrt{m_r^2 + m_i^2} = \sqrt{4E_s^2(\cos^2\theta + \sin^2\theta)} = 2E_s \tag{2.15}$$
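Cases I and II can be checked numerically in discrete time. The sketch below, under assumed normalized parameters (a modulated Gaussian pulse at 0.2 cycles/sample, offsets measured in samples), forms the analytic template with a DFT-based transform, applies a delayed pulse, and evaluates the discrete counterpart of Eq. (2.12), $\sum_n \tilde{y}[n]\,\tilde{s}^*[n]$. Its real part corresponds to $\langle y_r, h_r\rangle + \langle y_i, h_i\rangle$ and its imaginary part to $-\langle y_r, h_i\rangle + \langle y_i, h_r\rangle$. Names and parameters are illustrative assumptions.

```python
import cmath
import math

N, CENTER, SIGMA, F0 = 128, 64, 5.0, 0.2  # assumed normalized pulse parameters

def pulse(t):
    return math.exp(-t * t / (2 * SIGMA ** 2)) * math.cos(2 * math.pi * F0 * t)

def analytic_signal(x):
    """DFT-based analytic signal (N even), same bin weighting as scipy.signal.hilbert."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]
    w = [1.0] + [2.0] * (n // 2 - 1) + [1.0] + [0.0] * (n // 2 - 1)
    return [sum(w[k] * spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

TEMPLATE = analytic_signal([pulse(t - CENTER) for t in range(N)])
ENERGY = sum(pulse(t - CENTER) ** 2 for t in range(N))  # E_s of the real pulse

def matched_output(offset):
    """sum_n y~[n] * conj(s~[n]) for the pulse delayed by `offset` samples."""
    received = analytic_signal([pulse(t - CENTER - offset) for t in range(N)])
    return sum(y * s.conjugate() for y, s in zip(received, TEMPLATE))

for offset in (0.0, 0.625, 1.25):
    m = matched_output(offset)
    print(f"offset {offset:5.3f}: real/2E_s = {m.real / (2 * ENERGY):+.3f}, "
          f"|m|/2E_s = {abs(m) / (2 * ENERGY):.3f}")
```

With these parameters an offset of 1.25 samples gives $\theta = 2\pi \cdot 0.2 \cdot 1.25 = 90^\circ$: the real part essentially vanishes, as in Eq. (2.14), while the magnitude stays close to $2E_s$ (reduced only slightly by the envelope misalignment), consistent with Eq. (2.15).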

Case III. Noisy input with timing offset

In this final case, we add a noise term to the inputs to study the filter performance under various signal-to-noise ratios and timing offsets. The noise is modelled as white Gaussian for the following analysis.

$$\begin{aligned} m_r + j \cdot m_i &= \langle y_r + n_r + j \cdot (y_i + n_i), h_r - j \cdot h_i \rangle \\ &= \langle y_r, h_r \rangle + \langle y_i, h_i \rangle + \langle n_r, h_r \rangle + \langle n_i, h_i \rangle \\ &\quad + j \cdot (-\langle y_r, h_i \rangle + \langle y_i, h_r \rangle - \langle n_r, h_i \rangle + \langle n_i, h_r \rangle) \\ &= 2E_s \cdot \cos\theta + n_{mr} - j \cdot (2E_s \cdot \sin\theta + n_{mi}) \end{aligned} \tag{2.16}$$

where the following assumptions are made:

- (a) $n_r$ and $n_i$ are i.i.d.
- (b) $n_{mr} = \langle n_r, h_r \rangle + \langle n_i, h_i \rangle \sim \mathcal{N}(0, \sigma_n^2)$
- (c) $n_{mi} = \langle n_r, h_i \rangle - \langle n_i, h_r \rangle \sim \mathcal{N}(0, \sigma_n^2)$

From the previous case of noiseless input, the magnitude of the analytic matched filter output conserves all the signal energy. However, with noisy input, the magnitude does not necessarily give the best performance. For example, when the timing is perfectly matched, i.e. all the signal energy resides in the real part, the imaginary part contains only noise. Therefore, any operation that includes the imaginary part necessarily degrades the overall performance. In the following analysis, we derive the output SNR of the filter magnitude and compare it against that of the real part, which allows us to derive the optimal detection regions for various input SNRs.

Given white Gaussian input noise, the magnitude of the matched filter output, $R = \sqrt{m_r^2 + m_i^2}$, follows a Ricean distribution [16]. Its first and second moments are,

$$\begin{aligned} E(R) &= \sqrt{2\sigma_n^2} \cdot e^{-\frac{(2E_s)^2}{2\sigma_n^2}} \cdot \frac{\Gamma(1.5)}{\Gamma(1)} \cdot {}_1F_1\left(1.5; 1; \frac{(2E_s)^2}{2\sigma_n^2}\right) \\ E(R^2) &= 2\sigma_n^2 \cdot e^{-\frac{(2E_s)^2}{2\sigma_n^2}} \cdot \frac{\Gamma(2)}{\Gamma(1)} \cdot {}_1F_1\left(2; 1; \frac{(2E_s)^2}{2\sigma_n^2}\right) \end{aligned} \tag{2.17}$$

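where ${}_1F_1(\cdot;\cdot;\cdot)$ is the confluent hypergeometric function. To calculate the SNR of $R$, the Euclidean distance between the mean outputs with the signal present and absent is used to define the signal energy,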

$$SNR_{R}(\alpha) = \frac{\left(\dfrac{E(R)\big|_{2E_s/\sigma_n=\alpha} - E(R)\big|_{2E_s/\sigma_n=0}}{2}\right)^{2}}{Var(R)\big|_{2E_s/\sigma_n=\alpha}}. \tag{2.18}$$

For comparison, the SNR of the real part of the matched filter output is

$$SNR_{m_r}(\alpha, \theta) = \frac{\left(\dfrac{E(m_r)\big|_{2E_s/\sigma_n=\alpha} - E(m_r)\big|_{2E_s/\sigma_n=0}}{2}\right)^{2}}{Var(m_r)\big|_{2E_s/\sigma_n=\alpha}} = \frac{\alpha^2}{4} \cdot \cos^2\theta. \tag{2.19}$$

Figure 2.8(a) shows $SNR_{loss} = SNR_{m_r} - SNR_R$ (in dB) under different input noise levels without timing offset, i.e. $\theta = 0$. The SNR degradation is less significant in the higher SNR regime than in the lower SNR regime. Because of $SNR_{loss}$, which of the real part, the imaginary part and the magnitude of the matched filter output performs best depends on the operating point, and this defines the optimal detection regions shown in Figure 2.8(b).

(a) $SNR_{loss}$ and detection boundary angle ($\Phi$) vs. input SNR.

(b) Optimal detection region.

Figure 2.8: Detection performance for analytic matched filter.

The boundary of these detection regions can be defined through a detection boundary angle, $\Phi$, which is calculated as,

$$SNR_R(\alpha) \le SNR_{m_r}(\alpha, \theta) \Longrightarrow \theta \le \Phi = \arccos\left(\sqrt{\frac{4}{\alpha^2} \cdot SNR_R(\alpha)}\right).$$
(2.20)

Eq. (2.20) defines the region in which the real part of the matched filter output performs best; by symmetry, the imaginary part performs best within $\Phi$ of the imaginary axis. So, if $\Phi > 45$ degrees, these two regions cover every rotation angle, the magnitude of the matched filter output always performs worse than the real or imaginary part, and it should not be used for detection. Note that $\Phi$ decreases as the SNR increases. In the extremely high SNR case, the magnitude of the matched filter output performs best regardless of the timing offset.
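Eqs. (2.17) to (2.20) can be evaluated numerically to reproduce the trend of Figure 2.8(a). The sketch below is a minimal illustration under our own assumptions: the noise is normalized to $\sigma_n = 1$ (so $2E_s = \alpha$), ${}_1F_1$ is computed from its power series by a hypothetical helper `hyp1f1`, and `rician_moments`, `snr_magnitude`, `snr_real` and `boundary_angle_deg` are illustrative names, not code from this work.

```python
import math


def hyp1f1(a, b, z, tol=1e-15, max_terms=10_000):
    """Kummer's confluent hypergeometric function 1F1(a; b; z) from its power series.

    All terms are positive for a, b > 0 and z >= 0, so the plain series is accurate
    for the moderate arguments used here (z up to a few hundred).
    """
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
        if term < tol * total:
            break
    return total


def rician_moments(alpha):
    """E(R) and E(R^2) from Eq. (2.17), noise normalized to sigma_n = 1 (so 2*E_s = alpha)."""
    z = alpha ** 2 / 2.0                       # (2E_s)^2 / (2 sigma_n^2)
    scale = math.exp(-z)
    m1 = math.sqrt(2.0) * scale * math.gamma(1.5) / math.gamma(1.0) * hyp1f1(1.5, 1.0, z)
    m2 = 2.0 * scale * math.gamma(2.0) / math.gamma(1.0) * hyp1f1(2.0, 1.0, z)
    return m1, m2


def snr_magnitude(alpha):
    """Eq. (2.18): squared half-distance between the means over Var(R) with the signal present."""
    m1_sig, m2_sig = rician_moments(alpha)
    m1_noise, _ = rician_moments(0.0)
    var_sig = m2_sig - m1_sig ** 2
    return ((m1_sig - m1_noise) / 2.0) ** 2 / var_sig


def snr_real(alpha, theta=0.0):
    """Eq. (2.19) for a rotation angle theta in radians."""
    return alpha ** 2 / 4.0 * math.cos(theta) ** 2


def boundary_angle_deg(alpha):
    """Eq. (2.20): detection boundary angle Phi in degrees."""
    return math.degrees(math.acos(math.sqrt(4.0 / alpha ** 2 * snr_magnitude(alpha))))


for alpha in (1.0, 3.0, 10.0):
    loss_db = 10.0 * math.log10(snr_real(alpha) / snr_magnitude(alpha))
    print(f"alpha = {alpha:4.1f}: SNR_loss = {loss_db:5.2f} dB, Phi = {boundary_angle_deg(alpha):5.1f} deg")
```

Two sanity checks are built into the model: with no signal, $E(R)$ reduces to the Rayleigh mean $\sqrt{\pi/2}$, and $E(R^2)$ always equals $2\sigma_n^2 + (2E_s)^2$.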

Next, we run system-level Monte Carlo simulations to verify the derived equations. Fig. 2.9(a) plots the analytic matched filter output on the Euler plane for 10,000 trials with and without the signal present; the panels differ only in sampling offset. The results show that an offset of only 5% of the sampling period can rotate the complex signal by about 30 degrees, as predicted by Eq. (2.16). The rate of rotation is proportional to both $T_o/T_s$ and the undersampling ratio, $k$, which explains why a subsampling front-end has an even higher timing sensitivity. If a real-valued matched filter is used instead of an analytic one, its output is essentially the projection onto the real axis, which creates SNR nulls as the output rotates.
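As a rough cross-check of the 30-degree figure, one can assume (our simplification, not Eq. (2.16) itself) that the rotation is dominated by the carrier phase $2\pi f_c T_o$, i.e., $360^\circ \cdot (f_c/f_s) \cdot (T_o/T_s)$, using the 3.5 GHz center frequency and 2 Gsa/s sampling rate of Section 2.5. Under this reading, $f_c/f_s = 1.75$ plays the role of the undersampling ratio. `rotation_deg` is a hypothetical helper.

```python
import math

F_C = 3.5e9          # pulse center frequency (Hz), from Section 2.5
F_S = 2.0e9          # sampling rate (Sa/s), from Section 2.5
T_S = 1.0 / F_S      # sampling period (s)


def rotation_deg(offset_fraction, f_c=F_C, t_s=T_S):
    """Approximate rotation of the analytic MF output for a sampling offset T_o = offset_fraction * T_s.

    Assumes the rotation is dominated by the carrier phase 2*pi*f_c*T_o,
    i.e. 360 deg * (f_c / f_s) * (T_o / T_s).
    """
    t_o = offset_fraction * t_s
    return math.degrees(2.0 * math.pi * f_c * t_o)


for pct in (0, 1, 5, 10, 15):
    print(f"T_o = {pct:2d}% of T_s -> about {rotation_deg(pct / 100):5.1f} deg")
```

This approximation gives 31.5 degrees at a 5% offset and 6.3 degrees at 1%, in line with both the rotation seen in Fig. 2.9(a) and the clock-precision budget of Section 2.5.4.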

Figures 2.9(b) and 2.9(c) show the SNR of the real part, imaginary part, and magnitude of the matched filter output for timing offsets from 0 to $T_s$ at high and low input SNR. The simulation results show that three detection regions are necessary for optimal performance in the higher-SNR case, while two are adequate at lower SNR.

#### 2.4.3 Timing extraction from the proposed baseband

A sampling offset caused by circuit impairments has the same effect as a change in signal arrival delay caused by movement between the transmitter and receiver or, in a non-line-of-sight (NLOS) link, by environmental changes. Therefore, the timing information can potentially be extracted for ranging or locationing. The complex output of the analytic matched filter essentially provides a 2-D correlation profile of a UWB pulse, and as the delay increases, the analytic matched filter output moves along a certain trajectory on the Euler plane. The trajectory is easier to understand in the frequency domain, as shown in Eq. (2.21). The first term is the signal band shifted from RF to baseband, the second term is the time shift of the signal, and the last term is the rotation of the shifted band due to sub-sampling.

$$s(t - \Delta t) \cdot \sum_{n} \delta(t - n T_s) \;\stackrel{\text{F.T.}}{\longleftrightarrow}\; \frac{1}{T_s}\sum_{k} \underbrace{S\left(\omega - k\frac{2\pi}{T_s}\right)}_{\text{Mixed down signal band}} \cdot \underbrace{e^{-j\omega\Delta t}}_{\text{Time shift}} \cdot \underbrace{e^{+jk\frac{2\pi}{T_s}\Delta t}}_{\text{Shift band rotation}}.$$
(2.21)

Figure 2.9: (a) Plots of analytic matched filter outputs corresponding to $\{0,5,10,15\}\%$ of $T_s$ timing offset; (b)–(c) SNR of the real part, imaginary part, and magnitude of the matched filter output with 0 to 1 $T_s$ timing offset at high and low SNR.

Figure 2.10: Trajectory of analytic matched filter output as delay varies.

As the delay increases from $t_{ref}$ to $t_{ref} + \Delta t$, the movement along the trajectory can be decomposed into two components. First, the matched filter output follows the baseband signal correlation profile, shown as a heart shape in Fig. 2.10. Second, there is an additional circular rotation with an angle proportional to the timing offset. The trajectory is therefore strongly related to the shape of the UWB pulse, the undersampling ratio, and the sampling rate. Since each UWB pulse shape has a unique trajectory, environmental changes can be tracked. However, precisely locating the position on the trajectory requires accurate gain control and a high SNR, which come at a higher implementation cost.

# 2.5 Implementation Specifications and Issues

This section provides a first-order link budget analysis, including circuit implementation loss, for the entire receiver chain up to the ADC. Each individual non-ideality is treated as an independent, additive noise source. The circuit specification of each block should be chosen to minimize the implementation loss, defined as the gap between the received SNR and the output SNR. The received and output SNR can be expressed as:

$$SNR_{received} = \frac{P_{signal}}{P_{ambient}} \tag{2.22}$$

$$SNR_{out} = \frac{P_{signal}}{P_{ambient} + P_{ckt} + P_{jitter} + P_{SH} + P_{adc}}$$
(2.23)

where

 $P_{signal}$ : received signal power;

 $P_{ambient}$ : received ambient thermal noise plus interference power within communication band;

 $P_{ckt}$ : input-referred thermal noise power caused by amplifiers and filters;

 $P_{jitter}$ : input-referred clock jitter induced sampling noise;

 $P_{SH}$ : input-referred sample and hold (subsampling mixer) noise;

 $P_{adc}$ : input-referred quantization noise power of ADC.

The following analysis assumes a UWB pulse with a 1 GHz bandwidth centered at 3.5 GHz, sampled at 2 Gsa/s. This analytical approach can easily be applied to a different communication band and sampling frequency, as long as the sampled signal band does not alias onto itself. Later, these circuit impairments are included in system-level simulations that use measured noise and interference samples.
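Eqs. (2.22) and (2.23) translate directly into a small link-budget calculator. This is a minimal sketch: all powers are input-referred in dBm, each impairment is an independent additive noise source as stated above, and the budget entries are hypothetical placeholders rather than values derived in this chapter.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def implementation_loss_db(p_signal_dbm, p_ambient_dbm, impairments_dbm):
    """SNR_received (Eq. 2.22) minus SNR_out (Eq. 2.23), with all powers input-referred in dBm.

    Each impairment is an independent additive noise source, so the powers add in linear units.
    """
    snr_received_db = p_signal_dbm - p_ambient_dbm
    total_noise_mw = dbm_to_mw(p_ambient_dbm) + sum(dbm_to_mw(p) for p in impairments_dbm.values())
    snr_out_db = p_signal_dbm - 10.0 * math.log10(total_noise_mw)
    return snr_received_db - snr_out_db


# Hypothetical budget: each impairment sits 10 dB below the -84 dBm thermal floor.
budget = {"P_ckt": -94.0, "P_jitter": -94.0, "P_SH": -94.0, "P_adc": -94.0}
print(f"implementation loss: {implementation_loss_db(-71.0, -84.0, budget):.2f} dB")
```

With four impairments each 10 dB below the ambient noise, the loss is $10\log_{10}(1.4) \approx 1.5$ dB; raising any single entry shows quickly which block dominates the budget.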

## 2.5.1 Received SNR

According to the FCC regulations [1], the transmitted power spectral density must be below -41 dBm/MHz. Given a 1 GHz signal bandwidth, the maximum transmitted power is therefore -11 dBm. The received power, however, is attenuated by the wireless channel. According to our S21 measurements using spiral and elliptical wideband antennas [21] as well as the literature [22][23], the path loss can be 40 to 60 dB for transmitter-receiver separations of 1 to 10 meters. Therefore, the received signal power in the following analysis is expected to lie within the -51 to -71 dBm range.

Figure 2.11: (a) Measured noise and interference using TEM horn antenna; (b) Spectrum after 8th order Butterworth bandpass filter.

The ambient noise level depends strongly on the operating environment. Figure 2.11(a) shows a noise spectrum measured with a TEM horn antenna and a spectrum analyzer (HP 8563E). The data were recorded over a typical day, retaining the maximum values, which represent the worst-case scenario. Most of the interference comes from below 1 GHz, the 1.9 GHz PCS band, and the 2.4 GHz ISM band. We also measured interference from WiFi systems in the 5 GHz U-NII band. From the measurement results, the received interference power can vary from -50 to -30 dBm. The received thermal noise power for a power-matched front-end is -174 dBm/Hz [15], so for a 1 GHz bandwidth and ideal bandpass filtering, the total thermal noise power is -84 dBm, which sets the minimum bound on the noise level.
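The power levels quoted in Sections 2.5.1 so far follow from simple dB arithmetic, sketched below with the numbers from the text. The printed SNR ignores interference, so it is only an upper bound on the received SNR.

```python
import math

PSD_LIMIT_DBM_PER_MHZ = -41.0    # FCC emission limit quoted in the text
BANDWIDTH_HZ = 1e9               # 1 GHz signal bandwidth
THERMAL_DBM_PER_HZ = -174.0      # thermal noise density for a power-matched front-end

tx_power_dbm = PSD_LIMIT_DBM_PER_MHZ + 10.0 * math.log10(BANDWIDTH_HZ / 1e6)
thermal_dbm = THERMAL_DBM_PER_HZ + 10.0 * math.log10(BANDWIDTH_HZ)
rx_power_dbm = {path_loss: tx_power_dbm - path_loss for path_loss in (40.0, 60.0)}

print(f"max transmit power: {tx_power_dbm:.0f} dBm, thermal floor: {thermal_dbm:.0f} dBm")
for path_loss, p_rx in rx_power_dbm.items():
    # Upper bound on the received SNR: thermal noise only, no interference.
    print(f"path loss {path_loss:.0f} dB -> {p_rx:.0f} dBm, SNR vs thermal <= {p_rx - thermal_dbm:.0f} dB")
```

Against thermal noise alone, the received SNR therefore spans at most 13 dB (10 m) to 33 dB (1 m).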

## 2.5.2 Bandpass Filter Response

The aggregate receiver response, including the antenna, matching network, amplifiers, and filters, requires a bandpass shape for image rejection and, if multi-band operation is desired, for channel selection. As mentioned earlier, the wideband (low-Q) nature of UWB considerably relaxes the bandpass filtering requirement compared to narrowband systems. From a thermal noise perspective, the undersampling ratio (less than 10) determines the stop-band attenuation required to keep the aliased out-of-band thermal noise well below the in-band noise level. In practice, however, the stop-band requirement is set by the out-of-band interferers in Fig. 2.11(a), since any unfiltered out-of-band interference aliases back into the signal band and corrupts the SNR. The proposed bandpass response attenuates all out-of-band interference to at least 10 dB below the in-band thermal noise level. As shown in Fig. 2.11(b), an 8th-order Butterworth bandpass filter from 3 to 4 GHz meets this requirement. Note that the order of the bandpass response may be relaxed if an additional notch filter is used to block a strong interference band.
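The stop-band behavior of an ideal Butterworth bandpass can be estimated with the standard lowpass-to-bandpass frequency mapping. This is a sketch under stated assumptions: the filter is ideal with 3 dB edges at 3 and 4 GHz, and because the text does not say whether "8th order" refers to the bandpass order or to the lowpass prototype order, both readings are printed. The 1.9, 2.4 and 5.2 GHz points are representative frequencies we picked for the PCS, ISM and 5 GHz bands named above; the helper names are hypothetical.

```python
import math

F_LO, F_HI = 3.0e9, 4.0e9          # passband edges from the text (Hz)
F_0 = math.sqrt(F_LO * F_HI)       # geometric center of the bandpass
BW = F_HI - F_LO                   # 3 dB bandwidth


def butterworth_bp_atten_db(f_hz, proto_order):
    """Attenuation of an ideal Butterworth bandpass built from an order-`proto_order` lowpass prototype.

    The lowpass-to-bandpass transform doubles the order, so the bandpass has order 2 * proto_order.
    """
    x = (f_hz ** 2 - F_0 ** 2) / (f_hz * BW)   # normalized lowpass-prototype frequency
    return 10.0 * math.log10(1.0 + x ** (2 * proto_order))


def required_atten_db(p_interferer_dbm, thermal_dbm=-84.0, margin_db=10.0):
    """Stop-band attenuation that puts an interferer `margin_db` below the in-band thermal noise."""
    return p_interferer_dbm - (thermal_dbm - margin_db)


for f in (1.9e9, 2.4e9, 5.2e9):
    a4 = butterworth_bp_atten_db(f, 4)   # reading "8th order" as the bandpass order
    a8 = butterworth_bp_atten_db(f, 8)   # reading "8th order" as the prototype order
    print(f"{f / 1e9:.1f} GHz: {a4:5.1f} dB (prototype order 4), {a8:5.1f} dB (prototype order 8)")
```

Comparing `butterworth_bp_atten_db` at a given interferer frequency against `required_atten_db` for that interferer's received power, read from Fig. 2.11(a), reproduces the design check described above.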

### 2.5.3 Gain

Sufficient gain is required to amplify the received signal to the full swing of the ADC in order to fully utilize its dynamic range. On the other hand, the gain must be limited to avoid saturating the front-end. Saturation causes large distortion and increases the noise power. It not only makes good matched filtering in the digital domain difficult, but also removes the ability to reject in-band interference with any digital signal processing technique. As described in Section 2.5.1, the received power is within -51 dBm to -71 dBm. If the passband UWB pulse in Fig. 2.4 is used, the peak signal level varies from hundreds of $\mu$V to a couple of mV, assuming no further duty cycling. Under the FCC regulations, one may increase the pulse energy by up to 20 dB by lowering the pulse repetition rate; if a system adopts such duty cycling, the receiver gain should be reduced accordingly.

On the noise side, considering 1 GHz thermal noise (-84 dBm) plus a 20 dB margin for the receiver front-end noise figure, aliased noise power, and matching-network insertion loss, the noise standard deviation is about 200 $\mu$V on a 50 ohm input impedance. Since the signal and noise are independent, the total received variance is the sum of the two variances. For the pulse shape we measured, the total received standard deviation is around 1 mV. Using the three-sigma rule, the input-referred single-ended swing is about 3 mV for a 0.3% probability of saturation. As the supply voltage of CMOS processes scales, the full-scale input swing of the ADC shrinks to the order of a few hundred mV, especially for high-speed operation. Relative to the 3 mV input-referred swing, this implies the gain should not exceed about 40 dB. Note that this analysis assumes no AGC loop, in order to keep the receiver complexity low.
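The gain argument above can be reproduced numerically. This is a minimal sketch: the 1 mV total standard deviation is the measured value quoted in the text, while the 300 mV ADC swing is a hypothetical point within the "few hundred mV" range; `dbm_to_vrms` is an illustrative helper.

```python
import math

R_IN = 50.0   # input impedance (ohms)


def dbm_to_vrms(p_dbm, r_ohm=R_IN):
    """RMS voltage of a signal delivering p_dbm into r_ohm."""
    return math.sqrt(10.0 ** (p_dbm / 10.0) * 1e-3 * r_ohm)


noise_std = dbm_to_vrms(-84.0 + 20.0)          # 1 GHz thermal noise plus the 20 dB margin
total_std = 1e-3                               # ~1 mV total std quoted for the measured pulse
clip_level = 3.0 * total_std                   # three-sigma rule -> ~3 mV input-referred swing
p_clip = math.erfc(3.0 / math.sqrt(2.0))       # two-sided Gaussian probability beyond 3 sigma

ADC_SWING = 0.3                                # hypothetical 300 mV single-ended ADC swing
max_gain_db = 20.0 * math.log10(ADC_SWING / clip_level)

print(f"noise std ~ {noise_std * 1e6:.0f} uV, P(|x| > 3 sigma) = {p_clip:.4f}, max gain ~ {max_gain_db:.0f} dB")
```

The computed noise standard deviation (about 140 $\mu$V) is of the same order as the roughly 200 $\mu$V used above, and the Gaussian tail beyond three sigma (0.27%) matches the quoted 0.3% saturation probability.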

## 2.5.4 Sampling Clock

The only oscillator required in the subsampling front-end is the sampling clock, whose two most important specifications are precision and jitter. As illustrated in Fig. 2.9(a), a sampling offset rotates the analytic matched filter output, which in turn limits the number of pulses that can be used for pulse shape estimation. The proposed clock precision specification constrains the sampling offset to at most 1% of the sampling period during the channel estimation phase, which corresponds to about a 6-degree rotation. For example, a 10 ppm, 100 MHz oscillator can tolerate about 50 pulses for channel estimation, calculated from the following equation:

$$\frac{1}{f_{osc}} \cdot \frac{P_{off}}{10^6} \cdot \frac{\# \text{ cycles}}{\text{pulse}} \cdot \# \text{ pulses} \le T_s \cdot 1\%$$
(2.24)

where

$f_{osc}$: oscillator frequency;

$P_{off}$: clock frequency offset in parts per million (ppm);

\# cycles/pulse: number of oscillator cycles within a pulse repetition period;

\# pulses: number of pulses required for channel estimation.
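Solving Eq. (2.24) for the number of pulses gives the example above. In this sketch, exact rational arithmetic avoids a floating-point floor landing just below the boundary. The "about 50 pulses" figure corresponds to one oscillator cycle per pulse repetition period (i.e., a 100 MHz pulse rate); that value is our reading, since the text does not state it. `max_estimation_pulses` is a hypothetical helper.

```python
from fractions import Fraction


def max_estimation_pulses(f_osc_hz, ppm, cycles_per_pulse, t_s, max_offset=Fraction(1, 100)):
    """Largest '# pulses' satisfying Eq. (2.24), using exact rational arithmetic."""
    drift_per_pulse = Fraction(1, f_osc_hz) * Fraction(ppm, 10**6) * cycles_per_pulse
    budget = t_s * max_offset            # allowed accumulated sampling offset, 1% of T_s
    return int(budget // drift_per_pulse)


T_S = Fraction(1, 2 * 10**9)             # 2 Gsa/s sampling period
# 10 ppm, 100 MHz oscillator; one oscillator cycle per pulse period is our assumption.
print(max_estimation_pulses(100 * 10**6, 10, 1, T_S))
```

Each oscillator cycle drifts by 0.1 ps and the budget is 5 ps (1% of 0.5 ns), hence 50 cycles; doubling the cycles per pulse halves the usable pulse count.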

Another critical clock specification is jitter, especially for a subsampling receiver. A traditional worst-case jitter analysis assumes the clock samples at the sharpest edge and constrains the jitter so that it contributes negligible noise compared to one LSB of the ADC. However, the energy of a UWB signal is distributed over a wide frequency band, so a worst-case analysis is too pessimistic. A noise model that accounts for the input signal spectrum is more appropriate [16]:

$$P_{jitter} = \int_{-\infty}^{\infty} |S(j\omega)|^2 \cdot (1 - e^{\frac{-\omega^2 \sigma_j^2}{2}}) d\omega$$
(2.25)

where $P_{jitter}$ is the equivalent noise power due to clock jitter, $S(j\omega)$ is the signal spectrum and $\sigma_j$ is the RMS jitter of the clock source.

Once the UWB pulse is known, one may calculate the jitter-induced noise power for the link budget analysis. System simulations in Section 2.6 give more insight into the impact of clock jitter.
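Eq. (2.25) can be evaluated numerically once a spectrum is chosen. The sketch below assumes, for illustration only, a flat $|S(j\omega)|^2$ across the 3 to 4 GHz band (the actual pulse spectrum is not given here) and returns $P_{jitter}/P_{signal}$ using the weighting exactly as written in Eq. (2.25). `jitter_noise_ratio` is a hypothetical helper.

```python
import math


def jitter_noise_ratio(sigma_j, f_lo=3.0e9, f_hi=4.0e9, n_steps=10_000):
    """P_jitter / P_signal from Eq. (2.25) for a flat |S(jw)|^2 between f_lo and f_hi.

    The spectrum is symmetric in frequency, so averaging the weighting
    1 - exp(-w^2 sigma_j^2 / 2) over the positive band is sufficient.
    """
    df = (f_hi - f_lo) / n_steps
    acc = 0.0
    for i in range(n_steps):
        f = f_lo + (i + 0.5) * df          # midpoint rule
        w = 2.0 * math.pi * f
        acc += 1.0 - math.exp(-(w * sigma_j) ** 2 / 2.0)
    return acc / n_steps


for sj_ps in (1, 3, 6, 10):
    r = jitter_noise_ratio(sj_ps * 1e-12)
    print(f"sigma_j = {sj_ps:2d} ps -> P_jitter / P_signal = {10 * math.log10(r):6.1f} dB")
```

Under this flat-spectrum assumption, 6 ps of RMS jitter puts the jitter noise roughly 20 dB below the signal. For the strongest (-51 dBm) input that is around -71 dBm, well above the -84 dBm thermal floor, which is consistent with jitter dominating the noise at high SNR.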

### 2.5.5 Subsampling Mixer and ADC

Conventionally, the quantization noise power of an ADC is modeled as $LSB^2/12$, assuming the quantization error is uniformly distributed [20]. This model becomes less accurate as the resolution decreases, so the ADC resolution is determined with system simulations. As a back-of-the-envelope estimate, the quantization noise power of a 4-bit ADC is about the same as the ambient noise estimated in Section 2.5.3.
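The back-of-the-envelope estimate can be made concrete with the $LSB^2/12$ model. This sketch assumes the ADC full scale maps to the ±3 mV input-referred swing set by the three-sigma rule in Section 2.5.3; `quantization_noise_rms` is an illustrative helper.

```python
import math


def quantization_noise_rms(n_bits, v_peak):
    """LSB / sqrt(12) for a uniform quantizer spanning [-v_peak, v_peak]."""
    lsb = 2.0 * v_peak / 2 ** n_bits
    return lsb / math.sqrt(12.0)


V_PEAK = 3e-3   # +/-3 mV input-referred full scale (three-sigma rule, Section 2.5.3)
for bits in range(1, 7):
    print(f"{bits}-bit: {quantization_noise_rms(bits, V_PEAK) * 1e6:6.1f} uV rms")
```

At 4 bits the input-referred quantization noise is about 108 $\mu$V rms, the same order as the roughly 200 $\mu$V ambient noise estimate.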

One key issue with the proposed system is the implementation cost of a high-speed, wide-input-bandwidth ADC. Previous state-of-the-art high-speed (GHz) and medium-resolution (6-8 bit) ADCs consume hundreds to thousands of milliwatts and support inputs only up to the Nyquist frequency. Fortunately, a recently developed high-speed 6-bit ADC [13] has pushed the power consumption down to 5 milliwatts at 600 MS/s while maintaining an input bandwidth greater than 4.5 GHz in a $0.13\mu$m CMOS process. Scaling from the published result, a 6-bit ADC with a GHz sampling rate would consume on the order of ten milliwatts. Moreover, the input bandwidth will keep increasing as technology continues to scale, because the sampling capacitance of a medium-resolution ADC can be very small. For example, a sampling capacitance larger than a few tens of fF has negligible thermal noise for an 8-bit ADC. The significantly reduced power and area of a sub-sampling ADC make the proposed radio architecture promising for low-cost implementation.
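Whether a given sampling capacitance is thermally negligible can be checked by comparing the sampled $kT/C$ noise with the quantization noise. How large the margin is depends on the full-scale range, so this sketch assumes a hypothetical 1 V peak-to-peak full scale, single-ended $kT/C$ noise, and 300 K; the helper names are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant (J/K)


def ktc_noise_rms(c_farads, temp_k=300.0):
    """RMS sampled thermal noise sqrt(kT/C) on a sampling capacitor."""
    return math.sqrt(K_B * temp_k / c_farads)


def quantization_rms(n_bits, v_fs_pp):
    """LSB / sqrt(12) for an n-bit quantizer spanning v_fs_pp peak-to-peak."""
    return v_fs_pp / 2 ** n_bits / math.sqrt(12.0)


V_FS_PP = 1.0   # hypothetical 1 V peak-to-peak full scale
q = quantization_rms(8, V_FS_PP)
for c_ff in (10, 20, 50, 100):
    n = ktc_noise_rms(c_ff * 1e-15)
    print(f"C = {c_ff:3d} fF: kT/C {n * 1e6:5.0f} uV rms vs. 8-bit quantization {q * 1e6:5.0f} uV rms "
          f"({20 * math.log10(q / n):+.1f} dB margin)")
```

The margin grows by 3 dB per doubling of the capacitance, so the "few tens of fF" guideline should be checked against the actual full-scale range of a given design.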



Figure 2.12: Block diagrams of a digital baseband for 0-1 GHz impulse radio.

## 2.5.6 Implementation Cost of the Digital Baseband

Benefiting from technology scaling, the power and area cost of digital gates continue to decrease with the scaled supply voltage and feature size. In this section, we examine an ASIC implementation of the previously designed UWB digital baseband [24] to estimate the capabilities of modern technology. As shown in Fig. 2.12, the baseband was designed to perform synchronization and a data detection/tracking loop. A pulse matched filter matches the expected pulse waveform, and the following PN correlators provide additional processing gain of up to 30 dB. An absolute peak detector then performs maximum likelihood (ML) detection to acquire the pulse. Once synchronized, the control logic switches the system into data recovery mode, which consists of an early/late tracking loop to compensate for the impairment of the crystal oscillator. Finally, the system makes hard decisions, while the soft outputs can be read off-chip.

Figure 2.13: Flow charts of the digital design flow.
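The acquisition path described above (PN correlators followed by an absolute peak detector) can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the ASIC's architecture: the pulse matched filter is assumed to have already reduced each pulse to one sample per chip, the code is a 127-chip maximal-length sequence (about 21 dB of processing gain, whereas the 30 dB quoted in the text corresponds to roughly 1000 chips), the per-chip SNR is a hypothetical -3 dB, and no tracking loop is modeled. `pn_sequence` and `acquire` are hypothetical names.

```python
import math
import random


def pn_sequence(length, state=0x7F):
    """+/-1 chips from a 7-bit maximal-length LFSR (s[n] = s[n-6] XOR s[n-7], period 127)."""
    chips = []
    for _ in range(length):
        chips.append(1 if state & 1 else -1)
        feedback = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | feedback) & 0x7F
    return chips


def acquire(received, pn):
    """Correlate against every cyclic code phase and pick the absolute peak (ML acquisition)."""
    n = len(pn)
    corr = [sum(received[(k + i) % n] * pn[i] for i in range(n)) for k in range(n)]
    best = max(range(n), key=lambda k: abs(corr[k]))
    return best, corr


L = 127
pn = pn_sequence(L)
rng = random.Random(2006)
true_phase = 40
amp = 10 ** (-3.0 / 20)   # hypothetical per-chip SNR of -3 dB with unit-variance noise
rx = [amp * pn[(m - true_phase) % L] + rng.gauss(0.0, 1.0) for m in range(L)]
phase, corr = acquire(rx, pn)
print(f"estimated code phase {phase} (true {true_phase}), processing gain {10 * math.log10(L):.1f} dB")
```

The correlator bank lifts the -3 dB per-chip SNR to roughly 18 dB at the correct code phase, which is why a simple absolute peak detector suffices for acquisition.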

As shown in Fig. 2.13, the baseband algorithms are developed and verified in Matlab/Simulink and an in-house FPGA emulation engine. The digital front-end design, consisting of VHDL netlisting, logic synthesis, and optimization, is performed in Synopsys Module Compiler and Design Compiler. ModelSim is then used to verify the VHDL code with the same Matlab testbench used to develop the algorithms. The digital back-end flow is composed of standard-cell floor-planning, placement, power structuring, clock tree synthesis, and routing, and several iterations are required to improve the chip performance. Finally, the verification procedure includes all the checks before and after layout, such as VHDL simulation, static timing analysis, power estimation, functionality checks, and DRC/LVS. One critical issue in this design is that the massively parallel matched filters make a flat physical design impractical: the routing channels demanded by these dense interconnects are congested at any reasonable placement density. Therefore, hierarchical physical design had to be added to the in-house digital design flow, and a semi-automatic place-and-route flow was created to solve the routing issues. Fig. 2.14 shows a graphic view of the design flow.

Figure 2.14: Graphic view of the digital design flow.

The ASIC was designed in a $0.13\mu$m CMOS process. The total chip size is 3.6 mm by 3.3 mm, as shown in Fig. 2.15. The design uses 530,000 standard cells and achieves about 1.5 MOPS/mW. As expected, the power efficiency of this direct-mapped approach exceeds that of DSP and microprocessor implementations (Fig. 2.16). The power consumption is 12 mW in acquisition mode and 1.5 mW in tracking mode, with a 1.1 V supply and a 10 MHz clock. The area and power breakdowns of the baseband blocks are illustrated in Fig. 2.17; both are dominated by the matched filters and correlators. Based on this ASIC design example, the power consumption of the proposed analytic signal baseband for the sub-sampling radio is estimated to be on the order of 10 mW. Note that leakage power may eventually play a critical role as the scaling trend continues.

Figure 2.15: Layout view of the digital baseband.

Figure 2.16: Power efficiency comparisons.

# 2.6 System Simulations

While the previous section provides an analytical approach to designing the proposed system, we now include these circuit non-idealities in system-level simulations using measured noise samples. The simulation uses measured pulses generated by a pulser and a TEM horn antenna, whose frequency response is flat from 3 to 10 GHz. For interference and noise, sixty million samples were acquired with the TEM horn antenna and an Agilent DSO (54855A), which is capable of sampling at 20 Gsa/s. The measured pulse shape and noise samples were post-processed in Matlab. The pulse is bandlimited to 3-4 GHz and subsampled at 2 Gsa/s.

Figure 2.17: (a) Power and (b) area pie charts of the digital baseband.

In the simulation framework, signal and noise are each oversampled at higher rates and filtered by the same bandpass (anti-aliasing) filter. Random jitter can be introduced while downsampling to 2 Gsa/s. The system simulation allows us to investigate input sensitivity, clock jitter, bandpass filter response, gain, and ADC resolution together with all of the digital signal processing blocks. Here, we focus only on the specifications of the analog blocks and keep all digital blocks in floating point. Digital implementations can easily be built upon this Matlab/Simulink framework using our in-house digital FPGA/ASIC design flow [25][26]. A previously designed UWB digital baseband for the DC-1 GHz band is one example of this fast silicon prototyping approach [24].

The figure of merit in this design methodology is the implementation loss, i.e., the gap between the input (received) SNR and the analytic matched filter output SNR, measured without any sampling offset or channel estimation error. Next, we examine the relationship between implementation loss and critical system parameters, such as ADC resolution, jitter, in-band interference level, and input SNR.

• ADC Quantization effect

The input signal level is varied from -51 dBm to -71 dBm, as explained in Section 2.5.1. As shown in Fig. 2.18, quantization with more than 4 bits keeps the implementation loss within 3 dB for all input SNRs, which is fairly close to the hand-analysis result. Note that in the low-SNR region, even a 1-bit ADC does not degrade system performance by much.

Figure 2.18: Implementation loss versus input SNR and ADC bits.
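The small penalty of coarse quantization at low SNR can be reproduced analytically in the weak-signal limit. This sketch is our own simplified model, not the Matlab simulation behind Fig. 2.18: the noise is unit-variance Gaussian, the quantizer is a uniform mid-rise type that clips at a hypothetical ±3 noise standard deviations, and the loss is the matched-filter SNR relative to an unquantized receiver, $Var[q(w)] / (dE[q(w+a)]/da)^2$ at $a = 0$. The helper names are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def low_snr_quantization_loss_db(n_bits, v_peak=3.0):
    """Weak-signal matched-filter SNR loss (dB) of a mid-rise uniform quantizer in unit-variance noise."""
    levels = 2 ** n_bits
    lsb = 2.0 * v_peak / levels
    thresholds = [-v_peak + k * lsb for k in range(1, levels)]
    outputs = [-v_peak + (k + 0.5) * lsb for k in range(levels)]
    # Slope of the mean output at a = 0: each threshold adds one LSB, weighted by the noise pdf.
    slope = sum(lsb * normal_pdf(t) for t in thresholds)
    # Output variance with noise only (the mean is zero by symmetry).
    edges = [-math.inf] + thresholds + [math.inf]
    var = sum(y * y * (normal_cdf(edges[k + 1]) - normal_cdf(edges[k])) for k, y in enumerate(outputs))
    return 10.0 * math.log10(var / slope ** 2)


for bits in (1, 2, 3, 4, 6):
    print(f"{bits}-bit: {low_snr_quantization_loss_db(bits):.2f} dB")
```

In this model a 1-bit quantizer costs exactly $10\log_{10}(\pi/2) \approx 1.96$ dB, and the loss shrinks rapidly with more bits, consistent with the observation that a 1-bit ADC degrades low-SNR performance only modestly.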

• Jitter

In the jitter simulation, a 6-bit ADC is used and the RMS jitter is varied from 0 to 10 picoseconds. Figure 2.19 shows that jitter greater than 6 picoseconds can cause more than 3 dB of implementation loss. The jitter requirement is relatively stringent; however, a fixed-frequency clock tends to produce less phase noise than a tunable one [27]. The results also show that jitter-induced noise dominates the total noise power in the high-SNR region.



Figure 2.19: Implementation loss versus input SNR and jitter.

• In-band interference immunity

Depending on the operating environment, UWB can be highly vulnerable to in-band interference. We therefore inject an in-band sine wave to investigate the interference immunity. The simulation results do not include any interference cancellation that could potentially be incorporated into the system. As Fig. 2.20 shows, interference power above -40 dBm generally degrades system performance by more than 3 dB, because the input-referred saturation level is set around 3 mV (the amplitude of a -40 dBm sinusoid on a 50 ohm impedance). The input-referred saturation level is therefore a trade-off between interference immunity and quantization noise. We also observe that ADCs with more than 3 bits show better in-band interference immunity.

Figure 2.20: Implementation loss versus in-band interference level and ADC bits.
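The link between the -40 dBm threshold and the 3 mV saturation level is the peak amplitude of a sinusoid, $V_{peak} = \sqrt{2PR}$. A short check, with `sine_peak_volts` as an illustrative helper:

```python
import math


def sine_peak_volts(p_dbm, r_ohm=50.0):
    """Peak amplitude of a sinusoid delivering p_dbm into r_ohm (V_peak = sqrt(2 * P * R))."""
    p_watts = 10.0 ** (p_dbm / 10.0) * 1e-3
    return math.sqrt(2.0 * p_watts * r_ohm)


for p in (-50.0, -40.0, -30.0):
    print(f"{p:.0f} dBm -> {sine_peak_volts(p) * 1e3:.2f} mV peak")
```

A -40 dBm tone has a peak of about 3.16 mV, so stronger interferers alone exceed the roughly 3 mV input-referred saturation level.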

## 2.7 System Prototype

The subsampling system prototype was set up as shown in Fig. 2.21(a). The passive components, such as the UWB antenna and bandpass filters, were donated by Taiyo Yuden. The antenna is driven by an Agilent 81134A pulse/pattern generator to generate UWB pulses. On the receiver side, an Agilent 54855A oscilloscope samples the received signal directly after the antenna and bandpass filter. The oscilloscope front end has a bandwidth of 7 GHz and a built-in ADC of about 7 bits. The sampling RMS jitter is reported to be 3 ps.

Using this prototype system, we can verify the proposed complex signal processing technique with a subsampling front end. We performed the following experiments:

1. A frequency mismatch between the transmitter and receiver local oscillators also introduces a sampling offset, as mentioned in the previous section. In this experiment, pulses were transmitted at a 250 MHz pulse rate over an LOS link and sampled at 5 Gsa/s. One million samples were taken; the matched filter outputs are shown in Fig. 2.21(b)–(c). The constellation of matched filter outputs rotates at the rate of the frequency offset.

2. Movement of the Tx and Rx in an LOS scenario produces a phenomenon similar to frequency mismatch, except for the amplitude variation. This experiment was done in the 20 Gsa/s sampling mode to reduce the triggering uncertainty. The sampled data were decimated by a factor of four in Matlab. The signal was captured at various distances between transmitter and receiver in 3 cm steps. Fig. 2.21(d)–(e) shows both the time-domain view and the trajectory of the analytic matched filter output on the Euler plane. The trajectory is composed of the phase portrait of the pulse itself and the amplitude attenuation due to free-space propagation.

(a) Prototype configuration

(b) Output SNR of analytic MF

(c) Trajectory of analytic matched filter output within 0.2 msec

(d) Time domain views of the first three measurements

(e) Trajectory of analytic matched filter output of all measurements

Figure 2.21: (a) Experiment setup; (b)–(c) Local oscillator frequency mismatch effect; (d)–(e) Measurements with various distances.

## 2.8 Conclusion

A sub-sampling analog front-end combined with analytic signal processing has been proposed for passband UWB communications. The architecture minimizes the building blocks for a low-complexity implementation, with the potential for full CMOS integration given the recent demonstration of a high-speed, low-power sub-sampling ADC in a fully integrated CMOS process. By exploiting the analytic matched filter outputs, timing sensitivity is mitigated for synchronization and data detection. In addition, the 2-D correlation profile provides fine time resolution, which implies a high-accuracy ranging capability. A first-order link budget analysis including circuit impairments is provided, and the specifications of the critical blocks are verified by both analysis and system-level simulations. The prototype also demonstrated the feasibility of the proposed transceiver architecture. Following the presented design approach, one may determine the optimal circuit specifications of the proposed radio architecture for different applications, such as low-rate ranging systems or high-speed data communications.

## Chapter 3

# High-speed Low-power Asynchronous ADC

In this chapter, an asynchronous analog-to-digital converter (ADC) based on successive approximation is introduced to provide high-speed (600-MS/sec), medium-resolution (6 bits) conversion. A high input bandwidth (>4 GHz) was achieved, which allows its use in RF sub-sampling applications. By using asynchronous processing techniques, it avoids any clock faster than the sample rate and speeds up a non-binary successive approximation algorithm that uses a series non-binary capacitive ladder with digital radix calibration. The sample rate of 600-MS/sec was achieved by time-interleaving two single ADCs, which were fabricated in a 0.13-$\mu$m standard digital CMOS process. The ADC achieves a peak SNDR of 34 dB while occupying an active area of only 0.12 mm<sup>2</sup> and consuming 5.3 mW.

## 3.1 Introduction

Trends in many communication systems, such as ultra-wideband (UWB), cognitive, and software-defined radio, call for ever wider signal bandwidths, increased flexibility, and higher system integration, yet with lower power consumption and smaller area to meet cost targets. These system architectures typically demand medium-resolution ($\sim$6-bit) and high-speed ($\sim$GHz) ADCs, such as those needed for the 802.15 UWB standards. If the ADC is designed with sufficient input bandwidth, it can sub-sample the wideband RF signal directly and enable a radio solution with a dramatically lower implementation cost [12].

Conventionally, a flash-type converter [28][29][30][31] is chosen when the sample rate is high, since it can perform a conversion in a single clock cycle. However, this comes at the expense of an exponential dependence of area and power on the resolution, as well as offset variations among the parallel paths, which require pre-amplifiers or extra calibration [32]. A successive approximation (SA) architecture, on the other hand, has only a logarithmic dependence on resolution, but needs multiple clock cycles to execute the conversion algorithm [33], and therefore more time interleaving to reach high conversion speeds.

This work explores architectural strategies and circuit techniques to optimize the power efficiency and area of a high-speed ADC. An asynchronous ADC architecture is proposed to speed up the power-efficient SA algorithm, using a dynamic comparator and digital logic to facilitate asynchronous processing. To meet the high-speed and high-bandwidth requirements of this ADC, a series capacitive ladder network is used to reduce the effective capacitance. Because the size of this ladder is pushed to its limit, random errors become a concern; a post-processing digital calibration scheme is therefore used to compensate for manufacturing-induced random errors. Finally, the asynchronous logic is optimized for speed by using dynamic digital logic.

This chapter describes the prototype design of a 6-bit 600 MS/s asynchronous ADC [13] that consumes a total power of 5.3 mW. Section 3.2 reviews the power efficiency of conventional ADC topologies and describes the concept and architecture of the proposed asynchronous approach. Section 3.3 provides the implementation details of the asynchronous ADC, and Section 3.4 shows the measured results. Finally, Sections 3.5 and 3.6 discuss technology scaling and potential applications of the proposed ADC architecture.

## 3.2 ADC Architecture

#### 3.2.1 Power Efficiency of Conventional ADC Architectures

Fig. 3.1 shows the three commonly used Nyquist ADC topologies: flash, pipeline, and SAR. A first-order estimate of the power and conversion speed of these conventional topologies is made to identify the best entry point for further power efficiency improvement. Traditionally, flash ADCs are favored for high-speed N-bit conversion, since $2^{N}-1$ comparators make a fully parallel comparison against all quantization levels within one clock cycle. The decoding circuits that suppress sparkle and metastability errors and perform thermometer-to-binary conversion also dissipate extra power. The total power consumption of a flash ADC therefore scales roughly as $2^{N}$. In Fig. 3.1, the conversion speed of the flash ADC is normalized to one, reflecting the fact that a full conversion completes within one sample clock cycle. One approach to breaking the exponential dependence of the number of comparators on the number of bits is the pipeline ADC. Instead of a fully parallel comparison, it divides the process into several comparison stages, the number of which is proportional to the number of bits. The total number of comparators is therefore greatly reduced, with only N comparators required for a 1-bit-per-stage, N-bit pipeline ADC. However, because both the analog and digital signal paths are pipelined, inter-stage residue amplification is needed, which consumes considerable power and limits high-speed operation. While open-loop residue amplification is possible [34], it requires an extra calibration loop, increasing the overall complexity and power consumption. Therefore, the normalized power consumption of a pipeline ADC exceeds N, with a normalized speed below 1.

Figure 3.1: Conventional architectures for Nyquist ADCs.

For low conversion speeds, an SAR approach is often used, since it also divides a full conversion into several comparison stages, similar to the pipeline ADC, except that the algorithm is executed sequentially rather than in parallel. An *N*-bit SAR converter uses only one comparator and *N* clock cycles to complete a full conversion. Thus, its normalized power consumption is approximately one, while its speed is now 1/N. Since the ratio of power to speed represents the energy consumed per conversion, SAR converters clearly have a power efficiency advantage over the other approaches. Because the power efficiency difference between SAR and flash topologies grows exponentially with the number of bits, *N*, an SAR converter provides a promising starting point for the most power-efficient solution. However, the sequential operation of the SA algorithm has traditionally limited high-speed operation, so the following section uses an architecture based on asynchronous processing to achieve high-speed operation with a normalized power/speed ratio << N.
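The first-order comparison above can be tabulated directly. This sketch simply encodes the normalized power and speed figures from the text (flash: power $\sim 2^N$ at speed 1; pipeline: power > N at speed < 1, so only a lower bound is shown; SAR: power $\sim 1$ at speed $1/N$); `normalized_power_speed` is a hypothetical helper.

```python
def normalized_power_speed(n_bits):
    """First-order power/speed ratios (energy per conversion) from Section 3.2.1."""
    return {
        "flash": 2.0 ** n_bits / 1.0,   # ~2^N comparators, one cycle per sample
        "pipeline": float(n_bits),      # lower bound only: power > N at speed < 1
        "SAR": 1.0 * n_bits,            # power ~1 at speed ~1/N
    }


for n in (4, 6, 8):
    r = normalized_power_speed(n)
    print(f"N = {n}: flash {r['flash']:.0f}, pipeline > {r['pipeline']:.0f}, SAR {r['SAR']:.0f}, "
          f"flash/SAR = {r['flash'] / r['SAR']:.1f}")
```

At 6 bits the flash-to-SAR energy ratio is already about 10.7, and it roughly triples for every two additional bits.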



Figure 3.2: Synchronous conversion for SAR ADCs.



Figure 3.3: Asynchronous processing concept.

#### 3.2.2 Asynchronous Processing

The conventional implementation of the SA algorithm, such as an SAR converter, relies on a synchronous clock to divide the time into a signal tracking phase and a conversion phase, which progresses from the MSB to the LSB as shown in Fig. 3.2. For an *N*-bit converter with a conversion rate of $F_s$, a synchronous approach requires a clock running at no less than $(N + 1) \cdot F_s$. Because SAR converters have traditionally been used at low conversion rates, clock generation has been less of an issue. For a high-speed converter, however, generating this fast internal clock is a significant overhead. For example, a 300 MS/s, 6-bit SAR would require a 2.1 GHz clock. Synthesizing such a high-frequency clock plus its distribution network would likely consume more power than the ADC itself. From a speed perspective, every clock cycle must accommodate the worst-case comparison time, which comprises the maximum DAC settling time and the comparator resolving time for the minimum resolvable input level. In addition, every clock cycle requires margin for clock jitter, which either slows down the conversion or imposes a stringent jitter requirement on the clock generator.
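The 2.1 GHz figure follows from the $(N+1) \cdot F_s$ requirement, counting one cycle for tracking and one per bit:

$$f_{clk} \ge (N + 1) \cdot F_s = (6 + 1) \times 300\ \text{MS/s} = 2.1\ \text{GHz}.$$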

Therefore, the power and speed limitations of a synchronous SA design come largely from the high-speed internal clock. Processing the internal comparisons asynchronously removes the need for such a clock and substantially improves the power efficiency compared to a synchronous design. At the top level, a global clock running at the sample rate is still used for uniform sampling, since most digital basebands to date remain synchronous. The concept of asynchronous processing is to trigger the internal comparisons from MSB to LSB like dominoes: as shown in Fig. 3.3, whenever the current comparison is complete, a ready signal is generated to trigger the following comparison.

The voltage difference $(V_{res})$ between the input signal and the reference level determines the comparator resolving time. For example, a typical regenerative latch exhibits the following tradeoff between input voltage $(V_{res})$ and resolving time $(T_{cmp})$ [35].

$$T_{cmp} = \frac{\tau}{A_o - 1} \cdot \ln \frac{V_{FS}}{V_{res}} = K \cdot \ln \frac{V_{FS}}{V_{res}} \tag{3.1}$$

where  $A_o$  is the small-signal gain of the internal inverting amplifier,  $\tau$  is the time constant at the latch outputs and  $V_{FS}$  is the full logic swing level.

The tradeoff between resolving time and input voltage changes with the comparator topology. Nevertheless, this simple regenerative latch model provides intuition into how asynchronous processing improves the conversion speed. For an *N*-bit converter, the total comparison times of the synchronous and asynchronous designs can be expressed as

$$\begin{aligned} T_{async} &= \sum_{i=0}^{N-1} K \cdot \ln \frac{V_{FS}}{V_{res}[i]} \\ T_{sync} &= N \cdot K \cdot \ln \frac{V_{FS}}{V_{min}} \end{aligned} \tag{3.2}$$

where $V_{res}[i]$ denotes the input voltage of the comparator at the $i^{th}$ stage (Fig. 3.3), and $V_{min}$ is usually set by the LSB level.



Figure 3.4: Best (solid line) and worst (dashed line) case of $V_{res}$ profile.

Clearly, the asynchronous conversion takes advantage of the faster comparison cycles, since at most one of the $V_{res}[i], \forall i \in [0, N-1]$ falls within $\pm 1/2$ LSB under the successive approximation algorithm. The conversion time saved by $T_{async}$ relative to $T_{sync}$ depends on the number of bits as well as on the profile of $V_{res}[i]$, which in turn depends on the input voltage level. In the extreme case of a 1-bit converter, asynchronous processing provides no worst-case benefit, since the single comparison cycle must still allow for the worst-case resolving time. As the number of bits increases, $V_{res}[i]$ distributes over the full-scale range and thus creates time savings. Intuitively, the wider the range of $V_{res}$, the faster the conversion. A numerical analysis of Eq. (3.2) shows that the best case occurs when the input signal is at full swing, i.e. $V_{res}$ reaches $\pm V_{FS}/2$, whereas the longest conversion time results when $V_{res}$ alternates in polarity between consecutive comparison cycles, as shown in Fig. 3.4.
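A minimal numerical sketch of Eq. (3.1) and (3.2) follows. It assumes an ideal binary SA search over a unipolar input range [0, V_FS], normalizes K to 1, and floors the comparator input magnitude at V_min = V_LSB/2 (the value used in the denominators of Eq. (3.3) and (3.5)); all function names are illustrative.

```python
import math


def sa_residues(vin: float, n_bits: int, v_fs: float = 1.0) -> list[float]:
    """Comparator inputs V_res[i] for an ideal binary SA search over [0, v_fs]."""
    v = vin - v_fs / 2                  # first comparison against mid-scale
    residues = []
    for i in range(n_bits):
        residues.append(v)
        step = v_fs / 2 ** (i + 2)      # next reference moves by half the last step
        v = v - step if v >= 0 else v + step
    return residues


def resolve_time(v_res: float, v_fs: float, v_min: float, k: float = 1.0) -> float:
    """Eq. (3.1): T_cmp = K ln(V_FS / |V_res|), with |V_res| floored at v_min."""
    return k * math.log(v_fs / max(abs(v_res), v_min))


def conversion_times(vin: float, n_bits: int, v_fs: float = 1.0) -> tuple[float, float]:
    """Return (T_async, T_sync) from Eq. (3.2), taking V_min = V_LSB / 2."""
    v_min = v_fs / 2 ** n_bits / 2
    t_async = sum(resolve_time(r, v_fs, v_min) for r in sa_residues(vin, n_bits, v_fs))
    t_sync = n_bits * resolve_time(v_min, v_fs, v_min)
    return t_async, t_sync


if __name__ == "__main__":
    for vin in (1.0, 0.8, 0.5, 2 / 3):
        ta, ts = conversion_times(vin, 6)
        print(f"Vin = {vin:.3f} V_FS  ->  T_async / T_sync = {ta / ts:.3f}")
```

A full-swing input gives a ratio of 1/2, while an input of 2V_FS/3 produces the alternating residue pattern and the largest ratio among these examples.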

To explore the theoretical performance bound of asynchronous processing, the ratio $\frac{T_{async}}{T_{sync}}$ is derived for both cases as a function of the number of bits. In the best case, assuming a binary successive approximation algorithm, the $V_{res}[i]$ are simply $V_{FS}/2$, $V_{FS}/4$, $V_{FS}/8$, ..., $V_{FS}/2^N$. With $V_{LSB} = V_{FS}/2^N$ and $V_{min} = V_{LSB}/2$, the ratio $\frac{T_{async}}{T_{sync}}$ can be expressed as

$$\begin{aligned} \frac{T_{async}}{T_{sync}} &= \frac{\ln \frac{V_{FS}}{V_{LSB}} + \ln \frac{V_{FS}}{2V_{LSB}} + \ln \frac{V_{FS}}{4V_{LSB}} + \dots + \ln \frac{V_{FS}}{2^{N-1}V_{LSB}}}{N \cdot \ln \frac{V_{FS}}{V_{LSB}/2}} \\ &= \frac{N^2 \ln 2 - \frac{N}{2}(N-1) \ln 2}{N(N+1) \ln 2} = \frac{1}{2} \end{aligned} \tag{3.3}$$

In the worst case, the comparison results alternate in polarity. The input level that produces this pattern is best understood by increasing the number of bits starting from the 2-bit case, assuming $V_{res}$ begins on the positive side:

$$2\text{-bit Case} \Rightarrow V_{res}[0] - \frac{V_{FS}}{4} < 0$$

$$3\text{-bit Case} \Rightarrow V_{res}[0] - \frac{V_{FS}}{4} + \frac{V_{FS}}{8} > 0$$

$$4\text{-bit Case} \Rightarrow V_{res}[0] - \frac{V_{FS}}{4} + \frac{V_{FS}}{8} - \frac{V_{FS}}{16} < 0$$

$$\vdots$$

$$\Rightarrow \frac{V_{FS}}{8}\left(1 + \frac{1}{4} + \frac{1}{16} + \cdots\right) < V_{res}[0] < \frac{V_{FS}}{4}\left(1 - \frac{1}{4} - \frac{1}{16} - \cdots\right)$$

$$\Rightarrow V_{res}[0] \rightarrow \frac{1}{6}V_{FS} \tag{3.4}$$
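The limiting argument of Eq. (3.4) can be checked by accumulating the alternating partial sums directly. The sketch assumes an ideal binary search with reference steps V_FS/4, V_FS/8, ... and V_res[0] > 0; the helper name is hypothetical.

```python
def alternating_bounds(n_bits: int, v_fs: float = 1.0) -> tuple[float, float]:
    """Interval for V_res[0] that keeps all comparison results alternating in sign.

    Requiring V_res[1] < 0, V_res[2] > 0, V_res[3] < 0, ... gives alternating
    partial sums of the reference steps V_FS/4, V_FS/8, V_FS/16, ...
    """
    lower, upper = 0.0, v_fs / 2         # start: V_res[0] lies in (0, V_FS/2]
    partial = 0.0
    for i in range(1, n_bits):
        partial += (-1) ** (i + 1) * v_fs / 2 ** (i + 1)   # +V_FS/4, -V_FS/8, ...
        if i % 2 == 1:
            upper = min(upper, partial)  # V_res[i] < 0  ->  V_res[0] < partial
        else:
            lower = max(lower, partial)  # V_res[i] > 0  ->  V_res[0] > partial
    return lower, upper


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 10, 20):
        lo, hi = alternating_bounds(n)
        print(f"N={n:2d}: {lo:.6f} < V_res[0]/V_FS < {hi:.6f}")
```

Both bounds close in on 1/6, matching the limit in Eq. (3.4).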

Given the result of Eq. (3.4), the worst-case conversion time occurs when $V_{in}$ is equal to $\frac{V_{FS}}{3}$ or $\frac{2V_{FS}}{3}$, i.e. $V_{FS}/6$ below or above mid-scale, and the ratio $\frac{T_{async}}{T_{sync}}$ is derived as follows:

$$\begin{aligned} \frac{T_{async}}{T_{sync}} &= \frac{\ln \frac{V_{FS}}{V_{FS}/6} + \ln \frac{V_{FS}}{V_{FS}/12} + \dots + \ln \frac{V_{FS}}{\max\left(V_{FS}/(3 \cdot 2^N),\, V_{LSB}/2\right)}}{N \cdot \ln \frac{V_{FS}}{V_{LSB}/2}} \\ &= \frac{(N-1)\ln 3 + \ln 2 + \frac{N}{2}(N+1)\ln 2}{N(N+1)\ln 2} \end{aligned} \tag{3.5}$$

Note that the ratio $\frac{T_{async}}{T_{sync}}$ in Eq. (3.5) approaches 1/2 from above as *N* increases. In conclusion, given the lower and upper bounds of Eq. (3.3) and (3.5), asynchronous processing reduces the total resolving time by at most a factor of two relative to the synchronous case, and the worst-case savings grow with higher ADC resolution.
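The closed forms of Eq. (3.3) and (3.5) can be compared against a term-by-term evaluation of the worst-case residues |V_res[i]| = V_FS/(6·2^i), with V_FS normalized to 1 and V_min = V_LSB/2. This is only an algebraic check of the expressions above, not a circuit simulation; the function names are illustrative.

```python
import math

LN2, LN3 = math.log(2), math.log(3)


def best_case_ratio(n_bits: int) -> float:
    """Eq. (3.3): a full-swing input gives T_async / T_sync = 1/2 for every N."""
    num = n_bits**2 * LN2 - n_bits * (n_bits - 1) / 2 * LN2
    return num / (n_bits * (n_bits + 1) * LN2)


def worst_case_ratio(n_bits: int) -> float:
    """Eq. (3.5): closed form for inputs at V_FS/3 or 2V_FS/3."""
    num = (n_bits - 1) * LN3 + LN2 + n_bits * (n_bits + 1) / 2 * LN2
    return num / (n_bits * (n_bits + 1) * LN2)


def worst_case_ratio_by_terms(n_bits: int) -> float:
    """Numerator of Eq. (3.5) summed term by term, |V_res[i]| = 1 / (6 * 2**i)."""
    v_min = 0.5 / 2**n_bits                       # V_LSB / 2 with V_FS = 1
    num = sum(math.log(1.0 / max(1.0 / (6 * 2**i), v_min)) for i in range(n_bits))
    return num / (n_bits * math.log(1.0 / v_min))


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8, 12, 16):
        print(f"N={n:2d}  best={best_case_ratio(n):.3f}  worst={worst_case_ratio(n):.3f}")
```

The worst-case column starts at 1 for a single bit (no benefit) and decreases toward 1/2, the best-case value, as N grows.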



Figure 3.5: Simplified block diagrams of the ADC architecture.

#### 3.2.3 Architecture

While there are several possible architectures that incorporate the asynchronous processing concept, the first prototype uses only one comparator with a charge redistribution network, achieving a low-complexity implementation similar to an SAR converter. Because all internal comparisons use the same comparator, its offset is common to every decision and can be subtracted in the digital domain, so no special analog offset reduction is required. However, the overall conversion speed is reduced because the comparator must be reset after each comparison cycle. The charge redistribution capacitor network samples the input signal and also serves as a digital-to-analog converter (DAC) that creates and subtracts the reference voltages.

Besides asynchronous processing, time interleaving [36] is used to increase the maximum conversion rate beyond what a single ADC can achieve. Since the power and area overheads grow with the number of parallel converters, a single asynchronous ADC should be optimized for high speed and small silicon area. In this prototype (Fig. 3.5), two ADCs are time-interleaved to double the sample rate of an individual ADC. The two-phase (0 and 180 degree) clocks are provided via on-chip inversion and are used as sampling clocks and reset signals. The high input bandwidth (>4 GHz) achieved by the individual converter would actually allow additional time interleaving.

There are two critical delay paths in this architecture, one for the signal and one for timing. In the signal path, each internal comparison result is stored in an SR latch, which buffers it before the temporary bit caches. In the asynchronous timing path, a ready signal generator detects the comparator outputs and produces a data-completion flag for each comparison cycle. This ready signal then drives a sequencer that provides multiple-phase clocks for the switching logic and for the temporary bit caches that store the internal comparison results. A separate pulse generator creates a reset phase for the comparator to avoid any memory effect from previous comparisons. The ready signal generator, pulse generator, and sequencer are the dedicated digital logic functions for asynchronous conversion, and they occupy only a small portion of the silicon area.

Finally, the bit streams at the output of the bit caches have a high throughput, which makes real-time streaming off chip difficult. Therefore, a 1K-depth on-chip SRAM is used to store the converted data, which are later read out off chip at a much lower rate. The SRAMs are integrated solely for testing purposes and occupy most of the die area.



Figure 3.6: Dynamic comparator schematic.

## 3.3 Circuit Implementation Details

#### 3.3.1 Dynamic Comparator and Ready Signal

The design of the comparator requires special consideration because it must also generate a data ready signal. As shown in Fig. 3.6, a dynamic comparator composed of a pre-amplifier and a regenerative latch is used. The complementary outputs of the comparator ($Q_p$ and $Q_n$) are connected to the positive supply during the reset phase, and one of the two outputs is pulled down to the negative supply during comparison. Therefore, digital logic that distinguishes state '11' (reset phase) from state '10' or '01' (data ready) would seem sufficient for generating the ready signal. However, one potential issue with asynchronous processing is that the comparator can enter a metastable state when its input is sufficiently small, and the time needed for the outputs to fully resolve can be arbitrarily long. The comparator is therefore designed to be fast enough that this occurs only when the input signal is less than 1 LSB, in which case the decision does not affect the converter accuracy. In this case, the ready signal generator should still set the flag, and the decision result is simply taken from the previous value stored in the SR latch. To achieve this, the key to ready signal generation is a simple NAND gate whose input threshold lies above the level to which both outputs ($Q_p$ and $Q_n$) drop together in the metastable state: once $Q_p$ and $Q_n$ fall below this threshold, they are read as state '00', and the NAND gate sets the flag so that the remaining conversion can continue.
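The ready-signal behavior can be summarized as a small truth-table model. In this sketch the analog outputs are reduced to voltages compared against a single NAND input threshold; the threshold and voltage values, the bit polarity (Q_p high meaning '1'), and the function names are assumptions for illustration only.

```python
def ready_flag(q_p: float, q_n: float, v_th: float) -> bool:
    """NAND-style ready detection: not ready only while both outputs read as logic 1."""
    return not (q_p > v_th and q_n > v_th)


def latch_decision(q_p: float, q_n: float, v_th: float, previous_bit: int) -> int:
    """Use the comparator decision if resolved, else keep the previous SR-latch value."""
    p_high, n_high = q_p > v_th, q_n > v_th
    if p_high != n_high:              # '10' or '01': comparison resolved
        return 1 if p_high else 0     # assumed polarity: Q_p high means '1'
    return previous_bit               # '11' (reset) or '00' (metastable): hold last value


if __name__ == "__main__":
    VDD, VTH = 1.2, 0.8               # hypothetical supply and NAND threshold (V)
    cases = [(VDD, VDD, "reset"), (VDD, 0.0, "resolved"), (0.5, 0.5, "metastable")]
    for qp, qn, label in cases:
        print(f"{label:10s} ready={ready_flag(qp, qn, VTH)} "
              f"bit={latch_decision(qp, qn, VTH, previous_bit=0)}")
```

The metastable case raises the flag just like a resolved comparison, so the conversion continues; the held bit is harmless because the input was already below 1 LSB.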

Reset switches in both the pre-amplifier and latch stages help to reduce the comparator recovery time during the reset phase. Input offset cancellation is also used in the pre-amplifier stage, although, as mentioned earlier, it is not critical in this ADC architecture. Current mirrors between the two stages reduce charge kickback [37] from the logic-level swing of the latch onto the input capacitors preceding the pre-amplifier. This is especially important because the input capacitor network is pushed to the minimum possible value, as described shortly.

#### 3.3.2 Non-Binary Successive Approximation Review

Instead of a binary successive approximation scheme, this ADC adopts redundancy to tolerate dynamic decision errors and thereby achieve a faster conversion speed [38]. In other words, the overlapping search ranges compensate for wrong decisions made in earlier stages as long as they are within the error tolerance range. This is similar to the redundancy often used in a pipeline ADC via reduced residue amplification gain. The equivalent radix is less than 2 and is computed as

$$Radix = 2^{\frac{N_{bit}}{N_{bit} + N_{rdn}}} \tag{3.6}$$

where $N_{bit}$ is the target bit resolution and $N_{rdn}$ is the number of redundant bits. In this prototype, one redundant bit is added to the target 6-bit resolution, so the equivalent radix is $2^{6/7} \approx 1.81$.
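The sketch below evaluates Eq. (3.6) and shows why redundancy helps. It runs an ideal SA search on a bipolar input with either six binary steps or seven radix-1.81 steps, flips one comparator decision to mimic a dynamic error, and reports the final residue. The step normalization, the test input of 0.6, and the function names are assumptions; the combining weights are taken equal to the nominal DAC steps.

```python
from typing import Optional


def equivalent_radix(n_bit: int, n_rdn: int) -> float:
    """Eq. (3.6): radix that spreads n_bit bits of range over n_bit + n_rdn steps."""
    return 2 ** (n_bit / (n_bit + n_rdn))


def sa_search(vin: float, steps: list[float], wrong_at: Optional[int] = None) -> float:
    """Run an SA search on a bipolar input and return the final residue (error).

    wrong_at optionally flips one comparator decision to mimic a dynamic error.
    """
    v = vin
    for k, d in enumerate(steps):
        b = 1 if v >= 0 else -1
        if k == wrong_at:
            b = -b                    # inject one wrong decision
        v -= b * d                    # residue after subtracting +/- d
    return v                          # equals vin - sum(b_k * d_k)


if __name__ == "__main__":
    lsb = 2.0 ** -6                                   # 6-bit LSB for a +/-1 range
    binary = [2.0 ** -(k + 1) for k in range(6)]      # 6 binary steps
    r = equivalent_radix(6, 1)                        # about 1.81
    nonbinary = [r ** -(k + 1) for k in range(7)]     # 7 steps, last one about 2**-6
    for name, steps in (("binary", binary), ("radix 1.81", nonbinary)):
        err = sa_search(0.6, steps, wrong_at=1)
        print(f"{name:10s}: |error| = {abs(err) / lsb:.2f} LSB with one wrong decision")
```

With binary steps, the flipped decision leaves a final error of several LSBs, while the search with one redundant step brings the residue back within one LSB, because the remaining steps overlap enough to absorb the earlier mistake.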

In terms of implementation, there are two basic approaches, as shown in Fig. 3.7. The geometrically scaled capacitor array uses a parallel bank of capacitors with ratios 1, $\alpha$, $\alpha^2$, ..., $\alpha^{N-1}$, as shown in Fig. 3.7(a). The advantage of this approach is the low complexity of the switching logic, since only one capacitor is switched in each comparison cycle, so the propagation delay and power consumption of the switching logic are expected to be lower. However, in this prototype the ratio $\alpha$ is non-integer, which significantly increases the layout complexity and matching difficulty [39] for a full array. Alternatively, a unitary capacitor array (Fig. 3.7(b)) avoids this non-integer matching issue, with the non-binary code words stored in a digital ROM. However, the propagation delay through a digital ROM is much larger than in the previous case because of the longer logic depth. Moreover, the total input capacitance of both schemes is on the order of $2^N$ times the unit capacitance. Even in the 6-bit case, a unit capacitance of tens of femtofarads, which is set by matching and parasitic considerations, results in a total capacitance on the order of picofarads. This causes additional power consumption and makes it difficult to maintain a high input bandwidth.



(a) Geometrically scaled capacitor array.



(b) Unitary capacitor array.

Figure 3.7: Conventional implementation of radix creation.

#### 3.3.3 Series Non-Binary Capacitive Ladder

Another approach was therefore taken to create an arbitrary radix, i.e. in effect an analog ROM. This approach uses a ladder structure of a non-binary capacitor array which allows a significant reduction in the input capacitance with relaxed matching and layout requirements.

(a) Ideal case without parasitic capacitance.



(b) Inclusion of parasitic capacitance.

Figure 3.8: Series Non-binary Capacitive Ladder.

As shown in Fig. 3.8(a), capacitors of three different sizes with ratios $1:\alpha:\beta$ are used to build the ladder. The idea is to make the equivalent capacitance at every internal node identical, i.e. $\beta \cdot C_u$. Therefore, the charge redistribution from one section to the adjacent one always sees the capacitive divider between $\alpha \cdot C_u$ and $\beta \cdot C_u$, and this division ratio determines the radix of the SA algorithm. Based on the above observations, the design equations of this ladder are derived as

$$\begin{cases} \beta = 1 + \alpha \| \beta \\ Radix = 1 + \frac{\beta}{\alpha} \end{cases} \tag{3.7}$$

Because the capacitors are connected in series, the equivalent capacitance decreases, which reduces both the DAC settling time and the total input capacitance. The traditional tradeoff between matching and total input capacitance is removed, since lowering the input capacitance no longer requires shrinking the unit capacitance. The total input capacitance of the proposed ladder no longer depends on the number of ADC bits and is calculated as

$$C_{in} = [1 + 2 \cdot (\alpha \|\beta)] \cdot C_u \tag{3.8}$$
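For the ideal ladder, Eq. (3.7) can be solved in closed form. Reading α‖β as the series combination αβ/(α+β) and substituting β = (Radix - 1)α into the first line gives α = Radix/(Radix - 1)² and β = Radix/(Radix - 1); this derivation is ours rather than taken from the text. The sketch computes the ratios, checks them against Eq. (3.7), and evaluates Eq. (3.8). It ignores the parasitic terms of Eq. (3.9), and the function names are hypothetical.

```python
def series(a: float, b: float) -> float:
    """Series combination of two capacitances, written a || b in Eq. (3.7)."""
    return a * b / (a + b)


def ladder_ratios(radix: float) -> tuple[float, float]:
    """Closed-form solution of Eq. (3.7) for the ideal (parasitic-free) ladder."""
    alpha = radix / (radix - 1) ** 2
    beta = radix / (radix - 1)
    return alpha, beta


def ladder_input_cap(alpha: float, beta: float) -> float:
    """Eq. (3.8): total input capacitance in units of C_u."""
    return 1 + 2 * series(alpha, beta)


if __name__ == "__main__":
    n_bit, n_rdn = 6, 1
    radix = 2 ** (n_bit / (n_bit + n_rdn))            # Eq. (3.6)
    alpha, beta = ladder_ratios(radix)
    print(f"radix={radix:.3f}  alpha={alpha:.3f}  beta={beta:.3f}")
    print(f"ladder C_in = {ladder_input_cap(alpha, beta):.2f} C_u "
          f"vs ~2^N = {2 ** n_bit} C_u for a conventional array")
```

For the radix of about 1.81 used here, this gives α ≈ 2.75, β ≈ 2.23, and an input capacitance of roughly 3.5 C_u, compared with the ~2^N = 64 C_u of the arrays in Fig. 3.7. With a radix of exactly 2, the same formulas reduce to α = β = 2 and C_in = 3 C_u.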

One potential issue with this ladder structure is its vulnerability to parasitic capacitance from interconnects or the capacitors themselves, especially when the capacitors are implemented as low-cost metal-oxide-metal (MOM) finger capacitors available in a standard digital CMOS process rather than as higher quality metal-insulator-metal (MIM) capacitors. The extra capacitance introduced at the floating nodes changes the effective radix if it is not negligible relative to the ladder capacitors. In this prototype, the unit capacitance is set at the minimum possible value and the MOM capacitors have non-negligible fringing capacitance, which necessitates new design equations that include the parasitic capacitances $p_1 \cdot C_u$ and $p_2 \cdot C_u$ denoted in Fig. 3.8(b).

$$\begin{cases} \beta = 1 + \alpha \| \beta' + p_1 \\ Radix = 1 + \frac{\beta'}{\alpha} \\ \beta' = \beta + p_2 \end{cases} \tag{3.9}$$

By solving Eq. (3.7) and (3.9), one can show that the new ratios ( $\alpha_{mod}$  and  $\beta_{mod}$ ) should be modified according to the following relations with the original ones ( $\alpha_{org}$ and  $\beta_{org}$ ).

$$\begin{cases} \alpha_{mod} = (1+p_1) \cdot \alpha_{org} \\ \beta_{mod} = (1+p_1)\beta_{org} - p_2 \end{cases} \tag{3.10}$$

In this design, a standard capacitor model with conventional EDA extraction tools was used to estimate the parasitic capacitances and was found to be accurate enough at this level of ADC resolution. The total input capacitance is also pushed down to $\sim$90 fF, which is key to achieving the >4 GHz input bandwidth.

#### 3.3.4 Digital Calibration Scheme

Although the systematic error is accounted for in the modified design equations, the ADC is still vulnerable to random errors such as capacitor mismatch and parasitic variation. These random errors can change the effective radix from the MSB to the LSB and thus degrade the linearity of the ADC. Similar to the gain error of a residue amplifier in a pipeline ADC, the digital combining weights need to be corrected by estimating the actual gain [40]. In this prototype, a foreground digital calibration scheme is developed to correct the combining weights and is currently implemented off-chip. The approach is to inject a known input signal into the ADC and use the converted outputs, combined with initial weights, to reconstruct the input signal. Using the reconstructed signal as the reference for an LMS loop, the combining weights are adapted to their actual values. Alternatively, the combining weights can be calculated directly through matrix operations using the orthogonality principle; the LMS loop is used instead to reduce the algorithmic complexity and enable potential on-chip integration.

As shown in Fig. 3.9(a), a ramp signal that spans the full swing range is injected as the known signal. The reference signal is reconstructed through a best linear curve fit of the ADC outputs obtained with the initial guess of the combining weights. Next, the same ADC output code words are fed into an adaptive FIR filter whose taps converge to the actual combining weights. Simulation results showed that the quantization error can be improved after calibration using several hundred samples. Alternatively, a full-swing sine wave of a known frequency can be used as the prior information, as illustrated in Fig. 3.9(b). Using an FFT processor, the sine wave is reconstructed by extracting its amplitude, phase, and offset at the fundamental frequency. One benefit of a sine wave over a ramp is that a highly linear sine source is potentially easier to implement on-chip, so the digital calibration could become an on-chip self-calibration scheme and be extended to higher ADC resolution.
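The ramp-based foreground calibration can be sketched in NumPy. The model is idealized and hypothetical: a 7-step radix-1.81 SA converter whose actual DAC steps carry a deliberately exaggerated 10% random error, a 512-point full-swing ramp, a reference obtained from a linear fit of the initially weighted outputs, and a plain LMS update of the combining weights over several shuffled passes. The step size, pass count, and record length are arbitrary choices, not values from the prototype.

```python
import numpy as np

N_STEPS = 7                                        # 6-bit target + 1 redundant step
RADIX = 2 ** (6 / 7)                               # Eq. (3.6)
nominal = RADIX ** -np.arange(1.0, N_STEPS + 1)    # design (initial) combining weights

rng = np.random.default_rng(0)
# Hypothetical, deliberately exaggerated 10% random error on each DAC step.
actual = nominal * (1 + 0.10 * rng.standard_normal(N_STEPS))


def sa_bits(vin: float, steps: np.ndarray) -> np.ndarray:
    """Raw +/-1 decisions of an SA search whose DAC steps are `steps`."""
    v, bits = vin, np.empty(len(steps))
    for k, d in enumerate(steps):
        bits[k] = 1.0 if v >= 0 else -1.0
        v -= bits[k] * d
    return bits


# 1) Inject a full-swing ramp and record the raw code words.
ramp = np.linspace(-1.0, 1.0, 512)
codes = np.array([sa_bits(v, actual) for v in ramp])

# 2) Reference: best linear fit of the outputs reconstructed with initial weights.
idx = np.arange(len(ramp))
y_init = codes @ nominal
slope, intercept = np.polyfit(idx, y_init, 1)
reference = slope * idx + intercept

# 3) LMS: adapt the weights so that codes @ w tracks the reference.
w = nominal.copy()
mu = 0.005
for _ in range(50):                                # several shuffled passes over the ramp
    for i in rng.permutation(len(ramp)):
        err = reference[i] - codes[i] @ w
        w += mu * err * codes[i]

# Direct least-squares solution (orthogonality principle), for comparison.
w_ls, *_ = np.linalg.lstsq(codes, reference, rcond=None)


def rms(e: np.ndarray) -> float:
    return float(np.sqrt(np.mean(e ** 2)))


rms_before = rms(reference - codes @ nominal)
rms_after = rms(reference - codes @ w)
rms_ls = rms(reference - codes @ w_ls)
print(f"RMS error vs. reference: before {rms_before:.4f}, "
      f"LMS {rms_after:.4f}, least squares {rms_ls:.4f}")
```

The least-squares weights from `np.linalg.lstsq` correspond to the direct matrix solution mentioned above; the LMS loop approaches the same error floor using only multiply-accumulate operations per sample, which is what makes it attractive for on-chip integration.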



(a) Calibration with ramp input.



(b) Calibration with sine wave input.

Figure 3.9: LMS calibration loop.

#### 3.3.5 Variable Duty-Cycled Clock

The global sampling clock has two requirements. First, it needs a variable duty cycle for testing purposes, so that the time split between the tracking and conversion phases can be varied to explore the ADC performance. Second, it must have very low jitter if RF sub-sampling is to be used; a worst-case analysis shows that the RMS jitter should be on the order of picoseconds to support the sub-sampling capability.
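To see why picosecond-level jitter matters, the sketch below applies the standard aperture-jitter bound SNR = -20 log10(2π f_in σ_j) for a full-scale sine. The 4 GHz input is taken from the above-Nyquist measurements reported later; the 1 ps jitter value and the function names are only examples.

```python
import math


def jitter_limited_snr_db(f_in_hz: float, sigma_j_s: float) -> float:
    """SNR limit of a sampler with RMS aperture jitter sigma_j for a full-scale sine."""
    return -20 * math.log10(2 * math.pi * f_in_hz * sigma_j_s)


def max_jitter_s(f_in_hz: float, snr_db: float) -> float:
    """RMS jitter that keeps the jitter-limited SNR at snr_db for input f_in."""
    return 10 ** (-snr_db / 20) / (2 * math.pi * f_in_hz)


if __name__ == "__main__":
    ideal_6b = 6.02 * 6 + 1.76               # ideal quantization-limited SNR, dB
    print(f"{jitter_limited_snr_db(4e9, 1e-12):.1f} dB at 4 GHz with 1 ps RMS jitter")
    print(f"{max_jitter_s(4e9, ideal_6b) * 1e12:.2f} ps keeps a 4 GHz input "
          f"at {ideal_6b:.1f} dB")
```

At 4 GHz, 1 ps RMS jitter alone limits the SNR to about 32 dB, and keeping the jitter-limited SNR at the ideal 6-bit level of about 37.9 dB requires roughly 0.5 ps.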

The clock generation is illustrated in Fig. 3.10. Two sinusoidal waves with a tunable phase skew between them are generated off-chip, regenerated on chip, and combined with an AND gate, so that the phase skew determines the duty cycle of the clock. The 180-degree phase-shifted clock is obtained by simply inverting the two sinusoidal waves and passing them through the same combination logic. Special attention was paid at both the logic and layout level to ensure an exact 180-degree phase shift, since any phase imbalance causes extra distortion or requires additional calibration. The clock jitter is minimized by careful layout, a dedicated power domain, and a clean clock source with extra bandpass filtering. In addition, the edge rate of the sampling clock should be high to reduce jitter, which results in extra power dissipation in the large buffers. The jitter due to intrinsic noise of the logic gates was analyzed and simulated to ensure that it is well below the specification.
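The duty-cycle control can be modeled by ANDing two ideal square waves, which is what the regenerated sinusoids approximate. Under that assumption the duty cycle is 0.5 - φ/360 for a skew φ between 0 and 180 degrees, and inverting both inputs yields the complementary-phase clock with the same duty cycle shifted by half a period. The sample count and function name are hypothetical.

```python
import numpy as np


def and_clock(skew_deg: float, invert: bool = False, samples: int = 3600) -> np.ndarray:
    """One period of the clock obtained by ANDing two regenerated, skewed sinusoids.

    Each sinusoid is modeled as an ideal 50% square wave after regeneration;
    invert=True models inverting both inputs to obtain the 180-degree clock.
    """
    t = np.arange(samples) / samples                  # one clock period
    a = np.sin(2 * np.pi * t) > 0
    b = np.sin(2 * np.pi * (t - skew_deg / 360)) > 0
    if invert:
        a, b = ~a, ~b
    return a & b


if __name__ == "__main__":
    for skew in (0, 45, 90, 135):
        clk = and_clock(skew)
        print(f"skew {skew:3d} deg -> duty cycle {clk.mean():.3f} "
              f"(expected {0.5 - skew / 360:.3f})")
```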



Figure 3.10: Variable duty-cycled clock generation.

#### 3.3.6 High-Speed Digital Logic

The speed of the asynchronous and switching logic is also critical to the conversion rate. Therefore, all digital logic in the critical path is custom dynamic logic, optimized using the method of logical effort [41] as well as careful layout. The dynamic logic uses a weak keeper transistor to compensate for charge leakage and improve noise immunity. Moreover, the dynamic registers are designed for minimal clock loading, as they are driven by the asynchronous logic. The pulse duration is adjusted by a variable MOS resistor operated deep in the triode region in order to explore the tradeoff between conversion speed and dynamic error. The less critical digital blocks, such as the bit caches and the SRAM controller, are built from foundry standard cells to save design time. Nevertheless, the timing constraint between the ADC and the test SRAM is very tight, which requires careful design of the interface circuitry.
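As a reminder of how logical effort sizes a path, the sketch below computes the minimum normalized delay D = N·F^(1/N) + P for a path effort F = G·B·H and picks the stage count that keeps the stage effort near 4. The inverter-chain numbers are a textbook-style example, not the prototype's actual gates, and the function names are illustrative.

```python
import math


def optimal_stages(path_effort: float, rho: float = 4.0) -> int:
    """Stage count that brings the per-stage effort close to rho (about 4)."""
    return max(1, round(math.log(path_effort) / math.log(rho)))


def path_delay(g: float, b: float, h: float, n: int, parasitic: float) -> float:
    """Minimum normalized delay D = N * F**(1/N) + P, with path effort F = G * B * H."""
    f_path = g * b * h
    return n * f_path ** (1 / n) + parasitic


if __name__ == "__main__":
    # Example: inverter chain (G = 1, B = 1) driving a load 64x its input,
    # with parasitic delay 1 per inverter.
    print("optimal N:", optimal_stages(64))
    for n in range(1, 7):
        print(f"N={n}: D = {path_delay(1, 1, 64, n, float(n)):.2f}")
```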

## 3.4 Measured Results

The prototype ADC was fabricated in a 1.2 V, 0.13 $\mu$m six-metal one-poly digital CMOS process. Chip-on-board packaging was used with two versions of the PCB to measure ADC performance below and above the Nyquist frequency (the latter to investigate the use of sub-sampling). Fig. 3.12 shows the Nyquist testing board, and a die photomicrograph is shown in Fig. 3.11. The total chip measures 1.7 × 1.4 mm<sup>2</sup>, while each ADC occupies only 250 × 240 $\mu \mathrm{m}^2$, which reduces the overhead of time interleaving.



Figure 3.11: Die photomicrograph.

The static performance is characterized through DNL and INL measurements. As shown in Fig. 3.13, DNL and INL improve from over 1 LSB to within half an LSB after the combining-weight calibration described in Section 3.3.4. This corresponds to a 2 dB SNDR improvement, which implies that random errors are not significant at the 6-bit level. The dynamic performance measurements (Fig. 3.14(a)) show that the ENOB of a single ADC scales from 5.3 bits at 300 MS/s to 3.7 bits at 500 MS/s, demonstrating the straightforward tradeoff between ENOB and conversion rate that is inherent to the proposed ADC architecture. In Fig. 3.14(b), the dynamic performance is further explored using RF inputs above Nyquist ranging from 3 to 5 GHz, showing that the SNDR remains above 30 dB even with an input frequency over 4 GHz.



Figure 3.12: PCB for testing Nyquist frequencies.



Figure 3.13: DNL and INL before and after combining weights calibration.

Fig. 3.15 shows the performance of two time-interleaved ADCs achieving a 600 MS/s sampling rate at twice the power and area. Off-chip digital subtraction of each ADC's offset removes spurious tones, improving the SNDR by 0.7 dB. There is little SNDR reduction at lower input frequencies, but as the input frequency increases above 300 MHz, the clock skew between the two paths causes an SNDR reduction of several dB. To confirm this, the clock skew was extracted using a Hilbert transformer and then compensated by digital interpolation. The results (Fig. 3.16) show that the SFDR improves by 11 dB and the SNDR by 2 dB. The total power consumption, excluding the SRAM and I/O pads, is 5.3 mW, with the analog, digital, and clock sections consuming 1.2 mW, 3.2 mW, and 0.9 mW, respectively. The performance of the chip is summarized in Table 3.1.



Figure 3.14: Measured SNDR versus $f_s$ and $f_{in}$ for single ADC (a) below and (b) above Nyquist frequency.



Figure 3.15: (a) Measured SNDR versus $f_s$ and $f_{in}$ for time-interleaved ADC and its (b) FFT spectrum measured at 159 MHz input.



Figure 3.16: FFT spectrum before and after clock skew calibration.
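Offset mismatch between two interleaved channels appears as a tone at f_s/2. The sketch below reproduces this with hypothetical offsets and removes the tone by subtracting each channel's mean, which is one simple way to estimate the offsets off-chip; it works exactly here because each channel sees an integer number of input cycles. The record length, input frequency, and amplitude are arbitrary, and clock-skew correction is not modeled.

```python
import numpy as np

M = 1024                          # record length (samples of the interleaved output)
K_IN = 37                         # input cycles per record (coherent sampling)
n = np.arange(M)
x = 0.9 * np.sin(2 * np.pi * K_IN * n / M)

# Hypothetical offsets of the two sub-ADCs, as a fraction of full scale.
offsets = np.array([0.02, -0.01])
y = x + offsets[n % 2]            # even samples from ADC0 (0 deg), odd from ADC1 (180 deg)


def spur_at_half_fs(signal: np.ndarray) -> float:
    """Normalized magnitude of the FFT bin at f_s/2, where offset mismatch shows up."""
    return float(np.abs(np.fft.rfft(signal))[-1] / len(signal))


# Estimate each channel's offset from its own samples and subtract it digitally.
y_cal = y.copy()
for ch in (0, 1):
    y_cal[ch::2] -= y_cal[ch::2].mean()

print(f"f_s/2 spur: before {spur_at_half_fs(y):.2e}, after {spur_at_half_fs(y_cal):.2e}")
```

The spur magnitude before correction equals half the offset difference, (0.02 + 0.01)/2 = 0.015 of full scale, and it drops to numerical noise after the per-channel subtraction.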

## 3.5 Applicability

#### 3.5.1 Technology Scaling

As a medium-resolution ADC is not limited by kT/C noise, it generally benefits from technology scaling. This also applies to the proposed asynchronous ADC architecture, since most of its circuits are open-loop and digitally operated, limited only by capacitor matching accuracy. A first-order power and speed analysis for technology scaling is carried out assuming constant-field scaling, i.e. transistor dimensions and supply voltage both scale down by 1/S. The conversion time is mainly determined by the signal tracking time, the comparator speed, and the digital propagation delay. The value of the capacitor array is assumed fixed to preserve the matching properties, and the on-resistance of a MOS switch is deliberately scaled down by keeping W fixed. Therefore, the tracking time constant $(R_{on}C_s)$, the comparator bandwidth ($\propto f_T$ of a transistor), and the digital gate delay (CV/I) all scale down by 1/S when velocity saturation is taken into account. From a power perspective, if the overdrive voltage and W/L are assumed fixed, the analog power scales with the supply voltage. The digital switching power $(fCV^2)$ scales down by $1/S^2$ [42], while that of the clock network scales less because of the relatively large sampling switch. Table 3.2 summarizes the scaling trend, which predicts that the figure of merit (FOM) defined in Eq. (3.11) improves by at least a factor of $S^2$.

| Parameter            |         | Value                                                            |
|----------------------|---------|------------------------------------------------------------------|
| Technology           |         | 0.13-$\mu$m 6M1P Digital CMOS                                    |
| Package              |         | Chip on board                                                    |
| Resolution           |         | 6 bit                                                            |
| Sampling Rate        |         | 300-500 MS/s for single ADC (600 MS/s-1 GS/s time-interleaved)   |
| Supply Voltage       |         | 1.2 V                                                            |
| Input 3-dB Bandwidth |         | >4 GHz                                                           |
| Peak SNDR            |         | 34 dB at 600 MS/s                                                |
| FOM                  |         | 0.22 pJ/conversion step                                          |
| Power                | Analog  | 1.2 mW                                                           |
|                      | Digital | 3.2 mW                                                           |
|                      | Clock   | 0.9 mW                                                           |
|                      | Total   | 5.3 mW                                                           |

Table 3.1: Performance Summary (25 $^{\circ}$C)

| Delay   | $T_{track}$               | $T_{comp}$            | $T_{dig}$             |
|---------|---------------------------|-----------------------|-----------------------|
| Scaling | $RC \sim 1/S$             | $1/f_T \sim 1/S$      | $RC \sim 1/S$         |
| Power   | $P_{analog}$              | $P_{clk}$             | $P_{dig}$             |
| Scaling | $\propto V_{DD} \sim 1/S$ | weaker than $1/S^2$   | $fCV^2 \sim 1/S^2$    |

Table 3.2: Technology Scaling on the proposed ADC architecture

$$FOM = \frac{\text{Power}}{2^{ENOB} \cdot f_s} \tag{3.11}$$
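Eq. (3.11) can be applied to the measured numbers in Table 3.1 with the usual conversion ENOB = (SNDR - 1.76)/6.02. The sketch also applies the first-order scaling trends of Table 3.2; the text only says the clock power scales less than the digital power, so the clock term is assumed here to scale as 1/S, which is a labeled assumption rather than a result. The function names are hypothetical.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Standard conversion from SNDR to effective number of bits."""
    return (sndr_db - 1.76) / 6.02


def fom_j_per_step(power_w: float, enob: float, fs_hz: float) -> float:
    """Eq. (3.11): FOM = Power / (2**ENOB * f_s), in joules per conversion step."""
    return power_w / (2 ** enob * fs_hz)


def scaled_fom(s: float, p_analog: float, p_digital: float, p_clock: float,
               enob: float, fs_hz: float) -> float:
    """First-order constant-field scaling by 1/S, following Table 3.2.

    Assumptions: f_s scales up by S; analog power scales by 1/S; digital
    switching power by 1/S**2; clock power (not quantified in the text)
    is assumed to scale by 1/S.
    """
    power = p_analog / s + p_digital / s**2 + p_clock / s
    return fom_j_per_step(power, enob, fs_hz * s)


if __name__ == "__main__":
    enob = enob_from_sndr(34.0)                 # peak SNDR from Table 3.1
    fom = fom_j_per_step(5.3e-3, enob, 600e6)   # total power and rate from Table 3.1
    print(f"ENOB = {enob:.2f} bit, FOM = {fom * 1e12:.3f} pJ/step")
    for s in (1.0, 1.5, 2.0):
        f = scaled_fom(s, 1.2e-3, 3.2e-3, 0.9e-3, enob, 600e6)
        print(f"S = {s}: FOM ~ {f * 1e12:.3f} pJ/step "
              f"(FOM / S^2 = {fom / s**2 * 1e12:.3f})")
```

With 34 dB peak SNDR, 5.3 mW, and 600 MS/s, this gives about 0.22 pJ/conversion step, consistent with Table 3.1, and the scaled values stay at or below FOM/S².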

#### 3.5.2 Future Role of the Proposed ADC Topology

Fig. 3.17 shows the common usage of ADC topologies in terms of resolution and sampling rate. Traditionally, flash-type ADCs, including sub-ranging and folding converters, dominate the high-speed, medium-resolution regime at the cost of redundant power consumption. An SAR converter can save this extra power, but it is normally used for medium to high resolution and limited to speeds from the kHz range up to tens of MHz. Using the asynchronous processing technique presented in this silicon prototype, the SA algorithm implementation has been pushed to speeds of hundreds of MHz. The proposed architecture can easily be extended to higher ADC resolution before becoming limited by kT/C noise. In fact, the FOM improves as the number of ADC bits increases, since the conversion speed scales down only proportionally to the number of bits while the number of quantization levels ($2^{ENOB}$) scales up exponentially. Finally, the greater than 4 GHz input bandwidth makes it possible to time-interleave up to a sampling rate close to 10 GHz to support Nyquist-rate sampling.



Figure 3.17: Future role of the SA architecture.



Figure 3.18: FOM comparisons with recent >10 MHz, 6-8 bit ADCs from ISSCC 00-05.

## 3.6 Conclusion

An asynchronous ADC architecture has been demonstrated that achieves high power efficiency for a high-speed, medium-resolution converter. While the asynchronous processing concept can be incorporated into other ADC topologies, the first prototype has been realized with an SA architecture, achieving an FOM of 0.22 pJ/conversion step in a 0.13-$\mu$m CMOS process. The results are compared with recent 6-8 bit, high-speed (>10 MS/s) ADCs from ISSCC 2000-2005 in Fig. 3.18. The FOM of the proposed architecture is anticipated to improve further with continued technology scaling.

## Chapter 4

# Conclusion and Future Work

This research has proposed both system and circuit solutions to minimize the overall cost of implementing an ultra-wideband radio. Combining different disciplines, including communication theory, signal processing algorithms, and analog/digital IC design, was found to be the key to better solutions. In particular, the silicon implementation of most modern communication systems has reached the point of full system-on-chip (SOC) solutions for lower cost. With this level of integration and tighter power constraints, the boundaries between system and circuit, and between analog and digital, become increasingly blurred.

Looking into the future, the results of this thesis point in several directions. First, the sub-sampling front-end is a promising architecture for any wideband system, not only UWB. In particular, future wireless front-ends are moving toward wider bandwidths to use more unlicensed bands and support multiple standards. For such systems, the performance degradation and anti-aliasing filtering requirements of sub-sampling are greatly reduced because of the lower in-band SNR and under-sampling ratio. From a technology scaling perspective, the increasing intrinsic frequency of transistors will favor the implementation of sub-sampling front-ends, where more circuit blocks operate in the RF band while the overall complexity is reduced. Emerging MEMS filter technology also deserves attention as a promising candidate for providing more attenuation of out-of-band signals that would otherwise alias.

Second, analytic signal processing can be explored further for UWB radio. The example derived in this thesis assumes there is no inter-symbol interference (ISI) between pulses. For higher pulse repetition rates, however, an equalizer incorporating the analytic signal transformation can be used to remove the interference with minimal computational overhead. Additionally, analytic signal processing can be extended to tracking applications using the impulse radio approach, since the trajectory of an analytic matched filter serves as the signature of a particular pulse waveform.

Finally, wideband radios have driven the demand for high-speed ADCs with very low power consumption for portable device applications. The high cost of sampling has long been the bottleneck of such systems, which often necessitates more analog processing. Recently, a new wave of work has pushed the power efficiency of ADCs below 1 pJ/conversion step. The greatly reduced sampling cost will enable new system architectures in which more of the signal processing can be implemented digitally. The asynchronous ADC architecture has demonstrated superior power efficiency and has great potential to scale to higher sampling speeds, higher resolution, and future technologies. The asynchronous concept and circuit techniques introduced in this research can be crucial for pushing the performance of future high-speed, medium-resolution ADCs toward a power efficiency below 0.1 pJ/conversion step.

# Bibliography

- [1] "Revision of part 15 of the commission's rules regarding ultra-wideband transmission systems," *Federal Communications Commission*, First Report and Order, Feb. 2002.
- [2] S. Roy, J. R. Foerster, V. S. Somayazulu, and D. G. Leeper, "Ultra-wideband radio design: The promise of high-speed, short-range wireless connectivity," *Proceedings of the IEEE*, vol. 92, no. 2, pp. 295–311, Feb. 2004.
- [3] G. R. Aiello and G. D. Rogerson, "Ultra-wideband wireless systems," *IEEE Microwave Magazine*, vol. 4, no. 2, pp. 36–47, June 2003.
- [4] J. Bergervoet et al., "An interference robust receive chain for UWB radio in SiGe BiCMOS," *IEEE International Solid-State Circuits Conference (ISSCC)* Dig. Tech. Papers, vol. 48, pp. 200–201, Feb. 2005.
- [5] A. Ismail and A. Abidi, "A 3.1 to 8.2GHz direct conversion receiver for MB-OFDM UWB communications," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 48, pp. 208–209, Feb. 2005.
- [6] B. Razavi et al., "A 0.13μm CMOS UWB transceiver," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 48, pp. 216–217, Feb. 2005.
- [7] A. Tanaka et al., "A 1.1V 3.1-to-9.5GHz MB-OFDM UWB transceiver in 90nm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 49, pp. 120–121, Feb. 2006.
- [8] C. Sandner et al., "A WiMedia/MBOA-compliant CMOS RF transceiver for UWB," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 49, pp. 122–123, Feb. 2006.
- [9] WiMedia Alliance. [Online]. Available: http://wimedia.org/
- [10] R. A. Scholtz, "Multiple access with time-hopping impulse modulation," *Proc. MILCOM*, vol. 2, pp. 447–450, Oct. 1993.
- [11] I. O'Donnell, M. Chen, S. Wang, and R. Brodersen, "An integrated, low power, ultra-wideband transceiver architecture for low-rate, indoor wireless systems," *IEEE CAS Workshop on Wireless Communications and Networking*, Sept. 2002.
- [12] M. S. W. Chen and R. W. Brodersen, "A subsampling UWB radio architecture by analytic signalling," *Proc. ICASSP*, vol. 4, pp. 533–536, May 2004.
- [13] M. S. W. Chen and R. W. Brodersen, "A 6b 600MS/s 5.3mW asynchronous ADC in 0.13μm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 49, pp. 574–575, Feb. 2006.
- [14] S. Iida et al., "A 3.1 to 5GHz CMOS DSSS UWB transceiver for WPANs," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 48, pp. 214–215, Feb. 2005.
- [15] A. Aggarwal et al., "A low power implementation for the transmit path of a UWB transceiver," *Proc. IEEE Custom Integrated Circuits Conference (CICC)*, Sep. 2005.
- [16] J. G. Proakis, *Digital Communications*. New York: McGraw Hill, 1995.
- [17] R. G. Vaughan, N. L. Scott, and D. R. White, "The theory of bandpass sampling," *IEEE Transactions on Signal Processing*, vol. 39, no. 9, pp. 1973–1984, Sept. 1991.
- [18] M. Shinagawa, Y. Akazawa, and T. Wakimoto, "Jitter analysis of high-speed sampling systems," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 220–224, Feb. 1990.
- [19] M. S. W. Chen and R. W. Brodersen, "The impact of a wideband channel on UWB system design," *Proc. MILCOM*, vol. 1, pp. 163–168, Nov. 2004.
- [20] A. Oppenheim and R. Schafer, *Discrete-Time Signal Processing*, 2nd ed. Englewood Cliffs, NJ: Prentice Hall, 1998.
- [21] H. Schantz, "Bottom fed planar elliptical UWB antennas," *IEEE Conference on Ultra Wideband Systems and Technologies*, pp. 219–223, Nov. 2003.
- [22] S. Ghassemzadeh, R. Jana, C. Rice, W. Turin, and V. Tarokh, "Measurement and modeling of an ultra-wide bandwidth indoor channel," *IEEE Trans. on Communications*, vol. 52, no. 10, pp. 1786–1796, Oct. 2004.
- [23] D. Cassioli, M. Win, and A. Molisch, "The ultra-wide bandwidth indoor channel: From statistical model to simulations," *IEEE Journal on Selected Areas in Communications*, vol. 20, no. 6, pp. 1247–1257, Aug. 2002.
- [24] M. S. W. Chen, "Ultra wideband baseband design and implementation," Master's thesis, University of California, Berkeley, 2002.
- [25] C. Chang, K. Kuusilinna, B. Richards, A. Chen, N. Chan, and R. Brodersen, "Rapid design and analysis of communication systems using the BEE hardware emulation environment," *Proc. IEEE Rapid System Prototyping Workshop*, pp. 148–154, June 2003.
- [26] C. Shi and R. Brodersen, "Automated fixed-point data-type optimization tool for signal processing and communication," *Proc. Design Automation Conference*, pp. 478–483, June 2004.
- [27] G. Chien and P. R. Gray, "A 900-MHz local oscillator using a DLL-based frequency multiplier technique for PCS applications," *IEEE Journal of Solid-State Circuits*, vol. 35, no. 12, pp. 1996–1999, Dec. 2000.
- [28] P. Scholtens and M. Vertregt, "A 6b 1.6 Gsample/s flash ADC in 0.18μm CMOS using averaging termination," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 1, pp. 168–457, Feb. 2002.
- [29] X. Jiang, Z. Wang, and F. Chang, "A 2GS/s 6b ADC in 0.18μm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 1, pp. 322–497, Feb. 2003.
- [30] R. Taft et al., "A 1.8V 1.6GS/s 8b self-calibrating folding ADC with 7.26 ENOB at Nyquist frequency," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 1, pp. 252–256, Feb. 2004.
- [31] P. Figueiredo et al., "A 90nm CMOS 1.2V 6b 1GS/s two-step subranging ADC," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 49, pp. 568–569, Feb. 2006.
- [32] G. Van der Plas, S. Decoutere, and S. Donnay, "A 0.16pJ/conversion-step 2.5mW 1.25GS/s 4b ADC in a 90nm digital CMOS process," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 49, pp. 566–567, Feb. 2006.
- [33] D. Draxelmayr, "A 6b 600MHz 10mW ADC array in digital 90nm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 1, pp. 264–527, Feb. 2004.
- [34] B. Murmann and B. Boser, "A 12-bit 75-MS/s pipelined ADC using open-loop residue amplification," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2040–2050, Dec. 2003.
- [35] H. J. M. Veendrick, "The behavior of flip-flops used as synchronizers and prediction of their failure rate," *IEEE J. Solid-State Circuits*, vol. SC-15, pp. 169–176, April 1980.
- [36] W. Black and D. Hodges, "Time interleaved converter arrays," *IEEE J. Solid-State Circuits*, vol. 15, no. 6, pp. 1022–1029, Dec. 1980.
- [37] K. Bult and A. Buchwald, "An embedded 240-mW 10-b 50 MS/s CMOS ADC in 1-mm<sup>2</sup>," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1887–1895, Dec. 1997.
- [38] F. Kuttner, "A 1.2V 10b 20MSample/s non-binary successive approximation ADC in 0.13μm CMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Dig. Tech. Papers*, vol. 1, pp. 176–177, Feb. 2004.
- [39] A. Hastings, *The Art of Analog Layout*. New Jersey: Prentice Hall, 2001.
- [40] A. N. Karanicolas, H. Lee, and K. L. Bacrania, "A 15-b 1-Msample/s digitally self-calibrated pipeline ADC," *IEEE J. Solid-State Circuits*, vol. 28, pp. 1207–1215, Dec. 1993.
- [41] I. Sutherland, R. Sproull, and D. Harris, *Logical Effort: Designing Fast CMOS Circuits*. San Francisco, CA: Morgan Kaufmann, 1999.
- [42] J. M. Rabaey et al., *Digital Integrated Circuits: A Design Perspective*. Upper Saddle River, New Jersey: Pearson Education, 2003.