## Low Noise Transimpedance Amplifier Design Using Berkeley Analog Generator



Eric Jan

## Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2020-146 http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-146.html

August 13, 2020

Copyright © 2020, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I would like to thank Professor Stojanovic, Krishna and Nandish for their mentorship through my college career. I have been very lucky to have an opportunity to work on such cutting edge technology and to work with such brilliant people. Thanks Sidney and to the rest of the group for all the help. Thanks to my friends for making it a great time. And finally thanks to my parents for always being there to support me.

Low Noise Transimpedance Amplifier Design Using Berkeley Analog Generator

by

Eric Jan

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Master of Science

in

Electrical Engineering and Computer Science

in the

Graduate Division

of the

University of California, Berkeley

Committee in charge:

Professor Vladimir Stojanovic, Chair

Professor Elad Alon

Summer 2020

#### Low Noise Transimpedance Amplifier Design Using Berkeley Analog Generator

by Eric Jan

#### **Research Project**

Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of **Master of Science, Plan II**.

Approval for the Report and Comprehensive Examination:

**Committee:** 

Professor Vladimir Stojanovic Research Advisor

20

(Date)

\* \* \* \* \* \* \*


Professor Elad Alon Second Reader

8-12-2020

(Date)

Low Noise Transimpedance Amplifier Design Using Berkeley Analog Generator

Copyright 2020 by Eric Jan

#### Abstract

#### Low Noise Transimpedance Amplifier Design Using Berkeley Analog Generator

by

#### Eric Jan

#### Master of Science in Electrical Engineering and Computer Science

University of California, Berkeley

Professor Vladimir Stojanovic, Chair

Modern applications for high speed optical links demand low noise as a fundamental constraint. Much prior work exists on low noise optimization, with various techniques and architectures proposed, but few are generalizable across processes or comprehensive enough for other designers to use. This work investigates fundamental techniques at both the architectural level, through equalization and system-wide co-optimization, and the component level, through sizing techniques implemented in an automated design script that also covers layout and schematic generation. The result is a true push-button flow interfacing with Berkeley Analog Generator that takes in a generic set of desired system specifications and produces a corresponding layout and schematic of a satisfactory system. The proposed methodology, techniques, design scripts and resultant push-button flow are validated across multiple design points with various photodiode models by taping out three macro designs in a 45nm SOI process.

# Contents

| Section | Title                                  |
|---------|----------------------------------------|
|         | List of Figures                        |
| 1       | Motivations                            |
| 1.1     | Reuse                                  |
| 1.2     | More Optimal Design                    |
| 2       | Transimpedance Amplifier Analysis      |
| 2.1     | Accurate Input Referred Noise Analysis |
| 2.2     | TIA Topology Selection                 |
| 2.3     | TIA Noise Analysis                     |
| 2.4     | Further Architectural Optimizations    |
| 3       | Generator Design                       |
| 3.1     | Inverter TIA Design                    |
| 3.2     | Layout-Aware Inverter TIA Optimization |
| 3.3     | System Wide Optimization               |
| 3.4     | Differential Pre-Amplifier Design      |
| 3.5     | CTLE Design                            |
| 3.6     | Output Stage Design                    |
| 3.7     | Macro and Top Level Assembly           |
| 4       | Simulation and Design Results          |
| 4.1     | Photodiode Model                       |
| 4.2     | Optimization and Design Specifications |
| 4.3     | Optimized Designs                      |
| 4.4     | CTLE Performance                       |
| 4.5     | Noise Performance                      |
| 4.6     | Offset Correction                      |
| 4.7     | Power Consumption                      |
| 5       | Packaging and Testing Setup            |
| 5.1     | Wire-Bonding                           |
| 5.2     | Testing Setup and PCB Design           |
| 5.3     | Intended Tests                         |
| 6       | Conclusion and Future Work             |
|         | Bibliography                           |

# List of Figures

| Figure | Caption |
|--------|---------|
| 2.1    | Brick-wall filter approximation illustrated graphically [2] |
| 2.2    | Table for the ratio of $\frac{\omega_{ENBW}}{\omega_p}$ for applying the brick-wall filter approximation to a filter of the given order |
| 2.3    | Common TIA Architectures |
| 2.4    | Observations on the resistor noise for the resistor feedback TIA |
| 2.5    | Testbench setup for simulating the amplifier noise contribution. In this case, a basic single stage inverter TIA is used |
| 2.6    | Observations on the amplifier noise for the resistor feedback TIA |
| 2.7    | Schematic of the inverter TIA with three inverters and an internal feedback resistor |
| 2.8    | Schematic of Inverter TIA with Inductive Peaking |
| 2.9    | Effect of inductive peaking on the inverter TIA frequency response |
| 2.10   | Effect of inductive peaking on the inverter TIA noise power spectral density |
| 2.11   | CTLE Architectures |
| 2.12   | Evaluating the CTLE Frequency Response |
| 2.13   | Simulation setup for seeing improvement in noise due to lower bandwidth and consequent equalization |
| 2.14   | Effect of noise optimization with CTLE on resistor and transistor noise |
| 2.15   | Proposed architecture for the full optimized system |
| 3.1    | TIA Layouts |
| 3.2    | Block Diagram of the TIA Design Loop |
| 3.3    | Layout of the Pre-amplifier |
| 3.4    | Schematic of Pre-Amplifier with Offset Correction |
| 3.5    | Schematic of Pre-Amplifier with Offset Correction |
| 3.6    | CTLE Layout |
| 3.7    | Output Stage Layout |
| 3.8    | Current Distribution Schematic |
| 3.9    | Annotated Overall Macro Layout with Power Grid |
| 3.10   | Annotated Overall Layout |
| 4.1    | Photodiode and Packaging Model Specifications |
| 4.2    | Targeted Design Specifications |
| 4.3    | Table Summarizing Overall TIA Performance |
| 4.4    | Table Summarizing Sub-block Performance |
| 4.5    | PD1 10 Gbps Design Frequency Domain Response |
| 4.6    | PD2 10 Gbps Design Frequency Domain Response |
| 4.7    | PD2 25 Gbps Design Frequency Domain Response |
| 4.8    | Overall system output eye diagrams for the various design specifications all operating at maximum swing |
| 4.9    | Seeing the effect of the CTLE stages on the eye opening. Design based on PD1 |
| 4.10   | Seeing the effect of the CTLE stages on the eye opening. Design based on PD1 |
| 4.11   | Seeing the effect of the two CTLE stages on the eye opening. Design based on PD2 |
| 4.12   | Noise PSD for PD1 10 Gbps design at various stages input referred according to the respective midband gain ($f_{ENBW} = 6.5 \times \frac{\pi}{2}$ GHz is marked) |
| 4.13   | Noise PSD for the three macro designs ($f_{ENBW,10Gbps} = 6.5 \times \frac{\pi}{2}$ GHz and $f_{ENBW,25Gbps} = 15.2 \times \frac{\pi}{2}$ GHz marked) |
| 4.14   | Noise Breakdown for PD1 10 Gbps Design |
| 4.15   | Noise Breakdown for PD2 10 Gbps Design |
| 4.16   | Noise Breakdown for PD2 25 Gbps Design |
| 4.17   | Eye Diagrams After Adjusting Offset |
| 4.18   | Power breakdown of the different designs comparing how each block contributes to the total power consumption |
| 5.1    | TIA chip diagrams |
| 5.2    | Photodiode chip diagram |
| 5.3    | Packaging cross section with the chips placed on the PCB (not drawn to scale) |
| 5.4    | Wire-bonding Diagrams for Testing the Various Macros |
| 5.5    | PCB Diagram |
| 5.6    | PCB Layer Stack |
| 5.7    | Table detailing difference of each bonding site |
| 5.8    | Zooming in to the bonding sites to see the hole placed under the photodiode chip |
| 5.9    | Sample statistical eye diagram with standard deviation of 0 and 1 levels marked |
| 5.10   | Sample statistical eye diagram with standard deviation of eye height approximated by a Gaussian |


# Chapter 1 Motivations

Optical links and sensors are now common in a number of applications, from data communication and biomedical sensing to cryogenic supercomputing and high performance computing chiplets. To accommodate these applications, there is a demand for higher performance in terms of both bandwidth and power: circuits must consume little power while operating at much higher speeds. For these circuits to be as sensitive as possible, very low noise is necessary. Both architectural and system-level optimizations as well as lower level circuit design techniques are necessary to meet this goal.

Additionally, at modern process nodes, there is great disparity between schematic-level simulations and post-layout extracted simulations. The schematic-level models break down, which complicates the design process. As a result, designs require more adjustments and a greater number of simulations to find the optimum; circuit design becomes more laborious and difficult. This effect is only exacerbated by more stringent specifications, which push process nodes to their limits and necessitate more optimal designs. Solutions take the form of more intelligent optimization through reuse and automation while factoring in improved design choices and techniques.

## 1.1 Reuse

In designing mixed-signal analog circuits, many structures are repeated and used as fundamental building blocks. Optical receivers in particular necessarily contain a transimpedance amplifier (TIA), perhaps followed by voltage gain amplifiers. If a digital output is required, there will be a sense amplifier; if an analog output is required, there must be an output stage to drive the voltage off chip. Given these common building blocks, two links with drastically different specifications will still share blocks, differing in parameters or sizing (or in small tweaks to the circuit) more than in any dramatic architectural change. Large amounts of reuse can thus be leveraged to achieve shorter design times. Layout and schematic generators have already been developed in academia and industry alike [1]. Layout and schematic generation scripts are promising and with some effort can be ported across process nodes.

Along the same line of logic, when a design is limited by one factor (noise-limited design, power-limited design, etc.), the reasoning and process of design may be rather similar across different data rates, sensitivity requirements, target applications, etc. As a result, if some characterization of the technology is provided, these circuit generators can be used in a design script that, given a set of desired performance specifications, intelligently chooses the parameters of a circuit and simulates a large number of relevant designs in the design space (which is necessary regardless of the chosen method of design). How flexible the design script is, in terms of how it performs optimizations and how it picks component parameters according to the simulations, depends entirely on the designer and how the designer maps the logic of the design process to the script. This is very similar to the current design process, except that with frameworks such as BAG it can be highly automated and more precise, covering a larger design space with finer granularity.

## 1.2 More Optimal Design

Given that this proposed process of design is rather similar to what exists now, why is it necessary? As mentioned previously, there is a constant desire to improve the design process so that it is more powerful and can search for a more optimal design according to what the designer wants. The current process is heavily restricted by how much time and effort the designer is willing to put in, as adjustments must be made by hand to both schematics and layouts. Additionally, specifications for blocks are constantly in flux, perhaps due to the foundry updating the PDK, changing assumptions about interactions with blocks designed by other people, or changes in the overall target specifications. Each such change triggers a large amount of redundant redesign. A designer is limited in the number of layouts they can simulate, and therefore in how thorough they can be. A script that intelligently generates and simulates laid-out designs can simulate significantly more designs in the same design space, and can thus be much more thorough and cover a much broader design space. These improvements are now very possible.

This work considers one such example: the design of an analog optical receiver front-end optimized for the lowest noise possible given a desired data rate, with specific focus on the design of a low noise TIA. Different TIA and receiver architectures are considered, along with the techniques that can be applied to achieve lower noise. The implementation and design of each of these circuit architectures and techniques are discussed throughout. Upon noting trends and determining generally how a design choice (such as picking a specific architecture or varying a circuit parameter in one way) impacts the resulting noise, a design script is constructed that simulates relevant designs in the corresponding space to find an optimal design.

# Chapter 2 Transimpedance Amplifier Analysis

The problem involves designing an analog front-end offering the lowest possible noise (within reasonable power and gain constraints) given an arbitrary photodiode model and necessary data rate. The input to the Analog Front End (AFE) is a current and the output is a voltage, motivating the use of a transimpedance amplifier stage (TIA) at the outset. This section follows the analysis of the transimpedance amplifier in order to optimize the impedance, bandwidth and noise. Metrics for noise analysis will be covered as will additional architectural level optimizations and the co-design with additional blocks.

In analyzing noise, the input referred noise is a good metric for noise because it describes how much noise is effectively applied from the input of a noiseless amplifier, imposing systemwide limitations and necessary input signal specifications. For this reason, the input referred noise will be the primary consideration in noise analysis.

## 2.1 Accurate Input Referred Noise Analysis

As noise is the focus of this work, it is important to analyze noise properly according to the correct metrics. The goal is to see how noise impacts the system and how the whole system should be designed to minimize these effects. Initially, the power spectral density (PSD) is useful to see how the output and input spot noise vary across frequency, and it can help determine what steps to take or which frequencies are of particular interest when decreasing noise. It makes it easy to see which techniques do and do not achieve lower noise. However, a single value of the noise PSD, or a set of several PSDs, is still difficult to compare across the given frequencies. Thus the integrated noise is useful. The integrated noise describes how much noise the system is expected to see during real-time operation of the circuits, referred either to the output or to the input. To refer it to the input, it may seem natural to divide the output noise by the transfer function at each frequency. However, from an input-referred noise PSD of this kind (as is customary for input-referred noise plots), such as the one in Figure 2.4b, it would seem that integrating the noise gives a result that increases without bound, that is, infinite input-referred integrated noise. No system could work under this logic, but this is not a phenomenon observed in reality; it is instead characteristic of an improper calculation of noise.

Certainly the input-referred noise, given directly as $\frac{Noise_{output}(z)}{H(z)}$, is important to consider, as is the integrated noise, but this method of calculation is not the most accurate. To provide a more accurate approximation for the noise, consider the brick-wall approximation made by calculating the equivalent noise bandwidth. If the transfer function can be modeled by some number of poles, there is a brick-wall equivalent cutoff frequency (the noise bandwidth) such that integrating the noise shaped by the system of poles gives the same result as integrating with the brick-wall filter. The approximation is illustrated in Figure 2.1, with the equivalent noise bandwidth for filters of varying order given in the table of Figure 2.2. In this work, the frequency response and the output noise response are rarely shaped as cleanly as an ideal single-pole filter. Once the total integrated output noise of the system has been calculated, it is referred back to the input according to the midband transfer function, treating the noise of the system as if shaped by a sharp brick-wall filter. For the rest of this work, input-referred integrated noise results are calculated in this manner. If anything, this gives a safe, slight over-estimate of the noise.



| Filter Order | Ratio: $\frac{\omega_{ENBW}}{\omega_p}$                  |
|--------------|----------------------------------------------------------|
| 1            | $\frac{\pi}{2} \approx 1.57$                             |
| 2            | $\frac{\pi}{4} \frac{1}{\sqrt{\sqrt{2}-1}} \approx 1.22$ |
| 3            | 1.15                                                     |

Figure 2.2: Table for the ratio of $\frac{\omega_{ENBW}}{\omega_p}$ for applying the brick-wall filter approximation to a filter of the given order.

Figure 2.1: Brick-wall filter approximation illustrated graphically [2]. The frequency axis marks the pole frequency $f_p$ and the brick-wall cutoff frequency $f_{BF}$.
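The ratios in the table of Figure 2.2 can be reproduced with a short calculation. The sketch below is illustrative only and assumes that $\omega_p$ denotes the -3 dB bandwidth of a cascade of identical real poles, an interpretation that matches the closed form given for the second-order entry. The function names are hypothetical and not part of the design scripts described in this work.

```python
import math

def noise_integral_over_pole(order: int) -> float:
    """Closed form of the integral of (1 + x^2)^(-order) from 0 to infinity.

    This is ENBW / w0 for `order` identical real poles located at w0.
    """
    return 0.5 * math.sqrt(math.pi) * math.gamma(order - 0.5) / math.gamma(order)

def bandwidth_over_pole(order: int) -> float:
    """-3 dB bandwidth / w0 for a cascade of `order` identical real poles."""
    return math.sqrt(2.0 ** (1.0 / order) - 1.0)

def enbw_ratio(order: int) -> float:
    """Ratio of equivalent noise bandwidth to -3 dB bandwidth (assumed w_p)."""
    return noise_integral_over_pole(order) / bandwidth_over_pole(order)

def enbw_ratio_numeric(order: int, steps: int = 20000) -> float:
    """Numerical cross-check using x = tan(theta) and the midpoint rule."""
    h = (math.pi / 2.0) / steps
    total = sum(math.cos((k + 0.5) * h) ** (2 * order - 2) for k in range(steps))
    return total * h / bandwidth_over_pole(order)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, enbw_ratio(n), enbw_ratio_numeric(n))
```

Under this assumption the first-order ratio is exactly $\frac{\pi}{2}$, and the second- and third-order values agree with the tabulated entries to within rounding.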

## 2.2 TIA Topology Selection

For the given problem, it is necessary to determine a topology that offers very good noise performance while supporting the necessary bandwidth and data rates. The main topologies considered for these applications are the common gate TIA and the resistive feedback TIA, shown in Figure 2.3a and Figure 2.3b, respectively. The former is a simple common gate amplifier stage whose output can be cascaded with voltage gain amplifiers. This topology is advantageous in that the design of its bandwidth and transimpedance are decoupled. As Equation 2.2 shows, the low-frequency transimpedance of this stage is set by the drain resistance $R_D$, while the input resistance is roughly the inverse of the input transistor's transconductance. For small input capacitance, the bandwidth is set by the output pole, but these applications must take into account a generic range of photodiodes. If the pole is set by the input capacitance, it too trades off directly with the transimpedance.



Figure 2.3: Common TIA Architectures. (a) Common Gate TIA Schematic (b) Resistor Feedback TIA Schematic



$$\frac{V_{out}}{I_{in}} = \frac{R_D}{(1+j\omega C_{out}R_D)(1+j\omega\frac{C_{in}}{gm})} \tag{2.2}$$

$$\overline{i_{in,res}^2} = \frac{4k_BT}{R_D} A^2 \left(\frac{1+j\omega\frac{C_{in}}{gm}}{1+j\omega\frac{C_{out}}{gds}}\right)^2 \tag{2.3}$$

$$\overline{i_{in,fet}^2} = 4k_B T \gamma \, gm \left(\frac{1}{gds \, R_D} \frac{j\omega C_{in} R_D}{1 + j\omega \frac{C_{out}}{gds}}\right)^2 \tag{2.4}$$

In terms of noise performance, the CG-TIA performs poorly, as confirmed in the literature [3]. Both the drain resistance and the input transistor contribute thermal noise. Reducing the noise contributed by the drain resistance in Equation 2.3 requires sizing up the resistance, which is limited by voltage headroom. The noise from the transistor is given by Equation 2.4. Increasing the size of the transistor to decrease the channel thermal noise, which is tied to the transistor's small-signal transconductance, also increases the input capacitance, which is undesirable. The regulated cascode is a variant with active feedback, but it does not solve the noise problems of the underlying architecture. The amplifier in feedback effectively scales up the transconductance of the input transistor. In doing so, it decreases the input impedance of the stage and allows for very high bandwidth, which leaves more room to design the cascaded amplifiers.
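As a rough numerical illustration of Equation 2.2, the sketch below evaluates the two poles of the common gate TIA and shows how the input pole moves with the photodiode capacitance. The component values are hypothetical placeholders chosen only for illustration, and the function names are not taken from the design scripts of this work.

```python
import math

def cg_tia_transimpedance(freq_hz, r_d, gm, c_in, c_out):
    """Evaluate Equation 2.2 at one frequency; returns a complex value in ohms."""
    w = 2.0 * math.pi * freq_hz
    return r_d / ((1 + 1j * w * c_out * r_d) * (1 + 1j * w * c_in / gm))

def cg_tia_pole_frequencies(r_d, gm, c_in, c_out):
    """Return (input pole, output pole) in Hz from the two factors of Equation 2.2."""
    f_in = gm / (2.0 * math.pi * c_in)
    f_out = 1.0 / (2.0 * math.pi * r_d * c_out)
    return f_in, f_out

if __name__ == "__main__":
    # Hypothetical values: 1 kOhm drain resistor, 10 mS input device, 10 fF output load.
    r_d, gm, c_out = 1e3, 10e-3, 10e-15
    for c_in in (50e-15, 200e-15):  # two assumed photodiode capacitances
        f_in, f_out = cg_tia_pole_frequencies(r_d, gm, c_in, c_out)
        print(f"C_in = {c_in:.0e} F: input pole {f_in:.3e} Hz, output pole {f_out:.3e} Hz")
```

The output pole is independent of the photodiode, while the input pole scales inversely with the input capacitance, which is why a generic range of photodiodes has to be considered.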

The resistive feedback TIA is composed of an amplifier with a resistor feeding back the output to the input. Modeling the amplifier as a transconductance in parallel with an output conductance, the transimpedance of the TIA is given by Equation 2.5. The approximation is made assuming  $R_{fb}gm \gg 1$ ,  $R_{fb}gds \gg 1$  and gm is sufficiently larger than gds (intrinsic gain A is sufficiently large). For infinite amplifier output impedance, the transimpedance is set solely by the inverse of the transconductance; however, with shrinking channel lengths, it may be more and more difficult to achieve these large output impedance values. For finite amplifier output impedance, the transimpedance is set largely by the value of the feedback resistance itself.

$$\frac{V_{out}}{I_{in}} = \frac{R_{fb} \ gm - 1}{gds + gm} \frac{1}{1 + j\omega \frac{C_{pd}(R_{fb} \ gds + 1)}{gds + gm}} \approx \frac{R_{fb}}{1 + j\omega \frac{C_{pd}R_{fb}}{A}}$$
(2.5)
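
As a quick numerical check of the approximation in Equation 2.5, the following minimal sketch compares the exact and approximate midband transimpedance and pole frequency. The component values are illustrative assumptions chosen so that $R_{fb}gm \gg 1$ and $R_{fb}gds \gg 1$; they are not taken from this work, and the function names are hypothetical.

```python
import math


def tia_exact(r_fb, gm, gds, c_pd):
    """Exact midband transimpedance (ohm) and pole frequency (Hz), Equation 2.5."""
    z_mid = (r_fb * gm - 1.0) / (gds + gm)
    tau = c_pd * (r_fb * gds + 1.0) / (gds + gm)
    return z_mid, 1.0 / (2.0 * math.pi * tau)


def tia_approx(r_fb, gm, gds, c_pd):
    """Approximation: transimpedance ~ R_fb, pole at A / (C_pd * R_fb)."""
    gain = gm / gds  # intrinsic gain A
    return r_fb, gain / (2.0 * math.pi * c_pd * r_fb)


# Illustrative values only (assumed, not taken from the report).
R_FB, GM, GDS, C_PD = 10e3, 100e-3, 2e-3, 100e-15

z_exact, f_exact = tia_exact(R_FB, GM, GDS, C_PD)
z_approx, f_approx = tia_approx(R_FB, GM, GDS, C_PD)
print(f"transimpedance: exact {z_exact:.0f} ohm, approx {z_approx:.0f} ohm")
print(f"pole frequency: exact {f_exact / 1e9:.2f} GHz, approx {f_approx / 1e9:.2f} GHz")
```

With these assumed values the two forms agree to within a few percent; the agreement degrades as the intrinsic gain or $R_{fb}gds$ shrinks.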

#### 2.3 TIA Noise Analysis

In analyzing the TIA noise, consider only the noise that is contributed from the TIA itself. The noise current from the photodiode and the noise from the supply are not considered at this moment. There are two main noise sources: the resistor thermal noise and the amplifier noise. Consider first the resistor thermal noise output referred in Equation 2.6 and input referred in Equation 2.7. Note that this is the power spectral density of the noise.

$$\overline{i_{out,res}^2} = 4k_B T R_{fb} \frac{(1+j\omega \frac{C_{pd}}{gm})^2}{(1+j\omega \frac{C_{pd}R_{fb}}{A})^2} \frac{1}{(1+j\omega \frac{C_{load}}{gds})^2}$$
(2.6)

$$\overline{i_{in,res}^2} = \frac{4k_BT}{R_{fb}}(1+j\omega\frac{C_{pd}}{gm})^2$$
(2.7)

The resistor noise is inversely proportional to the size of the feedback resistance. Decreasing the input referred noise thus requires increasing the transimpedance, effectively requiring a large transconductance and a large feedback resistance. It would therefore seem that maximizing the feedback resistance solves all the problems with noise, but note that the bandwidth is also inversely proportional to the feedback resistance, assuming the input pole is dominant.



(a) Testbench setup for simulating a resistor feedback TIA with an ideal noiseless infinite bandwidth amplifier using CPD=250fF, GM=100mS, GDS=10mS, A=GM/GDS=10

(b) Sweeping the size of the resistor and looking at the power spectral density contributed by the resistor thermal noise.

(c) Sweeping the size of the resistor and looking at the frequency response.

(d) Sweeping the ideal amplifier gain by increasing the transconductance.

Figure 2.4: Observations on the resistor noise for the resistor feedback TIA

It is clear that a large feedback resistance is necessary to minimize the noise contributed by the resistor according to Figure 2.4b. These numbers are referred to the input by simply dividing the output referred noise according to the frequency response. However, looking at the frequency response in Figure 2.4c, it is also clear that the amplifier will soon face bandwidth limitations. To deal with this, consider sweeping the intrinsic gain A = gm/gds in Figure 2.4d. In Equation 2.5, which gives the transfer function for this TIA architecture, observe that the dominant pole can be given roughly as  $\omega_p = \frac{A}{C_{pd}R_{fb}}$ . Therefore, increasing the intrinsic gain of the amplifier should directly increase the bandwidth. As a result, it is possible to increase the feedback resistor as is necessary to reduce its noise and to compensate for the drop in bandwidth by simply increasing the amplifier gain.
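
The tradeoff described above can be illustrated with a short sketch: scaling the feedback resistance and the intrinsic gain by the same factor leaves the dominant pole $\frac{A}{C_{pd}R_{fb}}$ unchanged while reducing the low-frequency resistor noise PSD of Equation 2.7. The photodiode capacitance and starting gain follow the testbench of Figure 2.4a; the starting resistance, the temperature, and the scaling factor of four are assumptions made only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
TEMP = 300.0        # assumed temperature, K


def resistor_noise_psd(r_fb):
    """Low-frequency input referred resistor noise PSD in A^2/Hz (Equation 2.7)."""
    return 4.0 * K_B * TEMP / r_fb


def input_pole_hz(gain, c_pd, r_fb):
    """Dominant pole A / (C_pd * R_fb) from Equation 2.5, expressed in Hz."""
    return gain / (2.0 * math.pi * c_pd * r_fb)


C_PD = 250e-15                            # from the Figure 2.4a testbench
base = {"gain": 10.0, "r_fb": 1e3}        # r_fb is an assumed starting point
scaled = {"gain": 40.0, "r_fb": 4e3}      # gain and resistance both scaled by 4

bw_base = input_pole_hz(base["gain"], C_PD, base["r_fb"])
bw_scaled = input_pole_hz(scaled["gain"], C_PD, scaled["r_fb"])
noise_ratio = resistor_noise_psd(scaled["r_fb"]) / resistor_noise_psd(base["r_fb"])

print(f"bandwidth: {bw_base / 1e9:.2f} GHz -> {bw_scaled / 1e9:.2f} GHz")
print(f"resistor noise PSD ratio (scaled / base): {noise_ratio:.2f}")
```

This ignores the extra input capacitance that a higher-gain amplifier would add, which is the limitation discussed in the amplifier noise analysis.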

These address the issues for the noise contributed from the resistor. The other component of the noise is the amplifier noise likely contributed largely by the channel thermal noise of the transistors. When the output noise current of the amplifier is referred back to the input, the noise transfer function can be given output referred according to Equation 2.8 and input referred according to Equation 2.9.

$$\overline{i_{out,amp}^2} = \overline{i_{amp}^2} \frac{1}{gm^2} \frac{(1+j\omega C_{pd}R_{fb})^2}{(1+j\omega \frac{C_{pd}R_{fb}}{A})^2} \frac{1}{(1+j\omega \frac{C_{load}}{gds})^2}$$
(2.8)

$$\overline{i_{in,amp}^2} = \overline{i_{amp}^2} \frac{1}{gm^2 R_{fb}^2} (1 + j\omega C_{pd} R_{fb})^2 = 4k_B T \gamma \frac{1}{gm R_{fb}^2} (1 + j\omega C_{pd} R_{fb})^2$$
(2.9)

Increasing the feedback resistance decreases the contribution of the amplifier noise because it increases the overall transimpedance, which is significant when the noise is referred back to the input. From these equations, it is clear that large amplifier transconductance is desirable: it not only directly decreases the noise contributed by the amplifier, but the greater amplifier gain also indirectly increases the bandwidth of the TIA, allowing for selection of larger feedback resistances. Note that this must be done in such a way that does not increase the input capacitance too much. In consideration of the effect of noise on the system, the integrated noise must be calculated, which represents how much total noise will be observed. Looking at the input referred noise, the input capacitance has an effect on the location of the zero, directly affecting the integrated noise and motivating careful attention to the contributed input capacitance. In terms of optimization, it makes sense then that increasing the transistor transconductance is beneficial, but only up to the point where the contributed input capacitance becomes too large. These observations on input referred noise can be summarized in Equation 2.10 taken from [4].

$$I_{integ,in,fet}^{2} = \frac{16\pi^{2}k_{B}T\gamma(C_{pd} + C_{in})^{2}}{gmT_{bit}^{3}}$$
(2.10)
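
The existence of an optimum in Equation 2.10 can be sketched numerically. The example below assumes that both gm and $C_{in}$ scale linearly with transistor width (1 mS and 1 fF per micron, matching the calculation setup used later in this chapter), and assumes values for $\gamma$, temperature, and $T_{bit}$. Under this linear scaling assumption, setting the derivative of $\frac{(C_{pd}+C_{in})^2}{gm}$ to zero gives an optimum at $C_{in} = C_{pd}$, which the sweep reproduces.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
TEMP = 300.0        # assumed temperature, K
GAMMA = 1.0         # assumed channel noise factor
T_BIT = 100e-12     # assumed bit period, s


def integrated_fet_noise(width_um, c_pd, t_bit, gm_per_um=1e-3, c_per_um=1e-15):
    """Integrated input referred transistor noise in A^2 (Equation 2.10)."""
    gm = gm_per_um * width_um      # transconductance grows with width
    c_in = c_per_um * width_um     # so does the contributed input capacitance
    return (16.0 * math.pi ** 2 * K_B * TEMP * GAMMA
            * (c_pd + c_in) ** 2 / (gm * t_bit ** 3))


def best_width(c_pd, t_bit, widths_um):
    """Brute-force sweep returning the width with the least integrated noise."""
    return min(widths_um, key=lambda w: integrated_fet_noise(w, c_pd, t_bit))


WIDTHS_UM = range(1, 1001)
for c_pd in (100e-15, 250e-15):
    w_opt = best_width(c_pd, T_BIT, WIDTHS_UM)
    print(f"C_pd = {c_pd * 1e15:.0f} fF -> optimal width {w_opt} um")
```

The optimal width grows in proportion to the photodiode capacitance, so a larger photodiode calls for a larger input device.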

To simulate this behavior, the setup in Figure 2.5 has been constructed, with the resistor configured to be noiseless in simulation. From the simulation result in Figure 2.6, this exact trend is perceptible. The noise decreases as the photodiode capacitance decreases, and for each capacitance there is some optimal sizing that achieves the lowest possible noise. As the size of the photodiode capacitance increases, the optimal point will occur at larger and larger transistor sizes (requiring the designer to size up the devices) as the increase in input capacitance due to the input transistor will have a smaller and smaller effect according to the tradeoff in Equation 2.10 of selecting an optimal  $\frac{(C_{pd}+C_{in})^2}{gm}$ .



Figure 2.5: Testbench Setup for simulating the amplifier noise contribution. In this case, a basic single stage inverter TIA is used.



ferred noise for different values of photodiode capacitance

Figure 2.6: Observations on the amplifier noise for the resistor feedback TIA

In this quest to increase amplifier transconductance, the inverter TIA is actually preferable because it offers large transconductance per contributed input capacitance to minimize the integrated noise. To achieve even higher transconductance without contributing too much input capacitance, a multiple stage inverter TIA is proposed and chosen for this work, as has been demonstrated prior [5]. The overall amplifier must be inverting, otherwise there would be positive feedback making the system unstable. Therefore, only an odd number of inverters can be utilized. With the single inverter TIA, system instability is unlikely to be an issue, as there is likely a single dominant pole spaced far from the other poles. As more inverters are added and the open loop gain increases, the amplifier approaches insufficient phase margin. New internal poles are introduced and achieving stability becomes more difficult. It makes sense also that increasing the number of inverters is difficult at extremely high bandwidths, as the increasing inverter delay could eventually result in a point where the next signal arrives before the previous signal has fed back. As a result, an internal feedback resistor is placed between the input and output of the final inverter to decrease the open loop gain and achieve sufficient phase margin. The resulting architecture is shown in Figure 2.7, specifically a variant using three inverters. The input is to the left of the schematic.



Figure 2.7: Schematic of the inverter TIA with three inverters and an internal feedback resistor

#### 2.4 Further Architectural Optimizations

In this optimization of noise, it follows that generally using the lowest possible bandwidth with the largest resistance and transistors with large transconductance is optimal. The optimization seems very constrained by bandwidth. So far, the solutions proposed have attempted to apply techniques that strictly lower the noise with the bandwidth fixed by the desired application. However, it may be interesting to look at the problem from another angle. Consider slightly easing the bandwidth restrictions, and optimizing a design at a lower bandwidth. Only later, attempt to compensate for the bandwidth and focus now on optimizing the compensation process for lowest noise. This still involves very noise-aware and noise-focussed analysis, but opens up a larger design space. The question was once how can the designer lower the noise for a specific bandwidth? The question now becomes how can the designer add as little noise as possible to the system when equalizing the bandwidth? The optimization of selecting a larger resistor and increasing the amplifier gain resembles this pattern of thinking. Many approaches incorporate some sort of peaking in the transfer function to reach the necessary data rates. One common method is with inductive peaking as shown in Figure 2.8. Work such as [6] has been done using this technique.



Figure 2.8: Schematic of Inverter TIA with Inductive Peaking

Large photodiode capacitances generally make designing at higher data rates difficult as the input capacitance will dominate. Upon adding the inductor, the impedance seen at the input of the TIA can be given according to Equation 2.11 and verified in Figure 2.9. The frequency response peaks at  $\frac{1}{\sqrt{LC_{pd}}}$ . Following the peak, the frequency response decays as two poles at 40 dB per decade. The effect on the noise is just as significant as the effect on the frequency response, though. Notably, the peaking effects are also visible in the input referred noise PSD in Figure 2.10. Whatever noise appears at the output is referred back to the input and is therefore shaped by the transimpedance. Ideally, the inductor would be placed such that the transimpedance frequency response of the TIA is flat, only to drop off at the compensated bandwidth; it would then make sense that the noise transfer function would also be flat with the location of the zero slightly higher and roughly related to this new pole location. However, the value of the noise PSD at lower frequencies should be lower (an assumption made according to the observations of Figure 2.4b). Thus, the overall noise of the system would be decreased. These ideas are fleshed out in greater detail in the following analysis of equalization techniques.

$$Z_{\text{Inductive Peaking},TIA,in} = \frac{R_{fb} + j\omega L}{1 + j\omega C_{pd}R_{fb} + (j\omega)^2 C_{pd}L}$$
(2.11)
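
The location of the peak can be checked with a small numerical sketch. It evaluates the impedance of a series $R_{fb}$ and $L$ branch in parallel with $C_{pd}$, which is the network Equation 2.11 describes, and searches a logarithmic frequency grid for the magnitude peak. The component values are assumptions chosen only so that the resonance is pronounced (in particular, the resistance is deliberately small and stands in for the effective resistance seen at the input); they are not taken from this work.

```python
import math


def input_impedance(freq_hz, r_fb, l_h, c_pd):
    """Impedance of (R_fb in series with L) in parallel with C_pd."""
    s = 1j * 2.0 * math.pi * freq_hz
    return (r_fb + s * l_h) / (1.0 + s * c_pd * r_fb + s * s * c_pd * l_h)


def peak_frequency(r_fb, l_h, c_pd, f_start=1e8, f_stop=1e12, points=4000):
    """Frequency on a log grid where the impedance magnitude is largest."""
    ratio = (f_stop / f_start) ** (1.0 / (points - 1))
    freqs = [f_start * ratio ** i for i in range(points)]
    return max(freqs, key=lambda f: abs(input_impedance(f, r_fb, l_h, c_pd)))


# Assumed illustrative values: 20 ohm, 1 nH, 100 fF.
R_FB, L_H, C_PD = 20.0, 1e-9, 100e-15

f_resonance = 1.0 / (2.0 * math.pi * math.sqrt(L_H * C_PD))
f_peak = peak_frequency(R_FB, L_H, C_PD)
print(f"1/(2*pi*sqrt(L*C_pd)) = {f_resonance / 1e9:.2f} GHz")
print(f"numerical peak        = {f_peak / 1e9:.2f} GHz")
```

For lightly damped values such as these, the numerical peak lands very close to $\frac{1}{\sqrt{LC_{pd}}}$; with heavier damping the peak shifts and eventually disappears.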



Figure 2.9: Effect of inductive peaking on the inverter TIA frequency response



Figure 2.10: Effect of inductive peaking on the inverter TIA noise power spectral density



Figure 2.11: CTLE Architectures

Another method of compensation is with a cascaded continuous time linear equalization (CTLE) stage. Such a stage performs equalization by placing a zero prior to all of its poles, as given generically in Equation 2.12. When cascaded, the zero can be matched with the dominant pole of the preceding stage, and the first pole of the CTLE becomes the dominant pole of the resulting transfer function, effectively extending the bandwidth of the whole system. These can be realized in either a passive or active manner. The two architectures are shown in Figure 2.11. The passive CTLE necessarily will have some loss, but the active devices allow the CTLE to be designed for minimal attenuation and can even offer gain. For this reason, the active CTLE is preferable, with the frequency response specific to it given in Equation 2.13. The dominant pole is generally the output pole of the CTLE. The poles of this variant of the active CTLE can be found at  $\omega_{p1} = \frac{1}{R_D C_P}, \omega_{p2} = \frac{1+\frac{gmR_S}{2}}{R_S C_S}$  and the zero can be found at  $\omega_z = \frac{1}{R_S C_S}$ .

$$H(j\omega) = A \frac{(1+j\frac{\omega}{\omega_z})}{(1+j\frac{\omega}{\omega_{p1}})(1+j\frac{\omega}{\omega_{p2}})}$$
(2.12)

$$H(j\omega) = \frac{gm R_D}{gm (R_{degen} || R_{tail}) + 1} \frac{(1 + j\omega (R_{degen} || R_{tail}) C_{degen})}{(1 + j\omega R_D C_{Load})(1 + j\omega \frac{(R_{degen} || R_{tail}) C_{degen}}{1 + \frac{gm (R_{degen} || R_{tail})}{2}})}$$
(2.13)
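
To make the peaking behaviour of Equation 2.13 concrete, the following sketch evaluates the active CTLE response at a few frequencies. Here `r_s` stands for $R_{degen} || R_{tail}$ and `c_s` for $C_{degen}$; all component values are assumptions picked so that the zero falls below both poles, and they are not taken from this work.

```python
import math


def ctle_response(freq_hz, gm, r_d, r_s, c_s, c_load):
    """Active CTLE transfer function of Equation 2.13 at one frequency."""
    s = 1j * 2.0 * math.pi * freq_hz
    dc_gain = gm * r_d / (gm * r_s + 1.0)
    zero = 1.0 + s * r_s * c_s
    output_pole = 1.0 + s * r_d * c_load
    degen_pole = 1.0 + s * r_s * c_s / (1.0 + gm * r_s / 2.0)
    return dc_gain * zero / (output_pole * degen_pole)


# Assumed illustrative values.
GM, R_D, R_S, C_S, C_LOAD = 10e-3, 500.0, 1e3, 100e-15, 60e-15

f_zero = 1.0 / (2.0 * math.pi * R_S * C_S)
f_pole_out = 1.0 / (2.0 * math.pi * R_D * C_LOAD)
f_pole_degen = (1.0 + GM * R_S / 2.0) / (2.0 * math.pi * R_S * C_S)
print(f"zero {f_zero / 1e9:.2f} GHz, output pole {f_pole_out / 1e9:.2f} GHz, "
      f"degeneration pole {f_pole_degen / 1e9:.2f} GHz")

for f in (1e6, 1e9, 4e9, 100e9):
    mag_db = 20.0 * math.log10(abs(ctle_response(f, GM, R_D, R_S, C_S, C_LOAD)))
    print(f"{f / 1e9:8.3f} GHz: {mag_db:6.2f} dB")
```

The gain rises above its low-frequency value between the zero and the first pole, which is the boost used to extend the bandwidth of the preceding stage.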



(a) Frequency Response [7]

(b) How the CTLE can be used to extend the system bandwidth [4]

Figure 2.12: Evaluating the CTLE Frequency Response

The CTLE performance can be gauged either in terms of the location and maximum of its peak, or perhaps from a design standpoint, the location of its first zero and pole. The idea is to design the resistive feedback TIA with a lower bandwidth and therefore much less noise, only later equalizing to a higher bandwidth. Figure 2.12 illustrates the process. The tradeoff is that the CTLE also contributes a bit of its own noise and also boosts the high frequency noise from previous stages. This allows for some systemwide optimization, as opposed to optimizing solely for the inverter TIA block and then the CTLE block. The number of CTLE stages is also variable. Analysis for multiple equalization stages is the same as for a single CTLE. It simply allows for a greater amount of equalization and an even lower TIA bandwidth.

To take a closer look at the incorporation of the CTLE, consider its impact separately on the resistor and transistor noise, the latter attributed to the amplifier represented by a multiple inverter TIA. This analysis is based on mathematical simulation with an ideal infinite bandwidth amplifier, leaving the input pole as the dominant pole. The critical tradeoff is the increase in input capacitance that accompanies the increased gm of the input stage. There is also some additional amplifier input capacitance which scales proportionally with gm and the supposed transistor size; the amplifier capacitance appears effectively in parallel with the photodiode capacitance, contributing to the dominant input pole. The testbench for calculations is set up according to Figure 2.13 with an ideal infinite bandwidth amplifier (composed of a transconductance gm and an output conductance gds) with a resistor in shunt feedback. To evaluate the effect of a lower-bandwidth design on resistor noise, the amplifier design is fixed and the resistor size is increased to create a design at lower bandwidth. The lower bandwidth design is then equalized with the CTLE and the noise is investigated. To evaluate the effect on transistor noise, the feedback resistor size is instead fixed and the amplifier input stage is sized up, altering the gm and  $C_{in}$  contributed by the amplifier. The design of the CTLE is simply as given in Equation 2.13, with the output pole the dominant pole and the pole from the degeneration significantly higher. The calculations assume a fixed amplifier gain of 50 and a photodiode capacitance of 100 fF, with a gm of 1 mS per micron and capacitance of 1 fF per micron. The thermal noise current of the resistance  $\frac{4k_BT}{R}$  and of the amplifier  $4k_BT \gamma gm$  are considered.



Figure 2.13: Simulation setup for seeing improvement in noise due to lower bandwidth and consequent equalization



TIA fixing amplifier design

(normalized by the midband resistance) of the TIA fixing  $R_{fb}$  size

Figure 2.14: Effect of noise optimization with CTLE on resistor and transistor noise

The results are shown in Figure 2.14. Increasing the feedback resistance in Figure 2.14a shows an increased resistance, but lower bandwidth, requiring the CTLE to boost the bandwidth. Sizing up the amplifier does not directly increase the resistance in Figure 2.14b. Figures 2.14c and 2.14d show the improvement in noise, with the compensated design (from 4 GHz to 6.5 GHz) offering less noise than the original design at 6.5 GHz. For the given plots of noise, there are additional parasitic capacitors and poles from the amplifier and load that will show up but complicate this example. For this reason they have not been included in this initial analysis. These will make it so that the amount of integrated noise at the output is finite (contributing poles to the noise PSD). For the most part, these poles will occur at similar frequencies across the designs.

A few problems come with incorporating the CTLE by simply cascading the stages. First, the input common mode of the CTLE is set directly by the self-biased inverter TIA, which sets the common mode near mid-rail. This compromises a lot of the freedom in designing the CTLE. Additionally, a differential input is desired to get good power supply and common mode rejection ratio, critical in low-noise applications [8]. To deal with these issues, a differential preamplifier stage is cascaded prior to the CTLE. This also provides a place to correct for offset between the positive and negative inputs. Conveniently, with the multiple-inverter TIA architecture, an internal node at the input of the last inverter can be tapped out to the input of the differential preamplifier. The imbalance between signal strength of the differential inputs is at most the gain of the final inverter (which conveniently also has the internal feedback resistor and sees the least gain) and the preamplifier has sufficient strength such that the variance in signal strength is virtually imperceptible at its output per simulations. This problem of single ended to differential conversion is also found when using a dummy TIA; there the imbalance in signal strength in positive and negative is even more pronounced.

This approach can achieve much lower noise than incorporating a dummy TIA. With the dummy TIA, it follows that the input referred integrated noise is immediately increased by a factor of  $\sqrt{2}$  as it contributes the same amount of noise as the non-dummy TIA (which has so far been the subject of analysis) without any improvement to the signal amplitude. This approach seems strictly better, given the three-inverter architecture because the dummy is replaced with a smaller input signal. The common-mode rejection of the preamplifier should also reduce a bit of the noise.

From a systemwide design standpoint, the desired output is an analog voltage. This requires that either the last CTLE drive the signal off-chip, or that some additional output stage be included. As a result, a full system appearing according to Figure 2.15 is proposed. There is a 3-inverter TIA followed by a preamplifier that serves to bias the common mode and adjust for offset. Afterwards, there are two CTLE stages to perform equalization followed by an output stage to drive the analog voltage. The input current signal comes from the bottom of the figure and the output voltage signal exits at the top of the figure.



Figure 2.15: Proposed architecture for the full optimized system.

## Chapter 3

## Generator Design

Having discussed circuit optimization techniques, the discussion now moves to the layout and the low-level optimization and design of each of these individual blocks with systemwide considerations in mind. In designing analog circuits, there is a large amount of avoidable redesign, as discussed prior. Often, a designer will reuse blocks with slight changes in circuit parameters (transistor type, finger width, number of fingers, etc.), but have to redraw much of it by hand. By designing using generators, the designer can leverage large amounts of reuse and tweak these parameters easily, having established the floorplan, planned out routing, etc.

The disparity between schematic and post-layout extracted simulations is becoming greater and greater. Thus, the number of tweaks necessary in design, in terms of first designing with schematic, translating to layout, then simulating and, when specifications are not met, looping back and iterating to slightly adjust the circuit parameters, will only increase as it becomes increasingly difficult to properly model and characterize layout effects. It is quite convenient then that designing with generators allows for the creation of a large number of layouts to effectively sweep layout parameters in a layout-effect informed manner. To perform this process by hand requires an exorbitant amount of time. It may seem ambitious, but this work proposes a push-button flow for generating an optimized low-noise AFE from the aforementioned floorplan (Figure 2.15).

In terms of implementation, the idea is to perform a rough characterization of the devices and run a quick local optimization script in python. The term local is provided to denote that the script is based in python and does not interface directly with any circuit design tools. It uses a basic model of transistor parasitics and operating parameters such as gm, gds, etc. and  $c_{gs}, c_{ds}$ , etc. Given a list of desired specifications such as bandwidth, gain, or phase margin, it spits out the circuit parameters for a circuit topology that is optimal according to the model. The role of these scripts is to perform a rough optimization that is only as good as the characterization is (they are rough only when compared to the post-extracted results, but provide the optimal design for the given characterizations). These scripts are used to spit out a number of designs (upon partitioning the design space) that can be fed into the layout and schematic generators, which will be extracted and simulated accordingly. The results of these simulations are used to improve and adjust the characterizations, feeding back into the local optimization script for another iteration. This flow resembles the flow a designer might currently take, just that the entire process is automated.

Recently, there has been much work on creating layout and schematic generators, allowing greater automation for each process [1]. These will hopefully reduce the amount of work designers will have to redo and will leverage the repeated use of similar circuits through reuse of generators. These generators should work and scale how a layout designer would want them to as if done by hand.

The generators and design flow are demonstrated in a 45RFSOI process, though the scripts are portable to other technologies given the proper characterizations and slightly altered layout generation scripts to meet any new DRC rules. On this topic, consider the requirements for making a layout generator scalable. It must be able to meet DRC rules for an extremely broad range of component parameters. It must also be able to perform without significant degradation throughout this range, showing performance close to an ideal hand-drawn layout. If designs are too limited, it defeats the purpose of having a design script and a generator, and hand layout would be far superior. To allow this design loop to function properly and explore many designs, the generator scripts must be generic and robust enough to handle variation. The design scripts should not be trained solely on how the layout generator underperforms, but they should capture the difficulty of scaling layouts up or down and the impact the resulting layout parasitics have on the circuit performance. Some logical restrictions may also be placed, among these elements of basic symmetry, such as the differential pair having the same sizing across each branch, and accommodating only variants that are not superfluous (only what is practical and can be found in a legitimate design). The scripts also take into account electromigration concerns for wire width and allowed transistor current density according to the characteristics of the process.

In terms of generator design, there are a few basic specifications related to circuit performance that are provided for each script. These are used to guarantee that the transistors are all in the proper region of operation, which will often factor in as a maximum  $V_{gs}$  or minimum  $V_{ds}$  constraint. This also functions to limit the minimum and maximum current density, which limits the amount of variation due to process mismatch and reliability concerns due to electromigration effects. Additionally, design scripts must each take into account some input and output loading specifications. Other elements such as reasonable transistor sizings are used to make sure that the generators provide designs that can be practically implemented and operate in the proper region.

#### 3.1 Inverter TIA Design

The design of the TIA has been the subject of much discussion. This section describes the design of the local optimization script for the TIA. It has the ability to optimize for a variable number of inverters (they still must be odd), though for the given characterization and applications it was never optimal to use a chain of more than 3 inverters.

The inverter TIA can be broken down into the amplifier (inverter chain design) and the resistor, each contributing noise. The main constraints are bandwidth and the amplifier phase margin, optimizing for lowest noise under these conditions. Inverter size, inverter chain internal feedback resistance, and the overall shunt feedback resistor are available degrees of freedom. The technique for optimization involves designing the best possible amplifier and later sizing the resistor to meet the appropriate bandwidth. To optimize the amplifier, sweep the size of the inverter. As demonstrated prior in Figure 2.6b, as the inverter size increases, there will be some optimal point at which the noise contributed by the transistors is least. Looking at the curve, applying the technique of increasing inverter size will reap less and less benefit as it is applied repeatedly. It also contributes to the input capacitance of the structure, somewhat limiting the size of the resistor due to bandwidth constraints. Thus, the tradeoff becomes weighing whether the current decrease in transistor noise is more significant than the slight increase to resistor noise. This point is calculated by sweeping through inverter size, sizing the overall feedback resistance to achieve the proper bandwidth. Design for the phase margin is done by varying the number of inverters in the chain as well as adjusting the gain of the final inverter by an internal feedback resistor in shunt feedback placed between the final inverter's input and output. The algorithm can be found in Algorithm 1.

| Algorithm 1 Inverter TIA Local Design Script Pseudocode               |
|-----------------------------------------------------------------------|
| for MIN_FG to MAX_FG do                                               |
| Initialize design                                                     |
| while current BW is too far from target BW do                         |
| Design the amplifier in the following loop                            |
| while Phase margin is unnecessarily large do                          |
| Find the minimum possible internal feedback resistor size             |
| end while                                                             |
| Binary search to size overall feedback resistor to meet BW constraint |
| end while                                                             |
| Estimate noise                                                        |
| if current noise $>$ best noise then                                  |
| End the script and return the best design so far                      |
| else                                                                  |
| Set the current design as the best design                             |
| end if                                                                |
| end for                                                               |
| return the best design                                                |
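
The structure of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the script used in this work: the amplifier is reduced to a single-pole model with a fixed gain, so the phase margin loop is omitted; the per-finger gm and capacitance, the noise factor, the temperature, and the use of the target bandwidth in place of $\frac{1}{T_{bit}}$ in the transistor noise term are all assumptions; and every function and parameter name is hypothetical. The early exit relies on the noise having a single minimum over inverter size.

```python
import math
from dataclasses import dataclass

K_B = 1.380649e-23  # Boltzmann constant, J/K


@dataclass
class Design:
    fingers: int
    r_fb: float       # overall feedback resistance, ohm
    bandwidth: float  # achieved bandwidth, Hz
    noise: float      # estimated input referred noise, A^2


def bandwidth_hz(fingers, r_fb, spec):
    """Single-pole toy model: BW = A / (2*pi*(C_pd + C_in)*R_fb)."""
    c_total = spec["c_pd"] + spec["c_per_fg"] * fingers
    return spec["gain"] / (2.0 * math.pi * c_total * r_fb)


def size_feedback_resistor(fingers, spec, r_lo=1.0, r_hi=1e7, steps=100):
    """Binary search on a log scale for the R_fb that meets the target bandwidth."""
    for _ in range(steps):
        r_mid = math.sqrt(r_lo * r_hi)
        if bandwidth_hz(fingers, r_mid, spec) > spec["bw_target"]:
            r_lo = r_mid  # bandwidth to spare, so a larger resistor is allowed
        else:
            r_hi = r_mid
    return math.sqrt(r_lo * r_hi)


def estimate_noise(fingers, r_fb, spec):
    """Resistor term (4kT/R times BW) plus a transistor term shaped like Equation 2.10."""
    kt = K_B * spec["temp"]
    bw = spec["bw_target"]
    gm = spec["gm_per_fg"] * fingers
    c_total = spec["c_pd"] + spec["c_per_fg"] * fingers
    resistor = 4.0 * kt / r_fb * bw
    fet = 16.0 * math.pi ** 2 * kt * spec["gamma"] * c_total ** 2 * bw ** 3 / gm
    return resistor + fet


def optimize(spec, min_fg, max_fg):
    best = None
    for fingers in range(min_fg, max_fg + 1):
        # Phase margin loop of Algorithm 1 omitted: this toy model has one pole.
        r_fb = size_feedback_resistor(fingers, spec)
        design = Design(fingers, r_fb, bandwidth_hz(fingers, r_fb, spec),
                        estimate_noise(fingers, r_fb, spec))
        if best is not None and design.noise > best.noise:
            break  # noise has started to rise, so the previous design was the optimum
        best = design
    return best


# All values below are assumptions for illustration.
SPEC = {"bw_target": 5e9, "gain": 50.0, "c_pd": 100e-15, "temp": 300.0,
        "gamma": 1.0, "gm_per_fg": 1e-3, "c_per_fg": 1e-15}
best = optimize(SPEC, 1, 400)
print(best)
```

In this toy model the optimum falls somewhat below the size at which the amplifier input capacitance equals the photodiode capacitance, because sizing up the inverter also forces a smaller feedback resistor and therefore more resistor noise.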

More in depth in terms of the transistor sizing, the size of the nmos and pmos need not be adjusted individually and can be considered one unit as both the input and output should be set to mid-rail  $\frac{VDD}{2}$ . Thus, some ratio of nmos and pmos achieving the same current can be stored. In terms of sizing the chain of inverters, it was found that the optimal design had all three inverters the same size. It may appear logical that the inverters be sized up to manage the fanout to drive the load capacitance, but this consumes unnecessary power and puts greater strain on the first inverter. It may also seem logical that each consecutive stage be smaller as each contributes less noise after the first (decreasing by a factor inversely proportional to the gain of each stage), but on the contrary this will make it difficult to drive the load capacitance and will limit the bandwidth.

The input specifications for the inverter TIA design script can be summarized as the desired bandwidth, the expected output load capacitance, the minimum necessary phase margin, and a model for the photodiode and packaging parasitics at the input. The output results are the number and sizings of the inverters and the size of the resistors as well as the predicted performance.

The floorplanning is rather straightforward, with the inverters stacked upon each other as in Figure 3.1. This floorplan is advantageous because it is very scalable with respect to the number of inverters in the chain. The feedback resistors each consist of two resistors in parallel (one along each side of the inverters) to achieve better matching. In sizing up the inverters, the number of fingers is kept similar. A key consideration for the layout of the inverter is electromigration, necessitating a dense power grid. The feedback resistor can be found at the bottom right and left. When present, the amplifier internal feedback resistor (used for adjusting open loop gain and phase margin) can be found at the top right and top left.



(a) Annotated 3-Inverter TIA Layout

Figure 3.1: TIA Layouts



Figure 3.1: TIA Layouts (cont.)

## 3.2 Layout-Aware Inverter TIA Optimization

The design script provided above utilizes transistor characterizations (collecting the current and  $v^*$  numbers, among other parameters) to provide a first pass design. In simulation, however, the results can vary widely in both schematic level and post-extracted simulations. As it becomes increasingly difficult to characterize layout effects even in schematic, the disparity between the expected results from simple models and simulated results becomes very significant. As a result, it is necessary to find a way to take into account these effects and will likely require interfacing with available layout generation and simulation tools beyond python.

It is common to interface directly with the tools and simply perform the optimization by adjusting the parameters slightly. Here, the burden of optimization is straightforward, but is likely restricted to a narrow design space. For example, consider an approach for the design of the TIA that seems more traditional. Having performed some optimizations at the schematic level, the layout is performed and simulated, falling slightly below the desired specifications. The designer then loops back to redesign, perform layout and simulate again. This loop is extremely tedious and is hardly time efficient, gaining little information. All the designer has gained information on is the specific parameters and designs simulated, and perhaps some rather abstract qualitative observations on how the layout simulations deviate from the schematic simulations. Instead, it would make sense to use those results that provide unsuitable designs and update the characterizations and models used in the local design script. This implies trusting the design script to find the optimal design given the proper characterizations. The onus of optimization is thus placed upon fitting the models properly to the layout-extracted simulations.

The layout parasitics are rather non-linear, which requires fitting a non-linear model; this is done with spline interpolation. The interpolation mainly adjusts the parasitic capacitances of the models until the bandwidth predicted by the design script matches the simulated bandwidth. There are slight adjustments to the transconductance and output resistance models as well. The transistor sizes are therefore broken up into a few distinct intervals to record points for the interpolator. The local design script looks for a best design within each interval and passes them to a higher level optimization script. The higher level optimization script interfaces with a simulator which has the ability to generate layouts and schematics. It then runs the simulation on the target design from each sub-interval and passes the information back to the design script to update the interpolator and the characterizations. Because the characterizations and models now match more closely, it is as if a much wider range of designs has been explored than has actually been simulated (specifically in comparison to the aforementioned approach). An algorithm for the overarching design script is provided in Algorithm 2, with a corresponding block diagram of the design process given in Figure 3.2.

| Algorithm 2 Layout Aware Inverter TIA Design Loop                             |
|-------------------------------------------------------------------------------|
| Perform initial characterizations and create model for parasitics             |
| while $\mathrm{CUR\_ITER} < \mathrm{MAX\_ITER}$ do                            |
| for each truncated interval for number of fingers do                          |
| Run local design script to produce a local optimized design                   |
| end for                                                                       |
| if all designs output by the local design script have been simulated then     |
| Design space has been sufficiently explored and <b>return</b> the best design |
| end if                                                                        |
| Generate layouts and schematics in parallel                                   |
| Perform Simulations in parallel                                               |
| Parse results and update parasitics model in characterizations                |
| Add to list of simulated designs                                              |
| end while                                                                     |
| Design space has been sufficiently explored and <b>return</b> the best design |
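
The structure of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: `simulate_extracted` is a hypothetical stand-in for layout generation, extraction and simulation (it returns a made-up non-linear parasitic capacitance), the bandwidth formula is a placeholder for the real local design script, the constants `GM_PER_FINGER` and `BW_TARGET` are invented, and piecewise-linear interpolation is swapped in for the spline to keep the sketch dependency-free.

```python
import math
from bisect import bisect_left

C_PD = 250e-15          # photodiode capacitance in farads (C_j of PD1)
GM_PER_FINGER = 0.5e-3  # hypothetical transconductance per finger, siemens
BW_TARGET = 4e9         # hypothetical TIA bandwidth target, hertz
MAX_ITER = 20


def interp(points, x):
    """Piecewise-linear interpolation over sorted (x, y) pairs (stand-in for the spline)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def simulate_extracted(nf):
    """Hypothetical stand-in for layout generation, extraction and simulation.

    Returns a made-up, non-linear parasitic capacitance in farads.
    """
    return (0.8 * nf + 0.02 * nf ** 2) * 1e-15


def bandwidth(nf, c_par):
    """Toy bandwidth estimate gm / (2*pi*C_total); not the real TIA model."""
    return GM_PER_FINGER * nf / (2 * math.pi * (C_PD + c_par))


def local_design(interval, cap_points):
    """Pick the finger count in the interval whose predicted BW is nearest the target."""
    lo, hi = interval
    model = sorted(cap_points.items())
    return min(range(lo, hi + 1),
               key=lambda nf: abs(bandwidth(nf, interp(model, nf)) - BW_TARGET))


def design_loop(intervals):
    # Initial characterization: extract the two extreme sizes.
    cap_points = {nf: simulate_extracted(nf)
                  for nf in (intervals[0][0], intervals[-1][1])}
    simulated = {nf: bandwidth(nf, c) for nf, c in cap_points.items()}
    for _ in range(MAX_ITER):
        candidates = [local_design(iv, cap_points) for iv in intervals]
        new = [nf for nf in candidates if nf not in simulated]
        if not new:      # every proposed design has already been simulated
            break
        for nf in new:   # the real flow would run these in parallel
            cap_points[nf] = simulate_extracted(nf)
            simulated[nf] = bandwidth(nf, cap_points[nf])
    best = min(simulated, key=lambda nf: abs(simulated[nf] - BW_TARGET))
    return best, simulated


best_nf, simulated = design_loop([(4, 20), (21, 40), (41, 60)])
print(best_nf, f"{simulated[best_nf] / 1e9:.2f} GHz", len(simulated), "designs simulated")
```

The loop stops as soon as the local design script proposes only designs that have already been simulated, which mirrors the early return in Algorithm 2.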



Figure 3.2: Block Diagram of the TIA Design Loop

It is interesting to consider the stopping point. When are the models in the local design script accurate enough? It would seem that there is an infinitely large design space to be explored and that it would require truly exhaustive simulation to fit the models perfectly. However, even in the TIA design loop there is only a mesh of designs that are simulated. The size of the inverter is quantized in terms of number of fingers; the sizes of the resistors are quantized in terms of achievable resistances, with sizing in nanometer increments. It is both possible and practical to derive accurate characterizations and observe convergence within this mesh of the design space.

## 3.3 System Wide Optimization

The previous section has addressed the isolated design of the TIA. However, from a system design perspective, it is not this simple. There are many other blocks that must follow. It is important to find a way to properly assign the correct design specifications for each of the sub-blocks, critically seeing how the design of one impacts another. To address this, consider the design with bandwidth, biasing and the overall loading of each stage in mind.

There is some target bandwidth for the overall system that is derived from the data rate. The pre-amplifier and the output stage should be designed such that they have little impact on the amplifier bandwidth. The blocks mainly of concern are the two CTLE's and the TIA. It would make sense to design from the back towards the front, seeing how much of an impact the CTLE's can have and, as a result, how low a bandwidth is acceptable from the TIA. However, the design and optimization of the CTLE also hinges upon the behavior of prior stages, as will be discussed later. This can be addressed by analyzing the CTLE to estimate what fractional boost in bandwidth is expected. From this estimate and the target bandwidth, generate a specification for the TIA target bandwidth. Use this to design the TIA and then propagate the specifications from front to back.
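
As a small worked example of this estimate, the sketch below converts a system bandwidth target and an assumed CTLE fractional boost into a TIA bandwidth specification. The function names and the 0.6 boost are illustrative assumptions; the second function simply back-computes the boost implied by the PD1 10 Gbps results reported in Figure 4.3 (TIA $f_{3db}$ of 4.02 GHz, overall $f_{3db}$ of 6.47 GHz).

```python
def tia_bw_spec(target_bw_hz, fractional_boost):
    """TIA bandwidth needed if the CTLEs extend bandwidth by `fractional_boost`."""
    return target_bw_hz / (1.0 + fractional_boost)


def implied_boost(overall_bw_hz, tia_bw_hz):
    """Fractional bandwidth boost implied by a TIA and an overall bandwidth."""
    return overall_bw_hz / tia_bw_hz - 1.0


# Assumed 60 % boost applied to the 6.5 GHz target.
spec = tia_bw_spec(6.5e9, 0.6)
# Boost implied by the reported PD1 10 Gbps numbers.
boost = implied_boost(6.47e9, 4.02e9)
print(f"TIA spec: {spec / 1e9:.2f} GHz, implied boost: {boost:.2f}")
```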

For determining the biases and DC common mode voltages, it is preferable to design from the back to the front. Selection of the input common mode of the output stage has a very significant impact as it will determine the current density and the necessary size of the output stage. A large current is naturally required. The determination of the input common mode of the output stage determines the output common mode of the second CTLE stage. The input common mode is left as a free variable for the design of the CTLE, eventually determining the output common mode of the pre-amplifier. The pre-amplifier is meant to properly bias the rest of the sub-blocks.

The expected load capacitance of each stage has a significant impact upon each block. It is responsible for determining the dominant pole likely in all blocks besides the TIA. It is also influential in the sizing of various blocks. For example, the input size of the output stage is heavily impacted by how large a load the second CTLE can drive, noting that the dominant pole of the second CTLE must be higher than the dominant pole of the first CTLE. These are set to some initial reasonable estimate for the design script, and then later tuned according to how hard it is to meet the BW specifications.

In getting the sub-blocks to function properly with respect to one another, it makes sense to design from the back to the front, dealing with the problems of biasing and load capacitances. However, this makes the design less optimal, specifically in terms of the bandwidth optimization. Necessarily, there must be some pass from back to front for proper biasing followed by a pass from front to back for optimizing the bandwidth. The pass from front to back deals mostly with the optimization of the CTLE that will be addressed later; normally in this pass there are no tweaks to the biasing, only tweaks to the location of the poles and zeros resulting from the degeneration impedance.

This method does not necessarily produce the optimal design with noise in consideration, but it closely approximates the optimum. As a result, some co-optimization may be necessary, with adjustments of the TIA BW at the core. This is demonstrated in Algorithm 3. At first glance, the process may seem rather inefficient in having to perform two passes of design, but note that the TIA optimization is the most time and resource intensive. The latter block-level optimizations are much quicker to perform in comparison.

| Algorithm 3 Overall Design Process                                           |
|------------------------------------------------------------------------------|
| Estimate TIA BW specification based on CTLE fractional boost and target BW   |
| while noise is not optimal do                                                |
| Adjust TIA BW specification according to the achieved BW                     |
| Design the TIA according to the current specifications                       |
| Design the output stage                                                      |
| Perform a first pass design of the second CTLE focussing on the biasing      |
| Perform a first pass design of the first CTLE focussing on the biasing       |
| Design the Differential Pre-Amplifier                                        |
| Perform a second pass design of the first CTLE adjusting pole/zero location  |
| Perform a second pass design of the second CTLE adjusting pole/zero location |
| end while                                                                    |

## 3.4 Differential Pre-Amplifier Design

The role of the preamplifier is to adjust the input common mode of the CTLE, providing some gain in the meantime. Offset correction is also performed at this stage. The amplifier is a simple cascoded differential amplifier with a resistive load. The pmos transistors above are used to tune the common mode voltage. The bandwidth is set by the output pole, which means it relies heavily upon the load capacitance. In this design, the constraints are the bandwidth and the output common mode, and the optimization maximizes the gain. The bandwidth is chosen to be much higher than the signal bandwidth such that the preamplifier does not degrade the overall bandwidth significantly.

The free variables that are available are the $v^*$ of the input pair and the cascode transistors as well as the bias current. The $v^*$ of the transistors is determined by the voltage bias; however, the mapping of $v^*$ to the bias point is not always so straightforward, making directly sweeping $v^*$ difficult. Instead, sweep the drain voltage of the input pair and the size of the cascode transistor. The drain voltage determines the $v^*$ as the source voltage is fixed to roughly the lowest value at which the tail transistor functions properly. The size of the cascode transistor determines the operation of the transistor as it corresponds to the necessary gate voltage. Its source is already set by the drain of the input pair and its drain is set by the desired output common mode. In the innermost loop, there is a sweep of current because doing so provides the most direct tradeoff between BW and gain, which is necessary for the BW constrained optimization. This allows for evaluation of gain for the same BW across all of these $v^*$ choices. The input specifications are the desired input and output common mode voltage, the maximum input capacitance it is allowed to provide, the expected load capacitance, and the minimum bandwidth. The output results are the sizings of the various transistors and resistors. The algorithm is given in Algorithm 4.

| Algorithm 4 Differential Pre-Amplifier Design                                          |
|----------------------------------------------------------------------------------------|
| Store necessary output and input common mode                                           |
| Determine input transistor size according to the expected load from the previous stage |
| for possible drain voltages for the input pair <b>do</b>                               |
| for possible sizes of the cascode transistors do                                       |
| while BW specification is not met do                                                   |
| Adjust the chosen bias current which tunes the gain and BW                             |
| Bias the cascode transistor properly according to the chosen bias current              |
| Set the size of the load resistor for the proper output common mode                    |
| Calculate expected BW and gain                                                         |
| end while                                                                              |
| end for                                                                                |
| end for                                                                                |
| Compare simulated designs                                                              |
| return the design with highest gain                                                    |

The layout follows closely how the structures appear in the schematic. It consists of the pmos load transistors (for tuning the output common mode), stacked atop the load resistors, stacked atop the cascode transistors. These then sit atop the input pair, which sits atop the tail transistor. To the right of the stack, all of the transistors that require an external bias (the tail, cascode transistors, and pmos load transistors) have a current mirror placed there. The input wires are around the input transistors and the output wires are located around the resistors.

The respective current mirrors are placed close to the transistors they bias, allowing current routing throughout the macro. Routing voltages faces the challenge of voltage drop due to the wire resistance (IR drop) and sensitivity to supply resistance. As components may be separated by large distances, there is chip-wide process mismatch that must be considered. By routing currents across the chip, the impact of these problems is reduced.



Figure 3.3: Layout of the Pre-amplifier



Figure 3.4: Schematic of Pre-Amplifier with Offset Correction

Offset correction is applied by pulling down extra current in one branch of the preamplifier from the source of the cascode pair (drain of the input pair). offset\_correct\_n and offset\_correct\_p should not both be high at the same time. Adjustment of offset in the common mode is done by the pmos tail transistors, which create a virtual ground at their drains in differential mode analysis and thus do not directly impact the pre-amplifier transfer function. The set of pmos transistors forms a virtual ground in the differential mode towards the top of the structure. These are used to tune the common mode voltage along the chain.



Figure 3.5: Schematic of Pre-Amplifier with Offset Correction

## 3.5 CTLE Design

The CTLE design script takes in the input pole and outputs the greatest possible output pole (it optimizes the maximum output pole location for a given input pole). This script is used to optimize for the lowest possible input pole given the output pole by iterating through a selection of input poles. The script initially optimizes with a fixed input pole because the output pole depends largely upon the load capacitance (which is dependent upon later stages). It is tuned by the size of the input (contributing parasitics) and the size of the load transistor output resistances. Additionally, tuning the zero location and the degeneration capacitance directly and then designing for the output pole is more straightforward and promising for finding optimal designs. There is likely some mismatch between simulation and the calculations in Python. The gain error is less pronounced than the error in the locations of poles and zeros. As a result, following the CTLE design script, in the pass of designing from the front to the back, a post-layout simulation (using the designed preamplifier and TIA) is performed to adjust the capacitance such that the peaking of the output transfer function is not too great (a flat frequency response is desired), looking at the new pole of the system. When simulating the second CTLE, the capacitance is adjusted using similar simulations, now also factoring in the first CTLE.

The input specifications are the pole of the prior stage, the expected load capacitance, the maximum input capacitance this block can provide, the output common mode voltage, and the desired gain. The output results are the sizings of the various transistors and passive components (pull-up resistance, degeneration resistance, and degeneration capacitance), the optimized input common mode voltage, and the predicted pole location. The script is found in Algorithm 5.

| Algorithm 5 CTLE Design Script                                                     |
|------------------------------------------------------------------------------------|
| Store necessary output common mode                                                 |
| for possible input transistor size do                                              |
| for possible input bias do                                                         |
| Select resistor and size tail transistor according to gain requirement             |
| Size capacitor so that the zero matches with the previous stage pole               |
| Calculate CTLE pole (new output pole)                                              |
| end for                                                                            |
| end for                                                                            |
| return the design with highest CTLE pole (new output pole) and the necessary input common mode |
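
The inner step of Algorithm 5 (sizing the capacitor so that the zero lands on the previous stage pole, then computing the resulting poles) can be illustrated with the standard small-signal model of a source-degenerated differential pair. This is a sketch under assumptions, not the author's script: $R_S$ and $C_S$ are taken as the total degeneration resistance and capacitance connected between the two sources, parasitics are ignored, and all numerical values in the example call are hypothetical.

```python
import math


def ctle_small_signal(gm, r_d, r_s, c_load, f_prev_pole):
    """Textbook source-degenerated CTLE with the zero placed on the prior pole.

    gm          : input-pair transconductance (S)
    r_d         : load (pull-up) resistance (ohm)
    r_s         : total degeneration resistance between the sources (ohm)
    c_load      : load capacitance at each output (F)
    f_prev_pole : pole of the previous stage to be cancelled (Hz)
    """
    degeneration = 1.0 + gm * r_s / 2.0
    # Zero sits at 1 / (2*pi*R_S*C_S); solve for C_S.
    c_s = 1.0 / (2.0 * math.pi * f_prev_pole * r_s)
    return {
        "c_s": c_s,
        "dc_gain": gm * r_d / degeneration,
        "f_zero": 1.0 / (2.0 * math.pi * r_s * c_s),
        "f_pole_degeneration": degeneration / (2.0 * math.pi * r_s * c_s),
        "f_pole_output": 1.0 / (2.0 * math.pi * r_d * c_load),
    }


# Hypothetical values chosen only to exercise the formulas.
ctle = ctle_small_signal(gm=10e-3, r_d=120.0, r_s=80.0,
                         c_load=30e-15, f_prev_pole=4e9)
for name, value in ctle.items():
    print(f"{name}: {value:.4g}")
```

In this model the zero cancels the prior pole and the new bandwidth-limiting pole is the lower of the degeneration pole and the output pole, which is the quantity the outer sweeps of Algorithm 5 try to maximize.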

The layout for the CTLE is split into three components. There is a block composed of the active components with the load resistor, along with one block for the degeneration resistor DAC and one for the degeneration capacitive DAC. The DAC's are necessary because they allow for tunability of the zeros and some prompt adjustment of the gain in testing. They are rather large, though, and scale in size exponentially with the number of bits in order to guarantee good matching.



Figure 3.6: CTLE Layout

## 3.6 Output Stage Design

The role of the output stage is simply to drive the signal off chip. This amounts to a slew rate constraint as well as a swing constraint as the impedance must be matched with the transmission line or probe from off chip.

These amount to a constraint on the bias current. For slew rate considerations, it is important to take into account the target signal frequency, the expected load capacitance and the peak to peak signal swing given by $V_{Swing}$. To meet the necessary swing, there must also be sufficient voltage headroom. This is governed by the load resistance $R_L$ and the peak to peak signal swing. Typically, in order to probe the analog output, the output stage must match to an $R_L = 50\Omega$ termination resistance. Consider the resulting Equation 3.1. Once the necessary bias current is achieved, the bandwidth is verified and the gain is maximized. Because the load resistance is generally very small, the design optimizes for a large transconductance to attenuate the signal as little as possible or to provide as much gain as is feasible.

$$I_{bias,min} = \max\left(\frac{V_{Swing}}{R_L},\ \pi f C_{load} V_{Swing}\right) \tag{3.1}$$
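
To give a feel for which term of Equation 3.1 dominates, the sketch below evaluates both. The slew term follows from a sinusoid of peak to peak amplitude $V_{Swing}$ at frequency $f$, whose maximum slope is $\pi f V_{Swing}$. The swing, termination and pad capacitance are the values quoted in this report (350 mV, 50 $\Omega$, 80 fF); taking $f$ as half the 10 Gbps data rate and using the pad capacitance alone as $C_{load}$ are assumptions made for illustration.

```python
import math


def min_bias_current(v_swing, r_load, c_load, f_signal):
    """Evaluate Equation 3.1 and return (i_min, i_swing, i_slew) in amperes."""
    i_swing = v_swing / r_load                          # swing across the termination
    i_slew = math.pi * f_signal * c_load * v_swing      # peak slew current of a sinusoid
    return max(i_swing, i_slew), i_swing, i_slew


i_min, i_swing, i_slew = min_bias_current(
    v_swing=0.35,     # peak to peak output swing (V)
    r_load=50.0,      # termination resistance (ohm)
    c_load=80e-15,    # assumed load: pad capacitance only (F)
    f_signal=5e9,     # assumed signal frequency: half of 10 Gbps (Hz)
)
print(f"swing term: {i_swing * 1e3:.2f} mA, slew term: {i_slew * 1e3:.2f} mA, "
      f"minimum bias: {i_min * 1e3:.2f} mA")
```

Under these assumptions the swing requirement across the 50 $\Omega$ termination sets the bias current, with the slew term more than an order of magnitude smaller.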

The input specifications are the desired output swing, the expected load capacitance, the maximum input capacitance this stage can provide, the intended data rate, the desired resistance to match to, and the desired gain. The output results are the sizings of the various transistors and the optimized input common mode voltage.

| Algorithm 6 Output Stage Design                                                        |
|----------------------------------------------------------------------------------------|
| Estimate the necessary current according to the load capacitance, BW and output swing  |
| Estimate the necessary current according to the output swing and load resistor         |
| Take the necessary current as the maximum calculated from the two sets of specs        |
| Optimize for the input common mode                                                     |
| while gain is too large or too small do                                                |
| Update the input common mode according to the gain constraint                          |
| Use input common mode and maximum input capacitance to size the input pair             |
| Size the tail transistor                                                               |
| Calculate the gain                                                                     |
| end while                                                                              |
| return the design with the proper gain and bias current and the necessary input common mode |

Similar to the inverter TIA, a critical consideration is electromigration. This requires a dense power grid and matching up the fingers of the input pair and the tail transistor. There must also be careful consideration of the connection from the input pair to the resistors. The pmos tail transistors are stacked atop the input pair. The mirror for the tail transistor is right next to it, with a set of transistors that function as decoupling moscaps between the gate voltage of the tail and VDD. The load resistors below are generally matched to 50 $\Omega$.



Figure 3.7: Output Stage Layout

## 3.7 Macro and Top Level Assembly

The layouts of the individual sub-blocks have been discussed. In order to have a truly pushbutton flow, the assembly of the whole AFE macro must also be automated. Three designs have been explored based on different design specifications, which showcase the scalability and flexibility of the layout and schematic generators. In conceptualizing the positioning of the blocks, a few things are important to keep in mind, among them minimizing parasitics for key signal nodes and laying down a dense power grid while making sure various signals can escape the block. In order to minimize parasitics, the blocks are lined up such that the signals between stages do not have to travel vertically, only horizontally as they traverse the block. This is one clear advantage of using a generator, as slight tweaks do not require any adjustments by hand. Doing so simplifies the process of transferring the individual blocks for use by the digital tools.

In terms of scaling, the most difficult blocks to handle are the CTLE's. For these blocks, it is particularly important to pay attention to the placement of the resistive and capacitive DAC's because they are very large. As a result, the DAC's have been placed around the central routing channel so they do not impede the placement of other blocks and do not necessitate unnecessarily complicated routing of important signals. Because the sizes and placement of all of these blocks will change with different specifications and component parameters, it is important to have a very general method of connecting the power supplies. Using BAG, it is very easy to keep track of both signal and power supply wires on lower routing layers and the routing tracks that have been occupied. Using this information, a script can iterate through relevant wires over open tracks to lay down the power grid while making sure to leave room to escape the wires. In this script, the wires on the vertical white layers were considered the top level for each of the blocks. The red horizontal layer was used as the power grid within the macro and to escape wires from the blocks. These wires mainly carry the digital configuration bits necessary for the DAC's, some biasing, and the offset correction controls. In order to set the digital configuration bits, a scan chain is necessary. The escaped wires are routed to the scan chains, which sit close to the macro. Many of the circuits also require bias currents which need to be fed in from off the chip.
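
The idea of iterating over open tracks while leaving room for escape wires can be sketched as follows. This is a hypothetical, framework-independent illustration and does not use the BAG API: tracks are plain integers, `occupied` is the set of tracks already taken by signals, and the `escape_every` spacing is an invented parameter.

```python
def plan_power_grid(num_tracks, occupied, escape_every=4):
    """Assign each routing track to a signal, a supply, or an escape slot.

    Free tracks alternate between VDD and VSS, except that every
    `escape_every`-th free track is left open so wires can escape.
    """
    supplies = ("VDD", "VSS")
    plan = {}
    free_count = 0
    supply_index = 0
    for track in range(num_tracks):
        if track in occupied:
            plan[track] = "signal"      # already used by the block
            continue
        free_count += 1
        if free_count % escape_every == 0:
            plan[track] = "escape"      # reserved for escaping wires
        else:
            plan[track] = supplies[supply_index % 2]
            supply_index += 1
    return plan


plan = plan_power_grid(num_tracks=10, occupied={2, 5})
for track, use in plan.items():
    print(track, use)
```

Because the plan is derived from the occupied-track set rather than from fixed coordinates, the same routine applies unchanged when block sizes and placements shift with the specifications.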



Figure 3.8: Current Distribution Schematic

The current distribution and biasing scheme can be found in Figure 3.8. Some of the macros share a bias current. Sharing bias currents reduces the number of off chip currents that need to be provided. The current bias wires escape through the top and bottom of the block while the scan chain bits escape through the right and left of the block. VDD has also been divided into three separate domains so as to reduce the deterministic droops in VDD that can show up as supply noise. VDD1 is dedicated to the TIA. VDD2 is dedicated to the preamplification and equalization stages. VDD3 is dedicated to the output stage. The scan chains also run off of VDD3. There is just a single VSS shared throughout the chip. An additional VDDPAD is also present to bias the ESD diodes for the pads. The macro layout for the PD1 10 Gbps design is given in Figure 3.9; all of the macros look rather similar and scale in the same fashion. The densely laid red layer is the top horizontal layer of the macro. The top vertical layer of the macro is the white layer. A dense vertical grid at the next highest level of hierarchy is placed over the horizontal layer to route the power.



Figure 3.9: Annotated Overall Macro Layout with Power Grid

From an overall system perspective, because noise is the crux of the study, there is an analog current input and an analog voltage output allowing for direct noise measurement. To accommodate this, there is a set of pads at the output and many pads at the input. The overall design with the three macros is provided in Figure 3.10.



Figure 3.10: Annotated Overall Layout

## Chapter 4

## Simulation and Design Results

This section focusses on the design specifications and designs actually tested and generated along with their simulated results. The design specifications are based primarily on two photodiode models and three sets of performance specifications that come from real-world design problems. These three sets of specifications have motivated the designs of the three macros. In terms of simulation, there are a few key elements to address. The photodiode has largely been generalized as a current source with photodiode capacitance. In practice, however, more complicated models that take into account additional parasitics are necessary. The performance of the various design scripts will be showcased, focussing on the performance of the layout-aware inverter optimization script and the peaking effect of the CTLE. Finally, there is some analysis of the power consumption for each stage as the performance specifications change.

### 4.1 Photodiode Model

Necessarily, there are a number of packaging parasitics that must be taken into account when analyzing the photodiode model. This adds to the case for the design methodology of this work, which is highly parameterizable and generalizable. A generic photodiode model can be very easily adapted with this flow. As the intended packaging is wire-bonding, the wire-bond parasitics as well as the pad capacitance must all be taken into account, all of which factor in significantly. The wire-bond inductance is critical to seeing an inductive peaking effect, though the effect is not pronounced due to the difficulty in tuning the inductance and general limitations on what wire-bond lengths can actually be realized. To showcase the capability of the proposed design methodology and circuit techniques, this work has investigated three design points, as mentioned prior, focussing on two photodiode models. The models appear as in Figure 4.1.

Looking at the provided models, it is clear that design is hardly as straightforward as when abstracting to just a single photodiode capacitance. Hand analysis becomes extremely complicated. From a simple abstracted model, it may look somewhat like designing with a pure capacitance at the input of the TIA of 350 fF and 200 fF for the models PD1 and PD2, respectively (this is attained by simply adding together the capacitances present in the photodiode and packaging model). These numbers are rather large, as on-chip photonics can achieve numbers on the order of 5-15 fF. From the discussions prior, it makes sense that higher data rates would require a smaller photodiode capacitance with high responsivity. If not, the design would simply be too difficult. This statement comes from both the perspective of supporting a signal with the proper circuit bandwidth as well as from the perspective of noise-limited design, as the noise contributed by the inverter TIA scales directly with the size of the photodiode capacitance when referred back to the input. For both photodiodes, a 10 Gbps link design is targeted. For PD2 with slightly smaller photodiode capacitance, a 25 Gbps design has also been produced, but the large photodiode capacitances do present a significant challenge.

(a) Photodiode and Packaging Model Schematic

|            | PD1          | PD2          |
|------------|--------------|--------------|
| $R_j$      | 5 M$\Omega$  | 5 M$\Omega$  |
| $C_j$      | 250 fF       | 100 fF       |
| $C_p$      | 20 fF        | 20 fF        |
| $R_s$      | 15 $\Omega$  | 20 $\Omega$  |
| $L_s$      | 15 pH        | 15 pH        |
| $L_{wire}$ | 1.5 nH       | 1.5 nH       |
| $C_{pad}$  | 80 fF        | 80 fF        |

(b) Photodiode and Packaging Model Component Parameters

Figure 4.1: Photodiode and Packaging Model Specifications
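
The lumped input capacitances of 350 fF and 200 fF follow directly from the component values in Figure 4.1. The short check below simply sums $C_j$, $C_p$ and $C_{pad}$ for each model; the dictionary layout and function name are illustrative only.

```python
# Capacitances from Figure 4.1, in femtofarads.
PD_MODELS_FF = {
    "PD1": {"C_j": 250, "C_p": 20, "C_pad": 80},
    "PD2": {"C_j": 100, "C_p": 20, "C_pad": 80},
}


def lumped_input_cap_ff(model):
    """Sum every capacitance in the photodiode and packaging model."""
    return sum(model.values())


lumped = {name: lumped_input_cap_ff(caps) for name, caps in PD_MODELS_FF.items()}
print(lumped)
```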

## 4.2 Optimization and Design Specifications

In determining the performance of the optimization script, there must be an analysis not only of the result, but also of the runtime of the algorithm. If it requires exceedingly large amounts of time, then this method of design is no better than the existing one. According to Algorithm 2, the runtime will generally depend on a few factors. First is the time required for each loop, consisting of the runtime of the local design script, generating layouts, running post-layout extraction, and running simulations. There is also the matter of how many loops are required and how many times the design space is partitioned (in terms of inverter size). It takes roughly 10-20 iterations to converge and there are between 5 and 10 partitions, requiring on the order of 100 inverter designs to be evaluated. The runtime of the local design script is very short (less than a minute); layout and schematic generation are also rather quick (at most a few minutes). The simulations are generally AC simulations so they do not take too much time; not until later is a transient simulation run on the post-layout extracted result, which consumes a fairly large amount of time to generate proper eye diagrams. Running post-layout extraction requires significant amounts of time, for the TIA alone often taking on the order of minutes to extract the parasitics for each design. The time also increases with design complexity. Here, parallelization reaps enormous benefits. The time immediately decreases by roughly the number of design space partitions. Overall, the automated design loop finishes within 4 hours. That is the amount of time it takes to optimize the design for a generic set of specifications and output the functioning design in the form of a generated layout and schematic.
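
The figures quoted above can be combined into a quick back-of-the-envelope check. The sketch below only multiplies and divides the stated ranges; treating the 4 hour total as evenly spread over the iterations, with all partitions of an iteration run in parallel, is an assumption.

```python
def design_count(iterations, partitions):
    """Number of inverter designs evaluated: one per partition per iteration."""
    return iterations * partitions


def minutes_per_iteration(total_hours, iterations):
    """Average wall-clock time per iteration if partitions run in parallel."""
    return total_hours * 60.0 / iterations


fewest = design_count(10, 5)
most = design_count(20, 10)
slowest = minutes_per_iteration(4, 10)
fastest = minutes_per_iteration(4, 20)
print(f"{fewest}-{most} designs, {fastest:.0f}-{slowest:.0f} minutes per iteration")
```

The range of 50 to 200 designs is consistent with the stated order of 100 evaluations.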

The desired specifications are given in Figure 4.2. The designs target a nominal 27 °C with a 1.2 V VDD in a 45 nm SOI process. Each has an assumed output pad capacitance of 80 fF and bond wire inductance of 1.5 nH, and each output stage must match a 50 $\Omega$ termination.

|                                                                  | PD1 10 Gbps | PD2 10 Gbps | PD2 25 Gbps |
|------------------------------------------------------------------|-------------|-------------|-------------|
| $f_{3db}$ Bandwidth                                              | 6.5 GHz     | 6.5 GHz     | 15 GHz      |
| $f_{ENBW}$ Equivalent Noise Bandwidth (Brick Wall Approximation) | 10.2 GHz    | 10.2 GHz    | 23.6 GHz    |
| Output Voltage Swing (Peak to Peak)                              | 350 mV      | 350 mV      | 350 mV      |

Figure 4.2: Targeted Design Specifications
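
The tabulated equivalent noise bandwidths are numerically consistent with the single-pole relation $f_{ENBW} = \frac{\pi}{2} f_{3db}$. The source does not state how the values were derived, so the check below is an observation under that assumption rather than a description of the author's method.

```python
import math


def single_pole_enbw(f_3db_hz):
    """Equivalent noise bandwidth of a first-order low-pass response."""
    return math.pi / 2.0 * f_3db_hz


enbw_10g = single_pole_enbw(6.5e9)   # compare with the tabulated 10.2 GHz
enbw_25g = single_pole_enbw(15e9)    # compare with the tabulated 23.6 GHz
print(f"{enbw_10g / 1e9:.2f} GHz, {enbw_25g / 1e9:.2f} GHz")
```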

## 4.3 Optimized Designs

In the optimization of noise, the baseline constraint is proper operation at the desired data rate. For example, the bare minimum required of the PD1 10 Gbps design is that it can operate at the desired 10 Gbps data rate apart from noise concerns. Though proper operation should be gauged with a transient simulation having applied some PRBS signal, the AC frequency domain simulations also provide useful information. Figure 4.3 gives the integrated input referred noise, the full system bandwidth and the full system impedance given as $\frac{V_{out}}{I_{ph}}$. Figure 4.4 provides the performance of each of the sub-blocks. The preamplifier and output stage are both designed with BW much larger than the signal BW. Figures 4.5, 4.6, and 4.7 contain additional AC simulation results.

|                   | PD1 10 Gbps Design         | PD2 10 Gbps Design          | PD2 25 Gbps Design      |
|-------------------|----------------------------|-----------------------------|-------------------------|
| TIA $f_{3db}$     | 4.02 GHz                   | 3.96 GHz                    | 14.3 GHz                |
| TIA $R_{dc}$      | 83.1 dB (14.3 k$\Omega$)   | 88.1 dB (22.9 k$\Omega$)    | 55.4 dB (587 $\Omega$)  |
| Overall $f_{3db}$ | 6.47 GHz                   | 6.5 GHz                     | 15.2 GHz                |
| Overall $R_{dc}$  | 88.4 dB (26.2 k$\Omega$)   | 92.6 dB (42.84 k$\Omega$)   | 59.3 dB (919 $\Omega$)  |
| Brick Wall Noise  | 906.2 nA                   | 586.9 nA                    | 2.154 $\mu$A            |

Figure 4.3: Table Summarizing Overall TIA Performance

|                             | PD1 10 Gbps Design       | PD2 10 Gbps Design       | PD2 25 Gbps Design     |
|-----------------------------|--------------------------|--------------------------|------------------------|
| TIA $R_{dc}$                | 83.1 dB (14.3 k$\Omega$) | 88.1 dB (22.9 k$\Omega$) | 55.4 dB (587 $\Omega$) |
| Preamp Gain $R_{dc}$        | 11.0 dB (3.54)           | 10.9 dB (3.49)           | 8.62 dB (2.70)         |
| Preamp BW $(f_{3db})$       | > 15 GHz                 | > 15 GHz                 | > 15 GHz               |
| CTLE1 Gain $R_{dc}$         | -1.28 dB (0.863)         | -1.30 dB (0.861)         | -1.17 dB (0.874)       |
| CTLE2 Gain $R_{dc}$         | -0.271 dB (0.969)        | -0.868 dB (0.905)        | 0.547 dB (1.07)        |
| Output Stage Gain $R_{dc}$  | -4.11 dB (0.623)         | -4.12 dB (0.622)         | -4.11 dB (0.623)       |
| Output Stage BW $(f_{3db})$ | > 20 GHz                 | > 20 GHz                 | > 20 GHz               |

Figure 4.4: Table Summarizing Sub-block Performance
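The sub-block gains can be cross-checked against the overall transimpedance by summing the stage gains in dB. The sketch below does this for the PD1 10 Gbps and PD2 25 Gbps columns, assuming the transimpedance in dB is $20\log_{10}$ of the value in ohms (consistent with the dB and ohm pairs in the tables); the dictionary and function names are illustrative.

```python
def cascade_db(stage_gains_db):
    """Overall DC gain of a chain of stages, summing gains in dB."""
    return sum(stage_gains_db.values())

def db_to_ohms(r_db):
    """Convert a transimpedance in dB (20*log10 of ohms) back to ohms."""
    return 10 ** (r_db / 20)

# DC gains per stage, in dB, copied from the sub-block table.
pd1_10g = {"TIA": 83.1, "preamp": 11.0, "CTLE1": -1.28, "CTLE2": -0.271, "output": -4.11}
pd2_25g = {"TIA": 55.4, "preamp": 8.62, "CTLE1": -1.17, "CTLE2": 0.547, "output": -4.11}

for name, stages in [("PD1 10 Gbps", pd1_10g), ("PD2 25 Gbps", pd2_25g)]:
    total_db = cascade_db(stages)
    print(f"{name}: {total_db:.1f} dB, about {db_to_ohms(total_db):.0f} ohms")
```

The sums land within rounding of the overall $R_{dc}$ values of 88.4 dB and 59.3 dB reported for these two designs.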

The effect of the preamplifier is visible in subfigures (a). This stage does not affect the shape of the frequency response but simply applies some amount of gain to the plot. Subfigures (b) show the effect of the CTLE stages extending the bandwidth, with their individual frequency responses shown in subfigures (d). The bump in the frequency response is apparent. Subfigures (c) give the overall output together with the output of the second CTLE to confirm that the output stage has minimal impact on the overall $f_{3db}$. The $f_{3db}$ of the overall systems are marked in subfigures (c) and (d). Figure 4.7e contains the frequency response of the preamplifier and output stage by themselves to verify their large bandwidth, with the responses from the PD1 10 Gbps design provided as a representative example.



Figure 4.5: PD1 10 Gbps Design Frequency Domain Response



(a) TIA Frequency Response



(b) Looking at the effect of the CTLE stages in the Frequency Response







(d) CTLE stages Frequency Response marking TIA  $f_{3db}$  and overall system  $f_{3db}$ 

Figure 4.6: PD2 10 Gbps Design Frequency Domain Response



(e) PD1 10 Gbps Design Preamplifier and Output Stage Frequency Response

Figure 4.7: PD2 25 Gbps Design Frequency Domain Response

The frequency domain results are informative, but they are difficult to interpret and map to the actual circuit response. Transient simulations and eye diagrams provide a more accurate picture. A PRBS31 current is applied as the input to the system. Figure 4.8 shows the eye diagrams for the three macro designs (one per set of specifications) operating at maximum swing, with the input current amplitude chosen according to the output swing and the overall system impedance.
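As a rough illustration of how the input current follows from the output swing and the system impedance, the sketch below divides the swing by the overall DC transimpedance. It assumes the maximum swing is the 350 mV peak-to-peak target from the specifications and uses the overall $R_{dc}$ values from Figure 4.3; the function name is illustrative, and the amplitudes actually used in the simulations are not stated in the text.

```python
def input_current_pp_ua(v_out_pp_mv, r_dc_ohms):
    """Peak-to-peak input current (uA) that yields the target swing at DC gain."""
    return v_out_pp_mv * 1e-3 / r_dc_ohms * 1e6

# Overall DC transimpedance per design, in ohms, from the summary table.
designs = [("PD1 10 Gbps", 26.2e3), ("PD2 10 Gbps", 42.84e3), ("PD2 25 Gbps", 919.0)]

for name, r_dc in designs:
    print(f"{name}: {input_current_pp_ua(350.0, r_dc):.1f} uA peak-to-peak")
```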



(b) Output Eye for PD2 10Gbps Design

(c) Output Eye for PD2 25Gbps Design

Figure 4.8: Overall system output eye diagrams for the various design specifications all operating at maximum swing

## 4.4 CTLE Performance

The performance of the CTLE is critical to this work in demonstrating that optimization with the CTLE yields superior noise performance. It is again tied to the signal integrity and functionality of the block: the CTLE exists so that other blocks can be designed at a lower bandwidth. It should therefore follow that the CTLE extends the overall system bandwidth, as is evident from frequency domain simulations, and opens up the eye (increases the eye opening height relative to the overall height of the eye) in eye diagrams from transient simulations. Figures 4.5b, 4.6b, and 4.7b all show how the CTLE stages apply peaking to the transfer function and extend the overall bandwidth (with the output stage causing little bandwidth degradation, as stated earlier). Figures 4.9 and 4.10 show qualitatively how the eye opens gradually through the CTLE stages.



(c) PD1 10Gbps Design CTLE 2 Single-Ended Output Eye

Figure 4.9: Seeing the effect of the CTLE stages on the eye opening. Design based on PD1



(c) PD2 10Gbps Design CTLE 2 Single-Ended Output Eye

Figure 4.10: Seeing the effect of the CTLE stages on the eye opening. Design based on PD2



(b) PD2 25Gbps Design CTLE 2 Differential Output Eye

Figure 4.11: Seeing the effect of the two CTLE stages on the eye opening. Design based on PD2

The 25 Gbps design with PD2 experiences a significantly smaller bandwidth extension in the frequency response. This is apparent from Figure 4.3, comparing the bandwidth of the TIA against the overall bandwidth. The eye still visibly opens up a bit, but the effect is not as pronounced. Continuous time linear equalization is less effective here because the design is impacted by the wire bond inductance, which is visible in the frequency domain response of the 25 Gbps design in Figure 4.7a: there is already some peaking prior to the CTLE stages. As a result, the signal decays at a rate of at least 40 dB per decade (second order roll-off), making compensation rather difficult and necessitating at least two CTLE stages to compensate the dominant poles and see any sort of bandwidth extension. The eye diagrams out of the TIA and following the equalization and output stage are nonetheless provided in Figure 4.11.

The measurements and results so far consider the effect of the CTLE on the signal independent of noise. It is clear that the CTLE is effective in extending the TIA bandwidth and opens up the eye diagram in simulation. The noise effects are not yet as clear. Consider the noise PSD at different stages of the design in Figure 4.12. In the high frequency portion specifically, the bump in noise resulting from the CTLE is evident, showing in practice how and why the CTLE increases the integrated noise. Note that the differential amplifier filters out additional noise, largely without affecting the bandwidth or integrity of the signal, because its own bandwidth is much higher than the signal bandwidth; the higher frequency poles it contributes help to filter out some of the higher frequency noise.



Figure 4.12: Noise PSD for PD1 10 Gbps design at various stages input referred according to the respective midband gain ( $f_{ENBW} = 6.5 \times \frac{\pi}{2}$  GHz is marked)

## 4.5 Noise Performance

The resulting input referred integrated noise numbers were provided in Figure 4.3. However, to capture the full picture and verify that the design is optimal, consider the noise breakdown (the primary noise sources) and the effect of the various stages on the input referred noise power spectral density (PSD). The latter is meant not only to confirm that peaking boosts high frequency noise, but also to demonstrate that this boost is tolerable, as seen in Figure 4.12. Note the bump in high frequency noise in the PSD due to equalization. Some of the high frequency noise is, however, filtered out by the amplification stages; this is seen by comparing the noise PSD at higher frequencies out of the preamplifier against that out of the TIA. The noise PSD plots for the overall designs are provided in Figure 4.13. The plot shows how the designs scale with input capacitance, as is clear from the comparison between the PD1 and PD2 designs at 10 Gbps. The comparison between the designs at 10 Gbps and 25 Gbps showcases the tradeoff between noise and required bandwidth.



Figure 4.13: Noise PSD for the three macro designs ($f_{ENBW,10Gbps} = 6.5 \times \frac{\pi}{2}$ GHz and $f_{ENBW,25Gbps} = 15.2 \times \frac{\pi}{2}$ GHz marked)
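The brick-wall integrated noise can be obtained from a noise PSD by integrating up to $f_{ENBW}$ and taking the square root. The following is a minimal sketch of that step, not the simulator's own calculation: the function name is illustrative, and the flat 10 pA/$\sqrt{\text{Hz}}$ PSD is a hypothetical stand-in for simulated data.

```python
import math

def integrated_noise_rms(freqs_hz, psd_a2_per_hz, f_stop_hz):
    """Trapezoidal integral of a one-sided PSD (A^2/Hz) up to f_stop_hz, in A rms."""
    total = 0.0
    for i in range(len(freqs_hz) - 1):
        f0, f1 = freqs_hz[i], freqs_hz[i + 1]
        s0, s1 = psd_a2_per_hz[i], psd_a2_per_hz[i + 1]
        if f0 >= f_stop_hz:
            break
        if f1 > f_stop_hz:
            # Linearly interpolate the PSD at the brick-wall edge.
            s1 = s0 + (s1 - s0) * (f_stop_hz - f0) / (f1 - f0)
            f1 = f_stop_hz
        total += 0.5 * (s0 + s1) * (f1 - f0)
    return math.sqrt(total)

# Hypothetical flat input referred PSD of (10 pA/sqrt(Hz))^2 from 0 to 20 GHz.
freqs = [i * 1e8 for i in range(201)]
flat_psd = [(10e-12) ** 2] * len(freqs)

f_enbw = 6.5e9 * math.pi / 2  # brick-wall limit marked for the 10 Gbps designs
i_n_rms = integrated_noise_rms(freqs, flat_psd, f_enbw)
print(f"integrated noise: {i_n_rms * 1e9:.0f} nA rms")
```

For a flat PSD the result reduces to the density multiplied by $\sqrt{f_{ENBW}}$; with simulated data, the high frequency bump from equalization would add to the integral in the same way.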

A key element is comparing a design with and without the CTLE for compensation. The comparison is tricky because the design is extremely difficult given the large photodiode capacitances in the given photodiode models. The loosened bandwidth constraint is necessary to perform the optimization, as well as to ensure basic functionality of the TIA at higher data rates. Because the tradeoff involves the CTLE noise and the noise contributed by later stages along the chain, the noise breakdown is very informative. The breakdowns are given in Figures 4.14, 4.15, and 4.16.

| Device                                                  | Param | Noise Contribution | % Of Total |
|---------------------------------------------------------|-------|--------------------|------------|
| XTIA.XXNMOS0_XM0                                        | id    | 0.00878992         | 19.75      |
| XTIA.XXPMOS0_XM0                                        | id    | 0.00657773         | 11.06      |
| XTIA.XXNMOS0_XM0.rbody                                  | rn    | 0.00548368         | 7.69       |
| XTIA.XXPMOS0_XM0.rbody                                  | rn    | 0.00378195         | 3.66       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2499                        | rn    | 0.00334603         | 2.86       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2816                        | rn    | 0.00283755         | 2.06       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2865                        | rn    | 0.00283073         | 2.05       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2510                        | rn    | 0.00279754         | 2.00       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2561                        | rn    | 0.00279331         | 1.99       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2968                        | rn    | 0.00255717         | 1.67       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3211                        | rn    | 0.00255103         | 1.66       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3400                        | rn    | 0.00240481         | 1.48       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3556                        | rn    | 0.00240141         | 1.47       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2817                        | rn    | 0.00237921         | 1.45       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2864                        | rn    | 0.0023735          | 1.44       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2511                        | rn    | 0.00235498         | 1.42       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2560                        | rn    | 0.00235172         | 1.41       |
| XTIA.XXNMOS0_XM0                                        | fn    | 0.00217374         | 1.21       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2500                        | rn    | 0.00209711         | 1.12       |
| XTIA.XXPMOS0_XM0                                        | fn    | 0.00192405         | 0.95       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2975                        | rn    | 0.00187384         | 0.90       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2502                        | rn    | 0.00187259         | 0.90       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3205                        | rn    | 0.00186933         | 0.89       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3405                        | rn    | 0.00184947         | 0.87       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3549                        | rn    | 0.0018469          | 0.87       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2965                        | rn    | 0.00169056         | 0.73       |
| XTIA.XXNMOS1_XM0                                        | id    | 0.00168954         | 0.73       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3213                        | rn    | 0.0016865          | 0.73       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2562                        | rn    | 0.00149891         | 0.57       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2613                        | rn    | 0.00149709         | 0.57       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2970                        | rn    | 0.00146938         | 0.55       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3209                        | rn    | 0.00146588         | 0.55       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2972                        | rn    | 0.00141764         | 0.51       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3207                        | rn    | 0.00141424         | 0.51       |
| XTIA.x_PM_inv_tia_5_3\%Vint\<0\>.r4682                  | rn    | 0.00137893         | 0.49       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3402                        | rn    | 0.00130252         | 0.43       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r3554                        | rn    | 0.00130104         | 0.43       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2512                        | rn    | 0.00128072         | 0.42       |
| XTIA.x_PM_inv_tia_5_3\%Iin.r2559                        | rn    | 0.00127895         | 0.42       |

Figure 4.14: Noise Breakdown for PD1 10 Gbps Design
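The two numeric columns of the breakdown are related through noise power: the percentages are consistent with the square of each contribution divided by the square of the total. The sketch below checks this on the first rows of the PD1 10 Gbps table. The total RMS noise is not listed in the table, so it is inferred here from the first row, which is an assumption; the function name is illustrative.

```python
def percent_of_total(contribution, total_rms):
    """Share of the total noise power carried by one uncorrelated source."""
    return 100.0 * (contribution / total_rms) ** 2

# Noise contributions copied from the first three rows of the breakdown.
nmos_id, pmos_id, nmos_rbody = 0.00878992, 0.00657773, 0.00548368

# Assumption: infer the total RMS noise from the first row (19.75 % of total).
total_rms = nmos_id / (19.75 / 100.0) ** 0.5

print(f"PMOS id:    {percent_of_total(pmos_id, total_rms):.2f} %")
print(f"NMOS rbody: {percent_of_total(nmos_rbody, total_rms):.2f} %")
```

The computed shares agree with the 11.06 % and 7.69 % listed for those rows, which supports reading the percentage column as a power ratio.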

Looking at the noise breakdowns, the primary noise contributors are the transistors of the first inverter of the inverter TIA. The inverter TIA instance is XTIA, with the devices of its first inverter given as XXPMOS0\_XM0 and XXNMOS0\_XM0. This effect is especially pronounced in the designs targeting 10 Gbps operation. The remaining contributors, XTIA.x...%Iin.r, are related to the feedback resistance. At 25 Gbps, the TIA noise is still dominant, but noise from the later stages, namely the first CTLE and the preamplifier stage, begins to show up. It is possible to solve this by increasing the gain of the preamplifier stage or by incorporating an additional lower noise

| Device                                                  | Param | Noise Contribution | % Of Total |
|---------------------------------------------------------|-------|--------------------|------------|
| XTIA.XXNMOS0_XM0                                        | id    | 0.0123405          | 25.73      |
| XTIA.XXPMOS0_XM0                                        | id    | 0.00927793         | 14.54      |
| XTIA.XXNMOS0_XM0.rbody                                  | rn    | 0.0081976          | 11.35      |
| XTIA.XXPMOS0_XM0.rbody                                  | rn    | 0.00471467         | 3.76       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2767                        | rn    | 0.00336512         | 1.91       |
| XTIA.XXNMOS0_XM0                                        | fn    | 0.00299933         | 1.52       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3126                        | rn    | 0.00292577         | 1.45       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3175                        | rn    | 0.00291399         | 1.43       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3282                        | rn    | 0.00270516         | 1.24       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3527                        | rn    | 0.00269428         | 1.23       |
| XTIA.XXPMOS0_XM0                                        | fn    | 0.00266603         | 1.20       |
| XTIA.XXNMOS1_XM0                                        | id    | 0.00260705         | 1.15       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2780                        | rn    | 0.00252728         | 1.08       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2841                        | rn    | 0.00251152         | 1.07       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3127                        | rn    | 0.00244885         | 1.01       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3174                        | rn    | 0.00243903         | 1.01       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3719                        | rn    | 0.00231452         | 0.91       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3905                        | rn    | 0.00230073         | 0.89       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2768                        | rn    | 0.00219729         | 0.82       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2781                        | rn    | 0.00211621         | 0.76       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2840                        | rn    | 0.00210366         | 0.75       |
| XTIA.x_PM_inv_tia_8_4\%Vint\<0\>.r5003                  | rn    | 0.00202806         | 0.69       |
| XTIA.XXPMOS1_XM0                                        | id    | 0.00196218         | 0.65       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3289                        | rn    | 0.0019342          | 0.63       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3519                        | rn    | 0.00192642         | 0.63       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3277                        | rn    | 0.00183788         | 0.57       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3529                        | rn    | 0.00183047         | 0.57       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2772                        | rn    | 0.00179503         | 0.54       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3724                        | rn    | 0.00166902         | 0.47       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3898                        | rn    | 0.00165907         | 0.47       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3284                        | rn    | 0.00159734         | 0.43       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3525                        | rn    | 0.00159103         | 0.43       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2842                        | rn    | 0.00155824         | 0.41       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r2903                        | rn    | 0.00154951         | 0.41       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3286                        | rn    | 0.00153889         | 0.40       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3521                        | rn    | 0.00153272         | 0.40       |
| XTIA.XXNMOS1_XM0.rbody                                  | rn    | 0.00145405         | 0.36       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3721                        | rn    | 0.00135329         | 0.31       |
| XTIA.x_PM_inv_tia_8_4\%Iin.r3903                        | rn    | 0.00134597         | 0.31       |

Figure 4.15: Noise Breakdown for PD2 10 Gbps Design

| Device                                            | Param         | Noise Contribution | % Of Total |
|---------------------------------------------------|---------------|--------------------|------------|
| XTIA.XXNMOS_XM0                                   | id            | 0.000567224        | 8.21       |
| XTIA_DUMMY.XXNMOS_XM0                             | id            | 0.000566795        | 8.20       |
| XPREAMP.XXINPUTN_XM0                              | id            | 0.000470786        | 5.66       |
| XPREAMP.XXINPUTP_XM0                              | id            | 0.000470005        | 5.64       |
| XTIA.XXPMOS_XM0                                   | id            | 0.000421496        | 4.53       |
| XTIA_DUMMY.XXPMOS_XM0                             | id            | 0.000421177        | 4.53       |
| XTIA.XXNMOS_XM0.rbody                             | rn            | 0.000238558        | 1.45       |
| XTIA_DUMMY.XXNMOS_XM0.rbody                       | rn            | 0.000238239        | 1.45       |
| XCTLE1.XXINPUTN_XM0                               | id            | 0.00020772         | 1.10       |
| XCTLE1.XXINPUTP_XM0                               | id            | 0.000207278        | 1.10       |
| XTIA.XXPMOS_XM0.rbody                             | rn            | 0.000170398        | 0.74       |
| XTIA_DUMMY.XXPMOS_XM0.rbody                       | rn            | 0.000170237        | 0.74       |
| XTIA.x_PM_inv_tia_8_0\%Iin.r2066                  | rn            | 0.000154293        | 0.61       |
| XTIA_DUMMY.x_PM_inv_tia_8_0\%Iin.r2066            | rn            | 0.000154194        | 0.61       |
| XTIA.x_PM_inv_tia_8_0\%Iin.r2357                  | rn            | 0.000132658        | 0.45       |
| XTIA_DUMMY.x_PM_inv_tia_8_0\%Iin.r2357            | rn            | 0.000132566        | 0.45       |
| XTIA.x_PM_inv_tia_8_0\%Iin.r2396                  | rn            | 0.000132335        | 0.45       |
| XTIA_DUMMY.x_PM_inv_tia_8_0\%Iin.r2396            | rn            | 0.000132243        | 0.45       |
| XPREAMP.XXINPUTN_XM0.rbody                        | rn            | 0.000126059        | 0.41       |
| XPREAMP.XXINPUTP_XM0.rbody                        | rn            | 0.000125548        | 0.40       |
| XTIA.x_PM_inv_tia_8_0\%Iin.r2499                  | rn            | 0.000124709        | 0.40       |
| XTIA_DUMMY.x_PM_inv_tia_8_0\%Iin.r2499            | rn            | 0.000124622        | 0.40       |
| XTIA.x_PM_inv_tia_8_0\%Iin.r2692                  | rn            | 0.000124406        | 0.39       |
| XTIA.XXXRFBL\<1\>_RR0.r2                          | thermal_noise | 0.000124356        | 0.39       |
| XTIA_DUMMY.x_PM_inv_tia_8_0\%Iin.r2692            | rn            | 0.00012432         | 0.39       |
| XTIA.XXXRFBL\<2\>_RR0.r2                          | thermal_noise | 0.000124317        | 0.39       |
| XTIA.XXXRFBL\<1\>_RR0.r3                          | thermal_noise | 0.000124314        | 0.39       |
| XTIA.XXXRFBR\<0\>_RR0.r2                          | thermal_noise | 0.000124303        | 0.39       |
| XTIA.XXXRFBL\<2\>_RR0.r3                          | thermal_noise | 0.000124275        | 0.39       |
| XTIA.XXXRFBL\<1\>_RR0.r4                          | thermal_noise | 0.000124273        | 0.39       |
| XTIA.XXXRFBR\<2\>_RR0.r2                          | thermal_noise | 0.000124264        | 0.39       |
| XTIA.XXXRFBR\<0\>_RR0.r3                          | thermal_noise | 0.000124261        | 0.39       |
| XTIA.XXXRFBL\<0\>_RR0.r2                          | thermal_noise | 0.000124239        | 0.39       |
| XTIA.XXXRFBL\<2\>_RR0.r4                          | thermal_noise | 0.000124234        | 0.39       |
| XTIA.XXXRFBR\<2\>_RR0.r3                          | thermal_noise | 0.000124222        | 0.39       |
| XTIA.XXXRFBR\<0\>_RR0.r4                          | thermal_noise | 0.00012422         | 0.39       |
| XTIA.XXXRFBL\<0\>_RR0.r3                          | thermal_noise | 0.000124197        | 0.39       |
| XTIA.XXXRFBR\<1\>_RR0.r2                          | thermal_noise | 0.000124186        | 0.39       |
| XTIA.XXXRFBR\<2\>_RR0.r4                          | thermal_noise | 0.000124181        | 0.39       |

Figure 4.16: Noise Breakdown for PD2 25 Gbps Design

gain stage such that the noise of subsequent stages, when referred back to the input, is smaller. Throughout these designs, noise from the transistors is dominant. Thus, as the bandwidth constraint is relaxed, it makes sense to leverage the optimization technique of increasing the transistor size at the cost of the resistor size.

Note also the noise from the dummy TIA that shows up as XTIA\_DUMMY entries. The dummy TIA does not contribute any gain, but contributes the same amount of noise as the TIA in the signal path. Because uncorrelated noise sources add in power when integrated, the dummy TIA increases the noise contributed by the TIAs by a factor of $\sqrt{2}$. To solve this, it is possible to incorporate a Cherry-Hooper amplification stage and tap out one of the internal nodes, similar to the technique used in the 3-inverter TIA in tapping out the input of the third TIA [9]. This analysis focuses on the noise from the TIA and the subsequent amplification and equalization stages. Noise from the power supply is filtered out according to the power supply rejection ratio (PSRR) and the common mode rejection ratio (CMRR) that result from the chain of amplifiers.
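The $\sqrt{2}$ factor follows from summing two equal, uncorrelated noise powers. As a short worked step, with $\overline{i_{n,TIA}^2}$ and $\overline{i_{n,dummy}^2}$ introduced here as illustrative symbols for the integrated noise powers of the signal path TIA and the dummy TIA:

$$
\sqrt{\overline{i_{n,TIA}^2} + \overline{i_{n,dummy}^2}} = \sqrt{2\,\overline{i_{n,TIA}^2}} = \sqrt{2}\,\sqrt{\overline{i_{n,TIA}^2}}, \qquad \text{assuming } \overline{i_{n,dummy}^2} = \overline{i_{n,TIA}^2}.
$$

The near-identical XTIA and XTIA\_DUMMY entries in Figure 4.16 (for example 0.000567224 against 0.000566795 for the NMOS drain noise) are consistent with this equal-contribution assumption.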

## 4.6 Offset Correction

In order to have a functioning design, it is important to take into consideration problems due to process mismatch and variation. These manifest as offset in the common mode voltages as well as offset between the positive and negative DC bias. The offset in common mode voltage heavily affects the performance of the output stage and the output swing; it can also significantly decrease the gain of each of the CTLE stages and of the output stage. These offsets are corrected at the preamplifier stage, early in the chain, by pulling down extra current in one branch of the circuit. The amount of offset can be quantified in terms of offset referred to the output of the TIA (the input voltage of the preamplifier stage). Offset of up to 30 mV is tolerable, as tested in Figure 4.17. These tests are run on the 10 Gbps design using PD1. As the adjustments are made for the common mode and the offset between branches, the performance eventually degrades and the eye closes up. This eye closure can be resolved with higher levels of peaking, as demonstrated in the prior section (Figures 4.9 and 4.10).

## 4.7 Power Consumption

In this work, optimization in terms of power was never the primary concern. Rather, the goal was to achieve as low noise as possible within reasonable constraints for noise and the design of the blocks. The power is reported in Figure 4.18 for operation at a VDD of 1.2 V (VDD1, VDD2, and VDD3 are all at the same voltage). Power consumption is dominated by the TIA and output stage. The TIA of the 25 Gbps design consumes significantly less power because it uses only a single-inverter design. Additionally, with smaller input capacitance, the optimal sizing for the transistor in terms of noise reduction decreases, according to



(b) With 30mV offset applied at the output of the TIA correction

Figure 4.17: Eye Diagrams After Adjusting Offset

the simulations shown earlier in Figure 2.6b, as is visible when comparing the power of the 10 Gbps designs using PD1 and PD2. For this reason, the power contributed by the TIA in the PD2 10 Gbps design is less than that in the PD1 10 Gbps design. The TIAs of both 10 Gbps designs have very similar bandwidth, which is the quantity the equalization blocks are designed around. As a result, the preamplifier design is shared, as are most blocks along the chain. The CTLE stages of the two 10 Gbps designs consume the same amount of power, as only their degeneration capacitance and resistance take on different optimum values.



(a) Power breakdown in uA for PD1 10 Gbps Design (Using a three-inverter TIA)



(b) Power breakdown in uA for PD2 10 Gbps Design (Using a three-inverter TIA)



(c) Power breakdown in uA for PD2 25 Gbps Design (Using a single-inverter TIA)

Figure 4.18: Power breakdown of the different designs comparing how each block contributes to the total power consumption

# Chapter 5

# Packaging and Testing Setup

In order to have a functional system surrounding the TIA macros that have been designed and discussed, there must also be a photodiode present and some way of configuring the biases and scan bits of each macro. The TIA designs sit on a larger chip that will be referred to as the TIA chip. The photodiodes sit on their own chip, which will be referred to as the photodiode chip. To put everything together, the TIA chips and the photodiode chips will be attached to a PCB. The signals of each will be wire-bonded to each other and to the PCB as necessary. The plan is to configure the scan chain with an Opal-Kelly FPGA and then feed in the currents by tuning a potentiometer connected to the relevant VDD or VSS. From there, it will be possible to feed in an optical signal, probe the output, and run the relevant tests to benchmark the performance of the designed TIAs and analyze the optimality of the design script and methodology across different design points.

## 5.1 Wire-Bonding

The TIA chips measure 8.891 mm by 2.031 mm, appearing as in Figure 5.1a. The overall layout of the macros from Figure 3.10 is placed in the bottom right corner as seen in Figure 5.1b. The input pad openings are 70 x 70  $\mu$ m and they are placed with a center to center pitch vertically of 95  $\mu$ m. The spacing between the columns of pads is 180  $\mu$ m center to center. The output pads measure 56 x 34  $\mu$ m. The output pads are meant to be probe pads and the input pads are meant to be wire-bonding pads. The TIA chip is 100  $\mu$ m thick. The output will be probed with a GSSG probe. For the top two designs (PD1 10 Gbps Design and PD2 10 Gbps Design) one of the G (ground) pads is shared.

The photodiode chips are rather small, only $350 \times 350 \ \mu\text{m}$. The diagram appears in Figure 5.2. The pads are $80 \times 80 \ \mu\text{m}$, spaced $104.5 \ \mu\text{m}$ center to center. The photodiode chip is $150 \ \mu\text{m}$ thick.

The bias currents, the scan voltages, and the voltage references must be wire-bonded from the TIA chip to the PCB. The photodiode anode will be attached to the TIA input. The cathode will be wire-bonded to a bias on the board. Both bare chips will be glued



(a) Full TIA chip diagram with input and output pads marked



(b) TIA chip pads highlighting pad spacing and pad location

Figure 5.1: TIA chip diagrams



Figure 5.2: Photodiode chip diagram

directly to the PCB, likely with epoxy. The cross-sectional diagram is provided in Figure 5.3. Rather than wire-bond all of the wires at once and test all three macros per chip, it makes more sense to test only one macro per chip. This eases the constraints for wire-bonding and placement on the PCB. The wire-bonding variants are provided in Figure 5.4. These show the placement of the pads on the PCB and the relative placement of both chips, and also indicate the wire-bonds. There are some variations in length because the wires will be bonded column by column, with the columns as they appear in the figure.



Figure 5.3: Packaging cross section with the chips placed on the PCB (not drawn to scale)


(a) Wire-bonding Diagram for Testing Macro Designed for PD1 at 10 Gbps



(b) Wire-bonding Diagram for Testing Macro Designed for PD2 at 10 Gbps



(c) Wire-bonding Diagram for Testing Macro Designed for PD2 at 25 Gbps

Figure 5.4: Wire-bonding Diagrams for Testing the Various Macros

## 5.2 Testing Setup and PCB Design



Figure 5.5: PCB Diagram

The PCB diagram appears in Figure 5.5. The board size is 6000 x 4000 mils, which amounts to 152.4 x 101.6 mm. The layer stack is given in Figure 5.6. There are no ground planes between the layers; instead, VSS is poured into the top layer, VDD1 into the second layer, VDD2 into the third layer, and VDD3 into the bottom layer to create power planes. The boards have an Electroless Nickel Immersion Gold (ENIG) coating, which is necessary to support wire-bonding. In order to test the chip properly, there must be a way to provide an optical input, provide the proper bias voltages, configure the scan chain, and probe the output. To accommodate this, an optical signal will be provided from the bottom side of the

| Layer | Name           | Material      | Type        | Thickness | Weight | Dk  | Df   |
|-------|----------------|---------------|-------------|-----------|--------|-----|------|
|       | Top Overlay    |               | Overlay     |           |        |     |      |
|       | Top Solder     | Solder Resist | Solder Mask | 0.4 mil   |        | 3.5 |      |
| 1     | Top Layer      |               | Signal      | 1.4 mil   | 1 oz   |     |      |
|       | Dielectric 2   | Core-009      | Core        | 4 mil     |        | 4.5 | 0.02 |
| 2     | Layer 2        | CF-004        | Signal      | 1.378 mil | 1 oz   |     |      |
|       | Dielectric 1   | PP-006        |             | 2.8 mil   |        | 4.1 | 0.02 |
| 3     | Layer 3        | CF-004        | Signal      | 1.378 mil | 1 oz   |     |      |
|       | Dielectric 3   | Core-009      | Core        | 4 mil     |        | 4.5 | 0.02 |
| 4     | Bottom Layer   |               | Signal      | 1.4 mil   | 1 oz   |     |      |
|       | Bottom Solder  | Solder Resist | Solder Mask | 0.4 mil   |        | 3.5 |      |
|       | Bottom Overlay |               | Overlay     |           |        |     |      |

Figure 5.6: PCB Layer Stack

PCB. The output will be probed from the left of the chip. The supplies and the photodiode bias will be connected with banana plugs. There is a fair amount of decoupling capacitance between each power supply and ground to filter out supply noise, particularly high-frequency noise. An LDO or similar voltage regulator can be used to filter out lower-frequency noise if necessary. PD\_BIAS also has decoupling capacitance to VDD1 and VSS. There are also decoupling capacitors between the gate and source nodes of the current mirror transistors on the board; these serve to filter out high-frequency noise. Vias drilled at the corners act as mechanical mounts used to fix the board in place during testing. The headers to the scan chain will be fed in from the right of the PCB. Up to 32 signals are possible (two 8x2 headers to the FPGA), but only a few are needed. With the freedom to choose which headers carry actual signals, only the even headers are used. The corresponding negative header is shorted to VSS to solve signal integrity issues previously found in the ribbon connectors to the FPGA.

Only one bonding site, with its photodiode chip and TIA chip pair, should be tested on a PCB at a time. To provide the proper reference current, there is a set of headers per current and potentiometer that acts like a current multiplexer (MUX): the header selection chooses which unit to test and to which site the current is routed. To allow a wider range of offset correction current, up to three potentiometers can be hooked up. Only one should be enabled and connected at a time. In addition, if there is an issue with a potentiometer or the current coming from it, a current source can be fed directly to the site and into the chip.

The sites marked U1-U6 are the locations of the bonding sites whose wire-bonding diagrams are given in Figure 5.4. The sites are spaced far enough apart, both vertically and horizontally, that the wire-bonds of one site are not disturbed when testing (probing the output of) another. Each site has a photodiode chip, a TIA chip, and a set of wire-bonds. A number of different photodiode chips are to be tested, varying in optical aperture diameter and photodiode capacitance. Some also have a backside lens, which makes it easier to couple light in and potentially offers a lower photodiode capacitance. To test these photodiode chips, a fiber must be fed from the backside of the PCB. This requires a hole to be drilled directly under the photodiode, as indicated in Figure 5.8. The even-numbered sites all have a hole drilled beneath the photodiode chip placement. The specifics of each site are listed in Figure 5.7.

| Bonding Site | Wire-bond Diagram (Corresponding Design to Test) | Hole Drilled Under PD |
|--------------|--------------------------------------------------|-----------------------|
| U1           | PD1 10 Gbps                                      | No                    |
| U2           | PD1 10 Gbps                                      | Yes                   |
| U3           | PD2 10 Gbps                                      | No                    |
| U4           | PD2 10 Gbps                                      | Yes                   |
| U5           | PD2 25 Gbps                                      | No                    |
| U6           | PD2 25 Gbps                                      | Yes                   |

Figure 5.7: Table detailing the differences between the bonding sites





(a) Bonding Site U1 WITHOUT a hole under the photodiode chip placement

(b) Bonding Site U2 WITH a hole under the photodiode chip placement

Figure 5.8: Zooming in to the bonding sites to see the hole placed under the photodiode chip placement.

#### 5.3 Intended Tests

The goal of this work has been to provide a working design that operates with low noise. Thus, the two main components of testing are verifying that the TIA operates properly and analyzing how much noise exists in the system. A laser source and modulator are necessary to generate the input optical signal. The generated light travels along a fiber and is coupled into the photodiode chip. From there, the output signal from the chip (output by the AFE) will be probed and read through an oscilloscope or spectrum analyzer. The oscilloscope should be able to provide eye diagram measurements. The standard deviation of the 0 and 1 levels provides information on the inter-symbol interference (ISI) as well as a measurement of the non-deterministic noise coming from the circuit. These are marked in the sample eye diagram in Figure 5.9.



Figure 5.9: Sample statistical eye diagram with the standard deviations of the 0 and 1 levels marked, each approximated by a Gaussian
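The level statistics described above can be sketched in a few lines of Python. This is a minimal illustration, not the measurement procedure used in this work: the function names, the fixed decision threshold, and the sample voltages are all hypothetical, and the Q-factor expression $(\mu_1 - \mu_0)/(\sigma_0 + \sigma_1)$ is the common textbook definition rather than one taken from this report.

```python
from statistics import mean, stdev

def level_statistics(samples, threshold):
    """Split sampled voltages at a decision threshold and return
    ((mean, std) of the 0 level, (mean, std) of the 1 level)."""
    zeros = [v for v in samples if v < threshold]
    ones = [v for v in samples if v >= threshold]
    return (mean(zeros), stdev(zeros)), (mean(ones), stdev(ones))

def q_factor(level0, level1):
    """Textbook Q-factor: level separation over the sum of the level sigmas."""
    (mu0, sigma0), (mu1, sigma1) = level0, level1
    return (mu1 - mu0) / (sigma0 + sigma1)

# Hypothetical voltages (in volts) sampled at the center of the eye.
samples = [0.01, -0.01, 0.00, 0.02, -0.02, 0.21, 0.19, 0.20, 0.22, 0.18]
level0, level1 = level_statistics(samples, threshold=0.1)
q = q_factor(level0, level1)
print(f"0 level: mean={level0[0]:.3f} V, sigma={level0[1]:.4f} V")
print(f"1 level: mean={level1[0]:.3f} V, sigma={level1[1]:.4f} V")
print(f"Q = {q:.2f}")
```

In practice the oscilloscope reports these histograms directly; the sketch only shows how the quantities relate to one another.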

The noise margin can be determined at the widest opening of the eye (marking peak SNR) according to the standard deviation of the eye height, as marked in Figure 5.10. The eye diagram is useful for benchmarking the basic performance of the AFE, and from the standard deviation of the eye-opening margin, the BER can also be estimated according to Equation 5.1, with the error function *erf* defined in Equation 5.2. Varying the input amplitude and repeating the analysis can also be used to separate the effect of noise from the effect of ISI. In addition to these measurements, the noise can also be measured with a spectrum analyzer. With the circuit properly biased and no input signal applied, the output is probed and sent to the spectrum analyzer. This measures the noise floor and provides information on the shape of the noise.
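As a rough illustration of how a spectrum-analyzer trace relates to the noise seen in the eye, the sketch below integrates a one-sided noise power spectral density over frequency to obtain an RMS noise voltage. It assumes the trace has already been converted to units of V²/Hz (that is, corrected for resolution bandwidth and load impedance). The function name and the flat noise floor are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import math

def integrated_rms_noise(freqs_hz, psd_v2_per_hz):
    """Trapezoidal integration of a one-sided noise PSD (V^2/Hz) over
    frequency; returns the RMS noise voltage in volts."""
    if len(freqs_hz) != len(psd_v2_per_hz) or len(freqs_hz) < 2:
        raise ValueError("need two or more matching frequency/PSD points")
    power = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        power += 0.5 * (psd_v2_per_hz[i] + psd_v2_per_hz[i - 1]) * df
    return math.sqrt(power)

# Hypothetical flat noise floor of 1e-16 V^2/Hz from DC to 10 GHz.
freqs = [i * 1e9 for i in range(11)]
psd = [1e-16] * len(freqs)
v_rms = integrated_rms_noise(freqs, psd)
print(f"Integrated noise: {v_rms * 1e3:.3f} mV rms")
```

A measured trace would not be flat; the shape of the PSD shows where the noise is concentrated, and the integral indicates how much of it reaches the output.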



Figure 5.10: Sample statistical eye diagram with the standard deviation of the eye height (opening) approximated by a Gaussian

$$BER = \frac{1}{2}\left[1 - erf\left(\frac{V_{out,amp}}{\sqrt{2}\sigma}\right)\right] \tag{5.1}$$

$$erf(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} dt \tag{5.2}$$
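The BER estimate can be evaluated numerically. The sketch below assumes the standard Gaussian-noise form $BER = \frac{1}{2}\,erfc\left(V_{out,amp}/(\sqrt{2}\sigma)\right)$, where $erfc(x) = 1 - erf(x)$ is the complementary error function, so that the BER falls as the eye opening grows relative to the noise. The function name and the example ratios are hypothetical.

```python
import math

def ber_from_eye(v_out_amp, sigma):
    """Estimate BER from the eye amplitude and the Gaussian noise sigma
    (same units), using BER = 0.5 * erfc(V / (sqrt(2) * sigma))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return 0.5 * math.erfc(v_out_amp / (math.sqrt(2.0) * sigma))

# Hypothetical amplitude-to-noise ratios; only the ratio V/sigma matters.
for ratio in (3.0, 5.0, 7.0):
    print(f"V/sigma = {ratio:.0f}: BER ~ {ber_from_eye(ratio, 1.0):.2e}")
```

Using `math.erfc` directly, rather than `1 - math.erf(...)`, avoids the loss of precision that occurs when the BER is very small.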

With the three macros, the optimality of the design script can be analyzed by examining how well each design works at its target data rate with the specified photodiode. In addition to measuring the raw performance of the designed TIA macros, the performance of the TIA with respect to noise can also be characterized under different equalization settings. This has been shown in simulation, but it would be interesting to confirm this phenomenon in measurement. This can be done with different configurations of the CTLE degeneration capacitance and resistance. Changing the degeneration resistance alters the DC gain of the CTLE and adjusts how much equalization the CTLE performs. Lowering the DC gain involves increasing the degeneration resistance, which also lowers the frequency of the zero. The pole location does not change, since the dominant pole of the CTLE is the output pole, which remains unchanged. As a result, this amounts to more peaking and more equalization. The degeneration capacitance can also be swept directly to lower the frequency of the zero. After each adjustment, the eye and noise are measured and compared to see the effect of the boost in high-frequency noise from the CTLE.
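The trend described above can be illustrated with a first-order model. The sketch assumes the textbook source-degenerated differential pair, with DC gain $g_m R_D / (1 + g_m R_S / 2)$ and a zero at $1/(2\pi R_S C_S)$; this model, the symbol names, and the component values are assumptions for illustration and are not taken from the designs in this work. The peaking figure is the ideal value and ignores the output pole.

```python
import math

def ctle_parameters(gm, r_s, c_s, r_d):
    """First-order parameters of a source-degenerated CTLE (textbook model).
    Returns (dc_gain, zero_frequency_hz, ideal_peaking_db)."""
    degeneration = 1.0 + gm * r_s / 2.0
    dc_gain = gm * r_d / degeneration
    f_zero = 1.0 / (2.0 * math.pi * r_s * c_s)
    # Ideal peaking: ratio of high-frequency gain to DC gain, output pole ignored.
    peaking_db = 20.0 * math.log10(degeneration)
    return dc_gain, f_zero, peaking_db

# Hypothetical values: gm = 10 mS, R_D = 200 ohm, C_S = 100 fF.
GM, R_D, C_S = 10e-3, 200.0, 100e-15
sweep = [ctle_parameters(GM, r_s, C_S, R_D) for r_s in (200.0, 400.0, 800.0)]
for r_s, (gain, f_zero, peak) in zip((200.0, 400.0, 800.0), sweep):
    print(f"R_S = {r_s:.0f} ohm: DC gain = {gain:.2f}, "
          f"zero = {f_zero / 1e9:.2f} GHz, ideal peaking = {peak:.1f} dB")
```

Under these assumptions, increasing the degeneration resistance lowers both the DC gain and the zero frequency while raising the peaking, which matches the qualitative behavior described in the text.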

## Chapter 6

### **Conclusion and Future Work**

This work has investigated the fundamental problems of TIA design for low noise. At the core of the problem is the noise from the transistors that is heavily impacted by large photodiode capacitance. Lowering that capacitance would ameliorate many of the difficulties at hand. From there, the tradeoff of a low-bandwidth TIA and later equalization is shown to provide a significant improvement in noise in accordance with past literature. A clever solution has also been proposed to obtain a differential mode signal from a single photodiode current input that allows for greater common mode rejection and reduces the noise penalty paid when simply incorporating a dummy photodiode.

Future work involves incorporating a DFE with a number of taps small enough that the power penalty is not too significant. This will allow for an even lower-bandwidth TIA and may allow a multiple-inverter architecture with high photodiode capacitance to operate at a higher data rate. Additional system-wide co-optimization may also be attempted, paying closer attention to inductive peaking techniques and potentially incorporating creative methods of feedback. In terms of implementation, adding a low-gain, high-bandwidth amplifier stage (such as a Cherry-Hooper amplifier) to the 25 Gbps design to eliminate the need for the dummy TIA should prove an interesting exercise. Measurements must also be performed on the designs that have been taped out.

# Bibliography

- [1] E. Chang et al. "BAG2: A process-portable framework for generator-based AMS circuit design". In: (2018), pp. 1–8.
- [2] Low-Pass Filter Brick Wall Filter Equivalents. URL: https://www.ti.com/lit/an/sboal10a/sboal10a.pdf?&ts=1589577943452.
- [3] D. Li et al. "A Low-Noise Design Technique for High-Speed CMOS Optical Receivers". In: *IEEE Journal of Solid-State Circuits* 49.6 (2014), pp. 1437–1447.
- [4] K. T. Settaluri et al. "First Principles Optimization of Opto-Electronic Communication Links". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 64.5 (2017), pp. 1270–1283.
- [5] H. Escid, Sonia Salhi, and Abdelhalim Slimane. "Bandwidth enhancement for 0.18 um CMOS transimpedance amplifier circuit". In: *2013 25th International Conference on Microelectronics (ICM)* (2013), pp. 1–4.
- [6] Chia-Hsin Wu et al. "CMOS wideband amplifiers using multiple inductive-series peaking technique". In: *IEEE Journal of Solid-State Circuits* 40.2 (2005), pp. 548–552.
- [7] S. Gondi and B. Razavi. "Equalization and Clock and Data Recovery Techniques for 10-Gb/s CMOS Serial-Link Receivers". In: *IEEE Journal of Solid-State Circuits* 42.9 (2007), pp. 1999–2011.
- [8] S. S. Mohan et al. "Bandwidth extension in CMOS with optimized on-chip inductors". In: *IEEE Journal of Solid-State Circuits* 35.3 (2000), pp. 346–355.
- [9] K. R. Lakshmikumar et al. "A Process and Temperature Insensitive CMOS Linear TIA for 100 Gb/s/λ PAM-4 Optical Links". In: *IEEE Journal of Solid-State Circuits* 54.11 (2019), pp. 3180–3190.