### Tijana Zrnic

EECS Department

University of California, Berkeley

Technical Report No. UCB/EECS-2023-65

May 5, 2023

### http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-65.pdf

Classical machine learning and statistics are built on the paradigm that there is a fixed quantity that we want to learn about a population, such as the best predictor of outcomes from features or the average effect of a treatment. In modern practice, however, predictions and inferences beget other predictions and inferences, causing the quantity of interest to change over time and drift away in a feedback loop. This feedback poses challenges for traditional methods and calls for new solutions. This thesis introduces new principles for prediction and inference in the presence of feedback loops.

The first part focuses on performative prediction. Performative prediction formalizes the phenomenon that predictive models, by being used to make consequential downstream decisions, often influence the outcomes they aim to predict in the first place. For example, travel time estimates on navigation apps influence traffic patterns and thus realized travel times, and stock price predictions influence trading activity and hence prices. We examine common heuristics such as retraining, as well as more refined optimization strategies for dealing with performative feedback. At the end of the first part, we identify important scenarios where the act of prediction triggers feedback loops that are not explained by the framework of performativity, and we develop theory to describe and study such feedback.
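The retraining heuristic can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: outcomes are drawn from a distribution whose mean is `mu + eps * theta` when the deployed prediction is `theta`, the loss is squared error, and the names `repeated_retraining`, `mu`, and `eps` are hypothetical. Under squared error the best prediction is the mean of the current distribution, so each retraining round has a closed form.

```python
def repeated_retraining(mu: float, eps: float, theta0: float = 0.0, steps: int = 50) -> list[float]:
    """Retrain repeatedly on the distribution induced by the previously deployed model.

    Toy assumption: deploying theta shifts the outcome mean to mu + eps * theta.
    """
    thetas = [theta0]
    for _ in range(steps):
        # Squared-error risk is minimized by the mean of the current distribution,
        # which depends on the model deployed in the previous round.
        thetas.append(mu + eps * thetas[-1])
    return thetas


trajectory = repeated_retraining(mu=1.0, eps=0.5)
# Fixed point of theta = mu + eps * theta, i.e. a model that is optimal
# for the distribution it induces.
stable_point = 1.0 / (1.0 - 0.5)
print(trajectory[:4], trajectory[-1], stable_point)
```

In this toy model the iterates settle at `mu / (1 - eps)` whenever `abs(eps) < 1`; a stronger feedback effect (`abs(eps) >= 1`) would prevent retraining from converging.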

The second part discusses principles for valid statistical inference, i.e., valid p-values and confidence intervals, in the presence of feedback. We consider two types of feedback: the first is due to data snooping, i.e., the practice of choosing which results to report only after seeing the data; the second arises when machine-learning systems are used to supply cheap predictions to augment or supplant high-quality data in future scientific analyses. In both cases, ignoring the feedback and naively applying classical statistical methods leads to inflated error rates and false discoveries; we provide alternative approaches that guarantee valid inferences in the face of feedback.
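A small calculation shows how data snooping can inflate error rates. The setup is an assumption chosen for illustration, not the thesis's analysis: an analyst runs `num_tests` independent tests, every null hypothesis is true, and only the smallest p-value is reported. The Bonferroni correction shown for comparison is a classical remedy and is not claimed to be the approach developed in the thesis.

```python
def snooping_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that the smallest of num_tests independent null p-values falls below alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_false_positive_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Same chance when each test is instead run at level alpha / num_tests."""
    return 1.0 - (1.0 - alpha / num_tests) ** num_tests


naive = snooping_false_positive_rate(20)
corrected = bonferroni_false_positive_rate(20)
print(f"naive: {naive:.3f}, Bonferroni: {corrected:.3f}")
```

With 20 tests at a nominal level of 0.05, reporting only the best-looking result yields a false discovery roughly 64% of the time, while the corrected procedure stays below 5%.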

**Advisors:** Michael Jordan and Moritz Hardt

BibTeX citation:

@phdthesis{Zrnic:EECS-2023-65,
  Author = {Zrnic, Tijana},
  Title = {Prediction and Statistical Inference in Feedback Loops},
  School = {EECS Department, University of California, Berkeley},
  Year = {2023},
  Month = {May},
  URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-65.html},
  Number = {UCB/EECS-2023-65},
  Abstract = {Classical machine learning and statistics are built on the paradigm that there is a fixed quantity that we want to learn about a population, such as the best predictor of outcomes from features or the average effect of a treatment. In modern practices, however, predictions and inferences beget other predictions and inferences, causing the quantity of interest to change over time and drift away in a feedback loop. The feedback poses challenges for traditional methods, calling for new solutions. This thesis introduces new principles for prediction and inference in the presence of feedback loops. The first part focuses on performative prediction. Performative prediction formalizes the phenomenon that predictive models---by means of being used to make consequential downstream decisions---often influence the outcomes they aim to predict in the first place. For example, travel time estimates on navigation apps influence traffic patterns and thus realized travel times, stock price predictions influence trading activity and hence prices. We examine common heuristics such as retraining, as well as more refined optimization strategies for dealing with performative feedback. At the end of the first part, we identify important scenarios where the act of prediction triggers feedback loops that are not explained by the framework of performativity, and we develop theory to describe and study such feedback. The second part discusses principles for valid statistical inference, i.e., valid p-values and confidence intervals, in the presence of feedback. We consider two types of feedback: the first is due to data snooping, i.e., the practice of choosing which results to report only after seeing the data; the second arises when machine-learning systems are used to supply cheap predictions to augment or supplant high-quality data in future scientific analyses. In both cases, ignoring the feedback and naively applying classical statistical methods leads to inflated error rates and false discoveries; we provide alternative approaches that guarantee valid inferences in the face of feedback.}
}

EndNote citation:

%0 Thesis
%A Zrnic, Tijana
%T Prediction and Statistical Inference in Feedback Loops
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 5
%@ UCB/EECS-2023-65
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-65.html
%F Zrnic:EECS-2023-65