Katie Kang
EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2025-91
May 16, 2025
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-91.pdf
Deep learning models, particularly Large Language Models (LLMs), have achieved remarkable capabilities, yet their reliability is often hindered by our limited understanding of how they generalize to unseen data. Failures on novel inputs, such as factual inaccuracies or deviations from instructions, can lead to safety vulnerabilities in real-world applications. This dissertation confronts this challenge by investigating how different aspects of the training process influence generalization and extrapolation in deep neural networks, with a specific focus on LLMs. The core objective is twofold: first, to characterize how elements of the learning recipe shape model behavior on both in-distribution and out-of-distribution data, and second, to develop strategies for steering generalization to enhance performance and robustness on unseen examples.
A model's generalization behavior varies depending on the training recipe and the evaluation data. This thesis studies this behavior from different angles, progressing from standard deep neural networks, often optimized in a single stage, to modern LLMs, which typically undergo multiple stages of pretraining and finetuning. First, we study the extrapolation tendencies of standard deep networks when presented with inputs distributionally different from their training data. Challenging the assumption of erratic out-of-distribution behavior, this work demonstrates that these networks often exhibit structured and predictable extrapolation patterns, tending towards constant outputs that can be systematically linked to properties of the training data and the loss function used. Subsequently, this thesis examines hallucination in LLMs, finding that unfamiliar examples in finetuning data critically influence factually incorrect outputs and that modifying their supervision can mitigate these errors. Finally, the work explores the acquisition of generalizable mathematical reasoning skills, revealing that learning dynamics, particularly the accuracy a model reaches on training examples before it memorizes their solutions, strongly correlate with the model's performance on held-out examples. Collectively, these investigations offer a more nuanced understanding of generalization, contributing towards the development of more predictable and reliable deep learning systems.
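To make the first finding more concrete, the following is a minimal sketch, not drawn from the dissertation itself: the synthetic data, architecture, and hyperparameters are illustrative assumptions. It probes whether a classifier trained with cross-entropy drifts, on inputs far from its training distribution, toward the constant prediction that minimizes the training loss, which for cross-entropy is the marginal distribution of the training labels.

# Sketch: probe whether out-of-distribution outputs revert toward the
# loss-minimizing constant prediction (the training-label marginal under cross-entropy).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic, class-imbalanced training data: two 2-D Gaussian blobs (70% / 30%).
n0, n1 = 700, 300
x_train = torch.cat([torch.randn(n0, 2) + torch.tensor([2.0, 0.0]),
                     torch.randn(n1, 2) + torch.tensor([-2.0, 0.0])])
y_train = torch.cat([torch.zeros(n0, dtype=torch.long),
                     torch.ones(n1, dtype=torch.long)])

# Small MLP classifier trained with cross-entropy.
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    F.cross_entropy(model(x_train), y_train).backward()
    opt.step()

# The constant distribution minimizing expected cross-entropy is the label marginal.
label_marginal = torch.bincount(y_train).float() / len(y_train)

# Probe inputs far from both training blobs.
x_ood = 20.0 * torch.randn(1000, 2) + torch.tensor([0.0, 50.0])
with torch.no_grad():
    p_ood = F.softmax(model(x_ood), dim=-1)

print("training label marginal:", label_marginal.tolist())
print("mean OOD softmax output:", p_ood.mean(dim=0).tolist())

Comparing the mean out-of-distribution softmax output with the training-label marginal gives a rough check of the reversion-to-constant behavior described above; the degree of reversion in practice will depend on the architecture, the training data, and how far the probe inputs lie from the training distribution.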
Advisors: Claire Tomlin and Sergey Levine
";
?>
BibTeX citation:
@phdthesis{Kang:EECS-2025-91,
    Author = {Kang, Katie},
    Title = {Steering How Deep Neural Networks Generalize},
    School = {EECS Department, University of California, Berkeley},
    Year = {2025},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-91.html},
    Number = {UCB/EECS-2025-91}
}
EndNote citation:
%0 Thesis
%A Kang, Katie
%T Steering How Deep Neural Networks Generalize
%I EECS Department, University of California, Berkeley
%D 2025
%8 May 16
%@ UCB/EECS-2025-91
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-91.html
%F Kang:EECS-2025-91