Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets

Harry Zhao

EECS Department
University of California, Berkeley
Technical Report No. UCB/EECS-2023-96
May 11, 2023

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.pdf

We studied the training dynamics and internal interpretability of transformer models by formulating an algorithmically generated in-context learning task and training small models that learn to generalize on the task with 100% test accuracy. We found clear signs of phase-change behavior indicative of emergent abilities, attention patterns that are invariant across different one-attention-head models, and evidence that a model's probability of convergence is determined early in training. We also identified promising directions for future work on transformer models, covering both small models and generalization to larger models.

Advisor: Daniel Klein


BibTeX citation:

@mastersthesis{Zhao:EECS-2023-96,
    Author = {Zhao, Harry},
    Title = {Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    URL = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html},
    Number = {UCB/EECS-2023-96},
    Abstract = {We studied the training dynamics and internal interpretability of transformer models by
formulating an algorithmically generated in-context learning task and training small models that
learn to generalize on the task with 100% test accuracy. We found clear signs of phase-change
behavior indicative of emergent abilities, attention patterns that are invariant across different
one-attention-head models, and evidence that a model's probability of convergence is determined
early in training. We also identified promising directions for future work on transformer models,
covering both small models and generalization to larger models.}
}

EndNote citation:

%0 Thesis
%A Zhao, Harry
%T Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html
%F Zhao:EECS-2023-96