Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
Harry Zhao
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-96
May 11, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.pdf
We studied the training dynamics and internal interpretability of transformer models by formulating an algorithmically generated in-context learning task and training small models that learn to generalize the task with 100% test accuracy. We found clear phase-change behavior suggestive of emergent abilities, attention patterns that are invariant across different one-attention-head models, and evidence that whether a training run will converge is determined early in training. We also identified promising directions for future work on transformer models, both on small models and on generalizing these findings to larger models.
Advisor: Daniel Klein
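This page does not reproduce the report's task construction, but as a rough illustration, below is a minimal sketch of one common style of algorithmically generated in-context learning dataset: each example presents a few key-value pairs in context followed by a query key, and the target is the value paired with that key. All names and parameters here (make_icl_example, make_dataset, num_pairs, vocab_size) are illustrative assumptions, not the report's actual setup.

import numpy as np

def make_icl_example(num_pairs=4, vocab_size=32, rng=None):
    """Build one example: tokens [k1, v1, k2, v2, ..., kq]; target = value paired with kq."""
    rng = rng or np.random.default_rng()
    keys = rng.choice(vocab_size, size=num_pairs, replace=False)   # distinct keys
    values = rng.integers(0, vocab_size, size=num_pairs)           # arbitrary values
    context = np.stack([keys, values], axis=1).reshape(-1)         # interleave k1, v1, k2, v2, ...
    query_idx = rng.integers(0, num_pairs)                         # which key to query
    tokens = np.concatenate([context, [keys[query_idx]]])
    target = values[query_idx]
    return tokens, target

def make_dataset(n_examples=10000, **kw):
    """Stack many examples into fixed-length input and target arrays."""
    rng = np.random.default_rng(0)
    examples = [make_icl_example(rng=rng, **kw) for _ in range(n_examples)]
    x = np.stack([tokens for tokens, _ in examples])
    y = np.array([target for _, target in examples])
    return x, y

if __name__ == "__main__":
    x, y = make_dataset(n_examples=8)
    print(x.shape, y.shape)   # (8, 9) (8,)

A small transformer trained on such data must learn to attend from the query token back to the matching key and read off the adjacent value, which is the kind of behavior attention-pattern analyses of one-attention-head models can probe.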
BibTeX citation:
@mastersthesis{Zhao:EECS-2023-96,
Author= {Zhao, Harry},
Title= {Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets},
School= {EECS Department, University of California, Berkeley},
Year= {2023},
Month= {May},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html},
Number= {UCB/EECS-2023-96},
Abstract= {We aimed to study the training dynamics and the internal interpretability of transformer models by
formulating an algorithmically generated in-context learning task and training small models that
can learn to generalize the task with 100% test accuracy. We found clear indications of phase change
behavior that are indicative of emergent abilities, invariant attention patterns across different
one-attention-head models, and early determination during training of convergence probability.
We found promising future work directions for further studying transformer models, both small
models and generalizations on larger models.},
}
EndNote citation:
%0 Thesis
%A Zhao, Harry
%T Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html
%F Zhao:EECS-2023-96