Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
Harry Zhao
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-96
May 11, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.pdf
We studied the training dynamics and internal interpretability of transformer models by formulating an algorithmically generated in-context learning task and training small models that learn to generalize the task with 100% test accuracy. We found clear phase-change behavior suggestive of emergent abilities, attention patterns that are invariant across different one-attention-head models, and evidence that whether a training run will converge is determined early in training. We also identified promising directions for future work on transformer models, both on small models and on generalizing these findings to larger models.
Advisor: Daniel Klein
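This page does not reproduce the report's task construction, but as a rough illustration, below is a minimal sketch of one common style of algorithmically generated in-context learning dataset: each example presents a few key-value pairs in context followed by a query key, and the target is the value paired with that key. All names and parameters here (make_icl_example, make_dataset, num_pairs, vocab_size) are illustrative assumptions, not the report's actual setup.

import numpy as np

def make_icl_example(num_pairs=4, vocab_size=32, rng=None):
    """Build one example: tokens [k1, v1, k2, v2, ..., kq]; target = value paired with kq."""
    rng = rng or np.random.default_rng()
    keys = rng.choice(vocab_size, size=num_pairs, replace=False)   # distinct keys
    values = rng.integers(0, vocab_size, size=num_pairs)           # arbitrary values
    context = np.stack([keys, values], axis=1).reshape(-1)         # interleave k1, v1, k2, v2, ...
    query_idx = rng.integers(0, num_pairs)                         # which key to query
    tokens = np.concatenate([context, [keys[query_idx]]])
    target = values[query_idx]
    return tokens, target

def make_dataset(n_examples=10000, **kw):
    """Stack many examples into fixed-length input and target arrays."""
    rng = np.random.default_rng(0)
    examples = [make_icl_example(rng=rng, **kw) for _ in range(n_examples)]
    x = np.stack([tokens for tokens, _ in examples])
    y = np.array([target for _, target in examples])
    return x, y

if __name__ == "__main__":
    x, y = make_dataset(n_examples=8)
    print(x.shape, y.shape)   # (8, 9) (8,)

A small transformer trained on such data must learn to attend from the query token back to the matching key and read off the adjacent value, which is the kind of behavior attention-pattern analyses of one-attention-head models can probe.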
BibTeX citation:
@mastersthesis{Zhao:EECS-2023-96,
Author= {Zhao, Harry},
Title= {Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets},
School= {EECS Department, University of California, Berkeley},
Year= {2023},
Month= {May},
Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html},
Number= {UCB/EECS-2023-96},
Abstract= {We aimed to study the training dynamics and the internal interpretability of transformer models by
formulating an algorithmically generated in-context learning task and training small models that
can learn to generalize the task with 100% test accuracy. We found clear indications of phase change
behavior that are indicative of emergent abilities, invariant attention patterns across different
one-attention-head models, and early determination during training of convergence probability.
We found promising future work directions for further studying transformer models, both small
models and generalizations on larger models.},
}
EndNote citation:
%0 Thesis
%A Zhao, Harry
%T Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html
%F Zhao:EECS-2023-96