Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
Harry Zhao
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-96
May 11, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.pdf
We studied the training dynamics and internal interpretability of transformer models by formulating an algorithmically generated in-context learning task and training small models that learn to generalize on the task with 100% test accuracy. We found clear phase-change behavior indicative of emergent abilities, attention patterns that are invariant across different one-attention-head models, and evidence that whether a training run will converge is largely determined early in training. We also identified promising directions for future work on transformer models, both for small models and for generalizing these findings to larger models.
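As an illustration of what an algorithmically generated in-context learning task can look like, the sketch below samples a fresh random symbol-to-label mapping for every sequence, so a model can only answer the final query by inferring the mapping from the in-context pairs. The specific task, vocabulary sizes, and sequence lengths here are assumptions made for the sketch, not the construction used in the report.

    # Illustrative sketch only: one common way to build an algorithmically
    # generated in-context learning dataset. A fresh random mapping is drawn
    # per example, so the query's answer is determined solely by the context.
    import numpy as np

    def make_icl_example(rng, num_symbols=8, num_labels=4, context_pairs=6):
        """Return one sequence of (symbol, label) pairs plus a query symbol."""
        mapping = rng.integers(0, num_labels, size=num_symbols)  # fresh task per example
        symbols = rng.integers(0, num_symbols, size=context_pairs + 1)
        tokens = []
        for s in symbols[:-1]:
            # offset labels into their own token range so symbols and labels don't collide
            tokens.extend([int(s), int(mapping[s]) + num_symbols])
        query = int(symbols[-1])
        target = int(mapping[query]) + num_symbols
        return tokens + [query], target

    rng = np.random.default_rng(0)
    seq, target = make_icl_example(rng)
    print(seq, "->", target)

Because the mapping is resampled for every example in this sketch, memorizing any fixed symbol-to-label association yields only chance accuracy; reaching 100% test accuracy requires genuinely inferring the mapping from the context.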
Advisors: Daniel Klein
BibTeX citation:
@mastersthesis{Zhao:EECS-2023-96,
    Author = {Zhao, Harry},
    Title = {Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html},
    Number = {UCB/EECS-2023-96},
    Abstract = {We studied the training dynamics and internal interpretability of transformer models by formulating an algorithmically generated in-context learning task and training small models that learn to generalize on the task with 100% test accuracy. We found clear phase-change behavior indicative of emergent abilities, attention patterns that are invariant across different one-attention-head models, and evidence that whether a training run will converge is largely determined early in training. We also identified promising directions for future work on transformer models, both for small models and for generalizing these findings to larger models.}
}
EndNote citation:
%0 Thesis
%A Zhao, Harry
%T Investigating Training Dynamics of Transformer Models on Algorithmically Generated In-Context Learning Datasets
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 11
%@ UCB/EECS-2023-96
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-96.html
%F Zhao:EECS-2023-96