Treating Models Better for Language-agnostic Understanding
Brian Yu
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-148
May 12, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.pdf
State-of-the-art foundation language models have many strengths that are undervalued. Simultaneously, multilingual NLP lacks a clear goal. In this paper, we propose language-agnostic understanding as the goal of multilingual NLP and demonstrate that leveraging foundation language model strengths directly improves on this goal. We reformulate inputs during supervised finetuning to better leverage foundation language model strengths. We obtain significant improvements on challenging translation tasks compared to a baseline mT5 setup. On a Classical Tibetan to English translation task, these reformulations improve performance by up to 2.8 BLEU. On the Flores200 translation benchmark, these reformulations improve performance by up to 3.1 chrF++. Our research reveals insights into how models learn from different inputs, enabling more effective training to scalably improve state-of-the-art performance. We hope our research inspires further work that leverages foundation language model strengths and further work on language-agnostic understanding. Our experiments are released here.
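For readers unfamiliar with the metrics cited above, the following is a minimal sketch of how corpus-level BLEU and chrF++ are commonly computed with the sacrebleu library (version 2.x API assumed). The toy hypothesis and reference sentences are hypothetical illustrations, not data from the report, and the report's actual evaluation pipeline may differ.

# Sketch of BLEU and chrF++ scoring with sacrebleu (2.x API assumed).
# The example sentences below are illustrative only.
from sacrebleu.metrics import BLEU, CHRF

# System outputs and reference translations, aligned by index.
hypotheses = ["The monk went to the monastery."]
references = ["The monk traveled to the monastery."]

bleu = BLEU()
chrf_pp = CHRF(word_order=2)  # word_order=2 selects the chrF++ variant

# corpus_score takes the hypotheses and a list of reference streams.
print("BLEU:  ", bleu.corpus_score(hypotheses, [references]).score)
print("chrF++:", chrf_pp.corpus_score(hypotheses, [references]).score)

Higher scores indicate closer overlap with the references; "up to 2.8 BLEU" and "up to 3.1 chrF++" in the abstract refer to gains in these corpus-level scores over the baseline mT5 setup.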
Advisors: Kurt Keutzer and John DeNero
BibTeX citation:
@mastersthesis{Yu:EECS-2023-148,
    Author = {Yu, Brian},
    Editor = {Keutzer, Kurt and DeNero, John},
    Title = {Treating Models Better for Language-agnostic Understanding},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.html},
    Number = {UCB/EECS-2023-148},
    Abstract = {State-of-the-art foundation language models have many strengths that are under-valued. Simultaneously, multilingual NLP lacks a clear goal. In this paper, we propose language-agnostic understanding as the goal of multilingual NLP and demonstrate that leveraging foundation language model strengths directly improves on this goal. We reformulate inputs during supervised finetuning to better leverage foundation language model strengths. We obtain significant improvements on challenging translation tasks compared to a baseline mT5 setup. On a Classical Tibetan to English translation task, these reformulations improve performance up to 2.8 BLEU. On the Flores200 translation benchmark, these reformulations improve performance up to 3.1 chrF++. Our research reveals insights into how models learn from different inputs, enabling more effective training to scalably improve state-of-the-art performance. We hope our research inspires further work that leverages foundation language model strengths and further work on language-agnostic understanding. Our experiments are released here.},
}
EndNote citation:
%0 Thesis
%A Yu, Brian
%E Keutzer, Kurt
%E DeNero, John
%T Treating Models Better for Language-agnostic Understanding
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 12
%@ UCB/EECS-2023-148
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.html
%F Yu:EECS-2023-148