Treating Models Better for Language-agnostic Understanding
Brian Yu
EECS Department, University of California, Berkeley
Technical Report No. UCB/EECS-2023-148
May 12, 2023
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.pdf
State-of-the-art foundation language models have many strengths that are undervalued. Simultaneously, multilingual NLP lacks a clear goal. In this paper, we propose language-agnostic understanding as the goal of multilingual NLP and demonstrate that leveraging foundation language model strengths directly improves on this goal. We reformulate inputs during supervised finetuning to better leverage foundation language model strengths. We obtain significant improvements on challenging translation tasks compared to a baseline mT5 setup. On a Classical Tibetan to English translation task, these reformulations improve performance by up to 2.8 BLEU. On the Flores200 translation benchmark, these reformulations improve performance by up to 3.1 chrF++. Our research reveals insights into how models learn from different inputs, enabling more effective training to scalably improve state-of-the-art performance. We hope our research inspires further work that leverages foundation language model strengths and further work on language-agnostic understanding. Our experiments are released here.
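For readers unfamiliar with the metrics cited above, the following is a minimal sketch of how corpus-level BLEU and chrF++ are commonly computed with the sacrebleu library (version 2.x API assumed). The toy hypothesis and reference sentences are hypothetical illustrations, not data from the report, and the report's actual evaluation pipeline may differ.

# Sketch of BLEU and chrF++ scoring with sacrebleu (2.x API assumed).
# The example sentences below are illustrative only.
from sacrebleu.metrics import BLEU, CHRF

# System outputs and reference translations, aligned by index.
hypotheses = ["The monk went to the monastery."]
references = ["The monk traveled to the monastery."]

bleu = BLEU()
chrf_pp = CHRF(word_order=2)  # word_order=2 selects the chrF++ variant

# corpus_score takes the hypotheses and a list of reference streams.
print("BLEU:  ", bleu.corpus_score(hypotheses, [references]).score)
print("chrF++:", chrf_pp.corpus_score(hypotheses, [references]).score)

Higher scores indicate closer overlap with the references; "up to 2.8 BLEU" and "up to 3.1 chrF++" in the abstract refer to gains in these corpus-level scores over the baseline mT5 setup.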
Advisors: Kurt Keutzer and John DeNero
BibTeX citation:
@mastersthesis{Yu:EECS-2023-148,
    Author = {Yu, Brian},
    Editor = {Keutzer, Kurt and DeNero, John},
    Title = {Treating Models Better for Language-agnostic Understanding},
    School = {EECS Department, University of California, Berkeley},
    Year = {2023},
    Month = {May},
    Url = {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.html},
    Number = {UCB/EECS-2023-148},
    Abstract = {State-of-the-art foundation language models have many strengths that are under-valued. Simultaneously, multilingual NLP lacks a clear goal. In this paper, we propose language-agnostic understanding as the goal of multilingual NLP and demonstrate that leveraging foundation language model strengths directly improves on this goal. We reformulate inputs during supervised finetuning to better leverage foundation language model strengths. We obtain significant improvements on challenging translation tasks compared to a baseline mT5 setup. On a Classical Tibetan to English translation task, these reformulations improve performance up to 2.8 BLEU. On the Flores200 translation benchmark, these reformulations improve performance up to 3.1 chrF++. Our research reveals insights into how models learn from different inputs, enabling more effective training to scalably improve state-of-the-art performance. We hope our research inspires further work that leverages foundation language model strengths and further work on language-agnostic understanding. Our experiments are released here.},
}
EndNote citation:
%0 Thesis
%A Yu, Brian
%E Keutzer, Kurt
%E DeNero, John
%T Treating Models Better for Language-agnostic Understanding
%I EECS Department, University of California, Berkeley
%D 2023
%8 May 12
%@ UCB/EECS-2023-148
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2023/EECS-2023-148.html
%F Yu:EECS-2023-148