Samarth Goel

EECS Department, University of California, Berkeley

Technical Report No. UCB/EECS-2024-84

May 10, 2024

http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-84.pdf

With the increasing use of text similarity measures in conjunction with Large Language Models (LLMs), greater scrutiny and better evaluation methodologies are needed to ensure the correct metric choice for a given task. In this thesis, I evaluate the robustness of text similarity measures and their alignment with a human understanding of semantic similarity, and I assess the effectiveness of popular LLMs in maintaining semantic understanding. My core contributions are as follows. I develop and introduce the Unified Semantic Similarity Metric Benchmark (USMB), a novel leaderboard for text similarity metrics composed of 10+ datasets and original tasks measuring human preference alignment, robustness, sensitivity, and clustering performance. My next contribution is an ensembled text similarity measure that achieves top scores on all tasks composing the USMB, beating the previously measured best overall score by 48.2%. I also demonstrate the robustness of this ensembled measure on popular information retrieval tasks. Lastly, I contribute a new LLM benchmarking task, titled Semantic Elasticity, a generalization of summarization that measures a model's ability to compress and expand information, and I quantify the performance of 6 popular LLMs on this task. I hope that through this work, greater attention will be given to the performance gains available through proper metric treatment and selection, and that the field's ability to measure semantic similarity advances as a result.
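The abstract does not spell out the ensemble's components, but the general idea of an ensembled text similarity measure can be sketched as a weighted combination of complementary metrics. The component metrics (jaccard_similarity, char_similarity) and the weights below are illustrative assumptions for the sketch, not the thesis's actual configuration:

# Minimal sketch of an ensembled text similarity measure.
# The component metrics and weights are hypothetical, chosen only
# to illustrate the weighted-combination pattern.
from difflib import SequenceMatcher

def jaccard_similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) similarity in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, a, b).ratio()

def ensemble_similarity(a: str, b: str, weights=(0.5, 0.5)) -> float:
    """Weighted combination of the component metrics (illustrative weights)."""
    scores = (jaccard_similarity(a, b), char_similarity(a, b))
    return sum(w * s for w, s in zip(weights, scores))

if __name__ == "__main__":
    # Two paraphrases should score higher than unrelated sentences.
    print(ensemble_similarity("the cat sat on the mat",
                              "a cat is sitting on the mat"))

In practice, the weights of such an ensemble would be tuned on human-preference data, which is consistent with the benchmark's emphasis on human preference alignment, though the actual tuning procedure is not described on this page.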

Advisor: Kannan Ramchandran


BibTeX citation:

@mastersthesis{Goel:EECS-2024-84,
    Author= {Goel, Samarth},
    Title= {Advancing Robust and Aligned Measures of Semantic Similarity in Large Language Models},
    School= {EECS Department, University of California, Berkeley},
    Year= {2024},
    Month= {May},
    Url= {http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-84.html},
    Number= {UCB/EECS-2024-84},
    Abstract= {With the increasing use of text similarity measures in conjunction with Large Language Models (LLMs), greater scrutiny and better evaluation methodologies are needed to ensure the correct metric choice for a given task. In this thesis, I evaluate the robustness of text similarity measures and their alignment with a human understanding of semantic similarity, and I assess the effectiveness of popular LLMs in maintaining semantic understanding. My core contributions are as follows. I develop and introduce the Unified Semantic Similarity Metric Benchmark (USMB), a novel leaderboard for text similarity metrics composed of 10+ datasets and original tasks measuring human preference alignment, robustness, sensitivity, and clustering performance. My next contribution is an ensembled text similarity measure that achieves top scores on all tasks composing the USMB, beating the previously measured best overall score by 48.2%. I also demonstrate the robustness of this ensembled measure on popular information retrieval tasks. Lastly, I contribute a new LLM benchmarking task, titled Semantic Elasticity, a generalization of summarization that measures a model's ability to compress and expand information, and I quantify the performance of 6 popular LLMs on this task. I hope that through this work, greater attention will be given to the performance gains available through proper metric treatment and selection, and that the field's ability to measure semantic similarity advances as a result.},
}

EndNote citation:

%0 Thesis
%A Goel, Samarth
%T Advancing Robust and Aligned Measures of Semantic Similarity in Large Language Models
%I EECS Department, University of California, Berkeley
%D 2024
%8 May 10
%@ UCB/EECS-2024-84
%U http://www2.eecs.berkeley.edu/Pubs/TechRpts/2024/EECS-2024-84.html
%F Goel:EECS-2024-84