MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

Abstract

This paper presents our system developed for SemEval-2024 Task 1: Semantic Textual Relatedness (STR), Track C: Cross-lingual. The task aims to detect the semantic relatedness of two sentences in a given target language without access to direct supervision (i.e., zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two pre-trained language models: XLM-R and Furina. We experiment with 1) single-source transfer, selecting source languages based on typological similarity, 2) augmenting English training data with the two nearest-neighbor source languages, and 3) multi-source transfer, where we compare using all training languages against using only languages from the same family. We further study machine translation-based data augmentation and the impact of script differences. Our submission achieved first place on the C8 (Kinyarwanda) test set.
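Strategies 1) and 2) both rank candidate source languages by how close they are to the target. The following is a minimal sketch of that ranking, assuming each language is described by a typological feature vector and that closeness is cosine similarity. The abstract does not state the exact similarity measure or feature source, so both are assumptions here. The language names (`lang_A`, `lang_B`, `lang_C`) and their vectors are hypothetical placeholders, not real typological data.

```python
import math

# Hypothetical toy typological feature vectors (placeholders, not real data).
TARGET = [1, 0, 1, 1, 0]
CANDIDATES = {
    "eng": [0, 1, 1, 0, 1],
    "lang_A": [1, 0, 1, 1, 1],
    "lang_B": [1, 0, 0, 1, 0],
    "lang_C": [0, 1, 0, 0, 1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no typological signal
    return dot / (norm_u * norm_v)


def nearest_sources(target_vec, candidates, k=1):
    """Return the k candidate languages most similar to the target (strategy 1 uses k=1)."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(target_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


def build_training_set(target_vec, candidates, k=2, base="eng"):
    """English plus its k nearest-neighbor source languages (sketch of strategy 2)."""
    others = {name: vec for name, vec in candidates.items() if name != base}
    return [base] + nearest_sources(target_vec, others, k)


if __name__ == "__main__":
    print(nearest_sources(TARGET, CANDIDATES, k=1))
    print(build_training_set(TARGET, CANDIDATES))
```

With these toy vectors, `lang_A` is the single closest source, and the augmented set is English plus `lang_A` and `lang_B`. Excluding English before ranking keeps it from occupying one of the two neighbor slots it is already guaranteed.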
