PILA: A Historical-Linguistic Dataset of Proto-Italic and Latin

  • 2024-04-25 06:33:47
  • Stephen Bothwell, Brian DuSell, David Chiang, Brian Krostenko

Abstract

Computational historical linguistics seeks to systematically understand processes of sound change, including during periods at which little to no formal recording of language is attested. At the same time, few computational resources exist which deeply explore phonological and morphological connections between proto-languages and their descendants. This is particularly true for the family of Italic languages. To assist historical linguists in the study of Italic sound change, we introduce the Proto-Italic to Latin (PILA) dataset, which consists of roughly 3,000 pairs of forms from Proto-Italic and Latin. We provide a detailed description of how our dataset was created and organized. Then, we exhibit PILA's value in two ways. First, we present baseline results for PILA on a pair of traditional computational historical linguistics tasks. Second, we demonstrate PILA's capability for enhancing other historical-linguistic datasets through a dataset compatibility study.

 
