A Data-Driven Representation for Sign Language Production

Abstract

Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate on the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous sequences of sign language. However, current state-of-the-art approaches rely on scarce linguistic resources to work, which has limited progress in the field. This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem, thus overcoming the need for costly annotation. If annotation is available, however, we leverage the additional information to enhance our approach. By applying Vector Quantisation (VQ) to sign language data, we first learn a codebook of short motions that can be combined to create a natural sequence of sign, where the tokens in the codebook can be thought of as the lexicon of our representation. Then, using a transformer, we perform a translation from spoken language text to a sequence of codebook tokens. Each token can be directly mapped to a sequence of poses, allowing the translation to be performed by a single network. Furthermore, we present a sign stitching method to effectively join tokens together. We evaluate on the RWTH-PHOENIX-Weather-2014T (PHOENIX14T) and the more challenging Meine DGS Annotated (mDGS) datasets. An extensive evaluation shows our approach outperforms previous methods, increasing the BLEU-1 back translation score by up to 72%.
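
To make the token-to-pose mapping concrete, the following is a minimal sketch of the quantisation idea only, not the paper's implementation. It assumes a small hand-written codebook (the paper learns its codebook from data), non-overlapping fixed-length windows, and plain concatenation in place of the sign stitching method. All names (`Pose`, `Motion`, `encode`, `decode`) are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

Pose = List[float]    # one frame: flattened joint coordinates (assumed layout)
Motion = List[Pose]   # a short, fixed-length run of frames


def squared_distance(a: Motion, b: Motion) -> float:
    """Sum of squared differences between two equal-length motions."""
    return sum(
        (x - y) ** 2
        for frame_a, frame_b in zip(a, b)
        for x, y in zip(frame_a, frame_b)
    )


def encode(poses: Sequence[Pose], codebook: Sequence[Motion]) -> List[int]:
    """Map each non-overlapping window of frames to its nearest codebook index."""
    window = len(codebook[0])
    tokens = []
    for start in range(0, len(poses) - window + 1, window):
        chunk = list(poses[start:start + window])
        nearest = min(
            range(len(codebook)),
            key=lambda i: squared_distance(chunk, codebook[i]),
        )
        tokens.append(nearest)
    return tokens


def decode(tokens: Sequence[int], codebook: Sequence[Motion]) -> Motion:
    """Look up each token and concatenate its frames (no stitching applied)."""
    return [frame for t in tokens for frame in codebook[t]]


# Toy data: one coordinate per frame, two codebook entries of two frames each.
codebook = [[[0.0], [0.0]], [[1.0], [1.0]]]
poses = [[0.1], [-0.1], [0.9], [1.2]]

tokens = encode(poses, codebook)
reconstruction = decode(tokens, codebook)
```

In this toy setting a continuous pose sequence becomes a short list of integers, which is the form a text-to-token translation model would be trained to predict. Decoding is a table lookup, so no second network is needed to recover poses.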
