Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics

Abstract

Transformers have supplanted Recurrent Neural Networks as the dominant architecture for both natural language processing tasks and, despite criticisms of cognitive implausibility, for modelling the effect of predictability on online human language comprehension. However, two recently developed recurrent neural network architectures, RWKV and Mamba, appear to perform natural language tasks comparably to or better than transformers of equivalent scale. In this paper, we show that contemporary recurrent models are now also able to match - and in some cases, exceed - performance of comparably sized transformers at modeling online human language comprehension. This suggests that transformer language models are not uniquely suited to this task, and opens up new directions for debates about the extent to which architectural features of language models make them better or worse models of human language comprehension.
