Abstract
Language models now constitute essential tools for improving efficiency in many professional tasks such as writing, coding, or learning. For this reason, it is imperative to identify their inherent biases. In the field of Natural Language Processing, five sources of bias are well identified: data, annotation, representation, models, and research design. This study focuses on biases related to geographical knowledge. We explore the connection between geography and language models by highlighting their tendency to misrepresent spatial information, which leads to distortions in the representation of geographical distances. This study introduces four indicators to assess these distortions by comparing geographical and semantic distances. Experiments are conducted using these four indicators on ten widely used language models. The results underscore the need to inspect and rectify spatial biases in language models to ensure accurate and equitable representations.