Abstract
Advances in technology have led to the use of multimodal systems in various real-world applications. Among them, audio-visual systems are widely used. In recent years, associating the face and voice of a person has gained attention because of the unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and that people most often communicate in multilingual scenarios. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report provides details of the challenge, dataset, baselines, and tasks for the FAME Challenge.