Breaking the Silence: Detecting and Mitigating Gendered Abuse in Hindi, Tamil, and Indian English Online Spaces

Abstract

Online gender-based harassment is a widespread issue limiting the free expression and participation of women and marginalized genders in digital spaces. Detecting such abusive content can enable platforms to curb this menace. We participated in the Gendered Abuse Detection in Indic Languages shared task at ICON2023, which provided datasets of annotated Twitter posts in English, Hindi and Tamil for building classifiers to identify gendered abuse. Our team, CNLP-NITS-PP, developed an ensemble approach combining CNN and BiLSTM networks that can effectively model semantic and sequential patterns in textual data. The CNN captures localized features indicative of abusive language through convolution filters applied to the embedded input text. To determine context-based offensiveness, the BiLSTM analyzes this sequence for dependencies among words and phrases. Multiple variations were trained using FastText and GloVe word embeddings for each language dataset, comprising over 7,600 crowdsourced annotations across labels for explicit abuse, targeted minority attacks and general offences. The validation scores showed strong performance on F1 measures, especially for English (0.84). Our experiments reveal how customizing embeddings and model hyperparameters can improve detection capability. The proposed architecture ranked 1st in the competition, demonstrating its ability to handle real-world noisy text with code-switching. This technique has promising scope as platforms aim to combat cyber harassment facing Indic language internet users. Our code is available at https://github.com/advaithavetagiri/CNLP-NITS-PP
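
To illustrate what "convolution filters applied to the embedded input text" can mean in practice, the following is a minimal pure-Python sketch of a single width-2 filter sliding over a sequence of token embeddings, followed by max-over-time pooling. It is not the authors' implementation: the two-dimensional embeddings, the filter weights, and the helper names (`conv1d_text`, `max_over_time`, `score`) are all hypothetical values chosen for illustration. In a real system the embeddings would come from FastText or GloVe and the filter weights would be learned during training.

```python
from typing import Dict, List

Vector = List[float]


def conv1d_text(embedded: List[Vector], kernel: List[Vector], bias: float = 0.0) -> List[float]:
    """Slide one filter over a sequence of token embeddings.

    The filter spans len(kernel) consecutive tokens (valid padding, stride 1)
    and returns one ReLU activation per window.
    """
    width = len(kernel)
    activations = []
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            token_vec = embedded[start + offset]
            filt_vec = kernel[offset]
            total += sum(t * f for t, f in zip(token_vec, filt_vec))
        activations.append(max(0.0, total))  # ReLU
    return activations


def max_over_time(activations: List[float]) -> float:
    """Keep the strongest response of the filter anywhere in the text."""
    return max(activations) if activations else 0.0


# Hypothetical 2-d embeddings; real ones would be FastText or GloVe vectors.
toy_embeddings: Dict[str, Vector] = {
    "you": [0.1, 0.0],
    "are": [0.0, 0.1],
    "so": [0.2, 0.1],
    "stupid": [0.9, 0.8],
    "kind": [-0.7, -0.6],
}

# Hypothetical bigram filter; in a trained CNN these weights are learned.
bigram_filter: List[Vector] = [[0.5, 0.5], [1.0, 1.0]]


def score(sentence: str) -> float:
    """Embed a whitespace-tokenized sentence and return the pooled filter response."""
    embedded = [toy_embeddings[word] for word in sentence.split()]
    return max_over_time(conv1d_text(embedded, bigram_filter))


if __name__ == "__main__":
    print(score("you are so stupid"))
    print(score("you are so kind"))
```

With these toy values the filter responds more strongly to the window ending in the insulting word than to any window in the benign sentence. A BiLSTM, as described in the abstract, would then read the sequence in both directions to account for context that a fixed-width window cannot see.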
