ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish

  • 2024-04-09 16:04:27
  • Fernando Gallego, Guillermo López-García, Luis Gasco-Sánchez, Martin Krallinger, Francisco J. Veredas

Abstract

Advances in natural language processing techniques, such as named entity recognition and normalization to widely used standardized terminologies like UMLS or SNOMED-CT, along with the digitalization of electronic health records, have significantly advanced clinical text analysis. This study presents ClinLinker, a novel approach that uses a two-phase pipeline for medical entity linking and leverages in-domain adapted language models for biomedical text mining: initial candidate retrieval with a SapBERT-based bi-encoder, followed by re-ranking with a cross-encoder, both trained with a contrastive-learning strategy tailored to medical concepts in Spanish. This methodology, initially focused on Spanish content, substantially outperforms multilingual language models designed for the same purpose. This holds even in complex scenarios involving heterogeneous medical terminologies and when trained on only a subset of the original data. Our results, evaluated using top-k accuracy at k = 25 and other top-k metrics, demonstrate our approach's performance on two distinct clinical entity linking Gold Standard corpora, DisTEMIST (diseases) and MedProcNER (clinical procedures), both normalized to SNOMED-CT codes, outperforming previous benchmarks by 40 points on DisTEMIST and 43 points on MedProcNER. These findings highlight our approach's ability to address language-specific nuances and set a new benchmark in entity linking, offering a potent tool for enhancing the utility of digital medical records. The resulting system is of practical value both for the large-scale automatic generation of structured data from clinical records and for the exhaustive extraction and harmonization of predefined clinical variables of interest.
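The abstract describes a retrieve-then-re-rank pipeline (bi-encoder candidate retrieval, cross-encoder re-ranking) evaluated with top-k accuracy. The following is a minimal, dependency-free sketch of that control flow only, under loudly stated assumptions: `trigram_vector` is a hypothetical toy stand-in for the SapBERT-based bi-encoder embedding, `cross_score` is a hypothetical toy stand-in for the cross-encoder, and the terminology dictionary and its codes (`C001` and so on) are invented for illustration and are not SNOMED-CT. It does not reproduce ClinLinker's models or its contrastive training.

```python
import math
from collections import Counter

# Hypothetical toy terminology: invented codes, NOT real SNOMED-CT identifiers.
TERMS = {
    "C001": "diabetes mellitus tipo 2",
    "C002": "diabetes mellitus tipo 1",
    "C003": "hipertensión arterial",
    "C004": "insuficiencia cardiaca",
}


def trigram_vector(text):
    """Toy stand-in for a bi-encoder embedding: bag of character trigrams."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(mention, terms, k):
    """Phase 1: score each terminology entry independently and keep the top-k codes.
    (A real bi-encoder would precompute the entry embeddings once.)"""
    query = trigram_vector(mention)
    scored = [(cosine(query, trigram_vector(name)), code) for code, name in terms.items()]
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]


def cross_score(mention, name):
    """Toy stand-in for a cross-encoder: scores the (mention, name) pair jointly,
    here with token-level Jaccard overlap."""
    m, n = set(mention.lower().split()), set(name.lower().split())
    return len(m & n) / len(m | n) if m | n else 0.0


def rerank(mention, candidates, terms):
    """Phase 2: re-order only the retrieved candidates with the pair scorer."""
    return sorted(candidates, key=lambda code: cross_score(mention, terms[code]), reverse=True)


def top_k_accuracy(ranked_lists, gold_codes, k):
    """Fraction of mentions whose gold code appears among the first k predictions."""
    hits = sum(1 for ranked, gold in zip(ranked_lists, gold_codes) if gold in ranked[:k])
    return hits / len(gold_codes)


if __name__ == "__main__":
    candidates = retrieve("diabetes tipo 2", TERMS, k=2)
    print(candidates)  # → ['C001', 'C002']
    print(rerank("diabetes tipo 2", candidates, TERMS))
```

The split mirrors the general trade-off behind such pipelines: a bi-encoder scores every terminology entry independently, so it scales to a large terminology, while the more expensive pairwise scorer is applied only to the k retrieved candidates.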

 
