Driver Activity Classification Using Generalizable Representations from Vision-Language Models

  • 2024-04-23 11:42:24
  • Ross Greer, Mathias Viborg Andersen, Andreas Møgelmose, Mohan Trivedi

Abstract

Driver activity classification is crucial for ensuring road safety, with applications ranging from driver assistance systems to autonomous vehicle control transitions. In this paper, we present a novel approach leveraging generalizable representations from vision-language models for driver activity classification. Our method employs a Semantic Representation Late Fusion Neural Network (SRLF-Net) to process synchronized video frames from multiple perspectives. Each frame is encoded using a pretrained vision-language encoder, and the resulting embeddings are fused to generate class probability predictions. By leveraging contrastively learned vision-language representations, our approach achieves robust performance across diverse driver activities. We evaluate our method on the Naturalistic Driving Action Recognition Dataset, demonstrating strong accuracy across many classes. Our results suggest that vision-language representations offer a promising avenue for driver monitoring systems, providing both accuracy and interpretability through natural language descriptors.
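The encode-then-fuse pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, the embedding size, or the number of views and classes, so everything concrete here is an assumption: random vectors stand in for the output of the pretrained vision-language encoder, fusion is done by concatenation followed by a single linear layer and a softmax, and the names `late_fusion_predict`, `NUM_VIEWS`, `EMBED_DIM`, and `NUM_CLASSES` are hypothetical. This is not the SRLF-Net implementation.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion_predict(view_embeddings, weights, bias):
    """Fuse one embedding per camera view and return class probabilities.

    Assumption: fusion is concatenation followed by one linear layer.
    """
    # Concatenate the per-view embeddings into a single fused vector.
    fused = [x for emb in view_embeddings for x in emb]
    # Linear layer: one weight row and one bias term per class.
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical sizes, chosen only to keep the example small.
NUM_VIEWS, EMBED_DIM, NUM_CLASSES = 3, 8, 4

rng = random.Random(0)

# Stand-ins for encoder outputs: one embedding per synchronized frame.
view_embeddings = [
    [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for _ in range(NUM_VIEWS)
]

# Untrained classifier parameters, randomly initialized.
weights = [
    [rng.gauss(0.0, 0.1) for _ in range(NUM_VIEWS * EMBED_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = late_fusion_predict(view_embeddings, weights, bias)
print(probs)
```

In a real system the random embeddings would be replaced by the frozen encoder's output for each camera view, and the linear layer would be trained on labeled driver activity clips.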

 
