Abstract:
In educational settings, students' levels of comprehension vary even when they attend the same lecture. This difference is thought to be influenced not only by learners' innate abilities but also by their attitude and by how they direct their attention during the lecture. This study therefore aims to contribute to improving lecture quality and enhancing students' academic achievement by estimating their comprehension levels. In a prior study, we investigated the relationship between the time series of gazed objects during a lecture and students' levels of comprehension. Students attended a lecture while wearing eye trackers. We applied segmentation to the acquired first-person-view videos and compared gaze coordinates with the segmented regions to generate a chronological sequence of gazed objects. Based on the results of a paper-based test administered after the lecture, we observed differences in the time series of gazed objects depending on students' comprehension levels. However, that method only identifies what was looked at; it does not analyze which parts of the lecture content students actually took in. To address this limitation, the present study proposes a novel quantitative method that uses a Vision-Language Model (VLM) to convert lecture slides into text and evaluates the extent to which students' gaze histories cover the lecture content. First, we use a VLM to extract the lecture content from the slides as text. Next, we use gaze information obtained with an eye tracker to record, as a gaze history, the lecture content each student actually gazed at. We then compare the content extracted from the gaze history with the entire slide content to quantitatively evaluate how much of the lecture content the student's visual attention covered. In this study, we experimentally investigate the relationship between this quantified content coverage rate and students' levels of comprehension.
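
As a rough illustration of the coverage computation described above, the Python sketch below derives a content coverage rate from hypothetical inputs: `slide_items` stands in for the VLM-extracted slide text and `gaze_history` for the text snippets a student's gaze covered. The token-overlap matching and the 0.5 threshold are simplifying assumptions for illustration only, not the method reported in the paper.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a crude proxy for semantic matching."""
    return set(text.lower().split())

def is_covered(slide_item: str, gaze_history: list[str], threshold: float = 0.5) -> bool:
    """Treat a slide item as covered if any gazed snippet shares enough of its tokens."""
    item_tokens = tokenize(slide_item)
    if not item_tokens:
        return False
    for snippet in gaze_history:
        overlap = len(item_tokens & tokenize(snippet)) / len(item_tokens)
        if overlap >= threshold:
            return True
    return False

def coverage_rate(slide_items: list[str], gaze_history: list[str]) -> float:
    """Fraction of extracted slide items that the gaze history covers."""
    if not slide_items:
        return 0.0
    covered = sum(is_covered(item, gaze_history) for item in slide_items)
    return covered / len(slide_items)

# Hypothetical example: slide content extracted by a VLM vs. text the student gazed at.
slide_items = [
    "Definition of supervised learning",
    "Loss function measures prediction error",
    "Gradient descent updates model parameters",
]
gaze_history = [
    "Definition of supervised learning",
    "Gradient descent updates model parameters",
]
print(f"content coverage rate: {coverage_rate(slide_items, gaze_history):.2f}")  # 0.67
```

In practice the matching between gazed regions and slide text would come from the VLM outputs and eye-tracking data rather than plain token overlap; the sketch only shows how a single scalar coverage rate can be obtained once both text sets are available.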
Type: 17th Asia-Pacific Workshop on Mixed and Augmented Reality (APMAR 2025), Pitch Your Work Presentation Track
Publication date: To be published in Sep 2025