Abstract:
Words differ in how varied the mental images they evoke are; we call this property visual variety. This notion helps in exploring the relationship between words and their associated visual images. In this study, we aim to contribute to quantifying the visual variety of a word by identifying visual features that are useful for estimating it. We analyze the relationship between two quantities for an adjective: the variety of individual visual features, such as color and texture, extracted from images associated with the adjective, and the adjective's visual variety as perceived by humans. Specifically, we first conduct a questionnaire to investigate how humans perceive the visual variety of a list of adjectives. We then collect images from the Web and cluster their feature vectors for each of several visual features. Using the weighted entropy of each clustering result, we quantify the variety of each visual feature as its feature variety score. We then compare these scores with the human-perceived visual variety using rank correlation coefficients. Furthermore, we estimate which of two given words has the higher visual variety from their feature variety scores, and analyze how each visual feature relates to visual variety by identifying effective feature combinations through feature selection. Experimental results demonstrate that a simple linear classifier using the feature variety scores estimates with high accuracy which of two given words has the higher visual variety. In addition, the results suggest that the hue and scene features of images are more strongly associated with visual variety than the other visual features.
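The pipeline above can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the authors' implementation. Clustering itself is left out: `feature_variety_score` takes the cluster label of each image of one word for one visual feature. The abstract does not say how the entropy is weighted, so the sketch assumes the form -Σ w_k p_k log p_k with hypothetical caller-supplied per-cluster weights w_k. That form reduces to plain Shannon entropy when no weights are given. The abstract also does not name the rank correlation coefficient, so Spearman's rho is used here as one common choice. `predict_higher` shows only the decision rule of a linear pairwise classifier on score differences; its weights `w` and bias `b` are hypothetical and would come from training elsewhere.

```python
import math
from collections import Counter


def feature_variety_score(labels, weights=None):
    """Entropy of the cluster-size distribution for one visual feature of one word.

    labels: cluster index assigned to each image of the word (any clustering).
    weights: hypothetical per-cluster weights w_k (assumption: the score is
    -sum_k w_k * p_k * log p_k); None means all w_k = 1, i.e. Shannon entropy.
    """
    counts = Counter(labels)
    n = len(labels)
    score = 0.0
    for k, c in counts.items():
        p = c / n  # fraction of the word's images falling in cluster k
        w = 1.0 if weights is None else weights[k]
        score -= w * p * math.log(p)
    return score


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def predict_higher(scores_a, scores_b, w, b=0.0):
    """True if word A is predicted to have the higher visual variety.

    scores_a, scores_b: feature variety scores of the two words, one per feature.
    w, b: weights and bias of a linear classifier trained elsewhere (hypothetical).
    """
    margin = sum(wi * (sa - sb) for wi, sa, sb in zip(w, scores_a, scores_b)) + b
    return margin > 0
```

For example, a word whose images split evenly over two clusters scores log 2 under the unweighted form, while a word whose images all land in one cluster scores 0. Working on the difference of the two words' score vectors makes the pairwise rule antisymmetric when `b = 0`: swapping the two words flips the sign of the margin.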
Type: MUWS '25 @ ACMMM '25
Publication date: To be published in Oct 2025