Cluster-based Relevance Models for Automatic Image Annotation
The availability of digital image databases creates a demand for tools that automatically retrieve relevant images in response to user queries. A common approach to image retrieval is to use annotations as image proxies and to compare the query words with the description words of candidate images. This technique requires that annotations be produced in advance. However, for collections of realistic size, it is infeasible to rely exclusively on manual annotation. A practical alternative is automatic annotation, in which a computer system analyzes the visual features of an image to determine an appropriate description. A system for automatic image annotation learns from manually annotated images what correlations exist between words and visual components, and then uses the discovered relationships to automatically assign semantically descriptive words to new, unannotated images. Our work builds on a cross-media relevance modeling approach that learns such correspondences by comparing the visual components of the image to be annotated with those of already annotated images. We extend the relevance model with clustering, a technique for partitioning a space into groups of similar objects. The motivation is that clustering exploits collection-wide features, which are ignored when documents are analyzed individually, and that the similarity structure of the collection as a whole is a potentially useful source of information. We propose and evaluate two cluster-based relevance models. We compare their performance with that of the baseline unclustered model and show that by using cluster statistics in addition to individual image statistics we can better estimate the correlations between words and visual components.
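The idea can be illustrated with a minimal sketch. The function names, smoothing weights, and data layout below are invented for illustration, not taken from the paper; the code mimics the general shape of a cross-media relevance model (score each word w for a query image by summing, over annotated training images J, the word's probability under J times the likelihood of the query's visual "blobs" under J) and of the cluster-based extension, which smooths per-image statistics with the statistics of the image's cluster before backing off to the whole collection:

```python
# Illustrative sketch only; all names and parameters are assumptions.
from collections import Counter

def smoothed_prob(item, image_counts, cluster_counts, collection_counts,
                  alpha=0.6, beta=0.3):
    """Three-level linear interpolation: image -> cluster -> collection."""
    p_img = image_counts[item] / max(sum(image_counts.values()), 1)
    p_clu = cluster_counts[item] / max(sum(cluster_counts.values()), 1)
    p_col = collection_counts[item] / max(sum(collection_counts.values()), 1)
    return alpha * p_img + beta * p_clu + (1 - alpha - beta) * p_col

def annotate(query_blobs, training, clusters, vocab, top_k=3):
    """training: list of (word Counter, blob Counter, cluster id) triples;
    clusters: cluster id -> (aggregate word Counter, aggregate blob Counter)."""
    coll_w = sum((w for w, _, _ in training), Counter())
    coll_b = sum((b for _, b, _ in training), Counter())
    scores = Counter()
    for words_J, blobs_J, cid in training:
        clu_w, clu_b = clusters[cid]
        # Likelihood of the query image's blobs under training image J's model.
        likelihood = 1.0
        for b in query_blobs:
            likelihood *= smoothed_prob(b, blobs_J, clu_b, coll_b)
        # Each word inherits J's likelihood, weighted by P(word | J).
        for w in vocab:
            scores[w] += likelihood * smoothed_prob(w, words_J, clu_w, coll_w)
    return [w for w, _ in scores.most_common(top_k)]

# Toy data: two "tiger" images share cluster 0; one "plane" image is cluster 1.
training = [
    (Counter(["tiger", "grass"]), Counter(["b1", "b2"]), 0),
    (Counter(["tiger", "water"]), Counter(["b1", "b3"]), 0),
    (Counter(["plane", "sky"]),   Counter(["b4", "b5"]), 1),
]
clusters = {
    0: (Counter(["tiger", "grass", "tiger", "water"]),
        Counter(["b1", "b2", "b1", "b3"])),
    1: (Counter(["plane", "sky"]), Counter(["b4", "b5"])),
}
vocab = ["tiger", "grass", "water", "plane", "sky"]
keywords = annotate(["b1", "b2"], training, clusters, vocab)  # "tiger" ranks first
```

The cluster term in `smoothed_prob` is what distinguishes the cluster-based variant from the unclustered baseline: a word or blob that is rare in one image but common in that image's cluster still receives substantial probability mass, which is the sense in which collection-wide similarity structure sharpens the per-image estimates.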