Concadia is a corpus of 96,918 contextualized Wikipedia images, each paired with a caption, an alt description, and the surrounding article text. It provides the opportunity to investigate and model how text produced for images is shaped by its context and intended communicative purpose. Captions and (alt) descriptions address distinct questions, which is reflected in the information they convey: captions are intended to supplement an image (presupposing that the reader can view it), whereas descriptions are meant to replace it (Kreiss, Goodman, & Potts, 2021). Each text form therefore poses its own challenges for automatic generation and evaluation. We hope that Concadia can help address these challenges, particularly given their relevance for improving the accessibility of images across the Web.
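The pairing described above can be pictured as a simple record type. This is only an illustrative sketch; the field names below are assumptions, not the corpus's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ConcadiaExample:
    """One Wikipedia image with its associated texts.

    Field names are illustrative, not the official Concadia schema.
    """
    image_filename: str  # the Wikipedia image file
    caption: str         # supplements the image; assumes the reader can see it
    description: str     # alt text; replaces the image for readers who cannot see it
    context: str         # excerpt from the surrounding Wikipedia article

# Hypothetical example illustrating how the two text forms differ in purpose
example = ConcadiaExample(
    image_filename="bridge.jpg",
    caption="The bridge at sunset, seen from the east bank.",
    description="A suspension bridge spanning a wide river.",
    context="The bridge was completed in 1932 and connects ...",
)
```

The caption locates the image in the article's narrative, while the description stands in for the image itself, so the two fields generally contain different information even for the same image.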

Explore Concadia below by clicking through randomly sampled examples!




All of the examples above are captions and alt descriptions that Wikipedia users provided for the respective images. The quality of these texts varies, especially for descriptions, likely because descriptions are not visually displayed to the majority of Wikipedia users. This points to a separate challenge for advancing image accessibility systems: what makes a good description, and how can we select for one?

Concadia: Tackling image accessibility with context.
Kreiss, E., Goodman, N. D., & Potts, C. (Manuscript)