Concadia is a corpus of 96,918 contextualized images from Wikipedia, each paired with a caption and an alt description. It provides the opportunity to investigate and model how text produced for images is shaped by its context and intended communicative purpose. Captions and alt descriptions, for instance, address distinct questions, which is reflected in the information they convey: captions supply information supplementary to an image (presupposing that the reader can see it), whereas descriptions are meant to replace the image (Kreiss, Goodman, & Potts, 2021). Each text form therefore poses its own challenges for automatic generation and evaluation. We hope that Concadia can help address these challenges, particularly given their relevance for improving the accessibility of images across the Web.
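To make the caption/description distinction concrete, here is a minimal Python sketch of what one Concadia-style record might look like and how the two text forms differ in purpose. The field names and example values are illustrative assumptions, not the corpus's exact schema.

```python
# Illustrative record: an image paired with a caption, an alt description,
# and surrounding article context. Field names are assumptions for this
# sketch, not necessarily Concadia's exact schema.
record = {
    "image": "example.jpg",
    "caption": "The bridge at dusk, seen from the east bank.",
    "description": "A steel suspension bridge spanning a wide river at sunset.",
    "context": "The bridge opened in 1932 and remains a local landmark.",
}

def communicative_role(text_form: str) -> str:
    """Summarize the distinct purpose of each text form."""
    roles = {
        "caption": "supplements the image (assumes the reader can see it)",
        "description": "replaces the image (for readers who cannot see it)",
    }
    return roles[text_form]

for form in ("caption", "description"):
    print(f"{form}: {record[form]!r} -> {communicative_role(form)}")
```

The key point the sketch encodes is that the same image warrants different text depending on whether the reader can view it, which is exactly the variation Concadia is designed to expose.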
Explore Concadia below by clicking through randomly sampled examples!