One of the great aspects of open science is sharing the data we collect, so that the research community can efficiently use existing data to address new questions. I'm dedicated to this effort, and I encourage everyone to make use of the resources below and to reach out to me with any questions.

Concadia is a dataset of 96,918 contextualized images from Wikipedia, together with their captions and alt descriptions. It provides an opportunity to investigate and model which information is relevant in image descriptions versus captions, and how context shapes that relevance.

Explore! | Repository | Kreiss et al. (Manuscript)

ReaSCAN is a benchmark that tests the compositional generalization capabilities of grounded language models. Models must correctly map complex navigational commands, which require reasoning over multiple objects in a grid world, to action sequences. Most of the challenges it poses remain unsolved.

Website | Repository | Wu & Kreiss et al. (2021)

SuspectGuilt is a collection of 474k local news reports about crime suspects from across the US. Of these, 1,821 are further annotated according to (1) how likely readers considered the suspect to be guilty, and (2) what readers thought about the author's belief. For each question, annotators also highlighted the parts of the text that motivated their response. We hope this resource encourages more research on crime reporting, and on guilt perception in particular.

Repository | Kreiss & Wang et al. (2020)

The Annotated Iterated Narration Corpus (AINC) contains chains of reproductions of news stories about crime suspects, in which each retelling builds on the previous one, similar to the game of Telephone. Each story exists in a weak-evidence and a strong-evidence condition, providing a step towards understanding how uncertainty affects news retellings. Stories and reproductions are further annotated with, for example, readers' beliefs about a suspect's guilt and how subjectively the story appeared to be written.

Repository | Kreiss et al. (2019)