Social media imagery offers a low-latency source of situational information during natural and human-induced disasters, but the complex, safety-critical reasoning required for disaster response is poorly served by general-purpose vision–language models. We introduce DisasterVQA, a benchmark dataset for evaluating visual perception and reasoning in crisis contexts. It comprises 1,395 real-world images and 4,405 expert-curated question–answer pairs spanning diverse events such as floods, wildfires, and earthquakes. Grounded in established humanitarian frameworks, including FEMA’s Emergency Support Functions (ESF) and OCHA’s Multi-Cluster/Sector Initial Rapid Assessment (MIRA), the dataset features binary, multiple-choice, and open-ended questions that target both situational awareness and operational decision-making. DisasterVQA provides a challenging and practical benchmark to guide the development of more robust and operationally meaningful vision–language models for disaster response.
Paper: ICWSM 2026 · Dataset: Hugging Face, Zenodo · Code: GitHub