Paired brain MRI scans and radiology text annotations for multimodal vision–language research. Provides image-level labels and image–text pairs suited to VQA, classification, and image-to-text tasks. Licensed under CC BY-NC-SA 4.0 for research and non-commercial use; roughly 10K–100K samples.
Clinical MRI datasets with reliable paired text labels are scarce, yet they are essential for training and evaluating multimodal medical models. MR-RATE addresses that gap by providing a curated collection of brain MRI volumes linked to radiology-style annotations and ratings, enabling tasks from visual question answering to image classification and image-to-text generation.
Great fit if you are developing or evaluating multimodal medical or vision-language models, probing clinical reasoning in foundation models, or benchmarking VQA and image-to-text approaches on MRI data. Look elsewhere if you need fully de-identified, hospital-grade DICOM metadata for deployment, larger-scale population cohorts, or a permissive commercial license: the CC BY-NC-SA 4.0 terms and the dataset's scope limit production use and very large-scale training.
The dataset emphasizes clinically relevant MRI content paired with short reports and ratings rather than exhaustive clinical histories. Expect the usual medical-data caveats: verify provenance, comply with institutional policies for human-data use, and confirm that the provided labels meet your annotation-quality requirements before using them for model training or evaluation.