Real-world image restoration is bottlenecked by the scarcity of reliable paired low-quality/high-quality (LQ–HQ) data: synthetic datasets fail to capture complex degradations, while capturing real pairs is costly. This work's core insight is to treat powerful multimodal foundation models (MFMs) as generators of high-quality ground truth (GGT). The models synthesize perceptually realistic, content-faithful HQ images from real low-quality inputs, and those images serve as training targets to expand restoration data at scale.
Key Findings
- A systematic evaluation of nine state-of-the-art MFMs across diverse scenes and degradation types showed substantial variation in output quality. Nano-Banana-2 with VLM-based adaptive prompting produced the most perceptually realistic and content-faithful HQ outputs in the authors' tests and was chosen to build the pipeline. (So what: not all MFMs are equally suitable as synthetic GT sources.)
- The authors constructed GGT-100K, which contains 103,707 LQ–HQ training pairs and a 500-pair test set, using a multi-stage synthesis and quality-control pipeline. (So what: provides large-scale paired data reflecting complex, real degradations.)
- Extensive experiments demonstrate that training or finetuning restoration models on GGT-100K consistently improves real-world generalization across model families, with especially strong gains when finetuning generative restorers. (So what: GGT can be an effective substitute/augmentation when real paired data are scarce.)
- The pipeline emphasizes data reliability (filtering and QC), but the results still depend on the chosen MFM and prompting strategy. (So what: dataset quality is tied to the generator's fidelity.)
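To make the idea of filtering generated targets for content fidelity concrete, here is a minimal sketch of one possible check. It is not the authors' quality-control pipeline, whose details are not given in this summary. It assumes grayscale images as 2-D NumPy arrays, and the function names (`box_downsample`, `coarse_similarity`, `keep_pair`), the downsampling factor, and the threshold are all hypothetical choices for illustration. The heuristic: a faithful HQ target should share the coarse, low-frequency structure of its LQ input, so a low correlation at coarse scale suggests the generator changed the scene content.

```python
import numpy as np


def box_downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2-D grayscale image."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile exactly
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def coarse_similarity(lq: np.ndarray, hq: np.ndarray, factor: int = 8) -> float:
    """Pearson correlation between coarse (block-averaged) versions of two images.

    Block averaging suppresses noise and fine detail, so the score mostly
    reflects whether the overall scene layout is shared.
    """
    a = box_downsample(lq.astype(np.float64), factor)
    b = box_downsample(hq.astype(np.float64), factor)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    if denom == 0.0:  # a flat image carries no structure to compare
        return 0.0
    return float((a * b).sum() / denom)


def keep_pair(lq: np.ndarray, hq: np.ndarray,
              threshold: float = 0.9, factor: int = 8) -> bool:
    """Hypothetical filter rule: keep the pair only if coarse structure agrees."""
    return coarse_similarity(lq, hq, factor) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins: a smooth "scene", a noisy LQ view of it,
    # and an unrelated image playing the role of a hallucinated target.
    scene = np.add.outer(np.arange(64.0), np.arange(64.0))
    lq = scene + rng.normal(0.0, 5.0, scene.shape)
    unrelated = rng.uniform(0.0, 126.0, scene.shape)
    print("faithful target kept:", keep_pair(lq, scene))
    print("unrelated target kept:", keep_pair(lq, unrelated))
```

A check like this only catches gross content drift. It says nothing about perceptual realism, and it would need a perceptual or semantic measure alongside it to approximate the kind of reliability screening the summary describes.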
Who it's for and tradeoffs
Great fit if you are a researcher or practitioner who needs large-scale paired data to improve real-world image restoration generalization but lack the budget or logistics to capture and align physical HQ references. This dataset is particularly useful for finetuning generative restoration models and for benchmarking cross-domain robustness.
Look elsewhere if you require physically measured, sensor-accurate ground truth (e.g., scientific imaging or metrology), or if you cannot rely on external MFMs due to licensing, reproducibility, or deployment constraints. Synthetic HQ targets can still introduce subtle artifacts or distributional biases that differ from true optical measurements, so validation on real captured pairs remains important.
Where it fits
GGT-100K sits between purely synthetic degradations (which often underestimate real complexity) and small, expensive real paired datasets. It uses generative models to approximate HQ targets at scale, which makes it a pragmatic data-augmentation and domain-bridging resource rather than a complete substitute for carefully captured physical ground truth.
