Why This Matters
Real-world image restoration models often generalize poorly because their training pairs rarely match the diversity of natural degradations. GGT-100K addresses that gap by using modern multi-frame/multi-modal models (MFMs) to synthesize realistic low-quality/high-quality (LQ–HQ) pairs at scale, with the aim that restoration models trained on these pairs generalize better to unseen real degradations.
What Sets It Apart
- Large, generative ground-truth corpus: 100,000 LQ–HQ pairs produced to cover a wide range of realistic degradations rather than handcrafted synthetic noise.
- Baselines and checkpoints included: baseline training code plus 20 pretrained checkpoints (10 models × 2 settings) for comparing "existing data only" against "existing data + GGT-100K." This lowers the barrier to reproducing the comparisons.
- Convenient training lists: three JSONL split files (train_existing, train_existing_GGT, test_GGT_500) use relative paths for easy dataset joining in experiments.
- Clear license and provenance: released on Hugging Face under CC BY‑NC‑ND 4.0, with a linked paper and project page for methodological details.
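Because the split files store relative paths, a loader typically has to join each entry with the local dataset root. The following is a minimal sketch of that step, not the project's own loader. The field names `lq` and `hq` and the helper `load_pairs` are hypothetical; check the actual keys in the released JSONL files before relying on them.

```python
import json
from pathlib import Path


def load_pairs(jsonl_path, data_root, path_keys=("lq", "hq")):
    """Read one JSONL split file and resolve relative paths against data_root.

    NOTE: the key names in path_keys are assumptions for illustration;
    the released split files may use different field names.
    """
    data_root = Path(data_root)
    records = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            for key in path_keys:
                if key in record:
                    # Turn the stored relative path into a full local path.
                    record[key] = str(data_root / record[key])
            records.append(record)
    return records
```

Calling `load_pairs` once per split file with the same `data_root` yields records that can be concatenated or compared directly, since every path is resolved against the same root.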
Who It's For & Trade-offs
Great fit if you train or evaluate image restoration models and need broader coverage of real-world degradations to improve generalization. It is useful both for researchers benchmarking state-of-the-art restorers and for engineers who want ready-made training splits and checkpoints. Look elsewhere if you need commercially permissive licensing (CC BY‑NC‑ND forbids commercial use and the distribution of derivative works) or if you need human-photographed ground truth rather than model-generated ground truth.
Where It Fits
Use GGT-100K as an augmentation or additional training corpus alongside existing real/synthetic datasets when assessing robustness to diverse degradations. It complements traditional datasets by providing generative ground truth derived from MFMs and is most informative when compared side-by-side with models trained without GGT-100K.
