Generative Models by Stability AI

An open-source repository from Stability AI that collects implementations, training configs, demos, and inference scripts for multiple generative models (e.g. SDXL, SV3D, SV4D, SV4D 2.0, Stable Video Diffusion). It is modular and config-driven, providing sampling/demo scripts, training examples, and pointers to model weights on Hugging Face.

Introduction

Generative Models by Stability AI is a public GitHub repository that organizes code, configuration, and demos for a family of Stability AI's generative models (text-to-image, image-to-video, novel-view/4D video synthesis, and more). The project emphasizes a modular, config-driven design so that researchers and engineers can compose encoders, conditioners, and samplers and run both training and inference workflows.

Key highlights:

  • Models & releases (selected):

    • May 20, 2025 - Stable Video 4D 2.0 (SV4D 2.0): an enhanced video-to-4D diffusion model for novel-view video synthesis, trained to generate multi-view frames with improved spatio-temporal consistency.
    • July 24, 2024 - Stable Video 4D (SV4D): a video-to-4D diffusion model for novel-view video synthesis that generates 5 frames x 8 views at a time, with sampling strategies for longer sequences.
    • March 18, 2024 - SV3D: image-to-video / multi-view synthesis (variants SV3D_u and SV3D_p).
    • November 2023 - SDXL-Turbo and Stable Video Diffusion releases, with accompanying technical reports.
    • July 2023 - initial SDXL family (base/refiner) releases and licensing notes.
  • Repository design & components:

    • Config-driven instantiation (YAML configs plus an instantiate_from_config pattern) to combine embedders, networks, samplers, and guiders; see the sketch after this list.
    • A GeneralConditioner abstraction for handling diverse conditioning inputs (text, class labels, spatial conditioning).
    • Separate samplers (numerical solvers) and guidance wrappers, plus a denoiser framework covering both continuous- and discrete-time models.
    • Training examples (configs/example_training), support for PyTorch Lightning, and notes on the expected dataset format (webdataset); a data-loading sketch also follows below.
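
The config mechanism above is the backbone of the repo: each component is described by a target import path plus params keyword arguments, and instantiate_from_config builds it reflectively. Below is a minimal sketch of the pattern; the helper mirrors sgm.util.instantiate_from_config and the class paths follow the repo's module layout, but the parameter values are illustrative assumptions, not a shipped config.

```python
import importlib
from typing import Any, Dict


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted import path like 'pkg.module.ClassName' to an object."""
    module_name, cls_name = path.rsplit(".", 1)
    return getattr(importlib.import_module(module_name), cls_name)


def instantiate_from_config(config: Dict[str, Any]) -> Any:
    """Build an object from {'target': import_path, 'params': kwargs}."""
    return get_obj_from_str(config["target"])(**config.get("params", {}))


# Compose a sampler with a classifier-free-guidance guider, the way the
# repo's YAML configs do (nested configs are instantiated by the components).
sampler_config = {
    "target": "sgm.modules.diffusionmodules.sampling.EulerEDMSampler",
    "params": {
        "num_steps": 40,
        "discretization_config": {
            "target": "sgm.modules.diffusionmodules.discretizer.EDMDiscretization",
        },
        "guider_config": {
            "target": "sgm.modules.diffusionmodules.guiders.VanillaCFG",
            "params": {"scale": 5.0},
        },
    },
}
sampler = instantiate_from_config(sampler_config)
```

Because nested configs are resolved the same way, swapping a sampler, guider, or discretization is a one-line config change rather than a code change.
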
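Training data is expected in the webdataset format (sharded .tar archives of sample tuples). Here is a minimal loading sketch using the webdataset library; the shard pattern and per-sample keys are placeholders, since the repo's training configs define the actual dataset layout.

```python
import webdataset as wds

# Hypothetical shard pattern and sample keys; substitute your own dataset's.
dataset = (
    wds.WebDataset("data/shard-{000000..000009}.tar")
    .decode("pil")             # decode image entries to PIL images
    .to_tuple("jpg", "json")   # yield (image, metadata) pairs per sample
)

for image, metadata in dataset:
    print(image.size, metadata)
    break
```
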
  • Demos & inference:

    • Streamlit and Gradio demo scripts for image sampling and video generation.
    • Quickstart sampling scripts and example commands for SV3D, SV4D, and SV4D 2.0.
    • Instructions for obtaining model weights from Hugging Face and where to place them (checkpoints/); a download sketch follows this list.
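
Fetching weights into the expected checkpoints/ directory can be scripted with huggingface_hub. The repo id and filename below follow the published SV3D layout but should be adjusted per model; gated repositories also require logging in first (e.g. via huggingface-cli login).

```python
from huggingface_hub import hf_hub_download

# Download one checkpoint into the repo's expected checkpoints/ directory.
# repo_id and filename are examples; other models live in their own HF repos.
hf_hub_download(
    repo_id="stabilityai/sv3d",
    filename="sv3d_u.safetensors",
    local_dir="checkpoints",
)
```
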
  • Practical notes for users:

    • Installation steps (virtualenv, a PyTorch wheel index, requirements files) and packaging with Hatch.
    • Guidance for low-VRAM inference (the encoding_t/decoding_t flags, lower resolutions) and background-removal suggestions for better results (rembg, SAM2, Clipdrop); see the rembg sketch after this list.
    • Invisible-watermark embedding/detection utilities and instructions for running the detection scripts; a decoding sketch also follows.
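
For the image-to-video and multi-view models, isolating the subject by removing the input image's background tends to improve results. A minimal sketch with the rembg package (file names are placeholders):

```python
from PIL import Image
from rembg import remove

# Strip the background so the subject is isolated before passing the image
# to SV3D/SV4D-style sampling scripts.
image = Image.open("input.png")
cutout = remove(image)  # returns an RGBA image with the background removed
cutout.save("input_rgba.png")
```
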
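Watermark detection builds on the invisible-watermark package. The sketch below decodes with the DWT-DCT method the repo embeds with; the 48-bit payload length is an assumption here, so prefer the repo's own detection script for authoritative checks.

```python
import cv2
from imwatermark import WatermarkDecoder

# invisible-watermark operates on BGR uint8 arrays, hence cv2.imread.
bgr = cv2.imread("generated.png")
decoder = WatermarkDecoder("bits", 48)  # payload length assumed
bits = decoder.decode(bgr, "dwtDct")    # DWT-DCT decoding method
print(bits)
```
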
  • Use cases:

    • Research reproducibility of advanced generative models, rapid prototyping of novel-view and video synthesis, base code for training new diffusion-based models, and demonstration apps for sampling/visualization.

This repository is best suited for researchers and engineers familiar with PyTorch, diffusion models, and Hugging Face model distribution. It contains both high-level demos and low-level training/configuration examples to support experimentation and production prototyping.