Generative Models by Stability AI
Generative Models by Stability AI is a public GitHub repository that organizes code, configurations, and demos for a family of Stability AI's generative models (text-to-image, image-to-video, novel-view/4D video synthesis, and more). The project emphasizes a modular, config-driven design so researchers and engineers can compose encoders, conditioners, and samplers and run both training and inference workflows.
Key highlights:
Models & releases (selected):
- May 20, 2025 — Stable Video 4D 2.0 (SV4D 2.0): enhanced video-to-4D diffusion for novel-view video synthesis (trained to generate multi-view frames, better spatio-temporal consistency).
- July 24, 2024 — Stable Video 4D (SV4D): video-to-4D diffusion used for novel-view video synthesis (5 frames x 8 views, sampling strategies for longer sequences).
- March 18, 2024 — SV3D: image-to-video / multi-view synthesis (variants SV3D_u and SV3D_p).
- November 2023 — SDXL-Turbo and Stable Video Diffusion releases and related technical reports.
- July 2023 — SDXL family (base/refiner) initial releases and licensing notes.
Repository design & components:
- Config-driven instantiation (YAML configs plus the instantiate_from_config pattern) to combine embedders, networks, samplers, and guiders; see the sketch after this list.
- GeneralConditioner abstraction for handling diverse conditioning (text, classes, spatial conditionings).
- Separate samplers (numerical solvers) and guidance wrappers; denoiser framework for continuous & discrete-time models.
- Training examples (configs/example_training), support for PyTorch Lightning, and notes on dataset format (webdataset).
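To illustrate the pattern, here is a minimal sketch of config-driven instantiation, assuming OmegaConf-style configs with target/params keys; the helper is simplified and the torch.nn.Linear target is a stand-in for the repository's own module paths, not an actual config from the repo.

```python
# Minimal sketch of the instantiate_from_config pattern: a config names a class
# by import path ("target") and its constructor arguments ("params").
import importlib

from omegaconf import OmegaConf


def instantiate_from_config(config):
    """Build an object from a mapping with 'target' (import path) and 'params'."""
    module_path, cls_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), cls_name)
    return cls(**config.get("params", {}))


# Illustrative config fragment; real configs point 'target' at sgm modules
# (embedders, denoisers, samplers) and nest further instantiable sub-configs.
cfg = OmegaConf.create(
    {
        "network": {
            "target": "torch.nn.Linear",  # stand-in target for demonstration
            "params": {"in_features": 16, "out_features": 8},
        }
    }
)

network = instantiate_from_config(OmegaConf.to_container(cfg.network))
print(network)  # Linear(in_features=16, out_features=8, bias=True)
```

Because every component is named by import path, swapping a sampler or conditioner becomes a one-line config change rather than a code change, which is what keeps the training and demo configs composable.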
Demos & inference:
- Streamlit and Gradio demo apps for image sampling and video generation.
- Quickstart sampling scripts and example commands for SV3D, SV4D, and SV4D 2.0.
- Instructions to obtain model weights from Hugging Face and place them under checkpoints/; see the sketch after this list.
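A minimal sketch of fetching weights into checkpoints/ with huggingface_hub follows; the repo_id and filename are assumptions for illustration, so check the README for the exact Hugging Face repository and checkpoint name of each release (some require accepting a license first).

```python
# Sketch: download a checkpoint into the checkpoints/ directory the demo
# scripts expect. repo_id and filename below are assumed placeholders.
from pathlib import Path

from huggingface_hub import hf_hub_download

ckpt_dir = Path("checkpoints")
ckpt_dir.mkdir(exist_ok=True)

ckpt_path = hf_hub_download(
    repo_id="stabilityai/sv3d",        # assumed model repo; may be gated
    filename="sv3d_u.safetensors",     # assumed checkpoint filename
    local_dir=ckpt_dir,
)
print(f"Checkpoint available at {ckpt_path}")
```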
Practical notes for users:
- Installation steps (virtualenv, PyTorch wheel index, requirements files) and packaging with Hatch.
- Guidance for low-VRAM inference (encoding_t/decoding_t flags, lower resolution) and background-removal suggestions for better results (rembg/SAM2/Clipdrop); see the sketch after this list.
- Invisible watermark embedding/detection utilities and instructions to run detection scripts.
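As a concrete example of the background-removal step, here is a short sketch using rembg; the file names and the white-background compositing are illustrative assumptions rather than a script shipped with the repository.

```python
# Sketch: strip the background from a conditioning image before SV3D/SV4D
# sampling and composite the cutout onto a plain white canvas.
from PIL import Image
from rembg import remove

img = Image.open("input.jpg").convert("RGB")       # placeholder input path
cutout = remove(img)                               # RGBA image with transparent background

# Assumed preprocessing choice: place the subject on a white background.
white = Image.new("RGBA", cutout.size, (255, 255, 255, 255))
Image.alpha_composite(white, cutout).convert("RGB").save("input_clean.png")
```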
Use cases:
- Research reproducibility of advanced generative models, rapid prototyping of novel-view and video synthesis, base code for training new diffusion-based models, and demonstration apps for sampling/visualization.
This repository is best suited for researchers and engineers familiar with PyTorch, diffusion models, and Hugging Face model distribution. It contains both high-level demos and low-level training/configuration examples to support experimentation and production prototyping.
