Overview
Stable Diffusion WebUI Forge ("Forge") is an enhanced distribution and extension platform built on top of AUTOMATIC1111's Stable Diffusion WebUI, with Gradio providing the web UI layer. Forge aims to streamline development for contributors and advanced users, improve resource management (GPU memory, offloading and swapping), accelerate inference for low-bit and quantized models, and serve as a modular platform where experimental features and integrations can be developed and tested.
Key features
- One-click installers: prebuilt packages with bundled git and Python plus tested CUDA/PyTorch environments for easier setup (multiple CUDA/PyTorch combos provided in releases).
- Resource & inference optimizations: GPU weight slider, queue/async swap controls, offload location/method settings, plus explicit support for low-bit formats: BitsAndBytes (BNB) models such as Flux BNB NF4, and GGUF quantizations (Q8_0/Q5_0/Q5_1/Q4_0/Q4_1).
- LoRA & model handling: improved LoRA loading behavior, options to avoid repetitive LoRA patching per generation, and troubleshooting guides for LoRA compatibility on low-bit models.
- Integrated experimental modules: examples include FreeU integration (Unet patcher / Fourier filtering example), custom Unet implementation files, enhanced Gradio 4 canvas with right-mouse pan behavior, and multiple ControlNet/adapter interactions.
- Extension ecosystem: built-in extensions and compatibility with many popular SD-WebUI extensions; the project maintains lists of integrated extensions, replacements and status reports.
- Status & testing table: manual test results for major components (diffusion, preprocessors, ControlNets, integrated extensions, touch pressure support, etc.) to indicate what works and what is pending.
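
To make the GGUF formats in the list above more concrete, here is a minimal pure-Python sketch of Q8_0-style block quantization: weights are grouped into blocks of 32, and each block stores one float16 scale plus 32 signed 8-bit values. The function names (quantize_q8_0, dequantize_q8_0, q8_0_bytes) are illustrative assumptions and are not part of Forge or any GGUF library. Real loaders work on packed binary tensors, not Python lists.

```python
import struct

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32


def quantize_q8_0(weights):
    """Quantize a flat list of floats into (scale, int8 values) blocks.

    Illustrative helper, not a Forge or GGUF library function.
    """
    if len(weights) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of the block size")
    blocks = []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        amax = max(abs(w) for w in block)
        scale = amax / 127.0  # largest magnitude maps to +/-127
        if scale == 0.0:
            quants = [0] * BLOCK_SIZE
        else:
            quants = [round(w / scale) for w in block]
        # The scale is stored as float16, so round-trip it through that format.
        stored_scale = struct.unpack("<e", struct.pack("<e", scale))[0]
        blocks.append((stored_scale, quants))
    return blocks


def dequantize_q8_0(blocks):
    """Reconstruct approximate weights: w ~= scale * q."""
    return [scale * q for scale, quants in blocks for q in quants]


def q8_0_bytes(num_weights):
    """Storage size: 2 bytes (fp16 scale) + 32 bytes (int8 values) per block."""
    return (num_weights // BLOCK_SIZE) * (2 + BLOCK_SIZE)


weights = [(i - 16) / 16 for i in range(64)]
blocks = quantize_q8_0(weights)
restored = dequantize_q8_0(blocks)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each block costs 34 bytes for 32 weights, or 8.5 bits per weight, which is where the memory savings over 16-bit weights come from. The lower-bit formats in the list follow the same block idea with fewer bits per value.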
Technical details
- Base: Forge is based on a specific commit of AUTOMATIC1111's SD-WebUI (documented in the README); maintainers plan periodic syncs (noted as every 90 days or when needed).
- Quantization & offload: supports multiple quantized formats natively and exposes UI controls to tune GPU weight, offload strategy and swap behavior to accommodate limited GPU memory.
- Unet patching: includes single-file examples (UnetPatcher / FreeU implementation) showing how to patch Unet output blocks and apply frequency-domain filters (torch.fft) or channel-wise scaling.
- Compatibility: offers an "advanced install" path for users who want to add Forge as a branch to an existing SD-WebUI repo, enabling reuse of existing checkpoints and extensions when done correctly.
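
The two operations named in the Unet patching item above, frequency-domain filtering and channel-wise scaling, can be sketched as follows. This is a minimal illustration in the style of FreeU, assuming feature maps shaped (batch, channels, height, width). It uses NumPy's FFT as a stand-in for torch.fft so that it runs without a GPU stack. The function names and parameters (fourier_filter, scale_backbone, threshold, scale, b) are assumptions made for this example and do not reproduce Forge's UnetPatcher API.

```python
import numpy as np


def fourier_filter(x, threshold, scale):
    """Multiply the low-frequency band of each feature map by `scale`.

    x: array shaped (batch, channels, height, width).
    threshold: half-width of the square low-frequency window, in bins.
    """
    _, _, h, w = x.shape
    if not 0 < threshold <= min(h, w) // 2:
        raise ValueError("threshold must fit inside the frequency grid")
    # 2D FFT over the spatial axes, shifted so zero frequency sits at the center.
    freq = np.fft.fftshift(np.fft.fftn(x, axes=(-2, -1)), axes=(-2, -1))
    mask = np.ones(x.shape)
    crow, ccol = h // 2, w // 2
    mask[..., crow - threshold:crow + threshold,
         ccol - threshold:ccol + threshold] = scale
    freq = freq * mask
    # Undo the shift and transform back; the imaginary part is numerical noise.
    out = np.fft.ifftn(np.fft.ifftshift(freq, axes=(-2, -1)), axes=(-2, -1))
    return out.real.astype(x.dtype)


def scale_backbone(h, b):
    """Channel-wise scaling: multiply the first half of the channels by `b`."""
    out = h.copy()
    half = h.shape[1] // 2
    out[:, :half] *= b
    return out


# A constant map contains only the zero-frequency component, so it is scaled.
flat = np.ones((1, 2, 8, 8))
damped = fourier_filter(flat, threshold=1, scale=0.5)

# A checkerboard sits at the highest frequency, outside the window: unchanged.
rows, cols = np.indices((8, 8))
checker = ((-1.0) ** (rows + cols)).reshape(1, 1, 8, 8)
kept = fourier_filter(checker, threshold=1, scale=0.5)

boosted = scale_backbone(np.ones((1, 4, 2, 2)), b=1.5)
```

In a FreeU-style patch, the filter would typically be applied to skip-connection features and the channel scaling to backbone features inside the Unet output blocks. How those tensors are intercepted depends on the patcher and is not shown here.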
Installation & usage
- Recommended quick start: download the one-click release package that matches your CUDA/PyTorch environment, extract it, then run update.bat and then run.bat.
- Advanced users: git-clone the repo as a branch or follow the advanced install instructions to integrate Forge into an existing SD-WebUI setup.
- Important: the README emphasizes running the update script after extracting so that the latest fixes are applied.
Use cases
- Users with limited GPU memory who need robust offload/swap control for large or quantized models.
- Researchers experimenting with Unet-level patches, Fourier filtering, or new sampler/patch approaches.
- Creators wanting an integrated environment that bundles multiple optimizations and UI improvements for stable-diffusion-based image generation.
Project status & maintenance
- The repository contains a detailed status table showing tested features and known broken areas (with dates of last manual test). Project maintainer(s) invite issue reports and note that fresh reinstalls often resolve problems not reproducible by maintainers.
Notes & caveats
- Forge extends and modifies SD-WebUI behavior; compatibility with third-party extensions may vary and users should follow the project's docs for integration tips.
- Some features (e.g., certain ControlNet unions or OFT LoRAs) may be listed as not implemented or broken in the status table — check the repo's NEWS/discussions for updates and workarounds.
