Overview
LTX-Video is an open-source video-generation repository and model family developed by Lightricks. Built around a DiT-based latent diffusion architecture, the project aims to deliver production-ready video generation with features commonly required in creative and VFX workflows: synchronized audio+video generation, multi-keyframe conditioning, image-to-video and video-to-video transforms, and support for high resolutions and high frame rates.
Key features
- Synchronized audio + video generation in a single model pass (audio and visuals aligned).
- Multiple model scales and flavors: full 13B models, distilled 13B/2B variants, and FP8/quantized builds for faster inference and lower VRAM usage.
- Supports image-to-video, multi-keyframe conditioning, keyframe-based animation, forward/backward video extension, and combinations of conditioning media.
- Production-grade outputs: native 4K generation at up to 50 FPS (per the model descriptions), multiscale pipelines, temporal and spatial upscalers, and LoRA/IC-LoRA models for precise control over style and structure.
- Integrations and tooling: ComfyUI workflows, Hugging Face model hosting, Diffusers pipeline support, and an online demo (LTX-Studio). Training tools are provided via a separate trainer repository for full fine-tuning or LoRA training.
Models & performance
LTX-Video provides several pre-configured pipelines and checkpoints (examples include ltxv-13b-dev, ltxv-13b-distilled, ltxv-2b-distilled, and FP8/quantized variants). Distilled and quantized models trade a small amount of quality for much faster inference and much lower VRAM requirements, which makes them useful for rapid iteration and real-time generation; the FP8 builds additionally rely on specialized kernels available on Ada-class GPUs.
Quick start
- Online: LTX-Studio provides immediate image-to-video demos and playgrounds for trying models without local setup.
- Local: the repo includes an inference.py script, example ComfyUI workflows, and instructions for creating a Python environment. The recommended environment was tested with Python 3.10.5 and CUDA 12.2 (PyTorch >= 2.1.2); macOS MPS is supported with specific PyTorch versions. A quick way to verify the local setup is sketched below.
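As a quick, hedged sanity check (not part of the repository), the following standalone Python snippet prints the interpreter, PyTorch, and accelerator availability that the recommended environment calls for:

import platform
import torch

# Recommended setup per the quick-start notes: Python 3.10.5, CUDA 12.2, PyTorch >= 2.1.2.
print("Python:", platform.python_version())
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())         # NVIDIA path
print("MPS available:", torch.backends.mps.is_available())  # macOS Apple-silicon path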
Example local command (image-to-video):
python inference.py --prompt "PROMPT" --conditioning_media_paths IMAGE_PATH --conditioning_start_frames 0 --height 512 --width 896 --num_frames 33 --pipeline_config configs/ltxv-13b-0.9.8-distilled.yaml
Integrations & community
- ComfyUI: official and community workflows for easy node-based generation and advanced techniques.
- Diffusers: pipeline integration is available for using the models inside Hugging Face's diffusers ecosystem (a minimal usage sketch follows this list).
- Hugging Face: model weights and control LoRAs are hosted for convenient access.
- Community projects: FP8 kernels, TeaCache acceleration, and community distilled/8-bit variants exist to improve speed or reduce resource needs.
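As a rough illustration of the Diffusers route, the sketch below uses the image-to-video pipeline that diffusers exposes for LTX-Video. Treat the model id, resolution, frame count, and step count as placeholders rather than recommended settings, and consult the diffusers documentation for the current API:

import torch
from diffusers import LTXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Load the LTX-Video image-to-video pipeline from the Hugging Face Hub.
pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("first_frame.png")  # conditioning keyframe (placeholder path)
video = pipe(
    image=image,
    prompt="PROMPT",
    width=704,
    height=480,
    num_frames=121,           # the repo's examples use frame counts of the form 8n+1 (e.g. 33)
    num_inference_steps=40,
).frames[0]

export_to_video(video, "output.mp4", fps=24)

Swapping in LTXPipeline (text-to-video) follows the same pattern without the conditioning image.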
Training & extensibility
A separate LTX-Video-Trainer repo supports fine-tuning of the 2B and 13B variants and LoRA training to create custom control or effect LoRAs (depth, pose, canny, style/detailers). This makes the project suitable for both researchers and production teams that need to adapt the model.
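Once a custom LoRA is trained, it can typically be applied at inference time through the standard Diffusers LoRA API. The snippet below is a hedged sketch: the repository id, weight file, and adapter name are hypothetical, and it assumes the LTX pipelines expose load_lora_weights and set_adapters like other diffusers pipelines:

import torch
from diffusers import LTXImageToVideoPipeline

pipe = LTXImageToVideoPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Hypothetical LoRA produced with LTX-Video-Trainer; replace with your own Hub repo or local path.
pipe.load_lora_weights("your-username/ltxv-custom-style-lora", adapter_name="custom_style")
pipe.set_adapters(["custom_style"], adapter_weights=[0.8])  # scale the LoRA's influence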
License, citation & releases
The repo includes research citations (arXiv tech report) and release notes covering checkpoints, distilled builds, quantized models, and the later LTX-2 successor. The initial repository creation and public release metadata are tracked in the GitHub project.
Use cases & limitations
Use cases include creative video generation, storyboarding, visual prototyping, and rapid content creation. Limitations include significant GPU requirements for the largest models, potential ethical and copyright considerations for generated content, and the need to follow the project license for commercial usage.
