Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

benchmarks

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

coding

coding-agents

copilot

course

cpu

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

distillation

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gcode

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-boosting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

long-horizon

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

parquet

physics

pi

plugin

polars

postgres

privacy

programming

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

redis

retrieval

RL

rl

robotics

rust

science

security

segmentation

shodan

skillkit

software-engineering

sora

speech

sqlite

ssh

stt

swe

swift

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

vulkan

web-search

windsurf

xAI

xai

ByteDance/Bernini-R

Provides weights and inference code for Bernini's video renderer, supporting text-to-video, image-to-video and video-editing inference. Ships as a ready-to-use diffusers-format bundle or as safetensors checkpoints under Apache‑2.0; intended for multi‑GPU inference on Hopper-class GPUs and for reproducible research.

bytedance huggingface diffusers video ai-video (+3 more)

Echo-LongVideo (JoyAI-Echo)

Echo Team @ Joy Future Academy, JD, jdopensource

Generates minute-long, multi-shot video with synchronized audio from a single text prompt, using a paired cross-modal memory to preserve each character's appearance and voice across shots. Uses DMD-distilled few-step inference for a ~7.5× speedup; requires substantial GPU memory and is released under the LTX-2 community license.

ai-video video audio multimodal huggingface (+3 more)

SynthTraces

Generates synthetic coding-agent session traces by pairing remotely hosted open agent models with local llama.cpp user models on real open-source codebases. Each trace records read/write/edit/bash actions and tool use. The dataset is released under the MIT license and forms a reproducible Cartesian product (20×3×20×20 = 24,000 sessions).

code github ai-coding ai-agent agent-skills (+3 more)
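The 24,000 figure is the product of the four axis sizes. The minimal Python sketch below enumerates such a grid, assuming each session is identified by one index per axis. The axis names are hypothetical placeholders, because the card gives only the sizes (20, 3, 20, 20) and not what each axis represents.

```python
from itertools import product
from math import prod

# Hypothetical axis names; only the sizes come from the dataset card.
AXIS_SIZES = {"axis_a": 20, "axis_b": 3, "axis_c": 20, "axis_d": 20}


def enumerate_sessions(sizes):
    """Yield one tuple of per-axis indices for every cell of the Cartesian product.

    itertools.product iterates in a fixed order, so the session list is
    identical on every run; that determinism is what makes the grid reproducible.
    """
    yield from product(*(range(n) for n in sizes.values()))


sessions = list(enumerate_sessions(AXIS_SIZES))
print(len(sessions), prod(AXIS_SIZES.values()))  # 24000 24000
```

The order is deterministic, so a session's position in this list could double as a stable ID. That is a design choice for the sketch, not a documented property of the dataset.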

CustoMDiT / PexelsCustom-1M

Provides 1,036,431 identity–text–video triplets with per-video JSON annotations and reference keyframes to train and evaluate identity-preserving customized video generation models. Data is drawn from ~320K Pexels HD videos; videos must be downloaded separately per Pexels' terms.

ai-video video huggingface pandas diffusers (+1 more)

DiffusionGemma 26B A4B

Google DeepMind

Generates text from interleaved text, image, and short-video inputs using discrete diffusion and block‑autoregressive multi‑canvas sampling; built on a sparse MoE (8/128) Gemma 4 backbone and optimized for low‑latency inference and very long contexts (up to 256K tokens).

gemma foundation-model multimodal vision transformers (+5 more)
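The card describes the backbone as a sparse MoE with "8/128". Reading that as 8 of 128 experts active per token is an assumption, though it is also the usual meaning of the "A4B" (about 4B active parameters) in the model name. The sketch below shows generic top-k routing in plain Python. It scores every expert, keeps the 8 highest, renormalizes their scores with a softmax, and mixes only those experts' outputs. It illustrates the technique only and is not Gemma's router; the toy scalar "experts" are hypothetical.

```python
import math
import random


def top_k_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=8):
    """Weighted sum of the outputs of only the k routed experts for one token."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Toy setup (hypothetical): 128 scalar "experts"; expert i multiplies its input by i.
NUM_EXPERTS = 128
experts = [lambda x, scale=i: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(router_logits, k=8)
y = moe_forward(1.0, experts, router_logits, k=8)
print(len(routes))  # 8
```

Only 8 of the 128 expert functions run for a given token. This is how a sparse MoE keeps per-token compute far below its total parameter count.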

SCAIL-2

End-to-end pose-driven image-to-video model that animates a reference character from a driving video, supporting cross-identity replacement and multi-character scenarios without intermediate pose representations; performs best at 704p and ships as a diffusers-compatible checkpoint.

diffusers video ai-video huggingface image

RazzzHF/Realism_Engine_Ideogram_4

Fine-tuned image-generation model on Hugging Face that biases Ideogram-style prompts toward photorealistic output. Emphasizes natural lighting and realistic materials to reduce prompt tweaking; license not specified.

huggingface ai-image image foundation-model multimodal (+2 more)

AI Video Papers 2026

OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data

Jiwen Liu, Shujuan Li +9

Encodes and clones camera motion from reference videos to generate multi-shot videos. It represents camera parameters as a visual "camera grid", trains on million-scale grid–video pairs, and uses a hierarchical prompt-expansion agent to coordinate camera, subject, and action control for multimodal diffusion models.

video multimodal ai-video vision prompt-engineering (+2 more)

Krea 2 (Comfy-Org/Krea-2)

Comfy-Org, Krea

Provides ComfyUI-ready repackaged checkpoints of the Krea 2 image model family for local text-to-image workflows. Includes RAW (undistilled base for fine-tuning and LoRA training) and Turbo (8-step distilled checkpoint for fast inference); both use a Qwen Image VAE and a Qwen3‑VL encoder.

qwen diffusers huggingface ai-image image (+2 more)

Krea 2 Turbo

Sangwu Lee, Erwann Millon +14 · Krea.ai, Inc.

Generates images from natural-language prompts. This 8-step distilled checkpoint of Krea 2 is optimized for fast, iterative text-to-image workflows and supports style references and 1K–2K resolution outputs.

diffusers ai-image image vision huggingface (+5 more)

fal · Krea 2 Style LoRAs

Provides 1,503 Krea 2 style LoRAs (original safetensors + ComfyUI builds) trained on fal.ai, each with a short trigger phrase and downloadable weights for quick style transfer or further retraining.

huggingface ai-image ai-tools diffusers AIGC (+2 more)
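Each LoRA in the collection stores a low-rank weight update rather than full model weights. The pure-Python sketch below shows the standard LoRA merge, W' = W + s · (α / r) · B A, where r is the rank and s is a user-chosen strength. The helper names are hypothetical. The card does not say how α or the tensor keys are stored in these particular safetensors files, so the sketch covers only the generic math.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(w, a, b, alpha, strength=1.0):
    """Return W + strength * (alpha / r) * (B @ A), the standard LoRA update.

    w: d_out x d_in base weight
    a: r x d_in down-projection (LoRA "A")
    b: d_out x r up-projection (LoRA "B")
    strength: hypothetical user-facing multiplier, like a LoRA strength slider
    """
    r = len(a)  # LoRA rank
    scale = strength * alpha / r
    delta = matmul(b, a)  # low-rank update, shape d_out x d_in
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


W = [[1, 0, 0], [0, 1, 0]]  # 2 x 3 base weight
A = [[1, 2, 3]]             # rank-1 down-projection
B = [[1], [0]]              # up-projection
print(merge_lora(W, A, B, alpha=2.0))  # [[3.0, 4.0, 6.0], [0.0, 1.0, 0.0]]
```

A strength of 0 returns the base weights unchanged, and a strength of 1 applies the update at its trained scale.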

Sun Direction LoRA (Flux2Klein 9B)

eric-venti-seeds

Applies or repositions directional sunlight in outdoor images using a LoRA trained for Flux2Klein 9B that matches a reference sun elevation and rotation. The workflow uses an overcast intermediate image and a rendered sphere (ball) reference; a ComfyUI node and a Blender scene for rendering the reference are included.

ai-image image huggingface diffusers ai-demos
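The card does not say how the LoRA encodes sun elevation and rotation. The sketch below shows only the standard geometry relating those two angles to a light direction, which is useful when placing a sun in a reference scene. It also shows Lambertian shading: the brightest point on a diffuse ball lies where the surface normal equals the sun direction, which makes a sphere a convenient lighting reference. The axis convention (Z-up, as in Blender; azimuth 0 along +Y, increasing toward +X) is an assumption and may not match the repository's scene.

```python
import math


def sun_direction(elevation_deg, azimuth_deg):
    """Unit vector pointing from the scene toward the sun (Z-up).

    Assumed convention: azimuth 0 points along +Y and increases toward +X;
    elevation 0 is the horizon and 90 is straight overhead.
    """
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    return (math.cos(el) * math.sin(az),
            math.cos(el) * math.cos(az),
            math.sin(el))


def sun_angles(direction):
    """Inverse of sun_direction: recover (elevation_deg, azimuth_deg)."""
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    elevation = math.degrees(math.asin(z / norm))
    azimuth = math.degrees(math.atan2(x, y)) % 360.0
    return elevation, azimuth


def lambert(normal, light):
    """Diffuse (Lambertian) intensity for a unit surface normal and light direction."""
    return max(0.0, sum(n * l for n, l in zip(normal, light)))


d = sun_direction(35.0, 120.0)
print(sun_angles(d))  # approximately (35.0, 120.0)
```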