Tag

Explore by tags

AIAIAny

Curated AI Resources for Everyone

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

benchmarks

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

coding

coding-agents

copilot

course

cpu

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

distillation

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gcode

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-booting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

long-horizon

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

parquet

physics

pi

plugin

polars

postgres

privacy

programming

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

redis

retrieval

RL

robotics

rust

science

security

segmentation

shodan

skillkit

software-engineering

sora

speech

sqlite

ssh

stt

swe

swift

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

vulkan

web-search

windsurf

xAI

xai

AI Model·2026

ideogram-ai/ideogram-4-fp8

ideogram-ai

Text-to-image model packaged for Diffusers that uses fp8 quantization to lower memory use and speed up inference. Delivered as a safetensors checkpoint on Hugging Face with an Ideogram pipeline. Created May 30, 2026; license unspecified.

#diffusers #huggingface #ai-image #image #AIGC +3

AI Model·2026

ideogram-4-nf4

ideogram-ai

NF4-quantized text-to-image diffusion model released as safetensors and compatible with the Diffusers Ideogram4Pipeline. Optimized for lower-memory local inference and faster deployment while preserving the original model's text-to-image capabilities.

#diffusers #ai-image #image #AIGC #foundation-model

Computer Vision Papers·2026

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal +9

Omnimodal world model that jointly processes and generates text, images, video, audio, and action trajectories for physical AI. Uses a mixture-of-transformers to combine autoregressive reasoning and diffusion-based multimodal generation; released open-source with checkpoints, datasets and benchmarks for robotics and simulation.

#foundation-model #multimodal #video #image #robotics +4

AI Agent Papers·2026

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

Junqi Liu, Salena Song +13

Workflow-aware benchmark for autonomous medical-AI research that splits agent execution into five stages (Plan, Setup, Validate, Inference, Submit) and evaluates long-horizon runs across segmentation, image enhancement, VQA, report generation, and lesion detection with stage-level scoring.

#vision #multimodal #ai-agent #agent-skills #ai-workflow +2

AI Model·2026

unsloth/gemma-4-26B-A4B-it-qat-GGUF

unsloth

GGUF release of Gemma 4 26B A4B (QAT), packaged by Unsloth for local multimodal inference. Quantization-aware training keeps quality close to bfloat16 while significantly lowering memory requirements; compatible with Transformers and Unsloth tooling.

#gemma #huggingface #transformers #llm #vision +3

AI Video·2026

SCAIL-2

zai-org

End-to-end pose-driven image-to-video model that animates a reference character from a driving video, supporting cross-identity replacement and multi-character scenarios without intermediate pose representations; performs best at 704p and ships as a diffusers-compatible checkpoint.

#diffusers #video #ai-video #huggingface #image

AI Model·2026

RazzzHF/Realism_Engine_Ideogram_4

RazzzHF

Fine-tuned Hugging Face image-generation model that biases Ideogram-style prompts toward photorealistic outputs. Emphasizes natural lighting and realistic materials to reduce prompt tweaking; license not specified.

#huggingface #ai-image #image #foundation-model #multimodal +2

AI Dataset·2026

KSAFE-MM

K-intelligence

Benchmark for evaluating multimodal LLM safety in Korean cultural contexts. Includes KSAFE-MM-G, which localizes global safety queries into Korean scenarios, and KSAFE-MM-C, which targets culture-specific visual-textual vulnerabilities. Provides curated image–text pairs and jailbreak-style prompts to reveal both unsafe behaviors and over-refusal.

#multimodal #vision #image #evaluation #huggingface +3

AI Model·2026

Rio 3.5 Open 397B

IplanRIO (prefeitura-rio)

A post-trained Mixture-of-Experts multimodal LLM with ~397B total parameters (≈17B active) and a 1,010,000-token context window for image-text-to-text and conversational tasks. Integrates SwiReasoning to switch between latent and explicit reasoning; MIT-licensed and optimized for Portuguese/English research and on-prem inference.

#transformers #multilingual #multimodal #huggingface #vllm +4

Computer Vision Papers·2026

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Dian Zheng, Harry Lee +5

Adds interleaved text–image generation to existing image generators via a multi-agent pipeline: a planner sequences stepwise instructions, a critic detects and refines failures, and single-step RL (GRPO) reinforces per-step corrections. Suited for visual narratives and embodied guidance.

#multimodal #vision #ai-image #image #RL +3

AI Image·2026

Krea 2 (Comfy-Org/Krea-2)

Comfy-Org, Krea

Provides ComfyUI-ready repackaged checkpoints of the Krea 2 image model family for local text-to-image workflows. Includes RAW (undistilled base for fine-tuning and LoRA training) and Turbo (8-step distilled checkpoint for fast inference), using a Qwen Image VAE and Qwen3‑VL encoder.

#qwen #diffusers #huggingface #ai-image #image +2

Computer Vision Papers·2026

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Yatai Ji, An-Chieh Cheng +14

Provides a dual-path approach for spatial vision-language models: a Language-Only Reasoning (LOR) path for stepwise linguistic deduction and a Detect-Then-Reason (DTR) path that detects 3D cues via region tokens before numerical inference. Trains with chain-of-thought cold-start supervision and reinforcement learning to improve 3D grounding and multi-step spatial reasoning.

#vision #multimodal #RL #paper #depth +1