Tag

Explore by tags

AIAny

Curated AI Resources for Everyone

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

benchmarks

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

coding

coding-agents

copilot

course

cpu

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

distillation

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gcode

gemini

gemini-cli

gemma

genomics

gitHub

github

go

google

gradient-booting

grok

groq

huggingface

image

ios

java

javascript

json

kimi

llama.cpp

LLM

llm

long-horizon

lora

mLOps

math

mcp

mcp-client

mcp-server

meta-ai

meta-pytorch

metal

microsoft

mlops

mobile

multilingual

multimodal

mysql

NLP

nlp

nodejs

numpy

nvidia

ocr

ollama

openai

opencode

pandas

paper

parquet

physics

pi

plugin

polars

postgres

privacy

programming

prompt-engineering

pwa

python

pytorch

qwen

react

reasoning

redis

retrieval

RL

rl

robotics

rust

science

security

segmentation

shodan

skillkit

software-engineering

sora

speech

sqlite

ssh

stt

swe

swift

tensorrt

terminal

transformers

translation

tts

tutorial

typescript

vibe-coding

video

vision

vllm

voice

vulkan

web-search

windsurf

xAI

xai

youtube

AI Model·2026

NVIDIA Cosmos3-Super-Image2Video

NVIDIA

Generates temporally coherent MP4 videos from a single input image plus text instructions, with configurable resolution, frame count, and optional AAC audio. Optimized for NVIDIA GPU stacks and integrates with vLLM‑Omni and Hugging Face Diffusers for production inference and research workflows.

#nvidia #huggingface #diffusers #ai-video #video (+5 more)

AI Model·2026

Bonsai Image · Ternary 4B (gemlite 2-bit)

Prism ML (prism-ml)

A ternary-weight (~1.58-bit) 4B text-to-image diffusion transformer optimized for NVIDIA GPUs using Gemlite INT2 and HQQ; it reduces the transformer to ~1.21 GB (4.55 GB CUDA payload) and targets 1024×1024 generation with a 4-step FlowMatch-Euler sampler.

#huggingface #ai-image #image #nvidia #ai-inference (+3 more)
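The "ternary (~1.58-bit) stored as INT2" idea in the entry above can be illustrated with a small packing sketch. It is only an illustration: the 2-bit code assignment and the function names are hypothetical and are not taken from Gemlite or HQQ, whose actual kernel layouts may differ.

```python
import math

# Information content of one ternary weight in {-1, 0, +1}: log2(3), about 1.585 bits.
BITS_PER_TERNARY = math.log2(3)

# Hypothetical 2-bit code assignment (an assumption; real kernels may use another layout).
ENCODE = {-1: 0b00, 0: 0b01, 1: 0b10}
DECODE = {code: weight for weight, code in ENCODE.items()}


def pack_ternary(weights):
    """Pack ternary weights into bytes, four 2-bit codes per byte."""
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for slot, w in enumerate(weights[i:i + 4]):
            byte |= ENCODE[w] << (2 * slot)
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed, count):
    """Recover `count` ternary weights from packed bytes, ignoring padding slots."""
    weights = []
    for byte in packed:
        for slot in range(4):
            if len(weights) == count:
                return weights
            weights.append(DECODE[(byte >> (2 * slot)) & 0b11])
    return weights


def packed_size_gb(num_weights, bits_per_weight=2):
    """Storage for the weights alone, in decimal gigabytes."""
    return num_weights * bits_per_weight / 8 / 1e9
```

Under these assumptions, exactly 4 billion weights at 2 bits each occupy 1.0 GB. The listing's ~1.21 GB figure is somewhat higher, which would be consistent with extra data such as quantization scales or a parameter count above 4B, but the listing does not break the number down.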

AI Model·2026

LocateAnything-3B

NVIDIA

Performs fast, high-quality vision–language grounding: given an image plus a natural-language prompt it returns bounding boxes or points for referred objects. Uses Parallel Box Decoding for parallel coordinate prediction (higher throughput) and targets research/non-commercial use.

#nvidia #vision #multimodal #transformers #huggingface (+5 more)

AI Model·2026

nvidia/Qwen3.6-35B-A3B-NVFP4

nvidia

Quantized NVFP4 build of the Qwen3.6-35B MoE language model, optimized with NVIDIA Model Optimizer to cut model size and GPU memory by ~3.06× for inference. Designed for vLLM and NVIDIA GPU deployments (Hopper/Blackwell).

#nvidia #huggingface #vllm #llm #ai-inference (+3 more)
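A rough back-of-the-envelope check of the ~3.06× figure, assuming the baseline is a BF16 checkpoint and that NVFP4 stores 4-bit elements with one 8-bit scale per 16-element block (about 4.5 effective bits per weight; the small per-tensor scale is ignored). The function names are illustrative, and the idea that the gap to the ideal ratio comes from layers left in higher precision is an assumption, not something the listing states.

```python
BF16_BITS = 16.0

# Assumed NVFP4 layout: 4-bit elements plus one 8-bit scale per 16-element block.
NVFP4_BITS = 4.0 + 8.0 / 16.0  # 4.5 effective bits per weight


def compression_ratio(quantized_fraction):
    """Size ratio (BF16 / mixed checkpoint) when a fraction of weights is NVFP4."""
    mixed_bits = (quantized_fraction * NVFP4_BITS
                  + (1.0 - quantized_fraction) * BF16_BITS)
    return BF16_BITS / mixed_bits


def fraction_for_ratio(target_ratio):
    """Invert compression_ratio: fraction of weights that would need to be NVFP4."""
    return (BF16_BITS - BF16_BITS / target_ratio) / (BF16_BITS - NVFP4_BITS)


if __name__ == "__main__":
    print(f"all weights quantized: {compression_ratio(1.0):.2f}x")
    print(f"fraction needed for 3.06x: {fraction_for_ratio(3.06):.3f}")
```

Under these assumptions, quantizing every weight would give about 3.56×, and a 3.06× reduction corresponds to roughly 94% of the weights being stored in NVFP4 with the remainder kept in BF16.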

AI Model·2026

Cosmos3-Super-Text2Image

NVIDIA

Generates high-fidelity images from text prompts using NVIDIA's 64B Cosmos3-Super multimodal foundation model. Integrates with Hugging Face Diffusers and vLLM‑Omni, is released under the OpenMDW-1.1 license for commercial use, and is optimized for Physical AI workflows (robotics, AV, simulation).

#nvidia #huggingface #diffusers #vllm #ai-image (+5 more)

AI Dataset·2026

Nemotron-Pretraining-Code-v3

NVIDIA Corporation

Metadata-only corpus of 146.3M new GitHub source-code files (commit_id, rel_path, language) intended as an incremental update to Nemotron v1/v2 for LLM code pretraining; CC-BY-4.0 licensed and designed to be used jointly with older versions.

#nvidia #huggingface #code #github #llm (+4 more)
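Because the corpus is described as metadata-only, each row points at a file rather than containing it. A minimal sketch of working with such rows follows. It assumes only the three field names given in the entry above; the `FileRef` class, the helper functions, and the sample rows are hypothetical placeholders, not real dataset contents or an official schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    """One metadata row: a pointer to a source file, not the file contents."""
    commit_id: str
    rel_path: str
    language: str


def count_by_language(refs):
    """Count file references per language label."""
    return Counter(ref.language for ref in refs)


def select_language(refs, language):
    """Keep only the references with the given language label."""
    return [ref for ref in refs if ref.language == language]


# Placeholder rows for illustration only (not real dataset entries).
SAMPLE = [
    FileRef("abc123", "src/main.py", "Python"),
    FileRef("abc123", "lib/util.rs", "Rust"),
    FileRef("def456", "app/cli.py", "Python"),
]

if __name__ == "__main__":
    print(count_by_language(SAMPLE))
    print([ref.rel_path for ref in select_language(SAMPLE, "Python")])
```

Fetching the actual file contents would need additional information, such as the repository each commit belongs to, which the entry above does not list.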

AI Video Papers·2026

SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer

Yuyang Zhao, Yicheng Pan +7

Enables real-time streaming video-to-video editing (1280×704 @24 FPS) on a single RTX 5090 GPU. Uses a Hybrid Diffusion Transformer for balanced local/global modeling, Cycle‑Reverse Regularization for temporal consistency, and system-level mixed-precision and fused kernels to maximize throughput.

#video #ai-video #vision #transformers #nvidia (+2 more)
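To put "real-time at 1280×704 @24 FPS" in concrete terms, the sketch below computes the per-frame time budget and the pixel throughput implied by those numbers. It is plain arithmetic on the figures quoted in the entry above; the function names are illustrative, and nothing here reflects how the paper measures its own performance.

```python
def frame_budget_ms(fps):
    """Maximum average time per frame, in milliseconds, to sustain `fps`."""
    return 1000.0 / fps


def pixels_per_second(width, height, fps):
    """Output pixels that must be produced each second."""
    return width * height * fps


if __name__ == "__main__":
    print(f"budget per frame: {frame_budget_ms(24):.2f} ms")
    print(f"pixels per second: {pixels_per_second(1280, 704, 24):,}")
```

At 24 FPS the pipeline has about 41.7 ms per frame on average, which corresponds to roughly 21.6 million output pixels per second at this resolution.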

AI Model·2026

Cosmos3-Super

NVIDIA

Generates and reasons about multimodal physical-world content—text, images, video, audio, and robot/action trajectories—conditioned on combinations of text, image, video and action inputs. The 64B “Super” variant targets Physical AI use cases and supports vLLM‑Omni, Diffusers, and action prediction.

#nvidia #huggingface #multimodal #robotics #ai-video (+5 more)

Computer Vision Papers·2026

Cosmos 3: Omnimodal World Models for Physical AI

Aditi, Niket Agarwal +9

Omnimodal world model that jointly processes and generates text, images, video, audio, and action trajectories for physical AI. Uses a mixture-of-transformers to combine autoregressive reasoning and diffusion-based multimodal generation; released open-source with checkpoints, datasets and benchmarks for robotics and simulation.

#foundation-model #multimodal #video #image #robotics (+4 more)

AI Dataset·2026

Nemotron-Personas-El-Salvador

Rodrigo Malossi, Andre Manoel +9

Provides ~1M synthetic Salvadoran‑Spanish personas (148k records, ~300M tokens) grounded in 2024 census distributions for demographics, occupations and locations; intended for training/evaluating localized LLMs and synthetic-data workflows. CC BY 4.0, adults only.

#huggingface #nvidia #nlp #multilingual #llm (+2 more)

AI Model·2026

NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

NVIDIA Corporation

Open-weight frontier LLM for agentic reasoning and long-context analysis (up to 1M tokens). Uses a LatentMoE + Mamba-2 hybrid (550B total / 55B active parameters) with Multi-Token Prediction and NVFP4 efficiency. Suited for multilingual agents, RAG, and heavy tool-use workloads.

#nvidia #pytorch #transformers #LLM #multilingual (+9 more)
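The "550B total / 55B active" naming can be unpacked with a little arithmetic: in a mixture-of-experts model, all weights generally need to be stored, while only the active subset is used for each token. The sketch below assumes 16 bits per parameter for the BF16 checkpoint and counts weights only (no KV cache, activations, or runtime overhead); the function names are illustrative.

```python
def active_fraction(total_params, active_params):
    """Share of parameters used per token in a mixture-of-experts model."""
    return active_params / total_params


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    total, active = 550e9, 55e9
    print(f"active fraction: {active_fraction(total, active):.0%}")
    print(f"BF16 weights: {weight_memory_gb(total, 16):,.0f} GB")
```

Under these assumptions, about 10% of the parameters are active per token, and the BF16 weights alone come to roughly 1,100 GB, which is why a lower-precision variant matters for deployment.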

AI Model·2026

NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

NVIDIA

Multilingual frontier LLM optimized for long-context reasoning and agentic workflows, combining a LatentMoE (Mamba-2 + MoE) hybrid architecture with Multi-Token Prediction and NVFP4 quantization. It targets NVIDIA GPU deployments and is governed by the OpenMDW-1.1 license.

#nvidia #pytorch #transformers #LLM #multilingual (+8 more)