Tag

Explore by tags

AI Dataset 2026

GoLongRL (Kwai-Klear)

RL training dataset for long-context language-model fine-tuning, with ~23K samples and nine reward types. Provided in Parquet with bilingual ground truth and reward metadata, so it can be used directly for RL training and benchmark evaluation.

huggingface RL LLM NLP paper+2

AI Dataset 2026

openbmb/UltraData-SFT-2605

openbmb

Supervised fine-tuning dataset of instruction-style examples in English and Chinese covering generation, QA, reasoning, math, and code, intended for SFT of 10–100B-parameter LLMs. Associated with arXiv:2602.09003; first published May 21, 2026.

llm huggingface multilingual math code+4

AI Model 2026

Qwopus3.6-27B-v2-MTP

Jack Rong (Jackrong)

Fine-tuned reasoning model built on a Qwen3.6-27B base that uses Multi-Token Prediction (MTP) to speed up structured multi-step outputs. Produces more concise, faster generations for coding, DevOps, math, and constrained-format tasks; an experimental community release for research and evaluation.

huggingface transformers llm ai-train ai-inference+5

AI Dataset 2026

Qwen-Image-Bench

Qwen

Creator-centric benchmark for evaluating text-to-image models with 1,000 bilingual prompts and a 3-level, 56-facet taxonomy. Includes a trained Q-Judger judge model and leaderboard-ready evaluation scripts to surface gaps in real-world fidelity and creative generation.

huggingface ai-image image vision multilingual+3

AI Dataset 2026

tran-vi-teacher

ngocdang83

Parallel Chinese→Vietnamese dataset of webnovel (xianxia) text, provided in JSON for NMT training and teacher-student distillation. In-domain, ~100K–1M examples, released under a CC-BY-4.0 license. Useful for fine-tuning or distillation experiments, but limited by its narrow genre and small download footprint.

translation huggingface nlp multilingual pandas+1

AI Model 2026

google/gemma-4-12B-it

Google DeepMind

Instruction-tuned, unified Gemma 4 12B multimodal model that accepts text, image and audio inputs and generates text outputs locally. Encoder-free design reduces multimodal latency and fits on consumer devices while offering long-context support and native thinking/system-prompt features.

gemma google deepmind multimodal transformers+5

AI Model 2026

Gemma 4 12B Unified

Google DeepMind

A 12B unified, encoder-free multimodal model that directly ingests text, images and audio and returns text; supports very long contexts (up to 256K tokens), native function-calling/thinking modes, and small-model deployment for local or on-device use.

gemma multimodal transformers google deepmind+8

AI Dataset 2026

StreamAudio-2M

zhifeixie

Large streaming-audio dataset for training and evaluating audio-LLMs and audio agents. About 2.28M clips grouped into multi-turn “streams” across six task subsets (ASR, speech translation, audio understanding, voice chat, proactive response, environment-aware); audio shipped as tar shards.

audio ASR translation speech voice+2

AI Audio 2026

MOSS-TTS-v1.5

OpenMOSS-Team

Generates multilingual text-to-speech with zero-shot voice cloning, token-level duration control, and inline pause markers. v1.5 improves multilingual fidelity (with language tags), cloning stability, and long-reference handling—suitable for research and production TTS pipelines.

speech audio voice multilingual huggingface+2

AI Model 2026

PaddleOCR-VL-1.6

PaddlePaddle

Performs image-to-text document parsing and OCR for complex elements (tables, formulas, charts, seals), with multilingual support (en/zh). It uses region-aware data optimization and progressive post-training to improve weak-region supervision and is plug-and-play compatible with PaddleOCR-VL-1.5.

ocr multimodal vision image multilingual+5

AI Model 2026

LFM2.5-8B-A1B

Liquid AI

Hybrid LFM2.5 text-generation model optimized for on-device assistants and agentic workflows: 8.3B total / 1.5B active parameters with a 131,072-token context. Prioritizes low-latency, high-throughput inference and multilingual instruction-following; not optimized for programming-heavy tasks or for knowledge-heavy QA without retrieval.

llm transformers huggingface multilingual vllm+5
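
As a quick sanity check on the LFM2.5-8B-A1B figures quoted above, the sketch below works out what share of the parameters is active and how the context length relates to a power of two. It uses only the numbers from the card; reading "active" as "used per token" is an assumption based on how such figures are usually reported, and the names `TOTAL_PARAMS_B`, `ACTIVE_PARAMS_B`, `CONTEXT_TOKENS`, and `active_fraction` are illustrative.

```python
# Figures quoted in the LFM2.5-8B-A1B card.
TOTAL_PARAMS_B = 8.3      # total parameters, in billions
ACTIVE_PARAMS_B = 1.5     # active parameters, in billions (assumed: per token)
CONTEXT_TOKENS = 131_072  # context window, in tokens


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of total parameters that is active."""
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"active share: {fraction:.1%}")
print(f"context: {CONTEXT_TOKENS // 1024}K tokens")
print(f"context is 2**17: {CONTEXT_TOKENS == 2**17}")
```

Under these figures roughly 18% of the parameters are active, and the 131,072-token window is exactly 128 × 1,024 tokens.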

AI Model 2026

Step-3.7-Flash (GGUF quantizations)

stepfun-ai

GGUF quantizations of Step-3.7-Flash: a sparse multimodal Mixture-of-Experts LLM with native image understanding, selectable reasoning levels, and a 256K context window. Ships multiple calibrated Q3/Q4/IQ quant files plus an mmproj vision projector for local llama.cpp inference on high-memory hosts.

huggingface llm vision multilingual ai-inference+4

Tag

Explore by tags

All

30u30

ASR

ChatGPT

GNN

IDE

RAG

agent-skills

ai

ai-agent

ai-api

ai-api-management

ai-client

ai-coding

ai-demos

ai-deploy

ai-development

ai-framework

ai-image

ai-image-demos

ai-inference

ai-leaderboard

ai-library

ai-rank

ai-serving

ai-tools

ai-train

ai-video

ai-workflow

AIGC

algorithms

alibaba

amazon

android

anthropic

audio

aws

benchmark

biology

blog

book

bytedance

chatbot

chatgpt

chemistry

claude

claude-code

cli

code

codex

copilot

course

cuda

cursor

deepmind

deepseek

depth

devops

diffusers

docker

drug-discovery

electron

embeddings

engineering

evaluation

facebook

finance

flow-matching

foundation

foundation-model

gemini

gemini-cli

gemma

genomics

gitHub

github