AIAny - AI Dataset

GLM-5.2 Agent traces

2026

AletheiaResearch, TeichAI

Provides 319 agent session traces in newline-delimited JSON, captured from GLM-5.2 with Teich for training agentic models. Preserves reasoning-first assistant fragments, tool-call events, and a dataset-level, training-ready tool schema; the traces can be converted to OpenAI-style JSONL for SFT or distillation.

llm ai-agent huggingface json ai-train+1
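
As a rough illustration of what "newline-delimited JSON" means for the traces above, the sketch below parses one JSON object per line and skips blank lines. The event fields used here (`type`, `text`, `name`, `arguments`) are hypothetical placeholders, not the dataset's documented schema; check the dataset card for the real field names before relying on them.

```python
import io
import json


def read_ndjson(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# Hypothetical two-event sample; real traces may use different fields.
sample = io.StringIO(
    '{"type": "assistant", "text": "Plan: list files"}\n'
    "\n"
    '{"type": "tool_call", "name": "ls", "arguments": {"path": "."}}\n'
)

events = list(read_ndjson(sample))
kinds = [event["type"] for event in events]
```

The same generator works on a real file handle, for example one opened with `open(path, encoding="utf-8")`, since it only assumes an iterable of text lines.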

AgentWorldBench

2026

Qwen

Provides 2,170 reference-grounded evaluation samples across seven agent domains (MCP, Search, Terminal, SWE, Android, Web, OS) to score language world models on Format, Factuality, Consistency, Realism and Quality. Includes per-domain JSONL files, judge prompts and an evaluation script for reproducible scoring.

qwen evaluation huggingface ai-agent agent-skills+6

SVG Generation Benchmark (Static)

2026

Rapidata

Compares 30 frontier LLMs on generating static SVG markup from 500 prompts, using 1,355,161 human votes across three leaderboards (Preference, Coherence, Alignment). Provides raw SVGs, 768×768 rasterized PNGs, and per-comparison human vote records; the prompts are licensed under CC-BY-4.0.

evaluation ai-image image llm huggingface+2

IFStruct v1.0

2026

Liquid AI

Measures whether models produce valid JSON/YAML that strictly follows a requested schema across diverse, naturally phrased prompts. Contains 2,000 frozen test prompts with binary structural validation (no constrained decoding), focusing on schema compliance and edge cases such as escaping, wrapper keys, and fenced code blocks.

huggingface json evaluation pandas polars+3
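
To make "binary structural validation" concrete, here is a minimal sketch of a pass/fail check on a model reply. It is not the benchmark's actual validator: the function names, the `required` mapping of key to Python type, and the decision to unwrap a single Markdown fence before parsing are all assumptions made for illustration. A stricter checker could just as well count a fenced reply as a failure.

```python
import json
import re

TICKS = "`" * 3  # built at runtime so this example nests cleanly in a fence
FENCE = re.compile(
    r"^" + TICKS + r"[a-zA-Z]*\s*\n(.*?)\n" + TICKS + r"\s*$", re.DOTALL
)


def extract_payload(reply: str) -> str:
    """Return the body of a single fenced block, or the stripped reply."""
    text = reply.strip()
    match = FENCE.match(text)
    return match.group(1) if match else text


def passes(reply: str, required: dict) -> bool:
    """Binary check: exact top-level keys, each with the expected type."""
    try:
        data = json.loads(extract_payload(reply))
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(required):
        return False  # missing keys, extra keys, or a wrapper object
    return all(isinstance(data[key], kind) for key, kind in required.items())


schema = {"name": str, "tags": list}  # hypothetical requested schema
fenced_reply = TICKS + 'json\n{"name": "a", "tags": ["x"]}\n' + TICKS
wrapped_reply = '{"result": {"name": "a", "tags": []}}'
```

With this sketch, `passes(fenced_reply, schema)` accepts the fenced reply, while `passes(wrapped_reply, schema)` rejects it because the extra `result` wrapper changes the top-level keys.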

Vāgdhenu — Sanskrit Chant Corpus

2026

prathoshap

Provides ~1,467 single-speaker Sanskrit chant audio clips (≈5.3 hours) with aligned transcripts and prosodic metadata for meter-aware TTS training. Clips come in two recording/config styles (style_a, style_b) as 24 kHz mono WAVs; metadata includes Devanagari, SLP1, and Kannada text, plus meter, duration, and session/take. Licensed under CC-BY-4.0.

tts audio speech huggingface nlp+1

Metacognition-Bench

2026

ginigen-ai, FINAL-Bench +1

Provides 300 adversarial "metacognitive-trap" problems to measure whether LLMs notice and recover from their own reasoning errors. Combines multiple-choice vulnerability tests with free-form adapter-gain evaluation and ships per-model metacognition adapters for frozen-base probing.

evaluation ai-leaderboard llm nlp huggingface+2

Gaming Dataset (gaming-1)

2026

markov-ai

Provides ~494.7 hours of trimmed native PC/console gameplay screen recordings organized by game, with per-session clips plus input and per-frame event annotations. Each workflow includes clip.mp4, events.json, frame_events.json, and metadata, making it suitable for training vision-action, behavior-cloning, and gameplay-understanding models.

video ai-video vision multimodal agent-skills+5

