AIAny

Category

Explore by categories

All Categories

AI Leaderboard

AI Agent Tutorials

AI Coding Tutorials

AI Model

AI Agent Papers

Chatbot

AI Dataset

Machine Learning Foundation Books

AI Train

AI Deploy

AI Client

Machine Learning Foundation Papers

Machine Learning Foundation Tutorials

AI Image Demos

AI Agent

Large Language Model Tutorials

Large Language Model Papers

Machine Learning Engineering Papers

Computer Vision Tutorials

Computer Vision Papers

Natural Language Processing Papers

Reinforcement Learning Papers

Speech Technology Papers

AI API

AI Coding

AI Image

AI Video

MLOps

MCP Client

MCP Server

AI Video Papers

AI Audio

AI Others

AI Infra

Embodied AI

lazarus19/Vibe-Coding-Instruct

2026

lazarus19

A JSON dataset of ~1.1M anonymized coding-assistant instruction→response interactions for training and evaluating code-generation and instruction-following models; packaged for use with pandas/polars and sized at ~459 MB.

vibe-coding huggingface pandas polars ai-coding +2

Vibe-Coding-Instruct

2026

CodeDevX, Hugging Face

Curates ~1.1M instruction–response examples for 'vibe coding' scenarios where developers prompt LLMs to produce implementation plans, architecture choices, and deployment steps. Covers conversation memory, prompt templates, model routing, streaming responses, and scaling considerations; Apache-2.0.

vibe-coding ai-coding code json pandas +5

Vibe-Coding-Claude-Fable-5

2026

lazarus19

A JSON-format text dataset of 'vibe-coding' prompt–response examples in the 1M–10M size category. Packaged for Hugging Face Datasets with a pandas/polars-ready structure; useful for fine-tuning or evaluation, but it lacks an explicit license and detailed provenance.

huggingface pandas polars nlp llm +2

lordx64/agentic-distill-fable-5-sft

2026

lordx64, Glint-Research

lordx64

Provides 4,659 agentic single-turn SFT training pairs extracted from Claude Fable‑5, formatted as a single-column parquet for Qwen-style fine-tuning. Includes explicit chain-of-thought (<think>) blocks, XML-serialized <tool_use> calls, PII redaction, and AGPL-3.0 licensing.

huggingface claude qwen anthropic ai-agent +5

Glint-Research/Fable-5-traces

2026

Glint-Research

A collection of 953 JSON-formatted Fable 5 interaction traces, including chain-of-thought entries, published on Hugging Face under AGPL-3.0. Intended for fine-tuning or analyzing LLM behavior, subject to license and provenance constraints.

huggingface pandas nlp llm ai-train +1

ABC-130k

2026

XDOF, UC Berkeley +3

Provides 130k+ bimanual teleoperation trajectories for robot imitation learning, recorded on low-cost YAM two-arm rigs and shared as MCAP episodes with subtask annotations, training code, and checkpoints.

robotics video huggingface ai-train ai-development

Nemotron-Personas-Belgium

2026

Pleias, NVIDIA Corporation +1

Pieter Delobelle, Pierre-Carl Langlais +12

Provides 1.8M synthetic Belgian personas (1.2M records; 300k per language) in Dutch/French/German/English, grounded in Belgian census distributions to improve representativeness for LLM training and evaluation. Includes 23 persona and contextual fields, CC BY 4.0 license, produced with NeMo Data Designer.

nvidia huggingface nlp LLM gemma +3

Claude Fable-5 Code

2026

PawanKrd

Contains 603 coding and math prompt–response pairs produced by Claude Fable‑5 (generated 2026-06-10), provided as a JSONL subset for fine-tuning, evaluation, and behavior analysis. Responses are 'non-thinking' (no chain-of-thought); small, anonymized, and lacking an explicit license.

code claude claude-code huggingface json +4

arXiv LaTeX Source Dataset

2026

scholarweave

Provides pre-parsed arXiv LaTeX source files aligned with official metadata as ready-to-query Parquet rows. Bundles each paper's .tex, .bib, .sty, and related files into a single readable tree, updates monthly, and simplifies large-scale access for LLM pretraining, document understanding, and citation analysis. Use requires adherence to the original arXiv licenses.

paper huggingface science llm ai-development +2

DECOMEG — Brain Activity During Typing (MEG & EEG)

2026

Basque Center on Cognition, Brain and Language (BCBL), HybridMojo LLC +1

Jarod Lévy, Mingfang Zhang +5

Provides de-identified MEG and EEG recordings of 35 native Spanish speakers typing memorized sentences, with synchronized behavioral logs and standardized event tables. Includes raw .fif and BrainVision files plus MATLAB logs (≈262 GB total); released under CC BY-NC 4.0 for non-commercial research on brain-to-text decoding.

huggingface science ai nlp python

Complete FABLE.5 Traces 2M

2026

Crownelius

Provides a deduplicated 2.0M-row corpus of FABLE.5 / Mythos agent traces with row-level provenance and session-limit rows removed. Includes canonical Parquet and gzip JSONL exports, SHA256 row hashes, and provenance fields for tracing first-source datasets.

huggingface claude-code vibe-coding llm nlp +4

AFTER

2026

Julia Belikova, Rauf Parchiev +5

Benchmark for evaluating procedural skill evolution in LLM agents: isolates reusable skill bodies, role-specific work surfaces, and hidden oracle assets to measure whether skill refinements transfer across tasks, roles, and model backbones. Includes 382 workplace tasks, 22 skills, and a controlled evaluation protocol.

evaluation agent-skills huggingface llm ai-agent +2