Best learning resources for AI
Sora is OpenAI's video generation model. It can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user asks for in the prompt, but also how those things exist in the physical world.
The 500 AI Agents Projects is a curated collection of AI agent use cases across various industries. It showcases practical applications and provides links to open-source projects for implementation, illustrating how AI agents are transforming sectors such as healthcare, finance, education, retail, and more.
Agent Starter Pack is a Python package providing production-ready templates for GenAI agents on Google Cloud. Focus on your agent logic; the starter pack provides everything else: infrastructure, CI/CD, observability, and security. It includes pre-built templates such as ReAct, RAG, multi-agent, and more.
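The RAG template pattern can be sketched independently of any cloud stack: retrieve the most relevant documents by embedding similarity, then prepend them to the prompt for the generator. The embedding function below is a toy character-frequency stand-in, not part of the starter pack:

```python
import math

def embed(text):
    # Toy embedding: normalized bag-of-letters vector (stand-in for a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, docs, k=2):
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Stuff the retrieved context into the prompt handed to the generator model.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Mount Fuji is in Japan."]
prompt = build_prompt("What is the capital of France?", docs)
```

A production template swaps in a real embedding model and an LLM call, but the retrieve-then-prompt shape is the same.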
CodeLayer is an open-source IDE that orchestrates AI coding agents to tackle hard problems in complex codebases. Built on Claude Code, it features battle-tested workflows, keyboard-first interfaces for speed, advanced context engineering for team scaling, and multi-Claude parallel sessions.
OpenCode is an open-source AI coding agent built for the terminal. It features built-in agents (build, plan, general), LSP support, multi-provider LLM compatibility (75+ providers including Claude, GPT, Gemini, and local models), a native TUI, multi-session support, share links, and a privacy-first design. No account is needed, and the project has 35k+ GitHub stars.
LEANN is the world's smallest vector index, enabling RAG on everything from documents to chat histories with 97% storage savings, no accuracy loss, and 100% privacy on personal devices. It uses graph-based selective recomputation and high-degree preserving pruning for lightweight, scalable semantic search.
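The storage saving comes from not storing embeddings at all: LEANN keeps only a pruned proximity graph and recomputes embeddings for the handful of nodes visited during search. A toy sketch of that selective-recomputation idea (the embedding function and graph below are illustrative, not LEANN's API):

```python
import math

def embed(text):
    # Stand-in embedding; in LEANN a real model is invoked at query time.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    n = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / n for v in vec]

def sim(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_search(query, texts, graph, entry=0, steps=10):
    """Greedy graph search that stores NO vectors: embeddings are
    recomputed only for nodes actually visited (selective recomputation)."""
    q = embed(query)
    visited = {entry}
    best, best_score = entry, sim(q, embed(texts[entry]))
    for _ in range(steps):
        improved = False
        for nb in graph[best]:
            if nb in visited:
                continue
            visited.add(nb)
            s = sim(q, embed(texts[nb]))  # computed on demand, never stored
            if s > best_score:
                best, best_score, improved = nb, s, True
        if not improved:
            break
    return texts[best], len(visited)

texts = ["grocery list: milk eggs", "meeting notes from monday",
         "chat history with alice", "tax documents 2023"]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy proximity graph
result, touched = graph_search("notes from the monday meeting", texts, graph)
```

Because only the visited nodes are embedded, the index on disk is just the graph and the raw text, which is where the large storage savings come from.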
Magentic-UI is a research prototype from Microsoft Research for a human-centered AI web agent. It automates complex web and coding tasks while keeping users in control, revealing plans before execution, allowing guidance, and requiring approvals for sensitive actions. Key features include co-planning, action guards, plan learning, and integration with models like GPT-4o and Fara-7B.
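The action-guard pattern can be sketched generically: the agent proposes actions, and anything flagged as sensitive is routed through a human-approval callback before execution. The action names and classification below are hypothetical stand-ins, not Magentic-UI's actual policy:

```python
# Hypothetical set of actions requiring human sign-off.
SENSITIVE = {"submit_form", "make_purchase", "delete_file"}

def run_with_guard(actions, approve):
    """Execute a plan, pausing for human approval on sensitive actions.
    `approve` is a callback standing in for the UI's approval dialog."""
    log = []
    for action in actions:
        if action in SENSITIVE and not approve(action):
            log.append(f"blocked: {action}")
            continue
        log.append(f"executed: {action}")
    return log

# Usage: a reviewer who rejects everything sensitive.
plan = ["open_page", "click_link", "make_purchase"]
log = run_with_guard(plan, approve=lambda a: False)
# log == ["executed: open_page", "executed: click_link", "blocked: make_purchase"]
```

In the real system the approval step is interactive and the plan is revealed up front, but the control flow is this same gate.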
WhisperLiveKit is an ultra-low-latency, self-hosted speech-to-text toolkit with speaker identification. Powered by leading simultaneous speech research like Simul-Whisper and WhisperStreaming, it enables intelligent buffering and incremental processing for real-time transcription, translation across 200 languages, and speaker diarization. Ideal for meeting notes, accessibility tools, and content creation.
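The intelligent-buffering idea behind simultaneous Whisper pipelines can be illustrated with the LocalAgreement policy from the streaming literature: re-transcribe the growing audio buffer, and only commit the prefix on which two consecutive hypotheses agree. The hypotheses below are hand-written stand-ins for real model output:

```python
def common_prefix(a, b):
    # Longest shared prefix of two token lists.
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def local_agreement(hypotheses):
    """Commit only tokens where consecutive hypotheses agree (LocalAgreement-2 style)."""
    committed = []
    prev = None
    for hyp in hypotheses:
        if prev is not None:
            agreed = common_prefix(prev, hyp)
            if len(agreed) > len(committed):
                committed = agreed
        prev = hyp
    return committed

# Hypotheses from re-running ASR as audio accumulates (illustrative):
hyps = [
    ["the", "quick"],
    ["the", "quick", "brown", "fox"],
    ["the", "quick", "brown", "fox", "jumps"],
]
stable = local_agreement(hyps)  # ['the', 'quick', 'brown', 'fox']
```

Unstable trailing words are held back until a later pass confirms them, which is what keeps streaming output both fast and monotonic.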
VibeVoice is Microsoft's open-source frontier voice AI framework designed for generating expressive, long-form, multi-speaker conversational audio (e.g., podcasts) from text. It supports up to 90 minutes of speech with up to 4 distinct speakers. Key innovations include continuous speech tokenizers at a 7.5 Hz frame rate and next-token diffusion using LLMs for context and high-fidelity acoustics. The recently released VibeVoice-Realtime-0.5B adds real-time streaming TTS with ~300 ms latency.
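The 7.5 Hz tokenizer rate is what makes the sequence lengths tractable: even a full 90-minute session is only about 40,500 acoustic frames for the LLM backbone to model. A quick check of the arithmetic (not VibeVoice code):

```python
frame_rate_hz = 7.5   # continuous speech tokenizer frame rate
duration_min = 90     # maximum supported session length
frames = int(frame_rate_hz * duration_min * 60)
# frames == 40500, short enough for a long-context LLM to handle in one pass
```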
Hello-Agents is a systematic open-source tutorial from the Datawhale community, dedicated to building AI Native Agents. It covers agent fundamentals, history, large language model basics, classic paradigms like ReAct, low-code platforms such as Coze, mainstream frameworks like LangGraph, and custom framework development. Advanced topics include memory and retrieval, context engineering, agent communication protocols, Agentic-RL training, and performance evaluation. The tutorial culminates in case studies such as intelligent travel assistants and cyber towns, taking learners from LLM users to agent builders.
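The ReAct paradigm covered by such tutorials interleaves reasoning and tool use in a loop: the model emits a Thought and an Action, the runtime executes the tool and feeds back an Observation, until a final answer is produced. A minimal sketch with a scripted model standing in for an LLM:

```python
def scripted_llm(transcript):
    # Stand-in for an LLM: returns the next Thought/Action given the transcript.
    if "Observation:" not in transcript:
        return "Thought: I need the population.\nAction: lookup[France]"
    return "Thought: I have the answer.\nAction: finish[68 million]"

# Toy tool registry; a real agent would have search, code execution, etc.
TOOLS = {"lookup": lambda q: f"{q} has about 68 million people."}

def react(question, llm, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        action = step.rsplit("Action: ", 1)[1]   # e.g. "lookup[France]"
        name, arg = action.split("[", 1)
        arg = arg.rstrip("]")
        if name == "finish":
            return arg
        obs = TOOLS[name](arg)                   # run the tool
        transcript += f"\nObservation: {obs}"    # feed result back to the model
    return None

answer = react("What is the population of France?", scripted_llm)
```

The framework chapters (LangGraph and custom frameworks) essentially generalize this loop with state management, routing, and real model calls.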
This paper demonstrates the zero-shot learning and reasoning abilities of the generative video model Veo 3, paralleling the evolution of Large Language Models (LLMs) in natural language processing. Veo 3 excels in diverse visual tasks without explicit training, such as object segmentation, edge detection, image editing, understanding physical properties, recognizing affordances, and simulating tool use, enabling early visual reasoning like maze solving and symmetry detection.
Tinker Cookbook is an open-source library from Thinking Machines Lab for customizing language models via the Tinker API. It offers realistic fine-tuning examples for supervised learning, reinforcement learning, chat, math reasoning, preference learning, tool use, prompt distillation, and multi-agent setups, along with utilities for rendering, hyperparameters, and evaluation.
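One of the recipes, prompt distillation, can be sketched as a data-preparation step: generate outputs with a long, detailed system prompt, then fine-tune on (short prompt, same output) pairs so the behavior survives without the long prompt. The helper names below are illustrative, not the Tinker API:

```python
LONG_PROMPT = ("You are a helpful assistant. Always answer in exactly "
               "one short sentence, in a friendly tone.")
SHORT_PROMPT = "Be brief and friendly."

def teacher(system, question):
    # Stand-in for generation under the long prompt (a real model in practice).
    return f"Sure! {question.rstrip('?')} is a great question."

def distillation_pairs(questions):
    """Build supervised pairs that transfer long-prompt behavior
    onto a cheap short prompt (prompt distillation)."""
    pairs = []
    for q in questions:
        target = teacher(LONG_PROMPT, q)       # behavior to preserve
        pairs.append({"system": SHORT_PROMPT,  # what the tuned model will see
                      "user": q,
                      "assistant": target})
    return pairs

data = distillation_pairs(["What is RLHF?", "What is LoRA?"])
```

Fine-tuning on such pairs (via the Tinker API's supervised-learning path) then bakes the long prompt's behavior into the weights.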