Best learning resources for AI
Sora is OpenAI's video generation model. It can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user asks for in the prompt, but also how those things exist in the physical world.
The 500 AI Agents Projects is a curated collection of AI agent use cases across various industries. It showcases practical applications and provides links to open-source projects for implementation, illustrating how AI agents are transforming sectors such as healthcare, finance, education, retail, and more.
Agent Starter Pack is a Python package providing production-ready templates for GenAI agents on Google Cloud. Focus on your agent logic; the starter pack provides everything else: infrastructure, CI/CD, observability, and security. It includes pre-built templates such as ReAct, RAG, multi-agent, and more.
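The RAG template pattern can be sketched independently of any cloud stack: retrieve the most relevant documents by embedding similarity, then prepend them to the prompt for the generator. The embedding function below is a toy character-frequency stand-in, not part of the starter pack:

```python
import math

def embed(text):
    # Toy embedding: normalized bag-of-letters vector (stand-in for a real model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, docs, k=2):
    # Rank documents by cosine similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs):
    # Stuff the retrieved context into the prompt handed to the generator model.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = ["Paris is the capital of France.",
        "The Eiffel Tower is in Paris.",
        "Mount Fuji is in Japan."]
prompt = build_prompt("What is the capital of France?", docs)
```

A production template swaps in a real embedding model and an LLM call, but the retrieve-then-prompt shape is the same.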
CodeLayer is an open-source IDE that orchestrates AI coding agents to tackle hard problems in complex codebases. Built on Claude Code, it features battle-tested workflows, keyboard-first interfaces for speed, advanced context engineering for team scaling, and multi-Claude parallel sessions.
OpenCode is an open-source AI coding agent built for the terminal. It features built-in agents (build, plan, general), LSP support, multi-provider LLM compatibility (75+ providers including Claude, GPT, Gemini, and local models), a native TUI, multi-session support, share links, and a privacy-first design. No account is needed, and the project has 35k+ GitHub stars.
LEANN is the world's smallest vector index, enabling RAG on everything from documents to chat histories with 97% storage savings, no accuracy loss, and 100% privacy on personal devices. It uses graph-based selective recomputation and high-degree preserving pruning for lightweight, scalable semantic search.
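The storage saving comes from not storing embeddings at all: LEANN keeps only a pruned proximity graph and recomputes embeddings for the handful of nodes visited during search. A toy sketch of that selective-recomputation idea (the embedding function and graph below are illustrative, not LEANN's API):

```python
import math

def embed(text):
    # Stand-in embedding; in LEANN a real model is invoked at query time.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    n = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / n for v in vec]

def sim(a, b):
    return sum(x * y for x, y in zip(a, b))

def graph_search(query, texts, graph, entry=0, steps=10):
    """Greedy graph search that stores NO vectors: embeddings are
    recomputed only for nodes actually visited (selective recomputation)."""
    q = embed(query)
    visited = {entry}
    best, best_score = entry, sim(q, embed(texts[entry]))
    for _ in range(steps):
        improved = False
        for nb in graph[best]:
            if nb in visited:
                continue
            visited.add(nb)
            s = sim(q, embed(texts[nb]))  # computed on demand, never stored
            if s > best_score:
                best, best_score, improved = nb, s, True
        if not improved:
            break
    return texts[best], len(visited)

texts = ["grocery list: milk eggs", "meeting notes from monday",
         "chat history with alice", "tax documents 2023"]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy proximity graph
result, touched = graph_search("notes from the monday meeting", texts, graph)
```

Because only the visited nodes are embedded, the index on disk is just the graph and the raw text, which is where the large storage savings come from.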
Magentic-UI is a research prototype from Microsoft Research for a human-centered AI web agent. It automates complex web and coding tasks while keeping users in control, revealing plans before execution, allowing guidance, and requiring approvals for sensitive actions. Key features include co-planning, action guards, plan learning, and integration with models like GPT-4o and Fara-7B.
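The action-guard pattern can be sketched generically: the agent proposes actions, and anything flagged as sensitive is routed through a human-approval callback before execution. The action names and classification below are hypothetical stand-ins, not Magentic-UI's actual policy:

```python
# Hypothetical set of actions requiring human sign-off.
SENSITIVE = {"submit_form", "make_purchase", "delete_file"}

def run_with_guard(actions, approve):
    """Execute a plan, pausing for human approval on sensitive actions.
    `approve` is a callback standing in for the UI's approval dialog."""
    log = []
    for action in actions:
        if action in SENSITIVE and not approve(action):
            log.append(f"blocked: {action}")
            continue
        log.append(f"executed: {action}")
    return log

# Usage: a reviewer who rejects everything sensitive.
plan = ["open_page", "click_link", "make_purchase"]
log = run_with_guard(plan, approve=lambda a: False)
# log == ["executed: open_page", "executed: click_link", "blocked: make_purchase"]
```

In the real system the approval step is interactive and the plan is revealed up front, but the control flow is this same gate.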
WhisperLiveKit is an ultra-low-latency, self-hosted speech-to-text toolkit with speaker identification. Powered by leading simultaneous speech research like Simul-Whisper and WhisperStreaming, it enables intelligent buffering and incremental processing for real-time transcription, translation across 200 languages, and speaker diarization. Ideal for meeting notes, accessibility tools, and content creation.
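The intelligent-buffering idea behind simultaneous Whisper pipelines can be illustrated with the LocalAgreement policy from the streaming literature: re-transcribe the growing audio buffer, and only commit the prefix on which two consecutive hypotheses agree. The hypotheses below are hand-written stand-ins for real model output:

```python
def common_prefix(a, b):
    # Longest shared prefix of two token lists.
    out = []
    for x, y in zip(a, b):
        if x != y:
            break
        out.append(x)
    return out

def local_agreement(hypotheses):
    """Commit only tokens where consecutive hypotheses agree (LocalAgreement-2 style)."""
    committed = []
    prev = None
    for hyp in hypotheses:
        if prev is not None:
            agreed = common_prefix(prev, hyp)
            if len(agreed) > len(committed):
                committed = agreed
        prev = hyp
    return committed

# Hypotheses from re-running ASR as audio accumulates (illustrative):
hyps = [
    ["the", "quick"],
    ["the", "quick", "brown", "fox"],
    ["the", "quick", "brown", "fox", "jumps"],
]
stable = local_agreement(hyps)  # ['the', 'quick', 'brown', 'fox']
```

Unstable trailing words are held back until a later pass confirms them, which is what keeps streaming output both fast and monotonic.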
VibeVoice is Microsoft's open-source frontier voice AI framework designed for generating expressive, long-form, multi-speaker conversational audio (e.g., podcasts) from text. It supports up to 90 minutes of speech with up to 4 distinct speakers. Key innovations include continuous speech tokenizers at a 7.5 Hz frame rate and next-token diffusion using LLMs for context and high-fidelity acoustics. The recently released VibeVoice-Realtime-0.5B adds real-time streaming TTS with ~300 ms latency.
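The 7.5 Hz tokenizer rate is what makes the sequence lengths tractable: even a full 90-minute session is only about 40,500 acoustic frames for the LLM backbone to model. A quick check of the arithmetic (not VibeVoice code):

```python
frame_rate_hz = 7.5   # continuous speech tokenizer frame rate
duration_min = 90     # maximum supported session length
frames = int(frame_rate_hz * duration_min * 60)
# frames == 40500, short enough for a long-context LLM to handle in one pass
```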
Hello-Agents is a systematic open-source tutorial from the Datawhale community, dedicated to building AI Native Agents. It covers agent fundamentals, history, large language model basics, classic paradigms like ReAct, low-code platforms such as Coze, mainstream frameworks like LangGraph, and custom framework development. Advanced topics include memory and retrieval, context engineering, agent communication protocols, Agentic-RL training, and performance evaluation. The tutorial culminates in case studies such as intelligent travel assistants and cyber towns, taking learners from LLM users to agent builders.
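The ReAct paradigm covered by such tutorials interleaves reasoning and tool use in a loop: the model emits a Thought and an Action, the runtime executes the tool and feeds back an Observation, until a final answer is produced. A minimal sketch with a scripted model standing in for an LLM:

```python
def scripted_llm(transcript):
    # Stand-in for an LLM: returns the next Thought/Action given the transcript.
    if "Observation:" not in transcript:
        return "Thought: I need the population.\nAction: lookup[France]"
    return "Thought: I have the answer.\nAction: finish[68 million]"

# Toy tool registry; a real agent would have search, code execution, etc.
TOOLS = {"lookup": lambda q: f"{q} has about 68 million people."}

def react(question, llm, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        action = step.rsplit("Action: ", 1)[1]   # e.g. "lookup[France]"
        name, arg = action.split("[", 1)
        arg = arg.rstrip("]")
        if name == "finish":
            return arg
        obs = TOOLS[name](arg)                   # run the tool
        transcript += f"\nObservation: {obs}"    # feed result back to the model
    return None

answer = react("What is the population of France?", scripted_llm)
```

The framework chapters (LangGraph and custom frameworks) essentially generalize this loop with state management, routing, and real model calls.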
This paper demonstrates the zero-shot learning and reasoning abilities of the generative video model Veo 3, paralleling the evolution of Large Language Models (LLMs) in natural language processing. Veo 3 excels in diverse visual tasks without explicit training, such as object segmentation, edge detection, image editing, understanding physical properties, recognizing affordances, and simulating tool use, enabling early visual reasoning like maze solving and symmetry detection.
Tinker Cookbook is an open-source library from Thinking Machines Lab for customizing language models via the Tinker API. It offers realistic fine-tuning examples for supervised learning, reinforcement learning, chat, math reasoning, preference learning, tool use, prompt distillation, and multi-agent setups, along with utilities for rendering, hyperparameters, and evaluation.
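One of the recipes, prompt distillation, can be sketched as a data-preparation step: generate outputs with a long, detailed system prompt, then fine-tune on (short prompt, same output) pairs so the behavior survives without the long prompt. The helper names below are illustrative, not the Tinker API:

```python
LONG_PROMPT = ("You are a helpful assistant. Always answer in exactly "
               "one short sentence, in a friendly tone.")
SHORT_PROMPT = "Be brief and friendly."

def teacher(system, question):
    # Stand-in for generation under the long prompt (a real model in practice).
    return f"Sure! {question.rstrip('?')} is a great question."

def distillation_pairs(questions):
    """Build supervised pairs that transfer long-prompt behavior
    onto a cheap short prompt (prompt distillation)."""
    pairs = []
    for q in questions:
        target = teacher(LONG_PROMPT, q)       # behavior to preserve
        pairs.append({"system": SHORT_PROMPT,  # what the tuned model will see
                      "user": q,
                      "assistant": target})
    return pairs

data = distillation_pairs(["What is RLHF?", "What is LoRA?"])
```

Fine-tuning on such pairs (via the Tinker API's supervised-learning path) then bakes the long prompt's behavior into the weights.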