Best learning resources for AI
CodeLayer is an open-source IDE that orchestrates AI coding agents to tackle hard problems in complex codebases. Built on Claude Code, it features battle-tested workflows, keyboard-first interfaces for speed, advanced context engineering for team scaling, and multi-Claude parallel sessions.
OpenCode is an open-source AI coding agent built for the terminal. It features built-in agents (build, plan, general), LSP support, multi-provider LLM compatibility (75+ providers, including Claude, GPT, Gemini, and local models), a native TUI, multi-session support, share links, and a privacy-first design. No account needed; 35k+ stars.
LEANN is the world's smallest vector index, enabling RAG on everything from documents to chat histories with 97% storage savings, no accuracy loss, and 100% privacy on personal devices. It uses graph-based selective recomputation and high-degree preserving pruning for lightweight, scalable semantic search.
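The core idea behind that storage saving, as described, is to keep only the raw text and a proximity graph, recomputing embeddings on demand for the few nodes a search actually visits instead of storing a full embedding matrix. A toy sketch of that pattern (the hash-based `embed` function, the tiny graph, and the greedy walk are illustrative stand-ins, not LEANN's actual API or algorithm):

```python
import hashlib

def embed(text: str, dim: int = 8) -> list[float]:
    # Toy stand-in for a real embedding model: a deterministic
    # pseudo-vector derived from the text's hash.
    h = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in h[:dim]]

def dist(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Only the documents and a neighbor graph are persisted; there is
# no stored embedding matrix, which is where the savings come from.
docs = {0: "cats purr", 1: "dogs bark", 2: "rust borrow checker", 3: "python asyncio"}
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

def search(query: str, entry: int = 0) -> int:
    q = embed(query)
    cur, cur_d = entry, dist(embed(docs[entry]), q)
    while True:
        # Selective recomputation: embeddings are materialized only
        # for the handful of nodes the greedy walk touches.
        d, nxt = min((dist(embed(docs[n]), q), n) for n in graph[cur])
        if d >= cur_d:
            return cur
        cur, cur_d = nxt, d
```

The trade is compute at query time for storage at rest, which is why the approach suits personal devices with small corpora.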
Magentic-UI is a research prototype from Microsoft Research for a human-centered AI web agent. It automates complex web and coding tasks while keeping users in control, revealing plans before execution, allowing guidance, and requiring approvals for sensitive actions. Key features include co-planning, action guards, plan learning, and integration with models like GPT-4o and Fara-7B.
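The "action guard" pattern described above, where sensitive steps pause for explicit user approval before executing, can be sketched generically. The `SENSITIVE` set, action names, and `approve` callback here are illustrative assumptions, not Magentic-UI's API:

```python
from typing import Callable

# Hypothetical list of actions that require a human in the loop.
SENSITIVE = {"submit_form", "delete_file", "make_purchase"}

def run_action(name: str, action: Callable[[], str],
               approve: Callable[[str], bool]) -> str:
    # Action guard: sensitive steps are revealed to the user and
    # executed only after explicit approval.
    if name in SENSITIVE and not approve(name):
        return f"{name}: blocked by user"
    return action()

# Simulated user who approves everything except purchases.
user = lambda name: name != "make_purchase"

print(run_action("scroll_page", lambda: "scrolled", user))   # runs freely
print(run_action("make_purchase", lambda: "bought", user))   # blocked
```

The same gate generalizes to co-planning: show the full plan first, then interpose an approval check at each sensitive step.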
WhisperLiveKit is an ultra-low-latency, self-hosted speech-to-text toolkit with speaker identification. Powered by leading simultaneous speech research like Simul-Whisper and WhisperStreaming, it enables intelligent buffering and incremental processing for real-time transcription, translation across 200 languages, and speaker diarization. Ideal for meeting notes, accessibility tools, and content creation.
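The incremental processing such streaming systems rely on can be illustrated with a LocalAgreement-style commit policy (the approach popularized by WhisperStreaming): emit only the prefix on which consecutive partial hypotheses agree, keeping the still-unstable tail buffered. This standalone sketch mocks the recognizer with hard-coded word lists:

```python
def agreed_prefix(prev: list[str], cur: list[str]) -> list[str]:
    # Keep the longest common prefix of two consecutive hypotheses;
    # words past the first disagreement may still be revised.
    out = []
    for a, b in zip(prev, cur):
        if a != b:
            break
        out.append(a)
    return out

# Successive partial hypotheses from a streaming recognizer (mocked).
hyps = [
    ["the"],
    ["the", "quick", "brow"],
    ["the", "quick", "brown", "fox"],
]

committed: list[str] = []
prev: list[str] = []
for cur in hyps:
    stable = agreed_prefix(prev, cur)
    # Emit only the words that became stable since the last step.
    committed.extend(stable[len(committed):])
    prev = cur

print(" ".join(committed))  # prints "the quick"
```

Note how "brow" is never emitted: it disagrees with the later hypothesis "brown", so the policy holds it back, trading a little latency for stability.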
VibeVoice is Microsoft's open-source frontier voice AI framework for generating expressive, long-form, multi-speaker conversational audio (e.g., podcasts) from text. It supports up to 90 minutes of speech with up to 4 distinct speakers. Key innovations include continuous speech tokenizers operating at a 7.5 Hz frame rate and next-token diffusion with LLMs for context and high-fidelity acoustics. The recently released VibeVoice-Realtime-0.5B adds real-time streaming TTS with ~300 ms latency.
Hello-Agents is a systematic open-source tutorial from the Datawhale community, dedicated to building AI Native Agents. It covers agent fundamentals, history, large language model basics, classic paradigms like ReAct, low-code platforms such as Coze, mainstream frameworks like LangGraph, and custom framework development. Advanced topics include memory and retrieval, context engineering, agent communication protocols, Agentic-RL training, and performance evaluation, culminating in case studies like intelligent travel assistants and cyber towns, transforming learners from LLM users to agent builders.
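Among the classic paradigms the tutorial covers, ReAct interleaves reasoning ("thought") with tool calls ("action") and their results ("observation") in a loop. A minimal sketch with a scripted stand-in for the LLM policy (the tool names and policy logic are illustrative, not from the tutorial):

```python
# Tools the agent can call; a real agent would have search, code, etc.
tools = {
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def policy(transcript: str):
    # Scripted stand-in for an LLM: given the transcript so far,
    # return ("act", tool, input) or ("finish", answer).
    if "Observation" not in transcript:
        return ("act", "calc", "6 * 7")
    return ("finish", transcript.rsplit("Observation: ", 1)[1].strip())

def react(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        decision = policy(transcript)
        if decision[0] == "finish":
            return decision[1]
        _, tool, arg = decision
        obs = tools[tool](arg)  # action -> observation
        transcript += f"Action: {tool}[{arg}]\nObservation: {obs}\n"
    return "gave up"

print(react("What is 6 times 7?"))  # prints "42"
```

The loop structure is the paradigm's whole point: each observation is fed back into the transcript so the next "thought" can condition on it.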
This paper demonstrates the zero-shot learning and reasoning abilities of the generative video model Veo 3, paralleling the evolution of Large Language Models (LLMs) in natural language processing. Veo 3 excels in diverse visual tasks without explicit training, such as object segmentation, edge detection, image editing, understanding physical properties, recognizing affordances, and simulating tool use, enabling early visual reasoning like maze solving and symmetry detection.
Tinker Cookbook is an open-source library from Thinking Machines Lab for customizing language models via the Tinker API. It offers realistic fine-tuning examples for supervised learning, reinforcement learning, chat, math reasoning, preference learning, tool use, prompt distillation, and multi-agent setups, along with utilities for rendering, hyperparameters, and evaluation.
A Next.js web application that integrates AI capabilities with draw.io diagrams, letting you create, modify, and enhance diagrams through natural-language commands and AI-assisted visualization. Features include image replication, diagram history, interactive chat, AWS architecture support, and animated connectors.
LightX2V is a lightweight, high-performance video-generation inference framework. It unifies multiple state-of-the-art video generation techniques in one platform, supporting tasks including text-to-video (T2V) and image-to-video (I2V); the name X2V denotes transforming an input modality X (such as text or images) into video output (V).
Deepagents is an open-source agent harness built on LangChain and LangGraph. It equips agents with planning tools, a filesystem backend, sub-agent spawning, and middleware for handling complex, long-horizon tasks reliably and cost-effectively.
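The harness pattern described — planning, a scratch filesystem, and spawning sub-agents with fresh context for long-horizon work — can be sketched generically in pure Python. This is an illustration of the pattern only; it is not deepagents' actual API (class names, `spawn`, and the plan steps are all invented for the example):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    # Stand-in for the filesystem backend: a shared notes store the
    # parent uses to collect sub-agent results.
    notes: dict = field(default_factory=dict)

    def spawn(self, name: str) -> "Agent":
        # Sub-agents start with a fresh context, so a long task
        # doesn't overflow the parent's window.
        return Agent(name=name)

    def do(self, step: str) -> str:
        # Stand-in for an LLM call executing one step.
        return f"done: {step}"

    def run(self, task: str) -> str:
        # Stand-in for a planning tool: decompose, then delegate.
        plan = [f"research {task}", f"summarize {task}"]
        for step in plan:
            worker = self.spawn(f"{self.name}/{step}")
            self.notes[step] = worker.do(step)
        return "; ".join(self.notes.values())

print(Agent("root").run("topic"))  # prints "done: research topic; done: summarize topic"
```

The cost-effectiveness claim follows from this shape: the parent holds only plan steps and results, not every sub-agent's full transcript.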