AIAny

Tag

Explore by tags

  • All

  • 30u30

  • ASR

  • ChatGPT

  • GNN

  • IDE

  • RAG

  • ai-agent

  • ai-api

  • ai-api-management

  • ai-client

  • ai-coding

  • ai-demos

  • ai-development

  • ai-framework

  • ai-image

  • ai-image-demos

  • ai-inference

  • ai-leaderboard

  • ai-library

  • ai-rank

  • ai-serving

  • ai-tools

  • ai-train

  • ai-video

  • ai-workflow

  • AIGC

  • alibaba

  • amazon

  • anthropic

  • audio

  • blog

  • book

  • bytedance

  • chatbot

  • chemistry

  • claude

  • course

  • deepmind

  • deepseek

  • engineering

  • foundation

  • foundation-model

  • gemini

  • github

  • google

  • gradient-booting

  • grok

  • huggingface

  • LLM

  • llm

  • math

  • mcp

  • mcp-client

  • mcp-server

  • meta-ai

  • microsoft

  • mlops

  • NLP

  • nvidia

  • ocr

  • ollama

  • openai

  • paper

  • physics

  • plugin

  • pytorch

  • RL

  • science

  • sora

  • translation

  • tutorial

  • vibe-coding

  • video

  • vision

  • xAI

  • xai


Docling

2024
Deep Search Team, IBM Research Zurich +1

Docling is an open-source document parsing and understanding library designed for generative-AI workflows. It processes many formats (PDF, DOCX, PPTX, HTML, images, audio, WebVTT), offers advanced PDF layout/table/code/formula understanding, OCR and ASR support, a unified document representation, multiple export formats, local execution for sensitive data, CLI, and integrations with popular agent/LLM frameworks. It also provides an MCP server for agentic usage.

Tags: ocr, ASR, mcp-server, mcp, ai-tools, +6 more

Midscene.js

2024
web-infra-dev, Xiao Zhou +2

Midscene.js is an open-source JavaScript SDK and framework for vision-driven UI automation across web, Android, iOS and other interfaces. It uses vision-language models to locate and interact with UI elements purely from screenshots, lets you script automation in natural language, JavaScript or YAML, integrates with Puppeteer/Playwright or device bridges, and provides developer features such as caching, debugging replay, MCP services and a zero-code browser extension.

Tags: vision, ai-agent, ai-tools, mcp, github, +4 more

SurfSense

2024
MODSetter

SurfSense is an open-source alternative to NotebookLM, Perplexity, and Glean. It integrates with personal knowledge bases and connects to external sources like search engines (SearxNG, Tavily, LinkUp), Slack, Linear, Jira, ClickUp, Confluence, BookStack, Gmail, Notion, YouTube, GitHub, Discord, Airtable, Google Calendar, Luma, Elasticsearch, and more. Key features include multi-file upload support (50+ formats), powerful search, natural language chat with citations, privacy-focused local LLM support (e.g., Ollama), self-hosting, team collaboration with RBAC, fast podcast generation, and advanced RAG techniques with 100+ LLMs, 6000+ embedding models, hierarchical indices, and hybrid search.

Tags: github, ai-tools, RAG, ai-agent, llm, +3 more

BitNet (bitnet.cpp)

2024
Microsoft

BitNet (bitnet.cpp) is Microsoft's open-source inference framework for 1-bit large language models (LLMs). It provides optimized kernels for fast, lossless inference of 1.58-bit models on CPU, with GPU support added later, and delivers substantial speed and energy improvements on ARM and x86. It integrates with Hugging Face models, includes build/run/benchmark tools, and aims to enable running large low-bit models locally (e.g., a 100B BitNet model on a single CPU at human-reading speeds).

Tags: microsoft, github, llm, ai-inference, ai-serving, +4 more

NexaSDK

2024
NexaAI

NexaSDK is a cross‑platform developer toolkit and low‑level inference engine (NexaML) for running AI models locally on NPUs, GPUs and CPUs. It supports GGUF, MLX and .nexa model formats, provides Day‑0 support for new architectures, multimodal capabilities (text, vision, audio), mobile SDKs (Android/iOS), OpenAI‑compatible APIs, and optimized NPU support.

Tags: github, ai-inference, ai-serving, ai-client, ai-framework, +5 more

MiniMind-V

2024
Jingyao Gong (jingyaogong)

MiniMind-V is an open-source tiny vision-language model (VLM) project that demonstrates how to train a 26M-parameter multimodal VLM from scratch quickly and cheaply (for example, in about 1 hour on a single 3090 GPU at very low rental cost). The repo provides end-to-end code for data cleaning, pretraining, supervised fine-tuning (SFT), evaluation and demo, using CLIP as the visual encoder and MiniMind as the base LLM.

Tags: vision, pytorch, github, llm, ai-train, +2 more

olmOCR

2024
Allen Institute for AI (AI2), AllenNLP team

olmOCR is an open-source toolkit from the Allen Institute for AI (AI2) / AllenNLP team for converting image-based documents (PDF, PNG, JPEG) into clean, readable plain text or Markdown. It uses a 7B-parameter vision-language model to handle complex layouts, equations, tables and handwriting, removes headers/footers, and outputs text in natural reading order. The repo includes a processing pipeline, benchmark suite (olmOCR-Bench), training and RL components, Docker images, and an online demo. Licensed under Apache 2.0.

Tags: ocr, vision, llm, foundation-model, huggingface, +4 more

Kortix (suna)

2024
Kortix (kortix-ai)

Kortix (suna) is a full platform for building, managing and training autonomous AI agents. It offers an agent builder, browser automation, file and data management, web intelligence, system operation tools, and integrations with LLM providers (OpenAI, Anthropic, LiteLLM). It supports self-hosting and scalable agent runtimes.

Tags: github, ai-agent, ai-workflow, mlops, ai-development, +2 more

DataFlow

2024
OpenDCAI, PKU-DCAI Research Team +6

DataFlow is an open-source, LLM-driven data preparation and workflow automation system from the OpenDCAI / PKU-DCAI team. It composes modular operators and pipelines to parse, generate, process, and evaluate high-quality training data from noisy sources (PDFs, plain text, low-quality QA) to improve domain-specific LLM performance. DataFlow includes many ready-made pipelines (Text, Reasoning, Text2SQL, Knowledge-Base Cleaning, Agentic RAG), a DataFlow Agent that auto-assembles pipelines, and a large operator library for filtering, synthesis, evaluation and more. It is distributed via GitHub and PyPI (open-dataflow) and comes with documentation, Colab demos, and Docker images for easy use.

Tags: LLM, RAG, ai-workflow, ai-agent, ai-train, +3 more

VideoCaptioner

2024
WEIFENG2333

VideoCaptioner is an AI-powered video subtitling assistant that combines ASR (local or cloud) with LLM-based subtitle segmentation, correction and translation. It supports offline GPU transcription, concurrent chunk transcription, VAD, speaker-aware processing, batch subtitling and one-click subtitle-to-video synthesis, with both GUI and CLI options.

Tags: video, ai-video, audio, ASR, LLM, +3 more

Awesome-ML-SYS-Tutorial

2024
zhaochenyang20

A GitHub repository of learning notes and code dedicated to ML + SYS (machine learning systems). It collects tutorials, code walkthroughs and engineering notes on RLHF, distributed training (FSDP, Megatron), inference and scheduling (SGLang, vLLM), quantization, CUDA/GPU optimization, system design, and practical engineering.

Tags: github, mlops, ai-train, pytorch, LLM, +6 more

Open Notebook

2024
Luis Novo

Open Notebook is an open-source, privacy-focused alternative to Google's NotebookLM. It enables local, multi-model AI usage with support for 16+ providers, multi-modal content organization (PDFs, videos, audio, web pages), professional podcast generation, intelligent search, and context-aware chats.

Tags: github, ai-tools, ai-client, ai-api, RAG, +1 more