LogoAIAny

Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ocr
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

aisuite

2024
andrewyng

aisuite is a lightweight Python library that provides a unified API for working with multiple Generative AI providers. It supports models from OpenAI, Anthropic, Google, Hugging Face, AWS, Cohere, Mistral, Ollama, and others—abstracting away SDK differences, authentication details, and parameter variations. Modeled after OpenAI’s API style, it enables developers to build LLM-based or agentic applications across providers with minimal setup.

Tags: ai-library, ai-api, llm, ai-client, mcp (+5 more)
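
A minimal sketch of the unified interface, assuming provider API keys are already set in the environment and that the two model identifiers shown are available on your account:

```python
import aisuite as ai

# One client for every provider; models are addressed as "<provider>:<model>".
client = ai.Client()

messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is retrieval-augmented generation?"},
]

# Switching providers only changes the model string (IDs here are examples).
for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```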

CosyVoice (Fun-CosyVoice)

2024
FunAudioLLM

CosyVoice (Fun-CosyVoice) is a multilingual, LLM-based text-to-speech (TTS) system that covers training, inference, and deployment end to end. It focuses on zero-shot voice cloning with strong content consistency, speaker similarity, and natural prosody, and supports many languages and Chinese dialects, pronunciation inpainting, text normalization, and low-latency bidirectional streaming for production use.

Tags: audio, LLM, huggingface, pytorch, ai-inference (+2 more)
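
Zero-shot cloning conditions generation on a short reference clip plus its transcript. A sketch in the style of the repo's examples; the checkpoint directory, method signature, and output keys below are assumptions to verify against the README:

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load a pretrained checkpoint (local path is an assumption).
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B')

# A few seconds of reference audio and its transcript drive the cloning.
prompt_speech = load_wav('reference.wav', 16000)
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to speak in the cloned voice.',
        'Transcript of the reference clip.',
        prompt_speech, stream=False)):
    torchaudio.save(f'output_{i}.wav', out['tts_speech'], cosyvoice.sample_rate)
```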

Docling

2024
Deep Search Team, IBM Research Zurich +1

Docling is an open-source document parsing and understanding library designed for generative-AI workflows. It processes many formats (PDF, DOCX, PPTX, HTML, images, audio, WebVTT), offers advanced PDF layout/table/code/formula understanding, OCR and ASR support, a unified document representation, multiple export formats, local execution for sensitive data, CLI, and integrations with popular agent/LLM frameworks. It also provides an MCP server for agentic usage.

Tags: ocr, ASR, mcp-server, mcp, ai-tools (+6 more)
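
Basic conversion goes through a single DocumentConverter. A minimal sketch, assuming a local PDF (the converter also accepts URLs and other supported formats):

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # input format is auto-detected

# The unified document representation exports to Markdown, among other formats.
print(result.document.export_to_markdown())
```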

BitNet (bitnet.cpp)

2024
Microsoft

BitNet (bitnet.cpp) is Microsoft's open-source inference framework for 1-bit large language models (LLMs). It provides optimized kernels for fast, lossless inference of 1.58-bit models on CPU, with GPU support added later, delivering substantial speed and energy improvements on ARM and x86. It integrates with Hugging Face models, includes build/run/benchmark tools, and aims to make large low-bit models runnable locally (e.g., a 100B BitNet model on a single CPU at human reading speed).

Tags: microsoft, github, llm, ai-inference, ai-serving (+4 more)
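
After building the project, inference runs through the repo's run_inference.py script. A sketch that shells out to it from Python; the model path and flags follow the repo's documented usage but should be treated as assumptions:

```python
import subprocess

# Hypothetical local GGUF checkpoint path; see the repo for download steps.
subprocess.run([
    "python", "run_inference.py",
    "-m", "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf",
    "-p", "You are a helpful assistant.",
    "-cnv",  # conversation (chat) mode
], check=True)
```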

NexaSDK

2024
NexaAI

NexaSDK is a cross-platform developer toolkit and low-level inference engine (NexaML) for running AI models locally on NPUs, GPUs and CPUs. It supports GGUF, MLX and .nexa model formats, provides Day-0 support for new architectures, multimodal capabilities (text, vision, audio), mobile SDKs (Android/iOS), OpenAI-compatible APIs, and optimized NPU support.

Tags: github, ai-inference, ai-serving, ai-client, ai-framework (+5 more)
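
Because the local server speaks the OpenAI wire format, the stock openai client can be pointed at it. A sketch assuming a Nexa server is already running locally; the base URL, port, and model name are placeholders, not documented values:

```python
from openai import OpenAI

# Point the standard client at the local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; use whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize what an NPU does."}],
)
print(response.choices[0].message.content)
```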

MiniMind-V

2024
Jingyao Gong (jingyaogong)

MiniMind-V is an open-source tiny visual-language model (VLM) project that demonstrates how to train a 26M-parameter multimodal VLM from scratch quickly and cheaply (for example, about one hour on a single RTX 3090 GPU at very low rental cost). The repo provides end-to-end code for data cleaning, pretraining, supervised fine-tuning (SFT), evaluation, and a demo, using CLIP as the visual encoder and MiniMind as the base LLM.

Tags: vision, pytorch, github, llm, ai-train (+2 more)
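
The design follows the common "projector" pattern: frozen CLIP patch features are linearly projected into the LLM's embedding space and prepended as soft tokens. A schematic illustration of that idea, not MiniMind-V's actual code; all names and dimensions here are made up:

```python
import torch
import torch.nn as nn

class TinyVLMConnector(nn.Module):
    """Illustrative CLIP-to-LLM bridge; not the repo's implementation."""
    def __init__(self, clip_dim=768, llm_dim=512):
        super().__init__()
        # One linear layer maps vision features into the LLM embedding space.
        self.proj = nn.Linear(clip_dim, llm_dim)

    def forward(self, clip_patch_feats, text_embeds):
        # clip_patch_feats: (batch, n_patches, clip_dim) from a frozen CLIP encoder
        # text_embeds:      (batch, n_tokens, llm_dim) from the LLM embedding table
        vision_tokens = self.proj(clip_patch_feats)
        # Prepend vision tokens so the LLM attends to the image before the text.
        return torch.cat([vision_tokens, text_embeds], dim=1)

bridge = TinyVLMConnector()
fused = bridge(torch.randn(1, 196, 768), torch.randn(1, 32, 512))
print(fused.shape)  # torch.Size([1, 228, 512])
```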

olmOCR

2024
Allen Institute for AI (AI2), AllenNLP team

olmOCR is an open-source toolkit from the Allen Institute for AI (AI2) / AllenNLP team for converting image-based documents (PDF, PNG, JPEG) into clean, readable plain text or Markdown. It uses a 7B-parameter vision-language model to handle complex layouts, equations, tables and handwriting, removes headers/footers, and outputs text in natural reading order. The repo includes a processing pipeline, benchmark suite (olmOCR-Bench), training and RL components, Docker images, and an online demo. Licensed under Apache 2.0.

Tags: ocr, vision, llm, foundation-model, huggingface (+4 more)
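
The pipeline is typically invoked as a module over a workspace directory. A sketch that shells out from Python; the flags mirror the repo's documented usage but are assumptions to verify:

```python
import subprocess

# Convert one PDF to Markdown inside a local workspace (paths are placeholders).
subprocess.run([
    "python", "-m", "olmocr.pipeline",
    "./localworkspace",
    "--markdown",
    "--pdfs", "paper.pdf",
], check=True)
```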

Awesome-ML-SYS-Tutorial

2024
zhaochenyang20

A GitHub repository of learning notes and code dedicated to ML + SYS (machine learning systems). It collects tutorials, code walkthroughs, and engineering notes on RLHF, distributed training (FSDP, Megatron), inference and scheduling (SGLang, vLLM), quantization, CUDA/GPU optimization, system design, and practical engineering.

Tags: github, mlops, ai-train, pytorch, LLM (+6 more)

UI-TARS Desktop

2025
ByteDance

UI-TARS Desktop is a native desktop GUI agent by ByteDance that enables multimodal, vision-language-driven control of local and remote computers and browsers. It provides precise mouse/keyboard control, screenshot-based visual recognition, cross-platform support, and integration with the Agent TARS ecosystem and MCP tools. It focuses on private/local processing and building human-like task completion workflows.

Tags: bytedance, github, ai-agent, vision, LLM (+7 more)

ComfyUI-WanVideoWrapper

2025
Kijai

ComfyUI-WanVideoWrapper is a collection of custom ComfyUI nodes that wrap WanVideo (Wan2.1 and related models) and related components, enabling easier loading and use of WanVideo-based video generation models inside ComfyUI. It supports multiple model formats, memory/VRAM optimizations, and provides example workflows and compatibility patches.

Tags: github, ai-video, video, ai-tools, pytorch (+3 more)

MLX LM

2025
ml-explore (GitHub organization)

MLX LM is a Python package to run, generate with, and fine-tune large language models on Apple Silicon using MLX. It integrates with the Hugging Face Hub, supports quantization and uploading of models, low-rank and full-model fine-tuning (including for quantized models), distributed inference and training, streaming generation, sampling/custom logits processors, prompt caching, and a convenient CLI and Python API.

Tags: llm, huggingface, github, ai-library, ai-inference (+4 more)
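
The Python API is a two-step load/generate. A minimal sketch, assuming Apple Silicon and the quantized community model shown (any MLX-converted Hub model works):

```python
from mlx_lm import load, generate

# Fetch (or reuse cached) weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompt = "Explain prompt caching in one paragraph."
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```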

Chatterbox TTS

2025
Resemble AI

Chatterbox is an open-source family of state-of-the-art text-to-speech models from Resemble AI. It includes Chatterbox-Turbo (a 350M-parameter efficient model with paralinguistic tags and single-step mel decoding), Chatterbox, and a multilingual model supporting 23+ languages. Designed for low-latency voice agents, narration, and creative workflows; includes built-in PerTh watermarking and demo/Hub integrations.

Tags: audio, github, ai-tools, pytorch, huggingface
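
Generation follows the from_pretrained/generate pattern from the project's examples. A minimal sketch, assuming a CUDA-capable machine:

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

wav = model.generate("Today is the day. Let's make it count.")
ta.save("output.wav", wav, model.sr)
```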