LogoAIAny

Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ocr
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

aisuite

2024
andrewyng

aisuite is a lightweight Python library that provides a unified API for working with multiple Generative AI providers. It supports models from OpenAI, Anthropic, Google, Hugging Face, AWS, Cohere, Mistral, Ollama, and others—abstracting away SDK differences, authentication details, and parameter variations. Modeled after OpenAI’s API style, it enables developers to build LLM-based or agentic applications across providers with minimal setup.

Tags: ai-library, ai-api, llm, ai-client, mcp (+5 more)
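
A minimal sketch of the unified interface, assuming provider API keys are already set in the environment and that the two model identifiers shown are available on your account:

```python
import aisuite as ai

# One client for every provider; models are addressed as "<provider>:<model>".
client = ai.Client()

messages = [
    {"role": "system", "content": "Answer in one sentence."},
    {"role": "user", "content": "What is retrieval-augmented generation?"},
]

# Switching providers only changes the model string (IDs here are examples).
for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```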

CosyVoice (Fun-CosyVoice)

2024
FunAudioLLM

CosyVoice (Fun-CosyVoice) is a multilingual, LLM-based text-to-speech (TTS) system that covers training, inference, and deployment end to end. It focuses on zero-shot voice cloning with strong content consistency, speaker similarity, and natural prosody, and supports many languages and Chinese dialects, pronunciation inpainting, text normalization, and low-latency bidirectional streaming for production use.

Tags: audio, LLM, huggingface, pytorch, ai-inference (+2 more)
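
Zero-shot cloning conditions generation on a short reference clip plus its transcript. A sketch in the style of the repo's examples; the checkpoint directory, method signature, and output keys below are assumptions to verify against the README:

```python
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load a pretrained checkpoint (local path is an assumption).
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B')

# A few seconds of reference audio and its transcript drive the cloning.
prompt_speech = load_wav('reference.wav', 16000)
for i, out in enumerate(cosyvoice.inference_zero_shot(
        'Text to speak in the cloned voice.',
        'Transcript of the reference clip.',
        prompt_speech, stream=False)):
    torchaudio.save(f'output_{i}.wav', out['tts_speech'], cosyvoice.sample_rate)
```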

Docling

2024
Deep Search Team, IBM Research Zurich +1

Docling is an open-source document parsing and understanding library designed for generative-AI workflows. It processes many formats (PDF, DOCX, PPTX, HTML, images, audio, WebVTT), offers advanced PDF layout/table/code/formula understanding, OCR and ASR support, a unified document representation, multiple export formats, local execution for sensitive data, CLI, and integrations with popular agent/LLM frameworks. It also provides an MCP server for agentic usage.

Tags: ocr, ASR, mcp-server, mcp, ai-tools (+6 more)
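
Basic conversion goes through a single DocumentConverter. A minimal sketch, assuming a local PDF (the converter also accepts URLs and other supported formats):

```python
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report.pdf")  # input format is auto-detected

# The unified document representation exports to Markdown, among other formats.
print(result.document.export_to_markdown())
```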

BitNet (bitnet.cpp)

2024
Microsoft

BitNet (bitnet.cpp) is Microsoft's open-source inference framework for 1-bit large language models (LLMs). It provides optimized kernels for fast, lossless inference of 1.58-bit models on CPU, with GPU support added later, delivering substantial speed and energy improvements on ARM and x86. It integrates with Hugging Face models, includes build/run/benchmark tools, and aims to make large low-bit models runnable locally (e.g., a 100B BitNet model on a single CPU at human reading speed).

Tags: microsoft, github, llm, ai-inference, ai-serving (+4 more)
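
After building the project, inference runs through the repo's run_inference.py script. A sketch that shells out to it from Python; the model path and flags follow the repo's documented usage but should be treated as assumptions:

```python
import subprocess

# Hypothetical local GGUF checkpoint path; see the repo for download steps.
subprocess.run([
    "python", "run_inference.py",
    "-m", "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf",
    "-p", "You are a helpful assistant.",
    "-cnv",  # conversation (chat) mode
], check=True)
```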

NexaSDK

2024
NexaAI

NexaSDK is a cross-platform developer toolkit and low-level inference engine (NexaML) for running AI models locally on NPUs, GPUs and CPUs. It supports GGUF, MLX and .nexa model formats, provides Day-0 support for new architectures, multimodal capabilities (text, vision, audio), mobile SDKs (Android/iOS), OpenAI-compatible APIs, and optimized NPU support.

Tags: github, ai-inference, ai-serving, ai-client, ai-framework (+5 more)
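
Because the local server speaks the OpenAI wire format, the stock openai client can be pointed at it. A sketch assuming a Nexa server is already running locally; the base URL, port, and model name are placeholders, not documented values:

```python
from openai import OpenAI

# Point the standard client at the local OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # placeholder; use whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize what an NPU does."}],
)
print(response.choices[0].message.content)
```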

MiniMind-V

2024
Jingyao Gong (jingyaogong)

MiniMind-V is an open-source tiny visual-language model (VLM) project that demonstrates how to train a 26M-parameter multimodal VLM from scratch quickly and cheaply (for example, about one hour on a single RTX 3090 GPU at very low rental cost). The repo provides end-to-end code for data cleaning, pretraining, supervised fine-tuning (SFT), evaluation, and a demo, using CLIP as the visual encoder and MiniMind as the base LLM.

Tags: vision, pytorch, github, llm, ai-train (+2 more)
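
The design follows the common "projector" pattern: frozen CLIP patch features are linearly projected into the LLM's embedding space and prepended as soft tokens. A schematic illustration of that idea, not MiniMind-V's actual code; all names and dimensions here are made up:

```python
import torch
import torch.nn as nn

class TinyVLMConnector(nn.Module):
    """Illustrative CLIP-to-LLM bridge; not the repo's implementation."""
    def __init__(self, clip_dim=768, llm_dim=512):
        super().__init__()
        # One linear layer maps vision features into the LLM embedding space.
        self.proj = nn.Linear(clip_dim, llm_dim)

    def forward(self, clip_patch_feats, text_embeds):
        # clip_patch_feats: (batch, n_patches, clip_dim) from a frozen CLIP encoder
        # text_embeds:      (batch, n_tokens, llm_dim) from the LLM embedding table
        vision_tokens = self.proj(clip_patch_feats)
        # Prepend vision tokens so the LLM attends to the image before the text.
        return torch.cat([vision_tokens, text_embeds], dim=1)

bridge = TinyVLMConnector()
fused = bridge(torch.randn(1, 196, 768), torch.randn(1, 32, 512))
print(fused.shape)  # torch.Size([1, 228, 512])
```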

olmOCR

2024
Allen Institute for AI (AI2), AllenNLP team

olmOCR is an open-source toolkit from the Allen Institute for AI (AI2) / AllenNLP team for converting image-based documents (PDF, PNG, JPEG) into clean, readable plain text or Markdown. It uses a 7B-parameter vision-language model to handle complex layouts, equations, tables and handwriting, removes headers/footers, and outputs text in natural reading order. The repo includes a processing pipeline, benchmark suite (olmOCR-Bench), training and RL components, Docker images, and an online demo. Licensed under Apache 2.0.

Tags: ocr, vision, llm, foundation-model, huggingface (+4 more)
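
The pipeline is typically invoked as a module over a workspace directory. A sketch that shells out from Python; the flags mirror the repo's documented usage but are assumptions to verify:

```python
import subprocess

# Convert one PDF to Markdown inside a local workspace (paths are placeholders).
subprocess.run([
    "python", "-m", "olmocr.pipeline",
    "./localworkspace",
    "--markdown",
    "--pdfs", "paper.pdf",
], check=True)
```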

Awesome-ML-SYS-Tutorial

2024
zhaochenyang20

A GitHub repository of learning notes and code dedicated to ML + SYS (machine learning systems). It collects tutorials, code walkthroughs, and engineering notes on RLHF, distributed training (FSDP, Megatron), inference and scheduling (SGLang, vLLM), quantization, CUDA/GPU optimization, system design, and practical engineering.

Tags: github, mlops, ai-train, pytorch, LLM (+6 more)

UI-TARS Desktop

2025
ByteDance

UI-TARS Desktop is a native desktop GUI agent by ByteDance that enables multimodal, vision-language-driven control of local and remote computers and browsers. It provides precise mouse/keyboard control, screenshot-based visual recognition, cross-platform support, and integration with the Agent TARS ecosystem and MCP tools. It focuses on private/local processing and building human-like task completion workflows.

Tags: bytedance, github, ai-agent, vision, LLM (+7 more)

ComfyUI-WanVideoWrapper

2025
Kijai

ComfyUI-WanVideoWrapper is a collection of custom ComfyUI nodes that wrap WanVideo (Wan2.1 and related models) and related components, enabling easier loading and use of WanVideo-based video generation models inside ComfyUI. It supports multiple model formats, memory/VRAM optimizations, and provides example workflows and compatibility patches.

Tags: github, ai-video, video, ai-tools, pytorch (+3 more)

MLX LM

2025
ml-explore (GitHub organization)

MLX LM is a Python package to run, generate with, and fine-tune large language models on Apple Silicon using MLX. It integrates with the Hugging Face Hub, supports quantization and uploading of models, low-rank and full-model fine-tuning (including for quantized models), distributed inference and training, streaming generation, sampling/custom logits processors, prompt caching, and a convenient CLI and Python API.

Tags: llm, huggingface, github, ai-library, ai-inference (+4 more)
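
The Python API is a two-step load/generate. A minimal sketch, assuming Apple Silicon and the quantized community model shown (any MLX-converted Hub model works):

```python
from mlx_lm import load, generate

# Fetch (or reuse cached) weights and tokenizer from the Hugging Face Hub.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

prompt = "Explain prompt caching in one paragraph."
text = generate(model, tokenizer, prompt=prompt, verbose=True)
```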

Chatterbox TTS

2025
Resemble AI

Chatterbox is an open-source family of state-of-the-art text-to-speech models from Resemble AI. It includes Chatterbox-Turbo (a 350M-parameter efficient model with paralinguistic tags and single-step mel decoding), Chatterbox, and a multilingual model supporting 23+ languages. Designed for low-latency voice agents, narration, and creative workflows; includes built-in PerTh watermarking and demo/Hub integrations.

Tags: audio, github, ai-tools, pytorch, huggingface
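
Generation follows the from_pretrained/generate pattern from the project's examples. A minimal sketch, assuming a CUDA-capable machine:

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

wav = model.generate("Today is the day. Let's make it count.")
ta.save("output.wav", wav, model.sr)
```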