
Explore by tags

  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ocr
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

vLLM

2023
Woosuk Kwon, Zhuohan Li +7

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs), built to deliver state-of-the-art performance on GPUs with features such as PagedAttention and continuous batching.

ai-development, ai-library, ai-inference, ai-serving
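
For a taste of the API, here is a minimal offline-inference sketch using vLLM's documented Python entry points; the model name and sampling settings are arbitrary examples:

    from vllm import LLM, SamplingParams

    # Example model; any Hugging Face causal LM supported by vLLM works.
    llm = LLM(model="facebook/opt-125m")
    params = SamplingParams(temperature=0.8, max_tokens=64)

    # vLLM batches prompts internally (continuous batching + PagedAttention).
    outputs = llm.generate(["What is PagedAttention?"], params)
    for out in outputs:
        print(out.outputs[0].text)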

KTransformers

2024
MADSys Lab, Tsinghua University, Approaching.AI +17

KTransformers is a flexible framework for experimenting with cutting-edge optimizations in LLM inference and fine-tuning, with a focus on CPU-GPU heterogeneous computing. It consists of two core modules: kt-kernel, which provides high-performance inference kernels, and kt-sft, which handles fine-tuning. The project supports a range of hardware and models such as the DeepSeek series and Kimi-K2, delivering significant resource savings and speedups: for example, it reduces the GPU memory needed for a 671B-parameter model to 70 GB and achieves up to 28x acceleration.

github, llm, ai-inference, ai-train, ai-framework +3

SGLang

2024
LMSYS Org

SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models, designed for low-latency, high-throughput inference anywhere from a single GPU to large distributed clusters. Key features include RadixAttention for prefix caching, zero-overhead batch scheduling, prefill-decode disaggregation, speculative decoding, continuous batching, paged attention, tensor/pipeline/expert/data parallelism, structured outputs, chunked prefill, and quantization (FP4/FP8/INT4/AWQ/GPTQ). It supports a wide range of models such as Llama, Qwen, and DeepSeek, and hardware from NVIDIA, AMD, and Intel as well as TPUs, with an intuitive frontend for building LLM applications.

llm, ai-serving, ai-inference, nvidia, pytorch +3
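
A minimal serving sketch, assuming SGLang's standard launch command and its OpenAI-compatible endpoint; the model path and port below are placeholders:

    # Launch the server first (shell):
    #   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
    import openai

    # SGLang exposes an OpenAI-compatible API; no real key is needed locally.
    client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": "What is RadixAttention?"}],
    )
    print(resp.choices[0].message.content)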

Ollama

2023
Jeffrey Morgan, Michael Chiang

A lightweight open-source platform for running, managing, and integrating large language models locally via a simple CLI and REST API.

ai-development, ai-library, ai-inference, ai-serving, LLM
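
A minimal sketch of calling Ollama's local REST API; it assumes the daemon is running on its default port and that the example model has already been pulled (e.g. `ollama pull llama3`):

    import requests

    # Ollama listens on localhost:11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])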

TensorFlow Serving

2016
Google

An open-source, production-ready system for serving machine-learning models at scale.

ai-development, ai-library, ai-inference, ai-serving, google
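
A minimal sketch of querying TensorFlow Serving's REST predict endpoint; the model name, mount path, and input shape are placeholders that must match the SavedModel being served:

    # Serve a SavedModel first (shell); paths are examples:
    #   docker run -p 8501:8501 \
    #     -v /tmp/my_model:/models/my_model -e MODEL_NAME=my_model tensorflow/serving
    import requests

    # "my_model" is a placeholder model name; the instance shape must
    # match the served model's input signature.
    resp = requests.post(
        "http://localhost:8501/v1/models/my_model:predict",
        json={"instances": [[1.0, 2.0, 3.0]]},
    )
    print(resp.json()["predictions"])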

TensorRT

2016
NVIDIA

NVIDIA TensorRT is an SDK and tool suite that compiles and optimizes trained neural-network models for fast, low-latency inference on NVIDIA GPUs.

ai-development, ai-library, ai-inference, ai-serving, nvidia
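
A minimal engine-build sketch using the TensorRT 8.x-style Python API (exact calls vary by TensorRT version); the ONNX input and engine output paths are placeholders:

    import tensorrt as trt

    # Parse an ONNX model into a TensorRT network definition.
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:  # placeholder path
        assert parser.parse(f.read()), parser.get_error(0)

    # Compile to a serialized engine, enabling FP16 kernels where supported.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    engine = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(engine)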

ONNX

2017
ONNX Project Contributors, Meta (Facebook) +1

ONNX (Open Neural Network Exchange) is an open ecosystem that provides an open-source format for AI models, covering both deep learning and traditional ML. It defines an extensible computation-graph model, built-in operators, and standard data types, with a focus on inference. Widely supported across frameworks and hardware, it enables interoperability and accelerates AI innovation.

ai-framework, mlops, ai-inference, ai-serving, pytorch +2
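
A minimal round-trip sketch: export a toy PyTorch module to ONNX, then run it with ONNX Runtime independently of the original framework; the tensor names are just example labels:

    import torch
    import onnxruntime as ort

    # A tiny linear layer keeps the example self-contained.
    model = torch.nn.Linear(4, 2)
    dummy = torch.randn(1, 4)
    torch.onnx.export(model, dummy, "linear.onnx",
                      input_names=["input"], output_names=["output"])

    # Run the exported graph with ONNX Runtime.
    sess = ort.InferenceSession("linear.onnx")
    out = sess.run(None, {"input": dummy.numpy()})
    print(out[0].shape)  # (1, 2)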

PaddleOCR

2019
PaddlePaddle

PaddleOCR is an industry-leading, production-ready OCR and document AI engine developed by the PaddlePaddle team. It supports over 100 languages and converts PDFs or image documents into structured AI-friendly data (e.g., JSON and Markdown), bridging the gap between images/PDFs and LLMs. Key features include multilingual support, high accuracy, handwriting recognition, and advanced document parsing for elements like tables, formulas, and charts, with end-to-end tools for training, inference, and deployment.

github, ai-tools, ai-image, vision, ai-inference +2
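
A minimal recognition sketch in the classic 2.x-style Python API (arguments and result format vary across PaddleOCR releases); the image path is a placeholder:

    from paddleocr import PaddleOCR

    # use_angle_cls enables text-direction classification; lang picks the model.
    ocr = PaddleOCR(use_angle_cls=True, lang="en")
    result = ocr.ocr("invoice.png")  # placeholder image path
    for line in result[0]:
        box, (text, score) = line  # bounding box, recognized text, confidence
        print(f"{score:.2f}  {text}")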

ExecuTorch

2022
PyTorch

ExecuTorch is PyTorch’s unified on-device AI deployment solution for mobile, embedded, and edge devices. It enables direct export from PyTorch, with ahead-of-time compilation, quantization, and hardware partitioning producing compact runtime programs (.pte) that run across many backends (XNNPACK, Vulkan, CoreML, Qualcomm, etc.). It supports LLM, vision, speech, and multimodal models with a small runtime footprint, plus production tools for profiling, memory planning, and selective operator builds.

pytorch, ai-inference, ai-serving, ai-framework, mlops +3
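
A minimal export sketch, with the caveat that the ExecuTorch export API has shifted between releases; the toy module and output path are placeholders:

    import torch
    from executorch.exir import to_edge

    # Export with torch.export, lower to ExecuTorch's Edge dialect, and
    # serialize a .pte program for the on-device runtime.
    model = torch.nn.Linear(4, 2).eval()
    exported = torch.export.export(model, (torch.randn(1, 4),))
    program = to_edge(exported).to_executorch()
    with open("linear.pte", "wb") as f:
        f.write(program.buffer)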

FunASR

2022
Alibaba DAMO Academy, Northwestern Polytechnical University (NWPU) +5

FunASR is an open-source, end-to-end automatic speech recognition (ASR) toolkit led by Alibaba DAMO Academy. It supports ASR, voice activity detection (VAD), punctuation restoration, speaker verification/diarization, multi-talker ASR, emotion recognition, and more. FunASR provides many industrial-grade pretrained models, inference scripts, and deployment runtimes for research and production use.

ASR, audio, pytorch, ai-library, huggingface +4
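
A minimal transcription sketch following the AutoModel pattern from FunASR's README; the model names and audio path are examples:

    from funasr import AutoModel

    # Paraformer ASR plus VAD and punctuation-restoration models.
    model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad",
                      punc_model="ct-punc")
    res = model.generate(input="speech.wav")  # placeholder audio path
    print(res[0]["text"])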

LocalAI

2023
Ettore Di Giacinto

LocalAI is a free, open-source alternative to OpenAI: a drop-in replacement REST API compatible with the OpenAI API specification for local AI inference. It lets you run LLMs and generate images, audio, and more, locally or on-prem on consumer-grade hardware, supporting multiple model families without requiring a GPU.

github, ai-inference, ai-serving, openai, ai-tools +4
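
A minimal sketch of the drop-in compatibility: point the standard OpenAI Python client at a running LocalAI instance (default port 8080); the model name maps to whatever model is configured locally under that name:

    import openai

    # LocalAI serves an OpenAI-compatible API; any key string works.
    client = openai.OpenAI(base_url="http://localhost:8080/v1",
                           api_key="not-needed")
    resp = client.chat.completions.create(
        model="gpt-4",  # placeholder alias for a locally configured model
        messages=[{"role": "user", "content": "Hello from LocalAI"}],
    )
    print(resp.choices[0].message.content)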

BISHENG

2023
DataElement (dataelement)

BISHENG is an open LLM application DevOps platform for building enterprise AI applications. It provides GenAI workflows, RAG, agents, unified model management, evaluation, SFT, dataset management, enterprise-grade system management, and observability. Its key strengths are a powerful visual workflow/orchestration system (with loops, parallelism, and human-in-the-loop), high-precision document parsing, and features targeted at production enterprise deployments.

mlops, llm, RAG, ai-workflow, ai-agent +5