Overview
LMDeploy integrates a patched Triton backend and a web UI, and reports 10-15× speed-ups on InternLM and other models.
Key Capabilities
- Post-training quantization (PTQ) and AWQ quantization flows
- Multi-GPU tensor and pipeline parallelism
- OpenAI-compatible FastAPI server
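As a hedged illustration of the OpenAI-compatible interface, the sketch below assembles a standard /v1/chat/completions request payload as a plain dict; the model name and message content are placeholders for illustration, not values defined by LMDeploy itself.

```python
import json

def build_chat_request(model, user_message, temperature=0.7):
    """Assemble an OpenAI-style chat-completions payload (illustrative sketch)."""
    return {
        "model": model,  # placeholder model name, not an LMDeploy-mandated value
        "messages": [
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
    }

# Serialize to the JSON body an OpenAI-compatible server would accept.
payload = build_chat_request("internlm-chat-7b", "Hello!")
body = json.dumps(payload)
```

Any OpenAI-style client can then POST this body to the server's /v1/chat/completions route.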
LMDeploy is a toolkit from the InternLM team for compressing, quantizing, and serving LLMs with INT4/INT8 kernels on GPUs.
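To make the INT8 idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in pure Python; the scheme shown (scale by the maximum absolute weight, round, clamp to [-128, 127]) is a generic textbook method, not LMDeploy's actual kernel implementation.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (generic sketch, not LMDeploy's code)."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from INT8 values and the scale."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02]
q, s = quantize_int8(w)        # integers in [-128, 127] plus one float scale
w_hat = dequantize_int8(q, s)  # close to the original weights
```

INT4 flows such as AWQ follow the same quantize/dequantize pattern with a 16-value range and per-group scales.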
ONNX (Open Neural Network Exchange) is an open ecosystem that provides an open-source format for AI models, covering both deep learning and traditional ML. It defines an extensible computation graph model, built-in operators, and standard data types, with a focus on inference. Widely supported across frameworks and hardware, it enables interoperability and accelerates AI innovation.
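To make the graph-of-operators idea concrete, the sketch below mimics an ONNX-style computation graph using plain Python dicts; real ONNX graphs are protobuf messages built with the onnx.helper API, so the field names here are only an illustrative mirror of that structure.

```python
# Illustrative only: a dict mirroring the shape of an ONNX graph
# (named tensors flowing between nodes that use built-in operators).
graph = {
    "inputs": ["X"],
    "initializers": ["W"],  # weights stored inside the model file
    "nodes": [
        {"op_type": "MatMul", "inputs": ["X", "W"], "outputs": ["H"]},
        {"op_type": "Relu",   "inputs": ["H"],      "outputs": ["Y"]},
    ],
    "outputs": ["Y"],
}

def op_sequence(g):
    """List operator types in graph order (a trivial traversal for illustration)."""
    return [node["op_type"] for node in g["nodes"]]
```

A runtime consuming such a graph dispatches each built-in operator (MatMul, Relu, ...) to a hardware-specific kernel, which is what makes the format portable across frameworks and devices.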