
LMDeploy

Toolkit from InternLM for compressing, quantizing and serving LLMs with INT4/INT8 kernels on GPUs.

Introduction

Overview

LMDeploy integrates a Patched-Triton back-end and a web UI, delivering 10-15× speed-ups on InternLM and other models.
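
As a quick illustration, offline inference runs through LMDeploy's Python pipeline API. This is a minimal sketch; the model path is only an example of a supported Hugging Face model.

    from lmdeploy import pipeline

    # Build an inference pipeline; the model path below is illustrative and
    # can be replaced with any model LMDeploy supports.
    pipe = pipeline("internlm/internlm2-chat-7b")

    # Prompts are processed as a batch; each result carries the generated text.
    responses = pipe(["Hi, please introduce yourself.",
                      "What does INT4 quantization trade off?"])
    for r in responses:
        print(r.text)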

Key Capabilities
  • PTQ & AWQ quantization flows
  • Multi-GPU tensor & pipeline parallelism
  • OpenAI-compatible FastAPI server (queried in the sketch after this list)
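
The server in the last item can be started from the CLI, e.g. lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333 (optionally with --tp for multi-GPU, and after an AWQ pass via lmdeploy lite auto_awq). The sketch below assumes such a server is already running and queries it with the standard OpenAI client; the model name and port are illustrative.

    from openai import OpenAI

    # Point the stock OpenAI client at the local LMDeploy api_server.
    # The API key is a placeholder; the default server does not check it.
    client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")

    completion = client.chat.completions.create(
        model="internlm2-chat-7b",  # name as reported by the server's /v1/models
        messages=[{"role": "user", "content": "Summarize what LMDeploy does."}],
    )
    print(completion.choices[0].message.content)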

Information

Categories