llama.cpp

GGML-based C/C++ implementation that runs LLaMA-family models locally with no dependencies.

Introduction

Overview

llama.cpp enables CPU-only inference via quantized GGUF weights and ships an OpenAI-compatible HTTP server (llama-server).
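As a sketch of the server API: the request below assumes a llama-server instance already listening on its default port 8080 (e.g. started with llama-server -m model.gguf); the endpoint and JSON shape follow the OpenAI chat-completions convention, and the prompt text is purely illustrative.

    /* Query llama-server's OpenAI-compatible /v1/chat/completions
     * endpoint with libcurl. The raw JSON response is printed to
     * stdout by libcurl's default write handler. */
    #include <curl/curl.h>
    #include <stdio.h>

    int main(void) {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl) return 1;

        const char *body =
            "{\"messages\":[{\"role\":\"user\","
            "\"content\":\"Say hello in one sentence.\"}]}";

        struct curl_slist *hdrs =
            curl_slist_append(NULL, "Content-Type: application/json");

        curl_easy_setopt(curl, CURLOPT_URL,
                         "http://localhost:8080/v1/chat/completions");
        curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
        curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

        CURLcode rc = curl_easy_perform(curl);
        if (rc != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

        curl_slist_free_all(hdrs);
        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return rc == CURLE_OK ? 0 : 1;
    }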

Key Capabilities
  • int8/int4 quantized matrix-multiplication kernels for AVX2, AVX-VNNI, and NEON
  • GPU offload via CUDA, Metal, or OpenCL (see the sketch after this list)
  • LoRA/QLoRA fine-tuning utilities
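A minimal sketch of partial GPU offload through the C API (llama.h), assuming a recent release: symbol names have drifted across versions (older releases use llama_load_model_from_file and llama_new_context_with_model), and the model path is a placeholder.

    /* Load a GGUF model with some layers offloaded to the GPU,
     * then create an inference context. Error handling trimmed. */
    #include "llama.h"
    #include <stdio.h>

    int main(void) {
        llama_backend_init();

        struct llama_model_params mp = llama_model_default_params();
        mp.n_gpu_layers = 32;  /* offload 32 transformer layers (CUDA/Metal/...) */

        struct llama_model *model =
            llama_model_load_from_file("model-q4_k_m.gguf", mp);  /* placeholder path */
        if (!model) {
            fprintf(stderr, "failed to load model\n");
            return 1;
        }

        struct llama_context_params cp = llama_context_default_params();
        cp.n_ctx = 4096;  /* context window in tokens */
        struct llama_context *ctx = llama_init_from_model(model, cp);

        /* ... tokenize the prompt, llama_decode(), sample tokens ... */

        llama_free(ctx);
        llama_model_free(model);
        llama_backend_free();
        return 0;
    }

Setting n_gpu_layers to 0 keeps inference entirely on the CPU, which is the default; quantized GGUF weights, e.g. produced with the bundled llama-quantize tool, keep the memory footprint modest either way.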

Information

  • Website: github.com
  • Authors: ggml-org
  • Published date: 2023/03/10
