Foundry Local

Foundry Local is an open-source tool from Microsoft that enables running generative AI models on local devices without an Azure subscription. It keeps processing on-device for privacy and security, exposes models through an OpenAI-compatible API, and optimizes performance using ONNX Runtime and hardware acceleration.

Introduction

Foundry Local: Local AI Model Execution Made Simple

Foundry Local is a powerful, open-source initiative by Microsoft designed to democratize access to advanced AI capabilities by allowing users to run generative AI models directly on their local hardware. Launched as part of the Azure AI ecosystem yet fully standalone, it eliminates the need for an Azure subscription, making high-quality AI inference accessible to developers, researchers, and enthusiasts without cloud dependencies. The tool is particularly valuable where data privacy is paramount, latency must be minimized, or connectivity is limited, such as edge computing, mobile applications, or secure enterprise environments.

Core Features and Capabilities

At its heart, Foundry Local focuses on on-device inference, ensuring that all data processing occurs locally to enhance privacy and security. Users can deploy and interact with a variety of pre-optimized models without worrying about data transmission to external servers. The tool supports an OpenAI-compatible API, which means existing applications built with popular SDKs (like those from OpenAI) can seamlessly integrate with local models by simply pointing to the local endpoint.
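
For instance, following the documented Python pattern (the foundry-local-sdk package plus the standard openai client), an existing app can target the local endpoint with little more than a change of base URL; the alias and prompt below are illustrative:

    import openai
    from foundry_local import FoundryLocalManager

    # Start (or attach to) the local service and resolve the alias to
    # whichever model variant fits this machine.
    alias = "phi-3.5-mini"
    manager = FoundryLocalManager(alias)

    # Point the regular OpenAI client at the local endpoint; the API key
    # is a local placeholder, since no cloud account is involved.
    client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

    response = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[{"role": "user", "content": "Why is the sky blue?"}],
    )
    print(response.choices[0].message.content)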

Performance is a key pillar of Foundry Local. It leverages ONNX Runtime, a high-performance inference engine, combined with hardware acceleration for CPUs, GPUs (e.g., NVIDIA CUDA), and specialized NPUs (e.g., Qualcomm). When you run a model, Foundry Local intelligently selects the best variant based on your hardware—downloading CUDA-optimized versions for NVIDIA GPUs or CPU-optimized ones for standard setups. This automatic optimization reduces setup friction and maximizes efficiency.
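
This resolution is visible from the Python SDK: requesting a model by its alias returns the concrete variant chosen for the current machine. A minimal sketch, assuming the SDK's documented FoundryLocalManager and get_model_info surface:

    from foundry_local import FoundryLocalManager

    # Request a model by alias; Foundry Local picks the variant that
    # best matches the local hardware (e.g. CUDA GPU vs. plain CPU).
    manager = FoundryLocalManager("phi-3.5-mini")
    info = manager.get_model_info("phi-3.5-mini")
    print(info.id)  # concrete variant id selected for this machine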

Supported Models and Flexibility

Foundry Local comes with a catalog of readily available models, including popular ones like Phi-3.5-mini and Qwen2.5-0.5B. You can explore and list models using simple CLI commands like foundry model ls. Beyond pre-compiled models, advanced users can convert their own Hugging Face models for local execution, broadening its utility for custom AI workflows.
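
The catalog is also reachable programmatically. A minimal sketch that mirrors the CLI listing, assuming the Python SDK's documented list_catalog_models method:

    from foundry_local import FoundryLocalManager

    # Starting a manager without an alias attaches to (or starts)
    # the local service without loading a model.
    manager = FoundryLocalManager()
    for model in manager.list_catalog_models():
        print(model.alias, model.id)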

The tool is versatile for various use cases:

  • Privacy-Focused Applications: Ideal for handling sensitive data in healthcare, finance, or personal assistants where cloud uploads are not feasible.
  • Low-Latency Edge Computing: Perfect for IoT devices, real-time chatbots, or on-device transcription without network delays.
  • Development and Prototyping: Enables rapid iteration on AI features before scaling to production, with no ongoing cloud costs.
  • Model Versatility: Supports chat completions, audio transcription, and more through SDKs in C#, Python, and JavaScript.

Getting Started: Installation and Usage

Installation is straightforward and platform-specific. On Windows, use Winget: winget install Microsoft.FoundryLocal. On macOS (Apple Silicon only), use Homebrew: brew install microsoft/foundrylocal/foundrylocal. Manual downloads are also available from GitHub releases (x64 and arm64 builds on Windows, Apple Silicon on macOS).

Once installed, launch your first model with a single command: foundry model run phi-3.5-mini. This auto-downloads the model if needed and starts an interactive chat session. Models are cached locally for future use, and the CLI provides commands for management, like listing (foundry model ls) or downloading specific variants.
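
The same download-and-cache workflow can also be driven from code. A short sketch, assuming the Python SDK's documented download_model, load_model, and list_cached_models methods:

    from foundry_local import FoundryLocalManager

    manager = FoundryLocalManager()
    manager.download_model("phi-3.5-mini")  # no-op if already cached
    manager.load_model("phi-3.5-mini")      # load into the local service
    print(manager.list_cached_models())     # inspect the local cache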

SDK Integration for Developers

For deeper integration, Foundry Local offers SDKs across languages:

  • C#: Available via NuGet (dotnet add package Microsoft.AI.Foundry.Local.WinML). It includes self-contained APIs for chat completions and transcription without external dependencies, and the sample code covers listing models, downloading, loading, and streaming responses.
  • Python: Install via PyPI (pip install foundry-local-sdk openai). Use the OpenAI SDK to query the local endpoint, with automatic model selection by alias.
  • JavaScript: Via npm (npm install foundry-local-sdk openai). Supports async initialization and streaming completions.

These SDKs handle service startup, model loading, and API calls internally, making it easy to embed AI into apps.
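
As a concrete example on the Python side, here is a sketch of a streaming chat completion against the local endpoint, following the documented SDK-plus-OpenAI-client pattern; the alias and prompt are illustrative:

    import openai
    from foundry_local import FoundryLocalManager

    alias = "phi-3.5-mini"
    manager = FoundryLocalManager(alias)  # handles service startup and model load
    client = openai.OpenAI(base_url=manager.endpoint, api_key=manager.api_key)

    # Stream tokens as they are generated instead of waiting for the full reply.
    stream = client.chat.completions.create(
        model=manager.get_model_info(alias).id,
        messages=[{"role": "user", "content": "What is the golden ratio?"}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)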

Management and Maintenance

Upgrading is simple: run winget upgrade Microsoft.FoundryLocal on Windows or brew upgrade foundrylocal on macOS; uninstalling uses the matching winget uninstall and brew uninstall commands. For troubleshooting, consult the official docs or GitHub issues. The project is in preview, so community feedback is encouraged via GitHub.

Why Choose Foundry Local?

In a world dominated by cloud-based AI, Foundry Local stands out by bridging the gap to local execution without sacrificing usability or performance. It's development-friendly, cost-effective (no subscriptions), and future-proof with ongoing support for new models and hardware. Whether you're building prototypes or deploying secure AI solutions, Foundry Local empowers you to harness generative AI where it matters most—right on your device.

For more, visit the documentation or join the Discord community.

Information

  • Website: github.com
  • Authors: Microsoft
  • Published date: 2024/10/01