
EXO

EXO is an open-source project that connects everyday devices into a local AI cluster for running and accelerating large models. It features automatic device discovery, topology-aware automatic parallelism, tensor sharding, MLX-based distributed inference, and day-0 support for RDMA over Thunderbolt to reduce inter-device latency.

Introduction

What EXO is

EXO is an open-source system that pools compute and memory across multiple everyday devices (phones, laptops, desktops, Mac Studios, etc.) to run larger AI models and speed up inference by splitting work across the cluster. The project is maintained by exo labs and distributed via a GitHub repository.

Key features
  • Automatic Device Discovery: Devices running EXO discover each other automatically, requiring minimal manual setup. Each device exposes a local dashboard/API (default: http://localhost:52415) for cluster interaction.
  • RDMA over Thunderbolt: EXO ships with early (day-0) support for RDMA over Thunderbolt, significantly reducing inter-device latency and improving throughput when devices are connected via high-speed Thunderbolt links.
  • Topology-Aware Auto Parallel: EXO builds a realtime view of device topology (resources, latency, bandwidth) and uses that to decide how to split models across devices for best performance.
  • Tensor Parallelism & Sharding: Supports sharding of model tensors to run parts of a model across multiple devices; reported speedups include ~1.8x on 2 devices and ~3.2x on 4 devices under tensor-parallel configurations.
  • MLX Backend & Distributed Communication: Uses MLX as an inference backend and MLX distributed components for communication and coordination between nodes.
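Since each device exposes a local API at http://localhost:52415, you can talk to the cluster programmatically. The sketch below is a minimal example of constructing such a request; the route and payload shape are assumptions (many local inference servers expose an OpenAI-compatible chat endpoint, and the model name here is a placeholder), so check the EXO docs for the actual API surface.

```python
import json
import urllib.request

EXO_API = "http://localhost:52415"  # EXO's default local dashboard/API address

# Hypothetical chat request; "llama-3.2-3b" is a placeholder model name.
payload = {
    "model": "llama-3.2-3b",
    "messages": [{"role": "user", "content": "Hello from my local cluster"}],
}

req = urllib.request.Request(
    f"{EXO_API}/v1/chat/completions",  # assumed OpenAI-style route
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request once a cluster is running.
print(req.full_url)
```

With a cluster running, the same request could be issued from any language or with `curl`; the point is that the whole cluster sits behind one local HTTP endpoint.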
Typical workflows
  • Quick start (from source): clone the repo, build the dashboard, and run the EXO daemon. An example one-liner is provided in the README to get a cluster running quickly.
  • macOS app: EXO provides a macOS background app (requires macOS Tahoe 26.2 or later) for easier local use; the app may request permissions to modify network settings and install a network profile.
Hardware & platform support
  • macOS: EXO takes advantage of the GPU on macOS devices and integrates with Apple silicon hardware (benchmarks in the README show multi-M3 Ultra Mac Studio setups).
  • Linux: currently runs on CPU on Linux; additional accelerator support is actively being worked on.
Benchmarks & real-world use

The README links to external benchmarks (for example, multi-M3 Ultra Mac Studio clusters running Qwen3-235B, DeepSeek, and other large models) that demonstrate running very large models across aggregated VRAM, with additional gains from RDMA and tensor-parallel setups.
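To put the reported tensor-parallel numbers in context, dividing each speedup by the device count gives the fraction of ideal linear scaling achieved:

```python
# Parallel-efficiency arithmetic for the speedups reported in the README:
# ~1.8x on 2 devices and ~3.2x on 4 devices under tensor parallelism.
for devices, speedup in [(2, 1.8), (4, 3.2)]:
    efficiency = speedup / devices  # fraction of ideal linear scaling
    print(f"{devices} devices: {speedup}x speedup, "
          f"{efficiency:.0%} parallel efficiency")
```

Efficiency drops as devices are added because inter-device communication grows with cluster size, which is why low-latency links like RDMA over Thunderbolt matter.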

Who it's for

EXO is aimed at developers, researchers, and enthusiasts who want to run large-model inference locally across multiple devices—especially users who prefer privacy, low-latency local deployments, or who want to leverage spare hardware rather than cloud GPUs.

Extensibility & contribution

The project is open-source (Apache-2.0) and encourages contributions. The README points to CONTRIBUTING.md for guidelines and notes that hardware accelerator support and platform improvements are on the roadmap.

Information

  • Website: github.com
  • Authors: exo-explore (GitHub), exo labs
  • Published: 2024/06/24
