NVIDIA’s model-parallel training library for GPT-like transformers at multi-billion-parameter scale.
Open-source framework for building, shipping and running containerized AI services with a single command.
Netflix’s human-centric framework for building and operating real-life data-science and ML workflows with idiomatic Python and production-grade scaling.
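This description matches Metaflow, so here is a minimal sketch of the decorator-based flow definition such a framework exposes; the flow name and field are illustrative:

```python
from metaflow import FlowSpec, step

class HelloFlow(FlowSpec):
    # Each @step is a node in the workflow DAG; attributes assigned to self
    # are persisted as versioned artifacts between steps.
    @step
    def start(self):
        self.message = "hello from the start step"
        self.next(self.end)

    @step
    def end(self):
        print(self.message)

if __name__ == "__main__":
    HelloFlow()
```

The same flow runs locally with `python hello_flow.py run` and can be pointed at remote compute backends without changing the step code.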
A Kubernetes-native workflow engine (originally developed at Lyft, now an LF AI & Data project) that provides strongly typed, versioned data and ML pipelines at scale.
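As an illustration of the strongly typed pipeline style this engine promotes, a small sketch assuming Flyte's flytekit SDK (task names and data are made up):

```python
from typing import List
from flytekit import task, workflow

@task
def clean(raw: List[int]) -> List[int]:
    # The type annotations form the typed interface the engine validates and versions.
    return [x for x in raw if x >= 0]

@task
def mean(values: List[int]) -> float:
    return sum(values) / len(values)

@workflow
def stats_pipeline(raw: List[int]) -> float:
    return mean(values=clean(raw=raw))

if __name__ == "__main__":
    # Workflows execute as plain Python locally; on a cluster each task runs in its own pod.
    print(stats_pipeline(raw=[3, -1, 4, 1, 5]))
```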
A PyTorch-based system for large-scale model-parallel training, memory optimization, and heterogeneous acceleration.
An extensible open-source MLOps framework that lets teams design portable, reproducible pipelines decoupled from infra stacks.
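A hedged sketch of the step/pipeline decorator pattern such a framework provides, assuming ZenML-style decorators; the step bodies are placeholders:

```python
from zenml import step, pipeline

@step
def load_data() -> list:
    return [0.2, 0.5, 0.9]

@step
def train(data: list) -> float:
    # Stand-in for real training; returns a toy "score".
    return sum(data) / len(data)

@pipeline
def training_pipeline():
    data = load_data()
    train(data)

if __name__ == "__main__":
    # The same pipeline definition can target a local stack or a remote
    # orchestrator by switching the active stack rather than editing the code.
    training_pipeline()
```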
An open-source observability stack from Yunshan Networks that delivers zero-code, eBPF-based tracing, metrics, and continuous profiling for cloud-native and AI workloads.
Open-source framework that provides composable building blocks to create, orchestrate and monitor LLM-powered applications and agents.
Hugging Face’s Rust + Python server for high-throughput, multi-GPU text generation.
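A small client-side sketch of how such a server is typically queried over HTTP, assuming the text-generation-inference `/generate` endpoint on a locally running instance; host, port, prompt, and sampling parameters are assumptions:

```python
import requests

# Assumes a text-generation server listening on localhost:8080.
response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Explain tensor parallelism in one sentence.",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```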
Data framework that connects large language models to private or enterprise data via indexing, retrieval, and agent orchestration.
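An illustrative retrieval-augmented query, under the assumption that this refers to LlamaIndex; the directory path and question are placeholders, and a configured LLM/embedding backend is assumed:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load private documents, build a vector index over them, and query it
# so the model answers from the indexed data rather than from memory alone.
documents = SimpleDirectoryReader("./company_docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What did the Q3 report say about churn?"))
```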
Memory layer that lets AI agents remember users and context across sessions.
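The entry does not name the project, so the following is a deliberately hypothetical sketch of what a per-user memory layer usually looks like; every class and method name is invented for illustration:

```python
# Hypothetical memory-layer API; names are illustrative, not a real SDK.
class MemoryStore:
    def __init__(self):
        self._facts = {}

    def add(self, user_id: str, fact: str) -> None:
        # Persist a fact about the user so later sessions can recall it.
        self._facts.setdefault(user_id, []).append(fact)

    def search(self, user_id: str, query: str) -> list:
        # Naive keyword match standing in for semantic retrieval.
        terms = query.lower().split()
        return [f for f in self._facts.get(user_id, [])
                if any(t in f.lower() for t in terms)]

memory = MemoryStore()
memory.add("alice", "Alice prefers vegetarian recipes.")
print(memory.search("alice", "What recipes does the user prefer?"))
```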
Open-source community and framework researching the scaling laws of multi-agent systems.