tinygrad — a minimal, hackable deep learning stack
tinygrad is an intentionally small, end-to-end deep learning framework that implements the core pieces you need to build and understand modern neural networks. It provides a familiar tensor API with autograd, an intermediate representation (IR) and simple compiler for kernel fusion and lowering, a TinyJit-style JIT and graph execution system, and basic neural network building blocks (layers, optimizers, datasets) so you can run real training loops.
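As a quick taste of the Tensor API with autograd, here is a minimal sketch (in the style of the examples in the tinygrad docs; .numpy() assumes NumPy is installed) that builds a small graph and backpropagates through it:

# A minimal autograd sketch: run a matmul, reduce to a scalar, then backpropagate.
from tinygrad import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy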
Key features
- Tensor library with autograd: eager Tensor API with gradients and common ops, with ergonomics modeled on PyTorch.
- IR and compiler: multiple lowering passes, kernel fusion, and a small scheduler that makes generated kernels visible and editable — great for learning how computation is lowered to device code.
- JIT / graph execution: function-level JIT that captures and replays kernels, enabling laziness and fusion in many cases (see the sketch after this list).
- NN / optim / datasets: simple layers, optimizers (e.g., Adam), and utilities to run training loops and small experiments.
- Multi-accelerator support: out-of-the-box support for a wide range of backends (CPU, OpenCL, Metal, CUDA, AMD, NV, QCOM, WebGPU), and it is straightforward to add new backends because the required primitive set is small.
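As mentioned in the JIT bullet above, a jitted function is captured on its first calls and then replayed. The sketch below shows the typical decorator usage, assuming the top-level TinyJit export and inputs that keep the same shape across calls:

# A minimal TinyJit sketch: the decorated function is captured and replayed as fixed kernels.
from tinygrad import Tensor, TinyJit

@TinyJit
def step(x: Tensor) -> Tensor:
  # realize() forces the output so the captured kernels actually run on each call
  return (x @ x.transpose()).relu().sum().realize()

for i in range(5):
  out = step(Tensor.rand(32, 32))  # same input shapes each call, since the JIT replays fixed kernels
  print(i, out.item())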
Design goals and tradeoffs
tinygrad is explicitly not trying to be a full-featured industrial framework. Its goals are:
- Education and transparency: keep the compiler, IR, and runtime small and readable so users can inspect and learn from them.
- Hackability: encourage experimentation — the entire pipeline is visible and easy to modify.
- Practicality: include enough functionality (autograd, optimizers, dataloaders) to run real experiments and small training jobs.
Because of these goals, tinygrad omits some large-scale features present in major frameworks (e.g., full vmap/pmap from JAX, exhaustive distributed training stacks) in favor of simplicity and clarity.
Typical usage
You can write training loops similar to PyTorch's. Example (simplified):
from tinygrad import Tensor, nn

class LinearNet:
  def __init__(self):
    self.l1 = Tensor.kaiming_uniform(784, 128)
    self.l2 = Tensor.kaiming_uniform(128, 10)
  def __call__(self, x:Tensor) -> Tensor:
    return x.flatten(1).dot(self.l1).relu().dot(self.l2)

model = LinearNet()
optim = nn.optim.Adam([model.l1, model.l2], lr=0.001)
x, y = Tensor.rand(4, 1, 28, 28), Tensor([2,4,3,7])

with Tensor.train():
  for i in range(10):
    optim.zero_grad()
    loss = model(x).sparse_categorical_crossentropy(y).backward()
    optim.step()
    print(i, loss.item())

You can also enable debug output to inspect generated kernels and the lowering process.
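For example, tinygrad reads a DEBUG environment variable; the sketch below (the specific level number is an assumption, see the docs for exact meanings) raises it before tinygrad is imported so kernel-level output is printed while the script runs. Setting it on the command line works the same way:

# A minimal sketch: raise DEBUG before importing tinygrad so that kernel timings
# (and, at higher levels, generated device code) are printed during execution.
import os
os.environ["DEBUG"] = "4"  # assumed level; lower values print less detail

from tinygrad import Tensor

a, b = Tensor.rand(64, 64), Tensor.rand(64, 64)
(a @ b).realize()  # realize() forces execution, so the generated kernel is shown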
Accelerators
tinygrad implements backends for many devices and exposes a small set of low-level ops that a device backend must implement. This makes it easy to add or experiment with new accelerators. Supported backends include (but are not limited to): CPU, OpenCL, Metal, CUDA, AMD, NV, QCOM, and WebGPU.
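In practice, picking a backend is a one-liner. The sketch below assumes the documented Device.DEFAULT attribute and the device= keyword on tensor constructors; the "CPU" string is just an example of a backend name available on most machines:

# A minimal sketch of backend selection.
from tinygrad import Tensor, Device

print(Device.DEFAULT)                 # the backend tinygrad chose for this machine
t = Tensor.ones(4, 4, device="CPU")   # place a tensor on a named backend explicitly
print(t.device, t.numpy())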
Installation and development
The recommended installation path is from source:
git clone https://github.com/tinygrad/tinygrad.git
cd tinygrad
python3 -m pip install -e .

Or install directly from the repository:

python3 -m pip install git+https://github.com/tinygrad/tinygrad.git

The project includes tests and CI; contributors are encouraged to add tests for bug fixes or features. There are contribution guidelines in the repository to help PRs get reviewed.
Who should use tinygrad
- Learners and researchers who want a compact, readable implementation of a deep learning stack and compiler.
- Developers who want to prototype novel compiler/IR or backend ideas without the complexity of large systems.
- Engineers looking to experiment with accelerator backends or to understand how high-level tensor ops map to device kernels.
Relationship to other projects
- PyTorch: similar eager Tensor API and training loop ergonomics, but tinygrad keeps the compiler/runtime visible and minimal.
- JAX: tinygrad adopts IR-based autodiff and a simple function-level JIT, but it has fewer functional transforms (no full vmap/pmap yet).
- TVM: shares ideas around lowering, scheduling and kernel generation, but tinygrad also ships the front-end tensor/nn API, making it a compact end-to-end system.
Where to learn more
- Repository: https://github.com/tinygrad/tinygrad
- Documentation and quickstart: https://docs.tinygrad.org/
- Project/organization site: https://tinygrad.org/
Overall, tinygrad is a compact, practical codebase for understanding and experimenting with the core pieces of modern deep learning systems: tensors, autograd, IR, JIT, and code generation.
