AIAny - Domain Arithmetic: One-Shot VLA Adaptation under Environmental Shifts

Introduction

Vision-language-action (VLA) policies often fail when the deployment environment differs from the pretraining environment: a new camera viewpoint or a similar but not identical robot can break a previously working skill. Collecting multiple demonstrations per task for each target environment is expensive. The core insight of DART, the method summarized here, is that domain shifts can be treated as additive directions in weight space: by extracting a domain-specific weight vector from a single demonstration and adding it (after denoising) to the pretrained policy, you can transfer behavior without gradient-based fine-tuning.
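To make the "additive direction in weight space" idea concrete, here is a minimal sketch of generic weight arithmetic on a toy checkpoint. It is not DART's implementation: this summary does not say how the domain vector is extracted from the demonstration, so the "adapted" weights below are a random stand-in, and the names `domain_vector`, `apply_domain_vector`, `scale`, and `proj.weight` are hypothetical.

```python
import numpy as np

def domain_vector(theta_shifted, theta_base):
    """Per-parameter difference between two checkpoints that share the same keys."""
    return {k: theta_shifted[k] - theta_base[k] for k in theta_base}

def apply_domain_vector(theta_base, tau, scale=1.0):
    """Return new weights theta_base + scale * tau; theta_base itself is not modified."""
    return {k: theta_base[k] + scale * tau[k] for k in theta_base}

rng = np.random.default_rng(0)

# Toy "pretrained" checkpoint with one weight matrix (hypothetical layer name).
theta_pre = {"proj.weight": rng.normal(size=(4, 3))}

# Stand-in for weights that encode the target domain. ASSUMPTION: how DART
# derives these from a single demonstration is not described in this summary.
theta_shifted = {"proj.weight": theta_pre["proj.weight"] + 0.1 * rng.normal(size=(4, 3))}

tau = domain_vector(theta_shifted, theta_pre)
theta_new = apply_domain_vector(theta_pre, tau, scale=1.0)
theta_same = apply_domain_vector(theta_pre, tau, scale=0.0)
```

With `scale=1.0` the result reproduces the shifted weights, and with `scale=0.0` it leaves the pretrained weights unchanged; intermediate values interpolate along the domain direction.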

Key Findings

Analogy-based weight arithmetic: DART constructs an additive domain vector from one demonstration and applies it to the frozen pretrained VLA weights, enabling immediate task adaptation without gradient updates to the policy. This means adaptation is fast and requires minimal data.
Subspace alignment for denoising: to avoid adding task-irrelevant noise, DART aligns singular components between source and target weight vectors and removes components that are not consistent across domains. The result is more stable and more targeted adaptation than naive weight addition.
Broad shift coverage: evaluated on simulated and real-robot benchmarks, DART consistently outperforms prior one-shot and few-shot VLA adaptation baselines across camera-pose changes and cross-embodiment transfers, demonstrating practical gains when collecting many demos is infeasible.
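The subspace-alignment finding can be illustrated with a small SVD-based filter. This is a sketch under stated assumptions, not DART's actual procedure: it keeps only those singular components of a target weight delta whose left singular vectors lie mostly inside the top-`rank` subspace of a source weight delta. The function name, the `rank` and `min_overlap` parameters, and the overlap criterion are all hypothetical choices made for illustration.

```python
import numpy as np

def denoise_by_shared_subspace(delta_src, delta_tgt, rank, min_overlap=0.5):
    """Keep the singular components of delta_tgt that align with delta_src.

    ASSUMPTION: "alignment" is measured as the squared norm of each target
    left singular vector projected onto the top-`rank` left singular
    subspace of the source delta (a value between 0 and 1).
    """
    U_s, _, _ = np.linalg.svd(delta_src, full_matrices=False)
    U_t, S_t, Vt_t = np.linalg.svd(delta_tgt, full_matrices=False)

    basis = U_s[:, :rank]                           # shape (out_dim, rank)
    overlap = np.sum((basis.T @ U_t) ** 2, axis=0)  # one score per target component
    keep = overlap >= min_overlap

    # Rebuild the delta from the retained components only.
    denoised = (U_t[:, keep] * S_t[keep]) @ Vt_t[keep, :]
    return denoised, keep

# Toy example: the source delta varies only along output direction e1.
e1, e2 = np.eye(4)[0], np.eye(4)[1]
delta_src = 3.0 * np.outer(e1, np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0))

# The target delta has a shared part (along e1) and an inconsistent part (along e2).
shared = 2.0 * np.outer(e1, np.eye(3)[0])
inconsistent = 0.5 * np.outer(e2, np.eye(3)[1])
delta_tgt = shared + inconsistent

denoised, keep = denoise_by_shared_subspace(delta_src, delta_tgt, rank=1)
```

In this toy case the filter retains the component shared with the source delta and discards the one along `e2`. A real policy would need the filter applied per weight matrix, and the threshold would need tuning.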

Who it's for and trade-offs

Great fit if you need rapid adaptation of a pretrained VLA policy to a new camera configuration or a similar robot with only one demonstration and you can access model weights. DART reduces data-collection and compute needs compared to fine-tuning or online adaptation. Look elsewhere if target tasks require fundamentally different action distributions, if you cannot access or modify model weights, or if extreme domain gaps make linear additive assumptions invalid.

Where it fits

DART sits between full fine-tuning / adapter-based adaptation and zero-adaptation baselines: it preserves the pretrained policy by injecting a compact, denoised domain correction in weight space. Compared to LoRA/meta-generated adapters or online self-distillation, DART emphasizes single-demo, analogy-style transfer with lightweight arithmetic operations rather than iterative optimization.
