Trains LLMs with reinforcement learning using a surface chrF reward so models learn to extract and apply linguistic signals from rich context for translating completely unseen languages. Demonstrates better zero-shot translation than in-context learning or supervised fine-tuning, framing outcome-based RL as a meta-skill for language learning from context.