This paper demonstrates the zero-shot learning and reasoning abilities of the generative video model Veo 3, paralleling the evolution of Large Language Models (LLMs) in natural language processing. Veo 3 excels in diverse visual tasks without explicit training, such as object segmentation, edge detection, image editing, understanding physical properties, recognizing affordances, and simulating tool use, enabling early visual reasoning like maze solving and symmetry detection.