This paper introduces DeepSeek-R1, a large language model that develops reasoning ability primarily through reinforcement learning (RL); its precursor, DeepSeek-R1-Zero, is trained with RL alone, without any supervised fine-tuning. The work shows that reasoning behaviors such as chain-of-thought, self-reflection, and verification can emerge naturally from RL, yielding performance comparable to OpenAI's o1 series. The authors' distilled smaller models outperform many open-source alternatives, bringing advanced reasoning capabilities to smaller systems. The paper's impact is twofold: it demonstrates that RL-driven reasoning is viable, and it open-sources both the large and the distilled models, opening new directions for scalable, cost-effective training of reasoning-focused LLMs.
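A key ingredient of the RL setup summarized above is that it can rely on simple rule-based rewards rather than a learned reward model. As a minimal sketch (not the paper's actual implementation; the function name, tag format, and weights here are hypothetical assumptions), such a reward might combine a format check on the reasoning tags with an exact-match accuracy check:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: format bonus plus accuracy bonus.

    Hypothetical sketch; DeepSeek-R1's actual reward rules are more
    involved (e.g. math verifiers, compiler/test-case checks for code).
    """
    score = 0.0
    # Format reward: reasoning must be wrapped in <think>...</think>,
    # followed by the final answer in <answer>...</answer>.
    m = re.fullmatch(
        r"<think>.*?</think>\s*<answer>(.*?)</answer>",
        completion,
        flags=re.DOTALL,
    )
    if m:
        score += 0.5  # hypothetical format weight
        # Accuracy reward: exact match against the reference answer.
        if m.group(1).strip() == reference_answer.strip():
            score += 1.0  # hypothetical accuracy weight
    return score
```

Because such rewards are cheap to compute and hard to game compared with a neural reward model, they scale well across the millions of rollouts that RL training requires.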