DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

A family of open code models (1.3B-33B) trained from scratch on 2T tokens of project-level code, using a 16K-window fill-in-the-blank objective. Beats Codex and GPT-3.5 on code benchmarks and ships under a license permitting commercial use.


Introduction

In early 2024, the strongest code LLMs were closed (Codex, GPT-3.5), and open models trailed them by a wide margin. DeepSeek-Coder's bet is that data organization matters as much as scale: instead of treating code as isolated files, it builds a corpus at the repository level, preserving cross-file dependencies that single-file training silently discards.

Key Findings

Project-level corpus construction lets the model reason across files in a repository, not just within a single snippet — the gap most code benchmarks fail to capture but real engineering depends on.
A fill-in-the-blank objective (fill-in-the-middle, FIM) with a 16K context window targets infilling and completion directly, which matters more for IDE-style use than left-to-right generation alone.
Trained from scratch on 2 trillion tokens, the 33B model surpasses open peers and closed models like Codex and GPT-3.5 across multiple benchmarks, evidence that a focused, code-first pretraining recipe can match or exceed those closed baselines.
The permissive license allows unrestricted commercial use, removing the usual barrier for teams that cannot adopt research-only weights.
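
The infilling objective above can be illustrated with a minimal sketch of how one training example might be rearranged. The sentinel strings, the function names, and the `fim_rate` parameter are assumptions made for illustration; this summary does not specify the model's actual special tokens or mixing ratio.

```python
import random

# Placeholder sentinel strings (hypothetical; a real tokenizer defines
# its own special tokens for infilling).
FIM_PREFIX = "<FIM_PREFIX>"
FIM_SUFFIX = "<FIM_SUFFIX>"
FIM_MIDDLE = "<FIM_MIDDLE>"


def make_fim_example(text: str, rng: random.Random) -> str:
    """Cut text at two random points and emit prefix, suffix, then middle."""
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    prefix, middle, suffix = text[:i], text[i:j], text[j:]
    # The model sees both sides of the hole first, then learns to
    # generate the missing middle span left to right.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"


def make_training_example(text: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Mix objectives: rearrange with probability fim_rate, else keep plain text."""
    if rng.random() < fim_rate:
        return make_fim_example(text, rng)
    return text  # ordinary next-token example
```

Because the rearranged string is still trained with next-token prediction, the same model can serve both plain completion and infilling; only the data layout changes.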

Methodology

The approach combines repository-level data assembly (so the model sees realistic dependency structure) with a next-token plus FIM training mix, then scales the same recipe across the 1.3B-to-33B range. This makes the family a study in how far careful corpus design and objective choice can carry a code model, rather than relying on parameter count alone.
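
One plausible way to realize repository-level assembly is to order files so that each file appears after the files it depends on, then concatenate them into a single training sample. The sketch below assumes the dependency map is already extracted; the function names, the `# file:` header format, and the use of a topological sort are illustrative assumptions, not details confirmed by this summary.

```python
from graphlib import TopologicalSorter


def order_repo_files(deps: dict[str, set[str]]) -> list[str]:
    """Return files in dependencies-first order.

    deps maps each file to the set of files it imports (hypothetical input).
    """
    return list(TopologicalSorter(deps).static_order())


def build_repo_sample(files: dict[str, str], deps: dict[str, set[str]]) -> str:
    """Concatenate file contents so definitions precede their uses."""
    ordered = order_repo_files(deps)
    parts = [f"# file: {path}\n{files[path]}" for path in ordered if path in files]
    return "\n".join(parts)


# Usage: a three-file chain, main.py -> model.py -> utils.py
files = {
    "utils.py": "def helper(): ...",
    "model.py": "import utils",
    "main.py": "import model",
}
deps = {"main.py": {"model.py"}, "model.py": {"utils.py"}, "utils.py": set()}
print(build_repo_sample(files, deps))
```

This sketch does not handle circular imports: `graphlib` raises `CycleError` on a cyclic graph, so a real pipeline would need a policy for breaking cycles.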

Who It's For

Great fit if you are evaluating self-hostable code models for completion or infilling, want commercially usable weights, or care about cross-file reasoning over toy single-function benchmarks. Look elsewhere if you need a general-purpose chat assistant — these are code-specialized base and instruct models, and later DeepSeek releases (V2/V3) supersede them on raw capability.


Information

Website: arxiv.org
Organizations: DeepSeek-AI
Published date: 2024/01/25
