This paper introduces DeepSeek-R1, a large language model that develops reasoning ability primarily through reinforcement learning (RL); its precursor, DeepSeek-R1-Zero, is trained with RL alone, without any supervised fine-tuning. The work shows that reasoning behaviors such as chain-of-thought, self-reflection, and verification can emerge naturally from RL, yielding performance comparable to OpenAI's o1 series. The authors' distilled smaller models outperform many open-source alternatives, bringing advanced reasoning capabilities to smaller systems. The paper's impact is twofold: it demonstrates that RL-driven reasoning is viable, and it open-sources both the large and the distilled models, opening new directions for scalable, cost-effective training of reasoning-focused LLMs.
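A key ingredient of the RL setup summarized above is that it can rely on simple rule-based rewards rather than a learned reward model. As a minimal sketch (not the paper's actual implementation; the function name, tag format, and weights here are hypothetical assumptions), such a reward might combine a format check on the reasoning tags with an exact-match accuracy check:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward: format bonus plus accuracy bonus.

    Hypothetical sketch; DeepSeek-R1's actual reward rules are more
    involved (e.g. math verifiers, compiler/test-case checks for code).
    """
    score = 0.0
    # Format reward: reasoning must be wrapped in <think>...</think>,
    # followed by the final answer in <answer>...</answer>.
    m = re.fullmatch(
        r"<think>.*?</think>\s*<answer>(.*?)</answer>",
        completion,
        flags=re.DOTALL,
    )
    if m:
        score += 0.5  # hypothetical format weight
        # Accuracy reward: exact match against the reference answer.
        if m.group(1).strip() == reference_answer.strip():
            score += 1.0  # hypothetical accuracy weight
    return score
```

Because such rewards are cheap to compute and hard to game compared with a neural reward model, they scale well across the millions of rollouts that RL training requires.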