LogoAIAny

Tag

Explore by tags

  • All
  • 30u30
  • ASR
  • ChatGPT
  • GNN
  • IDE
  • RAG
  • ai-agent
  • ai-api
  • ai-api-management
  • ai-client
  • ai-coding
  • ai-demos
  • ai-development
  • ai-framework
  • ai-image
  • ai-image-demos
  • ai-inference
  • ai-leaderboard
  • ai-library
  • ai-rank
  • ai-serving
  • ai-tools
  • ai-train
  • ai-video
  • ai-workflow
  • AIGC
  • alibaba
  • amazon
  • anthropic
  • audio
  • blog
  • book
  • bytedance
  • chatbot
  • chemistry
  • claude
  • course
  • deepmind
  • deepseek
  • engineering
  • foundation
  • foundation-model
  • gemini
  • github
  • google
  • gradient-boosting
  • grok
  • huggingface
  • LLM
  • llm
  • math
  • mcp
  • mcp-client
  • mcp-server
  • meta-ai
  • microsoft
  • mlops
  • NLP
  • nvidia
  • ollama
  • openai
  • paper
  • physics
  • plugin
  • pytorch
  • RL
  • science
  • sora
  • translation
  • tutorial
  • vibe-coding
  • video
  • vision
  • xAI
  • xai

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2024
DeepSeek-AI, Aixin Liu +155

This paper presents DeepSeek-V2, a 236B-parameter open-source Mixture-of-Experts (MoE) language model that activates only 21B parameters per token. It achieves top-tier bilingual (English and Chinese) performance while, relative to its dense predecessor DeepSeek 67B, cutting training costs by 42.5% and raising maximum generation throughput 5.76×. Its two key innovations, Multi-head Latent Attention (MLA) and DeepSeekMoE, reduce the attention memory bottleneck and improve expert specialization. The paper’s impact lies in advancing economical, efficient large-scale language modeling, pushing open-source models closer to closed-source leaders, and paving the way for future multimodal and AGI-aligned systems. A toy sketch of sparse top-k expert routing follows this entry.

LLM · NLP · deepseek · paper
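
The efficiency figures above come from sparse activation: each token is routed to only a few experts, so most parameters are untouched on any given forward pass. Below is a minimal PyTorch sketch of generic top-k expert routing with made-up toy sizes (8 experts, k=2, d_model=64); it illustrates the sparse-MoE idea only and is not DeepSeek-V2’s actual MLA or DeepSeekMoE implementation.

```python
# Toy top-k Mixture-of-Experts layer: each token uses only k of n_experts MLPs,
# so the number of *active* parameters per token is a small fraction of the total.
# Sizes are illustrative, not DeepSeek-V2's configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                     # x: (tokens, d_model)
        scores = self.router(x)                               # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)   # keep k experts per token
        weights = F.softmax(topk_scores, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

x = torch.randn(16, 64)
print(ToyMoELayer()(x).shape)   # torch.Size([16, 64]); only 2 of 8 experts ran per token
```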

DeepSeek-V3 Technical Report

2024
DeepSeek-AI, Aixin Liu +198

This paper introduces DeepSeek-V3, a 671B-parameter Mixture-of-Experts (MoE) language model that activates only 37B parameters per token for efficient training and inference. By leveraging innovations such as Multi-head Latent Attention, auxiliary-loss-free load balancing, and multi-token prediction, it achieves top-tier performance across math, code, multilingual, and reasoning tasks. Despite its massive scale, DeepSeek-V3 maintains economical training costs and outperforms all other open-source models, achieving results comparable to leading closed-source models like GPT-4o and Claude-3.5, thereby significantly narrowing the open-source vs. closed-source performance gap. A toy sketch of a multi-token-prediction objective follows this entry.

NLP · LLM · deepseek · paper
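
One of the innovations mentioned above, multi-token prediction, trains the model to predict more than just the immediately next token. The sketch below shows a heavily simplified version of that objective: a second head that predicts the token two steps ahead, with a GRU standing in for the transformer backbone. The module names and the aux_weight value are assumptions for illustration, not DeepSeek-V3’s actual MTP module.

```python
# Minimal sketch of a multi-token prediction objective: alongside the usual
# next-token head, an auxiliary head predicts the token two steps ahead.
# Illustrative simplification, not DeepSeek-V3's exact MTP design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMTPModel(nn.Module):
    def __init__(self, vocab=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)  # stand-in backbone
        self.head_next = nn.Linear(d_model, vocab)    # predicts token t+1
        self.head_next2 = nn.Linear(d_model, vocab)   # auxiliary head: predicts token t+2

    def forward(self, tokens):                        # tokens: (batch, seq)
        h, _ = self.backbone(self.embed(tokens))      # (batch, seq, d_model)
        return self.head_next(h), self.head_next2(h)

def mtp_loss(model, tokens, aux_weight=0.3):
    logits1, logits2 = model(tokens)
    # standard loss: position t predicts token t+1
    loss1 = F.cross_entropy(logits1[:, :-1].reshape(-1, logits1.size(-1)),
                            tokens[:, 1:].reshape(-1))
    # auxiliary loss: position t also predicts token t+2
    loss2 = F.cross_entropy(logits2[:, :-2].reshape(-1, logits2.size(-1)),
                            tokens[:, 2:].reshape(-1))
    return loss1 + aux_weight * loss2

tokens = torch.randint(0, 1000, (4, 32))
print(mtp_loss(ToyMTPModel(), tokens).item())
```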

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

2025
DeepSeek-AI, Daya Guo +198

This paper introduces DeepSeek-R1, a large language model whose reasoning ability is improved primarily through reinforcement learning (RL); its DeepSeek-R1-Zero variant skips supervised fine-tuning entirely. It shows that reasoning behaviors such as chain-of-thought, self-reflection, and verification can emerge naturally from RL, achieving performance comparable to OpenAI’s o1-series models. Its distilled smaller models outperform many open-source alternatives, bringing advanced reasoning to smaller systems. The work demonstrates that RL-driven reasoning is viable and open-sources both the large and distilled models, opening new directions for scalable, cost-effective training of reasoning-focused LLMs. A toy sketch of a rule-based reasoning reward follows this entry.

NLP · LLM · deepseek · paper
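
The RL signal described above can be driven by simple rule-based rewards rather than a learned reward model: check whether the model exposes its reasoning in the expected format and whether its final answer is correct. The sketch below is a hypothetical minimal version of such a reward function; the `<think>` tag convention and the 0.5/1.0 weights are assumptions for illustration, not the paper’s exact reward specification.

```python
# Minimal rule-based reward for reasoning-style RL fine-tuning:
# reward a completion for (a) wrapping its reasoning in <think> tags and
# (b) ending with the correct final answer. Weights are illustrative.
import re

def reasoning_reward(completion: str, reference_answer: str) -> float:
    reward = 0.0
    # format reward: reasoning should appear inside <think>...</think>
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.5
    # accuracy reward: text after the reasoning block must match the reference
    final_answer = completion.split("</think>")[-1].strip()
    if final_answer == reference_answer.strip():
        reward += 1.0
    return reward

sample = "<think>21 * 2 = 42</think> 42"
print(reasoning_reward(sample, "42"))   # 1.5 -> correct format and correct answer
```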