GPT-2: Language Models are Unsupervised Multitask Learners

This paper introduces GPT-2 and shows that a large language model trained on diverse internet text can perform a wide range of natural language tasks in a zero-shot setting, that is, without any task-specific training. Scaled up to 1.5 billion parameters and trained on WebText, GPT-2 achieves state-of-the-art or competitive results on tasks such as language modeling, reading comprehension, and question answering. The paper helped establish the trend toward general-purpose, unsupervised language models and paved the way for today's foundation models.

Introduction

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets. We demonstrate that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText. When conditioned on a document plus questions, the answers generated by the language model reach 55 F1 on the CoQA dataset, matching or exceeding the performance of 3 out of 4 baseline systems without using the 127,000+ training examples. The capacity of the language model is essential to the success of zero-shot task transfer, and increasing it improves performance in a log-linear fashion across tasks. Our largest model, GPT-2, is a 1.5B parameter Transformer that achieves state-of-the-art results on 7 out of 8 tested language modeling datasets in a zero-shot setting but still underfits WebText. Samples from the model reflect these improvements and contain coherent paragraphs of text. These findings suggest a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.
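To make the zero-shot reading-comprehension setup more concrete, here is a minimal Python sketch of two pieces it implies: assembling a prompt from a document plus questions, and scoring a generated answer with token-overlap F1. Both functions are hypothetical illustrations. The "Q:"/"A:" labels are an assumed prompt format, and the F1 function uses a simplified normalization (lowercasing and whitespace splitting) rather than the official CoQA evaluation script.

```python
from collections import Counter


def build_zero_shot_prompt(document, history, question):
    """Concatenate a document, earlier Q/A turns, and a new question.

    The "Q:"/"A:" labels are an illustrative assumption, not a format
    taken from the abstract above.
    """
    lines = [document.strip(), ""]
    for past_question, past_answer in history:
        lines.append(f"Q: {past_question}")
        lines.append(f"A: {past_answer}")
    lines.append(f"Q: {question}")
    lines.append("A:")  # a language model would be asked to continue from here
    return "\n".join(lines)


def token_f1(prediction, reference):
    """Token-overlap F1 between two strings (simplified normalization)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

No task-specific training happens anywhere in this sketch: the task is expressed entirely through the text the model is conditioned on, and the metric only compares the generated continuation with a reference answer.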

Information

Website: cdn.openai.com
Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Published date: 2019/02/14

More Items

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

2025

DeepSeek-AI, Aixin Liu +262

DeepSeek-V3.2 is an open large language model that balances high computational efficiency with superior reasoning and agent capabilities. Key innovations include DeepSeek Sparse Attention (DSA) for reduced complexity in long contexts, a scalable reinforcement learning framework achieving GPT-5-level performance, and a large-scale agentic task synthesis pipeline for improved generalization in tool-use scenarios.

deepseek LLM paper RL ai-agent

LightRAG

2024

Zirui Guo, Lianghao Xia +3

LightRAG is an open-source framework designed for simple and fast Retrieval-Augmented Generation (RAG), integrating knowledge graphs, vector search, and efficient LLM-based processing to enhance question-answering over large document collections.

RAG LLM NLP github ai-development+5

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

2024

John Yang, Carlos E. Jimenez +5

SWE-agent is a system designed to empower language model (LM) agents to autonomously perform software engineering tasks. It features a custom agent-computer interface (ACI) that enhances the agent's ability to navigate repositories, create and edit code, and execute programs, achieving state-of-the-art results on the SWE-bench and HumanEvalFix benchmarks.

paper ai-agent LLM ai-coding engineering