
MiniMind

MiniMind is an open-source GitHub project that lets users train a 26M-parameter tiny LLM from scratch in roughly 2 hours for about 3 RMB in compute cost. It provides native PyTorch implementations of tokenizer training, pretraining, supervised fine-tuning (SFT), LoRA, DPO, and PPO/GRPO reinforcement learning, along with an MoE architecture and a vision multimodal extension. It ships high-quality open datasets, supports single-GPU training, and is compatible with Transformers, llama.cpp, and other frameworks, making it ideal for LLM beginners.

Introduction

MiniMind: Training a Tiny LLM from Scratch

MiniMind is a comprehensive open-source project hosted on GitHub, designed to democratize the training of large language models (LLMs) by letting users build a functional 26M-parameter GPT-like model from scratch with minimal resources: roughly 2 hours of training on a single NVIDIA 3090 GPU and approximately 3 RMB in cloud computing cost. Launched by developer Jingyao Gong, the project addresses how inaccessible LLM development is for individuals without massive computational resources or proprietary frameworks. By stripping away complex abstractions and providing raw, educational code, MiniMind serves as both a practical toolkit and a tutorial on the inner workings of LLMs.

Core Features and Architecture

At its heart, MiniMind implements a lightweight Transformer-based decoder-only architecture inspired by models like GPT-3, Llama 3.1, and DeepSeek-V2. Key highlights include:

  • Model Variants: The project offers multiple configurations, such as MiniMind2-Small (26M parameters, 512 dimensions, 8 layers), MiniMind2 (104M parameters, 768 dimensions, 16 layers), and MiniMind2-MoE (145M parameters with Mixture-of-Experts for efficiency). These use RMSNorm pre-normalization, SwiGLU activation, and Rotary Position Embeddings (RoPE) for better long-sequence handling; a minimal sketch of such a decoder block appears after this list.

  • Tokenizer: A custom 6,400-vocabulary tokenizer is trained from scratch, balancing compression efficiency with model size. It avoids reliance on large vocabularies from models like Qwen or Llama, keeping the embedding layer lightweight.

  • Training Pipeline: Full end-to-end support covers all LLM training stages:

    • Pretraining: Unsupervised learning on ~1.6GB of high-quality Chinese corpus (e.g., from Jiangshu datasets) to learn language patterns.
    • Supervised Fine-Tuning (SFT): Instruction tuning on cleaned dialogue datasets (~1.2GB to 9GB) for chat capabilities.
    • Parameter-Efficient Methods: LoRA for domain adaptation (e.g., medical or self-awareness fine-tuning) without full retraining.
    • Alignment Techniques: Direct Preference Optimization (DPO) on preference pairs for human-feedback alignment, plus RLAIF-style methods such as PPO, GRPO, and SPO for reinforcement learning from AI feedback.
    • Knowledge Distillation: White-box and black-box methods to mimic larger models like Qwen2.5, including reasoning-focused distillation from DeepSeek-R1.
    • Multimodal Extension: MiniMind-V integrates vision capabilities via a separate repository.
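
As referenced in the model-variants bullet above, the architecture combines RMSNorm pre-normalization, RoPE, and a SwiGLU feed-forward layer in a decoder-only block. The following is a minimal, self-contained PyTorch sketch of such a block for illustration only; it is not MiniMind's actual code, and the class/function names and the 512-dimension, 8-head configuration (taken from MiniMind2-Small) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

def rope_cache(seq_len: int, head_dim: int, base: float = 10000.0):
    """Precompute cos/sin tables for Rotary Position Embeddings."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)              # (seq_len, head_dim/2)
    return freqs.cos(), freqs.sin()

def apply_rope(x, cos, sin):
    """Rotate query/key pairs; x has shape (batch, heads, seq, head_dim)."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

class DecoderBlock(nn.Module):
    """Pre-norm decoder block: RMSNorm -> causal attention -> RMSNorm -> SwiGLU MLP."""
    def __init__(self, dim: int = 512, n_heads: int = 8, ffn_mult: int = 4):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.attn_norm = RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.ffn_norm = RMSNorm(dim)
        hidden = ffn_mult * dim
        self.gate = nn.Linear(dim, hidden, bias=False)   # SwiGLU gate branch
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x, cos, sin):
        b, t, d = x.shape
        h = self.attn_norm(x)
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        q, k = apply_rope(q, cos, sin), apply_rope(k, cos, sin)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(attn.transpose(1, 2).reshape(b, t, d))
        h = self.ffn_norm(x)
        x = x + self.down(F.silu(self.gate(h)) * self.up(h))   # SwiGLU feed-forward
        return x

# Quick shape check with a 64-token dummy batch.
cos, sin = rope_cache(seq_len=64, head_dim=64)
block = DecoderBlock(dim=512, n_heads=8)
print(block(torch.randn(2, 64, 512), cos, sin).shape)  # torch.Size([2, 64, 512])
```

The pre-norm residual layout (normalize, transform, add the result back) follows the Llama-style convention the project cites as inspiration.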

The code is implemented natively in PyTorch without heavy dependencies on libraries like Transformers or TRL, though it remains compatible with them for easy integration. Training supports single- and multi-GPU setups via DDP and DeepSpeed, with resume functionality, WandB/SwanLab logging, and dynamic batching. Datasets are pre-cleaned and open-sourced on ModelScope/HuggingFace, including pretrain_hq.jsonl (1.6GB), sft_mini_512.jsonl (1.2GB), and RLHF preference pairs such as dpo.jsonl.
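
The released corpora are line-delimited JSON files; the following is a rough sketch of what a single pretraining (next-token prediction) step over such a file might involve. The record layout ({"text": ...}), the model and tokenizer objects, and the helper name pretrain_step are illustrative assumptions rather than the repository's actual API.

```python
import json
import torch
import torch.nn.functional as F

def pretrain_step(model, tokenizer, jsonl_path, max_len=512, device="cpu"):
    """Run one next-token-prediction step on the first record of a .jsonl corpus."""
    with open(jsonl_path, encoding="utf-8") as f:
        sample = json.loads(f.readline())            # assumed record layout: {"text": "..."}
    ids = tokenizer.encode(sample["text"])[:max_len]
    ids = torch.tensor(ids, device=device).unsqueeze(0)   # (1, seq_len)
    logits = model(ids)                                   # assumed to return (1, seq_len, vocab)
    # Shift by one position so every token predicts its successor.
    loss = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        ids[:, 1:].reshape(-1),
    )
    loss.backward()
    return loss.item()
```

In the real pipeline this loop of course iterates over the full corpus with batching, DDP/DeepSpeed, and the logging and resume features described above.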

Use Cases and Accessibility

MiniMind lowers the barrier to LLM experimentation, making it feasible for hobbyists, students, and researchers with consumer hardware. For instance, training a basic 'Zero' chatbot model on pretrain_hq + sft_mini_512 datasets takes ~2.1 hours on a 3090, producing coherent Chinese dialogues on topics like history or daily queries. Advanced users can extend to English capabilities, long-context extrapolation via YaRN-scaled RoPE (up to 2048+ tokens), or deploy via OpenAI-compatible APIs, Streamlit UIs, or inference engines like vLLM, llama.cpp, Ollama, and MNN.
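
Because serving can expose an OpenAI-compatible API, querying a locally hosted MiniMind checkpoint could look roughly like the snippet below; the base URL, port, API key, and model name are placeholder assumptions, so check the repository's serving script for the actual values.

```python
from openai import OpenAI

# Point the official OpenAI client at a locally served MiniMind endpoint.
# The base URL, API key, and model name are placeholder assumptions.
client = OpenAI(base_url="http://127.0.0.1:8998/v1", api_key="none")

resp = client.chat.completions.create(
    model="minimind",
    messages=[{"role": "user", "content": "Tell me about the history of the Great Wall."}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```

Backends such as vLLM and llama.cpp's server typically expose the same protocol, so a request of this shape can usually be reused when switching inference engines.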

Evaluation shows MiniMind2 achieving ~25-26% on Chinese benchmarks (C-Eval, CMMLU), competitive with similarly sized models like SmolLM-135M or Aquila-135M, despite its tiny footprint. The project emphasizes educational value: users dissect every line of code, from attention mechanisms to RL objectives, fostering deep intuition over black-box usage.

Information

  • Website: github.com
  • Authors: Jingyao Gong
  • Published date: 2024/08/27
