This site is under active construction, and contents are subject to change.
The Spring 2025 offering of the course is archived here.
Please fill out this form to apply for enrollment in CS336.
Applications are due by March 15 at 11:59 PM, and we will notify you of our decision by March 22 at 11:59 PM.
Due to the compute requirements for this class, we unfortunately have to limit enrollment.
Please submit the form using your Stanford email address.

Course Staff

Logistics

Content

What is this course about?

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general-purpose system address a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to provide students with a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Prerequisites

Note that this is a 5-unit class. This is a very implementation-heavy class, so please allocate enough time for it.


Coursework

Assignments

  • Assignment 1: Basics (version from 2025)
    • Implement all of the components (tokenizer, model architecture, optimizer) necessary to train a standard Transformer language model.
    • Train a minimal language model.
  • Assignment 2: Systems (version from 2025)
    • Profile and benchmark the model and layers from Assignment 1 using advanced tools, and optimize attention with your own Triton implementation of FlashAttention2.
    • Build a memory-efficient, distributed version of the Assignment 1 model training code.
  • Assignment 3: Scaling (version from 2025)
    • Understand the function of each component of the Transformer.
    • Query a training API to fit a scaling law to project model scaling.
  • Assignment 4: Data (version from 2025)
    • Convert raw Common Crawl dumps into usable pretraining data.
    • Perform filtering and deduplication to improve model performance.
  • Assignment 5: Alignment and Reasoning RL (version from 2025)
    • Apply supervised finetuning and reinforcement learning to train LMs to reason when solving math problems.
    • Optional Part 2 (version from 2025): implement and apply safety alignment methods such as DPO.
All (currently tentative) deadlines are listed in the schedule.

Honor code

Like all other classes at Stanford, we take the student Honor Code seriously. Please respect the following policies:
  • Collaboration: Study groups are allowed, but students must understand and complete their own assignments, and hand in one assignment per student. If you worked in a group, please put the names of the members of your study group at the top of your assignment. Please ask if you have any questions about the collaboration policy.
  • AI tools: Prompting LLMs such as ChatGPT is permitted for low-level programming questions or high-level conceptual questions about language models, but using them directly to solve the problem is prohibited. We strongly encourage you to disable AI autocomplete (e.g., Cursor Tab, GitHub Copilot) in your IDE when completing assignments (though non-AI autocomplete, e.g., autocompleting function names, is totally fine). We have found that AI autocomplete makes it much harder to engage deeply with the content.
  • Existing code: Implementations for many of the things you will implement exist online. The handouts we'll give will be self-contained, so you will not need to consult third-party code to produce your own implementation. Thus, you should not look at any existing code unless otherwise specified in the handouts.

Submitting coursework

  • All coursework is submitted via Gradescope by the deadline. Do not submit your coursework via email.
  • If anything goes wrong, please ask a question in Slack or contact a course assistant.
  • You can submit as many times as you'd like until the deadline: we will only grade the last submission.
  • Partial work is better than not submitting any work.

Late days

  • Each student has 6 late days to use. A late day extends the deadline by 24 hours.
  • You can use up to 3 late days per assignment.
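As a rough illustration of how the two limits above interact, the sketch below checks whether a late submission fits within both the per-assignment cap and the overall budget. The function names are hypothetical, the deadline used in the usage note is made up, and the rounding rule (any part of a 24-hour block counts as a full late day) is an assumption; the course staff's own accounting is authoritative.

```python
from datetime import datetime, timedelta

TOTAL_LATE_DAYS = 6      # per student, across the whole course
MAX_PER_ASSIGNMENT = 3   # cap on late days for any single assignment


def late_days_needed(deadline: datetime, submitted: datetime) -> int:
    """Late days a submission would consume (assumption: rounded up to whole days)."""
    if submitted <= deadline:
        return 0
    late = submitted - deadline
    # Ceiling division: any part of a 24-hour block counts as one late day.
    return -(-late // timedelta(days=1))


def can_submit(deadline: datetime, submitted: datetime, used_so_far: int) -> bool:
    """True if the submission respects both the per-assignment cap and the total budget."""
    needed = late_days_needed(deadline, submitted)
    within_cap = needed <= MAX_PER_ASSIGNMENT
    within_budget = used_so_far + needed <= TOTAL_LATE_DAYS
    return within_cap and within_budget
```

For example, with a hypothetical deadline `d`, a submission at `d + timedelta(hours=25)` would consume 2 late days under this rounding assumption, and a student who has already used 4 late days could not then use 3 more on a single assignment.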

Regrade requests

If you believe that the course staff made an objective error in grading, you may submit a regrade request on Gradescope within 3 days after the grades are released.


Schedule (TENTATIVE)

| # | Date | Description | Course Materials | Deadlines |
| 1 | Mon March 30 | Overview, tokenization | | Assignment 1 out |
| 2 | Wed April 1 | PyTorch, resource accounting | | |
| 3 | Mon April 6 | Architectures, hyperparameters | | |
| 4 | Wed April 8 | Mixture of experts | | |
| 5 | Mon April 13 | GPUs | | |
| 6 | Wed April 15 | Kernels, Triton | | Assignment 1 due; Assignment 2 out |
| 7 | Mon April 20 | Parallelism | | |
| 8 | Wed April 22 | Parallelism | | |
| 9 | Mon April 27 | Scaling laws | | |
| 10 | Wed April 29 | Inference | | Assignment 2 due; Assignment 3 out |
| 11 | Mon May 4 | Scaling laws | | |
| 12 | Wed May 6 | Evaluation | | Assignment 3 due; Assignment 4 out |
| 13 | Mon May 11 | Data | | |
| 14 | Wed May 13 | Data | | |
| 15 | Mon May 18 | Alignment - SFT/RLHF | | |
| 16 | Wed May 20 | Alignment - RL | | Assignment 4 due; Assignment 5 out |
| | Mon May 25 | No class (Memorial Day) | | |
| 17 | Wed May 27 | Alignment - RL | | |
| 18 | Mon June 1 | Guest lecture | | |
| 19 | Wed June 3 | Guest lecture | | Assignment 5 due |