
AI Learning Guides

A collection of long-form, project-driven guides for learning modern AI from first principles. Each guide takes a single topic from "I've heard of this" to "I can build it and debug it," organized into phases with runnable projects at every step.

Built and maintained by a solo developer as personal learning notes — published in case they're useful to others. Code is runnable; expect rough edges.


Guides

| Guide | Topic | Phases | Best for |
|---|---|---|---|
| PyTorch Deep Dive | Tensors, autograd, performance, distributed, custom kernels | 11 | Going from PyTorch user to power user |
| LLM | Transformers, GPT from scratch, pretraining, post-training, serving | 11 | Understanding and building language models |
| Image Generation | Autoencoders, VAEs, GANs, diffusion, latent diffusion, DiTs, flow matching | 11 | Generative vision from foundations to frontier |
| Reinforcement Learning | MDPs, DQN, PPO, SAC, offline RL, RLHF | 11 | Learning RL as an algorithm family |
| Video Generation | Video diffusion, latent video, DiTs, world models | 11 | Temporal generative models |
| Robotics | Control, perception, imitation learning, diffusion policies, VLAs, sim-to-real | 11 | Building robot learning systems |
| Multimodal Learning | CLIP, fusion, VLMs, any-to-any models | 11 | Combining modalities into shared representations |
| Inference Systems | Request lifecycle, KV cache, batching/scheduling, speculative decoding, serving-time quantization, distributed/disaggregated serving, observability & cost | 11 | Putting trained models into production at scale (LLM-focused) |
| AI Hardware | GPU architecture, CUDA/Triton, quantization, serving | 11 | Making models fast on real silicon |

The order above is a suggested learning progression: foundations first, then single-modality work, then cross-modality and applied tracks, with the systems-heavy guides (Inference Systems and AI Hardware) as the production-and-performance endpoint. Each guide is self-contained but cross-references the others where it makes sense. Project folders live under each guide's projects/ directory — see the structure section below.


Where to start

There is no single "first" guide. Where to start depends on what you're trying to build. A few common paths:

"I'm comfortable with deep learning basics, I want to actually understand what I'm using"

PyTorch Deep Dive. This is the foundation. If view vs reshape, autograd internals, or torch.compile feel hand-wavy to you, fix that first. Nearly every other guide assumes this fluency.

"I want to build language models"

LLM, with PyTorch Deep Dive as a parallel reference. Once you reach Phase 6 (post-training / RLHF), you'll also want the RLHF section of the Reinforcement Learning guide. When you're ready to put a trained model in front of users, continue into Inference Systems.

"I want to serve LLMs in production"

Inference Systems. Assumes you've done at least Phases 1–3 of the LLM guide ("KV cache" and "decoder-only" should not feel fuzzy). Pair with AI Hardware Phases 4 + 7 when you hit kernel-level questions.

"I want to build agents / robots that learn"

Reinforcement Learning for the algorithms, then Robotics for applying them to physical systems. Robotics also leans on imitation learning and diffusion policies, which don't require RL, so you can read Robotics first and dip into RL on demand.

"I want to build generative models for vision"

Image Generation first (autoencoders → GANs → diffusion). Then Video Generation, which assumes image-side diffusion fluency. Multimodal Learning is the natural next step if you want text↔image↔video.

"I want to make models actually fast"

AI Hardware. Different prerequisite stack from the others — systems-programming chops matter more than ML depth. Pair with PyTorch Deep Dive Phases 5–6 (performance, custom kernels).

"I'm starting from scratch"

The honest answer: pick a project you actually want to build and work backward. Generic "learn AI" curricula tend to stall. If you want a default, do PyTorch Deep Dive → LLM Phases 1–3 → pick a direction.


Prerequisites

These apply to every guide in this collection. Each guide's Phase 0: Prerequisites layers the topic-specific concepts and tooling on top of these.

Concepts to Know

  • Python: classes, decorators, context managers, generators, virtual environments
  • Linear algebra: matrix multiplication, vector spaces, broadcasting, basic eigenvalue intuition
  • Calculus: gradients, chain rule, partial derivatives
  • Probability: random variables, expectation, conditional probability
  • Deep learning basics: training loops, loss functions, backpropagation, what an nn.Module is. The PyTorch Deep Dive is the recommended foundation if any of this feels shaky.
  • Shell and git: you will read and clone a lot of repos

What You Need Installed

  • Python 3.10+, NumPy, PyTorch
  • A GPU — owned, rented, or borrowed. Cloud is fine. Each guide notes its specific VRAM and hardware needs.

Resources


Dependency graph

How the guides relate to each other:

                   ┌──────────────────────┐
                   │  PyTorch Deep Dive   │  ← foundation for everything
                   └──────────┬───────────┘
       ┌──────────────────────┼──────────────────────┐
       │                      │                      │
       ▼                      ▼                      ▼
  ┌──────────┐         ┌────────────┐         ┌──────────────┐
  │   LLM    │         │     RL     │         │  Image Gen   │
  └────┬─────┘         └──────┬─────┘         └──────┬───────┘
       │                      │                      │
       │                      ▼                      ▼
       │               ┌────────────┐         ┌──────────────┐
       │               │  Robotics  │         │  Video Gen   │
       │               └────────────┘         └──────────────┘
       │                                             │
       ├──────────────────┬──────────────────────────┘
       │                  │
       ▼                  ▼
┌──────────────┐  ┌────────────────┐
│  Inference   │  │   Multimodal   │  ← combines modalities;
│   Systems    │  │    Learning    │    read after LLM + Image Gen
└──────────────┘  └────────────────┘
       │ shares territory (kernels, quantization)
┌──────────────┐
│ AI Hardware  │  ← orthogonal systems track;
└──────────────┘    pair with PyTorch Deep Dive

Hard dependencies:

  • Video Generation assumes Image Generation (diffusion, U-Net, latent diffusion).
  • Robotics borrows continuous-control RL from the RL guide (SAC, PPO, sim-to-real).
  • Multimodal Learning assumes you've seen both a transformer-based LLM and an image encoder.
  • Inference Systems assumes LLM Phases 1–3 (transformer mechanics, KV cache concept).

Soft dependencies:

  • LLM Phase 6 (RLHF) reads better after RL Phase 9.
  • Robotics Phase 6 (VLAs) reads better after Multimodal Phase 5 (VLMs).
  • Inference Systems defers numeric-format theory to AI Hardware Phase 7 and kernel-writing fundamentals to AI Hardware Phase 4 / PyTorch Deep Dive Phase 6; read the upstream guide first if you want depth there.
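
If you want to compute a reading order rather than eyeball one, the hard dependencies can be written down as a small graph. The sketch below is one reading of the diagram and lists, not something shipped in the repository: the `PREREQS` mapping and both helper functions are hypothetical, and treating AI Hardware as coming after PyTorch Deep Dive is an assumption (the diagram only says to pair them). Soft dependencies are left out.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Hypothetical encoding: each guide maps to the guides it should come after.
PREREQS = {
    "PyTorch Deep Dive": set(),
    "LLM": {"PyTorch Deep Dive"},
    "Reinforcement Learning": {"PyTorch Deep Dive"},
    "Image Generation": {"PyTorch Deep Dive"},
    "Video Generation": {"Image Generation"},
    "Robotics": {"Reinforcement Learning"},
    "Multimodal Learning": {"LLM", "Image Generation"},
    "Inference Systems": {"LLM"},
    # Assumption: "pair with PyTorch Deep Dive" is modeled as a prerequisite.
    "AI Hardware": {"PyTorch Deep Dive"},
}


def reading_order(prereqs):
    """Return one valid order in which every guide follows its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


def all_prerequisites(guide, prereqs):
    """Return every guide that must come before `guide`, directly or indirectly."""
    seen = set()
    stack = list(prereqs[guide])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(prereqs[current])
    return seen


if __name__ == "__main__":
    for position, guide in enumerate(reading_order(PREREQS), start=1):
        print(position, guide)
    print(sorted(all_prerequisites("Multimodal Learning", PREREQS)))
```

A topological order is not unique, so the printed sequence is only one of several valid ones; the property that holds is that no guide appears before something it depends on.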

Repository structure

ai-learning-guides/
├── README.md                  ← you are here (cross-guide Prerequisites live in this file)
├── shared/
│   └── glossary.md            ← consolidated glossary for all guides
├── guides/
│   ├── pytorch-deep-dive/
│   │   ├── README.md          ← the guide itself
│   │   ├── projects/
│   │   │   ├── 01-stride-explorer/
│   │   │   │   ├── README.md  ← project explanation
│   │   │   │   ├── run.py     ← entry point
│   │   │   │   └── ...
│   │   │   ├── 02-micrograd-pytorch-style/
│   │   │   └── ...
│   │   ├── requirements.txt
│   │   └── STATUS.md          ← last-tested dates per project
│   ├── llm/
│   │   └── ... (same shape)
│   ├── image-generation/
│   │   └── ... (same shape)
│   ├── reinforcement-learning/
│   │   └── ... (same shape)
│   ├── video-generation/
│   ├── robotics/
│   ├── multimodal-learning/
│   ├── inference-systems/
│   └── ai-hardware/
└── LICENSE
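
Because there is no CI, a quick local check of the layout can be handy. The following is a minimal sketch, not a script that exists in the repository: `find_projects` and `missing_files` are hypothetical helpers, and it assumes every project folder carries both a `README.md` and a `run.py`, as the `01-stride-explorer` example in the tree suggests.

```python
from pathlib import Path


def find_projects(repo_root: Path) -> list[Path]:
    """Return every folder matching guides/<topic>/projects/<name>/, sorted by path."""
    return sorted(p for p in repo_root.glob("guides/*/projects/*") if p.is_dir())


def missing_files(project: Path, required=("README.md", "run.py")) -> list[str]:
    """List the required files that are absent from a project folder."""
    return [name for name in required if not (project / name).is_file()]


if __name__ == "__main__":
    # Run from the repository root; prints one line per project folder.
    for project in find_projects(Path(".")):
        gaps = missing_files(project)
        status = "ok" if not gaps else "missing " + ", ".join(gaps)
        print(f"{project}: {status}")
```

Run it from the repository root. A project that uses a different entry point will show up as missing `run.py`, which may be fine; adjust the `required` tuple to match what a given guide actually uses.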

Conventions used across all guides:

  • Phases. Each guide is split into ~6–11 phases that go from foundations to frontier. Phase 0, when present, is prerequisites.
  • Projects. Each phase ends with a projects table. Projects are difficulty-rated (⭐ to ⭐⭐⭐⭐⭐) and live as folders under guides/<topic>/projects/, numbered sequentially across the whole guide (not per phase).
  • Bidirectional links. Every project README links back to its phase; every phase links forward to its projects.
  • Runnable, not tested. Code is meant to run, but there's no CI. Each guide's STATUS.md notes when projects were last verified.
  • Pinned dependencies. Each guide has its own requirements.txt. Versions are pinned loosely (>=X,<Y) — strict enough to avoid silent rot, loose enough not to break weekly.
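
To make the loose-pin convention concrete, here is a rough sketch of what a `>=X,<Y` range accepts. It is a simplification for illustration only: `satisfies` is a hypothetical helper, the version numbers are made up, and real installers follow the full version-specifier rules (pre-releases, `~=`, and so on), which this ignores.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.3.1' into (2, 3, 1). Handles plain numeric releases only."""
    return tuple(int(part) for part in text.strip().split("."))


def compare(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Return -1, 0, or 1 after padding the shorter version with zeros."""
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return (a > b) - (a < b)


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated range such as '>=2.2,<2.6'."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause.startswith(">="):
            ok = compare(candidate, parse_version(clause[2:])) >= 0
        elif clause.startswith("<") and not clause.startswith("<="):
            ok = compare(candidate, parse_version(clause[1:])) < 0
        else:
            # Only the two operators used by the >=X,<Y convention are supported.
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    spec = ">=2.2,<2.6"  # hypothetical range, not taken from any guide
    for version in ("2.1.9", "2.2", "2.5.3", "2.6.0"):
        print(version, satisfies(version, spec))
```

The lower bound keeps out versions older than what a project was written against; the upper bound stops a future release from being picked up silently before anyone has re-verified the project and updated STATUS.md.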

Project difficulty legend

| Tier | Meaning | Typical time |
|---|---|---|
| ⭐ | "Make sure you understand the concept": single file, a few dozen lines | 30 min – 2 hours |
| ⭐⭐ | "Build a working version": multiple files, real implementation | 2 – 8 hours |
| ⭐⭐⭐ | "Build something non-trivial": design choices matter, debugging matters | 1 – 3 days |
| ⭐⭐⭐⭐ | "Reproduce a paper": published result on a small scale; expect to fight your tooling | 1 – 2 weeks |
| ⭐⭐⭐⭐⭐ | "Research-level": open-ended, likely no reference implementation matches yours exactly | 2+ weeks, often much more |

These are guidelines, not promises. ⭐⭐⭐⭐ projects routinely overrun their estimates; ⭐⭐⭐⭐⭐ projects are graded on what you learned, not whether they "worked."


Philosophy

A few opinions that shape every guide here:

Build first, derive later. You learn faster when you have working code in front of you. Each phase opens with concepts, then code, then projects — in that order, but the projects are where it sticks.

Explicit over magic. Where there's a choice between calling a library and writing the loop, the guides write the loop at least once. You can use the library afterward with eyes open.

Equations are checkpoints, not décor. When an equation appears, it's because you'll need to recognize it in code five pages later. If you can't connect the math to the implementation, slow down.

Frontier topics are flagged honestly. The "Phase 10: Frontier" sections describe things actively being researched. They go stale fastest. Treat them as starting points, not conclusions.

No prestige hierarchy of topics. A working bigram model is more valuable than a half-built MoE. The ⭐ projects are not throwaway — they're the foundation the ⭐⭐⭐⭐⭐ ones rest on.


Suggested learning timelines

These are rough — real time depends heavily on background and how much you build vs. read.

| Path | Guides | Approx. time (part-time) |
|---|---|---|
| LLM engineer track | PyTorch Deep Dive → LLM → RL Phase 9 (RLHF) | 4 – 6 months |
| LLM serving track | PyTorch Deep Dive → LLM Phases 1–3 → Inference Systems | 3 – 5 months |
| Generative vision track | PyTorch Deep Dive → Image Gen → Video Gen | 3 – 5 months |
| Robotics ML track | PyTorch Deep Dive → RL Phases 1–5 → Robotics | 4 – 6 months |
| Multimodal track | PyTorch Deep Dive → LLM Phases 1–3 → Image Gen Phases 1–5 → Multimodal | 5 – 7 months |
| Systems track | PyTorch Deep Dive → AI Hardware → Inference Systems | 3 – 5 months |
| The whole thing | All nine guides, in dependency order | 14 – 20 months |

If a phase is taking 3x as long as the guide's suggested timeline, that's a signal — usually a prerequisite is shaky, not that you're slow.


License

MIT License. See the LICENSE file for details.