Flow Matching for Generative Modeling

One. Course Details

This is Lecture 3 of Stanford University's CME 296 course, completing the trilogy of core generative modeling paradigms following diffusion (DDPM) in Lecture 1 and score matching in Lecture 2. Flow matching has emerged as the industry standard for modern generative models due to its mathematical simplicity, training stability, and fast inference capabilities.

The lecture balances intuitive explanations with rigorous mathematical derivation, starting with foundational concepts from optimal transport, then building up to the conditional flow matching loss that powers models like FLUX and Stable Diffusion 3. It concludes with a discussion of rectified flow for faster inference and a unifying framework that connects diffusion, score matching, and flow matching as different perspectives on the same underlying problem.

Two. Key Learning Takeaways

Flow matching frames generative modeling as an optimal transport problem, where the goal is to transport probability mass from a simple initial distribution (Gaussian noise) to the complex target data distribution.

The core object of interest in flow matching is the vector field (velocity field), which tells each particle how fast and in which direction to move at every point in space and time.

Conditional Flow Matching (CFM) simplifies the training objective to a trivial L2 regression loss between the predicted velocity and the target velocity (x₁ - x₀), eliminating the need for expensive likelihood estimation.

Lipschitz continuity of the vector field guarantees unique trajectories for each initial point, ensuring a one-to-one mapping between the initial and target distributions.

Rectified Flow is a fine-tuning procedure that straightens the learned trajectories, allowing for high-quality generation with as few as one or two inference steps.

All three major generative paradigms—diffusion, score matching, and flow matching—are mathematically equivalent and can be unified under the stochastic interpolants framework.

Flow matching produces deterministic trajectories by default, eliminating the stochasticity inherent in traditional diffusion models while still supporting diverse generation.

The training process for flow matching is significantly more stable than earlier methods, with fewer hyperparameters to tune and better scaling properties with model size.

Three. Course Gold Quotes

"The vector field is like giving self-driving cars turn-by-turn directions. The score is just a compass pointing toward high-density regions."

"All the complexity of optimal transport melts away into a single L2 loss. That's the magic of flow matching."

"If your vector field isn't Lipschitz continuous, you can have two particles starting at the same point ending up in completely different places. That's a disaster for generative modeling."

"We don't match the flow directly—we match the velocity. But if the velocity is Lipschitz, matching the velocity is exactly the same as matching the flow."

"Rectified flow doesn't make your model better—it makes your model faster. Sometimes speed is the most important feature of all."

"Diffusion, score matching, flow matching—they're all just different ways of looking at the same mountain. The view is different, but the summit is the same."

"The beauty of flow matching is that you don't need to be a math genius to implement it. The loss is literally just subtracting two vectors and squaring them."

Four. Layered Learning Notes

Module 1: Course Recap and Paradigm Comparison

The first two lectures established two foundational generative modeling paradigms:

DDPM Diffusion: Discrete-time process that gradually adds noise to clean images and learns to reverse it by predicting the added noise
Score Matching: Continuous-time process that learns the gradient of the log probability distribution (the score) and uses Langevin dynamics to sample from the data distribution

Both paradigms result in an L2 regression loss, but they have important limitations: diffusion requires carefully designed noise schedules, and score matching has complex sampling procedures. Flow matching addresses these limitations by reframing the entire problem from an optimal transport perspective.
A critical convention change is introduced in this lecture:

In diffusion and score matching: t = 0 is clean data and t = T is pure noise
In flow matching: t = 0 is pure noise (initial distribution p₀) and t = 1 is clean data (target distribution p₁)

This convention aligns with the optimal transport literature and has become the standard for all modern flow-based generative models.

Module 2: Core Terminology in Flow Matching

Flow matching is built on four interrelated concepts that form the foundation of the entire framework:

Trajectory (xₜ): The path taken by a single particle from its initial position at t=0 to its final position at t=1
Flow (ψₜ(x₀)): A function that maps an initial point x₀ to its position at time t. The flow can be thought of as the collection of all possible trajectories.
Probability Path (pₜ(x)): The probability distribution of particles at time t. p₀ is the initial Gaussian distribution, and p₁ is the target data distribution.
Vector Field (uₜ(x)): A time-dependent function that assigns a velocity vector to every point in space. The vector field tells each particle how to move at every moment in time.

The lecture uses an intuitive analogy to distinguish the vector field from the score function learned in score matching:

The vector field is like turn-by-turn directions for self-driving cars, telling each car exactly where to go and how fast to drive
The score function is like a compass that only points toward the nearest city, without giving specific directions

Module 3: Foundational Equations

Flow matching relies on two fundamental equations that connect the micro behavior of individual particles to the macro behavior of the entire probability distribution:

Ordinary Differential Equation (ODE): Describes the motion of a single particle:
dx/dt = uₜ(x)
This equation states that the velocity of a particle at position x and time t is exactly equal to the vector field at that point. A key mathematical result guarantees that if the vector field is Lipschitz continuous, each initial point will have a unique trajectory.
Continuity Equation: Describes the evolution of the entire probability distribution:
∂pₜ(x)/∂t = -∇ · (pₜ(x) uₜ(x))
This equation enforces conservation of mass: the change in density at any point is equal to the net inflow of probability mass to that point. The divergence operator ∇· measures how much the probability flux pₜ uₜ is spreading out from (positive divergence) or converging toward (negative divergence) a given point.

Together, these two equations form the complete mathematical description of the flow matching problem. If you know the vector field uₜ(x), you can solve the ODE to generate individual samples and use the continuity equation to verify that the resulting distribution matches the target data distribution.
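As a minimal sketch of how the two equations fit together, consider a toy 1D vector field uₜ(x) = A·x. This field is an assumption chosen for illustration (it is not from the lecture) because both the ODE solution, x(t) = x₀·exp(A·t), and the probability path, pₜ = 𝒩(0, exp(2·A·t)), are known in closed form. The code integrates the ODE with Euler steps and checks the continuity equation with finite differences.

```python
import math

A = 0.5  # hypothetical rate of the toy vector field u_t(x) = A * x


def u(x, t):
    """Toy vector field: velocity proportional to position (constant in time)."""
    return A * x


def euler_solve(x0, n_steps=1000):
    """Integrate dx/dt = u_t(x) from t=0 to t=1 with fixed-step Euler."""
    x, h = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += h * u(x, k * h)
    return x


def p(x, t):
    """Probability path: N(0, 1) pushed through this flow is N(0, exp(2*A*t))."""
    sigma = math.exp(A * t)
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def continuity_residual(x, t, h=1e-4):
    """Central-difference estimate of dp/dt + d(p*u)/dx, which should vanish."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
    dflux_dx = (p(x + h, t) * u(x + h, t) - p(x - h, t) * u(x - h, t)) / (2 * h)
    return dp_dt + dflux_dx


x_end = euler_solve(1.0)            # particle started at x0 = 1
exact = math.exp(A)                 # closed-form position at t = 1
residual = continuity_residual(0.7, 0.3)
print(x_end, exact, residual)
```

The Euler endpoint should agree with the closed-form solution up to discretization error, and the continuity residual should be close to zero at any test point (x, t).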

Module 4: Conditional Flow Matching Derivation

The biggest challenge in flow matching is learning the vector field uₜ(x) from data. Direct maximum likelihood estimation (used in earlier continuous normalizing flows) is prohibitively expensive because it requires solving an ODE at every training step.
Conditional Flow Matching (CFM) solves this problem with a clever simplification:

Instead of trying to transport the entire initial distribution to the entire target distribution at once, consider the simpler problem of transporting the initial distribution to a single data point x₁
For this conditional problem, define a simple Gaussian probability path that interpolates between the initial Gaussian and a Dirac delta at x₁:
pₜ(x|x₁) = 𝒩(t x₁, (1-t)² I)
The corresponding conditional vector field has a closed-form solution:
uₜ(x|x₁) = (x₁ - x) / (1 - t)

A critical simplification occurs when xₜ is sampled from this conditional probability path. In this case, xₜ can be written as:
xₜ = t x₁ + (1 - t) x₀
where x₀ is sampled from the initial Gaussian distribution. Substituting this into the conditional vector field gives:
uₜ(xₜ|x₁) = x₁ - x₀

This is an extremely simple result: the target velocity for any point on the straight line between x₀ and x₁ is just the difference between the two endpoints.
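The intermediate algebra can be written out explicitly. The derivation below is a sketch that uses the flow notation ψ from Module 2 for the conditional flow. It first recovers the conditional vector field from the straight-line flow, then substitutes xₜ to obtain the constant target.

```latex
\begin{aligned}
\psi_t(x_0 \mid x_1) &= t\,x_1 + (1-t)\,x_0, \qquad x_0 \sim \mathcal{N}(0, I) \\
\frac{d}{dt}\,\psi_t(x_0 \mid x_1) &= x_1 - x_0 \\
x_0 &= \frac{x - t\,x_1}{1-t}
  \quad \text{(solving } x = \psi_t(x_0 \mid x_1) \text{ for } x_0\text{)} \\
u_t(x \mid x_1) &= x_1 - \frac{x - t\,x_1}{1-t}
  = \frac{(1-t)\,x_1 - x + t\,x_1}{1-t}
  = \frac{x_1 - x}{1-t} \\
u_t(x_t \mid x_1) &= \frac{x_1 - t\,x_1 - (1-t)\,x_0}{1-t}
  = \frac{(1-t)\,(x_1 - x_0)}{1-t}
  = x_1 - x_0
\end{aligned}
```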

The final conditional flow matching loss is then:
L(θ) = 𝔼_{t~𝒰(0,1), x₁~p_data, x₀~p₀} [ || u_θ(xₜ, t) - (x₁ - x₀) ||² ]

This is a standard L2 regression loss that is trivial to implement and extremely stable to train.
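A minimal sketch of this loss as a Monte Carlo estimate, in 1D and with plain Python. The function name `cfm_loss`, the toy dataset, and the two stand-in models are assumptions made for illustration. When the dataset is a single point C, the conditional vector field (C - x) / (1 - t) is also the exact minimizer, so its loss should be essentially zero.

```python
import random


def cfm_loss(model, data, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[(u_theta(x_t, t) - (x1 - x0))^2] in 1D."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.random()                  # t ~ U[0, 1)
        x1 = rng.choice(data)             # x1 ~ p_data (empirical distribution)
        x0 = rng.gauss(0.0, 1.0)          # x0 ~ p_0 = N(0, 1)
        xt = t * x1 + (1.0 - t) * x0      # point on the straight line x0 -> x1
        target = x1 - x0                  # conditional target velocity
        total += (model(xt, t) - target) ** 2
    return total / n_samples


C = 2.0  # hypothetical dataset consisting of one point


def oracle(x, t):
    """Closed-form conditional field for a single data point C."""
    return (C - x) / (1.0 - t)


def zero_model(x, t):
    """Baseline that always predicts zero velocity."""
    return 0.0


loss_oracle = cfm_loss(oracle, [C])
loss_zero = cfm_loss(zero_model, [C])
print(loss_oracle, loss_zero)
```

With more than one data point the minimum of this loss is generally not zero: several (x₀, x₁) pairs can pass through the same xₜ, so the best a model can do is predict the average of x₁ - x₀ over those pairs.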

Module 5: Training and Inference

The training process for flow matching is remarkably simple:

Sample a random noise vector x₀ from the standard Gaussian distribution
Sample a random clean image x₁ from the training dataset
Sample a random time step t uniformly from [0, 1]
Construct the noisy intermediate point xₜ = t x₁ + (1 - t) x₀
Use the neural network to predict the velocity u_θ(xₜ, t)
Compute the L2 loss between the predicted velocity and the target velocity (x₁ - x₀)
Backpropagate the loss and update the model parameters
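The steps above can be sketched as a short training loop. To stay dependency-free, this sketch replaces the neural network with a tiny linear model over hand-picked features, and uses a hypothetical 1D dataset drawn from 𝒩(3, 0.5²). Both are assumptions for illustration, and the linear model cannot represent the optimal field exactly. A real implementation would use a neural network and an automatic differentiation library.

```python
import random


def features(x, t):
    """Hand-picked features for a linear stand-in for the velocity network."""
    return [1.0, x, t, x * t]


def predict(w, x, t):
    """u_theta(x, t) as a linear function of the features."""
    return sum(wi * fi for wi, fi in zip(w, features(x, t)))


def sample_training_point(rng, data):
    x0 = rng.gauss(0.0, 1.0)             # noise sample
    x1 = rng.choice(data)                # clean data sample
    t = rng.random()                     # time in [0, 1)
    xt = t * x1 + (1.0 - t) * x0         # intermediate point
    return xt, t, x1 - x0                # inputs and target velocity


def train(data, steps=40000, lr=0.005, seed=0):
    """Plain SGD on the squared error, one sample per step."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        xt, t, target = sample_training_point(rng, data)
        err = predict(w, xt, t) - target
        grad = [2.0 * err * fi for fi in features(xt, t)]  # d(err^2)/dw
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def mean_loss(w, data, n=5000, seed=1):
    """Held-out Monte Carlo estimate of the flow matching loss."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        xt, t, target = sample_training_point(rng, data)
        total += (predict(w, xt, t) - target) ** 2
    return total / n


data_rng = random.Random(42)
data = [data_rng.gauss(3.0, 0.5) for _ in range(500)]  # hypothetical dataset

loss_before = mean_loss([0.0, 0.0, 0.0, 0.0], data)
w = train(data)
loss_after = mean_loss(w, data)
print(loss_before, loss_after)
```

The loss should drop substantially but not to zero, since the regression target x₁ - x₀ remains noisy given xₜ even for a perfect model.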

Inference is equally straightforward:

Sample an initial point x₀ from the standard Gaussian distribution
Numerically solve the ODE dx/dt = u_θ(x, t) from t=0 to t=1 using a numerical solver like Euler or Heun
The resulting point x₁ is the generated sample
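A minimal sketch of the two solvers mentioned above. In place of a trained network it uses the closed-form velocity field that transports 𝒩(0, 1) to a hypothetical 1D target 𝒩(MU, S²) under independent pairing. This is an assumption chosen so the exact endpoint, MU + S·x₀, is known and the solver error can be measured.

```python
MU, S = 3.0, 0.5  # hypothetical 1D target distribution N(MU, S^2)


def velocity(x, t):
    """Closed-form velocity for N(0,1) -> N(MU, S^2); stands in for u_theta."""
    var_t = (t * S) ** 2 + (1.0 - t) ** 2          # variance of x_t
    return MU + (t * S * S - (1.0 - t)) / var_t * (x - t * MU)


def euler_sample(x0, n_steps):
    """First-order solver: one velocity evaluation per step."""
    x, h = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += h * velocity(x, k * h)
    return x


def heun_sample(x0, n_steps):
    """Second-order solver: average the velocity at both ends of each step."""
    x, h = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        v1 = velocity(x, t)
        v2 = velocity(x + h * v1, t + h)
        x += 0.5 * h * (v1 + v2)
    return x


x0 = 1.5                        # a fixed "noise" value for reproducibility
exact = MU + S * x0             # exact endpoint of this particular flow
euler_err = abs(euler_sample(x0, 20) - exact)
heun_err = abs(heun_sample(x0, 20) - exact)
print(euler_err, heun_err)
```

Heun costs two velocity evaluations per step instead of one, but on smooth fields like this one it is typically far more accurate for the same number of steps.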

Module 6: Rectified Flow for Faster Inference

While standard flow matching works well, it has one important limitation: the learned trajectories are often curved. This means that numerical solvers require many steps (typically 20-50) to accurately follow the trajectory, making inference slow.
Rectified Flow solves this problem with a simple fine-tuning procedure:

Train an initial flow matching model as described above
Generate a large number of samples by solving the ODE from x₀ to x₁
Retrain the model with the same loss on these coupled (x₀, x₁) pairs, in place of independently sampled noise and data
Repeat this process 1-2 times

Each reflow step straightens the trajectories, bringing them closer to straight lines. After just one reflow step, high-quality samples can be generated with as few as 2-4 inference steps. After two reflow steps, even one-step generation becomes possible.
The tradeoff is that reflow can introduce small degradations in sample quality, so it is typically only done once or twice.
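The effect of reflow on one-step generation can be illustrated with a toy 1D example where everything is available in closed form. The setup is an assumption for illustration: the target is 𝒩(MU, S²), the first model is taken to be the exact velocity field for independent pairs, and the reflowed model is taken to be a perfect fit to the coupled pairs x₁ = MU + S·x₀ that the first model's ODE produces. Real models only approximate both fields.

```python
MU, S = 3.0, 0.5  # hypothetical 1D target distribution N(MU, S^2)


def velocity_independent(x, t):
    """Ideal field when x0 and x1 are paired independently (curved paths)."""
    var_t = (t * S) ** 2 + (1.0 - t) ** 2
    return MU + (t * S * S - (1.0 - t)) / var_t * (x - t * MU)


def velocity_reflowed(x, t):
    """Ideal field after one reflow, trained on pairs x1 = MU + S * x0."""
    # Invert x_t = t*x1 + (1-t)*x0 = t*MU + (1 + t*(S-1))*x0 to recover x0.
    x0 = (x - t * MU) / (1.0 + t * (S - 1.0))
    # The target x1 - x0 is constant along each line, so paths are straight.
    return MU + (S - 1.0) * x0


def one_step(velocity, x0):
    """Single Euler step covering the whole interval from t=0 to t=1."""
    return x0 + velocity(x0, 0.0)


x0 = 1.5
target = MU + S * x0                           # endpoint of the true flow
before = one_step(velocity_independent, x0)    # collapses to the data mean
after = one_step(velocity_reflowed, x0)        # lands on the true endpoint
print(before, after, target)
```

In this toy case a single step of the original field sends every starting point to the data mean MU, while a single step of the reflowed field reproduces the full multi-step endpoint.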

Module 7: Unification of Generative Paradigms

The lecture concludes with a powerful insight: diffusion, score matching, and flow matching are not competing methods—they are different perspectives on the same underlying problem.
All three paradigms can be unified under the Stochastic Interpolants framework, which shows that:

The noise predicted by diffusion models
The score predicted by score matching models
The velocity predicted by flow matching models

are all mathematically related. If you know any two of them, you can compute the third.

This unification explains why all three methods produce similar results when implemented correctly. It also allows researchers to combine insights from all three paradigms to develop even better generative models.
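As a sketch of how these quantities relate, assume the linear path xₜ = t x₁ + (1 - t) x₀ from Module 4 with standard Gaussian x₀. Under that assumption the expected noise, the score, and the velocity at a point (xₜ, t) are linear functions of one another, so any one of them together with xₜ and t determines the others. The function names and the 1D Gaussian target used for the check are hypothetical.

```python
MU, S = 3.0, 0.5  # hypothetical 1D target distribution N(MU, S^2)


def marginal_var(t):
    """Variance of x_t = t*x1 + (1-t)*x0 with x1 ~ N(MU, S^2), x0 ~ N(0, 1)."""
    return (t * S) ** 2 + (1.0 - t) ** 2


def velocity(x, t):
    """Closed-form velocity field for this Gaussian example."""
    return MU + (t * S * S - (1.0 - t)) / marginal_var(t) * (x - t * MU)


def score(x, t):
    """Closed-form score: gradient of log N(x; t*MU, marginal_var(t))."""
    return -(x - t * MU) / marginal_var(t)


def noise_from_score(s, t):
    """Expected noise E[x0 | x_t] recovered from the score."""
    return -(1.0 - t) * s


def velocity_from_noise(x, eps, t):
    """Velocity recovered from the expected noise (requires t > 0)."""
    return (x - eps) / t


def velocity_from_score(x, s, t):
    """Velocity recovered directly from the score (requires t > 0)."""
    return velocity_from_noise(x, noise_from_score(s, t), t)


x, t = 2.0, 0.4
v_direct = velocity(x, t)
v_via_score = velocity_from_score(x, score(x, t), t)
print(v_direct, v_via_score)
```

Both routes give the same velocity for this example. These formulas use the flow matching time convention (t = 0 is noise), so quantities from a diffusion model need their time axis flipped and rescaled first.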

Wishing you all the best as you continue your journey into generative modeling. May your vector fields be perfectly Lipschitz continuous, your trajectories be straight and true, and your conditional flow matching loss drop smoothly to zero. May your rectified flow steps give you lightning-fast inference without sacrificing quality, and may your numerical solvers converge perfectly on the first try. The flow matching techniques you're learning today power the fastest and most powerful generative models in the world—keep exploring, keep deriving, and keep pushing the boundaries of what AI can create. Happy generating!

Video Source and Usage Instructions

Video Title: Stanford CME296 Diffusion & Large Vision Models | Spring 2026 | Lecture 3 - Flow matching
• Course Series: Stanford CME296: Diffusion & Large Vision Models
• Original Platform:
• Original Publisher: Stanford
• Original Video URL: https://youtu.be/agN3AlfGFrk?si=3Klfrt9_6fAfqDOK
