<aside> ❗

Disclaimer: I am not a researcher in the field. Some parts may be incorrect or may not be the best way to view the problem.

</aside>

When ChatGPT first came out, like most people, I was blown away. I still remember one of the first things I asked it (the halting problem):

> Write a program that takes two inputs: a description of a program $P$ and an input $x$. The program outputs "YES" if $P(x)$ eventually terminates and "NO" otherwise.

At the time, I had no idea it had been trained on the entire internet. I was hoping it would somehow overload ChatGPT with all the thinking it had to do. To my dismay, it was already aware of the problem’s history. What unsettled me was that it knew about the problem at all. Eventually, I learned these machines were “predicting the next most probable token”. That explanation was enough for me to make peace with it.
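(As an aside, the reason no such program can exist is Turing's classic diagonalization argument. Below is a minimal sketch of it in Python; the `halts` oracle and the `paradox` program are hypothetical constructions assumed, for the sake of contradiction, to exist.)

```python
def halts(program_source: str, program_input: str) -> bool:
    """Hypothetical oracle: True iff the program described by
    `program_source` eventually terminates on `program_input`.
    Assumed to exist only for the sake of contradiction."""
    ...

def paradox(program_source: str) -> None:
    # Ask the oracle what this program does when fed its own
    # source code, then do the exact opposite.
    if halts(program_source, program_source):
        while True:   # oracle says "halts" -> loop forever
            pass
    # oracle says "loops forever" -> halt immediately

# Running `paradox` on its own source contradicts the oracle either
# way, so no correct implementation of `halts` can exist.
```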

At the same time, image models were gaining popularity. Unlike text generation, image generation completely broke my intuition. Even after watching videos that tried to explain diffusion (which typically opened with “diffusion is a concept in physics” but did not go any deeper), it still felt like pure sorcery. Were image models using the same techniques as language models?

At some point, I realized the wizards behind the curtain were people working in the generative modelling field. The field is not focused on image generation specifically, but image generation is a useful empirical task for comparing different generative models. Image data works well because there is a lot of it on the web, and images are high-dimensional enough that computational efficiency matters, but not so high-dimensional that training is impractical.

Over the past year, I read up on four frameworks for generative modelling, primarily targeting “flow-based” models. I had heard that Flow Matching was the new kid on the block, so I tried to work my way up to it. The frameworks are split into four sections below. Years reflect when the methods gained popularity, not necessarily when they were first studied.

  1. Normalizing Flows (2014-2018)
  2. Continuous Normalizing Flows (2018)
  3. Diffusion (2015-2020)