Code at: https://github.com/yashwantherukulla/DDPM-From-Scratch

Linear Noise Scheduler

Forward Process

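  • Given $x_0$, noise $\epsilon \sim \mathcal{N}(0, I)$ and a timestep $t$, return $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$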
  • To make this efficient, we pre-compute and store the $\alpha_t$s and the cumulative product terms $\bar{\alpha}_t$ in advance.
  • Here $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$
  • The authors use a linear noise scheduler where they linearly scale $\beta_t$ from $\beta_1 = 10^{-4}$ to $\beta_T = 0.02$, with $T = 1000$.
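
The forward process above can be sketched in a few lines. This is a minimal NumPy illustration, not the code from the linked repository: the class name `LinearNoiseScheduler`, the method name `add_noise`, and the attribute names are hypothetical, and timesteps are assumed to be 0-indexed so that index `t` holds the coefficients for step $t+1$.

```python
import numpy as np


class LinearNoiseScheduler:
    """Hypothetical sketch of a linear-beta DDPM forward process."""

    def __init__(self, num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
        # beta_t grows linearly from beta_start to beta_end over T steps.
        self.betas = np.linspace(beta_start, beta_end, num_timesteps)
        self.alphas = 1.0 - self.betas
        # Cumulative product alpha_bar_t, computed once and reused.
        self.alpha_cum_prod = np.cumprod(self.alphas)
        self.sqrt_alpha_cum_prod = np.sqrt(self.alpha_cum_prod)
        self.sqrt_one_minus_alpha_cum_prod = np.sqrt(1.0 - self.alpha_cum_prod)

    def add_noise(self, x0, noise, t):
        """x0, noise: (B, ...) arrays; t: (B,) integer timesteps (0-indexed)."""
        # Reshape per-sample coefficients to (B, 1, ..., 1) for broadcasting.
        shape = (x0.shape[0],) + (1,) * (x0.ndim - 1)
        a = self.sqrt_alpha_cum_prod[t].reshape(shape)
        b = self.sqrt_one_minus_alpha_cum_prod[t].reshape(shape)
        return a * x0 + b * noise


# Usage: noise a batch of four images at four different timesteps.
rng = np.random.default_rng(0)
scheduler = LinearNoiseScheduler()
x0 = rng.standard_normal((4, 1, 28, 28))
noise = rng.standard_normal(x0.shape)
t = np.array([0, 10, 500, 999])
xt = scheduler.add_noise(x0, noise, t)
```

Storing the square roots up front means each training step only needs an index lookup and one fused multiply-add per sample.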

Reverse Process

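  • Given $x_t$, the predicted noise $\epsilon_\theta(x_t, t)$ and the timestep $t$, returns $x_{t-1}$, i.e., by sampling from the reverse distribution $p_\theta(x_{t-1} \mid x_t)$.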
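  • $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t z$ with $z \sim \mathcal{N}(0, I)$, where $\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t) \right)$ and $\sigma_t^2 = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}\, \beta_t$
  • We will pre-compute and store the terms that depend only on $t$: $\beta_t$, $\alpha_t$, $\bar{\alpha}_t$ and $\sqrt{1 - \bar{\alpha}_t}$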
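
One reverse step can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the repository's implementation: the function names `make_schedule` and `sample_prev_timestep` are hypothetical, timesteps are 0-indexed, the variance is assumed to be the posterior variance $\frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$ (the DDPM paper also allows $\sigma_t^2 = \beta_t$), and a random array stands in for the model's noise prediction.

```python
import numpy as np


def make_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Pre-compute beta_t, alpha_t and alpha_bar_t for a linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_timesteps)
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    return betas, alphas, alpha_bar


def sample_prev_timestep(xt, noise_pred, t, betas, alphas, alpha_bar, rng):
    """One reverse step x_t -> x_{t-1} for a scalar, 0-indexed timestep t."""
    # Mean of the reverse distribution, written in terms of the predicted noise.
    mean = (xt - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * noise_pred) / np.sqrt(alphas[t])
    if t == 0:
        # Last step: return the mean without adding noise.
        return mean
    # Assumed variance choice: the posterior variance of q(x_{t-1} | x_t, x_0).
    variance = (1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]
    z = rng.standard_normal(xt.shape)
    return mean + np.sqrt(variance) * z


# Usage: a single step from t = 500 with a stand-in noise prediction.
rng = np.random.default_rng(0)
betas, alphas, alpha_bar = make_schedule()
xt = rng.standard_normal((2, 1, 8, 8))
noise_pred = rng.standard_normal(xt.shape)  # placeholder for the model output
x_prev = sample_prev_timestep(xt, noise_pred, 500, betas, alphas, alpha_bar, rng)
```

Full sampling would start from pure Gaussian noise and call this step for every timestep from $T - 1$ down to $0$, asking the model for a fresh noise prediction each time.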

Time Embedding Block

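  • Given a 1D tensor of timesteps of shape $(B,)$, this block will output a time embedding of shape $(B, d_{emb})$, where $B$ is the batch size and $d_{emb}$ is the embedding dimension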
  • The input passes through this block as follows:
    1. Positional Encoding: the sinusoidal positional encoding used in Transformers
    2. FC (fully connected layer)
    3. SiLU (Sigmoid Linear Unit activation)
    4. FC
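
The pipeline above (positional encoding, FC, SiLU, FC) can be sketched as below. This is a NumPy illustration only: `sinusoidal_embedding`, `silu` and `TimeEmbeddingBlock` are hypothetical names, the randomly initialised weights stand in for learned parameters, and the sine and cosine halves are concatenated rather than interleaved, which is one common layout and not necessarily the one used in the repository.

```python
import numpy as np


def sinusoidal_embedding(timesteps, emb_dim):
    """Map timesteps of shape (B,) to embeddings of shape (B, emb_dim)."""
    assert emb_dim % 2 == 0, "emb_dim must be even"
    half = emb_dim // 2
    # Frequencies 1 / 10000^(i / half) for i = 0 .. half - 1.
    freqs = 10000.0 ** (-np.arange(half) / half)
    args = timesteps[:, None].astype(np.float64) * freqs[None, :]
    # First half sine, second half cosine.
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


class TimeEmbeddingBlock:
    """Hypothetical block: positional encoding -> FC -> SiLU -> FC."""

    def __init__(self, emb_dim, rng):
        self.emb_dim = emb_dim
        scale = 1.0 / np.sqrt(emb_dim)
        # Random weights used purely for illustration.
        self.w1 = rng.uniform(-scale, scale, (emb_dim, emb_dim))
        self.b1 = np.zeros(emb_dim)
        self.w2 = rng.uniform(-scale, scale, (emb_dim, emb_dim))
        self.b2 = np.zeros(emb_dim)

    def __call__(self, timesteps):
        h = sinusoidal_embedding(timesteps, self.emb_dim)
        h = silu(h @ self.w1 + self.b1)
        return h @ self.w2 + self.b2


# Usage: embed a batch of three timesteps into 128 dimensions.
rng = np.random.default_rng(0)
block = TimeEmbeddingBlock(emb_dim=128, rng=rng)
t_emb = block(np.array([0, 250, 999]))
```

The two FC layers let the network learn its own projection of the fixed sinusoidal features before they are injected into the U-Net blocks.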

U-Net Model

  • Inspired by the Stable Diffusion (SD) U-Net architecture on Hugging Face
  • Stacking multiple layers of the block in the image above gives the down block in the image below.

Handwritten Notes on Detailed Math