🤓 Yashwanth's Notes

DDPM from Scratch

Feb 03, 2025

Code at: https://github.com/yashwantherukulla/DDPM-From-Scratch

Linear Noise Scheduler

Forward Process

Given $x_0$, $t$ and $\epsilon$, return $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$.
To keep this efficient, we pre-compute and store the $\alpha_t$ values and the cumulative products $\bar{\alpha}_t$ in advance.
Here $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$.
The authors use a linear noise schedule: $\beta_t$ increases linearly from $\beta_1 = 0.0001$ to $\beta_T = 0.02$, with $T = 1000$.
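The schedule and the forward step above can be sketched in a few lines. This is a minimal illustration rather than the code from the linked repository: it uses NumPy for brevity, the function names `make_linear_schedule` and `add_noise` are hypothetical, and timesteps are assumed to be 0-based indices (so index 0 corresponds to $\beta_1$).

```python
import numpy as np


def make_linear_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Pre-compute the per-step terms of a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_timesteps)  # beta_1 .. beta_T
    alphas = 1.0 - betas                                      # alpha_t = 1 - beta_t
    alpha_bars = np.cumprod(alphas)                           # cumulative product of alphas
    return betas, alphas, alpha_bars


def add_noise(x0, t, eps, alpha_bars):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.

    x0, eps: arrays of shape (B, C, H, W).
    t: integer array of shape (B,) holding 0-based timestep indices (assumption).
    """
    abar = alpha_bars[t].reshape(-1, 1, 1, 1)  # broadcast over C, H, W
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps


rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_linear_schedule()
x0 = rng.standard_normal((2, 1, 8, 8))   # stand-in for a batch of images
eps = rng.standard_normal(x0.shape)      # Gaussian noise
t = np.array([0, 999])                   # first and last timestep
xt = add_noise(x0, t, eps, alpha_bars)
```

Because $\bar{\alpha}_t$ shrinks towards zero as $t$ grows, the sample at the last timestep is almost pure noise, while the sample at the first timestep is still very close to $x_0$.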

Reverse Process

Given $x_t$, return $x_{t-1}$ by sampling from the reverse distribution $p_\theta(x_{t-1} \mid x_t)$,
where,

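$$\mu_\theta = \frac{x_t}{\sqrt{\alpha_t}} - \frac{(1 - \alpha_t)\, \epsilon_\theta(x_t, t)}{\sqrt{1 - \bar{\alpha}_t}\, \sqrt{\alpha_t}}$$

$$\Sigma_q(t) = \frac{(1 - \alpha_t)(1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}\, I = \sigma_t^2 I$$

$$x_{t-1} = \mu_\theta + \sigma_t z, \quad z \sim \mathcal{N}(0, I)$$

Here $\epsilon_\theta(x_t, t)$ is the noise predicted by the model.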

We will pre-compute and store $(1 - \alpha_t)$, $(1 - \bar{\alpha}_t)$ and $\sqrt{1 - \bar{\alpha}_t}$.
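A single reverse step could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's implementation: NumPy stands in for a deep learning framework, `reverse_step` is a hypothetical name, `eps_pred` is a placeholder for the model's noise prediction, timesteps are 0-based indices, and no noise is added at the final step (index 0), as in the sampling algorithm of the DDPM paper.

```python
import numpy as np

# Linear schedule from the notes: beta_1 = 1e-4, beta_T = 0.02, T = 1000.
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def reverse_step(xt, eps_pred, t, rng):
    """Sample x_{t-1} from x_t given the predicted noise (t is a 0-based int index)."""
    # Mean: (x_t - (1 - alpha_t) / sqrt(1 - abar_t) * eps_pred) / sqrt(alpha_t)
    coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # final step: return the mean without adding noise
    # Variance: (1 - alpha_t) * (1 - abar_{t-1}) / (1 - abar_t)
    var = (1.0 - alphas[t]) * (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t])
    z = rng.standard_normal(xt.shape)
    return mean + np.sqrt(var) * z


rng = np.random.default_rng(0)
xT = rng.standard_normal((2, 1, 8, 8))  # start from pure Gaussian noise
eps_pred = np.zeros_like(xT)            # placeholder for a trained model's output
x_prev = reverse_step(xT, eps_pred, 999, rng)
```

Full sampling would call this step repeatedly from the last timestep down to index 0, replacing `eps_pred` with the model's prediction at each step.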

Time Embedding Block

Given a 1D tensor $t$ of shape $(B,)$, this block outputs a time embedding of shape $(B, \text{embedding dimension})$.
The input passes through the block as follows:
1. Positional Encoding: the sinusoidal positional encoding used in Transformers, with the timestep $t$ as the position $pos$

$$PE_{(pos, 2i)} = \sin\left(pos / 10000^{2i / d_{model}}\right)$$

$$PE_{(pos, 2i+1)} = \cos\left(pos / 10000^{2i / d_{model}}\right)$$

2. FC (fully connected layer)
3. SiLU (Sigmoid Linear Unit activation)
4. FC
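The four stages above can be sketched as follows. This is a NumPy illustration under assumptions, not the repository's code: the function names are hypothetical, the embedding dimension of 128 is arbitrary, the random weights stand in for trained FC parameters, and sine and cosine values are interleaved to match the formula above (some implementations concatenate the two halves instead).

```python
import numpy as np


def sinusoidal_embedding(t, dim):
    """Map timesteps of shape (B,) to features of shape (B, dim); dim must be even."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / dim))  # 1 / 10000^(2i / d_model)
    angles = t[:, None] * freqs[None, :]      # shape (B, dim // 2)
    emb = np.empty((t.shape[0], dim))
    emb[:, 0::2] = np.sin(angles)             # even indices use sin
    emb[:, 1::2] = np.cos(angles)             # odd indices use cos
    return emb


def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def time_embedding_block(t, w1, b1, w2, b2):
    """Positional encoding -> FC -> SiLU -> FC."""
    h = sinusoidal_embedding(t, w1.shape[0])
    h = h @ w1 + b1   # first fully connected layer
    h = silu(h)
    return h @ w2 + b2  # second fully connected layer


rng = np.random.default_rng(0)
dim = 128                                     # hypothetical embedding dimension
w1 = rng.standard_normal((dim, dim)) * 0.02   # random stand-ins for trained weights
w2 = rng.standard_normal((dim, dim)) * 0.02
b1 = np.zeros(dim)
b2 = np.zeros(dim)
t = np.array([0, 10, 999])                    # batch of B = 3 timesteps
out = time_embedding_block(t, w1, b1, w2, b2)  # shape (3, 128)
```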

U-Net Model

Inspired by the Stable Diffusion (SD) U-Net architecture on Hugging Face.
Multiple layers of the block shown in the image above make up the down block in the image below.

Handwritten Notes on Detailed Math
