I've implemented Diff2Flow for SD 1.5 and trained it following the paper. However, I'm unclear on the correct sampling approach after training. The core question: how should we sample from the trained model?
During training, the model sees:
- Non-uniform t_dm (obtained from t_fm ~ U(0,1) through the FM→DM conversion)
- Scaled interpolants x_dm = scale(t) * x_fm
- A velocity loss, computed by converting the epsilon prediction to a velocity
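To make my understanding of the training-time conversion concrete, here is a minimal sketch on scalars. It assumes a variance-preserving schedule (a² + s² = 1 with a = sqrt(alpha_bar)), the SD 1.5 scaled-linear betas (0.00085 to 0.012 over 1000 steps), and the FM convention t_fm = 0 for noise and t_fm = 1 for data. The function names are mine, and scale(t) = a + s is what I get from the algebra; it may not match the paper's code exactly.

```python
import math

def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """alpha_bar table for a scaled-linear beta schedule (assumed SD 1.5 defaults)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    table, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        table.append(prod)
    return table

def dm_to_fm(alpha_bar):
    """Return (t_fm, scale) such that x_fm = x_dm / scale is a linear interpolant."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    scale = a + s
    return a / scale, scale

def fm_to_dm(t_fm):
    """Inverse map: return (alpha_bar, scale) for a given t_fm in [0, 1]."""
    norm = math.sqrt(t_fm ** 2 + (1.0 - t_fm) ** 2)
    a = t_fm / norm
    return a * a, 1.0 / norm  # scale = a + s = 1 / norm

# Scalar check: x_dm / scale should equal t_fm * x0 + (1 - t_fm) * eps.
x0, eps = 0.7, -1.3
alpha_bar = alphas_cumprod()[500]
t_fm, scale = dm_to_fm(alpha_bar)
x_dm = math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps
x_fm = x_dm / scale
```

If this matches what the training code does, then the sampler has to undo exactly this rescaling before each model call.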
Possible approaches I've considered:
1. Standard DDIM - Treat the model as a regular diffusion model
2. Euler integration - Sample in FM space with FM↔DM conversion at each step
3. DDIM with wrapper - A model wrapper handles the conversion (like FlowModelObj)
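For reference, this is roughly how I picture the Euler-integration option. It is only a sketch under my own assumptions, not the paper's sampler: `eps_model` and `timestep_of` are hypothetical callables, the FM convention is t_fm = 0 for noise and t_fm = 1 for data, and integration starts at `t_start` > 0 because the epsilon→x0 conversion divides by sqrt(alpha_bar), which is zero at pure noise. For SD 1.5 the noisiest trained timestep corresponds to t_fm of roughly 0.06 by my calculation.

```python
import math

def euler_sample_fm(eps_model, x_init, timestep_of, num_steps=50, t_start=0.05):
    """Euler integration from t_fm = t_start to t_fm = 1 (FM convention: 1 = data).

    eps_model(x_dm, t_dm) -> predicted noise   (hypothetical signature)
    timestep_of(alpha_bar) -> t_dm             (hypothetical lookup into the DM schedule)
    x_init is the FM-space state at t_start (float, NumPy array, or torch tensor).
    """
    x_fm = x_init
    dt = (1.0 - t_start) / num_steps
    for i in range(num_steps):
        t = t_start + i * dt
        norm = math.sqrt(t * t + (1.0 - t) ** 2)
        a, s = t / norm, (1.0 - t) / norm      # sqrt(alpha_bar), sqrt(1 - alpha_bar)
        x_dm = x_fm / norm                     # scale(t) = a + s = 1 / norm
        eps_hat = eps_model(x_dm, timestep_of(a * a))
        x0_hat = (x_dm - s * eps_hat) / a      # standard epsilon -> x0 conversion
        v_hat = x0_hat - eps_hat               # velocity of the linear interpolant
        x_fm = x_fm + dt * v_hat               # explicit Euler step in FM time
    return x_fm
```

Here `timestep_of` would return a rounded or interpolated SD timestep; whether that rounding is acceptable is part of what I'm asking.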
The mismatch is that standard DDIM provides uniform timesteps {1000, 950, 900, ...} and expects a diffusion noise schedule, while the model was trained on non-uniform timesteps and scaled linear interpolants.
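To illustrate the timestep mismatch, the sketch below maps a uniform t_fm grid to the nearest discrete DM timesteps. It assumes the SD 1.5 scaled-linear beta schedule and the convention t_fm = 0 for noise; `nearest_timestep` is a helper I made up for this illustration, and the real implementation may interpolate instead of rounding.

```python
import bisect
import math

def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """alpha_bar table for a scaled-linear beta schedule (assumed SD 1.5 defaults)."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    table, prod = [], 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        table.append(prod)
    return table

def nearest_timestep(alpha_bar, table):
    """Index of the table entry closest to alpha_bar (table is strictly decreasing)."""
    neg = [-v for v in table]                  # bisect needs ascending order
    j = bisect.bisect_left(neg, -alpha_bar)
    candidates = [k for k in (j - 1, j) if 0 <= k < len(table)]
    return min(candidates, key=lambda k: abs(table[k] - alpha_bar))

table = alphas_cumprod()
t_fm_grid = [i / 10 for i in range(11)]        # uniform in FM time, 0 = noise
t_dm_grid = []
for t in t_fm_grid:
    # alpha_bar implied by t_fm = a / (a + s) with a^2 + s^2 = 1
    alpha_bar = t * t / (t * t + (1.0 - t) ** 2)
    t_dm_grid.append(nearest_timestep(alpha_bar, table))

gaps = [hi_t - lo_t for hi_t, lo_t in zip(t_dm_grid, t_dm_grid[1:])]
print(t_dm_grid)  # DM timesteps visited by a uniform FM grid
print(gaps)       # spacing between consecutive DM timesteps
```

The spacing between consecutive DM timesteps is not constant, which is the gap I see relative to DDIM's evenly spaced schedule.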
Could you clarify which approach is correct for epsilon-prediction models trained with Diff2Flow? Is there special handling needed for the training/inference distribution mismatch?
Thanks for the great work on this paper!