Flow-matching models have traditionally treated generation as uniform noise-to-data transport, yet empirical evidence suggests that enforcing a hierarchical generation order, in which coarse structures materialize before fine details, substantially improves synthesis quality for natural images. Recent methods pursue this objective through different mechanisms. K-Flow imposes hard frequency constraints by reinterpreting frequency scaling parameters as temporal variables within a transformed amplitude space, which changes the coordinate system of the flow itself. Latent Forcing, by contrast, achieves a soft ordering by coupling pixel-space trajectories with auxiliary semantic flows that run on asynchronous schedules, which preserves the original interpolation geometry.

The authors propose Frequency-Forcing, which combines these two paradigms by realizing K-Flow's frequency-ordered generation through Latent Forcing's soft guidance mechanism. Rather than rewriting the core flow coordinates, a standard pixel flow is guided by an auxiliary low-frequency stream that matures earlier in time. The key innovation is the self-forcing signal: instead of relying on a heavy pretrained encoder (e.g., DINO), the frequency scratchpad is derived directly from the data via a lightweight learnable wavelet packet transform. This offers two advantages: it eliminates the external dependency, and it learns bases adapted to the statistics of the input data, in contrast to the fixed frequency decompositions used by hard-constraint methods.
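The coupling between the pixel flow and the earlier-maturing low-frequency stream can be sketched as a pair of interpolants on different clocks. This is a minimal illustration and not the authors' implementation: the linear interpolant, the convention that t = 0 is noise and t = 1 is data, the `lead` parameter, and every function name are assumptions made here, and 2x2 average pooling stands in for the learnable wavelet packet transform.

```python
import numpy as np

def aux_time(t, lead=0.5):
    """Hypothetical schedule: the auxiliary stream reaches time 1
    when the pixel stream is only at t = lead, then stays there."""
    return np.minimum(1.0, t / lead)

def interpolate(x0, x1, t):
    """Linear noise-to-data interpolant (assumed: t=0 noise, t=1 data)."""
    return (1.0 - t) * x0 + t * x1

def low_freq(img):
    """2x2 average pooling of a 2-D array with even height and width.
    A fixed stand-in for the learnable transform described in the text."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def paired_state(image, noise_pix, noise_aux, t, lead=0.5):
    """Return the pixel state at time t and the low-frequency state at the
    earlier-maturing auxiliary time s = aux_time(t)."""
    s = aux_time(t, lead)
    x_t = interpolate(noise_pix, image, t)            # unchanged pixel path
    z_s = interpolate(noise_aux, low_freq(image), s)  # auxiliary stream
    return x_t, z_s, s
```

With `lead=0.5`, the auxiliary state is already the clean low-frequency target at `t = 0.5` while the pixel state is still half noise, which is the sense in which the coarse stream matures first. In a training setup, a network would presumably receive both states together with `(t, s)`; that conditioning is not shown here.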

Empirically, Frequency-Forcing demonstrates consistent FID improvements over pixel- and latent-space baselines on ImageNet-256. The framework also composes naturally with semantic auxiliary streams, yielding additional gains. Mathematically, the wavelet packet decomposition provides a learnable transform W that adaptively partitions frequency content, allowing the model to discover task-specific frequency hierarchies rather than enforcing a predetermined ordering. Because it leaves the original interpolation path intact, this formulation positions forcing-based scale ordering as a versatile architectural pattern compatible with existing flow-matching implementations.
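To make the idea of a learnable W concrete, the following sketch builds a 1-D wavelet packet tree from two-tap orthogonal filters parameterized by one angle per level. The parameterization is an assumption chosen for brevity, not the one used by the authors: setting an angle to pi/4 recovers the Haar filters, and any other value gives a different orthogonal basis that a training procedure could in principle adjust. All names are hypothetical.

```python
import numpy as np

def split(x, theta):
    """One orthogonal two-channel analysis step on an even-length 1-D signal.
    Each (even, odd) sample pair is rotated by the angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    even, odd = x[0::2], x[1::2]
    return c * even + s * odd, -s * even + c * odd

def merge(low, high, theta):
    """Inverse of split: apply the transposed rotation and re-interleave."""
    c, s = np.cos(theta), np.sin(theta)
    x = np.empty(low.size * 2)
    x[0::2] = c * low - s * high
    x[1::2] = s * low + c * high
    return x

def packet_analysis(x, thetas):
    """Full packet tree: every band, not only the low band, is split at
    each level, giving 2**len(thetas) subbands."""
    bands = [x]
    for theta in thetas:
        bands = [b for band in bands for b in split(band, theta)]
    return bands

def packet_synthesis(bands, thetas):
    """Undo packet_analysis by merging sibling bands level by level."""
    for theta in reversed(thetas):
        bands = [merge(bands[i], bands[i + 1], theta)
                 for i in range(0, len(bands), 2)]
    return bands[0]
```

Because every step is a rotation, the overall transform is orthogonal for any choice of angles, so it preserves signal energy and is exactly invertible. In this sketch a low-frequency scratchpad would correspond to the first band returned by `packet_analysis`; a 2-D image version and the training of the angles are left out.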