Integrating hierarchical stochasticity into the StyleGAN3 pipeline: Transforming deterministic latent mapping into an explainable, variational generative process.
Traditional StyleGAN architectures utilize a deterministic 8-layer MLP to map latent codes to styles. This project re-engineers the mapping network into a Variational Mapping Network, introducing stochasticity at every layer of the transformation. By parameterizing each fully connected layer to output a Gaussian distribution, the model learns a probabilistic style space. This ensures that the resulting style vector is not a fixed point but a sample from a learned distribution, providing the foundational diversity required for high-fidelity generation.
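A minimal PyTorch sketch of this design follows. The class names, dimensions, and the log-variance parameterization are illustrative assumptions, not the project's exact implementation:

```python
import torch
import torch.nn as nn

class VariationalMappingLayer(nn.Module):
    """One fully connected layer that outputs a Gaussian instead of a point.

    The forward pass samples from N(mu, sigma^2) via the reparameterization
    trick, so gradients flow through both the mean and the variance path.
    """

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc_mu = nn.Linear(in_dim, out_dim)      # mean path
        self.fc_logvar = nn.Linear(in_dim, out_dim)  # log-variance path
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):
        mu = self.fc_mu(x)
        std = torch.exp(0.5 * self.fc_logvar(x))
        eps = torch.randn_like(std)                  # fresh noise on every call
        return self.act(mu + eps * std)

class VariationalMappingNetwork(nn.Module):
    """Eight stacked variational layers mapping z -> w."""

    def __init__(self, z_dim=512, w_dim=512, num_layers=8):
        super().__init__()
        self.layers = nn.ModuleList(
            [VariationalMappingLayer(z_dim if i == 0 else w_dim, w_dim)
             for i in range(num_layers)]
        )

    def forward(self, z):
        w = z
        for layer in self.layers:
            w = layer(w)  # each layer re-samples, so w itself is stochastic
        return w
```

Calling this network twice with the same z yields two distinct w vectors, which is precisely the property the baseline deterministic MLP lacks.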
The core innovation lies in the Variational Synthesis Layer, which replaces the standard deterministic blocks (L0 to L13). Each block is wrapped in a variational architecture whose two parallel paths compute the mean feature map and its standard deviation. Both paths receive the style vector as a control signal for modulated convolutions, ensuring that stochastic variations, such as hair placement or skin texture, remain intrinsically tied to the subject's local geometry while preserving StyleGAN3's alias-free guarantees.
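The sketch below shows the dual-path structure in simplified form. Modulating input activations with the style vector stands in for StyleGAN3's full weight modulation and demodulation, the alias-free filtering is omitted, and all names are hypothetical:

```python
import torch
import torch.nn as nn

class VariationalSynthesisLayer(nn.Module):
    """Simplified sketch of one variational synthesis block.

    Two parallel style-modulated convolutions predict a mean feature map
    and a log-variance map; the block's output is a reparameterized sample.
    Real StyleGAN3 blocks additionally apply weight demodulation and
    alias-free up/downsampling, omitted here for clarity.
    """

    def __init__(self, in_channels, out_channels, w_dim=512):
        super().__init__()
        self.affine = nn.Linear(w_dim, in_channels)  # style -> per-channel scale
        self.conv_mu = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.conv_logvar = nn.Conv2d(in_channels, out_channels, 3, padding=1)
        self.std_scale = 1.0  # hook for Global_STD Scaling (see metrics below)

    def forward(self, x, w):
        # Modulate input activations with the style vector (a simplification
        # of StyleGAN's weight modulation).
        s = self.affine(w).unsqueeze(-1).unsqueeze(-1)
        x = x * s
        mu = self.conv_mu(x)                         # mean path
        std = torch.exp(0.5 * self.conv_logvar(x))   # standard-deviation path
        eps = torch.randn_like(std)
        return mu + eps * std * self.std_scale       # reparameterized sample
```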
Custom Metrics: Quantifying Stochasticity
To validate the model beyond standard visual inspection, two primary metrics were used:
Layer-wise Mean Variance: A custom metric designed to measure the "amount" of stochasticity injected at each spatial resolution. Unlike the baseline StyleGAN3, which produces zero variance for a fixed latent vector, the Variational Neural Network (VNN) maintains a non-zero, tunable variance.
Global_STD Scaling: A control parameter that allows real-time adjustment of the internal noise. By scaling the predicted standard deviations, we can quantitatively observe the transition from deterministic reconstruction to high-variance, diverse sampling. A sketch of both metrics follows this list.
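Both metrics can be expressed against a small, hypothetical generator interface; the `return_features` flag and the `std_scale` attribute are assumptions carried over from the sketches above, not part of the official StyleGAN3 API:

```python
import torch

@torch.no_grad()
def layerwise_mean_variance(generator, z, num_samples=32):
    """Layer-wise Mean Variance: re-run the generator on a fixed latent z and
    measure how strongly each layer's activations vary across samples.
    Assumes a hypothetical generator(z, return_features=True) call that
    returns a list of per-layer feature maps."""
    runs = [generator(z, return_features=True) for _ in range(num_samples)]
    # Regroup the runs by layer, then average the variance over the sample axis.
    return [torch.stack(layer_feats).var(dim=0).mean().item()
            for layer_feats in zip(*runs)]

def set_global_std(generator, scale):
    """Global_STD Scaling: rescale the predicted standard deviation in every
    variational layer. scale=0.0 collapses the model to deterministic
    reconstruction; larger values produce higher-variance, more diverse
    samples. Relies on the std_scale attribute from the synthesis sketch."""
    for module in generator.modules():
        if hasattr(module, "std_scale"):
            module.std_scale = scale
```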
Conclusion
The integration of Variational Neural Networks into StyleGAN3 successfully decouples global identity from local structural uncertainty. By shifting from weight-based Bayesian methods to layer-output distributions, the model achieves superior FID scores while providing an explainable framework for stochasticity. The results demonstrate that internalizing variance within the synthesis hierarchy produces more natural, intrinsic details than traditional externally applied noise, paving the way for more robust and diverse generative models.