For LeJEPA, an extra predictor network is optional (it is not needed for this demo).
The 3D plot shows the first three embedding dimensions. Data manifolds affect only the first 2–3 dimensions; the remaining dimensions are uniform.
by Scott Hawley. cf. the LeJEPA paper. Blog posts on SSL (2021): 1, 2, 3; and BYOL (2022).
In Self-Supervised Learning, we train models to map multiple "views" (e.g., jittered versions) of the same object to the same location in embedding space. Without a counter-force, models find a trivial "shortcut": they collapse representations into a lower-dimensional subspace—a point, a line, or a plane. This dimensional collapse discards nearly all the information required for downstream tasks.
The LeJEPA paper provides a mathematical proof that the Isotropic Gaussian is the unique optimal distribution for minimizing risk in both linear and nonlinear probing—i.e. maximizing utility of the embeddings for downstream classification-like tasks. It represents the maximum entropy state for a fixed variance, ensuring no single dimension dominates the representation.
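For reference, the maximum-entropy claim can be stated precisely: among all distributions on \(\mathbb{R}^d\) whose covariance is \(\sigma^2 I_d\), the differential entropy satisfies
\[
h(X) \;\le\; \frac{d}{2}\,\ln\!\left(2\pi e\,\sigma^{2}\right),
\]
with equality exactly when \(X \sim \mathcal{N}(\mu,\,\sigma^2 I_d)\) — the isotropic Gaussian.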
Implementing a multivariate Gaussian test in high dimensions is notoriously difficult and computationally expensive, with traditional tests often scaling quadratically (\(O(N^2)\)) or worse with the number of samples. SIGReg (Sketched Isotropic Gaussian Regularization) bypasses this by projecting embeddings onto random 1D directions and applying the Epps-Pulley test to each projection, maintaining linear (\(O(N)\)) complexity.
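To make the projection step concrete, here is a minimal PyTorch sketch (the function name `random_projections` and the `num_directions` parameter are illustrative choices, not from the paper). Each row of the output is a 1D sample to which the test statistic defined next is applied:

```python
import torch

def random_projections(embeddings: torch.Tensor, num_directions: int = 64) -> torch.Tensor:
    """Project a batch of D-dim embeddings onto random unit directions.

    embeddings: (N, D) tensor. Returns a (num_directions, N) tensor whose
    rows are 1D samples, one per random direction on the unit sphere.
    """
    N, D = embeddings.shape
    dirs = torch.randn(num_directions, D, device=embeddings.device)
    dirs = dirs / dirs.norm(dim=1, keepdim=True)  # unit-norm directions
    # The Epps-Pulley test is then applied independently to each row.
    return dirs @ embeddings.T
```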
The test statistic \(T_{EP}\) measures the distance between the empirical characteristic function and \(e^{-t^2/2}\), the characteristic function of the standard normal. It is defined as:
\[
T_{EP} \;=\; N \int_{-\infty}^{\infty} \left| \hat{\phi}(t) - e^{-t^{2}/2} \right|^{2} \frac{e^{-t^{2}/2}}{\sqrt{2\pi}} \, dt,
\]
where \(\hat{\phi}(t)\) is the Empirical Characteristic Function (the Fourier transform of the empirical sample distribution), given by:
\[
\hat{\phi}(t) \;=\; \frac{1}{N} \sum_{n=1}^{N} e^{\,i t x_n},
\]
with \(x_1,\dots,x_N\) the 1D projected embeddings.
By operating on the ECF, we get a bounded loss with stable, non-zero gradients even when the data is highly clustered. Furthermore, because the complex exponentials are bounded, the gradients are naturally constrained—effectively giving us gradient clipping "for free" without manual heuristics.
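A hedged PyTorch sketch of this loss, built from the equations above with simple trapezoidal quadrature over a finite \(t\) grid (the grid size and range are illustrative choices, not the paper's):

```python
import math
import torch

def epps_pulley_loss(x: torch.Tensor, num_t: int = 65, t_max: float = 5.0) -> torch.Tensor:
    """Differentiable Epps-Pulley statistic for a 1D sample x of shape (N,).

    Compares the empirical characteristic function (ECF) of x with the
    standard normal CF e^{-t^2/2}, under a standard-normal weight, using
    trapezoidal quadrature over t in [-t_max, t_max].
    """
    N = x.shape[0]
    t = torch.linspace(-t_max, t_max, num_t, device=x.device)
    tx = t[None, :] * x[:, None]           # (N, num_t)
    ecf_re = torch.cos(tx).mean(dim=0)     # Re phi_hat(t)
    ecf_im = torch.sin(tx).mean(dim=0)     # Im phi_hat(t); bounded, so gradients stay tame
    target = torch.exp(-0.5 * t**2)        # CF of N(0,1), real-valued
    sq_err = (ecf_re - target) ** 2 + ecf_im ** 2
    weight = torch.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)
    return N * torch.trapezoid(sq_err * weight, t)
```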
The model achieves meaningful representations by incorporating two distinct mathematical effects into a single objective function:
\[
\mathcal{L} \;=\; \mathcal{L}_{attr} \;+\; \lambda \, \mathcal{L}_{\rm SIGReg}.
\]
Here, \(\mathcal{L}_{attr}\) (Attraction) pulls views of the same object together. To prevent collapse, \(\mathcal{L}_{\rm SIGReg} = T_{EP}\) (Regularization), averaged over the random 1D projections, applies the Epps-Pulley test to force the overall distribution toward an isotropic Gaussian. The parameter \(\lambda\) (set to 100 in this demo) controls the relative strength of these two effects.
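Putting the pieces together, a toy version of the full objective might look like the following sketch (the mean-squared-error attraction term is an assumption for illustration; `random_projections` and `epps_pulley_loss` are the hypothetical helpers sketched above):

```python
import torch
import torch.nn.functional as F

def lejepa_style_loss(z1: torch.Tensor, z2: torch.Tensor,
                      lam: float = 100.0, num_directions: int = 64) -> torch.Tensor:
    """Toy combined objective: attraction plus SIGReg-style regularization.

    z1, z2: (N, D) embeddings of two views of the same batch of objects.
    lam: weight on the regularizer (lambda = 100 in this demo).
    """
    attr = F.mse_loss(z1, z2)  # pull paired views together
    z = torch.cat([z1, z2], dim=0)
    projs = random_projections(z, num_directions)   # (num_directions, 2N)
    sigreg = torch.stack([epps_pulley_loss(p) for p in projs]).mean()
    return attr + lam * sigreg
```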