
What Worked (and What Didn't) When Training AEs and VAEs for Embedding Compression


TL;DR


Dimension Choice


AE vs VAE Objectives


Regularization and Stability


Contrastive Signals (Kept Light)


Optimization Practices That Stuck


Evaluation Principles


A Practical Training Recipe

  1. Start with a simple AE baseline to validate the pipeline and catch data issues quickly.
  2. Switch to a VAE with KL warmup, free-bits, and a moderate KL target (a minimal sketch of this objective follows the list).
  3. If retrieval is the goal, add a small latent alignment term and, if available, a queue-based contrastive mechanism.
  4. Pick a single promising dimensionality and tune learning rate and training length there before scaling out.
  5. Lock the recipe and run a small set of confirmatory evaluations before expanding the search.
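
For step 2, the sketch below shows one way the objective could look in PyTorch, assuming a diagonal Gaussian encoder that outputs `mu` and `logvar`. The function name `vae_loss` and the hyperparameter values (`warmup_steps`, `kl_target`, `free_bits`) are illustrative placeholders, not settings from the post; step 3's latent alignment or contrastive term would be added to the returned loss.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_hat, mu, logvar, step, warmup_steps=10_000,
             kl_target=1.0, free_bits=0.05):
    """Reconstruction + KL with linear warmup and free-bits.

    Hyperparameter values are illustrative, not values from the post.
    """
    # Reconstruction term (MSE is a reasonable choice for continuous embeddings).
    recon = F.mse_loss(x_hat, x, reduction="mean")

    # Per-dimension KL between the diagonal Gaussian posterior and N(0, I).
    kl_per_dim = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0)

    # Free-bits: dimensions below the floor incur no extra penalty,
    # which discourages KL collapse on individual latent dimensions.
    kl = torch.clamp(kl_per_dim, min=free_bits).sum(dim=-1).mean()

    # KL warmup: ramp the weight linearly toward a moderate target.
    kl_weight = kl_target * min(1.0, step / warmup_steps)

    return recon + kl_weight * kl
```
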

Common Pitfalls (and Fixes)

| Pitfall | Fix |
| --- | --- |
| KL collapse or degenerate latents | Introduce free-bits and a gradual KL warmup |
| Numerical instability (NaNs) | Back off the learning rate, enable gradient clipping, verify normalization |
| Overfitting to reconstruction | Include retrieval-aware losses or lightweight contrastive terms; validate on retrieval tasks |
| Chasing tiny gains | Once a medium-dimensional setting performs well, extra dimensions rarely pay off |
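
The stability fixes from the table can be bundled into a guarded training step. The sketch below is an assumption-laden example, not the post's implementation: it presumes a model whose forward pass returns `(x_hat, mu, logvar)`, reuses the `vae_loss` sketch from the recipe above, and uses an illustrative `max_grad_norm` value.

```python
import torch
import torch.nn.functional as F

def train_step(model, batch, optimizer, step, max_grad_norm=1.0):
    """One guarded update; assumes model(x) -> (x_hat, mu, logvar)."""
    optimizer.zero_grad()

    # Verify normalization: L2-normalize the input embeddings up front.
    x = F.normalize(batch, dim=-1)
    x_hat, mu, logvar = model(x)
    loss = vae_loss(x, x_hat, mu, logvar, step)  # sketch defined above

    # Skip the update entirely if the loss has already gone non-finite.
    if not torch.isfinite(loss):
        return None

    loss.backward()
    # Gradient clipping keeps occasional large KL or contrastive gradients
    # from destabilizing training; back off the learning rate if clipping
    # alone is not enough.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    return loss.item()
```
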

Closing Thought

For embedding compression, VAEs with carefully scheduled KL and light contrastive shaping repeatedly offered the strongest blend of compactness, stability, and downstream retrieval quality. Keep the objective balanced, the regularization principled, and the evaluation focused on the end task—not just reconstruction loss.

