Tag: Pretraining

All the articles with the tag "Pretraining".

A Looped Transformer Router Shows Its First Replicated Gain

6 Jul, 2026 · 10 min read

A small-budget BPE language-model experiment where a sparse late-final-loop token-feedback router becomes the first route-looped Transformer candidate to beat matched fixed-loop baselines across several controlled checks.

When a Looped Transformer Router Almost Works

1 Jul, 2026 · 12 min read

A controlled small-scale language-composition experiment comparing fixed and routed looped Transformers, showing why the first sequence-level router was competitive but collapsed toward a weak exit policy.

How to Test Pretraining Ideas at Small Scale Before Betting on a Large Model

29 Jun, 2026 · 25 min read

A practical guide to validating pretraining improvements with small proxy models, scaling ladders, isoFLOP budgets, loss curves, downstream evals, and rank-correlation checks before committing to an expensive large-model run.

Why Embeddings Cannot Solve Eval-Set Contamination

29 Jun, 2026 · 11 min read

A technical deep dive on why semantic embedding search is useful but insufficient for eval-set decontamination: leakage is about evaluation advantage, not just text similarity.

Pretraining Contamination: Why "Don't Train on the Test Set" Became Hard

29 Jun, 2026 · 14 min read

A practical introduction to LLM pretraining contamination: why benchmark leakage is not ordinary deduplication, how public evals leak into web-scale corpora, and how layered decontamination pipelines reduce risk.
