██████╗ ██████╗ ██╗ ██╗ ██████╗
██╔══██╗ ██╔═══██╗ ██║ ██║ ██╔════╝
██║ ██║ ██║ ██║ ██║ ██║ ██║ ███╗
██║ ██║ ██║ ██║ ██║ ██║ ██║ ██║
██████╔╝ ╚██████╔╝ ╚██████╔╝ ╚██████╔╝
╚═════╝ ╚═════╝ ╚═════╝ ╚═════╝ _
%
% ./robot
- Why Continuous Behavioral Cloning Fails Exponentially (and How Robotics Gets Away With It)
  Behavioral cloning has a well-known failure mode: small errors compound, and the policy drifts into states the expert never visited. The standard analysis bounds this drift as quadratic in the horizon — bad, but manageable. That bound assumes discrete actions. When actions are continuous, we show the picture is fundamentally worse: error grows exponentially in the horizon. Yet methods like ACT...
  robotics · imitation-learning · Apr 16, 2026 · 11 min read
- Pushing T with Tokens
  When a robot sees the same scene twice, it shouldn't always do the same thing — sometimes pushing left is just as good as pushing right. This is the central challenge of behavior cloning: expert demonstrations contain multiple valid modes of behavior, and naive imitation averages over them, producing actions that belong to none. The dominant solutions either discretize the action space or denoise...
  robotics · imitation-learning · Apr 02, 2026 · 9 min read
- Why Tokens Are Enough
  Modern language models don't train on text — a tokenizer chops raw text into chunks, and the model only ever sees those chunks. This indirection raises two natural questions. First: what does tokenization lose? A language model is a distribution over strings, but we're learning a distribution over token sequences — does this restrict what we can express? Second: what does tokenization add? Even...
  tokenization · information-theory · Mar 16, 2026 · 9 min read
- Hidden Variance Reduction in Diffusion Loss
  The variational perspective formulates diffusion models as latent variable models (LVMs) trained by maximizing the evidence lower bound (ELBO). However, standard derivations of the diffusion ELBO often rely on lengthy algebraic manipulation to arrive at the final objective function. Why do we go through the trouble of transforming a simple expectation into a complex sum of KL divergences? This...
  diffusion · variance-reduction · Dec 06, 2025 · 10 min read