
I think your per-axis std normalization is likely doing a lot of the work. It's fairly well known that "wrong" PCA, setting sigma=Id or just taking its square root, gives better embeddings than the un-normalized version. It would be worth showing a comparison against similarly normalized PCA, if that's not too hard.
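
To make the variants concrete, here is a rough NumPy sketch of the three PCA embeddings being compared. The function name `pca_embeddings` and the toy data are hypothetical, and reading "taking a square root" as scaling by the square root of the singular values is an assumption about what is meant, not something taken from the post.

```python
import numpy as np

def pca_embeddings(X, k):
    """Return raw, sqrt-scaled, and whitened k-dim PCA embeddings of X (n x d)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    U, S = U[:, :k], S[:k]                       # keep the top-k components
    raw = U * S                                  # standard PCA scores
    sqrt_scaled = U * np.sqrt(S)                 # assumed "square root" variant
    whitened = raw / raw.std(axis=0, ddof=1)     # per-axis std normalization (sigma = Id up to scale)
    return raw, sqrt_scaled, whitened

# Toy anisotropic data (hypothetical): feature scales differ by two orders of magnitude.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8)) * np.array([10, 5, 2, 1, 1, 0.5, 0.2, 0.1])
raw, sqrt_scaled, whitened = pca_embeddings(X, k=4)
```

Feeding `whitened` (or `sqrt_scaled`) into the same downstream evaluation as the autoencoder's latent codes would give the similarly normalized baseline.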


Good catch, this is the obvious ablation I should have included. I'll re-run with per-axis normalized PCA as a separate baseline and post numbers in this thread tomorrow. Prior: I expect some of the gap to come from normalization, but not all. The no-improvement results on isotropic datasets (§4) suggest there's structural signal in the polynomial cross-terms that a linear projection can't capture. But that's a prediction; let me actually run it.


Just checked the normalization point. You were partially right: sqrt-normalization halves the difference. I'm updating the numbers in the post. One interesting thing: I ran a smoke test of poly-AE without whitening, and the result didn't change. I won't mention it in the post because right now I'm not sure whether it's a random effect or the polynomial lift really compensates for normalization.



