Hacker News

This method was frequently used prior to the ubiquity of dummy tokens. XLNet was the paper that introduced me to this idea. I believe it’s been in PyTorch since 2019/2020. I would not be surprised if someone finds an earlier reference.
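The comment doesn't name the exact mechanism, but the "dummy token" idea it alludes to is usually implemented by appending one fixed zero logit (equivalently, a zero key/value "token", as PyTorch's `add_zero_attn=True` option on `torch.nn.MultiheadAttention` does) so the attention weights can sum to less than 1. A minimal sketch of that equivalence, assuming this reading:

```python
import math

def softmax(logits):
    """Standard softmax with the usual max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_with_dummy(logits):
    """Softmax over the real logits plus one fixed dummy logit of 0.

    Dropping the dummy's weight leaves exp(x_i) / (1 + sum_j exp(x_j)),
    so the remaining weights can sum to less than 1 and a head can
    effectively attend "nowhere" -- the same effect as appending a
    zero key/value token (e.g. add_zero_attn=True in PyTorch).
    """
    return softmax(list(logits) + [0.0])[:-1]

logits = [2.0, -1.0, 0.5]
plain = softmax(logits)           # sums to 1 (up to float error)
escaped = softmax_with_dummy(logits)  # sums to strictly less than 1
```

The names here are illustrative; the point is only that a dummy zero logit and the "+1 in the softmax denominator" formulation are the same computation.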

I’m surprised by the pompousness in the OP, especially about something most people who do transformer research already understand. I’m also surprised that so many in the replies are taking the position of “this is what research should look like,” when this is clearly an example of why research doesn’t work like this. Peer review is good for many things, and one of those things is saving yourself some embarrassment.



He's not being pompous; people appreciate the informality, straightforwardness, and self-deprecation, which are the opposite of pompous.

You are reading some of the more ambiguous self-deprecation as genuine claims.

TL;DR on why this is important and why he's sharing it: it's a sort of niche thing that really only matters if you're trying to run pale imitations of ChatGPT on constrained hardware. That's why it's entirely possible the big guns didn't see it as important: they're not trying to run LLMs on a 3090.


> I’m surprised by the pompousness in the OP.

He’s writing in a colloquial and self-deprecating and humorous tone. I can’t speak to the merits, but I can follow the reasoning perfectly fine. It’d be hard to find something further from pompous.

> saving yourself some embarrassment

Implying of course that being wrong, or not the first one to discover this, is embarrassing. And that’s not pompous?



