Hacker News

My guess: decoding Base64 is easy because it's a 1:1 mapping between strings. Since it isn't meant to be encryption or obfuscation, there must be huge lookup tables somewhere on the internet that the model can use as Rosetta stones.
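
To make the "1:1 mapping" point concrete, here is a minimal sketch of how Base64 turns each full 3-byte group into 4 characters, independently of the rest of the input. The helper name `encode_group` is made up for illustration, padding of partial groups is left out, and this says nothing about how a language model actually does it internally.

```python
import base64

# Standard Base64 alphabet (RFC 4648): index 0..63 -> character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(chunk: bytes) -> str:
    """Encode one full 3-byte group into 4 Base64 characters (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    n = int.from_bytes(chunk, "big")  # 24 bits
    # Split the 24 bits into four 6-bit indices into the alphabet.
    return "".join(ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0))


text = b"Man is"  # length is a multiple of 3, so no padding is needed
by_hand = "".join(encode_group(text[i:i + 3]) for i in range(0, len(text), 3))

# The hand-rolled version agrees with the standard library, and decoding
# recovers the original bytes exactly.
assert by_hand == base64.b64encode(text).decode("ascii")
assert base64.b64decode(by_hand) == text
```

Because every group is encoded on its own, the same 3 bytes always produce the same 4 characters wherever they appear, which is the kind of regularity that shows up over and over in scraped text.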


The other thing is, it's trained on a lot of HTML, which includes data: URLs that decode to JS and SVG (on which it is also trained). So that one transformation is probably really well baked into the weights by now.

BTW it doesn't just decode it, it also encodes it quite happily - in real time, as it is producing the output, sometimes unprompted. I once had GPT-4, when asked to produce SVG, produce it in the form of an <img src="data:image/svg+xml;base64,..."> - and when I copy-pasted and rendered it, it was a valid SVG file with shapes inside.
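
For anyone who wants to check an output like that, here is a rough sketch of the round trip: wrapping an SVG in a Base64 data: URL inside an img tag, then pulling it back out. The function names and the sample SVG are invented for illustration, and the string slicing assumes a double-quoted src attribute in exactly this form.

```python
import base64


def svg_to_img_tag(svg: str) -> str:
    """Wrap SVG markup in an <img> tag with a Base64 data: URL (illustrative helper)."""
    payload = base64.b64encode(svg.encode("utf-8")).decode("ascii")
    return f'<img src="data:image/svg+xml;base64,{payload}">'


def img_tag_to_svg(tag: str) -> str:
    """Recover the SVG markup from a tag produced above (illustrative helper)."""
    marker = "base64,"
    start = tag.index(marker) + len(marker)
    end = tag.index('"', start)  # the Base64 alphabet never contains a double quote
    return base64.b64decode(tag[start:end]).decode("utf-8")


# Sample SVG made up for this example.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">'
    '<circle cx="10" cy="10" r="8"/></svg>'
)

tag = svg_to_img_tag(svg)
assert img_tag_to_svg(tag) == svg
```

Pasting a model-generated tag into `img_tag_to_svg` is a quick way to see whether the payload really decodes to well-formed SVG or just looks plausible.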



