
Their tokenizer is open source: https://github.com/openai/tiktoken

Data files that contain vocabulary are listed here: https://github.com/openai/tiktoken/blob/9e79899bc248d5313c7d...
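For anyone who wants to poke at one of those vocabulary files directly, here is a minimal sketch using only the standard library. It assumes each line of the file is a base64-encoded token followed by a space and its integer rank, which is my understanding of the `.tiktoken` format. The function name `parse_tiktoken_vocab` and the three-line sample are hypothetical and are not taken from the repository or from a real vocabulary file.

```python
import base64


def parse_tiktoken_vocab(text: str) -> dict[bytes, int]:
    """Parse vocabulary text, assuming lines of the form '<base64 token> <rank>'."""
    ranks: dict[bytes, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        # Tokens are raw bytes, not necessarily valid UTF-8, so keep them as bytes.
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


# Hypothetical sample for illustration only (tokens: b"hello", b" world", b"!").
SAMPLE = "aGVsbG8= 0\nIHdvcmxk 1\nIQ== 2\n"

vocab = parse_tiktoken_vocab(SAMPLE)
for token, rank in sorted(vocab.items(), key=lambda kv: kv[1]):
    print(rank, token)
```

To try it on a real file, read the downloaded file's contents as text and pass them to the function instead of `SAMPLE`.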


