
Very cool! At Airtrain we’ve also found that embeddings can be very valuable for building classification models. If you want to experiment with a large amount of text and embeddings, we recently deduplicated and embedded all of fineweb-edu (also mentioned in the article) and published the resulting dataset on Hugging Face: https://huggingface.co/datasets/airtrain-ai/fineweb-edu-fort...
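
For anyone wondering what "classification on top of embeddings" can look like, here is a minimal sketch. It assumes tiny hand-made 3-dimensional vectors standing in for real embeddings, made-up labels ("educational" / "other"), and a nearest-centroid classifier picked purely for illustration; none of this is taken from the dataset above or from Airtrain's actual pipeline.

```python
import math

def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def fit(embeddings, labels):
    """Group embeddings by label and store one centroid per label."""
    by_label = {}
    for vec, label in zip(embeddings, labels):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(centroids, vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

# Toy stand-ins for real embeddings (hypothetical values and labels).
train_embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
train_labels = ["educational", "educational", "other", "other"]

centroids = fit(train_embeddings, train_labels)
print(predict(centroids, [0.7, 0.3, 0.1]))
```

With real embeddings you would likely swap the centroid step for something like logistic regression, but the shape of the approach stays the same: embed once, then train a small, cheap model on the vectors.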

