
The quality of the embeddings is a limiting factor for this sort of search. OpenAI's ada-002 embeddings are great, but using them removes the local aspect, and the better Hugging Face models are too big. With model sizes increasing, it's hard to see what the path will be for local/offline.


There are plenty of great embedding models on the order of a few hundred megabytes (some even outperform ada-002). See the leaderboard here: https://huggingface.co/spaces/mteb/leaderboard. Local/offline is only growing.


Wow, gte-small feels like a pretty great balance of size and quality (all-MiniLM-L6-v2 has been my go-to).
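
For context, the search step is the same whichever local model you pick: once you have vectors, ranking is just cosine similarity. A rough sketch in plain Python, assuming the embeddings are already computed. The helper names (`cosine`, `top_k`) and the toy 3-dimensional vectors are made up for illustration and are not output from any real model; in practice the vectors would come from something like gte-small or all-MiniLM-L6-v2.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k (doc_id, score) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy vectors standing in for real embeddings (hypothetical values).
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.7, 0.7, 0.0],
    "c": [0.0, 0.0, 1.0],
}
results = top_k([1.0, 0.1, 0.0], docs, k=2)
```

A linear scan like this is fine for a few thousand documents. Swapping in a different model only changes how the vectors are produced, not the ranking code.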



