
I'm trying to understand the clustering code but not doing too well.

https://github.com/simonw/llm-cluster/blob/main/llm_cluster....

So does this take each row from the DB, convert it to a numpy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels? And then it adds them to a dictionary and prints that to the console?



Yes - it uses the implementation of MiniBatchKMeans provided by the scikit-learn library.

(I'd call this an "algorithm" rather than a "model" - it doesn't have any model weights learnt from a training dataset)

For more details, see the pages in its user guide describing:

* the K-Means algorithm: https://scikit-learn.org/stable/modules/clustering.html#k-me...

* the Mini Batch variant of k-means: https://scikit-learn.org/stable/modules/clustering.html#mini...
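
To make the flow described in the question concrete, here is a minimal sketch of that kind of pipeline. It is not the actual llm-cluster code: the `rows` data is a made-up stand-in for (id, embedding) pairs read from the DB, and the cluster count and `random_state` are arbitrary choices for illustration. Only `MiniBatchKMeans` and its `fit_predict` method come from scikit-learn.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for rows read from the DB: (id, embedding vector) pairs.
# The vectors are drawn around three well-separated centres so there is
# something to cluster.
rng = np.random.default_rng(0)
rows = [(f"item-{i}", rng.normal(loc=(i % 3) * 10.0, size=8)) for i in range(30)]

ids = [row_id for row_id, _ in rows]
X = np.array([vec for _, vec in rows])  # one row per item, shape (30, 8)

# Run the clustering algorithm; fit_predict returns one integer label per row.
km = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)
labels = km.fit_predict(X)

# Group item ids by their assigned cluster label.
clusters = defaultdict(list)
for row_id, label in zip(ids, labels):
    clusters[int(label)].append(row_id)

# Print each cluster and its members.
for label, members in sorted(clusters.items()):
    print(label, members)
```

The label numbers themselves carry no meaning; they only say which items ended up in the same group.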



