I'm trying to understand the clustering code but not doing too well. https://git... | Hacker News


politelemon on Oct 25, 2023 | on: Embeddings: What they are and why they matter

I'm trying to understand the clustering code but not doing too well. https://github.com/simonw/llm-cluster/blob/main/llm_cluster....

So does this take each row from the DB, convert it to a numpy array (?), then use an existing model called MiniBatchKMeans (?) to go over that array and generate a bunch of labels? Then add them to a dictionary and print to the console?

jamessb on Oct 25, 2023

Yes - it uses the implementation of MiniBatchKMeans provided by the scikit-learn library.

(I'd call this an "algorithm" rather than a "model" - it doesn't have any model weights learnt from a training dataset)

For more details, see the pages in its user guide describing:

* the K-Means algorithm: https://scikit-learn.org/stable/modules/clustering.html#k-me...

* the Mini Batch variant of k-means: https://scikit-learn.org/stable/modules/clustering.html#mini...
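
To make the steps from the question concrete, here is a minimal sketch of that kind of pipeline. It is not the actual llm-cluster code: the `rows` list is a made-up stand-in for whatever comes out of the database, `cluster_rows` is a hypothetical helper name, and the `n_init` and `random_state` values are arbitrary choices for the example. The only library calls it relies on are `numpy.array` and scikit-learn's `MiniBatchKMeans` with its `fit` method and `labels_` attribute.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical stand-in for rows read from the database: (id, embedding vector).
# Real embeddings would have hundreds of dimensions; two are used here for brevity.
rows = [
    ("a", [0.0, 0.1]),
    ("b", [0.1, 0.0]),
    ("c", [0.05, 0.05]),
    ("d", [10.0, 10.1]),
    ("e", [10.1, 10.0]),
    ("f", [9.95, 10.05]),
]


def cluster_rows(rows, n_clusters=2):
    """Group row ids by the cluster label that MiniBatchKMeans assigns them."""
    ids = [row_id for row_id, _ in rows]
    # Stack the vectors into one 2-D array of shape (n_rows, n_dimensions).
    vectors = np.array([vector for _, vector in rows], dtype=np.float32)

    # Nothing is pre-trained: the centroids are computed from `vectors` itself.
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, n_init=3, random_state=0)
    kmeans.fit(vectors)

    # labels_ holds one integer cluster index per input row, in input order.
    clusters = {}
    for row_id, label in zip(ids, kmeans.labels_):
        clusters.setdefault(int(label), []).append(row_id)
    return clusters


if __name__ == "__main__":
    print(cluster_rows(rows))
```

The cluster numbers themselves are arbitrary (which group ends up as 0 or 1 can vary), so only the grouping of ids is meaningful.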
