Very cool. I jumped in here thinking it was going to be something else, though: a packaged service for distributing on-prem model serving across multiple GPUs.
I'm basically imagining a vast.ai-style deployment of an on-prem GPT: assuming most of the infra is consumer GPUs on consumer devices, you could run the "company cluster" as the combined compute of all the company's machines.
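Something like this minimal sketch is what I have in mind (all names are hypothetical, just illustrating the idea of a coordinator pooling employee machines and sharding a model across them by VRAM):

```python
# Hypothetical sketch: each employee machine runs a small agent that
# registers its GPU with a company-wide coordinator, which then splits
# model layers across whatever consumer hardware happens to be online.
from dataclasses import dataclass, field

@dataclass
class Node:
    host: str
    vram_gb: int

@dataclass
class ClusterCoordinator:
    nodes: list[Node] = field(default_factory=list)

    def register(self, node: Node) -> None:
        # A machine joins the "company cluster" when it comes online.
        self.nodes.append(node)

    def plan_layers(self, total_layers: int) -> dict[str, range]:
        # Naive split: assign contiguous layer ranges proportional to
        # each node's VRAM; the last node picks up any rounding remainder.
        total_vram = sum(n.vram_gb for n in self.nodes)
        plan, start = {}, 0
        for i, n in enumerate(self.nodes):
            if i == len(self.nodes) - 1:
                end = total_layers
            else:
                end = start + round(total_layers * n.vram_gb / total_vram)
            plan[n.host] = range(start, end)
            start = end
        return plan

coordinator = ClusterCoordinator()
coordinator.register(Node("alice-desktop", vram_gb=12))
coordinator.register(Node("bob-laptop", vram_gb=8))
print(coordinator.plan_layers(total_layers=32))
```

Obviously the hard parts (discovery, machines dropping off mid-inference, moving activations over the office network) are what a real packaged service would have to solve.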
Let me know when you open-source it; I think there's a place for this, and we could probably integrate it as a plugin pretty easily into the LlamaFarm framework :)