Just a side note here: it seems that these are spot instances, whereas all our VMs are non-interruptible. So there is a bit of a difference (e.g. you probably wouldn't do a weeklong Blender render on a GCP VM if it could be interrupted and lose your work, whereas you can definitely run it on TensorDock because our VMs are reserved).
Of course, you can set up data checkpointing to save your data, but overall, it is a bit of an extra hassle to run on spot/interruptible instances, and if you do get interrupted, you are wasting valuable time waiting for stock to free up again.
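For anyone wondering what the checkpointing mentioned above might look like, here is a minimal Python sketch. It assumes the job can be split into independent units (frames, in the render example) and that progress fits in a small JSON file on a disk that survives the interruption. The names `save_checkpoint`, `load_checkpoint`, `render_frames`, and the `stop_after` parameter are all made up for illustration and are not part of any provider's tooling.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the target.

    The rename is atomic, so an interruption mid-write never leaves
    a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_frame": 0}
    with open(path) as f:
        return json.load(f)


def render_frames(total_frames, checkpoint_path, render_one, stop_after=None):
    """Render frames in order, resuming from the last checkpoint.

    stop_after is a hypothetical knob that simulates an interruption
    after that many frames; a real spot instance would just disappear.
    """
    state = load_checkpoint(checkpoint_path)
    done = 0
    for frame in range(state["next_frame"], total_frames):
        if stop_after is not None and done >= stop_after:
            break
        render_one(frame)
        state["next_frame"] = frame + 1
        save_checkpoint(checkpoint_path, state)
        done += 1
    return state["next_frame"]
```

When the instance comes back, calling `render_frames` again with the same checkpoint path picks up at the first unrendered frame. It does nothing about the time spent waiting for capacity to return, which is the other half of the hassle.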
Wow, cool! Yes, interruptible can be very cheap... I'll add it to our backlog so we do that instead of idly mining.
I was wondering, do you happen to have an API for listing servers? We're launching a marketplace later in August (https://www.tensordock.com/product-marketplace), and we expect pricing to be really really cheap. Like #1 in industry cheap while retaining.
Interruptible, if we add that, would probably be even less than those prices listed.
It'd be really cool if we could auto-update the availability of our GPU servers through an API so that we can list our servers on your tool as well :)
Didn't realize you made this tool. It's super useful.
Some unsolicited feedback, if you're still actively developing:
- You should consider some of the lesser-known cloud providers (e.g. CoreWeave/Lambda Labs/TensorDock).
- Add information about whether the servers support NVLink/NVSwitch. For example, A100s come in 3 flavors: PCIe without NVLink bridges, PCIe with NVLink bridges, and SXM with NVLink/NVSwitch fabric.
There are hundreds of smaller providers, each with a different API, if they have one at all, so this is not possible for a one-man operation. CloudOptimizer [1] is already the largest cloud comparison tool on the web (12 cloud providers listed).
[1] https://cloudoptimizer.io