So, can somebody in the know speculate about how DeepSeek (or OpenAI, or whoever really) is actually running their API?
If I wanted to run a production-grade service using the full DeepSeek model, with good tokens/sec and the ability to serve concurrent requests, what sort of hardware are we looking at?
Racks and racks of servers (most likely NVIDIA HGX H100/H200 8-GPU servers) connected over at least 100 Gb links (more likely 400 Gb or 800 Gb). The servers alone start at about $350k each. Then you need power, cooling, networking, and a technical team to support it all.
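To see why that class of hardware is the starting point, here's a rough back-of-envelope memory sketch. The ~671B parameter count is DeepSeek-V3/R1's published size; the FP8 weight format, H200 memory figure, and the fraction reserved for KV cache/activations are assumptions for illustration, not vendor specs:

```python
import math

PARAMS_B = 671          # model parameters in billions (DeepSeek-V3/R1)
BYTES_PER_PARAM = 1     # assumes FP8 weights
GPU_MEM_GB = 141        # H200 HBM3e per GPU
GPUS_PER_NODE = 8       # HGX H200 server
KV_FRACTION = 0.3       # assumed share of memory reserved for KV cache/activations

weights_gb = PARAMS_B * BYTES_PER_PARAM
usable_gb = GPU_MEM_GB * GPUS_PER_NODE * (1 - KV_FRACTION)
nodes = math.ceil(weights_gb / usable_gb)

print(f"Weights: ~{weights_gb} GB")
print(f"Usable memory per node: ~{usable_gb:.0f} GB")
print(f"Minimum nodes just to hold one copy of the weights: {nodes}")
```

Under these assumptions a single HGX H200 node can just about hold one FP8 copy of the weights, but that's one replica with limited KV-cache headroom. Serving many concurrent users at good tokens/sec means many such replicas (or multi-node tensor/expert parallelism), which is why the answer is racks, not a box.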