Hacker News
1M Tokens/s: Scaling Qwen 3.5 27B on 96 B200 GPUs with vLLM (medium.com/google-cloud)
3 points by m4r1k 10 days ago