
(author here)

The paper/model/code was just made public today. This may be why no one is talking about it yet.

Regarding whether the size is a hassle: it's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB V100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
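For reference, querying the hosted Inference API is just an HTTP call. A minimal Python sketch is below; the model id and token are placeholders, not the real ones, so substitute the actual model id from the Hub and your own API token:

    import requests

    # Placeholders: replace with the real model id and your own token.
    API_URL = "https://api-inference.huggingface.co/models/<model-id-on-the-hub>"
    headers = {"Authorization": "Bearer <your-api-token>"}

    def query(prompt):
        # POST a JSON payload with an "inputs" field; the API returns
        # the generated text as JSON.
        response = requests.post(API_URL, headers=headers, json={"inputs": prompt})
        response.raise_for_status()
        return response.json()

    print(query("Who is better, you or GPT-3?"))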



Do you have (rough) numbers for inference latency on 4x 32GB v100?


(author here)

I don't have exact numbers for latency, but the inference widget is currently running on a TPU v3-8 (which, if I am not mistaken, can roughly be compared to a cluster of 8 V100s). That gives you a rough idea of the latency for short inputs.

Note that a colleague just reminded me that it is possible to run inference for T5-11B (which is the size we use) on a single (big) GPU with enough CPU RAM, using offloading -> https://github.com/huggingface/transformers/issues/9996#issu...
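The linked issue has the details. As a rough sketch of what offloading looks like with a recent transformers + accelerate stack (using the public "t5-11b" checkpoint as a stand-in for the actual model on the Hub), something along these lines should work:

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # "t5-11b" is a stand-in; requires accelerate to be installed for device_map.
    tokenizer = AutoTokenizer.from_pretrained("t5-11b")
    model = AutoModelForSeq2SeqLM.from_pretrained(
        "t5-11b",
        device_map="auto",         # put what fits on the GPU, the rest on CPU
        offload_folder="offload",  # spill any remaining weights to disk
    )

    inputs = tokenizer("Who is better, you or GPT-3?", return_tensors="pt").to("cuda:0")
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With device_map="auto", whatever fits stays on the GPU and the rest is kept in CPU RAM (or on disk via offload_folder), so a single card plus plenty of host memory is enough, at the cost of slower generation.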


On the topic of GPT-3, I asked your creation:

"Who is better, you or GPT-3?"

> GPT-3


It somehow picked up modesty.


Can this be used to generate prose at length? Or Reddit comment replies?


While in theory it could, the nature of its training favors shorter, more factual replies.



