The paper/model/code was just made public today. This may be why no one is talking about it yet.
Regarding whether the size is a hassle: it's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB V100 GPUs. Hugging Face also has an Inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
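If you want to poke at the Inference API programmatically, a minimal sketch looks like the following (the model ID and token are placeholders, not the actual model being discussed; you'd substitute your own):

    import requests

    # Placeholder model ID and token -- substitute real values.
    API_URL = "https://api-inference.huggingface.co/models/MODEL_ID"
    HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

    def query(payload):
        # POST the input payload to the hosted model and return the JSON result.
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        response.raise_for_status()
        return response.json()

    print(query({"inputs": "Hello, world!"}))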
I don't have exact latency numbers, but the inference widget currently runs on a TPU v3-8 (which, if I'm not mistaken, is roughly comparable to a cluster of 8 V100s), so playing with it gives you a rough idea of the latency for short inputs.