People expect to wait a few seconds when calling LLMs; just make the wait obvious to users. Our GPT-4 powered app has several thousand paying users, and "slowness" is very rarely a complaint.
"instead of making me sit watching words load one by one"
Huh? That's entirely up to how you implement your application. Streaming mode isn't even on by default.
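
For what it's worth, here's a minimal sketch with the OpenAI Python SDK (v1.x) showing both modes; the model name and prompt are placeholders. The blocking call is the default, and you only see words arrive one by one if you explicitly pass stream=True:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Default behaviour: the call blocks and returns the full completion at once.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(resp.choices[0].message.content)

    # Opt-in: only with stream=True does the client yield tokens as generated.
    stream = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Say hello."}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks (e.g. the final one) carry no content
            print(delta, end="", flush=True)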
"instead of making me sit watching words load one by one"
Huh? This is completely up to you on how you implement your application. Streaming mode isn't even on by default.