I got an error when passing a prompt of about 20k tokens to the Llama 4 Scout model on Groq, even though Llama 4 Scout advertises a context window of up to 10M tokens. Groq responds with a `POST https://api.groq.com/openai/v1/chat/completions 413 (Payload Too Large)` error.
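For reference, here's a minimal sketch of the kind of call that fails for me, using the OpenAI-compatible Python SDK pointed at Groq's endpoint. The model ID and the filler prompt are illustrative, not the exact ones from my app:

```python
# Minimal reproduction sketch: a large prompt sent to Groq's
# OpenAI-compatible chat completions endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="YOUR_GROQ_API_KEY",  # placeholder
)

long_prompt = "word " * 20_000  # roughly 20k tokens of filler text

try:
    resp = client.chat.completions.create(
        # Model ID as I understand Groq lists it; adjust if yours differs
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{"role": "user", "content": long_prompt}],
    )
    print(resp.choices[0].message.content)
except Exception as e:
    # The server rejects the request body before inference runs:
    # 413 Payload Too Large
    print(e)
```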
Is there some technical limitation on context window size with LPUs, or is this a temporary stop-gap measure to avoid overloading Groq's resources? Or something else?