> Mayybe for some things you could set it up so that the screen output is livestreamed back into the agent, but I highly doubt that anyone is doing that for agents like this yet
What do you mean by streaming? LLMs aren’t that advanced yet where they can consume a live video feed but people have been feeding them screenshots from Playwright and desktop apps for years (Anthropic even released the Computer Use feature based on this).
Gemini has the best visual intelligence but all three of the major models have supported this for a while. I don’t think it’d help with fixing subtle problems in shadows but it can fix other gui bugs using visual feedback.
What do you mean by streaming? LLMs aren’t that advanced yet where they can consume a live video feed but people have been feeding them screenshots from Playwright and desktop apps for years (Anthropic even released the Computer Use feature based on this).
Gemini has the best visual intelligence but all three of the major models have supported this for a while. I don’t think it’d help with fixing subtle problems in shadows but it can fix other gui bugs using visual feedback.