My main gripe - and I cannot sufficiently emphasise how fundamental this is - is that, just as with Comet, this does not seem to be an actual integration of LLM agency into the browser. It is using LLMs to pretend to be humans doing things with a browser as a human would, using [a limited subset of] its UI - the least powerful way to use the web, which is the source of the problem to begin with.
I set it the same simple task as I set Comet: examine the code responsible for choosing and displaying caption languages, with a view to enhancing it to display captions in multiple languages in parallel. This is a high-school exercise, except for how atrocious the HTML can be on any given media-player page.
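To give a sense of scale, here is a minimal sketch of the core of that task, assuming the captions are already available as per-language cue lists. The `Cue` and `CaptionTrack` shapes, the `activeCaptions` function and the demo data are all hypothetical names invented for illustration; they only mirror the fields of the DOM's `TextTrack` and `VTTCue` objects that matter here, and they are not taken from any particular player.

```typescript
// Hypothetical, simplified shapes (assumptions for this sketch). They mirror
// the relevant fields of the DOM's TextTrack and VTTCue objects.
interface Cue {
  startTime: number; // seconds
  endTime: number;   // seconds
  text: string;
}

interface CaptionTrack {
  language: string; // e.g. "en", "de"
  cues: Cue[];
}

// Return one caption line per requested language for the given playback time.
// Languages with no matching track or no active cue are skipped.
function activeCaptions(
  tracks: CaptionTrack[],
  languages: string[],
  time: number,
): string[] {
  const lines: string[] = [];
  for (const lang of languages) {
    const track = tracks.find((t) => t.language === lang);
    if (!track) continue;
    // A cue is active from startTime (inclusive) to endTime (exclusive).
    const active = track.cues
      .filter((c) => c.startTime <= time && time < c.endTime)
      .map((c) => c.text);
    if (active.length > 0) lines.push(`[${lang}] ${active.join(" ")}`);
  }
  return lines;
}

// Example data, invented for illustration.
const demoTracks: CaptionTrack[] = [
  { language: "en", cues: [{ startTime: 0, endTime: 2, text: "Hello" }] },
  { language: "de", cues: [{ startTime: 0, endTime: 2, text: "Hallo" }] },
  { language: "fr", cues: [{ startTime: 3, endTime: 5, text: "Bonjour" }] },
];

// Show English and German side by side at t = 1s.
console.log(activeCaptions(demoTracks, ["en", "de"], 1).join("\n"));
```

On a page that uses standard `<track>` elements, the same idea would probably mean setting each wanted track's `mode` to `"hidden"`, listening for `cuechange`, and rendering `activeCues` into your own overlay. Players that roll their own caption rendering are where the atrocious HTML comes in.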
BrowserOS, like Comet, pretends it isn't capable of actually working with the DOM. This is because bad - or, at best, unambitious - design choices have been made as to how the LLM interfaces with the browser.
I even made a local HTTP API so that Comet could use the macOS accessibility API (which, ironically, gives greater scriptable browser control), and it couldn't even make a background HTTP GET request to localhost with AppleScript. I suspect BrowserOS will be similarly crippled, although at least I can in theory fix it, unlike with Perplexity's closed-source, half-assed Chromium bundling.
Can we try a little bit harder, dream a little bigger, and imagine how to train, tune and instruct language models to realistically fight the complexity of web bloat and give the user full programmatic natural-language power to interface with digital services?
I don't believe it would take any more effort than has already been employed.