xebulon's comments

xebulon · 2025-07-19T10:38:00

My main gripe - and I cannot sufficiently emphasise how fundamental this is - is that, just as with Comet, this is not actually the integration of LLM agency into the browser. It's using LLMs to pretend to be humans doing things with a browser as a human would, using [a limited subset of] its UI - the least powerful way to use the web, which is the source of the problem to begin with.

I set it the same simple task as I set Comet: examine the code responsible for choosing language captions and displaying them, with a view to enhancing the functionality to display multiple language captions in parallel. This is a high-school exercise, except for how atrocious the HTML can be for any given media-player page.
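To give a sense of scale, here is a minimal sketch of the sort of change I mean. It assumes the page uses native `<track>` elements, so the captions are exposed through the standard `textTracks` list on the video element. `TrackLike` and `showCaptionsInParallel` are hypothetical names of my own, not anything from BrowserOS or Comet. Many players render captions in their own markup, in which case this would not apply as written.

```typescript
// Hypothetical minimal shape: only the fields of a DOM TextTrack that we touch.
interface TrackLike {
  kind: string;
  language: string;
  mode: "disabled" | "hidden" | "showing";
}

// Show every captions/subtitles track whose language is in `wanted`,
// disable the other captions/subtitles tracks, and leave all other
// track kinds (e.g. metadata) untouched.
// Returns the languages that are now showing.
function showCaptionsInParallel(
  tracks: ArrayLike<TrackLike>,
  wanted: string[]
): string[] {
  const wantedSet = new Set(wanted.map((lang) => lang.toLowerCase()));
  const showing: string[] = [];
  for (const track of Array.from(tracks)) {
    if (track.kind !== "captions" && track.kind !== "subtitles") continue;
    const match = wantedSet.has(track.language.toLowerCase());
    track.mode = match ? "showing" : "disabled";
    if (match) showing.push(track.language);
  }
  return showing;
}
```

On a page with native tracks, something like `showCaptionsInParallel(document.querySelector("video")!.textTracks, ["en", "ja"])` would be the intended call. Whether the two tracks are laid out legibly is then up to the browser's own caption rendering.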

BrowserOS, like Comet, pretends it isn't capable of actually working with the DOM. This is because bad - or, at best, unambitious - design choices have been made as to how the LLM interfaces with the browser.

I even made a local HTTP API for Comet to use the macOS accessibility API (which ironically gives greater scriptable browser control) and it couldn't even make a background HTTP GET request to localhost with AppleScript. I suspect BrowserOS will be similarly crippled, although at least I can in theory fix it, unlike with Perplexity's closed-source half-assed Chromium bundling.

Can we try a little bit harder, dream a little bigger, imagine how to train, tune and instruct language models to realistically fight the complexity of web bloat and give the user full programmatic natural-language power to interface with digital services?

I don't believe it would take any more effort than has already been employed.