I think the more generic stream concept is interesting, but their proposal is based on different underlying assumptions.
From the look of it, they want their streams to be compatible with AsyncIterator so they'd fit into the existing ecosystem of iterators.
And I believe the Uint8Array is there to match OS streams, as those tend to move batches of bytes without any knowledge of the data inside. It's probably not intended as an entirely new concept of a stream, but as something that C/C++, or another language providing functionality for JS, can implement underneath.
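A minimal sketch of how I read that (this is my interpretation, not their actual API): the stream is just an async iterable of Uint8Array batches, and the consumer never looks inside them.

```typescript
// Sketch: a byte stream as an AsyncIterable of Uint8Array batches,
// so anything that speaks async iteration can consume it.
async function* byteSource(chunks: string[]): AsyncIterable<Uint8Array> {
  const encoder = new TextEncoder();
  for (const chunk of chunks) {
    yield encoder.encode(chunk); // each batch is an opaque run of bytes
  }
}

async function byteCount(stream: AsyncIterable<Uint8Array>): Promise<number> {
  let total = 0;
  for await (const batch of stream) {
    total += batch.byteLength; // the consumer never inspects the contents
  }
  return total;
}
```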
For example, my personal pet project, a graph database written in C, has observers/observables similar to the AsyncIterator streams (except one observable can be listened to by more than one observer), moving batches of Uint8Array (or rather uint8_t* buffers with capacity/count), because that's one of the fastest and easiest things to do in C.
It'd be a lot more work to use anything other than uint8_t* batches for streaming data. What I mean is that any protocol aware of the type information would be built on top of the streams, rather than being part of the stream protocol itself, for this reason.
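To illustrate the layering I mean, here's a hypothetical type-aware protocol built on top of raw byte batches. Newline-delimited JSON is just my stand-in example, not anything from their proposal:

```typescript
// Hypothetical sketch: a type-aware protocol layered on top of opaque
// uint8 batches. The byte layer stays ignorant of the payload; the
// decoding layer reassembles messages that may be split across batches.
async function* jsonLines<T>(bytes: AsyncIterable<Uint8Array>): AsyncIterable<T> {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const batch of bytes) {
    // stream: true keeps multi-byte characters split across batches intact
    buffer += decoder.decode(batch, { stream: true });
    let idx: number;
    while ((idx = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, idx);
      buffer = buffer.slice(idx + 1);
      if (line.trim()) yield JSON.parse(line) as T;
    }
  }
}
```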
Yeah it makes sense to me that the actual network socket is going to move data around in buffers. I'm just offering an abstraction over that so that you can write code that is wholly agnostic to how data is stored.
And yes, because it's a new abstraction, the compat story is interesting. We can easily wrap any source, so we'll have loads of working sources. The fight will be getting official data sinks that support a new kind of stream.
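The wrapping part does seem straightforward. A sketch of adapting a push-style source into the pull-style async-iterable shape (the names and the callback shape here are made up, not from any proposal):

```typescript
// Sketch: adapt a push-style source (callbacks) into AsyncIterable.
// Values emitted before the consumer pulls are buffered in a queue.
function fromCallbacks<T>(
  subscribe: (emit: (value: T) => void, done: () => void) => void
): AsyncIterable<T> {
  return {
    async *[Symbol.asyncIterator]() {
      const queue: T[] = [];
      let finished = false;
      let wake: (() => void) | null = null;
      subscribe(
        (value) => { queue.push(value); wake?.(); },
        () => { finished = true; wake?.(); }
      );
      while (true) {
        while (queue.length > 0) yield queue.shift()!;
        if (finished) return;
        // Park until the source emits or completes.
        await new Promise<void>((resolve) => { wake = resolve; });
        wake = null;
      }
    },
  };
}
```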
I've been wondering too what the solution would be. If the bots were actually helpful, I wouldn't care, but they always push an agenda, create noise, or derail discussions instead.
For now, maybe all forums should require some bloody swearing in each comment, to at least prove you've got some damn human-born annoyance in you? It might even work against the big players for a little bit, because they have an incentive to keep their LLMs from swearing. The monetary reward is, after all, in sounding professional.
Easy enough for any group to overcome, of course, but at least it'd be amusing for a while: watching the swear farms getting set up in lower-paid countries, the mistakes made by large companies using the "swearing enabled" models, and all that.
I'm quite sure I've read your article before, and I've thought about this one a lot. Not so much from a Git perspective, but about textual representation still being the "golden source" for what the program is when interpreted or compiled.
Of course text is so universal and allows for so many ways of editing that it's hard to give up. On the other hand, while text is great for input, it comes with overhead and core issues (most are already in the article, but I'm writing them down anyway):
1. Substitutions, such as renaming a symbol, where ensuring the correctness of the operation pretty much requires parsing the text into a graph representation first, or else giving up the guarantee of correctness and performing a plain text search/replace.
2. Alternative representations requiring full and correct re-parsing such as:
- overview of flow across functions
- viewing graph-based data structures, of which there tend to be many in a larger application
- the imports graph, and so on...
3. Querying for structurally equivalent patterns is hard when they have multiple equivalent textual representations, and search in general is somewhat limited.
4. Merging changes and computing diffs come with fewer guarantees than merging graphs or trees.
5. Correctness checks, such as detecting cyclic imports or ensuring the validity of the program itself, all happen at build time, unless the IDE maintains what is effectively a duplicate program graph, continuously re-parsed from the changes, that is not equivalent to the eventual execution model.
6. Execution and build speed is also a permanent overhead as applications grow when text is the source. Yes, parsing methods are quite fast these days and the hardware is far better, but having a correct program graph on hand is always faster than parsing, creating and verifying a new one.
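Point 1 above can be shown with a toy example (the snippet and symbol names are mine, purely illustrative): plain-text rename rewrites places a parser would know to leave alone.

```typescript
// Toy illustration: renaming `count` to `total` by text manipulation.
const source = `
const count = 1;
const discount = 2;      // contains "count" as a substring
console.log("count:", count);
`;

// Naive search/replace mangles `discount` into `distotal`:
const naive = source.split("count").join("total");

// Even a word-boundary regex still rewrites the string literal
// "count:", which only a parsed representation could reliably skip:
const withBoundary = source.replace(/\bcount\b/g, "total");
```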
I think input as text is a must-have to start with no matter what, but what if the parsing step was performed immediately on stop symbols rather than later, and merged with the program graph right away rather than during a separate build step?
Or what if it was like a "staging" step? E.g. write a separate function that gets parsed into the program model immediately, then try executing it, and only later merge it into the main program graph, which can perform all the checks necessary to ensure it remains valid. I think it'd be more difficult to learn, but having these operations, and the program graph as a database, would give so much when it comes to editing, verifying and maintaining more complex programs.
> what if the parsing step was performed immediately on stop symbols rather than later and merged with the program graph immediately rather than during a separate build step?
I think this is the way to go, kinda like on GitHub, where you write Markdown in the comments, but that is only used for input: after that it's merged into the system, all code-like constructs (links, references, images) are resolved, and from then on you interact with the higher-level concept (the rendered comment with links and images).
For programming languages, Unison does this: you write one function at a time in something like a REPL, and functions are saved in a content-addressed database.
> Or what if it was like a "staging" step?
Yes, and I guess it'd have to go even deeper. The system should be able to represent a broken program (in an edited state), so conceptually it has to be something like a structured database for code that separates the user input from the stored semantic representation and the final program.
IDEs like IntelliJ already build a program model like this and incrementally update it as you edit; they just have to work very hard to do it, and that model is imperfect.
There are a million issues to solve with this, though. It's a hard problem.
I think mostly because an LLM is not a "mind". I'm sure there'll be an algorithm that could be considered a "mind" in the future, but a present-day LLM is not it. Not yet.
This is, in my opinion, the greatest weakness of everything LLM-related. If I care about the application I'm writing, and I believe I should if I bother doing it at all, it seems to me that I should want to be precise and concise in describing it. In a way, the code itself serves as a verification mechanism for my thoughts and for whether I understand the domain sufficiently.
English or any other natural language can of course be concise enough, but when being brief they leave much to the imagination. Adding verbosity allows for greater precision, but I think that's exactly what formal languages are for, just as you said.
Although I think it's worth contemplating whether modern programming languages/environments have been insufficient in other ways: whether they're too verbose at times, whether IDEs should be databases first and language parsers second, whether we could add recommendations using far simpler but stricter patterns, given a strongly typed language.
My current gripes are auto-imports STILL not working properly in the most popular IDEs, or an IDE not finding a referenced entity in a file if it's not currently open... LLMs sometimes help with that, but they are extremely slow compared to local cache resolution.
Long term, I think more value lies in directly improving the above, but we shall see. AI will stay around too, of course, but how much relevance it'll have in 10 years' time is anybody's guess. I think it'll become a commodity, the bubble will burst, and after a while we'll only use it when it's sensible. At least until the next generation of AI architecture arrives.
I do like the build of MacBooks and especially the solid casing. Unfortunately I could never get used to macOS, even after 2.5 years, and it was not quite as reliable for me as it is for many others.
Xcode installations failing, a Docker installation failing after an OS update, never to work again without completely reinstalling the OS, plugging in headphones crashing the MacBook (until an OS update 6 months after I got it), video calls slowing to a halt when sharing the screen, etc.
There were also some things I just never got used to on the Mac, like window tabbing & minimize working in the Mac way. Maybe if I hadn't had a personal laptop running Linux at the same time, I would have gotten used to it a little better, but I just plain hated the way it worked.
To be fair, I think it was still more reliable than the varieties of Windows, especially the later ones! If tabbing worked more like it does under Windows and it allowed a bit more configuration, I might be using a Mac these days.
That leaves Linux. Although it's not flawless either, after configuring Debian + i3 it works exactly like I want, and the same installation has been working reliably for 5+ years. However, getting to a setup that just works certainly took several tries and depends on laptop compatibility, so... No ideal choices exist right now, I think. Just luck, and whatever someone is most used to in the end.
I’ve used Macs nearly exclusively for 13 years and have not gotten used to the window tabbing. I just fundamentally don’t think windows of the same application should be grouped together.
I gave it a try on my current codebase out of curiosity. Definitely useful. It worked well and fast, but it renders a lot of duplicates as exports in the Node.js-modules-based codebase. I think it can sometimes be caused by me just being haphazard about re-exporting them, but other times I'm not sure.
E.g. authenticatedMenu() appears 4 times in authenticatedMenu.js; only one of them is imported by 2 different files, and the other 3 are just there alone. There's a single export in the file, and a number of other files import it through an index.js that re-exports several other files too.
In my case I think it'd help if I could disable the duplicates, as they don't really provide any useful information when exploring the codebase.
Also, if there was optionally a way to ignore the files that re-export functions/classes and collapse those paths, it'd make the graph a lot smaller and easier to understand. Maybe it's already something that depgraph does, but the duplicates confuse things, so I'm not sure.
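To make the "collapse those paths" idea concrete, here's a hypothetical sketch of the graph transform I have in mind; depgraph's internals surely differ, and the edge shape here is made up:

```typescript
// Sketch: given import edges, splice out barrel files (like index.js)
// and connect importers straight to the re-exported source files.
type Edge = { from: string; to: string };

function collapseBarrels(edges: Edge[], barrels: Set<string>): Edge[] {
  // Map each barrel to the files it re-exports from.
  const forwards = new Map<string, string[]>();
  for (const e of edges) {
    if (barrels.has(e.from)) {
      const list = forwards.get(e.from) ?? [];
      list.push(e.to);
      forwards.set(e.from, list);
    }
  }
  const result: Edge[] = [];
  for (const e of edges) {
    if (barrels.has(e.from)) continue; // drop barrel -> source edges
    if (barrels.has(e.to)) {
      // Rewire importer -> barrel into importer -> each real source.
      for (const target of forwards.get(e.to) ?? []) {
        result.push({ from: e.from, to: target });
      }
    } else {
      result.push(e);
    }
  }
  return result;
}
```

(This only collapses one level of barrels; chained re-exports would need a transitive pass.)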
> I think it can sometimes be caused by me just being haphazard about re-exporting them, but other times I'm not sure.
I think so too. I guess that's just how your project is structured, and the duplicates may be inevitable.
The graph shows exactly how the project is organized. Right, "duplicates confuse things": this suggests that eliminating the "files that re-export functions/classes", or passing an option (-i) to ignore specific paths, would help. Otherwise, this issue is noted for further analysis.