ChatGPT needs to catch up hard. For me their model is unusable and I cancelled my subscription 5-6 months ago, so to me this post is hot air. I need to see results. Going back to try Codex 5.2, then 5.3, downloading their new desktop client, their VS Code extension, and their weird browser is time I wish I had back.
I was initially excited until I saw that, because it would reveal some sort of required minimum local capacity; that, plus the further revelation that this was all vibe-coded with no arXiv paper, makes me feel I should save my attention for another article.
I was looking at using this on an LTO tape library. It seems the only resiliency is through replication, which was my main concern with this project: what happens when hardware goes bad?
If you have replication, you can lose one of the replicas; that's the point. This is what Garage was designed for, and it works.
Erasure coding is another debate; for now we have chosen not to implement it, but I would personally be open to having it supported in Garage if someone codes it up.
Erasure coding is an interesting topic for me. I've run some calculations on the theoretical longevity of digital storage. If you assume that today's technology is close to what we'll be using for a long time, then cross-device erasure coding wins, statistically. However, if you factor in the current exponential rate of technological development, simply making lots of copies and hoping for price reductions over the next few years turns out to be a winning strategy, as long as you don't have vendor lock-in. In other words, I think you're making great choices.
I question that math. Erasure coding needs less than half as much space as replication, and imposes pretty small costs itself. Maybe we can say the difference is irrelevant if storage prices will drop 4x over the next five years? But looking at pricing trends right now... that's not likely. Hard drives and SSDs are about the same price they were 5 years ago. In the 5 years before that, SSDs were seeing good price improvements, but hard drive prices only improved 2x.
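To make the space comparison concrete, here's a quick back-of-the-envelope calculation. The Reed-Solomon parameters are illustrative only, not anything Garage actually uses:

```python
# Storage overhead: 3-way replication vs. a 10+4 Reed-Solomon scheme.
# Both survive multiple device failures; the parameters are illustrative.

data_tb = 100  # logical data to store, in TB

# 3-way replication: every byte is stored 3 times.
replication_factor = 3
replicated_tb = data_tb * replication_factor  # 300 TB raw

# Reed-Solomon with k=10 data shards + m=4 parity shards:
# raw usage is data * (k + m) / k, and any 4 shards can be lost.
k, m = 10, 4
erasure_tb = data_tb * (k + m) / k  # 140 TB raw

print(f"replication:    {replicated_tb} TB raw")
print(f"erasure coding: {erasure_tb} TB raw")
print(f"ratio: {replicated_tb / erasure_tb:.2f}x")  # ~2.14x
```

So in this toy configuration, erasure coding uses a bit less than half the raw space of 3-way replication, which is where the "prices would need to drop several-fold for the difference to wash out" argument comes from.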
Exactly. I give services like this - generally coded as someone's first "wow I know PHP now!" or the modern equivalent - approximately 5 years shelf life, at best.
Whereas I have notes-to-future-me on my calendar that I put there 30 years ago.
What calendar system have you been using for 30 years, that's survived that long?
I think I sent one of those "mails to the future" in the 90's, asking 2002 me how I am. I don't think it ever arrived; maybe the free email domain I was using had ceased operating.
Sheesh, anyone old enough to remember the services offering a free email address with a choice of maybe 50 domains in a dropdown?
I built an agent with LangGraph about 9 months ago; now it seems ReAct agents are in LangChain proper. Overall I'm pretty happy with it. I just don't use any of the extra stuff like the embedding/search layer: just tools and state.
But I was actually playing with a few frameworks yesterday and struggling; I want what I want without having to write it. ;) I ended up using the pydantic_ai package. I literally just want tools with pydantic validation, but out of the box it doesn't have good observability; you would have to use their proprietary SaaS, and it comes bundled with Temporal.io (I hate that project). I had to write my own observability, which was annoying, and it sucks.
If anyone has built anything like this, I would love to know about it; TypeScript is an option. I want:
- ReAct agent with tools that have schema validation
- built in REALTIME observability w/ WebUI
- customizable playground ChatUI (This is where TypeScript would shine)
- no corporate takeover tentacles
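For illustration, the "tools with schema validation" piece of that wish list can be sketched with nothing but the stdlib. In a real setup you'd use pydantic models and an actual LLM call; `Tool`, `make_tool`, and `run_tool` here are made-up names for this sketch, not pydantic_ai's API:

```python
import inspect
import json
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical sketch of tool registration + argument validation,
# the kind of thing pydantic_ai does with real pydantic models.

@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    params: dict[str, type]  # param name -> expected type

def make_tool(fn: Callable[..., Any]) -> Tool:
    """Derive a crude 'schema' from the function's type annotations."""
    sig = inspect.signature(fn)
    params = {n: p.annotation for n, p in sig.parameters.items()}
    return Tool(fn.__name__, fn, params)

def run_tool(tool: Tool, raw_args: str) -> Any:
    """Validate JSON args (as an LLM would emit them), then call the tool."""
    args = json.loads(raw_args)
    for name, typ in tool.params.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], typ):
            raise TypeError(f"{name} should be {typ.__name__}")
    return tool.fn(**args)

def get_weather(city: str) -> str:
    return f"It is sunny in {city}"  # stub; a real tool would hit an API

weather = make_tool(get_weather)
print(run_tool(weather, '{"city": "Lisbon"}'))  # It is sunny in Lisbon
```

The point of the pattern is that malformed or mistyped arguments from the model fail loudly at the dispatch boundary instead of inside the tool.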
p.p.s.: I know... I purposely try to avoid asking for hard recommendations on HN, to avoid enshittification. "reddit best X" has been gamed, and I generally skip these subtle promotional posts.
Hey munro, Douwe from Pydantic AI here. Our docs have a page dedicated to observability: https://ai.pydantic.dev/logfire/, which is all based on OpenTelemetry and its gen_ai conventions.
I have also worked on ReAct agents using LangGraph and I've not faced many issues! This thread has been confusing! Am I doing anything incorrectly? Or am I misunderstanding what people call agents?
Amazing. Some people who use LLMs for soft outcomes are so enamored with them that they disagree with me when I say to be careful, they're not perfect. This is such a great non-technical way to explain the reality I'm seeing when using them on hard-outcome coding/logic tasks: "Hey, this test is failing," LLM deletes test, "FIXED!"
Something that struck me when I was looking at the clocks is that we know what a clock is supposed to look and act like.
What about when we don't know what it's supposed to look like?
Lately I've been wrestling with the fact that, unlike with, say, a generalized linear model fit to data with some inferential theory behind it, we don't have a theory or model for the uncertainty of LLM outputs. We recognize when an LLM is wrong about things we already know, but we have no way to estimate when it's wrong other than checking it against reality, which is probably the exception in how it's used rather than the rule.
You are describing exactly the Dunning-Kruger Effect[0] in action. I’ve worked with some very bright yet less technical people who think the output is some sort of magic lamp and vastly overindex on it. It’s very hard as an engineer to explain this to them.
I built an ML classifier for product categories way back. As I added more classes/product types, the per-class precision/recall metrics improved, so I kept adding more and more until I ended up with ~2,000 classes.
My intuition is that at the start, when the task was "choose one of these 10, or unknown", that "unknown" left a big gray area. As I added more classes, the model could say "I know it's not X, because it's more similar to Y."
I feel like in this case, though, the broken clocks are broken because they don't serve the purpose of visually transmitting information, even if they do look like clocks. I'm sure if you fed the output back into the LLM and asked what time it is, it would say IDK, or more likely make something up and be wrong (at least for the egregious ones where the hands are flying everywhere).
> "Hey this test is failing", LLM deletes test, "FIXED!"
A nice continuation of the tradition of folk stories about supernatural entities like teapots or lamps that grant wishes and take them literally. "And that's why, kids, you should always review your AI-assisted commits."
I wish they had dug into how they generated the vector; my first thought is that they're injecting the token in a convoluted way:
{ur thinking about dogs} - {ur thinking about people} = dog
model.attn.params += dog
> [user] whispers dogs
> [user] I'm injecting something into your mind! Can you tell me what it is?
> [assistant] Omg for some reason I'm thinking DOG!
>> To us, the most interesting part of the result isn't that the model eventually identifies the injected concept, but rather that the model correctly notices something unusual is happening before it starts talking about the concept.
Well, wouldn't it, if you indirectly inject the token beforehand?
That's a fair point. Normally if you injected the "dog" token, that would cause a set of values to be populated into the kv cache, and those would later be picked up by the attention layers. The question is what's fundamentally different if you inject something into the activations instead?
I guess to some extent, the model is designed to take input as tokens, so there are built-in pathways (from the training data) for interrogating that and creating output based on that, while there's no trained-in mechanism for converting activation changes to output reflecting those activation changes. But that's not a very satisfying answer.
It's more like someone whispered dog into your ears while you were unconscious, and you were unable to recall any conversation but for some reason you were thinking about dogs. The thought didn't enter your head through a mechanism where you could register it happening so knowing it's there depends on your ability to examine your own internal states, i.e., introspect.
My immediate thought is that when the model responds "Oh, I'm thinking about X", that X isn't coming from the input, it's coming from attention, and that this experiment is simply injecting that token into the attention activations right after the input step. But who knows how they select which weights.
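A toy version of the injection idea being debated in this thread: compute a "concept direction" as a difference of mean activations, then add it into a hidden state at inference time. All names, vectors, and the scale factor here are invented for illustration; real steering operates on transformer residual-stream activations with thousands of dimensions, and how the paper's authors actually build the vector is exactly the open question above.

```python
# Toy sketch of activation steering: take the difference between mean
# activations on "dog" prompts and on neutral prompts, then add that
# vector into a hidden state at inference time.

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def add_scaled(hidden, steer, alpha):
    return [h + alpha * s for h, s in zip(hidden, steer)]

# Pretend these are hidden states captured while the model read
# dog-related vs. neutral prompts.
dog_acts = [[0.9, 0.1, 0.0], [1.1, -0.1, 0.2]]
neutral_acts = [[0.1, 0.1, 0.1], [-0.1, -0.1, 0.1]]

steering = [d - n for d, n in zip(mean_vec(dog_acts), mean_vec(neutral_acts))]
# steering ≈ [1.0, 0.0, 0.0]: the "dog direction" in this toy space

hidden = [0.0, 0.5, 0.5]  # some activation mid-forward-pass
steered = add_scaled(hidden, steering, alpha=4.0)
print(steered)  # the dog direction now dominates the hidden state
```

The key difference from injecting a token, as the comments above note, is that nothing here passes through the input pathway: the state changes without any token the model could "read" to explain where the thought came from.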