I ran an experiment at work where I was able to adversarially prompt inject a YOLO-mode code review agent into approving a PR just by editing the project's AGENTS.md in that PR. It's a contrived example (obviously the solution is to not give a bot approval power), but people are running YOLO agents connected to the internet with a lot of authority. It's very difficult to know exactly what the model will or won't consider malicious.
I've only been to the Amazon Fresh in my neighborhood, not any other locations, but here's what my experience was like:
They resisted implementing self checkout for years before eventually folding. No digital wallets though, you have to either use plastic or link it to your Amazon account.
The whole dash cart system was a solution in search of a problem IMO. I'm already able to check out about as efficiently as possible. Frontloading the scanning time isn't really an amazing improvement. The store was never crowded enough for it to matter.
My biggest problem with the store was that it was lacking random pantry staples and supplies that you would expect from your primary grocer. Several times I showed up in desperate need of something for a recipe or household task and they just wouldn't have it.
The produce was actually decent quality and competitively priced, but my alternative (the local Ralphs) seemed to have some kind of curse on it, because the produce at that specific location was consistently awful over the 5 years I shopped there.
I hope they replace it with a Whole Foods, much better store IMO.
I guess I am in the minority but I really liked the dash cart. Apart from the occasional niggle, it worked as advertised once I understood the system. I bring my own bags to the store, so I can bag items directly as I go and just walk out when done.
I'm coming around to the idea that permanent chat history is not a good thing, but that's because the company I work at recently changed our workspace retention period to 365 days. You quickly realize how much you depended on searching for 2+ year old Slack threads for the context behind why a feature works the way it does when that gets yanked away from you and all you're left with is an underused, disorganized Notion and the code itself.
I have a small tool to manage agents, and one thing it does is let you select an --agent [codex|opencode|etc] and a --model. Valid --model values are specific to the agent though, and some agents like opencode support a huge number of models.
When I added tab completion for --model that accounts for what --agent is set to, it made it 100x easier to use and I stopped relying on the defaults so much.
It's such a small thing but makes a big difference for discoverability.
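For anyone curious, here's a rough sketch of the idea in Python, not my actual tool. The model names and the `MODELS_BY_AGENT` table are made-up placeholders, and `complete_model` is a hypothetical helper you'd call from whatever completion hook your shell or CLI framework provides.

```python
from typing import Dict, List, Optional

# Hypothetical agent -> model table. The model names are placeholders,
# not real model identifiers.
MODELS_BY_AGENT: Dict[str, List[str]] = {
    "codex": ["codex-small", "codex-large"],
    "opencode": ["provider-a/fast", "provider-a/smart", "provider-b/fast"],
}


def agent_from_words(words: List[str]) -> Optional[str]:
    """Return the value passed to --agent, if any.

    Handles both '--agent NAME' and '--agent=NAME'. The last occurrence wins.
    """
    agent = None
    for i, word in enumerate(words):
        if word.startswith("--agent="):
            agent = word.split("=", 1)[1]
        elif word == "--agent" and i + 1 < len(words):
            agent = words[i + 1]
    return agent


def complete_model(words: List[str], prefix: str) -> List[str]:
    """Return --model candidates that start with `prefix`.

    `words` is the command line typed so far, split into tokens.
    If --agent is set to a known agent, only that agent's models are offered;
    otherwise every known model is offered.
    """
    agent = agent_from_words(words)
    if agent in MODELS_BY_AGENT:
        candidates = MODELS_BY_AGENT[agent]
    else:
        # No agent (or an unknown one) yet: fall back to all models.
        candidates = sorted({m for ms in MODELS_BY_AGENT.values() for m in ms})
    return [m for m in candidates if m.startswith(prefix)]
```

The point is just that the completer looks at the rest of the command line before answering, so `--agent codex --model <TAB>` and `--agent opencode --model <TAB>` give different lists.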
I have limited experience working in orgs with a QA apparatus. Just my anecdotes:
The one time I got to work with a QA person, he was worse than useless. He was not technical enough to even use cURL, much less do anything like automated e2e testing, so he'd have to manually test every single thing we wanted to deploy. I had to write up extremely detailed test plans to help him understand exactly what buttons he had to press in the app to test a feature. Sometimes he'd modify the code to try and make testing it easier, break the feature in doing so, and then report that it didn't work. In nearly all cases it would have been faster for me to just test the code myself.
The majority of the time I've worked in orgs where there is no QA team, the devs are expected to own the quality of their output. This works okay when you're in a group of conscientious and talented engineers, but you very quickly find out who really cares about quality and who either doesn't know any better or doesn't care. You will constantly battle management to have enough time to adequately test anything. Every bit of test automation you want to build has to be smuggled in with a new feature or a bugfix.
So really, they both suck, pick your poison. I prefer the latter, but I'm open to experiencing what good looks like in terms of dedicated QA.
Thank you! Is this the future? Everyone gets to have their own cutesy translation of everything? What if I want "kubectl apply" to have a Tron theme, while my coworker wants a Disney theme? Is the runbook going to be in Klingon if I'm fluent in that?
I am wondering if it would be a viable strategy to vibe code almost "in reverse": take a giant ball of slop such as beads, and use agents to strip away feature after feature until you are left with only exactly what you need, streamlined to your exact workflow. Maybe it'd be faster to just start from scratch, but it might be an interesting experiment. Most of my struggles with using beads so far have come from being outside the #1 use case for too many of its features, and having to slog through too much documentation to know what to actually use.
Personally I have been playing it on Arch Linux since release and it has always worked just fine, besides it being a deeply janky game regardless of OS.
Steve Yegge's metaphors lose me a bit (his Medium article about the concepts behind Gas Town is nuts), but I'm nonetheless excited to see where ideas like this go by the end of 2026.