naasking · 2026-02-20T16:50:15 1771606215

IIRC, some researchers are working on mixed AR+diffusion models for this sort of thing.

abeppu · 2026-02-20T17:26:33 1771608393

I think the gap is that if they're building hybrids with _forward_ AR and diffusion, they risk giving up the cool part of diffusion, which is reasoning backward. I may be imposing unreasonable human biases onto this, but I really think it would be interesting to have the model engage with the structure of the text, rather than treating it as just a sequence or an array of tokens. E.g. "I'm going to _ tomorrow." If the _ is not just a token but an expansion in context, which might be a noun phrase, a verb phrase, etc., it could be filled in with "the mall" or "practice guitar". In code, "if (_1) { return _2; }", _1 could be an expression whose type is bool and which makes sense as a check to confirm that some process is finished. I don't care specifically how many tokens either of those is, but I do care that it makes sense in context.
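
A toy sketch of the typed-hole idea, assuming a hypothetical fixed list of candidate fillers, each tagged with a syntactic or type category. The `Candidate` class, the category labels, and `fill` are all made up for illustration; a real model would generate and score fillers rather than enumerate a list. Each hole accepts a set of categories, and a filler may span any number of tokens:

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import product
from string import Template


@dataclass(frozen=True)
class Candidate:
    text: str      # hypothetical filler; may be any number of tokens
    category: str  # hypothetical label, e.g. "NP", "VP", "bool"


def fill(template: str, holes: dict[str, frozenset[str]],
         candidates: list[Candidate]) -> list[str]:
    """Return every filling where each $hole gets a candidate whose category it accepts."""
    names = list(holes)
    # For each hole, keep only fillers of an acceptable category; token count is irrelevant.
    options = [[c.text for c in candidates if c.category in holes[n]] for n in names]
    return [Template(template).substitute(dict(zip(names, combo)))
            for combo in product(*options)]


words = [Candidate("the mall", "NP"), Candidate("practice guitar", "VP"),
         Candidate("quickly", "ADV")]
print(fill("I'm going to $x tomorrow.", {"x": frozenset({"NP", "VP"})}, words))

code = [Candidate("job.is_finished()", "bool"), Candidate("job.result()", "Result"),
        Candidate("42", "int")]
print(fill("if ($cond) { return $value; }",
           {"cond": frozenset({"bool"}), "value": frozenset({"Result"})}, code))
```

Here only "the mall" and "practice guitar" fit the sentence, and only the bool/Result pair fits the code template: the constraint lives on the hole's category, not on a token count.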

naasking · 2026-02-20T21:43:23 1771623803

I was thinking of something like LLaDA, which uses a Transformer to predict the tokens masked by the forward process:

https://arxiv.org/abs/2502.09992
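
A toy sketch of masked-diffusion sampling in that spirit: start fully masked, predict every masked position, and commit only the most confident predictions at each step (loosely modeled on the low-confidence remasking idea in the paper). `toy_predictor`, its target sentence, and its confidence rule are hypothetical stand-ins for the Transformer; this is not LLaDA's actual code.

```python
MASK = "<mask>"
TARGET = ["the", "cat", "sat", "on", "the", "mat"]  # hypothetical "ground truth" for the toy


def toy_predictor(seq):
    """Hypothetical stand-in for the model: {masked position: (token, confidence)}."""
    out = {}
    for i, tok in enumerate(seq):
        if tok == MASK:
            # Made-up confidence: higher when neighbouring positions are already filled.
            known = sum(1 for j in (i - 1, i + 1) if 0 <= j < len(seq) and seq[j] != MASK)
            out[i] = (TARGET[i], 0.5 + 0.25 * known)
    return out


def sample(length, steps, predictor):
    seq = [MASK] * length
    per_step = -(-length // steps)  # ceil(length / steps) tokens committed per step
    history = []
    while MASK in seq:
        preds = predictor(seq)
        # Keep the most confident predictions; everything else stays masked for the next step.
        ranked = sorted(preds.items(), key=lambda kv: (-kv[1][1], kv[0]))
        for i, (tok, _) in ranked[:per_step]:
            seq[i] = tok
        history.append(list(seq))
    return seq, history


final, history = sample(len(TARGET), 3, toy_predictor)
for step in history:
    print(step)
```

Each step re-predicts all remaining masked positions with the full sequence as context, so later decisions can condition on tokens to their right as well as their left.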

naasking · 2026-02-19T15:33:30 1771515210

> Not only has USAID's destruction permanently destroyed US reputation in many places and will be responsible for the deaths of millions, including children, but many US farmers were USAID farmers. 100% of their crop and all of their income was tied to USAID.

I predict that these predictions will mostly not happen.

naasking · 2026-02-18T11:23:39 1771413819

> since the AI is writing the tests it's obvious that they're going to pass

That's not obvious at all if the AI writing the tests is different from the AI writing the code being tested. Put into an adversarial and critical mode, the same model outputs very different results.

t43562 · 2026-02-18T14:43:44 1771425824

IMO the reason neither of them can really write entirely trustworthy tests is that they lack domain knowledge. They write tests based on what the code does, plus whatever they extract from some prompts, rather than on an abstract understanding of what it should do given that it's being used in, e.g., a nuclear power station, a hospital, or for promoting cat videos.

Obviously this is only partially true but it's true enough.

It takes humans quite a long time to learn the external context that lets them write good tests, IMO. We have trouble feeding enough context into AIs to give them equal ability. We're often talking about companies where nobody bothers to write down more than 1/20th of what you need to know to be an effective developer. So you join some place, and 5 years later you might be lucky to know 80% of the context in your limited area, after hundreds of meetings, conversations with people, customer complaints, etc.

naasking · 2026-02-18T16:02:05 1771430525

Yes, some kind of spec is always needed, and if the human programmer only has the spec in their head, then that's going to be a problem, but it's a problem for teams of humans as well.

disiplus · 2026-02-18T13:52:18 1771422738

Even if it's a different session, that can be enough. That said, I've had times where it rewrote tests "because my implementation was now different so the tests needed to be updated", so you even have to prompt it explicitly not to touch the tests.

xmcqdpt2 · 2026-02-18T14:13:53 1771424033

and then verify that it obeyed the prompt!

Someone needs to build an agentic tool that does strict, enforced TDD.

naasking · 2026-02-17T05:09:31 1771304971

Publishing something publicly means readers can do anything they want with the knowledge they gained from it. They just can't redistribute it verbatim due to copyright.

naasking · 2026-02-17T04:08:24 1771301304

Nobody gets to have unbounded information about others. It's weird that you think there should be no privacy constraints.

hdlothia · 2026-02-17T10:05:44 1771322744

Why are you saying unbounded when the discussion is about court proceedings and convictions? There is a clear and consistent boundary here, no one is asking for search logs and round the clock surveillance.

dirasieb · 2026-02-17T07:14:19 1771312459

What if these “others” voluntarily apply for a position where they have regular contact with your children and help take care of them? Is it OK then to be informed whether or not they are a convicted child rapist?

naasking · 2026-02-16T22:48:10 1771282090

Historical record, as one example. We gain considerable value from official records from the past; why would our descendants be any different?

naasking · 2026-02-16T20:06:39 1771272399

Some pretty compelling evidence is history: we had dynamic and interactive web pages 20 years ago that were faster on computers that were an order of magnitude slower.

naasking · 2026-02-14T02:06:09 1771034769

If they're doing this and bothering to interact with tickets at all, presumably they've willingly taken on a duty to the software's quality and all that that entails.

otterley · 2026-02-14T19:07:39 1771096059

Maintaining an open-source software project is frequently a hobby that’s performed as a labor of love. There’s no duty owed to anyone, nor should one be implied by past behavior. The open-source community is not a slave trade.

naasking · 2026-02-14T23:59:26 1771113566

Hence why I said "voluntarily".

otterley · 2026-02-15T00:36:02 1771115762

I don't see how that changes anything (and you didn't say "voluntarily"). Volunteering does not create a duty. One can volunteer to pick up litter and give up halfway through; the only consequence would be disappointment.

naasking · 2026-02-15T20:13:13 1771186393

> Volunteering does not create a duty.

Volunteering to maintain a project literally does entail accepting duties; that's what taking on the role of maintainer means. They are of course free to give up that role at any time, but those duties exist while the role is held.

otterley · 2026-02-15T20:21:24 1771186884

Let's suppose for the sake of the argument that while you are a volunteer, you take on some duty. What is the nature of that duty? And how do we enforce the execution of said duty? What are the consequences of it not being performed?

You can't really say someone has a "duty" without also implying that they have a "responsibility," and thus liability if they fail to execute those duties properly. I don't see how this fits at all for a volunteer. Very few people are going to volunteer for no pay if they're taking on a risk of liability.

Maybe you mean a civic duty? That would make somewhat more sense, but the problem is that there’s no objective standard against which to test performance. It’s completely subjective and will be forever argued—much like this thread. :-)

naasking · 2026-02-12T19:20:02 1770924002

Google is still behind the largest models I'd say, in real world utility. Gemini 3 Pro still has many issues.

naasking · 2026-02-12T16:53:40 1770915220

That doesn't make sense with subscriptions.

subscribed · 2026-02-13T12:31:04 1770985864

It does, £15 Claude Pro licence is 2 hours with a small code base and Serena.