andy12_'s comments | Hacker News

It really isn't sub-N^2. The main attention is O(Nk), but only thanks to a lightning indexer that itself still has O(N^2) complexity. So overall it has the same asymptotic complexity, just with a smaller constant factor [1].

> DSA reduces the core attention complexity of the main model from O(L^2) to O(Lk), where k (<< L) is the number of selected tokens. Although the lightning indexer still has a complexity of O(L^2), it requires much less computation compared with MLA in DeepSeek-V3.1-Terminus

[1] https://arxiv.org/pdf/2512.02556
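A rough back-of-the-envelope sketch of why (the dimensions below are made up for illustration, not DeepSeek's actual kernel costs): model dense attention as ~L^2*d, the indexer as ~L^2*d_idx, and the sparse main attention as ~L*k*d.

    # Illustrative FLOP model; d, d_idx and k are assumed values, not the paper's exact config.
    def attention_cost(L, d=128, d_idx=16, k=2048):
        dense = L * L * d                    # full attention: O(L^2)
        dsa = L * L * d_idx + L * k * d      # lightning indexer + top-k attention: still O(L^2)
        return dense, dsa

    for L in (8_192, 65_536, 131_072):
        dense, dsa = attention_cost(L)
        print(f"L={L}: speedup {dense / dsa:.1f}x")   # tends to d/d_idx = 8x as L grows

As L grows, the indexer's L^2 term dominates, so both curves are quadratic; only the multiplier changes.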


Okay, then let's see whether real linear architectures, like Gated DeltaNet or Mamba-3, show up in some larger models. I don't believe there is a "lower bound" which states that those can never reach (or exceed) the real-world performance of quadratic attention. (Perfect recall on unrealistic needle-in-a-haystack tests doesn't count.)

I'm also sure that some kind of linear architecture is possible. After all, humans don't have N^2 perfect recall either.
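For reference, the simplest version of what "linear" means here is a recurrence over a fixed-size state (plain, ungated linear attention; Gated DeltaNet and Mamba-3 add gating/decay and different state-update rules on top of something like this). A minimal sketch:

    import numpy as np

    def linear_attention(qs, ks, vs):
        # State S accumulates outer products k v^T; total cost is O(L * d^2),
        # i.e. linear in sequence length, vs O(L^2 * d) for softmax attention.
        S = np.zeros((ks.shape[1], vs.shape[1]))
        out = []
        for q, k, v in zip(qs, ks, vs):
            S += np.outer(k, v)
            out.append(q @ S)
        return np.stack(out)

The price is that everything about the past has to fit in that fixed-size state, which is why perfect needle-in-a-haystack recall is a hard ask for these models.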

No, because in the process they are describing, the AIs would only post things they have found to fix their problem (i.e., it compiles and passes tests), so the content posted in that "AI StackOverflow" would be grounded in external reality in some way. It wouldn't be the unchecked recursive loop that characterizes model collapse.

Model collapse could still happen here if some bad actor set out to post made-up information or trash, though.


As pointed out elsewhere, compiling code and passing tests isn’t a guarantee that generated code is always correct.

So even “non Chinese trained models” will get it wrong.


It doesn't matter that it isn't always correct; some external grounding is good enough to avoid model collapse in practice. Otherwise training coding agents with RL wouldn't work at all.

And how do you verify that external grounding?

What precisely do you mean by external grounding? Do you mean the laws of physics still apply?

I mean it in the sense that tokens that pass some external filter (even if that filter isn't perfect) are from a very different probability distribution than those that an LLM generates indiscriminately. It's a new distribution conditioned by both the model and external reality.

Model collapse happens when you train your model indefinitely on its own output, reinforcing the biases the model originally picked up. If you repeat the process but add a "grounding" step, you avoid training repeatedly on the same distribution. Some biases may still end up reinforced, but it's a very different setting. In fact, we know it's fundamentally different because this is what RL with external rewards is: you train only on model output that is "grounded" by a positive reward signal (outputs with low reward get effectively ~0 learning rate).
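A minimal sketch of what I mean (hypothetical names; the check here stands in for compilers, test suites, or any other external verifier; this is rejection-sampling-style filtering, not any lab's actual pipeline):

    # Self-training with an external filter instead of an unchecked loop.
    def grounded_dataset(model, prompts, passes_external_check, n_samples=8):
        kept = []
        for prompt in prompts:
            for _ in range(n_samples):
                candidate = model.generate(prompt)            # model's own distribution
                if passes_external_check(prompt, candidate):  # grounding step
                    kept.append((prompt, candidate))          # conditioned on model AND reality
        return kept

    # Outputs that fail the check never receive gradient, so you are not
    # just re-fitting the model's unfiltered output distribution.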


Oh interesting. I guess that means you need to deliberately select a grounding source with a different distribution. What sort of method would you use to compare distributions for this use case? Is there an equivalent to an F-test for high dimensional bit vectors?

How is it not a world model? The model's latents apparently encode enough information to represent a semi-consistent, interactive world. That seems world-model-y enough to me.

Besides, we already know that agents can be trained successfully with these world models. See [1]:

> By learning behaviors in imagination, Dreamer 4 is the first agent to obtain diamonds in Minecraft purely from offline data, without environment interaction. Our work provides a scalable recipe for imagination training, marking a step towards intelligent agents

[1] https://arxiv.org/pdf/2509.24527


Which is a problem that would have been prevented had they not purposefully disabled the ERTMS signaling system to avoid delays.

In my experience using it to translate ML work from English to Spanish or Galician, it translates jargon too literally and too eagerly, to the point that I have to tell it to keep specific terms in English to avoid it sounding too weird (for most modern ML jargon there really isn't a Spanish translation).

> how poor the code actually is.

Very probably. Apparently it's literally implemented as a React->text pipeline, and it was done so badly that they were having problems with the garbage collector running too frequently [1].

[1] https://news.ycombinator.com/item?id=46699072#46701013


> However: we do not know if these are the only errors, they are merely a signature that the paper was submitted without being thoroughly checked for hallucinations

Given how stupidly tedious and error-prone citations are, I have no trouble believing that the citation error could be the only major problem with the paper, and that it's not a sign of low quality by itself. It would be another matter entirely if we were talking about something actually important to the ideas presented in the paper, but it isn't.


The most important context is this image [1] from the Guardia Civil. Using Google Maps, and taking the tree, post, and yellow connection box in the image as reference points, we can place it roughly 180 m before the accident site, on the Iryo train's track. The image appears to show a rail weld failure. That would match reports from some passengers [2] who said the "train started shaking violently" before the accident.

Photo at 38.00771000519087, -4.565435982666953

Accident at 38.009292813090475, -4.564960554581273

[1] https://img2.rtve.es/im/16899875/?w=900

[2] https://x.com/eleanorinthesky/status/2012961856520917401?s=2...
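For what it's worth, plugging the two coordinates above into a quick equirectangular distance check (an approximation, but fine at this scale) gives roughly that 180 m figure:

    import math

    def distance_m(lat1, lon1, lat2, lon2):
        R = 6371000.0  # mean Earth radius in metres
        x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
        y = math.radians(lat2 - lat1)
        return R * math.hypot(x, y)

    print(distance_m(38.00771000519087, -4.565435982666953,
                     38.009292813090475, -4.564960554581273))  # ~181 m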


The first image looks like sabotage to me. Continuous welded rail sections are much longer than this gap.


Just a few weeks ago, terrorists twice tried to sabotage rail lines in Poland, endangering a passenger train with hundreds of people.

> "[Prime Minister Donald] Tusk said that a military-grade C4 explosive device had been detonated on 15 November at about 21:00 (20:00 GMT) near the village of Mika."

> "The explosion, which happened as a freight train was passing, caused minor damage to a wagon floor. It was captured on CCTV."

> "Tusk said the train driver had not even noticed the incident."

> "A previous attempt to derail a train by placing a steel clamp on the rail had failed, he added."

> "The second act of sabotage, on 17 November, involved a train carrying 475 passengers having to suddenly brake because of damaged railway infrastructure, said Tusk."

https://www.bbc.com/news/articles/c4gknv8nxlzo


So, was it the Russians or the Ukrainians (as was the case with the Nord Stream pipelines)?


Wouldn't the gap simply be the result of loss of tension after the weld broke? Metal expands in the heat (about 1 cm per °C per km). The weather shows it got down to around 0°C in Córdoba last night, while the summer record is around 47°C, so one would expect a fairly large gap once the tension is released.
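Quick sanity check on that rule of thumb, using a typical (approximate) linear expansion coefficient for rail steel:

    alpha = 1.2e-5            # per °C, approximate value for rail steel
    print(alpha * 1000 * 1)   # ~0.012 m: about 1 cm of length change per km per °C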


That's not the way stuff like this is built nowadays. The thermal expansion and contraction of the rails is (or should be) considered and accounted for.

That's why things like these are built in:

https://en.wikipedia.org/wiki/Breather_switch


As I understand it, breather switches are rarely used in high-speed rail systems. The ride on Spanish high-speed trains is very smooth. At 300 km/h (5 km per minute) you'd notice going over a breather switch. It'd be like taking Amtrak's Acela.

The gap looks like about 50 cm, which is maybe 1.5 km of contraction from installation tension.
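Rough inverse check (same approximate 1.2e-5 per °C coefficient, and a guessed ~30 °C drop from the rail's neutral temperature to that night):

    alpha, dT, gap = 1.2e-5, 30, 0.5   # per °C (approx.), °C (guess), metres
    print(gap / (alpha * dT))          # ~1390 m of rail contracting freely toward the break

which is in the same ballpark as that 1.5 km estimate.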


I disagree. Though I've never ridden the Acela, I have ridden the Intercity Express at 330 km/h. Since I was a railfan in my youth, I still look out for rail-related stuff, even if it's 'only infrastructure'. I notice it in the pictures in reports about building or opening new HSR track, no matter where; breather switches seem to be mandatory. You just don't notice them from the train, even when looking out of the window at the other track, because it's all just a blur. You need to be on an overpass looking down at where they are, or watching from the side during construction or maintenance, seeing how the machines operate and wondering what they are doing there. Because it's an interruption :-)

Some better pictures:

https://www.eisenbahn.gerhard-obermayr.com/produzenten/vae/s...

https://slabtrackaustria.com/our-technology/red/

https://www.voestalpine.com/railway-systems/en/products/rail...

https://upload.wikimedia.org/wikipedia/commons/7/7d/Oelzetal...

https://upload.wikimedia.org/wikipedia/commons/b/bf/Schienen...

https://cmi-promex.com/wp-content/uploads/2023/06/CurvedRail...

https://cmi-promex.com/wp-content/uploads/2023/06/Sound-Tran...

https://www.hsrail.org/wp-content/uploads/2024/06/HSR_Track_...


Your pictures show breather switches installed at a tunnel portal, where they are necessary to handle the large differences in temperature, and on what look like various bridges, which can be subject to their own thermal expansion. But at least as I understand it, there's normally no need for them on continuously welded rail otherwise.


We will know once the report is public. In Poland, they explicitly left C14 to make sure everybody understands who did it.


C4?


Looks like a pull-apart: bad weld, then cold weather caused contraction from both sides making a gap. Pretty massive for a pull-apart but not impossible.


If it's sabotage, it will be plain as day to a trained eye. I await the report. The break could also be explained by the rail heading away in that photo snapping at that point because the train pushed it out; note that the rail has rotated 90 degrees clockwise, so something did that work, probably the train going out and over it. I'm not a rail tie expert (nor is anyone on HN likely to be), so I don't know whether this is an unusual failure mode. But there was a line change point intersection immediately south of the crash. My money is on a fault (accidental or deliberate) there, not at this snapping point.


Because AGI is still some years away even if you are optimistic, and OpenAI must avoid going under in the meantime due to lack of revenue. Selling ads and believing that AGI is reachable in the near future are not incompatible.


> Because AGI is still some years away

For years now, proponents have insisted that AI would improve at an exponential rate. I think we can now say for sure that this was incorrect.


> For years now, proponents have insisted that AI would improve at an exponential rate.

Did they? The scaling "laws" seem at best logarithmic: double the training data or model size for each additional unit of... "intelligence?"
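To illustrate the shape being claimed (toy power-law constants, not an actual fitted scaling law): under loss = E + A / N^alpha, each doubling of N buys a smaller absolute improvement.

    # Toy Chinchilla-style curve with made-up constants E, A, alpha.
    E, A, alpha = 1.7, 400.0, 0.34
    prev = None
    for N in (1e9, 2e9, 4e9, 8e9, 16e9):
        loss = E + A / N**alpha
        note = "" if prev is None else f"  (improvement {prev - loss:.3f})"
        print(f"{N:.0e} params: loss {loss:.3f}{note}")
        prev = loss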

We're well past the point of believing in creating a Machine God and asking Him for money. LLMs are good at some easily verifiable tasks like coding to a test suite, and can also be used as a sort-of search engine. The former is a useful new product; the latter is just another surface for ads.


Yes, they did, or at least some of them did. The claim was that AI would become smarter than us, and therefore be able to improve itself into an even smarter AI, and that the improvement would happen at computer rather than human speeds.

That is, shall we say, not yet proven. But it's not yet disproven exactly, either, because the AIs we have are definitely not yet smart enough to meet the starting threshold. (Can you imagine trying to let an LLM implement an LLM, on its own? Would you get something smarter? No, it would definitely be dumber.)

Now the question is, has AI (such as we have it so far) given any hint that it will be able to exceed that threshold? It appears to me that the answer so far is no.

But even if the answer is yes, and even if we eventually exceed that threshold, the exponential claim is still unsupported by any evidence. It could be just making logarithmic improvements at machine speed, which is going to be considerably less dramatic.


The original AGI timeline was 2027-2028; ads are an admission that the timeline is further out.


I think the problem is the formulation "If so, AGI can't be far behind". If a model were advanced enough that it could do Einstein's job, that's it; that's AGI. Would it be ASI? Not necessarily, but that's another matter.


The phone in your pocket can perform arithmetic many orders of magnitude faster than any human, even the fringe autistic savant type. Yet it's still obviously not intelligent.

Excellence at any given task is not indicative of intelligence. I think we set these sorts of false goalposts because we want something that sounds achievable but is just out of reach at one moment in time. For instance, at one time it was believed that a computer playing chess at the level of a human would be proof of intelligence. Of course it sounds naive now, but it was genuinely believed. It ultimately not being so is not us moving the goalposts, so much as us setting artificially low goalposts to begin with.

So for instance what we're speaking of here is logical processing across natural language, yet human intelligence predates natural language. It poses a bit of a logical problem to then define intelligence as the logical processing of natural language.


The problem is that so far, SOTA generalist models are not excellent at just one particular task. They have a very wide range of tasks they are good at, and good scores on one particular benchmark correlate very strongly with good scores on almost all other benchmarks, even esoteric benchmarks that AI labs certainly didn't train against.

I'm sure, without any uncertainty, that any generalist model able to do what Einstein did would be AGI, as in, that model would be able to perform any cognitive task that an intelligent human being could complete in a reasonable amount of time (here "reasonable" depends on the task at hand; it could be minutes, hours, days, years, etc).


I see things rather differently. Here are a few points in no particular order:

(1) - A major part of the challenge is in not being directed towards something. There was no external guidance for Einstein - he wasn't even a formal researcher at the time of his breakthroughs. An LLM might be able to be handheld towards relativity, though I doubt it, but given the prompt of 'hey find something revolutionary' it's obviously never going to respond with anything relevant, even with substantially greater precision specifying field/subtopic/etc.

(2) - Logical processing of natural language remains one small aspect of intelligence. For example - humanity invented natural language from nothing. The concept of an LLM doing this is a nonstarter since they're dependent upon token prediction, yet we're speaking of starting with 0 tokens.

(3) - LLMs are, in many ways, very much like calculators. They can indeed achieve some quite impressive feats in specific domains, yet they will also completely hallucinate nonsense on relatively trivial queries, particularly on topics where there isn't extensive data to drive their token prediction. I don't entirely understand your extreme optimism towards LLMs given this proclivity for hallucination. Their ability to produce compelling nonsense makes them particularly tedious to use for anything you don't already effectively know the answer to.


> I don't entirely understand your extreme optimism towards LLMs given this proclivity for hallucination

Simply because I don't see hallucinations as a permanent problem. Models keep improving in this regard, and I don't see why the hallucination rate can't be arbitrarily reduced with further improvements to the architecture. When I ask Claude about obscure topics, it correctly replies "I don't know" where past models would have hallucinated an answer. When I use GPT 5.2-thinking for my ML research job, I pretty much never encounter hallucinations.


Hahah, well, your working in the field probably explains your optimism more than your words! If you pretty much never encounter hallucinations with GPT, then you're probably dealing with topics where there's less of a right or wrong answer. I encounter them literally every single time I start trying to work out a technical problem with it.


Well, the "prompt" in this case would be Einstein's neurotype and all his life experiences. Might be a bit long for current context windows, though ;)

