This sounds like an engineering quality problem rather than a tooling problem.
Well structured redux (or mobx or zustand for that matter) can be highly maintainable & performant, in comparison to a codebase with poorly thought out useState calls littered everywhere and deep levels of prop drilling.
But the popularity of Redux, especially in the earlier days of React, means there are quite a lot of Redux codebases around, and by now many of them are legacy.
"We'll build a big centralised store and take slices out of it" still feels like something you should eventually realise your app now needs rather than a starting point, even in libraries which do it without as much ceremony and indirection as Redux.
This gave me flashbacks to a brief period when I was left as the sole maintainer of a “soon-to-be-axed-but-currently-critical” feature in a checkout flow developed by people who A) knew Redux and B) were not me. They had implemented some state updates shortly before departing, and the resulting bugs fell to me. I spent a few days tracing the code and understanding how they managed their state, and I think I needed to touch about five files to fully cover the updates for a single data point. It felt nuts!
I believe Gary Marcus is quite well known for terrible AI predictions. He's not in any way an expert in the field. Some of his predictions from 2022 [1]:
> In 2029, AI will not be able to watch a movie and tell you accurately what is going on (what I called the comprehension challenge in The New Yorker, in 2014). Who are the characters? What are their conflicts and motivations? etc.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
> In 2029, AI will not be able to work as a competent cook in an arbitrary kitchen (extending Steve Wozniak’s cup of coffee benchmark).
> In 2029, AI will not be able to reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]
> In 2029, AI will not be able to take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Many of these have already been achieved, and it's only early 2026.
Which ones are you claiming have already been achieved?
My understanding of the current scorecard is that he's still technically correct, though I agree with you there is velocity heading towards some of these things being proven wrong by 2029.
For example, in the recent thread about LLMs solving an Erdős problem, I remember reading in the comments that it was confirmed multiple LLMs were involved, along with an expert mathematician who decided what context to shuttle between them and helped formulate things.
Similarly, I've not yet heard of any non-expert software engineers creating 10,000+ lines of non-glue code that is bug-free. Even expert engineers at Cloudflare failed to create a bug-free OAuth library with Claude at the helm, because some things are just extremely difficult to create without bugs, even with experts in the loop.
The bug-free code one feels unfalsifiable to me. How do you prove that 10,000 lines of code are bug-free? And then there are a million caveats about what a bug actually is and how we define one.
The second claim about novels seems obviously achieved to me. I just pasted a random obscure novel from Project Gutenberg into a file and asked Claude questions about the characters, then asked about the motivations of a random side character. It gave a good answer; I'd recommend trying it yourself.
I agree with you but I'd point out that unless you've read the book it's difficult to know if the answer you got was accurate or it just kinda made it up. In my experience it makes stuff up.
Like, it behaves as if any answer is better than no answer.
You can't really compare to human performance because the failure modes and performance characteristics are so different.
In some instances you'll get results that are shockingly good (and in no time), in others you'll have a grueling experience going in circles over fundamental reasoning, where you'd probably fire any person on the spot for having that kind of a discussion chain.
And there's no learning between sessions or subject-area mastery: results on the same topic can vary within the same session (with relevant context included).
So if something is superhuman some of the time and subhuman a large percentage of the time, and there's no good way of telling which you'll get or why, the result isn't the average if you're trying to use the tool.
I've done similar loose experiments using a 19th century French novel I have read (L'Agonie by Jean Lombard), and which is quite obscure. There is no English translation of the novel published.
Using Gemini Notebooks I've found it passable at summarizing chapters, listing characters, major themes and so on, and it can do this in English.
"What is the symbolism of the Black Stone?"
> In the sources, the Black Stone (referred to as the Pierre-Noire) serves as the central religious icon of the Syrian Sun cult brought to Rome by the Emperor Elagabalus. Its symbolism is multifaceted, ranging from fertility and virility to a mystical pursuit of universal unity.
> It represents the perennity of the Sun, which fertilizes the world, causing "germs to rise" and spreading them through the atmosphere. It is viewed as the "definitive form" of divinity, intended to absorb and replace the "transitory forms" of all other Roman, Greek, Egyptian, and Persian gods, including the Christian "Kreistos".
> Virility and Phallic Symbolism
>
> The sources explicitly characterize the Black Stone as a gigantic sacred phallus. It is described as:
> • An icon of virility and the "organ of generation"
> • A "unisexual icon" that materializes the generative force of nature
> • A representation of "virility in activity," which is why it is often paraded and elevated in an "orgasmic" or "colossal adoration"
>
> The Androgyne and Universal Unity - Beyond simple fertility, the philosopher Atillius explains a more complex, "mad" metaphysical project associated with the stone. It symbolizes "Life One" (Vie Une) and the return to a unisexual state of perfection.
> • The Androgyne: Atillius believes that by pursuing "the male sex by the male sex," the cult "inutilizes" the female sex to eventually create the Androgyne—a self-sufficient being containing both sexes
> • Unity: The stone signifies the fusion of all generative forces into a single Unity, reversing the "separation of the sexes" which is viewed as a state of unhappiness and impotence.
> • Marriage of Moon and Sun: The ritual marriage of the goddess Astaroth (representing the Moon and the female principle) to the Black Stone (representing the Sun and the male principle) symbolizes the merging of the Orient and Occident into this unified life principle
>
> Destruction of the Symbol - The Black Stone ultimately becomes a symbol of Oriental pollution and decadence to the Roman populace. During the final rebellion against Elagabalus, the stone is torn from its temple on the Palatine, defiled with filth, and broken into pieces to ensure that its "signification of Life" would never again dominate Rome.
This is all accurate to the book, even teasing out a couple themes that were only subconsciously present to me.
The NotebookLM version gives citations with links to the original text to support all these assertions, and the citations largely hold up.
The input is raw images of a book scan! Imperfect as it is it still blows my mind. Not that long ago any kind of semantic search or analysis was a very hard AI problem.
Not quite the same analysis. The human is better, no surprise. But the NotebookLM output links back to the original book in a very useful way. If you think about it as fuzzy semantic search it's amazing. If you want an essay or even just creativity, yes it's lacking.
It doesn't have to be the same analysis to put it in a partially overlapping vector space. Not saying it wasn't a useful perspective shuffling in the vector space, but it definitely wasn't original.
LLMs haven't solved any of the 2029 predictions as they were posited. But I expect some will be reached by 2029. The AI hype acts like all this is easy. Not by 2029 doesn't mean impossible or even most of the way there.
LLMs will never achieve anything as long as any victory can be hand-waved away with "it was in the training set". Somehow these models have condensed the entire internet down to a few TBs, yet people aren't backing up their terabytes of personal data down to a couple of MBs using this same tech... wonder why.
It wasn't a hand wave. I gave an exact source, which OP admitted was better.
They certainly haven't "condensed the entire internet into a few TBs". People aren't backing up their personal data to a few MB because your assumption is false.
Maybe when people stop hand waving abilities that aren't there we will better understand their use as a tool and not magic.
I strongly disagree. I’ve yet to find an AI that can reliably summarise emails, let alone understand nuance or sarcasm. And I just asked ChatGPT 5.2 to describe an Instagram image. It didn’t even get the easily OCR-able text correct. Plus it completely failed to mention anything sports or stadium related. But it was looking at a cliche baseball photo taken by a fan inside the stadium.
I have had ChatGPT read text in an image, give me a 100% accurate result, and then claim not to have the ability and to have guessed the previous result when I ask it to do it again.
I'm still trying to find humans that do this reliably too.
To add on, 5.2 seems to be kind of lazy when reading text in images by default. Feed it an image and it may give just the first word or so, but coming back with a prompt like 'read all the text in the image' makes it do a better job.
With one image in particular that I tested, I thought it was hallucinating some of the words, but there was a picture within the picture containing small words that it saw and I had missed the first time.
I think a lot of AI capabilities are kind of munged to end users because they limit how much GPU is used.
1) Is it actually watching a movie frame by frame or just searching about it and then giving you the answer?
2) Again, can it handle very long novels? Context windows are limited and it can easily miss something. Where is the proof of this?
4 is probably solved
4) This is more on the predictor, because it's easy to game. You can create gibberish code with an LLM today that is 10k lines long without issues. Even a non-technical user can do it.
I think all of those are terrible indicators; 1 and 2, for example, only measure how well LLMs can handle long contexts.
If a movie or novel is famous the training data is already full of commentary and interpretations of them.
If it's something not in the training data, well, I don't know many movies or books that use only motifs no other piece of content before them used, so interpreting based on what is similar in the training data still produces good results.
EDIT:
With 1 I meant using a transcript of the Audio Description of the movie. If he really meant literally watching the movie, I'd say that's even sillier, because of course we could get another agent to first generate the Audio Description, which is definitely possible currently.
Just yesterday I saw an article about a police station's AI body cam summarizer mistakenly claiming that a police officer turned into a frog during a call. What actually happened was that the cartoon "The Princess and the Frog" was playing in the background.
Sure, another model might have gotten it right, but I think the prediction was made less in the sense of "this will happen at least once" and more of "this will not be an uncommon capability".
When the quality is this low (or variable depending on model) I'm not too sure I'd qualify it as a larger issue than mere context size.
My point was not that those video-to-text models are good as they're used in that case; I was referring more generally to that list of indicators. Surely when analysing a movie it is alright if some things are misunderstood, especially as the amount of misunderstanding can be decreased a lot. That AI body camera is surely optimized for speed and inference cost. But if you give an agent images sampled one second apart, along with the transcript of that period and the full prior transcript, and give it reasoning capabilities, it would take almost endlessly long to process the movie, but the result would surely be much better than the body camera's. After all, the indicator talks about "AI" in general, so judging by a model optimized for something other than capability isn't a fair measure of that indicator.
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo). Of course the movie variety is even more challenging since it involves especially complex multi-modal input. You could easily extend it to making sense of a whole TV series.
Yes. I am a novelist and I noticed a step change in what was possible here around Claude Sonnet 3.7 in terms of being able to analyze my own unpublished work for theme, implicit motivations, subtext, etc -- without having any pre-digested analysis of the work in its training data.
My word count has hovered around 100k for most of my three years of writing and revising. This does sometimes run up against limits on Claude (or recently, with Opus 4.5, compaction) but in the past the whole thing has fit just fine as a plain text file.
>Can AI actually do this? This looks like a nice benchmark for complex language processing, since a complete novel takes up a whole lot of context (consider War and Peace or The Count of Monte Cristo)
Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
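Roughly, that chunk-then-combine workflow looks like the sketch below (TypeScript; callLLM is a hypothetical stand-in for whatever provider API you actually use, and the prompts are only illustrative):

    // Hypothetical stand-in for your LLM provider's API.
    async function callLLM(prompt: string): Promise<string> {
      throw new Error("not wired to a provider yet");
    }

    // Level 1: summarize each chapter independently.
    // Level 2: answer questions against the concatenated chapter summaries,
    // which together fit in one context window.
    async function summarizeChapters(chapters: string[]): Promise<string[]> {
      const summaries: string[] = [];
      for (const [i, text] of chapters.entries()) {
        summaries.push(
          await callLLM(
            `Summarize chapter ${i + 1}, keeping plot, characters, conflicts and motivations:\n\n${text}`
          )
        );
      }
      return summaries;
    }

    async function askAboutBook(chapters: string[], question: string): Promise<string> {
      const digest = (await summarizeChapters(chapters)).join("\n\n");
      return callLLM(`Using these chapter summaries:\n\n${digest}\n\nAnswer: ${question}`);
    }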
You could also do something with a multi-pass strategy where you come up with a collection of ideas on the first pass and then look back with search to refine and prove/disprove them.
Of course, for novels which existed before the model's training, an LLM will already contain learned information about them, so having it "read" classic works like The Count of Monte Cristo and answer questions would be a bit of an unfair pass of the test, because models can be expected to have been trained on large volumes of existing text analysis of those books.
>reliably answer questions about plot, character, conflicts, motivations
LLMs can already do this automatically with my code in a sizable project (you know what I mean), it seems pretty simple to get them to do it with a book.
> Yes, you just break the book down by chapters or whatever conveniently fits in the context window to produce summaries such that all of the chapter summaries can fit in one context window.
I did that a few months ago, and in fact doing just this misses cross-chapter information (say something is mentioned in chapter 1 that doesn't appear to be important but turns out to be crucial later on, like a Chekhov's gun).
Maybe doing it iteratively several times would solve the problem; I ran out of time and didn't try, but the straightforward workflow you're describing doesn't work, so I think it's fair to say this challenge isn't solved. (It works better with non-fiction, though, because the prose is usually drier and more to the point.)
That's what I did, but the thing is the LLM has no way to know what details are important in the first chapter before seeing their importance in the later chapters, and so these details usually get discarded by the summarization process.
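One rough sketch of that iterative idea (untested; same kind of hypothetical callLLM stand-in as above) is to carry a running digest through the chapters instead of summarizing each one in isolation, so an early detail can be retained once a later chapter makes it relevant:

    // Hypothetical stand-in for your LLM provider's API.
    async function callLLM(prompt: string): Promise<string> {
      throw new Error("not wired to a provider yet");
    }

    // Rolling summarization: each chapter is read together with the digest so far,
    // so a chapter-1 detail can be kept alive once later chapters make it relevant.
    async function rollingDigest(chapters: string[]): Promise<string> {
      let digest = "";
      for (const [i, text] of chapters.entries()) {
        digest = await callLLM(
          `Digest of the book so far:\n${digest}\n\n` +
            `Chapter ${i + 1}:\n${text}\n\n` +
            "Update the digest. Keep any earlier detail this chapter makes relevant, " +
            "and end with a list of unresolved open threads (Chekhov's guns)."
        );
      }
      return digest;
    }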
A novel is different from a codebase. In code you have relationships between files, and most files can be ignored depending on what you're doing. But a novel is sequential: in most cases A leads to B and B leads to C and so on.
> Re: movies. Get YouTube premium and ask YouTube to summarize a 2hr video for you.
This is different from watching a movie. Can it tell what suit the actor was wearing? Can it tell what the actor's face looked like? Summarising and watching are two different things.
Yes, it is possible to do those things and there are benchmarks for testing multimodal models on their ability to do so. Context length is the major limitation but longer videos can be processed in small chunks whose descriptions can be composed into larger scenes.
The Gary Marcus proposal you refer to was about a novel, and not a codebase. I think GP's point is that motivations require analysis outside of the given (or derived) context window, which LLMs are essentially incapable of doing.
No human reads a novel and evaluates it as a whole. It's a story, and the reader's perception changes over the course of reading the book. Current AI can certainly do that.
> It's a story and the readers perception changes over the course of reading the book.
You're referring to casual reading, but writers and people who have an interest and motivation to read deeply review, analyze, and summarize books under lenses and reflect on them; for technique as much as themes, messages, how well they capture a milieu, etc. So that's quite a bit more than "no human"!
I'm pretty sure it can do all of those except for the one which requires a physical body (in the kitchen) and the one that humans can't do reliably either (construct 10000 loc bug-free).
Besides being a cook, which is more of a robotics problem, all of the rest have been accomplished to the point where the argument is only about how reliably LLMs can perform these tasks, with the arguing happening between the enthusiast and naysayer camps.
The keyword being "reliably" and what your threshold is for that. And what "bug free" means. Groups of expert humans struggle to write 10k lines of "bug free" code in the absolutist sense of perfection, even code with formal proofs can have "bugs" if you consider the specification not matching the actual needs of reality.
All but the robotics one are demonstrable in 2026 at least.
I don't understand how this claim can even be tested:
> In 2029, AI will not be able to read a novel and reliably answer questions about plot, character, conflicts, motivations, etc. Key will be going beyond the literal text, as Davis and I explain in Rebooting AI.
Once you are "going beyond the literal text" the standard is usefulness of your insight about the novel, not whether your insight is "right" or "wrong".
Which ones of those have been achieved in your opinion?
I think the arbitrary proofs from mathematical literature is probably the most solved one. Research into IMO problems, and Lean formalization work have been pretty successful.
Then, probably reading a novel and answering questions is the next most successful.
Reliably constructing 10k bug free lines is probably the least successful. AI tends to produce more bugs than human programmers and I have yet to meet a programmer who can reliably produce less than 1 bug per 10k lines.
Formalizing an arbitrary proof is incredibly hard. For one thing, you need to make sure that you've got at least a correct formal statement for all the prereqs you're relying on, or the whole thing becomes pointless. Many areas of math outside of the very "cleanest" fields (meaning e.g. algebra, logic, combinatorics etc.) have not seen much success in formalizing existing theory developments.
I have seen many people try to use Claude Code and get LOTS of bugs. Show me any > 10k project you have made with it and I will put the effort in to find one bug free of charge.
In my opinion, contrary to other comments here I think AI can do all of the above already except being a kitchen cook.
Just earlier today I asked it to give me a summary of a show I was watching, up to a particular episode in a particular season, without spoiling the rest of it, and it did a great job.
> Many of these have already been achieved, and it's only early 2026.
I'm quite sure people who made those (now laughable) predictions will tell you none of these has been achieved, because AI isn't doing this "reliably" or "bug-free."
Defending your predictions is like running an insurance company. You always win.
And why not? Is there any reason for this comment to not appear?
If Bill Gates made a prediction about computing, no matter what the prediction said, you could bet that the 640K memory quote would be mentioned in the comment section (even though he didn't actually say it).
I think it’s for good reason. I’m a bit at a loss as to why every time this guy rages into the ether of his blog it’s considered newsworthy. Celebrity driven tech news is just so tiresome. Marcus was surpassed by others in the field and now he’s basically a professional heckler on a university payroll. I wish people could just be happy for the success of others instead of fuming about how so and so is a billionaire and they are not.
> Everything about this is ridiculous, and it's all Anthropic's fault. Anthropic shouldn't have an all-you-can-eat plan for $200 when their pay-as-you-go plan would cost more than $1,000+ for comparable usage
Hard disagree. Companies can and do subsidize products to gather market share. It's just a loss leader [1]. The big money for them is likely satisfied software engineers pushing their employers to pay for more Anthropic products in an enterprise setting.
I have never really grokked Ruby on Rails and I passionately hate all the frameworks that try to adapt it to some other language.
That said, I suspect that Ruby on Rails itself occupies kind of a special space where the magic is acceptable because people who write Ruby are used to having very very sharp tools and have learned to wield them carefully. Give that magic to a PHP or Java programmer and there is immediately gallons of blood on the floor.
(says former Rubyist who was put off by the RoR stuff because I'm apparently more of a Haskeller at heart.)
Hard disagree. There are very few scenarios where I'd pick speed (quantity) over intelligence (quality) for anything remotely to do with building systems.
If you think a human working on something benefits from being "agile" (building fast, shipping quickly, iterating, getting feedback, improving), why should it be any different for AI models?
Implicit in your claim are specific assumptions about how expensive/untenable it is to build systemic guardrails and human feedback, and specific cost/benefit ratio of approximate goal attainment instead of perfect goal attainment. Rest assured that there is a whole portfolio of situations where different design points make most sense.
1. law of diminishing returns - AI is already much, much faster at many tasks than humans, especially at spitting out text, so becoming even faster doesn’t always make that much of a difference.
2. theory of constraints - throughput of a system is mostly limited by the „weakest link“ or slowest part, which might not be the LLM, but some human-in-the-loop, which might be reduced only by smarter AI, not by faster AI.
3. Intelligence is an emergent property of a system, not a property of its parts; in other words, intelligent behaviour is created through interactions. More powerful LLMs enable new levels of interaction that are just not available with less capable models. You don’t want to bring a knife, not even the quickest one in town, to a massive war of nukes.
I agree with you for many use cases, but for the use case I'm focused on (Voice AI), speed is absolutely everything. Every millisecond counts for voice, and most voice use cases don't require anything close to "deep thinking". E.g., for inbound customer support use cases, we really just want the voice agent to be fast and follow the SOP.
If you have a SOP, most of the decision logic can be encoded and strictly enforced. There is zero intelligence involved in this process, it’s just if/else. The key part is understanding the customer request and mapping it to the cases encoded in the SOP - and for that part, intelligence is absolutely required or your customers will not feel „supported“ at all, but be better off with a simple form.
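As a sketch of that split (the intent names and the classifyIntent helper are invented for the example): the model only maps the utterance onto a known case, and the SOP branch itself stays deterministic and auditable.

    // Hypothetical stand-in: an LLM call constrained (e.g. via structured output)
    // to return exactly one of the known intents. Wire it to your provider.
    type Intent = "refund_request" | "order_status" | "escalate_to_human";

    async function classifyIntent(utterance: string): Promise<Intent> {
      throw new Error("not wired to an LLM provider: " + utterance);
    }

    // The SOP itself is plain, auditable data and branching; no model involved.
    const sopResponses: Record<Intent, string> = {
      refund_request: "I can start a refund. Can you confirm the order number?",
      order_status: "Let me look up that order for you.",
      escalate_to_human: "I'm transferring you to a human agent now.",
    };

    async function respond(utterance: string): Promise<string> {
      const intent = await classifyIntent(utterance);
      return sopResponses[intent];
    }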
What do you mean by "such a system"? One that uses AI to funnel your natural language request into their system of SOP? Or one that uses SOPs to handle cases in general? SOP are great, they drastically reduce errors, since the total error is the square root of the sum of squares of random error and bias – while bias still occurs, the random error can and should be reduced by SOPs, whenever possible. The problem is that SOPs can be really bad: "Wait, I will speak to my manager" -> probably bad SOP. "Wait, I will get my manager so that you can speak to them" -> might be a better SOP, depending on the circumstances.
It never works. You always just get the digital equivalent of a runaround, and there simply isn't a human in the loop to take over when the AI botches it (again). So I gave up trying. This crap should not be deployed unless it works at least as well as a person. You can't force people to put up with junk implementations of otherwise good ideas in the hope that one day you'll get it right. Customer service should be a service, because on the other end of the line is someone with a very high probability of already being dissatisfied with your company and/or your product. For me this is not negotiable: if my time is less valuable to you, the company, than the cost of actually putting someone on to help, then my money will go somewhere else.
Shameless self plug, but my workout tracking app[1] uses a sync engine and it has drastically simplified the complexities of things like retry logic, intermittent connectivity loss, ability to work offline etc.
Luckily this is a use case where conflict resolution is pretty straightforward (only you can update your workout data, and Last Write Wins)
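For anyone curious, Last Write Wins can be as small as the sketch below (the record fields are invented; a production sync engine typically adds per-device clocks, tombstones for deletes, and the like):

    // Invented record shape for illustration.
    interface WorkoutRecord {
      id: string;
      reps: number;
      weightKg: number;
      updatedAt: number; // client timestamp (ms); LWW assumes reasonably sane clocks
    }

    // Last Write Wins: whichever copy was written most recently survives.
    // Safe here because only one user ever edits their own workout data.
    function merge(local: WorkoutRecord, remote: WorkoutRecord): WorkoutRecord {
      return remote.updatedAt > local.updatedAt ? remote : local;
    }

    // Apply a batch of server-side changes to the local cache.
    function applyRemoteChanges(
      cache: Map<string, WorkoutRecord>,
      incoming: WorkoutRecord[]
    ): void {
      for (const rec of incoming) {
        const existing = cache.get(rec.id);
        cache.set(rec.id, existing ? merge(existing, rec) : rec);
      }
    }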
Redux Toolkit has been a nice batteries-included way to use redux for a while now https://redux-toolkit.js.org/
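For instance, the "central store with slices taken out of it" pattern mentioned upthread looks roughly like this with Redux Toolkit (the cart slice and its fields are invented for illustration):

    import { configureStore, createSlice } from "@reduxjs/toolkit";
    import type { PayloadAction } from "@reduxjs/toolkit";

    // One "slice" of the central store: its own state shape, reducers and actions.
    interface CartItem {
      id: string;
      qty: number;
    }

    const cartSlice = createSlice({
      name: "cart",
      initialState: { items: [] as CartItem[] },
      reducers: {
        // Immer lets reducers "mutate" a draft state safely.
        addItem(state, action: PayloadAction<CartItem>) {
          state.items.push(action.payload);
        },
        clearCart(state) {
          state.items = [];
        },
      },
    });

    // The store is assembled from slices; components subscribe only to the
    // slice(s) they care about instead of prop-drilling state down the tree.
    export const store = configureStore({
      reducer: { cart: cartSlice.reducer },
    });

    export const { addItem, clearCart } = cartSlice.actions;

Components then read just the slice they need with a selector (react-redux's useSelector) rather than threading props through every layer.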