More

measurablefunc · 2026-01-27T20:22:15 1769545335

I thought they didn't believe in encrypted communication tools.

measurablefunc · 2026-01-27T04:10:44 1769487044

Something else to consider is that "reasoning" (marketing term having nothing to do w/ actual reasoning) will only work for problems that can be broken up into chunks of quadratic complexity such that each chunk can then be re-encoded into the context to carry out more quadratic operations. I don't know if there is a theorem in CS proving obstructions for breaking up problems into such chunks but I think it is intuitively obvious that there will be problems which can not be solved w/ such restrictions.

measurablefunc · 2026-01-26T07:24:12 1769412252

There are only 4 optimizations in computer science: inlining, partial evaluation, dead code elimination, & caching. It looks like AI researchers just discovered inlining & they already knew about caching so eventually they'll get to partial evaluation & dead code elimination.

johndough · 2026-01-26T08:43:40 1769417020

Which categories do algorithmic optimizations fall under? For example:

Strassen algorithm for matrix multiplication https://en.wikipedia.org/wiki/Strassen_algorithm

FFT convolution https://dsp.stackexchange.com/a/63211

Winograd convolution https://www.cv-foundation.org/openaccess/content_cvpr_2016/p...

And of course optimization algorithms themselves.

torginus · 2026-01-26T12:27:07 1769430427

Don't know about the others, but FFT is the classic case of common subexpression evaluation (its mathematically equivalent), which I think by OPs definition would fall under caching.

j-pb · 2026-01-26T09:48:01 1769420881

Partial evaluation on the symbolic structure of the problem.

jafioti · 2026-01-26T17:37:41 1769449061

That's a bit trite tbh. We all know of these techniques, but actually implementing them on GPUs in a low-overhead manner that maintains the model's fidelity is challenging. It's much more than just breaking out the old CS book and picking the next idea from there.

imtringued · 2026-01-26T10:00:08 1769421608

Your list is so short it doesn't even include the basics such as reordering operations.

It also feels incredibly snarky to say "they knew about caching" and that they will get to partial evaluation and dead code elimination, when those seem to be particularly useless (beyond what the CUDA compiler itself does) when it comes to writing GPU kernels or doing machine learning in general.

You can't do any partial evaluation of a neural network because the activation functions are interrupting the multiplication of tensors. If you remove the activation function, then you end up with two linear layers that are equivalent to one linear layer, defeating the point of the idea. You could have trained a network with a single layer instead and achieved the same accuracy with a corresponding shorter training/inference time.

Dead code elimination is even more useless since most kernels are special purpose to begin with and you can't remove tensors without altering the architecture. Instead of adding useless tensors only to remove them, you could have simply used a better architecture.

torginus · 2026-01-26T12:37:31 1769431051

I think you can. If you have a neuron whose input weights are 100,-1,2, with threshold 0, you can know the output of the neuron if the first input is enabled, as the other 2 dont matter, so you can skip evaluating those.

I'm not enough of an expert to see if there's any actualy merit to this idea, and if you can skip evaluating huge parts of the network and keeping track of such evaluations, is actually worth it, but it intuitively makes sense to me that making an omelette has nothing to do with the Battle of Hastings, so when making a query about the former, the neurons encoding the latter might not affect the output.

Afaik, there's already research into finding which network weight encode which concepts.

MOE is a somewhat cruder version of this technique.

fragmede · 2026-01-26T07:41:59 1769413319

Dead code elimination is already a technique in AI when someone takes an MoE model and removes an unused "E" from it.

direwolf20 · 2026-01-26T13:46:54 1769435214

Model pruning is dead code elimination

mxkopy · 2026-01-26T08:02:21 1769414541

AI actually has some optimizations unique to the field. You can in fact optimize a model to make it work; not a lot of other disciplines put as much emphasis on this as AI

tossandthrow · 2026-01-26T08:13:10 1769415190

Can you list these optimizations?

mxkopy · 2026-01-26T08:18:51 1769415531

RLHF is one that comes to mind

tossandthrow · 2026-01-26T09:52:07 1769421127

Well, this is an entirely other category of optimizations - not program performance but model performance.

lucrbvi · 2026-01-26T10:16:14 1769422574

Yes, in "runtime optimization" the model is just a computation graph so we can use a lot of well known tricks from compilation like dead code elimination and co..

tossandthrow · 2026-01-26T10:21:30 1769422890

We are getting closer!

What other optimizations are there that can be used than what explicitly falls into the 4 categories that the top commenter here listed out?

mirekrusin · 2026-01-26T13:32:33 1769434353

For inference assorted categories may include vectorization, register allocation, scheduling, lock elision, better algos, changing complexity, better data structures, profile guided specialization, layout/alignment changes, compression, quantization/mixed precision, fused kernels (goes beyond inlining), low rank adapters, sparsity, speculative decoding, parallel/multi token decoding, better sampling, prefill/decode separation, analog computation (why not) etc etc.

There is more to it, mentioned 4 categories are not the only ones, they are not even broad categories.

If somebody likes broad categories here is good one: "1s and 0s" and you can compute anything you want, there you go – single category for everything. Is it meaningful? Not really.

tossandthrow · 2026-01-26T16:29:50 1769444990

Thanks!

measurablefunc · 2026-01-26T06:39:44 1769409584

What about C-suite executives & shareholders? Are they safe from automation?

meindnoch · 2026-01-26T09:45:55 1769420755

A uniquely important thing that a CEO brings to the table is accountability. You can't automate accounta- ...sorry, I can't continue this with a straight face :DDD

oytis · 2026-01-26T13:14:32 1769433272

You can only replace someone who was useful. If one is useless, but is still there, it means they are not there for their contribution and you can't replace them by automating whatever it might have been.

bjt12345 · 2026-01-26T06:49:28 1769410168

The thing about C-suite executives is they usually have short tenures, however the management levels below them are often cozy in their bureaucracy, resist change, often trying to outlast the new management.

I actually argue that AI will therefore impact these levels of management the most.

Think about it, if you were employed as a transformational CEO would you risk trying to fight existing managers or just replace them with AI?

joe_mamba · 2026-01-26T07:11:34 1769411494

>I actually argue that AI will therefore impact these levels of management the most.

Not AI but bad economy and mass layoffs tend to wipe out management positions the most. As a decent IC, in case of layoffs in bad economy, you'll always find some place to work at if you're flexible with location and salary because everyone still needs people who know how to actually build shit, but nobody needs to add more managers in their ranks to consume payroll and add no value.

bjt12345 · 2026-01-26T07:40:11 1769413211

A lot of large companies lay off swags of technical staff regularly (or watch them leave), and rotate CEOs but their middle management have jobs for life - as the Peter Principe states, they are promoted to their highest respective incompetence and stay there because no CEO has time to replace them.

AI will transform this.

joe_mamba · 2026-01-26T07:50:37 1769413837

Disagree with the "jobs for life" part for management. Only managers who are there thanks to connection, nepotism or cronyism, are there for life as long as those shielding them also stay in place. THose who got in or got promoted to management meritocratically don't have that protection and are the first to be let go.

At all large MNCs I worked at, management got hired and fired mostly on their (or lack thereof) connections and less on what they actually did. Once they got let go, they had near impossible time finding another management position elsewhere without connections in other places.

mraza007 · 2026-01-26T07:37:10 1769413030

This is so true Especially with middle managers they are they the ones that are hit the hardest

joe_mamba · 2026-01-26T08:14:40 1769415280

Yes I was talking about middle managers mostly. Upper management, C-suite, execs are mostly protected from firing unless they F-up big time like sexual assault, hate speech, etc.

p_v_doom · 2026-01-26T07:33:28 1769412808

Generally yes. The more power one holds in an organization the more safe they are from automation.

netdevphoenix · 2026-01-26T13:26:01 1769433961

You can probably automate the full economy. Both production and consumption

rcbdev · 2026-01-26T09:06:59 1769418419

Yes. The AI cannot be the child/other type of beneficiary of a well-connected person, yet.

TeMPOraL · 2026-01-26T12:40:23 1769431223

Ultimately, no. But when we get to this point - once we have AI deciding on its own what needs to be done in the world in general - then the bottom falls out, and we'll all be watching a new global economy, in which humans won't partake anymore. At best, we'll become pets to our new AI overlords; more likely, resources to exploit.

vkou · 2026-01-26T08:13:06 1769415186

Automating away shareholders can't come soon enough.

vjvjvjvjghv · 2026-01-26T08:12:37 1769415157

The make the decisions so I doubt they will soon themselves to be automated away. Their main risk will be that nobody can buy their products once everything is automated.

I wonder if capitalism and democracy will be just a short chapter in history that will be replaced by something else. Autocratic governments seem to be the most prevalent form of government in history.

measurablefunc · 2026-01-24T01:17:52 1769217472

They are constantly trying to reduce costs which means they're constantly trying to distill & quantize the models to reduce the energy cost per request. The models are constantly being "nerfed", the reduction in quality is a direct result of seeking profitability. If they can charge you $200 but use only half the energy then they pocket the difference as their profit. Otherwise they are paying more to run their workloads than you are paying them which means every request loses them money. Nerfing is inevitable, the only question is how much it reduces response quality & what their customers are willing to put up with.

measurablefunc · 2026-01-23T21:26:16 1769203576

Amodei is on the record about completely automating AI research in 6-12 months. He thinks it's an "exponential" loop & Anthropic is going to be the first to get there. That's not esoteric knowledge, that's the CEO saying so in public at the same time that their consumer facing tool is failing & their automated abuse detection is banning users for legitimate use cases.

observationist · 2026-01-23T21:33:34 1769204014

I don't consider Anthropic to be one of the teams using AI particularly well. They're building the tools, they're not using the tools in the best, most skillful way possible.

Dario is delusional, for this and other reasons.

measurablefunc · 2026-01-23T18:46:51 1769194011

Maybe they shouldn't have trusted their LLM to optimize their kernels so much.

measurablefunc · 2026-01-23T01:37:30 1769132250

I'm using Google's antigravity & it works fine for my use cases.

measurablefunc · 2026-01-23T01:32:09 1769131929

There are people on twitter bragging about using 100s of agents. Here's one example: https://twitter.com/nearcyan/status/2012948508764946484

measurablefunc · 2026-01-23T01:07:09 1769130429

This is very cool. I looked at the Claude.md he was generating and it is basically all of Claude's failure modes in one file. I can think of a few reasons why Anthropic would not want this information out in the open or for someone to systematically collate all the data into one file.

genewitch · 2026-01-23T01:52:52 1769133172

i read the related parts of the linked file in the repo, and it took me a while to find your comment here again to reply to. Are you saying that the failure modes of claude with "coding" webapps or whatever OP was doing? i originally thought it might have meant like... jailbreak. But having read it, i assume you meant the former, as we both read the same thing and it seemed like a series of admonitions to the LLM, written by the LLM (with some spice added by OP? like "YOU ARE WRONG") and i couldn't find anything that would warrant a ban, you know?

measurablefunc · 2026-01-23T04:16:17 1769141777

I'm not saying he did anything wrong. I'm saying I can see how Anthropic's automated systems might have flagged & banned the account b/c one of the heuristics they probably use is that there should be no short feedback loops where outputs of Claude are fed back into inputs. So basically Anthropic tracks all calls to their API & they have some heuristics for going through the history & then assigning scores based on what they think is "abusive" or "loopy".

Of course none of it is actually written anywhere so this guy just tripped the heuristics even though he wasn't doing anything "abusive" in any meaningful sense of the word.

genewitch · 2026-01-23T07:10:48 1769152248

thank you for explaining, your point makes sense and i tend to agree with the surmise.