The idea that an AI lab would pay a small army of human artists to create training data for $animal on $transport just to cheat on my stupid benchmark delights me.
They were caught using all the data on the internet without asking for permission or compensating anyone. And it has cost them nothing and earned them billions so far.
Vetting them for the potential for whistleblowing might be a bit more involved. But conspiracy theories have a built-in advantage: the lack of evidence is itself evidence for the theory.
Huh? AI labs routinely spend millions to billions on various third-party contractors that specialize in creating/labeling/verifying specialized content for pre/post-training.
This would just be one more checkbox buried in hundreds of pages of requests. Compared to plenty of other ethical grey areas with actual legal implications, like copyright laundering, leaking that someone was asked to create a few dozen pelican images seems like it would be at the very bottom of the list of reputational risks.
Who do you think is in on it? Not only pelicans, I mean, the whole thing. CEOs, top researchers, select mathematicians, congressmen? Does China participate in maintaining the bubble?
I, myself, prefer the universal approximation theorem and empirical finding that stochastic gradient descent is good enough (and "no 'magic' in the brain", of course).
Well, since we're all talking about sourcing training material to "benchmaxx" for social proof, and not litigating the whole "AI bubble" debate, just the entire cottage industry of data curation firms:
I think no matter what happens with AI in the future, there will always be a subset of people with elaborate conspiracies about how it's all fake/a hoax.
I'm not saying it's a hoax. If the models get better because of that data, so much the better, but we have to be clear-eyed about what these models are actually doing. Especially when companies don't explain what they've done.
Would it not be better to have 100 such tests ("Pelican on bicycle", "Tiger on stilts", ...), generate them all for every new model, but only release a new one each time? That way you could show progression across all models, and attempts at benchmaxxing would be more obvious.
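The scheme above can be sketched in a few lines. This is a minimal, hypothetical harness: `score_fn` stands in for whatever image-scoring you'd actually do, and the prompt list and class names are made up for illustration.

```python
# Minimal sketch of a hidden-benchmark rotation: score every model on all
# prompts (public and hidden), but publish only one new prompt at a time,
# together with its back catalog of scores across past models.
class RotatingBenchmark:
    def __init__(self, hidden_prompts):
        self.hidden = list(hidden_prompts)   # prompts not yet public
        self.released = []                   # prompts that are now public
        self.results = {}                    # (model, prompt) -> score

    def evaluate(self, model, score_fn):
        """Run the model on every prompt, hidden and released alike."""
        for prompt in self.hidden + self.released:
            self.results[(model, prompt)] = score_fn(model, prompt)

    def release_next(self):
        """Publish one more prompt, along with all past models' scores on it."""
        prompt = self.hidden.pop(0)
        self.released.append(prompt)
        history = {m: s for (m, p), s in self.results.items() if p == prompt}
        return prompt, history
```

Because every model was already scored on the prompt before it went public, the published back catalog shows progression and makes a sudden benchmaxxed jump stand out.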
Given the crazy money and vying for supremacy among AI companies right now, it does seem naive to believe that no attempt at better pelicans on bicycles is being made. You can argue "but I will know because of the quality of ocelots on skateboards", but without a back catalog of ocelots on skateboards to publish, it's one data point and leaves the AI companies with too much plausible deniability.
The pelicans-on-bicycles thing is a bit of fun for you (and us!), but it has become a measure of model quality, so it's serious business for them.
There is an asymmetry of incentives and a high risk you are being their useful idiot. Sorry to be blunt.
Or indeed do the Markov chain conceptual slip. Pelican on bicycle, badger on stool, tiger on acid. Pelican on bicycle is definitely cooked, though: people know it and it's talked about in language.
For every combination of animal and vehicle? Very unlikely.
The beauty of this benchmark is that it takes all of two seconds to come up with your own unique one. A seahorse on a unicycle. A platypus flying a glider. A man’o’war piloting a Portuguese man of war. Whatever you want.
No, not every combination. The question is about the specific combination of a pelican on a bicycle. It might be easy to come up with another test, but we're looking at the results from a particular one here.
None of this works if the testers are collaborating with the trainers. The tests ostensibly need to be arms-length from the training. If the trainers ever start over-fitting to the test, the tester would come up with some new test secretly.
Related to overbuilding, vertically mounted solar panels can help flatten the generation curve during the day, and may perform better than "optimally tilted" panels in winter, especially where snow might otherwise be a problem.
I've seen the argument that source code is the preferred basis for making changes and modifications in software, but in the case of these large models, the weights themselves are the preferred form.
It's much easier and cheaper to make a finetune or LoRA than to train from scratch to adapt a model to your use case. So it's not quite like source vs. binary in software.
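The "cheaper" part is easy to quantify. A LoRA replaces the update to a full d_out × d_in weight matrix W with two small trainable matrices B (d_out × r) and A (r × d_in), used as W_eff = W + (alpha / r) · B·A. The dimensions below are illustrative (a 4096-wide layer, rank 16), not tied to any particular model:

```python
# Back-of-envelope: trainable parameters for a full-rank update vs a LoRA
# on one 4096 x 4096 weight matrix with rank r = 16.
d_out, d_in, r = 4096, 4096, 16

full_params = d_out * d_in            # update every entry of W
lora_params = d_out * r + r * d_in    # only B (d_out x r) and A (r x d_in)

print(full_params)                # 16777216
print(lora_params)                # 131072
print(full_params // lora_params) # 128x fewer trainable parameters
```

The same ratio applies per layer across the whole network, which is why a LoRA fits on a single consumer GPU while full training does not.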
> the basic controls of fully automated vertical landing were directly demo'd in a real flying 1/3 scale test bed (though important to note not an orbital one) with the DC-X in 1993.
Surveyor 1 was the first automated vertical soft-landing rocket AFAIK, in 1966. DC-X was the first using turbo-pump engines.
You've never been able to upgrade RAM on GPUs for the exact same reasons that Framework is alluding to. That's what's happening here; GPUs have lots of cores, and to keep them fed you need lots of bandwidth to go along with it. All dedicated GPUs come with dedicated memory for this exact reason. Things like traditional "iGPUs" that use DDR can get away with it exactly because they are so small and computationally weak that the limited bandwidth isn't the immediate #1 bottleneck. But Strix Halo is not intended to be a measly iGPU system, so it has different needs.
There just isn't a free lunch here, it's an inherent design tradeoff for these kinds of chips. The CPU is just along for the ride, in this case.
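Some rough arithmetic shows the gap. The numbers below are illustrative assumptions (a mid-size GPU's compute and typical DDR5 vs. GDDR bandwidths), not specs for Strix Halo or any particular chip:

```python
# Why DDR-class bandwidth starves GPU cores: compare how many FLOPs the
# chip must do per byte fetched just to stay busy. Illustrative numbers only.
flops   = 30e12   # assume ~30 TFLOPS of compute
ddr5_bw = 90e9    # dual-channel DDR5, roughly 90 GB/s
gddr_bw = 500e9   # dedicated-GPU GDDR, roughly 500 GB/s

print(flops / ddr5_bw)  # ~333 FLOPs per byte needed from DDR
print(flops / gddr_bw)  # ~60 FLOPs per byte needed from GDDR
```

Few real workloads reach hundreds of FLOPs per byte, so with DDR the cores sit idle waiting on memory; wide soldered LPDDR (as on Strix Halo) is the compromise that buys bandwidth at the cost of upgradability.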
He has a degree in physics, which is like half of any engineering curriculum. Before founding SpaceX he hired several industry consultants to educate him, recommend aerospace engineering textbooks to study, etc. Then he had about 6 years of experience as almost-full-time CTO and CEO of SpaceX, until he had to divide his attention with Tesla. And somehow, after he and the SpaceX team achieved what dozens of other teams with more funding failed to achieve, he "understands nothing"? No need to be "the fastest learner in the history of mankind".
Someone being capable in one field doesn't mean he isn't an insufferable jerk or a moron in other fields. I don't understand this impulse to paint someone as completely black or completely white.
Consensus seems to be that he has some kind of dual degree (obtained simultaneously) that includes a B.S. in economics and a B.A.(!) in physics. That "A" would imply that he probably took the easier physics-related classes (and probably not that many in total, given the 2 degrees for 1 thing).
Regardless, a bachelor's degree hardly means much anyway...
Is there any indication that he's a particularly (or at all) talented engineer (software or any other field)? I mean, yeah, I agree that it doesn't really matter or change much. Just like Jobs had better/more important things (not being sarcastic) to do than directly designing hardware or writing software himself.
I don't know how B.S. and B.A. degrees work, but apparently that B.A. in physics was enough for him to be accepted to a graduate program in materials science at Stanford University.
He also "held two internships in Silicon Valley: one at energy storage startup Pinnacle Research Institute, which investigated electrolytic supercapacitors for energy storage, and another at Palo Alto–based startup Rocket Science Games."[1], has some software patents (software patents should be abolished) from his time at Zip2, and made and sold a simple game when he was twelve.
So he has a little experience working directly at the low level, with his physics degree and coding knowledge, but of course it was not his talent in those that made him a billionaire; it might even have been the opposite. So there is evidence for the "at all", but not for how talented. I guess anyone versed in BASIC can read the source of his game, but that was when he was twelve...
But yeah, nowadays he has thousands of engineers working under him; of course he is going to delegate. The important thing is the systems engineering: making sure the efforts are going in the right direction and are well coordinated. He seems knowledgeable and talented enough at that. Evidence for SpaceX: https://old.reddit.com/r/SpaceXLounge/comments/k1e0ta/eviden...
> I don't know how B.S. and B.A. degrees work, but apparently that B.A. in physics was enough for him to be accepted to a graduate program in materials science at Stanford University.
Is there any conclusive evidence either way? IIRC he allegedly got into a graduate program 2 years before getting his 2 B.S./B.A. degrees?
Don't let the facts get in the way of the "Musk is a midwit who just stumbles into founding trillion dollar companies" story. ;) It's an article of faith for these people.
I'm not sure how Musk not being anywhere close to being a talented engineer or scientist somehow diminishes his extreme success in other fields? That seems mostly orthogonal.
Having a PhD in any field is relatively ordinary and not that impressive in the grand scheme of things. Founding several extremely successful tech/etc. companies is on a whole other level. Being a horrible software engineer (as his public actions/communication on the topic would imply) seems entirely insignificant and hardly relevant when he has much more important things to do.
Of course, others with comparable achievements (e.g. Jobs, who I don't think ever claimed to be a talented engineer) weren't even remotely as insecure or narcissistic as he is.
You are making a big logical jump here. I only gave one company as example because that is enough to disprove your previous post.
Also, before, you thought he knew very little about any of his many companies, implying no distinction; but now you've adjusted his knowledge up for two companies, and inexplicably down for the others.
You also imply he should give equal attention to all of them, ignoring some of them are bigger, more important, or simply more interesting to him. Is equal attention the optimal strategy here, or you would be getting an F grade if you suggested that?
He didn't need to invest a lot of time to make a good investment in DeepMind, which was then bought by Google, for example. Investing in what you know and understand is good investment advice, but so is diversifying your portfolio and not spending too much time optimizing your investments in lieu of everything else.
Some of his "investments" are more like spending on a hobby (as destructive as it can be, in the case of twitter for example... or constructive like SpaceX), so not even bound by those rules...
While not the main focus, see Section 6.1 and Figure 10 for a simple adaptive exit strategy for inference.
I imagine they chose a fixed number of recurrent iterations during training for parallelization purposes. Not depending on the previous step to train the next is the main revolution of transformers vs. LSTMs (plus the higher internal bandwidth). But I agree that it might not be the most efficient model to train, due to all the redundant work at large r.
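The adaptive-exit idea can be sketched with a toy fixed-point iteration. This is not the paper's actual criterion, just an illustration of the general pattern: apply the recurrent block until the latent state stops changing, instead of always running a fixed r steps. Here `f` is a stand-in contractive map on a scalar latent.

```python
# Toy sketch of an adaptive exit for recurrent-depth inference: iterate
# h <- f(h) until the update is tiny (the latent has numerically converged)
# or a hard cap is hit.
def recur_with_adaptive_exit(f, h0, tol=1e-6, max_iters=64):
    h = h0
    for step in range(1, max_iters + 1):
        h_next = f(h)
        if abs(h_next - h) < tol:  # exit early once the state settles
            return h_next, step
        h = h_next
    return h, max_iters

# Example: f(h) = 0.5*h + 1 is contractive with fixed point h* = 2,
# so iteration converges and exits well before the 64-step cap.
h_star, steps = recur_with_adaptive_exit(lambda h: 0.5 * h + 1.0, 0.0)
```

Easy inputs converge in few steps and hard ones use more, which is exactly the inference-time saving a fixed r leaves on the table.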