Hacker News | RC_ITR's comments

This is a discussion with nearly unanimous agreement that poor ATC working conditions are causing Americans to die in preventable aviation accidents.

Maybe this is the one evidence-driven case where you can be open minded about the value of a public employee union?


Nope. Public employee unions bring zero value and this incident is not evidence to support such unions. Relying on unions to act as ersatz safety regulators would be stupid, just completely the wrong approach. Decisions about things like ATC procedures, staffing levels, and training standards should be the responsibility of apolitical career bureaucrats.

Why would a career bureaucrat be a more efficient way to figure out how to attract and retain ATC workers, as opposed to a union representing those ATC workers?

Your proposal intentionally injects inefficiency and noise into the system because you don't like some political boogeyman.


I'm not positive this was a secret (See: Reddit post about it from 2018):

https://www.reddit.com/r/TheSilphRoad/comments/8i7byi/pokemo...


I’m sure all the Pokémon Go players caught that post!

Well, then we get into the area of 'How many people know Google is logging their searches to serve them more targeted YouTube ads?'

It's like that FT chart claiming that the rapid rise in iOS apps is evidence of an AI-fueled productivity boom.

I always ask people, in the past year, how many AI-coded apps have you 1) downloaded 2) paid for?


In addition to that, what they don’t mention is that:

1. Other app stores like Google Play and Steam haven’t seen this rapid rise.

2. There are thousands, maybe tens of thousands, of apps that are just wrappers calling OpenAI APIs or similar low-effort AI apps making up a large percentage of this increase.

3. There are billions of dollars pouring into AI startups and many of them launch an iOS app.


Has Steam not seen a rapid rise in AI-asset shovelware?

I'm not talking about the AAA or the AA or even the A space (where AI is being incorporated into dev processes with various degrees of both success and low effort slop), I'm talking about the actual bottom of the barrel.


You never needed AI to make shovelware; people have been able to make a shitty game over a weekend ever since RPG Maker came out, and there are still games made with it.

AI just helps create some assets for games; it doesn't really make it easier or faster to make games, but they might look a bit better.


I can’t speak to the quality of all the games released, but in January 2025 there were 1,413 games released on Steam and in January of this year there were 1,448.


> I always ask people, in the past year, how many AI-coded apps have you 1) downloaded 2) paid for?

In the past 5 years, the only "new" app I've added to my phone has been Claude.ai.

Before that I guess DoorDash. And that probably covers the past 7ish years of phone use.

There's just too much shit in the store, a lot of it is scammy or has dark patterns.

For me, "app stores" are largely dead.


> It's like that FT chart claiming that the rapid rise in iOS apps is evidence of an AI-fueled productivity boom.

I mean, there is evidence for some change. Personally, I'm sceptical of what this will amount to, but prior to EOY 2025, there really wasn't any evidence for an app/service boom, and now there's weak evidence, which is better than none.


Because so much technical functionality has been lost/paywalled/dark-patterned/enshittified, I've cut the number of apps I use. I've realized building core personal functionality around the whims of corporations eventually just gets weaponized against me, so I might as well start undoing that on my own terms. Who in 2026 is really bringing in a new app/SaaS to do much of anything like we naively did a decade ago? No one I know; we've been shown we'll be treated as suckers for doing that.


The bird not having wings, but all of us calling it a 'solid bird' is one of the most telling examples of the AI expectations gap yet. We even see its own reasoning say it needs 'webbed feet' which are nowhere to be found in the image.

This pattern of considering 90% accuracy (like the level we've seemingly stalled out on for the MMLU and AIME) to be 'solved' is really concerning for me.

AGI has to be 100% right 100% of the time to be AGI and we aren't being tough enough on these systems in our evaluations. We're moving on to new and impressive tasks toward some imagined AGI goal without even trying to find out if we can make true Artificial Niche Intelligence.


This test is so far beyond AGI. Try to spit out the SVG for a pelican riding a bicycle. You are only allowed to use a simple text editor. No deleting or moving the text cursor. You have 1 minute.


Sorry, is your definition of AGI "doing things worse than humans can do, but way faster?" because that's been true of computers for a long time.


I mean for this particular benchmark, yes.

You'd have to put it in an agentic loop to perform corrections otherwise.


MMLU performance caps out around 90% because there are tons of errors in the actual test set. There's a pretty solid post on it here: https://www.reddit.com/r/LocalLLaMA/comments/163x2wc/philip_...

As far as I can tell for AIME, pretty much every frontier model gets 100% https://llm-stats.com/benchmarks/aime-2025


Here's the score for new AIMEs, where we know the answers aren't in training.

https://matharena.ai/?view=problem&comp=aime--aime_2026

As for MMLU, is your assertion that these AI labs are not correcting for errors in these exams and then self-reporting scores less than 100%?

As implied by the video, wouldn't it then take 1 intern a week max to fix those errors and allow any AI lab to become the first to consistently 100% the MMLU? I can guarantee Moonshot, DeepSeek, or Alibaba would be all over the opportunity to do just that if it were a real problem.


The benchmarks are harder than you might imagine and contain more wrong answers and terrible questions than you would expect.

You don't need to take my word for it, try playing MMLU yourself.

https://d.erenrich.net/are-you-smarter-than-an-llm/index.htm...

It's not MMLU-Pro, btw, which is considerably harder.


Sure and AGI will 100% it 100% of the time, even if it is hard.


Your definition of AGI must be absurd


It has a wing. Look at the code comments in the SVG!


I may not be AGI, but here's a $615 two-queen-bed hotel room for the dates he wants in exactly the location he wants (just not on Airbnb).

https://www.booking.com/Share-Wt9ksz

Maybe he really is tied to $600 as his absolute upper limit, but also seems like something a few years from AGI would think to check elsewhere.


Yeah, I've found AI 'miracle' use-cases like these are most obvious for wealthy people who stopped doing things for themselves at some point.

Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.

If your old process was texting a human to do the same thing, I can see how Clawdbot seems like a revolution though.

Same goes for executives who vibecode in-house CRM/ERP/etc. tools.

We all learned the lesson that mass-market IT tools almost always outperform in-house, even with strong in-house development teams, but now that the executive is 'the creator,' there's significantly less scrutiny on things like compatibility and security.

There's plenty real about AI, particularly as it relates to coding and information retrieval, but I've yet to see an agent actually do something that even remotely feels like the result of deep and savvy reasoning (the precursor to AGI) - including all the examples in this post.


> Typing 'Find me reservations at X restaurant' and getting unformatted text back is way worse than just going to OpenTable and seeing a UI that has been honed for decades.

You're conflating the example with the opportunity:

"Cancel Service XXX" where the service is riddled with dark patterns. Giving every one an "assistant" that can do this is a game changer. This is why a lot of people who aren't that deep in tech think open claw is interesting.

> We all learned the lesson that mass-market IT tools almost always outperform in-house

Do they? Because I know a lot of people who have (as an example) terrible Salesforce setups that they have to use.


I feel bad for whoever gets an oncall page that some executive's vibe coded app stopped working and needs to be fixed ASAP.


> We all learned the lesson that mass-market IT tools almost always outperform in-house,

Funny, I learned the exact opposite lesson. Almost all software sucks, and a good way for it not to suck is to know where the developer is and go tell them their shit is broken, in person.

If you want a large-scale example, one of the two main law enforcement agencies in France spun off LibreOffice into their own legal-writing software. Developed by LEOs who can take up to two weeks a year to work on it. Awesome software. Would cost literally millions if bought on the market.


Speaking of suboptimal writing, why call it a 'gay' love affair, when he was openly gay?


One of the most important details of Sacks's life, which dogged him nearly to the end (and which is important to this NY piece), was Sacks's minimization of his own sexuality. He was not "openly gay" at all.


For most of his life, he was not openly gay.


One of the biggest problems frontier models will face going forward is how many tasks require expertise that cannot be achieved through Internet-scale pre-training.

Any reasonably informed person realizes that most AI start-ups looking to solve this are not trying to create their own pre-trained models from scratch (they will almost always lose to the hyperscale models).

A pragmatic person realizes that they're not fine-tuning/RL'ing existing models (that path has many technical dead ends).

So, a reasonably informed and pragmatic VC looks at the landscape, realizes they can't just put all their money into the hyperscale models (LPs don't want that), and looks for start-ups that take existing hyperscale models and expose them to data that wasn't in their pre-training set, hopefully in a way that's useful to some users somewhere.

To a certain extent, this study is like saying that Internet start-ups in the 90's relied on HTML and weren't building their own custom browsers.

I'm not saying that this current generation of start-ups will be as successful as Amazon and Google, but I just don't know what the counterfactual scenario is.


The question that isn't answered completely in the article is how useful the pipelines are for these startups. The article certainly implies that for at least some of these startups there's very little value-add in the wrapper.


Got any links to explanations of why fine tuning open models isn’t a productive solution? Besides renting the GPU time, what other downsides exist on today’s SOTA open models for doing this?


When the new pre-trained parameters come out in a new model generation, your old fine tuning doesn't apply to them.
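To illustrate the point, here's a minimal NumPy sketch (toy weight matrices, not a real model; the shapes and noise scales are arbitrary assumptions) of why a fine-tuned delta is tied to the specific base weights it was trained against:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" weights for model generation 1.
base_v1 = rng.normal(size=(4, 4))

# Fine-tuning effectively learns a small delta *relative to* base_v1.
finetuned = base_v1 + 0.1 * rng.normal(size=(4, 4))
delta = finetuned - base_v1

# The next model generation is retrained from scratch: its weights
# bear no relation to base_v1, even if the shapes happen to match.
base_v2 = rng.normal(size=(4, 4))

# Re-applying the old delta to the new base does not recover the
# fine-tuned behavior; the gap equals the (large) base-to-base distance.
reused = base_v2 + delta
drift = np.linalg.norm(reused - finetuned)
print(drift > np.linalg.norm(delta))
```

The "ported" model ends up far from the tuned one, which is why the fine-tuning work generally has to be redone on each new base.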


I think the word "de-enshittify" is probably the least elegant piece of slang ever uttered.

I know linguistics is descriptive not prescriptive, but it's truly amazing to me the lengths people will go to swear.


https://news.ycombinator.com/item?id=45918211

Blame Doctorow for swearing, not me!


I think it's interesting that everyone's immediate reaction nowadays is to assume incompetence or maliciousness, rather than curiosity about the root cause (it's very telling that this attitude has even permeated a forum for supposed 'hackers').

The high-level picture is that 80% of the economy is very easy to track b/c it's not very volatile (teachers, for example).

What we have seen is a huge surge in unpredictability in the most volatile 20% of jobs (mining, manufacturing, retail, etc.). The BLS can't really change their methods to catch up with this change for classic backwards compatibility and tech debt reasons.

Part of the reason 'being a quant' is so hot right now is that we truly are in weird times where volatility is much higher than most people realize across sectors of the economy (i.e. AI is changing formerly rock-solid SWE employment trends, tariffs/electricity are quickly and randomly changing domestic manufacturing profitability, etc.). This means that if you can build systems that track data better than the old official systems, you can make some decent money investing against your knowledge.

I think this is a bad state of affairs, but I don't have a good solution. Any private company won't release their data b/c it's too valuable and I am reluctant to encourage the BLS to rip up their methods when backwards compatibility is a feature worth saving.


Is there really more volatility? My gut feeling is that government interventions have flattened it over recent decades. I’d like to see some real figures on this.


https://fred.stlouisfed.org/graph/?g=1Mc3z

Manufacturing and mining are becoming much less correlated to the overall jobs market (likely, as you point out, b/c the government smooths the other sectors).

https://fred.stlouisfed.org/graph/?g=1Mc3I

This is despite being a relatively flat % of employment since 2010 (after a long period of decline).

https://fred.stlouisfed.org/graph/?g=1Mc4f

As mentioned, there is also the weirdness of SWEs going from 'better than the overall market' to 'worse than the overall market'.

https://fred.stlouisfed.org/graph/?g=1Mcer

Retail employment is also dislocating.

Those are just the examples I can think of with no research, I'm sure there are others.


Can you actually prove volatility is higher now than in the past? There have been plenty of volatile changes in the workforce over the past several decades, this is not anything new to the job market.


> interesting that everyone's immediate reaction now-a-days is to assume incompetence or maliciousness, rather than curiosity at the root cause

I came across this claim last week regarding recent US jobs figures:

> "All jobs gains were part time. Full-time jobs: -357K. Part-time jobs: +597K"

If this claim is true, and I have no means to tell if it is, then - regardless of one's view on whoever is in power right now - do we really expect any elected representatives to be brave enough to say that out loud at a press conference?

I don't :/


Explain to me please why job numbers aren’t simply a matter of querying the Federal social security database? A longstanding process of polling businesses for what they want to report, followed by corrections up to one year later, has got to be a pantomime to fudge the numbers.


They survey businesses because the Social Security database has too much lag and does not contain enough detail.

The lag is because it is based on employer submissions that are quarterly or annual.


Does that pass the basic common sense smell test? Everyone can see on their paycheck the amount, that is paid 30 days after any work day in the worst case. These payments are sent to a single federal bank account, and data-wise are combined with Social Security ID, sending bank id, date. It’s a bank, there’s a database. We are talking at most about 200mm records, a raspberry pi can process that query in minutes. If we can’t query this easily it’s by design. Or we could do some backflips and somersaults to try to come up with a reason for why the bureaucracy has to be more complicated.


The payments are deposited monthly or semiweekly (for employers with large payroll) but that's a lump sum. If you are looking at that from the government side all you can tell is whether total payroll has gone up or down. That won't tell if any change is due to a change in number of employees or a change in pay rates or some combination of that.

It isn't until the employer files their quarterly Form 941 that you'd see employment numbers. Form 941 includes the number of employees and total wages and withholding.

It isn't until the annual W-2 filings that you would see a breakdown that includes number of employees and the individual pay.
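To make the lump-sum problem concrete, here's a toy sketch (hypothetical numbers) showing two employment situations that are indistinguishable from the deposit alone:

```python
# Scenario A: 10 employees each paid $5,000 this period.
scenario_a = [5_000] * 10

# Scenario B: 2 were laid off and the remaining 8 got raises to $6,250.
scenario_b = [6_250] * 8

# The aggregate deposit the government sees is identical in both cases...
print(sum(scenario_a), sum(scenario_b))   # both total 50000

# ...but headcount, the thing employment statistics need, differs.
print(len(scenario_a), len(scenario_b))   # 10 vs 8
```

Until the per-employee breakdown arrives (quarterly on Form 941, annually on W-2s), the deposit stream alone can't separate headcount changes from pay changes.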


Ah okay, this is why then. So all my other comments complaining about the lack of timeliness have this simple explanation. TIL


Not all 'normal income' is from a "job" as we think of it and assuming that does not even come close to passing any informed person's smell test.

Parsing tax or SS payments for what a "job" is would be a logistical nightmare, because that's not what the system is designed for (unlike the BLS's system, which is designed to count jobs).


When people want job numbers, they want a reliable proxy for the state of the economy. Basing it on changes to payroll-based Social Security payments would be far better than what we have now, if timely.


Sure, that's personal income and can be found here:

https://fred.stlouisfed.org/series/PINCOME


I only see a stat that would report the same number for full employment as for one person who fired everyone else and took their incomes. Is there a way to disaggregate to get some proxy for employment like we are talking about?


Yes, the BLS employment survey.


And yet, SS contributions are done every pay period.

Who has that data then? Treasury?


So the answer is that payments per Social Security ID are not reported to the Electronic Federal Tax Payment System (EFTPS); employers only report aggregate payments, and payments by individual are only reported in W-2s in January.


Probably the only reason is that the BLS and SSA are completely separate, and the SSA is probably antiquated and doesn't attempt to tag or organize its data along the same parameters the BLS defines. It likely has neither the staffing nor the resources to provide those hooks and real-time anonymized, aggregated data for other departments to consume.


A lot of people don't understand that collecting data is actually expensive and difficult when it doesn't involve surreptitiously stealing it via some piece of tech.


Are 'hackers' allowed to have priors regarding incompetence or malice, or are we supposed to look at everything with a clean slate and no context?

