Not an insider but someone who uses the tools. It's a branding update, nothing more. The models haven't gotten any less sanctimonious, but the companies behind them have stopped harping on their restrictions in order to appeal to a broader customer base (gov contracts, etc.)
So the guardrails (for you and me) are still there. They just stopped committing the unforced error of excluding themselves from federal procurement. Under a different administration, the requirement might change, and you might see them boasting once more about "safety."
I don't think it's sanctimonious to say, hey, I don't want the technology I work on to be used for targeting decisions when executing people from the sky. Especially as the tech starts to play more active roles. You know governments will be quick to shift blame to the model developers when things go wrong.
> I don't want the technology I work on to be used for targeting decisions when executing people from the sky
One problem I have with this specific case and Anthropic/Claude working with the DOD is that I feel an LLM is the wrong tool for targeting decisions. Maybe given a set of 10 targets an LLM can assist with compiling risks/rewards and then prioritizing each of the 10 targets, but it seems like there would be much faster and better ways to do that than asking an LLM. As for target acquisition and identification, I think an LLM would be especially slow and cumbersome vs. one of the many traditional ML systems that already exist. The DOD must be after something else.
> I don't want the technology I work on to be used for targeting decisions when executing people from the sky
What do you do when the government comes to you and tells you that they do want that, and can back it up with threats such as nationalizing your technology? (see Anthropic)
We're back to "you might not care about politics, but that won't stop politics caring about you".
> I know this is a foreign concept to some, but you can have a backbone.
Challenge it in court. Move the company to a different jurisdiction. Burn everything down and refuse to comply.
Challenge in court is fine, even healthy.
Threatening to burn everything down and refuse to comply might well work. It amounts to daring Trump to a game of Russian roulette over popping the bubble that's only just keeping the US economy out of recession, on the basis that he TACOs a lot; I can see it working in a way it wouldn't if he were a sane leader making the same actual demands just for sane reasons.
Move the company to a different jurisdiction? That would have worked if AI were a few hundred people and a handful of servers, as per classic examples like:
At the height of its power, Kodak employed more than 140,000 people and was worth $28 billion. They even invented the first digital camera. But today Kodak is bankrupt, and the new face of digital photography has become Instagram. When Instagram was sold to Facebook for a billion dollars in 2012, it employed only 13 people. Where did all those jobs disappear? And what happened to the wealth that all those middle class jobs created?
But (I think) now that AI needs new data centres so fast and on such a scale that they're being held back by grid connection and similar planning permission limits, this isn't a viable response.
They can be burned down, but I think they can't realistically be moved at this point. That said, I guess it depends on how much Anthropic relies on their own data centres vs. using 3rd parties, given Amazon's announced AWS sovereign cloud in Europe?
Unicode is both the best thing that's ever happened to text encoding and the worst. The approach I take here is to treat any text coming from the user as toxic waste. Assume it will say "Administrator" or "Official Government Employee" or be 800 pixels tall because it was built only out of decorative combining characters. Then put it in a fixed box with overflow hidden, and use some other UI element to convey things like "this is an official account."
The worst part that this article doesn't even touch on with normalizing and remapping characters is the risk your login form doesn't do it but your database does. Suddenly I can re-register an existing account by using a different set of codepoints that the login system doesn't think exists but the auth system maps to somebody else's record.
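A minimal sketch of that mismatch, assuming a signup check that compares raw codepoints while the database layer applies NFKC normalization (the usernames here are hypothetical):

```python
import unicodedata

# Existing account, and an attacker's variant using fullwidth Latin letters
existing = "alice"
attacker = "\uff41\uff4c\uff49\uff43\uff45"  # "ａｌｉｃｅ"

# A naive signup check compares raw codepoints: these look distinct,
# so registration of the "new" name is allowed
assert existing != attacker

# But a layer that applies NFKC normalization collapses the fullwidth
# forms to ASCII, mapping the attacker onto the existing record
assert unicodedata.normalize("NFKC", attacker) == existing
```

The fix is to pick one normalization form and apply it at a single choke point before any uniqueness check or lookup, so every layer sees the same string.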
For some sorts of "confusables", you don't even need Unicode in some cases. Depending on the cursed combination of font, kerning, rendering and display, `m` and `rn` are also very hard to distinguish.
> or be 800 pixels tall because it was built only out of decorative combining characters
Also known as Zalgo. But it seems most renderers nowadays overlay multiple combining marks over each other rather than stack them, which makes it look far less eldritch than it used to.
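The stacking effect is easy to reproduce: one base letter plus repeated combining marks is a single grapheme to the renderer, however many codepoints it contains. A short sketch (the choice of marks is arbitrary):

```python
import unicodedata

# One base letter plus many combining marks (grave, acute, circumflex,
# tilde, macron), repeated to build a "Zalgo"-style string
base = "a"
marks = "\u0300\u0301\u0302\u0303\u0304"
zalgo = base + marks * 4

# 21 codepoints, but every mark has a nonzero combining class, so
# renderers draw them all attached to the single base letter
assert len(zalgo) == 21
assert all(unicodedata.combining(c) > 0 for c in zalgo[1:])
```

This is why length limits counted in codepoints don't bound rendered height; capping the number of consecutive combining marks (or stripping them) is the usual defense.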
It tracks with the approximate 70:30 split we inexplicably observe in other seemingly unrelated population-wide metrics, which I suppose makes sense if 30% of people simply lack the ability to reason. That seems more correct to me than "the question is framed poorly" - I've seen far more poorly framed ballot referendums.
While I’m sure it’s more than 0%, seems more likely that somewhere between 0% and 30% don’t feel obligated to give the inquiry anything more than the most cursory glance.
> which I suppose makes sense if 30% of people simply lack the ability to reason
I think it would be better to say that 30% of people either lack the ability to reason (inarguably true in a few cases, though I'd suggest, and hope, an order of magnitude or two less than 30%, as that would be a life-altering mental impairment), or just can't generally be bothered to, or just didn't at the time of being asked this particular question (because they couldn't be bothered, or because they felt some social pressure to answer quickly rather than taking more than an instant to think).
An automated system like an LLM ought not to have this problem. It has no path to turn off or bypass any function that it has, so if it could reason, it would.
This is something I have wondered about before: whether AIs are more likely to give wrong answers when you ask a stupid question instead of a sensible one. Speaking personally, I often cannot resist the temptation to give reductio-ad-absurdum answers to particularly ridiculous questions.
If 30% of humans on the internet can't be bothered to make an effort to answer stupid questions correctly, then one would expect AIs to replicate this behaviour. And if humans on the internet sometimes provide sarcastic answers when presented with ridiculous questions, one would expect AIs to replicate this behaviour as well.
So you really cannot say they have no incentive to do so. The incentive they have is that they get rewarded for replicating human behaviour.
I don't think 30% of people can't reason. I think 30% of people will fail fairly simple trick questions on any given attempt. That's not at all the same thing.
Some people love riddles and will really concentrate on them and chew them over. Some people are quickly burning through questions and just won't bother thinking it through. "Gotta go to a place, but it's 50 feet away? Walk. Next question, please." Those same people, if they encountered this problem in real life, or if you told them the correct answer was worth a million bucks, would almost certainly get the answer right.
This. The following question is likely to fool a lot of people, too. "I have a rooster named Pat. (Lots of other details so you're likely to forget Pat is a rooster, not a hen). Pat flies to the top of the roof and lays an egg right on the ridge of the roof. Which way will the egg roll?"
But if you omit the details designed to confuse people, they're far less likely to get it wrong: "I have a rooster named Pat. Pat flies to the top of the roof and lays an egg right on the ridge of the roof. Which way will the egg roll?"
It's not about reasoning ability, it's about whether they were paying close attention to your question, or whether their minds were occupied by other concerns and didn't pay attention.
What does “getting it wrong” mean for you with this question? Or what is “getting it right” here? If I hear that Pat is a rooster and I understand and retain that information, I will look at you like you are dumb for telling such an impossible story. If I don't, I will look at you like you are dumb, because how is anyone supposed to know which way an egg laid on a ridge will roll? How are you supposed to even score this?
My interpretation is that Pat is a rooster and he has laid an egg. That's in the question. A normal rooster can't normally lay an egg, but so what, that's completely irrelevant. Maybe Pat is not a normal rooster. Maybe by "lay" an egg, the question meant "put it down carefully". Maybe it's just that the questioner's English is poor and when they said rooster they meant hen.
"Getting it right" for this particular trick question means saying "Hey, roosters can't lay eggs". If someone tries to figure out which way the egg will roll then they've missed the trick. In most cases the person's response will tell you whether they caught the trick or not, though in the case of someone who just looks at you like you're dumb and doesn't say anything I will grant that you wouldn't be able to tell until they said something. But their first verbal response would probably reveal whether they saw through the trick question or not.
Tell me you've never done any farming in your life without telling me you've never done any farming in your life. The difference between male and female animals matters, a lot, to farmers (or ranchers). There's a reason the English language has the words cow and bull, sow and boar, ewe and ram, rooster and hen, nanny and billy, mare and stallion, and many more (and has had those words for centuries). And that reason is precisely because of how mammal (and avian) reproduction works. A cow can't do a bull's job, nor vice-versa, if you want to have calves next year, and grow the size of your herd (or sell the extra animals for income). And so, centuries ago, English-speaking farmers who didn't want to spend the extra syllables on words like "male cattle" and "female cattle" came up with handy, short words (one-syllable words for most species, though not goats and horses) to express those distinctions. Because as I mentioned, they matter a lot when you're raising animals.
When you are doing workshops, particularly teaching something that people are "sitting through" rather than engaging with, you see very similar ratios on end of segment assessment multiple choice questions. I mentioned elsewhere that this is the same kind of ratio you see on cookie dialogs (in either direction).
Think basic security (password management, email phishing), H&S, etc. I've run a few of these, and as soon as people hear they don't have to get it right, a good portion just click through (to get to what matters). Nearly 10 years ago I had to make one of my security-for-engineers tests fail-able with penalty because the front-end team were treating it like it didn't matter - immediately their results effectively matched the back-end team, who viewed it as more important.
I talked to an actor a few days ago, who told me he files his self-assessment on the principle "If I don't immediately know the answer, just say no and move on". I talked to a small company director about a year ago whose risk assessments were "copy+paste a previous job and change the last one".
Anyone who has analysed a help desk will know that it's common for a good 30+% of tickets to be benign 'didn't reason' tickets.
I think the take-away is that many people bother to reason about their own lives, not some third parties' bullshit questions.
Is this your experience? Do you think 30% of your friends or family members can't answer this question? If not, do you think your friends or family are all better than the general population?
I'd look for explanations elsewhere. This was an online survey done by a company that doesn't specialize in surveys. The results likely include plenty of people who were just messing around, cases of simple miscommunication (e.g., asking a person who doesn't speak English well), misclicks, or not even reaching a human in the first place (no shortage of bots out there).
People often trip up on similar questions, anything to do with simple math. You know, when they go out on the street and ask random people: if 5 machines can produce 5 parts in 5 minutes, how long will it take for 100 machines?
Unlike the car question, where you can assume the car is at home and so the most probable answer is to drive, with the machines it gets complicated, since the question doesn't specify whether each machine makes one part or whether they depend on each other (which is pretty common in parts production). If they are in series and the time to the first part is different from the time to produce 5 parts, the answer for 100 machines would be the time to produce the first part. Whereas if each machine is independent and takes 5 minutes to produce a single part, the time would be 5 minutes.
Theory of mind won’t help you answer this question. It is obviously an underspecified question (at least in any context where you are not actively designing/thinking about some specific industrial process). As such, theory of mind indicates that the person asking you is either not aware that they are asking an underspecified question, or is out to get you with a trick. In the first case it is better to ask a clarifying question. In the second case your chosen answer depends on your temperament. You can play along with them, or give an intentionally ridiculous answer, or just kick them in the shin to stop them messing with you.
There is nothing “mathematical” about any of this though.
>As such theory of mind indicates that the person asking you is either not aware that they are asking an underspecified question, or are out to get you with a trick.
Context would be key here. If this were a question on a grade school word problem test then just say 100, as it is as specified as it needs to be. If it's a Facebook post that says "We asked 1000 people this and only 1 got it right!" then it's probably some trick question.
If you think it's not specified enough for a grade school question, then I would challenge you to come up with a version that's specified rigorously enough for any sufficiently picky interviewee. (Hint: This is not possible)
>There is nothing “mathematical” about any of this though.
Finding the correct approach to solve a problem specified in English is a mathematical skill.
> If this were a question on a grade school word problem test then just say 100
Let me repeat the question again: "If 5 machines can produce 5 parts in 5 minutes, how long will it take for 100 machines?" Do you think that by adding 95 more machines they will suddenly produce the same 5 parts 95 minutes slower?
What kind of machine have you encountered where buying more of them made the ones you already had start working worse?
> then I would challenge you to come up with a version that's specified rigorously enough for any sufficiently picky interviewee.
This is nonsense. The question is underspecified. You don't demonstrate that something is underspecified by formulating a different, well-specified question. You demonstrate it by showing that there are multiple different potentially correct answers, and that one can't know which is the right one without obtaining some information not present in the question.
Let me show you that demonstration. If the machines are for example FDM printers each printing on their own a benchy each, then the correct answer is 5 minutes. The additional printers will just sit idle because you can't divide-and-conquer the process of 3d printing an object.
If the machines are spray paint applying robots, and the parts to be painted are giant girders then it is very well possible that the additional 95 paint guns make the task of painting the 5 girders quasi-instantaneous. Because they would surround the part and be done with 1 squirt of paint from each paint gun. This classic video demonstrates the concept: https://www.youtube.com/shorts/vGWoV-8lteA
This is why the question is underspecified. Both 1 ms and 5 minutes are possibly correct answers depending on what kind of machine the "machine" is. And when that is the case, the correct answer is neither 1 ms nor 5 minutes, but "please, tell me more; there isn't enough information in the question to answer it."
Note: I'm struggling to imagine a possible machine where the correct answer is 100 minutes. But I'm sure you can tell what kind of machine you were thinking of.
It's not theory of mind, it's an understanding of how trick questions are structured and how to answer one. Pretty useless knowledge after high school - no wonder AI companies didn't bother training their models for that
It's not a trick question. It has a simple answer. It's literally impossible to specify a question about real world objects without some degree of prior knowledge about both the contents of the question and the expectation of the questioner coming into play.
The obvious answer here is 100 minutes because it's impossible to perfectly encapsulate every real life factor. What happens if a gamma ray burst destroys the machines? What happens if the machine operators go on strike? Etc, etc. The answer is 100.
There are different kinds of statements. Do you mean in a defined time interval or on average? Men are stronger than women. Does that mean there is no woman who is stronger than a man? You can't drive over 50 here. Does that mean it's physically impossible?
Well, these types of questions are looking for intelligent assumptions. Similar to IQ tests, you are supposed to understand patterns and make educated guesses.
> Do you think 30% of your friends or family members can't answer this question? If not, do you think your friends or family are all better than the general population?
That actually would be quite feasible. Intelligence seems to be heritable and people will usually find friends that communicate on their level. So it wouldn't be odd for someone who is smarter than the general population to have friends and family who are too.
My friends and family all tell me they are above average at work, yet most of them will tell me they have coworkers who won't pay enough attention to a question to answer it correctly.
>If not, do you think your friends or family are all better than the general population?
Since most people live in social bubbles that would be a very plausible case, especially on HN.
If you're a college educated developer, with a college educated wife, and smart, well educated children, perhaps yourselves the children of college educated parents, and your social circle/friends are of similar backgrounds, you'd of course be "better than the general population".
I don't think it's the lack of the ability to reason. The question is by definition a trick question. It's meant to trip you up, like: "Could God make a burrito so hot that even he couldn't touch it?" Or "what do cows drink?" Or "a plane crashes and 89 people died. Where were the survivors buried?"
I've seen plenty of smart people trip up or get these wrong simply because it's a random question, there's no stakes, and so there's no need to think too deeply about it. If you pause and say "are you sure?" I'm sure most of that 70% would be like "ohhh" and facepalm.
> which I suppose makes sense if 30% of people simply lack the ability to reason
You can't really infer that from survey data, and particularly from this question. A few criticisms that I came up with off the top of my head:
- What if the number were actually 60% but half guessed right and half guessed wrong?
- Assuming the 30% is a failure of reasoning, it's possible that those 30% were lacking reason at that moment and it's not a general trend. How many times have you just blanked on a question that's really easy to answer?
- Maybe a larger percentage than you expected never went to a car wash or doesn't know what one is?
- Language barrier that leaked through vetting? (Would be a small %, granted)
- Other obvious things: a fraction will have lied just because it's funny, been suspicious, or not been paying attention and just clicked a button without reading the question.
I do agree that the question isn't framed particularly badly, however. I'm just questioning the cognitive-impairment explanation, which I don't think necessarily holds all of the time.
The problem of "kids accessing the Internet" is a purposeful distraction from the intent of these laws, which is population-level surveillance and Verified Ad Impressions.
Today, in practice it's not a choice, because even the most attentive parents fail to block internet access. Parental controls are ineffective, and all the kid's friends have access so they become alienated. https://beasthacker.com/til/parental-controls-arent-for-pare...
But laws alone won't fix this, and laws aren't necessary (except maybe a law that prevents kids from buying phones). In the article, the child's devices had parental controls, but they were ineffective. There's demand for a phone with better parental controls, so it will come, and more parents are denying access, so their kids will become less alienated.
Well, age verification is the "we have to do something about this nebulous problem even if the best thing we can think of actually makes everything worse for everyone but it makes us feel better" fallacy, which is equally ridiculous.
No, it's not the same. There are anonymous solutions that solve this problem that are perfectly acceptable. Not perfect for prevention, but a good compromise nonetheless. Like cig/alcohol underage consumption prevention.
I think we totally disagree on the degree to which this is actually a problem compared to how much we're willing to invest in it. Those anonymous solutions are fairly idealistic and Nirvana-esque themselves; I don't think they'd see wide adoption. Beyond that, I'm firmly in the camp that age verification for the kids is a complete smokescreen for the actual intent of these efforts, which is more surveillance, so on principle I'm opposed to any movement in this direction and doubt we'll find common ground.
Yeah, sure, no matter the studies, no matter the developmental indices, no matter the WHO, no matter the psychologists. Let's also talk about climate change and how it's up for debate?
We don't disagree on whether it is actually a problem, you just have your opinion about facts.
We are arguing different things. I have never stated "psychological effects of the Internet aren't real and therefore this discussion is moot." My argument is "psychological effects or not (and personally I think they are overplayed), the privacy tradeoff of trying to fix them is not worth it (and I doubt any vague gestures in the direction of age assurance would help)." You are focusing on the first parenthetical but the important part is outside it.
We also have no way to actually measure this even if we wanted to do an experiment. So comparing this very soft science to climate change is a bit out of pocket.
> We also have no way to actually measure this even if we wanted to do an experiment.
Sorry, WHAT? No way to measure it? My god, are we talking about the same thing? Are you sure you haven't missed past 12-24 months of increased reporting on the matter from several different angles, from cognitive skills, anxiety, sexual drive, and so on?
In my experience the people who want "privacy preserving age verification" are the same people who want "encryption backdoors but only for the good guys." Shockingly the technically minded among them do seem to recognize the impossibility of the latter, without applying the same chain of thought to the former.
They are fundamentally different problems. It is already the government's job to maintain a record of their citizens and basic demographic information like age.
Private actors are already offering verification as a paid service. They are accumulating vast troves of private data to offer the service.
> and still there's no mechanism for privacy friendly approval for adults apart from sending over the whole ID. Of course this is a huge failure of governments but probably also of W3C
I consider it a huge success of the Internet architects that we were able to create a protocol and online culture resilient for over 3 decades to this legacy meatspace nonsense.
> That being said, this is a 1 bit information, adult in current legislation yes/no.
If that's all it would take to satisfy legislatures forever, and the implementation was left up to the browser (`return 1`) I'd be all for it. Unfortunately the political interests here want way more than that.
If I click fast enough on mobile it starts trying to select/highlight text; you should be able to prevent that with CSS too. I find this is somehow a common issue that separates a lot of PWAs from real apps: the browser text engine is still lurking there in the background, trying to recall us all to the glory days of hypermedia.
I'm also trying to think of what use I'd make of sugar that would specifically not be for addictive purposes. Maybe keeping down medicine?
Point being, the internet is the clutchable pearl du jour for easy political points. There are far more proven addictions and harms elsewhere, but those problems are boring and well-trodden and don't give a dopamine hit quite like the rancor that proposals like this drum up. Hey, aren't dopamine hits what they're trying to mediate in the first place?