
The root of the problem is that AI-as-a-service is corked, because companies providing it have a hell of an incentive to use all that data to out-compete their competitors, and they can do so in secret. To say nothing of salivating law-enforcement who really, really wants to tap into it. I'm hoping there will be at some point open-source and affordable hardware that can run competent models.




that story remains ridiculous, and the RCMP is trying to pass the buck, when they already knew the issues and had confiscated the guns

... then gave them back, and then the shooting happened.


Never pass up a good incident to create more mass surveillance

>and they can do so in secret

Is that really true? Zero Data Retention (ZDR) is standard language in enterprise contracts and it seems quite egregious a vendor would want to take on that amount of liability and ignore the contract terms.

On top of that, Anthropic is SOC2 and ISO27001 so they've had _some_ independent auditing (although they could still try to hide such logging/recording anyway)

With that in mind, they also have a hell of an incentive to _not_ secretly collect that data.

Of course ZDR oftentimes comes with contract minimums so individuals and small corps are locked out and subject to the whims of the provider.


Remember that time that Amazon swore they weren't using data to outcompete people on their platform? Then they did that.

That was Amazon retail, which made no such promise.

AWS makes ZDR promises



Every retailer does that

Zero Data Retention does not mean zero information retention. First, this whole discussion regards AWS Bedrock's straightforward policy for users of OpenAI GPT-5.4 and GPT-5.5 models and Anthropic Claude Fable 5 saying "[inputs and outputs will be retained for up to 30 days](https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-d... )." That's plenty of time for a model training run using those inputs/outputs and, once the information is encoded into model weights, the original training data can be deleted to meet the ZDR contract. For models without the 30-day retention clause, it's still possible for AWS to route inputs and outputs through a dynamic training system to encode the information into model tensors and then toss out the original "data".

Edit: linkify


How would an accountant's audit help in this case? Because that's literally all that's required for those.

I'm 100% certain they keep that for retraining. I've seen advertising pipelines promise the same thing and drown in data "because it's anonymized".

I'm certain the exact same thing happens with AI chatbots, even on top enterprise licenses.


SOC 2 and ISO27001 are definitely not accounting audits. Our auditors request policies, procedures, and evidence that we're following the policies and procedures. Oftentimes evidence is screenshots of the running environment (vomit) or audit logs. The auditor may or may not selectively request more information on demand (so you can't go in being sure you know what they're looking at)

If this is something you care about (compliance) your vendor due diligence process should include ensuring the company used a respected/trusted auditor.


Right, because everyone cares about compliance. Sorry for the snarky tone, but it's really unavoidable here.

It IS an accounting certification. That includes a cursory look at documentation (likely outdated, often created for the audit and never read by anyone).


Yeah cause all these frontier labs totally followed all relevant copyright and ip protection laws, so of course they'll follow your little contract, and what will be the consequences when it turns out they lied (again)? Oh maybe a fine, something fair like 0.5% of profits, can't make it too high or too anti business.

>Oh maybe a fine, something fair like 0.5% of profits, can't make it too high or too anti business.

No, this would be a civil lawsuit, not criminal. The plaintiff (the harmed party) could sue Anthropic for whatever they wanted. Put another way, they're at the mercy of a big corp's army of lawyers, not a paid-off politician.


I was under the impression that a SOC 2 Type 2 audit requires the auditor to verify access, so if you are purchasing a paid/business version from a top 3 vendor (Anthropic, Google, OpenAI) it is SOC 2 Type 2 and any SOC 2 Type 2 service has to maintain access logs and have an independent auditor validate that data isn't being accessed or used against the rules?

Essentially, this is why AWS is reporting this to begin with.


The root of the problem is that ordinary people don't make enough noises for any problems they see in life, so they are essentially cattle.

Do you care about cattle's opinions? I guess a few of us do, but most of us don't.


Would humans change their treatment of cattle if the cattle made louder noises? That seems doubtful.

A very large and powerful government puts an awful lot of effort into making sure people don't reference a particular time their military vehicle made contact with a person standing still decades ago.

They even design AI models around that!

Maybe -- I think that's the reasoning behind government-enforced bans on photography and recording inside of slaughterhouses.

That's not the "root", you can go at least one step further:

The wealthy CEOs and board members found a way to make even more money, but know that it will make the people who are aware of it angry. So they, as a class, find other issues that they can enflame (or manufacture wholesale), through the manipulation of social media algorithms and legacy media, both of which they own and control. They would much rather have "ordinary people" angry about trans athletes or immigrants, than about the surveillance state they profit from, or stealing our data they profit from, etc...

Unfortunately, we humans are very easy to manipulate by making us angry. If "ordinary people don't make enough noises for any problems they see in life", it's hardly our fault if we're too busy surviving in the current economy, and the elites are spending billions to make us angry about anything except the elites.


There's a limit to how angry people can be. Dilute it on irrelevancies, and the anger directed at the real problems goes down.

the root of the problem is that we have no data privacy laws.

It's all extremely dystopian and I don't see how things improve. The handful of megacorps that have access to the compute and troves of stolen IP to train their secret models on have no incentive to contribute back.

They say their models are too dangerous for the public, so they can nerf the GA versions while allowing only their preferred megacorp or nation state partners access to the real secret good versions.

We can hope the Chinese open weight models will catch up, but if/when they really reach parity with proprietary frontier models you can bet they'll stop releasing their weights too. They don't do this stuff out of the kindness of their hearts.

It's tough to imagine what might possibly derail this.


Realistically, local/open weight models will always be limited in idiosyncratic world knowledge compared to the proprietary frontier. There's just very limited upside to releasing tens or hundreds of terabytes of open weights for something that literally can only run in very large AI data centers, and Fable/Mythos is near enough to that class. Smaller models can be smart in very real ways, but the extent to which those "smarts" can apply to real-world problems will be limited.

I think the best bet is that at some point going from 30B params to 9T params is realistically going to give the closed model a 10% edge in niche tasks, but that the open model would be very useful most of the time still.

I don't know how realistic that expectation is, but if you think about the difference between say 10,000 USD speakers and 50,000 speakers then the 50k ones may sound slightly better but certainly not enough to justify the 40k difference


It's also proven over and over again that people are okay with "good enough" 99% of the time:

- Smartphone cameras > dedicated cameras

- "UHD" streaming video > UHD Blu-ray @3-7x the bitrate

- 128kbps music streams > CDs

- Airpods > equally priced but much better sounding headphones

Sure the nicer stuff still exists and is indeed more performant, but it's not cheap and it's also not what's driving the market. I don't see why this won't apply to AI once local models become "good enough" too.


> The handful of megacorps that have access to the compute and troves of stolen IP to train their secret models on have no incentive to contribute back.

Meta and Anthropic both trained on pirated books and they were not required to destroy their models. I simply don't get it. It just encourages doing things first and seeing later what happens. Regulations are just a small business cost.


You got it right! Regulations are just for small guys! You don’t see agents going after Anthropic’s CEO or after Sam Altman as we saw with Kim Dotcom

What's interesting about the rise of the mega weight models is that if you look at the smaller models of the same family you see some significant improvements over time. So there's possibly some trickle down, at least some learning from techniques that is improving things across all model classes.

The other interesting one is how some of the Chinese open weights models have changed licenses that prevent some commercial exploitation of them. That's not closing their doors, but it's some steps towards ensuring their business model is protected.


> It's tough to imagine what might possibly derail this.

Public utilities?


I don't think this makes much sense. The best filter is money and they're not going to go through this convoluted malarkey to limit their customers.

IMHO this is about protecting their model. If you can get an N-1 model for 1% of the N cost their business breaks down.


> They don't do this stuff out of the kindness of their hearts

No, but they do have incentive to continue to release with open weights because doing so directly affects the US based labs that are doing this for profit and power.

What's likely to happen is import controls on software as a form of US protectionism. It will be the encryption battle all over again, but this time about your right to run AI models locally on your own hardware (hardware that the labs and big tech would love for you to remain unable to afford or acquire, so they can rent it to you), along with a ban on the distribution and use of foreign models.

I wouldn't be surprised if Anthropic and OpenAI also successfully lobby for a limit on how big open source models can be in the US as well in the name of "safety."

Make no mistake, they all fully intend to pull the ladder up behind them, and they intend to do it soon.


You can already see a lot of PR from Anthropic on this (ban the unsafe open source) in all the major newspapers (e.g. WSJ, FT, etc.).

I don't think there's any realistic way to block importing open source models.

Chinese open weight models will be forced to do the same to remain competitive with other frontier labs. The moat is data going forward.

Indeed. And we have to remember what it is that authorities and others are tapping into here: the human thought process.

I've said it before and I'll say it again: no matter where you stand on generative AI's usefulness, you are crazy for putting your last private space – your thoughts – in the hands of someone other than you. Going further down this line will not end well.



