The root of the problem is that AI-as-a-service is corked, because companies providing it have a hell of an incentive to use all that data to out-compete their competitors, and they can do so in secret. To say nothing of salivating law-enforcement who really, really wants to tap into it. I'm hoping there will be at some point open-source and affordable hardware that can run competent models.
Is that really true? Zero Data Retention (ZDR) is standard language in enterprise contracts, and it would be quite egregious for a vendor to take on that amount of liability by ignoring the contract terms.
On top of that, Anthropic is SOC2 and ISO27001 so they've had _some_ independent auditing (although they could still try to hide such logging/recording anyway)
With that in mind, they also have a hell of an incentive to _not_ secretly collect that data.
Of course ZDR oftentimes comes with contract minimums so individuals and small corps are locked out and subject to the whims of the provider.
Zero Data Retention does not mean zero information retention. First, this whole discussion regards AWS Bedrock's straightforward policy for users of OpenAI GPT-5.4 and GPT-5.5 models and Anthropic Claude Fable 5 saying "[inputs and outputs will be retained for up to 30 days](https://docs.aws.amazon.com/bedrock/latest/userguide/abuse-d... )." That's plenty of time for a model training run using those inputs/outputs and, once the information is encoded into model weights, the original training data can be deleted to meet the ZDR contract.
For models without the 30-day retention clause, it's still possible for AWS to route inputs and outputs through a dynamic training system to encode the information into model tensors and then toss out the original "data".
SOC 2 and ISO27001 are definitely not accounting audits. Our auditors request policies, procedures, and evidence that we're following the policies and procedures. Oftentimes evidence is screenshots of the running environment (vomit) or audit logs. The auditor may or may not selectively request more information on demand (so you can't go in being sure you know what they're looking at)
If this is something you care about (compliance) your vendor due diligence process should include ensuring the company used a respected/trusted auditor.
right. because everyone cares about compliance. sorry for the snarky tone, but it's really unavoidable here.
it IS an accounting certification. That includes a cursory look at (likely outdated, often created for the audit and never read by anyone) documentation.
Yeah cause all these frontier labs totally followed all relevant copyright and IP protection laws, so of course they'll follow your little contract, and what will be the consequences when it turns out they lied (again)? Oh maybe a fine, something fair like 0.5% of profits, can't make it too high or too anti business.
>Oh maybe a fine, something fair like 0.5% of profits, can't make it too high or too anti business.
No, this would be a civil lawsuit, not a criminal one. The plaintiff (the harmed party) could sue Anthropic for whatever they wanted. Put another way, they're at the mercy of a big corp's army of lawyers, not a paid off politician.
I was under the impression that a SOC 2 Type 2 audit requires the auditor to verify access, so if you are purchasing a paid/business version from a top 3 vendor (Anthropic, Google, OpenAI) it is SOC 2 Type 2 and any SOC 2 Type 2 service has to maintain access logs and have an independent auditor validate that data isn't being accessed or used against the rules?
Essentially, this is why AWS is reporting this to begin with.
A very large and powerful government puts an awful lot of effort into making sure people don't reference a particular time their military vehicle made contact with a person standing still decades ago.
That's not the "root", you can go at least one step further:
The wealthy CEOs and board members found a way to make even more money, but know that it will make the people who are aware of it angry. So they, as a class, find other issues that they can inflame (or manufacture wholesale), through the manipulation of social media algorithms and legacy media, both of which they own and control. They would much rather have "ordinary people" angry about trans athletes or immigrants, than about the surveillance state they profit from, or stealing our data they profit from, etc...
Unfortunately, we humans are very easy to manipulate by making us angry. If "ordinary people don't make enough noises for any problems they see in life", it's hardly our fault if we're too busy surviving in the current economy, and the elites are spending billions to make us angry about anything except the elites.
It's all extremely dystopian and I don't see how things improve. The handful of megacorps that have access to the compute and troves of stolen IP to train their secret models on have no incentive to contribute back.
They say their models are too dangerous for the public, so they can nerf the GA versions while allowing only their preferred megacorp or nation state partners access to the real secret good versions.
We can hope the Chinese open weight models will catch up, but if/when they really reach parity with proprietary frontier models you can bet they'll stop releasing their weights too. They don't do this stuff out of the kindness of their hearts.
It's tough to imagine what might possibly derail this.
Realistically, local/open weight models will always be limited in idiosyncratic world knowledge compared to the proprietary frontier. There's just very limited upside to releasing tens or hundreds of terabytes of open weights for something that literally can only run in very large AI data centers, and Fable/Mythos is near enough to that class. Smaller models can be smart in very real ways, but the extent to which those "smarts" can apply to real-world problems will be limited.
I think the best bet is that at some point going from 30B params to 9T params is realistically going to give the closed model a 10% edge in niche tasks, but that the open model would be very useful most of the time still.
I don't know how realistic that expectation is, but if you think about the difference between say 10,000 USD speakers and 50,000 USD speakers, then the 50k ones may sound slightly better but certainly not enough to justify the 40k difference.
It's also proven over and over again that people are okay with "good enough" 99% of the time:
- Smartphone cameras > dedicated cameras
- "UHD" streaming video > UHD Blu-ray @3-7x the bitrate
- 128kbps music streams > CDs
- AirPods > equally priced but much better sounding headphones
Sure the nicer stuff still exists and is indeed more performant, but it's not cheap and it's also not what's driving the market. I don't see why this won't apply to AI once local models become "good enough" too.
> The handful of megacorps that have access to the compute and troves of stolen IP to train their secret models on have no incentive to contribute back.
Meta and Anthropic both trained on pirated books and they were not required to destroy their models. I simply don't get it. It just encourages doing things first and seeing later what happens. Regulations are just a small business cost.
What's interesting about the rise of the mega weight models is that if you look at the smaller models of the same family you see some significant improvements over time. So there's possibly some trickle down, at least some learning from techniques that is improving things across all model classes.
The other interesting one is how some of the Chinese open weights models have changed licenses that prevent some commercial exploitation of them. That's not closing their doors, but it's some steps towards ensuring their business model is protected.
> They don't do this stuff out of the kindness of their hearts
No, but they do have incentive to continue to release with open weights because doing so directly affects the US based labs that are doing this for profit and power.
What's likely to happen is import controls on software as a form of US protectionism. It will be the encryption battle all over again, but this time about your right to run AI models locally on your own hardware (which the labs and big tech would love for you to remain unable to afford or acquire, so they can rent it to you), along with a ban on the distribution and use of foreign models.
I wouldn't be surprised if Anthropic and OpenAI also successfully lobby for a limit on how big open source models can be in the US as well in the name of "safety."
Make no mistake, they all fully intend to pull the ladder up behind them, and they intend to do it soon.
Indeed. And we have to remember what it is that authorities and others are tapping into here: the human thought process.
I've said it before and I'll say it again: no matter where you stand on generative AI's usefulness, you are crazy for putting your last private space – your thoughts – in the hands of someone other than you. Going further down this line will not end well.