True, and they're being tried in a federal court of law for it. NYT v. OpenAI is still very much alive, these things just take a while. Can the same be said about DeepSeek or any other open-source model provider performing distillation?
Pandora's box has already been opened and there is no going back. I doubt OpenAI et al. will get anything but a slap on the wrist in court, because punishing AI companies would have a negative effect on the US economy.
>Can the same be said about DeepSeek or any other open-source model provider performing distillation?
Open source models that distill from SoTA remind me of the story of Robin Hood -- robbing from the rich and giving to the poor. So to answer your question: yes, but it's better than the alternative where only a select few companies have SoTA models.
Robin Hood, famous for spinning his acts into a $220M ARR SaaS business (as of mid 2025 [0], likely >$1B by now) and using charity as a marketing mechanism.
Since it's open weights it'll be available on AWS Bedrock soon(ish), likely at a higher price than the official API but still coming in under those GPT-5-mini prices.
Depends on who's making the call about who gets cut. A key part of decimation was that the doomed soldiers were beaten to death by their comrades, to leave the remaining 9 with a bloody, lasting impression of their dishonor. If Meta makes everybody sit in a group with their ten closest coworkers and debate until they decide who gets cut, that's a lot closer to decimation than if management suddenly shuts off 10% of employee computers.
Yes, once benchmarks get saturated they get replaced by harder ones. You don’t see GSM8K, MMLU, or HellaSwag anymore because they’re essentially solved. It takes constant work to make benchmarks hard enough to show meaningful model performance differences but easy enough to score higher than the noise threshold.
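To make the "noise threshold" point concrete, here's a rough sketch with hypothetical numbers (the benchmark size and scores are illustrative, not from any real leaderboard): the standard error of an accuracy score on an n-question benchmark is about sqrt(p(1-p)/n), so once models cluster near the ceiling, score gaps smaller than a couple of points are indistinguishable from sampling noise.

```python
import math

def accuracy_stderr(p: float, n: int) -> float:
    """Standard error of an accuracy estimate p measured on n questions
    (binomial approximation)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical saturated benchmark: 1,000 questions, models scoring ~95%.
se = accuracy_stderr(0.95, 1000)
# se is ~0.0069, i.e. ~0.7 accuracy points. A ~2-sigma gap (~1.4 points)
# is roughly the smallest difference worth reporting, so a benchmark
# where every model scores 94-96% can no longer separate them.
print(round(se, 4))
```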
It's frustrating how cavalier they are about killing old Gemini releases. My read is that once a new model is serving >90% of volume, which happens pretty quickly as most tools will just run the latest+greatest model, the standard Google cost/benefit analysis is applied and the old thing is unceremoniously switched off. It's actually surprising that they recently extended the EOL date for Gemini 2.5. Google has never been a particularly customer-obsessed company...
Consistency: new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior, and then the new model performs worse on a particular task, or just behaves differently and needs prompt adjustments. Providers can also fundamentally change default model settings between releases; for example, the Gemini 2.5 models behaved completely differently with respect to temperature settings than previous models. It creates a moving target that you constantly have to adjust for and rework, instead of a platform that you, and by extension your users, can rely on. Other providers have much longer deprecation windows, so they must at least understand this frustration.
> Consistency, new models don't behave the same on every task as their predecessors. So you end up building pipelines that rely on specific behavior
If this is a deal breaker, then self-hosting is the only solution. Due to the hardware premium, all models hosted by 3rd-parties will be deprecated to make room for newer, better, and more efficient models.
Sure, but Google also leaves little to no overlap between models and often will leave models in preview mode (which many companies cannot use in production for legal reasons) - right up until the point that the previous model is deprecated.
The point is that if you want to build a platform customers can rely on, on their own schedule of feature development, you need to support models for longer periods of time. For example, OpenAI still offers older models like GPT-4, which was released in 2023; this gives customers plenty of time to test, experiment, and eventually migrate to a newer model if it makes sense.
If you're trying to run repeatable workflows, stability from not changing the model can outweigh the benefits of a smarter new model.
The cost can also change dramatically: on top of the higher token costs for Gemini Pro ($1.25/mtok input for 2.5 versus $2/mtok input for 3.1), the newer release also tokenizes images and PDF pages less efficiently by default (>2x token usage per image/page) so you end up paying much much more per request on the newer model.
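As a rough illustration of how those two factors compound, here's a back-of-the-envelope sketch (the request shape -- 2,000 text tokens plus five images at ~258 tokens each -- is a hypothetical, and the 2x image-token inflation is an approximation of the ">2x" claim above, not an official figure):

```python
# Back-of-the-envelope input-cost comparison for a document-heavy request.
# Rates are the quoted input prices; all token counts are assumed.
OLD_RATE = 1.25 / 1_000_000   # $/input token on the older model
NEW_RATE = 2.00 / 1_000_000   # $/input token on the newer model

TEXT_TOKENS = 2_000                       # assumed prompt text
IMAGE_TOKENS_OLD = 5 * 258                # assumed: 5 images, ~258 tok each
IMAGE_TOKENS_NEW = IMAGE_TOKENS_OLD * 2   # assumed ~2x tokenization overhead

old_cost = (TEXT_TOKENS + IMAGE_TOKENS_OLD) * OLD_RATE
new_cost = (TEXT_TOKENS + IMAGE_TOKENS_NEW) * NEW_RATE
# The higher rate and the heavier tokenization multiply together, so the
# per-request cost more than doubles even though the rate alone only rose 60%.
print(f"old: ${old_cost:.5f}  new: ${new_cost:.5f}  "
      f"ratio: {new_cost / old_cost:.1f}x")
```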
These are somewhat niche concerns that don't apply to most chat or agentic coding use cases, but they're very real and account for some portion of the traffic that still flows to older Gemini releases.
The impact of a few more network calls and decreased privacy is basically never felt by users beyond this abstract "they're spying on me" realization. The impact of this telemetry for a product development team is material.
Not saying that telemetry is more valuable than privacy, just that it's a straightforward decision for a company to make when real benefits are counterbalanced only by abstract privacy concerns. This is why it's so universally applied across commercially developed apps and tools.
For most CLIs, I definitely feel extra network calls because they translate to real latency for commands that _should_ be quick.
If I run "gh alias set foo bar", and that takes even a marginally perceptible amount of time, I'll feel like the tool I'm using is poorly built since a local alias obviously doesn't need network calls.
I do see that `gh` is spawning a child to do sending in the background (https://github.com/cli/cli/blob/3ad29588b8bf9f2390be652f46ee...), which also is something I'd be annoyed at since having background processes lingering in a shell's session is bad manners for a command that doesn't have a very good reason to do so.
If it's done in a background process then it won't impact the speed of the tool at all. When the choice is between getting data to help improve the tool at the cost of "bad manners", whatever that means, the choice is pretty easy.
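The fire-and-forget pattern being debated can be sketched generically (a minimal Python sketch of the technique, not gh's actual Go implementation; the sleep is a stand-in for the real network send):

```python
import subprocess
import sys

def send_telemetry_async(payload: str) -> subprocess.Popen:
    """Spawn a detached child to do the slow network send so the command
    itself returns immediately. Sketch only: the child's sleep stands in
    for an HTTP call, and `payload` would be passed to the real sender."""
    return subprocess.Popen(
        [sys.executable, "-c",
         "import time; time.sleep(2)"],   # stand-in for the network call
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,           # detach from the shell's job control
    )

# Trade-off from the thread: the command stays fast because the parent never
# waits on the child, but the lingering background process is exactly the
# "bad manners" the parent comment objects to.
```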
I just bought an M5 Macbook from an electronics retailer because they actually stocked it, whereas ordering the same machine for the same price from Apple would have been a custom build delivered mid May.
Apple has an enormous global footprint. Their devices are made in China, India, and Vietnam, and source parts from basically everywhere. More than half of Apple's revenue comes from outside the US, and there are 1.5 billion iPhones in use across the world (several times the population of the US).