Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This is the kind of thing that Anthropic et al should be worried about. As it becomes easier and easier to run local models, the ceiling of what they'll be able to charge will get lower and lower. Not that nobody will be willing to pay $$$$$ per month, but a lot of people are going to multiply the per-month charge by 12 or 24 and say "Could I set up a local model for less than that, and have it pay for itself within a year or two?" And if a significant portion of customers decide to buy instead of rent, the companies whose business model is entirely centered around renting will suddenly find themselves hurting for customers.
 help



The opposite of that has been happening for 20 years now with cloud compute.

It won't happen with AI models either.

It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.

Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.

I'm in a relatively small business, we recently had an outage related to our local infrastructure.

I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.

Everyone wants to shuck the chore and the responsibility.


> The opposite of that has been happening for 20 years now with cloud compute. It won't happen with AI models either.

AI is different.

Cloud computing genuinely is cheaper on average. It's better than paying for cisco servers, and at scale, it's cheaper than managed platforms (ala Heroku), and it's a coin toss for when you're in the middle ground and constantly approaching the point of rebuilding poor-man versions of existing products but with very very expensive engineering salaries.

In contrast, local models offer dramatic savings, and are magnitude of orders better in certain aspects: like stability - the performance is all over the place with traditional AI companies as they divert compute to their next big thing.

The benefits to maintaining your own infrastructure are pretty moderate to low, with very high risk.

And also, alternate models are pretty easy to use and easy to swap out unlike the vendor lock-in that exists with cloud services.


> AI is different.

I agree. The other thing here is that, once you can run LLMs on a single piece of commodity hardware (whether that includes one GPU or several), the difference between cloud vs. on-premise LLMs will largely be about where your hardware is located. There will be very little software configuration involved (just an HTTP endpoint that talks to the GPU). This is decidedly different from cloud products where the moat of hyperscalers is largely in the software and services on top of the hardware, not the hardware itself. (Sure, GPUs will eventually break & need replacement, too, but there's no state to lose, so that's already orders of magnitude easier than replacing hard drives.)


There's also a difference in the cost of downtime. A server hosting your website or SaaS, if it's down for five minutes, costs you a lot of real revenue. So you plan for redundancy, you set up automatic failover so that if one node goes down the next node can handle the load while the first one reboots, and so on. But for the LLM that's just serving your local model? You can tell everyone "Hey, we're taking it down for a 15-minute window, so plan your lunch break while it's down". Unplanned downtime can interrupt what people were doing and cost you productivity and thus money, but it's a lot easier to schedule planned downtime and have people work on non-model-using tasks during those periods: the model is helpful, but not essential.

There's no economic reason why running a model locally should be better than using a cloud hosted version.

“There is no reason anyone would want a computer in their home." - Ken Olson, Founder of Digital Equipment Corporation, in 1977

In hindsight this is getting truer, what with the push of dumb terminal for everyone

Everyone has at least one in their pocket right now though.

Sure there is. Keeping your IP in house.

You pay a 3x markup to rent a server through AWS than managing your own. You pay for convenience. At shall annals that's fine, but for large companies with their own datacenters, you generally do things in house.

> Cloud computing genuinely is cheaper on average.

For some applications, sure. Availability is a large part of what one is paying for with cloud computing, but it's also something that not every business needs.

If you sacrifice availability and have a pure-compute use case (low durability requirements), on-prem can quickly end up cheaper for far better hardware.


AI is different because you can't encrypt it. An running on someone else's hardware is basically just 'trust me bro! I won't read it!'. Of course you can say that about I.e. database too, but at least you can run it on your own dedicated hardware in some datacenter, so it is password protected, you can encrypt it at rest and you will only know the key.

With AI, no, you can't . model needs plain text to be able to work. If somebody will be able to figure out models with asymmetric keys will make a lot of money.


For many companies (country-dependent) that's not really why they use cloud services vs purchasing. It's tax shenanigans and business process overhead. OpEx vs CapEx, and a small (%) bump in the huge AWS bill no one will even notice or a $30k+ invoice for hardware that has to go through rigorous review and 3 departments.

Same reason people pay for things through the AWS marketplace (like Vanta) instead of having to go through their invoicing process.


Good point. Maybe there'll be companies that maintain your on-premise GPU cluster just like there are companies that service the coffee machine in your office?

This is far more likely than everyone racking their own servers.

> on-premise GPU cluster

Renting a GPU server from a cloud and hosting your own llama.cpp is the path of least resistance.


It's just not comparable though is it? You need cloud services because it's physically impossible to use your single home computer as a server, CDN, load balancer, mass storage, security service, and distributed system.

But AI is just weights, you can run a reasonably intelligent model at home, or on a few GPUs if you're a small-medium sized company, and it doesn't require dedicated maintenance.


If you're a medium-large company, you should definitely run your own AI because you can max out the CPUs more often. You're not only able to run privately and locally, but you're also able to run efficiently.

> I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.

Same here. My job as a software dev does not require me to self-host services we need and use. Quite the opposite. But, I am reluctant to hand over all control to AWS or equivalent for several reasons that I will get into here.

I have found that Infrastructure as Code (IaC) and modern tools like opentofu, ansible, combined with frontier AI models and harnesses gives you superpowers in this space. Almost all of our self-hosted services are fully managed by these tools. e.g. We perform backups and test them more often now than we ever did before. Entirely because it is so much easier to do all of that now.


IMO local-vs-cloud may be a misleading dichotomy, versus:

    1. Individual dev machines
    2. Shared local server
    3. Shared server in corporate cloud
    4. Third-party LLM SaaS provider
Even if you don't want your laptop melting, there are still some important differences between 3 and 4 in terms of data privacy and security.

> It won't happen with AI models either.

AI is definitely different. Cloud compute is incredibly convenient to the point where even if AWS is more expensive it's just so _nice_. LLM models are much more abstract and while I can't easily swap AWS for Hetzner to save 80% of my costs I can absolutely get close to that for many of LLM tasks, even today.

I suspect Anthropic and gang all know that that's why they are buying up dev tools and shifting towards long-running agents because that's where they can get AWS's "nicesness" that they can charge for.


There is efficiency in the cloud model for models. So maybe there is a scope for Apple or an "Apple for AI" in the AI compute game - mainly from the perspective of privacy etc.

And once the servers are in space, everything is fully out there.


Still though, perhaps the existence of low-margin, generic, cloud LLM's puts some downward pressure on the 'brand name' companies?

I suppose cloud won because: - nobody wants to deal with the networking stack on the internet - you want servers alive all the time - it's businesses running their software on servers to serve to customers

Do these apply to AI?


That's an interesting take, however there is no ongoing maintenance related to local models, maybe the only effort is giving more capable machines to the workforce; but yeah I can see how it might feel like a barrier.

The hardware, the power systems, the cooling systems. They need maintenance.

The OS needs updates, file systems get corrupted.

Fans get dirty.

All the things that you need to deal with in hosting your own server infrastructure you have to deal with when hosting your own AI infrastructure (which runs on servers...)


However, you can get many of the benefits of a "local model" by outsourcing all the hardware maintenance but still using an open model. Guaranteed repeatability for one.

A lot of the reason people outsource normal software is its brittle security properties, not sure that even applies to an LLM - it can go and look up the latest security best practices just like an engineer can.


on prem cloud is harder because of the scale up and scale down requirements. If you are a growing business which most decent ones are, you constantly have to think about that.

> Everyone wants to shuck the chore and the responsibility.

Which gives all the power to the big techs. I'll never understand why the average company seems to have no problem with this.


Did you build your own house using tools that you forged from iron-rich ore yourself? Did you grow your own wheat to make bread for your lunchtime sandwich today?

There's a reason most people pay other people to do these things for them.


It's a longstanding management principle, so old that people may not even say it explicitly any more, which states "focus on your core competencies," the corollary of which is "outsource anything that is not a core competency."

I can see how it makes sense for companies, because money is "only money" but an ongoing operational distraction can be much more costly, as in, it can be detrimental to the success of the overall business.


> in the American business model

AI company valuations won't survive if they're only for the "American business model".


Exactly. American businesses aren't even particularly efficient or well run

outsource that headache along with the responsibility for it

You know what gives me headaches? When I'm in the middle of a session and the model gets rug-pulled out from under me because somebody at the model provider didn't pay the Trump bill that month.

Or when someone at the model provider decides that the curve-fitting algorithm in my graphics package looks a little too much like Skynet for comfort.

Or when they do any number of other things to undermine my work for the sake of their business model, some of which I won't even notice until the damage is done.

The sad thing is, if you know how inference works, you know that it really is insanely wasteful for everybody to run it locally. If anything naturally belongs in the cloud, it's inference. But at the same time, what choice are we being given?


What about inference suggests it naturally belongs in the cloud?

Inference basically looks like this (neglecting a whole bunch of stuff):

    for t in tokens_in_context
        for p in model_weights
            do something with p*t
The expensive part is fetching each weight from memory, which is why VRAM/HBM is such a big deal. Conceptually, for a huge, dense (non-MoE) model, the inner loop might run a trillion times for every token generated.

Obviously that's not how it really works in practice, but the point is, if you are only running one prompt at a time, each weight gets fetched, applied to the token being processed, and then never touched again until the next token is processed.

So when you submit a prompt to a model that's running a bunch of other peoples' contexts concurrently, it can reuse each weight multiple times before moving on to the next one:

    for p in model_weights
        for u in users
           for t in u's context
              do something with p*t
The same is true in an agent-heavy scenario where you have several contexts in play at once.

Worst case, in terms of energy efficiency, is a single user sitting around waiting for a single response. I don't feel like I'm explaining it well, but the core idea is that every time a weight is fetched from memory, you want to get as much work done as possible with it.


That makes a lot of sense, thank you. I think a pirate cloud of local models could make sense, but that would be regulated into oblivion

I'm curious when coding-heavy companies will start running their own on-prem AI clusters. Has anyone had the idea to sell something like 4 GPU machine an engineering team could throw in a closet somewhere and run whatever they want on it? I imagine this won't appeal to everybody but with the trust issues the hyperscalers have developed hoovering up people's data and using it to train their models, I imagine some will find value in a machine and model they have transparent control over including the option to walk over and unplug the thing.

Has anyone had the idea to sell something like 4 GPU machine an engineering team could throw in a closet somewhere and run whatever they want on it?

I think that's basically Geohot's business model at Tiny Corp.


Earlier I was thinking it's maybe comparable to paying for Netflix vs torrenting and running Plex or something. For the majority of normal, mainstream users I feel like most would just pay for the thing that is already setup and ready for them. There'll still be all the more techy or determined types who will do it themselves, I just wonder what the percentages of both groups will be.

> I feel like most would just pay for the thing that is already setup and ready for them

Nothing stopping turnkey OSS AI hardware being productised, including niceties like opt-in automated updates. If the trend continues of models becoming smaller and more capable for everyday use, it also derisks against obsolescence.


if we get to the stage where the AI hardware is a more of a commodity and usability becomes 10x simpler, then people may buy their own hardware and run local models.

Everybody owns a car, washer, TV, etc today. Maybe one could finance a server-box/trailer costing $20k, trade it in every 7 years for a newer model, etc. Many people are going to own a $20k Optimus.


The car, TV, washer, and whatever humanoid robot finds product market fit physically need to be in my house, or close to it, in order for them to be useful to me. Thanks to the Internet, the data center doesn't need to be, like at all. Economy of scale says that renting a slice of time on the most expensive GPU supercomputer out there is going to be faster and also probably cheaper since I'd only be getting a slice while the server is serving multiple users.

Why don't you just buy a chromebook or equiv and do that now then?

They are working hard on you not being able to run a thing locally. OpenAI buys all RAM on the spot market, causing the rise of RAM/VRAM prices 6x, making GPUs and decent computers unreachable for the majority of the population. OK, some richer folks might be able to get a 512GB MacStudio or a single RTX Pro 6000 for 13k and be able to run some decent local models, but the vast majority will need to use API. And at some point Nvidia might say: "We don't sell that many 6000s, so let's just cancel them altogether as we can gain 4x profit on datacenter-only GPUs" and then they'll become unobtainium and no private person would ever be able to run anything decent (~1 year behind the frontier) locally.

I wonder if this move will backfire on them. All the fabs are focusing on HBM and leaving DDR behind, if one of the big frontier labs folds all the memory fabs will be left holding a big bag of HBM memory. They won't have any other choice but sell for cheap so it wouldn't surprise me if we see a return of HBM in the consumer market in 3-5 years.

These local models can do some of the work the non-frontier models can do but for me, that's not worth much. If I am just using Sonnet 4.6, I can pretty much work all day on the $20/month plan. And Sonnet is still a way more powerful model than a one you could self host on an M2 mac.

If things change to token usage billing for everyone, maybe I'll be singing a different tune but on a subscription, I don't think it makes sense financially.

Fun? Yes. Financially sound? No.


What about when the gravy train stops and Sonnet is priced with some marine above the cost to provide it?

The general consensus is that local models will continue to improve drastically, but hosted models will as well. There will _always_ be a pretty big gulf of capability between what you can do with a desk full of hardware at home vs a few racks of hardware in a datacenter. That seems to be the real "moat" of hosted models at this point in time: access to capital.

What's interesting/exciting is that local models are _already_ quite good at tasks we never imagined AI _ever_ doing before ChatGPT hit the scene just a few short years ago.

We're also in an interesting point in time where companies are releasing the fruits of their research/labor (the LLMs) to the general public for free. For now, I think they see it in their best interest to gain mindshare and rapport, as well as advancing the state of the art in smaller LLMs ("a rising tide lifts all boats") but I fear and expect that these will dry up as the major players buy the minor players, and all will seek a return on their considerable investments in AI research.


I believe there's a level of diminishing returns. Sure, SOTA will probably always benchmark better than local models. But do we need it? That's the question that the likes of OpenAI and Anthropic should be worried about.

The difference won't be in the individual tasks. It'll be in the scale of job they can take on and how you interact with the model. Think of pairing with a junior vs replacing a full delivery team, that's the sort of difference we'll be looking at. We'll be able to get closer to the latter by being more clever with harnesses, I reckon, but the frontier labs will run ahead because for any given harness trick they can lean harder on model smarts.

True, but my point is that if/when local models get to the point where they are capable of doing the "delivery team" work what's next? What can these bigger SOTA models offer? And especially what can they offer above and beyond what you might be able to get from much cheaper models which the open models are based on?

That's what I mean by diminishing returns.


> There will _always_ be a pretty big gulf of capability between what you can do with a desk full of hardware at home vs a few racks of hardware in a datacenter

True, but this difference _should_ shrink with the hardware craze and we're able to buy memory (the biggest bottleneck at the moment) again, shouldn't it?


There is also the thing of workflow.

We have set up something where you create a ticket, Make sure it contains enough information, and with the right tag added it will make a branch with PR for you which stays up to date based on updates to the ticket and comments on the PR.

It’s creepy in a way. But you also can’t really use local (as in workstation LLM) for that. Sure we could run something like a distributed task scheduler across all our engineer devices but just pushing it to copilot is easier.


It the model is as good as composer, has a decent harness around it, and isn't incompetent at tool calls - it'll be useful at least as a sub agent for most workflows in perpituity.

Nothing will improve drastically anymore. And when big ones run out of money to burn, who will train your local models?

AI usage is very spiky and good models require very expensive hardware. Running locally would just result in it sitting idle ~90% of the time. I think renting will always be cheaper, for comparable performance at least.

> but a lot of people are going to multiply the per-month charge by 12 or 24 and say "Could I set up a local model for less than that, and have it pay for itself within a year or two?" And if a significant portion of customers decide to buy instead of rent, the companies whose business model is entirely centered around renting will suddenly find themselves hurting for customers.

And those are going to all be big enterprise companies that probably will set up LLM services entirely in-house, because they've got the headcount to utilize servers at 100%.

I wonder if there will be (or is currently) business in selling their compute while they're not working, to opposite time zones, etc.

What's left for the big providers will be the dregs of individual subscriptions and small businesses that at their least paranoid might let employees just use their own subscriptions for work.


I know coding is the killer app thus far, but if businesses are seeing any kind of significant cost for other LLM usecases, seems like at least a decent consultancy opportunity to set up medium-sized businesses with in-house kit.

The other question is how the middle ground (hetzner etc) is shaping up, because obviously so many orgs won't want to run servers.


What I don't understand is that on one hand we read 'what they charge is much less than it costs them' and on the other hand this thread seems to suggest that 'what they charge is more than it would cost me'.

What it costs is tricky to measure. A large part of the costs are training the model. Once they have the model they are making a ton of profit from what they charge (or so we think - I haven't seen the numbers). However the sunk costs of getting the model need to be paid for and that means an accounting problem where we have to guess how much the model will be used in the future.

Accountants are reasonably good at figuring this out - there are a lot of different things that need a large upfront investment before you can charge anything. People still debate if they are correct in this each case.


Bigger models that Antrophic want to sell cost disproportionately more (e.g. 100% more cost for 5% performance improvement) than small models you would use locally

They have to provide the service at peak scale and high-availability, your local setup doesn't have those extremely expensive requirements.

Someone was able to run gemma-4-26B-A4B on an i5-8500 with 32 gb ram with NO GPU. Granted this is an extreme example these MoE models are value for money for a lot of use cases.

https://www.reddit.com/r/LocalLLaMA/s/YontVNVRbL


Maybe that is why they are buying up as much hardware as they can? If their service is the only game in town.

Data Center providers are buying hardware, not anthropic. Certainly related but alot of the hardware purchased is just sitting in a warehouse waiting for a data center to get built.

Anthropic isn't just renting out compute, they're renting out a closed model that's better than anything you can download for free. So they're rightfully focused on preventing others from distilling their model.

It's in Anthropic's best interest to focus the conversation on "distillation".

Imo the more interesting thing to focus on is that there are now several more labs with the expertise and capabilities to train trillion parameter models. That's a serious technical accomplishment and the main reason why open models are catching up to Anthropic and OpenAI (and local models are typically distillations of much larger models).

Who cares that they got some small amount of training data out of Claude. The crux is that the big US labs are not special, they just have a first mover advantage that's slowly shrinking as incremental progress becomes harder.


i think in the long term the problem is going to be this: a great small model always come from a greater large model, but the larger base model keep getting larger and more closed sourced

so long as there's no algorithm breakthrough


Local models will never achieve "real" performance (i.e actual usage, not benchmarks) compared to frontier models.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: