
I have read this same comment so many times in various forms. I know many of them are shill accounts/bots, but many are real. I think there are a few things at play that make people feel this way. Even if you're in a CRUD shop with low standards for reliability/scale/performance/efficiency, a person who isn't an experienced engineer could not make the LLM do your job. LLMs have a perfect combination of traits that cause people to overestimate their utility. The biggest one I think is that their utility is super front-loaded.

Before, a task might take you ten hours: think through the problem, translate that into an implementation approach, implement it, and test it. At the end of those ten hours you're 100% there, and you've got a good implementation which you understand and can explain to colleagues in detail later if needed. Your code was written by a human expert with intention, and you reviewed it as you wrote it and as you planned the work out.

With an LLM, you spend the same amount of time figuring out what you're going to do, plus more time writing detailed prompts and making the requisite files and context available for the LLM, then you press a button and tada, five minutes later you have a whole bunch of code. And it sorta seems to work. This gives you a big burst of dopamine due to the randomness of the result. So now, with your dopamine levels high and your work seemingly basically done, your brain registers that work as having been done in those five minutes.

But now (if you're doing work people are willing to pay you for), you probably have to actually verify that it didn't break things or cause huge security holes, and clean up the redundant code and other exceedingly verbose garbage it generated. This is not the same process as verifying your own code. First, LLM output is meant to look as correct as possible, and it will do some REALLY incorrect things that no sane person would do, which are not easy to spot the way you'd spot them if a human had written them. You also don't really know what all of this shit is - it almost always has a ton of redundant code, or just exceedingly verbose nonsense that ends up being technical debt and more tokens in the context for the next session. So now you have to carefully review it. You have to test things you wouldn't have had to test, with much more care, and you have to look for things that are hard to spot, like redundant code or regressions in other features it shouldn't have touched. And you have to actually make sure it did what you told it to, because sometimes it says it did, and it just didn't. This is a whole process. You're far from done here, and this (to me at least) can only be done by a professional. It's not hard - it's tedious and boring, but it does require your learned expertise.


So set up e2e tests and make sure it does the things you said you wanted. Just like how you use a library or database. Trust, but verify. Only if it breaks do you have to peek under the covers.
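
To make "trust, but verify" concrete, here's a minimal sketch of the kind of e2e check I mean, written against Playwright. The URL, field labels, and success text are hypothetical placeholders, not from any real app:

    import { test, expect } from '@playwright/test';

    // Hypothetical signup flow: the URL, field labels, and success text
    // are placeholders for whatever your app actually does.
    test('signup creates an account and lands on the dashboard', async ({ page }) => {
      await page.goto('https://staging.example.com/signup');

      await page.getByLabel('Email').fill('qa+signup@example.com');
      await page.getByLabel('Password').fill('correct horse battery staple');
      await page.getByRole('button', { name: 'Create account' }).click();

      // Assert on observable behavior, not on how the generated code got there.
      await expect(page).toHaveURL(/\/dashboard/);
      await expect(page.getByText('Welcome')).toBeVisible();
    });

The point is that you're checking the behavior you asked for rather than the implementation, the same way you'd treat a library you didn't write.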

Sadly, people do not care about redundant and verbose code. If that were a concern, we wouldn't have 100+ MB apps, nor 5 MB web app bundles. Multibillion-dollar B2B apps ship a 10 MB JSON file just for searching emojis and no one blinks an eye.


The effort to set up e2e tests can be more than the effort of writing the thing itself, especially for UI, since computers just don't interpret things the way humans do (spatial relations, overflow, low to no contrast between elements).

Also, the assumption that you can do ___ thing (tests, some dumb agent framework, some prompting trick) and suddenly, magically, all of the problems with LLMs vanish is very wrong and very common.

> Also, the assumption that you can do ___ thing

...

3. profit

4. bro down


I just wanna make the point that I've grown to dislike the term 'CRUD', especially as a disparaging remark about some software. Every web application I've worked on featured a database that you could usually query or change through a web interface, but that was an easy and small part of everything it did.

Is a webshop a CRUD app? Is an employee shift-tracking site? I could go on, but I feel 'CRUD app' is about as meaningful a moniker as 'desktop app'.


It's a pretty easy category to identify. Some warning signs:

- You rarely write loops at work

- Every performance issue is either too many trips to the database or to some server

- You can write O(n^n) functions and nobody will ever notice

- The hardest technical problem anyone can remember was an N+1 query, and it stuck around for like a year before enough people complained and you added an index (a sketch of the pattern follows this list)

- You don't really ever have to make difficult engineering decisions, but if you do, you can make the wrong one most of the time and it'll be fine

- Nobody in the shop could explain: lock convoying, GC pauses, noisy neighbors, cache eviction cascades, one hot shard, correlating traces with scheduler behavior, connection pool saturation, thread starvation, backpressure propagation across multiple services, etc
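
For anyone who hasn't hit it, here's a rough sketch of the N+1 pattern mentioned above, next to the single-query alternative. The db.query helper and table names are hypothetical stand-ins for whatever driver or ORM you actually use:

    // Hypothetical query helper; substitute your actual driver or ORM.
    declare const db: { query: (sql: string, params?: unknown[]) => Promise<any[]> };

    async function ordersPerUserNPlusOne() {
      const users = await db.query('SELECT id, name FROM users'); // 1 query
      for (const user of users) {
        // ...plus one more query per user: N+1 round trips total.
        user.orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
      }
      return users;
    }

    async function ordersPerUserJoined() {
      // One round trip; an index on orders.user_id keeps the join cheap.
      return db.query(
        'SELECT u.id, u.name, o.* FROM users u LEFT JOIN orders o ON o.user_id = u.id'
      );
    }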

I spent a few years in shops like this. If this is you, you must fight the urge to get comfortable, because the vibe coders are coming for you.


I think a lot of the proliferation of AI as a self-coding agent has been driven by devs who haven’t written much meaningful code, so whatever the LLM spits out looks great to them because it runs. People don’t actually read the AI’s code unless something breaks.

There are exceptions to what I'm about to say, but it is largely the rule.

The thing a lot of people who haven't lived it don't seem to recognize is that enterprise software is usually buggy and brittle, and that's both expected and accepted because most IT organizations have never paid for top technical talent. If you're creating apps for back office use, or even supply chain and sometimes customer facing stuff, frequently 95% availability is good enough, and things that only work about 90-95% of the time without bugs are also good enough. There's such an ingrained mentality in big business that "internal tools suck" that even if AI-generated tools also suck similarly, it's still going to be good enough for most use cases.

It's important for readers in a place like HN to realize that the majority of software in the world is not created in our tech bubble, and most apps only have an audience ranging from dozens to several thousand users.


Internal tools do suck as far as usability, but you can bet your ass they work if they're doing things that matter to the business, which is most of them. Almost every enterprise system hooks into the finance/accounting pipeline to varying degrees. If these systems do not work at your company I'd like to know which company you work at and whether they're publicly traded.

A potential difference I see is that when internal tools break, you generally have people with a full mental model of the tool who can intervene manually. Of course, that fails when you lay off the only people with that knowledge, which leads to the cycle of "let's just rewrite it, the old code is awful". With AI, it seems like your starting point is that failure mode: a lack of knowledge and of a mental model of the tool.

Forced? It's been a delight. I'd say if anything, I've only ever felt forced to use MacOS or Windows, never forced to use Linux.

Who could have possibly anticipated this?

As long as you pretend these didn't happen:

Iran (1953) - CIA-backed overthrow of Prime Minister Mosaddegh, restoring the Shah

Guatemala (1954) - overthrow of an elected government on behalf of US corporate interests

Cuba (1961) - invasion via proxy forces (Bay of Pigs), plus attempts to assassinate Castro

Vietnam, Laos, Cambodia (1955–1975)

Chile (1973) – overthrow of Salvador Allende

Nicaragua (1980s) - overthrow the Sandinista government (Contra war) without international authorization

Panama (1989) - invaded and overthrew the government without international authorization

Iraq (2003) - invaded and overthrew the government without international authorization

Serbia (1999) - airstrikes without international authorization

Libya (2011) - exceeded UN authorization to effect regime change

Syria (2014–present) - US military occupation and oil seizure is ongoing

There are many more; these are just the more notable ones.


A$AP Rocky's music videos have some really good examples of how AI can be used creatively and not just to generate slop. My favorite is "Taylor Swif"; it's a super fun video to watch.

https://www.youtube.com/watch?v=5URefVYaJrA


I think we massively downplay the experience and expertise required to ask the right question.

> which is going to blow out the economics on inference

At this point, I don't even think they do the envelope math anymore. However much money investors will be duped into giving them, that's what they'll spend on compute. Just gotta stay alive until the IPO!


At the end of the day, it doesn't really get you that much if you get 70% of the way there on your initial prompt (which you probably spent some time discussing, thinking through, clarifying requirements on). Paid, deliverable work is expected to involve validation, accountability, security, reliability, etc.

Taking that 70% solution and adding these things is harder than if a human got you 70% there, because the mistakes LLMs make are designed to look right, while being wrong in ways a sane human would never be. This makes their mistakes easy to overlook, requiring more careful line-by-line review in any domain where people are paying you. They also duplicate code and are super verbose, so they produce a ton of tech debt -> more tokens for future agents to clog their contexts with.

I like using them, they have real value when used correctly, but I'm skeptical that this value is going to translate to massive real business value in the next few years, especially when you weigh that with the risk and tech debt that comes along with it.


> and are super verbose...

Since I don't code for money any more, my main daily LLM use is for some web searches, especially those where multiple semantic meanings would be difficult to specify with a traditional search or even compound logical operators. It's good for this, but the answers tend to be too verbose, in ways no reasonably competent human would be. There's a weird mismatch between the raw capability and the need to explicitly prompt "in one sentence" when it would be contextually obvious to a human.


Imo getting 70% of the way is very valuable for quickly creating throwaway prototypes, exploring approaches and learning new stuff.

However getting the AI to build production quality code is sometimes quite frustrating, and requires a very hands-on approach.


Yep - no doubt that LLMs are useful. I use them every day, for lots of stuff. It's a lot better than Google search was in its prime. Will it translate to massively increased output for the typical engineer (esp. senior/staff+)? I don't think it will without a radical change to the architecture. But that is an opinion.


I completely agree. I found it very funny that I have been transitioning from an "LLM sceptic" to an "LLM advocate" without changing my viewpoint. I have long said that LLMs won't be replacing swathes of the workforce any time soon, and that LLMs are of course useful for specific tasks, especially prototyping and drafting.

I have gone from being challenged on the first point, to the second. The hype is not what it has been.


Most text worth paying for (code, contracts, research) requires:

- accountability

- reliability

- validation

- security

- liability

Humans can reliably produce text with all of these features. LLMs can reliably produce text with none of them.

If it doesn't have all of these, it could still be worth paying for if it's novel and entertaining. IMO, LLMs can't really do that either.


Let's not put humans on too much of a pedestal; there are plenty of us who are not that reliable either. That's why we have tests, linting, types and various other validation systems. Incidentally, LLMs can utilize these as well.


Humans are unreliable in predictable ways. This makes review relatively painless since you know what to look for, and you can skim through the boilerplate and be pretty confident that it's right and isn't redundant/insecure, etc.

LLMs can use linters and type checkers, but getting past them oftentimes leads them down a path of mayhem and destruction, doing pretty dumb things to get the checks to pass.
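
A made-up but representative example of the kind of "dumb thing" I mean: the types don't line up, so instead of fixing the data flow, the generated code just erases the type to make the compiler stop complaining. The Invoice shape and payload below are hypothetical:

    interface Invoice {
      id: string;
      totalCents: number;
    }

    function applyDiscount(invoice: Invoice, percent: number): Invoice {
      return { ...invoice, totalCents: Math.round(invoice.totalCents * (1 - percent / 100)) };
    }

    // Hypothetical API payload: note there is no totalCents field at all.
    const raw: unknown = JSON.parse('{"id":"inv_1","total":"49.99"}');

    // Casting to any satisfies the type checker, but totalCents is undefined,
    // so the "discounted" total is NaN at runtime.
    const discounted = applyDiscount(raw as any, 10);

The compiler is green, and the bug ships anyway.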


I've only experienced de-motivation from managers, personally. At least for me, motivation comes from ownership, impact, autonomy, respect. You can cause me to lose motivation in a lot of ways, but you can't really cause me to gain motivation unless you've already de-motivated me somehow.

You can de-motivate me in a lot of ways, some examples:

- throwing me or a coworker under the bus for your mistakes

- crediting yourself for the work of someone else

- attempting to "motivate" me when I'm already motivated

- manufacturing a sense of urgency, this is especially bad if you try to sustain this state indefinitely

- using AI or market conditions as a fear tactic to motivate the team

- visibly engaging in any kind of nepotism

Honestly this list could go on and on, but those are some that come to mind.


> manufacturing a sense of urgency, this is especially bad if you try to sustain this state indefinitely

Sadly, I have seen this in almost every startup I've been a part of that was led by founders without an engineering background.

In my personal experience, this is often caused by an overeager sales team promising the world for the next deal, only to fob it off to the engineering team, who now "urgently" need to build "features" and "work hard" to make it happen. This is when your intrinsically motivated engineers start looking for the exit.


Also:

- not letting me have ownership of what I build and dictating features

- not giving me autonomy of how to solve a problem


