Our newsroom AI policy (arstechnica.com)
175 points by zdw 14 hours ago | 114 comments



AI is in danger of peeing in its own water source. It's unbelievably useful at imitating and generating content, but it needs enough original content to train and scrape.

Google got one thing wrong and nearly destroyed the internet - people need to have an incentive to contribute content online, and that incentive should not be to game the system for advertising.

This in particular dawned on me when asking Claude for instructions on taking apart my dryer. There was literally only one webpage left on the internet with instructions for my particular dryer, and that page was more or less unusable, full of rotten links and riddled with adware. Claude did its best but filled in the missing diagrams with hallucinations.

I was imagining whether LLMs could finally deliver the micropayments system people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

It might not be a lot of money, but it would certainly be more than the pitiful ad revenue you get from posting content online right now. And if I wanted to upload corrected instructions for repairing this dryer, I would have a reason to.
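A back-of-the-envelope sketch of how such a pro-rata split might work, Spotify-style; all names and numbers here are hypothetical:

  // Hypothetical pro-rata split of one subscriber's monthly fee
  // across the sites an LLM drew answers from.
  const monthlyFee = 20.00;  // subscriber's payment, USD
  const platformCut = 0.30;  // share kept by the LLM provider
  const citations = {        // times each site was cited this month
    'repair-wiki.example': 40,
    'dryer-blog.example': 35,
    'forum.example': 25,
  };
  const pool = monthlyFee * (1 - platformCut);
  const total = Object.values(citations).reduce((a, b) => a + b, 0);
  for (const [site, count] of Object.entries(citations)) {
    console.log(site, (pool * count / total).toFixed(2));  // payout per site
  }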


> Paid out like Spotify pays out artists.

So, mostly to fraudulent AI spam?

AI makes this problem worse in both directions. It makes it fantastically easy to produce "content". So if you're scraping content, or browsing content, you're going to run into increasing amounts of AI. Micropayments make this worse, because they're then a means of getting paid to produce spam. The problem comes when you want the "content" to be connected to real questions like "how does my dryer work" or "what is going to happen to oil availability six months from now".

AI trainers didn't pay book authors until forced to. $3,000 ended up being a pretty high value! But it was also a one-off. Everyone writing books from now on is going to have to deal with being free grist to the machine.


> So, mostly to fraudulent AI spam?

Most of Spotify’s payments do not go to fraudulent AI spam.

I am aware that AI spam exists on the platform and I’ve read the articles, too. That does not mean that “most” of their payments go to AI spam.

Their pay scales by listens. The AI spam doesn’t collect many listens. The spammers do it because they can automate it and make it low effort, but it’s not a cash cow for the spammers.


An interesting listen about money laundering and spam in streaming services: https://darknetdiaries.com/episode/171/

I find that very believable. My completely unsubstantiated conspiracy theory is that OnlyFans is a money-laundering and dragnet-style-blackmail campaign for unlawful mass surveillance. I can’t imagine a normal or even abnormal person paying content makers, but I could imagine contractors and NGOs smurfing payments.

Spammers do it because it pays out.

I worked in music streaming for several years. Yes, there is spam, but in my experience it was less than 1% of total consumption, even if it is now a huge share of available content (a lot of it also seems to exist mostly for money laundering). And the share of revenue that Spotify and the other services pass on to rights holders is roughly on the scale of old brick-and-mortar retail.

But how people spend has changed. Indie music nerds used to spend much more than the average mainstream listener on records and CDs. Under streaming, both mostly pay the same subscription price, so enthusiasts spend less while casual listeners spend more. On streaming platforms payouts are tied to streaming consumption, not purchases, so music with strong branding, playlist support, and promotional backing does well, and the major labels are good at that.

What share of what Spotify pays out makes its way into the pockets of songwriters and musicians is a more complicated story: generally more if the artists are with a good indie label, generally less if they are with a major. At the same time, majors have had to offer less abusive deals than they used to, because DIY and indie distribution are more viable.

The other big shift is that in the retail days new releases drove most purchases, but with streaming, catalogue is a source of reliable recurring revenue, and the majors own a lot of catalogue, especially material they acquired outright in an era when artists often had their work basically stolen from them.

The key difference between Spotify and LLMs scraping the open internet is provenance. Music on Spotify does not just appear there out of nowhere. It arrives through an accountable chain: a label, a distributor, an aggregator, a publisher, a rights holder. Sometimes this chain is thin, as with self-serve, pay-to-publish distribution through companies like CD Baby. But most of what is actually streamed has a provenance that reflects serious editorial and financial commitment by an organisation, in the form of money spent recording, developing, and promoting an artist. This provenance chain is critical contextual information about who vouched for the work, who invested in it, who holds rights to it, and when it entered the culture. Art, music, and writing do not exist in a vacuum. They are part of an ongoing cultural conversation, and who said what, when, and under what institutional backing is integral to its meaning.

So I share OP's hope that the long-run equilibrium for LLMs looks more like licensed media than scraping and open web search. I want a world where models license published content from rights holders, not just for training (though that would be nice) but to surface answers with links to identifiable sources in a verifiable published database, and where part of my subscription pays for access to the underlying referenced material. Information is valuable, and it's reasonable to pay for it. Aligning incentives around truth is the challenge.

Putting ink on paper and moving books around is the least important part of what a publisher does. The important part is selection, investment, positioning, promotion, and accountability. This curatorial function has always been important, and it can only become more important as the tsunami of AI slop and misinformation grows. I hope that chatbot manufacturers partner responsibly with rights holders and lean into the value that publishers have created instead of potentially destroying it.


> Paid out like Spotify pays out artists.

As others said, Spotify pays artists shit, but maybe that's the problem with the whole thing here. It should be more like how Bandcamp pays artists (80% to the artists, 20% for Bandcamp), but then the rapacious economy supporting the largest LLM providers would collapse and (wipes away a single tear) we'd all have to use simpler, cheaper, most likely local models.


“Since Spotify pays out two-thirds of all music revenue to the industry – almost 70% of what we take in – as Spotify revenues grow, music payouts have grown as well.”

https://newsroom.spotify.com/2026-01-28/2025-music-industry-...

That’s not that far off from 80%.


I think people get distracted by the "percentage of revenue paid to musicians" thing, when the bigger reason streaming pays out so little to artists is that people pay $10-$15 per month for unlimited access to all music. Even 80% of that, split across dozens or hundreds of musicians, is not very much. Of course, it's also worth remembering that streaming was partially a response to widespread piracy. It's difficult to get people to pay very much at scale for easily copied digital media.
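To put rough (purely illustrative) numbers on that:

  // Why a high payout percentage still means tiny per-artist sums.
  const subscription = 12.00;  // monthly fee, USD
  const payoutShare = 0.80;    // a generous 80% to rights holders
  const artistsPlayed = 100;   // distinct artists streamed that month
  const perArtist = subscription * payoutShare / artistsPlayed;
  console.log(perArtist.toFixed(2));  // ~$0.10 per artist, per listener

Multiply that by a realistic listener count and it's clear why only artists with huge audiences see meaningful streaming income.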

In addition, a greater share of the payout (relative to number of streams) goes to big music distributors that control the biggest, most popular artists and have the leverage and employees to negotiate those agreements.


It's not evenly distributed. Big labels get much better payouts per listen than independent artists.

> Paid out like Spotify pays out artists.

That's probably not the best comparison. Spotify only benefits the big players, or rather those with the most bots. If you actually want to support specific artists, you'd have to use Bandcamp or similar sites.


There were a couple of proposals for compensating authors in a similar manner; there is a Wikipedia page on them: https://en.wikipedia.org/wiki/Copyright_alternatives - but it somehow does not mention the one that was most pro-sharing, the Creative Contribution by Philippe Aigrain: https://www.jstor.org/content/oa_book_monograph/j.ctt46mvx8

> Paid out like Spotify pays out artists.

As an artist, you don't want this. I promise you you don't want this.


I think most labs actively create synthetic data using existing models as part of the mix for the pretraining stage of their next model.

Would love to know exactly what the latest process is to keep slop out of training data.


I think everyone overblows the whole "AI is poisoning AI!" thing. It could be a problem, but the genuine value in Reddit or any other human social media is honestly pretty low by my estimate. It's great for seeing how humans talk, but in terms of 'nutritional' value for truth or answers, I am not sold. If I were choosing what to 'feed' AI, I wouldn't even bother with textual social media (besides GitHub / GitLab / other source control).

There's way more value, if seeking out answers, in following the links to external sources, scraping books, and other sources that aren't "unwashed masses saying whatever they want".


> the genuine value in Reddit or any other human social media is honestly pretty low from my estimates. It's great for seeing how humans talk but in terms of 'nutritional' value for truth or answers...

> ...

> scraping books, and other sources that aren't "unwashed masses saying whatever they want".

The problem is there's a lot of knowledge that only exists as reddit comments, blog posts, or social Q&A.


You can put it in scare quotes all you want, doesn't stop you from sounding like Scrooge McDuck.

const isAiContent = (str) => str.includes('—');?

:)


Latest generation LLMs use en dashes instead of em dashes to avoid detection.

No, they don’t. But obviously GP was tongue–—in-–cheek.

> in danger

It has already done so, and we can be confident in saying that.

Verified content will always be relatively expensive when compared to AI content.

Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

There are jokes about GenAI being the great filter; while I doubt this, I do hope this is the final push that makes us think about how we want our information commons to be nurtured.


> Verified content will always be relatively expensive when compared to AI content....

> Visits to Wikipedia and most sites have dropped. Rtings has gone full paywall. Ad revenue for producing Verified content will be too meager to allow for public consumption.

AI is a technology that's going to further entrench inequality, by warping incentives to push us further away from democratization. Unless you've got $$$ to drop on verified content, you'll be served prolefeed slop and be that much more ignorant.


At this point, it feels like most technology will be used in favor of people with power, and not in a democratizing manner.

I'd argue that this is something that is more about the state of play, than tech itself.


Tech has never been about democratization. Put elsewise: he who hath the trebuchet hath thy castle.

> I'd argue that this is something that is more about the state of play, than tech itself.

What do you mean by that? It seems inherent to the technology under capitalism: it allows a flood of slop and anything public and valuable will be plundered, so the incentive is to make valuable stuff exclusive and elite.


I mean that

> inherent to the technology

Vs

> inherent to the technology under capitalism

The TLDR of my point is going to be that wealth concentration and information pollution set up economies that don’t work for us in a manner that is healthy for us.


We need to find a way to incentivize progress that does not involve purely personal wealth.

> The TLDR of my point is going to be that wealth concentration and information pollution set up economies that don’t work for us in a manner that is healthy for us.

I agree. Though I think it's important to understand that a capitalist economy serves wealth, and nothing else. It's depressing, but I think it's more likely we'll have a genocide of workers than any kind of non-capitalist economy, since modern advances are simultaneously entrenching the power of elites and sapping it from everyone else. Even if you could overcome fragmentation and manage to organize a general strike, the trillionaires won't care because it's robots and thoroughly indoctrinated libertarians doing the remaining work.


> I was imagining whether LLMs could finally deliver the micropayments system people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

As a software user I wish I could do the same for all the software I use.


Many open source projects accept donations. There's also explicitly paid-for software. What exactly do you wish for that you can't do right now?

Specifically the part where engineers get paid the same way as artists on Spotify.

So a handful will make a buttload but the vast majority won't make enough to pay rent?

Certainly that's how open source pans out.

So not paid at all for their work, and with a reverse Robin Hood model? That would be terrible for software. The way artists get paid on streaming is a genius play at catering to the biggest artists and labels and screwing over the smaller ones, especially true on Spotify with their freemium model.

> I was imagining whether LLMs could finally deliver the micropayments system people have always proposed for the internet. Part of my monthly payment gets split between all of the sites the LLM scraped knowledge from. Paid out like Spotify pays out artists.

This system is usually called taxes.

Which then pay for the universal healthcare, free education, affordable housing, libraries, parks,.. and so on.

LLMs don't need to invent it; we should stop allowing them (the people and companies behind LLMs) to avoid it.


Self-contradictory policy.

> Reporters may use AI tools vetted and approved for our workflow to assist with research, including navigating large volumes of material, summarizing background documents, and searching datasets.

If this is their official policy, Ars Technica bears as much responsibility as the author they fired for the fabricated reporting. LLMs are terrible at accurately summarizing anything. They very randomly latch on to certain keywords and construct a narrative from them, with the result being something that is plausibly correct but in which the details are incorrect, usually subtly so, or important information is omitted because it wasn't part of the random selection of attention.

You cannot permit your employees to use LLMs in this manner and then tell them it's entirely their fault when it makes mistakes, because you gave them permission to use something that will make mistakes 100% without fail. My takeaway from this is to never trust anything that Ars reports because their policy is to rely on plausible generated fictional research and their solution to getting caught is to fire employees rather than taking accountability for doing actual research.

---

Edit: Two replies have taken issue with the fact that I didn't quote the following sentence, so here you go:

> Even then, AI output is never treated as an authoritative source. Everything must be verified.

If I wasn't clear, I consider this to be part of what makes the policy self-contradictory. In my eyes, this is equivalent to providing all of your employees with a flamethrower, and then saying they bear all responsibility for the fires they start. "Hey, don't blame us for giving them flamethrowers, it's company policy not to burn everything to the ground!". Rather than firing the flamethrower-wielding employees when the inevitable burning happens, maybe don't give them flamethrowers.


Is this a case of "moral crumple zones", where "responsibility for an action may be misattributed to a human actor who had limited control over the behavior"? https://www.researchgate.net/publication/351054898_Moral_Cru...

Related concept: Unaccountability machines [0] where the system (electronic or organizational) mainly exists to make things nobody's fault.

There's a Discworld bit [1] that often comes to mind for me, where the protagonist is reading a press release by a corporate communications monopoly:

> The Grand Trunk’s problems were clearly the result of some mysterious spasm in the universe and had nothing to do with greed, arrogance, and willful stupidity. Oh, the Grand Trunk management had made mistakes—oops, “well-intentioned judgments which, with the benefit of hindsight, might regrettably have been, in some respects, in error”—but these had mostly occurred, it appeared, while correcting “fundamental systemic errors” committed by the previous management. No one was sorry for anything, because no living creature had done anything wrong; bad things had happened by spontaneous generation in some weird, chilly, geometric otherworld, and “were to be regretted.”

[0] https://press.uchicago.edu/ucp/books/book/chicago/U/bo252799...

[1] Going Postal by Terry Pratchett


Sidebar: I like “moral crumple zone” much more than “moral hazard”, just because it conjures up a much clearer picture of the problem it depicts.

> You cannot permit your employees to use LLMs in this manner and then tell them it's entirely their fault when it makes mistakes, because you gave them permission to use something that will make mistakes 100% without fail.

Yes you can. The same way Wikipedia (or, way back when, a paper encyclopedia) can be used for research but you have to verify everything with other sources, because it is known there are errors and deficiencies in such sources. Or using outsourced dev resources (meat-based outsourced devs can be as faulty as an LLM, some would argue sometimes more so) without reviewing their code before implementing it in production.

Should they also ban them from talking to people as sources of information, because people can be misinformed or actively lie, rather than instead insisting that information found from such sources be sense-checked before use in an article?

Personally I barely touch LLMs at all (at some point this is going to wind up at DayJob, where they think the tech will make me more efficient…) but if someone is properly using them as a different form of search engine, or to pick out related keywords/phrases that are associated with what they are looking for but they might not have thought of themselves, that would be valid IMO. Using them in these ways is very different from doing a direct copy+paste of the LLM output and calling it a day. There is a difference between using a tool to help with your task and using a tool to be lazy.

> it's company policy not to burn everything to the ground!

The flamethrower example is silly hyperbole IMO, and a bad example anyway, because everywhere potentially dangerous equipment is actually made available for someone's job you will find policies exactly like this. Military use: “we gave them flamethrowers for X and specifically trained them not to deploy them near civilians; the relevant people have been court-martialled and duly punished for the burning down of that school”. Civilian use: “the use of flamethrowers to initiate controlled land-clearance burns must be properly signed off before work commences, the work should only be performed by those who have been through the full operation and safety training programs, and never without an environmental risk assessment”.


> The same way Wikipedia can be used for research

Before LLMs, Wikipedia was the greatest source of disinformation in human history. No journalist should ever have been using it for research. At best it's a fun project for satisfying people's idle curiosity where the truth of what they read doesn't really matter, but if your job is to report factual information, reading Wikipedia is doing a disservice to yourself and your readers. Just like people don't properly verify the BS LLMs fabricate, very few people thoroughly read the citations on Wikipedia, which often involves purchasing books and getting access to research papers. If they did read citations, they would realise that Wikipedia citations are all too frequently unsupported by the actual material they're citing, or in some cases, the cited material establishes the exact opposite. This is to say nothing of cherry-picking sources, of course.

> Should they also ban them from talking to people as sources of information, because people can be misinformed or actively lie, rather than instead insisting that information found from such sources be sense-checked before use in an article?

Statements made by people are attributed to those people to account for this. Rather than saying "Company X's product is the safest product ever made", a journalist says "Company X's CEO claims their product is the safest ever made". People do not do this with LLMs or Wikipedia, rather than attributing it to an understood-to-be-unreliable source they just present it as a factual statement. Also, if the journalist has good reason to believe the quoted statement is false, it is in fact journalistic malpractice to cite the quote wholesale without caveats informing the reader of the evidence that the quoted person is trying to mislead them.

> because everywhere where potentially dangerous equipment is actually made available for someone's job you will find policies exactly like this

Which maybe makes sense when the flamethrowers are a necessary part of the job. Flamethrowers are not necessary for journalism, full stop, so the fault rests with the organization introducing a dangerous tool into the work environment unnecessarily.


> Yes you can. The same way Wikipedia (or, way back when, a paper encyclopedia) can be used for research but you have to verify everything with other sources because it is known there are errors and deficiencies in such sources.

I think that if Wikipedia had no recommendations on good sources for its own articles and never banned sources, companies would not be so sanguine about letting people use Wikipedia. There's an entire internal process for evaluating sources, and the expectation when using Wikipedia is that nothing written in an article is going to be sourced from the Daily Mail or Conservapedia, as an example. Also, I do think that there are companies that have policies against talking to known liars. Given that Wikipedia bans sources and news agencies ban human sources once they've been shown to be unreliable, I don't think it's insane for such companies or agencies to say that AI shouldn't be used because it's been shown to be unreliable. Obviously there's a balancing act of utility versus accuracy, and Ars has (probably incorrectly) decided that the utility of AI outweighs its inaccuracies.

What is frustrating is that AI cannot achieve higher accuracy than the median reporter given a little more time. AI is trained on all digitizable text, including falsehoods and inaccuracies by laypeople. Humans can look up digitizable text using search engines, too, and an AI can't follow up on leads or ask anyone questions. There's no world in which synthesizing available data from digitized sources alone ends up more accurate than a human with a search engine and the ability to make a phone call. So allowing LLM use at all is a direct admission that seeking out the "truth" is not an important goal, because it could never actually improve accuracy and could only worsen it through hallucinated but plausible reporting. It's one thing when companies say they're committed to truth while secretly their most important overriding concern is the bottom line; it's quite another when a company directly says that the bottom line is its most important concern. Imagine the emperor walking through the parade, nude, saying "So what if I am nude? What are you going to do about it?"


> companies would not be so sanguine about letting people use Wikipedia

Are companies sanguine about using Wikipedia without verification? Maybe some, but they darn well shouldn't be. And I say this as someone who uses Wikipedia for many minor things (though for anything important, I verify elsewhere).

> Also, I do think that there are companies that do have policies against talking to known liars.

No doubt most/all. But such policies will always be caveated with exceptions if the information is properly validated afterwards.

> So allowing LLM use at all is a direct admission that seeking out the "truth" is not an important goal because it could never actually improve accuracy and could only worsen it through hallucinated, probable reporting.

I'm generally anti-LLM, but this is… ad absurdum.

There is a huge difference between lazily accepting what an LLM spews out and using it along with other sources for further research. No good reporter will trust a single source outside exceptional circumstances, whether that source is a person or an LLM, and what would be considered “exceptional circumstances” for trusting specific meat-sourced information won't apply to an LLM-sourced summary.

If you can trust Wikipedia as a starting point, you can trust a good LLM as a starting point. Both are offering a summary of what a bunch of people on the internet have written, neither should be trusted as a reliable source.

> I don't think it's insane to then have such companies or agencies say that AI shouldn't be used because it's been shown to be unreliable

That is an absolutist approach. I would be a little more qualified and say that LLM output should never be used without verification of all details, rather than not used at all. It may be that this verification makes using LLMs no more efficient than doing the research from other sources in the first place, and I suspect that this is often the case when proper time is given to verifying the output.

The problem is people misunderstanding what an LLM is: a summariser, offering access to a compressed version of its sources. If you are using them as sources rather than summarisers, then you are using them wrongly. Unfortunately, that means a great many people are using them wrongly…


The next sentence after your quoted section:

“Even then, AI output is never treated as an authoritative source. Everything must be verified.”


Any verification process thorough enough to catch all LLM fabrications would take more work than simply not using the LLM in the first place. If anything verifying what an LLM wrote is substantially more difficult than just reading the material it's "summarising", because you need to fully read and comprehend the material and then also keep in mind what the LLM generated to contrast and at that point what the fuck are you even doing?

I believe this policy can never result in a positive outcome. The policy implicitly suggests that verification means taking shortcuts and letting fabrications slip through in the name of "efficiency", with the follow-up sentence existing solely so that Ars won't take accountability for enabling such a policy but instead place the blame entirely on the reporters it told to take shortcuts.


The LLM can find material that would be hard or time-consuming for you to find yourself.

You still need to verify it, but "find the right things to read in the first place" is often a time intensive process in itself.

(You might, at that point, argue "what if the LLM fails to find a key article/paper/whatever", which I think is both a reasonable worry and an unreasonable standard to apply. "What if your Google search doesn't return it" is an obvious counterpoint, and I don't think you can make a reasonable argument that journalists should be forced to cross-compare SERPs from Google/Bing/DuckDuckGo/AltaVista or whatever.)


I believe their point is that if you give people an "extract-needle-from-haystack" machine and then tell them they have to manually find where in the haystack the needle was, it defeats the purpose of having the machine.

With that said, a good RAG solution would come with metadata to point to where it was sourced from.
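A minimal sketch of the shape such a result could take, assuming a retriever that returns passages together with source metadata (the retriever.search API here is made up for illustration):

  // Hypothetical retrieval result that keeps provenance, so a human
  // can jump straight to the spot in the haystack and verify it.
  const results = await retriever.search('dryer drum removal', { topK: 3 });
  for (const hit of results) {
    console.log(hit.text);                         // the extracted passage
    console.log(hit.source.url, hit.source.page);  // where it came from
  }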


> I believe their point is that if you give people an "extract-needle-from-haystack" machine and then tell them they have to manually find where in the haystack the needle was, it defeats the purpose of having the machine.

We've got to be careful to not let the perfect be the enemy of the good.

I'm not an LLM enthusiast, but I think you have to actually compare it against what the alternative would really be. If you give the journalist a haystack but insufficient time to search it properly, they're going to have to take some shortcut. And using an LLM to sort through it, then verifying it actually found a needle, is probably better than sampling documents at random or searching for keywords.


This is a bogus analogy leading to a bogus conclusion.

If something points to the needle in the haystack (saying "this haystack has a needle positioned eighteen centimeters from the top and three left of center"), it's much easier to verify that indeed there is a needle there than it would be to find that needle in the first place.

If an LLM spits out a claim that something happened (citing a certain article), it's less work to read the article and verify the claim than it would be to DISCOVER the article in the first place.

In other words, LLMs can be a time-saving search engine, and the idea that it's just as much work to find+verify information as it is to have the LLM find it and then you verify it is hokum.


I don't want to come off as an AI-maximalist or whatever, but, I mean, at some point, skill issue, right?

You can use Google to find you results reinforcing your belief that the earth is flat too; but we don't condemn Google as a helpful tool during research.

If you trust whatever the LLM spits out unconditionally, that's sorta on you. But they _can_ be helpful when treated as research assistants, not as oracles.


When you use the extract-needle-from-haystack machine, verify that it actually extracted a needle.

That's much easier than manually extracting the needle yourself.


Another interpretation is that if you have multiple haystacks, the machine tells you which haystack likely has a needle in it. You still need to extract the needle yourself.

> Any verification process thorough enough to catch all LLM fabrications would take more work than simply not using the LLM in the first place

Sometimes you have a weak hunch that may take hours to validate. Putting an LLM on the preliminary investigation can be fruitful. Particularly if, as is often the case, you don't have one weak hunch but a small basket of them.


You can prompt LLMs to scan thousands of documents to generate text validating your hunches. In some cases those validated hunches may even be correct.

It's easy to get an LLM to make any argument you like based on whatever data is available. Those arguments are going to be trivially bad if that data is bad.

It's more like using LLMs as a metal detector, rather than digging through the entire beach yourself.

You still need to check the junk you dig up using the metal detector.


Disagree. If I’m a reporter trawling through a mass data dump - say the Epstein files or WikiLeaks or statistics on environmental spills or something - using AI to pull out potential patterns in the data or find specific references can be useful. Obviously you then go and check the particular citations. This still saves a lot of time.

> I believe this policy can never result in a positive outcome.

I get where you're coming from (I'm learning more and more over time that every sentence or line of code I "trust" an AI with, will eventually come back to bite me), but this is too absolutist. Really, no positive result, ever, in any context? We need more nuanced understanding of this technology than "always good" or "always bad."


If you need accuracy, an LLM is not the tool for that use case. LLMs are for when you need plausibility. There are real use cases for that, but journalism is not one of them.

I didn't say in any context. I'm specifically talking about this policy on journalistic research.

> permit

I suspect that it's more like "ordered."

Many companies are now requiring staff to use AI.

Of course, if the employee screws up, then The Secretary Will Disavow Any Knowledge of Your Activities.


> ...this is equivalent to providing all of your employees with a flamethrower, and then saying they bear all responsibility for the fires they start.

This is essentially the policy of most SWE groups with respect to merged PRs, though, right?

You can / should use AI to accelerate SWE workflows and assist in reviews but if you merge something that is bad or breaks production that is on you.

> "Hey, don't blame us for giving them flamethrowers, it's company policy not to burn everything to the ground!".

Flamethrowers are inherently dangerous to the operator and are ~intended to be used to burn things to the ground.

I'm no expert on arms but there is probably a simile with a better fit out there.


The way LLMs are being forced upon the workforce in tech is just as bad, actually, yes.

> Flamethrowers are inherently dangerous to the operator and are ~intended to be used burn things to the ground.

I actually think bringing up this point reinforces the analogy rather than undercutting it. LLMs are ~intended to spread disinformation, eg. Deepseek on 1989, Grok going full Mecha-Hitler, ChatGPT selling out prompts to advertisers. One of the biggest impacts LLMs will have on human society is as a propaganda tool with a reach of billions.


I have a little bit of a bias here, as I am building forth.news, which is an AI-powered news platform -- but I am also a former journalist.

It's not necessarily contradictory. I see this more like giving your employees cars, but telling them they are responsible if they get into accidents.

All of this is entirely predicated on expectation and responsibility. First, mark something as being AI if it cannot be verified, and verify everything that you can.

Forth is using AI so we can detect and push out stories as quickly as possible, getting breaking news out there as soon as it breaks. Our summaries are AI, but marked as AI. Our underlying source information is right there and cited. We try to be as transparent as possible about the tools we are using, and the tradeoffs.

Every journalist should instinctively and reflexively double check everything, regardless of the source. There's an old maxim, "if your mom tells you she loves you, check it out." Being from an LLM doesn't change that.


I'm not a journalist and just for random things I'm interested in, I have no problem using an LLM to point me in a direction and then directly engage with the source rather than treat any of the LLM output as authoritative. It's easy to do. This is not a flamethrower.

> the author they fired for the fabricated reporting

Didn't one of the magazine's editors share the byline?


Yes but editors don't take the fall, they take the credit.

Everything occurred exactly as predicted.


They're also allowed to use Wikipedia for research. It has similar sorts of problems.

> LLMs are terrible at accurately summarizing anything.

I think you are perhaps stuck in 2023?


And yet we are discussing this in the context of a reporter having been fired from Ars Technica for publishing an article which included inaccurate LLM-generated summaries in 2026. How come?

https://news.ycombinator.com/item?id=47226608


Maybe you should read the article? :)

What failed was extracting verbatim quotes, not summarizing.

If you want an LLM to do verbatim anything, it has to be a tool call. So I’m not surprised.
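For instance, a quote-checking tool call might wrap something like the following sketch, which verifies candidate quotes against the fetched source outside the model (function shape is illustrative; real code would strip markup before matching):

  // Verify a candidate quote verbatim against the source page,
  // rather than trusting the model to reproduce it from memory.
  async function verifyQuote(url, candidateQuote) {
    const res = await fetch(url);
    const text = await res.text();
    return text.includes(candidateQuote);  // exact match or reject
  }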


> LLMs are terrible at accurately summarizing anything. They very randomly latch on to certain keywords and construct a narrative from them, with the result being something that is plausibly correct but in which the details are incorrect, usually subtly so, or important information is omitted because it wasn't part of the random selection of attention.

I don't know what you've been doing, but the summaries I get from my LLMs have been rather accurate.

And in any event, summaries are just that - summaries.

They don't need to be 100% accurate. Demanding that is unreasonable.


The LLM meeting-summary bot in Teams seems accurate… unless you were in the meeting, and also closely read the summary afterward. It misrepresents what people actually said all the time.

Depends on the topic; often what they consider important isn't what is important, and essential details drop out of view. I'm having good success with YouTube videos, not as much with technical docs.

Yes, search and summarization is where LLMs shine. I use them all the time for that, and much less for code generation. I would say search > summarization > debugging > code gen/image gen

>They don't need to be 100% accurate. Demanding that is unreasonable.

If an intern was routinely making up stuff in the summaries they provided to their bosses, they'd be let go.


AI policy with AI usage is always difficult to write and read: lengthy, frontloaded with excuses, values, and big words, followed by more words to fill the gaps between slices of ugly truth.

AI policy without AI usage is easy to read and write.

> We don't use them. That's it.


All the beating around the bush is because they use AI throughout the process, but want to frame it as not being written by AI (oh, a human signs off on the AI content).

Related discussions from a couple months ago:

Ars Technica fires reporter after AI controversy involving fabricated quotes (606 points, 394 comments)

https://news.ycombinator.com/item?id=47226608

Editor's Note: Retraction of article containing fabricated quotations (308 points, 211 comments)

https://news.ycombinator.com/item?id=47026071


First they broke page loading without JavaScript, now they're slopifying their reporting.

\1 AI-generated news is unhuman slop. Crikey is banning it (2024) - Crikey.com.au - https://www.crikey.com.au/2024/06/24/crikey-insider-artifici...

\2 Why Crikey retracted an article that we found out was written with AI help (2026) - https://www.crikey.com.au/2026/03/19/crikey-responds-to-ai-c...

  Yesterday, we published an article by a contributor who later confirmed they used AI in some aspects of its production.

  This goes against our editorial policies. As a result, we’ve taken down the story and the preceding three stories in the series.
(\2) is an interesting follow-on from the policy set two years earlier (\1), as the specific piece in question "used AI in some aspects of its production" but was largely conceived, shaped, and written by a human, and only "assisted" by AI.

The Australian Media Watch team looked at this tension closely and felt the retraction was unfair, pointing out that while slop is bad, assistance (subject to terms and conditions) can enhance.

- Media Watch, likely geolocked to AU, might need a proxy - https://www.abc.net.au/mediawatch/episodes/ep-08/106487250


> AI tools must not be used to generate, extract, or summarize material that is then attributed to a named source, whether as a direct quote, a paraphrase, or a characterization of someone’s views.

This sounds overfit to their earlier incident.

Besides, I expect they already have a policy on accurate quotes.

> Anyone who uses AI tools in our editorial workflow is responsible for the accuracy and integrity of the resulting work.

AFAICT that's the actual simplified policy. Reasonable!


like ok? awfully self-congratulatory for the bare minimum of doing your job.

Context:

"An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library.

...

I’ve talked to several reporters, and quite a few news outlets have covered the story. Ars Technica wasn’t one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down – here’s the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves.

This blog you’re on right now is set up to block AI agents from scraping it (I actually spent some time yesterday trying to disable that but couldn’t figure out how). My guess is that the authors asked ChatGPT or similar to either go grab quotes or write the article wholesale. When it couldn’t access the page it generated these plausible quotes instead, and no fact check was performed.

...

Update: Ars Technica issued a brief statement admitting that AI was used to fabricate these quotes" [1].

[1] https://theshamblog.com/an-ai-agent-published-a-hit-piece-on...

Discussion: https://news.ycombinator.com/item?id=47009949
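(As an aside, the scraping block described in the post is typically a robots.txt rule along these lines, which well-behaved crawlers honour and misbehaving ones ignore:

  # robots.txt: ask common AI crawlers to stay out
  User-agent: GPTBot
  User-agent: CCBot
  Disallow: /

though blog platforms often expose it only as an opaque toggle, which may be why the author couldn't disable it.)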


> Cmd-F "fact check"

No results


I have had that feeling for a while and Ars had some high-profile slipups over the past few months. This confirms that it is just slop now :(

> AI-powered tools may be used to assist with editing and workflow

I'm glad they're declaring outright that they're (as we suspected for a while now) a slop factory. Onto the blacklist they go.


> Our creative team may use AI tools in the production of certain visual material, but the creative direction and editorial judgment are human-driven.

How is this different from anyone else publishing AI slop images on their blog? Those people also direct the AI through prompting and evaluate the results.

I mean, use AI images, so long as they are not crap, but why keep up this charade of "we're authoring the slop".


Trust, reputation, and credibility will become (even more of) a premium.

These statements seem to be in contradiction

> The short version: Ars Technica is written by humans. AI doesn’t write our stories, generate our images, or put words in anyone’s mouth.

> Our creative team may use AI tools in the production of certain visual material

Something that bothers me is that people who value their organic output (Ars is the written word; obviously they care more about the writing than the thumbnails) seem to treat their own work as more deserving of human generation than the associated work. Which seems fine on its face, but the problem is that it's devaluing the marketplace for original human content.

Paying people to create original media even when it's not the primary output of your media organization is important to keeping the craft alive in general.

I see this all the time locally, where arts organizations will use generative AI for everything but their own product, not realizing that the very use of it at all is destroying art.


Good to know which companies/news to avoid.

Drivel. Either forbid it from the newsroom entirely or call yourself what you are: a slop shop.

Newspapers have existed for hundreds of years without AI. They don't need it now. Employ humans. Human journalists. Human editors.

I suppose I'll take a moment to plug my HN userscript "HN Blacklist": https://greasyfork.org/en/scripts/427213-hn-blacklist

If you want to not see articles from Ars Technica on HN, simply install the userscript and add:

  source:arstechnica.com

AI for writing feels like such a stupid idea.

When you write things you just need to think about what you want to say and write it. It's not hard.

If you make an LLM generate the text, and you still bear responsibility for whether it's correct or not, you haven't actually reduced the amount of work you have to do at all because now instead of "thinking -> writing" you need "reading -> thinking -> rewriting."

"Reading is easier than writing" sounds like something that someone who never writes anything would think. In both code and natural languages writing is easier than reading. Text is after all a representation of thought, so to write you just need to represent your thoughts in a way others can understand, but in order to read you must decipher what the author was trying to convey from their written word. If the author is an LLM that is going to be a lot more difficult and annoying.

I've had similar feelings about LLMs "completing" code. It "feels" like I'm more productive, but at the same time the "completions" feel more like constant interjections that won't let me get into a flow of writing. Often when things don't work, it's because the AI quickly generated code that looked correct at first glance, so I left it that way, but when I tried to run the program it turned out to be wrong. When that happens I can't help but think: of course it's wrong, I haven't written the code yet. There is code; the LLM wrote it. But I haven't written it. I haven't done the work. So it's like having a "placeholder" that you still HAVE to check, but there is no metadata that says this is a WIP, no # TODO: check the LLM output. It's a placeholder camouflaged as finished work.


> When you write things you just need to think about what you want to say and write it. It's not hard.

At least for technical writing at work (internal docs, for example), I typically have an extended back-and-forth conversation with Claude refining my ideas, then I have it sketch out the outline, which saves _so much time_. Then all I have to do is fix what it got wrong or add things that it missed.

I am not doing this as a creative expression that I find personal fulfillment in, I am doing it to check off a box on a jira issue.


Doesn't need Ars Technica added to the title

My news reading AI policy: I will use whatever source for news, AI or human, that gives me the best news. A lot of the time it's human; for example, I appreciate diverse human comments on ycombinator. CNN used to have comment threads on articles and then gave that up because some comments were spicy, so I stopped reading. I don't remember the last time I went to arstechnica, I guess because they didn't stand out compared to just asking Grok what's new in tech or browsing Reddit. If they could have used more AI to make their site more interesting, they should have.

It is nice to see, but I fear it will go the same way as printed newspapers once news moved to the internet. I could buy a paper and read it, but why would I?

The same will most likely happen with human-written news and cheap AI slop news. Why would anyone pay more for higher quality when you can have a cheap, low-quality product?

Look at food, for example. Price is the most important factor in the choice of what you are going to buy. It will probably not happen now, in a few months, or even in a few years, but it will happen if models keep advancing.


Food would actually be a pretty good example - people pay extra for higher quality food, local farm food, whatever all the time. They go out to expensive restaurants that talk up their techniques and sourcing. There's a lot of defined space for refined food like this.

If ai generated and human written content ended up like that you would have a pretty decent shot at a fully human authored blog or substack that people paid for specifically, or human written books specially curated.


You could say the same about horses: people still ride them, keep stables, buy expensive ones, or even breed them themselves. That does not change the fact that common people usually drive cheap cars.

Of course everything can be argued via analogy that way, but I think the outcome of cheap, mostly correct but often completely wrong news is more probable.

Just like today's social media.


From TFA:

> Our approach comes from two convictions:

Uhuh.

> that AI cannot replace human insight, creativity, and ingenuity

Sure, agreed, no dispute.

> and that these tools, used well, can help professionals do better work.

[[citation needed]]

Prove it. I don't believe you.


If AI slop-generation helps your work, then your work was at least partially generating slop.

Slop may be useful, helpful, any number of things, but if you're using slop you're using slop.


> Our creative team may use AI tools in the production of certain visual material, but the creative direction and editorial judgment are human-driven.

As opposed to what? This is a little facetious, but what could it possibly mean to have creative direction and editorial judgement without human involvement?

Presumably we're talking about an image generated by a diffusion model or something, but further, an image generated without being edited by any human. The prompt used to generate the image isn't written by a human, and it can't really be based on the contents of the (human-authored and -edited) article either. No human may select the service or model used, and once generated the image is published sight unseen, without being reviewed by any human.

If some kind of agentic AI does any of these things it is one which appears ex nihilo, spontaneously appearing without being created or directed by any human.


There's a good post from Aurich in the comments of the article detailing the practical reality of how they (don't) use AI tools in their image work, but as a policy statement this sentence is 100% vibes, 0% actual guidance or restriction.

> Anyone who uses AI tools in our editorial workflow is responsible for the accuracy and integrity of the resulting work. This responsibility cannot be transferred to colleagues, editors...

This sounds like a direct callout to the incident earlier this year where an apparently sick staff member relied on an AI to reproduce quotes, and it did not. Ars retracted the article and the staff member was fired.

I have felt very ethically uneasy about this because the person was ill, and I emailed the Ars editorial team directly to express concern re labour conditions, and to note that it is the editorial team's responsibility to do things like check quotes.

Of course it is the journalist's responsibility: when you have a job, you do your job by policy (I wonder if this policy existed in writing at the time of the firing?), and it is part of the job to be accurate. But I am also a firm believer in responsibility being greater at higher levels. This sounds like a direct abrogation of journalistic standards by the Ars editorial team.


> and to note that it is the editorial team's responsibility to do things like check quotes.

Publishing things online for free (as Ars does) is difficult business. I doubt they can realistically afford an "editorial team" which checks quotes. Paying the journalists is expensive enough.


> apparently sick staff member relied on an AI to reproduce quotes

"Apparently sick", you couldn't phrase it more accurately.

Kudos for firing them, the only valid course of action for a publisher.


That's harsh. I feel any situation where someone is ill and required to work (the appearance, which is a labour issue if true), and makes mistakes while sick, should be treated with a little kindness. I worry they were made an example of.

>This sounds a direct abrogation of journalistic standards by the Ars editorial team.

We depended on an ecosystem of news and journalism to keep our polities informed.

However, if that ecosystem is starving it will increasingly fail to live up to its standards and we can expect these failures to impact us increasingly.

I am not defending bad journalists, nor creating an excuse to tolerate such behavior in the future.

I am describing the macro trend we are facing, the failure state we can expect, and asking what happens if nothing grows to replace it.

The NYT earns revenue through games more than journalism and ads. Wikipedia is seeing reduced visitors due to AI summaries, and this leads to lower donations. A review site I used went into a full paywall.

I don't really see how Ars or most other sites will be able to earn revenue and pay salaries in this bot-first environment.


>We depended on an ecosystem of news and journalism to keep our polities informed.

If this is true and necessary we might as well skip the middleman and have the news and journalists run the polities.


Ars has a decently pricey direct subscription, doesn't it? With a lot of tech focused features included. Their strategy is probably the best you could set up in this ecosystem.

If it isn't clear from this policy that Ars is run by the advertisers and not the subscribers, I don't know what would make it clear.

Advertisers only care about eyeballs and really bad press; AI increases the first and rarely causes the second.


My more cynical take is that this might be as subscriber driven as it's possible for a news outlet now. Keep an eye on 404 and see if they can resist the gravity of ads, I guess?

I agree with you; what I am noting is that traditional journalism ethics (editors are responsible for fact checking) is explicitly refused by this policy.

They can simultaneously set standards for their staff -- as they should -- and retain professional standards for the more senior staff as well.

To remove responsibility from those more senior and make those more junior the only ones responsible is in any company a serious professional issue. Here it is also specifically contrary to the professional standards in their business area.

I see my parent comment is downvoted. Yet, this is firmly the ethical and professional and traditional stance. I don't believe AI or any random upcoming technology should change this.



