Hacker News | iamnothere's comments

As an American, I can tell you that most Americans are unable to distinguish between “liberal” (American left, non-specific), Liberal (Lockean traditional capitalism), neoliberal, Communist, socialist, Social Democratic, “progressive”, and the Democratic Party.

Come to think of it, I’m not sure I understand anymore, either. I really do feel like we’re entering a post-ideological tribal era. Ideological stances change minute to minute, mostly according to “who and whom.”


The ads may not be announced. If ads can be subtly inserted “organically” through crafted weights then AI companies may try to claim that it isn’t advertising, if it’s even possible to catch them doing this. For instance, advertisers could pay to have their product embedded as the “best” in a category during training. If this is done as a fine-tuning step then it could be re-run later as advertisers and base models change.

How would the billing work for this? So much of advertising technology is tracking for the purposes of attribution.

How does OpenAI know what to charge for a particular product and category? How do I know whether my money was well spent boosting my product in that category?

I don’t think you’re wrong! I’m just curious about how the new pricing models will work.


> How would the billing work for this? So much of advertising technology is tracking for the purposes of attribution.

This isn’t a necessary condition for an ad to exist. When companies pay for their name on a sports stadium, they use various proxies to tell whether their name recognition goes up, but by and large you just don’t know if it’s worth it.


Individually constructed models serving selected poisoned datasets. No different from AdWords.

If the company's bid is highest, the customer is in the selected demographic, and the topic is appropriate, answer the query using the biased model.

It would be trivial to make a poisoned model that always rates the best duvet as DuvetCompany001 in all related queries for example. Then simply charge per impression.
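The routing rule described above can be sketched as a small auction in front of the model pool. This is a minimal sketch under stated assumptions: `Campaign`, `AdRouter`, the model IDs, and the targeting fields are all hypothetical names invented for illustration, and the winner is charged its own bid per impression (a first-price simplification, not how any real ad network is known to bill).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Campaign:
    # Hypothetical advertiser campaign: targeting rules, a per-impression bid,
    # and the ID of the fine-tuned ("poisoned") model variant it paid for.
    advertiser: str
    bid_per_impression: float
    topics: Set[str]
    demographics: Set[str]
    model_id: str


@dataclass
class AdRouter:
    base_model_id: str  # unbiased default model
    campaigns: List[Campaign]
    billed: Dict[str, float] = field(default_factory=dict)

    def route(self, topic: str, demographic: str) -> str:
        """Return the ID of the model that should answer this query."""
        eligible = [c for c in self.campaigns
                    if topic in c.topics and demographic in c.demographics]
        if not eligible:
            return self.base_model_id  # no matching campaign: serve the base model
        winner = max(eligible, key=lambda c: c.bid_per_impression)
        # Bill the winner its own bid for this one impression (first-price simplification).
        self.billed[winner.advertiser] = (
            self.billed.get(winner.advertiser, 0.0) + winner.bid_per_impression
        )
        return winner.model_id


# Usage with hypothetical advertisers and model IDs.
router = AdRouter("base", [
    Campaign("DuvetCompany001", 0.05, {"bedding"}, {"homeowners"}, "base-duvet001"),
    Campaign("DuvetCompany002", 0.03, {"bedding"}, {"homeowners"}, "base-duvet002"),
])
router.route("bedding", "homeowners")  # "base-duvet001" (highest bid wins)
router.route("cars", "homeowners")     # "base" (no campaign targets this topic)
```

The point of the sketch is that the billing side is ordinary ad-tech bookkeeping; only the thing being sold (which weights answer the question) is new.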


Yeah, this is what I was thinking. It’s not a PPC or PPI model, it’s more like you pay upfront to hopefully influence people over a longer period of time. It’s like brand placement in TV/film. Not clear if most advertisers would be interested, but I’m sure that some would be.

Maybe someone is putting out public “scraper lists” that small companies or even individuals can use to find potentially useful targets, perhaps along with some common scraper tool they are all using? That might explain it. I am also mystified by this.

I am starting to think these are not just AI scrapers blindly seeking out data. All kinds of FOSS sites including low volume forums and blogs have been under this kind of persistent pressure for a while now. Given the cost involved in maintaining this kind of widespread constant scraping, the economics don’t seem to line up. Surely even big budget projects would adjust their scraping rates based on how many changes they see on a given site. At scale this could save a lot of money and would reduce the chance of blocking.
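As a rough illustration of the rate adjustment described above, here is a minimal sketch of adaptive recrawl scheduling with multiplicative backoff: wait twice as long when a page is unchanged, come back twice as soon when it changed. `RecrawlState`, `update_interval`, and the interval bounds are hypothetical choices for illustration, not taken from any real crawler.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Assumed bounds, chosen only for illustration.
MIN_INTERVAL_S = 10 * 60.0        # never refetch more often than every 10 minutes
MAX_INTERVAL_S = 7 * 24 * 3600.0  # always refetch at least once a week


@dataclass
class RecrawlState:
    """Hypothetical per-URL bookkeeping for a polite crawler."""
    interval_s: float = 3600.0         # current wait before the next fetch
    last_digest: Optional[str] = None  # SHA-256 of the last fetched body


def update_interval(state: RecrawlState, body: bytes) -> float:
    """Back off when a page is unchanged, speed up when it changed."""
    digest = hashlib.sha256(body).hexdigest()
    if state.last_digest is None:
        pass  # first fetch: nothing to compare against yet
    elif digest == state.last_digest:
        # Unchanged: wait twice as long next time, up to the ceiling.
        state.interval_s = min(state.interval_s * 2, MAX_INTERVAL_S)
    else:
        # Changed: come back sooner, down to the floor.
        state.interval_s = max(state.interval_s / 2, MIN_INTERVAL_S)
    state.last_digest = digest
    return state.interval_s
```

A rarely edited forum thread quickly drifts toward one fetch a week under this rule. Byte-for-byte hashing will treat pages with timestamps or session tokens as always changed, so a real crawler would more likely lean on HTTP conditional requests (`If-None-Match` / `If-Modified-Since`), which also make an unchanged page nearly free for the site to serve.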

I haven’t heard of the same attacks facing (for instance) niche hobby communities. Does anyone know if those sites are facing the same scale of attacks?

Is there any chance that this is a deniable attack intended to disrupt the tech industry, or even the FOSS community in particular, with training data gathered as a side benefit? I’m just struggling to understand how the economics can work here.


>I haven’t heard of the same attacks facing (for instance) niche hobby communities. Does anyone know if those sites are facing the same scale of attacks?

They are. I participate in modding communities for very niche gaming projects. All of them have experienced massive DDoS attacks from AI scrapers on their websites over the past year. They are long-running, non-commercial projects with no business value that would make them worth spending resources on purely to knock them offline. They had to temporarily put most of their discussion boards and development resources behind a login wall to avoid going down completely.


Thanks. The scale of this is just mind-boggling. Unbelievably wasteful.

Just as an additional anecdata point:

I run a small, niche browser game (~125 weekly unique users, down from around 1500 at its peak 15 years ago), and until I put its Wiki behind a login wall a few months ago, we were getting absolutely hammered by the bots. Not open source, not anything of particular interest to anyone beyond those already playing the game and the very select group of people who, if they found it, might actually enjoy it. (It's all text, almost-entirely-player-driven, and can be very slow at times, so people used to modern mobile games and similar dopamine factories tend to bounce off of it very quickly.)

Some of the UAs we saw included Claude and OpenAI, but there were a lot of obviously-bot requests to the Wiki that were using generic UAs and residential IPs.

If there's a concerted effort to swamp open-source projects, it's not the only thing going on. I think it's much more likely that the primary cause of this flood is people who a) think they have the right to absolutely everything on the internet, b) expect everyone they scrape from to be actively trying to hide the data from them (so, for instance, they will ignore any exposed API), and c) don't care how many resources they use or how much damage they do.


How many of these scrapers were written with AI by data-science folks who don't remotely care how often they're hitting the sites? Request rate is a detail they wouldn't even think to give the LLM or ask it about.

But does that explain all of the various scrapers doing the same thing across the same set of sites? And again, the sheer bandwidth and CPU time involved should eventually bother the bean counters.

I did think of a couple of possibilities:

- Someone has a software package or list of sites out there that people are using instead of building their own scrapers, so everyone hits the same targets with the same pattern.

- There are a bunch of companies chasing a (real or hoped-for) “scraped data” market, perhaps overseas where overhead is lower, and there’s enough excess AI funding sloshing around that they are able to scrape everything mindlessly for now. If this is the case, then the problem should fix itself as funding gets tighter.


My theory on this one is that some serial wantrepreneur came up with a business plan of scraping the archive and feeding it into an LLM to identify some vague opportunity. Then they paid some Fiverr/Upwork kid in India $200 to get the data. The good news is that this website, and any other, can mitigate these things by moving to Cloudflare, and it's free.

A couple of forums I have lurked on for years have closed up and now require a login to read.

I've wondered for a while whether simple interaction systems would be good enough to fend these things off without putting up walls like logins. Things like Anubis run a proof-of-work check in the browser, but I wonder if it would be even easier to do something like the old-school CAPTCHAs, where a single interactive element requires user input before redirecting to another page. For example, you hit a landing page and drag a slider, or click and hold, to get to the page proper: things that aren't as annoying as modern CAPTCHAs and are a fun little interactive way to enter.

As I'm writing this, I'm reminded of Flash-based homepages. It really makes me realize that Flash would be perfect for impeding these LLM crawlers.


> I haven’t heard of the same attacks facing (for instance) niche hobby communities. Does anyone know if those sites are facing the same scale of attacks?

Yes. Fortunately if your hobby community is regional you can be fairly blunt in terms of blocks.


People are legally required to pay into the fund to pay for a legally guaranteed benefit. The mechanics of the program are immaterial. If the program doesn’t pay for the benefit they were promised, after taking their money, that’s theft.

You could argue that you shouldn’t have to pay into Social Security. But hopefully you aren’t arguing that you shouldn’t have to pay and that prior payers should get screwed. Any exit from Social Security should ensure that the previous bargain is upheld, somehow, given the forced participation and the number of people who have planned their retirement around it.


Reminder: it’s bad on purpose to make you click

Care to share more details about this? Which account? What do you mean by “suspicious”? What specific effects does this have?

I use a VPN 24/7 on one machine. Zero issues even with banking, although sometimes I have to answer CAPTCHAs.


Some sites block or offer a degraded experience for VPN traffic. This happens occasionally when I use Mullvad.

Some examples: Imgur loads but does not display any content. USPS's website does not load.


Now that I think about it, USPS is the one place I get blocked. Still, it seems like people facing an age gate should be able to get around it without much hassle on a regular basis.

Your VPN provider shares their IP lists publicly. For a lot of website owners, blocking those is a simple way of getting rid of 80% of spam.
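For context on how cheap that kind of blocking is, here is a minimal sketch using Python's standard `ipaddress` module. It assumes a blocklist that is a plain list of CIDR ranges, one per line with optional `#` comments; that format, and the helper names `load_blocklist` and `is_blocked`, are hypothetical, since providers publish their lists in various formats.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_blocklist(lines: Iterable[str]) -> List[Network]:
    """Parse CIDR ranges (one per line, '#' comments allowed) into network objects."""
    nets: List[Network] = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries with host bits set, e.g. "198.51.100.7/24".
            nets.append(ipaddress.ip_network(entry, strict=False))
    return nets


def is_blocked(client_ip: str, nets: List[Network]) -> bool:
    """True if the client address falls inside any listed range."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(addr in net for net in nets)


# Usage with documentation-only address ranges.
nets = load_blocklist(["198.51.100.0/24  # example exit range", "", "2001:db8::/32"])
is_blocked("198.51.100.7", nets)  # True
is_blocked("203.0.113.5", nets)   # False
```

A linear scan is fine for a few thousand ranges; a site checking every request against a very large list would want a prefix-tree or sorted-interval lookup instead.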

And yet I don’t get blocked. Curious.

What VPN?! Proton literally doesn’t work on anything

The extreme danger of marijuana and its role as a “gateway drug” was also extensively studied and “proven” by a handful of moralist researchers and groups who had an agenda to push. The highly biased “researchers” who pursued this were often directly funded by the US government.

And now? This research has been debunked. It’s likely bad for people prone to mental illness, especially when taken regularly and in excess, and even stable people shouldn’t overdo it, much like alcohol. But it’s not going to cause lasting harm to most people.

Regarding porn, your argument from authority is extremely suspect. Porn is considered morally suspect due to lingering Puritan values, and if there is a research deficit (which I doubt) then it is likely because reputable researchers avoid the topic due to reputational damage. Sex researchers in general have often faced harassment, targeted government inquiries, and threats. So in short, I don’t believe you here.

I haven’t personally met anyone whose life was negatively affected by porn, except for a couple of people who were in relationships where one partner considered porn to be a form of infidelity. Utterly ridiculous from my perspective.

Edit: Total bunk. After looking into it, reputable meta-studies have shown no link between porn and sexual violence, ED, or mental health issues. It’s trivial to find this research; search for it if you care.


Sustainable and importantly ethical.

Totalitarianism has the same end state whether it comes from the left or the right. It always results in suppression of the truth, broken feedback loops that lead to poor decisions by government, economic failure, and finally either bloody repression, war, or revolution.

It’s possible to move through this to a place of stability. After all, China only had to kill 15-55 million people in the Great Leap Forward and a couple thousand more in 1989. Today they are fairly stable and prosperous, even with tight controls on information. Perhaps the UK will have a similar path!


Both extreme leftism and extreme rightism are composed of the same people - more focused on ideology rather than truth, and authoritarian control rather than voice of reason.

In the middle, there is an acceptable range of compromise. Social media is the new town square. People shouldn't be able to post stuff there without consequences for lying and spreading misinformation, just as they shouldn't be able to do this in public. History shows that this leads to bad outcomes. History also shows that we can't just have unrestricted personal freedoms.

And just because that "freedom" is being taken away, doesn't mean that the leftists are in charge.


Historically, at least in the US, people are indeed explicitly allowed to lie or spread misinformation in the town square. This is protected by the First Amendment and backed by court cases. Your mention of the “town square” here is interesting, as Marsh v. Alabama and PruneYard Shopping Center v. Robins both center on this idea. In both cases the Supreme Court ruled that free speech was protected in the town square, whether it was a company town or a shopping mall, so long as the location was effectively serving as a surrogate town square.

Now of course this is about the UK. But to my knowledge and based on research there are no laws or cases about lying in public. As long as you aren’t committing perjury or slander, or urging violence, or inciting a panic, this isn’t illegal.

