
The publisher can control how much content is exposed via RSS (typically just the lede), whereas when third-party news aggregators present scraped content, the user never needs to visit the origin site.


The publisher can also control how much is shared with third-party aggregators, either through robots.txt or a paywall.
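For the robots.txt route, the idea is to disallow a specific aggregator's crawler while leaving everything else open. A minimal sketch (the user-agent name here is illustrative, not a real crawler):

```
# Hypothetical robots.txt: block one aggregator's crawler site-wide,
# allow all other crawlers. "ExampleAggregatorBot" is a made-up name.
User-agent: ExampleAggregatorBot
Disallow: /

User-agent: *
Disallow:
```

Note that robots.txt is purely advisory; it only works against crawlers that choose to honor it.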

Which has been the case since search engines became a thing.


That isn't the same at all. A publisher cannot use robots.txt, much less a paywall, to mark which part of the text may be shared in syndication.


A paywall can. The page displays the snippet the publication is allowing to be shared, while the paywall hides the rest. I believe this is what a few of the bigger US newspapers are doing right now.
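The mechanism described above can be sketched server-side: the snippet the publisher allows to circulate is always returned, while the body is served only to subscribers. This is a hypothetical illustration, not any particular newspaper's implementation; the field names and function are invented:

```python
# Hypothetical hard-paywall rendering: the shareable snippet (lede) is
# always public, the article body is returned only to subscribers.

def render_article(article, is_subscriber):
    """Return the HTML fragments a given reader is entitled to see."""
    parts = [article["lede"]]          # the snippet the publisher allows to be shared
    if is_subscriber:
        parts.append(article["body"])  # full text, behind the paywall
    else:
        parts.append("<div class='paywall'>Subscribe to keep reading.</div>")
    return "\n".join(parts)

article = {
    "lede": "<p>Opening paragraph of the story.</p>",
    "body": "<p>The rest of the reporting.</p>",
}
print(render_article(article, is_subscriber=False))
```

Because the body never leaves the server for non-subscribers, a scraper sees exactly what an anonymous reader sees: the snippet plus the paywall notice.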


OK, but that would require regular readers to have credentials for the paywall. I understood the discussion to be about scraping publicly accessible sites.



