Hacker News

Unfortunately someone would raise a privacy stink, just like they did for the Netflix Prize data and the AOL search data. This is why we can't have nice things.


But Delicious's database is already public (if you take out private fields on the user table and the private links). Even just the links + tags, without any user info, would be great for semantic web usage.


It's public right now. It won't be once Yahoo pulls the plug on Delicious.


    User-agent: *
    Disallow: /

I don't remember the robots.txt rules for sure, but doesn't that mean they don't allow crawlers at all?


That's the rule for crawlers that aren't Slurp, Googlebot, Teoma, or msnbot.
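That grouping can be checked with Python's standard-library robots.txt parser. A minimal sketch (the Slurp group below is a simplified stand-in for the real Delicious robots.txt): crawlers with their own rule group use it, and everyone else falls through to the catch-all `User-agent: *` block.

```python
from urllib.robotparser import RobotFileParser

# Simplified stand-in for the Delicious robots.txt discussed above:
# one named crawler is allowed everything, all others are disallowed.
robots_txt = """\
User-agent: Slurp
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Slurp", "http://delicious.com/popular"))      # → True
print(rp.can_fetch("MyScraper", "http://delicious.com/popular"))  # → False
```

So the site isn't closed to crawlers outright; it just whitelists the big four and disallows everyone else.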


I noticed the extra rules, but I am neither Slurp, Googlebot, Teoma nor msnbot :-(


robots.txt is merely a suggestion.


It's public data! You could scrape and index it now for free if you wanted...


They even throttle the FriendFeed scraper, which graciously pulls all its users' data at once.

You can't pull their data with a simple scraper; you'd need one distributed across hundreds of machines on the web.
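For a single machine, the best you can do is a polite scraper that paces itself under the throttle. A rough sketch (the base URL and `?page=` parameter are hypothetical, and the fixed delay is an assumed courtesy interval, not a known limit):

```python
import time
import urllib.request

def page_urls(base, pages):
    """Yield paginated URLs: base, base?page=2, base?page=3, ..."""
    for n in range(1, pages + 1):
        yield base if n == 1 else f"{base}?page={n}"

def scrape(base, pages, delay=2.0):
    """Fetch each page, sleeping between requests to stay under the throttle."""
    for url in page_urls(base, pages):
        html = urllib.request.urlopen(url).read()
        yield url, html
        time.sleep(delay)  # one request every `delay` seconds, single-threaded

# Hypothetical usage:
# for url, html in scrape("http://delicious.com/tag/python", pages=200):
#     ...
```

At one request every couple of seconds, a full crawl would take weeks, which is exactly why the comment above suggests distributing the work.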


I heard that there is this thing called "the cloud" where you can rent servers and pay only for the time you use. That makes cheap, disposable servers both realistic and quite simple ;)

Actually, I just noticed you get 750 hours of free micro instance time from AWS... I wonder if it would be worth doing. I imagine the links + tags are <100GB in total.
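A back-of-envelope check of that <100GB guess (both numbers below are assumptions, not figures from the thread):

```python
# Assumed totals: ~180M public bookmarks, ~500 bytes per record
# (URL + tags + title). Neither number comes from Delicious itself.
bookmarks = 180e6
bytes_per_record = 500

total_gb = bookmarks * bytes_per_record / 1e9
print(f"~{total_gb:.0f} GB")  # → ~90 GB, consistent with the <100GB estimate
```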


Though I've noticed only pages up to 200 work when going back through the history... this only gets you a few days back on the most popular tags.


Sure, but user pages go back farther than that...


"... It's public data! You could scrape and index it now for free ..."

That is the most insightful thing I've read today.



