Why does FB need a crawler? Their users provide the content for their site. Is t...

youeseh · on June 11, 2020

When you paste in a URL to share it with your friends, Facebook tries to grab some information from that webpage to provide a summary.

toomuchtodo · on June 11, 2020

What happens if that link performs an action upon a GET request?

Edit: Folks, I agree with you all, but I've seen a lot of garbage out there. Just asking the question for the discussion.

dragonwriter · on June 11, 2020

Then the action needs to be harmless, because GET is defined as not merely idempotent but also safe.

Didn't we already learn this lesson after all the unsafe-GET problems unveiled when prefetching browser accelerators came on the scene in, IIRC, the late 1990s?

thaumasiotes · on June 11, 2020

> Then the action needs to be harmless to repeat, because GET is defined as idempotent.

No, this is a terrible response. The action needs to be harmless to execute every time, not just every time after the first time.

HTTP DELETE is conceptually idempotent, but you don't want to be deleting stuff with GET requests. That's why the standard provides a DELETE method! The distinction that really matters is safe/unsafe, not idempotent/unique.

(Do you need to use DELETE for deleting stuff? No, POST is fine.)

dragonwriter · on June 11, 2020

> The action needs to be harmless to execute every time, not just every time after the first time.

You are correct, and the grandparent post has been updated appropriately.

bkanber · on June 11, 2020

> No, this is a terrible response.

As an aside, this kind of hyperbole really gets under my skin. It wasn't a terrible response. That statement is already technically correct: GET is idempotent, and the definition of idempotency is that it is harmless to repeat.

Your gripe is that OP didn't mention that GET is not only idempotent but must also be "safe"; i.e. that it should not alter the resource. OP got it 50% correct.

Does that omission make his comment a "terrible response"? No -- just incomplete.

thaumasiotes · on June 11, 2020

Yes, it was a terrible response. Here are some examples of idempotent requests:

- Change the email address registered to my account from [email protected] to [email protected] .

- Instead of sending my direct deposit to account XXXX XXXX at Bank of America, from now on, send it to account YYYY YYYY at Wells Fargo.

- Delete my account.

- Drop the database.

None of these have any business being available to GET requests. Objecting to a misconfigured endpoint on the grounds that the functionality it implements is not idempotent implies that the lack of idempotence is what was wrong. That's a bad thing to do - anyone who takes your lesson to heart is still going to screw themselves over, because you gave them terrible advice. They may do it more than they otherwise would have, because you gave them advice that directly endorses really bad ideas. Idempotence or the lack thereof is beside the point.

Messing up on endpoint idempotence means you might hurt the feelings of a document. Messing up on endpoint safety means you might lose all your data as soon as anyone else links to your homepage. Or worse.
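
To make the distinction concrete, here is a minimal Python sketch. The `AccountStore` class and its methods are hypothetical, made up purely for illustration. Deleting is idempotent, because repeating it leaves the same state, but it is not safe, because the very first call already changes state.

```python
class AccountStore:
    """Hypothetical in-memory store, for illustration only."""

    def __init__(self):
        self.users = {"alice": "alice@example.com"}

    def read_email(self, name):
        # Safe: reading never changes state, however often it runs.
        return self.users.get(name)

    def delete(self, name):
        # Idempotent but NOT safe: the first call changes state,
        # repeated calls leave the state as it already is.
        self.users.pop(name, None)


store = AccountStore()
before = dict(store.users)

store.read_email("alice")
assert store.users == before        # safe: state untouched

store.delete("alice")
after_first = dict(store.users)
store.delete("alice")
assert store.users == after_first   # idempotent: the repeat changes nothing
assert store.users != before        # unsafe: the first call did the damage
```

Only `read_email` is the kind of operation that belongs behind a GET.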

jon_richards · on June 11, 2020

Most of the time, GET should not affect system state at all (other than caching, etc). Idempotent can still have effects on the first access. Even deletion can be idempotent.

I generally see PUT and PATCH classified as idempotent (for the same input).

crazygringo · on June 11, 2020

Then like literally every search engine and website crawler ever does, the action will be performed.

Which is why it's bad practice to design your website using GET for actions. That's what POST is for.

I mean, using GET for actions will break so many things -- browser prefetching, link previews, the list is endless. If you use GET for actions, just... yikes.
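
A minimal sketch of that split, in Python. The `handle` function, the `/unsubscribe/ops-alerts` path, and the `subscribed` set are all hypothetical stand-ins for a real framework's routing and storage. GET only describes the action, and the state change happens on POST.

```python
from http import HTTPStatus

# Hypothetical subscription state, for illustration only.
subscribed = {"ops-alerts"}


def handle(method, path):
    """Tiny hypothetical router: returns a (status, body) pair."""
    if path != "/unsubscribe/ops-alerts":
        return HTTPStatus.NOT_FOUND, "not found"
    if method in ("GET", "HEAD"):
        # Safe: crawlers, prefetchers and link previewers can hit this freely.
        return HTTPStatus.OK, "Confirm to unsubscribe (the form submits a POST)."
    if method == "POST":
        # The actual state change only happens on an explicit POST.
        subscribed.discard("ops-alerts")
        return HTTPStatus.OK, "unsubscribed"
    return HTTPStatus.METHOD_NOT_ALLOWED, "method not allowed"
```

With this shape, a bot that merely fetches the link gets a confirmation page and the subscription stays in place.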

dependenttypes · on June 11, 2020

> browser prefetching, link previews

Both of them are pretty bad. I do not see GET actions breaking anything that is not trash.

thinkindie · on June 11, 2020

that's your own problem because you are going against the HTTP protocol standard. GET should be idempotent.

thaumasiotes · on June 11, 2020

Going against the HTTP standard isn't a problem. For example, it's a good practice to ignore HEAD requests as opposed to responding appropriately.

The problem with unsafe GET is that it conflicts with reality, not that it conflicts with the standard.

dragonwriter · on June 11, 2020

> Going against the HTTP standard isn't a problem. For example, it's a good practice to ignore HEAD requests as opposed to responding appropriately.

That's only against the standard if you advertise HEAD as a supported method on the resource, which converts it from a good idea in some circumstances to a bad one, so if there is a good example to support your claim, that isn't it.

thaumasiotes · on June 12, 2020

The most classic example is the JWT specification, which says you need to honor the signing algorithm named in the header of the token you receive. (JWT includes a "none" algorithm, making token forgery trivial when the parser implements the standard.)

It's known widely enough now that people have chosen to reinterpret the language of the standard in order to claim that their implementations are compliant -- after making the change specifically to bring themselves out of compliance.

(It's possible that it's been so long that the standard itself has been changed to accommodate this. But regardless, the point stands that standards compliance is not a virtue for its own sake. This wasn't a good idea back when everyone agreed that the standard required it, it's not a good idea now, and future bad ideas do not in general become good ideas by virtue of being specified in standards.)
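
For illustration, a minimal sketch of the non-compliant-on-purpose approach, using only the Python standard library. The function names are hypothetical, and a real service should use a maintained JWT library rather than this. The point is that the verifier pins the algorithm itself and never lets the token's `alg` header choose it.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(part: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_hs256(payload: dict, key: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    sig = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(sig)


def verify_hs256(token: str, key: bytes) -> dict:
    """Accept HS256 only; raise ValueError for anything else."""
    header_b64, payload_b64, sig_b64 = token.split(".")  # ValueError if not 3 parts
    header = json.loads(b64url_decode(header_b64))
    # The verifier decides the algorithm; the header is only checked against it.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected alg")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

A forged token with `{"alg": "none"}` and an empty signature is rejected at the `alg` check, before any signature comparison happens.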

claudiulodro · on June 11, 2020

I'm not sure what sort of action you mean, but Facebook doesn't fetch the page in real-time using the client's browser. Some sort of job scrapes the page HTML looking for metadata and then uses that metadata to populate the preview. If you want to play around with what it "sees", you can test using the sharing debugger: https://developers.facebook.com/tools/debug/
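
As a rough illustration of that scraping step, here is a minimal Python sketch. It assumes the metadata in question is Open Graph `<meta property="og:...">` tags; the `OpenGraphParser` class and `extract_preview` function are hypothetical names, and this is not Facebook's actual scraper.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> tags from a page."""

    def __init__(self):
        super().__init__()
        self.og = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        content = attr.get("content")
        if prop.startswith("og:") and content is not None:
            # Keep the first value seen for each property.
            self.og.setdefault(prop, content)


def extract_preview(html: str) -> dict:
    parser = OpenGraphParser()
    parser.feed(html)
    return parser.og
```

Note that nothing here runs scripts or submits forms; the only request involved is the plain GET that fetched the HTML in the first place.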

vasco · on June 11, 2020

That "link" doesn't respect the GET semantics then.

WookieRushing · on June 11, 2020

Then it performs the action, what else would it do?

slig · on June 11, 2020

And to prevent some URLs from being posted.

systemvoltage · on June 11, 2020

This pattern needs to die.

Whenever I paste a link in any sort of messaging app - iMessages to Slack - it puts a thumbnail and summary, polluting the entire conversation with tons of noise.

Messaging apps have no business in looking up the URL. Just let it pass as a link.

Fuck everything about this and we need to push back on this nonsense.

crazygringo · on June 11, 2020

I disagree completely, I find it extremely useful.

Most links are unreadable (e.g. an ID to a cloud file), and are truncated anyways even if readable.

The preview gives me the title of the webpage, which is infinitely more useful -- especially letting me know whether it's a document I've already seen (and don't need to open) or something new, and if so, what.

systemvoltage · on June 11, 2020

To me it pollutes the discussion away from the messages and puts a bunch of thumbnails where it could just be a clean text conversation.

The thing that bothers me is also that I, as a user, have no choice - it just does it automatically.

Do you like IRC?

JadeNB · on June 11, 2020

I suspect that most apps have a way to turn this off. Slack certainly does. That doesn't stop other users from seeing previews of your links, if it's set up that way for them, but it does prevent you from seeing previews of others' (and your own) links.

occamrazor · on June 11, 2020

WhatsApp lets you delete the link box: enter the link, wait for the preview, backspace, and send. The hyperlink remains clickable too.

shultays · on June 11, 2020

Slack does it as well

katbyte · on June 11, 2020

this is one of those "people like different things" situations and a config option should be added to support both.

JadeNB · on June 11, 2020

Slack, and I suspect most other messaging apps, has such a config option.

mikece · on June 11, 2020

A funny story (well, funny because it didn't happen to me) was chronicled on a recent episode of Corey Quinn's "Whiteboard Confessions" -- https://www.lastweekinaws.com/podcast/aws-morning-brief/whit... -- where Slackbot's auto URL unfurling feature "clicked" an SNS alert unsubscribe link when a tech posted a report including that link into a Slack channel. If nobody lost their job over this triggering a SEV-1 alert I'm sure they had a laugh about it later.

Another issue with auto-unfurling links or generating previews of URLs is the potential for stalking. If I'm hunting my ex and I know their cell phone number AND that they have an iPhone I can simply send a link to a website and let iOS generate the preview... which then pings the URL of the website which I set up and I can now run geo-IP resolution and narrow down where my ex is (and maybe I follow up with a well-crafted PDF exploit when I'm ready to narrow down even more on an area).

While systemvoltage was a bit brusque in how the complaint was worded above, these "friendly and helpful" features of Slack, SMS, and other apps can have devastating unforeseen consequences. I question the value of making these features opt-out rather than opt-in.

crazygringo · on June 11, 2020

That's an interesting idea of a security breach.

In practice it's going to be difficult because:

1) You can't do effective geolocation on cell phone IP address, except perhaps at a country level [1]

2) Your ex would need to open your message to trigger the link preview loading. If you're stalking them, they're probably either ignoring you or blocking you

3) At most, if the ex is connected to a Wi-Fi network like Starbucks and opens the message, you'll be able to get the city they're in, maybe.

But at the end of the day, what you'd get from link previews is no different from embedding a tracker image in an e-mail. And while it requires someone to suspect they're being stalked, blocking would prevent all of this.

[1] http://www.cs.yale.edu/homes/mahesh/papers/ephemera-imc09.pd...

Nextgrid · on June 11, 2020

As far as I know the iOS implementation mitigates that by generating the preview on the sender side. The receiver side will not make a request to the URL unless they manually click on it.

icebraining · on June 11, 2020

Same for WhatsApp and Wire.

systemvoltage · on June 11, 2020

haha, wow. This is some story. Thanks for sharing.

compiler-guy · on June 11, 2020

Part of the goal here is to prevent people from clicking on malicious links. A preview helps with that.

Or, more simply, you can't be rickrolled if you know the destination in advance.

netsharc · on June 11, 2020

The next level would be for the server to see who is requesting content. Facebook IP? "Here's some harmless HTML!". Browser IP? "Here's an executable pretending to be a harmless page!"

nitrogen · on June 11, 2020

I like link previews. I don't like when they are managed server-side instead of client-side.

crazygringo · on June 11, 2020

Caching a link's title and thumbnail server-side will save your site potentially millions of requests from Facebook and elsewhere.

Seems like a benefit for site owners to me.

Not to mention that the server is reducing the image size of a thumbnail, potentially converting from HTTP to HTTPS, and so on.
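
A rough sketch of the caching part, in Python. The `PreviewCache` class, its `fetch` callback, and the one-hour default TTL are hypothetical choices for illustration, not how Facebook or anyone else actually implements it.

```python
import time


class PreviewCache:
    """Hypothetical server-side cache: one origin fetch per URL per TTL window."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch        # callable that actually requests the page
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # url -> (stored_at, preview)

    def get(self, url):
        now = self._clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # served from cache, no request to the site
        preview = self._fetch(url)
        self._entries[url] = (now, preview)
        return preview
```

However many users view the shared link, the origin site sees at most one request per TTL window from this cache.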

lixtra · on June 11, 2020

> Seems like a benefit for site owners to me.

Maybe, but it’s definitely less control for the site owner.

Maybe she wants to do analytics on the previews (how widely they are shared, from which IP regions) or delete the content at a specific time, etc.

ufmace · on June 11, 2020

Also considering the sibling story, at least the server-side preview builder is definitely entirely unauthenticated to every possible service. A preview builder running on the client side might conceivably pick up your authentication to something in various circumstances.

elviswolcott · on June 11, 2020

The issue with doing them client side (other than the missed cache opportunity of using a server) is that it means you're having every user in the chat send a request to an arbitrary URL from their device. Also CORS makes this impossible 99% of the time.

ekimekim · on June 11, 2020

The third approach, which Signal uses as a privacy preservation measure, is for the sender to generate the link preview on send. Of course, this then opens up abuse where the sender can arbitrarily control the "link preview" to say whatever they want.