Don't you think that if this data was known to be widely and mostly beneficial, reCaptcha would be falling all over themselves to grab the good PR? The fact that regular folks hear virtually nothing about this strongly indicates that it's like most data collection -- if people knew the real deal they probably wouldn't happily sign on, and would likely raise more questions than the company wants to deal with.
Not really. Non-tech people don't know or care about reCaptcha. I still think it's evil for reCaptcha to be so prolific and used for data collection, but it's a positive side effect that it's also used for less evil things like labeling data sets.
Right. But if it was mostly good, whoever reCaptcha is could raise/make boatloads of money with "you're not just practicing safe computing, you're helping save children's lives" type ads/fundraising.
Then they can pay for their own Mechanical Turk labour, thank you very much. I will not be sponsoring a corporation out of the goodness of my heart, on my own time.
If I ever learn that they release that dataset to the public, my position on this may change.
If that is something important that we should rely on, any company involved in it should be spending the readily available resources to do it correctly, not hoping that random people trying to log in to their email pick the correctly labeled data.