For anyone with more legal knowledge than I have: how does scraping and processing social media and other image sources deal with copyright licenses, especially ones forbidding commercial use?
I feel like there's a meaningful legal difference between downloading a totally public, freely downloadable image of you from the internet (even storing it forever) and then using that image in a product.
It would be like taking something with a GPL license - totally legit to download and use and modify and repost, with the original license/copyright attached - and using it in a closed source commercial product.
I've been wondering the same thing. The photo belongs either to the user or to Facebook; the fact that it's viewable on the site doesn't give Clearview the right to use it. It must be a violation of the terms of service, and I'm surprised we haven't heard anything from Facebook about it.
They say they scraped the open web - so for example this would include many of our personal sites, many of which have profile pictures.
As for me, I took the picture on my site myself, and it's under an Attribution-NonCommercial-NoDerivatives CC license. I'd argue that:
1. Using my/anyone's profile picture in an AI system for profit is commercial use.
2. A neural network is a derivative work of all images used to train that network.
So on point 1 I agree with you. I think point 2 is pretty iffy though. Unless there has been some recent legal proceeding that I am unaware of, point 2 isn't true.
Oh yeah, I'm not sure either is true legally, as I'm not a lawyer. It's just my opinion.
The reasoning I follow for point 2 is:
If a neural network is not a derivative of its inputs, then given a sufficiently large GAN you could "launder" inputs into copyright-free outputs. That hasn't been done as far as I know, but I know it's starting to be an issue in NLP.
Re: 2 - Legally no. Like a search engine’s index it is not a derivative work but a “transformative” one and therefore not subject to copyright restrictions.