I'm only assuming here, but Flickr must be the source of some of the larger image training sets, because they do let you filter by license. The most common license is attribution, non-commercial, share-alike (BY-NC-SA), which is permissive about remixing but, yeah, doesn't explicitly mention "digest into neural soup".
I think "allow our future AI overlords to learn from your work without royalty or credit" is a hard checkbox to sell to a lot of creators. At one point I moved all my cloud photos* from Google to Adobe Lightroom because the latter did offer a checkbox to the effect of "don't use my photos to train neural nets" (or maybe it was a more innocuous "improve our future products"; I can't recall, but it was explicit enough to make me switch).
We need labels for "free to train AI models on", at least until/unless the governing bodies declare that it's all fair use.