It's interesting. Everybody is always talking about creating unbiased machine learning models, but we're still no closer to cracking the code on unbiased humans.
In the data sense, isn't bias literally just the result of limited/narrow data? So isn't the problem not in how you train models, but simply that it's impossible (or exceedingly difficult) to provide comprehensive, universal data?
A data set is biased when it doesn't reflect the true underlying distribution of nature.
So a face corpus with only white faces doesn't reflect the diversity of faces one encounters in the world.
With that said, debiasing data is extremely difficult because the true distribution of things is unknown and sometimes subjective. The visual images you would encounter as a human from birth to death growing up in a first world country would be very different from those of a drone's video camera. Are we really sure that ImageNet should be K% animals and not K/2% animals? And if you train a machine learning algorithm on every possible image with every possible pixel, it will just learn noise.
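To make the "data set doesn't reflect the true distribution" point concrete, here's a minimal sketch (all numbers made up for illustration): two subgroups share the same labels but have shifted feature distributions, the training data contains only one subgroup, and a simple midpoint-threshold classifier fit on it fails on the subgroup it never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one feature, two subgroups whose feature
# distributions differ for the same label. Group A: positives cluster
# near 2, negatives near 0. Group B: positives near 0, negatives near -2.
def make_group(pos_mean, neg_mean, n):
    x = np.concatenate([rng.normal(pos_mean, 0.5, n),
                        rng.normal(neg_mean, 0.5, n)])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return x, y

xa, ya = make_group(2.0, 0.0, 500)   # group A (in the training data)
xb, yb = make_group(0.0, -2.0, 500)  # group B (absent from training)

# "Train" on group A only: pick the midpoint between the two class
# means as a decision threshold, which is near-optimal for group A.
thresh = (xa[ya == 1].mean() + xa[ya == 0].mean()) / 2

def acc(x, y):
    return float(((x > thresh) == y).mean())

print(f"accuracy on group A: {acc(xa, ya):.2f}")  # high
print(f"accuracy on group B: {acc(xb, yb):.2f}")  # near chance
```

The model isn't "wrong" about the data it saw; the data it saw was a biased sample of the population it gets evaluated on. That's the face-corpus failure mode in miniature.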