He makes an excellent point about the possibility of comparing word vectors trained on different corpora to make quantitative statements about differences in culture, either over time or between sub-cultures:
"I’d like to emphasize that which words are feminine or masculine, young or adult, isn’t intrinsic. It’s a reflection of our culture, through our use of language in a cultural artifact. What this might say about our culture is beyond the scope of this essay. My hope is that this trick, and machine learning more broadly, might be a useful tool in sociology, and especially subjects like gender, race, and disability studies."
Thanks! That was one of the most exciting parts of the post for me.
It would be really cool to have something like the Google Books ngram viewer [1] that lets you see how this changes over time, using a huge corpus. I imagine a graph where the x-axis is the year and the y-axis is a word's projection onto a linear combination of word vectors that the user defines; the user can then select words and see them plotted over time.
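For what it's worth, here is a rough sketch of how the numbers behind such a graph might be computed. It assumes you already have one set of word vectors per year, each trained separately on that year's slice of the corpus. The function names, the `{"she": 1.0, "he": -1.0}` weighting, and the toy 2-d vectors are all made up for illustration. The score is the cosine similarity between a word and the user-defined direction, computed inside each year's own vector space, since separately trained spaces are not aligned with one another.

```python
import numpy as np

def direction(vectors, weights):
    """Unit vector for a user-defined linear combination of word vectors."""
    d = sum(w * vectors[word] for word, w in weights.items())
    return d / np.linalg.norm(d)

def projection_over_time(embeddings_by_year, word, weights):
    """Return (year, score) pairs sorted by year.

    The score is the cosine similarity between `word` and the direction,
    both taken from that year's vectors. Years missing any needed word
    are skipped.
    """
    series = []
    for year in sorted(embeddings_by_year):
        vectors = embeddings_by_year[year]
        if word not in vectors or any(w not in vectors for w in weights):
            continue
        v = vectors[word]
        d = direction(vectors, weights)
        series.append((year, float(v @ d / np.linalg.norm(v))))
    return series

# Toy, hand-written 2-d vectors (hypothetical, not real data).
toy = {
    1900: {"he": np.array([1.0, 0.0]), "she": np.array([0.0, 1.0]),
           "nurse": np.array([0.0, 1.0])},
    2000: {"he": np.array([1.0, 0.0]), "she": np.array([0.0, 1.0]),
           "nurse": np.array([1.0, 1.0])},
}

series = projection_over_time(toy, "nurse", {"she": 1.0, "he": -1.0})
for year, score in series:
    print(year, round(score, 3))
```

The resulting (year, score) pairs are what you would hand to a plotting library, with one line per selected word.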
Ha, I was thinking along exactly the same lines when I read that paragraph. Last week, I found myself reading a 1784 magazine article [1] about "Aerostatical Experiments" (the first hot air balloons) that referred to "inflammable air", which is what they called hydrogen in those days. The Google n-gram viewer gives a beautiful illustration of when the name changed [2]. That is an obvious switchover, but I imagine many words change meaning and usage more slowly and in more subtle ways, so your proposal was quite exciting to think about. Let's hope someone takes up that line of research, either from the humanities side or the machine learning side.