You might be interested to read this: http://arxiv.org/pdf/1402.3722v1.pdf word2...

nl · on June 3, 2015

Yeah. Pulling a quote:

Why does this produce good word representations? Good question. We don’t really know. The objective above clearly tries to increase the quantity v_w·v_c for good word-context pairs, and decrease it for bad ones. Intuitively, this means that words that share many contexts will be similar to each other (note also that contexts sharing many words will also be similar to each other). This is, however, very hand-wavy.

I love an honest paper.
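
For anyone who wants to see the "increase v_w·v_c for good pairs, decrease it for bad ones" part concretely, here is a minimal sketch of a single update step, assuming the usual negative-sampling objective (log sigmoid of the dot product for observed pairs, log sigmoid of its negation for sampled ones). The names `sgns_step` and `lr` and the toy vectors are mine for illustration, not from the paper or the word2vec code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sgns_step(v_w, v_c, label, lr=0.1):
    """One gradient-ascent step on a single (word, context) pair.

    label=1: observed ("good") pair, objective is log sigmoid(v_w . v_c)
    label=0: sampled ("bad") pair, objective is log sigmoid(-v_w . v_c)
    Returns updated copies of both vectors.
    """
    # Derivative of the objective with respect to the dot product:
    # 1 - sigmoid(s) for a good pair, -sigmoid(s) for a bad one.
    g = label - sigmoid(dot(v_w, v_c))
    new_w = [w + lr * g * c for w, c in zip(v_w, v_c)]
    new_c = [c + lr * g * w for w, c in zip(v_w, v_c)]
    return new_w, new_c


# Toy 2-d vectors (hypothetical values, just to watch the dot product move).
v_w = [0.5, -0.2]
v_c = [0.1, 0.3]

before = dot(v_w, v_c)
after_good = dot(*sgns_step(v_w, v_c, label=1))
after_bad = dot(*sgns_step(v_w, v_c, label=0))

print(before, after_good, after_bad)
```

The good-pair step pushes the dot product up and the bad-pair step pushes it down, which is all the quote claims. Why that ends up producing useful similarities between words is the part the authors admit is hand-wavy.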