Choose a <target word, correct context example> pair as the positive example, then randomly pick k words from the dictionary as negative samples.
How should we choose the k words?
If we sample them uniformly, it does not work well; if we sample words in proportion to their frequency, the negatives end up dominated by very common words like “a”, “the”, “of”.
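Choose a solution in the middle: sample each word with probability proportional to its frequency raised to the 3/4 power (the heuristic used in word2vec):
P(wi) = f(wi)^(3/4) / Σj f(wj)^(3/4)
f(wi) is the frequency of word wi, and the sum runs over all words wj in the dictionary.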
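
The sampling scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponent 0.75 is the value commonly used in word2vec, the function names `negative_sampling_distribution` and `sample_negatives` are hypothetical, and negatives are drawn with replacement.

```python
import random
from collections import Counter


def negative_sampling_distribution(tokens, power=0.75):
    """Return {word: probability} with probability proportional to f(w) ** power.

    power=1.0 gives plain frequency sampling and power=0.0 gives uniform
    sampling; 0.75 (an assumed default here) sits in between.
    """
    counts = Counter(tokens)
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def sample_negatives(dist, k, exclude=(), rng=random):
    """Draw k negative words (with replacement), skipping words in `exclude`."""
    words = [word for word in dist if word not in exclude]
    weights = [dist[word] for word in words]
    return rng.choices(words, weights=weights, k=k)


# Toy corpus: "the" appears 16 times, "cat" once.
tokens = ["the"] * 16 + ["cat"]
dist = negative_sampling_distribution(tokens)
negatives = sample_negatives(dist, k=5, exclude={"cat"})
```

In this toy corpus the raw frequency of "the" is 16/17, while the smoothed probability is 8/9, because 16^(3/4) = 8 and 1^(3/4) = 1. The common word is still favored, but the rare word gets a larger share than under pure frequency sampling.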