
As dimensionality grows, data becomes more linearly separable, so fewer dimensions are significant in distinguishing the data (though for any pair of points, those dimensions will be different).

Gradient descent, viewed as an exhaustive traversal over all those dimensions, may be less performant in very high dimensions, where we have enough data to be "right some of the time" when randomly choosing which dimension to discriminate on.

If we force zero loss, i.e., an interpolation regime, then we're getting interpolation as usual. Can we get there faster as dimensionality increases?

It's plausible if count(relevant dimensions) << count(dimensions), and if the discriminating dimensions for any two random points are themselves random.



I believe what you are describing is grid search, not gradient descent. Gradient descent is not an exhaustive search.


It is if you see vec_a . vec_b as a traversal over the dimensions of both vectors.
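The sense in which a dot product "traverses" every dimension can be made literal by unrolling it into a loop; a small sketch (the function name is illustrative):

```python
def dot(vec_a, vec_b):
    """Dot product written as an explicit pass over every dimension."""
    total = 0.0
    # Each iteration visits one dimension of both vectors, which is the
    # "traversal over the dimensions" the comment describes.
    for a, b in zip(vec_a, vec_b):
        total += a * b
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Note this touches every coordinate once per evaluation, which is still linear in the dimension count per step, not an exhaustive search over parameter settings as in grid search.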



