As dimensionality grows, data becomes more linearly separable, so fewer dimensions are significant in distinguishing the data (though, for any given pair of points, which dimensions those are will differ).
Gradient descent, as an exhaustive search over all those dimensions, may be less performant in very high dimensions, where we have enough data to be "right some of the time" when randomly choosing which dimension to discriminate on.
If we force zero loss, i.e. an interpolation regime, then we're getting interpolation as usual. Can we get there faster as dimensionality increases?
It's plausible if count(relevant dimensions) << count(dimensions), and if the set of discriminating dimensions for any two random points is itself random.
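The separability claim above can be checked empirically with a quick sketch (the function name, point counts, and dimensions here are illustrative, not from the original): assign random ±1 labels to random Gaussian points and fit a linear model by least squares. Once the dimension exceeds the number of points, exact interpolation of even random labels becomes almost certain, while it essentially never happens when the dimension is small relative to the data.

```python
import numpy as np

rng = np.random.default_rng(0)

def separable_fraction(n_points, dim, trials=20):
    """Fraction of trials in which random +/-1 labels on random Gaussian
    points are fit exactly (zero training error) by a linear model."""
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n_points, dim))
        y = rng.choice([-1.0, 1.0], size=n_points)
        # Least squares; when dim >= n_points, X almost surely has full
        # row rank, so an exact interpolating solution exists.
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        if np.all(np.sign(X @ w) == y):
            hits += 1
    return hits / trials

# dim << n_points: random labels are rarely linearly separable
print(separable_fraction(n_points=100, dim=10))
# dim >= n_points: interpolation is almost always achievable
print(separable_fraction(n_points=100, dim=200))
```

This is the miniature version of Cover's function-counting argument: below the dimension-equals-points threshold, random dichotomies are exponentially unlikely to be separable; above it, they almost always are, which is the sense in which forcing zero loss gets "easier" as dimensionality grows.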