*It's because there is no formal definition of dependence in statistics. Let tha...

srean · on July 6, 2018

Strongly agreed. It seems robius really is clueless when he/she is talking about modeling independence or modeling the lack of independence.

bjourne · on July 6, 2018

All statistical learning, of which neural nets are a form, uses the iid assumption. See https://stats.stackexchange.com/questions/213464/on-the-impo...

bunderbunder · on July 6, 2018

But they use it in different ways. For example, an ARMA model specifically looks for dependencies among the data points, so assuming the points themselves are iid would be absurd. In time series analysis, you want the model's residuals, not the source data, to be independent and identically distributed.
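To make the residuals point concrete, here is a minimal sketch in plain Python. It assumes a simulated AR(1) series rather than a full ARMA model, and the helper names (`simulate_ar1`, `fit_ar1`, `lag1_autocorr`) are made up for illustration. The raw series is strongly autocorrelated, while the residuals of the fitted model should be close to uncorrelated.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


def simulate_ar1(phi, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n - 1):
        xs.append(phi * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def fit_ar1(xs):
    """Least-squares estimate of phi (no intercept term)."""
    num = sum(xs[t] * xs[t - 1] for t in range(1, len(xs)))
    den = sum(xs[t - 1] ** 2 for t in range(1, len(xs)))
    return num / den


# Hypothetical data: phi=0.8 and n=5000 are arbitrary illustration values.
series = simulate_ar1(phi=0.8, n=5000)
phi_hat = fit_ar1(series)
residuals = [series[t] - phi_hat * series[t - 1] for t in range(1, len(series))]

print("estimated phi:", phi_hat)
print("lag-1 autocorrelation of the data:", lag1_autocorr(series))
print("lag-1 autocorrelation of the residuals:", lag1_autocorr(residuals))
```

A lag-1 autocorrelation near zero is only a necessary check, not proof of independence; in practice you would look at more lags or use a formal test.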

Also, in real-world statistical modeling, there's nuance. As with any assumption of a parametric model, the data not being iid doesn't mean that the model is 100% crap; it means that you can't draw specific conclusions about the quality of the model.

Which is fine, because maybe you don't care to draw those conclusions, anyway. One of the key differences between machine learning and traditional statistical analysis is that you aren't so worried about developing parsimonious models with well-defined parameters. You're typically just empirically interested in the model's predictive or descriptive utility. This difference isn't a result of one school being more principled and the other being more lackadaisical. It's reflective of differing goals: One approach was developed for use in scientific hypothesis testing, where your primary deliverable is (in the case of something like regression, anyway) the model's parameters, and its estimates are a means to evaluate those parameters. The other approach is used for modeling processes, where the primary deliverable is the estimates, and the parameters are a means to get those estimates.

yorwba · on July 6, 2018

That kind of iid assumption could be summarized as "the training data is representative of the data we want to apply the model to", and if it doesn't hold, that's indeed a problem.

But "Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work. How does an independent set of random variables give us a model of the actual dataset which is a very limited representation of the real world?" strongly implies that the data itself should be decomposed into iid variables. While whitening ("manipulating into iid form") is a common preprocessing technique because it's simple and effective, that doesn't mean that learning algorithms wouldn't work without it. They'd just take a bit longer to arrive at the same result.
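For readers who haven't seen whitening, here is a rough sketch of one variant (Cholesky whitening) on made-up 2D data, in plain Python. The function names and the synthetic data are assumptions for illustration only. Note that what it does is remove linear correlation and rescale each coordinate to unit variance, so the sample covariance becomes the identity.

```python
import math
import random


def mean_and_cov_2d(points):
    """Return the mean (mx, my) and covariance entries (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)


def whiten_2d(points):
    """Center the data, then apply the inverse Cholesky factor of its covariance."""
    (mx, my), (sxx, sxy, syy) = mean_and_cov_2d(points)
    # Cholesky factor L = [[l11, 0], [l21, l22]] with L * L^T = covariance.
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 ** 2)
    whitened = []
    for x, y in points:
        # Solve L * w = (x - mx, y - my) by forward substitution.
        u = (x - mx) / l11
        v = ((y - my) - l21 * u) / l22
        whitened.append((u, v))
    return whitened


# Hypothetical correlated data: y depends linearly on x plus noise.
rng = random.Random(0)
data = []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 0.9 * x + 0.3 * rng.gauss(0.0, 1.0)))

_, cov_before = mean_and_cov_2d(data)
_, cov_after = mean_and_cov_2d(whiten_2d(data))
print("covariance before (sxx, sxy, syy):", cov_before)
print("covariance after  (sxx, sxy, syy):", cov_after)
```

Gradient-based learners tend to converge faster on inputs like the whitened ones, which is the "bit longer" trade-off mentioned above; the unwhitened data still carries the same information.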

avaku · on July 6, 2018

Agree. I can't downvote, so I just agree :)