It's because there is no formal definition of dependence in statistics. Let that sink in for a minute.
What? Statistical dependence (of random variables) is defined clearly and precisely.
"Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work"
Neural networks don't use the iid assumption.
I downvoted you because it seems like you don't really know what you're talking about and you're currently the top post in the thread. Please don't spread misinformation.
But they use it in different ways. For example, an ARMA model is specifically looking for dependencies among the data points, so there assuming iid among them would be an absurdity. In time series analysis, you're looking for the model's residuals, not the source data, to be independent and identically distributed.
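To make the residuals point concrete, here's a rough sketch in plain NumPy. Everything in it is made up for illustration: the AR(1) process, the coefficient 0.8, the series length, and the helper name `lag1_autocorr`. It fits the simplest possible autoregressive model by least squares rather than a full ARMA fit, so treat it as a toy, not as how a real time series library does it.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

rng = np.random.default_rng(0)
n, phi = 5000, 0.8  # assumed series length and AR coefficient

# Simulate an AR(1) process: y[t] = phi * y[t-1] + noise[t]
noise = rng.standard_normal(n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + noise[t]

# Estimate the AR(1) coefficient by regressing y[t] on y[t-1]
phi_hat = float(np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1]))
residuals = y[1:] - phi_hat * y[:-1]

raw_ac = lag1_autocorr(y)
resid_ac = lag1_autocorr(residuals)
print(f"phi_hat={phi_hat:.3f}, raw lag-1 autocorr={raw_ac:.3f}, "
      f"residual lag-1 autocorr={resid_ac:.3f}")
```

The source series is strongly autocorrelated by construction, while the residuals of the fitted model should show lag-1 autocorrelation near zero. A single lag is only a sanity check, not a proof of independence.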
Also, in real-world statistical modeling, there's nuance. Just like for any assumption of a parametric model, the data not being iid doesn't mean that the model is 100% crap; it means that you can't draw certain conclusions about the quality of the model.
Which is fine, because maybe you don't care to draw those conclusions, anyway. One of the key differences between machine learning and traditional statistical analysis is that you aren't so worried about developing parsimonious models with well-defined parameters. You're typically just empirically interested in the model's predictive or descriptive utility. This difference isn't a result of one school being more principled and the other being more lackadaisical. It's reflective of differing goals: One approach was developed for use in scientific hypothesis testing, where your primary deliverable is (in the case of something like regression, anyway) the model's parameters, and its estimates are a means to evaluate those parameters. The other approach is used for modeling processes, where the primary deliverable is the estimates, and the parameters are a means to get those estimates.
That kind of iid assumption could be summarized as "the training data is representative of the data we want to apply the model to", and if it doesn't hold, that's indeed a problem.
But "Data has to be changed and manipulated into i.i.d. form, or the algorithms won't work. How does an independent set of random variables give us a model of the actual dataset which is a very limited representation of the real world?" strongly implies that the data itself should be decomposed into iid variables.
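While whitening (the closest thing to "manipulating into iid form") is a common preprocessing technique because it's simple and effective, that doesn't mean that learning algorithms wouldn't work without it. Strictly speaking, whitening only decorrelates the features and rescales them to unit variance; it doesn't make them independent. Without it, many algorithms would typically just take a bit longer to arrive at a similar result.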
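For anyone who hasn't seen whitening spelled out, here's a minimal sketch of the PCA variant in NumPy. The data, the mixing matrix, and the variable names are all invented for the example, and other variants (ZCA, for instance) differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D data: independent normals pushed through a mixing matrix
mixing = np.array([[2.0, 0.0],
                   [1.5, 0.5]])
X = rng.standard_normal((2000, 2)) @ mixing.T

# PCA whitening: center, rotate onto the covariance eigenvectors,
# then divide each component by its standard deviation
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = (Xc @ eigvecs) / np.sqrt(eigvals)

cov_before = cov
cov_after = np.cov(X_white, rowvar=False)
print("covariance before:\n", np.round(cov_before, 3))
print("covariance after:\n", np.round(cov_after, 3))
```

After the transform the sample covariance is the identity up to floating-point error. That removes linear correlation between features, which is a weaker property than independence in general.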