I post occasionally on LinkedIn, and the site responds to my stated expertise by prompting me with questions to answer. I decided to answer one about the problem of multicollinearity in machine learning. My rather snarky response follows.
Machine learning is (mostly) a new label attached to estimation techniques that have been around for a century or more, and have been implemented using computers for fifty years or so. Broadly speaking, they are variations on least-squares regression, developed by Laplace and Gauss in the early 19th century. The most important variation is discriminant analysis (the technique underlying pattern recognition) developed by Fisher in 1936.
(Image generated by DALL-E)
Multicollinearity (meaning that the explanatory/predictive variables are correlated) is a problem for all such techniques. It means that the contributions of individual explanatory variables can't easily be distinguished.
Given the "black box" approach characteristic of discussions of machine learning, this may not be of much concern. As long as the domain of application is the same as that from which the training set is derived, multicollinearity doesn't affect the accuracy of predictions. If that's all you care about, you don't need to worry. But that's a big "if" and a bigger "as long as".
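To make the point concrete, here is a minimal numpy sketch (mine, not from the original answer): with two nearly collinear predictors, only the *sum* of their coefficients is well determined, so quite different coefficient vectors yield almost identical in-sample predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2])
# Least-squares fit; lstsq copes with the near-singular design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Shift weight from one coefficient to the other: a very different
# coefficient vector, yet nearly the same fitted values, because only
# beta1 + beta2 is pinned down by the data
alt = np.array([beta[0] + 1.0, beta[1] - 1.0])
print(np.allclose(X @ beta, X @ alt, atol=0.1))
```

The individual coefficients are essentially unidentifiable, but the predictions are unaffected, which is exactly why a pure prediction-accuracy mindset can ignore the problem inside the training domain.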
Note: I also tried ChatGPT on this question. I couldn't get it to say anything distinctive about machine learning. When I asked about ways to make machine learning more robust to multicollinearity, I got answers like "ridge regression", which was old stuff when I was still wearing bell-bottomed trousers.
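For readers who haven't met that old stuff: ridge regression just adds a penalty to the least-squares objective, which stabilises the coefficients when the predictors are collinear. A closed-form sketch (my toy example, using the same kind of nearly collinear data as above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # nearly collinear with x1
y = 2 * x1 + 3 * x2 + rng.normal(scale=0.1, size=n)
X = np.column_stack([x1, x2])

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

ols = ridge(X, y, 0.0)      # ordinary least squares (lam = 0): unstable split
shrunk = ridge(X, y, 10.0)  # penalty pulls the two coefficients together
print(ols, shrunk)
```

The penalty doesn't recover the "true" individual coefficients (nothing can, given the data); it simply trades a little bias for much lower variance, a trick that long predates the machine-learning label.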
A little off topic, but I've run into a lot of mathematicians, linguists, psychologists, etc. who seem palpably disappointed, even a little angry, that machine learning is not theoretically novel, that it's just the equivalent of banging enormous amounts of computation and data together. These people are disappointed because they wanted big AI advances to flow from, or perhaps lead to, deep secrets about the human mind, the nature of thought, and so on.
To me, the effectiveness of just ramming data and computational power through **relatively** simple models until they work is a testament to the power of evolution.
Thank god. I feel like I'm living in the land that invented invisible clothes. As a former programmer with an interest in statistics, linear regression and predictive models being the main interest, I'm continually aghast at the inflated claims of the unprecedented power of so-called AI and ML. It's simply another IT industry hype cycle, the result of which will be marginal change for the vast majority of the world's population and millions transferred to overrated IT firms. Hopefully I can soon retire from what has become a symbiosis of snake oil and ignorance. It's a pity; programming was once a really satisfying craft. I pity the grads today.