Monday, June 1, 2020

In Which I Venture Into The Thickets Of Data Science And Hume's Problem Of Induction

One of the things I started doing in the middle of lockdown was courses at Data Camp. I started with Machine Learning for Everyone, then moved on to Python for Beginners. In case this isn't your universe, Python is a programming language that is often used for data science.

I want to emphasize that I did not do this because I suddenly had "extra time" on my hands or because I was casting around for something to do. There are different lockdown experiences out there, and the "extra time" experience has not been my experience. For one thing, everything to do with my work seems to take four times as long as it did before.

Rather, the way my emotional life works, I often have a background sadness that I keep at bay through doing things. In normal life, the bustle of activity and the feeling of accomplishment are central to that process. With lockdown, there is no "bustle of activity." So accomplishing things -- or feeling like I am accomplishing things -- has become a huge deal. So why not learn something about data science?

The classes are excellent, with lots of examples and exercises. On encountering these, I immediately started thinking about data science and Hume's problem of induction.

One of the first examples my course used to illustrate machine learning concepts had to do with predicting how much money a movie would make based on input factors like star power, budget, advertising, and so on. And I was like, "Wait, what?" Is the idea supposed to be using data from the past to predict earnings in the future? But isn't the popularity of works of art always shifting and changing? Isn't art frequently based on novel ideas? Also, I thought the popularity of films was regarded as wildly unpredictable.
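To make that example concrete for myself, here is a minimal sketch of what such a model might look like in Python with scikit-learn. The feature names and every number in it are invented purely for illustration; they are not taken from the course or from any real film data.

```python
# A toy version of the course's example: predict box-office earnings
# from a few numeric features of past films. All data here is made up.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical training data: each row is a past film
# (star-power score, budget in $M, advertising spend in $M).
X = np.array([
    [8.0, 150.0, 60.0],
    [3.0,  20.0,  5.0],
    [6.5,  90.0, 30.0],
    [2.0,  10.0,  2.0],
    [9.0, 200.0, 80.0],
    [5.0,  50.0, 15.0],
])
# Hypothetical earnings in $M for those same films.
y = np.array([500.0, 35.0, 220.0, 12.0, 700.0, 110.0])

# Hold out part of the past data to check the fit on films the model hasn't seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)

# The inductive leap: apply a pattern fit on past films to a new one.
new_film = np.array([[7.0, 120.0, 40.0]])
print("Predicted earnings ($M):", model.predict(new_film)[0])
print("R^2 on held-out films:", model.score(X_test, y_test))
```

The last two lines are where the inductive leap happens: a model fit on past films is applied to a film it has never seen, which only makes sense if future films behave the way past films did.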

If you've studied philosophy, you won't be surprised to hear that my next thought was, "What about Hume's problem of induction?"

If you haven't: briefly, Hume's problem of induction is that inductive reasoning -- in which we go from past cases to generalities and the future -- always rests implicitly on an assumption that the future is going to be like the past. And yet we have no logical reason to believe that the future is going to be like the past. So inductive reasoning, which is at the core of basically all empirical science, has no justification. You might try saying "Hey, but the future has always been like the past." But to use that to solve the problem would mean applying the past to the future, and so would be induction, and so would be circular.  

You can see right off the bat that these are deep waters we are getting into, and I have to warn you that this is going to be the Phil 101 level version of things because I'm not a specialist in this area, I'm just a person thinking about data science. But I do remember from teaching Phil 101 that the point with Hume isn't just about a lack of certainty. It's no help to say that while we're not sure the future will be exactly like the past, we have reason to believe it will probably be like the past. Because whatever version of "probably" you come to, that judgment relies on thinking that in the future, things will occur with the likelihood that they did in the past. In other words, we're back with the circularity problem.

Anyway, I'd been wondering vaguely for a long time about social science and the problem of induction, and then I started thinking about data science and the problem of induction. In the context of social reality, Hume's problem starts to take on a practical urgency. Because when it comes to people, when is the future ever like the past? Our current moment seems designed to hammer this point home. Ha ha, you thought the future was going to be like the past? Guess again, suckers.

So like anyone else, I then googled "data science," "Hume," and "problem of induction." (This is where I have to admit that my usual searching via Duck Duck Go got me nowhere, and so I was forced to acknowledge once again the superiority of Google as a search engine).

I found this discussion, which gives a good overview, but which ends by saying that "instead of strictly rejecting or accepting, we can use inductive reasoning in a probable manner." But I didn't understand this, as I thought the problem applied to probabilistic reasoning as described above.

I also found this piece, which covers a lot of interesting territory but which concludes that AI works because "the problem of induction can be managed," which again, I didn't understand. 

So then I was like, Do I not know what is going on? So I went to the Stanford Encyclopedia of Philosophy entry on the Problem of Induction. Yes, there are attempts to get around the problem of induction via "Arguing for a Probable Conclusion." Not surprisingly, the matter turns out to be very complicated, though I note that each subsection seems to end with the author of the article basically saying "this is why that doesn't really work."

Noticing that the entry also points the reader to "Philosophy of Statistics," I went there, and was fascinated to see in the first section:  "Arguably, much of the philosophy of statistics is about coping with this challenge [of the problem of induction], by providing a foundation of the procedures that statistics offers, or else by reinterpreting what statistics delivers so as to evade the challenge... It is debatable that philosophers of statistics are ultimately concerned with the delicate, even ethereal issue of the justification of induction. In fact, many philosophers and scientists accept the fallibility of statistics, and find it more important that statistical methods are understood and applied correctly." 

So at this point, I guess figuring out what I think about data science and the problem of induction will require some intense intellectual effort. I think it will be worth it though. The most interesting item I found in my searching argues that the real challenge that the problem of induction poses for data science is that people "change and grow morally and socially in non-transitive, non-linear ways."   

I agree, and I would add that social institutions and practices also change in complicated ways. We now get into the debate over whether there are simple and uniform laws that lie beneath what looks like social chaos, or whether people and their doings create novelty in ways that are inherently impossible to pin down. You may not be surprised to hear I tend toward the latter view, not because I think free will lies outside the laws of the universe, but more because the creativity and complexity of humans isn't susceptible to that kind of generalizing thinking.

The topic is complex. But in support of my side, may I present the wild success of Parasite, surely a film whose budget and star power would never have predicted its success? 
