Thursday, March 24, 2011

A Rambling About Stats

Let me get this out of the way first: Ken Pomeroy is one smart dude. He does stuff with sports and numbers that I could only dream of, and he's turned it into what seems to be an awesome career, being a student of sports and stats, earning courtside seats at NCAA games and tweeting win probabilities. The guy has gained traction among the most renowned sportswriters out there, and for that he deserves all the credit in the world.

Now that I've gotten that out of the way, let's talk about win probabilities. I just don't get it. I understand that each and every possession has some varying amount of impact on how the rest of the game will play out, based upon the success or failure of that possession. However, from what I've followed of the KenPom win probability tweets during the tourney, they just seem off.

Example #1: Villanova vs. George Mason. Ken has Nova at 80% at the under-4 media timeout in the second half. Now, all things considered, being up by 6 points with 4 minutes left is an enviable position, but purely on the surface, I see it as only a two-possession game when each team has around 4 possessions left. Inwardly, I doubted the 80% probability, knowing of GMU's penchant for magic and Nova's, well, shittiness to end the season, especially that horrible loss to USF at the Big East tourney. At the end of the season, Nova clearly looked like a team that didn't know how to win games; Mason, on the other hand, has the experience of a recent Final Four trip and a coach who can coax great things out of his players. I know KenPom is only looking at numbers, but when handicapping a game with win probability, don't these intangibles/qualitative factors matter at least a little bit, if not a whole lot? Does Nova's late-season futility factor into that win probability at all? Maybe it does--I know KenPom is a smart guy and generally thinks his numbers through to the last detail. But still, 80% seemed very high to me in a two-possession game with 4 minutes left, and Mason actually ended up winning the game.

Example #2: Duke vs. Arizona. KenPom has Duke at 91% at halftime. Duke is up by 6 in a game that has seesawed during the first half, with neither team convincingly pulling ahead of the other for a sustained period (Duke was up 11, but the lead vanished rapidly). Here is what I know as a Duke fan: long, athletic, strong-rebounding teams are the perfect recipe for disaster for Duke, and Arizona goes 2-3 guys deeper than Duke and would thus likely be fresher in the second half. And here is what stats gurus know about Arizona: Sean Miller is the best after-timeout coach in the NCAA. So how does KenPom handicap Duke at 91% in a two-possession game when Sean Miller is coming out of a 20-minute timeout? Again, I'm not using science here--I am just saying that if someone had given me 9 to 1 odds on Arizona winning that game at halftime, I would have taken them in a heartbeat. That game was not nearly 90% decided at the half, but somehow KenPom's numbers said it was.

I only bring this up because of something that happened to me long ago. When interviewing for a job, my boss-to-be asked me, "Can you statistically prove that there is such a thing as 'The Zone?'" He offered the Mike Dunleavy scenario, three 3's in three possessions, that unfolded in the 2001 Duke-Arizona title game. For a 35-38% three-point shooter, which Dunleavy was at the time, the chance of making three in a row is not great--around 4-5%. Yet Dunleavy did it in that game, and I'm certain at many other points in the season, although I cannot be sure without watching tape. Of course, I bombed the question, but the fact remains: Dunleavy made a 4-5% likely scenario occur more than 4-5% of the time, suggesting that there is such a thing as "The Zone" which makes him more likely to hit 3-point shots on a given night. The idea is similar to the recent financial crisis, where some uncountable number of six-sigma events all occurred one after the other. By definition, six-sigma events should occur once every 10,000 years or something bizarre like that, yet we had Bear Stearns funds collapsing, Lehman failing, housing tanking, and subprime lenders failing all at the same time. In that case, the economy was in "The Zone" of doing really shittily. Just like Dunleavy was in "The Zone" for making 3's against Arizona in 2001. So while numbers explain a lot, much of the time--more often than the numbers suggest--the numbers end up being flat-out wrong. Which brings me back to the damn win probabilities. Nova = wrong. Duke = wrong. Bigtime. The numbers told us one thing, the reality was something different--and no one was THAT surprised.
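Going back to the Dunleavy math for a second: the 4-5% figure is just the shooting percentage cubed, which only holds if you assume every shot is independent of the last one (exactly the assumption "The Zone" disputes). Here is a quick Python sketch of that arithmetic; the function name is my own, not anything official.

```python
def streak_probability(make_pct: float, shots: int = 3) -> float:
    """Chance of making `shots` in a row, ASSUMING each attempt is independent."""
    return make_pct ** shots


# The 35% and 38% ends of the shooting range quoted above.
for pct in (0.35, 0.38):
    print(f"{pct:.0%} shooter: {streak_probability(pct):.1%} to hit three straight")
```

That comes out to roughly 4.3% at the low end and 5.5% at the high end, which is where the "around 4-5%" comes from.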

I can't begin to understand the numbers, models, and details involved in calculating a win probability. I just know that the win probability cannot and will not incorporate all the available information at hand (intangibles, qualitative factors, fatigue in a late-season game), and thus cannot be that accurate. I'd be interested to know how the win probabilities have fared for the entire tournament. Until then, I'll just go with my gut.
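If anyone does want to run that tournament-wide check, one common way to grade probability forecasts is the Brier score: the average squared gap between the forecast and what actually happened (1 for a win, 0 for a loss). Lower is better, and always saying 50% scores 0.25. The sketch below uses only the two games from this post, which is far too few to judge anything; a real check would need every tweeted probability. I have no idea whether this is how KenPom grades his own numbers, and the names here are mine.

```python
def brier_score(forecasts):
    """Mean squared gap between forecast probabilities and 0/1 outcomes."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


# (probability given to the favorite, 1 if the favorite won else 0)
# Only the two games discussed in this post; NOT a real sample.
games = [
    (0.80, 0),  # Villanova at the under-4 timeout; George Mason won
    (0.91, 0),  # Duke at halftime; Arizona won
]

print(f"Brier score on these two games: {brier_score(games):.3f}")
print(f"Always guessing 50% would score: {brier_score([(0.5, 0), (0.5, 0)]):.3f}")
```

Two cherry-picked misses will obviously look terrible. The point is only that the same calculation could be run over the whole bracket.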
