All posts by year

2023

Checksum Your Data Processing

08 Jan 2023
Data Analysis Rants

This year I volunteered to manage the fall fundraiser for my kids’ Girl Scout troop. The girls sell snacks and magazines to their family and friends, and then get a cut of the proceeds. They also can earn patches and little trinkets based on how many items they sell. The Girl Scouts use the volunteer managers as a free last-mile distribution network, which means that I spent the better part of my morning today going through bags of tchotchkes and packaging them up for the 20+ girls in the troop based on what they earned.

This is fiddly work. The patches are small and easily miscounted or mislaid, and there are more than 15 different kinds of rewards, which makes it easy to accidentally skip over one. Mistakes are painful (I’m on the hook if anything is missing), so it’s important to get it right. So I borrowed something from my data analysis bag of tricks and used a checksum.

More...

2021

Spicy Cheese Bread: The Best Baked Good in Wisconsin

04 Apr 2021
Food Analysis

Wisconsin: land of beer, bratwurst, and cheddar cheese. I consumed copious amounts of all three during my time as a grad student at UW-Madison, but none is the food that I miss the most. That honor is reserved for spicy cheese bread.

More...

Lies, Damn Lies, and NFL Next Gen Stats Powered by AWS

24 Jan 2021
Football Transparency Rants

I like to imagine that at some point in 2017 the NFL execs gathered around a long mahogany table in their ~~secret clubhouse~~ NYC headquarters. They took a break from their important discussions about how to downplay the connection between football and CTE, or the best ways to sucker cash-strapped municipalities into funding new stadium development, to grill the middle manager who was in charge of their data. “Hey nerd!” I presume they opened, “Why the hell have we been paying all this money for the last three years for these stupid RFID chips? Teams are barely using them!”

More...

Political Polling Part 4: The Tricky Stuff

08 Jan 2021
Polling Analysis

A big problem, possibly the biggest problem in political polling, is that you can’t guarantee what demographics will be correlated with the candidates, or how voters will turn out at the ballot box or even pick up the phone to respond to a poll.

More...

Political Polling Part 3: Turnout and Response Rate

03 Jan 2021
Polling Analysis

Previously in this series I discussed the concept of statistical sampling, and how even the perfectly constructed poll will produce a distribution of possible results due to the random chance of who happens to respond. Those are so-called “random errors”, and they’re relatively easy to predict and quantify. Now let’s talk about other kinds of errors, ones that pollsters spend the bulk of their time worrying about.

More...

2020

Political Polling Part 2: Demographics

30 Dec 2020
Polling Analysis

In the previous post I simulated an electorate as though every person in it were essentially the same. That was useful to show the effects of statistical sampling, but the real world works differently: demographic groups vary in their candidate preference, turnout likelihood, and even in how they interact with polls.

More...

Political Polling Part 1: Sampling

28 Dec 2020
Polling Analysis

The central idea that underpins all polling is the concept of statistical sampling, which may sound intimidating but for our purposes really boils down to two things:

More...

Political Polling Part 0: Introduction

24 Dec 2020
Polling Analysis

In the aftermath of the 2020 U.S. elections, I was confused.

More...