09 May 2013 @ 11:18 pm
Warning: maaaaaaajor geekery ahead! A closer look at Doctor Who ratings  

OK so I like math. Although I'm not a statistician, I am pretty good with math (which is lucky for anyone in the USA haha bc in a few short months I will be using said math to prepare intravenous drug compounds for hospitalized patients... tl;dr if I sucked at this, it would suck WAAAY worse to be you bwahahah XD)

ANYWAY! It should not have escaped anyone's attention that Doctor Who ratings appear to be teetering a bit. Some people think "eh, it's not a big deal" while some think that this is dangerous. I'm one of the latter group. And because I have a huge-ass assignment due tomorrow that I don't feel like doing right now, I thought I'd explain why. (For simplicity's sake, there are no Xmas specials or 2009 specials included in this data - just final BARB ratings from 2005-present).

OK first off, if you compare the OVERALL numbers from s1-s7, although s7 (in red) looks like it's a bit low, there doesn't seem to be much difference ... right? Right???? And, truth be told, the only ~statistically significant~ differences (eg, where s7 really comparatively sucks balls) are when it's directly compared to s1. But ... this chart is a colorful mess. A colorful MEANINGLESS mess, because all I'm showing you is a bunch of lines without analysis. Six would probably proudly wear this chart as a coat it's so fugly. Anyway, this is usually the data people are looking at when they glance at the ratings and shrug it off as being "not all that different." This is not accurate.


So let's clean it up a bit!!! To simplify things, I'm gonna compare apples to apples. All RTD-era episodes are accounted for by the blue line ("You said BLUE!!" ... "I said NOT blue!!!"), the beginning of the Moffat era (s5-6) is the green line, and s7 is the red line. NOW things start to look interesting!!

OK this is not a calculus class but I hope this shows why math is kind of cool if you're a total fandom nerd and you want to prove other fandom nerds wrong XD. Look at the pretty lines and numbers!!! Here's what they mean: see the dotted lines with the equations? Those are ~trendlines~ for the graph. Basically what that means is it tells you, on average, where the hell your data is going. See the equations? Those tell you how fast viewers are flocking to your show (or, alternatively, turning it off bc it sucks and going to read fanfic or something lol idk). And see the (sorrysorry tiny font I knowww) "R^2" value? That tells you how closely the trendline actually fits the data, aka if you can trust your trends or not (lol @ those evil, untrustworthy trend bitches). The closer to "1" the better, and these are all pretty freaking close to one which means the trendlines fit pretty tightly. (So anyone who tries to respond and say it's meaningless - look at the R^2 value and hush lol).
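For anyone who wants to play along at home, here's a minimal sketch of how a quadratic trendline and its R^2 could be computed in Python with NumPy. Big caveats: the viewer numbers below are invented purely for illustration (they are NOT BARB figures), `fit_quadratic` is just a name made up for this example, and this is one way to get this kind of fit, not necessarily how the charts in this post were made.

```python
import numpy as np

# Invented viewer numbers (millions) for a 13-episode run. NOT real BARB data.
episodes = np.arange(1, 14)
viewers = np.array([8.4, 8.0, 7.6, 7.5, 7.2, 7.0, 6.9,
                    7.0, 7.1, 7.3, 7.4, 7.8, 8.2])

def fit_quadratic(x, y):
    """Least-squares fit of y = a*x^2 + b*x + c. Returns (coeffs, r_squared)."""
    coeffs = np.polyfit(x, y, deg=2)           # [a, b, c], highest power first
    predicted = np.polyval(coeffs, x)          # trendline value at each episode
    ss_res = np.sum((y - predicted) ** 2)      # scatter left over around the trendline
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total scatter around the plain average
    return coeffs, 1.0 - ss_res / ss_tot

coeffs, r2 = fit_quadratic(episodes, viewers)
a, b, c = coeffs
print(f"y = {a:.4f}x^2 + {b:.4f}x + {c:.4f}   R^2 = {r2:.3f}")
```

R^2 here is the share of the episode-to-episode variation that the trendline accounts for, compared with just drawing a flat line through the average. A value near 1 says the curve hugs these particular points; it doesn't by itself say the curve will keep going that way.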

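So what does this mean? Again this isn't a calculus class so I'll skip the lecture on how to calculate derivatives and try not to make this too boring (BUT CALCULUS IS SUPER COOL AND YOU SHOULD LOVE IT GUYZ), but essentially the first number (the one in front of x^2) is saying "this is which way the trend is bending": positive means the line curves back up over the season, negative means it curves down, and the bigger the number, the sharper the bend.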

And this is where the RTD era is strong, s5-6 are a bit weaker, and s7 is in trouble. For the RTD era, the first number shows that yeah viewers were coming and going - but that there was a general trend back up. For s5-6, there are fewer people coming and going. And for s7, the number is negative --- that means there is a trend of people leaving. How reliable is this? Well back to the R-squared thingy I was telling you about - it's pretty freaking close to 1, so the trend is pretty tight.
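If you DO want the tiny bit of calculus: for a trendline y = a*x^2 + b*x + c, the change in viewers per episode at episode x is the derivative, 2*a*x + b. So the sign of `a` says which way the curve bends, while the actual coming/going at any given episode depends on both `a` and `b`. A rough sketch, where the function names are made up for this example and the coefficients in the usage lines are invented, not read off the charts:

```python
def change_per_episode(a, b, x):
    """Slope of y = a*x^2 + b*x + c at episode x, i.e. the derivative 2*a*x + b."""
    return 2 * a * x + b

def turning_point(a, b):
    """Episode number where the trendline switches direction (slope equals zero)."""
    return -b / (2 * a)

# Hypothetical coefficients, for illustration only.
a, b = 0.05, -0.70
print(change_per_episode(a, b, 2))    # negative: the trendline is falling early on
print(change_per_episode(a, b, 12))   # positive: the trendline is rising late on
print(turning_point(a, b))            # roughly where the fall turns into a rise
```

With a positive `a` the trendline dips and then recovers; with a negative `a` it peaks and then falls away.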


One of the big weaknesses here is that premieres and season finales tend to have more viewers, so in this next graph, I simply removed the premieres and finales (which meant I had to remove mainly s6-7 episodes from the data pool bc of the split season). Taking away those premiere/finale bumps in viewership looks even worse for s7 - the number of viewers leaving is even MORE negative now!!!! And s5-6 has a much flatter line too ... viewers were pretty stagnant. Again, the RTD era had some swings, but at the end of the day, viewers were coming home. That's not happening for the past few years, especially this year.
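Here's a sketch of that "drop the premiere and finale, then refit" step, again with NumPy. The numbers are invented (NOT real ratings) and deliberately exaggerated: a steady slide through the middle of the run with a big premiere and a big finale. `quadratic_coeffs` and `drop_ends` are hypothetical helper names for this example.

```python
import numpy as np

# Invented numbers (millions), NOT real ratings: a steady slide from episode 2
# to episode 12, with a big premiere (ep 1) and a big finale (ep 13).
episodes = np.arange(1, 14)
viewers = np.array([9.0, 7.8, 7.7, 7.6, 7.5, 7.4, 7.3,
                    7.2, 7.1, 7.0, 6.9, 6.8, 9.5])

def quadratic_coeffs(x, y):
    """Return (a, b, c) for the least-squares fit y = a*x^2 + b*x + c."""
    a, b, c = np.polyfit(x, y, deg=2)
    return a, b, c

def drop_ends(x, y, n_start=1, n_end=1):
    """Drop the first n_start and last n_end points (e.g. premiere and finale)."""
    stop = len(x) - n_end
    return x[n_start:stop], y[n_start:stop]

a_full, b_full, _ = quadratic_coeffs(episodes, viewers)
a_trim, b_trim, _ = quadratic_coeffs(*drop_ends(episodes, viewers))

print(f"all episodes:        a = {a_full:+.4f}, b = {b_full:+.4f}")
print(f"no premiere/finale:  a = {a_trim:+.4f}, b = {b_trim:+.4f}")
```

On this made-up data the upward bend comes entirely from the two end points: take them out and the x^2 number collapses to basically zero, leaving a straight downhill line. Real seasons won't be this clean, but it shows how much a quadratic trendline can hang on its first and last episodes.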


So what can we make out of all of this? Tl;dr, the numbers aren't good. And they're getting not-gooder by the season.


eve11 on May 10th, 2013 06:21 pm (UTC)
Sorry, I am a card-carrying statistician and all I see is a general trend for all seasons, bucked by the end of season 4 with its specials.

For your positively curved red and blue lines, re-calculate with the last 3 episodes removed as you have for the S7 data. Willing to bet they will also trend down. You are fitting a trend line that looks like some kind of cubic... why that particular line? ETA: my mistake, quadratic. That is pretty highly parametric, which means the form of the line is going to be susceptible to outliers, and especially susceptible to boundary points. If you get a new data point that changes inflection it has the possibility to severely affect the form of the line. And you don't really have a lot of points to fit something like a smoother. You could take the mean of each episode crosswise and see if any seasons lie "more likely above" or "more likely below", maybe?

ETA: and one last thing which is that anyone can look at the above chart and see that S4 is an obvious outlier at the end. So the big uptick in the blue line in your chart is mostly due to incorporating the s4 outlier.

I think you should also include at least the month of airing as a covariate.

Gonna have to wait for the next few episodes to see if s7 trends upward again like the previous seasons or if it goes down.

Edited at 2013-05-10 06:30 pm (UTC)
eve11 on May 10th, 2013 06:37 pm (UTC)
My mistake, it's the end of series 4 (not the specials) that is the outlier. eg, Turn Left, The Stolen Earth and Journey's End.
hammard on May 10th, 2013 06:42 pm (UTC)
Thanks for your post! I agree totally and have done some other posts on it.

The difference in perceptions between us Statisticians and them Engineers ay ;)
kilodalton on May 10th, 2013 09:18 pm (UTC)
Eh for s7 I'll keep updating the data as it comes in and repost accordingly. But s5 and s6 are fairly flat, and with the same number of total eps so I'm not holding out much hope there. But we'll see.

I took the parametric line because hands down I was able to get the best fit from it no matter the season. I suppose I could have done it several different ways (and *did* do it several different ways - including a t-test lol), but each looked uglier than the last and I thought this one illustrated the point more elegantly and just as accurately as anything else I could have screencapped.

Re s4 - true, that's why I got rid of all finales and premieres. There are a lot of things which cause bumps and dips - s2 for example had a crap-ton of two parters, the second halves of which pushed its numbers lower (bc all 2-parters pretty much have low-rated second halves, across the board). I left that alone though because I didn't think the RTD era needed any help looking better. It's not perfect - but it is what it is! =)

I could have included a lot of other things as covariates too - but I'm sure the BBC is already doing that, with much more advanced software than I have XD