09 May 2013 @ 11:18 pm
Warning: maaaaaaajor geekery ahead! A closer look at Doctor Who ratings  

OK so I like math. Although I'm not a statistician, I am pretty good with math (which is lucky for anyone in the USA haha bc in a few short months I will be using said math to prepare intravenous drug compounds for hospitalized patients... tl;dr if I sucked at this, it would suck WAAAY worse to be you bwahahah XD)

ANYWAY! It should not have escaped anyone's attention that Doctor Who ratings appear to be teetering a bit. Some people think "eh, it's not a big deal" while some think that this is dangerous. I'm one of the latter group. And because I have a huge-ass assignment due tomorrow that I don't feel like doing right now, I thought I'd explain why. (For simplicity's sake, there are no Xmas specials or 2009 specials included in this data - just final BARB ratings from 2005-present).

OK first off, if you compare the OVERALL numbers from s1-s7, although s7 (in red) looks like it's a bit low, there doesn't seem to be much difference ... right? Right???? And, truth be told, the only ~statistically significant~ differences (eg, where s7 really comparatively sucks balls) are when it's directly compared to s1. But ... this chart is a colorful mess. A colorful MEANINGLESS mess, because all I'm showing you is a bunch of lines without analysis. Six would probably proudly wear this chart as a coat it's so fugly. Anyway, this is usually the data people are looking at when they glance at the ratings and shrug it off as being "not all that different." This is not accurate.


image


So let's clean it up a bit!!! To simplify things, I'm gonna compare apples to apples. All RTD-era episodes are accounted for by the blue line ("You said BLUE!!" ... "I said NOT blue!!!"), the beginning of the Moffat era (s5-6) is the green line, and s7 is the red line. NOW things start to look interesting!!

OK this is not a calculus class but I hope this shows why math is kind of cool if you're a total fandom nerd and you want to prove other fandom nerds wrong XD. Look at the pretty lines and numbers!!! Here's what they mean: see the dotted lines with the equations? Those are ~trendlines~ for the graph. Basically what that means is it tells you, on average, where the hell your data is going. See the equations? Those tell you how fast viewers are flocking to your show (or, alternatively, turning it off bc it sucks and going to read fanfic or something lol idk). And see the (sorrysorry tiny font I knowww) "R^2" value? That tells you if you can trust your trends or not (lol @ those evil, untrustworthy trend bitches). The closer to "1" the better, and these are all pretty freaking close to one which means the trends are pretty strong. (So anyone who tries to respond and say it's meaningless - look at the R^2 value and hush lol).

So what does this mean? Again this isn't a calculus class so I'll skip the lecture on how to calculate derivatives and try not to make this too boring (BUT CALCULUS IS SUPER COOL AND YOU SHOULD LOVE IT GUYZ), but essentially the first number (x^2) is saying "this is how fast viewers are coming/going".

And this is where the RTD era is strong, s5-6 are a bit weaker, and s7 is in trouble. For the RTD era, the first number shows that yeah viewers were coming and going - but that there was a general trend back up. For s5-6, there are fewer people coming and going. And for s7, the number is negative --- that means there is a trend of people leaving. How reliable is this? Well back to the R-squared thingy I was telling you about - it's pretty freaking close to 1, so the trend is pretty tight.
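For anyone who wants to reproduce this kind of trendline outside a spreadsheet, here is a minimal sketch of a quadratic least-squares fit with its R^2 value. The viewer numbers in it are made-up placeholders, not BARB figures, and the function names are just for illustration. One thing worth keeping in mind when reading the equations: in y = a*x^2 + b*x + c, the x^2 coefficient describes how the curve bends, while the rate of change at episode x is the derivative 2*a*x + b.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c. Returns (a, b, c)."""
    # Power sums for the 3x3 normal equations.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def r_squared(xs, ys, coeffs):
    """Share of the variance in ys that the fitted curve accounts for."""
    a, b, c = coeffs
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x * x + b * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def slope_at(x, coeffs):
    """Derivative of the fitted curve at x: viewers gained or lost per episode."""
    a, b, _ = coeffs
    return 2 * a * x + b


# Placeholder data only: episode numbers and made-up viewers in millions.
episodes = [1, 2, 3, 4, 5, 6, 7]
example_viewers = [8.4, 7.9, 7.6, 7.5, 7.2, 7.3, 6.9]

coeffs = fit_quadratic(episodes, example_viewers)
print("a, b, c:", coeffs)
print("R^2:", r_squared(episodes, example_viewers, coeffs))
print("slope at last episode:", slope_at(episodes[-1], coeffs))
```

A negative slope at the latest episode means the fitted curve is heading down at that point, whatever the sign of the x^2 term.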

image


One of the big weaknesses here is that premieres and season finales tend to have more viewers, so in this next graph, I simply removed the premieres and finales (which meant I had to remove mainly s6-7 episodes from the data pool bc of the split season). Taking away those premiere/finale bumps in viewership looks even worse for s7 - the number of viewers leaving is even MORE negative now!!!! And s5-6 has a much flatter line too ... viewers were pretty stagnant. Again, the RTD era had some swings, but at the end of the day, viewers were coming home. That's not happening for the past few years, especially this year.


image


So what can we make out of all of this? Tl;dr, the numbers aren't good. And they're getting not-gooder by the season.

(AND CALCULUS ROCKS AND YOU SHOULD TOTALLY LOVE IT!!)


 
 
 
hammard on May 10th, 2013 06:41 pm (UTC)
You should read Tom Spilsbury's post on this subject he did back during Season 6 where he analysed ALL the ratings and showed the change was very little:
http://tomspilsbury.moonfruit.com/#/home/4554491282/Let%27s-Kill-This-Myth/195123

Whilst we cannot analyse all the recent ratings in this way due to lack of data (iPlayer stats will not be available for a while), it is probably the same.

What the figures are possibly mapping is the move from TV to online watching (just as the strong downwards trend from S1 to S2 tracks people moving to repeat viewings).

Also, out of curiosity, would you really say r-squared is valuable as a measure of fit for a trend on such a small data set? Perhaps it's different in other subjects, but I studied statistics and we would not consider it a reliable measure of trend for a data set this small. For example, your r-squared for this season is 0.81 based on only 7 data points, so the fit explains only about four-fifths of the variance and the trend cannot be said to be solid.

Anyway, don't want to start a flame war on your own blog. I love maths too :D

Edited at 2013-05-10 06:53 pm (UTC)
eve11 on May 10th, 2013 07:39 pm (UTC)
Another interesting story would be to start with, e.g., episode 4 in each series, plot the quadratic curves point by point as you gain new points with each new episode, and see how much the prediction changes with the next new data point. The future predictive trend depends rather massively on where it estimates the turning point.
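A rough sketch of that experiment, assuming a plain quadratic least-squares fit: refit the curve on episodes 1 to n for each n from 4 upward, and each time record what it predicts for the final episode of the run. The function names and the viewer numbers are placeholders for illustration, not real ratings.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c. Returns (a, b, c)."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def predict(coeffs, x):
    a, b, c = coeffs
    return a * x * x + b * x + c


def rolling_predictions(ys, start=4):
    """Fit on the first n episodes (n = start .. len(ys) - 1) and
    return (n, predicted value for the final episode) for each n."""
    xs = list(range(1, len(ys) + 1))
    final_episode = xs[-1]
    results = []
    for n in range(start, len(ys)):
        coeffs = fit_quadratic(xs[:n], ys[:n])
        results.append((n, predict(coeffs, final_episode)))
    return results


# Placeholder data only: made-up viewers in millions for a 13-episode run.
example_viewers = [8.4, 7.9, 7.6, 7.5, 7.2, 7.3, 6.9, 7.0, 7.1, 6.8, 7.2, 7.4, 7.6]

for n, guess in rolling_predictions(example_viewers):
    print(f"fit on episodes 1-{n}: predicted final = {guess:.2f}, "
          f"actual final = {example_viewers[-1]:.2f}")
```

If the predicted final value jumps around a lot as n grows, that is a sign the curve describes the past better than it forecasts the future.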
kilodalton on May 10th, 2013 09:08 pm (UTC)
Eh Spilsbury's post is kind of what I was saying about people who just look at the raw data and don't look for any statistical significance or trendlines. (And besides the trendlines, I went so far as to run a one-tailed T-test: Spilsbury is wrong. Season ONE is the outlier and is statistically significant, not season 4 -- see what I mean about misleading data on the surface?) Not that I'm surprised though - he makes his living off DWM, what the heck is he supposed to say lol?

.... and if you truly don't think that R^2 values are reliable for small data sets, then I would personally make sure to never take any medication EVER. Because the pharmacokinetic data used to determine safety profiles and how often you can safely dose a med without killing someone is often based on fewer parameters XD (so that said, I'm comfortable-enough. No statistic is perfect, and of course more data is better - but if it's good enough for the FDA, it's good enough for me).

Re iPlayer though - everyone keeps putting their eggs in that basket. If that basket counted UNIQUE views instead of TOTAL views, and broke out the data from those who watch it on TV then rewatch later it might mean something. But at this point, it's kind of junk data, which is too bad =/
eve11 on May 10th, 2013 09:16 pm (UTC)
Can you explain to me what you mean by "Season 1 is statistically significant?" Do you mean the average ratings for season 1 are statistically significantly different from other seasons? You have 13 correlated data points in a temporal trend, and I'm willing to bet that whatever calculation was done in terms of a simple t-test for statistical significance is making assumptions that the data populations do not follow in this case.
kilodalton on May 10th, 2013 09:20 pm (UTC)
Do you mean the average ratings for season 1 are statistically significantly different from other seasons?

Statistically significantly higher than s7. I didn't compare it to other seasons, I just wanted to see how other seasons compared to s7.
eve11 on May 10th, 2013 09:24 pm (UTC)
Higher how? Average? Looks not. More like pairwise difference episode by episode? s1 is the only one I see on the chart where s7 is lower on a point by point basis. If you don't account for episode by episode there doesn't seem to be a difference in the range. I would also worry about normality assumptions with this data and would do a bootstrap or permutation test before a t-test for significance.
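As a sketch of what a permutation test on the episode-by-episode differences could look like: under the null hypothesis that neither season is systematically higher, the sign of each paired difference is arbitrary, so every sign pattern can be enumerated and the observed total compared against all of them. This is one common variant (a paired sign-flip test), offered as an illustration rather than as the test anyone in this thread actually ran; the ratings are placeholders, and it still treats the pairs as exchangeable, which the temporal correlation mentioned earlier may undermine.

```python
from itertools import product


def paired_sign_flip_test(a, b):
    """One-sided exact permutation test for paired samples.

    Returns the fraction of sign assignments whose summed difference
    is at least as large as the observed sum of (a - b).
    """
    diffs = [x - y for x, y in zip(a, b)]
    observed = sum(diffs)
    at_least_as_extreme = 0
    total = 0
    # Enumerate all 2**n sign patterns; fine for a dozen or so episodes.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = sum(s * d for s, d in zip(signs, diffs))
        if flipped >= observed - 1e-12:  # small tolerance for float noise
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Placeholder data only: made-up viewers in millions, paired by episode number.
season_a = [8.1, 7.9, 7.8, 8.0, 7.6, 7.7, 7.4]
season_b = [7.8, 7.5, 7.9, 7.4, 7.3, 7.2, 7.1]

print("one-sided p-value:", paired_sign_flip_test(season_a, season_b))
```

With only seven pairs there are 128 sign patterns, so the smallest p-value this test can ever report is 1/128.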
kilodalton on May 10th, 2013 09:33 pm (UTC)
More like pairwise difference episode by episode?

Yup.

I would also worry about normality assumptions with this data and would do a bootstrap or permutation test before a t-test for significance.

... True, but again - if it's good enough for the New England Journal of Medicine or Pharmacotherapy to print that X cancer treatment is better than Y cancer treatment because of a retrospective medical chart review with the statistics based on paired T tests, then it's good enough for me sitting at home bored on a Thursday night trying to figure out silly TV ratings XD

(Maybe they should hire more statisticians? Or maybe we should just try not to get cancer so our MDs don't give us treatment X based on imperfect stats XD)
eve11 on May 10th, 2013 10:00 pm (UTC)
Eh, Medical stats is a rather staid field. A lot of the basic stats they have are basic because they are derived from well-designed and highly controlled experiments, and because the researchers don't know any better. Sample size is also pretty important in using t-tests. I just don't know what it really means that s1 and s7 are pairwise separated, when they both also seem to be lying right smack in the middle of the point cloud. What is the interpretation of this difference in that case? There are only 7 data points per episode in the list: it might be more useful to look at something like relative ranking per episode number? That would include information from the obvious outliers but would mitigate their effect.
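The relative-ranking idea could be sketched like this: for each episode number, sort the seasons by rating and record each season's position, so that an unusually high or low episode moves a season up or down one place instead of dragging an average. The season labels and numbers below are placeholders, and ties are simply left in input order here, which is an assumption a real analysis would want to revisit.

```python
def rank_by_episode(ratings):
    """ratings maps season label -> list of ratings by episode number.

    Returns season label -> list of ranks per episode (1 = highest rated
    season for that episode slot). Only episode slots present in every
    season are ranked.
    """
    seasons = list(ratings)
    n_episodes = min(len(values) for values in ratings.values())
    ranks = {season: [] for season in seasons}
    for ep in range(n_episodes):
        ordered = sorted(seasons, key=lambda s: ratings[s][ep], reverse=True)
        for position, season in enumerate(ordered, start=1):
            ranks[season].append(position)
    return ranks


def mean_rank(ranks):
    """Average rank per season across the ranked episode slots."""
    return {season: sum(r) / len(r) for season, r in ranks.items()}


# Placeholder data only: made-up viewers in millions.
example_ratings = {
    "season_x": [8.0, 7.5, 7.9, 7.2],
    "season_y": [7.6, 7.8, 7.1, 7.4],
    "season_z": [7.1, 7.0, 7.3, 6.9],
}

per_episode = rank_by_episode(example_ratings)
print(per_episode)
print(mean_rank(per_episode))
```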

But my main caveat with the analysis is extrapolating the quadratic trend when it is pretty obvious from prior seasons that extrapolation in time based on previous performance is not a very good indicator of where the next data point will be. The curves for seasons 1-6 are highly parametric and fitted retrospectively. What would you have predicted for the rest of each season individually, based only on the numbers up to episode 8 or 9? It's going to swing dramatically in the extrapolation, which means that R^2 is not what need concern you for goodness of fit, and I'd be willing to bet that any confidence interval you made for an extrapolated point based on only the previous data would get blown out of the water upon seeing the next point, with that quadratic curve. So for season 7 we really can't say anything convincing until the numbers for the last episodes come back.
kilodalton on May 10th, 2013 10:01 pm (UTC)
I will definitely be updating this as the final BARB numbers keep rolling in - we'll see where it ends up!! =)
eve11 on May 11th, 2013 05:31 am (UTC)
Well you have piqued my curiosity so I scraped the info from the Wikipedia pages (air date, writer, director, rating, appreciation index), and for context I am also working on parsing the top 30 episodes per channel scraped from the BARB "top 30" site for all of the weeks Doctor Who aired. I just need to write a little Python parser to get everything in a simple pipe-delimited format, as right now it has the table headers and week names mixed in (plus when it has "other" categories, it changes the format, bleh).

Let me know if you want the data set when I am finished. It will have air dates, times, ratings, rankings on BBC1, rankings on the other big channels from the BARB top 30 tables, and I will put in labels of other covariates: 2-parters, and also I was curious about how well episodes with famous old school monsters do, comparatively. There will be two tables that can be cross classified as to the leading show, competing shows on different channels, and overall ratings trends per channel for each week.
kilodalton on May 11th, 2013 01:58 pm (UTC)
Wow, sure thing, thanks!
eve11 on May 11th, 2013 09:35 pm (UTC)
Two files:
https://dl.dropboxusercontent.com/u/71925398/format-DWratings.txt

https://dl.dropboxusercontent.com/u/71925398/format-ratings.txt

First is data on the 88 episodes from series 1 through series 7 "Journey to the Centre of the TARDIS", with some covariates.

Second is the top 30 shows for each of the big channels for that week, based on the BARB top 30 numbers. Both files are pipe ("|") delimited. Date formats should match up.

I didn't do all of the cable channels, only the top ones that were listed based on the BARB "Top 30" lists. But starting in 2008 they keep track of the top 30 "Other" channels on the site, so you can bound the competitors' ratings that aren't on this list with the lowest rating on that list.

There is one set of discrepancies between ratings pulled from the wiki (which are the source of the DW-specific data set) and ratings listed on the BARB site. For series 5, the wiki has all of the ratings higher. I checked this out and it is because in 2010, original viewings were split up across BBC HD and BBC1. So if you go to the "Top 10" program lists in the BARB site for the Series 5 weeks, and do a search for "DOCTOR WHO" it will give you two original airings: BBC1 and BBCHD. The big data set lists just the BBC1 airings. The wiki consolidates both of them.
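For anyone loading the two pipe-delimited files, a minimal sketch using Python's standard csv module follows. The column names (date, title, channel, programme, rating) and the sample rows are hypothetical stand-ins, since the actual headers are not listed here, and the sketch assumes each file starts with a header row.

```python
import csv
import io


def read_pipe_table(handle):
    """Read a pipe-delimited table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="|"))


def group_by(rows, key):
    """Group rows by the value in one column, e.g. the air date."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[key], []).append(row)
    return grouped


# Hypothetical sample text standing in for the two files; the column names
# and values are placeholders, not the real contents.
episodes_text = (
    "date|title|rating\n"
    "2013-01-01|Example Episode|7.5\n"
)
top30_text = (
    "date|channel|programme|rating\n"
    "2013-01-01|BBC1|Example Episode|7.5\n"
    "2013-01-01|ITV1|Example Rival|6.0\n"
)

episodes = read_pipe_table(io.StringIO(episodes_text))
top30_by_date = group_by(read_pipe_table(io.StringIO(top30_text)), "date")

# Cross-classify: for each episode, list what else aired that week.
for ep in episodes:
    competitors = top30_by_date.get(ep["date"], [])
    print(ep["title"], ep["rating"], [c["programme"] for c in competitors])
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="")`, and the ratings would need converting from strings to floats before any arithmetic.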
kilodalton on May 12th, 2013 12:55 pm (UTC)
Thanks for these, they'll be interesting to peruse! My updates to this will still be based on my methodology tho. Honestly, I think for our purposes it's enough -- and one of the most important things is making sure to keep it simple and understandable. Occam's Razor. Even with my data as it stands some folks had trouble following it - and I made a point to keep it VERY user-friendly. If I start chucking around too many terms and parabolas nobody is going to understand it lol.
eve11 on May 12th, 2013 01:05 pm (UTC)
Surely that's your prerogative. You can see on my lj why I think fitting global curves to this data is a bad idea. I drew pictures. I tend to try also to keep it simple, which is why I start with informative graphs & EDA and let those drive any hypotheses.
kilodalton on May 12th, 2013 01:09 pm (UTC)
No I totally get it - but honestly I think it's good enough. And what you were saying about lack of statistical significance between s5-7 is what I found too using my methodology. I'm not arguing with yours at all, you're clearly good at what you do, I just think that mine suits the purpose as well =)