Getting ready to live blog a talk on data journalism by David Weisz (data journalist and digital producer at the Toronto Star).
"Dirty Data and Common Mistakes" is the name of David Weisz's presentation.
"All journalism with numbers is data journalism," says Weisz.
"Never try doing calculations when your eyes are heavy. You WILL make mistakes. Do it when you're fresh," says David Weisz.
The Ontario Film Review Board watches all films, including pornography. It is supposed to operate on a cost-recovery basis, but going through its annual reports and database, Weisz discovered it was making over $2 million net annually, a large amount of it from pornography.
All data sets are dirty. You have to take all data with a grain of salt: mistakes are made, and it's your job to catch them.
Weisz says don't be a cowboy (or cowgirl) about getting data. Exhaust all available resources (Freedom of Information requests, annual reports, communications people). Relying primarily on scraped data is a great way to be wrong...
Weisz exposed that the Ontario Film Review Board (which is supposed to operate on a cost-recovery basis) makes over $2 million net annually, a large share of which comes from screening pornography.
Weisz discovered a few problems with the numbers provided to him while researching: the online database showed 1,000+ more films reviewed than the annual reports included. The explanation turned out to be different administrative processes for VHS and DVD.
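A rough sketch of what that kind of cross-check could look like in Python. This is not Weisz's actual method; the years and counts are made up for illustration, and `count_gaps` is a hypothetical helper name.

```python
def count_gaps(database_counts, report_counts):
    """Return {year: database count minus report count} for years where the sources disagree."""
    years = sorted(set(database_counts) | set(report_counts))
    gaps = {}
    for year in years:
        # A year missing from one source is treated as a count of 0
        diff = database_counts.get(year, 0) - report_counts.get(year, 0)
        if diff != 0:
            gaps[year] = diff
    return gaps


# Made-up numbers for illustration only (not the board's real figures)
database_counts = {2010: 520, 2011: 610, 2012: 480}
report_counts = {2010: 520, 2011: 450, 2012: 300}

gaps = count_gaps(database_counts, report_counts)
print(gaps)
```

Any year that shows up in `gaps` is a question to take back to the source, not a finding on its own.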
Step two: Let Common Sense Prevail
Don't fall in love with your data; scrutinize it and embrace your fear.
"Look for outliers." Sometimes mistakes are made in data entry. Use common sense, identify outliers, and determine whether they affect the overall story.
Weisz ended up speaking to the CEO of "Pure Play Broadcasting", who spent over $2.3 million just to get Ontario to review pornographic films. Another way to corroborate your findings is to ask those affected directly: "Did this happen to you?"
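For the "look for outliers" advice, here is one possible way to flag suspicious values in Python. The talk did not name a specific technique; this sketch assumes the common 1.5 x IQR rule of thumb, the fee values are invented, and `flag_outliers` is a hypothetical helper.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented values for illustration; 1250 might be a data-entry slip for 125
fees = [120, 135, 128, 140, 131, 1250, 126, 133]
print(flag_outliers(fees))
```

A flagged value is only a prompt to check the record and ask whether it changes the story; it may be a typo or it may be real.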
1. Make sure you have a solid index
2. Get as much context for your data as possible
3. Know the weaknesses of your data, and be prepared to defend them on Twitter/Canadaland
4. Don't do calculations when you're sleepy
Thanks to David Weisz for a great session on Data Journalism!