Thursday, January 5, 2012

Big Data and Journalism

With the rise of cheap computing and data storage has come the ability to measure and store huge amounts of data. What's coming along more slowly is the interest in, and the ability to make use of, all of that data. Another leap came with distributed computing: first with standalone projects like SETI@home, which has used the processing power of millions of home computers to process billions of pieces of data (2 billion so far), and now with the ability to harness thousands of virtual computers in the Cloud.

So what is Big Data, and what does it have to do with the future of journalism? The "Big Data" concept refers to the tools and processes for managing and using large datasets. The idea of data-driven journalism has been around for decades, but it has for the most part been limited to focused use of datasets to answer specific questions. And, quite frankly, it has been severely limited by most journalists' seemingly inherent antipathy to numbers and math, as well as by the decline in investigative journalism.

More recently, the concept of database journalism has emerged. Unlike data-driven journalism, database journalism aggregates the material collected by journalists into databases, which can then be mined to spot trends or to supply local examples for localized versions of stories.

Neither of these fits the idea of Big Data, however. What the Las Vegas Sun is doing with data may qualify, though - they exploit the massive volume of audience metrics generated by their online edition to suggest coverage, link to public databases to generate real-time informational maps of things like police reports, real estate listings, and retail hours for local editions, and have used public and online databases to research a story on local healthcare.
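As a rough illustration of that kind of pipeline, here is a minimal sketch of how a newsroom might turn a public data feed into a map layer: it reads a hypothetical CSV export of police reports (the file name and the column names - incident_type, reported_at, latitude, longitude - are assumptions, not the Sun's actual schema) and writes GeoJSON that a web mapping tool can display.

```python
import csv
import json

def police_reports_to_geojson(csv_path):
    """Convert a CSV export of police reports into a GeoJSON FeatureCollection.

    Hypothetical column names: 'incident_type', 'reported_at',
    'latitude', 'longitude'. A real feed would need its own field mapping.
    """
    features = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                lon = float(row["longitude"])
                lat = float(row["latitude"])
            except (KeyError, ValueError):
                continue  # skip rows without usable coordinates
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {
                    "incident_type": row.get("incident_type", "unknown"),
                    "reported_at": row.get("reported_at", ""),
                },
            })
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    geojson = police_reports_to_geojson("police_reports.csv")
    with open("police_reports.geojson", "w") as out:
        json.dump(geojson, out)
```

Run on a schedule against a city's open-data export, something this simple is enough to keep a "police reports near you" map reasonably fresh.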

But there is the potential for much more - particularly in today's age of big data and huge document dumps (often designed to hide the big stories from easy access). Yet traditional journalism hasn't shown much interest in, or ability to exploit, Big Data. From the various WikiLeaks dumps to the release of data on Stimulus-funded projects to Sarah Palin's emails, journalists have let others do the analysis and largely just reported what they were told (if they reported it at all). That's a shame, because there is an unprecedented amount of publicly available information on government activities at all levels, on campaign contributions, and on the links between big money and "independent" public interest groups that should make a "watchdog" press drool. Not to mention how monitoring search engines and social media could alert journalists to emerging issues and hot topics. (For example, Google does a faster and better job of tracking flu outbreaks than the CDC, simply by monitoring searches for "flu remedies" and "flu symptoms.")
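To make the monitoring idea concrete, here is a minimal, hypothetical sketch of the kind of spike detection such tools rely on: it compares each day's count of a search term or hashtag against a trailing baseline and flags unusual jumps. The term, the numbers, and the thresholds are all invented for illustration.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, baseline_days=14, threshold=3.0):
    """Flag days whose count is unusually high versus a trailing baseline.

    daily_counts: list of (date_string, count) pairs in chronological order.
    A day is flagged when its count exceeds the baseline mean by more than
    'threshold' standard deviations - a deliberately simple stand-in for the
    anomaly detection a real search- or social-media-monitoring tool would use.
    """
    alerts = []
    for i in range(baseline_days, len(daily_counts)):
        baseline = [count for _, count in daily_counts[i - baseline_days:i]]
        avg, spread = mean(baseline), stdev(baseline)
        day, count = daily_counts[i]
        if spread > 0 and count > avg + threshold * spread:
            alerts.append((day, count))
    return alerts

# Made-up daily counts of searches for "flu symptoms", with one artificial spike.
counts = [("2011-12-%02d" % day, 100 + day) for day in range(1, 29)]
counts.append(("2011-12-29", 400))
print(flag_spikes(counts))  # -> [('2011-12-29', 400)]
```

A newsroom version would pull the counts from a search-trends or social-media API instead of a hard-coded list, but the alerting logic - watch a term, learn its normal level, flag the departure - is the same idea behind the flu-tracking example.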

If journalism is to have a future, journalists need to do more than simply report what others say and do - they need to originate news, add value to stories, and reveal the needle in the haystack. And doing that through Big Data, through the use and analysis of available information, is becoming easier and cheaper. Will journalists acquire the interest and skills to do so, or will they leave that to others (and in doing so render themselves even more irrelevant)?

Source - "Big Data: Why All the Fuss?", InformationWeek Global CIO
