Shedding Some Light on Dark Data


There has been an explosion of discussion about the importance of dark data to businesses looking to mine their data resources for competitive advantage. What is it? Why is it important? In a recent blog post, technology writer Isaac Sacolick offers a business definition of dark data:

Dark data is data and content that exists and is stored, but is not leveraged and analyzed for intelligence or used in forward looking decisions. It includes data that is in physical locations or formats that make analysis complex or too costly, or data that has significant data quality issues. It also includes data that is currently stored and can be connected to other data sources for analysis, but the business has not dedicated sufficient resources to analyze and leverage. Finally (and this may be debatable), dark data also includes data that currently isn't captured by the enterprise, or data that exists outside of the boundary of the enterprise.

Elsewhere, an article on IT World notes that dark data is ubiquitous:

Every enterprise accumulates dark data. Companies don't try to hoard this unanalyzed information, it just happens because it's created almost everywhere. Servers in data centers generate an enormous trove of largely untapped log file data. Manufacturers' shop floor control systems and robots produce dark data as well as widgets. Little of the data from a retailer's point of sale system gets mined. Information from diagnostic equipment in intensive care units is generally ignored. The list goes on.

Consultant Matt Hunt writes on his blog that more leaders should focus on dark data, because it can lead to significant insights. He cites a Wired article that introduced him to the concept. The article recounts the 20-year effort to refute the now-debunked claim that coffee consumption was linked to pancreatic cancer, a refutation that could have come years earlier had scientists examined the dark data of other studies that were looking for other relationships.

Hunt notes that data often becomes dark because it fails to fit expectations as it emerges, and that this is true for business as well as scientific organizations. While working as a data analyst for a major retailer, he realized the company could glean insights by paying more attention to its failed projects, that is, by learning from its dark data. “A company may have spent thousands or even millions of dollars on their latest innovation initiative,” he says. “If that initiative fails, few people within the organization will know the details of why.”

He suggests that companies would do well to analyze their failures rather than cast them into oblivion. They can bring light to dark data by asking of each failure: What was accomplished? What was learned? What would have been done differently in retrospect? Conducting this kind of formal post-mortem on failed initiatives could help keep valuable data from going dark.

If, as a recent post suggests, dark data is more important than big data, or simply if its potential becomes better understood, Hunt’s recommendation might be considered a best practice for businesses seriously investing in data analysis.

Posted by the Epicor Social Media Team

