Dark Data

What is Dark Data?

Dark data is the term used to describe the vast amount of data (primarily unstructured data) that organizations collect, generate, and store but do not actively use, analyze, or leverage for decision-making, business intelligence, analytics, AI or other purposes. This data typically remains untapped or unexplored due to various reasons, such as lack of awareness, inadequate data management processes, or technical challenges.

Gartner defines Dark Data as:

The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

According to the article 5 Steps for Minimizing Dark Data Risk, the first step in protecting dark data is visibility. (See Komprise Analysis.)

Examples of Dark Data

  • Unstructured data: This includes text documents, images, videos, audio files, and other forms of data that are not organized in a structured format like databases.
  • Log files: Many systems generate log files to record events, errors, and other activities, but these logs may not be regularly reviewed or analyzed.
  • Historical data: Older datasets that were collected for specific projects or purposes might no longer be actively used or considered valuable.
  • Redundant or duplicated data: Copies of data that were created for backup or replication purposes but are not actively used. (Redundant data is one part of what is sometimes called ROT data: redundant, outdated, or trivial.)
  • Siloed data: Data that is isolated in different departments or systems, making it challenging to access and integrate with other data sources.
  • IoT-generated data: With the proliferation of Internet of Things (IoT) devices, there’s an increasing amount of data being generated, but not all of it is fully utilized.
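As an illustration of how redundant or duplicated data might be surfaced, the following is a minimal sketch, not a description of how any particular product works. It assumes the data lives on a locally mounted file system and treats two files as duplicates only when their contents are byte-for-byte identical. The function names `file_digest` and `find_duplicates` are hypothetical and chosen for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group files under root whose contents are identical."""
    # Group by size first: files of different sizes cannot be duplicates,
    # so most files never need to be hashed at all.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    # Hash only the files that share a size with at least one other file.
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[file_digest(path)].append(path)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates(Path("/some/share"))` (a placeholder path) returns a list of groups, each containing the paths of files with identical content. Comparing sizes before hashing is a design choice that keeps the scan cheap on large trees.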

Dark Data Challenges

The accumulation of dark data creates several known challenges:

  • Data storage costs: Storing large amounts of unused data can be costly, both in terms of hardware and cloud storage expenses.
  • Security and privacy risks: Dark data may contain sensitive information that isn’t adequately protected, increasing the risk of data breaches.
  • Missed insights: Valuable insights and opportunities for improvement may be hidden within the dark data, preventing organizations from making data-driven decisions.
  • Compliance and legal challenges: Regulatory requirements may demand proper data management and disposal practices, which unmanaged dark data may fail to meet.

To address dark data challenges, organizations need to implement better data governance practices, invest in data management tools and infrastructure (particularly for unstructured data management), and establish processes to identify, classify, and leverage relevant data efficiently and effectively. Doing so lets them establish strong data protection while unlocking the potential hidden in their dark data, turning it into valuable insights for better decision-making, strategic planning, and the growing opportunity that artificial intelligence presents in the enterprise.
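The identify and classify steps described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a definitive method: it assumes a locally mounted file system, treats a file as a dark data candidate when it has not been modified for a configurable number of days, and uses the file extension as a crude classification. The function name `summarize_cold_files` and the 365-day default are hypothetical.

```python
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, Optional

SECONDS_PER_DAY = 86_400


def summarize_cold_files(
    root: Path,
    min_age_days: float = 365,  # hypothetical threshold; tune per policy
    now: Optional[float] = None,
) -> Dict[str, Dict[str, int]]:
    """Count files and bytes not modified within min_age_days, by extension."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY

    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"files": 0, "bytes": 0}
    )
    for path in root.rglob("*"):
        if not path.is_file() or path.is_symlink():
            continue
        info = path.stat()
        # Modification time is used because access time is often not
        # updated reliably (for example on volumes mounted with noatime).
        if info.st_mtime < cutoff:
            key = path.suffix.lower() or "(none)"
            summary[key]["files"] += 1
            summary[key]["bytes"] += info.st_size
    return dict(summary)
```

The result maps each extension to a file count and total size, which gives a first view of how much storage is held by data nobody has touched recently. A real assessment would likely also consider owner, access time where it is trustworthy, and content sensitivity.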
