The disciplines of unstructured data management, data storage, data security, and AI are colliding. With so much changing all the time, how does one keep up with new practices and trends? By visiting the Komprise Data Management Glossary, of course. Below is a list of the 10 most popular terms for the year so far. Each entry has a short abstract; click the link for the full description.
AI Compute
AI compute refers to the computational resources required for artificial intelligence systems to perform tasks, such as processing data, training machine learning models, and making predictions. These resources can be provided by various hardware and software platforms, including GPUs, TPUs, cloud computing, and edge computing devices. The amount of AI compute needed depends on the complexity of the AI system and the amount of data being processed.
Data Hoarding
Data hoarding is the common enterprise practice of retaining, for extended periods, large amounts of data that are no longer needed or are rarely used. In many organizations, employees tend to save data out of habit, fear of losing it, or simply because they don’t know what to do with it. Enterprise IT teams may have retention and deletion policies, but these can be hard to enforce. Hoarding can lead to overspending on data storage as well as compliance and security risks from large, unmanaged data estates.
Data Management Policy
A data management policy is an operating policy that governs the management and governance of data assets. It should be owned by a team within the organization that identifies how the policy is accessed and used, who enforces it, and how it is communicated to employees. Ultimately, a data management policy should guide your organization’s philosophy toward managing data as a valued enterprise asset. With automation, IT can “set and forget” the policy to ensure continuous adherence.
Data Tiering
Data tiering is the technique of moving less frequently used data, also known as cold data, to cheaper levels of storage, or tiers. The term “data tiering” arose from moving data between tiers or classes of storage within a single storage system, but it has since expanded to mean tiering or archiving data from a storage system to other storage systems and clouds. The glossary page goes into further detail about the different types of tiering and how to avoid common problems.
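As a rough illustration of how cold data might be identified before it is tiered, the sketch below walks a directory tree and lists files that have not been modified within a threshold. This is a minimal sketch under stated assumptions, not how Komprise or any storage vendor selects data: the function name `find_cold_files`, the 365-day default, and the use of last-modified time as the "coldness" signal are all hypothetical choices made for the example.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400

def find_cold_files(root, cold_after_days=365, now=None):
    """Return files under `root` last modified more than `cold_after_days` ago.

    Assumption for this sketch: modification time is the coldness signal.
    Real tiering products may use access time or other criteria.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    cold = []
    for path in Path(root).rglob("*"):
        # Only regular files are candidates; directories are skipped.
        if path.is_file() and path.stat().st_mtime < cutoff:
            cold.append(path)
    return sorted(cold)
```

A caller would pass the result to whatever mechanism actually moves the files to the cheaper tier; that step is deliberately left out here.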
Data Tagging
Data tagging is the process of adding metadata to your file data in the form of key-value pairs. These tags give context to your data so that others can easily find it, search it, and execute actions on it, such as moving it to confinement or to a cloud-based data lake. Data tagging is valuable for research queries and analytics projects, and for complying with regulations and policies.
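To make the key-value idea concrete, here is a minimal in-memory sketch of tagging files and then searching by tag. It is only an illustration: the `TagIndex` class, its methods, and the example paths and tag names are hypothetical and do not represent the Komprise API or any product's storage format.

```python
from collections import defaultdict

class TagIndex:
    """Toy index mapping file paths to key-value tags (illustrative only)."""

    def __init__(self):
        self._tags = defaultdict(dict)  # path -> {key: value}

    def tag(self, path, **pairs):
        """Attach or update key-value tags on a file path."""
        self._tags[path].update(pairs)

    def find(self, **criteria):
        """Return sorted paths whose tags match every given key-value pair."""
        return sorted(
            path
            for path, tags in self._tags.items()
            if all(tags.get(key) == value for key, value in criteria.items())
        )

# Hypothetical usage: tag two files, then search on a tag.
index = TagIndex()
index.tag("/lab/run1.bam", project="genome-x", sensitivity="restricted")
index.tag("/lab/run2.bam", project="genome-x", sensitivity="public")
restricted = index.find(sensitivity="restricted")
```

Once files can be selected by tag like this, an action such as a move or copy can be applied to the matching set.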
Learn more about automated data tagging with Komprise.
Metadata
Metadata means “data about data” or data that describes other data. The prefix “meta” typically means “an underlying definition or description” in technology circles. Metadata makes finding and working with data easier, allowing the user to sort or locate specific documents. Some examples of basic metadata are author, date created, date modified, and file size. Metadata can be stored and managed in a database; without context, however, it may be impossible to identify metadata just by looking at it. Metadata is useful in managing unstructured data since it provides a common framework to identify and classify a variety of data, including video, audio, genomics data, seismic data, user data, documents, and logs.
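As a small example of the basic file metadata mentioned above, the following sketch reads a file's size and timestamps from the file system using Python's standard library. The function name `basic_metadata` and the dictionary keys are assumptions made for illustration. Creation date and author are omitted because the file system does not expose them consistently across platforms.

```python
import os
from datetime import datetime, timezone

def basic_metadata(path):
    """Return a few file-system metadata fields for `path` as a dict."""
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "size_bytes": st.st_size,
        # Timestamps are converted to timezone-aware UTC datetimes.
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
    }
```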
Read our CEO’s two-part blog series on metadata management.
Petabyte
A petabyte (PB) is a unit of data storage that represents 1,000,000,000,000,000 bytes, or 10^15 bytes. It is 1,000 times larger than a terabyte (TB) and one million times larger than a gigabyte (GB). Petabytes are commonly used to describe the capacity of large-scale data storage systems, such as those run by data-heavy industries for scientific research, big data analytics, and cloud computing. For example, a single petabyte could store 200 million 5 MB photos, or about 13.3 years’ worth of HD video content. There are 1,000 petabytes in an exabyte (EB) and one million petabytes in a zettabyte (ZB).
Petabyte Comparison Examples
- A typical HD movie is about 4-5 GB in size. A petabyte could store around 200,000 HD movies.
- An average MP3 song is about 5 MB. A petabyte could hold approximately 200 million songs.
- A 1 terabyte hard drive can store around 250,000 photos. A petabyte could hold about 250 million photos.
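Comparisons like these are simple divisions of one petabyte by the size of a single item. The sketch below is an illustration only; it uses decimal (SI) units to match the 10^15-byte definition, and the helper name `items_per_petabyte` is hypothetical. Sources that use binary units (2^50 bytes per petabyte) arrive at slightly larger counts, as the last lines show.

```python
# Decimal (SI) units, matching the 10^15-byte definition of a petabyte.
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

def items_per_petabyte(item_size_bytes):
    """Return how many whole items of the given size fit in one petabyte."""
    return PB // item_size_bytes

print(items_per_petabyte(5 * MB))  # 5 MB song or photo -> 200000000
print(items_per_petabyte(5 * GB))  # 5 GB HD movie      -> 200000
print(items_per_petabyte(4 * GB))  # 4 GB HD movie      -> 250000
print(PB // TB)                    # terabytes per PB   -> 1000

# Binary units give somewhat higher figures for the same item sizes.
MIB, PIB = 2**20, 2**50
print(PIB // (5 * MIB))            # 5 MiB items per PiB -> 214748364
```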
Rehydration
Rehydration is the process of fully reconstituting files so that the transferred data can be accessed and used. Block tiering rehydrates any data accessed from the cloud. This requires keeping enough space on the array to accommodate some percentage of the cold data, which in turn reduces the potential cost savings. Because the cold data is tiered to the cloud in a proprietary format, you must stay with the same vendor when it is time to decommission your storage array and replace it with a new one. If you elect to change vendors, you will have to rehydrate all of the data back to the original storage array, migrate that data to the new storage array, and then tier it again using some other tiering solution. No rehydration is needed with Komprise, which uses file-based tiering.
Secondary Storage
Secondary storage devices operate alongside the computer’s primary storage (RAM and cache memory) and can hold anywhere from megabytes to petabytes of data. These devices store almost all types of programs and data, including the operating system, device drivers, applications, and user data. Examples of internal secondary storage devices include the hard disk drive, the tape drive, and the compact disc drive. Secondary storage is typically designed for long-term storage of less-active data and is often orders of magnitude cheaper than primary storage.
Symbolic Link
Symbolic links, also known as symlinks, are file-system objects that point to another file or folder. These links act as shortcuts with advanced properties: they allow access to files from locations other than their original place in the folder hierarchy by telling the operating system where the “target” file can be found. Komprise uses the standard symbolic link feature built into Windows, Linux, and macOS to replace a file with a tiny pointer to another location. By using Dynamic Links inside the standard symbolic link, Komprise extends the file system to call these files from the cloud or other storage systems.
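For readers who have not worked with symbolic links directly, the sketch below creates one with Python's standard library and inspects where it points. It demonstrates only the generic operating-system feature and says nothing about how Komprise Dynamic Links work internally; the helper names `create_symlink` and `describe_link` are hypothetical. On Windows, creating symbolic links may require elevated privileges or Developer Mode.

```python
import os

def create_symlink(target, link_path):
    """Create a symbolic link at `link_path` that points to `target`."""
    os.symlink(target, link_path)
    return link_path

def describe_link(path):
    """Report whether `path` is a symlink and where it leads."""
    is_link = os.path.islink(path)
    return {
        "is_symlink": is_link,
        # The raw pointer stored in the link, if there is one.
        "points_to": os.readlink(path) if is_link else None,
        # The final location after following all links.
        "resolved": os.path.realpath(path),
    }
```

Opening the link path with ordinary file calls reads the target's contents, which is why applications can keep using the original location after the data itself has moved.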
Read more about Transparent Move Technology.
Unstructured Data Management
Unstructured data management is a category of software that has emerged to address the explosive growth of unstructured data in the enterprise and the modern reality of hybrid cloud storage. Data storage and data backup technology vendors now recognize the importance of unstructured data management as data outlives infrastructure and as data mobility is needed to leverage cloud data storage. Unstructured data management must be independent of, and agnostic to, data storage, backup, and cloud infrastructure technology platforms.
Check out the 2024 Komprise State of Unstructured Data Management to learn about the latest trends in this area.