
Data Lock-In Is Killing AI Agility


This article has been adapted from its original version on BuiltIn.

Enterprise IT teams today must walk a tightrope between maintaining their long-standing storage vendor relationships and creating an environment of flexibility and portability so they can attain maximum value from their unstructured data, especially for AI.

Unlike traditional workloads, AI thrives on reuse and cross-pollination of unstructured data sets. A single corpus of information may inform dozens of applications, from predictive maintenance to fraud detection to personalized customer experiences. Locking data in a proprietary system slows progress.

Why Does Data Lock-In Happen?

Data lock-in happens for several reasons:

  • When data is stored in unique ways that only a particular vendor’s tools or APIs can interpret, moving files to another environment is far from straightforward.
  • When IT uses the storage vendor’s tiering technology, tiered data is stored as proprietary blocks that only that vendor’s file system can read, creating hidden dependencies and complicating migration or reuse.
  • Many cloud providers charge egress fees when moving data out of their environments.
  • Workflows that depend on a vendor’s APIs, caching mechanisms or specific interfaces break or must be reworked when data moves.
  • Finally, contractual terms often exacerbate the issue, as long-term agreements or restrictive licensing clauses may lock enterprises into a vendor for years.

Why Data Lock-In Is a Problem Now More Than Ever

Although vendor lock-in of data has always been a concern, the stakes are much higher today. The sheer growth of unstructured data is one factor. Enterprises now store petabytes of information in the form of images, video, documents, logs and design files. This data has become a potential source of competitive differentiation, especially as organizations explore new use cases and applications for AI.

Budget and performance pressures add another layer of urgency. You can save tremendously by offloading cold data to lower-cost storage tiers. Yet, if retrieving that data requires rehydration, metadata reconciliation or funneling requests through proprietary gateways, the savings are quickly offset. Finally, the rapid evolution of technology means enterprises need flexibility to adopt new tools and services. Being locked into a single vendor makes it harder to pivot as the landscape changes.
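The tradeoff above can be made concrete with simple arithmetic. The sketch below is a back-of-envelope model; all prices and volumes are illustrative assumptions, not any vendor's actual rates.

```python
# Hypothetical model of tiering savings vs. retrieval costs.
# All figures are illustrative assumptions, not real vendor pricing.

def annual_tiering_savings(tb_tiered, primary_cost_tb_mo, archive_cost_tb_mo):
    """Yearly savings from moving cold data to a cheaper storage tier."""
    return tb_tiered * (primary_cost_tb_mo - archive_cost_tb_mo) * 12

def retrieval_overhead(tb_retrieved, egress_cost_tb, rehydration_cost_tb):
    """One-time cost of pulling tiered data back out (egress + rehydration)."""
    return tb_retrieved * (egress_cost_tb + rehydration_cost_tb)

savings = annual_tiering_savings(tb_tiered=500, primary_cost_tb_mo=20, archive_cost_tb_mo=4)
overhead = retrieval_overhead(tb_retrieved=200, egress_cost_tb=90, rehydration_cost_tb=25)
print(f"Annual savings:    ${savings:,.0f}")   # -> $96,000
print(f"Retrieval penalty: ${overhead:,.0f}")  # -> $23,000
print(f"Net benefit:       ${savings - overhead:,.0f}")
```

Even in this toy example, retrieving less than half the tiered data claws back roughly a quarter of the savings; heavier retrieval through proprietary gateways erodes them further.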

What Are Some Strategies to Prevent Data Lock-In?

Preventing data lock-in requires intentional design. Here’s what to consider:

Transparent Tiering

Ideally, when files are relocated from expensive primary storage to more economical secondary or object storage, the change should be invisible to end users. Files should appear to remain in place and be accessible without requiring agents, stubs or specialized clients. Komprise Transparent Move Technology (TMT)™ uses symbolic links to avoid the user and application disruption that typically accompanies proprietary tiering. Users simply click the original link to access their data.
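The symbolic-link idea can be sketched in a few lines. This is an illustrative toy, not Komprise's implementation, whose internals are not public; it only shows why a link left in place keeps the original path working after the bytes move.

```python
# Minimal sketch of symlink-based transparent tiering (illustrative only).
import os
import shutil
import tempfile

primary = tempfile.mkdtemp(prefix="primary_")    # stand-in for primary NAS
secondary = tempfile.mkdtemp(prefix="secondary_")  # stand-in for cheap tier

# Create a "cold" file on primary storage.
cold = os.path.join(primary, "report.csv")
with open(cold, "w") as f:
    f.write("year,value\n2025,42\n")

# Tier it: move the bytes to secondary storage, leave a symlink behind.
moved = os.path.join(secondary, "report.csv")
shutil.move(cold, moved)
os.symlink(moved, cold)

# Users and applications still open the original path, unchanged.
with open(cold) as f:
    print(f.read().splitlines()[1])  # -> 2025,42
```

Because the link resolves at the file system level, no agent or special client is needed on the user's side.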

File-Object Duality

When data lands in object storage, it should remain accessible both through traditional file system interfaces and through standard object APIs from any vendor, not only through the original file system. Komprise TMT lets organizations run AI and analytics directly on tiered data without first moving it back into a file system, and it doesn’t sit in the hot data path, so performance is unaffected.
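One way to picture duality is that the same bytes are reachable both as a POSIX path and as an object-style bucket/key address. The mapping scheme below is an assumption made up for illustration; real platforms use their own native addressing.

```python
# Sketch of file/object duality: one file, two addresses (illustrative only).
import os
import tempfile

root = tempfile.mkdtemp()  # stand-in for a file share mount point
os.makedirs(os.path.join(root, "projects/ml"), exist_ok=True)
path = os.path.join(root, "projects/ml/train.txt")
with open(path, "w") as f:
    f.write("sample\n")

def as_object_address(file_path, mount_root, bucket="cold-tier"):
    """Derive an S3-style key from the file's path relative to the mount.
    The bucket name and scheme are hypothetical."""
    key = os.path.relpath(file_path, mount_root).replace(os.sep, "/")
    return bucket, key

bucket, key = as_object_address(path, root)
print(f"s3://{bucket}/{key}")  # -> s3://cold-tier/projects/ml/train.txt
```

An analytics job could then read the object directly via any standard S3-compatible client, while file-based applications keep using the POSIX path.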

Attributes and Permissions

Another top practice is to preserve metadata, directory structures and file system semantics when data is moved. This prevents disruption for applications that rely on permissions, timestamps or directory hierarchies.
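In POSIX terms, "preserving file system semantics" means carrying permissions and timestamps along with the bytes. The sketch below shows this with Python's standard library; the file names are made up for illustration.

```python
# Sketch: preserve POSIX metadata when relocating a file (illustrative only).
import os
import shutil
import stat
import tempfile

src_dir = tempfile.mkdtemp()
dst_dir = tempfile.mkdtemp()

src = os.path.join(src_dir, "data.log")
with open(src, "w") as f:
    f.write("entry\n")
os.chmod(src, 0o640)    # restrictive permissions an application may depend on

dst = os.path.join(dst_dir, "data.log")
shutil.copy2(src, dst)  # copy2 preserves mode bits and timestamps

assert stat.S_IMODE(os.stat(dst).st_mode) == 0o640
print("permissions preserved")
```

A plain byte copy (`shutil.copyfile`) would have silently dropped the mode bits, which is exactly the kind of disruption this practice guards against.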

Agile Mobility

Enterprises should also track the true cost of mobility by running regular tests or simulations of data migrations to understand the technical and financial barriers they might face if they needed to move a lot of data fast.
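A simple "mobility drill" can start with transfer-time arithmetic: at a given sustained throughput, how long would each dataset take to move, and which ones exceed a time budget? The dataset names, sizes, and 72-hour budget below are illustrative assumptions.

```python
# Toy mobility drill: flag datasets too large to move within a time budget.
# All figures are illustrative assumptions, not measurements.

def migration_hours(dataset_tb, throughput_gbps):
    """Transfer time in hours at a sustained throughput (Gbit/s).
    Uses decimal terabytes: 1 TB = 8e12 bits."""
    bits = dataset_tb * 8 * 1e12
    return bits / (throughput_gbps * 1e9) / 3600

for name, tb in [("video-archive", 800), ("design-files", 120)]:
    hours = migration_hours(tb, throughput_gbps=10)
    flag = "OVER BUDGET" if hours > 72 else "ok"
    print(f"{name}: {hours:,.1f} h at 10 Gbit/s ({flag})")
```

Real drills should also account for egress fees, rehydration steps, and metadata reconciliation, which often dominate the raw transfer time.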

Global Visibility

Finally, organizations should be able to see all data across their storage environments so they can classify data effectively, apply policies for movement and ensure that data sets remain portable. Read about the Komprise Global Metadatabase.

How to Balance Avoiding Lock-In With Other IT Infrastructure Priorities

It’s important to balance lock-in avoidance against other priorities such as beneficial vendor partnerships, operational simplicity and cost-performance considerations. Longstanding vendor relationships often provide stability, support and volume pricing discounts.

The more pragmatic approach is to partner deeply while insisting on open standards and negotiating agreements that preserve data mobility. Data lock-in is more than a technical concern. It threatens agility and competitiveness in the AI era. IT infrastructure should allow data to move freely, transparently, and without disruption.

