This blog was adapted from the original article on ITProToday.
The vast majority of enterprise data is unstructured: think audio and video files, medical images, genomics research data, and the digital exhaust of electric cars and IoT products. As storage costs comprise more than 30% of IT budgets in most organizations, the cloud has become a cheaper, simpler alternative for unstructured file and object storage.
A survey of U.S. and U.K. IT managers and directors found that more than half (56%) say that moving more data to the cloud is their top priority with unstructured data.
Yet these cloud data migrations are fraught with complexity and risk. Moving large volumes of data to the cloud can result in errors and data loss. Migrations can also take an inordinately long time to complete, sometimes months, and may not deliver the predicted cost savings. For these reasons, enterprise IT teams may delay or forgo cloud file migrations altogether.
The leading cloud data migration issues and decisions include:
- Deciding which unstructured data should move to cloud storage;
- Understanding the different storage tiers: when it makes sense to use lower-cost object storage tiers such as Amazon S3 Glacier Instant Retrieval or Azure Blob, when a higher-performing file storage option like Azure Files or Amazon FSx for NetApp ONTAP is ideal, and how to move data between storage classes once it is in the cloud;
- Security: configuring cloud storage presents new challenges and complications, especially in hybrid environments;
- If multi-cloud architecture is in place, deciding which cloud to use for which data and workloads;
- Understanding the potential uses of cloud-native services for machine learning and AI projects and considerations for successfully moving data into those services.
Opportunity Abounds: But Which Cloud and Which File and Object Storage to Migrate To?
As demand for cloud file storage has accelerated, the options for customers are changing continually. While this is great news, it’s also confusing. The major cloud vendors have dozens of classes of file and object storage from which to choose, each with tradeoffs on cost and performance. Plus, there’s always the risk of getting burned on cloud egress fees if users wind up needing to bring that data back out of the cloud more frequently than expected.
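One way to reason about the egress risk is a simple break-even calculation. The sketch below is illustrative only: the function names are made up for this post, and any prices you pass in are your own assumptions rather than published rates. It assumes reads from the hot tier are free and that the cold tier charges per GB retrieved.

```python
def monthly_cost(stored_gb, storage_price_per_gb, retrieved_gb, retrieval_price_per_gb):
    """Monthly cost = storage charge + retrieval (or egress) charge."""
    return stored_gb * storage_price_per_gb + retrieved_gb * retrieval_price_per_gb


def break_even_retrieval_fraction(hot_storage_price, cold_storage_price, cold_retrieval_price):
    """Fraction of the stored data that can be read back each month before
    the cold tier costs as much as the hot tier.

    Assumption: hot-tier reads are free and all prices are per GB per month.
    """
    if cold_retrieval_price <= 0:
        raise ValueError("cold_retrieval_price must be positive")
    return (hot_storage_price - cold_storage_price) / cold_retrieval_price


# Hypothetical prices in dollars per GB, for illustration only.
fraction = break_even_retrieval_fraction(
    hot_storage_price=0.020,
    cold_storage_price=0.004,
    cold_retrieval_price=0.08,
)
print(f"Cold tier stops paying off above {fraction:.0%} of data retrieved per month")
```

With these placeholder numbers, the cheaper tier only wins if users read back less than roughly a fifth of the data each month, which is why access patterns matter as much as the per-GB storage price.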
The Analysis-First File Data Migration Strategy
Typically, cloud file migrations are executed as lift-and-shift programs: IT organizations migrate entire file shares and directories to the cloud. A one-size-fits-all strategy like this may not deliver the best cost advantage of the cloud, and lift-and-shift migrations are often “set and forget.” These moves don’t account for long-term plans for unstructured data, such as making data in the cloud available for cloud-based machine learning and AI.
With so much emphasis on data as a strategic lever for competitive advantage and operational efficiencies, it makes sense to institute an analysis-first approach to migrations. Start by getting visibility into data usage and growth — across on-premises, edge and clouds — to understand not only your overall data profile but the requirements of different data sets.
Strive to answer these unstructured data migration questions:
- What data do I have and where is it stored?
- What data sets are accessed most frequently (a.k.a. hot data) and which are rarely accessed (a.k.a. cold data)?
- What types of files do we have and which consume the most storage (e.g., image files, video or audio files, sensor data, text data)?
- What is the cost of storing these different file types?
- Which types of files should be stored at a higher security level, such as those containing PII or IP data or belonging to mission-critical projects?
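To make the first few questions concrete, here is a minimal sketch of a file-share profiler. It is not how any particular data management product works. It assumes a locally mounted share, treats a file as "cold" if it has not been accessed in a configurable number of days (365 here, an arbitrary threshold), and relies on last-access timestamps, which can be unreliable on file systems mounted with access-time updates disabled. The function name `profile_tree` is hypothetical.

```python
import os
import time
from collections import defaultdict


def profile_tree(root, cold_after_days=365, now=None):
    """Walk `root` and total bytes per file extension and per hot/cold bucket.

    Assumption: a file is "cold" when its last-access time is older than
    `cold_after_days`. Access times may be stale on noatime mounts.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # seconds per day
    bytes_by_ext = defaultdict(int)
    bytes_by_temp = {"hot": 0, "cold": 0}

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            ext = os.path.splitext(name)[1].lower() or "(none)"
            bytes_by_ext[ext] += st.st_size
            bucket = "cold" if st.st_atime < cutoff else "hot"
            bytes_by_temp[bucket] += st.st_size

    return dict(bytes_by_ext), bytes_by_temp
```

Calling `profile_tree("/path/to/share")` returns two dictionaries: bytes per extension (which file types consume the most storage) and bytes per hot/cold bucket. Multiplying those totals by your own per-GB storage prices gives a first answer to the cost question.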
Benefits of an Analytics-First Cloud File Migration Approach
- Cost savings: Based on analysis of unstructured data before you migrate or back up, you may decide to first tier 60% of the data to archive storage in the cloud (like Amazon S3 Glacier), and then migrate the remaining 40% to cloud file storage. This can cut your cloud storage bill significantly.
- Faster migrations with lower risks: By first analyzing data and then migrating or moving it by workload, data type or other key value, you can also be more agile. You’ll break a massive and disruptive task into smaller bites, which is faster and less risky. An added benefit of granular data sets is the ability to pivot easily to new cloud resources as they become available.
- Comprehensive data lifecycle management: With regular analysis running on your data assets, you can continually optimize data over its lifecycle — from expensive hot storage to lower-priced warm storage to cold (rarely if ever accessed) storage and then eventually, deletion.
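The cost-savings point above is easy to check with back-of-the-envelope arithmetic. The sketch below compares a lift-and-shift migration (everything lands on cloud file storage) with the 60/40 split described in the first bullet. The prices are hypothetical placeholders chosen only to show the shape of the calculation, and the function names are invented for this example.

```python
def monthly_storage_cost(total_tb, archive_fraction, archive_price_per_tb, file_price_per_tb):
    """Monthly bill when `archive_fraction` of the data sits on archive
    storage and the rest on cloud file storage."""
    if not 0.0 <= archive_fraction <= 1.0:
        raise ValueError("archive_fraction must be between 0 and 1")
    archive_tb = total_tb * archive_fraction
    file_tb = total_tb - archive_tb
    return archive_tb * archive_price_per_tb + file_tb * file_price_per_tb


def savings_fraction(baseline_cost, tiered_cost):
    """Share of the baseline bill saved by tiering."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return 1.0 - tiered_cost / baseline_cost


# Hypothetical prices in dollars per TB per month, for illustration only.
ARCHIVE_PRICE = 5.0
FILE_PRICE = 100.0

baseline = monthly_storage_cost(500, 0.0, ARCHIVE_PRICE, FILE_PRICE)  # lift and shift
tiered = monthly_storage_cost(500, 0.6, ARCHIVE_PRICE, FILE_PRICE)    # 60% archived first
print(f"lift-and-shift: ${baseline:,.0f}/month, tiered: ${tiered:,.0f}/month")
print(f"savings: {savings_fraction(baseline, tiered):.0%}")
```

The real savings depend entirely on the actual price gap between the tiers you choose and on retrieval charges, which this sketch deliberately leaves out.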
Migrating Data for the Endgame: Native Cloud Analytics
The major cloud providers now have dozens of services that go far beyond hosting and storage into IoT, DevOps and data lakes. Cloud providers are investing billions into quantum computing, AI and ML to give customers powerful analytics capabilities they’d otherwise need to build and support internally at a high price.
IT organizations will need to carefully consider the data management tools and platforms they are using to tier and migrate data into the cloud, so that they can easily access and move data elsewhere as needed to leverage cloud-native analytics tools. Storage vendors may implement proprietary data formats that prevent direct access to data tiered or migrated to the cloud outside of their own appliance. This approach locks customers into a static storage strategy and prevents access by cutting-edge analytics and AI/ML services that access data via open APIs.
Accelerating file data migrations to the cloud can bring a host of benefits, from cost savings and automation to using cloud tools to uncover hidden insights from massive volumes of unstructured data. Maximizing ROI from these migrations requires a nuanced, analytics-based approach using open unstructured data management tools and processes. This approach right-places data into the appropriate storage class based on age, usage, compliance needs and/or business priority, and allows IT teams to easily move the data again and again as new enterprise storage innovations come to light.
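As a closing illustration, right-placing data can be expressed as a small policy function. This is a sketch under stated assumptions, not a recommended policy: the `DataSet` fields, the thresholds (90 days, one year, about seven years) and the storage class labels are all hypothetical, and a real policy would map the labels onto a provider's actual storage classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    days_since_access: int
    compliance_hold: bool = False    # must be retained, e.g. for regulatory reasons
    business_critical: bool = False  # business priority overrides age


def place(ds, warm_after_days=90, cold_after_days=365, delete_after_days=2555):
    """Return a storage class label for a data set based on age, usage,
    compliance needs and business priority. Labels are hypothetical."""
    if ds.business_critical:
        return "hot"
    if ds.days_since_access >= delete_after_days:
        # Never propose deletion for data under a compliance hold.
        return "cold" if ds.compliance_hold else "delete-candidate"
    if ds.days_since_access >= cold_after_days:
        return "cold"
    if ds.days_since_access >= warm_after_days:
        return "warm"
    return "hot"


for ds in (
    DataSet("active-project", 10),
    DataSet("last-quarter-renders", 120),
    DataSet("old-sensor-logs", 400),
    DataSet("expired-scans", 3000),
    DataSet("audit-records", 3000, compliance_hold=True),
):
    print(ds.name, "->", place(ds))
```

Because the rules live in one function with explicit thresholds, re-running the policy after a regular analysis pass is what lets data keep moving as its age and usage change.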