Unstructured data is shaping our world in more ways than one. Whether it is user-generated content or file data from genomics, PACS imaging, seismic surveys, electronic design, or streaming and surveillance video systems, it has the potential to deliver valuable insights for business analysis and decision-making. If harnessed, this data can improve quality of life by helping us understand how things work. But its sheer volume is overwhelming IT organizations everywhere. It’s expensive, hard to manage and sometimes impossible to use. Krishna Subramanian, President and COO of Komprise, talks about these issues on a recent EM360 podcast. We captured the highlights below. You can listen to the entire conversation here.
Let’s get clear on the unstructured data problem. What’s going on?
KS: There is far more unstructured data today than 10 years ago. Every time we go to the doctor, use our phones, or drive a car, we generate more of it. Around 90% of all data collected and stored is unstructured, and that is a problem. Most of the systems in IT today were not designed to handle unstructured data. In a database, you might be updating one column or row, and you can afford to keep that on expensive storage. But unstructured data is huge, spread across billions of files. If you apply the same management techniques to it, along with all the backups, it doesn’t scale on that expensive storage. That’s why people have been talking about moving data to the cloud as a cost-efficient solution. But how you migrate it will determine whether the cloud ends up cheaper or more expensive. (Read the white paper: How to Accelerate NAS and Cloud Migrations.)
What are some top considerations when embarking on a cloud data migration project?
KS: If you have 1PB of data, which is maybe 5 billion files, moving all those files to cloud storage takes time, possibly a few weeks. You don’t want users to lose access to their data for that period. So how do you make a migration non-intrusive, so that even once data moves to the cloud, users don’t see a difference? If 70% of your data hasn’t been touched in a year, why not move that first? It’s also important not to disrupt the user experience: keep access to the data transparent by providing the same link in the same location that users have always had.
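The "move cold data first" idea above can be sketched as a small script. This is a minimal illustration, not a Komprise tool: it assumes last-access time (`st_atime`) is a usable proxy for "not touched in a year", which depends on the filesystem's mount options (e.g. `noatime` would break this).

```python
import os
import time

# Assumed threshold from the interview: data untouched for a year is "cold".
COLD_AGE_SECONDS = 365 * 24 * 3600

def partition_by_access_age(root, now=None):
    """Walk a directory tree and split files into cold (candidates to
    migrate first) and hot (recently accessed), based on last-access time."""
    now = now or time.time()
    cold, hot = [], []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            (cold if now - atime > COLD_AGE_SECONDS else hot).append(path)
    return cold, hot
```

A real migration tool would also have to preserve permissions and leave a transparent link behind at the original location, but the scan above captures the planning step: identify the 70% that can move without anyone noticing.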
Can you describe the two main approaches to migration: storage-centric and data-centric?
KS: Storage-centric migration is a proprietary storage technique: it moves data to the cloud, but to get to your data you always have to go through the vendor’s storage software. Data-centric migration moves your data to the right place, such as the cloud, in a standard format, so you can always access it without any third-party software. The natural instinct is to ask your storage vendor if they can do the cloud tiering. Cloud storage gateways, developed a few years ago, are another option: they take your data to the cloud and sync it for you.
On the surface it seems like these solutions help with cost and user experience, but both are storage-centric. They put storage software or hardware in your environment, move pieces of files to the cloud at the block level while keeping other pieces locally, and it is all proprietary. If you access that data too often, you pay for it in the form of egress fees. And you have to pay licensing costs to the gateway vendor forever just to get to your own data in the cloud. Also, the cloud is not a cheap storage locker the way storage-centric solutions treat it. It’s an on-demand service, with a lot of compute capability, AI and big data tools. (Read the white paper: Cloud Tiering – Storage-Based vs. Gateways vs Files)
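The cost mechanics described above — storage plus per-recall egress plus ongoing licensing — can be made concrete with simple arithmetic. The rates and the workload below are placeholder assumptions for illustration only, not actual cloud-provider or vendor prices.

```python
def monthly_cost_usd(stored_gb, recalled_gb, storage_rate,
                     egress_rate=0.0, license_rate=0.0):
    """Monthly cost = storage + egress on recalled data + per-GB licensing.
    All rates are hypothetical dollars per GB per month."""
    return (stored_gb * storage_rate
            + recalled_gb * egress_rate
            + stored_gb * license_rate)

# Assumed workload: 100 TB tiered to the cloud, 5 TB recalled each month.
stored, recalled = 100_000, 5_000

# Gateway-style access pays egress on every recall plus a license fee;
# native cloud access to the same archive tier pays only for storage.
gateway = monthly_cost_usd(stored, recalled, storage_rate=0.004,
                           egress_rate=0.09, license_rate=0.002)
native = monthly_cost_usd(stored, recalled, storage_rate=0.004)

print(f"gateway-style: ${gateway:,.2f}/mo vs native access: ${native:,.2f}/mo")
```

Under these made-up rates the gateway path costs more than double, and the gap grows with every recall — which is the point being made: the access model, not just the storage tier, drives the bill.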
And Komprise brings another option. Can you explain?
KS: The way to use the cloud is to put data into the lowest-cost storage possible, but when I need it, I shouldn’t have to bring it back into my own data center to use it. I should be able to use the tools and technologies in the cloud to work with the data. Storage-centric solutions lock you in and don’t let you use the data in the cloud without going through them. That creates a more restrictive environment, which can result in 75% higher egress costs and 300% higher data management costs. The data-centric approach, which is what we do, takes the entire file to the cloud as an object. You can still see it from the original storage, but you can use it directly from the cloud service without going through Komprise or the original storage. This can reduce your cloud costs and your total cost of ownership.
Yet, isn’t the age-old storage problem here to stay? After all, data is going to continue to grow and take up more budget one way or another.
KS: Data is taking up more budget, as it should, because organizations in all sectors are becoming data-driven operations. Yet you don’t want to waste money if you can avoid it. No matter what you’re doing with your data, every company should try to avoid vendor lock-in. The options for managing data are continually changing: Amazon began with three classes of storage and now has 16. With a proprietary format, at some point you will have to bring all the data back and rehydrate it before moving to another solution, and that can be very expensive. We can help you break lock-in and move file data to the cloud in its native format. You can switch vendors, including away from Komprise, as you see fit. It’s all about future-proofing your data.