Enrico Signoretti is an internationally renowned expert author, blogger, and speaker on data storage. He has tracked the evolution of the storage industry for years as a GigaOm Research Analyst, an independent analyst, and as a contributor to The Register.
In the recent GigaOm Radar for Unstructured Data Management, you segmented the results into business-focused vendors versus IT-focused vendors. Can you explain the difference?
The difference is that infrastructure-focused solutions look at aspects that are more closely related to storage management than to the data itself, so we are talking about tiering and about the TCO of the infrastructure. These solutions don’t look inside the data but show customers how to save money and understand how much data they have and what people are doing with it. Business-focused solutions, on the other hand, look into the data or at least go beyond the metadata. Think about, for example, the Chief Data Officer (CDO), a role that is now very common in Europe. The CDO doesn’t really need to know where the data is physically stored, but this individual does want to know whether the organization is storing personal information or files that it shouldn’t store anymore because of policy or regulations. Business-focused unstructured data management solutions are much more focused on delivering business insights and retaining compliance and governance information.
Are you seeing in practice an evolution where organizations are looking at data management as a different discipline than storage management?
It really depends on the size of the data under management and the type of company. Very large companies face regulations, have more needs than in the past, and are creating huge amounts of data. They are recognizing that they really need to understand all this data. In many cases, smaller organizations just need to save money. There is, however, a general maturation in the market.
Even a few years ago, large organizations weren’t that interested in managing unstructured data. Now with the rate of data growth, large organizations are saying that they need to manage data more strategically.
They may learn that they have customer data stored in a remote location in text files where anyone can see it. Before ransomware is detected, bad actors are looking at the data and copying it before they encrypt it, and all this information is trickling away. As a result, ransomware has been another strong driver for organizations to invest in unstructured data management.
Following on that, are you seeing greater collaboration between data professionals and storage professionals or is there still a wall between these parties?
When it comes to business-focused infrastructure data management solutions, storage people must talk to other personas in the organization. CDOs in some companies have large budgets and they are craving solutions for unstructured data management. The environment is very complex in Europe with all our regulations, such as GDPR and privacy rules, and this reality makes collaboration with the CDO and data teams critical. In general, there are skilled engineers who have deep knowledge of storage systems, but with all the automation that many vendors have introduced, this knowledge is somewhat less important. Storage professionals today need to learn more about the data they’re storing and about how the applications interact with the data. They need to understand how users are working with this data rather than how individual bits are stored in a repository. To gain this insight, it is becoming imperative for storage managers to work closely with departments and users.
Do you see growing adoption of cloud data services such as AI/ML on data sets in the cloud?
There are a lot of enterprise IT teams who are starting to think about using AI and ML more, but it’s not as easy as some of the cloud vendors show us. To make it relevant for your business, you have to select the right amount of data and move it to the cloud. A client of mine has thousands of lawyers distributed across the world and they produce documents every day. They want to use AI and ML to scan these documents for any kind of mistakes or formatting errors which can create issues later, but how are they going to select only the relevant documents? If they want to run analysis on all files that include a contract of a certain type, they just can’t do it without automation using the proper data management tool.
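To make the selection step concrete, here is a minimal Python sketch of the idea, not any vendor's actual product or API. It assumes a data management tool has already indexed the files and tagged each one with a document type. The `FileRecord` fields, the `contract/nda` tag, the paths, and the `select_for_analysis` helper are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """One entry in a hypothetical metadata index built by a data management tool."""
    path: str
    doc_type: str      # assumed classification tag, e.g. "contract/nda"
    modified: date
    size_bytes: int


def select_for_analysis(index, doc_type, modified_since):
    """Return only the records of the given type changed on or after a date."""
    return [
        record
        for record in index
        if record.doc_type == doc_type and record.modified >= modified_since
    ]


# Illustrative index contents; a real index would hold millions of entries.
index = [
    FileRecord("/emea/nda_001.docx", "contract/nda", date(2024, 3, 1), 120_000),
    FileRecord("/emea/lease_014.docx", "contract/lease", date(2024, 3, 2), 300_000),
    FileRecord("/apac/nda_002.docx", "contract/nda", date(2021, 6, 15), 90_000),
    FileRecord("/amer/nda_003.docx", "contract/nda", date(2024, 1, 10), 150_000),
]

# Pick only recent contracts of one type instead of moving everything to the cloud.
selected = select_for_analysis(index, "contract/nda", date(2023, 1, 1))
total_bytes = sum(record.size_bytes for record in selected)

for record in selected:
    print(record.path)
print(f"{len(selected)} files, {total_bytes} bytes to move")
```

The point of the sketch is that the filter runs against metadata and tags, so only the matching subset has to be copied to the cloud service that performs the analysis.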