Life Sciences & Unstructured Data Management

This blog is part of an industry series on unstructured data management. Read the first post here.

The life sciences industry has undergone significant transformation in recent years. The sector is an $8-10 billion global industry with leaders including Eli Lilly, Pfizer, Johnson & Johnson, Merck and AbbVie.

Pharma and biotech companies have been at the center of global innovation, with rampant revenue growth and demand fueled by the Covid-19 pandemic. Investments in cloud, AI and digital technologies have intensified, delivering groundbreaking changes in how companies develop, test and deliver products to market.

  • Common file types in life sciences include clinical images, genome sequencing and other instrument data, and research documents. These data types don’t work well with traditional data analytics tools, so life sciences companies are increasingly moving research data to the cloud to leverage the affordable, scalable processing and analytics services offered by the large cloud service providers (CSPs).
  • Common data management challenges: Life sciences organizations can struggle with data silos that hamper visibility and collaboration across teams; difficulties searching and securely accessing data exported into cloud-based data lakes and other platforms; continual changes in regulations that affect data practices; and too much time (at least 50%, according to IDC) spent on data preparation and deployment. Last but not least, cost optimization is a prevailing concern across all IT infrastructure. Since data storage comprises a hefty portion of IT budgets, managing storage, backup and DR costs is a growing priority.

The AI Opportunity

Artificial intelligence played a significant role in the rapid development of Covid-19 vaccines, and AI has the potential to positively impact every area of life sciences operations. A few examples include:

  • R&D: Running AI algorithms against large amounts of data to identify compounds with the potential for development into new therapies could cut years off time to market. AI is also being used to optimize the design of new medical devices.
  • Clinical trials: Large pharma companies are using machine learning and AI to identify ideal patient populations for their clinical trials and monitor patient outcomes during the trial.
  • There are many other examples of the impact of AI on life sciences here.

Faster, safer development of life sciences products depends upon getting the right unstructured data to the right tools at the right time.

This could be instrument data from laboratory systems, genomics data, diagnostic and monitoring data from patient wearables, and patient demographics and outcomes data from clinical information systems. Organizations also need tools and processes to manage the data risks from AI technologies, which include data quality, data accuracy, data bias and protection of sensitive and regulated data sets.

How Unstructured Data Management Helps

An unstructured data management platform can index data and allow users to apply metadata tags such as project, disease type, instrument type and demographics to the files. That way, when IT moves files to the cloud, researchers can search on keywords and find what they need without manual digging. Automated workflow capabilities such as Komprise Smart Data Workflows (see diagram below) streamline the process of finding, copying, migrating and/or tiering data to cloud data lakes and AI tools.
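To make the tagging and search idea concrete, here is a minimal Python sketch of an in-memory tag index. It is an illustration only and does not represent the Komprise API; the class names (`FileRecord`, `TagIndex`), the tag keys and the file paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """One indexed file plus its user-applied metadata tags (hypothetical schema)."""
    path: str
    tags: dict[str, str] = field(default_factory=dict)


class TagIndex:
    """Toy in-memory index; illustrative only, not a real product API."""

    def __init__(self) -> None:
        self._records: dict[str, FileRecord] = {}

    def add(self, path: str) -> None:
        # Register a file in the index if it is not already present.
        self._records.setdefault(path, FileRecord(path))

    def tag(self, path: str, **tags: str) -> None:
        # Attach or overwrite tags such as project or instrument_type.
        self._records.setdefault(path, FileRecord(path)).tags.update(tags)

    def search(self, **criteria: str) -> list[str]:
        # Return paths whose tags match every requested key/value pair.
        return sorted(
            record.path
            for record in self._records.values()
            if all(record.tags.get(key) == value for key, value in criteria.items())
        )


index = TagIndex()
for path in ("/lab/seq/run_001.fastq", "/lab/imaging/scan_17.dcm", "/lab/seq/run_002.fastq"):
    index.add(path)

index.tag("/lab/seq/run_001.fastq", project="oncology-a", instrument_type="sequencer")
index.tag("/lab/seq/run_002.fastq", project="cardio-b", instrument_type="sequencer")
index.tag("/lab/imaging/scan_17.dcm", project="oncology-a", instrument_type="mri")

oncology_files = index.search(project="oncology-a")
sequencer_oncology = index.search(project="oncology-a", instrument_type="sequencer")
```

In this sketch, a researcher looking for everything tied to one project issues a single tag query instead of browsing directories. A production index would persist the tags and cover many storage systems.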


After a project has finished, a researcher can add tags to the resulting data sets to support new searches and projects. Unstructured data management solutions also help life sciences companies manage the vast expense of data storage by identifying data that can move to cold data storage. Tools that can effectively migrate data to the cloud and to the right tier of storage based on the data set’s age and value will be imperative, especially with regulations requiring the retention of certain data types for many years.
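As a rough sketch of age-based tiering, the Python snippet below assigns each file to a storage tier from its last-access time. The tier names, the one-year and three-year thresholds, and the sample files are assumptions made for illustration; real policies would also weigh data value and retention rules.

```python
from datetime import datetime, timedelta


def choose_tier(
    last_accessed: datetime,
    now: datetime,
    cold_after_days: int = 365,          # assumed threshold, not from the article
    archive_after_days: int = 365 * 3,   # assumed threshold, not from the article
) -> str:
    """Pick a storage tier based on how long ago the file was last accessed."""
    age = now - last_accessed
    if age >= timedelta(days=archive_after_days):
        return "archive"
    if age >= timedelta(days=cold_after_days):
        return "cold"
    return "hot"


# Hypothetical files with their last-access timestamps.
now = datetime(2024, 1, 1)
last_access = {
    "/lab/seq/run_101.fastq": datetime(2023, 11, 1),
    "/lab/seq/run_042.fastq": datetime(2022, 6, 1),
    "/lab/seq/run_003.fastq": datetime(2019, 1, 1),
}

plan = {path: choose_tier(ts, now) for path, ts in last_access.items()}
```

Keeping the thresholds as parameters lets the same rule be tuned per data type, for example a longer hot window for active trial data.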

Case in Point

Pfizer is saving 75% on storage using Komprise to analyze and continuously tier and migrate cold data to Amazon S3 as it ages. Pfizer storage managers and researchers are finding additional benefits from analytics-driven unstructured data management, including zero user disruption and a foundation for delivering self-service to line-of-business teams. The company is looking to use Komprise further by leveraging Deep Analytics and the Global File Index so that authorized research users can search for their own data and copy or move it to locations for analysis. “You can use Komprise to scan all your data, analyze costs and create business rules and then Komprise will act automatically against those rules,” said a Pfizer IT director.

In our next post in this industry series, we will take a closer look at the healthcare industry, another sector that faces massive unstructured data challenges and has gone through enormous transitions in recent years.
