Biosciences Leader Manages Explosive Sequencing Data Growth and Saves 60%

“Projecting a three-year 28X data increase, we could knock out walls and build more data centers, but for what? We needed a platform on which to grow our business.”
Jay Smestad, Senior Director of Information Technology, PacBio

Pacific Biosciences (PacBio) produces genomic sequencers that are accelerating the pace of genomic discovery—while creating a deluge of data through manufacturing, testing and R&D. The company’s technology is used in human biomedical research, plant and animal sciences and microbiology.

“As we iterate our technology, we are looking at a 28X increase in data compounded over three years,” says Jay Smestad, Senior Director of Information Technology at PacBio. “We could knock out walls and build more data centers, but for what? We needed a platform on which to grow our business.”

The Solution

With over 10 petabytes of data and growing, PacBio turned to Komprise Intelligent Data Management to gain insight into data growth and usage across its heterogeneous storage environment. Komprise provides a real-time analysis of PacBio’s environment, showing how data is growing, which data is active and which data can move to archival storage for cost savings.

Using Komprise’s analysis, PacBio is tiering at least 50% of its data to secondary or tertiary storage and has plans to increase that percentage soon.

“Most people aren’t very good at organizing data in a way that makes it digestible for archiving,” says Adam Knight, Director of Infrastructure at PacBio. “Where Komprise is most useful today is in delivering metrics, such as the types of files we have and when they were last accessed. This helps us make data management decisions with our researchers who generate most of our data.”

PacBio is also working closely with its internal customers to set deletion policies, then using Komprise to target and delete the data those policies cover. As its use of Komprise has matured, the company is looking at expanding its tiering and retention policies. For instance, manufacturing data may have shorter retention periods than research data, since researchers often want to retain data for as long as possible in case it has value for scientific publication or for repeating a process. “We’re developing more classifications so that we can set granular policies for archiving and deleting across our environment,” Knight explains.
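To make the idea of classification-based policies concrete, here is a minimal sketch of how per-classification archive and delete rules based on last-access time might be expressed. It is not Komprise's API or PacBio's actual configuration: the `Policy` class, the `decide` function, the classification names, and every threshold below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical retention rule for one data classification."""
    archive_after: timedelta            # idle time before tiering to cheaper storage
    delete_after: Optional[timedelta]   # idle time before deletion; None = never delete


# Illustrative values only: manufacturing data gets a shorter retention
# period, while research data is archived but never deleted automatically.
POLICIES = {
    "manufacturing": Policy(timedelta(days=90), timedelta(days=365)),
    "research": Policy(timedelta(days=180), None),
}


def decide(classification: str, last_accessed: datetime, now: datetime) -> str:
    """Return 'keep', 'archive', or 'delete' for a file of the given class."""
    policy = POLICIES[classification]
    idle = now - last_accessed
    # Deletion is checked first because it is the stricter outcome.
    if policy.delete_after is not None and idle >= policy.delete_after:
        return "delete"
    if idle >= policy.archive_after:
        return "archive"
    return "keep"
```

Under these assumed thresholds, a manufacturing file untouched for 400 days would be marked for deletion, while a research file of the same age would only be archived.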

The Results

  • Using Komprise, PacBio has a powerful solution for data visualization: “We now have a rapid way to visualize large amounts of data, which means that we can quickly determine the volume of data that is growing and the lifecycle of the data,” Knight says. “That gives us good data to go to a group whether it’s a department director or VP and say, hey, let’s talk about your data.”
  • PacBio is saving time with an automated way to move data. PacBio IT employees no longer have to move data manually or write scripts to do so. Komprise handles all data movement automatically, according to policies that users create.
  • PacBio has reduced costs by 60% by moving data to a lower storage tier. “In the future, I would like all of our data to land on a really high-performance tier and to very quickly be tiered off to less expensive tiers of storage based on its importance and use,” Knight says.

Looking Ahead

Smestad is already preparing for the coming data storm from the company’s next generation of technology, starting with object storage and the cloud. Komprise integrates with the comprehensive metadata capabilities of object storage to make tagging and finding data faster and easier, providing new insights into data.

“How do you search data when you have six, ten, or a hundred petabytes?” asks Smestad. “Komprise’s Deep Analytics and its Global File Index across file and object storage is exciting because it gives us a tool to do that. Our partnership with Komprise will enable us to better support the business even as we grow.”

