Interview: PacBio’s Data Management Journey


Adam Knight has been at Pacific Biosciences, a company that develops genomic sequencers, since 2014. He started his career there in the HPC group and today is the director overseeing IT infrastructure, which consists of HPC compute, storage and networking.

PacBio was one of Komprise's early customers, and it's great to still be working with your company! Can you bring us up to speed on your use of Komprise and your overall data management objectives?

AK: We’re still on an aggressive growth trend with data and storage and facing the challenge of keeping as much data online as possible all the time. That’s where we leverage Komprise the most, as people aren’t very good at bucketing or organizing that data in a way that makes it digestible for archiving.

Komprise helps us by looking across our storage and surfacing metrics such as who owns the files and when the data was last accessed.

Genomics is a data-intensive industry. What are some challenges that you face with storage and unstructured data management in this sector?

AK: Having people properly organize their data is always a challenge. Who is generating it? How long does it need to be kept? How important is it? Is it going to the right place? If files can be identified ahead of time, it's an order of magnitude easier than trying to identify them later. Another challenge is whether we need to keep data forever. The unclear lifecycle of our data is a challenge.


PacBio is using Komprise to tier data from your highest-performance storage on VAST Data and NetApp to archival storage, which is currently Spectra Logic BlackPearl NAS. How do you create archive policies?

AK: It depends on the type of data. For example, instrument-generated data is a huge category of data here, and we are trying to set a more consistent policy for that data type. Then we've got other types of data for secondary analysis, and our manufacturing group is looking at data that may not stay relevant for as long. We are trying to classify all departmental data so that we can set more granular policies for archiving and deletion.

How about deletion policies? Does this relate to regulatory requirements for data retention?

AK: We're working closely with our internal customers to set policies for deletion and then using Komprise to target that data and delete it. Every sequencing run a scientist performs has a cost, from the sample prep and chemistry to the time on the instrument and the chips consumed, and that cost helps determine the value of the data and when we can delete it. Researchers may also need to keep data if it has been cited in a publication, or if a process is reevaluated for manufacturing a couple of years later. So it's not a regulatory requirement; it's about determining the value of the data from a future business perspective.

What are your main use cases of Komprise today?

AK: First, visualization. We now have a rapid way to visualize large amounts of data, which means that we can quickly determine how fast data volumes are growing and what the lifecycle of the data looks like. That information includes how much data we have, how many files, when they were last accessed and the types of files. That gives us good data to take to a group, whether it's a department director or a VP, and say, "Hey, let's talk about your data." The second is moving data from one cost tier to another, including deletion. It's valuable to have Komprise do that in an automated fashion rather than having myself or someone in my group do it manually.

What’s next?

AK: We want to be more granular with our unstructured data management so that we can save more. We’re excited to use Komprise Deep Analytics as we expand our use cases. I would like Komprise to operate on all of our data so that data lands on Tier 1 storage, a really high-performance tier, and then very quickly is tiered off to less expensive storage based on its importance and use. I also see the value in tagging data based on location, machine, size, research team and so on, which will make it easier to search for specific data sets and create plans around them. I’m sure we will take advantage of that in the future.

