In this Data on the Move episode, Darren and Kumar discuss KAPPA data services, the Global Metadatabase, and a range of use cases.
_______________________
Data on the Move: KAPPA Data Services
In this episode of Data on the Move, Komprise Co-founder and CEO Kumar K. Goswami and Darren Cunningham discuss KAPPA Data Services.
Visit the KAPPA Data Services Library
- KAPPA DICOM Metadata Extraction (YouTube)
- KAPPA EXIF Metadata Extraction (YouTube)
- KAPPA FASTQ Metadata Extraction (YouTube)
KAPPA data services allow rapid development of custom data functions by automating infrastructure and execution across large unstructured datasets.
ETL and other traditional approaches to data processing rely on pre-built connectors and plug-ins that are time-consuming to create, inflexible, and costly to update. With KAPPA, you can create custom data services to meet any requirement in just hours, not months. This serverless compute offering for unstructured data lets IT and data experts focus on the per-file function without having to provision or manage the infrastructure that runs it across large datasets.
Learn more about KAPPA data services: komprise.com.ai/kappa
KAPPA FAQs
What makes KAPPA data services different from ETL tools for unstructured data preparation?
Traditional ETL tools require building a separate connector for every data source or machine type, and those connectors are brittle and expensive to maintain. Kumar Goswami, co-founder and CEO of Komprise, puts it plainly in this interview: building individual connectors to each imaging vendor like GE or Philips takes significant effort and breaks when those systems change. KAPPA data services take a fundamentally different approach. IT and data experts write a few lines of Python for the per-file operation they need, and Komprise handles all infrastructure provisioning and execution across the full dataset. A custom metadata extraction job that would take months with ETL is complete in hours. The focus stays on the data logic, not the plumbing.
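As a rough illustration of that split between data logic and plumbing, the sketch below keeps the per-file function separate from the code that fans it out over a directory tree. Everything in it is hypothetical: basic_file_tags, run_over_tree, and the local thread pool are stand-ins for illustration and are not Komprise's API. In KAPPA, the execution side is the part Komprise handles.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def basic_file_tags(path: Path) -> dict:
    """Per-file logic: the only part a data expert would write (hypothetical example)."""
    return {
        "extension": path.suffix.lower().lstrip("."),
        "size_bytes": path.stat().st_size,
    }


def run_over_tree(root: Path, per_file, workers: int = 4) -> dict:
    """Local stand-in for the execution layer: apply per_file to every file under root."""
    files = [p for p in root.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(per_file, files)  # results come back in input order
        return {str(p): tags for p, tags in zip(files, results)}
```

Swapping in a different per-file function changes what gets extracted without touching the code that walks the dataset, which is the property the answer above describes.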
Can KAPPA data services handle industry-specific metadata requirements without vendor connectors?
Yes, and that is precisely the problem KAPPA was designed to solve. As Kumar describes it, consider a healthcare organization working with DICOM files from multiple imaging systems. Rather than building machine-specific connectors, a data expert writes roughly 15 lines of Python using PyDICOM to extract the relevant field, such as body part examined, and tag it directly into the Komprise Global Metadatabase. Komprise then runs that function automatically across all DICOM files in the environment. The same pattern applies in media and entertainment (EXIF metadata from digital assets), pharmaceutical, life sciences, and genomics (ELN project context, FASTQ sequencing metadata), and financial services (ERP project codes and budget IDs). Every organization customizes what gets extracted without waiting on a vendor roadmap.
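A per-file function of the kind Kumar describes might look roughly like the sketch below. It assumes the third-party pydicom package is installed and reads only the file header. The function names and the choice to normalize values to upper case are assumptions for illustration. How the resulting tags are written into the Komprise Global Metadatabase is not shown in this discussion, so the sketch stops at returning a dictionary.

```python
from typing import Any, Dict

# Standard DICOM keywords to pull from each header.
DICOM_KEYWORDS = ("BodyPartExamined", "Modality")


def tags_from_dataset(ds: Any) -> Dict[str, str]:
    """Build a tag dict from a parsed DICOM header, skipping absent or empty fields."""
    tags = {}
    for keyword in DICOM_KEYWORDS:
        value = getattr(ds, keyword, None)
        if value is None or str(value).strip() == "":
            continue
        tags[keyword] = str(value).strip().upper()
    return tags


def tags_from_file(path: str) -> Dict[str, str]:
    """Read one DICOM file's header (no pixel data) and return its tags."""
    import pydicom  # third-party; imported here so tags_from_dataset works without it

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return tags_from_dataset(ds)
```

Skipping the pixel data keeps the read small, since only header fields are needed for tagging.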
How does KAPPA fit into a broader AI data workflow?
KAPPA data services operate as part of Komprise Intelligent Data Management, not as a standalone tool. The full workflow often starts with PII and PHI detection through Smart Data Workflows, which finds sensitive data across unstructured file and object silos and lets security teams set policies before data moves anywhere. KAPPA then enriches that data in place with custom metadata. The enriched metadata is stored in the Komprise Global Metadatabase, where data scientists and researchers can use Deep Analytics to query across all their unstructured data by file system metadata and custom tags, without opening or scanning file content, to identify and curate precisely the right datasets for AI. From there, Komprise Intelligent AI Ingest delivers those curated datasets directly to AI pipelines with governance and repeatability, so AI teams get the right data without having to go find it themselves. Agentic AI workflows can also invoke KAPPA functions directly on demand, for example tagging all files for a customer journey with a reservation number, or preparing prior art for a grant-writing agent using ERP project codes. Each step connects without requiring a separate pipeline build.
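To make the idea of curating by metadata concrete, here is a minimal sketch of selecting files from an index by file system metadata and custom tags, without reading any file content. The FileRecord structure and select_records function are hypothetical and do not represent the Deep Analytics query interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class FileRecord:
    """Hypothetical index entry: file system metadata plus custom tags."""
    path: str
    size_bytes: int
    tags: Dict[str, str] = field(default_factory=dict)


def select_records(records: Iterable[FileRecord], min_size: int = 0,
                   **required_tags: str) -> List[FileRecord]:
    """Return records of at least min_size bytes whose tags match every required value."""
    return [
        r for r in records
        if r.size_bytes >= min_size
        and all(r.tags.get(k) == v for k, v in required_tags.items())
    ]
```

For example, select_records(index, Modality="CT", BodyPartExamined="CHEST") would return only the records carrying both tags.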
Does KAPPA require data to be moved before metadata can be extracted?
No. KAPPA functions execute against data where it lives across hybrid storage environments, including on-premises NAS, cloud object stores, and SaaS platforms. Data does not need to be copied, staged, or migrated before enrichment runs. This eliminates the latency and cost of moving petabytes of data before an AI pipeline can use it. Komprise also handles automated pre- and post-processing around each workflow, including starting a cloud AI service before processing begins and shutting it down when the job completes, so there are no idle compute costs between runs.
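The pre- and post-processing pattern can be sketched as a wrapper that starts a service before the job and always shuts it down afterwards. This is only an illustration of the pattern under assumed names: service_running, run_job, and the start and stop callables are hypothetical, and Komprise's own mechanism is not described here.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, List


@contextmanager
def service_running(start: Callable[[], None], stop: Callable[[], None]):
    """Start a service before the job and always stop it afterwards, even on failure."""
    start()
    try:
        yield
    finally:
        stop()


def run_job(files: Iterable[str], per_file: Callable[[str], dict],
            start: Callable[[], None], stop: Callable[[], None]) -> List[dict]:
    """Run per_file over files with the service up only for the duration of the job."""
    with service_running(start, stop):
        return [per_file(f) for f in files]
```

Because the shutdown sits in a finally clause, a failed job still stops the service, which is what keeps compute from idling between runs.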
Why does enterprise unstructured data need custom metadata enrichment rather than generic tagging?
As Kumar frames it, the unstructured data that has been accumulating in enterprise file stores for decades is now a goldmine for AI, but it is incredibly hard to tap into because almost every organization has unique requirements for what that data means (see Dark Data). Generic tagging does not capture the context that makes data actually useful. A DICOM file needs its body part and modality extracted in a format that matches the research team’s classification scheme, not a generic label. A media asset needs production metadata that was lost when the file was stored. A research file needs the ELN project number that connects it to a grant. KAPPA data services exist because metadata for unstructured data is contextual and enterprise-specific, and no pre-built connector library can cover every organization’s requirements at the speed AI programs demand.
_______________________
