Data Management Glossary
DICOM
What is DICOM?
DICOM (Digital Imaging and Communications in Medicine) is the international standard for transmitting, storing, retrieving, printing, processing, and displaying medical imaging information. Images generated by CT scanners, MRI machines, X-ray units, ultrasound devices, and, increasingly, digital pathology slide scanners are created, stored, and transmitted in DICOM format. DICOM is not just an image format; it is also a communication protocol that standardizes how imaging data flows between modalities, PACS, EHRs, AI tools, and storage infrastructure across healthcare organizations worldwide.
A single CT scan can produce over 1,000 images, and a busy radiology department may process hundreds of studies daily — all in DICOM format. Multiply that across a health system with dozens of facilities and the result is petabytes of DICOM data accumulating annually, driving some of the largest unstructured data management challenges in enterprise IT.
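As a rough illustration of how these numbers compound, the sketch below estimates raw yearly volume. The per-slice size follows from an uncompressed 512 x 512 CT image stored at 2 bytes per pixel; the study count, facility count, and the assumption that every study is a 1,000-slice CT are illustrative inputs, not figures from any particular health system.

```python
# Back-of-the-envelope estimate of uncompressed CT pixel data per year.
# Every input except the per-slice size is an illustrative assumption.

BYTES_PER_SLICE = 512 * 512 * 2  # 512 x 512 pixels, 2 bytes stored per pixel


def yearly_bytes(slices_per_study, studies_per_day, facilities, days_per_year=365):
    """Return the estimated raw pixel-data bytes generated in one year."""
    return (BYTES_PER_SLICE * slices_per_study * studies_per_day
            * facilities * days_per_year)


def to_terabytes(num_bytes):
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return num_bytes / 10**12


if __name__ == "__main__":
    total = yearly_bytes(slices_per_study=1000, studies_per_day=300, facilities=24)
    print(f"{to_terabytes(total):.0f} TB per year")
```

With these assumed inputs the estimate comes to roughly 1.4 PB per year before compression. Real mixes of modalities and compression ratios will move the figure considerably in either direction.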
Why DICOM Data Is a Storage and Cost Management Challenge
DICOM files are large, constantly growing, and subject to strict regulatory retention requirements. Healthcare organizations managing DICOM data face four compounding challenges:
- Volume and growth — health facilities and imaging centers generate billions of DICOM objects annually and that volume keeps growing; a single health system can accumulate 1+ petabytes of DICOM data with no end in sight
- Retention requirements — DICOM studies must be retained for 7 to 10 years for adults and until age 21 for pediatric patients in most US jurisdictions, with some studies retained permanently; this creates a continuously growing archive of data that cannot be deleted
- Storage cost crisis — with DRAM and SSD prices rising 130% by end of 2026 according to Gartner, storing petabytes of DICOM data on expensive primary storage has become financially unsustainable for most health systems; the cost of storing a DICOM study over its full retention lifecycle often exceeds the cost of acquiring it
- AI data preparation — infrastructure gaps in legacy PACS and non-scalable storage systems cannot support the processing and bandwidth demands of AI-enhanced imaging; preparing DICOM data for AI requires much more than storage — it requires the ability to find, classify, curate, and govern specific subsets of imaging data across petabyte-scale repositories
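The retention rule described above can be sketched as a small date calculation. This is a simplified illustration only: the 7-year adult period and the age-21 pediatric rule are taken from the ranges mentioned above, actual requirements vary by jurisdiction and study type, and the function names are hypothetical.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Only reached for Feb 29 when the target year is not a leap year
        return d.replace(year=d.year + years, day=28)


def earliest_deletion_date(study_date, birth_date, adult_years=7, pediatric_age=21):
    """Return the later of (study date + adult period) and the patient's 21st birthday.

    Simplified assumption: a study may be deleted only once both conditions are met.
    """
    return max(add_years(study_date, adult_years),
               add_years(birth_date, pediatric_age))


if __name__ == "__main__":
    # A study of a 4-year-old is held far longer than the adult period alone implies
    print(earliest_deletion_date(date(2020, 3, 1), date(2015, 6, 15)))
    # An adult study is governed by the adult retention period
    print(earliest_deletion_date(date(2020, 3, 1), date(1980, 1, 1)))
```

Because the pediatric rule can push a deletion date well over a decade past the study date, archives grow even when imaging volume is flat.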
Why DICOM Metadata Is Critical for AI and Data Management
DICOM files contain rich embedded metadata in their headers — patient ID, study date, modality, body part examined, institution name, referring physician, and hundreds of additional attributes. This metadata is what makes DICOM data uniquely valuable for AI training and analytics, but it is also what makes managing DICOM data at scale uniquely complex:
- DICOM metadata is embedded in individual file headers, not in a centralized, queryable database; finding specific studies across a petabyte-scale PACS without specialized tools requires querying each file individually
- DICOM metadata quality is inconsistent; modality-specific attributes, custom tags, and non-standard header values vary across equipment vendors and departments, making cross-system queries unreliable without enrichment
- DICOM standardizes modality communication, image formats, and metadata tagging across vendors at the point of acquisition, but the long-term management, enrichment, and governance of that metadata across storage tiers is not addressed by the DICOM standard itself
How Komprise Manages DICOM Data at Petabyte Scale
Komprise Intelligent Data Management addresses the full lifecycle of DICOM data — from visibility and cost optimization through AI data preparation and sensitive data governance — without disrupting PACS workflows or clinical access. Komprise works alongside existing PACS, VNA, and EHR systems as an unstructured data management layer that operates independently of any imaging vendor:
- Global visibility across DICOM repositories — the Komprise Global Metadatabase, with KAPPA data services, continuously indexes all DICOM files across PACS, NAS, VNA, and cloud object storage, capturing file age, size, modality, access patterns, study date, and custom header attributes across the full imaging estate from a single interface
- Transparent tiering of cold DICOM studies — Komprise can identify DICOM studies stored in enterprise NAS shares that have not been accessed in months or years and tier them transparently to lower-cost cloud or object storage using Komprise Transparent Move Technology; clinicians and PACS systems access studies from their original location with no change to workflow and zero rehydration penalty
- KAPPA for DICOM header extraction — KAPPA data services extract custom attributes from DICOM headers at petabyte scale using serverless processing and a few lines of Python; modality type, body region, study status, patient ID, and institution-specific custom tags are written back to the Global Metadatabase, making the DICOM estate searchable and AI-ready
- AI data preparation for imaging AI — Komprise Smart Data Workflows and Deep Analytics query the Global Metadatabase to find exactly the right DICOM studies for a given AI use case — for example, chest X-rays for male patients over 35 with a specific diagnosis code — reducing a dataset of millions of files to tens of thousands before any data moves; NewYork-Presbyterian used this approach to achieve 10x faster AI ingestion and 96% lower cloud costs for its digital pathology AI program
- PHI protection throughout — Komprise Sensitive Data Management detects and remediates PHI in DICOM files and associated metadata before data reaches AI pipelines, cloud storage, or shared research environments, maintaining HIPAA compliance throughout every data movement and enrichment operation
- Ransomware defense: cold DICOM studies tiered to immutable object storage (AWS S3 Object Lock, Azure Blob with versioning) are protected even if primary PACS storage is compromised, enabling clean recovery without paying ransom
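The cohort example in the list above (chest X-rays for male patients over 35) can be illustrated with a plain in-memory filter over already-extracted header attributes. This is not the Komprise API: the index structure, function names, and file paths are hypothetical. The attribute keywords (`Modality`, `BodyPartExamined`, `PatientSex`, `PatientAge`) and the age-string format are standard DICOM. Diagnosis codes are left out because they usually live outside the image header.

```python
def parse_age_years(age_string):
    """Convert a DICOM Age String (e.g. '045Y', '018M') to whole years, or None."""
    if not age_string or len(age_string) != 4 or not age_string[:3].isdigit():
        return None
    units_per_year = {"Y": 1, "M": 12, "W": 52, "D": 365}
    unit = age_string[3]
    if unit not in units_per_year:
        return None
    return int(age_string[:3]) // units_per_year[unit]


def select_cohort(index, modalities, body_part, sex, older_than):
    """Return index records matching modality, body part, sex, and minimum age."""
    selected = []
    for record in index:
        age = parse_age_years(record.get("PatientAge", ""))
        if (
            record.get("Modality") in modalities
            and record.get("BodyPartExamined", "").upper() == body_part
            and record.get("PatientSex") == sex
            and age is not None
            and age > older_than
        ):
            selected.append(record)
    return selected


if __name__ == "__main__":
    # Hypothetical index rows; a real index would hold millions of these
    index = [
        {"path": "/pacs/a.dcm", "Modality": "DX", "BodyPartExamined": "CHEST",
         "PatientSex": "M", "PatientAge": "052Y"},
        {"path": "/pacs/b.dcm", "Modality": "CT", "BodyPartExamined": "CHEST",
         "PatientSex": "M", "PatientAge": "060Y"},
        {"path": "/pacs/c.dcm", "Modality": "CR", "BodyPartExamined": "CHEST",
         "PatientSex": "F", "PatientAge": "047Y"},
    ]
    # CR and DX are the modality codes for computed and digital radiography
    for record in select_cohort(index, {"CR", "DX"}, "CHEST", "M", 35):
        print(record["path"])
```

The filter runs entirely against metadata, so only the matching files ever need to be copied to an AI pipeline.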
What is DICOM and why is it the standard for medical imaging?
DICOM is the international standard for medical imaging that defines how images are formatted, transmitted, stored, and retrieved across healthcare systems. Every imaging modality — CT, MRI, X-ray, ultrasound, digital pathology — generates DICOM files that contain both pixel data (the image itself) and a rich metadata header with hundreds of attributes describing the patient, study, modality, and acquisition parameters. DICOM is the universal language of medical imaging, enabling any compliant system to read, display, and process images regardless of which vendor’s equipment generated them. Its universality is what makes it the foundation for interoperability between PACS, VNA, EHR, and AI systems across healthcare organizations.
Why is managing DICOM data so expensive and how can healthcare IT reduce storage costs without disrupting clinical access?
DICOM data is expensive to manage because it combines three cost-compounding factors: large file sizes, mandatory long-term retention, and a tendency to accumulate cold studies on expensive primary PACS storage indefinitely. With flash and SSD prices rising, the cost of storing decades of DICOM studies on high-performance primary storage is unsustainable for most health systems. Komprise Flash Stretch addresses this directly by identifying cold DICOM studies and tiering them transparently to lower-cost cloud or object storage, reclaiming 70%+ of primary storage capacity without any disruption to clinical workflows or access patterns. Because Komprise moves files with full metadata fidelity and provides transparent access from the original location, radiologists and clinical applications see no change in how they retrieve studies.
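Identifying cold studies at the file level can be sketched with standard file system timestamps. This is a generic illustration, not how Komprise implements tiering: the function name and the 365-day threshold are assumptions, and access times are unreliable on volumes mounted with options such as `noatime`.

```python
import os
import time


def find_cold_files(root, min_idle_days, now=None):
    """Yield (path, size_bytes) for files not read or modified since the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            # Treat the newer of access time and modify time as "last touched"
            if max(info.st_atime, info.st_mtime) < cutoff:
                yield path, info.st_size


if __name__ == "__main__":
    # Scans the current directory as a stand-in for a NAS share
    cold = list(find_cold_files(".", min_idle_days=365))
    total_bytes = sum(size for _path, size in cold)
    print(f"{len(cold)} cold files, {total_bytes} bytes reclaimable")
```

Summing the sizes of cold files gives a first estimate of how much primary capacity tiering could reclaim.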
How does Komprise extract and use DICOM metadata for AI data preparation?
KAPPA data services use serverless processing to extract custom attributes from DICOM file headers at petabyte scale, writing them back to the Komprise Global Metadatabase as searchable metadata tags. This enables healthcare IT teams and researchers to run precise queries across billions of DICOM files (by study type, modality, patient cohort, or clinical characteristic) without moving the underlying data. Komprise Smart Data Workflows then use these enriched metadata queries to curate exactly the right subset of DICOM studies. For NewYork-Presbyterian’s digital pathology AI program, this approach made it possible to:
- filter out irrelevant, duplicate, and sensitive studies before they reach the AI training or inference pipeline
- deliver 10x faster AI ingestion and 96% cloud cost reduction
How does Komprise protect PHI in DICOM files during storage optimization and AI data preparation?
Komprise Sensitive Data Management with Smart Data Workflows scans DICOM files and their associated metadata for PHI, including patient identifiers, medical record numbers, and other identifying information embedded in image headers, before any data movement or AI ingestion occurs. Files containing PHI can be automatically excluded from AI workflows, confined to secure storage tiers, or flagged for de-identification by policy. Every scan, classification, and data movement operation is logged with a complete audit trail supporting HIPAA breach prevention reporting and governance reviews. This ensures that PHI never reaches AI tools, cloud analytics platforms, or shared research environments without explicit authorization and documented governance.
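The idea of flagging PHI in header attributes can be illustrated with a short sketch over an already-extracted header. It is not Komprise's detection logic: the keyword set is a small, non-exhaustive selection of standard DICOM attribute keywords, and the function names are hypothetical. Real de-identification typically follows the confidentiality profiles in DICOM PS3.15 Annex E and must also handle text burned into the pixel data.

```python
# Non-exhaustive set of DICOM attribute keywords that commonly carry PHI
PHI_KEYWORDS = {
    "PatientName", "PatientID", "PatientBirthDate", "PatientAddress",
    "OtherPatientIDs", "AccessionNumber", "ReferringPhysicianName",
    "InstitutionName",
}


def find_phi(header):
    """Return the sorted PHI-candidate keywords that have non-empty values."""
    return sorted(
        key for key, value in header.items()
        if key in PHI_KEYWORDS and str(value).strip()
    )


def redact(header, placeholder=""):
    """Return a copy of the header with PHI-candidate values replaced."""
    return {
        key: (placeholder if key in PHI_KEYWORDS else value)
        for key, value in header.items()
    }


if __name__ == "__main__":
    # Hypothetical extracted header with made-up values
    header = {
        "PatientName": "DOE^JANE",
        "PatientID": "MRN-000123",
        "Modality": "MR",
        "BodyPartExamined": "BRAIN",
    }
    print(find_phi(header))
    print(redact(header))
```

A policy engine could use the output of a check like `find_phi` to exclude a file from an AI workflow or route it to de-identification first.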
How DICOM Differs from PACS and VNA
DICOM is a standard, not a system. This is the most important distinction to understand in healthcare imaging:
- DICOM is the file format and communication protocol — it defines how medical images are structured, labeled, transmitted, and stored; every imaging device, PACS, VNA, EHR, and AI tool speaks DICOM to exchange imaging data
- PACS is the clinical system that uses DICOM — it receives DICOM files from modalities, stores them, and serves them to viewing workstations and clinical applications
- VNA is the storage architecture that uses DICOM — it archives DICOM files in vendor-neutral format for long-term retention, independent of any specific PACS
- DICOM defines the what (file format and communication rules); PACS and VNA define the how and where (workflow management and storage architecture)
Without DICOM, no two imaging systems from different vendors could communicate. DICOM standardizes modality communication, image formats, and metadata tagging across vendors, making it the universal language that enables a CT scanner from Siemens to send images to a PACS from GE that stores them in a VNA from Philips.
How DICOM Works with NAS
NAS (Network Attached Storage) is the most common underlying storage infrastructure for DICOM archives in healthcare. The relationship between DICOM and NAS is straightforward but has significant implications for cost and performance:
- DICOM files are stored as standard files on NAS file systems using NFS or SMB protocols; a NAS volume presented to a PACS or VNA appears as a standard file share where DICOM studies are written and read
- Storage infrastructure has been decoupled from PACS at the hardware and file system level: Direct Attached Storage (DAS), NAS, and SAN are now supplied by general-purpose storage vendors rather than by PACS vendors; NAS vendors such as NetApp, Dell, and IBM provide the storage layer, while PACS and VNA vendors provide the application layer above it
- The separation of the DICOM application layer (PACS/VNA) from the NAS storage layer is what makes intelligent data management possible: Komprise can analyze, tier, and govern DICOM files at the NAS level without any PACS-specific integration
- High-performance all-flash NAS is used for active DICOM studies; as studies age, intelligent tiering moves them to lower-cost NAS tiers or cloud object storage while maintaining PACS access via standard file paths
Primary DICOM-Generating Modality Vendors
The major vendors whose equipment generates DICOM files that healthcare organizations must store and manage:
- Siemens Healthineers — CT, MRI, PET, ultrasound, X-ray, digital pathology
- GE HealthCare — CT, MRI, PET/CT, ultrasound, X-ray, mammography
- Philips — CT, MRI, ultrasound, X-ray, cardiac imaging
- Canon Medical — CT, MRI, ultrasound, X-ray
- Hologic — mammography, tomosynthesis, bone density
- Leica Biosystems / Aperio — digital pathology whole-slide imaging
- Fujifilm — X-ray, digital pathology, endoscopy
