Data Management Glossary


Apache Iceberg

What Is Apache Iceberg

Apache Iceberg is an open-source table format that brings database-style reliability to data stored in cloud object storage such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. It adds ACID transactions, schema evolution, time travel, and hidden partitioning to what would otherwise be an unmanaged collection of Parquet, Avro, or ORC files. Query engines including Apache Spark, Trino, Snowflake, Databricks, and Amazon Redshift can all read and write Iceberg tables, so a single copy of data can be queried consistently across tools instead of being locked into one vendor’s proprietary format. Iceberg originated at Netflix and is now maintained as an Apache Software Foundation project. It has become a de facto open standard for building a data lakehouse, a storage architecture that combines the low cost and flexibility of a data lake with the transactional guarantees of a data warehouse.
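Hidden partitioning means the table derives partition values from an ordinary column through a transform, such as `day(ts)`, and records those values in its metadata. Users filter on `ts` directly and the engine works out which files to skip. The sketch below models only Iceberg's day transform (days since 1970-01-01) and the resulting file pruning in plain Python. `DataFile`, `write_file`, and `plan_scan` are hypothetical names for illustration, not the Iceberg library API, and each toy file is assumed to hold rows from a single day.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Day transform: whole days from 1970-01-01 to the UTC date of ts."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


@dataclass(frozen=True)
class DataFile:
    path: str
    day_partition: int  # partition value kept in table metadata, not as a user column


def write_file(path: str, ts: datetime) -> DataFile:
    # Hypothetical writer: the table, not the user, derives the partition value from ts.
    return DataFile(path, day_transform(ts))


def plan_scan(files: list[DataFile], start: datetime, end: datetime) -> list[str]:
    """Prune files using only a filter on ts: start <= ts < end."""
    lo, hi = day_transform(start), day_transform(end)
    # Inclusive on hi is conservative: it may keep one extra file but never drops a needed one.
    return [f.path for f in files if lo <= f.day_partition <= hi]


utc = timezone.utc
files = [
    write_file("a.parquet", datetime(2024, 3, 1, 9, tzinfo=utc)),
    write_file("b.parquet", datetime(2024, 3, 2, 9, tzinfo=utc)),
    write_file("c.parquet", datetime(2024, 3, 5, 9, tzinfo=utc)),
]
print(plan_scan(files, datetime(2024, 3, 1, 12, tzinfo=utc), datetime(2024, 3, 2, 18, tzinfo=utc)))
# ['a.parquet', 'b.parquet']
```

The query never mentions a partition column. Because the transform lives in table metadata, the partition scheme can change later without users having to rewrite their queries.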


Why Apache Iceberg Matters for Unstructured Data

Iceberg was designed for structured and semi-structured records, but most enterprise data is neither. Files, images, video, genomic sequences, and sensor logs make up the majority of enterprise storage, and Gartner estimates that this unstructured data is growing at 55% to 65% per year, roughly three times the growth rate of structured data. Historically, this data has lived entirely outside the lakehouse, siloed in file shares and object buckets that data engines cannot query directly. Over 80% of enterprise data is unstructured, yet less than 1% of it has ever reached AI or a data lakehouse, largely because there has been no practical way to represent file and object metadata as Iceberg tables without first undertaking a costly bulk migration.
Source: Gartner, as cited by Komprise
Source: Komprise Delivers Query-Ready Enterprise Unstructured Data to AI and Data Lakehouses Without Moving a Single File, Komprise

Apache Iceberg and the Case for Unstructured Data Management

Making unstructured data queryable in Iceberg format takes more than pointing a connector at a file share. Enterprise file and object data spans NAS systems, cloud tiers, and multiple storage vendors, and most of it has no consistent metadata, tags, or classification. A data platform team building Iceberg tables directly from raw file systems must solve three problems before the data is trustworthy enough for AI or analytics workloads: indexing billions of files at scale across different silos; enrichment, which adds the business context and PII flags that make the data usable; and currency, which keeps the tables in sync as files change. This is why unstructured data management, not just table format adoption, determines whether an Iceberg-based lakehouse strategy actually extends to the majority of enterprise data.
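To make the three problems concrete, here is a minimal sketch for one local file tree, using only the Python standard library. It is a hypothetical illustration, not how Komprise or any other product implements these steps. `FileRecord`, `index_tree`, `enrich`, and `refresh` are assumed names. The filename-based PII rule is a deliberately naive stand-in for real classification. Detecting changes by comparing size and modification time is a heuristic.

```python
import os
import re
from dataclasses import dataclass, field

# Hypothetical enrichment rule: flag file names that hint at personal data.
PII_HINT = re.compile(r"(ssn|passport|patient|payroll)", re.IGNORECASE)


@dataclass
class FileRecord:
    path: str
    size: int
    mtime: float
    extension: str
    tags: set = field(default_factory=set)


def enrich(rec: FileRecord) -> None:
    """Enrichment: attach business context (here, only a filename-based PII flag)."""
    if PII_HINT.search(os.path.basename(rec.path)):
        rec.tags.add("possible_pii")


def index_tree(root: str) -> dict:
    """Indexing: walk one silo and record metadata only; file contents are never read."""
    index = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rec = FileRecord(full, st.st_size, st.st_mtime, os.path.splitext(name)[1].lower())
            enrich(rec)
            index[full] = rec
    return index


def refresh(old: dict, root: str) -> tuple:
    """Currency: re-index, then report paths added, changed, or removed since the last pass."""
    new = index_tree(root)
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(
        p for p in new.keys() & old.keys()
        if (new[p].size, new[p].mtime) != (old[p].size, old[p].mtime)
    )
    return new, added, changed, removed
```

The resulting records are rows a table writer could append to an Iceberg table. The `added`, `changed`, and `removed` lists are what an incremental update would commit, so the table never needs to be rebuilt from scratch.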

How Komprise Brings Unstructured Data Into Apache Iceberg

Komprise Transparent File Tables expose the Global Metadatabase, the continuously updated index of file and object metadata across NAS and cloud storage, directly as Apache Iceberg tables that Databricks, Snowflake, and other Iceberg-compatible engines can query with no migration project and no copying of the underlying files. For the full technical architecture, including how the Global Metadatabase, KAPPA data services, and Smart Data Workflows work together to enrich and govern that metadata before it reaches an Iceberg table, see the Transparent File Tables glossary page.

Apache Iceberg Frequently Asked Questions

How is Apache Iceberg different from a traditional data lake?

A traditional data lake stores files with no transactional guarantees, so concurrent reads and writes can produce inconsistent results, and schema changes require rewriting data. Apache Iceberg adds a metadata layer on top of the same low-cost object storage, and that layer provides ACID transactions, safe schema evolution, and the ability to query data as of a specific point in time. This is why Iceberg is described as bringing database reliability to the data lake.
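Iceberg makes a commit atomic in two steps. A writer first writes new, immutable metadata describing the next snapshot of the table. It then asks the catalog to swap the "current metadata" pointer, and the swap succeeds only if no other writer moved the pointer first. Readers therefore see either the old version or the new one, never a mix, and older snapshots remain available for time travel. The in-memory toy below illustrates that idea. `ToyCatalog`, `Snapshot`, and `commit_append` are hypothetical names. Real Iceberg tracks snapshots through metadata files and manifest lists rather than a Python dict.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    timestamp_ms: int
    data_files: frozenset  # immutable set of files that make up this table version


class ToyCatalog:
    """Holds the only mutable state: a pointer to the current snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.snapshots: dict = {}
        self.current_id: Optional[int] = None

    def compare_and_swap(self, expected_id: Optional[int], snap: Snapshot) -> bool:
        with self._lock:
            if self.current_id != expected_id:
                return False  # another writer committed first; caller must retry
            self.snapshots[snap.snapshot_id] = snap
            self.current_id = snap.snapshot_id
            return True

    def current_files(self) -> frozenset:
        if self.current_id is None:
            return frozenset()
        return self.snapshots[self.current_id].data_files

    def files_as_of(self, timestamp_ms: int) -> frozenset:
        """Time travel: walk back through parents to the last snapshot at or before the time."""
        snap_id = self.current_id
        while snap_id is not None:
            snap = self.snapshots[snap_id]
            if snap.timestamp_ms <= timestamp_ms:
                return snap.data_files
            snap_id = snap.parent_id
        return frozenset()


def commit_append(catalog: ToyCatalog, base_id: Optional[int], new_files: list,
                  snapshot_id: int, timestamp_ms: int) -> bool:
    """Build a new snapshot on top of base_id, then try to make it current atomically."""
    base = catalog.snapshots[base_id].data_files if base_id is not None else frozenset()
    snap = Snapshot(snapshot_id, base_id, timestamp_ms, base | frozenset(new_files))
    return catalog.compare_and_swap(base_id, snap)


cat = ToyCatalog()
commit_append(cat, None, ["a.parquet"], 1, 1000)
commit_append(cat, 1, ["b.parquet"], 2, 2000)
# A writer still working from snapshot 1 loses the race:
print(commit_append(cat, 1, ["c.parquet"], 3, 2500))  # False
print(sorted(cat.files_as_of(1500)))  # ['a.parquet']
```

In this toy a failed swap simply returns `False`. A real Iceberg writer in that situation reloads the current table state, checks for conflicts, and retries its commit on the new base. This optimistic approach avoids holding locks on the data files themselves.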

Does adopting Apache Iceberg require moving unstructured data?

No, not when the unstructured data is exposed as Iceberg tables through metadata rather than migrated file by file. Komprise Transparent File Tables generate Iceberg tables directly from the Global Metadatabase, so file and object data becomes queryable in place.

Is Apache Iceberg widely adopted for AI and analytics workloads?

Yes. In a January 2026 survey of 252 senior data leaders running Iceberg in production, 58% reported running business-critical analytics on Iceberg, and 95% reported using or planning to use it for AI and machine learning workloads.
Source: The State of Apache Iceberg in the Enterprise (2026), Ryft and TrendCandy

How does Apache Iceberg relate to a data lakehouse?

A data lakehouse combines the cost and flexibility of object storage with warehouse-style querying and governance, and an open table format such as Iceberg is the technical foundation that makes that possible. In a Databricks-sponsored survey, nearly three-quarters of technology leaders reported having already adopted a lakehouse architecture, and the remainder said they plan to do so within three years.
Source: Databricks lakehouse adoption survey, as reported by Red Oak Strategic

Do Databricks and Snowflake both support Apache Iceberg?

Yes. Databricks supports reading and writing Iceberg tables through Unity Catalog and the Iceberg REST Catalog API, and Snowflake supports Iceberg tables using either Snowflake as the catalog or an external catalog. Because both platforms support the same open Iceberg standard, tables exposed through Komprise Transparent File Tables can be queried from either environment without a separate integration for each.
Source: What is Apache Iceberg in Databricks?, Databricks
Source: Apache Iceberg Tables, Snowflake

What is the difference between Apache Iceberg, Delta Lake, and Apache Hudi?

All three are open-source, Apache-licensed table formats that add transactional guarantees to data stored in object storage. Iceberg has the broadest support across cloud providers and query engines, Delta Lake remains strongest inside Databricks and Spark-centric environments, and Hudi is built for high-frequency updates and streaming ingestion.

Related Terms: Transparent File Tables, Data Lakehouse, Global Metadatabase, Transparent Move Technology, Metadata Intelligence, Unstructured Data Management
