Ometis Blog

Choosing the right Apache Iceberg platform: Build, semi-managed or fully managed? [Part 2]

Written by David Tomlins | Dec 10, 2025 10:19:20 AM

Once you understand what Apache Iceberg is and the operational challenges involved (covered in the first blog of this series), the natural next step is choosing how to run Iceberg in your organisation. Although many vendors now support Iceberg, the choices fall into three clear categories:

  1. Fully self-managed / open-source build
  2. Semi-managed platforms
  3. Fully managed Iceberg optimisation platforms (e.g., Qlik Open Lakehouse)

Each approach comes with different costs, trade-offs and operational responsibilities. Below is a pragmatic comparison using real platform examples.

This is Part 2 of a two-part series. Part 1 explains what Apache Iceberg is and the operational challenges teams face in production.

Option 1: Fully self-managed / open-source build

This approach means your organisation builds and operates the Iceberg platform itself, using open-source tools and cloud components. Typical stacks include the following (a minimal configuration sketch follows the list):

  • AWS Glue as the Iceberg catalogue
  • Athena, EMR, Trino or Spark clusters
  • Airflow or Step Functions for orchestration
  • Custom scripts for compaction, delete cleanup and optimisation
  • Open-source Iceberg libraries for integration
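To make this concrete, below is a minimal sketch of how such a stack is typically wired together: a PySpark session with the Iceberg extensions enabled and a catalogue backed by AWS Glue and S3. The catalogue name ("glue"), bucket path and table name are illustrative assumptions, not recommendations.

    # Minimal sketch: Spark session with an Iceberg catalogue backed by AWS Glue and S3.
    # The catalogue name ("glue"), bucket and table below are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("iceberg-self-managed")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.glue.warehouse", "s3://example-lakehouse/warehouse")
        .getOrCreate()
    )

    # Every other engine in the stack (Athena, EMR, Trino) must be pointed at the
    # same Glue catalogue and S3 warehouse; keeping those configurations aligned
    # is part of the operational burden described below.
    spark.sql("CREATE TABLE IF NOT EXISTS glue.analytics.orders (id BIGINT, ts TIMESTAMP) USING iceberg")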

This is the most flexible approach, but also the highest engineering and operational burden.

What you must build and manage yourself

  • Compaction jobs, thresholds, scheduling and conflict avoidance (see the housekeeping sketch after this list)
  • Delete file cleanup ("delete debt")
  • Snapshot retention and metadata pruning
  • Partition evolution processes
  • Schema evolution governance
  • Health monitoring, alerts and table metrics
  • Cost management for compute and object storage
  • Integration with security, IAM and governance tooling
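The sketch below shows the shape of the housekeeping job a self-managed team typically has to write and schedule to cover the first few items on this list, using Iceberg's built-in Spark procedures. It assumes the hypothetical "glue" catalogue and glue.analytics.orders table from the earlier sketch; retention values are examples only, and a production job also needs conflict handling, retries and alerting around it.

    # Sketch of a nightly housekeeping job using Iceberg's Spark procedures.
    # Assumes the hypothetical "glue" catalogue and table from the earlier sketch;
    # thresholds and retention values are examples only.
    from datetime import datetime, timedelta

    table = "glue.analytics.orders"

    # 1. Compact small data files into larger ones (bin-pack rewrite).
    spark.sql(f"CALL glue.system.rewrite_data_files(table => '{table}')")

    # 2. Rewrite position delete files so "delete debt" does not accumulate
    #    (procedure available in recent Iceberg releases).
    spark.sql(f"CALL glue.system.rewrite_position_delete_files(table => '{table}')")

    # 3. Expire snapshots older than five days and prune the metadata they reference.
    cutoff = (datetime.utcnow() - timedelta(days=5)).strftime("%Y-%m-%d %H:%M:%S")
    spark.sql(f"""
        CALL glue.system.expire_snapshots(
            table => '{table}',
            older_than => TIMESTAMP '{cutoff}',
            retain_last => 10)
    """)

    # 4. Remove orphaned files left behind by failed or aborted writes.
    spark.sql(f"CALL glue.system.remove_orphan_files(table => '{table}')")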

Who chooses this model?

  • Organisations with a strong internal platform engineering team
  • Companies that prioritise full control and open-source transparency
  • Teams willing to invest in building their own optimisation layer
  • Businesses with on-prem or hybrid environments that already run Spark/Trino

Cost profile: Development-led

  • Low software licensing cost
  • High engineering cost (ongoing salary, headcount, training)
  • High operational cost for maintenance, optimisation and monitoring
  • Risk of under- or over-compaction, leading to hidden performance costs (see the health-check sketch after this list)
  • Slow time to value due to build-out requirements
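One way teams guard against that compaction risk is to poll Iceberg's metadata tables and alert when maintenance falls behind. A minimal health-check sketch, again against the hypothetical glue.analytics.orders table and with example thresholds:

    # Sketch: basic table-health metrics from Iceberg metadata tables.
    # Thresholds are illustrative; real jobs feed these into monitoring and alerting.
    files = spark.table("glue.analytics.orders.files")
    total_files = files.count()
    small_files = files.filter("file_size_in_bytes < 67108864").count()  # under 64 MB

    snapshot_count = spark.table("glue.analytics.orders.snapshots").count()

    print(f"data files: {total_files}, small files: {small_files}, snapshots: {snapshot_count}")
    if total_files and small_files / total_files > 0.5:
        print("WARNING: compaction is falling behind (too many small files)")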

In practice, this approach makes sense when total control is a higher priority than time to value or operational simplicity.

Option 2: Semi-managed platform

The next category includes platforms that support Iceberg and can simplify some aspects of the architecture, but which do not fully automate Iceberg optimisation. These are partially managed solutions: they reduce operational work, but do not eliminate it.

Two major examples:

  • Snowflake Iceberg Tables
  • Databricks (Delta + Iceberg interoperability)

These platforms provide:

  • Built-in compute engines
  • Managed catalogues
  • Reliable SQL interfaces
  • Platform-level optimisation for their execution engine
  • Security and governance layers
  • Integration with their own ecosystems

However, they still require Iceberg-specific engineering effort for:

  • Compaction (Snowflake can perform rewrites but not automatically in all cases)
  • Managing delete files and retention
  • Snapshot and metadata pruning
  • Partition design and evolution
  • Cost optimisation for compute-heavy Iceberg operations
  • Orchestration of ingestion pipelines (see the scheduling sketch after this list)
  • Cross-engine integration outside the vendor ecosystem
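In practice this means the housekeeping and ingestion schedule still lives in your own orchestrator, even when the platform handles query execution for you. A hedged sketch of how that is often wired up in Airflow, where the DAG name, schedule and spark-submit command are assumptions rather than a prescribed setup:

    # Sketch: scheduling the Iceberg housekeeping job with Airflow.
    # DAG id, schedule and the spark-submit command are illustrative assumptions.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="iceberg_nightly_maintenance",
        start_date=datetime(2025, 1, 1),
        schedule="0 2 * * *",   # run at 02:00 every night
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="compact_and_expire",
            bash_command="spark-submit /opt/jobs/iceberg_maintenance.py",
        )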

Where they excel

  • Organisations already heavily invested in Snowflake compute
  • Teams using Databricks for Spark workloads and lakehouse patterns
  • Use cases where Iceberg is one part of a broader managed ecosystem
  • Organisations that want simplicity and are willing to accept vendor dependence

Cost profile: Mixed model

  • Medium to high software licensing cost (compute + storage + execution)
  • Lower engineering cost than self-managed
  • Some operational overhead remains (especially for compaction, retention and ingestion tuning)
  • Costs can scale unpredictably with compute-heavy operations

This approach suits organisations that want reduced operational burden without fully outsourcing Iceberg optimisation.

Option 3: Fully managed Iceberg optimisation platform

The final category is a platform specifically designed to manage Iceberg for you. Rather than providing a general-purpose data warehouse or Spark environment, these platforms sit directly on Iceberg and provide intelligent, automated optimisation.

Qlik Open Lakehouse is the clearest example of this model.

What Qlik Open Lakehouse provides

  • Continuous adaptive compaction
  • Intelligent delete file management
  • Metadata pruning and snapshot retention
  • Query-pattern-driven optimisation (not fixed schedules)
  • Real-time ingestion that avoids small file proliferation
  • Built-in governance, lineage and cataloguing
  • Integration with Snowflake, Databricks, Spark, Trino and BI tools
  • Vendor-neutral architecture anchored in open Iceberg tables

In practical terms, Qlik provides the optimisation logic that most teams struggle to build internally.

This removes the need to design:

  • Compaction thresholds
  • File sizing strategies (see the tuning sketch after this list)
  • Delete cleanup logic
  • Schema evolution workflows
  • Table health monitoring
  • Data quality pipelines
  • Indexing and optimisation rules
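To illustrate the manual tuning this replaces, the sketch below sets the kind of table-level knobs (target file size, snapshot retention) that otherwise have to be chosen and revisited by hand as data volumes and query patterns change. The property values are examples only, assuming the hypothetical glue.analytics.orders table used earlier.

    # Sketch: per-table tuning that otherwise has to be set and revisited by hand.
    # write.target-file-size-bytes ~ target data file size (about 512 MB here);
    # history.expire.* ~ how aggressively snapshot expiry prunes old state.
    spark.sql("""
        ALTER TABLE glue.analytics.orders SET TBLPROPERTIES (
            'write.target-file-size-bytes' = '536870912',
            'history.expire.max-snapshot-age-ms' = '432000000',
            'history.expire.min-snapshots-to-keep' = '20'
        )
    """)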

Who chooses this model?

  • Organisations that want Iceberg without the operational overhead
  • Teams without large platform engineering headcount
  • Businesses prioritising faster time to value
  • CIOs and data leaders aiming to reduce cost and complexity
  • Companies wanting engines like Snowflake, Trino, Spark, Qlik and AI tools to all share the same Iceberg tables

Cost profile: Software-led

  • Predictable software cost
  • Much lower engineering cost
  • Minimal operational overhead (platform handles optimisation)
  • Significant savings in compute due to smarter compaction and retention
  • Faster delivery of data products and analytics

This is the best fit when the goal is to use Iceberg for value creation rather than sinking time into building platform foundations.

How to choose between the three models

A clear way to frame the decision is to plot it along two dimensions:

  1. Engineering capacity
  2. Desire (or need) for control

Self-managed / open-source

  • High control, high engineering cost, slower time to value
  • Lower software cost, high long-term support cost

Semi-managed

  • Medium control, mixed cost model
  • Vendor convenience with some engineering still required

Fully managed

  • Lower engineering cost, fast time to value
  • Predictable software cost, strong optimisation outcomes
  • Vendor-neutral and open-data-centric

Another helpful breakdown is:

  • Self-managed = cheapest software, most expensive people
  • Semi-managed = balanced software + engineering
  • Fully managed = more software, far fewer people, lowest operational risk (expect significantly lower compute cost too)

Why open lakehouse is a strategic priority for data leaders

Across industries, data leaders are facing rising cloud costs, increasingly complex data estates, and ambitious expectations around AI. Many organisations have already modernised once, typically by moving to cloud data warehouses or early lakehouse models, but are now encountering structural limitations around cost, flexibility and governance.

Against this backdrop, the open lakehouse, built on open standards such as Apache Iceberg, is emerging as a strategic priority for CIOs, CTOs and CDOs who need a simpler, more flexible, and more economically scalable foundation.

The limits of today's data estates

Most enterprise data platforms have evolved gradually over time, creating complexity that is now difficult to unwind. Common challenges include:

  • Multiple warehouses, marts and BI extracts containing similar datasets
  • Tool- and vendor-specific pipelines that duplicate logic and storage
  • Proprietary storage formats that lock data into a single ecosystem
  • Governance, lineage and quality that differ across tools and teams
  • Increasing cloud compute consumption without equivalent business value
  • Slow onboarding of new domains and use cases due to pipeline sprawl

These issues were manageable when analytics was simpler. But with AI, self-service analytics and near-real-time insight now core to business strategy, the cracks are showing.

The modern data estate must be more open, more governed, and more adaptable than the architectures of the past decade.

The open lakehouse

At an executive level, the summary is straightforward:

An open lakehouse brings warehouse-style governance to low-cost object storage using open standards, especially Apache Iceberg, so that all analytics and AI workloads can operate from the same governed tables without duplicating data across tools.

Put simply: store data once, use it everywhere.

Why Iceberg and the open lakehouse matter to data leaders

Apache Iceberg provides the table format that makes the open lakehouse model work across multiple tools and clouds. For business and technology leaders, the advantages consolidate into several clear strategic benefits.

Cost efficiency

Iceberg tables are stored in inexpensive cloud object storage, while still delivering warehouse-like structure. This eliminates redundant storage layers, reduces data duplication, and avoids locking data inside high-cost proprietary engines. When paired with intelligent optimisation (see earlier section), compute usage also drops significantly.

Vendor flexibility and lower long-term risk

Iceberg is open and engine-neutral. The same tables can be accessed by Snowflake, Databricks, Spark, Trino, Qlik and emerging AI tooling, without conversion or duplication. This reduces dependence on any single vendor and creates freedom to adopt new capabilities as they mature.
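As a simple illustration of that engine neutrality, the same physical table written by Spark can be read by Trino without any copy or conversion. A hedged sketch, assuming a Trino cluster with an Iceberg catalogue (named iceberg here) pointing at the same metastore and warehouse; host, catalogue and schema names are assumptions.

    # Sketch: reading the same Iceberg table from two engines without copying data.
    # Host, catalogue and schema names are illustrative assumptions.

    # Engine 1: Spark (the writer in the earlier sketches)
    spark.table("glue.analytics.orders").show(5)

    # Engine 2: Trino, via its Python client, hitting the same files and metadata
    from trino.dbapi import connect

    conn = connect(host="trino.example.internal", port=8080, user="analyst",
                   catalog="iceberg", schema="analytics")
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM orders")
    print(cur.fetchall())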

Improved governance and auditability

Iceberg's snapshot model provides full historical traceability. Combined with a central catalogue, this enables consistent access control, lineage, quality checks and regulatory compliance across the estate.
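For teams that want to see what this traceability looks like in practice, the sketch below lists a table's snapshot history and runs a time-travel query in Spark SQL; the table name and timestamp are illustrative assumptions.

    # Sketch: auditing and time travel on the hypothetical glue.analytics.orders table.
    # Every commit is a snapshot, so past states can be listed and queried directly.
    spark.sql("""
        SELECT snapshot_id, committed_at, operation
        FROM glue.analytics.orders.snapshots
        ORDER BY committed_at
    """).show(truncate=False)

    # Query the table exactly as it looked at a point in time (timestamp is an example).
    spark.sql("""
        SELECT count(*) FROM glue.analytics.orders
        TIMESTAMP AS OF '2025-12-01 00:00:00'
    """).show()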

AI-ready by design

AI initiatives rely on consistent, high-quality, well-governed data. Iceberg creates a unified data layer that supports analytics, feature engineering, vector search and model monitoring from the same tables, ensuring AI workloads operate on trusted data.

Faster delivery and better use of talent

With one shared data layer, teams no longer rebuild pipelines for each tool. They create reusable, governed data products that can serve BI, AI and operational analytics simultaneously. This improves delivery speed and frees talent to focus on high-value work rather than pipeline duplication.

Architectural stability for the next decade

By decoupling data storage from compute, Iceberg provides a stable foundation that can evolve as tools, vendors and AI paradigms change, without the need to replatform.

Together, these benefits demonstrate why the open lakehouse is increasingly seen as a long-term strategic asset, not a tactical upgrade.

A pragmatic roadmap for data leaders

Most successful open lakehouse journeys follow a similar pattern:

  1. Assess the current data estate: duplication, costs, governance gaps, vendor dependencies
  2. Define a future-state architecture centred on open storage and open table formats
  3. Select the right Iceberg delivery model (self-managed, semi-managed, or fully managed)
  4. Prioritise high-value domains or use cases that can deliver measurable ROI quickly
  5. Automate early, including compaction, retention, lineage and data quality checks
  6. Scale to more workloads, teams and tools once the approach is proven

A foundation for the future

The move to an open lakehouse is not about swapping one vendor for another. It is a structural response to long-standing challenges: fragmented data estates, duplicated storage, rising cloud costs, governance inconsistencies and barriers to AI enablement.

The open lakehouse, powered by Apache Iceberg, consolidates data into a single, governed foundation that scales economically, supports AI, reduces vendor dependence and accelerates delivery.

Iceberg makes this possible. Your chosen delivery model determines how quickly and easily your organisation can unlock the value.

For many data leaders, the open lakehouse is becoming the architectural backbone for the next decade of analytics and AI.

 

Need additional help?

If you’re exploring Iceberg or modern lakehouse architectures, our experts can help you evaluate the right approach for your business. If you’d like to talk it through, schedule a call with our team.