Ometis Blog

Choosing the right Apache Iceberg platform: Build, semi-managed or fully managed? [Part 2]

Written by David Tomlins | Dec 10, 2025 10:19:20 AM

Once you understand what Apache Iceberg is and the operational challenges involved (covered in the first blog of this series), the natural next step is choosing how to run Iceberg in your organisation. Although many vendors now support Iceberg, the choices fall into three clear categories:

  1. Fully self-managed / open-source build
  2. Semi-managed platforms
  3. Fully managed Iceberg optimisation platforms (e.g., Qlik Open Lakehouse)

Each approach comes with different costs, trade-offs and operational responsibilities. Below is a pragmatic comparison using real platform examples.

This is Part 2 of a two-part series. Part 1 explains what Apache Iceberg is and the operational challenges teams face in production.

Option 1: Fully self-managed / open-source build

This approach means your organisation builds and operates the Iceberg platform itself, using open-source tools and cloud components. Typical stacks include the following (a minimal configuration sketch follows the list):

  • AWS Glue as the Iceberg catalogue
  • Athena, EMR, Trino or Spark clusters
  • Airflow or Step Functions for orchestration
  • Custom scripts for compaction, delete cleanup and optimisation
  • Open-source Iceberg libraries for integration
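To make this concrete, below is a minimal sketch of how such a stack is typically wired together: a PySpark session with the Iceberg extensions enabled and a catalogue backed by AWS Glue and S3. The catalogue name ("glue"), bucket path and table name are illustrative assumptions, not recommendations.

    # Minimal sketch: Spark session with an Iceberg catalogue backed by AWS Glue and S3.
    # The catalogue name ("glue"), bucket and table below are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("iceberg-self-managed")
        .config("spark.sql.extensions",
                "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.glue.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
        .config("spark.sql.catalog.glue.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
        .config("spark.sql.catalog.glue.warehouse", "s3://example-lakehouse/warehouse")
        .getOrCreate()
    )

    # Every other engine in the stack (Athena, EMR, Trino) must be pointed at the
    # same Glue catalogue and S3 warehouse; keeping those configurations aligned
    # is part of the operational burden described below.
    spark.sql("CREATE TABLE IF NOT EXISTS glue.analytics.orders (id BIGINT, ts TIMESTAMP) USING iceberg")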

This is the most flexible approach, but also the highest engineering and operational burden.

What you must build and manage yourself

  • Compaction jobs, thresholds, scheduling and conflict avoidance (see the housekeeping sketch after this list)
  • Delete file cleanup ("delete debt")
  • Snapshot retention and metadata pruning
  • Partition evolution processes
  • Schema evolution governance
  • Health monitoring, alerts and table metrics
  • Cost management for compute and object storage
  • Integration with security, IAM and governance tooling
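The sketch below shows the shape of the housekeeping job a self-managed team typically has to write and schedule to cover the first few items on this list, using Iceberg's built-in Spark procedures. It assumes the hypothetical "glue" catalogue and glue.analytics.orders table from the earlier sketch; retention values are examples only, and a production job also needs conflict handling, retries and alerting around it.

    # Sketch of a nightly housekeeping job using Iceberg's Spark procedures.
    # Assumes the hypothetical "glue" catalogue and table from the earlier sketch;
    # thresholds and retention values are examples only.
    from datetime import datetime, timedelta

    table = "glue.analytics.orders"

    # 1. Compact small data files into larger ones (bin-pack rewrite).
    spark.sql(f"CALL glue.system.rewrite_data_files(table => '{table}')")

    # 2. Rewrite position delete files so "delete debt" does not accumulate
    #    (procedure available in recent Iceberg releases).
    spark.sql(f"CALL glue.system.rewrite_position_delete_files(table => '{table}')")

    # 3. Expire snapshots older than five days and prune the metadata they reference.
    cutoff = (datetime.utcnow() - timedelta(days=5)).strftime("%Y-%m-%d %H:%M:%S")
    spark.sql(f"""
        CALL glue.system.expire_snapshots(
            table => '{table}',
            older_than => TIMESTAMP '{cutoff}',
            retain_last => 10)
    """)

    # 4. Remove orphaned files left behind by failed or aborted writes.
    spark.sql(f"CALL glue.system.remove_orphan_files(table => '{table}')")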

Who chooses this model?

  • Organisations with a strong internal platform engineering team
  • Companies that prioritise full control and open-source transparency
  • Teams willing to invest in building their own optimisation layer
  • Businesses with on-prem or hybrid environments that already run Spark/Trino

Cost profile: Development-led

  • Low software licensing cost
  • High engineering cost (ongoing salary, headcount, training)
  • High operational cost for maintenance, optimisation and monitoring
  • Risk of under- or over-compaction, leading to hidden performance costs (see the health-check sketch after this list)
  • Slow time to value due to build-out requirements
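One way teams guard against that compaction risk is to poll Iceberg's metadata tables and alert when maintenance falls behind. A minimal health-check sketch, again against the hypothetical glue.analytics.orders table and with example thresholds:

    # Sketch: basic table-health metrics from Iceberg metadata tables.
    # Thresholds are illustrative; real jobs feed these into monitoring and alerting.
    files = spark.table("glue.analytics.orders.files")
    total_files = files.count()
    small_files = files.filter("file_size_in_bytes < 67108864").count()  # under 64 MB

    snapshot_count = spark.table("glue.analytics.orders.snapshots").count()

    print(f"data files: {total_files}, small files: {small_files}, snapshots: {snapshot_count}")
    if total_files and small_files / total_files > 0.5:
        print("WARNING: compaction is falling behind (too many small files)")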

In practice, this approach makes sense when total control is a higher priority than time to value or operational simplicity.

Option 2: Semi-managed platform

The next category includes platforms that support Iceberg and can simplify some aspects of the architecture, but which do not fully automate Iceberg optimisation. These are partially managed solutions: they reduce operational work, but do not eliminate it.

Two major examples:

  • Snowflake Iceberg Tables
  • Databricks (Delta + Iceberg interoperability)

These platforms provide:

  • Built-in compute engines
  • Managed catalogues
  • Reliable SQL interfaces
  • Platform-level optimisation for their execution engine
  • Security and governance layers
  • Integration with their own ecosystems

However, they still require Iceberg-specific engineering effort for:

  • Compaction (Snowflake can perform rewrites but not automatically in all cases)
  • Managing delete files and retention
  • Snapshot and metadata pruning
  • Partition design and evolution
  • Cost optimisation for compute-heavy Iceberg operations
  • Orchestration of ingestion pipelines (see the scheduling sketch after this list)
  • Cross-engine integration outside the vendor ecosystem
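In practice this means the housekeeping and ingestion schedule still lives in your own orchestrator, even when the platform handles query execution for you. A hedged sketch of how that is often wired up in Airflow, where the DAG name, schedule and spark-submit command are assumptions rather than a prescribed setup:

    # Sketch: scheduling the Iceberg housekeeping job with Airflow.
    # DAG id, schedule and the spark-submit command are illustrative assumptions.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="iceberg_nightly_maintenance",
        start_date=datetime(2025, 1, 1),
        schedule="0 2 * * *",   # run at 02:00 every night
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="compact_and_expire",
            bash_command="spark-submit /opt/jobs/iceberg_maintenance.py",
        )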

Where they excel

  • Organisations already heavily invested in Snowflake compute
  • Teams using Databricks for Spark workloads and lakehouse patterns
  • Use cases where Iceberg is one part of a broader managed ecosystem
  • Organisations that want simplicity and are willing to accept vendor dependence

Cost profile: Mixed model

  • Medium to high software licensing cost (compute + storage + execution)
  • Lower engineering cost than self-managed
  • Some operational overhead remains (especially for compaction, retention and ingestion tuning)
  • Costs can scale unpredictably with compute-heavy operations

This approach suits organisations that want reduced operational burden without fully outsourcing Iceberg optimisation.

Option 3: Fully managed Iceberg optimisation platform

The final category is a platform specifically designed to manage Iceberg for you. Rather than providing a general-purpose data warehouse or Spark environment, these platforms sit directly on Iceberg and provide intelligent, automated optimisation.

Qlik Open Lakehouse is the clearest example of this model.

What Qlik Open Lakehouse provides

  • Continuous adaptive compaction
  • Intelligent delete file management
  • Metadata pruning and snapshot retention
  • Query-pattern-driven optimisation (not fixed schedules)
  • Real-time ingestion that avoids small file proliferation
  • Built-in governance, lineage and cataloguing
  • Integration with Snowflake, Databricks, Spark, Trino and BI tools
  • Vendor-neutral architecture anchored in open Iceberg tables

In practical terms, Qlik provides the optimisation logic that most teams struggle to build internally.

This removes the need to design:

  • Compaction thresholds
  • File sizing strategies (see the tuning sketch after this list)
  • Delete cleanup logic
  • Schema evolution workflows
  • Table health monitoring
  • Data quality pipelines
  • Indexing and optimisation rules
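To illustrate the manual tuning this replaces, the sketch below sets the kind of table-level knobs (target file size, snapshot retention) that otherwise have to be chosen and revisited by hand as data volumes and query patterns change. The property values are examples only, assuming the hypothetical glue.analytics.orders table used earlier.

    # Sketch: per-table tuning that otherwise has to be set and revisited by hand.
    # write.target-file-size-bytes ~ target data file size (about 512 MB here);
    # history.expire.* ~ how aggressively snapshot expiry prunes old state.
    spark.sql("""
        ALTER TABLE glue.analytics.orders SET TBLPROPERTIES (
            'write.target-file-size-bytes' = '536870912',
            'history.expire.max-snapshot-age-ms' = '432000000',
            'history.expire.min-snapshots-to-keep' = '20'
        )
    """)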

Who chooses this model?

  • Organisations that want Iceberg without the operational overhead
  • Teams without large platform engineering headcount
  • Businesses prioritising faster time to value
  • CIOs and data leaders aiming to reduce cost and complexity
  • Companies wanting engines like Snowflake, Trino, Spark, Qlik and AI tools to all share the same Iceberg tables

Cost profile: Software-led

  • Predictable software cost
  • Much lower engineering cost
  • Minimal operational overhead (platform handles optimisation)
  • Significant savings in compute due to smarter compaction and retention
  • Faster delivery of data products and analytics

This is the best fit when the goal is to use Iceberg for value creation rather than sinking time into building platform foundations.

How to choose between the three models

A clear way to frame the decision is to plot it along two dimensions:

  1. Engineering capacity
  2. Desire (or need) for control

Self-managed / open-source

  • High control, high engineering cost, slower time to value
  • Lower software cost, high long-term support cost

Semi-managed

  • Medium control, mixed cost model
  • Vendor convenience with some engineering still required

Fully managed

  • Lower engineering cost, fast time to value
  • Predictable software cost, strong optimisation outcomes
  • Vendor-neutral and open-data-centric

Another helpful breakdown is:

  • Self-managed = cheapest software, most expensive people
  • Semi-managed = balanced software + engineering
  • Fully managed = more software, far fewer people, lowest operational risk (expect significantly lower compute cost too)

Why open lakehouse is a strategic priority for data leaders

Across industries, data leaders are facing rising cloud costs, increasingly complex data estates, and ambitious expectations around AI. Many organisations have already modernised once, typically by moving to cloud data warehouses or early lakehouse models, but are now encountering structural limitations around cost, flexibility and governance.

Against this backdrop, the open lakehouse, built on open standards such as Apache Iceberg, is emerging as a strategic priority for CIOs, CTOs and CDOs who need a simpler, more flexible, and more economically scalable foundation.

The limits of today's data estates

Most enterprise data platforms have evolved gradually over time, creating complexity that is now difficult to unwind. Common challenges include:

  • Multiple warehouses, marts and BI extracts containing similar datasets
  • Tool- and vendor-specific pipelines that duplicate logic and storage
  • Proprietary storage formats that lock data into a single ecosystem
  • Governance, lineage and quality that differ across tools and teams
  • Increasing cloud compute consumption without equivalent business value
  • Slow onboarding of new domains and use cases due to pipeline sprawl

These issues were manageable when analytics was simpler. But with AI, self-service analytics and near-real-time insight now core to business strategy, the cracks are showing.

The modern data estate must be more open, more governed, and more adaptable than the architectures of the past decade.

The open lakehouse

At an executive level, the summary is straightforward:

An open lakehouse brings warehouse-style governance to low-cost object storage using open standards, especially Apache Iceberg, so that all analytics and AI workloads can operate from the same governed tables without duplicating data across tools.

Put simply: store data once, use it everywhere.

Why Iceberg and the open lakehouse matter to data leaders

Apache Iceberg provides the table format that makes the open lakehouse model work across multiple tools and clouds. For business and technology leaders, the advantages consolidate into several clear strategic benefits.

Cost efficiency

Iceberg tables are stored in inexpensive cloud object storage, while still delivering warehouse-like structure. This eliminates redundant storage layers, reduces data duplication, and avoids locking data inside high-cost proprietary engines. When paired with intelligent optimisation (see earlier section), compute usage also drops significantly.

Vendor flexibility and lower long-term risk

Iceberg is open and engine-neutral. The same tables can be accessed by Snowflake, Databricks, Spark, Trino, Qlik and emerging AI tooling, without conversion or duplication. This reduces dependence on any single vendor and creates freedom to adopt new capabilities as they mature.
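As a simple illustration of that engine neutrality, the same physical table written by Spark can be read by Trino without any copy or conversion. A hedged sketch, assuming a Trino cluster with an Iceberg catalogue (named iceberg here) pointing at the same metastore and warehouse; host, catalogue and schema names are assumptions.

    # Sketch: reading the same Iceberg table from two engines without copying data.
    # Host, catalogue and schema names are illustrative assumptions.

    # Engine 1: Spark (the writer in the earlier sketches)
    spark.table("glue.analytics.orders").show(5)

    # Engine 2: Trino, via its Python client, hitting the same files and metadata
    from trino.dbapi import connect

    conn = connect(host="trino.example.internal", port=8080, user="analyst",
                   catalog="iceberg", schema="analytics")
    cur = conn.cursor()
    cur.execute("SELECT count(*) FROM orders")
    print(cur.fetchall())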

Improved governance and auditability

Iceberg's snapshot model provides full historical traceability. Combined with a central catalogue, this enables consistent access control, lineage, quality checks and regulatory compliance across the estate.
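For teams that want to see what this traceability looks like in practice, the sketch below lists a table's snapshot history and runs a time-travel query in Spark SQL; the table name and timestamp are illustrative assumptions.

    # Sketch: auditing and time travel on the hypothetical glue.analytics.orders table.
    # Every commit is a snapshot, so past states can be listed and queried directly.
    spark.sql("""
        SELECT snapshot_id, committed_at, operation
        FROM glue.analytics.orders.snapshots
        ORDER BY committed_at
    """).show(truncate=False)

    # Query the table exactly as it looked at a point in time (timestamp is an example).
    spark.sql("""
        SELECT count(*) FROM glue.analytics.orders
        TIMESTAMP AS OF '2025-12-01 00:00:00'
    """).show()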

AI-ready by design

AI initiatives rely on consistent, high-quality, well-governed data. Iceberg creates a unified data layer that supports analytics, feature engineering, vector search and model monitoring from the same tables, ensuring AI workloads operate on trusted data.

Faster delivery and better use of talent

With one shared data layer, teams no longer rebuild pipelines for each tool. They create reusable, governed data products that can serve BI, AI and operational analytics simultaneously. This improves delivery speed and frees talent to focus on high-value work rather than pipeline duplication.

Architectural stability for the next decade

By decoupling data storage from compute, Iceberg provides a stable foundation that can evolve as tools, vendors and AI paradigms change, without the need to replatform.

Together, these benefits demonstrate why the open lakehouse is increasingly seen as a long-term strategic asset, not a tactical upgrade.

A pragmatic roadmap for data leaders

Most successful open lakehouse journeys follow a similar pattern:

  1. Assess the current data estate: duplication, costs, governance gaps, vendor dependencies
  2. Define a future-state architecture centred on open storage and open table formats
  3. Select the right Iceberg delivery model (self-managed, semi-managed, or fully managed)
  4. Prioritise high-value domains or use cases that can deliver measurable ROI quickly
  5. Automate early, including compaction, retention, lineage and data quality checks
  6. Scale to more workloads, teams and tools once the approach is proven

A foundation for the future

The move to an open lakehouse is not about swapping one vendor for another. It is a structural response to long-standing challenges: fragmented data estates, duplicated storage, rising cloud costs, governance inconsistencies and barriers to AI enablement.

The open lakehouse, powered by Apache Iceberg, consolidates data into a single, governed foundation that scales economically, supports AI, reduces vendor dependence and accelerates delivery.

Iceberg makes this possible. Your chosen delivery model determines how quickly and easily your organisation can unlock the value.

For many data leaders, the open lakehouse is becoming the architectural backbone for the next decade of analytics and AI.

 

Need additional help?

If you’re exploring Iceberg or modern lakehouse architectures, our experts can help you evaluate the right approach for your business. If you’d like to talk it through, schedule a call with our team.