💫 Qbeast v0.4.0 is Here

DML Support, Apache Iceberg and Hudi Support, Auto-Optimization & More!
From Chaos to Canvas: Repainting the Lakehouse with Multidimensional Indexing

We’re thrilled to announce a massive new release of Qbeast Platform and Portal — v0.4.0 is live! This release brings powerful indexing capabilities to more formats, improves support for dynamic data, and adds advanced automation and observability tools across the board.

Whether you’re using Delta Lake, Apache Hudi, or Apache Iceberg, this release packs something for you.

💥 Highlights at a Glance

❄️ Iceberg Just Got Qbeasted

Our first-ever Qbeast + Iceberg integration is now available in preview!
Qbeast can now index and optimize Apache Iceberg tables — unlocking blazing-fast queries on massive datasets with full compatibility and documentation. Qbeast metadata is packed into Iceberg’s native Puffin metadata files.

☀️ Qbeast Now on Google Cloud

Qbeast 0.4.0 brings GA support for Google Cloud Platform (GCP)! It has been tested with Google Dataproc and BigQuery, making it easier than ever to enhance your GCP-native analytics stack with blazing-fast indexing and efficient queries.

🏔️ Hudi Goes Generally Available

Support for Apache Hudi has been in preview for some time and goes GA with this release. It comes with comprehensive support in the Qbeast Portal, including table insights alongside Qbeast optimization.

✅ DML Support Has Landed (Delete, Update, Merge)

Data lakes aren’t static, and now your indexing engine isn’t either. With Qbeast 0.4.0, you can Delete, Update, and Merge data in indexed tables without breaking performance.
We leverage a new Resilient Index Builder and native merge-on-read (MoR) strategies to seamlessly handle changes, even for unindexed files.

♻️ Auto-Optimization of Unindexed Files

Qbeast now automatically handles files that result from external writers and DML operations. You can configure optimization policies that detect and reindex unindexed data, preserving performance without manual intervention.

This means that Qbeast can transparently integrate with data written from existing workflows in external engines!

Enable automatic, incremental optimization via table properties:

ALTER TABLE table_name SET TBLPROPERTIES (
  'use.optimization.autoOptimize.enabled'='true',
  'use.optimization.autoOptimize.optimizeUnindexedFiles.enabled'='true'
);

Manual optimization is also supported through the Qbeast Table API via SQL:

OPTIMIZE table_name;

and via Spark Scala DSL:

qbeastTable.optimize(0L, fraction = 0.5)

📊 Smarter Layouts through Quantile-based Indexing

Skewed data? No problem.

Highly skewed data can play havoc with table optimization strategies. It can lead to unbalanced files or high levels of overlap between files. Qbeast indexing deals with skew by letting you index a column according to that column’s quantile distribution. In other words, it takes the frequency of the values into account, which can result in a huge performance and efficiency boost for these columns.
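
To see why quantiles help, here is a minimal, self-contained Scala sketch. It is an illustration only, not Qbeast’s implementation, and the object and function names are made up for this example. It splits the same skewed column into four buckets twice: once with equal-width boundaries and once with quantile boundaries.

```scala
object QuantileSketch {
  // Boundaries that split the sorted values into `buckets` groups of roughly equal count.
  def quantileBoundaries(xs: Seq[Double], buckets: Int): Seq[Double] = {
    val sorted = xs.sorted
    (1 until buckets).map(i => sorted((i * sorted.size) / buckets))
  }

  // Boundaries that split the [min, max] range into intervals of equal width.
  def equalWidthBoundaries(xs: Seq[Double], buckets: Int): Seq[Double] = {
    val (lo, hi) = (xs.min, xs.max)
    (1 until buckets).map(i => lo + i * (hi - lo) / buckets)
  }

  // Number of values that land in each bucket defined by the boundaries.
  def bucketCounts(xs: Seq[Double], boundaries: Seq[Double]): Seq[Int] = {
    // A value's bucket index is the number of boundaries at or below it.
    val idx = xs.map(x => boundaries.count(_ <= x))
    (0 to boundaries.size).map(b => idx.count(_ == b))
  }
}
```

With the values 1 to 99 plus a single outlier of 10000, the equal-width split puts 99 values in the first bucket and 1 in the last, while the quantile split puts 25 values in each bucket.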

How do I know if I need it? Quite easy, actually. Simply query for “skewness” of the data:

SELECT skewness(brand) FROM table_name;

Any value less than -1 or greater than 1 indicates significant skew, and the column will likely benefit from quantile indexing.
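
For intuition about what that number measures, here is a small sketch of the standard moment-based skewness formula in plain Scala, with the ±1 rule of thumb applied. The helper names are hypothetical and the code is assumed, not guaranteed, to match what your SQL engine computes; in practice you would simply run the query above.

```scala
object SkewCheck {
  // Moment-based (population) skewness: m3 / m2^1.5,
  // where m2 and m3 are the second and third central moments.
  def skewness(xs: Seq[Double]): Double = {
    val n = xs.size.toDouble
    val mean = xs.sum / n
    val m2 = xs.map(x => math.pow(x - mean, 2)).sum / n
    val m3 = xs.map(x => math.pow(x - mean, 3)).sum / n
    m3 / math.pow(m2, 1.5)
  }

  // Rule of thumb from the text: |skewness| > 1 means significant skew.
  def needsQuantileIndexing(xs: Seq[Double]): Boolean =
    math.abs(skewness(xs)) > 1.0
}
```

A symmetric column such as 1, 2, 3, 4, 5 has a skewness of 0, while nine 1s followed by a single 100 lands well above 1 and would be flagged.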

Then, using Spark with Qbeast, compute the column’s quantiles and pass them as column statistics when indexing:

// Compute the quantiles of the skewed column
val columnQuantiles = QbeastUtils.computeQuantilesForColumn(df, "brand")
// Triple quotes let the JSON's double quotes sit inside the interpolated string
val columnStats = s"""{"brand_quantiles":$columnQuantiles}"""

df.write
  .mode("overwrite")
  .format("qbeast")
  .option("columnsToIndex", "brand:quantiles")
  .option("columnStats", columnStats)
  .save("/tmp/qbeast_table_quantiles")

📈 Qbeast Portal v0.4.0: Your Data at a Glance

Alongside the platform release, the Qbeast Portal got a serious upgrade.
We focused on visibility, usability, and speed.

🎯 New in the Portal

·     Ingestion Completion Rate: Track indexing status and data ingestion completion in real-time.

·     30-Day Optimization Tracking: Visualize how much data is being optimized over time.

·     Color-Coded Navigation: Find what you need faster with our updated sidebar.

·     Performance Boosts: Faster load times with smarter query caching.

·     Fresh Branding: We’ve polished up the interface to match our bold new direction.

⚙️ Experimental Performance Features

We’ve added feature flags for advanced performance tuning, including:

·     Leveraging sampling during indexing to speed up indexing of very large datasets.

·     Improvements to Qbeast file roll-up techniques, a strategy for preventing too many small files. Roll-ups can now handle more edge cases and produce consistent file sizes when consolidating multiple small clusters.

These can help improve consistency and reduce overhead in larger workloads. Use them with care — but stay tuned as they stabilize!
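
If the roll-up idea is new to you, the sketch below shows one simple way it can work in a tree-shaped index: blocks that are too small to deserve their own file are merged upward into their parent. This is a simplified illustration under our own assumptions, not the algorithm Qbeast ships. The cube ids, the size threshold, and the function name are all hypothetical.

```scala
object RollupSketch {
  // Cube ids are hypothetical path strings: "" is the root, "0" a child, "01" a grandchild.
  // Returns the sizes of the groups that remain after small cubes are merged into their parents.
  def rollUp(sizes: Map[String, Long], minSize: Long): Map[String, Long] = {
    val acc = scala.collection.mutable.Map.empty[String, Long] ++= sizes
    // Visit the deepest cubes first so small leaves merge upward before their parents are checked.
    for (cube <- sizes.keys.toSeq.sortBy(-_.length) if cube.nonEmpty) {
      val size = acc(cube)
      if (size < minSize) {
        val parent = cube.dropRight(1)
        acc(parent) = acc.getOrElse(parent, 0L) + size
        acc.remove(cube)
      }
    }
    acc.toMap
  }
}
```

For example, with a threshold of 100, the sizes root = 50, "0" = 10, "1" = 200, "00" = 5 and "01" = 8 collapse into two groups: the root with 73 and "1" with 200. The total amount of data is unchanged, but five small pieces become two.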

📦 Modular Spark Support

We now publish separate JARs for Delta, Hudi, and Iceberg. No more bundling extra dependencies — just grab what you need.

🐞 Bug Fixes & Improvements

We crushed bugs, reduced noise in logs, and added more flexible time formats and error handling for non-deterministic queries.

We also improved our CI workflows, dependency graphs, and internal separation between integration and unit testing.

📦 Download, Docs & Getting Started

📚 Full Release Notes & Changelog »

🌐 Portal Access & Documentation »

🎉 Thank You

A huge thank you to all our contributors and early adopters who continue to shape Qbeast into the go-to optimization & efficiency layer for open Lakehouse architectures.
We’re just getting started.

Follow us on LinkedIn and X/Twitter for updates — and join the conversation with #qbeast.