We’re thrilled to announce a massive new release of Qbeast Platform and Portal — v0.4.0 is live! This release brings powerful indexing capabilities to more formats, improves support for dynamic data, and adds advanced automation and observability tools across the board.
Whether you’re using Delta Lake, Apache Hudi, or Apache Iceberg, this release packs something for you.
Our first-ever Qbeast + Iceberg integration is now available in preview!
Qbeast can now index and optimize Apache Iceberg tables — unlocking blazing-fast queries on massive datasets with full compatibility and documentation. Qbeast metadata is packed into Iceberg’s native Puffin metadata files.
Qbeast 0.4.0 brings GA support for Google Cloud Platform (GCP)! It has been tested with Google Dataproc and BigQuery, making it easier than ever to enhance your GCP-native analytics stack with blazing-fast indexing and query efficiency.
Support for Apache Hudi has been in preview for some time and goes GA with this release. It comes with comprehensive support in the Qbeast Portal, including table insights and Qbeast optimization.
Data lakes aren’t static, and now your indexing engine isn’t either. With Qbeast 0.4.0, you can Delete, Update, and Merge data in indexed tables without breaking performance.
We leverage a new Resilient Index Builder and native merge-on-read (MoR) strategies to seamlessly handle changes, even for unindexed files.
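As a rough illustration of the Delete, Update, and Merge operations mentioned above, here is a minimal sketch in plain Spark SQL. The table names (`sales_indexed`, `sales_updates`) and columns are hypothetical, and we assume the indexed table is registered in your catalog and that its underlying format accepts these statements. The `execute` parameter is a stand-in for whatever runs SQL in your environment.

```scala
object DmlSketch {
  // Hypothetical table and column names, used only for illustration.
  val statements: Seq[String] = Seq(
    "DELETE FROM sales_indexed WHERE brand = 'acme'",
    "UPDATE sales_indexed SET price = price * 0.9 WHERE brand = 'globex'",
    """MERGE INTO sales_indexed AS t
      |USING sales_updates AS s
      |ON t.id = s.id
      |WHEN MATCHED THEN UPDATE SET *
      |WHEN NOT MATCHED THEN INSERT *""".stripMargin
  )

  // Runs each statement in order with the supplied executor.
  // The executor is an assumption: in a Spark job it would wrap spark.sql.
  def runAll(execute: String => Unit): Unit = statements.foreach(execute)
}
```

With a live session, something like `DmlSketch.runAll(sql => { spark.sql(sql); () })` would submit the three statements in order.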
Qbeast now automatically handles files that result from external writers and DML operations. You can configure optimization policies that detect and reindex unindexed data, preserving performance without manual intervention.
This means that Qbeast can transparently integrate with data written from existing workflows in external engines!
Enable automatic, incremental optimization via a table property:
ALTER TABLE table_name SET TBLPROPERTIES (
  'use.optimization.autoOptimize.enabled'='true',
  'use.optimization.autoOptimize.optimizeUnindexedFiles.enabled'='true'
);
You can also run optimization through the Qbeast Table API, via SQL:
OPTIMIZE table_name;
and via Spark Scala DSL:
qbeastTable.optimize(0L, fraction = 0.5)
Highly skewed data can play havoc with table optimization strategies. It can lead to unbalanced files or high levels of overlap between files. Qbeast indexing copes better with skewed data by letting you index a column according to that column’s quantile distribution. In other words, it takes the frequency of the values into account, which can result in a huge performance and efficiency boost for these columns.
To check whether a column is skewed, compute its skewness:
SELECT skewness(brand) from table_name;
A value below -1 or above 1 indicates significant skew, and the column will likely benefit from quantile indexing.
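To make the threshold concrete, here is a small self-contained sketch of the same check in plain Scala. It uses the standard population skewness (third central moment divided by the second central moment raised to 1.5), which should agree with what the SQL function reports for a numeric column. The object name, helper names, and sample values are ours, chosen only for illustration.

```scala
object SkewCheck {
  // Population skewness: m3 / m2^1.5, where m2 and m3 are central moments.
  // Returns 0.0 for constant data, a choice made here to avoid dividing by zero.
  def skewness(xs: Seq[Double]): Double = {
    require(xs.nonEmpty, "need at least one value")
    val n = xs.size.toDouble
    val mean = xs.sum / n
    val m2 = xs.map(x => math.pow(x - mean, 2)).sum / n
    val m3 = xs.map(x => math.pow(x - mean, 3)).sum / n
    if (m2 == 0.0) 0.0 else m3 / math.pow(m2, 1.5)
  }

  // Applies the rule of thumb from the text: |skewness| > 1 is significant.
  def isSignificantlySkewed(xs: Seq[Double]): Boolean =
    math.abs(skewness(xs)) > 1.0
}
```

A symmetric sample such as 1, 2, 3, 4, 5 has zero skewness, while nine 1s followed by a single 100 lands well above the threshold.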
Then, using Spark with Qbeast, compute the column’s quantiles and pass them as column statistics when indexing:
val columnQuantiles = QbeastUtils.computeQuantilesForColumn(df, "brand")
// Triple quotes keep the inner JSON quotes from ending the string early
val columnStats = s"""{"brand_quantiles":$columnQuantiles}"""
df
  .write
  .mode("overwrite")
  .format("qbeast")
  .option("columnsToIndex", "brand:quantiles")
  .option("columnStats", columnStats)
  .save("/tmp/qbeast_table_quantiles")
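To see why quantile boundaries help with skew, here is a toy sketch in plain Scala that compares equal-width splits against quantile-based splits on a skewed sample. This is not Qbeast's internal algorithm; the object, function names, and data are hypothetical and only illustrate the general idea of splitting by frequency rather than by value range.

```scala
object QuantileSketch {
  // Boundaries that cut the value range into `buckets` equal-width intervals.
  def equalWidthBounds(xs: Seq[Double], buckets: Int): Seq[Double] = {
    val (lo, hi) = (xs.min, xs.max)
    (1 until buckets).map(i => lo + (hi - lo) * i / buckets)
  }

  // Boundaries read off the sorted data, so each interval holds a similar row count.
  def quantileBounds(xs: Seq[Double], buckets: Int): Seq[Double] = {
    val sorted = xs.sorted
    (1 until buckets).map(i => sorted((i * sorted.size) / buckets))
  }

  // Rows per bucket; a value equal to a boundary falls into the upper bucket.
  def bucketCounts(xs: Seq[Double], bounds: Seq[Double]): Seq[Int] = {
    val counts = Array.fill(bounds.size + 1)(0)
    xs.foreach(x => counts(bounds.count(_ <= x)) += 1)
    counts.toSeq
  }

  // Skewed sample: 90 small values plus a long tail of 10 large ones.
  val sample: Seq[Double] =
    (1 to 90).map(_.toDouble) ++ (1 to 10).map(i => 1000.0 * i)
}
```

On this sample, four equal-width buckets hold 92, 3, 2, and 3 rows, while four quantile buckets hold 25 rows each, which is the kind of balance that keeps file sizes even.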
Alongside the platform release, the Qbeast Portal got a serious upgrade.
We focused on visibility, usability, and speed.
· Ingestion Completion Rate: Track indexing status and data ingestion completion in real time.
· 30-Day Optimization Tracking: Visualize how much data is being optimized over time.
· Color-Coded Navigation: Find what you need faster with our updated sidebar.
· Performance Boosts: Faster load times with smarter query caching.
· Fresh Branding: We’ve polished up the interface to match our bold new direction.
We’ve added feature flags for advanced performance tuning, including:
· Sampling during indexing, to accelerate indexing of very large datasets.
· Improvements to Qbeast file roll-up, a strategy for preventing too many small files. Roll-ups can now handle more edge cases and produce consistent file sizes when consolidating multiple small clusters.
These can help improve consistency and reduce overhead in larger workloads. Use them with care — but stay tuned as they stabilize!
We now publish separate JARs for Delta, Hudi, and Iceberg. No more bundling extra dependencies — just grab what you need.
We crushed bugs, reduced noise in logs, and added more flexible time formats and error handling for non-deterministic queries.
We also improved our CI workflows, dependency graphs, and internal separation between integration and unit testing.
📚 Full Release Notes & Changelog »
🌐 Portal Access & Documentation »
A huge thank you to all our contributors and early adopters who continue to shape Qbeast into the go-to optimization and efficiency layer for open Lakehouse architectures.
We’re just getting started.
Follow us on LinkedIn and X/Twitter for updates — and join the conversation with #qbeast.