Qbeast 0.6: Multi-cloud SaaS, AI-native access, and faster queries

Databricks: Thank you for the tremendous response at Databricks AI Summit 2025. Your interest in our work inspires us to keep pushing the boundaries of data innovation.

| Qbeast Secures $7.6M Seed Funding from PeakXV to Help Open Data Platforms Scale Efficiently. |

Databricks: Thank you for the tremendous response at Databricks AI Summit 2025. Your interest in our work inspires us to keep pushing the boundaries of data innovation.

| Qbeast Secures $7.6M Seed Funding from PeakXV to Help Open Data Platforms Scale Efficiently |

Databricks: Thank you for the tremendous response at Databricks AI Summit 2025. Your interest in our work inspires us to keep pushing the boundaries of data innovation.

← Blogs /

Qbeast 0.6: Multi-cloud SaaS, AI-native access, and faster queries

An efficient, format-agnostic, multi-dimensional clustering engine for the lakehouse— now delivered as a distributed service.

Qbeast 0.6 marks a structural shift: from a library to a distributed, lakehouse-adjacent service.With SOC 2 compliance and multi-cloud deployment across AWS, Azure, and GCP, Qbeast now runs where your data lives—collocating optimization with your lakehouse estate. It is available both as a fully managed SaaS and deployable within your own resources, giving enterprises flexibility across security and operational models.

This release also introduces AI-native access via MCP, along with meaningful query performance gains through new storage and execution optimizations.

Here's what's new.

1. Qbeast is now multi-cloud with flexible deployment models

Qbeast runs natively on Kubernetes, and you can now run it as a managed service or in a provided Kubernetes cluster on AWS, Azure, or GCP — in your VPC. For the lakehouse catalog, we support Hive Metastore (Thrift), JDBC, and AWS Glue.

Deployments are orchestrated with Crossplane, enabling fully reproducible and maintainable infrastructure: cluster provisioning, networking, storage and observability are defined as code rather than bespoke setup.

We support two deployment models:

Customer VPC Managed — Qbeast runs in your cloud account, operated by us
Customer VPC Provided — you provide the Kubernetes cluster; we deploy Qbeast onto it

This release also introduces foundational platform capabilities:

A Control Plane API for programmatic management of tenants, environments, and workspaces, with full Swagger documentation.
qbctl, a CLI that wraps the Control Plane API for operators who'd rather not hit the REST endpoints directly.
Usage metering and billing to track indexed and optimized bytes per table and integrates with Stripe, which means usage-based pricing is now actually wired up.
A full observability stack (Victoria Metrics, Victoria Logs, Grafana, AlertManager) for proactive monitoring.
(Victoria Metrics, Victoria Logs, Grafana, AlertManager) for proactive monitoring.

2. AI-native SQL access via MCP

Qbeast now includes a Model Context Protocol (MCP) server, enabling direct integration with AI tools — e.g., Claude Code, OpenAI Codex, Cursor, and other MCP-compatible agents.This makes Qbeast a first-class data source for agentic workflows.

Key capabilities:

Natural language → SQL → results, executed directly on Qbeast-indexed data
No data movement or custom integration layers required
Secure access via OIDC, with per-session isolation
Admin APIs for session management

The MCP server is available across all supported clouds and includes built-in observability (Prometheus + Grafana). Auth0 is supported as an OIDC provider.The practical outcome: analysts and agents can query optimized lakehouse data conversationally while benefiting from Qbeast’s indexing—without additional pipelines.

Screenshots showing the use of the MCP Server with a Scala shell or via Agentic coding tool such as Claude Code.

3. Query performance: PCA and Storage Partition Joins

Qbeast continues to focus on its core objective: reducing query cost through data layout optimization. Two major enhancements in 0.6:

Page-Cube Alignment (PCA)

A cube-aware Parquet writer that co-locates data by cube within files.

This improves data locality and reduces I/O during query execution—amplifying the benefits of multi-dimensional indexing.

Storage Partition Joins (SPJ)

Qbeast now aligns its weighting model with Iceberg bucketing metadata, allowing Spark to eliminate shuffle operations for joins on bucketed tables. For join-heavy workloads, this translates directly into lower compute cost and faster execution.

Additional improvements

Up to 76% query improvement on single-column indexing using quantiles on a customer data set and up to 33% improvement with multi-column indexing
~66% reduction in auto-optimization cycle time via leveled compaction
Prior 0.5 gains retained: ~40% faster write performance

4. Onboarding existing tables

Qbeast supports both new and existing tables, with flexible onboarding strategies. For large datasets, full rewrites are often impractical. Qbeast allows:

Incremental optimization based on customer-defined thresholds
Indexing over existing partitioned tables
Layout improvements without requiring full data reorganization

This capability is particularly relevant for enterprises operating at multi-terabyte or petabyte scale.

Diagram describing the global indexing of a table combined with physical partitioning

Also in this release

We have additionally included the following:

Full stack upgrades: Spark 4.0, Scala 2.13, Java 21
Qlens CLI pipeline for query-driven optimization and benchmarking
Hive Metastore support via PostgreSQL JDBC
Envoy Gateway replacing ingress-nginx (retired and no longer supported) for improved OIDC and connection stability
Security updates, including mitigation of recent Apache Parquet vulnerabilities

What's next

With 0.6.0, Qbeast now supports all three major open table formats (Iceberg, Delta Lake, Hudi) across all three major cloud providers (AWS, GCP, Azure), with automated deployment into the customer environment. Query performance — the thing we care most about — also sees a number of enhancements in this release.

What's next: enabling agentic AI over the lakehouse. As autonomous agents increasingly operate directly on data, efficient storage layout and query performance become critical to controlling compute costs. Qbeast is positioning itself as the optimization layer that makes this viable at scale.

‍

← Blogs /

Qbeast 0.6: Multi-cloud SaaS, AI-native access, and faster queries

From Chaos to Canvas: Repainting the Lakehouse with Multidimensional Indexing