Qbeast Open Source

Open source is about more than code: it's about creating a space where people from all backgrounds collaborate to build something great together.

Empowering a diverse community and democratizing Big Data.

Qbeast Format

Built on the Delta Lake format, Qbeast adds the information needed to query data efficiently.
We organize the data in what we call “cubes”. The elements of each cube are written to a single Parquet file, which lets the query engine discard some of these files before reading their content.
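As a rough illustration of why one file per cube helps, here is a toy Python model. It is an assumption-laden sketch, not Qbeast's code. Each hypothetical cube file records the min/max bounds of two indexed columns, which stand in here for the region a cube covers in the index. A range query on both columns then opens only the files whose bounds overlap it. The file names and values are made up.

```python
from dataclasses import dataclass


# Toy model only: each CubeFile stands for one cube's Parquet file.
# Per-column min/max bounds are used as a stand-in for the cube's region in the index.
@dataclass
class CubeFile:
    path: str
    rows: list  # list of (x, y) tuples

    def bounds(self, col):
        values = [row[col] for row in self.rows]
        return min(values), max(values)


def files_to_read(files, x_range, y_range):
    """Keep only files whose bounding box overlaps the query box on both columns."""
    selected = []
    for f in files:
        (x_lo, x_hi), (y_lo, y_hi) = f.bounds(0), f.bounds(1)
        overlaps_x = x_lo <= x_range[1] and x_range[0] <= x_hi
        overlaps_y = y_lo <= y_range[1] and y_range[0] <= y_hi
        if overlaps_x and overlaps_y:
            selected.append(f)
    return selected


# Four hypothetical cubes, roughly splitting the (x, y) space into quadrants.
files = [
    CubeFile("cube_a.parquet", [(1, 1), (4, 3)]),
    CubeFile("cube_b.parquet", [(6, 2), (9, 4)]),
    CubeFile("cube_c.parquet", [(2, 7), (3, 9)]),
    CubeFile("cube_d.parquet", [(7, 6), (8, 8)]),
]

# Query 5 <= x <= 10 AND 5 <= y <= 10: only one file can contain matching rows.
hits = files_to_read(files, (5, 10), (5, 10))
print([f.path for f in hits])  # prints ['cube_d.parquet']
```

Because the filter uses both columns at once, a predicate on either column alone would have kept more files. That is the intuition behind indexing several columns together.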

ACID properties
Multi-column index
Efficient sampling
Resource saving
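Efficient sampling can be pictured with a similar toy model. For illustration only, it assumes that each element carries a uniformly random weight and that files are laid out by weight range. This puts low-weight elements together, loosely mimicking an index whose upper levels hold the lightest elements. Sampling a fraction f then means keeping rows with weight below f. Any file whose minimum weight is already at or above f can be skipped without opening it. This is a simplification, not Qbeast's actual algorithm, and the layout and sizes are hypothetical.

```python
import random

random.seed(42)

# Toy assumption: every element gets a uniformly random weight in [0, 1).
rows = [(i, random.random()) for i in range(1_000)]  # (id, weight)

# Toy layout: sort by weight and cut into files of 100 rows, so each file
# covers a weight range and the lightest elements end up in the first files.
rows_by_weight = sorted(rows, key=lambda r: r[1])
files = [rows_by_weight[i:i + 100] for i in range(0, len(rows_by_weight), 100)]


def sample(files, fraction):
    """Return rows with weight < fraction, opening only files that can contain them."""
    opened = [f for f in files if min(w for _, w in f) < fraction]
    picked = [r for f in opened for r in f if r[1] < fraction]
    return picked, len(opened)


picked, files_read = sample(files, 0.1)
print(f"{len(picked)} rows sampled, {files_read} of {len(files)} files read")
```

The sample is exactly the set of rows with weight below 0.1, about 10% of the data, yet most files are never opened. The saving comes from the layout, not from the filter itself.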

Apache Spark Integration

Write on your favourite object storage

    Scala:
    // "column_a,column_b" and "your-storage-path" are placeholders: use your own columns and location
    df.write.format("qbeast").option("columnsToIndex", "column_a,column_b").save("your-storage-path")

    Python:
    (df.write.format("qbeast")
        .option("columnsToIndex", "column_a,column_b")  # placeholder column names
        .save("your-storage-path"))

Load the data onto a Spark DataFrame

    Scala:
    val qbeastDf = spark.read.format("qbeast").load("your-storage-path")

    Python:
    qbeastDf = spark.read.load("your-storage-path", format="qbeast")

And query with sampling

    Scala or Python (same syntax in both):
    qbeastDf.sample(0.1).show()

    SQL, through a temporary view (Scala or Python):
    qbeastDf.createOrReplaceTempView("qbeast_table")
    spark.sql("SELECT * FROM qbeast_table TABLESAMPLE(1 PERCENT)").show()

Faster than plain Spark

Apache Spark

Time: 151.36s
Result: 37.869383

Apache Spark + Qbeast Sample

Time: 6.62s
Result: 37.856333
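The two runs can be compared directly. The sketch below uses only the figures reported above to compute the speedup and how far the sampled result drifts from the full one. The query that produced these results is not shown here.

```python
# Figures copied from the benchmark above.
full_time_s, full_result = 151.36, 37.869383
sample_time_s, sample_result = 6.62, 37.856333

# How many times faster the sampled run was.
speedup = full_time_s / sample_time_s

# Relative deviation of the sampled result from the full-scan result.
relative_error = abs(full_result - sample_result) / abs(full_result)

print(f"speedup: {speedup:.1f}x")                 # speedup: 22.9x
print(f"relative error: {relative_error:.4%}")    # relative error: 0.0345%
```

In other words, the sampled run was roughly 23 times faster, and its result differed from the full scan by about 0.03%.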

Why Qbeast Open Source?

Because it embodies some of the internet’s best qualities: cooperation, knowledge, and skill sharing in the pursuit of a common goal.

Public usage
Share our technology openly to help in every way we can.
The entire community benefits from the collective innovation.
Drive Adoption
People can help the project by adding what they need, making it grow faster and more inclusively.

