Qbeast Open Source

Open source is about more than just code. It’s about creating a space where people from all backgrounds collaborate to build something great together.

Empowering a diverse community and democratizing Big Data.

Qbeast Format

Based on the Delta Lake format, Qbeast adds the information needed to query data efficiently. We organize the data in what we call “cubes”. Each cube’s elements are written to a single Parquet file, so the query engine can filter out some files before reading their content.


ACID properties
Multi-column index
Efficient sampling
Resource saving
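
As a rough illustration of how organizing data into cubes lets a query engine skip files, here is a minimal Python sketch. It is not Qbeast's implementation: the `CubeFile` class, its per-column value ranges, the `files_to_read` helper, and the file names are all hypothetical, chosen only to show how a reader could rule out whole files from metadata before opening them.

```python
from dataclasses import dataclass


@dataclass
class CubeFile:
    """Hypothetical metadata for one cube: a file path and the range it covers per column."""
    path: str
    bounds: dict  # column name -> (min, max), both inclusive


def files_to_read(cubes, column, low, high):
    """Return the paths of cubes whose range on `column` overlaps [low, high]."""
    selected = []
    for cube in cubes:
        cube_min, cube_max = cube.bounds[column]
        # A cube is skipped when its range lies entirely outside the query range.
        if cube_max >= low and cube_min <= high:
            selected.append(cube.path)
    return selected


# Four made-up cubes covering a 2-column space split into quadrants.
cubes = [
    CubeFile("cube-0.parquet", {"x": (0.0, 50.0), "y": (0.0, 50.0)}),
    CubeFile("cube-1.parquet", {"x": (50.0, 100.0), "y": (0.0, 50.0)}),
    CubeFile("cube-2.parquet", {"x": (0.0, 50.0), "y": (50.0, 100.0)}),
    CubeFile("cube-3.parquet", {"x": (50.0, 100.0), "y": (50.0, 100.0)}),
]

# A filter such as `x BETWEEN 60 AND 80` only needs the cubes with x in [50, 100].
print(files_to_read(cubes, "x", 60.0, 80.0))
```

In this toy layout, half of the files are never opened for the example filter. The real index structure and metadata layout may differ from this sketch.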

Apache Spark Integration

Write to your favourite object storage
  • Scala
    df.write.format("qbeast")
      .option("columnsToIndex", "your,columns")
      .save("your-storage-path")
  • Python
    (df.write.format("qbeast")
       .option("columnsToIndex", "your,columns")
       .save("your-storage-path"))
Load the data into a Spark DataFrame
  • Scala
    val qbeastDf = spark.read.format("qbeast").load("your-storage-path")
  • Python
    qbeastDf = spark.read.load("your-storage-path", format="qbeast")
And query with sampling
  • Scala
    qbeastDf.sample(0.1).show()
  • Python
    qbeastDf.sample(0.1).show()
  • SQL
    qbeastDf.createOrReplaceTempView("qbeast_table")
    spark.sql("SELECT * FROM qbeast_table TABLESAMPLE(1 PERCENT)")

Faster than plain Spark

Apache Spark

Time: 151.36s
Result: 37.869383

Apache Spark + Qbeast Sample

Time: 6.62s
Result: 37.856333
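
The idea behind this comparison is that an aggregate computed on a sample can approximate the full-data result with much less work. The plain-Python sketch below illustrates only that statistical point on synthetic data. It does not reproduce the benchmark above, and the data set, the 10% fraction, and the `sample_mean` helper are assumptions made for illustration.

```python
import random
import statistics


def sample_mean(values, fraction, seed=0):
    """Estimate the mean of `values` from a uniform random sample of the given fraction."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(values) * fraction))
    return statistics.fmean(rng.sample(values, sample_size))


# Synthetic data, generated only for this illustration.
data_rng = random.Random(42)
values = [data_rng.gauss(40.0, 5.0) for _ in range(100_000)]

full = statistics.fmean(values)             # uses every row
approx = sample_mean(values, fraction=0.1)  # uses 10% of the rows

print(f"full mean:   {full:.3f}")
print(f"sample mean: {approx:.3f}")
```

Here the list is already in memory, so the sample saves little. With a format that supports efficient sampling, the gain would come from reading fewer files in the first place.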

Why Qbeast Open Source?

Because it embodies some of the internet’s best qualities: cooperation, knowledge, and skill sharing in the pursuit of a common goal.

Public usage
Share our technology openly to help in every way we can.
Community
The entire community benefits from the collective innovation.
Drive Adoption
People can help the project by adding what they need, making it grow faster and more inclusively.

An open source data lakehouse enhancement with efficient data sampling




    Contact us: info@qbeast.io

    C/ Roc Boronat 117, 2a Planta, 08018 Barcelona

    © 2020 Qbeast
    Design by Xurris