Author: clemensjesche

Qbeast raises €2.5m to make data lakes fast and easy to use

February 23, 2023

Today we are proud to announce our seed round of €2.5m led by Elaia, with the participation of new investors Sabadell Venture Capital and Uber Founding CTO Oscar Salazar and existing investor Inveready. Previous investors include BStartup Banco Sabadell and business angels.

The funding will allow us to accelerate go-to-market by filling key commercial positions and further investing in product development, technology, and partnerships.

Faster, cheaper, and easier-to-use data lakes

Qbeast’s solution simplifies the work of engineers managing data lakes. Qbeast organizes a company’s data so that it can be read efficiently from any tool the company prefers, making it much cheaper to use its favorite Business Intelligence tool or train its Machine Learning models at scale.

“Companies dealing with data have little choice: if they want to know what is happening in their business, they need to use a data warehouse, while if they want to predict the future, optimize their processes and use Machine Learning, they need a data lake. They end up using different technologies and needing different people, but also with double the cloud bill and double the time to develop, which is a huge problem. At Qbeast, we are going to change this by making the life of data teams easier and data tools more efficient,” explains Cesare Cugnasco (CEO).

Qbeast developed the most advanced open-source format for data lakes to power its solution

We are building a community around the open-source data lake format. The Engineering team is currently further integrating the format within the modern data infrastructure stack to appeal to even more companies regardless of their technology stack.

Faster analytics and ML model training to drastically reduce cloud costs and energy consumption

For one of our clients, Qbeast improved execution time in their data analytics by 68% with full precision and by up to 50 times using sampling, enabling faster analytics, cloud cost savings, and lower energy consumption.

Qbeast has already signed paying clients, such as a Cybersecurity company. Furthermore, Qbeast collaborates with Preply with the aim of improving the efficiency of Machine Learning model training. Qbeast’s innovative solutions can be particularly interesting for Marketplaces, E-Commerce platforms, IoT, and companies from industries such as Advertising & Marketing, Manufacturing, Retail, and Financial Services.

“We are proud to back Qbeast and their highly capable team anchored in advanced research. The founders are on a mission to reduce the friction that prevents hundreds of thousands of companies from efficiently leveraging data lakes because of their perceived complexity. There is a lot of value to be created by Qbeast and we can’t wait to start this journey together,” adds Sébastien Lefebvre, Partner at Elaia.

A deep tech spin-off of the Barcelona Supercomputing Center

The trade-off between flexibility and efficiency in data analytics was already a pressing concern at the Barcelona Supercomputing Center when Cesare Cugnasco (CEO) started to design Qbeast in 2015, well ahead of the battle between Data Warehouse and Data Lake solutions. In 2020, he gathered a team of 4 co-founders, Pol Santamaria, Paola Pardo, Clemens Jesche, and Nicolas Escartin, and that was the start of Qbeast. Today, Qbeast has a team of 14 people, 4 of whom have joined since the round closed. The team combines skills in engineering and business and plans to grow further over the next year.

Thank you

What we have achieved so far would not have been possible without our fantastic Qbeast community. Thanks to our partners, customers, investors, advisors and team members for all of your contributions and support!

About Qbeast 
Qbeast optimizes the organization of data to simplify the work of Engineers and make it faster and cheaper to get insights, build data products, and train Machine Learning models at scale. Qbeast developed the most advanced open format for data lakes to power the solution. Qbeast is a spin-off of the Barcelona Supercomputing Center.

Learn more: https://qbeast.io and https://github.com/Qbeast-io/qbeast-spark

© 2020 Qbeast. All rights reserved.

Why the industry is moving towards an open data ecosystem

November 3, 2022

Is vendor lock-in suddenly out of fashion? Looking at recent headlines, it very much seems so.

Google: “Building the most open data cloud ecosystem: Unifying data across multiple sources and platforms” 

Google announced several steps to provide the most open and extensible Data Cloud and to promote open standards and interoperability between popular data applications. Some of the most interesting steps are the following:

  • Support for major data formats in the industry, including Apache Iceberg, and soon Delta Lake and Apache Hudi.
  • A new integrated experience in BigQuery for Apache Spark, an open-source query engine.
  • Expanding integrations with many of the most popular enterprise data platforms to help remove barriers between data and give customers more choice and prevent data lock-in.

Snowflake: “Iceberg Tables: Powering Open Standards with Snowflake Innovations” 

Snowflake recently announced Iceberg Tables to combine Snowflake capabilities with the open-source projects Apache Iceberg and Apache Parquet to solve challenges such as control, cost, and interoperability. With Iceberg tables, companies can benefit from the features and performance of Snowflake but can also use open formats, tools outside of Snowflake, or their own cloud storage. 

To put that into perspective: two leading providers of proprietary cloud data warehouses have just announced that they are opening up their systems. This is remarkable because keeping customers and their data locked into their solutions is an excellent business for those providers.

Why is this happening, and why are players such as Google and Snowflake joining the movement toward an open data ecosystem?

Why we need an open data ecosystem

Digital transformation is held back by challenges that can only be solved with an open approach. For a significant share of a company's data use cases, proprietary warehouse solutions are not well suited. These include complex use cases and machine learning workloads such as demand forecasting or personalized recommendations. Companies also need the flexibility to adjust quickly to a fast-changing environment and to take full advantage of all their data. Depending on the roadmap of a single provider limits the ability to innovate. If a new provider offers a solution that is ideal for your needs or complements your existing solution, you want to be able to take that opportunity. This interoperability and flexibility are only possible with open standards.

On top of that, the current macro-environment forces companies to optimize their spending on data analytics and machine learning, and costs can escalate quickly with proprietary cloud data warehouses. 

The convergence of Data Lakes and Data Warehouses 

We saw that cloud data warehouse providers are moving towards an open ecosystem, joining other companies at the forefront of the movement, such as Databricks and Dremio, among others. They are pushing for the Data Lakehouse approach.

In a nutshell, the Data Lakehouse combines the advantages of data warehouses and data lakes. It is open, simple, flexible, and low-cost. It is designed to allow companies to serve all their Business Intelligence and Machine Learning use cases from one system.

Open data formats

A crucial part of this approach is the use of open data formats such as Delta Lake, Iceberg, or Hudi. Those formats provide a metadata and governance layer or, let’s say, the “magic” that solves the problems of traditional data lakes. Traditional data lakes do not enforce data quality and lack governance. Users also cannot work on the same data simultaneously, and only limited metadata is available to describe the data layout, which makes loading and analyzing data very slow.

How Data Lakehouses benefit companies

Companies such as H&M and HSBC have already adopted the open Data Lakehouse approach, and many others will follow. 

H&M, for example, faced the problem that their legacy architecture couldn’t support company growth. Complex infrastructure took a toll on the Data Engineering team, and scaling was very costly. All of this led to slow time-to-market for data products and ML models. Implementing a Data Lakehouse approach, in this case with Databricks on Delta Lake, led to simplified data operations and faster ML innovations. The result was a 70% reduction in operational costs and improved strategic decisions and business forecasting.¹

HSBC, on the other hand, was able to replace 14 databases with Delta Lake. They improved engagement in their mobile banking app by 4.5 times with more efficient data analytics and data science processes.²

So, does the Data Lakehouse solve it all? Not quite; the reality is that some challenges still need to be addressed.

Pending problems

Firstly, the performance of solutions based on open formats is not yet good enough. There is a heated debate ongoing on Warehouse vs. Lakehouse performance, but I think it’s fair to say that, at least in some use cases, the Lakehouse still needs to catch up. Data Warehouses are optimized for the processing and storage of structured data and are very performant in those cases, for example when you want to identify the most profitable customer segments for the marketing team based on information collected from different sources.

Secondly, working with open formats is complex, and you need a skilled engineering team to build and maintain your data infrastructure and ensure data quality.

How Qbeast supports the open data ecosystem

At Qbeast, we embrace the open data ecosystem and want to do our part to push it forward. We developed the open-source Qbeast Format, which improves existing open data formats such as Delta Lake.

We enhance the metadata layer and use multi-dimensional indexing and efficient sampling techniques to improve performance significantly. Simply put, we organize the data smarter so it can be analyzed much faster and cheaper.
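To give a rough intuition for why data layout matters, here is a minimal Python sketch. It is not the Qbeast Format implementation: the names `Block`, `build_blocks`, and `query` are hypothetical, and a simple fixed grid over two columns stands in for a real multi-dimensional index. The idea it illustrates is general: if rows that are close in several dimensions are stored together and each block carries min/max metadata, a range query can skip most blocks without reading them.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical storage block with min/max metadata for two columns."""
    rows: list  # (x, y, value) tuples
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def build_blocks(rows, cells_per_dim=4):
    """Group rows into grid cells over (x, y); both are assumed to lie in [0, 1)."""
    grid = {}
    for x, y, value in rows:
        key = (int(x * cells_per_dim), int(y * cells_per_dim))
        grid.setdefault(key, []).append((x, y, value))
    blocks = []
    for cell_rows in grid.values():
        xs = [r[0] for r in cell_rows]
        ys = [r[1] for r in cell_rows]
        blocks.append(Block(cell_rows, min(xs), max(xs), min(ys), max(ys)))
    return blocks


def query(blocks, x_lo, x_hi, y_lo, y_hi):
    """Return rows with x in [x_lo, x_hi) and y in [y_lo, y_hi), plus blocks read."""
    matched, blocks_read = [], 0
    for b in blocks:
        # Skip any block whose bounding box cannot overlap the query range.
        if b.x_max < x_lo or b.x_min >= x_hi or b.y_max < y_lo or b.y_min >= y_hi:
            continue
        blocks_read += 1
        matched.extend(
            r for r in b.rows if x_lo <= r[0] < x_hi and y_lo <= r[1] < y_hi
        )
    return matched, blocks_read


# Synthetic data, generated only for this illustration.
rng = random.Random(0)
rows = [(rng.random(), rng.random(), i) for i in range(1000)]
blocks = build_blocks(rows)
matched, blocks_read = query(blocks, 0.0, 0.25, 0.0, 0.25)
print(f"read {blocks_read} of {len(blocks)} blocks, {len(matched)} matching rows")
```

The query returns the same rows as a full scan but touches only the blocks whose metadata overlaps the requested range, which is where the time and cost savings come from.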

We also know that data engineering is a bottleneck for many companies. Serving the data requirements for Business Intelligence or Machine Learning use cases can be tricky. Data needs to be extracted, transformed, and served correctly. Developing and maintaining these ETL processes is a considerable challenge, especially when your engineering power is limited. At Qbeast, we built a managed solution to ensure those processes run smoothly. We handle data ingestion and transformations and ensure data quality. We make sure that the data layout is optimal for consumption so that the tools you use for BI or ML run in the most efficient way possible. This means that we not only help to break the engineering bottlenecks but that we also help companies to realize significant cost savings.

Because we build on open-source formats and tools, we can offer companies the latest and best tools available in the open data ecosystem.

An open data ecosystem is the future

We are extremely excited to see the industry moving towards an open data ecosystem, and we are convinced that it is the future. As Sapphire Ventures points out in their blog, the benefits for customers are clear: cost-effectiveness, scalability, choice, democratization, and flexibility. 

At Qbeast, we are dedicated to accelerating this transition and supporting an ecosystem that enables companies to pick the right tools from the best providers without worrying about compatibility and switching costs, and so to power true innovation.

References

About Qbeast
Qbeast is here to simplify the lives of the Data Engineers and make Data Scientists more agile with fast queries and interactive visualizations. For more information, visit qbeast.io