📊 Cutting Analytics Costs for Fund Performance Workloads

How a leading global asset manager reduced data scanned by up to 99% and executor runtime by up to 88% on Databricks Photon.

Executive Summary

Use Case: fund performance analytics, NAV history, share-class lineage, factsheet generation, compliance lookups, and AI/ML feature access in a modern Lakehouse environment.

  • 99% less data scanned on the largest 25-billion-row fund analytics query.
  • 88% reduction in executor runtime on the same workload, directly supporting lower Databricks compute cost.
  • No disruption to downstream users: no SQL rewrites, pipeline changes, dashboard migration, or new query engine.

‍

‍

The Challenge

High-value tables, high-cost access

For asset managers, a handful of analytical tables sit at the center of the operating model: daily fund results, composite returns, NAV history, performance attribution facts, and other datasets that support reporting and decision-making across the firm.

These are the datasets behind morning checks, monthly fact sheets, quarterly reporting, compliance reviews, institutional client requests, and analyst research. They are also some of the most expensive datasets to operate.

The reason is straightforward: they grow every business day, across every fund, portfolio, share class, vehicle, and return period. At the same time, more teams depend on them as reporting, analytics, AI, and self-service use cases expand.

In a modern Lakehouse environment, this growth is expected. The challenge is that cost often scales with data volume and consumption, not necessarily with business value.

That was the challenge we evaluated with a leading global asset manager. Their data services team had built a modern and well-architected Lakehouse: open formats on S3, Databricks for processing, and governed self-service access for downstream teams. The platform worked. The issue was not architecture. The issue was cost trajectory.

As fund analytics workloads grew, the firm needed a way to reduce compute cost and improve performance without disrupting existing pipelines, dashboards, or analyst workflows.

Qbeast was evaluated for exactly that purpose.

Can we materially reduce analytics cost and improve query performance without changing the existing SQL, pipelines, dashboards, or user experience?

‍

This is the problem we set out to evaluate with a leading global asset manager. Their data services team had built exactly the right kind of Lakehouse: open formats on S3, Databricks for processing, governed self-service for downstream teams. It worked. It was just getting more expensive every quarter, and the trajectory wasn't bending.

‍

The Workload

What we tested

The evaluation focused on five representative queries from the asset manager's real analytical workload.

These were not synthetic benchmarks. They were production-shaped SQL queries reflecting how analysts, reporting jobs, and downstream systems actually interact with fund-performance data.

  • A long-horizon aggregation against a 25-billion-row, 600 GB fact mart, filtered to a single investment vehicle across a 22-year window. The kind of query that powers performance attribution and compliance lookups on a specific fund family.
  • A monthly NAV history retrieval for a single portfolio across share classes: the workhorse behind tear sheets and pricing reviews.
  • A share-class lineage lookup: find the earliest inception date per source identifier for a given portfolio. Standard fund administration.
  • A return-data pull for a specific portfolio at "Before Fees", joined to investment account metadata. The shape of a query a fact sheet generator runs thousands of times a day.
  • A second variant of the lineage lookup, used to test consistency.

The common thread: highly selective filters on portfolio identifiers, vehicle classes, share classes, and date ranges, joined to small reference tables, often grouped by month or by share class for the final aggregate. This is what the daily life of a fund-analytics Lakehouse looks like.
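To make that query shape concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (fund_daily_results, share_class_ref), the columns, and the sample rows are all hypothetical and chosen purely for illustration; they are not the asset manager's schema, and SQLite stands in for the Lakehouse engine only so that the example runs anywhere.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fund_daily_results (      -- hypothetical large fact table
    portfolio_id TEXT,
    share_class  TEXT,
    as_of_date   TEXT,                 -- ISO date string
    daily_return REAL
);
CREATE TABLE share_class_ref (         -- hypothetical small reference table
    share_class TEXT PRIMARY KEY,
    fee_basis   TEXT
);
""")
conn.executemany(
    "INSERT INTO share_class_ref VALUES (?, ?)",
    [("A", "Before Fees"), ("I", "After Fees")],
)
conn.executemany(
    "INSERT INTO fund_daily_results VALUES (?, ?, ?, ?)",
    [
        ("P1", "A", "2024-01-02", 0.010),
        ("P1", "A", "2024-01-03", 0.020),
        ("P1", "A", "2024-02-01", 0.005),
        ("P1", "I", "2024-01-02", 0.009),
        ("P2", "A", "2024-01-02", 0.030),
    ],
)

# Selective filters on portfolio, fee treatment, and date range,
# a join to a small reference table, and a monthly aggregate.
QUERY = """
SELECT substr(f.as_of_date, 1, 7) AS month,
       f.share_class,
       COUNT(*)            AS n_days,
       AVG(f.daily_return) AS avg_daily_return
FROM fund_daily_results AS f
JOIN share_class_ref AS r ON r.share_class = f.share_class
WHERE f.portfolio_id = ?
  AND r.fee_basis = ?
  AND f.as_of_date BETWEEN ? AND ?
GROUP BY substr(f.as_of_date, 1, 7), f.share_class
ORDER BY month
"""
rows = conn.execute(
    QUERY, ("P1", "Before Fees", "2024-01-01", "2024-12-31")
).fetchall()

for month, share_class, n_days, avg_return in rows:
    print(month, share_class, n_days, avg_return)
```

Only a small fraction of the fact table is relevant to any one such query, which is why the amount of data the engine has to scan to find those rows dominates the cost.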

We compared the existing Delta tables against Qbeast-indexed equivalents on the same Databricks Photon runtime, with the Spark environment restarted between runs to eliminate caching effects. Tables, queries, pipelines, and dashboards were all unchanged. The only difference was the layout underneath.

‍

Proof of Value

The headline result

The largest query produced the most significant result.

On the 22-year, 25-billion-row fact table, Qbeast reduced data scanned from 82 GB to 877 MB - a 99% reduction.

  • Records read dropped from 14.25 billion to 326 million.
  • Elapsed time fell from 4.1 minutes to 30 seconds.
  • Executor runtime, the metric most closely aligned with Databricks compute cost, dropped by 88%.

That result matters because it shows what is possible on the types of large, frequently queried tables that drive fund analytics costs.

Query 1 · 25-Billion-Row Fact Mart

22-year fact aggregation against a 600 GB table. The kind of query that runs at the edge of what's feasible. On the largest query in our test, against the 25-billion-row fact mart, the results are unambiguous.

  • Elapsed time: 4.1 min before, 30 sec with Qbeast (8.4× faster)
  • Bytes read: 82 GB before, 877 MB with Qbeast (99% less scanned)
  • Records read: 14.25 B before, 326 M with Qbeast (98% fewer rows)

Executor runtime, the metric that maps directly to compute cost, dropped 88%.

But the impact was not limited to the largest query. Across the smaller fund daily results workloads, the pattern remained consistent: significantly less data scanned, lower executor runtime, and improved cost efficiency.

Results · All 5 Queries · Photon

Query Executor Runtime Improvements with Qbeast

Lower is better. Executor runtime maps most directly to Databricks compute cost. Reductions held consistently across all five queries, from the 25B-row fact mart down to the smallest single-portfolio lookups.

Values in seconds (Photon executor runtime), baseline → Qbeast:

  • Query 1 - 25B-row fact mart aggregation: 972.7s → 117.1s (−88%)

Shorter queries (different scale):

  • Query 2 - Monthly NAV history per portfolio: 29.8s → 14.6s (−51%)
  • Query 3 - Share-class inception lookup: 53.8s → 20.2s (−62%)
  • Query 4 - Portfolio return retrieval with join: 7.6s → 2.2s (−71%)
  • Query 5 - Share-class inception (variant): 14.4s → 6.9s (−52%)

Query | What it does | Bytes read (baseline → Qbeast) | Executor runtime
1 | Long-horizon fact aggregation on 25B-row table | 82 GB → 877 MB (−99%) | −88%
2 | Monthly NAV history per portfolio | 60.6 MB → 21.2 MB (−65%) | −51%
3 | Share-class inception lookup | 595 MB → 131 MB (−78%) | −62%
4 | Portfolio return retrieval with join | 4.1 MB → 6.0 MB | −71%
5 | Share-class inception (variant) | 595 MB → 131 MB (−78%) | −52%

A note on Query 4: Query 4 read slightly more bytes than the baseline because the original query was already extremely selective, scanning only 4.1 MB. Executor runtime still dropped by 71%, showing the value of better-organized data as well as lower scan volume.
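For readers who want to check the percentages, the short sketch below recomputes them from the baseline and Qbeast figures reported above. It assumes decimal units (82 GB taken as 82,000 MB); the variable and function names are ours and only illustrative.

```python
# Figures copied from the reported results.
# Executor runtime in seconds: query -> (baseline, Qbeast).
EXECUTOR_SECONDS = {
    1: (972.7, 117.1),
    2: (29.8, 14.6),
    3: (53.8, 20.2),
    4: (7.6, 2.2),
    5: (14.4, 6.9),
}

# Bytes read in MB: query -> (baseline, Qbeast).
# Assumption: 82 GB is treated as 82,000 MB (decimal units).
BYTES_READ_MB = {
    1: (82_000.0, 877.0),
    2: (60.6, 21.2),
    3: (595.0, 131.0),
    4: (4.1, 6.0),
    5: (595.0, 131.0),
}


def pct_reduction(baseline, optimized):
    """Percentage reduction relative to baseline; negative means an increase."""
    return (baseline - optimized) / baseline * 100


for query in sorted(EXECUTOR_SECONDS):
    runtime = pct_reduction(*EXECUTOR_SECONDS[query])
    scanned = pct_reduction(*BYTES_READ_MB[query])
    print(f"Query {query}: executor runtime {runtime:.0f}% lower, "
          f"bytes read {scanned:.0f}% lower")
```

Query 4 comes out with a negative bytes-read reduction, which is the small increase in scan volume described in the note above, alongside a 71% drop in executor runtime.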

‍

Second-Order Benefits

Why this matters for asset managers

The immediate benefit is lower compute cost. But the broader value is strategic.

When core analytics queries become faster and cheaper, the operating model changes. Teams can ask more questions, run deeper analysis, and support more downstream use cases without requiring a proportional increase in infrastructure spend.

Analysts stay in the flow of work

A four-minute query creates friction. An analyst runs the query, switches context, responds to messages, and comes back later. By then, the analytical thread may be broken.

A 30-second query is different. It keeps the analyst inside the question. That matters for performance analysis, portfolio investigation, compliance review, and client reporting. The faster the query returns, the more naturally analysts can explore, refine, and validate their thinking.

For this asset manager, one of the key hypotheses was that interactive BI directly on the Lakehouse could become a practical reality - not just a technical aspiration. The evaluation results strongly supported that hypothesis.

Richer questions become economically viable

In many financial institutions, analytical questions are shaped by infrastructure cost. A performance query may be scoped to five years because a 22-year view is too expensive to run repeatedly. A cohort analysis may be simplified because the granular version scans too much data. A compliance lookup may be constrained by partition boundaries rather than business relevance.

When executor runtime drops by 51-88% on representative workloads, those constraints begin to change. Teams can ask the question they actually want answered, rather than the cheaper proxy version of that question.

AI and ML pipelines become easier to scale

Fund history is increasingly important for AI-driven workflows: anomaly detection, risk modeling, portfolio explainability, performance commentary, and retrieval-augmented analytics. But AI and ML workflows often require repeated access to large volumes of historical data. Without efficient data access, model development becomes slow, expensive, and operationally complex.

A common workaround is to build separate sampled marts or pre-aggregated datasets. That adds maintenance overhead and creates additional governance complexity. With multi-dimensional indexing, teams can access relevant subsets of large fund-result tables more efficiently, helping data scientists iterate during the workday rather than scheduling heavy jobs overnight. The bottleneck moves back to where it should be: model quality, not data access cost.

Data services teams gain capacity

Federated data services teams are constantly asked to support new reporting views, new factsheet variants, new regulatory cuts, and new institutional client requirements. Each new request has a compute footprint.

When the underlying tables become materially more efficient, more requests can be approved without triggering a new cost escalation. The team gains capacity without requiring a re-platforming project or asking downstream users to change how they work. That operational flexibility is difficult to capture in a single benchmark, but it is often where the business value becomes most visible.

‍

Why this workload is a strong fit for Qbeast

Asset management analytics has a distinctive query pattern, and that pattern is well suited to multi-dimensional indexing.

‍

Filters arrive in combinations.

An analyst rarely filters on only one field.

A typical query may ask for a specific portfolio, share class, vehicle, fee treatment, and date range. Traditional partitioning often optimizes for one primary column. Sorting and clustering can improve access along a chosen order, but filters outside that order may still require unnecessary scanning.

Qbeast organizes data across multiple dimensions, allowing the query planner to prune more effectively across combinations of filters.

That is particularly valuable for fund analytics, where different teams use the same tables in different ways.
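The difference between a single sort order and a multi-dimensional layout can be seen in a toy model of file skipping based on per-file min/max statistics. The sketch below is a generic illustration, not Qbeast's actual indexing algorithm: it assumes a tiny table of (portfolio, day) pairs, 16 rows per file, and a simple 4×4 grid as the multi-dimensional layout. All names are hypothetical.

```python
from itertools import product

# Toy table: every (portfolio_id, day) pair for 16 portfolios over 16 days.
ROWS = list(product(range(16), range(16)))
FILE_SIZE = 16  # rows per file


def sorted_layout(rows):
    """Files written in (portfolio, day) order: one sort column dominates."""
    ordered = sorted(rows)
    return [ordered[i:i + FILE_SIZE] for i in range(0, len(ordered), FILE_SIZE)]


def grid_layout(rows, cell=4):
    """Files written per 4x4 tile, so both columns are clustered together."""
    tiles = {}
    for portfolio, day in rows:
        tiles.setdefault((portfolio // cell, day // cell), []).append((portfolio, day))
    return list(tiles.values())


def files_scanned(files, portfolio_range=None, day_range=None):
    """Count files whose min/max statistics overlap the filter ranges."""
    scanned = 0
    for rows in files:
        p_lo, p_hi = min(p for p, _ in rows), max(p for p, _ in rows)
        d_lo, d_hi = min(d for _, d in rows), max(d for _, d in rows)
        if portfolio_range and (p_hi < portfolio_range[0] or p_lo > portfolio_range[1]):
            continue  # file cannot contain a matching portfolio
        if day_range and (d_hi < day_range[0] or d_lo > day_range[1]):
            continue  # file cannot contain a matching day
        scanned += 1
    return scanned


FILTERS = {
    "portfolio and date range": {"portfolio_range": (5, 5), "day_range": (0, 3)},
    "date range only": {"day_range": (0, 3)},
    "portfolio only": {"portfolio_range": (5, 5)},
}

for name, kwargs in FILTERS.items():
    print(name,
          "| sorted:", files_scanned(sorted_layout(ROWS), **kwargs),
          "| grid:", files_scanned(grid_layout(ROWS), **kwargs))
```

In this toy setup, both layouts scan 1 of 16 files when the portfolio and date range are filtered together. With a date-range filter alone, the sorted layout scans all 16 files while the grid scans 4. With a portfolio filter alone, the sorted layout scans 1 file and the grid scans 4. The single sort order is ideal for its leading column and poor for everything else, while the multi-dimensional layout gives up a little on that column to stay selective across the whole filter space.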

‍

The same tables serve many business functions.

Performance reporting, compliance, fact sheet generation, exploratory analysis, and AI workflows may all depend on the same fund-result tables.

Optimizing for only one query shape can create tradeoffs elsewhere. A sustainable layout strategy needs to work across the broader filter space, not just for one reporting workload.

Qbeast is designed for that type of multi-dimensional access pattern.

‍

The optimization is transparent to downstream users

One of the most important aspects of this evaluation was what did not change.

  • There was no new query engine.
  • No dashboard migration.
  • No pipeline rewrite.
  • No new SDK.
  • No change to analyst behavior.
Qbeast provides multi-dimensional pruning aligned to real-world financial use-cases.

‍

Qbeast operates beneath the table, preserving the existing Lakehouse architecture while improving how data is physically organized.

For a governed self-service environment, that matters. The data services team can improve performance and cost efficiency centrally, while downstream consumers continue using the tools and queries they already know.

Because Qbeast operates beneath the table (the format stays open, the queries don't change, the pipelines keep running), the gains accrue to every downstream team without coordination. There's no "migrate your dashboard to the new system" conversation. There's no new SDK. The team that tunes the table and the teams that consume it can stay decoupled, which is the whole point of governed self-service.

Bending the cost curve

The most important takeaway is not that one query became faster.

The important takeaway is that the cost profile of a critical fund analytics workload changed.

For the largest query, data scanned fell by 99% and executor runtime dropped by 88%. Across the four representative daily fund-result queries, executor runtime fell by 51-71%.

That level of improvement does more than reduce the current Databricks bill. It creates headroom.

  • Headroom for new analytics products.
  • Headroom for AI initiatives.
  • Headroom for regulatory reporting.
  • Headroom for institutional client requests.
  • Headroom for more interactive exploration on the Lakehouse.

And it does so without forcing the firm into an architectural migration.

For an asset manager that has already invested in a modern Lakehouse, this is an important distinction. The architecture was already sound. What changed was the physical layout layer underneath it.

That is why the evaluation was so compelling: it showed that meaningful cost reduction and performance improvement can be achieved without disrupting the systems, workflows, and governance model already in place.

‍

Conclusion

Fund analytics workloads are only becoming larger, more complex, and more widely consumed. For asset managers, the question is not whether these datasets will grow. They will.

The question is whether infrastructure cost must grow at the same rate.

This evaluation showed that it does not have to.

By applying Qbeast's multi-dimensional indexing to existing Lakehouse tables, a leading global asset manager significantly reduced data scanned, lowered executor runtime, and improved query performance - without changing pipelines, dashboards, or SQL.

For firms running similar workloads across fund administration, NAV history, performance reporting, fact sheet generation, compliance, or AI-driven analytics, the opportunity is clear:

You may not need a new platform to reduce analytics cost. You may need a better layout for the data you already have.

‍

Reach out to us to request a demo and explore how Qbeast can help reduce the cost of your fund analytics workloads: https://qbeast.io/request-demo
