See live examples and documentation at hashquery.dev
BI tools like Tableau and Looker are great for creating standard reports, but they have one fatal flaw: once you create definitions, it’s nearly impossible to get the logic out. Calculations and definitions are trapped in these tools by design. The next generation of tools will need to free the logic — not only because today’s ecosystem of tools requires access to metrics logic, but also because Large Language Models will only be useful for analytics if they can tap into your organization’s core logic.
Today, we’re announcing three major releases:
A way to model data and metrics **in python so that the logic can be reused.
A way to import and query semantic models from python. (You can query models from our BI tool Hashboard and reuse metrics.)
A way of expressing and querying event-style analyses like funnels. This saves tons of time and effort writing complex SQL to generate funnels.
Three things. Modeling, querying and funnels. Modeling, querying, funnels. Are you getting it? These are not three separate Python packages. These are all features of a single Python framework: Hashquery!
Semantic layers and BI tools exist because SQL isn’t composable
SQL has endured for the last fifty years and will likely endure for the next fifty: it’s an accessible and useful language for querying data and every technical data analyst knows it. But SQL has one fundamental flaw: there’s no way to save calculations and logic in SQL.
So, if you’re a subscription-based business and you figure out a helpful SQL query that calculates new subscriptions in a given month, there’s no place to store that calculation using only SQL.
Enter Business Intelligence (BI) tools: the defacto place to store operational and business calculations — either as raw SQL snippets in tools like Mode and Hex, or in Domain Specific Languages in tools like Looker with LookML.
The primary motivation of BI tools, and semantic layers, is to solve for the technical limitations posed by SQL: that it is not very composable and does not allow for the efficient reuse of logic.
Challenge: the only way to query a semantic layer is with a GUI
The way you get data out of a semantic layer is by clicking around in your BI tool’s Graphical User Interface (GUI). This is for a good reason: semantic layers have been primarily designed to empower humans to find robust answers to their questions. For example, you might code your profit calculation in LookML. And you query LookML by clicking around in an exploratory GUI. Historically, this is also how users have queried Hashboard.
But this approach isn’t entirely satisfying:
What if you need to use that logic in a client-facing application?
What if you need to use that logic for a machine learning model (use it as a feature store)?
What if you’re a data scientist and want to analyze an A/B experiment in a notebook, and you want to use the same metrics that you use in your BI tool?
What if you want to make your semantics accessible to AI assistants and Large Language Models? LLMs are bad at clicking around in GUIs, but write good Python.
If your BI tool can generate interesting SQL that is hard to write from scratch, it’s sort of annoying that you can’t access that logic except by clicking around. The Hashquery funnels API is a great example of this — it will save hundreds of lines of sql, and it would be a shame if that logic was just trapped in Hashboard.
So, the data ecosystem has a great language for querying data (SQL) and a great way for defining semantics (Business Intelligence tools). But, ironically, we have no good way of querying semantic layers.
Querying and modeling together
While building Hashquery, we took a lot of inspiration from Malloy, which cleverly mixes data modeling and querying in a single programming interface. As a data scientist, their approach makes a lot of sense to me. I love Randy Au’s years-old blog post that points out that data cleaning, data modeling and data analysis are all part of a continuous workflow. Our tools should better reflect that continuous workflow.
Hashquery allows you to define models, query and build up complex logic — all in Python that compiles to SQL and runs directly against the data warehouse. One important distinction between Hashquery and Malloy, though, is that Hashquery also allows you to import data models from Hashboard, our BI tool.
import hashquery as hq
# this imports the product_events model direclty
# from Hashboard and runs the requisite query against the data warehouse
experiment_metrics = hq.project.models.product_events
# or define models inline
new_one_off_model = hq.Model().with_source().with_attributes("user", "timestamp")
# get data from models
experiment_metrics.df()
We also designed Hashquery to be extensible, so we built out a module for event analytics to prove that it could adapt to different usecases.
Event and funnel analytics — with 100% less clunky SQL
The secret of Hashquery is that we started building it for ourselves. We wanted to add event-style analytics inside of Hashboard similar to Amplitude or Mixpanel. Event-style analytics are incredibly common, but notoriously awkward to construct in BI tools. Usually you end up writing complex SQL to pre-pivot data and get it into the right shape for a BI tool. While your organization uses Amplitude and Mixpanel, you almost always end up having to build funnels in your canonical BI tool as well.
We built a python API to allow our application to do these types of analytics. It turns out it was so useful for our internal development that we thought we would share it with the world. What was once dozens or hundred of lines of SQL is now a single API call, which compiles sql and issues it to your data warehouse. And it’s surprisingly performant in our early testing.
event_model.funnel(
"landing-page",
"viewed-product",
"started-checkout",
"paid"
)
To get started from scratch, use this simple call to generate an event style model and a product funnel:
event_model = Model(
connection="bigquery", sql_query=""
).with_activity_schema(
time=a.timestamp,
event=a.event_type,
group=a.users
)
# an aggregate funnel, or use funnel().source_data to get a user list
event_model.funnel(
"signed-up","finished-onboarding","paid"
).df()
Why not a DSL?
We built Hashquery because we loved the expressiveness and power of data modeling languages like Malloy but wanted to avoid the portability and developer experience drawbacks of custom Domain Specific Languages. Hashquery is a python package that can run anywhere, and it can be used as a querying language, a semantic layer or a headless BI API — or all three at once.