Jay Grossman
Jay Grossman
Published on:

Associating dbt models to queries in Snowflake

Associating dbt models to queries in Snowflake

Since 2017, I have been userof dbt - the most popular open source framework for transforming data in data warehouses. I have brought the framework into my last 4 companies, so needless to say I am a big fan boy as it has solved a bunch of common problems.

This week I got to attend dbt labs's biggest conference in Las Vegas - Coalesce conference. Overall there have been some excellent talks and I have gotten to meet a lot of people at all different stages in their careers who are solving interesting data related problems. It has been awesome to catch up with some folks who I haven't seen in person in many years.

I want to find the dbt code for slow queries

Yesterday I attended a session titled "Supercharge your data pipelines with AI & ML using dbt Labs and Snowflake". During the session I asked a question to the moderator:

When a query runs slow and I find it running slowly in Snowflake’s Profiler, how can I can make it easier to associate the query to the dbt model where the code lives that I will need to refactor?

Some background on queries in Snowflake

Snowflake's Query History shows queries run, who ran it and how long they took. I generally like to spend time looking at the longest running (most expensive) queries to look for opportunities to refactor them and save money.

snowflake_query_history
Snowflake's Query History view.

You can see the details of the query by clicking on the query link, including the SQL that was run and Query Profiler. The Profiler shows valuable diagnostic information about the query run and which logical parts of the query are the most expensive.

snowflake_query_profiler
Snowflake's Query Profile view.

The solution:

While the moderator of the session did not have much interest providing a solution, one of the audience members had a great suggestion:

You can put a comment with the model’s name in the SQL that dbt generates, so you’ll be able to see it in the query run.

Adding a comment with the model name using a macro

We can a dbt macro that will add the model name as shown below:

1
2
3
{% macro model_comment() %}
/* source dbt model: {{ model.name }} */
{% endmacro %}

Then we can add the comment to the bottom of the SQL file for the model:

1
2
3
4
5
6
SELECT 
  field1
  , field2
FROM my_table

{{ model_comment() }}
Automatically adding this info to all dbt models

While the macro is simple and works well, I am lazy and don't want to go through the exercise of adding to the 100's of models in my projects. I'd like to automagically add it to all of the models!

In dbt, the query-comment configuration in dbt_project.yml allows you to inject custom comments into the SQL queries that dbt runs against your database. This feature serves several important purposes:

  • Query Attribution: It helps attribute SQL statements to specific dbt resources like models and tests.
  • Debugging and Tracing: Custom comments can aid in debugging by providing additional context about the query's origin and purpose.
  • Customization: You can customize the comment to include relevant information such as the user running the query, the dbt version, or any other metadata you find useful1.

So we can add this into our dbt_project.yml file:

1
2
3
query-comment: 
  comment: "source dbt model: {{ node.name }}"
  append: True

We need to use the {{node}} object at a project level, as the {{model}} object is not available at a project level. We need the append: True to add the comment to the bottom of the SQL file because Snowflake will filter out comments at the top of the file.

I have the following model named my_dbt_model.sql in my dbt project:

1
2
3
4
SELECT 
  field1
  , field2
FROM my_table

I ran the following to run the model in our Snowflake environment:

1
dbt run -s my_dbt_model

When I look at the query history in Snowflake, I can now see the following sql for my_dbt_model.sql (with the comment line at the bottom):

1
2
3
4
5
6
7
8
9
create or replace table MY_DB.MY_SCHEMA.my_dbt_model 
      as
      (
SELECT 
  field1
  , field2
FROM my_table
  )
/* source dbt model: my_dbt_model */;