Understanding the scopes of dbt tags
dbt (data build tool) is a great tool, as I wrote before in “5 reasons why BigQuery users should use dbt”. dbt tags are especially useful: combined with the model selection syntax, they let us select models to suit the situation. In this article, I describe the scopes of dbt tags, which I misunderstood at first and which can be a pitfall for others too.
How can we use dbt tags?
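Assume we have a dbt source with tags at several levels: tag_x on a source table, tag_y on its id column, and tag_z on the unique test of that column. In other words, we can attach dbt tags to a source table, a column, and a test respectively.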
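For reference, a source definition along the following lines would fit this setup. It is a minimal sketch rather than the original file: the source, table, and column names are inferred from the test names in the output later in the article, and the exact layout is an assumption.

```yaml
# Hypothetical schema file; names are inferred from the test output, not taken from the original project.
version: 2

sources:
  - name: sample_gcp_project__sample_dataset
    tables:
      - name: users
        tags: ["tag_x"]          # table-level tag
        columns:
          - name: id
            tags: ["tag_y"]      # column-level tag
            tests:
              - not_null
              - unique:
                  tags: ["tag_z"]  # test-level tag
          - name: name
            tests:
              - not_null
```

Here tag_x sits on the users table, tag_y on the id column, and tag_z on the unique test of that column.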
The dbt CLI provides very useful syntax to select dbt models, tests and so on. If we want to run only the dbt schema tests with the tag_z tag, we can pass the selector to the --models option.
$ dbt test --models "tag:tag_z"
What are the scopes of dbt tags by level?
In the beginning, I didn’t understand the scopes of dbt tags correctly. In the case above, I expected that if I executed dbt test --models "tag:tag_x", the unique schema test on the id column would not be executed, because that test only carries tag_z. But the schema test is executed. It seems that dbt tags have a kind of scope inheritance.
Table-level tags
I call tags like tag_x table-level tags. Their scope covers every schema test under the table, including not_null and unique on the id column, even though those two schema tests don’t have tag_x themselves.
$ dbt test --models "tag:tag_x"
16:21:46 | 1 of 3 START test not_null_sample_gcp_project__sample_dataset__users_id [RUN]
16:21:50 | 2 of 3 START test source_unique_sample_gcp_project__sample_dataset__users_id [RUN]
16:21:55 | 3 of 3 START test not_null_sample_gcp_project__sample_dataset__users_name [RUN]
Column-level tags
I call tags like tag_y column-level tags. Their scope covers the not_null and unique schema tests of the id column, even though the two schema tests don’t have tag_y themselves.
$ dbt test --models "tag:tag_y"
16:21:46 | 1 of 2 START test not_null_sample_gcp_project__sample_dataset__users_id [RUN]
16:21:50 | 2 of 2 START test source_unique_sample_gcp_project__sample_dataset__users_id [RUN]
Test-level tags
This case is the intuitive one. I call tags like tag_z test-level tags. Their scope covers only the unique schema test of the id column.
$ dbt test --models "tag:tag_z"
16:21:50 | 1 of 1 START test source_unique_sample_gcp_project__sample_dataset__users_id [RUN]
Summary
In this article, I described the different scopes of dbt tags.
- Table-level tags affect all schema and data tests under a source.
- Column-level tags affect all schema tests under a column.
- Test-level tags affect only the schema test they are attached to.