Automate dbt review on GitHub
Today, we are bringing many DevOps concepts to the data engineering world. One of the most basic measures is static code analysis used to flag programming errors, bugs, stylistic errors and suspicious constructs. It makes our programming code more readable and maintainable as well. Especially, integrating static code analysis tools with continuous integration (CI) is very useful. As I am a dbt lover, I would like to automatically check coding style for team development taking advantage of CI, especially GitHub Actions. Standardized coding style on SQL for dbt can help us to maintain even complicated SQL queries.
In this article, I would like to describe automating SQL coding style leveraging sqlfluff which is a SQL linter and auto-formatter and reviewdog which leaves comments about violations of coding style to GitHub pull requests. By automating reviewing style check, we can be more focus our time on what matters.
What is sqlfluff?
SQLFluff is a SQL linter and auto-formatter. It is a dialect-flexible and configurable SQL linter. SQLFluff also works with Jinja templating and dbt. It supports various dialects not only ANCI SQL but also BigQuery, Hive, Redshift, Snowflake and so on. A strong point for dbt users is that it officially supports JINJA template and dbt models. So, I am assuming many dbt users take advantage of SQLFluff on their projects so that they detect violations of SQL queries. The reason why I use sqlfluff is that it is the best open-sourced linter for dbt at the moment. There are some other linter for SQL, but quire few target dbt.
SQLFluff has some sub commands for a SQL linter and auto-formatter. For instance, sqlfluff lint
enables us to find violations of SQL coding style. As it is very flexible, we can easily customize coding standard by selecting rules from the reference. We can also fine-tune rules in a configuration file.
$ sqlfluff lint models/== [models/test.sql] FAIL
L: 1 | P: 1 | L003 | Single indentation uses a number of spaces not a multiple of 4
L: 1 | P: 14 | L006 | Operators should be surrounded by a single space unless at the start/end of a line
L: 1 | P: 27 | L001 | Unnecessary trailing whitespace
What is reviewdog?
Reviewdog is an automated code review tool integrated with any code analysis tools regardless of programming language. For instance, action-shellcheck
is a github action to show violations of shellcheck. Getting resulting statuses of CI jobs is of course significant. But, it is very useful to visually understand what violates coding style at where by leaving comments in pull requests.
We can take advantage of existing github actions with reviewdog. Besides, reviewdog enables us to easily implement a custom action using Reviewdog Diagnostic Format (RDFormat) as a generic diagnostic format. All we have to do is to pass a JSON object follows RDFormat to the reviewdog CLI.
cat <"violations.rdjson" | reviewdog -f=rdjson
What if SQLFluff meets reviewdog?
As I described, SQLFluff itself is very useful. Furthermore, there are a couple of github actions to run SQLFluff to get the overall status of CI jobs. But, visually understanding each violation would be helpful to review dbt models. We can skip a boring step to give feedbacks about just coding style violation. So, I implemented yu-iskw/action-sqlfluff
to automatically leave comments in pull requests with sqlfluff and reviewdog.
The image below is an example of a comment on a github pull request. The github action leaves the comment about how the part violates and its rule ID. As SQLFluff has the rule references, a link to the corresponding rule is put as well.
How is the action used?
We can put the action to steps as below. The action has many inputs to control sqlfluff and reviewdog. For instance, we can specify the version of sqlfluff with sqlfluff_version
. Please make sure the README document and action.yml
in the github repository.
Since the action is a container action, we don’t have to care the python environment. It doesn’t impact other python environments of other steps, even if we set up another python environment with something like actions/setup-python
.
One thing I have to note is the limitation of version. If we use dbt v1, I would recommend we have to set sqlfluff_version
to 0.9.0
which is the latest version at the time of writing the article., because sqlfluff==0.8.2 doesn’t work with dbt v1. So, if we use dbt v0, sqlfluff_version
can be 0.8.2
and so on.
name: sqlfluff with reviewdog
on:
pull_request:
jobs:
sqlfluff:
name: runner / sqlfluff (github-check)
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: yu-iskw/action-sqlfluff@v2
id: lint-sql
with:
github_token: ${{ secrets.github_token }}
reporter: github-pr-review
sqlfluff_version: "0.9.0"
config: "${{ github.workspace }}/.sqlfluff"
paths: '${{ github.workspace }}/models'
Summary
I write up the article because I would like to introduce the github action to automatically leave comments about coding style violations of SQL and dbt models on github. By doing that, we don’t have to give feedback about coding style. Rather than that, we more focus on what matters semantically. I plan to support sqlfluff fix
in the action to automatically format dbt model files, rather than leaving comments on pull requests in the next version. I hope it makes you more productive to develop dbt models.
Last but not least, I was able to implement the action easily, thanks to sqlfluff and reviewdos. What I basically implemented is to transform the resulting JSON object by sqlfluff to the RDJson format. I appreciate the creators and contributors of the both projects.
We are hiring!
We are hiring not only in Japan, but also in Singapore. We are looking for talented persons to change the world together. If you are interested in Ubie, please contact me. I am glad to chat with you. Please feel free to leave comments.