Six Months of dbt Unit Testing

· 1187 words · 6 minute read

It has been about 10 months since I first wrote my post on dbt unit tests. That was before launch, before betas. dbt unit tests were released with dbt 1.8 in early May, and I have had the chance to do some real dbt development since then.

Why does it feel different now? 🔗

It is not often I think of a minor release as transformative. But dbt 1.8 was. For the first time (in dbt, anyways), I could write tests as part of writing the logic.

My development cycle used to be something like:

  1. write half-decent SQL
  2. run query
  3. inspect result
  4. pick example rows
  5. compare with source tables
  6. repeat

Nowadays, while I still look at database tables, it is a much smaller part of my workflow. Today it is something more like:

  1. write half-decent SQL
  2. run query
  3. inspect result
  4. write test
  5. run test
  6. inspect test result
  7. repeat

Theoretically I could start by writing the test, as prescribed by TDD, but I’m a pragmatist.

The dos and don’ts 🔗

As with anything, some ways of using unit tests are better than others.

The perhaps stupidest way, one I have caught myself using a couple of times, is to write the SQL and then paste the source tables and query result into the unit test. It feels smart because, in Snowflake, the object_construct_keep_nulls(*) function assembles an almost perfect set of JSON input and output rows - such a convenient way to get all the input and all the output without all the typing. But it negates the entire purpose of the test: the expected output is, by construction, whatever the query happens to return.
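In Snowflake, the dump itself is a one-liner, which is exactly what makes it so tempting. A sketch (the model name is made up):

```sql
-- Snowflake: emit each row as one JSON object, keeping null-valued keys.
-- The output pastes almost directly into a unit-test fixture, which is
-- precisely the trap: the "expected" rows are just the actual rows.
select object_construct_keep_nulls(*) as row_json
from my_model;
```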

Related to the object_construct approach, it is probably a bad idea to include every single column of the source tables in the unit test. The wildcard thing is really tempting, but as soon as you remove a column in the upstream table, your unit tests are going to fail. And that could be a lot of unit tests.

A better way to use unit tests is to make sure they cover the things that aren’t in the data. In my case, I have a lot of single-element arrays. Guess what my unit tests have? Dual-element arrays.
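A sketch of such a test in dbt’s unit-test YAML - the model, input, and column names are all invented for illustration:

```yaml
unit_tests:
  - name: handles_multi_element_arrays
    model: order_facts            # hypothetical model
    given:
      - input: ref('raw_events')  # hypothetical upstream model
        rows:
          # production data only ever has one element per array
          - {order_id: 1, event_types: ["created", "shipped"]}
    expect:
      rows:
        - {order_id: 1, first_event: "created"}
```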

Unit tests are also great for isolating pieces of logic - say, the complicated case-when with an order-by stuck in there. Forget all the other columns. Unit tests give you a place to focus on that one thorny issue.
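As far as I can tell, dbt only compares the columns you actually list in expect, so a test can zoom in on one derived column and ignore the rest. A hypothetical sketch:

```yaml
unit_tests:
  - name: latest_status_wins
    model: customer_status          # hypothetical model
    given:
      - input: ref('status_events') # hypothetical upstream model
        rows:
          - {customer_id: 1, status: "trial", updated_at: "2024-01-01"}
          - {customer_id: 1, status: "active", updated_at: "2024-02-01"}
    expect:
      rows:
        # only the thorny case-when column is asserted
        - {customer_id: 1, status: "active"}
```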

And unit tests are documentation. When you deliver code, you can prove what it does.

It helps you think - if you let it 🔗

It came as a surprise to me, but unit tests can really help you think. Writing a test spells out what you think the SQL will do. When the test fails, it isn’t necessarily that the SQL is wrong - maybe your idea of what a sensible answer should be is off.

A failure mode here is to go blind. If the unit tests give you a huge diff, it is only a matter of time before you start mindlessly changing values to whatever the actual values were. This happens more often than I like to admit, and it has even masked actual errors.

The rough parts 🔗

I’m pretty sure dbt has a longer wishlist than I do, but here are the few I have noted:

  • Unit tests where the ref is an ephemeral model require the input to be defined as SQL. It isn’t that hard to do, but it is annoying - a little more verbose, but mostly just something you need to remember.
  • There is a curious quirk in unit-testing incremental models (hang on): You can define this as an input to a test, but because dbt uses the structure of the table as part of the test, the table needs to exist before the unit test can run. But dbt build runs the unit tests before running the model, meaning that if your CI uses the build command, you can’t write a unit test of an incremental model that simulates a pre-populated target table.
  • The diff output can be hard to read. It is basically a tiny CSV printed to the command line, and so getting the column name from a diff in some row is hard. Two things that could help: Copy the output to a CSV file and use the “rainbow csv” plugin in VSCode to lessen the eye strain, and/or make sure the test values in your input data hints at what column it is in. So instead of using “A” and “B” for all categorical columns, use “PRODUCT1” and “PRODUCT2” for the product column and “CATEGORY1” and “CATEGORY2” for the product category column.
  • Testing relative dates can be icky. If your SQL uses the current timestamp, directly or indirectly, you might write unit tests that work today but fail tomorrow if you aren’t careful. So far I have been able to get away with writing dates either far in the past or far in the future. Hopefully we will have migrated our code by 2499.
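For the ephemeral case, the input fixture has to be written as SQL via format: sql - roughly like this, with invented model names:

```yaml
unit_tests:
  - name: test_with_ephemeral_input
    model: downstream_model              # hypothetical model
    given:
      - input: ref('my_ephemeral_model') # hypothetical ephemeral model
        format: sql
        rows: |
          select 1 as id, 'PRODUCT1' as product
          union all
          select 2 as id, 'PRODUCT2' as product
    expect:
      rows:
        - {id: 1, product: "PRODUCT1"}
        - {id: 2, product: "PRODUCT2"}
```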
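And the incremental setup that the build ordering rules out in CI looks roughly like this: this stands in for the pre-populated target table, and the is_incremental macro is forced on (model names invented):

```yaml
unit_tests:
  - name: appends_only_new_rows
    model: fct_events                 # hypothetical incremental model
    overrides:
      macros:
        is_incremental: true          # force the incremental code path
    given:
      - input: this                   # simulated pre-populated target table
        rows:
          - {event_id: 1, loaded_at: "2024-01-01"}
      - input: ref('stg_events')      # hypothetical staging model
        rows:
          - {event_id: 1, loaded_at: "2024-01-01"}
          - {event_id: 2, loaded_at: "2024-01-02"}
    expect:
      rows:
        # only the new row should be appended
        - {event_id: 2, loaded_at: "2024-01-02"}
```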

A final rough part, which is not the fault of dbt but rather the inherent nature of data, is that the tests can get complex. Especially when revisiting a test, it can be hard to discern what you wanted to test - not to mention if someone else needs to work on it. There is a lot of state, often many input tables, and the tests can get both long and wide. Comment what you are doing, and prefer many small tests to one big test - for readability and sanity, if nothing else.

Unit tests are not pytest 🔗

Compared to tests in Python, dbt unit tests have one big advantage: they are easy. You don’t have to mock an S3 bucket or anything else your code might interact with. It is SQL; it interacts with other models. The tests can be just about the tests. So if you think of pytest and mocks and patches when you hear “unit tests”, rest assured it isn’t like that.

Unit tests and data tests are very different 🔗

The contrast between data tests and unit tests becomes starker the more I use them. I write data tests whenever I have some general idea of what the data should look like. I will still add a not_null or unique, mostly for old times’ sake, but those tests are no longer necessary to ensure I wrote my join correctly. Instead, I see data tests more as a relative of freshness tests: they let you know that it is time to look closely at the data. My favorite data test nowadays is not_constant on columns that are linked to the data collection process. If you collect data from all your stores around the country, make sure there is more than one store in the data.
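With the dbt_utils package, that check is a one-liner in the model’s YAML (model and column names invented):

```yaml
models:
  - name: store_sales        # hypothetical model
    columns:
      - name: store_id
        data_tests:
          - dbt_utils.not_constant   # fail if every row has the same store
```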

If I were to guess where dbt tests will go from here, I would say the focus will be a new delineation between “we shouldn’t recreate this table because it is wrong” tests and “this is a quality issue you should look into” tests. Both fall in the data-test category, but they are not the same.