Another non-comprehensive list of things I have read and/or thought about since last time:
Data Is Plural: A weekly newsletter with links to datasets. I have been down the professional ETL rabbit hole for a while now, and the thought of a dataset just existing as it is, without being some steady stream of new data shifting and changing, is a relief. Sometimes, data is just data. https://www.data-is-plural.com/
The new Zed editor is promising, but right now it is just a text editor with a built-in chatbot. I will definitely return to it, but it probably needs some time. https://zed.dev/
It seems everyone loves the new UV package manager. I see the selling point (speed), but slow package installs have never bothered me. I will probably change my mind at some point, but right now I don’t need yet another package manager to keep me occupied.
I wonder what the cheapest options for orchestration tools are, and what the trade-off between cheap and safe is. The question probably has too many variables to answer in general, but when you are in a small business and need to run arbitrary Python code, what is the cheapest option? GitHub Actions comes to mind as cheap and safe, but it has a 6-hour limit. Airflow is free, but securing it gets complicated and risky. Azure WebApps have a built-in IAP (read: SSO) and can run docker-compose files… Maybe that is a useful compromise?
dbt unit tests fail for incremental models. I guess I have something to look forward to: mocking the {{ this }} object has been a lifelong dream for me ever since I thought about it.
I made a simple CLI utility, dbt-autodoc, that uses GPT-4 to help document dbt models by parsing the manifest, extracting the code and upstream dependencies for a model, and sending it all off to GPT-4 to create a yml file that gets written to the project (or to stdout, depending on your preference). It will probably go through a major rewrite to accommodate some further feature ideas. https://github.com/radbrt/dbt-autodoc
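For the curious, the manifest-parsing step could look roughly like the sketch below. This is not the code from the repository, just a minimal illustration under a few assumptions: the manifest is the target/manifest.json file dbt writes after a run or compile, model code sits under raw_code (raw_sql in older dbt versions), and the function names (load_manifest, find_model, node_code, model_context) are made up for this example. The GPT-4 call is left out entirely.

```python
import json
from pathlib import Path


def load_manifest(path="target/manifest.json"):
    """Read the manifest dbt writes to the target directory (path is an assumption)."""
    return json.loads(Path(path).read_text())


def find_model(manifest, model_name):
    """Return (unique_id, node) for the model with the given name."""
    for unique_id, node in manifest["nodes"].items():
        if node.get("resource_type") == "model" and node.get("name") == model_name:
            return unique_id, node
    raise KeyError(f"No model named {model_name!r} in manifest")


def node_code(node):
    """Uncompiled code for a node; older manifests use raw_sql instead of raw_code."""
    return node.get("raw_code") or node.get("raw_sql") or ""


def model_context(manifest, model_name):
    """Collect a model's own code plus the code of its direct upstream nodes."""
    unique_id, node = find_model(manifest, model_name)
    upstream = {}
    for parent_id in node.get("depends_on", {}).get("nodes", []):
        # Sources are stored under manifest["sources"] and have no code,
        # so only parents found under "nodes" (models, seeds, ...) are kept.
        parent = manifest["nodes"].get(parent_id)
        if parent is not None:
            upstream[parent_id] = node_code(parent)
    return {"model": unique_id, "code": node_code(node), "upstream": upstream}
```

Calling model_context(load_manifest(), "my_model") from the project root (with my_model being a placeholder name) would give a dictionary that could be turned into a prompt. Only direct parents are included here; walking further up the graph is a design choice that trades more context for a longer prompt.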